Class: TwitterCldr::Segmentation::RuleSetBuilder
- Inherits:
- Object
- Object
- TwitterCldr::Segmentation::RuleSetBuilder
- Defined in:
- lib/twitter_cldr/segmentation/rule_set_builder.rb
Class Method Summary
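- .exception_rule_for(locale, boundary_type) ⇒ Object
Builds the exception rule for a locale. Exceptions are only supported for the “sentence” boundary type; see the comment above exceptions_for.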
- .implicit_end_of_text_rule ⇒ Object
The implicit initial rules are always “start-of-text ÷” and “÷ end-of-text”.
- .implicit_final_rule ⇒ Object
The implicit final rule is always “Any ÷ Any”.
- .load(locale, boundary_type, options = {}) ⇒ Object
Class Method Details
.exception_rule_for(locale, boundary_type) ⇒ Object
See the comment above exceptions_for. Exceptions are only supported for the “sentence” boundary type, because the ULI JSON data doesn’t distinguish between boundary types.
# File 'lib/twitter_cldr/segmentation/rule_set_builder.rb', line 19
def exception_rule_for(locale, boundary_type)
  cache_key = TwitterCldr::Utils.compute_cache_key(locale, boundary_type)
  exceptions_cache[cache_key] ||= begin
    exceptions = exceptions_for(locale, boundary_type)
    regex_contents = exceptions.map { |exc| Regexp.escape(exc) }.join("|")
    parse("(?:#{regex_contents}) ×", nil).tap do |rule|
      rule.id = 0
    end
  end
end
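The regex-building step in exception_rule_for can be tried in isolation. The following is a minimal sketch, not the library's code: the exception list and the exception_pattern helper are hypothetical, and the real exceptions come from the ULI JSON data. It shows why each entry is passed through Regexp.escape before being joined with "|": without escaping, the "." in "Mr." would match any character.

```ruby
# Hypothetical exception list for illustration only.
EXCEPTIONS = ["Mr.", "Dr.", "e.g."].freeze

# Hypothetical helper mirroring the escape-and-join step:
# escape every exception so "." is literal, then build one alternation.
def exception_pattern(exceptions)
  contents = exceptions.map { |exc| Regexp.escape(exc) }.join("|")
  Regexp.new("(?:#{contents})")
end

pattern = exception_pattern(EXCEPTIONS)

puts pattern.source          # prints (?:Mr\.|Dr\.|e\.g\.)
puts pattern.match?("Mr.")   # prints true
puts pattern.match?("Mrs")   # prints false, because "." is escaped
```

In the rule text passed to parse, the trailing “×” marks the position after a matched exception as a no-break position.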
.implicit_end_of_text_rule ⇒ Object
The implicit initial rules are always “start-of-text ÷” and “÷ end-of-text”. We don’t need the start-of-text one.
# File 'lib/twitter_cldr/segmentation/rule_set_builder.rb', line 40
def implicit_end_of_text_rule
  @implicit_end_of_text_rule ||= parse('.\z ÷', nil).tap do |rule|
    rule.id = 9998
  end
end
.implicit_final_rule ⇒ Object
The implicit final rule is always “Any ÷ Any”.
# File 'lib/twitter_cldr/segmentation/rule_set_builder.rb', line 31
def implicit_final_rule
  @implicit_final_rule ||= parse('. ÷ .', nil).tap do |rule|
    rule.id = 9999
  end
end
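To illustrate how an exception rule (id 0) and the two implicit rules (ids 9998 and 9999) might interact, here is a small sketch. It assumes rules are tried in ascending id order and that the first matching rule decides, as in Unicode's text segmentation rules. The Rule struct, RULES list, and both helper methods are hypothetical and are not TwitterCldr's classes or its actual evaluation logic.

```ruby
# Hypothetical rule: `left` must match the text before the position,
# `right` the text after it; `breaks` is true for ÷ and false for ×.
Rule = Struct.new(:id, :left, :right, :breaks)

RULES = [
  Rule.new(9999, /.\z/m, /\A./m, true),           # Any ÷ Any
  Rule.new(9998, /.\z/m, /\A\z/, true),           # ÷ end-of-text
  Rule.new(0, /(?:Mr\.|Dr\.)\z/, /\A/, false)     # exception ×
].sort_by(&:id)

# Returns the first rule (lowest id) that matches around `pos`, or nil.
def deciding_rule(text, pos)
  left = text[0...pos]
  right = text[pos..-1]
  RULES.find { |r| r.left.match?(left) && r.right.match?(right) }
end

# True if the deciding rule allows a break at `pos`.
def break_at?(text, pos)
  rule = deciding_rule(text, pos)
  rule ? rule.breaks : false
end

puts break_at?("Mr. X", 3)  # prints false: the exception rule wins
puts break_at?("ab", 1)     # prints true: falls through to Any ÷ Any
puts break_at?("ab", 2)     # prints true: end-of-text rule
```

Giving the implicit rules the highest ids keeps them as fallbacks: they only decide a position when no earlier rule matches.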