Class: TwitterCldr::Shared::UnicodeRegex

Inherits:
Object
Extended by:
Forwardable
Defined in:
lib/twitter_cldr/shared/unicode_regex.rb

Constructor Details

#initialize(elements, modifiers = nil) ⇒ UnicodeRegex

Returns a new instance of UnicodeRegex.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 58

def initialize(elements, modifiers = nil)
  @elements = elements
  @modifiers = modifiers
end

Instance Attribute Details

#elements ⇒ Object (readonly)

Returns the value of attribute elements.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 56

def elements
  @elements
end

#modifiers ⇒ Object (readonly)

Returns the value of attribute modifiers.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 56

def modifiers
  @modifiers
end

Class Method Details

.all_unicode ⇒ Object

All Unicode code points (0..0x10FFFF).



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 21

def all_unicode
  @all_unicode ||= TwitterCldr::Utils::RangeSet.new(
    [0..0x10FFFF]
  )
end

.compile(str, modifiers = "", symbol_table = nil) ⇒ Object

Tokenizes str, parses the tokens with the optional symbol_table, and returns a new UnicodeRegex carrying the given modifiers.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 12

def compile(str, modifiers = "", symbol_table = nil)
  new(
    parser.parse(tokenizer.tokenize(str), {
      symbol_table: symbol_table
    }), modifiers
  )
end

.invalid_regexp_chars ⇒ Object

A few <control> characters (code points 2..7) and the surrogate code points (55296..57343, i.e. U+D800..U+DFFF). These don’t play nicely with Ruby’s regular expression engine, and I think we can safely disregard them.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 30

def invalid_regexp_chars
  @invalid_regexp_chars ||= TwitterCldr::Utils::RangeSet.new(
    [2..7, 55296..57343]
  )
end

.valid_regexp_chars ⇒ Object

All Unicode code points minus those listed in invalid_regexp_chars. The result is memoized.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 36

def valid_regexp_chars
  @valid_regexp_chars ||= all_unicode.subtract(invalid_regexp_chars)
end
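
To make the subtraction above concrete, here is a minimal pure-Ruby sketch of removing the invalid ranges from the full code point range. The helper subtract_ranges is hypothetical and is not part of TwitterCldr; it only illustrates the kind of result RangeSet#subtract is expected to produce here, and it assumes the ranges to remove are sorted, non-overlapping, and inside the full range.

```ruby
# Hypothetical helper, not part of TwitterCldr: subtracts a sorted list of
# non-overlapping integer ranges (holes) from a single integer range (whole).
def subtract_ranges(whole, holes)
  result = []
  cursor = whole.begin
  holes.each do |hole|
    # Keep the stretch between the cursor and the start of this hole, if any.
    result << (cursor..(hole.begin - 1)) if hole.begin > cursor
    # Move the cursor just past the hole.
    cursor = [cursor, hole.end + 1].max
  end
  # Keep whatever remains after the last hole.
  result << (cursor..whole.end) if cursor <= whole.end
  result
end

valid = subtract_ranges(0..0x10FFFF, [2..7, 55296..57343])
# valid == [0..1, 8..55295, 57344..1114111]
```

The three remaining ranges cover everything except the control characters 2..7 and the surrogate block.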

Instance Method Details

#to_regexp ⇒ Object

Builds a Ruby Regexp from to_regexp_str and the modifiers, memoizing the result.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 63

def to_regexp
  @regexp ||= Regexp.new(to_regexp_str, modifier_union)
end
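
The listing above calls modifier_union, whose source is not shown on this page. As an illustration only, the sketch below shows one plausible way to turn a modifier string such as "im" into the integer options that Regexp.new accepts. The method name modifier_union_sketch and the character-to-flag mapping are assumptions, not the library's implementation.

```ruby
# Hypothetical sketch: the mapping below is an assumption for illustration,
# not the library's actual modifier_union.
def modifier_union_sketch(modifiers)
  flags_by_char = {
    "i" => Regexp::IGNORECASE,
    "m" => Regexp::MULTILINE,
    "x" => Regexp::EXTENDED
  }
  # Unknown characters contribute no flag; nil modifiers yield 0 (no options).
  (modifiers || "").each_char.reduce(0) do |flags, char|
    flags | flags_by_char.fetch(char, 0)
  end
end

options = modifier_union_sketch("im")
regexp = Regexp.new("a.c", options)
matched = regexp.match?("A\nC") # "i" ignores case, "m" lets "." match a newline
```

OR-ing the flags together matches how Regexp.new expects its second argument when it is an Integer.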

#to_regexp_str ⇒ Object

Concatenates the regexp source string of each element, memoizing the result.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 67

def to_regexp_str
  @regexp_str ||= elements.map(&:to_regexp_str).join
end

#to_s ⇒ Object

Concatenates the string form of each element.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 71

def to_s
  @elements.inject('') do |ret, element|
    ret + element.to_s
  end
end
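
The difference between to_s and to_regexp_str can be sketched with stand-in elements. The literal struct below is hypothetical; real elements are produced by the parser and are not shown on this page. The sketch assumes only that each element responds to to_s and to_regexp_str, as the listings above require.

```ruby
# Hypothetical stand-in for a parsed element: to_s returns the original text,
# to_regexp_str returns a form that is safe to embed in a Ruby Regexp.
literal = Struct.new(:text) do
  def to_s
    text
  end

  def to_regexp_str
    Regexp.escape(text)
  end
end

elements = [literal.new("1+1"), literal.new("=2")]

# Same accumulation as #to_s above, plus an equivalent map/join form.
via_inject = elements.inject('') { |ret, element| ret + element.to_s }
via_join = elements.map(&:to_s).join

# Same composition as #to_regexp_str above.
regexp_str = elements.map(&:to_regexp_str).join
regexp = Regexp.new(regexp_str)
```

Here via_inject and via_join are both "1+1=2", while regexp_str escapes the "+" so that the compiled regexp matches that literal text.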