Class: Twitter::Regex

Inherits:
Object
Defined in:
lib/regex.rb

Overview

A collection of regular expressions for parsing Tweet text. The regular expression list is frozen at load time to ensure immutability. These regular expressions are used throughout the Twitter classes. Special care has been taken to make sure these regular expressions work with Tweets in all languages.

Constant Summary

REGEXEN =

:nodoc:

{}
UNICODE_SPACES =

Space is more than %20; U+3000, for example, is the full-width space used with Kanji. Provide shorthand access to both the list of characters and a pattern suitable for use with String#split.

Taken from: ActiveSupport::Multibyte::Handlers::UTF8Handler::UNICODE_WHITESPACE
[
  (0x0009..0x000D).to_a,  # White_Space # Cc   [5] <control-0009>..<control-000D>
  0x0020,          # White_Space # Zs       SPACE
  0x0085,          # White_Space # Cc       <control-0085>
  0x00A0,          # White_Space # Zs       NO-BREAK SPACE
  0x1680,          # White_Space # Zs       OGHAM SPACE MARK
  0x180E,          # White_Space # Zs       MONGOLIAN VOWEL SEPARATOR
  (0x2000..0x200A).to_a, # White_Space # Zs  [11] EN QUAD..HAIR SPACE
  0x2028,          # White_Space # Zl       LINE SEPARATOR
  0x2029,          # White_Space # Zp       PARAGRAPH SEPARATOR
  0x202F,          # White_Space # Zs       NARROW NO-BREAK SPACE
  0x205F,          # White_Space # Zs       MEDIUM MATHEMATICAL SPACE
  0x3000,          # White_Space # Zs       IDEOGRAPHIC SPACE
].flatten.freeze
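
As a rough illustration of how a codepoint list like this can become a pattern for String#split, here is a minimal sketch. It uses a shortened list, and the local names unicode_spaces, space_chars and space_pattern are assumptions made for the example, not names taken from lib/regex.rb.

```ruby
# Sketch only: a shortened codepoint list, not the full constant above.
unicode_spaces = [
  (0x0009..0x000D).to_a, # control whitespace (tab, newline, ...)
  0x0020,                # SPACE
  0x00A0,                # NO-BREAK SPACE
  0x3000,                # IDEOGRAPHIC SPACE
].flatten.freeze

# Pack the codepoints into a UTF-8 string, then wrap it in a character class.
space_chars   = unicode_spaces.pack('U*')
space_pattern = /[#{Regexp.escape(space_chars)}]/

# Split on an ideographic space and on a plain ASCII space alike.
words = "hello\u3000world foo".split(space_pattern)
# words == ["hello", "world", "foo"]
```

Regexp.escape is used here so that the control characters are written into the character class as escapes rather than as raw bytes.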
LATIN_ACCENTS =

Latin accented characters. 0xD7 is excluded from the range because it is the multiplication sign, which is easily confused with “x”. The ranges below also skip 0xF7, the division sign.

[(0xc0..0xd6).to_a, (0xd8..0xf6).to_a, (0xf8..0xff).to_a].flatten.pack('U*').freeze
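
To make the effect of the three ranges concrete, the following sketch rebuilds the same string under a local name (latin_accents, chosen for the example) and checks which characters end up in it.

```ruby
# Same expression as the constant above, bound to a local for illustration.
latin_accents = [(0xc0..0xd6).to_a, (0xd8..0xf6).to_a, (0xf8..0xff).to_a].flatten.pack('U*').freeze

count        = latin_accents.length             # 23 + 31 + 8 = 62 characters
has_e_acute  = latin_accents.include?("\u00E9") # true:  é is inside 0xd8..0xf6
has_multiply = latin_accents.include?("\u00D7") # false: × is skipped
has_divide   = latin_accents.include?("\u00F7") # false: ÷ is skipped
```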
HASHTAG_CHARACTERS =

Characters considered valid in a hashtag but not at the beginning, where only a-z and 0-9 are valid.

/[a-z0-9_#{LATIN_ACCENTS}]/io
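
A short sketch of how a character class like this behaves on sample input. The helper all_hashtag_chars? and the local variable names are hypothetical, introduced only for this example.

```ruby
latin_accents = [(0xc0..0xd6).to_a, (0xd8..0xf6).to_a, (0xf8..0xff).to_a].flatten.pack('U*').freeze
hashtag_characters = /[a-z0-9_#{latin_accents}]/io

# Hypothetical helper: true when every character of text matches the class.
def all_hashtag_chars?(text, pattern)
  text.each_char.all? { |ch| ch =~ pattern }
end

accented   = all_hashtag_chars?("caf\u00E9_2010", hashtag_characters) # true
with_space = all_hashtag_chars?("caf\u00E9 2010", hashtag_characters) # false, contains a space
with_times = all_hashtag_chars?("2\u00D72", hashtag_characters)       # false, × is not in the class
```

The class by itself says nothing about position; the restriction on the first character described above would have to be enforced by whatever pattern uses this class.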

Class Method Summary

Class Method Details

.[](key) ⇒ Object

Return the regular expression for a given key. If the key is not a known symbol, nil is returned.



# File 'lib/regex.rb', line 71

def self.[](key)
  REGEXEN[key]
end
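
The lookup behaviour can be tried in isolation with a stand-in class. Everything in this sketch is an assumption made for illustration: the Sketch module, the :example_digits key and its pattern are hypothetical and do not come from lib/regex.rb.

```ruby
module Sketch
  class Regex
    REGEXEN = {}
    REGEXEN[:example_digits] = /\d+/ # hypothetical entry, for illustration only
    REGEXEN.freeze                   # frozen once populated, as the overview describes

    # Same shape as the method above: a plain Hash lookup.
    def self.[](key)
      REGEXEN[key]
    end
  end
end

known   = Sketch::Regex[:example_digits] # the stored Regexp
unknown = Sketch::Regex[:no_such_key]    # nil, because Hash#[] returns nil for a missing key
matched = "tweet 42"[known]              # "42"
```

Because the hash is frozen, a later attempt to add or replace an entry raises an error instead of silently changing the shared patterns.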