Class: Regex::Character
- Inherits: AtomicExpression
- Inheritance chain (root first): Object, Expression, AtomicExpression, Regex::Character
- Defined in: lib/regex/character.rb
Overview
A regular expression that matches a specific character in a given character set.
Constant Summary
- DigramSequences =
Constant with all special two-character escape sequences.
{
  '\a' => 0x7,  # alarm
  '\n' => 0xA,  # newline
  '\r' => 0xD,  # carriage return
  '\t' => 0x9,  # tab
  '\e' => 0x1B, # escape
  '\f' => 0xC,  # form feed
  '\v' => 0xB,  # vertical tab
  # Single octal digit literals
  '\0' => 0,
  '\1' => 1,
  '\2' => 2,
  '\3' => 3,
  '\4' => 4,
  '\5' => 5,
  '\6' => 6,
  '\7' => 7
}.freeze
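As a small illustration of how such a table can be consulted, the sketch below uses a reduced, hypothetical copy of the hash (`DIGRAM_SAMPLE` is not a name from the library). It relies on the fact that a single-quoted `'\n'` in Ruby is two characters, a backslash followed by `n`, which is why these keys are called two-character sequences.

```ruby
# Hypothetical, reduced copy of DigramSequences, for illustration only.
DIGRAM_SAMPLE = {
  '\a' => 0x7, # alarm
  '\n' => 0xA, # newline
  '\t' => 0x9, # tab
  '\0' => 0    # single octal digit literal
}.freeze

# In single quotes, '\n' is a backslash plus the letter n (two characters).
seq = '\n'
puts seq.length               # prints 2
puts DIGRAM_SAMPLE.fetch(seq) # prints 10
```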
- MetaChars =
'\^$.|+?*()[]{}'.freeze
- MetaCharsInClass =
Characters with special meaning inside a character class.
'\^[]-'.freeze
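This page does not show how the two metacharacter constants are used. One plausible use, sketched below under that assumption, is deciding whether a literal character needs a backslash in a given context. The constant names and the `escape_if_needed` helper are hypothetical and are not part of the library.

```ruby
# Hypothetical copies of the two constants, for illustration only.
META_CHARS = '\^$.|+?*()[]{}'.freeze
META_CHARS_IN_CLASS = '\^[]-'.freeze

# Hypothetical helper: prefix a backslash when the character is special
# in the given context (inside or outside a character class).
def escape_if_needed(char, in_class: false)
  specials = in_class ? META_CHARS_IN_CLASS : META_CHARS
  specials.include?(char) ? "\\#{char}" : char
end

puts escape_if_needed('.')                 # prints \.
puts escape_if_needed('.', in_class: true) # prints .
puts escape_if_needed('-', in_class: true) # prints \-
```

The same character can therefore be special in one context and ordinary in the other, which is why two separate lists are kept.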
Instance Attribute Summary
-
#codepoint ⇒ Object
readonly
The integer value that uniquely identifies the character.
-
#lexeme ⇒ Object
readonly
The initial text representation of the character (if any).
Attributes inherited from Expression
Class Method Summary
-
.char2codepoint(aChar) ⇒ Object
Conversion method that returns the codepoint for the given single character.
-
.codepoint2char(aCodepoint) ⇒ Object
Conversion method that returns a character given a codepoint (integer) value.
-
.esc2codepoint(esc_seq) ⇒ Object
Conversion method that returns the codepoint for the given escape sequence (a String).
Instance Method Summary
-
#==(other) ⇒ Object
Returns true iff this Character and the parameter 'other' represent the same character.
-
#char ⇒ Object
Return the character as a String object.
-
#explain ⇒ Object
Return a plain English description of the character.
-
#initialize(aValue) ⇒ Character
constructor
Constructor.
Methods inherited from AtomicExpression
Methods inherited from Expression
Constructor Details
#initialize(aValue) ⇒ Character
Constructor.
[aValue] Initialize the character with either a String literal or a codepoint value.
Examples:
Initializing with a codepoint value:
  Regex::Character.new(0x3a3) # Represents: Σ (Unicode GREEK CAPITAL LETTER SIGMA)
  Regex::Character.new(931)   # Also represents: Σ (931 dec == 3a3 hex)
Initializing with a single-character string:
  Regex::Character.new(?\u03a3) # Also represents: Σ
  Regex::Character.new('Σ')     # Obviously, represents a Σ
Initializing with an escape sequence string. Recognized escaped characters are:
  \a (alarm, 0x07), \n (newline, 0xA), \r (carriage return, 0xD), \t (tab, 0x9), \e (escape, 0x1B), \f (form feed, 0xC),
  \uXXXX where XXXX is a 4-hex-digit integer value, \uX..., \ooo (octal), \xXX (hex).
Any other escaped character is treated as a literal character.
  Regex::Character.new('\n')     # Represents a newline
  Regex::Character.new('\u03a3') # Represents a Σ
# File 'lib/regex/character.rb', line 59
def initialize(aValue)
  case aValue
  when String
    if aValue.size == 1
      # Literal single character case...
      @codepoint = self.class.char2codepoint(aValue)
    else
      # Should be an escape sequence...
      @codepoint = self.class.esc2codepoint(aValue)
    end
    @lexeme = aValue
  when Integer
    @codepoint = aValue
  else
    raise StandardError, "Cannot initialize a Character with a '#{aValue}'."
  end
end
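The dispatch on the argument type can be tried in isolation with a simplified stand-in. The class `SketchCharacter` and its `ESCAPES` table below are hypothetical: they mimic the constructor's String/Integer branches but support only two escape sequences, so this is a sketch of the idea and not the library's implementation.

```ruby
# Hypothetical, simplified stand-in for Regex::Character.
class SketchCharacter
  attr_reader :codepoint, :lexeme

  # Assumed subset of two-character escapes, for illustration only.
  ESCAPES = { '\n' => 0xA, '\t' => 0x9 }.freeze

  def initialize(value)
    case value
    when String
      # One character: take its codepoint; otherwise look up the escape.
      @codepoint = value.size == 1 ? value.ord : ESCAPES.fetch(value)
      @lexeme = value
    when Integer
      # No textual form was given, so lexeme stays nil.
      @codepoint = value
    else
      raise StandardError, "Cannot initialize a Character with a '#{value}'."
    end
  end
end

from_int = SketchCharacter.new(0x3a3)
from_str = SketchCharacter.new('Σ')
newline  = SketchCharacter.new('\n')

puts from_int.codepoint == from_str.codepoint # prints true
puts from_int.lexeme.inspect                  # prints nil
puts newline.codepoint                        # prints 10
```

Note that only the String branch records a lexeme, which matches the "(if any)" wording in the attribute description.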
Instance Attribute Details
#codepoint ⇒ Object (readonly)
The integer value that uniquely identifies the character.
# File 'lib/regex/character.rb', line 32
def codepoint
  @codepoint
end
#lexeme ⇒ Object (readonly)
The initial text representation of the character (if any).
# File 'lib/regex/character.rb', line 35
def lexeme
  @lexeme
end
Class Method Details
.char2codepoint(aChar) ⇒ Object
Conversion method that returns the codepoint for the given single character.
Example:
  Regex::Character::char2codepoint('Σ') # Returns: 0x3a3
# File 'lib/regex/character.rb', line 89
def self.char2codepoint(aChar)
  aChar.ord
end
.codepoint2char(aCodepoint) ⇒ Object
Conversion method that returns a character given a codepoint (integer) value.
Example:
  Regex::Character::codepoint2char(0x3a3) # Returns: Σ (the Unicode GREEK CAPITAL LETTER SIGMA)
# File 'lib/regex/character.rb', line 82
def self.codepoint2char(aCodepoint)
  [aCodepoint].pack('U') # Remark: chr() without an encoding argument fails for codepoints > 255
end
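The two conversions are inverses of each other and use only core Ruby calls (`String#ord` and `Array#pack` with the `'U'` directive), so the round trip can be checked without the library. The last line shows `Integer#chr` with an explicit encoding, which is an alternative to `pack('U')` and not what the listing above uses.

```ruby
sigma = 'Σ'

# Character to codepoint, as in char2codepoint.
codepoint = sigma.ord
puts codepoint # prints 931

# Codepoint back to a UTF-8 character, as in codepoint2char.
char = [codepoint].pack('U')
puts char == sigma # prints true

# Alternative: chr with an explicit encoding handles large codepoints too.
puts codepoint.chr(Encoding::UTF_8) == sigma # prints true
```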
.esc2codepoint(esc_seq) ⇒ Object
Conversion method that returns the codepoint for the given escape sequence (a String).
Recognized escaped characters are:
  \a (alarm, 0x07), \n (newline, 0xA), \r (carriage return, 0xD), \t (tab, 0x9), \e (escape, 0x1B), \f (form feed, 0xC), \v (vertical tab, 0xB),
  \uXXXX where XXXX is a 4-hex-digit integer value, \uX..., \ooo (octal), \xXX (hex).
Any other escaped character is treated as a literal character.
Example:
  Regex::Character::esc2codepoint('\n') # Returns: 0xa
# File 'lib/regex/character.rb', line 103
def self.esc2codepoint(esc_seq)
  # The backslash is doubled so that it actually appears in the message.
  msg = "Escape sequence #{esc_seq} does not begin with a backslash (\\)."
  raise StandardError, msg unless esc_seq[0] == '\\'

  # Two-character sequences go to the digram lookup, longer ones to the numeric parser.
  result = (esc_seq.length == 2) ? digram2codepoint(esc_seq) : esc_number2codepoint(esc_seq)

  return result
end
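The helpers `digram2codepoint` and `esc_number2codepoint` are not listed on this page, so the sketch below only guesses at their behavior from the description above: two-character sequences are looked up in a table (falling back to the literal character), and longer sequences are parsed as `\u`, `\x`, or octal numbers. All names here (`ESCAPE_DIGRAMS`, `esc_to_codepoint`) and the error raised for unsupported input are assumptions.

```ruby
# Hypothetical table: named escapes plus single octal digits '\0'..'\7'.
ESCAPE_DIGRAMS = {
  '\a' => 0x7, '\n' => 0xA, '\r' => 0xD, '\t' => 0x9,
  '\e' => 0x1B, '\f' => 0xC, '\v' => 0xB
}.merge(Hash[(0..7).map { |d| ["\\#{d}", d] }]).freeze

# Hypothetical stand-in for esc2codepoint and its two helpers.
def esc_to_codepoint(esc_seq)
  unless esc_seq[0] == '\\'
    raise StandardError, "Escape sequence #{esc_seq} does not begin with a backslash."
  end

  if esc_seq.length == 2
    # Known digram, otherwise the escaped character is taken literally.
    ESCAPE_DIGRAMS.fetch(esc_seq) { esc_seq[1].ord }
  else
    case esc_seq
    when /\A\\u\h{1,6}\z/   then esc_seq[2..-1].hex # \uXXXX
    when /\A\\x\h{1,2}\z/   then esc_seq[2..-1].hex # \xXX
    when /\A\\[0-7]{1,3}\z/ then esc_seq[1..-1].oct # \ooo
    else raise StandardError, "Unsupported escape sequence #{esc_seq}."
    end
  end
end

puts esc_to_codepoint('\n')     # prints 10
puts esc_to_codepoint('\u03a3') # prints 931
puts esc_to_codepoint('\x41')   # prints 65
puts esc_to_codepoint('\101')   # prints 65
puts esc_to_codepoint('\.')     # prints 46
```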
Instance Method Details
#==(other) ⇒ Object
Returns true iff this Character and the parameter 'other' represent the same character.
[other] Any Object. The way equality is tested depends on other's class.
Example:
  newOne = Character.new(?\u03a3)
  newOne == newOne                 # true. Identity
  newOne == Character.new(?\u03a3) # true. Both have the same codepoint
  newOne == ?\u03a3                # true. The single-character String matches the char attribute exactly.
  newOne == 0x03a3                 # true. The Integer is compared to the codepoint value.
Equality is also tested against any other Object that responds to to_s.
# File 'lib/regex/character.rb', line 125
def ==(other)
  result = case other
           when Character
             to_str == other.to_str
           when Integer
             codepoint == other
           when String
             other.size > 1 ? false : to_str == other
           else
             # Unknown type: try with a conversion
             self == other.to_s # Recursive call
           end

  return result
end
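The type-dependent comparison can be exercised with a small stand-in class. `SketchChar` below is hypothetical; it compares via a `char` method because `to_str`, which the listing above calls, is not defined on this page, and it stores only a codepoint.

```ruby
# Hypothetical stand-in for Regex::Character that keeps only a codepoint.
class SketchChar
  attr_reader :codepoint

  def initialize(codepoint)
    @codepoint = codepoint
  end

  # The character as a UTF-8 String.
  def char
    [codepoint].pack('U')
  end

  def ==(other)
    case other
    when SketchChar then char == other.char
    when Integer    then codepoint == other
    when String     then other.size == 1 && char == other
    else self == other.to_s # Unknown type: retry with its string form
    end
  end
end

sigma = SketchChar.new(0x3a3)
puts sigma == SketchChar.new(0x3a3) # prints true
puts sigma == 'Σ'                   # prints true
puts sigma == 0x3a3                 # prints true
puts sigma == 'ΣΣ'                  # prints false
puts sigma == :"Σ"                  # prints true
```

The last comparison goes through the fallback branch: the Symbol is converted with to_s and compared again as a String.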
#char ⇒ Object
Return the character as a String object.
# File 'lib/regex/character.rb', line 112
def char
  self.class.codepoint2char(@codepoint)
end
#explain ⇒ Object
Return a plain English description of the character.
# File 'lib/regex/character.rb', line 145
def explain
  "the character '#{to_str}'"
end