Class: Regex::Character

Inherits: AtomicExpression

Inheritance chain: Regex::Character < AtomicExpression < Expression < Object

Defined in: lib/regex/character.rb

Overview

A regular expression that matches a specific character in a given character set.

Constant Summary

DigramSequences = Constant holding all special two-character escape sequences

{
  '\a' => 0x7, # alarm
  '\n' => 0xA, # newline
  '\r' => 0xD, # carriage return
  '\t' => 0x9, # tab
  '\e' => 0x1B, # escape
  '\f' => 0xC, # form feed
  '\v' => 0xB, # vertical tab
  # Single octal digit literals
  '\0' => 0,
  '\1' => 1,
  '\2' => 2,
  '\3' => 3,
  '\4' => 4,
  '\5' => 5,
  '\6' => 6,
  '\7' => 7
}.freeze
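
The keys of this table are written with single quotes, so each key is a two-character String (a backslash followed by a letter or digit) rather than the control character itself. A minimal sketch of that distinction in plain Ruby; `SAMPLE_DIGRAMS` is a hypothetical two-entry subset used only for illustration:

```ruby
# Hypothetical subset of the table, for illustration only.
SAMPLE_DIGRAMS = {
  '\n' => 0xA, # newline
  '\t' => 0x9  # tab
}.freeze

single_quoted = '\n' # backslash + 'n': two characters
double_quoted = "\n" # one newline character

puts single_quoted.size                 # 2
puts double_quoted.size                 # 1
puts SAMPLE_DIGRAMS[single_quoted]      # 10
puts SAMPLE_DIGRAMS.key?(double_quoted) # false
puts SAMPLE_DIGRAMS['\n'] == "\n".ord   # true
```

The table therefore maps the textual form of an escape, as it appears in a regular expression source, to the codepoint it denotes.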

MetaChars =

'\^$.|+?*()[]{}'.freeze

MetaCharsInClass = Characters with special meaning inside a character class

'\^[]-'.freeze
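
How the library consults these two constants is not shown on this page. As a sketch only, the following copies both values under hypothetical names (`SKETCH_META_CHARS`, `SKETCH_META_CHARS_IN_CLASS`) and uses a hypothetical helper `special?` to show that the set of special characters depends on the context:

```ruby
# Copies of the two constant values shown above, under hypothetical names.
SKETCH_META_CHARS = '\^$.|+?*()[]{}'.freeze
SKETCH_META_CHARS_IN_CLASS = '\^[]-'.freeze

# Hypothetical helper: does +ch+ carry a special meaning in this context?
def special?(ch, in_class: false)
  table = in_class ? SKETCH_META_CHARS_IN_CLASS : SKETCH_META_CHARS
  table.include?(ch)
end

puts special?('+')                 # true
puts special?('+', in_class: true) # false
puts special?('-')                 # false
puts special?('-', in_class: true) # true
puts special?('\\')                # true (the backslash is special in both)
```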

Instance Attribute Summary

#codepoint ⇒ Object readonly
The integer value that uniquely identifies the character.
#lexeme ⇒ Object readonly
The initial text representation of the character (if any).

Attributes inherited from Expression

#begin_anchor, #end_anchor

Class Method Summary

.char2codepoint(aChar) ⇒ Object
Conversion method that returns the codepoint for the given single character.
.codepoint2char(aCodepoint) ⇒ Object
Conversion method that returns a character given a codepoint (integer) value.
.esc2codepoint(esc_seq) ⇒ Object
Conversion method that returns the codepoint for the given escape sequence (a String).

Instance Method Summary

#==(other) ⇒ Object
Returns true iff this Character and the parameter 'other' represent the same character.
#char ⇒ Object
Return the character as a String object.
#explain ⇒ Object
Return a plain English description of the character.
#initialize(aValue) ⇒ Character constructor
Constructor.

Methods inherited from AtomicExpression

#atomic?, #done!, #lazy!

Methods inherited from Expression

#atomic?, #options, #to_str

Constructor Details

#initialize(aValue) ⇒ `Character`

Constructor.
[aValue] Initializes the character with either a String literal or a codepoint value.

Examples:

Initializing with a codepoint value:
  Regex::Character.new(0x3a3) # Represents: Σ (Unicode GREEK CAPITAL LETTER SIGMA)
  Regex::Character.new(931)   # Also represents: Σ (931 dec == 3a3 hex)

Initializing with a single-character string:
  Regex::Character.new(?\u03a3) # Also represents: Σ
  Regex::Character.new('Σ')     # Obviously, represents a Σ

Initializing with an escape sequence string:
  Regex::Character.new('\n')     # Represents a newline
  Regex::Character.new('\u03a3') # Represents a Σ

Recognized escaped characters are:
  \a (alarm, 0x07), \n (newline, 0xA), \r (carriage return, 0xD), \t (tab, 0x9), \e (escape, 0x1B), \f (form feed, 0xC), \v (vertical tab, 0xB),
  \uXXXX where XXXX is a 4 hex digits integer value, \uX..., \ooo (octal), \xXX (hex).
Any other escaped character will be treated as a literal character.

# File 'lib/regex/character.rb', line 59

def initialize(aValue)
  case aValue
    when String
      if aValue.size == 1
        # Literal single character case...
        @codepoint = self.class.char2codepoint(aValue)
      else
        # Should be an escape sequence...
        @codepoint = self.class.esc2codepoint(aValue)
      end
      @lexeme = aValue

    when Integer
      @codepoint = aValue
    else
      raise StandardError, "Cannot initialize a Character with a '#{aValue}'."
  end
end
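
The branch taken by the constructor depends only on the class and size of the argument. The sketch below uses a hypothetical helper, `branch_for`, that mirrors that selection without building a Character, to show why `?\u03a3` and `'\u03a3'` follow different paths:

```ruby
# Hypothetical helper mirroring the branch selection in #initialize.
def branch_for(value)
  case value
  when String
    value.size == 1 ? :literal_char : :escape_sequence
  when Integer
    :codepoint
  else
    :error
  end
end

p branch_for(0x3a3)    # :codepoint
p branch_for('Σ')      # :literal_char
p branch_for(?\u03a3)  # :literal_char (Ruby has already decoded the escape)
p branch_for('\u03a3') # :escape_sequence (six characters: \ u 0 3 a 3)
p branch_for('\n')     # :escape_sequence
p branch_for(3.14)     # :error
```

Note that @lexeme is only assigned in the String branch, so a Character built from an Integer has no lexeme.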

Instance Attribute Details

#codepoint ⇒ `Object` (readonly)

The integer value that uniquely identifies the character.



# File 'lib/regex/character.rb', line 32

def codepoint
  @codepoint
end

#lexeme ⇒ `Object` (readonly)

The initial text representation of the character (if any).



# File 'lib/regex/character.rb', line 35

def lexeme
  @lexeme
end

Class Method Details

.char2codepoint(aChar) ⇒ `Object`

Conversion method that returns the codepoint for the given single character.

Example:
  Regex::Character::char2codepoint('Σ') # Returns: 0x3a3



# File 'lib/regex/character.rb', line 89

def self.char2codepoint(aChar)
  aChar.ord
end
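
Since the method delegates to String#ord, it inherits that method's behavior for strings that are not exactly one character long. A short sketch in plain Ruby, without the library:

```ruby
sigma = 'Σ'
puts sigma.ord          # 931
puts sigma.ord.to_s(16) # 3a3

# String#ord only looks at the first character of a longer string.
puts 'Σx'.ord == sigma.ord # true

# An empty string has no first character, so String#ord raises.
begin
  ''.ord
rescue ArgumentError => e
  puts "empty string: #{e.class}"
end
```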

.codepoint2char(aCodepoint) ⇒ `Object`

Conversion method that returns a character given a codepoint (integer) value.

Example:
  Regex::Character::codepoint2char(0x3a3) # Returns: Σ (the Unicode GREEK CAPITAL LETTER SIGMA)



# File 'lib/regex/character.rb', line 82

def self.codepoint2char(aCodepoint)
  [aCodepoint].pack('U') # Remark: chr() without an encoding argument fails for codepoints > 255
end
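
The remark in the source refers to Integer#chr called without an argument. The sketch below, in plain Ruby, compares `pack('U')` with the equivalent `chr(Encoding::UTF_8)` call; the outcome of the bare `chr` call depends on `Encoding.default_internal`, so both outcomes are handled:

```ruby
sigma = [0x3a3].pack('U')
puts sigma          # Σ
puts sigma.encoding # UTF-8

# Passing an explicit encoding to Integer#chr gives the same String.
puts 0x3a3.chr(Encoding::UTF_8) == sigma # true

# Without an encoding, chr normally rejects values above 255
# (unless Encoding.default_internal has been set to a wider encoding).
begin
  bare = 0x3a3.chr
  puts "chr succeeded with encoding #{bare.encoding}"
rescue RangeError => e
  puts "chr without encoding: #{e.class}"
end
```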

.esc2codepoint(esc_seq) ⇒ `Object`

Conversion method that returns the codepoint for the given escape sequence (a String).

Recognized escaped characters are:
  \a (alarm, 0x07), \n (newline, 0xA), \r (carriage return, 0xD), \t (tab, 0x9), \e (escape, 0x1B), \f (form feed, 0xC), \v (vertical tab, 0xB),
  \uXXXX where XXXX is a 4 hex digits integer value, \uX..., \ooo (octal), \xXX (hex).
Any other escaped character will be treated as a literal character.

Example:
  Regex::Character::esc2codepoint('\n') # Returns: 0xa

Raises:

(StandardError)

# File 'lib/regex/character.rb', line 103

def self.esc2codepoint(esc_seq)
  msg = "Escape sequence #{esc_seq} does not begin with a backslash (\\)."
  raise StandardError, msg unless esc_seq[0] == '\\'
  result = (esc_seq.length == 2) ? digram2codepoint(esc_seq) : esc_number2codepoint(esc_seq)

  return result
end
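
The helpers digram2codepoint and esc_number2codepoint are not shown on this page. The following is a sketch of one way the documented rules could be implemented, not the library's code; `SKETCH_DIGRAMS` and `sketch_esc2codepoint` are hypothetical names, and only the four-digit \uXXXX form is handled:

```ruby
# Hypothetical copy of the letter entries of DigramSequences.
SKETCH_DIGRAMS = {
  '\a' => 0x7, '\n' => 0xA, '\r' => 0xD, '\t' => 0x9,
  '\e' => 0x1B, '\f' => 0xC, '\v' => 0xB
}.freeze

# Hypothetical stand-in for esc2codepoint, following the documented rules.
def sketch_esc2codepoint(esc_seq)
  unless esc_seq.start_with?('\\')
    raise StandardError, "Escape sequence #{esc_seq} does not begin with a backslash."
  end

  case esc_seq
  when /\A\\u(\h{4})\z/     then Regexp.last_match(1).hex # \uXXXX
  when /\A\\x(\h{1,2})\z/   then Regexp.last_match(1).hex # \xXX
  when /\A\\([0-7]{1,3})\z/ then Regexp.last_match(1).oct # \ooo
  else
    raise StandardError, "Unsupported escape sequence #{esc_seq}" unless esc_seq.length == 2

    # Known digram, otherwise the escaped character is taken literally.
    SKETCH_DIGRAMS.fetch(esc_seq) { esc_seq[1].ord }
  end
end

puts sketch_esc2codepoint('\n')     # 10
puts sketch_esc2codepoint('\u03a3') # 931
puts sketch_esc2codepoint('\x41')   # 65
puts sketch_esc2codepoint('\101')   # 65
puts sketch_esc2codepoint('\.')     # 46
```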

Instance Method Details

#==(other) ⇒ `Object`

Returns true iff this Character and the parameter 'other' represent the same character.
[other] Any Object. How equality is tested depends on the class of 'other'.

Example:
  newOne = Character.new(?\u03a3)
  newOne == newOne                 # true. Identity
  newOne == Character.new(?\u03a3) # true. Both have the same codepoint
  newOne == ?\u03a3                # true. The single-character String matches the char attribute exactly
  newOne == 0x03a3                 # true. The Integer is compared to the codepoint value

Any other Object is converted with to_s before being compared.

# File 'lib/regex/character.rb', line 125

def ==(other)
  result = case other
    when Character
      to_str == other.to_str

    when Integer
      codepoint == other

    when String
      other.size > 1 ? false : to_str == other

    else
      # Unknown type: try with a conversion
      self == other.to_s # Recursive call
  end

  return result
end
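
The dispatch above can be tried in isolation with a small stand-in class. This is a sketch: `SketchChar` is a hypothetical class, and it assumes that to_str (inherited from Expression in the real class, not shown here) returns the character itself:

```ruby
# Hypothetical stand-in for Regex::Character, reduced to what #== needs.
class SketchChar
  attr_reader :codepoint

  def initialize(codepoint)
    @codepoint = codepoint
  end

  # Assumption: to_str yields the character as a one-character String.
  def to_str
    [codepoint].pack('U')
  end

  def ==(other)
    case other
    when SketchChar then to_str == other.to_str
    when Integer    then codepoint == other
    when String     then other.size > 1 ? false : to_str == other
    else self == other.to_s # Unknown type: convert, then compare again
    end
  end
end

sigma = SketchChar.new(0x3a3)
p sigma == SketchChar.new(931) # true
p sigma == 'Σ'                 # true
p sigma == 0x3a3               # true
p sigma == 'ΣΣ'                # false (longer than one character)
p sigma == :'Σ'                # true (Symbol converted with to_s)
p sigma == nil                 # false (nil.to_s is the empty String)
```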

#char ⇒ `Object`

Return the character as a String object.



# File 'lib/regex/character.rb', line 112

def char
  self.class.codepoint2char(@codepoint)
end

#explain ⇒ `Object`

Return a plain English description of the character.



# File 'lib/regex/character.rb', line 145

def explain
  "the character '#{to_str}'"
end