Class: ToknInternal::RegParse

Inherits:
Object
Defined in:
lib/tokn/reg_parse.rb

Overview

Parses a single regular expression from a string. Produces an NFA with distinguished start and end states (none of these states are marked as final states).

Here is the grammar for regular expressions. Spaces are ignored, and can be sprinkled liberally within a regular expression to aid readability. To represent a space, the \s escape sequence must be used. See the file ‘sampletokens.txt’ for some examples.

Expressions have one of these types:

E : the base (top-level) expression: one or more Join expressions separated by '|'
J : a Join expression, formed by concatenating one or more Quantified expressions
Q : a Quantified expression: a Parenthesized expression optionally followed by '*', '+', or '?'
P : a Parenthesized expression, optionally surrounded by (), {}, or []

E -> J '|' E  
   | J

J -> Q J
   | Q

Q -> P '*'
   | P '+'
   | P '?'
   | P

P -> '(' E ')'
   | '{' TOKENNAME '}'
   | '[^' SETSEQ ']'     A code not appearing in the set
   | '[' SETSEQ ']'        
   | CHARCODE

SETSEQ -> SET SETSEQ
        | SET 

SET -> CHARCODE 
        | CHARCODE '-' CHARCODE

CHARCODE ->   
         a |  b |  c  ...   any printable except {,},[, etc.
     |  \xhh                  hex value from 00...ff
     |  \uhhhh                hex value from 0000...ffff (e.g., unicode)
     |  \f | \n | \r | \t     formfeed, linefeed, return, tab
     |  \s                    a space (' ')
     |  \*                    where * is some other non-alphabetic
                               character that needs to be escaped
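
The CHARCODE rule can be illustrated with a small standalone decoder. This is only a sketch under stated assumptions: `read_charcode` and `SIMPLE_ESCAPES` are hypothetical names that do not come from Tokn, and rejecting unknown alphabetic escapes is an assumption drawn from the "non-alphabetic" wording above rather than confirmed library behavior. Space skipping is left to the caller.

```ruby
# Hypothetical helper, not part of Tokn: decodes the CHARCODE that starts at
# index pos of script and returns [code, next_pos].
SIMPLE_ESCAPES = { 'f' => 0x0c, 'n' => 0x0a, 'r' => 0x0d, 't' => 0x09, 's' => 0x20 }.freeze

def read_charcode(script, pos)
  ch = script[pos]
  raise ArgumentError, 'unexpected end of script' if ch.nil?
  # An unescaped character stands for its own code.
  return [ch.ord, pos + 1] unless ch == '\\'

  esc = script[pos + 1]
  raise ArgumentError, 'dangling backslash' if esc.nil?

  case esc
  when 'x', 'u'
    # \xhh takes two hex digits, \uhhhh takes four.
    width = esc == 'x' ? 2 : 4
    digits = script[pos + 2, width].to_s
    unless digits.match?(/\A\h{#{width}}\z/)
      raise ArgumentError, "expected #{width} hex digits after \\#{esc}"
    end
    [digits.hex, pos + 2 + width]
  when *SIMPLE_ESCAPES.keys
    [SIMPLE_ESCAPES[esc], pos + 2]
  when /[A-Za-z]/
    # Assumption: any other alphabetic escape is treated as an error.
    raise ArgumentError, "unknown escape \\#{esc}"
  else
    # \* form: an escaped non-alphabetic character stands for itself.
    [esc.ord, pos + 2]
  end
end
```

For example, `read_charcode('\\x41', 0)` yields the code 0x41 together with the index just past the escape.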

The parser performs recursive descent parsing; each method returns an NFA represented by a pair of states: the start and end states.
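
The idea of each parse method returning a (start, end) pair can be sketched for a subset of the grammar. The code below is an illustration, not Tokn's implementation: `State`, `MiniParser`, and their method names are hypothetical, and token references, character sets, and escapes are omitted. It uses the textbook Thompson construction, which is assumed here and may differ from what RegParse does internally.

```ruby
# Illustrative sketch only: State and MiniParser are hypothetical names,
# not part of Tokn.
class State
  attr_reader :id, :edges # edges hold [label, target]; a nil label is an epsilon move

  def initialize(id)
    @id = id
    @edges = []
  end
end

class MiniParser
  attr_reader :start_state, :end_state

  def initialize(script)
    @chars = script.delete(' ').chars # the grammar ignores spaces
    @pos = 0
    @next_id = 0
    @start_state, @end_state = parse_e
    raise ArgumentError, "unexpected '#{peek}'" unless peek.nil?
  end

  private

  def new_state
    @next_id += 1
    State.new(@next_id - 1)
  end

  def peek
    @chars[@pos]
  end

  def read
    @pos += 1
    @chars[@pos - 1]
  end

  # E -> J '|' E | J
  def parse_e
    s1, e1 = parse_j
    return [s1, e1] unless peek == '|'

    read
    s2, e2 = parse_e
    start, finish = new_state, new_state
    start.edges << [nil, s1] << [nil, s2]
    e1.edges << [nil, finish]
    e2.edges << [nil, finish]
    [start, finish]
  end

  # J -> Q J | Q
  def parse_j
    s1, e1 = parse_q
    return [s1, e1] if peek.nil? || '|)'.include?(peek)

    s2, e2 = parse_j
    e1.edges << [nil, s2] # concatenation: link the first end to the second start
    [s1, e2]
  end

  # Q -> P '*' | P '+' | P '?' | P
  def parse_q
    s, e = parse_p
    return [s, e] unless %w[* + ?].include?(peek)

    op = read
    start, finish = new_state, new_state # fresh states keep the loop and skip edges isolated
    start.edges << [nil, s]
    e.edges << [nil, finish]
    e.edges << [nil, s] unless op == '?'          # '*' and '+' may repeat
    start.edges << [nil, finish] unless op == '+' # '*' and '?' may match nothing
    [start, finish]
  end

  # P -> '(' E ')' | CHARCODE (plain characters only in this sketch)
  def parse_p
    ch = read
    raise ArgumentError, 'unexpected end of expression' if ch.nil?

    if ch == '('
      pair = parse_e
      raise ArgumentError, "expected ')'" unless read == ')'
      return pair
    end
    raise ArgumentError, "unexpected '#{ch}'" if '|)*+?'.include?(ch)

    s, e = new_state, new_state
    s.edges << [ch, e]
    [s, e]
  end
end

parser = MiniParser.new('(a | b)* c')
puts "start=#{parser.start_state.id} end=#{parser.end_state.id}"
```

Wrapping each quantified fragment in fresh start and end states keeps the optional and repeat edges from interacting with loops inside the fragment, at the cost of a few extra epsilon moves.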

Constructor Details

#initialize(script, tokenDefMap = nil) ⇒ RegParse

Constructs a parser and immediately parses the script.

Parameters:

  • script

    script to parse

  • tokenDefMap (defaults to: nil)

    if not nil, a map of previously parsed regular expressions (mapping names to ids) to be consulted if a curly brace expression appears in the script



# File 'lib/tokn/reg_parse.rb', line 73

def initialize(script, tokenDefMap = nil) 
  @script = script.strip
  @nextStateId = 0
  @tokenDefMap = tokenDefMap
  parseScript
end

Instance Attribute Details

#endState ⇒ Object (readonly)

Returns the value of attribute endState.



# File 'lib/tokn/reg_parse.rb', line 65

def endState
  @endState
end

#startState ⇒ Object (readonly)

Returns the value of attribute startState.



# File 'lib/tokn/reg_parse.rb', line 65

def startState
  @startState
end

Instance Method Details

#inspect ⇒ Object



# File 'lib/tokn/reg_parse.rb', line 81

def inspect   
  s = "RegParse: #{@script}"
  s += " start:"+d(@startState)+" end:"+d(@endState)
  return s
end