Class: Rlex::Lexer
Inherits: Object
Defined in: lib/rlex/lexer.rb
Overview
Implements a simple lexer using a StringScanner.
The lexer was written for use with Racc, a Ruby variant of Yacc, but it has no code dependency on that project, so it may also be used on its own or with other packages.
Notes:
- Ignored input takes precedence over rules and keywords, so if a prefix is matched by an ignore pattern, it’s ignored even if it’s also a keyword or matched by a rule.
- The lexer is greedy, so if a prefix is matched by multiple rules or keywords, the lexer chooses the option consuming the most input.
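The greedy behaviour described above can be illustrated with a small standalone sketch. This is not the library's implementation (the `greediest_rule` helper it calls is not shown on this page); the `RULES` table and the `greediest` method below are hypothetical names used only to show one way to pick the longest match with StringScanner.

```ruby
require 'strscan'

# Hypothetical rule table; names and patterns are illustrative only.
RULES = {
  :integer => /\d+/,
  :float   => /\d+\.\d+/,
  :word    => /[a-z]+/
}

# Returns the [name, pattern] pair whose pattern matches the longest prefix
# at the scanner's current position, or nil if no pattern matches.
def greediest(scanner, rules)
  best = nil
  best_len = 0
  rules.each do |name, pattern|
    len = scanner.match?(pattern) # match length at the pointer, or nil
    if len && len > best_len
      best = [name, pattern]
      best_len = len
    end
  end
  best
end

scanner = StringScanner.new("3.14abc")
name, pattern = greediest(scanner, RULES)
puts "#{name}: #{scanner.scan(pattern)}" # prints "float: 3.14"
```

Both `:integer` and `:float` match at the start of the input, but `:float` consumes four characters rather than one, so it wins.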
Instance Method Summary
- #ignore(pattern) ⇒ Regexp
  Instructs the lexer to ignore input matched by the specified pattern.
- #initialize ⇒ Lexer (constructor)
  Initializes an empty Lexer.
- #keyword(name = nil, kword) ⇒ Symbol
  Defines a static sequence of input as a keyword.
- #next_token ⇒ Token
  Returns the next token matched from the remaining input.
- #rule(name, pattern) ⇒ Symbol
  Defines a rule to match the specified pattern.
- #start(input) ⇒ String
  Initializes the lexer with new input.
Constructor Details
#initialize ⇒ Lexer
Initializes an empty Lexer.
    # File 'lib/rlex/lexer.rb', line 43
    def initialize
      @ignored = []
      @rules = []
      @keywords = {}
    end
Instance Method Details
#ignore(pattern) ⇒ Regexp
Instructs the lexer to ignore input matched by the specified pattern. Call this method multiple times to ignore several patterns.
Note: Ignored input takes precedence over rules and keywords, so if a prefix is matched by an ignore pattern, it’s ignored even if it’s also a keyword or matched by a rule.
    # File 'lib/rlex/lexer.rb', line 61
    def ignore(pattern)
      @ignored << pattern
      return pattern
    end
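As a rough illustration of how several ignore patterns might be applied before any rule is tried, the sketch below skips ignored prefixes with `StringScanner#skip`. The `skip_ignored` helper is hypothetical; the lexer's own `ignore_prefix?` method is not shown on this page and may work differently.

```ruby
require 'strscan'

# Hypothetical helper: tries each ignore pattern at the current position and
# returns true as soon as one of them consumes at least one character.
def skip_ignored(scanner, ignored)
  ignored.any? { |pattern| (scanner.skip(pattern) || 0) > 0 }
end

ignored = [/\s+/, /#[^\n]*/] # whitespace and line comments (illustrative)
scanner = StringScanner.new("  # comment\nvalue")

# Keep skipping until no ignore pattern matches at the pointer.
loop { break unless skip_ignored(scanner, ignored) }

puts scanner.rest # prints "value"
```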
#keyword(name = nil, kword) ⇒ Symbol
Defines a static sequence of input as a keyword.
Note: For efficiency, use keywords instead of rules whenever the matched input is static.
    # File 'lib/rlex/lexer.rb', line 101
    def keyword(name = nil, kword)
      # @todo Validate the keyword name
      name = kword if name == nil
      pattern = Regexp.new(Regexp.escape kword.to_s)
      rule name, pattern
      @keywords[kword.to_s] = Token.new name.to_sym, kword.to_s
      return name.to_sym
    end
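The escaping step in the listing above is what makes a keyword match literally. The following sketch isolates it; `keyword_pattern` is a hypothetical helper name introduced here for illustration, not part of the library.

```ruby
# Hypothetical helper mirroring the escaping step in #keyword: the keyword
# text is escaped so regexp metacharacters are matched literally.
def keyword_pattern(kword)
  Regexp.new(Regexp.escape(kword.to_s))
end

literal   = keyword_pattern("..") # matches only two literal dots
unescaped = Regexp.new("..")      # matches any two characters

puts literal.source          # prints \.\.
puts literal.match?("..")    # prints true
puts literal.match?("ab")    # prints false
puts unescaped.match?("ab")  # prints true
```

Without the escape, a keyword such as `..` would behave like a general rule rather than a static sequence of input.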
#next_token ⇒ Token
Returns the next token matched from the remaining input. If no input is left, or the lexer has not been initialized, EOF_TOKEN is returned.
    # File 'lib/rlex/lexer.rb', line 132
    def next_token
      return EOF_TOKEN if @scanner.nil? or @scanner.empty?
      return next_token if ignore_prefix?
      rule = greediest_rule
      if rule
        prefix = @scanner.scan(rule.pattern)
        keyword = @keywords[prefix]
        return keyword ? keyword : Token.new(rule.name, prefix)
      end
      raise "unexpected input <#{@scanner.peek(5)}>"
    end
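A caller would typically pull tokens until the end-of-input token comes back. The sketch below shows that loop against a stand-in lexer so it runs on its own. `Token`, `EOF_TOKEN`, `WordLexer`, and `drain` are all assumptions made for illustration: the real `Token` and `EOF_TOKEN` definitions are not shown on this page and may have a different shape.

```ruby
require 'strscan'

# Hypothetical stand-ins for the library's Token and EOF_TOKEN.
Token = Struct.new(:type, :value)
EOF_TOKEN = Token.new(:eof, nil)

# Minimal stand-in lexer: splits input into whitespace-separated words.
class WordLexer
  def start(input)
    @scanner = StringScanner.new(input)
    input
  end

  def next_token
    return EOF_TOKEN if @scanner.nil? || @scanner.eos?
    @scanner.skip(/\s+/)
    return EOF_TOKEN if @scanner.eos?
    Token.new(:word, @scanner.scan(/\S+/))
  end
end

# Collects tokens until the lexer reports end of input.
def drain(lexer)
  tokens = []
  loop do
    token = lexer.next_token
    break if token.equal?(EOF_TOKEN)
    tokens << token
  end
  tokens
end

lexer = WordLexer.new
p drain(lexer)               # prints [] because start was never called
lexer.start("let x")
p drain(lexer).map(&:value)  # prints ["let", "x"]
```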
#rule(name, pattern) ⇒ Symbol
Defines a rule to match the specified pattern.
Note: For efficiency, use keywords instead of rules whenever the matched input is static.
    # File 'lib/rlex/lexer.rb', line 79
    def rule(name, pattern)
      # @todo Validate the rule name
      @rules << (Rule.new name.to_sym, pattern)
      return name.to_sym
    end
#start(input) ⇒ String
Initializes the lexer with new input.
Note: This resets the lexer with a new StringScanner, so any state information related to previous input is lost.
    # File 'lib/rlex/lexer.rb', line 119
    def start(input)
      @scanner = StringScanner.new input
      return input
    end
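The effect of replacing the scanner can be seen in isolation with a small sketch. `ScannerHolder` and its `consume` and `position` methods are hypothetical names used only to expose the scan position; only `start` mirrors the listing above.

```ruby
require 'strscan'

# Hypothetical wrapper that keeps a StringScanner the same way #start does.
class ScannerHolder
  def start(input)
    @scanner = StringScanner.new(input) # previous scanner is discarded
    input
  end

  def consume(pattern)
    @scanner.scan(pattern)
  end

  def position
    @scanner.pos
  end
end

holder = ScannerHolder.new
holder.start("first input")
holder.consume(/\w+/)
puts holder.position # prints 5

holder.start("second")
puts holder.position # prints 0, the earlier position is gone
```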