Class: RLTK::Lexer
- Inherits: Object
  - Object
  - RLTK::Lexer
- Defined in: lib/rltk/lexer.rb
Overview
The Lexer class may be subclassed to produce new lexers. The features of these lexers are described in the main documentation.
Direct Known Subclasses
Defined Under Namespace
Classes: Environment, Rule
Class Attribute Summary
- .start_state ⇒ Symbol (readonly)
  State in which the lexer starts.
Instance Attribute Summary
- #env ⇒ Environment (readonly)
  Environment used by an instantiated lexer.
Class Method Summary
- .inherited(klass) ⇒ void
  Called when the Lexer class is subclassed; installs the necessary instance class variables.
- .install_icvars ⇒ void
  Installs instance class variables into a class.
- .lex(string, file_name = nil, env = self::Environment.new(@start_state)) ⇒ Array<Token>
  Lexes string, using env as the environment.
- .lex_file(file_name, env = self::Environment.new(@start_state)) ⇒ Array<Token>
  A wrapper function that calls Lexer.lex on the contents of a file.
- .match_first ⇒ void
  Tells a lexer to use the first match found instead of the longest match found.
- .rule(pattern, state = :default, flags = [], &action) ⇒ void (also: r)
  Defines a new lexing rule.
- .start(state) ⇒ void
  Changes the starting state of the lexer.
Instance Method Summary
- #initialize ⇒ Lexer (constructor)
  Instantiates a new lexer and creates an environment to be used for subsequent calls.
- #lex(string, file_name = nil) ⇒ Array<Token>
  Lexes a string using the encapsulated environment.
- #lex_file(file_name) ⇒ Array<Token>
  Lexes a file using the encapsulated environment.
Constructor Details
#initialize ⇒ Lexer
Instantiates a new lexer and creates an environment to be used for subsequent calls.
    # File 'lib/rltk/lexer.rb', line 222
    def initialize
      @env = self.class::Environment.new(self.class.start_state)
    end
Class Attribute Details
.start_state ⇒ Symbol (readonly)
Returns the state in which the lexer starts.

    # File 'lib/rltk/lexer.rb', line 66
    def start_state
      @start_state
    end
Instance Attribute Details
#env ⇒ Environment (readonly)
Returns the environment used by an instantiated lexer.

    # File 'lib/rltk/lexer.rb', line 58
    def env
      @env
    end
Class Method Details
.inherited(klass) ⇒ void
This method returns an undefined value.
Called when the Lexer class is subclassed; installs the necessary instance class variables.

    # File 'lib/rltk/lexer.rb', line 72
    def inherited(klass)
      klass.install_icvars
    end
.install_icvars ⇒ void
This method returns an undefined value.
Installs instance class variables into a class.

    # File 'lib/rltk/lexer.rb', line 79
    def install_icvars
      @match_type  = :longest
      @rules       = Hash.new {|h,k| h[k] = Array.new}
      @start_state = :default
    end
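The two hooks above give every subclass its own copy of the lexer settings. The following is a minimal sketch of that pattern using only core Ruby; `MiniLexer`, `ALexer`, and `BLexer` are hypothetical stand-ins, not RLTK classes, and only the hook logic is taken from the listings.

```ruby
# Hypothetical stand-in for RLTK::Lexer; only the hook pattern mirrors the listings.
class MiniLexer
  class << self
    attr_reader :start_state, :match_type, :rules

    # Ruby calls this hook each time the class is subclassed.
    def inherited(klass)
      super
      klass.install_icvars
    end

    # Give the receiving class its own, independent settings.
    def install_icvars
      @match_type  = :longest
      @rules       = Hash.new { |h, k| h[k] = [] }
      @start_state = :default
    end

    def start(state)
      @start_state = state
    end
  end
end

class ALexer < MiniLexer
  start :string
end

class BLexer < MiniLexer
end

p ALexer.start_state    # :string
p BLexer.start_state    # :default, unaffected by ALexer
p MiniLexer.start_state # nil, because the hook never ran for the base class
```

Because the variables live on each class object rather than in shared `@@` class variables, changing one subclass in this sketch does not leak into its siblings.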
.lex(string, file_name = nil, env = self::Environment.new(@start_state)) ⇒ Array<Token>
Lex string, using env as the environment. This method will return the array of tokens generated by the lexer with a token of type EOS (End of Stream) appended to the end.
    # File 'lib/rltk/lexer.rb', line 94
    def lex(string, file_name = nil, env = self::Environment.new(@start_state))
      # Offset from start of stream.
      stream_offset = 0

      # Offset from the start of the line.
      line_offset = 0
      line_number = 1

      # Empty token list.
      tokens = Array.new

      # The scanner.
      scanner = StringScanner.new(string)

      # Start scanning the input string.
      until scanner.eos?
        match = nil

        # If the match_type is set to :longest all of the
        # rules for the current state need to be scanned
        # and the longest match returned. If the
        # match_type is :first, we only need to scan until
        # we find a match.
        @rules[env.state].each do |rule|
          if (rule.flags - env.flags).empty?
            if txt = scanner.check(rule.pattern)
              if not match or match.first.length < txt.length
                match = [txt, rule]

                break if @match_type == :first
              end
            end
          end
        end

        if match
          rule = match.last

          txt = scanner.scan(rule.pattern)
          type, value = env.rule_exec(rule.pattern.match(txt), txt, &rule.action)

          if type
            pos = StreamPosition.new(stream_offset, line_number, line_offset, txt.length, file_name)
            tokens << Token.new(type, value, pos)
          end

          # Advance our stat counters.
          stream_offset += txt.length

          if (newlines = txt.count("\n")) > 0
            line_number += newlines
            line_offset = txt.rpartition("\n").last.length
          else
            line_offset += txt.length()
          end
        else
          error = LexingError.new(stream_offset, line_number, line_offset, scanner.rest)
          raise(error, 'Unable to match string with any of the given rules')
        end
      end

      return tokens << Token.new(:EOS)
    end
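The rule-selection loop in the listing can be isolated to show how the :longest and :first match types differ. This is a sketch under stated assumptions: `pick_match` is a hypothetical helper, not an RLTK method, and rules are reduced to `[name, pattern]` pairs with no states or flags. Only `StringScanner#check` from Ruby's standard library is used.

```ruby
require 'strscan'

# Hypothetical helper, not part of RLTK: returns the [text, name] pair chosen
# from `rules` at the scanner's current position, or nil if nothing matches.
def pick_match(scanner, rules, match_type)
  best = nil
  rules.each do |name, pattern|
    txt = scanner.check(pattern) # matches at the current position without advancing
    next unless txt

    if best.nil? || best.first.length < txt.length
      best = [txt, name]
      break if match_type == :first # stop at the first rule that matches
    end
  end
  best
end

rules   = [[:IF, /if/], [:IDENT, /[a-z]+/]]
scanner = StringScanner.new('iffy')

p pick_match(scanner, rules, :longest) # ["iffy", :IDENT]
p pick_match(scanner, rules, :first)   # ["if", :IF]
```

With :first, rule order decides the outcome, so in this sketch the keyword rule shadows the identifier rule even though the identifier rule would consume more input.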
.lex_file(file_name, env = self::Environment.new(@start_state)) ⇒ Array<Token>
A wrapper function that calls lex on the contents of a file.
    # File 'lib/rltk/lexer.rb', line 165
    def lex_file(file_name, env = self::Environment.new(@start_state))
      File.open(file_name, 'r') { |f| self.lex(f.read, file_name, env) }
    end
.match_first ⇒ void
This method returns an undefined value.
Used to tell a lexer to use the first match found instead of the longest match found.
    # File 'lib/rltk/lexer.rb', line 173
    def match_first
      @match_type = :first
    end
.rule(pattern, state = :default, flags = [], &action) ⇒ void Also known as: r
This method returns an undefined value.
This method is used to define a new lexing rule. The first argument is the regular expression used to match substrings of the input. The second argument is the state to which the rule belongs. Flags that need to be set for the rule to be considered are specified by the third argument. The last argument is a block that returns a type and value to be used in constructing a Token. If no block is specified the matched substring will be discarded and lexing will continue.
    # File 'lib/rltk/lexer.rb', line 193
    def rule(pattern, state = :default, flags = [], &action)
      # If no action is given we will set it to an empty
      # action.
      action ||= Proc.new() {}

      pattern = Regexp.new(pattern) if pattern.is_a?(String)

      r = Rule.new(pattern, action, state, flags)

      if state == :ALL then @rules.each_key { |k| @rules[k] << r } else @rules[state] << r end
    end
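Reading the listing, a rule registered under :ALL appears to be appended only to states that already have an entry in the rule table at the time of the call. The sketch below reproduces the registration logic with hypothetical stand-ins (`SketchRule` and `RuleTable` are not RLTK names) to illustrate that ordering effect.

```ruby
# Hypothetical stand-ins; only the registration logic mirrors the listing.
SketchRule = Struct.new(:pattern, :action, :state, :flags)

class RuleTable
  attr_reader :rules

  def initialize
    # Looking up an unknown state creates an empty rule list for it.
    @rules = Hash.new { |h, k| h[k] = [] }
  end

  def rule(pattern, state = :default, flags = [], &action)
    action ||= proc {} # an empty action discards the matched text
    pattern = Regexp.new(pattern) if pattern.is_a?(String)
    r = SketchRule.new(pattern, action, state, flags)

    if state == :ALL
      @rules.each_key { |k| @rules[k] << r } # only states known so far
    else
      @rules[state] << r
    end
  end
end

table = RuleTable.new
table.rule(/\s+/, :ALL)                      # no states exist yet, so nothing is stored
table.rule(/[0-9]+/) { |t| [:NUM, t.to_i] }  # creates :default
table.rule('"', :string)                     # string pattern converted to a Regexp
table.rule(/\n/, :ALL)                       # appended to :default and :string

p table.rules.keys             # [:default, :string]
p table.rules[:default].length # 2
p table.rules[:string].length  # 2
```

If the real class behaves the way the listing reads, declaring :ALL rules after the state-specific rules is the safer ordering.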
.start(state) ⇒ void
This method returns an undefined value.
Changes the starting state of the lexer.
    # File 'lib/rltk/lexer.rb', line 211
    def start(state)
      @start_state = state
    end
Instance Method Details
#lex(string, file_name = nil) ⇒ Array<Token>
Lexes a string using the encapsulated environment.
    # File 'lib/rltk/lexer.rb', line 232
    def lex(string, file_name = nil)
      self.class.lex(string, file_name, @env)
    end
#lex_file(file_name) ⇒ Array<Token>
Lexes a file using the encapsulated environment.
241 242 243 |
# File 'lib/rltk/lexer.rb', line 241 def lex_file(file_name) self.class.lex_file(file_name, @env) end |