Class: Rly::Lex

Inherits: Object
Defined in: lib/rly/lex.rb, lib/rly/helpers.rb
Overview
Base class for your lexer.
Generally, you define a new lexer by subclassing Rly::Lex. Use the class methods Lex.token, Lex.ignore, Lex.literals, and Lex.on_error to configure the lexer (see each method's documentation for details).
Once your lexer is configured, create an instance by passing it a String to tokenize, then call #next repeatedly to fetch tokens. If more input arrives later, you can append it to the input buffer at any time with #input.
Instance Attribute Summary

- #lineno ⇒ Fixnum
  Tracks the current line number for generated tokens.
- #pos ⇒ Fixnum
  Tracks the current position in the input string.
DSL Class Methods

- .ignore(ign) ⇒ Object
  Specifies a list of one-char symbols to be ignored in input.
- .literals(lit) ⇒ Object
  Specifies a list of one-char literals.
- .on_error(&block) ⇒ Object
  Specifies a block that should be called on error.
- .token(*args) {|tok| ... } ⇒ Object
  Adds a token definition to a class.
Class Method Summary
- .callables ⇒ Object
- .ignore_spaces_and_tabs ⇒ Object
- .lex_double_quoted_string_tokens ⇒ Object
- .lex_number_tokens ⇒ Object
- .metatokens(*args) ⇒ Object
- .metatokens_list ⇒ Object
- .terminals ⇒ Object
- .token_regexps ⇒ Object
Instance Method Summary

- #build_token(type, value) ⇒ Object
- #ignore_symbol ⇒ Object
- #initialize(input = "") ⇒ Lex (constructor)
  Creates a new lexer instance for the given input.
- #input(input) ⇒ Object
  Appends a string to the input buffer.
- #inspect ⇒ Object
- #next ⇒ LexToken?
  Processes the next token in the input.
Constructor Details
#initialize(input = "") ⇒ Lex
Creates a new lexer instance for given input
# File 'lib/rly/lex.rb', line 63

def initialize(input="")
  @input = input
  @pos = 0
  @lineno = 0
end
Instance Attribute Details
#lineno ⇒ Fixnum
Tracks the current line number for generated tokens
lineno’s value should be increased manually. Check the example for a demo rule.
# File 'lib/rly/lex.rb', line 30

def lineno
  @lineno
end
#pos ⇒ Fixnum
Tracks the current position in the input string
Generally, it should only be used to skip a few characters in the error handler.
# File 'lib/rly/lex.rb', line 44

def pos
  @pos
end
Class Method Details
.callables ⇒ Object
# File 'lib/rly/lex.rb', line 191

def callables
  @callables ||= {}
end
.ignore(ign) ⇒ Object
Specifies a list of one-char symbols to be ignored in input
This lets the lexer skip over formatting characters (such as tabs and spaces) quickly.
# File 'lib/rly/lex.rb', line 346

def ignore(ign)
  @ignores = ign
  nil
end
.ignore_spaces_and_tabs ⇒ Object
# File 'lib/rly/helpers.rb', line 4

def self.ignore_spaces_and_tabs
  ignore " \t"
end
.lex_double_quoted_string_tokens ⇒ Object
# File 'lib/rly/helpers.rb', line 15

def self.lex_double_quoted_string_tokens
  token :STRING, /"[^"]*"/ do |t|
    t.value = t.value[1...-1]
    t
  end
end
.lex_number_tokens ⇒ Object
# File 'lib/rly/helpers.rb', line 8

def self.lex_number_tokens
  token :NUMBER, /\d+/ do |t|
    t.value = t.value.to_i
    t
  end
end
.literals(lit) ⇒ Object
Specifies a list of one-char literals
Literals are handy when you have several one-character tokens and don't want to define each of them with the token method.
# File 'lib/rly/lex.rb', line 320

def literals(lit)
  @literals = lit
  nil
end
.metatokens(*args) ⇒ Object
# File 'lib/rly/lex.rb', line 218

def metatokens(*args)
  @metatokens_list = args
end
.metatokens_list ⇒ Object
# File 'lib/rly/lex.rb', line 214

def metatokens_list
  @metatokens_list ||= []
end
.on_error(&block) ⇒ Object
Specifies a block that should be called on error
When a lexing error occurs, the lexer first gives the developer a chance to inspect the failing character. If this block is not provided, a lexing error always raises Rly::LexError.
You must advance the lexer's #pos as part of the action. You may also return a new Rly::LexToken, or nil to skip the offending input.
# File 'lib/rly/lex.rb', line 375

def on_error(&block)
  @error_block = block
  nil
end
.terminals ⇒ Object
# File 'lib/rly/lex.rb', line 187

def terminals
  self.tokens.map { |t,r,b| t }.compact +
    self.literals_list.chars.to_a +
    self.metatokens_list
end
.token(*args) {|tok| ... } ⇒ Object
Adds a token definition to a class
This method adds a token definition to be used later when tokenizing input. It can register both normal tokens and functional tokens (the latter are processed as usual but are never returned).
# File 'lib/rly/lex.rb', line 290

def token(*args, &block)
  if args.length == 2
    self.tokens << [args[0], args[1], block]
  elsif args.length == 1
    self.tokens << [nil, args[0], block]
  else
    raise ArgumentError
  end
  nil
end
.token_regexps ⇒ Object
# File 'lib/rly/lex.rb', line 195

def token_regexps
  return @token_regexps if @token_regexps

  collector = []
  self.tokens.each do |name, rx, block|
    name = "__anonymous_#{block.hash}".to_sym unless name
    self.callables[name] = block

    rxs = rx.to_s
    named_rxs = "\\A(?<#{name}>#{rxs})"

    collector << named_rxs
  end

  rxss = collector.join('|')
  @token_regexps = Regexp.new(rxss)
end
Instance Method Details
#build_token(type, value) ⇒ Object
# File 'lib/rly/lex.rb', line 178

def build_token(type, value)
  LexToken.new(type, value, self, @pos, @lineno)
end
#ignore_symbol ⇒ Object
# File 'lib/rly/lex.rb', line 182

def ignore_symbol
  @pos += 1
end
#input(input) ⇒ Object
Appends string to input buffer
The given string is appended to the input buffer; subsequent #next calls tokenize it as usual.
# File 'lib/rly/lex.rb', line 90

def input(input)
  @input << input
  nil
end
#inspect ⇒ Object
# File 'lib/rly/lex.rb', line 69

def inspect
  "#<#{self.class} pos=#{@pos} len=#{@input.length} lineno=#{@lineno}>"
end
#next ⇒ LexToken?
Processes the next token in input
This is the main interface to the lexer. It returns the next available token, or nil if there are no more tokens in the input string.
Raises Rly::LexError if the input cannot be processed, which happens when neither the token rules nor the literals rule match. If the on_error handler is not set, the exception is raised immediately; if the handler is set, the exception is raised only if #pos is still unchanged after the handler returns.
# File 'lib/rly/lex.rb', line 119

def next
  while @pos < @input.length
    if self.class.ignores_list[@input[@pos]]
      ignore_symbol
      next
    end

    m = self.class.token_regexps.match(@input[@pos..-1])

    if m && ! m[0].empty?
      val = nil
      type = nil
      resolved_type = nil
      m.names.each do |n|
        if m[n]
          type = n.to_sym
          resolved_type = (n.start_with?('__anonymous_') ? nil : type)
          val = m[n]
          break
        end
      end

      if type
        tok = build_token(resolved_type, val)
        @pos += m.end(0)

        tok = self.class.callables[type].call(tok) if self.class.callables[type]

        if tok && tok.type
          return tok
        else
          next
        end
      end
    end

    if self.class.literals_list[@input[@pos]]
      tok = build_token(@input[@pos], @input[@pos])
      matched = true
      @pos += 1
      return tok
    end

    if self.class.error_hander
      pos = @pos
      tok = build_token(:error, @input[@pos])
      tok = self.class.error_hander.call(tok)
      if pos == @pos
        raise LexError.new("Illegal character '#{@input[@pos]}' at index #{@pos}")
      else
        return tok if tok && tok.type
      end
    else
      raise LexError.new("Illegal character '#{@input[@pos]}' at index #{@pos}")
    end
  end

  return nil
end