Class: Rouge::RegexLexer (abstract)
Overview
A stateful lexer that uses sets of regular expressions to tokenize a string. Most lexers are instances of RegexLexer.
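For orientation, here is a minimal sketch of what a RegexLexer subclass can look like. The WordLexer class, its tag, and its rules are hypothetical; rule comes from the StateDSL described below, and the bare token names (Text, Num, Str) resolve through the included Token::Tokens module.

module Rouge
  module Lexers
    # Hypothetical lexer: splits input into whitespace, numbers, and words.
    class WordLexer < RegexLexer
      tag 'word-example'   # illustrative tag; `tag` is inherited from Lexer

      state :root do
        rule /\s+/, Text   # whitespace
        rule /\d+/, Num    # runs of digits
        rule /\S+/, Str    # anything else
      end
    end
  end
end

# #lex yields [token, value] pairs:
Rouge::Lexers::WordLexer.new.lex("abc 123") { |tok, val| p [tok, val] }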
Direct Known Subclasses
Lexers::C, Lexers::CSS, Lexers::CSharp, Lexers::Clojure, Lexers::Coffeescript, Lexers::CommonLisp, Lexers::Conf, Lexers::Cpp, Lexers::Diff, Lexers::Elixir, Lexers::Erlang, Lexers::Factor, Lexers::Gherkin, Lexers::Go, Lexers::Groovy, Lexers::HTML, Lexers::HTTP, Lexers::Haml, Lexers::Haskell, Lexers::INI, Lexers::IO, Lexers::JSON, Lexers::Java, Lexers::Javascript, Lexers::LLVM, Lexers::LiterateCoffeescript, Lexers::LiterateHaskell, Lexers::Lua, Lexers::Make, Lexers::Markdown, Lexers::Nginx, Lexers::Perl, Lexers::Prolog, Lexers::Puppet, Lexers::Python, Lexers::R, Lexers::Racket, Lexers::Ruby, Lexers::Rust, Lexers::SQL, Lexers::Sass, Lexers::Scheme, Lexers::Scss, Lexers::Sed, Lexers::Sed::Regex, Lexers::Sed::Replacement, Lexers::Shell, Lexers::Smalltalk, Lexers::TCL, Lexers::TOML, Lexers::TeX, Lexers::VimL, Lexers::XML, Lexers::YAML, TemplateLexer
Defined Under Namespace
Classes: Rule, State, StateDSL
Constant Summary
- MAX_NULL_SCANS = 5
  The number of successive scans permitted without consuming the input stream. If this is exceeded, the match fails.
Constants included from Token::Tokens
Token::Tokens::Num, Token::Tokens::Str
Class Method Summary
- .get_state(name) ⇒ Object
- .start(&b) ⇒ Object
  Specify an action to be run every fresh lex.
- .start_procs ⇒ Object
  The routines to run at the beginning of a fresh lex.
- .state(name, &b) ⇒ Object
  Define a new state for this lexer with the given name.
- .states ⇒ Object
  The states hash for this lexer.
Instance Method Summary
- #delegate(lexer, text = nil) ⇒ Object
  Delegate the lex to another lexer.
- #get_state(state_name) ⇒ Object
- #goto(state_name) ⇒ Object
  Replace the head of the stack with the given state.
- #group(tok) ⇒ Object
  Yield a token with the next matched group.
- #groups(*tokens) ⇒ Object
- #in_state?(state_name) ⇒ Boolean
  Check if `state_name` is in the state stack.
- #pop!(times = 1) ⇒ Object
  Pop the state stack.
- #push(state_name = nil, &b) ⇒ Object
  Push a state onto the stack.
- #reset! ⇒ Object
  Reset this lexer to its initial state.
- #reset_stack ⇒ Object
  Reset the stack back to `[:root]`.
- #run_callback(stream, callback, &output_stream) ⇒ Object
- #run_rule(rule, scanner, &b) ⇒ Object
- #stack ⇒ Object
  The state stack.
- #state ⇒ Object
  The current state, i.e. the one on top of the state stack.
- #state?(state_name) ⇒ Boolean
  Check if `state_name` is the state on top of the state stack.
- #step(state, stream, &b) ⇒ Object
  Runs one step of the lex.
- #stream_tokens(str, &b) ⇒ Object
  This implements the lexer protocol, by yielding [token, value] pairs.
- #token(tok, val = :__absent__) ⇒ Object
  Yield a token.
Methods inherited from Lexer
aliases, all, analyze_text, assert_utf8!, #debug, default_options, demo, demo_file, desc, filenames, find, find_fancy, guess, guess_by_filename, guess_by_mimetype, guess_by_source, guesses, #initialize, lex, #lex, mimetypes, #option, #options, #tag, tag
Methods included from Token::Tokens
Constructor Details
This class inherits a constructor from Rouge::Lexer
Class Method Details
.get_state(name) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 136

def self.get_state(name)
  return name if name.is_a? State

  state = states[name.to_s]
  raise "unknown state: #{name}" unless state

  state.load!(self)
end
.start(&b) ⇒ Object
Specify an action to be run every fresh lex.
# File 'lib/rouge/regex_lexer.rb', line 124

def self.start(&b)
  start_procs << b
end
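As a sketch of typical usage (hypothetical: the MyTemplateLexer class, the @html instance variable, and the choice of the HTML lexer are only illustrative), a template-style lexer might use start to set up a sub-lexer that #delegate can reuse for the whole lex:

class MyTemplateLexer < Rouge::RegexLexer
  # Run on every fresh lex (via reset!): build the sub-lexer once, so
  # repeated delegations share a single internal state stack.
  start do
    @html = Rouge::Lexers::HTML.new
  end
end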
.start_procs ⇒ Object
The routines to run at the beginning of a fresh lex.
# File 'lib/rouge/regex_lexer.rb', line 115

def self.start_procs
  @start_procs ||= InheritableList.new(superclass.start_procs)
end
.state(name, &b) ⇒ Object
Define a new state for this lexer with the given name. The block will be evaluated in the context of a StateDSL.
# File 'lib/rouge/regex_lexer.rb', line 130

def self.state(name, &b)
  name = name.to_s
  states[name] = State.new(name, &b)
end
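Inside the block, rules are declared with StateDSL helpers. A sketch of what a state definition might look like, written inside a RegexLexer subclass body (the state name, regexes, and token choices are illustrative):

# A state for double-quoted strings.
state :string do
  rule /"/, Str, :pop!       # closing quote: emit Str, then pop this state
  rule /\\./, Str::Escape    # escape sequences
  rule /[^"\\]+/, Str        # plain runs of characters
end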
.states ⇒ Object
The states hash for this lexer.
# File 'lib/rouge/regex_lexer.rb', line 109

def self.states
  @states ||= {}
end
Instance Method Details
#delegate(lexer, text = nil) ⇒ Object
Delegate the lex to another lexer. The #lex method will be called with `:continue` set to true, so that #reset! will not be called. In this way, a single lexer can be repeatedly delegated to while maintaining its own internal state stack.
# File 'lib/rouge/regex_lexer.rb', line 303

def delegate(lexer, text=nil)
  debug { " delegating to #{lexer.inspect}" }
  text ||= @current_stream[0]

  lexer.lex(text, :continue => true) do |tok, val|
    debug { " delegated token: #{tok.inspect}, #{val.inspect}" }
    token(tok, val)
  end
end
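A sketch of how this is typically used from a rule callback (the state, regexes, tokens, and the @html sub-lexer are illustrative; @html could be built in a start block as shown above):

state :inline_html do
  # Hand the matched text to the sub-lexer; because #delegate passes
  # :continue => true, @html keeps its own state stack between calls.
  rule /[^%]+/ do
    delegate @html
  end

  rule /%}/, Punctuation, :pop!
end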
#get_state(state_name) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 145

def get_state(state_name)
  self.class.get_state(state_name)
end
#goto(state_name) ⇒ Object
Replace the head of the stack with the given state.
# File 'lib/rouge/regex_lexer.rb', line 343

def goto(state_name)
  raise 'empty stack!' if stack.empty?
  stack[-1] = get_state(state_name)
end
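A sketch of a callback that transitions sideways rather than nesting (the state names, regex, and token are illustrative):

state :key do
  rule /=/ do
    token Operator
    goto :value   # swap :key for :value instead of pushing on top of it
  end
end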
#group(tok) ⇒ Object
Yield a token with the next matched group. Subsequent calls to this method will yield subsequent groups.
# File 'lib/rouge/regex_lexer.rb', line 284

def group(tok)
  yield_token(tok, @current_stream[@group_count += 1])
end
#groups(*tokens) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 288

def groups(*tokens)
  tokens.each_with_index do |tok, i|
    yield_token(tok, @current_stream[i+1])
  end
end
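For example, a rule whose regex has several capture groups can hand each group its own token in a single call (the regex and token choices are illustrative):

# key, surrounding whitespace, and '=' each get their own token.
rule /(\w+)(\s*)(=)(\s*)/ do
  groups Name::Attribute, Text, Operator, Text
end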
#in_state?(state_name) ⇒ Boolean
Check if `state_name` is in the state stack.
# File 'lib/rouge/regex_lexer.rb', line 356

def in_state?(state_name)
  state_name = state_name.to_s
  stack.any? do |state|
    state.name == state_name.to_s
  end
end
#pop!(times = 1) ⇒ Object
Pop the state stack. If a number is passed in, it will be popped that number of times.
# File 'lib/rouge/regex_lexer.rb', line 332

def pop!(times=1)
  raise 'empty stack!' if stack.empty?
  debug { " popping stack: #{times}" }

  stack.pop(times)

  nil
end
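A sketch of both forms (regexes, tokens, and state names are illustrative):

rule /"/, Str, :pop!     # pop once via the rule shorthand

rule /<\/script>/ do
  token Name::Tag
  pop! 2                 # leave two nested states at once
end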
#push(state_name = nil, &b) ⇒ Object
Push a state onto the stack. If no state name is given and you’ve passed a block, a state will be dynamically created using the StateDSL.
# File 'lib/rouge/regex_lexer.rb', line 316

def push(state_name=nil, &b)
  push_state = if state_name
    get_state(state_name)
  elsif block_given?
    State.new(b.inspect, &b).load!(self.class)
  else
    # use the top of the stack by default
    self.state
  end

  debug { " pushing #{push_state.name}" }
  stack.push(push_state)
end
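Both forms as a sketch (the regexes, tokens, and the anonymous comment state are illustrative):

rule /"/, Str, :string     # shorthand: emit Str and push the named :string state

rule /<!--/ do
  token Comment
  push do                  # push an anonymous state built from this block
    rule /-->/, Comment, :pop!
    rule /[^-]+/, Comment
    rule /-/, Comment
  end
end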
#reset! ⇒ Object
Reset this lexer to its initial state. This runs all of the start_procs.
# File 'lib/rouge/regex_lexer.rb', line 166

def reset!
  @stack = nil
  @current_stream = nil

  self.class.start_procs.each do |pr|
    instance_eval(&pr)
  end
end
#reset_stack ⇒ Object
Reset the stack back to `[:root]`.
# File 'lib/rouge/regex_lexer.rb', line 349

def reset_stack
  debug { ' resetting stack' }
  stack.clear
  stack.push get_state(:root)
end
#run_callback(stream, callback, &output_stream) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 234

def run_callback(stream, callback, &output_stream)
  with_output_stream(output_stream) do
    @group_count = 0
    instance_exec(stream, &callback)
  end
end
#run_rule(rule, scanner, &b) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 246

def run_rule(rule, scanner, &b)
  # XXX HACK XXX
  # StringScanner's implementation of ^ is b0rken.
  # see http://bugs.ruby-lang.org/issues/7092
  # TODO: this doesn't cover cases like /(a|^b)/, but it's
  # the most common, for now...
  return false if rule.beginning_of_line? && !scanner.beginning_of_line?

  if (@null_steps ||= 0) >= MAX_NULL_SCANS
    debug { " too many scans without consuming the string!" }
    return false
  end

  scanner.scan(rule.re) or return false

  if scanner.matched_size.zero?
    @null_steps += 1
  else
    @null_steps = 0
  end

  true
end
#stack ⇒ Object
The state stack. This is initially the single state `[:root]`. It is an error for this stack to be empty.
# File 'lib/rouge/regex_lexer.rb', line 152

def stack
  @stack ||= [get_state(:root)]
end
#state ⇒ Object
The current state, i.e. the one on top of the state stack.
NB: if the state stack is empty, this will throw an error rather than returning nil.
# File 'lib/rouge/regex_lexer.rb', line 160

def state
  stack.last or raise 'empty stack!'
end
#state?(state_name) ⇒ Boolean
Check if `state_name` is the state on top of the state stack.
# File 'lib/rouge/regex_lexer.rb', line 364

def state?(state_name)
  state_name.to_s == state.name
end
#step(state, stream, &b) ⇒ Object
Runs one step of the lex. Rules in the current state are tried until one matches, at which point its callback is called.
# File 'lib/rouge/regex_lexer.rb', line 210

def step(state, stream, &b)
  state.rules.each do |rule|
    case rule
    when State
      debug { " entering mixin #{rule.name}" }
      return true if step(rule, stream, &b)
      debug { " exiting mixin #{rule.name}" }
    when Rule
      debug { " trying #{rule.inspect}" }

      if run_rule(rule, stream)
        debug { " got #{stream[0].inspect}" }

        run_callback(stream, rule.callback, &b)

        return true
      end
    end
  end

  false
end
#stream_tokens(str, &b) ⇒ Object
This implements the lexer protocol, by yielding [token, value] pairs.
The process for lexing works as follows, until the stream is empty:
1. We look at the state on top of the stack (which by default is `[:root]`).
2. Each rule in that state is tried until one is successful. If one is found, that rule's callback is evaluated, which may yield tokens and manipulate the state stack. Otherwise, one character is consumed with an `Error` token, and we continue at (1).
# File 'lib/rouge/regex_lexer.rb', line 187

def stream_tokens(str, &b)
  stream = StringScanner.new(str)

  @current_stream = stream

  until stream.eos?
    debug { "lexer: #{self.class.tag}" }
    debug { "stack: #{stack.map(&:name).inspect}" }
    debug { "stream: #{stream.peek(20).inspect}" }

    success = step(get_state(state), stream, &b)

    if !success
      debug { " no match, yielding Error" }
      b.call(Token::Tokens::Error, stream.getch)
    end
  end
end
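From the outside, this is normally driven through #lex, inherited from Lexer, roughly as in this sketch (the input string and the choice of the Ruby lexer are only for illustration):

lexer = Rouge::Lexers::Ruby.new
lexer.lex("1 + 2") do |token, value|
  # one [token, value] pair per call, e.g. a number token for "1"
  puts "#{token.qualname} #{value.inspect}"
end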
#token(tok, val = :__absent__) ⇒ Object
Yield a token.
# File 'lib/rouge/regex_lexer.rb', line 277

def token(tok, val=:__absent__)
  val = @current_stream[0] if val == :__absent__
  yield_token(tok, val)
end
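A sketch of both forms inside a rule callback (the regex and token choices are illustrative; the block argument is the current StringScanner, so m[1] and m[2] are its capture groups):

rule /(\d+)(\.\d+)?/ do |m|
  if m[2]
    token Num::Float           # value defaults to the whole match
  else
    token Num::Integer, m[1]   # or pass an explicit value
  end
end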