Class: Violet::Lexer
- Inherits: Object
- Defined in: lib/violet/lexer.rb
Overview
Public: Lexes a JavaScript source string.
Constant Summary
- LINE_TERMINATORS =
Public: Matches line terminators: line feeds, carriage returns, line separators, and paragraph separators. See section 7.3 of the ES 5.1 spec.
/[\n\r\u2028\u2029]/
- NORMALIZE_LINE_ENDINGS =
Public: Matches line separators, paragraph separators, and carriage returns not followed by line feeds. Used to convert all line terminators to line feeds. CRLF line endings are preserved.
/[\u2028\u2029]|(?:\r[^\n])/
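The normalization this constant drives (see the `gsub` call in `#initialize`) can be sketched in isolation. The sample input below is our own; line separators and paragraph separators collapse to line feeds, while the CRLF pair is left intact:

```ruby
# The documented constant, reproduced for illustration.
NORMALIZE_LINE_ENDINGS = /[\u2028\u2029]|(?:\r[^\n])/

# "\u2028" and "\u2029" become "\n"; "\r\n" does not match either branch.
source     = "a\u2028b\u2029c\r\nd"
normalized = source.gsub(NORMALIZE_LINE_ENDINGS, ?\n)
normalized # => "a\nb\nc\r\nd"
```

Note that the second branch matches the carriage return together with the character that follows it, so only the line-separator branch is exercised here.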
- IDENTIFIER_START =
Public: Matches Unicode letters, `$`, `_`, and Unicode escape sequences. See section 7.6.
/[$_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]/
- IDENTIFIER_FRAGMENT =
Public: Matches identifier starting characters, Unicode combining marks, Unicode digits, Unicode connector punctuators, zero-width non-joiners, and zero-width joiners. See section 7.1.
Regexp.union(IDENTIFIER_START, /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200c\u200d]/)
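As a quick illustration of the two character classes (reproduced here verbatim; the sample characters are our own), a `$` may begin an identifier, while a digit may only continue one:

```ruby
# The documented classes, reproduced for illustration.
IDENTIFIER_START    = /[$_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]/
IDENTIFIER_FRAGMENT = Regexp.union(IDENTIFIER_START,
  /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200c\u200d]/)

IDENTIFIER_START.match?("$")    # => true
IDENTIFIER_START.match?("9")    # => false (identifiers cannot begin with a digit)
IDENTIFIER_FRAGMENT.match?("9") # => true  (digits may continue an identifier)
```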
- TOKEN =
Public: Matches an ECMAScript token. This is a superset of the `Token` production defined in section 7.5 of the spec.
%r(
  ## Whitespace characters: tab, vertical tab, form feed, space,
  # non-breaking space, byte-order mark, and other Unicode space separators
  # (Category Z). The space and non-breaking space characters are matched by
  # the \p{Z} Unicode category class. See section 7.2 of the ES spec.
  (?<whitespace>[\t\v\f\ufeff\uffff\p{Z}])?
  # Line terminators. See section 7.3.
  (?<line_terminator>#{LINE_TERMINATORS})?
  # Line and block comments. See section 7.4.
  (?<line_comment>//)?
  (?<block_comment>/\*)?
  # Single- and double-quoted string literals. See section 7.8.4.
  (?<single_quoted_string>')?
  (?<double_quoted_string>")?
  # Numeric literals. See section 7.8.3.
  (?<number>\.?[0-9])?
  # RegExp literals. See section 7.8.5. This capture may also match the
  # `DivPunctuator` production.
  (?:(?<pattern>/)[^=])?
  # Punctuators. See section 7.7.
  (?<punctuator>\>>>=|===|!==|>>>|<<=|>>=|<=|>=|==|!=|\+\+|--|<<|>>|&&|
    \|\||\+=|-=|\*=|%=|&=|\|=|\^=|/=|\{|\}|\(|\)|\[|\]|\.|;|,|<|>|\+|-|
    \*|%|\||&|\||\^|!|~|\?|:|=|/)?
)x
- LITERALS =
Internal: The `true`, `false`, and `null` literals, as well as the `undefined` value. The lexer marks these four values as primitives.
%w( undefined null true false )
- STRINGS =
Internal: A `Hash` that contains the quote character, token kind, and the unterminated string and invalid line continuation error messages for single- and double-quoted string tokens.
%w( single ' double " ).each_slice(2).with_object({}) do |(kind, quote), value|
  value[kind.to_sym] = {
    :quote => quote,
    :kind => "#{kind}_quoted_string".to_sym,
    :unterminated_string_error => "Unterminated #{kind}-quoted string literal.",
    :invalid_continuation_error => "Unescaped line terminators are not permitted within #{kind}-quoted string literals."
  }
end
Instance Attribute Summary
- #column ⇒ Object (readonly)
  Public: Gets the current column.
- #line ⇒ Object (readonly)
  Public: Gets the current line.
- #source ⇒ Object (readonly)
  Public: Gets the source string.
Instance Method Summary
- #eof? ⇒ Boolean
  Public: Returns `true` if the lexer has reached the end of the source string.
- #initialize(source) ⇒ Lexer (constructor)
  Public: Creates a new `Lexer` with a source string.
- #insert_before(token, original) ⇒ Object
  Public: Inserts a new token into the token stream, before a reference token.
- #lex(pattern = true) ⇒ Object
  Public: Lexes a token.
- #lex_block_comment ⇒ Object
  Internal: Lexes a block comment at the current scan position.
- #lex_identifier ⇒ Object
  Internal: Lexes an identifier at the current scan position.
- #lex_line_comment ⇒ Object
  Internal: Lexes a line comment at the current scan position.
- #lex_line_terminator ⇒ Object
  Internal: Lexes a line terminator at the current scan position: either a line feed, carriage return, line separator, or paragraph separator.
- #lex_number ⇒ Object
  Internal: Lexes a decimal or hexadecimal numeric value.
- #lex_pattern ⇒ Object
  Internal: Lexes a regular expression literal.
- #lex_string(style) ⇒ Object
  Internal: Lexes a single- or double-quoted string primitive at the current scan position.
- #lex_whitespace ⇒ Object
  Internal: Lexes a whitespace token at the current scan position.
- #match_decimal? ⇒ Integer
  Internal: Returns the maximum number of characters, relative to the current scan pointer, that may be parsed as valid decimal characters.
- #match_identifier?(lex_as_fragment = false) ⇒ Integer
  Internal: Returns the maximum number of characters, relative to the current scan pointer, that may be parsed as valid identifier characters.
- #reset! ⇒ Object
  Public: Resets the lexer to its original position and clears the token stream.
- #tokens(*patterns) ⇒ Object
  Public: Produces a complete token stream from the source.
Constructor Details
#initialize(source) ⇒ Lexer
Public: Creates a new `Lexer` with a source string.
source - The source `String`.
# File 'lib/violet/lexer.rb', line 82

def initialize(source)
  @source = source
  # Replace all line terminators with a single line feed, but preserve CRLF
  # line endings.
  @normalized_source = @source.gsub(NORMALIZE_LINE_ENDINGS, ?\n)
  reset!
end
Instance Attribute Details
#column ⇒ Object (readonly)
Public: Gets the current column.
# File 'lib/violet/lexer.rb', line 77

def column
  @column
end
#line ⇒ Object (readonly)
Public: Gets the current line.
# File 'lib/violet/lexer.rb', line 74

def line
  @line
end
#source ⇒ Object (readonly)
Public: Gets the source string.
# File 'lib/violet/lexer.rb', line 71

def source
  @source
end
Instance Method Details
#eof? ⇒ Boolean
Public: Returns `true` if the lexer has reached the end of the source string.
# File 'lib/violet/lexer.rb', line 181

def eof?
  @terminated || @index >= @source.size
end
#insert_before(token, original) ⇒ Object
Public: Inserts a new token into the token stream, before a reference token. If the reference token is the end-of-file mark, the token is appended instead.
token - The `Token` to be inserted into the token stream.
original - The reference `Token` before which the new `Token` is inserted.
Returns the new `Token`.
# File 'lib/violet/lexer.rb', line 122

def insert_before(token, original)
  if original[:name] == Token::Types[:eof]
    token[:index] = @tokens.size
    @tokens << token
  else
    token[:index] = original[:index]
    @tokens[token[:index]] = token
    original[:index] += 1
    @tokens[original[:index]] = original
  end
  token
end
#lex(pattern = true) ⇒ Object
Public: Lexes a token.
pattern - If the token is `/` or `/=`, specifies whether it may be lexed
  as part of a regular expression. If `false`, the token will be lexed as
  a division operator instead (default: true).
Returns the lexed `Token`, or `nil` if the lexer has finished scanning the source.
# File 'lib/violet/lexer.rb', line 193

def lex(pattern = true)
  return if @terminated
  if eof?
    @terminated ||= true
    token = Token.new(self, :eof, @source.size...@source.size)
    return token
  end
  token = TOKEN.match(@source, @index) do |match|
    case
    # Produces a whitespace, line terminator, line comment (`// ...`), or
    # block comment (`/* ... */`) token.
    when match[:whitespace] then lex_whitespace
    when match[:line_terminator] then lex_line_terminator
    when match[:line_comment] then lex_line_comment
    when match[:block_comment] then lex_block_comment
    # Produces a single- or double-quoted string token. A single method is
    # used to produce both kinds of tokens.
    when match[:single_quoted_string] then lex_string :single
    when match[:double_quoted_string] then lex_string :double
    # Produces a hexadecimal or decimal token. Octal numbers produce an
    # error, as they are prohibited in ES 5.
    when match[:number] then lex_number
    # `/` and `/=` may be interpreted as either regular expressions or
    # division operators. The `pattern` argument specifies whether
    # these tokens should be lexed as RegExps or punctuators.
    when pattern && match[:pattern] then lex_pattern
    else
      # The `<pattern>` capture may contain the `/` and `/=` tokens.
      if result = match[:pattern] || match[:punctuator]
        token = Token.new(self, :punctuator, @index...@index += result.size)
        @column += token.size
        token
      else
        # Lex the token as an identifier.
        lex_identifier
      end
    end
  end
  # Record the position of the token in the token stream.
  token[:index] = @tokens.size
  @tokens << token
  token
end
#lex_block_comment ⇒ Object
Internal: Lexes a block comment at the current scan position.
Returns the lexed `Token`.
# File 'lib/violet/lexer.rb', line 282

def lex_block_comment
  start = @index
  # Mark the ending position of the comment.
  stop = @source.index("*/", start)
  if stop
    # Advance the current position past the end of the comment.
    @index = stop + 2
    token = Token.new(self, :block_comment, start...@index)
    token[:isComment] = token[:isWhite] = true
    # Block comments trigger automatic semicolon insertion only if they
    # span multiple lines. The normalized source is used to quickly
    # detect line terminators.
    index = lines = 0
    # Advance the current line.
    lines += 1 while index = @normalized_source[start...@index].index(?\n, index + 1)
    if lines.zero?
      # For single-line block comments, increase the column by the size of
      # the token.
      @column += token[:size]
    else
      # For multiline block comments, record the number of lines comprising
      # the comment and reset the column.
      @line += token[:lines] = lines
      @column = 0
    end
  else
    # Unterminated block comment. If a line terminator is found, the comment
    # is assumed to end immediately before it. Otherwise, the comment is
    # assumed to end two characters after the current scan position.
    stop = @normalized_source.index(?\n, @index)
    @index = stop || @index + 2
    token = Token.new(self, :error, start...@index)
    token[:error] = "Unterminated block comment."
    token[:isComment] = token[:isWhite] = token[:tokenError] = true
    @column += token[:size]
  end
  token
end
#lex_identifier ⇒ Object
Internal: Lexes an identifier at the current scan position. See sections 7.1 and 7.6.
Returns the lexed `Token`.
# File 'lib/violet/lexer.rb', line 651

def lex_identifier
  size = match_identifier?
  if size.zero?
    character = @source[@index]
    token = Token.new(self, :error, @index...@index += 1)
    token[:tokenError] = true
    token[:error] = if character == ?\\
      @source[@index] == ?u ? "Invalid Unicode escape sequence." : "Illegal escape sequence."
    else
      "Invalid token."
    end
  else
    token = Token.new(self, :identifier, @index...@index += size)
    # Mark the token as a primitive if it is in the `Lexer::LITERALS` array.
    token[:isPrimitive] = LITERALS.include? token[:value]
  end
  @column += token[:size]
  token
end
#lex_line_comment ⇒ Object
Internal: Lexes a line comment at the current scan position.
Returns the lexed `Token`.
# File 'lib/violet/lexer.rb', line 272

def lex_line_comment
  @column = @normalized_source.index(?\n, @index) || @source.length
  token = Token.new(self, :line_comment, @index...@index = @column)
  token[:isComment] = token[:isWhite] = true
  token
end
#lex_line_terminator ⇒ Object
Internal: Lexes a line terminator at the current scan position: either a line feed, carriage return, line separator, or paragraph separator. See section 7.3 of the spec.
Returns the lexed `Token`.
# File 'lib/violet/lexer.rb', line 252

def lex_line_terminator
  character = @source[@index]
  stop = @index + 1
  # If the current character is a carriage return and the next character is
  # a line feed, the source string contains CRLF line endings. The `stop`
  # position is advanced one additional character, so that "\r\n" is treated
  # as a single terminator.
  stop += 1 if character == ?\r && @source[stop] == ?\n
  # Advance the current index past the terminator.
  token = Token.new(self, :line_terminator, @index...@index = stop)
  token[:lines] = 1
  token[:isWhite] = true
  @line += 1
  @column = 0
  token
end
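The CRLF special case above can be sketched as a standalone helper (`terminator_width` is our own name, not part of the class):

```ruby
# Standalone sketch of the CRLF check in #lex_line_terminator: a "\r\n"
# pair counts as one terminator, so the scan position advances two
# characters; every other terminator advances it by one.
def terminator_width(source, index)
  source[index] == ?\r && source[index + 1] == ?\n ? 2 : 1
end

terminator_width("a\r\nb", 1) # => 2 (CRLF is a single terminator)
terminator_width("a\nb", 1)   # => 1
```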
#lex_number ⇒ Object
Internal: Lexes a decimal or hexadecimal numeric value. See section 7.8.3.
Returns the lexed `Token`.
# File 'lib/violet/lexer.rb', line 380

def lex_number
  start = @index
  @index += 1
  # If the token begins with a `0x`, parse the remainder as a hexadecimal value.
  if @source[start..@index] =~ /0[xX]/
    position = @index += 1
    # Consume characters until the end of the string or a non-hexdigit
    # character is encountered.
    @index += 1 until eof? || @source[@index] !~ /\h/
    # If no additional characters were consumed, the hex value is invalid.
    if position == @index
      token = Token.new(self, :error, start...@index)
      token[:error] = "Invalid hexdigit value."
      token[:isNumber] = token[:tokenError] = true
    else
      # The value is syntactically sound.
      token = Token.new(self, :hexadecimal_number, start...@index)
      token[:isPrimitive] = token[:isNumber] = true
    end
  else
    # Determine if an octal escape sequence is being parsed (i.e., a leading
    # zero followed by a decimal digit).
    is_octal = @source[start..@index] =~ /0\d/
    # Parse the integral expression before the decimal point.
    unless @source[start] == ?.
      # Consume characters until the end of the string or a non-decimal
      # character is encountered.
      @index += match_decimal?
      # Advance past the decimal point.
      @index += 1 if @source[@index] == ?.
    end
    # Parse the decimal component.
    @index += match_decimal?
    # Parse the exponent.
    if @source[@index] =~ /[eE]/
      # Advance past the sign.
      @index += 1 if @source[@index += 1] =~ /[+-]/
      # Mark the current position and consume decimal digits past the
      # exponential.
      position = @index
      @index += match_decimal?
      # If no additional characters were consumed but an exponent was lexed,
      # the decimal value is invalid.
      if position == @index
        token = Token.new(self, :error, start...@index)
        token[:error] = "Exponents may not be empty."
        token[:tokenError] = true
      end
    end
    unless token
      # Octal literals are invalid in ES 5.
      if is_octal
        token = Token.new(self, :error, start...@index)
        token[:error] = "Invalid octal escape sequence."
        token[:isNumber] = token[:isOctal] = token[:tokenError] = true
      else
        # Syntactically valid decimal value. As with hexdigits, the parser
        # will determine if the lexed value is semantically sound.
        token = Token.new(self, :decimal_number, start...@index)
        token[:isPrimitive] = token[:isNumber] = true
      end
    end
  end
  @column += token[:size]
  token
end
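The branching above can be summarized with a rough standalone classifier. `classify_number` is our own name, and it only approximates the scanner's behavior on complete literals:

```ruby
# Rough classification mirroring the branches of #lex_number: a `0x` prefix
# is hexadecimal (and must contain at least one hex digit), a leading zero
# followed by a digit is an octal literal (invalid in ES 5), and anything
# else falls through to the decimal branch.
def classify_number(literal)
  case literal
  when /\A0[xX]\h+\z/ then :hexadecimal_number
  when /\A0[xX]/      then :error          # `0x` with no hex digits
  when /\A0\d/        then :error          # octal literals are invalid in ES 5
  else :decimal_number
  end
end

classify_number("0xFF")  # => :hexadecimal_number
classify_number("0x")    # => :error
classify_number("017")   # => :error
classify_number("1.5e3") # => :decimal_number
```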
#lex_pattern ⇒ Object
Internal: Lexes a regular expression literal. See section 7.8.5.
Returns the lexed `Token`.
# File 'lib/violet/lexer.rb', line 450

def lex_pattern
  start = @index
  # Maintains a hash of the initial and terminal positions of balanced
  # regular expression characters: grouping parentheses, character class
  # brackets, and quantifier braces.
  balanced = {}
  # Ensures that all capturing groups in the pattern are balanced.
  groups = []
  # A flag that specifies if the regular expression is terminated.
  terminated = false
  # Only the last syntax error is preserved for improperly constructed
  # regular expressions.
  syntax_error = nil
  loop do
    @index += 1
    break if eof?
    # Use the normalized input to quickly detect line terminators.
    case character = @normalized_source[@index]
    when ?\n
      # Line terminators cannot occur within RegExp literals.
      token = Token.new(self, :error, start...@index)
      token[:error] = "Line terminators are not permitted within RegExp literals."
      token[:tokenError] = token[:errorHasContent] = true
      # Avoid emitting a second unterminated RegExp error once lexing is
      # complete.
      terminated = true
      break
    when ?/
      # An unescaped `/` marks the end of the regular expression.
      terminated = true
      break
    when /[?*+]/
      syntax_error = "`?`, `*`, and `+` require a value to repeat."
    when ?^
      # `^` may only occur immediately following `|`, or at the beginning
      # of either the pattern, a capturing group, or a lookahead assertion
      # (`?:`, `?=`, or `?!`). Note that `^` may also negate a character
      # class; however, character classes have different semantics and are
      # lexed separately.
      unless @source[@index - 1] =~ %r{[/|(]} || @source[@index - 3, 3] =~ /\(\?[:!=]/
        syntax_error = "`^` may not occur here."
      end
    when ?$
      # `$` may only occur immediately before `|`, or at the end of either
      # the pattern, a capturing group, or a lookahead assertion.
      unless @source[@index + 1] =~ %r{[/|)]}
        syntax_error = "`$` may not occur here."
      end
    when ?}
      # Interpreters can distinguish between and automatically escape braces
      # not used to delimit quantifiers. Nevertheless, it's considered a bad
      # practice to leave special characters unescaped in RegExps. Both the
      # Violet lexer and the ZeParser tokenizer assume that all unescaped
      # braces delimit quantifiers, and emit errors accordingly.
      syntax_error = "Mismatched `}`."
    else
      # Lex capturing groups.
      if character == ?(
        # Mark the initial position of the capturing group.
        groups << @index - start
      elsif character == ?)
        if groups.empty?
          syntax_error = "Capturing group parentheses must be balanced."
        else
          # Record the initial and terminal positions of the parentheses
          # delimiting the group.
          terminal = @index - start
          balanced[initial = groups.pop] = terminal
          balanced[terminal] = initial
        end
      end
      # Character Classes.
      # ------------------
      if character == ?[
        # Record the initial position of the character class.
        initial = @index - start
        # Characters in character classes are treated literally, so there
        # is no need to escape them. The exceptions are line terminators and
        # unescaped closing brackets, which are not part of the
        # `RegularExpressionClassChar` grammar.
        loop do
          @index += 1
          break if eof? || @normalized_source[@index] == ?\n || @source[@index] == ?]
          if @source[@index] == ?\\
            if @normalized_source[@index + 1] == ?\n
              # Abort lexing if a line terminator is encountered.
              break
            else
              # Skip lexing the subsequent escaped character. This ensures
              # that escaped closing brackets (`\]`) are lexed correctly.
              @index += 1
            end
          end
        end
        if @source[@index] == ?]
          # Record the initial and terminal positions of the brackets
          # delimiting the class.
          terminal = @index - start
          balanced[initial] = terminal
          balanced[terminal] = initial
        else
          token = Token.new(self, :error, start...@index)
          token[:error] = "Character class brackets must be balanced."
          token[:tokenError] = true
          # Avoid emitting an unterminated RegExp error once lexing is
          # complete.
          terminated = true
          break
        end
      # Lex escaped characters. Escape sequences may occur anywhere within
      # the RegExp, and indicate that the following character should be
      # interpreted literally.
      elsif character == ?\\ && @normalized_source[@index + 1] != ?\n
        @index += 1
      end
      # Lookahead Assertions and Quantifiers.
      # -------------------------------------
      if character == ?(
        # Lex a non-capturing group, positive lookahead, or negative lookahead.
        @index += 2 if @source[@index + 1, 2] =~ /\?[:=!]/
      else
        # Lex quantifiers.
        case @source[@index + 1]
        when ??
          # The `?` quantifier matches the preceding character zero or one
          # times.
          @index += 1
        when /[*+]/
          # The `*` quantifier matches the preceding character zero or more
          # times; `+` matches a character one or more times. `*?` and `+?`
          # indicate a non-greedy match.
          @index += 1 if @source[@index += 1] == ??
        when ?{
          # Advance one character and mark the initial position of the
          # quantifier.
          @index += 1
          initial = @index - start
          # The `{n}` quantifier matches the preceding character exactly
          # `n` times. `{n,}` matches at least `n` occurrences of the
          # preceding character. `{n,m}` matches at least `n` and at most
          # `m` occurrences.
          unless @source[@index += 1] =~ /\d/
            syntax_error = "Quantifier curly requires at least one digit before the comma"
          end
          # Lex the `n` value.
          @index += match_decimal?
          # Lex the `m` value, if any, if a comma is specified.
          @index += match_decimal? if @source[@index += 1] == ?,
          # Quantifier braces must be balanced.
          if @source[@index + 1] == ?}
            @index += 1
            terminal = @index - start
            balanced[initial] = terminal
            balanced[terminal] = initial
            # A trailing `?` indicates a non-greedy match.
            @index += 1 if @source[@index + 1] == ??
          else
            syntax_error = "Quantifier curly requires to be closed"
          end
        end
      end
    end
  end
  # Construct the token.
  # --------------------
  unless terminated
    token = Token.new(self, :error, start...@index)
    token[:error] = "Unterminated RegExp literal."
    token[:tokenError] = true
  else
    # Advance one character and lex the regular expression flags, if any,
    # as an identifier fragment (the grammar for `RegularExpressionFlags`
    # is that of `IdentifierPart`).
    @index += 1
    @index += match_identifier? :fragment
    if !groups.empty?
      # If the `groups` list is not empty, at least one set of capturing
      # group parentheses was not balanced.
      token = Token.new(self, :error, start...@index)
      token[:tokenError] = true
      token[:error] = "Mismatched `(` or `)`."
    elsif syntax_error
      # Add the last syntax error to the stack.
      token = Token.new(self, :error, start...@index)
      token[:tokenError] = token[:errorHasContent] = true
      token[:error] = syntax_error
    else
      token = Token.new(self, :pattern, start...@index)
      token[:isPrimitive] = true
      token[:pairs] = balanced
    end
  end
  @column += @index - start
  token
end
#lex_string(style) ⇒ Object
Internal: Lexes a single- or double-quoted string primitive at the current scan position.
style - A `Symbol` that specifies the quoting style. The quoting style
  must be defined as a key in the `Lexer::STRINGS` hash.
Returns the lexed `Token`. Raises `KeyError` if the quoting style is not defined in the hash.
# File 'lib/violet/lexer.rb', line 329

def lex_string(style)
  style = STRINGS.fetch(style)
  start = @index
  lines = 0
  loop do
    # Parse escape sequences in strings.
    until eof? || @source[@index += 1] != ?\\
      # Record the number of new lines if the string contains linefeeds. The
      # shadow input is used to avoid repeatedly normalizing line endings.
      @line += (lines += 1) if @normalized_source[@index + 1] == ?\n
      # Advance to the next character.
      @index += 1
    end
    # If the string contains an unescaped line terminator, it is a syntax
    # error. Some environments permit unescaped new lines in strings;
    # however, the spec disallows them.
    if @source[@index] =~ LINE_TERMINATORS
      token = Token.new(self, :error, start...@index)
      token[:error] = style[:invalid_continuation_error]
      token[:isString] = token[:tokenError] = true
      break
    end
    # Consume escape sequences until either the end of the source or the
    # end-of-string character is reached.
    break if eof? || @source[@index] == style[:quote]
  end
  # If the end of the source is reached without consuming the end-of-string
  # character, the source contains an unterminated string literal.
  if @source[@index] == style[:quote]
    # Advance the index past the end-of-string character.
    @index += 1
    token = Token.new(self, style[:kind], start...@index)
    token[:isPrimitive] = token[:isString] = true
    # Update the line and column entries accordingly.
    if lines.zero?
      @column += token[:size]
    else
      token[:lines] = lines
      @column = 0
    end
  else
    token = Token.new(self, :error, start...@index)
    token[:error] = style[:unterminated_string_error]
    token[:isString] = token[:tokenError] = true
    @column += token[:size]
  end
  token
end
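The line-continuation rule enforced above can be sketched independently: an unescaped line terminator inside a string body is a syntax error, while an escaped one is a legal continuation. `invalid_continuation?` is a hypothetical helper, not part of the class:

```ruby
LINE_TERMINATORS = /[\n\r\u2028\u2029]/ # as documented above

# Hypothetical helper: does a string body contain an unescaped line
# terminator? Backslash-escaped pairs are stripped first (the /m flag lets
# `.` match a newline), so "\\\n" survives as a legal line continuation.
def invalid_continuation?(body)
  body.gsub(/\\./m, "") =~ LINE_TERMINATORS ? true : false
end

invalid_continuation?("ab\ncd")   # => true  (bare newline: syntax error)
invalid_continuation?("ab\\\ncd") # => false (escaped: legal continuation)
```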
#lex_whitespace ⇒ Object
Internal: Lexes a whitespace token at the current scan position.
Returns the lexed `Token`.
# File 'lib/violet/lexer.rb', line 240

def lex_whitespace
  token = Token.new(self, :whitespace, @index...@index += 1)
  token[:isWhite] = true
  @column += 1
  token
end
#match_decimal? ⇒ Integer
Internal: Returns the maximum number of characters, relative to the current scan pointer, that may be parsed as valid decimal characters. The scan pointer is not advanced.
# File 'lib/violet/lexer.rb', line 173

def match_decimal?
  size = @index
  size += 1 until eof? || @source[size] !~ /\d/
  size - @index
end
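A standalone version of this scan (our own `match_decimal_at` helper, taking explicit arguments instead of reading `@index` and `@source`) behaves as follows:

```ruby
# Count the decimal digits beginning at `index` without advancing any
# scanner state; the return value is a character count, not a boolean.
def match_decimal_at(source, index)
  size = index
  size += 1 while size < source.size && source[size] =~ /\d/
  size - index
end

match_decimal_at("42abc", 0) # => 2
match_decimal_at("abc", 0)   # => 0
```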
#match_identifier?(lex_as_fragment = false) ⇒ Integer
Internal: Returns the maximum number of characters, relative to the current scan pointer, that may be parsed as valid identifier characters. The scan pointer is not advanced.
lex_as_fragment - A boolean that specifies whether the identifier may be
  lexed as a fragment. Certain productions allow identifier fragments,
  while others require that the identifier begin with a subset of valid
  fragment characters (default: false).
# File 'lib/violet/lexer.rb', line 143

def match_identifier?(lex_as_fragment = false)
  size = @index
  # Identifier starting characters are restricted to a subset of valid
  # identifier fragment characters.
  until eof?
    # Unicode escape sequences may occur anywhere within an identifier.
    if /^\\u\h{4}$/ =~ @source[size, 6]
      # Advance the scan pointer past the Unicode escape sequence.
      size += 6
    else
      character = @source[size]
      if lex_as_fragment
        # Use the full `IdentifierPart` production.
        break unless character =~ IDENTIFIER_FRAGMENT
      else
        # The initial character must conform to the more restrictive
        # `IdentifierStart` production.
        break unless character =~ IDENTIFIER_START
        # All subsequent characters may be lexed as identifier fragments.
        lex_as_fragment = true
      end
      size += 1
    end
  end
  size - @index
end
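The Unicode-escape branch above can be exercised with a standalone sketch. `identifier_length` is our own simplification; it mirrors the six-character advance for `\uXXXX` sequences and the start-versus-fragment distinction:

```ruby
# Standalone sketch of #match_identifier?: measure the identifier at the
# start of `source`. A `\uXXXX` escape is six characters long and may
# appear anywhere; otherwise the first character must satisfy the start
# class and later characters the fragment class.
def identifier_length(source)
  start_re    = /[$_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]/
  fragment_re = Regexp.union(start_re, /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200c\u200d]/)
  size = 0
  fragment = false
  while size < source.size
    if /^\\u\h{4}$/ =~ source[size, 6]
      size += 6
    else
      break unless source[size] =~ (fragment ? fragment_re : start_re)
      fragment = true
      size += 1
    end
  end
  size
end

identifier_length('\u0041bc') # => 8 (the escape counts as six characters)
identifier_length("9bc")      # => 0 (may not start with a digit)
```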
#reset! ⇒ Object
Public: Resets the lexer to its original position and clears the token stream.
# File 'lib/violet/lexer.rb', line 92

def reset!
  @index = @line = @column = 0
  @terminated = false
  (@tokens ||= []).clear
end
#tokens(*patterns) ⇒ Object
Public: Produces a complete token stream from the source. This method resets the lexer prior to lexing the source string.
patterns - Zero or more boolean arguments that correspond to each lexed
  token and specify if the `/` and `/=` tokens may be interpreted as
  regular expressions (`true`) or division operators (`false`). This
  flag only applies to division and regular expression tokens; setting
  it for other tokens has no effect.
# File 'lib/violet/lexer.rb', line 106

def tokens(*patterns)
  reset!
  index = -1
  # Lex tokens until the end-of-file mark is reached.
  loop { break unless lex patterns[index += 1] }
  @tokens
end