Class: Violet::Lexer

Inherits:
Object
Defined in:
lib/violet/lexer.rb

Overview

Public: Lexes a JavaScript source string.

Constant Summary

LINE_TERMINATORS =

Public: Matches line terminators: line feeds, carriage returns, line separators, and paragraph separators. See section 7.3 of the ES 5.1 spec.

/[\n\r\u2028\u2029]/
NORMALIZE_LINE_ENDINGS =

Public: Matches line separators, paragraph separators, and carriage returns not followed by line feeds. Used to convert all line terminators to line feeds. CRLF line endings are preserved.

/[\u2028\u2029]|(?:\r[^\n])/
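The normalization is easy to see in isolation. A minimal sketch restating the constant above:

```ruby
# Line and paragraph separators collapse to a line feed; CRLF pairs are
# preserved because the `[^\n]` guard fails when a line feed follows the CR.
NORMALIZE_LINE_ENDINGS = /[\u2028\u2029]|(?:\r[^\n])/

normalized = "a\u2028b\r\nc".gsub(NORMALIZE_LINE_ENDINGS, "\n")
normalized # => "a\nb\r\nc"
```

Note that the `\r[^\n]` branch consumes the character after a lone carriage return along with the CR itself; a lookahead (`\r(?!\n)`) would match only the CR.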
IDENTIFIER_START =

Public: Matches Unicode letters, `$`, `_`, and Unicode escape sequences. See section 7.6.

/[$_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]/
IDENTIFIER_FRAGMENT =

Public: Matches identifier starting characters, Unicode combining marks, Unicode digits, Unicode connector punctuators, zero-width non-joiners, and zero-width joiners. See section 7.1.

Regexp.union(IDENTIFIER_START, /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200c\u200d]/)
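The split between the two character classes mirrors the spec's `IdentifierStart`/`IdentifierPart` distinction: digits, combining marks, and joiners may continue an identifier but not begin one. A small sketch restating the two constants:

```ruby
IDENTIFIER_START    = /[$_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]/
IDENTIFIER_FRAGMENT = Regexp.union(IDENTIFIER_START, /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200c\u200d]/)

"$" =~ IDENTIFIER_START     # => 0   ($ may begin an identifier)
"9" =~ IDENTIFIER_START     # => nil (a digit may not)
"9" =~ IDENTIFIER_FRAGMENT  # => 0   (but a digit may continue one)
```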
TOKEN =

Public: Matches an ECMAScript token. This is a superset of the `Token` production defined in section 7.5 of the spec.

%r(
  ## Whitespace characters: tab, vertical tab, form feed, space,
  # non-breaking space, byte-order mark, and other Unicode space separators
  # (Category Z). The space and non-breaking space characters are matched by
  # the \p{Z} Unicode category class. See section 7.2 of the ES spec.
  (?<whitespace>[\t\v\f\ufeff\uffff\p{Z}])?
  # Line terminators. See section 7.3.
  (?<line_terminator>#{LINE_TERMINATORS})?
  # Line and block comments. See section 7.4.
  (?<line_comment>//)?
  (?<block_comment>/\*)?
  # Single- and double-quoted string literals. See section 7.8.4.
  (?<single_quoted_string>')?
  (?<double_quoted_string>")?
  # Numeric literals. See section 7.8.3.
  (?<number>\.?[0-9])?
  # RegExp literals. See section 7.8.5. This capture may also match the
  # `DivPunctuator` production.
  (?:(?<pattern>/)[^=])?
  # Punctuators. See section 7.7.
  (?<punctuator>\>>>=|===|!==|>>>|<<=|>>=|<=|>=|==|!=|\+\+|--|<<|>>|&&|
    \|\||\+=|-=|\*=|%=|&=|\|=|\^=|/=|\{|\}|\(|\)|\[|\]|\.|;|,|<|>|\+|-|
    \*|%|\||&|\||\^|!|~|\?|:|=|/)?
)x
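Because every capture in `TOKEN` is optional, a single `match` classifies the character at the scan position: the first non-`nil` named capture determines which `lex_*` method handles the token. A trimmed-down sketch with just three of the captures (not the full regexp):

```ruby
# Each branch is optional; matching never fails, and the lexer inspects
# which named group participated in the match.
token = %r{
  (?<line_comment>//)?
  (?<number>\.?[0-9])?
  (?<single_quoted_string>')?
}x

m = token.match("42;")
m[:number]        # => "4" -- dispatch to the numeric lexer
m[:line_comment]  # => nil
```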
LITERALS =

Internal: The `true`, `false`, and `null` literals, as well as the `undefined` value. The lexer marks these four values as primitives.

%w( undefined null true false )
STRINGS =

Internal: A `Hash` that contains the quote character, token kind, and the unterminated string and invalid line continuation error messages for single- and double-quoted string tokens.

%w( single ' double " ).each_slice(2).with_object({}) do |(kind, quote), value|
  value[kind.to_sym] = {
    :quote => quote,
    :kind => "#{kind}_quoted_string".to_sym,
    :unterminated_string_error => "Unterminated #{kind}-quoted string literal.",
    :invalid_continuation_error => "Unescaped line terminators are not permitted within #{kind}-quoted string literals."
  }
end
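The `each_slice(2)` call pairs each kind with its quote character, and `with_object` accumulates the result into a hash keyed by `:single` and `:double`. The construction can be verified in isolation (error-message keys omitted for brevity):

```ruby
strings = %w( single ' double " ).each_slice(2).with_object({}) do |(kind, quote), value|
  value[kind.to_sym] = {
    :quote => quote,
    :kind  => "#{kind}_quoted_string".to_sym
  }
end

strings[:single][:quote] # => "'"
strings[:double][:kind]  # => :double_quoted_string
```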

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(source) ⇒ Lexer

Public: Creates a new `Lexer` with a source string.

source - The source `String`.



# File 'lib/violet/lexer.rb', line 82

def initialize(source)
  @source = source
  # Replace all line terminators with a single line feed, but preserve CRLF
  # line endings.
  @normalized_source = @source.gsub(NORMALIZE_LINE_ENDINGS, ?\n)
  reset!
end

Instance Attribute Details

#column ⇒ Object (readonly)

Public: Gets the current column.



# File 'lib/violet/lexer.rb', line 77

def column
  @column
end

#line ⇒ Object (readonly)

Public: Gets the current line.



# File 'lib/violet/lexer.rb', line 74

def line
  @line
end

#source ⇒ Object (readonly)

Public: Gets the source string.



# File 'lib/violet/lexer.rb', line 71

def source
  @source
end

Instance Method Details

#eof? ⇒ Boolean

Public: Returns `true` if the lexer has reached the end of the source string.

Returns:

  • (Boolean)


# File 'lib/violet/lexer.rb', line 181

def eof?
  @terminated || @index >= @source.size
end

#insert_before(token, original) ⇒ Object

Public: Inserts a new token into the token stream, before a reference token. If the reference token is the end-of-file mark, the token is appended instead.

token    - The `Token` to be inserted into the token stream.
original - The reference `Token` before which the new `Token` is inserted.

Returns the new `Token`.



# File 'lib/violet/lexer.rb', line 122

def insert_before(token, original)
  if original[:name] == Token::Types[:eof]
    token[:index] = @tokens.size
    @tokens << token
  else
    token[:index] = original[:index]
    @tokens[token[:index]] = token
    original[:index] += 1
    @tokens[original[:index]] = original
  end
  token
end

#lex(pattern = true) ⇒ Object

Public: Lexes a token.

pattern - If the token is `/` or `/=`, specifies whether it may be lexed
          as part of a regular expression. If `false`, the token will be
          lexed as a division operator instead (default: true).

Returns the lexed `Token`, or `nil` if the lexer has finished scanning the source.



# File 'lib/violet/lexer.rb', line 193

def lex(pattern = true)
  return if @terminated
  if eof?
    @terminated ||= true
    token = Token.new(self, :eof, @source.size...@source.size)
    return token
  end
  token = TOKEN.match(@source, @index) do |match|
    case
    # Produces a whitespace, line terminator, line comment (`// ...`), or
    # block comment (`/* ... */`) token.
    when match[:whitespace] then lex_whitespace
    when match[:line_terminator] then lex_line_terminator
    when match[:line_comment] then lex_line_comment
    when match[:block_comment] then lex_block_comment
    # Produces a single- or double-quoted string token. A single method is
    # used to produce both kinds of tokens.
    when match[:single_quoted_string] then lex_string :single
    when match[:double_quoted_string] then lex_string :double
    # Produces a hexadecimal or decimal token. Octal numbers produce an
    # error, as they are prohibited in ES 5.
    when match[:number] then lex_number
    # `/` and `/=` may be interpreted as either regular expressions or
    # division operators. The `pattern` argument specifies whether
    # these tokens should be lexed as RegExps or punctuators.
    when pattern && match[:pattern] then lex_pattern
    else
      # The `<pattern>` capture may contain the `/` and `/=` tokens.
      if result = match[:pattern] || match[:punctuator]
        token = Token.new(self, :punctuator, @index...@index += result.size)
        @column += token.size
        token
      else
        # Lex the token as an identifier.
        lex_identifier
      end
    end
  end
  # Record the position of the token in the token stream.
  token[:index] = @tokens.size
  @tokens << token
  token
end

#lex_block_comment ⇒ Object

Internal: Lexes a block comment at the current scan position.

Returns the lexed `Token`.



# File 'lib/violet/lexer.rb', line 282

def lex_block_comment
  start = @index
  # Mark the ending position of the comment.
  stop = @source.index("*/", start)
  if stop
    # Advance the current position past the end of the comment.
    @index = stop + 2
    token = Token.new(self, :block_comment, start...@index)
    token[:isComment] = token[:isWhite] = true
    # Block comments trigger automatic semicolon insertion only if they
    # span multiple lines. The normalized source is used to quickly
    # detect line terminators.
    index = lines = 0
    # Advance the current line.
    lines += 1 while index = @normalized_source[start...@index].index(?\n, index + 1)
    if lines.zero?
      # For single-line block comments, increase the column by the size of
      # the token.
      @column += token[:size]
    else
      # For multiline block comments, record the number of lines comprising
      # the comment and reset the column.
      @line += token[:lines] = lines
      @column = 0
    end
  else
    # Unterminated block comment. If a line terminator is found, the comment
    # is assumed to end immediately before it. Otherwise, the comment is
    # assumed to end two characters after the current scan position.
    stop = @normalized_source.index(?\n, @index)
    @index = stop || @index + 2
    token = Token.new(self, :error, start...@index)
    token[:error] = "Unterminated block comment."
    token[:isComment] = token[:isWhite] = token[:tokenError] = true
    @column += token[:size]
  end
  token
end

#lex_identifier ⇒ Object

Internal: Lexes an identifier at the current scan position. See sections 7.1 and 7.6.

Returns the lexed `Token`.



# File 'lib/violet/lexer.rb', line 651

def lex_identifier
  size = match_identifier?
  if size.zero?
    character = @source[@index]
    token = Token.new(self, :error, @index...@index += 1)
    token[:tokenError] = true
    token[:error] = if character == ?\\
      @source[@index] == ?u ? "Invalid Unicode escape sequence." : "Illegal escape sequence."
    else
      "Invalid token."
    end
  else
    token = Token.new(self, :identifier, @index...@index += size)
    # Mark the token as a primitive if it is in the `Lexer::LITERALS` array.
    token[:isPrimitive] = LITERALS.include? token[:value]
  end
  @column += token[:size]
  token
end

#lex_line_comment ⇒ Object

Internal: Lexes a line comment at the current scan position.

Returns the lexed `Token`.



# File 'lib/violet/lexer.rb', line 272

def lex_line_comment
  @column = @normalized_source.index(?\n, @index) || @source.length
  token = Token.new(self, :line_comment, @index...@index = @column)
  token[:isComment] = token[:isWhite] = true
  token
end

#lex_line_terminator ⇒ Object

Internal: Lexes a line terminator at the current scan position: either a line feed, carriage return, line separator, or paragraph separator. See section 7.3 of the spec.

Returns the lexed `Token`.



# File 'lib/violet/lexer.rb', line 252

def lex_line_terminator
  character = @source[@index]
  stop = @index + 1
  # If the current character is a carriage return and the next character is
  # a line feed, the source string contains CRLF line endings. The `stop`
  # position is advanced one additional character, so that "\r\n" is treated
  # as a single terminator.
  stop += 1 if character == ?\r && @source[stop] == ?\n
  # Advance the current index past the terminator.
  token = Token.new(self, :line_terminator, @index...@index = stop)
  token[:lines] = 1
  token[:isWhite] = true
  @line += 1
  @column = 0
  token
end
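The CRLF pairing is the only subtle step, and it can be exercised standalone (`terminator_end` is a hypothetical helper written for illustration, not part of the lexer):

```ruby
# Given a source string and an index pointing at a line terminator,
# return the index just past it; "\r\n" counts as a single terminator.
def terminator_end(source, index)
  stop = index + 1
  stop += 1 if source[index] == ?\r && source[stop] == ?\n
  stop
end

terminator_end("a\r\nb", 1) # => 3 (CRLF consumed as one terminator)
terminator_end("a\nb", 1)   # => 2
```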

#lex_number ⇒ Object

Internal: Lexes a decimal or hexadecimal numeric value. See section 7.8.3.

Returns the lexed `Token`.



# File 'lib/violet/lexer.rb', line 380

def lex_number
  start = @index
  @index += 1
  # If the token begins with a `0x`, parse the remainder as a hexadecimal value.
  if @source[start..@index] =~ /0[xX]/
    position = @index += 1
    # Consume characters until the end of the string or a non-hexdigit
    # character is encountered.
    @index += 1 until eof? || @source[@index] !~ /\h/
    # If no additional characters were consumed, the hex value is invalid.
    if position == @index
      token = Token.new(self, :error, start...@index)
      token[:error] = "Invalid hexdigit value."
      token[:isNumber] = token[:tokenError] = true
    else
      # The value is syntactically sound.
      token = Token.new(self, :hexadecimal_number, start...@index)
      token[:isPrimitive] = token[:isNumber] = true
    end
  else
    # Determine if an octal escape sequence is being parsed (i.e., a leading
    # zero followed by a decimal digit).
    is_octal = @source[start..@index] =~ /0\d/
    # Parse the integral expression before the decimal point.
    unless @source[start] == ?.
      # Consume characters until the end of the string or a non-decimal
      # character is encountered.
      @index += match_decimal?
      # Advance past the decimal point.
      @index += 1 if @source[@index] == ?.
    end
    # Parse the decimal component.
    @index += match_decimal?
    # Parse the exponent.
    if @source[@index] =~ /[eE]/
      # Advance past the sign.
      @index += 1 if @source[@index += 1] =~ /[+-]/
      # Mark the current position and consume decimal digits past the
      # exponential.
      position = @index
      @index += match_decimal?
      # If no additional characters were consumed but an exponent was lexed,
      # the decimal value is invalid.
      if position == @index
        token = Token.new(self, :error, start...@index)
        token[:error] = "Exponents may not be empty."
        token[:tokenError] = true
      end
    end
    unless token
      # Octal literals are invalid in ES 5.
      if is_octal
        token = Token.new(self, :error, start...@index)
        token[:error] = "Invalid octal escape sequence."
        token[:isNumber] = token[:isOctal] = token[:tokenError] = true
      else
        # Syntactically valid decimal value. As with hexdigits, the parser
        # will determine if the lexed value is semantically sound.
        token = Token.new(self, :decimal_number, start...@index)
        token[:isPrimitive] = token[:isNumber] = true
      end
    end
  end
  @column += token[:size]
  token
end

#lex_pattern ⇒ Object

Internal: Lexes a regular expression literal. See section 7.8.5.

Returns the lexed `Token`.



# File 'lib/violet/lexer.rb', line 450

def lex_pattern
  start = @index
  # Maintains a hash of the initial and terminal positions of balanced
  # regular expression characters: grouping parentheses, character class
  # brackets, and quantifier braces.
  balanced = {}
  # Ensures that all capturing groups in the pattern are balanced.
  groups = []
  # A flag that specifies if the regular expression is terminated.
  terminated = false
  # Only the last syntax error is preserved for improperly constructed
  # regular expressions.
  syntax_error = nil
  loop do
    @index += 1
    break if eof?
    # Use the normalized input to quickly detect line terminators.
    case character = @normalized_source[@index]
    when ?\n
      # Line terminators cannot occur within RegExp literals.
      token = Token.new(self, :error, start...@index)
      token[:error] = "Line terminators are not permitted within RegExp literals."
      token[:tokenError] = token[:errorHasContent] = true
      # Avoid emitting a second unterminated RegExp error once lexing is
      # complete.
      terminated = true
      break
    when ?/
      # An unescaped `/` marks the end of the regular expression.
      terminated = true
      break
    when /[?*+]/
      syntax_error = "`?`, `*`, and `+` require a value to repeat."
    when ?^
      # `^` may only occur immediately following `|`, or at the beginning
      # of either the pattern, a capturing group, or a lookahead assertion
      # (`?:`, `?=`, or `?!`). Note that `^` may also negate a character
      # class; however, character classes have different semantics and are
      # lexed separately.
      unless @source[@index - 1] =~ %r{[/|(]} || @source[@index - 3, 3] =~ /\(\?[:!=]/
        syntax_error = "`^` may not occur here."
      end
    when ?$
      # `$` may only occur immediately before `|`, or at the end of either
      # the pattern, a capturing group, or a lookahead assertion.
      unless @source[@index + 1] =~ %r{[/|)]}
        syntax_error = "`$` may not occur here."
      end
    when ?}
      # Interpreters can distinguish between and automatically escape braces
      # not used to delimit quantifiers. Nevertheless, it's considered a bad
      # practice to leave special characters unescaped in RegExps. Both the
      # Violet lexer and the ZeParser tokenizer assume that all unescaped
      # braces delimit quantifiers, and emit errors accordingly.
      syntax_error = "Mismatched `}`."
    else
      # Lex capturing groups.
      if character == ?(
        # Mark the initial position of the capturing group.
        groups << @index - start
      elsif character == ?)
        if groups.empty?
          syntax_error = "Capturing group parentheses must be balanced."
        else
          # Record the initial and terminal positions of the parentheses delimiting the group.
          terminal = @index - start
          balanced[initial = groups.pop] = terminal
          balanced[terminal] = initial
        end
      end

      # Character Classes.
      # ------------------
      if character == ?[
        # Record the initial position of the character class.
        initial = @index - start
        # Characters in character classes are treated literally, so there
        # is no need to escape them. The exceptions are line terminators and
        # unescaped closing brackets, which are not part of the
        # `RegularExpressionClassChar` grammar.
        loop do
          @index += 1
          break if eof? || @normalized_source[@index] == ?\n || @source[@index] == ?]
          if @source[@index] == ?\\
            if @normalized_source[@index + 1] == ?\n
              # Abort lexing if a line terminator is encountered.
              break
            else
              # Skip lexing the subsequent escaped character. This ensures
              # that escaped closing brackets (`\]`) are lexed correctly.
              @index += 1
            end
          end
        end
        if @source[@index] == ?]
          # Record the initial and terminal positions of the brackets
          # delimiting the class.
          terminal = @index - start
          balanced[initial] = terminal
          balanced[terminal] = initial
        else
          token = Token.new(self, :error, start...@index)
          token[:error] = "Character class brackets must be balanced."
          token[:tokenError] = true
          # Avoid emitting an unterminated RegExp error once lexing is
          # complete.
          terminated = true
          break
        end
      # Lex escaped characters. Escape sequences may occur anywhere within
      # the RegExp, and indicate that the following character should be
      # interpreted literally.
      elsif character == ?\\ && @normalized_source[@index + 1] != ?\n
        @index += 1
      end

      # Lookahead Assertions and Quantifiers.
      # -------------------------------------
      if character == ?(
        # Lex a non-capturing group, positive lookahead, or negative lookahead.
        @index += 2 if @source[@index + 1, 2] =~ /\?[:=!]/
      else
        # Lex quantifiers.
        case @source[@index + 1]
        when ??
          # The `?` quantifier matches the preceding character zero or one
          # times.
          @index += 1
        when /[*+]/
          # The `*` quantifier matches the preceding character zero or more
          # times; `+` matches a character one or more times. `*?` and `+?`
          # indicate a non-greedy match.
          @index += 1 if @source[@index += 1] == ??
        when ?{
          # Advance one character and mark the initial position of the
          # quantifier.
          @index += 1
          initial = @index - start
          # The `{n}` quantifier matches the preceding character exactly
          # `n` times. `{n,}` matches at least `n` occurrences of the
          # preceding character. `{n,m}` matches at least `n` and at most
          # `m` occurrences.
          unless @source[@index += 1] =~ /\d/
            syntax_error = "Quantifier curly requires at least one digit before the comma"
          end
          # Lex the `n` value.
          @index += match_decimal?
          # Lex the `m` value, if any, if a comma is specified.
          @index += match_decimal? if @source[@index += 1] == ?,
          # Quantifier braces must be balanced.
          if @source[@index + 1] == ?}
            @index += 1
            terminal = @index - start
            balanced[initial] = terminal
            balanced[terminal] = initial
            # A trailing `?` indicates a non-greedy match.
            @index += 1 if @source[@index + 1] == ??
          else
            syntax_error = "Quantifier curly requires to be closed"
          end
        end
      end
    end
  end

  # Construct the token.
  # --------------------
  unless terminated
    token = Token.new(self, :error, start...@index)
    token[:error] = "Unterminated RegExp literal."
    token[:tokenError] = true
  else
    # Advance one character and lex the regular expression flags, if any,
    # as an identifier fragment (the grammar for `RegularExpressionFlags`
    # is that of `IdentifierPart`).
    @index += 1
    @index += match_identifier? :fragment
    if !groups.empty?
      # If the `groups` list is not empty, at least one set of capturing
      # group parentheses was not balanced.
      token = Token.new(self, :error, start...@index)
      token[:tokenError] = true
      token[:error] = "Mismatched `(` or `)`."
    elsif syntax_error
      # Add the last syntax error to the stack.
      token = Token.new(self, :error, start...@index)
      token[:tokenError] = token[:errorHasContent] = true
      token[:error] = syntax_error
    else
      token = Token.new(self, :pattern, start...@index)
      token[:isPrimitive] = true
      token[:pairs] = balanced
    end
  end
  @column += @index - start
  token
end

#lex_string(style) ⇒ Object

Internal: Lexes a single- or double-quoted string primitive at the current scan position.

style - A `Symbol` that specifies the quoting style. The quoting style
        must be defined as a key in the `Lexer::STRINGS` hash.

Returns the lexed `Token`. Raises `KeyError` if the quoting style is not defined in the hash.



# File 'lib/violet/lexer.rb', line 329

def lex_string(style)
  style = STRINGS.fetch(style)
  start = @index
  lines = 0
  loop do
    # Parse escape sequences in strings.
    until eof? || @source[@index += 1] != ?\\
      # Record the number of new lines if the string contains linefeeds. The shadow input is
      # used to avoid repeatedly normalizing line endings.
      @line += (lines += 1) if @normalized_source[@index + 1] == ?\n
      # Advance to the next character.
      @index += 1
    end
    # If the string contains an unescaped line terminator, it is a syntax error. Some
    # environments permit unescaped new lines in strings; however, the spec disallows them.
    if @source[@index] =~ LINE_TERMINATORS
      token = Token.new(self, :error, start...@index)
      token[:error] = style[:invalid_continuation_error]
      token[:isString] = token[:tokenError] = true
      break
    end
    # Consume escape sequences until either the end of the source or the end-of-string character
    # is reached.
    break if eof? || @source[@index] == style[:quote]
  end
  # If the end of the source is reached without consuming the end-of-string character, the
  # source contains an unterminated string literal.
  if @source[@index] == style[:quote]
    # Advance the index past the end-of-string character.
    @index += 1
    token = Token.new(self, style[:kind], start...@index)
    token[:isPrimitive] = token[:isString] = true
    # Update the line and column entries accordingly.
    if lines.zero?
      @column += token[:size]
    else
      token[:lines] = lines
      @column = 0
    end
  else
    token = Token.new(self, :error, start...@index)
    token[:error] = style[:unterminated_string_error]
    token[:isString] = token[:tokenError] = true
    @column += token[:size]
  end
  token
end

#lex_whitespace ⇒ Object

Internal: Lexes a whitespace token at the current scan position.

Returns the lexed `Token`.



# File 'lib/violet/lexer.rb', line 240

def lex_whitespace
  token = Token.new(self, :whitespace, @index...@index += 1)
  token[:isWhite] = true
  @column += 1
  token
end

#match_decimal? ⇒ Integer

Internal: Returns the maximum number of characters, relative to the current scan pointer, that may be parsed as valid decimal characters. The scan pointer is not advanced.

Returns:

  • (Integer)


# File 'lib/violet/lexer.rb', line 173

def match_decimal?
  size = @index
  size += 1 until eof? || @source[size] !~ /\d/
  size - @index
end
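The counting idiom is easiest to see outside the class (`match_decimal` below is a hypothetical standalone version; the real method reads `@index` and `@source` instead of taking parameters):

```ruby
# Count consecutive decimal digits starting at `index`, without
# advancing the caller's scan position.
def match_decimal(source, index)
  size = index
  size += 1 until size >= source.size || source[size] !~ /\d/
  size - index
end

match_decimal("123abc", 0) # => 3
match_decimal("abc", 0)    # => 0
```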

#match_identifier?(lex_as_fragment = false) ⇒ Integer

Internal: Returns the maximum number of characters, relative to the current scan pointer, that may be parsed as valid identifier characters. The scan pointer is not advanced.

lex_as_fragment - A boolean that specifies whether the identifier may be
                  lexed as a fragment. Certain productions allow identifier
                  fragments, while others require that the identifier begin
                  with a subset of valid fragment characters (default: false).

Returns:

  • (Integer)


# File 'lib/violet/lexer.rb', line 143

def match_identifier?(lex_as_fragment = false)
  size = @index
  # Identifier starting characters are restricted to a subset of valid
  # identifier fragment characters.
  until eof?
    # Unicode escape sequences may occur anywhere within an identifier.
    if /^\\u\h{4}$/ =~ @source[size, 6]
      # Advance the scan pointer past the Unicode escape sequence.
      size += 6
    else
      character = @source[size]
      if lex_as_fragment
        # Use the full `IdentifierPart` production.
        break unless character =~ IDENTIFIER_FRAGMENT
      else
        # The initial character must conform to the more restrictive
        # `IdentifierStart` production.
        break unless character =~ IDENTIFIER_START
        # All subsequent characters may be lexed as identifier fragments.
        lex_as_fragment = true
      end
      size += 1
    end
  end
  size - @index
end
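The escape-sequence branch works by slicing a six-character window and testing it against the anchored pattern in one shot; a truncated or malformed escape simply fails the match and falls through to the per-character checks. For example:

```ruby
"\\u0041"[0, 6] =~ /^\\u\h{4}$/ # => 0   (a valid six-character \uXXXX escape)
"\\uZZZZ"[0, 6] =~ /^\\u\h{4}$/ # => nil (ZZZZ are not hex digits)
```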

#reset! ⇒ Object

Public: Resets the lexer to its original position and clears the token stream.



# File 'lib/violet/lexer.rb', line 92

def reset!
  @index = @line = @column = 0
  @terminated = false
  (@tokens ||= []).clear
end

#tokens(*patterns) ⇒ Object

Public: Produces a complete token stream from the source. This method resets the lexer prior to lexing the source string.

patterns - Zero or more boolean arguments that correspond to each lexed
           token and specify whether the `/` and `/=` tokens may be
           interpreted as regular expressions (`true`) or division
           operators (`false`). This flag only applies to division and
           regular expression tokens; setting it for other tokens has no
           effect.


# File 'lib/violet/lexer.rb', line 106

def tokens(*patterns)
  reset!
  index = -1
  # Lex tokens until the end-of-file mark is reached.
  loop { break unless lex patterns[index += 1] }
  @tokens
end