Class: SyntaxTree::Parser

Inherits: Ripper

SyntaxTree::Parser < Ripper < Object

Defined in: lib/syntax_tree/parser.rb

Overview

Parser is a subclass of Ripper, the parser library that ships with Ruby. It subscribes to the stream of tokens and nodes that Ripper emits while parsing and builds up a syntax tree from them.
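
As a rough illustration of what subscribing to Ripper's event stream looks like, the sketch below subclasses Ripper from the standard library and overrides a single scanner event handler. `CommentCollector` is a hypothetical class written for this example and is not part of Syntax Tree; the real Parser handles many more events than this.

```ruby
require "ripper"

# Hypothetical example class (not part of Syntax Tree): a Ripper subclass
# that records every comment token it is handed while parsing.
class CommentCollector < Ripper
  attr_reader :comments

  def initialize(source, *)
    super
    @comments = []
  end

  # Ripper calls on_comment once for each comment token in the source.
  # The token text includes the trailing newline, so strip it here.
  def on_comment(value)
    @comments << value.chomp
  end
end

collector = CommentCollector.new("# first\nfoo = 1 # second\n")
collector.parse
collector.comments # => ["# first", "# second"]
```

Overriding `on_*` methods is the extension point Ripper provides, so a subclass only needs to define handlers for the events it cares about.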

Defined Under Namespace

Classes: MultiByteString, ParseError, PinVisitor, Semicolon, SingleByteString, TokenList

Instance Attribute Summary

#comments ⇒ Object readonly

Array[ Comment | EmbDoc ]: the list of comments that have been found while parsing the source.

#line_counts ⇒ Object readonly

Array[ SingleByteString | MultiByteString ]: the list of objects that represent the start of each line in character offsets.

#source ⇒ Object readonly

String: the source being parsed.

#tokens ⇒ Object readonly

Array[ untyped ]: a running list of tokens that have been found in the source.

Instance Method Summary

#initialize(source) ⇒ Parser constructor

A new instance of Parser.

Constructor Details

#initialize(source) ⇒ `Parser`

Returns a new instance of Parser.

# File 'lib/syntax_tree/parser.rb', line 116

def initialize(source, *)
  super

  # We keep the source around so that we can refer back to it when we're
  # generating the AST. Sometimes it's easier to just reference the source
  # string when you want to check if it contains a certain character, for
  # example.
  @source = source

  # This is the full set of comments that have been found by the parser.
  # It's a running list. At the end of every block of statements, they will
  # go in and attempt to grab any comments that are on their own line and
  # turn them into regular statements. So at the end of parsing the only
  # comments left in here will be comments on lines that also contain code.
  @comments = []

  # This is the current embdoc (comments that start with =begin and end with
  # =end). Since they can't be nested, there's no need for a stack here, as
  # there can only be one active. These end up getting dumped into the
  # comments list before getting picked up by the statements that surround
  # them.
  @embdoc = nil

  # This is an optional node that can be present if the __END__ keyword is
  # used in the file. In that case, this will represent the content after
  # that keyword.
  @__end__ = nil

  # Heredocs can actually be nested together if you're using interpolation,
  # so this is a stack of heredoc nodes that are currently being created.
  # When we get to the token that finishes off a heredoc node, we pop the
  # top one off. If there are others surrounding it, then the body events
  # will now be added to the correct nodes.
  @heredocs = []

  # This is a running list of tokens that have fired. It's useful mostly for
  # maintaining location information. For example, if you're inside the
  # handle of a def event, then in order to determine where the AST node
  # started, you need to look backward in the tokens to find a def keyword.
  # Most of the time, when a parser event consumes one of these events, it
  # will be deleted from the list. So ideally, this list stays pretty short
  # over the course of parsing a source string.
  @tokens = TokenList.new

  # Here we're going to build up a list of SingleByteString or
  # MultiByteString objects. They're each going to represent a line in the
  # source. They are used by the `char_pos` method to determine where we are
  # in the source string.
  @line_counts = []
  last_index = 0

  @source.each_line do |line|
    @line_counts << if line.size == line.bytesize
      SingleByteString.new(last_index)
    else
      MultiByteString.new(last_index, line)
    end

    last_index += line.size
  end

  # Make sure line counts is filled out with the first and last line at
  # minimum so that it has something to compare against if the parser is in
  # a lineno=2 state for an empty file.
  @line_counts << SingleByteString.new(0) if @line_counts.empty?
  @line_counts << SingleByteString.new(last_index)
end
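
The line-offset bookkeeping at the end of the constructor can be sketched in isolation. The version below is a simplified stand-in under stated assumptions: `AsciiLine`, `MultiByteLine`, `char_offset`, and `build_line_starts` are hypothetical names invented for this example, and the real SingleByteString and MultiByteString classes may convert byte columns to character offsets differently.

```ruby
# Hypothetical stand-in for a line that contains only single-byte characters.
class AsciiLine
  def initialize(start)
    @start = start
  end

  # Byte columns and character columns coincide on an ASCII-only line.
  def char_offset(byte_column)
    @start + byte_column
  end
end

# Hypothetical stand-in for a line that contains multi-byte characters.
class MultiByteLine
  def initialize(start, line)
    @start = start
    @line = line
  end

  # Count how many characters fit into the first byte_column bytes.
  def char_offset(byte_column)
    @start + @line.byteslice(0, byte_column).size
  end
end

# Mirrors the loop in the constructor: one entry per line, plus a final
# entry marking the end of the source.
def build_line_starts(source)
  starts = []
  last_index = 0

  source.each_line do |line|
    starts << (line.size == line.bytesize ? AsciiLine.new(last_index) : MultiByteLine.new(last_index, line))
    last_index += line.size
  end

  starts << AsciiLine.new(0) if starts.empty?
  starts << AsciiLine.new(last_index)
  starts
end

source = "a = 1\nb = \"\u00E9\" + c\n"
starts = build_line_starts(source)

# On the second line, "c" sits at byte column 11 because the accented
# character takes two bytes in UTF-8, but it is only the 11th character.
source[starts[1].char_offset(11)] # => "c"
```

Comparing `line.size` with `line.bytesize` is a cheap way to detect lines where the two kinds of offset can differ, so the more expensive conversion is only needed for lines that actually contain multi-byte characters.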

Instance Attribute Details

#comments ⇒ `Object` (readonly)

Array[ Comment | EmbDoc ]: the list of comments that have been found while parsing the source.

# File 'lib/syntax_tree/parser.rb', line 114

def comments
  @comments
end

#line_counts ⇒ `Object` (readonly)

Array[ SingleByteString | MultiByteString ]: the list of objects that represent the start of each line in character offsets.

# File 'lib/syntax_tree/parser.rb', line 105

def line_counts
  @line_counts
end

#source ⇒ `Object` (readonly)

String: the source being parsed.

# File 'lib/syntax_tree/parser.rb', line 101

def source
  @source
end

#tokens ⇒ `Object` (readonly)

Array[ untyped ]: a running list of tokens that have been found in the source. This list changes a lot as certain nodes will “consume” these tokens to determine their bounds.

# File 'lib/syntax_tree/parser.rb', line 110

def tokens
  @tokens
end
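
To make the idea of "consuming" tokens concrete, here is a minimal sketch that uses a plain Array. The `Token` struct and the `consume` helper are hypothetical and exist only for this example; they are not the TokenList API, and the real token nodes carry more location data than a single character offset.

```ruby
# Hypothetical token record used only for this illustration.
Token = Struct.new(:type, :value, :char_start)

# Find the most recent token of the given type, remove it from the running
# list, and return it so the caller can read its location.
def consume(tokens, type)
  index = tokens.rindex { |token| token.type == type }
  raise ArgumentError, "no #{type} token found" if index.nil?

  tokens.delete_at(index)
end

# Tokens as they might look after scanning "def foo; end".
tokens = [
  Token.new(:kw, "def", 0),
  Token.new(:ident, "foo", 4),
  Token.new(:kw, "end", 9)
]

ending = consume(tokens, :kw)    # the "end" keyword
beginning = consume(tokens, :kw) # the "def" keyword

# The node spans from the start of "def" to just past the end of "end".
bounds = beginning.char_start...(ending.char_start + ending.value.size)
bounds      # => 0...12
tokens.size # => 1
```

Because each consumed token is deleted, the running list tends to stay short, which keeps the backward search cheap.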