Class: Peggy::Parser

Inherits:
Object
Defined in:
lib/parser.rb

Overview

Packrat parser class. Note that all methods have a trailing exclamation mark (!) or question mark (?), or have long names with underscores (_). This is because productions are methods, and the naming avoids collisions with them. To use this class you must subclass Parser and provide your productions as methods. Your productions must call match? or one of the protected convenience routines to perform parsing. Productions must never call another production directly: the results will not get memoized, which slows the parse considerably and risks infinite recursion (until the stack overflows). As a convenience when writing productions, you can call any of the matching functions several times in a row, passing each returned index to the next call as in a sequence, without checking the result of each one.
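The subclassing pattern described above can be sketched as follows. This is a minimal illustration, not the library's implementation: SketchParser is a hypothetical stand-in for Peggy::Parser that keeps only the memoized dispatch and string matching (it omits ignore handling), the Greeting grammar is invented for the example, and NO_MATCH = false is an assumed value.

```ruby
# Minimal stand-in for Peggy::Parser so the sketch runs on its own.
# NO_MATCH = false is an assumption; the real constant lives in lib/parser.rb.
class SketchParser
  NO_MATCH = false

  attr_accessor :source_text

  def parse?(goal, source, index = 0)
    self.source_text = source
    @parse_results = Hash.new { |h, k| h[k] = {} }
    match?(goal, index)
  end

  # Memoized dispatch: each (index, goal) pair is computed at most once.
  def match?(goal, index)
    return NO_MATCH if index == NO_MATCH
    position = @parse_results[index]
    position.fetch(goal) { position[goal] = send(goal, index) }
  end

  def string?(value, index)
    return NO_MATCH if index == NO_MATCH
    stop = index + value.length
    source_text[index...stop] == value ? stop : NO_MATCH
  end
end

# Hypothetical grammar: greeting <- hello ' ' world
class Greeting < SketchParser
  def hello(index)
    string?('hello', index)
  end

  def world(index)
    string?('world', index)
  end

  # A sequence: each returned index feeds the next call. No intermediate
  # checks are needed because every routine passes NO_MATCH straight through.
  def greeting(index)
    index = match?(:hello, index)
    index = string?(' ', index)
    match?(:world, index)
  end
end

p Greeting.new.parse?(:greeting, 'hello world')   # 11
p Greeting.new.parse?(:greeting, 'goodbye world') # false
```

Note that greeting reaches the other productions through match? rather than calling hello or world directly, so their results are recorded in the memo table.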

Direct Known Subclasses

Builder, PEG

Instance Attribute Details

#debug_flag ⇒ Object

Tells parser to print intermediate results if set.



# File 'lib/parser.rb', line 37

def debug_flag
  @debug_flag
end

#ignore_productions ⇒ Object

The productions to ignore.



# File 'lib/parser.rb', line 47

def ignore_productions
  @ignore_productions
end

#parse_results ⇒ Object (readonly)

The results of the parse: a hash keyed by start index, whose values are hashes mapping production symbols to end indexes.



# File 'lib/parser.rb', line 44

def parse_results
  @parse_results
end
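To make the shape of parse_results concrete, here is a small sketch that builds the same kind of nested hash by hand. The helper name new_parse_results and the entries are hypothetical; a real parse fills the structure through match?, which also records a :found_order array under each index.

```ruby
# Same auto-vivifying hash that parse? builds: looking up a missing
# index creates an empty inner hash for that position.
def new_parse_results
  Hash.new { |outer, index| outer[index] = {} }
end

results = new_parse_results
results[0][:greeting] = 11 # hypothetical: :greeting matched 0...11
results[0][:hello] = 5
results[6][:world] = 11

# Walk the table in source order and print each recorded match.
results.keys.sort.each do |start|
  results[start].each do |production, stop|
    puts "#{production}: #{start}...#{stop}"
  end
end
```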

#source_text ⇒ Object

The source to parse. It can be set prior to calling parse?.



# File 'lib/parser.rb', line 40

def source_text
  @source_text
end

Instance Method Details

#[](range) ⇒ Object

Return a range (or character) of the source_text.



# File 'lib/parser.rb', line 50

def [] range
  raise "source_text not set" if source_text.nil?
  source_text[range]
end

#allow?(goal, index) ⇒ Boolean

Try to match a production from the given index. Returns the end index if found or start index if not found.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 83

def allow? goal, index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  found = match? goal, index
  found == NO_MATCH ? index : found
end

#check?(goal, index) ⇒ Boolean

Try to match a production from the given index then backtrack. Returns index if found or NO_MATCH if not.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 91

def check? goal, index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  found = match? goal, index
  found == NO_MATCH ? NO_MATCH : index
end

#correct_regexp!(re) ⇒ Object

Ensures a regular expression is anchored to the beginning of the string it is matched against, which in practice is the part of source_text that starts at the given index.



# File 'lib/parser.rb', line 185

def correct_regexp! re
  source = re.source
  source[0..1] == '\\A' ? re : Regexp.new("\\A(#{source})", re.options)
end
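The effect of the anchoring can be seen in isolation. The sketch below copies the logic into a free-standing method with a hypothetical name, anchor_regexp, and uses an invented input string; it only illustrates why an unanchored pattern would otherwise match too far ahead.

```ruby
# Standalone copy of the anchoring logic, for illustration only.
def anchor_regexp(re)
  source = re.source
  source[0..1] == '\\A' ? re : Regexp.new("\\A(#{source})", re.options)
end

text = 'let x = 1'
index = 4                # position of "x"
tail = text[index..-1]   # "x = 1"

unanchored = /\d+/
anchored = anchor_regexp(unanchored)

p unanchored.match(tail) # finds the "1" further along the string
p anchored.match(tail)   # nil: there is no digit right at the index

# End index in the full text, computed the same way regexp? does.
p anchor_regexp(/[a-z]+/).match(tail).end(0) + index # 5
```

One consequence of wrapping the pattern in parentheses is that any numbered capture groups in the original expression shift by one.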

#dissallow?(goal, index) ⇒ Boolean

Succeed only if a production does not match from the given index, then backtrack. Returns index if not found or NO_MATCH if found.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 99

def dissallow? goal, index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  found = match? goal, index
  found == NO_MATCH ? index : NO_MATCH
end
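The three predicates allow?, check? and dissallow? differ only in what they return for a given match result, and roughly correspond to the PEG operators for an optional expression, positive lookahead and negative lookahead. The sketch below compares them side by side. It is a simplification under stated assumptions: PredicateSketch is a hypothetical module, its match? only compares a literal word rather than dispatching to a production, and NO_MATCH = false is an assumed value.

```ruby
module PredicateSketch
  NO_MATCH = false # assumed sentinel value, for this sketch only

  # Hypothetical stand-in for match?: succeeds when `text` has `word` at `index`.
  def self.match?(text, word, index)
    stop = index + word.length
    text[index...stop] == word ? stop : NO_MATCH
  end

  # Optional: advance on success, stay put on failure.
  def self.allow?(text, word, index)
    found = match?(text, word, index)
    found == NO_MATCH ? index : found
  end

  # Positive lookahead: succeed without consuming input.
  def self.check?(text, word, index)
    match?(text, word, index) == NO_MATCH ? NO_MATCH : index
  end

  # Negative lookahead: succeed only when the match fails.
  def self.dissallow?(text, word, index)
    match?(text, word, index) == NO_MATCH ? index : NO_MATCH
  end
end

text = 'foobar'
p PredicateSketch.allow?(text, 'foo', 0)     # 3
p PredicateSketch.allow?(text, 'bar', 0)     # 0
p PredicateSketch.check?(text, 'foo', 0)     # 0
p PredicateSketch.check?(text, 'bar', 0)     # false
p PredicateSketch.dissallow?(text, 'foo', 0) # false
p PredicateSketch.dissallow?(text, 'bar', 0) # 0
```

Because 0 is truthy in Ruby, a returned index of 0 still counts as success; only the NO_MATCH sentinel signals failure.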

#eof(index) ⇒ Object

Special production that only matches the end of source_text. Note that this function does not end in (?) or (!) because it is meant to be used as a normal production.



# File 'lib/parser.rb', line 107

def eof index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  index >= source_text.length ? index : NO_MATCH
end

#ignore?(index) ⇒ Boolean

Match tokens that should be ignored. Used by match?(). Returns end index if found or start index if not found. Subclasses should override this method if they wish to ignore other text, such as comments.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 136

def ignore? index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  return index if @ignoring || ignore_productions.nil?
  @ignoring = true
  ignore_productions.each do |prod|
    index = allow? prod, index
  end
  @ignoring = nil
  index
end
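The loop in ignore? tries each ignore production once, in order, and advances past whatever matches. The following sketch mimics that single pass with plain regular expressions instead of productions. The module name IgnoreSketch and the two patterns (whitespace and a '#' line comment) are hypothetical choices for the example.

```ruby
module IgnoreSketch
  # Hypothetical ignore patterns: whitespace, then a '#' line comment.
  PATTERNS = [/\A\s+/, /\A#[^\n]*/].freeze

  # Try each pattern once, in order, advancing past whatever matches.
  # A pattern that does not match leaves the index unchanged.
  def self.skip(text, index)
    PATTERNS.each do |pattern|
      m = pattern.match(text[index..-1])
      index += m.end(0) if m
    end
    index
  end
end

p IgnoreSketch.skip("  # note\nx", 0) # 8: stops at the newline after the comment
```

Since each entry is tried only once per call, the newline after the comment is not skipped here. A subclass that wants to skip arbitrary runs of whitespace and comments would need an ignore production that itself repeats.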

#literal?(value, index) ⇒ Boolean

Match a literal string or regular expression from the given index. Returns the end index if found or NO_MATCH if not found.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 149

def literal? value, index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  case value
  when String
    string? value, index
  when Regexp
    regexp? value, index
  else
    raise "Unknown literal: #{value.inspect}"
  end
end

#match?(goal, index) ⇒ Boolean

Match a production from the given index. Returns the end index if found or NO_MATCH if not found.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 114

def match? goal, index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  index = ignore? index unless @ignoring
  goal = goal.to_sym
  position = parse_results[index]
  found = position.fetch(goal) do
    position[goal] = IN_USE # prevents infinite recursion if the user attempts
                            # a left recursion
    if (result = send goal, index)
      position[:found_order] = [] unless position.has_key?(:found_order)
      position[:found_order] << goal
    end
    position[goal] = result
  end
  puts "found #{goal} at #{index}...#{found} #{source_text[index...found].inspect}" if found && debug_flag
  raise "Parser cannot handle infinite (left) recursions. Please rewrite usage of '#{goal}'." if found == IN_USE
  found
end
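The memoization and the left-recursion guard in match? can be observed with a stripped-down sketch. MemoSketch is a hypothetical class, not part of the library; it omits ignore handling, :found_order bookkeeping and debug output, and it assumes NO_MATCH = false and IN_USE = true. The productions digits and loop_forever are invented for the demonstration.

```ruby
class MemoSketch
  NO_MATCH = false # assumed sentinel value
  IN_USE = true    # assumed marker for a production still being evaluated

  attr_reader :calls

  def initialize(source)
    @source = source
    @memo = Hash.new { |h, k| h[k] = {} }
    @calls = Hash.new(0)
  end

  def match?(goal, index)
    return NO_MATCH if index == NO_MATCH
    position = @memo[index]
    found = position.fetch(goal) do
      position[goal] = IN_USE            # mark as in progress
      position[goal] = send(goal, index) # store the real result
    end
    raise "left recursion in '#{goal}'" if found == IN_USE
    found
  end

  # Hypothetical production: one or more digits.
  def digits(index)
    @calls[:digits] += 1
    m = /\A\d+/.match(@source[index..-1])
    m ? index + m.end(0) : NO_MATCH
  end

  # Hypothetical left-recursive production: re-enters itself at the same index.
  def loop_forever(index)
    match?(:loop_forever, index)
  end
end

parser = MemoSketch.new('123abc')
parser.match?(:digits, 0)
parser.match?(:digits, 0)
p parser.calls[:digits] # 1: the second call was answered from the memo
```

Calling parser.match?(:loop_forever, 0) raises instead of recursing until the stack overflows, because the inner call finds the IN_USE marker already stored for that index.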

#parse?(goal, source = nil, index = 0) ⇒ Boolean

Invokes the parser from the beginning of the source on the given production goal. You should provide the source here, or you can set source_text prior to calling. If index is provided, the parser ignores the characters before it.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 58

def parse? goal, source = nil, index = 0
  self.source_text = source unless source.nil? # a bare `source_text =` would only set a local variable
  # Hash of automatic hashes
  @parse_results = Hash.new {|h1, k1| h1[k1] = {}}
  @keys = nil
  index = match? goal, index
  puts pp(parse_results) if debug_flag
  index
end
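Storing the source argument through the source_text accessor depends on a Ruby detail: an assignment without an explicit receiver creates a local variable, so the writer method is only called when the assignment is written with self. The sketch below shows the difference. SetterSketch and its two methods are hypothetical names used only for this illustration.

```ruby
class SetterSketch
  attr_accessor :source_text

  # Bare assignment: Ruby treats this as a new local variable.
  def set_local(value)
    source_text = value
    source_text # returns the local; the attribute is untouched
  end

  # Explicit receiver: calls the source_text= writer.
  def set_attribute(value)
    self.source_text = value
  end
end

s = SetterSketch.new
s.set_local('abc')
p s.source_text # nil
s.set_attribute('abc')
p s.source_text # "abc"
```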

#query?(*args) ⇒ Boolean

Queries the parse results for a hierarchy of production matches. An array of index ranges is returned, or an empty array if none are found. This can only be called after parse_results have been set by a parse.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 71

def query? *args
  raise "You must first call parse?" unless parse_results
  @keys = @parse_results.keys.sort unless @keys
  found_list = []
  index = 0
  args.each do |arg|
    index = find? arg, index
  end
end

#regexp?(value, index) ⇒ Boolean

Match a regular expression from the given index. Returns the end index if found or NO_MATCH if not found.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 174

def regexp? value, index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  value = correct_regexp! value
  index = ignore? index unless @ignoring
  found = value.match source_text[index..-1]
# puts "#{value.inspect} ~= #{found[0].inspect}" if found
  found ? found.end(0) + index : NO_MATCH
end

#string?(value, index) ⇒ Boolean

Match a string from the given index. Returns the end index if found or NO_MATCH if not found.

Returns:

  • (Boolean)


# File 'lib/parser.rb', line 163

def string? value, index
  return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
  value = value.to_s
  index = ignore? index unless @ignoring
  i2 = index + value.length
# puts source_text[index...i2].inspect + ' ' + value.inspect
  source_text[index...i2] == value ? i2 : NO_MATCH
end
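The index arithmetic in string? uses a half-open range, so a value that would run past the end of the text simply fails to compare equal. The sketch below pairs it with the eof check to show how a sequence can require that the whole input was consumed. StringSketch is a hypothetical module that takes the text as an argument and leaves out ignore handling; NO_MATCH = false is an assumed value.

```ruby
module StringSketch
  NO_MATCH = false # assumed sentinel value

  # Compare `value` against the slice of `text` that starts at `index`.
  def self.string?(text, value, index)
    return NO_MATCH if index == NO_MATCH
    stop = index + value.length
    text[index...stop] == value ? stop : NO_MATCH
  end

  # Succeed only when `index` is at or past the end of `text`.
  def self.eof(text, index)
    return NO_MATCH if index == NO_MATCH
    index >= text.length ? index : NO_MATCH
  end
end

text = 'abc'
index = StringSketch.string?(text, 'ab', 0)
p index                                # 2
p StringSketch.eof(text, index)        # false: "c" is still unread
index = StringSketch.string?(text, 'c', index)
p StringSketch.eof(text, index)        # 3
p StringSketch.string?(text, 'cd', 2)  # false: the slice is shorter than the value
```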