Class: Peggy::Parser
- Inherits: Object
  - Object
    - Peggy::Parser
- Defined in: lib/parse/parser.rb, lib/parse/ast.rb
Overview
Packrat parser class. All methods have a trailing exclamation mark (!) or question mark (?), or have long names with underscores (_). This is because productions are methods, and the naming avoids collisions with them. To use this class you must subclass Parser and provide your productions as methods. Your productions must call match? or one of the protected convenience routines to perform parsing. Productions must never call another production directly: the results would not be memoized, which slows the parse considerably and risks infinite recursion (until the stack overflows). As a convenience when writing productions, you can call any match? function several times in a row, passing each returned index to the next call, as in a sequence, without checking the result of each production; every such function returns NO_MATCH immediately when it is given NO_MATCH.
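The subclassing pattern described above can be sketched as follows. This is a minimal illustration, not Peggy itself: MiniParser is a hypothetical stand-in that reproduces only the memoized match? dispatch and a regexp-only literal?, and NO_MATCH = false is an assumption about the real constant. PairParser and its productions are likewise made up for the example.

```ruby
# Hypothetical stand-in for Peggy::Parser, reduced to what the overview describes.
NO_MATCH = false # assumed value of the real constant

class MiniParser
  attr_reader :source_text, :parse_results

  def parse?(goal, source)
    @source_text = source
    @parse_results = Hash.new { |h, k| h[k] = {} } # index => { goal => end index }
    match?(goal, 0)
  end

  # Memoized dispatch: productions go through here instead of calling each other.
  def match?(goal, index)
    return NO_MATCH if index == NO_MATCH # lets a sequence skip result checks
    position = parse_results[index]
    position.fetch(goal) { position[goal] = send(goal, index) }
  end

  # Regexp-only literal matcher, anchored at the given index.
  def literal?(value, index)
    return NO_MATCH if index == NO_MATCH
    m = /\A(?:#{value.source})/.match(source_text[index..-1])
    m ? index + m.end(0) : NO_MATCH
  end
end

class PairParser < MiniParser
  def number(index)
    literal?(/\d+/, index)
  end

  # pair := number ',' number
  # Each returned index feeds the next call; no intermediate checks are needed.
  def pair(index)
    index = match?(:number, index)
    index = literal?(/,/, index)
    match?(:number, index)
  end
end

parser = PairParser.new
good = parser.parse?(:pair, "12,345") # => 6
bad  = parser.parse?(:pair, "12;345") # => false
```

In the failing case the comma literal returns NO_MATCH, and the final match? call simply passes that value through, which is what makes the unchecked sequence safe.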
Direct Known Subclasses
Instance Attribute Summary
- #debug_flag ⇒ Object
  Tells parser to print intermediate results if set.
- #ignore_productions ⇒ Object
  The productions to ignore.
- #parse_results ⇒ Object (readonly)
  The results of the parse.
- #source_text ⇒ Object
  The source to parse; can be set prior to calling parse?().
Instance Method Summary
- #[](range) ⇒ Object
  Return a range (or character) of the source_text.
- #_memoize(goal, index, result, position = parse_results[index]) ⇒ Object
  Record the results of the parse in the parse_results memo.
- #allow?(goal, index) ⇒ Boolean
  Try to match a production from the given index.
- #ast?(options = {}) ⇒ Boolean
  Create an Abstract Syntax Tree from the parse results.
- #check?(goal, index) ⇒ Boolean
  Try to match a production from the given index then backtrack.
- #correct_regexp!(re) ⇒ Object
  Ensure a regular expression is anchored to the start of the string it is matched against, which is the remainder of source_text from the given index.
- #dissallow?(goal, index) ⇒ Boolean
  Try not to match a production from the given index then backtrack.
- #eof(index) ⇒ Object
  Special production that only matches the end of source_text.
- #ignore?(index) ⇒ Boolean
  Match tokens that should be ignored.
- #literal?(value, index) ⇒ Boolean
  Match a literal string or regular expression from the given index.
- #match?(goal, index) ⇒ Boolean
  Match a production from the given index.
- #parse?(goal, source = nil, index = 0) ⇒ Boolean
  Invokes the parser from the beginning of the source on the given production goal.
- #query?(*args) ⇒ Boolean
  Queries the parse results for a hierarchy of production matches.
- #regexp?(value, index) ⇒ Boolean
  Match a regular expression from the given index.
- #string?(value, index) ⇒ Boolean
  Match a string from the given index.
Instance Attribute Details
#debug_flag ⇒ Object
Tells parser to print intermediate results if set.
    # File 'lib/parse/parser.rb', line 78
    def debug_flag
      @debug_flag
    end
#ignore_productions ⇒ Object
The productions to ignore.
    # File 'lib/parse/parser.rb', line 88
    def ignore_productions
      @ignore_productions
    end
#parse_results ⇒ Object (readonly)
The results of the parse. A hash (keyed by index) of hashes (keyed by production symbol, with end indexes as values).

    # File 'lib/parse/parser.rb', line 85
    def parse_results
      @parse_results
    end
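The nested-hash shape can be illustrated with a short sketch. The entries below are written by hand for a made-up input such as "12,345"; they are not output captured from Peggy, and the production names :number and :pair are hypothetical. The auto-creating outer hash mirrors the Hash.new block used in parse?.

```ruby
# Outer hash: start index => inner hash. Inner hash: production => end index.
results = Hash.new { |by_index, index| by_index[index] = {} }

results[0][:number] = 2 # a :number starting at 0 ends at 2
results[3][:number] = 6 # a :number starting at 3 ends at 6
results[0][:pair]   = 6 # a :pair starting at 0 ends at 6

start_indexes = results.keys.sort # => [0, 3]
```

Note that merely reading an unseen index, such as results[7], creates an empty inner hash for it. The real memo also records a :found_order list and short matched strings, as the _memoize listing shows.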
#source_text ⇒ Object
The source to parse; can be set prior to calling parse?().

    # File 'lib/parse/parser.rb', line 81
    def source_text
      @source_text
    end
Instance Method Details
#[](range) ⇒ Object
Return a range (or character) of the source_text.
    # File 'lib/parse/parser.rb', line 91
    def [] range
      raise "source_text not set" if source_text.nil?
      source_text[range]
    end
#_memoize(goal, index, result, position = parse_results[index]) ⇒ Object
Record the results of the parse in the parse_results memo.

    # File 'lib/parse/parser.rb', line 171
    def _memoize goal, index, result, position = parse_results[index]
      if result
        position[:found_order] = [] unless position.has_key?(:found_order)
        position[:found_order] << goal
        position[goal.to_s] = source_text[index...result] if result - index < 40 && goal.is_a?(Symbol)
      end
      position[goal] = result if result || goal.is_a?(Symbol)
      result
    end
#allow?(goal, index) ⇒ Boolean
Try to match a production from the given index. Returns the end index if found or start index if not found.
    # File 'lib/parse/parser.rb', line 124
    def allow? goal, index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      found = match? goal, index
      found == NO_MATCH ? index : found
    end
#ast?(options = {}) ⇒ Boolean
Create an Abstract Syntax Tree from the parse results. You must call parse?() prior to this. Valid options:
- :ignore=>[symbol of element to ignore]

    # File 'lib/parse/ast.rb', line 218
    def ast? options={}
      ast = AST.new source_text, parse_results, options
      #puts ast
      ast
    end
#check?(goal, index) ⇒ Boolean
Try to match a production from the given index then backtrack. Returns index if found or NO_MATCH if not.
    # File 'lib/parse/parser.rb', line 132
    def check? goal, index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      found = match? goal, index
      found == NO_MATCH ? NO_MATCH : index
    end
#correct_regexp!(re) ⇒ Object
Ensure a regular expression is anchored to the start of the string it is matched against, which is the remainder of source_text from the given index.

    # File 'lib/parse/parser.rb', line 233
    def correct_regexp! re
      source = re.source
      source[0..1] == '\\A' ? re : Regexp.new("\\A(#{source})", re.options)
    end
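Why the anchoring matters can be seen in a standalone sketch. The helper name anchor is hypothetical; it follows the same approach as the listing above using only standard Regexp methods (source, options, Regexp.new).

```ruby
# Prefix the pattern with \A unless it already starts with it, keeping the
# original options (such as case insensitivity).
def anchor(re)
  re.source.start_with?('\\A') ? re : Regexp.new("\\A(#{re.source})", re.options)
end

digits = anchor(/\d+/)

at_start = digits.match("123abc") # matches "123"
later    = digits.match("abc123") # nil: digits do not begin the string
unbound  = /\d+/.match("abc123")  # the unanchored pattern skips ahead to "123"
```

Without the anchor, a literal could match somewhere past the current index and the parser would silently skip over input.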
#dissallow?(goal, index) ⇒ Boolean
Try not to match a production from the given index then backtrack. Returns index if not found or NO_MATCH if found.
    # File 'lib/parse/parser.rb', line 140
    def dissallow? goal, index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      found = match? goal, index
      found == NO_MATCH ? index : NO_MATCH
    end
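The return values of allow?, check? and dissallow? differ only in how they map the result of match?. The sketch below isolates that mapping; the function names and the match_ab stub are hypothetical, and NO_MATCH = false is an assumption. In PEG terms the three correspond to an optional expression, an and-predicate and a not-predicate.

```ruby
NO_MATCH = false # assumed value of the real constant

# Stub standing in for match?: succeeds only on "ab" at the given index.
def match_ab(text, index)
  text[index, 2] == "ab" ? index + 2 : NO_MATCH
end

def allow_result(found, index)    # optional: advance if found, else stay put
  found == NO_MATCH ? index : found
end

def check_result(found, index)    # lookahead: succeed without consuming input
  found == NO_MATCH ? NO_MATCH : index
end

def disallow_result(found, index) # negative lookahead: succeed only on a miss
  found == NO_MATCH ? index : NO_MATCH
end

hit  = match_ab("abc", 0) # => 2
miss = match_ab("abc", 1) # => false
```

With these inputs, allow_result(hit, 0) advances to 2 while allow_result(miss, 1) stays at 1; check_result(hit, 0) returns 0; and disallow_result(hit, 0) fails while disallow_result(miss, 1) returns 1.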
#eof(index) ⇒ Object
Special production that only matches the end of source_text. Note that this function does not end in (?) or (!) because it is meant to be used as a normal production.

    # File 'lib/parse/parser.rb', line 148
    def eof index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      index >= source_text.length ? index : NO_MATCH
    end
#ignore?(index) ⇒ Boolean
Match tokens that should be ignored. Used by match?(). Returns end index if found or start index if not found. Subclasses should override this method if they wish to ignore other text, such as comments.
    # File 'lib/parse/parser.rb', line 184
    def ignore? index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      return index if @ignoring || ignore_productions.nil?
      @ignoring = true
      ignore_productions.each do |prod|
        index = allow? prod, index
      end
      @ignoring = nil
      index
    end
#literal?(value, index) ⇒ Boolean
Match a literal string or regular expression from the given index. Returns the end index if found or NO_MATCH if not found.
    # File 'lib/parse/parser.rb', line 197
    def literal? value, index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      case value
      when String
        string? value, index
      when Regexp
        regexp? value, index
      else
        raise "Unknown literal: #{value.inspect}"
      end
    end
#match?(goal, index) ⇒ Boolean
Match a production from the given index. Returns the end index if found or NO_MATCH if not found.
    # File 'lib/parse/parser.rb', line 155
    def match? goal, index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      index = ignore? index unless @ignoring
      goal = goal.to_sym
      position = parse_results[index]
      found = position.fetch(goal) do
        position[goal] = IN_USE # used to prevent infinite recursion in case user attempts
        # a left recursion
        _memoize goal, index, send(goal, index), position
      end
      puts "found #{goal} at #{index}...#{found} #{source_text[index...found].inspect}" if found && debug_flag
      raise "Parser cannot handle infinite (left) recursions. Please rewrite usage of '#{goal}'." if found == IN_USE
      found
    end
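The memoization and left-recursion guard in match? can be isolated in a small sketch. MemoDemo, its call counter and both productions are hypothetical; NO_MATCH = false and IN_USE = true are assumed values for the real constants. The idea is that a memo slot is marked IN_USE before the production runs, so a production that re-enters itself at the same index finds the marker instead of recursing forever.

```ruby
NO_MATCH = false # assumed value
IN_USE   = true  # assumed value

class MemoDemo
  attr_reader :calls

  def initialize(text)
    @text = text
    @memo = Hash.new { |h, k| h[k] = {} } # index => { goal => result }
    @calls = Hash.new(0)                  # how often each production actually ran
  end

  def match?(goal, index)
    return NO_MATCH if index == NO_MATCH
    position = @memo[index]
    found = position.fetch(goal) do
      position[goal] = IN_USE # marker seen by any re-entrant call at this index
      @calls[goal] += 1
      position[goal] = send(goal, index)
    end
    raise "left recursion in '#{goal}'" if found == IN_USE
    found
  end

  # Matches a single digit.
  def digit(index)
    /\d/.match?(@text[index].to_s) ? index + 1 : NO_MATCH
  end

  # Deliberately left-recursive production.
  def looping(index)
    match?(:looping, index)
  end
end

demo = MemoDemo.new("7")
first  = demo.match?(:digit, 0) # runs the production
second = demo.match?(:digit, 0) # answered from the memo
```

Here demo.calls[:digit] stays at 1 after both calls, and demo.match?(:looping, 0) raises instead of overflowing the stack.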
#parse?(goal, source = nil, index = 0) ⇒ Boolean
Invokes the parser from the beginning of the source on the given production goal. You may provide the source here or you can set source_text prior to calling. If index is provided the parser will ignore characters previous to it.
    # File 'lib/parse/parser.rb', line 99
    def parse? goal, source = nil, index = 0
      self.source_text = source unless source.nil?
      # Hash of automatic hashes
      @parse_results = Hash.new {|h1, k1| h1[k1] = {}} # OrderedHash.new {|h1, k1| h1[k1] = {}}
      @keys = nil
      index = match? goal, index
      pp(parse_results) if debug_flag
      index
    end
#query?(*args) ⇒ Boolean
Queries the parse results for a hierarchy of production matches. An array of index ranges is returned, or an empty array if none are found. This can only be called after parse_results have been set by a parse.

    # File 'lib/parse/parser.rb', line 112
    def query? *args
      raise "You must first call parse!" unless parse_results
      @keys = @parse_results.keys.sort unless @keys
      found_list = []
      index = 0
      args.each do |arg|
        index = find? arg, index
      end
    end
#regexp?(value, index) ⇒ Boolean
Match a regular expression from the given index. Returns the end index if found or NO_MATCH if not found.
    # File 'lib/parse/parser.rb', line 222
    def regexp? value, index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      value = correct_regexp! value
      index = ignore? index unless @ignoring
      found = value.match source_text[index..-1]
      # puts "#{value.inspect} ~= #{found[0].inspect}" if found
      _memoize(value, index, found ? found.end(0) + index : NO_MATCH)
    end
#string?(value, index) ⇒ Boolean
Match a string from the given index. Returns the end index if found or NO_MATCH if not found.
    # File 'lib/parse/parser.rb', line 211
    def string? value, index
      return NO_MATCH if index == NO_MATCH # allow users to not check results of a sequence
      value = value.to_s
      index = ignore? index unless @ignoring
      i2 = index + value.length
      # puts source_text[index...i2].inspect + ' ' + value.inspect
      _memoize(value, index, source_text[index...i2] == value ? i2 : NO_MATCH)
    end
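The index arithmetic shared by string? and regexp? can be shown without the rest of the parser. The functions string_at and regexp_at are hypothetical, stripped of ignore handling and memoization, and NO_MATCH = false is an assumption. Both return the end index of the match, expressed as a position in the full text.

```ruby
NO_MATCH = false # assumed value of the real constant

# Compare the slice starting at index with the expected string.
def string_at(text, value, index)
  stop = index + value.length
  text[index...stop] == value ? stop : NO_MATCH
end

# Match an anchored regexp against the remainder of the text, then shift the
# match end back into full-text coordinates by adding index.
def regexp_at(text, re, index)
  anchored = Regexp.new("\\A(?:#{re.source})", re.options)
  found = anchored.match(text[index..-1])
  found ? found.end(0) + index : NO_MATCH
end

text = "let x1"
after_let  = string_at(text, "let", 0)    # => 3
after_name = regexp_at(text, /[a-z]\w*/, 4) # => 6
```

The returned index can be passed straight to the next matcher, which is how a sequence of literals advances through the source.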