Class: Whittle::Parser
- Inherits:
-
Object
- Object
- Whittle::Parser
- Defined in:
- lib/whittle/parser.rb
Overview
Parsers are created by subclassing the Parser class and defining a context-free grammar.
Unlike other LALR(1) parsers, Whittle does not rely on code-generation, instead it synthesizes a parse table from the grammar at runtime, on the first parse.
While Whittle’s implementation works a little differently to yacc/bison and ruby parser generators like racc and citrus, the parseable grammars are the same. LALR(1) parsers are very powerful and it is generally said that the languages they cannot parse are difficult for humans to understand.
You should refer to the README for a full description of how to use the parser, but a quick example follows.
Class Method Summary collapse
-
.parse_table ⇒ Hash
Returns the entire parse table used to interpret input into the parser.
-
.parse_table_for_rule(name) ⇒ Hash
Prepare the parse table for a given rule instead of the start rule.
-
.rule(name) ⇒ RuleSet, Rule
Declares a new rule.
-
.rules ⇒ Hash<String, RuleSet>
Returns a Hash mapping rule names with their RuleSets.
-
.start(name = nil) ⇒ Symbol
Declares most general rule that can be used to describe an entire input.
Instance Method Summary collapse
-
#error(state, token, context) ⇒ Object
Invoked when the parser detects an error.
-
#lex(input) {|({ :name => :$end, :line => line, :value => nil, :offset => offset })| ... } ⇒ Object
Accepts a String as input and repeatedly yields terminal tokens found in the grammar.
-
#parse(input, options = {}) ⇒ Object
Accepts input in the form of a String and attempts to parse it according to the grammar.
-
#rules ⇒ Object
Alias for class method Parser.rules.
Class Method Details
.parse_table ⇒ Hash
Returns the entire parse table used to interpret input into the parser.
You should not need to call this method, though you may wish to inspect its contents during debugging.
Note that the token nil
in the parse table represents “anything” and its action is always to reduce.
Shift-reduce conflicts are resolved at runtime and therefore remain in the parse table.
122 123 124 |
# File 'lib/whittle/parser.rb', line 122 def parse_table @parse_table ||= parse_table_for_rule(start) end |
.parse_table_for_rule(name) ⇒ Hash
Prepare the parse table for a given rule instead of the start rule.
Warning: this method does not memoize the result, so you should not use it in production.
135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 |
# File 'lib/whittle/parser.rb', line 135 def parse_table_for_rule(name) raise GrammarError, "Undefined start rule #{name.inspect}" unless rules.key?(name) rule = if rules[name].terminal? RuleSet.new(:$start, false).tap { |r| r[name].as { |prog| prog } } else rules[name] end rule.build_parse_table( {}, self, { :initial => true, :state => [rule, 0].hash, :seen => [], :offset => 0, :prec => 0 } ) end |
.rule(name) ⇒ RuleSet, Rule
Declares a new rule.
The are three ways to call this method:
1. rule("+")
2. rule(:int => /[0-9]+/)
3. rule(:expr) do |r|
r[:int, "+", :int].as { |a, _, b| a + b }
end
Variants (1) and (2) define basic terminal symbols (direct chunks of the input string), while variant (3) takes a block to define one or more nonterminal rules.
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 |
# File 'lib/whittle/parser.rb', line 74 def rule(name) if block_given? raise ArgumentError, "Parser#rule does not accept both a Hash and a block" if name.kind_of?(Hash) rules[name] = RuleSet.new(name, false) rules[name].tap { |r| yield r } else key, value = if name.kind_of?(Hash) raise ArgumentError, "Only one element allowed in Hash for Parser#rule" unless name.length == 1 name.first else [name, name] end rules[key] = RuleSet.new(key, true) rules[key][value].as(:value) end end |
.rules ⇒ Hash<String, RuleSet>
Returns a Hash mapping rule names with their RuleSets.
51 52 53 |
# File 'lib/whittle/parser.rb', line 51 def rules @rules ||= {} end |
.start(name = nil) ⇒ Symbol
Declares most general rule that can be used to describe an entire input.
Called without any arguments, returns the current start rule.
105 106 107 108 |
# File 'lib/whittle/parser.rb', line 105 def start(name = nil) @start = name unless name.nil? @start end |
Instance Method Details
#error(state, token, context) ⇒ Object
Invoked when the parser detects an error.
The default implementation raises a RuntimeError specifying the allowed inputs and the received input, along with a line number.
You may override this method with your own implementation, which, at least in theory, can recover from the error and allow the parse to continue, though this is an extremely advanced topic and requires a good understanding of how LALR(1) parsers operate.
281 282 283 |
# File 'lib/whittle/parser.rb', line 281 def error(state, token, context) raise ParseErrorBuilder.exception(state, token, context) end |
#lex(input) {|({ :name => :$end, :line => line, :value => nil, :offset => offset })| ... } ⇒ Object
Accepts a String as input and repeatedly yields terminal tokens found in the grammar.
The last token yielded is always named :$end and has the value of nil
.
You may override this method to define a smarter implementation, should you need to.
244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 |
# File 'lib/whittle/parser.rb', line 244 def lex(input) line = 1 offset = 0 ending = input.length until offset == ending do next_token(input, offset, line).tap do |token| raise UnconsumedInputError, "Unmatched input #{input[offset..-1].inspect} on line #{line}" if token.nil? token[:offset] = offset line, token[:line] = token[:line], line offset += token[:value].length yield token unless token[:discarded] end end yield ({ :name => :$end, :line => line, :value => nil, :offset => offset }) end |
#parse(input, options = {}) ⇒ Object
Accepts input in the form of a String and attempts to parse it according to the grammar.
The input is scanned using a lexical analysis routine, defined by the #lex method. Each token detected by the routine is used to pick an action from the parse table.
Each time a sequence of inputs has been read that concludes a rule in the grammar, the inputs are passed as arguments to the block for that rule, converting the sequence into single input before the parse continues.
If the parser encounters a token it does not expect, a parse error will be raised, specifying what was expected, what was received, and on which line the error occurred.
A successful parse returns the result of evaluating the start rule, whatever that may be.
It is possible to specify a different start rule during development.
193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 |
# File 'lib/whittle/parser.rb', line 193 def parse(input, = {}) table = if .key?(:rule) self.class.parse_table_for_rule([:rule]) else self.class.parse_table end states = [table.keys.first] args = [] line = 1 lex(input) do |token| line = token[:line] loop do state = table[states.last] if instruction = state[token[:name]] || state[nil] case instruction[:action] when :shift states << instruction[:state] args << token[:rule].action.call(token[:value]) break when :reduce, :accept rule = instruction[:rule] size = rule.components.length args << rule.action.call(*args.pop(size)) states.pop(size) if states.length == 1 && instruction[:action] == :accept return args.pop elsif goto = table[states.last][rule.name] states << goto[:state] next end end end error(state, token, :input => input, :states => states, :args => args) end end end |
#rules ⇒ Object
Alias for class method Parser.rules
161 162 163 |
# File 'lib/whittle/parser.rb', line 161 def rules self.class.rules end |