Class: Whittle::Parser

Inherits:
Object
  • Object
show all
Defined in:
lib/whittle/parser.rb

Overview

Parsers are created by subclassing the Parser class and defining a context-free grammar.

Unlike other LALR(1) parsers, Whittle does not rely on code-generation, instead it synthesizes a parse table from the grammar at runtime, on the first parse.

While Whittle’s implementation works a little differently to yacc/bison and ruby parser generators like racc and citrus, the parseable grammars are the same. LALR(1) parsers are very powerful and it is generally said that the languages they cannot parse are difficult for humans to understand.

You should refer to the README for a full description of how to use the parser, but a quick example follows.

Examples:

A simple Whittle Parser


class Calculator < Whittle::Parser
  rule(:wsp => /\s+/).skip!

  rule(:int => /[0-9]+/).as { |i| Integer(i) }

  rule("+") % :left ^ 1
  rule("-") % :left ^ 1
  rule("/") % :left ^ 2
  rule("*") % :left ^ 2

  rule(:expr) do |r|
    r[:expr, "+", :expr].as { |left, _, right| left + right }
    r[:expr, "-", :expr].as { |left, _, right| left - right }
    r[:expr, "/", :expr].as { |left, _, right| left / right }
    r[:expr, "*", :expr].as { |left, _, right| left * right }
    r[:int]
  end

  start(:expr)
end

calculator = Calculator.new
calculator.parse("1 + (2 * 6) - 7")
# => 6

Class Method Summary collapse

Instance Method Summary collapse

Class Method Details

.parse_tableHash

Returns the entire parse table used to interpret input into the parser.

You should not need to call this method, though you may wish to inspect its contents during debugging.

Note that the token nil in the parse table represents “anything” and its action is always to reduce.

Shift-reduce conflicts are resolved at runtime and therefore remain in the parse table.

Returns:

  • (Hash)

    a 2-dimensional Hash representing states with actions to perform for a given lookahead



122
123
124
# File 'lib/whittle/parser.rb', line 122

def parse_table
  @parse_table ||= parse_table_for_rule(start)
end

.parse_table_for_rule(name) ⇒ Hash

Prepare the parse table for a given rule instead of the start rule.

Warning: this method does not memoize the result, so you should not use it in production.

Parameters:

  • name (Symbol, String)

    the name of the Rule to use as the start rule

Returns:

  • (Hash)

    the complete parse table for this rule

Raises:



135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
# File 'lib/whittle/parser.rb', line 135

def parse_table_for_rule(name)
  raise GrammarError, "Undefined start rule #{name.inspect}" unless rules.key?(name)

  rule = if rules[name].terminal?
    RuleSet.new(:$start, false).tap { |r| r[name].as { |prog| prog } }
  else
    rules[name]
  end

  rule.build_parse_table(
    {},
    self,
    {
      :initial => true,
      :state   => [rule, 0].hash,
      :seen    => [],
      :offset  => 0,
      :prec    => 0
    }
  )
end

.rule(name) ⇒ RuleSet, Rule

Declares a new rule.

The are three ways to call this method:

1. rule("+")
2. rule(:int => /[0-9]+/)
3. rule(:expr) do |r|
     r[:int, "+", :int].as { |a, _, b| a + b }
   end

Variants (1) and (2) define basic terminal symbols (direct chunks of the input string), while variant (3) takes a block to define one or more nonterminal rules.

Parameters:

  • name (Symbol, String, Hash)

    the name of the rule, or a Hash mapping the name to a pattern

Returns:

  • (RuleSet, Rule)

    the newly created RuleSet if a block was given, otherwise a rule representing a terminal token for the input string name.



74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# File 'lib/whittle/parser.rb', line 74

def rule(name)
  if block_given?
    raise ArgumentError,
      "Parser#rule does not accept both a Hash and a block" if name.kind_of?(Hash)

    rules[name] = RuleSet.new(name, false)
    rules[name].tap { |r| yield r }
  else
    key, value = if name.kind_of?(Hash)
      raise ArgumentError,
        "Only one element allowed in Hash for Parser#rule" unless name.length == 1

      name.first
    else
      [name, name]
    end

    rules[key] = RuleSet.new(key, true)
    rules[key][value].as(:value)
  end
end

.rulesHash<String, RuleSet>

Returns a Hash mapping rule names with their RuleSets.

Returns:

  • (Hash<String, RuleSet>)

    all rules defined by the parser



51
52
53
# File 'lib/whittle/parser.rb', line 51

def rules
  @rules ||= {}
end

.start(name = nil) ⇒ Symbol

Declares most general rule that can be used to describe an entire input.

Called without any arguments, returns the current start rule.

Parameters:

  • name (Symbol) (defaults to: nil)

    the name of a rule defined in the parser (does not need to be defined beforehand)

Returns:

  • (Symbol)

    the new (or current) start rule



105
106
107
108
# File 'lib/whittle/parser.rb', line 105

def start(name = nil)
  @start = name unless name.nil?
  @start
end

Instance Method Details

#error(state, token, context) ⇒ Object

Invoked when the parser detects an error.

The default implementation raises a RuntimeError specifying the allowed inputs and the received input, along with a line number.

You may override this method with your own implementation, which, at least in theory, can recover from the error and allow the parse to continue, though this is an extremely advanced topic and requires a good understanding of how LALR(1) parsers operate.

Parameters:

  • state (Hash)

    the possible actions for the current parser state

  • token (Hash)

    the received token

  • context (Hash)

    the current parse context (input + arg stack + state stack)



281
282
283
# File 'lib/whittle/parser.rb', line 281

def error(state, token, context)
  raise ParseErrorBuilder.exception(state, token, context)
end

#lex(input) {|({ :name => :$end, :line => line, :value => nil, :offset => offset })| ... } ⇒ Object

Accepts a String as input and repeatedly yields terminal tokens found in the grammar.

The last token yielded is always named :$end and has the value of nil.

You may override this method to define a smarter implementation, should you need to.

Parameters:

  • input (String)

    the complete input string the lex

Yields:

  • (({ :name => :$end, :line => line, :value => nil, :offset => offset }))


244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
# File 'lib/whittle/parser.rb', line 244

def lex(input)
  line   = 1
  offset = 0
  ending = input.length

  until offset == ending do
    next_token(input, offset, line).tap do |token|
      raise UnconsumedInputError,
        "Unmatched input #{input[offset..-1].inspect} on line #{line}" if token.nil?

      token[:offset] = offset
      line, token[:line] = token[:line], line
      offset += token[:value].length
      yield token unless token[:discarded]
    end
  end

  yield ({ :name => :$end, :line => line, :value => nil, :offset => offset })
end

#parse(input, options = {}) ⇒ Object

Accepts input in the form of a String and attempts to parse it according to the grammar.

The input is scanned using a lexical analysis routine, defined by the #lex method. Each token detected by the routine is used to pick an action from the parse table.

Each time a sequence of inputs has been read that concludes a rule in the grammar, the inputs are passed as arguments to the block for that rule, converting the sequence into single input before the parse continues.

If the parser encounters a token it does not expect, a parse error will be raised, specifying what was expected, what was received, and on which line the error occurred.

A successful parse returns the result of evaluating the start rule, whatever that may be.

It is possible to specify a different start rule during development.

Examples:

Using a different start rule


parser.parse(str, :rule => :another_rule)

Parameters:

  • input (String)

    a complete input string to parse according to the grammar

  • options (Hash) (defaults to: {})

    currently the only supported option is :rule, to specify a different once-off start rule

Returns:

  • (Object)

    whatever the grammar defines



193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
# File 'lib/whittle/parser.rb', line 193

def parse(input, options = {})
  table  = if options.key?(:rule)
    self.class.parse_table_for_rule(options[:rule])
  else
    self.class.parse_table
  end

  states = [table.keys.first]
  args   = []
  line   = 1

  lex(input) do |token|
    line  = token[:line]

    loop do
      state = table[states.last]

      if instruction = state[token[:name]] || state[nil]
        case instruction[:action]
        when :shift
          states << instruction[:state]
          args   << token[:rule].action.call(token[:value])
          break
        when :reduce, :accept
          rule = instruction[:rule]
          size = rule.components.length
          args << rule.action.call(*args.pop(size))
          states.pop(size)

          if states.length == 1 && instruction[:action] == :accept
            return args.pop
          elsif goto = table[states.last][rule.name]
            states << goto[:state]
            next
          end
        end
      end

      error(state, token, :input => input, :states => states, :args => args)
    end
  end
end

#rulesObject

Alias for class method Parser.rules

See Also:



161
162
163
# File 'lib/whittle/parser.rb', line 161

def rules
  self.class.rules
end