Module: RPrec

Defined in:
lib/rprec.rb,
lib/rprec/op.rb,
lib/rprec/dsl.rb,
lib/rprec/prec.rb,
lib/rprec/token.rb,
lib/rprec/stream.rb,
lib/rprec/grammar.rb,
lib/rprec/version.rb,
lib/rprec/parse_error.rb,
lib/rprec/array_stream.rb,
lib/rprec/regexp_lexer.rb

Overview

RPrec is an implementation of operator-precedence parsing. The operator-precedence parsing is also known as Pratt parsing. This implementation is extended for mixfix operators which are operators consists of multi parts (e.g. ... ? ... : ... operator).

Usage

RPrec::Grammar is a grammar of the target language and also behaves its parser.

We have a RPrec::DSL to easily build a RPrec::Grammar instance. The below is an example grammar for simple arithmetic expressions.

grammar = RPrec::DSL.build do
  prec :main => :add_sub

  prec :add_sub => :mul_div do
    left_assoc '+' do |left, op_tok, right|
      [:add, left, right]
    end
    left_assoc '-' do |left, op_tok, right|
      [:sub, left, right]
    end
  end

  prec :mul_div => :unary do
    left_assoc '*' do |left, op_tok, right|
      [:mul, left, right]
    end
    left_assoc '/' do |left, op_tok, right|
      [:div, left, right]
    end
  end

  prec :unary => :atom do
    prefix '+' do |op_tok, expr|
      [:plus, expr]
    end
    prefix '-' do |op_tok, expr|
      [:minus, expr]
    end
  end

  prec :atom do
    closed 'INT' do |int_tok|
      [:int, int_tok.value]
    end
    closed '(', :add_sub, ')' do |lpar_tok, expr, rpar_tok|
      [:paren, expr]
    end
  end
end

Here, prec is the only available method in the RPrec::DSL context. This method is for defining a precedence and takes the argument of one of the following forms: name => succs, name => succs, and name. prec name => succ is an alias for prec name => [succ] and prec name is an alias for prec name => []. succs is an array of precedence names which have higher binding powers than this precedence's. Precedence names must be symbols.

prec also can take a block to define operators belonging to this precedence. Operators can be defined by the following eight methods:

  • prefix key, *parts, &block
  • postfix key, *parts, &block
  • non_assoc_prefix key, *parts, &block
  • non_assoc_postfix key, *parts, &block
  • closed key, *parts, &block
  • left_assoc key, *parts, &block
  • right_assoc key, *parts, &block
  • non_assoc key, *parts, &block

They define operators with their own name associativity. The first argument key specifies the key token that is used to determine which the operator to be parsed. The rest arguments are parts: they consist of precedence names or tokens, meaning that they are required after the key token. They can take a block, which is an action called on parsed.

To parse a string using this grammar, we can use the RPrec::Grammar#parse method. This method takes a RPrec::Stream object. RPrec::Stream is a token stream (an array) with convenient methods for parsing. It can be created easily by using RPrec::ArrayStream class.

# A stream for the source `"1 + 2 * 3"`.
stream = RPrec::ArrayStream.new([
  RPrec::Token.new('INT', 1),
  RPrec::Token.new('+'),
  RPrec::Token.new('INT', 2),
  RPrec::Token.new('*'),
  RPrec::Token.new('INT', 3),
])

grammar.parse(stream)
# => [:add, [:int, 1], [:mul, [:int, 2], [:int, 3]]]

Limitations

This implementation assumes that an operator can be determined by the current token on parsing. If a grammar is not so, this reports the "Conflict with the key token ..." error.

For example, the following grammar is rejected because the key token '[' is conflicted.

grammar = RPrec::DSL.build do
  prec :range do
     closed '[', :int, ',', :int, ']'
     closed '[', :int, ',', :int, ')'
  end
end
# raises "Conflict with the key token '['"

References

Defined Under Namespace

Classes: ArrayStream, DSL, Error, Grammar, Op, ParseError, Prec, RegexpLexer, RegexpStream, Stream, Token

Constant Summary collapse

VERSION =

The version.

'1.0.0'