raabro

Build Status Gem Version

A very dumb PEG parser library.

Son to aabro, grandson to neg, grand-grandson to parslet. There is also a javascript version jaabro.

a sample parser/rewriter

You use raabro by providing the parsing rules, then some rewrite rules.

The parsing rules make use of the raabro basic parsers seq, alt, str, rex, eseq, ...

The rewrite rules match names passed as first argument to the basic parsers to rewrite the resulting parse trees.

require 'raabro'


module Fun include Raabro

  # parse
  #
  # Last function is the root, "i" stands for "input".

  def pstart(i); rex(nil, i, /\(\s*/); end
  def pend(i); rex(nil, i, /\)\s*/); end
    # parenthese start and end, including trailing white space

  def comma(i); rex(nil, i, /,\s*/); end
    # a comma, including trailing white space

  def num(i); rex(:num, i, /-?[0-9]+\s*/); end
    # name is :num, a positive or negative integer

  def args(i); eseq(nil, i, :pstart, :exp, :comma, :pend); end
    # a set of :exp, beginning with a (, punctuated by commas and ending with )

  def funame(i); rex(nil, i, /[a-z][a-z0-9]*/); end
  def fun(i); seq(:fun, i, :funame, :args); end
    # name is :fun, a function composed of a function name
    # followed by arguments

  def exp(i); alt(nil, i, :fun, :num); end
    # an expression is either (alt) a function or a number

  # rewrite
  #
  # Names above (:num, :fun, ...) get a rewrite_xxx function.
  # "t" stands for "tree".

  def rewrite_exp(t); rewrite(t.children[0]); end
  def rewrite_num(t); t.string.to_i; end

  def rewrite_fun(t)

    funame, args = t.children

    [ funame.string ] +
    args.gather.collect { |e| rewrite(e) }
      #
      # #gather collect all the children in a tree that have
      # a name, in this example, names can be :exp, :num, :fun
  end
end


p Fun.parse('mul(1, 2)')
  # => ["mul", 1, 2]

p Fun.parse('mul(1, add(-2, 3))')
  # => ["mul", 1, ["add", -2, 3]]

p Fun.parse('mul (1, 2)')
  # => nil (doesn't accept a space after the function name)

This sample is available at: doc/readme0.rb.

custom rewrite()

By default, a parser gets a rewrite(t) that looks at the parse tree node names and calls the corresponding rewrite_{node_name}().

It's OK to provide a custom rewrite(t) function.

module Hello include Raabro

  def hello(i); str(:hello, i, 'hello'); end

  def rewrite(t)
    [ :ok, t.string ]
  end
end

basic parsers

One makes a parser by composing basic parsers, for example:

  def args(i); eseq(:args, i, :pa, :exp, :com, :pz); end
  def funame(i); rex(:funame, i, /[a-z][a-z0-9]*/); end
  def fun(i); seq(:fun, i, :funame, :args); end

where the fun parser is a sequence combining the funame parser then the args one. :fun (the first argument to the basic parser seq) will be the name of the resulting (local) parse tree.

Below is a list of the basic parsers provided by Raabro.

The first parameter to the basic parser is the name used by rewrite rules. The second parameter is a Raabro::Input instance, mostly a wrapped string.

def str(name, input, string)
  # matching a string

def rex(name, input, regex_or_string)
  # matching a regexp
  # no need for ^ or \A, checks the match occurs at current offset

def seq(name, input, *parsers)
  # a sequence of parsers

def alt(name, input, *parsers)
  # tries the parsers returns as soon as one succeeds

def altg(name, input, *parsers)
  # tries all the parsers, returns with the longest match

def rep(name, input, parser, min, max=0)
  # repeats the the wrapped parser

def nott(name, input, parser)
  # succeeds if the wrapped parser fails, fails if it succeeds

def ren(name, input, parser)
  # renames the output of the wrapped parser

def jseq(name, input, eltpa, seppa)
  #
  # seq(name, input, eltpa, seppa, eltpa, seppa, eltpa, seppa, ...)
  #
  # a sequence of `eltpa` parsers separated (joined) by `seppa` parsers

def eseq(name, input, startpa, eltpa, seppa, endpa)
  #
  # seq(name, input, startpa, eltpa, seppa, eltpa, seppa, ..., endpa)
  #
  # a sequence of `eltpa` parsers separated (joined) by `seppa` parsers
  # preceded by a `startpa` parser and followed by a `endpa` parser

the seq parser and its quantifiers

seq is special, it understands "quantifiers": '?', '+' or '*'. They make behave seq a bit like a classical regex.

The '!' (bang, not) quantifier is explained at the end of this section.

module CartParser include Raabro

  def fruit(i)
    rex(:fruit, i, /(tomato|apple|orange)/)
  end
  def vegetable(i)
    rex(:vegetable, i, /(potato|cabbage|carrot)/)
  end

  def cart(i)
    seq(:cart, i, :fruit, '*', :vegetable, '*')
  end
    # zero or more fruits followed by zero or more vegetables
end

(Yes, this sample parser parses string like "appletomatocabbage", it's not very useful, but I hope you get the point about .seq)

The '!' (bang, not) quantifier is a kind of "negative lookahead".

  def menu(i)
    seq(:menu, i, :mise_en_bouche, :main, :main, '!', :dessert)
  end

Lousy example, but here a main cannot follow a main.

trees

An instance of Raabro::Tree is passed to rewrite() and rewrite_{name}() functions.

The most useful methods of this class are:

class Raabro::Tree

  # Look for the first child or sub-child with the given name.
  # If the given name is nil, looks for the first child with a name (not nil).
  #
  def sublookup(name=nil)

  # Gathers all the children or sub-children with the given name.
  # If the given name is nil, gathers all the children with a name (not nil).
  # When a child matches, does not pursue gathering from the children of the
  # matching child.
  #
  def subgather(name=nil)
end

I'm using "child or sub-child" instead of "descendant" because once a child or sub-child matches, those methods do not consider the children or sub-children of that matching entity.

Here is a closeup on the rewrite functions of the sample parser at doc/readme1.rb (extracted from an early version of floraison/dense):

require 'raabro'

module PathParser include Raabro

  # (...)

  def rewrite_name(t); t.string; end
  def rewrite_off(t); t.string.to_i; end
  def rewrite_index(t); rewrite(t.sublookup); end
  def rewrite_path(t); t.subgather(:index).collect { |tt| rewrite(tt) }; end
end

Where rewrite_index(t) returns the result of the rewrite of the first of its children that has a name and rewrite_path(t) collects the result of the rewrite of all of its children that have the "index" name.

errors

By default, a parser will return nil when it cannot successfully parse the input.

For example, given the above Fun parser, parsing some truncated input would yield nil:

tree = Sample::Fun.parse('f(a, b')
  # yields `nil`...

One can reparse with error: true and receive an error array with the parse error details:

err = Sample::Fun.parse('f(a, b', error: true)
  # yields:
  # [ line, column, offest, error_message, error_visual ]
[ 1, 4, 3, 'parsing failed .../:exp/:fun/:arg', "f(a, b\n   ^---" ]

The last string in the error array looks like when printed out:

f(a, b
   ^---

error when not all is consumed

Consider the following toy parser:

module ToPlus include Raabro

  # parse

  def to_plus(input); rep(:tos, input, :to, 1); end

  # rewrite

  def rewrite(t); [ :ok, t.string ]; end
end
Sample::ToPlus.parse('totota')
  # yields nil since all the input was not parsed, "ta" is remaining

Sample::ToPlus.parse('totota', all: false)
  # yields
[ :ok, "toto" ]
  # and doesn't care about the remaining input "ta"

Sample::ToPlus.parse('totota', error: true)
  # yields
[ 1, 5, 4, "parsing failed, not all input was consumed", "totota\n    ^---" ]

The last string in the error array looks like when printed out:

totota
    ^---

LICENSE

MIT, see LICENSE.txt