Module: OOPeg::Parser::Combinators

Includes:
Basics, Mappers, Repeaters, Satisfiers, Selectors
Included in:
OOPeg::Parser, ClassMethods
Defined in:
lib/oo_peg/parser/combinators.rb,
lib/oo_peg/parser/combinators/lazy.rb,
lib/oo_peg/parser/combinators/basics.rb,
lib/oo_peg/parser/combinators/mappers.rb,
lib/oo_peg/parser/combinators/repeaters.rb,
lib/oo_peg/parser/combinators/selectors.rb,
lib/oo_peg/parser/combinators/satisfiers.rb

Overview

The raison d’être of a PEG Parser are in fact the combinators

So, without further ado…

many

If a parser p parses a string, the parser p.many parses, well many occurrences of such strings concatenated.

N.B. By default many parses the empty input (0 occurrences) and can therefore never fail! but we can force a minimum of occurrences.

As indicated below, many returns a list of all parsed occurrences.

# example: many, the combinator for lists

vowel_parser = char_parser('aeiouy').many

parse(vowel_parser, "").ast => []
parse(vowel_parser, "ae").ast => %w[a e]

And we can force a minimum of occurrences

# example: manyer, more than many

two_chars = char_parser.many(min: 2)

parse(two_chars, 'ab').ast => %w[a b]
parse(two_chars, 'a').error => '"many(char_parser(TrueSet))" did not succeed the required 2 times, but only 1'

# We can see that the error message lacks some context, let's try again

named = char_parser.many(min: 2, name: '2 chars')
parse(named, '').error => '"2 chars" did not succeed the required 2 times, but only 0'

It is very frequently not an array of parsed strings, that we want, but the whole string, enter the

joined combinator

# example: joining together

all_parser = char_parser.many.joined

parse(all_parser, "some text").ast => "some text"

map modify the result’s ast

In the above example we have seen, that the map combinator returns a list, oftentimes we want the parsed string to be returned…

Enter map

# example: mapping a list back to a string

vowel_string_parser = char_parser('aeiouy').many.map(&:join)
parse(vowel_string_parser, "yo").ast => "yo"
Tagging, a special form of map

tagged(tag) is a short, convenience method for map { [tag, it] }

# example: tagging a result

tagged_char_parser = char_parser.tagged(:char)

parse(tagged_char_parser, 'q').ast => [:char, 'q']

Mapping Lists

Oftentimes, when applying map to the many combinator we want to map all elements of the returned list, enter…

map_many
# example: a digit parser

digit_parser = char_class_parser(:digit).many.map_many(&:to_i)

parse(digit_parser, '104a').ast => [1, 0, 4]

A shortcut, using the mapper inside many

# example: a shortcut digit parser

digit_parser = char_class_parser(:digit).many(&:to_i)

parse(digit_parser, '104a').ast => [1, 0, 4]

join_list

The inherent structure of this library will oftentimes return a list of the following structure [first_element, [second_element, ..., last_element]]

e.g.

# example: the crux with +and+ and +many+

element = ws_parser.not.joined
at_least_one = element.and(ws_parser.ignore.and(element).many(&:first))

parse(at_least_one, 'a b c').ast => ['a', ['b', 'c']]

This can be fixed with join_list

# example: fixing +and+ and +many+

element = ws_parser.not.joined
at_least_one = element.and(ws_parser.ignore.and(element).many(&:first)).join_list

parse(at_least_one, 'a b c').ast => ['a', 'b', 'c']

N.B This is a very specialized combinator which expects the AST it transforms being of the shape [_, [*]] if this is not the case a NoMatchingPatternError will be thrown

# example: Just be careful man

bad_parser = char_parser.join_list

  expect { parse(bad_parser, 'a') }
    .to raise_error(NoMatchingPatternError)

maybe (oftentimes called option in other PEG Parsers)

If a parser p parses a string, the parser p.maybe parses, well 0 or 1 occurrences of such a string.

N.B. maybe parses the empty input (0 occurrences) and can therefore never fail!

We could also implement maybe with map as follows, but there is a subtle difference between [], "", nil to be observed.

# example: map and maybe

optional_char = char_parser.many(max: 1).map(&:join)
maybe_char   = char_parser.maybe

parse(optional_char, "abc").ast => "a"
parse(optional_char, "").ast => ""

parse(maybe_char, "abc").ast => "a"
parse(maybe_char, "").ast => nil

or (oftentimes called select in other PEG Parsers)

Can be an instance method

# example: or as an instance method

# yet another implementation of maybe :P

maybe_parser = char_parser('a').or(char_parser('b'), true_parser)

parse(maybe_parser, 'a').ast => 'a'
parse(maybe_parser, 'b').ast => 'b'
parse(maybe_parser, 'c').ast => nil 
parse(maybe_parser, 'c') is! ok

Can be a class method

# example: or as a class method

int_or_number_parser = OOPeg::Parser.or(int_parser, set_parser('one', 'two'))

parse(int_or_number_parser, '12').ast => 12
parse(int_or_number_parser, 'one').ast => 'one'
parse(int_or_number_parser, 'two').ast => 'two'

parse(int_or_number_parser, 'three') not! ok

and (oftentimes called sequence in other PEG Parsers)

Can be an instance method

# example: and as an instance method

# yet another implementation of set_parser("abc")

abc_parser = char_parser('a').and(char_parser('b'), char_parser('c'))

parse(abc_parser, 'abc').ast => %w[a b c] 
parse(abc_parser, 'bca') not! ok

Can be a class method

# example: and as a class method

abc_parser = OOPeg::Parser.and(char_parser('a'), char_parser('b'), char_parser('c'))

parse(abc_parser, 'abc').ast => %w[a b c] 
parse(abc_parser, 'bca') not! ok

Sometimes we do not want some parsed text in our AST, therefore the .and combinator ignores nil ASTs.

# example: ignore a sign (at your own peril of course)

parser = OOPeg::Parser.and( char_parser('+').map {nil}, char_class_parser(:digit).many.joined )

parse(parser, "+42").ast => ['42']

As it is a little bit cumbersome to write map { nil } we made a more idiomatic way to express this behavior:

ignore

# example: a better way of ignorance

parser = OOPeg::Parser.and( char_parser('+').ignore, char_class_parser(:digit).many.joined )

parse(parser, "+42").ast => ['42']

not negation

# example: what is not a character

parse(char_class_parser(:digit).not.joined, 'a').ast => 'a'
parse(char_class_parser(:digit).not, '7') not! ok

Practical use of not is when it is combined with many.

Like in regular expressions where \S is oftentimes combined to \S+

# example: not whitespaces

not_ws_parser = ws_parser.not.many(min: 1).joined

parse(not_ws_parser, ' ') not! ok
parse(not_ws_parser, 't-32 ').ast => 't-32'

satifsfy can fail a successful result does not touch failing results

This, e.g. is used in the kwd_parser or set_parser

# example: an odd parser

odd_parser = int_parser.satisfy(name: "oddy", &:odd?)

parse(odd_parser, '11').ast => 11
parse(odd_parser, '10') not! ok

parse(odd_parser, '10').error => 'oddy failed'

N.B. it can also modify the result

even_parser = int_parser.satisfy { |n|
  n.even? ? [:ok, n/2] : [:error, "#{n}'s odd"]
}

parse(even_parser, '84').ast => 42

parse(even_parser, '73').error => "73's odd"

Change result for succeeding and failing parsers: map_or_rename

This allows us to modify the name or error message in case of failure and still transforming the ast if a block is given.

# example: A better odd parser?

better = int_parser.satisfy(&:odd?).map_or_rename(error: "Oh no", &:succ)

parse(better, '11').ast => 12
parse(better, '12').error => "Oh no"

# example: No need to provide a block

parse(char_parser.map_or_rename(name: "My Parser"), '') not! ok

Lookahead, an important concept…

We can just check on the future \o/

# example: Look ahead, do not advance

result = parse(int_parser.lookahead, '42')

result.ast is! nil
result is! ok
result.input.pos => 1
result.input.content => %w[4 2] 

# example: Look ahead, be disappointed

parse(int_parser.lookahead, 'alpha') not! ok

What About Recursive Parsers?

Let us assume that we want to parse a language that is recursive, how can we create a recursive parser that does not loop up to a stack overflow?

Here is a toy example to introduce the concept (but it will come in very handy later in our advanced OOPeg::Parsers::Advanced::SexpParser .

A parser for the language a^n • b^n, the naïve way.

# example: Look Ma' a stack overflow

$ab_count = 0
def ab_parser
  $ab_count += 1
  return true_parser if $ab_count == 1_000
  char_parser('a').and(ab_parser, char_parser('b')).or(true_parser)
end

parse(ab_parser, 'aabb')

$ab_count => 1_000

This problem can be solved by delaying the recursively called parser, which is done, with the aptly named:

delay combinator

# example: Look Ma' it's working now

$ab_count = 0
def ab_parser
  $ab_count += 1
  return true_parser if $ab_count == 1_000
  char_parser('a').and(delay {ab_parser}, char_parser('b')).joined.or(true_parser)
end

parse(ab_parser, 'aabb').ast => "aabb" 

$ab_count => 3

Defined Under Namespace

Modules: Basics, Lazy, Mappers, Repeaters, Satisfiers, Selectors

Instance Method Summary collapse

Instance Method Details

#_maybe(parser:, name: nil, replace_with: nil) ⇒ Object



65
66
67
68
69
70
71
72
73
74
# File 'lib/oo_peg/parser/combinators/repeaters.rb', line 65

def _maybe(parser:, name: nil, replace_with: nil)
  Parser.new(name || "maybe(#{parser.name})") do |input, name|
    case Parser.parse(parser, input)
    in {ok: false}
      Result.ok(ast: replace_with, input:)
    in success
      success
    end
  end
end

#_sequence(*parsers, name:) ⇒ Object

Raises:

  • (ArgumentError)


76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
# File 'lib/oo_peg/parser/combinators/repeaters.rb', line 76

def _sequence(*parsers, name:)
  parsers = parsers.flatten
  raise ArgumentError, "all parsers must be instances of Parser" unless parsers.all? { Parser === it }
  name ||= "seq(#{parsers.map(&:name).join(", ")})"
  Parser.new(name) do |input|
    original_input = input
    result = parsers.reduce_while [input, []] do |(input, ast), parser|

      parsed = Parser.parse(parser, input)
      # require "debug"; binding.break if parsed.input.nil?
      case parsed
      in {ok: true, ast: nil, input:}
        cont_reduce([input, ast])
      in {ok: true, ast: ast_node, input:}
        cont_reduce([input, [*ast, ast_node]])
      in {ok: false, error:}
        halt_reduce(Result.nok(input: original_input, error:, parser_name: name))
      end
    end

    case result
    in {ok: false} => error
      result
    in [input, ast]
      Result.ok(ast:, input:)
    end
  end
end