Module: OOPeg::Parser::Combinators
- Includes:
- Basics, Mappers, Repeaters, Satisfiers, Selectors
- Included in:
- OOPeg::Parser, ClassMethods
- Defined in:
- lib/oo_peg/parser/combinators.rb,
lib/oo_peg/parser/combinators/lazy.rb,
lib/oo_peg/parser/combinators/basics.rb,
lib/oo_peg/parser/combinators/mappers.rb,
lib/oo_peg/parser/combinators/repeaters.rb,
lib/oo_peg/parser/combinators/selectors.rb,
lib/oo_peg/parser/combinators/satisfiers.rb
Overview
The raison d’être of a PEG Parser are in fact the combinators
So, without further ado…
many
If a parser p parses a string, the parser p.many parses, well many occurrences of such strings concatenated.
N.B. By default many parses the empty input (0 occurrences) and can therefore never fail! but we can force a minimum of occurrences.
As indicated below, many returns a list of all parsed occurrences.
# example: many, the combinator for lists
vowel_parser = char_parser('aeiouy').many
parse(vowel_parser, "").ast => []
parse(vowel_parser, "ae").ast => %w[a e]
And we can force a minimum of occurrences
# example: manyer, more than many
two_chars = char_parser.many(min: 2)
parse(two_chars, 'ab').ast => %w[a b]
parse(two_chars, 'a').error => '"many(char_parser(TrueSet))" did not succeed the required 2 times, but only 1'
# We can see that the error message lacks some context, let's try again
named = char_parser.many(min: 2, name: '2 chars')
parse(named, '').error => '"2 chars" did not succeed the required 2 times, but only 0'
It is very frequently not an array of parsed strings, that we want, but the whole string, enter the
joined combinator
# example: joining together
all_parser = char_parser.many.joined
parse(all_parser, "some text").ast => "some text"
map modify the result’s ast
In the above example we have seen, that the map combinator returns a list, oftentimes we want the parsed string to be returned…
Enter map
# example: mapping a list back to a string
vowel_string_parser = char_parser('aeiouy').many.map(&:join)
parse(vowel_string_parser, "yo").ast => "yo"
Tagging, a special form of map
tagged(tag) is a short, convenience method for map { [tag, it] }
# example: tagging a result
tagged_char_parser = char_parser.tagged(:char)
parse(tagged_char_parser, 'q').ast => [:char, 'q']
Mapping Lists
Oftentimes, when applying map to the many combinator we want to map all elements of the returned list, enter…
map_many
# example: a digit parser
digit_parser = char_class_parser(:digit).many.map_many(&:to_i)
parse(digit_parser, '104a').ast => [1, 0, 4]
A shortcut, using the mapper inside many
# example: a shortcut digit parser
digit_parser = char_class_parser(:digit).many(&:to_i)
parse(digit_parser, '104a').ast => [1, 0, 4]
join_list
The inherent structure of this library will oftentimes return a list of the following structure [first_element, [second_element, ..., last_element]]
e.g.
# example: the crux with +and+ and +many+
element = ws_parser.not.joined
at_least_one = element.and(ws_parser.ignore.and(element).many(&:first))
parse(at_least_one, 'a b c').ast => ['a', ['b', 'c']]
This can be fixed with join_list
# example: fixing +and+ and +many+
element = ws_parser.not.joined
at_least_one = element.and(ws_parser.ignore.and(element).many(&:first)).join_list
parse(at_least_one, 'a b c').ast => ['a', 'b', 'c']
N.B This is a very specialized combinator which expects the AST it transforms being of the shape [_, [*]] if this is not the case a NoMatchingPatternError will be thrown
# example: Just be careful man
bad_parser = char_parser.join_list
expect { parse(bad_parser, 'a') }
.to raise_error(NoMatchingPatternError)
maybe (oftentimes called option in other PEG Parsers)
If a parser p parses a string, the parser p.maybe parses, well 0 or 1 occurrences of such a string.
N.B. maybe parses the empty input (0 occurrences) and can therefore never fail!
We could also implement maybe with map as follows, but there is a subtle difference between [], "", nil to be observed.
# example: map and maybe
optional_char = char_parser.many(max: 1).map(&:join)
maybe_char = char_parser.maybe
parse(optional_char, "abc").ast => "a"
parse(optional_char, "").ast => ""
parse(maybe_char, "abc").ast => "a"
parse(maybe_char, "").ast => nil
or (oftentimes called select in other PEG Parsers)
Can be an instance method
# example: or as an instance method
# yet another implementation of maybe :P
maybe_parser = char_parser('a').or(char_parser('b'), true_parser)
parse(maybe_parser, 'a').ast => 'a'
parse(maybe_parser, 'b').ast => 'b'
parse(maybe_parser, 'c').ast => nil
parse(maybe_parser, 'c') is! ok
Can be a class method
# example: or as a class method
int_or_number_parser = OOPeg::Parser.or(int_parser, set_parser('one', 'two'))
parse(int_or_number_parser, '12').ast => 12
parse(int_or_number_parser, 'one').ast => 'one'
parse(int_or_number_parser, 'two').ast => 'two'
parse(int_or_number_parser, 'three') not! ok
and (oftentimes called sequence in other PEG Parsers)
Can be an instance method
# example: and as an instance method
# yet another implementation of set_parser("abc")
abc_parser = char_parser('a').and(char_parser('b'), char_parser('c'))
parse(abc_parser, 'abc').ast => %w[a b c]
parse(abc_parser, 'bca') not! ok
Can be a class method
# example: and as a class method
abc_parser = OOPeg::Parser.and(char_parser('a'), char_parser('b'), char_parser('c'))
parse(abc_parser, 'abc').ast => %w[a b c]
parse(abc_parser, 'bca') not! ok
Sometimes we do not want some parsed text in our AST, therefore the .and combinator ignores nil ASTs.
# example: ignore a sign (at your own peril of course)
parser = OOPeg::Parser.and( char_parser('+').map {nil}, char_class_parser(:digit).many.joined )
parse(parser, "+42").ast => ['42']
As it is a little bit cumbersome to write map { nil } we made a more idiomatic way to express this behavior:
ignore
# example: a better way of ignorance
parser = OOPeg::Parser.and( char_parser('+').ignore, char_class_parser(:digit).many.joined )
parse(parser, "+42").ast => ['42']
not negation
# example: what is not a character
parse(char_class_parser(:digit).not.joined, 'a').ast => 'a'
parse(char_class_parser(:digit).not, '7') not! ok
Practical use of not is when it is combined with many.
Like in regular expressions where \S is oftentimes combined to \S+
# example: not whitespaces
not_ws_parser = ws_parser.not.many(min: 1).joined
parse(not_ws_parser, ' ') not! ok
parse(not_ws_parser, 't-32 ').ast => 't-32'
satifsfy can fail a successful result does not touch failing results
This, e.g. is used in the kwd_parser or set_parser
# example: an odd parser
odd_parser = int_parser.satisfy(name: "oddy", &:odd?)
parse(odd_parser, '11').ast => 11
parse(odd_parser, '10') not! ok
parse(odd_parser, '10').error => 'oddy failed'
N.B. it can also modify the result
even_parser = int_parser.satisfy { |n|
n.even? ? [:ok, n/2] : [:error, "#{n}'s odd"]
}
parse(even_parser, '84').ast => 42
parse(even_parser, '73').error => "73's odd"
Change result for succeeding and failing parsers: map_or_rename
This allows us to modify the name or error message in case of failure and still transforming the ast if a block is given.
# example: A better odd parser?
better = int_parser.satisfy(&:odd?).map_or_rename(error: "Oh no", &:succ)
parse(better, '11').ast => 12
parse(better, '12').error => "Oh no"
# example: No need to provide a block
parse(char_parser.map_or_rename(name: "My Parser"), '') not! ok
Lookahead, an important concept…
We can just check on the future \o/
# example: Look ahead, do not advance
result = parse(int_parser.lookahead, '42')
result.ast is! nil
result is! ok
result.input.pos => 1
result.input.content => %w[4 2]
# example: Look ahead, be disappointed
parse(int_parser.lookahead, 'alpha') not! ok
What About Recursive Parsers?
Let us assume that we want to parse a language that is recursive, how can we create a recursive parser that does not loop up to a stack overflow?
Here is a toy example to introduce the concept (but it will come in very handy later in our advanced OOPeg::Parsers::Advanced::SexpParser .
A parser for the language a^n • b^n, the naïve way.
# example: Look Ma' a stack overflow
$ab_count = 0
def ab_parser
$ab_count += 1
return true_parser if $ab_count == 1_000
char_parser('a').and(ab_parser, char_parser('b')).or(true_parser)
end
parse(ab_parser, 'aabb')
$ab_count => 1_000
This problem can be solved by delaying the recursively called parser, which is done, with the aptly named:
delay combinator
# example: Look Ma' it's working now
$ab_count = 0
def ab_parser
$ab_count += 1
return true_parser if $ab_count == 1_000
char_parser('a').and(delay {ab_parser}, char_parser('b')).joined.or(true_parser)
end
parse(ab_parser, 'aabb').ast => "aabb"
$ab_count => 3
Defined Under Namespace
Modules: Basics, Lazy, Mappers, Repeaters, Satisfiers, Selectors
Instance Method Summary collapse
Instance Method Details
#_maybe(parser:, name: nil, replace_with: nil) ⇒ Object
65 66 67 68 69 70 71 72 73 74 |
# File 'lib/oo_peg/parser/combinators/repeaters.rb', line 65 def _maybe(parser:, name: nil, replace_with: nil) Parser.new(name || "maybe(#{parser.name})") do |input, name| case Parser.parse(parser, input) in {ok: false} Result.ok(ast: replace_with, input:) in success success end end end |
#_sequence(*parsers, name:) ⇒ Object
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 |
# File 'lib/oo_peg/parser/combinators/repeaters.rb', line 76 def _sequence(*parsers, name:) parsers = parsers.flatten raise ArgumentError, "all parsers must be instances of Parser" unless parsers.all? { Parser === it } name ||= "seq(#{parsers.map(&:name).join(", ")})" Parser.new(name) do |input| original_input = input result = parsers.reduce_while [input, []] do |(input, ast), parser| parsed = Parser.parse(parser, input) # require "debug"; binding.break if parsed.input.nil? case parsed in {ok: true, ast: nil, input:} cont_reduce([input, ast]) in {ok: true, ast: ast_node, input:} cont_reduce([input, [*ast, ast_node]]) in {ok: false, error:} halt_reduce(Result.nok(input: original_input, error:, parser_name: name)) end end case result in {ok: false} => error result in [input, ast] Result.ok(ast:, input:) end end end |