yaparc
Synopsis
There are several implementations of parser combinator in Ruby. This is an yet another simple combinator parser library in Ruby.
Install
gem install yaparc
Usage
In combinator parser, each parser is construct as a function taking input string as arguments. Larger parsers are built from smaller parsers. Although combinators are higher-order functions in ordinary functional languages, they are constructed as classes in yaparc, because Ruby has more object-oriented than functional property.
All parsers has parse
method, each of which takes input string as its arguments except Yaparc::Satisfy parser. Every parser returns either Yaparc::Result::OK or Yaparc::Result::Fail as their result of parsing. An instance of Yaparc::Result::Fail denotes faiilure, and instance of Yaparc::Result::OK indicates success.
Primitive Parsers
-
Yaparc::Succeed
-
Yaparc::Fail
-
Yaparc::Item
-
Yaparc::Satisfy
Succeed class
The parser Yaparc::Succeed always succeeds with the result value, without consuming any of the input string. In the following example, Yaparc::Succeed#parse takes an input string blah, blah, blah
and returns the singleton array [[1, "blah, blah, blah"]]
.
parser = Yaparc::Succeed.new(1)
parser.parse("blah, blah, blah")
#=> #<Yaparc::Result::OK:0xb7aaaf5c @input="blah, blah, blah", @value=1>
Fail class
The parser Yaparc::Fail always fails, regardless of the contents of the input string.
parser = Yaparc::Fail.new
parser.parse("abc")
#=> #<Yaparc::Result::Fail:0xb7aa56b0 @value=nil>
Item class
The parser Yaparc::Item fails if the input string is empty, and succeeds with the first character as the result value otherwise.
parser = Yaparc::Item.new
parser.parse("abc")
#=> #<Yaparc::Result::OK:0xb7a9fdb4 @input="bc", @value="a">
Satisfy class
The parser Yaparc::Satisfy recognizes a single input via predicate which determines if an arbitrary input is suitable for the predicate.
is_integer = lambda do |i|
begin
Integer(i)
true
rescue
false
end
end
parser = Yaparc::Satisfy.new(is_integer)
parser.parse("123")
#=> #<Yaparc::Result::OK:0xb7a8f284 @input="23", @value="1">
Combining Parsers
-
Yaparc::Alt
-
Yaparc::Seq
-
Yaparc::Many
-
Yaparc::ManyOne
Sequencing parser
The Yaparc::Seq corresponds to sequencing in BNF. The following parser recognizes anything that Symbol.new('+')
or Natural.new
would if placed in succession.
parser = Seq.new(Symbol.new('+'), Natural.new)
parser.parse("+321")
#=> #<Yaparc::Result::OK:0xb7a81ae4 @input="", @value=321>
If a block given to Yaparc::Seq, it analyses input string to construct its logical structure.
parser = Yaparc::Seq.new(Yaparc::Symbol.new('+'), Yaparc::Natural.new) do |plus, nat|
nat
end
parser.parse("+1234")
#=> #<Yaparc::Result::OK:0xb7a70a00 @input="", @value=1234>
It produces a parse tree which expounds the semantic structure of the program.
Alternation parser
The parser Yaparc::Alt class is an alternation parser, which returns the result of the first parser to succeed, and failure if neither does.
parser = Yaparc::Alt.new(
Yaparc::Seq.new(Yaparc::Symbol.new('+'), Yaparc::Natural.new) do |_, nat|
nat
end,
Yaparc::Natural.new
)
parser.parse("1234")
#=> #<Yaparc::Result::OK:0xb7a5a610 @input="", @value=1234>
parser.parse("-1234")
#=> #<Yaparc::Result::Fail:0xb7a57ba4 @value=nil>
Many
In Yaparc::Many, zero or more applications of parser are admissible.
parser = Yaparc::Many.new(Yaparc::Satisfy.new(lambda {|i| i > '0' and i < '9'}))
parser.parse("123abc")
#=> #<Yaparc::Result::OK:0xb7a49dc4 @input="abc", @value="123">
ManyOne
The Yaparc::ManyOne requires at least one successfull application of parser.
Tokenized parser
- Yaparc::Identifier
-
Parser for identifier
- Yaparc::Natural
-
Parser for natural number
- Yaparc::Symbol
-
Parser for symbol
Regex parser
parser = Regex.new(/\A[0-9]+/)
result = parser.parse("1234ab")
assert_equal '1234', result.value
parser = Regex.new(/([0-9]+):([a-z]+)/) do |match1, match2|
[match2,match1]
end
result = parser.parse("1234:ab")
assert_equal ["ab", "1234"], result.value
Define your own parser
In order to construct parsers, you make parser class to be inherited from Yaparc::AbstractParser class.
class Identifier < Yaparc::AbstractParser
def initialize
@parser = lambda do
Yaparc::Tokenize.new(Yaparc::Ident.new)
end
end
end
If you want to nest the same parser class in the parser definition, you have to choose this way. In the following example, note that Expr
class is instantiated inside Expr#initialize
method.
class Expr < Yaparc::AbstractParser
def initialize
@parser = lambda do
Yaparc::Alt.new(
Yaparc::Seq.new(Term.new,
Yaparc::Symbol.new('+'),
Expr.new) do |term, _, expr|
['+', term, expr]
end,
Term.new
)
end
end
end
Constructing your parsers, it should be noted that left-recursion leads to non-termination of the parser.
Avoiding left-recursion
A ::= A B | C
is equivalent to
A ::= C B*
Tokenization
When you want to tokenize input stream, use Yaparc::Tokenize class.
About this project
- RubyGems.org
- RubyForge (archived)
-
web.archive.org/web/20140515235842/http://rubyforge.org/projects/yaparc/
Development
After checking out the repo, run bin/setup
to install dependencies. Then, run rake test
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install
. To release a new version, update the version number in the library file, and then run bundle exec rake release
, which will create a git tag for the version, push git commits and the created tag, and push the .gem
file to [rubygems.org](rubygems.org).
Contributing
Bug reports and pull requests are welcome.