Class: Treebank::Parser
Overview
A parser for string representations of trees.
This class uses a simplified shift-reduce parser to convert a string into a list of tree structures.
Treebank::Parser.new('(A) (B (C) (D))').collect
=> [<Treebank::Node A []>, <Treebank::Node B [C D]>]
The string representation of a list of trees has the following BNF definition:
    trees    -> node*
    node     -> (label? children)
    label    -> word
    children -> node* | word
    word     -> w+
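The terminal symbols of this grammar are the two brackets and words. Here is a rough sketch of how a string might be split into those tokens. It assumes that a word (w+) is any run of characters other than whitespace and brackets. The method name tokenize is hypothetical, and the actual TokenStream class may tokenize differently.

```ruby
# Hypothetical tokenizer for the grammar above. Assumption: a bracket is a
# one-character token, and a word is a maximal run of characters that are
# neither brackets nor whitespace. Whitespace only separates tokens.
def tokenize(text)
  text.scan(/[()]|[^\s()]+/)
end

p tokenize('(A) (B (C) (D))')
# prints ["(", "A", ")", "(", "B", "(", "C", ")", "(", "D", ")", ")"]
```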
Note that the BNF definition of children allows a shortcut in which the labels of terminal nodes may be specified without brackets. For example, (A (B)) and (A B) are equivalent.
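The following sketch shows the shortcut concretely with a small recursive-descent parser for the grammar. This is a swapped-in technique, not the shift-reduce approach the class itself uses. Each node is represented as a hypothetical [label, children] array. A bare child word is expanded into a childless node, so both spellings produce the same structure. The names parse_trees and parse_node are illustrative only, and the tokenizing rule is an assumption.

```ruby
# Sketch only: a recursive-descent parser for the BNF above (not the
# shift-reduce parser Treebank uses). Each node is a [label, children] pair.
def parse_trees(text)
  # Assumed tokenizing rule: brackets, or runs of non-space, non-bracket chars.
  tokens = text.scan(/[()]|[^\s()]+/)
  trees = []
  trees << parse_node(tokens) until tokens.empty?   # trees -> node*
  trees
end

def parse_node(tokens)
  raise "expected (" unless tokens.shift == "("
  # label? : a word directly after the opening bracket.
  label = ["(", ")"].include?(tokens.first) ? nil : tokens.shift
  children = []
  if tokens.first && !["(", ")"].include?(tokens.first)
    # children -> word : the shortcut for a childless terminal node.
    children << [tokens.shift, []]
  else
    # children -> node*
    children << parse_node(tokens) while tokens.first == "("
  end
  raise "expected )" unless tokens.shift == ")"
  [label, children]
end

p parse_trees('(A (B))')                          # prints [["A", [["B", []]]]]
p parse_trees('(A (B))') == parse_trees('(A B)')  # prints true
```

Because children is either a single word or a list of nodes, input such as (A B C) is rejected rather than silently accepted.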
The trees produced by this class are instances of a caller-specified node class (passed as node_class to the constructor, Node by default), where each node has a list of child nodes.
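The hypothetical SketchNode below illustrates a caller-defined node. It stores a label and a list of children, and it can render itself back into the bracketed format. Whether Treebank::Parser can instantiate such a class through node_class depends on the constructor interface it expects. This page does not document that interface, so treat the class as an illustration of the data shape only.

```ruby
# Hypothetical node class: a label plus an array of child nodes. Whether
# Treebank::Parser can build it via node_class is not documented here.
SketchNode = Struct.new(:label, :children) do
  # Render the node back into the bracketed string format.
  def to_s
    inner = [label, *children.map(&:to_s)].compact.join(" ")
    "(#{inner})"
  end

  # Collect the labels of childless (terminal) nodes, left to right.
  def leaves
    children.empty? ? [label] : children.flat_map(&:leaves)
  end
end

tree = SketchNode.new("B", [SketchNode.new("C", []), SketchNode.new("D", [])])
puts tree.to_s   # prints (B (C) (D))
p tree.leaves    # prints ["C", "D"]
```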
Instance Method Summary
- #each ⇒ Object
  Enumerate the tokens, yielding trees.
- #initialize(tokens, node_class = Node) ⇒ Parser
  Constructor.
  - tokens: Stream of tokens to be converted into trees
  - node_class: Class of node to create
Constructor Details
#initialize(tokens, node_class = Node) ⇒ Parser
- tokens: Stream of tokens to be converted into trees
- node_class: Class of node to create
If tokens is not a kind of TokenStream, it is used as the source of a new TokenStream. This is what lets a plain String be passed directly, as in the overview example.
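The coercion rule can be sketched with a hypothetical stand-in for TokenStream. MiniTokenStream exposes only the members that #each uses (left, right and each), and its tokenizing rule is an assumption. The helper coerce_tokens is also hypothetical. It mirrors the kind_of? check in the constructor: existing streams pass through unchanged, and anything else becomes the source of a new stream.

```ruby
# Hypothetical stand-in for TokenStream, exposing what Parser#each uses.
class MiniTokenStream
  include Enumerable
  attr_reader :left, :right

  def initialize(source)
    @source = source
    @left = "("
    @right = ")"
  end

  # Assumed rule: yield brackets and whitespace-delimited words in order.
  def each(&block)
    @source.scan(/[()]|[^\s()]+/).each(&block)
  end
end

# Mirrors the constructor: wrap anything that is not already a stream.
def coerce_tokens(tokens)
  tokens.kind_of?(MiniTokenStream) ? tokens : MiniTokenStream.new(tokens)
end

stream = coerce_tokens("(A B)")
p stream.to_a                           # prints ["(", "A", "B", ")"]
p coerce_tokens(stream).equal?(stream)  # prints true
```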
```ruby
# File 'lib/treebank.rb', line 113
def initialize(tokens, node_class = Node)
  tokens = TokenStream.new(tokens) if not tokens.kind_of? TokenStream
  @tokens = tokens
  @node_class = node_class
end
```
Instance Method Details
#each ⇒ Object
Enumerate the tokens, yielding each tree as soon as it is complete.
```ruby
# File 'lib/treebank.rb', line 120
def each # :yields: tree
  parse = []
  @tokens.each do |token|
    case token
    when @tokens.left
      # Shift an open-bracket marker.
      parse << :left
    when @tokens.right
      # Reduce the end of the parse stack.
      left_index = parse.rindex(:left)
      raise "Extra #{@tokens.right}" if left_index.nil?
      parse[left_index..-1] = reduce(parse[left_index+1..-1])
      # If the reduced stack consists of a single node, it must be
      # a complete tree.
      yield parse.pop if parse.length == 1
    else
      # Shift a word.
      parse << token
    end # case
  end # do
  # A non-empty stack at the end means some input was never reduced to a tree.
  raise "Extra #{@tokens.left}: #{parse}" if not parse.empty?
end
```
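The loop above can be tried without the library using the following sketch, in which trees are plain [label, children] arrays. The private reduce method is not shown on this page. The hypothetical reduce_span stands in for it: it treats a leading word as the label and expands a bare child word into a childless node, following the grammar in the overview. The names each_tree, reduce_span, LEFT and RIGHT are illustrative, and the tokenizing rule is assumed.

```ruby
# Sketch of the shift-reduce loop in Parser#each, without the library.
# Trees are [label, children] arrays; the tokenizing rule is assumed.
LEFT = "("
RIGHT = ")"

# Hypothetical stand-in for the private reduce method: a leading word is the
# label, and a bare word child becomes a childless node.
def reduce_span(items)
  label = items.first.is_a?(String) ? items.shift : nil
  children = items.map { |item| item.is_a?(String) ? [item, []] : item }
  [label, children]
end

def each_tree(text)
  parse = []
  text.scan(/[()]|[^\s()]+/).each do |token|
    case token
    when LEFT
      parse << :left                       # shift an open-bracket marker
    when RIGHT
      left_index = parse.rindex(:left)     # innermost unclosed bracket
      raise "Extra #{RIGHT}" if left_index.nil?
      # Replace the marker and everything after it with one node. The node is
      # wrapped in an Array so that Ruby does not splice its elements in.
      parse[left_index..-1] = [reduce_span(parse[left_index + 1..-1])]
      yield parse.pop if parse.length == 1 # a lone node is a complete tree
    else
      parse << token                       # shift a word
    end
  end
  raise "Extra #{LEFT}: #{parse}" unless parse.empty?
end

each_tree('(A) (B (C) (D))') { |tree| p tree }
# prints ["A", []]
# prints ["B", [["C", []], ["D", []]]]
```

Because a tree is yielded as soon as the stack shrinks to a single node, each tree reaches the caller before later trees are parsed.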