Class: Treebank::Parser

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/treebank.rb

Overview

A parser for string representations of trees.

This class uses a simplified shift-reduce parser to convert a string into a list of tree structures.

Treebank::Parser.new('(A) (B (C) (D))').collect
=> [<Treebank::Node A []>, <Treebank::Node B [C D]>]
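The example above works because the class includes Enumerable and defines each. The following sketch illustrates that mechanism with a hypothetical stand-in class, CannedTrees, which is not part of the library and yields canned labels instead of parsed trees.

```ruby
# Hypothetical stand-in, not the library's Parser: it yields canned labels
# where the real class would yield parsed trees.
class CannedTrees
  include Enumerable

  def initialize(labels)
    @labels = labels
  end

  # Enumerable only requires each; collect, first, select, and to_a build on it.
  def each
    @labels.each { |label| yield label }
  end
end

source = CannedTrees.new(%w[A B])
source.collect { |label| label.downcase }  # => ["a", "b"]
source.to_a                                # => ["A", "B"]
```

On Ruby 1.9 and later, calling collect without a block returns an Enumerator rather than an array, so to_a is the more direct way to gather every result.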

The string representation of a list of trees has the following BNF definition:

  • trees -> node*

  • node -> (label? children)

  • label -> word

  • children -> node*|word

  • word -> w+

Note that the BNF definition of children allows a shortcut in which the labels of terminal nodes may be specified without brackets. So, for example, (A (B)) and (A B) are equivalent.
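To make the grammar and the shortcut concrete, here is a minimal recursive-descent reader for the same notation. It is a sketch under stated assumptions, not the library's implementation (which is shift-reduce): SketchNode, tokenize, read_node, and read_trees are hypothetical names, a lone word inside brackets is treated as the label, and words mixed in among child nodes are accepted even though the BNF is stricter.

```ruby
# Illustrative sketch only: none of these names belong to the treebank library.
SketchNode = Struct.new(:label, :children) do
  def to_s
    children.empty? ? "(#{label})" : "(#{label} #{children.join(' ')})"
  end
end

# Split the input into brackets and words.
def tokenize(string)
  string.scan(/[()]|[^\s()]+/)
end

# node -> (label? children)
def read_node(tokens)
  raise ArgumentError, "expected (" unless tokens.shift == "("
  label = nil
  children = []
  until tokens.first == ")"
    raise ArgumentError, "missing )" if tokens.empty?
    if tokens.first == "("
      children << read_node(tokens)
    elsif label.nil? && children.empty?
      label = tokens.shift                          # first bare word is the label
    else
      children << SketchNode.new(tokens.shift, [])  # shortcut: bare word as terminal
    end
  end
  tokens.shift # consume ")"
  SketchNode.new(label, children)
end

# trees -> node*
def read_trees(string)
  tokens = tokenize(string)
  trees = []
  trees << read_node(tokens) until tokens.empty?
  trees
end

puts read_trees("(A (B))") == read_trees("(A B)")  # prints true
```

Both spellings produce a node labelled A with a single childless node B beneath it, which is the equivalence the note describes.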

The trees returned by this class are caller-defined node objects, where each node has a list of child nodes.

Constructor Details

#initialize(tokens, node_class = Node) ⇒ Parser

tokens

Stream of tokens to be converted into trees

node_class

Class of node to create

If tokens is not a TokenStream, it is wrapped in a new TokenStream and used as that stream's source.

# File 'lib/treebank.rb', line 113

def initialize(tokens, node_class = Node)
  # Wrap anything that is not already a TokenStream.
  tokens = TokenStream.new(tokens) unless tokens.kind_of?(TokenStream)
  @tokens = tokens
  @node_class = node_class
end

Instance Method Details

#each ⇒ Object

Enumerates the tokens, yielding each complete tree as soon as it has been parsed.

# File 'lib/treebank.rb', line 120

def each # :yields: tree
  parse = [] # Parse stack: :left markers, words, and reduced nodes.
  @tokens.each do |token|
    case token
    when @tokens.left
      parse << :left
    when @tokens.right
      # Reduce the end of the parse stack.
      left_index = parse.rindex(:left)
      raise "Extra #{@tokens.right}" if left_index.nil?
      parse[left_index..-1] = reduce(parse[left_index+1..-1])
      # If the reduced stack consists of a single node, it must be
      # a complete tree.
      yield parse.pop if parse.length == 1
    else
      parse << token
    end # case
  end # do
  raise "Extra #{@tokens.left}: #{parse}" unless parse.empty?
end
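
The listing above relies on a reduce method that is not shown in this section. The following self-contained sketch replays the same stack discipline over a plain array of tokens so the shift and reduce steps can be traced. StackNode, reduce_sketch, and shift_reduce are hypothetical stand-ins, and reduce_sketch is only a guess at the kind of work the real reduce does: it treats a leading word as the label and wraps any remaining words as childless nodes.

```ruby
# Illustrative sketch only: these names are not part of the treebank library.
StackNode = Struct.new(:label, :children)

# Turn the items between a matching pair of brackets into one node.
def reduce_sketch(items)
  label = items.first.is_a?(String) ? items.first : nil
  rest = label ? items.drop(1) : items
  kids = rest.map { |item| item.is_a?(String) ? StackNode.new(item, []) : item }
  StackNode.new(label, kids)
end

def shift_reduce(tokens)
  stack = []
  tokens.each do |token|
    case token
    when "("
      stack << :left                       # shift a bracket marker
    when ")"
      left_index = stack.rindex(:left)
      raise "Extra )" if left_index.nil?
      # Replace everything from the last :left onward with one node.
      stack[left_index..-1] = [reduce_sketch(stack[(left_index + 1)..-1])]
      # A stack holding a single node means a complete tree.
      yield stack.pop if stack.length == 1
    else
      stack << token                       # shift a word
    end
  end
  raise "Extra (: #{stack.inspect}" unless stack.empty?
end

shift_reduce(%w[( A ) ( B ( C ) ( D ) )]) { |tree| puts tree.label }
# prints A, then B
```

Because a tree is yielded the moment the stack collapses to a single node, the caller receives trees one at a time, and an unbalanced left bracket is only detected once the token stream is exhausted.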