Tree Representation Module

This module supports the creation, search, manipulation, and serialization of tree structures.

Trees are implemented with Treebank::Node objects. Each Node has a writable label that may be any arbitrary object and a list of other child Node objects.

> require 'treebank'
=> true
> p = Treebank::Node.new('parent')
=> <Treebank::Node parent []>
> p.create_child!('child1')
=> <Treebank::Node child1 []>
> p.create_child!('child2')
=> <Treebank::Node child2 []>

Node has a subclass Treebank::ParentedNode that keeps track of the parent of the given node and has methods for iterating up the ancestor tree.

Node objects support breadth- and depth-first iteration. The functions each_depth_first and each_breadth_first yield a node and its descendants in the specified order. The functions depth_first_enumerator and breadth_first_enumerator wrap these functions inside Enumerator objects. See the function documentation for details as to how the enumeration may be controlled.

The default stringification method writes a node and all its children in a bracketed tree format.

> puts p
(parent
  (child1 )
  (child2 ))

Bracketed tree strings can be used to create Node trees.

> t = Treebank::Node.from_s('(parent (child1) (child2))')
=> <Treebank::Node parent [child1 child2]>
> puts t
(parent
  (child1 )
  (child2 ))

The bracketed tree format is the one used by the Penn Treebank Project to annotate linguistic structure.

The children of a node can be dereferenced with the array operator using a list of child offsets. For example:

> t = Treebank::Node.from_s(“(A (B b) (C (D d) (E e)))”) => <Treebank::Node A [B C]> > t => <Treebank::Node D [d]>

Here the index (1,0) finds the 0th child of the node’s 1st child, which is the node (D d).

History

1-0-0

First release

1-1-0

Removed unnecessary fsa dependency from gemspec

2-0-0

Changed from_s initialization

2-1-0

Add indented multiline stringification; Add preterminal?

3-0-0

Add Node.children function. Add child deferencing by index. The each_depth_first and each_breadth_first functions now yield nodes instead of returning private enumerator objects. To get enumerator objects, use depth_first_enumerator and breadth_first_enumerator instead. Likewise, the each_parent function in the ParentedNode class yields nodes, while parent_enumerator returns an enumerator.

See Also

Lingua::Treebank implements similar functionality in Perl.

Copyright

Copyright 2006-2008, William Patrick McNeill

This program is distributed under the GNU General Public License.

Author

W.P. McNeill [email protected]