Tree Representation Module
This module supports the creation, search, manipulation, and serialization of tree structures.
Trees are implemented with Treebank::Node objects. Each Node has a writable label that may be any arbitrary object and a list of other child Node objects.
> require 'treebank'
=> true
> p = Treebank::Node.new('parent')
=> <Treebank::Node parent []>
> p.create_child!('child1')
=> <Treebank::Node child1 []>
> p.create_child!('child2')
=> <Treebank::Node child2 []>
Node has a subclass Treebank::ParentedNode that keeps track of the parent of the given node and has methods for iterating up the ancestor tree.
Node objects support breadth- and depth-first iteration. The functions each_depth_first and each_breadth_first yield a node and its descendants in the specified order. The functions depth_first_enumerator and breadth_first_enumerator wrap these functions inside Enumerator objects. See the function documentation for details as to how the enumeration may be controlled.
The default stringification method writes a node and all its children in a bracketed tree format.
> puts p
(parent
(child1 )
(child2 ))
Bracketed tree strings can be used to create Node trees.
> t = Treebank::Node.from_s('(parent (child1) (child2))')
=> <Treebank::Node parent [child1 child2]>
> puts t
(parent
(child1 )
(child2 ))
The bracketed tree format is the one used by the Penn Treebank Project to annotate linguistic structure.
The children of a node can be dereferenced with the array operator using a list of child offsets. For example:
> t = Treebank::Node.from_s(“(A (B b) (C (D d) (E e)))”) => <Treebank::Node A [B C]> > t => <Treebank::Node D [d]>
Here the index (1,0) finds the 0th child of the node’s 1st child, which is the node (D d)
.
History
- 1-0-0
-
First release
- 1-1-0
-
Removed unnecessary fsa dependency from gemspec
- 2-0-0
-
Changed from_s initialization
- 2-1-0
-
Add indented multiline stringification; Add preterminal?
- 3-0-0
-
Add Node.children function. Add child deferencing by index. The each_depth_first and each_breadth_first functions now yield nodes instead of returning private enumerator objects. To get enumerator objects, use depth_first_enumerator and breadth_first_enumerator instead. Likewise, the each_parent function in the ParentedNode class yields nodes, while parent_enumerator returns an enumerator.
See Also
Lingua::Treebank implements similar functionality in Perl.
Copyright
Copyright 2006-2008, William Patrick McNeill
This program is distributed under the GNU General Public License.
Author
W.P. McNeill [email protected]