Class: Treebank::TokenStream

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/treebank.rb

Overview

An enumerable list of tokens in a string representation of a tree

This class enumerates over a source to produce the tokens needed to parse a string representation of a tree. The source is an enumerable object whose each method yields String objects, for example a file or a single String. Each yielded string is split into tokens at the left and right brackets and at whitespace. The default brackets are ‘(’ and ‘)’, but different delimiters may be passed to the constructor.

Treebank::TokenStream.new('(A (B c) (D))').to_a
=> ["(", "A", "(", "B", "c", ")", "(", "D", ")", ")"]
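The overview only requires that the source respond to each and yield String objects. The sketch below shows three such source shapes using the Ruby standard library alone; it does not call Treebank itself. The names TREE_TEXT and source_shapes are hypothetical and exist only for this illustration. Since Ruby 1.9, String no longer defines each, so on a current interpreter a bare String may need to be wrapped as shown; whether a given Treebank release handles that case itself is not stated on this page.

```ruby
require 'stringio'

# Hypothetical sample text, used only for this illustration.
TREE_TEXT = "(A (B c)\n(D))"

# Hypothetical helper: returns objects whose #each yields String objects.
def source_shapes(text)
  {
    array: [text],             # yields the whole string once
    lines: text.each_line,     # Enumerator that yields one line at a time
    io:    StringIO.new(text)  # IO-like object, yields one line at a time
  }
end

source_shapes(TREE_TEXT).each do |name, source|
  chunks = []
  source.each { |string| chunks << string }
  puts "#{name}: #{chunks.inspect}"
end
```

An open File yields lines in the same way as the StringIO object, which is why a file can serve as a source directly.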

Constructor Details

#initialize(source, left = '(', right = ')') ⇒ TokenStream

Create a stream of tokens from an enumerable source.

source

The enumerable source of strings to tokenize

left

Left bracket symbol

right

Right bracket symbol



# File 'lib/treebank.rb', line 50

def initialize(source, left = '(', right = ')')
  @source = source
  @left = left
  @right = right
  # Escape the '[' and ']' characters in the character class
  # regular expression.
  cc_left = (left == '[') ? "\\#{left}" : left
  cc_right = (right == ']') ? "\\#{right}" : right
  # Delimit by left and right brackets, e.g. /\(|\)|[^()]+/
  @s_regex = Regexp.new("\\#{@left}|\\#{@right}|[^#{cc_left}#{cc_right}]+")
end
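To see what pattern the constructor ends up with, the sketch below repeats its regular expression construction in a standalone helper. The names build_token_regex, ROUND, and SQUARE are hypothetical and are not part of the library. Note that the pattern alone leaves whitespace inside the non-bracket runs, so the whitespace splitting described in the overview presumably happens in tokenize_string, whose body is not shown on this page.

```ruby
# Hypothetical helper mirroring the regex construction in #initialize.
def build_token_regex(left = '(', right = ')')
  # Escape '[' and ']' so they are literal inside the character class.
  cc_left = (left == '[') ? "\\#{left}" : left
  cc_right = (right == ']') ? "\\#{right}" : right
  Regexp.new("\\#{left}|\\#{right}|[^#{cc_left}#{cc_right}]+")
end

ROUND = build_token_regex
SQUARE = build_token_regex('[', ']')

puts ROUND.source                    # prints \(|\)|[^()]+
puts '(A (B c))'.scan(ROUND).inspect # prints ["(", "A ", "(", "B c", ")", ")"]
puts SQUARE.source                   # prints \[|\]|[^\[\]]+
puts '[A [B c]]'.scan(SQUARE).inspect
```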

Instance Attribute Details

#left ⇒ Object (readonly)

The left delimiter



# File 'lib/treebank.rb', line 40

def left
  @left
end

#right ⇒ Object (readonly)

The right delimiter



# File 'lib/treebank.rb', line 43

def right
  @right
end

Instance Method Details

#each ⇒ Object

Enumerate the tokens in the source.



# File 'lib/treebank.rb', line 63

def each
  @source.each do |string|
    tokenize_string(string) {|token| yield token}
  end
end
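Because the class defines each and includes Enumerable, the usual Enumerable methods become available on a token stream. The stand-in class below is a minimal sketch of that pattern, not the Treebank source: SketchTokenStream and SKETCH are hypothetical names, the tokenize_string body is an assumption, and the sketch uses Regexp.escape and excludes whitespace in the character class as its own simplification.

```ruby
# Stand-in for illustration only; not the Treebank implementation.
class SketchTokenStream
  include Enumerable

  def initialize(source, left = '(', right = ')')
    @source = source
    l = Regexp.escape(left)
    r = Regexp.escape(right)
    # A bracket, or a run of characters that are neither brackets
    # nor whitespace (this sketch's simplification).
    @s_regex = Regexp.new("#{l}|#{r}|[^#{l}#{r}\\s]+")
  end

  # Same shape as the #each shown above: delegate to the source and
  # yield every token found in each string.
  def each
    return enum_for(:each) unless block_given?
    @source.each do |string|
      tokenize_string(string) { |token| yield token }
    end
  end

  private

  # Assumed behavior: yield each match of the token pattern in order.
  def tokenize_string(string)
    string.scan(@s_regex) { |token| yield token }
  end
end

SKETCH = SketchTokenStream.new(['(A (B c)', '(D))'])

puts SKETCH.to_a.inspect
puts SKETCH.reject { |t| '()'.include?(t) }.inspect # prints ["A", "B", "c", "D"]
puts SKETCH.count('(')                              # prints 3
```

Each Enumerable call re-runs each over the source, so a source that can only be read once, such as an open file, would need to be rewound between calls.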