Class: StanfordParser::StandoffNode
- Inherits: Treebank::ParentedNode
  - Object
  - Treebank::ParentedNode
  - StanfordParser::StandoffNode
- Defined in: lib/stanfordparser.rb
Overview
Standoff syntactic tree annotation of text. Terminal nodes are labeled with the appropriate StandoffToken objects. Standoff parses can reproduce the original string from which they were generated verbatim, optionally with brackets around the yields of specified non-terminal nodes.
Instance Method Summary
- #initialize(stanford_parser_node, tokens) ⇒ StandoffNode (constructor)
  Create the standoff tree from a tree returned by the Stanford parser.
- #to_bracketed_string(coords, open = "[", close = "]") ⇒ Object
  Return the original string with brackets around the word spans dominated by the specified constituents.
- #to_original_string ⇒ Object
  Return the original text string dominated by this node.
Constructor Details
#initialize(stanford_parser_node, tokens) ⇒ StandoffNode
Create the standoff tree from a tree returned by the Stanford parser. For non-terminal nodes, the tokens argument will be a StandoffSentence containing the StandoffToken objects representing all the tokens beneath and after this node. For terminal nodes, the tokens argument will be a StandoffToken.
# File 'lib/stanfordparser.rb', line 357

def initialize(stanford_parser_node, tokens)
  # Annotate this node with a non-terminal label or a StandoffToken as
  # appropriate.
  super(tokens.instance_of?(StandoffSentence) ?
        stanford_parser_node.value : tokens)
  # Enumerate the children depth-first. Tokens are removed from the list
  # left-to-right as terminal nodes are added to the tree.
  stanford_parser_node.children.each do |child|
    subtree = self.class.new(child, child.leaf? ? tokens.shift : tokens)
    attach_child!(subtree)
  end
end
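The token hand-off in the constructor can be hard to follow from the listing alone. The following is a minimal sketch of the same idea, not the library's code: `ParseNode`, `SketchNode`, and the `leaf` helper are hypothetical stand-ins, a plain Array plays the role of the StandoffSentence, and strings play the role of StandoffToken objects.

```ruby
# Hypothetical stand-in for a Stanford parser tree node (not the real API).
ParseNode = Struct.new(:value, :children) do
  def leaf?
    children.empty?
  end
end

# Hypothetical stand-in for StandoffNode, without the Treebank superclass.
class SketchNode
  attr_reader :label, :children

  def initialize(parse_node, tokens)
    # Non-terminals receive the shared token list and keep the parser's
    # label; terminals receive a single token and use it as their label.
    @label = tokens.is_a?(Array) ? parse_node.value : tokens
    @children = parse_node.children.map do |child|
      # A leaf consumes the next unused token. An inner node gets the same
      # shared list, so its leaves keep consuming from the same queue.
      SketchNode.new(child, child.leaf? ? tokens.shift : tokens)
    end
  end

  def leaves
    children.empty? ? [self] : children.flat_map(&:leaves)
  end
end

# Helper for building terminal parse nodes (illustrative only).
def leaf(word)
  ParseNode.new(word, [])
end

# (S (NP (DT the) (NN dog)) (VP (VBZ barks)))
parse = ParseNode.new("S", [
  ParseNode.new("NP", [ParseNode.new("DT", [leaf("the")]),
                       ParseNode.new("NN", [leaf("dog")])]),
  ParseNode.new("VP", [ParseNode.new("VBZ", [leaf("barks")])])
])

tokens = ["The", "dog", "barks"] # stand-ins for StandoffToken objects
tree = SketchNode.new(parse, tokens)
leaf_labels = tree.leaves.map(&:label)
```

Because every subtree shares one list and leaves are visited left to right, the list passed to any non-terminal holds the tokens beneath and after that node, which matches the description above. Once the root has been built, the list is empty.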
Instance Method Details
#to_bracketed_string(coords, open = "[", close = "]") ⇒ Object
Return the original string with brackets around the word spans dominated by the specified constituents.
The constituents to bracket are specified by passing a list of node coordinates, which are arrays of integers of the form returned by the tree enumerators of Treebank::Node objects.
- coords: the coordinates of the nodes around which to place brackets
- open: the open bracket symbol
- close: the close bracket symbol
# File 'lib/stanfordparser.rb', line 387

def to_bracketed_string(coords, open = "[", close = "]")
  # Get a list of all the leaf nodes and their coordinates.
  items = depth_first_enumerator(true).find_all {|n| n.first.leaf?}
  # Enumerate over all the matching constituents inserting open and close
  # brackets around their yields in the items list.
  coords.each do |matching|
    # Insert using a simple state machine with three states: :start,
    # :open, and :close.
    state = :start
    # Enumerate over the items list looking for nodes that are the
    # children of the matching constituent.
    items.each_with_index do |item, index|
      # Skip inserted bracket characters.
      next if item.is_a? String
      # Handle terminal node items with the state machine.
      node, terminal_coordinate = item
      if state == :start
        next if not in_yield?(matching, terminal_coordinate)
        items.insert(index, open)
        state = :open
      else # state == :open
        next if in_yield?(matching, terminal_coordinate)
        items.insert(index, close)
        state = :close
        break
      end
    end # items.each_with_index
    # Handle the case where a matching constituent is flush with the end
    # of the sentence.
    items << close if state == :open
  end # each
  # Replace terminal nodes with their string representations. Insert
  # spacing characters in the list.
  items.each_with_index do |item, index|
    next if item.is_a? String
    text = item.first.label.current
    spacing = item.first.label.after
    # Replace the terminal node with its text.
    items[index] = text
    # Insert the spacing that comes after this text before the first
    # non-close bracket character.
    close_pos = find_index(items[index+1..-1]) {|item| not item == close}
    items.insert(index + close_pos + 1, spacing)
  end
  items.join
end
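To illustrate the intended output, here is a simplified sketch of bracketing by coordinates. It rests on two assumptions: a coordinate is the path of child indices from the root, and a leaf belongs to a constituent's yield when the constituent's coordinate is a prefix of the leaf's coordinate. `Leaf`, `in_yield_sketch?`, and `bracket_sketch` are hypothetical names. The sketch builds the string in a single pass rather than using the in-place insertion state machine of the listing.

```ruby
# Hypothetical leaf record: surface text, trailing whitespace, and a
# coordinate assumed to be the path of child indices from the root.
Leaf = Struct.new(:text, :after, :coord)

# Assumed membership test: the constituent's coordinate is a prefix of the
# leaf's coordinate.
def in_yield_sketch?(constituent, leaf_coord)
  leaf_coord.first(constituent.length) == constituent
end

def bracket_sketch(leaves, coords, open = "[", close = "]")
  out = String.new
  leaves.each_with_index do |leaf, i|
    prev_leaf = i.zero? ? nil : leaves[i - 1]
    next_leaf = leaves[i + 1]
    # Open a bracket where a constituent's yield begins.
    coords.each do |c|
      next unless in_yield_sketch?(c, leaf.coord)
      out << open if prev_leaf.nil? || !in_yield_sketch?(c, prev_leaf.coord)
    end
    out << leaf.text
    # Close the bracket before the trailing whitespace, so the spacing
    # lands after the close bracket.
    coords.each do |c|
      next unless in_yield_sketch?(c, leaf.coord)
      out << close if next_leaf.nil? || !in_yield_sketch?(c, next_leaf.coord)
    end
    out << leaf.after
  end
  out
end

# (S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .))
leaves = [
  Leaf.new("The",   " ", [0, 0, 0]),
  Leaf.new("dog",   " ", [0, 1, 0]),
  Leaf.new("barks", "",  [1, 0, 0]),
  Leaf.new(".",     "",  [2, 0])
]

bracketed = bracket_sketch(leaves, [[0]]) # => "[The dog] barks."
```

Passing `[[0], [1]]` with `"<"` and `">"` yields `"<The dog> <barks>."` under the same assumptions. As in the listing, whitespace is emitted after any close bracket, so the original spacing is preserved.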
#to_original_string ⇒ Object
Return the original text string dominated by this node.
# File 'lib/stanfordparser.rb', line 371

def to_original_string
  leaves.inject("") do |s, leaf|
    s += leaf.label.current + leaf.label.after
  end
end
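The reconstruction relies on each token carrying both its own text and the whitespace that followed it in the source. A minimal sketch of that idea follows; `TokenSketch` is a hypothetical stand-in for StandoffToken that models only the `current` and `after` attributes used in the listing.

```ruby
# Hypothetical stand-in for StandoffToken: the token's text and the
# whitespace that followed it in the original string.
TokenSketch = Struct.new(:current, :after)

tokens = [
  TokenSketch.new("Hello", ""),
  TokenSketch.new(",", "  "), # two spaces in the source are kept as-is
  TokenSketch.new("world", ""),
  TokenSketch.new("!", "\n")
]

# Concatenate text and trailing whitespace left to right, as the method does
# over the leaves of the subtree.
original = tokens.inject("") { |s, t| s + t.current + t.after }
# => "Hello,  world!\n"
```

Since no whitespace is normalised, the result matches the input verbatim, including the doubled space and the trailing newline.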