Class: StanfordParser::StandoffParsedText

Inherits:
Array
  • Object
Defined in:
lib/stanfordparser.rb

Overview

Standoff syntactic annotation of natural language text which may contain multiple sentences.

This is an Array of StandoffNode objects, one for each sentence in the text.

Instance Method Summary

Constructor Details

#initialize(text, nodetype = StandoffNode, tokenizer = EN_PENN_TREEBANK_TOKENIZER, parser = DefaultParser.instance) ⇒ StandoffParsedText

Parse the text and create the standoff annotation.

The default parser is a singleton instance of the English-language Stanford Natural Language Parser. Loading it may take a few seconds the first time the instance is created.



# File 'lib/stanfordparser.rb', line 323

def initialize(text, nodetype = StandoffNode,
               tokenizer = EN_PENN_TREEBANK_TOKENIZER,
               parser = DefaultParser.instance)
  preprocessor = StandoffDocumentPreprocessor.new(tokenizer)
  # Segment the text into sentences.  Parse each sentence, writing
  # standoff annotation information into the terminal nodes.
  preprocessor.getSentencesFromString(text).map do |sentence|
    parse = parser.apply(sentence.to_s)
    push(nodetype.new(parse, sentence))
  end
end
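
The constructor above follows a general pattern: an Array subclass that fills itself with one node per sentence during initialization. The following is a minimal sketch of that pattern only, not the library's implementation. SketchNode, SketchParser, and SketchParsedText are hypothetical stand-ins, and the regex-based sentence splitter is an assumption that replaces the real tokenizer and preprocessor so the example runs without Java or the Stanford parser.

```ruby
# Hypothetical stand-in for a sentence node; not the library's StandoffNode.
SketchNode = Struct.new(:parse, :sentence)

# Hypothetical stand-in for a parser: it only wraps the sentence in a label.
class SketchParser
  def apply(sentence)
    "(S #{sentence})"
  end
end

# An Array subclass that populates itself with one node per sentence.
class SketchParsedText < Array
  def initialize(text, nodetype = SketchNode, parser = SketchParser.new)
    super()
    # Assumption: naive segmentation on whitespace that follows ., ! or ?
    text.split(/(?<=[.!?])\s+/).each do |sentence|
      push(nodetype.new(parser.apply(sentence), sentence))
    end
  end
end

parsed = SketchParsedText.new("Dogs bark. Cats purr.")
puts parsed.length        # prints 2
puts parsed.first.parse   # prints (S Dogs bark.)
```

Because the node type and parser are passed as arguments, either can be swapped for a different class without changing the container, which mirrors the nodetype and parser parameters in the signature above.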

Instance Method Details

#inspect ⇒ Object

Return a string containing the class name and the number of sentences.



# File 'lib/stanfordparser.rb', line 336

def inspect
  "<#{self.class.name}, #{length} sentences>"
end
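
As an illustration of what this override produces, here is a small sketch using a hypothetical Array subclass named SentenceList (not part of the library) with the same inspect body. The strings pushed into it are placeholder data.

```ruby
# Hypothetical Array subclass that reuses the inspect body shown above.
class SentenceList < Array
  def inspect
    "<#{self.class.name}, #{length} sentences>"
  end
end

list = SentenceList.new
list.push("(S Dogs bark.)", "(S Cats purr.)")
puts list.inspect   # prints <SentenceList, 2 sentences>
```

Without the override, Array#inspect would print every element, which can be unwieldy for large parse trees in an interactive session.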

#to_s ⇒ Object

Return the parses as a single string, separated by spaces.



# File 'lib/stanfordparser.rb', line 341

def to_s
  flatten.join(" ")
end
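
A short sketch of the effect, under the assumption that each element's own to_s yields a bracketed parse. DemoNode and DemoParsedText are hypothetical names used only for this example, and the bracketings are placeholder data rather than real parser output.

```ruby
# Hypothetical node whose to_s returns its bracketed parse.
DemoNode = Struct.new(:bracketing) do
  def to_s
    bracketing
  end
end

# Hypothetical Array subclass that reuses the to_s body shown above.
class DemoParsedText < Array
  def to_s
    flatten.join(" ")
  end
end

text = DemoParsedText.new
text.push(DemoNode.new("(S (NP Dogs) (VP bark))"))
text.push(DemoNode.new("(S (NP Cats) (VP purr))"))
puts text.to_s   # prints (S (NP Dogs) (VP bark)) (S (NP Cats) (VP purr))
```

Array#join calls to_s on each element that is not already a string, so the result is one line with the per-sentence parses in order.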