Class: Treat::Workers::Processors::Parsers::Stanford
- Inherits: Object
- Object
- Treat::Workers::Processors::Parsers::Stanford
- Defined in:
- lib/treat/workers/processors/parsers/stanford.rb
Overview
Parsing through an interface to the Stanford Parser, a Java implementation of probabilistic natural language parsers that includes optimized PCFG and lexicalized dependency parsers as well as a lexicalized PCFG parser.
Original paper: Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. Proceedings of the 41st Meeting of the Association for Computational Linguistics, pp. 423-430.
Constant Summary
- Pttc =
Treat..aligned.
- DefaultOptions =
{ model: nil }
- @@parsers =
Holds one parser instance per language and model file.
{}
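The caching behind `@@parsers` can be sketched in plain Ruby. This is a minimal illustration, not the class's own code: `ParserCache`, its `loads` counter, and the string stand-in for a parser are hypothetical, and the model file names are placeholders. It only shows the two-level lookup (language, then model file) that lets an expensive model load happen once per combination.

```ruby
# Hypothetical helper, not part of Treat or StanfordCoreNLP.
class ParserCache
  attr_reader :loads

  def initialize
    @parsers = {} # { language => { model_file => parser } }
    @loads = 0    # how many times the "expensive" load ran
  end

  # Returns the cached parser for (lang, model_file), building it once.
  def fetch(lang, model_file)
    @parsers[lang] ||= {}
    @parsers[lang][model_file] ||= begin
      @loads += 1
      # Stand-in for loading a real parser model from disk.
      "parser(#{lang}, #{model_file})"
    end
  end
end

cache  = ParserCache.new
first  = cache.fetch(:english, 'englishPCFG.ser.gz')
second = cache.fetch(:english, 'englishPCFG.ser.gz') # served from the cache
other  = cache.fetch(:french, 'frenchFactored.ser.gz')
```

Here `first` and `second` are the same object, and only two loads are performed for the three calls.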
Class Method Summary
- .get_token_list(entity) ⇒ Object
- .parse(entity, options = {}) ⇒ Object
Parse the entity using the Stanford parser.
- .recurse(java_node, ruby_node, tag_set) ⇒ Object
Class Method Details
.get_token_list(entity) ⇒ Object
    # File 'lib/treat/workers/processors/parsers/stanford.rb', line 80
    def self.get_token_list(entity)
      # Wrap each token of the entity in a Java Word and collect
      # them in a Java ArrayList, the input the parser expects.
      list = StanfordCoreNLP::ArrayList.new
      entity.tokens.each do |token|
        list.add(StanfordCoreNLP::Word.new(token.to_s))
      end
      list
    end
.parse(entity, options = {}) ⇒ Object
Parse the entity using the Stanford parser.
    # File 'lib/treat/workers/processors/parsers/stanford.rb', line 20
    def self.parse(entity, options = {})
      val, lang = entity.to_s, entity.language.intern
      Treat::Loaders::Stanford.load(lang)
      tag_set = StanfordCoreNLP::Config::TagSets[lang]
      list = get_token_list(entity)
      entity.remove_all!
      # An explicit :model option overrides the language default.
      model_file = options[:model] ||
        StanfordCoreNLP::Config::Models[:parse][lang]
      # Load the parser only once per language and model file.
      unless @@parsers[lang] && @@parsers[lang][model_file]
        model_path = Treat.libraries.stanford.model_path ||
          StanfordCoreNLP.model_path
        model_folder = StanfordCoreNLP::Config::ModelFolders[:parse]
        model = File.join(model_path, model_folder, model_file)
        @@parsers[lang] ||= {}
        options = StanfordCoreNLP::Options.new
        parser = StanfordCoreNLP::LexicalizedParser
          .getParserFromFile(model, options)
        @@parsers[lang][model_file] = parser
      end
      parser = @@parsers[lang][model_file]
      text = parser.apply(list)
      # Rebuild the entity's children from the Java parse tree.
      recurse(text.children[0], entity, tag_set)
      entity.set :tag_set, tag_set
    end
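The model lookup in `.parse` follows a fallback order that can be sketched in isolation. The example below is an assumption-laden illustration: `DEFAULT_MODELS`, `MODEL_FOLDER`, `resolve_model`, and both directory paths are hypothetical stand-ins for the values that `StanfordCoreNLP::Config` and `Treat.libraries.stanford.model_path` supply at runtime, and the real tables may differ.

```ruby
# Hypothetical stand-ins for the gem's configuration tables.
DEFAULT_MODELS = { english: 'englishPCFG.ser.gz' }.freeze
MODEL_FOLDER   = 'grammar'

# Mirrors the lookup order in .parse:
#   1. an explicit :model option wins over the per-language default;
#   2. a configured model path wins over the fallback path.
def resolve_model(lang, options = {}, configured_path: nil,
                  fallback_path: '/opt/stanford/models')
  model_file = options[:model] || DEFAULT_MODELS[lang]
  model_path = configured_path || fallback_path
  File.join(model_path, MODEL_FOLDER, model_file)
end

default_model = resolve_model(:english)
custom_model  = resolve_model(:english, { model: 'custom.ser.gz' },
                              configured_path: '/srv/models')
```

With these placeholder values, `default_model` resolves under the fallback path and `custom_model` under the configured one.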
.recurse(java_node, ruby_node, tag_set) ⇒ Object
    # File 'lib/treat/workers/processors/parsers/stanford.rb', line 55
    def self.recurse(java_node, ruby_node, tag_set)
      java_node.children.each do |java_child|
        label = java_child.label
        tag = label.get(:category).to_s
        if Pttc[tag] && Pttc[tag][tag_set]
          # Phrase-level tag: create a phrase and descend into it.
          ruby_child = Treat::Entities::Phrase.new
          ruby_child.set :tag, tag
          ruby_node << ruby_child
          unless java_child.children.empty?
            recurse(java_child, ruby_child, tag_set)
          end
        else
          # Otherwise the node is a tagged word: build a token from it.
          val = java_child.children[0].to_s
          ruby_child = Treat::Entities::Token.from_string(val)
          ruby_child.set :tag, tag
          ruby_node << ruby_child
        end
      end
    end
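The tree walk in `.recurse` can be tried without a JVM by swapping in plain Ruby structures. The following is a minimal sketch under stated assumptions: `JavaNode`, `RubyNode`, `PHRASE_TAGS`, and `convert` are hypothetical names, and the fixed tag list stands in for the `Pttc` lookup against a tag set. It shows the same decision: phrase tags become nested phrase nodes, anything else becomes a token built from the node's single child.

```ruby
# Hypothetical stand-ins for the Java tree nodes and Treat entities.
JavaNode = Struct.new(:label, :children)
RubyNode = Struct.new(:kind, :tag, :value, :children) do
  # Append a child, like Treat's entity << child.
  def <<(child)
    children << child
    self
  end
end

PHRASE_TAGS = %w[S NP VP].freeze # stand-in for the Pttc lookup

def convert(java_node, ruby_node)
  java_node.children.each do |java_child|
    if PHRASE_TAGS.include?(java_child.label)
      # Phrase tags become phrase nodes and are descended into.
      phrase = RubyNode.new(:phrase, java_child.label, nil, [])
      ruby_node << phrase
      convert(java_child, phrase) unless java_child.children.empty?
    else
      # Any other tag is a part of speech whose only child is the word.
      word = java_child.children[0].label
      ruby_node << RubyNode.new(:token, java_child.label, word, [])
    end
  end
  ruby_node
end

leaf = ->(pos, word) { JavaNode.new(pos, [JavaNode.new(word, [])]) }
tree = JavaNode.new('S', [
  JavaNode.new('NP', [leaf.call('DT', 'The'), leaf.call('NN', 'parser')]),
  JavaNode.new('VP', [leaf.call('VBZ', 'works')])
])

sentence = convert(tree, RubyNode.new(:sentence, nil, nil, []))
```

As in `.parse`, the walk starts from the root's children, so `sentence` ends up holding the `NP` and `VP` phrases directly, each with its tagged tokens.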