Module: Gemmy::Components::Nlp

Defined in: lib/gemmy/patches_loaded/components/nlp.rb

Instance Method Summary

#default_noun_proc_string(word) ⇒ Object
#default_verb_proc_string(word) ⇒ Object
#engtagger_lookup(sentence) ⇒ Object

This uses EngTagger to analyze a sentence. The results are not ambiguous: in this method’s results, a given word will be either ‘verb’, ‘noun’, or ‘unknown’.
#finalize_engtagger_pos(pos) ⇒ Object
#finalize_pos(word, pos) ⇒ Object

Compares WordPos and EngTagger results and saves a proc if a part of speech is found; EngTagger is only prioritized if WordPos has no result.
#finalize_wordpos_pos(pos) ⇒ Object
#log_sentence(sentence) ⇒ Object

Adds the words in a sentence to the application database. The part of speech is identified by the DB name, and each entry is a word => proc mapping.
#parse_sentence(sentence) ⇒ Object
#save_noun_proc(word) ⇒ Object
#save_proc(final_pos, word) ⇒ Object
#save_verb_proc(word) ⇒ Object
#sentence_cache ⇒ Object

EngTagger evaluates POS in the context of a sentence, so from that perspective only entire sentences can be cached.
#setup_lexicons ⇒ Object
#tag_sentence(sentence) ⇒ Object

Uses the Ruby EngTagger tool to find parts of speech of a sentence.
#word_pos_cache ⇒ Object

This cache reduces the call rate of the WordPos shell util by caching the POS for individual words.
#wordpos_lookup(word) ⇒ Object

Instance Method Details

#default_noun_proc_string(word) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 126

def default_noun_proc_string(word)
  <<-Ruby.strip_heredoc
    ->(vn_phrases){ "#{word}" }
  Ruby
end

#default_verb_proc_string(word) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 132

def default_verb_proc_string(word)
  <<-Ruby.strip_heredoc
    ->(*nouns){ "#{word} \#{nouns.join " "}" }
  Ruby
end
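The two generators return Ruby source for lambdas, not lambdas. The sketch below shows, assuming the strings are later turned into callables with `eval`, what the generated source looks like and how a verb proc consumes a noun proc's result. `noun_proc_string` and `verb_proc_string` are hypothetical standalone copies; they use Ruby's squiggly heredoc (`<<~`) in place of ActiveSupport's `strip_heredoc` so no extra gems are needed.

```ruby
# Standalone copies of the generators. <<~ strips leading indentation,
# like ActiveSupport's strip_heredoc, and .strip drops the trailing newline.
def noun_proc_string(word)
  <<~RUBY.strip
    ->(vn_phrases){ "#{word}" }
  RUBY
end

def verb_proc_string(word)
  # \#{...} is escaped, so this interpolation is copied into the generated
  # source and only runs when the lambda is called.
  <<~RUBY.strip
    ->(*nouns){ "#{word} \#{nouns.join " "}" }
  RUBY
end

noun = eval(noun_proc_string("cake"))
verb = eval(verb_proc_string("eat"))

puts noun_proc_string("cake")  # ->(vn_phrases){ "cake" }
puts verb_proc_string("eat")   # ->(*nouns){ "eat #{nouns.join " "}" }
puts verb.call(noun.call([]))  # eat cake
```

The word is baked in when the string is generated, while the noun list is joined only at call time, which is why the verb template escapes its interpolation.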

#engtagger_lookup(sentence) ⇒ `Object`

This uses EngTagger to analyze a sentence. The results are not ambiguous: in this method’s results, a given word will be either ‘verb’, ‘noun’, or ‘unknown’.

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 141

def engtagger_lookup sentence
  nouns, verbs = tag_sentence(sentence)
  sentence.words.graph do |word|
    pos = case word
    when ->(w){ verbs.include? w }
      "verb"
    when ->(w){ nouns.include? w }
      "noun"
    else
      "unknown"
    end
    [word, [pos]]
  end
end
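The classification step can be sketched without EngTagger by passing in the noun and verb lists that `tag_sentence` would return. `classify_words` is a hypothetical name; `String#split` stands in for gemmy's `String#words`, and `to_h` stands in for the `graph` call, which builds the same word => [pos] hash.

```ruby
# Classify each word of a sentence given precomputed noun and verb lists.
# Verbs are checked first, as in the original case expression, so a word
# present in both lists is reported as a verb.
def classify_words(sentence, nouns, verbs)
  sentence.split.map do |word|
    pos =
      if verbs.include?(word) then "verb"
      elsif nouns.include?(word) then "noun"
      else "unknown"
      end
    [word, [pos]]
  end.to_h
end

result = classify_words("dogs chase cats", ["dogs", "cats"], ["chase"])
puts result["chase"].inspect  # ["verb"]
puts result["dogs"].inspect   # ["noun"]
```

Because each word maps to exactly one label, the output is unambiguous even when EngTagger places a word in both lists.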

#finalize_engtagger_pos(pos) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 85

def finalize_engtagger_pos(pos)
  # If the WordPos definition isn't found, then there's no ambiguity
  if pos.include?("noun")
    "noun"
  elsif pos.include?("verb")
    "verb"
  else
    "unknown"
  end
end

#finalize_pos(word, pos) ⇒ `Object`

Compares WordPos and EngTagger results and saves a proc if a part of speech is found; EngTagger is only prioritized if WordPos has no result.

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 72

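def finalize_pos word, pos
  final_pos = word_pos_cache.get_or_set(word) do
    doublecheck = wordpos_lookup(word)
    if ['noun', 'verb'].none? &doublecheck.m(:include?)
      # WordPos knows neither POS for this word: fall back to EngTagger
      finalize_engtagger_pos(pos)
    else
      # WordPos has an answer: decide from the WordPos result itself
      # (an ambiguous noun|verb becomes "unknown")
      finalize_wordpos_pos(doublecheck)
    end
  end
  save_proc(final_pos, word)
  { word: word, pos: [final_pos] }
end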
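The decision rule stated above (WordPos first, EngTagger only when WordPos has no noun or verb entry) can be written as pure functions over plain arrays. This is a sketch; `engtagger_choice`, `wordpos_choice`, and `final_choice` are hypothetical names mirroring `finalize_engtagger_pos`, `finalize_wordpos_pos`, and `finalize_pos` without the caching and lexicon side effects.

```ruby
# EngTagger results are unambiguous, so the first matching POS wins.
def engtagger_choice(pos)
  if pos.include?("noun") then "noun"
  elsif pos.include?("verb") then "verb"
  else "unknown"
  end
end

# WordPos results may list both POS; only a sole noun or sole verb counts.
def wordpos_choice(pos)
  noun, verb = pos.include?("noun"), pos.include?("verb")
  if noun && !verb then "noun"
  elsif verb && !noun then "verb"
  else "unknown"
  end
end

def final_choice(engtagger_pos, wordpos_pos)
  if %w[noun verb].none? { |p| wordpos_pos.include?(p) }
    engtagger_choice(engtagger_pos)  # WordPos has nothing: trust EngTagger
  else
    wordpos_choice(wordpos_pos)      # WordPos decides; ambiguity => "unknown"
  end
end

puts final_choice(["verb"], ["unknown"])       # verb
puts final_choice(["verb"], ["noun"])          # noun
puts final_choice(["noun"], ["verb", "noun"])  # unknown
```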

#finalize_wordpos_pos(pos) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 96

def finalize_wordpos_pos(pos)
  # WordPos returns ambiguous results.
  # Only unambiguous words are selected.
  # I.e. a noun|verb isn't saved.
  # It must be solely noun or verb.
  if pos.include?("noun") && !pos.include?("verb")
    "noun"
  elsif pos.include?("verb") && !pos.include?("noun")
    "verb"
  else
    "unknown"
  end
end

#log_sentence(sentence) ⇒ `Object`

Adds the words in a sentence to the application database. The part of speech is identified by the DB name, and each entry is a word => proc mapping.

Noun procs are passed all vn_phrases for the sentence (these are constructed by parse_sentence).

Verb procs are passed the evaluated results of the Noun procs in their Verb-Noun phrase, as sequential arguments.

For example, if the phrase is “live well and flourish”, then (assuming the

Although EngTagger extracts POS for the words in a sentence, these classifications are context-dependent.

For this reason, words are also looked up using WordPos. Only unambiguous words are added to the grammar.

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 62

def log_sentence sentence
  sentence_cache.get_or_set(sentence) do
    engtagger_lookup(sentence).map do |word, pos|
      finalize_pos(word, pos)
    end
  end
end
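The proc contract described above can be sketched with plain hashes standing in for NounLexicon and VerbLexicon. The `NOUNS`/`VERBS` names and the phrase shape `[verb, [noun, ...]]` are hypothetical; the vn_phrases actually built during parsing may be structured differently.

```ruby
# Hypothetical in-memory lexicons holding the same kind of lambdas that
# default_noun_proc_string and default_verb_proc_string generate.
NOUNS = {
  cake: ->(vn_phrases) { "cake" },
  pie:  ->(vn_phrases) { "pie" }
}
VERBS = {
  eat: ->(*nouns) { "eat #{nouns.join(" ")}" }
}

# Assumed phrase shape: [verb, [noun, ...]].
vn_phrases = [[:eat, [:cake, :pie]]]

results = vn_phrases.map do |verb, nouns|
  # Every noun proc receives all phrases of the sentence...
  noun_values = nouns.map { |n| NOUNS.fetch(n).call(vn_phrases) }
  # ...and the verb proc receives the noun results as separate arguments.
  VERBS.fetch(verb).call(*noun_values)
end

puts results.first  # eat cake pie
```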

#parse_sentence(sentence) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 6

def parse_sentence sentence
  setup_lexicons
  log_sentence sentence
  begin
    SentenceInterpreter.interpret sentence
  rescue NounBeforeVerbError
    []
  end
end

#save_noun_proc(word) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 118

def save_noun_proc word
  NounLexicon.set word.to_sym, default_noun_proc_string(word)
end

#save_proc(final_pos, word) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 110

def save_proc(final_pos, word)
  if final_pos.include?("noun")
    save_noun_proc(word)
  elsif final_pos.include?("verb")
    save_verb_proc word
  end
end

#save_verb_proc(word) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 122

def save_verb_proc word
  VerbLexicon.set word.to_sym, default_verb_proc_string(word)
end

#sentence_cache ⇒ `Object`

EngTagger evaluates POS in the context of a sentence, so from that perspective only entire sentences can be cached.

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 158

def sentence_cache
  @sentence_cache ||= Gemmy::Components::Cache.new "sentence_pos"
end

#setup_lexicons ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 16

def setup_lexicons
  return if @lexicon_set_up
  # Replace any existing lexicon constants with fresh persistent caches.
  # remove_const raises NameError for an undefined constant, so guard it.
  %i[VerbLexicon NounLexicon].each do |name|
    Object.send(:remove_const, name) if Object.const_defined?(name)
  end
  Object.const_set(:VerbLexicon, Gemmy::Components::Cache.new("verb_lexicon"))
  Object.const_set(:NounLexicon, Gemmy::Components::Cache.new("noun_lexicon"))
  @lexicon_set_up = true
end
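The constant swap in setup_lexicons can be tried in isolation. In this sketch `FakeCache` is a hypothetical stand-in for `Gemmy::Components::Cache`, and `reset_constant` is a hypothetical helper. Removing the constant before `const_set` avoids Ruby's "already initialized constant" warning, and the `const_defined?` guard avoids a NameError on the first call.

```ruby
# Hypothetical stand-in for Gemmy::Components::Cache.
class FakeCache
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Redefine a top-level constant without warnings or NameError.
def reset_constant(name, value)
  Object.send(:remove_const, name) if Object.const_defined?(name)
  Object.const_set(name, value)
end

reset_constant(:VerbLexicon, FakeCache.new("verb_lexicon"))    # nothing to remove yet
reset_constant(:VerbLexicon, FakeCache.new("verb_lexicon_2"))  # replaces the first one
puts VerbLexicon.name  # verb_lexicon_2
```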

#tag_sentence(sentence) ⇒ `Object`

Uses the Ruby EngTagger tool to find the parts of speech of a sentence.

Returns a two-element array, [nouns, verbs], where each element is an array of words.

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 34

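def tag_sentence sentence
  @tagger ||= EngTagger.new
  tagged = @tagger.add_tags(sentence)
  # Without tags there is nothing to extract; keep the [nouns, verbs] shape
  return [[], []] if tagged.nil?
  # get_nouns / get_verbs return word => count hashes; only the words are kept
  nouns = @tagger.get_nouns(tagged)&.keys || []
  verbs = @tagger.get_verbs(tagged)&.keys || []
  [nouns, verbs]
end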

#word_pos_cache ⇒ `Object`

This cache reduces the call rate of the WordPos shell util by caching the POS for individual words.

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 164

def word_pos_cache
  @pos_cache ||= Gemmy::Components::Cache.new("word_pos")
end
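The call-rate saving comes from `get_or_set`: the block runs only on a cache miss. This sketch assumes that semantics; `MemoCache` is a hypothetical in-memory class, and `Gemmy::Components::Cache` may persist its data and differ in detail. The counter stands in for invocations of the WordPos shell utility.

```ruby
# Hypothetical in-memory cache with get_or_set semantics.
class MemoCache
  def initialize
    @store = {}
  end

  # Return the cached value for key, computing and storing it on a miss.
  def get_or_set(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

lookups = 0
cache = MemoCache.new
3.times do
  cache.get_or_set("run") do
    lookups += 1  # stands in for the expensive WordPos shell call
    ["verb", "noun"]
  end
end
puts lookups  # 1
```

The same pattern backs sentence_cache, but keyed on the whole sentence, since EngTagger's tags depend on context.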

#wordpos_lookup(word) ⇒ `Object`

# File 'lib/gemmy/patches_loaded/components/nlp.rb', line 168

def wordpos_lookup(word)
  default_result = ['unknown']
  result = []
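  # Keep ASCII letters only; this also makes the word safe to interpolate
  # into the shell command below.
  word = word.strip.gsub(/[^a-zA-Z]/, '')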
  return default_result if word.empty?
  pos_response = JSON.parse `coffee -e "#{Gemmy::Coffee}" pos #{word}`
  result << "verb" unless pos_response["verbs"].empty?
  result << "noun" unless pos_response["nouns"].empty?
  result.empty? ? default_result : result
end
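The post-processing in wordpos_lookup can be checked without CoffeeScript by feeding it a JSON string directly. In this sketch `pos_from_wordpos_json` is a hypothetical name, and the `{"nouns": [...], "verbs": [...]}` response shape is inferred from the keys the method reads, not from WordPos documentation.

```ruby
require "json"

# Same result-building rules as wordpos_lookup, with the shell call
# replaced by a JSON string argument.
def pos_from_wordpos_json(json)
  response = JSON.parse(json)
  result = []
  result << "verb" unless response["verbs"].empty?
  result << "noun" unless response["nouns"].empty?
  result.empty? ? ["unknown"] : result
end

p "run!".strip.gsub(/[^a-zA-Z]/, "")                            # "run"
p pos_from_wordpos_json('{"nouns": ["run"], "verbs": ["run"]}')  # ["verb", "noun"]
p pos_from_wordpos_json('{"nouns": [], "verbs": []}')            # ["unknown"]
```

A word listed under both keys yields both labels, which finalize_wordpos_pos then rejects as ambiguous.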