Class: Linnaeus
- Inherits: Object
- Defined in: lib/linnaeus.rb
Overview
The base class. You won’t use this directly; use one of the subclasses instead.
Direct Known Subclasses
Defined Under Namespace
Classes: Classifier, Persistence, Stopwords, Trainer
Instance Method Summary
- #count_word_occurrences(text = '') ⇒ Object
  Count occurrences of words in a text corpus.
- #initialize(opts = {}) ⇒ Linnaeus (constructor)
  A new instance of Linnaeus.
Constructor Details
#initialize(opts = {}) ⇒ Linnaeus
Returns a new instance of Linnaeus.
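# File 'lib/linnaeus.rb', line 9
# NOTE: the local variable name and the argument passed to the persistence
# class were lost in extraction; `options` is assumed in both places.
def initialize(opts = {})
  # Defaults; any key supplied in opts overrides the value here.
  options = {
    persistence_class: Persistence,
    stopwords_class: Stopwords,
    skip_stemming: false,
    encoding: 'UTF-8'
  }.merge(opts)

  @db = options[:persistence_class].new(options)
  @stopword_generator = options[:stopwords_class].new
  @skip_stemming = options[:skip_stemming]
  @encoding = options[:encoding]
end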
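The way the constructor layers caller-supplied options over its defaults can be sketched in isolation. This is a minimal standalone illustration, not the library's code: `FakePersistence`, `FakeStopwords`, `DEFAULTS`, and `merge_options` are hypothetical names introduced so the snippet runs without the gem.

```ruby
# Hypothetical stand-ins so the sketch runs without the linnaeus gem.
class FakePersistence; end
class FakeStopwords; end

# Defaults mirroring the keys used by the constructor above.
DEFAULTS = {
  persistence_class: FakePersistence,
  stopwords_class: FakeStopwords,
  skip_stemming: false,
  encoding: 'UTF-8'
}.freeze

# Hash#merge returns a new hash; keys present in opts override the defaults.
def merge_options(opts = {})
  DEFAULTS.merge(opts)
end

merged = merge_options(skip_stemming: true)
puts merged[:skip_stemming] # the caller's value wins
puts merged[:encoding]      # the untouched default is kept
```

Because `Hash#merge` is non-destructive, the defaults themselves are never modified, so every new instance starts from the same baseline.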
Instance Method Details
#count_word_occurrences(text = '') ⇒ Object
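Count occurrences of words in a text corpus. Returns a Hash mapping each remaining word (stemmed unless skip_stemming is set) to the number of times it appears.
Parameters
- text: A string representing a document. Stopwords are removed and words are stemmed using the “Stemmer” gem.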
# File 'lib/linnaeus.rb', line 28
def count_word_occurrences(text = '')
  count = {}
  # Normalize encoding and case, then split on whitespace.
  text.encode(@encoding).downcase.split.each do |word|
    stemmed_word = (@skip_stemming) ? word : word.stem_porter
    # Skip stopwords; otherwise increment (or start) the word's tally.
    unless stopwords.include? stemmed_word
      count[stemmed_word] = count[stemmed_word] ? count[stemmed_word] + 1 : 1
    end
  end
  count
end
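To see what the counting step produces, here is a rough standalone sketch of the same pattern. It makes several assumptions: `count_words` and `SAMPLE_STOPWORDS` are hypothetical names, the stopword list is made up for illustration, stemming is skipped so the Stemmer gem is not needed, and `Hash.new(0)` stands in for the ternary used in the method above.

```ruby
# Hypothetical stopword list; the real one comes from the Stopwords class.
SAMPLE_STOPWORDS = %w[the a of].freeze

# Same counting pattern as count_word_occurrences, with stemming skipped
# so the sketch has no dependency on the Stemmer gem.
def count_words(text, stopwords = SAMPLE_STOPWORDS)
  counts = Hash.new(0) # missing keys start at 0
  text.encode('UTF-8').downcase.split.each do |word|
    next if stopwords.include?(word)
    counts[word] += 1
  end
  counts
end

p count_words('The cat chased the other cat')
```

In the result, "cat" maps to 2, while "the" is absent because it is filtered out as a stopword. Since the text is split on whitespace only, punctuation stays attached to words, so "cat." and "cat" would be counted as separate keys.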