Class: Classifier::Base
- Inherits: Object
  - Object
  - Classifier::Base
- Defined in: lib/classifier/base.rb
Instance Method Summary
- #clean_word_hash(str) ⇒ Object
  Return a word hash without extra punctuation or short symbols, just stemmed words.
- #initialize(options = {}) ⇒ Base (constructor)
  A new instance of Base.
- #prepare_category_name(val) ⇒ Object
- #remove_stemmer ⇒ Object
  When a Classifier instance is serialized, it is saved with an instance of Lingua::Stemmer that may not be initialized when deserialized later, raising a “RuntimeError: Stemmer is not initialized”.
- #without_punctuation(str) ⇒ Object
  Removes common punctuation symbols, returning a new string.
- #word_hash(str) ⇒ Object
  Return a Hash of strings => ints.
Constructor Details
#initialize(options = {}) ⇒ Base
Returns a new instance of Base.
    # File 'lib/classifier/base.rb', line 5
    def initialize(options = {})
      options.reverse_merge!(:language => 'en')
      options.reverse_merge!(:encoding => 'UTF_8')
      @options = options
    end
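`reverse_merge!` is an ActiveSupport extension to Hash, not core Ruby. As a rough illustration of the defaulting it performs here, the following sketch does the same thing with plain `Hash#merge`. `DEFAULTS` and `with_defaults` are hypothetical names, not part of Classifier::Base, and unlike `reverse_merge!` the sketch does not modify the caller's hash.

```ruby
# Hypothetical stand-in for the constructor's default handling, written
# without ActiveSupport. DEFAULTS and with_defaults are illustrative names.
DEFAULTS = { :language => 'en', :encoding => 'UTF_8' }.freeze

def with_defaults(options = {})
  # Caller-supplied keys win; missing keys fall back to the defaults.
  DEFAULTS.merge(options)
end

french = with_defaults(:language => 'fr')
french[:language]  # => "fr"
french[:encoding]  # => "UTF_8"
```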
Instance Method Details
#clean_word_hash(str) ⇒ Object
Return a word hash without extra punctuation or short symbols, just stemmed words.
    # File 'lib/classifier/base.rb', line 31
    def clean_word_hash str
      word_hash_for_words str.gsub(/[^\w\s]/,"").split
    end
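To make the tokenizing step concrete, here is a small sketch of just the `gsub` and `split` portion. `clean_tokens` is a hypothetical helper name. Stemming and counting happen in `word_hash_for_words`, which is not shown in this section, so they are left out.

```ruby
# Only the tokenizing step from clean_word_hash: drop every character that
# is neither a word character nor whitespace, then split on whitespace.
def clean_tokens(str)
  str.gsub(/[^\w\s]/, "").split
end

tokens = clean_tokens("Hello, world! It's 9 o'clock.")
# => ["Hello", "world", "Its", "9", "oclock"]
```

Note that `\w` includes the underscore, so a token such as `snake_case` passes through unchanged.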
#prepare_category_name(val) ⇒ Object
    # File 'lib/classifier/base.rb', line 12
    def prepare_category_name val
      val.to_s.gsub("_"," ").capitalize
    end
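The method carries no description, so a short usage sketch may help. It copies the method body into a standalone function and shows two sample inputs, which are illustrative only.

```ruby
# Standalone copy of the method body, for illustration.
def prepare_category_name(val)
  val.to_s.gsub("_", " ").capitalize
end

prepare_category_name(:not_spam)         # => "Not spam"
prepare_category_name("Very_IMPORTANT")  # => "Very important"
```

Because `String#capitalize` lowercases everything after the first character, category names that differ only in case normalize to the same string.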
#remove_stemmer ⇒ Object
When a Classifier instance is serialized, it is saved with an instance of Lingua::Stemmer that may not be initialized when deserialized later, raising a “RuntimeError: Stemmer is not initialized”.
You can run remove_stemmer to force a new Stemmer to be initialized.
    # File 'lib/classifier/base.rb', line 40
    def remove_stemmer
      @stemmer = nil
    end
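The pattern can be sketched with a serialization round trip. Everything below is an assumption made for illustration: `FakeStemmer` stands in for Lingua::Stemmer, `LazyStemmed` stands in for a classifier, and the lazy `stemmer` accessor is a guess at how a replacement stemmer gets built, since that code is not shown in this section.

```ruby
# Hypothetical stand-in for Lingua::Stemmer; its "stemming" is a placeholder.
class FakeStemmer
  def stem(word)
    word.downcase
  end
end

# Illustrative class, not Classifier::Base. The lazy accessor is an
# assumption about how a new stemmer is created after removal.
class LazyStemmed
  def stemmer
    @stemmer ||= FakeStemmer.new
  end

  def remove_stemmer
    @stemmer = nil
  end
end

model = LazyStemmed.new
stale = model.stemmer                         # stemmer created before saving
restored = Marshal.load(Marshal.dump(model))  # serialized along with the model
restored.remove_stemmer                       # discard the deserialized stemmer
fresh = restored.stemmer                      # a new one is built on demand
```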
#without_punctuation(str) ⇒ Object
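Removes common punctuation symbols, returning a new string. Each listed symbol is replaced with a space, and apostrophes and hyphens are deleted, so runs of spaces are not collapsed. E.g.,
    without_punctuation("Hello (greeting's), with {braces} < >...?")
    => "Hello  greetings   with  braces         "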
    # File 'lib/classifier/base.rb', line 20
    def without_punctuation str
      str.tr( ',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " " ) .tr( "'\-", "")
    end
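If single-spaced output is wanted, one option is to squeeze the result afterwards. The sketch below copies the method body into a standalone function. The `squeeze(" ")` step is an addition for illustration, not something the library does.

```ruby
# Standalone copy of the method body, for illustration.
def without_punctuation(str)
  str.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end

raw = without_punctuation("Hello (greeting's), with {braces} < >...?")
# Each replaced symbol leaves a space behind, so collapse the runs.
tidy = raw.squeeze(" ")
# => "Hello greetings with braces "
```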
#word_hash(str) ⇒ Object
Return a Hash of strings => ints. Each word in the string is stemmed and mapped to its frequency in the document.
    # File 'lib/classifier/base.rb', line 26
    def word_hash str
      word_hash_for_words(str.gsub(/[^\w\s]/,"").split + str.gsub(/[\w]/," ").split)
    end
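Unlike `clean_word_hash`, this method also passes along the runs of non-word symbols. The sketch below shows the combined token list and a plain frequency count. `word_hash_tokens` and `count_tokens` are hypothetical names, and `count_tokens` is only a stand-in for `word_hash_for_words`, which is not shown here and, per the description, also stems each word.

```ruby
# Token list that word_hash builds: punctuation-free words, plus the runs
# of non-word symbols left once word characters are blanked out.
def word_hash_tokens(str)
  str.gsub(/[^\w\s]/, "").split + str.gsub(/[\w]/, " ").split
end

# Hypothetical counter standing in for word_hash_for_words; no stemming
# or filtering is applied here.
def count_tokens(tokens)
  tokens.each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
end

tokens = word_hash_tokens("Wow!! Really? Wow")
# => ["Wow", "Really", "Wow", "!!", "?"]
counts = count_tokens(tokens)
counts["Wow"]  # => 2
counts["!!"]   # => 1
```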