Class: Classifier::Base
- Inherits: Object
  - Object
  - Classifier::Base
- Defined in: lib/classifier/base.rb
Instance Method Summary
- #clean_word_hash(str) ⇒ Object
  Return a word hash without extra punctuation or short symbols, just stemmed words.
- #initialize(options = {}) ⇒ Base (constructor)
  A new instance of Base.
- #prepare_category_name(val) ⇒ Object
- #remove_stemmer ⇒ Object
  When a Classifier instance is serialized, it is saved with an instance of Lingua::Stemmer that may not be initialized when deserialized later, raising a “RuntimeError: Stemmer is not initialized”.
- #without_punctuation(str) ⇒ Object
  Removes common punctuation symbols, returning a new string.
- #word_hash(str) ⇒ Object
  Return a Hash of strings => ints.
Constructor Details
#initialize(options = {}) ⇒ Base
Returns a new instance of Base.
# File 'lib/classifier/base.rb', line 4
def initialize(options = {})
  options.reverse_merge!(:language => 'en')
  options.reverse_merge!(:encoding => 'UTF_8')
  @options = options
end
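`reverse_merge!` is an ActiveSupport extension to Hash rather than core Ruby: it fills in a key only when the caller has not already set it. A minimal plain-Ruby sketch of the same defaulting is shown here; `DEFAULTS` and `with_defaults` are hypothetical names used for illustration and are not part of Classifier::Base.

```ruby
# Hypothetical helper, not part of Classifier::Base: reproduces the effect of
# the two reverse_merge! calls using only core Ruby.
DEFAULTS = { :language => 'en', :encoding => 'UTF_8' }.freeze

def with_defaults(options = {})
  # Hash#merge lets the argument win on key conflicts, so caller-supplied
  # values override the defaults and missing keys are filled in.
  DEFAULTS.merge(options)
end

p with_defaults                      # both defaults applied
p with_defaults(:language => 'fr')   # caller's language kept, encoding defaulted
```

Unlike `reverse_merge!`, this sketch returns a new Hash and leaves the caller's Hash unmodified.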
Instance Method Details
#clean_word_hash(str) ⇒ Object
Return a word hash without extra punctuation or short symbols, just stemmed words.
# File 'lib/classifier/base.rb', line 30
def clean_word_hash str
  word_hash_for_words str.gsub(/[^\w\s]/,"").split
end
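To see what the tokenizing expression alone produces, here is a small standalone sketch. `clean_tokens` is a hypothetical name; `word_hash_for_words` is not shown in this section, so the stemming and any filtering of short words it may perform are left out.

```ruby
# Only the tokenizing step from clean_word_hash. word_hash_for_words is not
# shown in this section, so stemming and counting are omitted here.
def clean_tokens(str) # hypothetical name, for illustration only
  # Drop every character that is neither a word character nor whitespace,
  # then split on whitespace.
  str.gsub(/[^\w\s]/, "").split
end

p clean_tokens("Don't panic: it's only a test_case, v2!")
# => ["Dont", "panic", "its", "only", "a", "test_case", "v2"]
```

Because `\w` matches letters, digits, and underscores, `test_case` and `v2` survive intact, while apostrophes are removed without leaving a space.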
#prepare_category_name(val) ⇒ Object
# File 'lib/classifier/base.rb', line 11
def prepare_category_name val
  val.to_s.gsub("_"," ").capitalize
end
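This method has no description in the source. Judging from the code alone, it turns a symbol or string into a display-style name. The sketch below copies the method body into a standalone function so it can be run outside the class; the sample inputs are made up for illustration.

```ruby
# Standalone copy of the method body, for experimentation outside the class.
def prepare_category_name(val)
  # to_s accepts symbols as well as strings; capitalize upcases the first
  # character and downcases the rest.
  val.to_s.gsub("_", " ").capitalize
end

p prepare_category_name(:not_interesting)  # => "Not interesting"
p prepare_category_name("SPAM")            # => "Spam"
```

Note that `capitalize` lowercases everything after the first character, so only the first word of a multi-word name starts with a capital.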
#remove_stemmer ⇒ Object
When a Classifier instance is serialized, it is saved with an instance of Lingua::Stemmer that may not be initialized when deserialized later, raising a “RuntimeError: Stemmer is not initialized”.
You can run remove_stemmer to force a new Stemmer to be initialized.
# File 'lib/classifier/base.rb', line 39
def remove_stemmer
  @stemmer = nil
end
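The reset only helps if the stemmer is rebuilt on next use. The source shows just `@stemmer = nil`, so the lazy accessor in the sketch below is an assumption about how the rebuild might happen. `FakeStemmer` and `ToyClassifier` are hypothetical stand-ins, not the real Lingua::Stemmer or Classifier classes.

```ruby
# Toy stand-ins: FakeStemmer and ToyClassifier are hypothetical and only
# illustrate the reset-then-rebuild pattern. The lazy accessor is an assumption.
class FakeStemmer
end

class ToyClassifier
  def stemmer
    # Build the stemmer on first use and cache it.
    @stemmer ||= FakeStemmer.new
  end

  def remove_stemmer
    @stemmer = nil
  end
end

c = ToyClassifier.new
first = c.stemmer

# Round-trip through Marshal, then reset so a fresh stemmer is built.
restored = Marshal.load(Marshal.dump(c))
restored.remove_stemmer
second = restored.stemmer

puts first.equal?(second)  # false: a new stemmer object was created
```

In this pattern, calling `remove_stemmer` right after deserializing guarantees that the next access constructs a new stemmer instead of reusing the saved one.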
#without_punctuation(str) ⇒ Object
Removes common punctuation symbols, returning a new string. Each listed symbol becomes a space, while apostrophes and hyphens are deleted outright. E.g.,
without_punctuation("Hello (greeting's), with {braces} < >...?")
=> "Hello  greetings   with  braces         "
# File 'lib/classifier/base.rb', line 19
def without_punctuation str
  str.tr( ',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " " ) .tr( "'\-", "")
end
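The two `tr` calls behave differently: the first maps each listed symbol to a space, and the second, given an empty replacement string, deletes apostrophes and hyphens. A standalone copy of the method body makes this easy to check; the sample strings are made up for illustration.

```ruby
# Standalone copy of the method body, for experimentation outside the class.
def without_punctuation(str)
  # First tr: every listed symbol becomes a single space.
  # Second tr: an empty replacement string deletes ' and - entirely.
  str.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end

p without_punctuation("a,b")          # => "a b"
p without_punctuation("it's")         # => "its"
p without_punctuation("rock-n-roll")  # => "rocknroll"
```

Because replaced symbols turn into spaces one for one, runs of punctuation leave runs of spaces behind; a later `split` collapses them.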
#word_hash(str) ⇒ Object
Return a Hash of strings => ints. Each word in the string is stemmed and mapped to its frequency in the document.
# File 'lib/classifier/base.rb', line 25
def word_hash str
  word_hash_for_words(str.gsub(/[^\w\s]/,"").split + str.gsub(/[\w]/," ").split)
end
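Unlike `clean_word_hash`, this method feeds two token streams to `word_hash_for_words`: the words with punctuation stripped, plus the runs of punctuation left over once word characters are blanked out. The sketch below counts those raw tokens. `raw_word_counts` is a hypothetical helper, and since `word_hash_for_words` is not shown in this section, stemming and any other filtering it applies are skipped.

```ruby
# Hypothetical helper: counts raw tokens. The real word_hash_for_words is not
# shown in this section and also stems words, which this sketch skips.
def raw_word_counts(str)
  words   = str.gsub(/[^\w\s]/, "").split  # word tokens, punctuation stripped
  symbols = str.gsub(/[\w]/, " ").split    # leftover runs of punctuation
  counts = Hash.new(0)
  (words + symbols).each { |token| counts[token] += 1 }
  counts
end

p raw_word_counts("no, no, no!")
# => {"no"=>3, ","=>2, "!"=>1}
```

Keeping punctuation tokens lets symbols such as `!` contribute to the frequency hash, which is the visible difference from `clean_word_hash` in the code above.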