Class: Classifier::Base

Inherits:
Object
Defined in:
lib/classifier/base.rb

Direct Known Subclasses

Bayes, LSI

Constructor Details

#initialize(options = {}) ⇒ Base

Returns a new instance of Base.



# File 'lib/classifier/base.rb', line 5

def initialize(options = {})
  options.reverse_merge!(:language => 'en')
  options.reverse_merge!(:encoding => 'UTF_8')

  @options = options
end
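
Note that `reverse_merge!` is provided by ActiveSupport rather than core Ruby: it fills in only the keys the caller did not supply. As a rough illustration of that defaulting behaviour, here is a plain-Ruby sketch. The `DEFAULTS` constant and the `with_defaults` helper are hypothetical names introduced for this example and are not part of Classifier.

```ruby
# Hypothetical helper (not part of Classifier): mimics the effect of the two
# reverse_merge! calls in the constructor, using only core Ruby.
DEFAULTS = { :language => 'en', :encoding => 'UTF_8' }.freeze

def with_defaults(options = {})
  # Hash#merge lets the argument win, so caller-supplied keys override defaults.
  DEFAULTS.merge(options)
end

p with_defaults                      # both defaults apply
p with_defaults(:language => 'fr')   # language overridden, encoding defaulted
```

Unlike `reverse_merge!`, which modifies the hash passed in, this sketch returns a new hash and leaves its argument untouched.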

Instance Method Details

#clean_word_hash(str) ⇒ Object

Returns a word hash without extra punctuation or short symbols, containing only stemmed words.



# File 'lib/classifier/base.rb', line 31

def clean_word_hash(str)
  word_hash_for_words(str.gsub(/[^\w\s]/, "").split)
end

#prepare_category_name(val) ⇒ Object

Converts val to a string, replaces underscores with spaces, and capitalizes the first character (lowercasing the rest).



# File 'lib/classifier/base.rb', line 12

def prepare_category_name val
  val.to_s.gsub("_"," ").capitalize
end
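
A short usage sketch of this normalization. The method body is copied from the listing above and defined at top level only so the example runs on its own; the category names are made up for illustration.

```ruby
# Copy of the method body shown above, defined at top level for illustration.
def prepare_category_name(val)
  val.to_s.gsub("_", " ").capitalize
end

puts prepare_category_name(:not_spam)      # prints "Not spam"
puts prepare_category_name("INTERESTING")  # prints "Interesting"
```

Because `capitalize` lowercases everything after the first character, symbols and strings that differ only in case or in underscores versus spaces end up as the same category name.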

#remove_stemmer ⇒ Object

When a Classifier instance is serialized, it is saved together with its instance of Lingua::Stemmer. That stemmer may not be initialized when the classifier is deserialized later, and using it then raises "RuntimeError: Stemmer is not initialized".

You can run remove_stemmer to force a new Stemmer to be initialized.



# File 'lib/classifier/base.rb', line 40

def remove_stemmer
  @stemmer = nil
end
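
Clearing `@stemmer` only helps if the stemmer is rebuilt the next time it is needed. The sketch below shows one way such lazy re-initialization could work. It is an assumption about the pattern, not the library's actual code: `FakeStemmer` stands in for Lingua::Stemmer so the example runs without the gem, and `LazyStemmed` and its `stemmer` accessor are hypothetical.

```ruby
# Stand-in for Lingua::Stemmer, used only so this sketch runs without the gem.
class FakeStemmer
  def stem(word)
    word.downcase
  end
end

# Hypothetical class illustrating the pattern; Classifier::Base may differ.
class LazyStemmed
  def stemmer
    # Memoize: build a stemmer only when none is currently held.
    @stemmer ||= FakeStemmer.new
  end

  def remove_stemmer
    @stemmer = nil
  end
end

c = LazyStemmed.new
first = c.stemmer
c.remove_stemmer
second = c.stemmer          # a fresh object is built on the next access
puts first.equal?(second)   # prints "false"
```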

#without_punctuation(str) ⇒ Object

Replaces common punctuation symbols with spaces and deletes apostrophes and hyphens, returning a new string. E.g.,

without_punctuation("Hello (greeting's), with {braces} < >...?")
=> "Hello  greetings   with  braces         "


# File 'lib/classifier/base.rb', line 20

def without_punctuation(str)
  str.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end

#word_hash(str) ⇒ Object

Returns a Hash mapping strings to integers. Each word in the string is stemmed and mapped to its frequency in the document.



# File 'lib/classifier/base.rb', line 26

def word_hash(str)
  word_hash_for_words(str.gsub(/[^\w\s]/, "").split + str.gsub(/[\w]/, " ").split)
end
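
To see how the token list built by word_hash differs from the one built by clean_word_hash, the two `gsub` expressions can be run on their own. This is only a sketch: `word_hash_for_words` is not shown on this page, so the frequency count at the end is a simple stand-in that omits stemming and any filtering the real method may perform, and the sample text is made up.

```ruby
text = "Hello, world! Hello?"

# Tokens used by both clean_word_hash and word_hash: words with punctuation stripped.
words = text.gsub(/[^\w\s]/, "").split
# Extra tokens that only word_hash adds: runs of non-word, non-space characters.
symbols = text.gsub(/[\w]/, " ").split

p words    # ["Hello", "world", "Hello"]
p symbols  # [",", "!", "?"]

# Stand-in for word_hash_for_words: a plain frequency count, no stemming.
counts = Hash.new(0)
(words + symbols).each { |token| counts[token] += 1 }
p counts
```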