Class: Classifier::Base

Inherits:
Object
Defined in:
lib/classifier/base.rb

Direct Known Subclasses

Bayes, LSI

Constructor Details

#initialize(options = {}) ⇒ Base

Returns a new instance of Base.



# File 'lib/classifier/base.rb', line 4

def initialize(options = {})
  options.reverse_merge!(:language => 'en')
  options.reverse_merge!(:encoding => 'UTF_8')

  @options = options
end
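
The defaulting behaviour of the constructor can be sketched without ActiveSupport, which is where `reverse_merge!` comes from. The helper name `with_defaults` is hypothetical and not part of Classifier::Base. It uses plain `Hash#merge`, so unlike `reverse_merge!` it returns a new hash instead of modifying the caller's hash.

```ruby
# Hypothetical stand-in for the two reverse_merge! calls in #initialize:
# caller-supplied options win, and missing keys fall back to the defaults.
def with_defaults(options = {})
  { :language => 'en', :encoding => 'UTF_8' }.merge(options)
end

opts = with_defaults(:language => 'fr')
puts opts[:language]  # prints "fr"
puts opts[:encoding]  # prints "UTF_8"
```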

Instance Method Details

#clean_word_hash(str) ⇒ Object

Returns a word hash containing only stemmed words; punctuation and short symbols are dropped.

# File 'lib/classifier/base.rb', line 30

def clean_word_hash str
	word_hash_for_words str.gsub(/[^\w\s]/,"").split
end
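
As a rough illustration of the tokenization step only, the sketch below reproduces the `gsub`/`split` expression from the listing above. The helper name `clean_tokens` is hypothetical; the real method passes this list on to `word_hash_for_words`, which is not shown in this section.

```ruby
# Hypothetical helper: the token list that clean_word_hash builds before
# handing it to word_hash_for_words.
def clean_tokens(str)
  # Delete every character that is neither a word character nor whitespace,
  # then split on whitespace.
  str.gsub(/[^\w\s]/, "").split
end

p clean_tokens("Don't stop, believing!")  # prints ["Dont", "stop", "believing"]
```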

#prepare_category_name(val) ⇒ Object

Converts val to a String, replaces underscores with spaces, and capitalizes the result.

# File 'lib/classifier/base.rb', line 11

def prepare_category_name val
  val.to_s.gsub("_"," ").capitalize
end
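
A standalone copy of the method body makes the normalization easy to try out. The sample category names are made up for illustration.

```ruby
# Same body as Classifier::Base#prepare_category_name, defined at top level
# so it can be run without the gem.
def prepare_category_name(val)
  val.to_s.gsub("_", " ").capitalize
end

puts prepare_category_name(:not_interesting)  # prints "Not interesting"
puts prepare_category_name("SPAM_mail")       # prints "Spam mail"
```

Because `capitalize` lowercases everything after the first character, category names that differ only in case map to the same string.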

#remove_stemmer ⇒ Object

When a Classifier instance is serialized, it is saved with an instance of Lingua::Stemmer that may not be initialized when deserialized later, raising a “RuntimeError: Stemmer is not initialized”.

You can run remove_stemmer to force a new Stemmer to be initialized.



# File 'lib/classifier/base.rb', line 39

def remove_stemmer
  @stemmer = nil
end
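
The sketch below shows why clearing `@stemmer` can force a fresh stemmer. It assumes the stemmer is created lazily on first use, which this section does not show. `StubStemmer` and `StubClassifier` are hypothetical stand-ins, not Lingua::Stemmer or Classifier::Base.

```ruby
# Hypothetical stand-in for Lingua::Stemmer.
class StubStemmer
  def stem(word)
    word.downcase
  end
end

# Hypothetical classifier that builds its stemmer lazily (an assumption).
class StubClassifier
  def stemmer
    @stemmer ||= StubStemmer.new
  end

  # Same body as Classifier::Base#remove_stemmer.
  def remove_stemmer
    @stemmer = nil
  end
end

classifier = StubClassifier.new
stale = classifier.stemmer
classifier.remove_stemmer
puts classifier.stemmer.equal?(stale)  # prints "false": a new instance was built
```

Under that assumption, calling `remove_stemmer` right after deserializing discards the uninitialized instance before it is used.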

#without_punctuation(str) ⇒ Object

Replaces common punctuation symbols with spaces and deletes apostrophes and hyphens, returning a new string. E.g.,

without_punctuation("Hello (greeting's), with {braces} < >...?")
=> "Hello  greetings   with  braces         "

# File 'lib/classifier/base.rb', line 19

def without_punctuation str
  str.tr( ',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " " ) .tr( "'\-", "")
end

#word_hash(str) ⇒ Object

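Returns a Hash mapping strings to integers. Each word in the string is stemmed and used as a key whose value is the word's frequency in the document. Runs of non-word characters are also passed to word_hash_for_words as separate tokens.
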
# File 'lib/classifier/base.rb', line 25

def word_hash str
	word_hash_for_words(str.gsub(/[^\w\s]/,"").split + str.gsub(/[\w]/," ").split)
end
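
To see what `word_hash` feeds into `word_hash_for_words`, the sketch below rebuilds the argument expression from the listing above. The helper name `word_hash_tokens` is hypothetical, and the stemming and counting done by `word_hash_for_words` are outside this section, so they are left out.

```ruby
# Hypothetical helper: the token list that word_hash passes on.
def word_hash_tokens(str)
  words   = str.gsub(/[^\w\s]/, "").split  # text with punctuation deleted
  symbols = str.gsub(/[\w]/, " ").split    # text with word characters blanked
  words + symbols
end

p word_hash_tokens("Hello, world!")  # prints ["Hello", "world", ",", "!"]
```

The only difference from `clean_word_hash` is the second list, which keeps punctuation runs as tokens of their own.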