Class: Classifier::Bayes
Instance Method Summary
-
#add_category(category) ⇒ Object
(also: #append_category)
Allows you to add categories to the classifier.
-
#categories ⇒ Object
Provides a list of category names. For example: b.categories => ['This', 'That', 'the_other'].
-
#classifications(text) ⇒ Object
Returns the scores in each category for the provided text.
-
#classify(text) ⇒ Object
Returns the classification of the provided text, which is one of the categories given in the initializer.
-
#initialize(*categories) ⇒ Bayes
constructor
The class can be created with one or more categories, each of which will be initialized and given a training method.
-
#method_missing(name, *args) ⇒ Object
Provides training and untraining methods for the categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train_this "This text"
b.train_that "That text"
b.untrain_that "That text"
b.train_the_other "The other text"
-
#myclassify(text) ⇒ Object
Classifies the provided text, assuming that the classifier's categories are Member and Not Member.
-
#myclassify_with_word_hash(word_hash, debugging_info = nil) ⇒ Object
-
#remove_low_frequency_words(threshold = 5) ⇒ Object
-
#train(category, text) ⇒ Object
Provides a general training method for all categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train :this, "This text"
b.train "that", "That text"
b.train "The other", "The other text"
-
#untrain(category, text) ⇒ Object
Provides an untraining method for all categories specified in Bayes#new. Be very careful with this method.
Constructor Details
#initialize(*categories) ⇒ Bayes
The class can be created with one or more categories, each of which will be initialized and given a training method. E.g.,
b = Classifier::Bayes.new 'Interesting', 'Uninteresting', 'Spam'
# File 'lib/classifier/bayes.rb', line 11
def initialize(*categories)
  @categories = Hash.new
  categories.each { |category| @categories[category.prepare_category_name] = Hash.new }
  @total_words = 0
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Provides training and untraining methods for the categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train_this "This text"
b.train_that "That text"
b.untrain_that "That text"
b.train_the_other "The other text"
# File 'lib/classifier/bayes.rb', line 132
def method_missing(name, *args)
  category = name.to_s.gsub(/(un)?train_([\w]+)/, '\2').prepare_category_name
  if @categories.has_key? category
    args.each { |text| eval("#{$1}train(category, text)") }
  elsif name.to_s =~ /(un)?train_([\w]+)/
    raise StandardError, "No such category: #{category}"
  else
    super  #raise StandardError, "No such method: #{name}"
  end
end
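As a rough illustration of the dispatch pattern in the listing above, the following stand-alone sketch routes train_<category> and untrain_<category> calls through method_missing. TinyTrainer, its log array, and the normalize helper are hypothetical names used only for this example and are not part of Classifier::Bayes. Unlike the listing, the sketch uses public_send instead of eval and also defines respond_to_missing?.

```ruby
# Hypothetical stand-in class; not part of Classifier::Bayes.
class TinyTrainer
  PATTERN = /\A(un)?train_(\w+)\z/

  attr_reader :log

  def initialize(*categories)
    @categories = categories.map { |c| normalize(c) }
    @log = []
  end

  def train(category, text)
    @log << [:train, normalize(category), text]
  end

  def untrain(category, text)
    @log << [:untrain, normalize(category), text]
  end

  def method_missing(name, *args)
    match = PATTERN.match(name.to_s)
    return super unless match

    category = normalize(match[2])
    raise StandardError, "No such category: #{category}" unless @categories.include?(category)

    # Calls "untrain" when the optional "un" prefix matched, otherwise "train".
    args.each { |text| public_send("#{match[1]}train", category, text) }
  end

  def respond_to_missing?(name, include_private = false)
    PATTERN.match?(name.to_s) || super
  end

  private

  # Assumed normalization: underscores become spaces, first letter upcased.
  def normalize(name)
    name.to_s.tr('_', ' ').capitalize
  end
end

t = TinyTrainer.new('This', 'That', 'the_other')
t.train_this 'This text'
t.untrain_that 'That text'
t.train_the_other 'The other text'
```

Defining respond_to_missing? alongside method_missing keeps respond_to? consistent with the methods the object actually answers.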
Instance Method Details
#add_category(category) ⇒ Object Also known as: append_category
Allows you to add categories to the classifier. For example:
b.add_category "Not spam"
WARNING: Adding categories to a trained classifier will result in an undertrained category that will tend to match more criteria than the trained selective categories. In short, try to declare all of your categories at initialization.
# File 'lib/classifier/bayes.rb', line 161
def add_category(category)
  @categories[category.prepare_category_name] = Hash.new
end
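The warning can be made concrete with the scoring rule shown in the #classifications listing, where each word contributes log(count / total) and 0.1 stands in for unseen words. The sketch below is a stand-alone approximation using plain hashes; the score helper and the sample counts are made up for illustration and are not part of the library.

```ruby
# Stand-alone approximation of the per-category score used by #classifications.
# `score` and the sample data are hypothetical.
def score(category_words, words)
  total = category_words.values.inject(0) { |sum, n| sum + n }
  words.inject(0.0) do |acc, word|
    count = category_words.fetch(word, 0.1) # 0.1 stands in for unseen words
    acc + Math.log(count / total.to_f)
  end
end

well_trained = { 'cheap' => 40, 'pills' => 30, 'offer' => 30 } # 100 words seen
undertrained = { 'hello' => 1 }                                # 1 word seen
empty        = {}                                              # just added

words = %w[quarterly report] # unseen in every category

trained_score = score(well_trained, words)
under_score   = score(undertrained, words)
empty_score   = score(empty, words)
```

Under these assumptions the undertrained category outscores the well-trained one on unfamiliar text because its total is smaller, and a category with no training at all divides by a total of zero, which yields an infinite score.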
#categories ⇒ Object
Provides a list of category names. For example:
b.categories
=> ['This', 'That', 'the_other']
# File 'lib/classifier/bayes.rb', line 148
def categories # :nodoc:
  @categories.keys.collect {|c| c.to_s}
end
#classifications(text) ⇒ Object
Returns the scores in each category for the provided text. E.g.,
b.classifications "I hate bad words and you"
=> {"Uninteresting"=>-12.6997928013932, "Interesting"=>-18.4206807439524}
The largest of these scores (the one closest to 0) is the one picked out by #classify.
# File 'lib/classifier/bayes.rb', line 67
def classifications(text)
  score = Hash.new
  @categories.each do |category, category_words|
    score[category.to_s] = 0
    total = category_words.values.inject(0) {|sum, element| sum+element}
    text.word_hash.each do |word, count|
      s = category_words.has_key?(word) ? category_words[word] : 0.1
      score[category.to_s] += Math.log(s/total.to_f)
    end
  end
  return score
end
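Read off the listing above, the score for a category c can be written as follows. The symbols are notation introduced here, not names from the library: W(text) is the set of distinct words returned by word_hash, n_{c,w} is the stored count of word w in category c, and N_c is the sum of all counts in c.

```latex
\[
\operatorname{score}(c) = \sum_{w \in W(\text{text})} \log\frac{s_{c,w}}{N_c},
\qquad
s_{c,w} =
\begin{cases}
n_{c,w} & \text{if } w \text{ has been seen in } c,\\
0.1 & \text{otherwise,}
\end{cases}
\qquad
N_c = \sum_{w'} n_{c,w'} .
\]
```

Each distinct word contributes once to the sum; the per-text count yielded by word_hash is not used by this method.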
#classify(text) ⇒ Object
Returns the classification of the provided text, which is one of the categories given in the initializer. E.g.,
b.classify "I hate bad words and you"
=> 'Uninteresting'
# File 'lib/classifier/bayes.rb', line 120
def classify(text)
  (classifications(text).sort_by { |a| -a[1] })[0][0]
end
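The selection step can be tried in isolation on a plain hash. The sketch below reuses the scores from the #classifications example and compares the sort-based selection from the listing with an equivalent single pass using max_by; the variable names are illustrative only.

```ruby
# Scores copied from the #classifications example on this page.
scores = {
  'Uninteresting' => -12.6997928013932,
  'Interesting'   => -18.4206807439524
}

# Same selection as the listing: sort by negated score and take the first key.
best_by_sort = scores.sort_by { |pair| -pair[1] }[0][0]

# Equivalent single pass (ignoring how ties are ordered).
best_by_max = scores.max_by { |_category, score| score }.first
```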
#myclassify(text) ⇒ Object
Classifies the provided text, assuming that the classifier's categories are Member and Not Member.
# File 'lib/classifier/bayes.rb', line 81
def myclassify(text)
  myclassify_with_word_hash(text.word_hash)
end
#myclassify_with_word_hash(word_hash, debugging_info = nil) ⇒ Object
# File 'lib/classifier/bayes.rb', line 85
def myclassify_with_word_hash(word_hash, debugging_info = nil)
  member_term_count = @categories[:Member].size
  nonmember_term_count = @categories[:"Not member"].size
  term_count = member_term_count + nonmember_term_count

  score = 0
  word_hash.each do |word, count|
    # count of words in each category
    member_count = @categories[:Member][word].to_i + 0.1
    nonmember_count = @categories[:"Not member"][word].to_i + 0.1
    next if member_count == 0.1 && nonmember_count == 0.1

    # find relative prob word is in class -- p(w|c)
    word_member_p = (member_count) / (total_member_count_correct + term_count).to_f
    word_nonmember_p = (nonmember_count) / (total_nonmember_count_correct + term_count).to_f
    word_pr = Math.log(word_member_p / word_nonmember_p)
    score += word_pr * count

    if debugging_info
      debugging_info[word] = word_pr * count
    end
    #print "#{word_pr * count}: #{word}\n"
  end

  if score > 0
    return "Member", score
  else
    return "Not member", score
  end
end
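The listing amounts to a two-class log-likelihood ratio: each word adds log(p(w|Member) / p(w|Not member)) times its count, and a positive total means "Member". The sketch below is a stand-alone approximation on plain hashes. The member_score helper is hypothetical, and because this page does not show total_member_count_correct or total_nonmember_count_correct, the sketch assumes they are the summed word counts of each category.

```ruby
# Hypothetical stand-alone sketch of a two-class log-likelihood-ratio score.
def member_score(member_counts, nonmember_counts, word_hash)
  term_count      = member_counts.size + nonmember_counts.size
  # Assumption: the "correct" totals are the summed counts per category.
  member_total    = member_counts.values.inject(0) { |sum, n| sum + n }
  nonmember_total = nonmember_counts.values.inject(0) { |sum, n| sum + n }

  word_hash.inject(0.0) do |score, (word, count)|
    m = member_counts.fetch(word, 0) + 0.1
    n = nonmember_counts.fetch(word, 0) + 0.1
    next score if m == 0.1 && n == 0.1 # unseen in both classes: no evidence

    p_member    = m / (member_total + term_count).to_f
    p_nonmember = n / (nonmember_total + term_count).to_f
    score + Math.log(p_member / p_nonmember) * count
  end
end

members    = { 'ruby' => 3, 'gem' => 1 }
nonmembers = { 'java' => 4 }

result = member_score(members, nonmembers, { 'ruby' => 2, 'unknown' => 1 })
label  = result > 0 ? 'Member' : 'Not member'
```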
#remove_low_frequency_words(threshold = 5) ⇒ Object
# File 'lib/classifier/bayes.rb', line 165
def remove_low_frequency_words(threshold = 5)
  @categories.each do |_, word_counts|
    word_counts.to_a.each do |word, count|
      if count < threshold
        word_counts.delete(word)
      end
    end
  end
  reset_correct_counts!
end
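The pruning step can be sketched on a plain nested hash. The data below is invented for illustration and is not the library's internal state. The sketch uses Hash#delete_if rather than iterating over a to_a copy, and it omits the reset_correct_counts! call, whose body is not shown on this page.

```ruby
# Hypothetical word-count table keyed by category.
categories = {
  Member: { 'ruby' => 12, 'gem' => 5, 'typo' => 1 },
  'Not member': { 'java' => 9, 'jar' => 4 }
}

threshold = 5

# delete_if mutates each inner hash in place, dropping words seen
# fewer than `threshold` times.
categories.each_value do |word_counts|
  word_counts.delete_if { |_word, count| count < threshold }
end
```

Words whose count equals the threshold are kept, matching the strict `<` comparison in the listing.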
#train(category, text) ⇒ Object
Provides a general training method for all categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train :this, "This text"
b.train "that", "That text"
b.train "The other", "The other text"
# File 'lib/classifier/bayes.rb', line 24
def train(category, text)
  category = category.prepare_category_name
  text.word_hash.each do |word, count|
    @categories[category][word] ||= 0
    @categories[category][word] += count
    @total_words += count
  end
  reset_correct_counts!
end
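The bookkeeping in the listing can be approximated without the library. In the sketch below, simple_word_hash is a hypothetical tokenizer standing in for String#word_hash, which this page does not show and which may tokenize differently; counts and total_words play the roles of @categories and @total_words.

```ruby
# Hypothetical tokenizer; the real String#word_hash may behave differently.
def simple_word_hash(text)
  text.downcase.scan(/\w+/).each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
end

# Per-category word counts, created on first use.
counts = Hash.new { |h, category| h[category] = Hash.new(0) }
total_words = 0

simple_word_hash('Spam spam eggs').each do |word, count|
  counts[:Breakfast][word] += count
  total_words += count
end
```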
#untrain(category, text) ⇒ Object
Provides an untraining method for all categories specified in Bayes#new. Be very careful with this method.
For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train :this, "This text"
b.untrain :this, "This text"
# File 'lib/classifier/bayes.rb', line 42
def untrain(category, text)
  category = category.prepare_category_name
  text.word_hash.each do |word, count|
    if @total_words >= 0
      # Sometimes items can be untrained before they are trained,
      # be tolerant of that case
      next if @categories[category][word].nil?
      orig = @categories[category][word]
      @categories[category][word] ||= 0
      @categories[category][word] -= count
      if @categories[category][word] <= 0
        @categories[category].delete(word)
        count = orig
      end
      @total_words -= count
    end
  end
  reset_correct_counts!
end
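The clamping behaviour in the listing is easier to see on plain data. The sketch below uses a hypothetical untrain_counts helper that mirrors the per-word logic: words that were never trained are skipped, and when a word's count would drop to zero or below, the word is deleted and only its original count is subtracted from the running total. The `@total_words >= 0` guard and reset_correct_counts! are left out.

```ruby
# Hypothetical helper mirroring the per-word logic of #untrain.
# Mutates word_counts and returns the updated total.
def untrain_counts(word_counts, total_words, removals)
  removals.each do |word, count|
    orig = word_counts[word]
    next if orig.nil? # never trained: nothing to undo

    if orig - count <= 0
      word_counts.delete(word)
      total_words -= orig # remove only what was actually counted
    else
      word_counts[word] = orig - count
      total_words -= count
    end
  end
  total_words
end

counts = { 'spam' => 2, 'eggs' => 1 }
total  = untrain_counts(counts, 3, { 'spam' => 1, 'eggs' => 5, 'ham' => 2 })
```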