Class: Clusterer::MultinomialBayes

Inherits:
Bayes < Object
Defined in:
lib/clusterer/bayes.rb

Overview

Based on the description given in “Tackling the Poor Assumptions of Naive Bayes Text Classifiers” by Jason D. M. Rennie, Lawrence Shih, Jaime Teevan and David R. Karger (ICML 2003).

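The basic idea is that a document's likelihood under a category is built from how often each of its terms occurred in the documents used to train that category. Every occurrence of a term in the document contributes the (smoothed) relative frequency of that term among all term occurrences seen while training the category.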
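Written out from the `train` and `distribution` source on this page (not taken from the paper), the block that `distribution` passes to `super` scores a document $d$ against a category $c$ as shown here. $f_{d,t}$ is the frequency of term $t$ in $d$, $N_{c,t}$ is the accumulated frequency of $t$ over the documents trained under $c$ (`@likelihood_numer[c][t]`, taken as 0 for unseen terms), and $N_c$ is the category's total trained term frequency (`@likelihood_denom[c]`):

$$\mathrm{score}(d, c) = \sum_{t \in d} f_{d,t} \, \log \frac{1 + N_{c,t}}{1 + N_c}$$

The denominator is smoothed by 1 rather than by the vocabulary size used in textbook Laplace smoothing, so the per-term factors are not normalized probabilities.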

Direct Known Subclasses

WeightNormalizedMultinomialBayes

Instance Attribute Summary

Attributes inherited from Bayes

#categories

Instance Method Summary

Methods inherited from Bayes

#classify, #method_missing

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class Clusterer::Bayes.

Instance Method Details

#distribution(document) ⇒ Object

# File 'lib/clusterer/bayes.rb', line 137

def distribution(document)
  super() do |cl,ind|
    numer, denom, sum = @likelihood_numer[cl], (1 + @likelihood_denom[cl]), 0.0
    document.each {|term,freq| sum += freq * Math.log((1 + (numer[term] || 0))/denom)}
    sum
  end
end
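The per-category computation inside `distribution` can be tried outside the class. The sketch below is a hypothetical standalone function (`multinomial_log_score` is not part of the library): `counts` plays the role of `@likelihood_numer[cl]` and `total` the role of `@likelihood_denom[cl]`. The toy category data are invented for illustration.

```ruby
# Hypothetical standalone version of the block passed to super in
# MultinomialBayes#distribution.
#   document : term => frequency in the document being scored
#   counts   : term => accumulated training frequency for one category
#   total    : summed training frequency for that category
def multinomial_log_score(document, counts, total)
  denom = 1.0 + total # same "+ 1" smoothing of the denominator as the original
  document.sum(0.0) do |term, freq|
    # Unseen terms count as 0, so each occurrence contributes log(1 / denom).
    freq * Math.log((1 + counts.fetch(term, 0)) / denom)
  end
end

# Invented toy training statistics for two categories.
sports   = { "goal" => 5, "match" => 3 } # total 8
politics = { "vote" => 4, "match" => 1 } # total 5
doc      = { "goal" => 2, "match" => 1 }

scores = {
  sports:   multinomial_log_score(doc, sports, 8),
  politics: multinomial_log_score(doc, politics, 5)
}
best = scores.max_by { |_, score| score }.first
```

Since all scores are sums of logs on the same scale, the largest (least negative) score marks the closest category here (`:sports`); how `Bayes#classify` actually combines the values yielded to it is not shown on this page.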

#train(document, category) ⇒ Object

# File 'lib/clusterer/bayes.rb', line 113

def train(document, category)
  category = category.to_sym
  super
  numer, sum = @likelihood_numer[category], 0.0
  document.each do |term,freq|
    numer[term] = (numer[term] || 0) + freq
    sum += freq
  end
  @likelihood_denom[category] += sum
end
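The bookkeeping in `train` amounts to two running totals per category. This hypothetical sketch (`train_counts` is an invented name) uses plain hashes in place of `@likelihood_numer` and `@likelihood_denom`. The `||=` stands in for whatever per-category initialization the superclass performs before the original code mutates `numer`; that initialization is an assumption, since it is not shown on this page.

```ruby
# Hypothetical sketch of MultinomialBayes#train on plain hashes.
def train_counts(document, category, numer, denom)
  category = category.to_sym
  counts = (numer[category] ||= {}) # assumed stand-in for superclass setup
  added = 0.0
  document.each do |term, freq|
    counts[term] = counts.fetch(term, 0) + freq # per-term frequency for the category
    added += freq
  end
  denom[category] = denom.fetch(category, 0.0) + added # total term mass for the category
end

numer = {}
denom = {}
train_counts({ "goal" => 2, "match" => 1 }, "sports", numer, denom)
train_counts({ "goal" => 1 }, :sports, numer, denom)
# numer[:sports] now holds {"goal"=>3, "match"=>1}; denom[:sports] is 4.0
```

String and symbol category names end up under the same key because of `to_sym`, matching the original method.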

#untrain(document, category) ⇒ Object

# File 'lib/clusterer/bayes.rb', line 124

def untrain(document, category)
  category = category.to_sym
  super
  numer, sum = @likelihood_numer[category], 0.0
  document.each do |term,freq|
    if numer[term]
      numer[term] = [numer[term] - freq, 0].max
      sum += freq
    end
  end
  @likelihood_denom[category] = [@likelihood_denom[category] - sum, 0.0].max
end