Class: Clusterer::ComplementBayes

Inherits:
Bayes
  • Object
Defined in:
lib/clusterer/bayes.rb

Overview

Based on the description given in “Tackling the Poor Assumptions of Naive Bayes Text Classifiers” by Jason D. M. Rennie, Lawrence Shih, Jaime Teevan and David R. Karger (ICML 2003).

The idea is that the more often a document's terms appear in documents of the other categories, the less likely the document is to belong to a given category. Each category is therefore modelled by the term counts of its complement (all documents outside it), which is the difference from MultiNomialBayes and the reason for the name. The authors claim that this performs better than MultiNomialBayes, but take the results with a pinch of salt: MultiNomialBayes may perform better on balanced datasets. If the dataset is skewed and the minority class is the important one, use ComplementBayes.
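The complement idea can be illustrated with a minimal, self-contained sketch. TinyComplementBayes and its methods are hypothetical names introduced here for illustration and are not part of Clusterer. The sketch assumes documents are Hashes of term => frequency and that the highest score wins; it omits priors and anything else the Bayes superclass may add.

```ruby
# Hypothetical illustration only, not the Clusterer implementation.
class TinyComplementBayes
  def initialize(categories)
    @categories = categories.map(&:to_sym)
    # For each class: term counts taken from documents of every OTHER class.
    @numer = Hash.new { |h, k| h[k] = Hash.new(0) }
    @denom = Hash.new(0.0)
  end

  # document is a Hash of term => frequency.
  def train(document, category)
    category = category.to_sym
    (@categories - [category]).each do |cl|
      document.each do |term, freq|
        @numer[cl][term] += freq
        @denom[cl] += freq
      end
    end
  end

  # A high score means the document fits the complement of cl poorly.
  def scores(document)
    @categories.to_h do |cl|
      sum = document.sum(0.0) do |term, freq|
        freq * Math.log((1 + @numer[cl][term]) / (1 + @denom[cl]))
      end
      [cl, -sum]
    end
  end

  def classify(document)
    scores(document).max_by { |_, score| score }.first
  end
end

nb = TinyComplementBayes.new(%w[spam ham])
nb.train({ "cheap" => 2, "pills" => 1 }, :spam)
nb.train({ "meeting" => 1, "notes" => 2 }, :ham)
nb.classify({ "cheap" => 1 })  # => :spam
nb.classify({ "notes" => 1 })  # => :ham
```

Note that training a :spam document only changes the counts stored for :ham, so "cheap" is penalised for :ham and the document ends up classified as :spam.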

Direct Known Subclasses

WeightNormalizedComplementBayes

Instance Attribute Summary

Attributes inherited from Bayes

#categories

Instance Method Summary

Methods inherited from Bayes

#classify, #method_missing

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class Clusterer::Bayes

Instance Method Details

#distribution(document) ⇒ Object



# File 'lib/clusterer/bayes.rb', line 184

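def distribution(document)
  super() do |cl, _ind|
    # Complement counts: term totals from documents outside class cl.
    # to_f guards against integer division if the denominator is an Integer.
    numer, denom = @likelihood_numer[cl], (1 + @likelihood_denom[cl]).to_f
    sum = 0.0
    # Add-one smoothed log-likelihood of the document under the complement of cl.
    document.each { |term, freq| sum += freq * Math.log((1 + (numer[term] || 0)) / denom) }
    # Negated, so a poor fit to the complement gives cl a high score.
    -sum
  end
end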
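
Read off the method body above, the value the block returns for a class c and a document d can be written as a formula. The symbols are notation introduced here, not names from the source: f_{t,d} is the frequency of term t in the document, N_{\bar{c},t} stands for @likelihood_numer[c][t] (taken as 0 when absent) and N_{\bar{c}} for @likelihood_denom[c]. How super() combines these per-class values is not shown in this listing.

```latex
\mathrm{score}(c, d) \;=\; -\sum_{t \in d} f_{t,d}\,
  \log \frac{1 + N_{\bar{c},t}}{1 + N_{\bar{c}}}
```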

#train(document, category) ⇒ Object



# File 'lib/clusterer/bayes.rb', line 156

def train(document, category)
  category = category.to_sym
  super
  # Credit the document's terms to every class except its own (the complement).
  (@categories - [category]).each do |cl|
    numer, sum = @likelihood_numer[cl], 0.0
    document.each do |term, freq|
      numer[term] = (numer[term] || 0) + freq
      sum += freq
    end
    @likelihood_denom[cl] += sum
  end
end

#untrain(document, category) ⇒ Object



# File 'lib/clusterer/bayes.rb', line 169

def untrain(document, category)
  category = category.to_sym
  super
  # Reverse train: remove the document's terms from every class except its own.
  # The counts to adjust belong to cl, the classes that train updated.
  (@categories - [category]).each do |cl|
    numer, sum = @likelihood_numer[cl], 0.0
    document.each do |term, freq|
      if numer[term]
        # Clamp at zero in case the document was never trained.
        numer[term] = [numer[term] - freq, 0].max
        sum += freq
      end
    end
    @likelihood_denom[cl] = [@likelihood_denom[cl] - sum, 0.0].max
  end
end
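
A minimal sketch of a train/untrain round trip, in which untrain reverses the update on the same complement classes that train touched. The helpers complement_train! and complement_untrain! are hypothetical and work on plain hashes rather than on the instance state of Clusterer::Bayes, whose setup is not shown in this listing.

```ruby
# Hypothetical helpers on plain hashes, not the gem's instance methods.
def complement_train!(numer, denom, categories, document, category)
  (categories - [category]).each do |cl|
    document.each do |term, freq|
      numer[cl][term] = (numer[cl][term] || 0) + freq
      denom[cl] += freq
    end
  end
end

def complement_untrain!(numer, denom, categories, document, category)
  (categories - [category]).each do |cl|
    removed = 0.0
    document.each do |term, freq|
      next unless numer[cl][term]
      # Clamp at zero so untraining an unseen document cannot go negative.
      numer[cl][term] = [numer[cl][term] - freq, 0].max
      removed += freq
    end
    denom[cl] = [denom[cl] - removed, 0.0].max
  end
end

categories = [:spam, :ham]
numer = { spam: {}, ham: {} }
denom = { spam: 0.0, ham: 0.0 }
doc = { "cheap" => 2, "pills" => 1 }

complement_train!(numer, denom, categories, doc, :spam)
numer[:ham]   # => {"cheap"=>2, "pills"=>1}
denom[:ham]   # => 3.0

complement_untrain!(numer, denom, categories, doc, :spam)
denom[:ham]   # => 0.0
```

Only the :ham entries change in either direction; the :spam entries are left alone because the document itself belongs to :spam.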