Class: Clusterer::ComplementBayes
Overview
Based on the description in “Tackling the Poor Assumptions of Naive Bayes Text Classifiers” by Jason D. M. Rennie, Lawrence Shih, Jaime Teevan and David R. Karger, ICML 2003.
The idea is that a document's score for a category falls as the document's terms occur more frequently in the training documents of the other categories: each category's term statistics are gathered from its complement (all other categories) rather than from its own documents. This is the difference from MultiNomialBayes, and hence the name complement. The authors report that this performs better than MultiNomialBayes, but take those results with a pinch of salt: MultiNomialBayes may perform better on balanced datasets. If the dataset is skewed and the minority class is the important one, use ComplementBayes.
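To make the complement idea concrete, here is a minimal batch sketch. The helpers `complement_counts` and `classify` are hypothetical and not part of Clusterer; they only assume that documents are Hashes of term => frequency, which is how the methods below iterate over them. The sketch omits the additional transforms proposed in the paper (such as weight normalization).

```ruby
# Minimal sketch of complement naive Bayes (hypothetical helpers, not the
# Clusterer API). Documents are Hashes mapping term => frequency.

# For each category, sum term frequencies over the training documents of
# every OTHER category (the "complement" of that category).
def complement_counts(training)
  categories = training.map { |_doc, cat| cat }.uniq
  counts = categories.to_h { |cat| [cat, Hash.new(0)] }
  totals = Hash.new(0.0)
  training.each do |doc, cat|
    (categories - [cat]).each do |other|
      doc.each do |term, freq|
        counts[other][term] += freq
        totals[other] += freq
      end
    end
  end
  [counts, totals]
end

# Score each category by its negated complement log-likelihood, using the
# same (1 + count) / (1 + total) smoothing as #distribution, and return the
# highest-scoring category.
def classify(document, counts, totals)
  scores = counts.keys.to_h do |cat|
    denom = 1 + totals[cat]
    log_lik = document.sum { |term, freq| freq * Math.log((1 + counts[cat][term]) / denom) }
    [cat, -log_lik]
  end
  scores.max_by { |_cat, score| score }.first
end

training = [
  [{ "ball" => 3, "goal" => 2 }, :sports],
  [{ "vote" => 3, "law" => 2 }, :politics]
]
counts, totals = complement_counts(training)
puts classify({ "ball" => 2 }, counts, totals) # prints "sports"
```

The query term "ball" never appears in the complement of :sports, so the :sports complement explains the document worst and :sports wins.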
Direct Known Subclasses
Instance Attribute Summary
Attributes inherited from Bayes
Instance Method Summary
- #distribution(document) ⇒ Object
- #train(document, category) ⇒ Object
- #untrain(document, category) ⇒ Object
Methods inherited from Bayes
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class Clusterer::Bayes.
Instance Method Details
#distribution(document) ⇒ Object
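# File 'lib/clusterer/bayes.rb', line 184
def distribution(document)
  super() do |cl, ind|
    # Complement counts for category cl: term frequencies from all other categories.
    numer, denom, sum = @likelihood_numer[cl], (1 + @likelihood_denom[cl]), 0.0
    document.each { |term, freq| sum += freq * Math.log((1 + (numer[term] || 0)) / denom) }
    # A low complement log-likelihood means the document resembles cl, so negate it.
    -sum
  end
end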
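Read off the block passed to super(), the score given to category c can be written as below, where f_t is the frequency of term t in the document, \bar{N}_c(t) is @likelihood_numer[c][t] (0 when absent) and \bar{N}_c is @likelihood_denom[c]. How Bayes#distribution turns these per-category scores into the returned distribution is not shown in this section, so that step is left out.

```latex
\operatorname{score}_c(d) = -\sum_{t \in d} f_t \,\log \frac{1 + \bar{N}_c(t)}{1 + \bar{N}_c}
```

Note that the denominator adds 1, whereas standard Laplace smoothing would add the vocabulary size; as written, the smoothed term probabilities for a category need not sum to one.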
#train(document, category) ⇒ Object
# File 'lib/clusterer/bayes.rb', line 156
def train(document, category)
  category = category.to_sym
  # Bare super passes the current (symbolized) arguments on to Bayes#train.
  super
  # Add the document's term frequencies to the complement counts of every other category.
  (@categories - [category]).each do |cl|
    numer, sum = @likelihood_numer[cl], 0.0
    document.each do |term, freq|
      numer[term] = (numer[term] || 0) + freq
      sum += freq
    end
    @likelihood_denom[cl] += sum
  end
end
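The complement bookkeeping in #train can be isolated in a small stand-alone class. ComplementCountsSketch is hypothetical: it reproduces only the counting shown above and not whatever Bayes#train does through super.

```ruby
# Hypothetical stand-alone sketch of the counting done by ComplementBayes#train:
# a training document is added to the counts of every category EXCEPT its own.
class ComplementCountsSketch
  attr_reader :numer, :denom

  def initialize(categories)
    @categories = categories.map(&:to_sym)
    @numer = @categories.to_h { |c| [c, {}] }   # category => { term => frequency }
    @denom = @categories.to_h { |c| [c, 0.0] }  # category => total frequency
  end

  def train(document, category)
    category = category.to_sym
    (@categories - [category]).each do |cl|
      document.each do |term, freq|
        @numer[cl][term] = (@numer[cl][term] || 0) + freq
        @denom[cl] += freq
      end
    end
  end
end

model = ComplementCountsSketch.new(%w[sports politics tech])
model.train({ "ball" => 3, "goal" => 1 }, "sports")
p model.numer # the sports document is counted under :politics and :tech only
```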
#untrain(document, category) ⇒ Object
# File 'lib/clusterer/bayes.rb', line 169
def untrain(document, category)
  category = category.to_sym
  super
  # Undo #train: remove the document from the complement counts of every other
  # category cl (not from the document's own category), clamping at zero.
  (@categories - [category]).each do |cl|
    numer, sum = @likelihood_numer[cl], 0.0
    document.each do |term, freq|
      if numer[term]
        numer[term] = [numer[term] - freq, 0].max
        sum += freq
      end
    end
    @likelihood_denom[cl] = [@likelihood_denom[cl] - sum, 0.0].max
  end
end
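Assuming #untrain is meant to undo #train, it has to decrement the complement counts of every category other than the document's own, indexed by the loop variable cl. The hypothetical untrain_complement! below sketches that with plain Hashes, starting from a hand-written counts state and keeping the clamp at zero.

```ruby
# Hypothetical sketch of untraining complement counts with plain Hashes:
# the document is removed from every category except its own, mirroring train.
def untrain_complement!(numer, denom, categories, document, category)
  category = category.to_sym
  (categories - [category]).each do |cl|
    removed = 0.0
    document.each do |term, freq|
      next unless numer[cl][term]
      numer[cl][term] = [numer[cl][term] - freq, 0].max # never go negative
      removed += freq
    end
    denom[cl] = [denom[cl] - removed, 0.0].max
  end
end

categories = %i[sports politics]
# Counts after training one sports document { "ball" => 3 }
# and one politics document { "vote" => 2 }:
numer = { sports: { "vote" => 2 }, politics: { "ball" => 3 } }
denom = { sports: 2.0, politics: 3.0 }

untrain_complement!(numer, denom, categories, { "ball" => 3 }, :sports)
# :politics no longer counts the sports document; :sports is untouched.
```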