Class: Clusterer::MultinomialBayes
Defined in: lib/clusterer/bayes.rb
Overview
Based on the description given in “Tackling the Poor Assumptions of Naive Bayes Text Classifiers” by Jason D. M. Rennie, Lawrence Shih, Jaime Teevan and David R. Karger, ICML 2003.
The basic idea is that the likelihood of a document for a given category is directly proportional to the number of other documents containing the same terms that were seen while training that category.
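A minimal standalone sketch of this idea, assuming documents are hashes that map terms to frequencies (which is how the methods on this page iterate them). The `TinyMultinomial` class and its method names are hypothetical, it ignores class priors, and it is not the Clusterer::Bayes implementation:

```ruby
# Hypothetical stand-in for illustration; not part of the Clusterer gem.
class TinyMultinomial
  def initialize
    @numer = Hash.new { |h, k| h[k] = Hash.new(0) } # category => term => count
    @denom = Hash.new(0.0)                          # category => total term count
  end

  # Add a document's term frequencies to a category's tallies.
  def train(document, category)
    document.each do |term, freq|
      @numer[category][term] += freq
      @denom[category] += freq
    end
  end

  # Add-one smoothed log-likelihood of the document under one category (no prior).
  def score(document, category)
    denom = 1 + @denom[category]
    document.sum { |term, freq| freq * Math.log((1 + @numer[category][term]) / denom) }
  end

  # Trained category with the highest score.
  def classify(document)
    @numer.keys.max_by { |category| score(document, category) }
  end
end

nb = TinyMultinomial.new
nb.train({ "goal" => 2, "match" => 1 }, :sports)
nb.train({ "vote" => 2, "match" => 1 }, :politics)
best = nb.classify({ "goal" => 1 }) # :sports, since "goal" was only seen there
```

Terms seen more often while training a category raise that category's score, and the added 1 keeps unseen terms from producing `log(0)`.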
Direct Known Subclasses
Instance Attribute Summary
Attributes inherited from Bayes
Instance Method Summary
- #distribution(document) ⇒ Object
- #train(document, category) ⇒ Object
- #untrain(document, category) ⇒ Object
Methods inherited from Bayes
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class Clusterer::Bayes.
Instance Method Details
#distribution(document) ⇒ Object
# File 'lib/clusterer/bayes.rb', line 137
def distribution(document)
  super() do |cl,ind|
    numer, denom, sum = @likelihood_numer[cl], (1 + @likelihood_denom[cl]), 0.0
    document.each {|term,freq| sum += freq * Math.log((1 + (numer[term] || 0))/denom)}
    sum
  end
end
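Read as a formula, the block passed to `super` appears to compute the following sum for each category c. Here f_t is the frequency of term t in the document, N_{c,t} stands for `@likelihood_numer[c][t]` (taken as 0 when the term is absent), and N_c stands for `@likelihood_denom[c]`; these symbols are introduced only for this note. How the parent `Bayes#distribution` combines the value with priors or normalizes it is not shown on this page.

```latex
\ell_c(d) = \sum_{t \in d} f_t \, \log \frac{1 + N_{c,t}}{1 + N_c}
```

The `1 +` terms in the numerator and denominator act as add-one smoothing, so a term never seen for a category contributes a finite negative value rather than negative infinity.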
#train(document, category) ⇒ Object
# File 'lib/clusterer/bayes.rb', line 113
def train(document, category)
  category = category.to_sym
  super
  numer, sum = @likelihood_numer[category], 0.0
  document.each do |term,freq|
    numer[term] = (numer[term] || 0) + freq
    sum += freq
  end
  @likelihood_denom[category] += sum
end
#untrain(document, category) ⇒ Object
# File 'lib/clusterer/bayes.rb', line 124
def untrain(document, category)
  category = category.to_sym
  super
  numer, sum = @likelihood_numer[category], 0.0
  document.each do |term,freq|
    if numer[term]
      numer[term] = [numer[term] - freq, 0].max
      sum += freq
    end
  end
  @likelihood_denom[category] = [@likelihood_denom[category] - sum, 0.0].max
end
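Because both counters are clamped at zero, #untrain is an exact inverse of #train only when the same document was previously trained for that category. A minimal sketch of the same bookkeeping on plain hashes, outside the class; the `train_counts` and `untrain_counts` helpers are hypothetical and skip the `super` call:

```ruby
# Hypothetical helpers mirroring the count bookkeeping; not the gem's API.
# Each returns the updated denominator and mutates numer in place.
def train_counts(numer, denom, document)
  sum = 0.0
  document.each do |term, freq|
    numer[term] = (numer[term] || 0) + freq
    sum += freq
  end
  denom + sum
end

def untrain_counts(numer, denom, document)
  sum = 0.0
  document.each do |term, freq|
    next unless numer[term]                   # terms never seen are ignored
    numer[term] = [numer[term] - freq, 0].max # per-term count never goes negative
    sum += freq
  end
  [denom - sum, 0.0].max                      # total never goes negative
end

numer = {}
denom = train_counts(numer, 0.0, { "ruby" => 2, "gem" => 1 })      # denom is 3.0
denom = untrain_counts(numer, denom, { "ruby" => 5, "yard" => 1 })
# "ruby" is clamped to 0, "yard" is skipped, and denom is clamped to 0.0
```

In this sketch, as in the method above, the denominator is reduced by the full requested frequency even when the per-term count is clamped, so untraining a document that was never trained can leave the per-term counts and the total out of step.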