Class: Clusterer::MultinomialBayes

Inherits: Bayes

Object → Bayes → Clusterer::MultinomialBayes

Defined in: lib/clusterer/bayes.rb

Overview

Based on the description in “Tackling the Poor Assumptions of Naive Bayes Text Classifiers” by Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, and David R. Karger (ICML 2003).

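The basic idea is that the likelihood of a document under a given category grows with how often the document's terms occurred in the documents used to train that category. Each term contributes its frequency in the document multiplied by the log of its smoothed relative frequency in that category's training data.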
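Written out as a formula, this is the per-category score that the #distribution block below computes. The symbols are introduced here for illustration only: f_{d,t} is the frequency of term t in document d, N_{c,t} is the count stored in @likelihood_numer[c][t], and N_c is @likelihood_denom[c], the total term count trained into category c. Any class prior that Bayes#distribution may add is not shown on this page and is omitted.

```latex
\operatorname{score}(d, c) = \sum_{t \in d} f_{d,t} \, \log \frac{1 + N_{c,t}}{1 + N_c},
\qquad N_c = \sum_{t'} N_{c,t'}
```

As written in the code, the denominator adds 1 rather than the vocabulary size used in textbook Laplace smoothing.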

Direct Known Subclasses

WeightNormalizedMultinomialBayes

Instance Attribute Summary

Attributes inherited from Bayes

#categories

Instance Method Summary

Methods inherited from Bayes

#classify, #method_missing

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class Clusterer::Bayes.

Instance Method Details

#distribution(document) ⇒ `Object`

# File 'lib/clusterer/bayes.rb', line 137

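def distribution(document)
  # The block is called with a category (cl) and an index (ind, unused here)
  # and returns that category's log-likelihood score for the document.
  super() do |cl, ind|
    # 1.0 forces float division even if the stored total is an Integer.
    numer, denom, sum = @likelihood_numer[cl], (1.0 + @likelihood_denom[cl]), 0.0
    # Add-one smoothing in the numerator keeps Math.log finite for unseen terms.
    document.each { |term, freq| sum += freq * Math.log((1 + (numer[term] || 0)) / denom) }
    sum
  end
end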
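A minimal standalone sketch of the score computed inside that block, assuming a document is a Hash of term => frequency (the method itself only relies on `each` yielding term/frequency pairs). The helper name `category_log_likelihood` and the toy counts are hypothetical, and picking the highest-scoring category is only an illustration; Bayes#classify may combine scores differently (for example, with priors).

```ruby
# Hypothetical re-creation of the per-category score from #distribution.
# counts: term => total frequency seen in the category's training documents
# total:  sum of all those frequencies (what @likelihood_denom holds)
def category_log_likelihood(document, counts, total)
  denom = 1.0 + total
  document.sum(0.0) do |term, freq|
    # Unseen terms count as 0, so (1 + 0) / denom keeps the log finite.
    freq * Math.log((1 + (counts[term] || 0)) / denom)
  end
end

spam = { "free" => 3, "money" => 2 }      # total = 5
ham  = { "meeting" => 4, "money" => 1 }   # total = 5
doc  = { "free" => 1, "money" => 1 }

scores = {
  spam: category_log_likelihood(doc, spam, 5),  # log(4/6) + log(3/6) = log(1/3)
  ham:  category_log_likelihood(doc, ham, 5)    # log(1/6) + log(2/6) = log(1/18)
}
best = scores.max_by { |_category, score| score }.first
puts best
```

Working in log space turns the product of per-term probabilities into a sum, which avoids floating-point underflow on long documents.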

#train(document, category) ⇒ `Object`

# File 'lib/clusterer/bayes.rb', line 113

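def train(document, category)
  category = category.to_sym
  super  # Bayes#train, called with the current (symbolized) arguments
  numer, sum = @likelihood_numer[category], 0.0
  # Accumulate per-term counts and the category's total term count.
  document.each do |term, freq|
    numer[term] = (numer[term] || 0) + freq
    sum += freq
  end
  @likelihood_denom[category] += sum
end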
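A minimal sketch of the bookkeeping #train performs, assuming documents are Hashes of term => frequency. The class name `TermCounts` is hypothetical, and the sketch creates each category's count Hash on demand; in the real class that setup presumably happens in Bayes or a constructor not shown on this page, and the `super` call into Bayes#train is not modelled.

```ruby
# Hypothetical model of MultinomialBayes#train's per-category counters.
class TermCounts
  attr_reader :numer, :denom

  def initialize
    @numer = Hash.new { |h, category| h[category] = {} } # category => {term => count}
    @denom = Hash.new(0.0)                               # category => total term count
  end

  def train(document, category)
    category = category.to_sym
    counts = @numer[category]
    sum = 0.0
    document.each do |term, freq|
      counts[term] = (counts[term] || 0) + freq
      sum += freq
    end
    @denom[category] += sum
  end
end

model = TermCounts.new
model.train({ "free" => 2, "money" => 1 }, "spam")
model.train({ "free" => 1 }, :spam)
p model.numer[:spam]  # {"free"=>3, "money"=>1}
p model.denom[:spam]  # 4.0
```

Because of `to_sym`, training with `"spam"` and `:spam` updates the same category, mirroring the first line of #train.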

#untrain(document, category) ⇒ `Object`

# File 'lib/clusterer/bayes.rb', line 124

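def untrain(document, category)
  category = category.to_sym
  super  # Bayes#untrain, called with the current (symbolized) arguments
  numer, sum = @likelihood_numer[category], 0.0
  document.each do |term, freq|
    next unless numer[term]
    # Remove at most what was stored, so the total stays equal to the sum of the counts.
    removed = [numer[term], freq].min
    numer[term] -= removed
    sum += removed
  end
  @likelihood_denom[category] = [@likelihood_denom[category] - sum, 0.0].max
end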
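A minimal sketch of train/untrain as inverse operations, again assuming Hash documents. The class name `ReversibleTermCounts` is hypothetical. This sketch never subtracts more than a term's stored count, so the category total always equals the sum of its per-term counts, even when a document is untrained that was never trained.

```ruby
# Hypothetical model of MultinomialBayes#train / #untrain bookkeeping.
class ReversibleTermCounts
  attr_reader :numer, :denom

  def initialize
    @numer = Hash.new { |h, category| h[category] = {} } # category => {term => count}
    @denom = Hash.new(0.0)                               # category => total term count
  end

  def train(document, category)
    counts = @numer[category.to_sym]
    document.each { |term, freq| counts[term] = (counts[term] || 0) + freq }
    @denom[category.to_sym] += document.sum(0.0) { |_term, freq| freq }
  end

  def untrain(document, category)
    category = category.to_sym
    counts = @numer[category]
    removed = 0.0
    document.each do |term, freq|
      next unless counts[term]             # terms never trained are ignored
      taken = [counts[term], freq].min     # never remove more than was stored
      counts[term] -= taken
      removed += taken
    end
    @denom[category] = [@denom[category] - removed, 0.0].max
  end
end

model = ReversibleTermCounts.new
model.train({ "free" => 2, "money" => 1 }, :spam)
model.train({ "meeting" => 4 }, :spam)
model.untrain({ "meeting" => 4 }, :spam)  # exact reversal of the second call
model.untrain({ "free" => 5 }, :spam)     # more than was ever trained
p model.numer[:spam]  # {"free"=>0, "money"=>1, "meeting"=>0}
p model.denom[:spam]  # 1.0
```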