Class: Ai4r::Classifiers::NaiveBayes
- Inherits: Classifier
  - Object
    - Classifier
      - Ai4r::Classifiers::NaiveBayes
- Defined in: lib/ai4r/classifiers/naive_bayes.rb
Overview
Introduction
This is an implementation of a Naive Bayesian Classifier without any specialisation (e.g. for text classification). Probabilities P(a_i | v_j) are estimated using m-estimates, hence the :m parameter, which is set with set_parameters after instantiating the class. The estimation looks like this: (n_c + m*p) / (n + m)
where:
- n = the number of training examples for which v = v_j
- n_c = the number of examples for which v = v_j and a = a_i
- p = the a priori estimate for P(a_i | v_j)
- m = the equivalent sample size
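The estimate itself is a one-line computation. This is a minimal sketch for illustration; the helper name `m_estimate` is hypothetical and not part of the ai4r API.

```ruby
# Hypothetical helper (not part of ai4r): m-estimate of P(a_i | v_j).
#   n_c - number of examples with class v_j and attribute value a_i
#   n   - number of examples with class v_j
#   p   - a priori estimate for P(a_i | v_j)
#   m   - equivalent sample size
def m_estimate(n_c, n, p, m)
  (n_c + m * p).to_f / (n + m)
end

puts m_estimate(2, 5, 0.5, 0) # plain relative frequency 2 / 5
puts m_estimate(2, 5, 0.5, 3) # (2 + 1.5) / (5 + 3) = 0.4375
```

With m = 0 the estimate reduces to the relative frequency n_c / n; as m grows, the estimate is pulled toward the prior p.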
The classifier stores the conditional probabilities in an array named @pcp, indexed by attribute, value and class: @pcp[attributes][values][classes]
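A nested array with this indexing order can be built as shown below. The sizes, the stored value and the local names are hypothetical, chosen only to show the layout, not taken from ai4r.

```ruby
# Hypothetical sketch of a nested array laid out as pcp[attribute][value][class].
num_values  = [2, 2, 2] # distinct values per attribute (made-up sizes)
num_classes = 2         # e.g. "Yes" and "No"

# The block form of Array.new creates a fresh inner array for every slot,
# so the rows do not share a single object.
pcp = Array.new(num_values.length) do |a|
  Array.new(num_values[a]) { Array.new(num_classes, 0.0) }
end

# Conditional probability of value 1 of attribute 0, given class 1:
pcp[0][1][1] = 0.6
puts pcp[0][1][1]
```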
This kind of estimator is useful when the training data set is relatively small. If the data set is big enough, set m to 0, which is also the default value.
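The benefit on small data sets shows up with a zero count. In this sketch (hypothetical counts; `m_estimate` is a locally defined helper, not ai4r code), an attribute value never observed with a class drives the whole naive Bayes product to 0 when m = 0, while m = 3 keeps it positive.

```ruby
# Illustration only (not ai4r code).
def m_estimate(n_c, n, p, m)
  (n_c + m * p).to_f / (n + m)
end

# One class with 4 training examples: attribute 1 matches 3 of them,
# attribute 2 takes a value never seen with this class (n_c = 0).
prior = 0.5

score_m0 = prior * m_estimate(3, 4, 0.5, 0) * m_estimate(0, 4, 0.5, 0)
score_m3 = prior * m_estimate(3, 4, 0.5, 3) * m_estimate(0, 4, 0.5, 3)

puts score_m0 # the zero factor wipes out the other evidence
puts score_m3 # 0.5 * (4.5 / 7) * (1.5 / 7), still positive
```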
For further details on Bayes' theorem and the Naive Bayes classifier, see:
en.wikipedia.org/wiki/Naive_Bayesian_classification
en.wikipedia.org/wiki/Bayes%27_theorem
Parameters
- :m => Optional. Defaults to 0. It may be set to a value greater than 0 when the dataset is relatively small.
How to use it
require 'ai4r'
include Ai4r::Classifiers
include Ai4r::Data

data = DataSet.new.load_csv_with_labels "bayes_data.csv"
b = NaiveBayes.new.
  set_parameters({:m=>3}).
  build data
b.eval(["Red", "SUV", "Domestic"])
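The example reads its training data from a CSV file. Assuming the layout used with load_csv_with_labels (a header row of attribute labels, with the class in the last column), a file of that shape can be written with plain Ruby. The file name, labels and rows below are hypothetical placeholders, not the contents of bayes_data.csv.

```ruby
require "tmpdir"

# Hypothetical rows: three attributes followed by the class in the last column.
# The first row holds the attribute labels.
rows = [
  ["Color", "Type", "Origin", "Stolen"],
  ["Red", "Sports", "Domestic", "Yes"],
  ["Yellow", "SUV", "Imported", "No"],
  ["Red", "SUV", "Imported", "No"]
]

# None of the values contain commas or quotes, so a plain join is valid CSV.
path = File.join(Dir.tmpdir, "bayes_data_example.csv")
File.write(path, rows.map { |row| row.join(",") }.join("\n") + "\n")

puts File.read(path)
```

The resulting path could then be passed to load_csv_with_labels in place of "bayes_data.csv".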
Defined Under Namespace
Classes: DataEntry
Instance Method Summary
- #build(data) ⇒ Object
  Counts the values of the attribute instances and calculates the class probabilities and the conditional probabilities. Parameter data has to be an instance of Ai4r::Data::DataSet.
- #eval(data) ⇒ Object
  Evaluates new data, predicting its category.
- #get_probability_map(data) ⇒ Object
  Calculates the class probabilities for the data entry data.
- #initialize ⇒ NaiveBayes (constructor)
  Returns a new instance of NaiveBayes.
Methods inherited from Classifier
Methods included from Data::Parameterizable
#get_parameters, included, #set_parameters
Constructor Details
#initialize ⇒ NaiveBayes
Returns a new instance of NaiveBayes.
# File 'lib/ai4r/classifiers/naive_bayes.rb', line 63
def initialize
  @m = 0
  @class_counts = []
  @class_prob = [] # stores the probability of the classes
  @pcc = [] # stores the number of instances divided into attribute/value/class
  @pcp = [] # stores the conditional probabilities of the values of an attribute
  @klass_index = {} # hashmap for quick lookup of all the used klasses and their indices
  @values = {} # hashmap for quick lookup of all the values
end
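A lookup like @klass_index lets per-class data live in plain arrays while still translating between a class label and its array position (as index_to_klass does in #eval). A minimal sketch of building such a lookup and its inverse; the labels and local names are hypothetical, not ai4r's internals.

```ruby
# Hypothetical sketch: map class labels to array indices and back.
labels = ["Yes", "No", "Yes", "No", "No"] # class column of some training data

klass_index = {}
labels.uniq.each_with_index { |klass, i| klass_index[klass] = i }

# Inverse lookup: array index -> class label.
index_klass = klass_index.invert

puts index_klass[1]
```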
Instance Method Details
#build(data) ⇒ Object
Counts the values of the attribute instances and calculates the class probabilities and the conditional probabilities. Parameter data has to be an instance of Ai4r::Data::DataSet.
# File 'lib/ai4r/classifiers/naive_bayes.rb', line 105
def build(data)
  raise 'Error instance must be passed' unless data.is_a?(Ai4r::Data::DataSet)
  raise 'Data should not be empty' if data.data_items.length == 0
  initialize_domain_data(data)
  initialize_klass_index
  initialize_pc
  calculate_probabilities
  self
end
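The training steps (collect the attribute domains, count classes, compute priors, then m-estimate every conditional probability) can be sketched without ai4r. This is a standalone illustration, not the library's code: the function name `train`, the returned hash, and the choice p = 1 / (number of distinct values of the attribute) are all assumptions.

```ruby
# Minimal standalone sketch of the training steps (not ai4r's code).
# Each training item is [attr_0, ..., attr_k, class]; the class is the last element.
# Assumption: the prior p for every value of attribute a is 1 / (distinct values of a).
def train(items, m = 0)
  classes = items.map(&:last).uniq
  n_attrs = items.first.length - 1
  # Distinct values seen for each attribute (the "domain").
  values = (0...n_attrs).map { |a| items.map { |row| row[a] }.uniq }

  # Class counts and prior probabilities P(v_j).
  class_counts = classes.map { |c| items.count { |row| row.last == c } }
  priors = class_counts.map { |n| n.to_f / items.length }

  # cond[a][v][k] = m-estimate of P(value v of attribute a | class k).
  cond = values.each_with_index.map do |vals, a|
    prior_p = 1.0 / vals.length
    vals.map do |val|
      classes.each_with_index.map do |c, k|
        n_c = items.count { |row| row[a] == val && row.last == c }
        (n_c + m * prior_p) / (class_counts[k] + m)
      end
    end
  end

  { classes: classes, values: values, priors: priors, cond: cond }
end

items = [
  ["Red", "SUV", "Yes"],
  ["Red", "Sports", "Yes"],
  ["Yellow", "SUV", "No"],
  ["Red", "SUV", "No"]
]

model = train(items, 0)
puts model[:priors].inspect  # [0.5, 0.5]
puts model[:cond][0].inspect # [[1.0, 0.5], [0.0, 0.5]] (rows: Red, Yellow)
```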
#eval(data) ⇒ Object
Evaluates new data, predicting its category. For example:
b.eval(["Red", "SUV", "Domestic"])
=> 'No'
# File 'lib/ai4r/classifiers/naive_bayes.rb', line 77
def eval(data)
  prob = @class_prob.dup
  prob = calculate_class_probabilities_for_entry(data, prob)
  index_to_klass(prob.index(prob.max))
end
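The prediction multiplies each class prior by the conditional probabilities of the entry's values and picks the class with the highest score. A standalone sketch of that argmax; the hand-written model, its layout and the helper names class_scores/classify are hypothetical, and the sketch assumes every value in the entry was seen during training.

```ruby
# Hypothetical hand-written model: classes, attribute values, priors, and
# cond[a][v][k] = P(value v of attribute a | class k).
model = {
  classes: ["Yes", "No"],
  values:  [["Red", "Yellow"], ["SUV", "Sports"]],
  priors:  [0.5, 0.5],
  cond: [
    [[1.0, 0.5], [0.0, 0.5]], # Color: Red, Yellow
    [[0.5, 1.0], [0.5, 0.0]]  # Type:  SUV, Sports
  ]
}

# Unnormalised score for each class: P(v_k) * product of P(a_i | v_k).
def class_scores(model, entry)
  model[:priors].each_with_index.map do |prior, k|
    entry.each_with_index.reduce(prior) do |score, (value, a)|
      v = model[:values][a].index(value) # assumes the value was seen in training
      score * model[:cond][a][v][k]
    end
  end
end

# Pick the class with the highest score (argmax).
def classify(model, entry)
  scores = class_scores(model, entry)
  model[:classes][scores.index(scores.max)]
end

puts classify(model, ["Yellow", "SUV"])
```

Like prob.index(prob.max) in #eval, Array#index returns the first position holding the maximum, so a tie goes to the class that comes first.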
#get_probability_map(data) ⇒ Object
Calculates the class probabilities for the data entry data, which has to be an array with the same dimension as the training data minus the class column. Returns a map containing all classes as keys: {Class_1 => probability_1, Class_2 => probability_2, …}. Each probability is a Float <= 1. For example:
b.get_probability_map(["Red", "SUV", "Domestic"])
=> {"Yes"=>0.4166666666666667, "No"=>0.5833333333333334}
# File 'lib/ai4r/classifiers/naive_bayes.rb', line 92
def get_probability_map(data)
  prob = @class_prob.dup
  prob = calculate_class_probabilities_for_entry(data, prob)
  prob = normalize_class_probability prob
  probability_map = {}
  prob.each_with_index { |p, i| probability_map[index_to_klass(i)] = p }
  probability_map
end
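The code of normalize_class_probability is not shown here; a common way to turn unnormalised class scores into a map of probabilities is to divide each score by their sum. The sketch below assumes that approach; the helper name probability_map and the scores are made up for illustration.

```ruby
# Sketch (assumption: normalisation divides each score by their sum).
# If every score is 0.0, total is 0.0 and each division yields NaN.
def probability_map(classes, scores)
  total = scores.sum
  classes.zip(scores.map { |s| s / total }).to_h
end

# Made-up unnormalised scores for two classes.
map = probability_map(["Yes", "No"], [0.05, 0.07])
puts map["Yes"]
puts map["No"]
```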