Class: Ai4r::Classifiers::ID3
- Inherits:
-
Classifier
- Object
- Classifier
- Ai4r::Classifiers::ID3
- Defined in:
- lib/ai4r/classifiers/id3.rb
Overview
Introduction
This is an implementation of the ID3 algorithm (Quinlan). Given a set of preclassified examples, it induces a decision tree top-down, choosing the attribute to split on at each node by information gain, which is derived from the entropy measure.
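To make the information-gain criterion concrete, the following is a minimal, self-contained sketch in plain Ruby. It does not use the ai4r library; the helper names `entropy` and `information_gain` are illustrative assumptions and are not part of the ai4r API. It computes the gain of each attribute on the sample data used in this document and picks the attribute a top-down induction would split on first.

```ruby
# Sample data from this document; the last column is the class.
DATA_LABELS = %w[city age_range gender marketing_target].freeze
DATA_ITEMS = [
  ['New York', '<30', 'M', 'Y'], ['Chicago', '<30', 'M', 'Y'],
  ['Chicago', '<30', 'F', 'Y'], ['New York', '<30', 'M', 'Y'],
  ['New York', '<30', 'M', 'Y'], ['Chicago', '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'], ['New York', '[50-80]', 'M', 'N'],
  ['Chicago', '[50-80]', 'M', 'N'], ['New York', '[50-80]', 'F', 'N'],
  ['Chicago', '>80', 'F', 'Y']
].freeze

# Hypothetical helper: Shannon entropy (in bits) of the class column.
def entropy(items)
  items.map(&:last).tally.values.sum do |count|
    probability = count.to_f / items.size
    -probability * Math.log2(probability)
  end
end

# Hypothetical helper: entropy reduction obtained by splitting on the
# attribute stored at the given column index.
def information_gain(items, index)
  remainder = items.group_by { |item| item[index] }.values.sum do |subset|
    (subset.size.to_f / items.size) * entropy(subset)
  end
  entropy(items) - remainder
end

# Gain for every attribute except the class column.
gains = DATA_LABELS[0..-2].each_with_index.map do |label, index|
  [label, information_gain(DATA_ITEMS, index)]
end.to_h

best_label = gains.max_by { |_label, gain| gain }.first
puts best_label # the attribute with the highest gain: age_range
```

On this data `age_range` has the highest gain, which agrees with the generated rules testing `age_range` first.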
How to use it
DATA_LABELS = [ 'city', 'age_range', 'gender', 'marketing_target' ]
DATA_ITEMS = [
['New York', '<30', 'M', 'Y'],
['Chicago', '<30', 'M', 'Y'],
['Chicago', '<30', 'F', 'Y'],
['New York', '<30', 'M', 'Y'],
['New York', '<30', 'M', 'Y'],
['Chicago', '[30-50)', 'M', 'Y'],
['New York', '[30-50)', 'F', 'N'],
['Chicago', '[30-50)', 'F', 'Y'],
['New York', '[30-50)', 'F', 'N'],
['Chicago', '[50-80]', 'M', 'N'],
['New York', '[50-80]', 'F', 'N'],
['New York', '[50-80]', 'M', 'N'],
['Chicago', '[50-80]', 'M', 'N'],
['New York', '[50-80]', 'F', 'N'],
['Chicago', '>80', 'F', 'Y']
]
data_set = DataSet.new(:data_items=>DATA_ITEMS, :data_labels=>DATA_LABELS)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)
id3.get_rules
# => if age_range=='<30' then marketing_target='Y'
elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
elsif age_range=='[50-80]' then marketing_target='N'
elsif age_range=='>80' then marketing_target='Y'
else raise 'There was not enough information during training to do a proper induction for this data element' end
id3.eval(['New York', '<30', 'M'])
# => 'Y'
A better way to load the data
In real life you will use many more training examples, with more attributes. Consider moving your data to an external CSV (comma-separated values) file.
data_file = "#{File.dirname(__FILE__)}/data_set.csv"
data_set = DataSet.load_csv_with_labels data_file
id3 = Ai4r::Classifiers::ID3.new.build(data_set)
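As a sketch of the file layout this implies, assuming (as the method name `load_csv_with_labels` suggests) that the first row of the file holds the attribute labels and every following row is one data item, the snippet below writes a small CSV and reads it back with Ruby's standard `csv` library. The helper `read_labeled_csv` is hypothetical and only mirrors that assumed behavior; it is not the ai4r implementation.

```ruby
require 'csv'
require 'tempfile'

# A tiny data_set.csv: labels on the first row, one item per row after it.
csv_text = <<~CSV
  city,age_range,gender,marketing_target
  New York,<30,M,Y
  Chicago,[30-50),F,Y
  New York,[50-80],F,N
CSV

# Hypothetical helper: split a CSV file into labels and data items.
def read_labeled_csv(path)
  rows = CSV.read(path)
  labels = rows.shift # the first row is assumed to hold the labels
  { data_labels: labels, data_items: rows }
end

# Write the sample to a temporary file, then parse it.
parsed = Tempfile.create(['data_set', '.csv']) do |file|
  file.write(csv_text)
  file.flush
  read_labeled_csv(file.path)
end

puts parsed[:data_labels].inspect
puts parsed[:data_items].length
```

The resulting labels and items have the same shape as `DATA_LABELS` and `DATA_ITEMS` in the inline example.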
A nice tip for data evaluation
id3 = Ai4r::Classifiers::ID3.new.build(data_set)
age_range = '<30'
marketing_target = nil
eval id3.get_rules
puts marketing_target
# => 'Y'
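The same trick can be wrapped in a method so that the variables the rules read and write stay local. The sketch below is self-contained: it hard-codes the rule text shown earlier in this document instead of calling `get_rules`, and the method name `classify_with_rules` is a hypothetical helper, not part of ai4r. Because `Kernel#eval` runs arbitrary Ruby, this is only appropriate for rule text you generated yourself.

```ruby
# Rule text copied from the get_rules output shown in this document.
RULES = <<~RULES
  if age_range=='<30' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  elsif age_range=='[50-80]' then marketing_target='N'
  elsif age_range=='>80' then marketing_target='Y'
  else raise 'There was not enough information during training to do a proper induction for this data element' end
RULES

# Hypothetical helper: the keyword arguments become the local variables
# that the rule text reads; marketing_target is the one it assigns.
def classify_with_rules(rules, city:, age_range:, gender:)
  marketing_target = nil
  eval(rules, binding) # evaluate the rules against this method's locals
  marketing_target
end

puts classify_with_rules(RULES, city: 'New York', age_range: '<30', gender: 'M')
puts classify_with_rules(RULES, city: 'New York', age_range: '[30-50)', gender: 'F')
```

An attribute value that matches no branch reaches the final `else` and raises a `RuntimeError`, so callers may want to rescue it.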
More about ID3 and decision trees
About the project
- Author
-
Sergio Fierens
- License
-
MPL 1.1
- Url
Instance Attribute Summary
-
#data_set ⇒ Object
readonly
Returns the value of attribute data_set.
Instance Method Summary
-
#build(data_set) ⇒ Object
Create a new ID3 classifier.
-
#eval(data) ⇒ Object
You can evaluate new data, predicting its category.
-
#get_rules ⇒ Object
This method returns the generated rules as Ruby code.
Methods included from Data::Parameterizable
#get_parameters, included, #set_parameters
Instance Attribute Details
#data_set ⇒ Object (readonly)
Returns the value of attribute data_set.
# File 'lib/ai4r/classifiers/id3.rb', line 94
def data_set
  @data_set
end
Instance Method Details
#build(data_set) ⇒ Object
Create a new ID3 classifier. You must provide a DataSet instance as the parameter. The last attribute of each item is treated as the item's class.
# File 'lib/ai4r/classifiers/id3.rb', line 99
def build(data_set)
  data_set.check_not_empty
  @data_set = data_set
  preprocess_data(@data_set.data_items)
  return self
end
#eval(data) ⇒ Object
You can evaluate new data, predicting its category. e.g.
id3.eval(['New York', '<30', 'F']) # => 'Y'
# File 'lib/ai4r/classifiers/id3.rb', line 109
def eval(data)
  @tree.value(data) if @tree
end
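The source shows that `eval` delegates to a tree lookup (`@tree.value(data)`) and returns `nil` when no tree has been built. The internal node structure is not shown here, so the following is only an illustrative sketch of how such a lookup could work. The nested-hash layout and the `walk_tree` method are assumptions made for this example, not the library's actual data structure. The tree is written by hand to mirror the rules listed in this document.

```ruby
# Hypothetical tree layout: an internal node names the column it tests and
# maps each attribute value to a subtree; a leaf is a class label string.
TREE = {
  attribute: 1, # age_range
  branches: {
    '<30' => 'Y',
    '[30-50)' => {
      attribute: 0, # city
      branches: { 'Chicago' => 'Y', 'New York' => 'N' }
    },
    '[50-80]' => 'N',
    '>80' => 'Y'
  }
}.freeze

# Hypothetical helper: follow the branch matching each tested attribute
# until a leaf (a String) is reached.
def walk_tree(node, data)
  until node.is_a?(String)
    node = node[:branches].fetch(data[node[:attribute]]) do |value|
      raise ArgumentError, "no branch for #{value.inspect}"
    end
  end
  node
end

puts walk_tree(TREE, ['New York', '<30', 'F'])
puts walk_tree(TREE, ['New York', '[30-50)', 'F'])
```

Only the attributes on the path from the root to a leaf are consulted, which is why `gender` never affects the result for this tree.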
#get_rules ⇒ Object
This method returns the generated rules as Ruby code. e.g.
id3.get_rules
# => if age_range=='<30' then marketing_target='Y'
elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
elsif age_range=='[50-80]' then marketing_target='N'
elsif age_range=='>80' then marketing_target='Y'
else raise 'There was not enough information during training to do a proper induction for this data element' end
It is a nice way to inspect induction results, and also to execute them:
age_range = '<30'
marketing_target = nil
eval id3.get_rules
puts marketing_target
# => 'Y'
# File 'lib/ai4r/classifiers/id3.rb', line 130
def get_rules
  #return "Empty ID3 tree" if !@tree
  rules = @tree.get_rules
  rules = rules.collect do |rule|
    "#{rule[0..-2].join(' and ')} then #{rule.last}"
  end
  return "if #{rules.join("\nelsif ")}\nelse raise 'There was not enough information during training to do a proper induction for this data element' end"
end
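The string assembly in `get_rules` can be tried in isolation. The sketch below assumes that each rule coming from the tree is an array whose leading elements are condition strings and whose last element is the assignment; this is inferred from the `rule[0..-2]` and `rule.last` calls in the source. The sample `rules` array, the `format_rules` name, and the shortened error message are hypothetical.

```ruby
# Hypothetical input: conditions first, the resulting assignment last.
rules = [
  ["age_range=='<30'", "marketing_target='Y'"],
  ["age_range=='[30-50)'", "city=='Chicago'", "marketing_target='Y'"],
  ["age_range=='[30-50)'", "city=='New York'", "marketing_target='N'"]
]

# Same joining logic as the get_rules source, with a shorter fallback message.
def format_rules(rules)
  clauses = rules.collect do |rule|
    "#{rule[0..-2].join(' and ')} then #{rule.last}"
  end
  "if #{clauses.join("\nelsif ")}\nelse raise 'no rule matched' end"
end

puts format_rules(rules)
```

Each rule becomes one `if` or `elsif` clause on its own line, and the trailing `else raise` covers attribute values that were never seen during training.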