Class: Ai4r::Classifiers::ID3

Inherits:
Classifier
Defined in:
lib/ai4r/classifiers/id3.rb

Overview

Introduction

This is an implementation of the ID3 algorithm (Quinlan). Given a set of preclassified examples, it builds a decision tree by top-down induction, choosing each split according to information gain, which is based on the entropy measure.
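For reference, these are the textbook ID3 definitions (stated from the general literature, not taken from the ai4r source). For a set of examples S in which class c occurs with proportion p_c, and an attribute A whose value v selects the subset S_v, ID3 splits each node on the attribute with the largest gain. The last line works the entropy out for the 15 example rows below (8 'Y', 7 'N').

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

H(S) = -\tfrac{8}{15}\log_2\tfrac{8}{15} - \tfrac{7}{15}\log_2\tfrac{7}{15} \approx 0.997
```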

How to use it

require 'ai4r'

DATA_LABELS = ['city', 'age_range', 'gender', 'marketing_target']

DATA_ITEMS = [  
       ['New York',  '<30',      'M', 'Y'],
       ['Chicago',     '<30',      'M', 'Y'],
       ['Chicago',     '<30',      'F', 'Y'],
       ['New York',  '<30',      'M', 'Y'],
       ['New York',  '<30',      'M', 'Y'],
       ['Chicago',     '[30-50)',  'M', 'Y'],
       ['New York',  '[30-50)',  'F', 'N'],
       ['Chicago',     '[30-50)',  'F', 'Y'],
       ['New York',  '[30-50)',  'F', 'N'],
       ['Chicago',     '[50-80]', 'M', 'N'],
       ['New York',  '[50-80]', 'F', 'N'],
       ['New York',  '[50-80]', 'M', 'N'],
       ['Chicago',     '[50-80]', 'M', 'N'],
       ['New York',  '[50-80]', 'F', 'N'],
       ['Chicago',     '>80',      'F', 'Y']
     ]

data_set = Ai4r::Data::DataSet.new(:data_items => DATA_ITEMS, :data_labels => DATA_LABELS)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)

id3.get_rules
  # =>  if age_range=='<30' then marketing_target='Y'
  #     elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  #     elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  #     elsif age_range=='[50-80]' then marketing_target='N'
  #     elsif age_range=='>80' then marketing_target='Y'
  #     else raise 'There was not enough information during training to do a proper induction for this data element' end

id3.eval(['New York', '<30', 'M'])
  # =>  'Y'
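To see why age_range ends up at the root of the rules above, the sketch below computes the information gain of each input attribute on the same 15 rows. It is a minimal, self-contained illustration of the textbook formula; the helper names entropy and information_gain are hypothetical and are not ai4r's internal methods.

```ruby
# Minimal sketch (not ai4r's internal code): information gain of each input
# attribute on the example data.
DATA_LABELS = ['city', 'age_range', 'gender', 'marketing_target']
DATA_ITEMS = [
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'F', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['New York', '[50-80]', 'M', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['Chicago',  '>80',     'F', 'Y']
]

# Entropy, in bits, of the class values (the last element of each item).
def entropy(items)
  total = items.size.to_f
  items.group_by(&:last).values.sum do |group|
    share = group.size / total
    -share * Math.log2(share)
  end
end

# Entropy of the whole set minus the size-weighted entropy of the subsets
# obtained by splitting on the attribute at the given index.
def information_gain(items, index)
  remainder = items.group_by { |item| item[index] }.values.sum do |subset|
    subset.size.to_f / items.size * entropy(subset)
  end
  entropy(items) - remainder
end

# Gain for every input attribute (the class column is excluded).
gains = DATA_LABELS[0...-1].each_with_index.to_h do |label, index|
  [label, information_gain(DATA_ITEMS, index)]
end
gains.each { |label, gain| puts format('%-10s %.3f', label, gain) }
best = gains.max_by { |_label, gain| gain }.first
puts "best split: #{best}"  # prints "best split: age_range"
```

Running it prints gains of about 0.085 for city, 0.730 for age_range and 0.028 for gender, so age_range is the first test in the generated rules, and city only matters inside the '[30-50)' branch.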

A better way to load the data

In real life you will use many more training examples, with more attributes. Consider moving your data to an external CSV (comma-separated values) file.

data_file = "#{File.dirname(__FILE__)}/data_set.csv"
data_set = Ai4r::Data::DataSet.new.load_csv_with_labels(data_file)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)
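The file is expected to hold one header row of attribute labels followed by one row per example, with the class in the last column. The sketch below uses Ruby's standard csv library (not ai4r) to write a small file in that layout from the first rows of the example and split it into labels and items, which is presumably what a loader "with labels" does with the first row. The file contents and variable names are illustrative assumptions.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical contents of data_set.csv: a header row of attribute labels,
# then one example per row with the class in the last column.
CSV_TEXT = <<~TEXT
  city,age_range,gender,marketing_target
  New York,<30,M,Y
  Chicago,[30-50),F,Y
  New York,[50-80],F,N
TEXT

# Write the text to a temporary file, read it back, and split off the header.
labels, items = Tempfile.create(['data_set', '.csv']) do |file|
  file.write(CSV_TEXT)
  file.flush
  rows = CSV.read(file.path)  # array of rows, each an array of strings
  [rows.shift, rows]          # first row: labels; remaining rows: items
end

p labels      # prints ["city", "age_range", "gender", "marketing_target"]
p items.size  # prints 3
```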

A nice tip for data evaluation

id3 = Ai4r::Classifiers::ID3.new.build(data_set)

age_range = '<30'
marketing_target = nil
eval id3.get_rules   
puts marketing_target
  # =>  'Y'
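The tip works because the bare eval here is Kernel#eval (not ID3#eval): it runs the rule string in the current binding, so the rules read age_range (and city, when that branch is reached) as local variables and assign marketing_target. The assignment is visible afterwards only because marketing_target was initialized before the eval. The sketch below packages the same idea in a helper that takes the attribute values as a hash; the rule text is copied from the get_rules example output rather than generated, and the helper name apply_rules is hypothetical.

```ruby
# Rule text copied from the get_rules example output (not generated here).
RULES = <<~RUBY
  if age_range=='<30' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  elsif age_range=='[50-80]' then marketing_target='N'
  elsif age_range=='>80' then marketing_target='Y'
  else raise 'There was not enough information during training to do a proper induction for this data element' end
RUBY

# Hypothetical helper: evaluates the rules against one set of attribute
# values and returns the predicted marketing_target.
def apply_rules(rules, attributes)
  scope = binding
  # Expose each attribute as a local variable inside the binding.
  attributes.each { |name, value| scope.local_variable_set(name.to_sym, value) }
  # Initialize the target so the assignment made by the rules can be read back.
  scope.local_variable_set(:marketing_target, nil)
  scope.eval(rules)
  scope.local_variable_get(:marketing_target)
end

puts apply_rules(RULES, { 'age_range' => '<30' })                            # prints Y
puts apply_rules(RULES, { 'age_range' => '[30-50)', 'city' => 'New York' })  # prints N
```

Passing values through a Binding keeps the caller's own local variables untouched; an age range not seen in training reaches the else branch and raises a RuntimeError.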

More about ID3 and decision trees

About the project

Author

Sergio Fierens

License

MPL 1.1

Url

ai4r.org/

Instance Attribute Summary

Instance Method Summary

Methods included from Data::Parameterizable

#get_parameters, included, #set_parameters

Instance Attribute Details

#data_set ⇒ Object (readonly)

Returns the value of attribute data_set.



# File 'lib/ai4r/classifiers/id3.rb', line 94

def data_set
  @data_set
end

Instance Method Details

#build(data_set) ⇒ Object

Builds the ID3 classifier (induces its decision tree) from the given DataSet instance and returns self. The last attribute of each item is treated as the item's class.



# File 'lib/ai4r/classifiers/id3.rb', line 99

def build(data_set)
  data_set.check_not_empty
  @data_set = data_set
  preprocess_data(@data_set.data_items)
  return self
end
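build stores the data set and delegates the actual induction to preprocess_data, whose source is not shown on this page. As a rough, self-contained picture of textbook ID3 induction, the sketch below grows a tree as nested hashes: at each node it splits on the attribute that leaves the lowest size-weighted entropy (equivalently, the highest information gain) and recurses until a subset is pure, falling back to the majority class when no attributes remain. The hash layout and the names entropy, split_entropy and id3_tree are hypothetical, not ai4r's.

```ruby
# Hypothetical sketch of textbook ID3 induction (not ai4r's implementation).
# Items are arrays whose last element is the class value.

# Entropy, in bits, of the class values of a set of items.
def entropy(items)
  total = items.size.to_f
  items.group_by(&:last).values.sum do |group|
    share = group.size / total
    -share * Math.log2(share)
  end
end

# Size-weighted entropy left after splitting on the attribute at index.
# Minimizing it is equivalent to maximizing information gain.
def split_entropy(items, index)
  items.group_by { |item| item[index] }.values.sum do |subset|
    subset.size.to_f / items.size * entropy(subset)
  end
end

# Returns a class value (leaf) or { attribute: index, children: { value => subtree } }.
def id3_tree(items, attribute_indexes)
  classes = items.map(&:last)
  return classes.first if classes.uniq.size == 1
  # No attribute left to split on: answer with the majority class.
  return classes.tally.max_by { |_value, count| count }.first if attribute_indexes.empty?

  best = attribute_indexes.min_by { |index| split_entropy(items, index) }
  remaining = attribute_indexes - [best]
  children = items.group_by { |item| item[best] }.transform_values do |subset|
    id3_tree(subset, remaining)
  end
  { attribute: best, children: children }
end
```

Called as id3_tree(DATA_ITEMS, [0, 1, 2]) on the example data, the root splits on index 1 (age_range) and the '[30-50)' branch splits on index 0 (city), which matches the structure of the rules shown for get_rules.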

#eval(data) ⇒ Object

Evaluates new data, predicting its category. Returns nil if the classifier has not been built. For example:

id3.eval(['New York',  '<30', 'F'])  # => 'Y'


# File 'lib/ai4r/classifiers/id3.rb', line 109

def eval(data)
  @tree.value(data) if @tree
end
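eval simply asks the root of the induced tree for a value (and returns nil while @tree is unset). The sketch below shows one way such a tree can answer value(data): inner nodes look up the item's value for their attribute and delegate to the matching child, leaves return their category, and an attribute value never seen in training raises, mirroring the else branch of the generated rules. The classes SplitNode and LeafNode and the hand-built tree are hypothetical, not ai4r's node classes.

```ruby
# Leaf of the tree: always answers with its category.
class LeafNode
  def initialize(category)
    @category = category
  end

  def value(_data)
    @category
  end
end

# Inner node: routes the item to a child according to one attribute.
class SplitNode
  def initialize(index, children)
    @index = index        # position of the attribute in each data item
    @children = children  # attribute value => child node
  end

  def value(data)
    child = @children.fetch(data[@index]) do
      raise "No training examples had #{data[@index].inspect} at index #{@index}"
    end
    child.value(data)
  end
end

# Hand-built tree equivalent to the example rules (index 1: age_range, 0: city).
tree = SplitNode.new(1, {
  '<30'     => LeafNode.new('Y'),
  '[30-50)' => SplitNode.new(0, { 'Chicago' => LeafNode.new('Y'), 'New York' => LeafNode.new('N') }),
  '[50-80]' => LeafNode.new('N'),
  '>80'     => LeafNode.new('Y')
})

puts tree.value(['New York', '<30', 'F'])  # prints Y
```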

#get_rules ⇒ Object

Returns the generated rules as Ruby code. For example:

id3.get_rules
  # =>  if age_range=='<30' then marketing_target='Y'
  #     elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  #     elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  #     elsif age_range=='[50-80]' then marketing_target='N'
  #     elsif age_range=='>80' then marketing_target='Y'
  #     else raise 'There was not enough information during training to do a proper induction for this data element' end

It is a nice way to inspect induction results, and also to execute them:

age_range = '<30'
marketing_target = nil
eval id3.get_rules   
puts marketing_target
  # =>  'Y'


# File 'lib/ai4r/classifiers/id3.rb', line 130

def get_rules
  #return "Empty ID3 tree" if !@tree
  rules = @tree.get_rules
  rules = rules.collect do |rule|
      "#{rule[0..-2].join(' and ')} then #{rule.last}"
  end
  return "if #{rules.join("\nelsif ")}\nelse raise 'There was not enough information during training to do a proper induction for this data element' end"
end
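From the code above, @tree.get_rules evidently returns one array per root-to-leaf path, with the conditions first and the class assignment last; the collect step joins the conditions with "and", and the final line chains the paths into a single if/elsif expression. The sketch below reproduces that joining step for a hypothetical nested-hash tree ({ attribute:, children: } inner nodes, class-value leaves), taking attribute and class names from a labels array. The helpers rule_paths and rules_source are illustrative, not ai4r's tree code, and they assume the root is a split node.

```ruby
# Collects one array per root-to-leaf path: conditions first, assignment last.
def rule_paths(node, labels, conditions = [])
  # A leaf (class value) closes the path with the class assignment.
  return [conditions + ["#{labels.last}='#{node}'"]] unless node.is_a?(Hash)

  name = labels[node[:attribute]]
  node[:children].flat_map do |value, child|
    rule_paths(child, labels, conditions + ["#{name}=='#{value}'"])
  end
end

# Joins the paths into Ruby source the same way ID3#get_rules does.
def rules_source(tree, labels)
  rules = rule_paths(tree, labels).map do |rule|
    "#{rule[0..-2].join(' and ')} then #{rule.last}"
  end
  "if #{rules.join("\nelsif ")}\nelse raise 'There was not enough information during training to do a proper induction for this data element' end"
end

labels = ['city', 'age_range', 'gender', 'marketing_target']
tree = {
  attribute: 1,
  children: {
    '<30' => 'Y',
    '[30-50)' => { attribute: 0, children: { 'Chicago' => 'Y', 'New York' => 'N' } },
    '[50-80]' => 'N',
    '>80' => 'Y'
  }
}
puts rules_source(tree, labels)
```

For this tree the printed source has the same lines as the get_rules example output above, so it can be executed with the same eval tip.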