Class: Knn

Inherits:
Object
Defined in:
lib/phisher/knn.rb

Overview

Knn: K-Nearest Neighbors

The KNN algorithm is very simple. Given a set of labeled training data ⟨x, f(x)⟩, a new input is compared with each x to compute the distance between them. The class that occurs most often among the k closest training points is then chosen.
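The procedure above can be sketched as a standalone function. This is a hypothetical illustration, not part of the Knn class: the names `euclidean` and `knn_predict` and the `[features, label]` pair layout are assumptions made for the example, and `Enumerable#tally` requires Ruby 2.7 or later.

```ruby
# Hypothetical standalone sketch of k-nearest-neighbor classification;
# it is not part of the Knn class documented on this page.

# Euclidean distance between two equal-length numeric arrays.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# training: array of [features, label] pairs
# query:    feature array to classify
# k:        number of nearest neighbors that vote
def knn_predict(training, query, k)
  # Keep the k training pairs whose features are closest to the query
  nearest = training.min_by(k) { |features, _label| euclidean(features, query) }
  # Count the labels of those neighbors (label => count)
  votes = nearest.map { |_features, label| label }.tally
  # Return the label with the most votes
  votes.max_by { |_label, count| count }.first
end

training = [[[1.0], :low], [[2.0], :low], [[8.0], :high], [[9.0], :high]]
puts knn_predict(training, [1.5], 3)  # prints "low"
```

With k = 3 and the query 1.5, the neighbors are 1.0, 2.0 and 8.0, so `:low` wins two votes to one.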

Usage Example:

knn = Knn.new

print "training Knn... "
10.times do |i|
  klazz = 0
  klazz = 1 if i >= 5
  knn.train([i], klazz)
end
puts "[done]"

knn.training_set.each_with_index { |point, index| p "point #{index}: class #{point.label}" }

puts "Classifying a few inputs"
20.times do |i|
  test = i.to_f / 2
  print "#{test} => "
  puts knn.classify([test], 3)
end


Constructor Details

#initialize ⇒ Knn

Returns a new instance of Knn.



# File 'lib/phisher/knn.rb', line 37

def initialize
  @training_set = []
  # Default metric: Euclidean distance between two equal-length arrays
  @default_distance = lambda do |array1, array2|
    squares_sum = array1.zip(array2).map do |a, b|
      (a - b)**2
    end
    Math.sqrt(squares_sum.reduce(:+))
  end
end
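The default distance is ordinary Euclidean distance. The snippet below copies the lambda from the constructor so its behavior can be checked in isolation; the local variable name is only for illustration. Note that `zip` pads a shorter second array with `nil`, so both arrays are assumed to have the same length.

```ruby
# Standalone copy of the default distance lambda from Knn#initialize
# (Euclidean distance), so its behavior can be checked in isolation.
default_distance = lambda do |array1, array2|
  # Square the difference of each pair of coordinates
  squares_sum = array1.zip(array2).map { |a, b| (a - b)**2 }
  # Square root of the summed squares
  Math.sqrt(squares_sum.reduce(:+))
end

puts default_distance.call([0, 0], [3, 4])  # prints 5.0
```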

Instance Attribute Details

#default_distance ⇒ Object (readonly)

Returns the value of attribute default_distance.



# File 'lib/phisher/knn.rb', line 35

def default_distance
  @default_distance
end

#training_set ⇒ Object (readonly)

Returns the value of attribute training_set.



# File 'lib/phisher/knn.rb', line 34

def training_set
  @training_set
end
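"(readonly)" means no writer method is defined; it does not mean the returned object is frozen. The hypothetical `ReaderDemo` class below assumes these readers come from `attr_reader`, which is what YARD's "Returns the value of attribute" wording usually indicates. It shows that a caller can still mutate the Array returned by such a reader.

```ruby
# Hypothetical minimal class illustrating how attr_reader produces
# readers like Knn#training_set and Knn#default_distance.
class ReaderDemo
  attr_reader :training_set

  def initialize
    @training_set = []
  end
end

demo = ReaderDemo.new
demo.training_set << [1.0]          # mutates the internal array itself
p demo.training_set                 # prints [[1.0]]
p demo.respond_to?(:training_set=)  # prints false: no writer is defined
```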

Instance Method Details

#classify(data, k, &distance) ⇒ Object

Returns the most frequent class among the k training points nearest to data.

Arguments:

{Array} data the point to classify
{Integer} k the number of nearest neighbors that vote
{block} distance an optional block that takes two arrays and returns
        the distance between them; defaults to Euclidean distance

Returns:

The class that the data array should belong to
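A custom metric can be supplied as the block. The sketch below uses Manhattan (L1) distance as an example; the name `manhattan` is hypothetical. As the source shows, classify calls the block as `distance.call(training_point.data, data)`, so it receives two arrays.

```ruby
# Hypothetical custom distance for Knn#classify: Manhattan (L1) distance.
# Like the default lambda, it receives (training_point.data, data).
manhattan = lambda do |array1, array2|
  # Sum of absolute coordinate differences
  array1.zip(array2).sum { |a, b| (a - b).abs }
end

puts manhattan.call([0, 0], [3, 4])  # prints 7

# Passed to classify with & so it arrives as the &distance block:
#   knn.classify([2.5], 3, &manhattan)
```

A literal block such as `knn.classify([2.5], 3) { |a, b| ... }` works the same way.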


# File 'lib/phisher/knn.rb', line 59

def classify(data, k, &distance)
  # Fall back to Euclidean distance when no block is given
  distance ||= @default_distance

  # Pair each training point's distance to data with its label
  distances = @training_set.map do |training_point|
    [distance.call(training_point.data, data), training_point.label]
  end
  # Sort by distance only, so labels never need to be comparable
  sorted_distances = distances.sort_by(&:first)
  nearest_neighbors = sorted_distances.first(k)
  classes = nearest_neighbors.map { |neighbor| neighbor[1] }
  class_frequencies = get_class_frequencies(classes)
  most_frequent(class_frequencies)
end
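The helpers `get_class_frequencies` and `most_frequent` are not documented on this page. The following is only a guess at what they could look like (a plain majority vote); the real implementations may differ, for example in how ties are broken.

```ruby
# Hypothetical sketches of the private helpers used by Knn#classify;
# their real implementations are not shown on this page.

# Counts how often each label appears, e.g. [0, 1, 1] => {0=>1, 1=>2}.
def get_class_frequencies(classes)
  classes.each_with_object(Hash.new(0)) { |label, counts| counts[label] += 1 }
end

# Returns the label with the highest count.
def most_frequent(class_frequencies)
  class_frequencies.max_by { |_label, count| count }.first
end

classes = [1, 0, 1]
p most_frequent(get_class_frequencies(classes))  # prints 1
```

With two classes, choosing an odd k avoids ties entirely. Otherwise this sketch returns whichever maximal label `max_by` reaches first, which depends on the order labels were inserted into the hash.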

#train(data, label) ⇒ Object

Adds a training point that pairs the data array with the given label.

Arguments:

{Array} data the feature array to store
{Object} label the class label for data (for example, a Symbol or an Integer)

Returns:

The training set (an Array of TrainingPoint objects) after the new point has been appended


# File 'lib/phisher/knn.rb', line 83

def train(data, label)
  training_point = TrainingPoint.new data, label
  @training_set.push training_point
end
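`TrainingPoint` is not defined on this page. Knn only relies on it responding to `#data` and `#label`, so a `Struct` is one plausible stand-in; treat the definition below as an assumption. Because `Array#push` returns the receiver, `train` returns the training set itself.

```ruby
# Hypothetical stand-in for TrainingPoint, whose definition is not shown on
# this page. Knn only relies on it exposing #data and #label.
TrainingPoint = Struct.new(:data, :label)

training_set = []
result = training_set.push TrainingPoint.new([3], 0)  # what Knn#train does
point = training_set.last
p point.data   # prints [3]
p point.label  # prints 0
p result.equal?(training_set)  # prints true: push returns the array itself
```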