Class: Ai4r::Clusterers::KMeans

Inherits:
Clusterer
Defined in:
lib/ai4r/clusterers/k_means.rb

Overview

The k-means algorithm clusters n objects into k partitions based on their attributes, with k < n.

More about the k-means algorithm: en.wikipedia.org/wiki/K-means_algorithm

Direct Known Subclasses

BisectingKMeans

Instance Attribute Summary

Instance Method Summary

Methods included from Data::Parameterizable

#get_parameters, included, #set_parameters

Constructor Details

#initialize ⇒ KMeans

Returns a new instance of KMeans.



# File 'lib/ai4r/clusterers/k_means.rb', line 39

def initialize
  @distance_function = nil
  @max_iterations = nil
  @old_centroids = nil
  @centroid_function = lambda do |data_sets| 
    data_sets.collect{ |data_set| data_set.get_mean_or_mode}
  end
end
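
The default @centroid_function above asks each cluster for its mean or mode. As a rough illustration of that idea, here is a minimal standalone sketch. It assumes that numeric attributes are averaged and that all other attributes take their most frequent value; the helper name mean_or_mode and the sample rows are hypothetical and are not part of ai4r.

```ruby
# Hypothetical helper, not part of ai4r: computes one centroid for a cluster
# given as an array of rows. Numeric columns get their mean; all other
# columns get their most frequent value (mode).
def mean_or_mode(rows)
  rows.transpose.map do |column|
    if column.all? { |v| v.is_a?(Numeric) }
      column.sum.to_f / column.size
    else
      column.group_by { |v| v }.max_by { |_, vs| vs.size }.first
    end
  end
end

# Same shape as the default @centroid_function: one centroid per cluster.
centroid_function = lambda do |clusters|
  clusters.map { |rows| mean_or_mode(rows) }
end

# Made-up sample data: two clusters with mixed numeric and nominal columns.
clusters = [
  [[1, 2, "red"], [3, 4, "red"], [5, 6, "blue"]],
  [[10, 10, "green"]]
]
centroids = centroid_function.call(clusters)
```

Because the centroid function is just a closure stored in an instance variable, a different summary (a median, for example) could be swapped in the same way.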

Instance Attribute Details

#centroids ⇒ Object (readonly)

Returns the value of attribute centroids.



# File 'lib/ai4r/clusterers/k_means.rb', line 24

def centroids
  @centroids
end

#clusters ⇒ Object (readonly)

Returns the value of attribute clusters.



# File 'lib/ai4r/clusterers/k_means.rb', line 24

def clusters
  @clusters
end

#data_set ⇒ Object (readonly)

Returns the value of attribute data_set.



# File 'lib/ai4r/clusterers/k_means.rb', line 23

def data_set
  @data_set
end

#iterations ⇒ Object (readonly)

Returns the value of attribute iterations.



# File 'lib/ai4r/clusterers/k_means.rb', line 24

def iterations
  @iterations
end

#number_of_clusters ⇒ Object (readonly)

Returns the value of attribute number_of_clusters.



# File 'lib/ai4r/clusterers/k_means.rb', line 23

def number_of_clusters
  @number_of_clusters
end

Instance Method Details

#build(data_set, number_of_clusters) ⇒ Object

Builds a new clusterer using the data examples found in data_set. Items are grouped into number_of_clusters clusters.



# File 'lib/ai4r/clusterers/k_means.rb', line 52

def build(data_set, number_of_clusters)
  @data_set = data_set
  @number_of_clusters = number_of_clusters
  @iterations = 0
  
  calc_initial_centroids
  while(not stop_criteria_met)
    calculate_membership_clusters
    recompute_centroids
  end
  
  return self
end
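
The loop in #build alternates between assigning items to clusters and recomputing centroids until a stop criterion holds. The sketch below shows that cycle in standalone Ruby. It is not the library's implementation: the function name simple_k_means, the choice of the first k points as initial centroids, and the stop rule (centroids unchanged or an iteration cap reached) are all assumptions made for illustration.

```ruby
# Squared Euclidean distance between two numeric vectors.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Hypothetical standalone k-means, mirroring the steps of #build.
# Assumptions: the first k points seed the centroids, and the loop stops
# when the centroids no longer change or max_iterations is reached.
def simple_k_means(points, k, max_iterations = 100)
  centroids = points.first(k).map { |p| p.map(&:to_f) }
  clusters = []
  iterations = 0

  loop do
    # Assignment step: attach every point to its nearest centroid.
    clusters = Array.new(k) { [] }
    points.each do |point|
      nearest = (0...k).min_by { |i| squared_distance(point, centroids[i]) }
      clusters[nearest] << point
    end

    # Update step: move each centroid to the mean of its cluster.
    # An empty cluster keeps its previous centroid.
    new_centroids = clusters.each_with_index.map do |cluster, i|
      next centroids[i] if cluster.empty?
      cluster.transpose.map { |column| column.sum.to_f / column.size }
    end

    iterations += 1
    converged = new_centroids == centroids
    centroids = new_centroids
    break if converged || iterations >= max_iterations
  end

  { centroids: centroids, clusters: clusters, iterations: iterations }
end

# Made-up sample data: two visually separate groups of 2-D points.
points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
result = simple_k_means(points, 2)
```

With this sample data the two groups end up in separate clusters. Results in general depend on how the initial centroids are chosen.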

#distance(a, b) ⇒ Object

Calculates the distance between two instances. By default, it returns the squared Euclidean distance. You can provide a more convenient distance implementation by:

1- Overriding this method

2- Providing a closure to the :distance_function parameter



# File 'lib/ai4r/clusterers/k_means.rb', line 81

def distance(a, b)
  return @distance_function.call(a, b) if @distance_function
  return euclidean_distance(a, b)
end
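
To make the dispatch in #distance concrete, the sketch below reproduces the pattern in a small stand-in class: a closure is used when one has been provided, and the squared Euclidean distance is the fallback. The class DistanceDemo and its accessor are hypothetical and exist only for this illustration. With the real clusterer, the closure would presumably be supplied as the :distance_function parameter through #set_parameters from Data::Parameterizable.

```ruby
# Hypothetical stand-in for the clusterer, showing only the dispatch pattern
# used by #distance: call the closure if one was provided, otherwise fall
# back to the squared Euclidean distance.
class DistanceDemo
  attr_accessor :distance_function

  def distance(a, b)
    return @distance_function.call(a, b) if @distance_function
    a.zip(b).sum { |x, y| (x - y)**2 }
  end
end

demo = DistanceDemo.new

# No closure set: squared Euclidean distance, 3**2 + 4**2.
default_distance = demo.distance([0, 0], [3, 4])

# Closure set: Manhattan distance, |3| + |4|.
demo.distance_function = lambda do |a, b|
  a.zip(b).sum { |x, y| (x - y).abs }
end
manhattan_distance = demo.distance([0, 0], [3, 4])
```

Skipping the square root keeps the default cheap to compute and does not change which centroid is nearest, since squaring preserves the ordering of non-negative distances.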

#eval(data_item) ⇒ Object

Classifies the given data item, returning the 0-based index of the cluster it belongs to.



# File 'lib/ai4r/clusterers/k_means.rb', line 68

def eval(data_item)
  get_min_index(@centroids.collect {|centroid| 
      distance(data_item, centroid)})
end