Class: Spark::Mllib::KMeans

Inherits:
Object
Defined in:
lib/spark/mllib/clustering/kmeans.rb

Class Method Summary

Class Method Details

.train(rdd, k, max_iterations: 100, runs: 1, initialization_mode: 'k-means||', seed: nil, initialization_steps: 5, epsilon: 0.0001) ⇒ Object

Trains a k-means model using the given set of parameters.

Arguments:

rdd

The training data, an RDD of Vectors.

k

Number of clusters.

max_iterations

Maximum number of iterations, defaults to 100.

runs

Number of parallel runs, defaults to 1. The best model is returned.

initialization_mode

Initialization mode, either “random” or “k-means||” (default).

seed

Random seed value for cluster initialization.

initialization_steps

Number of steps for the “k-means||” initialization mode, defaults to 5.

epsilon

The distance threshold within which we consider centers to have converged, defaults to 0.0001.
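
To make the roles of k, max_iterations, runs, seed and epsilon concrete, here is a minimal pure-Ruby sketch of Lloyd's k-means iteration. It is only an illustration under stated assumptions: it does not use Spark, it is not how MLlib implements training, it supports only a "random"-style initialization, and every name in it (toy_kmeans, single_run, squared_distance, nearest_index, total_cost) is hypothetical.

```ruby
# Illustrative k-means (Lloyd's algorithm) on plain Ruby arrays.
# NOT the Spark/MLlib implementation; all method names are hypothetical.

def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the center closest to the given point.
def nearest_index(centers, point)
  centers.each_index.min_by { |i| squared_distance(centers[i], point) }
end

# Sum of squared distances from each point to its closest center.
def total_cost(centers, points)
  points.sum { |pt| squared_distance(centers[nearest_index(centers, pt)], pt) }
end

def single_run(points, k, max_iterations, rng, epsilon)
  # "random"-style initialization: k distinct input points become the centers.
  centers = points.sample(k, random: rng)
  max_iterations.times do
    groups = points.group_by { |pt| nearest_index(centers, pt) }
    new_centers = centers.each_with_index.map do |old, i|
      members = groups[i]
      next old if members.nil? # keep a center that attracted no points
      members.transpose.map { |coords| coords.sum / members.size }
    end
    # Converged once no center moved farther than epsilon.
    moved = centers.zip(new_centers).map { |a, b| Math.sqrt(squared_distance(a, b)) }.max
    centers = new_centers
    break if moved <= epsilon
  end
  centers
end

def toy_kmeans(points, k, max_iterations: 100, runs: 1, seed: nil, epsilon: 0.0001)
  rng = seed ? Random.new(seed) : Random.new
  # Several independent runs; the one with the lowest cost is returned.
  candidates = Array.new(runs) { single_run(points, k, max_iterations, rng, epsilon) }
  candidates.min_by { |centers| total_cost(centers, points) }
end

points = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
centers = toy_kmeans(points, 2, max_iterations: 10, runs: 5, seed: 42)
p centers.sort # prints [[0.5, 0.5], [8.5, 8.5]]
```

In this sketch a smaller epsilon demands that centers settle more tightly before the loop stops, max_iterations caps the work when they do not, and a fixed seed makes the initial choice of centers reproducible.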



# File 'lib/spark/mllib/clustering/kmeans.rb', line 113

def self.train(rdd, k, max_iterations: 100, runs: 1, initialization_mode: 'k-means||', seed: nil,
                       initialization_steps: 5, epsilon: 0.0001)
  # Call returns KMeansModel
  Spark.jb.call(RubyMLLibAPI.new, 'trainKMeansModel', rdd,
                k, max_iterations, runs, initialization_mode, Spark.jb.to_long(seed), initialization_steps, epsilon)
end