Class: Rumale::Clustering::KMedoids
- Inherits:
- Object
- Inheritance chain:
- Rumale::Clustering::KMedoids < Object
- Includes:
- Base::BaseEstimator, Base::ClusterAnalyzer
- Defined in:
- lib/rumale/clustering/k_medoids.rb
Overview
KMedoids is a class that implements K-Medoids cluster analysis.
Reference
-
D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding,” Proc. SODA’07, pp. 1027–1035, 2007.
-
Instance Attribute Summary
-
#medoid_ids ⇒ Numo::Int32
readonly
Return the indices of medoids.
-
#rng ⇒ Random
readonly
Return the random generator.
Attributes included from Base::BaseEstimator
Instance Method Summary
-
#fit(x) ⇒ KMedoids
Analyze clusters with given training data.
-
#fit_predict(x) ⇒ Numo::Int32
Analyze clusters and assign samples to clusters.
-
#initialize(n_clusters: 8, metric: 'euclidean', init: 'k-means++', max_iter: 50, tol: 1.0e-4, random_seed: nil) ⇒ KMedoids
constructor
Create a new cluster analyzer with the K-Medoids method.
-
#marshal_dump ⇒ Hash
Dump marshal data.
-
#marshal_load(obj) ⇒ nil
Load marshal data.
-
#predict(x) ⇒ Numo::Int32
Predict cluster labels for samples.
Methods included from Base::ClusterAnalyzer
Constructor Details
#initialize(n_clusters: 8, metric: 'euclidean', init: 'k-means++', max_iter: 50, tol: 1.0e-4, random_seed: nil) ⇒ KMedoids
Create a new cluster analyzer with the K-Medoids method.
# File 'lib/rumale/clustering/k_medoids.rb', line 39
def initialize(n_clusters: 8, metric: 'euclidean', init: 'k-means++', max_iter: 50, tol: 1.0e-4, random_seed: nil)
  check_params_integer(n_clusters: n_clusters, max_iter: max_iter)
  check_params_float(tol: tol)
  check_params_string(metric: metric, init: init)
  check_params_type_or_nil(Integer, random_seed: random_seed)
  check_params_positive(n_clusters: n_clusters, max_iter: max_iter)
  @params = {}
  @params[:n_clusters] = n_clusters
  @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
  @params[:init] = init == 'random' ? 'random' : 'k-means++'
  @params[:max_iter] = max_iter
  @params[:tol] = tol
  @params[:random_seed] = random_seed
  @params[:random_seed] ||= srand
  @medoid_ids = nil
  @cluster_centers = nil
  @rng = Random.new(@params[:random_seed])
end
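The constructor source shows that unrecognized option strings are not rejected: any metric other than 'precomputed' falls back to 'euclidean', and any init other than 'random' falls back to 'k-means++'. A minimal sketch of that fallback in plain Ruby follows. The helper name `normalize_kmedoids_options` is hypothetical and not part of Rumale, and `Random.new_seed` is an assumed stand-in for the `srand` call used in the source.

```ruby
# Hypothetical helper (not part of Rumale): mirrors the option fallbacks
# visible in the constructor source.
def normalize_kmedoids_options(metric: 'euclidean', init: 'k-means++', random_seed: nil)
  {
    # Anything other than 'precomputed' is treated as 'euclidean'.
    metric: metric == 'precomputed' ? 'precomputed' : 'euclidean',
    # Anything other than 'random' is treated as 'k-means++'.
    init: init == 'random' ? 'random' : 'k-means++',
    # Assumption: Random.new_seed stands in for the `srand` call in the source.
    random_seed: random_seed || Random.new_seed
  }
end

opts = normalize_kmedoids_options(metric: 'manhattan', init: 'kmeans', random_seed: 42)

# Two generators built from the same seed produce the same sequence,
# which is why passing random_seed makes a run reproducible.
a = Random.new(opts[:random_seed])
b = Random.new(opts[:random_seed])
same_sequence = Array.new(3) { a.rand(100) } == Array.new(3) { b.rand(100) }
```

Because of the silent fallback, a misspelled metric or init value will likely run with the default rather than raise an error, so it may be worth checking the spelling of these options at the call site.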
Instance Attribute Details
#medoid_ids ⇒ Numo::Int32 (readonly)
Return the indices of medoids.
# File 'lib/rumale/clustering/k_medoids.rb', line 23
def medoid_ids
  @medoid_ids
end
#rng ⇒ Random (readonly)
Return the random generator.
# File 'lib/rumale/clustering/k_medoids.rb', line 27
def rng
  @rng
end
Instance Method Details
#fit(x) ⇒ KMedoids
Analyze clusters with given training data.
# File 'lib/rumale/clustering/k_medoids.rb', line 65
def fit(x, _not_used = nil)
  check_sample_array(x)
  raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
  # initialize some variables.
  distance_mat = @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
  init_cluster_centers(distance_mat)
  error = distance_mat[true, @medoid_ids].mean
  @params[:max_iter].times do |_t|
    cluster_labels = assign_cluster(distance_mat[true, @medoid_ids])
    @params[:n_clusters].times do |n|
      assigned_ids = cluster_labels.eq(n).where
      @medoid_ids[n] = assigned_ids[distance_mat[assigned_ids, assigned_ids].sum(axis: 1).min_index]
    end
    new_error = distance_mat[true, @medoid_ids].mean
    break if (error - new_error).abs <= @params[:tol]
    error = new_error
  end
  @cluster_centers = x[@medoid_ids, true].dup if @params[:metric] == 'euclidean'
  self
end
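The loop in the source alternates between assigning samples to their nearest medoid and replacing each medoid with the cluster member that has the smallest total distance to the other members. The following is a rough plain-Ruby sketch of that loop on nested arrays instead of Numo arrays. The names `mean_medoid_distance` and `k_medoids_sketch` are hypothetical, the initial medoids are passed in rather than chosen by `init_cluster_centers`, and the guard for empty clusters is an addition that does not appear in the source.

```ruby
# Hypothetical helper: mean distance from every sample to every medoid,
# playing the role of distance_mat[true, @medoid_ids].mean.
def mean_medoid_distance(dist, medoid_ids)
  total = dist.sum { |row| medoid_ids.sum { |m| row[m] } }
  total.to_f / (dist.size * medoid_ids.size)
end

# Hypothetical sketch of the alternating update on a square distance matrix.
def k_medoids_sketch(dist, init_medoid_ids, max_iter: 50, tol: 1.0e-4)
  medoid_ids = init_medoid_ids.dup
  error = mean_medoid_distance(dist, medoid_ids)
  max_iter.times do
    # Assignment step: label each sample with the position of its nearest medoid.
    labels = dist.map { |row| medoid_ids.each_index.min_by { |k| row[medoid_ids[k]] } }
    # Update step: the new medoid is the member with the smallest summed
    # distance to the other members of its cluster.
    medoid_ids.each_index do |k|
      members = labels.each_index.select { |i| labels[i] == k }
      next if members.empty? # guard added for this sketch only
      medoid_ids[k] = members.min_by { |i| members.sum { |j| dist[i][j] } }
    end
    # Stop once the mean distance to the medoids no longer changes by more than tol.
    new_error = mean_medoid_distance(dist, medoid_ids)
    break if (error - new_error).abs <= tol
    error = new_error
  end
  medoid_ids
end

# Six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = points.map { |a| points.map { |b| (a - b).abs } }
medoids = k_medoids_sketch(dist, [0, 3])
# medoids is [1, 4], i.e. the middle point of each group
```

Starting from the leftmost point of each group, the first pass moves both medoids to the middle points, and the second pass changes nothing, so the loop stops on the tolerance check.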
#fit_predict(x) ⇒ Numo::Int32
Analyze clusters and assign samples to clusters.
# File 'lib/rumale/clustering/k_medoids.rb', line 105
def fit_predict(x)
  check_sample_array(x)
  fit(x)
  if @params[:metric] == 'precomputed'
    predict(x[true, @medoid_ids])
  else
    predict(x)
  end
end
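With the 'precomputed' metric, the source passes only the medoid columns of the square distance matrix on to predict. A small plain-Ruby sketch of that slicing step follows. The helper names are hypothetical, and `nearest_column` reflects an assumption about what `assign_cluster` does (pick the closest medoid per row), since that method is not shown on this page.

```ruby
# Hypothetical stand-in for x[true, @medoid_ids]: keep only the medoid columns,
# turning an n_samples-by-n_samples matrix into n_samples-by-n_clusters.
def medoid_columns(dist, medoid_ids)
  dist.map { |row| row.values_at(*medoid_ids) }
end

# Assumed behaviour of assign_cluster: index of the smallest entry in each row.
def nearest_column(sub_dist)
  sub_dist.map { |row| row.each_index.min_by { |k| row[k] } }
end

points = [0, 1, 2, 10, 11, 12]
dist = points.map { |a| points.map { |b| (a - b).abs } }

sub_dist = medoid_columns(dist, [1, 4])
labels = nearest_column(sub_dist)
# labels is [0, 0, 0, 1, 1, 1]
```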
#marshal_dump ⇒ Hash
Dump marshal data.
# File 'lib/rumale/clustering/k_medoids.rb', line 117
def marshal_dump
  { params: @params,
    medoid_ids: @medoid_ids,
    cluster_centers: @cluster_centers,
    rng: @rng }
end
#marshal_load(obj) ⇒ nil
Load marshal data.
# File 'lib/rumale/clustering/k_medoids.rb', line 126
def marshal_load(obj)
  @params = obj[:params]
  @medoid_ids = obj[:medoid_ids]
  @cluster_centers = obj[:cluster_centers]
  @rng = obj[:rng]
  nil
end
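Ruby's Marshal module calls marshal_dump when an object is serialized and marshal_load on a freshly allocated object when it is restored, which is how a fitted analyzer can be saved and reloaded. The sketch below illustrates the same pattern on a hypothetical miniature class, `TinyEstimator`, which is not part of Rumale and keeps only a subset of the state shown in the source.

```ruby
# Hypothetical miniature estimator that follows the same dump/load pattern.
class TinyEstimator
  attr_reader :params, :medoid_ids, :rng

  def initialize(seed)
    @params = { random_seed: seed }
    @medoid_ids = [1, 4]
    @rng = Random.new(seed)
  end

  # Called by Marshal.dump: return the state to serialize.
  def marshal_dump
    { params: @params, medoid_ids: @medoid_ids, rng: @rng }
  end

  # Called by Marshal.load on a freshly allocated object: restore the state.
  def marshal_load(obj)
    @params = obj[:params]
    @medoid_ids = obj[:medoid_ids]
    @rng = obj[:rng]
    nil
  end
end

original = TinyEstimator.new(7)
restored = Marshal.load(Marshal.dump(original))
# restored carries the same params and medoid ids as original
```

Note that `initialize` is not run again on load, so everything the object needs has to be present in the dumped hash.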
#predict(x) ⇒ Numo::Int32
Predict cluster labels for samples.
# File 'lib/rumale/clustering/k_medoids.rb', line 91
def predict(x)
  check_sample_array(x)
  distance_mat = @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x, @cluster_centers)
  if @params[:metric] == 'precomputed' && distance_mat.shape[1] != @medoid_ids.size
    raise ArgumentError, 'Expect the size input matrix to be n_samples-by-n_clusters.'
  end
  assign_cluster(distance_mat)
end
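The source shows two input conventions for predict: with the 'euclidean' metric it takes raw samples and measures their distance to the stored cluster centers, while with 'precomputed' it expects an n_samples-by-n_clusters distance matrix. A plain-Ruby sketch of both branches follows. The names `euclidean_distances` and `predict_sketch` are hypothetical, and the final nearest-medoid step is an assumption about `assign_cluster`, which is not shown on this page.

```ruby
# Hypothetical helper: distances from each sample to each center.
def euclidean_distances(xs, centers)
  xs.map { |x| centers.map { |c| Math.sqrt(x.zip(c).sum { |a, b| (a - b)**2 }) } }
end

# Hypothetical sketch of the two branches of predict.
def predict_sketch(x, metric:, centers: nil, n_clusters: nil)
  dist = metric == 'precomputed' ? x : euclidean_distances(x, centers)
  # A precomputed input must have one column per medoid.
  if metric == 'precomputed' && dist.first.size != n_clusters
    raise ArgumentError, 'Expect the size input matrix to be n_samples-by-n_clusters.'
  end
  # Assumed behaviour of assign_cluster: index of the nearest medoid per row.
  dist.map { |row| row.each_index.min_by { |k| row[k] } }
end

centers = [[0.0, 0.0], [5.0, 5.0]]
samples = [[0.5, 0.2], [4.0, 6.0], [1.0, 1.0]]
labels = predict_sketch(samples, metric: 'euclidean', centers: centers)
# labels is [0, 1, 0]
```

In the precomputed branch, passing the full square matrix instead of the medoid columns would trip the shape check whenever the number of samples differs from the number of clusters.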