Class: Rumale::Clustering::PowerIteration

Inherits: Object

Includes: Base::BaseEstimator, Base::ClusterAnalyzer

Defined in: lib/rumale/clustering/power_iteration.rb

Overview

PowerIteration is a class that implements power iteration clustering.

Reference

F. Lin and W. W. Cohen, “Power Iteration Clustering,” Proc. ICML’10, pp. 655–662, 2010.

Examples:

analyzer = Rumale::Clustering::PowerIteration.new(n_clusters: 10, gamma: 8.0, max_iter: 1000)
cluster_labels = analyzer.fit_predict(samples)

Instance Attribute Summary

#embedding ⇒ Numo::DFloat readonly

Return the data in embedded space.
#n_iter ⇒ Integer readonly

Return the number of iterations run for optimization.

Attributes included from Base::BaseEstimator

#params

Instance Method Summary

#fit(x) ⇒ PowerIteration

Analyze clusters with the given training data.
#fit_predict(x) ⇒ Numo::Int32

Analyze clusters and assign samples to clusters.
#initialize(n_clusters: 8, affinity: 'rbf', gamma: nil, init: 'k-means++', max_iter: 1000, tol: 1.0e-8, eps: 1.0e-5, random_seed: nil) ⇒ PowerIteration constructor

Create a new cluster analyzer with power iteration clustering.
#marshal_dump ⇒ Hash

Dump marshal data.
#marshal_load(obj) ⇒ nil

Load marshal data.

Methods included from Base::ClusterAnalyzer

#score

Constructor Details

#initialize(n_clusters: 8, affinity: 'rbf', gamma: nil, init: 'k-means++', max_iter: 1000, tol: 1.0e-8, eps: 1.0e-5, random_seed: nil) ⇒ `PowerIteration`

Create a new cluster analyzer with power iteration clustering.

Parameters:

n_clusters (Integer) (defaults to: 8) —

The number of clusters.
affinity (String) (defaults to: 'rbf') —

The representation of the affinity matrix (‘rbf’ or ‘precomputed’).
gamma (Float) (defaults to: nil) —

The parameter of the RBF kernel. If nil, it is set to 1 / n_features. If affinity = ‘precomputed’, this parameter is ignored.
init (String) (defaults to: 'k-means++') —

The initialization method for centroids of K-Means clustering (‘random’ or ‘k-means++’).
max_iter (Integer) (defaults to: 1000) —

The maximum number of iterations.
tol (Float) (defaults to: 1.0e-8) —

The tolerance of termination criterion.
eps (Float) (defaults to: 1.0e-5) —

A small value close to zero to avoid zero division error.
random_seed (Integer) (defaults to: nil) —

The seed value used to initialize the random generator.
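
As a rough illustration of how gamma shapes the ‘rbf’ affinity, the sketch below builds an affinity matrix in plain Ruby. It assumes the usual RBF form exp(-gamma * squared distance) and the 1 / n_features default described above. `rbf_affinity` is a hypothetical helper written for this example, not a Rumale method, and it works on nested Arrays rather than Numo::DFloat.

```ruby
# Hypothetical helper (not part of Rumale): RBF affinity between all sample pairs.
def rbf_affinity(samples, gamma = nil)
  n_features = samples.first.size
  # Assumed default, following the gamma description: 1 / n_features.
  gamma ||= 1.0 / n_features
  samples.map do |a|
    samples.map do |b|
      sq_dist = a.zip(b).sum { |u, v| (u - v)**2 }
      Math.exp(-gamma * sq_dist)
    end
  end
end

samples = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
affinity = rbf_affinity(samples)        # gamma falls back to 1 / 2
sharper = rbf_affinity(samples, 8.0)    # larger gamma makes affinity decay faster
```

A larger gamma makes distant samples look less similar, so `sharper[0][1]` is smaller than `affinity[0][1]`.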

# File 'lib/rumale/clustering/power_iteration.rb', line 40

def initialize(n_clusters: 8, affinity: 'rbf', gamma: nil, init: 'k-means++', max_iter: 1000, tol: 1.0e-8, eps: 1.0e-5, random_seed: nil)
  check_params_integer(n_clusters: n_clusters, max_iter: max_iter)
  check_params_float(tol: tol, eps: eps)
  check_params_string(affinity: affinity, init: init)
  check_params_type_or_nil(Float, gamma: gamma)
  check_params_type_or_nil(Integer, random_seed: random_seed)
  check_params_positive(n_clusters: n_clusters, max_iter: max_iter, tol: tol, eps: eps)
  @params = {}
  @params[:n_clusters] = n_clusters
  @params[:affinity] = affinity
  @params[:gamma] = gamma
  @params[:init] = init == 'random' ? 'random' : 'k-means++'
  @params[:max_iter] = max_iter
  @params[:tol] = tol
  @params[:eps] = eps
  @params[:random_seed] = random_seed
  @params[:random_seed] ||= srand
  @embedding = nil
  @n_iter = nil
end

Instance Attribute Details

#embedding ⇒ `Numo::DFloat` (readonly)

Return the data in embedded space.

Returns:

(Numo::DFloat) —

(shape: [n_samples])




# File 'lib/rumale/clustering/power_iteration.rb', line 23

def embedding
  @embedding
end

#n_iter ⇒ `Integer` (readonly)

Return the number of iterations run for optimization.

Returns:

(Integer)




# File 'lib/rumale/clustering/power_iteration.rb', line 27

def n_iter
  @n_iter
end

Instance Method Details

#fit(x) ⇒ `PowerIteration`

Analyze clusters with the given training data.

Parameters:

x (Numo::DFloat) —

(shape: [n_samples, n_features]) The training data to be used for cluster analysis. If the affinity is ‘precomputed’, x must be a square affinity matrix (shape: [n_samples, n_samples]).

Returns:

(PowerIteration) —

The learned cluster analyzer itself.

Raises:

(ArgumentError)

# File 'lib/rumale/clustering/power_iteration.rb', line 68

def fit(x, _y = nil)
  check_sample_array(x)
  raise ArgumentError, 'Expect the input affinity matrix to be square.' if @params[:affinity] == 'precomputed' && x.shape[0] != x.shape[1]
  # initialize some variables.
  affinity_mat = @params[:affinity] == 'precomputed' ? x : Rumale::PairwiseMetric.rbf_kernel(x, nil, @params[:gamma])
  affinity_mat[affinity_mat.diag_indices] = 0.0
  n_samples = affinity_mat.shape[0]
  tol = @params[:tol].fdiv(n_samples)
  # calculate normalized affinity matrix.
  degrees = affinity_mat.sum(axis: 1)
  normalized_affinity_mat = (1.0 / degrees).diag.dot(affinity_mat)
  # initialize embedding space.
  @embedding = degrees / degrees.sum
  # optimization
  @n_iter = 0
  error = Numo::DFloat.ones(n_samples)
  @params[:max_iter].times do |t|
    @n_iter = t + 1
    new_embedding = normalized_affinity_mat.dot(@embedding)
    new_embedding /= new_embedding.abs.sum
    new_error = (new_embedding - @embedding).abs
    break if (new_error - error).abs.max <= tol
    @embedding = new_embedding
    error = new_error
  end
  self
end
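
The loop in #fit can be followed step by step in a dependency-free sketch. The version below mirrors the same steps (zeroed diagonal, row normalization, degree-based initial vector, L1 rescaling, stop when the change in the error falls below tol / n_samples), but uses nested Arrays instead of Numo::DFloat. `power_iteration_embedding` is a hypothetical name chosen for this example, and the toy affinity matrix is made up.

```ruby
# Hypothetical helper (not part of Rumale): plain-Ruby version of the loop in #fit.
def power_iteration_embedding(affinity, max_iter: 1000, tol: 1.0e-8)
  n = affinity.size
  # Zero the diagonal so a sample has no affinity to itself.
  w = affinity.each_with_index.map do |row, i|
    row.each_with_index.map { |v, j| i == j ? 0.0 : v }
  end
  degrees = w.map(&:sum)
  # Row-normalize: divide each row by its degree.
  normalized = w.each_with_index.map { |row, i| row.map { |v| v / degrees[i] } }
  total = degrees.sum
  embedding = degrees.map { |d| d / total }
  error = Array.new(n, 1.0)
  threshold = tol / n
  n_iter = 0
  max_iter.times do |t|
    n_iter = t + 1
    updated = normalized.map { |row| row.zip(embedding).sum { |a, v| a * v } }
    norm = updated.sum(&:abs)
    updated = updated.map { |v| v / norm }
    new_error = updated.zip(embedding).map { |a, b| (a - b).abs }
    # Stop once the error itself has nearly stopped changing.
    break if new_error.zip(error).map { |a, b| (a - b).abs }.max <= threshold
    embedding = updated
    error = new_error
  end
  [embedding, n_iter]
end

# Toy data: samples 0..2 form one tight group, samples 3..4 another.
toy_affinity = [
  [0.0, 1.0, 1.0, 0.01, 0.01],
  [1.0, 0.0, 1.0, 0.01, 0.01],
  [1.0, 1.0, 0.0, 0.01, 0.01],
  [0.01, 0.01, 0.01, 0.0, 1.0],
  [0.01, 0.01, 0.01, 1.0, 0.0]
]
embedding, n_iter = power_iteration_embedding(toy_affinity)
```

Because the loop stops early, samples in the same group end up with nearly identical embedding values while the two groups remain slightly apart. That small gap is what the K-Means step in #fit_predict relies on.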

#fit_predict(x) ⇒ `Numo::Int32`

Analyze clusters and assign samples to clusters.

Parameters:

x (Numo::DFloat) —

(shape: [n_samples, n_features]) The training data to be used for cluster analysis. If the affinity is ‘precomputed’, x must be a square affinity matrix (shape: [n_samples, n_samples]).

Returns:

(Numo::Int32) —

(shape: [n_samples]) Predicted cluster label per sample.

# File 'lib/rumale/clustering/power_iteration.rb', line 101

def fit_predict(x)
  check_sample_array(x)
  fit(x)
  kmeans = Rumale::Clustering::KMeans.new(
    n_clusters: @params[:n_clusters], init: @params[:init],
    max_iter: @params[:max_iter], tol: @params[:tol], random_seed: @params[:random_seed]
  )
  kmeans.fit_predict(@embedding.expand_dims(1))
end

#marshal_dump ⇒ `Hash`

Dump marshal data.

Returns:

(Hash) —

The marshal data.

# File 'lib/rumale/clustering/power_iteration.rb', line 113

def marshal_dump
  { params: @params,
    embedding: @embedding,
    n_iter: @n_iter }
end

#marshal_load(obj) ⇒ `nil`

Load marshal data.

Returns:

(nil)

# File 'lib/rumale/clustering/power_iteration.rb', line 121

def marshal_load(obj)
  @params = obj[:params]
  @embedding = obj[:embedding]
  @n_iter = obj[:n_iter]
  nil
end
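
Ruby's Marshal calls marshal_dump when serializing an object and marshal_load on a freshly allocated object when restoring it. The sketch below shows that round trip with a stand-in class that copies the two methods above. `TinyEstimator` is hypothetical and exists only so the example runs without Rumale installed.

```ruby
# Hypothetical stand-in (not a Rumale class) with the same dump/load pair.
class TinyEstimator
  attr_reader :params, :embedding, :n_iter

  def initialize(n_clusters: 8)
    @params = { n_clusters: n_clusters }
    @embedding = nil
    @n_iter = nil
  end

  # Placeholder for a real fit: just stores the given values.
  def fit(values)
    @embedding = values.dup
    @n_iter = 1
    self
  end

  def marshal_dump
    { params: @params, embedding: @embedding, n_iter: @n_iter }
  end

  def marshal_load(obj)
    @params = obj[:params]
    @embedding = obj[:embedding]
    @n_iter = obj[:n_iter]
    nil
  end
end

original = TinyEstimator.new(n_clusters: 2).fit([0.25, 0.75])
# Marshal.dump uses marshal_dump; Marshal.load allocates and calls marshal_load.
restored = Marshal.load(Marshal.dump(original))
```

The restored object is a separate instance that carries the same parameters and fitted state, which is how a fitted analyzer can be saved to disk and reloaded later.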
