Class: Rumale::Clustering::SNN

Inherits:
DBSCAN
  • Object
Defined in:
lib/rumale/clustering/snn.rb

Overview

SNN is a class that implements Shared Nearest Neighbor cluster analysis. The SNN method is a variation of DBSCAN that uses similarity based on k-nearest neighbors as a metric.
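To make the idea of neighbor-based similarity concrete, here is a minimal plain-Ruby sketch. It is not Rumale's implementation: it assumes that the similarity between two samples is the number of samples appearing in both of their k-nearest-neighbor lists, and the helper names `euclidean`, `knn_ids`, and `snn_similarity` are hypothetical.

```ruby
# Illustrative sketch only; helper names are hypothetical, not Rumale API.

# Euclidean distance between two points given as arrays of numbers.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |p, q| (p - q)**2 })
end

# Indices of the k nearest neighbors of sample i, excluding i itself.
def knn_ids(samples, i, k)
  others = (0...samples.size).reject { |j| j == i }
  others.sort_by { |j| euclidean(samples[i], samples[j]) }.first(k)
end

# Assumed similarity: how many neighbors the two samples have in common.
def snn_similarity(samples, i, j, k)
  (knn_ids(samples, i, k) & knn_ids(samples, j, k)).size
end

# Two well-separated groups of three points each.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
same_group  = snn_similarity(samples, 0, 1, 2) # samples 0 and 1 share neighbor 2
cross_group = snn_similarity(samples, 0, 3, 2) # samples 0 and 3 share no neighbors
```

Under this assumption, samples in the same dense region share neighbors and score higher than samples from different regions, which is the property a DBSCAN-style procedure can then threshold.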

Reference

  • L. Ertoz, M. Steinbach, and V. Kumar, “Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data,” Proc. SDM’03, pp. 47–58, 2003.

  • M. E. Houle, H-P. Kriegel, P. Kroger, E. Schubert, and A. Zimek, “Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?,” Proc. SSDBM’10, pp. 482–500, 2010.

Examples:

analyzer = Rumale::Clustering::SNN.new(n_neighbors: 10, eps: 5, min_samples: 5)
cluster_labels = analyzer.fit_predict(samples)

Instance Attribute Summary

Attributes inherited from DBSCAN

#core_sample_ids, #labels

Attributes included from Base::BaseEstimator

#params

Instance Method Summary

Methods inherited from DBSCAN

#marshal_dump, #marshal_load

Methods included from Base::ClusterAnalyzer

#score

Constructor Details

#initialize(n_neighbors: 10, eps: 5, min_samples: 5, metric: 'euclidean') ⇒ SNN

Create a new cluster analyzer with the Shared Nearest Neighbor method.

Parameters:

  • n_neighbors (Integer) (defaults to: 10)

    The number of neighbors to be used for finding k-nearest neighbors.

  • eps (Integer) (defaults to: 5)

    The threshold value for finding connected components based on similarity.

  • min_samples (Integer) (defaults to: 5)

    The number of neighbor samples to be used for the criterion whether a point is a core point.

  • metric (String) (defaults to: 'euclidean')

    The metric used to calculate distances. If metric is ‘euclidean’, the Euclidean distance between points is calculated. If metric is ‘precomputed’, the fit and fit_predict methods expect to be given a distance matrix. Any other value is treated as ‘euclidean’.
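As a rough illustration of how eps and min_samples interact, the sketch below works on a hand-written similarity matrix. It assumes that a sample counts as a core point when at least min_samples other samples have a similarity to it greater than eps; whether Rumale uses a strict comparison or counts the sample itself is not stated here, and `core_point?` is a hypothetical helper.

```ruby
# Illustrative sketch only; `core_point?` is a hypothetical helper, not Rumale API.
# similarity[i][j] holds an assumed shared-neighbor count between samples i and j.
def core_point?(similarity, i, eps:, min_samples:)
  # Collect the other samples whose similarity to sample i exceeds eps.
  neighbors = similarity[i].each_index.select { |j| j != i && similarity[i][j] > eps }
  neighbors.size >= min_samples
end

similarity = [
  [0, 4, 3, 0],
  [4, 0, 3, 0],
  [3, 3, 0, 1],
  [0, 0, 1, 0]
]

# One flag per sample: true when the sample satisfies the assumed core criterion.
core_flags = similarity.each_index.map do |i|
  core_point?(similarity, i, eps: 2, min_samples: 2)
end
```

Because the similarity here is a count of shared neighbors, it is bounded by n_neighbors, so under this assumption an eps at or above n_neighbors would leave no core points.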



# File 'lib/rumale/clustering/snn.rb', line 27

def initialize(n_neighbors: 10, eps: 5, min_samples: 5, metric: 'euclidean')
  check_params_integer(n_neighbors: n_neighbors, min_samples: min_samples)
  check_params_string(metric: metric)
  @params = {}
  @params[:n_neighbors] = n_neighbors
  @params[:eps] = eps
  @params[:min_samples] = min_samples
  @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
  @core_sample_ids = nil
  @labels = nil
end

Instance Method Details

#fit(x) ⇒ SNN

Analyze clusters with the given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for cluster analysis. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (SNN)

    The learned cluster analyzer itself.
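When the metric is ‘precomputed’, x is a square matrix of pairwise distances rather than raw features. The plain-Ruby sketch below shows what such a matrix looks like for three points; `pairwise_distances` is a hypothetical helper, and the conversion to Numo::DFloat is only described in a comment because it depends on the numo-narray gem being installed.

```ruby
# Illustrative sketch only; `pairwise_distances` is a hypothetical helper, not Rumale API.

# Build a square matrix (nested Array) of Euclidean distances between all samples.
def pairwise_distances(samples)
  samples.map do |a|
    samples.map { |b| Math.sqrt(a.zip(b).sum { |p, q| (p - q)**2 }) }
  end
end

samples = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = pairwise_distances(samples)

# dist is a 3x3 nested Array with zeros on the diagonal. With numo-narray
# available, it could be converted with Numo::DFloat.cast(dist) and passed to
# an analyzer created with metric: 'precomputed' (an assumption of this sketch).
```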



# File 'lib/rumale/clustering/snn.rb', line 45

def fit(x, _y = nil)
  super
end

#fit_predict(x) ⇒ Numo::Int32

Analyze clusters and assign samples to clusters.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be used for cluster analysis. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) Predicted cluster label per sample.



# File 'lib/rumale/clustering/snn.rb', line 54

def fit_predict(x)
  super
end