Class: Rumale::Manifold::TSNE

Inherits: Object

Includes: Base::BaseEstimator, Base::Transformer

Defined in: lib/rumale/manifold/tsne.rb

Overview

TSNE is a class that implements t-Distributed Stochastic Neighbor Embedding (t-SNE) with a fixed-point optimization algorithm. The fixed-point algorithm usually converges faster than gradient descent and does not need learning parameters such as the learning rate and momentum.

Reference

    1. L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.

    2. Z. Yang, I. King, Z. Xu, and E. Oja, “Heavy-Tailed Symmetric Stochastic Neighbor Embedding,” Proc. NIPS’09, pp. 2169–2177, 2009.

Examples:

tsne = Rumale::Manifold::TSNE.new(perplexity: 40.0, init: 'pca', max_iter: 500, random_seed: 1)
representations = tsne.fit_transform(samples)

Instance Attribute Summary collapse

Attributes included from Base::BaseEstimator

#params

Instance Method Summary collapse

Constructor Details

#initialize(n_components: 2, perplexity: 30.0, metric: 'euclidean', init: 'random', max_iter: 500, tol: nil, verbose: false, random_seed: nil) ⇒ TSNE

Create a new transformer with t-SNE.

Parameters:

  • n_components (Integer) (defaults to: 2)

    The number of dimensions on representation space.

  • perplexity (Float) (defaults to: 30.0)

    The effective number of neighbors for each point. Perplexity is typically set between 5 and 50.

  • metric (String) (defaults to: 'euclidean')

    The metric to calculate the distances in original space. If metric is ‘euclidean’, Euclidean distance is calculated for distance in original space. If metric is ‘precomputed’, the fit and fit_transform methods expect to be given a distance matrix.

  • init (String) (defaults to: 'random')

    The method used to initialize the representation space. If init is ‘random’, the representation space is initialized with normal random variables. If init is ‘pca’, the result of principal component analysis is used as the initial value of the representation space.

  • max_iter (Integer) (defaults to: 500)

    The maximum number of iterations.

  • tol (Float) (defaults to: nil)

    The tolerance of KL-divergence for terminating optimization. If tol is nil, it does not use KL divergence as a criterion for terminating the optimization.

  • verbose (Boolean) (defaults to: false)

    The flag indicating whether to output KL divergence during iteration.

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator.



# File 'lib/rumale/manifold/tsne.rb', line 59

def initialize(n_components: 2, perplexity: 30.0, metric: 'euclidean', init: 'random',
               max_iter: 500, tol: nil, verbose: false, random_seed: nil)
  check_params_integer(n_components: n_components, max_iter: max_iter)
  check_params_float(perplexity: perplexity)
  check_params_string(metric: metric, init: init)
  check_params_boolean(verbose: verbose)
  check_params_type_or_nil(Float, tol: tol)
  check_params_type_or_nil(Integer, random_seed: random_seed)
  check_params_positive(n_components: n_components, perplexity: perplexity, max_iter: max_iter)
  @params = {}
  @params[:n_components] = n_components
  @params[:perplexity] = perplexity
  @params[:max_iter] = max_iter
  @params[:tol] = tol
  @params[:metric] = metric
  @params[:init] = init
  @params[:verbose] = verbose
  @params[:random_seed] = random_seed
  @params[:random_seed] ||= srand
  @rng = Random.new(@params[:random_seed])
  @embedding = nil
  @kl_divergence = nil
  @n_iter = nil
end

Instance Attribute Details

#embedding ⇒ Numo::DFloat (readonly)

Return the data in representation space.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components])



# File 'lib/rumale/manifold/tsne.rb', line 30

def embedding
  @embedding
end

#kl_divergence ⇒ Float (readonly)

Return the Kullback-Leibler divergence after optimization.

Returns:

  • (Float)


# File 'lib/rumale/manifold/tsne.rb', line 34

def kl_divergence
  @kl_divergence
end

#n_iter ⇒ Integer (readonly)

Return the number of iterations run for optimization.

Returns:

  • (Integer)


# File 'lib/rumale/manifold/tsne.rb', line 38

def n_iter
  @n_iter
end

#rng ⇒ Random (readonly)

Return the random generator.

Returns:

  • (Random)


# File 'lib/rumale/manifold/tsne.rb', line 42

def rng
  @rng
end

Instance Method Details

#fit(x) ⇒ TSNE

Fit the model with given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (TSNE)

    The learned transformer itself.

Raises:

  • (ArgumentError)


# File 'lib/rumale/manifold/tsne.rb', line 91

def fit(x, _not_used = nil)
  check_sample_array(x)
  raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
  # initialize some variables.
  @n_iter = 0
  distance_mat = @params[:metric] == 'precomputed' ? x**2 : Rumale::PairwiseMetric.squared_error(x)
  hi_prob_mat = gaussian_distributed_probability_matrix(distance_mat)
  y = init_embedding(x)
  lo_prob_mat = t_distributed_probability_matrix(y)
  # perform fixed-point optimization.
  one_vec = Numo::DFloat.ones(x.shape[0]).expand_dims(1)
  @params[:max_iter].times do |t|
    break if terminate?(hi_prob_mat, lo_prob_mat)
    a = hi_prob_mat * lo_prob_mat
    b = lo_prob_mat * lo_prob_mat
    y = (b.dot(one_vec) * y + (a - b).dot(y)) / a.dot(one_vec)
    lo_prob_mat = t_distributed_probability_matrix(y)
    @n_iter = t + 1
    puts "[t-SNE] KL divergence after #{@n_iter} iterations: #{cost(hi_prob_mat, lo_prob_mat)}" if @params[:verbose] && (@n_iter % 100).zero?
  end
  # store results.
  @embedding = y
  @kl_divergence = cost(hi_prob_mat, lo_prob_mat)
  self
end

#fit_transform(x) ⇒ Numo::DFloat

Fit the model with training data, and then transform them with the learned model.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components]) The transformed data.



# File 'lib/rumale/manifold/tsne.rb', line 124

def fit_transform(x, _not_used = nil)
  fit(x)
  @embedding.dup
end

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data.



# File 'lib/rumale/manifold/tsne.rb', line 131

def marshal_dump
  { params: @params,
    embedding: @embedding,
    kl_divergence: @kl_divergence,
    n_iter: @n_iter,
    rng: @rng }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/rumale/manifold/tsne.rb', line 141

def marshal_load(obj)
  @params = obj[:params]
  @embedding = obj[:embedding]
  @kl_divergence = obj[:kl_divergence]
  @n_iter = obj[:n_iter]
  @rng = obj[:rng]
  nil
end