Class: Rumale::Manifold::TSNE

Inherits: Object

Includes: Base::BaseEstimator, Base::Transformer

Defined in: lib/rumale/manifold/tsne.rb

Overview

TSNE is a class that implements t-Distributed Stochastic Neighbor Embedding (t-SNE) with a fixed-point optimization algorithm. The fixed-point algorithm usually converges faster than gradient descent and does not need learning parameters such as the learning rate and momentum.

Reference

    1. L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.

    2. Z. Yang, I. King, Z. Xu, and E. Oja, “Heavy-Tailed Symmetric Stochastic Neighbor Embedding,” Proc. NIPS’09, pp. 2169–2177, 2009.

Examples:

tsne = Rumale::Manifold::TSNE.new(perplexity: 40.0, init: 'pca', max_iter: 500, random_seed: 1)
representations = tsne.fit_transform(samples)

Instance Attribute Summary collapse

Attributes included from Base::BaseEstimator

#params

Instance Method Summary collapse

Constructor Details

#initialize(n_components: 2, perplexity: 30.0, metric: 'euclidean', init: 'random', max_iter: 500, tol: nil, verbose: false, random_seed: nil) ⇒ TSNE

Create a new transformer with t-SNE.

Parameters:

  • n_components (Integer) (defaults to: 2)

    The number of dimensions on representation space.

  • perplexity (Float) (defaults to: 30.0)

    The effective number of neighbors for each point. Perplexity is typically set between 5 and 50.

  • metric (String) (defaults to: 'euclidean')

    The metric to calculate the distances in original space. If metric is ‘euclidean’, Euclidean distance is calculated for distance in original space. If metric is ‘precomputed’, the fit and fit_transform methods expect to be given a distance matrix.

  • init (String) (defaults to: 'random')

    The method used to initialize the representation space. If init is ‘random’, the representation space is initialized with normal random variables. If init is ‘pca’, the result of principal component analysis is used as the initial value of the representation space.

  • max_iter (Integer) (defaults to: 500)

    The maximum number of iterations.

  • tol (Float) (defaults to: nil)

    The tolerance of KL-divergence for terminating optimization. If tol is nil, it does not use KL divergence as a criterion for terminating the optimization.

  • verbose (Boolean) (defaults to: false)

    The flag indicating whether to output KL divergence during iteration.

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator.



# File 'lib/rumale/manifold/tsne.rb', line 59

def initialize(n_components: 2, perplexity: 30.0, metric: 'euclidean', init: 'random',
               max_iter: 500, tol: nil, verbose: false, random_seed: nil)
  check_params_integer(n_components: n_components, max_iter: max_iter)
  check_params_float(perplexity: perplexity)
  check_params_string(metric: metric, init: init)
  check_params_boolean(verbose: verbose)
  check_params_type_or_nil(Float, tol: tol)
  check_params_type_or_nil(Integer, random_seed: random_seed)
  check_params_positive(n_components: n_components, perplexity: perplexity, max_iter: max_iter)
  @params = {}
  @params[:n_components] = n_components
  @params[:perplexity] = perplexity
  @params[:max_iter] = max_iter
  @params[:tol] = tol
  @params[:metric] = metric
  @params[:init] = init
  @params[:verbose] = verbose
  @params[:random_seed] = random_seed
  @params[:random_seed] ||= srand
  @rng = Random.new(@params[:random_seed])
  @embedding = nil
  @kl_divergence = nil
  @n_iter = nil
end

Instance Attribute Details

#embedding ⇒ Numo::DFloat (readonly)

Return the data in representation space.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components])



# File 'lib/rumale/manifold/tsne.rb', line 30

def embedding
  @embedding
end

#kl_divergence ⇒ Float (readonly)

Return the Kullback-Leibler divergence after optimization.

Returns:

  • (Float)


# File 'lib/rumale/manifold/tsne.rb', line 34

def kl_divergence
  @kl_divergence
end

#n_iter ⇒ Integer (readonly)

Return the number of iterations run for optimization.

Returns:

  • (Integer)


# File 'lib/rumale/manifold/tsne.rb', line 38

def n_iter
  @n_iter
end

#rng ⇒ Random (readonly)

Return the random generator.

Returns:

  • (Random)


# File 'lib/rumale/manifold/tsne.rb', line 42

def rng
  @rng
end

Instance Method Details

#fit(x) ⇒ TSNE

Fit the model with given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (TSNE)

    The learned transformer itself.

Raises:

  • (ArgumentError)


# File 'lib/rumale/manifold/tsne.rb', line 91

def fit(x, _not_used = nil)
  check_sample_array(x)
  raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
  # initialize some variables.
  @n_iter = 0
  distance_mat = @params[:metric] == 'precomputed' ? x**2 : Rumale::PairwiseMetric.squared_error(x)
  hi_prob_mat = gaussian_distributed_probability_matrix(distance_mat)
  y = init_embedding(x)
  lo_prob_mat = t_distributed_probability_matrix(y)
  # perform fixed-point optimization.
  one_vec = Numo::DFloat.ones(x.shape[0]).expand_dims(1)
  @params[:max_iter].times do |t|
    break if terminate?(hi_prob_mat, lo_prob_mat)
    a = hi_prob_mat * lo_prob_mat
    b = lo_prob_mat * lo_prob_mat
    y = (b.dot(one_vec) * y + (a - b).dot(y)) / a.dot(one_vec)
    lo_prob_mat = t_distributed_probability_matrix(y)
    @n_iter = t + 1
    puts "[t-SNE] KL divergence after #{@n_iter} iterations: #{cost(hi_prob_mat, lo_prob_mat)}" if @params[:verbose] && (@n_iter % 100).zero?
  end
  # store results.
  @embedding = y
  @kl_divergence = cost(hi_prob_mat, lo_prob_mat)
  self
end

#fit_transform(x) ⇒ Numo::DFloat

Fit the model with training data, and then transform them with the learned model.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components]) The transformed data.



# File 'lib/rumale/manifold/tsne.rb', line 124

def fit_transform(x, _not_used = nil)
  fit(x)
  @embedding.dup
end

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data.



# File 'lib/rumale/manifold/tsne.rb', line 131

def marshal_dump
  { params: @params,
    embedding: @embedding,
    kl_divergence: @kl_divergence,
    n_iter: @n_iter,
    rng: @rng }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/rumale/manifold/tsne.rb', line 141

def marshal_load(obj)
  @params = obj[:params]
  @embedding = obj[:embedding]
  @kl_divergence = obj[:kl_divergence]
  @n_iter = obj[:n_iter]
  @rng = obj[:rng]
  nil
end