Class: Rumale::Manifold::MDS

Inherits:
Object
Includes:
Base::BaseEstimator, Base::Transformer
Defined in:
lib/rumale/manifold/mds.rb

Overview

MDS is a class that implements Metric Multidimensional Scaling (MDS) with the Scaling by MAjorizing a COmplicated Function (SMACOF) algorithm.

Reference

  • P. J. F. Groenen and M. van de Velden, “Multidimensional Scaling by Majorization: A Review,” J. of Statistical Software, Vol. 73 (8), 2016.

Examples:

mds = Rumale::Manifold::MDS.new(init: 'pca', max_iter: 500, random_seed: 1)
representations = mds.fit_transform(samples)

Instance Attribute Summary

Attributes included from Base::BaseEstimator

#params

Instance Method Summary

Constructor Details

#initialize(n_components: 2, metric: 'euclidean', init: 'random', max_iter: 300, tol: nil, verbose: false, random_seed: nil) ⇒ MDS

Create a new transformer with MDS.

Parameters:

  • n_components (Integer) (defaults to: 2)

    The number of dimensions of the representation space.

  • metric (String) (defaults to: 'euclidean')

    The metric to calculate the distances in original space. If metric is ‘euclidean’, Euclidean distance is calculated for distance in original space. If metric is ‘precomputed’, the fit and fit_transform methods expect to be given a distance matrix.

  • init (String) (defaults to: 'random')

    The method used to initialize the representation space. If init is ‘random’, the representation space is initialized with normal random variables. If init is ‘pca’, the result of principal component analysis is used as the initial value of the representation space.

  • max_iter (Integer) (defaults to: 300)

    The maximum number of iterations.

  • tol (Float) (defaults to: nil)

    The tolerance of the stress value for terminating optimization. If tol is nil, the stress value is not used as a criterion for terminating the optimization.

  • verbose (Boolean) (defaults to: false)

    The flag indicating whether to print the stress value during optimization (every 100 iterations).

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator.
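
When metric is ‘precomputed’, the input is expected to be a square distance matrix rather than raw feature vectors. The following is a minimal pure-Ruby sketch of what such a matrix looks like for Euclidean distances. The helper name pairwise_euclidean is hypothetical and is not part of Rumale; in practice the nested array would presumably need to be converted to a Numo::DFloat before being passed to fit or fit_transform.

```ruby
# Hypothetical helper (not a Rumale method): builds a square matrix whose
# entry [i][j] is the Euclidean distance between points[i] and points[j].
def pairwise_euclidean(points)
  points.map do |a|
    points.map do |b|
      Math.sqrt(a.zip(b).sum { |ai, bi| (ai - bi)**2 })
    end
  end
end

points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = pairwise_euclidean(points)

# The matrix is square and symmetric with a zero diagonal.
# For these points, dist[0][1] is 5.0 and dist[0][2] is 10.0.
```

A matrix of this shape ([n_samples, n_samples]) is what the square-matrix check in fit is guarding against getting wrong.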



# File 'lib/rumale/manifold/mds.rb', line 54

def initialize(n_components: 2, metric: 'euclidean', init: 'random',
               max_iter: 300, tol: nil, verbose: false, random_seed: nil)
  check_params_integer(n_components: n_components, max_iter: max_iter)
  check_params_string(metric: metric, init: init)
  check_params_boolean(verbose: verbose)
  check_params_type_or_nil(Float, tol: tol)
  check_params_type_or_nil(Integer, random_seed: random_seed)
  check_params_positive(n_components: n_components, max_iter: max_iter)
  @params = {}
  @params[:n_components] = n_components
  @params[:max_iter] = max_iter
  @params[:tol] = tol
  @params[:metric] = metric
  @params[:init] = init
  @params[:verbose] = verbose
  @params[:random_seed] = random_seed
  @params[:random_seed] ||= srand
  @rng = Random.new(@params[:random_seed])
  @embedding = nil
  @stress = nil
  @n_iter = nil
end
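
The constructor above replaces a nil random_seed with the value returned by srand and then builds a Random instance from it. The short sketch below, which uses only Ruby's standard library, illustrates why this makes a run reproducible: the stored seed can be reused to obtain the same sequence of draws. The variable names are illustrative only.

```ruby
seed = nil
# Kernel#srand with no argument reseeds the global generator and returns the
# previous seed, so a nil seed is replaced by a concrete Integer.
seed ||= srand

# Two generators created from the same seed yield identical sequences.
rng_a = Random.new(seed)
rng_b = Random.new(seed)
draws_a = Array.new(3) { rng_a.rand }
draws_b = Array.new(3) { rng_b.rand }
```

Because the resolved seed is stored in @params[:random_seed], it can be read back from params and passed to a new instance to repeat a ‘random’ initialization.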

Instance Attribute Details

#embedding ⇒ Numo::DFloat (readonly)

Return the data in representation space.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components])



# File 'lib/rumale/manifold/mds.rb', line 26

def embedding
  @embedding
end

#n_iter ⇒ Integer (readonly)

Return the number of iterations run for optimization.

Returns:

  • (Integer)


# File 'lib/rumale/manifold/mds.rb', line 34

def n_iter
  @n_iter
end

#rng ⇒ Random (readonly)

Return the random generator.

Returns:

  • (Random)


# File 'lib/rumale/manifold/mds.rb', line 38

def rng
  @rng
end

#stress ⇒ Float (readonly)

Return the stress function value after optimization.

Returns:

  • (Float)


# File 'lib/rumale/manifold/mds.rb', line 30

def stress
  @stress
end
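
The stress value compares distances in the original space with distances in the representation space. The private calc_stress method is not shown on this page, so the sketch below is an assumption: it uses the common raw-stress definition, the sum of squared differences over each unordered pair of samples. The function name raw_stress is hypothetical, and Rumale's own scaling of the value may differ.

```ruby
# Hypothetical raw-stress sketch (assumed definition, not Rumale's calc_stress):
# sums (hi[i][j] - lo[i][j])**2 over every pair with i < j.
def raw_stress(hi, lo)
  n = hi.size
  total = 0.0
  (0...n).each do |i|
    ((i + 1)...n).each do |j|
      total += (hi[i][j] - lo[i][j])**2
    end
  end
  total
end

hi = [[0.0, 5.0], [5.0, 0.0]] # distances in the original space
lo = [[0.0, 3.0], [3.0, 0.0]] # distances in the representation space

mismatch = raw_stress(hi, lo) # (5 - 3)**2 = 4.0
perfect = raw_stress(hi, hi)  # identical distances give 0.0
```

A lower value means the embedding preserves the original pairwise distances more closely.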

Instance Method Details

#fit(x) ⇒ MDS

Fit the model with given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (MDS)

    The learned transformer itself.

Raises:

  • (ArgumentError)


# File 'lib/rumale/manifold/mds.rb', line 84

def fit(x, _not_used = nil)
  check_sample_array(x)
  raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
  # initialize some variables.
  n_samples = x.shape[0]
  hi_distance_mat = @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
  @embedding = init_embedding(x)
  lo_distance_mat = Rumale::PairwiseMetric.euclidean_distance(@embedding)
  @stress = calc_stress(hi_distance_mat, lo_distance_mat)
  @n_iter = 0
  # perform optimization.
  @params[:max_iter].times do |t|
    # Guttman transform.
    ratio = hi_distance_mat / lo_distance_mat
    ratio[ratio.diag_indices] = 0.0
    ratio[lo_distance_mat.eq(0)] = 0.0
    tmp_mat = -ratio
    tmp_mat[tmp_mat.diag_indices] += ratio.sum(axis: 1)
    @embedding = 1.fdiv(n_samples) * tmp_mat.dot(@embedding)
    # check convergence.
    new_stress = calc_stress(hi_distance_mat, lo_distance_mat)
    if terminate?(@stress, new_stress)
      @stress = new_stress
      break
    end
    # next step.
    @n_iter = t + 1
    @stress = new_stress
    lo_distance_mat = Rumale::PairwiseMetric.euclidean_distance(@embedding)
    puts "[MDS] stress function after #{@n_iter} iterations: #{@stress}" if @params[:verbose] && (@n_iter % 100).zero?
  end
  self
end
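
The update inside the loop above is a Guttman transform. As an illustration, here is a pure-Ruby sketch of a single step on nested arrays, following the same arithmetic as the listing (ratio matrix, zeroed diagonal and zero-distance entries, row sums on the diagonal, division by the number of samples). The helper names pairwise_euclidean and guttman_step are hypothetical and are not Rumale methods.

```ruby
# Hypothetical helper: square Euclidean distance matrix for nested arrays.
def pairwise_euclidean(points)
  points.map do |a|
    points.map { |b| Math.sqrt(a.zip(b).sum { |ai, bi| (ai - bi)**2 }) }
  end
end

# Hypothetical sketch of one Guttman transform step.
# hi: target distance matrix, embedding: current low-dimensional coordinates.
def guttman_step(hi, embedding)
  n = embedding.size
  lo = pairwise_euclidean(embedding)
  # Ratio of target to current distances; diagonal and zero distances map to 0.
  ratio = Array.new(n) do |i|
    Array.new(n) do |j|
      (i == j || lo[i][j].zero?) ? 0.0 : hi[i][j] / lo[i][j]
    end
  end
  # Negated ratios off the diagonal, row sums of the ratios on the diagonal.
  b = Array.new(n) do |i|
    Array.new(n) { |j| i == j ? ratio[i].sum : -ratio[i][j] }
  end
  # New embedding = (1 / n) * b * embedding.
  dims = embedding.first.size
  Array.new(n) do |i|
    Array.new(dims) do |k|
      (0...n).sum { |j| b[i][j] * embedding[j][k] } / n
    end
  end
end

# Target distances of three points on a line at positions 0, 1 and 3.
target_dist = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
start = [[0.0], [2.0], [3.0]]
updated = guttman_step(target_dist, start)
# In this small one-dimensional case the pairwise distances of `updated`
# match target_dist up to floating-point rounding after a single step.
```

In general one step only reduces the mismatch, which is why fit repeats the update up to max_iter times.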

#fit_transform(x) ⇒ Numo::DFloat

Fit the model with training data, and then transform them with the learned model.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components]) The transformed data.



# File 'lib/rumale/manifold/mds.rb', line 125

def fit_transform(x, _not_used = nil)
  fit(x)
  @embedding.dup
end

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data.



# File 'lib/rumale/manifold/mds.rb', line 132

def marshal_dump
  { params: @params,
    embedding: @embedding,
    stress: @stress,
    n_iter: @n_iter,
    rng: @rng }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/rumale/manifold/mds.rb', line 142

def marshal_load(obj)
  @params = obj[:params]
  @embedding = obj[:embedding]
  @stress = obj[:stress]
  @n_iter = obj[:n_iter]
  @rng = obj[:rng]
  nil
end
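
The marshal_dump and marshal_load pair above is what Ruby's Marshal module calls when an object is serialized and restored. The sketch below shows the same pattern on a small stand-in class using only the standard library. TinyEstimator is hypothetical and exists only to illustrate the round trip; it is not part of Rumale.

```ruby
# Hypothetical stand-in class that mirrors the dump/load pattern shown above.
class TinyEstimator
  attr_reader :params, :embedding

  def initialize(n_components: 2)
    @params = { n_components: n_components }
    @embedding = nil
  end

  # Keeps the first n_components values of each row as a toy "embedding".
  def fit(rows)
    @embedding = rows.map { |row| row.first(@params[:n_components]) }
    self
  end

  # Called by Marshal.dump: returns the state to serialize.
  def marshal_dump
    { params: @params, embedding: @embedding }
  end

  # Called by Marshal.load on a freshly allocated object: restores the state.
  def marshal_load(obj)
    @params = obj[:params]
    @embedding = obj[:embedding]
    nil
  end
end

model = TinyEstimator.new(n_components: 1).fit([[1.0, 2.0], [3.0, 4.0]])
restored = Marshal.load(Marshal.dump(model))
# restored has the same params and embedding as model.
```

The same Marshal.dump and Marshal.load calls would presumably be used to save and reload a fitted MDS instance, for example by writing the dumped string to a file.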