Class: Rumale::Preprocessing::BinDiscretizer

Inherits:
Object
Includes:
Base::BaseEstimator, Base::Transformer
Defined in:
lib/rumale/preprocessing/bin_discretizer.rb

Overview

Discretizes features into a given number of equal-width bins. In some cases, discretizing features may accelerate decision tree training.

Examples:

discretizer = Rumale::Preprocessing::BinDiscretizer.new(n_bins: 4)
samples = Numo::DFloat.new(5, 2).rand - 0.5
transformed = discretizer.fit_transform(samples)
# > pp samples
# Numo::DFloat#shape=[5,2]
# [[-0.438246, -0.126933],
#  [ 0.294815, -0.298958],
#  [-0.383959, -0.155968],
#  [ 0.039948,  0.237815],
#  [-0.334911, -0.449117]]
# > pp transformed
# Numo::DFloat#shape=[5,2]
# [[0, 1],
#  [3, 0],
#  [0, 1],
#  [2, 3],
#  [0, 0]]

Instance Attribute Summary

Attributes included from Base::BaseEstimator

#params

Instance Method Summary

Constructor Details

#initialize(n_bins: 32) ⇒ BinDiscretizer

Creates a new discretizer that splits each feature into the given number of bins.

Parameters:

  • n_bins (Integer) (defaults to: 32)

    The number of bins to be used for discretizing feature values.



# File 'lib/rumale/preprocessing/bin_discretizer.rb', line 40

def initialize(n_bins: 32)
  @params = {}
  @params[:n_bins] = n_bins
  @feature_steps = nil
end

Instance Attribute Details

#feature_steps ⇒ Array<Numo::DFloat> (readonly)

Returns the feature steps used for discretizing.

Returns:

  • (Array<Numo::DFloat>)

    (shape: [n_features, n_bins])



# File 'lib/rumale/preprocessing/bin_discretizer.rb', line 35

def feature_steps
  @feature_steps
end

Instance Method Details

#fit(x) ⇒ BinDiscretizer

Fit feature ranges to be discretized.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples used to calculate the feature ranges.

Returns:

  • (BinDiscretizer)

# File 'lib/rumale/preprocessing/bin_discretizer.rb', line 52

def fit(x, _y = nil)
  check_sample_array(x)
  n_features = x.shape[1]
  max_vals = x.max(0)
  min_vals = x.min(0)
  @feature_steps = Array.new(n_features) do |n|
    Numo::DFloat.linspace(min_vals[n], max_vals[n], @params[:n_bins] + 1)[0...@params[:n_bins]]
  end
  self
end
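
As a rough illustration of what #fit stores for one feature, the sketch below reproduces the equal-width left bin edges in plain Ruby, without Numo. The helper name bin_edges is hypothetical and not part of Rumale; it assumes that linspace(min, max, n_bins + 1) with the last point dropped is equivalent to stepping from min in increments of (max - min) / n_bins.

```ruby
# Hypothetical helper (not part of Rumale): a pure-Ruby stand-in for
# Numo::DFloat.linspace(lo, hi, n_bins + 1)[0...n_bins].
# It returns the left edge of each of the n_bins equal-width bins.
def bin_edges(lo, hi, n_bins)
  width = (hi - lo) / n_bins.to_f
  Array.new(n_bins) { |i| lo + i * width }
end

# One feature whose fitted minimum is 0.0 and maximum is 8.0, split into 4 bins.
edges = bin_edges(0.0, 8.0, 4)
p edges # => [0.0, 2.0, 4.0, 6.0]
```

Note that the maximum value itself is not stored as an edge: only the n_bins left edges are kept, which matches the documented shape of feature_steps.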

#fit_transform(x) ⇒ Numo::DFloat

Fit feature ranges to be discretized, then return discretized samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be discretized.

Returns:

  • (Numo::DFloat)

    The discretized samples.



# File 'lib/rumale/preprocessing/bin_discretizer.rb', line 69

def fit_transform(x, _y = nil)
  check_sample_array(x)
  fit(x).transform(x)
end

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data for BinDiscretizer.



# File 'lib/rumale/preprocessing/bin_discretizer.rb', line 94

def marshal_dump
  { params: @params,
    feature_steps: @feature_steps }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/rumale/preprocessing/bin_discretizer.rb', line 101

def marshal_load(obj)
  @params = obj[:params]
  @feature_steps = obj[:feature_steps]
  nil
end
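
The two marshal hooks above are what Ruby's Marshal module calls when an object is serialized and restored. The sketch below shows that round trip with a hypothetical stand-in class, TinyDiscretizer, which copies only the state handling of BinDiscretizer and uses plain arrays instead of Numo::DFloat. The class and its set_steps method are assumptions made for illustration.

```ruby
# Hypothetical stand-in class (not part of Rumale) with the same
# marshal_dump / marshal_load pattern as BinDiscretizer.
class TinyDiscretizer
  attr_reader :params, :feature_steps

  def initialize(n_bins: 32)
    @params = { n_bins: n_bins }
    @feature_steps = nil
  end

  # Illustrative setter standing in for the state that #fit would learn.
  def set_steps(steps)
    @feature_steps = steps
    self
  end

  # Called by Marshal.dump: returns the state to be serialized.
  def marshal_dump
    { params: @params, feature_steps: @feature_steps }
  end

  # Called by Marshal.load on a freshly allocated object.
  def marshal_load(obj)
    @params = obj[:params]
    @feature_steps = obj[:feature_steps]
    nil
  end
end

original = TinyDiscretizer.new(n_bins: 2).set_steps([[0.0, 1.0]])
restored = Marshal.load(Marshal.dump(original))
p restored.params        # => {:n_bins=>2} (Ruby 3.4+ prints {n_bins: 2})
p restored.feature_steps # => [[0.0, 1.0]]
```

A fitted discretizer saved this way can be reloaded and used for #transform without calling #fit again, since the learned steps travel with the dumped hash.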

#transform(x) ⇒ Numo::DFloat

Discretizes the given samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be discretized.

Returns:

  • (Numo::DFloat)

    The discretized samples.



# File 'lib/rumale/preprocessing/bin_discretizer.rb', line 78

def transform(x)
  check_sample_array(x)
  n_samples, n_features = x.shape
  transformed = Numo::DFloat.zeros(n_samples, n_features)
  n_features.times do |n|
    steps = @feature_steps[n]
    @params[:n_bins].times do |bin|
      mask = x[true, n].ge(steps[bin]).where
      transformed[mask, n] = bin
    end
  end
  transformed
end
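
To make the bin assignment in #transform easier to follow, here is a minimal pure-Ruby sketch of the same loop for a single feature. The helper name discretize is hypothetical and not part of Rumale; it assumes the edges are sorted in ascending order, as the steps produced by #fit are.

```ruby
# Hypothetical helper (not part of Rumale) mirroring the inner loop of
# #transform for one feature: every value starts in bin 0 and is moved to
# bin i whenever it is greater than or equal to the i-th left edge.
def discretize(values, edges)
  values.map do |v|
    bin = 0
    edges.each_with_index { |edge, i| bin = i if v >= edge }
    bin
  end
end

edges = [0.0, 2.0, 4.0, 6.0] # left edges of 4 bins fitted on the range [0, 8]
result = discretize([-1.0, 0.0, 1.9, 2.0, 5.0, 8.0, 100.0], edges)
p result # => [0, 0, 0, 1, 2, 3, 3]
```

Two consequences follow from this loop: values below the fitted minimum stay in bin 0, and values at or above the last edge, including ones beyond the fitted maximum, fall into the last bin. The sketch returns integers, whereas #transform returns the bin indices in a Numo::DFloat.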