Class: Rumale::ModelSelection::ShuffleSplit

Inherits: Object

Includes: Base::Splitter

Defined in: lib/rumale/model_selection/shuffle_split.rb

Overview

ShuffleSplit is a class that generates the set of data indices for random permutation cross-validation.

Examples:

ss = Rumale::ModelSelection::ShuffleSplit.new(n_splits: 3, test_size: 0.2, random_seed: 1)
ss.split(samples, labels).each do |train_ids, test_ids|
  train_samples = samples[train_ids, true]
  test_samples = samples[test_ids, true]
  ...
end

Instance Attribute Summary

#n_splits ⇒ Integer (readonly)

Return the number of folds.

#rng ⇒ Random (readonly)

Return the random generator for shuffling the dataset.

Instance Method Summary

#initialize(n_splits: 3, test_size: 0.1, train_size: nil, random_seed: nil) ⇒ ShuffleSplit (constructor)

Create a new data splitter for random permutation cross validation.

#split(x, _y = nil) ⇒ Array

Generate data indices for random permutation cross validation.

Constructor Details

#initialize(n_splits: 3, test_size: 0.1, train_size: nil, random_seed: nil) ⇒ `ShuffleSplit`

Create a new data splitter for random permutation cross validation.

Parameters:

n_splits (Integer) (defaults to: 3): The number of folds.

test_size (Float) (defaults to: 0.1): The ratio of the number of samples used for test data.

train_size (Float) (defaults to: nil): The ratio of the number of samples used for training data. If nil, all samples not selected for the test data are used.

random_seed (Integer) (defaults to: nil): The seed value used to initialize the random generator.
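As a rough illustration of how the ratio parameters become sample counts, the sketch below mirrors the truncating arithmetic that #split applies. The helper name `split_sizes` is hypothetical and not part of Rumale.

```ruby
# Hypothetical helper (not part of Rumale): converts the ratio parameters
# into sample counts with the same truncating arithmetic used by #split.
def split_sizes(n_samples, test_size: 0.1, train_size: nil)
  n_test = (test_size * n_samples).to_i
  n_train = train_size.nil? ? n_samples - n_test : (train_size * n_samples).to_i
  [n_train, n_test]
end

p split_sizes(10, test_size: 0.2)                  # [8, 2]
p split_sizes(10, test_size: 0.2, train_size: 0.5) # [5, 2]
p split_sizes(5, test_size: 0.1)                   # [5, 0]
```

The last call yields zero test samples because 0.5 is truncated to 0; with such a small dataset, #split would raise a RangeError.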

# File 'lib/rumale/model_selection/shuffle_split.rb', line 34

def initialize(n_splits: 3, test_size: 0.1, train_size: nil, random_seed: nil)
  check_params_integer(n_splits: n_splits)
  check_params_float(test_size: test_size)
  check_params_type_or_nil(Float, train_size: train_size)
  check_params_type_or_nil(Integer, random_seed: random_seed)
  check_params_positive(n_splits: n_splits)
  check_params_positive(test_size: test_size)
  check_params_positive(train_size: train_size) unless train_size.nil?
  @n_splits = n_splits
  @test_size = test_size
  @train_size = train_size
  @random_seed = random_seed
  @random_seed ||= srand
  @rng = Random.new(@random_seed)
end
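The seed handling at the end of the constructor can be tried with the Ruby standard library alone. This is a minimal sketch, not Rumale code: it shows that two generators built from the same seed produce the same draws, and that `srand` called without arguments returns an Integer (the previous global seed) that can serve as the fallback seed.

```ruby
ids = [*0...10]

# Two generators created from the same seed yield identical samples.
a = ids.sample(3, random: Random.new(1))
b = ids.sample(3, random: Random.new(1))
p a == b # true

# Fallback used when no seed is given: Kernel#srand reseeds the global
# generator and returns the previous seed as an Integer.
seed = nil
seed ||= srand
rng = Random.new(seed)
p rng.seed == seed # true
```

Passing an explicit random_seed is therefore the way to obtain reproducible splits across runs.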

Instance Attribute Details

#n_splits ⇒ `Integer` (readonly)

Return the number of folds.

Returns:

(Integer)



# File 'lib/rumale/model_selection/shuffle_split.rb', line 22

def n_splits
  @n_splits
end

#rng ⇒ `Random` (readonly)

Return the random generator for shuffling the dataset.

Returns:

(Random)



# File 'lib/rumale/model_selection/shuffle_split.rb', line 26

def rng
  @rng
end

Instance Method Details

#split(x, _y = nil) ⇒ `Array`

Generate data indices for random permutation cross validation.

Parameters:

x (Numo::DFloat): (shape: [n_samples, n_features]) The dataset to be used to generate data indices for random permutation cross validation.

_y (defaults to: nil): Ignored by this splitter.

Returns:

(Array): The set of data indices for constructing the training and testing dataset in each fold, given as one [train_ids, test_ids] pair per fold.

# File 'lib/rumale/model_selection/shuffle_split.rb', line 55

def split(x, _y = nil)
  check_sample_array(x)
  # Initialize and check some variables.
  n_samples = x.shape[0]
  n_test_samples = (@test_size * n_samples).to_i
  n_train_samples = @train_size.nil? ? n_samples - n_test_samples : (@train_size * n_samples).to_i
  unless @n_splits.between?(1, n_samples)
    raise ArgumentError,
          'The value of n_splits must be not less than 1 and not more than the number of samples.'
  end
  unless n_test_samples.between?(1, n_samples)
    raise RangeError,
          'The number of sample in test split must be not less than 1 and not more than the number of samples.'
  end
  unless n_train_samples.between?(1, n_samples)
    raise RangeError,
          'The number of sample in train split must be not less than 1 and not more than the number of samples.'
  end
  if (n_test_samples + n_train_samples) > n_samples
    raise RangeError,
          'The total number of samples in test split and train split must be not more than the number of samples.'
  end
  sub_rng = @rng.dup
  # Returns array consisting of the training and testing ids for each fold.
  dataset_ids = [*0...n_samples]
  Array.new(@n_splits) do
    test_ids = dataset_ids.sample(n_test_samples, random: sub_rng)
    train_ids = if @train_size.nil?
                  dataset_ids - test_ids
                else
                  (dataset_ids - test_ids).sample(n_train_samples, random: sub_rng)
                end
    [train_ids, test_ids]
  end
end
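The sampling loop above can be reproduced without Numo or Rumale. The following is a stdlib-only sketch under stated assumptions: `shuffle_split_ids` is a hypothetical helper that takes a sample count instead of a Numo::DFloat, and it omits the range checks performed by #split.

```ruby
# Stdlib-only sketch of the sampling loop in #split (hypothetical helper,
# not Rumale's API). Range checks are omitted for brevity.
def shuffle_split_ids(n_samples, n_splits: 3, test_size: 0.1, train_size: nil, rng: Random.new(1))
  n_test = (test_size * n_samples).to_i
  n_train = train_size.nil? ? n_samples - n_test : (train_size * n_samples).to_i
  sub_rng = rng.dup # work on a copy so the caller's generator state is untouched
  ids = [*0...n_samples]
  Array.new(n_splits) do
    test_ids = ids.sample(n_test, random: sub_rng)
    rest = ids - test_ids
    train_ids = train_size.nil? ? rest : rest.sample(n_train, random: sub_rng)
    [train_ids, test_ids]
  end
end

rng = Random.new(42)
first = shuffle_split_ids(10, test_size: 0.2, rng: rng)
second = shuffle_split_ids(10, test_size: 0.2, rng: rng)

first.each do |train_ids, test_ids|
  puts "train=#{train_ids.inspect} test=#{test_ids.inspect}"
end
puts first == second # true: both calls start from the same generator state
```

Because the generator is duplicated before sampling, repeated calls return the same folds. Each fold is drawn independently, so test sets from different folds may overlap, unlike in K-fold cross validation.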
