Class: SVMKit::ModelSelection::StratifiedKFold

Inherits: Object

Includes: Base::Splitter

Defined in: lib/svmkit/model_selection/stratified_k_fold.rb

Overview

StratifiedKFold generates the sets of data indices for stratified K-fold cross-validation. Each fold keeps approximately the same proportion of samples from each class.

Examples:

kf = SVMKit::ModelSelection::StratifiedKFold.new(n_splits: 3, shuffle: true, random_seed: 1)
kf.split(samples, labels).each do |train_ids, test_ids|
  train_samples = samples[train_ids, true]
  test_samples = samples[test_ids, true]
  # ... train and evaluate a model on this fold ...
end
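To see what "almost equal" class proportions means in practice, the per-class counts of each fold can be tallied. The following is a minimal sketch in plain Ruby, not part of SVMKit: the helper `class_counts`, the toy `labels`, and the hand-written `folds` are all hypothetical, and plain arrays stand in for the Numo arrays used above.

```ruby
# Count how many of the given ids fall into each class of `labels`.
# (Hypothetical helper for illustration; not an SVMKit method.)
def class_counts(labels, ids)
  ids.each_with_object(Hash.new(0)) { |i, counts| counts[labels[i]] += 1 }
end

# Toy data: six samples of class 0 followed by three samples of class 1.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

# A hand-written stratified 3-fold split, one [train_ids, test_ids] pair per fold.
folds = [
  [[2, 3, 4, 5, 7, 8], [0, 1, 6]],
  [[0, 1, 4, 5, 6, 8], [2, 3, 7]],
  [[0, 1, 2, 3, 6, 7], [4, 5, 8]]
]

# Every test fold holds two samples of class 0 and one of class 1,
# matching the 2:1 class ratio of the whole dataset.
folds.each_with_index do |(train_ids, test_ids), fold_id|
  puts "fold #{fold_id}: train=#{class_counts(labels, train_ids)} test=#{class_counts(labels, test_ids)}"
end
```

The same check could be applied to the pairs yielded by `kf.split(samples, labels)`, assuming the labels are converted with `to_a` first.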

Instance Attribute Summary

#rng ⇒ Random (readonly)

Return the random generator for shuffling the dataset.

#shuffle ⇒ Boolean (readonly)

Return the flag indicating whether to shuffle the dataset.

Attributes included from Base::Splitter

#n_splits

Instance Method Summary

#initialize(n_splits: 3, shuffle: false, random_seed: nil) ⇒ StratifiedKFold (constructor)

Create a new data splitter for stratified K-fold cross-validation.

#split(x, y) ⇒ Array

Generate data indices for stratified K-fold cross-validation.

Constructor Details

#initialize(n_splits: 3, shuffle: false, random_seed: nil) ⇒ `StratifiedKFold`

Create a new data splitter for stratified K-fold cross-validation.

Parameters:

n_splits (Integer) (defaults to: 3): The number of folds.
shuffle (Boolean) (defaults to: false): The flag indicating whether to shuffle the dataset.
random_seed (Integer) (defaults to: nil): The seed value used to initialize the random generator. If nil, the seed is taken from the value returned by srand.

# File 'lib/svmkit/model_selection/stratified_k_fold.rb', line 34

def initialize(n_splits: 3, shuffle: false, random_seed: nil)
  @n_splits = n_splits
  @shuffle = shuffle
  @random_seed = random_seed
  @random_seed ||= srand
  @rng = Random.new(@random_seed)
end
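The constructor builds `@rng` from `random_seed`, so a fixed seed should make shuffled splits reproducible. The sketch below illustrates that property with Ruby's standard `Random` and `Array#shuffle` only; that the library shuffles in this way is an assumption, and `shuffled_ids` is a hypothetical helper rather than an SVMKit method.

```ruby
# Shuffle the ids 0...n with a generator built from the given seed.
# (Hypothetical helper; stands in for whatever shuffling the splitter performs.)
def shuffled_ids(seed, n)
  (0...n).to_a.shuffle(random: Random.new(seed))
end

first  = shuffled_ids(1, 10)
second = shuffled_ids(1, 10)

# Two generators created from the same seed yield the same order.
puts "same seed gives same order: #{first == second}"

# Shuffling only reorders the ids; none are lost or duplicated.
puts "still a permutation of the ids: #{first.sort == (0...10).to_a}"
```

When `random_seed` is left as nil, the constructor falls back to the value returned by `srand`, so the order will generally vary from one run to the next.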

Instance Attribute Details

#rng ⇒ `Random` (readonly)

Return the random generator for shuffling the dataset.

Returns:

(Random)

# File 'lib/svmkit/model_selection/stratified_k_fold.rb', line 27

def rng
  @rng
end

#shuffle ⇒ `Boolean` (readonly)

Return the flag indicating whether to shuffle the dataset.

Returns:

(Boolean)

# File 'lib/svmkit/model_selection/stratified_k_fold.rb', line 23

def shuffle
  @shuffle
end

Instance Method Details

#split(x, y) ⇒ `Array`

Generate data indices for stratified K-fold cross validation.

Parameters:

x (Numo::DFloat): (shape: [n_samples, n_features]) The dataset to be used to generate data indices for stratified K-fold cross-validation. This argument exists only to unify the interface between the K-fold methods; it is not used in the method.
y (Numo::Int32): (shape: [n_samples]) The labels to be used to generate data indices for stratified K-fold cross-validation.

Returns:

(Array): The set of data indices for constructing the training and testing datasets, one [train_ids, test_ids] pair per fold.

# File 'lib/svmkit/model_selection/stratified_k_fold.rb', line 50

def split(x, y) # rubocop:disable Lint/UnusedMethodArgument
  # Check the number of samples in each class.
  unless valid_n_splits?(y)
    raise ArgumentError,
          'The value of n_splits must be not less than 2 and not more than the number of samples in each class.'
  end
  # Splits dataset ids of each class to each fold.
  fold_sets_each_class = y.to_a.uniq.map { |label| fold_sets(y, label) }
  # Returns array consisting of the training and testing ids for each fold.
  Array.new(@n_splits) { |fold_id| train_test_sets(fold_sets_each_class, fold_id) }
end
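The bodies of `valid_n_splits?`, `fold_sets`, and `train_test_sets` are not shown here, so the following is only a sketch of one way the same steps could work, written in plain Ruby with arrays instead of Numo. The names `splits_valid?` and `stratified_folds`, the `rng:` keyword, and the round-robin assignment are assumptions for illustration, not the library's implementation.

```ruby
# True when n_splits is at least 2 and no class has fewer samples than n_splits.
def splits_valid?(labels, n_splits)
  smallest_class = labels.group_by { |label| label }.values.map(&:size).min || 0
  n_splits >= 2 && n_splits <= smallest_class
end

# Build one [train_ids, test_ids] pair per fold by dealing the ids of each
# class round-robin into the folds, so per-class counts differ by at most one.
def stratified_folds(labels, n_splits, rng: nil)
  unless splits_valid?(labels, n_splits)
    raise ArgumentError, 'n_splits must be between 2 and the smallest class size.'
  end

  test_sets = Array.new(n_splits) { [] }
  labels.each_index.group_by { |i| labels[i] }.each_value do |ids|
    ids = ids.shuffle(random: rng) if rng # optional shuffling within a class
    ids.each_with_index { |id, pos| test_sets[pos % n_splits] << id }
  end

  # The training ids of a fold are all ids that are not in its test set.
  all_ids = (0...labels.size).to_a
  test_sets.map { |test_ids| [all_ids - test_ids, test_ids.sort] }
end

labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
stratified_folds(labels, 3).each do |train_ids, test_ids|
  puts "train=#{train_ids.inspect} test=#{test_ids.inspect}"
end
```

Passing `rng: Random.new(1)` would shuffle the ids within each class before they are dealt out, which mirrors the role of the `shuffle` and `random_seed` options in spirit.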
