Class: CrossValidation::Runner

Inherits:

Object

Defined in: lib/cross_validation/runner.rb

Instance Attribute Summary

#classifier ⇒ Proc

This instantiates your classifier.
#classifying ⇒ Proc

This receives a trained classifier and a test document.
#documents ⇒ Array

Array of documents to train and test on.
#errors ⇒ Array readonly

Array of the required attributes that are unset (nil).
#fetch_sample_class ⇒ Proc

When verifying the results of executing the classifying method, we need to determine what the actual class (e.g., spam) of the document was.
#fetch_sample_value ⇒ Proc

This receives a document and should return its value, i.e., whatever you’re feeding into classifying.
#folds ⇒ Fixnum

The number of folds to partition documents into.
#matrix ⇒ ConfusionMatrix

The confusion matrix that run fills in and returns.
#percentage ⇒ Float

The number of folds to partition documents into, expressed as a fraction of the number of documents.
#training ⇒ Proc

This receives an instantiated classifier and a document, and it should call your classifier’s training method.

Class Method Summary

.create ⇒ Object

Configuring a cross-validation run is complicated.

Instance Method Summary

#initialize ⇒ Runner constructor

A new instance of Runner.
#invalid? ⇒ Boolean
#k ⇒ Fixnum

Returns the number of folds to partition the documents into.
#run ⇒ ConfusionMatrix

Performs k-fold cross-validation and returns a confusion matrix.
#valid? ⇒ Boolean

Checks if all of the required run parameters are set.

Constructor Details

#initialize ⇒ `Runner`

Returns a new instance of Runner.

# File 'lib/cross_validation/runner.rb', line 51

def initialize
  @fetch_sample_value = lambda { |sample| sample.value }
  @fetch_sample_class = lambda { |sample| sample.klass }

  @critical_keys = [:documents, :classifier, :matrix, :training,
                    :classifying, :fetch_sample_value, :fetch_sample_class]
end
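The constructor's default readers assume each document responds to value and klass. A minimal sketch, using a hypothetical Sample struct that is not part of the library, of documents that work without overriding either lambda:

```ruby
# Hypothetical document type whose readers match the Runner's defaults:
# fetch_sample_value calls #value and fetch_sample_class calls #klass.
Sample = Struct.new(:value, :klass)

# The same lambdas the constructor installs by default.
fetch_sample_value = lambda { |sample| sample.value }
fetch_sample_class = lambda { |sample| sample.klass }

doc = Sample.new("cheap pills, buy now", :spam)
fetch_sample_value.call(doc) # => "cheap pills, buy now"
fetch_sample_class.call(doc) # => :spam
```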

Instance Attribute Details

#classifier ⇒ `Proc`

Returns a Proc that instantiates your classifier. run calls it once per partition to obtain a new classifier for each fold.

Returns:

(Proc) —

This instantiates your classifier.



# File 'lib/cross_validation/runner.rb', line 14

def classifier
  @classifier
end

#classifying ⇒ `Proc`

Returns a Proc that receives a trained classifier and a test document and classifies the document, returning the prediction. It takes the classifier as an argument because a new classifier is created for each partition.

Returns:

(Proc) —

This receives a trained classifier and a test document and classifies the document. It takes the classifier as an argument because a new classifier is created for each partition.



# File 'lib/cross_validation/runner.rb', line 36

def classifying
  @classifying
end

#documents ⇒ `Array`

Returns the Array of documents to train and test on. Its elements can be of any type, because the fetch_sample_value and fetch_sample_class lambdas specify what to feed into the classifying method.

Returns:

(Array) —

Array of documents to train and test on. It can be an array of anything, as the fetch_sample_value and fetch_sample_class lambdas specify what to feed into the classifying method.



# File 'lib/cross_validation/runner.rb', line 11

def documents
  @documents
end
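Because the fetch lambdas decide what is read from each document, documents need not be custom objects. A minimal sketch, assuming hypothetical Hash documents with :text and :label keys; the two lambdas would then be assigned to the runner's fetch_sample_value and fetch_sample_class attributes:

```ruby
# Hypothetical documents stored as plain Hashes instead of objects.
documents = [
  { text: "win a free prize", label: :spam },
  { text: "meeting at noon",  label: :ham }
]

# Readers for the assumed :text and :label keys.
fetch_sample_value = ->(doc) { doc[:text] }
fetch_sample_class = ->(doc) { doc[:label] }

values  = documents.map(&fetch_sample_value) # => ["win a free prize", "meeting at noon"]
classes = documents.map(&fetch_sample_class) # => [:spam, :ham]
```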

#errors ⇒ `Array` (readonly)

Returns the required attributes that were nil when valid? was last called.

Returns:

(Array) —

Array of the required attributes that are nil.



# File 'lib/cross_validation/runner.rb', line 49

def errors
  @errors
end

#fetch_sample_class ⇒ `Proc`

Returns a Proc used to verify the results of the classifying method: it receives a document and should return the document’s actual class (e.g., spam). Defaults to calling klass on the document.

Returns:

(Proc) —

When verifying the results of executing the classifying method, we need to determine what the actual class (e.g., spam) of the document was. This Proc receives a document and should return the document’s class.



# File 'lib/cross_validation/runner.rb', line 46

def fetch_sample_class
  @fetch_sample_class
end

#fetch_sample_value ⇒ `Proc`

Returns a Proc that receives a document and should return its value, i.e., whatever you’re feeding into classifying. Defaults to calling value on the document.

Returns:

(Proc) —

This receives a document and should return its value, i.e., whatever you’re feeding into classifying.



# File 'lib/cross_validation/runner.rb', line 40

def fetch_sample_value
  @fetch_sample_value
end

#folds ⇒ `Fixnum`

Returns the number of folds to partition documents into. Mutually exclusive with percentage; if both are set, k uses percentage.

Returns:

(Fixnum) —

The number of folds to partition documents into. Mutually exclusive with percentage.



# File 'lib/cross_validation/runner.rb', line 18

def folds
  @folds
end

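#matrix ⇒ `ConfusionMatrix`

Returns the confusion matrix that run fills in with (prediction, actual class) pairs and then returns.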

Returns:

(ConfusionMatrix)



# File 'lib/cross_validation/runner.rb', line 26

def matrix
  @matrix
end
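run only relies on the matrix responding to store(prediction, actual). The sketch below is a hypothetical stand-in, not the library's ConfusionMatrix: store mirrors the call made in run, while count and accuracy are illustrative additions whose names are assumptions.

```ruby
# Hypothetical stand-in for ConfusionMatrix.
class ToyConfusionMatrix
  def initialize
    # @counts[[predicted, actual]] => number of documents
    @counts = Hash.new(0)
  end

  # Same shape as the call in Runner#run: matrix.store(prediction, actual).
  def store(predicted, actual)
    @counts[[predicted, actual]] += 1
  end

  def count(predicted, actual)
    @counts[[predicted, actual]]
  end

  # Fraction of stored pairs whose prediction matched the actual class.
  def accuracy
    total = @counts.values.sum
    return 0.0 if total.zero?
    @counts.sum { |(pred, act), n| pred == act ? n : 0 }.fdiv(total)
  end
end

m = ToyConfusionMatrix.new
m.store(:spam, :spam)
m.store(:ham, :spam)
m.store(:ham, :ham)
m.store(:ham, :ham)
m.accuracy # => 0.75
```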

#percentage ⇒ `Float`

Returns the number of folds to partition documents into, expressed as a fraction of the number of documents (e.g., 0.1 with 50 documents gives 5 folds). Mutually exclusive with folds.

Returns:

(Float) —

The number of folds to partition documents into as a percentage of the documents. Mutually exclusive with folds.



# File 'lib/cross_validation/runner.rb', line 23

def percentage
  @percentage
end

#training ⇒ `Proc`

Returns a Proc that receives an instantiated classifier and a document and should call your classifier’s training method.

Returns:

(Proc) —

This receives an instantiated classifier and a document, and it should call your classifier’s training method.



# File 'lib/cross_validation/runner.rb', line 31

def training
  @training
end
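The classifier, training, and classifying Procs work together: the first builds a model, the second teaches it one document, and the third asks it for a prediction. A minimal sketch with a hypothetical KeywordClassifier (not part of the library); the argument order follows the attribute descriptions, and passing the extracted value (rather than the whole document) to classifying is an assumption based on the fetch_sample_value description.

```ruby
# Hypothetical toy classifier: remembers which words appeared in each class
# and predicts the class sharing the most words with the input.
class KeywordClassifier
  def initialize
    @words = Hash.new { |h, klass| h[klass] = Hash.new(0) }
  end

  def train(text, klass)
    text.split.each { |w| @words[klass][w] += 1 }
  end

  def classify(text)
    tokens = text.split
    @words.max_by { |_klass, counts| tokens.count { |w| counts.key?(w) } }&.first
  end
end

# The three Procs a Runner would be configured with (assumed shapes).
classifier  = -> { KeywordClassifier.new }
training    = ->(model, doc) { model.train(doc[:text], doc[:label]) }
classifying = ->(model, text) { model.classify(text) }

model = classifier.call
training.call(model, { text: "win free prize now", label: :spam })
training.call(model, { text: "lunch meeting at noon", label: :ham })
classifying.call(model, "free prize") # => :spam
```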

Class Method Details

.create ⇒ `Object`

Configuring a cross-validation run is complicated, so this factory method makes it easier: it creates a new Runner, yields it to the block for configuration, and returns it.



# File 'lib/cross_validation/runner.rb', line 121

def self.create
  new.tap { |r| yield(r) }
end
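The factory relies on Object#tap, which passes its receiver to the block and then returns the receiver. A minimal sketch of the same pattern on a hypothetical ToyRunner class with only two attributes:

```ruby
# Hypothetical class reproducing the create pattern: build an instance,
# let the caller's block configure it, then return that instance.
class ToyRunner
  attr_accessor :documents, :folds

  def self.create
    # yield inside the tap block yields to the block given to create.
    new.tap { |r| yield(r) }
  end
end

runner = ToyRunner.create do |r|
  r.documents = %w[a b c d e f]
  r.folds = 3
end
runner.folds # => 3
```

Because tap returns its receiver, the configuring block's own return value is ignored, so it does not need to end with r.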

Instance Method Details

#invalid? ⇒ `Boolean`

Returns:

(Boolean)

#k ⇒ `Fixnum`

Returns the number of folds to partition the documents into.

Returns:

(Fixnum)



# File 'lib/cross_validation/runner.rb', line 62

def k
  # Round so that a percentage still yields a whole number of folds.
  @k ||= percentage ? (documents.size * percentage).round : folds
end
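A standalone sketch of how k is derived from either folds or percentage. The helper name fold_count is hypothetical, and rounding to an Integer with a floor of one fold is an assumption of this sketch, since a fold count must be a positive whole number.

```ruby
# Hypothetical helper mirroring Runner#k: percentage, when set, wins over folds.
def fold_count(num_documents, folds: nil, percentage: nil)
  if percentage
    # e.g. 0.1 of 50 documents => 5 folds; never fewer than 1
    [(num_documents * percentage).round, 1].max
  else
    folds
  end
end

fold_count(50, percentage: 0.1) # => 5
fold_count(50, folds: 10)       # => 10
```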

#run ⇒ `ConfusionMatrix`

Performs k-fold cross-validation and returns a confusion matrix.

The algorithm is as follows (Mitchell, 1997, p. 147):

partitions = partition data into k equal-sized subsets (folds)
for i = 1 -> k:
  T = data \ partitions[i]
  train(T)
  classify(partitions[i])
output confusion matrix

Returns:

(ConfusionMatrix)

Raises:

(ArgumentError) —

if the runner is missing required attributes

# File 'lib/cross_validation/runner.rb', line 97

def run
  fail_if_invalid

  partitions = Partitioner.subset(documents, k)

  partitions.each_with_index do |part, i|
    # Train on every partition except the i-th (T = data \ partitions[i]).
    training_samples = (partitions[0...i] + partitions[(i + 1)..-1]).flatten(1)

    # A new classifier for each fold.
    classifier_instance = classifier.call

    train(classifier_instance, training_samples)

    # Record (prediction, actual class) pairs for the held-out partition.
    part.each do |x|
      prediction = classify(classifier_instance, x)
      matrix.store(prediction, fetch_sample_class.call(x))
    end
  end

  matrix
end
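The loop above can be traced end to end with plain arrays. This is a self-contained sketch, not the library's implementation: round-robin partitioning is an assumption (Partitioner.subset may split documents differently), a Hash of (predicted, actual) counts stands in for ConfusionMatrix, and the majority-class model is hypothetical.

```ruby
# Plain-Ruby sketch of k-fold cross-validation (Mitchell's algorithm above).
def k_fold_confusion(documents, k, new_classifier:, train:, classify:, actual:)
  # Assign document i to fold i % k, so fold sizes differ by at most one.
  partitions = Array.new(k) { [] }
  documents.each_with_index { |doc, i| partitions[i % k] << doc }

  # counts[[predicted, actual]] => number of held-out documents
  counts = Hash.new(0)
  partitions.each_with_index do |test_fold, i|
    # T = data \ partitions[i]: train on every fold except the i-th.
    training_set = (partitions[0...i] + partitions[(i + 1)..-1]).flatten(1)

    model = new_classifier.call # new model for each fold
    training_set.each { |doc| train.call(model, doc) }

    test_fold.each do |doc|
      counts[[classify.call(model, doc), actual.call(doc)]] += 1
    end
  end
  counts
end

# Hypothetical majority-class baseline: a model is a Hash of label counts,
# and it predicts the label it saw most often during training.
docs = [[1, :a], [2, :a], [3, :a], [4, :b]]
counts = k_fold_confusion(
  docs, 2,
  new_classifier: -> { Hash.new(0) },
  train:    ->(model, doc) { model[doc[1]] += 1 },
  classify: ->(model, _doc) { model.max_by { |_label, n| n }.first },
  actual:   ->(doc) { doc[1] }
)
counts.values.sum # => 4 (every document is tested exactly once)
```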

#valid? ⇒ `Boolean`

Checks if all of the required run parameters are set.

Returns:

(Boolean)

# File 'lib/cross_validation/runner.rb', line 69

def valid?
  # Record every required attribute that is still nil.
  @errors = @critical_keys.select { |key| public_send(key).nil? }
  @errors.empty?
end
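The validation reports every required attribute whose reader returns nil. A minimal sketch of the same pattern on a hypothetical ToyConfig class with three required keys:

```ruby
# Hypothetical configuration object illustrating the valid?/errors pattern.
class ToyConfig
  REQUIRED = %i[documents classifier matrix].freeze
  attr_accessor(*REQUIRED)
  attr_reader :errors

  def valid?
    # Only nil counts as missing; empty values pass.
    @errors = REQUIRED.select { |key| public_send(key).nil? }
    @errors.empty?
  end
end

config = ToyConfig.new
config.documents = []
config.valid? # => false
config.errors # => [:classifier, :matrix]
```

Note that only nil counts as missing: an empty documents array passes the check.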
