Class: CrossValidation::Runner

Inherits:
Object
Defined in:
lib/cross_validation/runner.rb

Constructor Details

#initialize ⇒ Runner

Returns a new instance of Runner.



# File 'lib/cross_validation/runner.rb', line 51

def initialize
  @fetch_sample_value = lambda { |sample| sample.value }
  @fetch_sample_class = lambda { |sample| sample.klass }

  @critical_keys = [:documents, :classifier, :matrix, :training,
                    :classifying, :fetch_sample_value, :fetch_sample_class]
end
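
The constructor defaults assume that each document responds to value and klass. The following is a minimal sketch of what those two lambdas do; the Sample struct and the constant names are hypothetical and are not part of the library.

```ruby
# Hypothetical document type for illustration; the library does not define it.
Sample = Struct.new(:value, :klass)

# Same lambdas as the constructor defaults shown above.
DEFAULT_FETCH_VALUE = lambda { |sample| sample.value }
DEFAULT_FETCH_CLASS = lambda { |sample| sample.klass }

doc = Sample.new("cheap pills now", :spam)

# The value is what gets classified; the class is the expected answer.
SAMPLE_VALUE = DEFAULT_FETCH_VALUE.call(doc)
SAMPLE_CLASS = DEFAULT_FETCH_CLASS.call(doc)
```

Documents that do not respond to value and klass would need their own fetch_sample_value and fetch_sample_class lambdas.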

Instance Attribute Details

#classifier ⇒ Proc

Returns a Proc that instantiates your classifier.

Returns:

  • (Proc)

    A Proc that instantiates your classifier.



# File 'lib/cross_validation/runner.rb', line 14

def classifier
  @classifier
end

#classifying ⇒ Proc

Returns a Proc that receives a trained classifier and a test document, and classifies the document. It is a Proc because a new classifier is created for each partition.

Returns:

  • (Proc)

    A Proc that receives a trained classifier and a test document, and classifies the document. It is a Proc because a new classifier is created for each partition.



# File 'lib/cross_validation/runner.rb', line 36

def classifying
  @classifying
end

#documents ⇒ Array

Returns the array of documents to train and test on. The elements can be of any type, because the fetch_sample_value and fetch_sample_class lambdas specify how to extract the value that is fed into the classifying Proc and the actual class of each document.

Returns:

  • (Array)

    The array of documents to train and test on. The elements can be of any type, because the fetch_sample_value and fetch_sample_class lambdas specify how to extract the value that is fed into the classifying Proc and the actual class of each document.



# File 'lib/cross_validation/runner.rb', line 11

def documents
  @documents
end

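#errors ⇒ Array (readonly)

Returns the names of the required attributes that are unset (nil). The array is populated by valid?.

Returns:

  • (Array)

    The names of the required attributes that are unset (nil). The array is populated by valid?.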



# File 'lib/cross_validation/runner.rb', line 49

def errors
  @errors
end

#fetch_sample_class ⇒ Proc

Returns a Proc that receives a document and should return the document’s actual class (e.g., spam). It is used to verify the result of the classifying Proc against the true class.

Returns:

  • (Proc)

    A Proc that receives a document and should return the document’s actual class (e.g., spam). It is used to verify the result of the classifying Proc against the true class.



# File 'lib/cross_validation/runner.rb', line 46

def fetch_sample_class
  @fetch_sample_class
end

#fetch_sample_value ⇒ Proc

Returns a Proc that receives a document and should return its value, i.e., whatever you are feeding into classifying.

Returns:

  • (Proc)

    A Proc that receives a document and should return its value, i.e., whatever you are feeding into classifying.



# File 'lib/cross_validation/runner.rb', line 40

def fetch_sample_value
  @fetch_sample_value
end
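
Because documents can be an array of anything, the two fetch lambdas can be replaced to match the shape of the data. The following sketch assumes documents stored as plain hashes; the hash keys and constant names are hypothetical and chosen only for illustration.

```ruby
# Hypothetical documents stored as plain hashes instead of objects.
HASH_DOCUMENTS = [
  { text: "win money now", label: :spam },
  { text: "lunch at noon?", label: :ham }
]

# Custom fetchers: where the value and the true class live in each document.
FETCH_TEXT  = lambda { |doc| doc[:text] }
FETCH_LABEL = lambda { |doc| doc[:label] }

# Apply each fetcher to every document.
TEXTS  = HASH_DOCUMENTS.map(&FETCH_TEXT)
LABELS = HASH_DOCUMENTS.map(&FETCH_LABEL)
```

On a runner, these lambdas would be assigned to fetch_sample_value and fetch_sample_class in place of the defaults.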

#folds ⇒ Fixnum

Returns the number of folds to partition the documents into. Mutually exclusive with percentage.

Returns:

  • (Fixnum)

    The number of folds to partition the documents into. Mutually exclusive with percentage.



# File 'lib/cross_validation/runner.rb', line 18

def folds
  @folds
end

#matrix ⇒ ConfusionMatrix

Returns:

  • (ConfusionMatrix)



# File 'lib/cross_validation/runner.rb', line 26

def matrix
  @matrix
end

#percentage ⇒ Float

Returns the number of folds expressed as a fraction of the number of documents; k is computed as documents.size * percentage. Mutually exclusive with folds.

Returns:

  • (Float)

    The number of folds expressed as a fraction of the number of documents; k is computed as documents.size * percentage. Mutually exclusive with folds.



# File 'lib/cross_validation/runner.rb', line 23

def percentage
  @percentage
end

#training ⇒ Proc

Returns a Proc that receives an instantiated classifier and a document, and should call your classifier’s training method.

Returns:

  • (Proc)

    A Proc that receives an instantiated classifier and a document, and should call your classifier’s training method.



# File 'lib/cross_validation/runner.rb', line 31

def training
  @training
end
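
The classifier, training and classifying Procs described above can be sketched for a toy classifier as follows. WordCountClassifier and the constant names are hypothetical and not part of the library. The sketch also assumes that training receives the whole document while classifying receives the value returned by fetch_sample_value (here, the text), as the fetch_sample_value description suggests.

```ruby
# Hypothetical toy classifier: predicts the label whose training documents
# share the most words with the input. Not part of the library.
class WordCountClassifier
  def initialize
    @counts = Hash.new { |hash, label| hash[label] = Hash.new(0) }
  end

  def train(label, text)
    text.split.each { |word| @counts[label][word] += 1 }
  end

  def classify(text)
    scores = @counts.map do |label, words|
      [label, text.split.sum { |word| words[word] }]
    end
    scores.max_by { |_, score| score }.first
  end
end

# The three Procs, written for this toy classifier.
TOY_CLASSIFIER  = lambda { WordCountClassifier.new }
TOY_TRAINING    = lambda { |classifier, doc| classifier.train(doc[:label], doc[:text]) }
TOY_CLASSIFYING = lambda { |classifier, text| classifier.classify(text) } # assumption: receives the value

instance = TOY_CLASSIFIER.call
TOY_TRAINING.call(instance, { text: "win money now", label: :spam })
TOY_TRAINING.call(instance, { text: "lunch at noon", label: :ham })
TOY_PREDICTION = TOY_CLASSIFYING.call(instance, "free money")
```

Keeping instantiation in its own Proc lets each partition start from an untrained classifier.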

Class Method Details

.create ⇒ Object

Factory method that simplifies configuring a cross-validation run: it instantiates a Runner, yields it to the given block for configuration, and returns it.



# File 'lib/cross_validation/runner.rb', line 121

def self.create
  new.tap { |r| yield(r) }
end
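
The factory pattern used by create can be tried without the gem. In the sketch below, SketchRunner is a hypothetical stand-in with only two attributes; its create method has the same body as the one shown above.

```ruby
# Hypothetical stand-in for CrossValidation::Runner with only two attributes,
# so the example runs without the gem.
class SketchRunner
  attr_accessor :documents, :folds

  # Same body as Runner.create: build an instance, yield it, return it.
  def self.create
    new.tap { |r| yield(r) }
  end
end

SKETCH = SketchRunner.create do |r|
  r.documents = %w[a b c d]
  r.folds = 2
end
```

With the real class, the block would also assign classifier, training, classifying and matrix before run is called.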

Instance Method Details

#invalid? ⇒ Boolean

Returns true if the runner is not valid, i.e., the negation of valid?.

Returns:

  • (Boolean)

See Also:

  • #valid?



# File 'lib/cross_validation/runner.rb', line 80

def invalid?
  !valid?
end

#k ⇒ Fixnum

Returns the number of folds to partition the documents into: documents.size * percentage if percentage is set, otherwise folds. The result is memoized.

Returns:

  • (Fixnum)


# File 'lib/cross_validation/runner.rb', line 62

def k
  @k ||= percentage ? (documents.size * percentage) : folds
end
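
A short worked example of the percentage branch of k, using hypothetical inputs (40 documents, percentage of 0.25). In Ruby, multiplying an Integer by a Float yields a Float, so this branch produces a Float rather than an Integer.

```ruby
# Hypothetical inputs: 40 documents and percentage = 0.25.
document_count = 40
percentage = 0.25

# Same expression as in k: Integer * Float yields a Float in Ruby.
K_FROM_PERCENTAGE = document_count * percentage
```

Here the result is 10.0, i.e., ten folds of four documents each.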

#run ⇒ ConfusionMatrix

Performs k-fold cross-validation and returns a confusion matrix.

The algorithm is as follows (Mitchell, 1997, p. 147):

partitions = partition data into k equal-sized subsets (folds)
for i = 1 -> k:
  T = data \ partitions[i]
  train(T)
  classify(partitions[i])
output confusion matrix

Returns:

  • (ConfusionMatrix)

Raises:

  • (ArgumentError)

    if the runner is missing required attributes



# File 'lib/cross_validation/runner.rb', line 97

def run
  fail_if_invalid

  partitions = Partitioner.subset(documents, k)

  results = partitions.map.with_index do |part, i|
    training_samples = Partitioner.exclude_index(documents, i)

    classifier_instance = classifier.call()

    train(classifier_instance, training_samples)

    # fetch confusion keys
    part.each do |x|
      prediction = classify(classifier_instance, x)
      matrix.store(prediction, fetch_sample_class.call(x))
    end
  end

  matrix
end
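
The pseudocode above can also be sketched in plain Ruby without the library. This is an illustration under stated assumptions, not the library's implementation: it does not use Partitioner or ConfusionMatrix, the helper names (sketch_folds, sketch_cross_validate) are hypothetical, folds are assigned round-robin, and the "model" is simply the majority label of the training data.

```ruby
# Split items into k folds of near-equal size (round-robin assignment).
def sketch_folds(items, k)
  folds = Array.new(k) { [] }
  items.each_with_index { |item, i| folds[i % k] << item }
  folds
end

# Run k-fold cross-validation and count [predicted, actual] pairs.
# train_fn:    receives the training documents and returns a model.
# classify_fn: receives a model and a document and returns a predicted label.
# class_fn:    receives a document and returns its actual label.
def sketch_cross_validate(documents, k, train_fn, classify_fn, class_fn)
  counts = Hash.new(0)
  folds = sketch_folds(documents, k)
  folds.each_with_index do |test_fold, i|
    # T = data \ partitions[i]
    training = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    model = train_fn.call(training)
    test_fold.each do |doc|
      counts[[classify_fn.call(model, doc), class_fn.call(doc)]] += 1
    end
  end
  counts
end

# Toy setup: the model is the most frequent label in the training data.
CV_DOCS = [:spam, :spam, :spam, :spam, :ham, :spam].map { |label| { label: label } }
majority = lambda do |docs|
  docs.map { |d| d[:label] }.group_by(&:itself).max_by { |_, v| v.size }.first
end
predict = lambda { |model, _doc| model }
actual  = lambda { |doc| doc[:label] }

CV_COUNTS = sketch_cross_validate(CV_DOCS, 3, majority, predict, actual)
```

Each key of the resulting hash is a [predicted, actual] pair, which is the information a confusion matrix records.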

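#valid? ⇒ Boolean

Checks if all of the required run parameters are set, and populates errors with the names of those that are nil. The required parameters are documents, classifier, matrix, training, classifying, fetch_sample_value and fetch_sample_class; folds and percentage are not checked.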

Returns:

  • (Boolean)


# File 'lib/cross_validation/runner.rb', line 69

def valid?
  @errors = []
  @critical_keys.each do |k|
    any_error = public_send(k).nil?
    @errors << k if any_error
  end

  @errors.size == 0
end
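
The interaction between valid? and errors can be tried in isolation. SketchValidator below is a hypothetical stand-in with only two critical keys; its valid? follows the same logic as the method shown above.

```ruby
# Hypothetical stand-in with two required attributes, so the example runs
# without the gem.
class SketchValidator
  attr_accessor :documents, :classifier
  attr_reader :errors

  def initialize
    @critical_keys = [:documents, :classifier]
  end

  # Same logic as valid? above: record every critical key that is nil.
  def valid?
    @errors = []
    @critical_keys.each do |k|
      @errors << k if public_send(k).nil?
    end
    @errors.size == 0
  end
end

CHECK = SketchValidator.new
CHECK.documents = [1, 2, 3]
FIRST_RESULT = CHECK.valid?   # classifier is still nil
FIRST_ERRORS = CHECK.errors
CHECK.classifier = lambda { Object.new }
SECOND_RESULT = CHECK.valid?  # all critical keys are now set
```

Because errors is rebuilt on every call, it always reflects the most recent valid? check.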