Class: CrossValidation::Runner
- Inherits: Object
- Defined in: lib/cross_validation/runner.rb
Instance Attribute Summary
- #classifier ⇒ Proc
  This instantiates your classifier.
- #classifying ⇒ Proc
  This receives a trained classifier and a test document.
- #documents ⇒ Array
  Array of documents to train and test on.
- #errors ⇒ Array (readonly)
  Array of the required attributes that are unset (nil).
- #fetch_sample_class ⇒ Proc
  When verifying the results of executing the classifying method, we need to determine what the actual class (e.g., spam) of the document was.
- #fetch_sample_value ⇒ Proc
  This receives a document and should return its value, i.e., whatever you’re feeding into classifying.
- #folds ⇒ Fixnum
  The number of folds to partition documents into.
- #matrix ⇒ ConfusionMatrix
- #percentage ⇒ Float
  The number of folds to partition documents into as a percentage of the documents.
- #training ⇒ Proc
  This receives an instantiated classifier and a document, and it should call your classifier’s training method.
Class Method Summary
- .create ⇒ Object
  Configuring a cross-validation run is complicated.
Instance Method Summary
- #initialize ⇒ Runner (constructor)
  A new instance of Runner.
- #invalid? ⇒ Boolean
- #k ⇒ Fixnum
  Returns the number of folds to partition the documents into.
- #run ⇒ ConfusionMatrix
  Performs k-fold cross-validation and returns a confusion matrix.
- #valid? ⇒ Boolean
  Checks if all of the required run parameters are set.
Constructor Details
#initialize ⇒ Runner
Returns a new instance of Runner.
# File 'lib/cross_validation/runner.rb', line 51
def initialize
  @fetch_sample_value = lambda { |sample| sample.value }
  @fetch_sample_class = lambda { |sample| sample.klass }
  @critical_keys = [:documents, :classifier, :matrix, :training,
                    :classifying, :fetch_sample_value,
                    :fetch_sample_class]
end
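The two default lambdas in the constructor assume that each document responds to value and klass. The sketch below illustrates that contract with a hypothetical Sample struct (not part of the library), and shows what overrides might look like when documents are plain hashes; the :text and :label keys are assumptions made for this example.

```ruby
# Hypothetical sample type exposing the two methods the default lambdas call.
Sample = Struct.new(:value, :klass)

# Same bodies as the defaults set in the constructor above.
fetch_sample_value = lambda { |sample| sample.value }
fetch_sample_class = lambda { |sample| sample.klass }

doc = Sample.new("cheap pills", :spam)
default_value = fetch_sample_value.call(doc)
default_class = fetch_sample_class.call(doc)

# For documents stored as hashes, equivalent overrides might look like this.
# The :text and :label keys are assumptions for illustration only.
hash_value = lambda { |sample| sample[:text] }
hash_class = lambda { |sample| sample[:label] }

hash_doc = { text: "lunch?", label: :ham }
override_value = hash_value.call(hash_doc)
override_class = hash_class.call(hash_doc)
```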
Instance Attribute Details
#classifier ⇒ Proc
Returns a Proc that instantiates your classifier.
# File 'lib/cross_validation/runner.rb', line 14
def classifier
  @classifier
end
#classifying ⇒ Proc
Returns a Proc that receives a trained classifier and a test document and classifies the document. It’s a Proc because we create a new one with each partition.
# File 'lib/cross_validation/runner.rb', line 36
def classifying
  @classifying
end
#documents ⇒ Array
Returns the Array of documents to train and test on. It can be an array of anything, as the fetch_sample_value and fetch_sample_class lambdas specify what to feed into the classifying method.
# File 'lib/cross_validation/runner.rb', line 11
def documents
  @documents
end
#errors ⇒ Array (readonly)
Returns an Array of the required attributes that are unset (nil).
# File 'lib/cross_validation/runner.rb', line 49
def errors
  @errors
end
#fetch_sample_class ⇒ Proc
Returns a Proc used when verifying the results of executing the classifying method, where we need to determine what the actual class (e.g., spam) of the document was. This Proc receives a document and should return the document’s class.
# File 'lib/cross_validation/runner.rb', line 46
def fetch_sample_class
  @fetch_sample_class
end
#fetch_sample_value ⇒ Proc
Returns a Proc that receives a document and should return its value, i.e., whatever you’re feeding into classifying.
# File 'lib/cross_validation/runner.rb', line 40
def fetch_sample_value
  @fetch_sample_value
end
#folds ⇒ Fixnum
Returns the number of folds to partition documents into. Mutually exclusive with percentage.
# File 'lib/cross_validation/runner.rb', line 18
def folds
  @folds
end
#matrix ⇒ ConfusionMatrix
# File 'lib/cross_validation/runner.rb', line 26
def matrix
  @matrix
end
#percentage ⇒ Float
Returns the number of folds to partition documents into, as a percentage of the documents. Mutually exclusive with folds.
# File 'lib/cross_validation/runner.rb', line 23
def percentage
  @percentage
end
#training ⇒ Proc
Returns a Proc that receives an instantiated classifier and a document, and should call your classifier’s training method.
# File 'lib/cross_validation/runner.rb', line 31
def training
  @training
end
Class Method Details
.create ⇒ Object
Configuring a cross-validation run is complicated. Let’s make it easier with a factory method.
# File 'lib/cross_validation/runner.rb', line 121
def self.create
  new.tap { |r| yield(r) }
end
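The factory pattern above can be sketched in isolation. SketchRunner below is a hypothetical stand-in, not the library class; it only shows how new.tap { |r| yield(r) } hands a fresh instance to the block for configuration and then returns that same instance.

```ruby
# Hypothetical stand-in for illustration; not CrossValidation::Runner itself.
class SketchRunner
  attr_accessor :documents, :folds

  # Same shape as the factory shown above: build, yield for setup, return.
  def self.create
    new.tap { |r| yield(r) }
  end
end

runner = SketchRunner.create do |r|
  r.documents = %w[a b c d]
  r.folds = 2
end

puts runner.folds
puts runner.documents.size
```

Because tap returns its receiver, the value of the block is ignored and the configured instance is what the caller gets back.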
Instance Method Details
#invalid? ⇒ Boolean
# File 'lib/cross_validation/runner.rb', line 80
def invalid?
  !valid?
end
#k ⇒ Fixnum
Returns the number of folds to partition the documents into.
# File 'lib/cross_validation/runner.rb', line 62
def k
  @k ||= percentage ? (documents.size * percentage) : folds
end
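As a worked example of the expression above, the sketch below reproduces the same conditional in a standalone helper. The name folds_for and its keyword arguments are hypothetical. Note that multiplying an Integer by a Float yields a Float in Ruby, so the percentage branch does not return an Integer unless the result is converted; whether the library converts it elsewhere is not shown in this listing.

```ruby
# Hypothetical helper mirroring the conditional in #k:
# percentage takes precedence when set, otherwise folds is used as-is.
def folds_for(document_count, folds: nil, percentage: nil)
  percentage ? (document_count * percentage) : folds
end

from_folds = folds_for(20, folds: 10)
from_percentage = folds_for(20, percentage: 0.5)

puts from_folds.class
puts from_percentage.class
```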
#run ⇒ ConfusionMatrix
Performs k-fold cross-validation and returns a confusion matrix.
The algorithm is as follows (Mitchell, 1997, p. 147):
partitions = partition data into k equal-sized subsets (folds)
for i = 1 -> k:
  T = data \ partitions[i]
  train(T)
  classify(partitions[i])
output confusion matrix
# File 'lib/cross_validation/runner.rb', line 97
def run
  fail_if_invalid

  partitions = Partitioner.subset(documents, k)

  results = partitions.map.with_index do |part, i|
    training_samples = Partitioner.exclude_index(documents, i)

    classifier_instance = classifier.call()

    train(classifier_instance, training_samples)

    # fetch confusion keys
    part.each do |x|
      prediction = classify(classifier_instance, x)

      matrix.store(prediction, fetch_sample_class.call(x))
    end
  end

  matrix
end
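The k-fold loop can be sketched without the library. Everything below is hypothetical and exists only for illustration: fold_slices stands in for the partitioning step, MajorityClassifier is a toy model, and a Hash of [predicted, actual] pairs stands in for the confusion matrix. It is a minimal sketch of the general technique, not the library's implementation.

```ruby
# Split items into contiguous folds. Assumes k divides items.size evenly;
# otherwise fewer than k folds may be produced.
def fold_slices(items, k)
  size = (items.size / k.to_f).ceil
  items.each_slice(size).to_a
end

# Toy classifier: always predicts the most frequent label seen in training.
class MajorityClassifier
  def initialize
    @counts = Hash.new(0)
  end

  def train(label)
    @counts[label] += 1
  end

  def classify(_value)
    @counts.max_by { |_label, count| count }.first
  end
end

# Each document is a [value, actual_class] pair.
documents = [
  ["buy now", :spam], ["hello", :ham], ["meeting", :ham],
  ["agenda", :ham], ["lunch?", :ham], ["report", :ham]
]

# Counts of [predicted, actual] pairs, standing in for a confusion matrix.
tally = Hash.new(0)
folds = fold_slices(documents, 3)

folds.each_with_index do |test_fold, i|
  # T = data \ partitions[i]: every fold except the one under test.
  training = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)

  # A fresh classifier per fold, so no test document leaks into training.
  classifier = MajorityClassifier.new
  training.each { |_value, label| classifier.train(label) }

  test_fold.each do |value, actual|
    tally[[classifier.classify(value), actual]] += 1
  end
end

puts tally.inspect
```

With this data every document is tested exactly once: the single spam document is predicted as ham, and the five ham documents are predicted correctly.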
#valid? ⇒ Boolean
Checks if all of the required run parameters are set.
# File 'lib/cross_validation/runner.rb', line 69
def valid?
  @errors = []
  @critical_keys.each do |k|
    any_error = public_send(k).nil?
    @errors << k if any_error
  end
  @errors.size == 0
end
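The check above only tests each required attribute for nil, so a value such as an empty array still passes. The sketch below shows that behavior with a hypothetical SketchConfig class that has just two required keys; it is an illustration of the same public_send and nil? pattern, not the library class.

```ruby
# Hypothetical stand-in showing the nil-check loop; not the library class.
class SketchConfig
  attr_accessor :documents, :classifier
  attr_reader :errors

  REQUIRED = [:documents, :classifier].freeze

  # Collect the names of required attributes whose current value is nil.
  def valid?
    @errors = REQUIRED.select { |key| public_send(key).nil? }
    @errors.empty?
  end
end

config = SketchConfig.new
config.documents = []            # empty, but not nil, so it is not an error

first_check = config.valid?
first_errors = config.errors

config.classifier = lambda { Object.new }
second_check = config.valid?
second_errors = config.errors
```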