Class: Idhja22::BinaryClassifier
- Inherits:
- Object
- Object
- Idhja22::BinaryClassifier
- Defined in:
- lib/idhja22/binary_classifier.rb
Class Method Summary
- .train(dataset, opts = {}) ⇒ Object
  Trains a classifier using the provided Dataset.
- .train_and_validate(dataset, opts = {}) ⇒ Object
  Takes a dataset and splits it randomly into training and validation data.
- .train_and_validate_from_csv(filename, opts = {}) ⇒ Object
  See .train_and_validate.
- .train_from_csv(filename, opts = {}) ⇒ Object
  See .train.
Instance Method Summary
- #validate(ds) ⇒ Object
Class Method Details
.train(dataset, opts = {}) ⇒ Object
Trains a classifier using the provided Dataset.
# File 'lib/idhja22/binary_classifier.rb', line 6
def train(dataset, opts = {})
  attributes_to_use = (opts[:attributes] || dataset.attribute_labels)
  classifier = new
  classifier.train(dataset, attributes_to_use)
  return classifier
end
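To illustrate how the :attributes option interacts with the default, here is a minimal sketch that mirrors the shape of the listing above. ToyDataset and ToyClassifier are hypothetical stand-ins written for this example, not Idhja22 classes, and the instance-level train here only records the attributes rather than fitting a model.

```ruby
# Hypothetical stand-ins for illustration only; these are not Idhja22 classes.
ToyDataset = Struct.new(:attribute_labels)

class ToyClassifier
  attr_reader :attributes_used

  # Same shape as the listing above: build an instance, then train it.
  def self.train(dataset, opts = {})
    attributes_to_use = opts[:attributes] || dataset.attribute_labels
    classifier = new
    classifier.train(dataset, attributes_to_use)
    classifier
  end

  # A real classifier would fit a model here; this one only records its input.
  def train(_dataset, attributes)
    @attributes_used = attributes
  end
end

dataset = ToyDataset.new(%w[weather temperature wind])

# Without :attributes, every attribute label in the dataset is used.
all_attrs = ToyClassifier.train(dataset).attributes_used

# With :attributes, only the listed labels are passed on to training.
subset = ToyClassifier.train(dataset, attributes: %w[weather wind]).attributes_used
```

Under these assumptions, omitting :attributes falls back to dataset.attribute_labels, while passing it restricts training to the listed labels.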
.train_and_validate(dataset, opts = {}) ⇒ Object
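Takes a dataset and splits it randomly into training and validation data. Uses the training data to train a classifier whose performance is then measured using the validation data. Returns the trained classifier together with its validation value. The :"training-proportion" option sets the fraction of the data used for training and defaults to 0.5.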
# File 'lib/idhja22/binary_classifier.rb', line 16
def train_and_validate(dataset, opts = {})
  opts[:"training-proportion"] ||= 0.5
  training_set, validation_set = dataset.split(opts[:"training-proportion"])
  tree = self.train(training_set, opts)
  validation_value = tree.validate(validation_set)
  return tree, validation_value
end
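The listing delegates the random split to dataset.split, whose implementation is not shown on this page. As a rough sketch of what a proportional random split could look like, assuming a shuffle followed by a cut (split_rows and its rng parameter are hypothetical names, not part of Idhja22):

```ruby
# Hypothetical helper; Dataset#split may be implemented differently.
# Shuffles the rows, then cuts them at the requested proportion.
def split_rows(rows, training_proportion, rng: Random.new)
  shuffled = rows.shuffle(random: rng)
  cut = (shuffled.size * training_proportion).round
  [shuffled.take(cut), shuffled.drop(cut)]
end

rows = (1..10).to_a

# A seeded generator makes the split repeatable across runs.
training, validation = split_rows(rows, 0.5, rng: Random.new(42))
```

With the default proportion of 0.5, ten rows would be divided into five training rows and five validation rows, and every row lands in exactly one of the two sets.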
.train_and_validate_from_csv(filename, opts = {}) ⇒ Object
Note: Takes a CSV filename rather than a Dataset.
See .train_and_validate.
# File 'lib/idhja22/binary_classifier.rb', line 33
def train_and_validate_from_csv(filename, opts={})
  ds = Dataset.from_csv(filename)
  train_and_validate(ds, opts)
end
Instance Method Details
#validate(ds) ⇒ Object
# File 'lib/idhja22/binary_classifier.rb', line 39
def validate(ds)
  output = 0
  ds.data.each do |validation_point|
    begin
      prob = evaluate(validation_point)
      output += (validation_point.category == 'Y' ? prob : 1.0 - prob)
    rescue Idhja22::Dataset::Datum::UnknownAttributeValue
      # If an attribute value in the example is not recognised, assume the worst:
      # this point will never be classified correctly.
      # Equivalent to output += 0, so there is nothing to run here.
    end
  end
  return output.to_f/ds.size.to_f
end
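Reading the listing, the validation value is the mean probability the classifier assigns to the true category of each point, with unrecognised points contributing zero. The sketch below reproduces that arithmetic on hand-written data. Point and mean_probability_of_truth are hypothetical names, and a nil probability stands in for a raised UnknownAttributeValue.

```ruby
# Hypothetical data holder: the true category and the predicted probability of 'Y'.
Point = Struct.new(:category, :prob)

# Averages the probability assigned to each point's true category.
# A nil prob stands in for UnknownAttributeValue and scores 0.
def mean_probability_of_truth(points)
  total = points.sum do |point|
    next 0.0 if point.prob.nil?
    point.category == 'Y' ? point.prob : 1.0 - point.prob
  end
  total / points.size.to_f
end

points = [
  Point.new('Y', 0.9), # contributes 0.9
  Point.new('N', 0.2), # contributes 1.0 - 0.2 = 0.8
  Point.new('Y', nil)  # unrecognised attribute value, contributes 0
]

score = mean_probability_of_truth(points) # (0.9 + 0.8 + 0) / 3
```

A score near 1.0 would mean the classifier is confidently right on most points, while unrecognised attribute values pull the score down because they still count in the denominator.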