Class: Wikipedia::VandalismDetection::Classifier
Inherits: Object
Inheritance: Object → Wikipedia::VandalismDetection::Classifier
Defined in: lib/wikipedia/vandalism_detection/classifier.rb
Instance Attribute Summary
- #dataset ⇒ Object (readonly)
  Returns the value of attribute dataset.
- #evaluator ⇒ Object (readonly)
  Returns the value of attribute evaluator.
Instance Method Summary
- #classifier_instance ⇒ Object
  Returns the concrete classifier instance configured in the config file. For example, if you configured a Trees::RandomForest classifier, you get a Weka::Classifiers::Trees::RandomForest instance.
- #classify(edit_or_features, options = {}) ⇒ Object
  Classifies an edit or a set of features and, by default, returns the vandalism confidence. If the option return_all_params: true is set, it returns a Hash of the form { confidence: …, class_index: … }.
- #cross_validate(options = {}) ⇒ Object
  Cross-validates the classifier.
- #initialize(dataset = nil) ⇒ Classifier (constructor)
  Loads the classifier instance configured in the config file.
Constructor Details
#initialize(dataset = nil) ⇒ Classifier
Loads the classifier instance configured in the config file.
# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 18

def initialize(dataset = nil)
  @config = Wikipedia::VandalismDetection.configuration
  @feature_calculator = FeatureCalculator.new
  @classifier = load_classifier(dataset)
  @evaluator = Evaluator.new(self)
end
Instance Attribute Details
#dataset ⇒ Object (readonly)
Returns the value of attribute dataset.
# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 15

def dataset
  @dataset
end
#evaluator ⇒ Object (readonly)
Returns the value of attribute evaluator.
# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 15

def evaluator
  @evaluator
end
Instance Method Details
#classifier_instance ⇒ Object
Returns the concrete classifier instance configured in the config file. For example, if you configured a Trees::RandomForest classifier, you get a Weka::Classifiers::Trees::RandomForest instance. This instance can be used to call the native methods of the classifier class directly.
# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 29

def classifier_instance
  @classifier
end
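The mapping from a configured type such as Trees::RandomForest to a class under Weka::Classifiers can be pictured as a constant lookup. The following is only a sketch under stated assumptions: the helper resolve_classifier_class is hypothetical, the Weka module below is an empty stand-in so the example runs without Weka, and the gem's own load_classifier may work differently.

```ruby
# Empty stand-in namespace so the sketch runs without Weka or JRuby.
module Weka
  module Classifiers
    module Trees
      class RandomForest; end
    end
  end
end

# Hypothetical helper (not part of the gem): prefixes the configured type
# with a namespace and resolves the resulting constant name to a class.
def resolve_classifier_class(type, namespace = 'Weka::Classifiers')
  Object.const_get("#{namespace}::#{type}")
end

klass = resolve_classifier_class('Trees::RandomForest')
instance = klass.new
puts instance.class.name
```

An unknown type raises NameError in this sketch, which surfaces a misspelled configuration value early.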
#classify(edit_or_features, options = {}) ⇒ Object
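Classifies an edit or a set of features and, by default, returns the vandalism confidence. If the option return_all_params: true is set, it returns a Hash of the form { confidence: …, class_index: … }. The input must be an Edit or an Array with one value per configured feature; anything else raises an ArgumentError. If no feature values are available, the method returns -1.0.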
# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 44

def classify(edit_or_features, options = {})
  features = @config.features
  param_is_features = edit_or_features.is_a?(Array) && (edit_or_features.size == features.count)
  param_is_edit = edit_or_features.is_a? Edit

  unless param_is_edit || param_is_features
    raise ArgumentError, "Input has to be an Edit or an Array of feature values."
  end

  # Compute features for an Edit; use a feature Array as given.
  feature_values = param_is_edit ? @feature_calculator.calculate_features_for(edit_or_features) : edit_or_features
  return -1.0 if feature_values.empty?

  # Missing feature values are passed on as nil.
  feature_values = feature_values.map { |i| i == Features::MISSING_VALUE ? nil : i }

  # Build a one-instance dataset whose class value is left missing.
  dataset = Instances.empty
  dataset.set_class_index(feature_values.count)
  dataset.add_instance([*feature_values, Instances::VANDALISM])

  instance = dataset.instance(0)
  instance.set_class_missing

  if @config.use_occ?
    # The attribute name after "@config." is missing in this listing.
    if @config. =~ /#{Instances::VANDALISM}/
      index = Instances::VANDALISM_CLASS_INDEX
    else
      index = Instances::REGULAR_CLASS_INDEX
    end
  else
    index = Instances::VANDALISM_CLASS_INDEX
  end

  confidence = (@classifier.distribution_for_instance(instance).to_a)[index]

  if options[:return_all_params]
    class_index = @classifier.classify_instance(instance)
    class_index = class_index.nan? ? Instances::NOT_KNOWN_INDEX : class_index.to_i

    results = { confidence: confidence, class_index: class_index }
  else
    results = confidence
  end

  results
end
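The input handling at the top of #classify can be tried in isolation. The sketch below mirrors only that part (type check, feature calculation, replacing missing values with nil) and assumes stand-ins throughout: Edit is a plain Struct, MISSING_VALUE is a placeholder and not necessarily the value of Features::MISSING_VALUE, FEATURE_COUNT replaces @config.features.count, and the calculator lambda replaces FeatureCalculator.

```ruby
# Stand-ins for illustration only; none of these are the gem's own objects.
Edit = Struct.new(:old_text, :new_text)
MISSING_VALUE = :missing  # placeholder for Features::MISSING_VALUE
FEATURE_COUNT = 3         # placeholder for @config.features.count

# Hypothetical helper mirroring the first steps of #classify:
# validate the argument, obtain feature values, map missing values to nil.
def prepare_feature_values(edit_or_features, calculator)
  is_features = edit_or_features.is_a?(Array) && edit_or_features.size == FEATURE_COUNT
  is_edit = edit_or_features.is_a?(Edit)

  unless is_edit || is_features
    raise ArgumentError, 'Input has to be an Edit or an Array of feature values.'
  end

  values = is_edit ? calculator.call(edit_or_features) : edit_or_features
  values.map { |v| v == MISSING_VALUE ? nil : v }
end

# Toy feature calculator: length of the new text, one missing value, a constant.
calculator = ->(edit) { [edit.new_text.length, MISSING_VALUE, 0.5] }

p prepare_feature_values([1, MISSING_VALUE, 0.2], calculator)
p prepare_feature_values(Edit.new('a', 'abc'), calculator)
```

An Array of the wrong length fails the same check as an unsupported type, so it raises ArgumentError.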
#cross_validate(options = {}) ⇒ Object
Cross-validates the classifier. The number of folds is taken from the configuration (default is 10).
# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 97

def cross_validate(options = {})
  @evaluator.cross_validate(options)
end
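For readers unfamiliar with k-fold cross-validation, the sketch below shows how a dataset of a given size can be split into 10 folds, with each fold serving once as the test set. It is an illustration of the general technique only: fold_indices is a hypothetical helper, the round-robin assignment is a simplification, and it does not attempt to reproduce how the gem's Evaluator or Weka partitions the data.

```ruby
# Hypothetical helper: assigns instance indices to folds round-robin and
# returns one { train:, test: } split per fold.
def fold_indices(size, folds = 10)
  indices = (0...size).to_a
  (0...folds).map do |k|
    test = indices.select { |i| i % folds == k }
    { train: indices - test, test: test }
  end
end

splits = fold_indices(25)
puts splits.size                   # number of folds
puts splits.first[:test].inspect   # indices held out in the first fold
puts splits.first[:train].size     # indices used for training in the first fold
```

Every index appears in exactly one test set, so each instance is evaluated once by a model that was not trained on it.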