Class: Wikipedia::VandalismDetection::Classifier

Inherits:
Object
Defined in:
lib/wikipedia/vandalism_detection/classifier.rb

Constructor Details

#initialize(dataset = nil) ⇒ Classifier

Loads the classifier instance configured in the config file.



# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 18

def initialize(dataset = nil)
  @config = Wikipedia::VandalismDetection.configuration
  @feature_calculator = FeatureCalculator.new
  @classifier = load_classifier(dataset)
  @evaluator = Evaluator.new(self)
end

Instance Attribute Details

#dataset ⇒ Object (readonly)

Returns the value of attribute dataset.



# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 15

def dataset
  @dataset
end

#evaluator ⇒ Object (readonly)

Returns the value of attribute evaluator.



# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 15

def evaluator
  @evaluator
end

Instance Method Details

#classifier_instance ⇒ Object

Returns the concrete classifier instance configured in the config file. If you configured a Trees::RandomForest classifier, you get a Weka::Classifiers::Trees::RandomForest instance. Use this instance to call the native methods of the classifier class directly.



# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 29

def classifier_instance
  @classifier
end

#classify(edit_or_features, options = {}) ⇒ Object

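Classifies an edit or an array of feature values and returns the vandalism confidence by default. If the option 'return_all_params: true' is set, it returns a Hash of the form { confidence: …, class_index: … }. Raises an ArgumentError if the input is neither an Edit nor an Array with one value per configured feature, and returns -1.0 if no feature values are available.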

Examples:

# Suppose you have a dataset with 2 features, and 'edit' is an instance of Wikipedia::VandalismDetection::Edit
classifier = Wikipedia::VandalismDetection::Classifier.new
features = [0.45, 0.67]

confidence = classifier.classify(features)
confidence = classifier.classify(edit)


# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 44

def classify(edit_or_features, options = {})
  features = @config.features
  param_is_features = edit_or_features.is_a?(Array) && (edit_or_features.size == features.count)
  param_is_edit = edit_or_features.is_a? Edit

  unless param_is_edit || param_is_features
    raise ArgumentError, "Input has to be an Edit or an Array of feature values."
  end

  feature_values = param_is_edit ? @feature_calculator.calculate_features_for(edit_or_features) : edit_or_features
  return -1.0 if feature_values.empty?

  feature_values = feature_values.map { |i| i == Features::MISSING_VALUE ? nil : i }

  dataset = Instances.empty
  dataset.set_class_index(feature_values.count)
  dataset.add_instance([*feature_values, Instances::VANDALISM])

  instance = dataset.instance(0)
  instance.set_class_missing

  if @config.use_occ?
    if @config.classifier_options =~ /#{Instances::VANDALISM}/
      index = Instances::VANDALISM_CLASS_INDEX
    else
      index = Instances::REGULAR_CLASS_INDEX
    end
  else
    index = Instances::VANDALISM_CLASS_INDEX
  end


  confidence = (@classifier.distribution_for_instance(instance).to_a)[index]

  if options[:return_all_params]
    class_index = @classifier.classify_instance(instance)
    class_index = class_index.nan? ? Instances::NOT_KNOWN_INDEX : class_index.to_i
    results = { confidence: confidence, class_index: class_index }
  else
    results = confidence
  end

  results
end
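
The input check at the top of the source above accepts an Array only when its size equals the number of configured features. The sketch below isolates that rule so it can be tried without the gem. The Edit struct is a stand-in for Wikipedia::VandalismDetection::Edit, and validate_classify_input is a hypothetical name used for illustration only.

```ruby
# Stand-in for Wikipedia::VandalismDetection::Edit (illustration only).
Edit = Struct.new(:old_revision, :new_revision)

# Hypothetical helper mirroring the guard in #classify: returns which kind
# of input was given, or raises ArgumentError like the original method.
def validate_classify_input(input, feature_count)
  is_features = input.is_a?(Array) && input.size == feature_count
  is_edit = input.is_a?(Edit)
  return is_edit ? :edit : :features if is_edit || is_features

  raise ArgumentError, 'Input has to be an Edit or an Array of feature values.'
end

validate_classify_input([0.45, 0.67], 2)     # two values for two features
validate_classify_input(Edit.new(nil, nil), 2)
```

Note that an Array of the wrong length is rejected with the same message as a value of the wrong type, so a length mismatch against the configured features is worth checking first when this error appears.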

#cross_validate(options = {}) ⇒ Object

Cross-validates the classifier. The number of folds is taken from the configuration (default: 10).

Examples:

classifier = Wikipedia::VandalismDetection::Classifier.new
evaluation = classifier.cross_validate
evaluation = classifier.cross_validate(equally_distributed: true)


# File 'lib/wikipedia/vandalism_detection/classifier.rb', line 97

def cross_validate(options = {})
  @evaluator.cross_validate(options)
end
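
For readers unfamiliar with the term, the sketch below shows what splitting a dataset into k folds means: every sample is used for testing exactly once and for training in all other rounds. It is a generic illustration, not the gem's implementation, which delegates to Evaluator#cross_validate. The name k_fold_indices is hypothetical, and the sketch does no shuffling or stratification.

```ruby
# Illustrative only: partitions sample indices into `folds` train/test splits.
def k_fold_indices(sample_count, folds = 10)
  unless folds.between?(2, sample_count)
    raise ArgumentError, 'folds must be between 2 and sample_count'
  end

  indices = (0...sample_count).to_a
  buckets = Array.new(folds) { [] }
  # Deal indices round-robin so fold sizes differ by at most one.
  indices.each_with_index { |index, position| buckets[position % folds] << index }

  # Each bucket serves once as the test set; the rest is the training set.
  buckets.map { |test| { train: indices - test, test: test } }
end

splits = k_fold_indices(25, 10)
```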