Class: Wikipedia::VandalismDetection::Features::Base

Inherits:
Object
Defined in:
lib/wikipedia/vandalism_detection/features/base.rb

Overview

This class should be the base class for all Wikipedia::Feature classes.

Instance Method Details

#calculate(edit) ⇒ Object

Base method for feature calculation. This method should be overridden in the concrete Wikipedia::Feature classes.

Example:

  def calculate(edit)
    super # raises ArgumentError unless edit is an Edit

    # ... concrete calculation of the feature from edit ...
  end



# File 'lib/wikipedia/vandalism_detection/features/base.rb', line 21

def calculate(edit)
  raise ArgumentError.new "parameter should be an Edit" unless edit.kind_of? Edit
end
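
The override pattern described above can be sketched as follows. This is a minimal illustration, not code from the gem: the top-level `Edit` and `Base` classes are simplified stand-ins, and the `text` attribute and the `TextLength` feature are hypothetical names introduced only for this example.

```ruby
# Simplified stand-in for the gem's Edit class (assumption: the real class
# has a different constructor and more attributes).
class Edit
  attr_reader :text # hypothetical attribute

  def initialize(text)
    @text = text
  end
end

# Stand-in for Features::Base with the same argument check as the listing above.
class Base
  def calculate(edit)
    raise ArgumentError, 'parameter should be an Edit' unless edit.is_a?(Edit)
  end
end

# Hypothetical concrete feature: the number of characters in the edit text.
class TextLength < Base
  def calculate(edit)
    super # raises ArgumentError unless edit is an Edit
    edit.text.length
  end
end

feature = TextLength.new
length = feature.calculate(Edit.new('Hello world'))

# Passing anything other than an Edit is rejected by the base class check.
begin
  feature.calculate('not an edit')
rescue ArgumentError => e
  message = e.message
end
```

Calling `super` first keeps the type check in one place, so each concrete feature only contains its own calculation.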

#count(terms, options = {}) ⇒ Object

Counts the occurrences of a single term or of multiple terms in the given text.

Example of usage:

  feature.count "and", in: text
  feature.count ["and", "or"], in: text

Raises:

  • (ArgumentError) if options[:in] is missing, or if terms is neither a String nor an Array


# File 'lib/wikipedia/vandalism_detection/features/base.rb', line 33

def count(terms, options = {})
  terms_is_string = terms.is_a?(String)
  terms_is_array = terms.is_a?(Array)

  raise ArgumentError, "The second parameter should be a Hash of form {in: text}" unless options[:in]
  raise ArgumentError, "The first parameter should be an Array or String" unless
      (terms_is_array || terms_is_string)

  words = options[:in].downcase
  freq = Hash.new(0)
  words.gsub(/[\.,'{2,}:\!\?\(\)]/, '').split.each{ |v| freq[v.to_sym] += 1 }

  if terms_is_string
    freq[terms.downcase.to_sym]
  else
    # freq is keyed by downcased symbols, so normalize each term before the lookup
    terms.reduce(0) { |r, term| r + freq[term.downcase.to_sym] }
  end
end
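
To make the tokenization step in the listing above easier to follow, here is a small standalone sketch that builds the same frequency table outside the class. The sample text and the variable names are assumptions chosen for illustration; only the regular expression and the `Hash.new(0)` counting idiom are taken from the listing.

```ruby
# Sample input, invented for this example.
text = 'Cats and dogs, and (maybe) birds or fish in 2012!'

# Same steps as in #count: downcase, strip the listed characters,
# split on whitespace, and count each token under a symbol key.
freq = Hash.new(0)
cleaned = text.downcase.gsub(/[\.,'{2,}:\!\?\(\)]/, '')
cleaned.split.each { |v| freq[v.to_sym] += 1 }

# Single-term lookup.
and_count = freq[:and]

# Multi-term lookup: add up the counts of each term.
and_or_count = [:and, :or].sum { |term| freq[term] }

# Inside a character class, {2,} is not a quantifier: the characters
# '{', '2', ',' and '}' are matched literally, so the digit 2 is removed too.
year_token = cleaned.split.last
```

Because the table is keyed by downcased symbols, any lookup has to use the same form (`:and`, not `"And"`) to find a match. Unknown terms return 0 thanks to the `Hash.new(0)` default.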