Class: Wikipedia::VandalismDetection::Features::TermFrequency

Inherits:
FrequencyBase
Defined in:
lib/wikipedia/vandalism_detection/features/term_frequency.rb

Overview

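This feature computes the average frequency, within the text of the new revision, of the distinct words inserted by the edit. Punctuation is stripped from the inserted words before they are counted, and the feature value is 0.0 when the edit inserts no words.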
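Written as a formula, the value returned by #calculate can be stated as follows. Here T is the set of distinct inserted terms after cleaning and punctuation removal, R_new is the cleaned text of the new revision, and freq stands for the inherited #frequency method, whose exact definition is not shown on this page (one plausible reading, assumed here, is the number of occurrences of t divided by the number of words in R_new):

```latex
\mathrm{TF}(e) =
\begin{cases}
  \dfrac{1}{|T|} \sum_{t \in T} \mathrm{freq}(t, R_{\text{new}}) & \text{if } |T| > 0,\\[4pt]
  0 & \text{if } |T| = 0.
\end{cases}
```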

Instance Method Summary

Methods inherited from FrequencyBase

#frequency

Methods inherited from Base

#count

Instance Method Details

#calculate(edit) ⇒ Object

# File 'lib/wikipedia/vandalism_detection/features/term_frequency.rb', line 13

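def calculate(edit)
  super

  # Clean the new revision's text once instead of once per inserted term.
  new_text = edit.new_revision.text.clean

  # Distinct inserted terms, with punctuation removed.
  inserted_terms = Text.new(edit.inserted_words.join("\n")).clean.gsub(/[^\w\s]/, '').split.uniq
  return 0.0 if inserted_terms.empty?

  # Sum each term's frequency in the new text; a Float seed avoids integer division below.
  summed_frequencies = inserted_terms.reduce(0.0) { |sum, term| sum + frequency(new_text, term) }

  # Average over the number of distinct inserted terms.
  summed_frequencies / inserted_terms.count
end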
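To experiment with the computation outside the gem, here is a minimal standalone sketch in plain Ruby. It is not the gem's implementation: it assumes that FrequencyBase#frequency returns the share of words in the text that equal the term, and it approximates Text#clean with simple lowercasing. The method names term_share and average_inserted_term_frequency are hypothetical.

```ruby
# Hypothetical stand-in for FrequencyBase#frequency (assumption): the share of
# words in `text` that equal `term`, compared case-insensitively.
def term_share(text, term)
  words = text.downcase.scan(/\w+/)
  return 0.0 if words.empty?

  words.count(term.downcase).to_f / words.size
end

# Average share, within the new revision text, of the distinct inserted terms.
# Lowercasing approximates Text#clean here (assumption).
def average_inserted_term_frequency(new_text, inserted_words)
  terms = inserted_words.join("\n").downcase.gsub(/[^\w\s]/, '').split.uniq
  return 0.0 if terms.empty?

  terms.sum { |term| term_share(new_text, term) } / terms.size
end

new_text = 'the cat sat on the mat'
puts average_inserted_term_frequency(new_text, ['the', 'cat!'])
```

In this example "the" makes up 2 of the 6 words and "cat" 1 of 6, so the average of the two shares is 0.25. Punctuation such as the "!" in "cat!" is stripped before the lookup, mirroring the gsub in #calculate.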