Class: Wikipedia::VandalismDetection::Features::TermFrequency
- Inherits: FrequencyBase
  - Object
  - Base
  - FrequencyBase
  - Wikipedia::VandalismDetection::Features::TermFrequency
- Defined in: lib/wikipedia/vandalism_detection/features/term_frequency.rb
Overview
This feature computes the average frequency, within the new revision's text, of the unique words that the edit inserts relative to the old revision. It returns 0.0 when the edit inserts no words.
Instance Method Summary
Methods inherited from FrequencyBase
Methods inherited from Base
Instance Method Details
#calculate(edit) ⇒ Object
# File 'lib/wikipedia/vandalism_detection/features/term_frequency.rb', line 13

def calculate(edit)
  super

  new_text = edit.new_revision.text
  inserted_terms = Text.new(edit.inserted_words.join("\n")).clean.gsub(/[^\w\s]/, '').split.uniq

  summed_frequencies = inserted_terms.reduce(0) { |count, term| count + frequency(new_text.clean, term) }

  (inserted_terms.count > 0) ? (summed_frequencies / inserted_terms.count) : 0.0
end
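The averaging step in #calculate can be illustrated with a standalone sketch. It makes several assumptions: `relative_frequency` and `average_term_frequency` are hypothetical names that are not part of the gem, the stand-in for FrequencyBase#frequency is assumed to return a term's share of all tokens in the text (the inherited method may be defined differently), and the Text#clean step is omitted.

```ruby
# Hypothetical stand-in for FrequencyBase#frequency (an assumption, not the
# gem's code): the share of whitespace-separated tokens in `text` equal to `term`.
def relative_frequency(text, term)
  tokens = text.downcase.split
  return 0.0 if tokens.empty?

  tokens.count(term.downcase).to_f / tokens.size
end

# Mirrors the shape of #calculate: strip punctuation from the inserted words,
# de-duplicate them, then average their frequencies in the new revision's text.
def average_term_frequency(new_text, inserted_words)
  terms = inserted_words.join("\n").gsub(/[^\w\s]/, '').split.uniq
  return 0.0 if terms.empty?

  terms.sum { |term| relative_frequency(new_text, term) } / terms.size
end

new_text = 'spam spam eggs ham'
puts average_term_frequency(new_text, %w[spam ham])
# "spam" is 2 of 4 tokens (0.5), "ham" is 1 of 4 (0.25); the mean is 0.375
```

Under these assumptions, a high value means the inserted words dominate the new revision, as when an edit floods a page with one repeated word.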