Class: Wikipedia::VandalismDetection::Features::RemovedEmoticonsFrequency

Inherits:
Base
  • Object
Defined in:
lib/wikipedia/vandalism_detection/features/removed_emoticons_frequency.rb

Overview

This feature computes the frequency of emoticon words in the removed text.

Instance Method Summary

Methods inherited from Base

#count

Instance Method Details

#calculate(edit) ⇒ Object

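Returns the frequency of emoticon words in the removed text: the number of emoticons divided by the number of whitespace-separated words. The result is a ratio, not a percentage. Returns 0.0 if the removed text contains no words.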
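The final step of `#calculate` is plain arithmetic. As a minimal sketch, the hypothetical helper `frequency` below (not part of the gem) takes an already computed emoticon count and the removed text. It shows that the denominator comes from `String#split`, which splits on runs of whitespace, so text containing only whitespace yields 0.0 just like empty text:

```ruby
# Hypothetical helper (not part of the gem) isolating the final step of #calculate.
def frequency(emoticons_count, removed_text)
  # String#split with no argument splits on runs of whitespace and ignores
  # leading/trailing whitespace, so " \n\t " yields no words at all.
  total_count = removed_text.split.count
  total_count > 0 ? emoticons_count.to_f / total_count : 0.0
end

puts frequency(1, 'bye :) my friend') # 1 emoticon among 4 words, prints 0.25
puts frequency(0, " \n\t ")           # no words, prints 0.0
```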



# File 'lib/wikipedia/vandalism_detection/features/removed_emoticons_frequency.rb', line 13

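def calculate(edit)
  super

  removed_text = edit.removed_text
  # Match an emoticon preceded by start of text or whitespace and followed by
  # whitespace, end of text, or punctuation. The list entries are interpolated
  # verbatim, so they must already be valid (escaped) regex fragments.
  regex = /(^|\s)(#{WordLists::EMOTICONS.join('|')})(?=\s|$|\Z|[\.,!?]\s|[\.!?]\Z)/

  # scan yields one [prefix, emoticon] pair per match; rejecting strings shorter
  # than two characters drops the "" or " " prefixes and keeps the emoticons.
  emoticons_count = removed_text.scan(regex).flatten.reject { |c| c.size < 2 }.count
  total_count = removed_text.split.count

  # Ratio of emoticons to whitespace-separated words; 0.0 when there are no words.
  total_count > 0 ? emoticons_count.to_f / total_count : 0.0
end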
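For experimenting with the matching rules outside the gem, the following sketch reproduces the body of `#calculate` as a standalone function. It assumes a hypothetical five-entry `EMOTICONS` list written as pre-escaped regex fragments (the contents of the real `WordLists::EMOTICONS` are not shown on this page), and it takes the removed text as a string instead of an edit object:

```ruby
# Hypothetical stand-in for WordLists::EMOTICONS. Entries are already-escaped
# regex fragments because they are interpolated into the pattern verbatim.
EMOTICONS = [':\)', ':\(', ':D', ';\)', ':-\)'].freeze

# Standalone sketch of RemovedEmoticonsFrequency#calculate that takes the
# removed text directly instead of an edit object.
def removed_emoticons_frequency(removed_text)
  regex = /(^|\s)(#{EMOTICONS.join('|')})(?=\s|$|\Z|[\.,!?]\s|[\.!?]\Z)/

  # Each match yields a [prefix, emoticon] pair; rejecting strings shorter than
  # two characters drops the "" / " " prefixes and keeps the emoticons.
  emoticons_count = removed_text.scan(regex).flatten.reject { |c| c.size < 2 }.count
  total_count = removed_text.split.count

  total_count > 0 ? emoticons_count.to_f / total_count : 0.0
end

puts removed_emoticons_frequency('haha :) so funny :D') # prints 0.4
puts removed_emoticons_frequency('great :).')           # prints 0.5
puts removed_emoticons_frequency('hi:)')                # prints 0.0
```

The last call illustrates the boundary rule: an emoticon glued to a preceding word, as in `hi:)`, is not counted, because the pattern requires start of text or whitespace immediately before it.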