Class: Wikipedia::VandalismDetection::Features::CharacterDiversity

Inherits:
Base
  • Object
Defined in:
lib/wikipedia/vandalism_detection/features/character_diversity.rb

Overview

This feature computes the character diversity of the text inserted in the edit’s new revision, i.e. how many unique characters occur among all inserted characters.

Random typing leads to fewer unique characters relative to the full length =>

Instance Method Summary

Methods inherited from Base

#count

Instance Method Details

#calculate(edit) ⇒ Object



# File 'lib/wikipedia/vandalism_detection/features/character_diversity.rb', line 13

def calculate(edit)
  super

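  # Collect every non-whitespace character of the inserted text.
  inserted_letters = edit.inserted_text.scan(/[^\s]/)
  all_letters_count = inserted_letters.size
  unique_count = inserted_letters.uniq.size

  # Take the unique_count-th root of the total character count.
  all_letters_count ** (1.0 / unique_count)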
end
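
As a rough illustration of the formula above, the following sketch applies the same computation to plain strings. The helper name `character_diversity` and the sample strings are hypothetical and not part of the gem; the real method receives an edit object and reads `edit.inserted_text`.

```ruby
# Hypothetical standalone helper (not part of the gem) that mirrors the
# formula in #calculate but takes a plain String instead of an edit.
def character_diversity(text)
  chars = text.scan(/\S/) # every non-whitespace character
  chars.size ** (1.0 / chars.uniq.size)
end

# 16 characters, 1 unique: the 1st root of 16.
repetitive = character_diversity('aaaaaaaaaaaaaaaa')

# 16 characters, 16 unique: the 16th root of 16.
varied = character_diversity('abcdefghijklmnop')

puts repetitive # 16.0
puts varied     # roughly 1.189
```

Under these assumptions, text with few distinct characters for its length yields a larger value, while text that uses many distinct characters stays close to 1.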