Class: Wikipedia::VandalismDetection::Features::CharacterDiversity
- Defined in:
- lib/wikipedia/vandalism_detection/features/character_diversity.rb
Overview
This feature computes the character diversity of the text inserted in the edit's new revision, i.e. how many unique characters occur among all inserted characters.
Random typing leads to fewer unique characters relative to the full length =>
Instance Method Summary
Methods inherited from Base
Instance Method Details
#calculate(edit) ⇒ Object
    # File 'lib/wikipedia/vandalism_detection/features/character_diversity.rb', line 13
    def calculate(edit)
      super

      inserted_letters = edit.inserted_text.scan(/[^\s]/)
      all_letters_count = inserted_letters.size
      unique_count = inserted_letters.uniq.size

      all_letters_count ** (1.0 / unique_count)
    end
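To make the formula easier to follow, here is a minimal standalone sketch of the same computation on a plain string. The helper name `character_diversity` and the early return for empty input are assumptions made for illustration; they are not part of the library, which works on an edit object and calls `super` first.

```ruby
# Hypothetical helper (not part of the library): applies the same formula
# as #calculate, length ** (1.0 / unique_count), to a plain string.
def character_diversity(text)
  chars = text.scan(/\S/) # all non-whitespace characters
  # Assumption: return 0.0 explicitly for empty input instead of
  # relying on 0 ** Infinity.
  return 0.0 if chars.empty?

  chars.size ** (1.0 / chars.uniq.size)
end

repetitive = character_diversity('aaaaaaaa') # 8 characters, 1 unique
varied     = character_diversity('abcdefgh') # 8 characters, 8 unique

puts repetitive
puts varied
```

With equal length, the text with fewer unique characters gets the larger value: the exponent `1.0 / unique_count` shrinks as the number of unique characters grows, pulling the result toward 1.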