Class: Wikipedia::VandalismDetection::Features::Compressibility

Inherits:
Base
  • Object
Defined in:
lib/wikipedia/vandalism_detection/features/compressibility.rb

Overview

This feature describes the compressibility of the inserted text: the ratio of its uncompressed size to the sum of its compressed and uncompressed sizes.

Instance Method Summary

Methods inherited from Base

#count

Instance Method Details

#calculate(edit) ⇒ Object

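Calculates the compressibility ratio of the inserted text: the uncompressed size divided by the sum of the compressed (Zlib deflate) and uncompressed sizes. Values above 0.5 mean the text compresses well and can therefore indicate nonsense text such as ‘AAAAAAAAAAAAAAAAAAAhhhhhhhhhhhhhhhh!’. Empty inserted text yields 0.5.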



# File 'lib/wikipedia/vandalism_detection/features/compressibility.rb', line 15

def calculate(edit)
  super

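  inserted_text = edit.inserted_text
  # Empty insertions get 0.5, the value at which both sizes would be equal.
  return 0.5 if inserted_text.empty?

  uncompressed_size = inserted_text.bytesize.to_f
  compressed_size = Zlib::Deflate.deflate(inserted_text).bytesize.to_f

  # The ratio approaches 1.0 the better the text compresses.
  uncompressed_size / (compressed_size + uncompressed_size)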
end
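
To get a feel for the values this ratio produces, here is a minimal standalone sketch. It assumes a hypothetical helper named compressibility that takes a plain String rather than an edit object. It is not part of the library and only mirrors the arithmetic shown above.

```ruby
require 'zlib'

# Hypothetical helper (not part of the library): same ratio as #calculate,
# but applied directly to a String instead of an edit.
def compressibility(text)
  return 0.5 if text.empty?

  uncompressed_size = text.bytesize.to_f
  compressed_size = Zlib::Deflate.deflate(text).bytesize.to_f
  uncompressed_size / (compressed_size + uncompressed_size)
end

repetitive = 'A' * 200   # highly repetitive, compresses very well
short      = 'abc'       # too short to compress; zlib overhead dominates

puts compressibility(repetitive) # well above 0.5
puts compressibility(short)      # below 0.5
puts compressibility('')         # exactly 0.5
```

Very short insertions score below 0.5 because the zlib header and checksum make the compressed output larger than the input, so a value under 0.5 does not by itself say much about the content of the text.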