Class: Bloombroom::BloomFilter

Inherits:
Object
  • Object
show all
Defined in:
lib/bloombroom/filter/bloom_filter.rb

Overview

BloomFilter false positive probability rule of thumb: see www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/ a Bloom filter with a 1% error rate and an optimal value for k only needs 9.6 bits per key, and each time we add 4.8 bits per element we decrease the error rate by ten times.

10000 elements, 1% error rate: m = 10000 * 10 bits -> 12k of memory, k = 0.7 * (10000 * 10 bits / 10000) = 7 hash functions 10000 elements, 0.1% error rate: m = 10000 * 15 bits -> 18k of memory, k = 0.7 * (10000 * 15 bits / 10000) = 11 hash functions

Bloombroom::BloomHelper.find_m_k can be used to compute optimal m & k values for a required capacity and error rate.

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(m, k) ⇒ BloomFilter

Returns a new instance of BloomFilter.

Parameters:

  • m (Fixnum)

    filter size in bits

  • k (Fixnum)

    number of hashing functions



21
22
23
24
25
26
# File 'lib/bloombroom/filter/bloom_filter.rb', line 21

def initialize(m, k)
  @bits = BitField.new(m)
  @m = m
  @k = k
  @size = 0
end

Instance Attribute Details

#bitsObject (readonly)

Returns the value of attribute bits.



17
18
19
# File 'lib/bloombroom/filter/bloom_filter.rb', line 17

def bits
  @bits
end

#kObject (readonly)

Returns the value of attribute k.



17
18
19
# File 'lib/bloombroom/filter/bloom_filter.rb', line 17

def k
  @k
end

#mObject (readonly)

Returns the value of attribute m.



17
18
19
# File 'lib/bloombroom/filter/bloom_filter.rb', line 17

def m
  @m
end

#sizeObject (readonly)

Returns the value of attribute size.



17
18
19
# File 'lib/bloombroom/filter/bloom_filter.rb', line 17

def size
  @size
end

Instance Method Details

#add(key) ⇒ Fixnum Also known as: <<

Returns the total number of keys in the filter.

Parameters:

  • key (String)

    the key to add in the filter

Returns:

  • (Fixnum)

    the total number of keys in the filter



30
31
32
33
# File 'lib/bloombroom/filter/bloom_filter.rb', line 30

def add(key)
  BloomHelper.multi_hash(key, @k).each{|position| @bits.set(position % @m)}
  @size += 1
end

#include?(key) ⇒ Boolean Also known as: []

Returns true if given key is present in the filter. false positive are possible and dependant on the m and k filter parameters.

Parameters:

  • key (String)

    test for the inclusion if key in the filter

Returns:

  • (Boolean)

    true if given key is present in the filter. false positive are possible and dependant on the m and k filter parameters.



38
39
40
41
# File 'lib/bloombroom/filter/bloom_filter.rb', line 38

def include?(key)
  BloomHelper.multi_hash(key, @k).each{|position| return false unless @bits.include?(position % @m)}
  true
end