Class: Bloombroom::ContinuousBloomFilter

Inherits:
Object
  • Object
show all
Defined in:
lib/bloombroom/filter/continuous_bloom_filter.rb

Overview

ContinuousBloomFilter is a bloom filter for unbounded stream of keys where keys are expired over a given period of time. The expected capacity of the bloom filter for the desired validity period must be known or estimated. For a given capacity and error rate, BloomHelper.find_m_k can be used to compute optimal m & k values.

4 bits per key (instead of 1 bit in a normal bloom filter) are used for keeping track of the keys ttl. the internal timer resolution is set to half of the ttl (resolution divisor of 2). using 4 bits gives us 15 usable time slots (slot 0 is for the unset state). basically the internal time bookeeping is similar to a ring buffer where the first timer tick will be time slot=1, slot=2, .. slot=15, slot=1 and so on. The total time of our internal clock will thus be 15 * (ttl / 2). We keep track of ttl by writing the current time slot in the key k buckets when first inserted in the filter. when doing a key lookup if any of the bucket contain the 0 value the key is not found. if the interval betweem the current time slot and any of the k buckets value is greater than 2 (resolution divisor) we know this key is expired and we reset the expired buckets to 0.

Constant Summary collapse

RESOLUTION_DIVISOR =
2
BITS_PER_BUCKET =
4

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(m, k, ttl) ⇒ ContinuousBloomFilter

Returns a new instance of ContinuousBloomFilter.

Parameters:

  • m (Fixnum)

    total filter size in number of buckets. optimal m can be computed using BloomHelper.find_m_k

  • k (Fixnum)

    number of hashing functions. optimal k can be computed using BloomHelper.find_m_k

  • ttl (Fixnum)

    key time to live in seconds (validity period)



30
31
32
33
34
35
36
37
38
39
40
41
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 30

def initialize(m, k, ttl)
  @m = m
  @k = k
  @ttl = ttl
  @buckets = BitBucketField.new(BITS_PER_BUCKET, m)

  # time management
  @increment_period = @ttl / RESOLUTION_DIVISOR
  @current_slot = 1
  @max_slot = (2 ** BITS_PER_BUCKET) - 1 # ex. with 4 bits -> we want range 1..15
  @lock = Mutex.new
end

Instance Attribute Details

#bucketsObject (readonly)

Returns the value of attribute buckets.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def buckets
  @buckets
end

#kObject (readonly)

Returns the value of attribute k.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def k
  @k
end

#mObject (readonly)

Returns the value of attribute m.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def m
  @m
end

#ttlObject (readonly)

Returns the value of attribute ttl.



22
23
24
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 22

def ttl
  @ttl
end

Instance Method Details

#add(key) ⇒ ContinuousBloomFilter Also known as: <<

Returns self.

Parameters:

  • key (String)

    the key to add in the filter

Returns:



45
46
47
48
49
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 45

def add(key)
  current_slot = @lock.synchronize{@current_slot}
  BloomHelper.multi_hash(key, @k).each{|position| @buckets[position % @m] = current_slot}
  self
end

#inc_time_slotObject

advance internal time slot. this is exposed primarily for spec’ing purposes. normally this is automatically called by the internal timer thread but if not using the internal timer thread it can be called explicitly when doing your own time management.



80
81
82
83
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 80

def inc_time_slot
  # ex. with 4 bits -> we want range 1..15, 
  @lock.synchronize{@current_slot = (@current_slot % @max_slot) + 1}
end

#include?(key) ⇒ Boolean Also known as: []

Returns true if given key is present in the filter. false positive are possible and dependant on the m and k filter parameters.

Parameters:

  • key (String)

    test for the inclusion if key in the filter

Returns:

  • (Boolean)

    true if given key is present in the filter. false positive are possible and dependant on the m and k filter parameters.



54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 54

def include?(key)
  current_slot = @lock.synchronize{@current_slot}
  expired = false

  BloomHelper.multi_hash(key, @k).each do |position| 
    start_slot = @buckets[position % @m]
    if start_slot == 0
      expired = true
    elsif elapsed(start_slot, current_slot) > RESOLUTION_DIVISOR
      expired = true
      @buckets[position % @m] = 0
    end
  end
  !expired
end

#start_timerObject

start the internal timer thread for managing ttls. must be explicitely called



72
73
74
# File 'lib/bloombroom/filter/continuous_bloom_filter.rb', line 72

def start_timer
  @timer ||= detach_timer
end