Class: Pwnedkeys::Filter

Inherits:
Object
  • Object
show all
Defined in:
lib/pwnedkeys/filter.rb

Defined Under Namespace

Classes: Error, FilterClosedError, InvalidFileError, InvalidKeyError

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(filename, sync: true, revision: nil) ⇒ Filter

Create a new Pwnedkeys::Filter.

Equivalent to open, without the possibility of block-style access.

See Also:



209
210
211
212
213
214
# File 'lib/pwnedkeys/filter.rb', line 209

def initialize(filename, sync: true, revision: nil)
  @fd = File.open(filename, File::RDWR, binmode: true)
  refresh_header

  @sync, @revision = sync, revision
end

Class Method Details

.create(filename, hash_count:, hash_length:) ⇒ void

This method returns an undefined value.

Create a new filter data file.

Initialize a file, which cannot already exist, to be a pwnedkeys bloom filter v1 file. The file is not opened for use, it is simply created on the filesystem at the location specified.

Parameters:

  • filename (String)

    the file to be created. It can be an absolute or relative path, or anything else that ‘File.open` will accept.

  • hash_count (Integer)

    how many filter bits each element in the bloom filter will set.

  • hash_length (Integer)

    the number of bits that will be used for each hash value.

Raises:

  • (SystemCallError)

    if any sort of filesystem-related problem occurs, an ‘Errno`-related exception will be raised. Likely candidates include `EEXIST` (the file you specified already exists), `ENOENT` (the directory you specified doesn’t exist), and ‘EPERM` (you don’t have permissions to create a file where you want it).



110
111
112
113
114
115
116
117
118
119
120
# File 'lib/pwnedkeys/filter.rb', line 110

def self.create(filename, hash_count:, hash_length:)
  File.open(filename, File::WRONLY | File::CREAT | File::EXCL) do |fd|
    header = V1Header.new(hash_count: hash_count, hash_length: hash_length)

    fd.write(header.to_s)
    fd.seek((2 ** hash_length) / 8 - 1, :CUR)
    fd.write("\0")
  end

  nil
end

.filter_parameters(entries:, fp_rate:) ⇒ Hash<Symbol, Integer>

Calculate count/length parameters for a given entry count and desired false-positive rate.

Parameters:

  • entries (Integer)

    how many elements the bloom filter should accommodate.

  • fp_rate (Float)

    the maximum false-positive rate you wish to accept.

Returns:

  • (Hash<Symbol, Integer>)

    the ‘:hash_count` and `:hash_length` which will produce the desired false-positive rate if the filter is filled with the specified number of entries.



134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
# File 'lib/pwnedkeys/filter.rb', line 134

def self.filter_parameters(entries:, fp_rate:)
  # Blessings unto https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions
  optimal_filter_bits = (-1 * entries * Math.log(fp_rate) / Math.log(2) / Math.log(2))
  hash_length = (Math.log2(optimal_filter_bits)).ceil
  actual_filter_bits = 2 ** hash_length

  # We could, in theory, just use the "optimal k" (hash count), which would
  # often produce an actual false-positive rate much smaller than our
  # target (because actual_filter_bits can be *significantly* larger than
  # optimal_filter_bits).  Instead, to minimise the hashing required, we
  # can use the extra filter length available to choose a smaller k that
  # still satisfies the target false positive rate.
  #
  # Because my algebra isn't up to solving the FP rate equation for k, and
  # because the search space of possible values of k is small and bounded
  # (by 1, and by the "optimal k" on the upper bound), brute-forcing things
  # seems the easiest option.

  upper_bound = (Math.log(2) * actual_filter_bits / entries).ceil
  hash_count = (1..upper_bound).find do |k|
    (1 - (1 - 1.0 / actual_filter_bits) ** (k * entries)) ** k < fp_rate
  end

  if hash_count.nil?
    #:nocov:
    raise RuntimeError, "CAN'T HAPPEN: Could not determine hash_count for entries: #{entries}, fp_rate: #{fp_rate}.  Please report a violation of the laws of mathematics."
    #:nocov:
  end

  {
    hash_count:  hash_count,
    hash_length: hash_length,
  }
end

.open(filename, sync: true, revision: nil) ⇒ Pwnedkeys::Filter

Open an existing pwnedkeys bloom filter data file.

Parameters:

  • filename (String)

    the file to open.

  • sync (Boolean) (defaults to: true)

    whether or not to sync the data to persistent storage after each call to ‘#add`. Turning this off can improve bulk addition performance, at the cost of potential data loss in the event of catastrophe.

  • revision (Integer) (defaults to: nil)

    set an explicit revision number for changes.

Returns:

Raises:

  • (SystemCallError)

    if anything low-level goes wrong, you will get some sort of ‘Errno`-related exception raised, such as `ENOENT` (the file you specified does not exist) or `EPERM` (you don’t have access to the file specified).

  • (Pwnedkeys::Filter::InvalidFileError)

    if the specified file exists, but is not recognised as a valid pwnedkeys filter file.



189
190
191
192
193
194
195
196
197
198
199
200
201
# File 'lib/pwnedkeys/filter.rb', line 189

def self.open(filename, sync: true, revision: nil)
  filter = Pwnedkeys::Filter.new(filename, sync: sync, revision: revision)

  if block_given?
    begin
      yield filter
    ensure
      filter.close
    end
  else
    filter
  end
end

Instance Method Details

#add(key) ⇒ Boolean

Add a new key (or SPKI) to the filter.

Parameters:

  • key (OpenSSL::PKey::PKey, OpenSSL::X509::SPKI, String)

    the key to add to the filter.

Returns:

  • (Boolean)

    whether the key was added as a new entry. Due to the probabilistic nature of the bloom filter structure, it is possible to add two completely different keys and yet it looks like the “same” key to the bloom filter. Adding two colliding keys isn’t a fatal error, but if it starts to happen regularly, it is a hint that perhaps the existing filter is getting a little too full.

Raises:



252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
# File 'lib/pwnedkeys/filter.rb', line 252

def add(key)
  raise FilterClosedError if @fd.nil?

  return false if probably_includes?(key)

  spki = spkify(key)

  begin
    @fd.flock(File::LOCK_EX)
    refresh_header
    filter_positions(spki.to_der).each do |n|
      @fd.seek(n / 8 + @header.header_size, :SET)
      byte = @fd.read(1).ord
      @fd.seek(-1, :CUR)

      mask = 2 ** (7 - (n % 8))
      new_byte = byte | mask

      @fd.write(new_byte.chr)
    end

    @revision ||= @header.revision + 1

    @header.revision = @revision
    @header.entry_added!

    @header.to_fd(@fd)
    @fd.fdatasync if @sync
  ensure
    @fd.flock(File::LOCK_UN)
  end

  @already_modified = true
end

#closevoid

This method returns an undefined value.

Signal that the filter should be closed for further querying and manipulation.

Raises:

  • (SystemCallError)

    if something filesystem-ish fails.



303
304
305
306
# File 'lib/pwnedkeys/filter.rb', line 303

def close
  @fd.close
  @fd = nil
end

#false_positive_rateFloat

An estimate of the false-positive rate inherent in the filter.

Given the parameters of the filter, we can estimate roughly what the false-positive rate will be when querying this filter.

Returns:

  • (Float)

    the approximate probability of a query result being a false positive, expressed as a floating-point number between 0 and 1.



316
317
318
319
# File 'lib/pwnedkeys/filter.rb', line 316

def false_positive_rate
  # Taken wholesale from https://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives
  (1 - (1 - 1.0 / filter_bit_count) ** (hash_count * entry_count)) ** hash_count
end

#probably_includes?(key) ⇒ Boolean

Query the bloom filter.

Parameters:

  • key (OpenSSL::PKey::PKey, OpenSSL::X509::SPKI, String)

    the key to query the filter for.

Returns:

  • (Boolean)

    whether the queried key probably exists in the filter (‘true`), or whether it *definitely doesn’t* (‘false`).

Raises:



230
231
232
233
234
235
# File 'lib/pwnedkeys/filter.rb', line 230

def probably_includes?(key)
  raise FilterClosedError if @fd.nil?

  spki = spkify(key)
  filter_bits(spki.to_der).all?
end

#syncObject

Force a sync of newly-added data to disk.

This method is only of interest if the filter was opened with ‘sync: false`, because otherwise data will already have been synced to disk as part of the call to #add.



293
294
295
# File 'lib/pwnedkeys/filter.rb', line 293

def sync
  @fd.fdatasync if @fd
end