Class: Pwnedkeys::Filter
- Inherits:
-
Object
- Object
- Pwnedkeys::Filter
- Defined in:
- lib/pwnedkeys/filter.rb
Defined Under Namespace
Classes: Error, FilterClosedError, InvalidFileError, InvalidKeyError
Class Method Summary collapse
-
.create(filename, hash_count:, hash_length:) ⇒ void
Create a new filter data file.
-
.filter_parameters(entries:, fp_rate:) ⇒ Hash<Symbol, Integer>
Calculate count/length parameters for a given entry count and desired false-positive rate.
-
.open(filename, sync: true, revision: nil) ⇒ Pwnedkeys::Filter
Open an existing pwnedkeys bloom filter data file.
Instance Method Summary collapse
-
#add(key) ⇒ Boolean
Add a new key (or SPKI) to the filter.
-
#close ⇒ void
Signal that the filter should be closed for further querying and manipulation.
-
#false_positive_rate ⇒ Float
An estimate of the false-positive rate inherent in the filter.
-
#initialize(filename, sync: true, revision: nil) ⇒ Filter
constructor
Create a new Pwnedkeys::Filter.
-
#probably_includes?(key) ⇒ Boolean
Query the bloom filter.
-
#sync ⇒ Object
Force a sync of newly-added data to disk.
Constructor Details
#initialize(filename, sync: true, revision: nil) ⇒ Filter
Create a new Pwnedkeys::Filter.
Equivalent to open, without the possibility of block-style access.
209 210 211 212 213 214 |
# File 'lib/pwnedkeys/filter.rb', line 209 def initialize(filename, sync: true, revision: nil) @fd = File.open(filename, File::RDWR, binmode: true) refresh_header @sync, @revision = sync, revision end |
Class Method Details
.create(filename, hash_count:, hash_length:) ⇒ void
This method returns an undefined value.
Create a new filter data file.
Initialize a file, which cannot already exist, to be a pwnedkeys bloom filter v1 file. The file is not opened for use, it is simply created on the filesystem at the location specified.
110 111 112 113 114 115 116 117 118 119 120 |
# File 'lib/pwnedkeys/filter.rb', line 110 def self.create(filename, hash_count:, hash_length:) File.open(filename, File::WRONLY | File::CREAT | File::EXCL) do |fd| header = V1Header.new(hash_count: hash_count, hash_length: hash_length) fd.write(header.to_s) fd.seek((2 ** hash_length) / 8 - 1, :CUR) fd.write("\0") end nil end |
.filter_parameters(entries:, fp_rate:) ⇒ Hash<Symbol, Integer>
Calculate count/length parameters for a given entry count and desired false-positive rate.
134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 |
# File 'lib/pwnedkeys/filter.rb', line 134 def self.filter_parameters(entries:, fp_rate:) # Blessings unto https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions optimal_filter_bits = (-1 * entries * Math.log(fp_rate) / Math.log(2) / Math.log(2)) hash_length = (Math.log2(optimal_filter_bits)).ceil actual_filter_bits = 2 ** hash_length # We could, in theory, just use the "optimal k" (hash count), which would # often produce an actual false-positive rate much smaller than our # target (because actual_filter_bits can be *significantly* larger than # optimal_filter_bits). Instead, to minimise the hashing required, we # can use the extra filter length available to choose a smaller k that # still satisfies the target false positive rate. # # Because my algebra isn't up to solving the FP rate equation for k, and # because the search space of possible values of k is small and bounded # (by 1, and by the "optimal k" on the upper bound), brute-forcing things # seems the easiest option. upper_bound = (Math.log(2) * actual_filter_bits / entries).ceil hash_count = (1..upper_bound).find do |k| (1 - (1 - 1.0 / actual_filter_bits) ** (k * entries)) ** k < fp_rate end if hash_count.nil? #:nocov: raise RuntimeError, "CAN'T HAPPEN: Could not determine hash_count for entries: #{entries}, fp_rate: #{fp_rate}. Please report a violation of the laws of mathematics." #:nocov: end { hash_count: hash_count, hash_length: hash_length, } end |
.open(filename, sync: true, revision: nil) ⇒ Pwnedkeys::Filter
Open an existing pwnedkeys bloom filter data file.
189 190 191 192 193 194 195 196 197 198 199 200 201 |
# File 'lib/pwnedkeys/filter.rb', line 189 def self.open(filename, sync: true, revision: nil) filter = Pwnedkeys::Filter.new(filename, sync: sync, revision: revision) if block_given? begin yield filter ensure filter.close end else filter end end |
Instance Method Details
#add(key) ⇒ Boolean
Add a new key (or SPKI) to the filter.
252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 |
# File 'lib/pwnedkeys/filter.rb', line 252 def add(key) raise FilterClosedError if @fd.nil? return false if probably_includes?(key) spki = spkify(key) begin @fd.flock(File::LOCK_EX) refresh_header filter_positions(spki.to_der).each do |n| @fd.seek(n / 8 + @header.header_size, :SET) byte = @fd.read(1).ord @fd.seek(-1, :CUR) mask = 2 ** (7 - (n % 8)) new_byte = byte | mask @fd.write(new_byte.chr) end @revision ||= @header.revision + 1 @header.revision = @revision @header.entry_added! @header.to_fd(@fd) @fd.fdatasync if @sync ensure @fd.flock(File::LOCK_UN) end @already_modified = true end |
#close ⇒ void
This method returns an undefined value.
Signal that the filter should be closed for further querying and manipulation.
303 304 305 306 |
# File 'lib/pwnedkeys/filter.rb', line 303 def close @fd.close @fd = nil end |
#false_positive_rate ⇒ Float
An estimate of the false-positive rate inherent in the filter.
Given the parameters of the filter, we can estimate roughly what the false-positive rate will be when querying this filter.
316 317 318 319 |
# File 'lib/pwnedkeys/filter.rb', line 316 def false_positive_rate # Taken wholesale from https://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives (1 - (1 - 1.0 / filter_bit_count) ** (hash_count * entry_count)) ** hash_count end |
#probably_includes?(key) ⇒ Boolean
Query the bloom filter.
230 231 232 233 234 235 |
# File 'lib/pwnedkeys/filter.rb', line 230 def probably_includes?(key) raise FilterClosedError if @fd.nil? spki = spkify(key) filter_bits(spki.to_der).all? end |
#sync ⇒ Object
Force a sync of newly-added data to disk.
This method is only of interest if the filter was opened with ‘sync: false`, because otherwise data will already have been synced to disk as part of the call to #add.
293 294 295 |
# File 'lib/pwnedkeys/filter.rb', line 293 def sync @fd.fdatasync if @fd end |