Class: Classifier::LSI

Inherits:
Object
Includes:
Streaming, Mutex_m
Defined in:
lib/classifier/lsi.rb,
lib/classifier/lsi/incremental_svd.rb

Overview

This class implements a Latent Semantic Indexer, which can search, classify and cluster data based on underlying semantic relations. For more information on the algorithms used, please consult Wikipedia.

Defined Under Namespace

Modules: IncrementalSVD

Constant Summary

DEFAULT_MAX_RANK = 100

Default maximum rank for incremental SVD

Constants included from Streaming

Streaming::DEFAULT_BATCH_SIZE

Class Attribute Summary

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Streaming

#delete_checkpoint, #list_checkpoints, #save_checkpoint

Constructor Details

#initialize(options = {}) ⇒ LSI

Create a fresh index. If you want to call #build_index manually, use

Classifier::LSI.new auto_rebuild: false

For incremental SVD mode (adds documents without full rebuild):

Classifier::LSI.new incremental: true, max_rank: 100


# File 'lib/classifier/lsi.rb', line 98

def initialize(options = {})
  super()
  @auto_rebuild = true unless options[:auto_rebuild] == false
  @word_list = WordList.new
  @items = {}
  @version = 0
  @built_at_version = -1
  @dirty = false
  @storage = nil

  # Incremental SVD settings
  @incremental_mode = options[:incremental] == true
  @max_rank = options[:max_rank] || DEFAULT_MAX_RANK
  @u_matrix = nil
  @initial_vocab_size = nil
end

Class Attribute Details

.backend ⇒ Object

Returns the value of attribute backend.



# File 'lib/classifier/lsi.rb', line 15

def backend
  @backend
end

Instance Attribute Details

#auto_rebuild ⇒ Object

Returns the value of attribute auto_rebuild.



# File 'lib/classifier/lsi.rb', line 85

def auto_rebuild
  @auto_rebuild
end

#singular_values ⇒ Object (readonly)

Returns the value of attribute singular_values.



# File 'lib/classifier/lsi.rb', line 84

def singular_values
  @singular_values
end

#storage ⇒ Object

Returns the value of attribute storage.



# File 'lib/classifier/lsi.rb', line 85

def storage
  @storage
end

#word_list ⇒ Object (readonly)

Returns the value of attribute word_list.



# File 'lib/classifier/lsi.rb', line 84

def word_list
  @word_list
end

Class Method Details

.from_json(json) ⇒ Object

Loads an LSI index from a JSON string or Hash created by #to_json or #as_json. The index will be rebuilt after loading.

Raises:

  • (ArgumentError)


# File 'lib/classifier/lsi.rb', line 528

def self.from_json(json)
  data = json.is_a?(String) ? JSON.parse(json) : json
  raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'lsi'

  # Create instance with auto_rebuild disabled during loading
  instance = new(auto_rebuild: false)

  # Restore items (categories stay as strings, matching original storage)
  data['items'].each do |item_key, item_data|
    word_hash = item_data['word_hash'].transform_keys(&:to_sym)
    categories = item_data['categories']
    instance.instance_variable_get(:@items)[item_key] = ContentNode.new(word_hash, *categories)
    instance.instance_variable_set(:@version, instance.instance_variable_get(:@version) + 1)
  end

  # Restore auto_rebuild setting and rebuild index
  instance.auto_rebuild = data['auto_rebuild']
  instance.build_index
  instance
end

.load(storage:) ⇒ Object

Loads an LSI index from the configured storage. The storage is set on the returned instance.

Raises:

  • (StorageError)

# File 'lib/classifier/lsi.rb', line 611

def self.load(storage:)
  data = storage.read
  raise StorageError, 'No saved state found' unless data

  instance = from_json(data)
  instance.storage = storage
  instance
end

.load_checkpoint(storage:, checkpoint_id:) ⇒ Object

Loads an LSI index from a checkpoint.

Raises:

  • (ArgumentError)


# File 'lib/classifier/lsi.rb', line 630

def self.load_checkpoint(storage:, checkpoint_id:)
  raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File)

  dir = File.dirname(storage.path)
  base = File.basename(storage.path, '.*')
  ext = File.extname(storage.path)
  checkpoint_path = File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}")

  checkpoint_storage = Storage::File.new(path: checkpoint_path)
  instance = load(storage: checkpoint_storage)
  instance.storage = storage
  instance
end

.load_from_file(path) ⇒ Object

Loads an LSI index from a file (legacy API).



# File 'lib/classifier/lsi.rb', line 623

def self.load_from_file(path)
  from_json(File.read(path))
end

.matrix_class ⇒ Object

Get the Matrix class for the current backend



# File 'lib/classifier/lsi.rb', line 31

def matrix_class
  backend == :native ? Classifier::Linalg::Matrix : ::Matrix
end

.native_available? ⇒ Boolean

Check if using native C extension

Returns:

  • (Boolean)


# File 'lib/classifier/lsi.rb', line 19

def native_available?
  backend == :native
end

.vector_class ⇒ Object

Get the Vector class for the current backend



# File 'lib/classifier/lsi.rb', line 25

def vector_class
  backend == :native ? Classifier::Linalg::Vector : ::Vector
end

Instance Method Details

#<<(item) ⇒ Object

A less flexible shorthand for add_item that assumes you are passing in a string with no categories. item will be duck-typed via to_s.



# File 'lib/classifier/lsi.rb', line 240

def <<(item)
  add_item(item)
end

#add(**items) ⇒ Object

Adds items to the index using hash-style syntax. The hash keys are categories, and values are items (or arrays of items).

For example:

lsi = Classifier::LSI.new
lsi.add("Dog" => "Dogs are loyal pets")
lsi.add("Cat" => "Cats are independent")
lsi.add(Bird: "Birds can fly")  # Symbol keys work too

Multiple items with the same category:

lsi.add("Dog" => ["Dogs are loyal", "Puppies are cute"])

Batch operations with multiple categories:

lsi.add(
  "Dog" => ["Dogs are loyal", "Puppies are cute"],
  "Cat" => ["Cats are independent", "Kittens are playful"]
)


# File 'lib/classifier/lsi.rb', line 196

def add(**items)
  items.each do |category, value|
    Array(value).each { |doc| add_item(doc, category.to_s) }
  end
end

#add_batch(batch_size: Streaming::DEFAULT_BATCH_SIZE, **items) ⇒ Object

Adds items to the index in batches from an array. Documents are added without rebuilding, then the index is rebuilt at the end.

Examples:

Batch add with progress

lsi.add_batch(Dog: documents, batch_size: 100) do |progress|
  puts "#{progress.percent}% complete"
end


# File 'lib/classifier/lsi.rb', line 687

def add_batch(batch_size: Streaming::DEFAULT_BATCH_SIZE, **items)
  original_auto_rebuild = @auto_rebuild
  @auto_rebuild = false

  begin
    total_docs = items.values.sum { |v| Array(v).size }
    progress = Streaming::Progress.new(total: total_docs)

    items.each do |category, documents|
      Array(documents).each_slice(batch_size) do |batch|
        batch.each { |doc| add_item(doc, category.to_s) }
        progress.completed += batch.size
        progress.current_batch += 1
        yield progress if block_given?
      end
    end
  ensure
    @auto_rebuild = original_auto_rebuild
    build_index if original_auto_rebuild
  end
end

#add_item(item, *categories, &block) ⇒ Object

Deprecated.

Use #add instead for clearer hash-style syntax.

Adds an item to the index. item is assumed to be a string, but any item may be indexed so long as it responds to #to_s or if you provide an optional block explaining how the indexer can fetch fresh string data. This optional block is passed the item, so the item may only be a reference to a URL or file name.

For example:

lsi = Classifier::LSI.new
lsi.add_item "This is just plain text"
lsi.add_item("/home/me/filename.txt") { |x| File.read x }
ar = ActiveRecordObject.find(:all)
lsi.add_item(ar, *ar.categories) { |x| ar.content }


# File 'lib/classifier/lsi.rb', line 218

def add_item(item, *categories, &block)
  clean_word_hash = block ? block.call(item).clean_word_hash : item.to_s.clean_word_hash
  node = nil

  synchronize do
    node = ContentNode.new(clean_word_hash, *categories)
    @items[item] = node
    @version += 1
    @dirty = true
  end

  # Use incremental update if enabled and we have a U matrix
  return perform_incremental_update(node, clean_word_hash) if @incremental_mode && @u_matrix

  build_index if @auto_rebuild
end

#as_json ⇒ Object

Returns a hash representation of the LSI index. Only source data (word_hash, categories) is included, not computed vectors. This can be converted to JSON or used directly.



# File 'lib/classifier/lsi.rb', line 499

def as_json(*)
  items_data = @items.transform_values do |node|
    {
      word_hash: node.word_hash.transform_keys(&:to_s),
      categories: node.categories.map(&:to_s)
    }
  end

  {
    version: 1,
    type: 'lsi',
    auto_rebuild: @auto_rebuild,
    items: items_data
  }
end

#build_index(cutoff = 0.75, force: false) ⇒ Object

This function rebuilds the index if needs_rebuild? returns true. For very large document spaces, this indexing operation may take some time to complete, so it may be wise to place the operation in another thread.

As a rule, indexing will be fairly swift on modern machines until you have well over 500 documents indexed, or have an incredibly diverse vocabulary for your documents.

The optional parameter “cutoff” is a tuning parameter. When the index is built, a certain number of s-values are discarded from the system. The cutoff parameter tells the indexer how many of these values to keep. A value of 1 for cutoff means that no semantic analysis will take place, turning the LSI class into a simple vector search engine.
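The cutoff-to-rank relationship described above can be sketched in plain Ruby. This is an illustrative reimplementation of the idea, not the library's actual code; its exact rounding and tie-handling may differ:

```ruby
# Illustrative sketch: keep the largest (count * cutoff) singular
# values and zero out the rest, as the cutoff description implies.
def apply_cutoff(singular_values, cutoff)
  keep = (singular_values.size * cutoff).round
  return singular_values.map { 0.0 } if keep.zero?

  threshold = singular_values.sort.reverse[keep - 1]
  singular_values.map { |s| s < threshold ? 0.0 : s }
end

apply_cutoff([9.0, 5.0, 3.0, 1.0], 0.75) # => [9.0, 5.0, 3.0, 0.0]
apply_cutoff([9.0, 5.0, 3.0, 1.0], 1.0)  # keeps every value
```

With a cutoff of 0.75 and four singular values, three survive and the smallest is zeroed, which is what shrinks the semantic space.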



# File 'lib/classifier/lsi.rb', line 293

def build_index(cutoff = 0.75, force: false)
  validate_cutoff!(cutoff)

  synchronize do
    return unless force || needs_rebuild_unlocked?

    make_word_list

    doc_list = @items.values
    tda = doc_list.collect { |node| node.raw_vector_with(@word_list) }

    if self.class.native_available?
      # Convert vectors to arrays for matrix construction
      tda_arrays = tda.map { |v| v.respond_to?(:to_a) ? v.to_a : v }
      tdm = self.class.matrix_class.alloc(*tda_arrays).trans
      ntdm, u_mat = build_reduced_matrix_with_u(tdm, cutoff)
      assign_native_ext_lsi_vectors(ntdm, doc_list)
    else
      tdm = Matrix.rows(tda).trans
      ntdm, u_mat = build_reduced_matrix_with_u(tdm, cutoff)
      assign_ruby_lsi_vectors(ntdm, doc_list)
    end

    # Store U matrix for incremental mode
    if @incremental_mode
      @u_matrix = u_mat
      @initial_vocab_size = @word_list.size
    end

    @built_at_version = @version
  end
end

#categories_for(item) ⇒ Object

Returns the categories for a given indexed item. You are free to add and remove categories from this as you see fit. Changing an item's categories does not invalidate the index.



# File 'lib/classifier/lsi.rb', line 248

def categories_for(item)
  synchronize do
    return [] unless @items[item]

    @items[item].categories
  end
end

#classify(doc, cutoff = 0.30, &block) ⇒ Object

This function uses a voting system to categorize documents, based on the categories of other documents. It uses the same logic as the find_related function to find related documents, then returns the most obvious category from this list.



# File 'lib/classifier/lsi.rb', line 421

def classify(doc, cutoff = 0.30, &block)
  validate_cutoff!(cutoff)

  synchronize do
    votes = vote_unlocked(doc, cutoff, &block)

    ranking = votes.keys.sort_by { |x| votes[x] }
    ranking[-1]
  end
end

#classify_with_confidence(doc, cutoff = 0.30, &block) ⇒ Object

Returns the same category as classify() but also returns a confidence value derived from the vote share that the winning category got.

e.g.

category, confidence = classify_with_confidence(doc)
if confidence < 0.3
  category = nil
end

See classify() for argument docs



# File 'lib/classifier/lsi.rb', line 451

def classify_with_confidence(doc, cutoff = 0.30, &block)
  validate_cutoff!(cutoff)

  synchronize do
    votes = vote_unlocked(doc, cutoff, &block)
    votes_sum = votes.values.sum
    return [nil, nil] if votes_sum.zero?

    ranking = votes.keys.sort_by { |x| votes[x] }
    winner = ranking[-1]
    vote_share = votes[winner] / votes_sum.to_f
    [winner, vote_share]
  end
end
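The vote-share confidence can be sketched without an index. winner_with_confidence below is a hypothetical helper that mirrors the logic shown above; it is not part of the class:

```ruby
# Hypothetical helper mirroring classify_with_confidence's math:
# the winner is the top-voted category, and confidence is the
# winner's share of the total vote mass.
def winner_with_confidence(votes)
  total = votes.values.sum.to_f
  return [nil, nil] if total.zero?

  winner, score = votes.max_by { |_category, v| v }
  [winner, score / total]
end

winner_with_confidence('Dog' => 3.0, 'Cat' => 1.0) # => ["Dog", 0.75]
winner_with_confidence({})                         # => [nil, nil]
```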

#current_rank ⇒ Object

Returns the current rank of the incremental SVD (number of singular values kept). Returns nil if incremental mode is not active.



# File 'lib/classifier/lsi.rb', line 155

def current_rank
  @singular_values&.count(&:positive?)
end

#dirty? ⇒ Boolean

Returns true if there are unsaved changes.

Returns:

  • (Boolean)


# File 'lib/classifier/lsi.rb', line 603

def dirty?
  @dirty
end

#disable_incremental_mode! ⇒ Object

Disables incremental mode. Subsequent adds will trigger full rebuilds.



# File 'lib/classifier/lsi.rb', line 162

def disable_incremental_mode!
  @incremental_mode = false
  @u_matrix = nil
  @initial_vocab_size = nil
end

#enable_incremental_mode!(max_rank: DEFAULT_MAX_RANK) ⇒ Object

Enables incremental mode with optional max_rank setting. The next build_index call will store the U matrix for incremental updates.



# File 'lib/classifier/lsi.rb', line 172

def enable_incremental_mode!(max_rank: DEFAULT_MAX_RANK)
  @incremental_mode = true
  @max_rank = max_rank
end

#find_related(doc, max_nearest = 3, &block) ⇒ Object

This function takes content and finds other documents that are semantically “close”, returning an array of documents sorted from most to least relevant. max_nearest specifies the number of documents to return. A value of 0 means it returns all the indexed documents, sorted by relevance.

This is particularly useful for identifying clusters in your document space. For example, you may want to identify several “What’s Related” items for weblog articles, or find paragraphs that relate to each other in an essay.



# File 'lib/classifier/lsi.rb', line 406

def find_related(doc, max_nearest = 3, &block)
  synchronize do
    carry =
      proximity_array_for_content_unlocked(doc, &block).reject { |pair| pair[0] == doc }
    result = carry.collect { |x| x[0] }
    result[0..(max_nearest - 1)]
  end
end

#highest_ranked_stems(doc, count = 3) ⇒ Object

Prototype, only works on indexed documents. I have no clue if this is going to work, but in theory it’s supposed to.



# File 'lib/classifier/lsi.rb', line 470

def highest_ranked_stems(doc, count = 3)
  synchronize do
    raise 'Requested stem ranking on non-indexed content!' unless @items[doc]

    arr = node_for_content_unlocked(doc).lsi_vector.to_a
    top_n = arr.sort.reverse[0..(count - 1)]
    top_n.collect { |x| @word_list.word_for_index(arr.index(x)) }
  end
end

#highest_relative_content(max_chunks = 10) ⇒ Object

This method returns max_chunks entries, ordered by their average semantic rating. Essentially, the average distance of each entry from all other entries is calculated, and the highest are returned.

This can be used to build a summary service, or to provide more information about your dataset’s general content. For example, if you were to use categorize on the results of this data, you could gather information on what your dataset is generally about.



# File 'lib/classifier/lsi.rb', line 336

def highest_relative_content(max_chunks = 10)
  synchronize do
    return [] if needs_rebuild_unlocked?

    avg_density = {}
    @items.each_key { |x| avg_density[x] = proximity_array_for_content_unlocked(x).sum { |pair| pair[1] } }

    avg_density.keys.sort_by { |x| avg_density[x] }.reverse[0..(max_chunks - 1)]
  end
end

#incremental_enabled? ⇒ Boolean

Returns true if incremental mode is enabled and active. Incremental mode becomes active after the first build_index call.

Returns:

  • (Boolean)


# File 'lib/classifier/lsi.rb', line 147

def incremental_enabled?
  @incremental_mode && !@u_matrix.nil?
end

#items ⇒ Object

Returns an array of items that are indexed.



# File 'lib/classifier/lsi.rb', line 273

def items
  synchronize { @items.keys }
end

#marshal_dump ⇒ Object

Custom marshal serialization to exclude mutex state



# File 'lib/classifier/lsi.rb', line 482

def marshal_dump
  [@auto_rebuild, @word_list, @items, @version, @built_at_version, @dirty]
end

#marshal_load(data) ⇒ Object

Custom marshal deserialization to recreate mutex



# File 'lib/classifier/lsi.rb', line 488

def marshal_load(data)
  mu_initialize
  @auto_rebuild, @word_list, @items, @version, @built_at_version, @dirty = data
  @storage = nil
end

#needs_rebuild? ⇒ Boolean

Returns true if the index needs to be rebuilt. The index needs to be built after all information is added, but before you start using it for search, classification, or cluster detection.

Returns:

  • (Boolean)


# File 'lib/classifier/lsi.rb', line 120

def needs_rebuild?
  synchronize { (@items.keys.size > 1) && (@version != @built_at_version) }
end

#proximity_array_for_content(doc, &block) ⇒ Object

This function is the primitive that find_related and classify build upon. It returns an array of 2-element arrays. The first element of this array is a document, and the second is its “score”, defining how “close” it is to other indexed items.

These values are somewhat arbitrary, having to do with the vector space created by your content, so the magnitude is interpretable but not always meaningful between indexes.

The parameter doc is the content to compare. If that content is not indexed, you can pass an optional block to define how to create the text data. See add_item for examples of how this works.



# File 'lib/classifier/lsi.rb', line 361

def proximity_array_for_content(doc, &block)
  synchronize { proximity_array_for_content_unlocked(doc, &block) }
end

#proximity_norms_for_content(doc, &block) ⇒ Object

Similar to proximity_array_for_content, this function takes similar arguments and returns a similar array. However, it uses the normalized calculated vectors instead of their full versions. This is useful when you’re trying to perform operations on content that is much smaller than the text you’re working with. search uses this primitive.



# File 'lib/classifier/lsi.rb', line 372

def proximity_norms_for_content(doc, &block)
  synchronize { proximity_norms_for_content_unlocked(doc, &block) }
end

#reload ⇒ Object

Reloads the LSI index from the configured storage. Raises UnsavedChangesError if there are unsaved changes. Use reload! to force reload and discard changes.

Raises:

  • (ArgumentError)


# File 'lib/classifier/lsi.rb', line 574

def reload
  raise ArgumentError, 'No storage configured' unless storage
  raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty

  data = storage.read
  raise StorageError, 'No saved state found' unless data

  restore_from_json(data)
  @dirty = false
  self
end

#reload! ⇒ Object

Force reloads the LSI index from storage, discarding any unsaved changes.

Raises:

  • (ArgumentError)


# File 'lib/classifier/lsi.rb', line 589

def reload!
  raise ArgumentError, 'No storage configured' unless storage

  data = storage.read
  raise StorageError, 'No saved state found' unless data

  restore_from_json(data)
  @dirty = false
  self
end

#remove_item(item) ⇒ Object

Removes an item from the database, if it is indexed.



# File 'lib/classifier/lsi.rb', line 259

def remove_item(item)
  removed = synchronize do
    next false unless @items.key?(item)

    @items.delete(item)
    @version += 1
    @dirty = true
    true
  end
  build_index if removed && @auto_rebuild
end

#save ⇒ Object

Saves the LSI index to the configured storage. Raises ArgumentError if no storage is configured.

Raises:

  • (ArgumentError)


# File 'lib/classifier/lsi.rb', line 553

def save
  raise ArgumentError, 'No storage configured. Use save_to_file(path) or set storage=' unless storage

  storage.write(to_json)
  @dirty = false
end

#save_to_file(path) ⇒ Object

Saves the LSI index to a file (legacy API).



# File 'lib/classifier/lsi.rb', line 563

def save_to_file(path)
  result = File.write(path, to_json)
  @dirty = false
  result
end

#search(string, max_nearest = 3) ⇒ Object

This function allows for text-based search of your index. Unlike other functions like find_related and classify, search only takes short strings. It will also ignore factors like repeated words. It is best for short, Google-like search terms. A search will first prioritize lexical relationships, then semantic ones.

While this may seem backwards compared to the other functions that LSI supports, it is actually the same algorithm, just applied on a smaller document.



# File 'lib/classifier/lsi.rb', line 385

def search(string, max_nearest = 3)
  synchronize do
    return [] if needs_rebuild_unlocked?

    carry = proximity_norms_for_content_unlocked(string)
    result = carry.collect { |x| x[0] }
    result[0..(max_nearest - 1)]
  end
end

#singular_value_spectrum ⇒ Object

Returns the singular value spectrum as an array of hashes, one per dimension, each containing the value, its percentage of the total, and the cumulative percentage. Returns nil if no singular values are available.



# File 'lib/classifier/lsi.rb', line 125

def singular_value_spectrum
  return nil unless @singular_values

  total = @singular_values.sum
  return nil if total.zero?

  cumulative = 0.0
  @singular_values.map.with_index do |value, i|
    cumulative += value
    {
      dimension: i,
      value: value,
      percentage: value / total,
      cumulative_percentage: cumulative / total
    }
  end
end
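One way to read the spectrum is to pick the smallest rank that covers a target share of the total. The spectrum data below is fabricated for illustration; only the hash shape matches the method's output:

```ruby
# Made-up spectrum shaped like singular_value_spectrum's return value.
spectrum = [
  { dimension: 0, value: 6.0, percentage: 0.60, cumulative_percentage: 0.60 },
  { dimension: 1, value: 2.5, percentage: 0.25, cumulative_percentage: 0.85 },
  { dimension: 2, value: 1.5, percentage: 0.15, cumulative_percentage: 1.00 }
]

# Smallest number of dimensions covering at least 90% of the spectrum.
rank = spectrum.find { |d| d[:cumulative_percentage] >= 0.9 }[:dimension] + 1
# => 3
```

A rank chosen this way could feed the max_rank option of incremental mode, though the right threshold depends on your corpus.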

#to_json ⇒ Object

Serializes the LSI index to a JSON string. Only source data (word_hash, categories) is serialized, not computed vectors. On load, the index will be rebuilt automatically.



# File 'lib/classifier/lsi.rb', line 520

def to_json(*)
  as_json.to_json
end

#train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block) ⇒ Object

Alias train_batch to add_batch for API consistency with other classifiers. Note: LSI uses categories differently (items have categories, not the training call).



# File 'lib/classifier/lsi.rb', line 713

def train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block)
  if category && documents
    add_batch(batch_size: batch_size, **{ category.to_sym => documents }, &block)
  else
    add_batch(batch_size: batch_size, **categories, &block)
  end
end

#train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE) ⇒ Object

Trains the LSI index from an IO stream. Each line in the stream is treated as a separate document. Documents are added without rebuilding, then the index is rebuilt at the end.

Examples:

Train from a file

lsi.train_from_stream(:category, File.open('corpus.txt'))

With progress tracking

lsi.train_from_stream(:category, io, batch_size: 500) do |progress|
  puts "#{progress.completed} documents processed"
end


# File 'lib/classifier/lsi.rb', line 657

def train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE)
  original_auto_rebuild = @auto_rebuild
  @auto_rebuild = false

  begin
    reader = Streaming::LineReader.new(io, batch_size: batch_size)
    total = reader.estimate_line_count
    progress = Streaming::Progress.new(total: total)

    reader.each_batch do |batch|
      batch.each { |text| add_item(text, category) }
      progress.completed += batch.size
      progress.current_batch += 1
      yield progress if block_given?
    end
  ensure
    @auto_rebuild = original_auto_rebuild
    build_index if original_auto_rebuild
  end
end

#vote(doc, cutoff = 0.30, &block) ⇒ Object



# File 'lib/classifier/lsi.rb', line 433

def vote(doc, cutoff = 0.30, &block)
  validate_cutoff!(cutoff)

  synchronize { vote_unlocked(doc, cutoff, &block) }
end