Module: EachBatched
Defined in: lib/each_batched.rb, lib/each_batched/version.rb
Overview
Provides more grouping and batching options than those included in Rails.
Constant Summary
- DEFAULT_BATCH_SIZE = 1_000
  Default batch size to use if none is specified.
- VERSION = "0.1.3"
Instance Method Summary
- #batches_by_ids(batch_size = DEFAULT_BATCH_SIZE, key = nil) ⇒ Object
  Yields batches of records from the current scope. Snapshots the primary key ids in scope, then loops through fetching the rows one chunk of ids at a time.
- #batches_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object
  Yields batches of records from the current scope.
- #each_by_ids(batch_size = DEFAULT_BATCH_SIZE, key = nil) ⇒ Object
  Loops through each individual row found by #batches_by_ids, instead of each batch. See #batches_by_ids for an explanation of its algorithm.
- #each_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object
  Loops through each individual row found by #batches_by_range, instead of each batch. See #batches_by_range for an explanation of its algorithm.
Instance Method Details
#batches_by_ids(batch_size = DEFAULT_BATCH_SIZE, key = nil) ⇒ Object
Yields batches of records from the current scope. Snapshots the primary key ids in scope, then loops through fetching the rows one chunk of ids at a time. Each batch is yielded together with the array of ids it covers.
- Set an order explicitly if you want the same order as #batches_by_range; otherwise the order may differ.
- The yielded scope can be lazily loaded, although the id selection query has already run by that point.
- You can optionally pass a column other than the primary key, as long as its values are guaranteed to be unique.
# File 'lib/each_batched.rb', line 52
def batches_by_ids(batch_size=DEFAULT_BATCH_SIZE, key=nil)
  reduced_scope = scoped.tap { |s| s.where_values = [] }.offset(nil).limit(nil)
  key = primary_key if key.nil?
  scoped.value_of(key).in_groups_of(batch_size, false) do |group_ids|
    # keeps select/group/joins/includes, inside inner batched scope
    yield reduced_scope.where(key => group_ids), group_ids
  end
end
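The id-snapshot idea can be illustrated without a database. The following is a minimal sketch in plain Ruby, not the gem's implementation: `Row`, `sketch_batches_by_ids`, and the in-memory `table` are hypothetical stand-ins for a model, the method, and its table. It shows that a row inserted while the loop is running is not visited, because the ids were collected up front.

```ruby
# Hypothetical stand-in for a database row; not part of EachBatched.
Row = Struct.new(:id, :name)

# Sketch of the id-snapshot approach over an in-memory Array.
def sketch_batches_by_ids(table, batch_size)
  ids = table.map(&:id)                        # snapshot of the keys, taken once
  ids.each_slice(batch_size) do |group_ids|    # one chunk of ids at a time
    # Emulates "WHERE id IN (group_ids)" against the table's current contents.
    yield table.select { |row| group_ids.include?(row.id) }, group_ids
  end
end

table = (1..5).map { |i| Row.new(i, "row#{i}") }
visited = []

sketch_batches_by_ids(table, 2) do |batch, group_ids|
  visited.concat(batch.map(&:id))
  # Simulate an insert that happens while the loop is still running.
  table << Row.new(6, "late") if group_ids == [1, 2]
end

p visited
```

Because the key list is fixed before the first batch is fetched, later inserts are ignored and rows deleted mid-loop would simply be absent from their batch.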
#batches_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object
Yields batches of records from the current scope. Uses offset/limit internally to step through each batch, and respects any offset, limit, or order already in scope rather than discarding them.
- This algorithm does NOT work well with data that may have inserts or deletes while you are looping. If that is a problem, either lock the table or rows first, or use a different algorithm (such as ActiveRecord::Batches#find_in_batches or #batches_by_ids).
- This algorithm may be slower than #batches_by_ids if your query is slow to execute, since the query is run again with a new offset for every batch.
- This algorithm can’t be lazily loaded, because it checks for empty results to see when it’s done.
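The warning about concurrent changes can be made concrete with a small sketch. It assumes a hypothetical in-memory `table` of ids and emulates offset/limit with Array slicing; it is an illustration of the hazard, not the gem's code. One row is deleted after the first batch, and a later row is silently skipped as a result.

```ruby
table = [1, 2, 3, 4, 5, 6]   # hypothetical rows, identified by id
batch_size = 2
offset = 0
visited = []

loop do
  batch = table[offset, batch_size] || []   # emulates OFFSET/LIMIT
  break if batch.empty?

  visited.concat(batch)
  # Simulate a concurrent delete that lands right after the first batch.
  table.delete(1) if offset.zero?
  offset += batch_size
end

p visited
```

After the delete, every remaining row shifts down by one position, so the second window starts one row too late and id 3 is never visited.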
# File 'lib/each_batched.rb', line 22
def batches_by_range(batch_size=DEFAULT_BATCH_SIZE)
  start_offset = scoped.offset_value || 0
  end_limit = scoped.limit_value # || nil
  group_number = 0
  processed_number = 0
  # This giant while condition (with multiple assignments in it) is a mess, isn't it!
  # But simplifying it means I have to repeat most of it multiple times!
  # And putting it into a subroutine doesn't really save space either, with lots of parameters and/or return values!
  while (length = (records = offset(start_offset + batch_size * group_number).
                   limit(asked_limit = end_limit.nil? || processed_number + batch_size < end_limit ? batch_size : end_limit - processed_number)).length) > 0
    yield records
    processed_number += length
    break if length < asked_limit || (! end_limit.nil? && processed_number >= end_limit)
    group_number += 1
  end
end
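The offset/limit loop above may be easier to follow when unrolled. The following is a restructured sketch over a plain Array, assuming hypothetical names (`fetch_window`, `sketch_batches_by_range`, and the `start_offset:` and `end_limit:` keywords stand in for the in-scope offset and limit). It is intended to mirror the logic, not to replace the gem's method.

```ruby
# Hypothetical helper: emulates `scope.offset(offset).limit(limit)` on an Array.
def fetch_window(rows, offset, limit)
  rows[offset, limit] || []
end

# Restructured sketch of the offset/limit loop; not the gem's code.
def sketch_batches_by_range(rows, batch_size, start_offset: 0, end_limit: nil)
  processed = 0
  loop do
    # Never ask for more rows than the in-scope limit still allows.
    remaining = end_limit.nil? ? batch_size : end_limit - processed
    asked = [batch_size, remaining].min
    break if asked <= 0

    records = fetch_window(rows, start_offset + processed, asked)
    break if records.empty?          # an empty result means we are done

    yield records
    processed += records.length
    break if records.length < asked  # a short batch was the last one
  end
end

rows = (1..10).to_a

limited = []
sketch_batches_by_range(rows, 3, start_offset: 2, end_limit: 7) { |b| limited << b }

unlimited = []
sketch_batches_by_range(rows, 4) { |b| unlimited << b }

p limited
p unlimited
```

With an in-scope offset of 2 and limit of 7, the final batch is trimmed to a single row so the total never exceeds the limit. Without a limit, the loop stops at the first short or empty batch.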
#each_by_ids(batch_size = DEFAULT_BATCH_SIZE, key = nil) ⇒ Object
Loops through each individual row found by #batches_by_ids, instead of each batch. See #batches_by_ids for an explanation of its algorithm.
# File 'lib/each_batched.rb', line 63
def each_by_ids(batch_size=DEFAULT_BATCH_SIZE, key=nil)
  batches_by_ids(batch_size, key) { |batch| batch.each { |row| yield row } }
end
#each_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object
Loops through each individual row found by #batches_by_range, instead of each batch. See #batches_by_range for an explanation of its algorithm.
# File 'lib/each_batched.rb', line 42
def each_by_range(batch_size=DEFAULT_BATCH_SIZE)
  batches_by_range(batch_size) { |batch| batch.each { |row| yield row } }
end
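Both each_by_* methods follow the same delegation pattern: call the batch method and re-yield every row of every batch. A minimal sketch of that pattern in plain Ruby, using hypothetical `sketch_batches` and `sketch_each` methods over an Array in place of a scope:

```ruby
# Hypothetical batch method: yields fixed-size slices of an Array.
def sketch_batches(items, batch_size)
  items.each_slice(batch_size) { |batch| yield batch }
end

# Per-row wrapper: delegates to the batch method and flattens the batches.
def sketch_each(items, batch_size)
  sketch_batches(items, batch_size) { |batch| batch.each { |row| yield row } }
end

batches = []
sketch_batches(%w[a b c], 2) { |batch| batches << batch }

rows = []
sketch_each(%w[a b c], 2) { |row| rows << row }

p batches
p rows
```

The caller sees one row at a time, while the underlying fetches still happen one batch at a time.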