Module: PluckInBatches::Extensions::RelationExtension

Defined in: lib/pluck_in_batches/extensions.rb

Instance Method Summary

#pluck_each(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, order: :asc, cursor_column: primary_key, &block) ⇒ Object

Yields each set of values corresponding to the specified columns that was found by the passed options.

#pluck_in_batches(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, cursor_column: primary_key, order: :asc, &block) ⇒ Object

Yields each batch of values corresponding to the specified columns that was found by the passed options as an array.

Instance Method Details

#pluck_each(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, order: :asc, cursor_column: primary_key, &block) ⇒ `Object`

Yields each set of values corresponding to the specified columns that was found by the passed options. If a single column is specified, the block receives its value; if several columns are specified, it receives an array of values.

See #pluck_in_batches for all the details.
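The difference between the single-column and multi-column cases can be pictured with a small in-memory sketch. The helper `sketch_pluck_each` and the sample rows are hypothetical stand-ins for illustration only; they are not part of the gem and do not query a database.

```ruby
# Hypothetical sample data standing in for table rows.
rows = [
  { id: 1, name: "Ann", email: "ann@example.com" },
  { id: 2, name: "Bob", email: "bob@example.com" }
]

# Sketch of the yielded shapes only (not the gem's implementation):
# one column yields a bare value, several columns yield an array per row.
def sketch_pluck_each(rows, *columns)
  rows.each do |row|
    values = columns.map { |column| row.fetch(column) }
    yield(columns.size == 1 ? values.first : values)
  end
end

emails = []
sketch_pluck_each(rows, :email) { |email| emails << email }

pairs = []
sketch_pluck_each(rows, :name, :email) { |name, email| pairs << "#{name} <#{email}>" }
```

With several columns, the block can destructure the yielded array into one parameter per column, as the second call does.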

# File 'lib/pluck_in_batches/extensions.rb', line 16

def pluck_each(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, order: :asc, cursor_column: primary_key, &block)
  iterator = Iterator.new(self)
  iterator.each(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
end

#pluck_in_batches(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, cursor_column: primary_key, order: :asc, &block) ⇒ `Object`

Yields each batch of values corresponding to the specified columns that was found by the passed options as an array.

User.where("age > 21").pluck_in_batches(:email) do |emails|
  jobs = emails.map { |email| PartyReminderJob.new(email) }
  ActiveJob.perform_all_later(jobs)
end

If you do not provide a block to #pluck_in_batches, it will return an Enumerator for chaining with other methods:

User.pluck_in_batches(:name, :email).with_index do |group, index|
  puts "Processing group ##{index}"
  jobs = group.map { |name, email| PartyReminderJob.new(name, email) }
  ActiveJob.perform_all_later(jobs)
end
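The block-or-Enumerator behavior follows a common Ruby idiom, which can be sketched without a database. The class `BatchSource` below is a hypothetical stand-in that slices an in-memory array; it only illustrates the calling pattern and is not how the gem fetches data.

```ruby
# Hypothetical stand-in: batches an in-memory array instead of querying a database.
class BatchSource
  def initialize(values)
    @values = values
  end

  # Yields arrays of at most batch_size values.
  # Without a block, returns an Enumerator over the same batches.
  def each_batch(batch_size: 2)
    return enum_for(:each_batch, batch_size: batch_size) unless block_given?

    @values.each_slice(batch_size) { |batch| yield batch }
  end
end

source = BatchSource.new(%w[a b c d e])

# No block given, so an Enumerator comes back and can be chained.
enumerator = source.each_batch(batch_size: 2)
indexed = enumerator.with_index.map { |batch, index| [index, batch] }
```

Because the Enumerator is lazy about producing batches, chaining `with_index` does not load all batches before the first one is processed.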

Options

:batch_size - Specifies the size of the batch. Defaults to 1000.
:of - Same as :batch_size.
:start - Specifies the primary key value to start from, inclusive of the value.
:finish - Specifies the primary key value to end at, inclusive of the value.
:error_on_ignore - Overrides the application config to specify if an error should be raised when an order is present in the relation.
:cursor_column - Specifies the column(s) on which the iteration should be done. This column(s) should be orderable (e.g. an integer or string). Defaults to primary key.
:order - Specifies the cursor column(s) order (can be :asc or :desc or an array consisting of :asc or :desc). Defaults to :asc.
```
class Book < ActiveRecord::Base
  self.primary_key = [:author_id, :version]
end

Book.pluck_in_batches(:title, order: [:asc, :desc])
```
In the above code, author_id is sorted in ascending order and version in descending order.
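How :start, :finish, :batch_size and :order interact can be pictured with an in-memory sketch. The helper `sketch_batches` is hypothetical and handles only a single cursor column; it assumes both bounds are inclusive, as the option descriptions state, and that with :desc the :start value acts as the upper bound. It is not the gem's implementation.

```ruby
# Hypothetical in-memory illustration of cursor-based batching.
# Returns an array of batches, each an array of plucked values.
def sketch_batches(rows, column, cursor: :id, start: nil, finish: nil, batch_size: 1000, order: :asc)
  ascending = order == :asc
  sorted = rows.sort_by { |row| row.fetch(cursor) }
  sorted.reverse! unless ascending

  # :start and :finish are inclusive bounds on the cursor column.
  in_range = sorted.select do |row|
    value = row.fetch(cursor)
    after_start = start.nil? || (ascending ? value >= start : value <= start)
    before_finish = finish.nil? || (ascending ? value <= finish : value >= finish)
    after_start && before_finish
  end

  in_range.each_slice(batch_size).map { |batch| batch.map { |row| row.fetch(column) } }
end

rows = (1..7).map { |id| { id: id, email: "user#{id}@example.com" } }

ascending_batches = sketch_batches(rows, :email, start: 2, finish: 6, batch_size: 2)
descending_batches = sketch_batches(rows, :email, start: 6, finish: 2, batch_size: 2, order: :desc)
```

In this sketch the ascending call covers ids 2 through 6 in batches of two, and the descending call walks the same ids from 6 down to 2.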

Limits are honored, and if present there is no requirement for the batch size: it can be less than, equal to, or greater than the limit.
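As a rough picture of how a limit and a batch size can combine, the sketch below caps an in-memory list at the limit and then slices it into batches. The helper `sketch_limited_batches` is hypothetical, and the exact batch boundaries the gem produces are an assumption here; the point is only that the total never exceeds the limit, whatever the batch size.

```ruby
# Hypothetical illustration: the limit caps the total number of values,
# and the batch size only decides how that total is split up.
def sketch_limited_batches(values, limit:, batch_size:)
  values.first(limit).each_slice(batch_size).to_a
end

ids = (1..10).to_a

smaller = sketch_limited_batches(ids, limit: 5, batch_size: 2)  # batch size below the limit
equal   = sketch_limited_batches(ids, limit: 5, batch_size: 5)  # batch size equal to the limit
larger  = sketch_limited_batches(ids, limit: 5, batch_size: 10) # batch size above the limit
```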

The :start and :finish options are especially useful if you want multiple workers dealing with the same processing queue. You can make worker 1 handle all the records between id 1 and 9999 and worker 2 handle from 10000 and beyond by setting the :start and :finish options on each worker.
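One way to derive per-worker bounds is to split a known id range into contiguous, non-overlapping slices. The helper `worker_ranges` below is hypothetical and not part of the gem; it assumes integer ids and that the minimum and maximum ids are known up front.

```ruby
# Hypothetical helper: splits an inclusive id range into contiguous
# { start:, finish: } option hashes, at most one per worker.
def worker_ranges(min_id, max_id, workers)
  span = ((max_id - min_id + 1) / workers.to_f).ceil

  ranges = (0...workers).map do |index|
    first = min_id + index * span
    last = [first + span - 1, max_id].min
    { start: first, finish: last }
  end

  # Drop slices that would begin past the end of the range.
  ranges.reject { |range| range[:start] > max_id }
end

two_workers = worker_ranges(1, 20, 2)
three_workers = worker_ranges(1, 10, 3)
```

Each hash could then be splatted into a call such as `User.pluck_in_batches(:email, **range)` on the corresponding worker, since both bounds are inclusive and the slices do not overlap.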

# Let's process from record 10_000 on.
User.pluck_in_batches(:email, start: 10_000) do |emails|
  jobs = emails.map { |email| PartyReminderJob.new(email) }
  ActiveJob.perform_all_later(jobs)
end

NOTE: Order can be ascending (:asc) or descending (:desc). It is automatically set to ascending on the primary key (“id ASC”). This also means that this method only works when the primary key is orderable (e.g. an integer or string).

NOTE: By its nature, batch processing is subject to race conditions if other processes are modifying the database.

# File 'lib/pluck_in_batches/extensions.rb', line 81

def pluck_in_batches(*columns, start: nil, finish: nil, of: 1000, batch_size: of, error_on_ignore: nil, cursor_column: primary_key, order: :asc, &block)
  iterator = Iterator.new(self)
  iterator.each_batch(*columns, start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, cursor_column: cursor_column, order: order, &block)
end