acts_as_background_solr Rails plugin

This plugin extends the acts_as_solr plugin to provide a disconnected background job that synchronizes data with Solr in batch. acts_as_solr works by sending changes to Solr immediately after each object is changed. While this is nice because changes are immediately searchable, it has a few drawbacks:

* Invoking commit on Solr requires a new searcher to be opened, which
  is slow

* There is no way to keep track of an object that was saved but whose
  notification to Solr failed (or whose database transaction rolled
  back)

acts_as_background_solr extends the acts_as_solr plugin to move this synchronization into a background process.

There is one other major change. acts_as_solr calls Model.find for each result in the result set. acts_as_background_solr instead reconstitutes your objects from the attributes stored in Solr, completely avoiding database hits when searching against Solr. This requires that you modify schema.xml to store the fields you are indexing.
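
For example, acts_as_solr indexes attributes as type-suffixed dynamic fields; making the text fields stored might look like the following (illustrative only -- check the schema.xml bundled with your version of acts_as_solr for the actual field definitions):

<!-- store indexed text fields so search results can be reconstituted
     without hitting the database -->
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>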

Installation

Use this in place of acts_as_solr in your models, e.g.

acts_as_background_solr

Example:

require 'acts_as_background_solr'
class User < ActiveRecord::Base
  acts_as_background_solr
end

The options :background, :if and :auto_commit will be automatically overridden.

Each model can track changes in one of two ways:

* Default: Explicitly log changes from the model using listeners

* Database triggers: You can instead use database triggers on your
  model tables to track changes. If you use this method, set the option
  :db_triggers => true (a trigger sketch follows the table definition
  below)

Example:

class User < ActiveRecord::Base
  acts_as_background_solr :additional_fields => [:first_name, :last_name],
                          :exclude_fields => ['encrypted_password'],
                          :db_triggers => true
end

This plugin depends on the following table structure (this is written for PostgreSQL):

create sequence solr_sync_records_seq start with 1;

create table solr_sync_records (
  id                integer constraint solr_sync_records_id_pk primary key default nextval('solr_sync_records_seq'),
  model             varchar(50) not null,
  model_id          integer not null,
  created_at        timestamp default now() not null
);

-- common access path
create index solr_sync_records_model_id_idx on solr_sync_records(model, model_id);
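
If you use the :db_triggers option, your triggers are responsible for inserting a row into solr_sync_records for each change. A minimal PostgreSQL sketch for the User model follows; the function and trigger names are illustrative, not part of the plugin:

-- illustrative trigger: log every change to users for background sync
create or replace function log_user_solr_sync() returns trigger as $$
begin
  if (TG_OP = 'DELETE') then
    insert into solr_sync_records (model, model_id) values ('User', old.id);
  else
    insert into solr_sync_records (model, model_id) values ('User', new.id);
  end if;
  return null;
end;
$$ language plpgsql;

create trigger users_solr_sync
  after insert or update or delete on users
  for each row execute procedure log_user_solr_sync();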

To update the actual data stored in Solr, you need to invoke the following method:

SolrBatch.process_all

We’re using openwfe to schedule this job to run every few minutes, but any scheduler should work.
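
For example, a minimal rake task (the task name and file location are arbitrary) that cron or any other scheduler could invoke periodically:

# lib/tasks/solr_batch.rake -- task name and location are illustrative
namespace :solr do
  desc "Synchronize pending changes with Solr in batch"
  task :sync => :environment do
    SolrBatch.process_all
  end
end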

This method updates records in bulk, issuing a single commit after a batch of records is updated rather than one commit per record. Each call to SolrBatch.process_all runs up to 10 iterations with a default batch size of 500, so a single call updates up to 5,000 records per model.

If you want to change the batch size, provide your own implementation of SolrBatch.process_all.
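
A sketch of what such an override might look like. The process_batch helper below is hypothetical (the internals of SolrBatch are not documented here); it is assumed to synchronize up to the given number of pending records and return how many it handled:

class SolrBatch
  BATCH_SIZE = 1000       # custom batch size (the default is 500)
  MAX_ITERATIONS = 10

  def self.process_all
    MAX_ITERATIONS.times do
      # process_batch is a hypothetical helper; stop once there is
      # nothing left to synchronize
      break if process_batch(BATCH_SIZE) == 0
    end
  end
end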

Authors

Michael Bryzek mbryzek<at>alum.mit.edu

Release Information

Released under the MIT license.