Class: CassandraMapper::Index

Inherits:
Object
Defined in:
lib/cassandra_mapper/indexing.rb

Overview

The fundamental implementation of an index in Cassandra. Once installed into the class to be indexed, CassandraMapper::Index maintains index values for all instances of the indexed class as those instances are written out to the database.

For any given instance of an indexed class, CassandraMapper::Index updates the index information according to the following attributes:

  • The class being indexed should be provided through indexed_class. The index uses an observer under the hood to track state changes per instance, and therefore requires the indexed_class to be provided to hook into the observer/callback machinery. Additionally, the index needs to know the class to instantiate when reading objects out of the index.

  • The column family to contain the indexing data is specified with the column_family attribute. CassandraMapper::Index handles writes/removes to that column family directly; there is no need for a CassandraMapper::Base model fronting the column family.

  • The actual indexed value is determined by invoking the method specified in the index’s source attribute on the object written to the database. If a class should have an index on its :foo attribute, then the index object should have source set to :foo. This determines the row key for the index.

  • Entries can be sorted within the index, provided a sensibly sortable identifier is available for each object. The indexed_identifier attribute specifies the method to call to obtain that sortable identifier, which becomes the column name used within the index row for the given object. The indexed_identifier defaults to :key and does not need to be changed unless you have specific criteria for sorting entries within the index. Like source, the indexed_identifier identifies a method on the object being saved, not a method on the index object itself.

  • The name attribute names the index. It must match the name of an attribute on indexed objects that holds the per-instance index state, an instance of CassandraMapper::Index::State. Without this, index operations will fail: indexing an object requires tracking state changes from one save to the next, so that an update can determine at save time whether the index entry has changed and therefore needs a delete followed by a write.

Say we have the following model class:

class ToBeIndexed < CassandraMapper::Base
  column_family :ToBeIndexed
  maps :key, :type => :simple_uuid

  # We'll be indexing this attribute.
  maps :data

  # and within the index, we'll sort by create date from this attribute.
  maps :created_at, :type => :timestamp, :default => :from_type

  # we'll need this to match up with the :name attribute, as described above.
  def data_index
    @data_index ||= CassandraMapper::Index::State.new
  end

  # we'll use this to generate the sortable identifiers; it'll output
  # a string like "2010-06-02T09:45:21-04:00_47118d04-6e4e-11df-911a-e141fbb809ab".
  # It should be unique to each indexed object, as it includes the object's key.
  # But it is structured so it is effectively sortable according to create timestamp.
  def timestamped_key
    "#{created_at.to_s}_#{key}"
  end
end

We can index this class using the Indexes column family to hold index data.

index = CassandraMapper::Index.new(:indexed_identifier => :timestamped_key,
                                   :source             => :data,
                                   :name               => :data_index,
                                   :indexed_class      => :to_be_indexed,
                                   :column_family      => :Indexes)
# activate it to install the observer and start indexing.
index.activate!

Then supposing we ran this code:

# supposing key 47118d04-6e4e-11df-911a-e141fbb809ab is generated
ToBeIndexed.new(:data => 'this data').save
sleep 1
# say that key 5a7e65fa-6e4f-11df-9554-d05c3d9715f7 is generated
ToBeIndexed.new(:data => 'that data').save
sleep 1
# and finally say that key 68985128-6e4f-11df-8e08-093a2b8b1253 is generated
ToBeIndexed.new(:data => 'this data').save

The resulting index structure in the Indexes column family would look like:

'this data': {
    '2010-06-02T10:01:00-04:00_47118d04-6e4e-11df-911a-e141fbb809ab': '47118d04-6e4e-11df-911a-e141fbb809ab',
    '2010-06-02T10:01:02-04:00_68985128-6e4f-11df-8e08-093a2b8b1253': '68985128-6e4f-11df-8e08-093a2b8b1253'
},
'that data': {
    '2010-06-02T10:01:01-04:00_5a7e65fa-6e4f-11df-9554-d05c3d9715f7': '5a7e65fa-6e4f-11df-9554-d05c3d9715f7'
}

Thus, the Indexes column family could be used to retrieve ToBeIndexed instances that have particular values for :data, and retrieve those instances sorted by create timestamp (thanks to the sortable column names).

Ultimately, the structure that goes to the index column family for an instance of an indexed class would look like this (relative to the index attributes and the instance being indexed):

:source : {
    :indexed_identifier : :key
}
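
As a rough illustration of the template above, the following sketch derives the index entry for a single object in plain Ruby. Record and index_entry are hypothetical names made up for this example; they are not part of CassandraMapper, and no Cassandra connection is involved.

```ruby
# Hypothetical stand-in for an indexed model; not a CassandraMapper class.
Record = Struct.new(:key, :data, :created_at) do
  def timestamped_key
    "#{created_at}_#{key}"
  end
end

# Builds the entry an index would hold for +record+:
#   { source value => { indexed identifier value => record key } }
def index_entry(record, source:, indexed_identifier: :key)
  row_key = record.send(source)
  return {} if row_key.nil?              # nil source values are not indexed
  column = record.send(indexed_identifier)
  { row_key => { column => record.key } }
end

record = Record.new('47118d04', 'this data', '2010-06-02T10:01:00-04:00')

# With a custom sortable identifier.
entry = index_entry(record, source: :data, indexed_identifier: :timestamped_key)

# With the default identifier (:key), column name and value coincide.
default_entry = index_entry(record, source: :data)
```

With the default identifier the column name and the stored value are the same key, which is why a custom indexed_identifier is only worth configuring when a particular ordering within the row matters.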

Defined Under Namespace

Classes: Observer, State

Constant Summary

ATTRS =
[:source, :indexed_class, :column_family, :name, :indexed_identifier]
DEFAULTS =
{:indexed_identifier => :key}

Instance Method Summary

Constructor Details

#initialize(options = {}) ⇒ Index

Returns a new instance of Index.



# File 'lib/cassandra_mapper/indexing.rb', line 266

def initialize(options={})
  opts = DEFAULTS.merge(options)
  ATTRS.each do |attrib|
    value = opts[attrib]
    send(:"#{attrib.to_s}=", value) if not value.nil?
  end
end
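
To show how DEFAULTS interacts with caller-supplied options, here is a minimal stand-alone sketch of the same constructor pattern. DefaultsSketch is a hypothetical class written for this example, and it assumes plain attr_accessor setters; the real class may define its setters differently.

```ruby
# Hypothetical stand-in that mirrors the constructor pattern only.
class DefaultsSketch
  ATTRS    = [:source, :indexed_class, :column_family, :name, :indexed_identifier]
  DEFAULTS = { :indexed_identifier => :key }

  attr_accessor(*ATTRS)

  def initialize(options = {})
    opts = DEFAULTS.merge(options)   # caller-supplied options win over defaults
    ATTRS.each do |attrib|
      value = opts[attrib]
      send(:"#{attrib}=", value) unless value.nil?
    end
  end
end

# No :indexed_identifier given, so the default applies.
plain = DefaultsSketch.new(:source => :data, :name => :data_index)

# An explicit :indexed_identifier overrides the default.
sorted = DefaultsSketch.new(:source => :data, :indexed_identifier => :timestamped_key)
```

Attributes that are neither supplied nor defaulted are simply left unset (nil), since the setter is skipped for nil values.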

Instance Method Details

#_multi_get(values, options) ⇒ Object



# File 'lib/cassandra_mapper/indexing.rb', line 412

def _multi_get(values, options)
  result = Cassandra::OrderedHash.new
  indexes = indexed_class.connection.multi_get(column_family, values, options)
  if indexes
    indexes.values.each do |index|
      result.merge!(index)
    end
  end
  result
end

#_single_get(value, options) ⇒ Object



# File 'lib/cassandra_mapper/indexing.rb', line 408

def _single_get(value, options)
  indexed_class.connection.get(column_family, value, options)
end

#activate! ⇒ Object

Creates the necessary observer for the class to be indexed and thus activates the callbacks for index management.



# File 'lib/cassandra_mapper/indexing.rb', line 355

def activate!
  @observer = Class.new(Observer)
  @observer.activate!(self)
end

#create(instance) ⇒ Object

If the value to be indexed is non-nil, performs an insert into the appropriate column family of the index structure for the instance provided. Also updates the state information at the index’s name on instance to reflect the latest source and indexed identifier values.

This is typically managed under the hood by observer callbacks during the instance lifecycle, but you could invoke it directly if you need to force certain index values to be present.



# File 'lib/cassandra_mapper/indexing.rb', line 310

def create(instance)
  index_key = source_for(instance)
  if not index_key.nil?
    column = indexed_identifier_for(instance)
    instance.connection.insert(column_family, index_key, {column => instance.key})
    state = state_for(instance)
    state.source_value = index_key
    state.identifier_value = column
  end
  instance
end

#get(values, options = {}) ⇒ Object

Retrieve a hash of indexed identifier to row key mappings from the index for all indexed values. The values may be an array of indexed values to check, or a single such value. The result set is collapsed into a single hash, so it cannot be determined which entry came from which indexed value. Additionally, if a particular row key is present under multiple indexed values, it will be redundantly represented here (as redundant values in the result hash).

The options are passed directly to the underlying Cassandra get/multi_get invocations, and can be used to control paging through results, result set size limits, etc.



# File 'lib/cassandra_mapper/indexing.rb', line 369

def get(values, options={})
  case values
    when Array
      if values.size == 1
        _single_get(values[0], options)
      else
        _multi_get(values, options)
      end
    else
      _single_get(values, options)
  end
end
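
The collapsing behavior described above can be sketched with plain Ruby hashes. This assumes the merge performed on Cassandra::OrderedHash behaves like Hash#merge! for this purpose; the rows and column names below are invented for illustration.

```ruby
# Invented index rows, keyed by indexed value, standing in for a
# multi_get result. 'key-1' is deliberately present in both rows.
rows = {
  'this data' => { 't1_key-1' => 'key-1', 't3_key-3' => 'key-3' },
  'that data' => { 't2_key-1' => 'key-1' }
}

# Collapse every row into a single hash, one row after another.
merged = {}
rows.each_value { |row| merged.merge!(row) }

columns  = merged.keys     # row grouping is gone; order is row by row
row_keys = merged.values   # 'key-1' shows up twice
```

Note that in this sketch the merged entries are ordered row by row rather than re-sorted across rows, so a globally sorted result over several indexed values would need an explicit sort by the caller.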

#indexed_identifier_for(instance) ⇒ Object

Returns the “indexed identifier” (the sort-friendly column name) for instance based on the method specified in the receiver’s indexed_identifier attribute.

This could be overridden to have more sophisticated sort logic within an index for a particular index object.



# File 'lib/cassandra_mapper/indexing.rb', line 298

def indexed_identifier_for(instance)
  instance.send(indexed_identifier)
end
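
One possible override is sketched below: a newest-first ordering obtained by inverting the timestamp. IdentifierBase stands in for the relevant part of CassandraMapper::Index, and NewestFirstIndex, Item and MAX_SECONDS are hypothetical names for this example. It assumes created_at responds to to_i and that column names compare as strings.

```ruby
# Minimal stand-in for the part of the index class used here.
class IdentifierBase
  attr_accessor :indexed_identifier

  def indexed_identifier_for(instance)
    instance.send(indexed_identifier)
  end
end

# Hypothetical subclass: later timestamps yield smaller column names,
# so the most recent entries sort first.
class NewestFirstIndex < IdentifierBase
  MAX_SECONDS = 10**10

  def indexed_identifier_for(instance)
    inverted = MAX_SECONDS - instance.created_at.to_i
    format('%010d_%s', inverted, instance.key)   # zero-padded for string sorting
  end
end

Item  = Struct.new(:key, :created_at)
older = Item.new('a', Time.at(1000))
newer = Item.new('b', Time.at(2000))

index = NewestFirstIndex.new
identifiers = [older, newer].map { |item| index.indexed_identifier_for(item) }.sort
```

Appending the object's key keeps the identifier unique even when two objects share a timestamp.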

#keys(values, options = {}) ⇒ Object

Retrieve the row keys for objects that have the indexed values specified in values. The handling of values and options is done by the CassandraMapper::Index#get method, and row keys from the result set are collapsed into a unique list matching the original sort order.

The resulting list could be passed to a find call or manipulated in some other delightful fashion.



# File 'lib/cassandra_mapper/indexing.rb', line 389

def keys(values, options={})
  get(values, options).values.uniq
end
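
The effect of values.uniq on a result hash can be seen in isolation with a plain Hash; the entries below are invented for illustration.

```ruby
# Invented result of a get call: indexed identifier => row key.
index_result = {
  '2010-06-02T10:01:00_a' => 'key-a',
  '2010-06-02T10:01:01_b' => 'key-b',
  '2010-06-02T10:01:02_a' => 'key-a'
}

# Array#uniq keeps the first occurrence of each value, so the
# original ordering of the result set is preserved.
row_keys = index_result.values.uniq
```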

#objects(values, options = {}) ⇒ Object

Retrieve the objects that have the indexed values specified in values. The operations are analogous to CassandraMapper::Index#keys, except that a find call is made on the receiver’s indexed_class.

If you are potentially dealing with large sets of objects, consider using the :start, :finish, and :count options supported by the underlying Cassandra#get and Cassandra#multi_get functionality.



# File 'lib/cassandra_mapper/indexing.rb', line 400

def objects(values, options={})
  if ids = keys(values, options) and ids.size > 0
    indexed_class.find(ids, {:allow_missing => true})
  else
    []
  end
end

#remove(instance) ⇒ Object

Given non-nil values in the instance’s index state for the index’s name, performs a :remove against the appropriate column family to remove that old state from the index. Also clears the index state object for the instance.

Like :create, this is intended to be managed automatically during the instance lifecycle, but you could invoke it directly if necessary. In this case, take care to note that the remove acts against the index state object at name on instance, not against the current source/identifier values.



# File 'lib/cassandra_mapper/indexing.rb', line 330

def remove(instance)
  state = state_for(instance)
  unless state.source_value.nil? or state.identifier_value.nil?
    instance.connection.remove(column_family, state.source_value, state.identifier_value)
    state.source_value = nil
    state.identifier_value = nil
  end
  instance
end

#source_for(instance) ⇒ Object

Returns the “source” value (the index row key) for instance based on the method specified in the receiver’s source attribute.

This could be overridden to have more sophisticated index row key generation techniques applied for a particular index.



# File 'lib/cassandra_mapper/indexing.rb', line 289

def source_for(instance)
  instance.send(source)
end
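
As one example of such an override, the sketch below normalizes the source value so that lookups are case-insensitive. SourceBase stands in for the relevant part of CassandraMapper::Index; CaseInsensitiveIndex and Entry are hypothetical names for this example.

```ruby
# Minimal stand-in for the part of the index class used here.
class SourceBase
  attr_accessor :source

  def source_for(instance)
    instance.send(source)
  end
end

# Hypothetical subclass: trims and downcases the value used as the row key.
class CaseInsensitiveIndex < SourceBase
  def source_for(instance)
    value = super
    # Keep nil as nil so that unindexed (nil) values stay unindexed.
    value.nil? ? nil : value.to_s.strip.downcase
  end
end

Entry = Struct.new(:data)

index = CaseInsensitiveIndex.new
index.source = :data

row_key = index.source_for(Entry.new('  This Data '))
nil_key = index.source_for(Entry.new(nil))
```

With an override like this, values passed to lookups would need the same normalization, since the index row keys are stored in normalized form.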

#state_for(instance) ⇒ Object

Returns the CassandraMapper::Index::State instance pertaining to the receiver on instance, determined by the receiver’s name attribute.

The instance is expected to implement that interface, ensuring that an accessor with name matching index’s name returns an object conforming to the state object interface.



# File 'lib/cassandra_mapper/indexing.rb', line 280

def state_for(instance)
  instance.send(name)
end

#update(instance) ⇒ Object

If the source or indexed identifier values are found to have changed on instance (current values compared to the state preserved in the index state object at the index’s name on instance), performs a :remove followed by a :create to keep the index up to date.



# File 'lib/cassandra_mapper/indexing.rb', line 344

def update(instance)
  state = state_for(instance)
  if state.source_value != source_for(instance) or state.identifier_value != indexed_identifier_for(instance)
    remove(instance)
    create(instance)
  end
  instance
end
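
The interplay of create, remove and update can be sketched against an in-memory store. Everything below (MemoryStore, Doc, LifecycleIndex, and this State struct) is a hypothetical stand-in written for illustration, with the source fixed to data and the identifier fixed to key; it does not use CassandraMapper or a Cassandra connection.

```ruby
# Stand-in for the per-instance index state object.
State = Struct.new(:source_value, :identifier_value)

# Stand-in for the index column family: row key => { column => value }.
class MemoryStore
  attr_reader :rows

  def initialize
    @rows = Hash.new { |hash, row_key| hash[row_key] = {} }
  end

  def insert(row_key, columns)
    @rows[row_key].merge!(columns)
  end

  def remove(row_key, column)
    @rows[row_key].delete(column)
    @rows.delete(row_key) if @rows[row_key].empty?
  end
end

# Stand-in for an indexed model with a state accessor named data_index.
Doc = Struct.new(:key, :data) do
  def data_index
    @data_index ||= State.new
  end
end

class LifecycleIndex
  def initialize(store)
    @store = store
  end

  def create(doc)
    return doc if doc.data.nil?
    @store.insert(doc.data, { doc.key => doc.key })
    doc.data_index.source_value = doc.data
    doc.data_index.identifier_value = doc.key
    doc
  end

  # Acts on the remembered state, not on the document's current values.
  def remove(doc)
    state = doc.data_index
    unless state.source_value.nil? || state.identifier_value.nil?
      @store.remove(state.source_value, state.identifier_value)
      state.source_value = nil
      state.identifier_value = nil
    end
    doc
  end

  def update(doc)
    state = doc.data_index
    if state.source_value != doc.data || state.identifier_value != doc.key
      remove(doc)
      create(doc)
    end
    doc
  end
end

store = MemoryStore.new
index = LifecycleIndex.new(store)
doc   = Doc.new('key-1', 'this data')

index.create(doc)
rows_after_create = store.rows.keys

doc.data = 'that data'   # the indexed value changes before the next save
index.update(doc)
rows_after_update = store.rows.keys
```

Because remove works from the remembered state, the old entry under 'this data' is deleted even though the document no longer holds that value when update runs.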