Class: CassandraMapper::Index
Defined in: lib/cassandra_mapper/indexing.rb
Overview
The fundamental implementation of an index in Cassandra. Once installed into the class to be indexed, CassandraMapper::Index maintains index values for all instances of the indexed class as those instances are written out to the database.
For any given instance of an indexed class, CassandraMapper::Index will update the index information based on the following criteria:
- The class being indexed should be provided through indexed_class. The index uses an observer under the hood to track state changes per instance, and therefore requires the indexed_class to be provided to hook into the observer/callback machinery. Additionally, the index needs to know the class to instantiate when reading objects out of the index.
- The column family to contain the indexing data is specified with the column_family attribute. CassandraMapper::Index handles writes/removes to that column family directly; there is no need for a CassandraMapper::Base model fronting the column family.
- The actual indexed value is determined by invoking the method specified in the index’s source attribute on the object written to the database. If a class should have an index on its :foo attribute, then the index object should have source set to :foo. This determines the row key for the index.
- Entries can be sorted within the index, provided an identifier is available per object that is sensibly sortable. The indexed_identifier attribute specifies the method to call to provide that sortable identifier, which will correspond to the column name used within the index row for the given object. The indexed_identifier defaults to :key, and does not need to be changed unless you have some criteria for sorting entries within the index. Like source, the indexed_identifier identifies a method on the object being saved, not a method on the index object itself.
- The name identifies the name of the index. This ultimately must match up to the name of an attribute on objects being indexed that holds the instance index state information, in an instance of CassandraMapper::Index::State. Without this, index operations will fail, because indexing an object requires tracking state changes from one save to the next (to determine at save time, in the case of an update, whether the index needs to be changed and consequently requires a delete and a write).
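The contract between the index's name and the per-instance state object can be illustrated in plain Ruby. This is a sketch under assumptions: IndexState is a hypothetical Struct standing in for CassandraMapper::Index::State, modeling only the source_value and identifier_value accessors that the create/remove/update methods read and write, and Widget is a hypothetical class in place of a real CassandraMapper::Base model.

```ruby
# Hypothetical stand-in for CassandraMapper::Index::State: only the two
# accessors that the index methods touch are modeled.
IndexState = Struct.new(:source_value, :identifier_value)

# Hypothetical model; a real one would subclass CassandraMapper::Base.
class Widget
  attr_accessor :key, :data

  def initialize(key, data)
    @key = key
    @data = data
  end

  # The accessor name must match the index's :name attribute.
  def data_index
    @data_index ||= IndexState.new
  end
end

index_name = :data_index
widget = Widget.new("k1", "this data")

# Mirrors state_for: look up the state object by the index's name.
state = widget.send(index_name)
state.source_value = widget.send(:data)    # what source_for would return
state.identifier_value = widget.send(:key) # the default indexed_identifier is :key

p state.source_value     # prints "this data"
p state.identifier_value # prints "k1"
```

Memoizing the state object (the `||=`) matters: the index compares the stored values against the current ones on the next save, so the same object has to come back on every call.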
Say we have the following model class:
class ToBeIndexed < CassandraMapper::Base
  column_family :ToBeIndexed
  maps :key, :type => :simple_uuid
  # We'll be indexing this attribute.
  maps :data
  # And within the index, we'll sort by create date from this attribute.
  maps :created_at, :type => :timestamp, :default => :from_type
  # We'll need this to match up with the :name attribute, as described above.
  def data_index
    @data_index ||= CassandraMapper::Index::State.new
  end
  # We'll use this to generate the sortable identifiers; it'll output
  # a string like "2010-06-02T09:45:21-04:00_47118d04-6e4e-11df-911a-e141fbb809ab".
  # It should be unique to each indexed object, as it includes the object's key,
  # but it is structured so it is effectively sortable by create timestamp.
  def timestamped_key
    "#{created_at.to_s}_#{key}"
  end
end
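The comments above claim the identifiers are effectively sortable by creation time. A quick plain-Ruby check of that claim, assuming created_at renders as an ISO 8601 string (produced here with Time#iso8601) and that every timestamp carries the same UTC offset; lexical order only tracks chronological order when the offsets match.

```ruby
require 'time'

# Hypothetical creation times, all with the same -04:00 offset, in save-unrelated order.
entries = [
  [Time.iso8601("2010-06-02T10:01:02-04:00"), "68985128-6e4f-11df-8e08-093a2b8b1253"],
  [Time.iso8601("2010-06-02T10:01:00-04:00"), "47118d04-6e4e-11df-911a-e141fbb809ab"],
  [Time.iso8601("2010-06-02T10:01:01-04:00"), "5a7e65fa-6e4f-11df-9554-d05c3d9715f7"]
]

# Build identifiers the way timestamped_key does: "<timestamp>_<key>".
identifiers = entries.map { |time, key| "#{time.iso8601}_#{key}" }

# A plain string sort yields chronological order.
sorted = identifiers.sort
sorted.each { |id| puts id }
```

Appending the key keeps identifiers unique when two objects share a timestamp, while the timestamp prefix still dominates the ordering.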
We can index ToBeIndexed using the Indexes column family to hold index data:
index = CassandraMapper::Index.new(:indexed_identifier => :timestamped_key,
                                   :source => :data,
                                   :name => :data_index,
                                   :indexed_class => :to_be_indexed,
                                   :column_family => :Indexes)
# Activate it to install the observer and start indexing.
index.activate!
Then suppose we ran this code:
# suppose key 47118d04-6e4e-11df-911a-e141fbb809ab is generated
ToBeIndexed.new(:data => 'this data').save
sleep 1
# say that key 5a7e65fa-6e4f-11df-9554-d05c3d9715f7 is generated
ToBeIndexed.new(:data => 'that data').save
sleep 1
# and finally, say key 68985128-6e4f-11df-8e08-093a2b8b1253 is generated
ToBeIndexed.new(:data => 'this data').save
The resulting index structure in the Indexes column family would look like:
'this data': {
  '2010-06-02T10:01:00-04:00_47118d04-6e4e-11df-911a-e141fbb809ab': '47118d04-6e4e-11df-911a-e141fbb809ab',
  '2010-06-02T10:01:02-04:00_68985128-6e4f-11df-8e08-093a2b8b1253': '68985128-6e4f-11df-8e08-093a2b8b1253'
},
'that data': {
  '2010-06-02T10:01:01-04:00_5a7e65fa-6e4f-11df-9554-d05c3d9715f7': '5a7e65fa-6e4f-11df-9554-d05c3d9715f7'
}
Thus, the Indexes column family could be used to retrieve ToBeIndexed instances that have particular values for :data, and to retrieve those instances sorted by create timestamp (thanks to the sortable column names).
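The same three saves can be replayed against an in-memory Hash of Hashes to show how the index rows take shape and how sorted retrieval falls out of the column names. This is a sketch, not the library's storage code: it assumes the Indexes column family uses a byte/UTF8-style comparator, emulated here by sorting column names as strings.

```ruby
# A plain Hash of Hashes stands in for the Indexes column family:
# row key (indexed value) => { column name (indexed identifier) => row key }.
indexes = Hash.new { |hash, row| hash[row] = {} }

# The three saves from the example, as [data, created_at, key] triples.
saves = [
  ["this data", "2010-06-02T10:01:00-04:00", "47118d04-6e4e-11df-911a-e141fbb809ab"],
  ["that data", "2010-06-02T10:01:01-04:00", "5a7e65fa-6e4f-11df-9554-d05c3d9715f7"],
  ["this data", "2010-06-02T10:01:02-04:00", "68985128-6e4f-11df-8e08-093a2b8b1253"]
]

saves.each do |data, created_at, key|
  indexes[data]["#{created_at}_#{key}"] = key
end

# Cassandra returns columns in comparator order; sorting the column
# names emulates a string comparator.
this_data_keys = indexes["this data"].sort.map { |_column, key| key }
p this_data_keys
```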
Ultimately, the structure that goes to the index column family for an instance of an indexed class would look like this (relative to the index attributes and the instance being indexed):
:source : {
:indexed_identifier : :key
}
Constant Summary
- ATTRS =
[:source, :indexed_class, :column_family, :name, :indexed_identifier]
- DEFAULTS =
{:indexed_identifier => :key}
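The constructor body is not shown on this page, so the following is only a guess at how ATTRS and DEFAULTS might combine: caller options are merged over DEFAULTS and each attribute named in ATTRS is assigned. SketchIndex is a hypothetical class, not the library's implementation.

```ruby
# Hypothetical reimplementation; the real constructor is not shown on this page.
class SketchIndex
  ATTRS = [:source, :indexed_class, :column_family, :name, :indexed_identifier]
  DEFAULTS = {:indexed_identifier => :key}

  attr_accessor(*ATTRS)

  def initialize(options = {})
    # Caller-supplied options win over the defaults.
    merged = DEFAULTS.merge(options)
    ATTRS.each { |attr| send("#{attr}=", merged[attr]) }
  end
end

index = SketchIndex.new(:source => :data, :name => :data_index, :column_family => :Indexes)
p index.indexed_identifier # prints :key
```

Under this reading, the indexed_identifier only needs to be passed when the default :key ordering is not good enough, which matches the description of DEFAULTS above.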
Instance Method Summary
- #_multi_get(values, options) ⇒ Object
- #_single_get(value, options) ⇒ Object
- #activate! ⇒ Object
  Creates the necessary observer for the class to be indexed and thus activates the callbacks for index management.
- #create(instance) ⇒ Object
  If the value to be indexed is non-nil, performs an insert into the appropriate column family of the index structure for the instance provided.
- #get(values, options = {}) ⇒ Object
  Retrieve a hash of indexed identifier to row key mappings from the index for all indexed values.
- #indexed_identifier_for(instance) ⇒ Object
  Returns the “indexed identifier” (the sort-friendly column name) for instance based on the method specified in the receiver’s indexed_identifier attribute.
- #initialize(options = {}) ⇒ Index (constructor)
  A new instance of Index.
- #keys(values, options = {}) ⇒ Object
  Retrieve the row keys for objects that have the indexed values specified in values.
- #objects(values, options = {}) ⇒ Object
  Retrieve the objects that have the indexed values specified in values.
- #remove(instance) ⇒ Object
  Given non-nil values in the instance’s index state for the index’s name, performs a :remove against the appropriate column family to remove that old state from the index.
- #source_for(instance) ⇒ Object
  Returns the “source” value (the index row key) for instance based on the method specified in the receiver’s source attribute.
- #state_for(instance) ⇒ Object
  Returns the CassandraMapper::Index::State instance pertaining to the receiver on instance, determined by the receiver’s name attribute.
- #update(instance) ⇒ Object
  If the source or indexed identifier values are found to have changed on instance (current values compared to the state preserved in the index state object at the index’s name on instance), performs a :remove followed by a :create to keep the index up to date.
Instance Method Details
#_multi_get(values, options) ⇒ Object
# File 'lib/cassandra_mapper/indexing.rb', line 412
def _multi_get(values, options)
  result = Cassandra::OrderedHash.new
  indexes = indexed_class.connection.multi_get(column_family, values, options)
  if indexes
    indexes.values.each do |index|
      result.merge!(index)
    end
  end
  result
end
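The merge loop in _multi_get is what collapses several index rows into one result. The sketch below replays it with plain Ruby Hashes (insertion-ordered since Ruby 1.9), assuming Cassandra::OrderedHash behaves the same way under merge!; the shortened keys are hypothetical, for readability. Note that the merged result follows row order and then column order within each row, not one global sort over all identifiers.

```ruby
# Plain Hashes stand in for the per-row results returned by multi_get.
rows = {
  "this data" => {"2010-06-02T10:01:00-04:00_4711" => "4711",
                  "2010-06-02T10:01:02-04:00_6898" => "6898"},
  "that data" => {"2010-06-02T10:01:01-04:00_5a7e" => "5a7e"}
}

# Same loop as _multi_get: every row's columns land in one hash, so the
# caller can no longer tell which indexed value a column came from.
result = {}
rows.values.each { |index| result.merge!(index) }

p result.keys.size # prints 3
```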
#_single_get(value, options) ⇒ Object
# File 'lib/cassandra_mapper/indexing.rb', line 408
def _single_get(value, options)
  indexed_class.connection.get(column_family, value, options)
end
#activate! ⇒ Object
Creates the necessary observer for the class to be indexed and thus activates the callbacks for index management.
# File 'lib/cassandra_mapper/indexing.rb', line 355
def activate!
  @observer = Class.new(Observer)
  @observer.activate!(self)
end
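activate! builds a fresh anonymous subclass of Observer with Class.new(Observer) and activates it for this index. The Observer's callbacks are not documented on this page, so the following is only an assumption about how such an observer could route lifecycle events to the index: the after_create/after_update/after_destroy names and the SketchObserver and RecordingIndex classes are all hypothetical.

```ruby
# Hypothetical observer; the real Observer's callbacks are not shown here.
class SketchObserver
  def initialize(index)
    @index = index
  end

  # Assumed mapping of lifecycle events to index operations.
  def after_create(instance)
    @index.create(instance)
  end

  def after_update(instance)
    @index.update(instance)
  end

  def after_destroy(instance)
    @index.remove(instance)
  end
end

# A recording stub in place of a real CassandraMapper::Index.
class RecordingIndex
  attr_reader :calls

  def initialize
    @calls = []
  end

  [:create, :update, :remove].each do |op|
    define_method(op) do |instance|
      @calls << [op, instance]
      instance
    end
  end
end

index = RecordingIndex.new
observer = SketchObserver.new(index)
observer.after_create(:obj)
observer.after_update(:obj)
observer.after_destroy(:obj)
p index.calls.map(&:first) # prints [:create, :update, :remove]
```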
#create(instance) ⇒ Object
If the value to be indexed is non-nil, performs an insert into the appropriate column family of the index structure for the instance provided. Also updates the state information at the index’s name on instance to reflect the latest source and indexed identifier values.
This is typically managed under the hood by observer callbacks during the instance lifecycle, but you could invoke it directly if you need to force certain index values to be present.
# File 'lib/cassandra_mapper/indexing.rb', line 310
def create(instance)
  index_key = source_for(instance)
  if not index_key.nil?
    column = indexed_identifier_for(instance)
    instance.connection.insert(column_family, index_key, {column => instance.key})
    state = state_for(instance)
    state.source_value = index_key
    state.identifier_value = column
  end
  instance
end
#get(values, options = {}) ⇒ Object
Retrieve a hash of indexed identifier to row key mappings from the index for all indexed values. The values may be an array of indexed values to check, or a single such value. The result set is collapsed such that it cannot be determined which result corresponds to which index. Additionally, if a particular row key is present in multiple indexes, it’ll be redundantly represented here (as redundant values in the result hash).
The options are passed directly to the underlying Cassandra get/multi_get invocations, and can be used to control paging through results, result set size limits, etc.
# File 'lib/cassandra_mapper/indexing.rb', line 369
def get(values, options={})
  case values
  when Array
    if values.size == 1
      _single_get(values[0], options)
    else
      _multi_get(values, options)
    end
  else
    _single_get(values, options)
  end
end
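The branching in get can be exercised without Cassandra by swapping the two helpers for stubs that report which path ran. DispatchProbe is a hypothetical test class that copies get's body verbatim; only _single_get and _multi_get are replaced.

```ruby
# A stub that reuses get's branching but records which helper would run
# instead of talking to Cassandra.
class DispatchProbe
  def get(values, options = {})
    case values
    when Array
      if values.size == 1
        _single_get(values[0], options)
      else
        _multi_get(values, options)
      end
    else
      _single_get(values, options)
    end
  end

  def _single_get(value, options)
    [:single, value, options]
  end

  def _multi_get(values, options)
    [:multi, values, options]
  end
end

probe = DispatchProbe.new
p probe.get("this data")   # prints [:single, "this data", {}]
p probe.get(["this data"]) # prints [:single, "this data", {}]
p probe.get(["this data", "that data"], :count => 10)
```

A one-element array is unwrapped to a single-row get, and an empty array falls through to _multi_get, since its size is not 1.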
#indexed_identifier_for(instance) ⇒ Object
Returns the “indexed identifier” (the sort-friendly column name) for instance based on the method specified in the receiver’s indexed_identifier attribute.
This could be overridden to have more sophisticated sort logic within an index for a particular index object.
# File 'lib/cassandra_mapper/indexing.rb', line 298
def indexed_identifier_for(instance)
  instance.send(indexed_identifier)
end
#keys(values, options = {}) ⇒ Object
Retrieve the row keys for objects that have the indexed values specified in values. The handling of values and options is done by the CassandraMapper::Index#get method, and row keys from the result set are collapsed into a unique list matching the original sort order.
The resulting list could be passed to a find call or manipulated in some other delightful fashion.
# File 'lib/cassandra_mapper/indexing.rb', line 389
def keys(values, options={})
  get(values, options).values.uniq
end
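The deduplication in keys relies on Array#uniq keeping the first occurrence of each element, which is what preserves the original sort order. The result hash below is hypothetical: the same row key appears under two different identifiers, as the get description warns can happen.

```ruby
# Hypothetical result of get: indexed identifier => row key.
result = {
  "2010-06-02T10:01:00-04:00_a" => "a",
  "2010-06-02T10:01:01-04:00_b" => "b",
  "2010-06-02T10:01:02-04:00_a" => "a"
}

# keys does exactly this: take the values and drop duplicates,
# keeping the first occurrence of each row key.
row_keys = result.values.uniq
p row_keys # prints ["a", "b"]
```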
#objects(values, options = {}) ⇒ Object
Retrieve the objects that have the indexed values specified in values. The operations are analogous to CassandraMapper::Index#keys, except that a find call is made on the receiver’s indexed_class.
If you are potentially dealing with large sets of objects, consider using the :start, :finish, and :count options supported by the underlying Cassandra#get and Cassandra#multi_get functionality.
# File 'lib/cassandra_mapper/indexing.rb', line 400
def objects(values, options={})
  if ids = keys(values, options) and ids.size > 0
    indexed_class.find(ids, {:allow_missing => true})
  else
    []
  end
end
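The guard in objects skips the find call entirely when the index returns no keys. The sketch below exercises that body against stubs; FakeModel and FakeIndex are hypothetical, and the assumption that :allow_missing makes find skip ids with no stored row (for example, stale index entries) is inferred from the option's name, not confirmed by this page.

```ruby
# Hypothetical finder: returns records for known ids and, emulating the
# assumed :allow_missing behavior, silently skips unknown ones.
class FakeModel
  RECORDS = {"a" => "record a", "b" => "record b"}

  def self.find(ids, options = {})
    ids.map { |id| RECORDS[id] }.compact
  end
end

# Hypothetical index with a canned keys result.
class FakeIndex
  def initialize(canned)
    @canned = canned
  end

  def indexed_class
    FakeModel
  end

  def keys(values, options = {})
    @canned
  end

  # Same body as CassandraMapper::Index#objects.
  def objects(values, options = {})
    if ids = keys(values, options) and ids.size > 0
      indexed_class.find(ids, {:allow_missing => true})
    else
      []
    end
  end
end

p FakeIndex.new(["a", "missing", "b"]).objects("this data") # prints ["record a", "record b"]
p FakeIndex.new([]).objects("nothing")                      # prints []
```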
#remove(instance) ⇒ Object
Given non-nil values in the instance’s index state for the index’s name, performs a :remove against the appropriate column family to remove that old state from the index. Also clears the index state object for the instance.
Like :create, this is intended to be managed automatically during the instance lifecycle, but you could invoke it directly if necessary. In this case, take care to note that the remove acts against the index state object at name on instance, not against the current source/identifier values.
# File 'lib/cassandra_mapper/indexing.rb', line 330
def remove(instance)
  state = state_for(instance)
  unless state.source_value.nil? or state.identifier_value.nil?
    instance.connection.remove(column_family, state.source_value, state.identifier_value)
    state.source_value = nil
    state.identifier_value = nil
  end
  instance
end
#source_for(instance) ⇒ Object
Returns the “source” value (the index row key) for instance based on the method specified in the receiver’s source attribute.
This could be overridden to have more sophisticated index row key generation techniques applied for a particular index.
# File 'lib/cassandra_mapper/indexing.rb', line 289
def source_for(instance)
  instance.send(source)
end
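Since source_for is described as an override point, here is a sketch of one such override: a case-insensitive index that normalizes the row key. MiniIndex is a hypothetical minimal base that copies only the source_for and indexed_identifier_for bodies shown on this page; with the real class you would subclass CassandraMapper::Index instead. The override preserves nil, because create skips indexing when the source value is nil.

```ruby
# Hypothetical minimal base mirroring source_for / indexed_identifier_for.
class MiniIndex
  attr_accessor :source, :indexed_identifier

  def initialize(source, indexed_identifier = :key)
    @source = source
    @indexed_identifier = indexed_identifier
  end

  def source_for(instance)
    instance.send(source)
  end

  def indexed_identifier_for(instance)
    instance.send(indexed_identifier)
  end
end

# Case-insensitive index: normalize the row key, but keep nil as nil so
# that unset values are still left out of the index.
class CaseInsensitiveIndex < MiniIndex
  def source_for(instance)
    value = super
    value.nil? ? nil : value.to_s.strip.downcase
  end
end

Record = Struct.new(:key, :data)
index = CaseInsensitiveIndex.new(:data)
p index.source_for(Record.new("k1", "  This Data ")) # prints "this data"
p index.source_for(Record.new("k2", nil))            # prints nil
```

Lookups would then need the same normalization applied to the values passed to get, keys, or objects.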
#state_for(instance) ⇒ Object
Returns the CassandraMapper::Index::State instance pertaining to the receiver on instance, determined by the receiver’s name attribute.
The instance is expected to implement that interface: an accessor whose name matches the index’s name must return an object conforming to the state object interface.
# File 'lib/cassandra_mapper/indexing.rb', line 280
def state_for(instance)
  instance.send(name)
end
#update(instance) ⇒ Object
If the source or indexed identifier values are found to have changed on instance (current values compared to the state preserved in the index state object at the index’s name on instance), performs a :remove followed by a :create to keep the index up to date.
# File 'lib/cassandra_mapper/indexing.rb', line 344
def update(instance)
  state = state_for(instance)
  if state.source_value != source_for(instance) or state.identifier_value != indexed_identifier_for(instance)
    remove(instance)
    create(instance)
  end
  instance
end
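To see create, update, and remove cooperate across an object's lifecycle, the sketch below runs their bodies, copied from this page, against an in-memory connection. FakeConnection, State, Doc, and LifecycleIndex are hypothetical stand-ins; the fake connection models only the insert and remove calls these methods make, and it drops emptied rows purely to keep the printed structure readable (Cassandra itself does not work that way).

```ruby
# Hypothetical in-memory connection: only insert/remove are modeled.
class FakeConnection
  attr_reader :data

  def initialize
    # column family => row key => { column name => value }
    @data = Hash.new { |families, cf| families[cf] = {} }
  end

  def insert(column_family, row, columns)
    (@data[column_family][row] ||= {}).merge!(columns)
  end

  def remove(column_family, row, column)
    rows = @data[column_family]
    return unless rows.key?(row)
    rows[row].delete(column)
    rows.delete(row) if rows[row].empty? # drop emptied rows for readability
  end
end

# Hypothetical stand-in for CassandraMapper::Index::State.
State = Struct.new(:source_value, :identifier_value)

# Hypothetical model exposing what the index reads.
class Doc
  attr_accessor :key, :data, :connection

  def initialize(key, data, connection)
    @key = key
    @data = data
    @connection = connection
  end

  def data_index
    @data_index ||= State.new
  end
end

# Fixed wiring (source :data, identifier :key, name :data_index);
# create, remove, and update reuse the bodies shown on this page.
class LifecycleIndex
  def column_family; :Indexes; end
  def source_for(instance); instance.data; end
  def indexed_identifier_for(instance); instance.key; end
  def state_for(instance); instance.data_index; end

  def create(instance)
    index_key = source_for(instance)
    if not index_key.nil?
      column = indexed_identifier_for(instance)
      instance.connection.insert(column_family, index_key, {column => instance.key})
      state = state_for(instance)
      state.source_value = index_key
      state.identifier_value = column
    end
    instance
  end

  def remove(instance)
    state = state_for(instance)
    unless state.source_value.nil? or state.identifier_value.nil?
      instance.connection.remove(column_family, state.source_value, state.identifier_value)
      state.source_value = nil
      state.identifier_value = nil
    end
    instance
  end

  def update(instance)
    state = state_for(instance)
    if state.source_value != source_for(instance) or state.identifier_value != indexed_identifier_for(instance)
      remove(instance)
      create(instance)
    end
    instance
  end
end

conn = FakeConnection.new
doc = Doc.new("k1", "this data", conn)
index = LifecycleIndex.new
index.create(doc)      # row "this data" gets column "k1"
doc.data = "that data"
index.update(doc)      # source changed: remove the old entry, then create the new one
p conn.data[:Indexes]
```

Because remove acts on the stored state rather than the current values, the old "this data" entry is found and deleted even though the object no longer carries that value.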