AridCache

AridCache makes caching easy and effective. AridCache supports caching on all of your ActiveRecord model named scopes, class and instance methods right out of the box. AridCache keeps caching logic out of your model methods and clarifies your view code by making calls to cached result sets explicit.

AridCache supports caching large, expensive ActiveRecord collections by caching only the model IDs, provides efficient in-memory pagination of your cached collections, and gives you collection counts for free. Non-ActiveRecord collection data is cached unchanged, allowing you to cache the results of any expensive operation simply by prepending your method call with cached_.

AridCache simplifies caching by supporting auto-expiring cache keys - as well as common options like :expires_in - and provides methods to help you manage your caches at the global, model class, model instance and per-cache level.

Changes

v1.0.5: Support :raw and :clear options.

Install

Rails 3:

Add the gem to your Gemfile:

gem 'arid_cache'

Then

bundle install

For some reason AridCache is not being included into ActiveRecord, so add the following to an initializer to get around that until I fix it:

AridCache.init_rails

Rails 2:

Add the gem to your config/environment.rb file:

config.gem 'arid_cache'

Then

rake gems:install

Introduction

The name AridCache comes from ActiveRecord ID Cache. It’s also very DRY…get it? :)

Out of the box AridCache supports caching on all your ActiveRecord class and instance methods and named scopes. Basically, if a class or class instance responds to something (respond_to?), you can cache it.

The way you interact with the cache via your model methods is to prepend the method call with cached_. The part of the method call after cached_ serves as the basis for the cache key. For example,

User.cached_count            # cache key is arid-cache-user-count
genre.cached_top_ten_tracks  # cache key is arid-cache-genres/<id>-top_ten_tracks

You can also define caches that use compositions of methods or named scopes, or other complex queries, without having to add a new method to your class. This way you can also create different caches that all use the same method. For example,

# cache key is arid-cache-user-most_active_users
User.cached_most_active_users do
  active.find(:all, :order => 'activity DESC', :limit => 5)
end

ActiveRecord Collections

If the result of your cached_ call is an array of ActiveRecords, AridCache only stores the IDs in the cache (because it’s a bad idea to store records in the cache).

On subsequent calls we call find_all_by_id on the target class, passing in the ActiveRecord IDs that were stored in the cache. AridCache will preserve the original ordering of your collection (you can change this using the :order option).

The idea here is to cache collections that are expensive to query. Once the cache is loaded, retrieving the cached records from the database simply involves a SELECT * FROM table WHERE id IN (ids, ...).

Consider how long it would take to get the top 10 favorited tracks of all time from a database with a million tracks and 100,000 users. Now compare that to selecting 10 tracks by ID from the track table. The performance gain is huge.
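
The following is a minimal sketch of the idea in plain Ruby, not AridCache's actual implementation. The TRACKS hash stands in for a database table, and select_where_id_in and load_in_cached_order are hypothetical helpers. It shows why the rows have to be re-sorted after the ID lookup if the original ordering is to be preserved.

```ruby
# A stand-in for a database table: rows keyed by id (hypothetical data).
TRACKS = {
  1 => { :id => 1, :title => 'Alpha' },
  2 => { :id => 2, :title => 'Beta' },
  3 => { :id => 3, :title => 'Gamma' }
}

# Simulates SELECT * FROM tracks WHERE id IN (...): rows come back in the
# table's own order (here, ascending id), not in the order requested.
def select_where_id_in(ids)
  TRACKS.values.select { |row| ids.include?(row[:id]) }
end

# Re-sort the fetched rows so they follow the order of the cached ids.
def load_in_cached_order(cached_ids)
  rows_by_id = {}
  select_where_id_in(cached_ids).each { |row| rows_by_id[row[:id]] = row }
  cached_ids.map { |id| rows_by_id[id] }.compact
end

cached_ids = [3, 1, 2]   # only the ids live in the cache
titles = load_in_cached_order(cached_ids).map { |row| row[:title] }
puts titles.inspect      # prints ["Gamma", "Alpha", "Beta"]
```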

Base Types and Other Collections

Arrays of non-ActiveRecords are stored as-is so you can cache arrays of strings and other types without problems.

Any other objects (including single ActiveRecord objects) are cached and returned as-is.

Example

An example of caching using existing methods on your class:

class User < ActiveRecord::Base
  has_many    :pets
  has_one     :preferences
  # lambda so that 5.minutes.ago is evaluated on each call, not once at load time
  named_scope :active, lambda { { :conditions => [ 'updated_at >= ?', 5.minutes.ago ] } }
end

User.cached_count          # uses the built-in count method
User.cached_active         # only stores the IDs of the active users in the cache
User.cached_active_count   # returns the count of active users directly from the cache

user.cached_pets_count     # only selects the count until the collection is requested
user.cached_pets           # loads the collection and stores the pets IDs in the cache

Defining Your Caches

Dynamically

To dynamically define caches just pass a block to your cached_ calls. Caches can be defined on your classes or class instances. For example,

User.cached_most_active_users do
  active.find(:all, :order => 'activity DESC', :limit => 5)
end

=> [#<User id: 23>, #<User id: 30>, #<User id: 5>, #<User id: 2>, #<User id: 101>]

user.cached_favorite_pets do
  pets.find(:all, :conditions => { 'favorite' => true })
end

=> [#<Pet id: 11>, #<Pet id: 21>, #<Pet id: 3>]

Configuring Caches on your Models

We can clean up our views significantly by configuring caches on our model rather than defining them dynamically and passing options in each time. You configure caches by calling instance_caches(options={}) or class_caches(options={}) with a block and defining your caches inside the block (you don’t need to prepend cached_ when defining these caches because we are not returning results, just storing options).

You can pass a hash of options to instance_caches and class_caches to have those options applied to all caches in the block. The following is a more complex example that also demonstrates nested cached calls.

# app/models/genre.rb
class Genre
  class_caches do
    most_popular do
      popular(:limit => 10, :order => 'popularity DESC')
    end
  end

  instance_caches(:order => 'release_date DESC') do
    highlight_tracks(:include => [:album, :artist]) do
      cached_tracks(:limit => 10, :include => [:album, :artist])
    end
    highlight_artists(:order => nil) do   # override the global :order option
      cached_artists(:limit => 10)
    end
    highlight_albums(:include => :artist) do
      cached_albums(:limit => 3, :include => :artist)
    end
  end
end

# app/controllers/genre_controller.rb
@most_popular = Genre.cached_most_popular
@tracks  = @genre.cached_highlight_tracks
@artists = @genre.cached_highlight_artists
@albums  = @genre.cached_highlight_albums

You can configure your caches in this manner wherever you want, but I think the model is a good place. If you wanted to move all your cache configurations to a file in lib or elsewhere, your calls would look like,

Genre.class_caches do
  ...
end
Genre.instance_caches do
  ...
end

Cache Keys

AridCache cache keys are defined based on the methods you call to interact with the cache. For example:

Album.cached_featured_albums  => cache key is arid-cache-album-featured_albums
album.cached_top_tracks       => cache key is arid-cache-albums/<id>-top_tracks

Caches on model instances can be set to automatically incorporate the ActiveRecord cache_key which includes the updated_at timestamp of that instance, making them auto-expire when the instance is updated.

To incorporate the cache_key, pass :auto_expire => true to your cache method:

album.cached_top_tracks(:auto_expire => true) => cache key like arid-cache-albums/2-20091211120100-top_tracks

Or via the cache configuration:

Album.instance_caches do
  top_tracks(:auto_expire => true)
end

If you need to examine values in the cache yourself you can build the AridCache key by calling arid_cache_key('method') on your object, whether it is a class or instance. Using the examples above we would call,

Album.arid_cache_key('featured_albums') => arid-cache-album-featured_albums
album.arid_cache_key('top_tracks')      => arid-cache-albums/2-top_tracks
album.arid_cache_key('top_tracks', :auto_expire => true) => arid-cache-albums/2-20091211120100-top_tracks
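
To make the key format concrete, here is a rough sketch in plain Ruby that reproduces the keys shown above. It is an illustration only: sketch_cache_key and the stand-in Album class are hypothetical, the pluralization is naive, and the real arid_cache_key may build its keys differently.

```ruby
# Hypothetical stand-in for an ActiveRecord model with an id and a timestamp.
class Album
  attr_reader :id, :updated_at

  def initialize(id, updated_at)
    @id = id
    @updated_at = updated_at
  end
end

# Builds a key of the form arid-cache-<owner>-<name>, where <owner> is the
# class name for class-level caches, or "<plural>/<id>" for instances,
# optionally followed by the updated_at timestamp (assumed format).
def sketch_cache_key(object, name, options = {})
  if object.is_a?(Class)
    owner = object.name.downcase
  else
    owner = "#{object.class.name.downcase}s/#{object.id}"   # naive pluralization
    if options[:auto_expire]
      owner += "-#{object.updated_at.strftime('%Y%m%d%H%M%S')}"
    end
  end
  "arid-cache-#{owner}-#{name}"
end

album = Album.new(2, Time.utc(2009, 12, 11, 12, 1, 0))
puts sketch_cache_key(Album, 'featured_albums')
# prints arid-cache-album-featured_albums
puts sketch_cache_key(album, 'top_tracks')
# prints arid-cache-albums/2-top_tracks
puts sketch_cache_key(album, 'top_tracks', :auto_expire => true)
# prints arid-cache-albums/2-20091211120100-top_tracks
```

Because the timestamp is part of the key, updating the record changes the key, so the old cache entry is simply never read again.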

Managing your Caches

Deleting & Expiring Caches

AridCache provides methods to help you clear your caches:

AridCache.clear_caches      => expires all AridCache caches
Model.clear_caches          => expires class and instance-level caches for this model
Model.clear_instance_caches => expires instance-level caches for this model
Model.clear_class_caches    => expires class-level caches for this model

The Model.clear_caches methods are also available on all model instances.

Your cache store needs to support the delete_matched method for the above to work. Currently MemCacheStore and MemoryStore do not.

Alternatively you can pass a :force => true option in your cached_ calls to force a refresh of a particular cache, while still returning the refreshed results. For example:

Album.cached_featured_albums(:force => true)  => returns featured albums
album.cached_top_tracks(:force => true)       => returns top tracks

If you just want to clear a cache without forcing a refresh, pass :clear => true. The cached value will be deleted with no unnecessary queries or cache reads being performed. It is safe to pass this option even if there is nothing in the cache yet. The method returns the result of calling delete on your cache object. For example:

Album.cached_featured_albums(:clear => true)  => returns false
Rails.cache.read(Album.arid_cache_key(:featured_albums)) => returns nil
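
The difference between :force and :clear can be sketched with a plain Hash standing in for the cache store. This is an illustration under assumptions, not AridCache's code: STORE and sketch_fetch are hypothetical, and the return value of the :clear branch (true or false) is a simplification of whatever your cache store's delete returns.

```ruby
# Hypothetical in-memory store; a real application would use Rails.cache.
STORE = {}

# Returns the cached value for key, computing it with the block on a miss.
# :force => true recomputes and overwrites the value, then returns it.
# :clear => true only deletes the entry; the block is never called.
def sketch_fetch(key, options = {})
  if options[:clear]
    existed = STORE.key?(key)
    STORE.delete(key)
    return existed
  end
  STORE[key] = yield if options[:force] || !STORE.key?(key)
  STORE[key]
end

calls = 0
sketch_fetch('featured') { calls += 1; [1, 2] }                 # miss: block runs
sketch_fetch('featured') { calls += 1; [1, 2] }                 # hit: block skipped
sketch_fetch('featured', :force => true) { calls += 1; [3] }    # refresh: block runs
puts calls                                                      # prints 2
puts sketch_fetch('featured', :clear => true)                   # prints true
puts STORE.key?('featured')                                     # prints false
```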

You can pass an :expires_in option to your caches to manage your cache expiry (if your cache store supports this option, which most do).

Album.cached_featured_albums(:expires_in => 1.day)
album.cached_top_tracks(:expires_in => 1.day)

Or via the cache configuration,

Album.class_caches(:expires_in => 1.day) do
  featured_albums
end
Album.instance_caches(:expires_in => 1.day) do
  top_tracks
end

If you would like to be able to pass more options to your cache store (like :unless_exist, etc), just add them to the AridCache::CacheProxy::OPTIONS_FOR_CACHE class constant, for example

AridCache::CacheProxy::OPTIONS_FOR_CACHE.push(:unless_exist)

Extras

Cached Counts

AridCache gives you counts for free. When a collection is stored in the cache AridCache stores the count as well so the next time you request the count it just takes a single read from the cache.

To get the count just append _count to your cached_ call. For example, if we have a cache like album.cached_tracks we can get the count by calling,

album.cached_tracks        => returns an array of tracks
album.cached_tracks_count  => returns the count with a single read from the cache

This is also supported for your non-ActiveRecord collections if the collection responds to count (respond_to?(:count)). For example,

album.cached_similar_genres       => returns ['Pop', 'Rock', 'Rockabilly']
album.cached_similar_genres_count => returns 3

Sometimes you may want the collection count without loading and caching the collection itself. AridCache is smart enough that if you only ask for a count it will only query for the count. This is only possible if the return value of your method is a named scope or association proxy (since these are lazy-loaded unlike a call to find()).

In the example above if we only ever call album.cached_tracks_count, only the count will be cached. If we subsequently call album.cached_tracks the collection will be loaded and the IDs cached as per normal.
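
The count-only behaviour can be sketched in plain Ruby as follows. This is a simplified model under assumptions, not the library's implementation: LazyScope is a hypothetical stand-in for a named scope or association proxy, and CachedEntry, sketch_cached_count and sketch_cached_ids are invented names.

```ruby
# Hypothetical lazy collection, standing in for a named scope or association
# proxy: it can answer count without loading its records.
class LazyScope
  attr_reader :loads

  def initialize(ids)
    @ids = ids
    @loads = 0
  end

  def count
    @ids.size      # like SELECT COUNT(*)
  end

  def to_a
    @loads += 1    # like SELECT *, counted so we can see when it happens
    @ids.dup
  end
end

CachedEntry = Struct.new(:ids, :count)

# Count-only request: store the count and leave the ids unset.
def sketch_cached_count(cache, key, scope)
  cache[key] ||= CachedEntry.new(nil, scope.count)
  cache[key].count
end

# Collection request: load the ids (and the count) if they are missing.
def sketch_cached_ids(cache, key, scope)
  entry = cache[key]
  if entry.nil? || entry.ids.nil?
    ids = scope.to_a
    entry = cache[key] = CachedEntry.new(ids, ids.size)
  end
  entry.ids
end

cache = {}
tracks = LazyScope.new([11, 12, 13])
puts sketch_cached_count(cache, 'tracks', tracks)       # prints 3
puts tracks.loads                                       # prints 0
puts sketch_cached_ids(cache, 'tracks', tracks).inspect # prints [11, 12, 13]
puts tracks.loads                                       # prints 1
```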

Other methods for caching counts are provided for us by virtue of ActiveRecord’s built-in methods and named scopes, for example,

Artist.cached_count  # takes advantage of the built-in method Artist.count

Pagination

AridCache supports pagination using WillPaginate. If you are not changing the order of the cached collection the IDs are paginated in memory and only that page is selected from the database - directly from the target table, which is extremely fast.

An advantage of using AridCache is that since we already have the size of the collection in the cache no query is required to set the :total_entries on the WillPaginate::Collection.

To paginate just pass a :page option in your call to cached_. If you don’t pass a value for :per_page AridCache gets the value from Model.per_page, which is what WillPaginate uses.

The supported pagination options are:

:page, :per_page, :total_entries, :finder

Some examples of pagination:

User.cached_active(:page => 1, :per_page => 30)
User.cached_active(:page => 2)                  # uses User.per_page
user.cached_pets(:page => 1)                    # uses Pet.per_page
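
As a rough illustration of paginating in memory, the sketch below slices a cached array of IDs to get one page and reads the total from the array size, so neither step needs a query. sketch_paginate_ids is a hypothetical helper, not part of AridCache or WillPaginate.

```ruby
# Returns the ids for one page plus the total number of entries, both
# computed from the cached id array alone.
def sketch_paginate_ids(cached_ids, page, per_page)
  offset = (page - 1) * per_page
  {
    :page_ids      => cached_ids[offset, per_page] || [],   # nil when past the end
    :total_entries => cached_ids.size
  }
end

cached_ids = (1..95).to_a.reverse          # 95, 94, ..., 1
result = sketch_paginate_ids(cached_ids, 2, 30)
puts result[:page_ids].first               # prints 65
puts result[:page_ids].size                # prints 30
puts result[:total_entries]                # prints 95
```

Only the IDs in :page_ids would then need to be selected from the target table.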

If you want to paginate using a different ordering, pass an :order option. Because the order is being changed AridCache cannot paginate in memory. Instead, the cached IDs are passed to your Model.paginate method along with any other options and the database will order the collection, apply limits and offsets, etc. Because the number of records the database deals with is limited, this is still much, much faster than ordering over the whole table.

For example, the following queries will work:

user.cached_companies(:page => 1, :per_page => 3, :order => 'name DESC')
user.cached_companies(:page => 1, :per_page => 3, :order => 'name ASC')

By specifying an :order option in our cached call we can get different “views” of the cached collection. I think this is a good thing. However, you need to be aware that in order to guarantee that the ordering you requested is the same as the order of the initial results (when the cache was primed), we have to order in the database. This results in two queries being executed the first time you query the cache (one to prime it and the other to order and return the results). If no order option is specified, we can skip the second query and do everything in memory.

If you have an expensive cache and don’t want that extra query, just define a new cache with your desired ordering and use that. Make sure that the order of the initial results matches your desired ordering. Building on the example above we could do:

User.instance_caches do
  companies_asc do
    companies(:order => 'name ASC')
  end
  companies_desc do
    companies(:order => 'name DESC')
  end
end
user.cached_companies_asc(:page => 1, :per_page => 3)
user.cached_companies_desc(:page => 1, :per_page => 3)

Limit & Offset

You apply :limit and :offset options in a similar manner to the :page and :per_page options. The limit and offset will be applied in memory and only the resulting subset selected from the target table - unless you specify a new order.

user.cached_pets(:limit => 2, :include => :toys)
user.cached_pets(:limit => 2, :offset => 3, :include => :toys)
genre.cached_top_ten_tracks { cached_tracks(:limit => 10, :order => 'popularity DESC') }

Other Options to find

The supported options to find are:

:conditions, :include, :joins, :limit, :offset, :order,
:select, :readonly, :group, :having, :from, :lock

You can pass options like :include (or any other valid find options) to augment the results of your cached query. Just because all of the options are supported does not mean it’s a good idea to use them, though. Take a look at your logs to see how AridCache is interacting with the cache and the database if you don’t get the results you expect.

For example, we could call:

User.cached_active(:page => 2, :per_page => 10, :include => :preferences)

This returns page two of the active users, with the preferences association eager-loaded for all the users.

Accessing the cached IDs directly

Sometimes you may want to access the cached list of record IDs without instantiating all the records. This can be useful, for example, to determine if a particular track belongs to a user’s favorite tracks. If we have cached the list of favorite tracks, we just need to determine whether the track’s ID appears in the cached list of IDs.

The cached result is an AridCache::CacheProxy::Result and can be accessed by passing the :raw => true option in your cached call. The AridCache::CacheProxy::Result is a type of Struct with methods to return the ids, count and klass of the cached records.

Note that passing the :raw option to your cache store is not supported, because the AridCache option shares the same name. If you really want to get the marshalled result from your cache you will have to use cache.read, manually passing in the AridCache key and :raw option.

Usage example:

user = User.first
user.cached_favorite_tracks  => returns [#<Track:1>, #<Track:2>]
user.cached_favorite_tracks(:raw => true) => returns
    {
      :klass => "Track",  # stored as a string
      :count => 2,
        :ids => [1, 2]
    }
user.cached_favorite_tracks(:raw => true).ids => returns [1, 2]

The cache will be primed if it is empty, so you can be sure that it will always return an AridCache::CacheProxy::Result.
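
The favorite-track check described above could look something like this. It is a sketch only: SketchResult is a hypothetical Struct standing in for the raw cached result, and favorite? is an invented helper.

```ruby
# Hypothetical stand-in for the raw cached result: a Struct with ids,
# klass and count, mirroring the fields described above.
SketchResult = Struct.new(:ids, :klass, :count)

# True if the track id appears in the cached list; no records are loaded.
def favorite?(raw_result, track_id)
  raw_result.ids.include?(track_id)
end

raw = SketchResult.new([1, 2], 'Track', 2)
puts favorite?(raw, 2)   # prints true
puts favorite?(raw, 7)   # prints false
```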

In some circumstances - like when you are querying on a named scope - if you have only requested a count, only the count is computed, which means the ids array is nil. When you call your cached method passing in :raw => true AridCache detects that the ids array has not yet been set, so in this case it will perform a query to seed the ids array before returning the result. This can be seen in the following example:

class User
  named_scope :guests, :conditions => { :account_type => ['guest'] }
end

User.cached_guests_count => returns 4
Rails.cache.read(User.arid_cache_key(:guests)) => returns
  {
    :klass => "User",
    :count => 4,
      :ids => nil                 # notice the ids array is nil in the cache
  }
User.cached_guests(:raw => true) => returns
  {
    :klass => "User",
    :count => 4,
      :ids => [2, 235, 236, 237]  # the ids array is seeded before returning
  }

Efficiency

  • AridCache intercepts calls to cached_ methods using method_missing, then defines those methods on your models as they are called, so they bypass method_missing on subsequent calls.

  • In-memory pagination of cached collections speeds up your queries. See Pagination.

  • If you only request a count AridCache will only select the count. See Cached Counts.

  • If a collection has already been loaded, you get the count for free. See Cached Counts.
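
The first point, defining a method the first time it is intercepted, is a general Ruby technique. The following is a minimal sketch of it in plain Ruby, not AridCache's code: SketchModel and its simple Hash cache are hypothetical.

```ruby
class SketchModel
  def self.count
    42   # stands in for an expensive query
  end

  # Intercepts cached_<name> calls for any <name> the class responds to.
  def self.method_missing(name, *args)
    target = name.to_s[/\Acached_(.+)\z/, 1]
    return super unless target && respond_to?(target)

    # Define the method so that later calls skip method_missing entirely.
    define_singleton_method(name) do |*call_args|
      @sketch_cache ||= {}
      @sketch_cache[target] ||= send(target, *call_args)
    end
    send(name, *args)
  end

  # Keeps respond_to? consistent with what method_missing handles.
  def self.respond_to_missing?(name, include_private = false)
    target = name.to_s[/\Acached_(.+)\z/, 1]
    (target && respond_to?(target)) || super
  end
end

puts SketchModel.singleton_methods.include?(:cached_count)   # prints false
puts SketchModel.cached_count                                # prints 42
puts SketchModel.singleton_methods.include?(:cached_count)   # prints true
```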

Compatibility

Tested on Ruby 1.8.6, 1.8.7, REE 1.8.7 and 1.9.1. Tested in Rails 2.3.* and Rails 3.

For Ruby < 1.8.7 you probably want to include the following to extend the Array class with a count method. Otherwise your cached_<key>_count calls probably won’t work:

Array.class_eval { alias count size }

For Rails 3 for some reason AridCache is not being included into ActiveRecord, so add the following to an initializer to get around that:

AridCache.init_rails

Resources & Metrics

Known Issues

  1. Caches that contain duplicate records will only return unique records on subsequent calls. This is because of the way find works when selecting multiple ids. For example, if your query returns [#<User id: 1>, #<User id: 1>, #<User id: 1>], the IDs are cached as [1,1,1]. On the next call to the cache we load the IDs using User.find_all_by_id([1,1,1]) which returns [#<User id: 1>], not [#<User id: 1>, #<User id: 1>, #<User id: 1>] as you might have expected.

  2. You can’t cache polymorphic arrays e.g. [#<User id: 1>, #<Pet id: 5>] because it expects all ActiveRecords to be of the same class. We could accept a :polymorphic => true option but I don’t think this is a great idea because instantiating all the records would result in a lot of queries to the individual tables.
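
The first issue can be reproduced without a database. In this sketch the USERS hash and sketch_find_all_by_id are hypothetical stand-ins for a table and an ID lookup; like a WHERE id IN (...) query, the lookup returns each matching row once, however many times its ID is listed.

```ruby
# Hypothetical table: one row per id.
USERS = { 1 => 'user-1', 2 => 'user-2' }

# Like WHERE id IN (...): each matching row comes back exactly once.
def sketch_find_all_by_id(ids)
  USERS.select { |id, _row| ids.include?(id) }.values
end

puts sketch_find_all_by_id([1, 1, 1]).inspect   # prints ["user-1"]
```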

Contributors

Contributions are welcome! Please,

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it (this is important so I don’t break it in a future release).

  • Commit (don’t mess with the Rakefile, version, or history).

  • Send me a pull request.

Thank-you to these contributors to AridCache:

Copyright © 2009 Karl Varga. See LICENSE for details.