ActiveShard - Multi-schema sharding for ActiveRecord
ActiveShard is a library built primarily for sharding in ActiveRecord. It also supports multiple databases with differing schemas. Like the other sharding libraries for ActiveRecord (there are several), it is the solution that best fits its authors' sharding needs. If you've been unhappy with the other options, ActiveShard might be for you.
Design goals
The fundamental purpose of ActiveShard is to provide a framework that allows ActiveRecord to connect to multiple databases with multiple different schemas. Every other feature (sharding, replication, etc.) is built on top of this framework.
More framework, less magic
ActiveShard doesn't do much guessing or sleight of hand. Queries are executed against whichever shard is marked as active for the model's schema. Model instances do not remember which shard they were loaded from.
# declared outside the block so it is still visible after the block ends
user = nil

ActiveShard.with( :main => :db1 ) do
  user = User.find( 100 )
end

# this will fail, as no shard is selected at this point -->
user.save!
This is an intentional design decision. Other libraries go to great lengths to provide smarter implementations, but do so at the expense of flexibility.
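To make the rule concrete, here is a self-contained sketch that models it in plain Ruby, without ActiveRecord or the gem. `FakeShards` and `FakeUser` are hypothetical stand-ins, not part of ActiveShard; the only point illustrated is that the shard is resolved when a query runs, so writing a record back requires selecting a shard again.

```ruby
# Hypothetical stand-in for ActiveShard's bookkeeping of active shards.
module FakeShards
  @active = {}

  class << self
    # Merge a selection over the current one for the duration of the block,
    # then restore the previous selection.
    def with(selection)
      previous = @active
      @active = @active.merge(selection)
      begin
        yield
      ensure
        @active = previous
      end
    end

    # Resolve a schema to its shard at query time; raise if none is selected.
    def active_shard_for(schema)
      @active.fetch(schema) { raise "no active shard for schema #{schema.inspect}" }
    end
  end
end

# Hypothetical model: a loaded instance keeps no record of its shard.
class FakeUser
  SCHEMA = :main
  attr_reader :id

  def self.find(id)
    FakeShards.active_shard_for(SCHEMA) # a real query would run against this shard
    new(id)
  end

  def initialize(id)
    @id = id
  end

  # The target shard is resolved when save! is called, not when the record was loaded.
  def save!
    "saved user #{id} to #{FakeShards.active_shard_for(SCHEMA)}"
  end
end

user = nil
FakeShards.with(:main => :db1) { user = FakeUser.find(100) }

begin
  user.save!
rescue RuntimeError => e
  puts e.message # prints: no active shard for schema :main
end

# Selecting a shard again is what lets the write go through.
FakeShards.with(:main => :db1) { puts user.save! } # prints: saved user 100 to db1
```

In a real application the equivalent fix is to wrap the `save!` call in another `ActiveShard.with( :main => ... )` block naming the shard the record came from.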
In contrast to other implementations, ActiveShard does not reopen or monkey-patch any Rails or ActiveRecord classes. In the authors' experience, any Rails application large enough to need sharding probably contains a fair amount of customization already, and sharding libraries that hack up core Rails classes often do not play nicely with existing code. ActiveShard (hopefully!) does.
Install
Rails 3.x
Add this line to your Gemfile:
gem 'active_shard'
Install the bundle:
$ bundle install
Add the following line to config/application.rb, directly below "require 'rails/all'":
require 'active_shard/railtie'
Most common usage
config/shards.yml (if used in Rails)
Create a config/shards.yml file with your desired database configuration. It has the following structure (pseudo-code):
<environment>
  <schema_name>
    <shard_name>
      <shard_specification>
    <shard_name>
      <shard_specification>
  <schema_name>
    <shard_name>
      <shard_specification>
Example shards.yml:
production:
  directories:
    directory:
      adapter: mysql2
      database: dir_db
      host: localhost
      username: root
  main:
    db1:
      adapter: mysql2
      host: localhost
      database: db1_db
      username: root
    db2:
      adapter: mysql2
      host: localhost
      database: db2_db
      username: root
    db3:
      adapter: mysql2
      host: localhost
      database: db3_db
      username: root
development:
  directories:
    directory:
      adapter: mysql2
      host: localhost
      database: dir_db_development
      username: root
  main:
    db1:
      adapter: mysql2
      host: localhost
      database: db1_db_development
      username: root
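To see how the nesting maps onto data, the short script below loads a trimmed copy of the production section with Ruby's standard `YAML` library and lists each schema with its shards. It only inspects the file; it is not how ActiveShard itself reads its configuration, and `shards_by_schema` is a hypothetical helper.

```ruby
require 'yaml'

# A trimmed copy of the example shards.yml (production only).
SHARDS_YML = <<~SHARDS
  production:
    directories:
      directory:
        adapter: mysql2
        database: dir_db
        host: localhost
        username: root
    main:
      db1:
        adapter: mysql2
        host: localhost
        database: db1_db
        username: root
      db2:
        adapter: mysql2
        host: localhost
        database: db2_db
        username: root
SHARDS

# Hypothetical helper: for one environment, map each schema name to the list
# of its shard names (environment -> schema -> shard -> specification).
def shards_by_schema(config, environment)
  config.fetch(environment).transform_values(&:keys)
end

config = YAML.safe_load(SHARDS_YML)
shards_by_schema(config, 'production').each do |schema, shards|
  puts "#{schema}: #{shards.join(', ')}"
end
# prints:
# directories: directory
# main: db1, db2
```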
Schema name for models
ActiveRecord models must have a schema name associated with them. Given the shards.yml file specified above, you might see the following models:
# contains user lookup fields and the shard name on which user's primary data resides
class UserShard < ActiveRecord::Base
  schema_name :directories
  # ...
end

# primary user data
class User < ActiveRecord::Base
  schema_name :main
  # ...
end
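ActiveShard provides `schema_name` for you, so nothing below is needed in a real application. As a rough mental model only, a class-level macro can store the schema on the class, and a query then resolves that schema to whichever shard is active. `SchemaNamed` and the `active` hash are hypothetical; the gem's actual implementation may differ.

```ruby
# Hypothetical sketch of a class-level schema_name macro (not ActiveShard's code).
module SchemaNamed
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Called with an argument, records the schema; called bare, returns it.
    def schema_name(name = nil)
      @schema_name = name if name
      @schema_name or raise "#{self} has no schema_name"
    end
  end
end

class UserShard
  include SchemaNamed
  schema_name :directories
end

class User
  include SchemaNamed
  schema_name :main
end

# A stand-in for the currently selected shards (schema => shard).
active = { :directories => :directory, :main => :db1 }

[UserShard, User].each do |model|
  puts "#{model} -> #{active.fetch(model.schema_name)}"
end
# prints:
# UserShard -> directory
# User -> db1
```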
Selecting the active shard(s)
Before you can use a model such as the ones above, you must select an active shard for each schema the code touches. The easiest way to do this is to pass a block containing the queries to the ActiveShard.with( ... ) method, specifying the active shards in its parameters.
Nested blocks maintain active shard settings for any schemas they do not explicitly set. Example:
ActiveShard.with( :directories => :directory ) do
  # We can now use the UserShard model as a shard has been
  # selected for the 'directories' schema.
  #
  user_shard = UserShard.find_by_login( 'xavier' )

  # Using the User model here would raise an exception since there
  # is no active shard for the schema that User belongs to ('main').
  #
  # However, if we select a shard for that schema ...
  #
  ActiveShard.with( :main => user_shard.shard_name ) do
    user = User.find( user_shard.user_id ) # <-- this works.

    # ActiveShard effectively 'merges' nested shard selections rather
    # than replace them. The active shards at this point are:
    #
    #   :directories => :directory
    #   :main        => <user_shard.shard_name>
    #
  end

  # ... the active shard for the :main schema has been de-selected,
  # leaving only the :directories => :directory shard as active.
end
To better understand what's happening here, see ActiveShard::Scope.
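The merge-and-restore behavior described above can be modeled with a stack of hashes. The sketch below is a hypothetical illustration (`ScopeStack` is not ActiveShard's class): entering a block pushes the current selection merged with the new one, and leaving it pops back to the outer selection, even if the block raises. ActiveShard::Scope remains the authoritative reference.

```ruby
# Hypothetical model of nested shard selection; not ActiveShard::Scope itself.
class ScopeStack
  def initialize
    @stack = [{}] # start with no shards selected
  end

  # The shards active right now, as a schema => shard hash.
  def active
    @stack.last
  end

  # Merge the new selection over the current one for the duration of the block.
  # Schemas the block does not mention keep their outer selection.
  def with(selection)
    @stack.push(active.merge(selection))
    begin
      yield
    ensure
      @stack.pop # restore the outer selection, even if the block raises
    end
  end
end

scope = ScopeStack.new
scope.with(:directories => :directory) do
  scope.with(:main => :db2) do
    p scope.active.keys # prints [:directories, :main]
  end
  p scope.active.keys   # prints [:directories]
end
p scope.active.keys     # prints []
```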
Migrations
Migrations for each schema must reside in a directory under db/migrate that corresponds to the schema name.
Example:
db/migrate/main
  20110810103523_create_users_table.rb
db/migrate/directories
  20110810105318_create_user_shards_table.rb
You can run migrations against a shard by passing the shard name to the shards:migrate rake task, like so:
$ rake shards:migrate[db1]
The schema is discovered from the ActiveShard configuration and the proper migrations are executed.
Each shard maintains its own schema_migrations table and must be migrated independently. This lets you add shards later by simply adding an entry to your ActiveShard configuration and running migrations against the new shard.
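The lookup described above (shard name to schema, schema to migration directory) is simple enough to sketch. Assuming the configuration for one environment has been loaded into a nested hash shaped like shards.yml, hypothetical helpers such as `schema_for_shard` and `migration_path_for` find the schema that owns a shard and return its directory under db/migrate. The real rake task may resolve this differently; this only illustrates the directory convention.

```ruby
# Hypothetical: one environment's config as nested hashes (schema => shard => spec).
CONFIG = {
  'directories' => { 'directory' => { 'database' => 'dir_db' } },
  'main'        => { 'db1' => { 'database' => 'db1_db' },
                     'db2' => { 'database' => 'db2_db' } }
}.freeze

# Find the schema whose shard list contains the given shard.
def schema_for_shard(config, shard_name)
  schema, _shards = config.find { |_schema, shards| shards.key?(shard_name) }
  schema or raise ArgumentError, "unknown shard #{shard_name.inspect}"
end

# Migrations for a schema live in db/migrate/<schema_name>.
def migration_path_for(config, shard_name)
  File.join('db', 'migrate', schema_for_shard(config, shard_name))
end

puts migration_path_for(CONFIG, 'db1')       # prints db/migrate/main
puts migration_path_for(CONFIG, 'directory') # prints db/migrate/directories
```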
Rails Controllers
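To run all of a controller's actions (or only some of them, using around_filter's :only and :except options) against specific shards, use an around filter:
class ApplicationController < ActionController::Base
  around_filter :activate_shards

  # Rails hands the action to an around filter as a block; forward it so
  # the action runs with these shards selected.
  def activate_shards(&block)
    ActiveShard.with( :directories => :directory, :main => :db1, &block )
  end
end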
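Forwarding `&block` is what makes the filter work: Rails passes the action to an around filter as a block, and if that block is never called, the action never runs. The plain-Ruby sketch below imitates that contract with hypothetical `with_shards` and `run_action` methods so the difference is visible without Rails. In this sketch a dropped block raises LocalJumpError; what ActiveShard.with itself does when called without a block is not covered here.

```ruby
# Hypothetical stand-in for ActiveShard.with: runs the block with shards selected.
def with_shards(selection)
  yield selection
end

# Correct: the action block is forwarded, so it runs inside the selection.
def activate_shards(&block)
  with_shards({ :main => :db1 }, &block)
end

# Buggy: the action block is dropped, so there is nothing to yield to.
def broken_activate_shards(&block)
  with_shards({ :main => :db1 })
end

# Imitates an around filter: the action is handed to the filter as a block.
def run_action(filter)
  shard = nil
  send(filter) { |selection| shard = selection[:main] }
  shard
end

puts run_action(:activate_shards) # prints db1

begin
  run_action(:broken_activate_shards)
rescue LocalJumpError
  puts 'the action never ran'
end
```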
Other similar libraries
There are several, but Octopus (ar-octopus) is the most popular.
Authors
- Brasten Sager
- Matt Baker
Copyright
Copyright (c) 2011 Dashwire Inc., released under the MIT license.