ocean-dynamo
OceanDynamo is a massively scalable Amazon DynamoDB near drop-in replacement for ActiveRecord.
OceanDynamo requires Ruby 2.0 and Ruby on Rails 4.0.0 or later. Rails 5.0.x is supported, but 5.1 will require a couple of changes.
src="https://badge.fury.io/rb/ocean-dynamo.png" alt="Gem Version" />[http://badge.fury.io/rb/ocean-dynamo]
Features
As one important use case for OceanDynamo is to facilitate the conversion of SQL databases to no-SQL DynamoDB databases, it is important that the syntax and semantics of OceanDynamo are as close as possible to those of ActiveRecord. This includes callbacks, exceptions and method chaining semantics. OceanDynamo follows this pattern closely and is of course based on ActiveModel.
The attribute and persistence layer of OceanDynamo is modeled on that of ActiveRecord:
there's save
, save!
, create
, update
, update!
, update_attributes
, find_each
,
destroy_all
, delete_all
, read_attribute
, write_attribute
and all the other
methods you're used to. The design goal is always to implement as much of the ActiveRecord
interface as possible, without compromising scalability. This makes the task of switching
from SQL to no-SQL much easier.
OceanDynamo uses only primary indices to retrieve related table items and collections, which means it will scale without limits.
OceanDynamo is fully usable as an ActiveModel and can be used by Rails controllers. Thanks to its structural similarity to ActiveRecord, OceanDynamo works with FactoryBot.
OceanDynamo supports optimistic record locking via a combination of DynamoDB conditional writes and atomic updates. This is activated by default.
Current State
- Secondary indices are now fully supported! See below for more information.
- Version 2 of the AWS Ruby SDK is now used.
- Work begun on association proxies, etc.
Future milestones
- Direct support for the DynamoDB JSON attribute types for arrays and hashes
- Collection proxies, to implement ActiveRecord-style method chaining, e.g.:
blog_entry.comments.build(body: "Cool!").save!
- The
has_and_belongs_to_many
assocation.
Current use
OceanDynamo is used as a central component in Ocean, a Rails framework and development, testing, and production pipeline for creating massively scalable HATEOAS microservice SOAs in the cloud.
However, OceanDynamo can of course also be used stand-alone.
Documentation
- Ocean-dynamo gem on Rubygems: https://rubygems.org/gems/ocean-dynamo
- Ocean-dynamo gem API: http://rubydoc.info/gems/ocean-dynamo/frames
- Ocean-dynamo source: https://gitlab.com/ocean-dev/ocean-dynamo
- AWS DynamoDB Ruby SDK v2: http://docs.aws.amazon.com/sdkforruby/api/Aws/DynamoDB.html
Contributing
Contributions are welcome. Fork in the usual way. OceanDynamo is developed using TDD: the specs are extensive and test coverage is very near to 100 percent. Pull requests will not be considered unless all tests pass and coverage is equally high or higher. All contributed code must therefore also be exhaustively tested.
Examples
Basic syntax
The following example shows the basic syntax for declaring a DynamoDB-based schema.
class AsyncJob < OceanDynamo::Table
dynamo_schema(:guid) do
attribute :credentials, :string
attribute :token, :string, default: Proc { SecureRandom.uuid }
attribute :steps, :serialized, default: []
attribute :max_seconds_in_queue, :integer, default: 1.day
attribute :default_poison_limit, :integer, default: 5
attribute :default_step_time, :integer, default: 30
attribute :started_at, :datetime
attribute :last_completed_step, :integer
attribute :finished_at, :datetime
attribute :destroy_at, :datetime, local_secondary_index: true
attribute :created_by
attribute :updated_by
attribute :succeeded, :boolean, default: false
attribute :failed, :boolean, default: false
attribute :poison, :boolean, default: false
global_secondary_index :token, projection: :all
end
end
Attributes
Each attribute has a name, a type (:string
, :integer
, :float
, :datetime
, :boolean
,
or :serialized
) where :string
is the default. Each attribute also optionally has a default
value, which can be a Proc. The hash key attribute is by default :id
(overridden as :guid
in
the example above) and is a :string
.
The :string
, :integer
, :float
and :datetime
types can also store sets of their type.
Sets are represented as arrays, may not contain duplicates and may not be empty.
All attributes except the :string
type can take the value nil
. Storing nil
for a string
value will return the empty string, "".
Schema args and options
dynamo_schema
takes args and many options. Here's the full syntax:
dynamo_schema(
table_hash_key = :id, # The name of the hash key attribute
table_range_key = nil, # The name of the range key attribute (or nil)
table_name: compute_table_name, # The basename of the DynamoDB table
table_name_prefix: nil, # A basename prefix string or nil
table_name_suffix: nil, # A basename suffix string or nil
read_capacity_units: 10, # Used only when creating a table
write_capacity_units: 5, # Used only when creating a table
connect: :late, # true, :late, nil/false
create: false, # If true, create the table if nonexistent
locking: :lock_version, # The name of the lock attribute or nil/false
timestamps: [:created_at, :updated_at], # A two-element array of timestamp columns, or nil/false
client: nil, # A DynamoDB client or nil
) do
# Attribute definitions
...
...
end
Tip: you might want to use the following idiom when declaring a dynamo_schema
. It is used everywhere in Ocean:
dynamo_schema(:id, table_name_suffix: Api.basename_suffix, create: true) do
...
end
This will auto-create the DynamoDB table in all environments and also add a
suffix to the table name in order to prevent collisions during tests.
Cf Api.basename_suffix
in the ocean-rails
gem for more information.
has_many and belongs_to
Example
The following example shows how to set up has_many
/ belongs_to
relations:
class Forum < OceanDynamo::Table
dynamo_schema do
attribute :name
attribute :description
end
has_many :topics, dependent: :destroy
end
class Topic < OceanDynamo::Table
dynamo_schema(:guid) do
attribute :title
end
belongs_to :forum
has_many :posts, dependent: :destroy
end
class Post < OceanDynamo::Table
dynamo_schema(:guid) do
attribute :body
end
belongs_to :topic, composite_key: true
end
The only non-standard aspect of the above is composite_key: true, which
is required as the Topic class itself has a belongs_to
relation and thus has
a composite key. This must be declared in the child class as it needs to know
how to retrieve its parent.
Restrictions
Restrictions for belongs_to
tables:
- The hash key must be specified and must not be
:id
. - The range key must not be specified at all.
belongs_to
can be specified only once in each class.belongs_to
must be placed after thedynamo_schema
attribute block.
Restrictions for has_many
tables:
has_many
must be placed after thedynamo_schema
attribute block.
These restrictions allow OceanDynamo to implement the has_many
/ belongs_to
relation in a very efficient and massively scalable way.
Implementation
belongs_to
claims the range key and uses it to store its own id, which normally
would be stored in the hash key attribute. Instead, the hash key attribute holds the
id of the parent. We have thus reversed the roles of these two fields. As a result,
all children store their parent id in the hash key, and their own id in the
range key.
This type of relation is even more efficient than its ActiveRecord counterpart as
it uses only primary indices in both directions of the has_many
/ belongs_to
association. No scans.
Furthermore, since DynamoDB has powerful primary index searches involving substrings and matching, the fact that the range key is a string can be used to implement wildcard matching of additional attributes. This gives, amongst other things, the equivalent of an SQL GROUP BY request, again without requiring any secondary indices.
It's our goal to use a similar technique to implement has_and_belongs_to_many
relations,
which means that secondary indices won't be necessary for the vast majority of
DynamoDB tables. This ultimately means reduced operational costs, as well as
reduced complexity.
Secondary Indices
Local Secondary Indices
Up to five attributes can be declared as local secondary indices, in the following manner:
class Authentication < OceanDynamo::Table
dynamo_schema(:username, :expires_at) do
attribute :token, :string, local_secondary_index: true
attribute :max_age, :integer
attribute :created_at, :datetime
attribute :expires_at, :datetime
attribute :api_user_id, :string
end
end
The items of the above table can be accessed by using the hash key :username
and
the range key :expires_at
. The local_secondary_index
declaration makes it possible
to access items using the same hash key :username
but with :token
as an alternate
range key. Local secondary indices all use the same hash key as the primary index,
substituting another attribute instead as the range key.
NB: The primary index [:username, :expires_at] requires all items to have a unique combination of keys. Secondary indices don't require the range key to be unique for the same hash key. This means that secondary index searches always will return a collection.
Local secondary indices are queried through find_local_each
and find_local
.
They take the same arguments; the former yields to a block for each item,
the other returns all items in an array.
The following finds all Authentications where :username
is "joe" and :token
is "quux":
Authentication.find_local(:username, "joe", :token, "=", "quux")
This retrieves all Authentications belonging to Joe, sorted on :token
:
Authentication.find_local(:username, "joe", :token, ">=", "0")
The same thing but with the only the item with the highest token value:
Authentication.find_local(:username, "joe", :token, ">=", "0",
scan_index_forward: false, limit: 1)
Global Secondary Indices
Global secondary indices are declared after all attributes, but still within the do
block:
class Authentication < OceanDynamo::Table
dynamo_schema(:username, :expires_at) do
attribute :token, :string, local_secondary_index: true
attribute :max_age, :integer
attribute :created_at, :datetime
attribute :expires_at, :datetime
attribute :api_user_id, :string
global_secondary_index :token, projection: :all
global_secondary_index :api_user_id, :expires_at, read_capacity_units: 100
global_secondary_index :expires_at, write_capacity_units: 50
end
end
Each global_secondary_index
clause (there can be a maximum of 5 per table) takes
the following arguments:
hash_value
(required),range_value
(optional),:projection
(default:keys_only
,:all
for all attributes):read_capacity_units
(defaults to the table's read capacity, normally 10):write_capacity_units
(default to the table's write capacity, normally 5)
Global secondary indices are queried through find_global_each
and find_global
.
They take the same arguments; the former yields to a block for each item,
the other returns all items in an array.
The following finds all Authentications whose :token
is "quux"
:
Authentication.find_global(:token, "quux")
This retrieves all Authentications belonging to the user with the ID "dfstw-ruyhdf-ewijf"
,
sorted in ascending order of the :expires_at
attribute:
Authentication.find_global(:api_user_id, "dfstw-ruyhdf-ewijf",
:expires_at, ">=", 0)
To get the highest :expires_at
record:
Authentication.find_global(:api_user_id, "dfstw-ruyhdf-ewijf",
:expires_at, ">=", 0,
scan_index_forward: false, limit: 1)
Installation
gem install ocean-dynamo
Ocean-dynamo relies on Aws.config to fetch configuration data. In Ocean, we use an init file similar to this one:
cfg = if ENV['AWS_ACCESS_KEY_ID'].present? && ENV['AWS_SECRET_ACCESS_KEY'].present?
{
region: ENV['AWS_REGION'],
credentials: Aws::Credentials.new(ENV['AWS_ACCESS_KEY_ID'],
ENV['AWS_SECRET_ACCESS_KEY'])
}
elsif ENV['AWS_DYNAMODB_ENDPOINT'].present?
{
region: "eu-west-1", # Doesn't matter in this context
credentials: Aws::Credentials.new('not', 'used'),
endpoint: ENV['AWS_DYNAMODB_ENDPOINT']
}
else
{
region: ENV['AWS_REGION'],
credentials: Aws::InstanceProfileCredentials.new
}
end
Aws.config.update(cfg)
As you can see, this relies on ENV variables to choose from three different setups:
- If you're using discrete AWS credentials (not recommended), set AWS_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY.
- If you're using localstack, provide AWS_DYNAMODB_ENDPOINT (http://localhost:4569).
- If you're using EC2 or ECS instance profile credentials, set only AWS_REGION.
You can also supply a DynamoDB client for each table by using the keyword +client:+. If +nil+, the default config in +Aws.config+ will be used.
Running the specs
We're using localstack (https://github.com/localstack/localstack) for development
and testing. Install (we use a Docker container) and run it. If you're using
ocean-dynamo with the Ocean Rails context, you can install and start it by
following the instructions in the ocean
repo.
With localstack running, you should now be able to do
bundle exec rspec
All tests should pass.
Rails console
The Rails console is available from the built-in dummy application:
cd spec/dummy
rails console
This will, amongst other things, also create the CloudModel table if it doesn't already exist. On Amazon, this will take a little while. With DynamoDB Local, it's practically instant.
When you leave the console, you must navigate back to the top directory (cd ../..) in order to be able to run RSpec again.