Mongo Delta

Coordinated transfer between MongoDB clusters

Mongo Delta is a command-line tool that tails a MongoDB replica set's oplog (using mongoriver) and, based on a configured set of outlets, transfers documents to other MongoDB instances.

Installation

Install from RubyGems:

$ gem install mongo_delta

Or build from source:

$ gem build mongo_delta.gemspec

And then install the built gem.

Configuration

Mongo Delta requires a configuration where you set up your source, various targets and outlets. This can be stored in a YAML file or in the source database.

Here's an example:

db: mongo_delta
service: mongo_delta

source: mongodb://mongorsa1:27017,mongorsa2:27017

targets:
  archive: mongodb://mongoarch:27017

outlets:
  event_archiver:
    outlet: Replicator
    target: archive
    db: db_name
    collection: events

The db and service options are optional and behave like their command-line counterparts; both default to 'mongo_delta'. They tell Mongo Delta where to persist the optime, which tracks the point in time up to which the oplog has been processed. The service option makes it possible to run multiple Mongo Delta processes against the same source.

The source is where Mongo Delta is going to tail the oplog. Under targets several target connections can be listed. Use MongoDB URIs for both options.

Finally, list the outlets, which handle the incoming oplog data and pass it on to a target. Configure each outlet with the following options:

  • outlet: name of one of the outlet implementations (see below)
  • target: name of one of the targets
  • db and collection: specify the namespace for which the outlet applies
  • target_db and target_collection: optional, write the data to a different db and collection at the target
  • some outlets can have further options
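The option list above can be sanity-checked before starting the tool. The following is a minimal sketch in plain Ruby, not part of Mongo Delta itself: the helper names config_errors and target_namespace are hypothetical, and it assumes that outlet, target, db and collection are mandatory and that target_db and target_collection fall back to db and collection when omitted.

```ruby
require 'yaml'

# Example configuration with the same shape as the YAML shown earlier.
CONFIG_YAML = <<~YAML
  source: mongodb://mongorsa1:27017,mongorsa2:27017
  targets:
    archive: mongodb://mongoarch:27017
  outlets:
    event_archiver:
      outlet: Replicator
      target: archive
      db: db_name
      collection: events
YAML

# Assumption: these four outlet options are mandatory.
REQUIRED_OUTLET_KEYS = %w[outlet target db collection].freeze

# Hypothetical helper: returns a list of human-readable problems.
def config_errors(config)
  targets = config.fetch('targets', {})
  errors = []
  config.fetch('outlets', {}).each do |name, outlet|
    missing = REQUIRED_OUTLET_KEYS.reject { |key| outlet.key?(key) }
    errors << "#{name}: missing #{missing.join(', ')}" unless missing.empty?
    target = outlet['target']
    errors << "#{name}: unknown target '#{target}'" if target && !targets.key?(target)
  end
  errors
end

# Hypothetical helper: the namespace written to at the target, assuming
# target_db / target_collection default to db / collection.
def target_namespace(outlet)
  db = outlet.fetch('target_db', outlet['db'])
  collection = outlet.fetch('target_collection', outlet['collection'])
  "#{db}.#{collection}"
end

config = YAML.safe_load(CONFIG_YAML)
puts config_errors(config).inspect
puts target_namespace(config['outlets']['event_archiver'])
```

An empty error list means every outlet names a defined target and carries the options assumed to be required.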

Storing configuration in the source database

You can store this configuration in the source database. Use the --source command-line option and Mongo Delta will assume that the configuration is located in the config collection of the mongo_delta database, in the document with _id: 'mongo_delta'. The database and the service ID can be overridden with the --db and --service options respectively.

Example:

$ mongo mongo_delta
rs0:PRIMARY> db.config.save({
... _id: 'mongo_delta',
... outlets: {
...   event_archiver: {
...     outlet: 'Replicator',
...     target: 'live',
...     db: 'sourcedb',
...     collection: 'events',
...     target_db: 'archive'
...   }
... },
... targets: {live: 'mongodb://localhost:27017'}
... })
$ mongo_delta --source mongodb://localhost:27017
2013-06-10 21:24:29 - INFO: Registering event_archiver Replicator outlet for sourcedb.events
2013-06-10 21:24:29 - INFO: Starting stream

Usage

mongo_delta --config path/to/config.yml [options]

or if the configuration is stored in the source database:

mongo_delta --source mongodb://mongorsa1:27017,mongorsa2:27017 [options]

Run mongo_delta --help for more options.

Outlets

Replicator

This outlet simply repeats insert, remove and update operations on the configured target. You can use this to keep a remote collection in sync with your main MongoDB cluster. Keep in mind that the replication is one-way.
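To illustrate what repeating operations means, here is a minimal sketch that replays simplified oplog entries onto an in-memory hash standing in for the target collection. It is not Mongo Delta's Replicator code: apply_op is a hypothetical helper, the entries use a simplified oplog shape ('op' of 'i', 'u' or 'd', with 'o' and 'o2'), and only $set updates and full-document replacements are handled.

```ruby
# In-memory stand-in for a target collection, keyed by _id.
# Illustrative only; a real outlet would issue writes to the target MongoDB.
def apply_op(collection, entry)
  case entry['op']
  when 'i' # insert: 'o' holds the full document
    collection[entry['o']['_id']] = entry['o']
  when 'u' # update: 'o2' selects the document, 'o' describes the change
    id = entry['o2']['_id']
    change = entry['o']
    if change.key?('$set')
      collection[id] = (collection[id] || { '_id' => id }).merge(change['$set'])
    else
      collection[id] = change.merge('_id' => id) # full-document replacement
    end
  when 'd' # delete: 'o' holds the _id of the removed document
    collection.delete(entry['o']['_id'])
  end
  collection # other op types are ignored in this sketch
end

OPS = [
  { 'op' => 'i', 'ns' => 'db_name.events', 'o' => { '_id' => 1, 'type' => 'login' } },
  { 'op' => 'u', 'ns' => 'db_name.events', 'o2' => { '_id' => 1 },
    'o' => { '$set' => { 'type' => 'logout' } } },
  { 'op' => 'i', 'ns' => 'db_name.events', 'o' => { '_id' => 2, 'type' => 'signup' } },
  { 'op' => 'd', 'ns' => 'db_name.events', 'o' => { '_id' => 1 } }
].freeze

target = OPS.each_with_object({}) { |entry, coll| apply_op(coll, entry) }
p target
```

Because the operations only flow from source to target, any change made directly on the target is never carried back, which is the one-way behaviour noted above.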

Sharded clusters

Mongo Delta does not have special support for sharded MongoDB clusters at this time. It should be possible to run a separate mongo_delta instance against each of the individual backend shard replica sets, using a configuration that is otherwise identical for every instance.
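One way to prepare such a setup is to generate one YAML configuration per shard from a shared base, changing only the source. This is a rough sketch under assumptions: the shard names and host URIs are made up, shard_configs is a hypothetical helper, and the approach has not been verified against a real sharded cluster.

```ruby
require 'yaml'

# Shared settings, same shape as the configuration example earlier.
BASE_CONFIG = {
  'targets' => { 'archive' => 'mongodb://mongoarch:27017' },
  'outlets' => {
    'event_archiver' => {
      'outlet' => 'Replicator', 'target' => 'archive',
      'db' => 'db_name', 'collection' => 'events'
    }
  }
}.freeze

# Hypothetical shard replica sets; replace with your own hosts.
SHARD_SOURCES = {
  'shard0' => 'mongodb://shard0a:27017,shard0b:27017',
  'shard1' => 'mongodb://shard1a:27017,shard1b:27017'
}.freeze

# Returns { filename => YAML text }, one entry per shard,
# identical except for the source URI.
def shard_configs(base, shards)
  shards.to_h do |name, uri|
    ["#{name}.yml", base.merge('source' => uri).to_yaml]
  end
end

shard_configs(BASE_CONFIG, SHARD_SOURCES).each do |filename, yaml|
  puts "# #{filename}"
  puts yaml
  puts "# run with: mongo_delta --config #{filename}"
end
```

Each generated document would be saved under its file name and passed to a separate mongo_delta process through --config.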

Development

Patches and contributions are welcome! Please fork the project and open a pull request on GitHub, or just report issues.

Mongo Delta assumes the source MongoDB to be a replica set member. You can create a standalone replica set member on your development machine by running mongod with the --replSet rs0 option, and then running the following command in the mongo shell:

rs.initiate({_id: 'rs0', members: [{ _id: 0, host: '127.0.0.1:27017'}]})