Couch Tap

Utility to listen to a CouchDB changes feed and automatically insert, update, or delete rows into a relational database from matching key-value conditions of incoming documents.

While CouchDB is awesome, business people probably won't be quite as impressed when they want to play around with the data. Regular SQL is generally accepted as being easy to use and much more widely supported by a larger range of comercial tools.

Couch Tap will listen to incoming documents on a CouchDB's changes stream and automatically update rows of RDBMS tables defined in the conversion schema. The changes stream uses a sequence number allowing synchronisation to be started and stopped at will.

Ruby's fast and simple (sequel)[http://sequel.jeremyevans.net/] library is used to provide the connection to the database. This library can also be used for migrations, important for frequently changing schemas.

Couch tap takes a simple two-step approach converting documents to rows. When a change event is received for a matching document definition, each associated row is completely deleted. If the change is anything other than a delete event, the rows will be re-created with the new data. This makes things much easier when trying to deal with multi-level documents (i.e. documents of documents) and one-to-many table relationships.

A Couch Tap Project

Couch Tap requires a configuration or filter definition that will allow incoming document changes to be identified and dealt with.

The following example attempts to outline most of the key features of the DSL.

# The couchdb database from which to request the changes feed
changes "http://user:pass@host:port/invoicing" do

  # Which database should we connect to?
  database "postgres://user:pass@localhost:5432/invoicing"

  # Simple automated copy, each property's value in the matching CouchDB
  # document will copied to the table field with the same name.
  document 'type' => 'User' do
    table :users
  end

  document 'type' => 'Invoice' do

    table :invoices, :key => :invoice_id do

      # Copy columns from fields with different name
      column :updated_at, :updated_on
      column :created_at, :created_on

      # Manually set a value from document or fixed variable
      column :date, doc['date'].to_json
      column :added_at, Time.now

      # Set column values from a block.
      column :total do
        doc['items'].inject(0){ |sum,item| sum + item['total'] }
      end

      # Collections perform special synchronization in order to deal with
      # one to one, or indeed many to many relationships.
      #
      # Rather than attempting a complex syncrhonisation process, the current
      # version of Couch Tap will just DELETE all current entries with a
      # primary key id that matches that of the parent table.
      #
      # The foreign id key is assumed to be name of the parent
      # table in singular form with `_id` appended.
      #
      # Each item provided in the array will be made available in the
      # `#data` method, and index from `#index`.
      # `#document` continues to be the complete source document.
      #
      # Collections can be nested to create highly complex structures.
      #
      collection :groups do
        table :invoice_groups do

          collection :entries do
            table :invoice_entries, :key => :entry_id do
              column :date, data['date']
              column :updated_at, document['updated_at']
            end
          end

        end
      end

      # Collections can also be used on Many to Many relationships.
      collection :label_ids do
        table :invoice_labels do
          column :label_id, data
        end
      end

    end

  end
end

DSL Summary

changes

Defines which CouchDB database should be used to request the changes feed.

After loading the rest of the configuration, the service will connect to the database using Event Machine. As new changes come into the system, they will be managed in the background.

connection

The Sequel URL used to connect to the destination database. Behind the scenes, Couch Tap will check for a table named couchdb_sequence that contains a single row for the current changes sequence id, much like a migration id typically seen in a Rails database.

As changes are received from CouchDB, the current sequence will be updated to match.

document

When a document is received from the changes feed, it will be passed through each document stanza looking for a match. Take the following example:

document :type => 'Invoice' do |doc|
  # ...
end

This will match all documents whose type property is equal to "Invoice". The document itself will be made available as a hash through the doc block variable.

document stanzas may be nested if required to provide further levels of filtering.

table

Each table stanza lets Couch Tap know that all or part of the current document should be inserted into it. By default, the matching table's schema will be read and any field names that match a property in the top-level of the document will be inserted automatically.

One of the limitations of Couch Tap is that all tables must have an id field as their primary key. In each row, the id's value will be copied from the _id of the document being imported. This is the only way that deleted documents can be reliably found and removed from the relational database.

column

collection

foreign_key

Notes on deleted documents

Synchronising a deleted document is generally a much more complicated operation. Given that the original document no longer exists in the CouchDB database, there is no way to know which document group and table the document was inserted into.

To get around this issue, Couch Tap will search through all the tables defined for the database and delete rows that match the primary or foreign keys.

Obviously, this is very inefficient. Fortunately, CouchDB is not really suited to systems that require lots of document deletion, so hopefully this won't be too much of a problem.

Testing

Run tests using rake, or individual tests as follows:

rake test TEST=test/unit/changes_test.rb