DataMapper SQL Finders

DataMapper SQL Finders adds #by_sql to your models, as a compliment to the standard pure-ruby query system used by DataMapper. The SQL you write will be executed directly against the adapter, but you do not need to lose the benefit of the field and table name abstraction offered by DM. When you invoke #by_sql, one or more table representations are yielded into a block, which you provide. These objects are interpolated into your SQL, such that you use DataMapper properties in the SQL and make no direct reference to the real field names in the database schema.

I wrote this gem because my original reason for using DataMapper was that I needed to work with a large, existing database schema belonging to an old PHP application, subsequently ported to Rails. We need the naming abstraction offered by DataMapper, but we also prefer its design. That said, we run some queries that are less than trivial, mixing LEFT JOINs and inline derived tables… something which DataMapper does not currently handle at all. #by_sql allows us to drop down to SQL in places where we need it, but where we don't want to by-pass DataMapper entirely.

The #by_sql method returns a DataMapper Collection wrapping a Query, in just the same way #all does. Just like using #all, no SQL is executed until a kicker triggers its execution (e.g. beginning to iterate over a Collection).

Because the implementation is based around a real DataMapper Query object, you can chain other DataMapper methods, such as #all, or methods or relationships defined on your Model.

It looks like this:

class Post
  include DataMapper::Resource

  property :id,    Serial
  property :title, String

  belongs_to :user
end

class User
  include DataMapper::Resource

  property :id,       Serial
  property :username, String

  has n, :posts


  def self.never_posted
    by_sql(Post) { |u, p| "SELECT #{u.*} FROM #{u} LEFT JOIN #{p} ON #{p.user_id} = #{u.id} WHERE #{p.id} IS NULL" }
  end
end

User.never_posted.each do |user|
  puts "#{user.username} has never created a Post"
end

The first block argument is always the current Model. You can optionally pass additional models to #by_sql and have them yielded into the block if you need to join.

You may chain regular DataMapper finders onto the result (the original SQL is modified with the additions):

User.never_posted.all(:username.like => "%bob%")

A note about DataMapper 2.0

The DataMapper guys are hard at work creating DataMapper 2.0, which involves a lot of under-the-surface changes, most notably building DM's query interface atop Veritas, with the adapter layer generating SQL by walking a Veritas relation (an AST - abstract syntax tree). Because of the way DM 1 handles queries, it is not trivial to support SQL provided by the user (except for the trival case of it being in the WHERE clause). With any hope, gems like this will either not be needed in DM 2.0, or at least will be easy to implement cleanly, since SQL and Veritas play nicely with each other.

Installation

Via rubygems:

gem install dm-sql-finders

Detailed Usage

Note that in the following examples, you are not forced to use the table representations yielded into the block, but you are encouraged to. They respond to the following methods:

  • tbl.*: expands the splat to only the known fields defined in your model. Other fields in the database are excluded.
  • tbl.to_s: represents the name of the table in the database. #to_s is invoked implcitly in String context. Note that if you join to the same table multiple times, DataMapper SQL Finders will alias them accordingly.
  • tbl.property_name: represents the field name in the database mapping to property_name in your model.

Writing the field/table names directly, while it will work, is not advised, since it will significantly hamper any future efforts to chain onto the query (and it reads just like SQL anyway, right?).

Basic SELECT statements

Returning a String from the block executes the SQL when a kicker is invoked (e.g. iterating the Collection).

def self.basically_everything
  by_sql { |m| "SELECT #{m.*} FROM #{m}" }
end

Passing in variables

The block may return an Array, with the first element as the SQL and the following elements as the bind values.

def self.created_after(time)
  by_sql { |m| ["SELECT #{m.*} FROM #{m} WHERE #{m.created_at} > ?", time] }
end

Selecting less than all fields

Just specify individual fields in the SQL. The regular DM semantics apply (i.e. the rest will be lazy loaded, and omitting the primary key means your records are immutable).

def self.usernames_only
  by_sql { |u| "SELECT #{u.username} FROM #{u}" }
end

Selecting more than all fields (experimental)

This allows you to pre-load things like aggregate calculations you may otherwise add denormalizations for:

def self.with_post_counts
  by_sql(Post) { |u, p| "SELECT #{u.*}, COUNT(#{p.id}) AS post_count FROM #{u} INNER JOIN #{p} ON #{p.user_id} = #{u.id} GROUP BY #{u.id}" }
end

A @post_count instnace variable is set on all resources. Currently this is always a String. You will need to typecast manually.

See the section on "Joins" for details on the join syntax.

You should consider this feature experimental. It takes advantage of the fact DM Property instances can be created and thrown-away on-the-fly.

Ordering

DataMapper always adds an ORDER BY to your queries if you don't specify one. DataMapper SQL Finders behaves no differently. The default ordering is always ascending by primary key. You can override it in the SQL:

def self.backwards
  by_sql { |m| "SELECT #{m.*} FROM #{m} ORDER BY #{m.id} DESC" }
end

Or you can provide it as a regular option to #by_sql, just like you can with #all:

def self.backwards
  by_sql(:order => [:id.desc]) { |m| "SELECT #{m.*} FROM #{m}" }
end

Note that the :order option take precendence over anything specified in the SQL. This allows method chains to override it.

Joins

The additional models are passed to #by_sql, then you use them to construct the join.

class User
  ... snip ...

  def self.posted_today
    by_sql(Post) { |u, p| ["SELECT #{u.*} FROM #{u} INNER JOIN #{p} ON #{p.user_id} = #{u.id} WHERE #{p.created_at} > ?", Date.today - 1] }
  end
end

The :links option will also be interpreted and added to the FROM clause in the SQL. This is useful if you need to override the SQL.

Limits and offsets

These can be specified in the SQL:

def self.penultimate_five
  by_sql { |m| "SELECT #{m.*} FROM #{m} ORDER BY #{m.id} DESC LIMIT 5 OFFSET 5" }
end

Or they can be provided as options to #by_sql:

def self.penultimate_five
  by_sql(:limit => 5, :offset => 5) { |m| "SELECT #{m.*} FROM #{m}" }
end

If :limit and/or :offset are passed to #by_sql, they take precedence over anything written in the SQL itself.

Method chaining

Method chaining with #by_sql, for the most part, works just like with #all. There are some current limitations, such as reversing the order of a query that used ORDER BY in the SQL, rather than via an :order option.

Also note the you may not currently chain #by_sql calls together. #by_sql must, logically, always be the first call in the chain.

User.by_sql{ |u| ["SELECT #{u.*} FROM #{u} WHERE #{u.role} = ?", "Manager"] }.all(:username.like => "%bob%", :order => [:username.desc])

Unions, Intersections and Differences

Unfortunately this is not currently supported, and will likely only be added after the other limitations are worked out.

Specifically, queries like this:

User.by_sql { |u| ["SELECT #{u.*} FROM #{u} WHERE #{u.created_at} < ?", Date.today - 365] } | User.all(:admin => true)

Should really produce SQL of the nature:

SELECT "users"."id", "users"."username", "users"."admin" FROM "users" WHERE ("created_at" < ?) OR (admin = TRUE)

I have no idea what will happen if it is attempted, but it almost certainly will not work ;)

Will it interfere with DataMapper?

#select_statement on the adapter is overridden such that, when you use #by_sql query, code in the gem is executed, and when you execute a regular query, the original code pathways are followed. I'd prefer some sort of extension API in DataObjectsAdapter to allow hooks into its SQL generation logic, but for now, this is how it works.

DataMapper 2.0 should fix this.

Contributors

DataMapper SQL Finders is currently written by Chris Corbyn

Contributions are more than gladly accepted. The primary goal is to support SQL in a way that does not break gems like dm-aggregates and dm-pager. The more the SQL can be interpreted and turned into a native Query, the better.

TODO

There are some known limitations, that are mostly edge-cases. You will only run into them if you try to get too crazy combining regular DM queries with SQL (e.g. adding :links to a hand-written SQL query works, unless you have used bind values somewhere other than the WHERE clause and if, and only if DataMapper needs to use a bind value in the join, such as for special join conditions). Real edge-cases.

  • Support overriding :fields in a #by_sql query (complex if the query depends on RDBMS native functions in both the WHERE and the SELECT)
  • Reverse the order when invoking #reverse in a #by_sql query that used ORDER BY in the SQL (note this will work just fine if you use the :order option)
  • Better support for ? replacements in places other than the WHERE clause
  • Support set operations (union, intersection, difference)
  • Possibly (?) support crazy complex mass-updates (seems a little DB-specific though): User.by_sql { ... something with join conditions ... }.update!(:banned => true) (MySQL, for one, can do UPDATE ... INNER JOIN ...)

Future Plans

Before I started writing this, I wanted to implement something similar to Doctrine's (PHP) DQL, probably called DMQL. This is basically a strict superset of SQL that is pre-processed with DataMapper, having knowledge of your schema, therefore allowing you to simplify the query and let DMQL hande things like JOIN logic. Say, for example:

Post.by_dmql("JOIN User u WHERE u.username = ?", "Bob")

Which would INNER JOIN posts with users and map u.username with the real field name of the User#username property.

This gem would be a pre-requisite for that.

Copyright (c) 2011 Chris Corbyn.