Introduction

This gem provides a nice API to create complicated SQL queries for associated and aggregated tables.

Example

Take a classic example:

class Post has_many :comments end

Now let’s say you want to display a table with some posts’ attributes including number of comments. Normally you would use post.comments.count, but that generates separate query for each post. Alternatively you could use counter cache, but it can get out of sync plus it only works on this simple example, while you could also need the time of latest comment for each post.

Rails 3

AggregateColumns solution would look like this:

posts = Post.aggregate_columns( :association => :comments )

SELECT posts.*, (SELECT count(*) AS comment_count FROM comments WHERE (posts.id = post_id)) AS comment_count FROM “posts” ORDER BY comment_count DESC

posts.first.comment_count # => “1”

This returns a full blown ActiveRecord::Relation object encapsulating a query for posts with additional column called “comment_count” containing (big surprise here) number of comments for each post. I’m taking advantage of the fact columns specified in :select option are kept inside model objects (but not type-casted, so comment_count would be a String).

Options:

* :association - association to aggregate
* :function - SQL aggregate function to use, defaults to count
* :column - associated table column to aggregate, defaults to '*' (useful for
  count)
* :result_column - name of resulting column, defaults to
  association_field_function with exception of '*' column, which results in

association.singularize_function

* :join_type - JOIN type for in-join mode, explained later

Ordering defaults to aggregate column descending, you may always change it using reorder. The reason for using this default is that in my experience this is the only order I needed :)

What if you did not want such a simple aggregation, but the one I already mentioned: time of latest comment for each post:

Post.aggregate_columns( :association => :comments, :function => :max, :column => :created_at )

SELECT posts.*, (SELECT max(created_at) AS comments_created_at_max FROM comments WHERE (posts.id = post_id)) AS comments_created_at_max FROM “posts” ORDER BY comments_created_at_max DESC

The name of aggregated column would be comments_created_at_max in this case, but it can be changed using :result_column option.

You may add some additional relations to the subquery using a block, eg.

Post.aggregate_columns( :association => :comments ) { |rel| rel.where( :user_id => 1 ) }

SELECT posts.*, (SELECT count(*) AS comment_count FROM comments WHERE “comments”.“user_id” = 1 AND (posts.id = post_id)) AS comment_count FROM “posts” ORDER BY comment_count DESC

This may you can limit which comments are considered for aggregation.

The only option not mentioned yet is :join_type, which can only be explained after digging deeper into how the gem works. Aggregate column is added in one of two ways:

  • if :join_type option is not present - it’s appended as subquery to SELECT clause, eg:

SELECT posts.*, (SELECT count(*) AS comment_count FROM comments WHERE (posts.id = post_id)) AS comment_count FROM “posts” ORDER BY comment_count DESC

  • if :join_type option is present - it’s added as a subquery connected using JOIN clause with join type specified, eg (:join_type => :right):

SELECT posts.*, comment_count FROM “posts” RIGHT JOIN (SELECT post_id, count(*) AS comment_count FROM comments GROUP BY post_id) comment_count_join ON posts.id = comment_count_join.post_id ORDER BY comment_count DESC Now, each way has it’s place.

In-select subquery:

  • foreign key index is used (PostgreSQL at least)

  • subquery executed only for rows fetched from main table

In-join subquery:

  • with RIGHT join only fetches records from main table having results in aggregate column (eg. only posts with any comments)

You can only decide between those two on a case by case basis. But generally: if you only care about records having some meaningful values in aggregate column, in-join might be better; if main table has some large record count, but you only want to select some subset of those, in-select might save you some computation time.

Rails 2

NOTE: this is deprecated and I’m not sure if it even still works. But the code is there, so it might be useful for someone.

Instead of aggregate_columns you have aggregate_column_options:

Post.aggregate_columns_options( :association => :comments )

which returns a Hash of options suitable to pass to .find call. You have additional options: :joins and :conditions to be passed to the subquery, because you don’t have a Relation object to work on in a block.

Now something really complicated - sum of votes for comments whose authors are active. Here’s where :conditions and :joins options come in handy:

Post.all( Post.aggregate_columns_options( :association => :comments, :function => :sum, :column => :votes, :joins => “INNER JOIN authors ON comments.author_id = authors.id”, :conditions => [“authors.active = ?”, true], :result_column => :comment_vote_count ) )

:joins and :conditions are passed to the subquery, so they will determine which comments are considered when calculating vote sum

Important things to note here:

  • aggregate_column_options return :select, :joins and :order options, so those

cannot be used for other purposes

  • all options are used internally in aggregate subqueries, so they do not clash

with normal find options (other than aforementioned ones). This means you can merge resulting options with eg. custom :conditions

You may also define multiple aggregate columns in one call:

Post.aggregate_columns_options( { :association => :comments, :function => :max, :column => :created_at }, { :association => :tags, :result_column => :number_of_tags } )

Yet another method to combine aggregate columns with other find options is to use scopes:

Post.aggregate_columns_scope( :association => :comments ).scoped( …

Thanks

Many thanks go to Stefan Nothegger and Sharewise project (www.sharewise.com), where the idea originates from.

2010 Marek Janukowicz/Starware. Released under MIT license.