ActiveRecord::Summarize
Why summarize
?
Make existing groups of related
ActiveRecord
calculations twice as fast (or more) with minimal code alteration. It's like ago_faster
block.For more complex reporting requirements, including nested
.group
calls, usesummarize
for fast, legible code that you just couldn't have written before without unacceptable performance or lengthy custom SQL and data-wrangling.
Sidebar: Are you wondering how summarize
compares to load_async
?
Installation
Add this line to your Rails application's Gemfile:
gem 'activerecord-summarize'
And then execute:
$ bundle install
Usage
Suppose your controller method looks like this:
purchases = Purchase.complete
promotions = purchases.where.not(promotion_id: nil)
@promotion_sales = promotions.count
@promotion_revenue = promotions.sum(:amount)
@by_region = purchases.group(:region_id).count
Make it this instead:
Purchase.complete.summarize do |purchases|
promotions = purchases.where.not(promotion_id: nil)
@promotion_sales = promotions.count
@promotion_revenue = promotions.sum(:amount)
@by_region = purchases.group(:region_id).count
end
...and you'll have exactly the same instance variables set, but only one SQL query will have been executed.
You can run as many calculations in a summarize
block as it makes sense to run, so long as they all chain to the relation on which you called summarize
. They can use different, possibly-overlapping subsets of the original relation, i.e., they can have their own where
clauses and even group
. The final result of each will be exactly as it would have been if you had run each query independently, but only one query will actually be issued to the database.
Limitations & details
The only restriction is that each of the queries must be structurally compatible with the parent relation, in the same sense as is required for relation.or(other)
. So if you wanted to display the region's name, you'd need to group by a sub-select (ew) or do the join at the top level:
Purchase.complete.left_joins(:region).summarize do |purchases|
promotions = purchases.where.not(promotion_id: nil)
@promotion_sales = promotions.count
@promotion_revenue = promotions.sum(:amount)
@by_region = purchases.group("regions.name").count
end
Until the summarize
block ends, the return value of your calculations are ChainableResult::Future
instances, a bit like a Promise with a more convenient API. You can call any method you like on a ChainableResult
, and you'll get back another ChainableResult
, and they'll all turn out alright in the end—provided you called methods that would have worked if you had run that calculation without summarize
. OTOH, using a ChainableResult
as an argument to a method of a non-ChainableResult
generally will not work.
Purchase.last_quarter.complete.summarize do |purchases|
@sales = purchases.sum(:amount)
# x * y is syntactic sugar for x.*(y), so this will work:
@vc_projection = @sales * 3
# And this won't:
@vc_projection = 3 * @sales
# But this will work since v0.5.0...
@units_sold = purchases.sum(:units)
# ...because methods of `ChainableResult` now resolve their argument(s)
@avg_unit_price = @sales / @units_sold
end
If, within a summarize
block, you want to combine data from more than one ChainableResult
, you may need to use the otherwise-optional second argument yielded to the block, a proc
I like to name with_resolved
. Pass it all the results you want to combine and a block that combines them and returns the new result:
Purchase.complete.left_joins(:promotion).summarize do |purchases, with_resolved|
@all_revenue = purchases.sum(:amount)
promotions = purchases.where.not(promotions: {id: nil})
@promotion_sales = promotions.count
@promotion_discounts = promotions.sum("promotions.discount_amount")
@avg_discount = with_resolved[@promotion_sales, @promotion_discounts] do |sales, discounts|
sales.zero? ? 0 : discounts / sales
end
end
Treat a with_resolved
block as a pure function: i.e., return the value you care about, and don't set or change any other state within the block. Behavior in any other case is undefined.
Escape hatch
The query generated by summarize
is often much faster than equivalent queries written without it, but for few-query cases where each query is well-served by its own index, summarize
could possibly be slower.
By design, every operation performed with summarize
is correct and corresponds to normal ActiveRecord
behavior, and any operations that can't be done correctly this way or aren't yet will raise exceptions. But only imperfect humans have worked on this gem, so you might also wonder if summarize
is producing correct results.
Fortunately, you can easily check both with summarize(noop: true)
, which causes summarize
to yield the original relation it was called on and a trivial with_resolved
proc. The block will be executed as though summarize
were not involved, with each calculation executing separately and immediately returning numbers or hashes.
If you do find any case where you get different results with summarize(noop: true)
, I'd be grateful if you filed an issue.
How
ActiveRecord::Relation#summarize
yields a lightly-modified copy of the relation that intercepts all calls to sum
or count
which, instead of a number or hash, return a ChainableResult::Future
. A ChainableResult
accepts any method called on it, returning a new ChainableResult
that will evaluate to the result of running the method on the eventual result of its parent.
At the end of the summarize
block:
- All the calculations are combined into a single query.
- The results of the query are collected into the same shapes they would have if they had been called independently. E.g., a bare
.count
returns a number, but.group(*expressions).count
returns a hash with single value (one group expression) or array (two-plus expressions) keys. - Any
ChainableResult
in the return value of the block (usually a singleChainableResult
or anArray
orHash
withChainableResult
values) is replaced with its resolved value. - Any
ChainableResult
in the local scope of the block (i.e.,block.binding
) or an instance variable of the block context (i.e.,block.binding.receiver
) is replaced with its resolved value.
N.b., if you are using summarize
in a more functional style and will return all values you care about, you can let summarize
know to skip step 4 by invoking it with summarize(pure: true)
.
When the parent relation already has .group
applied, pure: true
is implied and step 4 does not take place.
Power usage with group
Build even more complex queries by using summarize
on a relation that already has .group
applied. Results are grouped just like a standard .group(*expressions).count
, but instead of single numbers, the values are whatever set of calculations you return from the block, including further .group(*more).calculate(:sum|:count,*args)
calculations, in whatever Array
or Hash
shape you arrange them. For example:
puts Purchase.last_year.complete.group(:region_id).summarize do |purchases,with_resolved|
total = purchases.count
by_quarter = purchases.group(CREATED_TO_YEAR_SQL, CREATED_TO_QUARTER_SQL).count.sort.to_h
target = with_resolved[total / 4, by_quarter.values.max] {|avg_q, best_q| [avg_q * 1.25, best_q].max.round }
{last_year: total, quarters: by_quarter, unit_target: target}
end
# Output:
# {
# 1 => {
# last_year: 2717316,
# quarters: {
# [2021, 1] => 634057,
# [2021, 2] => 590012,
# [2021, 3] => 659010,
# [2021, 4] => 834237
# },
# unit_target: 849161
# },
# 2 => { ... },
# 3 => { ... }
# }
See Use case: moderator dashboard for a more-complete example comparing ActiveRecord-only code with summarize
.
Caveat
The ActiveRecord API has no direct analog for this mode, so noop: true
is not allowed when summarize
is called on a grouped relation.
When the relation already has group
applied, for correct results, summarize
requires that the block mutate no state and return all values you care about: functional purity, no side effects. ChainableResult
values referenced by instance variables or local variables not returned from the block won't be evaluated. I.e., pure: true
is implied and pure: false
is not allowed. To see why:
# A trivial example:
Purchase.complete.group(:region).summarize {|purchases| purchases.sum(:amount) }
# ...is exactly equivalent to:
Purchase.complete.group(:region).sum(:amount)
# But if there were three regions, what should the value of @target be in this case?
region_targets = Purchase.last_quarter.complete.group(:region).summarize do |purchases|
@target = purchases.sum(:amount) * 1.25
end
As a rubyist, that last example looks like the block will be evaluated for each group, so @target
should keep whatever value it got the last time the block was run. However:
- This is not often useful.
- The block is not actually linearly evaluated for each group.
Instead the block is evaluated once to determine what calculations need to be run, the query is built and evaluated, and then, for each group of the parent relation, the return value of the block is evaluated with respect to just those rows belonging to the group. In practice this is quite powerful and makes a pleasant, legible API for complex reporting.
Development
Run bin/setup
to install dependencies. If you don't have PostgreSQL installed, comment out the pg
line in Gemfile
first. Then, run bundle exec rake test
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
Tests and bin/console
support SQLite and PostgreSQL: (un)comment the appropriate lines at the top of test/test_data.rb
to choose. In the future, we'll have a nicer solution. If you want to use PostgreSQL, run CREATE DATABASE summarize_test;
as your default user.
To install this gem onto your local machine, run bundle exec rake install
. To release a new version, update the version number in version.rb
, and then run bundle exec rake release
, which will create a git tag for the version, push git commits and the created tag, and push the .gem
file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/midnightmonster/activerecord-summarize.
License
The gem is available as open source under the terms of the MIT License.