Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #9487 (new enhancement)

Opened 8 months ago

Last modified 5 months ago

[PATCH] Allow multiple calculations with one query

Reported by: jcoglan Assigned to: technoweenie
Priority: normal Milestone: 2.x
Component: ActiveRecord Version: edge
Severity: normal Keywords: unverified calculations queries
Cc:

Description

In some cases, you need to grab several COUNTs, SUMs etc from the database, and it speeds things along if you can get them all in one query. This patch provides a simple way to achieve this through ActiveRecord::Base.calculate. For example:

Person.calculate(:how_many => [:count, '*'], :total_age => [:sum, :age])
#=> {:how_many => 12, :total_age => 387}

Person.calculate({:how_many => [:count, '*'], :total_age => [:sum, :age]},
    :conditions => ['age < ?', 30])
#=> {:how_many => 7, :total_age => 94}

The results are fetched using one query only, rather than a separate query for each quantity. I've found this tremendously useful for generating reports from large data sets.

I've included documentation and a few tests, which run fine under MySQL. Testing with other DBs would be appreciated.

Attachments

parallel_calculations_with_tests.diff (5.8 kB) - added by jcoglan on 01/30/08 23:24:08.
parallel_calculations_and_distinct_records_with_tests.diff (10.2 kB) - added by jcoglan on 01/30/08 23:24:22.

Change History

09/06/07 08:32:16 changed by jcoglan

This ticket conflicts with the patch for #9440, so I'm adding a patch for both these tickets. See parallel_calculations_and_distinct_records_with_tests.diff.

09/16/07 10:40:06 changed by jcoglan

  • keywords changed from calculations queries to unverified calculations queries.

09/16/07 21:24:23 changed by josh

Tests pass.

However, I'm not voting for this patch. It just doesn't seem that usefully for me.

09/16/07 23:44:59 changed by jcoglan

Granted, it may be slightly niche. What I can say is that this was extracted from a real-world project where it made a significant difference to performance. I had to graph data in real time based on a data set of ~10,000 records with various joins to other large tables and pulling several calculations at once really helped me out.

I'd like to see this go into the core as it requires directly overwriting parts of lengthy Rails methods, which makes it hard to release as a plugin (where alias_method_chain is prefered for attaching extra functionality, so the original method can change without affection your plugin).

10/07/07 15:08:10 changed by fcheung

+1 from me. I've mostly written a patch to do much the same. The stuff I've done adds one extra piece of functionality that allows the different calculations to have their own options, so (to use your syntax) you can do

Person.calculate({:how_many_female => [:count, '*', :conditions => ["sex = 'f'"]], 
             :how_many_male => [:count, '*', :conditions => ["sex = 'm'"],
             :total_age => [:sum, :age]},
             :conditions => ['age < ?', 30])

As said, this sort of stuff would be very difficult/brittle to do as a plugin.

10/07/07 15:11:55 changed by fcheung

(obviously a bad thing if the conditions vary wildly, but in the case where you have a common set of conditions, and then you just need to chop it into 2, it's been worth it in my experience)

10/07/07 19:38:28 changed by bitsweat

  • owner changed from core to technoweenie.

10/07/07 21:08:57 changed by jcoglan

fcheung - could you clarify something for me? I might have this wrong, but presumably any difference in conditions means you need an extra query? For me the core benefit of this patch is reducing the number of queries -- it's not just syntactic sugar but a real performance boost -- but if each calculation has different conditions, don't you need a separate query for each one? Please correct me if I've got this wrong.

10/07/07 21:14:02 changed by tarmo

All the calculations use the same conditions, if they used different conditions then they'd have to be separate queries (or subqueries of one main query).

10/07/07 21:15:13 changed by fcheung

I'm with you on reducing queries, that's my goal too. I'll take my example, you've got some conditions which identify a set of people, and you want a count of the men fitting that criteria, and a count of the women fitting that criteria. You can then do

SELECT SUM(IF(sex='m', 1, 0)) as men_count,
       SUM(IF(sex='f', 1, 0)) as wommen_count
WHERE [some_conditions]

If i were to get my mathematician hat out, i'd tell you that the cardinality of a subset can be obtained by summing it's indicator function :-)

10/13/07 21:39:19 changed by danger

I'm on the fence with this one. What I'd REALLY like to see is a patch that refactors the calculate method in such a way that this functionality can be easily pluginized. If this patch is applied it'll make it that much tougher for the next developer to extend the method's functionality.

10/22/07 16:39:05 changed by josh

I agree with danger. A nice refactoring of calculations could make it easier for you to write plugins for.

-1

12/14/07 00:56:19 changed by mdemare

+1

Refactoring would be even nicer, but this is a real improvement, and I doubt if it would make refactoring any harder.

01/30/08 23:24:08 changed by jcoglan

  • attachment parallel_calculations_with_tests.diff added.

01/30/08 23:24:22 changed by jcoglan

  • attachment parallel_calculations_and_distinct_records_with_tests.diff added.