Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #9640 (closed defect: fixed)

Opened 8 months ago

Last modified 1 month ago

[PATCH] Alternative to eager loading

Reported by: fcheung Assigned to: core
Priority: normal Milestone: 2.1
Component: ActiveRecord Version: edge
Severity: normal Keywords: eagerloading
Cc: bitsweat, tarmo, lifofifo, chuyeow, jbarnette, ian.w.white@gmail.com, michael@schuerig.de, acechase

Description

Eager loading is inefficient, particularly in the case when multiple has_many associations are :included, as the query generated then causes the cartesian product to be loaded.

This patch attempts an alternative mechanism, where each association to be :preloaded generates one extra query, so

  Author.find :all, :conditions => blah, :preload => [:posts, :comments]

will result in 3 queries

  Author.find :all, :conditions => blah, :preload => [:comments, {:posts => :categories}]

will result in 4 queries (the exception being has_many :through associations, which add 2 queries).

This avoids the monster joins that can result from :include and the large result sets produced in the case described above.

Attachments

preload_ar.diff (7.8 kB) - added by fcheung on 09/24/07 02:19:43.
preload_ar.2.diff (7.8 kB) - added by fcheung on 09/27/07 01:59:00.
extra test, stylistic changes
preload_ar.3.diff (25.6 kB) - added by fcheung on 09/30/07 13:16:26.
had previously omitted tests in the diff (d'oh)
preload_ar.3.2.diff (26.0 kB) - added by fcheung on 09/30/07 13:43:42.
added tests for conditions
ar_preload.4.diff (32.7 kB) - added by fcheung on 09/30/07 18:49:45.
Allow preconfigured preload (ie class Post; has_many :comments, :preload => :author)
preload_ar_polymorph.diff (35.7 kB) - added by fcheung on 10/07/07 22:22:24.
Add support for polymorphic associations
ar_preload.diff (16.5 kB) - added by fcheung on 10/08/07 14:56:59.
Extra tests & no excessive singularisation
ar_preload_polymorp_tests.diff (37.2 kB) - added by fcheung on 10/09/07 12:18:42.
restored previously zapped tests
replace_include.diff (18.2 kB) - added by fcheung on 10/15/07 19:28:53.
Replace :include with :preload when possible
preloading.patch (17.1 kB) - added by lifofifo on 10/16/07 16:04:19.
Refactored Fred's patch
preloading.2.patch (17.3 kB) - added by fcheung on 11/06/07 09:17:30.
updated to apply to latest edge
preloading.3.patch (18.8 kB) - added by fcheung on 11/06/07 09:43:45.
interpolate sql
preloading.4.patch (29.2 kB) - added by fcheung on 11/07/07 16:39:10.
preloading.5.patch (18.6 kB) - added by fcheung on 11/07/07 16:40:17.
preloading.6.patch (20.5 kB) - added by fcheung on 12/09/07 23:28:30.
beef up some tests & pass more association options through
preloading.7.patch (20.0 kB) - added by fcheung on 01/04/08 14:04:45.
rebased on latest edge
test-order.patch (1.0 kB) - added by Aleksey Kondratenko on 01/16/08 17:04:36.
failing test with symbol arg to :order option
fix-order.patch (0.8 kB) - added by Aleksey Kondratenko on 01/16/08 17:05:19.
fix for test-order.patch
test-sti.patch (1.8 kB) - added by Aleksey Kondratenko on 01/16/08 17:06:23.
failing test for polymorphic association preloading on STI subclass
fix-sti.patch (1.1 kB) - added by Aleksey Kondratenko on 01/16/08 17:06:55.
fix for test-sti
fix_belongs_to_with_foreign_key.2.patch (4.0 kB) - added by codafoo on 01/21/08 04:02:22.
Fixed a bug with preloading belongs_to associations that doesn't use 'id' as the primary_key.
fix_belongs_to_with_foreign_key.patch (4.0 kB) - added by codafoo on 01/21/08 04:05:17.
Fixed a bug with preloading belongs_to associations that doesn't use 'id' as the primary_key.
fix_nested_includes.patch (5.1 kB) - added by blweiner on 01/24/08 18:34:10.
fixes nested includes for preloading
fix_nested_includes2.patch (1.3 kB) - added by blweiner on 01/31/08 07:44:57.
fix_nested_include_with_mixed_record_type.patch (5.1 kB) - added by acechase on 02/13/08 19:27:45.
fix_dont_attempt_to_preload_when_association_returns_nil.diff (0.7 kB) - added by GMFlash on 02/17/08 06:57:21.

Change History

09/24/07 02:19:43 changed by fcheung

  • attachment preload_ar.diff added.

09/24/07 02:21:00 changed by fcheung

Ooops, should read "Eager loading is sometimes inefficient"

09/24/07 07:23:19 changed by bitsweat

  • cc set to bitsweat.

I'm interested to see where each performs better than the other and whether we could simply preload instead of eager load accordingly.

Perhaps testing for one-to-one joins like has_one and belongs_to?

09/24/07 07:49:21 changed by tarmo

  • cc changed from bitsweat to bitsweat, tamo.

09/24/07 09:44:53 changed by lifofifo

  • cc changed from bitsweat, tamo to bitsweat, tamo, lifofifo.

09/24/07 15:40:49 changed by protocool

I've had a plugin that I've been using privately to do this for quite some time now - aside from not having the time to support users, one of the issues that's held me back from releasing it is the performance implications. To explain:

First of all, if you are doing a *monster* list of :includes, hydrating your associations like this goes a long way to reducing memory consumption - there's a lot less 'discarded' data. The discarded data doesn't tax mysql too much, but it does tax the ruby vm.

In general, hydrating with multiple queries is at least as fast as eager loads with :include but it just depends on the structure of your data. When your toplevel objects are 'large' and you eager :include a has_many, it's probably about 3-4 times faster to hydrate.

As your :include list grows, so too does the benefit of hydrating.

But there is a major gotcha with hydration - while it performs your queries much faster, it increases the load on your database server significantly.

To me it's a trade-off: would you rather see 10,000 find()s completed in 10 seconds with a 5% system load for mysql or 10,000 find()s completed in 3 seconds with a 30% system load for mysql.

Every DBA I've spoken to has been *very* concerned about the increased load - even if it was for a shorter time. Above all, DBAs want to be able to tune the strategy for a given set of queries.

So to answer bitsweat - I've always treated this as a strategy you choose on a per-find() basis. Sometimes it's worth it, sometimes it's not. I've never believed it should be automatically applied.

09/24/07 16:30:26 changed by fcheung

This seems counterintuitive to me (although of course if that's what you see then that's the way it is. I don't have any DBAs to pester me though :-)) Sure part of the stuff that makes :include slow with several :has_manys is what the ruby code does with the data afterwards, but the database doesn't get away completely free either

Surely making the database server give you back 200 rows instead of 10000 can only be a good thing for it. I would have also thought 3 really simple queries (since the associations turn into select * from foos where bar_id IN (1,2,3)) would be kinder to the db (as well as faster) than 1 really uber joined query

I've got an example where a question object has 2 has_many associations, each with 50 objects. Doing an :include on both of those and on a has_one takes around 1.3 seconds (wall clock time), and 0.1s spent in the database (according to the rails log). Doing the :preload thing results in wall clock time of .025s, with around 0.01s spent in the database. I plain don't understand how that could ever be worse (or do other uses show different patterns?)

09/24/07 19:17:43 changed by protocool

Frederick,

you have to measure more than just wall-clock time at the ruby end. You need to watch the system load of your mysql process over a sustained period of time using hydration as a strategy.

What I found is that there is a non-linear relation between 'results obtained' and mysql system load when you compare :include and hydration. If you consider wall-clock-time * mysql-system-load to be how 'hard' mysql is working for a given set of results, hydration makes mysql work 'harder'.

I presented this to one of the mysql core developers back at RailsConf 2006 and the impression I got from him was that from a pure-database standpoint, he wasn't a fan of the hydration strategy (for reasons above). However, he did concede that ActiveRecord's treatment of the database as a 'bucket' (aggressively loading all columns, for example) meant that maybe it was an acceptable strategy.

I think what worries DBAs can be best summed up with another benchmark that compares the effects of one query vs many queries (in the extreme):

I had a benchmark (lasting about 5 minutes) where I set up an enormous mysql query cache and repeatedly hit the same object graph (top level of 100 objects, with 2 has_many relations) using N+1 queries.

With a near-perfect hit rate for the query cache I've found that N+1 is as much as 17 times *faster* at the ruby end than running the benchmark with :include.

Unfortunately, the system load for mysql during N+1 benchmark spiked to nearly 100% cpu. DBAs really hate that :-)

If you just consider the wall-clock at the ruby-end you might be ignoring the way you're beating up your mysql server.

Trevor

09/24/07 20:47:10 changed by fcheung

I ran a benchmark which was essentially this:

500.times do

Question.find an_id_with_many_objects_in_its_has_manys, :preload => ... Quesiton.find :all, :order => 'questions.id desc', :preload => ..., :limit => 50

end

where the :preload/:included associations were 2 has_manys and one has_one, and a has_many :through on one of the has_many

:preload version:

55 secs total, 10.2 secs spent inside mysql (according to ps), average mysql load around 18% (

:include version 36 minutes, 76 secs spent inside mysql mysql load around 3-4%. So total amount of load put on mysql is more than sextuple. It's also worse on the wallclock * mysql cpu percentage front.

Sure the mysql cpu usage is higher, but that's only because we're running the queries as fast as we can: in the real world if I were to replace :includes with :preloads I'd have the same numbers of queries over the same amount of time in both cases: my users aren't going to start clicking the buttons in their browser any faster. Put it another way: if I were to override find to be

def find_with_sleep

sleep 1 find_without_sleep

end

Then the mysql load goes down to the same level as with :include (except that ruby processes aren't chewing all my cpu time), but I haven't actually done anything that increases/decreases the amount of load put on the server (for a constant amount of work.

The only way I can see this impacting real world mysql server load is if all you day is run benchmarks continuously

09/24/07 21:52:23 changed by protocool

Frederick,

well, the difference between our benchmark results at the very least indicates that performance of the various strategies may be data/environment specific, and there is no hard and fast rule (at least that I can determine) for all situations.

I have to say though... The results you're seeing are odd - it's almost as though you only have one 'client' ruby process hitting the database... or maybe you're even using a single machine for both ruby and the db...

Either way, all I'm saying is that in my experience there are trade-offs and as such, this behavior should be opt-in for a given find() rather than automatic for *all* find()s.

Trevor

09/25/07 07:08:49 changed by fcheung

I have been using just the one machine for now. It needs be optional anyway, given that :include does allow you to specify conditions on the extra tables and this doesn't

09/25/07 07:19:45 changed by fcheung

I think I can extract a concise summary from my lengthy ramble. If during a lengthy benchmark mysql use is low it just means that it;s not the bottleneck (since as we know rails' unpacking of the rows returned chews up resources). The fact that changing things means that a benchmark now taxes mysql fully just means that the bottleneck is no longer ruby but the database (and it's normal that during a benchmark like this the bottleneck be maxed out - in my tests the ruby processes were using 100% cpu during the 'mysql friendly' version).

09/25/07 09:15:28 changed by lifofifo

In any cases, it'll be good to have both the options.

09/27/07 01:59:00 changed by fcheung

  • attachment preload_ar.2.diff added.

extra test, stylistic changes

09/27/07 01:59:44 changed by fcheung

I've added a few tests. As requested, I've done a few simple benchmarks:

user system totalreal
:include belongs_to 0.170000 0.010000 0.180000 ( 0.194349)
:preload belongs_to 0.120000 0.010000 0.130000 ( 0.144132)
:include has_many 0.220000 0.010000 0.230000 ( 0.255290)
:preload has_many 0.140000 0.010000 0.150000 ( 0.185786)
:include multiple has_many 7.470000 0.200000 7.670000 ( 8.122567)
:include multiple has_many 0.290000 0.010000 0.300000 ( 0.391572)
:include nested has_many 0.510000 0.020000 0.530000 ( 0.577883)
:preload nested has_many 0.290000 0.010000 0.300000 ( 0.402208)
:include single_record small has_many 0.020000 0.000000 0.020000 ( 0.030637)
:preload single_record small has_many 0.010000 0.000000 0.010000 ( 0.018658)
:include single_record medium has_many 0.020000 0.000000 0.020000 ( 0.021437)
:preload single_record medium has_many 0.010000 0.000000 0.010000 ( 0.020416)
:include has_one 0.070000 0.010000 0.080000 ( 0.091121)
:preload has_one 0.060000 0.000000 0.060000 ( 0.090098)

To my untrained eye this says that :include is basically ok until its bad, then it's really bad. I think a little more care would be required to read more into this.

Code to generate the benchmarks:

Benchmark.bm(15) do |x|
  x.report(":include belongs_to") { 10.times {Question.find( :all, :limit => 50, :include => :customer, :order => 'questions.id desc') }}
  x.report(":preload belongs_to") { 10.times {Question.find( :all, :limit => 50, :preload => :customer, :order => 'questions.id desc') }}

  x.report(":include has_many") { 10.times {Question.find( :all, :limit => 50, :include => :incoming_messages, :order => 'questions.id desc') }}
  x.report(":preload has_many") { 10.times {Question.find( :all, :limit => 50, :preload => :incoming_messages, :order => 'questions.id desc') }}

  x.report(":include multiple has_many") {10.times {Question.find :all, :limit => 50, :include => [:incoming_messages, :outgoing_messages], :order => 'questions.id desc'}}
  x.report(":include multiple has_many") {10.times {Question.find :all, :limit => 50, :preload => [:incoming_messages, :outgoing_messages], :order => 'questions.id desc'}}

  x.report(":include nested has_many") { 10.times {Customer.find :all, :limit => 50, :include => {:questions => :incoming_messages}, :order => 'customers.id desc' }}
  x.report(":preload nested has_many") { 10.times {Customer.find :all, :limit => 50, :preload => {:questions => :incoming_messages}, :order => 'customers.id desc' }}
  
  #8 questions
  x.report(":include single_record small has_many") {10.times {Customer.find 123456, :include => :questions}}
  x.report(":preload single_record small has_many") {10.times {Customer.find 123456, :preload => :questions}}

  #50 questions
  x.report(":include single_record medium has_many") {10.times {Customer.find 123480, :include => :questions}}
  x.report(":preload single_record medium has_many") {10.times {Customer.find 123480, :preload => :questions}}

  x.report(":include has_one") {10.times {Question.find :all, :limit => 50, :include => :question_time_record, :order => 'questions.id desc'}}
  x.report(":preload has_one") {10.times {Question.find :all, :limit => 50, :preload => :question_time_record, :order => 'questions.id desc'}}
end

09/30/07 13:16:26 changed by fcheung

  • attachment preload_ar.3.diff added.

had previously omitted tests in the diff (d'oh)

09/30/07 13:43:42 changed by fcheung

  • attachment preload_ar.3.2.diff added.

added tests for conditions

09/30/07 18:49:45 changed by fcheung

  • attachment ar_preload.4.diff added.

Allow preconfigured preload (ie class Post; has_many :comments, :preload => :author)

10/07/07 22:22:24 changed by fcheung

  • attachment preload_ar_polymorph.diff added.

Add support for polymorphic associations

10/07/07 22:23:50 changed by fcheung

I've never used polymorphic associations much, so I could easily have got things wrong (i've written tests of course, but then those could be wrong too :-))

10/08/07 14:56:59 changed by fcheung

  • attachment ar_preload.diff added.

Extra tests & no excessive singularisation

10/09/07 12:18:42 changed by fcheung

  • attachment ar_preload_polymorp_tests.diff added.

restored previously zapped tests

10/15/07 19:28:53 changed by fcheung

  • attachment replace_include.diff added.

Replace :include with :preload when possible

10/15/07 19:34:17 changed by fcheung

:include is now effectively :preload. The patch defers to the old include if the condition or order reference the :included table (this only works if the tables are explicitly_name (ie. "foos.bar =42")

(follow-up: ↓ 17 ) 10/15/07 21:07:13 changed by tarmo

With this patch I can no longer :include associations which have :conditions that require interpolate_sql. The interpolation is there just so it would be possible to :include those columns so I can just remove the interpolation (which I'm using to specify the aliased_table_name for ambiguous column references) now.

Perhaps as it would be good to default associations which use sql interpolation to not use preloading but just normal eager loading?

(in reply to: ↑ 16 ) 10/16/07 08:03:47 changed by fcheung

Replying to tarmo:

Perhaps as it would be good to default associations which use sql interpolation to not use preloading but just normal eager loading?

That's entirely possible. It would make the 'should I use normal eager loading' code a little bit more complicated but I think it is doable (or do you mean do no :preload if any of the associations uses sql interpolation). If you can exhibit a test case for this I might also have a go at seeing if I can bludgeon preload into working in this case.

In general, might this be a case of having our cake and eating it? ie given {{{ Post.find :all, :include =>[:comments, :some_other_associations], :conditions => ['comments.unmoderated = ?', true]}}} then we normal-eager-load comments, but we :preload :some_other_associations. I can see this getting tricky to work out in more complicated cases (eg nested associations), so it might be better to leave that for another day.

10/16/07 16:04:19 changed by lifofifo

  • attachment preloading.patch added.

Refactored Fred's patch

(follow-up: ↓ 19 ) 10/26/07 20:08:24 changed by davidomundo

Hey guys, I'm really expectant about this feature. Is there any chance you could implement an "afterload" feature? I think this will help in cases where you cannot preload, but still want to load many rows without Rails doing hundreds of queries for each item.

For example:

# user.rb
has_many :blogs

# blog.rb
belongs_to :user
has_many :posts

# post.rb
belongs_to :blog
has_many :comments

# comment.rb
belongs_to :post

# users_controller.rb
def all_coments
@user = User.find params[:id]
@posts = @user.blogs.find(:all, :preload => [:post])
Post.afterload(@posts, [:comments])
# each post in @posts has comments preloaded

An alternate approach would be to do nested preloads:

@posts = @user.blogs.find(:all, :preload => {:post => [:comments]})

(in reply to: ↑ 18 ) 10/26/07 20:21:21 changed by tarmo

Replying to davidomundo:

An alternate approach would be to do nested preloads: {{{ @posts = @user.blogs.find(:all, :preload => {:post => [:comments]}) }}}

Isn't that exactly what :preload (not really called that anymore, it's :include in the latest patch) already does?

(follow-up: ↓ 21 ) 10/26/07 20:23:36 changed by fcheung

Yup, that's definitely what it already does. tarmo, could you show an example using interpolate_sql (it's not something i've ever really needed before)? I think I can just add that in same as :include does but I'd far rather start with a failing test.

(in reply to: ↑ 20 ) 10/26/07 20:41:13 changed by tarmo

Replying to fcheung:

tarmo, could you show an example using interpolate_sql (it's not something i've ever really needed before)? I think I can just add that in same as :include does but I'd far rather start with a failing test.

An ugly example would be:

  has_many :upcoming_performances_on_sale, :class_name => 'Performance',
    :conditions => ['#{"#{aliased_table_name}." rescue ""}state = ?', 'on_sale']

The point of the interpolation part is to make the association eager loadable while also working with normal access (when the aliasing is not needed). The aliasing itself is expected though as it's a self-referential association and thus can not be eager loaded without aliasing. :preload theoretically reduces the need for such workarounds but in cases where it falls back to regular eager loading it is still needed.

(follow-up: ↓ 23 ) 10/26/07 20:48:00 changed by davidomundo

So you're saying that

Blog.find(:all, :preload => {:post => [:comments]})

Would first load all blogs, then load all posts for each blog, and then load all comments for each post? If so, I'm pleasantly surprised and impressed.

(in reply to: ↑ 22 ) 10/26/07 20:52:21 changed by tarmo

Replying to davidomundo:

So you're saying that {{{ Blog.find(:all, :preload => {:post => [:comments]}) }}} Would first load all blogs, then load all posts for each blog, and then load all comments for each post? If so, I'm pleasantly surprised and impressed.

That's exactly the way it works. And also the reason why preload is a great improvement on the old eager loading behavior.

10/26/07 20:57:51 changed by fcheung

Yup, that was the whole point :-)

10/26/07 21:16:38 changed by davidomundo

Oh, nice!

I was under the impression that :preload was preferred over :include because :include returns a huge cartesian product with redundant info whereas preload returns only one row per needed item.

I guess this is another plus!

10/26/07 21:19:51 changed by fcheung

That's exactly it. The way it avoids the cartesian product is by doing it in steps.

10/26/07 21:21:55 changed by davidomundo

Now how long until we get this in Core? :)

11/05/07 16:06:41 changed by tarmo

  • cc changed from bitsweat, tamo, lifofifo to bitsweat, tarmo, lifofifo.

The patch seems to not apply anymore (due to the :joins changes from #10012), anyone planning to fix it?

11/05/07 16:18:49 changed by fcheung

I'll have a look at it today or tomorrow

11/06/07 07:01:27 changed by protocool

I've got a patch attached to #10061 which will change how :joins works again (to be more consistent with the intent of :joins).

Once I rework the tests that were orginally submitted with #10012 I'm told it will be applied.

11/06/07 09:17:30 changed by fcheung

  • attachment preloading.2.patch added.

updated to apply to latest edge

11/06/07 09:43:45 changed by fcheung

  • attachment preloading.3.patch added.

interpolate sql

11/06/07 09:45:36 changed by fcheung

I've updated this against the latest edge and also added some stuff for interpolated conditions. I'm not a big user of that myself, so while it doesn't just blow up in the most trivial of cases it could still be iffy.

11/06/07 10:09:49 changed by chuyeow

  • cc changed from bitsweat, tarmo, lifofifo to bitsweat, tarmo, lifofifo, chuyeow.

11/07/07 15:16:39 changed by tarmo

Now that #10061 has been fixed by [8109] lifofifo's patch applies again and tests pass (although the patch supposedly lacks the sql interpolation changes).

11/07/07 16:39:10 changed by fcheung

  • attachment preloading.4.patch added.

11/07/07 16:40:17 changed by fcheung

  • attachment preloading.5.patch added.

11/07/07 16:41:03 changed by fcheung

Updated against latest edge, still with interpolation.

12/06/07 23:58:44 changed by jbarnette

  • cc changed from bitsweat, tarmo, lifofifo, chuyeow to bitsweat, tarmo, lifofifo, chuyeow, jbarnette.

12/07/07 00:03:32 changed by lifofifo

Hey Fred,

If you have your benchmarking script handy, it'd be great if you could post the benchmarking report against the latest edge.

12/07/07 00:40:21 changed by fcheung

Well my data has changed quite a bit (and I've changed machine), so this isn't comparable to the previous benchmarks, but:

Without patch:

usersystemtotalreal
:include belongs_to0.6100000.0400000.650000( 0.685515)
:include has_many0.7600000.0600000.820000( 0.891274)
:include multiple has_many3.7200000.2300003.950000( 4.329490)
:include nested has_many2.7800000.1300002.910000( 3.200279)
:include single_record small has_many0.0100000.0100000.020000( 0.025702)
:include single_record medium has_many0.0100000.0000000.010000( 0.046113)
:include has_one0.6200000.0300000.650000( 0.694406)

With patch

:include belongs_to0.3100000.0400000.350000( 0.391471)
:include has_many0.5500000.0400000.590000( 0.728103)
:include multiple has_many0.9300000.0900001.020000( 1.283857)
:include nested has_many2.2000000.1600002.360000( 2.834314)
:include single_record small has_many0.0100000.0000000.010000( 0.013746)
:include single_record medium has_many0.0100000.0000000.010000( 0.012381)
:include has_one0.4300000.0400000.470000( 0.574127)

In particular, my data no longer includes a good example of 2 has manys with large number of rows, that really killed old-style :include.

As an alternative example of the case where the current implemenation suffers:

Benchmark.bm(15) do |x|
x.report(":include multiple has_many") {10.times {Question.find :all, :limit => 100 :include => [:incoming_messages, :outgoing_messages, :log_entries], :order => 'questions.id desc'}}
end

We're loading 3 has_manys old style:

usersystemtotalreal
:include multiple has_many4.4000000.2400004.640000( 5.220345)

new style

usersystemtotalreal
:include multiple has_many0.5500000.0500000.600000( 0.741723)

So there's still a decent gain on 'normal' cases, and a huge win on the case the current code really doesn't like.

12/09/07 23:28:30 changed by fcheung

  • attachment preloading.6.patch added.

beef up some tests & pass more association options through

12/11/07 14:34:10 changed by ian.w.white@gmail.com

  • cc changed from bitsweat, tarmo, lifofifo, chuyeow, jbarnette to bitsweat, tarmo, lifofifo, chuyeow, jbarnette, ian.w.white@gmail.com.

01/04/08 14:04:45 changed by fcheung

  • attachment preloading.7.patch added.

rebased on latest edge

01/16/08 17:04:36 changed by Aleksey Kondratenko

  • attachment test-order.patch added.

failing test with symbol arg to :order option

01/16/08 17:05:19 changed by Aleksey Kondratenko

  • attachment fix-order.patch added.

fix for test-order.patch

01/16/08 17:06:23 changed by Aleksey Kondratenko

  • attachment test-sti.patch added.

failing test for polymorphic association preloading on STI subclass

01/16/08 17:06:55 changed by Aleksey Kondratenko

  • attachment fix-sti.patch added.

fix for test-sti

01/16/08 17:07:44 changed by Aleksey Kondratenko

tried latest patch on top of trunk r8641. Found couple of bugs. Tests and fixes are applied.

01/19/08 03:50:16 changed by bitsweat

  • keywords changed from activerecord eagerloading to eagerloading.
  • milestone changed from 2.x to 2.1.

01/19/08 04:19:56 changed by bitsweat

  • status changed from new to closed.
  • resolution set to fixed.

(In [8672]) Introduce preload query strategy for eager :includes. Closes #9640.

01/21/08 04:02:22 changed by codafoo

  • attachment fix_belongs_to_with_foreign_key.2.patch added.

Fixed a bug with preloading belongs_to associations that doesn't use 'id' as the primary_key.

01/21/08 04:05:17 changed by codafoo

  • attachment fix_belongs_to_with_foreign_key.patch added.

Fixed a bug with preloading belongs_to associations that doesn't use 'id' as the primary_key.

01/21/08 04:09:19 changed by codafoo

  • status changed from closed to reopened.
  • type changed from enhancement to defect.
  • resolution deleted.

There is a small bug with eager loading belongs_to associations that does not use the standard 'id' primary_key. I've attached a patch that adds a test to check for it, and a fix for the bug.

01/21/08 10:25:03 changed by fcheung

Oops, my bad :-) That certainly looks sane to me (and thanks Aleksey for your fixes!)

01/24/08 18:34:10 changed by blweiner

  • attachment fix_nested_includes.patch added.

fixes nested includes for preloading

01/24/08 18:39:43 changed by blweiner

This broke nested includes for me, i.e. :include => { :model1 => :model2 }. There was a problem when the association was NULL and also when it was retrieving the nested association, it wasn't sanitizing the SQL conditions for the parent model, rather than the associated child.

I've submitted a patch that fixes it for me. (Hopefully it's done correctly, it's my first patch.)

01/25/08 13:43:45 changed by tom

[8672] completely broke eager loading for me, because the naive regexes in include_eager_conditions? and include_eager_order? fail to match quoted table and column names such as those generated by the :conditions => { :bar => 'baz' } finder syntax, i.e.

Foo.find :all, :conditions => { :bar => 'baz' }

generates

SELECT * FROM "foos" WHERE ("foos"."bar" = 'baz')

which scan(/([\.\w]+)\.\w+/) totally misses, so preloading always happens even though there may be conditions or orders which depend upon a join. You've only got away with it because all of the test cases use :conditions => "bar = 'baz'" instead of the hash syntax and so any table and column names are always unquoted. Would a regex like /([\.\w]+)["']?\.["']?\w+/ be enough, or could we use a more robust way to analyse the table references?

01/31/08 07:37:51 changed by bitsweat

blweiner, your patch is missing test/models/picture.rb and test/models/avatar.rb

01/31/08 07:44:57 changed by blweiner

  • attachment fix_nested_includes2.patch added.

01/31/08 07:45:18 changed by bitsweat

tom, looks good. Needs a unit test.

01/31/08 07:50:19 changed by bitsweat

(In [8762]) Fixed preloading belongs_to associations which reference a custom foreign key. References #9640.

02/02/08 08:34:59 changed by mschuerig

  • cc changed from bitsweat, tarmo, lifofifo, chuyeow, jbarnette, ian.w.white@gmail.com to bitsweat, tarmo, lifofifo, chuyeow, jbarnette, ian.w.white@gmail.com, michael@schuerig.de.

I don't see where vanilla AR eager loading results in a cartesian product of tables being loaded. The cartesian product of two or more tables is the set of all combinations of the rows from each of these tables. By contrast, :include generates left outer joins, which only load a subset of the cartesian product.

The effect of a left outer join is that columns from the base table are repeated for each joined row. If this has the effect that duplicate objects are allocated over and over for what is essentially the same value, that's far from optimal, of course. However, this behavior should be improved somewhere way down in the database adapter.

At this time I'm not very enthusiastic about the approach taken by the patch. I can think of a few important questions:

  • How is loading of associated objects affected by concurrent DB updates (atomicity)?
  • How does the patch compare to vanilla eager loading when there are lots of concurrent accesses to the DB?
  • What's the performance impact on DBMS other than MySQL?

02/03/08 13:29:47 changed by fcheung

There's definitely a difference regarding atomicity. I don't think this is an issue though (and you can always stik it in a transaction if it is), given that you're normally trying to speed up something along the lines of

for post in @posts
  #do something with post.comments
end

Which isn't atomic at all.

Regarding the cartesian product, maybe that's the wrong word, substitute 'hell of a lot of rows' if you like. But as far as concrete examples go, i've got a Question object which has many :incoming_messages, :outgoing_messages. one particular instance of it has 145 outgoing, 160 incoming.

If I load that instance, with those associations, the sql you get is (minus all the table aliasing) is

SELECT  ... FROM `questions` 
LEFT OUTER JOIN `incoming_messages` 
ON incoming_messages.question_id = questions.id 
LEFT OUTER JOIN `outgoing_messages` ON outgoing_messages.question_id = questions.id WHERE (`questions`.`id` = 2389495)

If I run that query on its own, the resultset has 23200 rows (=145 * 160) which sounds rather cartesiany to me.

I've little expertise with other databases so I'll leave that up to someone more qualified

02/03/08 15:13:57 changed by mschuerig

"Hell of a lot of rows" is definitely a fitting description (to get the cartesian product, leave out all join conditions). In Rails this causes performance pain and suffering due to the inordinate number of short-lived objects that are allocated.

The easiest solution would be to just not :include multiple has_many associations on the same object. Including nested has_many associations is an entirely different case, however, as it does not suffer from the same explosion of rows.

02/03/08 15:51:44 changed by fcheung

With regards to multiple has_manys that is certainly the issue (and gettin 23000 rows back for 300 rows of actual information feels inefficient, independantly of rails). Not including multiple has_manys does avoid this, but that's not always a very appealing solution.

(in reply to: ↑ description ) 02/04/08 18:48:17 changed by acechase

  • cc changed from bitsweat, tarmo, lifofifo, chuyeow, jbarnette, ian.w.white@gmail.com, michael@schuerig.de to bitsweat, tarmo, lifofifo, huyeow, jbarnette, ian.w.white@gmail.com, michael@schuerig.de, acechase.

02/04/08 18:50:15 changed by lifofifo

  • cc changed from bitsweat, tarmo, lifofifo, huyeow, jbarnette, ian.w.white@gmail.com, michael@schuerig.de, acechase to bitsweat, tarmo, lifofifo, chuyeow, jbarnette, ian.w.white@gmail.com, michael@schuerig.de, acechase.

Poor chuyeow ;-)

02/13/08 19:27:45 changed by acechase

  • attachment fix_nested_include_with_mixed_record_type.patch added.

02/17/08 06:55:40 changed by GMFlash

I'm encountering an error when using nested preloading and one of the eager loaded records does not have any further associations. In the example below, Roles are a tree so I am preloading up to 4 levels deep. The problem is if a particular Role only has ancestors that go 2 levels deep then role.parent returns Nil and the preloading code still attempts to preload it. I have provided a patch to fix this issue.

Here is my example query:

user = User.find(:first,
  :conditions => {
    :account_id => account_id,
    :login      => login
  },
  :include => {
    :roles => [:permissions, :permission_assignments, {
      :parent => [:permissions, :permission_assignments, {
        :parent => [:permissions, :permission_assignments, {
          :parent => [:permissions, :permission_assignments]
        }]
      }]
    }]
  }
)

The error it produces:

NoMethodError in SessionsController#create
undefined method `preload_associations' for NilClass:Class

Stack Trace:
vendor/rails/activerecord/lib/active_record/association_preload.rb:25:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:19:in `each'
vendor/rails/activerecord/lib/active_record/association_preload.rb:19:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `each'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:25:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:19:in `each'
vendor/rails/activerecord/lib/active_record/association_preload.rb:19:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `each'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:25:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:19:in `each'
vendor/rails/activerecord/lib/active_record/association_preload.rb:19:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `preload_associations'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `each'
vendor/rails/activerecord/lib/active_record/association_preload.rb:16:in `preload_associations'
vendor/rails/activerecord/lib/active_record/base.rb:1247:in `find_every'
vendor/rails/activerecord/lib/active_record/base.rb:1236:in `find_initial'
vendor/rails/activerecord/lib/active_record/base.rb:1530:in `send'
vendor/rails/activerecord/lib/active_record/base.rb:1530:in `find_by_account_id_and_login'
vendor/rails/activesupport/lib/active_support/deprecation.rb:44:in `silence'
vendor/rails/activerecord/lib/active_record/base.rb:1530:in `find_by_account_id_and_login'
vendor/rails/activerecord/lib/active_record/base.rb:1518:in `send'
vendor/rails/activerecord/lib/active_record/base.rb:1518:in `method_missing'
app/models/user.rb:161:in `authenticate'
app/controllers/sessions_controller.rb:7:in `create'

02/17/08 06:57:21 changed by GMFlash

  • attachment fix_dont_attempt_to_preload_when_association_returns_nil.diff added.

02/17/08 23:42:21 changed by nzkoz

This ticket has more than 20 patches attached to it, it's becoming impossible to figure out what is going on :/

Can you please open new tickets for the issues which remain open, and assign them directly to me (nzkoz).

02/18/08 05:14:58 changed by nzkoz

  • status changed from reopened to closed.
  • resolution set to fixed.

Please open any remaining issues in new tickets and assign them directly to me.