Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #10012 (closed enhancement: fixed)

Opened 7 months ago

Last modified 7 months ago

[PATCH] Enhancement to find options, allow :joins to take addtional value forms similar to :include

Reported by: RubyRedRick Assigned to: core
Priority: normal Milestone: 2.x
Component: ActiveRecord Version: edge
Severity: normal Keywords:
Cc:

Description

This is a replacement for http://dev.rubyonrails.org/ticket/9923 which I submitted about a week ago.

This accomplishes the same goals with a much cleaner syntax which should fit much more naturally with the current AR api. Thanks to Josh Susser for a great suggestion.

Here is an excerpt from the documentation I added to associations.rb

== Adding Joins For Associations to Queries Using the :joins option

ActiveRecord::Base#find provides a :joins option, which takes either a string or values accepted by the :include option. if the value is a string, the it should contain a SQL fragment containing a join clause.

Non-string values of :joins will add an automatic join clause to the query in the same way that the :include option does but with two critical differences:

  1. A normal (inner) join will be performed instead of the outer join generated by :include. this means that only objects which have objects attached to the association will be included in the result. For example, suppose we have the following tables (in yaml format):
               Authors
                 fred:
                   id: 1   
                   name: Fred
                 steve:
                   id: 2
                   name: Steve
        
               Contributions
                 only:
                   id: 1
                   author_id: 1   
                   description: Atta Boy Letter for Steve     
                   date: 2007-10-27 14:09:54    
    
    and corresponding AR Classes

class Author: < ActiveRecord::Base

has_many :contributions

end

class Contribution < ActiveRecord::Base

belongs_to :author

end

The query Author.find(:all) will return both authors, but Author.find(:all, :joins => :contributions) will only return authors who have at least one contribution, in this case only the first. This is only a degenerate case of the more typical use of :joins with a non-string value. For example to find authors who have at least one contribution before a certain date we can use:

Author.find(:all, :joins => :contributions, :conditions => ["contributions.date <= ?", 1.week.ago.to_s(:db)])

  1. Only instances of the class to which the find is sent will be instantiated. ActiveRecord objects will not be instantiated for rows reached through the associations.

The difference between using :joins vs :include to name associated records is that :joins allows associated tables to participate in selection criteria in the query without incurring the overhead of instantiating associated objects. This can be important when the number of associated objects in the database is large, and they will not be used, or only those associated with a paricular object or objects will be used after the query, making two queries more efficient than one.

Note that while using a string value for :joins marks the result objects as read-only, the objects resulting from a call to find with a non-string :joins option value will be writable.

End of documentation excerpt

The new form of :joins can be used in find and calculation methods, and in scoped finds.

The patch includes extensive tests. I tried to adapt every existing test I could find in the existing ar tests which used the :include option which would be sensible if :include were changed to :joins.

All tests pass using rake test_mysql.

Attachments

ar_joins.diff (25.2 kB) - added by RubyRedRick on 10/28/07 17:21:27.
ar_joins_fixed_for_10011.diff (25.6 kB) - added by RubyRedRick on 10/29/07 16:08:03.

Change History

10/28/07 17:21:27 changed by RubyRedRick

  • attachment ar_joins.diff added.

10/29/07 03:48:40 changed by bitsweat

Looks good, but perhaps colliding with #10011:

  1) Failure:
test_ar_joins(ArJoinsTest)
    [./test/associations/ar_joins_test.rb:22:in `test_ar_joins'
     ./test/associations/ar_joins_test.rb:21:in `each'
     ./test/associations/ar_joins_test.rb:21:in `test_ar_joins'
     /usr/local/lib/ruby/gems/1.8/gems/mocha-0.5.5/lib/mocha/test_case_adapter.rb:19:in `__send__'
     /usr/local/lib/ruby/gems/1.8/gems/mocha-0.5.5/lib/mocha/test_case_adapter.rb:19:in `run']:
<false> is not true.

  2) Failure:
test_ar_joins_with_cascaded_two_levels(ArJoinsTest)
    [./test/associations/ar_joins_test.rb:31:in `test_ar_joins_with_cascaded_two_levels'
     ./test/associations/ar_joins_test.rb:30:in `each'
     ./test/associations/ar_joins_test.rb:30:in `test_ar_joins_with_cascaded_two_levels'
     /usr/local/lib/ruby/gems/1.8/gems/mocha-0.5.5/lib/mocha/test_case_adapter.rb:19:in `__send__'
     /usr/local/lib/ruby/gems/1.8/gems/mocha-0.5.5/lib/mocha/test_case_adapter.rb:19:in `run']:
<false> is not true.

  3) Failure:
test_ar_joins_with_complex_conditions(ArJoinsTest)
    [./test/associations/ar_joins_test.rb:49:in `test_ar_joins_with_complex_conditions'
     ./test/associations/ar_joins_test.rb:48:in `each'
     ./test/associations/ar_joins_test.rb:48:in `test_ar_joins_with_complex_conditions'
     /usr/local/lib/ruby/gems/1.8/gems/mocha-0.5.5/lib/mocha/test_case_adapter.rb:19:in `__send__'
     /usr/local/lib/ruby/gems/1.8/gems/mocha-0.5.5/lib/mocha/test_case_adapter.rb:19:in `run']:
<false> is not true.

10/29/07 04:38:33 changed by RubyRedRick

I've attached a patch which makes ar_joins compatible with #1011

10/29/07 04:40:16 changed by RubyRedRick

Actually, I should have said, an updated patch. The original is still there, just in case second thoughts arise about #1011

10/29/07 04:43:12 changed by hasmanyjosh

Rick, this looks like good work, though I haven't had a chance to try it for myself yet. The good looks pretty good, though it might need a little polish to blend with Rails code style and naming. The one concrete suggestion I'll make now is that you change the join_type from "JOIN" to "INNER JOIN", which is how Rails does that kind of join everywhere else.

10/29/07 04:52:05 changed by RubyRedRick

Changed second patch to adopt Josh's suggestion. All tests verified.

10/29/07 14:17:52 changed by tarmo

There's a typo on the documentation "the it should contain a SQL" should be "then it should contain a SQL".

I'm not so sure about the use of inner join though, if this is supposed to have the same behavior as :include but be faster because of not having to instantiate the included records then why not use the same join type that :include uses? Or at least somehow allow the join type to be changed.

I think it's possible with conditions to make an outer join behave like an inner join (:conditions => 'contributions.id IS NOT NULL'), but the reverse is not possible so using an inner join actually limits the usefulness of this feature. Although it certainly can be useful even with the limitation.

10/29/07 16:08:03 changed by RubyRedRick

  • attachment ar_joins_fixed_for_10011.diff added.

10/29/07 16:36:06 changed by RubyRedRick

tammo, thanks for the comment. I've fixed the typo and reloaded the second patch.

The use of inner join is intentional. This came out of my discussions with Josh Susser.

It goes to the different use cases for :include vs. :joins with the extended value syntax.

The :include option is (primarily) used to pre-load associated records. The reason it does a left outer join is that otherwise, it would skip base records (i.e. records from the table associated with the class receiving the find method) which didn't have an associated record. For example:

Class A

has_many :bs

end

class B

belongs_to :a

end

A data

id other attributes
1 ...
2 ...
3 ...

B data

id a_id value
1 1 5
2 1 7
3 2 3

Now:

A.find(:all, :include => :b)

Will return:

a(1) with the has_many :bs association preloaded with [b(1), b(2)] a(2) with the has_many :bs association preloaded with [b(3)] a(3) with an empty has_many :bs association

I'm using the notation a(n) here to indicate the row from table A whose id is n.

The include option needs to use LEFT OUTER JOIN in order to retrieve a(3). The new type of joins (what I call active record joins) introduced by this enhancement are intended to be used to join associated tables so that they can be referenced in the :conditions option (i.e. in the WHERE clause of the resulting query)

So, for example, given the same data:

A.find(:all, :joins => :bs, :conditions => ["bs.value > ?", 4])

returns only a(1) and that's what we wanted.

Since the purpose of the join is for filtering based on values of attributes in associated tables, and we're not including the corresponding associated AR objects in the result, the use of an outer join is overkill, and the inner join is more efficient.

And it also provides a nice use case which can be solved easily. Let's say we want to retrieve all A's which have at least one associated B. That's just:

A.find(:all, :joins => :b)

Because of the inner join, only A's which have at least one B that joins on a.id == b.a_id will be returned by the SQL query.

I hope that clarifies things.

10/29/07 21:39:56 changed by bitsweat

  • status changed from new to closed.
  • resolution set to fixed.

(In [8054]) Introduce finder :joins with associations. Same :include syntax but with inner rather than outer joins. Closes #10012.

11/02/07 10:40:06 changed by protocool

Can you please consider #10061 - this change introduces some undesireable behavior.