I expect there will be controversy around this patch. It does break 11 existing tests, but it offers a huge performance benefit under certain circumstances and I don't believe much is lost.
The limited eager-loading pre-query currently adds all the LEFT OUTER JOINs when it is fetching the ids for use in the second query. It does this for the purpose of allowing :order or :conditions clauses to affect the selection through the eager-loaded tables. However in practice I don't think WHERE or ORDER BY clauses referring to the eager loaded tables are all that useful, especially since we are just loading the IDs of the base table. The utility of conditions is reduced here because all of the association will be loaded anyway. So if you put a condition on the eager-loaded model, it will have to apply to ALL associated objects in order to actually restrict the IDs that are selected. Similarly, ORDER BY on an association table has a narrow use-case.
In contrast, by using explicit :joins and :group you can achieve a much wider range of WHERE and ORDER BY clauses than what the vanilla LEFT OUTER JOINs can provide (especially considering you can't modify the ON clause for the eager-loaded queries).
Clearly there is a use case for the current behavior. The reason I want to change it is because it can offer an optimization of several orders of magnitude in cases where:
A) the base table result set is large
and
B) there are several eager loaded tables being joined
Just to show you what I'm talking about, here is the same page loaded back to back with and without this patch respectively:
Property Load IDs For Limited Eager Loading (0.096613) SELECT DISTINCT properties.id FROM properties INNER JOIN idxes as qidx_association_idxes ON qidx_association_idxes.id = properties.idx_id AND qidx_association_idxes.association_id = 2 WHERE (properties.idx_id IN( 7 )) AND (properties.idx_id IN( 7 ) AND properties.type IN( 'Land' )) ORDER BY price DESC LIMIT 0, 10
Property Load IDs For Limited Eager Loading (31.864761) SELECT DISTINCT properties.id FROM properties LEFT OUTER JOIN photos ON photos.property_id = properties.id LEFT OUTER JOIN open_houses ON open_houses.property_id = properties.id AND open_houses.start_time > DATE(NOW()) LEFT OUTER JOIN agents_properties ON agents_properties.property_id = properties.id LEFT OUTER JOIN agents ON agents.id = agents_properties.agent_id LEFT OUTER JOIN agent_photos ON agent_photos.agent_id = agents.id INNER JOIN idxes as qidx_association_idxes ON qidx_association_idxes.id = properties.idx_id AND qidx_association_idxes.association_id = 2 WHERE (properties.idx_id IN( 7 )) AND (properties.idx_id IN( 7 ) AND properties.type IN( 'Land' )) ORDER BY price DESC LIMIT 0, 10
As you can see the first query is 300 times faster. And this does not even use very large tables. The properties table is about 15,000 rows, with the WHERE conditions and INNER JOIN optimizing it to an active set of under 10,000 rows.