Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #4668 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

[PATCH] Find with :order and limited eager loading fails on PostgreSQL

Reported by: alex@purefiction.net Assigned to: David
Priority: high Milestone: 1.2
Component: ActiveRecord Version: 1.2.0rc2
Severity: critical Keywords: PostgreSQL eager loading patch
Cc:

Description

Event.find(:all, :include => ..., :order => ...) fails with:

Event Load IDs For Limited Eager Loading (0.000000)   PGError: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

The "order by" clause includes the order column, but the select list generated by AR does not. This breaks a lot of query use cases.

Verified to fail with trunk.

Attachments

allow_find_with_order_and_limited_eager_loading.diff (4.7 kB) - added by eventualbuddha on 01/12/07 02:24:40.

Change History

04/09/06 19:46:58 changed by robbyrussell

It's because PostgreSQL requires the query to comply to a standard and MySQL is (sadly) more forgiving.

Can you post the query that its attempting to run?

04/09/06 22:08:18 changed by alex@purefiction.net

I know. The query:

  Event Load IDs For Limited Eager Loading (0.012995)   SELECT DISTINCT events.id FROM events LEFT OUTER JOIN event_participations ON event_participations.event_id = events.id LEFT OUTER JOIN users ON users.id = event_participations.user_id LEFT OUTER JOIN venues ON venues.id = events.venue_id WHERE ((events.time between '2006-04-09 00:00:00' and '2035-01-01 00:00:00') and venues.id = 424 and events.private = false) ORDER BY events.time LIMIT 20

Here's a possible patch (which seems to work here), which adds the column(s) in the order option to the column list:

--- vendor/rails/activerecord/lib/active_record/associations.rb.old	2006-04-09 02:47:54.000000000 +0200
+++ vendor/rails/activerecord/lib/active_record/associations.rb	2006-04-10 00:02:00.000000000 +0200
@@ -1163,10 +1163,10 @@
         end
  
         def select_limited_ids_list(options, join_dependency)
-          connection.select_values(
+          connection.select_all(
             construct_finder_sql_for_association_limiting(options, join_dependency),
             "#{name} Load IDs For Limited Eager Loading"
-          ).collect { |id| connection.quote(id) }.join(", ")
+          ).collect { |row| connection.quote(row['id']) }.join(", ")
         end
  
         def construct_finder_sql_for_association_limiting(options, join_dependency)
@@ -1174,7 +1174,14 @@
           #sql = "SELECT DISTINCT #{table_name}.#{primary_key} FROM #{table_name} "
           sql = "SELECT "
           sql << "DISTINCT #{table_name}." if include_eager_conditions?(options) || include_eager_order?(options)
-          sql << "#{primary_key} FROM #{table_name} "
+          sql << "#{primary_key}"
+          if options[:order]
+            # Convert order list, eg. "foo asc, bar desc" into column name list
+            options[:order].split(",").each do |part|
+              sql << ", #{part.split[0]}"
+            end
+          end
+          sql << " FROM #{table_name} "
           
           if include_eager_conditions?(options) || include_eager_order?(options)
             sql << join_dependency.join_associations.collect{|join| join.association_join }.join

Note that it has to use select_all instead of select_values because the problem mentioned in ticket #4671. Also, I am not sure why you are using the "distinct" keyword here, when you are selecting on a primary key which is by definition distinct; with the added column(s) it will most likely just be less efficient.

04/18/06 07:46:46 changed by anonymous

Here's an alternate patch that replaces the need for DISTINCT by using a GROUP BY statement on the fields specified in the ORDER BY expression. The changes are limited to a single method and there is no need to change select_values to select_all as in the above patch.

--- associations.rb.orig        2006-04-03 19:00:11.277154120 +0800
+++ associations.rb     2006-04-03 18:50:12.678155000 +0800
@@ -1154,7 +1154,7 @@
           scope = scope(:find)
           #sql = "SELECT DISTINCT #{table_name}.#{primary_key} FROM #{table_name} "
           sql = "SELECT "
-          sql << "DISTINCT #{table_name}." if include_eager_conditions?(options) || include_eager_order?(options)
+          sql << "#{table_name}." if include_eager_conditions?(options) || include_eager_order?(options)
           sql << "#{primary_key} FROM #{table_name} "

           if include_eager_conditions?(options) || include_eager_order?(options)
@@ -1163,6 +1163,17 @@
           end

           add_conditions!(sql, options[:conditions], scope)
+
+          if include_eager_conditions?(options) || include_eager_order?(options)
+            group_by = "#{table_name}.#{primary_key}"
+            order_by = options[:order] || ""
+            order_by = order_by.gsub(/\basc\b/i, "")
+            order_by.gsub!(/\bdesc\b/i, "")
+            group_by << ", #{order_by}" unless order_by.blank?
+
+            sql << "GROUP BY #{group_by} "
+          end
+
           sql << "ORDER BY #{options[:order]} " if options[:order]
           add_limit!(sql, options, scope)
           return sanitize_sql(sql)

04/26/06 02:00:52 changed by alex@purefiction.net

  • keywords changed from PostgreSQL eager loading to PostgreSQL eager loading patch.

Probably a better solution.

04/28/06 07:48:07 changed by anonymous

The original patch doesn't work with functions in the ORDER BY expression.

A contrived example:

# find all articles and order them by month posted, then by poster sorted firstname then lastname.
Article.find(:all, :include => "posted_by", :order => "date_trunc('month', posted_on), people.first_name, people.last_name")

The original patch does seem to be in SVN as changeset 4231.

04/28/06 18:58:06 changed by alex@purefiction.net

Go for the "group by" patch, which is much cleverer by far.

(follow-up: ↓ 8 ) 06/21/06 00:43:38 changed by bitsweat

  • status changed from new to closed.
  • resolution set to fixed.
  • milestone set to 1.2.

(in reply to: ↑ 7 ) 12/11/06 01:51:30 changed by peterroyal

  • status changed from closed to reopened.
  • version changed from 1.1.1 to 1.2.0rc1.
  • resolution deleted.

Replying to bitsweat:

[4231]

This doesn't seem to be fixed in 1.2RC1

If I do:

Photos.find( :all, :order => 'photos.created_at DESC', :include  => [ :photo_image ], :conditions => "photo_images.thumbnail = 'small'" )

it generates

SELECT DISTINCT ON (photos.id) photos.id, photos.created_at FROM photos LEFT OUTER JOIN photo_images ON photo_images.photo_id = photos.id WHERE (photo_images.thumbnail = 'small') ORDER BY photos.id, photos.created_at DESC LIMIT 10 OFFSET

.. which is clearly not in order by what I requested.

01/12/07 02:24:40 changed by eventualbuddha

  • attachment allow_find_with_order_and_limited_eager_loading.diff added.

01/12/07 02:27:19 changed by eventualbuddha

  • version changed from 1.2.0rc1 to 1.2.0rc2.

I've attached a patch with tests. The solution proposed in the postgresql IRC channel was to use a sub-query. I tried both of the above patches, and neither seemed to work. While this patch is functional, its quality could be improved. Specifically, I'm open to suggestions as to where to wrap the subquery. The current implementation feels like a bit of a kludge.

01/12/07 02:29:23 changed by bitsweat

  • summary changed from Find with :order and limited eager loading fails on PostgreSQL to [PATCH] Find with :order and limited eager loading fails on PostgreSQL.

01/12/07 05:14:58 changed by bitsweat

  • status changed from reopened to closed.
  • resolution set to fixed.

(In [5887]) PostgreSQL: use a subselect to correctly perform eager finds with :limit and :order. Closes #4668.

01/12/07 05:24:12 changed by bitsweat

(In [5888]) Merge [5887] from trunk. References #4668.