Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #428 (closed defect: fixed)

Opened 4 years ago

Last modified 2 years ago

Stale database connections should be handled gracefully

Reported by: josh@dwgsolutions.com Assigned to: Jeremy Kemper <rails@bitsweat.net>
Priority: high Milestone: 1.0
Component: ActiveRecord Version: 0.14.3
Severity: normal Keywords: fcgi webrick mssql postgre mysql broken pipe timeout fd
Cc: me@julik.nl, schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com, david@davelee.com.au, zach@berkeleydata.net

Description

(excerpt from chat)

07:38 < Dreamer3> after running for about a day my rails app starts to err out with broken pipe 07:39 < Dreamer3> restarting apache fixes it, but obviously that isn't an acceptible solution

07:40 < nextangle> dreamer3: could be mysql timeouts

07:41 < kwe> dreamer3: I get mysql timeouts

07:41 < Dreamer3> nextangle: ?? it breaks consistently until apache is restarted and i can access mysql just fine by itself

07:41 < Dreamer3> nextangle: and this machine has like 0 load

07:41 < nextangle> dreamer3: FCGI holds a permanent connection to MySQL that might timeout 07:42 < alexeyv> Dreamer3: 2 cents say it's connection time-out

07:42 < alexeyv> Rails never renews database connections

07:43 < nextangle> I think you can look it up in the MySQL docs..

07:43 < nextangle> When you figure it out, lemme know and I'll post something about it

---

Ok, I don't have a solution yet... but here are the references I have...

http://dev.mysql.com/doc/mysql/en/Gone_away.html "By default, the server closes the connection after eight hours if nothing has happened. You can change the time limit by setting the wait_timeout variable when you start mysqld."

http://one.textdrive.com/pipermail/rails/2004-October/000100.html Someone else noting the same problem...

http://one.textdrive.com/pipermail/rails/2004-November/000203.html "PROPOSED FIX: Since Rails is going to be heavily used within a FastCGI environment, I suggest fixing the problem within ActiveRecord itself. More specifically, before abstract_adapter.rb raises ActiveRecord::StatementInvalid, it should check if the caught exception was caused by a broken pipe error, and in that case establish a new connection to the database." (I don't think this is a half bad idea - wouldn't guess it would be hard to impliment)

http://one.textdrive.com/pipermail/rails/2004-November/000204.html Seems to be a "hack" fix to me... as killing the FastCGI thread every "x units of time" for no reason other than a dropped connection (that AR could reestablish) is going to be slower (have to reload and recache all those Ruby files) than just adding a check to the AR code to deal with this issue.

I like the proposed solution with modifying AR (or would it be the MySQL adapter?)... but am thinking of using the workaround and killing the FastCGI every 7 hours to insure the app continues to work. It's worth nothing that since this seems to be a timeout issue heavy (or even low-medium) traffic sites _should_ not even see this problem... Though I can't really imagine no one hit the website in question in the past 8 hours, I suppose it is possible...

I suppose I could also make the MySQL timeout longer... 16 hours, 24 hours, etc... but I don't like this idea. Perhaps this is a better idea than changing the FCGI settings...

If anyone else has thoughts, commends, ideas, please add...

Attachments

oci_reconnect.patch (6.6 kB) - added by mschoen on 11/14/05 08:35:10.
oracle implementation
sqlserver_reconnect.patch (2.6 kB) - added by kajism@yahoo.com on 11/15/05 08:52:29.
Implementation of active? and reconnect! for sqlserver_adapter.rb
sqlserver_active_and_reconnect_enhancements.diff (1.4 kB) - added by tomafro on 11/15/05 13:03:41.
oci_ping.patch (2.2 kB) - added by mschoen on 11/24/05 08:34:43.
makes Oracle work w/ latest approach
fb_reconnect_r3190.diff (2.2 kB) - added by Ken Kunz <kennethkunz@gmail.com> on 11/25/05 06:29:59.
Firebird implementation (based on rev. 3190)
mysql_connection_timeout_controller_test.rb (1.0 kB) - added by wv on 05/23/07 10:05:10.
Test-case for MySQL server has gone away

Change History

01/06/05 13:22:30 changed by josh@dwgsolutions.com

Ok, adding the following line to my my.cnf doesn't seem to change the setting... am I missing something?

set-variable = wait_timeout=57600

01/06/05 13:35:08 changed by anonymous

Ok, "interactive_timeout=57600" changes the client timeouts... I'll see if this helps my situation...

01/20/05 20:56:47 changed by josh@dwgsolutions.com

  • version changed from 0.9.2 to 0.9.3.

Ok, the poll on the primary site seems to work but just saw the pipe error again with the infrequently user admin tool... restarting Apache fixed it... so obviously my mysql timeout changing to 16 hours isn't going to work for seldom used apps...

01/20/05 20:57:32 changed by josh@dwgsolutions.com

I'll try to upgrade to 0.9.4.1 soon and see if the problem is still evident...

01/27/05 03:27:17 changed by anonymous

  • version changed from 0.9.3 to 0.9.4.1.

Still broken in 9.4.1 (error in the infrequently used admin tool... the frequently used site works fine)... upgrading to 9.5...

02/15/05 00:28:59 changed by josh@dwgsolutions.com

  • version changed from 0.9.4.1 to 0.9.5.

Still broken in 0.9.5....

02/18/05 17:22:35 changed by david

This adds additional clues on auto_reconnect=true -- http://epi.is-a-geek.net/wiki/RailsDBFix

02/21/05 01:47:42 changed by ulysses

  • status changed from new to closed.
  • resolution set to fixed.

Fixed with a team effort from Nicholas Seckar & DHH.

04/21/05 01:16:16 changed by anonymous

  • status changed from closed to reopened.
  • resolution deleted.

I still have this problem on win32. Why was this set to "fixed"?

04/21/05 01:36:53 changed by nzkoz

anonymous, Are you sure you're receiving the same error?

There's now code in the adapter that automatically retries the query after re-establishing the connection. Which is why it was marked fixed.

Please make sure your email is included when commenting on bugs as it makes it harder for us confirm details with you.

04/21/05 17:20:37 changed by tomkaltz@msn.com

The following is the error. An apache restart fixes the problem but only for a day.

ActiveRecord::StatementInvalid in Bookings#list

Invalid argument: SELECT * FROM bookings

04/23/05 17:01:02 changed by nzkoz

  • milestone set to 1.0.

06/03/05 18:01:17 changed by tomkaltz@msn.com

Anyone have any progress on this?

06/06/05 00:23:27 changed by nzkoz

Posting here on pragdave's part

      LOST_CONNECTION_ERROR_MESSAGES = [ 
        "Server shutdown in progress",
        "Broken pipe", 
        "Lost connection to MySQL server during query", 
        "MySQL server has gone away"
      ]

Any chance the errors are being localised?

Those of you who are seeing this, what's the exact text of the error from the logs/error page?

06/06/05 11:30:57 changed by anonymous

  • priority changed from high to alto.
  • version changed from 0.9.5 to 0,9,5.
  • milestone changed from 1.0 to 1,0.

08/15/05 08:07:57 changed by g.arrowsmith@gmail.com

I have exactly the same pattern as described above, after 12 hours or so the application refuses to respond and the log states the following:

Processing InvoiceController#list (for 192.168.1.124 at Mon Aug 15 09:33:11 GMT Daylight Time 2005) Parameters: {"action"=>"list", "controller"=>"invoice"} Invoice Count (0.000000) [1;37mInvalid argument: SELECT COUNT(*) FROM invoices WHERE status='open'

ActiveRecord::StatementInvalid (Invalid argument: SELECT COUNT(*) FROM invoices WHERE status='open' ):

Config: Windows XP & 2000, Mysql 4.1.13 rails 0.13.1

After an apache reboot the app functions perfectly until the following day. This behaviour is identical with Webrick.

Is the only way to handle this issue to reconfigure mysql?

08/15/05 13:35:48 changed by anonymous

  • priority changed from alto to high.
  • version changed from 0,9,5 to 0.9.5.
  • milestone changed from 1,0 to 1.0.

08/19/05 13:58:39 changed by admin [nospamat] electricvisions dot com

  • keywords changed from fcgi mysql broken pipe timeout to fcgi webrick mysql broken pipe timeout.
  • version changed from 0.9.5 to 0.13.1.
  • summary changed from Broken Pipe errors with FCGI when MySQL connection timesout to Broken Pipe errors when MySQL connection times out.

I'm getting the same problem on my Windows XP machine using WEBrick (I've also received a comment from someone experiencing the same issue on both WEBrick and Apache). A restart of the server fixes the issue.

Here is a dump of the error:

ActiveRecord::StatementInvalid (Invalid argument: SELECT COUNT(*) FROM users ):

c:/programs/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/abstract_adapter.rb:462:in `log' c:/programs/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/mysql_adapter.rb:121:in `execute' c:/programs/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/mysql_adapter.rb:230:in `select' c:/programs/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/mysql_adapter.rb:101:in `select_one' c:/programs/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/base.rb:461:in `count_by_sql' c:/programs/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/base.rb:454:in `count' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/pagination.rb:154:in `count_collection_for_pagination' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/pagination.rb:171:in `paginator_and_collection_for' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/pagination.rb:111:in `paginate' /app/controllers/users_controller.rb:12:in `list' /app/controllers/users_controller.rb:7:in `index' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:756:in `send' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:756:in `perform_action_without_filters' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/filters.rb:295:in `perform_action_without_benchmark' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/benchmarking.rb:41:in `perform_action_without_rescue' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/benchmarking.rb:41:in `measure' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/benchmarking.rb:41:in `perform_action_without_rescue' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/rescue.rb:80:in `perform_action' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:356:in `send' c:/programs/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:356:in `process' c:/programs/ruby/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/dispatcher.rb:32:in `dispatch' c:/programs/ruby/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/webrick_server.rb:105:in `handle_dispatch' c:/programs/ruby/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/webrick_server.rb:71:in `service' c:/programs/ruby/lib/ruby/1.8/webrick/httpserver.rb:104:in `service' c:/programs/ruby/lib/ruby/1.8/webrick/httpserver.rb:65:in `run' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:155:in `start_thread' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:144:in `start' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:144:in `start_thread' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:94:in `start' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:89:in `each' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:89:in `start' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:79:in `start' c:/programs/ruby/lib/ruby/1.8/webrick/server.rb:79:in `start'

08/26/05 00:00:56 changed by anonymous

  • priority changed from high to highest.

08/26/05 23:58:20 changed by ulysses

Those who can produce these errors, please find out what MySQL is telling Rails.

If all we do is change the priority, nothing will get fixed. If someone decides to spend some time doing some research, we can move forward.

08/29/05 04:40:49 changed by garrow

I've used the following settings to accelerate the issue and enabled log_warnings (don't want to wait 8 hours or so) and gives the error described in my earlier entry. wait_timeout=600 interactive_timeout=600

Log file mysql.err

050829 9:49:48 InnoDB: Started; log sequence number 0 46732 C:\Program Files\MySQL\MySQL Server 4.1\bin\mysqld-nt: ready for connections. Version: '4.1.13-nt' socket: port: 3307 Official MySQL binary

050829 9:56:01 [Warning] Aborted connection 6 to db: 'xxx' user: 'webrick' host: `localhost' (Got an error reading communication packets)

050829 9:59:51 [Warning] Aborted connection 1 to db: 'unconnected' user: 'root' host: `localhost' (Got an error reading communication packets)

050829 10:01:07 [Warning] Aborted connection 5 to db: 'xxx' user: 'webrick' host: `localhost' (Got an error reading communication packets)

050829 10:15:43 [Warning] Aborted connection 4 to db: 'xxx' user: 'webrick' host: `localhost' (Got an error reading communication packets)

Please let me know if there is any other information I can gather, as this doesn't look particularly helpful.

09/01/05 09:18:30 changed by anonymous

I got this after a fresh installation of ruby, rails and postgres 8.0 after an idle time of only about 1 hour. Does not make me confident that I should go on with RoR instead of J2EE.

09/01/05 16:08:34 changed by nzkoz

anonymous,

This bug relates to mysql, not postgresql. Please open another ticket if you're still having trouble with postgresql.

When you do, please try and keep comments constructive.

09/05/05 06:26:18 changed by anonymous

I've tried the suggestion of scheduling an http request a regular interval however this hasn't helped. Looks like the lifetime of the connection is fixed.

Scheduled daily at 2pm & 6pm:

Processing InvoiceController#list (for 192.168.1.254 at Sat Sep 03 18:02:00 GMT Daylight Time 2005)

Parameters: {"action"=>"list", "controller"=>"invoice"} Invoice Count (0.000000) SELECT COUNT(*) FROM invoices WHERE status='open'  Invoice Load (0.015000) SELECT * FROM invoices WHERE status='open' ORDER BY created_on DESC LIMIT 10

... Completed in 0.21800 (4 reqs/sec) | Rendering: 0.18700 (85%) | DB: 0.01500 (6%) http://192.168.1.254/invoice/list

Processing InvoiceController#index (for 192.168.1.254 at Sat Sep 03 18:03:00 GMT Daylight Time 2005)

Parameters: {"controller"=>"invoice"}

Redirected to http://192.168.1.254:3000/invoice/list Completed in 0.06300 (15 reqs/sec) | DB: 0.00000 (0%) http://192.168.1.254/

Processing InvoiceController#list (for 192.168.1.254 at Sat Sep 03 18:03:00 GMT Daylight Time 2005)

Parameters: {"action"=>"list", "controller"=>"invoice"} Invoice Count (0.000000) Invalid argument: SELECT COUNT(*) FROM invoices WHERE status='open' 

ActiveRecord::StatementInvalid (Invalid argument: SELECT COUNT(*) FROM invoices WHERE status='open' ):

C:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/abstract_adapter.rb:462:in `log' C:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/mysql_adapter.rb:121:in `execute' C:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/mysql_adapter.rb:230:in `select' C:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/connection_adapters/mysql_adapter.rb:101:in `select_one' C:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/base.rb:461:in `count_by_sql' C:/ruby/lib/ruby/gems/1.8/gems/activerecord-1.11.1/lib/active_record/base.rb:454:in `count' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/pagination.rb:154:in `count_collection_for_pagination' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/pagination.rb:171:in `paginator_and_collection_for' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/pagination.rb:111:in `paginate' app/controllers/invoice_controller.rb:19:in `list' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:756:in `send' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:756:in `perform_action_without_filters' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/filters.rb:295:in `perform_action_without_benchmark' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/benchmarking.rb:41:in `perform_action_without_rescue' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/benchmarking.rb:41:in `measure' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/benchmarking.rb:41:in `perform_action_without_rescue' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/rescue.rb:80:in `perform_action' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:356:in `send' C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.9.1/lib/action_controller/base.rb:356:in `process' C:/ruby/lib/ruby/gems/1.8/gems/rails-0.13.1/lib/dispatcher.rb:32:in `dispatch' C:/rails/accounts/latest/public/dispatch.fcgi:20 C:/rails/accounts/latest/public/dispatch.fcgi:18:in `each_cgi' C:/ruby/lib/ruby/site_ruby/1.8/fcgi.rb:595:in `each' C:/ruby/lib/ruby/site_ruby/1.8/fcgi.rb:595:in `each_cgi' C:/rails/accounts/latest/public/dispatch.fcgi:18

09/11/05 05:43:43 changed by david

  • status changed from reopened to closed.
  • resolution set to fixed.

This problem was initially only fixed for people using the MySQL/Ruby (native C) adapter. With #2163, it's been fixed for the bundled adapter too.

10/15/05 10:54:53 changed by brillo

I am running *0.13.1* (11 July, 2005) on win2k with mysql. I have experienced this error using the bundled version of the mysql adapter.

ActiveRecord::StatementInvalid in Tournament#list Invalid argument: SELECT COUNT(*) FROM tournaments

10/16/05 08:56:57 changed by bpotier@edreamers.org

I'm quite new to Ruby and can't be sure I haven't done something that would cause the error but I'm experiencing the same pb just trying to implement any easy ror application examples I found on the web. Running latest stable versions of RoR on winxp, Apache 1.3 and MySQL 4.1.9

10/25/05 08:31:57 changed by Tamer Salama

  • status changed from closed to reopened.
  • resolution deleted.

I'm still getting ActiveRecord::StatementInvalid "Invalid Argument.." on Windows 2003, Rails 0.13.1, AR 1.11.1 (bundled adapter), WEBrick, MySQL 4.1.9. I had to set wait_timeout and interactive_timeout to high values in my.ini to prevent MySQL connection closure.

10/26/05 17:25:49 changed by anonymous

To quote the MySQL website: "The solution to this is to either do a mysql_ping on the connection if there has been a long time since the last query (this is what MyODBC does) or set wait_timeout on the mysqld server so high that it in practice never times out."

Would it be advisable to introduce a mysql_ping on the connection to overcome the connection timeout?

10/28/05 14:31:12 changed by bauer.mail@gmail.com

  • keywords changed from fcgi webrick mysql broken pipe timeout to fcgi webrick mssql postgre mysql broken pipe timeout.
  • version changed from 0.13.1 to 0.14.1.
  • severity changed from normal to blocker.

This isn't a MySql, MS SQL, PostgresSQL, or any other database problem, its a RAILS problem. This happens on two different servers, one using MS SQL and the other using PostgresSQL, the errors are identical. If the connection is closed, it is never re-opened. Restarting Apache fixes the problem every time. This is with ActiveRecord 1.12.1. We're going to test with Rails 0.14.2 (activerecord v1.12.2) to see if the problem persists.

10/28/05 15:55:50 changed by bauer.mail@gmail.com

  • version changed from 0.14.1 to 0.14.2.

Here are some steps to re-create this issue (tested on rails 0.14.1 and 0.14.2 with MSSQL and PostgreSQL)

1. Start Web Server and database 2. Visit site, verify that it is working 3. Restart database service 4. Visit site again, and verify that it no longer works

In my opinion, that pretty much proves that its an issue with ActiveRecord, not any particular database adapter/driver.

10/28/05 17:12:36 changed by ssmoot@gmail.com

The problem in both MySQL and PostgreSQL (I'm not sure why MSSQL is broken since it should be using DBI:ADO), is a Socket problem.

If you restart the database, the TCPSocket they use is at EOF. You can prove this to yourself by:

require 'socket'
=> true
sock = TCPSocket.new '127.0.0.1', 5432 # connect to a local PostgreSQL instance
=> #<TCPSocket:0x2c63280>
# restart the PostgreSQL Service now.
sock.eof?
=> true

It's important that you check for EOF after restarting the server. Not only does this duplicate the problem at hand, but: Socket#eof? is blocking in Ruby. It will keep blocking indefinitely until it recieves data if it's not already at EOF.

So you can't really use Socket#eof? to test for an open connection.

I did dig through ActiveRecord and add an ActiveRecord:ConnectionAdapters:AbstractAdapter#open? method, but then I ran into the blocking issue and googled up a comp.lang.ruby thread where Ara T. Howard describes the issue with Socket#eof?.

The following comment by Bill Kelley is interesting, but I'm not sure if it's applicable to this situation since I'm no authority, but I doubt the database server is sending out anything.

I've changed my program to infer EOF when gets() returns
nil.  I no longer call #eof? because although it admittedly
looks pretty darned thorough, it's no good for me to block. 

10/28/05 17:40:04 changed by ssmoot@gmail.com

Ok, the reason this also fails on MSSQL is that DBI:ADO throws a "Connection failure" ADO error that's just caught as a generic DBI::DatabaseError. So the root cause is _slightly_ different, but it's really the same issue all around:

ActiveRecord (and most/all the common drivers) have no mechanism for determining if the Connection is active. As far as ActiveRecord is concerned, if a Connection exists at all, it's active, and it's going to try to send messages through it.

It's frankly pretty surprising that it seems no one's run into this before, or taken a few minutes out to find out the cause. In my opinion Rails is definitely not "1.0" material until we can restart the databases independently of the web servers.

AbstractAdapter needs to implement an open?/connection_open?/whatever method, and that is going to affect a number of files, so I'm not sure how this sort of thing is typically handled?

The fix for SQL Server should be trivial. For PostgreSQL and MySQL, I'm not really sure. Perhaps some helpful exception is raised when you try to send data through an EOF TCPSocket.

10/31/05 13:00:06 changed by ssmoot@gmail.com

Ok, made the necessary modifications to the Postgres driver.

Line 86 in postgres-pr-0.4.0/lib/postgres-pr/connection.rb should change from this:

@conn << Query.dump(sql)

To this:

envelope = Query.dump(sql)
  	
begin
  @conn << envelope
rescue Errno::EINVAL => e
  raise ConnectionError.new('Socket Disconnected (Write Error)')
end

The MySQL fix should be very similar. In fact, I think it might be just wrapping line 461 in activerecord-1.12.2/lib/active_record/vendor/mysql.rb with a similar begin->rescue.

I'm not sure if the above does anything for the C version of the postgres library. I suspect not.

I'm also not sure how to coordinate these updates since it involves multiple 3rd party library fixes.

Anyways, once all the database drivers are throwing useful errors (I think it would be a mistake to catch these exceptions higher in the stack), it should be trivial for ActiveRecord to catch them and refresh the connection(s). (You could flush the whole connection-pool I guess, but I'm not sure that that would always be necessary, and it shouldn't make any sort of performance difference to refresh the connections individually as-needed, so that's probably the way I'd do it.)

If someone else is already working on this, please let me know, otherwise I'll keep hacking away, and figure out how to submit a patch before the week's out hopefully.

I'd assign the ticket to myself, but I'm not sure what the etiquette (sp?) here is for that. :)

11/05/05 21:00:48 changed by david

  • keywords changed from fcgi webrick mssql postgre mysql broken pipe timeout to fcgi webrick mssql postgre mysql broken pipe timeout fd.

11/08/05 07:05:05 changed by bitsweat

I think a simple way is to keep a timestamp tracking the last database call in the connection adapter instance. When timestamp > timeout.minutes.ago, close and reopen the connection before proceeding further.

The only adapter change required is a close_connection method (or similar.)

11/08/05 07:08:34 changed by bitsweat

  • summary changed from Broken Pipe errors when MySQL connection times out to Stale database connections should be handled gracefully.

Furthermore, catching errors + reopening connection + retrying query is dangerous and begs for dirty data. We should detect a stale connection before attempting a query.

11/08/05 13:24:20 changed by ssmoot@gmail.com

The timestamp is not sufficient.

Say my timeout is 10 minutes. If I restart the database, I've got a bunch of dead connections now, not 10 minutes from now. Now you have to track not only when the last database call was, but filter on success so that I can wait 10 minutes and eventually my dead connections will timeout and renew.

That seems sloppy. Why renew connections when you don't have to? Why force a timeout when we already know the connection is dead and it's never going to come back?

11/08/05 19:57:58 changed by jinslee@gmail.com

As ssmoot pointed out earlier, using SQL Server exhibits this same behavior. To recreate the problem, simply fire up Apache + SQL Server, then restart SQL Server. Rails will throw a DBI::DatabaseError with description "Connection failure".

11/10/05 02:10:36 changed by ssmoot@gmail.com

Yeah, uhmm... I give up. :-)

11/10/05 17:27:05 changed by mschoen

I agree w/ Jeremy that automatically retrying an individual statement is dangerous, as it may result in a transaction being partially done.

What about the following approach...

1. When executing a statement against the db connection, rescue failures that indicate a connection failure (adapter specific) by:

  1. trying to reconnect
  2. setting the connection as dead if 1) fails
  3. logging that the statement failed, and indicate whether the reconnect attempt pass/failed
  4. re-raise the error

2. When executing a statement, if the db connection is marked as dead

  1. try to reconnect
  2. leave the connection as dead if 1) fails
  3. execute the statement if a) passes

This works better than a simple timeout, and ensures that at most only a single transaction will fail per scgi/fcgi if the target database has been restarted (and is available again).

11/11/05 04:58:30 changed by anonymous

  • cc set to schoenm@earthlink.net.

11/13/05 08:23:10 changed by bitsweat

(In [3000]) r3032@asus: jeremy | 2005-11-12 23:16:52 -0800

Ticket 428 - stale connections r3040@asus: jeremy | 2005-11-13 00:22:29 -0800 When AbstractAdapter#log rescues an exception, attempt to detect and reconnect to an inactive database connection. Connection adapter must respond to the active? and reconnect! instance methods. Initial support for PostgreSQL. References #428.

11/13/05 10:03:07 changed by bitsweat

(In [3001]) r3042@asus: jeremy | 2005-11-13 01:51:08 -0800

MySQL active? and reconnect! References #428. r3043@asus: jeremy | 2005-11-13 01:58:28 -0800 SQLite active? and reconnect! References #428. r3044@asus: jeremy | 2005-11-13 02:02:27 -0800 Update CHANGELOG with admonishment regarding avoidance of the log method.

11/13/05 10:18:01 changed by bitsweat

  • cc changed from schoenm@earthlink.net to schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net.

Hey all, what do you think of this approach: when the driver raises an exception, rather than trying to parse the error message for a stale connection, just ask the connection adapter whether it's active. If not, reconnect. Raise the original exception in either case.

I'm unfamiliar with Oracle, SQLServer, and DB2, so please dig in, implement active? and reconnect!, and attach your patches.

We also need to build up some test cases for this behavior, if anyone would like to take the first crack at it.

11/13/05 21:32:21 changed by mschoen

I like the approach, and I'll submit a patch for Oracle support of active? and reconnect! this evening.

I'm still a little hung up on the retrying thing though. The most typical situation I'll face is a db restart, which will invalidate all 40 db connections (10 scgi instances on each of 4 production servers). With this patch, the app will recover, but only after serving 40 failures.

My app will always do a harmless select first in the request pipeline (to get the session from the ActiveRecordStore), so in my case it's always safe to reconnect-and-retry if the db connection fails on the first attempt.

Any thoughts on how this might be generalized?

11/14/05 00:59:30 changed by bitsweat

  • cc changed from schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net to schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de.

This is true. We could avoid the failures by doing the active? test before yielding to the block, but I was concerned with its expense (since it'd have to be called before every query rather than only on errors.) I would side with zero-failures/check-first, but others may disagree (particularly if it's a big slowdown.)

11/14/05 08:34:44 changed by mschoen

Hmm. Yeah, I'd be concerned that checking before every query would have a measurable impact on performance.

I'm attaching a patch that implements #active? and #reconnect! for Oracle. I also found that you were missing a multiplier of 100 to get the success rate calculated correctly in abstract_adapter.rb.

The patch also provides an implementation of auto-retry, though that functionality is turned off by default.

11/14/05 08:35:10 changed by mschoen

  • attachment oci_reconnect.patch added.

oracle implementation

11/14/05 09:43:38 changed by bitsweat

  • owner changed from David to Jeremy Kemper <rails@bitsweat.net>.
  • status changed from reopened to new.

11/14/05 09:58:20 changed by bitsweat

(In [3024]) Fix reconnect success rate to report as percentage. References #428.

11/14/05 10:01:12 changed by bitsweat

(In [3025]) Oracle: active? and reconnect! methods for handling stale connections. Optionally retry queries after reconnect. References #428.

11/14/05 14:57:27 changed by anonymous

  • version changed from 0.14.2 to 0.12.1.

I have read this ticket but am a bit unsure as what i need to do to get the resolution. I am running Apache2 on Windows XP with FastCGI, MySQL, and rails .12.1. I also get the stale connection from MySQL after a time of inactivity. I have set the wait_timeout on MySQL to help with the issue but would like a more permanent solution. Can someone outline what is needed to resolve this issue.

11/14/05 15:18:23 changed by ssmoot@gmail.com

  • cc changed from schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de to schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com.

I think that's a great solution Jeremy!

Wouldn't trying to check for active? before a query was issued result in doubling the number of queries issued though? I think that expense is just too great to justify, so I like your first approach much better.

11/14/05 18:18:36 changed by mschoen

Had another thought on this. What if we automatically do a retry under the following conditions:

1. We're not in the middle of a db transaction.

2. The statement being executed is a SELECT (ie., sql.upcase.starts_with? "SELECT")

Thoughts?

11/14/05 18:31:14 changed by ssmoot@gmail.com

If it can reconnect, and you're not in the middle of a transaction, why would you care if it's an UPDATE, DELETE or INSERT?

(I'm not trying to say that isn't important btw, I'm just not sure I follow your thinking since I'm having a hard time understanding why it would be?)

11/14/05 18:46:10 changed by mschoen

Because there are still cases where even when you're not in a transaction, re-trying a DML statement can be dangerous. Imagine that you're running an update of the form: update account set balance = balance + 100 where id = 42 and you're not in a transaction (so autocommit is turned on). What if the db fails right after the commit, but before you can get the ACK -- so that you see is a failure, even though the transaction succeeded. Reconnecting and retrying that statement (and depositing another $100 in the account) would be bad.

You can construct similar situations with DELETE and INSERT statements. Only SELECTs seem guaranteed to be safe to re-run (again, so long as you're not in the middle of a db transaction).

11/14/05 18:58:37 changed by bitsweat

Because the query may have altered the database then lost the connection. We can't tell whether the query succeeded or failed, only that the connection is dead. Retrying is gambling with our data.

INSERT INTO children (parent_id, name, gender) VALUES (1, 'Bob', 'M');
# inserts into children
# connection dies
# oops, connection died -- let's retry:
INSERT INTO children (parent_id, name, gender) VALUES (1, 'Bob', 'M');
# now we have twins named Bob:
SELECT * FROM children;
1 Bob M
2 Bob M

Now, the odds are low that we'll mangle our data with repeated SELECT queries outside of a transaction, as Michael suggests, so this is a good avenue to pursue, though we'll need to be able to disable the behavior for stored proc users.

Regarding whether to check active? before every query or only on error: it could double the number of queries if active? is implemented as a query. Michael implemented it for Oracle as the result of the last successful query, which means calling active? is cheap. However: using the result of the last query won't work if there is a long time between queries, since the connection may have died even though the last query succeeded.

to anonymous above, running Rails 0.12.1: the quick solution is to upgrade through 0.13.1 to 1.0. If you want to stay on 0.12.1 it'd take some effort to backport these changes; it's probably simpler to upgrade your app.

11/14/05 19:21:04 changed by ssmoot@gmail.com

I see...

Except, that if you only retried on a ConnectionFailure, then you wouldn't have that problem. Because wether you got the ACK or not, the exception wouldn't be a ConnectionFailure, it would be something else. In the case of the raw-sockets drivers, I'd just imagine they would never return from the sock.gets, so it would look like your FCGI process had just hung, but that's a different issue.

So I suppose I could see the scenario you laid out happening if you're not modifying the exception behaviour, but if we make the assumption that we only try reconnecting on ConnectionFailure errors, then Adapters that have decent exceptions will work without the nasty consequences you mentioned (I think), and ones that don't just won't retry since we can't determine whether it's safe to do so or not.

That logic relies on not getting the exact same error for a socket-write failure as a failed socket-read though, which might be flawed I suppose.

11/14/05 19:37:50 changed by mschoen

For Oracle I'm not sure that your assumption is correct. One of the several ways in which a connection can "die" is for your session to be killed.

If I'm running a statement, and my session gets killed at just the right time, it's possible that I get an "ORA-00028: your session has been killed" even though the statement was committed.

I'm not sure, and I don't know of a good way to test that either.

All in all, I'm a fan of the approach of retrying if we're not in a transaction and the statement is a SELECT, w/ the option of turning all retrying off for folks that use stored procedures (for whom even a SELECT can be dangerous to retry).

11/14/05 19:41:23 changed by ssmoot@gmail.com

I see. Well, bad assumption on my part then, thanks for setting me straight. I'd definitely agree with your solution then. A Select is likely to be the first query in a Rails request anyways, so in practice it should be just as effective anyways I suppose.

11/14/05 19:56:13 changed by mschoen

Right, that was my thinking as well -- that a select is likely to be the first query in a request.

In fact, I'd be happy w/ an approach that only retried on the first query on a request. Though keeping track of that might be harder than the proposed solution.

11/14/05 20:50:06 changed by bitsweat

You're right ssmoot, if we had more information about the failure we could do smarter things about it. In many cases we don't, but I hope we can move in that direction (including improving the underlying db drivers.)

How about this approach: reconnect_if_inactive! only on error (not before each query) and don't retry on disconnect, but do a reconnect_if_inactive! at the beginning of each web request. I think that gets us 90% to where we'd like to be; we can hash out the rest for 1.1.

11/14/05 21:03:11 changed by mschoen

Sounds good. If you implement that approach we should change the implementation of the OCI adapter's #active? method to use #ping on the connection -- otherwise it wouldn't do any good.

11/14/05 21:10:23 changed by bitsweat

Perhaps active? and reconnect! aren't the best API for what we'd like to do. Perhaps move reconnect_if_inactive! into the individual adapters as validate! or such? Then we could also do a COMMIT even if the connection is not dead to ensure that we don't get a leftover transaction from the last request.

11/15/05 08:52:29 changed by kajism@yahoo.com

  • attachment sqlserver_reconnect.patch added.

Implementation of active? and reconnect! for sqlserver_adapter.rb

11/15/05 09:05:49 changed by kajism@yahoo.com

  • version changed from 0.12.1 to 0.14.3.

I'm attaching a patch that implements #active? and #reconnect! for sqlserver.

11/15/05 11:32:02 changed by bitsweat

Thanks kajism@yahoo.com! Thoughts from Ryan, Tom?

11/15/05 11:35:17 changed by bitsweat

(In [3044]) SQLServer: active? and reconnect! methods for handling stale connections. References #428.

11/15/05 11:48:55 changed by tomafro

I'll give them a go . . .

11/15/05 13:02:43 changed by tomafro

kajism@yahoo.com's SQL Server implementation works fine in my testing. I've got three minor modifications to add, which I'll add as a patch in a second:

First, changed active? to return false if an InterfaceError occurs, as well as a DatabaseError.

Second, changed reconnect! so it doesn't fall over if disconnect fails. Disconnect will fail if called twice, so if the first reconnection fails, all subsequent reconnections would have also failed.

Finally, returning false from reconnect! if the reconnection fails.

I think these methods make a useful addition to the API, regardless of how stale connections are handled. I think the ideal way to check/handle these connections probably differs from application to application. It might be an idea to see how other db frameworks in other languages handle these issues (db timeouts being pretty much universal!).

I know that hibernate, for example, relies on the concept of sessions of work (in a web app usually spanning a single request). Timeout problems are therefore avoided, as when each session starts, the connection can be checked for staleness (or more likely, retrieved from a pool). Maybe there's some way we could do something similar - check the connection on the first call to ActiveRecord::Base.connection in each request? I'm not sure how easy that would be, but it's food for thought. I think any automatic retrying/testing on each query/rescuing failed queries is going to be very difficult to get right. I'll think about it some more.

11/15/05 13:03:41 changed by tomafro

  • attachment sqlserver_active_and_reconnect_enhancements.diff added.

11/15/05 13:40:49 changed by rtomayko@gmail.com

SQLServer patches look fine to me.

Just out of curiosity, will there be some way of turning this retry logic off? "Retry" and "database" are two words that frighten me dearly when used in close proximity. I've played these games before.

After reading back through the ticket thread, parts of the proposal seem fine but some of this (retrying during activity) I would prefer to disable and handle at the application level. Thanks.

11/15/05 14:25:53 changed by tomafro

To try and clarify what I wrote earlier, using the active?/reconnect! API I can add the following to ApplicationController to gracefully recover lost connections in my application. However, this isn't a universal solution, as it assumes the application only uses one connection (ie subclasses don't use alternative connections).

prepend_before_filter :ensure_db_connection_active
  
def ensure_db_connection_active
  unless ActiveRecord::Base.connection.active?
    unless ActiveRecord::Base.connection.reconnect!
      raise "Database not available"
    end
  end
end

11/15/05 15:21:11 changed by bitsweat

I think that's the best way too, Tom.

11/15/05 15:58:39 changed by bitsweat

(In [3045]) SQLServer: active? and reconnect! methods for handling stale connections. References #428.

11/15/05 16:06:10 changed by tomafro

Sorry to keep spamming this bug, but I've had an idea for a more universal solution, providing this failover for everyone whilst adding the minimum possble extra db calls (amount depends on the implementation of active?):

In ActiveRecord, allow connections to be marked as potentially stale (when we can't be sure when they'll next be used)

When ActiveRecord.connection is called, if the connection is potentially stale, do the unless active? || reconnect! dance and remove it from the stale connections.

Finally, in ActionController, when the request has completely finished, mark all non-stale connections stale.

This would give all rails users more stable db connections, at the cost of one extra call to active? per connection used in the request (so no extra cost if no database work is required, unlike my other suggestion). I won't have time to create a patch for this before the end of the week, but I'd be interested to hear everyone's thoughts. Maybe it's overkill?

11/15/05 16:29:30 changed by bitsweat

This is a connection pool; I think it's a good way to go. Otherwise we'd be putting portions of its implementation in ActionController::Base and the connection adapters, so we're better off factoring it out. There's an open ticket with a partial implementation; see #2162.

11/15/05 23:40:55 changed by anonymous

edgerails mysql is currently broken (fails after 10mins or so), I'm assuming it's related to this.

11/16/05 08:20:09 changed by bitsweat

  • cc changed from schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com to schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com.

That's right, anonymous. Retry behavior is changing on trunk. Check out from http://dev.rubyonrails.org/svn/rails/branches/stable for the 1.0 code.

Adding Firebird maintainer.

11/16/05 17:19:06 changed by ssmoot@gmail.com

I think it's great everyone is contributing to getting this done.

I'd like to suggest that this is considered a temporary solution though. It seems that it's breaking encapsulation pretty badly to have a before filter in the Controller for example.

Seeing ActiveRecord migrate into a true DataMapper would be preferable to me.

In this case, I think it's preferable to go with _schoenm_'s solution and have retries happen for SELECT's, for Adapters that support them. I think catching Exceptions for exceptional situations is preferrable to trying to implement an _active?_ method that depends on side-effects, and isn't actually deterministic on it's own. That breaks the whole Ruby Principle of Least Surprise thing.

I couldn't say how or if Hibernate has some sort of scavenging mechanism for it's connection pools, but perhaps that's a good place to look in future versions.

Again, I for one really appreciate everyone's hard work as it's going to make my life easier at work. :-) Just thought I'd give my opinion on the direction of AR and this fix in particular.

11/16/05 18:12:38 changed by tomafro

Having thought about this some more (and the more you think about it, the more of a headache it seems), I think the first problem is defining exactly what our expectations are.

For me, I don't really expect ActiveRecord to recover if a fresh connection breaks during a request. This should be an extremely exceptional occurrence, and while I would hope the connection is reset before any subsequent request, if the request doesn't complete I will understand. So for me retrying on error is kind of irrelevant (and probably fraught with problems I don't want to think too much about).

What I do expect though, is that for each request, the connection should be fresh. It shouldn't never be be broken before I even get to use it.

How to actually ensure this though is tricky. I think my before_filter suggestion is a good workaround, but as others have said, it doesn't smell right (encapsulation, unwanted dependencies, etc). That said, it's simple and seems to work.

So that leaves working towards a connection pool (which I'm really warming to). I've looked at the pool in #2162 and it looks like a great start. The problem it's trying to solve though is slightly different - that connections actually run out when request threads aren't reused. It should be easy to enhance to keep connections fresh as well as limit the number used (and maybe not reap/recreate them as frequently). It can also use the management api that's evolved here. Hopefully I'll have time tomorrow to look into it. Any thoughts?

11/16/05 19:00:07 changed by ssmoot@gmail.com

I agree that trying to recover the current request isn't the biggest priority (though it would be nice).

The problem is, an O/R Mapper shouldn't have a notion of a Request. It's just all Statements to it, and that's how it should be.

Any design that results from wether a request can or should be salvaged is very flawed IMO.

Working code is great of course, so I'm not trying to tear down your solution Tom. Just saying that solving this problem with ActiveRecord should be completely divorced of any thought about Rails IMO.

I'm planning to use AR in my Processing System for example. An always-up Service. This same issue with ConnectionFailures would cause the same fatal problems in that scenario.

So going with the retry on SELECT when not in a Transaction is best I think, but long-term, say Rails v1.5, I'd rather see a QueryGenerator, ConnectionPool, ExecutionEngine, Configuration, Dialects, etc. All those nasty things DHH parodies Java for basically. ;-)

This connection failure issue for example could and should be a simple fix with a little SoC (seperation of concerns).

11/16/05 23:51:10 changed by davelee

I get similar problems with Sqlserver through iODBC / Freetds, except the error given (a SQL error) is INVALID CURSOR STATE. It's unrecoverable, and the only way around it to date is to set up a regular cron to restart lighty.

11/17/05 00:06:20 changed by bitsweat

davelee, you get this error on trunk after these updates? or on a previous Rails release? or what?

11/17/05 18:19:06 changed by dreamer3@gmail.com

Running latest svn... I get the following in an app with very low usage (ie, subject to db timeouts):

Processing OverviewController#daily (for 63.18.140.147 at 2005-11-17 13:14:41) [ GET]

Parameters: {"month"=>"11", "action"=>"daily", "day"=>"17", "controller"=>"ove rview", "year"=>"2005"}

MySQL automatically reconnected. Success rate: 100%

ActiveRecord::StatementInvalid (Mysql::Error: MySQL server has gone away: SELECT * FROM activities WHERE (scheduled_on <= '2005-11-17' and '2005-11-17' <= compl ete_on ) AND ( (activities.type = 'Lodging' ) ) ):

I assume the "reconnected" line has to do with all this new code, but it's obviously not having the desired effect here. Running svn from a day or two again... I'll up again and keep you all posted.

11/17/05 20:29:55 changed by bitsweat

Dreamer3, the database reconnects but does not retry the query. Do subsequent requests succeed?

11/18/05 08:27:57 changed by anonymous

  • cc changed from schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com to schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com.

11/18/05 18:36:31 changed by dreamer3@gmail.com

Bitsweat, subsequent requests fail, same error.

I just svnuped to 3087. ./script/process/reaper.... my app was reloaded... first connection FAILS... (same error)... second connection succeeds.

Anything I should try if it's dead again tomorrow to help test?

11/19/05 08:19:13 changed by bitsweat

Dreamer3, by 'request' I mean a page load -- a full HTTP request for your web page.

When a request fails because the database connection has died, an exception will be raised such as "MySQL server has gone away" (or similar) *and* the database connection will be reestablished, on the ready for the next request.

So, when you get a connection failure on one request, does the next request succeed?

11/19/05 10:55:16 changed by bitsweat

(In [3096]) r3181@asus: jeremy | 2005-11-19 02:52:24 -0800

Mark connections for verification. Retrieve connection verifies before returning a connection. Verification tests whether the connection is marked then reconnects if the connection is inactive. All active connections are marked for verification after a request is handled. References #428.

11/19/05 19:27:21 changed by dreamer3@gmail.com

Start server...

24 horus later...

[first request] Processing OverviewController#daily (for 63.18.155.45 at 2005-11-19 14:11:57) [GET]

Parameters: {"month"=>"11", "action"=>"daily", "day"=>"19", "controller"=>"overview", "year"=>"2005"}

MySQL automatically reconnected. Success rate: 100%

ActiveRecord::StatementInvalid (Mysql::Error: MySQL server has gone away: SELECT * FROM activities WHERE (scheduled_on <= '2005-11-19' and '2005-11-19' <= complete_on ) AND ( (activities.type = 'Lodging' ) ) ):

[second request - hit refresh / watching log] Processing OverviewController#daily (for 63.18.155.45 at 2005-11-19 14:14:28) [G ET]

Parameters: {"month"=>"11", "action"=>"daily", "day"=>"19", "controller"=>"ove rview", "year"=>"2005"}

MySQL automatically reconnected. Success rate: 100%

ActiveRecord::StatementInvalid (Mysql::Error: MySQL server has gone away: SELECT * FROM activities WHERE (scheduled_on <= '2005-11-19' and '2005-11-19' <= compl ete_on ) AND ( (activities.type = 'Lodging' ) ) ):

[additional requests] Hitting refresh and making subsequent requests resutls in exactly the same as above with only a changed timestamp.

Addl info:

MySQL 4.1.11a-4sarge (Debian sarge) Rails (r3087) mysql gem (2.7)


I just updated to Rails r3097 and ./script/process/reaper.

[first request] Processing OverviewController#daily (for 63.18.155.45 at 2005-11-19 14:21:45) [GET]

Parameters: {"month"=>"11", "action"=>"daily", "day"=>"19", "controller"=>"overview", "year"=>"2005"}

MySQL automatically reconnected. Success rate: 100%

ActiveRecord::StatementInvalid (Mysql::Error: MySQL server has gone away: SELECT * FROM activities WHERE (scheduled_on <= '2005-11-19' and '2005-11-19' <= complete_on ) AND ( (activities.type = 'Lodging' ) ) ):

[second request] Processing OverviewController#daily (for 63.18.155.45 at 2005-11-19 14:22:36) [GET]

Parameters: {"month"=>"11", "action"=>"daily", "day"=>"19", "controller"=>"overview", "year"=>"2005"}

Rendering within layouts/application Rendering overview/daily Transferred: 3834 Completed in 3.27387 (0 reqs/sec) | Rendering: 0.53283 (16%) | DB: 1.51542 (46%) | 200 OK http://reservations.marengocave.com/overview/2005/11/19


This is the same behavior exhibited last time... kicking (requests) it over and over has no effect... it won't really reconnect to Mysql... but somehow prodding it with reaper... although it doesn't immediately fix the issue (the first failure after reaping) it makes the reconnect code start working properly again and the second request succeeds.

I'm happy to provide any more information, this is repeatable every morning.

11/20/05 17:12:52 changed by skaes

Changeset 3096 broke performance severly, due to retrieving the DB connection on every access to AR::Base.connection This needs to be changed.

11/20/05 19:10:18 changed by dreamer3@gmail.com

One (or two) days later... Mysql has evidentaly timed out and closed the connection... multiple (5-6) requests produce the same results... Rails application error... same log as before except now the "MySQL automatically reconnected." does not seem to be shown anymore.

I ran script reaper... first request failed... second and subsequent succeed... but only after reaping the process...

This is with the aforementioned revision r3097... I've just updated to r3112 and will keep posting updates.

11/20/05 20:27:51 changed by bitsweat

Stefan, the change was intentional. We'll reinstate the caching when it's clear that this technique works. Thanks for keeping an eye out for performance.

11/23/05 15:12:39 changed by davelee

  • cc changed from schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com to schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com,david@davelee.com.au.

11/23/05 23:51:01 changed by bitsweat

References #2978. Be sure to reconfigure the connection appropriately after reconnect.

How is the new logic working for you all? Connections are marked as dirty at the end of each request. Dirty connections are checked for activity and reconnected before handing them to the app.

11/23/05 23:52:32 changed by bitsweat

Also, we're missing DB2 coverage. Any DB2 folks in the house? I suppose we can give it a dummy active? and reconnect! as with SQLite in the meantime.

Please let me know if there are any outstanding issues; I'd like to push the work we've done here to the stable branch so we can get 1.0 out the door. Thanks!

11/24/05 01:59:10 changed by anonymous

Running at revision 3112...

Processing OverviewController#daily (for 63.18.137.222 at 2005-11-23 20:52:21) [GET]

Parameters: {"month"=>"11", "action"=>"daily", "day"=>"23", "controller"=>"overview", "year"=>"2005"}

ActiveRecord::StatementInvalid (Mysql::Error: MySQL server has gone away: SELECT * FROM activities WHERE (scheduled_on <= '2005-11-23' and '2005-11-23' <= complete_on ) AND ( (activities.type = 'Lodging' ) ) ):

Over and over and over, each request... still broken in 3112.

Reaper... first request, error, second request works.

I just svn uped to 3180.

11/24/05 06:10:43 changed by bitsweat

  • owner changed from Jeremy Kemper <rails@bitsweat.net> to bitsweat.

Ok, I'll see what I can do about MySQL. I don't use it in my apps, so if anyone else would like to lend a hand sewing it up, I'd be much obliged.

Is there any reason not to have MySQL connect/close per-request, since these operations are so fast? That'd eliminate all these contortions over clean, persistent connections because they'd be unnecessary. Food for thought.

11/24/05 08:34:12 changed by mschoen

This approach works for me. I've attached a patch for the OCI adapter to make this work for Oracle -- it now does a real ping of the db, rather than just checking the last known state. With this patch, it all looks good.

The only thing I don't like is that in [3096] you dropped the logging of the reconnects and the tracking of success/failure. I find that really useful, esp. when tracking down db issues. Why drop it?

11/24/05 08:34:43 changed by mschoen

  • attachment oci_ping.patch added.

makes Oracle work w/ latest approach

11/24/05 08:38:09 changed by skaes

There is a very good reason for not creating a new connection for every request: performance. Establishing a new connection is expensive for the DB.

I wonder how you could even consider that option. Every halfway decent application server implements conection pooling. Do you think they all do that just for fun?

That said, I don't like the idea of retrying a failed request, at all. It adds extra logic to the database adapters which is very error prone, proved by the fact that it still doesn't work right after several implementation attempts, and is very hard to test too.

Retrying a failed request is likely to create havoc in the DB, unless the request is a select without any enclosing transaction. But how can you detect it's not enclosed in commit brackets? You would need to parse every sql statement to check whether it starts a transaction (method execute). But that's not enough, because transactions can be nested, so you need to count transaction brackets properly. Better not make any mistake there. But, even counting is not sufficient, because someone could retrieve the native DB handle and issue DB statements directly, invalidating you count, with possibly desastrous consequences. I think this efectively rules out retrying anything.

I think you should take a step back and reconsider which problem you are trying to solve.

As far as I know, connections get dropped by the DB only when there's a timeout or the db crashed? At least I haven't seen any other causes for MySQL.

The timeout problem can be easily solved in at least 3 different ways:

* by having a DB ping thread run inside the dispatcher.

* create a cronjob that requests an uncached page which requires a DB connect from the app.

* just let the app die. it will get restarted by the web server, fcgi runner, whatever, on the next request.

I currently use number 2, as Rails insists on running happily along with a broken database connection, which it shouldn't do. I never had a broken connection since.

Option 3 is by far the best one: zero overhead. Simple to implement. No change in the DB adapaters required. We only should Rails allow to die if the connection gets closed by the DB.

It also has the best scaling properties: no extra work for the DB. no extra work inside the adapaters. no controller filter hacks. no nothin.

If you run a single dispatcher, then no one as accessed your site for the DB timeout interval (when you use the DB to store sessions) or has only looked at cached pages. Fine, let's restart the app.

If you have a setup running more than one dispatcher, then one dispatcher will die and get restarted. While it does that, the remaining dispatchers will handle incoming requests. So what?

If you dread the error page dispayed to the user, just make it a friendly text, telling the user to retry in a minute. (This would be desastrous for a high traffic site, but these don't get DB timeouts).

11/24/05 08:46:01 changed by mschoen

The current implementation does not retry a failed request.

11/24/05 19:41:15 changed by bitsweat

Stefan, I think you need to take a closer look at the problem and the solution we have come up with. It has nothing to do with retrying failed queries and it is simpler than burdening the application developer with crashing fcgi processes. Having to respawn new fcgi processes is definitely worse than simply reconnecting to a dead connection before handing it to the app.

Database connection overhead is a myth. Connections are slow for Oracle, sure, but share-nothing webapps have a history of scaling better when not using persistent connections because it gives better resource utilization -- you only have as many database connections as you have actively processing web requests, rather than a database connection per application instance.

In any case, it was just food for though. As you say, to take a step back and really look at what the problem is. If connecting to MySQL is a matter of milliseconds and eliminates a lot of complexity, I'm happy to pursue it for the 95% of users who would strongly prefer it.

Anyway, please look at the code. It is sensible, straightforward, and is not dangerous.

11/24/05 19:46:10 changed by bitsweat

"The only thing I don't like is that in [3096] you dropped the logging of the reconnects and the tracking of success/failure. I find that really useful, esp. when tracking down db issues. Why drop it?"

I will add it again when we have settled on a working solution. I don't want our choices to be artificially limited by a snippet of logging code :)

One remaining detail is that we need to cache connections per-class because Base.retrieve_connection is expensive (it loops through its superclasses looking for an active connection). The I can get rid of the dirty_connection methods and simply wipe the connection cache.

11/24/05 19:50:14 changed by bitsweat

(In [3189]) Oracle: active? check pings the database rather than testing the last command status. References #428.

11/24/05 19:54:30 changed by skaes

I will take a look at the code, as soon as I can spend the time.

A matter of milliseconds: if you consider this data

page request                          total  stddev%     r/s    ms/r
/empty/index                        1.28028   0.8644  781.08    1.28
/welcome/index                      1.45221   0.3106  688.60    1.45
/rezept/index                       1.54641   0.4578  646.66    1.55
/rezept/myknzlpzl                   1.53282   0.1405  652.39    1.53
/rezept/show/713                    3.63583   0.1372  275.04    3.64
/rezept/cat/Hauptspeise             4.47639   0.1260  223.39    4.48
/rezept/cat/Hauptspeise?page=5      4.69200   0.1452  213.13    4.69
/rezept/letter/G                    4.64599   0.0938  215.24    4.65

then you can see that a few milliseconds could easily bring down performance by an order of magnitude. This doesn't seem acceptable to me.

You quote that slow connection setup is a myth. Please backup with substantial data.

11/24/05 20:31:44 changed by bitsweat

Indeed, you are in the < 5% of users where it would make a large difference. Hopefully, you are highly motivated to find the best solution. I'll be away this weekend (holiday) and would love to see MySQL working smoothly when I return ;-)

11/25/05 06:29:59 changed by Ken Kunz <kennethkunz@gmail.com>

  • attachment fb_reconnect_r3190.diff added.

Firebird implementation (based on rev. 3190)

11/25/05 06:33:36 changed by Ken Kunz <kennethkunz@gmail.com>

I've attached a patch that implements #active? and #reconnect! for Firebird.

11/25/05 13:17:45 changed by skaes

Just to tell you how much performance has already been lost on trunk:

perf data file 1: 11-25.all.r141
 requests=1000, options=-bm=all -mysql_session -fast_routes -fast_readers -lib=r141

perf data file 2: 11-25.all.trunk.r3190
 requests=1000, options=-bm=all -mysql_session -fast_routes -fast_readers -lib=trunk

page                               c1 real   c2 real    r/s    r/s   ms/r   ms/r  c1/c2
/empty/index                       1.25908   1.30168  794.2  768.2   1.26   1.30   0.97
/welcome/index                     1.42033   1.45796  704.1  685.9   1.42   1.46   0.97
/rezept/index                      1.50141   1.54753  666.0  646.2   1.50   1.55   0.97
/rezept/myknzlpzl                  1.49341   1.53782  669.6  650.3   1.49   1.54   0.97
/rezept/show/713                   3.56894   4.42182  280.2  226.2   3.57   4.42   0.81
/rezept/cat/Hauptspeise            4.35380   5.23607  229.7  191.0   4.35   5.24   0.83
/rezept/cat/Hauptspeise?page=5     4.56013   5.33775  219.3  187.3   4.56   5.34   0.85
/rezept/letter/G                   4.49647   5.20815  222.4  192.0   4.50   5.21   0.86 

11/27/05 21:16:11 changed by assaf@labnotes.org

I was just reviewing the code and noticed the problem with reconnecting during transactions.

Unfortunately, reconnecting during a transaction will break any assumption a developer could make on transactional behavior. For example, you can use SELECT to lock records that you intend to update later on, or to prevent another transaction from reading their values until the update completes. If the adapter reconnects on a SELECT, you lose the locks and with it the transaction isolation. It may affect your transaction, or it may affect another transaction running in parallel. There are also some common cases relying on a certain order of SELECTs to avoid dead-locks.

I ran into that problem before with stale connections being closed after a certain period of time. The application would experience heavy load at certain times of the day, say early morning and late afternoon. In the morning it will acquire a large number of connections, but would only use some of these connections throughout the day. By afternoon, when it needs all these connections again, some would already be closed by the database server.

Bombing the server with fake requests during the day is not my ideal solution for solving this. Instead we used a simple ping-on-stale mechanism. Whenever the application requested a new connection (e.g. at the start of an HTTP request), the pool would look at the time lapse since the connection was last used, if that was longer than a few minutes, it would attempt a no-op statement (e.g. SHOW TABLES) to determine if the connection is still open.

11/27/05 22:33:54 changed by bitsweat

Assaf, dead connections raise an exception which rolls back any active transaction. On the next request, the connection will be tested for activity and reconnected if inactive. There is no consistency problem with this scheme since neither queries nor transactions are retried.

11/27/05 22:59:03 changed by assaf@labnotes.org

I'm looking at the code for ActiveRecord 1.13.0, mysql_adapter.rb lines 151-171. If the execute method fails on a query, it will attempt to reconnect and execute the statement again.

Let's say I'm processing a request, the connection is active, so I'm able to start the transaction and execute the first statement. At this point the connection breaks. When my code gets to execute the next statement, the execute method detects the connection is broken and attempts to reconnect. If successful, the first part of the transaction would abort, the second part would execute in auto commit mode.

I'm not seeing any code that prevents the reconnect from happening inside a transaction, or forcing the entire transaction to abort by killing the request.

11/27/05 23:13:00 changed by bitsweat

Ah, you're reviewing old code. Do a svn checkout to get up to speed. Your concerns are well-founded and have been covered in the extensive commentary above.

11/27/05 23:35:16 changed by assaf@labnotes.org

Thanks. Looking at the SVN trunk, the reconnect code is indeed no longer there. Sorry for being slow on the uptake, I've noticed the problem a few weeks ago, and only now had the time to report it.

Is there any plans for adding connection pools (2162)? That could solve the idle connection death problem we're seeing with MySQL; it's also a problem I've seen before with SQL Server.

11/28/05 10:01:37 changed by tomafro

I've been working on building a connection pool for #2162. I should have a patch available by the end of today, though it will need a fair amount of external review and testing before any final release.

11/28/05 21:00:05 changed by bitsweat

(In [3194]) MySQL: active? compatibility with the pure-Ruby driver. References #428.

11/28/05 21:46:37 changed by bitsweat

(In [3195]) PostgreSQL: active? compatibility with the pure-Ruby driver. Still need to move connect into the adapter since postgres-pr doesn't support #reset. References #428.

12/01/05 10:26:41 changed by dreamer3@gmail.com

This still isn't fixed at all for me...

Processing OverviewController#daily (for 63.18.216.108 at 2005-12-01 05:22:54) [GET]

Parameters: {"month"=>"12", "action"=>"daily", "day"=>"1", "controller"=>"overview", "year"=>"2005"}

ActiveRecord::StatementInvalid (Mysql::Error: MySQL server has gone away: SELECT * FROM activities WHERE (scheduled_on <= '2005-12-01' and '2005-12-01' <= complete_on ) AND ( (activities.type = 'Lodging' ) ) ):

/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:78:in `log' /vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:185:in `execute' /vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:331:in `select' /vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:176:in `select_all' /vendor/rails/activerecord/lib/active_record/base.rb:445:in `find_by_sql' /vendor/rails/activerecord/lib/active_record/base.rb:409:in `find'

over and over... once with each request... the database is never reconnected to...

After reaping the first request fails with the same error... the second and later requests go thru...

Running Rails trunk #3196... should I keep svnuping? Anything else I should be trying?

12/01/05 17:32:55 changed by paul@vudmaska.com

  • priority changed from highest to high.
  • severity changed from blocker to major.
  • summary changed from Stale database connections should be handled gracefully to Mysql has gone away..

I'm running from the SVN head of rails r3200 and getting the error below.

Mysql::Error: MySQL server has gone away: SELECT * FROM domains WHERE (domains.host = 'somehost.com' ) LIMIT 1):

/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:78:in `log' /vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:185:in `execute' /vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:331:in `select' /vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:176:in `select_all' /vendor/rails/activerecord/lib/active_record/base.rb:445:in `find_by_sql'

I'm using lighty.

Locally I can not get this to happen - it is only happening on textdrive.com.

When I restart, all is good for awhile - but when I let the app sit - say 20 min or so, when I go back I get the above. If I'm doing things continously things keep working.

If more info is needed to help out please shout.

12/01/05 18:20:53 changed by joevandyk@gmail.com

I'm also having this problem using edge rails on TxD with mysql.

12/02/05 00:03:37 changed by bitsweat

  • priority changed from high to highest.
  • severity changed from major to blocker.
  • summary changed from Mysql has gone away. to Stale database connections should be handled gracefully.

12/02/05 01:19:37 changed by bitsweat

(In [3205]) Firebird: active? and reconnect! methods for handling stale connections. References #428.

12/03/05 00:38:57 changed by me@julik.nl

  • cc changed from schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com,david@davelee.com.au to me@julik.nl schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com,david@davelee.com.au.

Same here I also get problems with this. Currently I need to call reconnect in a pre_filter

12/03/05 23:04:38 changed by bitsweat

(In [3216]) MySQL: work around ruby-mysql/mysql-ruby inconsistency with mysql.stat. Eliminate usage of mysql.ping because it doesn't guarantee reconnect. Explicitly close and reopen the connection instead. References #428.

12/04/05 05:56:54 changed by bitsweat

(In [3218]) Connection cache to speed up retrieve_connection and get rid of dirty connection marking. References #428.

12/04/05 06:15:07 changed by bitsweat

(In [3219]) Move dummy active? and reconnect! from sqlite to base adapter. References #428.

12/06/05 22:17:50 changed by anonymous

  • cc changed from me@julik.nl schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com,david@davelee.com.au to me@julik.nl schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com,david@davelee.com.au, zach@berkeleydata.net.

12/07/05 06:27:50 changed by bitsweat

  • status changed from new to closed.
  • resolution set to fixed.

(In [3226]) r3363@asus: jeremy | 2005-12-06 22:26:27 -0800

Apply #428 changesets to stable: r3000, r3001, r3002, r3025, r3045, r3096, r3148, r3152, r3165, r3189, r3195, r3205, r3216, r3219, r3221, r3222. Closes #428.

12/10/05 00:58:27 changed by anonymous

  • cc changed from me@julik.nl schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com,david@davelee.com.au, zach@berkeleydata.net to me@julik.nl schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com,david@davelee.com.au, zach@berkeleydata.net f.svehla@gmail.com.

12/11/05 04:56:51 changed by dreamer3@gmail.com

  • status changed from closed to reopened.
  • resolution deleted.

Has anyone looked at #3159? Same problem, same symptoms, different error... ie, after long period of disuse the the app does not work until it's restarted... This is with the stable branch... revision 3255.... error is still in AR, but not the same...

Processing ReservationsController#show (for 65.138.145.249 at 2005-12-10 23:53:25) [GET]

Parameters: {"action"=>"show", "id"=>"135", "controller"=>"reservations"}

Redirected to http://reservations.marengocave.com/login Filter chain halted as [require_login] returned false

NoMethodError (undefined method `[]' for nil:NilClass):

/vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:322:in `connect' /vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:174:in `reconnect!' /vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/connection_specification.rb:103:in `retrieve_connection' /vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/connection_specification.rb:20:in `connection' /vendor/rails/actionpack/lib/action_controller/benchmarking.rb:85:in `active_record_runtime' /vendor/rails/actionpack/lib/action_controller/benchmarking.rb:72:in `perform_action_without_rescue' /vendor/rails/actionpack/lib/action_controller/rescue.rb:82:in `perform_action' /vendor/rails/actionpack/lib/action_controller/base.rb:369:in `send' /vendor/rails/actionpack/lib/action_controller/base.rb:369:in `process_without_session_management_support' /vendor/rails/actionpack/lib/action_controller/session_management.rb:116:in `process' /vendor/rails/railties/lib/dispatcher.rb:38:in `dispatch' /vendor/rails/railties/lib/fcgi_handler.rb:141:in `process_request' /vendor/rails/railties/lib/fcgi_handler.rb:53:in `process!' /vendor/rails/railties/lib/fcgi_handler.rb:52:in `each_cgi' /usr/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in `each' /usr/local/lib/ruby/gems/1.8/gems/fcgi-0.8.6.1/./fcgi.rb:597:in `each_cgi' /vendor/rails/railties/lib/fcgi_handler.rb:52:in `process!' /vendor/rails/railties/lib/fcgi_handler.rb:22:in `process!' /public/dispatch.fcgi:24

./script/process/reaper

Error (same as above) after first request after restart.... second request and subsequent work fine. I hope this is something different, but feels the same here.

12/11/05 05:04:21 changed by dreamer3@gmail.com

I just restart my app (see above)... and all was working fine... then I restarting MySQL (as a test)... and now it's broken with the same error as above... same exact trace...

Same solution... reaper... two refereshs and it's working on the second.

To reproduce:

1. Start app... everything is working normally. 2. Restart MySQL... 3. Try to request any page... 4. BOOM, error...

12/11/05 05:22:38 changed by dreamer3@gmail.com

  • status changed from reopened to closed.
  • resolution set to fixed.

I think #3266 or #3267 fixed this... Think it may have been the @config issue... I can complete the above steps with no error now (I assume the reconnect code is finally working flawlessly)... Closing again, since I'm the one that reopened.

12/28/05 09:00:39 changed by anonymous

  • status changed from closed to reopened.
  • resolution deleted.

Hi. Running Rails 1.0 and got this in my production.log this morning. Note the gap is more than 8 hours, which indicates that MySQL DB has closed the connection and Rails doesn't handle it.

Processing MainController#index (for 87.49.152.189 at 2005-12-27 22:20:53) [GET]

Parameters: {"action"=>"index", "aid"=>"8", "controller"=>"main"}

Rendering main/index Completed in 0.07976 (12 reqs/sec) | Rendering: 0.04270 (53%) | DB: 0.00215 (2%) | 200 OK http://www..../main?aid=8

Processing MainController#index (for 66.194.6.76 at 2005-12-28 06:30:45) [GET]

Parameters: {"action"=>"index", "controller"=>"main"}

Errno::EPIPE (Broken pipe - /var/lib/mysql/mysql.sock):

/usr/lib/ruby/1.8/mysql.rb:1042:in `flush' /usr/lib/ruby/1.8/mysql.rb:1042:in `write' /usr/lib/ruby/1.8/mysql.rb:462:in `write' /usr/lib/ruby/1.8/mysql.rb:436:in `command' /usr/lib/ruby/1.8/mysql.rb:307:in `stat' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/connection_adapters/mysql_adapter.rb:147:in `active?' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/connection_adapters/abstract/connection_specification.rb:103:in `retrieve_connection' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/connection_adapters/abstract/connection_specification.rb:20:in `connection' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:1152:in `quote_bound_value' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:1133:in `replace_bind_variables' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:1133:in `gsub' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:1133:in `replace_bind_variables' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:1122:in `sanitize_sql' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:942:in `add_conditions!' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:921:in `construct_finder_sql' /usr/lib/ruby/gems/1.8/gems/activerecord-1.13.2/lib/active_record/base.rb:395:in `find' .//app/controllers/main_controller.rb:11:in `index' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/base.rb:853:in `send' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/base.rb:853:in `perform_action_without_filters' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/filters.rb:332:in `perform_action_without_benchmark' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/benchmarking.rb:69:in `measure' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/benchmarking.rb:69:in `perform_action_without_rescue' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/rescue.rb:82:in `perform_action' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/base.rb:369:in `send' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/base.rb:369:in `process_without_session_management_support' /usr/lib/ruby/gems/1.8/gems/actionpack-1.11.2/lib/action_controller/session_management.rb:116:in `process' /usr/lib/ruby/gems/1.8/gems/rails-1.0.0/lib/dispatcher.rb:38:in `dispatch' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:23:in `process_request' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:21:in `synchronize' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/bin/scgi_service:21:in `process_request' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:291:in `read_header' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:253:in `handle_client' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `initialize' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `new' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:234:in `handle_client' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:188:in `listen' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `initialize' /usr/lib/ruby/gems/1.8/gems/scgi_rails-0.4.3/lib/scgi.rb:186:in `new'

12/28/05 21:25:05 changed by bitsweat

  • status changed from reopened to closed.
  • resolution set to fixed.

Please use the mysql.rb included with Rails. It fixes some errors, including the broken pipe exception, that haven't been included in an upstream release yet. Alternatively, switch to the mysql gem.

02/27/06 22:00:52 changed by bitsweat

See #3979 for furthur cleanup & speedup.

08/05/06 02:15:10 changed by anonymous

  • status changed from closed to reopened.
  • resolution deleted.

This one is severly messed up by spammers.

08/05/06 02:18:51 changed by anonymous

Additionally, I still get this sometimes on edge. Mysql server has gone away etc.

08/05/06 02:27:17 changed by bitsweat

  • status changed from reopened to closed.
  • severity changed from 1 to normal.
  • cc changed from Rex to me@julik.nl, schoenm@earthlink.net, rtomayko@gmail.com, tom@popdog.net, skaes@web.de, ssmoot@gmail.com, kennethkunz@gmail.com, charles.gerungan@gmail.com, david@davelee.com.au, zach@berkeleydata.net.
  • type changed from 1 to defect.
  • component changed from 1 to ActiveRecord.
  • summary changed from Rex to Stale database connections should be handled gracefully.
  • priority changed from 1 to normal.
  • version changed from 1 to 0.14.3.
  • milestone changed from 1 to 1.0.
  • keywords changed from Rex to fcgi webrick mssql postgre mysql broken pipe timeout fd.
  • resolution set to fixed.

08/07/06 09:25:23 changed by bitsweat

(In [4708]) r3022@ks: jeremy | 2005-11-12 18:40:42 -0800

merge to stable r3024@ks: jeremy | 2005-11-12 19:00:49 -0800 Apply [2992] to stable. SQLite: the clone_structure_to_test and purge_test_database Rake tasks should always use the test environment. References #2846. r3027@ks: jeremy | 2005-11-12 22:37:45 -0800 Apply [2994] to stable. PostgreSQL: correct the sequence discovery fallback query. References #2594. r3033@ks: jeremy | 2005-11-12 23:27:13 -0800 Apply [2996] to stable. Much faster Oracle column reflection. Closes #2848. r3037@ks: jeremy | 2005-11-13 00:11:26 -0800 Apply [2998] to stable. PostgreSQL: last_insert_id uses select_value rather than using @connection.exec directly. r3053@ks: jeremy | 2005-11-13 14:37:39 -0800 Only include builtin filters whose filenames match /[a-z][a-z_]*_helper.rb$/ to avoid including operating system metadata such as ._foo_helper.rb. Closes #2855. r3055@ks: jeremy | 2005-11-13 14:48:48 -0800 Apply [3009] to stable: Reapply [2942] which was elided by [2997]. References #2788. r3058@ks: jeremy | 2005-11-13 16:09:17 -0800 Apply [3013] to stable. Note that the ruby-memcache bindings are required to use the memcache store. Closes #2857. r3063@ks: jeremy | 2005-11-13 16:13:51 -0800 Apply [3015] to stable. Update documentation for render :file. Closes #2858. r3066@ks: jeremy | 2005-11-13 20:24:18 -0800 Apply [3017] to stable. Update documentation for Migrations. Closes #2861. r3070@ks: jeremy | 2005-11-13 20:54:50 -0800 Apply [3019] to stable. Oracle: test case for column default parsing. Closes #2788. r3073@ks: jeremy | 2005-11-13 23:42:32 -0800 Apply [3021] to stable. Correct documentation for Base.delete_all. Closes #1568. r3077@ks: jeremy | 2005-11-14 14:28:21 -0800 Apply [3035] to stable. SQLServer: insert uses given primary key value if not nil rather than SELECT @@IDENTITY. Closes #2866. r3120@ks: jeremy | 2005-11-16 13:34:45 -0800 Apply [3060] to stable. Remove the unused, slow response_dump and session_dump variables from error pages. Closes #1222. r3130@ks: jeremy | 2005-11-16 14:37:47 -0800 Apply [3063] to stable. Remove CHANGELOG from apidoc Rake task since it isn't included with new apps anymore. r3148@ks: jeremy | 2005-11-17 12:45:44 -0800 Apply [3074] to stable. Don't add the same conditions twice in has_one finder sql. Closes #2916. r3149@ks: jeremy | 2005-11-17 12:47:36 -0800 Apply [3075] to stable. Document :force option to create_table. Closes #2921. r3157@ks: jeremy | 2005-11-17 21:27:39 -0800 Apply [3081] to stable. Eliminate nil from newly generated logfiles. Closes #2927. r3167@ks: jeremy | 2005-11-18 22:47:31 -0800 Apply [3089] to stable. Don't generate read methods for columns whose names are not valid ruby method names. Closes #2946. r3174@ks: jeremy | 2005-11-19 01:53:00 -0800 Apply [3092] to stable. Correct boolean handling in generated reader methods. Closes #2945. r3178@ks: jeremy | 2005-11-19 01:59:47 -0800 Apply [3094] to stable. Use query methods rather than readers in boolean tests for [3092]. References #2949. r3190@ks: jeremy | 2005-11-19 20:20:48 -0800 Apply [3098] to stable. Introducing the session_migration generator. Creates an add_session_table migration. Closes #2958. r3199@ks: jeremy | 2005-11-19 21:13:22 -0800 Apply [3101] to stable. Allows generator to specify migrations directory. Closes #2960. r3201@ks: jeremy | 2005-11-19 21:31:47 -0800 Apply [3103] to stable. Document request.env and request.host. Strip trailing whitespace. r3258@ks: jeremy | 2005-11-21 00:33:59 -0800 Apply [3139] to stable. Use Kernel.binding rather than binding to allow columns of that name. Closes #2973. r3269@ks: jeremy | 2005-11-21 04:46:40 -0800 Apply [3148] to stable. Simpler Mysql load test. r3285@ks: jeremy | 2005-11-22 13:33:04 -0800 Apply [3162] to stable. Model generator: correct relative path to test_helper in unit test. r3296@ks: jeremy | 2005-11-23 13:38:56 -0800 Apply [3170] to stable. Credit ticket author. References #2888. r3303@ks: jeremy | 2005-11-23 17:14:31 -0800 Apply [3051], [3052], [3053], [3059] to stable. Brings the Firebird adapter to 1.0. References #1874. r3305@ks: jeremy | 2005-11-23 17:33:51 -0800 Apply [3151] to stable. Don't put flash in session if sessions are disabled. r3306@ks: jeremy | 2005-11-23 17:39:24 -0800 Apply [3100], [3108] to stable. Makes new ActiveRecordStore sessions work correctly with components. r3307@ks: jeremy | 2005-11-23 17:52:14 -0800 Apply [3110] to stable. PostgreSQL: the purge_test_database Rake task shouldn't explicitly specify the template0 template when creating a fresh test database. References #2964. r3313@ks: jeremy | 2005-11-23 23:03:36 -0800 Apply [3182] to stable. Reloading a model doesn't lose track of its connection. Closes #2996. r3315@ks: jeremy | 2005-11-23 23:13:48 -0800 Apply [3184] to stable. SQLServer: resolve column aliasing/quoting collision when using limit or offset in an eager find. Closes #2974. r3339@ks: jeremy | 2005-12-01 17:03:16 -0800 Apply [3202] to stable. Firebird: updated for FireRuby 0.4.0. Closes #3009. r3363@ks: jeremy | 2005-12-06 22:26:27 -0800 Apply #428 changesets to stable: r3000, r3001, r3002, r3025, r3045, r3096, r3148, r3152, r3165, r3189, r3195, r3205, r3216, r3219, r3221, r3222. Closes #428. r3372@ks: jeremy | 2005-12-07 20:25:20 -0800 Apply [3233] to stable. Oracle: use syntax compatible with Oracle 8. Closes #3131. r3375@ks: jeremy | 2005-12-07 20:36:13 -0800 Apply [3235] to stable. PostgreSQL: more robust sequence name discovery. Closes #3087. r3378@ks: jeremy | 2005-12-07 20:41:47 -0800 Apply [3237] to stable. More robust relative url root discovery for SCGI compatibility. This solves the 'SCGI routes problem' -- you no longer need to prefix all your routes with the name of the SCGI mountpoint. Closes #3070. r3379@ks: jeremy | 2005-12-07 20:42:31 -0800 Don't warn on keep_flash in test. r3387@ks: jeremy | 2005-12-07 20:48:42 -0800 Apply [3242] to stable. Reloading an instance refreshes its aggregations as well as its associations. Closes #3024. r3389@ks: jeremy | 2005-12-07 20:52:39 -0800 Apply [3244] to stable. SQLite: find database file when RAILS_ROOT is a symlink. Closes #3116. r3392@ks: jeremy | 2005-12-07 21:10:42 -0800 Apply [3246] to stable. MySQL: more robust test for nullified result hashes. Closes #3124. r3401@ks: jeremy | 2005-12-07 23:30:11 -0800 Apply [3252] to stable. Oracle: active? performs a select instead of a commit. Closes #3133. r3404@ks: jeremy | 2005-12-08 15:24:44 -0800 Apply [3254] to stable. Fix some test failures due to MySQL assumptions. References #3149. r3407@ks: jeremy | 2005-12-09 10:39:00 -0800 Apply [3256] to stable. Generator copies files in binary mode. Closes #3156. r3410@ks: jeremy | 2005-12-09 10:48:32 -0800 Apply [3258] to stable. Fix shebang handling for empty files. Closes #2927. r3413@ks: jeremy | 2005-12-09 15:03:42 -0800 Apply [3260] to stable. Fix bundled mysql.rb to correctly check for PROTO_41. Fixed scramble41 with nil password. Fixed change_user with PROTO_41. r3416@ks: jeremy | 2005-12-09 15:07:21 -0800 Apply [3262] to stable. Generator looks in vendor/generators also. r3419@ks: jeremy | 2005-12-09 15:36:50 -0800 Apply [3165] to stable. MySQL, PostgreSQL: reconnect! also reconfigures the connection. Otherwise, the connection 'loses' its settings if it times out and is reconnected. References #2978. r3420@ks: jeremy | 2005-12-09 15:38:30 -0800 Apply [3264] to stable. MySQL: ensure that @config is set. r3421@ks: jeremy | 2005-12-09 15:42:40 -0800 Apply [3265] to stable. Name vendor/generators source differently from lib/generators source. r3429@ks: jeremy | 2005-12-09 16:55:16 -0800 Apply [3270] SQLServer rollup to stable. r3435@ks: jeremy | 2005-12-10 13:40:43 -0800 Apply [3274] to stable. MySQL: fixes for the bundled mysql.rb driver. Closes #3160. r3457@ks: jeremy | 2005-12-13 08:52:39 -0800 Apply [3293] to stable. MySQL: allow encoding option for mysql.rb driver. r3458@ks: jeremy | 2005-12-13 09:33:50 -0800 Roll back [3245] on stable. References #3116. r3465@ks: jeremy | 2005-12-13 10:15:33 -0800 Apply [3298] to stable. Don't used defined? on a scoped constant since it results in a const_missing call. r3724@ks: jeremy | 2006-02-09 10:13:41 -0800 Apply [3554] to stable. PostgreSQL: correctly parse negative integer column defaults. Closes #3776. r3730@ks: jeremy | 2006-02-09 11:42:57 -0800 Apply [3559] to stable. Closes #3581. r3846@ks: jeremy | 2006-02-26 15:24:58 -0800 silence test/unit whining DefaultsTest is empty r3847@ks: jeremy | 2006-02-26 15:26:53 -0800 Apply [3674] to stable. Closes #3591. r4960@ks: jeremy | 2006-08-06 23:56:18 -0700 Merged to stable: change the request.env example in AC::Base docs to a var that exists (REMOTE_IP doesn't) and isn't already wrapped by a request method (i.e. request.remote_ip). References #5113. r4961@ks: jeremy | 2006-08-07 00:11:59 -0700 Merged to stable: add :status option to send_data and send_file. Defaults to '200 OK'. References #5243. r4962@ks: jeremy | 2006-08-07 00:18:42 -0700 Merged to stable: real files and symlinks should be treated the same when compiling templates. References #5438. r4963@ks: jeremy | 2006-08-07 00:26:42 -0700 Merged to stable: Added ActionController.filter_parameter_logging that makes it easy to remove passwords, credit card numbers, and other sensitive information from being logged when a request is handled. References #1897. r4964@ks: jeremy | 2006-08-07 00:33:32 -0700 Merged to stable: correct spurious documentation example code which results in a SyntaxError. References [4210]. r4965@ks: jeremy | 2006-08-07 00:38:04 -0700 Merged to stable: remote_form_for can leave out the object parameter and default to the instance variable of the object_name, just like form_for. References [4215]. r4966@ks: jeremy | 2006-08-07 00:48:40 -0700 Merged to stable: update inconsistent migrations documentation. References #4683. r4967@ks: jeremy | 2006-08-07 00:52:34 -0700 Merge to stable: cache CgiRequest#request_parameters so that multiple calls don't re-parse multipart data. References [4256]. r4968@ks: jeremy | 2006-08-07 00:55:51 -0700 Merge to stable: only require redcloth/bluecloth if they are not already loaded. References [4257]. r4977@ks: jeremy | 2006-08-07 01:08:02 -0700 Merge to stable: skip silence stderr test if we can't STDERR.tell r4978@ks: jeremy | 2006-08-07 01:13:00 -0700 Merge to stable: update layout and content_for documentation to use yield rather than magic @content_for_layout instance variables. References [4263]. r4979@ks: jeremy | 2006-08-07 01:16:43 -0700 Merge to stable: add documentation for redirect_to :back's RedirectBackError exception; remove all remaining references to @params in the documentation. References [4267], [4268]. r4980@ks: jeremy | 2006-08-07 01:18:04 -0700 Merge to stable: fix documentation indentation. r4981@ks: jeremy | 2006-08-07 01:28:34 -0700 Merge to stable: mention in docs that config.frameworks doesn't work when getting Rails via Gems; documentation for AbstractRequest. References #4857, #4895. r4982@ks: jeremy | 2006-08-07 01:45:26 -0700 Merge to stable: fix syntax error in documentation. References #4679. r4983@ks: jeremy | 2006-08-07 01:50:47 -0700 Merge to stable: Enhance documentation for setting headers in integration tests. Skip auto HTTP prepending when its already there. References #4079. Add warning about the proper way to validate the presence of a foreign key. References #4147. r4984@ks: jeremy | 2006-08-07 02:08:00 -0700 Merge to stable: ActionController::Base Summary documentation rewrite; Fix text_helper.rb documentation rendering; Fixes bad rendering of JavaScriptMacrosHelper rdoc. References #4725, #4900, #4910. r4985@ks: jeremy | 2006-08-07 02:14:18 -0700 Merge to stable: documentation fix: integration test scripts don't require integration_test. References #4914.

(follow-up: ↓ 153 ) 05/03/07 18:19:42 changed by wv

  • priority changed from normal to high.
  • status changed from closed to reopened.
  • resolution deleted.

I have a RoR script that runs indefinitely. I.e. it is run from the shell, not through CGI. When there is activity after a couple of hours of idle time, messages of the following form are raised:

Mysql::Error: MySQL server has gone away: SELECT * FROM ...

This appears to me that this bug, as well as #2163, is not fixed.

Environment:

arch: linux sparc; Rails 1.2.3; ruby 1.8.5 (2006-12-04 patchlevel 2) [sparc-linux]; mysql Ver 14.12 Distrib 5.0.38, for unknown-linux-gnu (sparc) using readline 5.1

(in reply to: ↑ 152 ) 05/10/07 19:00:04 changed by dinsley

We're experiencing the exact same problem as the above poster. The script is being run from the shell via a cron. The script is archiving old user data so sometimes the idle time can be quite long, the following messages are raised:

ActiveRecord::StatementInvalid: Mysql::Error: Lost connection to MySQL server during query
ActiveRecord::StatementInvalid: Mysql::Error: MySQL server has gone away

Our enviroment is:

  • ruby 1.8.5 (2006-12-04 patchlevel 2) [x86_64-linux]
  • rails Rails 1.2.3
  • Mysql 5.0.27-max-log

(follow-up: ↓ 155 ) 05/22/07 17:29:38 changed by kevinclark

Can either of you provide some sort of failing test case, or instructions for how to reproduce?

(in reply to: ↑ 154 ) 05/22/07 19:11:42 changed by yyyc514

This hasn't been an issue for me in forever, so I can't help here.

05/23/07 10:05:10 changed by wv

  • attachment mysql_connection_timeout_controller_test.rb added.

Test-case for MySQL server has gone away

05/23/07 10:09:19 changed by wv

I attached a test case that manually sets the MySQL "wait_timeout" variable to 1 second, then waits 2 seconds and tries to use the MySQL connection. This fails.

The second and third testcase effectively call ActiveRecord::Base.connection.reconnect! before re-checking the connection. The immediate subsequent connection check now succeeds.

Expected behaviour:

  1. always check the connection before running a query (call ActiveRecord::Base.connection.verify!)
  2. Or re-run the query after this error

Instructions:

  1. Create a new rails application, and configure the test database in database.yml.
  2. Put the test in test/functional/
  3. Run: ruby test/functional/mysql_connection_timeout_controller_test.rb
Loaded suite test/functional/mysql_connection_timeout_controller_test
Started
E..
Finished in 6.122323 seconds.

  1) Error:
test_mysql_connection_after_timeout(MysqlConnectionTimeoutControllerTest):
ActiveRecord::StatementInvalid: Mysql::Error: MySQL server has gone away: select 1=1
    /usr/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/abstract_adapter.rb:128:in `log'
    /usr/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/mysql_adapter.rb:243:in `execute'
    /usr/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/mysql_adapter.rb:399:in `select'
    /usr/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
    test/functional/mysql_connection_timeout_controller_test.rb:11:in `test_mysql_connection_after_timeout'

3 tests, 5 assertions, 0 failures, 1 errors

05/25/07 21:09:49 changed by bitsweat

(In [6838]) Test reconnection after MySQL client timeout. References #428.

05/25/07 21:10:48 changed by bitsweat

  • status changed from reopened to closed.
  • resolution set to fixed.

The folks having issues on Windows: this is a separate error. Please troubleshoot, fix, and open a new ticket with a well-tested patch. Thanks!

05/25/07 21:37:13 changed by wv

I have this issue on Linux. So I'm not quite sure how you find this issue "fixed" now?

05/26/07 02:51:24 changed by bitsweat

I applied your test case wv. Your assumption that the database would automatically reconnect when the connection has gone away is incorrect.