Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #1349 (closed defect: wontfix)

Opened 3 years ago

Last modified 2 years ago

[PATCH] save only when a record has been modified

Reported by: minam Assigned to: David
Priority: normal Milestone:
Component: ActiveRecord Version: 0.12.1
Severity: normal Keywords:
Cc: Dee.Zsombor@gmail.com

Description

This is NOT a "save only what has been modified patch". All columns of every record will still be updated on a save. This patch merely checks to see if anything has been modified in the record, and if not, the record will not be saved.

This patch was inspired by a particularly nasty issue that arises when you have after_create hooks that (as a side effect) cause the record to be saved again. This causes the "@new_record_before_save" flag to be reset, and means that subsequent after_create/after_save hooks will misinterpret the record as being "not new", which can wreak havoc in (among other places) the after_save hook for habtm collections.

I did some naive benchmarking on this patch (on my 1.5ghz Powerbook, using MySQL). Without the patch, 1000 finds took 6.04s. When using Object#hash to snapshot the attributes hash, it took anywhere from 6.04s to 6.06s for 1000 finds. (The problem with Object#hash, though, is the danger of collisions, which can result in false negatives in Base#dirty? and result in records not being saved when they ought to.) Using MD5 hashing, 1000 finds took about 6.11s.

I did not benchmark #save calls, though they will suffer the same performance penalty (because the digest has to be computed in order to determine whether it differs from the original), but save is not typically called as often as find.

Attachments

save-only-when-dirty.diff (9.1 kB) - added by minam on 05/24/05 10:11:22.
save-only-when-dirty-2.diff (8.1 kB) - added by minam on 05/24/05 11:41:45.
DRY up the AdapterInterceptor definition. This supercedes the previous patch.
bad-callbacks.rb (1.3 kB) - added by minam on 06/10/05 11:19:14.
A minimal script to duplicate the problem
bad-callbacks.2.rb (1.3 kB) - added by minam on 06/10/05 11:22:22.
Fixed script for duplicating the problem (supercedes previous script)

Change History

05/24/05 10:11:22 changed by minam

  • attachment save-only-when-dirty.diff added.

05/24/05 11:41:45 changed by minam

  • attachment save-only-when-dirty-2.diff added.

DRY up the AdapterInterceptor definition. This supercedes the previous patch.

05/25/05 08:54:09 changed by Dee.Zsombor@gmail.com

  • cc set to Dee.Zsombor@gmail.com.

Could we not set a member like @dirty whenever an assignment occurs, instead of computing or MD5 sums on attributes hash? That would be faster and collision safe too. Also there is a difference when you are doing MD5summing trivial records, or when doing the same with large text columns. Benchmarking is very subjective.

05/26/05 15:21:17 changed by minam

We could, but that wouldn't catch cases where a field is modified in place, like "x.y << 'hello'". Such an operation would leave the record thinking it was clean, when it is really dirty. Hence, the need to snapshot the record's state and test against that snapshot whenever checking for dirtiness.

05/27/05 04:04:23 changed by Dee.Zsombor@gmail.com

Yes but could we not observe inplace modifications also? For example if we would overwrite operators = and << (plus any others needed) for all objects in the attribute hash to set record level @dirty before doing the actual change. Or we could place @dirty flag at attribute level, thus producing a "save only what has been modified patch" too.

Another option would be to store automatic proxy objects in attributes hash. These would have only two methods: dirty? and clear_dirty!. All other methods would be redirected trough method_missing to actual attributes. Before this redirection we would inspect method name and set @dirty flag if appropiate.

05/27/05 10:03:04 changed by minam

That could work, but I'm skeptical of the amount of overhead it would require. Creating a new proxy instance for each attribute is going to be expensive, especially when a query returns lots of objects.

Also, it's not a trivial matter to know what methods allow in-place modifications. Consider the ability of AR to serialize arbitrary objects into and out of the database. Array alone includes multiple ways to modify the array in-place.

Please, put together a patch so we can see how it would perform. I'm sure there is a way to make it work well, fast, and seamlessly, but it's not a high enough priority for me right now. This patch currently works well enough and fast enough for my purposes, and has the added benefit of adding very little additional complexity to AR (which is already very, very complex). Mostly, I just needed a way to eliminate negative side effects from after-create hooks that attempt to save the object again.

06/10/05 11:19:14 changed by minam

  • attachment bad-callbacks.rb added.

A minimal script to duplicate the problem

06/10/05 11:22:22 changed by minam

  • attachment bad-callbacks.2.rb added.

Fixed script for duplicating the problem (supercedes previous script)

06/11/05 10:56:34 changed by minam

  • status changed from new to closed.
  • resolution set to wontfix.

This patch is superceded by #1423.