Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #1423 (closed enhancement: wontfix)

Opened 3 years ago

Last modified 1 year ago

[RESEARCH] smart saves

Reported by: minam Assigned to: David
Priority: normal Milestone: 1.x
Component: ActiveRecord Version: 0.12.1
Severity: normal Keywords: smart save
Cc: chewi@aura-online.co.uk

Description

Save only what changes, and only when something changes!

I've worked really hard to make sure performance is not negatively impacted by this feature, which has been frequently requested. In general:

* #find is no slower than before * #save is not significantly slower than before, and may be much, much faster if there is no data to be saved

This patch supports in-place modification of data, both for text and for serialized objects. Note that this patch does incur a performance hit for serialized objects, because an md5 hash of the Marshal.dumped object is used to determine whether the object is "dirty" or not. This overhead is only incurred when the serialized object is read from the model, and again before each save. Also, given that serialized objects are not common, this seems a fair compromise, given the more general usefulness of the "smart save".

Also, in-place String modifications are about twice as slow as native string operations, but all other string operations are unchanged. However, my pb can still do 10,000 String#replace calls in about .16s, so it's still very, very fast.

Please, I'm hungry for feedback on this! Let me know what you think.

Attachments

smartsave.diff (19.1 kB) - added by minam on 06/11/05 10:24:09.
compare-benchmark.rb (0.6 kB) - added by Ulysses on 06/11/05 11:06:06.
Benchmark of MD5 vs String#hash vs. String==
smartsave2.diff (18.9 kB) - added by minam on 06/11/05 21:55:56.
Second attempt, using string comparisons for dirty-checking, instead of an actual write-barrier
smartsave3.diff (19.4 kB) - added by minam on 06/11/05 23:39:07.
GeneralWriteBarrier needed to be a little more sophisticated
smartsave-bm.rb (2.5 kB) - added by minam on 06/11/05 23:39:32.
Benchmarks for comparing implementations
benchmarks.txt (2.2 kB) - added by minam on 06/11/05 23:42:13.
Results of running the smartsave-bm.rb benchmark script both with and without smart saves
startsave4.diff (19.4 kB) - added by minam on 06/13/05 09:43:21.
Updated patch that works with latest HEAD

Change History

06/11/05 10:24:09 changed by minam

  • attachment smartsave.diff added.

06/11/05 10:26:48 changed by minam

I should add: this patch also makes sure optimistic locking continues to work. If a model uses optimistic locking, doing a #save call will always save at least the lock_version column, even if nothing else has changed about the record.

06/11/05 11:05:25 changed by ulysses

Regarding comparison of serialized objects:

The overhead of MD5 is not really required. Instead of comparing the MD5 of Marshal.dumps' output, we can skip the MD5 and just compare the Marshal'd strings.

In benchmarks, comparing the dumped strings is about 3 times faster than comparing the MD5. (See attached compare-benchmark.rb)

I should point out that the increase of speed does come at a cost of memory. Using MD5 means that the dumped string can be garbage collected after the hash is calculated -- using string comparison means it must be held until the record is GC'ed.

However, the signifigance of this may be minor. Most of the time serialized objects are not very large, and the GC does not run on every request.

06/11/05 11:06:06 changed by Ulysses

  • attachment compare-benchmark.rb added.

Benchmark of MD5 vs String#hash vs. String==

06/11/05 16:45:28 changed by nzkoz

Looks really nice Jamis, a couple of things to consider. Reducing the number of columns written to can be problematic. Without optimistic locking you can end up with inconsistent values which don't pass the validation. Also, people using MySQL's TIMESTAMP column will get severely screwed.

http://www.rollerweblogger.org/comments/roller/blog/mysql_update_kills_date_fields

It'd be nice if we could have the 'save only when changed' functionality but still update all the columns?

06/11/05 21:37:40 changed by minam

Thanks, Michael. However, Ulysses and I spent some time on this patch tonight, and found a real show-stopper: the write barrier that detects in-place changes on strings has issues with propogating the $1, $2, etc. variables up to the caller. This means that existing code (like "x.y.to_time") breaks rather spectacularly. Judging from prior discussions about similar issues on ruby-talk (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/79303), there does not appear to be any way to work around this.

We could also just try another approach to detecting changes on strings. If the subclass just stores it's original value (thus doubling the memory used by strings, unfortunately), the "dirty?" method can just compare self with the copy. This would only cause slowdowns when checking the dirtiness of a value, which would be infrequent.

Ulysses suggested making this feature optional, and then just freezing strings and serializable attributes when they are queried, making in-place modifications illegal. We may investigate this approach further if other options don't turn up.

06/11/05 21:55:56 changed by minam

  • attachment smartsave2.diff added.

Second attempt, using string comparisons for dirty-checking, instead of an actual write-barrier

06/11/05 23:39:07 changed by minam

  • attachment smartsave3.diff added.

GeneralWriteBarrier needed to be a little more sophisticated

06/11/05 23:39:32 changed by minam

  • attachment smartsave-bm.rb added.

Benchmarks for comparing implementations

06/11/05 23:42:13 changed by minam

  • attachment benchmarks.txt added.

Results of running the smartsave-bm.rb benchmark script both with and without smart saves

06/11/05 23:49:19 changed by minam

Michael, I understand what you are saying about the need to update entire records, but wouldn't it make more sense just to encourage people to use optimistic locking in those kinds of situations? By not updating all the columns you can get some significant speed boosts (see the benchmarks results attached to this ticket).

06/13/05 09:43:21 changed by minam

  • attachment startsave4.diff added.

Updated patch that works with latest HEAD

06/13/05 14:46:03 changed by nzkoz

Absolutely it makes sense, however there's still the TIMESTAMP problem. If you update columns in a given row, and that row contains a TIMESTAMP and you don't update the TIMESTAMP, the TIMESTAMP gets set to now(). It's a feature, not a bug, but a *lot* of applications are using TIMESTAMP.

I like the idea of freezing the attributes. I have to say it's never occured to me to try:

  @model.something.gsub!("asdf", "jklsemicolon");

So it doesn't break my expectations. Then again, I came from Java where such things are impossible...

12/14/05 14:04:50 changed by Rui Martins

Wouldn't it be easyer (for now) to just make a smart update_attributes method ?

Anyone concerned with performance, would use the update_attributes, instead of the save method.

At least, this is what I tought the method was good for ! But after checking it wasn't working as I expected, I looked into the source, and it actually uses the same save method.

This only needs a simple save_attributes method, which will only save what we tell it to save.

NOTE: TIMESTAMPS, and similar problematic fields, could be always updated or something like that.

Example:

	def decreaseStock( quantity )
		# Deduce Items from Stock
		self.quantity -= quantity

		# Keep a record of the Sale
		self.itemsSold += quantity

		# save! # => Optimized to =>
		raise( 'Fail!' ) unless update_attributes( { :quantity => self.quantity, :itemsSold => self.itemsSold } )
	end

12/15/05 15:45:56 changed by Rui Martins

Better yet, is to forget the problematic fields (TIMESTAMPS and similar), since we are empowering the developer, to only save what he wants to save, hence optimizing its code.

What this means is that whatever the developer is doing is his responsability. He should now that he has some specific requirements that he must meet, and not expect that some background code will find his solution shortcomings.

This is specially true, when one has to make choices, performance wise for example, that we shouldn't be doing, just because we are trying to guess what the majority of the cases will want.

Empower the developer, make him responsible for is acts.

08/30/06 05:21:30 changed by bitsweat

  • severity changed from critical to normal.
  • cc deleted.
  • component changed from ActionPack to ActiveRecord.
  • priority changed from high to normal.
  • milestone changed from 0.9.5 to 1.x.
  • keywords deleted.
  • type set to enhancement.

(in reply to: ↑ description ) 09/03/06 22:49:38 changed by anonymous

  • keywords changed from rgergerger to smart save.
  • priority changed from high to normal.
  • component changed from ActionPack to ActiveRecord.
  • severity changed from critical to normal.
  • type set to enhancement.

Replying to minam:

Save only what changes, and only when something changes! I've worked really hard to make sure performance is not negatively impacted by this feature, which has been frequently requested. In general: * #find is no slower than before * #save is not significantly slower than before, and may be much, much faster if there is no data to be saved This patch supports in-place modification of data, both for text and for serialized objects. Note that this patch does incur a performance hit for serialized objects, because an md5 hash of the Marshal.dumped object is used to determine whether the object is "dirty" or not. This overhead is only incurred when the serialized object is read from the model, and again before each save. Also, given that serialized objects are not common, this seems a fair compromise, given the more general usefulness of the "smart save". Also, in-place String modifications are about twice as slow as native string operations, but all other string operations are unchanged. However, my pb can still do 10,000 String#replace calls in about .16s, so it's still very, very fast. Please, I'm hungry for feedback on this! Let me know what you think.

Why not just track the touch with a child object ? I mean, when an attribute is changed just set the object dirty ? But the real question that have popped into my mind is, what is the efficient way to track the attribute entrance ?

01/16/07 18:37:04 changed by chewi

  • cc set to chewi@aura-online.co.uk.

How's this looking? Would you say it's fairly safe to use now? Will it apply against 1.2? I have a feeling it won't! Looks very promising though.

01/17/07 19:22:59 changed by minam

  • status changed from new to closed.
  • resolution set to wontfix.

The patch hasn't been touched since before Rails 1.0, and is guaranteed NOT to work with the latest Rails. In fact, I'm going to just close this ticket. This would be better managed as a plugin, anyway, so if someone wants to try and modernize this patch, please consider releasing it as a plugin, instead of submitting a patch.