
Ticket #5171 (closed enhancement: fixed)

Opened 2 years ago

Last modified 2 years ago

[PATCH] $$ and getElementsByClassName use XPath if possible (huge performance boost)

Reported by: rubyonrails@andrewdupont.net Assigned to: sam
Priority: normal Milestone:
Component: Prototype Version:
Severity: normal Keywords: css xpath
Cc: alex@spamcop.net, mislav

Description

Praise be to Joe Hewitt, whose CSS-to-XPath script I was able to adapt for this patch. I'm a regular expression doofus, so otherwise I'd have been dead in the water.

For a while I've had the idea of translating CSS strings to XPath strings to speed up the $$ function (which, while handy, is horribly slow), but I couldn't justify writing all that code just for Firefox. But XPath support has made it into WebKit nightly builds, which means the next major version of Safari will probably get XPath. And Opera 9 supports it, too, leaving IE as the only one of the big four browsers without it. (I can always dream of IE8.)

This patch introduces a new method for String.prototype so that you can do "li#main".toXPath() and it'll return //li[@id='main']. It also tests for document.evaluate and, if successful, re-defines $$ and document.getElementsByClassName to use XPath. Browsers that do not support XPath will keep the ordinary versions (which, while a little slow, are still perfectly fine), so this doesn't break compatibility with any browser.
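To make the translation concrete, here is a minimal sketch of the idea, not the code in the attached xpath.js. The function name cssToXPath is hypothetical, and the sketch assumes selectors built only from tag names, #id and .class tokens separated by descendant spaces.

```javascript
// Hypothetical helper for illustration only; the attached patch adds
// String.prototype.toXPath and covers more of CSS than this does.
function cssToXPath(selector) {
  // Split "ul li.item" into its descendant steps.
  var parts = selector.replace(/^\s+|\s+$/g, '').split(/\s+/);
  var steps = [];
  for (var i = 0; i < parts.length; i++) {
    // Optional tag name (or *), followed by any number of #id / .class tokens.
    var m = /^([a-zA-Z][\w-]*|\*)?((?:[#.][\w-]+)*)$/.exec(parts[i]);
    if (!m) throw new Error('Unsupported selector: ' + parts[i]);
    var step = m[1] || '*';
    var tokens = m[2].match(/[#.][\w-]+/g) || [];
    for (var j = 0; j < tokens.length; j++) {
      var name = tokens[j].slice(1);
      if (tokens[j].charAt(0) === '#') {
        step += "[@id='" + name + "']";
      } else {
        // Pad @class with spaces so only whole class names match.
        step += "[contains(concat(' ', @class, ' '), ' " + name + " ')]";
      }
    }
    steps.push(step);
  }
  return '//' + steps.join('//');
}
```

With this sketch, cssToXPath("li#main") gives //li[@id='main'], the same result described above for "li#main".toXPath().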

What's the point of all this? A performance increase that's at least sixfold and usually much greater. To use getElementsByClassName as an example: since it fetches all tags on the page and loops through them, the time it takes to execute rises linearly with the number of elements on the page, not the number of elements you're requesting.
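The loop-based approach looks roughly like the sketch below. It is an approximation rather than Prototype's actual source; filterByClassName is a hypothetical name, and it assumes the class name contains no regular expression metacharacters.

```javascript
// Illustrative only: every candidate element is visited once, so the cost
// grows with the size of `elements`, however few of them match.
function filterByClassName(elements, className) {
  var pattern = new RegExp('(^|\\s)' + className + '(\\s|$)');
  var matches = [];
  for (var i = 0; i < elements.length; i++) {
    if (pattern.test(elements[i].className)) {
      matches.push(elements[i]);
    }
  }
  return matches;
}
```

In a browser this would be called as filterByClassName(document.getElementsByTagName('*'), 'foo'), which is where the "all tags on the page" cost comes from.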

This means that XPath really kicks ass in situations where there's a really big haystack and a really small needle. In informal testing I've had XPath beat the ordinary $$ function by a factor of 20 on large pages.

Attached is xpath.js, which can be dropped into an existing Prototype install without modifying any files. (I'll work on testcases soon. I swear.) I've tried to make the code footprint as light as possible, but the String.prototype.toXPath method is somewhat bulky out of necessity.

Attachments

xpath.js (3.8 kB) - added by rubyonrails@andrewdupont.net on 05/24/06 02:58:33.
xpath2.js (4.4 kB) - added by Andrew Dupont <rubyonrails@andrewdupont.net> on 06/14/06 16:14:54.
Now free of syntax errors! (Andrew hits self in back of head)
xpath3.js (4.5 kB) - added by Andrew Dupont <rubyonrails@andrewdupont.net> on 06/14/06 17:20:45.
Once more, with bugfixes.
xpath4.js (4.4 kB) - added by Andrew Dupont <rubyonrails@andrewdupont.net> on 06/30/06 15:56:04.
More fixes

Change History

05/24/06 02:58:33 changed by rubyonrails@andrewdupont.net

  • attachment xpath.js added.

05/24/06 03:03:15 changed by anonymous

  • keywords set to xpath dom $$.
  • type changed from defect to enhancement.

05/24/06 08:44:53 changed by Martin Bialasinski

Very nice :-)

As you have noted, "!=" is not part of CSS. There is a patch open to change it to "not()", see #5170. Could you also enable usage of not()?

Using XPath also brings a major change in the way $$() works. Unlike the implementation so far, the XPath method only works after the DOM tree has been fully created, i.e. not prior to onload. At least this is what I saw in my tests after Dean Edwards's post about using XPath. Can you confirm this? If this is the case, overriding the original methods should be done in an onload event handler.

05/26/06 04:20:00 changed by Andrew Dupont <rubyonrails@andrewdupont.net>

Martin, that's not the behavior I'm observing. I ran some informal tests just now in Firefox and Safari; in both, document.evaluate seemed to behave exactly like any other DOM function. If I attached it to an onload handler, it worked fine (same with DOMContentLoaded in Firefox). If I put it at the bottom of the HTML file, right before the body tag closed, it performed identically. If I put it somewhere else in the document, it'd successfully return anything above the code block that matched the expression.

Can you elaborate on your testing? I want to make sure I consider scenarios that I'm not creative enough to envision.

Also, good call on the not() syntax; I'll add that in this weekend.

05/26/06 21:57:13 changed by

  • cc set to alex@spamcop.net.

06/14/06 16:14:54 changed by Andrew Dupont <rubyonrails@andrewdupont.net>

  • attachment xpath2.js added.

Now free of syntax errors! (Andrew hits self in back of head)

06/14/06 16:17:17 changed by Andrew Dupont <rubyonrails@andrewdupont.net>

OK, the version I just added also has preliminary support for not().

If there are any XPath gurus in the house, I'd appreciate a brainstorm. Give me some unusual CSS selectors and their XPath equivalents so that I can put together some test cases.

06/14/06 17:20:45 changed by Andrew Dupont <rubyonrails@andrewdupont.net>

  • attachment xpath3.js added.

Once more, with bugfixes.

06/14/06 20:52:40 changed by anonymous

I just tried 'div.animator'.toXPath() and it bugs out. I'm looking at the code to figure out what's going on, but you'll be able to figure it out faster than me.

06/14/06 20:57:40 changed by anonymous

swap lines 54 and 55 and it's fixed. there's a bracket level error.

06/14/06 21:45:14 changed by anonymous

It would be better to use $$ = function() rather than function $$(): with the latter form, IE uses the XPath version even though the conditional block is never run.
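A minimal sketch of the assignment pattern suggested here, assuming the cause is that IE's script engine hoists a function declaration out of the block it sits in. The names chooseSelector, selectWithDomWalk and selectWithXPath are hypothetical stand-ins, not code from the attachments.

```javascript
// Hypothetical stand-ins for the two implementations.
function selectWithDomWalk(expression) { return 'dom-walk: ' + expression; }
function selectWithXPath(expression) { return 'xpath: ' + expression; }

// `doc` is passed in so the sketch also runs outside a browser.
function chooseSelector(doc) {
  var select = selectWithDomWalk;
  if (doc && typeof doc.evaluate === 'function') {
    // An assignment only takes effect when this branch actually runs,
    // unlike a function declaration placed inside the block.
    select = selectWithXPath;
  }
  return select;
}
```

In a page this would be wired up as var $$ = chooseSelector(document); so a browser without document.evaluate keeps the ordinary version.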

06/30/06 15:55:06 changed by Andrew Dupont <rubyonrails@andrewdupont.net>

OK, this fixes the transposed lines and the var $$ = function things.

06/30/06 15:56:04 changed by Andrew Dupont <rubyonrails@andrewdupont.net>

  • attachment xpath4.js added.

More fixes

07/05/06 03:35:27 changed by hi-world cup

  • keywords changed from xpath dom $$ to rthml tab space editor js.

07/09/06 00:48:22 changed by Andrew Dupont <rubyonrails@andrewdupont.net>

  • summary changed from hi-world cup to [PATCH] $$ and getElementsByClassName use XPath if possible (huge performance boost).

09/03/06 21:05:07 changed by madrobby

  • status changed from new to closed.
  • resolution set to untested.

Unit tests, anyone?

10/19/06 16:21:50 changed by jwalsh04

After applying this patch, the parentElement parameter of document.getElementsByClassName is ignored.
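One way an XPath version could honor parentElement is to make the expression relative (a leading ".//") and pass the parent as the context node of document.evaluate. The sketch below is an assumption about a possible fix, not the change that was eventually made; getElementsByClassNameXPath is a hypothetical name, and the document is passed in as doc so the function can be exercised outside a browser.

```javascript
// Illustrative sketch; assumes className contains no quote characters.
function getElementsByClassNameXPath(doc, className, parentElement) {
  // ".//" searches below the context node instead of the whole document.
  var expression =
    ".//*[contains(concat(' ', @class, ' '), ' " + className + " ')]";
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var result = doc.evaluate(
    expression,
    parentElement || doc, // context node: the given parent, else the document
    null,                 // no namespace resolver
    ORDERED_NODE_SNAPSHOT_TYPE,
    null                  // do not reuse an existing result object
  );
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

Called as getElementsByClassNameXPath(document, 'item', someElement), it only returns matches below someElement; without the third argument it searches the whole document.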

11/25/06 14:32:46 changed by mislav

  • cc changed from alex@spamcop.net to alex@spamcop.net, mislav.
  • keywords changed from rthml tab space editor js to css xpath.

03/03/07 19:05:22 changed by mislav

  • status changed from closed to reopened.
  • resolution deleted.

03/03/07 19:06:06 changed by mislav

  • status changed from reopened to closed.
  • resolution set to fixed.

Andrew is now tackling this in the selector branch... and it has tests!