Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #9030 (closed defect: duplicate)

Opened 1 year ago

Last modified 4 months ago

[PATCH] UrlEncodedPairParser fails with complex name schemes

Reported by: jdrowell Assigned to: core
Priority: normal Milestone: 2.x
Component: ActionPack Version: edge
Severity: normal Keywords:
Cc:

Description

Background: Due to a few nested resources combined with complex forms, I have a naming scheme like "a[b][][c][d]" for my form fields (actually it's e.g. "company[branch][][address][street|city_id|etc.]"). The forms worked fine while I only had the "[]" and "[][xx]" formats, but broke with "[][xx][xx]". Here's sample output:

ActionController::UrlEncodedPairParser.new([['a[b][][c][d]','1'],['a[b][][c][d]','2'],['a[b][][c][d]','3']]).result
=> {"a"=>{"b"=>[{"c"=>{}}, {"c"=>{}}, {"c"=>{}}]}}

which is obviously (?) broken. After the patch:

ActionController::UrlEncodedPairParser.new([['a[b][][c][d]','1'],['a[b][][c][d]','2'],['a[b][][c][d]','3']]).result
=> {"a"=>{"b"=>[{"c"=>{"d"=>"1"}}, {"c"=>{"d"=>"2"}}, {"c"=>{"d"=>"3"}}]}}

which is what I wanted (and looks correct).

The only thing that the patch does is remove the ".with_indifferent_access" while pushing a hash. I'm not really sure why that was used in the first place. Also "rake test" doesn't run on my tree (./test/controller/new_render_test.rb:3: superclass mismatch for class Customer (TypeError)) so I didn't add a test for this.

Attachments

request.rb.diff (0.5 kB) - added by jdrowell on 07/19/07 23:01:43.
Removes .with_indifferent_access from hash pushes in UrlEncodedPairParser
indifferent.diff (3.5 kB) - added by jdrowell on 07/20/07 13:44:21.
Fixes interaction between HashWithIndifferentAccess and UrlEncodedPairParser and adds test
indifferent.2.diff (3.5 kB) - added by jdrowell on 07/20/07 13:46:50.
Fixes interaction between HashWithIndifferentAccess and UrlEncodedPairParser and adds test
properly_handle_depth_first_array_param_order_as_from_prototype_form_serialize.diff (8.2 kB) - added by lastobelus on 11/08/07 12:03:20.
refactored UrlEncodedPairParser to handle array params in breadth-first order

Change History

07/19/07 23:01:43 changed by jdrowell

  • attachment request.rb.diff added.

Removes .with_indifferent_access from hash pushes in UrlEncodedPairParser

07/20/07 06:31:01 changed by manfred

  • status changed from new to closed.
  • resolution set to incomplete.

You can't just remove .with_indifferent_access, that method converts the Hash into a HashWithIndifferentAccess. Please look for a different solution and _always_ add tests.

Please reopen the ticket if you've addressed these issues.

07/20/07 13:41:04 changed by jdrowell

  • status changed from closed to reopened.
  • resolution deleted.

I determined the source of the problem. The parsing algorithm in UrlEncodedPairParser (apart from being too C-like for my taste) relies heavily on object references (like pointers) and HashWithIndifferentAccess doesn't make any effort to avoid breaking references (changing object_id). Also there seemed to be a bug in HashWithIndifferentAccess where you could call .with_indifferent_access more than once and overload the method aliases. I fixed these and added a test case. Will attach a new patch.

07/20/07 13:44:21 changed by jdrowell

  • attachment indifferent.diff added.

Fixes interaction between HashWithIndifferentAccess and UrlEncodedPairParser and adds test

07/20/07 13:46:50 changed by jdrowell

  • attachment indifferent.2.diff added.

Fixes interaction between HashWithIndifferentAccess and UrlEncodedPairParser and adds test

11/08/07 06:38:20 changed by lastobelus

how does this relate to #8775?

11/08/07 06:47:47 changed by jdrowell

I don't think it's related at all. The problem in #8775 is on the client end (Prototype), while this is on the server side (Rails). The fact that they both deal with how pairs are encoded is the only similarity.

This bug (#9030) is about how references are used to marginally speed up pair parsing at the cost of breaking the parsed result. I'd like to see UrlEncodedPairParser rewritten in an object oriented way instead of a C-like reference passing convoluted way. Unfortunately the patch acceptance process in edge Rails is very slow and I ended up just working around the issue instead of using my own patch (which works!) because it seems it'll never get merged. Keeping one's personalized version of edge is more trouble than it's worth.

11/08/07 07:47:52 changed by manfred

The patch acceptance process changed to a peer reviewed process a few months ago, it is explained in point 5 on the wiki frontpage. This has really helped speed up the acceptance.

I was wondering, can you reduce the testcase you wrote to a smaller example that is descriptive of the problem? Maybe rename the testcase to "test_query_parameters_with_deep_reference"?

11/08/07 11:23:21 changed by lastobelus

I think they are somewhat related. My attached a patch fixes both bugs (by also incorporating your change to HashWithIndifferentAccess).

I've attached this patch to #8775 as well.

I haven't done any benchmarks yet to compare speed of my parser with original, but I will in the next few days and if it is slower I will have a go at optimizing it.

11/08/07 12:03:20 changed by lastobelus

  • attachment properly_handle_depth_first_array_param_order_as_from_prototype_form_serialize.diff added.

refactored UrlEncodedPairParser to handle array params in breadth-first order

11/08/07 12:29:24 changed by mislav

The patch still looks kind of big. You are encouraged to try slim it down and add more tests, but I don't like this:

ActionController::AbstractRequest.parse_query_parameters('x[y][][z]=10&x[y][][w]=a&x[y][][z]=20&x[y][][z]=30&x[y][][w]=c')
#-> {'x' => {'y' => [{'z' => '10', 'w' => 'a'}, {'z' => '20', 'w' => 'c'}, {'z' => '30'}]}}

It should remain like before:

{'x' => {'y' => [{'z' => '10', 'w' => 'a'}, {'z' => '20'}, {'z' => '30', 'w' => 'c'}]}}

By trying to "fix the holes" (as you call them) you're only breaking the parser.

11/08/07 12:40:45 changed by lastobelus

I'm not trying to fix the holes; I don't care about the holes. I'm content to ensure that every row of a set of array params has every key.

The existing parser was broken in the case where not every row had the first key anyway, and very broken if you wanted a "vertical" layout of the rows of an array param instead of horizontal -- ie, exactly the result produced by Form.serialize in Prototype 1.5.1 & 1.6.

11/08/07 12:46:17 changed by lastobelus

In the example you mention, if you do:

ActionController::AbstractRequest.parse_query_parameters('x[y][][z]=10&x[y][][w]=a&x[y][][w]=c&x[y][][z]=30&x[y][][w]=d')

the original parser & mine produce the same result -- ie, the new parser is consistent in what it does with missing keys, and the old parser was not.

with the old parser you could easily have a working form that had some rows with missing keys and break it in a subtle, hard to track down way justby changing the order of fields in a row.

It's the old parser that is broken.

11/08/07 13:22:00 changed by mislav

Can you tell me how your parser would parse this (I wrote newlines instead of "&"):

a[][foo]=1
a[][foo]=2
a[][foo]=3
a[][bar]=4

11/08/07 22:47:43 changed by lastobelus

Sure, it parses it as:

=> {"a"=>[{"foo"=>"1", "bar"=>"4"}, {"foo"=>"2"}, {"foo"=>"3"}]}

instead of

 {"a"=>[{"foo"=>"1"}, {"foo"=>"2"}, {"foo"=>"3", "bar"=>"4"}]}

This is the one class of results where my parser returns a different result than the original parser, however it is necessarily so.

Note the original parser only handles sparse arrays well when the first column is fully populated. This can lead to a situation where you think your form is working and it stops working if you change the order of fields so the first column is a sparsely populated one.

You have to pick between having the first key determine the cardinality of the array (current way), or having each key get put in the lowest inedx in which it fits (what my parser does).

With the first (current) way you have to restrict page layouts to display array row by row and Prototype has to be refactored to exactly preserve the physical order of form fields on a page when it serializes.

With the second way you can layout array params with either rows or column corresponding to records so long as you supply every field for every row, or all missing fields (call them holes in a sparse array) will bubble to the bottom. Prototype can serialize as it does now (so long as it preserves the order of individual fields an an array param).

So, with the new parser you can do, for example:

            Player1     Player2     Player3
Best            8           9           7      
Worst           0           2           4

query:  'players[][best]=8&players[][best]=9&players[][best]=7&players[][worst]=0&players[][worst]=2&players[][worst]=4'

which the original parser would parse as:

{"players"=>[{"best"=>"8"}, {"best"=>"9"}, {"best"=>"7", "worst"=>"0"}, {"worst"=>"2"}, {"worst"=>"4"}]}

but the new parser parses correctly as:

{"players"=>[{"best"=>"8", "worst"=>"0"}, {"best"=>"9", "worst"=>"2"}, {"best"=>"7", "worst"=>"4"}]}

So, implementing the patch may break some existing code, but fixing that code is a straightforward and a somewhat natural fix -- making sure every column is supplied for every row in an array param.

11/08/07 22:57:14 changed by lastobelus

By the way, I did open a new ticket for this patch, it is #10101

01/14/08 22:02:16 changed by lastobelus

on #10101 I have added a new patch and a test app.

The new parser handles multiple nesting of arrays:

object[subobject][][subsubobject][]

to accomplish this, an explicit array index syntax was added.

Also, for the checkbox hack to work when arrays can be depth first or breadth first, an explicit syntax for the hidden field name was added.

The new parser and patched form_helper allow using fields_for to create forms for arbitrarily nested arrays of objects.

01/23/08 14:18:59 changed by hildolfur

wow, thanks for supplying a patch to that. I discovered deep form nesting recently, and it really rocks, especially with make_resourceful controller macros. The patch is against release 1.2, isn't it?

I need the patch functionality for a 2.0.1 project, so is it advisable to wait a few days for a new patch or reimplement your changes in 2.0.1?

01/23/08 14:20:22 changed by hildolfur

copied above post to #10101, posted in the wrong place

(follow-up: ↓ 17 ) 03/02/08 14:08:38 changed by thomas.lee

  • status changed from reopened to closed.
  • resolution set to duplicate.

bitsweat asked me to attempt to merge your (jdrowell) patch with a patch I wrote for 10797. After a bit of analysis, I'm not really sure if there's much I can take from this patch since:

1. Your modifications to with_indifferent_access breaks existing semantics: some people may be relying on the fact that with_indifferent_access returns a *copy* of the target. Unless somebody thinks otherwise, I think it's in the best interests of the framework to keep the existing semantics.

2. Your modifications to convert_value modify the given array in-place. This seems to be an attempt to fix the original bug you reported, but it will lead to weird-ass bugs all over the place. In actual fact, this behaviour is perfectly acceptable: it's just the parameter parsing code that's broken. See the patch for #10797.

lastobelus, while I think your PHP-esque syntax for arrays in Rails form is interesting, the changes are a little too sweeping and broad. Worse, they change the semantics of something that's (reasonably) well understood. I'd recommend breaking up that patch into a few smaller patches and asking for a discussion on the mailing lists. In any case, this patch addresses far more than the immediate issue.

Marking this as a duplicate of issue #10797.

(in reply to: ↑ 16 ; follow-up: ↓ 20 ) 03/05/08 07:06:24 changed by jdrowell

Thanks for the update Thomas.

Replying to thomas.lee:

1. Your modifications to with_indifferent_access breaks existing semantics: some people may be relying on the fact that with_indifferent_access returns a *copy* of the target. Unless somebody thinks otherwise, I think it's in the best interests of the framework to keep the existing semantics.

I haven't examied with_indifferent_access in detail to be able to tell if it always returns a copy or not. My main problem with it was that UrlEncodedPairParser assumed it did _not_ return a copy, so assuming that at least conceptually the original UrlEncodedPairParser was algorithmically correct, then with_indifferent_access had to be changed also. As I tried to express when I made the initial bug report, I personally think that UrlEncodedPairParser is badly coded and is a hack as it is, and I was trying to minimally change it and it's dependencies in order to make it work correctly. A rewrite would be much better.

Other ppl in this thread have come up with different solutions. There are indeed a number of good solutions to this parsing problem. What we need is just something that Rails is great at: a convention. Tell people how values are parsed, how they should (and should not) build their complex forms for them to work out of the box, and code the parsing method in a Ruby-endorsed way (which the current implementation surely isn't). This code (UrlEncodedPairParser) is not called in long loops and should not be obfuscated for the sake of a slight performance gain (e.g. passing recursive parameters by reference).

That said I look forward to looking at your solution to the problem in #10797.

03/05/08 09:08:47 changed by thomas.lee

I haven't examied with_indifferent_access in detail to be able to tell if it always returns a copy or not. My main problem with it was that UrlEncodedPairParser assumed it did _not_ return a copy, so assuming that at least conceptually the original UrlEncodedPairParser was algorithmically correct, then with_indifferent_access had to be changed also.

That's the thing: UrlEncodedPairParser's assumption was flat-out wrong. And at a glance, it's very easy to miss this bug - you can understand how somebody could make such a mistake.

As I tried to express when I made the initial bug report, I personally think that UrlEncodedPairParser is badly coded and is a hack as it is, and I was trying to minimally change it and it's dependencies in order to make it work correctly. A rewrite would be much better.

I completely and wholeheartedly agree. But as you'll see from the patch in #10797, there is a simpler solution still.

Other ppl in this thread have come up with different solutions. There are indeed a number of good solutions to this parsing problem. What we need is just something that Rails is great at: a convention. Tell people how values are parsed, how they should (and should not) build their complex forms for them to work out of the box, and code the parsing method in a Ruby-endorsed way (which the current implementation surely isn't). This code (UrlEncodedPairParser) is not called in long loops and should not be obfuscated for the sake of a slight performance gain (e.g. passing recursive parameters by reference).

Can I suggest you open another ticket for just this? Your points are most definitely valid. I would, however, stress that we simplify this code before we go adding new features to the parser (as others have done in this ticket). New features and simplification rarely go hand in hand.

Anyways, hope you find my patch handy. :)

03/05/08 18:49:54 changed by lastobelus

I will be maintaining the patch in #10101 on an ongoing basis regardless of whether it gets folded into core as every rails app that I've developed so far has required some degree of nesting of entities (sometimes against my recommendations, but be that as it may).

It will likely only be re-integrated into edge every 2-3 months however, as on applications that are under development I only upgrade rails versions about that often.

(in reply to: ↑ 17 ; follow-up: ↓ 21 ) 03/11/08 12:36:13 changed by thomas.lee

Replying to jdrowell:

Thanks for the update Thomas.

I think I might have been wrong. :)

After spending a little more time with this bug, I can understand your frustration with HashWithIndifferentAccess. It really is a nasty piece of work, isn't it? The fact that HashWithIndifferentAccess behaves so differently to the default Hash class (sometimes storing copies, sometimes just keeping the reference) is all sorts of confusing.

What's even more crazy is that, at least in the unit tests for actionpack, the code doesn't seem to rely on the fact that the hashes are instances of HashWithIndifferentAccess at all. You can make them plain old hashes and all the bugs we see here will go away. You could probably seal the deal by just making a call to #with_indifferent_access on the result Hash once parsing is complete.

In any case, managed to keep the necessary changes inside UrlEncodedPairParser. They should be checked in with the patch for #10797. Let me know if you have any ideas for further simplifying the parameter parsing code.

(in reply to: ↑ 20 ) 03/11/08 14:49:38 changed by jdrowell

Replying to thomas.lee:

I think I might have been wrong. :)

Well, removing the #with_indifferent_access call was my first solution ;) (see my initial bug report). Then manfred rejected the solution, and I went on to fix the class itself. Then it was rejected because it changed something too internal in Rails. My logic was: hack it to change the least possible code (thus the one-liner), hoping someone will rewrite the whole method later. Failing that, fix the problem at its source (which my last patch addresses). I will eventually rewrite the parsing code if nobody else does it (it's in a "works for me" state right now though), but I think we agree now that HashWithIndifferentAccess is indeed buggy and should also be fixed to prevent further nasty bugs from creeping in somewhere else. Maybe adding a bunch of tests to force HashWithIndifferentAccess to behave sanely? Maybe audit its use in Rails and as you suggested only call it and the very end of the internal request processing (i.e. just before controllers get involved) in _all_ cases?

I think the syntactic sugar involved in using HashWithIndifferentAccess is indeed sweet (who doesn't love symbols?) but it's also making several things (slightly) slower and more bug prone. Heck, if the core Rails team thinks the syntax is so important, why not change the Hash class itself and make _all_ hashes behave as a HashWithIndifferentAccess? That would surely bring the bugs out ;)

(follow-up: ↓ 23 ) 03/11/08 15:27:36 changed by lastobelus

if you "rewrite the parsing code" -- which I've already done -- are you going to make nested entities work or not? There is no way to make nested entities work fully with the semantics of the current parser. It is a logical impossibility. You can't make the checkbox hack not be ambiguous without introducing syntax.

(in reply to: ↑ 22 ) 03/11/08 17:42:53 changed by jdrowell

Replying to lastobelus:

if you "rewrite the parsing code" -- which I've already done -- are you going to make nested entities work or not? There is no way to make nested entities work fully with the semantics of the current parser. It is a logical impossibility. You can't make the checkbox hack not be ambiguous without introducing syntax.

I think that enhancing the current parser should come next. The most likely reason why it hasn't been done yet is because of the whole UrlEncodedPairParser / HashWithIndifferentAccess mess. I don't see how any maintenance can be done to that code right now. As Thomas said before, delaying the treatment of HashWithIndifferentAccess would basically remove that dependency, so fixing these problems with independent patches would be possible. After the parsing code is sane and passes all current tests we could examine the edge cases and incrementally handle them implicitly or with added semantics where that's not possible.