Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #1937 (reopened defect)

Opened 4 years ago

Last modified 2 years ago

[PATCH] assert_tag erroneously warns on valid HTML4

Reported by: drbrain Assigned to: David
Priority: normal Milestone: 2.x
Component: ActionPack Version: edge
Severity: normal Keywords: assert_tag
Cc: purestorm@ggnore.net, rails@bitsweat.net, jarkko

Description

When run against an action that generates the attached HTML, the following assert_tag gives erroneous warnings:

  def test_index
    get :index
    assert_response :success
    assert_tag(:tag => 'a', :child => 'fade', :attributes => {
                 :href => "/hosts/show/1"
                })

  end

Warnings:

ignoring attempt to close li with ul
  opened at byte 328, line 14
  closed at byte 365, line 15
  attributes at open: {}
  text around open: "ts/show/1\">fade</a>\n<li><a href=\"/hosts/"
  text around close: "ts/show/2\">fade</a>\n</ul>\n\n<p><a href=\"/"
ignoring attempt to close td with table
  opened at byte 504, line 23
  closed at byte 546, line 24
  attributes at open: {}
  text around open: "d>Fade<td>60 seconds<td><a href=\"/test_t"
  text around close: "es/edit/1\">Edit</a>\n</table>\n\n<p><a href"
ignoring attempt to close p with div
  opened at byte 556, line 26
  closed at byte 598, line 28
  attributes at open: {}
  text around open: ">Edit</a>\n</table>\n\n<p><a href=\"/home/ne"
  text around close: ">New Test Type</a>\n\n</div>\n</body>\n</htm"
ignoring attempt to close p with body
  opened at byte 556, line 26
  closed at byte 605, line 29
  attributes at open: {}
  text around open: ">Edit</a>\n</table>\n\n<p><a href=\"/home/ne"
  text around close: "st Type</a>\n\n</div>\n</body>\n</html>\n\n"
ignoring attempt to close p with html
  opened at byte 556, line 26
  closed at byte 613, line 30
  attributes at open: {}
  text around open: ">Edit</a>\n</table>\n\n<p><a href=\"/home/ne"
  text around close: "/a>\n\n</div>\n</body>\n</html>\n\n"

As you can see from the HTML 4 Element Index at http://www.w3.org/TR/REC-html40/index/elements.html, the end-tag is optional for all the cases it is warning about.

Attachments

valid_html4.html (0.6 kB) - added by eric@robotcoop.com on 08/10/05 16:01:49.
A valid HTML4 document
html_document.patch (1.1 kB) - added by peter_marklund on 03/24/07 17:02:12.
Make HTML::Document#initialize default xml argument to true if text has <?xml instruction
html_document.2.patch (1.2 kB) - added by peter_marklund on 03/24/07 17:47:10.
Make HTML::Document#initialize default xml argument to true if text has <?xml instruction at the begging

Change History

08/10/05 16:01:49 changed by eric@robotcoop.com

  • attachment valid_html4.html added.

A valid HTML4 document

08/16/05 03:49:47 changed by anonymous

  • milestone set to 1.0.

08/19/05 08:53:25 changed by anonymous

The attachment was validated (independently) using the www.w3c.org html validator.

09/03/05 21:09:00 changed by scott@sigkill.org

  • keywords set to caches_page.

I've started seeing these all over the place with Typo also, even on XmlBuilder-generated files which pass multiple rounds of validation before being handed to assert_tag.

09/24/05 13:04:26 changed by anonymous

  • cc set to purestorm@ggnore.net.

11/06/05 14:01:19 changed by minam

  • status changed from new to closed.
  • resolution set to wontfix.

Unfortunately, html-scanner is not a complete HTML parsing solution. It only works with well-formed XHTML. I have added documentation to assert_tag to make these precondition explicit:

      # <strong>Please note</strong: #assert_tag and #assert_no_tag only work
      # with well-formed XHTML. They recognize a few tags as implicitly self-closing
      # (like br and hr and such) but will not work correctly with tags
      # that allow optional closing tags (p, li, td). <em>You must explicitly
      # close all of your tags to use these assertions.</em>

If someone were to submit a patch that extended html-scanner to handle optional closing tags, that'd be great.

(follow-up: ↓ 14 ) 11/06/05 19:48:59 changed by scott@sigkill.org

  • status changed from closed to reopened.
  • resolution deleted.

THAT IS NOT THE PROBLEM.

This has *NOTHING* to do with well-formed [X]HTML--with Typo, we're seeing this with the output from rxml files.

The scanner is broken somehow. It's griping about problems that simply don't exist in the original XML--we don't have mis-nested tags at all, yet we're still seeing these assert_tag warnings.

I'll see what I can do to build up a simple test case and post it here later today or tomorrow.

11/06/05 20:38:12 changed by bitsweat

Scott, Jamis mass-closed a bunch of tickets related to assert_tag and XHTML compliance, as you may have noticed. Do proceed to look into the trouble with html-scanner, but please keep these forums civil. Thanks!

11/14/05 07:15:48 changed by bitsweat

Any progress on this defect? Is it related to the partial-versus-exact match issue?

11/14/05 07:16:16 changed by bitsweat

  • cc changed from purestorm@ggnore.net to purestorm@ggnore.net, rails@bitsweat.net.
  • keywords changed from caches_page to assert_tag.

11/14/05 07:16:33 changed by bitsweat

  • priority changed from normal to high.
  • milestone changed from 1.0 to 1.x.

(follow-up: ↓ 12 ) 01/14/07 11:41:11 changed by jarkko

  • cc changed from purestorm@ggnore.net, rails@bitsweat.net to purestorm@ggnore.net, rails@bitsweat.net, jarkko.
  • milestone changed from 1.2 to 1.x.

Is this also a problem with assert_select or can this slowly be forgotten as assert_tag fades into the background?

(in reply to: ↑ 11 ) 01/14/07 12:15:17 changed by drbrain

Replying to jarkko:

Is this also a problem with assert_select or can this slowly be forgotten as assert_tag fades into the background?

Does assert_select erroneously warn on valid HTML4?

01/14/07 20:19:53 changed by minam

  • status changed from reopened to closed.
  • resolution set to wontfix.

assert_select uses, underneath, the same HTML parsing library as assert_tag, and so has the same restrictions on input. You must use only valid XHTML, which means you must not use implicit closing tags. Tags that, by definition, have no closing tag (like link, br, hr, and so forth) are correctly parsed, but tags that (according to HTML4) have optional closing tags (like li, p, and such) must be explicitly closed.

Finally, note that XML documents that include tags that, according to HTML, must not have explicit closing tags (e.g. link, br, etc) will not be parsed correctly, by default. You ought to take a look at http://weblog.jamisbuck.org/2007/1/4/assert_xml_select to see how to use assert_select (and assert_tag) safely with XML documents. (It's a feature, not a bug.)

Regarding the optional closing tags, in my opinion, it's just good practice to close all of your tags explicitly. However, if someone were to submit a patch that was fully backwards compatible and supported all optional closing tags, it would certainly be considered.

Lastly, if anyone can demonstrate a case where a valid XHTML document is emitting warnings incorrectly, please do report it. Until then, I'm closing this ticket (again).

(in reply to: ↑ 6 ) 01/15/07 00:21:39 changed by drbrain

  • status changed from closed to reopened.
  • resolution deleted.

Replying to scott@sigkill.org:

THAT IS NOT THE PROBLEM. This has *NOTHING* to do with well-formed [X]HTML--with Typo, we're seeing this with the output from rxml files. The scanner is broken somehow. It's griping about problems that simply don't exist in the original XML--we don't have mis-nested tags at all, yet we're still seeing these assert_tag warnings. I'll see what I can do to build up a simple test case and post it here later today or tomorrow.

Repopening, since the reason this bug was reopened has apparently not been fixed or commented upon.

01/15/07 01:39:41 changed by minam

"we're seeing this with the output from rxml files...It's griping about problems that simply don't exist in the original XML"

My understanding was this was reopened because someone was getting warnings when scanning XML documents, which I addressed in my last comment.

I won't close this right away, though...if nothing else has been said about XHTML documents causing warnings, I'll close again in a few days.

01/20/07 22:45:48 changed by minam

  • status changed from reopened to closed.
  • resolution set to wontfix.

03/24/07 17:02:12 changed by peter_marklund

  • attachment html_document.patch added.

Make HTML::Document#initialize default xml argument to true if text has <?xml instruction

03/24/07 17:19:13 changed by peter_marklund

  • status changed from closed to reopened.
  • resolution deleted.

Jamis, I added a patch that makes HTML::Document#initialize default the xml argument to whether the text has the "<?xml ...>" instruction or not. That way the parser will still be backwards compatible with HTML 4 (which is not valid XML) and in addition work with XML in general.

Notice that when we say in the documentation that the parser only works with strict XHTML, that is not quite true, since we are self-closing certain tags (in the method HTML::Node#childless?) in order for the parser to work with HTML 4, which is not valid XML and doesn't require the trailing slash to close elements. This backwards compatibility makes sense but should be reflected in the documentation.

I realize the parser was intended to parse HTML and not be a generic XML parser. However, many Rails developers will expect to be able to use assert_select on their XML feeds and other XML documents (I'm using it for VoiceXML) and so are likely to continue to run into warning messages and/or broken behaviour in the future.

I was hoping that with my patch (see uploaded html_document.patch) we are covering all the bases and no workarounds or monkey patches will be needed.

Just to clarify, as far as I know, the parser currently works correctly with HTML and XHTML, and so in effect, does what it's advertised to do. The intention of my patch is to make the parser retain HTML and XHTML support and in addition work transparently with XML in general. There are quite a few developers working with XML output so I think this would be valuable.

03/24/07 17:47:10 changed by peter_marklund

  • attachment html_document.2.patch added.

Make HTML::Document#initialize default xml argument to true if text has <?xml instruction at the begging

05/24/07 20:34:54 changed by josh

  • priority changed from high to normal.
  • version changed from 0.13.1 to edge.
  • summary changed from assert_tag erroneously warns on valid HTML4 to [PATCH] assert_tag erroneously warns on valid HTML4.