Ruby on Rails | Screencasts | Download | Documentation | Weblog | Community | Source

Ticket #10336 (new defect)

Opened 8 months ago

Last modified 3 months ago

[PATCH] Minor correction of AUTO_LINK_RE

Reported by: dim Assigned to: core
Priority: high Milestone: 2.x
Component: ActionPack Version: edge
Severity: normal Keywords: verified
Cc:

Description

The auto_link helper method produces *corrupted* HTML output for links at the end of a sentence. Example:

auto_link("<p>See http://retrospectiva.org/wiki/.</p>")   
  # => Produces (wrong HTML): 
  #   <p>See <a href="http://retrospectiva.org/wiki/.</p">http://retrospectiva.org/wiki/.</p</a>>
  # => Instead of:
  #   <p>See <a href="http://retrospectiva.org/wiki/">http://retrospectiva.org/wiki/</a>.</p>

I've attached the (rather simple) patch (together with a test).

Dimitrij

Attachments

auto_link_re_patch.diff (1.5 kB) - added by dim on 12/02/07 10:39:01.
PATCH for AUTO_LINK_RE
0001-The-auto_link-helper-method-produces-corrupted-HTML.patch (2.1 kB) - added by libc on 05/01/08 05:50:52.
for rails at 6f20efdaf733db26fbf337da73121983785064d5

Change History

12/02/07 10:39:01 changed by dim

  • attachment auto_link_re_patch.diff added.

PATCH for AUTO_LINK_RE

12/03/07 05:36:04 changed by murphy

I'm not sure if the autolink method is supposed to work with HTML; using it after html_escape should be safer.

Anyway, that Regexp is buggy in more places.

  (                          # leading text
    <\w+.*?>|                # leading HTML tag, or
    [^=!:'"/]|               # leading punctuation, or 
    ^                        # beginning of line
  )

The first option only matches single-line tags, so .*? should probably be (?m:.*?). The second option disallows some leading punctation, and allows everything else (especially white space), so that comment is misleading.

In the line dim pointed out:

  (?:/(?:(?:[~\w\+@%-]|(?:[,.;:][^\s$]))+)?)*
                   what's this? ----^

Someone apparently wanted to say "punctation only if not followed by whitespace or the end of the string", but $ matches just the dollar sign inside [ ].

Using (?!\s|<|$) or something seems more appropriate to me as it doesn't eat the "check" character. This avoids problems with texts like "www.rubyonrails.org/.&a=b", where the & would get eaten by the path sub-regexp.

To avoid mathing the punctation in "www.rubyonrails.org/..." or "www.rubyonrails.org/?!", we can use [[:punct:]]+ with atomic matching:

  (?:/(?:(?:[~\w\+@%-]|(?:(?>[[:punct:]]+)(?!\s|<|$)))+)?)* # path

But that still doesn't help us with strings like this:

  <optgroup label="Subdomains of www.rubyonrails.org">

which would still produce invalid HTML when auto_link'd.

I'd leave the priority on "high", and if a solution is found, would like to see it in 1.2 also, because this is exactly the kind of bug that XSS hackers use.

12/04/07 15:52:51 changed by josh

+1

12/05/07 21:39:30 changed by norbert

  • milestone changed from 2.0 to 2.x.

As stated in the description of report {68}, the 2.0 milestone should only be used by core members for real show stoppers.

05/01/08 05:50:52 changed by libc

  • attachment 0001-The-auto_link-helper-method-produces-corrupted-HTML.patch added.

for rails at 6f20efdaf733db26fbf337da73121983785064d5

05/01/08 05:51:21 changed by libc

  • keywords set to verified.

+1