Ruby on Rails

Ticket #6494 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

[PATCH] ActiveSupport JSON encoder has "utf8" regexp incorrectly marked as unicode

Reported by: whitley Assigned to: David
Priority: high Milestone: 1.2
Component: ActiveSupport Version: edge
Severity: blocker Keywords: fd
Cc: minam

Description

The ActiveSupport JSON encoder contains a regular expression that identifies and encodes multi-byte sequences. This regexp matches the UTF-8 string byte by byte, yet it is marked with the modifiers "ux" (UTF-8, extended). The code gets away with this today because the stock Ruby 1.8 regexp engine has very little real Unicode support. However, Ruby 1.8 with the Oniguruma regexp engine (and therefore presumably 1.9, but I haven't tried that) raises an error while loading ActiveSupport. The character sets in this regexp contain invalid multi-byte sequences, because they describe byte-wise matches on UTF-8 strings rather than character-wise matches. For example, "[\xC0-\xDF]" is not a valid Unicode character set: it contains only the first byte of what must be a multi-byte sequence in UTF-8.
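To make the byte-versus-character mismatch concrete, here is a small sketch in modern Ruby (1.9 and later, whose built-in regexp engine descends from Oniguruma) rather than the Ruby 1.8 setup from this ticket. It shows that "é" occupies two bytes in UTF-8, and that a pattern built from a lead-byte class like the one quoted above is rejected when compiled as a character-wise UTF-8 regexp. The constant and method names are illustrative only, not taken from ActiveSupport.

```ruby
# Sketch for modern Ruby (1.9+), not the Ruby 1.8 code from the ticket.

# "é" (U+00E9) is one character but two bytes in UTF-8: 0xC3 0xA9.
E_ACUTE_BYTES = "\u00E9".bytes.map { |b| format("%02X", b) }

# Byte-wise pattern for a two-byte UTF-8 sequence: a lead byte followed by
# one continuation byte. Single quotes keep the \x escapes for the regexp
# parser instead of turning them into raw bytes.
TWO_BYTE_SEQ = '[\xC0-\xDF][\x80-\xBF]'

# Hypothetical helper: try to compile the pattern as a UTF-8 (character-wise)
# regexp. Returns the error message if the engine rejects it, nil otherwise.
def utf8_compile_error(pattern)
  Regexp.new(pattern.dup.force_encoding(Encoding::UTF_8))
  nil
rescue RegexpError => e
  e.message
end

p E_ACUTE_BYTES                          # prints ["C3", "A9"]
p utf8_compile_error(TWO_BYTE_SEQ).nil?  # prints false: rejected as UTF-8
```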

The fix is to declare this regexp explicitly as having encoding "none" (modifiers "nx"). The explicit "none" encoding is required because Ruby + Oniguruma with $KCODE='UTF8' treats a regexp with no encoding modifier as a Unicode regexp.
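A minimal sketch of the "nx" approach in modern Ruby syntax, assuming Ruby 1.9 or later, where /n yields a byte-oriented (ASCII-8BIT) regexp and /x allows whitespace and comments. The three byte ranges are the standard UTF-8 lead/continuation ranges and are an assumption here, not copied from the patch; `MULTIBYTE_SEQ` is a hypothetical name.

```ruby
# Byte-oriented (/n) and extended (/x) regexp: each alternative matches one
# complete UTF-8 encoded character as a run of raw bytes.
MULTIBYTE_SEQ = /
    [\xC0-\xDF][\x80-\xBF]        # two-byte sequence
  | [\xE0-\xEF][\x80-\xBF]{2}     # three-byte sequence
  | [\xF0-\xF7][\x80-\xBF]{3}     # four-byte sequence
/nx

# Match against the binary view of the UTF-8 string so the byte-wise
# pattern sees raw bytes rather than characters.
text = "caf\u00E9 \u20AC"  # "é" is 2 bytes, "€" is 3 bytes
runs = text.b.scan(MULTIBYTE_SEQ).map do |seq|
  seq.bytes.map { |b| format("%02X", b) }.join(" ")
end
p runs  # prints ["C3 A9", "E2 82 AC"]
```

Calling `.b` on the input matters in modern Ruby: matching a byte-oriented regexp directly against a non-ASCII UTF-8 string may warn or raise an encoding error instead of matching byte by byte.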

See the attached patch, which passes all ActiveSupport tests on both vanilla Ruby 1.8 and Ruby 1.8 with Oniguruma. Most importantly, it passes TestJSONEmitters#test_utf8_string_encoded_properly_when_kcode_is_utf8.
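The test named above checks that multi-byte characters come out of the JSON encoder as proper escapes. The sketch below shows one plausible shape of that step, under stated assumptions: `json_escape_multibyte` is a hypothetical helper, not ActiveSupport's actual encoder, it targets modern Ruby (1.9+), and it emits lowercase \uXXXX escapes, using a UTF-16 surrogate pair for characters beyond U+FFFF as JSON requires.

```ruby
# Hypothetical helper (not ActiveSupport's encoder): escape every multi-byte
# UTF-8 sequence as JSON \uXXXX, locating sequences byte-wise with a /n regexp.
MULTIBYTE = /[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3}/n

def json_escape_multibyte(str)
  # Match against the raw bytes, since the regexp is byte-oriented.
  escaped = str.b.gsub(MULTIBYTE) do |seq|
    cp = seq.dup.force_encoding(Encoding::UTF_8).ord  # decode one character
    if cp > 0xFFFF
      # Outside the Basic Multilingual Plane, JSON needs a surrogate pair.
      cp -= 0x10000
      format('\u%04x\u%04x', 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF))
    else
      format('\u%04x', cp)
    end
  end
  # Only ASCII bytes remain, so relabelling the result as UTF-8 is safe.
  escaped.force_encoding(Encoding::UTF_8)
end

puts json_escape_multibyte("caf\u00E9")  # prints caf\u00e9
```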

Attachments

oniguruma_json_utf8_regexp_fix.patch (0.6 kB) - added by whitley on 10/26/06 05:15:00.

Change History

10/26/06 05:15:00 changed by whitley

  • attachment oniguruma_json_utf8_regexp_fix.patch added.

11/05/06 18:11:50 changed by Donald Piret

  • priority changed from normal to high.
  • severity changed from normal to blocker.

Confirmed that this patch fixes the problem on Ruby 1.8 with Oniguruma. Changed the severity to blocker, since this problem prevents Rails apps from running in this configuration.

11/05/06 18:59:53 changed by bitsweat

  • cc set to minam.
  • keywords set to fd.
  • version set to edge.
  • component changed from ActiveRecord to ActiveSupport.
  • milestone changed from 1.x to 1.2.

11/05/06 19:03:38 changed by bitsweat

  • status changed from new to closed.
  • resolution set to fixed.

(In [5432]) Fix unicode JSON regexp for Oniguruma compatibility. Closes #6494.