The ActiveSupport JSON encoder contains a regular expression used to identify and encode multi-byte sequences. This regexp parses the Unicode string as a byte string, but is marked with the modifiers "ux" at the end. The code gets away with this for now, because the stock Ruby 1.8 regexp engine is not Unicode-aware. However, Ruby 1.8 with the Oniguruma regexp engine (and therefore presumably 1.9, but I haven't tried that) errors out when loading ActiveSupport, because the character sets used in this regexp contain invalid multi-byte sequences. They are invalid because they describe byte-wise matches on Unicode strings, not character-wise matches. E.g. "[\xC0-\xDF]" is not a valid Unicode character set, since it contains only lead bytes of multi-byte UTF-8 sequences, none of which is a complete character on its own.
The fix for this is to explicitly declare this regexp to have encoding None (modifiers "nx"). The explicit "none" encoding is required, because Ruby + Oniguruma with $KCODE='UTF8' treats a regexp with no encoding modifier as a Unicode regexp.
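A minimal sketch of the idea follows. It is not the actual regexp from the JSON encoder: the constant TWO_BYTE_SEQUENCE and both helper methods are made up for illustration, and the pattern only covers two-byte sequences. It also assumes Ruby 2.0 or later (for String#b), where encodings are tracked per string and per regexp rather than through $KCODE, so it shows the "n" modifier in isolation rather than reproducing the 1.8 + Oniguruma setup.

```ruby
# Illustrative pattern only, NOT the regexp shipped in ActiveSupport.
# It matches one two-byte UTF-8 sequence byte by byte. The "n" modifier
# gives the regexp encoding None, so the \x escapes are taken as raw
# bytes; "x" (extended mode) permits the whitespace and comments below.
TWO_BYTE_SEQUENCE = /
  [\xC0-\xDF]   # lead byte of a two-byte sequence
  [\x80-\xBF]   # continuation byte
/nx

# Hypothetical helper: tries to build the lead-byte character class
# without the "n" modifier and returns the error message, if any.
# On a Unicode-aware engine this is expected to be rejected, since a
# lone lead byte is not a complete character.
def unicode_variant_error
  Regexp.new("[\\xC0-\\xDF]")
  nil
rescue RegexpError => e
  e.message
end

# Hypothetical helper: scans the raw bytes of +str+ for two-byte
# sequences. String#b returns a binary (ASCII-8BIT) copy, which is
# compatible with a regexp whose encoding is None.
def two_byte_sequences(str)
  str.b.scan(TWO_BYTE_SEQUENCE)
end
```

Calling two_byte_sequences on a string containing "é" should return the two raw bytes of that character, while a pure-ASCII string yields an empty array.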
See attached patch, which passes all ActiveSupport tests in both vanilla Ruby 1.8 and Ruby 1.8 with Oniguruma, most importantly passing TestJSONEmitters#test_utf8_string_encoded_properly_when_kcode_is_utf8.