Changeset 5316
- Timestamp: 10/17/06 08:29:16
- Files:
  - trunk/activesupport/CHANGELOG (modified) (1 diff)
  - trunk/activesupport/lib/active_support/multibyte/handlers/utf8_handler.rb (modified) (1 diff)
  - trunk/activesupport/test/multibyte_handler_test.rb (modified) (2 diffs)
trunk/activesupport/CHANGELOG
    --- r5307
    +++ r5316
    @@ -1,3 +1,5 @@
     *SVN*
    +
    +* Ensure Chars#tidy_bytes only tidies broken bytes. Closes #6397 [Manfred Stienstra]
     
     * Add 'unloadable', a method used to mark any constant as requiring an unload after each request. [Nicholas Seckar]

trunk/activesupport/lib/active_support/multibyte/handlers/utf8_handler.rb
    --- r5286
    +++ r5316
    @@ -260,11 +260,16 @@
       end
     
    -  # Strips all the non-utf-8 bytes from the string resulting in a valid utf-8 string
    +  # Replaces all the non-utf-8 bytes by their iso-8859-1 or cp1252 equivalent resulting in a valid utf-8 string
       def tidy_bytes(str)
    -    str.unpack('C*').map { |n|
    -      n < 128 ? n.chr :
    -      n < 160 ? [UCD.cp1252[n] || n].pack('U') :
    -      n < 192 ? "\xC2" + n.chr : "\xC3" + (n-64).chr
    -    }.join
    +    str.split(//u).map do |c|
    +      if !UTF8_PAT.match(c)
    +        n = c.unpack('C')[0]
    +        n < 128 ? n.chr :
    +        n < 160 ? [UCD.cp1252[n] || n].pack('U') :
    +        n < 192 ? "\xC2" + n.chr : "\xC3" + (n-64).chr
    +      else
    +        c
    +      end
    +    end.join
       end
     

trunk/activesupport/test/multibyte_handler_test.rb
    --- r5286
    +++ r5316
    @@ -229,5 +229,8 @@
         result = [0xb8, 0x17e, 0x8, 0x2c6, 0xa5].pack('U*')
         assert_equal result, @handler.tidy_bytes(@bytestring)
    -    assert_equal "a#{result}a", @handler.tidy_bytes('a' + @bytestring + 'a')
    +    assert_equal "a#{result}a", @handler.tidy_bytes('a' + @bytestring + 'a'),
    +      'tidy_bytes should leave surrounding characters intact'
    +    assert_equal "é#{result}é", @handler.tidy_bytes('é' + @bytestring + 'é'),
    +      'tidy_bytes should leave surrounding characters intact'
         assert_nothing_raised { @handler.tidy_bytes(@bytestring).unpack('U*') }
     
    @@ -237,5 +240,5 @@
         assert_equal "\xE2\x82\xAC", @handler.tidy_bytes("\x80") # win_1252: euro
         assert_equal "\x00", @handler.tidy_bytes("\x00") # null char
    -    assert_equal [0xef, 0xbf, 0xbd].pack('U*'), @handler.tidy_bytes("\xef\xbf\xbd") # invalid char
    +    assert_equal [0xfffd].pack('U'), @handler.tidy_bytes("\xef\xbf\xbd") # invalid char
       end
     
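The core of this changeset is a switch from converting every byte to converting only the bytes that are not part of a well-formed UTF-8 character. The following is a minimal sketch of that difference in modern Ruby (2.3 or later is assumed). It is not the ActiveSupport code: `repair_byte`, `tidy_bytes_bytewise` and `utf8_length` are hypothetical names, Ruby's built-in Windows-1252 transcoder stands in for the `UCD.cp1252` table, and a lead-byte check plus `String#valid_encoding?` stands in for `UTF8_PAT`. The stray-byte mapping is built with `pack('U')` rather than `"\xC2" + n.chr`, because concatenating strings with incompatible encodings raises an error in Ruby 1.9 and later; both produce the same bytes.

```ruby
# Sketch only: helper names are hypothetical, and Ruby's Windows-1252
# transcoder stands in for the UCD.cp1252 table used by ActiveSupport.

# Re-encode one stray byte as UTF-8: ASCII stays as-is, 0x80-0x9F is read
# as CP1252, and 0xA0-0xFF is read as ISO-8859-1 (byte value == code point).
def repair_byte(n)
  return n.chr.force_encoding(Encoding::UTF_8) if n < 128
  if n < 160
    begin
      return n.chr.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
    rescue EncodingError
      # byte the transcoder cannot map: fall back to its code point, like `|| n`
    end
  end
  [n].pack('U') # same bytes as the changeset's "\xC2"/"\xC3" arithmetic
end

# Pre-r5316 behaviour: every byte is re-encoded, valid or not.
def tidy_bytes_bytewise(str)
  str.b.each_byte.map { |n| repair_byte(n) }.join
end

# Length of the UTF-8 sequence announced by a lead byte, or nil if the byte
# cannot start a well-formed sequence (continuation bytes, 0xC0, 0xC1, 0xF5+).
def utf8_length(lead)
  if lead < 0x80 then 1
  elsif lead.between?(0xC2, 0xDF) then 2
  elsif lead.between?(0xE0, 0xEF) then 3
  elsif lead.between?(0xF0, 0xF4) then 4
  end
end

# Post-r5316 behaviour: well-formed characters are copied through unchanged
# and only bytes that are not part of one get repaired.
def tidy_bytes(str)
  bytes = str.b # raw bytes, whatever encoding str is tagged with
  out = String.new(encoding: Encoding::UTF_8)
  i = 0
  while i < bytes.bytesize
    lead = bytes.getbyte(i)
    len = utf8_length(lead)
    chunk = len && bytes.byteslice(i, len).force_encoding(Encoding::UTF_8)
    if chunk && chunk.bytesize == len && chunk.valid_encoding?
      out << chunk # complete, valid character: keep it
      i += len
    else
      out << repair_byte(lead) # stray byte: reinterpret just this one
      i += 1
    end
  end
  out
end

sample = "é".b + "\xB8\x9E\x08\x88\xA5".b + "é".b
p tidy_bytes(sample).start_with?("é")          # => true
p tidy_bytes_bytewise(sample).start_with?("é") # => false ("é" became "Ã©")
```

This is why the new test surrounds `@bytestring` with "é" and why the expectation for `"\xef\xbf\xbd"` changed: the byte-wise version turns that valid U+FFFD sequence into `[0xef, 0xbf, 0xbd].pack('U*')`, while the character-wise version leaves it alone.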