
Changeset 5316

Timestamp: 10/17/06 08:29:16
Author: nzkoz
Message:

Ensure Chars#tidy_bytes only tidies broken bytes. Closes #6397 [Manfred Stienstra]

Files:

  • trunk/activesupport/CHANGELOG

    r5307 r5316
     *SVN*
    +
    +* Ensure Chars#tidy_bytes only tidies broken bytes. Closes #6397 [Manfred Stienstra]
     
     * Add 'unloadable', a method used to mark any constant as requiring an unload after each request. [Nicholas Seckar]
  • trunk/activesupport/lib/active_support/multibyte/handlers/utf8_handler.rb

    r5286 r5316
           end
     
    -      # Strips all the non-utf-8 bytes from the string resulting in a valid utf-8 string
    +      # Replaces all the non-utf-8 bytes by their iso-8859-1 or cp1252 equivalent resulting in a valid utf-8 string
           def tidy_bytes(str)
    -        str.unpack('C*').map { |n|
    -          n < 128 ? n.chr :
    -          n < 160 ? [UCD.cp1252[n] || n].pack('U') :
    -          n < 192 ? "\xC2" + n.chr : "\xC3" + (n-64).chr
    -        }.join
    +        str.split(//u).map do |c|
    +          if !UTF8_PAT.match(c)
    +            n = c.unpack('C')[0]
    +            n < 128 ? n.chr :
    +            n < 160 ? [UCD.cp1252[n] || n].pack('U') :
    +            n < 192 ? "\xC2" + n.chr : "\xC3" + (n-64).chr
    +          else
    +            c
    +          end
    +        end.join
           end
     
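
The change to tidy_bytes can be illustrated with a minimal sketch for a current Ruby (2.1 or later is assumed). This is not the changeset's code: `tidy_bytes_sketch` and `CP1252_SAMPLE` are hypothetical names, `String#scrub` stands in for the `split(//u)` / `UTF8_PAT` check, and the lookup table is a small hand-picked subset standing in for `UCD.cp1252`. The idea it tries to show is the same: valid UTF-8 sequences pass through untouched, and only broken bytes are reinterpreted as cp1252 or iso-8859-1.

```ruby
# Hypothetical stand-in for UCD.cp1252: a few Windows-1252 mappings
# for bytes in the 0x80..0x9F range, enough for the examples below.
CP1252_SAMPLE = {
  0x80 => 0x20AC, # euro sign
  0x88 => 0x02C6, # modifier letter circumflex accent
  0x9E => 0x017E  # latin small letter z with caron
}.freeze

def tidy_bytes_sketch(str)
  # Tag a copy as UTF-8 so that scrub can tell valid sequences from broken bytes.
  utf8 = str.dup.force_encoding(Encoding::UTF_8)

  # scrub yields only the invalid byte runs; everything else is kept as-is.
  utf8.scrub do |broken|
    broken.bytes.map do |n|
      # Bytes below 160 go through the cp1252 table when it has an entry;
      # all other bytes map to the code point with the same number (iso-8859-1).
      code_point = n < 160 ? (CP1252_SAMPLE[n] || n) : n
      [code_point].pack('U')
    end.join
  end
end

# A lone 0x80 byte is read as the cp1252 euro sign.
p tidy_bytes_sketch("\x80") == "\u20AC"

# A valid three-byte sequence (U+FFFD) is left alone.
p tidy_bytes_sketch("\xEF\xBF\xBD") == "\uFFFD"
```

Under the code removed in this changeset, every byte of the input was converted, so the bytes of an already valid multibyte character such as 'é' would each have been re-encoded separately.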
  • trunk/activesupport/test/multibyte_handler_test.rb

    r5286 r5316
         result = [0xb8, 0x17e, 0x8, 0x2c6, 0xa5].pack('U*')
         assert_equal result, @handler.tidy_bytes(@bytestring)
    -    assert_equal "a#{result}a", @handler.tidy_bytes('a' + @bytestring + 'a')
    +    assert_equal "a#{result}a", @handler.tidy_bytes('a' + @bytestring + 'a'),
    +      'tidy_bytes should leave surrounding characters intact'
    +    assert_equal "é#{result}é", @handler.tidy_bytes('é' + @bytestring + 'é'),
    +      'tidy_bytes should leave surrounding characters intact'
         assert_nothing_raised { @handler.tidy_bytes(@bytestring).unpack('U*') }
     
    …
         assert_equal "\xE2\x82\xAC", @handler.tidy_bytes("\x80") # win_1252: euro
         assert_equal "\x00", @handler.tidy_bytes("\x00") # null char
    -    assert_equal [0xef, 0xbf, 0xbd].pack('U*'), @handler.tidy_bytes("\xef\xbf\xbd") # invalid char
    +    assert_equal [0xfffd].pack('U'), @handler.tidy_bytes("\xef\xbf\xbd") # invalid char
       end
     
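
The last changed assertion is easier to read with the two expected values spelled out as bytes. The following is a small sketch for a current Ruby; the variable names are illustrative and not part of the test suite. It shows that `pack('U*')` on three numbers encodes three separate code points, while `pack('U')` on `0xfffd` yields the single three-byte sequence that the input already contains.

```ruby
# Old expectation: 0xEF, 0xBF and 0xBD treated as three code points
# (U+00EF, U+00BF, U+00BD), each encoded as two UTF-8 bytes.
old_expected = [0xef, 0xbf, 0xbd].pack('U*')

# New expectation: the single code point U+FFFD, three UTF-8 bytes.
new_expected = [0xfffd].pack('U')

hex = ->(s) { s.bytes.map { |b| format('%02X', b) }.join(' ') }

puts hex.call(old_expected) # C3 AF C2 BF C2 BD
puts hex.call(new_expected) # EF BF BD
```

The old value corresponds to converting each input byte individually, which fits the removed implementation. The new value is byte-for-byte the input `"\xef\xbf\xbd"`, which fits the stated goal of tidying only broken bytes.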