
Ticket #2103 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

truncate() helper is not multibyte-safe

Reported by: me@julik.nl Assigned to: David
Priority: normal Milestone:
Component: ActionPack Version: 0.13.1
Severity: critical Keywords: actionview helpers fd
Cc: me@julik.nl

Description

The standard truncate helper from the text helpers truncates multibyte strings incorrectly: it can cut a character in the middle, which leaves stray high bytes and, for one thing, invalidates XML output.
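To make the failure concrete, here is a minimal sketch of what cutting on a byte boundary does to a UTF-8 string. It assumes a modern Ruby (1.9 or later), where strings carry an encoding; on the Ruby 1.8 this ticket was filed against, strings were plain byte arrays, so ordinary slicing behaved like byteslice here. The sample text and variable names are illustrative only.

```ruby
# "caf\u00e9 au lait" is "café au lait"; the "é" occupies two bytes in UTF-8.
text = "caf\u00e9 au lait"

# Cutting after 4 bytes keeps only the first byte of "é".
byte_cut = text.byteslice(0, 4)

# Cutting after 4 characters keeps the whole "é".
char_cut = text.chars.first(4).join

puts byte_cut.valid_encoding?   # false: the string ends in a dangling lead byte
puts char_cut.valid_encoding?   # true
```

A strict XML or YAML parser would reject output containing the first string.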

Change History

09/02/05 22:53:35 changed by minam

  • keywords changed from actionview helpers to actionview helpers fd.

Unfortunately, this is partly a weakness of Ruby. For multibyte strings to be handled anywhere near correctly by Ruby, you have to add the following to your environment.rb:

  $KCODE='u'
  require 'jcode'

That's half the battle. The other half of the battle is to make the truncate helper do something like the following:

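  def truncate(text, length = 30, truncate_string = "...")
    return if text.nil?
    # Split into characters, not bytes (requires $KCODE = 'u').
    chars = text.split(//)
    return text if chars.length <= length
    # Leave room for truncate_string so the result is `length` characters long.
    keep = length - truncate_string.split(//).length
    chars[0...keep].join + truncate_string
  end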

The bad news is the split/join magic comes at a cost, so there probably ought to be an option that allows you to specify that the string contains wide characters and ought to be handled specially.

09/03/05 07:43:44 changed by me@julik.nl

Well, there are a few problems with that. Of course, Rails can be taught to read $KCODE and, if it is set to 'u', use the multibyte-safe versions of all functions. However, I don't know how this can be tested, given that all test suites for ActionPack are KCODE-agnostic. So the only two options are: 1) slow down the function for everyone (make multibyte handling the default), or 2) leave it as it is.

The only thing I know is that no text helper should ever garble the text (or, to be more precise, create invalid UTF-8 sequences, which break any kind of strict machine-readable output, be it XML, YAML etc.). Text corruption is text corruption.

I think option 1) is the one we need :-) but will others agree with the slowdown it implies? By the way, multibyte handling is _always_ slower than ASCII.

09/03/05 09:58:19 changed by minam

There are ways to test the KCODE thing. ActionMailer has a unit test that does that--it writes a script to a temporary file, executes it in a separate ruby instance, and reads the result. It then does the assertions it needs to on the result.
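The approach described above (write a script to a temporary file, run it in a separate ruby instance, read the result) could be sketched roughly as follows. This is not the ActionMailer test itself; run_in_fresh_ruby is a hypothetical helper name, and the sketch targets a modern Ruby, where $KCODE no longer has any effect, so the child script only marks where such a process-wide setting would go.

```ruby
require "tempfile"
require "rbconfig"

# Hypothetical helper: run `source` in a fresh interpreter and return its stdout.
def run_in_fresh_ruby(source)
  Tempfile.create(["kcode_probe", ".rb"]) do |file|
    file.write(source)
    file.flush
    # RbConfig.ruby is the path of the interpreter running this script.
    IO.popen([RbConfig.ruby, file.path], &:read)
  end
end

# The quoted heredoc keeps \u00e9 literal, so the child process interprets it.
script = <<~'RUBY'
  # Process-wide settings (on Ruby 1.8, $KCODE and require 'jcode') would go here.
  puts "h\u00e9llo".split(//).length
RUBY

output = run_in_fresh_ruby(script)
puts output
```

Because the setting is applied in a child process, it cannot leak into the rest of the test suite; the parent only asserts on the captured output.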

I, for one, would rather not impose the overhead of multibyte handling on everyone. I would rather see a condition that checks the value of KCODE and uses the appropriate version. (Especially since, if KCODE is not set, there's not really an easy way to parse the string in a multibyte manner.)

09/19/05 21:37:34 changed by minam

  • status changed from new to closed.
  • resolution set to fixed.

[2265]

If $KCODE has been set appropriately and "jcode" required, truncate() will act correctly on multibyte strings. If $KCODE is "NONE", truncate() will act as if the string consists of single-byte characters.
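The behaviour described above can be illustrated with a rough sketch. This is not the code from [2265]: truncate_sketch and its kcode parameter are hypothetical, with kcode standing in for the $KCODE global so that the example runs on a modern Ruby, where "NONE" is emulated by counting and cutting bytes. It also assumes length is at least as long as truncate_string.

```ruby
# Sketch only: `kcode` is a hypothetical stand-in for the $KCODE global.
def truncate_sketch(text, length = 30, truncate_string = "...", kcode = "u")
  return if text.nil?
  keep = length - truncate_string.length
  if kcode == "NONE"
    # Single-byte mode: every byte is treated as one character.
    text.bytesize > length ? text.byteslice(0, keep) + truncate_string : text
  else
    # Multibyte mode: count and cut whole characters.
    chars = text.chars
    chars.length > length ? chars.first(keep).join + truncate_string : text
  end
end

sample = "\u00e9" * 7            # seven "é" characters, fourteen bytes
safe = truncate_sketch(sample, 6, "...", "u")
raw  = truncate_sketch(sample, 6, "...", "NONE")

puts safe.valid_encoding?        # true
puts raw.valid_encoding?         # false: the cut landed inside a character
```

The single-byte branch avoids the split/join cost, which is the trade-off discussed earlier in the thread, but it is only safe when the text really is single-byte.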

10/05/05 21:49:35 changed by me@julik.nl

  • cc set to me@julik.nl.

I will try to look into the other helpers and check how they handle multibyte strings as soon as I have time; I am sure there are more flies to catch there.