Ticket #5396 (closed enhancement: invalid)

Opened 2 years ago

Last modified 2 years ago

[XPATCH] ActiveSupport::Multibyte

Reported by: me@julik.nl Assigned to: david
Priority: normal Milestone:
Component: ActiveSupport Version: edge
Severity: normal Keywords: activesupport strings helpers validations
Cc: railspatch@advany.com

Description

This patch provides a test implementation of multibyte text handling. It also patches the parts of Rails that rely on kludges to process strings containing multibyte characters.

It adds a "chars" accessor to String that lets the developer manipulate strings character by character, working around Ruby's Unicode shortcomings behind the scenes.
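To make the idea concrete, here is a minimal sketch of what such an accessor could look like. It is not the code from the attached patch: the names `CharsSketch` and `chars_sketch` are hypothetical, and the sketch assumes the string is valid UTF-8. It decodes the bytes into codepoints once and then operates on the codepoints rather than on bytes.

```ruby
# Hypothetical illustration only; NOT the implementation from the attached patch.
# Assumes the wrapped string is valid UTF-8.
class CharsSketch
  def initialize(string)
    # "U*" decodes UTF-8 bytes into an array of integer codepoints.
    @codepoints = string.unpack("U*")
  end

  # Number of characters (codepoints), not bytes.
  def length
    @codepoints.length
  end

  # Reverse by character, so multibyte sequences stay intact.
  def reverse
    @codepoints.reverse.pack("U*")
  end

  # Slice using character offsets; returns "" when out of range.
  def slice(start, len)
    (@codepoints[start, len] || []).pack("U*")
  end
end

class String
  # Hypothetical accessor name, chosen to avoid clashing with String#chars.
  def chars_sketch
    CharsSketch.new(self)
  end
end
```

With this sketch, "héllo".bytesize is 6 while "héllo".chars_sketch.length is 5, and "héllo".chars_sketch.reverse yields "olléh" without splitting the two-byte "é".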

It also makes use of the Unicode gem if the gem is available.

The patch consists of two parts: the Multibyte files themselves and the patches to Rails.

Attachments

activesupport_mb.tgz (8.7 kB) - added by me@julik.nl on 06/14/06 18:56:41.
Multibyte files for AS
rails_megapatch.diff (8.3 kB) - added by me@julik.nl on 06/14/06 18:58:26.
First batch of patches to depend on String#chars and remove kludges
test_activesupport_mb_outside_of_rails.diff (2.3 kB) - added by thijs@fngtps.com on 06/15/06 09:16:42.
This changes activesupport_mb slightly so that you can run the tests outside of Rails.
updated_multibyte_activesupport.diff (59.0 kB) - added by manfred on 09/13/06 14:06:24.
updated_multibyte_actionpack.diff (4.6 kB) - added by manfred on 09/13/06 14:06:54.
unicode_tables.tar.gz (128.1 kB) - added by manfred on 09/13/06 14:07:14.

Change History

06/14/06 18:56:41 changed by me@julik.nl

  • attachment activesupport_mb.tgz added.

Multibyte files for AS

06/14/06 18:58:26 changed by me@julik.nl

  • attachment rails_megapatch.diff added.

First batch of patches to depend on String#chars and remove kludges

06/14/06 19:42:18 changed by me@julik.nl

The idea is also that later on it will be possible to implement handlers for Kanji charsets should this become necessary.

06/14/06 19:47:00 changed by anonymous

This patch is a very nice solution for non-English Rails-powered websites. In fact, it's a BIG pain to develop this kind of website without proper support for m17n in Ruby.

"Old" unicode hacks by julik (http://julik.textdriven.com/svn/tools/rails_plugins/unicode_hacks/) break too many things in Rails, so this seems like the only way.

06/14/06 21:37:00 changed by max@maxidoors.ru

I really consider this patch very important. Without this patch, string presentation with non-English characters is broken.

06/14/06 21:59:20 changed by me@julik.nl

The point is not that things are broken; it's more about providing Rails with some foundation to stand on until Matz gets his act together regarding M17N (which is estimated to take about 18 months from now).

06/14/06 23:10:02 changed by me@julik.nl

I've made a fix for #5375 but it's somewhat of a pain to incorporate because multibyte files can't be diffed properly yet.

In short, the "excerpt" helper and friends should always use character offsets rather than byte offsets, and should never use the case modifier.
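As a rough illustration of the difference, the following sketch cuts an excerpt using character offsets only. It is not the Rails `excerpt` helper and not the fix for #5375: the name `excerpt_by_chars` and its parameters are hypothetical, the input is assumed to be valid UTF-8, and the phrase is matched exactly, with no case-insensitive modifier.

```ruby
# Hypothetical helper, NOT the Rails excerpt implementation.
# Every offset below counts characters (codepoints), never bytes.
def excerpt_by_chars(text, phrase, radius = 5, omission = "...")
  chars  = text.unpack("U*")    # UTF-8 bytes -> codepoints
  needle = phrase.unpack("U*")
  return nil if needle.empty?

  # Locate the phrase by comparing codepoint windows (exact match, no /i).
  start = (0..chars.length - needle.length).find do |i|
    chars[i, needle.length] == needle
  end
  return nil if start.nil?

  # Clamp the window to the bounds of the text.
  first = [start - radius, 0].max
  last  = [start + needle.length + radius, chars.length].min

  prefix = first > 0 ? omission : ""
  suffix = last < chars.length ? omission : ""
  prefix + chars[first...last].pack("U*") + suffix
end
```

For example, excerpt_by_chars("naïve café crème", "café", 2) gives "...e café c...". With byte offsets the same radius could land in the middle of "ï" or "é" and produce an invalid string.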

06/15/06 05:12:26 changed by contact@k66.ru

I need this patch too.

06/15/06 08:07:05 changed by atuzov@gmail.com

So do I.

06/15/06 09:16:42 changed by thijs@fngtps.com

  • attachment test_activesupport_mb_outside_of_rails.diff added.

This changes activesupport_mb slightly so that you can run the tests outside of Rails.

06/15/06 09:35:09 changed by skimua@gmail.com

  • owner changed from David to anonymous.
  • status changed from new to assigned.

+1

06/15/06 11:14:50 changed by Anton Kovalyov

I really really need this patch.

06/16/06 08:37:38 changed by thijs@vandervossen.net

  • owner changed from anonymous to david.
  • status changed from assigned to new.

Rolling back status and owner.

06/16/06 12:57:49 changed by hendrik@mans.de

Give to Radiskull.

06/17/06 21:35:06 changed by anonymous

  • cc set to railspatch@advany.com.

06/17/06 23:20:38 changed by me@julk.nl

We are stubbing out a pure-Ruby UTF table generator and normalizer, which is going to be used if no extension is available, but I would like to include it only once we have opinions about the API from the core team.

Right now all the development happens in unicode_hacks for further extraction.

06/20/06 21:36:54 changed by Manfred Stienstra <m.stienstra@fngtps.com>

+1, currently helping with development of the patch.

06/21/06 11:14:29 changed by thijs@80beans.com

+1

08/20/06 12:58:06 changed by anonymous

  • version set to 1.1.1.

09/13/06 14:06:24 changed by manfred

  • attachment updated_multibyte_activesupport.diff added.

09/13/06 14:06:54 changed by manfred

  • attachment updated_multibyte_actionpack.diff added.

09/13/06 14:07:14 changed by manfred

  • attachment unicode_tables.tar.gz added.

09/13/06 14:25:57 changed by manfred

  • version changed from 1.1.1 to edge.

I've just added updated patches for ActiveSupport::Multibyte. The updated_multibyte_activesupport.diff contains all the changes to activesupport except for the unicode tables, which are included in unicode_tables.tar.gz. The unicode tables can also be generated by running:

ruby activesupport/lib/active_support/multibyte/generators/generate_tables.rb

As an example of how to add utf8 support to the helpers I've added updated_multibyte_actionpack.diff. I didn't want to create one mega patch to merge the chars accessor into the Rails code; I suggest doing that patch by patch over time.

Biggest changes:

  • A pure-Ruby implementation of all the needed Unicode operations
  • Support for the utf8proc extension when present
  • Updated test coverage
  • Updated documentation
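The "use an extension when present, otherwise fall back to pure Ruby" approach listed above can be sketched roughly as follows. This is an assumption-laden illustration, not the patch's code: the module name `NormalizerSketch` is hypothetical, the optional extension used here is the `unicode` gem (not utf8proc), and the fallback is `String#unicode_normalize`, which only exists in much newer Ruby versions than the ones this ticket targets.

```ruby
# Hypothetical sketch of optional-backend selection; NOT the patch's code.
module NormalizerSketch
  # Pick the backend once: the optional "unicode" gem if it loads,
  # otherwise a pure-Ruby fallback.
  def self.backend
    @backend ||= begin
      require "unicode"
      :unicode_gem
    rescue LoadError
      :pure_ruby
    end
  end

  # Normalize to NFC (canonical composition) with whichever backend loaded.
  def self.nfc(string)
    if backend == :unicode_gem
      Unicode.normalize_C(string)       # provided by the unicode gem
    else
      string.unicode_normalize(:nfc)    # built into modern Ruby
    end
  end
end
```

Either way, NormalizerSketch.nfc("e\u0301") returns the single precomposed character "\u00e9", so callers do not need to know which backend is in use.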

09/20/06 12:53:43 changed by thijsv

  • status changed from new to closed.
  • resolution set to invalid.

Closing ticket because the stable version of ActiveSupport::Multibyte can now be found at #6242.