Improve Unicode friendliness of generated Merb applications
Reported by Matt Colyer | April 21st, 2008 @ 12:00 PM | in 1.0 (Nearish Future)
Since the default application layout template declares the character set as UTF-8, I believe that $KCODE should be set to UTF-8 as well.
I have attached a patch.
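Here is a minimal sketch of the kind of change being proposed. It assumes the setting goes into the generated application's init file. The file name config/init.rb is an assumption, because the attached patch is not reproduced in this ticket. $KCODE only has an effect on Ruby 1.8, so the version guard is also an addition and not necessarily part of the patch.

```ruby
# Hypothetical excerpt from a generated Merb config/init.rb (not the actual patch).
# The layout template declares charset=utf-8, so tell Ruby 1.8 to treat
# strings and regexps as UTF-8 as well. Ruby 1.9 and later no longer use
# $KCODE, hence the version guard.
if RUBY_VERSION < "1.9"
  $KCODE = "UTF8" # only the first letter matters; "u" is equivalent
end
```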
Comments and changes to this ticket
-
Ezra Zygmuntowicz May 3rd, 2008 @ 08:03 PM
- → Milestone changed from to 0.9
- → State changed from new to open
- → Assigned user changed from to Ezra Zygmuntowicz
I'm not sure whether this is the right thing to do. I'm no expert on UTF-8 matters, though. Can you explain why we should add this?
-

Matt Colyer May 4th, 2008 @ 07:37 PM
The default templates that come with Merb specify that the documents are UTF-8 encoded.
Therefore, to keep strings consistent inside Ruby, $KCODE should be set to UTF-8 as well.
This comes up when a user submits a form containing multibyte characters. Ruby still doesn't handle multibyte characters well, but this at least makes it slightly better.
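A browser submitting a form from a UTF-8 page typically sends the parameters as UTF-8, and they reach Ruby as raw bytes. This minimal sketch assumes you want to check such a parameter before treating it as UTF-8 text. The helper name valid_utf8? is hypothetical and is not part of Merb. It relies on String#unpack("U*"), which decodes UTF-8 on both Ruby 1.8 and 1.9+ and raises ArgumentError on malformed input.

```ruby
# encoding: utf-8
# Hypothetical helper, not part of Merb: returns true if the raw bytes of a
# submitted parameter form well-formed UTF-8.
def valid_utf8?(raw)
  # unpack("U*") decodes UTF-8 into code points and raises ArgumentError on
  # malformed sequences, on Ruby 1.8 and 1.9+ alike.
  raw.unpack("U*")
  true
rescue ArgumentError
  false
end

puts valid_utf8?("Привет") # well-formed Cyrillic text
puts valid_utf8?("\xFF")   # the byte 0xFF never appears in UTF-8
```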
-
Michael Klishin (antares) May 8th, 2008 @ 12:55 PM
- → Milestone changed from 0.9 to 1.0 (Nearish Future)
-
Michael Klishin (antares) May 14th, 2008 @ 11:19 AM
Ezra,
A possible example for you: some crazy math people may use Greek letters as variable names in calculations. If you name a variable in a language other than English (so its characters fall outside the ASCII table), Ruby will fail unless $KCODE is set to "u" or the -Ku command-line option is given.
I just tried naming variables in Russian. Ruby fails with a screen-long backtrace until I run it with -Ku.
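Here is a small sketch of the kind of code described above, using Greek and Russian variable names. On Ruby 1.8 it would need to run with ruby -Ku, or $KCODE would have to be set before this file is loaded. Setting it inside the same file is too late, since the file is parsed first. On Ruby 1.9+, the magic comment on the first line declares the source encoding. The names and values are illustrative only.

```ruby
# encoding: utf-8
# Ruby 1.8: run with `ruby -Ku`; Ruby 1.9+: the magic comment above is enough.
π = 3.14159
радиус = 2.0              # "radius" in Russian
площадь = π * радиус ** 2 # "area" in Russian
puts площадь
```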
-
Michael Klishin (antares) May 14th, 2008 @ 11:25 AM
Besides setting this in the init file, I think we should use Ruby 1.9-friendly encoding comments in source files and set $KCODE in the merb-core files.
I'm reading the third edition of the Pickaxe now to make sure I'm right about this.
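For reference, here is a sketch of a Ruby 1.9-friendly source file header. The magic comment on the first line tells Ruby 1.9+ which encoding the file's literals use (the Emacs-style `# -*- coding: utf-8 -*-` form is also recognized). Ruby 1.8 treats it as an ordinary comment and still relies on $KCODE. The constant and method names are hypothetical.

```ruby
# encoding: utf-8
# Hypothetical header for a framework source file that must run on 1.8 and 1.9+.
# Ruby 1.9+: the magic comment above makes string literals in this file UTF-8.
# Ruby 1.8: the comment is ignored; $KCODE (or -Ku) governs instead.
GREETING = "héllo"

def literal_encoding(str)
  # String#encoding only exists on Ruby 1.9+; 1.8 strings are plain bytes.
  str.respond_to?(:encoding) ? str.encoding.to_s : "bytes (Ruby 1.8)"
end

puts literal_encoding(GREETING)
```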
-
Michael Klishin (antares) May 14th, 2008 @ 11:28 AM
- → Title changed from "Default KCODE to UTF-8" to "Improve Unicode friendliness of generated Merb applications"
-

Paul Dlug May 14th, 2008 @ 02:27 PM
Add my vote: $KCODE should be set to 'u' (Rails does this as well). It's still not perfect, since other string methods will fail in subtle ways with Unicode strings, but this appears to be a Ruby problem rather than a Merb one. For example, 'puts "π".length' prints 2 (the byte count) on Ruby 1.8, which is obviously incorrect. I believe ActiveSupport has some wrappers that hack around this. Hopefully there is an even better solution out there.
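On Ruby 1.8, String#length counts bytes, and "π" (U+03C0) takes two bytes in UTF-8, which is where the 2 comes from. This minimal sketch shows a common 1.8-era workaround for when you only need a character count: decode the string with unpack("U*") and count code points. The helper names are hypothetical, and this is not ActiveSupport's Multibyte API. It counts code points, so a letter followed by a combining accent still counts as two.

```ruby
# encoding: utf-8
# Hypothetical helpers illustrating bytes vs. characters (code points).
def byte_length(str)
  str.unpack("C*").length # one entry per byte
end

def char_length(str)
  str.unpack("U*").length # one entry per UTF-8 code point
end

puts byte_length("π") # 2 (bytes 0xCF 0x80)
puts char_length("π") # 1
```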
-

Matt Aimonetti May 28th, 2008 @ 03:01 AM
I'm working on extracting Multibyte from ActiveSupport (AS), with Manfred's blessing, so we can use it with Merb without requiring the entire AS library.
I guess $KCODE = 'u' wouldn't hurt.
-
Damian Terentiev June 3rd, 2008 @ 12:20 AM
It would be great to have an option to use Multibyte in Merb.
-

Matt Aimonetti June 3rd, 2008 @ 12:37 AM
- → Assigned user changed from Ezra Zygmuntowicz to Matt Aimonetti
I'm working on extracting the multibyte code from ActiveSupport and making it available to anyone who wants to use it. However, for performance reasons, Merb will most likely not be multibyte-aware by default.
