OK, now Ruby 1.9 has String#each_codepoint and understands \p{Lu} for regular expression. I hope all Unicode whiners would complain no longer.

matz. [ruby-core:18694] Re: Character encodings - a radical suggestion
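
For readers who have not used the two features the quote mentions, here is a minimal sketch of each. It assumes Ruby 1.9 or later and UTF-8 strings; the helper names `codepoints_of` and `uppercase_letters` and the sample string are hypothetical, chosen only for illustration, and are not from the original post.

```ruby
# Hypothetical helper: collect the Unicode code points of a string.
# String#each_codepoint returns an enumerator when called without a block.
def codepoints_of(text)
  text.each_codepoint.to_a
end

# Hypothetical helper: find every uppercase letter using the
# Unicode property \p{Lu} ("Letter, uppercase") in a regular expression.
def uppercase_letters(text)
  text.scan(/\p{Lu}/)
end

# "\u00C9" is E with acute (uppercase), "\u00E9" is e with acute (lowercase).
sample = "\u00C9t\u00E9 Ruby"

p codepoints_of(sample)
p uppercase_letters(sample)
```

Because `\p{Lu}` matches by Unicode property rather than by the ASCII range `[A-Z]`, the accented capital is picked up alongside the plain `R`.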