[Haskell-cafe] Re: Strings and utf-8
Reinier Lamers
reinier.lamers at phil.uu.nl
Thu Nov 29 11:07:35 EST 2007
Thomas Hartman wrote:
>
> A translation of
>
> http://www.ahinea.com/en/tech/perl-unicode-struggle.html
>
> from perl to haskell would be a very useful piece of documentation, I
> think.
Perl encodes both Unicode and binary data as the same (dynamic) data
type. Haskell - at least in theory - has two different types for them,
namely [Char] for characters and [Word8] or ByteString for sequences of
bytes. I think the Haskell approach is better, because the programmer in
most cases knows whether he wants to treat his data as characters or as
bytes. Perl does it the Perlish "We guess at what the coder means" way,
which leads to a lot of frustration when Perl guesses wrong.
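As a rough sketch of that distinction, the snippet below keeps characters and bytes in separate types, so converting between them has to be explicit. It uses only the base library. The helpers encodeChar and toUtf8Bytes are hand-rolled for illustration and are not library functions; they do not reject surrogate code points, which a real encoder would.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as UTF-8 bytes.
-- Illustrative only: no validation of surrogates.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, placed in the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- Six payload bits in a continuation byte (10xxxxxx).
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- The conversion from characters to bytes is an explicit step.
toUtf8Bytes :: String -> [Word8]
toUtf8Bytes = concatMap encodeChar

-- "caf\233" is "cafe" with an acute accent on the e (U+00E9).
sample :: String
sample = "caf\233"

charCount, byteCount :: Int
charCount = length sample               -- counts characters
byteCount = length (toUtf8Bytes sample) -- counts bytes
```

Here charCount is 4 and byteCount is 5, and the type checker will not let a function expecting [Word8] be applied to sample by accident.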
The problems of the Haskeller trying to use Unicode, I think, will be
different from those of the Perl hacker trying to use Unicode: the
Haskeller will have to search for third-party modules to do what he
wants, and finding those modules is the problem. The Perl hacker has all
the Unicode support built in, but has to fight Perl occasionally to keep
it from doing byte operations on his Unicode data.
I had a colleague here go all but insane last week trying to use 'split'
on a Unicode string in Perl on Windows. split would break the string in
the middle of a multi-byte UTF-8 character, crashing UTF-8 processing
later on.
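The failure mode itself is easy to reproduce at the byte level. The sketch below is in Haskell, not a reconstruction of the Perl code in question: it splits a hard-coded UTF-8 byte sequence at an arbitrary offset and checks whether the second piece starts in the middle of a character. The names cafeBytes, isContinuation, startsMidCharacter and byteSplit are made up for this example.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- UTF-8 bytes of "cafe" with an acute accent on the e;
-- that last character (U+00E9) occupies two bytes, 0xC3 0xA9.
cafeBytes :: [Word8]
cafeBytes = [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- Continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- True when a chunk begins in the middle of an encoded character.
startsMidCharacter :: [Word8] -> Bool
startsMidCharacter (b : _) = isContinuation b
startsMidCharacter []      = False

-- Split at a byte offset, with no regard for character boundaries.
byteSplit :: Int -> [Word8] -> ([Word8], [Word8])
byteSplit = splitAt

-- Offset 3 falls on a character boundary; offset 4 does not.
goodSplit, badSplit :: ([Word8], [Word8])
goodSplit = byteSplit 3 cafeBytes
badSplit  = byteSplit 4 cafeBytes
```

startsMidCharacter (snd badSplit) is True, because the second piece begins with the lone continuation byte 0xA9, while the same check on goodSplit is False. Splitting a [Char] value instead cannot produce such a fragment, since each list element is a whole character.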
Reinier