[Haskell-cafe] Re: Sugestion for a basic Utf8 type.
Mauricio
briqueabraque at yahoo.com
Tue Dec 2 11:50:03 EST 2008
>> I would like to suggest a new basic type in Haskell. What if we had
>> something like this (with any other quoting character):
>>
>> «Je ne parle pas français. (...) ¿Hablas español?»
>>
>> This would be of type Utf8. I think now it is not a bad idea,
>> since Haskell source code is supposed to be utf-8. The internal
>> representation of this datatype would be a null terminated utf-8
>> byte vector. ...
> Stream fusion on Haskell Unicode strings - Tom Harper
> http://www.wellquite.org/non-blog/AngloHaskell2008/tom%20harper.pdf
> (...)
Actually, what I suggest differs in several ways that I think are
worthwhile:
* His focus is on speed and memory; my goal is more elegant and
safer code.
* His approach consolidates Prelude. My approach allows complete
elimination of Prelude. If we had a Utf8 basic type, we could
have modules with many different basic types, and many different
ideas on how to 'read «something» :: <sometype>'. In the future,
we could write a module implementing some not yet invented
numeral type, and another module could allow it to be read from
Chinese kanji.
* He wants to preserve many properties of [Char]. I think a Utf8
type should have no standard properties at all. See the next
point for why this would avoid some unsafe code.
* He insists on the idea of text as something over char. I'm
probably alone here, but I think that was a nice idea in its
time, and today we could have better approaches. Except for
source code, text is a block of information, not a sequence of
anything. I would explicitly like a type we cannot map over,
because mapping makes no sense: text is built from so many
things that there is no basic unit we can apply functions to.
Even something like "print a table of all characters and their
Unicode numbers" is impossible, since much of Unicode is not
printable. "Are these blocks of text equal?" does not work byte
by byte either, since different byte sequences can have the same
meaning. If you want some piece of text to obey specific
properties, you should have to extract it into a proper type.
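
To make the last two points concrete, here is a minimal sketch of what
such an opaque type could look like. It is only an illustration under
my own assumptions: the names Utf8, utf8, utf8Bytes and FromUtf8 are
hypothetical, the utf8 function stands in for the proposed «...»
literal, and a plain list of bytes replaces the null-terminated vector.
The type has no Eq, Functor or Show instance, so the only way to get
anything out of it is to extract it into a proper type.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Opaque block of text: deliberately no Eq, Ord, Functor or Show instance.
newtype Utf8 = Utf8 [Word8]

-- Hypothetical stand-in for the proposed «...» literal.
utf8 :: String -> Utf8
utf8 = Utf8 . concatMap encodeChar

-- Raw bytes, exposed here only so the demo can inspect them.
utf8Bytes :: Utf8 -> [Word8]
utf8Bytes (Utf8 bs) = bs

-- Hand-written UTF-8 encoder for one code point (surrogates not rejected).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi s = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical class: each target type decides how to read a text block.
class FromUtf8 a where
  fromUtf8 :: Utf8 -> Maybe a

-- One possible reader: ASCII decimal integers, via the usual Read Int.
instance FromUtf8 Int where
  fromUtf8 (Utf8 bs)
    | all (< 0x80) bs = readMaybe (map (chr . fromIntegral) bs)
    | otherwise       = Nothing

main :: IO ()
main = do
  let composed   = utf8 "\x00E9"   -- é as a single code point
      decomposed = utf8 "e\x0301"  -- e followed by a combining acute accent
  print (utf8Bytes composed)       -- [195,169]
  print (utf8Bytes decomposed)     -- [101,204,129]
  print (fromUtf8 (utf8 "42") :: Maybe Int)     -- Just 42
  print (fromUtf8 (utf8 "abc") :: Maybe Int)    -- Nothing
```

The two spellings of "é" render the same but have different bytes,
which is why a byte-wise Eq instance on Utf8 would give misleading
answers. Other modules could add their own FromUtf8 instances for
their own types without Prelude being involved.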
Sorry if this is insane for some reason.
Thanks,
Maurício