[Haskell-cafe] Re: Suggestion for a basic Utf8 type.

Tue Dec 2 11:50:03 EST 2008

 >> I would like to suggest a new basic type in Haskell. What if we had
 >> something like this (with any other quoting character):
 >>
 >> «Je ne parle pas français. (...) ¿Hablas español?»
 >>
 >> This would be of type Utf8. I now think it is not a bad idea,
 >> since Haskell source code is supposed to be UTF-8. The internal
 >> representation of this datatype would be a null-terminated UTF-8
 >> byte vector. ...

 > Stream fusion on Haskell Unicode strings - Tom Harper
 > http://www.wellquite.org/non-blog/AngloHaskell2008/tom%20harper.pdf
 > (...)

Actually, what I suggest is quite different, in ways I see as
worthwhile:

* His focus is on speed and memory use; my goal is more elegant and
   safer code.

* His approach consolidates the Prelude; mine would allow eliminating
   the Prelude completely. With a basic Utf8 type, we could have
   modules with many different basic types, and many different ideas
   of how to 'read «something» :: <sometype>'. In the future, someone
   could write a module implementing some not-yet-invented numeral
   type, and another module could allow values of that type to be
   read from Chinese kanji.

* He wants to preserve many properties of [Char]. I think a Utf8
   type should have no standard properties at all. The next point
   explains why this would avoid some unsafe code.

* He holds on to the idea of text as something built on Char. I'm
   probably alone here, but I think that model was nice, and today we
   could have better approaches. Except for source code, text is a
   block of information, not a sequence of anything. I would
   explicitly like a type we cannot map over, because mapping over
   text makes no sense: text is built from so many things that there
   is no basic unit we can apply functions to. Even something like
   "printing a table of all characters and their Unicode numbers" is
   impossible, since much of Unicode is not printable. "Are these
   blocks of text equal?" does not work like that either, since
   different byte sequences can have the same meaning (é, for example,
   can be stored as a single code point or as e followed by a
   combining accent). If you want some piece of text to obey specific
   properties, you should have to extract it into a proper type.
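
A minimal sketch of what this could look like, assuming only `base`.
Every name in it is hypothetical and illustrative, not an existing
API: `Utf8` is the opaque text type, `utf8` stands in for a «...»
literal, and `FromUtf8` with its `RawBytes` and `Integer` instances
shows different modules choosing their own way to read a block of
text. `Utf8` deliberately has no `Eq`, `Functor` or `Foldable`
instance, so the only way to use it is to extract it into a type that
states its own properties.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical opaque text type. In a real module only the type name
-- would be exported, not the constructor. It deliberately has no Eq,
-- Functor or Foldable instance: it cannot be compared or mapped over.
newtype Utf8 = Utf8 [Word8]

-- Hypothetical stand-in for a «...» literal: encode a String as UTF-8.
utf8 :: String -> Utf8
utf8 = Utf8 . concatMap encodeChar

-- Standard UTF-8 encoding of one code point (surrogates are not checked).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- high bits of the code point, OR-ed into the leading byte
    lead s = fromIntegral (n `shiftR` s)
    -- continuation byte of the form 10xxxxxx
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- Hypothetical class: each module decides how a block of text is read
-- into its own type, instead of the Prelude fixing one interpretation.
class FromUtf8 a where
  fromUtf8 :: Utf8 -> Maybe a

-- One reading: the raw encoded bytes, with plain byte equality, chosen
-- explicitly by whoever extracts the text.
newtype RawBytes = RawBytes [Word8] deriving (Eq, Show)

instance FromUtf8 RawBytes where
  fromUtf8 (Utf8 bs) = Just (RawBytes bs)

-- Another reading: a non-negative decimal numeral in ASCII digits only.
instance FromUtf8 Integer where
  fromUtf8 (Utf8 bs)
    | not (null bs) && all isDigitByte bs =
        Just (foldl (\acc b -> acc * 10 + toInteger (b - 48)) 0 bs)
    | otherwise = Nothing
    where
      isDigitByte b = b >= 48 && b <= 57
```

Reading `utf8 "\x00E9"` as `RawBytes` gives the bytes C3 A9, while
`utf8 "e\x0301"` gives 65 CC 81, so the two byte-level readings differ
even though both display the same way. A type that wants equality by
meaning would have to normalize the text when it is extracted; that
choice belongs to the target type, not to `Utf8`.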

Sorry if this is insane for some reason.

Thanks,
Maurício