UTF-8 library

anatoli anatoli@yahoo.com
Sat, 10 Aug 2002 04:44:01 -0700 (PDT)


--- Ashley Yakeley <ashley@semantic.org> wrote:
> >By the same token, we should handle CR/LF/CR-LF/LF-CR mess by hand.
> >(Files don't have lines in them, they are just sequences of octets.)
> 
> Correct. Exactly what kind of newline do you want in your file?

The correct answer depends on the level of abstraction. It can be either
"some specific kind of newline" or "whatever kind the OS wants", but
mostly it's "I don't care" (i.e. "whatever kind the Handle wants").
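
To make the three levels concrete, here is a minimal sketch. All the names in it (Newline, NewlinePolicy, resolve, toOctets) are made up for illustration and are not part of any existing library, and the Char-to-octet step assumes Latin-1 text only.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical types, for illustration only.
data Newline = LF | CRLF | CR | LFCR

data NewlinePolicy
  = Specific Newline  -- "some specific kind of newline"
  | OSDefault         -- "whatever kind the OS wants"
  | HandleDefault     -- "I don't care": whatever the Handle wants

-- The octets that each newline convention puts in the file.
newlineOctets :: Newline -> [Word8]
newlineOctets LF   = [10]
newlineOctets CRLF = [13, 10]
newlineOctets CR   = [13]
newlineOctets LFCR = [10, 13]

-- Pick the concrete convention, given the OS's choice, the
-- handle's choice, and what the caller asked for.
resolve :: Newline -> Newline -> NewlinePolicy -> Newline
resolve _    _   (Specific nl) = nl
resolve osNl _   OSDefault     = osNl
resolve _    hNl HandleDefault = hNl

-- Turn text into octets, expanding '\n' per the chosen convention.
-- Assumption: every other Char fits in one octet (Latin-1);
-- larger code points would be truncated by fromIntegral.
toOctets :: Newline -> String -> [Word8]
toOctets nl = concatMap enc
  where
    enc '\n' = newlineOctets nl
    enc c    = [fromIntegral (ord c)]
```

A caller who doesn't care would write something like toOctets (resolve osNl handleNl HandleDefault) text and never see an octet 13 or 10 in his own code.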

> >A "Word8 stream" can be either Handle (Word8Handle?) or [Word8]. We can
> >transform [Word8] to [Char], but not Word8Handle to CharHandle. I argue
> >that the latter is needed as well.
> 
> Well, it should be a utility library built on top of the real Word8-based 
> functions:
> 
>   data TextHandle = MkTextHandle Handle TextEncoding;
>   etc.

I have no problem with that, except for the naming. The current IO functions
are mostly text-based and centered around Handles, and there's no good reason
to break that. Thus, your TextHandle should probably be a Handle and your
Handle should probably be a BinaryHandle.
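
With that renaming, the layering might look roughly like the sketch below. It is only an illustration: BinaryHandle is modelled as an in-memory octet buffer rather than a real file, and the TextEncoding record, latin1, and the b*/h* functions are hypothetical names, not an existing API.

```haskell
import Data.Char (chr, ord)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Word (Word8)

-- The octet-level handle. Here it is just an in-memory buffer
-- (an assumption made to keep the sketch self-contained).
newtype BinaryHandle = BinaryHandle (IORef [Word8])

-- A text encoding as a pair of conversion functions (hypothetical).
data TextEncoding = TextEncoding
  { encodeText :: String -> [Word8]
  , decodeText :: [Word8] -> String
  }

-- The text-level handle keeps the familiar name "Handle" and is
-- built on top of a BinaryHandle plus an encoding.
data Handle = MkHandle BinaryHandle TextEncoding

-- Latin-1: one octet per Char; code points above 255 are truncated.
latin1 :: TextEncoding
latin1 = TextEncoding (map (fromIntegral . ord)) (map (chr . fromIntegral))

newBinaryHandle :: IO BinaryHandle
newBinaryHandle = BinaryHandle <$> newIORef []

-- Append raw octets to the binary handle.
bPutOctets :: BinaryHandle -> [Word8] -> IO ()
bPutOctets (BinaryHandle ref) ws = modifyIORef ref (++ ws)

-- Read back everything written so far.
bContents :: BinaryHandle -> IO [Word8]
bContents (BinaryHandle ref) = readIORef ref

-- Text output goes through the encoding, then to the binary layer.
hPutText :: Handle -> String -> IO ()
hPutText (MkHandle bh enc) = bPutOctets bh . encodeText enc
```

Existing text-oriented code keeps talking to a Handle, and only code that really wants octets has to mention BinaryHandle.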

Plus, the utility library should probably live on the C side, but that's
an implementation detail :)

-- 
a.

