UTF-8 library
Ashley Yakeley
ashley@semantic.org
Sat, 10 Aug 2002 03:42:22 -0700
At 2002-08-10 03:03, anatoli wrote:
>--- Sven Moritz Hallberg <pesco@gmx.de> wrote:
>> I argue _strongly_ against associating some sort of locale state with
>> handles.
>>
>> 1) In agreement with Ashley's statements, file IO should use octets,
>> because that's what's in a file.
>
>By the same token, we should handle CR/LF/CR-LF/LF-CR mess by hand.
>(Files don't have lines in them, they are just sequences of octets.)
Correct. Exactly what kind of newline do you want in your file?
>I prefer a somewhat higher-level view of files.
Well, that's what encoding functions are for. You can take higher-level
views of your octets as text, images, XML-structures, experimental
datasets, whatever.
What's so special about text that the functionality should be bound
_right into the API_?
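To illustrate what an encoding function looks like when kept outside the IO API, here is a minimal sketch of a pure UTF-8 encoder from characters to octets. The names encodeCharUtf8 and encodeUtf8 are hypothetical, chosen for this example only; they are not taken from any existing library, and the sketch does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode a single character as UTF-8 octets.
-- Assumption: no validation, so surrogate code points are encoded as-is.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]                            -- 1 octet (ASCII)
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                     -- 2 octets
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]            -- 3 octets
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]   -- 4 octets
  where
    n = ord c
    -- Leading payload bits, shifted down to the low end of the octet.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation octet: 10xxxxxx carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical: a text encoding is just a function from characters to octets.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeCharUtf8
```

Nothing here touches a file or a handle. The result is a plain [Word8] that can be handed to whatever octet-based output function is available.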
>> 2) If you need to decode those octets to characters, or vice-versa,
>> compose a (de)serialization function before it.
>
>I *always* need that. (Except for binary IO).
You *always* need that. (Except when you don't).
The term "binary" is quite misleading. It suggests a particular file
type, but it's actually used to mean "something other than
ASCII-compatible text". One might as well have a word that means
"something other than a JPEG image".
...
>A "Word8 stream" can be either Handle (Word8Handle?) or [Word8]. We can
>transform [Word8] to [Char], but not Word8Handle to CharHandle. I argue that
>the latter is needed as well.
Well, it should be a utility library built on top of the real Word8-based
functions:
data TextHandle = MkTextHandle Handle TextEncoding;
etc.
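One way the "etc." might be filled in, as a rough sketch only. Everything beyond the TextHandle declaration itself is an assumption: the TextEncoding record here is a hypothetical pair of pure conversion functions (it is not the TextEncoding type exported by GHC's System.IO), latin1 is a deliberately trivial example encoding, and Data.ByteString from the bytestring package stands in for "the real Word8-based functions".

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import qualified Data.ByteString as B
import System.IO (Handle)

-- Hypothetical: an encoding is a pair of pure conversion functions.
data TextEncoding = MkTextEncoding
  { encode :: String -> [Word8]
  , decode :: [Word8] -> String
  }

-- A text handle is an ordinary octet handle plus an encoding.
data TextHandle = MkTextHandle Handle TextEncoding

-- Example encoding (assumption: chosen only because it is trivial).
-- Code points above 255 wrap modulo 256 when encoded.
latin1 :: TextEncoding
latin1 = MkTextEncoding
  { encode = map (fromIntegral . ord)
  , decode = map (chr . fromIntegral)
  }

-- Write text by encoding it and passing the octets to the underlying handle.
textPutStr :: TextHandle -> String -> IO ()
textPutStr (MkTextHandle h enc) = B.hPut h . B.pack . encode enc

-- Read all remaining octets from the handle and decode them as text.
textGetContents :: TextHandle -> IO String
textGetContents (MkTextHandle h enc) =
  decode enc . B.unpack <$> B.hGetContents h
```

The handle passed to MkTextHandle would be opened in binary mode (for example with openBinaryFile), so that all character and newline interpretation lives in the encoding functions rather than in the handle.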
--
Ashley Yakeley, Seattle WA