[Haskell-cafe] Writing binary files?

Udo Stenzel u.stenzel at web.de
Tue Sep 14 16:58:34 EDT 2004


Glynn Clements wrote:
> Marcin 'Qrczak' Kowalczyk wrote:
> > [...]
> Note that this needs to include all of the core I/O functions, not
> just reading/writing streams. E.g. FilePath is currently an alias for
> String, but (on Unix, at least) filenames are strings of bytes, not
> characters. Ditto for argv, environment variables, possibly other
> cases which I've overlooked.

I don't think so.  They are all sequences of CChars, and C isn't
particularly known for keeping bytes and chars apart.  I believe
Windows NT has (alternate) filename-handling functions that use Unicode
strings.  This would strengthen the view that a filename is a sequence
of characters.  Ditto for argv, env, whatnot; they are typically entered
from the shell and are therefore characters in the local encoding.


> > 3. The default encoding is settable from Haskell, defaults to
> >    ISO-8859-1.
> 
> Agreed.

Oh no, please don't do that.  A global, settable encoding is, well,
dys-functional.  Hidden state makes programs hard to understand and
Haskell imho shouldn't go that route.  And please don't introduce the
notion of a "default" encoding.


I'd like to see the following:

- Duplicate the IO library.  The duplicate should work with [Byte]
  wherever the old library uses String.  Byte is some suitable
  unsigned integer type; on most (all?) platforms this will be Word8.

- Provide an explicit conversion between encodings.  A simple conversion
  of type [Word8] -> String would suit me; iconv would provide all that
  is needed.

- iconv takes names of encodings as arguments.  Provide some names as
  constants: one name for the internal encoding (probably UCS4), one
  name for the canonical external encoding (probably locale dependent).

- Then redefine the old IO API in terms of the new API and appropriate
  conversions.
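A minimal sketch of the layering proposed above, assuming GHC's System.IO: a byte-level reader and writer, an explicit decoder, and a String-level reader rebuilt on top of them.  The names readFileBytes, writeFileBytes, decodeLatin1, encodeLatin1 and readFileWith are hypothetical.  ISO-8859-1 stands in for an iconv-backed converter only because it needs no external library.  The module can be loaded into GHCi as-is.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO

-- The proposal's Byte type; Word8 on common platforms.
type Byte = Word8

-- Layer 1 (hypothetical): byte-level I/O.  In binary mode GHC maps each
-- byte to the Char with the same code (0-255), so nothing is lost.
readFileBytes :: FilePath -> IO [Byte]
readFileBytes path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  let bytes = map (fromIntegral . ord) s
  -- Force every byte before withBinaryFile closes the handle.
  foldr seq () bytes `seq` return bytes

writeFileBytes :: FilePath -> [Byte] -> IO ()
writeFileBytes path bs = withBinaryFile path WriteMode $ \h ->
  hPutStr h (map (chr . fromIntegral) bs)

-- Layer 2 (hypothetical): explicit conversions.  ISO-8859-1 is used only
-- because it is trivial; iconv would supply the other encodings.
decodeLatin1 :: [Byte] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Encoding can fail: characters above U+00FF have no Latin-1 byte.
encodeLatin1 :: String -> Maybe [Byte]
encodeLatin1 = mapM enc
  where
    enc c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing

-- Layer 3 (hypothetical): the String API defined via the byte API plus a
-- decoder that the caller has to name explicitly.
readFileWith :: ([Byte] -> String) -> FilePath -> IO String
readFileWith decode path = decode <$> readFileBytes path
```

With this split, a call such as `readFileWith decodeLatin1 "notes.txt"` states its encoding at the call site, and no global setting is involved.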

While we're at it, do away with the annoying CR/LF problem on Windows;
this should simply be part of the local encoding.  That way files can
always be opened in binary mode, and hSetBinaryMode can be dropped.
(This won't work on ancient platforms where text files and binary files
are genuinely different, but these are probably not interesting anyway.)
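A minimal sketch of CR/LF handling as part of the decoding step rather than a per-handle text/binary flag, assuming files are always read as raw bytes.  The names crlfToLf, lfToCrlf and decodeWindowsLatin1 are hypothetical, and ISO-8859-1 again stands in for the real character set.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Byte 13 is CR and byte 10 is LF.

-- On-disk Windows line ends -> "\n" in the program.
-- A lone CR (not followed by LF) is passed through unchanged.
crlfToLf :: [Word8] -> [Word8]
crlfToLf (13 : 10 : rest) = 10 : crlfToLf rest
crlfToLf (b : rest)       = b : crlfToLf rest
crlfToLf []               = []

-- "\n" in the program -> on-disk Windows line ends.
lfToCrlf :: [Word8] -> [Word8]
lfToCrlf = concatMap (\b -> if b == 10 then [13, 10] else [b])

-- A Windows "local encoding" built by composition: newline translation
-- first, then a character decoder (Latin-1 here, purely for brevity).
decodeWindowsLatin1 :: [Word8] -> String
decodeWindowsLatin1 = map (chr . fromIntegral) . crlfToLf
```

On Unix the corresponding local encoding would simply omit the crlfToLf step, so the same byte-level open works everywhere.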

The same thoughts apply to filenames.  Make them [Word8] and convert
explicitly.  By the way, I think a path should be a list of names (that
is, of type [[Word8]]), and the library should take care of putting in
the right path separator.  Add functions to read and show pathnames in
the local conventions, and we'll never need to worry about path
separators again.
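A minimal sketch of paths as lists of byte-string names, assuming the separator byte is supplied by the platform.  The names Name, Path, showPath and readPath are hypothetical, and the sketch ignores drive letters and UNC paths.

```haskell
import Data.List (intercalate)
import Data.Word (Word8)

-- One file name as raw bytes, and a path as a list of names.
type Name = [Word8]
type Path = [Name]

-- Render a path with a separator byte, e.g. 47 ('/') on Unix or
-- 92 ('\\') on Windows.  It is passed explicitly here; a library
-- would pick the one for the current platform.
showPath :: Word8 -> Path -> [Word8]
showPath sep = intercalate [sep]

-- Split raw bytes into names.  Every separator splits, so a leading
-- separator yields an empty first name, which marks an absolute path.
readPath :: Word8 -> [Word8] -> Path
readPath sep bytes = case break (== sep) bytes of
  (name, [])       -> [name]
  (name, _ : rest) -> name : readPath sep rest
```

Because readPath splits on every separator, `showPath sep . readPath sep` gives back the original bytes, and reading with 47 then showing with 92 converts a Unix-style relative path to Windows separators.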

 
> There are limits to the extent to which this can be achieved. E.g. 
> what happens if you set the encoding to UTF-8, then call
> getDirectoryContents for a directory which contains filenames which
> aren't valid UTF-8 strings?

Well, then you did something stupid, didn't you?  If you don't know the
encoding you shouldn't decode anything.  That's a strong point against
any implicit decoding, I think.
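A minimal sketch of explicit decoding that refuses to guess, assuming strict UTF-8 rules (no overlong forms, no surrogates, nothing above U+10FFFF).  The name decodeUtf8 is hypothetical.  A filename that is not valid UTF-8 yields Nothing, so the caller has to decide what to do with it.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Strict UTF-8 decoder: Nothing on malformed input, never a guess.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80               = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral (b .&. 0x1F)) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral (b .&. 0x0F)) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral (b .&. 0x07)) 0x10000
  | otherwise              = Nothing  -- stray continuation or invalid lead
  where
    -- n continuation bytes, payload bits of the lead byte, and the
    -- smallest code point this sequence length may encode.
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minCp
      | length conts /= n            = Nothing  -- truncated sequence
      | any (not . isCont) conts     = Nothing  -- bad continuation byte
      | cp < minCp                   = Nothing  -- overlong encoding
      | cp > 0x10FFFF                = Nothing  -- beyond Unicode
      | cp >= 0xD800 && cp <= 0xDFFF = Nothing  -- UTF-16 surrogate
      | otherwise                    = (chr cp :) <$> decodeUtf8 rest
      where
        (conts, rest) = splitAt n bs
        cp = foldl (\acc c -> acc `shiftL` 6 .|. fromIntegral (c .&. 0x3F)) lead conts
    isCont c = c .&. 0xC0 == 0x80
```

For example, the Latin-1 bytes of "café" ([99, 97, 102, 233]) decode to Nothing here, which is exactly the getDirectoryContents situation quoted above.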


Also, if efficiency is a concern, lists probably shouldn't be passed
between filesystem operations and iconv.  I think we need a better
representation here (something like PackedString for Word8), not a
convoluted API.
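A minimal sketch of the packed alternative, assuming the later bytestring package (Data.ByteString), which plays roughly the role of "PackedString for Word8".  The names splitPathBS, joinPathBS, countLines and latin1View are hypothetical.  Bytes stay packed through filesystem-style operations and become a String only at the edge.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Split packed bytes on a separator byte; the components stay packed.
splitPathBS :: Word8 -> BS.ByteString -> [BS.ByteString]
splitPathBS = BS.split

-- Join packed components with a separator byte.
joinPathBS :: Word8 -> [BS.ByteString] -> BS.ByteString
joinPathBS sep = BS.intercalate (BS.singleton sep)

-- Count LF bytes in a file without building a list of bytes or Chars.
countLines :: FilePath -> IO Int
countLines path = BS.count 10 <$> BS.readFile path

-- Decode only at the edge.  Char8.unpack maps each byte to the Char with
-- the same code, i.e. it reads the bytes as ISO-8859-1.
latin1View :: BS.ByteString -> String
latin1View = BC.unpack
```

Swapping latin1View for a real converter changes only the last step; everything before it works on packed bytes.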

Regards,

Udo.
-- 
If Perl is the solution, you're solving the wrong problem. -- Erik Naggum