[Haskell-cafe] Re: Writing binary files?

Glynn Clements glynn.clements at virgin.net
Thu Sep 16 16:52:28 EDT 2004


Gabriel Ebner wrote:

> > For case testing, locale-dependent sorting and the like, you need to
> > convert to characters. [Although possibly only temporarily; you can
> > sort a list of byte strings based upon their corresponding character
> > strings using sortBy. This means that a decoding failure only means
> > that the ordering will be wrong. This is essentially what happens with
> > "ls" if you have filenames which aren't valid in the current locale.]
> 
> sortBy could only cope with single-byte encodings.  Multi-byte
> encodings would need something else.

I think that you may have misunderstood my point. I was referring to
something like this:

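	import Data.List (sortBy)
	import Data.Word (Word8)

	type ByteString = [Word8]

	-- Decode bytes to characters (definition elided).
	decode :: ByteString -> String
	decode = ...

	-- Compare two byte strings by their decoded character strings.
	comparator :: ByteString -> ByteString -> Ordering
	comparator s1 s2 = compare (decode s1) (decode s2)

	-- Sort the original byte strings; decoding is used only for ordering.
	sortByteStrings :: [ByteString] -> [ByteString]
	sortByteStrings ss = sortBy comparator ss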

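The byte strings which are returned from sortByteStrings are the
original byte strings, but the ordering will be determined by their
decoded character strings. This produces the same results as
decode->sort->encode (in the cases where the latter actually works),
but is more robust: a byte string which fails to decode cleanly is
merely sorted into the wrong place rather than being lost or altered.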
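
To make that concrete, here is a rough, self-contained sketch of the same
idea with the elided decoder filled in. Everything beyond the names in the
snippet above is my own assumption for illustration: decodeLenient is a
hypothetical, deliberately simplistic UTF-8 decoder (it substitutes U+FFFD
for malformed input and does not reject overlong forms), and collationKey
uses plain lower-casing as a stand-in for real locale-dependent collation.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, toLower)
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Word (Word8)

type ByteString = [Word8]

-- Hypothetical lenient UTF-8 decoder: never fails, but replaces any
-- malformed lead byte or truncated sequence with U+FFFD.
-- Simplified for illustration; not a conforming decoder.
decodeLenient :: ByteString -> String
decodeLenient [] = []
decodeLenient (b:bs)
  | b < 0x80           = chr (fromIntegral b) : decodeLenient bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral (b .&. 0x1F))
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral (b .&. 0x0F))
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral (b .&. 0x07))
  | otherwise          = '\xFFFD' : decodeLenient bs
  where
    -- Consume n continuation bytes, or emit U+FFFD and resume after
    -- the lead byte if they are missing or malformed.
    multi :: Int -> Int -> String
    multi n hi =
      let (cont, rest) = splitAt n bs
      in if length cont == n && all isCont cont
           then
             let cp = foldl step hi cont
             in (if cp <= 0x10FFFF then chr cp else '\xFFFD')
                  : decodeLenient rest
           else '\xFFFD' : decodeLenient bs
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
    isCont c = c .&. 0xC0 == 0x80

-- Stand-in for locale-aware collation (an assumption): compare
-- case-insensitively on the decoded characters.
collationKey :: ByteString -> String
collationKey = map toLower . decodeLenient

-- The result contains the original byte strings, untouched;
-- decoding only influences the order.
sortByteStrings :: [ByteString] -> [ByteString]
sortByteStrings = sortBy (comparing collationKey)
```

With the input [[0x42], [0xFF], [0x61], [0xC3, 0x89]] (i.e. "B", an
invalid byte, "a", and "É" in UTF-8), sortByteStrings puts "a" before "B"
and leaves the undecodable [0xFF] in the output unchanged, merely sorted
last. A decode->sort->encode pipeline over the same input would either
fail on that entry or hand back a different byte sequence for it.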

> > It's broken. Being able to represent filenames as byte strings is
> > fundamental. Being able to convert them to or from character strings
> > is useful but not essential. The only reason why the existing API
> > doesn't cause serious problems is because the translation is currently
> > hardwired to an encoding which can't fail.
> 
> Handling binary filenames is hardly fundamental.  It isn't even very
> portable, see the posts about filename handling under modern Windows.
> It might be an important feature, but there are other programs out
> there (mostly GUIs) that expect filenames to be encoded according to
> the locale settings too.

It's fundamental if you want your programs to be robust. For most
programs, there is no legitimate reason to refuse to read a file
because of its name.

A GUI program (or for that matter, a terminal) might legitimately fail
to *display* a filename correctly if it can't decode it (it has to
index into the font). But that isn't a reason to reject it altogether.

E.g. if I create a file whose name contains control characters, most
GUI programs display it incorrectly in the file selection dialog, but
they still manage to open it.

-- 
Glynn Clements <glynn.clements at virgin.net>

