[Haskell-cafe] Strings and utf-8

Andrew Coppin andrewcoppin at btinternet.com
Wed Nov 28 17:11:38 EST 2007

Duncan Coutts wrote:
> When it's phrased as "truncates to 8 bits" it sounds so simple, surely
> all we need to do is not truncate to 8 bits, right?
> The problem is, what encoding should it pick? UTF-8, UTF-16, UTF-32, EBCDIC? How
> would people specify that they really want to use a binary file?
> Whatever we change it'll break programs that use the existing meanings.
> One sensible suggestion many people have made is that H98 file IO should
> use the locale encoding and do Unicode/String <-> locale conversion. So
> that'd all be text files. Then openBinaryFile would be used for binary
> files. Of course then we'd need control over setting the encoding and
> what to do on encountering encoding errors.
> IMHO, someone should make a full proposal by implementing an alternative
> System.IO library that deals with all these encoding issues and
> implements H98 IO in terms of that.
> It doesn't have to be fast initially, it just has to get the API right
> and not design the API so as to exclude the possibility of a fast
> implementation later.

In my humble opinion, what should happen is this:

We need two separate interfaces: one for text-mode I/O, one for raw 
binary I/O. ByteString provides some of the latter. [Can you use that on 
network sockets?] I guess what's needed is a good binary library to go 
with it. [I know quite a few people have had a go at this.]

When doing text-mode I/O, the programmer needs to be able to specify 
explicitly which character encoding is required. (Presumably the default 
would remain the current 8-bit truncation behaviour?) That way the 
programmer decides how to choose an encoding, rather than the library 
designer trying to guess what The Right Thing is for all possible 
application programs. It also needs to be possible to add new encodings 
cleanly.
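
To make that concrete, here is a rough sketch of the kind of interface I 
have in mind. It is only an illustration, not a proposal for the real 
System.IO: every name in it (Encoding, truncate8, utf8, readFileWith, 
writeFileWith) is made up for this example, it assumes the file is read 
as raw bytes through Data.ByteString, and the UTF-8 decoder is 
deliberately simplistic (it replaces malformed input with U+FFFD and 
does not reject overlong forms).

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical description of a character encoding: a name plus a pair
-- of conversion functions. A new encoding is just another value of
-- this type, so nothing in the library has to change to add one.
data Encoding = Encoding
  { encName :: String
  , encode  :: String -> B.ByteString
  , decode  :: B.ByteString -> String
  }

-- The existing behaviour: keep only the low 8 bits of each code point.
truncate8 :: Encoding
truncate8 = Encoding
  { encName = "truncate-8"
  , encode  = B.pack . map (fromIntegral . ord)
  , decode  = map (chr . fromIntegral) . B.unpack
  }

-- A simplistic UTF-8 codec (sketch only; no overlong-form checks).
utf8 :: Encoding
utf8 = Encoding
  { encName = "UTF-8"
  , encode  = B.pack . concatMap encodeChar
  , decode  = decodeBytes . B.unpack
  }
  where
    encodeChar c
      | n < 0x80    = [w n]
      | n < 0x800   = [w (0xC0 .|. (n `shiftR` 6)), cont n]
      | n < 0x10000 = [w (0xE0 .|. (n `shiftR` 12)), cont (n `shiftR` 6), cont n]
      | otherwise   = [ w (0xF0 .|. (n `shiftR` 18)), cont (n `shiftR` 12)
                      , cont (n `shiftR` 6), cont n ]
      where n = ord c
    -- Continuation byte: 10xxxxxx carrying the low 6 bits of x.
    cont x = w (0x80 .|. (x .&. 0x3F))
    w = fromIntegral :: Int -> Word8
    i = fromIntegral :: Word8 -> Int

    decodeBytes [] = []
    decodeBytes (b:bs)
      | b < 0x80              = chr (i b) : decodeBytes bs
      | b >= 0xC0 && b < 0xE0 = multi 1 (i b .&. 0x1F) bs
      | b >= 0xE0 && b < 0xF0 = multi 2 (i b .&. 0x0F) bs
      | b >= 0xF0 && b < 0xF8 = multi 3 (i b .&. 0x07) bs
      | otherwise             = '\xFFFD' : decodeBytes bs
    -- Consume k continuation bytes; on failure emit U+FFFD and resume
    -- decoding right after the lead byte.
    multi k acc bs =
      let (cs, rest) = splitAt k bs
      in if length cs == k && all isCont cs
           then safeChr (foldl step acc cs) : decodeBytes rest
           else '\xFFFD' : decodeBytes bs
    step a c = (a `shiftL` 6) .|. (i c .&. 0x3F)
    isCont c = c .&. 0xC0 == 0x80
    safeChr n | n > 0x10FFFF = '\xFFFD'
              | otherwise    = chr n

-- Text-mode I/O where the caller states the encoding explicitly.
readFileWith :: Encoding -> FilePath -> IO String
readFileWith enc path = fmap (decode enc) (B.readFile path)

writeFileWith :: Encoding -> FilePath -> String -> IO ()
writeFileWith enc path = B.writeFile path . encode enc
```

With something like this, `writeFileWith utf8 "out.txt" str` and 
`writeFileWith truncate8 "out.txt" str` differ only in the value passed, 
and raw binary I/O stays a separate matter handled by ByteString itself.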

I'd have a go at implementing all this myself, but I wouldn't know where 
to begin...
