[Haskell-cafe] invalid character encoding

Ian Lynagh igloo at earth.li
Tue Mar 15 22:54:19 EST 2005


On Tue, Mar 15, 2005 at 10:44:28AM +0000, Ross Paterson wrote:
> On Mon, Mar 14, 2005 at 07:38:09PM -0600, John Goerzen wrote:
> > I've got some gzip (and Ian Lynagh's Inflate) code that breaks under
> > the new hugs with:
> > 
> >  <handle>: IO.getContents: protocol error (invalid character encoding)
> > 
> > What is going on, and how can I fix it?
> 
> A Haskell 98 Handle is a character stream, and doesn't support binary
> I/O.  This would have bitten you sooner or later on systems that do CRLF
> conversion, but Hugs is now much stricter, because character streams now
> use the encoding determined by the current locale (for the C locale, that
> means ASCII only).

Do you have a list of functions which behave differently in the new
release to how they did in the previous release?
(I'm not interested in changes that affect only whether something
compiles, just in changes to how a program behaves when it compiles both
before and after.)

Simons, Malcolm, are there any such functions in the new ghc/nhc98?

Also, are you all agreed that the hugs interpretation of the report is
correct, and thus ghc at least is buggy in this respect? (I'm afraid I
haven't been able to test nhc98 yet).

Finally, the hugs behaviour seems a little odd to me. The transcript
below shows 4 cases where iconv complains when asked to convert UTF-8 to
UTF-8, but hugs only gives an error in one of them. In the other three it
silently truncates the input. Is this really correct? Hugs also behaves
the same for me regardless of whether I set LC_CTYPE to en_GB.UTF-8 or C.
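
For comparison, here is a minimal sketch of reading stdin without any
locale decoding at all. It assumes the implementation provides
hSetBinaryMode in System.IO, which is not part of Haskell 98; charsToBytes
is a hypothetical helper name used only for illustration. With the handle
in binary mode each Char should carry one byte (0..255), so LC_CTYPE
should make no difference.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import System.IO (hSetBinaryMode, stdin)

-- Hypothetical helper: treat each Char read from a binary-mode handle
-- as a single byte.
charsToBytes :: String -> [Word8]
charsToBytes = map (fromIntegral . ord)

main :: IO ()
main = do
  -- Assumption: hSetBinaryMode is available (an extension to Haskell 98).
  -- It turns off newline translation and character decoding for stdin.
  hSetBinaryMode stdin True
  xs <- getContents
  print (charsToBytes xs)
```

This only sidesteps the question of what a Haskell 98 character Handle
should do with bytes that are invalid in the current locale; it does not
answer it.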


Thanks
Ian


printf "\x00\x7F" > inp1
printf "\x00\x80" > inp2
printf "\x00\xC4" > inp3
printf "\xFF\xFF" > inp4
printf "\xb1\x41\x00\x03\x65\x6d\x70\x74\x79\x00\x03\x00\x00\x00\x00\x00" > inp5
echo 'main = do xs <- getContents; print xs' > run.hs
for i in `seq 1 5`; do runhugs run.hs < inp$i; done
for i in `seq 1 5`; do runghc6 run.hs < inp$i; done
for i in `seq 1 5`; do echo $i; iconv -f utf8 -t utf8 < inp$i; done

which gives me the following output:

$ for i in `seq 1 5`; do runhugs run.hs < inp$i; done
"\NUL\DEL"
"\NUL"
"\NUL"
""
"
Program error: <stdin>: IO.getContents: protocol error (invalid character encoding)
$ for i in `seq 1 5`; do runghc6 run.hs < inp$i; done
"\NUL\DEL"
"\NUL\128"
"\NUL\196"
"\255\255"
"\177A\NUL\ETXempty\NUL\ETX\NUL\NUL\NUL\NUL\NUL"
$ for i in `seq 1 5`; do echo $i; iconv -f utf8 -t utf8 < inp$i; done
1
2
iconv: illegal input sequence at position 1
3
iconv: incomplete character or shift sequence at end of buffer
4
iconv: illegal input sequence at position 0
5
iconv: illegal input sequence at position 0
$ 
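
To make it clearer why iconv rejects inputs 2 to 5, here is a rough
sketch of a UTF-8 well-formedness check. The names Utf8Check,
continuationCount and checkUtf8 are made up for this example, and the
check is simplified: it looks only at lead and continuation byte ranges
and ignores the finer rules (overlong three- and four-byte forms,
surrogates). It is not a description of what iconv or hugs actually do
internally.

```haskell
import Data.Word (Word8)

-- Outcome of scanning a byte sequence as UTF-8.
data Utf8Check
  = Valid
  | IllegalAt Int  -- offset where an ill-formed sequence starts
  | Incomplete     -- input ends in the middle of a multi-byte sequence
  deriving (Eq, Show)

-- Number of continuation bytes a lead byte expects, or Nothing if the
-- byte can never start a sequence.
continuationCount :: Word8 -> Maybe Int
continuationCount b
  | b < 0x80               = Just 0
  | b >= 0xC2 && b <= 0xDF = Just 1
  | b >= 0xE0 && b <= 0xEF = Just 2
  | b >= 0xF0 && b <= 0xF4 = Just 3
  | otherwise              = Nothing  -- 0x80..0xC1 and 0xF5..0xFF

isContinuation :: Word8 -> Bool
isContinuation b = b >= 0x80 && b <= 0xBF

checkUtf8 :: [Word8] -> Utf8Check
checkUtf8 = go 0
  where
    go _ [] = Valid
    go pos (b:rest) =
      case continuationCount b of
        Nothing -> IllegalAt pos
        Just n  -> continue pos (pos + 1) n rest

    -- start: offset of the lead byte; pos: offset of the next byte;
    -- n: continuation bytes still expected.
    continue _ pos 0 rest = go pos rest
    continue _ _ _ [] = Incomplete
    continue start pos n (c:rest)
      | isContinuation c = continue start (pos + 1) (n - 1) rest
      | otherwise        = IllegalAt start
```

Under these assumptions, checkUtf8 [0x00, 0x80] gives IllegalAt 1 (a bare
continuation byte), checkUtf8 [0x00, 0xC4] gives Incomplete (a lead byte
with nothing after it), and checkUtf8 [0xFF, 0xFF] gives IllegalAt 0
(0xFF never occurs in UTF-8). inp5 starts with 0xb1, another bare
continuation byte, so it also fails at offset 0. That agrees with the
iconv messages above.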



