[Haskell-cafe] invalid character encoding
Ian Lynagh
igloo at earth.li
Sat Mar 19 14:14:25 EST 2005
On Wed, Mar 16, 2005 at 11:55:18AM +0000, Ross Paterson wrote:
> On Wed, Mar 16, 2005 at 03:54:19AM +0000, Ian Lynagh wrote:
> > Do you have a list of functions which behave differently in the new
> > release to how they did in the previous release?
> > (I'm not interested in changes that will affect only whether something
> > compiles, not how it behaves given it compiles both before and after).
>
> I got lost in the negatives here. It affects all Haskell 98 primitives
> that do character I/O, or that exchange C strings with the C library.
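In the transcript below, getDirectoryContents fails with a garbage-collection error once the directory contains a file whose name includes the byte 0xA2, which looks like a bug.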
Also, the error from w.hs is going to stdout, not stderr.
Most importantly, though: is there any way to remove this file without
doing something like an FFI import of unlink?
Is there anything LC_CTYPE can be set to that will act like C/POSIX but
accept 8-bit bytes as chars too?
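The behaviour asked about amounts to a byte-transparent decoding: byte n becomes the Char with code point n, as ISO-8859-1 does, so every possible file name survives a decode/encode round trip. A minimal sketch, assuming such a mapping; `decodeBytes` and `encodeChars` are hypothetical names, not part of any Hugs or GHC API:

```haskell
module Main (main) where

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical byte-transparent decoder: byte n becomes the Char with
-- code point n, so no byte sequence is ever rejected.
decodeBytes :: [Word8] -> String
decodeBytes = map (chr . fromIntegral)

-- Hypothetical inverse: succeeds only if every Char fits in one byte.
encodeChars :: String -> Maybe [Word8]
encodeChars = mapM toByte
  where
    toByte c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing

main :: IO ()
main = do
  let allBytes = [minBound .. maxBound] :: [Word8]
  -- Check that all 256 byte values survive a decode/encode round trip.
  print (encodeChars (decodeBytes allBytes) == Just allBytes)
  -- The file name from the transcript: '1' followed by the byte 0xA2.
  print (decodeBytes [0x31, 0xA2])
```

Under this mapping the name created by `touch` would decode to "1\xA2" and encode back to exactly the bytes 0x31 0xA2.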
(in the POSIX locale)
$ echo 'import Directory; main = getDirectoryContents "." >>= print' > q.hs
$ runhugs q.hs
[".","..","q.hs"]
$ touch 1`printf "\xA2"`
$ runhugs q.hs
runhugs: Error occurred
ERROR - Garbage collection fails to reclaim sufficient space
$ echo 'import Directory; main = removeFile "1\xA2"' > w.hs
$ runhugs w.hs
Program error: 1?: Directory.removeFile: does not exist (file does not exist)
$ strace -o strace.out runhugs w.hs > /dev/null
$ grep unlink strace.out | head -c 14 | hexdump -C
00000000 75 6e 6c 69 6e 6b 28 22 31 3f 22 29 20 20 |unlink("1?") |
0000000e
$ strace -o strace2.out rm 1*
$ grep unlink strace2.out | head -c 14 | hexdump -C
00000000 75 6e 6c 69 6e 6b 28 22 31 a2 22 29 20 20 |unlink("1.") |
0000000e
$
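The strace output is consistent with the name being converted for the C library by replacing each character outside the locale's repertoire with '?', so the byte 0xA2 never reaches unlink. A minimal sketch of such a lossy conversion, assuming an ASCII-only (POSIX) repertoire; `encodeLossy` is a hypothetical name and this is not Hugs's actual conversion code:

```haskell
module Main (main) where

import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical lossy encoder for an ASCII-only locale: characters
-- below 128 pass through, everything else becomes '?' (0x3F).
encodeLossy :: String -> [Word8]
encodeLossy = map toByte
  where
    toByte c
      | ord c < 128 = fromIntegral (ord c)
      | otherwise   = 0x3F

main :: IO ()
main = do
  -- Compare with the bytes runhugs passed to unlink: 0x31 0x3F ("1?").
  print (encodeLossy "1\xA2")
  -- Distinct names collapse to the same bytes, so no String can make
  -- unlink see the byte 0xA2.
  print (encodeLossy "1\xA2" == encodeLossy "1\xA3")
```

Because the mapping is many-to-one, removeFile "1\xA2" ends up asking for a file literally named "1?", which does not exist.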
Now consider this e.hs:
--------------------
import IO
main = do hWaitForInput stdin 10000
          putStrLn "Input is ready"
          r <- hReady stdin
          print r
          c <- hGetChar stdin
          print c
          putStrLn "Done!"
--------------------
$ { printf "\xC2\xC2\xC2\xC2\xC2\xC2\xC2"; sleep 30; } | runhugs e.hs
Input is ready
True
Program error: <stdin>: IO.hGetChar: protocol error (invalid character encoding)
$
It takes 30 seconds for this error to be printed. This shows two issues:
First, I think you should report the error as soon as the bytes read so
far cannot be the prefix of any character, rather than waiting for more
input. Second, hReady now only guarantees that hGetChar won't block on a
binary-mode handle; I guess there is not much we can do about that except
document it (short of some hideous hacks).
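To illustrate the first point, assuming the handle's encoding is UTF-8: the byte 0xC2 announces a two-byte sequence whose second byte must lie in 0x80-0xBF, so after reading C2 C2 a decoder already knows no character can start that way. A minimal sketch of such an incremental check; `isCharPrefix` is a hypothetical name, and for brevity it checks only lead and continuation byte ranges, not the tighter second-byte ranges after E0, ED, F0 and F4:

```haskell
module Main (main) where

import Data.Word (Word8)

-- Length of a UTF-8 sequence judged from its lead byte, or Nothing if
-- the byte cannot start any character (continuation bytes, C0, C1, F5-FF).
seqLength :: Word8 -> Maybe Int
seqLength b
  | b < 0x80               = Just 1
  | b >= 0xC2 && b <= 0xDF = Just 2
  | b >= 0xE0 && b <= 0xEF = Just 3
  | b >= 0xF0 && b <= 0xF4 = Just 4
  | otherwise              = Nothing

isContinuation :: Word8 -> Bool
isContinuation b = b >= 0x80 && b <= 0xBF

-- Hypothetical check: can the bytes read so far for the current
-- character still be the start of (or exactly) a valid character?
isCharPrefix :: [Word8] -> Bool
isCharPrefix [] = True
isCharPrefix (b : rest) =
  case seqLength b of
    Nothing -> False
    Just n  -> length rest < n && all isContinuation rest

main :: IO ()
main = do
  print (isCharPrefix [0xC2])        -- still waiting for one more byte
  print (isCharPrefix [0xC2, 0xA2])  -- the complete character U+00A2
  print (isCharPrefix [0xC2, 0xC2])  -- already impossible
```

A decoder running this check after each byte could raise the "invalid character encoding" error as soon as the second 0xC2 arrives, instead of blocking until the writer's `sleep 30` finishes.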
Thanks
Ian