Working character by character in Haskell

Simon Marlow simonmar@microsoft.com
Fri, 19 Oct 2001 09:49:57 +0100


> Humn... I agree with both of you, Albert and Tom. I started it from the
> beginning, using map and don't using reverse anymore. But the C program is
> still 7x faster than the Haskell one. Here is the code of the Haskell
> program:
>
> main :: IO ()
> main = do
>  bmFile <- openFileEx "in.txt" (BinaryMode ReadMode)
>  bmString <- hGetContents bmFile
>  writeFile "out.txt" (map inc bmString)
>  hClose bmFile
>
> inc :: Char -> Char
> inc a = toEnum ((fromEnum a) + 1)

Well, in Haskell each character of the string takes 20 bytes: 12 bytes
for the list cell, and 8 bytes for the character itself - the memory
used by the character will be recovered at GC time though, as long as
the character is < chr 256.  The map operation also allocates a further
28 bytes per character: list cell + thunk(8) + character, assuming the
inc operation is suitably optimised not to do any extra allocation.
That's a total of 48 bytes per character.
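
To make the arithmetic above easy to check, here is a small sketch that
just adds up the figures. The names (listCell, boxedChar, thunk and so on)
are made up for illustration, and the sizes are simply the numbers quoted
above, not anything measured.

```haskell
-- Per-object sizes in bytes, taken from the figures quoted above.
-- All names here are hypothetical, for illustration only.
listCell, boxedChar, thunk :: Int
listCell  = 12
boxedChar = 8
thunk     = 8

-- Cost of one character of the input string: list cell + character.
inputCost :: Int
inputCost = listCell + boxedChar

-- Cost of one character produced by map: list cell + thunk + character.
mapCost :: Int
mapCost = listCell + thunk + boxedChar

-- Total allocation per character processed.
bytesPerChar :: Int
bytesPerChar = inputCost + mapCost

-- Estimated allocation for an input of n characters.
totalBytes :: Int -> Int
totalBytes n = n * bytesPerChar
```

With these figures, bytesPerChar comes to 48, so a 1 MB input (1048576
characters) allocates roughly 48 MB over the run, although not all of it is
live at the same time.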

The C code, by comparison, doesn't do any dynamic allocation at all.

To really match the C program, you need to use IOExts.hGetBuf and
IOExts.hPutBuf, and do the operations on raw characters in memory.
Using a UArray of Word8 would be better, but there aren't any operations
to do IO to/from a UArray yet (actually I've written these, but they
aren't in the tree yet).
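
A minimal sketch of the buffer-based approach might look like the code
below. It assumes a GHC in which hGetBuf and hPutBuf are exported from
System.IO (rather than IOExts) and uses withBinaryFile in place of
openFileEx; the names chunkSize, incBuffer, copyLoop and incFile are
invented for this example, and the 64k chunk size is arbitrary. Note that it
works on bytes, so 255 wraps around to 0, which is not quite what the Char
version of inc does.

```haskell
import Control.Monad (forM_, unless)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import System.IO

-- Size of the raw buffer in bytes (an arbitrary choice for this sketch).
chunkSize :: Int
chunkSize = 64 * 1024

-- Add 1 to each of the first n bytes of the buffer, in place.
-- Word8 arithmetic wraps, so 255 becomes 0.
incBuffer :: Ptr Word8 -> Int -> IO ()
incBuffer buf n =
  forM_ [0 .. n - 1] $ \i -> do
    b <- peekByteOff buf i :: IO Word8
    pokeByteOff buf i (b + 1)

-- Read a chunk, increment it, write it out, and repeat until
-- hGetBuf reports end of file by returning 0.
copyLoop :: Handle -> Handle -> Ptr Word8 -> IO ()
copyLoop hIn hOut buf = do
  n <- hGetBuf hIn buf chunkSize
  unless (n == 0) $ do
    incBuffer buf n
    hPutBuf hOut buf n
    copyLoop hIn hOut buf

-- Copy src to dst, incrementing every byte, using one reused buffer
-- instead of a list of characters.
incFile :: FilePath -> FilePath -> IO ()
incFile src dst =
  withBinaryFile src ReadMode $ \hIn ->
    withBinaryFile dst WriteMode $ \hOut ->
      allocaBytes chunkSize $ \buf -> copyLoop hIn hOut buf
```

To reproduce the original program you would define main as
incFile "in.txt" "out.txt". The single buffer is allocated once and reused,
so the loop avoids building list cells and boxed characters for the file
contents, which is the point of the suggestion above.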

You should find that the IO library in GHC 5.02 is slightly faster than
the one in 5.00.2.

Anyway, I hope all this helps to explain why the Haskell version is so
slow.

Cheers,
	Simon