[Haskell-cafe] OCaml list sees abysmal Language Shootout results

Malcolm Wallace Malcolm.Wallace at cs.york.ac.uk
Wed Sep 29 14:53:59 EDT 2004

Graham Klyne <GK at ninebynine.org> writes:

> >     main = do file <- getContents
> >               putStrLn $ show (length $ lines file) ++ " " ++
> >                          show (length $ words file) ++ " " ++
> >                          show (length file)
> >
> >Space-leak or what?
> I can see that this requires the original file to be kept in memory
> for three scanning passes, so enough memory for the entire file will
> be required.  Is that *the* problem to which you allude?  I can't see
> any other problem here.

Yes, it is the main problem.  Don't forget, the shootout benchmark
runs this example over a very large input (15Mb).  Since the
character-list stored in memory for this file takes approximately 12
bytes per character, that blows up to about 180Mb to store temporarily.
The shootout performance figures reckon that ghc actually uses 223Mb
in total.

>  And why would this put Haskell at a disadvantage?

Large live heap space means a large time spent in GC, trying to find
the needle that is actual garbage in the haystack of live pointers.
It also increases the likelihood of cache misses and all kinds of
other bad memory effects.  In other words, wasting space is wasting
time.  There is a good body of literature on heap profiling in
Haskell which demonstrates the benefit of keeping space usage small
to improve time performance.

In any case, for the shootout, this is patently a different algorithm
to the one every other solution uses.  All the other languages do a
simple one-pass traversal of the file, in constant memory space.  Why
artificially handicap Haskell by insisting it do the job naively?
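For what it's worth, a one-pass version is not hard to write in
Haskell either.  Here is a minimal sketch (the Counts type, step, and
wc are my own names, not taken from any shootout entry): it folds
strictly over the input with a small fixed-size accumulator, so the
lazy character list is consumed and garbage-collected as it goes and
the program runs in roughly constant space.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')

-- Strict accumulator: line count, word count, character count,
-- and whether we are currently inside a word.
data Counts = Counts !Int !Int !Int !Bool

-- One step of the traversal: classify a character and update counts.
step :: Counts -> Char -> Counts
step (Counts l w c inWord) ch
  | ch == '\n' = Counts (l + 1) w (c + 1) False
  | isSpace ch = Counts l w (c + 1) False
  | not inWord = Counts l (w + 1) (c + 1) True
  | otherwise  = Counts l w (c + 1) True

-- Single pass over the whole input; no second or third traversal,
-- so the input never needs to be retained in the heap.
wc :: String -> (Int, Int, Int)
wc s = case foldl' step (Counts 0 0 0 False) s of
         Counts l w c _ -> (l, w, c)
```

Wiring it up to stdin mirrors the original program:

    main = do file <- getContents
              let (l, w, c) = wc file
              putStrLn (show l ++ " " ++ show w ++ " " ++ show c)

The strictness annotations on the Counts fields matter: without them
(or with a lazy foldl) the accumulator would build up thunks and leak
space in a different way.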
