[Haskell-cafe] How on Earth Do You Reason about Space?

Johan Tibell johan.tibell at gmail.com
Wed Jun 1 17:39:21 CEST 2011

Hi Aleks,

On Wed, Jun 1, 2011 at 12:14 AM, Aleksandar Dimitrov
<aleks.dimitrov at googlemail.com> wrote:
> I implemented your method, with these minimal changes (i.e. just using a main
> driver in the same file.)
>> countUnigrams :: Handle -> IO (M.Map S.ByteString Int)
>> countUnigrams = foldLines (\ m s -> M.insertWith (+) s 1 m) M.empty
>> main :: IO ()
>> main = do (f:_) <- getArgs
>>           openFile f ReadMode >>= countUnigrams >>= print . M.toList
> It seems to perform about 3x worse than the iteratee method in terms of time,
> and worse in terms of space :-( On Brandon's War & Peace example, hGetLine uses
> 1.565 seconds for the small file, whereas my iteratee method uses 1.085s for the
> small file, and around 2 minutes for the large file.

That's curious. I chatted with Duncan Coutts today and he mentioned
that hGetLine can be a bit slow as it needs to take a lock in every
read and causes some copying, which could explain why it's slower than
iteratee which works in blocks. However, I don't understand why it
uses more memory. The ByteStrings that are returned by hGetLine should
have an underlying storage of the same size as the ByteString (as
reported by length). You can try to verify this by calling 'copy' on
the ByteString before inserting it.

It looks like hGetLine needs some love.

> I also tried sprinkling strictness annotations throughout your above code, but I
> failed to produce good results :-(

The strictness of the code I gave should be correct. The problem
should be elsewhere.

> I, unfortunately, don't really have any contact to "the elders," apart from what
> I read on their respective blogs…

You and everyone else. :) I just spent enough time talking to people
on IRC, reading good code, blogs and mailing list posts. I think Bryan
described the process pretty well in his CUFP keynote:



More information about the Haskell-Cafe mailing list