[Haskell-cafe] How on Earth Do You Reason about Space?
Aleksandar Dimitrov
aleks.dimitrov at googlemail.com
Wed Jun 1 00:14:54 CEST 2011
Hi Johan,
> Here's how I would do it:
I implemented your method, with these minimal changes (i.e. just using a main
driver in the same file.)
> countUnigrams :: Handle -> IO (M.Map S.ByteString Int)
> countUnigrams = foldLines (\ m s -> M.insertWith (+) s 1 m) M.empty
>
> main :: IO ()
> main = do (f:_) <- getArgs
> openFile f ReadMode >>= countUnigrams >>= print . M.toList
It seems to perform about 3x worse than the iteratee method in terms of time,
and worse in terms of space :-( On Brandon's War & Peace example, hGetLine uses
1.565 seconds for the small file, whereas my iteratee method uses 1.085s for the
small file, and around 2 minutes for the large file.
For the large file, the code above starts consuming around 2.5GB of RAM,
so it clearly has a space leak somewhere. Where, I don't know.
If you want to try it out, here's a short command line to make a test corpus the
way Brandon made one:
+++
wget 'http://www.gutenberg.org/files/2600/2600.zip';
unzip 2600.zip;
touch wnp100.txt;
for i in {1..100}; do echo -n "$i "; cat 2600.txt >> wnp100.txt; done;
echo "Done.
+++
Note, that, as I detailed in my prior email to Brandon, even if you do end up
with a (supposedly) non-leaking program for this example corpus, that doesn't
mean it'll scale well to real world data.
I also tried sprinkling strictness annotations throughout your above code, but I
failed to produce good results :-(
> We definitely need more accessible material on how to reliably write
> fast Haskell code. There are those among us who can, but it shouldn't
> be necessary to learn it in the way they did (i.e. by lots of
> tinkering, learning from the elders, etc). I'd like to write a 60 (or
> so) pages tutorial on the subject, but haven't found the time.
I'd be an eager reader :-) Please do announce it on -cafe or the "usual places"
should you ever come around to writing it!
I, unfortunately, don't really have any contact to "the elders," apart from what
I read on their respective blogs…
> In addition to RWH, perhaps the slides from the talk on
> high-performance Haskell I gave could be useful:
>
> http://blog.johantibell.com/2010/09/slides-from-my-high-performance-haskell.html
Thanks, I'll give it a look later tomorrow!
Regards,
Aleks
PS: Sorry I didn't answer you in #haskell, I ended up having to go afk for a
short while. Thanks for all your help!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: Digital signature
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110601/730dcbb6/attachment.pgp>
More information about the Haskell-Cafe
mailing list