[Haskell-beginners] Fwd: Space consumption and strictness in strict map

John Lusk john-haskell at how-hard-can-it-be.com
Sat Oct 24 19:36:48 UTC 2015

---------- Forwarded message ----------
From: John Lusk <johnlusk4 at gmail.com>
Date: Sat, Oct 24, 2015 at 3:25 PM
Subject: Space consumption and strictness in strict map
To: beginners at haskell.org

Hey, I have a problem I "solved" by sprinkling !s around like pixie dust,
but I'd like to know what's going on.  I'll attach my program, in all its
messy glory, but the whole ball of wax is at
https://github.com/JohnL4/PassphraseGenerator.  It takes as input one of
the 2012 Google ngram (specifically, 1-gram) raw data files.  The expected
input format is documented in several places.  (For testing, I took the
first million lines.)

The bang that worked is on the 3rd argument of 'wordCount', the map.  I
tried commenting out the "(Map.insertWith (+) ngram matchCount aMap)" part
(and just returning the input map).  When I did that, boy was space
consumption small, so that line is part of the problem.

So, what's going on without the bang?

I guess I'm just building up a bunch of thunks w/that Map.insertWith call,
but what kind of thunks? Unevaluated calls to splitOn and (!!) and read and
(+) and lines?

I thought a strict Map would avoid that.  I guess WHNF isn't enough?
Looking at https://wiki.haskell.org/Weak_head_normal_form, I don't *think* I
have any constructors in there, but profiling with -hd shows me that (:) is
the most frequently occurring closure. Where is *that* coming from?  Is the
occurrence of (:) so high because it's either a built-in function applied
to too few arguments or a lambda abstraction? Which one?  (Also, is "lambda
abstraction" the same as "lambda expresson"?)

And then... what does that bang on the map argument do?  Does it force
evaluation of the passed argument all the way down to primitives, so that
we truly get a data structure containing only strings and ints (and no
thunks)?  What does that do to time complexity?  Does it have to traverse
the entire map looking for thunks, even though I only added one at some
random location in the map?

Is there a better way?

I guess I need to force strictness somewhere else, but I'm not sure how.  I
tried using seq (a little half-heartedly), and ($!), but I guess I did it
wrong and only wound up with more thunks to seq and id ($!), right?  So, I
wound up with lazy strictness?

Thanks for any help.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20151024/06fc7aad/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mostFrequent.hs
Type: application/octet-stream
Size: 2725 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20151024/06fc7aad/attachment.obj>

More information about the Beginners mailing list