[Haskell-cafe] [newbie] processing large logs

Udo Stenzel u.stenzel at web.de
Sun May 14 08:59:22 EDT 2006

Eugene Crosser wrote:
> Anyway, I understand that you used 'seq' in your example as a way to
> "strictify" the function that updates accumulator.  Could you (or
> anyone) explain (in plain English, preferably:) the reason why 'seq' is
> the way it is.  In the first place, why does it have the first argument
> at all

If you write 'seq a b' it means: "Should you need to evaluate b,
evaluate (the top constructor of) a first."

The example at hand was something like

update' key value map = 
	let value' = lookupWithDefault 0 key map
	in value' `seq` insert key value' map

We assume that your program will somehow demand the final value of the
map involved, so the 'insert ...' expression will be evaluated at some
point.  Due to lazy semantics that doesn't mean that value' is
evaluated, instead an unevaluated thunk is put into the map to be
evaluated once you look it up.  Since it is this thunk which takes up all the
space, we have to make sure it is evaluated eagerly.  That's what the
'seq' does: if evaluation of the map is demanded, value' has to be
evaluated before.

Notice that there is an application of seq inside of foldl', too.  Foldl
would build an expression like this:

( insert kn vn ( ... ( insert k2 v2 ( insert k1 v1 empty ) ) ... ) )

Nothing demands the evaluation of the deeply nested part.  Foldl' places
seq at the appropriate places, so evaluation progresses from the inside
out, which is exactly what you need.  If you mistakenly used foldl, the
'seq' in the update function would never be triggered.  (A single
forgotten 'seq' can sometimes ruin everything.  This makes "sprinkling
seqs until it works" quite frustrating.)

> and what should you put there?

I wish I had a good rule of thumb here.  Accumulators are a good
candidate, the things deep in data structures are good, too, and heap
profiling might point you at the right place.

> Still, it consumes 20 times more CPU...

Well, that's probably the result of strings being represented as linked
lists of unicode characters and Data.Map not being tailored to
structured keys.  You can make your code faster if you don't care that
it gets uglier.

If you're not making waves, you're not underway - Adm. Nimitz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://www.haskell.org//pipermail/haskell-cafe/attachments/20060514/a4f838bb/attachment.bin

More information about the Haskell-Cafe mailing list