Need principled approach to strictness properties in Data.Map.Strict

Johan Tibell johan.tibell at gmail.com
Sat Nov 19 00:03:31 CET 2011


Hi all,

After trying to document the strictness properties of the new
Data.Map.Strict module, which is meant to provides a value strict
version of the Data.Map API, I feel less sure about our prior decision
to make all functions strict in value arguments (even though I was the
main proponent).

The documentation I ended up with in Data.Map.Strict was:

    Strictness properties
    =====================

    This module satisfies the following strictness properties:

     1. Key and value arguments are evaluated to WHNF;

     2. Keys and values are evaluated to WHNF before they are stored in
     the map.

    Here are some examples that illustrate the first property:

        insertWith (\ old new -> old) k undefined m == undefined
        delete undefined m == undefined

    Here are some examples that illustrate the second property:

        map (\ v -> undefined) m == undefined -- m is not empty
        mapKeys (\ k -> undefined) m == undefined -- m is not empty

More than one person said the distinction between (1) and (2) isn't
clear and someone else felt that the extra (1) is unnecessary (for
values).

We definitely want (2): it's the property that ensures that the map
never contains any thunks (if we also guarantee that the map is also
spine strict). This property eliminates the space leaks that people
sometimes run into. This property is the raison d'etre for the
Data.Map.Strict module.

I would like to settle the strictness properties of Data.Map.Strict
once and for all. I would prefer to find a principled reason for the
resulting decision or, if I can't have that, I want to lay out all the
pros and cons and then make a decision. I think this issue is
important, as we're likely to see more modules that help the users get
a hold on evaluation order in the future.

For reference, here are the arguments I used to argue for having
functions be strict in value arguments last time we discussed this:

## Ease of reasoning about evaluation order

I felt that being consistently strict would it make it easier to
reason about the space usage/evaluation order of code that use the
API. Without the extra strictness there are degenerate cases that can
catch people off guard. For example,

    insertWith (\ old new -> if old == maxValue then old else new + 1) k v m

is lazy in 'v' because there's a rare case (old == maxValue) where
'new' isn't evaluated.

Another potential benefit: in

    f :: Int -> Int -> Int -> Int
    f x y z
        | x == y = z
        | otherwise = 1

you can rely on == being strict in both arguments and avoid putting
redundant bang on x and y (assuming that you want them strict). I was
hoping that making all function value strict would enable similar
reasoning where Data.Map.Strict is used.

## Constant factor performance improvements

We already have (1) for keys in Data.Map. It's an important
optimization as it allows us to keep the key unboxed in loops.
Consider:

    lookup :: Ord k => k -> Map k a -> Maybe a
    lookup = go
      where
        go !_ Tip = Nothing
        go k (Bin _ kx x l r) =
            case compare k kx of
                LT -> go k l
                GT -> go k r
                EQ -> Just x
    {-# INLINABLE lookup #-}

This function is strict in the key argument, even though the first
case alternative doesn't use it. This allows us to compile the inner
loop to

    go :: Int# -> Map Int Int -> Maybe Int
    go k# m = case m of
       Tip -> Nothing
       (Bin _ kx x l r) -> case kx of
           I# kx# -> case <# k# kx# of
               True -> go k# l
               ...

when lookup is specialized for some unboxable key type (here: Int).
This is a performance win.

There's a small (or even tiny) potential performance win in having
strict value arguments. Given

    insertWith :: Ord k => (v -> v -> v) -> k -> v -> Map k v -> Map k v
    insertWith = go
      where
        go _ !k def Tip = def `seq` Bin 1 k def Tip Tip
        go f k def (Bin sz ky y l r) =
            case compare k ky of
                LT -> balanceL ky y (go f k x l) r
                GT -> balanceR ky y l (go f k x r)
                EQ -> let x = f x y in x `seq` Bin sz k x l r
    {-# INLINEABLE insertWith #-}

and some call site

    ...let v = ... in v `seq` insertWith (+) k v m...

v will end up being evaluated twice, once at the call site and once
inside insertWith. If insertWith was strict in the value argument the
evaluation at the call site could perhaps be avoided (Simons?).

A second, perhaps even more theoretical, possible optimization we
could do is this: If we could specialize the runtime representation of
Map to have unboxed keys and value (using e.g. associated data types),
then we could write more allocation free functions. For example:

    findWithDefault :: Ord k => k -> v -> Map k v -> v
    findWithDefault = go
      where
        go !_ !def Tip = def
        go k def (Bin _ kx x l r) =
            case compare k kx of
                LT -> go k def l
                GT -> go k def r
                EQ -> x
    {-# INLINABLE findWithDefault #-}

specialized for Ints (and thus Int#) could (using a w/w
transformation) take def as an Int# and return it (or the found) value
as an Int#. We don't really know how to do this well in practice so
this remains a theoretical benefit, for now.

Thoughts?

Cheers,
Johan



More information about the Libraries mailing list