Need principled approach to strictness properties in Data.Map.Strict

Edward Z. Yang ezyang at MIT.EDU
Sun Nov 20 21:14:48 CET 2011


Here is perhaps one alternate way to frame the documentation:

    Rule (1) applies to first-order functions (they take the value directly)
    Rule (2) applies to higher-order functions (they take functions which generate the value)

I think the appeal of only specifying (2) is that it is *consistent*, though
perhaps not in the way someone who hasn't thought too carefully about the issue
would immediately assume.  I admit, I do fear that there exist some higher
order functions for which we really don't want (1) to apply, but fortunately,
it's easier to work around being overly strict than being overly lazy.

Edward

Excerpts from Johan Tibell's message of Fri Nov 18 18:03:31 -0500 2011:
> Hi all,
> 
> After trying to document the strictness properties of the new
> Data.Map.Strict module, which is meant to provides a value strict
> version of the Data.Map API, I feel less sure about our prior decision
> to make all functions strict in value arguments (even though I was the
> main proponent).
> 
> The documentation I ended up with in Data.Map.Strict was:
> 
>     Strictness properties
>     =====================
> 
>     This module satisfies the following strictness properties:
> 
>      1. Key and value arguments are evaluated to WHNF;
> 
>      2. Keys and values are evaluated to WHNF before they are stored in
>      the map.
> 
>     Here are some examples that illustrate the first property:
> 
>         insertWith (\ old new -> old) k undefined m == undefined
>         delete undefined m == undefined
> 
>     Here are some examples that illustrate the second property:
> 
>         map (\ v -> undefined) m == undefined -- m is not empty
>         mapKeys (\ k -> undefined) m == undefined -- m is not empty
> 
> More than one person said the distinction between (1) and (2) isn't
> clear and someone else felt that the extra (1) is unnecessary (for
> values).
> 
> We definitely want (2): it's the property that ensures that the map
> never contains any thunks (if we also guarantee that the map is also
> spine strict). This property eliminates the space leaks that people
> sometimes run into. This property is the raison d'etre for the
> Data.Map.Strict module.
> 
> I would like to settle the strictness properties of Data.Map.Strict
> once and for all. I would prefer to find a principled reason for the
> resulting decision or, if I can't have that, I want to lay out all the
> pros and cons and then make a decision. I think this issue is
> important, as we're likely to see more modules that help the users get
> a hold on evaluation order in the future.
> 
> For reference, here are the arguments I used to argue for having
> functions be strict in value arguments last time we discussed this:
> 
> ## Ease of reasoning about evaluation order
> 
> I felt that being consistently strict would it make it easier to
> reason about the space usage/evaluation order of code that use the
> API. Without the extra strictness there are degenerate cases that can
> catch people off guard. For example,
> 
>     insertWith (\ old new -> if old == maxValue then old else new + 1) k v m
> 
> is lazy in 'v' because there's a rare case (old == maxValue) where
> 'new' isn't evaluated.
> 
> Another potential benefit: in
> 
>     f :: Int -> Int -> Int -> Int
>     f x y z
>         | x == y = z
>         | otherwise = 1
> 
> you can rely on == being strict in both arguments and avoid putting
> redundant bang on x and y (assuming that you want them strict). I was
> hoping that making all function value strict would enable similar
> reasoning where Data.Map.Strict is used.
> 
> ## Constant factor performance improvements
> 
> We already have (1) for keys in Data.Map. It's an important
> optimization as it allows us to keep the key unboxed in loops.
> Consider:
> 
>     lookup :: Ord k => k -> Map k a -> Maybe a
>     lookup = go
>       where
>         go !_ Tip = Nothing
>         go k (Bin _ kx x l r) =
>             case compare k kx of
>                 LT -> go k l
>                 GT -> go k r
>                 EQ -> Just x
>     {-# INLINABLE lookup #-}
> 
> This function is strict in the key argument, even though the first
> case alternative doesn't use it. This allows us to compile the inner
> loop to
> 
>     go :: Int# -> Map Int Int -> Maybe Int
>     go k# m = case m of
>        Tip -> Nothing
>        (Bin _ kx x l r) -> case kx of
>            I# kx# -> case <# k# kx# of
>                True -> go k# l
>                ...
> 
> when lookup is specialized for some unboxable key type (here: Int).
> This is a performance win.
> 
> There's a small (or even tiny) potential performance win in having
> strict value arguments. Given
> 
>     insertWith :: Ord k => (v -> v -> v) -> k -> v -> Map k v -> Map k v
>     insertWith = go
>       where
>         go _ !k def Tip = def `seq` Bin 1 k def Tip Tip
>         go f k def (Bin sz ky y l r) =
>             case compare k ky of
>                 LT -> balanceL ky y (go f k x l) r
>                 GT -> balanceR ky y l (go f k x r)
>                 EQ -> let x = f x y in x `seq` Bin sz k x l r
>     {-# INLINEABLE insertWith #-}
> 
> and some call site
> 
>     ...let v = ... in v `seq` insertWith (+) k v m...
> 
> v will end up being evaluated twice, once at the call site and once
> inside insertWith. If insertWith was strict in the value argument the
> evaluation at the call site could perhaps be avoided (Simons?).
> 
> A second, perhaps even more theoretical, possible optimization we
> could do is this: If we could specialize the runtime representation of
> Map to have unboxed keys and value (using e.g. associated data types),
> then we could write more allocation free functions. For example:
> 
>     findWithDefault :: Ord k => k -> v -> Map k v -> v
>     findWithDefault = go
>       where
>         go !_ !def Tip = def
>         go k def (Bin _ kx x l r) =
>             case compare k kx of
>                 LT -> go k def l
>                 GT -> go k def r
>                 EQ -> x
>     {-# INLINABLE findWithDefault #-}
> 
> specialized for Ints (and thus Int#) could (using a w/w
> transformation) take def as an Int# and return it (or the found) value
> as an Int#. We don't really know how to do this well in practice so
> this remains a theoretical benefit, for now.
> 
> Thoughts?
> 
> Cheers,
> Johan
> 



More information about the Libraries mailing list