Various folds in the containers package

Johan Tibell johan.tibell at gmail.com
Wed Jun 15 11:55:59 CEST 2011


On Wed, Jun 15, 2011 at 11:15 AM, Milan Straka <fox at ucw.cz> wrote:
> a) we could left and right folds. For the Set and IntSet it is simple --
>   we can add foldr and foldl, and deprecated fold with the message 'use
>   foldr instead'.

I think we should have both left and right folds for ordered containers.

>   For the Map and IntMap it is not so simple -- should we add both
>   foldr, foldl, foldlWithKey and foldrWithKey?

I think so.

> b) we could use the Foldable instances more. We can implement foldr and
>   foldl efficiently and let the users to use those instead of providing
>   eg Set.foldr, Set.foldl.
>
>   This does not cover the *WithKey variant, as the Foldable instances
>   over Map and IntMap does not utilize the key.

I think we should provide Foldable instances, but singe e.g. foldl'
isn't a method of the Foldable class we cannot do so as efficiently as
we'd like. I looked into this before and the core for foldl' in
Foldable is quite poor.

> c) strict folds. If we add also strict folds, we would have a total of
>   _four_ folds in Set and IntSet and __eight__ folds in Map and IntMap.
>
>   The Foldable module exports foldl' and foldr', which use foldr, foldl
>   respectively, but needs some intermediate memory (it creates a thunk
>   for every node of the structure). Apart from that, this does not
>   cover the *WithKey variants as in b).

I think we definitely need strict folds. These are by far the most
useful folds in my experience (foldr is mostly used to implement
toList).

> Personally I would add foldr, foldl, foldlWithKey, foldrWithKey and
> specialised foldr, foldl implementations in the Foldable classes.
> I am not sure about the strict folds -- I used an equivalent of the
> foldr' from the Foldable module in GHC and it was reasonable fast.
>
> Any opinions?

I think we need both lazy/strict, left/right, with/without key folds.
The duplication is quite annoying but they all provide necessary
functionality.

The duplication touches on a bigger issue in the design of our
container libraries. We currently fail at separating traversal from
"consumption" efficiently. Ideally we'd have two functions:

    preorder :: Map k v -> [(k,v)]
    postorder :: Map k v -> [(k,v)]

and use the list folds to work on these. However, build/foldr fusion
doesn't fuse this

    List.foldl' f (preorder m)

which makes the separation less viable, for performance reasons.

This also applies to things like union, intersection, etc. Ideally we
could define e.g. intersection using a general intersection function
on streams (lists)

    module Data.List where
        -- Precondition: the lists must be sorted
        intersection :: Ord a => [a] -> [a] -> [a]

    module Data.Set where
        intersection :: Ord a => Set a -> Set a -> Set a
        intersection s1 s2 = fromList . List.intersection (preorder
s1) (preorder s2)

or something similar. In other words, I'd like to achieve the
separation of algorithms and data structure that the STL achieves.
(Note: Scala also does something similar). Right now there's rampant
code duplication in the container library implementation that I'd like
to address, but at the moment I don't know how to do so efficiently.

Cheers,
Johan



More information about the Libraries mailing list