Various folds in the containers package
Johan Tibell
johan.tibell at gmail.com
Wed Jun 15 11:55:59 CEST 2011
On Wed, Jun 15, 2011 at 11:15 AM, Milan Straka <fox at ucw.cz> wrote:
> a) we could left and right folds. For the Set and IntSet it is simple --
> we can add foldr and foldl, and deprecated fold with the message 'use
> foldr instead'.
I think we should have both left and right folds for ordered containers.
> For the Map and IntMap it is not so simple -- should we add both
> foldr, foldl, foldlWithKey and foldrWithKey?
I think so.
> b) we could use the Foldable instances more. We can implement foldr and
> foldl efficiently and let the users to use those instead of providing
> eg Set.foldr, Set.foldl.
>
> This does not cover the *WithKey variant, as the Foldable instances
> over Map and IntMap does not utilize the key.
I think we should provide Foldable instances, but singe e.g. foldl'
isn't a method of the Foldable class we cannot do so as efficiently as
we'd like. I looked into this before and the core for foldl' in
Foldable is quite poor.
> c) strict folds. If we add also strict folds, we would have a total of
> _four_ folds in Set and IntSet and __eight__ folds in Map and IntMap.
>
> The Foldable module exports foldl' and foldr', which use foldr, foldl
> respectively, but needs some intermediate memory (it creates a thunk
> for every node of the structure). Apart from that, this does not
> cover the *WithKey variants as in b).
I think we definitely need strict folds. These are by far the most
useful folds in my experience (foldr is mostly used to implement
toList).
> Personally I would add foldr, foldl, foldlWithKey, foldrWithKey and
> specialised foldr, foldl implementations in the Foldable classes.
> I am not sure about the strict folds -- I used an equivalent of the
> foldr' from the Foldable module in GHC and it was reasonable fast.
>
> Any opinions?
I think we need both lazy/strict, left/right, with/without key folds.
The duplication is quite annoying but they all provide necessary
functionality.
The duplication touches on a bigger issue in the design of our
container libraries. We currently fail at separating traversal from
"consumption" efficiently. Ideally we'd have two functions:
preorder :: Map k v -> [(k,v)]
postorder :: Map k v -> [(k,v)]
and use the list folds to work on these. However, build/foldr fusion
doesn't fuse this
List.foldl' f (preorder m)
which makes the separation less viable, for performance reasons.
This also applies to things like union, intersection, etc. Ideally we
could define e.g. intersection using a general intersection function
on streams (lists)
module Data.List where
-- Precondition: the lists must be sorted
intersection :: Ord a => [a] -> [a] -> [a]
module Data.Set where
intersection :: Ord a => Set a -> Set a -> Set a
intersection s1 s2 = fromList . List.intersection (preorder
s1) (preorder s2)
or something similar. In other words, I'd like to achieve the
separation of algorithms and data structure that the STL achieves.
(Note: Scala also does something similar). Right now there's rampant
code duplication in the container library implementation that I'd like
to address, but at the moment I don't know how to do so efficiently.
Cheers,
Johan
More information about the Libraries
mailing list