[Haskell] combining IntMaps
Daan Leijen
daan at cs.uu.nl
Sat Jul 30 13:52:08 EDT 2005
Hi Chad,
Scherrer, Chad wrote:
> the sum of the values is zero. My specification is
>
> addMaps :: IntMap Int -> IntMap Int -> IntMap Int
> addMaps m = IntMap.filter (/= 0) . IntMap.unionWith (+) m
>
> But I'm not really happy with this because it traverses both maps for
> the union, and then traverses the result to get rid of all the zeros.
> (This function is a performance bottleneck in my current code).
The function is not so bad as it looks at first sight. First of all, the unionWith
and filter are (somewhat) lazy and it is not the case that an entire map will be
build before the filter is run. Furthermore, complexity wise, you can not improve the
function much -- only a constant factor since the map is traversed twice (assuming
that the compiler does no deforestation).
I do not think that we want families of functions like "filterUnion" etc, but maybe
it would be good to add an "adjustInsert" function that can insert, delete, or adjust
a key in the map -- just what you needed to implement the single-traversal algorithm:
> I thought I could do something like
>
> addMaps' :: IntMap Int -> IntMap Int -> IntMap Int
> addMaps' = IntMap.foldWithKey f
> where
> f k x = IntMap.update (maybeAdd x) k
> maybeAdd x v = let s = v + x in if s == 0 then Nothing else Just s
However, note that "union" might have a much better runtime behaviour than folding
through one map and looking up values in the other map. Through the "hedge" algorithm
it more or less takes unions of sub-trees which, depending on the actual runtime
structure, generally take much less time. Having said that, one has to be *extremely*
careful making any performance remarks about functional data structures -- especially
in a lazy setting. I have been surprised many times by benchmark results that changed
completely by little changes in how the structure was 'demanded' at runtime. For the
same reason, I highly suspect that your original 'bottleneck' in the program is
probably not the "filter . union" call, but somewhere else in the program -- maybe
even the wrong algorithm? From my experience, it usually doesn't help much to mess
with the internal representation of data structures in Haskell. Usually there are so
many factors involved that writing a clear algorithm and thinking hard about it works
out much better in the long run (together with a handful of seq's and strictness
annotations :-) )
I hope this helps,
All the best,
Daan Leijen
> But this is no good, because IntMap.update only affects those keys where
> lookup succeeds, so
> IntMap.update (maybeAdd 3) 8 IntMap.empty
> returns IntMap.empty, rather than a map with 8 -> 3 as I had hoped.
>
> What would you suggest to help addMaps run faster? Do I need to pry open
> the IntMap implementation to do better?
>
> Thanks so much,
> Chad Scherrer
>
> PS - I mistakenly crosslisted this to the GHC users list, because I
> didn't realize how the mailing lists were set up. Please excuse my
> noobiness.
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Haskell mailing list
> Haskell at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell
More information about the Haskell
mailing list