[Haskell] combining IntMaps

Daan Leijen daan at cs.uu.nl
Sat Jul 30 13:52:08 EDT 2005


Hi Chad,

Scherrer, Chad wrote:
> the sum of the values is zero. My specification is
> 
> addMaps :: IntMap Int -> IntMap Int -> IntMap Int
> addMaps m = IntMap.filter (/= 0) . IntMap.unionWith (+) m
> 
> But I'm not really happy with this because it traverses both maps for 
> the union, and then traverses the result to get rid of all the zeros. 
> (This function is a performance bottleneck in my current code).

The function is not so bad as it looks at first sight. First of all, the unionWith 
and filter are (somewhat) lazy and it is not the case that an entire map will be 
build before the filter is run. Furthermore, complexity wise, you can not improve the 
function much -- only a constant factor since the map is traversed twice (assuming 
that the compiler does no deforestation).

I do not think that we want families of functions like "filterUnion" etc, but maybe 
it would be good to add an "adjustInsert" function that can insert, delete, or adjust 
a key in the map -- just what you needed to implement the single-traversal algorithm:

> I thought I could do something like
> 
> addMaps' :: IntMap Int -> IntMap Int -> IntMap Int
> addMaps' = IntMap.foldWithKey f
>     where
>     f k x = IntMap.update (maybeAdd x) k
>     maybeAdd x v = let s = v + x in if s == 0 then Nothing else Just s

However, note that "union" might have a much better runtime behaviour than folding 
through one map and looking up values in the other map. Through the "hedge" algorithm 
  it more or less takes unions of sub-trees which, depending on the actual runtime 
structure, generally take much less time. Having said that, one has to be *extremely* 
careful making any performance remarks about functional data structures -- especially 
in a lazy setting. I have been surprised many times by benchmark results that changed 
completely by little changes in how the structure was 'demanded' at runtime. For the 
same reason, I highly suspect that your original 'bottleneck' in the program is 
probably not the "filter . union" call, but somewhere else in the program -- maybe 
even the wrong algorithm? From my experience, it usually doesn't help much to mess 
with the internal representation of data structures in Haskell. Usually there are so 
many factors involved that writing a clear algorithm and thinking hard about it works 
out much better in the long run (together with a handful of seq's and strictness 
annotations :-) )

I hope this helps,
All the best,
  Daan Leijen


> But this is no good, because IntMap.update only affects those keys where 
> lookup succeeds, so
> IntMap.update (maybeAdd 3) 8 IntMap.empty
> returns IntMap.empty, rather than a map with 8 -> 3 as I had hoped.
> 
> What would you suggest to help addMaps run faster? Do I need to pry open 
> the IntMap implementation to do better?
> 
> Thanks so much,
> Chad Scherrer
> 
> PS - I mistakenly crosslisted this to the GHC users list, because I 
> didn't realize how the mailing lists were set up. Please excuse my 
> noobiness.
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Haskell mailing list
> Haskell at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell



More information about the Haskell mailing list