Proposal: more general unionWith for Data.Map

Fri Jan 27 03:48:29 CET 2012

On 1/25/12 10:45 AM, Bertram Felgenhauer wrote:
> Milan Straka wrote:
>> Anyway, we could instead of proposed unionWithKey offer functions
>>    mergeWith :: Ord k =>  (Maybe a ->  Maybe b ->  Maybe c) ->  Map k a ->  Map k b ->  Map k c
>>    mergeWithKey :: Ord k =>  (k ->  Maybe a ->  Maybe b ->  Maybe c) ->  Map k a ->  Map k b ->  Map k c
>> The combining function is executed for every unique key from both maps.
>> All functions unionWith, intersectionWith, differencseWith, proposed
>> unionWith can be expressed using this function.
>>
>> But I am not sure about the performance (using so many Maybes).
>
> There's a bigger problem with performance than this constant factor,
> namely that it's no longer possible to short-cut evaluation for
> subtrees of the map that are known to be disjoint. For example,
> empty `intersect` x currently can be computed in constant time,
> no matter what x is; this can not be done with `merge`.
>
> This reasoning justifies the existence of intersection, union and
> difference functions in Data.Map in addition to a merge function.
>
> Of course, the functions union, intersect and difference could be
> implemented as a single function that takes two boolean arguments
> to specify which of the disjoint parts to keep in the result.

The greatest generality is obtained by starting from the natural 
representation of what's going on:

     data Or a b = Fst a | Both a b | Snd b

Since we're always interested in (Or a b -> c) or (Or a b -> Maybe c) 
morphisms, we should use an Or-algebra rather than Or itself. Thus,

     data Alg a b c = Alg
         { or_fst  :: a -> Maybe c
         , or_both :: a -> b -> Maybe c
         , or_snd  :: b -> Maybe c }

In order to avoid extraneous traversals, rather than using Haskell 
functions, we can define our own function type which allows 
case-matching to identify the trivial and vacuous functions into Maybe.

     data MaybeFun a b where
         Trivial  :: MaybeFun a a
         Vacuous  :: MaybeFun a b
         Function :: (a -> Maybe b) -> MaybeFun a b

     maybefun :: MaybeFun a b -> (a -> Maybe b)
     maybefun Trivial      a = Just a
     maybefun Vacuous      a = Nothing
     maybefun (Function f) a = f a

     data Alg a b c = Alg
         { or_fst  :: !(MaybeFun a c)
         , or_both :: !(MaybeFun (a,b) c)
         , or_snd  :: !(MaybeFun b c) }

Now, rather than actually using the interpretation function (maybefun), 
we can perform the case match in the 
unionIntersectionDifferenceMergeEverythingInOne function and use the 
knowledge that one of the algebra functions is trivial or vacuous in 
order to avoid traversing the appropriate subtree; only traversing each 
of the three regions of interest as necessary.

Adding those case matches will introduce some overhead, though some of 
it may be avoidable via proper use of inlining. However, I don't think 
it'd be very pretty to expect people to use this sort of interface, so 
you'd still end up writing a bunch of helper functions to hide the 
generality by automatically constructing Alg values.

-- 
Live well,
~wren