Proposal: more general unionWith for Data.Map
wren ng thornton
wren at freegeek.org
Fri Jan 27 03:48:29 CET 2012
On 1/25/12 10:45 AM, Bertram Felgenhauer wrote:
> Milan Straka wrote:
>> Anyway, we could instead of proposed unionWithKey offer functions
>> mergeWith :: Ord k => (Maybe a -> Maybe b -> Maybe c) -> Map k a -> Map k b -> Map k c
>> mergeWithKey :: Ord k => (k -> Maybe a -> Maybe b -> Maybe c) -> Map k a -> Map k b -> Map k c
>> The combining function is executed for every unique key from both maps.
>> All functions unionWith, intersectionWith, differencseWith, proposed
>> unionWith can be expressed using this function.
>>
>> But I am not sure about the performance (using so many Maybes).
>
> There's a bigger problem with performance than this constant factor,
> namely that it's no longer possible to short-cut evaluation for
> subtrees of the map that are known to be disjoint. For example,
> empty `intersect` x currently can be computed in constant time,
> no matter what x is; this can not be done with `merge`.
>
> This reasoning justifies the existence of intersection, union and
> difference functions in Data.Map in addition to a merge function.
>
> Of course, the functions union, intersect and difference could be
> implemented as a single function that takes two boolean arguments
> to specify which of the disjoint parts to keep in the result.
The greatest generality is obtained by starting from the natural
representation of what's going on:
data Or a b = Fst a | Both a b | Snd b
Since we're always interested in (Or a b -> c) or (Or a b -> Maybe c)
morphisms, we should use an Or-algebra rather than Or itself. Thus,
data Alg a b c = Alg
{ or_fst :: a -> Maybe c
, or_both :: a -> b -> Maybe c
, or_snd :: b -> Maybe c }
In order to avoid extraneous traversals, rather than using Haskell
functions, we can define our own function type which allows
case-matching to identify the trivial and vacuous functions into Maybe.
data MaybeFun a b where
Trivial :: MaybeFun a a
Vacuous :: MaybeFun a b
Function :: (a -> Maybe b) -> MaybeFun a b
maybefun :: MaybeFun a b -> (a -> Maybe b)
maybefun Trivial a = Just a
maybefun Vacuous a = Nothing
maybefun (Function f) a = f a
data Alg a b c = Alg
{ or_fst :: !(MaybeFun a c)
, or_both :: !(MaybeFun (a,b) c)
, or_snd :: !(MaybeFun b c) }
Now, rather than actually using the interpretation function (maybefun),
we can perform the case match in the
unionIntersectionDifferenceMergeEverythingInOne function and use the
knowledge that one of the algebra functions is trivial or vacuous in
order to avoid traversing the appropriate subtree; only traversing each
of the three regions of interest as necessary.
Adding those case matches will introduce some overhead, though some of
it may be avoidable via proper use of inlining. However, I don't think
it'd be very pretty to expect people to use this sort of interface, so
you'd still end up writing a bunch of helper functions to hide the
generality by automatically constructing Alg values.
--
Live well,
~wren
More information about the Libraries
mailing list