Proposal: more general unionWith for Data.Map

Christian Sattler sattler.christian at gmail.com
Wed Jan 25 01:02:19 CET 2012


On 01/24/2012 11:27 PM, Johan Tibell wrote:
> On Tue, Jan 24, 2012 at 3:20 PM, Milan Straka<fox at ucw.cz>  wrote:
>> Hi,
>>
>>> merge :: (a ->  b ->  Maybe c) ->  (a ->  Maybe c) ->  (b ->  Maybe c) ->
>>> Map k a ->  Map k b ->  Map k c
>>> mergeWithKey :: (k ->  a ->  b ->  Maybe c) ->  (k ->  a ->  Maybe c) ->
>>> (k->  b ->  Maybe c) ->  Map k a ->  Map k b ->  Map k c
>>>
>>> where the function arguments are
>>> what to use when the key appears in both maps
>>> what to use when the key appears in just the left map
>>> what to use when the key appears in just the right map.
>>>
>>> It seems like it would be the ultimate fully general fast map merging routine.
>>>
>>> intersection = merge (Just . const) (const Nothing) (const Nothing)
>>> union = merge (Just . const) Just Just
>>> xor = merge ( const Nothing . const) Just Just
>>>
>>> and so forth...
>> I agree with the ultimate generality, but the problem is efficiency. For
>> example running
>>   intersection (fromList [1..1000]) (fromList [1001..2000])
>> will end very soon with empty result, but
>>   merge (Just . const) (const Nothing) (const Nothing) (fromList [1..1000]) (fromList [1001..2000])
>> will call the (const Nothing) for 2000 times.
> I have managed to use very general functions inside
> unordered-container with maintained performance by using this pattern:
>
>      -- Internal only. Always inlined.
>      merge' :: (a ->  b ->  Maybe c) ->  (a ->  Maybe c) ->  (b ->  Maybe c)
> ->  Map k a ->  Map k b ->  Map k c
>      {-# INLINE merge' #-}
>
>      merge :: (a ->  b ->  Maybe c) ->  (a ->  Maybe c) ->  (b ->  Maybe c)
> ->  Map k a ->  Map k b ->  Map k c
>      merge = merge'
>      {-# INLINABLE merge #-}
>
>      intersection = merge' (Just . const) (const Nothing) (const Nothing)
>      {-# INLINABLE intersection #-}
>
>      union = merge' (Just . const) Just Just
>      {-# INLINABLE union #-}
>
>      xor = merge' (const Nothing . const) Just Just
>      {-# INLINABLE xor #-}
>
> By inlining merge the case-of-case, case-of-known-constructor, and
> dead-code-elimination optimizations should be able to remove all
> unused code.
>
> -- Johan

Some non-trivial recursive optimizations are required here to meet 
expected performance targets, like recognizing that

   f t = case t of Bin s k x l r -> Bin s k x (f l) (f r); Tip -> Tip

can be replaced by id. I would be pleasantly surprised if the compiler 
was able to handle this.




More information about the Libraries mailing list