Proposal: more general unionWith for Data.Map
Christian Sattler
sattler.christian at gmail.com
Wed Jan 25 01:02:19 CET 2012
On 01/24/2012 11:27 PM, Johan Tibell wrote:
> On Tue, Jan 24, 2012 at 3:20 PM, Milan Straka<fox at ucw.cz> wrote:
>> Hi,
>>
>>> merge :: (a -> b -> Maybe c) -> (a -> Maybe c) -> (b -> Maybe c) ->
>>> Map k a -> Map k b -> Map k c
>>> mergeWithKey :: (k -> a -> b -> Maybe c) -> (k -> a -> Maybe c) ->
>>> (k-> b -> Maybe c) -> Map k a -> Map k b -> Map k c
>>>
>>> where the function arguments are
>>> what to use when the key appears in both maps
>>> what to use when the key appears in just the left map
>>> what to use when the key appears in just the right map.
>>>
>>> It seems like it would be the ultimate fully general fast map merging routine.
>>>
>>> intersection = merge (Just . const) (const Nothing) (const Nothing)
>>> union = merge (Just . const) Just Just
>>> xor = merge ( const Nothing . const) Just Just
>>>
>>> and so forth...
>> I agree with the ultimate generality, but the problem is efficiency. For
>> example running
>> intersection (fromList [1..1000]) (fromList [1001..2000])
>> will end very soon with empty result, but
>> merge (Just . const) (const Nothing) (const Nothing) (fromList [1..1000]) (fromList [1001..2000])
>> will call the (const Nothing) for 2000 times.
> I have managed to use very general functions inside
> unordered-container with maintained performance by using this pattern:
>
> -- Internal only. Always inlined.
> merge' :: (a -> b -> Maybe c) -> (a -> Maybe c) -> (b -> Maybe c)
> -> Map k a -> Map k b -> Map k c
> {-# INLINE merge' #-}
>
> merge :: (a -> b -> Maybe c) -> (a -> Maybe c) -> (b -> Maybe c)
> -> Map k a -> Map k b -> Map k c
> merge = merge'
> {-# INLINABLE merge #-}
>
> intersection = merge' (Just . const) (const Nothing) (const Nothing)
> {-# INLINABLE intersection #-}
>
> union = merge' (Just . const) Just Just
> {-# INLINABLE union #-}
>
> xor = merge' (const Nothing . const) Just Just
> {-# INLINABLE xor #-}
>
> By inlining merge the case-of-case, case-of-known-constructor, and
> dead-code-elimination optimizations should be able to remove all
> unused code.
>
> -- Johan
Some non-trivial recursive optimizations are required here to meet
expected performance targets, like recognizing that
f t = case t of Bin s k x l r -> Bin s k x (f l) (f r); Tip -> Tip
can be replaced by id. I would be pleasantly surprised if the compiler
was able to handle this.
More information about the Libraries
mailing list