darcs patch: Improve Control.Monad.filterM:

Spencer Janssen sjanssen at cse.unl.edu
Tue Aug 8 11:47:56 EDT 2006


On Aug 8, 2006, at 7:37 AM, Chris Kuklewicz wrote:
> Simon Marlow wrote:
>> Spencer Janssen wrote:
>>> Fri Aug  4 13:49:57 CDT 2006  Spencer Janssen <sjanssen at cse.unl.edu>
>>>   * Improve Control.Monad.filterM:
>>>    * filterM is defined in terms of foldr, making it a good  
>>> consumer in GHC's      fusion framework
>>>    * filterM uses linear stack space with respect to the number  
>>> of items that      the predicate returns true, rather than the  
>>> total number of elements in      the input.
>> ...
>>> hunk ./Control/Monad.hs 151
>>> -filterM _ []     =  return []
>>> -filterM p (x:xs) =  do
>>> -   flg <- p x
>>> -   ys  <- filterM p xs
>>> -   return (if flg then x:ys else ys)
>>> +filterM p        = foldr f (return [])
>>> + where +    f x xs = do
>>> +                flg <- p x
>>> +                if flg
>>> +                    then xs >>= return . (x:)
>>> +                    else xs
>>> }
>> The new definition looks less lazy than the original, so it's not  
>> a drop-in
>> replacement.  Also, we would need some measurements to test  
>> whether this version
>> doesn't lose efficiency - it probably fuses better, but might be  
>> slower when it
>> doesn't fuse.  Rules to turn the foldr version back into the  
>> recursive version
>> might be needed (or aggressive inlining).
>> Cheers,
>>     Simon
>
> The new one looks better to me.  But the foldr is not needed:
>
> filterM _ []     =  return []
> filterM p (x:xs) =  do
>    flg <- p x
>    if flg
>       then do ys <- filterM p xs
>               return (x:ys)
>       else filterM p xs
>
> The above filterM differs from the original in when the garbage  
> collector will be able to collect "x".  It looks like the original  
> had to hold onto "x" while "ys" was being computed.  The version  
> above looks like it can discard "x" as soon as the else branch is  
> chosen.
>
> -- 
> Chris

I've benchmarked the versions from base, Chris Kuklewicz and myself.   
Each implementation was placed in a separate module, given an INLINE  
pragma and compiled with -O2.

\begin{code}
-- construct a list that won't fuse
construct 0 = []
construct i = i : construct (i - 1)

all  = filterM (return . const True)  (construct $ 2^18)
half = filterM (return . even)        (construct $ 2^18)
none = filterM (return . const False) (construct $ 2^19)
fuse = filterM (return . const False) [1 .. 2^22 :: Int]
\end{code}


Results (GHC 6.5 on an Intel Core Duo 1.83 GHz):
               | All  | Half | None | Fuse
Janssen       | 0.15 | 0.09 | 0.02 | 0.02
Kuklewicz     | 0.15 | 0.11 | 0.02 | 0.15
Control.Monad | 0.21 | 0.31 | 0.25 | stack overflow

The tests are about what we expect.  The alternate versions beat  
filterM solidly, even more so with 'sparse' predicates.  In addition,  
Janssen ties or beats Kuklewicz in each test showing that the  
overhead due to foldr isn't a problem.


Cheers,
Spencer Janssen


More information about the Libraries mailing list