[Haskell-cafe] Fusing foldr's

Dan Weston westondan at imageworks.com
Fri Oct 26 14:28:41 EDT 2007


Thanks for letting me know about the Data.Strict library on Hackage. I 
will definitely make use of that! BTW, you left out an import 
Data.List(foldl') in your example.

My timing test is an order of magnitude worse than yours. Do you have an 
extra zero in your list endpoint?

 > I fed these functions to ghc with the -O2 and -threaded flags and
 > timed them using the list [1..10000000]. The result (best times out of
 > several runs):
 > avg4: 284 ms
 > avgS: 184 ms
 > avgP: 248 ms

Using ghc -threaded -O2 --make Avg.hs for ghc 6.6.1, I ran your tests on 
[1..10000000] and got the user times:

avg4: 12.75 s
avgS:  3.65 s
avgP: 15.56 s

The funny thing is that avg4/avgS = 3.5 for and only 1.5 for you. I 
understand that with only 1 processor my avgP time may be twice yours, 
but not the avgS or avg4.

I have the following machine:

Main memory size: 2026 Mbytes
Num Processors: 1
Processor Type: Intel(R) Xeon(TM) CPU 2.80GHz x32
Clock Speed: 2790 MHZ

Josef Svenningsson wrote:
> Sorry for reacting so late on this mail. I'm digging through some old mails...
> 
> On 10/12/07, Dan Weston <westondan at imageworks.com> wrote:
>> Always check optimizations to make sure they are not pessimizations!
>>
>> Actually, traversing the list twice is very cheap compared to space
>> leakage, and accumulating pairs requires tuple boxing and unboxing which
>> I don't know how to get GHC not to do.
>>
> I agree hole-heartedly that replacing multiple traversals with a
> single traversal should be done with care as it more often than not
> results in a pessimization. Indeed you showed just that with your
> examples! But I'd thought it'd be interesting to see how it can
> actually be an improvement if done carefully.
> 
> \begin{code}
> import Control.Arrow
> import qualified Data.Strict.Tuple as T
> import Data.Strict.Tuple (Pair(..))
> import Control.Parallel
> 
> avg4 = uncurry (/) . (foldl' (+) 0 &&& foldl' (\x y -> x + 1) 0)
> avgS = T.uncurry (/) . foldl' (\p n -> ((+n) *!* (+1)) p) (0 :!: 0)
> avgP = uncurry (/) . (foldl' (+) 0 &!& foldl' (\x y -> x + 1) 0)
> 
> (*!*) f g (a :!: b) = f a :!: g b
> 
> (&!&) f g a = fa `par` (fa,ga)
>   where fa = f a
>         ga = g a
> \end{code}
> 
> avg4 is the function which was best among Dan's benchmarks. avgS uses
> strict tuples. I just threw in avgP for fun, it traverses the lists in
> parallel. Note: I do have a dual core machine so it makes sense to try
> avgP.
> 
> I fed these functions to ghc with the -O2 and -threaded flags and
> timed them using the list [1..10000000]. The result (best times out of
> several runs):
> avg4: 284 ms
> avgS: 184 ms
> avgP: 248 ms
> 
> It seems doing a single traversal can be faster if your write your
> function carefully. Doing the traversal in parallel was beneficial but
> not as good as the single traversal.
> 
> Cheers,
> 
> /Josef
> 
> 




More information about the Haskell-Cafe mailing list