Josef Svenningsson josef.svenningsson at gmail.com
Fri Oct 26 10:34:11 EDT 2007

Sorry for reacting so late on this mail. I'm digging through some old mails...

On 10/12/07, Dan Weston <westondan at imageworks.com> wrote:
> Always check optimizations to make sure they are not pessimizations!
>
> Actually, traversing the list twice is very cheap compared to space
> leakage, and accumulating pairs requires tuple boxing and unboxing which
> I don't know how to get GHC not to do.
>
I agree hole-heartedly that replacing multiple traversals with a
single traversal should be done with care as it more often than not
results in a pessimization. Indeed you showed just that with your
examples! But I'd thought it'd be interesting to see how it can
actually be an improvement if done carefully.

\begin{code}
import Control.Arrow
import qualified Data.Strict.Tuple as T
import Data.Strict.Tuple (Pair(..))
import Control.Parallel

avg4 = uncurry (/) . (foldl' (+) 0 &&& foldl' (\x y -> x + 1) 0)
avgS = T.uncurry (/) . foldl' (\p n -> ((+n) *!* (+1)) p) (0 :!: 0)
avgP = uncurry (/) . (foldl' (+) 0 &!& foldl' (\x y -> x + 1) 0)

(*!*) f g (a :!: b) = f a :!: g b

(&!&) f g a = fa par (fa,ga)
where fa = f a
ga = g a
\end{code}

avg4 is the function which was best among Dan's benchmarks. avgS uses
strict tuples. I just threw in avgP for fun, it traverses the lists in
parallel. Note: I do have a dual core machine so it makes sense to try
avgP.

I fed these functions to ghc with the -O2 and -threaded flags and
timed them using the list [1..10000000]. The result (best times out of
several runs):
avg4: 284 ms
avgS: 184 ms
avgP: 248 ms

It seems doing a single traversal can be faster if your write your
function carefully. Doing the traversal in parallel was beneficial but
not as good as the single traversal.

Cheers,

/Josef