[Haskell-cafe] idea for avoiding temporaries

Mon Mar 12 08:57:34 EDT 2007

>>> 1) readArray m (i,j)
>> yes, indeed. since we are dealing in bulk operations, we might as well take advantage
>> of that, so dropping the repeated bounds-checks inside the loops makes a lot of sense.
> 
> no, i say here only about memory leaks. of course, unsafeRead omits bounds 
> checking but more important in this case is that readArray created a temporary 
> memory cells - index calculation of matrix turns out to be not strict. it was biggest 
> surprise for me - i thrown a lot of time, adding 'seq' here and there before i even 
> tried to replace this call with unsafeRead

i'm not so sure about that conclusion;) i admit that i more often optimise for time than
for space, so unless there's a real space leak to get rid of, i tend to measure space
performance indirectly, by its impact on time, which is perhaps not a good habit. but 
i did a step-by-step rerun of the modifications, to see their effect on (time) performance:

    time ./CG array 100000: 33s
    time ./CG_Bulat array 100000: 8s

    33s: baseline, my original code
    30 - strict formal pars in l, in dotA/matA
    22 - inline l, for +*=/-*=
    14 - replace readArray m (i,j) by unsafeRead m (index .. (i,j)),
           replace index by unsafeIndex, eliminating bounds-check
    12 - same for readArray/writeArray v
    12 - eliminating the tuple in readMatrix makes no difference
      8 - seq-ing all parameters in l,*+=,dotA,matA

to handle the 2d indexing, i replaced readArray m (i,j) by readMatrix m (i,j):

    {-# INLINE readMatrix #-}
    readMatrix m ij = unsafeRead m (unsafeIndex matrixBounds ij)

    matrixBounds :: ((Int,Int),(Int,Int))
    matrixBounds = ((1,1),(n,n))

so we're still dealing with pairs, just got rid of the bounds-checks in readArray/index, 
and that alone brings us from 22s to 14s (12s if we do the same for vectors), a 
substantial improvement. 

eliminating the tuples, passing i and j directly into the computation, doesn't seem to 
make any further difference (shifting the indices to avoid the decrements might, but 
not by much, certainly not enough to justify extending the arrays;-), so just getting rid 
of the bounds-check had sufficiently exposed the index computation already.

    -- readMatrix m i j = unsafeRead m $! ((i-1)*n+(j-1))

ensuring strictness of all formal parameters in the basic vector/matrix operations, 
through bang-patterns or seq, brings us from 33s to 30s, and from 12s to 8s, so 
that helps a lot. 

the inline pragma on l brings us from 30s to 22s, so that helps a lot, too.

> afaik, ghc can't inline recursive functions. it will be great if ghc
> can automatically make specialized version of function it can't
> inline. so i'm wonder whether INLINE really helps anything?

perhaps it can't unroll the (conceptually infinite) body of the loop, but it can bring
copies of the definition to the places where the op parameters are known.

> (let f x = .. in f $! par) vs (let f !x = .. in f par)

so the difference is between passing evaluated parameters into functions that don't
expect them and passing parameters to functions that expect them evaluated. thanks,
that makes sense to me: apart from the boxing/unboxing of evaluated parameters,
the function body itself might look different.

thanks,
claus