efficiency of mfix

Hal Daume III hdaume@ISI.EDU
Fri, 27 Sep 2002 17:19:28 -0700 (PDT)


Hi all,

I've been experimenting in ghc with mfix (I noticed on the cvs logs that
mdo is going to be supported shortly :P), so I decided to see how it would
work for me.

My simple task is to normalize an array.  That is, given an array, replace
each value with a new value such that the array sums to one and that the
ratios between values before and after normalization are the same.

There is of course a two pass solutions, using a fold and a map.  This is
implementing using IOArrays, so this looks like:

  norm a = do
    t <- foldM (\t i -> (t+) `liftM` readArray a i) 0 [1..sz]
    mapM_ (\i -> readArray a i >>= writeArray a i . (/t)) [1..sz]
    where (_,sz) = Data.Array.IO.bounds a

There is a one pass solution using mfix.  This looks like:

  mfix (\ ~s -> ioaA a 1 s)

where  ioaA is defined as:

  ioaA a i s
    | i > sz = return 0
    | otherwise =
        do s' <- ioaA a (i+1) s
           v  <- readArray a i
           writeArray a i (v/s)
           return (s' + v)
    where (_,sz) = Data.Array.IO.bounds a

Now, in practice, the two-pass solution is about twice as fast as the
one-pass solution.  I've looked a bit at the core generated, but without
knowing what's going on internally, it's hard to say where the
inefficiencies come from.  I'm hoping someone can shed some light on this.

Some timing results based on different array sizes:

50000 elements, 32m heap

fix:
0.79u 0.04s 0:00.89 93.2%
0.77u 0.07s 0:00.89 94.3%
0.75u 0.10s 0:00.94 90.4%

normal:
0.58u 0.01s 0:00.60 98.3%
0.53u 0.03s 0:00.66 84.8%
0.53u 0.05s 0:00.61 95.0%

100000 elements, 32m heap

fix:
2.40u 0.10s 0:02.73 91.5%
2.38u 0.10s 0:02.54 97.6%
2.37u 0.13s 0:02.70 92.5%

normal:
1.50u 0.03s 0:01.55 98.7%
1.49u 0.05s 0:01.57 98.0%
1.52u 0.03s 0:01.59 97.4%

200000 elements, 32m heap

fix:
7.90u 0.16s 0:08.09 99.6%
7.99u 0.17s 0:08.38 97.3%
7.96u 0.21s 0:08.40 97.2%

normal:
4.43u 0.06s 0:04.69 95.7%
4.38u 0.14s 0:04.75 95.1%
4.49u 0.07s 0:04.79 95.1%

400000 elements, 32m heap

fix:
29.09u 0.34s 0:30.95 95.0%

normal:
14.51u 0.16s 0:15.02 97.6%

Using unboxed arrays is not possible in the fixed point version
(loop).  On the normal version, it just about triples the speed across the
board.

 - Hal


--
Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume