efficiency of mfix

Hal Daume III hdaume@ISI.EDU
Mon, 14 Oct 2002 09:25:47 -0700 (PDT)


  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

---559023410-342241519-1034612747=:22747
Content-Type: TEXT/PLAIN; charset=US-ASCII

Hi again, all.

So I rewrote some of the versions, so there are now six versions of the
array normalization code.  They are:

normal:         combination of foldM and mapM_
loop:           a two-pass loop mimicking foldM and mapM_
unboxed-normal: normal on unboxed arrays
unboxed-loop:   loop on unboxed arrays
fix:            using fixIO and a look with Double accumulator
cpsfix:         using fixIO and a CPS accumulator

I ran each of these on arrays of size 100,000, 250,000, 500,000 and
1,000,000 elements.  The results are:

        |       FIXIO        |                  TWO LOOPS
        |  fix       cpsfix  |   map/fold    loop    unboxed-m/f
unboxed-loop
--------+--------------------+--------------------------------------------------
 100000 |  0.71       1.37   |     1.60      1.61       0.90          0.42      
 250000 |  6.51       5.74   |     7.48      7.24       2.90          1.22      
 500000 | 23.88      21.32   |    25.35     26.38       7.51          2.27      
1000000 | 92.72      79.16   |    97.83    105.73      21.78          4.54      

(sorry if that wraps on your screen -- see the attached file)

So, looking at the FIXIO methods, for large arrays, cpsfix seems to
dominate.  There's a small overhead for small arrays, but this is passed
once we get to 250,000 elements (probably much before).

I'm (pleasantly) surprised that both fix and cps fix consistently beat the
two boxed implementations on the right.  I'm shocked that the handwritten
loop version does worse than the map/fold version and cannot explain this.

Yet, when looking at the unboxed arrays, the map/fold version does *much*
worse than the loop version.  I'm guessing this has to do with the fact
that in the loop version the compiler is unboxing the index variable for
the whole loop, rather than unboxing each element in the list for map/fold
one-by-one.

While I'm happy that the fix versions outperform the 2-pass versions for
boxed arrays, the discrepency between 79.16 seconds for one million
elements and 4.54 sectons on the same data is alarming.  Can anyone
suggest a way to reconcile this?

 - Hal

p.s., I've attached the code and results (as comments in the code).


--
Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume

On Sat, 28 Sep 2002, Levent Erkok wrote:

> On Friday 27 September 2002 05:19 pm, Hal Daume III wrote:
> > There is a one pass solution using mfix.  This looks like:
> >
> >   mfix (\ ~s -> ioaA a 1 s)
> >
> > where  ioaA is defined as:
> >
> >   ioaA a i s
> >
> >     | i > sz = return 0
> >     | otherwise =
> >
> >         do s' <- ioaA a (i+1) s
> >            v  <- readArray a i
> >            writeArray a i (v/s)
> >            return (s' + v)
> >     where (_,sz) = Data.Array.IO.bounds a
> >
> > Using unboxed arrays is not possible in the fixed point version
> > (loop).  On the normal version, it just about triples the speed across the
> > board.
> 
> Hi,
> 
> I'm not sure if it's mfix we should blame here. Can you please 
> experiment with the following version of normalize and report on 
> running times?
> 
>   norm a = mdo s <- ioaA 1 s id
>                return ()
>     where (_, sz) = bounds a
>           ioaA i s acc
>            | i > sz = return (acc 0)
>            | True   =
>                do v <- readArray a i
>                   writeArray a i (v/s)
>                   ioaA (i+1) s (\x -> v + acc x)
> 
> It'd be interesting to see the results especially for very large 
> inputs (i.e. >= half a million elements.)
> 
> -Levent.
> 


---559023410-342241519-1034612747=:22747
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="Fix.hs"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.GSO.4.21.0210140925470.22747@moussor.isi.edu>
Content-Description: 
Content-Disposition: attachment; filename="Fix.hs"

bW9kdWxlIE1haW4gd2hlcmUNCg0KaW1wb3J0IERhdGEuQXJyYXkNCmltcG9y
dCBEYXRhLkFycmF5LklPDQppbXBvcnQgQ29udHJvbC5Nb25hZA0KaW1wb3J0
IENvbnRyb2wuTW9uYWQuRml4DQppbXBvcnQgU3lzdGVtDQoNCg0KaW9hQSBh
IHMgPSBpb2FBJyAxIHMgMA0KICAgIHdoZXJlIGlvYUEnIGkgcyBhY2MNCiAg
ICAgICAgICAgICAgfCBpID4gc3ogPSByZXR1cm4gYWNjDQogICAgICAgICAg
ICAgIHwgVHJ1ZSAgID0gZG8gdiA8LSByZWFkQXJyYXkgYSBpDQogICAgICAg
ICAgICAgICAgICAgICAgICAgICAgd3JpdGVBcnJheSBhIGkgKHYgLyBzKQ0K
ICAgICAgICAgICAgICAgICAgICAgICAgICAgIGlvYUEnIChpKzEpIHMgJCEg
KHYgKyBhY2MpDQogICAgICAgICAgKF8sIHN6KSA9IERhdGEuQXJyYXkuSU8u
Ym91bmRzIGENCg0KaW9hQUNQUyBhIHMgPSBpb2FBQ1BTJyAxIHMgaWQNCiAg
ICB3aGVyZSAoXywgc3opID0gRGF0YS5BcnJheS5JTy5ib3VuZHMgYQ0KICAg
ICAgICAgIGlvYUFDUFMnIGkgcyBhY2MNCiAgICAgICAgICAgfCBpID4gc3og
PSByZXR1cm4gKGFjYyAwKQ0KICAgICAgICAgICB8IFRydWUgICA9DQogICAg
ICAgICAgICAgICBkbyB2IDwtIHJlYWRBcnJheSBhIGkNCiAgICAgICAgICAg
ICAgICAgIHdyaXRlQXJyYXkgYSBpICh2L3MpDQogICAgICAgICAgICAgICAg
ICBpb2FBQ1BTJyAoaSsxKSBzIChceCAtPiB2ICsgYWNjIHgpDQoNCg0KbWFp
biA9IGRvDQogIFttZXRob2Qsbl0gPC0gZ2V0QXJncw0KICAoYTo6SU9BcnJh
eSBJbnQgRG91YmxlKSA8LSBuZXdMaXN0QXJyYXkgKDE6OkludCxyZWFkIG4p
IFsoMTo6RG91YmxlKS4ucmVhZCBuXQ0KICBpZiBtZXRob2QgPT0gImZpeCIN
CiAgICB0aGVuIG1maXggKFwgfnMgLT4gaW9hQSBhIHMpID4+IHJldHVybiAo
KQ0KICAgIGVsc2UgaWYgbWV0aG9kID09ICJjcHMiDQogICAgICAgICAgIHRo
ZW4gbWZpeCAoXCB+cyAtPiBpb2FBQ1BTIGEgcykgPj4gcmV0dXJuICgpDQog
ICAgICAgICAgIGVsc2UgaWYgbWV0aG9kID09ICJsb29wIg0KICAgICAgICAg
ICAgICAgICAgdGhlbiBub3JtTG9vcCBhDQogICAgICAgICAgICAgICAgICBl
bHNlIG5vcm0gYQ0KICByZXR1cm4gKCkNCg0Kbm9ybSBhID0gZG8NCiAgdCA8
LSBmb2xkTSAoXHQgaSAtPiAodCspIGBsaWZ0TWAgcmVhZEFycmF5IGEgaSkg
MCBbMS4uc3pdDQogIG1hcE1fIChcaSAtPiByZWFkQXJyYXkgYSBpID4+PSB3
cml0ZUFycmF5IGEgaSAuICgvdCkpIFsxLi5zel0NCiAgd2hlcmUgKF8sc3op
ID0gRGF0YS5BcnJheS5JTy5ib3VuZHMgYQ0KDQpub3JtTG9vcCBhID0gZG8g
dCA8LSBub3JtTG9vcCcgMSAwDQogICAgICAgICAgICAgICAgbm9ybUxvb3An
JyAxIHQNCiAgICB3aGVyZSBub3JtTG9vcCcgaSBhY2MNCiAgICAgICAgICAg
ICAgfCBpID4gc3ogPSByZXR1cm4gYWNjDQogICAgICAgICAgICAgIHwgVHJ1
ZSAgID0gZG8gdiA8LSByZWFkQXJyYXkgYSBpDQogICAgICAgICAgICAgICAg
ICAgICAgICAgICAgbm9ybUxvb3AnIChpKzEpICQhICh2ICsgYWNjKQ0KICAg
ICAgICAgIG5vcm1Mb29wJycgaSB0DQogICAgICAgICAgICAgIHwgaSA+IHN6
ID0gcmV0dXJuICgpDQogICAgICAgICAgICAgIHwgVHJ1ZSAgID0gZG8gdiA8
LSByZWFkQXJyYXkgYSBpDQogICAgICAgICAgICAgICAgICAgICAgICAgICAg
d3JpdGVBcnJheSBhIGkgKHYvdCkNCiAgICAgICAgICAgICAgICAgICAgICAg
ICAgICBub3JtTG9vcCcnIChpKzEpIHQNCiAgICAgICAgICAoXywgc3opID0g
RGF0YS5BcnJheS5JTy5ib3VuZHMgYQ0KDQp7LQ0KICAgICAgICB8ICAgICAg
IEZJWElPICAgICAgICB8ICAgICAgICAgICAgICAgICAgVFdPIExPT1BTDQog
ICAgICAgIHwgIGZpeCAgICAgICBjcHNmaXggIHwgICBtYXAvZm9sZCAgICBs
b29wICAgIHVuYm94ZWQtbS9mICB1bmJveGVkLWxvb3ANCi0tLS0tLS0tKy0t
LS0tLS0tLS0tLS0tLS0tLS0tKy0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQogMTAwMDAwIHwgIDAuNzEgICAg
ICAgMS4zNyAgIHwgICAgIDEuNjAgICAgICAxLjYxICAgICAgIDAuOTAgICAg
ICAgICAgMC40MiAgICAgIA0KIDI1MDAwMCB8ICA2LjUxICAgICAgIDUuNzQg
ICB8ICAgICA3LjQ4ICAgICAgNy4yNCAgICAgICAyLjkwICAgICAgICAgIDEu
MjIgICAgICANCiA1MDAwMDAgfCAyMy44OCAgICAgIDIxLjMyICAgfCAgICAy
NS4zNSAgICAgMjYuMzggICAgICAgNy41MSAgICAgICAgICAyLjI3ICAgICAg
DQoxMDAwMDAwIHwgOTIuNzIgICAgICA3OS4xNiAgIHwgICAgOTcuODMgICAg
MTA1LjczICAgICAgMjEuNzggICAgICAgICAgNC41NCAgICAgIA0KLX0=
---559023410-342241519-1034612747=:22747--