[Haskell-beginners] Speed performance problem on Windows?

MAN elviotoccalino at gmail.com
Sat Mar 6 13:50:46 EST 2010



For the record, I'm adding my numbers to the pool:

I named the first piece of code (the recursive version) "bigmean1.hs"
and the second (the one using 'foldU') "bigmean2.hs". I compiled each
of them four ways and timed the binaries while they computed the mean
of [1..1e9]. The results for each run are listed below.
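
Before the numbers, for anyone who doesn't have the snipped sources at
hand: the sketch below shows roughly what the two styles look like. It
is an assumption modelled on the loop in Don's article, not the exact
files I timed. In particular, 'meanFold' uses foldl' from Data.List with
a strict pair as a stand-in for uvector's 'foldU', and the names
'meanRec', 'meanFold' and 'P' are made up for illustration.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')
import System.Environment (getArgs)
import Text.Printf (printf)

-- Recursive style: carry the running sum and the element count
-- through a single strict loop over n, n+1, ..., m.
meanRec :: Double -> Double -> Double
meanRec n m = go 0 0 n
  where
    go :: Double -> Int -> Double -> Double
    go !s !l x
      | x > m     = s / fromIntegral l
      | otherwise = go (s + x) (l + 1) (x + 1)

-- Strict pair, so the fold's accumulator never builds up thunks.
data P = P !Double !Int

-- Fold style (hypothetical stand-in for the foldU version):
-- one left fold that updates sum and count together.
meanFold :: Double -> Double -> Double
meanFold n m = s / fromIntegral l
  where
    P s l = foldl' step (P 0 0) [n .. m]
    step (P a c) x = P (a + x) (c + 1)

main :: IO ()
main = do
  [d] <- map read `fmap` getArgs
  printf "%f\n" (meanRec 1 d)
  printf "%f\n" (meanFold 1 d)
```

Run it as e.g. "./sketch 1e7"; both lines should print the same mean.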


MY SYSTEM (512 MB RAM, Mobile AMD Sempron(tm) 3400+ proc [1 core]) (your
run-of-the-mill Ubuntu laptop):
~$ uname -a
Linux dy-book 2.6.31-19-generic #56-Ubuntu SMP Thu Jan 28 01:26:53 UTC 2010 i686 GNU/Linux
~$ ghc -V
The Glorious Glasgow Haskell Compilation System, version 6.12.1

RUN 1 - C generator, without excess-precision

~$ ghc -o bigmean1 --make -fforce-recomp -O2 -fvia-C -optc-O3 bigmean1.hs
~$ ghc -o bigmean2 --make -fforce-recomp -O2 -fvia-C -optc-O3 bigmean2.hs

~$ time ./bigmean1 1e9
500000000.067109

real 0m47.685s	user 0m47.655s	sys 0m0.000s

~$ time ./bigmean2 1e9
500000000.067109

real 1m4.696s	user 1m4.324s	sys 0m0.028s


RUN 2 - default generator, no excess-precision

~$ ghc --make -O2 -fforce-recomp -o bigmean2-noC bigmean2.hs
~$ ghc --make -O2 -fforce-recomp -o bigmean1-noC bigmean1.hs

~$ time ./bigmean1-noC 1e9
500000000.067109

real 0m16.571s	user 0m16.493s	sys 0m0.012s

~$ time ./bigmean2-noC 1e9
500000000.067109

real 0m27.146s	user 0m27.086s	sys 0m0.004s


RUN 3 - C generator, with excess-precision.

~$ ghc --make -fforce-recomp -O2 -fvia-C -optc-O3 -fexcess-precision -o bigmean1-precis bigmean1.hs
~$ ghc --make -fforce-recomp -O2 -fvia-C -optc-O3 -fexcess-precision -o bigmean2-precis bigmean2.hs

~$ time ./bigmean1-precis 1e9
500000000.067109

real 0m11.937s	user 0m11.841s	sys 0m0.012s

~$ time ./bigmean2-precis 1e9
500000000.067109

real 0m17.105s	user 0m17.081s	sys 0m0.004s


RUN 4 - default generator, with excess-precision

~$ ghc --make -fforce-recomp -O2 -fexcess-precision -o bigmean1-precis bigmean1.hs
~$ ghc --make -fforce-recomp -O2 -fexcess-precision -o bigmean2-precis bigmean2.hs

~$ time ./bigmean1-precis 1e9
500000000.067109

real 0m16.521s	user 0m16.413s	sys 0m0.008s

~$ time ./bigmean2-precis 1e9
500000000.067109

real 0m27.381s	user 0m27.190s	sys 0m0.016s


CONCLUSIONS:
· Big difference between the two versions (recursive and
fusion-oriented). I checked by compiling with -ddump-simpl-stats, and the
rule mentioned in Don's article IS being fired (streamU/unstreamU) once.
Even so, the recursive expression of the algorithm is considerably faster.
· Big gain from adding the -fexcess-precision flag when compiling through
the C generator (bigmean1 drops from 47.7s to 11.9s). With the default
generator the flag makes practically no difference here (16.57s vs
16.52s).
· The best time is achieved compiling through the C generator with the
-fexcess-precision flag; second best (about 5 seconds behind) is the
default generator, with or without the flag.
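
To put ratios on those conclusions, here is a small sketch that turns
the "real" times of bigmean1 from the four runs above into slowdown
factors relative to the fastest build. The numbers are copied from the
runs; the names 'toSeconds', 'bigmean1Times' and 'slowdowns' are just
made up for this illustration.

```haskell
import Text.Printf (printf)

-- Convert a `time`-style reading (minutes, seconds) to seconds.
toSeconds :: Double -> Double -> Double
toSeconds mins secs = 60 * mins + secs

-- "real" times of bigmean1, copied from RUN 1 to RUN 4 above.
bigmean1Times :: [(String, Double)]
bigmean1Times =
  [ ("via C, no excess-precision",   toSeconds 0 47.685)
  , ("default, no excess-precision", toSeconds 0 16.571)
  , ("via C, excess-precision",      toSeconds 0 11.937)
  , ("default, excess-precision",    toSeconds 0 16.521)
  ]

-- Each build's time divided by the best time in the list.
slowdowns :: [(String, Double)] -> [(String, Double)]
slowdowns ts = [ (label, t / best) | (label, t) <- ts ]
  where
    best = minimum (map snd ts)

main :: IO ()
main = mapM_ (\(label, r) -> printf "%-30s %.2fx\n" label r)
             (slowdowns bigmean1Times)
```

The same list can be filled with the bigmean2 readings (using
'toSeconds 1 4.696' for the 1m4.696s entry) to compare the fold version.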

I didn't know about -fexcess-precision. It really makes a BIG
difference for number-crunching modules :D


On Sat, 06-03-2010 at 01:36 +0100, Daniel Fischer wrote:
> On Saturday 06 March 2010 00:20:52, Travis Erdman wrote:
> > I'm working through one of Don Stewart's many excellent articles ...
> >
> > http://cgi.cse.unsw.edu.au/~dons/blog/2008/06/04#fast-fusion
> >
> > I faithfully re-created the source of his initial GHC reference
> > implementation as:
> <snip>
> >
> > Then, compiled and executed like this:
> >
> > C:\Documents and Settings\Travis\My Documents\Haskell Code>ghc -O2
> > biglistmean.hs -optc-O2 -fvia-C --make -fforce-recomp
> > [1 of 1] Compiling Main             ( biglistmean.hs, biglistmean.o )
> > Linking biglistmean.exe ...
> 
> Not the best combination of options, for me at least. On my box, that is 
> approximately 35% slower than -O2 with the native code generator.
> 
> >
> > On the final test of 10^9, Don reports that it took 1.76 secs on his
> > machine.
> 
> Well, Don has a super fast 64-bit thingy, on normal machines, all code runs 
> much slower than on Don's :)
> 
> > In contrast, just 10^8 takes 12.63 secs on my machine
> 
> But not that much slower, ouch.
> 
> On my machine, 10^8 takes
> ~3.8s compiled with -O2 -fvia-C -optc-O2 [or -optc-O3, doesn't make a 
> difference]
> ~2.8s compiled with -O2 [with and without -fexcess-precision]
> ~1.18s compiled with -O2 -fexcess-precision -fvia-C -optc-O3
> 
> Floating point arithmetic compiled via C profits greatly from
> -fexcess-precision (well, at least on my system, YMMV).
> 
> Alas, equivalent gcc-compiled C code takes only 0.35s for 10^8 (0.36 with 
> icc).
> 
> Multiply all timings by 10 for 10^9.
> 
> > (sophisticatedly timed with handheld stopwatch) and on the coup de grace
> > 10^9 test, it takes 2min:04secs.  Yikes!  My hardware is a little old
> > (Win XP on Pentium 4 3.06GHz w 2 GB RAM) but not THAT old.  I'm using
> > the latest Haskell Platform which includes ghc v 6.10.4.
> 
> I also have 3.06GHz P4 (2 cores, 1 GB RAM), running openSuSE 11.1 and 
> ghc-6.12.1, ghc-6.10.3 (no difference between 6.10 and 6.12 for this loop).
> The P4 isn't particularly fast, unfortunately.
> 
> >
> > Primary question:  What gives here?
> 
> GCC on XP sucks. Big time, AFAIK. Compile your stuff once via C and once 
> with the native code generator and compare. I think you'll almost always 
> find the NCG faster, sometimes very much.
> 
> >
> > Incidental questions:  Is there a nice way to time executed code in
> > Windows ala the "time" command Don shows under Linux?
> 
> There's timeit.exe, as linked to in 
> http://channel9.msdn.com/forums/Coffeehouse/258979-Windows-equivalent-of-UnixLinux-time-command/
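
Another option that doesn't depend on the OS is to time the computation
from inside the program. A minimal sketch, assuming CPU time (not
wall-clock time) is what you want; 'timed' is a hypothetical helper name,
while getCPUTime (picoseconds) and evaluate come from base:

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Run an action and return its result together with the CPU
-- seconds consumed while it ran (getCPUTime is in picoseconds).
timed :: IO a -> IO (a, Double)
timed action = do
  start  <- getCPUTime
  result <- action
  end    <- getCPUTime
  return (result, fromIntegral (end - start) / 1e12)

main :: IO ()
main = do
  -- evaluate forces the sum inside the timed region; without it,
  -- laziness could push the work past the second clock reading.
  (s, secs) <- timed (evaluate (foldl' (+) 0 [1 .. 1e6 :: Double]))
  printf "sum %f, cpu time %.3fs\n" s secs
```

The clock resolution varies between platforms, so very short runs may
report 0.000s.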
> 
> > Also, does the
> > ordering of the compiler flags have any impact (I hope not, but I don't
> > want to be surprised ...)
> 
> Depends. If you give conflicting options, the last takes precedence (unless 
> some combination gives an error, don't know if that happens). If the 
> options aren't conflicting, the order doesn't matter.
> 
> >
> > Thanks,
> >
> > Travis Erdman
> 
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
