[Haskell-cafe] [Somewhat OT] Speed

Andrew Coppin andrewcoppin at btinternet.com
Wed Oct 29 16:22:24 EDT 2008

Richard O'Keefe wrote:
> On 29 Oct 2008, at 8:31 am, Andrew Coppin wrote:
>> Hi guys.
>> This isn't specifically to do with Haskell, but... does anybody have 
>> any idea roughly how fast various CPU operations are?
>> For example, is integer arithmetic faster or slower than 
>> floating-point? Is addition faster or slower than multiplication? How 
>> much slower are the trigonometric functions? etc. Does using 8-bit 
>> integers make arithmetic any faster than using wider values?
>> Does anybody have a good resource for this kind of information?
> lmbench3
> http://sourceforge.net/projects/lmbench
> Building and running it will give you some answers for your
> machine.
> It's hard to answer your questions as they are posed,
> because of the difference between throughput and latency.
> For example, you might be able to start a new multiplication
> on every cycle on some machine (throughput is 1 multiply
> per cycle) but the result of any one of
> them might not be available for twelve cycles (latency is
> 12 cycles per multiply).
> Rough guesses:
>  integer adds, subtracts, and compares are fast,
>  integer multiplies and divides are much slower,
>  slow enough that compilers go to some trouble to
>  do something else when multiplying or dividing
>  by a constant.
>  The speed of the trig functions depends on how much hardware
>  support you have and on whether your library writer favoured
>  speed or accuracy (especially accuracy over a wide range).
>  I don't think lmbench measures these, but it wouldn't be hard
>  to add something suitable.
>  Using 8-bit integers is unlikely to make your program *directly*
>  faster unless you are using a compiler which is smart enough to
>  exploit SIMD instructions without programmer-written hints.  The
>  Intel C compiler _is_ that smart.  In my code I have found it to
>  be extremely good at vectorising trivial loops, with no actual
>  effect on the run-time of my programs.  However, code written
>  with the intention of exploiting that would be different.
>  The one thing that lmbench3 will tell you that will drop your
>  jaw and widen your eyes is the bit at the end where it tells
>  you about memory speed.  Main memory is exceeding slow compared
>  with cache.  This is why I said that switching over to 8-bit
>  integers might not make your program *directly* faster; if you
>  have a lot of data which you can pack tightly, so that as 32-bit
>  words it would not fit into L2 cache but as bytes it does, you
>  may well get a very useful speedup from that.
> Or you may not.  There is no substitute for measurement.
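Richard's closing advice ("no substitute for measurement") can be sketched with a crude, stdlib-only timing harness. This is a toy, not a proper benchmark (a real comparison would use a benchmarking library and account for list allocation, which dominates here); the loop bodies and sizes are only assumptions about what one might compare:

```haskell
import Data.List (foldl')
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Time an IO action in milliseconds.  getCPUTime reports picoseconds,
-- so divide by 10^9.  Crude: no warm-up, no repetition, and the lazy
-- list being folded is allocated inside the timed region.
timeMs :: IO a -> IO (a, Double)
timeMs act = do
  t0 <- getCPUTime
  x  <- act
  t1 <- getCPUTime
  return (x, fromIntegral (t1 - t0) / 1e9)

main :: IO ()
main = do
  let n = 1000000 :: Int
  -- foldl' forces the accumulator at every step; a lazy foldl would
  -- build a chain of thunks instead of doing arithmetic in a loop.
  (si, ti) <- timeMs (return $! foldl' (+) 0 [1 .. n])
  (sd, td) <- timeMs (return $! foldl' (+) 0 [1 .. fromIntegral n :: Double])
  printf "Int    sum = %d (%.2f ms)\n" si ti
  printf "Double sum = %.0f (%.2f ms)\n" sd td
```

Both sums are exact (5.000005e11 is well under 2^53, so the Double partial sums lose nothing); any timing difference is down to the arithmetic and memory behaviour, which is exactly the machine-dependent part you have to measure rather than guess.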

OK, well thanks for the info.

I'm not really interested in getting down to instruction-level 
scheduling. I just want to know, at a high level, whether implementing my 
algorithm in integer arithmetic rather than floating-point will make a 
measurable difference to overall program speed.

Actually, thinking about it, I suppose the killer question is this: Does 
changing between different number representations make any measurable 
performance difference at all, or are Haskell programs dominated by 
cache misses?
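Richard's packing point, that the same data as bytes may fit in cache where 32-bit words would not, can be illustrated with unboxed arrays from GHC's bundled array package. The payload below is ~1 MB as Word8 versus ~8 MB as Int on a 64-bit machine; whether that crosses a cache boundary is entirely machine-dependent, so treat this as a sketch of the layout difference, not a claim about any particular hardware:

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.List (foldl')
import Data.Word (Word8)

n :: Int
n = 1000000

-- The same million elements, packed two ways.  Unboxed arrays store
-- raw values contiguously, so the Word8 version occupies one byte per
-- element and the Int version eight (on a 64-bit GHC).
bytes :: UArray Int Word8
bytes = listArray (0, n - 1) (map fromIntegral [0 .. n - 1])  -- wraps mod 256

wide :: UArray Int Int
wide = listArray (0, n - 1) [0 .. n - 1]

-- Identical traversals over both layouts; any speed difference comes
-- from memory traffic, not from the arithmetic itself.
sumBytes :: Int
sumBytes = foldl' (\acc i -> acc + fromIntegral (bytes ! i)) 0 [0 .. n - 1]

sumWide :: Int
sumWide = foldl' (\acc i -> acc + wide ! i) 0 [0 .. n - 1]

main :: IO ()
main = do
  print sumBytes
  print sumWide
```

Note the trade-off: the Word8 version pays a `fromIntegral` widening on every element, so if the whole working set fits in cache either way, the narrow representation buys you nothing and may even cost a little.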
