[Haskell-cafe] blas bindings, why are they so much slower than C?

Jules Bean jules at jellybean.co.uk
Wed Jun 18 13:03:42 EDT 2008


Anatoly Yakovenko wrote:
>>> #include <cblas.h>
>>> #include <stdlib.h>
>>>
>>> int main() {
>>>   int size = 1024;
>>>   int ii = 0;
>>>   double* v1 = malloc(sizeof(double) * (size));
>>>   double* v2 = malloc(sizeof(double) * (size));
>>>   for(ii = 0; ii < size*size; ++ii) {
>>>      double _dd = cblas_ddot(0, v1, size, v2, size);
>>>   }
>>>   free(v1);
>>>   free(v2);
>>> }
>> Your C compiler sees that you're not using the result of cblas_ddot,
>> so it doesn't even bother to call it. That loop never gets run. All
>> your program does at runtime is call malloc and free twice, which is
>> very fast :-)
> 
> C doesn't work like that :). 

C compilers can do what they like ;)

GCC in particular is pretty good at removing dead code, including entire 
loops. However, it shouldn't eliminate the call to cblas_ddot unless it 
can tell that cblas_ddot has no side effects at all, which would be 
surprising unless the compiler can see its body, e.g. because it's 
inlined somehow.
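
To make the distinction concrete, here is a minimal sketch. It is not the
original program and does not link against BLAS: pure_dot, counted_dot,
dot_calls, discard_pure and discard_counted are all hypothetical names
standing in for cblas_ddot and the benchmark loop. The assumption is that the
compiler can see the function bodies, which is not normally the case for a
call into a separately compiled library.

```c
#include <stddef.h>

/* Hypothetical stand-in for cblas_ddot: a plain dot product with no side
   effects. Because the body is visible, an optimizing compiler is allowed
   to drop a call whose result is never used. */
static double pure_dot(size_t n, const double *x, const double *y) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Counts calls to counted_dot. Updating it is an observable side effect. */
unsigned long dot_calls = 0;

/* Same computation, but every call also bumps the global counter, so the
   call cannot be discarded even when its return value is ignored. */
static double counted_dot(size_t n, const double *x, const double *y) {
    ++dot_calls;
    return pure_dot(n, x, y);
}

/* Throws away every result, like the loop in the quoted program.
   With optimization enabled the compiler may remove this loop entirely. */
void discard_pure(size_t iterations, size_t n,
                  const double *x, const double *y) {
    for (size_t i = 0; i < iterations; ++i)
        (void)pure_dot(n, x, y);
}

/* Also throws away every result, but the counter update must still be
   visible afterwards, so the effect of each call has to be preserved. */
void discard_counted(size_t iterations, size_t n,
                     const double *x, const double *y) {
    for (size_t i = 0; i < iterations; ++i)
        (void)counted_dot(n, x, y);
}
```

Whether the first loop really disappears depends on the compiler and the
optimization level; the point is only that it is permitted to, while the
second loop's effect on dot_calls has to survive.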

Jules

