[Haskell-cafe] blas bindings, why are they so much slower than C?
Jules Bean
jules at jellybean.co.uk
Wed Jun 18 13:03:42 EDT 2008
Anatoly Yakovenko wrote:
>>> #include <cblas.h>
>>> #include <stdlib.h>
>>>
>>> int main() {
>>> int size = 1024;
>>> int ii = 0;
>>> double* v1 = malloc(sizeof(double) * (size));
>>> double* v2 = malloc(sizeof(double) * (size));
>>> for(ii = 0; ii < size*size; ++ii) {
>>> double _dd = cblas_ddot(0, v1, size, v2, size);
>>> }
>>> free(v1);
>>> free(v2);
>>> }
>> Your C compiler sees that you're not using the result of cblas_ddot,
>> so it doesn't even bother to call it. That loop never gets run. All
>> your program does at runtime is call malloc and free twice, which is
>> very fast :-)
>
> C doesn't work like that :).
C compilers can do what they like, as long as the observable behaviour
of the program stays the same ;)
GCC in particular is pretty good at removing dead code, including entire
loops. However, it shouldn't eliminate the call to cblas_ddot unless it
knows that cblas_ddot has no side effects at all, which would be
surprising for a function in an external library unless it's inlined
somehow.
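
To make the distinction concrete, here is a minimal sketch, assuming a plain dot product defined in the same file stands in for cblas_ddot. The names dot, bench_discard, bench_keep and demo are made up for illustration and are not part of CBLAS. Because the compiler can see the body of dot, it may conclude that the call has no side effects and drop the loop in bench_discard; whether it actually does depends on the compiler and the optimisation flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cblas_ddot: a dot product with no side effects,
 * defined in this translation unit so the optimizer can see its body. */
double dot(size_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i] * y[i];
    }
    return acc;
}

/* The result is discarded, as in the quoted benchmark. An optimizing
 * compiler is allowed to delete this loop, because removing it does not
 * change anything the program can observe. */
void bench_discard(size_t iters, size_t n, const double *x, const double *y) {
    for (size_t k = 0; k < iters; ++k) {
        (void)dot(n, x, y);
    }
}

/* The result feeds the return value, so the computation cannot simply be
 * deleted (the optimizer may still hoist or simplify it). The caller has
 * to use the returned total, e.g. print it, for this to matter. */
double bench_keep(size_t iters, size_t n, const double *x, const double *y) {
    double total = 0.0;
    for (size_t k = 0; k < iters; ++k) {
        total += dot(n, x, y);
    }
    return total;
}

/* Small usage example with made-up data; call it from main. */
int demo(void) {
    double x[] = {1.0, 2.0, 3.0};
    double y[] = {4.0, 5.0, 6.0};
    bench_discard(10, 3, x, y);             /* may compile to nothing */
    double total = bench_keep(10, 3, x, y); /* 10 * (4 + 10 + 18) */
    assert(total == 320.0);
    return 0;
}
```

For a call into a separately compiled library the compiler normally cannot see the body, so it has to keep the call unless the declaration tells it otherwise. Either way, a benchmark is on safer ground when it accumulates the results and prints them at the end.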
Jules