[Haskell-cafe] blas bindings, why are they so much slower than C?

David Roundy droundy at darcs.net
Wed Jun 18 13:06:17 EDT 2008


On Wed, Jun 18, 2008 at 06:03:42PM +0100, Jules Bean wrote:
> Anatoly Yakovenko wrote:
> >>>#include <cblas.h>
> >>>#include <stdlib.h>
> >>>
> >>>int main() {
> >>>  int size = 1024;
> >>>  int ii = 0;
> >>>  double* v1 = malloc(sizeof(double) * (size));
> >>>  double* v2 = malloc(sizeof(double) * (size));
> >>>  for(ii = 0; ii < size*size; ++ii) {
> >>>     double _dd = cblas_ddot(0, v1, size, v2, size);
> >>>  }
> >>>  free(v1);
> >>>  free(v2);
> >>>}
> >>Your C compiler sees that you're not using the result of cblas_ddot,
> >>so it doesn't even bother to call it. That loop never gets run. All
> >>your program does at runtime is call malloc and free twice, which is
> >>very fast :-)
> >
> >C doesn't work like that :). 
> 
> C compilers can do what they like ;)
> 
> GCC in particular is pretty good at removing dead code, including entire 
> loops. However, it shouldn't eliminate the call to cblas_ddot unless it 
> thinks cblas_ddot has no side effects at all, which would be surprising 
> unless it's inlined somehow.

Or unless it's been annotated as pure, which it should be.

David

