[Haskell-cafe] fast Eucl. dist. - Haskell vs C

Claus Reinke claus.reinke at talk21.com
Tue May 19 07:30:00 EDT 2009

> I understand from your later post that is was in fact specialized, but
> how do I make sure it _is_ specialized? 

-ddump-tc seems to give the generalized type, so it seems you'd need
to look at the -ddump-simpl output if you want to know whether a
local function is specialized.


> Can I just add a type signature in the dist_fast definition for euclidean, 

If you need it at exactly one type, that is the simplest way. There's
also the SPECIALIZE pragma


and for local and non-exported functions '-O' should enable auto-specialization


>> After that, unrolling the fused fold loop (uvector internal) might  
>> help a bit, but isn't there yet:
>> http://hackage.haskell.org/trac/ghc/ticket/3123
>> http://hackage.haskell.org/trac/ghc/wiki/Inlining
>> And even if that gets implemented, it doesn't apply directly to your
>> case, where the loop is in a library, but you might want to control  
>> its unrolling in your client code. Having the loop unrolled by a default
>> factor (8x or so) should help for loops like this, with little  
>> computation.
> This seems rather serious, and might be one of the bigger reasons why
> I'm getting nowhere close to C in terms of performance...

You should be able to get near (a small factor) without it, but not having 
it leaves a substantial gap in performance, especially for simple loop 
bodies (there is also the case of enabling fusion over multiple loop 
iterations, but that may call for proper fix-point fusion).

> The loop body is ridiculously small, so it would make sense to
> unroll it somewhat to help avoid the loop overhead.
> However, it seems like GHC isn't able to do that now.

Apart from the links above, the new backend also has relevant TODO
items: http://hackage.haskell.org/trac/ghc/wiki/BackEndNotes

> Is there any way to unroll the loop myself, to speed things up?
> Seems hard, because I'm using uvector...

You could 'cabal unpack' uvector, hand-unroll the core loop all
its operations get fused into, then reinstall the modified package..
(perhaps that should be a package configuration option, at least
until GHC gets loop or recursion unrolling - since this would be
a temporary workaround, it would probably not be worth it to 
have multiple package versions with different unroll factors; if
this actually helps uvector users with performance in practice, 
they could suggest it as a patch).


More information about the Haskell-Cafe mailing list