[Haskell-cafe] fast Eucl. dist. - Haskell vs C
claus.reinke at talk21.com
Tue May 19 07:30:00 EDT 2009
> I understand from your later post that is was in fact specialized, but
> how do I make sure it _is_ specialized?
-ddump-tc seems to give the generalized type, so it seems you'd need
to look at the -ddump-simpl output if you want to know whether a
local function is specialized.
> Can I just add a type signature in the dist_fast definition for euclidean,
If you need it at exactly one type, that is the simplest way. There's
also the SPECIALIZE pragma
and for local and non-exported functions '-O' should enable auto-specialization
>> After that, unrolling the fused fold loop (uvector internal) might
>> help a bit, but isn't there yet:
>> And even if that gets implemented, it doesn't apply directly to your
>> case, where the loop is in a library, but you might want to control
>> its unrolling in your client code. Having the loop unrolled by a default
>> factor (8x or so) should help for loops like this, with little
> This seems rather serious, and might be one of the bigger reasons why
> I'm getting nowhere close to C in terms of performance...
You should be able to get near (a small factor) without it, but not having
it leaves a substantial gap in performance, especially for simple loop
bodies (there is also the case of enabling fusion over multiple loop
iterations, but that may call for proper fix-point fusion).
> The loop body is ridiculously small, so it would make sense to
> unroll it somewhat to help avoid the loop overhead.
> However, it seems like GHC isn't able to do that now.
Apart from the links above, the new backend also has relevant TODO
> Is there any way to unroll the loop myself, to speed things up?
> Seems hard, because I'm using uvector...
You could 'cabal unpack' uvector, hand-unroll the core loop all
its operations get fused into, then reinstall the modified package..
(perhaps that should be a package configuration option, at least
until GHC gets loop or recursion unrolling - since this would be
a temporary workaround, it would probably not be worth it to
have multiple package versions with different unroll factors; if
this actually helps uvector users with performance in practice,
they could suggest it as a patch).
More information about the Haskell-Cafe