[Haskell-cafe] Re: Known Unknowns
simonmarhaskell at gmail.com
Tue Jan 31 09:59:49 EST 2006
Donald Bruce Stewart wrote:
>>Donald Bruce Stewart wrote:
>>>>There is a new combined benchmark, "partial sums", that subsumes several earlier
>>>>benchmarks and runs 9 different numerical calculations:
>>>Ah! I had an entry too. I've posted it on the wiki. I was careful to
>>>watch that all loops are compiled into nice unboxed ones in the Core. It
>>>seems to run a little bit faster than your more abstracted code.
>>>Timings on the page.
>>>Also, -fasm seems to only be a benefit on the Mac, as you've pointed out
>>>previously. Maybe you could check the times on the Mac too?
>>Yeah. I had not tried all the compiler options. Using -fasm is slower on this
>>for me as well. I suspect your code will beat the entries that have
>>been posted so far, so I think you should submit it.
> ok, I'll submit it.
>>Also, could you explain how to check the Core (un)boxing in a note on the (new?)
>>wiki? I would be interested in learning that trick.
> Ah, I just do: ghc A.hs -O2 -ddump-simpl | less
> and then read the Core, keeping an eye on the functions I'm interested
> in, and checking they're compiling to the kind of loops I'd write by
> hand. This is particularly useful for the kinds of tight numeric loops
> used in some of the shootout entries.
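To make the Core-reading step concrete, here is a minimal sketch of the kind of tight numeric loop one might inspect this way. It is not taken from the shootout entry: the function name harmonic and the file name A.hs are assumptions for illustration only.

```haskell
module Main (main) where

-- Hypothetical example: sum of 1/k for k = 1..n, written as an
-- accumulating loop. The accumulator is forced with seq on every step
-- so that GHC can keep it as an unboxed Double in the compiled loop.
harmonic :: Int -> Double
harmonic n = go 1 0
  where
    go :: Int -> Double -> Double
    go k acc
      | k > n     = acc
      | otherwise =
          let acc' = acc + 1 / fromIntegral k
          in  acc' `seq` go (k + 1) acc'

main :: IO ()
main = print (harmonic 1000000)
```

Compiled with ghc A.hs -O2 -ddump-simpl | less, the thing to look for is a worker for go (typically named something like $wgo) whose arguments have the unboxed types Int# and Double# and whose body uses primitive operations such as +##. If the accumulator still shows up as a boxed Double (a D# constructor built on each iteration), the loop has not been unboxed. The exact names in the dump vary between GHC versions.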
Some comments on this: I couldn't get it to go any faster (1-2% is all,
with some really ugly hacks). It comes down to good low-level loop
optimisation, which GHC doesn't do.
You could improve things by passing the array around rather than having
it as a global, because then it can be unpacked. Make sure you seq the
array in the right places, and check the Core to be sure. I didn't try
this, and it might only improve things marginally.
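A rough sketch of what passing the array around could look like, assuming an unboxed array from Data.Array.Unboxed. The names table and sumArr are hypothetical and the shootout entry's actual data layout is not shown in this thread, so treat this as an illustration of the idea rather than the real code.

```haskell
module Main (main) where

import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Hypothetical table, standing in for the global array. Defined at top
-- level, it is a shared closure that a loop would have to reach through.
table :: UArray Int Double
table = listArray (0, 9) [fromIntegral i | i <- [0 .. 9 :: Int]]

-- The loop takes the array as an argument instead of referring to the
-- global directly. The seq forces the array once, before the loop
-- starts, so the optimiser has a chance to unpack it.
sumArr :: UArray Int Double -> Double
sumArr arr = arr `seq` go lo 0
  where
    (lo, hi) = bounds arr

    go :: Int -> Double -> Double
    go i acc
      | i > hi    = acc
      | otherwise =
          let acc' = acc + arr ! i
          in  acc' `seq` go (i + 1) acc'

main :: IO ()
main = print (sumArr table)
```

Whether this actually helps has to be confirmed in the Core: the inner loop should index the underlying byte array directly rather than re-examining the array constructor on each iteration.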
-fexcess-precision is required when compiling via C. It should only be
necessary on x86, but 6.4.1 and earlier require it on all platforms (we
fixed that recently).
gcc -O2 is about 15% better than -fasm on x86_64 here.