marlowsd at gmail.com
Thu Feb 18 05:13:05 EST 2010
On 17/02/2010 21:15, Scott Michel wrote:
> Depends a lot on the benchmark. The FreeBSD kernel dev crowd (one of
> whom works for me) have seen performance improvements between 10-20%
> using LLVM and clang over gcc. It also depends heavily on which
> optimization passes you have LLVM invoke -- bear in mind that LLVM is a
> compiler optimization infrastructure first and foremost.
Right, such benchmarks tend to go out of date quickly, especially when
both projects are under active development. I have no vested interest
in either - we'll use whichever suits us better, and that seems to be LLVM.
> Even so, LLVM doesn't let us generate exactly the code we'd like: we
> can't use GHC's tables-next-to-code optimisation. Measurements done
> by David Terei who built the LLVM backend apparently show that this
> doesn't matter much (~3% slower IIRC), though I'm still surprised
> that all those extra indirections don't have more of an effect, I
> think we need to investigate this more closely. It's important
> because if the LLVM backend is to be a compile-time option, we have
> to either drop tables-next-to-code, or wait until LLVM supports
> generating code in that style.
> This sounds like an impedance mismatch between GHC's concept of IR and
It certainly is an impedance mismatch - there's no good reason why LLVM
couldn't generate the code we want, but its IR doesn't allow us to
represent it. So there's every reason to believe that this could be
fixed in LLVM without too much difficulty.
We can work around the impedance mismatch the other way, by not using
tables-next-to-code in GHC, but that costs us a bit of performance.
> [disclaimer: grain of salt speculation, haven't read the code]
> Tables-next-to-code has an obvious cache-friendliness property, BTW.
Oh absolutely, that's why it's not a clear win: some people argue that
the code-cache pollution from the tables outweighs the benefit of
avoiding the extra indirections. Having seen the effect of branch
mispredictions, though, I'm inclined to believe that the indirections
are more expensive. The cost is this: every return to a stack frame
takes two indirections rather than one.
Of course GHC's two representations are not the only two you could
choose - people have been designing clever ways to map code addresses to
data structures for a long time. If returning to a stack frame is the
dominant operation then you would put the return address on the stack
and use a hash table to map those to info tables. That trades off
mutator time against GC time, and we don't know whether it would be a
win, but we do know it would take a lot of effort to find out. The
tables-next-to-code representation means that you don't have to fiddle
around with hash tables, so it's simpler and probably faster.
> Generally, there's going to be some instruction prefetch into the cache.
> This is likely why it's faster. Otherwise, you have to warm up the data
> cache, since LLVM spills the tables into the target's constant pool.
Not sure what "spills the tables" means, but maybe that's not important.
> NCGs should be faster than plain old C. Trying to produce optimized C is
> the fool's errand, and I'm starting to agree with dropping that. My
> worry was that the C backend would be dropped in its entirety, also a
> fool's errand.
More information about the Glasgow-haskell-users mailing list