marlowsd at gmail.com
Thu Feb 18 05:13:05 EST 2010
On 17/02/2010 21:15, Scott Michel wrote:
> Depends a lot on the benchmark. The FreeBSD kernel dev crowd (one of
> whom works for me) have seen performance improvements between 10-20%
> using LLVM and clang over gcc. It also depends heavily on which
> optimization passes you have LLVM invoke -- bear in mind that LLVM is a
> compiler optimization infrastructure first and foremost.
Right, such benchmarks tend to go out of date quickly, especially when
both projects are under active development. I have no vested interest
in either - we'll use whichever suits us better, and that seems to be LLVM.
> Even so, LLVM doesn't let us generate exactly the code we'd like: we
> can't use GHC's tables-next-to-code optimisation. Measurements done
> by David Terei who built the LLVM backend apparently show that this
> doesn't matter much (~3% slower IIRC), though I'm still surprised
> that all those extra indirections don't have more of an effect, I
> think we need to investigate this more closely. It's important
> because if the LLVM backend is to be a compile-time option, we have
> to either drop tables-next-to-code, or wait until LLVM supports
> generating code in that style.
> This sounds like an impedance mismatch between GHC's concept of IR and
It certainly is an impedance mismatch - there's no good reason why LLVM
couldn't generate the code we want, but its IR doesn't allow us to
represent it. So there's every reason to believe that this could be
fixed in LLVM without too much difficulty.
We can work around the impedance mismatch the other way, by not using
tables-next-to-code in GHC, but that costs us a bit of performance.
> [disclaimer: grain of salt speculation, haven't read the code]
> Tables-next-to-code has an obvious cache-friendliness property, BTW.
Oh absolutely, that's why it's not a clear win: some people argue that
the code-cache pollution from the tables outweighs the benefit of
avoiding the extra indirections. Having seen the effect of branch
mispredictions, though, I'm inclined to believe that the indirections
are more expensive. The cost is this: every return to a stack frame
takes two indirections rather than one.
Of course GHC's two representations are not the only two you could
choose - people have been designing clever ways to map code addresses to
data structures for a long time. If returning to a stack frame is the
dominant operation then you would put the return address on the stack
and use a hash table to map those to info tables. That trades off
mutator time against GC time, and we don't know whether it would be a
win, but we do know it would take a lot of effort to find out. The
tables-next-to-code representation means that you don't have to fiddle
around with hash tables, so it's simpler and probably faster.
> Generally, there's going to be some instruction prefetch into the cache.
> This is likely why it's faster. Otherwise, you have to warm up the data
> cache, since LLVM spills the tables into the target's constant pool.
Not sure what "spills the tables" means, but maybe that's not important.
> NCGs should be faster than plain old C. Trying to produce optimized C is
> the fool's errand, and I'm starting to agree with dropping that. My
> worry was that the C backend would be dropped in its entirety, also a
> fool's errand.
More information about the Glasgow-haskell-users mailing list