Proposal: Improving the LLVM backend by packaging it

Sat Nov 1 17:31:29 UTC 2014

Austin Seipp <austin at well-typed.com> writes:

> Joachim, thanks for the forward and discussion.
>
> Just to rehash two points for the people reading at home:
>
>  - I *do not* want to ship GHC specific patches to LLVM in the builds
> we use, anymore than anyone else does. I don't have any plans or even
> patches I would apply right now. A stock LLVM is ideal - one that's
> just been picked to work well by us. Even if it has some bugs or
> workarounds are needed, that's probably OK.
>
I agree. Shipping a known-stable LLVM is one thing; patching our own
LLVM is quite another. Patching LLVM should be avoided if at all
possible. Thankfully LLVM is quite modular so this shouldn't be so
difficult.

[snip]

> I'd love to work more with LLVM upstream to fix problems... but the
> time to do so is pretty limited for most of us, I think, and the
> current backend has real issues in the design that cooperation just
> can't fundamentally fix - cooperation can't fix the fact a new release
> may change IR semantics and break existing GHC releases, for example.
> Users will simply suffer from that. And some of those changes may not
> be totally trivial to accommodate (as Ben's recent work shows).
>
While this is technically true, I wonder whether IR changes will be a
persistent problem going forward. I don't have a deep knowledge of the
history of the LLVM IR but my impression is that the maintainers are
fairly deliberate in their consideration of semantic changes (despite the
arguable `symbol_offset` and `prefix_data` misstep in 3.5). The alias
change was an unexpected turn but frankly our previous use of aliases
was a bit odd and was never supposed to work in the first place
(something is amiss when you are relying on the optimizer to elide
aliases to produce valid code).

The alias rework and (hopefully) upcoming TNTC rework make me optimistic
that our use of LLVM is moving closer to how the interfaces are designed to
be used. Hopefully this will be accompanied by a corresponding
improvement in maintainability.

There are other reasons besides IR instability that we might want to
distribute our own LLVM. These might include:

 * Decoupling GHC from changes in LLVM's optimization passes

 * Wanting to ship our own optimization passes that need to link against
   LLVM.

 * Wanting to use a library like llvm-general in GHC

 * Wanting to leverage related libraries such as Polly

I'm not sure how these weigh against the maintenance and packaging costs
of shipping our own LLVM.  I've not seen evidence that changes in LLVM's
optimizer have hurt us in the past; then again if we are going to be
more selective about which optimizations we ask LLVM to perform perhaps
this will become a bigger concern in the future.

Shipping our own passes would be great and is probably necessary to
continue to improve performance. It is not clear whether LLVM's analysis
pass interface is stable enough to facilitate this without shipping our
own LLVM. Max's analysis is now three years old; I'll try dusting off
the code and see how bad the damage is.

It's not clear to me that we really want to add a dependency on another
library to GHC. Being able to leverage things like Polly sounds tempting
but adding another moving part to LlvmGen will likely incur a
maintenance cost. Moreover, the fact that there still isn't a
llvm-general release targeting LLVM 3.5 is a bit worrying.

As a quick note, I spoke briefly with a few Rustaceans and they report
that they were hoping to ultimately avoid shipping an LLVM with
rustc. At this point Rust doesn't have an active packaging effort so
perhaps the rustc precedent isn't as useful as I originally thought.

Cheers,

- Ben