Status of "Improved LLVM backend"

Mon Nov 28 01:43:04 UTC 2016

> On Nov 27, 2016, at 10:17 PM, Michal Terepeta <michal.terepeta at gmail.com> wrote:
> 
> > Hi,
> > 
> > I’m trying to implement a bitcode-producing llvm backend[1], which would potentially
> > allow using a range of llvm versions with ghc. However, as Austin correctly pointed
> > out[2], this is only tangentially relevant to the improved llvm backend, since there are
> > other complications besides the fragility of the textual representation.
> > 
> > So this is mostly relevant only to the improved ir you mentioned. The bitcode code gen
> > plugin currently follows the textual ir generation for the most part, but tries to avoid
> > the ubiquitous symbol to i8* casting. The llvm gen turns cmm into ir; by that point,
> > however, the word size has already been embedded, which means that both the current
> > textual llvm gen and the bitcode llvm gen try to figure out whether a relative access
> > is a multiple of the word size, so that they can use llvm’s getelementptr.
> 
> That sounds interesting, do you know where I could find out more about this?
> (both when it comes to the current LLVM codegen and yours)

In ghc’s llvm code gen, this is usually handled by the functions with the `_fast`
suffix. See [1] and the `genStore_fast` 30 lines further down. My bitcode llvm gen follows
that file [1] almost identically, as can be seen in [2]. However, the `_fast` path is
currently disabled.
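
To make the offset check a bit more concrete, here is a minimal sketch of the idea
behind the `_fast` path. It is not the code from [1] or [2]; the names `Access`,
`classify` and `render` are made up for illustration, and the ir it prints assumes
typed pointers and a pointee that is as wide as a machine word.

```haskell
-- Hypothetical, simplified model; none of these names exist in ghc.
data Access
  = ViaGEP Int       -- index counted in units of the pointee width
  | ViaPtrArith Int  -- raw byte offset, needs ptrtoint/add/inttoptr
  deriving (Eq, Show)

-- | Decide how to address 'byteOff' bytes past a base pointer whose
-- pointee is 'widthBits' bits wide. Cmm only gives us the byte offset,
-- so the word-relative index has to be recovered by division.
classify :: Int -> Int -> Access
classify widthBits byteOff
  | widthBytes > 0 && r == 0 = ViaGEP ix
  | otherwise                = ViaPtrArith byteOff
  where
    widthBytes = widthBits `div` 8
    (ix, r)    = byteOff `divMod` widthBytes

-- | Render the address computation as textual ir (assumption: typed
-- pointers, and the pointee width equals the word width).
render :: String -> Int -> Access -> [String]
render base widthBits access =
  case access of
    ViaGEP ix ->
      [ "%p = getelementptr inbounds " ++ ty ++ ", " ++ ty ++ "* "
          ++ base ++ ", i32 " ++ show ix ]
    ViaPtrArith off ->
      [ "%a = ptrtoint " ++ ty ++ "* " ++ base ++ " to " ++ ty
      , "%b = add " ++ ty ++ " %a, " ++ show off
      , "%p = inttoptr " ++ ty ++ " %b to " ++ ty ++ "*"
      ]
  where
    ty = "i" ++ show widthBits
```

With a 64 bit word, `classify 64 8` picks the getelementptr form with index 1, while
`classify 64 12` has to fall back to integer arithmetic on the pointer, because 12 is
not a multiple of 8.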

An example of the generated ir for the current llvm backend and for the bitcode backend
(textual ir, via llvm-dis) can be found in [3] and [4], respectively.

> 
> > I don’t know if generating llvm from stg instead of cmm would be a better
> > approach, which is what ghcjs and eta do as far as I know.
> 
> Wouldn't a step from STG to LLVM be much harder (LLVM IR is a pretty low-level
> representation compared to STG)? There are also a few passes on the Cmm level
> that seem necessary, e.g., `cmmLayoutStack`.

There is certainly a tradeoff between retaining more high-level information and
having to lower it oneself. If I remember luite correctly, he said he uses an
intermediate format similar to cmm, but richer, which makes it easier to target
javascript. The question basically boils down to whether cmm is already too low-level
for llvm; the embedding of word sizes is an example where I think cmm might be too
low-level for llvm.

—
[1]: https://github.com/ghc/ghc/blob/master/compiler/llvmGen/LlvmCodeGen/CodeGen.hs#L824
[2]: https://github.com/angerman/data-bitcode-plugin/blob/master/src/Data/BitCode/LLVM/Gen.hs
[3]: https://gist.github.com/angerman/32ce9395e73cfea3348fcc7da108cd0a
[4]: https://gist.github.com/angerman/d87db1657aac4e06a0886801aaf44329