ANN: preview of alternative llvm code gen

Tue Sep 19 06:38:45 UTC 2017

Hi, 

as presented during the HIW at ICFP 2016, and ICFP 2017, I’ve
been spending a some time on an alternative llvm backend.

GHC used LLVM as an external tool, and communicates with LLVM
via LLVMs Intermediate Representation.  For this the LLVM
backend in GHC (via `-fllvm`) writes out text files containing
the textual LLVM IR, and feeds those into LLVMs the `opt` and
`llc` tools to produce the final machine code.

The textual IR has been changing quite a bit for each release
in the past. But seems to have been quite stable from LLVM3.9
through LLVM4 and now LLVM5.

However LLVM also has a binary IR called LLVM Bitcode, which
is a very stable format and the one they advise to work against.
Sadly, the documentation on it is mostly contained in the
BitcodeReader and BitcodeWriter C++ files from the LLVM project.

As Cmm hands out only labels for function without function
signatures, and the LLVM backend wants to be a good consumer,
we assume functions to always be pointers to int8 (i8*), as
LLVM expects them to be properly typed.  Of course when
defining functions we do have the full function signature and
also need it to pass arguments.  Yet to make i8* assumption
work, we create aliases for each function we define, that are
of type i8*.

On macOS (and iOS) where the mach-o format is used, the linker
uses a feature called `subsections_via_symbols` to strip dead
code. This works, by assuming that all code between two symbols
belongs to the first symbol.  GHCs use of Tables Next To Code,
and the `prefixdata` feature that was added to llvm just for
this purpose however put data in front of symbols, and the
linker will strip that data if it determines the previous symbol
is not used. Thus we can not strip any code on macOS or iOS
produced by GHC. The NCG inserts `$dsp` suffix symbols in front
of the TNTC data, and marks those as used, to work around the
dead-strip issue. The current llvm backend uses the LLVMManger
to strip out the `.symbols_via_subsections` directive from the
generated assembly prior to handing it off to the assembler.

I therefore sat out to try and see if I could fix some of
the issues we ran into. This has lead to building three libraries:

- data-bitcode (github.com/angerman/data-bitcode)
  A bitcode reader/writer. In itself Bitcode is a rather simple
  encoding format. It is based on sequences of bits as opposed
  to bytes. And encoded so called blocks (with IDs), that can
  give meaning to records (think: structs) that are in those
  blocks. It also comes with a compression mechanism, where one
  can define so called abbreviated records (think: packed structs).

- data-bitcode-llvm (github.com/angerman/data-bitcode-llvm)
  A package to model LLVM modules, and lower them into the
  bitcode AST

- data-bitcode-edsl (github.com/angerman/data-bitcode-edsl)
  A Haskell EDSL, that allows to construct LLVM modules. E.g.

          testModule :: Module
          testModule = mod "undef"
            [ def "main" ([i32, ptr =<< i8ptr] --> i32) $ \[ argc, argv ] -> mdo
                block "entry" $ do
                  mem <- undef =<< (arr 10 =<< i8)
                  memG <- global (mutable . private) "mem" mem
                  ptr <- gep memG =<< sequence [int32 0, int32 0]
                  memset <- fun "llvm.memset.p0i8.i32" =<< [i8ptr, i8, i32, i32, i1] --> void
                  ccall memset =<< (ptr:) <$> sequence [ int8 0, int32 10, int32 4, int 1 0 ]
                  ret =<< int32 0
                pure ()
            ]

  More examples can be found in github.com/angerman/data-bitcode-edsl/blob/master/test/EDSLSpec.hs

With this in hand, I went a head and ported the existing
llvm code gen to use the EDSL instead of concatenating
strings.

After this introduction, I’m now pleased to inform you that
the `llvmng` backend now passes fast and slow validation, with
the exception of the peculiar case of T6084 (ghc.haskell.org/t/6084)
where the callee and caller signatures do not match up, and
this causes the `llvmng` backend to topple over.

Another note is in order: the llvmng backend is currently 
quite a bit more memory hungry and time consuming than the
current llvm backend for non trivial modules. We do get dead-strippable
binaries through (for `main = putStrLn “hello world”`):

  1.2M Main8.2-llvm
  972k MainHEAD-llvmng

The relevant code can be found in github.com/zw3rk/ghc/tree/llvm-ng

Cheers,
 Moritz

Validation Results Below:
fast validation:

Unexpected results from:
TEST="MultiLayerModules T12707 T13379 T13701 T13719 T6084 T9630"

SUMMARY for test run started at Tue Sep 19 12:53:34 2017 +08
 0:08:06 spent to go through
    6077 total tests, which gave rise to
   23781 test cases, of which
   16423 were skipped

      17 had missing libraries
    2392 expected passes
      32 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       1 unexpected failures
       6 unexpected stat failures

Unexpected failures:
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)

Unexpected stat failures:
   perf/compiler/T13379.run             T13379 [stat not good enough] (llvmng)
   perf/compiler/T9630.run              T9630 [stat not good enough] (llvmng)
   perf/compiler/T12707.run             T12707 [stat not good enough] (llvmng)
   perf/compiler/MultiLayerModules.run  MultiLayerModules [stat not good enough] (llvmng)
   perf/compiler/T13719.run             T13719 [stat not good enough] (llvmng)
   perf/compiler/T13701.run             T13701 [stat not good enough] (llvmng)

slow validation:

Unexpected results from:
TEST="MultiLayerModules T12707 T13379 T13701 T13719 T6084 T9630"

SUMMARY for test run started at Tue Sep 19 13:14:18 2017 +08
 0:28:57 spent to go through
    6077 total tests, which gave rise to
   23908 test cases, of which
   16302 were skipped

      69 had missing libraries
    6223 expected passes
      72 expected failures

       0 caused framework failures
       0 caused framework warnings
       0 unexpected passes
       3 unexpected failures
      12 unexpected stat failures

Unexpected failures:
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)
   codeGen/should_run/T6084.run  T6084 [exit code non-0] (llvmng)

Unexpected stat failures:
   perf/compiler/T13379.run             T13379 [stat not good enough] (llvmng)
   perf/compiler/T13379.run             T13379 [stat not good enough] (llvmng)
   perf/compiler/T9630.run              T9630 [stat not good enough] (llvmng)
   perf/compiler/T12707.run             T12707 [stat not good enough] (llvmng)
   perf/compiler/T9630.run              T9630 [stat not good enough] (llvmng)
   perf/compiler/T12707.run             T12707 [stat not good enough] (llvmng)
   perf/compiler/T13719.run             T13719 [stat not good enough] (llvmng)
   perf/compiler/MultiLayerModules.run  MultiLayerModules [stat not good enough] (llvmng)
   perf/compiler/T13701.run             T13701 [stat not good enough] (llvmng)
   perf/compiler/T13719.run             T13719 [stat not good enough] (llvmng)
   perf/compiler/MultiLayerModules.run  MultiLayerModules [stat not good enough] (llvmng)
   perf/compiler/T13701.run             T13701 [stat not good enough] (llvmng)