The Curious Case of T6084 -or- Register Confusion with LLVM

Wed Sep 20 09:44:33 UTC 2017

Hi *,

TLDR: The LLVM backend might confuse floating registers in GHC.

# Demo (Ticket #14251)

Let Demo.hs be the following short program (a minor modification from T6084):
```
{-# LANGUAGE MagicHash, BangPatterns #-}
module Main where

import GHC.Exts

{-# NOINLINE f #-}
f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"

{-# NOINLINE q #-}
q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)

main = putStrLn (f $ q)
```

What happens if we compile them with the NCG and LLVM?

$ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg
Hello 6.0 6.9 World!

$ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm
Hello 4.0 5.0 World!

# Discussion

What is happening here?  The LLVM backend passes the registers in arguments,
which are then mapped to registers via the GHC calling convention we added
to LLVM.

As the LLVM backend takes off from Cmm, we produce function that always hold
the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
and appends those registers that are live throughout the function call: in the
case of `q` this is one Float and one Double register. Let’s assume these are
F3 and D4.  Thus the function signature we generate looks like:

ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)

And expect the passed arguments to represent the following registers:

              base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4

as we found that f1 and d1 are not live.

Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it 14 arguments
instead of 12.  To make this “typecheck” in LLVM, we

@q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)

and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).

at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
the passed arguments f3 and d4.

(This is where my llvmng backend fell over, as it does not bitcast function
 signatures but tries to unify them.)

# Solution?

Initially, Ben and I though we could simply always pass all registers as
arguments in LLVM and call it a day with the downside of create more verbose
but correct code.  As I found out, that comes with a few complications.  For
some reason, all active stg registers for my machine give me 

  Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
  F1, D1, F2, D2, F3, D3,
  XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
  YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
  ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6

I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor AVX512;
that looks like only a patch away.  However we try to optimize our register, such
that we can pass up to six doubles or six floats or any combination of both if needed
in registers, without having to allocate them on the stack, by assuming overlapping
registers (See Note [Overlapping global registers]).

And as such a full function signature in LLVM would as opposed to one that’s based on
the “live” registers as we have right now, would consist of 12 float/double registers,
and LLVM only maps 6.  My current idea is to, pass only the explicit F1,D1,…,F3,D3
and try to disable the register overlapping for LLVM.  This would probably force more
floating values to be stack allocated rather than passed via registers, but would
likely guarantee that the registers match up.  The other option I can think of is to
define some viertual generic floating registers in the llvm code gen: V1,…,V6
and then perform something like

  F1 <- V1 as float
  D1 <- V1 as double

in the body of the function, while trying to use the `live` information at the call site
to decide which of F1 or D1 to pass as V1.

Ideas?

Cheers,
 Moritz