<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div><span></span></div><div>The issue is at the function definition. In the proc-point splitting code we determine that the F1 and D2 registers are not actually used in the body of `q`, and as such we optimize the set of live registers from R1, F1, D2, F3, D4 to R1, F3, D4.</div><div><br></div><div>Thus in <a href="https://phabricator.haskell.org/D4003">https://phabricator.haskell.org/D4003</a></div><div>I simply retain the live registers of the top proc instead of updating them to the optimized set.</div><div><br></div><div>As such we generate the correct function signature in the LLVM backend.</div><div><br><div id="AppleMailSignature">Sent from my iPhone</div><div><br>On 22 Sep 2017, at 2:08 AM, Kavon Farvardin <<a href="mailto:kavon@farvard.in">kavon@farvard.in</a>> wrote:<br><br></div><blockquote type="cite"><div><span>Let me elaborate a bit more because I clearly missed some points you already made in your original message. Sorry about that:</span><br><span></span><br><span></span><br><span>I don't think we need a heavyweight solution to this problem (the suggestions of: disabling overlapping registers for LLVM, or adding a new virtual register class Vx).</span><br><span></span><br><span>Instead, let's first remember how the type of the called function pointer corresponds to its calling convention when it is lowered to assembly in LLVM. 
In our GHC calling convention in LLVM, we can specify that</span><br><span></span><br><span>    if type == float OR type == double, use:</span><br><span>        XMM1,XMM2,XMM3,XMM4,XMM5,XMM6</span><br><span></span><br><span>When a calling convention is being determined by LLVM for any function definition or call, it goes in order from left to right in the list of parameters, and assigns float or double arguments to the first currently available register in that XMM list.</span><br><span></span><br><span>So, if `q` were indeed using F3 and D4 to accept its first two floating point arguments, the function signature we generate,</span><br><span></span><br><span>    ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)</span><br><span></span><br><span>is wrong. The registers for the `float, double` arguments will be assigned to XMM1 and XMM2 by LLVM. Since F3 and D4 use XMM3 and XMM4, respectively, we should have padded out the type of `q` in LLVM to be:</span><br><span></span><br><span></span><br><span>    ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)</span><br><span></span><br><span>where the first `float, double` parameters are now unused. 
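</span><br><span></span><br><span>As a minimal sketch of that left-to-right rule (a toy model in Haskell, not LLVM's actual implementation; the names `Ty`, `assignXmm`, and `alwaysLive` are hypothetical and exist only for this illustration), the two signatures above can be compared like this:</span><br>
<pre>
```haskell
module Main where

-- Hypothetical model of the parameter types that appear in the signatures.
data Ty = I64 | I64Ptr | F32 | F64
  deriving (Eq, Show)

isFloating :: Ty -> Bool
isFloating t = t == F32 || t == F64

-- XMM1..XMM6, the list quoted for float/double in this thread.
xmmRegs :: [String]
xmmRegs = map (\n -> "XMM" ++ show n) [1 .. 6 :: Int]

-- Walk the parameters left to right and give each float/double the next
-- free XMM register. Integer and pointer parameters are skipped here.
-- Assumption: at most six floating-point parameters; spilling to the
-- stack is not modeled (zip simply stops after XMM6).
assignXmm :: [Ty] -> [(Ty, String)]
assignXmm params = zip (filter isFloating params) xmmRegs

-- The ten always-live parameters: Base, Sp, Hp, R1..R6, SpLim.
alwaysLive :: [Ty]
alwaysLive = replicate 3 I64Ptr ++ replicate 7 I64

main :: IO ()
main = do
  -- Unpadded signature: (..., float, double)
  print (assignXmm (alwaysLive ++ [F32, F64]))
  -- Padded signature: (..., float, double, float, double)
  print (assignXmm (alwaysLive ++ [F32, F64, F32, F64]))
```
</pre>
<span>Under this model the unpadded signature puts its trailing `float, double` in XMM1 and XMM2, whereas the padded one puts the last two parameters in XMM3 and XMM4, which is where F3 and D4 live.</span><br><span></span><br><span>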
We would also perform the same type of padding at every call site where the first two floating-point arguments are F3 and D4, so that they end up in the right physical registers.</span><br><span>At those call sites we pass `undef` for the first two (unused) `float, double` arguments.</span><br><span></span><br><span></span><br><span></span><br><blockquote type="cite"><span>On Sep 21, 2017, at 12:32 PM, Kavon Farvardin <<a href="mailto:kavon@farvard.in">kavon@farvard.in</a>> wrote:</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Responses are inline below:</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>As the LLVM backend takes off from Cmm, we produce functions that always take</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>and append those registers that are live throughout the function call: in the</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>case of `q` this is one Float and one Double register.</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>To be more precise, we append only the live floating-point or vector arguments to this always-live list. We need to do this because of overlapping register usage in our calling convention on x86-64 (F1 and D1 are both put in XMM1). See Note [Overlapping global registers] for details.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>Let’s assume these are F3 and D4.  
Thus the function signature we generate looks like:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>and we expect the passed arguments to represent the following registers:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>            base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>as we found that f1 and d2 are not live.</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>I think it's wrong to assume that `q` accepts its first two floating-point arguments in F3 and D4, because I'm pretty sure the standard Cmm calling convention assigns them to F1 and D2, respectively. 
Are we actually outputting `q` such that F3 and D4 are used?</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>(This is where my llvmng backend fell over, as it does not bitcast function</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>signatures but tries to unify them.)</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>I think to solve this problem, we'll want to bitcast functions whenever we call them, because the type of an LLVM function is important for us to get the calling convention correct.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>~kavon</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>On Sep 20, 2017, at 4:44 AM, Moritz Angermann <<a href="mailto:moritz.angermann@gmail.com">moritz.angermann@gmail.com</a>> wrote:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Hi *,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>TLDR: The LLVM backend might confuse floating registers in GHC.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span># Demo (Ticket #14251)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Let Demo.hs be the following short program (a minor modification from T6084):</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>```</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>{-# LANGUAGE MagicHash, BangPatterns #-}</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>module Main where</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>import GHC.Exts</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>{-# NOINLINE f #-}</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>{-# NOINLINE q #-}</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>q :: Int# -> Float# -> Double# -> Float# -> Double# -> String</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>main = putStrLn (f $ q)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><span>```</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>What happens if we compile them with the NCG and LLVM?</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>$ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Hello 6.0 6.9 World!</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>$ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Hello 4.0 5.0 World!</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span># Discussion</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>What is happening here?  
The LLVM backend passes the registers as arguments,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>which are then mapped to registers via the GHC calling convention we added</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>to LLVM.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>As the LLVM backend takes off from Cmm, we produce functions that always take</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>and append those registers that are live throughout the function call: in the</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>case of `q` this is one Float and one Double register. Let’s assume these are</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>F3 and D4.  
Thus the function signature we generate looks like:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>and we expect the passed arguments to represent the following registers:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>            base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>as we found that f1 and d2 are not live.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Yet, when we call `q` in the form of `g` in the body of `f`, we pass it 14 arguments</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>instead of 12.  
To make this “typecheck” in LLVM, we</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>@q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>and call @q'(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>At this point, we now assign f3 <- f1 and d4 <- d2, while silently ignoring</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>the passed arguments f3 and d4.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>(This is where my llvmng backend fell over, as it does not bitcast function</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>signatures but tries to unify them.)</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span># Solution?</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Initially, Ben and I thought we could simply always pass all registers as</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>arguments in LLVM and call it a day with the downside of creating more 
verbose</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>but correct code.  As I found out, that comes with a few complications.  For</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>some reason, the list of all active STG registers on my machine is:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>F1, D1, F2, D2, F3, D3,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>I should not have the YMM* and ZMM* registers, as I have neither AVX nor AVX512;</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>fixing that looks like it is only a patch away.  
However, we try to optimize our register usage, such</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>that we can pass up to six doubles or six floats or any combination of both if needed</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>in registers, without having to allocate them on the stack, by assuming overlapping</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>registers (see Note [Overlapping global registers]).</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>As such, a full function signature in LLVM, as opposed to one that’s based on</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>the “live” registers as we have right now, would consist of 12 float/double registers,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>while LLVM only maps 6.  My current idea is to pass only the explicit F1,D1,…,F3,D3</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>and try to disable the register overlapping for LLVM.  This would probably force more</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>floating-point values to be stack allocated rather than passed via registers, but would</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>likely guarantee that the registers match up.  
The other option I can think of is to</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>define some virtual generic floating-point registers in the LLVM code gen: V1,…,V6</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>and then perform something like</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>F1 <- V1 as float</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>D1 <- V1 as double</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>in the body of the function, while trying to use the `live` information at the call site</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>to decide which of F1 or D1 to pass as V1.</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Ideas?</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Cheers,</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Moritz</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>_______________________________________________</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>ghc-devs mailing list</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span><a 
href="mailto:ghc-devs@haskell.org">ghc-devs@haskell.org</a></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span><a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a></span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote><span></span><br></div></blockquote></div></body></html>