LLVM and dynamic linking
Carter Schonwald
carter.schonwald at gmail.com
Fri Dec 27 20:41:10 UTC 2013
great work! :)
On Fri, Dec 27, 2013 at 3:21 PM, Ben Gamari <bgamari.foss at gmail.com> wrote:
> Simon Marlow <marlowsd at gmail.com> writes:
>
> > This sounds right to me. Did you submit a patch?
> >
> > Note that dynamic linking with LLVM is likely to produce significantly
> > worse code than with the NCG right now, because the LLVM back end uses
> > dynamic references even for symbols in the same package, whereas the NCG
> > back-end uses direct static references for these.
> >
> Today with the help of Edward Yang I examined the code produced by the
> LLVM backend in light of this statement. I was surprised to find that
> LLVM's code appears to be no worse than the NCG with respect to
> intra-package references.
>
> My test case can be found here[2] and can be built with the included
> `build.sh` script. The test consists of two modules built into a shared
> library. One module, `LibTest`, exports a few simple members while the
> other module (`LibTest2`) defines members that consume them. Care is
> taken to ensure the members are not inlined.
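A minimal sketch of how the "not inlined" requirement might be expressed, assuming `NOINLINE` pragmas are the mechanism; the actual sources in [2] may do this differently. The two modules are merged into one file here so that it loads on its own in GHCi, and the definitions are the ones quoted in the evaluation.

```haskell
-- Sketch only: the real test keeps these in two modules (LibTest and
-- LibTest2). They are merged here so the file can be loaded by itself.
-- The NOINLINE pragmas are an assumption about how inlining is prevented.

-- Would live in LibTest.
helloWorld :: String
helloWorld = "Hello World!"
{-# NOINLINE helloWorld #-}

infoRef :: Int -> Int
infoRef n = n + 1
{-# NOINLINE infoRef #-}

-- Would live in LibTest2: each member only consumes a LibTest export,
-- so the generated code must contain a reference to the other module.
testHelloWorld :: IO String
testHelloWorld = return helloWorld

testInfoRef :: IO Int
testInfoRef = return (infoRef 2)
```

Without such a pragma GHC would be free to inline the small definitions at `-O1`, and the use site would then contain no intra-package reference to inspect.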
>
> The tests were done on x86_64 running LLVM 3.4 and GHC HEAD with the
> patches[1] I referred to in my last message. Please let me know if I've
> missed something.
>
>
>
> # Evaluation
>
> ## First example ##
>
> The first member is a simple `String` (defined in `LibTest`),
>
> helloWorld :: String
> helloWorld = "Hello World!"
>
> The use-site is quite straightforward,
>
> testHelloWorld :: IO String
> testHelloWorld = return helloWorld
>
> With `-O1` the code looks reasonable in both cases. Most importantly,
> both backends use IP relative addressing to find the symbol.
>
> ### LLVM ###
>
> 0000000000000ef8 <rKw_info>:
> ef8: 48 8b 45 00 mov 0x0(%rbp),%rax
> efc: 48 8d 1d cd 11 20 00 lea 0x2011cd(%rip),%rbx # 2020d0 <libtestzm0zi1zi0zi0_LibTest_helloWorld_closure>
> f03: ff e0 jmpq *%rax
>
> 0000000000000f28 <libtestzm0zi1zi0zi0_LibTest2_testHelloWorld_info>:
> f28: eb ce jmp ef8 <rKw_info>
> f2a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
>
> ### NCG ###
>
> 0000000000000d58 <rH1_info>:
> d58: 48 8d 1d 71 13 20 00 lea 0x201371(%rip),%rbx # 2020d0 <libtestzm0zi1zi0zi0_LibTest_helloWorld_closure>
> d5f: ff 65 00 jmpq *0x0(%rbp)
>
> 0000000000000d88 <libtestzm0zi1zi0zi0_LibTest2_testHelloWorld_info>:
> d88: eb ce jmp d58 <rH1_info>
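
The IP-relative addressing in the two listings above can be checked by hand: the displacement in each `lea` is added to the address of the following instruction. A small sketch of that arithmetic, using only the addresses and displacements printed in the disassembly (the helper names `ripTarget` and `hex` are made up for illustration):

```haskell
import Numeric (showHex)

-- | Address referred to by a RIP-relative operand: the displacement is
-- relative to the *next* instruction (instruction address + its length).
ripTarget :: Integer -> Integer -> Integer -> Integer
ripTarget addr len disp = addr + len + disp

-- | Render an address as lowercase hex without a prefix, as objdump does.
hex :: Integer -> String
hex n = showHex n ""

-- The `lea` from each backend's listing; both encodings are 7 bytes long.
llvmHelloWorld, ncgHelloWorld :: Integer
llvmHelloWorld = ripTarget 0xefc 7 0x2011cd  -- LLVM: rKw_info
ncgHelloWorld  = ripTarget 0xd58 7 0x201371  -- NCG:  rH1_info
```

In GHCi, `hex llvmHelloWorld` and `hex ncgHelloWorld` both evaluate to `"2020d0"`, the address objdump annotates as `helloWorld_closure`, so neither backend goes through an indirection table for this symbol.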
>
>
> With `-O0` the code is substantially longer but the relocation behavior
> is still correct, as one would expect.
>
> Looking at the definition of `helloWorld`[3] itself it becomes clear that
> the LLVM backend is more likely to use PLT relocations over GOT. In
> general, `stg_*` primitives are called through the PLT. As far as I can
> tell, both of these call mechanisms will incur two memory
> accesses. However, in the case of the PLT the call will consist of two
> JMPs whereas the GOT will consist of only one. Is this a cause for
> concern? Could these two jumps interfere with prediction?
>
> In general the LLVM backend produces a few more instructions than the
> NCG although this doesn't appear to be related to handling of
> relocations. For instance, the inexplicable (to me) `mov` at the
> beginning of LLVM's `rKw_info`.
>
>
> ## Second example ##
>
> The second example demonstrates an actual call,
>
> -- Definition (in LibTest)
> infoRef :: Int -> Int
> infoRef n = n + 1
>
> -- Call site
> testInfoRef :: IO Int
> testInfoRef = return (infoRef 2)
>
> With `-O1` this produces the following code,
>
> ### LLVM ###
>
> 0000000000000fb0 <rLy_info>:
> fb0: 48 8b 45 00 mov 0x0(%rbp),%rax
> fb4: 48 8d 1d a5 10 20 00 lea 0x2010a5(%rip),%rbx # 202060 <rLx_closure>
> fbb: ff e0 jmpq *%rax
>
> 0000000000000fe0 <libtestzm0zi1zi0zi0_LibTest2_testInfoRef_info>:
> fe0: eb ce jmp fb0 <rLy_info>
>
> ### NCG ###
>
> 0000000000000e10 <rI3_info>:
> e10: 48 8d 1d 51 12 20 00 lea 0x201251(%rip),%rbx # 202068 <rI2_closure>
> e17: ff 65 00 jmpq *0x0(%rbp)
>
> 0000000000000e40 <libtestzm0zi1zi0zi0_LibTest2_testInfoRef_info>:
> e40: eb ce jmp e10 <rI3_info>
>
> Again, LLVM is a bit more verbose but appears to handle
> intra-package calls efficiently.
>
>
>
> [1] https://github.com/bgamari/ghc/commits/llvm-dynamic
> [2] https://github.com/bgamari/ghc-linking-tests/tree/master/ghc-test
> [3] `helloWorld` definitions:
>
> LLVM:
> 00000000000010a8 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info>:
> 10a8: 50 push %rax
> 10a9: 4c 8d 75 f0 lea -0x10(%rbp),%r14
> 10ad: 4d 39 fe cmp %r15,%r14
> 10b0: 73 07 jae 10b9 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x11>
> 10b2: 49 8b 45 f0 mov -0x10(%r13),%rax
> 10b6: 5a pop %rdx
> 10b7: ff e0 jmpq *%rax
> 10b9: 4c 89 ef mov %r13,%rdi
> 10bc: 48 89 de mov %rbx,%rsi
> 10bf: e8 0c fd ff ff callq dd0 <newCAF@plt>
> 10c4: 48 85 c0 test %rax,%rax
> 10c7: 74 22 je 10eb <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x43>
> 10c9: 48 8b 0d 18 0f 20 00 mov 0x200f18(%rip),%rcx # 201fe8 <_DYNAMIC+0x228>
> 10d0: 48 89 4d f0 mov %rcx,-0x10(%rbp)
> 10d4: 48 89 45 f8 mov %rax,-0x8(%rbp)
> 10d8: 48 8d 05 21 00 00 00 lea 0x21(%rip),%rax # 1100 <cJC_str>
> 10df: 4c 89 f5 mov %r14,%rbp
> 10e2: 49 89 c6 mov %rax,%r14
> 10e5: 58 pop %rax
> 10e6: e9 b5 fc ff ff jmpq da0 <ghczmprim_GHCziCString_unpackCStringzh_info@plt>
> 10eb: 48 8b 03 mov (%rbx),%rax
> 10ee: 5a pop %rdx
> 10ef: ff e0 jmpq *%rax
>
>
> NCG:
>
> 0000000000000ef8 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info>:
> ef8: 48 8d 45 f0 lea -0x10(%rbp),%rax
> efc: 4c 39 f8 cmp %r15,%rax
> eff: 72 3f jb f40 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x48>
> f01: 4c 89 ef mov %r13,%rdi
> f04: 48 89 de mov %rbx,%rsi
> f07: 48 83 ec 08 sub $0x8,%rsp
> f0b: b8 00 00 00 00 mov $0x0,%eax
> f10: e8 1b fd ff ff callq c30 <newCAF@plt>
> f15: 48 83 c4 08 add $0x8,%rsp
> f19: 48 85 c0 test %rax,%rax
> f1c: 74 20 je f3e <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x46>
> f1e: 48 8b 1d cb 10 20 00 mov 0x2010cb(%rip),%rbx # 201ff0 <_DYNAMIC+0x238>
> f25: 48 89 5d f0 mov %rbx,-0x10(%rbp)
> f29: 48 89 45 f8 mov %rax,-0x8(%rbp)
> f2d: 4c 8d 35 1c 00 00 00 lea 0x1c(%rip),%r14 # f50 <cGG_str>
> f34: 48 83 c5 f0 add $0xfffffffffffffff0,%rbp
> f38: ff 25 7a 10 20 00 jmpq *0x20107a(%rip) # 201fb8 <_DYNAMIC+0x200>
> f3e: ff 23 jmpq *(%rbx)
> f40: 41 ff 65 f0 jmpq *-0x10(%r13)
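
The PLT and GOT references in the two `helloWorld` listings above can be cross-checked from the instruction bytes alone. The sketch below decodes the little-endian 32-bit displacements as signed offsets from the next instruction; the helper names (`rel32`, `target`, `hex`) are invented for illustration, and the byte values are copied from the disassembly.

```haskell
import Numeric (showHex)

-- | Decode a little-endian 32-bit displacement, given as byte values,
-- into a signed offset.
rel32 :: [Integer] -> Integer
rel32 bytes
  | unsigned >= 2 ^ 31 = unsigned - 2 ^ 32
  | otherwise          = unsigned
  where
    unsigned = foldr (\b acc -> acc * 256 + b) 0 bytes

-- | Address an operand refers to: the displacement is relative to the
-- next instruction (instruction address + instruction length).
target :: Integer -> Integer -> [Integer] -> Integer
target addr len disp = addr + len + rel32 disp

-- | Render an address as lowercase hex without a prefix, as objdump does.
hex :: Integer -> String
hex n = showHex n ""

-- LLVM: `10e6: e9 b5 fc ff ff` is a direct 5-byte jmp to a PLT stub.
llvmUnpackPlt :: Integer
llvmUnpackPlt = target 0x10e6 5 [0xb5, 0xfc, 0xff, 0xff]

-- NCG: `f38: ff 25 7a 10 20 00` is a 6-byte indirect jmp through a slot.
ncgUnpackSlot :: Integer
ncgUnpackSlot = target 0xf38 6 [0x7a, 0x10, 0x20, 0x00]

-- Both backends reach newCAF with a 5-byte `call rel32` into the PLT.
llvmNewCAF, ncgNewCAF :: Integer
llvmNewCAF = target 0x10bf 5 [0x0c, 0xfd, 0xff, 0xff]
ncgNewCAF  = target 0xf10  5 [0x1b, 0xfd, 0xff, 0xff]
```

In GHCi, `hex llvmUnpackPlt` gives `"da0"` and `hex ncgUnpackSlot` gives `"201fb8"`, matching the objdump annotations: LLVM jumps to a PLT stub that then jumps again, while the NCG jumps once through the table slot. `hex llvmNewCAF` and `hex ncgNewCAF` give `"dd0"` and `"c30"`, the two `newCAF@plt` stubs.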
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
>