LLVM and dynamic linking

Simon Marlow marlowsd at gmail.com
Wed Jan 8 10:19:40 UTC 2014


On 27/12/13 20:21, Ben Gamari wrote:
> Simon Marlow <marlowsd at gmail.com> writes:
>
>> This sounds right to me.  Did you submit a patch?
>>
>> Note that dynamic linking with LLVM is likely to produce significantly
>> worse code than with the NCG right now, because the LLVM back end uses
>> dynamic references even for symbols in the same package, whereas the NCG
>> back-end uses direct static references for these.
>>
> Today with the help of Edward Yang I examined the code produced by the
> LLVM backend in light of this statement. I was surprised to find that
> LLVM's code appears to be no worse than the NCG with respect to
> intra-package references.
>
> My test case can be found here[2] and can be built with the included
> `build.sh` script. The test consists of two modules built into a shared
> library. One module, `LibTest`, exports a few simple members while the
> other module (`LibTest2`) defines members that consume them. Care is
> taken to ensure the members are not inlined.
>
> The tests were done on x86_64 running LLVM 3.4 and GHC HEAD with the
> patches[1] I referred to in my last message. Please let me know if I've
> missed something.

This is good news; however, what worries me is that I still don't 
understand *why* you got these results.  Where in the LLVM backend is 
the magic that does something special for intra-package references?  I 
know where it is in the NCG backend - CLabel.labelDynamic - but I can't 
see this function used at all in the LLVM backend.  So what is the 
mechanism that lets LLVM optimise these calls?  Is it happening 
magically in the linker, perhaps?  But that would only be possible when 
using -Bsymbolic or -Bsymbolic-functions, which is a choice made at link 
time.
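
To make the linker question concrete, here is a deliberately simplified model of ELF symbol binding. It is a sketch under stated assumptions, not a description of any real linker: objects are searched in load order with the executable first, a reference normally binds to the first object that defines the symbol, and the `symbolic` flag stands in for linking the referencing library with `-Bsymbolic`. The names `Obj` and `resolve` and the object names are hypothetical. The file can be loaded into GHCi.

```haskell
-- Toy model of ELF dynamic symbol binding; not a real linker API.
-- Assumption: objects are searched in load order, executable first.
data Obj = Obj
  { objName    :: String
  , objDefines :: [String]   -- global, default-visibility symbols
  }

-- | Which object does a reference to 'sym' made from 'from' bind to?
-- 'symbolic' models linking 'from' with -Bsymbolic: its own
-- definition wins over any earlier definition in the load order.
resolve :: Bool -> [Obj] -> Obj -> String -> Maybe String
resolve symbolic loadOrder from sym
  | symbolic && sym `elem` objDefines from = Just (objName from)
  | otherwise =
      case [objName o | o <- loadOrder, sym `elem` objDefines o] of
        (o : _) -> Just o
        []      -> Nothing
```

If a hypothetical `app` defines its own `helloWorld` and precedes `libtest.so` in the load order, a reference from inside `libtest.so` binds to the copy in `app` unless `symbolic` is set. That is why the linker could only turn intra-library references into direct ones when the library is linked with `-Bsymbolic` or `-Bsymbolic-functions`.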

As far as I can tell, all we do is pass a flag to llc (in 
DriverPipeline.runPhase) telling it to compile for dynamic/PIC.
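
For comparison, the NCG's choice can be summarised by a toy model. The real `CLabel.labelDynamic` has a different signature and handles many more kinds of label; the `Label` type, `needsIndirection`, `refKind` and `RefKind` below are hypothetical simplifications that assume the only question is whether a label belongs to the package being compiled.

```haskell
-- Hypothetical simplification of the NCG's decision; not GHC's API.
data Label = Label
  { labelPackage :: String   -- package that defines the symbol
  , labelName    :: String
  }

data RefKind
  = Direct        -- direct reference, e.g. lea sym(%rip) or jmp sym
  | ViaGotOrPlt   -- indirection, e.g. a GOT load or a call through the PLT
  deriving (Show, Eq)

-- | Assumption: when building for dynamic linking, only labels from
-- other packages need an indirection; same-package labels do not.
needsIndirection :: Bool -> String -> Label -> Bool
needsIndirection dynamicBuild thisPackage lbl =
  dynamicBuild && labelPackage lbl /= thisPackage

refKind :: Bool -> String -> Label -> RefKind
refKind dynamicBuild thisPackage lbl
  | needsIndirection dynamicBuild thisPackage lbl = ViaGotOrPlt
  | otherwise                                     = Direct
```

If the LLVM path really only passes a PIC flag to llc, nothing playing the role of `thisPackage` reaches LLVM, which is exactly why the results below are puzzling.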

Cheers,
	Simon


>
>
> # Evaluation
>
> ## First example ##
>
> The first member is a simple `String` (defined in `LibTest`),
>
>      helloWorld :: String
>      helloWorld = "Hello World!"
>
> The use-site is quite straightforward,
>
>      testHelloWorld :: IO String
>      testHelloWorld = return helloWorld
>
> With `-O1` the code looks reasonable in both cases. Most importantly,
> both backends use %rip-relative addressing to find the symbol.
>
> ### LLVM ###
>
>      0000000000000ef8 <rKw_info>:
>           ef8:	48 8b 45 00          	mov    0x0(%rbp),%rax
>           efc:	48 8d 1d cd 11 20 00 	lea    0x2011cd(%rip),%rbx        # 2020d0 <libtestzm0zi1zi0zi0_LibTest_helloWorld_closure>
>           f03:	ff e0                	jmpq   *%rax
>
>      0000000000000f28 <libtestzm0zi1zi0zi0_LibTest2_testHelloWorld_info>:
>           f28:	eb ce                	jmp    ef8 <rKw_info>
>           f2a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
>
> ### NCG ###
>
>      0000000000000d58 <rH1_info>:
>       d58:	48 8d 1d 71 13 20 00 	lea    0x201371(%rip),%rbx        # 2020d0 <libtestzm0zi1zi0zi0_LibTest_helloWorld_closure>
>       d5f:	ff 65 00             	jmpq   *0x0(%rbp)
>
>      0000000000000d88 <libtestzm0zi1zi0zi0_LibTest2_testHelloWorld_info>:
>       d88:	eb ce                	jmp    d58 <rH1_info>
>
>
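
The `lea` offsets in the listings above can be checked by hand: on x86_64 the effective address of a `%rip`-relative operand is the displacement plus the address of the *next* instruction. The helper `ripTarget` below is illustrative; the addresses, instruction lengths and displacements are copied from the listings, and both back ends come out at the same closure address, 0x2020d0, computed with `lea` and therefore without any GOT load.

```haskell
-- Effective address of a %rip-relative operand on x86_64:
-- displacement + address of the following instruction.
ripTarget :: Integer  -- address of the instruction
          -> Integer  -- instruction length in bytes
          -> Integer  -- signed 32-bit displacement
          -> Integer
ripTarget addr len disp = addr + len + disp

-- LLVM: efc: lea 0x2011cd(%rip),%rbx  (7 bytes)
llvmHelloWorld :: Integer
llvmHelloWorld = ripTarget 0xefc 7 0x2011cd

-- NCG: d58: lea 0x201371(%rip),%rbx  (7 bytes)
ncgHelloWorld :: Integer
ncgHelloWorld = ripTarget 0xd58 7 0x201371
```

For contrast, the NCG listing in footnote [3] has `mov 0x2010cb(%rip),%rbx` at 0xf1e; its target, 0xf1e + 7 + 0x2010cb = 0x201ff0, is the slot objdump annotates as `_DYNAMIC+0x238`, presumably a GOT entry, and `mov` loads through it rather than computing the address directly.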
> With `-O0` the code is substantially longer but the relocation behavior
> is still correct, as one would expect.
>
> Looking at the definition of `helloWorld`[3] itself, it becomes clear
> that the LLVM backend is more inclined than the NCG to use PLT
> relocations instead of GOT relocations. In general, `stg_*` primitives
> are called through the PLT. As far as I can tell, both of these call
> mechanisms will incur two memory accesses. However, a call through the
> PLT consists of two JMPs, whereas a call through the GOT consists of
> only one. Is this a cause for concern? Could these two jumps interfere
> with branch prediction?
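
The jump-count comparison can be written down as a toy model. It assumes the dynamic linker has already filled in the GOT slot and that each x86_64 PLT stub starts with `jmp *slot(%rip)`; `Step`, `viaPlt`, `viaGot`, `jumpCount` and `gotReads` are illustrative names, not part of any toolchain.

```haskell
-- Toy model of reaching a function defined in another shared object.
-- Assumes the GOT slot has already been resolved.
data Step
  = DirectJump String    -- jmp/call to a PC-relative address
  | IndirectJump String  -- jmp/call *slot(%rip): read target from a GOT slot
  deriving (Show, Eq)

-- Through the PLT: jump to the stub, which then jumps through the GOT.
viaPlt :: String -> [Step]
viaPlt sym = [DirectJump (sym ++ "@plt"), IndirectJump (sym ++ "@got")]

-- Through the GOT directly, like the NCG's `jmpq *0x20107a(%rip)`.
viaGot :: String -> [Step]
viaGot sym = [IndirectJump (sym ++ "@got")]

jumpCount :: [Step] -> Int
jumpCount = length

gotReads :: [Step] -> Int
gotReads steps = length [s | s@(IndirectJump _) <- steps]
```

In this model both paths read the same GOT slot, and the PLT path only adds a direct jump to the stub. It counts control transfers and GOT reads only; it says nothing about branch prediction.
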
>
> In general, the LLVM backend produces a few more instructions than the
> NCG, although this doesn't appear to be related to the handling of
> relocations. One example is the inexplicable (to me) `mov` at the
> beginning of LLVM's `rKw_info`.
>
>
> ## Second example ##
>
> The second example demonstrates an actual call,
>
>      -- Definition (in LibTest)
>      infoRef :: Int -> Int
>      infoRef n = n + 1
>
>      -- Call site
>      testInfoRef :: IO Int
>      testInfoRef = return (infoRef 2)
>
> With `-O1` this produces the following code,
>
> ### LLVM ###
>
>      0000000000000fb0 <rLy_info>:
>           fb0:	48 8b 45 00          	mov    0x0(%rbp),%rax
>           fb4:	48 8d 1d a5 10 20 00 	lea    0x2010a5(%rip),%rbx        # 202060 <rLx_closure>
>           fbb:	ff e0                	jmpq   *%rax
>
>      0000000000000fe0 <libtestzm0zi1zi0zi0_LibTest2_testInfoRef_info>:
>           fe0:	eb ce                	jmp    fb0 <rLy_info>
>
> ### NCG ###
>
>      0000000000000e10 <rI3_info>:
>       e10:	48 8d 1d 51 12 20 00 	lea    0x201251(%rip),%rbx        # 202068 <rI2_closure>
>       e17:	ff 65 00             	jmpq   *0x0(%rbp)
>
>      0000000000000e40 <libtestzm0zi1zi0zi0_LibTest2_testInfoRef_info>:
>       e40:	eb ce                	jmp    e10 <rI3_info>
>
> Again, LLVM is a bit more verbose but seems to handle
> intra-package calls efficiently.
>
>
>
> [1] https://github.com/bgamari/ghc/commits/llvm-dynamic
> [2] https://github.com/bgamari/ghc-linking-tests/tree/master/ghc-test
> [3] `helloWorld` definitions:
>
> LLVM:
>      00000000000010a8 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info>:
>          10a8:	50                   	push   %rax
>          10a9:	4c 8d 75 f0          	lea    -0x10(%rbp),%r14
>          10ad:	4d 39 fe             	cmp    %r15,%r14
>          10b0:	73 07                	jae    10b9 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x11>
>          10b2:	49 8b 45 f0          	mov    -0x10(%r13),%rax
>          10b6:	5a                   	pop    %rdx
>          10b7:	ff e0                	jmpq   *%rax
>          10b9:	4c 89 ef             	mov    %r13,%rdi
>          10bc:	48 89 de             	mov    %rbx,%rsi
>          10bf:	e8 0c fd ff ff       	callq  dd0 <newCAF@plt>
>          10c4:	48 85 c0             	test   %rax,%rax
>          10c7:	74 22                	je     10eb <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x43>
>          10c9:	48 8b 0d 18 0f 20 00 	mov    0x200f18(%rip),%rcx        # 201fe8 <_DYNAMIC+0x228>
>          10d0:	48 89 4d f0          	mov    %rcx,-0x10(%rbp)
>          10d4:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
>          10d8:	48 8d 05 21 00 00 00 	lea    0x21(%rip),%rax        # 1100 <cJC_str>
>          10df:	4c 89 f5             	mov    %r14,%rbp
>          10e2:	49 89 c6             	mov    %rax,%r14
>          10e5:	58                   	pop    %rax
>          10e6:	e9 b5 fc ff ff       	jmpq   da0 <ghczmprim_GHCziCString_unpackCStringzh_info@plt>
>          10eb:	48 8b 03             	mov    (%rbx),%rax
>          10ee:	5a                   	pop    %rdx
>          10ef:	ff e0                	jmpq   *%rax
>
>
> NCG:
>
>      0000000000000ef8 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info>:
>       ef8:	48 8d 45 f0          	lea    -0x10(%rbp),%rax
>       efc:	4c 39 f8             	cmp    %r15,%rax
>       eff:	72 3f                	jb     f40 <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x48>
>       f01:	4c 89 ef             	mov    %r13,%rdi
>       f04:	48 89 de             	mov    %rbx,%rsi
>       f07:	48 83 ec 08          	sub    $0x8,%rsp
>       f0b:	b8 00 00 00 00       	mov    $0x0,%eax
>       f10:	e8 1b fd ff ff       	callq  c30 <newCAF@plt>
>       f15:	48 83 c4 08          	add    $0x8,%rsp
>       f19:	48 85 c0             	test   %rax,%rax
>       f1c:	74 20                	je     f3e <libtestzm0zi1zi0zi0_LibTest_helloWorld_info+0x46>
>       f1e:	48 8b 1d cb 10 20 00 	mov    0x2010cb(%rip),%rbx        # 201ff0 <_DYNAMIC+0x238>
>       f25:	48 89 5d f0          	mov    %rbx,-0x10(%rbp)
>       f29:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
>       f2d:	4c 8d 35 1c 00 00 00 	lea    0x1c(%rip),%r14        # f50 <cGG_str>
>       f34:	48 83 c5 f0          	add    $0xfffffffffffffff0,%rbp
>       f38:	ff 25 7a 10 20 00    	jmpq   *0x20107a(%rip)        # 201fb8 <_DYNAMIC+0x200>
>       f3e:	ff 23                	jmpq   *(%rbx)
>       f40:	41 ff 65 f0          	jmpq   *-0x10(%r13)
>



More information about the ghc-devs mailing list