[GHC] #9481: Linker does not correctly resolve symbols in previously loaded objects

GHC ghc-devs at haskell.org
Wed Aug 20 10:25:44 UTC 2014


#9481: Linker does not correctly resolve symbols in previously loaded objects
-------------------------------------+-------------------------------------
       Reporter:  edsko              |                   Owner:
           Type:  bug                |                  Status:  new
       Priority:  normal             |               Milestone:
      Component:  Compiler           |                 Version:  7.8.2
       Keywords:                     |        Operating System:
   Architecture:  Unknown/Multiple   |  Unknown/Multiple
     Difficulty:  Unknown            |         Type of failure:
     Blocked By:                     |  None/Unknown
Related Tickets:                     |               Test Case:
                                     |                Blocking:
                                     |  Differential Revisions:
-------------------------------------+-------------------------------------
 '''Summary''': Given two object files (created from C files) A and B,
 where B refers to symbols defined in A, if we load B before A, calling
 `resolveObjs` after each object file, symbol resolution goes wrong. Full
 test case attached.

 '''Detailed description''': Consider

 {{{#!c
 // a.c
 #include <stdio.h>

 void defined_in_A() {
   printf("In A\n");
 }
 }}}

 {{{#!c
 // b.c
 #include <stdio.h>

 void defined_in_A();

 void defined_in_B() {
   printf("In B\n");
   defined_in_A();
   printf("In B\n");
 }
 }}}

 and

 {{{#!hs
 {-# LANGUAGE ForeignFunctionInterface #-}
 module Main where

 import System.IO

 foreign import ccall "defined_in_B" defined_in_B :: IO ()

 main :: IO ()
 main = defined_in_B
 }}}

 and we use the GHC API to load the object files `a.o` and `b.o`
 (corresponding to `a.c` and `b.c`), calling `resolveObjs` after loading
 each object, and then load `Main` and call `Main.main`, everything works
 fine (the attached test case contains both `a.c`, `b.c` and `Main.hs` as
 well as the test program that calls into the GHC API).

 However, if we load `b.o` ''before'' `a.o`, things go wrong. When we load
 `b.o` and call `resolveObjs` then `resolveObjs` (quite reasonably)
 complains that

 {{{
 lookupSymbol failed in relocateSection (relocate external)
 b.o: unknown symbol `_defined_in_A'
 }}}

 since we haven't loaded `a.o` yet. But when we then load `a.o` and call
 `resolveObjs` again, `resolveObjs` reports okay, but something goes wrong
 because the program subsequently segfaults.

 '''A successful run'''

 I took a closer look at what happens exactly. First, let's consider the
 test run where we load A before B, and everything works fine, and let's
 look at the precise code that we actually execute when `Main.main` does
 the foreign call to `defined_in_B`:

 {{{
 # lldb Linkerbug
 Current executable set to 'Linkerbug' (x86_64).

 (lldb) breakpoint set -n ffi_call
 Breakpoint 1: where = Linkerbug`ffi_call + 29 at ffi64.c:421, address =
 0x0000000102be2eed

 (lldb) run
 Process 92361 launched: 'Linkerbug' (x86_64)
 Loading object "a.o"
   symbol resolution ok
 Loading object "b.o"
   symbol resolution ok
 Loading Haskell module "Main.hs"
   ok
 Running "Main.main"
 Process 92361 stopped
 * thread #1: tid = 0x32048b, 0x0000000102be2eed
 Linkerbug`ffi_call(cif=0x000000010cc68500, fn=0x0000000104c6c210,
 rvalue=0x00007fff5fbff920, avalue=0x00007fff5fbff8c0) + 29 at ffi64.c:421,
 queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
     frame #0: 0x0000000102be2eed
 Linkerbug`ffi_call(cif=0x000000010cc68500, fn=0x0000000104c6c210,
 rvalue=0x00007fff5fbff920, avalue=0x00007fff5fbff8c0) + 29 at ffi64.c:421

 (lldb) disassemble -c 14 -s fn
    0x104c6c210:  pushq  %rbp
    0x104c6c211:  movq   %rsp, %rbp
    0x104c6c214:  pushq  %rbx
    0x104c6c215:  pushq  %rax
    0x104c6c216:  leaq   0x1d(%rip), %rbx
    0x104c6c21d:  movq   %rbx, %rdi
    0x104c6c220:  callq  0x104c6c3c8
    0x104c6c225:  xorl   %eax, %eax
    0x104c6c227:  callq  0x104c6b210
    0x104c6c22c:  movq   %rbx, %rdi
    0x104c6c22f:  addq   $0x8, %rsp
    0x104c6c233:  popq   %rbx
    0x104c6c234:  popq   %rbp
    0x104c6c235:  jmpq   0x104c6c3c8
 }}}

 This disassembly is the compiled code for B which, in order, does a call
 to `printf`, then to `defined_in_A`, and then back to `printf` (the last
 call is `jmpq` rather than `callq`: the C compiler did a tail call
 optimization). Let's make sure that this is actually true; the first call
 is a call to

 {{{
 (lldb) disassemble -c 2 -s 0x104c6c3c8
    0x104c6c3c8:  jmpq   *-0xe(%rip)
    0x104c6c3ce:  addb   %al, (%rax)
 }}}

 which is an indirect jump to

 {{{
 (lldb) memory read -f A -c 1 "0x104c6c3ce - 0xe"
 0x104c6c3c0: 0x00007fff84fd5b9b libsystem_c.dylib`puts
 }}}

 to `puts`, okay, good. Seems an unnecessary level of indirection, but
 that's okay. The next call is to

 {{{
 (lldb) disassemble -c 5 -s 0x104c6b210
    0x104c6b210:  pushq  %rbp
    0x104c6b211:  movq   %rsp, %rbp
    0x104c6b214:  leaq   0x6(%rip), %rdi
    0x104c6b21b:  popq   %rbp
    0x104c6b21c:  jmpq   0x104c6b370
 }}}

 which is the compiled code for `defined_in_A`, which should be just a call
 to `printf`. Let's confirm:

 {{{
 (lldb) disassemble -c 2 -s 0x104c6b370
    0x104c6b370:  jmpq   *-0xe(%rip)
    0x104c6b376:  addb   %al, (%rax)

 (lldb) memory read -f A -c 1 "0x104c6b376 - 0xe"
 0x104c6b368: 0x00007fff84fd5b9b libsystem_c.dylib`puts
 }}}

 Ok, all good.

 '''A failed run'''

 Now let's check what happens when we load the objects in the opposite
 order. The code that we execute is

 {{{
 (lldb) disassemble -c 14 -s fn
    0x104c6b210:  pushq  %rbp
    0x104c6b211:  movq   %rsp, %rbp
    0x104c6b214:  pushq  %rbx
    0x104c6b215:  pushq  %rax
    0x104c6b216:  leaq   0x1d(%rip), %rbx
    0x104c6b21d:  movq   %rbx, %rdi
    0x104c6b220:  callq  0x104c6b3c8
    0x104c6b225:  xorl   %eax, %eax
    0x104c6b227:  callq  0x104c6c210
    0x104c6b22c:  movq   %rbx, %rdi
    0x104c6b22f:  addq   $0x8, %rsp
    0x104c6b233:  popq   %rbx
    0x104c6b234:  popq   %rbp
    0x104c6b235:  jmpq   0x104c6b556
 }}}

 This immediately looks suspicious: the address of the `jmpq` (the second
 -- tail -- call to `printf`) is different from the first. The first
 `callq` is okay:

 {{{
 (lldb) disassemble -c 2 -s 0x104c6b3c8
    0x104c6b3c8:  jmpq   *-0xe(%rip)
    0x104c6b3ce:  addb   %al, (%rax)

 (lldb) memory read -f A -c 1 "0x104c6b3ce - 0xe"
 0x104c6b3c0: 0x00007fff84fd5b9b libsystem_c.dylib`puts
 }}}

 and the call to `defined_in_A`, as well as the code ''for''
 `defined_in_A`, are both ok:

 {{{
 (lldb) disassemble -c 5 -s 0x104c6c210
    0x104c6c210:  pushq  %rbp
    0x104c6c211:  movq   %rsp, %rbp
    0x104c6c214:  leaq   0x6(%rip), %rdi
    0x104c6c21b:  popq   %rbp
    0x104c6c21c:  jmpq   0x104c6c370

 (lldb) disassemble -c 2 -s 0x104c6c370
    0x104c6c370:  jmpq   *-0xe(%rip)
    0x104c6c376:  addb   %al, (%rax)

 (lldb) memory read -f A -c 1 "0x104c6c376 - 0xe"
 0x104c6c368: 0x00007fff84fd5b9b libsystem_c.dylib`puts
 }}}

 However, that second call to `printf` (the `jmpq`) in `defined_in_B` is
 indeed wrong: it's jumping into nowhere:

 {{{
 (lldb) memory read -c 8 0x104c6b556
 0x104c6b556: 00 00 00 00 00 00 00 00                          ........
 }}}

 Note that somewhat surprisingly is it ''not'' the resolution of the symbol
 from `a.o` that is wrong; that ''does'' get resolved okay once we load
 `a.o`; rather, it is the jump to `puts` which is wrong (and then only the
 one of them).

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9481>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list