[GHC] #9481: Linker does not correctly resolve symbols in previously loaded objects
GHC
ghc-devs at haskell.org
Wed Aug 20 10:25:44 UTC 2014
#9481: Linker does not correctly resolve symbols in previously loaded objects
-------------------------------------+-------------------------------------
Reporter: edsko | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.2
Keywords: | Operating System:
Architecture: Unknown/Multiple | Unknown/Multiple
Difficulty: Unknown | Type of failure:
Blocked By: | None/Unknown
Related Tickets: | Test Case:
| Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
'''Summary''': Given two object files (created from C files) A and B,
where B refers to symbols defined in A, if we load B before A, calling
`resolveObjs` after each object file, symbol resolution goes wrong. Full
test case attached.
'''Detailed description''': Consider
{{{#!c
// a.c
#include <stdio.h>
void defined_in_A() {
printf("In A\n");
}
}}}
{{{#!c
// b.c
#include <stdio.h>
void defined_in_A();
void defined_in_B() {
printf("In B\n");
defined_in_A();
printf("In B\n");
}
}}}
and
{{{#!hs
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
import System.IO
foreign import ccall "defined_in_B" defined_in_B :: IO ()
main :: IO ()
main = defined_in_B
}}}
and we use the GHC API to load the object files `a.o` and `b.o`
(corresponding to `a.c` and `b.c`), calling `resolveObjs` after loading
each object, and then load `Main` and call `Main.main`, everything works
fine (the attached test case contains both `a.c`, `b.c` and `Main.hs` as
well as the test program that calls into the GHC API).
However, if we load `b.o` ''before'' `a.o`, things go wrong. When we load
`b.o` and call `resolveObjs` then `resolveObjs` (quite reasonably)
complains that
{{{
lookupSymbol failed in relocateSection (relocate external)
b.o: unknown symbol `_defined_in_A'
}}}
since we haven't loaded `a.o` yet. But when we then load `a.o` and call
`resolveObjs` again, `resolveObjs` reports okay, but something goes wrong
because the program subsequently segfaults.
'''A successful run'''
I took a closer look at what happens exactly. First, let's consider the
test run where we load A before B, and everything works fine, and let's
look at the precise code that we actually execute when `Main.main` does
the foreign call to `defined_in_B`:
{{{
# lldb Linkerbug
Current executable set to 'Linkerbug' (x86_64).
(lldb) breakpoint set -n ffi_call
Breakpoint 1: where = Linkerbug`ffi_call + 29 at ffi64.c:421, address =
0x0000000102be2eed
(lldb) run
Process 92361 launched: 'Linkerbug' (x86_64)
Loading object "a.o"
symbol resolution ok
Loading object "b.o"
symbol resolution ok
Loading Haskell module "Main.hs"
ok
Running "Main.main"
Process 92361 stopped
* thread #1: tid = 0x32048b, 0x0000000102be2eed
Linkerbug`ffi_call(cif=0x000000010cc68500, fn=0x0000000104c6c210,
rvalue=0x00007fff5fbff920, avalue=0x00007fff5fbff8c0) + 29 at ffi64.c:421,
queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000102be2eed
Linkerbug`ffi_call(cif=0x000000010cc68500, fn=0x0000000104c6c210,
rvalue=0x00007fff5fbff920, avalue=0x00007fff5fbff8c0) + 29 at ffi64.c:421
(lldb) disassemble -c 14 -s fn
0x104c6c210: pushq %rbp
0x104c6c211: movq %rsp, %rbp
0x104c6c214: pushq %rbx
0x104c6c215: pushq %rax
0x104c6c216: leaq 0x1d(%rip), %rbx
0x104c6c21d: movq %rbx, %rdi
0x104c6c220: callq 0x104c6c3c8
0x104c6c225: xorl %eax, %eax
0x104c6c227: callq 0x104c6b210
0x104c6c22c: movq %rbx, %rdi
0x104c6c22f: addq $0x8, %rsp
0x104c6c233: popq %rbx
0x104c6c234: popq %rbp
0x104c6c235: jmpq 0x104c6c3c8
}}}
This disassembly is the compiled code for B which, in order, does a call
to `printf`, then to `defined_in_A`, and then back to `printf` (the last
call is `jmpq` rather than `callq`: the C compiler did a tail call
optimization). Let's make sure that this is actually true; the first call
is a call to
{{{
(lldb) disassemble -c 2 -s 0x104c6c3c8
0x104c6c3c8: jmpq *-0xe(%rip)
0x104c6c3ce: addb %al, (%rax)
}}}
which is an indirect jump to
{{{
(lldb) memory read -f A -c 1 "0x104c6c3ce - 0xe"
0x104c6c3c0: 0x00007fff84fd5b9b libsystem_c.dylib`puts
}}}
to `puts`, okay, good. Seems an unnecessary level of indirection, but
that's okay. The next call is to
{{{
(lldb) disassemble -c 5 -s 0x104c6b210
0x104c6b210: pushq %rbp
0x104c6b211: movq %rsp, %rbp
0x104c6b214: leaq 0x6(%rip), %rdi
0x104c6b21b: popq %rbp
0x104c6b21c: jmpq 0x104c6b370
}}}
which is the compiled code for `defined_in_A`, which should be just a call
to `printf`. Let's confirm:
{{{
(lldb) disassemble -c 2 -s 0x104c6b370
0x104c6b370: jmpq *-0xe(%rip)
0x104c6b376: addb %al, (%rax)
(lldb) memory read -f A -c 1 "0x104c6b376 - 0xe"
0x104c6b368: 0x00007fff84fd5b9b libsystem_c.dylib`puts
}}}
Ok, all good.
'''A failed run'''
Now let's check what happens when we load the objects in the opposite
order. The code that we execute is
{{{
(lldb) disassemble -c 14 -s fn
0x104c6b210: pushq %rbp
0x104c6b211: movq %rsp, %rbp
0x104c6b214: pushq %rbx
0x104c6b215: pushq %rax
0x104c6b216: leaq 0x1d(%rip), %rbx
0x104c6b21d: movq %rbx, %rdi
0x104c6b220: callq 0x104c6b3c8
0x104c6b225: xorl %eax, %eax
0x104c6b227: callq 0x104c6c210
0x104c6b22c: movq %rbx, %rdi
0x104c6b22f: addq $0x8, %rsp
0x104c6b233: popq %rbx
0x104c6b234: popq %rbp
0x104c6b235: jmpq 0x104c6b556
}}}
This immediately looks suspicious: the address of the `jmpq` (the second
-- tail -- call to `printf`) is different from the first. The first
`callq` is okay:
{{{
(lldb) disassemble -c 2 -s 0x104c6b3c8
0x104c6b3c8: jmpq *-0xe(%rip)
0x104c6b3ce: addb %al, (%rax)
(lldb) memory read -f A -c 1 "0x104c6b3ce - 0xe"
0x104c6b3c0: 0x00007fff84fd5b9b libsystem_c.dylib`puts
}}}
and the call to `defined_in_A`, as well as the code ''for''
`defined_in_A`, are both ok:
{{{
(lldb) disassemble -c 5 -s 0x104c6c210
0x104c6c210: pushq %rbp
0x104c6c211: movq %rsp, %rbp
0x104c6c214: leaq 0x6(%rip), %rdi
0x104c6c21b: popq %rbp
0x104c6c21c: jmpq 0x104c6c370
(lldb) disassemble -c 2 -s 0x104c6c370
0x104c6c370: jmpq *-0xe(%rip)
0x104c6c376: addb %al, (%rax)
(lldb) memory read -f A -c 1 "0x104c6c376 - 0xe"
0x104c6c368: 0x00007fff84fd5b9b libsystem_c.dylib`puts
}}}
However, that second call to `printf` (the `jmpq`) in `defined_in_B` is
indeed wrong: it's jumping into nowhere:
{{{
(lldb) memory read -c 8 0x104c6b556
0x104c6b556: 00 00 00 00 00 00 00 00 ........
}}}
Note that somewhat surprisingly is it ''not'' the resolution of the symbol
from `a.o` that is wrong; that ''does'' get resolved okay once we load
`a.o`; rather, it is the jump to `puts` which is wrong (and then only the
one of them).
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9481>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list