[GHC] #13624: loadObj() does not respect alignment
GHC
ghc-devs at haskell.org
Fri Apr 28 03:22:39 UTC 2017
-------------------------------------+-------------------------------------
Reporter: tmcdonell | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Runtime System (Linker) | Version: 8.0.1
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
This is perhaps known, but I'll write it down here in case somebody else
runs into this problem as well.
Since `loadObj()` just `mmap()`s the entire object file and decodes it
''in place'', it does not respect the alignment requirements specified in
the section headers. This is problematic for instructions that require
aligned memory operands, e.g. aligned SSE and AVX loads and stores.
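To make the failure mode concrete, here is a minimal C sketch of the address arithmetic involved. It is not code from GHC's linker. It assumes a section is used at `image_base + file_offset`, where `image_base` is whatever `mmap()` returned; the helper names `is_aligned` and `section_addr` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if `addr` is a multiple of `align`.
 * `align` must be a non-zero power of two, as section alignments are. */
bool is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Assumption: when the object file is decoded in place, a section lives at
 * the mapping base plus the section's offset within the file. */
uintptr_t section_addr(uintptr_t image_base, size_t file_offset)
{
    return image_base + file_offset;
}
```

With a page-aligned base such as 0x1000 and a file offset of 0x130, the section sits at 0x1130, which satisfies a 16-byte requirement but not a 32-byte one. Whether the requirement holds then depends entirely on where the section happens to sit in the file.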
The attached `map.ll` program is `map (+1)` over an array of floating
point numbers. In particular, the core loop is 8-way SIMD vectorised and
4-way unrolled, processing 32 elements per loop iteration. A tail loop
handles any remainder one element at a time.
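The loop structure described above can be sketched in scalar C as follows. This is only an illustration of the control flow: the function name `map_plus_one` and its signature are hypothetical and simpler than the arguments the real `map.ll` function takes, and the inner loop merely stands in for the four 8-wide vector operations.

```c
#include <stddef.h>

/* Scalar sketch: a main loop that consumes 32 elements per iteration,
 * followed by a tail loop for the remaining 0..31 elements. */
void map_plus_one(const float *in, float *out, size_t n)
{
    /* Round n down to a multiple of 32 (compare the `and $...e0` masks
     * in the generated code). */
    size_t main_count = n & ~(size_t)31;
    size_t i = 0;

    for (; i < main_count; i += 32) {
        /* Stands in for 4 unrolled steps of 8 floats each. */
        for (size_t j = 0; j < 32; ++j)
            out[i + j] = in[i + j] + 1.0f;
    }

    /* Tail loop: one element at a time. */
    for (; i < n; ++i)
        out[i] = in[i] + 1.0f;
}
```

For `n < 32` the main loop body never runs, so only the tail loop executes and the vector constant is never loaded.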
You can compile it using `llc -filetype=obj -mcpu=native map.ll`. For a
CPU with AVX instructions (Sandy Bridge or later) you should get the
following:
{{{
$ objdump -d map.o
Disassembly of section .text:
0000000000000000 <map>:
0: 49 89 f3 mov %rsi,%r11
3: 49 29 fb sub %rdi,%r11
6: 0f 8e f9 00 00 00 jle 105 <map+0x105>
c: 49 83 fb 20 cmp $0x20,%r11
10: 0f 82 bd 00 00 00 jb d3 <map+0xd3>
16: 4d 89 da mov %r11,%r10
19: 49 83 e2 e0 and $0xffffffffffffffe0,%r10
1d: 4d 89 d9 mov %r11,%r9
20: 49 83 e1 e0 and $0xffffffffffffffe0,%r9
24: 0f 84 a9 00 00 00 je d3 <map+0xd3>
2a: 49 01 fa add %rdi,%r10
2d: 48 8d 44 ba 60 lea 0x60(%rdx,%rdi,4),%rax
32: 49 8d 7c b8 60 lea 0x60(%r8,%rdi,4),%rdi
37: c5 fc 28 05 00 00 00 vmovaps 0x0(%rip),%ymm0 # 3f <map+0x3f>
3e: 00
3f: 4c 89 c9 mov %r9,%rcx
42: 66 66 66 66 66 2e 0f data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
49: 1f 84 00 00 00 00 00
50: c5 f8 10 4f a0 vmovups -0x60(%rdi),%xmm1
55: c5 f8 10 57 c0 vmovups -0x40(%rdi),%xmm2
5a: c5 f8 10 5f e0 vmovups -0x20(%rdi),%xmm3
5f: c5 f8 10 27 vmovups (%rdi),%xmm4
63: c4 e3 75 18 4f b0 01 vinsertf128 $0x1,-0x50(%rdi),%ymm1,%ymm1
6a: c4 e3 6d 18 57 d0 01 vinsertf128 $0x1,-0x30(%rdi),%ymm2,%ymm2
71: c4 e3 65 18 5f f0 01 vinsertf128 $0x1,-0x10(%rdi),%ymm3,%ymm3
78: c4 e3 5d 18 67 10 01 vinsertf128 $0x1,0x10(%rdi),%ymm4,%ymm4
7f: c5 f4 58 c8 vaddps %ymm0,%ymm1,%ymm1
83: c5 ec 58 d0 vaddps %ymm0,%ymm2,%ymm2
87: c5 e4 58 d8 vaddps %ymm0,%ymm3,%ymm3
8b: c5 dc 58 e0 vaddps %ymm0,%ymm4,%ymm4
8f: c4 e3 7d 19 48 b0 01 vextractf128 $0x1,%ymm1,-0x50(%rax)
96: c5 f8 11 48 a0 vmovups %xmm1,-0x60(%rax)
9b: c4 e3 7d 19 50 d0 01 vextractf128 $0x1,%ymm2,-0x30(%rax)
a2: c5 f8 11 50 c0 vmovups %xmm2,-0x40(%rax)
a7: c4 e3 7d 19 58 f0 01 vextractf128 $0x1,%ymm3,-0x10(%rax)
ae: c5 f8 11 58 e0 vmovups %xmm3,-0x20(%rax)
b3: c4 e3 7d 19 60 10 01 vextractf128 $0x1,%ymm4,0x10(%rax)
ba: c5 f8 11 20 vmovups %xmm4,(%rax)
be: 48 83 e8 80 sub $0xffffffffffffff80,%rax
c2: 48 83 ef 80 sub $0xffffffffffffff80,%rdi
c6: 48 83 c1 e0 add $0xffffffffffffffe0,%rcx
ca: 75 84 jne 50 <map+0x50>
cc: 4d 39 cb cmp %r9,%r11
cf: 75 05 jne d6 <map+0xd6>
d1: eb 32 jmp 105 <map+0x105>
d3: 49 89 fa mov %rdi,%r10
d6: 4c 29 d6 sub %r10,%rsi
d9: 4a 8d 04 92 lea (%rdx,%r10,4),%rax
dd: 4b 8d 0c 90 lea (%r8,%r10,4),%rcx
e1: c5 fa 10 05 00 00 00 vmovss 0x0(%rip),%xmm0 # e9 <map+0xe9>
e8: 00
e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
f0: c5 fa 58 09 vaddss (%rcx),%xmm0,%xmm1
f4: c5 fa 11 08 vmovss %xmm1,(%rax)
f8: 48 83 c0 04 add $0x4,%rax
fc: 48 83 c1 04 add $0x4,%rcx
100: 48 ff ce dec %rsi
103: 75 eb jne f0 <map+0xf0>
105: c5 f8 77 vzeroupper
108: c3 retq
}}}
The attached `test.c` will load the object file and try to execute it. The
`#define N` on line 7 will change the size of the array. For fewer than 32
elements this works as expected (where the input array is [0..N-1]):
{{{
$ ./build.sh
+ llc-4.0 -filetype=obj -mcpu=native map.ll
+ ghc --make -no-hs-main test.c
$ ./a.out
array size is 31
calling function...
ok
1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0
17.0 18.0 19.0 20.0 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0 31.0
}}}
For 32 elements or more (i.e. when the core loop is entered) the program
will (almost certainly) segfault.
{{{
$ lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) run
Process 7294 launched: '<snip>/a.out' (x86_64)
array size is 32
calling function...
Process 7294 stopped
* thread #1: tid = 0xc41676, 0x000000010019f207, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x000000010019f207
-> 0x10019f207: vmovaps 0xe1(%rip), %ymm0
0x10019f20f: movq %r9, %rcx
0x10019f212: nopw %cs:(%rax,%rax)
0x10019f220: vmovups -0x60(%rdi), %xmm1
}}}
The `VMOVAPS` instruction requires its memory operand to be 32-byte
aligned when used with a `ymm` register. Here it is attempting to load 8
floats from one of the const sections (the ones for the +1), but because
that section was not loaded at its required alignment, the load faults.
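One way a loader could honour section alignment is to place sections in a fresh region instead of using them at their file offsets. The sketch below shows only the offset calculation for such a layout. It is an assumption for illustration, not GHC's actual fix: `struct section`, `align_up` and `layout_sections` are hypothetical names, and the struct keeps only the fields needed here.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a section header. Real ELF and Mach-O
 * headers carry many more fields. */
struct section {
    size_t size;    /* size in bytes */
    size_t align;   /* required alignment, a non-zero power of two */
    size_t offset;  /* output: offset within the destination region */
};

/* Round `offset` up to the next multiple of `align` (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Assign each section an offset that satisfies its alignment and return
 * the total size of the region needed. Assumes the region itself starts
 * at an address aligned to at least the largest section alignment,
 * e.g. a page boundary. */
size_t layout_sections(struct section *secs, size_t count)
{
    size_t cursor = 0;
    for (size_t i = 0; i < count; ++i) {
        cursor = align_up(cursor, secs[i].align);
        secs[i].offset = cursor;
        cursor += secs[i].size;
    }
    return cursor;
}
```

For example, a 0x109-byte section with 16-byte alignment followed by a 32-byte constant section with 32-byte alignment would place the second section at offset 288, not 265, so an aligned 32-byte load from it would not fault.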
I've tested this on x86_64 macOS (Mach-O) and Ubuntu (ELF). I don't have
any other systems to test on.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13624>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler