[GHC] #14619: Output value of program changes upon compiling with -O optimizations
GHC
ghc-devs at haskell.org
Sun Jan 14 23:01:09 UTC 2018
#14619: Output value of program changes upon compiling with -O optimizations
-------------------------------------+-------------------------------------
Reporter: sheaf | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.4.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Windows | Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Phyx-):
I don't think it's a register allocation issue. I think it's a genuine bug
in a Core2Core pass:
Following the code from `sphereIntersection`, the first interesting
location is `0x0000000000401E72` (it's all statically linked). At this
address the first 6 doubles are stored to the stack:
{{{
0x401e72 <Main_zdwsphereIntersection_info+626>: movsd %xmm1,-0x30(%rbp)
0x401e77 <Main_zdwsphereIntersection_info+631>: movsd %xmm2,-0x28(%rbp)
0x401e7c <Main_zdwsphereIntersection_info+636>: movsd %xmm3,-0x20(%rbp)
0x401e81 <Main_zdwsphereIntersection_info+641>: movsd %xmm4,-0x18(%rbp)
0x401e86 <Main_zdwsphereIntersection_info+646>: movsd %xmm5,-0x10(%rbp)
0x401e8b <Main_zdwsphereIntersection_info+651>: movsd %xmm6,-0x8(%rbp)
}}}
Here:
{{{
xmm1 = 0
xmm2 = 0
xmm3 = 0
xmm4 = 1.1
xmm5 = 2.2
xmm6 = 3.3
}}}
So far so good.
The first operation performed is `b = oc <.> dir`. `oc` is already known,
since `(<+>)` seems to have been inlined and folded away (I assume GHC does
constant folding, since I can't find any code for it).
The code for `(<.>)` is at `0x0000000000401C14`:
{{{
0x401c14 <Main_zdwsphereIntersection_info+20>: movsd 0x10(%rbp),%xmm0 (= 200)
0x401c19 <Main_zdwsphereIntersection_info+25>: addsd %xmm3,%xmm0
0x401c1d <Main_zdwsphereIntersection_info+29>: mulsd %xmm6,%xmm0
0x401c21 <Main_zdwsphereIntersection_info+33>: movsd 0x8(%rbp),%xmm6 (= 0)
0x401c26 <Main_zdwsphereIntersection_info+38>: addsd %xmm2,%xmm6
0x401c2a <Main_zdwsphereIntersection_info+42>: mulsd %xmm5,%xmm6
0x401c2e <Main_zdwsphereIntersection_info+46>: movsd 0x0(%rbp),%xmm7 (= 0)
0x401c33 <Main_zdwsphereIntersection_info+51>: addsd %xmm1,%xmm7
0x401c37 <Main_zdwsphereIntersection_info+55>: mulsd %xmm4,%xmm7
0x401c3b <Main_zdwsphereIntersection_info+59>: addsd %xmm6,%xmm7
0x401c3f <Main_zdwsphereIntersection_info+63>: addsd %xmm0,%xmm7
}}}
So this performed `oc <.> dir`, and `xmm7` now contains `b`. Also notice
that we clobbered `xmm6` here: it now contains `0`.
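As a sanity check on the trace above, the `-O2` sequence for `(<.>)` can be replayed on a toy register file. This is only a minimal sketch, not GHC output: the `regs` and `stack` dictionaries and the `run` helper are hypothetical names, the initial values are the ones quoted in this comment, and only the three opcodes that appear in the listing are modelled.

```python
# Toy replay of the -O2 dot-product sequence (AT&T operand order: src, dst).
# Initial register values and stack slots are the ones quoted in the trace.
regs = {"xmm1": 0.0, "xmm2": 0.0, "xmm3": 0.0,
        "xmm4": 1.1, "xmm5": 2.2, "xmm6": 3.3}
stack = {"0x0(%rbp)": 0.0, "0x8(%rbp)": 0.0, "0x10(%rbp)": 200.0}

program = [
    ("movsd", "0x10(%rbp)", "xmm0"),
    ("addsd", "xmm3", "xmm0"),
    ("mulsd", "xmm6", "xmm0"),
    ("movsd", "0x8(%rbp)", "xmm6"),   # overwrites the 3.3 held in xmm6
    ("addsd", "xmm2", "xmm6"),
    ("mulsd", "xmm5", "xmm6"),
    ("movsd", "0x0(%rbp)", "xmm7"),
    ("addsd", "xmm1", "xmm7"),
    ("mulsd", "xmm4", "xmm7"),
    ("addsd", "xmm6", "xmm7"),
    ("addsd", "xmm0", "xmm7"),
]

def run(program, regs, stack):
    """Execute the tiny instruction subset on a copy of the register file."""
    regs = dict(regs)
    for op, src, dst in program:
        value = stack[src] if src in stack else regs[src]
        if op == "movsd":
            regs[dst] = value
        elif op == "addsd":
            regs[dst] += value
        elif op == "mulsd":
            regs[dst] *= value
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs

after = run(program, regs, stack)
print(after["xmm7"], after["xmm6"])
```

Under these assumptions `xmm7` ends up holding `b` (about 660, the value that also shows up in the stack dump further down), while `xmm6` no longer holds 3.3.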
The next step is to calculate `sqrtDisc` and then `t1`.
`t1` is computed at `0x0000000000401C9B`:
{{{
0x401c9b <Main_zdwsphereIntersection_info+155>: movsd 0x68(%rsp),%xmm1
0x401ca1 <Main_zdwsphereIntersection_info+161>: movsd %xmm1,%xmm2
0x401ca5 <Main_zdwsphereIntersection_info+165>: subsd %xmm0,%xmm2
0x401ca9 <Main_zdwsphereIntersection_info+169>: xorpd %xmm3,%xmm3
0x401cad <Main_zdwsphereIntersection_info+173>: ucomisd %xmm3,%xmm2
0x401cb1 <Main_zdwsphereIntersection_info+177>: ja 0x401cd8 <Main_zdwsphereIntersection_info+216> (t1 > 0)
0x401cb3 <Main_zdwsphereIntersection_info+179>: addsd %xmm0,%xmm1
0x401cb7 <Main_zdwsphereIntersection_info+183>: xorpd %xmm0,%xmm0
0x401cbb <Main_zdwsphereIntersection_info+187>: ucomisd %xmm0,%xmm1
0x401cbf <Main_zdwsphereIntersection_info+191>: ja 0x401d9a <Main_zdwsphereIntersection_info+410> (t2 > 0)
}}}
We take the branch to `0x401cd8` (the `t1 > 0` case) and must then evaluate
`(*>)`, which is at `0x0000000000401CD8`.
`t1` is stored in `xmm2`.
{{{
0x401cd8 <Main_zdwsphereIntersection_info+216>: movq $0x498cd8,-0x80(%r12)
0x401ce1 <Main_zdwsphereIntersection_info+225>: movsd %xmm6,-0x78(%r12)
0x401ce8 <Main_zdwsphereIntersection_info+232>: movq $0x498cd8,-0x70(%r12)
0x401cf1 <Main_zdwsphereIntersection_info+241>: movsd %xmm2,%xmm0
0x401cf5 <Main_zdwsphereIntersection_info+245>: mulsd %xmm6,%xmm0
0x401cf9 <Main_zdwsphereIntersection_info+249>: movsd %xmm0,-0x68(%r12)
0x401d00 <Main_zdwsphereIntersection_info+256>: movq $0x498cd8,-0x60(%r12)
0x401d09 <Main_zdwsphereIntersection_info+265>: movsd %xmm2,%xmm0
0x401d0d <Main_zdwsphereIntersection_info+269>: movsd 0x60(%rsp),%xmm1
0x401d13 <Main_zdwsphereIntersection_info+275>: mulsd %xmm1,%xmm0
0x401d17 <Main_zdwsphereIntersection_info+279>: movsd %xmm0,-0x58(%r12)
0x401d1e <Main_zdwsphereIntersection_info+286>: movq $0x498cd8,-0x50(%r12)
0x401d27 <Main_zdwsphereIntersection_info+295>: movsd 0x58(%rsp),%xmm0
0x401d2d <Main_zdwsphereIntersection_info+301>: mulsd %xmm0,%xmm2
0x401d31 <Main_zdwsphereIntersection_info+305>: movsd %xmm2,-0x48(%r12)
0x401d38 <Main_zdwsphereIntersection_info+312>: movq $0x498b18,-0x40(%r12)
}}}
Notice a couple of weird things here.
`xmm6` is still clobbered and holds no meaningful value, yet we still spill
it but never load it again (that I could find).
Then we do the multiplication `a*x'` without ever restoring `x'`:
{{{
0x401cf5 <Main_zdwsphereIntersection_info+245>: mulsd %xmm6,%xmm0
}}}
Weirdly, we then restore `y'` and `z'`, which are stored at `0x60(%rsp)`
and `0x58(%rsp)`.
Inspecting the stack at `%rsp`, I see that `xmm6` (3.3) was never spilled to
begin with:
{{{
0000000000B6DBB8 0 0
0000000000B6DBC8 0 1.1
0000000000B6DBD8 2.2 660
}}}
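To make the consequence concrete: the `mulsd %xmm6,%xmm0` at `0x401cf5` multiplies `t1` by whatever `xmm6` holds at that point. The small sketch below compares the product obtained with the clobbered register against the product obtained with the original 3.3. The helper `scale_component` and the value chosen for `t1` are hypothetical (the trace does not give `t1`).

```python
def scale_component(t, component):
    """Model of `movsd %xmm2,%xmm0; mulsd %xmm6,%xmm0`: returns t * component."""
    return t * component

t1 = 2.0              # hypothetical positive root, NOT taken from the trace
xmm6_original = 3.3   # value of xmm6 on entry (from the trace)
xmm6_clobbered = 0.0  # value left in xmm6 by the (<.>) code (from the trace)

wrong = scale_component(t1, xmm6_clobbered)
right = scale_component(t1, xmm6_original)
print(wrong, right)
```

Whatever positive value `t1` takes, the clobbered register makes this component come out as 0, which would explain the incorrect result at runtime.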
Now that we know what's happening, let's compare `-O0` and `-O2`.
At `-O0` where it works, we have the following sequence for `(<.>)`:
{{{
.Ln4nu:
movsd (%rbp),%xmm0
movsd 8(%rbp),%xmm7
movsd 16(%rbp),%xmm8
...
.Ln4nw:
addsd %xmm3,%xmm8
mulsd %xmm6,%xmm8
addsd %xmm2,%xmm7
mulsd %xmm5,%xmm7
addsd %xmm1,%xmm0
mulsd %xmm4,%xmm0
addsd %xmm7,%xmm0
addsd %xmm8,%xmm0
xorpd %xmm7,%xmm7
ucomisd %xmm7,%xmm0
}}}
Notice that `xmm6` is not clobbered here.
The `-O2` version is:
{{{
movsd 16(%rbp),%xmm0
addsd %xmm3,%xmm0
mulsd %xmm6,%xmm0
movsd 8(%rbp),%xmm6
addsd %xmm2,%xmm6
mulsd %xmm5,%xmm6
movsd (%rbp),%xmm7
addsd %xmm1,%xmm7
mulsd %xmm4,%xmm7
addsd %xmm6,%xmm7
addsd %xmm0,%xmm7
xorpd %xmm0,%xmm0
ucomisd %xmm0,%xmm7
}}}
At `-O0`, because `xmm6` has not been clobbered, the later spill correctly
saves it:
{{{
.Ln4o8:
movl $1,%eax
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm6,144(%rsp)
movsd %xmm8,152(%rsp)
}}}
Whereas `-O2` thinks it doesn't need the value and spills one register too
few:
{{{
.Ln4os:
movl $1,%eax
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm7,144(%rsp)
}}}
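The difference between the two spill sequences can also be checked mechanically. The following is a rough sketch that parses the two listings above and compares the sets of spilled registers. The names `O0_SPILLS`, `O2_SPILLS`, `SPILL_RE` and `spilled_registers` are made up for this illustration, and the regular expression only recognises the `movsd %xmmN,offset(%rsp)` form seen here.

```python
import re

# The two spill sequences, copied from the -O0 and -O2 listings above.
O0_SPILLS = """
movl $1,%eax
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm6,144(%rsp)
movsd %xmm8,152(%rsp)
"""

O2_SPILLS = """
movl $1,%eax
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm7,144(%rsp)
"""

# Matches stores of an xmm register to a decimal offset from %rsp.
SPILL_RE = re.compile(r"movsd\s+%(xmm\d+),(\d+)\(%rsp\)")

def spilled_registers(listing):
    """Return {register: stack offset} for every xmm store to the stack."""
    return {reg: int(off) for reg, off in SPILL_RE.findall(listing)}

o0 = spilled_registers(O0_SPILLS)
o2 = spilled_registers(O2_SPILLS)

print("only spilled at -O0:", sorted(set(o0) - set(o2)))
print("only spilled at -O2:", sorted(set(o2) - set(o0)))
```

`xmm6` appears only in the `-O0` set. The other difference (`xmm8` versus `xmm7`) reflects different register assignments in the two builds and is not analysed here.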
My guess is that at `-O2` it thinks it has enough registers to not need to
spill `xmm6`.
But it then later clobbers it without spilling and reloading it!
However, I'm too tired to look at Core tonight, so I'll continue next week.
I think it's a Core pass eliminating a value it shouldn't.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14619#comment:25>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler