[GHC] #10012: Cheap-to-compute values aren't pushed into case branches inducing unnecessary register pressure
GHC
ghc-devs at haskell.org
Thu Jan 22 03:35:04 UTC 2015
#10012: Cheap-to-compute values aren't pushed into case branches inducing
unnecessary register pressure
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.4
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: Runtime
Unknown/Multiple | performance bug
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Revisions: |
-------------------------------------+-------------------------------------
When characterizing bytestring's `Builder` interface[1] I noticed that
some benchmarks involving repeated appends (e.g. `Host endian/1MB of Word8
in chunks of 16`) perform inexplicably much worse than others. A glance at
the assembly revealed that a large number of `Word8`s are being evaluated
and saved to registers, only to be later stored. E.g.,
```#!asm
# First we evaluate a number of Word8s, saving them in registers
movzbl %bl,%ebx
leaq 1(%r14),%rcx
movzbl %cl,%ecx
leaq 2(%r14),%rdx
movzbl %dl,%edx
leaq 3(%r14),%rsi
movzbl %sil,%esi
leaq 4(%r14),%r9
movzbl %r9b,%r9d
leaq 5(%r14),%r10
movzbl %r10b,%r10d
...
# Eventually we run out of registers and start spilling to the stack
movq %rax,64(%rsp)
leaq 7(%r14),%rax
movzbl %al,%eax
movq %rbx,72(%rsp)
...
# Only after evaluating all of the needed words do we actually consume
# them
movq %rax,-152(%r12)
movq 72(%rsp),%rax
movq %rax,-144(%r12)
movq 80(%rsp),%rax
movq %rax,-136(%r12)
movq 88(%rsp),%rax
movq %rax,-128(%r12)
...
movq %rsi,-56(%r12)
movq %r9,-48(%r12)
movq %r10,-40(%r12)
movq %r11,-32(%r12)
movq %r14,-24(%r12)
```
This is due to the fact that the `Word`s are bound outside of a case
analysis and GHC is reluctant to push them inside of the branches. The
float-in pass will only float a binding into a case if the value is
"small" and at least one branch doesn't use the binding. Unfortunately the
case expression in question has only two branches.
This is demonstrated in the attached testcase.
[1] https://github.com/kolmodin/binary/pull/65
[2] https://www.haskell.org/pipermail/ghc-devs/2015-January/007997.html
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10012>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list