[GHC] #10062: Codegen on sequential FFI calls is not very good

Tue Feb 3 08:46:58 UTC 2015

#10062: Codegen on sequential FFI calls is not very good
-------------------------------------+-------------------------------------
              Reporter:  chadaustin  |             Owner:
                  Type:  bug         |            Status:  new
              Priority:  normal      |         Milestone:
             Component:  Compiler    |           Version:  7.8.3
  (CodeGen)                          |  Operating System:  Unknown/Multiple
              Keywords:              |   Type of failure:  Runtime
          Architecture:              |  performance bug
  Unknown/Multiple                   |        Blocked By:
             Test Case:              |   Related Tickets:
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------
 I'm writing a library for efficiently building up a byte buffer.  The
 fastest approach I've found is via FFI, with restricted effects like ST.
 It's over twice as fast as ByteString Builder.
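 The approach described above amounts to an append-only byte buffer
 implemented in C and driven from Haskell. The minimal C sketch below
 illustrates that kind of writer. The struct name `BufferWriter`, its fields,
 the helper `bw_reserve`, and the exact signatures of `bw_append_byte` and
 `bw_append_bsz` are all hypothetical. The assembly in this ticket only shows
 that each call passes a writer pointer in %rdi and one further argument in
 %rsi/%esi. `bw_append_bsz` is assumed here to append a NUL-terminated string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; not buffer-builder's actual layout. */
typedef struct {
    unsigned char *data; /* heap storage, NULL until the first append */
    size_t size;         /* bytes written so far */
    size_t capacity;     /* bytes allocated */
} BufferWriter;

/* Hypothetical helper: make room for `extra` more bytes, doubling the
   capacity as needed. Returns 0 on success, -1 if allocation fails. */
static int bw_reserve(BufferWriter *bw, size_t extra) {
    if (bw->size + extra <= bw->capacity) return 0;
    size_t cap = bw->capacity ? bw->capacity : 64;
    while (cap < bw->size + extra) cap *= 2;
    unsigned char *p = realloc(bw->data, cap);
    if (p == NULL) return -1;
    bw->data = p;
    bw->capacity = cap;
    return 0;
}

/* Hypothetical signature: append one byte (dropped on allocation failure). */
void bw_append_byte(BufferWriter *bw, unsigned char b) {
    if (bw_reserve(bw, 1) == 0) bw->data[bw->size++] = b;
}

/* Hypothetical signature: append a NUL-terminated string, excluding the NUL. */
void bw_append_bsz(BufferWriter *bw, const char *s) {
    size_t n = strlen(s);
    if (n > 0 && bw_reserve(bw, n) == 0) {
        memcpy(bw->data + bw->size, s, n);
        bw->size += n;
    }
}
```

 On the Haskell side, each of these would presumably be bound with a
 `foreign import ccall`. Every append then becomes one out-of-line C call, so
 any fixed overhead around each call adds up quickly.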

 Consider this example API usage:
 https://github.com/chadaustin/buffer-builder/blob/6bd0a39c56f63ab751faf29f9784ac87d52638be/bench/Bench.hs#L46

 It compiles into an instruction sequence containing direct, sequenced FFI
 calls.  For example, the last three calls work out to:

         addq $8,%rsp
         movq %rbx,%rdi
         movq 72(%rsp),%rax
         movq %rax,%rsi
         subq $8,%rsp
         movl $0,%eax
         call bw_append_bsz

         addq $8,%rsp
         movq %rbx,%rdi
         movl $35,%esi
         subq $8,%rsp
         movl $0,%eax
         call bw_append_byte

         addq $8,%rsp
         movq %rbx,%rdi
         movq 64(%rsp),%rax
         movq %rax,%rsi
         subq $8,%rsp
         movl $0,%eax
         call bw_append_bsz

 I don't know why %rsp is adjusted before and after every call.  I also
 can't explain the assignment to %eax before each call.  (If it is needed
 at all, I would expect xorl %eax,%eax rather than movl $0,%eax.)
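 Both oddities are consistent with the x86-64 System V calling convention.
 The ABI requires %rsp to be 16-byte aligned at every `call`, so the
 `subq $8`/`addq $8` pairs look like per-call alignment fixups. The ABI also
 requires that, before calling a variadic function, the caller load %al with
 an upper bound on the number of vector (SSE) registers used for arguments.
 A plausible reading is that the code generator does not know whether a
 `ccall` target is variadic and so sets %eax to 0 on every call. That is an
 assumption, not something this ticket confirms. The variadic `mean_of`
 below is a hypothetical illustration of the convention. Compiling it with
 `cc -O2 -S` on x86-64 typically shows the caller setting %al (for example
 `movl $3,%eax` or `movb $3,%al`) immediately before the call.

```c
#include <stdarg.h>

/* Hypothetical variadic function: mean of `count` doubles. On x86-64
   System V the doubles arrive in %xmm0..%xmm7, and the caller must set
   %al to (an upper bound on) how many vector registers it used. */
double mean_of(int count, ...) {
    va_list ap;
    double sum = 0.0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, double);
    va_end(ap);
    return count > 0 ? sum / count : 0.0;
}

/* Hypothetical caller: because mean_of is variadic, the compiler must
   set %al (here to 3) before the call. A call to a prototyped,
   non-variadic function needs no such assignment. */
double mean_of_three(double a, double b, double c) {
    return mean_of(3, a, b, c);
}
```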

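 To my reading, the above instruction sequence could be reduced to the
 following, with %rsp left at its value during the calls (so each stack
 load's offset is 8 larger than in the original):

         movq %rbx,%rdi
         movq 80(%rsp),%rsi
         call bw_append_bsz

         movq %rbx,%rdi
         movl $35,%esi
         call bw_append_byte

         movq %rbx,%rdi
         movq 72(%rsp),%rsi
         call bw_append_bsz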
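 For comparison, here is a hypothetical C caller that issues the same three
 calls in order: a string, the byte 35 ('#'), then another string. The
 signatures of `bw_append_bsz` and `bw_append_byte` are assumptions (a writer
 pointer plus one argument). The counting stubs and the `CountingWriter` type
 exist only to make the sketch self-contained. `__attribute__((noinline))` is
 a GCC/Clang extension, used here so the calls stay out of line. With
 `cc -O2 -S` on x86-64, a C compiler typically aligns the stack once in the
 prologue, keeps the writer pointer in a callee-saved register, and emits no
 %eax assignment, because the prototypes are not variadic. The result is
 usually close to the reduced sequence proposed above.

```c
#include <stddef.h>

/* Hypothetical writer stub: only counts bytes appended. */
typedef struct { size_t size; } CountingWriter;

/* Hypothetical signature: count the bytes of a NUL-terminated string. */
__attribute__((noinline)) void bw_append_bsz(CountingWriter *w, const char *s) {
    while (*s++) w->size++;
}

/* Hypothetical signature: count a single byte. */
__attribute__((noinline)) void bw_append_byte(CountingWriter *w, unsigned char b) {
    (void)b; /* value unused by this stub */
    w->size++;
}

/* Mirrors the last three calls in the ticket: string, byte 35, string. */
void append_tail(CountingWriter *w, const char *first, const char *second) {
    bw_append_bsz(w, first);
    bw_append_byte(w, 35);
    bw_append_bsz(w, second);
}
```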

 To reproduce, check out github.com:chadaustin/buffer-builder.git at
 revision 6bd0a39c56f63ab751faf29f9784ac87d52638be and run:

 cabal configure --enable-benchmarks
 cabal bench

 Then look at the ./dist/build/bench/bench-tmp/bench/Bench.dump-asm file.

 This is specifically on OS X 64-bit with GHC 7.8.3, but I saw similar code
 generation on GHC 7.6 on Linux 64-bit.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10062>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler