[GHC] #10062: Codegen on sequential FFI calls is not very good

Sun Aug 30 11:15:10 UTC 2015

#10062: Codegen on sequential FFI calls is not very good
-------------------------------------+-------------------------------------
        Reporter:  chadaustin        |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:  7.8.3
  (CodeGen)                          |
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------
Description changed by bgamari:

Old description:

> I'm writing a library for efficiently building up a byte buffer.  The
> fastest approach I've found is via FFI, with restricted effects like ST.
> It's over twice as fast as ByteString Builder.
>
> Consider this example API usage:
> https://github.com/chadaustin/buffer-builder/blob/6bd0a39c56f63ab751faf29f9784ac87d52638be/bench/Bench.hs#L46
>
> It compiles into an instruction sequence containing direct, sequenced FFI
> calls.  For example, the last three calls work out to:
>
>         addq $8,%rsp
>         movq %rbx,%rdi
>         movq 72(%rsp),%rax
>         movq %rax,%rsi
>         subq $8,%rsp
>         movl $0,%eax
>         call bw_append_bsz
>
>         addq $8,%rsp
>         movq %rbx,%rdi
>         movl $35,%esi
>         subq $8,%rsp
>         movl $0,%eax
>         call bw_append_byte
>
>         addq $8,%rsp
>         movq %rbx,%rdi
>         movq 64(%rsp),%rax
>         movq %rax,%rsi
>         subq $8,%rsp
>         movl $0,%eax
>         call bw_append_bsz
>
> I don't know why rsp is adjusted before and after every call.  I also
> can't explain the assignment to eax before each call.  (If it is needed,
> I would expect xorl %eax,%eax rather than movl $0,%eax.)
>
> To my reading, the above instruction sequence could be reduced to:
>
>         movq %rbx,%rdi
>         movq 80(%rsp),%rsi
>         call bw_append_bsz
>
>         movq %rbx,%rdi
>         movl $35,%esi
>         call bw_append_byte
>
>         movq %rbx,%rdi
>         movq 72(%rsp),%rsi
>         call bw_append_bsz
>
> To reproduce, check out git@github.com:chadaustin/buffer-builder.git at
> revision 6bd0a39c56f63ab751faf29f9784ac87d52638be
>
> cabal configure --enable-benchmarks
> cabal bench
>
> And then look at the ./dist/build/bench/bench-tmp/bench/Bench.dump-asm
> file.
>
> This is specifically on OS X 64-bit with GHC 7.8.3, but I saw similar
> code generation on GHC 7.6 on Linux 64-bit.

New description:

 I'm writing a library for efficiently building up a byte buffer.  The
 fastest approach I've found is via FFI, with restricted effects like ST.
 It's over twice as fast as ByteString Builder.

 Consider this example API usage:
 https://github.com/chadaustin/buffer-builder/blob/6bd0a39c56f63ab751faf29f9784ac87d52638be/bench/Bench.hs#L46

 It compiles into an instruction sequence containing direct, sequenced FFI
 calls.  For example, the last three calls work out to:

 {{{
 addq $8,%rsp
 movq %rbx,%rdi
 movq 72(%rsp),%rax
 movq %rax,%rsi
 subq $8,%rsp
 movl $0,%eax
 call bw_append_bsz

 addq $8,%rsp
 movq %rbx,%rdi
 movl $35,%esi
 subq $8,%rsp
 movl $0,%eax
 call bw_append_byte

 addq $8,%rsp
 movq %rbx,%rdi
 movq 64(%rsp),%rax
 movq %rax,%rsi
 subq $8,%rsp
 movl $0,%eax
 call bw_append_bsz
 }}}

 I don't know why `rsp` is adjusted before and after every call.  I also
 can't explain the assignment to `eax` before each call.  (If it is needed,
 I would expect `xorl %eax,%eax` rather than `movl $0,%eax`.)

 To my reading, the above instruction sequence could be reduced to:

 {{{
 movq %rbx,%rdi
 movq 80(%rsp),%rsi
 call bw_append_bsz

 movq %rbx,%rdi
 movl $35,%esi
 call bw_append_byte

 movq %rbx,%rdi
 movq 72(%rsp),%rsi
 call bw_append_bsz
 }}}

 To reproduce, check out `git@github.com:chadaustin/buffer-builder.git` at
 revision 6bd0a39c56f63ab751faf29f9784ac87d52638be

 {{{
 cabal configure --enable-benchmarks
 cabal bench
 }}}

 And then look at the `./dist/build/bench/bench-tmp/bench/Bench.dump-asm`
 file.

 This is specifically on OS X 64-bit with GHC 7.8.3, but I saw similar code
 generation on GHC 7.6 on Linux 64-bit.
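
 A smaller stand-in may be easier to inspect than the full benchmark.  The
 sketch below is not taken from buffer-builder: it assumes that binding
 libc's `puts` and `putchar` with `unsafe` foreign imports is a fair
 substitute for `bw_append_bsz` and `bw_append_byte`, and the names
 `c_puts` and `c_putchar` are made up for illustration.  Whether it yields
 exactly the instruction sequence above is untested.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..))

-- Hypothetical stand-in for bw_append_bsz: takes a NUL-terminated string.
foreign import ccall unsafe "stdio.h puts"
    c_puts :: CString -> IO CInt

-- Hypothetical stand-in for bw_append_byte: takes a single character code.
foreign import ccall unsafe "stdio.h putchar"
    c_putchar :: CInt -> IO CInt

main :: IO ()
main = withCString "hello" $ \s -> do
    -- Three back-to-back FFI calls, mirroring the string / byte / string
    -- pattern in the assembly above (35 is the same literal byte value).
    _ <- c_puts s
    _ <- c_putchar 35
    _ <- c_puts s
    return ()
```

 Compiling this with `ghc -O2 -ddump-asm` prints the generated assembly, so
 the code around the three calls can be compared with the listing above.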

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10062#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler