[GHC] #8814: 7.8 optimizes attoparsec improperly

Fri Feb 21 17:59:54 UTC 2014

#8814: 7.8 optimizes attoparsec improperly
--------------------------------------------+------------------------------
        Reporter:  joelteon                 |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  normal                   |        Milestone:
       Component:  Compiler                 |          Version:  7.8.1-rc1
      Resolution:                           |         Keywords:
Operating System:  MacOS X                  |     Architecture:  x86_64
 Type of failure:  Runtime performance bug  |  (amd64)
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:
--------------------------------------------+------------------------------

Comment (by simonpj):

 I have not had any time to devote to this.  I tried
 {{{
 ghc -O T8814.hs -ddump-simpl -o T8814
 }}}
 with and without `-fno-full-laziness`.  Indeed I see the perf difference.

 The Core from `-ddump-simpl` differs substantially between the two builds.
 Inside `Main.$wa` you'll see a call to `runSTRep`, and the function to
 which `runSTRep` is applied is where they diverge:
  * Without full laziness, it consists of a call to `newArray#` followed by
 a couple of `memcpy` calls.
  * With full laziness, it has a rather complicated local recursive
 function that allocates a LOT of memory.

 I have no idea why. I suspect it has to do with optimisations done by
 RULES in the text library.  If I add `-ddump-rule-firings` and grep for
 `TEXT` in the rule names, I get
 {{{
 -- With full laziness
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> unfused
 Rule fired: TEXT tail -> unfused
 Rule fired: TEXT tail -> unfused

 -- Without full laziness
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> fused
 Rule fired: TEXT append -> unfused
 Rule fired: TEXT append -> unfused
 Rule fired: TEXT append -> unfused
 Rule fired: TEXT tail -> unfused
 Rule fired: TEXT tail -> unfused
 Rule fired: TEXT append -> unfused
 }}}
 So there is clearly a difference.  Should that difference have such a
 massive performance impact?  Ask the author of the text library!  Why does
 full laziness have this effect?  Well, if you have, say,
 `(\x. map (f x) (map g ys))`, full laziness may float out the `map g ys`
 (it does not mention `x`), and then the map/map fusion won't happen.
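
 To make the floating concrete, here is a minimal sketch of the two shapes
 written out by hand. The names `f`, `g`, `asWritten` and `afterFloating`
 are hypothetical stand-ins chosen for illustration; they are not taken
 from attoparsec, text, or the T8814 test case. GHC at `-O` is free to
 transform either definition, so this shows only the idea, not the Core
 it will actually produce.

```haskell
-- Hypothetical stand-ins for the `g` and `f` in the expression above.
g :: Int -> Int
g = (* 2)

f :: Int -> Int -> Int
f x y = x + y

-- The expression as written: `map g ys` sits under the lambda, directly
-- inside the outer `map`, so map/map fusion can in principle combine the
-- two traversals and avoid building the intermediate list.
asWritten :: [Int] -> Int -> [Int]
asWritten ys = \x -> map (f x) (map g ys)

-- Roughly what full laziness may do: `map g ys` does not mention `x`, so
-- it is floated out of the lambda and shared between calls. The outer
-- `map` now consumes a let-bound variable rather than a direct call to
-- `map`, so there is nothing adjacent left to fuse with.
afterFloating :: [Int] -> Int -> [Int]
afterFloating ys = let zs = map g ys in \x -> map (f x) zs
```

 Loaded into GHCi, `asWritten [1,2,3] 10` and `afterFloating [1,2,3] 10`
 both evaluate to `[12,14,16]`. The two forms agree on results and differ
 only in sharing and allocation, which is why the effect shows up purely
 as a performance change.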

 At this point I hope that someone else will take over debugging to find
 out more.

 Simon

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8814#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler