[GHC] #14035: Weird performance results.

Fri Jul 28 18:59:42 UTC 2017

#14035: Weird performance results.
-------------------------------------+-------------------------------------
        Reporter:  danilo2           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by danilo2):

 Simon, first of all, thank you very much for your time and help with this
 topic!
 I'd like to add a few important notes to the points raised in your response:

 **(1)** I'm very happy that you've found that something is wrong and that
 you already have a fix for it! In general, `-XStrict` is awesome: we need it
 in high-performance Haskell code, because putting bangs everywhere (and
 remembering to do so) is cumbersome.
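 For reference, here is a minimal sketch of the convenience meant above. It is not the ticket's code, and the function names are hypothetical. Under the `Strict` pragma, function arguments and `let`/`where` bindings in the module become strict by default, so the accumulator in `sumStrict` needs no bang. `sumBangs` writes the bangs out by hand, which is what you have to do without `Strict`.

```haskell
{-# LANGUAGE Strict #-}
{-# LANGUAGE BangPatterns #-}

-- With Strict enabled, 'acc' is evaluated at every step without an
-- explicit bang, so no chain of (+) thunks builds up.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- The same loop with explicit bang patterns: what you have to write
-- (and remember to write) in a module that does not use Strict.
sumBangs :: [Int] -> Int
sumBangs = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```

 Loading the file in GHCi and evaluating `sumStrict [1 .. 1000]` should give `500500`.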

 **(2)** You're of course right. I had just opened the browser to add a
 comment about exactly the same finding. The specification of `(|||)` makes
 it easy for GHC to discover that, if we always use the `XFalse` value, the
 mentioned code can be shortened to
 `s@(T b' a') <- fromFailParser $ f a ; return s` (i.e. just reuse the
 value). There are, however, three other non-obvious questions involved:
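 Here is a minimal sketch of the rewrite described in (2), under stated assumptions. `X`, `T` and the OR-like definition of `(|||)` are hypothetical stand-ins inferred from this discussion, because the real definitions are not shown here. `Maybe` stands in for the failing parser monad behind `fromFailParser`. Since `XFalse` is the identity of `(|||)`, rebuilding `T (XFalse ||| b') a'` gives a value equal to the one just bound, so returning `s` directly is equivalent.

```haskell
-- Hypothetical stand-ins for the ticket's types.
data X = XFalse | XTrue deriving (Eq, Show)

data T r = T X r deriving (Eq, Show)

-- Assumed OR-like semantics: XFalse is the identity element.
(|||) :: X -> X -> X
XFalse ||| x = x
XTrue  ||| _ = XTrue

-- "Rebuilding" form: take the inner result apart, then build a new T.
-- Maybe is a stand-in for the real parser monad.
stepRebuild :: X -> (a -> Maybe (T r)) -> a -> Maybe (T r)
stepRebuild b f a = do
  T b' a' <- f a
  return (T (b ||| b') a')

-- "Reuse" form, valid when b is XFalse: the bound value is returned as is.
stepReuse :: (a -> Maybe (T r)) -> a -> Maybe (T r)
stepReuse f a = do
  s@(T _ _) <- f a
  return s
```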

 **(2a)** Why is GHC able to optimize the code this way when we use `XFalse`
 everywhere, but not when we use `XTrue` everywhere? Very similar final Core
 could be generated in the latter case: if `b` is `XFalse`, we can just reuse
 the output value; if it is `XTrue`, we can be sure that the output always
 contains `XTrue` as well.
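 The reuse argument in (2a) can be written out by hand. This uses the same hypothetical `X`, `T` and OR-like `(|||)` as above. `combineReuse` returns its input unchanged in every case except the one where the flag actually changes, and it agrees with the naive `combine` on all four flag combinations. This is only a sketch of the reasoning, not a claim about what GHC's simplifier does.

```haskell
-- Hypothetical stand-ins, as in the previous sketch.
data X = XFalse | XTrue deriving (Eq, Show)

data T r = T X r deriving (Eq, Show)

(|||) :: X -> X -> X
XFalse ||| x = x
XTrue  ||| _ = XTrue

-- Naive version: always builds a new T.
combine :: X -> T r -> T r
combine b (T b' a') = T (b ||| b') a'

-- Hand-specialised version: allocates a new T only when the flag changes.
combineReuse :: X -> T r -> T r
combineReuse XFalse s              = s           -- XFalse ||| b' == b'
combineReuse XTrue  s@(T XTrue _)  = s           -- XTrue ||| XTrue == XTrue
combineReuse XTrue  (T XFalse a')  = T XTrue a'  -- the only case needing a new T
```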

 **(2b)** Even if GHC needs to generate code like
 `T b' a' <- fromFailParser $ f a ; return $ T something a'`, why does it
 take so long? This is a strict, fully evaluated value, so why does
 "updating a field" take 10x longer than a `Char` comparison?
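 Here is a small sketch of what "updating a field" means in this context, using a hypothetical record version of `T` (the field names `tFlag` and `tVal` are made up). Haskell defines record-update syntax as taking the value apart and building a new constructor. Even for a fully evaluated `T`, an update therefore produces a new `T` in general, unless the optimiser manages to eliminate it.

```haskell
-- Hypothetical record version of T with strict fields.
data X = XFalse | XTrue deriving (Eq, Show)

data T r = T { tFlag :: !X, tVal :: !r } deriving (Eq, Show)

-- Record-update syntax...
setFlag :: X -> T r -> T r
setFlag x s = s { tFlag = x }

-- ...has the same meaning as this: match on the old T, then construct a
-- new one. Values are immutable, so the old T is never modified in place.
setFlagByHand :: X -> T r -> T r
setFlagByHand x (T _ v) = T x v
```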

 **(2c)** Moreover, why does GHC need to "allocate a fresh `T` every time
 round the loop"? The fields of the tuple `T` do not "interact" with each
 other; they are just two separate outputs of a function. I could of course
 be very wrong, but I think it should be possible to simply turn `T a b`
 into `(# a, b #)` and cut out the "fresh `T` allocation" cost completely.
 Am I right?
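 Here is a sketch of the transformation suggested in (2c), using a hypothetical character-scanning loop rather than the ticket's parser. `scanBoxed` returns a freshly built `T` at every level of the recursion. `scanWorker` returns its two independent results as an unboxed tuple `(# X, Int #)`, and the wrapper `scanUnboxed` builds a single `T` at the end. As far as I know, this is the shape GHC's CPR (constructed product result) worker/wrapper transformation aims for when it applies. Whether it fires in the ticket's code is exactly the open question.

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Hypothetical stand-ins with strict fields.
data X = XFalse | XTrue deriving (Eq, Show)

data T r = T !X !r deriving (Eq, Show)

-- Boxed version: every level of the recursion returns a newly built T.
scanBoxed :: Char -> String -> T Int
scanBoxed _ [] = T XFalse 0
scanBoxed c (x : xs) =
  case scanBoxed c xs of
    T b n
      | x == c    -> T XTrue (n + 1)
      | otherwise -> T b n

-- Worker: the two results are returned as an unboxed tuple, so no T is
-- allocated inside the recursion (the Int itself may still be boxed).
scanWorker :: Char -> String -> (# X, Int #)
scanWorker _ [] = (# XFalse, 0 #)
scanWorker c (x : xs) =
  case scanWorker c xs of
    (# b, n #)
      | x == c    -> (# XTrue, n + 1 #)
      | otherwise -> (# b, n #)

-- Wrapper: a single T is built once, at the very end.
scanUnboxed :: Char -> String -> T Int
scanUnboxed c s = case scanWorker c s of (# b, n #) -> T b n
```

 `scanUnboxed 'a' "banana"` and `scanBoxed 'a' "banana"` should both give `T XTrue 3`.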

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14035#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler