[GHC] #14035: Weird performance results.
GHC
ghc-devs at haskell.org
Fri Jul 28 18:59:42 UTC 2017
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by danilo2):
Simon, first of all, thank you very much for your time and help with this
topic!
I have added some important notes to the points mentioned in your response:
**(1)** I'm so happy that you've found out that something is wrong and
that you've got a fix for it! In general, `-XStrict` is awesome: we need it
in high-performance Haskell code, where putting bangs everywhere (and
remembering to do so) would be cumbersome.
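To make the trade-off in (1) concrete, here is a minimal sketch of the hand-written strictness that `-XStrict` makes implicit. The names `Pair` and `sumAndCount` are hypothetical and do not come from the ticket; the example only illustrates where bangs have to go without the extension.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Hypothetical example: strictness requested by hand, without -XStrict.

-- Strict fields: both components are evaluated when a Pair is built.
data Pair = Pair !Int !Int
  deriving (Show, Eq)

-- Sum and count the elements of a list. The bang patterns force the
-- accumulators on each call, so no chain of thunks builds up.
sumAndCount :: [Int] -> Pair
sumAndCount = go 0 0
  where
    go :: Int -> Int -> [Int] -> Pair
    go !s !n []       = Pair s n
    go !s !n (x : xs) = go (s + x) (n + 1) xs
```

With `{-# LANGUAGE Strict #-}` at the top of the module, every `!` above could be dropped: `Strict` implies `StrictData` and treats function arguments and `let`/`where` bindings as if they carried bangs.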
**(2)** You are of course right. I had just opened the browser to add a
comment about exactly the same finding. The definition of `(|||)` lets GHC
easily discover that, if we always use the `XFalse` value, it can shorten
the code in question to `s@(T b' a') <- fromFailParser $ f a ; return s`
(i.e. just reuse the value). There are, however, three other non-obvious
questions involved:
**(2a)** Why is GHC able to optimize the code this way when we use `XFalse`
everywhere, but not when we use `XTrue` everywhere? A very similar final
Core could be generated in the latter case: if `b` is `XFalse`, we can just
reuse the output value; if it is `XTrue`, we can be sure the output always
contains `XTrue` as well.
**(2b)** Even if GHC needs to generate code like
`T b' a' <- fromFailParser $ f a ; return $ T something a'`, why does it
take so long? This is a strict, fully evaluated value, so why does
"updating a field" take 10x longer than a `Char` comparison?
**(2c)** Moreover, why is it necessary to "allocate a fresh `T` every time
round the loop"? The fields of the tuple `T` do not "interact" with each
other; they are just two separate outputs of a function. I could of course
be very wrong, but I think it should be possible to optimize `T a b` to
`(# a, b #)` and eliminate the "fresh `T` allocation" cost entirely. Am I
right?
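The reuse argument in (2) and (2a), and the unboxed-tuple idea in (2c), can be sketched in plain Haskell. This is a minimal sketch under stated assumptions: the definitions of `XBool`, `T` and `(|||)` below are hypothetical reconstructions (the real ones live in the ticket, not in this comment), the monadic `fromFailParser` step is modelled as a pure function, and `combine`, `combineShared`, `combineW` and `combineViaWorker` are invented names. It only illustrates the transformations being asked about; it does not show what GHC actually does with the ticket's code.

```haskell
{-# LANGUAGE UnboxedTuples #-}
-- Hypothetical reconstruction: the ticket's real XBool, T and (|||)
-- are not shown in this comment, so these definitions are assumed.
data XBool = XTrue | XFalse
  deriving (Show, Eq)

-- Assumed semantics: XFalse is the identity, XTrue absorbs.
(|||) :: XBool -> XBool -> XBool
XFalse ||| b = b
XTrue  ||| _ = XTrue

-- A strict result paired with a flag.
data T a = T !XBool !a
  deriving (Show, Eq)

-- Naive step: always rebuilds T, allocating a fresh constructor.
combine :: XBool -> T a -> T a
combine b (T b' a') = T (b ||| b') a'

-- Hand-written sharing version of (2) and (2a): return the input
-- unchanged whenever the combined flag equals the flag it already has.
combineShared :: XBool -> T a -> T a
combineShared XFalse s@(T _ _)     = s          -- XFalse ||| b' == b'
combineShared XTrue  s@(T XTrue _) = s          -- XTrue ||| XTrue == XTrue
combineShared XTrue  (T XFalse a') = T XTrue a' -- the only allocating case

-- (2c): a worker that returns both outputs as an unboxed tuple, so it
-- builds no T; the wrapper boxes them only where a T is really needed.
combineW :: XBool -> XBool -> a -> (# XBool, a #)
combineW b b' a' = (# b ||| b', a' #)

combineViaWorker :: XBool -> T a -> T a
combineViaWorker b (T b' a') = case combineW b b' a' of
  (# f, x #) -> T f x
```

`combineViaWorker` has roughly the shape that GHC's worker/wrapper transformation produces when CPR (constructed product result) analysis lets a function return its boxed result as an unboxed tuple; whether that happens inside the ticket's parser loop is the open question in (2c).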
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14035#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list