Performance degradation when factoring out common code

Sat Sep 9 05:05:59 UTC 2017

While trying to come up with a minimal example I discovered one more
puzzling thing. runghc is fastest, ghc is slower, ghc with optimization is
slowest. This is completely reverse of the expected order.

ghc -O1 (-O2 is similar):

time                 15.23 ms   (14.72 ms .. 15.73 ms)

ghc -O0:

time                 3.612 ms   (3.548 ms .. 3.728 ms)

runghc:

time                 2.250 ms   (2.156 ms .. 2.348 ms)

I am grokking it further. Any pointers will be helpful. I understand that
-O2 can sometimes be slower e.g. aggressive inlining can sometimes be
counterproductive. But 4x variation is a lot and this is the case with -O1
as well which should be relatively safer than -O2 in general. Worst of all
runghc is significantly faster than ghc. What's going on?

-harendra

On 8 September 2017 at 18:49, Harendra Kumar <harendra.kumar at gmail.com>
wrote:

> I will try creating a minimal example and open a ticket for the inlining
> problem, the one I am sure about.
>
> -harendra
>
> On 8 September 2017 at 18:35, Simon Peyton Jones <simonpj at microsoft.com>
> wrote:
>
>> *I know that this is not an easy request*, but can either of you produce
>> a small example that demonstrates your problem?   If so, please open a
>> ticket.
>>
>>
>>
>> I don’t like hearing about people having to use trial and error  with
>> INLINE or SPECIALISE pragmas.  But I can’t even begin to solve the problem
>> unless I can reproduce it.
>>
>>
>>
>> Simon
>>
>>
>>
>> *From:* ghc-devs [mailto:ghc-devs-bounces at haskell.org] *On Behalf Of *Harendra
>> Kumar
>> *Sent:* 08 September 2017 13:50
>> *To:* Mikolaj Konarski <mikolaj.konarski at gmail.com>
>> *Cc:* ghc-devs at haskell.org
>> *Subject:* Re: Performance degradation when factoring out common code
>>
>>
>>
>> I should also point out that I saw performance improvements by manually
>> factoring out and propagating some common expressions to outer loops in
>> performance sensitive paths. Now I have made this a habit to do this
>> manually. Not sure if something like this has also been fixed with that
>> ticket or some other ticket.
>>
>>
>>
>> -harendra
>>
>>
>>
>> On 8 September 2017 at 17:34, Harendra Kumar <harendra.kumar at gmail.com>
>> wrote:
>>
>> Thanks Mikolaj! I have seen some surprising behavior quite a few times
>> recently and I was wondering whether GHC should do better. In one case I
>> had to use SPECIALIZE very aggressively, in another version of the same
>> code it worked well without that. I have been doing a lot of trial and
>> error with the INLINE/NOINLINE pragmas to figure out what the right
>> combination is. Sometimes it just feels like black magic, because I cannot
>> find a rationale to explain the behavior. I am not sure if there are any
>> more such problems lurking in, perhaps this is an area where some
>> improvement looks possible.
>>
>>
>>
>> -harendra
>>
>>
>>
>>
>>
>> On 8 September 2017 at 17:10, Mikolaj Konarski <
>> mikolaj.konarski at gmail.com> wrote:
>>
>> Hello,
>>
>> I've had a similar problem that's been fixed in 8.2.1:
>>
>> https://ghc.haskell.org/trac/ghc/ticket/12603
>>
>> You can also use some extreme global flags, such as
>>
>> ghc-options: -fexpose-all-unfoldings -fspecialise-aggressively
>>
>> to get most the GHC subtlety and shyness out of the way
>> when experimenting.
>>
>> Good luck
>> Mikolaj
>>
>>
>>
>>
>> On Fri, Sep 8, 2017 at 11:21 AM, Harendra Kumar
>> <harendra.kumar at gmail.com> wrote:
>> > Hi,
>> >
>> > I have this code snippet for the bind implementation of a Monad:
>> >
>> >     AsyncT m >>= f = AsyncT $ \_ stp yld ->
>> >         let run x = (runAsyncT x) Nothing stp yld
>> >             yield a _ Nothing  = run $ f a
>> >             yield a _ (Just r) = run $ f a <> (r >>= f)
>> >         in m Nothing stp yield
>> >
>> > I want to have multiple versions of this implementation parameterized
>> by a
>> > function, like this:
>> >
>> > bindWith k (AsyncT m) f = AsyncT $ \_ stp yld ->
>> >     let run x = (runAsyncT x) Nothing stp yld
>> >         yield a _ Nothing  = run $ f a
>> >         yield a _ (Just r) = run $ f a `k` (bindWith k r f)
>> >     in m Nothing stp yield
>> >
>> > And then the bind function becomes:
>> >
>> > (>>=) = bindWith (<>)
>> >
>> > But this leads to a performance degradation of more than 10%. inlining
>> does
>> > not help, I tried INLINE pragma as well as the "inline" GHC builtin. I
>> > thought this should be a more or less straightforward replacement
>> making the
>> > second version equivalent to the first one. But apparently there is
>> > something going on here that makes it perform worse.
>> >
>> > I did not look at the core, stg or asm yet. Hoping someone can quickly
>> > comment on it. Any ideas why is it so? Can this be worked around
>> somehow?
>> >
>> > Thanks,
>> > Harendra
>> >
>>
>> > _______________________________________________
>> > ghc-devs mailing list
>> > ghc-devs at haskell.org
>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>> <https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-devs&data=02%7C01%7Csimonpj%40microsoft.com%7C5ff3c69fb9d447c47b5908d4f6b832de%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636404718373134824&sdata=zyHYozym6TzL61Tq5CSERjqhKlxr%2ByV0j%2FyHtxmXmVE%3D&reserved=0>
>> >
>>
>>
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20170909/77dd18c4/attachment-0001.html>