Integer constant folding in the presence of new primops

Wed Jun 19 18:39:01 CEST 2013

I mean, it certainly *seems* reasonable that a 15% hit could come from
pipelining changes or cache behavior or something. I don't think
alignment would really be a huge issue; post-Nehalem I believe
non-aligned writes/reads are extremely cheap. Non-intuitive behavior
can totally happen too: I've seen cases where adding instructions to a
loop speeds things up (e.g. by taking the extra step, you may
mitigate a dependency stall, which massively helps pipelining across
the loop body, etc.).

Nicolas, can I ask what benchmark you're looking at? And what
performance tools are you using, Intel's? If you're on Linux, the
'perf' tool on a modern kernel can be used to quickly get an overview
of how many cache misses/hits your process has, how many pipeline
stalls occur, etc. You can then use it to drill down a bit into the
assembly that's problematic.

That might not give you an exact culprit (it could be many changes
with cumulative hits), but it's a start.
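
For anyone who wants to poke at the rule-firing side of this thread
separately from the performance question, here is a minimal standalone
sketch of the quot example quoted below. It is only an illustration:
myQuot is a hypothetical stand-in that mirrors the shape of the Integer
quot definition cited from GHC.Real (zero check first, then the
division); it is not the library code, and it will not necessarily
trigger the same rules as the real instance method.

```haskell
module Main where

import Control.Exception (ArithException (DivideByZero), throw)

-- Hypothetical stand-in for the Integer `quot` definition quoted in the
-- thread: the first equation compares the divisor against 0, the second
-- performs the actual division.
myQuot :: Integer -> Integer -> Integer
_ `myQuot` 0 = throw DivideByZero
n `myQuot` d = n `quot` d

-- Same constants as the small test program in the quoted message.
quotInt :: Integer
quotInt = 100063 `myQuot` 156

main :: IO ()
main = print quotInt
```

Compiling this with -O together with the -ddump-rule-firings or
-ddump-rule-rewrites flags mentioned in the quoted messages should show
which rules, if any, fire on the two constants. Run as is, the program
prints 641.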

On Wed, Jun 19, 2013 at 10:43 AM, Nicolas Frisby
<nicolas.frisby at gmail.com> wrote:
> I'm also seeing performance regressions in the shootout benchmarks that I
> can't identify in the asm. The new asm looks better but performs worse, with
> a ~15% slowdown.
>
> I fired up the performance counters in my CPU and the free Intel code for
> inspecting them showed that my CPU utilization took about a 10% hit, even
> while executing fewer total instructions.
>
>   1) Jan, perhaps we're seeing the same sort of behavior — the shootout
> benchmarks have extremely hot loops (hundreds of millions of iterations
> IIRC). I used ticky profiling too, and saw no suspicious changes in any
> counters.
>
>   2) Dear Low-level Gurus: How feasible is it that a ~15% slowdown in a
> program with a very hot loop is due to incidentally inhibiting some caching
> behavior (instr? data?)? Or perhaps affecting alignment? FTR my CPU is a
> Core i7-2620M, Sandy Bridge.
>
> Thanks all.
>
> On Wed, Jun 19, 2013 at 9:27 AM, Jan Stolarek <jan.stolarek at p.lodz.pl>
> wrote:
>>
>> > If it's not sorted out, can you open a ticket, put in the relevant info
>> > (so we don't need to look at the email trail), and we can tackle it when
>> > you get here.
>> Currently there's a temporary workaround: I'm using new folding rules for
>> all primitive types, except for Integer, in which case I left the old
>> folding rules unchanged. This of course should be modified to make all
>> rules uniform, but for now it at least passes validation. I didn't file
>> a ticket, because the bug does not exist yet :) It only manifests itself
>> in my patches, which have not been applied yet. I'll add all the
>> information from this discussion to my github fork of GHC and then move
>> it to Trac once the bug makes it to HEAD.
>>
>> What worries me more about my patches is the performance regression in
>> kahan, because I see no obvious differences in the generated assembly.
>>
>> Janek
>>
>> >
>> > Simon
>> >
>> > -----Original Message-----
>> > From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org]
>> > On Behalf Of Jan Stolarek
>> > Sent: 20 May 2013 12:35
>> > To: Ian Lynagh
>> > Cc: ghc-devs at haskell.org
>> > Subject: Re: Integer constant folding in the presence of new primops
>> >
>> > > If you remove everything but the quotInteger test from
>> > > integerConstantFolding and compile with -ddump-rule-rewrites then
>> > > you'll see that the eqInteger rule fires before quotInteger. This is
>> > > presumably comparing against 0, as the definition of quot for Integer
>> > > (in GHC.Real) is
>> > >     _ `quot` 0 = divZeroError
>> > >     n `quot` d = n `quotInteger` d
>> >
>> > Yes, I noticed these two rules firing together - perhaps that's the
>> > explanation why. I created a small program for testing:
>> >
>> > main = print quotInt
>> > quotInt :: Integer
>> > quotInt = 100063 `quot` 156
>> >
>> > I noticed that when I define eqInteger wrapper to be NOINLINE, the call
>> > to quot is translated to Core as:
>> >
>> > Main.quotInt =
>> >   GHC.Real.$fIntegralInteger_$cquot
>> >     (__integer 100063) (__integer 156)
>> >
>> > but when I change the wrapper to INLINE I get:
>> >
>> > Main.quotInt =
>> >   GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of IntegralInteger
>> >     (__integer 100063) (__integer 156)
>> >
>> > All rule firing happens later (I used -ddump-simpl-iterations
>> > -ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules
>> > don't fire.
>> >
>> > > Do you also still have eqInteger wired in? It sounds like you might
>> > > have given them both the same unique?
>> >
>> > No, they didn't have the same unique. I modified the existing rules to
>> > work on the new primops and ignore their wrappers. For now I have
>> > reverted these changes so that I can make progress and leave this
>> > problem for later.
>> >
>> > Janek
>> >
>> > _______________________________________________
>> > ghc-devs mailing list
>> > ghc-devs at haskell.org
>> > http://www.haskell.org/mailman/listinfo/ghc-devs

-- 
Regards,
Austin - PGP: 4096R/0x91384671