[Haskell-cafe] Fast JSON validation - reducing allocations

David Turner dct25-561bs at mythic-beasts.com
Thu May 11 18:56:00 UTC 2017


Ben, Eric, thanks for your help. A whole new world of low-level statistics
opens up before me...

On 11 May 2017 at 18:57, Ben Gamari <ben at smart-cactus.org> wrote:
> Eric Seidel <eric at seidel.io> writes:
>
> > On Thu, May 11, 2017, at 09:12, David Turner wrote:
> >> Could anyone help, e.g. by pointing me at the bit in the Core that is
> >> allocating within the main loop?
> >
> > GHC has a -ticky flag that tracks precisely where allocations are
> > happening. It's quite low level (you may have to stare at the STG in
> > addition to the Core to interpret the results) but I've found it
> > incredibly useful when trying to understand performance swings in GHC's
> > own benchmarks.
> >
> > https://ghc.haskell.org/trac/ghc/wiki/Debugging/TickyTicky
> >
> > You can enable it per module, which is nice break from the
> > rebuild-the-world -prof, but the flip side IIRC is that if the
> > allocations are happening in a module that wasn't compiled with -ticky
> > (e.g. somewhere inside Data.Array) you might not see them.
>
> Indeed, -ticky is wonderful for this sort of thing since it doesn't
> affect Core-to-Core optimization, meaning that the instrumented program
> will behave very similarly to the uninstrumented version (just a bit
> slower due to counter overhead).
>
> I had a quick look at your example over lunch. I started like this
> (after commenting out the aeson benchmark in Main),
>
>     $ ghc app/Main.hs -isrc  -O2 -ddump-stg -fforce-recomp \
>           -ddump-to-file -dsuppress-idinfo -ddump-simpl -rtsopts \
>           -ticky -ticky-allocd
>     $ app/Main +RTS -rticky.out


Ok, I was just about to ask about the Alloc'd column as mine was all
zeroes, but the -ticky-allocd was what I needed, thanks.

> The table suggests a few nice starting points: $w$j1_slJT is allocated
> quite a bit, probably by $wgo7_slHA. Next we can turn to the STG output
> (src/Automaton.dump-stg) to try to understand why this is. Reading STG
> generally tends to be quite hard due to the rather large, deeply nested
> trees that most programs compile to.

You're not wrong.

I've trawled through the STG as best as I can for a newbie. The function
with a large number in the 'Alloc' column looks pretty much to be the heart
of the `step` function. There's a lot of nesting but it looks largely to be
just checking the bounds on array indices, which I don't think allocates
anything.

However I see this this little snippet at the very bottom of the nesting:

case
    indexArray# [ds1_sl1b
                 y2_sl5H]
of
_ [Occ=Dead]
{ (##) ipv8_sl60 [Occ=Once!] ->
      case
          ipv8_sl60
      of
      _ [Occ=Dead]
      { GHC.Word.W8# dt6_sl62 [Occ=Once] ->
            case
                ss_sl5v
            of
            dt7_sl63
            { __DEFAULT ->
                  (#,,#) [__word 1
                          dt6_sl62
                          dt7_sl63];
            };
      };
};


I'm guessing that the `indexArray` call is the line that reads `nextState =
aTransitionsTable makeAutomaton AU.! (currentState, nextByte)`, and to me
it looks like it might be unboxing the thing it's getting out of the array.
Not sure that counts as an allocation, but I'm surprised anyway.

I'll keep digging anyway, but it'd be good to hear of any progress at your
end too. Hope it didn't spoil your lunch!

Cheers,

> While turning this over in my head I realized that this is the sort of
> program which may be helped significantly by GHC 8.2's improved join
> point handling. Indeed the timings seem to support this hypothesis:
>
> GHC 8.0.2
>
>     benchmarking json-validator/Automaton/testEvent
>     time                 22.84 μs   (22.76 μs .. 22.94 μs)
>                          1.000 R²   (1.000 R² .. 1.000 R²)
>     mean                 22.84 μs   (22.76 μs .. 22.94 μs)
>     std dev              297.4 ns   (221.8 ns .. 378.8 ns)
>
>     Alloc rate     4,015,256,078,038 bytes per MUT second
>
> GHC 8.2
>
>     benchmarking json-validator/Automaton/testEvent
>     time                 9.221 μs   (9.141 μs .. 9.318 μs)
>                          0.998 R²   (0.996 R² .. 1.000 R²)
>     mean                 9.163 μs   (9.084 μs .. 9.356 μs)
>     std dev              399.8 ns   (193.0 ns .. 745.4 ns)
>     variance introduced by outliers: 54% (severely inflated)
>
>     Alloc rate           123,141,635 bytes per MUT second
>
>
> Wow! I suspect your allocations have now vanished.

Ooo, that's more like it.

Could you run again using the following to get Criterion's estimate of the
allocations-per-call?

json-validator-exe --regress allocated:iters +RTS -T

Cheers,

David

On 11 May 2017 at 19:45, David Turner <dave.c.turner at gmail.com> wrote:

> On 11 May 2017 at 19:40, Ben Gamari <ben at smart-cactus.org> wrote:
>
>> Ccing Luke Maurer under the assumption that he will appreciate seeing
>> the fruits of his labor.
>>
>>
>> David Turner <dct25-561bs at mythic-beasts.com> writes:
>>
>> > Dear Café,
>> >
>> (snip)
>> >
>> > There's a copy of the relevant code for option 4 at
>> > https://github.com/DaveCTurner/json-validator. I've hacked around with
>> it a
>> > bit since producing the numbers above, so it's now apparently a bit
>> slower
>> > than Aeson but allocates less too (~65kB).
>> >
>> > Could anyone help, e.g. by pointing me at the bit in the Core that is
>> > allocating within the main loop?
>> >
>> While turning this over in my head I realized that this is the sort of
>> program which may be helped significantly by GHC 8.2's improved join
>> point handling. Indeed the timings seem to support this hypothesis:
>>
>> GHC 8.0.2
>>
>>     benchmarking json-validator/Automaton/testEvent
>>     time                 22.84 μs   (22.76 μs .. 22.94 μs)
>>                          1.000 R²   (1.000 R² .. 1.000 R²)
>>     mean                 22.84 μs   (22.76 μs .. 22.94 μs)
>>     std dev              297.4 ns   (221.8 ns .. 378.8 ns)
>>
>>     Alloc rate     4,015,256,078,038 bytes per MUT second
>>
>> GHC 8.2
>>
>>     benchmarking json-validator/Automaton/testEvent
>>     time                 9.221 μs   (9.141 μs .. 9.318 μs)
>>                          0.998 R²   (0.996 R² .. 1.000 R²)
>>     mean                 9.163 μs   (9.084 μs .. 9.356 μs)
>>     std dev              399.8 ns   (193.0 ns .. 745.4 ns)
>>     variance introduced by outliers: 54% (severely inflated)
>>
>>     Alloc rate           123,141,635 bytes per MUT second
>>
>>
>> Wow! I suspect your allocations have now vanished.
>>
>
>
> Ooo, that's more like it.
>
> Could you run again using the following to get Criterion's estimate of the
> allocations-per-call?
>
> json-validator-exe --regress allocated:iters +RTS -T
>
> Cheers,
>
> David
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20170511/dbaa6e9f/attachment.html>


More information about the Haskell-Cafe mailing list