[Haskell-cafe] Fast JSON validation - reducing allocations
Ben Gamari
ben at smart-cactus.org
Thu May 11 17:57:58 UTC 2017
Eric Seidel <eric at seidel.io> writes:
> On Thu, May 11, 2017, at 09:12, David Turner wrote:
>> Could anyone help, e.g. by pointing me at the bit in the Core that is
>> allocating within the main loop?
>
> GHC has a -ticky flag that tracks precisely where allocations are
> happening. It's quite low level (you may have to stare at the STG in
> addition to the Core to interpret the results) but I've found it
> incredibly useful when trying to understand performance swings in GHC's
> own benchmarks.
>
> https://ghc.haskell.org/trac/ghc/wiki/Debugging/TickyTicky
>
> You can enable it per module, which is nice break from the
> rebuild-the-world -prof, but the flip side IIRC is that if the
> allocations are happening in a module that wasn't compiled with -ticky
> (e.g. somewhere inside Data.Array) you might not see them.
>
Indeed, -ticky is wonderful for this sort of thing since it doesn't
affect Core-to-Core optimization, meaning that the instrumented program
will behave very similarly to the uninstrumented version (just a bit
slower due to counter overhead).
I had a quick look at your example over lunch. I started like this
(after commenting out the aeson benchmark in Main),
$ ghc app/Main.hs -isrc -O2 -ddump-stg -fforce-recomp \
-ddump-to-file -dsuppress-idinfo -ddump-simpl -rtsopts \
-ticky -ticky-allocd
$ app/Main +RTS -rticky.out
This produces a Ticky report (named ticky.out) in the current directory.
This includes a table which in my case looked something like,
Entries Alloc Alloc'd Non-void Arguments STG Name
--------------------------------------------------------------------------------
123 192 48 1 L go9{v slyX} (main at main:Automaton)
2 0 32 1 L go8{v slyH} (main at main:Automaton)
1465016 58600640 146501600 0 $w$j1{v slMc} (main at main:Automaton) in slHB
161884268 011761148448 0 $w$j1{v slJT} (main at main:Automaton) in slHA
0 0 11720128 4 ppww $s$wgo2{v slHz} (main at main:Automaton) in r4s
16353241111790448768 13185144 5 wwLpp $wgo7{v slHA} (main at main:Automaton) in r4s
1831270 167011824 13185144 6 ppwLww $s$wgo1{v slHB} (main at main:Automaton) in r4s
183127 0 13185144 0 $w$j{v slGN} (main at main:Automaton) in r4s
992 8624 10944 1 L go7{v slwV} (main at main:Automaton) in r4m
3 72 0 1 C main at main:Automaton.showGraph120{v r3O}
681 0 0 2 ML go6{v rkSd} (main at main:Automaton)
The important things we care about here are Alloc (that is how many
bytes each name allocates) and Alloc'd (that is, how many of each
closure are allocated in bytes). [Aside: we explicitly enabled the
Alloc'd column during compilation with the -ticky-allocd flag; if you
omit it this column will contain only zeros. In general I find it
helpful to have so I generally enable it, despite the overhead it
introduces.]
The table suggests a few nice starting points: $w$j1_slJT is allocated
quite a bit, probably by $wgo7_slHA. Next we can turn to the STG output
(src/Automaton.dump-stg) to try to understand why this is. Reading STG
generally tends to be quite hard due to the rather large, deeply nested
trees that most programs compile to. That being said, a good editor can
help immensely (I also have my ghc-dump [1] package which I've been
meaning to add STG support to, which would enable some interesting
analysis techniques).
Unfortunately, my lunch is now all gone so I'll have to drop this here
for now. However, I'll try to pick it up again tonight. Do keep us
apprised of your progress!
Cheers,
- Ben
[1] https://github.com/bgamari/ghc-dump
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 487 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20170511/7005fe35/attachment.sig>
More information about the Haskell-Cafe
mailing list