[Haskell-cafe] Fast JSON validation - reducing allocations

Fri May 12 08:27:40 UTC 2017

On Fri, 2017-05-12 at 09:10 +0100, David Turner wrote:
> Morning all,
> 
> On 11 May 2017 at 20:32, Ben Gamari <ben at smart-cactus.org> wrote:
> > Something that I should have mentioned earlier is that STG has the
> > nice property that all allocation is syntactically obvious:
> > allocated closures manifest as `let`s. This makes it fairly easy to
> > pick out possible allocation sites, even in large dumps.
> 
> 
> Ah, that's very useful to know!
> 
> Armed with that knowledge, I came to the conclusion that the
> allocation was for the sharing of the `nextState` variable. Inlining
> it brings it down to 20us and 22kB per iteration.
> 
> https://github.com/DaveCTurner/json-validator/commit/ec994ec9226ca7bc2e76f19bef98f42e0b233524
> 
> Getting closer, but it looks like waiting for 8.2 is a better answer.
> Looking forward to it!
> 
> Cheers,
> 

Maybe this is a silly question, and please let me know why if so, but:

Has anyone thought about validating multiple messages in parallel in
order to "produce garbage faster"? While reducing allocation will make
each single validation faster, running several at once might improve
the throughput achieved per GC. This assumes that the amount of live
data in the heap is small, making each GC roughly constant time, and
that multiple cores are available.

I wonder whether a few strategically placed `par`s and `pseq`s might
allow you to scale across cores without requiring invasive changes to
the program's structure. Apologies for not trying to do this myself
first ;-).
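
Something along these lines is roughly what I have in mind. It is only
a sketch under assumptions, not a drop-in change for json-validator:
`isValidStandIn` is a hypothetical placeholder that merely checks
bracket balance (it is not real JSON validation), and `validateAll` is
a name I made up. `par` and `pseq` are taken from GHC.Conc in base;
Control.Parallel in the `parallel` package exports the same two
functions.

```haskell
module Main (main) where

import GHC.Conc (par, pseq)

-- Hypothetical stand-in for the real validator. It only checks that
-- '[' / ']' and '{' / '}' are balanced and ignores everything else,
-- so it is NOT a JSON validator (brackets inside strings confuse it).
isValidStandIn :: String -> Bool
isValidStandIn = go []
  where
    go stack (c:cs)
      | c `elem` "[{" = go (c : stack) cs
      | c == ']'      = pop '[' stack cs
      | c == '}'      = pop '{' stack cs
      | otherwise     = go stack cs
    go stack []       = null stack

    -- Pop the expected opener off the stack, or fail.
    pop open (s:rest) cs | s == open = go rest cs
    pop _    _        _              = False

-- Validate each message independently. Every result is sparked with
-- `par` so an idle core may evaluate it, while `pseq` forces the rest
-- of the list (creating the remaining sparks) before the cons cell is
-- returned.
validateAll :: [String] -> [Bool]
validateAll []     = []
validateAll (m:ms) = r `par` (rs `pseq` (r : rs))
  where
    r  = isValidStandIn m
    rs = validateAll ms

main :: IO ()
main = print (validateAll ["{\"a\":[1,2]}", "[1,2", "{}"])
-- prints [True,False,True]
```

To actually use more than one core this would need to be built with
-threaded and run with +RTS -N. Whether it helps at all depends on
each validation being expensive enough to outweigh the cost of a
spark, which I have not measured.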

kind regards, Arjen