[Haskell-cafe] Fast JSON validation - reducing allocations

David Turner dct25-561bs at mythic-beasts.com
Fri May 12 09:48:01 UTC 2017

On 12 May 2017 at 09:27, Arjen <arjenvanweelden at gmail.com> wrote:

> Maybe this is a silly question, and please let me know why if so, but:
> Has anyone thought about parallelizing it for multiple messages in
> order to "produce garbage faster"? While reducing allocation will make
> each single validation faster, validating several messages at once might
> improve the throughput-to-GC ratio. This assumes that the amount of live
> data in
> the heap is small, making GC sort of constant time, and having multiple
> cores available.

Not a silly question at all. Adding the following incantation:

    `using` parListChunk 100 rseq

does quite happily spread things across all 4 cores on my development
machine, and it's certainly a bit faster. To give some stats, it processes
~24 events between GCs rather than ~6, and collects ~2MB rather than
~500kB. The throughput becomes a lot less consistent, however, at least
partly due to some bigger GC pauses along the way. As Ben's statistics
showed, our allocation rate on one thread is around 4TBps, which is already
stressing the GC out a bit, and parallelising it doesn't make that problem
any easier.
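
For anyone wanting to try the incantation at home: `parListChunk`, `rseq`, and `using` come from Control.Parallel.Strategies in the `parallel` package. A minimal self-contained sketch (the `validate` function below is a hypothetical stand-in, since the real validator isn't shown in this thread) looks like:

```haskell
import Control.Parallel.Strategies (parListChunk, rseq, using)

-- Hypothetical stand-in for the real JSON validator:
-- treat non-empty messages as valid.
validate :: String -> Bool
validate = not . null

-- Validate a list of messages, sparking evaluation in chunks of 100
-- so the work spreads across capabilities (run with e.g. +RTS -N4).
validateAll :: [String] -> [Bool]
validateAll msgs = map validate msgs `using` parListChunk 100 rseq
```

`rseq` only forces each result to WHNF, which is enough here because the results are plain Bools; a deeper structure would want `rdeepseq`.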

I know in the OP I said "we have a stream" (an accidental MLK misquote),
but in fact there are a number of parallel streams, and that number
exceeds the number of available cores. We therefore don't anticipate any
enormous benefit from spreading the processing of any one stream across
multiple cores: single-threaded performance is what we think we should
be concentrating on.
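
To illustrate the shape of that setup: with more streams than cores, running one lightweight thread per stream keeps each validator single-threaded while still occupying all capabilities. A sketch using only base (the `validateStream` function and names below are hypothetical):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Hypothetical per-stream validator: count the valid (non-empty)
-- messages in one stream, on a single thread.
validateStream :: [String] -> Int
validateStream = length . filter (not . null)

-- Fork one worker per stream and collect the results. With more
-- streams than cores, the RTS balances the workers across
-- capabilities without any one stream being split up.
validateStreams :: [[String]] -> IO [Int]
validateStreams streams = do
  vars <- forM streams $ \s -> do
    v <- newEmptyMVar
    _ <- forkIO (putMVar v $! validateStream s)
    pure v
  mapM takeMVar vars
```

The `$!` forces each stream's result on its own worker thread rather than deferring the work to whoever reads the MVar.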

