[Haskell-cafe] Fast JSON validation - reducing allocations

David Turner dct25-561bs at mythic-beasts.com
Thu May 11 16:59:32 UTC 2017


Interesting, thanks for the link. However we're trying to do _even less_
than that - this is just the "scan the document to produce the Elias-Fano
encoding" step, except without needing to keep a hold of it for later. It's
not quite as trivial as that paper makes out as (a) it doesn't mention the
possibility that the documents might not be well-formed, and (b) it doesn't
really talk about dealing with the special delimiting characters within
string literals, for which you need to drag along a bunch of state while
you're scanning.

This'd be pretty trivial to do in a C-like language now we've got the DFA
tables built, and I may resort to that at some point if we can't work out
how to avoid the allocations in Haskell-land, but I'd quite like to be able
to solve problems of this form without unnecessarily resorting to C.

Cheers,

On 11 May 2017 at 17:30, Bardur Arantsson <spam at scientician.net> wrote:

> On 2017-05-11 18:12, David Turner wrote:
> > Dear Café,
> >
> > We have a stream of ~900byte JSON documents (lazy ByteStrings) that we
> > would like to validate (NB not parse) as quickly as possible. We just
> > need to check that the bytes are a syntactically valid JSON document and
> > then pass them elsewhere; we do not care what the content is. The
> > documents are encoded in UTF-8 and have no extraneous whitespace.
> >
>
> No particular recommendations, but you might want to look into
> semi-indexing[1] as a strategy. It looks plausible that it would be
> possible to do that without a lot of allocation; see the paper for
> details. (I there's also a demo implementation in Python on GH.)
>
> [1] http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20170511/3feb556a/attachment.html>


More information about the Haskell-Cafe mailing list