[Haskell-cafe] JSON parser that returns the rest of the string that was not used

Ryan Newton rrnewton at gmail.com
Mon May 30 00:52:35 UTC 2016


On Sun, May 29, 2016 at 1:53 PM, Stephen Tetley <stephen.tetley at gmail.com>
 wrote:

> Isn't this a problem of JSON rather than its parsers?
>

I can understand that a file with multiple JSON values is not a legal "JSON
text".  But... isn't that issue separate from whether parsers expect
terminated input or, conversely, are tolerant of arbitrary text
following the JSON expression?  Scheme "read" functions from time immemorial
would read the first expression off a handle without worrying about what
followed it!  That doesn't mean the whole file needs to be valid JSON, just
that each prefix chewed off the front is valid JSON.
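
To make that concrete, here's roughly the helper I'm imagining, sketched on
top of aeson's attoparsec-level parser (decodeOne is just a name I'm making
up, and this is untested):

    import           Data.Aeson                 (Value)
    import qualified Data.Aeson.Parser          as AP  -- json :: Parser Value
    import qualified Data.Attoparsec.ByteString as A
    import qualified Data.ByteString            as BS
    import qualified Data.ByteString.Char8      as BC
    import           Data.Char                  (isSpace)

    -- Parse one JSON value off the front of the input, returning it together
    -- with the unconsumed remainder.
    decodeOne :: BS.ByteString -> Either String (Value, BS.ByteString)
    decodeOne input =
      case A.parse AP.json (BC.dropWhile isSpace input) of
        A.Done rest v  -> Right (v, rest)
        A.Partial k    -> case k BS.empty of         -- no more input to feed
                            A.Done rest v -> Right (v, rest)
                            _             -> Left "truncated JSON value"
        A.Fail _ _ err -> Left err

Calling decodeOne again on the leftover chews the next value off the front,
so no single call ever cares whether the file as a whole is one legal JSON
text.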

Thanks to Nikita for the links to json-stream and
json-incremental-decoder.  My understanding is that if I use a top-level
array to wrap the objects, then these approaches will let me keep the IO
streaming/incremental.  I'm not sure yet how to use this to stream
output from a monadic computation.
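
For the reading side, my rough understanding of json-stream (I may well have
the details of Data.JsonStream.Parser wrong) is that the array-wrapped file
could be consumed incrementally along these lines:

    import           Data.Aeson             (Value)
    import qualified Data.ByteString.Lazy   as BL
    import           Data.JsonStream.Parser (arrayOf, parseLazyByteString, value)

    -- Read a file holding one top-level array of report objects, yielding
    -- the elements incrementally as the lazy ByteString is consumed.
    readReports :: FilePath -> IO [Value]
    readReports path = do
      bs <- BL.readFile path
      return (parseLazyByteString (arrayOf value) bs)

But that leans on lazy IO for the incremental part, and it only addresses
reading the reports back, not streaming them out as they are produced.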

Let me be specific about the scenario I'm trying to handle:

Criterion loops over benchmarks, and after running each one, it writes the
report out to disk, appending it to a file:

https://github.com/bos/criterion/blob/fb815c928af2cb089cea9399503304530e27883d/Criterion/Internal.hs#L128

This way, the report doesn't sit in memory affecting subsequent benchmarks
(i.e., polluting the live set for major GC).  When all benchmarks are
complete, the reports are read back from the file.

There are bugs in the binary serialization used in the linked code.  We
want to switch it to dump the reports as JSON and read them back instead.

In this case, we can just write an initial "[" to the file, and then
serialize one JSON object at a time, interspersed with ",".  That's ok...
but it's kind of an ugly solution -- it requires that we, the clients of
the JSON serialization API, make assumptions about the serialization format
and reimplement a tiny, tiny fraction of it.
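
Concretely, the workaround amounts to something like this sketch
(appendReport, finishReports, and readBack are names I'm inventing here;
untested):

    import           Data.Aeson                 (FromJSON, ToJSON, eitherDecode,
                                                 encode)
    import qualified Data.ByteString.Lazy       as BL
    import qualified Data.ByteString.Lazy.Char8 as BLC

    -- Append one report, writing the array syntax ourselves: "[" before the
    -- first object, "," before each later one.
    appendReport :: ToJSON r => FilePath -> Bool -> r -> IO ()
    appendReport path isFirst r =
      BLC.appendFile path
        (BLC.pack (if isFirst then "[" else ",") `BL.append` encode r)

    -- Close the hand-written array once all benchmarks have run.
    finishReports :: FilePath -> IO ()
    finishReports path = BLC.appendFile path (BLC.pack "]")

    -- At that point the file is a single legal JSON array, so plain aeson
    -- can read it back.
    readBack :: FromJSON r => FilePath -> IO (Either String [r])
    readBack path = fmap eitherDecode (BL.readFile path)

It works, but the "[" / "," / "]" bookkeeping is exactly the piece of the
serialization format that we shouldn't have to know about.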

Cheers,
   -Ryan


On Sun, May 29, 2016 at 1:53 PM, Stephen Tetley <stephen.tetley at gmail.com>
wrote:

> Hi Ryan
>
> Isn't this a problem of JSON rather than its parsers?
>
> That's to say I believe (but could easily be wrong...) that a file
> with multiple JSON objects would be ill-formed; it would be
> well-formed if the multiple objects were in a single top-level array.
>
> On 29 May 2016 at 18:09, Ryan Newton <rrnewton at gmail.com> wrote:
> > As someone who spent many years putting data in S-expression format, it
> > seems natural to me to write multiple S-expressions (or JSON objects) to a
> > file, and expect a reader to be able to read them back one at a time.
> >
> > This seems comparatively uncommon in the JSON world.  Accordingly, it looks
> > like the most popular JSON parsing lib, Aeson, doesn't directly provide this
> > functionality.  Functions like decode just return a "Maybe a", not the
> > left-over input, meaning that you would need to somehow split up your
> > multi-object file before attempting to parse, which is annoying and error
> > prone.
> >
> > It looks like maybe you can get Aeson to do what I want by dropping down to
> > the attoparsec layer and messing with IResult.
> >
> > But is there a better way to do this?  Would this be a good convenience
> > routine to add to aeson in a PR?  I.e. would anyone else use this?
> >
> > Thanks,
> >   -Ryan