[Haskell-cafe] JSON parser that returns the rest of the string that was not used

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue May 31 00:42:13 UTC 2016



On 31/05/16 1:16 AM, Ryan Newton wrote:
> Thanks Richard.  I didn't know that the spec was precise about the 
> JSON expr not going beyond the closing character.


First, older versions of JSON required a JSON text (the top-level
value) to be an object or an array.  Second, the JSON grammar in
RFC 7159 is quite precise.

Insignificant whitespace is allowed before or after any of the six
structural characters.
-- That is, [ ] { } , :
-- Oddly enough, the specification does NOT say that whitespace
-- is allowed before or after any other token; it appears that
-- "false" is legal at JSON top level but " false" is not.

       ws = *(
               %x20 /              ; Space
               %x09 /              ; Horizontal tab
               %x0A /              ; Line feed or New line
               %x0D )              ; Carriage return
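To make the ws production concrete, here is a minimal Haskell sketch (base only) of whitespace skipping and of the "ws, structural character, ws" shape the RFC uses for tokens such as begin-array.  The names isJsonWs, skipWs and structural are hypothetical helpers for illustration, not taken from any JSON library.

```haskell
-- Hypothetical helper names, for illustration only.

-- ws = *( %x20 / %x09 / %x0A / %x0D ): exactly these four characters,
-- so form feed, vertical tab and Unicode spaces are NOT whitespace here.
isJsonWs :: Char -> Bool
isJsonWs c = c `elem` " \t\n\r"

-- Consume the longest run of ws.  This never fails, because ws may be empty.
skipWs :: String -> String
skipWs = dropWhile isJsonWs

-- One of the six structural characters together with the insignificant
-- whitespace allowed on either side of it, e.g. begin-array = ws %x5B ws.
-- On success, returns the input that was not consumed.
structural :: Char -> String -> Maybe String
structural c s
  | c `notElem` "[]{},:" = Nothing  -- only structural characters get ws
  | otherwise =
      case skipWs s of
        x : rest | x == c -> Just (skipWs rest)
        _ -> Nothing
```

For example, structural '[' " \t[\n 1" gives Just "1", while skipWs leaves a leading form feed alone because it is not one of the four ws characters.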


value = false / null / true / object / array / number / string

false = %x66.61.6c.73.65   ; false

null  = %x6e.75.6c.6c      ; null

true  = %x74.72.75.65      ; true

Note that it's not a matter of scanning a sequence of letters
and then checking for particular values, it must be one of
those three exact sequences.  Once you have read the "e"
of "false" there is no point in reading any further.  You
certainly don't need to skip white space; indeed, if you take
the specification literally, you mustn't.  (But, sigh, it IS ok
to skip white space after a final ] or }.  Such are the standards
the net is made from.)
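The point about literal names can be sketched in Haskell with Data.List.stripPrefix, which compares only as many characters as the prefix has.  The Literal type and the literal function are hypothetical names chosen for illustration.

```haskell
import Data.List (stripPrefix)

-- Hypothetical names, for illustration only.
data Literal = LFalse | LNull | LTrue
  deriving (Eq, Show)

-- false / null / true are exact character sequences.  The first character
-- picks the candidate; stripPrefix then compares only as many characters
-- as the literal has, so nothing after the final letter is examined.
-- Leading whitespace is deliberately not skipped.
literal :: String -> Maybe (Literal, String)
literal s = case s of
  'f' : _ -> (,) LFalse <$> stripPrefix "false" s
  'n' : _ -> (,) LNull <$> stripPrefix "null" s
  't' : _ -> (,) LTrue <$> stripPrefix "true" s
  _ -> Nothing
```

Because String is lazy, literal ("true" ++ undefined) can deliver LTrue without ever forcing the input after the final "e".  Whether anything may legally follow the literal (literal "falsehood" gives Just (LFalse, "hood")) is left to the caller's grammar.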

object = ws %x7B ws [ member *( ws %x2C ws member ) ]
        ws %x7D ws

member = string ws %x3A ws value

array =  ws %x5B ws [ value *( ws %x2C ws value ) ]  ws %x5D ws
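To see how the expanded productions hand whitespace around, here is a small Haskell sketch that parses arrays whose elements are only the three literal names or nested arrays, and returns the unconsumed input, in the spirit of the original question.  Numbers, strings and objects are deliberately left out; Value, sym, value and array are hypothetical names.

```haskell
import Data.List (stripPrefix)

-- Hypothetical names, for illustration only.  A cut-down value type:
-- just the three literals and arrays.
data Value = VFalse | VNull | VTrue | VArray [Value]
  deriving (Eq, Show)

skipWs :: String -> String
skipWs = dropWhile (`elem` " \t\n\r")

-- ws c ws, for one structural character c.
sym :: Char -> String -> Maybe String
sym c s = case skipWs s of
  x : rest | x == c -> Just (skipWs rest)
  _ -> Nothing

-- value = false / null / true / array   (objects, numbers, strings omitted)
value :: String -> Maybe (Value, String)
value s = case s of
  'f' : _ -> (,) VFalse <$> stripPrefix "false" s
  'n' : _ -> (,) VNull <$> stripPrefix "null" s
  't' : _ -> (,) VTrue <$> stripPrefix "true" s
  _ -> array s

-- array = ws %x5B ws [ value *( ws %x2C ws value ) ] ws %x5D ws
array :: String -> Maybe (Value, String)
array s = do
  afterOpen <- sym '[' s
  case sym ']' afterOpen of
    Just rest -> Just (VArray [], rest)  -- empty array
    Nothing -> do
      (v, rest) <- value afterOpen
      elements [v] rest
  where
    -- Keep reading ", value" until the closing bracket.
    elements acc rest = case sym ',' rest of
      Just afterComma -> do
        (v, rest') <- value afterComma
        elements (v : acc) rest'
      Nothing -> do
        rest' <- sym ']' rest
        Just (VArray (reverse acc), rest')
```

For example, array "[ [ ] ] x" gives Just (VArray [VArray []], "x"): the trailing ws of end-array swallows the space before x.  Because sym skips whitespace greedily, the outer [ always claims the first space in "[ [ ] ]", which is one way of settling the ambiguity described next.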

And yes, the grammar is ambiguous.  Consider
"[ [ ] ]"
Does the first white space character go with the first left bracket
or the second one?
All they needed to do was to say that strings, all other values,
and the characters , : ] } can be preceded by insignificant white
space; then the ambiguity would be gone and " false" would be legal.
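Under the alternative rule just proposed, each token would own only the whitespace in front of it.  A minimal Haskell sketch of such a token reader, written under that assumption rather than RFC 7159, follows; strings and numbers are omitted, and Token and token are hypothetical names.

```haskell
import Data.List (stripPrefix)

-- Hypothetical names, for illustration only.
data Token
  = TFalse | TNull | TTrue
  | TBeginArray | TEndArray | TBeginObject | TEndObject
  | TComma | TColon
  deriving (Eq, Show)

-- ASSUMPTION: the alternative rule, not RFC 7159.  Whitespace may appear
-- only in front of a token, so each token skips the ws before itself and
-- never looks past its own last character.  Strings and numbers omitted.
token :: String -> Maybe (Token, String)
token input =
  case dropWhile (`elem` " \t\n\r") input of
    '[' : rest -> Just (TBeginArray, rest)
    ']' : rest -> Just (TEndArray, rest)
    '{' : rest -> Just (TBeginObject, rest)
    '}' : rest -> Just (TEndObject, rest)
    ',' : rest -> Just (TComma, rest)
    ':' : rest -> Just (TColon, rest)
    s@('f' : _) -> (,) TFalse <$> stripPrefix "false" s
    s@('n' : _) -> (,) TNull <$> stripPrefix "null" s
    s@('t' : _) -> (,) TTrue <$> stripPrefix "true" s
    _ -> Nothing
```

Here token " false, null" yields Just (TFalse, ", null"), and token " false " leaves the trailing space for whatever comes next, so every run of whitespace has exactly one owner.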

Every kind of number ends with a block of digits; since white space
isn't allowed after numbers, the next character, whatever it is,
should not be consumed, but must be checked to make sure it is
not a digit.
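Here is a Haskell sketch of number scanning along these lines, following the number production in RFC 7159 (section 6).  It returns the matched text and the rest of the input; the character after the number is examined but never consumed.  The helper names are hypothetical, and converting the text to a numeric value is left out.

```haskell
import Data.Char (isDigit)
import Data.Maybe (fromMaybe)

-- Hypothetical helper names, for illustration only.

-- 1*DIGIT: one or more ASCII digits (isDigit accepts only '0'..'9').
digits1 :: String -> Maybe (String, String)
digits1 s = case span isDigit s of
  ("", _) -> Nothing
  r -> Just r

-- int = zero / ( digit1-9 *DIGIT )
int :: String -> Maybe (String, String)
int ('0' : rest) = Just ("0", rest)
int s@(c : _) | isDigit c = digits1 s
int _ = Nothing

-- frac = decimal-point 1*DIGIT
frac :: String -> Maybe (String, String)
frac ('.' : rest) = do
  (ds, rest') <- digits1 rest
  Just ('.' : ds, rest')
frac _ = Nothing

-- exp = e [ minus / plus ] 1*DIGIT, where e is "e" or "E"
expo :: String -> Maybe (String, String)
expo (e : rest) | e `elem` "eE" = do
  let (sign, rest1) = case rest of
        c : r | c `elem` "+-" -> ([c], r)
        _ -> ("", rest)
  (ds, rest2) <- digits1 rest1
  Just (e : sign ++ ds, rest2)
expo _ = Nothing

-- An optional part: if it does not match, consume nothing.
opt :: (String -> Maybe (String, String)) -> String -> (String, String)
opt p s = fromMaybe ("", s) (p s)

-- number = [ minus ] int [ frac ] [ exp ]
-- The next character is looked at but not consumed: it must not be a
-- digit.  (Only "0" followed by a digit, as in "01", can trip this.)
number :: String -> Maybe (String, String)
number s0 = do
  let (minus, s1) = case s0 of
        '-' : r -> ("-", r)
        _ -> ("", s0)
  (i, s2) <- int s1
  let (f, s3) = opt frac s2
      (e, s4) = opt expo s3
  case s4 of
    c : _ | isDigit c -> Nothing
    _ -> Just (minus ++ i ++ f ++ e, s4)
```

For example, number "-12.5e+3]" gives Just ("-12.5e+3", "]"), while number "01" is rejected: int matches only the leading 0, and the digit that follows is exactly what the final check catches.  number "1.x" gives Just ("1", ".x"), leaving the surrounding grammar to reject the stray dot.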

I wonder if anyone has a JSON parser that follows the letter of the
standard?  Preparing this message has made me realise that
(a) mine doesn't and (b) I don't really want it to.





