[Haskell-cafe] Is it possible to have constant-space JSON decoding?

Iustin Pop iustin at google.com
Tue Dec 4 15:11:07 CET 2012


Hi,

I'm trying to parse a rather simple but big JSON message, but it turns
out that memory consumption is a problem, and I'm not sure what the
actual cause is.

Let's say we have a simple JSON message: an array of 5 million numbers.
I would like to parse this in constant space, such that if I only need
the last element, overall memory usage is low (yes, unrealistic use, but
please bear with me for a moment).

Using aeson, I thought the following program would be well-behaved:

> import Data.Aeson
> import qualified Data.Attoparsec.ByteString.Lazy as AL
> import qualified Data.ByteString.Lazy as L
> 
> main = do
>   r <- L.readFile "numbers"
>   case AL.parse json r :: AL.Result Value of
>     AL.Fail _ context errs -> do
>          print context
>          print errs
>     AL.Done _ d -> case fromJSON d::Result [Value] of
>                      Error x -> putStrLn x
>                      Success d -> print $ last d

However, this uses (according to +RTS -s) ~1150 MB of memory. I've tried
switching from json to json', but while that uses slightly less memory
(~1020 MB), it clearly can't be fully lazy, since it forces conversion to
actual types.

Looking at the implementation of "FromJSON [a]", it seems we could
optimise the code by not converting to a list. The new (partial) version,
with Data.Vector imported qualified as V, does:

>     AL.Done _ d -> case d of
>                      Array v -> print $ V.last v

And this indeed reduces the memory, when using json', to about 700 MB.
Better, but still a lot.

It seems that the Array constructor holds a vector, and this results in
too much strictness?

Looking at the memory profiles (with "json" and "Array"), things are
quite interesting: lots of VOID, very small USE, all generated from
Data.Aeson.Parser.Internal:array. Using -hd, we have a reasonably equal
split between various attoparsec combinators, Data.Aeson.Parser.Internal
expressions, etc.

So, am I doing something wrong, or is it simply not feasible to get
constant-space JSON decoding?
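
For comparison, here is a minimal sketch of what "constant space" could
look like for this one special case, using only the bytestring package
and no JSON library at all. It assumes the input is a flat array of
plain numbers (no strings, no nesting), and the name lastElement is
made up for illustration. It is not a JSON decoder: it never builds a
Value and returns the raw bytes of the last element. The intent is that
each chunk of the lazy ByteString can be collected as soon as it has
been scanned.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper: raw text of the last element of a flat JSON
-- array of numbers. Assumes no strings and no nested arrays/objects,
-- so every comma separates two top-level elements.
lastElement :: L.ByteString -> Maybe L.ByteString
lastElement input =
  case filter (not . L.null) (L.split ',' stripped) of
    [] -> Nothing
    xs -> Just (last xs)
  where
    -- Drop the brackets and whitespace; L.filter and L.split both work
    -- chunk by chunk, so the whole input is never needed at once.
    stripped = L.filter (`notElem` "[] \t\r\n") input
```

In the program above, the whole case expression would then become
L.readFile "numbers" >>= print . lastElement. Whether something similar
can be done while still going through aeson's Value type is exactly what
I am unsure about.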

Using the 'json' library instead of 'aeson' is no better, since that
wants the input as a String, which consumes even more memory (and, when
compiled with -O2, dies with a stack overflow even with a 64 MB stack).

thanks,
iustin


