[Haskell-cafe] Is it possible to have constant-space JSON decoding?

Felipe Almeida Lessa felipe.lessa at gmail.com
Tue Dec 4 15:23:19 CET 2012


Aeson doesn't have an incremental parser so it'll be
difficult/impossible to do what you want.  I guess you want an
event-based JSON parser, such as yajl [1].  I've never used this
library, though.
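
For the specific case in your example (a flat array of plain numbers), a
hand-rolled scan may already be enough. What follows is only a sketch of
the event-style idea, not yajl's API and not a real JSON parser: it
assumes there are no strings, nested arrays or objects in the input, does
no validation, and returns the last element as raw text. The names
`Scan`, `step` and `lastNumber` are made up for this example. It uses
only the bytestring package.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Scanner state (hypothetical type for this sketch): the token being
-- read, stored reversed, and the last completed token.
-- Strict fields keep the fold from building up thunks.
data Scan = Scan !String !String

-- Assumption: in a flat array of numbers, these characters only ever
-- separate tokens.
isSeparator :: Char -> Bool
isSeparator c = c `elem` ",[] \t\r\n"

-- Consume one character: a separator closes the current token (if any),
-- anything else extends it.
step :: Scan -> Char -> Scan
step (Scan cur done) c
  | isSeparator c = if null cur then Scan [] done else Scan [] (reverse cur)
  | otherwise     = Scan (c : cur) done

-- Last element of a flat JSON array of numbers, as unparsed text.
lastNumber :: L.ByteString -> Maybe String
lastNumber input =
  case L.foldl' step (Scan [] []) input of
    Scan cur done
      | not (null cur)  -> Just (reverse cur)  -- input ended mid-token
      | not (null done) -> Just done
      | otherwise       -> Nothing

main :: IO ()
main = do
  r <- L.readFile "numbers"
  print (lastNumber r)
```

Because the strict fold only keeps the current and the previous token, and
the lazy ByteString is not referenced anywhere else, the chunks read from
the file can be garbage-collected as they are consumed, so residency
should stay roughly constant. I have not measured this on your input.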

Cheers,

[1] http://hackage.haskell.org/package/yajl-0.3.0.5

On Tue, Dec 4, 2012 at 12:11 PM, Iustin Pop <iustin at google.com> wrote:
> Hi,
>
> I'm trying to parse a rather simple but big JSON message, but it turns
> out that memory consumption is a problem, and I'm not sure what the
> actual cause is.
>
> Let's say we have a simple JSON message: an array of 5 million numbers.
> I would like to parse this in constant space, such that if I only need
> the last element, overall memory usage is low (yes, unrealistic use, but
> please bear with me for a moment).
>
> Using aeson, I thought the following program would be well-behaved:
>
>> import Data.Aeson
>> import qualified Data.Attoparsec.ByteString.Lazy as AL
>> import qualified Data.ByteString.Lazy as L
>>
>> main = do
>>   r <- L.readFile "numbers"
>>   case AL.parse json r :: AL.Result Value of
>>     AL.Fail _ context errs -> do
>>          print context
>>          print errs
>>     AL.Done _ d -> case fromJSON d::Result [Value] of
>>                      Error x -> putStrLn x
>>                      Success d -> print $ last d
>
> However, this uses (according to +RTS -s) ~1150 MB of memory. I've tried
> switching from json to json', but while that uses slightly less memory
> (~1020 MB) it clearly can't be fully lazy, since it forces conversion to
> actual types.
>
> Looking at the implementation of "FromJSON [a]", it seems we could
> optimise the code by not converting to a list. The new (partial) version does:
>
>>     -- requires: import qualified Data.Vector as V
>>     AL.Done _ d -> case d of
>>                      Array v -> print $ V.last v
>>                      _       -> putStrLn "not an array"
>
> And this indeed reduces memory usage, when using json', to about 700 MB.
> Better, but still a lot.
>
> It seems that the Array constructor holds a vector, and this results in
> too much strictness?
>
> Looking at the memory profiles (with "json" and "Array"), things are
> quite interesting - lots of VOID, very small USE, all generated from
> Data.Aeson.Parser.Internal:array. Using -hd, we have a reasonably equal
> split between various attoparsec combinators, Data.Aeson.Parser.Internal
> expressions, etc.
>
> So, am I doing something wrong, or is it simply not feasible to get
> constant-space JSON decoding?
>
> Using the 'json' library instead of 'aeson' is no better, since that
> wants the input as a String which consumes even more memory (and dies,
> when compiled with -O2, with out of stack even for 64MB stack).
>
> thanks,
> iustin



-- 
Felipe.


