aeson and dlist in HP 2013.4.0.0

Bas van Dijk v.dijk.bas at gmail.com
Thu Nov 28 16:42:49 UTC 2013


On 28 November 2013 13:42, Sven Panne <svenpanne at gmail.com> wrote:
> Just two add my 2c: Given all these new packages which would need to
> be pulled into the HP just for aeson, let's not include aeson for
> 2013.4.0.0 and release 2013.4.0.0 soon without the need for lengthy
> discussions.

As the proposer for inclusion of aeson in the HP I'm beginning to agree.

There's another reason I would like to postpone the aeson inclusion: I
just started working on improving the encoding performance of aeson.
This requires some significant changes to the API. Therefore I think
it would be better to see how well this new API works out. If it works
out, release it as aeson-7 (or aeson-8) and include that release in
the HP after next. This way we have time to discuss the new
dependencies and the HP remains stable.

The following is a brief explanation of the new aeson API (you can
stop reading here if you're not interested in it):

The idea is to use the same trick that is used in the upcoming binary
package[1].

First of all toJSON will return a JsonBuilder instead of a Value:

class ToJSON a where
  toJSON :: a -> JsonBuilder

A JsonBuilder is basically a difference list:

newtype JsonBuilder = JsonBuilder (IStream -> IStream)

instance Monoid JsonBuilder where ...

The "list", here represented as an IStream, is a sequence of
instructions to the encoder:

data IStream =
    INull                              IStream

  | ITrue                              IStream
  | IFalse                             IStream

  | IDoubleQuote                       IStream

  | IChar   {-# UNPACK #-} !Char       IStream
  | IString                !String     IStream
  | IText                  !Text       IStream

  | IInt    {-# UNPACK #-} !Int        IStream
  | IInt8   {-# UNPACK #-} !Int8       IStream
  | IInt16  {-# UNPACK #-} !Int16      IStream
  | IInt32  {-# UNPACK #-} !Int32      IStream
  | IInt64  {-# UNPACK #-} !Int64      IStream

  | IWord   {-# UNPACK #-} !Word       IStream
  | IWord8  {-# UNPACK #-} !Word8      IStream
  | IWord16 {-# UNPACK #-} !Word16     IStream
  | IWord32 {-# UNPACK #-} !Word32     IStream
  | IWord64 {-# UNPACK #-} !Word64     IStream

  | IFloat  {-# UNPACK #-} !Float      IStream
  | IDouble {-# UNPACK #-} !Double     IStream

  | IInteger               !Integer    IStream

  | IScientific            !Scientific IStream

  | IComma                             IStream

  | IBeginArray                        IStream
  | IEndArray                          IStream

  | IBeginObject                       IStream
  | IEndObject                         IStream

  | IColon                             IStream

  | IValue                 !Value      IStream

  -- Fused:
  | IBeginObject_IDoubleQuote          IStream
  | IComma_IDoubleQuote                IStream
  -- TODO; more

  | IEnd

Converting a JsonBuilder to a Builder (note that I'm using the new
bytestring Builder here) is simply a matter of executing the right
Builder for each instruction:

toBuilder :: JsonBuilder -> Builder
toBuilder (JsonBuilder g) = go (g IEnd)
 where
  go :: IStream -> Builder
  go is = case is of
    INull             is' -> nullB               <> go is'

    ITrue             is' -> trueB               <> go is'
    IFalse            is' -> falseB              <> go is'

    IDoubleQuote      is' -> char8          '"'  <> go is'

    IChar        c    is' -> char           c    <> go is'
    IString      cs   is' -> string         cs   <> go is'
    IText        t    is' -> text           t    <> go is'

    IInt         i    is' -> intDec         i    <> go is'
    IInt8        i8   is' -> int8Dec        i8   <> go is'
    IInt16       i16  is' -> int16Dec       i16  <> go is'
    IInt32       i32  is' -> int32Dec       i32  <> go is'
    IInt64       i64  is' -> int64Dec       i64  <> go is'

    IWord        w    is' -> wordDec        w    <> go is'
    IWord8       w8   is' -> word8Dec       w8   <> go is'
    IWord16      w16  is' -> word16Dec      w16  <> go is'
    IWord32      w32  is' -> word32Dec      w32  <> go is'
    IWord64      w64  is' -> word64Dec      w64  <> go is'

    IFloat       f    is' -> floatDec       f    <> go is'
    IDouble      d    is' -> doubleDec      d    <> go is'

    IInteger     i    is' -> integerDec     i    <> go is'
    IScientific  s    is' -> fromScientific s    <> go is'

    IComma            is' -> char8          ','  <> go is'

    IBeginArray       is' -> char8          '['  <> go is'
    IEndArray         is' -> char8          ']'  <> go is'

    IBeginObject      is' -> char8          '{'  <> go is'
    IEndObject        is' -> char8          '}'  <> go is'

    IColon            is' -> char8          ':'  <> go is'

    IValue       v    is' -> fromValue      v    <> go is'

    -- Fused:
    IBeginObject_IDoubleQuote is'-> fixed2('{','"')<> go is'
    IComma_IDoubleQuote       is'-> fixed2(',','"')<> go is'
    -- TODO: more

    IEnd -> mempty

nullB :: Builder
nullB = fixed4 ('n',('u',('l','l')))
{-# INLINE nullB #-}

trueB :: Builder
trueB = fixed4 ('t',('r',('u','e')))
{-# INLINE trueB #-}

falseB :: Builder
falseB = fixed5 ('f',('a',('l',('s','e'))))
{-# INLINE falseB #-}

fixed2 :: (Char, Char) -> Builder
fixed2 = P.primFixed (P.char8 >*< P.char8)
{-# INLINE fixed2 #-}

fixed4 :: (Char, (Char, (Char, Char))) -> Builder
fixed4 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8)
{-# INLINE fixed4 #-}

fixed5 :: (Char, (Char, (Char, (Char, Char)))) -> Builder
fixed5 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8 >*< P.char8)
{-# INLINE fixed5 #-}

This representation allows a lot of optimizations. For example we can
define rewrite rules that "fuse" the Builders of common sequences
like:

{-# RULES
  "IBeginObject_IDoubleQuote" forall is.
   IBeginObject (IDoubleQuote is) =
   IBeginObject_IDoubleQuote is #-}

{-# RULES
  "IComma_IDoubleQuote" forall is.
   IComma (IDoubleQuote is) =
   IComma_IDoubleQuote  is #-}

The encoder can handle these common sequences more efficiently.

Of course the JsonBuilder is abstract to the user. There will be a
safe API to construct well-formed JsonBuilders. (While writing this I
realize that users will be able to use the Monoid instance for
JsonBuilders which is undesirable. I will solve this by wrapping the
JsonBuilder returned from toJSON in another newtype which doesn't have
a Monoid instance)

What do we loose? In the current API of aeson, toJSON will directly
return a Value. This Value can then be inspected or extended. In order
to do the same in the new API the JsonBuilder first has to be parsed
to a Value which is less efficient. However, if the new API proves to
be significantly more efficient for encoding I think this extra
parsing cost is warranted since it's far less common than encoding.

A first version of this API will soon be ready and I will push that to
my github. Hopefully I can come up with some convincing benchmarks!

Bas


More information about the Libraries mailing list