aeson and dlist in HP 2013.4.0.0
Bas van Dijk
v.dijk.bas at gmail.com
Thu Nov 28 16:42:49 UTC 2013
On 28 November 2013 13:42, Sven Panne <svenpanne at gmail.com> wrote:
> Just two add my 2c: Given all these new packages which would need to
> be pulled into the HP just for aeson, let's not include aeson for
> 2013.4.0.0 and release 2013.4.0.0 soon without the need for lengthy
> discussions.
As the proposer for inclusion of aeson in the HP I'm beginning to agree.
There's another reason I would like to postpone the aeson inclusion: I
just started working on improving the encoding performance of aeson.
This requires some significant changes to the API. Therefore I think
it would be better to see how well this new API works out. If it works
out, release it as aeson-7 (or aeson-8) and include that release in
the HP after next. This way we have time to discuss the new
dependencies and the HP remains stable.
The following is a brief explanation of the new aeson API (you can
stop reading here if you're not interested in it):
The idea is to use the same trick that is used in the upcoming binary
package[1].
First of all toJSON will return a JsonBuilder instead of a Value:
class ToJSON a where
toJSON :: a -> JsonBuilder
A JsonBuilder is basically a difference list:
newtype JsonBuilder = JsonBuilder (IStream -> IStream)
instance Monoid JsonBuilder where ...
The "list", here represented as an IStream, is a sequence of
instructions to the encoder:
data IStream =
INull IStream
| ITrue IStream
| IFalse IStream
| IDoubleQuote IStream
| IChar {-# UNPACK #-} !Char IStream
| IString !String IStream
| IText !Text IStream
| IInt {-# UNPACK #-} !Int IStream
| IInt8 {-# UNPACK #-} !Int8 IStream
| IInt16 {-# UNPACK #-} !Int16 IStream
| IInt32 {-# UNPACK #-} !Int32 IStream
| IInt64 {-# UNPACK #-} !Int64 IStream
| IWord {-# UNPACK #-} !Word IStream
| IWord8 {-# UNPACK #-} !Word8 IStream
| IWord16 {-# UNPACK #-} !Word16 IStream
| IWord32 {-# UNPACK #-} !Word32 IStream
| IWord64 {-# UNPACK #-} !Word64 IStream
| IFloat {-# UNPACK #-} !Float IStream
| IDouble {-# UNPACK #-} !Double IStream
| IInteger !Integer IStream
| IScientific !Scientific IStream
| IComma IStream
| IBeginArray IStream
| IEndArray IStream
| IBeginObject IStream
| IEndObject IStream
| IColon IStream
| IValue !Value IStream
-- Fused:
| IBeginObject_IDoubleQuote IStream
| IComma_IDoubleQuote IStream
-- TODO; more
| IEnd
Converting a JsonBuilder to a Builder (note that I'm using the new
bytestring Builder here) is simply a matter of executing the right
Builder for each instruction:
toBuilder :: JsonBuilder -> Builder
toBuilder (JsonBuilder g) = go (g IEnd)
where
go :: IStream -> Builder
go is = case is of
INull is' -> nullB <> go is'
ITrue is' -> trueB <> go is'
IFalse is' -> falseB <> go is'
IDoubleQuote is' -> char8 '"' <> go is'
IChar c is' -> char c <> go is'
IString cs is' -> string cs <> go is'
IText t is' -> text t <> go is'
IInt i is' -> intDec i <> go is'
IInt8 i8 is' -> int8Dec i8 <> go is'
IInt16 i16 is' -> int16Dec i16 <> go is'
IInt32 i32 is' -> int32Dec i32 <> go is'
IInt64 i64 is' -> int64Dec i64 <> go is'
IWord w is' -> wordDec w <> go is'
IWord8 w8 is' -> word8Dec w8 <> go is'
IWord16 w16 is' -> word16Dec w16 <> go is'
IWord32 w32 is' -> word32Dec w32 <> go is'
IWord64 w64 is' -> word64Dec w64 <> go is'
IFloat f is' -> floatDec f <> go is'
IDouble d is' -> doubleDec d <> go is'
IInteger i is' -> integerDec i <> go is'
IScientific s is' -> fromScientific s <> go is'
IComma is' -> char8 ',' <> go is'
IBeginArray is' -> char8 '[' <> go is'
IEndArray is' -> char8 ']' <> go is'
IBeginObject is' -> char8 '{' <> go is'
IEndObject is' -> char8 '}' <> go is'
IColon is' -> char8 ':' <> go is'
IValue v is' -> fromValue v <> go is'
-- Fused:
IBeginObject_IDoubleQuote is'-> fixed2('{','"')<> go is'
IComma_IDoubleQuote is'-> fixed2(',','"')<> go is'
-- TODO: more
IEnd -> mempty
nullB :: Builder
nullB = fixed4 ('n',('u',('l','l')))
{-# INLINE nullB #-}
trueB :: Builder
trueB = fixed4 ('t',('r',('u','e')))
{-# INLINE trueB #-}
falseB :: Builder
falseB = fixed5 ('f',('a',('l',('s','e'))))
{-# INLINE falseB #-}
fixed2 :: (Char, Char) -> Builder
fixed2 = P.primFixed (P.char8 >*< P.char8)
{-# INLINE fixed2 #-}
fixed4 :: (Char, (Char, (Char, Char))) -> Builder
fixed4 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8)
{-# INLINE fixed4 #-}
fixed5 :: (Char, (Char, (Char, (Char, Char)))) -> Builder
fixed5 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8 >*< P.char8)
{-# INLINE fixed5 #-}
This representation allows a lot of optimizations. For example we can
define rewrite rules that "fuse" the Builders of common sequences
like:
{-# RULES
"IBeginObject_IDoubleQuote" forall is.
IBeginObject (IDoubleQuote is) =
IBeginObject_IDoubleQuote is #-}
{-# RULES
"IComma_IDoubleQuote" forall is.
IComma (IDoubleQuote is) =
IComma_IDoubleQuote is #-}
The encoder can handle these common sequences more efficiently.
Of course the JsonBuilder is abstract to the user. There will be a
safe API to construct well-formed JsonBuilders. (While writing this I
realize that users will be able to use the Monoid instance for
JsonBuilders which is undesirable. I will solve this by wrapping the
JsonBuilder returned from toJSON in another newtype which doesn't have
a Monoid instance)
What do we loose? In the current API of aeson, toJSON will directly
return a Value. This Value can then be inspected or extended. In order
to do the same in the new API the JsonBuilder first has to be parsed
to a Value which is less efficient. However, if the new API proves to
be significantly more efficient for encoding I think this extra
parsing cost is warranted since it's far less common than encoding.
A first version of this API will soon be ready and I will push that to
my github. Hopefully I can come up with some convincing benchmarks!
Bas
More information about the Libraries
mailing list