Associativity of the generic representation of sum types

Bas van Dijk v.dijk.bas at gmail.com
Thu Sep 22 09:39:51 CEST 2011


2011/9/22 José Pedro Magalhães <jpm at cs.uu.nl>:
> Hi Bas,
>
> On Thu, Sep 22, 2011 at 03:55, Bas van Dijk <v.dijk.bas at gmail.com> wrote:
>>
>> Hello,
>>
>> I just used the new GHC generics together with the DefaultSignatures
>> extension to provide a default generic implementation for toJSON and
>> parseJSON in the aeson package:
>>
>> https://github.com/mailrank/aeson/pull/26
>>
>> It appears that the generic representation of a sum type has a tree shape
>> as in:
>>
>> (a :+: b) :+: (c :+: d)
>
> That is correct.
>
>>
>> In my case this tree-shaped representation is problematic when parsing
>> a JSON value to this type. My overloaded parsing function is
>> parameterized with a key which specifies which of the a, b, c or d
>> constructors to parse. When it encounters a constructor it checks if
>> it matches the key, if so it is parsed, if not parsing will fail.
>> Because of the tree-shaped representation of sum types I have to
>> recursively parse the left and right branch and join them using <|>:
>>
>>
>> https://github.com/basvandijk/aeson/blob/d5535817ceb192aa9d7d0d0b291e1901f3fbb899/Data/Aeson/Types/Internal.hs#L1003
>>
>> I don't know for sure but I suspect that this can cause memory leaks
>> since the <|> has to keep the right value in memory when it is parsing
>> the left.
>
> It is not immediately clear to me why this would cause memory leaks...
>
>>
>> Ideally the generic representation of sum types is right associative as
>> in:
>>
>> a :+: (b :+: (c :+: d))
>>
>> This way I only have to check if 'a' matches, if it does the right
>> branch can be forgotten.
>>
>> Is there a good reason for not having it right associative?
>
> The reason is performance. In particular for large datatypes with many
> constructors, a balanced sum-of-products performs better than a right-nested
> one. Also, it makes things like writing generic space-efficient
> encoders/decoders easier.
>
> But I would be very interested in understanding if/how the balanced view
> leads to a space leak, so please let me know if you can provide some more
> information.
>
>
> Thanks,
> Pedro
>
>>
>> Regards,
>>
>> Bas
>>
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
>

Hi José,

After thinking about this some more, I don't think I need to worry
about space leaks.

The only worry I have is that in the following code:

------------------------------------------------------------------------
class GFromSum f where
    gParseSum :: Pair -> Parser (f a)

instance (GFromSum a, GFromSum b) => GFromSum (a :+: b) where
    gParseSum keyVal = fmap L1 (gParseSum keyVal) <|> fmap R1 (gParseSum keyVal)

instance (Constructor c, GFromJSON a) => GFromSum (C1 c a) where
    gParseSum (key, value)
        | key == pack (conName (undefined :: t c a p)) = gParseJSON value
        | otherwise = notFound $ unpack key

notFound :: String -> Parser a
notFound key = fail $ "The key \"" ++ key ++ "\" was not found"
------------------------------------------------------------------------

when gParseSum determines that the key equals the name of the
constructor, it is going to parse the value. However what happens when
the parsing of the value fails? Ideally it would immediately
terminate. However because of the <|> it will try all other branches.
This is unnecessary computation. Fortunately parsing those other
branches will immediately fail because non will have a constructor
that equals the key. So maybe I'm just worrying to much :-)

Regards,

Bas



More information about the Glasgow-haskell-users mailing list