`binary` serializing datatypes deriving Generic wrong on arm (32bit) with GHC HEAD

Moritz Angermann moritz.angermann at gmail.com
Tue Nov 28 04:52:41 UTC 2017


Hi!

while trying to make sure cross compilation with Template Haskell works properly
with 8.4, I ran into the following situation:

When serializing data types, e.g. `Name OccName NameFlavour` in the transmission
of Template Haskell Splice results from a 32bit arm device to the x86_64 host ghc.

We expect to see:
```
  .- 0 (first constructor)         .- 1  .- 4  (fifth constructor)             .- 4                                 .- 3
  v                                v     v                                     v                                    v
\NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOHf \EOT \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTmain \NUL\NUL\NUL\NUL\NUL\NUL\NUL\ETXTmp
                                            '--' '----------------------------------' '---------------------------------'
                                            NameSpace       PkgName                               ModName
     '-------------------------------' '--------------------------------------------------------------------------------'
                  OccName                       NameG NameSpace PkgName ModName :: NameFlavour
'-----------------------------------------------------------------------------------------------------------------------'
                                Name OccName NameFlavour :: Name
```

However, the `NameSpace` on the 32bit arm ends up being 8 bytes. Even though the full
`Namespace` data type can be fully serialized in a single byte.

The `binary` package tries to compute the size it needs for a generic data type, using
the following logic (from binary/src/Data/Binary/Generic.hs):

```
class SumSize f where
  sumSize :: Tagged f Word64

newtype Tagged (s :: * -> *) b = Tagged {unTagged :: b}

instance (SumSize a, SumSize b) => SumSize (a :+: b) where

sumSize = Tagged $ unTagged (sumSize :: Tagged a Word64) +
                   unTagged (sumSize :: Tagged b Word64)

instance SumSize (C1 c a) where
  sumSize = Tagged 1
```


Thus for a simple sum type `data X = A | B` we should get a `sumSize` of 2.
The arm32 device however ends up getting 2^33, because `sumSize :: Tagged a Word64`
and `sumSize :: Tagged b Word64` each end up being 2^32.

With some help from the nice folks in #ghc, I was able to conjure up the following
condensed test case:

```
{-# LANGUAGE DeriveGeneric, KindSignatures, PolyKinds, CPP,
    ScopedTypeVariables, TypeOperators, TypeSynonymInstances,
    FlexibleInstances  #-}
{-# OPTIONS_GHC -O2 #-}

import GHC.Generics

import Data.Word
import Debug.Trace

data X = A | B deriving (Show, Generic)

main :: IO ()
main = print (sumSize :: Tagged (Rep X))

-- like traceShowId, but allows us to prepend a message.
t :: Show a => String -> a -> a
#if TRACE
t msg x = traceShow (msg ++ show x) x
#else
t _ = id
#endif

class SumSize f where
    sumSize :: Tagged f

newtype Tagged (s :: * -> *) = Tagged {unTagged :: WORD} deriving Show

instance (SumSize a, SumSize b) => SumSize (a :+: b) where
    sumSize = t "SumSize (a :+: b): " $ Tagged $ unTagged (t "a :+: b => sumSize :: Tagged a: " $ sumSize :: Tagged a) +
                                                 unTagged (t "a :+: b => sumSize :: Tagged b: " $ sumSize :: Tagged b)

instance SumSize (C1 c a) where
    sumSize = t "SumSize (C1 c a): " $ Tagged 1

instance SumSize a => SumSize (M1 D c a) where
    sumSize = t "SumSize (M1 D c a): " $ Tagged . unTagged $ (sumSize :: Tagged a)

```

compiling this with `-DWORD=Word32 -DTRACE=1` yields the correct result (=2), with `-DWORD=Word64 -DTRACE=0` as well.
With `-DWORD=Word64 -DTRACE=2` the wrong result (=2^33)

Optimization flags seem not to play any role when everything is in a single module (as the test case).

As such I have attached the `-ddump-simple -dsuppress-all` files for the Word64 and Word32 with TRACE=1, as well
as the diff between the Word32 and Word64 dump.

The output with WORD=Word32, TRACE=1 is:
```
"SumSize (C1 c a): Tagged {unTagged = 1}"
"a :+: b => sumSize :: Tagged a: Tagged {unTagged = 4294967296}"
"a :+: b => sumSize :: Tagged b: Tagged {unTagged = 4294967296}"
"SumSize (a :+: b): Tagged {unTagged = 8589934592}"
"SumSize (M1 D c a): Tagged {unTagged = 8589934592}"
Tagged {unTagged = 8589934592}
```

with WORD=Word64, TRACE=1 is:
```
"SumSize (C1 c a): Tagged {unTagged = 1}"
"a :+: b => sumSize :: Tagged a: Tagged {unTagged = 1}"
"a :+: b => sumSize :: Tagged b: Tagged {unTagged = 1}"
"SumSize (a :+: b): Tagged {unTagged = 2}"
"SumSize (M1 D c a): Tagged {unTagged = 2}"
Tagged {unTagged = 2}
```

Any help with this would be greatly appreciated!

Cheers,
 Moritz

PS: I'm not absolutely sure, but this might also be related to https://ghc.haskell.org/trac/ghc/ticket/13513

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Main.word32.dump-simpl
Type: application/octet-stream
Size: 18358 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20171128/5705743e/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Main.word64.dump-simpl
Type: application/octet-stream
Size: 23750 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20171128/5705743e/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Main.dump-simpl.diff
Type: application/octet-stream
Size: 22950 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20171128/5705743e/attachment-0005.obj>


More information about the ghc-devs mailing list