[Haskell-cafe] Data.Binary questions

Derek Elkins derek.a.elkins at gmail.com
Sun Jan 20 12:57:35 EST 2008


On Sun, 2008-01-20 at 18:18 +0000, Lauri Pesonen wrote:
> Hi,
> 
> I'm relatively new to Haskell so please bear with me. I'm trying to
> parse Java class files with Data.Binary and I'm having a few problems:
> 
> (The class file format is described here:
> http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html
> and the bytecode instructions are described here:
> http://java.sun.com/docs/books/jvms/second_edition/html/Instructions.doc.html
> )
> 
> 1. The class file format contains a number of tables. The table
> definitions start with the length of the list and carry on with that
> many table entries. Lists would be a good representation for them in
> Haskell, because  there is not need to index them directly (except
> with the constants table). I've created my own list type so that I can
> redefine the serialisation functions for it so that the serialisation
> matches the format defined in the class file format:
> 
> newtype MyList e = MkList ([e])
>         deriving Show
> 
> instance (Binary e) => Binary (MyList e) where
>          put (MkList es) = do
>                        put (fromIntegral (length es) :: Word16)
>                        mapM_ put es
> 
>          get = do
>              n <- get :: Get Word16
>              xs <- replicateM (fromIntegral n) get
>              return (MkList xs)
> 
> The problem is that one of the tables, namely the attribute_info
> structures, use a u32 length field whereas all the other tables use a
> u16 length field. My implementation uses u16, but it would be nice to
> be able to use the same data type for both types of tables. I think I
> can do it by adding a lenght field to MyList and specifying the type
> when I use MyList in some other data structure, but that would also
> mean that I have to keep track of the length of the list manually?

You may want to consider using the other side of Data.Binary rather than
the Binary class.  The -class- Binary is intended for de/serialization
when you don't care about the format.  From the documentation:

"For parsing and generating simple external binary formats (e.g. C
structures), Binary may be used, but in general is not suitable for
complex protocols. Instead use the Put and Get primitives directly."

Nevertheless, one way to solve your problem is with a phantom type.
Change MyList to,
newtype MyList t e = MkList [e] deriving Show

getLengthType :: MyList t e -> t
getLengthType = undefined

instance (Binary e) => Binary (MyList t e) where
         put l@(MkList es) = do
             put (fromIntegral (length es) `asTypeOf` getLengthType l)
             mapM_ put es

         get = do
             n <- get
             xs <- replicateM (fromIntegral (n `asTypeOf` getLengthType t)) get
             return (MkList xs `asTypeOf` t)
           where t = undefined

The asTypeOfs are just to propagate the type information around.  GHC's
extension for scoped type variables would make this code simpler and
more direct.  At any rate, now the code will use the Binary instance for
whatever type t is to serialize the length.

> 2. This is the bigger problem. The Java class file contains a
> constants table in the beginning of the file. The other fields later
> on in the class file contain indexes that reference entries in that
> constants table. So in order to be able to replace an index in a data
> structure with the actual string, I need to be able to look up the
> string from the constants table while I'm deserialising the field.
> 
> My guess is that I should be able to put the constants table into a
> state monad. On the other hand Data.Binary already uses the state
> monad for holding onto the binary data being deserialised. So it's not
> clear to me if I can use StateT with Data.Binary.Get? And if not, can
> I implement my own state monad and do it that way? I'm not very
> comfortable with Monads yet, so I might be missing something very
> obvious.

If you mean that you there references to the constant table in e.g. the
fields table then the problem here is that you need to the class methods
to use that monad transformer (in this case, ReaderT is all you should
need and not even that), but you can't change their type.  These are the
kind of issues that make the Binary class unsuitable for this type of
work.  If that is the case, the only way to use this is to explicitly
write out the deserialization code rather than relying on get, i.e.
you'll have to write a function 'getTable constantTable' that will
deserialize the table.




More information about the Haskell-Cafe mailing list