[Haskell-cafe] representation on persistent store question

Antoine Latter aslatter at gmail.com
Fri Jan 2 01:24:54 EST 2009


On Jan 1, 2009 11:50pm, "Galchin, Vasili" <vigalchin at gmail.com> wrote:
> it is a bioinformatics standard .. . I am writing on this newsgroup in order to try to be objective to get a "correct" and elegant answer .. in any case I am helping on the bioinformatics code (you can see on Hackage). I am trying to finish the 2Bit file format code ... it seems to me that bioinformatics as an area is not clearly defined .... e.g. it is unclear to me whether "offset" is a marshalled/serialized concept or or unmarshalled/unserialized concept ..... this distinction is very important .... I will have to think about more myself!
>
>
> Regards, Vasili
>


Here's some code using Data.Binary to store data as offsets into a
byte array.  I haven't tested it too much, so it may have bugs.  Maybe
there's some inspiration in there.

-Antoine

>>>>
import Data.Binary
import Data.Binary.Get
import Data.Binary.Put

import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy as B

data TestStruct = TestStruct
    { property1 :: ByteString
    , property2 :: ByteString
    , property3 :: ByteString
    }
 deriving Show

{-

 The serialized format looks like (all big-endian):

 * first offset into data block (Word32)
 * second offset into data block (Word32)
 * third offset into data block (Word32)
 * length of bnary data block (Word32)
 * binary data block (Arbitrary binary data)

-}
instance Binary TestStruct where
    put struct =
        let data1 = property1 struct
            data2 = property2 struct
            data3 = property3 struct

            dataBlock = data1 `B.append` data2 `B.append` data3

            offset1 = 0
            offset2 = offset1 + B.length data1
            offset3 = offset2 + B.length data2

       in do
         putWord32be $ fromIntegral offset1
         putWord32be $ fromIntegral offset2
         putWord32be $ fromIntegral offset3

         putWord32be $ fromIntegral $ B.length dataBlock
         putLazyByteString dataBlock

    get = do
      offset1 <- getWord32be
      offset2 <- getWord32be
      offset3 <- getWord32be

      dataBlockLength <- getWord32be
      dataBlock <- B.drop (fromIntegral offset1) `fmap`
                   getLazyByteString (fromIntegral dataBlockLength)

      let (data1, rest1) =
              B.splitAt (fromIntegral $ offset2 - offset1) dataBlock
          (data2, rest2) =
              B.splitAt (fromIntegral $ offset3 - offset2 - offset1) rest1
          data3          = rest2

      return $ TestStruct data1 data2 data3
<<<<<


More information about the Haskell-Cafe mailing list