[Haskell-cafe] Pure serialisation and compression [Was: No Derived Read for Unboxed Arrays]

Donald Bruce Stewart dons at cse.unsw.edu.au
Thu Jan 25 20:24:48 EST 2007


mattcbro:
> No doubt any kind of binary serialization would be a lot faster.  In my
> case, however,  I just wanted it to work out of the box.  I need to read in
> about 5-10 arrays of only 1000 entries or so, saved in files.  I suspect
> even the ascii parser could do that within a few seconds.

Faster, and trivial to write! Here's a complete example:

This little demo shows how to serialise unboxed arrays efficiently (a fast
alternative to derived read/show), using Data.Binary, now available from
Hackage:

    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2

It uses lazy ByteStrings to wrap fast serialisation, in the style of
NewBinary, in a pure interface.
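
For reference, the two key functions are pure, so a round trip never has
to touch IO. Their types (from Data.Binary), plus an illustrative GHCi
session:

    encode :: Binary a => a -> ByteString   -- value to lazy ByteString
    decode :: Binary a => ByteString -> a   -- lazy ByteString back to a value

    Prelude Data.Binary> decode (encode [1,2,3 :: Int]) :: [Int]
    [1,2,3]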

For example, to serialise a bunch of unboxed arrays to and from disk:

First, import Data.Binary and the other modules we'll need:

> import Data.Binary
> import Data.Array.Unboxed
> import Codec.Compression.Zlib
> import qualified Data.ByteString.Lazy as B

> type A = UArray Int Int

Create some unboxed arrays

> a1, a2, a3, a4, a5 :: A
> a1 = listArray (0,999) [0..]
> a2 = listArray (0,999) [1..]
> a3 = listArray (0,999) [2..]
> a4 = listArray (0,999) [3..]
> a5 = listArray (0,999) [4..]


Collect our arrays, and the files we'll write them to

> arrs :: [(FilePath,A)]
> arrs = [("a1",a1) ,("a2",a2) ,("a3",a3) ,("a4",a4) ,("a5",a5)]


An action to write some unboxed arrays to disk

> writeArrays :: [(FilePath,A)] -> IO ()
> writeArrays as = mapM_ (uncurry encodeFile) as


An action to read them back from disk:

> readArrays :: [FilePath] -> IO [(FilePath,A)]
> readArrays fs = zip fs `fmap` mapM decodeFile fs
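
Both of these lean on the whole-file helpers that Data.Binary provides;
for reference, their types are:

    encodeFile :: Binary a => FilePath -> a -> IO ()
    decodeFile :: Binary a => FilePath -> IO a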


Write, then read, and check that the round trip was the identity (we call
this main', since the final main below will use the compressed variant):

> main' = do
>     writeArrays arrs
>     arrs' <- readArrays (map fst arrs)
>     print . all (uncurry (==)) $ zip arrs arrs'


Running this:

    $ ghc -O A.hs --make
    [1 of 1] Compiling Main             ( A.hs, A.o )
    Linking A ...

    $ time ./A
    True
    ./A  0.01s user 0.02s system 107% cpu 0.022 total


Using zlib (also available from Hackage), we can do some more fun
things. For example, compressing the arrays before we write them to
disk:

> writeGzipArrays :: [(FilePath,A)] -> IO ()
> writeGzipArrays as = mapM_ (uncurry gzEncodeFile) as
>   where
>       gzEncodeFile f a = B.writeFile f . compress . encode $ a

And read them back in

> readGzipArrays :: [FilePath] -> IO [(FilePath,A)]
> readGzipArrays fs = zip fs `fmap` mapM gzDecodeFile fs
>   where
>       gzDecodeFile f = fmap (decode . decompress) (B.readFile f)
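
One caveat on the naming: Codec.Compression.Zlib writes raw zlib streams,
so the files above aren't directly readable by the command-line gzip
tools. If you want genuine .gz files, the same zlib package also provides
Codec.Compression.GZip with the identical pure interface. A sketch of the
swap (the function bodies stay the same, only the import changes):

    import Codec.Compression.GZip (compress, decompress)

    gzEncodeFile f a = B.writeFile f . compress . encode $ a
    gzDecodeFile f   = fmap (decode . decompress) (B.readFile f)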

Our main function then simply becomes:

> main = do
>     writeGzipArrays arrs
>     arrs' <- readGzipArrays (map fst arrs)
>     print . all (uncurry (==)) $ zip arrs arrs'

Runs a bit slower:

    $ time ./A
    True
    ./A  0.03s user 0.01s system 104% cpu 0.037 total

But look at the resulting files on disk. The uncompressed run stores each
entry as a full word, giving:

    $ wc -c a1
    8024 a1

while the compressed files are roughly four times smaller:

    $ wc -c a1
    1869 a1
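
If you're happy to spend a little more time for smaller files, newer
releases of the zlib package also export compressWith, which takes
explicit compression parameters. A sketch (gzEncodeFile' is just an
illustrative name, and defaultCompressParams / bestCompression may
postdate the zlib version used above):

    gzEncodeFile' f a = B.writeFile f (compressWith params (encode a))
      where params = defaultCompressParams { compressLevel = bestCompression }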

Yay for pure compression and serialisation!
Note that you can use this code *right now*, in Hugs or GHC.

Cheers,
   Don

