[Haskell-cafe] Pure serialisation and compression [Was: No Derived Read for Unboxed Arrays]
Donald Bruce Stewart
dons at cse.unsw.edu.au
Thu Jan 25 20:24:48 EST 2007
mattcbro:
> No doubt any kind of binary serialization would be a lot faster. In my
> case, however, I just wanted it to work out of the box. I need to read in
> about 5-10 arrays of only 1000 entries or so, saved in files. I suspect
> even the ascii parser could do that within a few seconds.
Faster, and trivial to write! Here's a complete example:
This little demo shows how to serialise unboxed arrays efficiently (a
binary stand-in for Read/Show) using Data.Binary, available from Hackage:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/binary/0.2
It uses lazy ByteStrings to wrap fast serialisation, in the style of
NewBinary, in a pure interface.
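As a quick sketch of that pure interface (a hypothetical GHCi session; any
type with a Binary instance will do): encode produces a lazy ByteString,
and decode inverts it.
ghci> let bs = encode [1,2,3 :: Int]
ghci> decode bs :: [Int]
[1,2,3]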
For example, to serialise a bunch of unboxed arrays to and from disk,
first import Data.Binary and friends:
> import Data.Binary
> import Data.Array.Unboxed
> import Codec.Compression.Zlib
> import qualified Data.ByteString.Lazy as B
> type A = UArray Int Int
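The binary package ships a Binary instance for UArray, which is why
encodeFile and decodeFile below work on A directly. Purely for intuition,
here is a rough, self-contained sketch of what such an instance does:
serialise the bounds and the element list, and rebuild the array with
listArray (the helper names putA and getA are hypothetical, not part of
the library):

    import Data.Binary
    import Data.Binary.Put (Put)
    import Data.Binary.Get (Get)
    import Data.Array.Unboxed

    -- write the bounds, then the elements
    putA :: UArray Int Int -> Put
    putA a = put (bounds a) >> put (elems a)

    -- read them back and rebuild the array
    getA :: Get (UArray Int Int)
    getA = do bs <- get
              es <- get
              return (listArray bs es)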
Create some unboxed arrays
> a1, a2, a3, a4, a5 :: A
> a1 = listArray (0,999) [0..]
> a2 = listArray (0,999) [1..]
> a3 = listArray (0,999) [2..]
> a4 = listArray (0,999) [3..]
> a5 = listArray (0,999) [4..]
Collect our arrays, and the files we'll write them to
> arrs :: [(FilePath,A)]
> arrs = [("a1",a1) ,("a1",a1) ,("a2",a2) ,("a3",a3) ,("a4",a4) ,("a5",a5)]
An action to write some unboxed arrays to disk
> writeArrays :: [(FilePath,A)] -> IO ()
> writeArrays as = mapM_ (uncurry encodeFile) as
An action to read them back from disk:
> readArrays :: [FilePath] -> IO [(FilePath,A)]
> readArrays fs = zip fs `fmap` mapM decodeFile fs
Write, then read, and check that the operation was the identity
> main' = do
> writeArrays arrs
> arrs' <- readArrays (map fst arrs)
> print . all (uncurry (==)) $ zip arrs arrs'
Running this:
$ ghc -O A.hs --make
[1 of 1] Compiling Main ( A.hs, A.o )
Linking A ...
$ time ./A
True
./A 0.01s user 0.02s system 107% cpu 0.022 total
Using zlib (also available from hackage), we can do some more fun
things. For example, compressing the arrays before we write them to
disk:
> writeGzipArrays :: [(FilePath,A)] -> IO ()
> writeGzipArrays as = mapM_ (uncurry gzEncodeFile) as
> where
> gzEncodeFile f a = B.writeFile f . compress . encode $ a
And read them back in
> readGzipArrays :: [FilePath] -> IO [(FilePath,A)]
> readGzipArrays fs = zip fs `fmap` mapM gzDecodeFile fs
> where
> gzDecodeFile f = return . decode . decompress =<< B.readFile f
Our main function then simply becomes:
> main = do
> writeGzipArrays arrs
> arrs' <- readGzipArrays (map fst arrs)
> print . all (uncurry (==)) $ zip arrs arrs'
Runs a bit slower:
$ time ./A
True
./A 0.03s user 0.01s system 104% cpu 0.037 total
But where the uncompressed files on disk took roughly one word (8 bytes) per element:
$ wc -c a1
8024 a1
the compressed versions are roughly four times smaller:
$ wc -c a1
1869 a1
Yay for pure compression and serialisation!
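In fact, since encode, decode, compress and decompress are all pure
functions over lazy ByteStrings, the whole round trip can be checked from
GHCi after loading A.hs, for example:
ghci> decode (decompress (compress (encode a1))) == a1
True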
Note that you can use this code *right now*, in Hugs or GHC.
Cheers,
Don