[Haskell-cafe] Data.Binary poor read performance
Don Stewart
dons at galois.com
Tue Feb 24 18:17:49 EST 2009
jnf:
>
>
> wren ng thornton wrote:
> >
> > If you have many identical strings then you will save lots by memoizing
> > your strings into Integers, and then serializing that memo table and the
> > integerized version of your data structure. The amount of savings
> > decreases as the number of duplications decrease, though since you don't
> > need the memo table itself you should be able to serialize it in a way
> > that doesn't have much overhead.
> >
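A minimal sketch of the interning idea quoted above, assuming the strings sit in a flat list. It uses `Int` rather than `Integer` indices. The helper names `internStrings`, `uninternStrings`, `encodeInterned` and `decodeInterned` are hypothetical and are not part of the binary package. Each distinct string is stored once in a table, and every occurrence becomes an index into that table:

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.IntMap.Strict as IntMap
import qualified Data.Map.Strict as Map
import Data.Binary (decode, encode)
import Data.List (mapAccumL)

-- Hypothetical helper: replace every string by an index into a table of
-- distinct strings, kept in order of first occurrence.
internStrings :: [String] -> ([String], [Int])
internStrings xs = (reverse table, idxs)
  where
    ((_, _, table), idxs) = mapAccumL step (Map.empty, 0, []) xs
    step (seen, next, tbl) s = case Map.lookup s seen of
      Just i  -> ((seen, next, tbl), i)
      Nothing -> ((Map.insert s next seen, next + 1, s : tbl), next)

-- Hypothetical inverse: look each index up in the table. Repeated strings
-- now all refer to the same table entry in the heap.
uninternStrings :: ([String], [Int]) -> [String]
uninternStrings (table, idxs) = map (byIndex IntMap.!) idxs
  where
    byIndex = IntMap.fromList (zip [0 ..] table)

-- Serialize the memo table together with the integerized data.
encodeInterned :: [String] -> BL.ByteString
encodeInterned = encode . internStrings

decodeInterned :: BL.ByteString -> [String]
decodeInterned = uninternStrings . decode
```

For example, `internStrings ["foo", "bar", "foo"]` gives `(["foo", "bar"], [0, 1, 0])`. The savings grow with the number of duplicates, as the quoted message says.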
>
> I had problems with the size of the allocated heap space after serializing
> and loading data with the binary package. The reason was that binary does
> not support sharing of identical elements. I first considered a restricted
> solution for strings and certain other data types, but came up with a
> generic solution in the end (I did it just last weekend).
And this is exactly the intended path -- that people will release their
own special instances for doing more elaborate parsing/printing tricks!
> I put the Binary monad in a state transformer with maps for memoization:
> type PutShared = St.StateT (Map Object Int, Int) PutM ()
> type GetShared = St.StateT (IntMap Object) Bin.Get
>
> In addition to the standard get and put methods:
> class (Typeable a, Ord a, Eq a) => BinaryShared a where
>   put :: a -> PutShared
>   get :: GetShared a
> I added putShared and getShared methods with memoization:
> putShared :: (a -> PutShared) -> a -> PutShared
> getShared :: GetShared a -> GetShared a
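The post gives only the signatures of `putShared` and `getShared`. Below is one possible implementation, offered as a sketch. Several parts are assumptions, since the post does not define them: the `Object` wrapper, the one-byte tag encoding, transformers' strict `StateT` standing in for `St`, and the helper names `fromObject`, `runPutShared` and `runGetShared`. A value seen for the first time is written in full and given the next index. A repeated value is written as a back-reference to that index. The reader rebuilds the same table, so decoded duplicates point to one heap object:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, gets, modify')
import qualified Data.Binary as Bin
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (PutM, putWord8, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Typeable (Typeable, cast, typeOf)

-- Assumed definition: any Typeable, Ord value; ordered by type, then value.
data Object = forall a. (Typeable a, Ord a) => Object a

instance Eq Object where
  x == y = compare x y == EQ

instance Ord Object where
  compare (Object x) (Object y) = case compare (typeOf x) (typeOf y) of
    EQ -> maybe (error "Object: TypeRep mismatch") (compare x) (cast y)
    c  -> c

fromObject :: Typeable a => Object -> Maybe a
fromObject (Object x) = cast x

type PutShared = StateT (Map Object Int, Int) PutM ()
type GetShared = StateT (IntMap Object) Get

-- Tag 0: back-reference to an earlier value; tag 1: the value follows in full.
putShared :: (Typeable a, Ord a) => (a -> PutShared) -> a -> PutShared
putShared putter x = do
  seen <- gets fst
  case Map.lookup (Object x) seen of
    Just i  -> lift (putWord8 0 >> Bin.put i)
    Nothing -> do
      lift (putWord8 1)
      putter x
      -- Number the value after its components, mirroring getShared.
      modify' (\(m, n) -> (Map.insert (Object x) n m, n + 1))

getShared :: (Typeable a, Ord a) => GetShared a -> GetShared a
getShared getter = do
  tag <- lift getWord8
  if tag == 0
    then do
      i <- lift (Bin.get :: Get Int)
      found <- gets (IntMap.lookup i)
      maybe (lift (fail "getShared: bad back-reference")) return
            (found >>= fromObject)
    else do
      x <- getter
      -- The next index is the current table size, matching the writer's counter.
      modify' (\m -> IntMap.insert (IntMap.size m) (Object x) m)
      return x

runPutShared :: PutShared -> BL.ByteString
runPutShared m = runPut (evalStateT m (Map.empty, 0))

runGetShared :: GetShared a -> BL.ByteString -> a
runGetShared m = runGet (evalStateT m IntMap.empty)
```

The reader and writer stay in sync because both number a value only after all of its components have been numbered.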
>
> For types for which I don't want memoization, I can either lift the
> underlying binary monad for primitive types, e.g.:
> instance BinaryShared Int where
>   put = lift . Bin.put
>   get = lift Bin.get
> or stay in the BinaryShared monad for types whose components I may want
> to memoize, e.g.:
> instance (BinaryShared a, BinaryShared b) => BinaryShared (a, b) where
>   put (a, b) = put a >> put b
>   get = liftM2 (,) get get
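The next sketch lets you try the two non-memoizing instances on their own. It makes several assumptions. Transformers' strict `StateT` stands in for the post's `St` module. `Object` is only a placeholder, since these instances never consult the memo table. The runners `encodeShared` and `decodeShared` are hypothetical names:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Control.Monad (liftM2)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT)
import qualified Data.Binary as Bin
import Data.Binary.Get (Get, runGet)
import Data.Binary.Put (PutM, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Typeable (Typeable)

-- Assumed placeholder for the post's Object type; unused by these instances.
data Object = forall a. (Typeable a, Ord a) => Object a

type PutShared = StateT (Map Object Int, Int) PutM ()
type GetShared = StateT (IntMap Object) Get

class (Typeable a, Ord a) => BinaryShared a where
  put :: a -> PutShared
  get :: GetShared a

-- Primitive type: delegate straight to the plain Binary instance.
instance BinaryShared Int where
  put = lift . Bin.put
  get = lift Bin.get

-- Compound type: stay in the shared monad so components could be memoized.
instance (BinaryShared a, BinaryShared b) => BinaryShared (a, b) where
  put (a, b) = put a >> put b
  get = liftM2 (,) get get

encodeShared :: BinaryShared a => a -> BL.ByteString
encodeShared x = runPut (evalStateT (put x) (Map.empty, 0))

decodeShared :: BinaryShared a => BL.ByteString -> a
decodeShared = runGet (evalStateT get IntMap.empty)
```

Neither instance memoizes, so `encodeShared` produces exactly the bytes that plain `Data.Binary.encode` would produce for the same value.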
>
> And for types for which I want memoization, I wrap the methods with
> putShared and getShared, e.g.:
> instance BinaryShared a => BinaryShared [a] where
>   put = putShared (\l -> lift (Bin.put (length l)) >> mapM_ put l)
>   get = getShared (do
>     n <- lift (Bin.get :: Bin.Get Int)
>     replicateM n get)
>
> This saved one third of the heap space in my application. I didn't measure
> the running time. Maybe it would be useful to have something like this in
> the binary package.
>
Very nice. Maybe even upload these useful instances in a little
binary-extras package?