[Haskell-cafe] Data.Binary poor read performance
jutaro
jnf at arcor.de
Tue Feb 24 17:42:26 EST 2009
wren ng thornton wrote:
>
> If you have many identical strings then you will save lots by memoizing
> your strings into Integers, and then serializing that memo table and the
> integerized version of your data structure. The amount of savings
> decreases as the number of duplications decrease, though since you don't
> need the memo table itself you should be able to serialize it in a way
> that doesn't have much overhead.
>
I had problems with the size of the allocated heap space after serializing
and loading data with the binary package. The reason was that
binary does not support sharing of identical elements and considered a
restricted solution for strings and certain other data types first, but
came up with a generic solution in the end.
(I did it just last weekend).
I put the Binary monad in a state transformer with maps for memoization:
type PutShared = St.StateT (Map Object Int, Int) PutM ()
type GetShared = St.StateT (IntMap Object) Bin.Get
In addition to standard get ant put methods:
class (Typeable α, Ord α, Eq α) ⇒ BinaryShared α where
put :: α → PutShared
get :: GetShared α
I added putShared and getShared methods with memoization:
putShared :: (α → PutShared) → α → PutShared
getShared :: GetShared α → GetShared α
For types that I don't want memoization I can either refer to the underlying
binary monad for primitive types, e.g.:
instance BinaryShared Int where
put = lift∘Bin.put
get = lift Bin.get
or stay in the BinaryShared monad for types of which I may memoize
components, e.g.:
instance (BinaryShared a, BinaryShared b) ⇒ BinaryShared (a,b) where
put (a,b) = put a ≫ put b
get = liftM2 (,) get get
And for types for which I want memoization, I wrap it with putShared and
getShared ,e.g:
instance BinaryShared a ⇒ BinaryShared [a] where
put = putShared (λl → lift (Bin.put (length l)) ≫ mapM_ put l)
get = getShared (do
n ← lift (Bin.get :: Bin.Get Int)
replicateM n get)
This save 1/3 of heap space to my application. I didn't measure time.
Maybe it would be useful to have something like this in the binary module.
Jürgen
PS: And here is the dirty implementation, in the case someone finds it
useful:
putShared :: (α → PutShared) → α → PutShared
putShared fput v = do
(dict, unique) ← St.get
case (ObjC v) `Map.lookup` dict of
Just i → lift (Bin.putWord8 0 ≫ putWord64be (fromIntegral i))
Nothing → do
St.put (dict,unique + 1)
lift (Bin.putWord8 1)
lift (putWord64be (fromIntegral unique))
fput v
(dict2, unique2) ← St.get
let newDict = Map.insert (ObjC v) unique dict2
St.put (newDict,unique2)
getShared :: GetShared α → GetShared α
getShared f = do
dict ← St.get
w ← lift Bin.getWord8
case w of
0 → do
i ← lift (liftM fromIntegral (getWord64be))
case IMap.lookup i dict of
Just (ObjC obj) → return (forceJust (cast obj)
"Shared≫getShared: Cast failed")
Nothing → error ◊ "Shared≫getShared : Dont find in Map
" ⊕ show i
1 → do
i ← lift (liftM fromIntegral (getWord64be))
obj ← f
dict2 ← St.get
St.put (IMap.insert i (ObjC obj) dict2)
return obj
_ → error ◊ "Shared≫getShared : Encoding error"
data Object = ∀ α. (Typeable α, Ord α, Eq α) ⇒ ObjC {unObj :: α}
instance Eq Object where
(ObjC a) ≡ (ObjC b) = if typeOf a ≠ typeOf b
then False
else (Just a) ≡ cast b -- can someone
explain to me why this works?
instance Ord Object where
compare (ObjC a) (ObjC b) = if typeOf a ≠ typeOf b
then compare
((unsafePerformIO∘typeRepKey∘typeOf) a)
((unsafePerformIO∘typeRepKey∘typeOf) b)
else compare (Just a) (cast b)
encodeSer :: BinaryShared a ⇒ a → L.ByteString
encodeSer v = runPut (evalStateT (put v) (Map.empty,0))
decodeSer :: BinaryShared α ⇒ L.ByteString → α
decodeSer = runGet (evalStateT get IMap.empty)
--
View this message in context: http://www.nabble.com/Data.Binary-poor-read-performance-tp22167466p22192337.html
Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.
More information about the Haskell-Cafe
mailing list