[Haskell-cafe] binary package: memory problem decoding an IntMap
Manlio Perillo
manlio_perillo at libero.it
Thu Apr 2 05:54:34 EDT 2009
Hi.
I'm having memory problems decoding a big IntMap.
The data structure is:
IntMap (UArr (Word16 :*: Word8))
There are 480189 keys, and a total of 100480507 elements
(Netflix Prize).
The size of the encoded (and compressed) data is 184 MB.
When I load data from the Netflix Prize data set, total memory usage is
1030 MB.
However when I try to decode the data, memory usage grows too much (even
using the -F1.1 option in the RTS).
The problem seems to be with the `fromAscList` function, which appears to be
implemented via `fromList`, defined as:
    fromList :: [(Key,a)] -> IntMap a
    fromList xs
      = foldlStrict ins empty xs
      where
        ins t (k,x) = insert k x t
(By the way, why doesn't the IntMap module use Data.List.foldl'?)
The `ins` function is not strict in the value it inserts.
This seems to be a hard problem to solve.
First of all, IntMap should provide strict variants of the implemented
functions.
And the binary package should let the user choose whether to use the strict
or the lazy version.
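
A minimal sketch of the kind of strict variant meant here, assuming that forcing each value to weak head normal form before insertion is enough. The name `fromListStrict` is hypothetical and is not part of Data.IntMap; for a payload such as an unboxed array, a deeper evaluation may or may not be needed.

```haskell
import qualified Data.IntMap as IM
import Data.List (foldl')

-- Hypothetical helper (not part of Data.IntMap): builds the map with a
-- strict left fold and forces each value to weak head normal form
-- before inserting it, so no unevaluated value is stored in the map.
fromListStrict :: [(IM.Key, a)] -> IM.IntMap a
fromListStrict = foldl' ins IM.empty
  where
    ins t (k, x) = x `seq` IM.insert k x t

main :: IO ()
main = print (IM.toList (fromListStrict [(2, "b"), (1, "a")]))
```

This only changes when the values are evaluated. It does not stop the whole input list from being held in memory if the decoder produces it all at once.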
For me, the simplest solution is to serialize the association list
obtained from the `toAscList` function, instead of serializing the
IntMap directly.
The question is: can I "reuse" the data already serialized?
Is the binary format of `IntMap a` and `[(Int, a)]` compatible?
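
One way to check this on a small sample is sketched below. It assumes the `Binary` instances shipped with the binary package for `IntMap a`, lists and pairs, and it uses `Int` as a simplified stand-in for the real payload type. The result may depend on the installed binary version, so it is a test to run rather than a guarantee.

```haskell
import Data.Binary (decode, encode)
import qualified Data.IntMap as IM

-- Small stand-in for the real data; the payload type is simplified to Int.
sample :: IM.IntMap Int
sample = IM.fromList [(1, 10), (2, 20), (3, 30)]

-- True if both encodings are byte-for-byte identical for this sample.
sameEncoding :: Bool
sameEncoding = encode sample == encode (IM.toAscList sample)

-- Decode the bytes written from the IntMap as an association list instead.
asList :: [(Int, Int)]
asList = decode (encode sample)

main :: IO ()
main = do
  print sameEncoding
  print (asList == IM.toAscList sample)
```

If both checks hold for the installed version, the data already serialized from the IntMap could be read back as `[(Int, a)]` without re-encoding it.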
Thanks,
Manlio Perillo