[Haskell-cafe] binary package: memory problem decoding an IntMap

Manlio Perillo manlio_perillo at libero.it
Thu Apr 2 05:54:34 EDT 2009


Hi.

I'm having memory problems decoding a big IntMap.

The data structure is:

IntMap (UArr (Word16 :*: Word8))


There are 480189 keys, and a total of 100480507 elements
(Netflix Prize).
The size of the encoded (and compressed) data is 184 MB.

When I load the data from the Netflix Prize data set, total memory usage is
1030 MB.

However, when I try to decode the data, memory usage grows too much (even 
when using the -F1.1 option in the RTS).


The problem seems to be with the `fromAscList` function, which is defined in
terms of `fromList`:

fromList :: [(Key,a)] -> IntMap a
fromList xs
   = foldlStrict ins empty xs
   where
     ins t (k,x)  = insert k x t

(By the way, why does the IntMap module not use Data.List.foldl'?)

The `ins` function is not strict in the inserted value `x`.
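
To illustrate what a stricter variant could look like, here is a minimal
sketch built only on the public Data.IntMap API. The name `fromListStrict`
is hypothetical (it is not part of the containers package), and `seq` only
forces each value to weak head normal form, which is assumed to be enough
for an unboxed array.

```haskell
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import Data.List (foldl')

-- Hypothetical helper, not part of containers: like IntMap.fromList,
-- but every value is forced to WHNF before it is inserted, so the map
-- does not hold on to unevaluated thunks.
fromListStrict :: [(Int, a)] -> IntMap a
fromListStrict = foldl' ins IntMap.empty
  where
    -- `seq` evaluates x first; foldl' then forces the new map.
    ins t (k, x) = x `seq` IntMap.insert k x t
```

With this, forcing `fromListStrict [(1, undefined)]` fails immediately,
whereas the lazy `fromList` would store the thunk in the map.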



This seems a hard problem to solve.
First of all, IntMap should provide strict variants of the functions it 
implements.
And the binary package should choose whether to use the strict or lazy
version.


For me, the simplest solution is to serialize the association list 
obtained from the `toAscList` function, instead of serializing the 
IntMap directly.
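
A sketch of that workaround, assuming the `binary` and `containers`
packages: the map is written as its ascending association list and rebuilt
with a strict fold when reading it back. `encodeAssocs`, `decodeAssocs` and
`fromAscListStrict` are hypothetical names, and whether this actually lowers
peak memory depends on how `binary` decodes the list, which is not measured
here.

```haskell
import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import Data.List (foldl')

-- Hypothetical helper: rebuild the map, forcing each value to WHNF
-- before inserting it.
fromAscListStrict :: [(Int, a)] -> IntMap a
fromAscListStrict = foldl' ins IntMap.empty
  where
    ins t (k, x) = x `seq` IntMap.insert k x t

-- Serialize the association list rather than the IntMap itself.
encodeAssocs :: Binary a => IntMap a -> BL.ByteString
encodeAssocs = encode . IntMap.toAscList

-- Decode the association list and rebuild the map strictly.
decodeAssocs :: Binary a => BL.ByteString -> IntMap a
decodeAssocs = fromAscListStrict . decode
```

For the data structure above, `a` would be `UArr (Word16 :*: Word8)`,
provided a `Binary` instance for it is in scope.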

The question is: can I "reuse" the data already serialized?
Is the binary format of `IntMap a` and `[(Int, a)]` compatible?
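
One way to check this for the installed version of `binary` is to compare
the two encodings on a small sample. As far as I know, the IntMap instance
writes the size followed by the `toAscList` pairs, which is the same layout
as a list, but that is an assumption about the instance and worth verifying
rather than relying on. The names `sample` and `sameEncoding` are only for
illustration.

```haskell
import Data.Binary (encode)
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import Data.Word (Word8)

-- A small stand-in for the real data; any element type with a
-- Binary instance would do.
sample :: IntMap [Word8]
sample = IntMap.fromList [(3, [1, 2]), (7, []), (42, [9])]

-- True when the IntMap and its ascending association list
-- serialize to exactly the same bytes.
sameEncoding :: Bool
sameEncoding = encode sample == encode (IntMap.toAscList sample)
```

If `sameEncoding` evaluates to `True`, the already serialized data can
presumably be decoded as `[(Int, a)]` without re-encoding it.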



Thanks  Manlio Perillo

