[Haskell-cafe] possible memory leak in uvector 0.1.0.3
Manlio Perillo
manlio_perillo at libero.it
Tue Mar 3 09:35:33 EST 2009
Claus Reinke wrote:
>>> At first guess it sounds like you're holding onto too much: if not
>>> the whole stream, then perhaps bits within each chunk.
>>
>> It is possible.
>>
>> I split the string into lines, then map some functions over each line
>> to parse the data, and finally call toU to convert to a UArr.
>
> Just to make sure (code fragments or, better, reduced examples
> would make it easier to see what the discussion is about): are you
> forcing the UArr to be constructed before putting it into the Map?
>
import qualified Data.ByteString.Char8 as S
import Data.IntMap (singleton, unionWith)
import Data.List (foldl1')
import Data.Array.Vector
import Data.Word (Word32, Word8)
import System.IO (Handle)

parse :: Handle -> IO MovieRatings
parse handle = do
    contents <- S.hGetContents handle
    let v = map singleton' $ ratings contents
    let m = foldl1' (unionWith appendU) v
    v `seq` return $! m
  where
    -- Build an IntMap holding a single movie rating
    singleton' :: (Word32, Word8) -> MovieRatings
    singleton' (id, rate) =
        singleton (fromIntegral id) (singletonU $ pairS (id, rate))
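Note that `v `seq` return $! m` evaluates the list and the IntMap only
to weak head normal form: the UArr values inside the map are not
forced, so suspended appendU/singletonU work could still accumulate
there. A minimal sketch of forcing every array in the map (my own
illustration, assuming `import qualified Data.IntMap as IM`):

-- Walk the map and demand each array's length, so that no
-- suspended array constructions survive in the values.
forceMap :: MovieRatings -> MovieRatings
forceMap m = IM.fold (\a r -> lengthU a `seq` r) () m `seq` m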
This function is called once for each of the 17,770 files, with
r <- mapM parse' [1..17770]
let movieRatings = foldl1' (unionWith appendU) r
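(`parse'` is not shown here; presumably it just opens the n-th ratings
file and runs `parse` on it. A hypothetical sketch, with a made-up file
naming scheme, assuming `import System.IO`:)

parse' :: Int -> IO MovieRatings
parse' n = withFile ("mv_" ++ show n ++ ".txt") ReadMode parse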
The `ratings` function parses each line of the file, returning one
(Word32, Word8) tuple per line. For each tuple I build a singleton
IntMap, and these get merged together; the resulting per-file IntMaps
are then further merged in the main function.
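`ratings` itself boils down to something like this (a simplified
sketch, assuming one "id,rate" pair per line and
`import Data.Maybe (mapMaybe)`):

ratings :: S.ByteString -> [(Word32, Word8)]
ratings = mapMaybe parseLine . S.lines
  where
    -- Parse "id,rate"; lines that fail to parse are dropped.
    parseLine l = do
        (i, rest) <- S.readInt l
        (r, _)    <- S.readInt (S.drop 1 rest)  -- skip the comma
        return (fromIntegral i, fromIntegral r)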
Note that the memory usage is the same even if I remove the array
concatenation. There are about 100,000,000 ratings in total, so I
create 100,000,000 arrays containing only one element each. However,
memory usage reaches 1 GB after only 800 of the files.
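One possible cause (a sketch of a fix, not tested): IntMap is lazy in
its values, so `unionWith appendU` can leave chains of suspended
appends in the map, each retaining both of its input arrays. Forcing
every merged array before it is stored avoids that:

-- Merge two rating maps, demanding each appended array so that
-- no suspended appendU thunks are retained as map values.
mergeStrict :: MovieRatings -> MovieRatings -> MovieRatings
mergeStrict = unionWith (\a b -> let c = appendU a b
                                 in lengthU c `seq` c)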
The data type is:
type Rating = Word32 :*: Word8
type MovieRatings = IntMap (UArr Rating) -- UArr from uvector
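For scale (assuming uvector stores the strict pair as two parallel
unboxed arrays, so 4 + 1 bytes per element):

-- Raw payload for all ratings, ignoring per-array overhead.
payloadBytes :: Integer
payloadBytes = 100000000 * (4 + 1)  -- 500,000,000 bytes, ~477 MiB

So even if all 100,000,000 ratings were resident, the raw data would be
about 500 MB; hitting 1 GB after only 800 of 17,770 files points at
retained thunks or copies rather than the data itself.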
Code is here: http://haskell.mperillo.ath.cx/netflix-0.0.1.tar.gz
but it is an old version (where I used lazy ByteString).
Thanks,
Manlio Perillo