[Haskell-cafe] possible memory leak in uvector 0.1.0.3
Manlio Perillo
manlio_perillo at libero.it
Tue Mar 3 09:35:33 EST 2009
Claus Reinke wrote:
>>> At first guess it sounds like you're holding onto too much: if not
>>> the whole stream, then perhaps bits within each chunk.
>>
>> It is possible.
>>
>> I split the string into lines, then map some functions over each line
>> to parse the data, and finally call toU to convert to a UArr.
>
> Just to make sure (code fragments or, better, reduced examples
> would make it easier to see what the discussion is about): are you
> forcing the UArr to be constructed before putting it into the Map?
>
import qualified Data.ByteString.Char8 as S
import Data.IntMap (singleton, unionWith)
import Data.List (foldl1')
import Data.Array.Vector
import Data.Word (Word32, Word8)
import System.IO (Handle)

parse :: Handle -> IO MovieRatings
parse handle = do
    contents <- S.hGetContents handle
    let v = map singleton' $ ratings contents
    let m = foldl1' (unionWith appendU) v
    v `seq` return $! m
  where
    -- Build an IntMap holding a single movie rating
    singleton' :: (Word32, Word8) -> MovieRatings
    singleton' (id, rate) =
        singleton (fromIntegral id) (singletonU $ pairS (id, rate))
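Note that `v `seq` return $! m` evaluates the list and the IntMap only
to weak head normal form: the UArr values inside the map are not
forced, so suspended appendU/singletonU work could still accumulate
there. A minimal sketch of forcing every array in the map (my own
illustration, assuming `import qualified Data.IntMap as IM`):

-- Walk the map and demand each array's length, so that no
-- suspended array constructions survive in the values.
forceMap :: MovieRatings -> MovieRatings
forceMap m = IM.fold (\a r -> lengthU a `seq` r) () m `seq` m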
This function is called once for each of the 17,770 files, with
r <- mapM parse' [1..17770]
let movieRatings = foldl1' (unionWith appendU) r
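(`parse'` is not shown here; presumably it just opens the n-th ratings
file and runs `parse` on it. A hypothetical sketch, with a made-up file
naming scheme, assuming `import System.IO`:)

parse' :: Int -> IO MovieRatings
parse' n = withFile ("mv_" ++ show n ++ ".txt") ReadMode parse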
The `ratings` function parses each line of the file, returning one
(Word32, Word8) tuple per line. For each tuple I build a singleton
IntMap, and these get merged together; the resulting per-file IntMaps
are then further merged in the main function.
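`ratings` itself boils down to something like this (a simplified
sketch, assuming one "id,rate" pair per line and
`import Data.Maybe (mapMaybe)`):

ratings :: S.ByteString -> [(Word32, Word8)]
ratings = mapMaybe parseLine . S.lines
  where
    -- Parse "id,rate"; lines that fail to parse are dropped.
    parseLine l = do
        (i, rest) <- S.readInt l
        (r, _)    <- S.readInt (S.drop 1 rest)  -- skip the comma
        return (fromIntegral i, fromIntegral r)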
Note that the memory usage is the same even if I remove the array
concatenation. There are about 100,000,000 ratings in total, so I
create 100,000,000 arrays containing only one element each. However,
memory usage reaches 1 GB after only 800 of the files.
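One possible cause (a sketch of a fix, not tested): IntMap is lazy in
its values, so `unionWith appendU` can leave chains of suspended
appends in the map, each retaining both of its input arrays. Forcing
every merged array before it is stored avoids that:

-- Merge two rating maps, demanding each appended array so that
-- no suspended appendU thunks are retained as map values.
mergeStrict :: MovieRatings -> MovieRatings -> MovieRatings
mergeStrict = unionWith (\a b -> let c = appendU a b
                                 in lengthU c `seq` c)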
The data type is:
type Rating = Word32 :*: Word8
type MovieRatings = IntMap (UArr Rating) -- UArr from uvector
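For scale (assuming uvector stores the strict pair as two parallel
unboxed arrays, so 4 + 1 bytes per element):

-- Raw payload for all ratings, ignoring per-array overhead.
payloadBytes :: Integer
payloadBytes = 100000000 * (4 + 1)  -- 500,000,000 bytes, ~477 MiB

So even if all 100,000,000 ratings were resident, the raw data would be
about 500 MB; hitting 1 GB after only 800 of 17,770 files points at
retained thunks or copies rather than the data itself.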
Code is here: http://haskell.mperillo.ath.cx/netflix-0.0.1.tar.gz
but it is an old version (where I used lazy ByteString).
Thanks,
Manlio Perillo