[Haskell-cafe] possible memory leak in uvector 0.1.0.3

Claus Reinke claus.reinke at talk21.com
Tue Mar 3 18:29:17 EST 2009


>> That helps to make things clearer, I think. One issue is
>> the nature of Maps (strict in keys, non-strict in values).
>> 
>> - neither singleton nor unionWith are strict in the Map values, so
>>    nothing here forces the evaluation of rate or construction    of UArr
>> 
> But, as I have written, in one of my tests I also tried rnf to force 
> evaluation:
> rnf v `seq` rnf m `seq` return m
> Isn't this sufficient?

It will force the Map that results from the repeated unioning,
but does not ensure that this is done in an efficient way.

>> A standard trick to keep Map values evaluated by construction
>> is to make the availability of keys dependent on their values, eg
>> (singleton key) $! value. That won't help with unionWith and
>> the appendUs, but it should allow the source string references
>> to be dropped early, as the singletons are constructed.
> 
> Tried; but, even using union instead of unionWith, the memory 
> grows fast as before.

Strange. I built myself a small wrapper to make your code
fragment compilable, and just replacing (unionWith appendU)
with (union) makes a drastic difference - as it should.

It is rather annoying that Data.IntMap doesn't provide a strict
form of unionWith or insertWith (Data.Map does at least provide
insertWith'). But we can define our own, at the cost of an extra
lookup. We can then foldl' that insertWith' directly over the ratings 
list, bypassing the non-strict parts of the Data.IntMap API (see
code below).

Claus 
   (who still thinks that all Maps should be parameterized
    over their key-value pair type constructor, so that the default
    non-strict Maps would result from using non-strict pairs

    type IntMap = IntMapP (,)

    while the often desired element-strict Maps would result 
    from using strict pairs, with no other change in API

    type IntMapStrict = IntMapP (:*:)
    )

-------------------------------------------------------
{-# LANGUAGE TypeOperators #-}
import qualified Data.ByteString.Lazy as L
import Data.Array.Vector
import qualified Data.IntMap as IM
import Data.List
import Data.Word
import qualified Data.ByteString.Lazy.Char8 as L8
import Data.Maybe
import System.IO

-- don't use this for real, test wrapper only
ratings :: L.ByteString -> [(Word32,Word8)]
ratings = map (\[i,r]->(fromIntegral $ fst $ fromJust $ L8.readInt i
                       ,fromIntegral $ fst $ fromJust $ L8.readInt r))
        . map L8.words . L8.lines

parse handle = do
  contents <- L.hGetContents handle
  let v =  map singleton' $ ratings contents
  let m = foldl' (\m (kw,v)->insertWith' appendU (fromIntegral kw,v) m) IM.empty v
  -- let m = foldl1' (IM.unionWith appendU) v
  -- let m = foldl1' (IM.union) v
  return $! m

  where
    -- Build a Map with a single movie rating
    singleton' :: (Word32, Word8) -> (Int,UArr Rating)
    singleton' (id, rate) =
      ((fromIntegral $ id), (singletonU $ pairS (id, rate)))
      -- (IM.singleton (fromIntegral $ id)) $ (singletonU $ pairS (id, rate))

insertWith' op (k,v) m =
  maybe (IM.insert k v m)
        (\old->((IM.insert k) $! (v `op` old)) m)
        (IM.lookup k m)

type Rating = Word32 :*: Word8
type MovieRatings = IM.IntMap (UArr Rating) -- UArr from uvector

-- more test wrapper, some trivial input data
generate = withFile "in.data" WriteMode $ \h->
            mapM_ (\(i,r)->hPutStrLn h $ show i++" "++show r) 
                $ take 1000000 $ cycle [(i,i)|i<-[0..100]]

main = withFile "in.data" ReadMode parse >>= print



More information about the Haskell-Cafe mailing list