[Haskell-cafe] uvector package appendU: memory leak?
manlio_perillo at libero.it
Tue Mar 31 08:04:11 EDT 2009
Claus Reinke wrote:
>>> appendU is strict, insertWith just doesn't force it (follow the
>>> source link
>>> in the haddocks to see why).
>> Ok, I see.
>> But, IMHO, this should be clearly documented.
> There seems to be some agreement that strict variant operations should
> also be provided, but it needs some more thinking, and whenever I look
> at the code, I have the feeling that there are just too many operations
> defined in it, so I look away again :-(
Well, it does not matter much whether strict variant operations are
provided or not. But at the very least the documentation should state
whether each operation is strict or lazy.
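For instance, the behaviour under discussion is easy to reproduce with
Data.IntMap alone (a minimal sketch of my own, not from the thread): the
lazy insertWith never forces the combined value, so repeated inserts at
one key accumulate a chain of (+) thunks instead of a number.

```haskell
import qualified Data.IntMap as IM
import Data.List (foldl')

-- Lazy insertWith: foldl' forces the map's spine, but the value stored
-- at key 0 remains a chain of unevaluated (+) thunks until something
-- finally demands it -- exactly the kind of leak discussed above.
pileUp :: IM.IntMap Int
pileUp = foldl' (\m x -> IM.insertWith (+) 0 x m) IM.empty [1 .. 10000]
```

Only when the value is demanded (e.g. by `IM.lookup 0 pileUp`) is the
whole chain evaluated at once.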
>> However, reading the code now, I prefer my version using alter.
> Yes, it is a more direct way to enforce strict evaluation. Btw, if your
> update function 'f' is strict in 'old' and 'new', you might just want to
> use 'Just $! f old new' and save all those 'seq's.
And this seems to be a bit more memory friendly, too.
>> By the way, about insertWith/alter; from IntMap documentation:
>> insertWithKey: O(min(n,W))
>> alter: O(log n)
>> So, alter is more efficient than insertWithKey?
>> And what is that `W` ?
> 'W' is a maximum bound (see comment at top of that haddock page).
At the top of that page there is only a link to the paper where the
algorithm is defined.
> 'alter' has some shortcuts if the update function doesn't actually do
> anything, but for reimplementing 'insertWithKey' the complexity should
> be the same, as the code would be the same.
>>> But piling up appendUs is still not a good idea. For a moment,
>>> I thought that the stream representation's (+++) was handling
>>> runtime fusion gracefully, but a simple test case suggests otherwise
>>> at least for the simpler case of consU (the attached appendU.hs
>>> doesn't do any appendUs, as the consU case already demonstrates
>>> the issue; be careful with large numbers here, it'll quickly eat your
>> I'm not sure I fully understand the code.
> [ Ok ]
>> But, again, IMHO it does not apply to my original problem.
> It is a question of resource constraints, probably. Instead of maintaining
> an 'IntMap (UArr Int)' all the way through buildup, you could use an
> 'IntMap [Int]' while parsing, and then compress into 'IntMap (UArr Int)'
> when parsing is finished.
No, as I have written, this is not possible for my problem,
because it would eat all available memory within a few minutes.
The data I'm parsing, from the Netflix Prize data set, contains:
480189 customers (with customer ids ranging from 1 to 2649429)
100480507 total ratings
The ratings are grouped by movie, but I want to group them by customer.
This means that each customer has only a few ratings (about 200 on
average), so there are a lot of small arrays.
Since the ratings for each customer are parsed "at the same time", using
plain lists would consume a lot of memory: stream fusion can only take
effect at the end of the parsing.
On the other hand, when I want to group ratings by movie, stream fusion
seems to work fine.
This is IMHO, of course.
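For concreteness, the two-phase approach Claus describes would look
roughly like this (a sketch: UArray from the array package stands in for
uvector's UArr, where the compression step would instead be IM.map toU):

```haskell
import qualified Data.IntMap as IM
import Data.Array.Unboxed (UArray, listArray, elems)

-- Phase 1: while parsing, just cons each rating onto its customer's list.
addRating :: Int -> Int -> IM.IntMap [Int] -> IM.IntMap [Int]
addRating customer rating = IM.insertWith (++) customer [rating]

-- Phase 2: once parsing is done, compress each list into an unboxed array
-- in a single pass over the map.
compress :: IM.IntMap [Int] -> IM.IntMap (UArray Int Int)
compress = IM.map (\xs -> listArray (0, length xs - 1) xs)
```

The point of contention in this thread is the peak memory use of phase 1,
not the correctness of the scheme.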
> If you can afford the larger memory profile
> during parsing, this should give better overall performance, with the
> same memory need for the final state, but larger memory needs up
> to that state. I'd be interested in the differences, if you try it.
Unfortunately, I cannot afford this memory profile: my PC has only
2 GB of RAM, and I doubt many others could afford the required memory
either.
Using UArr, the total required memory is about 900 MB.
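A back-of-envelope count supports these numbers (my arithmetic, not from
the thread, assuming a 64-bit GHC where a cons cell is three words and a
boxed Int two words):

```haskell
totalRatings :: Int
totalRatings = 100480507

-- Per-element cost of [Int]: a (:) cell (header + two pointers) plus an
-- I# box (header + payload) = 5 words = 40 bytes on a 64-bit machine.
bytesPerListElem :: Int
bytesPerListElem = 5 * 8

-- Per-element cost of an unboxed array: just the 8-byte payload.
bytesPerUArrElem :: Int
bytesPerUArrElem = 8

listBytes, uarrBytes :: Int
listBytes = totalRatings * bytesPerListElem   -- ~4.0 GB: far beyond 2 GB of RAM
uarrBytes = totalRatings * bytesPerUArrElem   -- ~0.8 GB: close to the 900 MB observed
```

The gap between ~800 MB and the observed 900 MB is plausibly the IntMap
spine and per-array headers.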