Vector memory usage
rl at cse.unsw.edu.au
Fri Feb 24 09:10:29 CET 2012
See http://trac.haskell.org/vector/ticket/78. I'll fix this, but for now you can you fst (span f) instead of dropWhile f, span doesn't copy. Sorry about this.
On 24/02/2012, at 05:46, Ben Gamari wrote:
> Recently I stumbled upon a rather non-obvious behavior in what seems
> like a fairly common use a vector. It appears that, despite the claim
> of the documentation, dropWhile can result in copies, resulting in a 1MB
> vector ballooning to several gigabytes within a few seconds. This has
> been reproduced with ghc 7.4.1 and 7.0.3 with -O and -prof.
> My knowledge of the internals of vector is quite limited, so I'll leave
> potential explanations to those with better understanding; below I've
> attached some relevant discussion from #haskell. Let me know if there's
> anything at all I could do to help.
> - Ben
>  http://hackage.haskell.org/trac/ghc/ticket/5893
> Test case:
> import qualified Data.Vector.Unboxed as V
> type Time = Int
> spansPhotons :: V.Vector Time -> [(Time,Time)] -> [V.Vector Time]
> spansPhotons ts spans = map (\(start,_)->V.takeWhile (<start) ts) spans
> main = do
> let times = V.generate 1000000 (*100)
> spans = [(10000*i, 10000*i+5000) | i <- [1..10000]]
> let bursts = spansPhotons times spans
> print $ sum $ map V.length bursts
> print $ map (const 1) $ bursts
> 00:16 < rwbarton> there the problem is that the bursts aren't sharing the same memory
> 00:17 < rwbarton> like they would if you wrote a similar program with ByteString
> 00:17 < rwbarton> the thing is that the Vector implementation of dropWhile is good for the stream fusion situation where you have a pipeline of operations
> 00:17 < quintessence> Yeah, I don't think dropWhile can participate in stream fusion and not copy
> 00:18 < rwbarton> but not good when you want to dropWhile many different times on the same input
> 00:18 < rwbarton> right
> 00:18 < quintessence> because when you have dropWhile p . map f the thing you want to share with doesn't exist
> 00:18 < rwbarton> so I don't know what the resolution would be
> 00:18 < bgamari> ouch
> 00:18 < rwbarton> besides giving some more manual control over stream fusion perhaps
> 00:19 < bgamari> Yep
> 00:20 < bgamari> That is a tricky one
> 00:20 < dmwit> dropWhileFusable + dropWhileCopy
> 00:20 < dmwit> err...
> 00:21 < dmwit> dropWhileFusable + dropWhileDoesn'tCopy, I guess
> 00:21 < quintessence> Could you implement dropWhile the non-fusing way and then have a RULE for dropWhile p . unstream = unstream . S.dropWhile p?
> 00:21 < rwbarton> at the least you could implement dropWhile' the non-fusing way; maybe that already exists somewhere in Data.Vector
> 00:21 < tibbe> Isn't there a function to explicitly copy? Like ByteString's clone?
> 00:21 < rwbarton> there is
> 00:21 < rwbarton> but the goal here is to not copy
> 00:22 < tibbe> I'm joining this discussion late I'm afraid :)
> 00:22 < tibbe> it should be possible to write rules that only fire when fusion doesn't happen
> 00:22 < rwbarton> the issue is a program like http://hpaste.org/64262#a64263 ends up allocating a bunch of nonoverlapping vectors
> 00:22 < tibbe> they could do sharing, but the performance model might turn out to be confusing
> 00:22 < rwbarton> when it could share them (and the documentation would have you believe it does)
> 00:23 < tibbe> how would you document dropWhile? Might share the underlying vector?
> 00:23 < bgamari> tibbe: the performance model is already a bit confusing ;)
> 00:23 < tibbe> :)
> 00:24 < bgamari> I'm surprised I'm the first person to encounter this
> 00:25 < bgamari> The bug report is http://hackage.haskell.org/trac/ghc/ticket/5893#comment:1
> 00:29 < bgamari> rwbarton, Do you mind if I quote you in a message to libraries@?
More information about the Libraries