Vector memory usage

Ben Gamari bgamari.foss at gmail.com
Fri Feb 24 06:46:42 CET 2012


Recently I stumbled upon a rather non-obvious behavior in what seems
like a fairly common use a vector[1]. It appears that, despite the claim
of the documentation, dropWhile can result in copies, resulting in a 1MB
vector ballooning to several gigabytes within a few seconds. This has
been reproduced with ghc 7.4.1 and 7.0.3 with -O and -prof.

My knowledge of the internals of vector is quite limited, so I'll leave
potential explanations to those with better understanding; below I've
attached some relevant discussion from #haskell. Let me know if there's
anything at all I could do to help.

Cheers,

- Ben


[1] http://hackage.haskell.org/trac/ghc/ticket/5893


Test case:

import qualified Data.Vector.Unboxed as V
  
type Time = Int

spansPhotons :: V.Vector Time -> [(Time,Time)] -> [V.Vector Time]
spansPhotons ts spans = map (\(start,_)->V.takeWhile (<start) ts) spans

main = do
  let times = V.generate 1000000 (*100)
      spans = [(10000*i, 10000*i+5000) | i <- [1..10000]]
  let bursts = spansPhotons times spans
  print $ sum $ map V.length bursts
  print $ map (const 1) $ bursts
  


00:16 < rwbarton> there the problem is that the bursts aren't sharing the same memory
00:17 < rwbarton> like they would if you wrote a similar program with ByteString
00:17 < rwbarton> the thing is that the Vector implementation of dropWhile is good for the stream fusion situation where you have a pipeline of operations
00:17 < quintessence> Yeah, I don't think dropWhile can participate in stream fusion and not copy
00:18 < rwbarton> but not good when you want to dropWhile many different times on the same input
00:18 < rwbarton> right
00:18 < quintessence> because when you have dropWhile p . map f the thing you want to share with doesn't exist
00:18 < rwbarton> so I don't know what the resolution would be
00:18 < bgamari> ouch
00:18 < rwbarton> besides giving some more manual control over stream fusion perhaps
00:19 < bgamari> Yep
00:20 < bgamari> That is a tricky one
00:20 < dmwit> dropWhileFusable + dropWhileCopy
00:20 < dmwit> err...
00:21 < dmwit> dropWhileFusable + dropWhileDoesn'tCopy, I guess
00:21 < quintessence> Could you implement dropWhile the non-fusing way and then have a RULE for dropWhile p . unstream = unstream . S.dropWhile p?
00:21 < rwbarton> at the least you could implement dropWhile' the non-fusing way; maybe that already exists somewhere in Data.Vector
00:21 < tibbe> Isn't there a function to explicitly copy? Like ByteString's clone?
00:21 < rwbarton> there is
00:21 < rwbarton> but the goal here is to not copy
00:22 < tibbe> I'm joining this discussion late I'm afraid :)
00:22 < tibbe> it should be possible to write rules that only fire when fusion doesn't happen
00:22 < rwbarton> the issue is a program like http://hpaste.org/64262#a64263 ends up allocating a bunch of nonoverlapping vectors
00:22 < tibbe> they could do sharing, but the performance model might turn out to be confusing
00:22 < rwbarton> when it could share them (and the documentation would have you believe it does)
00:23 < tibbe> how would you document dropWhile? Might share the underlying vector?
00:23 < bgamari> tibbe: the performance model is already a bit confusing ;)
00:23 < tibbe> :)
00:24 < bgamari> I'm surprised I'm the first person to encounter this
00:25 < bgamari> The bug report is http://hackage.haskell.org/trac/ghc/ticket/5893#comment:1
00:29 < bgamari> rwbarton, Do you mind if I quote you in a message to libraries@?



More information about the Libraries mailing list