[Haskell-beginners] leaky folding
Daniel Fischer
daniel.is.fischer at web.de
Sun Aug 15 10:20:54 EDT 2010
On Sunday 15 August 2010 09:58:49, Travis Erdman wrote:
> in the code below, finalmap seems fast enough ... but it has a space
> leak. otoh, finalmap'rnf runs in constant space, but its performance is
> terrible, at least 4x slower than finalmap.
The performance isn't so bad, actually.
Consider that in each step it has to rnf the entire map. Even for small
maps like this one, that's quite a bit of work (and mostly unnecessary,
because almost everything is already in normal form).
>
> this is a common problem i'm having ... foldl' isn't strict enough, but
> foldl'rnf kills performance.
Firstly, foldl'rnf is a good idea *only* for small structures or if in each
step large parts of the structure are changed. If you change only small
parts of a large structure, you're wasting a lot of work.
Secondly, in this case your problem is a) the choice of a suboptimal data
structure b) a bad choice of functions to manipulate the structure c)
perhaps the lacking strictness of Data.IntMap.
a) STUArray would be better. But that makes itself really felt only for
larger n.
b) that's the biggo
c) Data.IntMap doesn't offer any strict versions of insertWith,
insertWithKey, adjust et al, which would often make it far easier to avoid
space leaks. Data.Map at least offers strict(er) versions of
insertWith[Key].
> And not only with IntMap as the cumulating
> data structure, but others as well.
>
> any ideas on this one?
Sure. See below.
> how can i get a fast fold in constant space?
>
> thanks again,
>
> travis
>
>
> {-# LANGUAGE BangPatterns #-}
>
> import System.Environment
> import Foreign (unsafePerformIO)
> import System.Random.Mersenne
> import Data.List
> import Control.DeepSeq
> import Control.Parallel.Strategies
> import qualified Data.IntMap as IntMap
>
> mersennegen = unsafePerformIO $ newMTGen Nothing
> infrandoms = unfoldr ( Just . splitAt 3) $ map (\x -> abs (x `mod` n))
> (unsafePerformIO $ (randoms mersennegen)::[Int])
Arrrgh! Use of unsafePerformIO in that way makes my head hurt. Pass things
as arguments, please.
>
> n = 200
>
> foldl'rnf :: NFData a => (a -> b -> a) -> a -> [b] -> a
> foldl'rnf f z xs = lgo z xs
> where
> lgo z [] = z
> lgo !z (x:xs) = lgo (runEval (rdeepseq (f z x))) xs
>
> startmap = IntMap.fromDistinctAscList $ zip [0..] [1..n]
That is almost certainly a bad idea. If the map contains a contiguous range
of keys, an array is practically guaranteed to be more appropriate (uses
less memory and is faster to boot).
>
> finalmap x = foldl' g startmap (take x infrandoms)
Okay, that looks reasonable.
> finalmap'rnf x = foldl'rnf g startmap (take x infrandoms)
As stated above, rnf'ing the entire map at each step is a baad idea.
You can get reasonable performance while squashing the leak by splitting
your list in chunks of size k and rnf'ing the map only after each chunk has
been processed.
finalmap'rnf x = foldl'rnf h startmap
(takeWhile (not . null)
(unfoldr (Just . splitAt 200) (take x infrandoms)))
where
h mp lst = foldl' g mp lst
The chunk size should be approximately the size of the map, so that after
each chunk
- a large part of the map has been changed
- no value has been changed too often.
Then rnf'ing the entire map doesn't waste too much work (since only a small
part of the map is already in NF) and no key maps to a too large thunk
(which can cause a space leak with vanilla foldl', though here the problem
is something else).
But, here at least, changing the folded function does better.
>
> g:: IntMap.IntMap Int -> [Int] -> IntMap.IntMap Int
> g !a [x,y,z] = IntMap.adjust (const $ y + (a IntMap.! z) `mod` n) x a
Oy gevalt!
The bang on the map parameter is superfluous since we foldl' anyway, but
that's no big deal.
IntMap.adjust (const val) is not the best if the key at which the map is to
be adjusted is guaranteed to be in the map. Then IntMap.insert is better.
But the difference is small, and if the key is not algorithmically
guaranteed to be present, adjust is cleaner. So the use of adjust is no big
deal either, even if insert is faster here.
What is bad is that you IntMap.adjust (const thunk).
Well, that alone isn't so bad.
What kills you is that the thunk refers to the map.
Thus each modified map contains a reference to the previous version, the
old contents can't be garbage collected, hello space leak (and finally
getting the values involves hopping through old cells).
If you modify a data structure, don't let the modified version contain any
thunks referencing the old. That spells space leak.
With
g a [x,y,z] = let !w = y + (a IntMap.! z) `mod` n
in IntMap.adjust (const w) x a
you're fine with foldl'.
Question by the way, did you really want
y + ((a ! z) `mod` n)
or rather
(y + (a ! z)) `mod` n
?
>
> main = do
> args <- getArgs
> print $ finalmap (read $ head args)
More information about the Beginners
mailing list