[Haskell-beginners] space leak

Daniel Fischer daniel.is.fischer at web.de
Mon Feb 15 12:07:16 EST 2010


On Monday, 15 February 2010, 16:44:51, Uchida Yasuo wrote:
> Hello,
>
> I came across the following space leak problem today.
> How can I fix this?
> (Tested on Mac OS X 10.5.8, GHC 6.10.3)
>
> -- test.hs
> module Main where
>
> import System
> import qualified Data.ByteString.Lazy.Char8 as L
>
> main = do args <- getArgs
>           let n = read $ args !! 0
>           cs <- L.getContents
>           let !a = L.take n cs

This line is the problem. The bang pattern does less than you probably
think: it only evaluates its right-hand side to weak head normal form, i.e.
to the outermost constructor. Lazy ByteStrings are defined as

data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString

so when you write

let !a = L.take n cs

you force only the outermost constructor (Empty if cs is empty, otherwise
Chunk start rest). Since cs is not empty, that is a Chunk, and forcing it
evaluates the first strict chunk, which is as long as the prefix that stdin
delivers immediately, but at most the default chunk size (normally 32K or
64K, minus two words for bookkeeping). The rest of a remains an unevaluated
thunk that refers to the remainder of cs.
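
To see how little the bang pattern forces, here is a minimal sketch. The
names poisoned and firstTwo are made up for this illustration, and it
assumes the Chunk constructor can be imported from
Data.ByteString.Lazy.Internal. The tail of the test value raises an error
if anything evaluates it, yet the bang pattern goes through.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.ByteString.Lazy.Internal (ByteString (Chunk))

-- Hypothetical test value: one real chunk followed by a tail that
-- raises an error as soon as anything evaluates it.
poisoned :: L.ByteString
poisoned = Chunk (S.pack "abc") (error "tail was forced")

-- Same shape as the line in test.hs: a bang pattern on L.take.
-- The bang evaluates only as far as the first Chunk constructor.
firstTwo :: L.ByteString -> L.ByteString
firstTwo cs = let !a = L.take 10 cs in L.take 2 a

main :: IO ()
main = do
  let !a = L.take 10 poisoned
  putStrLn "bang pattern returned without touching the tail"
  -- Two bytes fit inside the first chunk, so the tail is still untouched.
  print (firstTwo poisoned)
  -- By contrast, L.length a would walk the whole value and hit the error.
```

In test.hs the untouched tail is not an error but a thunk that still points
into cs, which is what keeps the input alive.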

If n is larger than (a) the default chunk size or (b) what L.getContents
received immediately[*], the unevaluated remainder of a keeps a reference to
cs, so almost the entire input stays alive and you have a bad memory leak.

Fix: force a to be completely evaluated, e.g.

    let !a = L.take n cs
        !l = L.length a

Evaluating the length walks a to its end, so a no longer holds any reference
to cs, and the input that has already been consumed can be garbage collected.
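
Put together, the fixed test.hs might look like the sketch below. The
helper name takeForced is invented here, System.Environment is an
assumption standing in for the old System module on newer GHCs, and the
usage message is an addition; the rest follows the original program.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Int (Int64)
import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper: take the first n bytes and evaluate them fully.
-- Computing the length walks every chunk of the prefix, so the result
-- no longer contains a thunk that refers to the rest of the input.
takeForced :: Int64 -> L.ByteString -> L.ByteString
takeForced n cs =
  let a = L.take n cs
  in L.length a `seq` a

main :: IO ()
main = do
  args <- getArgs
  case args of
    (arg : _) -> do
      let n = read arg
      cs <- L.getContents
      -- The bang now forces the whole prefix, not just its first chunk.
      let !a = takeForced n cs
      mapM_ (print . L.length) (L.lines cs)
      print a
    [] -> hPutStrLn stderr "usage: test N < input"
```

It is run as before, e.g. ./gen | ./test 17000.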

[*] How long the first chunk is depends, in this pipeline, on scheduling,
the number of available cores/CPUs, and the OS buffer size.
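
If you want to check what your own machine does, a small sketch like the
following prints the library's default chunk size and the lengths of the
first few chunks read from stdin. The name chunkLengths is made up for this
example, and it assumes defaultChunkSize is exported by
Data.ByteString.Lazy.Internal.

```haskell
module Main where

import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.ByteString.Lazy.Internal (defaultChunkSize)

-- Hypothetical helper: the length of each strict chunk, in order.
-- The result list is produced lazily, one chunk at a time.
chunkLengths :: L.ByteString -> [Int]
chunkLengths = map S.length . L.toChunks

main :: IO ()
main = do
  putStrLn ("default chunk size: " ++ show defaultChunkSize)
  cs <- L.getContents
  -- Only the first five chunks are inspected, so this also terminates
  -- on an endless producer such as gen.
  putStrLn ("first chunks: " ++ show (take 5 (chunkLengths cs)))
```

Running it a few times at the end of the same pipe should show whether the
first chunk varies from run to run.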

>           mapM_ (print . L.length) $ L.lines cs
>           print a
>
>
> -- gen.hs
> module Main where
>
> main = do putStrLn $ take 1000000 $ cycle "foo"
>           main
>
>
> These are compiled with the following options:
>
> $ ghc --make -O2 test
> $ ghc --make -O2 gen
>
> The memory usage seems to depend on the argument(=17000) passed.
> On my MacBook(Core2 Duo 2.0GHz), 16000 works fine.

