[GHC] #9520: Running an action twice uses much more memory than running it once

GHC ghc-devs at haskell.org
Thu Aug 28 13:42:01 UTC 2014


#9520: Running an action twice uses much more memory than running it once
-------------------------------------+-------------------------------------
       Reporter:  snoyberg           |                   Owner:
           Type:  bug                |                  Status:  new
       Priority:  normal             |               Milestone:
      Component:  Compiler           |                 Version:  7.8.3
       Keywords:                     |        Operating System:  Linux
   Architecture:  x86_64 (amd64)     |         Type of failure:  Runtime
     Difficulty:  Unknown            |  performance bug
     Blocked By:                     |               Test Case:
Related Tickets:                     |                Blocking:
                                     |  Differential Revisions:
-------------------------------------+-------------------------------------
 This started as a
 [http://www.haskell.org/pipermail/haskell-cafe/2014-August/115751.html Haskell cafe discussion]
 about conduit. This may be related to #7206, but I can't be certain. It's
 possible that GHC is not doing anything wrong here, but I can't see how
 the code in question could be misbehaving in a way that triggers this
 memory usage.

 Consider the following code, which depends on conduit-1.1.7 and
 conduit-extra:

 {{{#!hs
 import Data.Conduit ( Sink, (=$), ($$), await )
 import qualified Data.Conduit.Binary as CB
 import System.IO (withBinaryFile, IOMode (ReadMode))

 main :: IO ()
 main = do
     action "random.gz"
     --action "random.gz"

 action :: FilePath -> IO ()
 action filePath = withBinaryFile filePath ReadMode $ \h -> do
     _ <- CB.sourceHandle h
       $$ CB.lines
       =$ sink2 1
     return ()

 sink2 :: (Monad m) => Int -> Sink a m Int
 sink2 state = do
   maybeToken <- await
   case maybeToken of
     Nothing     -> return state
     Just _      -> sink2 $! state + 1
 }}}

 The code should open up the file "random.gz" (I simply `gzip`ed about 10MB
 of data from /dev/urandom), break it into chunks at each newline
 character, and then count the number of lines. When I run it as-is, it
 uses 53KB of memory, which seems reasonable.

 However, if I uncomment the second call to `action` in `main`, maximum
 residency shoots up to 45MB (this seems to be linear in the size of the
 input file). I also tried copying `random.gz` into two files,
 `random1.gz` and `random2.gz`, and changing the two calls to `action` to
 use different file names. That still resulted in large memory usage.
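
 As a possible point of comparison, here is a minimal conduit-free sketch
 of the same shape of computation: open a file, stream it in chunks, and
 keep a strict counter. It is not part of the original report. The helper
 `countNewlines`, the 32 KiB chunk size, and taking the file names from
 the command line are all assumptions made for illustration. It counts
 newline bytes rather than reproducing `CB.lines` exactly, so the numbers
 it prints need not match the conduit version.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as S
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper (not from the ticket): read the handle in strict
-- chunks and count newline bytes with a strict accumulator.
countNewlines :: Handle -> IO Int
countNewlines h = go 1                  -- start at 1, as `sink2 1` does
  where
    go !n = do
        chunk <- S.hGetSome h 32768     -- assumed chunk size
        if S.null chunk
            then return n               -- empty chunk means end of file
            else go (n + S.count 10 chunk)  -- byte 10 is '\n'

-- Same outline as `action` in the ticket, minus conduit; the count is
-- printed here instead of being discarded.
action :: FilePath -> IO ()
action filePath = withBinaryFile filePath ReadMode $ \h ->
    countNewlines h >>= print

-- Run `action` once per command-line argument, so passing the same file
-- twice mirrors the two calls in the ticket's `main`.
main :: IO ()
main = getArgs >>= mapM_ action
```

 If this is built with `-rtsopts`, running it once with one file argument
 and once with two, each time with `+RTS -s`, gives maximum residency
 figures that can be set beside the conduit numbers above.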

 I'm going to continue working to make this a smaller reproducing test
 case, but I wanted to start with what I had so far. I'll also attach the
 core generated by both the low-memory and high-memory versions.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9520>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

