[GHC] #9520: Running an action twice uses much more memory than running it once
GHC
ghc-devs at haskell.org
Thu Aug 28 13:42:01 UTC 2014
#9520: Running an action twice uses much more memory than running it once
-------------------------------------+-------------------------------------
Reporter: snoyberg | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.3
Keywords: | Operating System: Linux
Architecture: x86_64 (amd64) | Type of failure: Runtime performance bug
Difficulty: Unknown |
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
This started as a [http://www.haskell.org/pipermail/haskell-cafe/2014-August/115751.html
Haskell cafe discussion] about conduit. This
may be related to #7206, but I can't be certain. It's possible that GHC is
not doing anything wrong here, but I can't see a way that the code in
question is misbehaving to trigger this memory usage.
Consider the following code, which depends on conduit-1.1.7 and
conduit-extra:
{{{#!hs
import Data.Conduit ( Sink, (=$), ($$), await )
import qualified Data.Conduit.Binary as CB
import System.IO (withBinaryFile, IOMode (ReadMode))

main :: IO ()
main = do
    action "random.gz"
    --action "random.gz"

-- Stream the file through CB.lines and discard the resulting count.
action :: FilePath -> IO ()
action filePath = withBinaryFile filePath ReadMode $ \h -> do
    _ <- CB.sourceHandle h
      $$ CB.lines
      =$ sink2 1
    return ()

-- Consume every input value, strictly incrementing the counter each time.
sink2 :: (Monad m) => Int -> Sink a m Int
sink2 state = do
    maybeToken <- await
    case maybeToken of
        Nothing -> return state
        Just _ -> sink2 $! state + 1
}}}
The code should open up the file "random.gz" (I simply `gzip`ed about 10MB
of data from /dev/urandom), break it into chunks at each newline
character, and then count the number of lines. When I run it as-is, it
uses 53KB of memory, which seems reasonable.
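For comparison, here is a minimal sketch of the same kind of counting without conduit. It is not part of the original report: `countNewlines` and the 32768-byte chunk size are hypothetical choices made for illustration. It counts newline bytes rather than the lines yielded by `CB.lines`, so its result need not match what `sink2 1` returns. It only shows the shape of a loop that is expected to need one chunk in memory at a time.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Hypothetical baseline, not from the ticket: read the handle in
-- fixed-size chunks and count newline bytes (byte value 10) with a
-- strict accumulator, so no chunk needs to be retained after counting.
countNewlines :: Handle -> IO Int
countNewlines h = go 0
  where
    go n = do
        chunk <- B.hGetSome h 32768  -- chunk size is an arbitrary assumption
        if B.null chunk
            then return n            -- an empty chunk signals end of file
            else go $! n + B.count 10 chunk

main :: IO ()
main = do
    n <- withBinaryFile "random.gz" ReadMode countNewlines
    print n
```

Running this once or twice on the same file could serve as a rough reference point when judging whether the residency seen with conduit is inherent to the task.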
However, if I uncomment the second call to `action` in `main`, maximum
residency shoots up to 45MB (this seems to be linear in the size of the
input file). I additionally tried copying `random.gz` into two files,
`random1.gz` and `random2.gz`, and changing the two calls to `action` to
use different file names. It still resulted in large memory usage.
I'm going to continue working to make this a smaller reproducing test
case, but I wanted to start with what I had so far. I'll also attach the
core generated by both the low-memory and high-memory versions.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9520>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler