[Haskell-cafe] Conduit+GHC high memory use for simple Sink
Bryan Vicknair
bryanvick at gmail.com
Wed Aug 27 18:19:37 UTC 2014
Hello Cafe,
First I'd like to thank Michael Snoyman and Gabriel Gonzalez for the work
they've done on the conduit and pipes stream processing libraries, and all the
accompanying tutorial content. I've been having fun converting a text
processing app from C to Haskell.
I'm seeing unexpectedly high memory usage in a stream-processing program that
uses the conduit library.
I've created an example that demonstrates the problem. The program accepts gzip
files as arguments, and for each one, classifies each line as either Even or
Odd depending on the length, then outputs some result depending on the Sink
used. For each gzip file:
    action :: GzipFilePath -> IO ()
    action (GzipFilePath filePath) = do
        -- sink2 takes its initial (even, odd) counts as an argument
        result <- runResourceT $  CB.sourceFile filePath
                               $$ Zlib.ungzip
                               =$ CB.lines
                               =$ token
                               =$ sink2 (0, 0)
        putStrLn $ show result
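The `token` stage is not shown in this message. As a rough sketch of the classification it performs, assuming a `Token` type with constructors `Even` and `Odd`, and assuming an even-length line maps to `Even`: the `classify` name and the use of `String` instead of `ByteString` are only for illustration and are not taken from mem.hs.

```haskell
-- Assumed definition of the Token type used by the sinks in this post.
data Token = Even | Odd
  deriving (Show, Eq)

-- Hypothetical helper: classify one line by the parity of its length.
classify :: String -> Token
classify line
  | even (length line) = Even
  | otherwise          = Odd

main :: IO ()
main = mapM_ (print . classify) ["", "a", "ab", "abc"]
```

In the real pipeline this step would run once per line produced by `CB.lines`.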
The problem is the following Sink, which counts how many even/odd Tokens are
seen:
    type SinkState = (Integer, Integer)

    sink2 :: (Monad m) => SinkState -> Sink Token m SinkState
    sink2 state@(!evenCount, !oddCount) = do
        maybeToken <- await
        case maybeToken of
            Nothing     -> return state
            (Just Even) -> sink2 (evenCount + 1, oddCount    )
            (Just Odd ) -> sink2 (evenCount    , oddCount + 1)
When I give this program a few gzip files, it uses hundreds of megabytes of
resident memory. When I give the same files as input, but use the following
simple Sink, it only uses about 8 MB of resident memory:
    sink1 :: MonadIO m => Sink Token m ()
    sink1 = awaitForever (liftIO . putStrLn . show)
At first I thought that sink2 performed so poorly because the addition thunks
were being placed onto the heap until the end, so I added some bang patterns to
make it strict. That didn't help, however.
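For comparison, here is a sketch of the same counting written as a pure strict fold, with no conduit involved. The `Counts` type, `step`, and `countTokens` are names made up for this illustration and are not in mem.hs, and the `Token` definition is assumed. It only shows the kind of strict accumulation the bang patterns in sink2 are aiming for; it is not a fix for the conduit version.

```haskell
import Data.List (foldl')

-- Assumed definition of the Token type used by the sinks in this post.
data Token = Even | Odd
  deriving (Show, Eq)

-- Strict fields: both counters are forced whenever a Counts value is
-- evaluated, so no chain of (+ 1) thunks can build up inside it.
data Counts = Counts !Integer !Integer
  deriving (Show, Eq)

-- Update the counters for a single token.
step :: Counts -> Token -> Counts
step (Counts evens odds) Even = Counts (evens + 1) odds
step (Counts evens odds) Odd  = Counts evens (odds + 1)

-- foldl' forces the accumulator at every step of the fold.
countTokens :: [Token] -> Counts
countTokens = foldl' step (Counts 0 0)

main :: IO ()
main = print (countTokens (take 1000000 (cycle [Even, Odd])))
```

Running this with `+RTS -s` gives a baseline for what the counting alone costs, separate from the streaming pipeline.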
I've done profiling, but I'm unable to figure out exactly what is being added
to the heap in sink2 but not sink1, or what is being garbage collected in
sink1, but not sink2.
The full source is here:
https://bitbucket.org/bryanvick/conduit-mem/src/HEAD/hsrc/bin/mem.hs
Or you can clone the repo, which contains a cabal file for easy building:
    git clone git@bitbucket.org:bryanvick/conduit-mem.git
    cd conduit-mem
    cabal sandbox init
    cabal install --only-dependencies
    cabal build mem
    ./dist/build/mem/mem [GIVE SOME GZIP FILES HERE]
You can change which sink is used in the 'action' function to see the different
memory usage.