[Haskell-cafe] Executing conduit streams in parallel leads to memory leaks

Simon Hafner hafnersimon at gmail.com
Wed Sep 13 18:02:46 UTC 2017


When I run my conduit without any additions, it works as expected,
with low constant memory usage, as advertised. It's a bit slow, so I
tried to speed it up with worker pools (via parallel-io) and staged
folding (via stm-conduit). With those changes, however, the memory
usage indicates that all the ByteStrings read from the files are
fully allocated and kept in memory, even though they are no longer
needed once the conduit step that consumes them has run. [1]
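
For reference, the kind of sequential pipeline meant here can be
sketched roughly as follows. This is not the code from the repository:
`totalBytes` and the byte-count fold are hypothetical stand-ins, and it
assumes the combinators exported by the `Conduit` module of the conduit
package. Each file is read, reduced to a number, and dropped before the
next one is read, so only one file's contents should need to be live at
a time.

```haskell
import Conduit (mapC, mapMC, runConduit, sumC, yieldMany, (.|))
import qualified Data.ByteString as B

-- Hypothetical stand-in for the real pipeline: read each file
-- strictly, reduce it to its length, and sum the lengths.
totalBytes :: [FilePath] -> IO Int
totalBytes paths =
  runConduit $
    yieldMany paths          -- stream the file names one by one
      .| mapMC B.readFile    -- read one file per step
      .| mapC B.length       -- keep only the size; the contents become garbage
      .| sumC                -- fold the sizes into a single result
```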

I suspected that closing the file, i.e. releasing the file handle,
might somehow keep the read string in memory, so I wanted to make
absolutely sure that's not the problem. [2] Replace `Lib.readFile`
with `B.readFile` to undo that specific change.
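
A strict read of this kind could look something like the sketch below.
It is an assumption about the general shape, not the actual
`Lib.readFile` from [2]; `readFileStrict` is a hypothetical name. The
idea is that the whole file is in a strict ByteString and the handle is
closed before the function returns, so nothing lazy can hold on to the
handle afterwards.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: read a whole file strictly and make sure the
-- handle is closed by the time the result is returned.
readFileStrict :: FilePath -> IO B.ByteString
readFileStrict path =
  withBinaryFile path ReadMode $ \h -> do
    -- Strict hGetContents reads everything and closes the handle itself;
    -- withBinaryFile closes it again on exit, which is harmless.
    contents <- B.hGetContents h
    -- A strict ByteString in WHNF is fully materialised.
    evaluate contents
```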

I was not using a worker pool in the beginning, so I thought maybe
`mapConcurrently_` was somehow starting all the threads at once, but
the pooled solution should have ruled that out as well.
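
To make the difference concrete, here is a minimal sketch of bounding
the concurrency with a semaphore. It is not the parallel-io pool used
in the repository, and `mapPooled_` is a hypothetical name.
`mapConcurrently_` still creates one thread per element here, but at
most `n` of them run the action at any moment, so at most `n` files
would be read at once if the read happens inside the action.

```haskell
import Control.Concurrent.Async (mapConcurrently_)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- Hypothetical helper: run the action for every element, with at most
-- n actions in flight at the same time.
mapPooled_ :: Int -> (a -> IO ()) -> [a] -> IO ()
mapPooled_ n act xs = do
  sem <- newQSem n
  -- Each thread takes a slot before running and gives it back
  -- afterwards, even if the action throws.
  mapConcurrently_
    (\x -> bracket_ (waitQSem sem) (signalQSem sem) (act x))
    xs
```

For example, `mapPooled_ 4 processFile paths` would run a hypothetical
`processFile` on every path with no more than four running at a time.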

What else could cause all the ByteStrings to be kept in memory in the
parallel version?

The example is available on:
https://github.com/reactormonk/non-constant-memory

[1] https://github.com/reactormonk/non-constant-memory/blob/master/src/Lib.hs#L51
[2] https://github.com/reactormonk/non-constant-memory/blob/master/src/Lib.hs#L62

