[Haskell-cafe] Conduit+GHC high memory use for simple Sink
Michael Snoyman
michael at snoyman.com
Thu Aug 28 06:37:45 UTC 2014
Can you provide the output from +RTS -s, as well as the heap profile for
-hy?
I believe I'm now also reproducing the memory leak on conduit 1.2, so it
must have been a mistake in my testing last night when I thought 1.2 fixed
it.
On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair <bryanvick at gmail.com> wrote:
> Thanks for the interesting blog posts Michael. I updated the example
> project
> [1] to use conduit 1.2. Unfortunately, on my machine [2], my original
> sink2
> still uses about 500Mb of memory when processing 4 gzip files of about 5Mb
> each, while sink1 only uses about 8Mb. I added sink3, which does the same
> as
> sink2 but uses fold from Conduit.List as you recommended, and that seems to
> work, using about 8Mb.
>
> Looking at the code for sink2 vs sink3, I don't understand what would be
> occupying so much memory in sink2 even in the case of expensive monadic
> binding, or exclusion from stream fusion. I'm curious if sink2 adds
> thunks to
> the heap that sink3 doesn't, or if the GC is failing to clean up heap
> objects
> in sink2 that is cleans up in sink3. I'm new at memory profiling, but the
> chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the
> action function is expensive.
>
> In the real project that inspired this example I'm going to do some
> cleanup,
> replacing manual recursion with higher-level functions from Conduit.List,
> as
> that seems like an all around good idea.
>
>
> Bryan Vicknair
>
> [1] https://bitbucket.org/bryanvick/conduit-mem
> [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64
>
>
> On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote:
> <snip>
> > But looking at the code again with fresher eyes than last night: I really
> > don't understand why it had such abysmal performance. I'll look into
> this a
> > bit more, looks like it should be interesting.
> >
> >
> > On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton <danburton.email at gmail.com>
> > wrote:
> >
> > > Michael, I don't see how your code sample for (3) is any different to
> the
> > > compiler than Roman's original sink2.
> > >
> > > I also don't see how the original sink2 creates a bad bind tree. I
> presume
> > > that the reason "fold" works is due to the streaming optimization
> rule, and
> > > not due to its implementation, which looks almost identical to (3).
> > >
> > > I worry about using fold in this case, which is only strict up to WHNF,
> > > and therefore wouldn't necessarily force the integers in the tuples;
> > > instead it would create tons of integer thunks, wouldn't it? Roman's
> > > hand-coded sink2 avoids this issue so I presume that's not what is
> causing
> > > his memory woes.
> > >
> > > -- Dan Burton
> > >
> > >
> > > On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka <roma at ro-che.info>
> wrote:
> > >
> > >> * Michael Snoyman <michael at snoyman.com> [2014-08-27 23:48:06+0300]
> > >> > > The problem is the following Sink, which counts how many even/odd
> > >> Tokens
> > >> > > are
> > >> > > seen:
> > >> > >
> > >> > > type SinkState = (Integer, Integer)
> > >> > >
> > >> > > sink2 :: (Monad m) => SinkState -> Sink Token m SinkState
> > >> > > sink2 state@(!evenCount, !oddCount) = do
> > >> > > maybeToken <- await
> > >> > > case maybeToken of
> > >> > > Nothing -> return state
> > >> > > (Just Even) -> sink2 (evenCount + 1, oddCount )
> > >> > > (Just Odd ) -> sink2 (evenCount , oddCount + 1)
> > >> >
> > >> > Wow, talk about timing! What you've run into here is expensive
> monadic
> > >> > bindings. As it turns out, this is exactly what my blog post from
> last
> > >> > week[1] covered. You have three options to fix this:
> > >> >
> > >> > 1. Just upgrade to conduit 1.2.0, which I released a few hours ago,
> and
> > >> > uses the codensity transform to avoid the problem. (I just tested
> your
> > >> > code; you get constant memory usage under conduit 1.2.0, seemingly
> > >> without
> > >> > any code change necessary.)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20140828/213ffefd/attachment.html>
More information about the Haskell-Cafe
mailing list