[Haskell-cafe] Conduit+GHC high memory use for simple Sink

Michael Snoyman michael at snoyman.com
Thu Aug 28 04:00:41 UTC 2014


Firstly, I think you mean Bryan's code, right? Secondly, I wrote this email
five minutes before going to bed, so please excuse any inaccuracies or
incoherence in it.

(3) is different to the original because it is form as `await >>= maybe`.
There's a rewrite rule for this (providing the conduit 1.1 version):

{-# RULES "await >>= maybe" forall x y. await >>= maybe x y = ConduitM
(NeedInput (unConduitM . y) (unConduitM . const x)) #-}

This gives two advantages: it avoids a costly (at least in conduit 1.1)
monadic bind, and avoids creating a Maybe value that we're going to
immediately throw away. This is the general idea of church encoding data
structures, which is why it should give *some* speedup, but nothing
significant.

Regarding fold: it *is* identical to (3). I don't think the rewrite rules
are playing a big part here. You're right about needing to be careful about
fold due to WHNF. However, Bryan's code already used bang patterns to force
evaluation of both values, so that isn't a problem. In a real-world
application, I'd *also* recommend creating a strict pair datatype to avoid
accidentally leaking thunks.

But looking at the code again with fresher eyes than last night: I really
don't understand why it had such abysmal performance. I'll look into this a
bit more, looks like it should be interesting.


On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton <danburton.email at gmail.com>
wrote:

> Michael, I don't see how your code sample for (3) is any different to the
> compiler than Roman's original sink2.
>
> I also don't see how the original sink2 creates a bad bind tree. I presume
> that the reason "fold" works is due to the streaming optimization rule, and
> not due to its implementation, which looks almost identical to (3).
>
> I worry about using fold in this case, which is only strict up to WHNF,
> and therefore wouldn't necessarily force the integers in the tuples;
> instead it would create tons of integer thunks, wouldn't it? Roman's
> hand-coded sink2 avoids this issue so I presume that's not what is causing
> his memory woes.
>
> -- Dan Burton
>
>
> On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka <roma at ro-che.info> wrote:
>
>> * Michael Snoyman <michael at snoyman.com> [2014-08-27 23:48:06+0300]
>> > > The problem is the following Sink, which counts how many even/odd
>> Tokens
>> > > are
>> > > seen:
>> > >
>> > >   type SinkState = (Integer, Integer)
>> > >
>> > >   sink2 :: (Monad m) => SinkState -> Sink Token m SinkState
>> > >   sink2 state@(!evenCount, !oddCount) = do
>> > >     maybeToken <- await
>> > >     case maybeToken of
>> > >       Nothing     -> return state
>> > >       (Just Even) -> sink2 (evenCount + 1, oddCount    )
>> > >       (Just Odd ) -> sink2 (evenCount    , oddCount + 1)
>> >
>> > Wow, talk about timing! What you've run into here is expensive monadic
>> > bindings. As it turns out, this is exactly what my blog post from last
>> > week[1] covered. You have three options to fix this:
>> >
>> > 1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and
>> > uses the codensity transform to avoid the problem. (I just tested your
>> > code; you get constant memory usage under conduit 1.2.0, seemingly
>> without
>> > any code change necessary.)
>>
>> Interesting. From looking at sink2, it seems that it produces a good,
>> right-associated bind tree. Am I missing something?
>>
>> And what occupies the memory in this case?
>>
>> Roman
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20140828/250a75fa/attachment-0001.html>


More information about the Haskell-Cafe mailing list