[Haskell-cafe] Conduit Best Practices for leftover data

Myles C. Maxfield myles.maxfield at gmail.com
Sun Apr 15 23:12:22 CEST 2012


Thanks for responding to this. Some responses are inline.

On Sat, Apr 14, 2012 at 8:30 PM, Michael Snoyman <michael at snoyman.com> wrote:
> On Thu, Apr 12, 2012 at 9:25 AM, Myles C. Maxfield
> <myles.maxfield at gmail.com> wrote:
>> Hello,
>> I am interested in the argument to Done, namely, leftover data. More
>> specifically, when implementing a conduit/sink, what should the
>> conduit specify for the (Maybe i) argument to Done in the following
>> scenarios (Please note that these scenarios only make sense if the
>> type of 'i' is something in Monoid):
>>
>> 1) The conduit outputted the last thing that it felt like outputting,
>> and exited willfully. There seem to be two options here - a) the
>> conduit/sink should greedily gather up all the remaining input in the
>> stream and mconcat them, or b) Return the part of the last thing that
>> never got represented in any part of anything outputted. Option b
>> seems to make the most sense here.
>
> Yes, option (b) is definitely what's intended.
>
>> 2) Something upstream produced Done, so the second argument to
>> NeedInput gets run. This is guaranteed to be run at the boundary of an
>> item, so should it always return Nothing? Instead, should it remember
>> all the input it has consumed for the current (yet-to-be-outputted)
>> element, so it can let Data.Conduit know that, even though the conduit
>> appeared to consume the past few items, it actually didn't (because it
>> needs more input items to make an output)? Remembering this sequence
>> could potentially have disastrous memory usage. On the other hand, It
>> could also greedily gather everything remaining in the stream.
>
> No, nothing so complicated is intended. Most likely you'll never
> return any leftovers from the second field of NeedInput. One other
> minor point: it's also possible that the second field will be used if
> the *downstream* pipe returns Done.

Just to help me understand, what is a case when you want to specify
something in this field? I can't think of a case when a Conduit would
specify anything in this case.

>
>> 3) The conduit/sink encountered an error mid-item. In general, is
>> there a commonly-accepted way to deal with this? If a conduit fails in
>> the middle of an item, it might not be clear where it should pick up
>> processing, so the conduit probably shouldn't even attempt to
>> continue. It would probably be good to return some notion of where it
>> was in the input when it failed. It could return (Done (???) (Left
>> errcode)) but this requires that everything downstream in the pipeline
>> be aware of Errcode, which is not ideal.I could use MonadError along
>> with PipeM, but this approach completely abandons the part of the
>> stream that has been processed successfully. I'd like to avoid using
>> Exceptions if at all possible.
>
> Why avoid Exceptions? It's the right fit for the job. You can still
> keep your conduit pure by setting up an `ExceptionT Identity` stack,
> which is exactly how you can use the Data.Conduit.Text functions from
> pure code. Really, what you need to be asking is "is there any logical
> way to recover from an exception here?"

I suppose this is a little off-topic, but do you prefer ExceptionT or
ErrorT? Any exception/error that I'd be throwing is just  a container
around a String, so both of them will work fine for my purposes.

>
>> It doesn't seem that a user application even has any way to access
>> leftover data anyway, so perhaps this discussion will only be relevant
>> in a future version of Conduit. At any rate, any feedback you could
>> give me on this issue would be greatly appreciated.
>
> Leftover data is definitely used:
>
> 1. If you compose together two `Sink` with monadic bind, the leftovers
> from the first will be passed to the second.

You can do that???? That's so cool!I never realized that Pipes are
members of Monad.

> 2. If you use connect-and-resume ($$+), the leftovers are returned as
> part of the `Source`, and provided downstream.

This too is really neat :] I didn't realize how this worked.

>
> Michael



More information about the Haskell-Cafe mailing list