[Haskell-cafe] Help requested: naming things in conduit

Fri Jun 29 06:22:19 CEST 2012

On Thu, Jun 28, 2012 at 8:36 PM, Paolo Capriotti <p.capriotti at gmail.com> wrote:
> On Thu, Jun 28, 2012 at 6:11 PM, Michael Snoyman <michael at snoyman.com> wrote:
>> Hi all,
>>
>> I'm just about ready to make the 0.5 release of conduit. And as usual,
>> I'm running up against the hardest thing in programming: naming
>> things.
>>
>> Here's the crux of the matter: in older versions of conduit, functions
>> would have a type signature of Source, Sink, or Conduit. For example:
>>
>>    sourceFile :: MonadResource m => FilePath -> Source m ByteString
>>
>> I think most people can guess at what this function does: it produces
>> a stream of ByteStrings, which are read from the given file.
>>
>> Now the trick: Source (and Sink and Conduit) are all type synonyms
>> wrapping around the same type, Pipe. Ideally, we'd like to be able to
>> reuse functions like sourceFile in other contexts, such as producing a
>> Conduit that calls sourceFile[1]. However, the type synonym Source
>> over-specifies some of the type parameters to Pipe, and therefore
>> `sourceFile` can't be used directly to create a Conduit[2].
>>
>> To get around this whole problem, I've added a number of type synonyms
>> with rank-2 types, that don't over-specify. You can see the type
>> synonyms here[3], and more explanation of the problem here[4]. So my
>> question is: can anyone come up with better names for these synonyms?
>> Just to summarize here:
>>
>> * All of the generalized types start with a G, e.g., Source becomes GSource.
>> * For Sinks and Conduits, if leftovers are generated, there's an L
>> after the G (e.g., GLSink).
>> * For Sinks and Conduits which consume all of their input and then
>> return the upstream result, we tack on an Inf for Infinite (e.g.,
>> GInfConduit, GLInfSink).
>>
>> I think these names are relatively descriptive, and certainly `GSink
>> ByteString m Int` is easier to follow than `Pipe l ByteString o u m
>> Int`, but I was wondering if anyone had some better recommendations.
>
> I ran into this problem myself with my implementation, which used 7 type
> parameters (the extra parameter relative to conduit was used by Defer), and I
> couldn't think of any satisfactory solution.
>
> The dilemma here is:
>
>  - exposing the full `Pipe` type as the primary API would be really confusing
>   for new users
>  - creating a bunch of type synonyms adds a lot of conceptual overhead, and
>   it's actually a leaky abstraction, because `Pipe` will probably be shown in
>   error messages, and appears in the signatures of basic combinators
>
> In the end, I gave up the 2 non-essential parameters, built the corresponding
> lost features on top of `Pipe` using newtypes, and decided to expose a
> 5-parameter `Pipe` type with no universally quantified synonyms.
>
> I'm not sure how easy this Pipe type is to understand, but at least all
> parameters have a clear meaning that can be explained in the documentation,
> whereas the `l` parameter is sort of a hack (like my 'd' parameter).

I think even five parameters are too many. The original conduit types
had either 2 or 3 parameters, and each one was essential and easily
explainable. I realize that, for now, type synonyms will not help at
all with error messages (which I consider a serious problem), but at
least normal API functions like sourceFile will get helpful
signatures.
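
To make the over-specification issue concrete, here is a rough sketch.
It uses a toy Pipe, not conduit's real definition. Only the parameter
order `l i o u m r` and the synonym names come from this thread; the
constructors, the right-hand sides of Source and Conduit, and the
helpers sourceList, emitAll, and outputs are assumptions made up for
illustration.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Functor.Identity (Identity, runIdentity)

-- Toy stand-in for Pipe (an assumption, not the library's definition).
-- l = leftovers, i = input, o = output, u = upstream result,
-- m = base monad, r = result.
data Pipe l i o u m r
  = HaveOutput (Pipe l i o u m r) o
  | NeedInput (i -> Pipe l i o u m r) (u -> Pipe l i o u m r)
  | Done r
  | PipeM (m (Pipe l i o u m r))
  | Leftover (Pipe l i o u m r) l

-- Specific synonyms: every parameter is pinned down (assumed shapes).
type Source m o    = Pipe () () o () m ()
type Conduit i m o = Pipe i i o () m ()

-- Generalized synonym: the unused parameters stay polymorphic.
type GSource m o = forall l i u. Pipe l i o u m ()

-- Hypothetical producer. The argument is written out explicitly so the
-- rank-2 result type also checks on GHC versions without deep
-- skolemisation.
sourceList :: [o] -> GSource m o
sourceList xs = foldr (\o rest -> HaveOutput rest o) (Done ()) xs

-- Because sourceList is a GSource, it can be used where a Conduit is
-- expected. Had it been typed as `Source m o`, this would only
-- typecheck for i ~ ().
emitAll :: [o] -> Conduit i m o
emitAll xs = sourceList xs

-- Toy runner: collect outputs from a pipe that never receives input.
outputs :: Pipe l i o () Identity r -> [o]
outputs (HaveOutput next o)  = o : outputs next
outputs (NeedInput _ onDone) = outputs (onDone ())
outputs (Done _)             = []
outputs (PipeM act)          = outputs (runIdentity act)
outputs (Leftover next _)    = outputs next

main :: IO ()
main = do
  print (outputs (sourceList "ab" :: Source Identity Char))
  print (outputs (emitAll [1, 2, 3] :: Conduit Char Identity Int))
```

The signature of sourceList stays about as readable as the old
`Source m o` one, which is the whole point of the G synonyms.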

One idea that I've toyed around with, but not really pursued, is
creating actual newtypes for Source, Conduit, and Sink, and using
Chris's typeclass approach for when we want general functions. After
some basic fiddling, the typeclasses just seem to make everything more
difficult to work with.
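
For reference, the newtype idea could look roughly like the sketch
below. This is only one possible shape of it: the toy Pipe, the class
name Producer, and its method fromList are hypothetical, and I am not
claiming this matches Chris's actual proposal.

```haskell
module Main where

import Data.Functor.Identity (Identity, runIdentity)

-- Toy stand-in for Pipe (an assumption, not the library's definition).
data Pipe l i o u m r
  = HaveOutput (Pipe l i o u m r) o
  | NeedInput (i -> Pipe l i o u m r) (u -> Pipe l i o u m r)
  | Done r
  | PipeM (m (Pipe l i o u m r))

-- Real newtypes instead of synonyms, so error messages would mention
-- Source and Conduit rather than Pipe.
newtype Source m o    = Source  { unSource  :: Pipe () () o () m () }
newtype Conduit i m o = Conduit { unConduit :: Pipe i i o () m () }

-- Hypothetical class for "things that can emit a list of values".
-- Each general function now needs a class method (or a class
-- constraint) plus one instance per newtype.
class Producer p where
  fromList :: Monad m => [o] -> p m o

instance Producer Source where
  fromList xs = Source (foldr (\o rest -> HaveOutput rest o) (Done ()) xs)

instance Producer (Conduit i) where
  fromList xs = Conduit (foldr (\o rest -> HaveOutput rest o) (Done ()) xs)

-- Toy runner: collect outputs from a pipe that never receives input.
outputs :: Pipe l i o () Identity r -> [o]
outputs (HaveOutput next o)  = o : outputs next
outputs (NeedInput _ onDone) = outputs (onDone ())
outputs (Done _)             = []
outputs (PipeM act)          = outputs (runIdentity act)

main :: IO ()
main = do
  print (outputs (unSource (fromList "ab" :: Source Identity Char)))
  print (outputs (unConduit (fromList [1, 2, 3] :: Conduit Char Identity Int)))
```

Even in this small case, every use site has to wrap and unwrap the
newtype, and the body of fromList is duplicated per instance.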

You're correct by the way that we need a lot of type synonyms (I got 9
of them). But I still think it helps with the overhead instead of
hurting. While it may be important for some cases to understand the
difference between GSink and GLSink, for most use cases simply knowing
"oh, this thing takes a stream of `a` and gives a single result of
`b`" is sufficient. But I think only real world usage is going to help
us determine the best approach here.

Michael