Proposal: deepseq should not depend on containers

Tue Jan 4 14:05:41 CET 2011

On Tue, Jan 4, 2011 at 11:51 AM, Christian Maeder
<Christian.Maeder at dfki.de> wrote:
> classes and data types are usually created independently and one cannot
> expect all data type maintainers to continuously adapt their packages to
> provide instances for new classes.

That is a downside of the approach I'm proposing. It's small in
comparison to the downsides of the alternatives.

Note that the opposite approach, having the package that defines the
type class define the instances, would mean that the maintainers of
that package would have to add a new instance for every data type
(that makes sense) that gets added to Hackage. Since the number of
data types is much larger than the number of type classes, my proposal
distributes the work among many more maintainers.

The problem with having the package defining the class depend on the
packages defining the data types is that this approach doesn't scale.
If we applied it consistently the deepseq package should depend on
every package on Hackage that defines a data type (i.e. every package
on Hackage), as NFData instances make sense for just about every data
type. The current deepseq package picks two at semi-random (i.e. for
legacy reasons).
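
To make the proposal concrete, here is a minimal sketch of what it asks of
a data type maintainer: the NFData instance lives in the same module as
the type, so it can never be an orphan. The Tree type and the helpers
fromList and size are hypothetical stand-ins for whatever a package
actually exports; only NFData, rnf and deepseq come from the deepseq
package.

```haskell
module Main (main) where

import Control.DeepSeq (NFData (..), deepseq)

-- Hypothetical data type, standing in for any type a package exports.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- The instance is defined next to the type, so it is not an orphan.
instance NFData a => NFData (Tree a) where
    rnf Leaf         = ()
    rnf (Node l x r) = rnf l `seq` rnf x `seq` rnf r

-- Build a right-leaning tree from a list (illustration only).
fromList :: [a] -> Tree a
fromList = foldr (\x t -> Node Leaf x t) Leaf

-- Count the elements stored in the tree.
size :: Tree a -> Int
size Leaf         = 0
size (Node l _ r) = size l + 1 + size r

main :: IO ()
main = do
    let t = fromList [1 .. 10 :: Int]
    -- Force the whole tree before continuing.
    t `deepseq` print (size t)
```

The cost to the maintainer is one extra dependency (deepseq) and a few
lines of code per type.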

> Will the next step be to move the Binary instances from the binary
> package to containers, too? (There are plenty of other serialization
> classes!)

Aside: I think the binary package should be split into two packages.
It currently exports two completely separate pieces of functionality:

 * Low-level primitives for reading/writing machine native types:
integers (unsigned or not) and binary blobs.
 * A specific and undocumented data format, via the Binary class.

I don't think a type class approach is necessarily the right thing.
However, if we stick with one I suggest that the package provide
instances for the data types in base, as the package (and indeed almost
all packages) has to depend on base anyway. If the data format is to
be extended to serialize containers the instances should be defined in
terms of another type class and not in terms of a particular type.
Tying a data format to a particular implementation of a container
doesn't sound like a great idea. Here's a sketch of what this could
look like:

class IMap m where
    empty :: ...
    insert :: ...

instance IMap m => Binary m where
    get = do
        -- decode data and call insert repeatedly.

The binary package would depend on the package that defines this type class.
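
A slightly fuller version of that idea, as a sketch under some
assumptions of my own: IMap is written as a constructor class with an
Ord constraint on keys, toList is added so that maps can be written as
well as read, and the instance is given for a hypothetical newtype
wrapper Serial rather than for a bare type variable (an instance for
every m would overlap with all other Binary instances). The wire format
here is simply the list of key/value pairs; none of this is the binary
package's actual design.

```haskell
module Main (main) where

import Data.Binary (Binary (..), decode, encode)
import qualified Data.Map as M

-- Hypothetical abstraction over map-like containers.
class IMap m where
    empty  :: m k v
    insert :: Ord k => k -> v -> m k v -> m k v
    toList :: m k v -> [(k, v)]

-- One possible implementation; any other map type could join the class.
instance IMap M.Map where
    empty  = M.empty
    insert = M.insert
    toList = M.toList

-- Hypothetical wrapper, used to avoid an instance for every type m.
newtype Serial m k v = Serial (m k v)

-- The format is defined against IMap, not against Data.Map.
instance (IMap m, Ord k, Binary k, Binary v) => Binary (Serial m k v) where
    put (Serial m) = put (toList m)
    get = do
        -- Decode the pairs and call insert repeatedly.
        kvs <- get
        return (Serial (foldr (\(k, v) acc -> insert k v acc) empty kvs))

-- Serialize a Data.Map and read it back through the generic instance.
roundTrip :: M.Map Int String -> M.Map Int String
roundTrip m = case decode (encode (Serial m)) of
    Serial m' -> m'

main :: IO ()
main = print (roundTrip (M.fromList [(1, "one"), (2, "two")]))
```

Only the IMap instance mentions Data.Map, so the Binary instance itself
would not need to depend on containers.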

>> Orphan instances should be avoided as they can cause hard to prevent
>> and hard to fix breakages in large code bases.
>
> The problem with large code bases are only duplicate orphaned instances
> that are added only as non-separated parts of other code.
>
> If all code would be based on the same instances (provided by a central
> package on hackage!) I see no problem.

If I understand you correctly you're suggesting that the problem is
solved by convention, e.g. that all instances belonging to a particular
type class are found in a particular package and no one should define
an instance of this type class elsewhere? Is that a correct
interpretation? I think there are at least three issues with this
approach:

 * The package that defines the instances might have to depend on all
of Hackage (see above). This is the same as when the package that
defines the type class also defines the instances.

 * I don't think this is enforceable, even by convention. Where do you
put instances for data types not on Hackage (e.g. in some company's
source control repo)?

 * You need to tell everyone to not define instances for data types in
the same package as they define the data type as all instances are
supposed to be in the special instances package. If they do add the
instance in the package that defines the data type, things will break
when the *-instances package adds the same instance.

> If NFData is such an important class it should go into the base package.
> Since other data types depend on base anyway, then there is no need to
> change the dependency from "base" to "base, deepseq" for many data
> packages.

Any package dependency problem can be solved by having only one
package (e.g. base). But not all Haskell code (not even all type classes
people might ever define) can live in base. We can try to avoid the
question of where to declare what by saying that some type classes are
"important" and go into base and pretending other type classes don't
exist. This doesn't really answer the question of where to define type
classes and their instances, in general, though.

> Is it possible to find out where (if) NFData instances of container
> types are actually used (for hackage packages)?

They are used in a bunch of Criterion benchmarks for different
packages at the moment.
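
The reason benchmarks need these instances: Data.Map is lazy in its
values, so evaluating a map to weak head normal form can leave the
values as unevaluated thunks, and the benchmark then measures less work
than intended. Criterion's nf requires an NFData instance on the result
for exactly this purpose. The sketch below shows the same forcing step
using only deepseq and containers, without Criterion itself; buildMap is
a hypothetical workload.

```haskell
module Main (main) where

import Control.DeepSeq (force)
import Control.Exception (evaluate)
import qualified Data.Map as M

-- Hypothetical workload: a map whose values are lazily computed squares.
buildMap :: Int -> M.Map Int Int
buildMap n = M.fromList [(i, i * i) | i <- [1 .. n]]

main :: IO ()
main = do
    -- force needs NFData (Map Int Int); without that instance this
    -- line does not compile.
    m <- evaluate (force (buildMap 1000))
    print (M.size m)
```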

Johan