[Haskell-cafe] Re: could we get a Data instance for Data.Text.Text?

Sat Jan 23 19:02:43 EST 2010

On Sat, Jan 23, 2010 at 4:57 PM, Jeremy Shaw <jeremy at n-heptane.com> wrote:
>  On Sat, Jan 23, 2010 at 7:57 AM, Neil Mitchell <ndmitchell at gmail.com>
> wrote:
>
>>
>> No, that's definitely not correct, or even remotely scalable as we
>> increase the number of abstract types in disparate packages.
>
> Yes.. happstack is facing another aspect of this scalability issue as well.
> We have a class, Serialize, which is used to serialize and deserialize data.
> It builds on the binary library, but adds the ability to version your data
> types and migrate data from older versions to newer versions.
> This has a serious scalability issue though, because it requires that each
> type a user might want to serialize has a Serialize instance.
> So do we:
>   1. provide Serialize instances for as many data types from libraries on
> hackage as we can, resulting in depending on a large number of packages that
> people are required to install, even though they will only use a small
> fraction of them.
>   2. convince people that Serialize deserves the same status as Data, and
> then convince authors to create Serialize instances for their type? It would
> be nice, but authors will start complaining if they are asked to provide a
> zillion other instances for their types as well. And they will be annoyed if
> their library has to depend on a bunch of other libraries, just so they
> can provide some instances that only a small fraction of their users might
> use. So, this method does not scale as the number of 'interesting' classes
> grows.
>   3. let individual users define the Serialize instances as they need them.
> Unfortunately, if two different library authors defined a Serialize instance
> for Text in their libraries, you could not use both libraries in your
> application because of the conflicting Serialize instances. So this method
> does not scale when the number of libraries using the Serialize class grows.
> Not really sure what the workaround is. #1 could work if there was some way
> to just selectively install the pieces as you need them. But the only way to
> do this now would be to create a lot of cabal packages which just defined a
> single instance -- happstack-text, happstack-map, happstack-time,
> happstack-etc. One for each package that has types we want to create a
> serialization instance for...
> Any other suggestions?
> - jeremy

The only safe rule is: if you control neither the class, C, nor the
type constructor, T, don't make instance C T.  Application
writers can often relax that rule as the set of dependencies for the
whole application is known and in many cases any reasonable instance
for a class C and constructor T is acceptable.  Under those
conditions, the worst-case scenario is that the application writer may
need to remove an instance declaration when migrating to new versions
of the dependencies.  When you control a class C, you should make as
many (relevant) type constructors instances of it as is reasonably
possible, i.e. without adding any extensive dependencies.  So at the
very least, all standard type constructors.  Similarly for those who
control a type constructor T.  This is for convenience.  These
correspond to solutions #1 and #2, only significantly weakened.
Definitely, making a package depend on tons of other packages just to
add instances is NOT the correct solution.

The problem case is library writers who depend on one package for a
class and on another package for a type.  There are three potential
solutions here, each of which basically reduces the problem to one of
the cases above: introduce a new type and add it to the class,
introduce a new class and add the types to it, or try to push the
resolution of such things onto the application writer.  The first
two options have the benefit that they also protect you from the
upstream libraries introducing instances that won't work for you.
These two options have the drawback that they are usually less
convenient to use.  The last option has the benefit that it usually
corresponds to having a more flexible/generic library; in some cases
you can even go so far as to remove your dependence on the libraries
altogether.
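
As a rough sketch of the first option (introduce a new type and add it
to the class), something like the following could work.  Everything
here is a stand-in: Serialize is a hypothetical simplification of a
class owned by one package, Text is a toy replacement for a type owned
by another, and SerialText is a made-up wrapper name.

```haskell
-- Hypothetical stand-in for a class owned by some other package.
class Serialize a where
  encode :: a -> String
  decode :: String -> Maybe a

-- Toy stand-in for a type owned by a third package
-- (think Data.Text.Text).
newtype Text = Text String
  deriving (Eq, Show)

-- The library's own wrapper.  The library controls this type, so the
-- instance below cannot clash with an instance for Text that some
-- other library, or upstream itself, defines later.
newtype SerialText = SerialText { unSerialText :: Text }
  deriving (Eq, Show)

instance Serialize SerialText where
  -- Encode as a quoted, escaped Haskell string literal.
  encode (SerialText (Text s)) = show s
  -- Accept only input that parses completely as a string literal.
  decode str = case reads str of
    [(s, "")] -> Just (SerialText (Text s))
    _         -> Nothing
```

The cost is the one mentioned above: users have to wrap and unwrap
values at the boundary, which is less convenient than having an
instance on Text directly.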

One solution to this problem, though it usually can't be applied after
the fact, is to simply not use the class mechanism except as a convenience.
This has the benefit that it usually leads to more flexibility and it
helps to realize the third option above.  Using Monoid as an example,
one can provide functions of the form

    f :: m -> (m -> m -> m) -> ...

and then also provide

    f' = f mempty mappend :: Monoid m => ...

The parameters can be collected into a record as well.  You could even
systematize this into

    class C a where getCDict :: CDict a

and then write

    f  :: CDict a -> ...
    f' = f getCDict :: C a => ...
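
To make the record variant concrete, here is a minimal sketch using
only the Prelude.  The names MonoidDict, concatWith, monoidDict,
concatM and sumDict are invented for illustration; only Monoid, mempty
and mappend come from base.

```haskell
-- Explicit dictionary: the operations a Monoid instance would supply.
data MonoidDict m = MonoidDict
  { dictEmpty  :: m
  , dictAppend :: m -> m -> m
  }

-- Core function: takes the operations as an ordinary argument, so it
-- needs no instance for m at all.
concatWith :: MonoidDict m -> [m] -> m
concatWith d = foldr (dictAppend d) (dictEmpty d)

-- Build the dictionary from an existing Monoid instance.
monoidDict :: Monoid m => MonoidDict m
monoidDict = MonoidDict mempty mappend

-- Convenience wrapper for types that do have an instance.
concatM :: Monoid m => [m] -> m
concatM = concatWith monoidDict

-- No instance required: the caller picks the behaviour.
sumDict :: MonoidDict Int
sumDict = MonoidDict 0 (+)
```

A caller with a Monoid instance uses concatM; a caller without one (or
one who wants a different behaviour, as with sumDict) calls concatWith
directly, and no orphan instance is ever needed.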

Whatever one does, do NOT add instances of type constructors you don't
control to classes you don't control.  This can lead to cases where
two libraries can't be used together at all.