[Haskell-cafe] Re: Rewriting a famous library and using the same name: pros and cons

Wed Jun 9 23:25:14 EDT 2010

On 10 June 2010 12:38, sterl <s.clover at gmail.com> wrote:
> There's a big range of issues here, and to be honest I'm not sure if our
> ability to distinguish between them is helped by the title of this thread,
> which somewhat begs the question. That is to say, it isn't clear to me that
> calling the proposed changes to the fgl "rewriting a library" is necessarily
> accurate -- it seems more the case that these are incremental improvements
> of a library that require breaking API changes.

Except it is a re-write in the truest sense of the word: we started
completely from scratch.  We did compare our API with the current API
(in an attempt to keep function names, etc. the same where possible
because I'm hopeless at choosing names) but we didn't exactly take the
class as-is and then change it.  On the other hand, we were both
familiar with the current version of FGL and how it's laid out, so
there's probably some implicit influences from there as well.

On the other hand, it can also be considered as incremental
improvements: we wanted to keep the terminology and fundamental
concepts as similar as possible to avoid having a jarring change in
how it's used.  Instead, we focused on improving the current version:
using explicit data types for Context and Edge (for why Edge needs a
data type of its own, read the Graph section of "Fun with type
functions" by Oleg, SPJ and Chung-chieh Shan) rather than tuple
aliases; allowing restrictions on the label types (though we've just
come across a problem where this doesn't play nicely with mapping
functions); increasing the scope for per-instance optimisations, etc.
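
To give a rough idea of the Edge point, here is a minimal sketch in the
style of the Graph class from that paper.  The instance names (AdjList,
IntEdge) are made up for illustration and this is not the actual API of
the new fgl.  Because Edge is an associated data type rather than a type
synonym, the compiler can work out g from "Edge g", which is what lets
src and tgt type-check without ambiguity:

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A graph class with an associated vertex type and an associated
-- edge *data* type (hypothetical sketch, not the new fgl API).
class Graph g where
  type Vertex g
  data Edge g
  src, tgt :: Edge g -> Vertex g
  outEdges :: g -> Vertex g -> [Edge g]

-- Illustrative instance: a plain adjacency list keyed by Int.
newtype AdjList = AdjList [(Int, [Int])]

instance Graph AdjList where
  type Vertex AdjList = Int
  -- Each instance gets its own edge constructor, not a shared tuple.
  data Edge AdjList = IntEdge Int Int
  src (IntEdge s _) = s
  tgt (IntEdge _ t) = t
  -- Unknown vertices simply have no outgoing edges.
  outEdges (AdjList adj) v =
    [IntEdge v w | Just ws <- [lookup v adj], w <- ws]
```

Loading this in GHCi, "map tgt (outEdges (AdjList [(1, [2, 3])]) 1)"
lists the targets of the edges leaving node 1.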

So in a sense we did a re-write that happened to come close to the
current definition.  This is not to say that this is because the
current API is close to an ideal perfect API, but rather because we
were focussing on developing something _like_ the current version
without worrying about compatibility too much.

> So on the concrete issue at hand, I'd be for the new fgl version being
> developed under some new provisional name, and taking pains to provide a
> compatibility layer where possible. Then, after we see what the changes
> really are, coming to some informed decision on whether to rebrand it as the
> real fgl version 6. If so, the old stable fgl can be put up on hackage as
> fgl98, which lets packages which want to stick with it do so while avoiding
> any possibility of the dread diamond dependency.

Considering the "rename the old version" issues first:

* It won't solve the problem of people not specifying correct
constraints on the version of fgl used, since it means they'd have to
edit their dependencies to use fgl98 or whatever anyway.

* Calling it "fgl98" is on a slippery slope: what happens when
GHC-6.14 comes out with Haskell2010 support?  Do we then release an
fgl2010 version as well? (I believe Ross brought this problem up
already).

* I'm wanting people to move _off_ of the old version of fgl.  The
only real advantage (though how practical this will be in the real
world is debatable IMHO) is that the current version doesn't use any
extensions whereas the new one does (and they're needed to provide the
asked-for functionality of letting instance writers constrain the
types of labels - i.e. the reason why Set isn't an instance of Functor
- and to have custom Node types).  Since (like it or not) for the most
part when people write Haskell code they use GHC and GHC supports
these extensions, I do not think this is that much of a problem (I am
open to being convinced otherwise about this though; I think it would
be _great_ if there were other Haskell compilers that were as good as
if not better than GHC in terms of runtime, etc. ... until I start
considering how to manage two different compilers in Gentoo, etc. :p).
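
As a rough illustration of the Set/Functor point: Set.map needs an Ord
constraint on the result type, and the signature of fmap has no room
for one.  One way of letting each instance state its own constraint is
sketched below.  Note the assumptions: it uses the ConstraintKinds
extension (from a later GHC than the ones discussed here), the class
name RMappable is hypothetical, and this is not necessarily the
mechanism the new fgl uses for labels.

```haskell
{-# LANGUAGE ConstraintKinds, TypeFamilies #-}

import Data.Kind (Constraint)
import qualified Data.Set as Set

-- A "restricted" mapping class: every instance chooses the constraint
-- that its element type must satisfy (hypothetical name).
class RMappable f where
  type Elem f a :: Constraint
  rmap :: (Elem f a, Elem f b) => (a -> b) -> f a -> f b

-- Lists place no restriction on their elements.
instance RMappable [] where
  type Elem [] a = ()
  rmap = map

-- Sets need Ord, which is exactly what plain Functor cannot express.
instance RMappable Set.Set where
  type Elem Set.Set a = Ord a
  rmap = Set.map
```

In GHCi, "rmap (`div` 2) (Set.fromList [1, 2, 3 :: Int])" goes through
Set.map, so duplicate results are merged as usual for a Set.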

As for having a temporary name for the testing releases, I am open to
doing so, but this in effect pollutes the package name-space with
packages that shouldn't/wouldn't be used.  I would prefer to host it
elsewhere and just tell people to grab a copy and see what they think
rather than use a temporary name and then change it later when it's
"stabilised".  It would be preferable IMO that if we were going to
change package names then it should be done once and then not changed
again.

> More broadly, we have to accept that breaking API changes are an irritating
> but necessary fact of life. As much as the parsec and quickcheck issues have
> caused some modest pain, there's been equal hassle from things like the
> strictness behavior of binary, or even the type change in tagsoup. Splitting
> out Category from Arrow caused me probably the most hassle. In retrospect it
> was the right thing to do. But how it was done was particularly abrupt and
> painful. Exceptions got it right in pretty much every respect, but still
> migration necessarily took some work. We want our packages to grow,
> including our core packages. Otherwise we get fragmentation and duplicated
> effort. When we want to grow, but don't know exactly how, then we get
> experimentation. But experimentation without some organization can lead to
> the wrong sort of fragmentation -- like the mtl mess, whose resolution now
> thankfully seems to be in hand.

Right.  It's this abrupt change that I'm trying to avoid by publicly
warning people ahead of time that they should fix their package
dependencies and then have a series of preview releases to see what
people think.

I think to an extent the base-3 to -4 transition coupled with
exceptions was a jarring/coming-of-age point for the Haskell community
in terms of dependencies.  We'd already had the split base issue for
base-2 to -3, but that mainly involved using the split-base flag in
our .cabal files and adding dependencies on containers, arrays, etc.
where needed (and could almost have been automated).  However, with
the transition to base-4 we really started to get serious about proper
versioned dependencies (which is what I was trying to avoid by
starting this whole chain of emails) because developers were blindly
specifying either just base or "base >= 3" (and in some cases silly
things like "base < 5"; this has also occurred in packages that came
out after GHC 6.10.1 was released by people that should know better
resulting in packages that didn't build with base-4).
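
For fgl, the difference looks roughly like this in a .cabal file.  The
version numbers are only illustrative (assuming a package that was
written against a 5.4.x release); use whatever version you actually
built and tested against:

```cabal
-- Unbounded: will happily pick up a future fgl with a different API.
build-depends: fgl

-- Bounded: accepts any 5.4.x release but nothing from 5.5 onwards.
build-depends: fgl >= 5.4 && < 5.5
```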

In cases like QuickCheck, Parsec, etc. this version dependency issue
is only half the problem (the other half being diamond dependencies).
But with fgl, the problem isn't that severe since there are very few
libraries that use FGL; most usages seem to be for applications.  As
such, the diamond dependency problem isn't that much of a
consideration in this case.

However, the mtl vs. transformers issue is to an extent a problem
here: if users have both versions of fgl installed (which ghc-pkg lets
you do), then there will be issues with developers trying to use one
but not the other.  For this problem, the best solution is probably to
make a concerted effort with all package maintainers that use fgl to
do a mass upgrade release at the same time the new version of fgl is
publicly released (in terms of actually being worth using rather
than "hey, how about we do it this way?  you happy with this now?"
preview releases).

> Some lessons I think we can learn from the past about changes to widely-used
> stable APIs:
> * Clear and documented upgrade paths.

We're planning on writing suitable upgrade documentation.

> * Preferably a compat layer (Exceptions and Parsec both did a killer job
> with this).

Probably not going to happen here unfortunately.  However, there are a
few pseudo-compatibility options that can help resolve this:

* When I get the generic graph class written (in about a month's time
at AusHack), people should start migrating their code to as low a
class in the hierarchy as they can (so if they don't need the
inductive nature of fgl, then there's no reason for them to specify
doing so in their type signatures).

* If you're writing an application rather than a library for graphs,
pick an appropriate graph type and stick with it (using various type
aliases where necessary).  That way, rather than having to have
polymorphic type signatures with type family notation (so stuff like
"Num (EdgeLabel g), NodeLabel g ~ ()") you can just use the actual
type or an alias of the graph type you're using.  We might be able to
provide this type of alias notation for a default graph type, but it
will probably require a different module to be imported than what is
currently used; i.e. new fgl still won't be a drop-in replacement for
old fgl.
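
To show the difference between the two styles of signature, here is a
minimal sketch.  Everything in it is an assumption made for the sake of
the example: the cut-down Graph class, the WeightedGraph type and the
function names are hypothetical and are not the new fgl API.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts, TypeOperators #-}
-- TypeOperators only silences a warning about (~) on newer GHCs.

-- Cut-down stand-in for a graph class with label type families.
class Graph g where
  type NodeLabel g
  type EdgeLabel g
  edgeLabels :: g -> [EdgeLabel g]

-- Hypothetical concrete graph: unlabelled nodes, Double edge weights.
newtype WeightedGraph = WeightedGraph [(Int, Int, Double)]

instance Graph WeightedGraph where
  type NodeLabel WeightedGraph = ()
  type EdgeLabel WeightedGraph = Double
  edgeLabels (WeightedGraph es) = [w | (_, _, w) <- es]

-- Library style: polymorphic, with type family notation in the context.
totalWeight :: (Graph g, Num (EdgeLabel g), NodeLabel g ~ ())
            => g -> EdgeLabel g
totalWeight = sum . edgeLabels

-- Application style: pick one graph type, alias it, and the signature
-- no longer mentions any type families.
type MyGraph = WeightedGraph

totalWeight' :: MyGraph -> Double
totalWeight' = totalWeight
```

If the application later switches graph types, only the MyGraph alias
(and anything that depends on Double) has to change.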

> * No, or demonstrably minimal performance regressions.

In this case, the actual library itself is just a type class with a
couple of default instances so there should be no regressions.  In
fact, we're increasing the scope of per-instance optimisations so the
default graph type (based upon what is currently in
Data.Graph.Inductive.PatriciaTree) may end up being faster in some
situations (e.g. mapping over the labels).

> * Strong release notes and other documentation, either duplicating or
> supplementing what existed prior.

Definitely.  We're even considering using the new instance-level
documentation that Haddock 2.7 provides.

> * For particularly long-lived stable APIs, forking off a
> maintenance-mode-only version may make good sense, especially when the
> subset of language extensions used differs significantly.

I'm hoping that in this case that won't be necessary.  What might
happen is that along with the preview releases (whether fgl-6.x or
otherwise), we might slowly start backporting some features (e.g.
usage of the generic graph class library) to the 5.y series.

> Some lessons to us API consumers who write somewhat-less-core packages:
> * Upper version bounds.

Pretty please? :p

I'm really looking forward to when Cabal supports PVP opt-in so that
Hackage will complain if you don't have proper bounds on packages that
follow the PVP.

> * If at all possible, don't move to the fancy new thing until the fancy new
> thing is fully baked, and on track to widespread adoption. (early adopters
> of new mtl implementations, I'm looking at you :-))

To an extent, this is a bit of a mixed bag; in cases like mtl vs
transformers, if we didn't have the early adopters then we would have
no impetus for _anyone_ to use the new version.

That said, in this case, DON'T USE THE NEW VERSION OF FGL UNTIL WE SAY
IT'S OK (probably the 7.x series)!!!!!!!!!!!!!!!!!!!!!!!

> * If at all possible, try to stay compatible with at least the prior GHC
> version as well as the current.

At the moment, this is rather easy to do unless you want to take
advantage of (or are being bitten by) the new locale-aware stuff in
GHC 6.12.

> * Don't pull in big packages for small reasons unless really necessary --
> minor duplication of trivial code is often the lesser evil.

I would argue that for this reason big packages may want to consider
being split up into smaller, more manageable packages.  For
example, we're going to split off the Data.Graph.Inductive.Query.*
modules into an fgl-algorithms package to make them easier to
maintain, etc.

> Some lessons for folks exploring new variants:
> * Don't step on already-used module names.

This depends on the situation; transformers (+ monads-fd) was meant to
serve as a drop-in replacement for mtl; this, however, obviously
causes problems when people are trying to use one or the other in ghci
and have both installed.

In this case, for fgl we're wanting to do a library upgrade, so IMO it
makes sense to use the same module names.

> Some technical issues that will help as time goes on (many already
> underway):
> * Deprecation of packages on hackage/redirects. (Makes it easier to
> establish upgrade / migration / transition paths).

There is already some support for this: packages on Hackage can be
explicitly marked as being deprecated and as such won't appear on the
default package listing (IIUC) but will still be pulled in by
cabal-install if necessary.

> * Tree organization of packages on hackage. (Reduces the noise generated by
> lots of small packages, and so encourages splitting things out).

Not sure what you mean by this.  If you're talking about per-category
trees, this won't quite work: some packages will appear in multiple
categories (e.g. data structures + graphs) and as such this won't be a
tree.

> * Wikilike documentation features on hackage (lets users contribute and
> share upgrade paths, etc. more directly and simply -- hopefully will help
> with community documentation of packages in general).

Coming soon (as soon as someone codes it)!

> * The "local usage" annotation for cabal files to help avoid the dread
> diamond dependency.

I understand that this requires support in ghc-pkg first.

> * A DSL to describe transforms of Haskell programs for at least simple API
> migrations. Yes, this is a bit more "out there" but it's a great space to
> explore. The upside is not only better tools to help authors migrate their
> code, but a strong representation of what exactly the API changes are. So
> even if the spec language describes things that can't be applied
> automatically, it can still formalize what authors need to do. A standard
> format for an API change log as a hackage plugin would be a good start to
> this.

I would be wary of anything that tried to automagically upgrade my
code, since there would most likely be subtleties that it won't get
right.

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic at gmail.com
IvanMiljenovic.wordpress.com