[Haskell-cafe] Re: GSoC: Hackage 2.0

Fri Apr 9 08:45:24 EDT 2010

On Wed, 2010-04-07 at 00:40 -0400, Matthew Gruen wrote: 
> Hi Haskellers,
> 
> I'm Matt Gruen (Gracenotes in #haskell), and the Hackage 2.0 SoC
> project at <http://hackage.haskell.org/trac/summer-of-code/ticket/1587>
> really piqued my interest. It seems doable, in a summer, to make the
> new hackage-server more-than-deployment-ready as well as clearing out
> some items in the hackage bug tracker[0]; so, I've been working on a
> proposal. In this email I'd like to consolidate my mental notes for
> haskell-cafe and formulate a roadmap towards a more social Hackage.

Great.

> The most vital part is getting hackage-server
> <http://code.haskell.org/hackage-server/> to a state where it can be
> switched in place of hackage-scripts
> <http://darcs.haskell.org/hackage-scripts/>, and doing it properly,
> organizing the code so it can be extended painlessly in the future.

Yes. I should warn you that I've become increasingly keen on the latter
aspect recently. :-)

> For putting the 2.0 in Hackage 2.0, any interface changes should help
> the library users and the library writers/uploaders without hurting
> either of them. 

Yes, there can sometimes be a bit of a trade-off between users and
uploaders. With some proposed features we have to be careful not to
annoy one group or the other.

> Hackage should contain more of the right kind of information.
> Statistics help everyone, and they're a pretty good gauge on the
> direction of Hackage as a whole. Package popularity contests are one
> form of this. Reverse dependencies and even dependency graphs[1] are
> great, if I can integrate and expand Roel van Dijk's work[2].

Yep, reverse deps are totally doable and really useful. The number of
reverse deps, combined with the number of downloads, is probably a
pretty good popularity metric.
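
As a rough sketch of how such a metric might be computed: invert the
dependency graph to get reverse deps, then combine the count with a
download count. Everything here is an assumption for illustration, not
hackage-server code: the DepGraph type, the function names and
especially the weighting of 100 per reverse dep are made up.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type PackageName = String

-- Hypothetical input: each package mapped to the packages it depends on.
type DepGraph = Map.Map PackageName (Set.Set PackageName)

-- Invert the graph: each package mapped to the packages that depend on it.
reverseDeps :: DepGraph -> Map.Map PackageName (Set.Set PackageName)
reverseDeps graph =
  Map.fromListWith Set.union
    [ (dep, Set.singleton pkg)
    | (pkg, deps) <- Map.toList graph
    , dep <- Set.toList deps
    ]

-- Illustrative score only: the weight per reverse dep is an arbitrary choice.
popularity :: DepGraph -> Map.Map PackageName Int -> PackageName -> Int
popularity graph downloads pkg =
  revWeight * revCount + Map.findWithDefault 0 pkg downloads
  where
    revCount  = maybe 0 Set.size (Map.lookup pkg (reverseDeps graph))
    revWeight = 100
```

A real server would presumably keep the inverted graph up to date
incrementally rather than recomputing it on every query.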

> There should also be some space on package pages, or on pages a link
> away from them, for users to contribute information and suggestions.
> Coders can explain why or why not the package met their needs, as a
> sort of informal bug/enhancement tracking service.

Yeah, that's where we've got to be careful. Many packages already have
bug trackers, and maintainers do not necessarily want yet another website
to monitor to see where users are complaining. I think a user
commenting system is probably one of the most tricky bits to design,
because of the social aspects. There are issues like not duplicating
existing mailing lists / bug trackers / wikis and trying to keep
information relevant as new releases come out (e.g. imagine a comment
saying "this package is no good because it does not have feature X" and
yet the current release has feature X).

My suggestion is to put this feature further down the TODO list.

> Another helpful flavor of information is package relationships beyond
> dependencies: 'Deprecated in favor of Foo', 'a fork of Foo'

Yes, deprecation is important. We currently have some support for that,
but it's not very good or easy for maintainers to use.

> There's also a need for a more interactive form of package
> documentation, but this should strengthen relationships with existing
> tools like Haddock and Cabal, not bypass the tools. For example,
> adding a changelog[3] or making Haddock's declaration-by-declaration
> commentary more wiki-like[4]. Changelogs seem to be within the scope
> of Hackage 2.0, integrating with Cabal; Haddock wikification might not
> be, perhaps deserving a separate student-summer session of its own.
> These can improve the package page and documentation subtrees.

Yes, I'd suggest looking at the changelog issue but probably not wiki
haddock editing. That would indeed be cool but is a rather bigger scope.

> More generally, how can library users find the package they want?

Search! Metrics!

> Categories themselves are great, but a tag system could identify and
> group specific package functionality. There could be sorting by
> ratings and reviews (4/5 lambdas!). Metadata searches, like those
> Sascha Böhme implemented in SoC 2007[5], could be integrated. It's not
> always obvious which ideas will help and which won't see good returns,
> which makes it all the more important to bring hackage-server to a
> state where future extensions can be easily written, submitted and
> deployed. That's the goal here.

Again, I suspect this is a feature too far for a GSoC. If we can build
the infrastructure which makes adding such features easier then the
project would be a success.

> On the technical side, I realize I'd need to spend a not-insignificant
> amount of time on a user account system, dealing with authentication
> and related issues. One additional bit of functionality to manage is
> the hackage build system, which is used to ensure that packages build
> and to generate documentation. When building depends on FFI or
> OS-specific bindings, specific versions of other packages, compiler
> choice or compiler version choice, including language extensions, this
> is not trivial. One of two good routes is running cabal server-side to
> generate build reports and alleviate the miscompiling a smidge. Given
> that cabal install dependency calculation seems to be up in the air
> still, the current Setup-running script might end up staying for the
> moment, with some basic integrity checking (no cycles or other
> impossible dependency scenarios). The other not-mutually-exclusive
> approach is accepting failed build reports from users as a web
> service[6] to generate a matrix of the platforms that seem to
> encounter the most trouble.

Yes, I think build reporting is probably more important at the moment
than user comments or integration with hayoo/hoogle.
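
To make the "matrix of platforms" idea concrete, here is a minimal
sketch of tallying submitted reports per package and platform. The
BuildReport record and its fields are hypothetical; this is not the
report format that cabal actually produces.

```haskell
import qualified Data.Map.Strict as Map

data Outcome = BuildOk | BuildFailed
  deriving (Eq, Show)

-- Hypothetical report shape, reduced to the fields needed for the matrix.
data BuildReport = BuildReport
  { reportPackage  :: String
  , reportOS       :: String
  , reportCompiler :: String
  , reportOutcome  :: Outcome
  }

-- A platform is an (operating system, compiler) pair.
type Platform = (String, String)

-- One matrix cell per (package, platform): (successes, failures).
tally :: [BuildReport] -> Map.Map (String, Platform) (Int, Int)
tally reports =
  Map.fromListWith add
    [ ((reportPackage r, (reportOS r, reportCompiler r)), score (reportOutcome r))
    | r <- reports
    ]
  where
    add (a, b) (c, d) = (a + c, b + d)
    score BuildOk     = (1, 0)
    score BuildFailed = (0, 1)

-- Total failures per platform, to see which platforms have the most trouble.
failuresByPlatform :: [BuildReport] -> Map.Map Platform Int
failuresByPlatform reports =
  Map.fromListWith (+)
    [ ((reportOS r, reportCompiler r), 1)
    | r <- reports
    , reportOutcome r == BuildFailed
    ]
```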

> Ultimately, it's important that I, or whoever ends up doing the
> project, plans for it to benefit you guys. What do you think?

So, as I mentioned at the beginning, one of the most important aspects,
I think, is getting the architecture right. The hackage server will
increasingly become a huge database of useful information and it is
important that that information can be easily got at by other tools and
systems. My suggestion is to use a RESTful approach and to make
information available in human and machine readable formats so that it
is usable as a website and also by other specialised clients.
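
A minimal sketch of what I mean by one resource with several
representations: the same package data rendered as HTML for people or
as JSON for tools, chosen from the request's Accept header. The
PackageInfo record, the function names and the simplified header
handling (no q-values) are all assumptions for illustration.

```haskell
import Data.List (intercalate, isInfixOf)

-- Hypothetical package resource.
data PackageInfo = PackageInfo
  { pkgName     :: String
  , pkgVersion  :: [Int]
  , pkgSynopsis :: String
  }

data Format = Html | Json
  deriving (Eq, Show)

-- Pick a representation from an Accept header value.
-- Simplified: JSON if it is mentioned at all, otherwise HTML.
chooseFormat :: String -> Format
chooseFormat accept
  | "application/json" `isInfixOf` accept = Json
  | otherwise                             = Html

versionString :: PackageInfo -> String
versionString = intercalate "." . map show . pkgVersion

-- Escape the characters that are special in HTML text.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- 'show' stands in for JSON string escaping; adequate only for plain ASCII.
jsonField :: String -> String -> String
jsonField key value = show key ++ ":" ++ show value

-- Render the same resource in the requested representation.
render :: Format -> PackageInfo -> String
render Json p =
  "{" ++ intercalate ","
    [ jsonField "name" (pkgName p)
    , jsonField "version" (versionString p)
    , jsonField "synopsis" (pkgSynopsis p)
    ] ++ "}"
render Html p =
  "<h1>" ++ escapeHtml (pkgName p) ++ "-" ++ versionString p ++ "</h1>"
    ++ "<p>" ++ escapeHtml (pkgSynopsis p) ++ "</p>"
```

The point is that the website and the specialised clients hit the same
URLs and differ only in the representation they ask for.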

Another aspect is the internal architecture. We want to be able to add
(and remove) features relatively easily without everything getting
entangled together. Partly we want this for smoother feature growth (or
feature replacement) and partly because it is quite likely that we will
want slightly different sets of features in: a test/staging server, the
central server, package mirror servers, in-house team servers.
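
One possible shape for this, as a sketch only: each feature is a
self-contained value that owns its routes, and a server is just a list
of features. The Feature record, the paths and the choice of which
server kind gets which features are invented for the example.

```haskell
import qualified Data.Map.Strict as Map

type Path    = String
type Handler = String -> String  -- request body to response body, simplified

-- A feature bundles a name with the routes it owns.
data Feature = Feature
  { featureName   :: String
  , featureRoutes :: [(Path, Handler)]
  }

data ServerKind = Staging | Central | Mirror | InHouse
  deriving (Eq, Show)

coreFeature, uploadFeature, buildReportFeature :: Feature
coreFeature = Feature "core" [("/packages", const "package index")]
uploadFeature = Feature "upload"
  [("/upload", \body -> "received " ++ show (length body) ++ " characters")]
buildReportFeature = Feature "build-reports" [("/reports", const "build reports")]

-- Hypothetical assignment of features to server kinds.
featuresFor :: ServerKind -> [Feature]
featuresFor Mirror  = [coreFeature]
featuresFor InHouse = [coreFeature, uploadFeature]
featuresFor _       = [coreFeature, uploadFeature, buildReportFeature]

-- The routing table is derived from whichever features are enabled.
routeTable :: [Feature] -> Map.Map Path Handler
routeTable = Map.fromList . concatMap featureRoutes

-- Nothing means no enabled feature owns the path.
dispatch :: ServerKind -> Path -> String -> Maybe String
dispatch kind path body =
  fmap ($ body) (Map.lookup path (routeTable (featuresFor kind)))
```

Adding or removing a feature is then a change to one list, and no
feature needs to know about any other.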

Duncan