[arch-haskell] Thoughts on Procedure

Fri Oct 15 04:27:17 EDT 2010

On Thu, Oct 14, 2010 at 20:04, Peter Simons <simons at cryp.to> wrote:
> Hi guys,
>
> in my understanding, our current update procedure works like this:
>
>  1) We notice that a package was updated (or added) on Hackage by means
>    of RSS.
>
>  2) A maintainer runs cabal2arch to generate an updated PKGBUILD.
>
>  3) If the generated PKGBUILD looks good, the file is committed to the
>    Git repository and uploaded to AUR.
>
> There are a few things worth noting about that procedure:
>
>  - A maintainer must perform 1 manual step per updated package: that is
>   linear complexity O(n).
>
>  - There is no mechanism to guarantee that the updated set of PKGBUILD
>   files actually works.
>
>  - It's common practice to use version control systems like Git to track
>   original source code. Our setup, however, tracks generated files: the
>   PKGBUILDs are produced automatically by cabal2arch. So why do we
>   track them? Shouldn't we rather track the Cabal files?
>
> Naturally, one wonders how to improve the update process. There are a
> few possible optimizations:
>
>  - The simplest way to verify whether all PKGBUILDs compile is to, well,
>   compile them. Given a set of updated packages, all packages that
>   directly or indirectly depend on any of the updated packages need
>   re-compilation, and the current set of PKGBUILDs is to be considered
>   valid only if all those builds succeed.
>
>  - It is possible to download the entire state of Hackage in a single
>   tarball. Given all the Cabal files, a Makefile can automatically
>   re-generate those PKGBUILDs that need updating. The same Makefile can
>   also run the necessary builds, and it can also perform the necessary
>   uploads to AUR.
>
> Based on these thoughts, I would like to propose an improved procedure
> for discussion. Let our Git repository track a set of Cabal files. Then
> an update would work like this:
>
>  1) A maintainer downloads
>
>      http://hackage.haskell.org/packages/archive/00-index.tar.gz
>
>    and extracts the Cabal files into a checked-out Git repository.
>
>  2) Optionally, inspect changes with "git status" and "git diff".
>
>  3) Run "make all" to re-build all PKGBUILD files that need updating.
>
>  4) Run "make check" to perform all necessary re-builds of binary
>    packages. If all builds succeed, proceed with (5). Otherwise, figure
>    out which package broke the build and revert the changes in the
>    corresponding Cabal file. Go back to (3).
>
>  5) Run "make upload" and "git commit" the changes.
>
> Now, this procedure is supposed to update AUR, but "make upload" can be
> easily extended to copy the generated packages into a binary repository
> as well.
>
> The worst case scenario occurs when every single available update breaks
> during "make check". In that case, the procedure has linear complexity
> O(n). The best case scenario, on the other hand, is the one where every
> single update succeeds. That case is handled by running "make all &&
> make check && make upload", which gives constant complexity O(1).
>
> More importantly, however, the "make check" phase would guarantee that
> we never ever publish a configuration that doesn't compile.
>
> How do you feel about the idea?

Taking it one step further:

• Replace archhaskell/habs with a single version-controlled file
containing tuples of <package name, package version>.
• Make use of bauerbill's existing support for Hackage.  (I
don't know anything about the internals of bauerbill, but it might
need some extending to more closely match what cabal2arch does.)

Then the process would be:

1. Monitor the RSS feed from Hackage.
2. Modify the relevant tuples in the file.
3. Based on 'git diff', run bauerbill on the updated packages.
4. Find the dependants, and re-build them.
5. If all is well, upload to AUR or the binary repo.
6. Rinse and repeat.
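
To make steps 2 and 3 a bit more concrete, here is a rough Python
sketch of how the updated packages could be picked out by comparing two
revisions of the control file.  The file format (one "name version"
pair per line, "#" for comments) and the function names are my own
assumptions for illustration, and the package versions in the usage
note are made up; nothing here is an existing tool.

```python
def parse_control(text):
    """Parse a hypothetical control file into {package name: version}.

    Assumed format: one 'name version' pair per line; blank lines and
    lines starting with '#' are ignored.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version = line.split()
        packages[name] = version
    return packages


def changed_packages(old_text, new_text):
    """Return the sorted names of packages that were added or whose
    version differs between the old and new control file contents."""
    old = parse_control(old_text)
    new = parse_control(new_text)
    return sorted(name for name, version in new.items()
                  if old.get(name) != version)
```

For example, with old contents "mtl 1.0\nparsec 2.0\n" and new contents
"mtl 1.1\nparsec 2.0\ntext 0.1\n", changed_packages reports mtl and
text, which is the list one would then hand to bauerbill (or
cabal2arch).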

All steps could then be wrapped up in a makefile.  Furthermore,
bauerbill could just have knowledge of the control file we maintain,
and then step 5 can be skipped.
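
Step 4 is the part that needs a dependency graph.  A minimal sketch,
assuming we already have a mapping from each package to its direct
dependencies (how that mapping is obtained is left open here): collect
everything that directly or indirectly depends on an updated package,
then order the result so that each package is built after the affected
packages it depends on.  The function name and the toy graph in the
usage note are hypothetical.

```python
from collections import defaultdict, deque


def rebuild_order(deps, updated):
    """Return the updated packages plus all their dependants, ordered so
    that dependencies come before the packages that use them.

    deps maps a package name to an iterable of its direct dependencies.
    """
    # Invert the graph: dependency -> packages that use it directly.
    users = defaultdict(set)
    for pkg, needs in deps.items():
        for dep in needs:
            users[dep].add(pkg)

    # Breadth-first walk collecting the updated packages and every
    # direct or indirect dependant.
    affected = set(updated)
    queue = deque(updated)
    while queue:
        for user in users[queue.popleft()]:
            if user not in affected:
                affected.add(user)
                queue.append(user)

    # Topological sort (Kahn's algorithm) restricted to the affected set.
    pending = {p: {d for d in deps.get(p, ()) if d in affected}
               for p in affected}
    ready = sorted(p for p, waiting in pending.items() if not waiting)
    order = []
    while ready:
        pkg = ready.pop(0)
        order.append(pkg)
        for user in sorted(users[pkg]):
            if pkg in pending[user]:
                pending[user].discard(pkg)
                if not pending[user]:
                    ready.append(user)
    if len(order) != len(affected):
        raise ValueError("dependency cycle among affected packages")
    return order
```

With a toy graph where "parser" and "tool" depend on "core", and "app"
depends on "parser" and "render", an update to "core" yields core,
parser, tool, app; "render" is left alone because nothing it depends on
changed.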

In any case, I feel that the discussion of what to store in our git
repo, whether it's Arch source packages or cabal files or tuples,
isn't that important at this point; your steps 2-5 are the steps
to concentrate on.  If we are going to attempt to maintain more than a
handful of binary packages, then we'll get the most value out of
automating the time-consuming bits of that.  Remy is hard at work on pieces of
that, but there's more to be worked out.

Don't get me wrong, I think it *is* worth discussing what we keep in
our repo, but right now that seems to be the least of our problems, and
I think it won't be difficult to switch at a later date.

/M

-- 
Magnus Therning                        (OpenPGP: 0xAB4DFBA4)
magnus＠therning．org          Jabber: magnus＠therning．org
http://therning.org/magnus         identi.ca|twitter: magthe