[Haskell-cafe] Final steps in GHC's Trac-to-GitLab migration

Tobias Dammers tdammers at gmail.com
Wed Mar 6 10:55:15 UTC 2019


On Wed, Mar 06, 2019 at 09:32:44AM +0300, Ömer Sinan Ağacan wrote:
> - Redirects don't seem to work:
>   https://gitlab.staging.haskell.org/ghc/ghc/wikis/commentary/rts/heap-objects

I believe this is an unfortunate result of the way we migrate wiki
pages. The way that works is that we don't actually parse the original
Trac markup; instead, we scrape the rendered HTML directly from the live
Trac instance, and massage that into GitLab markup.

This has a few interesting consequences:

1. "Wiki processors", such as dynamically generated TOCs and
issue lists, run on the Trac instance when we request the page, and
thus capture a snapshot of the dynamic data at the time of migration.
2. Redirects, being implemented as such wiki processors, cause
client-side redirects, which our scraper will not follow. Hence, the
converted page is based on an HTML page body that you don't normally get
to see, and no actual redirect is generated on the GitLab side of
things.
3. The scraper only looks at what is normally the actual page content;
any additional UI generated outside of the main content element is
ignored. Hence, when Trac generates links to the redirect target for
clients that do not support client-side redirects, those links don't
make it into the converted page.
4. Because the redirect is usually the last thing added to a page,
the page's history ends with it, and that final revision becomes the
"current" version on the GitLab side. So we end up with what you're
seeing: a nonsensical page that contains the fallback content, a
somewhat cryptic question asking whether it should redirect, and no way
to answer that question.
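
To illustrate point 3, here is a rough sketch of "only look at the main
content element". It is not the actual scraper, which isn't shown here; the
element id "content" and the function names are assumptions made for the
example. Anything outside the chosen element, such as a fallback link to the
redirect target, never reaches the output.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collect the text inside the element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1             # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1              # entering the target element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def main_content_text(html, target_id="content"):
    """Return the whitespace-normalised text of the main content element.

    The default id "content" is an assumption, not a confirmed detail.
    """
    parser = ContentExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Fed a page whose redirect link sits in a sibling element of the content
element, this returns only the content element's text, which is how the
fallback link gets lost.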

Since GitLab doesn't have an equivalent to those "wiki processors", and
AFAIK does not cater for such redirects, the question is how we should
handle these. I can think of several options:

1. Do nothing; when anyone complains, fix the offending pages manually
(either by converting the useless redirect message into a proper
hyperlink, or by manually adding a rewrite entry to the nginx
configuration).
2. Generate a list of redirecting pages from the Trac dataset, either as
part of the import (2a), or with some grep/sed/awk magic based on the
converted git repo after the fact (2b); then use that list to generate
suitable nginx redirects.
3. Extend the import script to detect redirects, and special-case those
so that they render as proper links to the redirect target.
4. Do more research and see if there is a way to make GitLab redirect
based on wiki content, then extend the import script as in option 3, but
render redirecting pages to use the (currently hypothetical) redirect
feature.
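
For option 3, a minimal sketch of what "render as proper links" could look
like, assuming the converted page text still contains a marker of the form
redirect(wiki:Target). The function name and the to_slug parameter are
hypothetical; how Trac page names map onto GitLab wiki slugs is left to the
caller, because I'm not assuming any particular mapping here.

```python
import re

# Matches the marker text, e.g. "redirect(wiki:Some/Page)"; assumes the
# target contains no whitespace or closing parenthesis.
REDIRECT_RE = re.compile(r"redirect\(wiki:([^)\s]+)\)")


def render_redirects_as_links(page_text, to_slug=lambda name: name):
    """Replace each redirect marker with a Markdown link to its target.

    to_slug is a hypothetical hook that converts a Trac page name into
    the link target used on the GitLab side; the default leaves the name
    unchanged.
    """
    def replace(match):
        target = match.group(1)
        return "[{}]({})".format(target, to_slug(target))

    return REDIRECT_RE.sub(replace, page_text)
```

Text without a marker passes through unchanged, so this could run over every
page during import without special-casing.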

Personally, I'm inclined to say let's go with option 2b: run the import,
then grep for 'redirect(wiki:', and massage that into nginx redirects.
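
A rough sketch of that grep-and-massage step, in Python rather than
grep/sed/awk. Several things here are assumptions: that converted pages are
".md" files in a checked-out wiki repo, that a page's path in the repo matches
its URL path under /ghc/ghc/wikis/ (the prefix is taken from the staging URL
quoted above), and that the redirect target can be used as a URL path as-is.
The function names are made up for the example.

```python
import os
import re

REDIRECT_RE = re.compile(r"redirect\(wiki:([^)\s]+)\)")
WIKI_PREFIX = "/ghc/ghc/wikis/"   # from the staging URL quoted above


def find_redirects(repo_dir, extension=".md"):
    """Yield (source_page, target_page) for each page with a redirect marker."""
    for root, dirs, files in os.walk(repo_dir):
        dirs[:] = [d for d in dirs if d != ".git"]   # skip git metadata
        for name in sorted(files):
            if not name.endswith(extension):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                match = REDIRECT_RE.search(handle.read())
            if match:
                # Page name = path relative to the repo, without extension.
                rel = os.path.relpath(path, repo_dir)[:-len(extension)]
                yield rel.replace(os.sep, "/"), match.group(1)


def nginx_rule(source, target, prefix=WIKI_PREFIX):
    """Format one exact-match nginx location that issues a 301 redirect."""
    return "location = {p}{s} {{ return 301 {p}{t}; }}".format(
        p=prefix, s=source, t=target)


def nginx_rules(repo_dir):
    """Return one nginx rule per redirecting page found in repo_dir."""
    return [nginx_rule(src, dst) for src, dst in find_redirects(repo_dir)]
```

Page names containing spaces or other special characters would need quoting
or escaping before they go into an nginx config, and the targets may need the
same name-to-slug conversion the import applies to page names; neither is
handled here.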

TL;DR: the import currently ignores Trac wiki redirects, and I'm not
sure what the best way is to deal with this.

