[Haskell-cafe] Re: Haddock API and .haddock interface files questions

Sat Oct 30 05:59:56 EDT 2010

2010/10/26 Claus Reinke <claus.reinke at talk21.com>:
> Some questions about Haddock usage:
>
> 1. Haddock executable and library are a single hackage package,
>   but GHC seems to include only the former (haddock does not
>   even appear as a hidden package anymore). Is that intended?

Yes, I think that's so that GHC maintainers don't need to worry about
API changes in Haddock when making new releases. The Haddock API is
not very stable.

> 2. Naively, I'd expect Haddock processing to involve three stages:
>   1. extract information for each file/package
>   2. mix and match information batches for crosslinking
>   3. generate output for each file/package
>
>   I would then expect .haddock interface files to represent the
>   complete per-package information extracted in step 1, so that
>   packages with source can be used interchangeably with packages
>   with .haddock files.
>
>   However, I can't seem to use 'haddock --hoogle', say, with
>   only .haddock interface files as input ("No input file(s).").

Haddock currently mostly works on GHC's front-end AST, called HsSyn,
which is not stored in the .haddock files, so that's why you need
sources.

I say mostly because the year-old feature that we call cross-package
documentation (which lets the user re-export documentation from other
packages) is implemented by taking information from GHC's .hi files
and converting it to HsSyn. The syntax used in the .hi files is
slightly less detailed than HsSyn, so we lose some information about
the exact declaration syntax used by the programmer (brackets in type
expressions, infix/prefix declaration styles, etc.; nothing that is
semantically relevant).
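
To illustrate the kind of loss meant here, a toy sketch. The types
SrcType and IfaceTy are hypothetical stand-ins made up for this example;
they are not GHC's HsSyn or its interface-file types, which are far
richer. The point is only that a round trip through a less detailed
representation keeps the meaning but forgets redundant brackets.

```haskell
-- Hypothetical toy types, for illustration only.

-- "Source-level" types remember explicit parentheses.
data SrcType
  = SVar String
  | SFun SrcType SrcType
  | SParen SrcType
  deriving (Eq, Show)

-- "Interface-level" types have no node for parentheses.
data IfaceTy
  = IVar String
  | IFun IfaceTy IfaceTy
  deriving (Eq, Show)

-- Going to the less detailed form drops the parentheses.
toIface :: SrcType -> IfaceTy
toIface (SVar v)   = IVar v
toIface (SFun a b) = IFun (toIface a) (toIface b)
toIface (SParen t) = toIface t

-- Going back re-inserts only the parentheses that are required:
-- a function type in argument position.
fromIface :: IfaceTy -> SrcType
fromIface (IVar v)   = SVar v
fromIface (IFun a b) = SFun (parenArg (fromIface a)) (fromIface b)
  where
    parenArg t@(SFun _ _) = SParen t
    parenArg t            = t

pretty :: SrcType -> String
pretty (SVar v)   = v
pretty (SFun a b) = pretty a ++ " -> " ++ pretty b
pretty (SParen t) = "(" ++ pretty t ++ ")"

main :: IO ()
main = do
  -- What the programmer wrote: a -> (b -> c)
  let written   = SFun (SVar "a") (SParen (SFun (SVar "b") (SVar "c")))
      recovered = fromIface (toIface written)
  putStrLn (pretty written)    -- a -> (b -> c)
  putStrLn (pretty recovered)  -- a -> b -> c
```

Both versions denote the same type (toIface gives equal results for
them), but the rendered documentation no longer matches the source
character for character.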

In theory we could continue along that road and let you build output
from a combination of .haddock and .hi files. Or we could do as you
say and just put everything in the .haddock files (in which case we
could use the HsSyn type).

> 3. It would be nice if the Haddock executable was just a thin
>   wrapper over the Haddock API, if only to test that the API
>   exposes sufficient functionality for implementing everything
>   Haddock can do.

Yes, good idea. We haven't done that yet since the API started out as
something quite experimental, and it's still at that stage, although
it has gained a lot more functionality recently.

>   Instead, there is an awful lot of useful code in Haddock's
>   Main.hs, which is not available via the API. So when coding
>   against the API, for instance, to extract information from
>   .haddock files, one has to copy much of that code.
>
>   Also, some important functionality isn't exported (e.g., the
>   standard form of constructing URLs), so it has to be copied
>   and kept in synch with the in-Haddock version of the code.

Right. We should export that.

>   It might also be useful to think about the representation
>   of the output of stage 2 above: currently, Haddock directly
>   generates indices in XHtml form, even though much of
>   the index computation should be shareable across
>   backends. That is, current "backends" seem to do both
>   stage 2 and stage 3, with little reuse of code for stage 2.

True. The index could be factored out of the Xhtml backend and added
to the output of stage 2.
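
As a rough sketch of what that factoring could look like: compute a
backend-independent index once, and let each backend only format it.
All names here (Index, buildIndex, renderText, renderHtml) are
hypothetical and chosen for this example; none of them are part of the
Haddock API, and the toy HTML renderer does no escaping.

```haskell
import           Data.List       (intercalate)
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-ins; not Haddock's actual types.
type ModuleName = String
type EntityName = String

-- Backend-independent index: each exported name maps to the modules
-- that export it, in the order the modules were given.
type Index = Map.Map EntityName [ModuleName]

-- "Stage 2": build the index once from per-module export lists.
buildIndex :: [(ModuleName, [EntityName])] -> Index
buildIndex mods =
  Map.fromListWith (flip (++))
    [ (name, [m]) | (m, names) <- mods, name <- names ]

-- "Stage 3": two toy backends that only format the shared index.
renderText :: Index -> String
renderText ix =
  unlines [ name ++ ": " ++ intercalate ", " ms
          | (name, ms) <- Map.toList ix ]

renderHtml :: Index -> String
renderHtml ix =
  "<ul>"
    ++ concat [ "<li>" ++ name ++ " (" ++ intercalate ", " ms ++ ")</li>"
              | (name, ms) <- Map.toList ix ]
    ++ "</ul>"

main :: IO ()
main = do
  let ix = buildIndex [ ("Data.Foo", ["foo", "bar"])
                      , ("Data.Baz", ["bar"]) ]
  putStr (renderText ix)
  putStrLn (renderHtml ix)
```

With this shape, a new backend needs only another function from Index
to its output format; the grouping logic lives in one place.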

> It seems that, by exposing sufficient information in the API and
> allowing .haddock interface files as first-class inputs, there
> should be less need for hardcoding external tools into Haddock
> (such as --hoogle, or haddock-leksah). Instead, clients should
> be able to code alternative backends separately, using Haddock
> to extract information from sources into .haddock files, and
> the API for processing those .haddock files.
> Are these expectations reasonable, or am I misreading the intent
> behind API and .haddock files? Is there any documentation about the
> role and usage of these two Haddock features, as well as the plans
> for their development?

No documentation yet, but yes, the long term plan is to be able to
split Haddock in parts: one program that creates data from sources,
probably resulting in a .haddock file or maybe something text based,
and backends that use those files. The API should provide a convenient
way to read the files. It's not been fleshed out in detail yet, and
the API is quite ad hoc at the moment, so we need to think more about
this and write documentation on the Haddock trac.
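
A minimal sketch of that split, assuming a text-based file for
simplicity. ModuleDoc, serialise, deserialise and listNames are
hypothetical names made up for this example; the real .haddock format
is binary and carries much more, and this is not how the current API
reads it.

```haskell
import Text.Read (readMaybe)

-- Hypothetical per-module record, for illustration only.
data ModuleDoc = ModuleDoc
  { modName    :: String
  , modExports :: [(String, String)]  -- (exported name, documentation text)
  } deriving (Show, Read, Eq)

-- Producer side: turn the extracted data into a text form.
serialise :: [ModuleDoc] -> String
serialise = show

-- API side: parse the text form back; Nothing on malformed input.
deserialise :: String -> Maybe [ModuleDoc]
deserialise = readMaybe

-- A backend is then just a function from the loaded data to output.
-- This one lists fully qualified names, one per line.
listNames :: [ModuleDoc] -> String
listNames docs =
  unlines [ modName d ++ "." ++ name
          | d <- docs, (name, _) <- modExports d ]

main :: IO ()
main = do
  let stored = serialise [ModuleDoc "Data.Foo" [("foo", "Does foo.")]]
  case deserialise stored of
    Nothing   -> putStrLn "could not parse interface data"
    Just docs -> putStr (listNames docs)
```

The producer and the backends share only the data type and the two
conversion functions, so a backend could live outside Haddock itself.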

Thanks for the input!

David