A Haskell Documentation Standard

Simon Marlow simonmar@microsoft.com
Thu, 1 Feb 2001 02:56:27 -0800


Hi folks,

Henrik argues in favour of a raw documentation format:

> I do not think that the staged approach is absolutely essential:
> if we just could come up with a good standard for how to write
> embedded documentation in Haskell, that would be extremely valuable
> in its own right. But I think there are a number of really compelling
> reasons for taking the staged approach and also standardizing on the
> "raw", machine-readable documentation format which would be the output
> from the first stage.
> 
> One reason is that there potentially are a large number of different
> formats in which one might want to generate documentation. Some might
> be of general interest, some might have more limited scope. One
> can even imagine completely different applications in which it is
> important to have access to "documentation level" information about
> source code, for example various search tools. Given a carefully
> designed "raw" format as a starting point, it is fairly easy to write
> such tools. There is evidence of this from the past: both the
> Fudgets documentation tool and the HaskellDoc tool used HBC
> generated interface files as their "raw" format to good effect.
> But this also meant that these tools became system specific, and
> they were also limited by what HBC (ok, Lennart) happened to record
> in the interface files. E.g. for HaskellDoc, this meant that there
> are only hyper-links to entire source code files, not to individual
> functions. (OK, that might be a limitation of HTML as well.)
> A more recent example, of course, is Jan's source code browser.
> 
> Thus I view the "raw" format as a way of getting the benefits from
> creative use of interface files, while avoiding getting tied to
> and restricted by some particular Haskell compiler.
> 
> One could argue that given a freely available, easy-to-use,
> documentation-extracting Haskell parser, the above is a non issue.
> Just incorporate that code into your own. But I can see that
> approach leading to a number of maintenance problems which a
> well-specified and extensible file format would insulate against.
> Also, some people might prefer to write their documentation generator
> in something other than Haskell (assuming that the above-mentioned
> parser is a piece of Haskell code).

Good point.  So, taking the idea of XML as a widely implemented
machine-readable format, I'd like to propose a low-effort way to achieve
this (again, using freely available existing tools):

    Haskell source + annotations  ==> XML rendering ==> docs

Stage 1 consists of reading the source and documentation, and outputting
the information in XML.  The XML might or might not omit the actual
source code at this point.  This stage can be done by either the
compiler (in which case it could fill in any missing type signatures),
or it might be done by an HDoc backend which just generates XML.  The XML
can be generated by HaXml - basically plugging together the Haskell
parser and HaXml should give us stage 1.  
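
As a rough illustration of what the output of stage 1 might look like,
here is a minimal sketch that renders a few documented declarations as
XML.  It deliberately uses plain string building rather than HaXml, and
everything in it is an assumption made for the example: the DocEntry
type, its field names, and the <module>, <decl>, <type> and <doc>
element names are hypothetical, not a proposed DTD.

```haskell
-- Hypothetical record for one documented top-level declaration.
-- None of these names come from an agreed format.
data DocEntry = DocEntry
  { entryName :: String
  , entryType :: Maybe String  -- Nothing if the source had no signature
  , entryDoc  :: String        -- the embedded documentation text
  }

-- Escape the characters that are special in XML text and attributes.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Render a single declaration as one <decl> element (hypothetical name).
-- The <type> child is omitted when no signature is known.
renderEntry :: DocEntry -> String
renderEntry e = concat
  [ "<decl name=\"", escapeXml (entryName e), "\">"
  , maybe "" (\t -> "<type>" ++ escapeXml t ++ "</type>") (entryType e)
  , "<doc>", escapeXml (entryDoc e), "</doc>"
  , "</decl>"
  ]

-- Render a whole module: one <module> element wrapping its declarations.
renderModule :: String -> [DocEntry] -> String
renderModule modName es = unlines $
  [ "<module name=\"" ++ escapeXml modName ++ "\">" ]
  ++ map (("  " ++) . renderEntry) es
  ++ [ "</module>" ]
```

A compiler-based stage 1 could fill in entryType for declarations
without a signature; a parser-only front end would leave it as Nothing,
in which case the <type> element is simply absent.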

Stage 2 doesn't have to be written in Haskell, since we'll have the DTD
for the intermediate format, but if we're writing in Haskell we could
again use HaXml with the Haskell abstract syntax data type to read the
interface.
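
To make stage 2 concrete, the following is a minimal sketch of a
back end that turns entries already read from the intermediate format
into a hyperlinked HTML index.  The step of parsing the XML (with HaXml
or anything else) is left out, and the Entry type, its fields and the
HTML layout are all assumptions chosen for illustration.

```haskell
import Data.List (sortOn)

-- Hypothetical in-memory form of one entry after the intermediate
-- XML has been read back in.
data Entry = Entry
  { name    :: String
  , summary :: String
  }

-- Escape the characters that are special in HTML text and attributes.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- One line of the index, linking to the entry's anchor further down.
indexItem :: Entry -> String
indexItem e =
  "<li><a href=\"#" ++ n ++ "\">" ++ n ++ "</a></li>"
  where n = escapeHtml (name e)

-- The documentation section for one entry, carrying the anchor id.
section :: Entry -> String
section e =
  "<h2 id=\"" ++ n ++ "\">" ++ n ++ "</h2><p>"
    ++ escapeHtml (summary e) ++ "</p>"
  where n = escapeHtml (name e)

-- An alphabetical index followed by one section per entry.
renderDocs :: [Entry] -> String
renderDocs es = unlines $
  ["<ul>"] ++ map indexItem sorted ++ ["</ul>"] ++ map section sorted
  where sorted = sortOn name es
```

Because this code only depends on the Entry values, the same rendering
could sit behind a tool that skips the intermediate file and passes the
parsed declarations straight through.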

Furthermore, a tool like HDoc could be written to go straight from
Haskell to the documentation without passing through the intermediate
format, as Henrik points out.

If the embedded documentation is in XML format, this also fits in
nicely.

This seems neat, useful and not too much effort - what does everyone
think?  (I have a suspicion that we might need to tweak HaXml to
generate a nice-looking DTD from the abstract syntax).

      ...--------...

On a slightly higher level, I should say something about my motivations
and goals for this project.  

Another thing to come out of the recent implementors' meeting was a
proposal for a new module namespace and a set of libraries for Haskell
(from Malcolm Wallace); we're going to start discussing this soon.  With
a large set of interconnected libraries, automatically-generated
documentation becomes essential.  Also, much of the code we have already
has no documentation, or it is in differing formats.  Being able to
generate hyperlinked, indexed documentation from raw source code will be
a real win, before we gradually incorporate the existing documentation
back into the source.

Lots of people moan about the lack of actual documentation for the
Prelude beyond the source code in the report.  Having a way to take the
prelude source files, with comments appropriately turned into
documentation tags, and generate hyperlinked documentation will address
much of this criticism - and I imagine most of us would find it useful;
I certainly would.

I'm less concerned for now about the intermediate format, because the
above goals can be achieved with a single tool.  But after all, the two
issues are largely orthogonal - having a tool which can generate
documentation from source doesn't preclude also having a well-understood
intermediate format.

Cheers,
	Simon