A Haskell Documentation Standard

Henrik Nilsson nilsson@cs.yale.edu
Wed, 31 Jan 2001 19:06:46 -0500


Hi all,

It seems the discussion has got going and that there are plenty
of ideas around. Good!

I'll get back with something more substantial in a bit. Trying
to structure what's been discussed so far so as to get a clearer
picture of the options is probably a good thing, and I also think
we should focus a bit more on the goal and design principles before
getting too involved in the details of different formats and such.

Anyway, just a remark on what Simon Marlow wrote:

> If I understand correctly, I think you were proposing a two stage
> process to get the documentation (similar to the Eiffel approach?):
> 
>         Haskell source --> interface ---> on-line documentation
>                                  `--> printed documentation
>                                        .....
>
> Why not do it in one?
> 
>         Haskell source ---> on-line documentation
>                            `--> printed documentation

I don't know if "you" above referred to me, but that was indeed
part of what I suggested at the HI workshop.

I do not think that the staged approach is absolutely essential:
if we just could come up with a good standard for how to write
embedded documentation in Haskell, that would be extremely valuable
in its own right. But I think there are a number of really compelling
reasons for taking the staged approach and also standardizing on the
"raw", machine-readable documentation format which would be the output
from the first stage.

One reason is that there potentially are a large number of different
formats in which one might want to generate documentation. Some might
be of general interest, some might have more limited scope. One
can even imagine completely different applications in which it is
important to have access to "documentation level" information about
source code, for example various search tools. Given a carefully
designed "raw" format as a starting point, it is fairly easy to write
such tools. There is evidence of this from the past: both the
Fudgets documentation tool and the HaskellDoc tool used HBC
generated interface files as their "raw" format to good effect.
But this also meant that these tools became system specific, and
they were also limited by what HBC (ok, Lennart) happened to record
in the interface files. E.g. for HaskellDoc, this meant that there
are only hyper-links to entire source code files, not to individual
functions. (OK, that might be a limitation of HTML as well.)
A more recent example, of course, is Jan's source code browser.

Thus I view the "raw" format as a way of getting the benefits from
creative use of interface files, while avoiding getting tied to
and restricted by some particular Haskell compiler.

One could argue that given a freely available, easy-to-use,
documentation-extracting Haskell parser, the above is a non-issue.
Just incorporate that code into your own. But I can see that
approach leading to a number of maintenance problems which a
well-specified and extensible file format would insulate against.
Also, some people might prefer to write their documentation generator
in something other than Haskell (assuming that the above-mentioned
parser is a piece of Haskell code).

Another reason why I like the staged approach is that a Haskell
compiler conceivably could perform the first stage.
Now, I know that this was not to everyone's liking. And I'm absolutely
not saying that we in any way should require a Haskell compiler
to do this work, or that we should rule out stand-alone tools.
But I think there are quite a few good reasons why one might
want to do it that way, and thus I believe it is good if the
documentation standard is such that it is possible to do so.

Thus I'm saying that I think there should be two related parts of the
standard: one for how to write documentation embedded in Haskell
source, and one for representing collected documentation in a
stand-alone, machine-friendly, format. I'm not saying that we should
require tools to work in a staged manner. E.g. I can easily see
an augmented HDoc emitting the raw format for the benefit of
other tools, as well as keeping its current HTML-emitting capabilities.
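
To make the two parts concrete, here is a minimal sketch in Haskell.
Everything in it is an assumption made for illustration only: the
"--|" comment marker, the RawEntry record, and the function names are
hypothetical and not part of any proposal so far. It only shows the
shape of the idea: one extraction step produces the raw data, and
several independent back ends consume it.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical "raw" record: one documented top-level name.
data RawEntry = RawEntry
  { entryName :: String   -- the documented identifier
  , entryDoc  :: String   -- the comment text attached to it
  } deriving (Eq, Show)

-- Stage 1: collect documentation from source text.
-- Assumed convention (not from any standard): a line starting with
-- "--|" documents the declaration on the line that follows it.
extract :: String -> [RawEntry]
extract = go . lines
  where
    go (c : d : rest)
      | Just doc <- stripMarker c
      , (name : _) <- words d
      = RawEntry name doc : go rest
    go (_ : rest) = go rest
    go []         = []

    stripMarker l
      | "--|" `isPrefixOf` l = Just (dropWhile isSpace (drop 3 l))
      | otherwise            = Nothing

-- Stage 2, back end A: an HTML definition list.
-- No escaping of special characters is done in this sketch.
toHtml :: [RawEntry] -> String
toHtml es = "<dl>\n" ++ concatMap item es ++ "</dl>\n"
  where
    item e = "<dt>" ++ entryName e ++ "</dt><dd>" ++ entryDoc e ++ "</dd>\n"

-- Stage 2, back end B: plain text, e.g. as input to a search tool.
toText :: [RawEntry] -> String
toText es = unlines [ entryName e ++ ": " ++ entryDoc e | e <- es ]
```

A one-stage tool corresponds to using toHtml . extract directly; a
staged tool would write the result of extract to a file in the raw
format and let other programs pick it up from there.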

[Incidentally, note that documentation-extracting compilers are not
a new thing. For example, I know of two different C/C++ compilers which
supported source code browsing by extracting information from source
code and storing it in special files.]

Finally, in defining the "raw" format, we will have to decide on
exactly what Haskell documentation essentially is. I think that will
be very helpful during the standardization process. I also think
this will be very helpful later for people writing document
formatting tools. Of course, there are many ways to achieve this.
But specifying the context-free syntax of "raw" documentation seems
to me to be a good way.
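
As a rough illustration of what "deciding what documentation is" could
amount to, the abstract syntax might be written down as Haskell data
types. This is only a sketch under my own assumptions: the type names,
constructors, and fields below are hypothetical, and a real standard
would certainly need more cases (classes, instances, cross-references,
and so on).

```haskell
-- Hypothetical abstract syntax for "raw" documentation.
data DocModule = DocModule
  { modName    :: String      -- module name
  , modSummary :: String      -- module-level description
  , modItems   :: [DocItem]   -- documented entities, in source order
  } deriving (Eq, Show, Read)

data DocItem
  = DocValue String String String    -- name, type signature, description
  | DocType  String [String] String  -- name, constructor names, description
  deriving (Eq, Show, Read)

-- The name of a documented entity, whatever kind it is.
itemName :: DocItem -> String
itemName (DocValue n _ _) = n
itemName (DocType  n _ _) = n

-- Names documented by a module, in declaration order.
documentedNames :: DocModule -> [String]
documentedNames = map itemName . modItems
```

The derived Show and Read instances already give a crude
machine-readable form, which is enough for experimenting; the concrete
syntax of a standardized format is a separate question.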

Assuming that there will be such a thing as the raw format, then the
question is what that format should look like. Simon suggested that
it should look like Haskell + pragma-style comments. This is certainly
an interesting idea with a number of merits. On the other hand,
XML is gaining widespread acceptance as a standard on which various
exchange formats are based. This means that there are quite a few tools
out there that might be put to good use, including at least one
Haskell library, as Armin mentioned. Also, the fact that XML currently
IS used for things similar to what we'd like to do, might mean
that we can avoid a number of pitfalls by sticking to the standard.

But as I said earlier, detailed format discussions can probably wait a
little.

Best regards,

/Henrik

-- 
Henrik Nilsson
Yale University
Department of Computer Science
nilsson@cs.yale.edu