From simonmar@microsoft.com Tue Jan 30 11:37:04 2001
Date: Tue, 30 Jan 2001 03:37:04 -0800
From: Simon Marlow simonmar@microsoft.com
Subject: A Haskell Documentation Standard
[ resending... mail problems at MS, sorry if you see this twice ]

Hi All,

Henrik - thanks for your comprehensive message!  I've set up the
mailing list on haskell.org (haskelldoc@haskell.org; go to
http://www.haskell.org/mailman/listinfo/haskelldoc).  Let's communicate
on there from now on.

My thoughts on the matter appear to differ slightly from yours and
Jan's, so I'll try to express my own priorities for a documentation
system.

  - we're all agreed that what is needed is an automatic
    documentation-generation tool that takes annotated Haskell and
    delivers documentation.  It's important that we don't tie
    ourselves to one particular documentation format, so I believe we
    need a separately-specified format for the in-source
    documentation.  This is one target for attack.

  - a big win is that even without any annotations, you get
    documentation for free (i.e. hyperlinked type signatures for
    exported functions, class definitions, etc.).  Just having this
    working would make me very happy :)

  - here's where we differ: instead of defining a new interface format
    and writing code to generate/parse the new format, I propose we
    define the format to be the same as Haskell source.  Why?

      - we already have a pretty printer/parser for Haskell source.

      - we can start generating documentation from existing source
        straight away, without waiting for support from the compiler
        writers (who are generally a lazy bunch :-)

It is true that by eliminating the compiler from the documentation path
we lose the benefit of automatically derived types.  However, the
compiler often infers ugly types anyway (type synonyms may be
expanded), and it's always good practice to give type signatures to
every exported function.  I bet there are very few functions in GHC's
libraries that are exported without a type signature.  Of course, I'm
thinking more about "external" than "internal" documentation.
But even for internal documentation, I don't think it would be a big
loss not to get types if they weren't explicitly mentioned in the
source.

And there's an upgrade path: if we define the "interface-file" format
to be exactly the same as the syntax of a Haskell module (perhaps with
code optional), then the compiler can always just read a source file
and output the same file with types filled in.

I see the priorities as (a) getting the format for the documentation
annotations specified, and (b) extending HDoc to do the business (and
to add GHC extensions :).

Cheers,
Simon

From jans@numeric-quest.com Tue Jan 30 18:03:47 2001
Date: Tue, 30 Jan 2001 13:03:47 -0500 (EST)
From: Jan Skibinski jans@numeric-quest.com
Subject: A Haskell Documentation Standard
On Tue, 30 Jan 2001, Simon Marlow wrote:

>   - a big win is that even without any annotations, you get
>     documentation for free (i.e. hyperlinked type signatures
>     for exported functions, class definitions, etc.).  Just having
>     this working would make me very happy :)

    Right, but without the comments.
    The question of compiler support aside ...

> And there's an upgrade path: if we define the "interface-file"
> format to be exactly the same as the syntax of a Haskell module
> (perhaps with code optional), then the compiler can always just
> read a source file and output the same file with types filled in.

    Am I missing something here? I thought that the Haskell Report
    does not define the semantics of comments.
    They are free-floating entities, and can be put anywhere
    as it pleases a module developer. And we can have many possible
    types of comments describing: module, datatype, class, function,
    etc.
    Add to it some important decorations, delimiting some
    categories of functions, and we are in a complete mess.

    Granted the variety of styles used, how on earth can even the
    cleverest parser figure it out? Typical parsers (I am not
    talking here about HDoc) do not care because they skip the
    comments anyway, but this is not the case for documentation
    extractors.

    And the scheme:

        [(E1, C1), (E2, C2) ...]

    where (Ei,Ci) are pairs of top-level entities and their
    associated comments, may work sometimes under the assumption
    that Ci always precedes Ei, but this is only an assumption,
    which I can easily break anytime I wish: I can insert a category
    decoration somewhere in between, or even reverse the order from
    Ci Ei to Ei Ci - which I often do because it pleases me so.

    This is a sad affair, because we could treat the problem
    the same way we treat function signatures. The association
    between the function signatures (FSi) and their bodies (FBi)
    is not in any way positionally described.
    Sure, I have not seen code where someone writes:
    FS1 FS2 FS3 FB3 FB2 FB1, or FB1 FS1, but this is still OK - at
    least in Hugs, which I just checked to see if I am correct.

    There are a few possible solutions:

    1. Describe a precise positional standard for the placement of
       comments (see Eiffel (*) example)

    2. Invent markups to identify comment types and their
       associations (see Java. But I am not talking here about the
       variable markups, etc. for a moment.)

    3. Deliver development tools that force one particular layout
       and use their own internal markup system, which is never
       visible to humans unless they insist. (See Smalltalk browsers
       enforcing the style. Eiffel has it too, but admits
       old-fashioned editing, as long as the result conforms to the
       standard.)

    4. Bite the bullet and admit that comments are as important as
       the code; extend the Haskell Report to make those specialized
       comments part of the language

    ...... etc.

    All of those have their problems, however.

    Jan

    P.S. Because our first few messages are not stored in
    the haskelldoc archive I am again showing the pointer to
    the Eiffel online book. Chapters 1 and 5 talk about
    documentation issues.
    http://www-staff.socs.uts.edu.au/~rist/eiffel/

    There are plenty of Eiffel sites. The oldest is the ISE
    site of Bertrand Meyer, the language inventor:
    http://www.eiffel.com

From simonmar@microsoft.com Wed Jan 31 11:46:27 2001
Date: Wed, 31 Jan 2001 03:46:27 -0800
From: Simon Marlow simonmar@microsoft.com
Subject: A Haskell Documentation Standard
Jan Skibinski writes:

>     Am I missing something here? I thought that the Haskell Report
>     does not define the semantics of comments.
>     They are free-floating entities, and can be put anywhere
>     as it pleases a module developer. And we can have many possible
>     types of comments describing: module, datatype, class, function,
>     etc.
>     Add to it some important decorations, delimiting some
>     categories of functions, and we are in a complete mess.
>
>     Granted the variety of styles used, how on earth can even the
>     cleverest parser figure it out? Typical parsers (I am not
>     talking here about HDoc) do not care because they skip the
>     comments anyway, but this is not the case for documentation
>     extractors.

Sorry for not being clear about this.  You're right in that trying to
understand arbitrary comments in Haskell source isn't workable.  The
documentation annotations must be in a special format that the
documentation tool can understand.

eg. HDoc's {--- .. -} style comments.  I had in mind using Haskell's
pragma convention, like this:

    {-# DOC f
      <desc><id>f</id> turns people into frogs</desc>
      <arg><id>x</id> A <type>Person</type></arg>
      <ret>A <type>Frog</type></ret>
     #-}
    f :: Person -> Frog
    f x = ...

with similar annotations for classes, instances, datatypes, newtypes
etc.  There's no requirement that the documentation appears directly
before the source code for the function; since it contains the
identifier of the entity being documented, it can be placed anywhere
(even in a different file).

The markup format for the documentation is of course up for discussion.
XML seems plausible but verbose.

If I understand correctly, I think you were proposing a two stage
process to get the documentation (similar to the Eiffel approach?):

    Haskell source --> interface ---> on-line documentation
                                 `--> printed documentation
                                      .....

Why not do it in one?

    Haskell source ---> on-line documentation
                   `--> printed documentation
                        ....
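
A minimal sketch of how a tool might pull the identifier and the markup out of one such pragma. The thread does not specify any parser; the names DocPragma, docTarget, docBody and parseDocPragma are hypothetical, and the sketch assumes the pragma text has already been isolated from the source file and contains no nested "#-}".

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- | A parsed documentation pragma: the documented identifier and the raw
-- markup that follows it. All names here are hypothetical.
data DocPragma = DocPragma
  { docTarget :: String
  , docBody   :: String
  } deriving (Eq, Show)

-- | Remove leading and trailing whitespace.
trim :: String -> String
trim = dropEnd . dropWhile isSpace
  where dropEnd = reverse . dropWhile isSpace . reverse

-- | Remove a suffix, mirroring 'stripPrefix'.
stripSuffix :: String -> String -> Maybe String
stripSuffix suffix s = reverse <$> stripPrefix (reverse suffix) (reverse s)

-- | Parse the text of a single "{-# DOC name ... #-}" pragma.
-- Returns Nothing for other pragmas or when the identifier is missing.
parseDocPragma :: String -> Maybe DocPragma
parseDocPragma src = do
  afterOpen <- stripPrefix "{-#" (trim src)
  inner     <- stripSuffix "#-}" afterOpen
  case break isSpace (trim inner) of
    ("DOC", rest) ->
      case break isSpace (trim rest) of
        ("", _)        -> Nothing
        (target, body) -> Just (DocPragma target (trim body))
    _ -> Nothing
```

Because the identifier travels with the pragma, a tool built this way could collect pragmas from any file and match them to declarations by name afterwards.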
Cheers,
Simon

From groessli@fmi.uni-passau.de Wed Jan 31 13:32:56 2001
Date: Wed, 31 Jan 2001 14:32:56 +0100 (MET)
From: Armin Groesslinger groessli@fmi.uni-passau.de
Subject: A Haskell Documentation Standard
On Wed, 31 Jan 2001, Simon Marlow wrote:

> eg. HDoc's {--- .. -} style comments.  I had in mind using Haskell's
> pragma convention, like this:
>
>     {-# DOC f
>       <desc><id>f</id> turns people into frogs</desc>
>       <arg><id>x</id> A <type>Person</type></arg>
>       <ret>A <type>Frog</type></ret>
>      #-}
>     f :: Person -> Frog
>     f x = ...
>
> with similar annotations for classes, instances, datatypes, newtypes
> etc.  There's no requirement that the documentation appears directly
> before the source code for the function; since it contains the
> identifier of the entity being documented, it can be placed anywhere
> (even in a different file).
>

That means that classes and instances (and their functions) have to be
distinguished by giving the complete type of the class / instance
declaration, right?

E.g.

    class X a b where
      f :: a -> b

    instance X Person Frog where
      f x = ...

How do we keep the tool from confusing the two versions of "f"?
An obvious way would be

    {-# DOC f INSTANCE X Person Frog ... #-}

    {-# DOC f CLASS X ... #-}

(HDoc required similar "help" in early versions, but I changed that in
favour of considering positional information to reduce the redundancy
required in the annotations.)

I guess additional annotations are an unavoidable drawback when not
relying on positional information. On the other hand, being able to put
the documentation in different files may be a big advantage.

So, should we allow both variants? I.e. use positional information when
the pragma happens to be next to a class/instance declaration (or a
function therein) and rely on extra information (like "CLASS X") in the
other case?

> The markup format for the documentation is of course up for discussion.
> XML seems plausible but verbose.
>

As flexibility should be a priority here (we want to produce many
different output formats, right?), I think XML is verbose, but not too
verbose. I don't see an alternative format which is significantly less
verbose.
And, there's HaXml which could do a good job at processing the
documentation (I haven't had a very close look at HaXml, yet).

Regards,

Armin

From jans@numeric-quest.com Wed Jan 31 10:00:59 2001
Date: Wed, 31 Jan 2001 05:00:59 -0500 (EST)
From: Jan Skibinski jans@numeric-quest.com
Subject: A Haskell Documentation Standard
On Wed, 31 Jan 2001, Simon Marlow wrote:

> Sorry for not being clear about this.  You're right in that trying to
> understand arbitrary comments in Haskell source isn't workable.  The
> documentation annotations must be in a special format that the
> documentation tool can understand.
>
> eg. HDoc's {--- .. -} style comments.  I had in mind using Haskell's
> pragma convention, like this:
>
>     {-# DOC f
>       <desc><id>f</id> turns people into frogs</desc>
>       <arg><id>x</id> A <type>Person</type></arg>
>       <ret>A <type>Frog</type></ret>
>      #-}
>     f :: Person -> Frog
>     f x = ...

    Machine-wise, this is an excellent format, because
    it not only signifies that this comment is important
    (as HDoc does by using triple dashes) but also associates
    the comment with the specific entity (as I was discussing
    in the previous post). Easy to parse, no ambiguities.

    Human-wise, this is a terrible thing. I would never
    like to read sources written this way, nor to produce
    them by hand this way.

> The markup format for the documentation is of course up for discussion.
> XML seems plausible but verbose.

    Now take a look at this disciplined ASCII version:

    f :: Person -> Frog
    f x
        -- A frog made from a person 'x'
        = .....

    (Note that I always specify the result by placing it up
    front in the sentence. No need for <ret>, no need
    for types to be told twice.)

    or at a more complex example to make an even stronger point:

    f :: Person -> Frog -> Bool
    f person y
        -- True if a 'person' can be turned
        -- to a frog
        -- where
        --     y is a frog
        --
        = ....

    I am sure there is nothing in your XML version that I did
    not explain in the plain English of my versions - unless you
    want to cross-reference all the frogs, persons and booleans
    (which would not make any sense to me).
    But the major difference between the two is that I can
    still read the source files with ease.

    This is similar to the Eiffel style of documentation: you
    extract the signature, the left hand side of the function
    definition and the comments.
    You place them in your interface
    (which can be pretty printed in any format) in exactly this
    order - from the most general info to the most detailed
    explanation. Comments can refer to arguments by their
    names for clarity. And a function comment clearly becomes
    a part of a function definition.

    This positional method has a few drawbacks, however.
    First, this specific order cannot be applied to functions with
    multiple equations (the order: "signature, comment, equations"
    looks better in such cases), although it is still fine with
    guards.
    Secondly, this could open a can of protests about the order
    requirements.
    Thirdly, it requires a lot of self-discipline.
    However, the final result definitely exceeds other styles --
    readability-wise.

    But I am not trying to promote this style, I am just
    pointing out some more readable alternatives to your original
    version. For example, this would do equally well:

    f :: Person -> Frog
    f x
        {-# Doc f
        -- Frog from person 'x'
        #-}
        = .....

    and could be pretty printed, with pragmas removed.

    But --as a developer -- I would still hate reading and
    writing those pragma tokens, especially the complex thingies
    -- as pointed out by Armin in another post .... unless I would
    never see them anytime during the development cycle.

    In theory it is possible to have both worlds with the help of
    tools: you write a single function in plain English,
    annotate it (or not - depending on the tool), and it comes back
    again, but prettified.

    So if we want both worlds then we had better provide good tools
    and then convince the community that this is the way to work
    with Haskell.

> If I understand correctly, I think you were proposing a two stage
> process to get the documentation (similar to the Eiffel approach?):

    No, the numbered list in my previous post did not represent
    any order of steps, but some options I was musing about.
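
The positional extraction described in this message (signature, then left-hand side, then the comment lines before the "=") can be sketched roughly as follows. This is only an illustration under strong assumptions: the input is the lines of a single definition already laid out in that order, and the names Entry and extractEntry are hypothetical, not part of HDoc or any existing tool.

```haskell
import Data.Char (isSpace)
import Data.List (isInfixOf, isPrefixOf)

-- | What gets copied into the interface, in the order described above.
-- All names here are hypothetical.
data Entry = Entry
  { entrySignature :: String
  , entryLhs       :: String
  , entryComment   :: [String]
  } deriving (Eq, Show)

-- | Remove leading and trailing whitespace.
trim :: String -> String
trim = dropEnd . dropWhile isSpace
  where dropEnd = reverse . dropWhile isSpace . reverse

-- | Extract an entry from the lines of one definition, assuming the
-- layout: signature line, left-hand-side line, "--" comment lines, "=".
extractEntry :: [String] -> Maybe Entry
extractEntry (sig : lhs : rest)
  | "::" `isInfixOf` sig = Just (Entry (trim sig) (trim lhs) comment)
  where
    -- Comment lines run until the first line that is not a "--" comment.
    commentLines = takeWhile ("--" `isPrefixOf`) (map trim rest)
    -- Drop the "--" marker and discard comment lines that are empty.
    comment      = filter (not . null) (map (trim . drop 2) commentLines)
extractEntry _ = Nothing
```

The sketch also shows the first drawback listed above: it has no way to handle a function with several equations, since it expects exactly one left-hand side.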
    Jan

From simonmar@microsoft.com Wed Jan 31 14:08:58 2001
Date: Wed, 31 Jan 2001 06:08:58 -0800
From: Simon Marlow simonmar@microsoft.com
Subject: A Haskell Documentation Standard
> That means that classes and instances (and their functions) have to be
> distinguished by giving the complete type of the class / instance
> declaration, right?
>
> E.g.
>
>     class X a b where
>       f :: a -> b
>
>     instance X Person Frog where
>       f x = ...
>
> How do we keep the tool from confusing the two versions of "f"?
> An obvious way would be
>
>     {-# DOC f INSTANCE X Person Frog ... #-}
>
>     {-# DOC f CLASS X ... #-}

For classes, the 'class' keyword isn't required: classes and type
constructors share the same namespace, so an identifier beginning with
an upper case letter is unambiguous.  I hadn't considered the
documentation of individual instances, but perhaps you need that for
internal documentation (most of the time, the only documentation you
need for an instance is that it exists).  If documentation for
instances is required, then yes, you have to give the full instance
header in order to identify the exact instance.

> (HDoc required similar "help" in early versions, but I changed that in
> favour of considering positional information to reduce the redundancy
> required in the annotations.)
>
> I guess additional annotations are an unavoidable drawback when not
> relying on positional information. On the other hand, being able to
> put the documentation in different files may be a big advantage.
>
> So, should we allow both variants? I.e. use positional information
> when the pragma happens to be next to a class/instance declaration
> (or a function therein) and rely on extra information (like
> "CLASS X") in the other case?

I wouldn't object to allowing both variants - perhaps the convention
could be that if the identifier is missing, then it applies to the
following declaration.

PS. Henrik - before this discussion gets too detailed, do you want to
announce the existence of the mailing list on haskell@haskell.org?
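
The conventions discussed in this message can be summarised in a small sketch. It assumes the words between "DOC" and the markup have already been split out as a list of tokens; the INSTANCE keyword follows Armin's suggested syntax, and the names DocTarget and classifyTarget are hypothetical. Nothing here is an agreed format.

```haskell
import Data.Char (isUpper)

-- | What a documentation pragma refers to. All names are hypothetical.
data DocTarget
  = NextDeclaration                       -- identifier omitted: positional
  | TypeOrClass String                    -- upper-case name: type or class
  | Value String                          -- lower-case name: function or value
  | InstanceMember String String [String] -- method, class, instance types
  deriving (Eq, Show)

-- | Classify the header tokens of a pragma, e.g. the words of
-- "f INSTANCE X Person Frog". Unrecognised shapes yield Nothing.
classifyTarget :: [String] -> Maybe DocTarget
classifyTarget [] = Just NextDeclaration
classifyTarget [name@(c : _)]
  | isUpper c = Just (TypeOrClass name)
  | otherwise = Just (Value name)
classifyTarget (name : "INSTANCE" : cls : tys)
  | not (null tys) = Just (InstanceMember name cls tys)
classifyTarget _ = Nothing
```

Only the instance case needs the extra header information; a bare upper-case name is enough for a class, and an empty header falls back to the positional rule.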
Cheers,
Simon

From jans@numeric-quest.com Wed Jan 31 23:29:51 2001
Date: Wed, 31 Jan 2001 18:29:51 -0500 (EST)
From: Jan Skibinski jans@numeric-quest.com
Subject: ready to announce?
On Wed, 31 Jan 2001, Henrik Nilsson wrote:

> Simon Marlow wrote:
>
> > PS. Henrik - before this discussion gets too detailed, do you want to
> > announce the existence of the mailing list on haskell@haskell.org?
>
> I think it would be very good if we could agree on some general
> design goals and principles before the discussion gets too detailed,
> and incorporate these into an announcement. Obviously, they would
> not be cast in stone, but I think that having a little bit of structure
> from the outset would help in getting a focused and constructive
> design process going, and I also believe that it is somewhat easier
> to decide on that structure in a relatively small group. (Even
> among us, there seem to be some quite different opinions on
> what we are trying to achieve! ;-)

    I second it. We need a bit of time to get some feel for
    each other's position and to define the basic terms and
    goals.

    I also suggest that our discussions be summarized
    periodically, for our own benefit and for those who join
    us later. Henrik, could you do it, say under the heading
    "Summary - date", every so often?

    Jan

From jans@numeric-quest.com Wed Jan 31 23:54:55 2001
Date: Wed, 31 Jan 2001 18:54:55 -0500 (EST)
From: Jan Skibinski jans@numeric-quest.com
Subject: raw machine standard
    I sympathize with Henrik's idea about developing
    a "raw", rich, machine interface standard.
    I appreciate it because I have already experienced the impact
    of incompatibilities on the development of the Haskell Module
    Browser. I use NHC and Hugs interfaces as helpers and
    guidelines even though I do extract other information
    directly from sources.

    From this perspective I consider it one of the priorities,
    especially because Henrik made me realize that the
    incompatibilities could multiply if an implementor
    decided one day to switch to a new format. And this
    looks quite probable - vide the announcement of the new
    version of Hugs. It appears that Johan Nordlander
    is taking over the Hugs maintenance.

    Examples of incompatibilities between NHC and Hugs
    interfaces are numerous. One good example is the different
    representation of function signatures:

    f :: a -> a -> Int               - Hugs
    f :: (a -> (a -> Prelude.Int))   - NHC

    Jan
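
The Hugs and NHC signature forms quoted in the "raw machine standard" message denote the same type, so a tool reading both could render them through one printer. A minimal sketch, assuming the signature has already been parsed into a small syntax tree; the Type data type and the functions unqualify and render are hypothetical, and the sketch covers only type variables, type constructors and function arrows.

```haskell
-- | A deliberately tiny model of types: variables, (possibly qualified)
-- constructors and function arrows. All names are hypothetical.
data Type
  = TVar String
  | TCon String
  | TFun Type Type
  deriving (Eq, Show)

-- | Drop a module qualifier: "Prelude.Int" becomes "Int".
unqualify :: String -> String
unqualify name =
  case break (== '.') name of
    (_, '.' : rest) | not (null rest) -> unqualify rest
    _                                 -> name

-- | Render with the fewest parentheses: "->" associates to the right,
-- so only a function type in argument position needs brackets.
render :: Type -> String
render (TVar v)   = v
render (TCon c)   = unqualify c
render (TFun a b) = renderArg a ++ " -> " ++ render b
  where
    renderArg t@(TFun _ _) = "(" ++ render t ++ ")"
    renderArg t            = render t
```

Under these assumptions both interface forms of f come out as the same string once parsed, which is the kind of normalisation a shared interface format would make unnecessary.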