[Haskell-cafe] Literate haskell format unclear (implementation and specification inconsistencies)

Wed Feb 28 18:15:33 EST 2007

Hi Isaac

> Trying to implement literate haskell[*], I realized several
> ways in which the correct behavior for unliterating (especially with
> regard to errors) was unclear.  I have several cases which ghc, hugs
> and Haskell 98 have differing opinions on!  The Report as it stands
> is far from a clear and complete specification (and I didn't find
> anything in the Haskell' wiki/trac about literate haskell).

If you look at Yhc and nhc, both took their unlit code straight out
of the Haskell 1.something report, so I guess that code could be treated
as the official spec, from the last version that had a concrete one.
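
For illustration only, here is a minimal sketch of what an unlit pass
might look like. It is not the code from the report, Yhc or nhc; the name
`unlit` and every choice in it are assumptions: non-program lines become
blank lines (so line numbers survive), anything after \begin{code} or
\end{code} on the same line is ignored, and no errors are reported at all.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical minimal unlit (an illustrative sketch, not the Report's code).
-- Handles both '>' (bird-track) lines and \begin{code} ... \end{code} blocks.
unlit :: String -> String
unlit = unlines . go False . lines
  where
    -- The Bool tracks whether we are inside a \begin{code} block.
    go :: Bool -> [String] -> [String]
    go _ [] = []
    go inCode (l : ls)
      -- Inside a block: \end{code} closes it, anything else is program text.
      | inCode, "\\end{code}" `isPrefixOf` l = "" : go False ls
      | inCode = l : go True ls
      -- Outside a block: program code starts on the line after \begin{code}.
      | "\\begin{code}" `isPrefixOf` l = "" : go True ls
      -- Bird-track line: replace the leading '>' with a space.
      | '>' : rest <- l = (' ' : rest) : go False ls
      -- Literate comment (or blank) line: emit a blank line.
      | otherwise = "" : go False ls
```

Something like `interact unlit` in a `main` would turn this into a filter.
Note that an unmatched \begin{code} is silently accepted here, which is
only one of the behaviours discussed below.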

Thanks

Neil

>
> [*](particularly, to make DrIFT able to deal with TeX-style lhs
>       - there's unfinished work in darcs repo
>      http://isaac.cedarswampstudios.org/2007/DrIFT/ )
>
> testing with:
> ghc: 6.4.2, 6.6(some)
> hugs: Hugs Version 20050308
> nhc98: recent darcs (1.19)
> report: Haskell 98 (The Revised Report: December 2002), section 9.4,
>   http://www.haskell.org/onlinereport/syntax-iso.html#sect9.4
>
> A full set of .lhs test files for all the issues:
> darcs get http://isaac.cedarswampstudios.org/2007/LiterateHaskellTests
> or download
> http://isaac.cedarswampstudios.org/2007/LiterateHaskellTests-1.tar.gz
> or you can try just prefixing all examples with
> \begin{code}
> module Main where
> main = print str
> \end{code}
> or
> >module Main where
> > main = print str
> as appropriate... (please don't get mangled by mail programs,
>                      initial '>'s... ):
>
>
> 1.[UnmatchedBegin]
> If a \begin{code} starts a section of code, is \end{code}
> _required_ before the end of the file?
>        report: unclear
>           ghc: required
>   hugs, nhc98: not required
> The report says "entirely enclosed between", but goes on to say
> "More precisely:" and give a description that is not at all precise
> in the matter of this question.
>
> 2.[AfterBeginOrEnd/{BeginWhite,EndWhite,BeginPrint,EndPrint}]
> Can a line beginning \begin{code} or \end{code} have additional
> stuff on the end, where the directive is understood and the
> additional stuff is ignored?
>   report:[yes]
>   hugs:[yesIffAdditionalStuffIsInvisible]
>   ghc:[case beginningOfLine of
>         "\end{code}" -> yes
>         "\begin{code}" -> yesIffAdditionalStuffIsInvisible]
>   nhc98:[UNLIT_IGNORED]
>    where
>     yesIffAdditionalStuffIsInvisible =
>       if (all isSpace additionalStuff) then yes else UNLIT_IGNORED
>     UNLIT_IGNORED means that if it was inside a code block then
>       the line is treated as program text (so it's probably
>       a syntax error) and if it was in a literate comment section
>       it is treated as a non-empty literate comment line.
> Note that it takes a careful reading of the report: for begin,
> program code only begins on the _following_ line.  Most seem to agree
> that it shouldn't mess up your program to have trailing whitespace
> on such a line (but at least nhc98 doesn't currently implement this).
> Is there any reason to allow NON-whitespace in that location?
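
The table in item 2 can be written as a small runnable function, which may
make the differences easier to compare. This is only a sketch of the
behaviour as reported above, not code taken from any of the
implementations; `Impl`, `Directive` and `classify` are made-up names, and
`Nothing` stands for UNLIT_IGNORED. It assumes that all four accept a
directive with nothing at all after it.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- Hypothetical names, used only for this illustration.
data Impl = Report | Hugs | GHC | NHC98 deriving (Eq, Show)
data Directive = Begin | End deriving (Eq, Show)

-- | 'Just d' if the line is understood as directive d,
--   'Nothing' if it is UNLIT_IGNORED (treated as an ordinary line).
classify :: Impl -> String -> Maybe Directive
classify impl line =
  case (stripPrefix "\\begin{code}" line, stripPrefix "\\end{code}" line) of
    (Just rest, _) -> accept Begin rest
    (_, Just rest) -> accept End rest
    _              -> Nothing
  where
    invisible = all isSpace
    accept d rest
      | null rest = Just d          -- assumption: the bare directive is always accepted
      | otherwise = case (impl, d) of
          (Report, _)  -> Just d    -- report: yes
          (Hugs, _)    -> if invisible rest then Just d else Nothing
          (GHC, End)   -> Just d    -- ghc ignores anything after \end{code}
          (GHC, Begin) -> if invisible rest then Just d else Nothing
          (NHC98, _)   -> Nothing   -- nhc98: UNLIT_IGNORED
```

For example, `classify GHC "\\end{code} junk"` gives `Just End`, while
`classify GHC "\\begin{code} junk"` gives `Nothing`.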
>
> 3.[IgnoringStringLiterals/{A,B}]
> What does "(ignoring string literals, of course)" mean?  Does the
> following make str = "string gap:end{code}" and leave the code block
> unended (A), or does it make an ended code block (B)?
> (A)---------
> \begin{code}
> str = "string gap:\
> \end{code}"
> ---------
> report:unclear, hugs:A, ghc:B, nhc98:A
> This works for ghc, the result being "string gap:string gap ends":
> (B)---------
> \begin{code}
> str = "string gap:\
> \end{code}"
>
> \begin{code}
> \string gap ends"
> \end{code}
> -----------
> Note that behavior A requires a detailed knowledge of Haskell's syntax
> in order to unliterate a file, for a dubious benefit (if a string literal
> with string gaps is used like that, the programmer could just indent
> the second line!)
>
> 4.[ExtraBeginEnd/{ExtraBegin,ExtraEnd}]
> What happens if \begin{code} appears after another \begin{code}
> before an \end{code}; and what happens if an \end{code} appears
> without a code block previously having been started by a \begin{code}?
> stray end:
>    ghc, nhc98:[UNLIT_IGNORED (-> probable successful compile)]
>          hugs:[error "\end{code} encountered outside code block"]
> stray begin:
>     ghc, nhc98:[UNLIT_IGNORED (-> probable syntax error)]
>           hugs:[error "\begin{code} encountered inside code block"]
>
> 5.[LexicalUnitAcrossLiterateComment/{StringGap,BlockComment}]
> Can lexical units jump across literate comment gaps?
> report, ghc, hugs, nhc98: yes...
> Note that the Report specifies it by removing all non-program lines,
> rather than converting them to blank lines, but an additional blank
> line in the middle of a Haskell program NEVER makes a difference
> (except for line numbering, of course).
> ----------
> > str = "string gap:\
>
> This might be a literate comment.
>
> >   \ends here"
> ---------
> ghc, hugs, nhc98: "string gap:ends here"
> or
> --------
> > str = "string"
> > {- a comment
>
> This might be a literate comment -} with weird character sequences.
>
> > ends here -}
> ----------
> ghc, hugs, nhc98: think it's a fine comment
> I mention this because allowing these makes it complicated to preserve
> literate comments in a translation to .hs, because, other than cases
> like these, prefixing literate comment lines with "--  " works fine.[*]
> However, banning these could make processing that wants to report errors
> end up more complicated.  Maybe the report could/should say that it
> is "not advisable", as it does for mixing '>' and {code} styles?
> (Also it's confusing to the programmer - I wondered
>    "can I (and should I) really do that?!" sometimes..)
>
> [*]Haddock style is a nuisance too, which is why there are two spaces
> added -- Haddock seems not to recognize such comments then, as desired.
> Or would it be better to take the other approach and say those should
> count as haddock comments?
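
The "--  " translation from item 5 might look roughly like this for the
'>' style only. It is a sketch under assumptions, not DrIFT's code: `toHs`
is a made-up name, and \begin{code} blocks are not handled. It goes wrong
on exactly the two examples above, since a prefixed comment line then sits
inside the string gap or the block comment.

```haskell
-- | Hypothetical helper: turn a bird-track .lhs into a .hs file,
--   keeping the literate comments as "--  " line comments.
toHs :: String -> String
toHs = unlines . map convert . lines
  where
    -- Program line: replace the leading '>' with a space.
    convert ('>' : code) = ' ' : code
    convert l
      -- Blank lines stay as they are.
      | all (`elem` " \t") l = l
      -- Literate comment: two spaces after "--" so that Haddock
      -- does not pick the line up as documentation.
      | otherwise = "--  " ++ l
```

Handling lexical units that span literate comments would need a real
Haskell lexer on top of this, which is the complication described above.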
>
>
> 6.[TeXBirdtrack/]
> I understand that
> "It is not advisable to mix these two styles in the same file."
> and the report doesn't even talk about how they mix, but now that
> I've gotten started on the implementation inconsistencies...
> Actually, despite the Report's advice against it, there seems to be
> a consensus on what the meaning of mixing the two styles is, which
> I'll describe below:
>
> Sensibly, ghc, hugs and nhc98 treat begin/end{code} lines as blank
> for the purposes of '>'-style comment checking (the rule that
> a code line and a non-blank literate comment line can't be adjacent);
> this works:
> [TeXBirdtrack/NoLayout]------------
> >module Main where
> >{main = print str
> \begin{code}
> ;str = "string"}
> \end{code}
> ok
> ------------
> Note that the example above doesn't rely on the layout rule.  This one,
> which does rely on it, should also work:
> [TeXBirdtrack/AlignedLayout]------------
> >module Main where
> > main = print str
> \begin{code}
>   str = "string"
> \end{code}
> ok
> ------------
> It does in hugs and nhc98, and according to
> http://hackage.haskell.org/trac/ghc/ticket/210
> it does in GHC HEAD now (6.7) as well.
> As another example, this doesn't work, for the same reason
> that you can't start a line with '>' in a .hs file:
> [TeXBirdtrack/Wrong]------------
> >module Main where
> > main = print str
> \begin{code}
> > str = "string"
> \end{code}
> ok
> ------------
>
>
>
> Hoping to start some discussion,
> Isaac
>