[Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes,
performance improvement
Benedikt Huber
benjovi at gmx.net
Tue Jun 24 08:11:34 EDT 2008
Hello,
I'd like to propose bugfixes, documentation fixes and a performance
improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
effect the expected behaviour of the PP library.
I've written a QuickCheck test suite for the pretty printer (to test
the improvement), and found two bugs and some misconceptions/
ambiguities in the documentation. Additionally, there is a
microbenchmark for the suggested improvement.
Both are available at http://code.haskell.org/~bhuber/Text/
PrettyPrint/. Note that the QuickCheck tests need access to all top-
level names in HughesPJ (i.e. ignore the export list).
In summary, I propose to
* fix a bug in fillNB and one in fillNB/sepNB
* correct documentation on laws and invariants.
* add more efficient implementations of vcat,hsep,hcat
More specifically:
(1) Bugfix fillNB: Additional case for fillNB Empty (Empty : ys)
(2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
composition
(3) Lazy implementations of vcat,hcat and hsep
(4) Law <t2> isn't always true
(5) Invariant 5 should be made stronger
(6) Change the comment about negative indentation
The details follow below (maybe a little long).
best regards,
benedikt
=
=
=
=
=
=
=
=
========================================================================
== Details (Long) ==
Bug Fixes
--------------------------------------
(1) Bugfix fillNB
Law <l1> states that
> sep (ps++[empty]++qs) = sep (ps ++ qs)
> ...ditto hsep, hcat, vcat, fill...
In the current implementation, this fails for the paragraph fill
variants.
> render' $ fill True [ text "c", text "c",empty, text "c", text "b"]
> where render' = renderStyle (Style PageMode 7 1.4)
>> c c c
>> b
The reason is a missing test for Empty in fillNB
2) Bugfix: overlap and f?(cat|sep)
The specification for cat/sep:
* oneLiner (hcat/hsep ps)
`union`
vcat ps [*]
But currently cat, sep, fcat and fsep attempt to overlap the second
line with the first one, i.e. they use
`foldr ($$) empty ps' instead of `foldr ($+$) empty ps' [*]. I assume
this is a mistake.
This bug can lead to situations, where the line in the right argument
of Union is actually longer:
> prettyDoc$ cat [ text "a", nest 2 ( text "b") ]
>> text "a"; union
>> (text "b"; empty)
>> (nilabove; nest 1; text "b"; empty)
> renderStyle (Style PageMode 1 1) $ cat [ text "a", nest 2 ( text
"b") ]
>> "a b"
In the implementation, we call `nilAbove False' instead of `nilAbove
True' (see patch).
Performance Improvements
--------------------------------------
3) Improving the space/time performance of vcat, hsep, hcat
vcat (hsep,cat) is implemented in an unneccessarily strict way.
We only get some output after all of vcat's arguments are evaluated
and checked against being Empty.
This can be improved by only checking the right argument of foldr
against being Empty, and
then applying an Empty-filter on the resulting Doc. Space improvement
is obvious.
The microbenchmark (code.haskell.org/~bhuber/Text/PrettyPrint/
HughesPJPerfCheck.hs) suggests that
the improvements in time are remarkable too.
Documentation fixes
--------------------------------------
The QuickCheck tests revealed that
4) Law <t2> isn't always true
<t2> text "" <> x = x, if x non-empty
only holds if x does not start with nest (obviously, because of law
<n6>).
5) The invariant 5 should be extended:
A @NoDoc@ may only appear on the first line of the left argument of
an union. Therefore, the right argument of an union can never be
equivalent
to the empty set.
because this assumption is used when rendering.
6) Change the comment about negative indentation
In the source code we have:
-- (spaces n) generates a list of n spaces
-- It should never be called with 'n' < 0, but that can happen for
reasons I don't understand
If we compose a <> b, and the first line of b is deeply nested, but
other lines of b are not,
then, because <> eats the nest, the pretty printer will try to layout
some of b's lines with
negative indentation:
doc |0123345
------------------
d1 |a
d2 | b
|c
d1<>d2 |ab
c|
Here is the reason why this is rather unavoidable: In John Hughes
paper, there is a line stating that
"composing layouts does not change the layouts being composed".
Another one states that
"<> should eat up nests"
But this leads to negative indentation - imho a user error.
== End ==
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HughesPJ.patch
Type: application/octet-stream
Size: 5178 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/libraries/attachments/20080624/2c814c94/HughesPJ-0001.obj
-------------- next part --------------
More information about the Libraries
mailing list