[Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

Benedikt Huber benjovi at gmx.net
Tue Jun 24 08:11:34 EDT 2008


Hello,

I'd like to propose bugfixes, documentation fixes and a performance  
improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't  
effect the expected behaviour of the PP library.

I've written a QuickCheck test suite for the pretty printer (to test  
the improvement), and found two bugs and some misconceptions/ 
ambiguities in the documentation. Additionally, there is a  
microbenchmark for the suggested improvement.
Both are available at http://code.haskell.org/~bhuber/Text/ 
PrettyPrint/. Note that the QuickCheck tests need access to all top- 
level names in HughesPJ (i.e. ignore the export list).

In summary, I propose to
* fix a bug in fillNB and one in fillNB/sepNB
* correct documentation on laws and invariants.
* add more efficient implementations of vcat,hsep,hcat

More specifically:

(1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)

(2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical  
composition

(3) Lazy implementations of vcat,hcat and hsep

(4) Law <t2> isn't always true

(5) Invariant 5 should be made stronger

(6) Change the comment about negative indentation



The details follow below (maybe a little long).

best regards,

benedikt


= 
= 
= 
= 
= 
= 
= 
= 
========================================================================
== Details (Long) ==

Bug Fixes
--------------------------------------


(1) Bugfix fillNB

Law <l1> states that

 > sep (ps++[empty]++qs)   = sep (ps ++ qs)
 >         ...ditto hsep, hcat, vcat, fill...

In the current implementation, this fails for the paragraph fill  
variants.

 > render' $ fill True [ text "c", text "c",empty, text "c", text "b"]
 >   where render' = renderStyle (Style PageMode 7 1.4)
 >> c c c
 >>     b

The reason is a missing test for Empty in fillNB

2) Bugfix: overlap and f?(cat|sep)

The specification for cat/sep:
  * oneLiner (hcat/hsep ps)
   `union`
    vcat ps [*]

But currently cat, sep, fcat and fsep attempt to overlap the second  
line with the first one, i.e. they use
`foldr ($$) empty ps' instead of `foldr ($+$) empty ps' [*]. I assume  
this is a mistake.

This bug can lead to situations, where the line in the right argument  
of Union is actually longer:

 > prettyDoc$ cat [ text "a", nest 2 ( text "b") ]
 >> text "a"; union
 >>           (text "b"; empty)
 >>           (nilabove; nest 1; text "b"; empty)

 > renderStyle (Style PageMode 1 1) $ cat [ text "a", nest 2 ( text  
"b") ]
 >> "a b"

In the implementation, we call `nilAbove False' instead of `nilAbove  
True' (see patch).

Performance Improvements
--------------------------------------

3) Improving the space/time performance of vcat, hsep, hcat

vcat (hsep,cat) is implemented in an unneccessarily strict way.
We only get some output after all of vcat's arguments are evaluated  
and checked against being Empty.
This can be improved by only checking the right argument of foldr  
against being Empty, and
then applying an Empty-filter on the resulting Doc. Space improvement  
is obvious.
The microbenchmark (code.haskell.org/~bhuber/Text/PrettyPrint/ 
HughesPJPerfCheck.hs) suggests that
the improvements in time are remarkable too.


Documentation fixes
--------------------------------------
The QuickCheck tests revealed that

4) Law <t2> isn't always true

   <t2>    text "" <> x            = x, if x non-empty

only holds if x does not start with nest (obviously, because of law  
<n6>).

5) The invariant 5 should be extended:

    A @NoDoc@ may only appear on the first line of the left argument of
    an union. Therefore, the right argument of an union can never be  
equivalent
    to the empty set.

because this assumption is used when rendering.

6) Change the comment about negative indentation

In the source code we have:

-- (spaces n) generates a list of n spaces
-- It should never be called with 'n' < 0, but that can happen for  
reasons I don't understand

If we compose a <> b, and the first line of b is deeply nested, but  
other lines of b are not,
then, because <> eats the nest, the pretty printer will try to layout  
some of b's lines with
negative indentation:

doc       |0123345
------------------
d1        |a
d2        |   b
          |c
d1<>d2    |ab
         c|

Here is the reason why this is rather unavoidable: In John Hughes  
paper, there is a line stating that
    "composing layouts does not change the layouts being composed".
Another one states that
    "<> should eat up nests"
But this leads to negative indentation - imho a user error.

== End ==

-------------- next part --------------
A non-text attachment was scrubbed...
Name: HughesPJ.patch
Type: application/octet-stream
Size: 5178 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/libraries/attachments/20080624/2c814c94/HughesPJ-0001.obj
-------------- next part --------------




More information about the Libraries mailing list