[Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement

Benedikt Huber benjovi at gmx.net
Wed Jun 25 08:41:20 EDT 2008


Simon Peyton-Jones schrieb:
> Thank you for doing this Benedict.  I've added your more detailed comments to ticket #2393 so that they are preserved.
> 
> Ian: would you like to apply?  I'm not sure how to integrate the QuickCheck tests, but I bet you know.
I'm glad this is useful - and I'm impressed by the complexity of the 
pretty printer library. I'm tackling the other tickets now and adding 
more QuickCheck test.

> Benedict: while you are in the area, would you like to take a swing at http://hackage.haskell.org/trac/ghc/ticket/1337, and 1176?
In my opinion, ticket #667 can be marked invalid (see comments).

Concerning ticket #1337, we have to change the formal specification of 
fill (it doesn't match the implementation):

-- Current Specification:
--   fill []  = empty
--   fill [p] = p
--   fill (p1:p2:ps) = oneLiner p1 <#> nest (length p1)
--                                          (fill (oneLiner p2 : ps))
--                     `union`
--                      p1 $$ fill ps

Problem 1: We want to `unnest' the second argument of (p1 $$ fill ps), 
but not the first one

In the definition above we have e.g.

 > getSecondLayout $
 >   fillDef False [text "a", text "b", text "a"]

 >> text "ab"; nilabove; nest -1; text "a"; empty
 >> |ab|
 >> |.a|

Problem 2: The overlapping $$ should only be used for those layouts of 
p1 which aren't one liners (otherwise violating the invariant "Left 
union arg has shorter first line").

I suggest the following specification (i believe it almost matches the 
current implementation, modulo [fillNB: fix bug #1337] (see below):

-- Revised Specification:
--   fill g docs = fill' 0 docs
--   gap g       = if g then 1 else 0
--   fill' n []  = []
--   fill' n [p] = [p]
--   fill' n (p1:p2:ps) =
--      oneLiner p1 <g> (fill' (n+length p1+gap g) (oneLiner p2 : ps))
--        `union`
--     (p1 $*$ nest (-n) (fill' g ps))
--
-- $*$ is defined for layouts (One-Layout Documents) as
--
-- layout1 $*$ layout2 | isOneLiner layout1 = layout1 $+$ layout2
--                     | otherwise          = layout1 $$ layout2

I've also implemented the specification in HughesPJQuickCheck.hs,
and checked them against the patched pretty printer.

Concerning Bug #1337:
~~~~~~~~~~~~~~~~~~~~~

If the above formal specification is fine, it is easy to fix:
elide the nests of (oneLiner p2) [see attached patch, record bug #1337].

 > PrettyPrint(0) $ ./Bug1337
 > ....ab
 > ...c

The (long) explanation follows below.

I'll look into the other tickets too.
thorkil: Can you help me with a simplified test case (pretty printer 
only) of Bug #1176 ?
Also, I didn't find any regressions tests for the pretty printer - I'll 
create some if someone points me in the right direction.

best regards,
benedikt

===========================================================
Explanation of Bug #1337:

Consider

 > fcat [ nest 1 $ text "a", nest 2 $ text "b", text "c"]

--> expected: (nest 1; text "a"; text "b"; nest -3; "c")
--> actual  : (nest 1; text "a"; text "b"; nest -5; "c")

Reduction:
=== (nest 1; text a) <> (fill (-2) (p2:ps))
==>                     (nest 2 (text "b") $+$ text "c")
==>                     (nest 2 (text "b")) `nilabove`
                         (nest (-3) (text "c"))
==> (nest 1; text a; text b; nest -5 c)

The problem is that if we decide to layout (p1:p2:ps) as

| p1 p2
| ps

(call it layout A), then we want to have

 > (p1 <> p2) $+$ ps.

But following law <n6> this means that

 > fcat_A [p1:nest k p2:ps]

is equivalent to

 > fcat_A [p1,p2,ps]

so the nest of p2 has to be removed.

This is somewhat similar to bug #667, but easier to fix
from a semantic point of view:
p1,p2 and ps are distinct layouts - we only have to preserve the 
individual layouts, and no combinations of them.



> 
> Simon
> 
> 
> | -----Original Message-----
> | From: libraries-bounces at haskell.org [mailto:libraries-bounces at haskell.org] On Behalf Of Benedikt Huber
> | Sent: 24 June 2008 13:12
> | To: libraries at haskell.org
> | Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement
> |
> | Hello,
> |
> | I'd like to propose bugfixes, documentation fixes and a performance
> | improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
> | effect the expected behaviour of the PP library.
> |
> | I've written a QuickCheck test suite for the pretty printer (to test
> | the improvement), and found two bugs and some misconceptions/
> | ambiguities in the documentation. Additionally, there is a
> | microbenchmark for the suggested improvement.
> | Both are available at http://code.haskell.org/~bhuber/Text/
> | PrettyPrint/. Note that the QuickCheck tests need access to all top-
> | level names in HughesPJ (i.e. ignore the export list).
> |
> | In summary, I propose to
> | * fix a bug in fillNB and one in fillNB/sepNB
> | * correct documentation on laws and invariants.
> | * add more efficient implementations of vcat,hsep,hcat
> |
> | More specifically:
> |
> | (1) Bugfix fillNB: Additional case for  fillNB Empty (Empty : ys)
> |
> | (2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
> | composition
> |
> | (3) Lazy implementations of vcat,hcat and hsep
> |
> | (4) Law <t2> isn't always true
> |
> | (5) Invariant 5 should be made stronger
> |
> | (6) Change the comment about negative indentation
> |

-------------- next part --------------

New patches:

[fillNB bug, lazy vcat
benedikt.huber at gmail.com**20080624113715] {
hunk ./Text/PrettyPrint/HughesPJ.hs 410
+  
+** because of law n6, t2 only holds if x doesn't
+** start with `nest'.
+    
hunk ./Text/PrettyPrint/HughesPJ.hs 429
-<m1>    (text s <> x) $$ y = text s <> ((text "" <> x)) $$ 
-                                         nest (-length s) y)
+<m1>    (text s <> x) $$ y = text s <> ((text "" <> x) $$
+                                         nest (-length s) y) 
hunk ./Text/PrettyPrint/HughesPJ.hs 490
+-- lazy list versions
+hcat = reduceAB . foldr (beside_' False) empty
+hsep = reduceAB . foldr (beside_' True)  empty
+vcat = reduceAB . foldr (above_' True) empty
hunk ./Text/PrettyPrint/HughesPJ.hs 495
-hcat = foldr (<>)  empty
-hsep = foldr (<+>) empty
-vcat = foldr ($$)  empty
+beside_' :: Bool -> Doc -> Doc -> Doc
+beside_' _ p Empty = p
+beside_' g p q = Beside p g q
+
+above_' :: Bool -> Doc -> Doc -> Doc
+above_' _ p Empty = p
+above_' g p q = Above p g q
+
+reduceAB :: Doc -> Doc
+reduceAB (Above Empty _ q) = q
+reduceAB (Beside Empty _ q) = q
+reduceAB doc = doc
hunk ./Text/PrettyPrint/HughesPJ.hs 556
-  * The arugment of @TextBeside@ is never @Nest at .
+  * The argument of @TextBeside@ is never @Nest at .
hunk ./Text/PrettyPrint/HughesPJ.hs 564
-  * The right argument of a union cannot be equivalent to the empty set
-    (@NoDoc@).  If the left argument of a union is equivalent to the
-    empty set (@NoDoc@), then the @NoDoc@ appears in the first line.
+  * A @NoDoc@ may only appear on the first line of the left argument of an 
+    union. Therefore, the right argument of an union can never be equivalent
+    to the empty set (@NoDoc@).
hunk ./Text/PrettyPrint/HughesPJ.hs 578
-        -- Arg of a NilAbove is always an RDoc
-nilAbove_ :: Doc -> Doc
+-- Invariant: Args to the 4 functions below are always RDocs
+nilAbove_ :: RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 583
-textBeside_ :: TextDetails -> Int -> Doc -> Doc
+textBeside_ :: TextDetails -> Int -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 586
-        -- Arg of Nest is always an RDoc
-nest_ :: Int -> Doc -> Doc
+nest_ :: Int -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 589
-        -- Args of union are always RDocs
-union_ :: Doc -> Doc -> Doc
+union_ :: RDoc -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 765
-                            nilAboveNest False k (reduceDoc (vcat ys))
+                            nilAboveNest True k (reduceDoc (vcat ys))
hunk ./Text/PrettyPrint/HughesPJ.hs 810
-fillNB g Empty k (y:ys)    = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
+fillNB g Empty k (Empty:ys)  = fillNB g Empty k ys
+fillNB g Empty k (y:ys)    = fillNBE g k y ys
+fillNB g p k ys            = fill1 g p k ys
+
+fillNBE g k y ys           = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
hunk ./Text/PrettyPrint/HughesPJ.hs 816
-                             nilAboveNest False k (fill g (y:ys))
+                             nilAboveNest True k (fill g (y:ys))
hunk ./Text/PrettyPrint/HughesPJ.hs 821
-fillNB g p k ys            = fill1 g p k ys
-
-
hunk ./Text/PrettyPrint/HughesPJ.hs 1039
--- It should never be called with 'n' < 0, but that can happen for reasons I don't understand
--- Here's a test case:
---	ncat x y = nest 4 $ cat [ x, y ]
---	d1 = foldl1 ncat $ take 50 $ repeat $ char 'a'
---	d2 = parens $  sep [ d1, text "+" , d1 ]
---	main = print d2
--- I don't feel motivated enough to find the Real Bug, so meanwhile we just test for n<=0
+-- returns the empty string on negative argument.
+--
hunk ./Text/PrettyPrint/HughesPJ.hs 1045
-{- Comments from Johannes Waldmann about what the problem might be:
+{-
+Concerning negative indentation:
+If we compose a <> b, and the first line of b is deeply nested, but other lines of b are not,
+then, because <> eats the nest, the pretty printer will try to layout some of b's lines with
+negative indentation:
hunk ./Text/PrettyPrint/HughesPJ.hs 1051
-   In the example above, d2 and d1 are deeply nested, but `text "+"' is not, 
-   so the layout function tries to "out-dent" it.
-   
-   when I look at the Doc values that are generated, there are lots of
-   Nest constructors with negative arguments.  see this sample output of
-   d1 (obtained with hugs, :s -u)
-   
-   tBeside (TextDetails_Chr 'a') 1 Doc_Empty) (Doc_NilAbove (Doc_Nest
-   (-241) (Doc_TextBeside (TextDetails_Chr 'a') 1 Doc_Empty)))))
-   (Doc_NilAbove (Doc_Nest (-236) (Doc_TextBeside (TextDetails_Chr 'a') 1
-   (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside (TextDetails_Chr 'a') 1
-   Doc_Empty)))))))) (Doc_NilAbove (Doc_Nest (-231) (Doc_TextBeside
-   (TextDetails_Chr 'a') 1 (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside
-   (TextDetails_Chr 'a') 1 (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside
-   (TextDetails_Chr 'a') 1 Doc_Empty))))))))))) (Doc_NilAbove (Doc_Nest
+doc       |0123345
+------------------
+d1        |a
+d2        |   b
+          |c
+d1<>d2    |ab
+         c|
}

[fillNB: fix bug #1337
benedikt.huber at gmail.com**20080625114659] {
hunk ./Text/PrettyPrint/HughesPJ.hs 814
-fillNBE g k y ys           = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
+fillNBE g k y ys           = nilBeside g (fill1 g (elideNest . oneLiner . reduceDoc $ y) k1 ys)
hunk ./Text/PrettyPrint/HughesPJ.hs 820
-
+elideNest (Nest k d) = d
+elideNest d = d
}

Context:

[Fix warnings
Ian Lynagh <igloo at earth.li>**20080620135156] 
[TAG 2008-05-28
Ian Lynagh <igloo at earth.li>**20080528004408] 
Patch bundle hash:
1c1d36f58462f5c86700d37fb4d7729461aea35c


More information about the Libraries mailing list