[Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance
improvement
Benedikt Huber
benjovi at gmx.net
Wed Jun 25 08:41:20 EDT 2008
Simon Peyton-Jones schrieb:
> Thank you for doing this Benedict. I've added your more detailed comments to ticket #2393 so that they are preserved.
>
> Ian: would you like to apply? I'm not sure how to integrate the QuickCheck tests, but I bet you know.
I'm glad this is useful - and I'm impressed by the complexity of the
pretty printer library. I'm tackling the other tickets now and adding
more QuickCheck test.
> Benedict: while you are in the area, would you like to take a swing at http://hackage.haskell.org/trac/ghc/ticket/1337, and 1176?
In my opinion, ticket #667 can be marked invalid (see comments).
Concerning ticket #1337, we have to change the formal specification of
fill (it doesn't match the implementation):
-- Current Specification:
-- fill [] = empty
-- fill [p] = p
-- fill (p1:p2:ps) = oneLiner p1 <#> nest (length p1)
-- (fill (oneLiner p2 : ps))
-- `union`
-- p1 $$ fill ps
Problem 1: We want to `unnest' the second argument of (p1 $$ fill ps),
but not the first one
In the definition above we have e.g.
> getSecondLayout $
> fillDef False [text "a", text "b", text "a"]
>> text "ab"; nilabove; nest -1; text "a"; empty
>> |ab|
>> |.a|
Problem 2: The overlapping $$ should only be used for those layouts of
p1 which aren't one liners (otherwise violating the invariant "Left
union arg has shorter first line").
I suggest the following specification (i believe it almost matches the
current implementation, modulo [fillNB: fix bug #1337] (see below):
-- Revised Specification:
-- fill g docs = fill' 0 docs
-- gap g = if g then 1 else 0
-- fill' n [] = []
-- fill' n [p] = [p]
-- fill' n (p1:p2:ps) =
-- oneLiner p1 <g> (fill' (n+length p1+gap g) (oneLiner p2 : ps))
-- `union`
-- (p1 $*$ nest (-n) (fill' g ps))
--
-- $*$ is defined for layouts (One-Layout Documents) as
--
-- layout1 $*$ layout2 | isOneLiner layout1 = layout1 $+$ layout2
-- | otherwise = layout1 $$ layout2
I've also implemented the specification in HughesPJQuickCheck.hs,
and checked them against the patched pretty printer.
Concerning Bug #1337:
~~~~~~~~~~~~~~~~~~~~~
If the above formal specification is fine, it is easy to fix:
elide the nests of (oneLiner p2) [see attached patch, record bug #1337].
> PrettyPrint(0) $ ./Bug1337
> ....ab
> ...c
The (long) explanation follows below.
I'll look into the other tickets too.
thorkil: Can you help me with a simplified test case (pretty printer
only) of Bug #1176 ?
Also, I didn't find any regressions tests for the pretty printer - I'll
create some if someone points me in the right direction.
best regards,
benedikt
===========================================================
Explanation of Bug #1337:
Consider
> fcat [ nest 1 $ text "a", nest 2 $ text "b", text "c"]
--> expected: (nest 1; text "a"; text "b"; nest -3; "c")
--> actual : (nest 1; text "a"; text "b"; nest -5; "c")
Reduction:
=== (nest 1; text a) <> (fill (-2) (p2:ps))
==> (nest 2 (text "b") $+$ text "c")
==> (nest 2 (text "b")) `nilabove`
(nest (-3) (text "c"))
==> (nest 1; text a; text b; nest -5 c)
The problem is that if we decide to layout (p1:p2:ps) as
| p1 p2
| ps
(call it layout A), then we want to have
> (p1 <> p2) $+$ ps.
But following law <n6> this means that
> fcat_A [p1:nest k p2:ps]
is equivalent to
> fcat_A [p1,p2,ps]
so the nest of p2 has to be removed.
This is somewhat similar to bug #667, but easier to fix
from a semantic point of view:
p1,p2 and ps are distinct layouts - we only have to preserve the
individual layouts, and no combinations of them.
>
> Simon
>
>
> | -----Original Message-----
> | From: libraries-bounces at haskell.org [mailto:libraries-bounces at haskell.org] On Behalf Of Benedikt Huber
> | Sent: 24 June 2008 13:12
> | To: libraries at haskell.org
> | Subject: [Ticket #2393] Text.PrettyPrint.HughesPJ: Bug fixes, performance improvement
> |
> | Hello,
> |
> | I'd like to propose bugfixes, documentation fixes and a performance
> | improvement for Text.PrettyPrint.HughesPJ. The changes shouldn't
> | effect the expected behaviour of the PP library.
> |
> | I've written a QuickCheck test suite for the pretty printer (to test
> | the improvement), and found two bugs and some misconceptions/
> | ambiguities in the documentation. Additionally, there is a
> | microbenchmark for the suggested improvement.
> | Both are available at http://code.haskell.org/~bhuber/Text/
> | PrettyPrint/. Note that the QuickCheck tests need access to all top-
> | level names in HughesPJ (i.e. ignore the export list).
> |
> | In summary, I propose to
> | * fix a bug in fillNB and one in fillNB/sepNB
> | * correct documentation on laws and invariants.
> | * add more efficient implementations of vcat,hsep,hcat
> |
> | More specifically:
> |
> | (1) Bugfix fillNB: Additional case for fillNB Empty (Empty : ys)
> |
> | (2) Bugfix [f](cat|sep): do not allow overlapping ($$) in vertical
> | composition
> |
> | (3) Lazy implementations of vcat,hcat and hsep
> |
> | (4) Law <t2> isn't always true
> |
> | (5) Invariant 5 should be made stronger
> |
> | (6) Change the comment about negative indentation
> |
-------------- next part --------------
New patches:
[fillNB bug, lazy vcat
benedikt.huber at gmail.com**20080624113715] {
hunk ./Text/PrettyPrint/HughesPJ.hs 410
+
+** because of law n6, t2 only holds if x doesn't
+** start with `nest'.
+
hunk ./Text/PrettyPrint/HughesPJ.hs 429
-<m1> (text s <> x) $$ y = text s <> ((text "" <> x)) $$
- nest (-length s) y)
+<m1> (text s <> x) $$ y = text s <> ((text "" <> x) $$
+ nest (-length s) y)
hunk ./Text/PrettyPrint/HughesPJ.hs 490
+-- lazy list versions
+hcat = reduceAB . foldr (beside_' False) empty
+hsep = reduceAB . foldr (beside_' True) empty
+vcat = reduceAB . foldr (above_' True) empty
hunk ./Text/PrettyPrint/HughesPJ.hs 495
-hcat = foldr (<>) empty
-hsep = foldr (<+>) empty
-vcat = foldr ($$) empty
+beside_' :: Bool -> Doc -> Doc -> Doc
+beside_' _ p Empty = p
+beside_' g p q = Beside p g q
+
+above_' :: Bool -> Doc -> Doc -> Doc
+above_' _ p Empty = p
+above_' g p q = Above p g q
+
+reduceAB :: Doc -> Doc
+reduceAB (Above Empty _ q) = q
+reduceAB (Beside Empty _ q) = q
+reduceAB doc = doc
hunk ./Text/PrettyPrint/HughesPJ.hs 556
- * The arugment of @TextBeside@ is never @Nest at .
+ * The argument of @TextBeside@ is never @Nest at .
hunk ./Text/PrettyPrint/HughesPJ.hs 564
- * The right argument of a union cannot be equivalent to the empty set
- (@NoDoc@). If the left argument of a union is equivalent to the
- empty set (@NoDoc@), then the @NoDoc@ appears in the first line.
+ * A @NoDoc@ may only appear on the first line of the left argument of an
+ union. Therefore, the right argument of an union can never be equivalent
+ to the empty set (@NoDoc@).
hunk ./Text/PrettyPrint/HughesPJ.hs 578
- -- Arg of a NilAbove is always an RDoc
-nilAbove_ :: Doc -> Doc
+-- Invariant: Args to the 4 functions below are always RDocs
+nilAbove_ :: RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 583
-textBeside_ :: TextDetails -> Int -> Doc -> Doc
+textBeside_ :: TextDetails -> Int -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 586
- -- Arg of Nest is always an RDoc
-nest_ :: Int -> Doc -> Doc
+nest_ :: Int -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 589
- -- Args of union are always RDocs
-union_ :: Doc -> Doc -> Doc
+union_ :: RDoc -> RDoc -> RDoc
hunk ./Text/PrettyPrint/HughesPJ.hs 765
- nilAboveNest False k (reduceDoc (vcat ys))
+ nilAboveNest True k (reduceDoc (vcat ys))
hunk ./Text/PrettyPrint/HughesPJ.hs 810
-fillNB g Empty k (y:ys) = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
+fillNB g Empty k (Empty:ys) = fillNB g Empty k ys
+fillNB g Empty k (y:ys) = fillNBE g k y ys
+fillNB g p k ys = fill1 g p k ys
+
+fillNBE g k y ys = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
hunk ./Text/PrettyPrint/HughesPJ.hs 816
- nilAboveNest False k (fill g (y:ys))
+ nilAboveNest True k (fill g (y:ys))
hunk ./Text/PrettyPrint/HughesPJ.hs 821
-fillNB g p k ys = fill1 g p k ys
-
-
hunk ./Text/PrettyPrint/HughesPJ.hs 1039
--- It should never be called with 'n' < 0, but that can happen for reasons I don't understand
--- Here's a test case:
--- ncat x y = nest 4 $ cat [ x, y ]
--- d1 = foldl1 ncat $ take 50 $ repeat $ char 'a'
--- d2 = parens $ sep [ d1, text "+" , d1 ]
--- main = print d2
--- I don't feel motivated enough to find the Real Bug, so meanwhile we just test for n<=0
+-- returns the empty string on negative argument.
+--
hunk ./Text/PrettyPrint/HughesPJ.hs 1045
-{- Comments from Johannes Waldmann about what the problem might be:
+{-
+Concerning negative indentation:
+If we compose a <> b, and the first line of b is deeply nested, but other lines of b are not,
+then, because <> eats the nest, the pretty printer will try to layout some of b's lines with
+negative indentation:
hunk ./Text/PrettyPrint/HughesPJ.hs 1051
- In the example above, d2 and d1 are deeply nested, but `text "+"' is not,
- so the layout function tries to "out-dent" it.
-
- when I look at the Doc values that are generated, there are lots of
- Nest constructors with negative arguments. see this sample output of
- d1 (obtained with hugs, :s -u)
-
- tBeside (TextDetails_Chr 'a') 1 Doc_Empty) (Doc_NilAbove (Doc_Nest
- (-241) (Doc_TextBeside (TextDetails_Chr 'a') 1 Doc_Empty)))))
- (Doc_NilAbove (Doc_Nest (-236) (Doc_TextBeside (TextDetails_Chr 'a') 1
- (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside (TextDetails_Chr 'a') 1
- Doc_Empty)))))))) (Doc_NilAbove (Doc_Nest (-231) (Doc_TextBeside
- (TextDetails_Chr 'a') 1 (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside
- (TextDetails_Chr 'a') 1 (Doc_NilAbove (Doc_Nest (-5) (Doc_TextBeside
- (TextDetails_Chr 'a') 1 Doc_Empty))))))))))) (Doc_NilAbove (Doc_Nest
+doc |0123345
+------------------
+d1 |a
+d2 | b
+ |c
+d1<>d2 |ab
+ c|
}
[fillNB: fix bug #1337
benedikt.huber at gmail.com**20080625114659] {
hunk ./Text/PrettyPrint/HughesPJ.hs 814
-fillNBE g k y ys = nilBeside g (fill1 g (oneLiner (reduceDoc y)) k1 ys)
+fillNBE g k y ys = nilBeside g (fill1 g (elideNest . oneLiner . reduceDoc $ y) k1 ys)
hunk ./Text/PrettyPrint/HughesPJ.hs 820
-
+elideNest (Nest k d) = d
+elideNest d = d
}
Context:
[Fix warnings
Ian Lynagh <igloo at earth.li>**20080620135156]
[TAG 2008-05-28
Ian Lynagh <igloo at earth.li>**20080528004408]
Patch bundle hash:
1c1d36f58462f5c86700d37fb4d7729461aea35c
More information about the Libraries
mailing list