[Haskell-cafe] Help understanding John Hughes' "The Design of a Pretty-printing Library"

Fri Jul 9 18:59:01 UTC 2021

On Fri, Jul 9, 2021 at 10:11 AM Galaxy Being <borgauf at gmail.com> wrote:

> Getting stuck in *Real World Haskell*'s chapter on pretty printing (chap
> 5), I decided to take a look at one of the suggested sources, i.e., John
> Hughes' *The Design of a Pretty-printing Library
> <https://www.researchgate.net/publication/226525714_The_Design_of_a_Pretty-printing_Library>*.
> Not that I fully got the first three section, still, let me skip ahead to
> section 4, *Designing a Sequence Type *where we're designing a supposedly
> generic sequence type, I'm guessing in the same universal, fundamental way
> that Peano's Axioms give a generic underpinning of the natural numbers.
> Good. I get it. This is why Haskell is unique. So he says we take the
> following *signature* as our starting point
>
> nil :: Seq a
> unit :: a -> Seq a
> cat :: Seq a -> Seq a -> Seq a
> list :: Seq a -> [a]
>
> where nil , unit , and cat give us ways to build sequences, and list is
> an *observation *[explained before]. The correspondence with the usual
> list operations is
>
> nil = []
> unit x = [x]
> cat = (++)
>
> These operations are to satisfy the following laws:
>
> xs `cat` ( ys `cat` zs ) = ( xs `cat` ys ) `cat` zs
> nil `cat` xs = xs
> xs `cat` nil = xs
> list nil = []
> list ( unit x `cat` xs ) = x : list xs
>
> My first question is why must we satisfy these laws? Where did these laws
> come from? Is this a universal truth thing? I've heard the word *monoid *and
> *semigroup* tossed around. Is this germane here?
>

This is how the paper defines what the word Sequence means in a
mathematical way, otherwise there might be some ambiguity for what a
Sequence should be. The laws are the design for the Sequence type. Some of
them also happen to be the laws that define Monoid and Semigroup.

The first law is associativity, which means Sequence can implement
Semigroup. The next two laws are the identity laws, which means Sequence
can implement Monoid. https://wiki.haskell.org/Monoid#The_basics

The last two laws say that you can take any Sequence and make a list out of
it without losing any information (all elements in the same order). This is
also related to Foldable but it's stronger because of the ordering.
https://wiki.haskell.org/Foldable_and_Traversable

Reading all of the laws together (Monoid + Foldable with ordering) you can
infer that Sequence is isomorphic to list. With mconcat you can turn a list
into a Sequence and with list you can get the original list back. This also
means that Sequence should be able to lawfully (but perhaps not as
efficiently) implement any other typeclass or operation that is implemented
for list.

Next in *4.1 Term Representation* I'm immediately lost with the
concept of *term.
> *On a hunch, I borrowed a book *Term Rewriting and All That *from Baader
> and Nipkow which I'm sure goes very deep into the concept of terms, but I'm
> not there yet. What is meant by *term*? As 4.1 continues, confusing is
> this passage
>
> The most direct way to represent values of [the?] sequence type is just as
> terms of the
> algebra [huh?], for example using
>
> data Seq a = Nil j Unit a j Seq a `Cat` Seq a
>
> But this trivial representation does not exploit the algebraic laws that
> we know to
> hold [huh?], and moreover the list observation will be a little tricky to
> de ne (ideally we would like to implement observations by very simple,
> non-recursive functions: the real work should be done in the
> implementations of the Seq operators themselves). Instead, we may choose a
> restricted subset of terms -- call them simplified forms -- into which
> every term can be put using the algebraic laws. Then we can represent
> sequences using a datatype that represents the syntax of simplified forms.
>

This essentially states that the Unit a constructor is redundant because
it's equivalent to Seq a Nil. Having fewer possible representations greatly
simplifies how easy it is to reason about a data type and usually also
simplifies its implementation.

> In this case, there is an obvious candidate for simplified forms: terms of
> the form nil and unit x `cat` xs , where xs is also in simplified form.
> Simplified forms can be represented using the type
>
> data Seq a = Nil j a `UnitCat` Seq a
>
> with the interpretation
>
> Nil = nil
> x `UnitCat` xs = unit x `cat` xs
>
> All of this presupposes much math lore that I'm hereby asking someone on
> this mailing list to point me towards. I threw in a few [huh?] as specific
> points, but in general, I'm not there yet. Any explanations appreciated.
>

This section of the paper explains how a first attempt at implementing a
Sequence type from first principles could end up with a representation that
is indistinguishable from the list type other than the names (nil = []; cat
= (:)). The important part is the end of *4.1* where we learn why this
representation doesn't have the desired performance characteristics for
this use case and then in *4.2* a more suitable representation is derived.

-bob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20210709/709ed310/attachment.html>