[Haskell-beginners] Thompson Exercise 9.13

Wed Jul 14 17:02:28 EDT 2010

On Wednesday 14 July 2010 20:55:17, dan portin wrote:
> Hi,
>
> I am new to Haskell (and programming). Thompson's exercise 9.13  in
> *Craft of Functional Programming *gave me trouble. Searching the list
> archives, I saw people define init (xs), last (xs), and so on, in a
> variety of complex ways (using the Maybe monad,

Using Maybe isn't really complex, and for the implementations I sort of 
remember, the fact that Maybe is a Monad didn't play a role.

> using fairly complex
> post-processing). This seems to be a hard problem for beginners; at
> least, it was rather hard for me.

Yes, it's not easy before you're familiar with foldr.
If you don't try too hard to avoid any post-processing, it's not incredibly 
hard, though.

>
> The problem is to define the Prelude functions *init* and *last* using *
> foldr*. After a while, I came up with:
>
> -- *Exercise 9.13*: Use foldr (f, s, xs) to give definitions of the
> prelude functions
> -- unzip, last, and init.
>
> -- Clearly,
> --     [(x, y), (x1, y1)] = (x, y) : (x1, y1) : ([], [])

That last one is a typo,

[(x,y),(x1,y1)] = (x,y) : (x1,y1) : []

> --     foldr f ([], []) ((x, y):(x1, y1):[]) = f (x, y) (f (x1, y1) ([],
> []))
> -- Hence, f (x, y) (xs, ys) must equal (x:xs, y:ys) for any xs, ys.

Yup.

>
> unzip :: [(a, b)] -> ([a], [b])
> unzip xys = foldr f ([], []) xys
>  where f :: (a, b) -> ([a], [b]) -> ([a], [b])
>        f (x, y) (xs, ys) = (x:xs, y:ys)

However, this has a small problem,

take 3 . fst $ unzip [(i,i+1) | i <- [0 .. ]]

won't return with that definition of unzip.

The reason is subtle,

f (x,y) (xs,ys) = (x:xs,y:ys)

must inspect its second argument to match it with the pattern (xs,ys).
To do that, it must evaluate the nested call to f first.

f (x,y) (f (x1,y1) ([],[]))
~> match (xs,ys) with f (x1,y1) ([],[])
  ~> evaluate f (x1,y1) ([],[])
    ~> match (xs,ys) with ([],[])
    ~> matches
  ~> (x1:[], y1:[])
  ~> matches
~> (x:x1:[], y:y1:[])

Thus it needs to traverse the entire list before it can start assembling 
the result.
To avoid that, so the result can be assembled from the start of the list, 
you need to make the pattern match on the second argument lazy,

f (x,y) ~(xs,ys) = (x:xs,y:ys)

or

f (x,y) p = (x : fst p, y : snd p)

Now

f (x,y) (f (x1,y1) ([],[]))
~> let (xs,ys) = f (x1,y1) ([],[]) in (x:xs, y:ys)

and assembling the result starts immediately.

The tilde on a pattern makes that pattern irrefutable, the passed argument 
is bound to the pattern immediately and it will only be 
deconstructed/evaluated when needed.
It's sort of a "trust me, the argument will have that form, don't check it" 
message to the compiler/interpreter. Of course it will usually crash hard 
if the passed argument doesn't have the promised form.

In this case, it can't crash very hard, because the type checker doesn't 
allow anything but a pair to be passed as an argument, and a pair can only 
be (blah, blub) or _|_ (bottom). But if you use a tilde-pattern for a 
multi-constructor type, you better get it right.

>
> *last* :: [a] -> a
> last xs = head $ foldr f [] xs
>  where f :: a -> [a] -> [a]
>        f x [] = [x]
>        f x ys = ys ++ [x]

last xs = head (reverse xs), yes, it's correct, but not very pretty.
And not very efficient since it builds a left-associated nest of (++) 
applications and needs to pattern match to decide which branch to take.

last (1:2:3:4:[])
~> head $ foldr f [] (1:2:3:4:[])
~> head $ f 1 (f 2 (f 3 (f 4 [])))
~> head $ f 1 (f 2 (f 3 [4]))
~> head $ f 1 (f 2 ([4] ++ [3]))
~> head $ f 1 (([4] ++ [3]) ++ [2])
~> head $ ((([4] ++ [3]) ++ [2]) ++ [1]

a) in the second branch of f, you don't actually need to concatenate,

f x [] = [x]
f _ ys = ys

works too, but is faster.

b) you can get much faster by delaying the pattern match,

f x ys = (case ys of { [] -> x; y:_ -> y }) : []

>
> *init* :: [a] -> [a]
> init xs = tail $ foldr f [] xs
>  where f :: a -> [a] -> [a]
>        f x [] = [x]
>        f x (y:xs) = y : x : xs

Correct too, but again not very efficient since it has to find the last 
element and bubble it to the front.

Much faster:

import Data.Maybe (fromMaybe)

init' :: [a] -> [a]
init' = fromMaybe (error "init': empty list") . foldr f Nothing
    where
        f x mb = Just $ case mb of
                          Just xs -> x:xs
                          Nothing -> []

By delaying the pattern match on the Maybe until after the constructor is 
applied, we can start building the output with minimal delay (we only need 
to look whether there's a next list element to decide whether to cons it to 
the front or not).

>
> Now, these seemed to be hard questions. So, I have three questions: (1)
> are these correct? They work on test cases, and I *did* do some quick
> proofs. They seem okay. 

They are correct for finite lists, unzip and init above won't return on 
infinite lists (last shouldn't, so that's correct for infinite lists too).
They are not, strictly speaking, correct for infinite lists. But that is 
way beyond beginner territory :)

> (2) Is there a way to eliminate the
> post-processing of the lists (i.e., *head* in *last* and *tail* in
> *init*)? 

Not in a clean way.

Let us consider last first.

Suppose we had

last xs = foldr f z xs

without post-processing.
Since foldr f z [] = z and last [] = error "Prelude.last: empty list",
we must have z = error "...".
Now last (... x:[]) = x and
foldr f z (... x:[]) = ... (f x z)

So f x y = y if y is not error "..." and f x (error "...") = x, that means 
f would have to find out whether its second argument is a specific error 
and return its first argument in that case, otherwise its second argument.
It's possible to do that, but very unclean.

For init, the situation is similar, the value for the empty list case 
supplied to foldr must be an error and the combining function needs to know 
whether its second argument is an error and do things accordingly.

> (3) Why the complex answers in the list archives? Am I missing
> something?

Don't know. In part, because beginners didn't find the easiest ways, I 
suppose, in part because it's not too easy to give efficient 
implementations with foldr.