[Haskell-beginners] Re: folds -- help!

Tue Mar 10 03:59:03 EDT 2009

Daniel Fischer <daniel.is.fischer <at> web.de> writes:
> 
> On Monday, 9 March 2009 at 17:46, 7stud wrote:
> > This is an example that shows how foldl and foldr work (from RWH p.93-94):
> >
> > foldl (+) 0 (1:2:3:[])
> >    == foldl (+) (0 + 1)             (2:3:[])
> >    == foldl (+) ((0 + 1) + 2)       (3:[])
> >    == foldl (+) (((0 + 1) + 2) + 3) []
> >    ==           (((0 + 1) + 2) + 3)
> >
> >
> > foldr (+) 0 (1:2:3:[])
> >    ==  1 +           foldr (+) 0 (2:3:[])
> >    ==  1 + (2 +      foldr (+) 0 (3:[]))
> >    ==  1 + (2 + (3 + foldr (+) 0 []))
> >    ==  1 + (2 + (3 + 0))
> >
> > The book says on p.94:
> >
> > -----
> > The difference between foldl and foldr should be clear from looking at
> > where the parentheses and the empty list elements show up.  With foldl, the
> > empty list element is on the left, and all the parentheses group to the
> > left. With foldr, the zero value is on the right, and the parentheses group
> > to the right.
> > ----
> >
> > Huh?  With foldl, the only empty list element I see is on the right.
> 
> What they meant was "the value that is the result in case the fold is applied 
> to an empty list", in this case the 0, in the definition
> 

So that's an error, right? Or is that correct Haskell terminology?

The book also says on p. 95:

-------------
Like foldl, foldr takes a function and a base case (what to do when the input 
list is empty) as arguments.
-------------

That also does not seem correct.  For example:

foldrSum xs =  foldr accFunc 0 xs
    where accFunc x acc = acc + x

*Main> foldrSum [1, 2, 3]
6

In that example, the first two arguments to foldr are the function accFunc
and 0.  It does not seem accurate to say that  "0 is what to do when the 
input list is empty".   What foldr  does when the input list is empty
is return the value of the acc parameter variable:

foldr _ acc [] = acc

In my example, the value of the acc parameter is 6 "when the input list is
empty"--not the value 0, which is the argument to foldr.
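
A hand trace may help here. The sketch below uses hand-written copies of the
two folds (myFoldr and myFoldl are names made up for this example; they are
assumed to mirror the usual list definitions). It suggests that with the right
fold the value reaching the empty-list case is still the 0 that was passed in,
and that it is the left fold whose accumulator has grown to 6 by then.

```haskell
-- Hand-written copies of the folds (hypothetical names, assumed to
-- mirror the usual list definitions).
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z                      -- z is returned untouched
myFoldr f z (x:xs) = f x (myFoldr f z xs)   -- z is passed down unchanged

myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ acc []     = acc                  -- acc is the running total here
myFoldl f acc (x:xs) = myFoldl f (f acc x) xs

-- myFoldr (+) 0 [1,2,3]
--   == 1 + myFoldr (+) 0 [2,3]
--   == 1 + (2 + myFoldr (+) 0 [3])
--   == 1 + (2 + (3 + myFoldr (+) 0 []))   -- empty-list case sees 0
--   == 1 + (2 + (3 + 0))
--
-- myFoldl (+) 0 [1,2,3]
--   == myFoldl (+) 1 [2,3]
--   == myFoldl (+) 3 [3]
--   == myFoldl (+) 6 []                    -- empty-list case sees 6
--   == 6
```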

> fold(l/r) f z xs = ...

> 
> the 'z'.
> 
> >
> > Initially, it looked to me like they did the same thing, and that the only
> > difference was the way they called step.  I think "step" is a horrible,
> > non-descriptive name, so I'm going to use "accFunc" instead:
> >
> > foldl calls: accFunc acc x
> >
> > foldr calls: accFunc x acc
> >
> > So it looks like you can define a function using either one and get the
> > same result.
> 
> Note that in general the list elements and acc have different types, so only 
> one of 
> accFunc acc x
> and
> accFunc x acc
> typechecks.
> 

I don't know how that comment is relevant.  In my examples, acc and x have 
different types:

*Main> :type []
[] :: [a]

*Main> :type 1
1 :: (Num t) => t

And both examples work fine.
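
A case where the argument order cannot be chosen freely may make the point
clearer. In the sketch below (totalLengthR and totalLengthL are made-up names)
the elements are Strings and the accumulator is an Int, so each fold only
accepts a step function with its arguments in one particular order.

```haskell
-- Elements are Strings, the accumulator is an Int.
totalLengthR :: [String] -> Int
totalLengthR = foldr step 0
  where
    step :: String -> Int -> Int      -- foldr: element first, accumulator second
    step x acc = length x + acc

totalLengthL :: [String] -> Int
totalLengthL = foldl step 0
  where
    step :: Int -> String -> Int      -- foldl: accumulator first, element second
    step acc x = acc + length x

-- Handing the foldr-style step to foldl (or the other way round) is
-- rejected by the type checker: an Int cannot stand in for a String.
```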

> If the types are the same, in general accFunc x acc /= accFunc acc x, 
>

Is that correct?  Can you give some examples?  Here is what I tried:

1)
accFunc1 acc x = x + acc 
accFunc2 x acc = x + acc 

*Main> let x = 1 
*Main> let acc = 3
*Main> accFunc1 acc x
4
*Main> accFunc2 x acc
4

2)
accFunc3 acc x = x ++ acc
accFunc4 x acc = x ++ acc

*Main> let x = [1]
*Main> let acc = [2, 3]
*Main> accFunc3 acc x
[1,2,3]
*Main> accFunc4 x acc
[1,2,3]
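
Both tests above rename the parameters along with swapping the call, so each
pair ends up computing the same expression. A difference should only show up
when one and the same function, which is not commutative, is handed its
arguments in the opposite order. A small sketch, with subtraction and (++)
picked as assumed examples and all names made up:

```haskell
-- Subtraction is not commutative, so the two folds disagree.
leftDiff :: Int
leftDiff = foldl (-) 0 [1, 2, 3]    -- ((0 - 1) - 2) - 3

rightDiff :: Int
rightDiff = foldr (-) 0 [1, 2, 3]   -- 1 - (2 - (3 - 0))

-- One function, called with its arguments in both orders.
joinLists :: [Int] -> [Int] -> [Int]
joinLists a b = a ++ b

sameBothWays :: Bool
sameBothWays = joinLists [1] [2, 3] == joinLists [2, 3] [1]   -- [1,2,3] vs [2,3,1]
```

In GHCi, leftDiff is -6, rightDiff is 2, and sameBothWays is False.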

> so foldr and foldl give different results, too.
> 

I think the results produced by my example functions myFilter1 and myFilter2
demonstrate that, but the differing results are because of the way foldr and 
foldl are defined.

> >  Here is a test:
> >
> > --I am going to use odd for pfunc and [1, 2, 3] for xs:
> >
> > myFilter1 pfunc xs = foldl accFunc [] xs
> >     where accFunc acc x
> >
> >             | pfunc x       = acc ++ [x]
> >             | otherwise     = acc
> >
> > myFilter2 pfunc xs = foldr accFunc [] xs
> >     where accFunc x acc
> >
> >             | pfunc x       = acc ++ [x]
> >             | otherwise     = acc
> >
> > *Main> myFilter1 odd [1, 2, 3]
> > [1,3]
> > *Main> myFilter2 odd [1, 2, 3]
> > [3,1]
> >
> > Hmmm.  So there is a difference.  foldr appears to grab elements from
> > the end of the list.  Therefore, to get the same result from the function
> > that uses foldr, I did this:
> >
> >
> > myFilter3 pfunc xs = foldr accFunc [] xs
> >     where accFunc x acc
> >
> >             | pfunc x       = x : acc
> >             | otherwise     = acc
> >
> > *Main> myFilter3 odd [1, 2, 3]
> > [1,3]
> >
> > But then RWH explains that you would never use foldl in practice because it
> > thunks the result, which for large lists can overwhelm the maximum memory
> > allotted for a thunk.  But it appears to me the same thunk problem would
> > occur with foldr.  So why is foldr used in practice but not foldl?
> >
> 
> Since with foldr, the parentheses are grouped to the right:
>
> if f can start delivering the result without looking at its second argument, 
> you can start consuming the result before the fold has traversed the whole 
> list.
> 

Ok, that isn't clearly illustrated by the example in the book:

> > foldl (+) 0 (1:2:3:[])
> >    == foldl (+) (0 + 1)             (2:3:[])
> >    == foldl (+) ((0 + 1) + 2)       (3:[])
> >    == foldl (+) (((0 + 1) + 2) + 3) []
> >    ==           (((0 + 1) + 2) + 3)
> >
> >
> > foldr (+) 0 (1:2:3:[])
> >    ==  1 +           foldr (+) 0 (2:3:[])
> >    ==  1 + (2 +      foldr (+) 0 (3:[]))
> >    ==  1 + (2 + (3 + foldr (+) 0 []))
> >    ==  1 + (2 + (3 + 0))
> >

In that example, it doesn't look like anything in foldr can be evaluated
until the whole fold has been completed. 

> Common examples are things like 
> 
> concat = foldr (++) [],
> so 
> concat [l1,l2,l3,l4,l5] = l1 ++ (foldr (++) [] [l2,l3,l4,l5])
> and the start (l1) can be used before further reducing the fold,
> 

So does Haskell store a thunk for everything to the right of l1?
You said that when using foldr you can start "consuming" the beginning of the 
result before the whole result is reduced.  I don't quite get that.
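
One way to see the "consuming" part is to run the fold over an infinite list
of lists, where the fold can never finish. In the sketch below, concatR is a
made-up name for the foldr definition quoted above, and blocks is an assumed
test input. take only demands the first few elements, so the rest of the fold
stays behind as an unevaluated thunk.

```haskell
-- concat written with foldr, as in the quoted definition.
concatR :: [[a]] -> [a]
concatR = foldr (++) []

-- An infinite list of lists: [1], [2,2], [3,3,3], ...
blocks :: [[Int]]
blocks = [replicate n n | n <- [1 ..]]

-- Only as much of the fold is reduced as take asks for.
firstFew :: [Int]
firstFew = take 6 (concatR blocks)
```

firstFew evaluates to [1,2,2,3,3,3]. The same expression with foldl in place
of foldr would not terminate, because foldl has to reach the end of the list
before it can return anything.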

> and = foldr (&&) True
> 
> [to evaluate the expression] and [True,False,..........]
> [Haskell] needs only to inspect the list until it encounters the first False 
> (if any); otherwise it must of course traverse the whole list
> 
> or = foldr (||) False
> 
> foldr is useful if the combination function is lazy in its second argument.
>

Ok.
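
The short-circuit can be checked on an infinite list. This is a minimal
sketch; andR and stopsEarly are made-up names, and andR is only assumed to
behave like the quoted definition of and.

```haskell
-- and written with foldr, as in the quoted definition.
andR :: [Bool] -> Bool
andR = foldr (&&) True

-- cycle builds an infinite list: True, False, True, False, ...
stopsEarly :: Bool
stopsEarly = andR (cycle [True, False])

-- andR (cycle [True, False])
--   == True && andR (False : ...)
--   == andR (False : ...)
--   == False && andR (...)
--   == False                  -- the rest of the list is never inspected
```

(&&) returns False as soon as its first argument is False, without looking at
the second, which is why the fold stops at the first False.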

> foldl on the other hand can't deliver anything before the whole list is 
> consumed. So since foldl builds thunks (except in some easy cases where the 
> optimiser sees it should be strict), which would have to be evaluated at the 
> end when they've become rather large, foldl isn't as useful and one uses the 
> strict left fold, foldl'.
> 

Thanks.
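
For comparison, a sketch of the two left folds side by side. sumLazy and
sumStrict are made-up names; foldl' is the strict left fold from Data.List
mentioned above.

```haskell
import Data.List (foldl')

-- Lazy left fold: builds (((0 + 1) + 2) + ...) as a chain of thunks
-- before anything is added (unless the optimiser intervenes).
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- Strict left fold: forces the accumulator to weak head normal form
-- at every step, so no chain of pending additions builds up.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0
```

Both give the same answer; the difference is only in how much memory the
intermediate thunks take on long lists.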