[Haskell-beginners] Re: folds -- help!

Tue Mar 10 08:50:49 EDT 2009

On Tue, Mar 10, 2009 at 3:59 AM, 7stud <bbxx789_05ss at yahoo.com> wrote:
> -------------
> Like foldl, foldr takes a function and a base case(what to do when the input
> list is empty) as arguments.
> -------------
>
> That also does not seem correct.  For example:
>
> foldrSum xs =  foldr accFunc 0 xs
>    where accFunc x acc = acc + x
>
> *Main> foldrSum [1, 2, 3]
> 6
>
> In that example, the first two arguments to foldr are the function accFunc
> and 0.  It does not seem accurate to say that  "0 is what to do when the
> input list is empty".   What foldr  does when the input list is empty
> is return the value of the acc parameter variable:
>

I'm not sure why they explain the base case for fold in that way. At
least to me, that is only a trivial result of the 'zero' value's main
purpose, which is to be the initial value used in the accumulating
fold function. When starting the fold, we have to start somewhere, so
the accFunc needs a seed value. That value, and the first element of
the list, are fed into accFunc to start things off. Then it just so
happens that if the list is empty, the seed value is the result, since
no folding can happen.

> foldr _ acc [] = acc
>
> In my example, the value of the acc parameter is 6 "when the input list is
> empty"--not the value 0, which is the argument to foldr.

You're thinking of a slightly different 'empty' here. You're thinking
of what happens when you reach the end of the list, after folding it
all, and there are no more elements to fold. In this example, you're
right that the acc parameter is 6. But what if the list you *first
gave to foldr* was empty? Then it would evaluate to 0, the initial
seed value.

Now you may be thinking, "Why would I ever apply foldr to an empty
list? Obviously that would do nothing." Well you may not know whether
a list is empty, if it's the result of other calculations.

You also may be thinking, "Why do I need to provide a seed value, why
can't foldr just start with the first two elements of the list?"
That's because the accFunc does not always evaluate to the same type
as the elements in the list. For example, you could use a fold to
count the number of 'a's in a list of characters. Then the type of
accFunc would be "accFunc :: Int -> Char -> Int" for foldl or "accFunc
:: Char -> Int -> Int" for foldr. It takes one element from the list,
the previous result, and evaluates to a new result. But in the very
first fold step, there's no previous result, so you have to provide
one.

It's called zero as a convention, but it doesn't actually have to *be*
zero. It can be any initial value you want. Make your foldSum function
only evaluate to results of 10 or larger by doing this:

foldSum xs = foldr  accFunc 10 xs
  where accFunc x acc = acc + x

[---snip---]

>> if f can start delivering the result without looking at its second argument,
>> you can start consuming the result before the fold has traversed the whole
>> list.
>>
>
> Ok, that isn't clearly illustrated by the example in the book:
>
>> > foldl (+) 0 (1:2:3:[])
>> >    == foldl (+) (0 + 1)             (2:3:[])
>> >    == foldl (+) ((0 + 1) + 2)       (3:[])
>> >    == foldl (+) (((0 + 1) + 2) + 3) []
>> >    ==           (((0 + 1) + 2) + 3)
>> >
>> >
>> > foldr (+) 0 (1:2:3:[])
>> >    ==  1 +           foldr (+) 0 (2:3:[])
>> >    ==  1 + (2 +      foldr (+) 0 (3:[])
>> >    ==  1 + (2 + (3 + foldr (+) 0 []))
>> >    ==  1 + (2 + (3 + 0))
>> >
>
> In that example, it doesn't look like anything in foldr can be evaluated
> until the whole fold has been completed.

You're right, that example doesn't show how you could start using the
result without fully evaluating the fold, since addition doesn't give
partial results. The concat example is better in that regard.

>> Common examples are things like
>>
>> concat = foldr (++) [],
>> so
>> concat [l1,l2,l3,l4,l5] = l1 ++ (foldr (++) [] [l2,l3,l4,l5])
>> and the start (l1) can be used before further reducing the fold,
>>
>
> So does haskell store a thunk for everything to the right of l1?
> You said that when using foldr you can start "consuming" the beginning of the
> result before the whole result is reduced.  I don't quite get that.

A thunk is used as a stand-in for most calculations before the result
is actually calculated. That way, if you never try to use the result,
the calculation never needs to be done, and that means less work. As
an example that ties to the concat example above, say your program
only wanted to test if the result of the concat fold was an empty
list. The function 'null' takes a list and evaluates to True or False,
based on whether the list is empty or not. So:

someFunc xs = null ( concat xs )
  where
  concat ys = foldr (++) [] ys

The 'null' function only needs to test whether the list that is the
result of foldConcat has at least one element. Let's say l1 has an
element. So it's kind of evaluated like this:
  someFunc [ l1, l2, l3, l4, l5 ]
  null (concat [ l1, l2, l3, l4, l5 ] )
  null ( l1 ++ ( thunk with rest of fold ))
  False

The rest of the fold doesn't need to be evaluated, since the beginning
part is enough for 'null' to tell that the result would have at least
one element (because l1 does). That's one way foldr can be used to
start consuming the result before the entire fold is done. It depends
completely on the accFunc: if it can return partial results, like
concat, then you can start consuming the result before a full
evaluation. Some accFunc's can't return partial results, like regular
addition. In that case, it's probably better to use foldl' (note the
apostrophe), which will force the thunks to be evaluated as they are
generated and so use less memory. foldl is the same as foldl' except
it does generate thunks, and then evaluates them all at the end of the
fold, so it uses a bunch of memory to store the thunks in the
meantime, which usually isn't useful.

Kurt