[Haskell-beginners] A question on seq

Wed Sep 15 10:55:16 EDT 2010

Thank You both very much for the quick and helpful answers! The
mailing list seems to be really supportive for learning Haskell (:

2010/9/15, Daniel Fischer <daniel.is.fischer at web.de>:
> On Tuesday 14 September 2010 23:26:47, Klaus Gy wrote:
>> Hi!
>>
>> Inspred from the discussion
>> http://www.haskell.org/pipermail/beginners/2010-February/003396.html ,
>> I just try to understand the seq function a bit better. When I compare
>> these two functions:
>>
>> firstSum :: Num a => a -> [a] -> a
>> firstSum s []     = s
>> firstSum s (x:xs) = seq help (firstSum help xs)
>>   where help      = x + s
>>
>> secondSum :: Num a => [a] -> a
>> secondSum []      = 0
>> secondSum (x:xs)  = seq help (x + help)
>>   where help      = secondSum xs
>>
>> What should be the difference?
>
> That depends on the types at which you use them.
> For types like Int, Integer, Double, Float, Word, ..., evaluation to WHNF,
> what seq does, is complete evaluation.
> So for these types, firstSum keeps a completely evaluated accumulation
> parameter, runs through the list and delivers the result when the end is
> reached. It corresponds closely to
>
> int firstSum(int s, intList xs){
>     if (empty(xs)) return s;
>     s += xs.head;
>     return firstSum(s,xs.tail);
> }
>
> where an intList has an int field `head', a pointer `tail' to its tail and
> empty(xs) would be (xs == NULL) in C, (xs == null) in Java.
> Since that function is tail-recursive, it doesn't need to allocate new
> stack-frames and thus can run in constant (stack) space (if the Haskell
> version doesn't it's a bug, in C, you'd probably have to tell gcc to
> -foptimize-sibling-calls, then it should do fine, in Java, I don't know of
> a VM that does tail-call optimisation - but then, I don't know much about
> Java).
>
> So for these, firstSum is well behaved, pretty much the best you can get.
>
> secondSum is different, the seq there says evaluate the sum of the tail and
> add that to x. Of course, for Int &c, to add x to the sum of the tail, the
> latter has to be evaluated anyway, so the seq is rather pointless.
> secondSum is almost equivalent to
>
> thirdSum :: Num a => [a] -> a
> thirdSum = foldr (+) 0
>
> and it more or less corresponds to the imperative version
>
> int secondSum(intList xs){
>     if (empty(xs)) return 0;
>     int tailsum = secondSum(xs.tail);
>     return (xs.head + tailsum);
> }
>
> This is not tail-recursive, so it needs O(length xs) stack and marches
> twice through the list, so to say, once to the end, building the chain of
> recursive calls and back to the front adding.
>
> So for Int &c, firstSum is vastly superior in space behaviour (always uses
> constant stack space and if the list isn't held in memory by other
> references, also constant heap space).
>
> Things become different when you work with lazy number types.
> firstSum, being tail-recursive, can't deliver anything until it has reached
> the end of the list. Keeping the accumulator evaluated to WHNF doesn't mean
> much for lazy types, so the accumulator may well build up huge thunks (but,
> for lazy types, evaluating a huge thunk can still run in small stack space,
> so that's not necessarily catastrophic).
>
> secondSum, on the other hand, can start delivering before the list has been
> completely traversed (depends on the behaviour of (+)).
> But if it can, it can probably do even better if you don't seq on the sum
> of the tail, so for those types, thirdSum would be the better option.
>
> Example for lazy number type:
>
> data Peano
>     = Zero
>     | Succ Peano
>       deriving (Eq, Show)
>
> instance Num Peano where
>     Zero + n = n
>     (Succ m) + n = Succ (m + n)
>     -- other methods
>     fromInteger n
>         | n <= 0    = Zero
>         | otherwise = Succ (fromInteger (n-1))
>
> instance Ord Peano where
>     compare Zero Zero = EQ
>     compare Zero _      = LT
>     compare _     Zero = GT
>     compare (Succ m) (Succ n) = compare m n
>
>
> Now try
>
> list :: [Peano]
> list = 4:replicate (10^5) 0
>
> thirdSum list > 3
> secondSum list > 3
> firstSum list > 3
>
> and then increase the length of the list (10^6, 10^7 instead of 10^5).
>
>> In my opinion both functions do not
>> return a complete unevaluated thunk (secondSum returns a thunk of the
>> form (THUNK + b) where THUNK contains a single numeral value). But it
>> seems to me that the first function computes the output somehow linear
>> in the sense that it does just a set of substitutions while the second
>> functions has to create a tree to handle all the recursive calls of
>> seq (sorry, my terminology is for sure a bit poor).
>
> Well, it's a flat tree, it's
>
> secondSum (x:xs) =
>     case secondSum xs of
>       s -> x+s
>
> where the `case' is supposed to have core semantics, i.e. evaluates the
> expression to WHNF.
>
> So
>
> secondSum [1,2,3]
> ~> case secondSum [2,3] of
>         s1 -> 1 + s1
> ~> case case secondSum [3] of { s2 -> 2 + s2 } of
>         s1 -> 1 + s1
> ~> case case case secondSum [] of { s3 -> 3 + s3 } of { s2 -> 2 + s2 } of
>         s1 -> 1 + s1
> ~> case case case 0 of { s3 -> 3 + s3 } of { s2 -> 2 + s2} of
>         s1 -> 1 + s1
> ~> case case 3 of { s2 -> 2 + s2 } of
>         s1 -> 1 + s1
> ~> case 5 of
>         s1 -> 1 + 5
> ~> 6
>
>> So I would say the
>> first function delivers a better performance. (In the discussion I
>> mentioned, the second function was not discussed in this form with the
>> seq implementation). Am I right with my thoughts?
>
> Pretty much.
>
>>
>> Thanks, fweth
>
>