[Haskell-beginners] A question on seq
Daniel Fischer
daniel.is.fischer at web.de
Wed Sep 15 06:22:14 EDT 2010
On Tuesday 14 September 2010 23:26:47, Klaus Gy wrote:
> Hi!
>
> Inspred from the discussion
> http://www.haskell.org/pipermail/beginners/2010-February/003396.html ,
> I just try to understand the seq function a bit better. When I compare
> these two functions:
>
> firstSum :: Num a => a -> [a] -> a
> firstSum s [] = s
> firstSum s (x:xs) = seq help (firstSum help xs)
> where help = x + s
>
> secondSum :: Num a => [a] -> a
> secondSum [] = 0
> secondSum (x:xs) = seq help (x + help)
> where help = secondSum xs
>
> What should be the difference?
That depends on the types at which you use them.
For types like Int, Integer, Double, Float, Word, ..., evaluation to WHNF,
what seq does, is complete evaluation.
So for these types, firstSum keeps a completely evaluated accumulation
parameter, runs through the list and delivers the result when the end is
reached. It corresponds closely to
int firstSum(int s, intList xs){
if (empty(xs)) return s;
s += xs.head;
return firstSum(s,xs.tail);
}
where an intList has an int field `head', a pointer `tail' to its tail and
empty(xs) would be (xs == NULL) in C, (xs == null) in Java.
Since that function is tail-recursive, it doesn't need to allocate new
stack-frames and thus can run in constant (stack) space (if the Haskell
version doesn't it's a bug, in C, you'd probably have to tell gcc to
-foptimize-sibling-calls, then it should do fine, in Java, I don't know of
a VM that does tail-call optimisation - but then, I don't know much about
Java).
So for these, firstSum is well behaved, pretty much the best you can get.
secondSum is different, the seq there says evaluate the sum of the tail and
add that to x. Of course, for Int &c, to add x to the sum of the tail, the
latter has to be evaluated anyway, so the seq is rather pointless.
secondSum is almost equivalent to
thirdSum :: Num a => [a] -> a
thirdSum = foldr (+) 0
and it more or less corresponds to the imperative version
int secondSum(intList xs){
if (empty(xs)) return 0;
int tailsum = secondSum(xs.tail);
return (xs.head + tailsum);
}
This is not tail-recursive, so it needs O(length xs) stack and marches
twice through the list, so to say, once to the end, building the chain of
recursive calls and back to the front adding.
So for Int &c, firstSum is vastly superior in space behaviour (always uses
constant stack space and if the list isn't held in memory by other
references, also constant heap space).
Things become different when you work with lazy number types.
firstSum, being tail-recursive, can't deliver anything until it has reached
the end of the list. Keeping the accumulator evaluated to WHNF doesn't mean
much for lazy types, so the accumulator may well build up huge thunks (but,
for lazy types, evaluating a huge thunk can still run in small stack space,
so that's not necessarily catastrophic).
secondSum, on the other hand, can start delivering before the list has been
completely traversed (depends on the behaviour of (+)).
But if it can, it can probably do even better if you don't seq on the sum
of the tail, so for those types, thirdSum would be the better option.
Example for lazy number type:
data Peano
= Zero
| Succ Peano
deriving (Eq, Show)
instance Num Peano where
Zero + n = n
(Succ m) + n = Succ (m + n)
-- other methods
fromInteger n
| n <= 0 = Zero
| otherwise = Succ (fromInteger (n-1))
instance Ord Peano where
compare Zero Zero = EQ
compare Zero _ = LT
compare _ Zero = GT
compare (Succ m) (Succ n) = compare m n
Now try
list :: [Peano]
list = 4:replicate (10^5) 0
thirdSum list > 3
secondSum list > 3
firstSum list > 3
and then increase the length of the list (10^6, 10^7 instead of 10^5).
> In my opinion both functions do not
> return a complete unevaluated thunk (secondSum returns a thunk of the
> form (THUNK + b) where THUNK contains a single numeral value). But it
> seems to me that the first function computes the output somehow linear
> in the sense that it does just a set of substitutions while the second
> functions has to create a tree to handle all the recursive calls of
> seq (sorry, my terminology is for sure a bit poor).
Well, it's a flat tree, it's
secondSum (x:xs) =
case secondSum xs of
s -> x+s
where the `case' is supposed to have core semantics, i.e. evaluates the
expression to WHNF.
So
secondSum [1,2,3]
~> case secondSum [2,3] of
s1 -> 1 + s1
~> case case secondSum [3] of { s2 -> 2 + s2 } of
s1 -> 1 + s1
~> case case case secondSum [] of { s3 -> 3 + s3 } of { s2 -> 2 + s2 } of
s1 -> 1 + s1
~> case case case 0 of { s3 -> 3 + s3 } of { s2 -> 2 + s2} of
s1 -> 1 + s1
~> case case 3 of { s2 -> 2 + s2 } of
s1 -> 1 + s1
~> case 5 of
s1 -> 1 + 5
~> 6
> So I would say the
> first function delivers a better performance. (In the discussion I
> mentioned, the second function was not discussed in this form with the
> seq implementation). Am I right with my thoughts?
Pretty much.
>
> Thanks, fweth
More information about the Beginners
mailing list