[Haskell-cafe] Re: [Haskell] simple function: stack overflow in
hugs vs none in ghc
apfelmus
apfelmus at quantentunnel.de
Mon Sep 24 06:24:02 EDT 2007
john lask wrote:
>
>> test1 = readFile "big.dat" >>= (\x->print $ parse x)
>> test2 = readFile "big.dat" >>= (\x->print $ fst $ parse x)
>
> test1 (on a large file) will succeed in ghc but fail in hugs
> test2 on same file will succeed in both ghc and hugs
>
> big.dat is just some large data file say 1MB.
> (not particularly large by todays standards!)
>
> The question: is there any changes that can be made to the code to
> make test1 work in hugs without changing the essence of the function?
>
>> parse x = sqnc item x
>> where
>
>> item =( \ ts -> case ts of
>> [] -> ( Nothing, [])
>> ts -> ( Just (head ts), tail ts) )
>>
>> sqnc p ts =
>> let ( r, ts' ) = p ts in case r of
>> Nothing -> ([],ts')
>> Just x -> let (r',ts'') = (sqnc p ts') in ( x:r', ts'' )
Strange, this shouldn't happen :) You may want to try
item [] = (Nothing, [])
item (t:ts) = (Just t , ts)
but that shouldn't help ;)
Let's try to find out what's going on by doing graph reduction with our
bare hands. The preliminary material on
http://en.wikibooks.org/wiki/Haskell/Graph_reduction
should help a bit. Ideally, there would be tool support (hat? other
debugger?) but when things become too complicated, tools can only keep
you a few minutes longer above the water before drowning in complexity, too.
The main point is that (print $ parse x) and (print $ fst $ parse x)
differ in that the latter only computes the answer but not the remaining
tokens. So, the stack overflow is triggered when evaluating the
remaining tokens, but I don't see why. What happens for (print $ snd $
parse x) ?
Let's rewrite your code to figure out what's going on
item [] = (Nothing, [])
item ts = (Just (head ts), tail ts)
For sqnc , we need to translate stuff like let (a,b) = e in .
Let-bound patterns aren't explained in the wikibook and in fact they're
tricky. When done wrong, there may be space leaks, see also
J. Sparud. Fixing Some Space Leaks without a Garbage Collector.
http://citeseer.ist.psu.edu/sparud93fixing.html
I don't know whether its implemented in Hugs (probably not?) and GHC
(probably, but maybe with bugs?). We'll use the not so good translatation
let (a,b) = e in e'
<=>
let x = e; a = fst x; b = snd x; in e'
I'd like to call sqnc differently, namely many . We get
many p ts =
let z = p ts
r = fst z
ts' = snd z
in case r of
Nothing -> ([], ts')
Just x ->
let z' = many p ts'
r' = fst z'
ts''= snd z'
in (x:r', ts'')
Intimidating, no? :) Now, let's evaluate an example expression, like
many item (1:2:3:...)
(the list is intended to be finite, but we'll decide later about its
length). To preserve space and stay sane, we'll only focus on the things
that get evaluated and write ... for the rest. Let's start:
many item (1:2:3:...)
=> let ts = 1:2:3:... in
let ... z = item ts; r = fst z; ... in case r of ...
=> let ... z = (Just (head ts), tail ts); r = fst z ...
=> let ... z = (r, tail ts); r = Just (head ts) ... in case r of
The above step is not clear from the description in the wikibook, but
it's a handy notation of saying that the first component and r point to
the same thing. Expanding the case expression yields (in full form)
=> let ts = 1:2:3: ... in
let z = (r, tail ts)
r = Just x
x = head ts
ts'= snd z
in
let z' = many item ts'
r' = fst z'
ts''= snd z'
in (x:r', ts'')
This is the weak head normal form of our expression. Of course, we
wanted print (many item ts) = putStrLn (show ...) which means
evaluating the first component and then the second component in the pair
to full normal form. So, the next redex to be reduced is x followed by
r' which forces z' which at least forces ts'
=> ...
=> let ts = x:ts'
x = 1
ts' = 2:3:...
in
let z = (r, ts')
r = Just x
in
let z' = let ... in (..,..)
r' = fst z'
ts''= snd z'
in (x:r',ts'')
To stay sane, we garbage collect z and r and rename variables before
expanding the expression for z' which is obtained in the same way we
obtained it before
let ts0 = x0 : ts1
x0 = 1
ts1 = 2:3:...
z0 = let z = (r, tail ts1)
r = Just x
x = head ts1
ts'= snd z
in
let z' = many item ts'
r' = fst z'
ts''= snd z'
in (x:r', ts'')
r0 = fst z0
us0 = snd z0
in (x0:r0, us0)
Collecting lets and renaming yields
let ts0 = x0 : ts1
x0 = 1
ts1 = 2:3:...
z = (r, tail ts1)
r = Just x1
x1 = head ts1
ts' = snd z
z1 = many item ts'
r1 = fst z1
us1 = snd z1
z0 = (x1:r1, us1)
r0 = fst z0
us0 = snd z0
in (x0:r0, us0)
The insight is that the original naming was bad, r and z are quite
different from r0 and z0. Reducing r0 and x1 yields
=>
let ts0 = x0 : ts1
x0 = 1
ts1 = x1 : ts2
x1 = 2
ts2 = 3:...
z = (r, tail ts1)
r = Just x1
ts' = snd z
z1 = many item ts'
r1 = fst z1
us1 = snd z1
z0 = (r0, us1)
r0 = x1:r1
us0 = snd z0
in (x0:r0, us0)
The general scheme should be clear now: z,r and ts' are temporary
variables and further reduction of r1, r2 and so on leads to a chain
let x0 = 1; ts0 = x0 : ts1
x1 = 2; ts1 = x1 : ts2
x2 = 3; ts2 = x2 : ts3
...
x8 = ..
z = (r, tail ts8)
r = Just x8
ts' = snd z
z8 = many item ts'
r8 = fst z8
us8 = snd z8
z7 = (r7, us8)
r7 = x8:r8
us7 = snd z7
...
z0 = (r0, us1)
r0 = x1:r1
us0 = snd z0
in (x0:r0, us0)
So, after forcing the first component of the overall result to normal
form, the result looks like
(1:2:3:..., snd (_,snd (_,snd (_,...))) )
and it seems that Hugs fails to evaluate the tail recursive chain of
snd ??
In the end, here's our decisive result: either Hugs or my analysis has a
bug :D
Regards,
apfelmus
More information about the Haskell-Cafe
mailing list