[GHC] #7919: Heap corruption (segfault) from large 'let' expression
GHC
cvs-ghc at haskell.org
Fri May 17 22:31:16 CEST 2013
#7919: Heap corruption (segfault) from large 'let' expression
--------------------------+-------------------------------------------------
Reporter: duncan | Owner:
Type: bug | Status: new
Priority: normal | Component: Runtime System
Version: 7.6.3 | Keywords:
Os: Linux | Architecture: x86_64 (amd64)
Failure: Runtime crash | Blockedby:
Blocking: | Related:
--------------------------+-------------------------------------------------
The attached test program reliably triggers an assertion in the storage
manager with the `-debug` rts.
{{{
LargeUse: internal error: ASSERTION FAILED: file rts/sm/GCUtils.c, line
208
(GHC version 7.6.3 for x86_64_unknown_linux)
Please report this as a GHC bug:
http://www.haskell.org/ghc/reportabug
}}}
This behaviour is reproducible with many recent ghc versions (tried 7.6.3,
7.4.2, 6.12.3) and all fail at the same assertion when using the `-debug`
rts. (Without `-debug` we get a more random variety of segfaults and GC
errors.)
It looks like a pretty clear case of heap corruption. I'll explain why...
The test program uses TH to generate a program that looks like this:
{{{
data Large = Large Int Int ... -- 512 non-strict Int fields
test =
let step (Large i1 i2 ... i512) =
let j1 = i1 + i4
j2 = i2 + i7
...
j511 = i511 + i510
j512 = i512 + i1
in Large j1 j2 ... j512
in runSteps step 100000 (Large 1 1 ... 1)
-- basically an unfoldr:
runSteps :: (state -> (state, Int)) -> Int -> state -> [Int]
runSteps f n i | n <= 0 = []
| otherwise = case f i of
(i', r) -> r : runSteps f (n - 1) i'
}}}
We use TH to generate this program, and we use a "size" parameter that
determines size of the data constructor (and corresponding letrec). This
makes it easy to find the size threshold where it fails.
For small sizes this program works fine, and for larger values it triggers
the assert. With ghc 7.6.3 on a x86-64 machine, the magic threshold is
511: that is, the program works fine with size 510 and hits the assertion
at size 511. This is suspiciously close to 512. And of course on a 64bit
machine 512 * 8 is 4k, which is the storage manager's block size. And the
failing assertion is in a bit of the storage manager that is dealing with
blocks...
{{{
// If this block does not have enough space to allocate the
// current object, but it also doesn't have any work to push, then
// push it on to the scanned list. It cannot be empty, because
// then there would be enough room to copy the current object.
if (bd->u.scan == bd->free)
{
ASSERT(bd->free != bd->start);
push_scanned_block(bd, ws);
}
}}}
So it looks very much like we have a situation where something is writing
over the end of a block and messing up the SM's data structures.
But, it is not nearly as simple as the data constructor being too big. We
can demonstrate other programs that use much larger data constructors
without any problem at all. Our suspicion falls on the big letrec.
Indeed with this program if we change the data constructor to have strict
fields then it no longer fails, and we can run it with much larger data
constructor sizes. What would be different between strict and non-strict
fields here? Well, one observation is that when it is strict then ghc can
(and does) turn the code into a big cascade of case expressions, while
when it is non-strict then the STG code is all 'let's.
{{{
case tpl_s6jQ of tpl_s6Ak {
__DEFAULT ->
case tpl_s6jS of tpl_s6Al {
__DEFAULT ->
case tpl_s6jU of tpl_s6Am {
-- etc for all 500+ elements
}}}
versus
{{{
let {
sat_s5UK :: GHC.Types.Int
[LclId] =
\u [] GHC.Num.$fNumInt_$c+ i511_s5Ly i1_s5E9; } in
let {
sat_s62X :: GHC.Types.Int
[LclId] =
\u [] GHC.Num.$fNumInt_$c+ i510_s5R2 i509_s5UG; } in
let {
sat_s62W :: GHC.Types.Int
[LclId] =
\u [] GHC.Num.$fNumInt_$c+ i509_s5UG i506_s5UC; } in
-- etc for all 500+ elements
}}}
Note also, that it is nothing to do with the obvious space leak here. If
we modify the code to generate an NFData instance and to use deepseq at
each iteration then we eliminate the space leak, but we keep the big stg
'let', and it still fails.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/7919>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list