Shake fails test with GHC 7.10 RC3

Neil Mitchell ndmitchell at gmail.com
Fri Mar 20 16:56:51 UTC 2015


> do you use a deepSeq when catching errors from a pure computation? Or is it
> in a context where you know the value should be treated strictly?

The computation is essentially (error "here" :: IO ()), so the IO
monadic binding should force the exception. In the full source code to
Shake it's actually in ContT/ReaderT/IO, but a modification to insert
liftIO (if done in exactly the right way) gives the same result. That
simplified code is at http://community.haskell.org/~ndm/temp/src.zip
(to run, compile with the logic from
https://github.com/ndmitchell/shake/blob/master/.ghci :benchmark, then
run "main oracle test").

> https://github.com/ndmitchell/shake/issues/216 seems to be the ticket
> in question

That is the ticket. The test that is failing is the "oracle"
(Test.Oracle) test suite, and you can see full logs of both success
and failure at Travis:
https://travis-ci.org/ndmitchell/shake/builds/54748475 . Note that it
succeeds on GHC 7.2, 7.4, 7.6 and 7.8, but fails on GHC 7.10 RC3 and
GHC Head.

Alas, the actual error is coming from deep in the middle of Shake,
specifically this line:
https://github.com/ndmitchell/shake/blob/master/src/Development/Shake/Core.hs#L329

Thinking some more on what I saw, my best guess is that:

case (\_ -> ...) of {}

Is a fatal error in GHC. I am guessing that {} means there are no
alternatives because GHC managed to show that the scrutinee throws an
exception. In addition, evaluating a literal lambda to WHNF will not
throw an exception, so you have a contradiction.

Making increasingly wild speculative guesses, I could imagine this
code compiling to CMM that just fell off the end of a basic block,
arriving at the next statement of supposedly unconnected code. That
actually matches what I'm seeing - I have the exception, using print I
can see I am at the line before, and that I never get to the line
after, but pop up somewhere nearby and continue onwards seemingly
ignoring the exception ever happened.

If someone can confirm the above pattern of Core is always illegal and
always indicates a GHC error then I can reduce the test case driven
purely by the Core, which would be far easier - at the moment Shake
passes through these routines several times to set things up, making
it harder to simplify them too much without changing the preparation
steps.

Thanks, Neil


>>
>> On Fri, Mar 20, 2015 at 12:01 AM, Neil Mitchell <ndmitchell at gmail.com>
>> wrote:
>> > More delving later, it seems the incorrect optimized version has been
>> > turned into:
>> >
>> > case (\ _ [Occ=Dead, OS=OneShot] -> error "here") of _ [Occ=Dead] {}
>> >
>> > While the working one has been turned into:
>> >
>> > errorFunc argument realWorldToken
>> >
>> > where errorFunc _ = error "here"
>> >
>> > I'm not familiar with case ... of _ {} - what does it mean when there
>> > are no alternatives to the case? And why isn't case on a literal
>> > lambda optimised out? Is it the OneShot annotation (perhaps coming
>> > from the state hack?)
>> >
>> > The full trace is at
>> > https://gist.github.com/ndmitchell/b222e04eb0c3a397c758. I've uploaded
>> > bad (optimises the error out) and good (works as expected) versions of
>> > the Core. The summary files are the subexpression that changed making
>> > a single difference (moving a monomorphic NOINLINE function from one
>> > module to another) plus the handful of functions they depend on, which
>> > I've reformatted/inlined/simplified to produce the expressions above.
>> > The full versions are the entire -ddump-simpl output from about
>> > halfway through the build, starting when the differences occur - let
>> > me know if you need further back. The dodgy function is "exec".
>> >
>> > Thanks, Neil
>> >
>> > On Thu, Mar 19, 2015 at 11:07 PM, Neil Mitchell <ndmitchell at gmail.com>
>> > wrote:
>> >> Herbert, thanks for the list of patches, nothing obvious there - my
>> >> best guess is it's something incredibly sensitive and it only needs
>> >> the tiniest change anywhere to make it happen. Things like moving
>> >> NOINLINE monomorphic-type definitions from one module to another are
>> >> causing the bug to appear/disappear, which isn't what I'd expect.
>> >>
>> >> Simon, changing from error to error in IO causes the bug to disappear,
>> >> but then so do most things. The error return type is type IO (), so I
>> >> suspect that forces it to be raised at the right place - but it's
>> >> certainly one of the possibilities for what is going wrong. Diffing
>> >> the Core is a great idea.
>> >>
>> >> I'll keep reducing and see what I get to. Given the sensitivity of the
>> >> bug, I'm sure a NOINLINE on an out-of-the-way function will make it go
>> >> away, so I can easily fix Shake itself - so I'm more tracking it down
>> >> from the point of GHC now.
>> >>
>> >> Thanks, Neil
>> >>
>> >>
>> >> On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
>> >> <simonpj at microsoft.com> wrote:
>> >>> I'm really sorry but I can't think of anything.  Sounds horrible.
>> >>>
>> >>> If you throw exceptions using 'error' (not in IO), then you are of
>> >>> course vulnerable to strictness changes.  If the thing isn't actually
>> >>> evaluated inside the catch block, you won't see the exception.  But I'm sure
>> >>> you've thought of that.
>> >>>
>> >>> I'd experiment with one of the smaller changes you describe, such as
>> >>> adding a putStrLn, and comparing Core, before and after.  Switching off -O
>> >>> will make a huge difference, so hard to compare.  Turning off the state hack
>> >>> will have a more global effect.  But the other changes sound more pin-point
>> >>> and hence the differences will be smaller.
>> >>>
>> >>> Simon
>> >>>
>> >>> |  -----Original Message-----
>> >>> |  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
>> >>> Neil
>> >>> |  Mitchell
>> >>> |  Sent: 18 March 2015 15:33
>> >>> |  To: ghc-devs at haskell.org
>> >>> |  Subject: Shake fails test with GHC 7.10 RC3
>> >>> |
>> >>> |  Hi,
>> >>> |
>> >>> |  Testing GHC 7.10 RC3 I've found a bug where Shake seems to catch
>> >>> the
>> >>> |  wrong exception in the wrong place. It's only hit by one of my
>> >>> tests,
>> >>> |  and I've managed to isolate it to a fragment of code with no
>> >>> |  unsafePerformIO, that throws exceptions using error (so not in IO),
>> >>> and
>> >>> |  operates in IO. Turning off the stack hack makes the bug go away,
>> >>> but
>> >>> |  then so does -O0, marking one of the functions it calls NOINLINE,
>> >>> or
>> >>> |  moving an INLINE function it calls to a different module, or adding
>> >>> a
>> >>> |  putStrLn under a catch block - it's very sensitive to the exact
>> >>> |  conditions. This test and this exact code worked fine with GHC 7.10
>> >>> |  RC2.
>> >>> |
>> >>> |  I was wondering if there have been any state hack related changes
>> >>> or
>> >>> |  other potentially dangerous optimisation changes since RC2? I'll
>> >>> |  continue to try reducing the bug, but it's somewhat difficult as
>> >>> the
>> >>> |  larger system is quite big, and the code is very sensitive.
>> >>> |
>> >>> |  Thanks, Neil
>> >>> |  _______________________________________________
>> >>> |  ghc-devs mailing list
>> >>> |  ghc-devs at haskell.org
>> >>> |  http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>> >
>> > On Thu, Mar 19, 2015 at 11:24 PM, Simon Peyton Jones
>> > <simonpj at microsoft.com> wrote:
>> >> Thanks!  I think a -ddump-simpl before and after the smallest change
>> >> the makes the difference would be illuminating.
>> >>
>> >> Simon
>> >>
>> >> | -----Original Message-----
>> >> | From: Neil Mitchell [mailto:ndmitchell at gmail.com]
>> >> | Sent: 19 March 2015 23:07
>> >> | To: Simon Peyton Jones
>> >> | Cc: ghc-devs at haskell.org
>> >> | Subject: Re: Shake fails test with GHC 7.10 RC3
>> >> |
>> >> | Herbert, thanks for the list of patches, nothing obvious there - my
>> >> | best guess is it's something incredibly sensitive and it only needs
>> >> | the tiniest change anywhere to make it happen. Things like moving
>> >> | NOINLINE monomorphic-type definitions from one module to another are
>> >> | causing the bug to appear/disappear, which isn't what I'd expect.
>> >> |
>> >> | Simon, changing from error to error in IO causes the bug to
>> >> disappear,
>> >> | but then so do most things. The error return type is type IO (), so I
>> >> | suspect that forces it to be raised at the right place - but it's
>> >> | certainly one of the possibilities for what is going wrong. Diffing
>> >> | the Core is a great idea.
>> >> |
>> >> | I'll keep reducing and see what I get to. Given the sensitivity of
>> >> the
>> >> | bug, I'm sure a NOINLINE on an out-of-the-way function will make it
>> >> go
>> >> | away, so I can easily fix Shake itself - so I'm more tracking it down
>> >> | from the point of GHC now.
>> >> |
>> >> | Thanks, Neil
>> >> |
>> >> |
>> >> | On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
>> >> | <simonpj at microsoft.com> wrote:
>> >> | > I'm really sorry but I can't think of anything.  Sounds horrible.
>> >> | >
>> >> | > If you throw exceptions using 'error' (not in IO), then you are of
>> >> | course vulnerable to strictness changes.  If the thing isn't actually
>> >> | evaluated inside the catch block, you won't see the exception.  But
>> >> I'm
>> >> | sure you've thought of that.
>> >> | >
>> >> | > I'd experiment with one of the smaller changes you describe, such
>> >> as
>> >> | adding a putStrLn, and comparing Core, before and after.  Switching
>> >> off -
>> >> | O will make a huge difference, so hard to compare.  Turning off the
>> >> state
>> >> | hack will have a more global effect.  But the other changes sound
>> >> more
>> >> | pin-point and hence the differences will be smaller.
>> >> | >
>> >> | > Simon
>> >> | >
>> >> | > |  -----Original Message-----
>> >> | > |  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf
>> >> Of
>> >> | Neil
>> >> | > |  Mitchell
>> >> | > |  Sent: 18 March 2015 15:33
>> >> | > |  To: ghc-devs at haskell.org
>> >> | > |  Subject: Shake fails test with GHC 7.10 RC3
>> >> | > |
>> >> | > |  Hi,
>> >> | > |
>> >> | > |  Testing GHC 7.10 RC3 I've found a bug where Shake seems to catch
>> >> the
>> >> | > |  wrong exception in the wrong place. It's only hit by one of my
>> >> | tests,
>> >> | > |  and I've managed to isolate it to a fragment of code with no
>> >> | > |  unsafePerformIO, that throws exceptions using error (so not in
>> >> IO),
>> >> | and
>> >> | > |  operates in IO. Turning off the stack hack makes the bug go
>> >> away,
>> >> | but
>> >> | > |  then so does -O0, marking one of the functions it calls
>> >> NOINLINE, or
>> >> | > |  moving an INLINE function it calls to a different module, or
>> >> adding
>> >> | a
>> >> | > |  putStrLn under a catch block - it's very sensitive to the exact
>> >> | > |  conditions. This test and this exact code worked fine with GHC
>> >> 7.10
>> >> | > |  RC2.
>> >> | > |
>> >> | > |  I was wondering if there have been any state hack related
>> >> changes or
>> >> | > |  other potentially dangerous optimisation changes since RC2? I'll
>> >> | > |  continue to try reducing the bug, but it's somewhat difficult as
>> >> the
>> >> | > |  larger system is quite big, and the code is very sensitive.
>> >> | > |
>> >> | > |  Thanks, Neil
>> >> | > |  _______________________________________________
>> >> | > |  ghc-devs mailing list
>> >> | > |  ghc-devs at haskell.org
>> >> | > |  http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>


More information about the ghc-devs mailing list