Shake fails test with GHC 7.10 RC3
Simon Peyton Jones
simonpj at microsoft.com
Fri Mar 20 17:19:09 UTC 2015
Neil
Could you open a ticket for this, and attach the info to it? Lots of useful info in this thread, but may get lost.
Re empty cases, see Note [Empty case alternatives] in CoreSyn. "A case expression can have empty alternatives if (and only if) the scrutinee is bound to raise an exception or diverge."
So yes, case (\_ -> ...) of {} is definitely wrong. Could someone extend Core Lint to check for this? The function to call is exprIsHNF: if we have
case e of {}
then exprIsHNF should never return True.
That is a powerful clue, and adding it to Lint will nail the moment at which it first happens. Good progress.
(I'd do it myself but I'm struggling with backlog, and only have 3 days left before going on holiday for 10 days.)
Simon
| -----Original Message-----
| From: Neil Mitchell [mailto:ndmitchell at gmail.com]
| Sent: 20 March 2015 16:57
| To: Carter Schonwald
| Cc: Simon Peyton Jones; ghc-devs at haskell.org
| Subject: Re: Shake fails test with GHC 7.10 RC3
|
| > do you use a deepSeq when catching errors from a pure computation? Or
| > is it in a context where you know the value should be treated strictly?
|
| The computation is essentially (error "here" :: IO ()), so the IO monadic
| binding should force the exception. In the full source code to Shake it's
| actually in ContT/ReaderT/IO, but a modification to insert liftIO (if
| done in exactly the right way) gives the same result. That simplified
| code is at http://community.haskell.org/~ndm/temp/src.zip
| (to run, compile with the logic from
| https://github.com/ndmitchell/shake/blob/master/.ghci :benchmark, then
| run "main oracle test").
|
| > https://github.com/ndmitchell/shake/issues/216 seems to be the ticket
| > in question
|
| That is the ticket. The test that is failing is the "oracle"
| (Test.Oracle) test suite, and you can see full logs of both success and
| failure at Travis:
| https://travis-ci.org/ndmitchell/shake/builds/54748475 . Note that it
| succeeds on GHC 7.2, 7.4, 7.6 and 7.8, but fails on GHC 7.10 RC3 and GHC
| Head.
|
| Alas, the actual error is coming from deep in the middle of Shake,
| specifically this line:
| https://github.com/ndmitchell/shake/blob/master/src/Development/Shake/Cor
| e.hs#L329
|
| Thinking some more on what I saw, my best guess is that:
|
| case (\_ -> ...) of {}
|
| Is a fatal error in GHC. I am guessing that {} means there are no
| alternatives because GHC managed to show that the scrutinee throws an
| exception. In addition, evaluating a literal lambda to WHNF will not
| throw an exception, so you have a contradiction.
|
| Making increasingly wild speculative guesses, I could imagine this code
| compiling to CMM that just fell off the end of a basic block, arriving at
| the next statement of supposedly unconnected code. That actually matches
| what I'm seeing - I have the exception, using print I can see I am at the
| line before, and that I never get to the line after, but pop up somewhere
| nearby and continue onwards seemingly ignoring the exception ever
| happened.
|
| If someone can confirm the above pattern of Core is always illegal and
| always indicates a GHC error then I can reduce the test case driven
| purely by the Core, which would be far easier - at the moment Shake
| passes through these routines several times to set things up, making it
| harder to simplify them too much without changing the preparation steps.
|
| Thanks, Neil
|
|
| >>
| >> On Fri, Mar 20, 2015 at 12:01 AM, Neil Mitchell
| >> <ndmitchell at gmail.com>
| >> wrote:
| >> > More delving later, it seems the incorrect optimized version has
| >> > been turned into:
| >> >
| >> > case (\ _ [Occ=Dead, OS=OneShot] -> error "here") of _ [Occ=Dead]
| >> > {}
| >> >
| >> > While the working one has been turned into:
| >> >
| >> > errorFunc argument realWorldToken
| >> >
| >> > where errorFunc _ = error "here"
| >> >
| >> > I'm not familiar with case ... of _ {} - what does it mean when
| >> > there are no alternatives to the case? And why isn't case on a
| >> > literal lambda optimised out? Is it the OneShot annotation (perhaps
| >> > coming from the state hack?)
| >> >
| >> > The full trace is at
| >> > https://gist.github.com/ndmitchell/b222e04eb0c3a397c758. I've
| >> > uploaded bad (optimises the error out) and good (works as expected)
| >> > versions of the Core. The summary files are the subexpression that
| >> > changed making a single difference (moving a monomorphic NOINLINE
| >> > function from one module to another) plus the handful of functions
| >> > they depend on, which I've reformatted/inlined/simplified to produce
| the expressions above.
| >> > The full versions are the entire -ddump-simpl output from about
| >> > halfway through the build, starting when the differences occur -
| >> > let me know if you need further back. The dodgy function is "exec".
| >> >
| >> > Thanks, Neil
| >> >
| >> > On Thu, Mar 19, 2015 at 11:07 PM, Neil Mitchell
| >> > <ndmitchell at gmail.com>
| >> > wrote:
| >> >> Herbert, thanks for the list of patches, nothing obvious there -
| >> >> my best guess is it's something incredibly sensitive and it only
| >> >> needs the tiniest change anywhere to make it happen. Things like
| >> >> moving NOINLINE monomorphic-type definitions from one module to
| >> >> another are causing the bug to appear/disappear, which isn't what
| I'd expect.
| >> >>
| >> >> Simon, changing from error to error in IO causes the bug to
| >> >> disappear, but then so do most things. The error return type is
| >> >> type IO (), so I suspect that forces it to be raised at the right
| >> >> place - but it's certainly one of the possibilities for what is
| >> >> going wrong. Diffing the Core is a great idea.
| >> >>
| >> >> I'll keep reducing and see what I get to. Given the sensitivity of
| >> >> the bug, I'm sure a NOINLINE on an out-of-the-way function will
| >> >> make it go away, so I can easily fix Shake itself - so I'm more
| >> >> tracking it down from the point of GHC now.
| >> >>
| >> >> Thanks, Neil
| >> >>
| >> >>
| >> >> On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
| >> >> <simonpj at microsoft.com> wrote:
| >> >>> I'm really sorry but I can't think of anything. Sounds horrible.
| >> >>>
| >> >>> If you throw exceptions using 'error' (not in IO), then you are
| >> >>> of course vulnerable to strictness changes. If the thing isn't
| >> >>> actually evaluated inside the catch block, you won't see the
| >> >>> exception. But I'm sure you've thought of that.
| >> >>>
| >> >>> I'd experiment with one of the smaller changes you describe, such
| >> >>> as adding a putStrLn, and comparing Core, before and after.
| >> >>> Switching off -O will make a huge difference, so hard to compare.
| >> >>> Turning off the state hack will have a more global effect. But
| >> >>> the other changes sound more pin-point and hence the differences
| will be smaller.
| >> >>>
| >> >>> Simon
| >> >>>
| >> >>> | -----Original Message-----
| >> >>> | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf
| >> >>> | Of
| >> >>> Neil
| >> >>> | Mitchell
| >> >>> | Sent: 18 March 2015 15:33
| >> >>> | To: ghc-devs at haskell.org
| >> >>> | Subject: Shake fails test with GHC 7.10 RC3
| >> >>> |
| >> >>> | Hi,
| >> >>> |
| >> >>> | Testing GHC 7.10 RC3 I've found a bug where Shake seems to
| >> >>> | catch
| >> >>> the
| >> >>> | wrong exception in the wrong place. It's only hit by one of my
| >> >>> tests,
| >> >>> | and I've managed to isolate it to a fragment of code with no
| >> >>> | unsafePerformIO, that throws exceptions using error (so not in
| >> >>> | IO),
| >> >>> and
| >> >>> | operates in IO. Turning off the stack hack makes the bug go
| >> >>> | away,
| >> >>> but
| >> >>> | then so does -O0, marking one of the functions it calls
| >> >>> | NOINLINE,
| >> >>> or
| >> >>> | moving an INLINE function it calls to a different module, or
| >> >>> | adding
| >> >>> a
| >> >>> | putStrLn under a catch block - it's very sensitive to the
| >> >>> | exact conditions. This test and this exact code worked fine
| >> >>> | with GHC 7.10 RC2.
| >> >>> |
| >> >>> | I was wondering if there have been any state hack related
| >> >>> | changes
| >> >>> or
| >> >>> | other potentially dangerous optimisation changes since RC2?
| >> >>> | I'll continue to try reducing the bug, but it's somewhat
| >> >>> | difficult as
| >> >>> the
| >> >>> | larger system is quite big, and the code is very sensitive.
| >> >>> |
| >> >>> | Thanks, Neil
| >> >>> | _______________________________________________
| >> >>> | ghc-devs mailing list
| >> >>> | ghc-devs at haskell.org
| >> >>> | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
| >> >
| >> > On Thu, Mar 19, 2015 at 11:24 PM, Simon Peyton Jones
| >> > <simonpj at microsoft.com> wrote:
| >> >> Thanks! I think a -ddump-simpl before and after the smallest
| >> >> change the makes the difference would be illuminating.
| >> >>
| >> >> Simon
| >> >>
| >> >> | -----Original Message-----
| >> >> | From: Neil Mitchell [mailto:ndmitchell at gmail.com]
| >> >> | Sent: 19 March 2015 23:07
| >> >> | To: Simon Peyton Jones
| >> >> | Cc: ghc-devs at haskell.org
| >> >> | Subject: Re: Shake fails test with GHC 7.10 RC3
| >> >> |
| >> >> | Herbert, thanks for the list of patches, nothing obvious there -
| >> >> | my best guess is it's something incredibly sensitive and it only
| >> >> | needs the tiniest change anywhere to make it happen. Things like
| >> >> | moving NOINLINE monomorphic-type definitions from one module to
| >> >> | another are causing the bug to appear/disappear, which isn't what
| I'd expect.
| >> >> |
| >> >> | Simon, changing from error to error in IO causes the bug to
| >> >> disappear,
| >> >> | but then so do most things. The error return type is type IO (),
| >> >> | so I suspect that forces it to be raised at the right place -
| >> >> | but it's certainly one of the possibilities for what is going
| >> >> | wrong. Diffing the Core is a great idea.
| >> >> |
| >> >> | I'll keep reducing and see what I get to. Given the sensitivity
| >> >> | of
| >> >> the
| >> >> | bug, I'm sure a NOINLINE on an out-of-the-way function will make
| >> >> | it
| >> >> go
| >> >> | away, so I can easily fix Shake itself - so I'm more tracking it
| >> >> | down from the point of GHC now.
| >> >> |
| >> >> | Thanks, Neil
| >> >> |
| >> >> |
| >> >> | On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
| >> >> | <simonpj at microsoft.com> wrote:
| >> >> | > I'm really sorry but I can't think of anything. Sounds
| horrible.
| >> >> | >
| >> >> | > If you throw exceptions using 'error' (not in IO), then you
| >> >> | > are of
| >> >> | course vulnerable to strictness changes. If the thing isn't
| >> >> | actually evaluated inside the catch block, you won't see the
| >> >> | exception. But
| >> >> I'm
| >> >> | sure you've thought of that.
| >> >> | >
| >> >> | > I'd experiment with one of the smaller changes you describe,
| >> >> | > such
| >> >> as
| >> >> | adding a putStrLn, and comparing Core, before and after.
| >> >> | Switching
| >> >> off -
| >> >> | O will make a huge difference, so hard to compare. Turning off
| >> >> | the
| >> >> state
| >> >> | hack will have a more global effect. But the other changes
| >> >> | sound
| >> >> more
| >> >> | pin-point and hence the differences will be smaller.
| >> >> | >
| >> >> | > Simon
| >> >> | >
| >> >> | > | -----Original Message-----
| >> >> | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
| >> >> | > | Behalf
| >> >> Of
| >> >> | Neil
| >> >> | > | Mitchell
| >> >> | > | Sent: 18 March 2015 15:33
| >> >> | > | To: ghc-devs at haskell.org
| >> >> | > | Subject: Shake fails test with GHC 7.10 RC3
| >> >> | > |
| >> >> | > | Hi,
| >> >> | > |
| >> >> | > | Testing GHC 7.10 RC3 I've found a bug where Shake seems to
| >> >> | > | catch
| >> >> the
| >> >> | > | wrong exception in the wrong place. It's only hit by one of
| >> >> | > | my
| >> >> | tests,
| >> >> | > | and I've managed to isolate it to a fragment of code with
| >> >> | > | no unsafePerformIO, that throws exceptions using error (so
| >> >> | > | not in
| >> >> IO),
| >> >> | and
| >> >> | > | operates in IO. Turning off the stack hack makes the bug go
| >> >> away,
| >> >> | but
| >> >> | > | then so does -O0, marking one of the functions it calls
| >> >> NOINLINE, or
| >> >> | > | moving an INLINE function it calls to a different module,
| >> >> | > | or
| >> >> adding
| >> >> | a
| >> >> | > | putStrLn under a catch block - it's very sensitive to the
| >> >> | > | exact conditions. This test and this exact code worked fine
| >> >> | > | with GHC
| >> >> 7.10
| >> >> | > | RC2.
| >> >> | > |
| >> >> | > | I was wondering if there have been any state hack related
| >> >> changes or
| >> >> | > | other potentially dangerous optimisation changes since RC2?
| >> >> | > | I'll continue to try reducing the bug, but it's somewhat
| >> >> | > | difficult as
| >> >> the
| >> >> | > | larger system is quite big, and the code is very sensitive.
| >> >> | > |
| >> >> | > | Thanks, Neil
| >> >> | > | _______________________________________________
| >> >> | > | ghc-devs mailing list
| >> >> | > | ghc-devs at haskell.org
| >> >> | > | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
| >> _______________________________________________
| >> ghc-devs mailing list
| >> ghc-devs at haskell.org
| >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
| >
| >
More information about the ghc-devs
mailing list