Shake fails test with GHC 7.10 RC3

Simon Peyton Jones simonpj at microsoft.com
Fri Mar 20 17:19:09 UTC 2015


Neil

Could you open a ticket for this, and attach the info to it?  Lots of useful info in this thread, but may get lost.

Re empty cases, see Note [Empty case alternatives] in CoreSyn.  "A case expression can have empty alternatives if (and only if) the scrutinee is bound to raise an exception or diverge."

So yes, case (\_ -> ...) of {} is definitely wrong.  Could someone extend Core Lint to check for this?  The function to call is exprIsHNF: if we have 
	case e of {}
then exprIsHNF should never return True.

That is a powerful clue, and adding it to Lint will nail the moment at which it first happens.  Good progress.

(I'd do it myself but I'm struggling with backlog, and only have 3 days left before going on holiday for 10 days.)

Simon

|  -----Original Message-----
|  From: Neil Mitchell [mailto:ndmitchell at gmail.com]
|  Sent: 20 March 2015 16:57
|  To: Carter Schonwald
|  Cc: Simon Peyton Jones; ghc-devs at haskell.org
|  Subject: Re: Shake fails test with GHC 7.10 RC3
|  
|  > do you use a deepSeq when catching errors from a pure computation? Or
|  > is it in a context where you know the value should be treated strictly?
|  
|  The computation is essentially (error "here" :: IO ()), so the IO monadic
|  binding should force the exception. In the full source code to Shake it's
|  actually in ContT/ReaderT/IO, but a modification to insert liftIO (if
|  done in exactly the right way) gives the same result. That simplified
|  code is at http://community.haskell.org/~ndm/temp/src.zip
|  (to run, compile with the logic from
|  https://github.com/ndmitchell/shake/blob/master/.ghci :benchmark, then
|  run "main oracle test").
|  
|  > https://github.com/ndmitchell/shake/issues/216 seems to be the ticket
|  > in question
|  
|  That is the ticket. The test that is failing is the "oracle"
|  (Test.Oracle) test suite, and you can see full logs of both success and
|  failure at Travis:
|  https://travis-ci.org/ndmitchell/shake/builds/54748475 . Note that it
|  succeeds on GHC 7.2, 7.4, 7.6 and 7.8, but fails on GHC 7.10 RC3 and GHC
|  Head.
|  
|  Alas, the actual error is coming from deep in the middle of Shake,
|  specifically this line:
|  https://github.com/ndmitchell/shake/blob/master/src/Development/Shake/Cor
|  e.hs#L329
|  
|  Thinking some more on what I saw, my best guess is that:
|  
|  case (\_ -> ...) of {}
|  
|  Is a fatal error in GHC. I am guessing that {} means there are no
|  alternatives because GHC managed to show that the scrutinee throws an
|  exception. In addition, evaluating a literal lambda to WHNF will not
|  throw an exception, so you have a contradiction.
|  
|  Making increasingly wild speculative guesses, I could imagine this code
|  compiling to CMM that just fell off the end of a basic block, arriving at
|  the next statement of supposedly unconnected code. That actually matches
|  what I'm seeing - I have the exception, using print I can see I am at the
|  line before, and that I never get to the line after, but pop up somewhere
|  nearby and continue onwards seemingly ignoring the exception ever
|  happened.
|  
|  If someone can confirm the above pattern of Core is always illegal and
|  always indicates a GHC error then I can reduce the test case driven
|  purely by the Core, which would be far easier - at the moment Shake
|  passes through these routines several times to set things up, making it
|  harder to simplify them too much without changing the preparation steps.
|  
|  Thanks, Neil
|  
|  
|  >>
|  >> On Fri, Mar 20, 2015 at 12:01 AM, Neil Mitchell
|  >> <ndmitchell at gmail.com>
|  >> wrote:
|  >> > More delving later, it seems the incorrect optimized version has
|  >> > been turned into:
|  >> >
|  >> > case (\ _ [Occ=Dead, OS=OneShot] -> error "here") of _ [Occ=Dead]
|  >> > {}
|  >> >
|  >> > While the working one has been turned into:
|  >> >
|  >> > errorFunc argument realWorldToken
|  >> >
|  >> > where errorFunc _ = error "here"
|  >> >
|  >> > I'm not familiar with case ... of _ {} - what does it mean when
|  >> > there are no alternatives to the case? And why isn't case on a
|  >> > literal lambda optimised out? Is it the OneShot annotation (perhaps
|  >> > coming from the state hack?)
|  >> >
|  >> > The full trace is at
|  >> > https://gist.github.com/ndmitchell/b222e04eb0c3a397c758. I've
|  >> > uploaded bad (optimises the error out) and good (works as expected)
|  >> > versions of the Core. The summary files are the subexpression that
|  >> > changed making a single difference (moving a monomorphic NOINLINE
|  >> > function from one module to another) plus the handful of functions
|  >> > they depend on, which I've reformatted/inlined/simplified to produce
|  the expressions above.
|  >> > The full versions are the entire -ddump-simpl output from about
|  >> > halfway through the build, starting when the differences occur -
|  >> > let me know if you need further back. The dodgy function is "exec".
|  >> >
|  >> > Thanks, Neil
|  >> >
|  >> > On Thu, Mar 19, 2015 at 11:07 PM, Neil Mitchell
|  >> > <ndmitchell at gmail.com>
|  >> > wrote:
|  >> >> Herbert, thanks for the list of patches, nothing obvious there -
|  >> >> my best guess is it's something incredibly sensitive and it only
|  >> >> needs the tiniest change anywhere to make it happen. Things like
|  >> >> moving NOINLINE monomorphic-type definitions from one module to
|  >> >> another are causing the bug to appear/disappear, which isn't what
|  I'd expect.
|  >> >>
|  >> >> Simon, changing from error to error in IO causes the bug to
|  >> >> disappear, but then so do most things. The error return type is
|  >> >> type IO (), so I suspect that forces it to be raised at the right
|  >> >> place - but it's certainly one of the possibilities for what is
|  >> >> going wrong. Diffing the Core is a great idea.
|  >> >>
|  >> >> I'll keep reducing and see what I get to. Given the sensitivity of
|  >> >> the bug, I'm sure a NOINLINE on an out-of-the-way function will
|  >> >> make it go away, so I can easily fix Shake itself - so I'm more
|  >> >> tracking it down from the point of GHC now.
|  >> >>
|  >> >> Thanks, Neil
|  >> >>
|  >> >>
|  >> >> On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
|  >> >> <simonpj at microsoft.com> wrote:
|  >> >>> I'm really sorry but I can't think of anything.  Sounds horrible.
|  >> >>>
|  >> >>> If you throw exceptions using 'error' (not in IO), then you are
|  >> >>> of course vulnerable to strictness changes.  If the thing isn't
|  >> >>> actually evaluated inside the catch block, you won't see the
|  >> >>> exception.  But I'm sure you've thought of that.
|  >> >>>
|  >> >>> I'd experiment with one of the smaller changes you describe, such
|  >> >>> as adding a putStrLn, and comparing Core, before and after.
|  >> >>> Switching off -O will make a huge difference, so hard to compare.
|  >> >>> Turning off the state hack will have a more global effect.  But
|  >> >>> the other changes sound more pin-point and hence the differences
|  will be smaller.
|  >> >>>
|  >> >>> Simon
|  >> >>>
|  >> >>> |  -----Original Message-----
|  >> >>> |  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf
|  >> >>> | Of
|  >> >>> Neil
|  >> >>> |  Mitchell
|  >> >>> |  Sent: 18 March 2015 15:33
|  >> >>> |  To: ghc-devs at haskell.org
|  >> >>> |  Subject: Shake fails test with GHC 7.10 RC3
|  >> >>> |
|  >> >>> |  Hi,
|  >> >>> |
|  >> >>> |  Testing GHC 7.10 RC3 I've found a bug where Shake seems to
|  >> >>> | catch
|  >> >>> the
|  >> >>> |  wrong exception in the wrong place. It's only hit by one of my
|  >> >>> tests,
|  >> >>> |  and I've managed to isolate it to a fragment of code with no
|  >> >>> | unsafePerformIO, that throws exceptions using error (so not in
|  >> >>> | IO),
|  >> >>> and
|  >> >>> |  operates in IO. Turning off the stack hack makes the bug go
|  >> >>> | away,
|  >> >>> but
|  >> >>> |  then so does -O0, marking one of the functions it calls
|  >> >>> | NOINLINE,
|  >> >>> or
|  >> >>> |  moving an INLINE function it calls to a different module, or
|  >> >>> | adding
|  >> >>> a
|  >> >>> |  putStrLn under a catch block - it's very sensitive to the
|  >> >>> | exact  conditions. This test and this exact code worked fine
|  >> >>> | with GHC 7.10  RC2.
|  >> >>> |
|  >> >>> |  I was wondering if there have been any state hack related
|  >> >>> | changes
|  >> >>> or
|  >> >>> |  other potentially dangerous optimisation changes since RC2?
|  >> >>> | I'll  continue to try reducing the bug, but it's somewhat
|  >> >>> | difficult as
|  >> >>> the
|  >> >>> |  larger system is quite big, and the code is very sensitive.
|  >> >>> |
|  >> >>> |  Thanks, Neil
|  >> >>> |  _______________________________________________
|  >> >>> |  ghc-devs mailing list
|  >> >>> |  ghc-devs at haskell.org
|  >> >>> |  http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
|  >> >
|  >> > On Thu, Mar 19, 2015 at 11:24 PM, Simon Peyton Jones
|  >> > <simonpj at microsoft.com> wrote:
|  >> >> Thanks!  I think a -ddump-simpl before and after the smallest
|  >> >> change the makes the difference would be illuminating.
|  >> >>
|  >> >> Simon
|  >> >>
|  >> >> | -----Original Message-----
|  >> >> | From: Neil Mitchell [mailto:ndmitchell at gmail.com]
|  >> >> | Sent: 19 March 2015 23:07
|  >> >> | To: Simon Peyton Jones
|  >> >> | Cc: ghc-devs at haskell.org
|  >> >> | Subject: Re: Shake fails test with GHC 7.10 RC3
|  >> >> |
|  >> >> | Herbert, thanks for the list of patches, nothing obvious there -
|  >> >> | my best guess is it's something incredibly sensitive and it only
|  >> >> | needs the tiniest change anywhere to make it happen. Things like
|  >> >> | moving NOINLINE monomorphic-type definitions from one module to
|  >> >> | another are causing the bug to appear/disappear, which isn't what
|  I'd expect.
|  >> >> |
|  >> >> | Simon, changing from error to error in IO causes the bug to
|  >> >> disappear,
|  >> >> | but then so do most things. The error return type is type IO (),
|  >> >> | so I suspect that forces it to be raised at the right place -
|  >> >> | but it's certainly one of the possibilities for what is going
|  >> >> | wrong. Diffing the Core is a great idea.
|  >> >> |
|  >> >> | I'll keep reducing and see what I get to. Given the sensitivity
|  >> >> | of
|  >> >> the
|  >> >> | bug, I'm sure a NOINLINE on an out-of-the-way function will make
|  >> >> | it
|  >> >> go
|  >> >> | away, so I can easily fix Shake itself - so I'm more tracking it
|  >> >> | down from the point of GHC now.
|  >> >> |
|  >> >> | Thanks, Neil
|  >> >> |
|  >> >> |
|  >> >> | On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
|  >> >> | <simonpj at microsoft.com> wrote:
|  >> >> | > I'm really sorry but I can't think of anything.  Sounds
|  horrible.
|  >> >> | >
|  >> >> | > If you throw exceptions using 'error' (not in IO), then you
|  >> >> | > are of
|  >> >> | course vulnerable to strictness changes.  If the thing isn't
|  >> >> | actually evaluated inside the catch block, you won't see the
|  >> >> | exception.  But
|  >> >> I'm
|  >> >> | sure you've thought of that.
|  >> >> | >
|  >> >> | > I'd experiment with one of the smaller changes you describe,
|  >> >> | > such
|  >> >> as
|  >> >> | adding a putStrLn, and comparing Core, before and after.
|  >> >> | Switching
|  >> >> off -
|  >> >> | O will make a huge difference, so hard to compare.  Turning off
|  >> >> | the
|  >> >> state
|  >> >> | hack will have a more global effect.  But the other changes
|  >> >> | sound
|  >> >> more
|  >> >> | pin-point and hence the differences will be smaller.
|  >> >> | >
|  >> >> | > Simon
|  >> >> | >
|  >> >> | > |  -----Original Message-----
|  >> >> | > |  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
|  >> >> | > | Behalf
|  >> >> Of
|  >> >> | Neil
|  >> >> | > |  Mitchell
|  >> >> | > |  Sent: 18 March 2015 15:33
|  >> >> | > |  To: ghc-devs at haskell.org
|  >> >> | > |  Subject: Shake fails test with GHC 7.10 RC3
|  >> >> | > |
|  >> >> | > |  Hi,
|  >> >> | > |
|  >> >> | > |  Testing GHC 7.10 RC3 I've found a bug where Shake seems to
|  >> >> | > | catch
|  >> >> the
|  >> >> | > |  wrong exception in the wrong place. It's only hit by one of
|  >> >> | > | my
|  >> >> | tests,
|  >> >> | > |  and I've managed to isolate it to a fragment of code with
|  >> >> | > | no  unsafePerformIO, that throws exceptions using error (so
|  >> >> | > | not in
|  >> >> IO),
|  >> >> | and
|  >> >> | > |  operates in IO. Turning off the stack hack makes the bug go
|  >> >> away,
|  >> >> | but
|  >> >> | > |  then so does -O0, marking one of the functions it calls
|  >> >> NOINLINE, or
|  >> >> | > |  moving an INLINE function it calls to a different module,
|  >> >> | > | or
|  >> >> adding
|  >> >> | a
|  >> >> | > |  putStrLn under a catch block - it's very sensitive to the
|  >> >> | > | exact  conditions. This test and this exact code worked fine
|  >> >> | > | with GHC
|  >> >> 7.10
|  >> >> | > |  RC2.
|  >> >> | > |
|  >> >> | > |  I was wondering if there have been any state hack related
|  >> >> changes or
|  >> >> | > |  other potentially dangerous optimisation changes since RC2?
|  >> >> | > | I'll  continue to try reducing the bug, but it's somewhat
|  >> >> | > | difficult as
|  >> >> the
|  >> >> | > |  larger system is quite big, and the code is very sensitive.
|  >> >> | > |
|  >> >> | > |  Thanks, Neil
|  >> >> | > |  _______________________________________________
|  >> >> | > |  ghc-devs mailing list
|  >> >> | > |  ghc-devs at haskell.org
|  >> >> | > |  http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
|  >> _______________________________________________
|  >> ghc-devs mailing list
|  >> ghc-devs at haskell.org
|  >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
|  >
|  >


More information about the ghc-devs mailing list