Shake fails test with GHC 7.10 RC3
Simon Peyton Jones
simonpj at microsoft.com
Fri Mar 20 23:16:03 UTC 2015
| FWIW, I've been noodling over this today since I saw this email, and
| sitting down, I think this may be a blocker worth holding up for after
| thinking about it. At least, it is until we can quantify how much it
| might bite (can we find a workaround? How often does it hit in
| practice?)
I'm reluctant to delay 7.10 further. I'm sure there are other bugs, and we'll want 7.10.2 before too long. But we won't find them until we push it out. The danger is that we stay in about-to-release mode for ages, and that holds everyone up. We've had three release candidates out over a period of months and Neil seems to be the only one who has tripped over it.
I think it'd be better to push it out. (We planned to do so today!) What do others think?
Simon
| -----Original Message-----
| From: mad.one at gmail.com [mailto:mad.one at gmail.com] On Behalf Of Austin
| Seipp
| Sent: 20 March 2015 22:48
| To: Herbert Valerio Riedel
| Cc: Simon Peyton Jones; Austin Seipp
| Subject: Re: Shake fails test with GHC 7.10 RC3
|
| FWIW, I've been noodling over this today since I saw this email, and
| sitting down, I think this may be a blocker worth holding up for after
| thinking about it. At least, it is until we can quantify how much it
| might bite (can we find a workaround? How often does it hit in
| practice?)
|
| I'm testing Iavor's patches that he wanted in at the last minute (I'm
| really hesitant for that, since we could break something), so we've
| got some time, but I'd like to at least narrow this down a bit more.
|
| On Fri, Mar 20, 2015 at 12:37 PM, Herbert Valerio Riedel
| <hvriedel at gmail.com> wrote:
| >
| > ...does this in any way affect today's GHC 7.10.1 release? I.e. is this
| > blocking the release?
| >
| >
| > On 2015-03-20 at 18:19:09 +0100, Simon Peyton Jones wrote:
| >> Neil
| >>
| >> Could you open a ticket for this, and attach the info to it? Lots of
| useful info in this thread, but may get lost.
| >>
| >> Re empty cases, see Note [Empty case alternatives] in CoreSyn. "A
| >> case expression can have empty alternatives if (and only if) the
| >> scrutinee is bound to raise an exception or diverge."
| >>
| >> So yes, case (\_ -> ...) of {} is definitely wrong. Could someone
| >> extend Core Lint to check for this? The function to call is
| >> exprIsHNF: if we have
| >> case e of {}
| >> then exprIsHNF should never return True.
| >>
| >> That is a powerful clue, and adding it to Lint will nail the moment at
| which it first happens. Good progress.
| >>
| >> (I'd do it myself but I'm struggling with backlog, and only have 3
| days left before going on holiday for 10 days.)
| >>
| >> Simon
| >>
| >> | -----Original Message-----
| >> | From: Neil Mitchell [mailto:ndmitchell at gmail.com]
| >> | Sent: 20 March 2015 16:57
| >> | To: Carter Schonwald
| >> | Cc: Simon Peyton Jones; ghc-devs at haskell.org
| >> | Subject: Re: Shake fails test with GHC 7.10 RC3
| >> |
| >> | > do you use a deepSeq when catching errors from a pure
| computation? Or
| >> | > is it in a context where you know the value should be treated
| strictly?
| >> |
| >> | The computation is essentially (error "here" :: IO ()), so the IO
| monadic
| >> | binding should force the exception. In the full source code to
| Shake it's
| >> | actually in ContT/ReaderT/IO, but a modification to insert liftIO
| (if
| >> | done in exactly the right way) gives the same result. That
| simplified
| >> | code is at http://community.haskell.org/~ndm/temp/src.zip
| >> | (to run, compile with the logic from
| >> | https://github.com/ndmitchell/shake/blob/master/.ghci :benchmark,
| then
| >> | run "main oracle test").
| >> |
| >> | > https://github.com/ndmitchell/shake/issues/216 seems to be the
| ticket
| >> | > in question
| >> |
| >> | That is the ticket. The test that is failing is the "oracle"
| >> | (Test.Oracle) test suite, and you can see full logs of both success
| and
| >> | failure at Travis:
| >> | https://travis-ci.org/ndmitchell/shake/builds/54748475 . Note that
| it
| >> | succeeds on GHC 7.2, 7.4, 7.6 and 7.8, but fails on GHC 7.10 RC3
| and GHC
| >> | Head.
| >> |
| >> | Alas, the actual error is coming from deep in the middle of Shake,
| >> | specifically this line:
| >> |
| https://github.com/ndmitchell/shake/blob/master/src/Development/Shake/Cor
| >> | e.hs#L329
| >> |
| >> | Thinking some more on what I saw, my best guess is that:
| >> |
| >> | case (\_ -> ...) of {}
| >> |
| >> | Is a fatal error in GHC. I am guessing that {} means there are no
| >> | alternatives because GHC managed to show that the scrutinee throws
| an
| >> | exception. In addition, evaluating a literal lambda to WHNF will
| not
| >> | throw an exception, so you have a contradiction.
| >> |
| >> | Making increasingly wild speculative guesses, I could imagine this
| code
| >> | compiling to CMM that just fell off the end of a basic block,
| arriving at
| >> | the next statement of supposedly unconnected code. That actually
| matches
| >> | what I'm seeing - I have the exception, using print I can see I am
| at the
| >> | line before, and that I never get to the line after, but pop up
| somewhere
| >> | nearby and continue onwards seemingly ignoring the exception ever
| >> | happened.
| >> |
| >> | If someone can confirm the above pattern of Core is always illegal
| and
| >> | always indicates a GHC error then I can reduce the test case driven
| >> | purely by the Core, which would be far easier - at the moment Shake
| >> | passes through these routines several times to set things up,
| making it
| >> | harder to simplify them too much without changing the preparation
| steps.
| >> |
| >> | Thanks, Neil
| >> |
| >> |
| >> | >>
| >> | >> On Fri, Mar 20, 2015 at 12:01 AM, Neil Mitchell
| >> | >> <ndmitchell at gmail.com>
| >> | >> wrote:
| >> | >> > More delving later, it seems the incorrect optimized version
| has
| >> | >> > been turned into:
| >> | >> >
| >> | >> > case (\ _ [Occ=Dead, OS=OneShot] -> error "here") of _
| [Occ=Dead]
| >> | >> > {}
| >> | >> >
| >> | >> > While the working one has been turned into:
| >> | >> >
| >> | >> > errorFunc argument realWorldToken
| >> | >> >
| >> | >> > where errorFunc _ = error "here"
| >> | >> >
| >> | >> > I'm not familiar with case ... of _ {} - what does it mean
| when
| >> | >> > there are no alternatives to the case? And why isn't case on a
| >> | >> > literal lambda optimised out? Is it the OneShot annotation
| (perhaps
| >> | >> > coming from the state hack?)
| >> | >> >
| >> | >> > The full trace is at
| >> | >> > https://gist.github.com/ndmitchell/b222e04eb0c3a397c758. I've
| >> | >> > uploaded bad (optimises the error out) and good (works as
| expected)
| >> | >> > versions of the Core. The summary files are the subexpression
| that
| >> | >> > changed making a single difference (moving a monomorphic
| NOINLINE
| >> | >> > function from one module to another) plus the handful of
| functions
| >> | >> > they depend on, which I've reformatted/inlined/simplified to
| produce
| >> | the expressions above.
| >> | >> > The full versions are the entire -ddump-simpl output from
| about
| >> | >> > halfway through the build, starting when the differences occur
| -
| >> | >> > let me know if you need further back. The dodgy function is
| "exec".
| >> | >> >
| >> | >> > Thanks, Neil
| >> | >> >
| >> | >> > On Thu, Mar 19, 2015 at 11:07 PM, Neil Mitchell
| >> | >> > <ndmitchell at gmail.com>
| >> | >> > wrote:
| >> | >> >> Herbert, thanks for the list of patches, nothing obvious
| there -
| >> | >> >> my best guess is it's something incredibly sensitive and it
| only
| >> | >> >> needs the tiniest change anywhere to make it happen. Things
| like
| >> | >> >> moving NOINLINE monomorphic-type definitions from one module
| to
| >> | >> >> another are causing the bug to appear/disappear, which isn't
| what
| >> | I'd expect.
| >> | >> >>
| >> | >> >> Simon, changing from error to error in IO causes the bug to
| >> | >> >> disappear, but then so do most things. The error return type
| is
| >> | >> >> type IO (), so I suspect that forces it to be raised at the
| right
| >> | >> >> place - but it's certainly one of the possibilities for what
| is
| >> | >> >> going wrong. Diffing the Core is a great idea.
| >> | >> >>
| >> | >> >> I'll keep reducing and see what I get to. Given the
| sensitivity of
| >> | >> >> the bug, I'm sure a NOINLINE on an out-of-the-way function
| will
| >> | >> >> make it go away, so I can easily fix Shake itself - so I'm
| more
| >> | >> >> tracking it down from the point of GHC now.
| >> | >> >>
| >> | >> >> Thanks, Neil
| >> | >> >>
| >> | >> >>
| >> | >> >> On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
| >> | >> >> <simonpj at microsoft.com> wrote:
| >> | >> >>> I'm really sorry but I can't think of anything. Sounds
| horrible.
| >> | >> >>>
| >> | >> >>> If you throw exceptions using 'error' (not in IO), then you
| are
| >> | >> >>> of course vulnerable to strictness changes. If the thing
| isn't
| >> | >> >>> actually evaluated inside the catch block, you won't see the
| >> | >> >>> exception. But I'm sure you've thought of that.
| >> | >> >>>
| >> | >> >>> I'd experiment with one of the smaller changes you describe,
| such
| >> | >> >>> as adding a putStrLn, and comparing Core, before and after.
| >> | >> >>> Switching off -O will make a huge difference, so hard to
| compare.
| >> | >> >>> Turning off the state hack will have a more global effect.
| But
| >> | >> >>> the other changes sound more pin-point and hence the
| differences
| >> | will be smaller.
| >> | >> >>>
| >> | >> >>> Simon
| >> | >> >>>
| >> | >> >>> | -----Original Message-----
| >> | >> >>> | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
| Behalf
| >> | >> >>> | Of
| >> | >> >>> Neil
| >> | >> >>> | Mitchell
| >> | >> >>> | Sent: 18 March 2015 15:33
| >> | >> >>> | To: ghc-devs at haskell.org
| >> | >> >>> | Subject: Shake fails test with GHC 7.10 RC3
| >> | >> >>> |
| >> | >> >>> | Hi,
| >> | >> >>> |
| >> | >> >>> | Testing GHC 7.10 RC3 I've found a bug where Shake seems
| to
| >> | >> >>> | catch
| >> | >> >>> the
| >> | >> >>> | wrong exception in the wrong place. It's only hit by one
| of my
| >> | >> >>> tests,
| >> | >> >>> | and I've managed to isolate it to a fragment of code with
| no
| >> | >> >>> | unsafePerformIO, that throws exceptions using error (so
| not in
| >> | >> >>> | IO),
| >> | >> >>> and
| >> | >> >>> | operates in IO. Turning off the stack hack makes the bug
| go
| >> | >> >>> | away,
| >> | >> >>> but
| >> | >> >>> | then so does -O0, marking one of the functions it calls
| >> | >> >>> | NOINLINE,
| >> | >> >>> or
| >> | >> >>> | moving an INLINE function it calls to a different module,
| or
| >> | >> >>> | adding
| >> | >> >>> a
| >> | >> >>> | putStrLn under a catch block - it's very sensitive to the
| >> | >> >>> | exact conditions. This test and this exact code worked
| fine
| >> | >> >>> | with GHC 7.10 RC2.
| >> | >> >>> |
| >> | >> >>> | I was wondering if there have been any state hack related
| >> | >> >>> | changes
| >> | >> >>> or
| >> | >> >>> | other potentially dangerous optimisation changes since
| RC2?
| >> | >> >>> | I'll continue to try reducing the bug, but it's somewhat
| >> | >> >>> | difficult as
| >> | >> >>> the
| >> | >> >>> | larger system is quite big, and the code is very
| sensitive.
| >> | >> >>> |
| >> | >> >>> | Thanks, Neil
| >> | >> >>> | _______________________________________________
| >> | >> >>> | ghc-devs mailing list
| >> | >> >>> | ghc-devs at haskell.org
| >> | >> >>> | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
| >> | >> >
| >> | >> > On Thu, Mar 19, 2015 at 11:24 PM, Simon Peyton Jones
| >> | >> > <simonpj at microsoft.com> wrote:
| >> | >> >> Thanks! I think a -ddump-simpl before and after the smallest
| >> | >> >> change the makes the difference would be illuminating.
| >> | >> >>
| >> | >> >> Simon
| >> | >> >>
| >> | >> >> | -----Original Message-----
| >> | >> >> | From: Neil Mitchell [mailto:ndmitchell at gmail.com]
| >> | >> >> | Sent: 19 March 2015 23:07
| >> | >> >> | To: Simon Peyton Jones
| >> | >> >> | Cc: ghc-devs at haskell.org
| >> | >> >> | Subject: Re: Shake fails test with GHC 7.10 RC3
| >> | >> >> |
| >> | >> >> | Herbert, thanks for the list of patches, nothing obvious
| there -
| >> | >> >> | my best guess is it's something incredibly sensitive and it
| only
| >> | >> >> | needs the tiniest change anywhere to make it happen. Things
| like
| >> | >> >> | moving NOINLINE monomorphic-type definitions from one
| module to
| >> | >> >> | another are causing the bug to appear/disappear, which
| isn't what
| >> | I'd expect.
| >> | >> >> |
| >> | >> >> | Simon, changing from error to error in IO causes the bug to
| >> | >> >> disappear,
| >> | >> >> | but then so do most things. The error return type is type
| IO (),
| >> | >> >> | so I suspect that forces it to be raised at the right place
| -
| >> | >> >> | but it's certainly one of the possibilities for what is
| going
| >> | >> >> | wrong. Diffing the Core is a great idea.
| >> | >> >> |
| >> | >> >> | I'll keep reducing and see what I get to. Given the
| sensitivity
| >> | >> >> | of
| >> | >> >> the
| >> | >> >> | bug, I'm sure a NOINLINE on an out-of-the-way function will
| make
| >> | >> >> | it
| >> | >> >> go
| >> | >> >> | away, so I can easily fix Shake itself - so I'm more
| tracking it
| >> | >> >> | down from the point of GHC now.
| >> | >> >> |
| >> | >> >> | Thanks, Neil
| >> | >> >> |
| >> | >> >> |
| >> | >> >> | On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
| >> | >> >> | <simonpj at microsoft.com> wrote:
| >> | >> >> | > I'm really sorry but I can't think of anything. Sounds
| >> | horrible.
| >> | >> >> | >
| >> | >> >> | > If you throw exceptions using 'error' (not in IO), then
| you
| >> | >> >> | > are of
| >> | >> >> | course vulnerable to strictness changes. If the thing
| isn't
| >> | >> >> | actually evaluated inside the catch block, you won't see
| the
| >> | >> >> | exception. But
| >> | >> >> I'm
| >> | >> >> | sure you've thought of that.
| >> | >> >> | >
| >> | >> >> | > I'd experiment with one of the smaller changes you
| describe,
| >> | >> >> | > such
| >> | >> >> as
| >> | >> >> | adding a putStrLn, and comparing Core, before and after.
| >> | >> >> | Switching
| >> | >> >> off -
| >> | >> >> | O will make a huge difference, so hard to compare. Turning
| off
| >> | >> >> | the
| >> | >> >> state
| >> | >> >> | hack will have a more global effect. But the other changes
| >> | >> >> | sound
| >> | >> >> more
| >> | >> >> | pin-point and hence the differences will be smaller.
| >> | >> >> | >
| >> | >> >> | > Simon
| >> | >> >> | >
| >> | >> >> | > | -----Original Message-----
| >> | >> >> | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org]
| On
| >> | >> >> | > | Behalf
| >> | >> >> Of
| >> | >> >> | Neil
| >> | >> >> | > | Mitchell
| >> | >> >> | > | Sent: 18 March 2015 15:33
| >> | >> >> | > | To: ghc-devs at haskell.org
| >> | >> >> | > | Subject: Shake fails test with GHC 7.10 RC3
| >> | >> >> | > |
| >> | >> >> | > | Hi,
| >> | >> >> | > |
| >> | >> >> | > | Testing GHC 7.10 RC3 I've found a bug where Shake
| seems to
| >> | >> >> | > | catch
| >> | >> >> the
| >> | >> >> | > | wrong exception in the wrong place. It's only hit by
| one of
| >> | >> >> | > | my
| >> | >> >> | tests,
| >> | >> >> | > | and I've managed to isolate it to a fragment of code
| with
| >> | >> >> | > | no unsafePerformIO, that throws exceptions using error
| (so
| >> | >> >> | > | not in
| >> | >> >> IO),
| >> | >> >> | and
| >> | >> >> | > | operates in IO. Turning off the stack hack makes the
| bug go
| >> | >> >> away,
| >> | >> >> | but
| >> | >> >> | > | then so does -O0, marking one of the functions it
| calls
| >> | >> >> NOINLINE, or
| >> | >> >> | > | moving an INLINE function it calls to a different
| module,
| >> | >> >> | > | or
| >> | >> >> adding
| >> | >> >> | a
| >> | >> >> | > | putStrLn under a catch block - it's very sensitive to
| the
| >> | >> >> | > | exact conditions. This test and this exact code worked
| fine
| >> | >> >> | > | with GHC
| >> | >> >> 7.10
| >> | >> >> | > | RC2.
| >> | >> >> | > |
| >> | >> >> | > | I was wondering if there have been any state hack
| related
| >> | >> >> changes or
| >> | >> >> | > | other potentially dangerous optimisation changes since
| RC2?
| >> | >> >> | > | I'll continue to try reducing the bug, but it's
| somewhat
| >> | >> >> | > | difficult as
| >> | >> >> the
| >> | >> >> | > | larger system is quite big, and the code is very
| sensitive.
| >> | >> >> | > |
| >> | >> >> | > | Thanks, Neil
| >> | >> >> | > | _______________________________________________
| >> | >> >> | > | ghc-devs mailing list
| >> | >> >> | > | ghc-devs at haskell.org
| >> | >> >> | > | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-
| devs
| >> | >> _______________________________________________
| >> | >> ghc-devs mailing list
| >> | >> ghc-devs at haskell.org
| >> | >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
| >> | >
| >> | >
| >> _______________________________________________
| >> ghc-devs mailing list
| >> ghc-devs at haskell.org
| >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
| >
| > --
| > "Elegance is not optional" -- Richard O'Keefe
| >
|
|
|
| --
| Regards,
|
| Austin Seipp, Haskell Consultant
| Well-Typed LLP, http://www.well-typed.com/
More information about the ghc-devs
mailing list