Random crashes with memory corruption symptoms

Ömer Sinan Ağacan omeragacan at gmail.com
Mon Feb 3 05:52:15 UTC 2020


You should try with 8.8.2 which fixes a bug in the compacting GC (#17088).

When debugging it's a good idea to use the latest minor release of your GHC
version (8.8.2 in your case), as minor releases fix bugs and usually do not
introduce new ones as they don't ship new features.

If the problem still exists, then unless you're interested in GHC hacking, I
think the most productive use of your time would be to make the reproducer
smaller and collect as much data as possible, such as which flags trigger or
hide the bug.

Some of the things you could check:

- Build your program with `-dcore-lint -dstg-lint -dcmm-lint` and see if it
  builds.
- Build your program with `-debug` and run it, see if it crashes.
- Build your program with `-debug` and run it with `+RTS -DS` and see if the
  error message changes.
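
As a rough sketch, the checks above could look like this for a single-module
reproducer built with plain `ghc`. `Main.hs` and `./Main` are hypothetical
names, `-fforce-recomp` and `-rtsopts` are my additions (the latter on the
assumption that the default RTS option policy would reject `-DS`), and `-M32M`
is taken from the original report. The function only prints the commands, so
nothing runs until you pipe its output to `sh`.

```shell
# Hypothetical names: Main.hs is the reduced reproducer, ./Main the binary.
SRC=Main.hs
BIN=./Main

# Print the checks in order; pipe the output to `sh` to actually run them.
debug_checks() {
  # 1. Compile with the Core, STG and Cmm linters enabled; see if it builds.
  echo "ghc -fforce-recomp -dcore-lint -dstg-lint -dcmm-lint $SRC"
  # 2. Link against the debug RTS and run; see if it still crashes.
  #    -rtsopts is an assumption: it is meant to allow -DS in step 3.
  echo "ghc -fforce-recomp -debug -rtsopts $SRC"
  echo "$BIN +RTS -M32M -RTS"
  # 3. Run again with RTS sanity checking; see if the error message changes.
  echo "$BIN +RTS -DS -M32M -RTS"
}

debug_checks
```

For a cabal project like yours, the same GHC flags would presumably go into
the `ghc-options` field of the test suite instead.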

But really, the first thing you should do is try 8.8.2. It's possible that this
is another manifestation of #17088.

Ömer

On Mon, 3 Feb 2020 at 01:26, Harendra Kumar <harendra.kumar at gmail.com>
wrote:
>
> Hi,
>
> While running a test-suite for the streaming library streamly I am encountering a crash which seems to happen at random places at different times. The common messages are:
>
> * Segmentation fault: 11
> *  internal error: scavenge_mark_stack: unimplemented/strange closure type 24792696 @ 0x4200a623e0
> * internal error: update_fwd: unknown/strange object  223743520
>
> and several other such messages. Prima facie this looks like the memory is getting corrupted/scribbled somehow. My first suspicion was that this could be a problem in the streamly library code. But I have stripped down the code to bare minimum and there is no C FFI code or no poking to memory pointers.
>
> My next suspicion was the hspec/quickcheck testing code that is being used in this test. I checked the hspec code to ensure that there is no C code/pointer poking in any of the code involved. But no luck there as well, still looking to further strip down that code.
>
> My suspicion now is moving more towards the GHC RTS. This issue only shows when the following conditions are met:
>
> * hspec "parallel" combinator is used to run tests in parallel
> * streamly concurrent code is being tested which can create many threads
> * The GHC heap size is restricted to a small size ~32MB using "-M32M" rts option.
> * It is consistently seen with GHC 8.6.5 as well as GHC 8.8.1
>
> It never occurs when the heap size is not restricted. I have seen random crashes before as well with a "IO manager die" message, when using concurrent networking IO with streamly. Though earlier it was not easily reproducible, I stopped chasing it. But now it looks like that issue might also be a manifestation of the same underlying problem.
>
> My guess is it could be something in the RTS concurrency/threading related code. Let me know if the symptoms ring a bell or if you can point to something specific based on the symptoms. Also, what are the usual tools/methods/debugging aids/flags to debug such issues in GHC? If not a GHC issue what are the possible ways in which such problem can be induced by application code?
>
> Meanwhile, I am also trying to simplify the reproducing code further to remove other factors as much as possible. The current code is at https://github.com/composewell/streamly on the ghc-segfault branch. Run "$ while true; do cabal run properties || break; done" in the shell and if you are lucky it may crash soon. The test code is in "test/Prop.hs" - here https://github.com/composewell/streamly/blob/ghc-segfault/test/Prop.hs .
>
> -harendra
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

