<div dir="ltr">Hi,<div><br></div><div>While running a test suite for the streaming library streamly, I am encountering a crash that happens at random points in different runs. The most common messages are:</div><div><br></div><div>* Segmentation fault: 11</div><div>* internal error: scavenge_mark_stack: unimplemented/strange closure type 24792696 @ 0x4200a623e0</div><div>* internal error: update_fwd: unknown/strange object 223743520</div>
<div><br></div><div>and several other similar messages. Prima facie, this looks like memory corruption: something appears to be scribbling over memory. My first suspicion was that the problem could be in the streamly library code. However, I have stripped the code down to the bare minimum, and there is no C FFI code and no poking of memory through pointers.</div><div><br></div><div>My next suspect was the hspec/QuickCheck testing code used by this test. I checked the hspec code and found no C code or pointer poking in any of the code involved. No luck there either; I am still working on stripping that code down further.</div><div><br></div><div>My suspicion is now shifting towards the GHC RTS. The issue only shows up when the following conditions are met:</div><div><br></div><div>* The hspec "parallel" combinator is used to run tests in parallel</div><div>* The code under test is concurrent streamly code, which can create many threads</div><div>* The GHC heap size is restricted to a small value (~32MB) using the "-M32M" RTS option</div><div>* It is seen consistently with both GHC 8.6.5 and GHC 8.8.1</div><div><br></div><div>It never occurs when the heap size is not restricted. I have also seen random crashes before, with an "IO manager die" message, when using concurrent network IO with streamly. Because that crash was not easily reproducible, I stopped chasing it, but it now looks like it might be another manifestation of the same underlying problem.</div><div><br></div><div>My guess is that it could be something in the RTS concurrency/threading code. Please let me know if these symptoms ring a bell, or if they point to something specific. Also, what are the usual tools, methods, debugging aids, or flags for debugging such issues in GHC?
And if it is not a GHC issue, in what ways could application code induce such a problem?</div><div><br></div><div>Meanwhile, I am also trying to simplify the reproducing code further to eliminate as many other factors as possible. The current code is at <a href="https://github.com/composewell/streamly">https://github.com/composewell/streamly</a> on the ghc-segfault branch. Run "while true; do cabal run properties || break; done" in the shell and, if you are lucky, it should crash soon. The test code is in "test/Prop.hs": <a href="https://github.com/composewell/streamly/blob/ghc-segfault/test/Prop.hs">https://github.com/composewell/streamly/blob/ghc-segfault/test/Prop.hs</a>.</div><div><br></div><div>-harendra</div>
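<div><br></div><div>P.S. A slightly more informative variant of the reproduction loop, as a minimal POSIX sh sketch: it reruns a command until that command exits with a non-zero status, then prints how many runs succeeded before the failure and the failing exit status. The function name run_until_failure is hypothetical (it is not part of streamly or cabal), and the usage line assumes the same "cabal run properties" target as the loop in the message.</div>

```shell
#!/bin/sh
# Hypothetical helper (not part of streamly or cabal): rerun a command until
# it exits with a non-zero status, then report how many runs succeeded and
# the exit status of the failing run. The function returns that same status.
run_until_failure() {
  runs=0
  while :; do
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "failed after $runs successful runs (exit status $status)"
      return "$status"
    fi
    runs=$((runs + 1))
  done
}

# Usage (assumes the "properties" target on the ghc-segfault branch):
#   run_until_failure cabal run properties
```

<div>Like the original loop, it stops at the first failure; the run count may help judge whether a change to the test makes the crash rarer or more frequent.</div>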
</div>