Possible bug related to stm and exceptions

Andreas Voellmy andreas.voellmy at gmail.com
Thu Nov 7 23:21:41 UTC 2013


Hi all,

Thanks so much for everyone's responses! I finally found the problem, so I
thought I'd follow up and share what happened...

It turned out that the problem was not in the STM implementation, but
rather in bad programming on my part. For some reason, I had a thread
(thread #1) performing a transaction that blocked until any one of several
TQueues became non-empty. On success, the thread sent a value onto another
TQueue monitored by another thread (thread #2). Thread #2 would then
process all the items in the queues monitored by the first thread.

This led to the following problem: when one of the TQueues became
non-empty, the first thread would just go through its loop repeatedly,
filling the other queue with values, and thread #2 wouldn't get a chance to
run for a long time. This quickly led to huge amounts of memory being used
and the program would get totally bogged down. I finally found the problem
when I noticed that I could make it less severe with the RTS option -C0 and
more severe with large values for -C. Large values let the first thread
repeat the loop for a longer time before the second thread is scheduled and
removes values from the queues.
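
For anyone who runs into something similar, here is a minimal sketch of the
pattern and of one possible way around it. It is a reconstruction for
illustration only, not the original code: the names notifyLoop, drainQueue
and drainAny are made up, and the fix shown (letting the consumer block and
drain in a single transaction) is just one option among several.

```haskell
import Control.Concurrent.STM
import Control.Monad (forever, when)

-- Hypothetical reconstruction of the problematic watcher (thread #1).
-- It waits until some queue is non-empty but never consumes anything,
-- so once a queue has data every iteration succeeds immediately and
-- the notification queue grows until the consumer gets scheduled.
notifyLoop :: [TQueue a] -> TQueue () -> IO ()
notifyLoop qs notifyQ = forever $ atomically $ do
  empties <- mapM isEmptyTQueue qs
  when (and empties) retry
  writeTQueue notifyQ ()

-- Remove everything currently in one queue; never blocks.
drainQueue :: TQueue a -> STM [a]
drainQueue q = do
  m <- tryReadTQueue q
  case m of
    Nothing -> return []
    Just x  -> (x :) <$> drainQueue q

-- One possible fix: the consumer blocks until at least one queue has
-- data and drains all of them in the same transaction, so no separate
-- notification queue (and no watcher thread) is needed.
drainAny :: [TQueue a] -> STM [a]
drainAny qs = do
  items <- concat <$> mapM drainQueue qs
  if null items then retry else return items

main :: IO ()
main = do
  q1 <- newTQueueIO
  q2 <- newTQueueIO
  atomically $ do
    writeTQueue q1 (1 :: Int)
    writeTQueue q2 2
    writeTQueue q1 3
  items <- atomically (drainAny [q1, q2])
  print items  -- prints [1,3,2]: q1 is drained first, then q2
```

Because drainAny consumes the items that woke it up, the transaction blocks
again as soon as the queues are empty, so nothing can pile up regardless of
how long a thread runs between context switches.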

-Andi



On Thu, Oct 17, 2013 at 10:53 AM, Andreas Voellmy
<andreas.voellmy at gmail.com> wrote:

> Thanks! I'll try to reduce my test case and then I'll post an issue.
>
> I'm currently suspecting that it has something to do with signal handling
> and STM. It seems that the program goes wrong after getting a SIGPIPE from
> trying to send to a closed socket.
>
>
> On Thu, Oct 17, 2013 at 9:35 AM, Ryan Yates <fryguybob at gmail.com> wrote:
>
>> The bug that Luite and I uncovered is
>> http://ghc.haskell.org/trac/ghc/ticket/7930.  It would not be related.
>>  There was a bug relating to `catchSTM` that was fixed recently:
>> http://ghc.haskell.org/trac/ghc/ticket/8035.  And another related to
>> profiling: http://ghc.haskell.org/trac/ghc/ticket/8298.  I doubt either
>> of these is related.  I'm happy to help narrow things down.
>>
>> Ryan
>>
>>
>> On Thu, Oct 17, 2013 at 4:39 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>
>>> On 17/10/2013 03:01, Andreas Voellmy wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a program that uses STM heavily and also performs lots of foreign
>>>> calls. I've noticed that sometimes the program uses 100% CPU
>>>> indefinitely and uses lots of memory - I see it go up to about 5GB
>>>> before I kill it. I've grabbed some preliminary samples of stack traces
>>>> and see lots of STM-related stuff (e.g. lots of stg_atomically_frame_info
>>>> and stmCommitTransaction entries).  I can pretty reliably get the
>>>> behavior to happen now by closing a socket that my Haskell program is
>>>> trying to recv from. When this causes an exception to be raised
>>>> (something like "recv: resource vanished (Connection reset by peer)"),
>>>> then this behavior gets triggered.  I haven't pinned down the bug yet,
>>>> but I'm suspecting it is STM related - somehow the exception causes some
>>>> STM transaction to go wrong.
>>>>
>>>> Are there any known bugs that sound similar to this?
>>>>
>>>> BTW, this is with GHC 7.6.3 from a recent HP release on OS X.
>>>>
>>>
>>> Please create a ticket and dump all the information you have in it.
>>> There might be something we can tell from the stack trace, but if not we'll
>>> need a way to reproduce it.
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>
>>
>