Mac/PPC threaded RTS problems -- potential clue
Simon Marlow
simonmarhaskell at gmail.com
Mon Aug 21 11:33:21 EDT 2006
Hi David,
Thanks for the patch, it looks reasonable so I've applied it. This is not the
issue affecting 6.4.3 on MacOS X/PPC, because SMP.h isn't used in the threaded
RTS in 6.4.3. Nevertheless, it's a good fix.
Cheers,
Simon
David Kirkman wrote:
> I managed to build ghc-6.5.20060804 on a powerpc mac, and I spent a
> little time Saturday night trying to work out what might be going on
> with the threaded RTS.
>
> Running the testsuite with make WAY=threaded1 shows many (73)
> failures. Many of them (the conc??? set) have to do with TVars, in
> particular writing TVars. On my dual-proc G5 macs, with
> ghc-6.5.20060804, the following program will hang when running with
> the threaded RTS, but will work fine when compiled without -threaded.
>
>> module Main where
>>
>> import GHC.Conc
>> import Control.Concurrent
>>
>> main = do
>> t1 <- atomically (do t1 <- newTVar 80
>> return t1)
>> atomically ( do writeTVar t1 1 )
>> putStr ("done\n")
>
>
> This problem seems to be powerpc specific -- it works fine with the
> threaded RTS on a multi-processor intel mac (built by me, from the
> same ghc-6.5.20060804 source tree I used to build the ppc compiler).
>
> Looking around in STM.c (via decidedly low-tech printfs) I quickly
> zoomed in on cond_lock_tvar, which lead me to cas (atomic compare and
> swap) in SMP.h, where I found (I think) a fairly clear error in the
> powerpc code -- I've appended a patch to the end of this message.
>
> The problem is that the inline assembler code was placing the result
> of an operation in a register that is used as input later in the code.
> At the bottom of this message I've extracted a short short code
> fragment that you can run through gcc (on a powerpc machine) to see
> the generated assembly output.
>
> The changes to fix the problem are fairly simple. The first adds an
> ampersand to the output list of the assembly fragment ("=r" (result)
> --> "=&r" (result)) The ampersand just tells gcc that result can not
> be placed in a register used for any of the input parameters (o, n, or
> p). Otherwise, it feels free to place output parameters in the same
> registers used by the inputs -- but because of the flow of control
> here we need everything in a distinct register. This change fixes the
> TVar program above.
>
> The second change adds a clobber list (the :"cc", "memory"). This
> tells gcc that the condition code (due to the compare) and memory (due
> to the store) might be changed during the asm execution. The lack of
> a clobber list did not seem to be causing any trouble, but without it
> gcc is free to assume that no state is changed during the execution.
>
> Applying the following patch to SMP.h, and rebuilding everything, I
> not only fixed the simple writeTVar program, but it also fixed 8
> programs in the testsuite (conc043 -> conc049, conc052 and conc053).
> The only conc test program that still fails is conc039. But there are
> still many mac problems, I still have 132 unexpected failures with
> make fast. At least the patch does not cause any new failures (in
> either make fast or make WAY=threaded1).
>
> Anyway, seeing as the change to SMP.h fixes a fair number of test
> cases in the testsuite, I figure there is some chance that it might
> fix some of the problems that people are having with the threaded RTS.
> On the other hand, I'm not real happy the the large number of
> testsuite failures my build gets, so I can't really call this
> 'tested'. But I'm posting because it might be a useful clue for
> somebody with a little more mac/ghc experience.
>
> Cheers,
>
> -david k.
>
> //
> // Short code to run through gcc -S. On my powermac, without the change
> // the generated assembly produces
> // 1: lwarx r0, 0, r0
> // load to here ---^ ^------- from address here
> // But we need this value for the
> stwcx.
> //
> // with the fix, the first line of the generated assembly becomes
> // 1: lwarx r11, 0, r0
> // and r0 remains unmodified if we need to use it later in the stwcx.
>
> /*
> * CMPXCHG - the single-word atomic compare-and-exchange instruction. Used
> * in the STM implementation.
> */
> long cas(long* p, long o, long n)
> {
> long result;
>
> //
> // Change
> // :"=r" (result) --> :"=&r" (result)
> // to get result and p in different registers
> __asm__ __volatile__ (
> "1: lwarx %0, 0, %3\n"
> " cmpw %0, %1\n"
> " bne 2f\n"
> " stwcx. %2, 0, %3\n"
> " bne- 1b\n"
> "2:"
> :"=r" (result)
> :"r" (o), "r" (n), "r" (p)
> );
> return result;
> }
>
>
> Here's a "diff -cp" for SMP.h
>
> *** SMP.h Sun Aug 13 01:08:53 2006
> --- SMP-new.h Sun Aug 13 01:08:47 2006
> *************** cas(StgVolatilePtr p, StgWord o, StgWord
> *** 76,83 ****
> " stwcx. %2, 0, %3\n"
> " bne- 1b\n"
> "2:"
> ! :"=r" (result)
> :"r" (o), "r" (n), "r" (p)
> );
> return result;
> #else
> --- 76,84 ----
> " stwcx. %2, 0, %3\n"
> " bne- 1b\n"
> "2:"
> ! :"=&r" (result)
> :"r" (o), "r" (n), "r" (p)
> + :"cc", "memory"
> );
> return result;
> #else
More information about the Glasgow-haskell-users
mailing list