Mac/PPC threaded RTS problems -- potential clue

Simon Marlow simonmarhaskell at gmail.com
Mon Aug 21 11:33:21 EDT 2006


Hi David,

Thanks for the patch, it looks reasonable so I've applied it.  This is not the 
issue affecting 6.4.3 on MacOS X/PPC, because SMP.h isn't used in the threaded 
RTS in 6.4.3.  Nevertheless, it's a good fix.

Cheers,
	Simon

David Kirkman wrote:
> I managed to build ghc-6.5.20060804 on a powerpc mac, and I spent a
> little time Saturday night trying to work out what might be going on
> with the threaded RTS.
> 
> Running the testsuite with make WAY=threaded1 shows many (73)
> failures.  Many of them (the conc??? set) have to do with TVars, in
> particular writing TVars.  On my dual-proc G5 macs, with
> ghc-6.5.20060804, the following program will hang when running with
> the threaded RTS, but will work fine when compiled without -threaded.
> 
>> module Main where
>>
>> import GHC.Conc
>> import Control.Concurrent
>>
>> main = do
>>       t1 <- atomically (do t1 <- newTVar 80
>>                         return t1)
>>       atomically ( do writeTVar t1 1 )
>>       putStr ("done\n")
> 
> 
> This problem seems to be powerpc specific -- it works fine with the
> threaded RTS on a multi-processor intel mac (built by me, from the
> same ghc-6.5.20060804 source tree I used to build the ppc compiler).
> 
> Looking around in STM.c (via decidedly low-tech printfs) I quickly
> zoomed in on cond_lock_tvar, which lead me to cas (atomic compare and
> swap) in SMP.h, where I found (I think) a fairly clear error in the
> powerpc code -- I've appended a patch to the end of this message.
> 
> The problem is that the inline assembler code was placing the result
> of an operation in a register that is used as input later in the code.
> At the bottom of this message I've extracted a short short code
> fragment that you can run through gcc (on a powerpc machine) to see
> the generated assembly output.
> 
> The changes to fix the problem are fairly simple.  The first adds an
> ampersand to the output list of the assembly fragment ("=r" (result)
> --> "=&r" (result)) The ampersand just tells gcc that result can not
> be placed in a register used for any of the input parameters (o, n, or
> p).  Otherwise, it feels free to place output parameters in the same
> registers used by the inputs -- but because of the flow of control
> here we need everything in a distinct register.  This change fixes the
> TVar program above.
> 
> The second change adds a clobber list (the :"cc", "memory").  This
> tells gcc that the condition code (due to the compare) and memory (due
> to the store) might be changed during the asm execution.  The lack of
> a clobber list did not seem to be causing any trouble, but without it
> gcc is free to assume that no state is changed during the execution.
> 
> Applying the following patch to SMP.h, and rebuilding everything, I
> not only fixed the simple writeTVar program, but it also fixed 8
> programs in the testsuite (conc043 -> conc049, conc052 and conc053).
> The only conc test program that still fails is conc039.  But there are
> still many mac problems, I still have 132 unexpected failures with
> make fast.  At least the patch does not cause any new failures (in
> either make fast or make WAY=threaded1).
> 
> Anyway, seeing as the change to SMP.h fixes a fair number of test
> cases in the testsuite, I figure there is some chance that it might
> fix some of the problems that people are having with the threaded RTS.
> On the other hand, I'm not real happy the the large number of
> testsuite failures my build gets, so I can't really call this
> 'tested'.  But I'm posting because it might be a useful clue for
> somebody with a little more mac/ghc experience.
> 
> Cheers,
> 
> -david k.
> 
> //
> // Short code to run through gcc -S.  On my powermac, without the change
> // the generated assembly produces
> //            1: lwarx   r0, 0, r0
> //      load to here  ---^      ^------- from address here
> //                                       But we need this value for the 
> stwcx.
> //
> // with the fix, the first line of the generated assembly becomes
> //            1: lwarx   r11, 0, r0
> // and r0 remains unmodified if we need to use it later in the stwcx.
> 
> /*
>  * CMPXCHG - the single-word atomic compare-and-exchange instruction.  Used
>  * in the STM implementation.
> */
> long cas(long* p, long o, long n)
> {
>    long result;
> 
>    //
>    // Change
>    //       :"=r" (result) --> :"=&r" (result)
>    // to get result and p in different registers
>    __asm__ __volatile__ (
>        "1:     lwarx     %0, 0, %3\n"
>        "       cmpw      %0, %1\n"
>        "       bne       2f\n"
>        "       stwcx.    %2, 0, %3\n"
>        "       bne-      1b\n"
>        "2:"
>        :"=r" (result)
>        :"r" (o), "r" (n), "r" (p)
>    );
>    return result;
> }
> 
> 
> Here's a "diff -cp" for SMP.h
> 
> *** SMP.h       Sun Aug 13 01:08:53 2006
> --- SMP-new.h   Sun Aug 13 01:08:47 2006
> *************** cas(StgVolatilePtr p, StgWord o, StgWord
> *** 76,83 ****
>          "       stwcx.    %2, 0, %3\n"
>          "       bne-      1b\n"
>          "2:"
> !         :"=r" (result)
>          :"r" (o), "r" (n), "r" (p)
>      );
>      return result;
>  #else
> --- 76,84 ----
>          "       stwcx.    %2, 0, %3\n"
>          "       bne-      1b\n"
>          "2:"
> !         :"=&r" (result)
>          :"r" (o), "r" (n), "r" (p)
> +         :"cc", "memory"
>      );
>      return result;
>  #else



More information about the Glasgow-haskell-users mailing list