Mac/PPC threaded RTS problems -- potential clue

Simon Peyton-Jones simonpj at microsoft.com
Mon Aug 14 04:31:24 EDT 2006


David

Thanks for looking into this. It sounds as if you made real progress.

This reply is just to say that Simon M is on holiday this week, so you
won't hear back from him till next wk.

Simon


| -----Original Message-----
| From: glasgow-haskell-users-bounces at haskell.org
[mailto:glasgow-haskell-users-bounces at haskell.org]
| On Behalf Of David Kirkman
| Sent: 13 August 2006 21:59
| To: glasgow-haskell-users at haskell.org
| Subject: Mac/PPC threaded RTS problems -- potential clue
| 
| I managed to build ghc-6.5.20060804 on a powerpc mac, and I spent a
| little time Saturday night trying to work out what might be going on
| with the threaded RTS.
| 
| Running the testsuite with make WAY=threaded1 shows many (73)
| failures.  Many of them (the conc??? set) have to do with TVars, in
| particular writing TVars.  On my dual-proc G5 macs, with
| ghc-6.5.20060804, the following program will hang when running with
| the threaded RTS, but will work fine when compiled without -threaded.
| 
| >module Main where
| >
| >import GHC.Conc
| >import Control.Concurrent
| >
| >main = do
| >       t1 <- atomically (do t1 <- newTVar 80
| >	                    return t1)
| >       atomically ( do writeTVar t1 1 )
| >       putStr ("done\n")
| 
| This problem seems to be powerpc specific -- it works fine with the
| threaded RTS on a multi-processor intel mac (built by me, from the
| same ghc-6.5.20060804 source tree I used to build the ppc compiler).
| 
| Looking around in STM.c (via decidedly low-tech printfs) I quickly
| zoomed in on cond_lock_tvar, which lead me to cas (atomic compare and
| swap) in SMP.h, where I found (I think) a fairly clear error in the
| powerpc code -- I've appended a patch to the end of this message.
| 
| The problem is that the inline assembler code was placing the result
| of an operation in a register that is used as input later in the code.
| At the bottom of this message I've extracted a short short code
| fragment that you can run through gcc (on a powerpc machine) to see
| the generated assembly output.
| 
| The changes to fix the problem are fairly simple.  The first adds an
| ampersand to the output list of the assembly fragment ("=r" (result)
| --> "=&r" (result)) The ampersand just tells gcc that result can not
| be placed in a register used for any of the input parameters (o, n, or
| p).  Otherwise, it feels free to place output parameters in the same
| registers used by the inputs -- but because of the flow of control
| here we need everything in a distinct register.  This change fixes the
| TVar program above.
| 
| The second change adds a clobber list (the :"cc", "memory").  This
| tells gcc that the condition code (due to the compare) and memory (due
| to the store) might be changed during the asm execution.  The lack of
| a clobber list did not seem to be causing any trouble, but without it
| gcc is free to assume that no state is changed during the execution.
| 
| Applying the following patch to SMP.h, and rebuilding everything, I
| not only fixed the simple writeTVar program, but it also fixed 8
| programs in the testsuite (conc043 -> conc049, conc052 and conc053).
| The only conc test program that still fails is conc039.  But there are
| still many mac problems, I still have 132 unexpected failures with
| make fast.  At least the patch does not cause any new failures (in
| either make fast or make WAY=threaded1).
| 
| Anyway, seeing as the change to SMP.h fixes a fair number of test
| cases in the testsuite, I figure there is some chance that it might
| fix some of the problems that people are having with the threaded RTS.
| On the other hand, I'm not real happy the the large number of
| testsuite failures my build gets, so I can't really call this
| 'tested'.  But I'm posting because it might be a useful clue for
| somebody with a little more mac/ghc experience.
| 
| Cheers,
| 
| -david k.
| 
| //
| // Short code to run through gcc -S.  On my powermac, without the
change
| // the generated assembly produces
| //            1: lwarx   r0, 0, r0
| //      load to here  ---^      ^------- from address here
| //                                       But we need this value for
the stwcx.
| //
| // with the fix, the first line of the generated assembly becomes
| //            1: lwarx   r11, 0, r0
| // and r0 remains unmodified if we need to use it later in the stwcx.
| 
| /*
|   * CMPXCHG - the single-word atomic compare-and-exchange instruction.
Used
|   * in the STM implementation.
|  */
| long cas(long* p, long o, long n)
| {
|     long result;
| 
|     //
|     // Change
|     //       :"=r" (result) --> :"=&r" (result)
|     // to get result and p in different registers
|     __asm__ __volatile__ (
|         "1:     lwarx     %0, 0, %3\n"
|         "       cmpw      %0, %1\n"
|         "       bne       2f\n"
|         "       stwcx.    %2, 0, %3\n"
|         "       bne-      1b\n"
|         "2:"
|         :"=r" (result)
|         :"r" (o), "r" (n), "r" (p)
|     );
|     return result;
| }
| 
| 
| Here's a "diff -cp" for SMP.h
| 
| *** SMP.h       Sun Aug 13 01:08:53 2006
| --- SMP-new.h   Sun Aug 13 01:08:47 2006
| *************** cas(StgVolatilePtr p, StgWord o, StgWord
| *** 76,83 ****
|           "       stwcx.    %2, 0, %3\n"
|           "       bne-      1b\n"
|           "2:"
| !         :"=r" (result)
|           :"r" (o), "r" (n), "r" (p)
|       );
|       return result;
|   #else
| --- 76,84 ----
|           "       stwcx.    %2, 0, %3\n"
|           "       bne-      1b\n"
|           "2:"
| !         :"=&r" (result)
|           :"r" (o), "r" (n), "r" (p)
| +         :"cc", "memory"
|       );
|       return result;
|   #else
| _______________________________________________
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users at haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


More information about the Glasgow-haskell-users mailing list