Mac/PPC threaded RTS problems -- potential clue
Simon Peyton-Jones
simonpj at microsoft.com
Mon Aug 14 04:31:24 EDT 2006
David
Thanks for looking into this. It sounds as if you made real progress.
This reply is just to say that Simon M is on holiday this week, so you
won't hear back from him till next wk.
Simon
| -----Original Message-----
| From: glasgow-haskell-users-bounces at haskell.org
[mailto:glasgow-haskell-users-bounces at haskell.org]
| On Behalf Of David Kirkman
| Sent: 13 August 2006 21:59
| To: glasgow-haskell-users at haskell.org
| Subject: Mac/PPC threaded RTS problems -- potential clue
|
| I managed to build ghc-6.5.20060804 on a powerpc mac, and I spent a
| little time Saturday night trying to work out what might be going on
| with the threaded RTS.
|
| Running the testsuite with make WAY=threaded1 shows many (73)
| failures. Many of them (the conc??? set) have to do with TVars, in
| particular writing TVars. On my dual-proc G5 macs, with
| ghc-6.5.20060804, the following program will hang when running with
| the threaded RTS, but will work fine when compiled without -threaded.
|
| >module Main where
| >
| >import GHC.Conc
| >import Control.Concurrent
| >
| >main = do
| > t1 <- atomically (do t1 <- newTVar 80
| > return t1)
| > atomically ( do writeTVar t1 1 )
| > putStr ("done\n")
|
| This problem seems to be powerpc specific -- it works fine with the
| threaded RTS on a multi-processor intel mac (built by me, from the
| same ghc-6.5.20060804 source tree I used to build the ppc compiler).
|
| Looking around in STM.c (via decidedly low-tech printfs) I quickly
| zoomed in on cond_lock_tvar, which lead me to cas (atomic compare and
| swap) in SMP.h, where I found (I think) a fairly clear error in the
| powerpc code -- I've appended a patch to the end of this message.
|
| The problem is that the inline assembler code was placing the result
| of an operation in a register that is used as input later in the code.
| At the bottom of this message I've extracted a short short code
| fragment that you can run through gcc (on a powerpc machine) to see
| the generated assembly output.
|
| The changes to fix the problem are fairly simple. The first adds an
| ampersand to the output list of the assembly fragment ("=r" (result)
| --> "=&r" (result)) The ampersand just tells gcc that result can not
| be placed in a register used for any of the input parameters (o, n, or
| p). Otherwise, it feels free to place output parameters in the same
| registers used by the inputs -- but because of the flow of control
| here we need everything in a distinct register. This change fixes the
| TVar program above.
|
| The second change adds a clobber list (the :"cc", "memory"). This
| tells gcc that the condition code (due to the compare) and memory (due
| to the store) might be changed during the asm execution. The lack of
| a clobber list did not seem to be causing any trouble, but without it
| gcc is free to assume that no state is changed during the execution.
|
| Applying the following patch to SMP.h, and rebuilding everything, I
| not only fixed the simple writeTVar program, but it also fixed 8
| programs in the testsuite (conc043 -> conc049, conc052 and conc053).
| The only conc test program that still fails is conc039. But there are
| still many mac problems, I still have 132 unexpected failures with
| make fast. At least the patch does not cause any new failures (in
| either make fast or make WAY=threaded1).
|
| Anyway, seeing as the change to SMP.h fixes a fair number of test
| cases in the testsuite, I figure there is some chance that it might
| fix some of the problems that people are having with the threaded RTS.
| On the other hand, I'm not real happy the the large number of
| testsuite failures my build gets, so I can't really call this
| 'tested'. But I'm posting because it might be a useful clue for
| somebody with a little more mac/ghc experience.
|
| Cheers,
|
| -david k.
|
| //
| // Short code to run through gcc -S. On my powermac, without the
change
| // the generated assembly produces
| // 1: lwarx r0, 0, r0
| // load to here ---^ ^------- from address here
| // But we need this value for
the stwcx.
| //
| // with the fix, the first line of the generated assembly becomes
| // 1: lwarx r11, 0, r0
| // and r0 remains unmodified if we need to use it later in the stwcx.
|
| /*
| * CMPXCHG - the single-word atomic compare-and-exchange instruction.
Used
| * in the STM implementation.
| */
| long cas(long* p, long o, long n)
| {
| long result;
|
| //
| // Change
| // :"=r" (result) --> :"=&r" (result)
| // to get result and p in different registers
| __asm__ __volatile__ (
| "1: lwarx %0, 0, %3\n"
| " cmpw %0, %1\n"
| " bne 2f\n"
| " stwcx. %2, 0, %3\n"
| " bne- 1b\n"
| "2:"
| :"=r" (result)
| :"r" (o), "r" (n), "r" (p)
| );
| return result;
| }
|
|
| Here's a "diff -cp" for SMP.h
|
| *** SMP.h Sun Aug 13 01:08:53 2006
| --- SMP-new.h Sun Aug 13 01:08:47 2006
| *************** cas(StgVolatilePtr p, StgWord o, StgWord
| *** 76,83 ****
| " stwcx. %2, 0, %3\n"
| " bne- 1b\n"
| "2:"
| ! :"=r" (result)
| :"r" (o), "r" (n), "r" (p)
| );
| return result;
| #else
| --- 76,84 ----
| " stwcx. %2, 0, %3\n"
| " bne- 1b\n"
| "2:"
| ! :"=&r" (result)
| :"r" (o), "r" (n), "r" (p)
| + :"cc", "memory"
| );
| return result;
| #else
| _______________________________________________
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users at haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
More information about the Glasgow-haskell-users
mailing list