Proposal: provide cas and barriers symbols even without -threaded
Ryan Newton
rrnewton at gmail.com
Sat Jul 20 08:18:35 CEST 2013
Hi Carter,
Yes, SMP.h is where I've copy-pasted the duplicate functionality from
(since I can't presently rely on linking the symbols).
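For readers without the header open, the duplicated functionality has roughly the following shape. This is a sketch of the idea only, not a copy of SMP.h: `DEMO_THREADED`, `demo_word`, `demo_cas`, `demo_write_barrier` and `demo_cas_check` are made-up names, the GCC/Clang `__sync` builtins stand in for whatever per-architecture code the RTS uses, and the plain non-atomic fallback is an assumption about what a single-threaded build needs.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t demo_word;   /* hypothetical stand-in for an RTS word type */

#if defined(DEMO_THREADED)
/* Threaded flavour: a real atomic compare-and-swap and a real barrier. */
static inline demo_word demo_cas(volatile demo_word *p, demo_word o, demo_word n)
{
    return __sync_val_compare_and_swap(p, o, n);   /* returns the old value */
}
static inline void demo_write_barrier(void)
{
    __sync_synchronize();   /* full barrier, stronger than a write barrier */
}
#else
/* Non-threaded flavour: same symbols, but no atomicity is required. */
static inline demo_word demo_cas(volatile demo_word *p, demo_word o, demo_word n)
{
    demo_word result = *p;
    if (result == o) {
        *p = n;
    }
    return result;          /* the old value, as in the threaded flavour */
}
static inline void demo_write_barrier(void) { /* intentionally empty */ }
#endif

/* Single-threaded check that both flavours agree on the return convention. */
static demo_word demo_cas_check(void)
{
    volatile demo_word cell = 1;
    assert(demo_cas(&cell, 1, 2) == 1);   /* matched: cell becomes 2 */
    assert(demo_cas(&cell, 1, 3) == 2);   /* no match: cell stays 2 */
    demo_write_barrier();
    return cell;
}
```

The point of the two branches is that client code can call the same names whether or not the threaded flavour is selected.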
Your proposal for the LLVM backend sounds **great**. But it will also add
constraints on getting "atomic-primops" right.
The goal of atomic-primops is to be a stable Haskell-level interface
into the relevant CAS and fetch-and-add stuff. The reason this is
important is that one has to be very careful to defeat the GHC optimizer in
all the relevant places and make pointer equality a reliable property. I
would like to get atomic-primops to work reliably in 7.4, 7.6 [and 7.8] and
have more "native" support in future GHC releases, where maybe the foreign
primops would become unnecessary. (They are a pain and have already exposed
one blocking cabal bug, fixed in upcoming 1.17.)
A couple additional suggestions for the proposal in ticket #7883:
- we should use more unique symbols than "cas", especially for this
rewriting trick. How about "ghc_cas" or something?
- it would be great to get at least fetch-and-add in addition to CAS and
barriers
- if we reliably provide this set of special symbols, libraries like
atomic-primops may use them in the .cmm and benefit from the CMM->LLVM
substitutions
- if we include all the primops I need in GHC proper the previous bullet
will stop applying ;-)
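To make the first two bullets concrete, here is a minimal sketch of what such a set of prefixed symbols could look like at the C level. "ghc_cas" is the name floated above; `ghc_fetch_add`, `ghc_store_load_barrier`, `word` and `ghc_atomics_demo` are invented here by analogy, and none of them exist in the RTS. C11 atomics are used purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uintptr_t word;   /* hypothetical word type for this sketch */

/* Compare-and-swap: returns the value found at *p, which equals `old`
   exactly when the swap succeeded. */
static inline word ghc_cas(_Atomic word *p, word old, word new_)
{
    /* On failure, C11 writes the observed value back into `old`. */
    atomic_compare_exchange_strong(p, &old, new_);
    return old;
}

/* Fetch-and-add: returns the value held before the addition. */
static inline word ghc_fetch_add(_Atomic word *p, word n)
{
    return atomic_fetch_add(p, n);
}

/* Sequentially consistent fence, used as a stand-in for a store/load barrier. */
static inline void ghc_store_load_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Single-threaded walk through the return conventions. */
static word ghc_atomics_demo(void)
{
    _Atomic word cell = 5;
    assert(ghc_cas(&cell, 5, 7) == 5);      /* succeeds, cell is now 7 */
    assert(ghc_cas(&cell, 5, 9) == 7);      /* fails, cell stays 7 */
    assert(ghc_fetch_add(&cell, 3) == 7);   /* cell is now 10 */
    ghc_store_load_barrier();
    return atomic_load(&cell);
}
```

A distinctive prefix like this is less likely to collide with other symbols than a bare "cas", which matters if a backend is going to recognize the calls by name.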
Cheers,
-Ryan
P.S. Just as a bit of motivation, here are some recent performance numbers.
We often wonder about how close our "pure values in a box" approach comes
to efficient lock-free structures. Well here are some numbers about using
a proper unboxed counter in the Haskell heap, vs using an IORef Int and
atomicModifyIORef': Up to 100X performance difference on some platforms
for microbenchmarks that hammer a counter:
https://github.com/rrnewton/haskell-lockfree-queue/blob/fb12d1121690553e4f737af258848f279147ea24/AtomicPrimops/DEVLOG.md#20130718-timing-atomic-counter-ops
And here are the performance and scaling advantages of using ChaseLev
(based on atomic-primops), over a traditional pure-in-a-box structure
(IORef Data.Seq). The following are timings of ChaseLev/traditional
respectively on a 32-core Westmere:
fib(42) 1 threads: 21s
fib(42) 2 threads: 10.1s
fib(42) 4 threads: 5.2s (100%prod)
fib(42) 8 threads: 2.7s - 3.2s (100%prod)
fib(42) 16 threads: 1.28s
fib(42) 24 threads: 1.85s
fib(42) 32 threads: 4.8s (high variance)
(hive) fib(42) 1 threads: 41.8s (95% prod)
(hive) fib(42) 2 threads: 25.2s (66% prod)
(hive) fib(42) 4 threads: 14.6s (27% prod, 135GB alloc)
(hive) fib(42) 8 threads: 17.1s (26% prod)
(hive) fib(42) 16 threads: 16.3s (13% prod)
(hive) fib(42) 24 threads: 21.2s (30% prod)
(hive) fib(42) 32 threads: 29.3s (33% prod)
And that is WITH the inefficiency of doing a "ccall" on every single atomic
operation.
Notes on parfib performance are here:
https://github.com/rrnewton/haskell-lockfree-queue/blob/d6d3e9eda2a487a5f055b1f51423954bb6b6bdfa/ChaseLev/Test.hs#L158
On Fri, Jul 19, 2013 at 5:05 PM, Carter Schonwald <
carter.schonwald at gmail.com> wrote:
> ryan, the relevant machinery on the C side is here, see
> ./includes/stg/SMP.h :
> https://github.com/ghc/ghc/blob/7cc8a3cc5c2970009b83844ff9cc4e27913b8559/includes/stg/SMP.h
>
> (unless i'm missing something)
>
>
> On Fri, Jul 19, 2013 at 4:53 PM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> Ryan,
>> if you look at line 270, you'll see the CAS is a C call
>> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L270
>>
>>
>> What Simon is alluding to is some work I started (but need to finish)
>> http://ghc.haskell.org/trac/ghc/ticket/7883 is the relevant ticket, and
>> I'll need to sort out doing the same on the native code gen too
>>
>> there ARE no write barrier primops, they're baked into the CAS machinery
>> in ghc's rts
>>
>>
>> On Fri, Jul 19, 2013 at 1:02 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> Yes, I'd absolutely rather not suffer C call overhead for these
>>> functions (or the CAS functions). But isn't that how it's done currently
>>> for the casMutVar# primop?
>>>
>>>
>>> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L265
>>>
>>> To avoid the overhead, is it necessary to make each primop in-line
>>> rather than out-of-line, or just to get rid of the "ccall"?
>>>
>>> Another reason it would be good to package these with GHC is that I'm
>>> having trouble building robust libraries of foreign primops that work under
>>> all "ways" (e.g. GHCI). For example, this bug:
>>>
>>> https://github.com/rrnewton/haskell-lockfree-queue/issues/10
>>>
>>> If I write .cmm code that depends on RTS functionality like
>>> stg_MUT_VAR_CLEAN_info, then it seems to work fine when in compiled mode
>>> (with/without threading, profiling), but I get link errors from GHCI where
>>> these symbols aren't defined.
>>>
>>> I've got a draft of the relevant primops here:
>>>
>>>
>>> https://github.com/rrnewton/haskell-lockfree-queue/blob/master/AtomicPrimops/cbits/primops.cmm
>>>
>>> Which includes:
>>>
>>> - variants of CAS for MutableArray# and MutableByteArray#
>>> - fetch-and-add for MutableByteArray#
>>>
>>> Also, there are some tweaks to support the new "ticketed" interface for
>>> safer CAS:
>>>
>>>
>>> http://hackage.haskell.org/packages/archive/atomic-primops/0.3/doc/html/Data-Atomics.html#g:3
>>>
>>> I started adding some of these primops to GHC proper (still as
>>> out-of-line), but not all of them. I had gone with the foreign primop
>>> route instead...
>>>
>>> https://github.com/rrnewton/ghc/commits/master
>>>
>>> -Ryan
>>>
>>> P.S. Where is the write barrier primop? I don't see it listed in
>>> prelude/primops.txt...
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 19, 2013 at 11:41 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> I guess I should find the time to finish the CAS primop work I
>>>> volunteered to do then. I'll look into it in a few days.
>>>>
>>>>
>>>> On Friday, July 19, 2013, Simon Marlow wrote:
>>>>
>>>>> On 18/07/13 14:17, Ryan Newton wrote:
>>>>>
>>>>>> The "atomic-primops" library depends on symbols such as
>>>>>> store_load_barrier and "cas", which are defined in SMP.h. Thus the
>>>>>> result is that if the program is linked WITHOUT "-threaded", the user
>>>>>> gets a linker error about undefined symbols.
>>>>>>
>>>>>> The specific place it's used is in the 'foreign "C"' bits of this
>>>>>> .cmm code:
>>>>>>
>>>>>> https://github.com/rrnewton/haskell-lockfree-queue/blob/87e63b21b2a6c375e93c30b98c28c1d04f88781c/AtomicPrimops/cbits/primops.cmm
>>>>>>
>>>>>> I'm trying to explore hacks that will enable me to pull in those
>>>>>> functions during compile time, without duplicating a whole bunch of
>>>>>> code
>>>>>> from the RTS. But it's a fragile business.
>>>>>>
>>>>>> It seems to me that some of these routines have general utility. In
>>>>>> future versions of GHC, could we consider linking in those routines
>>>>>> irrespective of "-threaded"?
>>>>>>
>>>>>
>>>>> We should make the non-THREADED versions EXTERN_INLINE too, so that
>>>>> there will be (empty) functions to call in rts/Inlines.c. Want to submit a
>>>>> patch?
>>>>>
>>>>> A better solution would be to make them into primops. You don't
>>>>> really want to be calling out to a C function to implement a memory
>>>>> barrier. We have this for write_barrier(), but none of the others so far.
>>>>> Of course that's a larger change.
>>>>>
>>>>> Cheers,
>>>>> Simon
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ghc-devs mailing list
>>>>> ghc-devs at haskell.org
>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>
>>>>
>>>
>>
>