Proposal: provide cas and barriers symbols even without -threaded

Ryan Newton rrnewton at gmail.com
Sat Jul 20 08:18:35 CEST 2013


Hi Carter,

Yes, SMP.h is where I've copy pasted the duplicate functionality from
(since I can't presently rely on linking the symbols).

Your proposal for the LLVM backend sounds **great**.  But it also is going
to provide additional constraints for getting "atomic-primops" right.
   The goal of atomic-primops is to be a stable Haskell-level interface
into the relevant CAS and fetch-and-add stuff.  The reason this is
important is that one has to be very careful to defeat the GHC optimizer in
all the relevant places and make pointer equality a reliable property.  I
would like to get atomic-primops to work reliably in 7.4, 7.6 [and 7.8] and
have more "native" support in future GHC releases, where maybe the foreign
primops would become unecessary.  (They are a pain and have already exposed
one blocking cabal bug, fixed in upcoming 1.17.)

A couple additional suggestions for the proposal in ticket #7883:

   - we should use more unique symbols than "cas", especially for this
   rewriting trick.  How about "ghc_cas" or something?
   - it would be great to get at least fetch-and-add in addition to CAS and
   barriers
   - if we reliably provide this set of special symbols, libraries like
   atomic-primops may use them in the .cmm and benefit from the CMM->LLVM
   substitutions
   - if we include all the primops I need in GHC proper the previous bullet
   will stop applying ;-)

Cheers,
  -Ryan

P.S. Just as a bit of motivation, here are some recent performance numbers.
 We often wonder about how close our "pure values in a box" approach comes
to efficient lock-free structures.  Well here are some numbers about using
a proper unboxed counter in the Haskell heap, vs using an IORef Int and
atomicModifyIORef':  Up to 100X performance difference on some platforms
for microbenchmarks that hammer a counter:


https://github.com/rrnewton/haskell-lockfree-queue/blob/fb12d1121690553e4f737af258848f279147ea24/AtomicPrimops/DEVLOG.md#20130718-timing-atomic-counter-ops

And here are the performance and scaling advantages of using ChaseLev
(based on atomic-primops), over a traditional pure-in-a-box structure
(IORef Data.Seq). The following are timings of ChaseLev/traditional
respectively on a 32 core westmere:

    fib(42) 1 threads:  21s
    fib(42) 2 threads:  10.1s
    fib(42) 4 threads:  5.2s (100%prod)
    fib(42) 8 threads:  2.7s - 3.2s (100%prod)
    fib(42) 16 threads: 1.28s
    fib(42) 24 threads: 1.85s
    fib(42) 32 threads: 4.8s (high variance)

    (hive) fib(42) 1 threads:  41.8s  (95% prod)
    (hive) fib(42) 2 threads:  25.2s  (66% prod)
    (hive) fib(42) 4 threads:  14.6s  (27% prod, 135GB alloc)
    (hive) fib(42) 8 threads:  17.1s  (26% prod)
    (hive) fib(42) 16 threads: 16.3s  (13% prod)
    (hive) fib(42) 24 threads: 21.2s  (30% prod)
    (hive) fib(42) 32 threads: 29.3s  (33% prod)

And that is WITH the inefficiency of doing a "ccall" on every single atomic
operation.

Notes on parfib performance are here:

https://github.com/rrnewton/haskell-lockfree-queue/blob/d6d3e9eda2a487a5f055b1f51423954bb6b6bdfa/ChaseLev/Test.hs#L158







On Fri, Jul 19, 2013 at 5:05 PM, Carter Schonwald <
carter.schonwald at gmail.com> wrote:

> ryan, the relevant machinery on the C side is here, see
> ./includes/stg/SMP.h :
> https://github.com/ghc/ghc/blob/7cc8a3cc5c2970009b83844ff9cc4e27913b8559/includes/stg/SMP.h
>
> (unless i'm missing something)
>
>
> On Fri, Jul 19, 2013 at 4:53 PM, Carter Schonwald <
> carter.schonwald at gmail.com> wrote:
>
>> Ryan,
>> if you look at line 270, you'll see the CAS is a C call
>> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L270
>>
>>
>> What Simon is alluding to is some work I started (but need to finish)
>> http://ghc.haskell.org/trac/ghc/ticket/7883 is the relevant ticket, and
>> I'll need to sort out doing the same on the native code gen too
>>
>> there ARE no write barrier primops, they're baked into the CAS machinery
>> in ghc's rts
>>
>>
>> On Fri, Jul 19, 2013 at 1:02 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>>
>>> Yes, I'd absolutely rather not suffer C call overhead for these
>>> functions (or the CAS functions).  But isn't that how it's done currently
>>> for the casMutVar# primop?
>>>
>>>
>>> https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L265
>>>
>>> To avoid the overhead, is it necessary to make each primop in-line
>>> rather than out-of-line, or just to get rid of the "ccall"?
>>>
>>> Another reason it would be good to package these with GHC is that I'm
>>> having trouble building robust libraries of foreign primops that work under
>>> all "ways" (e.g. GHCI).  For example, this bug:
>>>
>>>     https://github.com/rrnewton/haskell-lockfree-queue/issues/10
>>>
>>> If I write .cmm code that depends on RTS functionality like
>>> stg_MUT_VAR_CLEAN_info, then it seems to work fine when in compiled mode
>>> (with/without threading, profiling), but I get link errors from GHCI where
>>> these symbols aren't defined.
>>>
>>> I've got a draft of the relevant primops here:
>>>
>>>
>>> https://github.com/rrnewton/haskell-lockfree-queue/blob/master/AtomicPrimops/cbits/primops.cmm
>>>
>>> Which includes:
>>>
>>>    - variants of CAS for MutableArray# and MutableByteArray#
>>>    - fetch-and-add for MutableByteArray#
>>>
>>> Also, there are some tweaks to support the new "ticketed" interface for
>>> safer CAS:
>>>
>>>
>>> http://hackage.haskell.org/packages/archive/atomic-primops/0.3/doc/html/Data-Atomics.html#g:3
>>>
>>> I started adding some of these primops to GHC proper (still as
>>> out-of-line), but not all of them.  I had gone with the foreign primop
>>> route instead...
>>>
>>>    https://github.com/rrnewton/ghc/commits/master
>>>
>>>   -Ryan
>>>
>>> P.S. Where is the write barrier primop?  I don't see it listed in
>>> prelude/primops.txt...
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 19, 2013 at 11:41 AM, Carter Schonwald <
>>> carter.schonwald at gmail.com> wrote:
>>>
>>>> I guess I should find the time to finish the CAS primop work I
>>>> volunteered to do then. Ill look into in a few days.
>>>>
>>>>
>>>> On Friday, July 19, 2013, Simon Marlow wrote:
>>>>
>>>>> On 18/07/13 14:17, Ryan Newton wrote:
>>>>>
>>>>>> The "atomic-primops" library depends on symbols such as
>>>>>> store_load_barrier and "cas", which are defined in SMP.h.  Thus the
>>>>>> result is that if the program is linked WITHOUT "-threaded", the user
>>>>>> gets a linker error about undefined symbols.
>>>>>>
>>>>>> The specific place it's used is in the 'foreign "C"' bits of this
>>>>>> .cmm code:
>>>>>>
>>>>>> https://github.com/rrnewton/**haskell-lockfree-queue/blob/**
>>>>>> 87e63b21b2a6c375e93c30b98c28c1**d04f88781c/AtomicPrimops/**
>>>>>> cbits/primops.cmm<https://github.com/rrnewton/haskell-lockfree-queue/blob/87e63b21b2a6c375e93c30b98c28c1d04f88781c/AtomicPrimops/cbits/primops.cmm>
>>>>>>
>>>>>> I'm trying to explore hacks that will enable me to pull in those
>>>>>> functions during compile time, without duplicating a whole bunch of
>>>>>> code
>>>>>> from the RTS.  But it's a fragile business.
>>>>>>
>>>>>> It seems to me that some of these routines have general utility.  In
>>>>>> future versions of GHC, could we consider linking in those routines
>>>>>> irrespective of "-threaded"?
>>>>>>
>>>>>
>>>>> We should make the non-THREADED versions EXTERN_INLINE too, so that
>>>>> there will be (empty) functions to call in rts/Inlines.c.  Want to submit a
>>>>> patch?
>>>>>
>>>>> A better solution would be to make them into primops.  You don't
>>>>> really want to be calling out to a C function to implement a memory
>>>>> barrier. We have this for write_barrier(), but none of the others so far.
>>>>>  Of couse that's a larger change.
>>>>>
>>>>> Cheers,
>>>>>         Simon
>>>>>
>>>>>
>>>>>
>>>>> ______________________________**_________________
>>>>> ghc-devs mailing list
>>>>> ghc-devs at haskell.org
>>>>> http://www.haskell.org/**mailman/listinfo/ghc-devs<http://www.haskell.org/mailman/listinfo/ghc-devs>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130720/044074af/attachment.htm>


More information about the ghc-devs mailing list