• Re: Strange asm generated by GCC...

    From aph@littlepinkcloud.invalid@21:1/5 to Chris M. Thomasson on Mon Dec 23 08:35:33 2024
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 12/22/2024 7:49 PM, Chris M. Thomasson wrote:
    On 12/21/2024 2:37 AM, aph@littlepinkcloud.invalid wrote:
    jseigh <jseigh_es00@xemaps.com> wrote:

    I don't see anything that forces a store memory barrier
    on all the fail paths.  I could be missing something.

    Why would there be one? If the store does not take place, there's no
    need for a memory barrier because there's no store for anyone to
    synchronize with. The only effect of a failed weak CAS is a load. If
    you really need a store on failure because of its side effect you can
    always add one.

    Iirc, the membars for the success and failure can be "useful" for
    popping from a lock-free stack. Wrt the C++ API the CAS can give you the
    updated value on a failure. So, there is a load. Depending on what you
    are doing, it might require an acquire.

    Loading the head of the lock-free stack would be an acquire at the start
    of the CAS loop. The CAS can use relaxed for the success and an acquire
    for the failure.
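
    The point that a failed CAS still hands back the current value can be shown with a small sketch. The names `CasResult` and `cas_once` are hypothetical, made up for illustration; the memory orders are one plausible choice, not the only valid one.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical result type: whether the store happened, and what
// `expected` held after the call.
struct CasResult {
    bool ok;
    int  observed;
};

// One strong CAS attempt with separate success and failure orders.
inline CasResult cas_once(std::atomic<int>& var, int expected, int desired)
{
    // success: acq_rel (the load is an acquire, the store is a release)
    // failure: acquire (only the load takes place)
    bool ok = var.compare_exchange_strong(expected, desired,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire);
    // On failure, `expected` has been overwritten with the value found
    // in `var`; on success it is left unchanged.
    return {ok, expected};
}
```

    On failure the caller gets the freshly loaded value back without issuing a separate load, which is why the ordering of that load can matter.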

    The para I'm quoting:

    I don't see anything that forces a store memory barrier

    We were talking about the *store barrier* associated with the store.
    There is acquire ordering, regardless of the success or failure of the
    store.

    Andrew.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Mon Dec 23 20:16:08 2024
    On 12/23/24 16:20, Chris M. Thomasson wrote:


    Wrt a traditional lock-free stack, I think the store can use relaxed for
    the success path of a CAS.

    For pushing onto a stack, you want release. For popping from a stack
    you want acquire.
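
    A minimal sketch of the release-on-push, acquire-on-pop rule. `Node` and `SketchStack` are hypothetical names. The sketch assumes nodes are never freed or reused while another thread may still be popping, so it deliberately ignores the ABA and memory-reclamation problems a real lock-free stack has to solve.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int   value;
    Node* next;
};

// ASSUMPTION: nodes outlive all concurrent pops (no ABA handling,
// no reclamation). Illustrates the memory orders only.
class SketchStack {
public:
    void push(Node* n)
    {
        Node* old_head = head_.load(std::memory_order_relaxed);
        do {
            n->next = old_head;
            // release: publishes the writes to *n before n becomes head
        } while (!head_.compare_exchange_weak(old_head, n,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    Node* pop()
    {
        // acquire: old_head->next is read below, so the pusher's writes
        // to the node must be visible first
        Node* old_head = head_.load(std::memory_order_acquire);
        while (old_head != nullptr &&
               !head_.compare_exchange_weak(old_head, old_head->next,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
            // a failed CAS reloaded old_head with acquire; just retry
        }
        return old_head;  // nullptr if the stack was empty
    }

private:
    std::atomic<Node*> head_{nullptr};
};
```

    The failure order on pop is acquire because the reloaded head is dereferenced on the next iteration.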

    You are probably ok using relaxed loading the old value. It's not
    real clear how aggressive the compiler is allowed to be with relaxed
    loads and stores. To be super safe, you might want to add acquire
    to all your cas loops.

    I would just stick with the compare_exchange w/ 1 memory order
    parameter. The success/fail form is just confusing, the fail
    parameter doesn't do anything.

  • From jseigh@21:1/5 to Chris M. Thomasson on Tue Dec 24 08:26:05 2024
    On 12/23/24 21:35, Chris M. Thomasson wrote:
    On 12/23/2024 5:53 PM, Chris M. Thomasson wrote:
    On 12/23/2024 5:16 PM, jseigh wrote:


    You are probably ok using relaxed loading the old value.  It's not
    real clear how aggressive the compiler is allowed to be with relaxed
    loads and stores.  To be super safe, you might want to add acquire
    to all your cas loops.

    I usually use signal fences in loops w/ relaxed atomics.
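
    One possible reading of the signal-fence remark, sketched under the assumption that the fence is there only to restrain compiler reordering around a relaxed CAS loop. `add_relaxed` is a hypothetical name. `std::atomic_signal_fence` emits no CPU barrier instruction, so this gives no additional ordering between threads.

```cpp
#include <atomic>
#include <cassert>

// Adds `addend` to `var` with relaxed atomics and returns the old value.
inline int add_relaxed(std::atomic<int>& var, int addend)
{
    int expected = var.load(std::memory_order_relaxed);
    while (!var.compare_exchange_weak(expected, expected + addend,
                                      std::memory_order_relaxed,
                                      std::memory_order_relaxed)) {
        // Compiler-only barrier: constrains how the compiler may move
        // memory accesses across this point, nothing more.
        std::atomic_signal_fence(std::memory_order_acq_rel);
    }
    return expected;  // unchanged by the successful CAS, so the old value
}
```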


    I would just stick with the compare_exchange w/ 1 memory order
    parameter.  The success/fail form is just confusing, the fail
    parameter doesn't do anything.




    Actually, can the acquire be relaxed into a consume?

    Compare_exchange is 2 ops.

    A load which happens on success and fail paths.
    A store which effectively only happens on success path.

    The memory order argument is decomposed into what is valid
    for a load and for a store respectively. The second memory
    order argument appears to be redundant.
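
    The decomposition can be written out as a sketch. For the single-order overload the C++ standard derives the failure order from the given order: acq_rel becomes acquire, release becomes relaxed, anything else is used as is. `derived_failure_order` and `try_add_explicit` are hypothetical names, not library functions.

```cpp
#include <atomic>
#include <cassert>

// Failure order the single-order compare_exchange overload implies
// for a given order: the part of it that is valid for a plain load.
constexpr std::memory_order derived_failure_order(std::memory_order order)
{
    return order == std::memory_order_acq_rel ? std::memory_order_acquire
         : order == std::memory_order_release ? std::memory_order_relaxed
         : order;
}

// Two-order call spelled out with the derived failure order; it should
// behave like the one-order call with memory_order_acq_rel.
inline bool try_add_explicit(std::atomic<int>& var, int addend)
{
    int expected = var.load(std::memory_order_relaxed);
    return var.compare_exchange_strong(
        expected, expected + addend,
        std::memory_order_acq_rel,
        derived_failure_order(std::memory_order_acq_rel));
}
```

    The two-order form only matters when the caller wants a failure order different from the derived one.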

    So for ARM without a CAS instruction (load-exclusive/store-exclusive):

    #include <atomic>

    bool try_add(std::atomic<int>& var, int addend)
    {
        int expected = var.load(std::memory_order_relaxed);
        int update = expected + addend;
        return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel);
    }

    try_add(std::atomic<int>&, int):
            ldr     w8, [x0]
            ldaxr   w9, [x0]
            cmp     w9, w8
            b.ne    .LBB0_3
            add     w8, w8, w1
            stlxr   w9, w8, [x0]
            cbz     w9, .LBB0_4
            mov     w0, wzr
            ret
    .LBB0_3:
            clrex
            mov     w0, wzr
            ret
    .LBB0_4:
            mov     w0, #1
            ret

    You can see that the load has acquire semantics (ldaxr) and
    the store has release semantics (stlxr). You'd get the same
    thing even if you used seq_cst.
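
    In the listing, a failed stlxr makes the function return false without retrying, so a single weak CAS can fail even if no other thread changed the value. A caller that needs the add to take effect would typically loop. The sketch below repeats try_add from this post and wraps it; `add_with_retry` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Single attempt, as in the post.
inline bool try_add(std::atomic<int>& var, int addend)
{
    int expected = var.load(std::memory_order_relaxed);
    int update = expected + addend;
    return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel);
}

// Retries until one attempt succeeds; returns the number of attempts.
inline int add_with_retry(std::atomic<int>& var, int addend)
{
    int attempts = 1;
    while (!try_add(var, addend)) {
        ++attempts;  // value changed underneath us, or a spurious failure
    }
    return attempts;
}
```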

    Joe Seigh
