• Re: Strange asm generated by GCC...

    From jseigh@21:1/5 to Chris M. Thomasson on Fri Dec 20 06:55:50 2024
    On 12/19/24 20:02, Chris M. Thomasson wrote:
    On 12/19/2024 4:43 PM, Chris M. Thomasson wrote:
    Why in the world would GCC use an XCHG instruction for the following
    code. The damn XCHG has an implied LOCK prefix! Yikes!

    https://godbolt.org/z/Thxchdcr8
    _______________________
    #include <atomic>

    int main()
    {

         std::atomic<unsigned long> m_state = 0;

         m_state.store(std::memory_order_relaxed);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    Strange to me that this even compiled at all. I clearly forgot to put in
    a value. aka:

    m_state.store(1, std::memory_order_release);

    Sorry about that. ;^o

    The 2nd parameter has a default value, std::memory_order_seq_cst.
    Your 1st argument was a std::memory_order value, which is an
    integer value. So basically

    m_state.store(std::memory_order_relaxed, std::memory_order_seq_cst);
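
    A minimal sketch of what that call amounts to, assuming a pre-C++20
    compiler mode where std::memory_order is an unscoped enum. The helper
    names stored_by_mistake and check_stored_by_mistake are made up for
    illustration. The explicit cast stands in for the implicit conversion,
    so the sketch should also build as C++20, where memory_order is an
    enum class and the original one-argument call would be rejected.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: performs the store that the accidental call
// amounted to and returns what ended up in the atomic.
unsigned long stored_by_mistake()
{
    std::atomic<unsigned long> m_state{0};

    // The enumerator is used as the VALUE to store; the ordering falls
    // back to the default, seq_cst. The cast spells out the conversion
    // that happened implicitly before C++20.
    m_state.store(static_cast<unsigned long>(std::memory_order_relaxed),
                  std::memory_order_seq_cst);

    return m_state.load(std::memory_order_relaxed);
}

// Self-check: the stored value is the enumerator's integer value.
void check_stored_by_mistake()
{
    assert(stored_by_mistake() ==
           static_cast<unsigned long>(std::memory_order_relaxed));
}
```

    Since memory_order_relaxed is the first enumerator, the value stored
    should be 0, and the store itself is seq_cst, which would explain
    the XCHG.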

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jseigh@21:1/5 to Chris M. Thomasson on Fri Dec 20 12:46:28 2024
    On 12/19/24 19:43, Chris M. Thomasson wrote:
    Why in the world would GCC use an XCHG instruction for the following
    code. The damn XCHG has an implied LOCK prefix! Yikes!


    Speaking of strange code

    #include <atomic>

    bool test1(std::atomic<int> var, int addend)
    {
        int expected = var.load(std::memory_order_relaxed);
        int update = expected + addend;
        return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel, std::memory_order_seq_cst);
    }

    This is asm for armv8-a clang 9.0.0

    test1(std::atomic<int>, int):
            ldr w8, [x0]
            ldaxr w9, [x0]
            cmp w9, w8
            b.ne .LBB0_3
            add w8, w8, w1
            stlxr w9, w8, [x0]
            cbz w9, .LBB0_4
            mov w0, wzr
            ret
    .LBB0_3:
            clrex
            mov w0, wzr
            ret
    .LBB0_4:
            mov w0, #1
            ret

    I picked a version that just did ll/sc to avoid
    the question of whether a failed CASAL did a store or not.

    I don't see anything that forces a store memory barrier
    on all the fail paths. I could be missing something.

    Joe Seigh

  • From jseigh@21:1/5 to jseigh on Fri Dec 20 14:59:36 2024
    On 12/20/24 12:46, jseigh wrote:
    On 12/19/24 19:43, Chris M. Thomasson wrote:
    Why in the world would GCC use an XCHG instruction for the following
    code. The damn XCHG has an implied LOCK prefix! Yikes!


    Speaking of strange code

    That should be a ref parameter. I thought I updated the pasted code.

    #include <atomic>

    bool test1(std::atomic<int>& var, int addend)
    {
        int expected = var.load(std::memory_order_relaxed);
        int update = expected + addend;
        return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel, std::memory_order_seq_cst);
    }

    test1(std::atomic<int>&, int): // @test1(std::atomic<int>&, int)
            ldr w8, [x0]
            ldaxr w9, [x0]
            cmp w9, w8
            b.ne .LBB0_3
            add w8, w8, w1
            stlxr w9, w8, [x0]
            cbz w9, .LBB0_4
            mov w0, wzr
            ret
    .LBB0_3:
            clrex
            mov w0, wzr
            ret
    .LBB0_4:
            mov w0, #1
            ret

  • From jseigh@21:1/5 to jseigh on Fri Dec 20 15:17:19 2024
    On 12/20/24 14:59, jseigh wrote:
    On 12/20/24 12:46, jseigh wrote:
    On 12/19/24 19:43, Chris M. Thomasson wrote:
    Why in the world would GCC use an XCHG instruction for the following
    code. The damn XCHG has an implied LOCK prefix! Yikes!


    Speaking of strange code

    That should be a ref parameter.  I thought I updated the pasted code.

    #include <atomic>

    bool test1(std::atomic<int>& var, int addend)
    {
        int expected = var.load(std::memory_order_relaxed);
        int update = expected + addend;
        return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel, std::memory_order_seq_cst);
    }

    Alright, my bad. I should have double checked the docs.
    It's undefined behavior in this case.

    It seems like the success/failure form of compare_exchange
    is redundant unless you want it for self documentation.
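
    For comparison, a hedged sketch of the two-order form with a failure
    order that is allowed. As far as I recall, the failure order may never
    be release or acq_rel, and before C++17 it also could not be stronger
    than the success order, which is what made acq_rel/seq_cst a problem.
    The helper names add_with_retry and check_add_with_retry are made up
    for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: adds addend to var, retrying until the CAS
// succeeds, and returns the value var held just before the update.
int add_with_retry(std::atomic<int>& var, int addend)
{
    int expected = var.load(std::memory_order_relaxed);

    // On failure, compare_exchange_weak writes the observed value into
    // expected, so each retry recomputes the update from fresh data.
    while (!var.compare_exchange_weak(expected, expected + addend,
                                      std::memory_order_acq_rel,   // success
                                      std::memory_order_acquire))  // failure
    {
    }
    return expected;
}

// Self-check on a single thread.
void check_add_with_retry()
{
    std::atomic<int> v{40};
    assert(add_with_retry(v, 2) == 40);
    assert(v.load() == 42);
}
```

    The failure order only governs the load that happens when the compare
    fails, so acquire (or relaxed) is the natural choice there.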

    Joe Seigh

  • From aph@littlepinkcloud.invalid@21:1/5 to jseigh on Sat Dec 21 10:37:40 2024
    jseigh <jseigh_es00@xemaps.com> wrote:
    On 12/19/24 19:43, Chris M. Thomasson wrote:
    Why in the world would GCC use an XCHG instruction for the following
    code. The damn XCHG has an implied LOCK prefix! Yikes!


    Speaking of strange code

    #include <atomic>

    bool test1(std::atomic<int> var, int addend)
    {
        int expected = var.load(std::memory_order_relaxed);
        int update = expected + addend;
        return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel, std::memory_order_seq_cst);
    }

    This is asm for armv8-a clang 9.0.0

    test1(std::atomic<int>, int):
            ldr w8, [x0]
            ldaxr w9, [x0]
            cmp w9, w8
            b.ne .LBB0_3
            add w8, w8, w1
            stlxr w9, w8, [x0]
            cbz w9, .LBB0_4
            mov w0, wzr
            ret
    .LBB0_3:
            clrex
            mov w0, wzr
            ret
    .LBB0_4:
            mov w0, #1
            ret

    I picked a version that just did ll/sc to avoid
    the question of whether a failed CASAL did a store or not.

    I don't see anything that forces a store memory barrier
    on all the fail paths. I could be missing something.

    Why would there be one? If the store does not take place, there's no
    need for a memory barrier because there's no store for anyone to
    synchronize with. The only effect of a failed weak CAS is a load. If
    you really need a store on failure because of its side effect you can
    always add one.
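
    A small sketch of that point, with made-up names (CasResult,
    failed_cas_demo): when the compare fails, the atomic is left untouched
    and the only visible effect is that the observed value is loaded into
    expected.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical result type for the demo below.
struct CasResult {
    bool succeeded;
    int expected_after;
    int var_after;
};

// Runs one CAS whose expected value is deliberately wrong.
CasResult failed_cas_demo()
{
    std::atomic<int> var{5};
    int expected = 3; // does not match the stored 5, so the CAS must fail

    bool ok = var.compare_exchange_weak(expected, 10,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire);

    CasResult r{ok, expected, var.load(std::memory_order_relaxed)};
    assert(!r.succeeded);          // no store happened
    assert(r.expected_after == 5); // the failed CAS acted as a load
    assert(r.var_after == 5);      // the atomic itself is unchanged
    return r;
}
```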

    Andrew.

  • From aph@littlepinkcloud.invalid@21:1/5 to Chris M. Thomasson on Mon Dec 23 08:35:33 2024
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 12/22/2024 7:49 PM, Chris M. Thomasson wrote:
    On 12/21/2024 2:37 AM, aph@littlepinkcloud.invalid wrote:
    jseigh <jseigh_es00@xemaps.com> wrote:

    I don't see anything that forces a store memory barrier
    on all the fail paths.  I could be missing something.

    Why would there be one? If the store does not take place, there's no
    need for a memory barrier because there's no store for anyone to
    synchronize with. The only effect of a failed weak CAS is a load. If
    you really need a store on failure because of its side effect you can
    always add one.

    Iirc, the membars for the success and failure can be "useful" for
    popping from a lock-free stack. Wrt the C++ API the CAS can give you the
    updated value on a failure. So, there is a load. Depending on what you
    are doing, it might require an acquire.

    Loading the head of the lock-free stack would be an acquire at the start
    of the CAS loop. The CAS can use relaxed for the success and an acquire
    for the failure.

    The para I'm quoting:

    I don't see anything that forces a store memory barrier

    We were talking about the *store barrier* associated with the store.
    There is acquire ordering, regardless of the success or failure of the
    store.

    Andrew.

  • From jseigh@21:1/5 to Chris M. Thomasson on Mon Dec 23 20:16:08 2024
    On 12/23/24 16:20, Chris M. Thomasson wrote:


    Wrt a traditional lock-free stack, I think the store can use relaxed for
    the success path of a CAS.

    For pushing onto a stack, you want release. For popping from a stack
    you want acquire.
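
    A rough sketch of that rule on a Treiber-style stack. Node and Stack
    are names invented here, not anything from the thread. It assumes
    nodes are owned by the caller and never freed or reused while the
    stack is live, so it sidesteps ABA and memory reclamation entirely.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type; the caller owns the storage.
struct Node {
    int value;
    Node* next;
};

// Minimal lock-free stack sketch (no ABA protection, no reclamation).
struct Stack {
    std::atomic<Node*> head{nullptr};

    void push(Node* n)
    {
        assert(n != nullptr);
        n->next = head.load(std::memory_order_relaxed);
        // release: publishes n->value and n->next to whoever pops n.
        // A failed CAS reloads n->next with the current head.
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed))
        {
        }
    }

    Node* pop()
    {
        Node* old = head.load(std::memory_order_acquire);
        // acquire: pairs with the release in push, so old->next and
        // old->value are visible before they are read.
        while (old != nullptr &&
               !head.compare_exchange_weak(old, old->next,
                                           std::memory_order_acquire,
                                           std::memory_order_acquire))
        {
        }
        return old; // nullptr when the stack was empty
    }
};
```

    Here the failure order on pop is acquire because the loop goes on to
    read old->next from the freshly loaded head.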

    You are probably ok using relaxed loading the old value. It's not
    real clear how aggressive the compiler is allowed to be with relaxed
    loads and stores. To be super safe, you might want to add acquire
    to all your cas loops.

    I would just stick with the compare_exchange w/ 1 memory order
    parameter. The success/fail form is just confusing, the fail
    parameter doesn't do anything.

  • From jseigh@21:1/5 to Chris M. Thomasson on Tue Dec 24 08:26:05 2024
    On 12/23/24 21:35, Chris M. Thomasson wrote:
    On 12/23/2024 5:53 PM, Chris M. Thomasson wrote:
    On 12/23/2024 5:16 PM, jseigh wrote:


    You are probably ok using relaxed loading the old value.  It's not
    real clear how aggressive the compiler is allowed to be with relaxed
    loads and stores.  To be super safe, you might want to add acquire
    to all your cas loops.

    I usually use signal fences in loops w/ relaxed atomics.


    I would just stick with the compare_exchange w/ 1 memory order
    parameter.  The success/fail form is just confusing, the fail
    parameter doesn't do anything.




    Actually, can the acquire be relaxed into a consume?

    Compare_exchange is 2 ops.

    A load which happens on success and fail paths.
    A store which effectively only happens on success path.

    The memory barrier argument is decomposed into what
    is valid for a load and a store respectively. The 2nd
    memory barrier appears to be redundant.
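
    To make the decomposition concrete, here is a sketch of how I
    understand the single-order overload to pick its failure order:
    acq_rel becomes acquire, release becomes relaxed, anything else is
    used as is. derived_failure_order is a made-up helper for
    illustration, not a standard function.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: the failure (load-only) order implied by a
// single memory order passed to compare_exchange.
constexpr std::memory_order derived_failure_order(std::memory_order order)
{
    return order == std::memory_order_acq_rel ? std::memory_order_acquire
         : order == std::memory_order_release ? std::memory_order_relaxed
         : order;
}

// Self-check of the mapping.
void check_derived_failure_order()
{
    assert(derived_failure_order(std::memory_order_acq_rel) ==
           std::memory_order_acquire);
    assert(derived_failure_order(std::memory_order_release) ==
           std::memory_order_relaxed);
    assert(derived_failure_order(std::memory_order_seq_cst) ==
           std::memory_order_seq_cst);
}
```

    The release part of the order is dropped on failure because no store
    takes place there.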

    So for arm w/o cas

    #include <atomic>

    bool try_add(std::atomic<int>& var, int addend)
    {
        int expected = var.load(std::memory_order_relaxed);
        int update = expected + addend;
        return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel);
    }

    try_add(std::atomic<int>&, int):
            ldr w8, [x0]
            ldaxr w9, [x0]
            cmp w9, w8
            b.ne .LBB0_3
            add w8, w8, w1
            stlxr w9, w8, [x0]
            cbz w9, .LBB0_4
            mov w0, wzr
            ret
    .LBB0_3:
            clrex
            mov w0, wzr
            ret
    .LBB0_4:
            mov w0, #1
            ret

    You can see the load has acquire and the store
    has the release. You'd get the same thing even
    if you used seq_cst.

    Joe Seigh
