On 11/15/2024 11:37 PM, Anton Ertl wrote:
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 11/15/2024 9:27 AM, Anton Ertl wrote:
jseigh <jseigh_es00@xemaps.com> writes:
Anybody doing that sort of programming, i.e. lock-free or distributed
algorithms, who can't handle weakly consistent memory models, shouldn't
be doing that sort of programming in the first place.
Strongly consistent memory won't help incompetence.
Strong words to hide lack of arguments?
For instance, a 100% sequential memory order won't help you with, say,
solving ABA.
Sure, not all problems are solved by sequential consistency, and yes,
it won't solve race conditions like the ABA problem. But jseigh
implied that finding it easier to write correct and efficient code for
sequential consistency than for a weakly-consistent memory model
(e.g., Alpha's memory model) is incompetent.
What if you had to write code for a weakly ordered system, and the
performance guidelines said to only use a membar when you absolutely
have to? If you say something akin to "I do everything using
std::memory_order_seq_cst", well, that is a violation right off the bat.
Fair enough?
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:I am trying to say you might not be hired if you only knew how to handle >std::memory_order_seq_cst wrt C++... ?
What if you had to write code for a weakly ordered system, and the
performance guidelines said to only use a membar when you absolutely
have to. If you say something akin to "I do everything using
std::memory_order_seq_cst", well, that is a violation right off the bat. ...
aph@littlepinkcloud.invalid writes:
Yes. That Alpha behaviour was a historic error. No one wants to do
that again.
Was it an actual behaviour of any Alpha for public sale, or was it
just the Alpha specification?
On 11/16/24 16:21, Chris M. Thomasson wrote:
Fwiw, in C++ std::memory_order_consume is useful for traversing a node
based stack of something in RCU. In most systems it only acts like a
compiler barrier. On the Alpha, it must emit a membar instruction. Iirc,
mb for alpha? Cannot remember that one right now.
That got deprecated. Too hard for compilers to deal with. It's now
the same as memory_order_acquire.
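As a concrete sketch of the pattern under discussion (push, sum, and Node are illustrative names; reclamation is RCU's job and is omitted):

#include <atomic>

struct Node {
    int   value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

// Writer: publish a node with a release store so its fields are
// visible before the pointer to it is.
void push(int v) {
    Node* n = new Node{v, head.load(std::memory_order_relaxed)};
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
        // n->next was refreshed with the current head; retry.
    }
}

// Reader: memory_order_consume need only order the data-dependent
// loads (n->next, n->value), which is free on most hardware but needs
// an mb on Alpha; today's compilers simply treat consume as acquire.
int sum() {
    int s = 0;
    for (Node* n = head.load(std::memory_order_consume); n; n = n->next)
        s += n->value;
    return s;
}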
Which brings up an interesting point. Even if the hardware
memory model is strongly ordered, compilers can reorder stuff,
so you still have to program as if a weak memory model was in
effect.
On 11/18/2024 3:34 PM, Chris M. Thomasson wrote:
Don't tell me you want all of std::memory_order_* to default to
std::memory_order_seq_cst? If you're on a system that only has seq_cst and
nothing else, okay, but not on other weaker (memory order) systems,
right?
defaulting a relaxed to a seq_cst is a bit much.... ;^o
On 11/14/2024 11:25 PM, Anton Ertl wrote:
aph@littlepinkcloud.invalid writes:
Yes. That Alpha behaviour was a historic error. No one wants to do
that again.
Was it an actual behaviour of any Alpha for public sale, or was it
just the Alpha specification? I certainly think that Alpha's lack
of guarantees in memory ordering is a bad idea, and so is ARM's:
"It's only 32 pages" <YfxXO.384093$EEm7.56154@fx16.iad>. Seriously?
Sequential consistency can be specified in one sentence: "The result
of any execution is the same as if the operations of all the
processors were executed in some sequential order, and the
operations of each individual processor appear in this sequence in
the order specified by its program."
Well, iirc, the Alpha is the only system that requires an explicit
membar for an RCU-based algorithm. Even SPARC in RMO mode does not
need this. Iirc, akin to memory_order_consume in C++, i.e.,
data-dependent loads:
https://en.cppreference.com/w/cpp/atomic/memory_order
aph@littlepinkcloud.invalid writes:
Yes. That Alpha behaviour was a historic error. No one wants to do
that again.
Was it an actual behaviour of any Alpha for public sale, or was it
just the Alpha specification? I certainly think that Alpha's lack of guarantees in memory ordering is a bad idea, and so is ARM's: "It's
only 32 pages" <YfxXO.384093$EEm7.56154@fx16.iad>. Seriously?
Sequential consistency can be specified in one sentence: "The result
of any execution is the same as if the operations of all the
processors were executed in some sequential order, and the operations
of each individual processor appear in this sequence in the order
specified by its program."
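To make that one-sentence definition concrete, here is the classic store-buffering litmus test as a C++ sketch; sequential consistency forbids the r1 == 0 && r2 == 0 outcome, while relaxed (or even acquire/release) accesses would allow it:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

int main() {
    std::thread a([] { x.store(1, std::memory_order_seq_cst);
                       r1 = y.load(std::memory_order_seq_cst); });
    std::thread b([] { y.store(1, std::memory_order_seq_cst);
                       r2 = x.load(std::memory_order_seq_cst); });
    a.join();
    b.join();
    // In any single interleaving of the four operations, whichever
    // store executes first is seen by the other thread's load.
    assert(r1 == 1 || r2 == 1);
}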
However, I don't think that the Alpha architects considered the Alpha
memory ordering to be an error, and probably still don't, just like
the ARM architects don't consider their memory model to be an error.
I am pretty sure that no Alpha implementation ever made use of the
lack of causality in the Alpha memory model, so they could have added causality without outlawing existing implementations. That they did
not indicates that they thought that their memory model was right. An advocacy paper for weak memory models [adve&gharachorloo95] came from
the same place as Alpha, so it's no surprise that Alpha specifies weak consistency.
@TechReport{adve&gharachorloo95,
author = {Sarita V. Adve and Kourosh Gharachorloo},
title = {Shared Memory Consistency Models: A Tutorial},
institution = {Digital Western Research Lab},
year = {1995},
type = {WRL Research Report},
number = {95/7},
annote = {Gives an overview of architectural features of
shared-memory computers such as independent memory
banks and per-CPU caches, and how they make the (for
programmers) most natural consistency model hard to
implement, giving examples of programs that can fail
with weaker consistency models. It then discusses
several categories of weaker consistency models and
actual consistency models in these categories, and
which ``safety net'' (e.g., memory barrier
instructions) programmers need to use to work around
the deficiencies of these models. While the authors
recognize that programmers find it difficult to use
these safety nets correctly and efficiently, it
still advocates weaker consistency models, claiming
that sequential consistency is too inefficient, by
outlining an inefficient implementation (which is of
course no proof that no efficient implementation
exists). Still the paper is a good introduction to
the issues involved.}
}
- anton
Anybody doing that sort of programming, i.e. lock-free or distributed >algorithms, who can't handle weakly consistent memory models, shouldn't
be doing that sort of programming in the first place.
Strongly consistent memory won't help incompetence.
On 11/15/2024 4:05 PM, Chris M. Thomasson wrote:
On 11/15/2024 12:53 PM, BGB wrote:
On 11/15/2024 11:27 AM, Anton Ertl wrote: [...]
jseigh <jseigh_es00@xemaps.com> writes:
Anybody doing that sort of programming, i.e. lock-free or distributed
algorithms, who can't handle weakly consistent memory models, shouldn't
be doing that sort of programming in the first place.
Do you have any argument that supports this claim.
Strongly consistent memory won't help incompetence.
Strong words to hide lack of arguments?
In my case, as I see it:
The tradeoff is more about implementation cost, performance, etc.
Weak model:
Cheaper (and simpler) to implement;
Performs better when there is no need to synchronize memory;
Performs worse when there is need to synchronize memory;
...
A TSO mapping on a weak memory model is what it is. It should not
necessarily perform "worse" than systems that have TSO as the
default. The weaker models give us flexibility: any weak memory model
should be able to give sequential consistency by using the right
membars in the right places.
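As a hedged sketch of that claim, here is the store-buffering pattern written the "SPARC way": relaxed accesses plus explicit full fences, the C++ analogue of membar #StoreLoad. The seq_cst fences forbid the r1 == 0 && r2 == 0 outcome just as seq_cst accesses would (thread1 and thread2 are illustrative names):

#include <atomic>

std::atomic<int> x{0}, y{0};

int thread1() {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
    return y.load(std::memory_order_relaxed);             // r1
}

int thread2() {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
    return x.load(std::memory_order_relaxed);             // r2
}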
The speed difference is mostly that, in a weak model, the L1 cache
merely needs to fetch memory from the L2 or similar, may write to it whenever, and need not proactively store back results.
As I understand it, a typical TSO-like model will require, say:
Any L1 cache that wants to write to a cache line needs to explicitly
request write ownership over that cache line;
Any attempt by other cores to access this line may require the L2
cache to send a message to the core currently holding the cache line
for writing to write back its contents, with the request unable to be
handled until after the second core has written back the dirty cache
line.
This would create potential for significantly more latency in cases
where multiple cores touch the same part of memory; albeit the cores
will see each others' memory stores.
So, initially, a weak model can be faster due to not needing any
additional handling.
But... Any synchronization points, such as a barrier or locking or
releasing a mutex, will require manually flushing the cache with a weak model.
And, locking/releasing the mutex itself will require a mechanism
that is consistent between cores (such as volatile atomic swaps or
similar, which may still be weak as a volatile-atomic-swap would still
not be atomic from the POV of the L2 cache; and an MMIO interface could
be stronger here).
Seems like there could possibly be some way to skip some of the cache flushing if one could verify that a mutex is only being locked and
unlocked on a single core.
Issue then is how to deal with trying to lock a mutex which has thus far
been exclusive to a single core. One would need some way for the core
that last held the mutex to know that it needs to perform an L1 cache
flush.
Though, one possibility could be to leave this part to the OS scheduler/syscall/...
mechanism; so the core that wants to lock the
mutex signals its intention to do so via the OS, and the next time the
core that last held the mutex does a syscall (or tries to lock the mutex again), the handler sees this, then performs the L1 flush and flags the
mutex as multi-core safe (at which point, the parties will flush L1s at
each mutex lock, though possibly with a timeout count so that, if the
mutex has been single-core for N locks, it reverts to single-core
behavior).
This could reduce the overhead of "frivolous mutex locking" in programs
that are otherwise single-threaded or single processor (leaving the
cache flushes for the ones that are in-fact being used for
synchronization purposes).
....
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 11/18/2024 3:20 PM, Chris M. Thomasson wrote: ...
On 11/17/2024 11:11 PM, Anton Ertl wrote:
The flaw in the reasoning of the paper was:
|To solve it more easily without floating–point von Neumann had
|transformed equation Bx = c to B^TBx = B^Tc , thus unnecessarily
|doubling the number of sig. bits lost to ill-condition
This is an example of how the supposed gains that the
harder-to-use interface provides (in this case the bits "wasted"
on the exponent) are overcompensated by then having to use a
software workaround for the harder-to-use interface.
Don't tell me you want all of std::memory_order_* to default to
std::memory_order_seq_cst? If you're on a system that only has seq_cst
and nothing else, okay, but not on other weaker (memory order)
systems, right?
I tell anyone who wants to read it to stop buying hardware without FP
for non-integer work, and with weak memory ordering for work that
needs concurrent programming. There are enough affordable offerings
with FP and TSO that we do not need to waste programming time and
increase the frequency of hard-to-find bugs by figuring out how to get
good performance out of hardware without FP hardware and with weak
memory ordering.
Those who enjoy the challenge of dealing with the unnecessary problems
of sub-par hardware can continue to enjoy that.
But when developing production software, as a manager don't let
programmers with this hobby horse influence your hardware and
development decisions. Give full support for FP and TSO hardware, and
limited support to weakly-ordered hardware. That limited support may
consist of using software implementations of FP (instead of designing
software for fixed-point arithmetic). In case of hardware with weak
ordering, the limited support could be to use memory barriers liberally
(without trying to minimize them at all; every memory barrier
elimination costs development time and increases the potential for
hard-to-find bugs), to use OS mechanisms for concurrency (rather
than, e.g., lock-free algorithms), or maybe even to support only
single-threaded operation.
Efficiently-implemented sequentially-consistent hardware would be even
more preferable, and if it was widely available, I would recommend
buying that over TSO hardware, but unfortunately we are not there yet.
- anton
BTW, does your stance mean that you are strongly against the A64FX?
Lockless programming is horrendously complicated and error-prone.
Sequential consistency removes only a small part of the potential
complications.
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 11/17/2024 7:17 AM, Anton Ertl wrote:
jseigh <jseigh_es00@xemaps.com> writes:
Or maybe disable reordering or optimization altogether
for those target architectures.
So you want to throw out the baby with the bathwater.
No, keep the weak order systems and not throw them out wrt a system that
is 100% seq_cst? Perhaps? What am I missing here?
Disabling optimization altogether costs a lot; e.g., look at <http://www.complang.tuwien.ac.at/anton/bentley.pdf>: if you compare
the lines for clang-3.5 -O0 with clang-3.5 -O3, you see a factor >2.5
for the tsp9 program. For gcc-5.2.0 the difference is even bigger.
That's why jseigh and people like him (I have read that suggestion
several times before) love to suggest disabling optimization
altogether. It's a straw man that does not even need beating up. Of
course they usually don't show results for the supposed benefits of
the particular "optimization" they advocate (or the drawbacks of
disabling it), and jseigh follows this pattern nicely.
The compiler is allowed to reorder code as long as it knows the
reordering can't be observed or detected. If there are places in the
code where it doesn't know this, it won't optimize across them, more
or less.
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
We could start with something like
critical_region {
...
}
such that the compiler must refrain from any code motion within
those sections but is free to move things outside of those sections as
if execution was singlethreaded.
Stefan
You identify a second problem. Is it that you don't want code motion
across the boundary, or that you do not want code motion within the boundary?
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
We could start with something like
critical_region {
...
}
such that the compiler must refrain from any code motion within
those sections but is free to move things outside of those sections as if execution was singlethreaded.
On Tue, 3 Dec 2024 13:59:18 +0000, jseigh wrote:
The compiler is allowed to reorder code as long as it knows the
reordering can't be observed or detected.
With exceptions enabled, this would allow for almost no code
movement at all.
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
You identify a second problem. Is it that you don't want code motion
across the boundary, or that you do not want code motion within the boundary?
Concurrency is hard. 🙂
Stefan
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 3 Dec 2024 13:59:18 +0000, jseigh wrote:
The compiler is allowed to reorder code as long as it knows the
reordering can't be observed or detected.
With exceptions enabled, this would allow for almost no code
movement at all.
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
C and C++ have the 'volatile' keyword for this purpose.
On Wed, 4 Dec 2024 16:37:41 +0000, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 3 Dec 2024 13:59:18 +0000, jseigh wrote:
The compiler is allowed to reorder code as long as it knows the
reordering can't be observed or detected.
With exceptions enabled, this would allow for almost no code
movement at all.
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
C and C++ have the 'volatile' keyword for this purpose.
What if you want the volatile attribute only to hold
on an inner block:
{
    int i = ...;
    ... // i is not volatile here
    {
        ... // i is volatile in here
    }
    ... // i is not volatile here
    ...
}
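Standard C and C++ have no block-scoped volatile, but the effect can be approximated by doing the inner block's accesses through a volatile reference; how strongly that constrains optimization of a non-volatile object is a quality-of-implementation question. A minimal C++ sketch:

#include <cstdio>

int main() {
    int i = 0;                  // plain object: accesses freely optimizable

    {
        volatile int& vi = i;   // accesses through vi are volatile accesses
        vi = 1;                 // this store will not be cached or elided
        std::printf("%d\n", vi);
    }

    i = 2;                      // back to ordinary, optimizable accesses
    std::printf("%d\n", i);
}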
On 12/4/2024 8:13 AM, jseigh wrote:
On 12/3/24 18:37, Stefan Monnier wrote:
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
We could start with something like
critical_region {
...
}
such that the compiler must refrain from any code motion within
those sections but is free to move things outside of those sections
as if execution was singlethreaded.
C/C++11 already defines what lock acquire/release semantics are.
Roughly you can move stuff outside of a critical section into it
but not vice versa.
Java uses synchronized blocks to denote the critical section.
C++ (the society for using RAII for everything) has scoped_lock
if you want to use RAII for your critical section. It's not
always obvious what the actual critical section is. I usually
use it inside its own bracket section to make it more obvious.
{ std::scoped_lock m(mutex);
// .. critical section
}
I'm not a big fan of C/C++ using acquire and release memory order
directives on everything since, apart from a few situations, it's
not intuitively obvious what they do in all cases. You can
look at compiler assembler output but you have to be real careful
generalizing from what you see.
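For a concrete picture of acquire/release in their one unambiguous role, a minimal spinlock sketch (Spinlock is an illustrative name, not a standard class): the acquire on lock() and the release on unlock() form a "roach motel" -- accesses outside the critical section may move into it, but accesses inside may not move out.

#include <atomic>

class Spinlock {
    std::atomic_flag f = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (f.test_and_set(std::memory_order_acquire)) {
            // spin; a real implementation would back off
        }
    }
    void unlock() {
        f.clear(std::memory_order_release);
    }
};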
The release on the unlock can allow some following stores and things
to sort of "bubble up" before it?
Acquire and release confine things to the "critical section"; the
release can allow for some following things to go above it, so to speak.
This is making me think of Alex over on c.p.t. !
:^)
Did I miss anything? Sorry Joe.
scott@slp53.sl.home (Scott Lurndal) writes:
mitchalsup@aol.com (MitchAlsup1) writes:
On Tue, 3 Dec 2024 13:59:18 +0000, jseigh wrote:
The compiler is allowed to reorder code as long as it knows the
reordering can't be observed or detected.
With exceptions enabled, this would allow for almost no code
movement at all.
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
C and C++ have the 'volatile' keyword for this purpose.
A problem with using volatile is that volatile doesn't do what
most people think it does, especially with respect to what
reordering is or is not allowed.
Tim, did you send me a PM to check my email? I responded but then
silence. Could someone be pretending to be you?
On 12/5/2024 5:00 AM, jseigh wrote:
Maybe. For thread-local, non-shared data, if the compiler can make that
determination; but I don't know if the actual specs say that.
It would be strange to me if the compiler executed a weaker barrier than
what I said needed to be there. If I say I need a #LoadStore |
#StoreStore here, then the compiler better put that barrier in there.
Humm...
C++ doesn't use #LoadStore, etc... memory ordering terminology. They
use acquire, release, cst, relaxed, ... While in some cases it's straightforward as to what that means, in others it's less obvious.
Non-obvious isn't exactly what you want when writing multi-threaded
code. There's enough subtlety as it is.
On 12/17/2024 4:33 AM, jseigh wrote:
On 12/16/24 16:48, Chris M. Thomasson wrote:
On 12/5/2024 5:00 AM, jseigh wrote:
Maybe. For thread-local, non-shared data, if the compiler can make that
determination; but I don't know if the actual specs say that.
It would be strange to me if the compiler executed a weaker barrier
than what I said needed to be there. If I say I need a #LoadStore |
#StoreStore here, then the compiler better put that barrier in there.
Humm...
C++ concurrency was designed by a committee. They try to fit things
into their world view even if reality is a bit more nuanced or complex
than that world view.
Indeed.
C++ doesn't use #LoadStore, etc... memory ordering terminology. They
use acquire, release, cst, relaxed, ... While in some cases it's
straightforward as to what that means, in others it's less obvious.
Non-obvious isn't exactly what you want when writing multi-threaded
code. There's enough subtlety as it is.
Agreed. Humm... The CAS is interesting to me.
atomic_compare_exchange_weak
atomic_compare_exchange_strong
The weak one can fail spuriously... Akin to LL/SC in a sense?
atomic_compare_exchange_weak_explicit
atomic_compare_exchange_strong_explicit
A membar for the success path and one for the failure path. Oh that's
fun. Sometimes I think it's better to use relaxed for all of the atomics
and use explicit barriers ala atomic_thread_fence for the order. Well,
that is more in line with the SPARC way of doing things... ;^)
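A hedged sketch of both points: the retry loop that a spuriously failing weak CAS requires, and the separate success/failure orders of the _explicit form (fetch_increment is an illustrative name):

#include <atomic>

std::atomic<int> counter{0};

// compare_exchange_weak may fail spuriously, much like an LL/SC pair
// whose reservation is lost, so it lives in a retry loop. The
// two-order form takes one memory order for the success path and one
// for the failure path.
int fetch_increment() {
    int expected = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(
               expected, expected + 1,
               std::memory_order_release,      // success ordering
               std::memory_order_relaxed)) {   // failure ordering
        // expected now holds the freshly observed value; retry.
    }
    return expected;
}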
On 12/19/2024 10:33 AM, MitchAlsup1 wrote:
On Thu, 5 Dec 2024 7:44:19 +0000, Chris M. Thomasson wrote:
On 12/4/2024 8:13 AM, jseigh wrote:
On 12/3/24 18:37, Stefan Monnier wrote:
If there are places in the code where it doesn't know this, it won't
optimize across them, more or less.
The problem is HOW to TELL the COMPILER that these memory references
are "more special" than normal--when languages give few mechanisms.
We could start with something like
critical_region {
...
}
such that the compiler must refrain from any code motion within
those sections but is free to move things outside of those sections
as if execution was singlethreaded.
C/C++11 already defines what lock acquire/release semantics are.
Roughly you can move stuff outside of a critical section into it
but not vice versa.
Java uses synchronized blocks to denote the critical section.
C++ (the society for using RAII for everything) has scoped_lock
if you want to use RAII for your critical section. It's not
always obvious what the actual critical section is. I usually
use it inside its own bracket section to make it more obvious.
{ std::scoped_lock m(mutex);
// .. critical section
}
I'm not a big fan of c/c++ using acquire and release memory order
directives on everything since apart from a few situations it's
not intuitively obvious what they do in all cases. You can
look a compiler assembler output but you have to be real careful
generalizing from what you see.
The release on the unlock can allow some following stores and things
to sort of "bubble up" before it?
Acquire and release confine things to the "critical section"; the
release can allow for some following things to go above it, so to speak.
This is making me think of Alex over on c.p.t. !
This sounds dangerous: if the thing allowed to go above it is unCacheable
while the lock:release is cacheable, the cacheable lock can arrive at
another core before the unCacheable store arrives at its destination.
Humm... Need to ponder on that. Wrt SPARC:
membar #LoadStore | #StoreStore
can allow following stores to bubble up before it. If we want to block
that then we would use a #StoreLoad. However, a #StoreLoad is not
required for unlocking a mutex.
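In C++ terms, a sketch of that unlock for a hand-rolled lock (lock_word is an illustrative name); the release fence plays the role of membar #LoadStore | #StoreStore:

#include <atomic>

std::atomic<int> lock_word{1};  // 1 = held, 0 = free

// The fence keeps the critical section's loads and stores from
// sinking below the store that frees the lock; later stores may
// still "bubble up" past it, and no #StoreLoad is needed.
void unlock() {
    std::atomic_thread_fence(std::memory_order_release);
    lock_word.store(0, std::memory_order_relaxed);
}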