Forum: Too Lazy BBS

acquire + sleep + async

From fir@profesor.fir@gmail.com to comp.lang.c on Wed Jun 10 13:53:59 2026

From Newsgroup: comp.lang.c

I'm not experienced with multithreaded programming (I learned a bit of
it 20 years ago, but decided it was not pleasant enough for me to keep
doing it).

I only tend to do things that seem reasonable to me, and I have some doubts.

However, in my small view of things, I can say this:

I.
The sleep() function
(which puts a CPU core to sleep for a given number of milliseconds or
microseconds; by "put to sleep" I mean it just switches the execution
context to another thread for the given time) I FIND REASONABLE
(I see no problem with that).

II.
An async call of a function (a call that spawns a thread, so the main
thread continues while the spawned thread also runs that function,
until it reaches some kind of "async end" when its job is done)
I ALSO FIND reasonable (no problem with that).
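
For illustration, a minimal C sketch of such an async call, assuming POSIX threads are available. The names square_job and async_square are made up for this example. Note that on mainstream operating systems creating a thread goes through the kernel, so it costs more than a couple of opcodes.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical job: the function we want to run "asynchronously". */
static void *square_job(void *arg)
{
    int *value = arg;
    *value = (*value) * (*value);
    return NULL;
}

/* Starts square_job on a new thread, keeps going, then waits for it.
   Returns the squared value, or -1 if the thread could not be created. */
int async_square(int n)
{
    pthread_t worker;
    int value = n;

    if (pthread_create(&worker, NULL, square_job, &value) != 0)
        return -1;

    /* The calling thread continues here while the worker runs;
       it must not touch `value` until the join below. */

    /* Joining plays the role of the "async end": wait for the job. */
    int rc = pthread_join(worker, NULL);
    assert(rc == 0);
    (void)rc;
    return value;
}
```

Calling async_square(7) from main() should give 49 once the worker has been joined.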

III.

There is a third thing... I call it acquire. It is some kind of assembly
instruction, or pair of assembly instructions, with some guarantees.

I mean acquire(x) should work like if(x==0) x=TID (thread id or something
like that), so it is like a conditional move: mov eax, tid; movz x, eax;
(where movz here would mean "mov if x is 0, otherwise set a CPU flag").

And this should be safe against cores clashing on this access, I mean only
one can acquire (I'm not sure if present CPUs have it; they probably should,
because it makes such an operation very cheap).
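
Present CPUs do have this: it is usually called compare-and-swap (on x86 the instruction is LOCK CMPXCHG), and C11 exposes it portably through <stdatomic.h>. A minimal sketch of acquire/release on top of it follows; the lock_word type and the extra tid parameter are assumptions of this example, not part of the original proposal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means "free"; any other value is the id of the current owner. */
typedef atomic_int lock_word;

/* Atomically performs "if (*x == 0) *x = tid".
   Returns true if this caller became the owner. */
bool acquire(lock_word *x, int tid)
{
    assert(tid != 0);   /* 0 is reserved for "free" */
    int expected = 0;
    return atomic_compare_exchange_strong(x, &expected, tid);
}

/* Marks the word as free again; only the owner should call this. */
void release(lock_word *x)
{
    atomic_store(x, 0);
}
```

Only one of several threads racing on the same lock_word can see acquire() return true; the others get false and have to retry or do something else.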

NOTE: I'm maybe even less sure about this acquire, because theoretically
such acquire-less (and maybe even mostly sleep-less) programming is
probably possible, but I'm not sure it wouldn't be handy too.

THOSE 3 "PRIMITIVES" are, IMO, quite A SET for doing multithreaded
programming "by hand". I also think they should be very lightweight:
an async call should probably be very light (and probably it is; I hope
it is a matter of a line of assembly or two)... same with sleep (as in
fact sleep and async call are close things).

So I would hope they would be just a set of a few assembly opcodes
(unless I have overlooked something and something heavier is needed).

The conclusion is what I already said: it seems that those 3 things
(primitives) allow doing a lot of multithreading without bloated
libraries, I hope.

Correct me if I'm wrong.

(Also, this is not to say they are the best way of doing multithreading,
but if they are lightweight, it is at least minimally reasonable in some
way, it seems.)

--- Synchronet 3.22a-Linux NewsLink 1.2

From fir@profesor.fir@gmail.com to comp.lang.c on Wed Jun 10 16:03:28 2026

From Newsgroup: comp.lang.c

fir wrote:

[...]

BTW, if someone would like to implement a common pattern

if(acquire(x)) //or acquire(x) {} else {}
{
    //do something
    release(x);
}
else
{
    sleep(0.01);
    continue;
}

then a need for "continue" appears, which I find rather okay.
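
In today's C the same pattern can be sketched with an ordinary loop standing in for the proposed "continue". This is only an illustration under assumptions: POSIX threads and nanosleep are available, the pause is taken to be 1 ms, and the names worker, run_workers and shared_counter are invented here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int lock_word;   /* 0 = free, otherwise the owner's id */
static long shared_counter;    /* only touched while lock_word is held */

static void *worker(void *arg)
{
    int tid = *(int *)arg;
    const struct timespec pause = { 0, 1000 * 1000 };  /* 1 ms, assumed */

    for (int done = 0; done < 1000; ) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&lock_word, &expected, tid)) {
            shared_counter++;              /* "do something" */
            atomic_store(&lock_word, 0);   /* release(x) */
            done++;
        } else {
            nanosleep(&pause, NULL);       /* sleep, then try again */
        }
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long run_workers(void)
{
    pthread_t threads[2];
    int ids[2] = { 1, 2 };

    shared_counter = 0;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

Because every increment happens between a successful compare-and-swap and the releasing store, run_workers() should return 2000.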

IMO this

if(x<10)
{
    x++; continue;
}

probably looks better than while()
(though the word "continue" is maybe not the most fortunate; it should be
more like "repeat").
(Though it all should be rethought; maybe there is an opportunity for a
better form of loop to be born here.)

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Jun 11 16:44:46 2026

From Newsgroup: comp.lang.c

On 11.06.2026 at 13:39, David Brown wrote:

There are stackless coroutine libraries for C that work entirely from
the pre-processor, with similar limitations to the C++ coroutines (like
not being able to yield from normal functions called from the
coroutine). There are also stackful coroutine libraries, which are more
flexible but have higher overhead.
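
As a rough idea of what such a preprocessor-only stackless coroutine looks like, here is a sketch built on the well-known switch/__LINE__ trick. The CO_* macro names and the counter_co struct are hypothetical and do not come from any particular library.

```c
#include <assert.h>

/* Hypothetical macros: the coroutine body is one big switch statement,
   and each yield point becomes a case label to jump back to. */
#define CO_BEGIN(co)     switch ((co)->resume_at) { case 0:
#define CO_YIELD(co, v)  do { (co)->resume_at = __LINE__; return (v); \
                              case __LINE__:; } while (0)
#define CO_END(co)       } (co)->resume_at = -1; return -1

struct counter_co {
    int resume_at;  /* where to continue on the next call; 0 = start */
    int i;          /* a "local" that must survive across yields */
    int limit;
};

/* Yields 0, 1, ..., limit-1 on successive calls, then -1 forever. */
int counter_next(struct counter_co *co)
{
    assert(co->limit >= 0);
    CO_BEGIN(co);
    for (co->i = 0; co->i < co->limit; co->i++)
        CO_YIELD(co, co->i);
    CO_END(co);
}
```

With `struct counter_co co = { 0, 0, 3 };` successive calls to counter_next(&co) give 0, 1, 2 and then -1. The loop variable has to live in the struct because real locals are lost on every return, and CO_YIELD only works directly inside counter_next, not in a function it calls.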

Yes, ... from the pre-processor; how comfortable is debugging with
that? Forget it.

C coroutine solutions require more manual coding than C++ for tracking
local variables that must be preserved between calls/yields, but on the other hand they don't require the dog's breakfast of boilerplate classes
and code that C++ coroutines need.

In the end it's less work in C++ and the debugging is comfortable at
least with VC++, i.e. you can inspect the coroutine frame when the
coroutine is not running (CLion can't do that).

It is still an astoundingly ignorant claim. If you had said C had not
changed much since 1999, it would be reasonable - though C23 has quite a
number of new features. And even then it would only be in reference to
the C standards - the C ecosystem has changed enormously.

It has hardly changed if you compare that to the evolution of C++.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 14:18:09 2026

From Newsgroup: comp.lang.c

On 6/10/2026 9:40 PM, Lawrence D’Oliveiro wrote:

On Wed, 10 Jun 2026 13:53:59 +0200, fir wrote:

im not experienced with multithreading programing ...

It does tend to be error-prone.

That’s why it is best reserved for CPU-intensive tasks that can
benefit from running a bunch of cores at once.

For cases where the bottleneck is in the I/O or the network connection
(which is a lot of them), threading is typically unnecessary. Instead,
the popular approach nowadays is to use coroutines.

Not sure why you say that. Threads work perfectly fine with I/O, network
connections, etc. Do you think sync between processes is easier?

<https://developer.mozilla.org/en-US/docs/Learn_web_development/Extensions/Async_JS/Introducing>
<https://docs.python.org/3/library/asyncio.html>

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 15:56:20 2026

From Newsgroup: comp.lang.c

On 6/11/2026 12:45 AM, Lawrence D’Oliveiro wrote:

On Thu, 11 Jun 2026 09:21:13 +0200, Bonita Montero wrote:

C hasn't been improved much since 1973. You still stick with the
same lowlevel view.

Actually, just as you can use threading APIs with C, you can use
stackful coroutine libraries as well.

Stackless ones are another matter.
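
One way to see what a stackful coroutine library does underneath is the POSIX ucontext API (obsolescent in the standard, but still shipped by glibc). The sketch below is an assumption-laden illustration, not any specific library: the names gen_start, gen_next and yield_value are invented, and the 64 KiB stack size is arbitrary.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, gen_ctx;
static char gen_stack[64 * 1024];  /* the coroutine's private stack */
static int yielded;                /* value handed back to the caller */
static int finished = 1;           /* nothing to resume until gen_start() */

/* An ordinary helper function that suspends the coroutine. */
static void yield_value(int v)
{
    yielded = v;
    swapcontext(&gen_ctx, &main_ctx);  /* save our state, resume caller */
}

static void generator(void)
{
    for (int i = 1; i <= 3; i++)
        yield_value(i * 10);
    finished = 1;  /* returning switches to uc_link, i.e. main_ctx */
}

void gen_start(void)
{
    int rc = getcontext(&gen_ctx);
    assert(rc == 0);
    (void)rc;
    gen_ctx.uc_stack.ss_sp = gen_stack;
    gen_ctx.uc_stack.ss_size = sizeof gen_stack;
    gen_ctx.uc_link = &main_ctx;
    makecontext(&gen_ctx, generator, 0);
    finished = 0;
}

/* Resumes the coroutine; returns its next value, or -1 once it is done. */
int gen_next(void)
{
    if (finished)
        return -1;
    swapcontext(&main_ctx, &gen_ctx);
    return finished ? -1 : yielded;
}
```

After gen_start(), calls to gen_next() give 10, 20, 30 and then -1. The loop counter is a plain local and the yield happens inside a helper function, which is exactly what the separate stack buys over the stackless approach.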

Want to go to the fiber level? I have been there done that. Fibers
riding threads... Fine, but you have to know what you are doing.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 16:01:47 2026

From Newsgroup: comp.lang.c

On 6/11/2026 1:44 AM, fir wrote:

Bonita Montero wrote:

On 11.06.2026 at 09:52, David Brown wrote:

There are no coroutines in the C standard. There are a wide variety
of coroutine implementations in C, with different mixtures of
features, limitations, efficiencies, portability and requirements. ...

Real coroutines aren't possible with C since with native coroutines you
need functions that can be suspended and resumed.

C lacks some low-level core management, IMO; it is more visible in
multicore times. But even in old times there were things like
interrupts, which allowed, theoretically and probably also practically,
freezing a branch and resuming it. So some mechanics of multithreading
existed even on single-core machines; I remember it on the 8-bit C64.

C hasn't been improved much since 1973.

What an astoundingly ignorant claim.

Compared to C++ that's true.

Wrt per-core, well, in user mode we can try to use affinity masks for
that. But! It's not guaranteed. So, per thread is pretty low level, per
fibers on threads is a, well, kind of a different story.
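
A sketch of the affinity-mask approach, assuming Linux with glibc: pthread_getaffinity_np and pthread_setaffinity_np are non-portable extensions (hence the _np suffix), and the function name pin_to_first_allowed_cpu is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Pins the calling thread to the lowest-numbered CPU it is already
   allowed to run on. Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;

    if (pthread_getaffinity_np(pthread_self(), sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            assert(CPU_COUNT(&one) == 1);  /* mask names exactly one CPU */
            if (pthread_setaffinity_np(pthread_self(), sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;  /* empty mask: should not happen for a running thread */
}
```

The mask restricts where the scheduler may place the thread. It does not reserve the core, so other threads can still be scheduled there.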

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Jun 12 00:12:48 2026

From Newsgroup: comp.lang.c

On Thu, 11 Jun 2026 14:18:09 -0700, Chris M. Thomasson wrote:

On 6/10/2026 9:40 PM, Lawrence D’Oliveiro wrote:

For cases where the bottleneck is in the I/O or the network
connection (which is a lot of them), threading is typically
unnecessary. Instead, the popular approach nowadays is to use
coroutines.

Not sure why you say that.

It does tend to be error-prone.

That’s why it is best reserved for CPU-intensive tasks that can
benefit from running a bunch of cores at once.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Jun 12 00:13:38 2026

From Newsgroup: comp.lang.c

On Thu, 11 Jun 2026 11:38:23 +0200, Bonita Montero wrote:

Coroutines are the most elegant way to handle finite state machines.
They need explicit language support and can't be handled exclusively
with libraries.

Stackful coroutines ... I don’t see why not.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Jun 12 00:14:08 2026

From Newsgroup: comp.lang.c

On Thu, 11 Jun 2026 10:28:55 +0200, Bonita Montero wrote:

Real coroutines aren't possible with C since with native coroutines
you need functions that can be suspended and resumed.

Just think of them as non-preemptive threads.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Jun 12 01:32:35 2026

From Newsgroup: comp.lang.c

On Thu, 11 Jun 2026 11:36:12 +0200, Bonita Montero wrote:

Coroutines have nothing to do with multithreaded programming, but
they can be used to have sth. more lightweight than threading.

I don’t know what’s “heavyweight” about threading, beyond the stack
allocations. You have exactly the same thing in (stackful) coroutines.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Jun 12 01:33:39 2026

From Newsgroup: comp.lang.c

On Thu, 11 Jun 2026 16:01:47 -0700, Chris M. Thomasson wrote:

... per fibers on threads is a, well, kind of a different story.

Are these “fibre” things just some kind of runtime abstraction built
on top of OS threads?

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 21:05:29 2026

From Newsgroup: comp.lang.c

On 6/11/2026 6:33 PM, Lawrence D’Oliveiro wrote:

On Thu, 11 Jun 2026 16:01:47 -0700, Chris M. Thomasson wrote:

... per fibers on threads is a, well, kind of a different story.

Are these “fibre” things just some kind of runtime abstraction built
on top of OS threads?

Basically. You need to make your own scheduler for the fiber on the
threads. Fwiw:

https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createfiberex

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 21:06:08 2026

From Newsgroup: comp.lang.c

On 6/11/2026 5:12 PM, Lawrence D’Oliveiro wrote:

On Thu, 11 Jun 2026 14:18:09 -0700, Chris M. Thomasson wrote:

On 6/10/2026 9:40 PM, Lawrence D’Oliveiro wrote:

For cases where the bottleneck is in the I/O or the network
connection (which is a lot of them), threading is typically
unnecessary. Instead, the popular approach nowadays is to use
coroutines.

Not sure why you say that.

It does tend to be error-prone.

Well, shit happens. :^)

That’s why it is best reserved for CPU-intensive tasks that can
benefit from running a bunch of cores at once.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 21:11:13 2026

From Newsgroup: comp.lang.c

On 6/11/2026 4:39 AM, David Brown wrote:
[...]

C coroutine solutions require more manual coding than C++ for tracking
local variables that must be preserved between calls/yields, but on the other hand they don't require the dog's breakfast of boilerplate classes
and code that C++ coroutines need.

[...]

FWIW, check this shit out. Coroutines can be emulated even in BASIC.
This is a recursive stack here. It's all manual... :^)

100 REM ct_vfield_applesoft_basic
110 HOME
120 HGR: HCOLOR = 3: VTAB 22
130 PRINT "ct_vfield_applesoft_basic"
140 GOSUB 1000
150 GOSUB 3000
160 SP = 0
170 RS(SP, 0) = 0
180 RS(SP, 1) = -1
190 RS(SP, 2) = 0
200 RS(SP, 3) = 1
210 RS(SP, 4) = 0
220 GOSUB 8000
230 V1(1) = 0: V1(2) = 0: V1(3) = 1: V1(4) = 128
240 GOSUB 6000
245 PRINT "Chris Thomasson's Koch Complete!"
250 END

1000 REM ct_init
1010 PRINT "ct_init"
1020 DIM A0(6)
1030 DIM V0(4)
1040 DIM V1(4)
1050 DIM V2(4)
1060 DIM V3(4)
1070 DIM V4(4)
1080 DIM V5(4)
1090 RN = 3
1100 DIM RS(RN, 16)
1110 GOSUB 2000
1120 RETURN

2000 REM ct_init_plane
2010 PRINT "ct_init_plane"
2020 A0(1) = 279: REM m_plane.m_width
2030 A0(2) = 191: REM m_plane.m_height
2040 A0(3) = 0.0126106: REM m_plane.m_xstep
2050 A0(4) = 0.0126316: REM m_plane.m_ystep
2060 A0(5) = -1.75288: REM m_plane.m_axes.m_xmin
2070 A0(6) = 1.2: REM m_plane.m_axes.m_ymax
2080 RETURN

3000 REM ct_display_plane
3010 PRINT "ct_display_plane"
3020 FOR I0 = 1 TO 6
3030 PRINT "A0("; I0; ") = " A0(I0)
3040 NEXT I0
3050 RETURN

4000 REM ct_project_point
4010 REM PRINT "ct_project_point"
4020 V0(3) = (V0(1) - A0(5)) / A0(3)
4030 V0(4) = (A0(6) - V0(2)) / A0(4)
4040 IF V0(3) < 0 THEN V0(3) = INT(V0(3) - .5)
4050 IF V0(3) >= 0 THEN V0(3) = INT(V0(3) + .5)
4060 IF V0(4) < 0 THEN V0(4) = INT(V0(4) - .5)
4070 IF V0(4) >= 0 THEN V0(4) = INT(V0(4) + .5)
4080 RETURN

5000 REM ct_plot_point
5010 REM PRINT "ct_plot_point"
5020 GOSUB 4000
5030 IF V0(3) > -1 AND V0(3) <= A0(1) AND V0(4) > -1 AND V0(4) <= A0(2) THEN HPLOT V0(3), V0(4)
5040 RETURN

6000 REM ct_plot_circle
6010 PRINT "ct_plot_circle"
6020 AB = 6.28318 / V1(4)
6030 FOR I1 = 0 TO 6.28318 STEP AB
6040 V0(1) = V1(1) + COS(I1) * V1(3)
6050 V0(2) = V1(2) + SIN(I1) * V1(3)
6060 GOSUB 5000
6070 NEXT I1
6080 RETURN

7000 REM ct_plot_line
7010 PRINT "ct_plot_line"
7020 V0(1) = V5(1): V0(2) = V5(2)
7030 GOSUB 4000
7040 IF V0(3) < 0 THEN V0(3) = 0
7050 IF V0(3) > A0(1) THEN V0(3) = A0(1)
7060 IF V0(4) < 0 THEN V0(4) = 0
7070 IF V0(4) > A0(2) THEN V0(4) = A0(2)
7080 HPLOT V0(3), V0(4)
7090 V0(1) = V5(3): V0(2) = V5(4)
7100 GOSUB 4000
7110 IF V0(3) < 0 THEN V0(3) = 0
7120 IF V0(3) > A0(1) THEN V0(3) = A0(1)
7130 IF V0(4) < 0 THEN V0(4) = 0
7140 IF V0(4) > A0(2) THEN V0(4) = A0(2)
7150 HPLOT TO V0(3), V0(4)
7160 RETURN

8000 REM ct_koch
8010 IF RS(SP, 0) >= RN THEN RETURN
8020 PRINT "ct_koch = "; RS(SP, 0); " "; RS(SP, 1); " "; RS(SP, 2); " "; RS(SP, 3); " "; RS(SP, 4)
8030 RS(SP, 5) = RS(SP, 3) - RS(SP, 1) : REM difx
8040 RS(SP, 6) = RS(SP, 4) - RS(SP, 2) : REM dify
8050 RS(SP, 7) = RS(SP, 1) + RS(SP, 5) / 2 : REM midx
8060 RS(SP, 8) = RS(SP, 2) + RS(SP, 6) / 2 : REM midy
8070 RS(SP, 9) = -RS(SP, 6) : REM perpx
8080 RS(SP, 10) = RS(SP, 5) : REM perpy
8090 RS(SP, 11) = RS(SP, 7) + RS(SP, 9) / 3 : REM tipx
8100 RS(SP, 12) = RS(SP, 8) + RS(SP, 10) / 3 : REM tipy
8110 RS(SP, 13) = RS(SP, 1) + RS(SP, 5) / 3 : REM k0x
8120 RS(SP, 14) = RS(SP, 2) + RS(SP, 6) / 3 : REM k0y
8130 RS(SP, 15) = RS(SP, 3) - RS(SP, 5) / 3 : REM k1x
8140 RS(SP, 16) = RS(SP, 4) - RS(SP, 6) / 3 : REM k1y

8145 IF RS(SP, 0) < RN - 1 GOTO 8230
8150 V5(1) = RS(SP, 1): V5(2) = RS(SP, 2): V5(3) = RS(SP, 13): V5(4) = RS(SP, 14)
8160 GOSUB 7000
8170 V5(1) = RS(SP, 13): V5(2) = RS(SP, 14): V5(3) = RS(SP, 11): V5(4) = RS(SP, 12)
8180 GOSUB 7000
8190 V5(1) = RS(SP, 11): V5(2) = RS(SP, 12): V5(3) = RS(SP, 15): V5(4) = RS(SP, 16)
8200 GOSUB 7000
8210 V5(1) = RS(SP, 15): V5(2) = RS(SP, 16): V5(3) = RS(SP, 3): V5(4) = RS(SP, 4)
8220 GOSUB 7000

8230 REM line 0
8240 SP = SP + 1
8250 RS(SP, 0) = RS(SP - 1, 0) + 1
8260 RS(SP, 1) = RS(SP - 1, 1)
8270 RS(SP, 2) = RS(SP - 1, 2)
8280 RS(SP, 3) = RS(SP - 1, 13)
8290 RS(SP, 4) = RS(SP - 1, 14)
8300 GOSUB 8000
8310 SP = SP - 1
8320 REM line 1
8330 SP = SP + 1
8340 RS(SP, 0) = RS(SP - 1, 0) + 1
8350 RS(SP, 1) = RS(SP - 1, 13)
8360 RS(SP, 2) = RS(SP - 1, 14)
8370 RS(SP, 3) = RS(SP - 1, 11)
8380 RS(SP, 4) = RS(SP - 1, 12)
8390 GOSUB 8000
8400 SP = SP - 1
8410 REM line 2
8420 SP = SP + 1
8430 RS(SP, 0) = RS(SP - 1, 0) + 1
8440 RS(SP, 1) = RS(SP - 1, 11)
8450 RS(SP, 2) = RS(SP - 1, 12)
8460 RS(SP, 3) = RS(SP - 1, 15)
8470 RS(SP, 4) = RS(SP - 1, 16)
8480 GOSUB 8000
8490 SP = SP - 1
8500 REM line 3
8510 SP = SP + 1
8520 RS(SP, 0) = RS(SP - 1, 0) + 1
8530 RS(SP, 1) = RS(SP - 1, 15)
8540 RS(SP, 2) = RS(SP - 1, 16)
8550 RS(SP, 3) = RS(SP - 1, 3)
8560 RS(SP, 4) = RS(SP - 1, 4)
8570 GOSUB 8000
8580 SP = SP - 1
8590 RETURN

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 21:16:48 2026

From Newsgroup: comp.lang.c

On 6/11/2026 5:14 PM, Lawrence D’Oliveiro wrote:

On Thu, 11 Jun 2026 10:28:55 +0200, Bonita Montero wrote:

Real coroutines aren't possible with C since with native coroutines
you need functions that can be suspended and resumed.

Just think of them as non-preemptive threads.

Windows has a fairly nice way to get at them, that is if you want them
at all...

https://learn.microsoft.com/en-us/windows/win32/procthread/fibers

https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createfiberex

https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-convertthreadtofiberex

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Jun 12 06:16:42 2026

From Newsgroup: comp.lang.c

On 12.06.2026 at 03:32, Lawrence D’Oliveiro wrote:

I don’t know what’s “heavyweight” about threading, beyond the stack
allocations. ...

Yes, the allocation of the stack is very expensive. And the
context switch between threads is orders of magnitude more expensive
than switching between two coroutine contexts.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 11 21:19:45 2026

From Newsgroup: comp.lang.c

On 6/11/2026 9:16 PM, Bonita Montero wrote:

On 12.06.2026 at 03:32, Lawrence D’Oliveiro wrote:

I don’t know what’s “heavyweight” about threading, beyond the stack
allocations. ...

Yes, the allocation of the stack is very expensive. And the
context switch between threads is orders of magnitude more expensive
than switching between two coroutine contexts.

The fibers float along the threads... ;^)

Anyway, it's been a while since I used them. IIRC it was for a sorting algo.

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Jun 12 13:15:59 2026

From Newsgroup: comp.lang.c

On 12.06.2026 at 06:19, Chris M. Thomasson wrote:

The fibers float along the threads... ;^)

If you have as many fibres as you otherwise would have coroutines
and the number is large, you waste a lot of memory.
The only difference is that with fibers you can switch the context
from any function.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Jun 12 23:52:32 2026

From Newsgroup: comp.lang.c

On Fri, 12 Jun 2026 06:16:42 +0200, Bonita Montero wrote:

On 12.06.2026 at 03:32, Lawrence D’Oliveiro wrote:

I don’t know what’s “heavyweight” about threading, beyond the stack
allocations. ...

Yes, the allocation of the stack is very expensive. And the
context switch between threads is orders of magnitude more expensive than
switching between two coroutine contexts.

Why should that be? What extra overhead is there in context-switching
between preemptive threads, versus non-preemptive ones?

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Jun 12 23:54:57 2026

From Newsgroup: comp.lang.c

On Thu, 11 Jun 2026 21:05:29 -0700, Chris M. Thomasson wrote:

On 6/11/2026 6:33 PM, Lawrence D’Oliveiro wrote:

Are these “fibre” things just some kind of runtime abstraction built
on top of OS threads?

Basically. You need to make your own scheduler for the fiber on the
threads.

Been there, done that. Actually, the OS concerned (VAX/VMS) didn’t
even support threads at the time: but it had asynchronous I/O and
timer completion routines, and I was able to build a fully-preemptive
threading model on top of that.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Jun 13 00:01:21 2026

From Newsgroup: comp.lang.c

On Thu, 11 Jun 2026 21:11:13 -0700, Chris M. Thomasson wrote:

FWIW, check this shit out. Coroutines can be emulated even in BASIC.
This is a recursive stack here. Its all manual... :^)

[code omitted]

I see a lot of “GOSUB 8000” within the subroutine starting at line
8000, so that’s a lot of recursion, not coroutines. Plus you’ve got
the explicit RS array stack for assembling the components of the
curve. Normally you would have either recursion or an explicit
stack, since each one subsumes the functions of the other, so there
shouldn’t be a need for both.

In other words, I think you’ve got the worst of both worlds? Recursion
that isn’t enough to solve the problem purely recursively, plus an
explicit stack that isn’t enough to keep track of the entire stack
state?
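
To make the "explicit stack only" alternative concrete, here is a hedged C sketch that subdivides a Koch segment with the same geometry as the BASIC listing (thirds plus a perpendicular tip) but without any recursive call. The names koch_leaf_count, struct segment and KOCH_MAX_DEPTH are invented for this example, and it counts leaf segments instead of drawing them.

```c
#include <assert.h>

#define KOCH_MAX_DEPTH 8

struct segment { double x0, y0, x1, y1; int depth; };

/* Splits the segment (-1,0)-(1,0) Koch-style down to max_depth using
   only an explicit stack, and returns how many leaf segments result. */
int koch_leaf_count(int max_depth)
{
    assert(max_depth >= 0 && max_depth <= KOCH_MAX_DEPTH);

    /* Depth-first order leaves at most 3 pending siblings per level. */
    struct segment stack[3 * KOCH_MAX_DEPTH + 1];
    int sp = 0, leaves = 0;

    stack[sp++] = (struct segment){ -1.0, 0.0, 1.0, 0.0, 0 };
    while (sp > 0) {
        struct segment s = stack[--sp];
        if (s.depth == max_depth) {
            leaves++;           /* a real program would draw s here */
            continue;
        }
        double dx = s.x1 - s.x0, dy = s.y1 - s.y0;
        double k0x = s.x0 + dx / 3, k0y = s.y0 + dy / 3;
        double k1x = s.x1 - dx / 3, k1y = s.y1 - dy / 3;
        double tipx = s.x0 + dx / 2 - dy / 3;  /* midpoint + perp / 3 */
        double tipy = s.y0 + dy / 2 + dx / 3;
        int d = s.depth + 1;
        /* Push in reverse so the first piece is processed first. */
        stack[sp++] = (struct segment){ k1x, k1y, s.x1, s.y1, d };
        stack[sp++] = (struct segment){ tipx, tipy, k1x, k1y, d };
        stack[sp++] = (struct segment){ k0x, k0y, tipx, tipy, d };
        stack[sp++] = (struct segment){ s.x0, s.y0, k0x, k0y, d };
    }
    return leaves;
}
```

Each level replaces one segment by four, so koch_leaf_count(3) should return 64.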

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Jun 13 05:05:23 2026

From Newsgroup: comp.lang.c

On 13.06.2026 at 01:52, Lawrence D’Oliveiro wrote:

On Fri, 12 Jun 2026 06:16:42 +0200, Bonita Montero wrote:

Yes, the allocation of the stack is very expensive. And the
context switch between threads is orders of magnitude more expensive than
switching between two coroutine contexts.

Why should that be? What extra overhead is there in context-switching
between preemptive threads, versus non-preemptive ones?

Because with threading the context-switch happens inside the kernel.

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sat Jun 13 14:52:31 2026

From Newsgroup: comp.lang.c

Bonita Montero <Bonita.Montero@gmail.com> writes:

On 13.06.2026 at 01:52, Lawrence D’Oliveiro wrote:

On Fri, 12 Jun 2026 06:16:42 +0200, Bonita Montero wrote:

Yes, the allocation of the stack is very expensive. And the
context switch between threads is orders of magnitude more expensive than
switching between two coroutine contexts.

Why should that be? What extra overhead is there in context-switching
between preemptive threads, versus non-preemptive ones?

Because with threading the context-switch happens inside the kernel.

Some implementations of threading context-switch in the kernel. That's
not true of all implementations.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Sat Jun 13 13:46:55 2026

From Newsgroup: comp.lang.c

On 6/12/2026 5:01 PM, Lawrence D’Oliveiro wrote:

On Thu, 11 Jun 2026 21:11:13 -0700, Chris M. Thomasson wrote:

FWIW, check this shit out. Coroutines can be emulated even in BASIC.
This is a recursive stack here. Its all manual... :^)

[code omitted]

I see a lot of “GOSUB 8000” within the subroutine starting at line
8000, so that’s a lot of recursion, not coroutines. Plus you’ve got
the explicit RS array stack for assembling the components of the
curve. Normally you would have either recursion or an explicit
stack, since each one subsumes the functions of the other, so there
shouldn’t be a need for both.

In other words, I think you’ve got the worst of both worlds? Recursion
that isn’t enough to solve the problem purely recursively, plus an
explicit stack that isn’t enough to keep track of the entire stack
state?

It's a recursive Koch fractal using Applesoft BASIC. Run it on an
emulator. Say, https://www.calormen.com/jsbasic/ ?

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jun 14 00:32:40 2026

From Newsgroup: comp.lang.c

On Sat, 13 Jun 2026 13:46:55 -0700, Chris M. Thomasson wrote:

It's a recursive Koch fractal using Applesoft BASIC.

Yes, that was pretty clear. As was the fact that you were able to get
it working, not *because of* your choice of BASIC to write it in, but
*in spite of* that.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Jun 16 00:12:43 2026

From Newsgroup: comp.lang.c

On Sat, 13 Jun 2026 05:05:23 +0200, Bonita Montero wrote:

On 13.06.2026 at 01:52, Lawrence D’Oliveiro wrote:

On Fri, 12 Jun 2026 06:16:42 +0200, Bonita Montero wrote:

Yes, the allocation of the stack is very expensive. And the
context switch between threads is orders of magnitude more expensive than
switching between two coroutine contexts.

Why should that be? What extra overhead is there in
context-switching between preemptive threads, versus non-preemptive
ones?

Because with threading the context-switch happens inside the kernel.

Microsoft Windows problems again?

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Tue Jun 16 12:22:24 2026

From Newsgroup: comp.lang.c

On 6/15/2026 5:12 PM, Lawrence D’Oliveiro wrote:

On Sat, 13 Jun 2026 05:05:23 +0200, Bonita Montero wrote:

On 13.06.2026 at 01:52, Lawrence D’Oliveiro wrote:

On Fri, 12 Jun 2026 06:16:42 +0200, Bonita Montero wrote:

Yes, the allocation of the stack is very expensive. And the
context switch between threads is orders of magnitude more expensive than
switching between two coroutine contexts.

Why should that be? What extra overhead is there in
context-switching between preemptive threads, versus non-preemptive
ones?

Because with threading the context-switch happens inside the kernel.

Microsoft Windows problems again?

Microsoft Windows problems? What do you mean? Preemptive threads, say
POSIX threads, are going to have the same issues. Right?

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Tue Jun 16 12:23:32 2026

From Newsgroup: comp.lang.c

On 6/13/2026 5:32 PM, Lawrence D’Oliveiro wrote:

On Sat, 13 Jun 2026 13:46:55 -0700, Chris M. Thomasson wrote:

It's a recursive Koch fractal using Applesoft BASIC.

Yes, that was pretty clear. As was the fact that you were able to get
it working, not *because of* your choice of BASIC to write it in, but
*in spite of* that.

It has current stack space. So, with a little work it should be workable
for continuations.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Tue Jun 16 12:25:12 2026

From Newsgroup: comp.lang.c

On 6/11/2026 9:06 PM, Chris M. Thomasson wrote:

On 6/11/2026 5:12 PM, Lawrence D’Oliveiro wrote:

On Thu, 11 Jun 2026 14:18:09 -0700, Chris M. Thomasson wrote:

On 6/10/2026 9:40 PM, Lawrence D’Oliveiro wrote:

For cases where the bottleneck is in the I/O or the network
connection (which is a lot of them), threading is typically
unnecessary. Instead, the popular approach nowadays is to use
coroutines.

Not sure why you say that.

It does tend to be error-prone.

Well, shit happens. :^)

Error prone... Well, C is not the lang that takes the corks off the forks:

(Dirty Rotten Scoundrels (1988) - Dinner With Ruprecht Scene (6/12) | Movieclips)

https://youtu.be/SKDX-qJaJ08

rofl!

That’s why it is best reserved for CPU-intensive tasks that can
benefit from running a bunch of cores at once.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Wed Jun 17 03:07:28 2026

From Newsgroup: comp.lang.c

On Tue, 16 Jun 2026 12:23:32 -0700, Chris M. Thomasson wrote:

On 6/13/2026 5:32 PM, Lawrence D’Oliveiro wrote:

On Sat, 13 Jun 2026 13:46:55 -0700, Chris M. Thomasson wrote:

It's a recursive Koch fractal using Applesoft BASIC.

Yes, that was pretty clear. As was the fact that you were able to
get it working, not *because of* your choice of BASIC to write it
in, but *in spite of* that.

It has current stack space.

But being BASIC, it only has fixed-length arrays to use as the stack,
doesn’t it?

So, with a little work it should be workable for continuations.

I added continuations to my toy PostScript-revival language. I soon
discovered that having a dedicated stack area was a bad idea. So what
happens is call frames are allocated on the heap, and chained together
in various ways: for transferring control for a return, exit, yield or
stop. An instance of the Continuation class keeps a copy of the
CallFrame object that was current when it was created, and simply
makes that current again when it is invoked.

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Wed Jun 17 03:09:04 2026

From Newsgroup: comp.lang.c

On Tue, 16 Jun 2026 12:22:24 -0700, Chris M. Thomasson wrote:

Microsoft Windows problems? What do you mean? preemptive threads say
POSIX threads are going to have the same issues. Right?

Well, when talking to a Windows programmer, and they say something you
know doesn’t sound right, it seems reasonable to conclude that it
comes from their Windows-specific experience.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Wed Jun 17 20:29:23 2026

From Newsgroup: comp.lang.c

On 6/16/2026 8:09 PM, Lawrence D’Oliveiro wrote:

On Tue, 16 Jun 2026 12:22:24 -0700, Chris M. Thomasson wrote:

Microsoft Windows problems? What do you mean? preemptive threads say
POSIX threads are going to have the same issues. Right?

Well, when talking to a Windows programmer, and they say something you
know doesn’t sound right, it seems reasonable to conclude that it
comes from their Windows-specific experience.

POSIX threads are preemptive and subject to the same scheduling
concerns... Right?

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Wed Jun 17 20:30:46 2026

From Newsgroup: comp.lang.c

On 6/17/2026 8:29 PM, Chris M. Thomasson wrote:

On 6/16/2026 8:09 PM, Lawrence D’Oliveiro wrote:

On Tue, 16 Jun 2026 12:22:24 -0700, Chris M. Thomasson wrote:

Microsoft Windows problems? What do you mean? preemptive threads say
POSIX threads are going to have the same issues. Right?

Well, when talking to a Windows programmer, and they say something you
know doesn’t sound right, it seems reasonable to conclude that it
comes from their Windows-specific experience.

POSIX threads are preemptive and subject to the same scheduling
concerns... Right?

If preemption were a rCLWindows thing,rCY then POSIX mutexes, atomics, and memory ordering rules wouldnrCOt need to exist... ? But they do... because
the same hazards exist on both sides?
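
A small sketch of the hazard in question, written in portable C++ so that it behaves the same on Windows and POSIX. The function name count_with_two_threads is made up for illustration. With a plain integer the two threads would race, since either can be preempted between its load and its store; the atomic read-modify-write is what makes every increment count.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper: two threads each add perThread increments to one counter.
inline std::uint64_t count_with_two_threads(std::uint64_t perThread)
{
    std::atomic<std::uint64_t> counter{0};
    auto work = [&] {
        for (std::uint64_t i = 0; i < perThread; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);  // indivisible increment
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    std::uint64_t total = counter.load(std::memory_order_relaxed);
    assert(total == 2 * perThread);  // no increment was lost
    return total;
}
```

Relaxed ordering is enough here only because nothing else is published through the counter; the join calls order the final load after both workers.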
--- Synchronet 3.22a-Linux NewsLink 1.2

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Jun 18 04:11:55 2026

From Newsgroup: comp.lang.c

On Wed, 17 Jun 2026 20:29:23 -0700, Chris M. Thomasson wrote:

On 6/16/2026 8:09 PM, Lawrence D'Oliveiro wrote:

On Tue, 16 Jun 2026 12:22:24 -0700, Chris M. Thomasson wrote:

Microsoft Windows problems? What do you mean? Preemptive threads,
say POSIX threads, are going to have the same issues. Right?

Well, when talking to a Windows programmer, and they say something
you know doesn't sound right, it seems reasonable to conclude that
it comes from their Windows-specific experience.

POSIX threads are preemptive and subject to the same scheduling
concerns... Right?

They are lighter-weight than processes. But then, processes are
cheaper to create on *nix systems than on Dave-Cutler-type systems,
we know that already.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 18 00:18:18 2026

From Newsgroup: comp.lang.c

On 6/17/2026 9:11 PM, Lawrence D'Oliveiro wrote:

On Wed, 17 Jun 2026 20:29:23 -0700, Chris M. Thomasson wrote:

On 6/16/2026 8:09 PM, Lawrence D'Oliveiro wrote:

On Tue, 16 Jun 2026 12:22:24 -0700, Chris M. Thomasson wrote:

Microsoft Windows problems? What do you mean? Preemptive threads,
say POSIX threads, are going to have the same issues. Right?

Well, when talking to a Windows programmer, and they say something
you know doesn't sound right, it seems reasonable to conclude that
it comes from their Windows-specific experience.

POSIX threads are preemptive and subject to the same scheduling
concerns... Right?

They are lighter-weight than processes. But then, processes are
cheaper to create on *nix systems than on Dave-Cutler-type systems,
we know that already.

Sure... But process creation cost isn't relevant here?
I'm talking about preemption semantics, which are the same on POSIX
and Windows.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Jun 18 07:23:31 2026

From Newsgroup: comp.lang.c

On Thu, 18 Jun 2026 00:18:18 -0700, Chris M. Thomasson wrote:

I'm talking about preemption semantics, which are the same on POSIX
and Windows.

But for some reason the previous poster thinks that doing thread
preemption in the kernel is somehow more resource-heavy than doing the
exact same thing in userspace.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Jun 18 18:23:28 2026

From Newsgroup: comp.lang.c

On 18.06.2026 at 06:11, Lawrence D'Oliveiro wrote:

They are lighter-weight than processes. But then, processes are
cheaper to create on *nix systems than on Dave-Cutler-type systems,
we know that already.

Yes, they are, but if performance is your concern better use thread pools.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 18 12:46:19 2026

From Newsgroup: comp.lang.c

On 6/18/2026 9:23 AM, Bonita Montero wrote:

On 18.06.2026 at 06:11, Lawrence D'Oliveiro wrote:

They are lighter-weight than processes. But then, processes are
cheaper to create on *nix systems than on Dave-Cutler-type systems,
we know that already.

Yes, they are, but if performance is your concern better use thread pools.

Agreed. Something like xxx_init creates all the system state, thread
pools, etc., with default settings, say, and then sits in a neutral
state waiting for a user to poke at it, say by issuing a ConnectEx.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 18 12:47:15 2026

From Newsgroup: comp.lang.c

On 6/16/2026 8:07 PM, Lawrence D'Oliveiro wrote:

On Tue, 16 Jun 2026 12:23:32 -0700, Chris M. Thomasson wrote:

On 6/13/2026 5:32 PM, Lawrence D'Oliveiro wrote:

On Sat, 13 Jun 2026 13:46:55 -0700, Chris M. Thomasson wrote:

It's a recursive Koch fractal using Applesoft BASIC.

Yes, that was pretty clear. As was the fact that you were able to
get it working, not *because of* your choice of BASIC to write it
in, but *in spite of* that.

It has current stack space.

But being BASIC, it only has fixed-length arrays to use as the stack,
doesn't it?

Well yeah. So you create a buffer.

So, with a little work it should be workable for continuations.

I added continuations to my toy PostScript-revival language. I soon
discovered that having a dedicated stack area was a bad idea. So what
happens is call frames are allocated on the heap, and chained together
in various ways: for transferring control for a return, exit, yield or
stop. An instance of the Continuation class keeps a copy of the
CallFrame object that was current when it was created, and simply
makes that current again when it is invoked.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Thu Jun 18 12:51:11 2026

From Newsgroup: comp.lang.c

On 6/18/2026 12:47 PM, Chris M. Thomasson wrote:

On 6/16/2026 8:07 PM, Lawrence D'Oliveiro wrote:

On Tue, 16 Jun 2026 12:23:32 -0700, Chris M. Thomasson wrote:

On 6/13/2026 5:32 PM, Lawrence D'Oliveiro wrote:

On Sat, 13 Jun 2026 13:46:55 -0700, Chris M. Thomasson wrote:

It's a recursive Koch fractal using Applesoft BASIC.

Yes, that was pretty clear. As was the fact that you were able to
get it working, not *because of* your choice of BASIC to write it
in, but *in spite of* that.

It has current stack space.

But being BASIC, it only has fixed-length arrays to use as the stack,
doesn't it?

Well yeah. So you create a buffer.

Basilica over reserve... ;^)

So, with a little work it should be workable for continuations.

I added continuations to my toy PostScript-revival language. I soon
discovered that having a dedicated stack area was a bad idea. So what
happens is call frames are allocated on the heap, and chained together
in various ways: for transferring control for a return, exit, yield or
stop. An instance of the Continuation class keeps a copy of the
CallFrame object that was current when it was created, and simply
makes that current again when it is invoked.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Jun 18 23:01:47 2026

From Newsgroup: comp.lang.c

On Thu, 18 Jun 2026 18:23:28 +0200, Bonita Montero wrote:

On 18.06.2026 at 06:11, Lawrence D'Oliveiro wrote:

They are lighter-weight than processes. But then, processes are
cheaper to create on *nix systems than on Dave-Cutler-type systems,
we know that already.

Yes, they are, but if performance is your concern better use thread
pools.

All that does is save the initial creation overhead, not the
(supposed) context-switching overhead. Which is what you were
complaining about, wasn't it?
--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Fri Jun 19 13:34:21 2026

From Newsgroup: comp.lang.c

On 6/18/2026 4:01 PM, Lawrence D'Oliveiro wrote:

On Thu, 18 Jun 2026 18:23:28 +0200, Bonita Montero wrote:

On 18.06.2026 at 06:11, Lawrence D'Oliveiro wrote:

They are lighter-weight than processes. But then, processes are
cheaper to create on *nix systems than on Dave-Cutler-type systems,
we know that already.

Yes, they are, but if performance is your concern better use thread
pools.

All that does is save the initial creation overhead, not the
(supposed) context-switching overhead. Which is what you were
complaining about, wasn't it?

Not exactly sure what you mean here. Windows and POSIX have basically
the same overhead with preemptive threads. Syncing between processes is
a different story. Once you are in a thread pool, you are staying within
the same virtual memory space, which is why threads are the standard for high-performance concurrency?
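
To make the thread-pool point concrete, here is a minimal sketch of one in standard C++. The class name ThreadPool and its interface are assumptions for illustration, not any particular library's API. The workers are created once and then share the process's address space, so handing them work costs a queue push and a wakeup rather than a thread creation.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });  // creation cost paid once
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Note that this only removes the creation overhead; the workers are still ordinary preemptive threads, so the context-switch cost under discussion is unchanged.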
--- Synchronet 3.22a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Jun 20 11:47:44 2026

From Newsgroup: comp.lang.c

On 19.06.2026 at 22:34, Chris M. Thomasson wrote:

Not exactly sure what you mean here. Windows and POSIX have basically
the same overhead with preemptive threads. ...

This measures the overhead of a preempted context switch:

#if defined(_WIN32)
    #include <Windows.h>
#elif defined(__unix__)
    #include <pthread.h>
#endif
#include <atomic>
#include <cstdint>
#include <iostream>
#include <latch>
#include <thread>
#if defined(_MSC_VER)
    #include <intrin.h>
#elif defined(__GNUC__) || defined(__clang__)
    #include <x86intrin.h>
#endif

using namespace std;

int main()
{
    constexpr size_t ROUNDS = 1'000'000'000;
    // shared record: id of the thread that last wrote it, and its timestamp
    struct id_tsc { uint64_t id, tsc; };
    static id_tsc idTsc( -1, __rdtsc() );
    latch latSync( 2 );
    atomic_uint64_t aSumTsc = 0;
    atomic<size_t> aNChanges = 0;
    auto ctxWait = [&]( uint64_t id )
    {
        // pin both threads to the same core so they have to preempt each other
#if defined(_WIN32)
        SetThreadAffinityMask( GetCurrentThread(), 1 );
#elif defined(__unix__)
        cpu_set_t cpuSet;
        CPU_ZERO_S(sizeof cpuSet, &cpuSet);
        CPU_SET_S(0, sizeof cpuSet, &cpuSet);
        pthread_setaffinity_np( pthread_self(), sizeof cpuSet, &cpuSet );
#endif
        latSync.arrive_and_wait();
        atomic_ref aIdTsc( idTsc );
        id_tsc ref = aIdTsc.load( memory_order_relaxed ), niu;
        uint64_t sumTsc = 0;
        size_t nChanges = 0;
        for( size_t r = ROUNDS; r; --r )
        {
            niu.id = id;
            niu.tsc = __rdtsc();
            if( aIdTsc.compare_exchange_strong( ref, niu, memory_order_relaxed,
                memory_order_relaxed ) ) [[likely]]
            {
                ref = niu;
                continue;
            }
            // CAS failed: ref now holds what the other thread wrote last
            if( ref.id == id || ref.id == -1 )
                continue;
            int64_t dist = niu.tsc - ref.tsc;
            if( dist < 0 )
                continue;
            sumTsc += dist;
            ++nChanges;
        };
        aSumTsc += sumTsc;
        aNChanges += nChanges;
    };
    jthread spawned( ctxWait, 0 );
    ctxWait( 1 );
    // wait for the other thread's totals before reading the sums
    spawned.join();
    cout << (double)aSumTsc / (double)aNChanges << endl;
}
--- Synchronet 3.22a-Linux NewsLink 1.2

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Jun 20 22:41:39 2026

From Newsgroup: comp.lang.c

On Sat, 20 Jun 2026 11:47:44 +0200, Bonita Montero wrote:

This measures the overhead of a preempted context switch:

[code omitted]

ldo@theon:c++_try> g++ --std=c++26 context_switch_overhead.cpp
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::load(std::memory_order) const':
context_switch_overhead.cpp:(.text+0x547): undefined reference to `__atomic_load_16'
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::compare_exchange_strong(main::id_tsc&, main::id_tsc, std::memory_order, std::memory_order) const':
context_switch_overhead.cpp:(.text+0x671): undefined reference to `__atomic_compare_exchange_16'
collect2: error: ld returned 1 exit status

Got more errors when I didn't specify a newer C++ --std option, but
don't know how to get rid of these ...
--- Synchronet 3.22a-Linux NewsLink 1.2

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Jun 20 23:29:44 2026

From Newsgroup: comp.lang.c

On Sat, 20 Jun 2026 22:41:39 -0000 (UTC), I wrote:

On Sat, 20 Jun 2026 11:47:44 +0200, Bonita Montero wrote:

This measures the overhead of a preempted context switch:

[code omitted]

ldo@theon:c++_try> g++ --std=c++26 context_switch_overhead.cpp
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::load(std::memory_order) const':
context_switch_overhead.cpp:(.text+0x547): undefined reference to `__atomic_load_16'
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::compare_exchange_strong(main::id_tsc&, main::id_tsc, std::memory_order, std::memory_order) const':
context_switch_overhead.cpp:(.text+0x671): undefined reference to `__atomic_compare_exchange_16'
collect2: error: ld returned 1 exit status

Got more errors when I didn't specify a newer C++ --std option, but
don't know how to get rid of these ...

Got it!

ldo@theon:c++_try> g++ --std=c++23 context_switch_overhead.cpp -latomic
ldo@theon:c++_try> ./a.out
17858.5
ldo@theon:c++_try> g++ --std=c++26 context_switch_overhead.cpp -latomic
ldo@theon:c++_try> ./a.out
18899.2

So what do the numbers mean?
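
As a side note on the -latomic fix above: the undefined symbols end in _16 because id_tsc is a 16-byte object, and for that size the compiler may emit library calls instead of inline instructions. The sketch below only inspects compile-time properties; it uses std::atomic<id_tsc> as a stand-in for std::atomic_ref so that it builds as C++17, and the names kSize, kAtomicAlign, kAlwaysLockFree and id_tsc_atomic_strategy are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct id_tsc { std::uint64_t id, tsc; };  // same two fields as the posted program

// Compile-time facts only, so this does not itself need libatomic to link.
constexpr std::size_t kSize = sizeof(id_tsc);
constexpr std::size_t kAtomicAlign = alignof(std::atomic<id_tsc>);
constexpr bool kAlwaysLockFree = std::atomic<id_tsc>::is_always_lock_free;

inline const char* id_tsc_atomic_strategy()
{
    assert(kAtomicAlign >= alignof(id_tsc));  // the atomic wrapper never lowers alignment
    return kAlwaysLockFree
        ? "inline lock-free instructions"
        : "may call out to a support library such as libatomic";
}
```

Which branch is taken depends on the compiler, target and flags, so treat the result as something to check on the machine at hand rather than a fixed answer.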
--- Synchronet 3.22a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sun Jun 21 09:06:58 2026

From Newsgroup: comp.lang.c

On 21.06.2026 at 01:29, Lawrence D'Oliveiro wrote:

On Sat, 20 Jun 2026 22:41:39 -0000 (UTC), I wrote:

On Sat, 20 Jun 2026 11:47:44 +0200, Bonita Montero wrote:

This measures the overhead of a preempted context switch:

[code omitted]

ldo@theon:c++_try> g++ --std=c++26 context_switch_overhead.cpp
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::load(std::memory_order) const':
context_switch_overhead.cpp:(.text+0x547): undefined reference to `__atomic_load_16'
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::compare_exchange_strong(main::id_tsc&, main::id_tsc, std::memory_order, std::memory_order) const':
context_switch_overhead.cpp:(.text+0x671): undefined reference to `__atomic_compare_exchange_16'
collect2: error: ld returned 1 exit status

Got more errors when I didn't specify a newer C++ --std option, but
don't know how to get rid of these ...

Got it!

ldo@theon:c++_try> g++ --std=c++23 context_switch_overhead.cpp -latomic
ldo@theon:c++_try> ./a.out
17858.5
ldo@theon:c++_try> g++ --std=c++26 context_switch_overhead.cpp -latomic
ldo@theon:c++_try> ./a.out
18899.2

So what do the numbers mean?

I've improved the code. Now it measures the time for a preempted context
switch in microseconds:

#if defined(_WIN32)
    #include <Windows.h>
#elif defined(__unix__)
    #include <pthread.h>
#endif
#include <iostream>
#include <latch>
#include <thread>
#include <chrono>

using namespace std;
using namespace chrono;

static bool unbeatable() noexcept;

int main()
{
    constexpr size_t ROUNDS = 100'000'000;
    static struct id_tsc
    {
        uint64_t id, tsc;
    } idTsc;
    latch latSync( 2 );
    atomic_uint64_t aSumTsc = 0;
    atomic<size_t> aNChanges = 0;
    auto ctxWait = [&]( uint64_t id )
    {
        unbeatable();
        latSync.arrive_and_wait();
        atomic_ref aIdTsc( idTsc );
        id_tsc ref = aIdTsc.load( memory_order_relaxed ), niu;
        uint64_t sumTsc = 0;
        size_t nChanges = 0;
        for( size_t r = ROUNDS; r; --r )
        {
            niu.id = id;
            niu.tsc = (int64_t)high_resolution_clock::now().time_since_epoch().count();
            if( aIdTsc.compare_exchange_strong( ref, niu, memory_order_relaxed, memory_order_relaxed ) ) [[likely]]
            {
                ref = niu;
                continue;
            }
            if( ref.id == id || ref.id == -1 )
                continue;
            int64_t dist = niu.tsc - ref.tsc;
            if( dist < 0 )
                continue;
            sumTsc += dist;
            ++nChanges;
        };
        aSumTsc += sumTsc;
        aNChanges += nChanges;
    };
    jthread spawned( ctxWait, 0 );
    ctxWait( 1 );
    double
        avgNs = (double)aSumTsc / (double)aNChanges,
        avgUs = floor( avgNs / (1.0e3 / 10.0) + 0.5 ) / 10.0;
    cout << aNChanges << " context switches" << endl;
    cout << avgUs << "us per preemted context switch" << endl;
}

static bool unbeatable() noexcept
{
#if defined(_WIN32)
    HANDLE hThread = GetCurrentThread();
    return SetThreadAffinityMask( hThread, 1 )
        && (SetThreadPriority( hThread, THREAD_PRIORITY_HIGHEST )
            || SetThreadPriority( hThread, THREAD_PRIORITY_TIME_CRITICAL ));
#elif defined(__unix__)
    cpu_set_t cpuSet;
    CPU_ZERO_S(sizeof cpuSet, &cpuSet);
    CPU_SET_S(0, sizeof cpuSet, &cpuSet);
    if( pthread_setaffinity_np( pthread_self(), sizeof cpuSet, &cpuSet ) )
        return false;
    int policy;
    struct sched_param param;
    return pthread_getschedparam( pthread_self(), &policy, &param ) == 0
        && pthread_setschedprio( pthread_self(),
            sched_get_priority_max( policy ) ) == 0;
#endif
}

And this code measures the time for a voluntary context switch (yield)
in microseconds:

#if defined(_WIN32)
    #include <Windows.h>
#elif defined(__unix__)
    #include <pthread.h>
#endif
#include <atomic>
#include <cstdint>
#include <iostream>
#include <latch>
#include <thread>
#include <chrono>
#include <cmath>

using namespace std;
using namespace chrono;

static bool unbeatable() noexcept;

int main()
{
    constexpr size_t ROUNDS = 1'000'000;
    latch latSync( 2 );
    atomic_uint64_t aSumTsc = 0;
    auto yieldLoop = [&]
    {
        if( !unbeatable() )
            return;
        latSync.arrive_and_wait();
        size_t nChanges = 0;
        auto start = high_resolution_clock::now();
        // both threads are pinned to one core, so each yield hands over to the other
        for( size_t r = ROUNDS; r; --r )
            this_thread::yield();
        aSumTsc += duration_cast<nanoseconds>(high_resolution_clock::now() - start).count();
    };
    jthread spawned( yieldLoop );
    yieldLoop();
    spawned.join();
    double
        avgNs = (double)aSumTsc / (2.0 * (double)ROUNDS),
        avgUs = floor( avgNs / (1.0e3 / 10.0) + 0.5 ) / 10.0;
    cout << avgUs << "us per voluntary context switch" << endl;
}

static bool unbeatable() noexcept
{
#if defined(_WIN32)
    HANDLE hThread = GetCurrentThread();
    return SetThreadAffinityMask( hThread, 1 )
        && SetThreadPriority( hThread, THREAD_PRIORITY_HIGHEST )
        && SetThreadPriority( hThread, THREAD_PRIORITY_TIME_CRITICAL );
#elif defined(__unix__)
    cpu_set_t cpuSet;
    CPU_ZERO_S(sizeof cpuSet, &cpuSet);
    CPU_SET_S(0, sizeof cpuSet, &cpuSet);
    if( pthread_setaffinity_np( pthread_self(), sizeof cpuSet, &cpuSet ) )
        return false;
    int policy;
    struct sched_param param;
    return pthread_getschedparam( pthread_self(), &policy, &param ) == 0
        && pthread_setschedprio( pthread_self(),
            sched_get_priority_max( policy ) ) == 0;
#endif
}
--- Synchronet 3.22a-Linux NewsLink 1.2

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jun 21 07:50:47 2026

From Newsgroup: comp.lang.c

On Sun, 21 Jun 2026 09:06:58 +0200, Bonita Montero wrote:

I've improved the code. Now it measures the time for a preempted
context switch in microseconds:

Had to patch this, otherwise it couldn't find a floor(double) function:

ldo@theon:Bonita Montero progs> diff -u context_switch_overhead_2.cpp{-orig,}
--- context_switch_overhead_2.cpp-orig 2026-06-21 19:43:13.682113804 +1200
+++ context_switch_overhead_2.cpp 2026-06-21 19:46:24.228048403 +1200
@@ -12,6 +12,7 @@
 #include <latch>
 #include <thread>
 #include <chrono>
+#include <math.h>

 using namespace std;
 using namespace chrono;
@@ -64,7 +65,7 @@
     avgNs = (double)aSumTsc / (double)aNChanges,
     avgUs = floor( avgNs / (1.0e3 / 10.0) + 0.5 ) / 10.0;
     cout << aNChanges << " context switches" << endl;
-    cout << avgUs << "us per preemted context switch" << endl;
+    cout << avgUs << "us per preempted context switch" << endl;
 }

 static bool unbeatable() noexcept

Output was:

470 context switches
3.6us per preempted context switch

And this code measures the time for a voluntary context switch
(yield) in microseconds:

Output was:

1.6us per voluntary context switch
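
For anyone puzzling over the expression floor( avgNs / (1.0e3 / 10.0) + 0.5 ) / 10.0 in both programs: it converts nanoseconds to microseconds rounded to one decimal place. A small sketch, with the function name ns_to_us_tenths made up for illustration; it includes <cmath>, which is where floor comes from.

```cpp
#include <cassert>
#include <cmath>

// Round a duration in nanoseconds to the nearest tenth of a microsecond,
// using the same expression as the posted programs.
inline double ns_to_us_tenths(double ns)
{
    assert(ns >= 0.0);
    // 1.0e3 / 10.0 is 100 ns, i.e. one tenth of a microsecond;
    // adding 0.5 before floor() rounds to the nearest tenth.
    return std::floor(ns / (1.0e3 / 10.0) + 0.5) / 10.0;
}
```

With a hypothetical average of 3649 ns this gives 3.6, and 1550 ns gives 1.6; the real averages behind the outputs above were not posted.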
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Jun 21 12:22:04 2026

From Newsgroup: comp.lang.c

On 21/06/2026 01:29, Lawrence D'Oliveiro wrote:

On Sat, 20 Jun 2026 22:41:39 -0000 (UTC), I wrote:

On Sat, 20 Jun 2026 11:47:44 +0200, Bonita Montero wrote:

This measures the overhead of a preempted context switch:

[code omitted]

ldo@theon:c++_try> g++ --std=c++26 context_switch_overhead.cpp
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::load(std::memory_order) const':
context_switch_overhead.cpp:(.text+0x547): undefined reference to `__atomic_load_16'
/usr/bin/x86_64-linux-gnu-ld.bfd: /tmp/ccGEPNJM.o: in function `std::__atomic_ref<main::id_tsc, false, false>::compare_exchange_strong(main::id_tsc&, main::id_tsc, std::memory_order, std::memory_order) const':
context_switch_overhead.cpp:(.text+0x671): undefined reference to `__atomic_compare_exchange_16'
collect2: error: ld returned 1 exit status

Got more errors when I didn't specify a newer C++ --std option, but
don't know how to get rid of these ...

Got it!

ldo@theon:c++_try> g++ --std=c++23 context_switch_overhead.cpp -latomic
ldo@theon:c++_try> ./a.out
17858.5
ldo@theon:c++_try> g++ --std=c++26 context_switch_overhead.cpp -latomic
ldo@theon:c++_try> ./a.out
18899.2

So what do the numbers mean?

When you are compiling without optimisation enabled? Probably nothing,
unless the time taken is only due to code in libraries that you did not compile.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sun Jun 21 13:41:04 2026

From Newsgroup: comp.lang.c

On 21.06.2026 at 12:22, David Brown wrote:

When you are compiling without optimisation enabled? Probably nothing,
unless the time taken is only due to code in libraries that you did not
compile.

The intervals between the thread switches are so long that this
shouldn't make a large difference. The timings just might become
somewhat inaccurate.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Mon Jun 22 12:15:38 2026

From Newsgroup: comp.lang.c

On 6/12/2026 4:15 AM, Bonita Montero wrote:

On 12.06.2026 at 06:19, Chris M. Thomasson wrote:

The fibers float along the threads... ;^)

If you have as many fibers as you would otherwise have coroutines,
and the number is large, you waste a lot of memory.
The only difference is that with fibers you can switch the context
from any function.

Don't use them if you don't actually need them. Simple.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Tue Jun 23 11:47:00 2026

From Newsgroup: comp.lang.c

On 6/18/2026 12:23 AM, Lawrence D'Oliveiro wrote:

On Thu, 18 Jun 2026 00:18:18 -0700, Chris M. Thomasson wrote:

I'm talking about preemption semantics, which are the same on POSIX
and Windows.

But for some reason the previous poster thinks that doing thread
preemption in the kernel is somehow more resource-heavy than doing the
exact same thing in userspace.

Those are different things? The kernel knows more than we do... Now,
making a userland scheduler with, say, fibers is completely different
from what the kernel is doing...
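
A rough sketch of the difference, assuming a much simpler model than real fibers: here a task is just a step function with no stack of its own, and the names Task, CoopScheduler and make_logger are made up for illustration. The point it shows is that a userland scheduler of this kind only switches when a task gives up control, whereas the kernel can preempt a thread at any instruction.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// A task runs one step and returns true if it wants to be scheduled again.
using Task = std::function<bool()>;

class CoopScheduler {
public:
    void spawn(Task t) { ready_.push_back(std::move(t)); }

    // Round-robin until every task has finished. A task loses the CPU only
    // at the point where its step returns; nothing here can preempt it.
    void run() {
        while (!ready_.empty()) {
            Task t = std::move(ready_.front());
            ready_.pop_front();
            if (t())
                ready_.push_back(std::move(t));
        }
    }
private:
    std::deque<Task> ready_;
};

// Hypothetical helper: a task that appends `tag` to `log` once per step, `steps` times.
inline Task make_logger(std::string& log, char tag, int steps)
{
    assert(steps > 0);
    return [&log, tag, steps]() mutable {
        log.push_back(tag);
        return --steps > 0;
    };
}
```

Spawning one logger with two steps and another with three interleaves their output strictly in turn, which is deterministic in a way that preemptive kernel scheduling is not.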
--- Synchronet 3.22a-Linux NewsLink 1.2
