Forum: Too Lazy BBS

Re: Concertina II Instead

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 13:46:33 2026

From Newsgroup: comp.arch

On 4/16/2026 11:15 AM, BGB wrote:

On 4/15/2026 5:44 PM, Bill Findlay wrote:

On 15 Apr 2026, David Brown wrote
(in article <10roqep$16j1j$1@dont-email.me>):

On 15/04/2026 17:36, quadi wrote:

On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

One should also note: in the history of this system (~late 1930s)
to present: only 2 properly registered FA guns have been used in any
crimes. {Anyone with a brain would say this is a pretty good record}

Anyone with a non-USAn brain would say this is utterly insane.

Utterly insane would be if the same procedure applied to thermonuclear
warheads.

Of course, much about the consequences of the Second Amendment indeed does
appear insane. The sensible thing to do would be to repeal it, rather than
pretend it doesn't exist, or it doesn't mean what it says, and hope the
Supreme Court will look the other way.

[wise words omitted]

That's just my two cents - coming from someone in a country ...
where we have far more real-world freedoms than the USA.

That is the bit they really can't fathom.

?...

But, AFAIK, the UK is the place that went and banned:
  Sharp points on knives;
  Sharp points on scissors;

Better put corks on the forks! This scene from Dirty Rotten Scoundrels
always cracked me up:

(Dirty Rotten Scoundrels (1988) - Dinner With Ruprecht Scene (6/12) | Movieclips)

https://youtu.be/SKDX-qJaJ08

corks on the forks to prevent him from hurting himself and/or others... ;^D

  Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
      to the electronics hobbyists.

[...]
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Apr 16 20:49:07 2026

scott@slp53.sl.home (Scott Lurndal) posted:

MitchAlsup <user5857@newsgrouper.org.invalid> writes:

You can tell if a person with a gun knows about gun safety as he first
touches the gun, he checks to see if it is unloaded and if not unloads
it prior to letting anyone else touch it.

While true, if someone tried to hand me a bolt-action rifle with
the bolt in place, I'd refuse. Likewise any magazine fed
weapon should not have a magazine installed and the action should
be open, if possible before handed to anyone.

Look up "Walker Trigger". If you carry a rifle with a cartridge in the
chamber, bolt locked, and safety on, there are situations where, when the
safety is turned off, the rifle will spontaneously (and negligently)
fire all by itself. Carry the rifle with no cartridge in the chamber, and
the bolt unlocked, for maximum safety.

Which brings up gun safety rule 2: never point a gun at something
you are not willing to destroy.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 14:59:02 2026

On 4/15/2026 9:25 AM, Dan Cross wrote:

In article <SlNDR.276690$4wI6.88606@fx24.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

David Brown <david.brown@hesbynett.no> writes:

On 15/04/2026 13:30, Dan Cross wrote:

In article <n48bl0Fdbm4U1@mid.individual.net>,
moi <findlaybill@blueyonder.co.uk> wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

<snip>

Broadly speaking, they're a pain, and pretty much only useful as
either a suppression weapon or for guarding avenues of approach
to fixed defensive positions.

They're also extremely expensive to use and maintain. At $2.00 or
more _per round_, the cost rises rapidly.

Full-auto weapons are also heavy, and so is the ammo. Carrying
even a SAW is no fun after a few hours; a 240 or Ma Deuce? Not
happening. A Mk19? Forget about it.

A fully auto shotgun, say aa12 with the barrel mag?

Mythbusters shot a minigun on a couple episodes and the ammo cost
(more than a decade ago) was huge.

Indeed.

I suppose that the proponents will point to smaller weapons
systems, like submachine guns, that are full-auto but shoot
conventional cartridges (the Thompson shoots .45ACP, for example)
but they often don't realize that the kickback makes them
difficult to control. Even the 3-round burst on the M4/M16
pattern weapons will kick you off target almost instantly; that
mode is only useful as a direct fire alternative for suppression
to support advancing infantry in a complex attack scenario when
indirect fire is not viable, you don't have combined-arms
support, or you don't have crew-served weapons. It's like a
method of last resort.

I don't know why anyone would want a full-auto weapon as
anything other than a museum piece or a novelty. I strongly
suspect that such people have never heard a shot fired in anger
before.

- Dan C.

From Bill Findlay@findlaybill@blueyonder.co.uk to comp.arch on Thu Apr 16 23:00:37 2026

On 16 Apr 2026, BGB wrote
(in article <10rr8vi$1sh0n$1@dont-email.me>):

On 4/15/2026 5:44 PM, Bill Findlay wrote:

On 15 Apr 2026, David Brown wrote
(in article <10roqep$16j1j$1@dont-email.me>):

On 15/04/2026 17:36, quadi wrote:

On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

One should also note: in the history of this system (~late 1930s) to present: only 2 properly registered FA guns have been used in any
crimes. {Anyone with a brain would say this is a pretty good record}

Anyone with a non-USAn brain would say this is utterly insane.

Utterly insane would be if the same procedure applied to thermonuclear warheads.

Of course, much about the consequences of the Second Amendment indeed does
appear insane. The sensible thing to do would be to repeal it, rather than
pretend it doesn't exist, or it doesn't mean what it says, and hope the Supreme Court will look the other way.

[wise words omitted]

That's just my two cents - coming from someone in a country ...
where we have far more real-world freedoms than the USA.

That is the bit they really can't fathom.

?...

But, AFAIK, the UK is the place that went and banned:
Sharp points on knives;
Sharp points on scissors;
Buying solder without having certifications;
So, it is effectively sold black-market in small amounts,
to the electronics hobbyists.
...
And, where a person can be arrested, for stuff they say on social media
(or "thought crime" as some are calling it);
Where corporations can lead search-and-seizure operations for claimed IP violations;

It is clear from that claptrap that in fact you know very little.
(MAGA shills like JD are not a trustworthy source of information.)
--
Bill Findlay

From BGB@cr88192@gmail.com to comp.arch on Thu Apr 16 18:53:47 2026

On 4/16/2026 5:00 PM, Bill Findlay wrote:

On 16 Apr 2026, BGB wrote
(in article <10rr8vi$1sh0n$1@dont-email.me>):

On 4/15/2026 5:44 PM, Bill Findlay wrote:

On 15 Apr 2026, David Brown wrote
(in article <10roqep$16j1j$1@dont-email.me>):

On 15/04/2026 17:36, quadi wrote:

On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

One should also note: in the history of this system (~late 1930s)
to present: only 2 properly registered FA guns have been used in any
crimes. {Anyone with a brain would say this is a pretty good record}

Anyone with a non-USAn brain would say this is utterly insane.

Utterly insane would be if the same procedure applied to thermonuclear
warheads.

Of course, much about the consequences of the Second Amendment indeed does
appear insane. The sensible thing to do would be to repeal it, rather than
pretend it doesn't exist, or it doesn't mean what it says, and hope the
Supreme Court will look the other way.

[wise words omitted]

That's just my two cents - coming from someone in a country ...
where we have far more real-world freedoms than the USA.

That is the bit they really can't fathom.

?...

But, AFAIK, the UK is the place that went and banned:
Sharp points on knives;
Sharp points on scissors;
Buying solder without having certifications;
So, it is effectively sold black-market in small amounts,
to the electronics hobbyists.
...
And, where a person can be arrested, for stuff they say on social media
(or "thought crime" as some are calling it);
Where corporations can lead search-and-seizure operations for claimed IP
violations;

It is clear from that claptrap that in fact you know very little.
(MAGA shills like JD are not a trustworthy source of information.)

I am not really part of the MAGA crowd.
I am not really into politics in general...

But, this is still what people say online about the UK and CA and similar...

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 00:12:01 2026

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <NdaER.1511$r_k6.609@fx38.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

<snip>

[*] recently transferred to the CH-53E fleet due to the imminent
retirement of the F-16C fleet.

The Marine Corps doesn't fly the F-16. :-) Perhaps you mean
the F/A-18 or the Harrier?

Brain fart. His squadron flies the F/A-18C and D models. The
E's and F's will remain in the active fleet along with the F-35,
but the final days of the C and D models are in sight.

No problem; I can see wanting to switch over to helos from fixed
wing. It's a different world.

His eventual goal is to get his A&P. Figures helos will be good
experience.

I got to visit the flight line in 2024, very interesting.

Cool.

Prior visit to a Marine base was at 29 Palms in the 1980s, visiting
a cousin. He was living on-base in married housing and told
me to avoid the well-lit compound several miles east of the housing area, which
was secured and managed by the NOP.

Ah, the stumps. I remember the first time I got there, stepping
off a bus (we'd just flown from NC, having completed post Parris
Island training at Camp Lejeune) and immediately seeing tumble
weed blowing down the main drag. "Oh my god; I have to stay
here for a YEAR?!" (This was before I became an officer.)

I wonder which part your cousin meant; perhaps Camp Wilson,
which is an active training area (and pretty much nothing else,
though there is a very small PX there selling pogey bait).

I just looked on google maps and they've pretty much censored
the entire base in both the map and satellite views.

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 00:13:25 2026

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 4/16/2026 11:52 AM, Scott Lurndal wrote:

Brain fart. His squadron flies the F/A-18C and D models. The
E's and F's will remain in the active fleet along with the F-35,

Can he fly the F-35?

Only if he gets a ride in the back seat. Which, since the F35
is a single seater, is not gonna happen.

From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 02:06:04 2026

On Thu, 16 Apr 2026 13:15:13 -0500, BGB wrote:

And, where a person can be arrested, for stuff they say on social media
(or "thought crime" as some are calling it);

Almost all countries other than the United States limit freedom of speech
by excluding the incitement of hatred towards minority groups.
This is perhaps a natural result of Europe having had World War II fought
on its own soil, and so they consider it a matter of survival to prevent
the rise of another movement similar to Nazism.

Given current political trends in Europe, I have to say it's a pity they didn't think that one way to prevent the rise of bigoted extremist
movements would have been not to have such a liberal immigration policy
that the demographic consequences would end up being an annoyance to a lot
of ordinary people not previously inclined to bigotry.

John Savard

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 19:08:41 2026

On 4/16/2026 5:13 PM, Scott Lurndal wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 4/16/2026 11:52 AM, Scott Lurndal wrote:

Brain fart. His squadron flies the F/A-18C and D models. The
E's and F's will remain in the active fleet along with the F-35,

Can he fly the F-35?

Only if he gets a ride in the back seat. Which, since the F35
is a single seater, is not gonna happen.

Oh damn! Shit. He can fly the f-16?

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 19:09:48 2026

On 4/16/2026 5:13 PM, Scott Lurndal wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 4/16/2026 11:52 AM, Scott Lurndal wrote:

Brain fart. His squadron flies the F/A-18C and D models. The
E's and F's will remain in the active fleet along with the F-35,

Can he fly the F-35?

Only if he gets a ride in the back seat. Which, since the F35
is a single seater, is not gonna happen.

Neat! He is around that tech!

From BGB@cr88192@gmail.com to comp.arch on Thu Apr 16 22:09:21 2026

On 4/16/2026 9:06 PM, quadi wrote:

On Thu, 16 Apr 2026 13:15:13 -0500, BGB wrote:

And, where a person can be arrested, for stuff they say on social media
(or "thought crime" as some are calling it);

Almost all countries other than the United States limit freedom of speech
by excluding the incitement of hatred towards minority groups.
This is perhaps a natural result of Europe having had World War II fought
on its own soil, and so they consider it a matter of survival to prevent
the rise of another movement similar to Nazism.

OK. This at least seems like a sensible place where limits could be
imposed, like if a person is advocating for violence against a group of people, or promoting criminal activity (of the sort where actual
peoples' health or well-being is concerned).

Though, the line here gets fuzzy.

I think the claim though was that people were getting arrested for
things being said that had "offended the political elites" or
disagreeing with the official party line on various policies or something.

But, yeah, if it is people promoting doing stuff like what happened in
WWII, this is more understandable.

But, yeah, I guess in the US, people going around and doing the whole "neo-Nazi" and "white supremacist" thing has become a bit of an issue...

This sort of thing can get worrying sometimes. In recent years it does
seem as if the racists have been winning here.

Sorta reminds me of some years ago, people were making a lot of fuss
about BO, like saying he wasn't really an American and was actually
allying with the Islamists and stuff...

But, like, there were no real scandals going on, and it was mostly
uneventful. Then, with DJT, it is like it all turns into a raging
crap-storm (with an endless stream of infighting, scandals, etc) and
everyone else is like "Basically cool I guess, keep up the good work".

To admit something, though possibly an unpopular/controversial position,
kinda hoped KH would have won. Had I known how big of a crap-storm it
was all going to be, might have taken a stronger stance on the issue (vs
just going along passively...). Alas... Though, possibly it would have
still sucked either way.

But, yeah, politics is kinda confusing and sucks in this way.

Given current political trends in Europe, I have to say it's a pity they didn't think that one way to prevent the rise of bigoted extremist
movements would have been not to have such a liberal immigration policy
that the demographic consequences would end up being an annoyance to a lot
of ordinary people not previously inclined to bigotry.

Possibly true.

From what I had heard, usual issue is mostly that people from the
middle east come in and start bombing stuff and trying to push for
Sharia law everywhere.

I think also there was a thing where Germany realized this was not cool
and basically put a ban on allowing anyone to try to impose Sharia law.

But, say, there is a limit here, like one person trying to impose their religious rules on others isn't cool (nor is them trying to forcibly
convert others, etc).

Or, at least within the US context, this makes me think that the whole
thing of trying to enforce pro-life policies, or putting people through
"gay conversion therapy" and such, may be doing more harm than good.

Like, in terms of trying to control peoples' moral behavior, it is
making the situation worse for them than had they been left to make
their own choices in these area; even if one still views these things as
moral faults, and would similarly assume keeping the personal freedoms
to consider them as moral faults. Like, the line being not so much what
a person thinks or does for themselves, but where it crosses to imposing
on others.

...

From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 04:49:41 2026

On Wed, 15 Apr 2026 15:12:19 +0000, quadi wrote:

Now, I took out one excessively elaborate header format, and restored a feature to the architecture that I took out when pruning header types,
but this time in a different form, associated with a different header
type.

And now I rearranged the headers a bit, to defragment their opcode space.

John Savard

From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 05:21:57 2026

On Fri, 17 Apr 2026 04:49:41 +0000, quadi wrote:

And now I rearranged the headers a bit, to defragment their opcode
space.

This led me to feel I had the perfect spot in which to re-introduce to
this iteration of the Concertina II architecture that most bizarre feature
of the architecture which was cited as one of its defining features... I
think of it as an exotic and strictly optional feature, existing mainly to enhance emulation, while the ability to go from RISC to CISC to VLIW is
what defines Concertina II.

John Savard

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 17 05:37:25 2026

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Thomas Koenig <tkoenig@netcologne.de> posted:

I first used the G3 at the Bundeswehr. 600 rounds per minute, 20
rounds per magazine. Firing that weapon at full auto will empty
it in a tad less than two seconds (note fencepost error here :-)

Firing short bursts or single shots will keep any enemy down for
a far longer time than that.

But a squad at the time had a MG3 (1200 rounds per minute) with it.
But firing that continuously is also not a good idea because

a) ammunition: 24 grams per round means that you run through 0.48
kg (a bit more than a pound :-) per second of ammunition, which
somebody has to carry

b) you have to exchange barrels and locks after a certain number
of rounds to prevent overheating.

Where the military definition of overheating is: "The bullets no longer
go anywhere close to where the barrel is pointing".

In this case, not directly. The problem is wear on the barrel
and the lock. If you fire too many rounds in too short a time,
the rifling will wear down.

Hence, replacement barrels (and the asbestos rags for exchanging
them).

5 rounds at 20 second intervals is enough to heat a sniper barrel
to the point it is not "accurate enough". Now, imagine those 5
rounds in 0.5 seconds ...

Snipers and machine gunners have different tactical tasks :-)
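
As a rough check of the rate-of-fire arithmetic above, here is a minimal Python sketch. The function names are made up for illustration; the inputs (600 and 1200 rounds per minute, 20 rounds per magazine, 24 grams per round) are the figures quoted in this post. The magazine time counts the intervals between shots, so 20 rounds give 19 intervals, which is the fencepost effect mentioned earlier.

```python
def magazine_empty_time(rounds, rounds_per_minute):
    """Seconds from the first shot to the last at a constant cyclic rate.

    N rounds are separated by N - 1 intervals (the fencepost effect).
    """
    interval = 60.0 / rounds_per_minute  # seconds between consecutive shots
    return (rounds - 1) * interval


def ammo_mass_per_second(rounds_per_minute, grams_per_round):
    """Kilograms of ammunition consumed per second of continuous fire."""
    rounds_per_second = rounds_per_minute / 60.0
    return rounds_per_second * grams_per_round / 1000.0


if __name__ == "__main__":
    # G3 figures from the post: 600 rounds/min, 20-round magazine
    print(f"G3 magazine: {magazine_empty_time(20, 600):.2f} s")
    # MG3 figures from the post: 1200 rounds/min, 24 g per round
    print(f"MG3 ammunition: {ammo_mass_per_second(1200, 24):.2f} kg/s")
```

This only models a constant cyclic rate; it says nothing about barrel wear or heating, which is the actual limit discussed in the post.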
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 08:05:55 2026

On 16/04/2026 20:15, BGB wrote:

On 4/15/2026 5:44 PM, Bill Findlay wrote:

On 15 Apr 2026, David Brown wrote
(in article <10roqep$16j1j$1@dont-email.me>):

On 15/04/2026 17:36, quadi wrote:

On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

One should also note: in the history of this system (~late 1930s)
to present: only 2 properly registered FA guns have been used in any
crimes. {Anyone with a brain would say this is a pretty good record}

Anyone with a non-USAn brain would say this is utterly insane.

Utterly insane would be if the same procedure applied to thermonuclear
warheads.

Of course, much about the consequences of the Second Amendment indeed does
appear insane. The sensible thing to do would be to repeal it, rather than
pretend it doesn't exist, or it doesn't mean what it says, and hope the
Supreme Court will look the other way.

[wise words omitted]

That's just my two cents - coming from someone in a country ...
where we have far more real-world freedoms than the USA.

That is the bit they really can't fathom.

?...

Note - I come from the UK originally, but live in Norway.

But, AFAIK, the UK is the place that went and banned:
  Sharp points on knives;
  Sharp points on scissors;
  Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
      to the electronics hobbyists.
  ...

Complete nonsense.

But both the UK and Norway have restrictions on people carrying around
deadly weapons of all sorts. The freedom not to be stabbed, shot, or otherwise injured or killed trumps the freedom to carry such weapons.

And, where a person can be arrested, for stuff they say on social media
(or "thought crime" as some are calling it);

Only thoughtless people are calling it that. You've been watching too
much Fox News - a channel that describes itself as "entertainment"
without any obligation to tell the truth.

The freedom of innocent people not to suffer abuse, hatred and prejudice trumps the freedom of nasty little sods who think they have the right to
abuse others. And inciting hatred or encouraging others to commit
criminal behaviour is just as much a crime in the USA as the UK.

Where corporations can lead search-and-seizure operations for claimed IP violations;

No, they can't.

And in European countries, unlike the USA, corporations don't get to lie
and cheat then claim "freedom of speech".

We are free to live safely. We have the freedom to send our kids to
school without worrying if they will survive the day. We have the
freedom of knowing that we won't lose our jobs just because the boss is
having a bad hair day. And losing a job does not mean losing our
health. And we have freedom to vote, knowing that votes count equally.
(Well, the UK parliament elections still have a way to go here, but the Scottish and Norwegian elections have fair votes.)

No country is perfect by any means, but Europeans live far freer lives.
We might not have the freedom to own guns so our kids can accidentally
kill each other, but overall we win out.

Remember, freedoms are always a balance, not an absolute. Lots of types
of freedom for one person reduce other freedoms for other people.

...

So, sorta like California but worse...

California apparently banned the 60/40 lead/tin stuff IIRC, but still
allows people to freely possess lead-free solder (so people apparently
need to smuggle the 60/40 into CA if they want to use it). Everywhere
else, 60/40 is OK. Well, and CA has the "age verification" controversy,
etc.

I can't answer for California, but if I want leaded solder I can just
order some. But the regulations against the use of lead in general are
a good thing - the freedom to drink water without lead trumps the
freedom of a handful of people to use cheaper and lower temperature
soldering irons.

Could be wrong, this is from memory and stuff I heard on the internet.

You should be a lot more careful about what you watch on the internet.

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 08:45:18 2026

On 16/04/2026 18:31, Dan Cross wrote:

In article <10rqrkf$1nbrp$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 16/04/2026 13:59, Dan Cross wrote:

In article <10rqag7$1in0h$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
(The service members have all their other equipment at
home too - they store their own uniforms and other stuff, and are
responsible for washing and repairs or ordering replacements after the
exercises.)

The same is true in the US military. I always found it annoying
that I had to make space for my issued equipment at home, but
c'est la vie.

Personally, I have no military experience at all (not counting school
cadets). But one of my sons joined the National Guard after his
military service. He has a neat solution to the problem of space for his
equipment - he keeps it in his old bedroom in our house, not in his own
flat :-)

Seems like your son should take his obligations a bit more
seriously: keeping his equipment at your house, when it's
supposed to be in his dwelling, sounds like a rules violation.

I don't know the details of the regulations, but I can assure you that
it is entirely within them. As a student, his flat is considered
temporary accommodation and our house is his permanent home. His
National Guard base is in our area, not where he is a student.

Maybe the regulations in the USA are different. Maybe there are
different standards about how quickly you can be called up and need to
deploy.

What's good for the goose is good for the gander.

It also means that there are less guns stored in
concentrated places as potential targets for robberies. And in the
event of a real invasion, it's a lot easier to smuggle around and
distribute firing pins to service members than to pass around guns from
a central armoury.

There are tradeoffs, however: it is also easier for a bad actor
to source a weapon by breaking into a private home.

It is plausible, but AFAIK it is extremely rare here. The solid
majority of criminals here don't want guns, and would probably not take
one if they found one in a house they were robbing. As a house burglar,
you can't easily sell a gun, you don't need one for defence, you don't
need one to threaten anyone - it just increases your chance of being
shot yourself, and increases your punishment if you get caught. There are
guns in the more serious narcotics gangs, but those are handguns - they
have no use for military weapons.

FWIW, handguns are military weapons.

Sorry - I meant pistols, rather than rifle-sized weapons. (Of course
pistols are also used in the military.)

My T/E weapons in Afghanistan were a 9mm handgun and an M4
carbine. I was embedded with an Afghan Army Unit, and was armed
pretty much at all times. This was at a time when we knew there
were Taliban infiltrators in the army. Indeed, I mentioned
working with UK forces: such an infiltrator threw a hand grenade
into a tent full of British soldiers, who were sleeping at the
time (this was in the middle of the night) and sprayed it with
fire from an AK-47, killing three and wounding several more.
That is to say, we were at a relatively low, but constant and
higher than baseline level of risk that most ISAF troops were
not subject to. Despite that, and despite being directly and
indirectly threatened myself on multiple occasions, I only
carried my pistol unless I was outside the wire.

We also don't really go in for shoot-to-kill in Norway.

Note that I'm talking about the use of deadly force to prevent a
bad actor from forcibly taking military-grade arms. That is
exactly the sort of thing that deadly force arguably _should_ be
authorized for.

There is a distinction (which I did not make, but should have) between
authorising deadly force, and encouraging it. As a last resort, it
makes sense in situations like this. But it should be very much the
last resort. Real-life criminals are not like in the movies - if
military guards point guns at them, they will put their hands in the air
and no one needs to be shot. (It's a different matter for criminals
high on drugs and unable to think rationally, but they don't try to raid
military armouries.)

Of course, there is a continuum of force, and despite recent
idiots in charge of the US military asserting otherwise, the
rules of engagement and the laws of warfare are taken _very_
seriously.

It's nice to know - especially with the current muppets at the top of
the USA chain.

But the military are not the police. Full stop. If someone is
trying to attack a military armory with the intention of seizing
arms, they're not likely to be some petty criminal, but if they
obviously are, they will likely be subdued uninjured. The point
of mentioning that deadly force is authorized when protecting
those kinds of assets is to point out the relative value of the
assets themselves. To wit: weapons are dangerous, and in the
wrong hands, even more dangerous. They can, and should, be
protected.

OK.

Btw: as a veteran, one of the things that _really_ bothers me in
the US is when the police, in particular, refer to members of
the general public as "civilians". Words have meaning;
"civilian" refers to someone who is not a member of a military.
The police are definitionally civilians.

Agreed.

If shots need to be fired, the primary aim should
be to persuade the bad guys to surrender, not to kill them.

Is it better if the enemy surrenders? Sure. But this idea that
you are going to shoot to wound in a combat scenario is not
realistic. As you said, it's not like in the movies.

I think that there has been a bit of a disparity in the situations we
have been imagining. I agree entirely that shooting to wound in a
combat situation is not realistic. It just seemed to me that you were suggesting armoury guards were moving to combat mode a lot more quickly
than I thought appropriate.

But more broadly, from a military perspective, this doesn't make
a lot of sense to me, because it ignores the human factors at
play in the fog of war.

Due to the sympathetic physiological reaction to stress, one
tends to lose one's fine motor skills in a combat-type situation
and it can be difficult to remember even the most basic bodily
functions: loss of urinary and sphincter control is common,
for example (hence the expression, "scared shitless"). Moreover
long experience in human history shows that it is impossible to
know a priori how one will react: some people are ridiculously
calm in combat, others are not.

While I have (as I said) no military experience, I have a fair bit of
martial art experience - and what you describe is entirely correct.

Martial arts are art forms, like dance, not combat training.
I really wish people who study them would internalize that. You
get one good punch on the street; you are not experts on actual
warfare, lethal or otherwise. You should take care not to
extrapolate your study of an art form to things you have no
direct experience of.

I am entirely aware of that. I know what martial arts are, and are not.
I know the difference between martial arts that are useful in real
fights, and those that are not - and where the key training differences are.

And I am fully aware that the stuff I have done is not combat training,
and I have no intention of getting into a real fight. I am confident
that I could handle myself in a real fight better than might be expected
by my appearance, as a result of my martial art training - but since I
am short, grey-haired and rather round, that's a very low bar. It takes
a great deal of appropriate training to overcome differences of size,
strength and age - training that I do not have.

But I do know how that all works, and I do understand the difference
between sport fights, sparring, real random fights, and serious combat.

(I'm sorry to have to snip the rest of this. I have read your entire
post with interest, and found it enlightening, but it is simply taking
too much time at the moment. I might get a chance to comment more
later. It is also seriously off-topic! Thank you for the informative
posts.)

Or you can listen to someone who's actually done it and who is
telling you that the intent (which yes, is explained to the
troops) is to demonstrate that we take all of this _very_
seriously, that human fallibility means that people can and do
make mistakes, and that we build processes and procedures to try
and mitigate or avoid those mistakes.

Long experience has shown that mutual inspections are better
than relying on self-affirmation.

Just to be clear here - mutual inspections are fine and often a good
thing. It is the idea of standing in line while a commander of some
sort pats you down that is not. Maybe I just misunderstood what you
were saying.

--- Synchronet 3.21f-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Fri Apr 17 02:20:43 2026

From Newsgroup: comp.arch

On 4/17/2026 1:05 AM, David Brown wrote:

On 16/04/2026 20:15, BGB wrote:

On 4/15/2026 5:44 PM, Bill Findlay wrote:

On 15 Apr 2026, David Brown wrote
(in article <10roqep$16j1j$1@dont-email.me>):

On 15/04/2026 17:36, quadi wrote:

On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

One should also note: in the history of this system (~late 1930s)
to present: only 2 properly registered FA guns have been used in any
crimes. {Anyone with a brain would say this is a pretty good record}

Anyone with a non-USAn brain would say this is utterly insane.

Utterly insane would be if the same procedure applied to thermonuclear
warheads.

Of course, much about the consequences of the Second Amendment
indeed does
appear insane. The sensible thing to do would be to repeal it,
rather than
pretend it doesn't exist, or it doesn't mean what it says, and hope the
Supreme Court will look the other way.

[wise words omitted]

That's just my two cents - coming from someone in a country ...
where we have far more real-world freedoms than the USA.

That is the bit they really can't fathom.

?...

Note - I come from the UK originally, but live in Norway.

But, AFAIK, the UK is the place that went and banned:
  Sharp points on knives;
  Sharp points on scissors;
  Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
      to the electronics hobbyists.
  ...

Complete nonsense.

But both the UK and Norway have restrictions on people carrying around deadly weapons of all sorts. The freedom not to be stabbed, shot, or otherwise injured or killed trumps the freedom to carry such weapons.

Fair enough.

Later went and asked Gemini about it, it said that the laws restrict
carrying things with sharp tips (like knives and similar) rather than possession of them (say, at a person's house).

Apparently, the idea that it was a ban on all pointy things was an over-generalization that floats around on the internet.

...

And, where a person can be arrested, for stuff they say on social
media (or "thought crime" as some are calling it);

Only thoughtless people are calling it that. You've been watching too much Fox News - a channel that describes itself as "entertainment"
without any obligation to tell the truth.

Actually, mostly, it has been a mix of YouTube videos/shorts and
Twitter/X threads...

The freedom of innocent people not to suffer abuse, hatred and prejudice trumps the freedom of nasty little sods who think they have the right to abuse others. And inciting hatred or encouraging others to commit
criminal behaviour is just as much a crime in the USA as the UK.

OK.

Some people were making it sound like they were opposing peoples'
abilities to have and express opinions in general (though they didn't
usually specify on what sorts of topics).

Originally, seemed like, it could have been something like, say:
Person says something bad about a political leader or similar;
Political leader sees it and feels insulted and has person arrested.
Or, something of this sort, ...

But, yeah, in the US people usually say the First Amendment guarantees peoples' freedom to have and express opinions about whatever.

Though, OTOH, I guess things like social media platforms still have the ability to ban people from the platform if they go around spreading hate-speech or similar.

Apparently I guess that was a problem in the past with DJT, then he got
banned off of Twitter, then he started his own social network, then Elon bought Twitter and renamed it X, and DJT got back on there, ...

Where corporations can lead search-and-seizure operations for claimed
IP violations;

No, they can't.

And in European countries, unlike the USA, corporations don't get to lie
and cheat then claim "freedom of speech".

Saw a thing not too long ago talking about how apparently Sega left some Nintendo devkits in an old office building, then abandoned the building,
and later the building owners sold off all the old junk that was left in
the buildings.

Story goes that the guy who bought up some of the old junk posted about
it on the internet, and then Sega + Nintendo + UK Police went in and
arrested the guy and took all his stuff (then they released the guy, but
he didn't get the stuff back), because the idea was that him having the devkits was considered as theft of intellectual property.

Eg (finding a few videos talking about one of the incidents): https://www.youtube.com/watch?v=Sy9Eb8J0xGk https://www.youtube.com/watch?v=NU040CTdJI0

There was another video I saw in the past talking about it originally,
but I didn't see them now (haven't watched through the videos I found
now to compare details).

...

Well, contrast I guess if people post leaked closed-source code on
GitHub or similar, the companies that own the code may issue DMCA
take-downs or similar, but will not generally raid the person's house.

There was some guy though (with some balls) who was releasing
ported/modded versions of some previously released SuperMario64 code.

Not personally inclined to look at it or mess with it though.

We are free to live safely. We have the freedom to send our kids to
school without worrying if they will survive the day. We have the
freedom of knowing that we won't lose our jobs just because the boss is having a bad hair day. And losing a job does not mean losing our
health. And we have freedom to vote, knowing that votes count equally. (Well, the UK parliament elections still have a way to go here, but the Scottish and Norwegian elections have fair votes.)

No country is perfect by any means, but Europeans live far freer lives.
We might not have the freedom to own guns so our kids can accidentally
kill each other, but overall we win out.

Remember, freedoms are always a balance, not an absolute. Lots of types
of freedom for one person reduce other freedoms for other people.

OK.

...

So, sorta like California but worse...

California apparently banned the 60/40 lead/tin stuff IIRC, but still
allows people to freely possess lead-free solder (so people apparently
need to smuggle the 60/40 into CA if they want to use it). Everywhere
else, 60/40 is OK. Well, and CA has the "age verification"
controversy, etc.

I can't answer for California, but if I want leaded solder I can just
order some. But the regulations against the use of lead in general are
a good thing - the freedom to drink water without lead trumps the
freedom of a handful of people to use cheaper and lower temperature soldering irons.

Where I am, solder is sold on Amazon or similar...

Had seen videos where people made it seem like solder was some sort of black-market contraband.

Looking around, apparently the restrictions were specifically on
lead-based solder though, rather than restricting all solder.

In the case of California, did see something not too long ago saying
that they were trying to get something passed to ban personal ownership
of 3D printers and CNC machines.

Though, I didn't see anyone else talking about this, so it seemed
unconfirmed.

There is a lot more talking going on about the CA "OS age verification"
bull, and now something saying that the US federal people are now
looking into something similar (groan, this has a risk to potentially
ruin open source and make everything suck...). Hopefully it goes the way
of past proposals for bans on OSS and RISC-V and similar, ...

So, not like US is exactly perfect either...

Could be wrong, this is from memory and stuff I heard on the internet.

You should be a lot more careful about what you watch on the internet.

Possibly.

Then again, I guess a lot of the news I had seen had also been fed
through the lens of video game commentators and similar.

--- Synchronet 3.21f-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Apr 17 08:51:12 2026

From Newsgroup: comp.arch

David Brown <david.brown@hesbynett.no> writes:

On 16/04/2026 20:15, BGB wrote:

And, where a person can be arrested, for stuff they say on social media
(or "thought crime" as some are calling it);

Only thoughtless people are calling it that. You've been watching too
much Fox News

Or he has been watching too much Youtube (or the like). Every time
somebody tells me that he thinks that something is true that isn't, or
makes a strange judgement, as in this case, and I ask them where they
got that from, they tell me "Youtube".

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.21f-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.arch on Fri Apr 17 15:05:41 2026

From Newsgroup: comp.arch

On Fri, 17 Apr 2026 08:51:12 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

David Brown <david.brown@hesbynett.no> writes:

On 16/04/2026 20:15, BGB wrote:

And, where a person can be arrested, for stuff they say on social
media (or "thought crime" as some are calling it);

Only thoughtless people are calling it that. You've been watching
too much Fox News

Or he has been watching too much Youtube (or the like). Every time
somebody tells me that he thinks that something is true that isn't, or
makes a strange judgement, as in this case, and I ask them where they
got that from, they tell me "Youtube".

- anton

Pay attention that David Brown didn't say that things mentioned by BGB
don't actually happen in UK. He was merely disagreeing with Orwellian
naming.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Bill Findlay@findlaybill@blueyonder.co.uk to comp.arch on Fri Apr 17 13:09:54 2026

From Newsgroup: comp.arch

On 17 Apr 2026, BGB wrote
(in article <10rrsqc$22m9c$1@dont-email.me>):

On 4/16/2026 5:00 PM, Bill Findlay wrote:

On 16 Apr 2026, BGB wrote
(in article <10rr8vi$1sh0n$1@dont-email.me>):
...

But, AFAIK, the UK is the place that went and banned:
Sharp points on knives;
Sharp points on scissors;
Buying solder without having certifications;
So, it is effectively sold black-market in small amounts,
to the electronics hobbyists.
...
And, where a person can be arrested, for stuff they say on social media (or "thought crime" as some are calling it);
Where corporations can lead search-and-seizure operations for claimed IP violations;

It is clear from that claptrap that in fact you know very little.
(MAGA shills like JD are not a trustworthy source of information.)

I am not really part of the MAGA crowd.
I am not really into politics in general...

But, this is still what people say online about the UK and CA and similar...

So, my evaluation of your words was spot on.
--
Bill Findlay

--- Synchronet 3.21f-Linux NewsLink 1.2

From Bill Findlay@findlaybill@blueyonder.co.uk to comp.arch on Fri Apr 17 13:11:59 2026

From Newsgroup: comp.arch

On 17 Apr 2026, David Brown wrote
(in article <10rsik4$287kv$1@dont-email.me>):

On 16/04/2026 20:15, BGB wrote:

On 4/15/2026 5:44 PM, Bill Findlay wrote:

On 15 Apr 2026, David Brown wrote
(in article <10roqep$16j1j$1@dont-email.me>):

On 15/04/2026 17:36, quadi wrote:

On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

One should also note: in the history of this system (~late 1930s) to present: only 2 properly registered FA guns have been used in any
crimes. {Anyone with a brain would say this is a pretty good record}

Anyone with a non-USAn brain would say this is utterly insane.

Utterly insane would be if the same procedure applied to thermonuclear
warheads.

Of course, much about the consequences of the Second Amendment
indeed does
appear insane. The sensible thing to do would be to repeal it,
rather than
pretend it doesn't exist, or it doesn't mean what it says, and hope the
Supreme Court will look the other way.

[wise words omitted]

That's just my two cents - coming from someone in a country ...
where we have far more real-world freedoms than the USA.

That is the bit they really can't fathom.

?...

Note - I come from the UK originally, but live in Norway.

But, AFAIK, the UK is the place that went and banned:
Sharp points on knives;
Sharp points on scissors;
Buying solder without having certifications;
So, it is effectively sold black-market in small amounts,
to the electronics hobbyists.
...

Complete nonsense.

But both the UK and Norway have restrictions on people carrying around
deadly weapons of all sorts. The freedom not to be stabbed, shot, or otherwise injured or killed trumps the freedom to carry such weapons.

And, where a person can be arrested, for stuff they say on social media
(or "thought crime" as some are calling it);

Only thoughtless people are calling it that. You've been watching too
much Fox News - a channel that describes itself as "entertainment"
without any obligation to tell the truth.

The freedom of innocent people not to suffer abuse, hatred and prejudice trumps the freedom of nasty little sods who think they have the right to abuse others. And inciting hatred or encouraging others to commit
criminal behaviour is just as much a crime in the USA as the UK.

Where corporations can lead search-and-seizure operations for claimed IP violations;

No, they can't.

And in European countries, unlike the USA, corporations don't get to lie
and cheat then claim "freedom of speech".

We are free to live safely. We have the freedom to send our kids to
school without worrying if they will survive the day. We have the
freedom of knowing that we won't lose our jobs just because the boss is having a bad hair day. And losing a job does not mean losing our
health. And we have freedom to vote, knowing that votes count equally.
(Well, the UK parliament elections still have a way to go here, but the Scottish and Norwegian elections have fair votes.)

No country is perfect by any means, but Europeans live far freer lives.
We might not have the freedom to own guns so our kids can accidentally
kill each other, but overall we win out.

Remember, freedoms are always a balance, not an absolute. Lots of types
of freedom for one person reduce other freedoms for other people.

...

So, sorta like California but worse...

California apparently banned the 60/40 lead/tin stuff IIRC, but still allows people to freely possess lead-free solder (so people apparently
need to smuggle the 60/40 into CA if they want to use it). Everywhere
else, 60/40 is OK. Well, and CA has the "age verification" controversy, etc.

I can't answer for California, but if I want leaded solder I can just
order some. But the regulations against the use of lead in general are
a good thing - the freedom to drink water without lead trumps the
freedom of a handful of people to use cheaper and lower temperature
soldering irons.

Could be wrong, this is from memory and stuff I heard on the internet.

You should be a lot more careful about what you watch on the internet.

Bravo! You put a lot more work into that than I could
stomach in response to Fuck News talking points.
Thank you.
--
Bill Findlay

--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 14:18:45 2026

From Newsgroup: comp.arch

On 17/04/2026 09:20, BGB wrote:

On 4/17/2026 1:05 AM, David Brown wrote:

On 16/04/2026 20:15, BGB wrote:

On 4/15/2026 5:44 PM, Bill Findlay wrote:

On 15 Apr 2026, David Brown wrote
(in article <10roqep$16j1j$1@dont-email.me>):

On 15/04/2026 17:36, quadi wrote:

On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:

On 15/04/2026 01:44, MitchAlsup wrote:

One should also note: in the history of this system (~late 1930s)
to present: only 2 properly registered FA guns have been used in any
crimes. {Anyone with a brain would say this is a pretty good record}

Anyone with a non-USAn brain would say this is utterly insane.

Utterly insane would be if the same procedure applied to
thermonuclear
warheads.

Of course, much about the consequences of the Second Amendment
indeed does
appear insane. The sensible thing to do would be to repeal it,
rather than
pretend it doesn't exist, or it doesn't mean what it says, and
hope the
Supreme Court will look the other way.

[wise words omitted]

That's just my two cents - coming from someone in a country ...
where we have far more real-world freedoms than the USA.

That is the bit they really can't fathom.

?...

Note - I come from the UK originally, but live in Norway.

But, AFAIK, the UK is the place that went and banned:
  Sharp points on knives;
  Sharp points on scissors;
  Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
      to the electronics hobbyists.
  ...

Complete nonsense.

But both the UK and Norway have restrictions on people carrying around
deadly weapons of all sorts. The freedom not to be stabbed, shot, or
otherwise injured or killed trumps the freedom to carry such weapons.

Fair enough.

Later went and asked Gemini about it, it said that the laws restrict carrying things with sharp tips (like knives and similar) rather than possession of them (say, at a person's house).

I don't think any LLM is going to be a good source of information here.
Gemini might not be as bad as Grok, but AI will often miss the point,
and be heavily influenced by the kinds of drivel that is often published
on the net.

Basically, the laws say that if you are caught with a large screwdriver
that you are using to stab or threaten people, you will be charged and
treated as though you were carrying a knife for that purpose. Older
laws banned carrying knives and the like in public places - newer laws
target weapons, where a "weapon" is anything that you use or plan to use
for violence or threats of violence.

Apparently, the idea that it was a ban on all pointy things was an over-generalization that floats around on the internet.

Exactly. Social media rewards people who greatly exaggerate things to
make them sound dramatic. And political extremists love fear-mongering,
with a total disregard for the truth or subtleties of reality.

...

And, where a person can be arrested, for stuff they say on social
media (or "thought crime" as some are calling it);

Only thoughtless people are calling it that. You've been watching
too much Fox News - a channel that describes itself as
"entertainment" without any obligation to tell the truth.

Actually, mostly, it has been a mix of YouTube videos/shorts and
Twitter/X threads...

If he were still alive, Goebbels would have banned Twatter for having
too much right-wing propaganda. It is not healthy to use it as a source
of information unless you want the calendar dates for KKK meetings.

YouTube shorts are not much better. Research has shown again and again
that social media algorithms exaggerate from people's opinions and
interests - they provide echo chambers that move you further and further
from central or balanced positions. It is somewhat inevitable -
balanced opinions are not particularly interesting or engaging, so they
are not popular and don't generate revenue for the social media platform
or the content creator. The real world is seldom as exciting as
people's imaginations.

I'd recommend looking at a reality check site like snopes.com to get an
idea of how easily people get duped. Being smart isn't enough (you are
a very smart guy) - you need to understand how you are being
manipulated. Ask yourself "Cui bono" ? Who is benefiting, in money,
power or influence?

The freedom of innocent people not to suffer abuse, hatred and
prejudice trumps the freedom of nasty little sods who think they have
the right to abuse others. And inciting hatred or encouraging others
to commit criminal behaviour is just as much a crime in the USA as the
UK.

OK.

Some people were making it sound like they were opposing peoples'
abilities to have and express opinions in general (though they didn't usually specify on what sorts of topics).

Originally, seemed like, it could have been something like, say:
  Person says something bad about a political leader or similar;
  Political leader sees it and feels insulted and has person arrested.
Or, something of this sort, ...

That does happen in some countries. There are plenty of dictatorships,
or partial dictatorships, where that kind of thing goes on. Journalists
are banned from official press conferences or buildings because they ask awkward questions or publish things unflattering to the wannabe king. Politicians get attacked for saying things like people have to follow
laws. The USA has gone seriously downhill in that respect in the last
16 months or so. Some other parts of the world are worse - sometimes
much worse. But most European countries have very strong freedom of
speech protection as long as you are not harming other people with your speech. (Freedom of speech must be weighed against freedom /from/
speech - just like freedom of religion and freedom from religion.)

But, yeah, in the US people usually say the First Amendment guarantees peoples' freedom to have and express opinions about whatever.

Many people do think that, as I understand it. But that's not what the constitution says. In particular, it only limits what Congress and
federal authorities can do to stop people expressing themselves - it
does not in any way require non-government entities to allow people
to say anything they want. Media (newspapers, TV, social media, etc.)
can impose whatever limitations they want.

Though, OTOH, I guess things like social media platforms still have the ability to ban people from the platform if they go around spreading hate-speech or similar.

Correct. Equally, they are allowed to encourage hate-speech and ban
people arguing against it.

There are plenty of restrictions to free speech in the USA - you can't
incite violence, for example. And something you say might be considered conspiracy to commit a crime. Things you say might fall foul of other
laws, such as harassment, psychological abuse, prejudice, etc. Other countries' laws might put more emphasis on freedom from hate speech than
the USA, but the idea that the USA has freedom of speech and Europeans
do not is wrong.

Apparently I guess that was a problem in the past with DJT, then he got banned off of Twitter, then he started his own social network, then Elon bought Twitter and renamed it X, and DJT got back on there, ...

Where corporations can lead search-and-seizure operations for claimed
IP violations;

No, they can't.

And in European countries, unlike the USA, corporations don't get to
lie and cheat then claim "freedom of speech".

Saw a thing not too long ago talking about how apparently Sega left some Nintendo devkits in an old office building, then abandoned the building,
and later the building owners sold off all the old junk that was left in
the buildings.

Story goes that the guy who bought up some of the old junk posted about
it on the internet, and then Sega + Nintendo + UK Police went in and arrested the guy and took all his stuff (then they released the guy, but
he didn't get the stuff back), because the idea was that him having the devkits was considered as theft of intellectual property.

So what you are saying here is that the /police/ conducted a raid and
seizure in connection with suspected unlawfully obtained goods and/or industrial espionage, acting on information provided by a company that believed it was a victim of the crime? And that when the dust settled,
they realised that there was no intentional crime?

That's not the breakdown of free society as you implied - it's the
police doing their job of enforcing the law, but possibly making the
wrong judgement call. Police in every country have to make decisions
based on limited information, and sometimes they make the wrong
decision. (I don't know this case, and can't say if they were wrong or right.)

Eg (finding a few videos talking about one of the incidents): https://www.youtube.com/watch?v=Sy9Eb8J0xGk https://www.youtube.com/watch?v=NU040CTdJI0

I do not want to click on your links and follow you down your rabbit
hole. The Youtube algorithm has learned that I like maths and physics
videos, some computing stuff, some comedy, linguistics, etc. I'd rather
it didn't think I was interested in conspiracy theories about the
breakdown of every country that does not follow Maga philosophies.

Remember, freedoms are always a balance, not an absolute. Lots of
types of freedom for one person reduce other freedoms for other people.

OK.

...

So, sorta like California but worse...

California apparently banned the 60/40 lead/tin stuff IIRC, but still
allows people to freely possess lead-free solder (so people
apparently need to smuggle the 60/40 into CA if they want to use it).
Everywhere else, 60/40 is OK. Well, and CA has the "age verification"
controversy, etc.

I can't answer for California, but if I want leaded solder I can just
order some. But the regulations against the use of lead in general
are a good thing - the freedom to drink water without lead trumps the
freedom of a handful of people to use cheaper and lower temperature
soldering irons.

Where I am, solder is sold on Amazon or similar...

Had seen videos where people made it seem like solder was some sort of black-market contraband.

So you know you can get solder in the post in a couple of days, but you
have seen someone on a video saying it has been banned and you have to
smuggle it on the black market - and you believed the video, not your
own experience?

Looking around, apparently the restrictions were specifically on
lead-based solder though, rather than restricting all solder.

Of course. Lead is a neuropoison, and has caused significant reduction
in mental capacity (and other health problems) for vast numbers of
people. An argument has been made that the fall of the Roman Empire can
be partially blamed on lead pipes and lead dishes. Lead from petrol has caused massive low-level poisoning. Lead in groundwater causes
poisoning. It makes a lot of sense to have regulations to reduce the
use of lead in other situations - such as solder - where there are
perfectly good alternatives.

In the case of California, did see something not too long ago saying
that they were trying to get something passed to ban personal ownership
of 3D printers and CNC machines.

Though, I didn't see anyone else talking about this, so it seemed unconfirmed.

Don't misunderstand me - sometimes it really is a case of politicians
doing stupid things. Sometimes it is for personal profit or the result
of lobbying, often it is well-intentioned but not matched by an
understanding of the implications. Here the lawmakers saw that people
can make gun parts on 3-D printers, and wanted to stop that from being possible (fair enough as an aim). The resulting proposed legislation
would have banned all personal usage of 3-D printers - a real "throw the
baby out with the bathwater" solution.

There is a lot more talking going on about the CA "OS age verification" bull, and now something saying that the US federal people are now
looking into something similar (groan, this has a risk to potentially
ruin open source and make everything suck...). Hopefully it goes the way
of past proposals for bans on OSS and RISC-V and similar, ...

So, not like US is exactly perfect either...

The situation here is that social media is not suitable or safe for
kids. (It is not safe or suitable for adults either, but it's harder to
argue that politically - "think of the kids" is always good for votes.)
Social media companies don't want to do anything about this, making any
kind of realistic registration or age checks, and they certainly don't
want to have to remove the dangerous crap they host. So they punt the
problem - they promise politicians lots of money if the politicians
impose laws saying the OS or other platforms must handle the age
verification. I don't know where this one will all end. (There's an
easy solution - social media companies could charge $10 a year per
account, payable only via credit card. That would immediately solve
much of the problems they cause.)

Could be wrong, this is from memory and stuff I heard on the internet.

You should be a lot more careful about what you watch on the internet.

Possibly.

Then again, I guess a lot of the news I had seen had also been fed
through the lens of video game commentators and similar.

Video game commentators are good for comments on video games.

You would do well to look at sites like ground.news or allsides.com that
make a specific point of showing news from multiple different sites to
help you understand the biases. Or look at multiple news sources that
are publicly funded but independent of any direct government control,
such as the BBC website. For privately owned news sources, find one
that charges you money - "free" sites still charge you, but in hidden
ways. No one site is perfect - you have to combine them.

--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 14:47:50 2026

From Newsgroup: comp.arch

On 17/04/2026 14:05, Michael S wrote:

On Fri, 17 Apr 2026 08:51:12 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

David Brown <david.brown@hesbynett.no> writes:

On 16/04/2026 20:15, BGB wrote:

And, where a person can be arrested, for stuff they say on social
media (or "thought crime" as some are calling it);

Only thoughtless people are calling it that. You've been watching
too much Fox News

Or he has been watching too much Youtube (or the like). Every time
somebody tells me that he thinks that something is true that isn't, or
makes a strange judgement, as in this case, and I ask them where they
got that from, they tell me "Youtube".

That does seem to be one of BGB's sources, yes. There's a lot of good
and interesting stuff on Youtube too, but you have to pick your choices actively - not just let algorithms and autoplay wander through popular
videos.

Pay attention that David Brown didn't say that things mentioned by BGB
don't actually happen in UK. He was merely disagreeing with Orwellian naming.

While that is true of what I wrote there, I do actually think a lot of
what BGB wrote was incorrect. Many parts had a grain of truth that had
been exaggerated well beyond what was reasonable - but the grain of
truth was still there. Some laws and regulations are significantly
different between the USA and the UK, and it is entirely reasonable to disagree with or disapprove of some of them.

The UK does not have "thought crime" in any sense. But it is true that
it is possible to make postings on social media that constitute criminal behaviour in the UK. It is also true in the USA. (For example,
threatening the president is a crime, no matter how unrealistic or
laughable the threat may be.) I think it is fair to say that it is
easier for a post to be a crime in the UK than the USA, but there is not
as much of a difference as some people (and some Youtubers!) think.

--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 15:01:12 2026

From Newsgroup: comp.arch

On 17/04/2026 14:11, Bill Findlay wrote:

On 17 Apr 2026, David Brown wrote
(in article <10rsik4$287kv$1@dont-email.me>):

On 16/04/2026 20:15, BGB wrote:

Could be wrong, this is from memory and stuff I heard on the internet.

You should be a lot more careful about what you watch on the internet.

Bravo! You put a lot more work into that than I could
stomach in response to Fuck News talking points.
Thank you.

I've "known" BGB for many years on Usenet. He is very intelligent, but sometimes a bit too quick to trust the wrong sources. I hope I can
encourage him to be more careful about what to trust and what not to
trust. (And that includes not believing things just because /I/ say so
either - I can be wrong too.)

Fun fact - I heard about the "ground.news" site I mentioned because they sponsored some Youtube videos I have watched :-)

--- Synchronet 3.21f-Linux NewsLink 1.2

From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Apr 17 16:07:21 2026

From Newsgroup: comp.arch

David Brown wrote:

On 17/04/2026 09:20, BGB wrote:

On 4/17/2026 1:05 AM, David Brown wrote:

Complete nonsense.

But both the UK and Norway have restrictions on people carrying
around deadly weapons of all sorts. The freedom not to be stabbed,
shot, or otherwise injured or killed trumps the freedom to carry such
weapons.

Fair enough.

Later went and asked Gemini about it, it said that the laws restrict
carrying things with sharp tips (like knives and similar) rather than possession of them (say, at a person's house).

I don't think any LLM is going to be a good source of information here. Gemini might not be as bad as Grok, but AI will often miss the point,
and be heavily influenced by the kinds of drivel that is often published
on the net.

Basically, the laws say that if you are caught with a large screwdriver
that you are using to stab or threaten people, you will be charged and treated as though you were carrying a knife for that purpose. Older
laws banned carrying knives and the like in public places - newer laws target weapons, where a "weapon" is anything that you use or plan to use
for violence or threats of violence.

Norway did not use to have any restrictions at all on tools, like knives/axes/scythes up to and including shotguns.
The current regulations have explicit exceptions for knife carry in
public places when those knives (or swords!) are part of uniform or
traditional dress. I.e. around May 17th it is perfectly fine to carry
large amounts of metal (mostly silver) through airport security, but
they might tell you that the silver-decorated knife should go in checked luggage.
Similarly, Scouts' knives are fine anywhere.
Regarding firearms the Norwegian regulations are still way less strict
than the UK, pretty much anyone without a mental illness or felony
record can legally own a handgun:
You just need to start by becoming a member of a local pistol shooting
club, then turn up regularly to practice using club guns (at least 10
times or more) over a year, then pass a police security vetting which
checks for those mental/felony bans.
At this point you can legally buy something like a Glock or 1911 and get the serial number on your credit-card sized ownership card.
If you are very active, then you can get separate permits for a primary
and spare gun for each of the competition classes you regularly compete
in, I know people with 10+ handguns in their gun safe.
However, unlike the US, there is absolutely no way to get either an open or concealed carry permit unless you are military or police.
Any handgun you own _must_ be stored in a proper gun safe; it can only
be brought out for cleaning and to transport it to the shooting range.
During that transport, the gun cannot be in the front seat with you, it
has to stay in the trunk or back seat, unloaded of course, and still in
its carrying box.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 16:57:20 2026

From Newsgroup: comp.arch

On 17/04/2026 16:07, Terje Mathisen wrote:

David Brown wrote:

On 17/04/2026 09:20, BGB wrote:

On 4/17/2026 1:05 AM, David Brown wrote:

Complete nonsense.

But both the UK and Norway have restrictions on people carrying
around deadly weapons of all sorts. The freedom not to be stabbed, shot, or otherwise injured or killed trumps the freedom to carry
such weapons.

Fair enough.

Later went and asked Gemini about it, it said that the laws restrict
carrying things with sharp tips (like knives and similar) rather than
possession of them (say, at a person's house).

I don't think any LLM is going to be a good source of information
here. Gemini might not be as bad as Grok, but AI will often miss the
point, and be heavily influenced by the kinds of drivel that is often
published on the net.

Basically, the laws say that if you are caught with a large
screwdriver that you are using to stab or threaten people, you will be
charged and treated as though you were carrying a knife for that
purpose. Older laws banned carrying knives and the like in public
places - newer laws target weapons, where a "weapon" is anything that
you use or plan to use for violence or threats of violence.

Norway did not use to have any restrictions at all on tools, like knives/axes/scythes up to and including shotguns.

The current regulations have explicit exceptions for knife carry in
public places when those knives (or swords!) are part of uniform or traditional dress. I.e. around May 17th it is perfectly fine to carry
large amounts of metal (mostly silver) through airport security, but
they might tell you that the silver-decorated knife should go in checked luggage.

I wear my sgian-dubh with my kilt. Despite the name, it's not very
hidden. It used to be legal to have one in the cabin of planes in the
UK, as long as it stayed in your sock.

Similarly, Scouts' knives are fine anywhere.

Regarding firearms the Norwegian regulations are still way less strict
than the UK, pretty much anyone without a mental illness or felony
record can legally own a handgun:

You just need to start by becoming a member of a local pistol shooting
club, then turn up regularly to practice using club guns (at least 10
times or more) over a year, then pass a police security vetting which
checks for those mental/felony bans.

At this point you can legally buy something like a Glock or 1911 and get
the serial number on your credit-card sized ownership card.

If you are very active, then you can get separate permits for a primary
and spare gun for each of the competition classes you regularly compete
in, I know people with 10+ handguns in their gun safe.

However, unlike the US, there is absolutely no way to get either an open
or concealed carry permit unless you are military or police.

Indeed. And you have to keep them locked in a gun safe, which the
police can check at short notice.

Similarly, you can own a hunting rifle if you pass the hunting tests and
are vetted by the police.

Any handgun you own _must_ be stored in a proper gun safe; it can only
be brought out for cleaning and to transport it to the shooting range.

During that transport, the gun cannot be in the front seat with you, it
has to stay in the trunk or back seat, unloaded of course, and still in
its carrying box.

Basically, in Norway you can have guns for sport or hunting, but not for threatening or shooting people.

The gun laws in the UK are a lot more restrictive (after all, there's
not nearly as much scope for hunting in most of the UK). Farmers can
get shotgun licenses, but I think if you have a pistol for sport it has
to be kept at the pistol club, not at home. (I have not looked at the
rules in detail, so I could be wrong or out-dated.)

--- Synchronet 3.21f-Linux NewsLink 1.2

From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Apr 17 17:30:07 2026

From Newsgroup: comp.arch

David Brown wrote:

On 17/04/2026 16:07, Terje Mathisen wrote:

David Brown wrote:

On 17/04/2026 09:20, BGB wrote:

On 4/17/2026 1:05 AM, David Brown wrote:

Complete nonsense.

But both the UK and Norway have restrictions on people carrying
around deadly weapons of all sorts. The freedom not to be
stabbed, shot, or otherwise injured or killed trumps the freedom to carry such weapons.

Fair enough.

Later went and asked Gemini about it, it said that the laws restrict
carrying things with sharp tips (like knives and similar) rather
than possession of them (say, at a person's house).

I don't think any LLM is going to be a good source of information
here. Gemini might not be as bad as Grok, but AI will often miss the point, and be heavily influenced by the kinds of drivel that is often
published on the net.

Basically, the laws say that if you are caught with a large
screwdriver that you are using to stab or threaten people, you will
be charged and treated as though you were carrying a knife for that
purpose. Older laws banned carrying knives and the like in public
places - newer laws target weapons, where a "weapon" is anything that
you use or plan to use for violence or threats of violence.

Norway did not use to have any restrictions at all on tools, like
knives/axes/scythes up to and including shotguns.

The current regulations have explicit exceptions for knife carry in
public places when those knives (or swords!) are part of uniform or
traditional dress. I.e. around May 17th it is perfectly fine to carry
large amounts of metal (mostly silver) through airport security, but
they might tell you that the silver-decorated knife should go in
checked luggage.

I wear my sgian-dubh with my kilt. Despite the name, it's not very hidden. It used to be legal to have one in the cabin of planes in the
UK, as long as it stayed in your sock.

Similarly, Scouts' knives are fine anywhere.

Regarding firearms the Norwegian regulations are still way less strict
than the UK, pretty much anyone without a mental illness or felony
record can legally own a handgun:

You just need to start by becoming a member of a local pistol shooting
club, then turn up regularly to practice using club guns (at least 10 times or more) over a year, then pass a police security vetting which checks for those mental/felony bans.

At this point you can legally buy something like a Glock or 1911 and
get the serial number on your credit-card sized ownership card.

If you are very active, then you can get separate permits for a
primary and spare gun for each of the competition classes you
regularly compete in, I know people with 10+ handguns in their gun safe.

However, unlike the US, there is absolutely no way to get either an
open or concealed carry permit unless you are military or police.

Indeed. And you have to keep them locked in a gun safe, which the
police can check at short notice.

Similarly, you can own a hunting rifle if you pass the hunting tests and
are vetted by the police.

Any handgun you own _must_ be stored in a proper gun safe; it can only
be brought out for cleaning and to transport it to the shooting range.
During that transport, the gun cannot be in the front seat with you,
it has to stay in the trunk or back seat, unloaded of course, and
still in its carrying box.

Basically, in Norway you can have guns for sport or hunting, but not for threatening or shooting people.

The gun laws in the UK are a lot more restrictive (after all, there's
not nearly as much scope for hunting in most of the UK). Farmers can
get shotgun licenses, but I think if you have a pistol for sport it has
to be kept at the pistol club, not at home. (I have not looked at the rules in detail, so I could be wrong or out-dated.)

It's worse as far as I know:
Even Olympic shooters in the UK cannot own handguns, they have to be
active military, with the gun(s) kept on base.
Non-military shooters need to cross the channel in order to practise in
France instead.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.21f-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 15:45:07 2026

From Newsgroup: comp.arch

David Brown <david.brown@hesbynett.no> writes:

On 16/04/2026 20:15, BGB wrote:

On 4/15/2026 5:44 PM, Bill Findlay wrote:

<snip>

California apparently banned the 60/40 lead/tin stuff IIRC, but still
allows people to freely possess lead-free solder (so people apparently
need to smuggle the 60/40 into CA if they want to use it). Everywhere
else, 60/40 is OK. Well, and CA has the "age verification" controversy,
etc.

I can't answer for California,

I can. 60/40 solder is NOT banned in California. It is not allowed, however, to be used in drinking water plumbing applications (for fairly obvious reasons)
and it must be labeled as potentially hazardous per Prop 65.

Most of what BGB wrote above seems completely wrong. 30 seconds with
google before he/she posts facts would be a good habit for he/she to adopt.

--- Synchronet 3.21f-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 15:55:49 2026

From Newsgroup: comp.arch

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 4/16/2026 5:13 PM, Scott Lurndal wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 4/16/2026 11:52 AM, Scott Lurndal wrote:

Brain fart. His squadron flies the F/A-18C and D models. The
E's and F's will remain in the active fleet along with the F-35,

Can he fly the F-35?

Only if he gets a ride in the back seat. Which, since the F35
is a single seater, is not gonna happen.

Oh damn! Shit. He can fly the f-16?

DAGS "ordie".
--- Synchronet 3.21f-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Fri Apr 17 12:37:04 2026

From Newsgroup: comp.arch

On 4/17/2026 8:01 AM, David Brown wrote:

On 17/04/2026 14:11, Bill Findlay wrote:

On 17 Apr 2026, David Brown wrote
(in article <10rsik4$287kv$1@dont-email.me>):

On 16/04/2026 20:15, BGB wrote:

Could be wrong, this is from memory and stuff I heard on the internet.

You should be a lot more careful about what you watch on the internet.

Bravo! You put a lot more work into that than I could
stomach in response to Fuck News talking points.
Thank you.

I've "known" BGB for many years on Usenet. He is very intelligent, but sometimes a bit too quick to trust the wrong sources. I hope I can encourage him to be more careful about what to trust and what not to trust. (And that includes not believing things just because /I/ say so either - I can be wrong too.)

Fun fact - I heard about the "ground.news" site I mentioned because they sponsored some Youtube videos I have watched :-)

Not usually been one for fact checking, as my interest areas are still (mostly) technical (rather than political or legal), so whatever
political leaning I would have mostly shouldn't matter.

But, yeah, sometimes it gets a little sketchy if it is just someone
rambling on top of gameplay footage, like Subway Surfers, or similar.
Back in the day, we had Audiosurf, but seems that Subway Surfers has
overtaken this role.

Some people have also used games like COD or Minecraft, but watching
someone play these games from a first person POV for any length of time
causes motion sickness.

At one point, people doing videos with talking over the top of clips of Skibidi Toilet was pretty popular (during the high point of Skibidi
Toilet, but its popularity seems to have weakened as of late).

The video of the guy ranting about CA taking away personal ownership of
3D printers and CNC machines was over the top of video of him working on
one of his craft projects. Usually though, for stuff like this, there
would be repetition from multiple sources (in a case of "one guy saying something, possibly all just nothing, many people saying the same thing, possibly true" sense).

Sometimes there is a lot of stuff with people fighting over "Evolution
vs Young Earth Creationism" and similar. Sometimes people arguing for
the Earth being a flat disk (obviously wrong), ...

All sorts of stuff going on...

Sometimes does wander into international politics territory.

Sort of reminded how for a while there were lots of YouTube sponsored
segments for a company claiming to sell Lordship status...

Then other people saying it was a scam, because that was "not how it
worked". IIRC idea was that they bought a farm somewhere and were
"selling" it in 1ft^2 parcels or similar, planting a little flag on each
one, and then selling the people the title of "Lord Whatever" under the premise of them having their flag planted on a parcel of land in the UK
or such...

Had to look, couldn't initially remember name off-hand: https://en.wikipedia.org/wiki/Established_Titles

Then the whole thing went away.
Apparently a combination of public controversy and also the UK
government apparently also being like "that is not how that works".

...

--- Synchronet 3.21f-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Fri Apr 17 13:13:34 2026

From Newsgroup: comp.arch

On 4/17/2026 7:09 AM, Bill Findlay wrote:

On 17 Apr 2026, BGB wrote
(in article <10rrsqc$22m9c$1@dont-email.me>):

On 4/16/2026 5:00 PM, Bill Findlay wrote:

On 16 Apr 2026, BGB wrote
(in article <10rr8vi$1sh0n$1@dont-email.me>):
...

But, AFAIK, the UK is the place that went and banned:
Sharp points on knives;
Sharp points on scissors;
Buying solder without having certifications;
So, it is effectively sold black-market in small amounts,
to the electronics hobbyists.
...
And, where a person can be arrested, for stuff they say on social media (or "thought crime" as some are calling it);
Where corporations can lead search-and-seizure operations for claimed IP violations;

It is clear from that claptrap that in fact you know very little.
(MAGA shills like JD are not a trustworthy source of information.)

I am not really part of the MAGA crowd.
I am not really into politics in general...

But, this is still what people say online about the UK and CA and similar...

So, my evaluation of your words was spot on.

If you mean to say that you think I am also part of the MAGA crowd (or
idolize DJT or JDV), I will disagree...

But, if you mean it in the sense that I have no particular expertise in international law or similar, then probably true enough (not exactly
that I have much basis to disagree on this point).

Most of what I have gathered on this topic has mostly been from the
"streams of random people talking about stuff on the internet" (usually
mixed in with other stuff, particularly if doom scrolling on X or
similar...).

And, sometimes, one can have something to play in the background while
they are working on code or similar, that doesn't require all that much attention, ... A lot of times, YouTube videos where people just sort of
ramble on about some topic can work well, when not just listening to
music or similar.

But, in general, rarely did people really say much about either the UK
or CA, if they talk about them at all, these are a few places that
people seem to like to rip on.

The other side here being much more into going on about the whole thing
in Gaza (and human rights violations, ...), TX oppressing reproductive
rights, various places opposing people's ability to express their
preferred gender identity, etc. These two areas rarely overlap though on
a single topic. Though, admittedly, I don't really understand the whole "gender identity" thing all that well, doesn't strongly interact with my
own experience.

Well, and the places they do overlap is usually in conflict over the perception of public figures like DJT (like whether he is hero or demon,
...), and similar...

...

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 21:33:25 2026

From Newsgroup: comp.arch

On Fri, 17 Apr 2026 05:21:57 +0000, quadi wrote:

This led me to feel I had the perfect spot in which to re-introduce to
this iteration of the Concertina II architecture that most bizarre
feature of the architecture which was cited as one of its defining features... I think of it as an exotic and strictly optional feature, existing mainly to enhance emulation, while the ability to go from RISC
to CISC to VLIW is what defines Concertina II.

And now I've increased the number of header types by one, so that the
extra options can be combined with the full instruction set, rather than
these additions being disjoint from each other so that one can only choose
one but not the other. If you need both at once, they're available at the
cost of a little more overhead.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 17 21:38:49 2026

From Newsgroup: comp.arch

John Levine <johnl@taugh.com> schrieb:

But let's get back to gate delays, please.

Let's.

How do people actually count gate delays, and how useful is it?
Different gates have different delays (obviously), so counting
an inverter the same as a three-input NOR gate (independent of
fan-out, even) seems to be a large simplification which may be
useful for a fairly rough approximation, but not that much better.

Or am I missing something?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 18 01:11:49 2026

From Newsgroup: comp.arch

Thomas Koenig <tkoenig@netcologne.de> posted:

John Levine <johnl@taugh.com> schrieb:

But let's get back to gate delays, please.

Let's.

How do people actually count gate delays, and how useful is it?
Different gates have different delays (obviously), so counting
an inverter the same as a three-input NOR gate (independent of
fan-out, even) seems to be a large simplification which may be
useful for a fairly rough approximation, but not that much better.

Or am I missing something?

There is the "standard" FO4 counting scheme where 1 gate drives
4 other gate inputs, and in this scheme, a D-type flip-flop was
2.5 gates of delay.

As one can guess, gates can be sized: as small as obeys the FAB
design rules, to <basically> as big as one can afford. Naturally,
as gates get bigger, they can drive bigger loads--BUT they also
present bigger loads to the gates driving them.

Conway figured out that the fastest way to "buffer up" a signal
was to use inverters staged in the ratio of 1:e:e^2:e^3... with
e being the standard 2.7... base of natural logarithms. Rounding
e up to 3 degrades speed by less than 1%, rounding up to 4 only
slows down 10% or so--so, most buffering is done at FO4, where
a minimum sized gate drives an inverter 4× as big which would then
drive another inverter 16× as big...
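
A rough way to check those percentages, as a sketch under a deliberately simple model: each stage's delay is taken as proportional to its fan-out f, and a chain covering a total load ratio F needs ln(F)/ln(f) stages, so total delay scales as f/ln(f). The name `relative_chain_delay` is made up for illustration, and self-loading (parasitic) delay is ignored, which in practice pushes the best ratio somewhat above e.

```python
import math

def relative_chain_delay(stage_ratio: float) -> float:
    """Delay of a tapered inverter chain relative to the ideal ratio e.

    Simplified model (an assumption, not a SPICE result):
    total delay is proportional to stage_ratio / ln(stage_ratio),
    which is minimised at stage_ratio = e.
    """
    if stage_ratio <= 1.0:
        raise ValueError("stage ratio must be greater than 1")
    ideal = math.e / math.log(math.e)          # f/ln(f) at f = e
    actual = stage_ratio / math.log(stage_ratio)
    return actual / ideal

if __name__ == "__main__":
    for f in (3.0, 4.0):
        penalty = 100.0 * (relative_chain_delay(f) - 1.0)
        print(f"stage ratio {f:.0f}: about {penalty:.1f}% slower than ratio e")
```

In this model a ratio of 3 costs roughly half a percent and a ratio of 4 roughly six percent, the same ballpark as the figures quoted above.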

Each transistor between a power connection and the signal connection
basically adds its own transconductance to the electrical path
(ignoring body effect). Knowing that deMorgan's laws apply; we
instantly see that a Nand gate is simply a Nor gate with inverted
inputs. A Nand gate has its serial string of FETs between signal
and ground and a parallel path from signal to Vdd, while a Nor has
its serial FETs between signal and Vdd and its parallel path between
signal and ground. To deal with these serial paths, the transistors
are lengthened; a 2Nand has N-channels 2× as wide and can use 1×
P-channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
{Nors are similar but reverse Ns and Ps} Somewhere along the line,
the parallel path FETs have to get lengthened because of all the
serial path diffusion capacitance (to maintain rather
equal pull up and pull down).
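
The first-order sizing rule described above can be written down as a small sketch. It follows the 1× convention used in this post, ignores any N/P mobility difference, and leaves out the diffusion-capacitance correction just mentioned. The function name `stack_widths` and the `input_load` figure (sum of the gate widths on one input, as a stand-in for the load presented to the driving gate) are illustrative assumptions.

```python
def stack_widths(gate: str, inputs: int) -> dict:
    """First-order FET widths, in units of a minimum-size transistor.

    Series (stacked) FETs are scaled by the number of inputs so the stack
    pulls about as hard as a single unit FET; parallel FETs stay at 1x.
    """
    if inputs < 1:
        raise ValueError("a gate needs at least one input")
    if gate == "nand":
        n_width, p_width = inputs, 1   # series N-channels, parallel P-channels
    elif gate == "nor":
        n_width, p_width = 1, inputs   # parallel N-channels, series P-channels
    else:
        raise ValueError("gate must be 'nand' or 'nor'")
    # Each input drives one N gate and one P gate.
    return {"n_width": n_width, "p_width": p_width,
            "input_load": n_width + p_width}

if __name__ == "__main__":
    print(stack_widths("nand", 2))
    print(stack_widths("nand", 3))
    print(stack_widths("nor", 2))
```

The growing `input_load` is one reason a wide Nand or Nor cannot be counted as "one gate delay" on the same footing as an inverter.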

Soon, one realizes that one needs SPICE simulation with accurate
models to push the edge--just like when pushing the Young's Modulus
in engineering models.

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'
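
As a back-of-envelope illustration of the CRAY-1 numbers, assuming (purely for the sake of the arithmetic) that skew comes only from differences in wire length, the skew budget translates directly into an allowed length mismatch. The function name is hypothetical.

```python
def max_length_mismatch_ft(skew_budget_ns: float,
                           wire_delay_ns_per_ft: float) -> float:
    """Largest wire-length difference (feet) that stays within the skew budget.

    Assumes skew is caused only by unequal wire lengths, which is a
    simplification; buffers and loading also contribute in a real machine.
    """
    if wire_delay_ns_per_ft <= 0:
        raise ValueError("wire delay must be positive")
    return skew_budget_ns / wire_delay_ns_per_ft

if __name__ == "__main__":
    # 1 ns of skew at 2 ns/foot
    print(max_length_mismatch_ft(1.0, 2.0))  # prints 0.5
```

So under that assumption, clock wires to any two points would have to match to within about half a foot.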
--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Sat Apr 18 10:56:56 2026

From Newsgroup: comp.arch

On 17/04/2026 19:37, BGB wrote:

On 4/17/2026 8:01 AM, David Brown wrote:

On 17/04/2026 14:11, Bill Findlay wrote:

On 17 Apr 2026, David Brown wrote
(in article <10rsik4$287kv$1@dont-email.me>):

On 16/04/2026 20:15, BGB wrote:

Could be wrong, this is from memory and stuff I heard on the internet.

You should be a lot more careful about what you watch on the internet.

Bravo! You put a lot more work into that than I could
stomach in response to Fuck News talking points.
Thank you.

I've "known" BGB for many years on Usenet. He is very intelligent,
but sometimes a bit too quick to trust the wrong sources. I hope I
can encourage him to be more careful about what to trust and what not
to trust. (And that includes not believing things just because /I/
say so either - I can be wrong too.)

Fun fact - I heard about the "ground.news" site I mentioned because
they sponsored some Youtube videos I have watched :-)

Not usually been one for fact checking, as my interest areas are still (mostly) technical (rather than political or legal), so whatever
political leaning I would have mostly shouldn't matter.

Fact checking is also important in technical fields!

But while you might not be particularly interested in politics, politics
and other aspects of societies and countries around the world still
affect you. It is still good to have some rough ideas about what is
going on in the world - and how to tell if something is true or not (or
at least likely to be true or not).

But, yeah, sometimes it gets a little sketchy if it is just someone
rambling on top of gameplay footage, like Subway Surfers, or similar.
Back in the day, we had Audiosurf, but seems that Subway Surfers has overtaken this role.

Some people have also used games like COD or Minecraft, but watching
someone play these games from a first person POV for any length of time causes motion sickness.

At one point, people doing videos with talking over the top of clips of Skibidi Toilet was pretty popular (during the high point of Skibidi
Toilet, but its popularity seems to have weakened as of late).

The video of the guy ranting about CA taking away personal ownership of
3D printers and CNC machines was over the top of video of him working on
one of his craft projects. Usually though, for stuff like this, there
would be repetition from multiple sources (in a case of "one guy saying something, possibly all just nothing, many people saying the same thing, possibly true" sense).

One particular politician is famous for prefacing many of his boldest
and most absurd lies with "many people are saying...". Many people can
say the same thing, and still be wrong. (Just look at religion around
the world. Many people say one thing. Many people say something
totally different. They can't all be right - many people are wrong.)

"Proof by repeated assertion" is not a valid argument, whether it is one person saying something many times, or many people saying the same thing.

Sometimes there is a lot of stuff with people fighting over "Evolution
vs Young Earth Creationism" and similar. Sometimes people arguing for
the Earth being a flat disk (obviously wrong), ...

I'm glad that you at least consider "flat Earth" to be obviously wrong.
The same applies to any kind of "young earth" idea.

All sorts of stuff going on...

Sometimes does wander into international politics territory.

Sort of reminded how for a while there were lots of YouTube sponsored segments for a company claiming to sell Lordship status...

It is possible to argue that believing the nonsense you have heard about
is harmless - though I would say making yourself look foolish can be considered "harm". But scams con people out of real money. Fair enough
if it is a small amount of money, and clearly nonsense, bought as a joke
- like buying insurance against alien kidnapping. Please be careful
about any kinds of scams you come across.

Then other people saying it was a scam, because that was "not how it worked". IIRC idea was that they bought a farm somewhere and were
"selling" it in 1ft^2 parcels or similar, planting a little flag on each one, and then selling the people the title of "Lord Whatever" under the premise of them having their flag planted on a parcel of land in the UK
or such...

Had to look, couldn't initially remember name off-hand: https://en.wikipedia.org/wiki/Established_Titles

Then the whole thing went away.
Apparently a combination of public controversy and also the UK
government apparently also being like "that is not how that works".

...

--- Synchronet 3.21f-Linux NewsLink 1.2

From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Apr 18 11:11:35 2026

From Newsgroup: comp.arch

BGB wrote:

Sort of reminded how for a while there were lots of YouTube sponsored segments for a company claiming to sell Lordship status...

Then other people saying it was a scam, because that was "not how it worked". IIRC idea was that they bought a farm somewhere and were
"selling" it in 1ft^2 parcels or similar, planting a little flag on each one, and then selling the people the title of "Lord Whatever" under the premise of them having their flag planted on a parcel of land in the UK
or such...

Had to look, couldn't initially remember name off-hand: https://en.wikipedia.org/wiki/Established_Titles

Then the whole thing went away.
Apparently a combination of public controversy and also the UK
government apparently also being like "that is not how that works".

Something similar to this actually happened, in Norway:

At one point, after establishing that all men could vote, it was
understood that this of course only meant men with a tie to the land,
i.e. farmers, millers, blacksmiths, industrialists etc.

In reaction, a group working to make voting rights really universal
bought up large tracts of worthless swamp/marshland and split it into
square foot parcels. Armed with an ownership certificate for said plot,
you could not be denied your voting rights.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.21f-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Sat Apr 18 11:37:01 2026

From Newsgroup: comp.arch

On 18/04/2026 11:11, Terje Mathisen wrote:

BGB wrote:

Sort of reminded how for a while there were lots of YouTube sponsored
segments for a company claiming to sell Lordship status...

Then other people saying it was a scam, because that was "not how it
worked". IIRC idea was that they bought a farm somewhere and were
"selling" it in 1ft^2 parcels or similar, planting a little flag on
each one, and then selling the people the title of "Lord Whatever"
under the premise of them having their flag planted on a parcel of
land in the UK or such...

Had to look, couldn't initially remember name off-hand:
https://en.wikipedia.org/wiki/Established_Titles

Then the whole thing went away.
Apparently a combination of public controversy and also the UK
government apparently also being like "that is not how that works".

Something similar to this actually happened, in Norway:

Well, similar in that it was parcelling of land - dissimilar in that it
was not all a scam!

At one point, after establishing that all men could vote, it was
understood that this of course only meant men with a tie to the land,
i.e. farmers, millers, blacksmiths, industrialists etc.

In reaction, a group working to make voting rights really universal
bought up large tracts of worthless swamp/marshland and split it into
square foot parcels. Armed with an ownership certificate for said plot,
you could not be denied your voting rights.

I've also seen this done in the UK as a way to protect land from
developers. A forest (or whatever land is to be protected) is bought
then parcelled up and sold to thousands of people. If someone wants to destroy the forest to build houses, factories, or whatever, they need to
find all these owners and buy from each of them individually.


From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Apr 18 10:08:10 2026

From Newsgroup: comp.arch

David Brown <david.brown@hesbynett.no> schrieb:

I've also seen this done in the UK as a way to protect land from
developers. A forest (or whatever land is to be protected) is bought
then parcelled up and sold to thousands of people. If someone wants to destroy the forest to build houses, factories, or whatever, they need to find all these owners and buy from each of them individually.

You can also buy a square foot of land to get a (worthless, but
amusing) title, for example as "Laird" in Scotland.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From David Brown@david.brown@hesbynett.no to comp.arch on Sat Apr 18 14:18:29 2026

From Newsgroup: comp.arch

On 18/04/2026 12:08, Thomas Koenig wrote:

David Brown <david.brown@hesbynett.no> schrieb:

I've also seen this done in the UK as a way to protect land from
developers. A forest (or whatever land is to be protected) is bought
then parcelled up and sold to thousands of people. If someone wants to
destroy the forest to build houses, factories, or whatever, they need to
find all these owners and buy from each of them individually.

You can also buy a square foot of land to get a (worthless, but
amusing) title, for example as "Laird" in Scotland.

No, you can't.

You can give some money to a bunch of scammers who say that you can buy
a title, but it has no connection with reality.

BGB already gave this example, with more information about how it was a
scam.


From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sat Apr 18 13:01:31 2026

From Newsgroup: comp.arch

On 2026-Apr-17 21:11, MitchAlsup wrote:

Thomas Koenig <tkoenig@netcologne.de> posted:

John Levine <johnl@taugh.com> schrieb:

But let's get back to gate delays, please.

Let's.

How do people actually count gate delays, and how useful is it?
Different gates have different delays (obviously), so counting
an inverter the same as a three-input NOR gate (independent of
fan-out, even) seems to be a large simplification which may be
useful for a fairly rough approximation, but not that much better.

Or am I missing something?

There is the "standard" FO4 counting scheme where 1 gate drives
4 other gate inputs, and in this scheme, a D-type flip-flop was
2.5 gates of delay.

As one can guess, gates can be sized: as small as obeys the FAB
design rules, to <basically> as big as one can afford. Naturally,
as gates get bigger, they can drive bigger loads--BUT they also
present bigger loads to the gates driving them.

Conway figured out that the fastest way to "buffer up" a signal
was to use inverters staged in the ratio of 1:e:e^2:e^3... with
e being the standard 2.7... base of natural logarithms. Rounding
e up to 3 degrades speed by less than 1%, rounding up to 4 only
slows down 10% or so--so, most buffering is done at FO4, where
a minimum sized gate drives an inverter 4× as big which would then
drive another inverter 16× as big...
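
The staging claim above can be checked with a back-of-the-envelope model. The sketch below is not from the thread; it assumes an idealized inverter chain in which each stage's delay is proportional to its fan-out and self-loading (parasitic delay) is ignored. Under that assumption the total delay to drive a fixed load scales as f/ln(f) for stage ratio f, which is minimized at f = e. The function names are made up for illustration.

```python
import math

def chain_delay(load_ratio, stage_ratio):
    """Delay, in arbitrary units of tau, of an idealized inverter chain.

    Assumes each stage costs `stage_ratio` units and that
    log(load_ratio)/log(stage_ratio) stages are needed (not rounded
    to an integer); parasitic self-loading is ignored.
    """
    stages = math.log(load_ratio) / math.log(stage_ratio)
    return stages * stage_ratio

def penalty_vs_e(stage_ratio):
    """Fractional slowdown relative to the ideal stage ratio e."""
    return stage_ratio / math.log(stage_ratio) / math.e - 1.0

for f in (math.e, 3.0, 4.0):
    print(f"ratio {f:.3f}: {100 * penalty_vs_e(f):.2f}% slower than e")
```

In this simplified model a ratio of 3 costs well under 1% and a ratio of 4 costs several percent, roughly in line with the figures quoted above. Models that include the inverter's own output capacitance are usually said to move the optimum up toward about 4, which is one reason FO4 is a convenient unit.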

Each transistor between a power connection and the signal connection
basically adds its own transconductance to the electrical path
(ignoring body effect). Knowing that deMorgan's laws apply, we
instantly see that a Nand gate is simply a Nor gate with inverted
inputs and output. A Nand gate has its serial string of FETs between signal
and ground and a parallel path from signal to Vdd, while a Nor has
its serial FETs between signal and Vdd and its parallel path between
signal and ground. To deal with these serial paths, the transistors
are lengthened; a 2Nand has N-channels 2× as wide and can use 1×
P channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
{Nors are similar but reverse Ns and Ps} Somewhere along the line,
the parallel path FETs have to get lengthened because of all the
serial path diffusion capacitance (to maintain rather equal pull up
and pull down).

Soon, one realizes that one needs SPICE simulation with accurate
models to push the edge--just like when pushing the Young's Modulus
in engineering models.

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of a restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'.

The part I don't see is the rules for combinatorial gates.
There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
where multiple gates are combined in one but at a lower gate delay.

For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
because it really is an INV, an AND and an OR,
but in CMOS those seem to be just 1 or 1.5 gate delays.

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

I find this makes it more difficult to just look at a CMOS logic circuit
and know whether it will fit within a 20 gate delay stage budget.


From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 18 17:59:39 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-Apr-17 21:11, MitchAlsup wrote:

Thomas Koenig <tkoenig@netcologne.de> posted:

John Levine <johnl@taugh.com> schrieb:

But let's get back to gate delays, please.

Let's.

How do people actually count gate delays, and how useful is it?
Different gates have different delays (obviously), so counting
an inverter the same as a three-input NOR gate (independent of
fan-out, even) seems to be a large simplification which may be
useful for a fairly rough approximation, but not that much better.

Or am I missing something?

There is the "standard" FO4 counting scheme where 1 gate drives
4 other gate inputs, and in this scheme, a D-type flip-flop was
2.5 gates of delay.

As one can guess, gates can be sized: as small as obeys the FAB
design rules, to <basically> as big as one can afford. Naturally,
as gates get bigger, they can drive bigger loads--BUT they also
present bigger loads to the gates driving them.

Conway figured out that the fastest way to "buffer up" a signal
was to use inverters staged in the ratio of 1:e:e^2:e^3... with
e being the standard 2.7... base of natural logarithms. Rounding
e up to 3 degrades speed by less than 1%, rounding up to 4 only
slows down 10% or so--so, most buffering is done at FO4, where
a minimum sized gate drives an inverter 4× as big which would then
drive another inverter 16× as big...

Each transistor between a power connection and the signal connection
basically adds its own transconductance to the electrical path
(ignoring body effect). Knowing that deMorgan's laws apply, we
instantly see that a Nand gate is simply a Nor gate with inverted
inputs and output. A Nand gate has its serial string of FETs between signal
and ground and a parallel path from signal to Vdd, while a Nor has
its serial FETs between signal and Vdd and its parallel path between
signal and ground. To deal with these serial paths, the transistors
are lengthened; a 2Nand has N-channels 2× as wide and can use 1×
P channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
{Nors are similar but reverse Ns and Ps} Somewhere along the line,
the parallel path FETs have to get lengthened because of all the
serial path diffusion capacitance (to maintain rather equal pull up
and pull down).

Soon, one realizes that one needs SPICE simulation with accurate
models to push the edge--just like when pushing the Young's Modulus
in engineering models.

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of a restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'.

The part I don't see is the rules for combinatorial gates.
There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
where multiple gates are combined in one but at a lower gate delay.

For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
because it really is an INV, an AND and an OR,
but in CMOS those seem to be just 1 or 1.5 gate delays.

A 4:1 Mux is a 2222AOI gate--one Ssllooww gate--but still 1 gate.

One slow gate is generally faster than 2 gates (and less power)
because only 1 signal has to move (Vdd->gnd or gnd->Vdd) instead
of more than one. Each signal moving is limited by the
transconductance of the FET stack and the capacitance being driven.
We call the rise/fall time the edge speed.

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

Almost always by deMorganizing the logic.

I find this makes it more difficult to just look at a CMOS logic circuit
and know whether it will fit within a 20 gate delay stage budget.

If the gate delay count is less than 20, there is "some" sizing of
those gates which will result in minimum delay.

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Apr 18 19:23:47 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of a restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'.

The part I don't see is the rules for combinatorial gates.
There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
where multiple gates are combined in one but at a lower gate delay.

For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
because it really is an INV, an AND and an OR,
but in CMOS those seem to be just 1 or 1.5 gate delays.

There is the method of logical effort, see https://en.wikipedia.org/wiki/Logical_effort . I have not made
much effort to do calculations using that method.
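
As a rough illustration of the logical-effort method mentioned above (not a calculation from the thread), the sketch below uses the textbook per-stage model d = g*h + p, in units of a process-dependent time constant tau, where g is the gate's logical effort, h its electrical fan-out and p its parasitic delay. The g and p values are the usual textbook figures for static CMOS with a 2:1 P/N width ratio and are assumptions here; real libraries differ.

```python
# (logical effort g, parasitic delay p), textbook static-CMOS values
# assuming a 2:1 PMOS/NMOS width ratio; treat as illustrative only.
GATES = {
    "inv":   (1.0,     1.0),
    "nand2": (4.0 / 3, 2.0),
    "nor2":  (5.0 / 3, 2.0),
    "nand3": (5.0 / 3, 3.0),
    "nor3":  (7.0 / 3, 3.0),
}

def stage_delay(gate, fanout):
    """Delay of one stage in units of tau: d = g*h + p."""
    g, p = GATES[gate]
    return g * fanout + p

FO4 = stage_delay("inv", 4)   # fan-out-of-4 inverter delay, 5 tau

def in_fo4(gate, fanout=4):
    """Same delay expressed in FO4 inverter delays."""
    return stage_delay(gate, fanout) / FO4

for name in GATES:
    print(f"{name:6s} driving fan-out 4: {in_fo4(name):.2f} FO4")
```

Under these assumptions a 3-input NOR at fan-out 4 comes out at roughly two and a half inverter FO4 delays, which gives a feel for how coarse "one gate = one delay" counting is.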

An alternative would be to use an actual library as an example.
A company called Nangate released an open-sourced library (google
for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
for which delay calculations can be done as an example, for instance
using Berkeley ABC. That program can also do optimizations
(although it cannot handle gates with more than one output, such as
full adders, and has weaknesses in stability). I haven't tried to
model wire delays with this.

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

AOI and friends also work in TTL, I believe.

I find this makes it more difficult to just look at a CMOS logic circuit
and know whether it will fit within a 20 gate delay stage budget.

An interesting question :-)
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From BGB@cr88192@gmail.com to comp.arch on Sat Apr 18 15:05:10 2026

From Newsgroup: comp.arch

On 4/18/2026 3:56 AM, David Brown wrote:

On 17/04/2026 19:37, BGB wrote:

On 4/17/2026 8:01 AM, David Brown wrote:

On 17/04/2026 14:11, Bill Findlay wrote:

On 17 Apr 2026, David Brown wrote
(in article <10rsik4$287kv$1@dont-email.me>):

On 16/04/2026 20:15, BGB wrote:

Could be wrong, this is from memory and stuff I heard on the
internet.

You should be a lot more careful about what you watch on the internet.

Bravo! You put a lot more work into that than I could
stomach in response to Fuck News talking points.
Thank you.

I've "known" BGB for many years on Usenet. He is very intelligent,
but sometimes a bit too quick to trust the wrong sources. I hope I
can encourage him to be more careful about what to trust and what not
to trust. (And that includes not believing things just because /I/
say so either - I can be wrong too.)

Fun fact - I heard about the "ground.news" site I mentioned because
they sponsored some Youtube videos I have watched :-)

Not usually been one for fact checking, as my interest areas are still
(mostly) technical (rather than political or legal), so whatever
political leaning I would have mostly shouldn't matter.

Fact checking is also important in technical fields!

It is mostly information gathering and testing though.

If one has a crazy idea, one can test it, and if it is a bad idea, it generally doesn't work.

But while you might not be particularly interested in politics, politics
and other aspects of societies and countries around the world still
affect you. It is still good to have some rough ideas about what is
going on in the world - and how to tell if something is true or not (or
at least likely to be true or not).

Can usually do searches, but if searches mostly seem to agree with an
idea, or there is little to say that it is wrong, well...

Could have maybe done searches before the posts that started all this,
but alas.

Was just like "AFAIK", since it was low confidence, "YOLO, I guess...".

Sometimes, one can be lazy and ask AI models like Gemini or Grok or similar.

Gemini's response was mostly in saying that the issue was mostly about
people carrying things with sharp points rather than owning them, so say (gist):
Pointed knife in kitchen: OK;
Pointed knife in hand when walking around: Bad;
Blunt tipped knife in hand while walking: OK;
...

Apparently, maybe OK if the knife/scissors/etc having a cable/chain to
the table to prevent someone from walking off with it (sorta like the
pens chained to the tables in some public settings).

Would have also tracked with the previous implication of people still
needing blunt tipped scissors (say, if they needed something they could
freely move from one location to another, ...).

With I guess, exceptions for formal/ceremonial stuff, where people could
still carry things with pointed tips (like swords or ceremonial daggers
or such).

I think, folding the question of solder into the mix, there was also mention
that one may need blunt tipped soldering irons to not risk running
against the law.

Well, and that there were mostly restrictions on lead/tin solder rather
than against solder in general.

...

Was, like, "OK..."

Had noted that one UK based guy on YouTube ("Explaining Computers") had frequently used pointed scissors, was not sure if this was some sort of low-level crime, or if the scissors were "grandfathered in" or something
(say, if one had them from whenever the restriction goes into effect, a
person could still keep using it; sorta like the cars that had run on
Leaded Gas...).

Another YouTuber based in Australia ("Dave Jones") liked to open
packages with an oversized knife (with pointed tip), and had a few times
waved the knife around and made comments along the lines of "this is for
the UK viewers", seemingly implying its status as a banned artifact.

Though there were pictures of there being spikes on park benches and
similar, but this wasn't entirely inconsistent if the restriction is
more one of the points being mobile (since on a park bench, it doesn't
move). People don't like there being spikes on park benches, which also follows, ...

But, yeah, if this is not how it works, fair enough.
Off hand, it was like, what information I had seen had pointed in a
different direction.

Didn't think to ask how far it extended, like whether or not such
restrictions applied to the shape of the tines on forks, etc. Though,
IME, most forks tend to have square-tipped tines (and one with
sharp-tipped tines would be a needless safety hazard), so maybe this was
N/A.

...

Can note that where I am living, there are no such restrictions of this
sort.

There are different sorts of restrictions though, like someone not being allowed to carry guns or similar into places like schools and
courthouses (and concealed carry requires a permit, ...).

Likewise, until fairly recently, weed was illegal...

Then it changed, and now the place is almost overrun with stores selling it.

But, yeah, sometimes it gets a little sketchy if it is just someone
rambling on top of gameplay footage, like Subway Surfers, or similar.
Back in the day, we had Audiosurf, but seems that Subway Surfers has
overtaken this role.

Some people have also used games like COD or Minecraft, but watching
someone play these games from a first person POV for any length of
time causes motion sickness.

At one point, people doing videos with talking over the top of clips
of Skibidi Toilet was pretty popular (during the high point of Skibidi
Toilet, but its popularity seems to have weakened as of late).

The video of the guy ranting about CA taking away personal ownership
of 3D printers and CNC machines was over the top of video of him
working on one of his craft projects. Usually though, for stuff like
this, there would be repetition from multiple sources (in a case of
"one guy saying something, possibly all just nothing, many people
saying the same thing, possibly true" sense).

One particular politician is famous for prefacing many of his boldest
and most absurd lies with "many people are saying...". Many people can
say the same thing, and still be wrong. (Just look at religion around
the world. Many people say one thing. Many people say something
totally different. They can't all be right - many people are wrong.)

"Proof by repeated assertion" is not a valid argument, whether it is one person saying something many times, or many people saying the same thing.

Possibly...

But, OTOH, if one person is saying something and no one else is saying
it, there is a higher probability that that person had pulled something
out of thin air.

And, if something has a wide reaching impact (like, say, authorities
taking everyone's 3D printers away), one would expect this to make a bit
more noise on the internet.

Sometimes there is a lot of stuff with people fighting over "Evolution
vs Young Earth Creationism" and similar. Sometimes people arguing for
the Earth being a flat disk (obviously wrong), ...

I'm glad that you at least consider "flat Earth" to be obviously wrong.
The same applies to any kind of "young earth" idea.

some things don't hold up:
Flat Earth:
Pretty much all of the evidence is against it;
Young Earth Creationism:
Hard to reconcile with geology and physics;
Depends on a particular interpretation of Genesis,
but, easier to assert that this is not the correct interpretation.
A lot of the "alien Conspiracy" stuff:
Would require implausible levels of coordination to hide,
if their claims were true;
Has similar logical problems as "Flat Earth" and similar.
A lot of the stories about various "cryptids"/etc:
Alas, if bigfoot/yeti/etc were around,
there would likely be more confirmed physical evidence.

Like, people being like "hey, these comets are actually alien spacecraft
and transmitting coded radio signals" (yeah, no). Like, the last two
high profile comets that went by both getting people claiming they were
alien spacecraft, etc.

Ambiguous areas:
Ghosts and similar;
Unlike cryptids, a ghost would not leave physical evidence;
There are explanations for why ghosts are usually no-show.
Parapsychology stuff.
One can assume it is rare, if it exists, rather than widespread;
There are reasons to believe it fails whenever tested empirically;
...

So, say, unlike on movies/TV, the "true" ESP'ers would be very rare, and
with a skill-set that can appear or disappear whether or not its
existence would affect the causal outcomes of measurements (so whenever tested, it would behave as-if it doesn't exist so that causality holds).

Though, a lot of this falls into the areas of "untestable".

In my case, I have some anomalous sensory experiences, but most are
likely explainable in terms of neurology and psychology rather than
external reality. It can become very difficult to pin down internal
subjective experiences to known external effects.

But, otherwise it is similar to a question similar to, say, whether TV
shows like "Star Trek" can be reconciled with known physics:
Current answer leans towards no.

Well, and if by some chance a "haunted location" starts dipping into
"Scooby Doo" territory, and people go in to look at it objectively,
chances are they will find a property owner or similar trying to pull a
fast one. Well, and if you had settings like in the set-ups for various
horror movies ("No one goes there, and if they do, they don't come
back!"). Well, paranormal investigator types would be on that stuff, and
if these people kept disappearing, this itself would likely draw attention.

...

On the other hand:
Politicians have done something stupid, and now everyone in a given area
needs to do something stupid as a result to not be seen as a criminal.

Is much more believable.
The existence of stupid laws is much more well known and observable.

All sorts of stuff going on...

Sometimes does wander into international politics territory.

Sort of reminded how for a while there were lots of YouTube sponsored
segments for a company claiming to sell Lordship status...

It is possible to argue that believing the nonsense you have heard about
is harmless - though I would say making yourself look foolish can be
considered "harm". But scams con people out of real money. Fair enough
if it is a small amount of money, and clearly nonsense, bought as a joke
- like buying insurance against alien kidnapping. Please be careful
about any kinds of scams you come across.

Would not likely go for something like this anyways, as whether or not
it were true, some might call it "douchemaxing".

Or, say, "You finna get tha drip going with that skibidi douchemax
rizz?" or such ("Well, I think I just might.", proceeds to get an overly
large purple and gold faux fur robe or similar).

...

Otherwise, XG3 has started beating out both XG1 and RV64GC+Jx in some
cases in terms of code density, and I am left to realize:
I can't explain how.

The deltas in instruction count seem to be lower than what could fully
account for the absence of 16-bit instructions.

So, say:
XG3 vs RV64GC+Jx:
XG3: ~ 11% fewer instructions
RV64GC+Jx: ~ 40% of the instructions become 16 bit (saving ~ 20%).

Assuming a simplified reference case of 75K instructions:
Reference Case: 300K at 4B/Instr
16/32 case : 240K
11% fewer case: 267K

Projected winner: 16/32 (or, in this case, RV64GC).
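
The back-of-the-envelope figures above can be reproduced with a few lines. This is only a sketch of the simplified two-size model described in the post (it ignores the 64- and 96-bit encodings), and the function and constant names are invented for illustration.

```python
REF_INSTRS = 75_000   # simplified reference case from the post

def bytes_fixed32(n_instrs):
    """All instructions are 4 bytes."""
    return n_instrs * 4

def bytes_mixed_16_32(n_instrs, pct_16bit):
    """pct_16bit percent of instructions shrink to 2 bytes, rest stay 4."""
    return (n_instrs * pct_16bit * 2 + n_instrs * (100 - pct_16bit) * 4) // 100

def bytes_fewer_instrs(n_instrs, pct_fewer):
    """Same 4-byte encoding, but pct_fewer percent fewer instructions."""
    return n_instrs * (100 - pct_fewer) * 4 // 100

reference = bytes_fixed32(REF_INSTRS)        # 300000
mixed = bytes_mixed_16_32(REF_INSTRS, 40)    # 240000
fewer = bytes_fewer_instrs(REF_INSTRS, 11)   # 267000
print(reference, mixed, fewer)
```

In this model the fixed-32 encoding would need about 20% fewer instructions, rather than 11%, to tie with the 16/32 mix, which is why the observed result looks surprising.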

Actual model more complicated in that this case is 16/32/64 vs 32/64/96.

Though, at present, this quirk has mostly appeared with Heretic and
ROTT. But, the actual stats don't appear much different, and it seems
like "back of the envelope" RV64GC would still be expected to hold the lead.

Still requires more analysis it seems.

...


From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 18 21:02:58 2026

From Newsgroup: comp.arch

BGB <cr88192@gmail.com> posted:

On 4/18/2026 3:56 AM, David Brown wrote:

-----------------------

"Proof by repeated assertion" is not a valid argument, whether it is one person saying something many times, or many people saying the same thing.

Possibly...

But, OTOH, if one person is saying something and no one else is saying
it, there is a higher probability that that person had pulled something
out of thin air.

This coming from the one person here who does not subscribe to the
minutia of IEEE 754 while accepting the formats and arithmetic
definitions.
--------------

some things don't hold up:
Flat Earth:
Pretty much all of the evidence is against it;
Young Earth Creationism:
Hard to reconcile with geology and physics;
Depends on a particular interpretation of Genesis,
but, easier to assert that this is not the correct interpretation.
A lot of the "alien Conspiracy" stuff:
Would require implausible levels of coordination to hide,
if their claims were true;
Has similar logical problems as "Flat Earth" and similar.
A lot of the stories about various "cryptids"/etc:
Alas, if bigfoot/yeti/etc were around,
there would likely be more confirmed physical evidence.

deNorms are not needed in well written FP arithmetic
Flush to Zero is perfectly acceptable tool
...
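
For readers unfamiliar with what is at stake in the denormal / flush-to-zero point above, here is a small sketch. Python does not expose a flush-to-zero mode, so `flush_to_zero` below is a hypothetical helper that emulates it in software, and the code assumes Python floats are IEEE 754 binary64 (true on mainstream platforms).

```python
import sys

MIN_NORMAL = sys.float_info.min   # smallest positive normal double

def flush_to_zero(x):
    """Emulate FTZ: replace any subnormal result with a signed zero."""
    if x != 0.0 and abs(x) < MIN_NORMAL:
        return 0.0 if x > 0 else -0.0
    return x

a = 1.5 * MIN_NORMAL
b = 1.0 * MIN_NORMAL

gradual = a - b                  # subnormal, but non-zero
flushed = flush_to_zero(a - b)   # what an FTZ machine would deliver

print(a != b, gradual != 0.0, flushed == 0.0)
```

With gradual underflow, a != b guarantees a - b != 0; with flush-to-zero that guarantee is lost for values near the bottom of the range. Whether that matters in well-written code is exactly what the posters disagree about.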


From Michael S@already5chosen@yahoo.com to comp.arch on Sun Apr 19 01:08:23 2026

From Newsgroup: comp.arch

On Sat, 18 Apr 2026 21:02:58 GMT
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

BGB <cr88192@gmail.com> posted:

On 4/18/2026 3:56 AM, David Brown wrote:

-----------------------

"Proof by repeated assertion" is not a valid argument, whether it
is one person saying something many times, or many people saying
the same thing.

Possibly...

But, OTOH, if one person is saying something and no one else is
saying it, there is a higher probability that that person had
pulled something out of thin air.

This coming from the one person here who does not subscribe to the
minutia of IEEE 754 while accepting the formats and arithmetic
definitions.

I don't know what exactly you mean by "subscribe" and "minutia", but
if you mean what I am guessing you mean then my own position is pretty
close to that.
More precisely, I think IEEE 754 formats and definitions of basic ops
are mostly great, with exception of omission of very useful rsqrt
primitive.
I think that 90% of IEEE 754 exceptions are useless crap and in one or
two places it's worse than that.
I think that effort-to-reward ratio of non-default rounding modes is
pretty low and the choice of mandatory non-default modes is sub-optimal.
I think that common practice of FP control and FP status shared
across different supported precisions is wrong. Although, in this case
it's more a fault of language bindings of 754 rather than of 754 itself.
But 754 leaves the issue underspecified which is also no good.

So, if BGB shares my view then there are already two of us.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 00:49:51 2026

From Newsgroup: comp.arch

In article <lVeER.284152$4wI6.209127@fx24.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <NdaER.1511$r_k6.609@fx38.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

<snip>

[*] recently transferred to the CH-53E fleet due to the imminent
retirement of the F-16C fleet.

The Marine Corps doesn't fly the F-16. :-) Perhaps you mean
the F/A-18 or the Harrier?

Brain fart. His squadron flies the F/A-18C and D models. The
E's and F's will remain in the active fleet along with the F-35,
but the final days of the C and D models are in sight.

No problem; I can see wanting to switch over to helos from fixed
wing. It's a different world.

His eventual goal is to get his A&P. Figures helos will be good
experience.

Nice. I'm sure he's absolutely correct that it's good
experience for getting into the airlines once he's out of the
Corps.

I got to visit the flight line in 2024, very interesting.

Cool.

Prior visit to a Marine base was at 29 Palms in the 1980s, visiting
a cousin. He was living on-base in married housing and told
me to avoid the well-lit compound several miles east of the housing area, which
was secured and managed by the NOP.

Ah, the stumps. I remember the first time I got there, stepping
off a bus (we'd just flown from NC, having completed post Parris
Island training at Camp Lejeune) and immediately seeing tumble
weed blowing down the main drag. "Oh my god; I have to stay
here for a YEAR?!" (This was before I became an officer.)

I wonder which part your cousin meant; perhaps Camp Wilson,
which is an active training area (and pretty much nothing else,
though there is a very small PX there selling pogey bait).

I just looked on google maps and they've pretty much censored
the entire base in both the map and satellite views.

Probably out of embarrassment...one of the most prominent
features of 29 Palms is "Lake Bandini": the waste-water treatment
facility at the bottom of the hill that the base is on. (The
old joke goes, "Don't eat the fish out of Lake Bandini"). :-D

- Dan C.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 01:02:36 2026

From Newsgroup: comp.arch

In article <10rsktu$287kv$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 16/04/2026 18:31, Dan Cross wrote:

In article <10rqrkf$1nbrp$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 16/04/2026 13:59, Dan Cross wrote:

In article <10rqag7$1in0h$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
(The service members have all their other equipment at
home too - they store their own uniforms and other stuff, and are
responsible for washing and repairs or ordering replacements after the
exercises.)

The same is true in the US military. I always found it annoying
that I had to make space for my issued equipment at home, but
c'est la vie.

Personally, I have no military experience at all (not counting school
cadets). But one of my sons joined the National Guard after his
military service. He has a neat solution to the problem of space for his
equipment - he keeps it in his old bedroom in our house, not in his own
flat :-)

Seems like your son should take his obligations a bit more
seriously: keeping his equipment at your house, when it's
supposed to be in his dwelling, sounds like a rules violation.

I don't know the details of the regulations, but I can assure you that
it is entirely within them. As a student, his flat is considered
temporary accommodation and our house is his permanent home. His
National Guard base is in our area, not where he is a student.

Maybe the regulations in the USA are different. Maybe there are
different standards about how quickly you can be called up and need to deploy.

Misrepresenting your home address, such as using your parents'
home when you don't actually live there, temporary accommodation
or not, is not something the US military looks upon favorably.

What's good for the goose is good for the gander.

It also means that there are less guns stored in
concentrated places as potential targets for robberies. And in the
event of a real invasion, it's a lot easier to smuggle around and
distribute firing pins to service members than to pass around guns from a central armoury.

There are tradeoffs, however: it is also easier for a bad actor
to source a weapon by breaking into a private home.

It is plausible, but AFAIK it is extremely rare here. The solid
majority of criminals here don't want guns, and would probably not take
one if they found one in a house they were robbing. As a house burglar, you can't easily sell a gun, you don't need one for defence, you don't
need one to threaten anyone - it just increases your chance of being
shot yourself, and increases your punishment if you get caught. There are
guns in the more serious narcotics gangs, but those are handguns - they
have no use for military weapons.

FWIW, handguns are military weapons.

Sorry - I meant pistols, rather than rifle-sized weapons. (Of course pistols are also used in the military.)

The "9mm handgun" I referred to is a pistol. It's the standard
issue sidearm for officers in the US Marine Corps.

[snip]
Of course, there is a continuum of force, and despite recent
idiots in charge of the US military asserting otherwise, the
rules of engagement and the laws of warfare are taken _very_
seriously.

It's nice to know - especially with the current muppets at the top of
the USA chain.

Yes, they are horrible.

[snip]
If shots need to be fired, the primary aim should
be to persuade the bad guys to surrender, not to kill them.

Is it better if the enemy surrenders? Sure. But this idea that
you are going to shoot to wound in a combat scenario is not
realistic. As you said, it's not like in the movies.

I think that there has been a bit of a disparity in the situations we
have been imagining. I agree entirely that shooting to wound in a
combat situation is not realistic. It just seemed to me that you were suggesting armoury guards were moving to combat mode a lot more quickly
than I thought appropriate.

[snip]

Just to be clear here - mutual inspections are fine and often a good
thing. It is the idea of standing in line while a commander of some
sort pats you down that is not. Maybe I just misunderstood what you
were saying.

Some of your comments, such as repeatedly asserting the
ridiculous notion that people are "bribing" armorers so they can
avoid "doing their jobs" lead me to wonder whether you are
deliberately misrepresenting what I am saying so that you can
feel yourself morally superior to an American.

I have, I think, been patient with my responses, but in this
and my previous message, my patience is slipping.

- Dan C.


From Robert Finch@robfi680@gmail.com to comp.arch on Sat Apr 18 21:04:02 2026

From Newsgroup: comp.arch

On 2026-04-18 6:08 p.m., Michael S wrote:

On Sat, 18 Apr 2026 21:02:58 GMT
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

BGB <cr88192@gmail.com> posted:

On 4/18/2026 3:56 AM, David Brown wrote:

-----------------------

"Proof by repeated assertion" is not a valid argument, whether it
is one person saying something many times, or many people saying
the same thing.

Possibly...

But, OTOH, if one person is saying something and no one else is
saying it, there is a higher probability that that person had
pulled something out of thin air.

This coming from the one person here who does not subscribe to the
minutia of IEEE 754 while accepting the formats and arithmetic
definitions.

I don't know what exactly you mean by "subscribe" and "minutia", but
if you mean what I am guessing you mean then my own position is pretty
close to that.
More precisely, I think IEEE 754 formats and definitions of basic ops
are mostly great, with exception of omission of very useful rsqrt
primitive.
I think that 90% of IEEE 754 exceptions are useless crap and in one or
two places it's worse than that.
I think that effort-to-reward ratio of non-default rounding modes is
pretty low and the choice of mandatory non-default modes is sub-optimal.
I think that common practice of FP control and FP status shared
across different supported precisions is wrong. Although, in this case
it's more a fault of language bindings of 754 rather than of 754 itself.
But 754 leaves the issue underspecified which is also no good.

So, if BGB shares my view then there are already two of us.

Lack of knowledge of the minutia may be preventing a decent
implementation of the standard for my CPU. For example, the CPU has a
number of floating-point ops that do not update the status register: sign-inject, compares, convert from lower to higher precision and
others. I am going by common sense: if an instruction cannot raise an exception,
then the flags are not affected. But common sense does not always prevail.

The IEEE 754 standard seems somewhat inaccessible to me. How do I join
IEEE if I am not a student or a professional engineer? Otherwise, how do
I get access to IEEE docs?

Thus the EIII 457 standard was born...


From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.arch on Sat Apr 18 20:51:49 2026

From Newsgroup: comp.arch

John Levine <johnl@taugh.com> writes:

According to quadi <quadibloc@ca.invalid>:

On Wed, 15 Apr 2026 16:44:04 +0000, John Levine wrote:

Then in the 1960s some well organized revisionists ignored what
it says, pretended it meant an individual right to have guns
everywhere, and managed to find a majority of right wing supreme
court justices willing to sign on.

I'm afraid that I can't agree with you on this. ...

Of course, it's possible subordinate clauses were used differently
back in the eighteenth century, but I'd need evidence to buy into
that theory.

The evidence is that for over 150 years, everyone agreed that it meant
state militias. There were two Supreme Court decisions in 1876 and
1886 that confirmed the rights of states to regulate militias, one in
1939 saying that a sawed off shotgun wasn't the kind of arm that the
2nd was intended to protect, and one in 1980 confirming that it was OK
for states to forbid convicted felons from owning guns.

I'm not aware of anyone claiming it was an individual right that the
states could not regulate until the 1960 revisionists, and no court
decision until Heller in 2008 which reversed the previous century and
a half's precedent. Heller was decided 5-4, over strong dissents.

Has anyone seen comp.arch around here somewhere? I seem to have
wandered into rec.guns.

From BGB@cr88192@gmail.com to comp.arch on Sun Apr 19 03:54:40 2026

From Newsgroup: comp.arch

On 4/18/2026 5:08 PM, Michael S wrote:

On Sat, 18 Apr 2026 21:02:58 GMT
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

BGB <cr88192@gmail.com> posted:

On 4/18/2026 3:56 AM, David Brown wrote:

-----------------------

"Proof by repeated assertion" is not a valid argument, whether it
is one person saying something many times, or many people saying
the same thing.

Possibly...

But, OTOH, if one person is saying something and no one else is
saying it, there is a higher probability that that person had
pulled something out of thin air.

This coming from the one person here who does not subscribe to the
minutia of IEEE 754 while accepting the formats and arithmetic
definitions.

I don't know what exactly you mean by "subscribe" and "minutia", but
if you mean what I am guessing you mean then my own position is pretty
close to that.

Yeah, and also disagreeing with some details of IEEE 754 semantics is
not in the same category as belief in things like "Flat Earth" or "Alien Abductions" or similar...

I suspect I am far from the only person that has asserted that DAZ/FTZ
is often preferable for optimizing an FPU for implementation cost (or
that an FPU optimized primarily for speed and logic cost can still be
useful).

Though, my stance on these things has softened when I have noted that
the full version can be supported in software without "too horrible"
impact to performance on a naive implementation. This mainly requires
the FPU to consistently have a way to detect and trap in cases where
full semantics are requested but could not be delivered natively.

Granted, it is tradeoffs, and there are corner cases where things could
get ugly if not handled well.

More precisely, I think IEEE 754 formats and definitions of basic ops
are mostly great, with exception of omission of very useful rsqrt
primitive.

Yeah, rsqrt can be useful.

Say:
1.0/sqrt(x)
Being one of the more common use cases of sqrt, and combining them into
a single operation could give a significant speedup over two operations
on an implementation where neither operation, in itself, is all that fast.

Personally, I would likely add Ssqrt() and Ssqr(), for signed square
root and signed square.

But, most of these operations can be defined in terms of the others, and
some might consider regarding ssqrt and ssqr as the more fundamental
operators to be unorthodox (and doesn't jibe well if one considers the
real number line as being embedded within the complex plane).

But, for an FPU, the whole thing exists within an approximation of the
real number line (and the complex plane effectively doesn't exist).

Granted, never mind if some algebraic rules (such as the distributive
property) would start to break down in this system when these operators
are involved. For sanity's sake, would assume ssqr(x) to be distinct from
x*x, but ssqr(ssqrt(x)) => x and with a domain of all reals, ...

Likewise:
1.0/ssqrt(x)
Remains well-defined for everything other than x ~= 0.
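
A minimal sketch of what such operators could look like, assuming the natural definitions (the post does not spell them out): ssqrt(x) takes the square root of the magnitude and carries the sign of x, and ssqr(x) squares the magnitude and carries the sign. The function names mirror the post; the definitions themselves are an assumption.

```python
import math

def ssqrt(x: float) -> float:
    """Signed square root (assumed definition): sqrt(|x|) with the sign of x."""
    return math.copysign(math.sqrt(abs(x)), x)

def ssqr(x: float) -> float:
    """Signed square (assumed definition): x*x with the sign of x."""
    return math.copysign(x * x, x)

for v in (-9.0, -2.25, 0.0, 4.0):
    # ssqr(ssqrt(v)) recovers v here because these inputs have exact roots.
    print(v, ssqrt(v), ssqr(ssqrt(v)))

# 1.0/ssqrt(x) is defined for negative x as well, unlike 1.0/sqrt(x).
print(1.0 / ssqrt(-4.0))
```

For inputs without an exactly representable root, the round trip ssqr(ssqrt(x)) would only recover x to within rounding error.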

In my case, there are a few quick dirty ops as well:
FSQRTA: Square Root, Approximate
FRCPA: Reciprocal Approximate

I think that 90% of IEEE 754 exceptions are useless crap and in one or
two places it's worse than that.
I think that effort-to-reward ratio of non-default rounding modes is
pretty low and the choice of mandatory non-default modes is sub-optimal.
I think that common practice of FP control and FP status shared
across different supported precisions is wrong. Although, in this case
it's more a fault of language bindings of 754 rather than of 754 itself.
But 754 leaves the issue underspecified which is also no good.

Shared FPU status is also a complaint of mine.

Though, sadly, if one wants to be able to pull off full IEEE math, it is
to some extent unavoidable.

I would personally prefer it if there were some way to specify rounding
mode at the language-level when needed (likely along with a way to
specify whether it is preferable to have subnormals and similar, or
whether DAZ/FTZ is acceptable).

I generally assume RNE as a sensible default mode for fixed-mode
instructions, with RTZ as a good second place. The other rounding modes
are more niche and would generally be best used in special cases.

Using a dynamic rounding mode held in an FPU status register effectively makes every use-case worse here,
because nearly everywhere else, RNE is the best option, except for the one-off operations where one wants something else.

Although RTZ is the cheapest to implement case, IMO it would not be
acceptable as a default fixed rounding mode because it tends to result
in a drift-towards-zero which can become very obvious (and if one tries
to compensate for the drift-towards-zero, then it just as easily becomes
a drift away from 0). Better to have RNE as default because it doesn't
result in values drifting over time.
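
The drift can be illustrated with a small experiment. This is only a sketch under stated assumptions: it emulates Binary32 accumulation in software, with truncation standing in for a fixed-RTZ adder and the `struct` module's float conversion standing in for RNE. The helper names and the choice of random inputs in [0.5, 1.0) are hypothetical; the inputs are chosen so that each double-precision add is exact before rounding.

```python
import random
import struct
from fractions import Fraction

def to_f32_rne(x: float) -> float:
    """Round a double to Binary32 using round-to-nearest-even."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_f32_rtz(x: float) -> float:
    """Round a double to Binary32 by truncation (clear the low 29 mantissa bits)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits &= ~((1 << 29) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def accumulate(values, rounder):
    """Sum the values, rounding the running total to Binary32 after each add."""
    total = 0.0
    for v in values:
        total = rounder(total + v)
    return total

rng = random.Random(1)
values = [to_f32_rne(rng.uniform(0.5, 1.0)) for _ in range(10000)]
exact = sum(Fraction(v) for v in values)  # exact rational reference sum

err_rne = float(Fraction(accumulate(values, to_f32_rne)) - exact)
err_rtz = float(Fraction(accumulate(values, to_f32_rtz)) - exact)
print("RNE error:", err_rne)
print("RTZ error:", err_rtz)
```

With truncation every add can only lose magnitude, so the errors pile up in one direction, while the RNE errors mostly cancel.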

Well, except when converting to an integer, where generally everything
expects RTZ and anything other than RTZ is likely to be a crap storm,
even if for many use-cases, rounding towards negative infinity would
likely be better for this case since it keeps the number line on an evenly-spaced grid with respect to 0.
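
The grid argument can be seen by counting how many evenly spaced inputs land on each integer. A small sketch, using Python's `math.trunc` for RTZ and `math.floor` for rounding towards negative infinity (the sample range and step are arbitrary choices):

```python
import math
from collections import Counter

# Evenly spaced samples from -3.0 up to 2.875 in steps of 1/8.
samples = [i / 8 for i in range(-24, 24)]

rtz_buckets = Counter(math.trunc(s) for s in samples)
floor_buckets = Counter(math.floor(s) for s in samples)

print("RTZ  :", dict(sorted(rtz_buckets.items())))
print("floor:", dict(sorted(floor_buckets.items())))
```

Under RTZ the bucket for 0 collects everything in (-1, 1) and is nearly twice as wide as its neighbours; under floor every bucket has the same width.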

The FPU status register thing turns into a bigger mess if one is trying
to apply it to things like SIMD and similar. For now, I am ignoring this
(SIMD is being treated as inherently non-IEEE in this area).

Sadly, short of the ugliness of dropping a bunch of attribute modifiers
on C, there is no good way to retrofit this. And, using the existing mechanisms would require making use of a dynamic rounding mode.

One thought though is, say:
  Lax FP and fenv_access=false:
    Use quick and dirty ops with a fixed rounding mode;
  Lax FP and fenv_access=true:
    Use dynamic rounding mode;
    Here, IEEE emulation is disabled by default.
  Strict FP (set as a compiler flag):
    Use operations which use dynamic rounding mode;
    Set IEEE emulation to true on init.

In my ISA, there were different instructions for these cases:
  FADD/FSUB/FMUL: Fixed to RNE and DAZ/FTZ, no flags updates.
  FADDA/FSUBA/FMULA: May also use reduced precision.
  FADDG/FSUBG/FMULG: Uses dynamic rounding mode;
    The Imm5fp/Imm6fp encodings also use dynamic rounding;
    These operations will update the flags.

Currently, the FDIV, FSQRT, and FMAC ops, which are optional, will also
be assumed to use dynamic rounding mode rules.

At present, HW FMAC may exist, but is Double-Rounded when in DAZ/FTZ
mode. Setting IEEE Mode may enable single-rounded FMAC via trap-and-emulate.

There was an issue of where to put the FPU status bits in my case:
For a while, had put it in the high bits of GBR/GP, but this was a
problem. Reloading this register would tend to stomp the status; doing a save/reload, with the FPU status copied from the old version, would
introduce dynamic scoping rules. While this worked for a while, it is incompatible with the rules for fenv_access in C, which assume a global
state and not a dynamically-scoped state.

So, if BGB shares my view then there are already two of us.

Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ and
similar for floating point.

So, it isn't exactly unheard of...

Otherwise, still looking at the "OK, why is XG3 now somehow beating
RV64GC+Jx on code density?" mystery. At present, I still lack a solid explanation for this one.

Well, unless it is maybe something like 32-bit:
MOV.L (GP, Disp16u*4), Rd //32 bit (XG3)
vs:
J21I + LW Xd, Disp12s(GP) //64 bit (JX)
LUI + ADD(16b) + LW Xd, Disp12s(Xt) //80 bit (GC)
LUI + ADD + LW Xd, Disp12s(Xt) //96 bit (G)
...

This was a factor before, but on its own is nothing new.
Wouldn't expect this to be carried by just having a slightly more
compact encoding to load/store global variables though.

Still looking at it, trying to figure out.

Would be ironic if using a 32/64/96 bit encoding scheme just so happened to end up being the most compact option though...

Then again, recently someone was saying that SuperH halved binary sizes vs its fixed-length 32-bit RISC predecessors, which wasn't
really true, as the limitations of the ISA typically meant needing 60%
more instructions for similar work.

Where:
1.6 * 2 => 3.2 (SH-4 scenario)
0.8 * 4 => 3.2 (RV-C scenario)
So, roughly break-even with a 16/32 in terms of code size, except the
16/32 approach is going to be faster.

Well, unless one did a RISC with 32 bit encodings but only 2R
instruction encodings, but this would suck...

Current models imply I am looking at (for XG3):
0.9 * 4 => 3.6 (~ 10% fewer instructions)

Where, 3.6 > 3.2 ...

But, this may need another fudge factor for the percentage of
instructions in RV+JX that need jumbo prefixes.

So: from a program ~ 99k instructions, 3.2k prefixes.
... No, this doesn't cover it.

Checks, RV-C: 19k instructions.
This is around 19% of instructions, and 19% is less than 40%.

Recalc:
0.9 * 4 * 1.03 => 3.7
And, 3.6 < 3.7, so XG3 wins...

As it would appear, in this case RV-C is under-performing the estimate.
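
One way to read the arithmetic above, as a sketch only: treat each figure as (relative instruction count) x (average instruction size in bytes). The assumption here, inferred from the "19% is less than 40%" remark, is that the 0.8 * 4 estimate for RV-C corresponds to about 40% of instructions being 16-bit; the helper names are hypothetical.

```python
def avg_bytes_16_32(frac_16bit: float) -> float:
    """Average instruction size for a mixed 16/32-bit encoding."""
    return 2.0 * frac_16bit + 4.0 * (1.0 - frac_16bit)

def cost(rel_insn_count: float, avg_bytes: float, prefix_factor: float = 1.0) -> float:
    """Bytes per unit of work: relative instruction count times average size."""
    return rel_insn_count * avg_bytes * prefix_factor

sh4 = cost(1.6, 2.0)                              # 16-bit only, ~60% more instructions
rvc_estimate = cost(1.0, avg_bytes_16_32(0.40))   # equivalent to 0.8 * 4
xg3 = cost(0.9, 4.0)                              # 32-bit only, ~10% fewer instructions

# Measured figures from the post: ~19% 16-bit ops, 3.2k prefixes per 99k instructions.
prefix_factor = 1.0 + 3.2 / 99.0
rvc_measured = cost(1.0, avg_bytes_16_32(0.19), prefix_factor)

for name, value in [("SH-4", sh4), ("RV-C est.", rvc_estimate),
                    ("XG3", xg3), ("RV-C+JX meas.", rvc_measured)]:
    print(f"{name:14s} {value:.2f}")
```

Under that reading, the measured 19% share of 16-bit instructions leaves the RV side at roughly 3.7 against 3.6 for XG3, which matches the conclusion in the post.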

...


From Michael S@already5chosen@yahoo.com to comp.arch on Sun Apr 19 12:28:10 2026

From Newsgroup: comp.arch

On Sun, 19 Apr 2026 03:54:40 -0500
BGB <cr88192@gmail.com> wrote:

Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
and similar for floating point.

So, it isn't exactly unheard of...

As I said above, I think that IEEE 754 definitions for basic ops are
great.
I strongly oppose "flush subnormals to zero".

I am not sure what DAZ is. Does it abbreviate "De-normals are zero"?
I.e. subnormals are not only never produced as the result of arithmetic ops but
also silently converted to zero when taken as input?
If that is what DAZ means then I oppose it even more strongly than FTZ.


From David Brown@david.brown@hesbynett.no to comp.arch on Sun Apr 19 12:56:52 2026

From Newsgroup: comp.arch

On 19/04/2026 03:02, Dan Cross wrote:

Some of your comments, such as repeatedly asserting the
ridiculous notion that people are "bribing" armorers so they can
avoid "doing their jobs" lead me to wonder whether you are
deliberately misrepresenting what I am saying so that you can
feel yourself morally superior to an American.

No, that was not my intention at all - not remotely. I had interpreted
your original comments as suggesting that you could bribe the armourer
to ignore that you had not done your duty properly, and for an "under
the table" fee he would cover up for you. That is very different from
paying him for a service, which is how I now understand it after your follow-up comments.

This misinterpretation may well have come at least partly due to
cultural differences between America and Norway. (A key difference is
tipping culture - paying cash tips is standard in a wide range of circumstances in the USA, but very rare in Norway. In the USA, a
similar concept could perhaps extend to paying for a service from an
armourer. In Norway, paying cash directly to the armourer would
definitely be highly suspicious). But those are differences, not superiorities in either direction.

Of course I can prefer the way things work here, but preferences are not objective. Even when we can look at comparisons of factual, objective measures between societies (say, the rates of gun deaths in different countries), these should not be used without understanding a wider
context. And they are not at all personal - an average Norwegian is statistically less likely to be shot than an average American, but it is
not because of anything /I/ do or anything /you/ do. There is no
question of "moral superiority" involved.

I have, I think, been patient with my responses, but in this
and my previous message, my patience is slipping.

It is best to close off this thread branch. I have found your posts on
the topic informative and they have given me a better understanding of
some things, especially some aspects of military practice in the USA,
and they have corrected some of my misunderstandings.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 13:35:17 2026

From Newsgroup: comp.arch

In article <10s2cdk$3t2hf$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 19/04/2026 03:02, Dan Cross wrote:

[snip]
I have, I think, been patient with my responses, but in this
and my previous message, my patience is slipping.

It is best to close off this thread branch. [...]

Very well. I'll accept your words on their face, and as you and
others have pointed out (apologies to John Levine and Tim
Rentsch) this is all wildly off-topic. Not uncommon in
comp.arch, I'm afraid.

Mitch's recent post about gate delays was far more interesting
to me than anything related to the military.

- Dan C.


From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Apr 19 17:50:16 2026

From Newsgroup: comp.arch

Dan Cross wrote:

In article <10rsktu$287kv$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

I don't know the details of the regulations, but I can assure you that
it is entirely within them. As a student, his flat is considered
temporary accommodation and our house is his permanent home. His
National Guard base is in our area, not where he is a student.

Maybe the regulations in the USA are different. Maybe there are
different standards about how quickly you can be called up and need to
deploy.

Misrepresenting your home address, such as using your parents'
home when you don't actually live there, temporary accommodation
or not, is not something the US military looks upon favorably.

Didn't you read his comment? Here in Norway, any student's home address
is considered to be wherever she/he lived before starting to study.

The only exception is if you, like I did, actually buy a flat/apartment
near the university, at that point this was considered my primary residence.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 18:18:12 2026

From Newsgroup: comp.arch

Tim Rentsch <tr.17687@z991.linuxsc.com> posted:

John Levine <johnl@taugh.com> writes:

According to quadi <quadibloc@ca.invalid>:

On Wed, 15 Apr 2026 16:44:04 +0000, John Levine wrote:

Then in the 1960s some well organized revisionists ignored what
it says, pretended it meant an individual right to have guns
everywhere, and managed to find a majority of right wing supreme
court justices willing to sign on.

I'm afraid that I can't agree with you on this. ...

Of course, it's possible subordinate clauses were used differently
back in the eighteenth century, but I'd need evidence to buy into
that theory.

The evidence is that for over 150 years, everyone agreed that it meant state militias. There were two Supreme Court decisions in 1876 and
1886 that confirmed the rights of states to regulate militias, one in
1939 saying that a sawed off shotgun wasn't the kind of arm that the
2nd was intended to protect, and one in 1980 confirming that it was OK
for states to forbid convicted felons from owning guns.

I'm not aware of anyone claiming it was an individual right that the
states could not regulate until the 1960 revisionists, and no court decision until Heller in 2008 which reversed the previous century and
a half's precedent. Heller was decided 5-4, over strong dissents.

Has anyone seen comp.arch around here somewhere? I seem to have
wandered into rec.guns.

Hiding behind cover in the back of the room...

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Apr 19 14:37:49 2026

From Newsgroup: comp.arch

On 2026-Apr-18 13:59, MitchAlsup wrote:

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-Apr-17 21:11, MitchAlsup wrote:

Thomas Koenig <tkoenig@netcologne.de> posted:

John Levine <johnl@taugh.com> schrieb:

But let's get back to gate delays, please.

Let's.

How do people actually count gate delays, and how useful is it?
Different gates have different delays (obviously), so counting
an inverter the same as a three-input NOR gate (independent of
fan-out, even) seems to be a large simplification which may be
useful for a fairly rough approximation, but not that much better.

Or am I missing something?

There is the "standard" FO4 counting scheme where 1 gate drives
4 other gate inputs, and in this scheme, a D-type flip-flop was
2.5 gates of delay.

As one can guess, gates can be sized: as small as obeys the FAB
design rules, to <basically> as big as one can afford. Naturally,
as gates get bigger, they can drive bigger loads--BUT they also
present bigger loads to the gates driving them.

Conway figured out that the fastest way to "buffer up" a signal
was to use inverters staged in the ratio of 1:e:e^2:e^3... with
e being the standard 2.7... base of natural logarithms. Rounding
e up to 3 degrades speed by less than 1%, rounding up to 4 only
slows down 10% or so--so, most buffering is done at FO4, where
a minimum sized gate drives an inverter 4× as big which would then
drive another inverter 16× as big...
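
The quoted percentages can be checked against a simplified model. This sketch assumes each stage costs a delay proportional to its stage ratio f and that ln(load)/ln(f) stages are needed (treated as a real number), so total delay scales as f/ln(f). It ignores gate self-loading and the need for a whole number of stages; the function name and the load value are hypothetical.

```python
import math

def chain_delay(load_ratio: float, stage_ratio: float) -> float:
    """Delay of an inverter chain in the simplified model described above."""
    stages = math.log(load_ratio) / math.log(stage_ratio)
    return stages * stage_ratio

load = 1000.0  # arbitrary ratio of final load to first-stage input capacitance
best = chain_delay(load, math.e)

for f in (2.0, math.e, 3.0, 4.0, 8.0):
    penalty = chain_delay(load, f) / best - 1.0
    print(f"stage ratio {f:5.3f}: {100 * penalty:5.1f}% slower than ratio e")
```

In this model a ratio of 3 costs well under 1% and a ratio of 4 costs a few percent, in line with the figures quoted; a model that included self-loading would shift the optimum upward, which this sketch does not attempt to show.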

Each transistor between a power connection and the signal connection
basically adds its own transconductance to the electrical path
(ignoring body effect). Knowing that deMorgan's laws apply, we
instantly see that a Nand gate is simply a Nor gate with inverted
inputs. A Nand gate has its serial string of FETs between signal
and ground and a parallel path from signal to Vdd, while a Nor has
its serial FETs between signal and Vdd and its parallel path between
signal and ground. To deal with these serial paths, the transistors
are lengthened; a 2Nand has N-channels 2× as wide and can use 1×
P channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
{Nors are similar but reverse Ns and Ps} Somewhere along the line,
the parallel path FETs have to get lengthened because of
all the serial path diffusion capacitance (to maintain rather
equal pull up and pull down).

Soon, one realizes that one needs SPICE simulation with accurate
models to push the edge--just like when pushing the Young's Modulus
in engineering models.

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of a restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'

The part I don't see is the rules for combinatorial gates.
There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
where multiple gates are combined in one but at a lower gate delay.

For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
because it really is an INV, an AND and an OR,
but in CMOS those seem to be just 1 or 1.5 gate delays.

A 4:1 Mux is a 2222AOI gate--one Ssllooww gate--but still 1 gate.

One slow gate is generally faster than 2 gates (and less power)
because only 1 signal has to move (Vdd->gnd or gnd->Vdd) instead
of more than one. Each signal moving is limited by the transconductance
of the FET stack and the capacitance being driven. We
call the rise/fall time the edge speed.

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

Almost always by deMorganizing the logic.

I'm referring to where you merge gates together.
For example, an XOR is (A nand (not B)) nand ((not A) nand B)
which is 2 INV (4T) and 3 NAND gates (12T) with a total of
16 transistors and a delay of 3.
The 3 NAND can merge into a single gate of 8 transistors
and a total of 12T and a delay of 2.
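
As a quick sanity check of the unmerged form, the expression can be evaluated over the full truth table. The transistor count assumes plain static CMOS (2 transistors per inverter, 4 per 2-input NAND); the merged 12T form depends on the circuit topology and is not modelled here.

```python
from itertools import product

def nand(a: bool, b: bool) -> bool:
    return not (a and b)

def inv(a: bool) -> bool:
    return not a

def xor_from_nands(a: bool, b: bool) -> bool:
    # (A nand (not B)) nand ((not A) nand B)
    return nand(nand(a, inv(b)), nand(inv(a), b))

# Verify against the built-in inequality test for all four input combinations.
for a, b in product((False, True), repeat=2):
    assert xor_from_nands(a, b) == (a != b)

# Unmerged transistor count: 2 inverters and 3 two-input NAND gates.
unmerged_transistors = 2 * 2 + 3 * 4
print(unmerged_transistors)
```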

Presumably this merging of gates can continue to some point
but what that point is isn't clear to me.
That makes it difficult to look at a logic diagram and
know how many gates are going to merge that way.

I find this makes it more difficult to just look at a CMOS logic circuit
and know whether it will fit within a 20 gate delay stage budget.

If the gate delay count is less than 20, there is "some" sizing of
those gates which will result in minimum delay.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 19:04:19 2026

From Newsgroup: comp.arch

In article <10s2tjo$2pko$1@dont-email.me>,
Terje Mathisen <terje.mathisen@tmsw.no> wrote:

Dan Cross wrote:

In article <10rsktu$287kv$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

I don't know the details of the regulations, but I can assure you that
it is entirely within them. As a student, his flat is considered
temporary accommodation and our house is his permanent home. His
National Guard base is in our area, not where he is a student.

Maybe the regulations in the USA are different. Maybe there are
different standards about how quickly you can be called up and need to
deploy.

Misrepresenting your home address, such as using your parents'
home when you don't actually live there, temporary accommodation
or not, is not something the US military looks upon favorably.

Didn't you read his comment?

Yes, I did. Did you read the rest of the thread?

Here in Norway, any student's home address
is considered to be wherever she/he lived before starting to study.

The only exception is if you, like I did, actually buy a flat/apartment
near the university, at that point this was considered my primary residence.

Great. Now, perhaps, we can let this subthread go.

- Dan C.


From BGB@cr88192@gmail.com to comp.arch on Sun Apr 19 14:14:39 2026

From Newsgroup: comp.arch

On 4/19/2026 4:28 AM, Michael S wrote:

On Sun, 19 Apr 2026 03:54:40 -0500
BGB <cr88192@gmail.com> wrote:

Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
and similar for floating point.

So, it isn't exactly unheard of...

As I said above, I think that IEEE 754 definitions for basic ops are
great.
I strongly oppose "flush subnormals to zero".

I also agree with the formats and basic ops, main sticking point is that subnormals make some things more complicated and expensive for hardware.

Main problem case in practice (where differences between subnormals and
FTZ semantics becomes visible) involves dividing something by a value
very close to zero. This requires alternate handling to get correct
results (the naive x/y => x*(1.0/y) strategy no longer works).
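
A small illustration of why the reciprocal-multiply rewrite of a division breaks down at the extremes, using ordinary Binary64 arithmetic. The `ftz` helper is a hypothetical software emulation of flush-to-zero on a result, not a model of any particular FPU.

```python
import math
import sys

def ftz(v: float) -> float:
    """Emulate flush-to-zero: replace a subnormal value with a signed zero."""
    if v != 0.0 and abs(v) < sys.float_info.min:
        return math.copysign(0.0, v)
    return v

def div_via_reciprocal(x: float, y: float) -> float:
    """Division rewritten as a multiply by the reciprocal."""
    return x * (1.0 / y)

# Subnormal divisor: the true quotient is about 3, but 1.0/y overflows to Inf.
x, y = 3e-310, 1e-310
print(x / y, div_via_reciprocal(x, y))

# Large divisor: 1.0/y is subnormal, so under FTZ the product collapses to 0.
big_x, big_y = 1.0e308, 1.5e308
print(big_x / big_y, big_x * ftz(1.0 / big_y))
```

In both cases the direct quotient is an ordinary normal number, which is why a divide that must deliver IEEE results needs separate handling of these ranges.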

Though, a naive (if slower) option is to handle the divide as before,
but have the divide operator quietly promote things to Binary128 (with
the final result back-converted to Binary64).

While implementing the full FDIV operator using N-R would also work in
this case, in a "trap-and-emulate on subnormal numbers" strategy, it is
slower than going through Binary128.

But, as noted, one can go either way here.

My current leaning is to support subnormals at the ISA level (so, it may appear to software as-if the FPU has full IEEE FPU operations), but with
some traps and restrictions needed to get IEEE results (there are some FPU-related instructions which exist in XG1/2/3 which will need to be essentially disallowed in strict-FP mode).

It can then be chosen at compile time whether the preference is to have
more accurate math or better performance.

But, I may need to devise some sort of test program to validate that
results in this mode are correct.

I am not sure what DAZ is. Does it abbreviate "De-normals are zero"?
I.e. subnormals are not only never produced as the result of arithmetic ops but
also silently converted to zero when taken as input?
If that is what DAZ means then I oppose it even more strongly than FTZ.

DAZ/FTZ: Means "Denormals-As-Zero" / "Flush-To-Zero"

So, yeah... This trades mathematical purity for cheapness.

There is a potentially cheaper-still option that could maybe be called
"Clamp Exponent to Zero" (don't know an official term off-hand,
apparently may also be called "Dirty Underflow" or similar). Generally
no one uses this though, as this one crosses an "unacceptable level of
suck" threshold.

Say:
(x*0.0) == (y*0.0)
Being true with either IEEE or DAZ+FTZ, but would not be true with
DAZ+CEZ. Could paper over this, but then one is just pushing the costs
around from one place to another.

Though, potentially, this is one minor case where having an FTZ mode
adds cost:
An implementation that solely used trap-and-emulate for full IEEE
behavior could use CEZ behavior and it would have been hidden. Whereas
FTZ means that one needs logic to detect that the exponent has gone out
of range and to force the mantissa to 0 (but would have still needed
this logic to deal with overflow to Inf).

The cheapest options, as noted:
Fixed RTZ (round towards zero);
DAZ+CEZ
Handles FSUB using ones-complement arithmetic;
...

These may go below a certain minimum threshold of acceptable except for specifically low-precision SIMD operations (such as Binary16).

But, even for contexts where one is using Binary32, this is frequently unacceptable.

So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's Complement FSUB, mostly because the alternative was poor even for SIMD.

2.0 - 3.0 => -0.999999

Is kinda obvious in its suck.

So, regardless of exact strategy, things like 2.0 - 3.0 should still
ideally give an exact -1.0 ...
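
A rough model of where the -0.999999 comes from, under stated assumptions: 24-bit significands (Binary32), both operands already at the same exponent, the larger magnitude placed first, and the subtract done as big + ~small. The only difference between the two variants is the carry-in; `fsub_same_exponent` is a hypothetical helper, not the actual datapath:

```python
MANT_BITS = 24                      # Binary32 significand incl. hidden bit
MASK = (1 << MANT_BITS) - 1

def fsub_same_exponent(a_mag, b_mag, exp, carry_in):
    """Compute a - b for two significands sharing one exponent.

    carry_in=1 models a two's complement subtract (big + ~small + 1),
    carry_in=0 models a ones' complement subtract with no end-around carry.
    """
    negative = a_mag < b_mag
    big, small = (b_mag, a_mag) if negative else (a_mag, b_mag)
    mag = (big + (~small & MASK) + carry_in) & MASK
    # Renormalize: shift left until the hidden bit is set again.
    while mag and not (mag >> (MANT_BITS - 1)):
        mag <<= 1
        exp -= 1
    value = mag * 2.0 ** (exp - (MANT_BITS - 1))
    return -value if negative else value

TWO = 0x800000    # 1.0 * 2^1
THREE = 0xC00000  # 1.5 * 2^1

print(fsub_same_exponent(TWO, THREE, 1, carry_in=1))  # two's complement
print(fsub_same_exponent(TWO, THREE, 1, carry_in=0))  # ones' complement
```

Dropping the carry-in leaves the difference one unit short before normalization, and the left shifts then scale that error up into the visible digits.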

...


From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Apr 19 16:04:25 2026

From Newsgroup: comp.arch

On 2026-Apr-18 15:23, Thomas Koenig wrote:

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'

The part I don't see is the rules for combinatorial gates.
There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
where multiple gates are combined in one but at a lower gate delay.

For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
because it really is an INV, an AND and an OR,
but in CMOS those seem to be just 1 or 1.5 gate delays.

There is the method of logical effort, see https://en.wikipedia.org/wiki/Logical_effort . I have not made
much effort to do calculations using that method.
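
For reference, a small sketch of the basic logical-effort stage delay, d = g*h + p, in units of the inverter time constant tau. The g and p values are the usual textbook ones for simple static CMOS gates and are an assumption here; real library cells will differ:

```python
# (logical effort g, parasitic delay p) per gate type, textbook values.
GATES = {
    "INV":   (1.0,     1.0),
    "NAND2": (4.0 / 3, 2.0),
    "NOR2":  (5.0 / 3, 2.0),
}

def stage_delay(gate, fanout):
    """Delay of one gate driving `fanout` times its own input capacitance."""
    g, p = GATES[gate]
    return g * fanout + p

def path_delay(stages):
    """Sum of stage delays for a list of (gate, fanout) pairs."""
    return sum(stage_delay(gate, h) for gate, h in stages)

fo4 = stage_delay("INV", 4)                      # the familiar FO4 inverter delay
and2 = path_delay([("NAND2", 1), ("INV", 4)])    # AND built as NAND + inverter
print(fo4, and2, and2 / fo4)
```

Expressing a path in FO4 units like this is one way to compare a chain of simple gates against a single merged gate when checking a per-stage gate-delay budget.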

Yes, I haven't actually used it either.
Sutherland has examples of the gate merging I'm referring to.
Section 4.4 Asymmetric logic gates figure 4.3 has an example of
((A and B) nor C)
which merges the AND and NOR gates, so instead of 2 gates and 8 transistors
it's 1 gate and 6 transistors.

An alternative would be to use an actual library as an example.
A company called Nangate released an open-sourced library (google
for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
for which delay calculations can be done as example, for example
using Berkeley ABC. That program can also do optimizations
(although it cannot handle gates with more than one output, such as
full adders, and has weaknesses in stability). I haven't tried to
model wire delays with this.

A while ago I was rummaging about and found the individual gate
delay info in the open source Process Design Kit (PDK) files.

https://skywater-pdk.readthedocs.io/en/main/ https://github.com/google/skywater-pdk

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

AOI and friends also work in TTL, I believe.

Yes but you don't get to merge gates together to shorten the delay.
You only get to choose from the packages available
and for most situations just scan the spec sheet and use
the max of all the propagation delays.

I find this makes it more difficult to just look at a CMOS logic circuit
and know whether it will fit within a 20 gate delay stage budget.

An interesting question :-)


From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 21:39:12 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-Apr-18 13:59, MitchAlsup wrote:

------------------

A 4:1 Mux is a 2222AOI gate--one Ssllooww gate--but still 1 gate.

One slow gate is generally faster than 2 gates (and less power)
because only 1 signal has to move (Vdd->gnd or gnd->Vdd) instead
of more than one. Each signal moving is limited by the transcon-
ductance of the FET stack and the capacitance being driven. We
call the rise/fall time the edge speed.

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

Almost always by deMorganizing the logic.

I'm referring to where you merge gates together.
For example, an XOR is (A nand (not B)) nand ((not A) nand B)
which is 2 INV (4T) and 3 NAND gates (12T) with a total of
16 transistors and a delay of 3.
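
The gate-level version above can be checked mechanically. A small sketch, assuming the usual static CMOS counts of 2 transistors per inverter and 4 per 2-input NAND:

```python
from itertools import product

T_INV, T_NAND2 = 2, 4  # transistors per inverter / per 2-input NAND

def inv(a):
    return 1 - a

def nand(a, b):
    return 1 - (a & b)

def xor_from_nands(a, b):
    """(A nand (not B)) nand ((not A) nand B)"""
    return nand(nand(a, inv(b)), nand(inv(a), b))

# Exhaustive truth-table check against the built-in XOR.
for a, b in product((0, 1), repeat=2):
    assert xor_from_nands(a, b) == a ^ b

transistors = 2 * T_INV + 3 * T_NAND2   # 2 inverters + 3 NANDs
levels = 3                               # INV -> NAND -> NAND on the longest path
print(transistors, levels)
```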

Consider what we call a 2-stack:

_|
a___|
|_
|
_|
b___|
|_
|

The 2-stack will conduct when both FETs are On and not otherwise.

Consider 4{2-stacks} 2 from Vdd to signal, 2 from
gnd to signal.

By having P-channel a[0] = x_true and b[0] = y_true
a[1] = x_false and b[1] = y_false

When x == y the stack-pair will pull up.

By having N-channel a[2] = x_true and b[2] = y_false
a[3] = x_false and b[3] = y_true

When x != y the stack-pair will pull down.

Presto: a (positive logic) 8-FET XOR gate--its fault is that
it requires true/complement inputs (an inverter of delay, or
about 1/3 of a gate-delay). So, this XOR gate in positive-only
signaling would have the output drive of a 2-NAND gate and
the delay of 1.3 2-NAND gates.

By rearranging the input terms an XNOR gate is accomplished.

Since the pull up stack and the pull down stack are both 2 series
transistors, the pull up has transconductance 1/2 of the single
FET; we compensate by making the FETs 2× as long.

In full custom logic, we use 4-FET stacks pulling down (4-NAND)
but we restrict ourselves to 3-FET stacks pulling up due to both
the P-Channel conductance (holes weigh more than electrons) and
the body effect of p-channels being greater than N-channels.

The 3 NAND can merge into a single gate of 8 transistors
and a total of 12T and a delay of 2.

The N-stack is our N-AND term. Parallel stacks are your OR
term. CMOS simply requires that all signals are always driven
{up or down}.
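
A switch-level sketch of that rule for the merged ((A and B) nor C) gate, assuming ideal switches: series devices act as AND, parallel devices as OR, N-channels conduct on a high gate and P-channels on a low gate. The function names are hypothetical:

```python
from itertools import product

def pull_down(a, b, c):
    """N-channel network: series stack a-b in parallel with c."""
    return bool((a and b) or c)

def pull_up(a, b, c):
    """P-channel network (the dual): a parallel b, in series with c."""
    return bool(((not a) or (not b)) and (not c))

def aoi21(a, b, c):
    up, down = pull_up(a, b, c), pull_down(a, b, c)
    # The CMOS requirement: the output is always driven, and never
    # from both rails at once.
    assert up != down
    return 1 if up else 0

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, aoi21(a, b, c))
```

Three N-channel plus three P-channel devices give the 6-transistor single gate, and the pull-up network is simply the deMorgan dual of the pull-down.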

Presumably this merging of gates can continue to some point
but what that point is isn't clear to me.

4 N-channels in series and 3 P-channels in series.

That makes it difficult to look at a logic diagram and
know how many gates are going to merge that way.

You do it for a decade, and it becomes akin to breathing.

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 21:47:12 2026

From Newsgroup: comp.arch

BGB <cr88192@gmail.com> posted:

On 4/19/2026 4:28 AM, Michael S wrote:

On Sun, 19 Apr 2026 03:54:40 -0500
BGB <cr88192@gmail.com> wrote:

Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
and similar for floating point.

So, it isn't exactly unheard of...

As I said above, I think that IEEE 754 definitions for basic ops are
great.
I strongly oppose "flush subnormals to zero".

I also agree with the formats and basic ops, main sticking point is that subnormals make some things more complicated and expensive for hardware.

And yet we have single chips with 64 cores, each core containing 5
(sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
L2 and L3 caches. In real silicon technology, this means one can put
600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
a single die.

I submit that back when it was hard to put a whole core on a chip
you had a shred of an argument, now you do not--we grew out of those constraints.

Main problem case in practice (where differences between subnormals and
FTZ semantics become visible) involves dividing something by a value
very close to zero. This requires alternate handling to get correct
results (the naive x/y => x*(1.0/y) strategy no longer works).

If you consider FDIV as having to get the rounding correct--that
method NEVER EVER worked: but you don't even bother getting FMUL
correctly rounded.....
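
Both points can be seen on an ordinary IEEE Binary64 host. A minimal sketch, with `div_via_reciprocal` as a hypothetical stand-in for the multiply-by-reciprocal strategy: the first part counts small-integer quotients where the two roundings of x*(1.0/y) disagree with a correctly rounded x/y, and the second shows the near-zero divisor case, where 1.0/y overflows even though x/y is perfectly representable:

```python
def div_via_reciprocal(x, y):
    """Divide by multiplying with a rounded reciprocal (two roundings)."""
    return x * (1.0 / y)

# Part 1: rounding mismatches for small integer operands.
mismatches = [(x, y)
              for x in range(1, 50)
              for y in range(1, 50)
              if float(x) / float(y) != div_via_reciprocal(float(x), float(y))]
print(len(mismatches), "of", 49 * 49, "quotients differ from x / y")

# Part 2: a subnormal divisor. The true quotient is 1.0, but the
# reciprocal of the divisor is not representable and becomes infinity.
tiny = 5e-324  # smallest positive subnormal
print(tiny / tiny, div_via_reciprocal(tiny, tiny))
```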
-------------------

So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's Complement FSUB, mostly because the alternative was poor even for SIMD.

2.0 - 3.0 => -0.999999

Is kinda obvious in its suck.

Note: IEEE 754 delivers the right answer, BTW...

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 21:48:39 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-Apr-18 15:23, Thomas Koenig wrote:

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'

The part I don't see is the rules for combinatorial gates.
There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
where multiple gates are combined in one but at a lower gate delay.

For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
because it really is an INV, an AND and an OR,
but in CMOS those seem to be just 1 or 1.5 gate delays.

There is the method of logical effort, see https://en.wikipedia.org/wiki/Logical_effort . I have not made
much effort to do calculations using that method.

Yes, I haven't actually used it either.
Sutherland has examples of the gate merging I'm referring to.

It is all Stacks (see previous thread post) and deMorganizing.

Section 4.4 Asymmetric logic gates figure 4.3 has an example of
((A and B) nor C)
which merges the AND and NOR gates, so instead of 2 gates and 8 transistors
it's 1 gate and 6 transistors.

An alternative would be to use an actual library as an example.
A company called Nangate released an open-sourced library (google
for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
for which delay calculations can be done as example, for example
using Berkeley ABC. That program can also do optimizations
(although it cannot handle gates with more than one output, such as
full adders, and has weaknesses in stability). I haven't tried to
model wire delays with this.

A while ago I was rummaging about and found the individual gate
delay info in the open source Process Design Kit (PDK) files.

https://skywater-pdk.readthedocs.io/en/main/ https://github.com/google/skywater-pdk

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

AOI and friends also work in TTL, I believe.

Yes but you don't get to merge gates together to shorten the delay.
You only get to choose from the packages available
and for most situations just scan the spec sheet and use
the max of all the propagation delays.

I find this makes it more difficult to just look at a CMOS logic circuit
and know whether it will fit within a 20 gate delay stage budget.

An interesting question :-)


From BGB@cr88192@gmail.com to comp.arch on Sun Apr 19 18:04:48 2026

From Newsgroup: comp.arch

On 4/19/2026 4:47 PM, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

On 4/19/2026 4:28 AM, Michael S wrote:

On Sun, 19 Apr 2026 03:54:40 -0500
BGB <cr88192@gmail.com> wrote:

Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
and similar for floating point.

So, it isn't exactly unheard of...

As I said above, I think that IEEE 754 definitions for basic ops are
great.
I strongly oppose "flush subnormals to zero".

I also agree with the formats and basic ops, main sticking point is that
subnormals make some things more complicated and expensive for hardware.

And yet we have single chips with 64 cores, each core containing 5
(sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
L2 and L3 caches. In real silicon technology, this means one can put
600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
a single die.

I submit that back when it was hard to put a whole core on a chip
you had a shred of an argument, now you do not--we grew out of those constraints.

Realistically, anyone making a hobbyist class core can't make anything
like what you describe.

And, if doing it on an (affordable) FPGA, can only do what an FPGA can realistically manage. Which isn't even anywhere near the level of the
SoC that fits in a typical cellphone at this point.

more like late 1990s logic complexity at early 1990s clock speeds.

Well, and, for PCs, most people don't have 64 cores either...
Typical consumer-grade CPUs being more like 8 or 16 cores.

Also typical GPU FPUs are low precision (with trying to use "double" in
a GPGPU context often resulting in a fairly steep performance penalty).

For "enterprise" stuff, that is more "money to burn", which is a very different scenario.

Well, then there are NPUs, which seem to be going in a more "mixed
FP8/FP16" direction. But, doing FMA at "FP8*FP8+FP16" is also, a very different situation.

Well, and if your task is mostly bandwidth bound, maximal FPU precision
is not the priority.

Even in my case, I am still left partly battling memory bandwidth walls
(like, say, frequent use of FP8 or FP16 in my case not being about
trying to save RAM; rather trying to optimize for D$ and I$ and memory bandwidth and similar).

Which is ironic in a way, given that seemingly I am in a better place
relative to memory-bandwidth compared with clock-speed vs a lot of
late-90s and early-2000s CPUs (well, because running a 16-bit wide DDR
chip at 50 MHz, isn't *that* drastically slower than a 64-bit wide
SO-DIMM running at 67 MHz from the early 2000s; or at least if comparing
50 MHz vs 1400 MHz for the CPU core).

Though, there is still the limit of what sorts of use-cases one can fit
into the limited precision of these formats.

...

Main problem case in practice (where differences between subnormals and
FTZ semantics become visible) involves dividing something by a value
very close to zero. This requires alternate handling to get correct
results (the naive x/y => x*(1.0/y) strategy no longer works).

If you consider FDIV as having to get the rounding correct--that
method NEVER EVER worked: but you don't even bother getting FMUL
correctly rounded.....
-------------------

Not in the DAZ/FTZ mode, granted.

For the IEEE emulation mode, the idea is to try to patch things up as
needed such that FMUL is correct (similar to the matter of dealing with subnormal numbers).

But, yeah, the DAZ/FTZ mode may typically also give incorrectly rounded
FMUL as well as incorrectly rounded FDIV.

In both cases, the issue appears to go away though if the intermediate computations are done at Binary128 precision. But, this is its own
pros/cons thing.

So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's
Complement FSUB, mostly because the alternative was poor even for SIMD.

2.0 - 3.0 => -0.999999

Is kinda obvious in its suck.

Note: IEEE 754 delivers the right answer, BTW...

Yes, but I was pointing out mostly the problem of trying to cheap out
too much.

While for SIMD it initially seems like one can cheap out really hard,
basic integer arithmetic scenarios failing to give exact results, and a tendency for values to drift towards zero, can start to have fairly
obvious and visible effects.

So, there is a limit here...

It can escape notice if limited solely to graphics and audio tasks, but
as soon as one starts trying to use it for something much more demanding
than pixel colors or audio mixing, it falls on its face.

One place it becomes obvious pretty quickly is if doing physics
calculations or rotation math, where objects' positions and rotations
will start to steadily drift. They tend to be adjusted every frame,
based on things like applying forces and time-steps.

If the objects all start slowly rotating and sliding towards the origin,
the suck is evident.
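
The drift is easy to reproduce in a toy model. A minimal sketch, assuming a fixed quantization step `Q` as a stand-in for a short mantissa (a simplification: real floating-point error is relative, not absolute): a unit vector is rotated repeatedly, with each component rounded either toward zero or to nearest after every step.

```python
import math

Q = 2.0 ** -10  # hypothetical quantization step

def quantize_rtz(v):
    """Round toward zero to a multiple of Q."""
    return math.trunc(v / Q) * Q

def quantize_rne(v):
    """Round to nearest (ties to even) multiple of Q."""
    return round(v / Q) * Q

def spin(quantize, steps=1000, angle=0.1):
    """Rotate (1, 0) `steps` times, quantizing after each step.

    Returns the final distance from the origin (ideally 1.0).
    """
    c, s = math.cos(angle), math.sin(angle)
    x, y = 1.0, 0.0
    for _ in range(steps):
        x, y = quantize(c * x - s * y), quantize(s * x + c * y)
    return math.hypot(x, y)

print("RTZ radius:", spin(quantize_rtz))
print("RNE radius:", spin(quantize_rne))
```

With round-toward-zero every step can only shrink each component, so the errors accumulate in one direction; with round-to-nearest they mostly cancel.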

So, will need to draw a line here.


From Robert Finch@robfi680@gmail.com to comp.arch on Sun Apr 19 22:47:45 2026

From Newsgroup: comp.arch

On 2026-04-19 7:04 p.m., BGB wrote:

On 4/19/2026 4:47 PM, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

On 4/19/2026 4:28 AM, Michael S wrote:

On Sun, 19 Apr 2026 03:54:40 -0500
BGB <cr88192@gmail.com> wrote:

Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
and similar for floating point.

So, it isn't exactly unheard of...

As I said above, I think that IEEE 754 definitions for basic ops are
great.
I strongly oppose "flush subnormals to zero".

I also agree with the formats and basic ops, main sticking point is that
subnormals make some things more complicated and expensive for hardware.

And yet we have single chips with 64 cores, each core containing 5
(sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
L2 and L3 caches. In real silicon technology, this means one can put
600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
a single die.

I submit that back when it was hard to put a whole core on a chip
you had a shred of an argument, now you do not--we grew out of those
constraints.

Realistically, anyone making a hobbyist class core can't make anything
like what you describe.

I made a 56-core system on an A7-200T. Granted only small 16-bit CPUs
with no FP. Strange thing, I could never get more than about 48 cores to
work.

I think taking the target audience into consideration is important. If
one is trying to produce an example for a low-cost FPGA then there are
limits to what can be done. Most people looking for starting-out type
examples are not going to look at superscalar machines with FP ops. For
the more complex stuff people expect that larger, more expensive
hardware is required.

I think getting IEEE correct results does not use much more logic than
simpler approaches. One is talking small percentages. What is the
difference? Hundreds of LUTs in an FPGA with tens of thousands of LUTs available? 64-bit FMA is using about 3600 LUTs with sub-normals and
rounding.

If one is not looking for IEEE compatibility there may be other
approaches to FP that might use fewer resources. Two's complement representation?

And, if doing it on an (affordable) FPGA, can only do what an FPGA can realistically manage. Which isn't even anywhere near the level of the
SoC that fits in a typical cellphone at this point.

more like late 1990s logic complexity at early 1990s clock speeds.

Well, and, for PCs, most people don't have 64 cores either...
Typical consumer-grade CPUs being more like 8 or 16 cores.

Also typical GPU FPUs are low precision (with trying to use "double" in
a GPGPU context often resulting in a fairly steep performance penalty).

For "enterprise" stuff, that is more "money to burn", which is a very different scenario.

Well, then there are NPUs, which seem to be going in a more "mixed FP8/FP16" direction. But, doing FMA at "FP8*FP8+FP16" is also, a very
different situation.

Well, and if your task is mostly bandwidth bound, maximal FPU precision
is not the priority.

Even in my case, I am still left partly battling memory bandwidth walls (like, say, frequent use of FP8 or FP16 in my case not being about
trying to save RAM; rather trying to optimize for D$ and I$ and memory bandwidth and similar).

Which is ironic in a way, given that seemingly I am in a better place relative to memory-bandwidth compared with clock-speed vs a lot of
late-90s and early-2000s CPUs (well, because running a 16-bit wide DDR
chip at 50 MHz, isn't *that* drastically slower than a 64-bit wide
SO-DIMM running at 67 MHz from the early 2000s; or at least if comparing
50 MHz vs 1400 MHz for the CPU core).

Though, there is still the limit of what sorts of use-cases one can fit
into the limited precision of these formats.

...

Main problem case in practice (where differences between subnormals and
FTZ semantics become visible) involves dividing something by a value
very close to zero. This requires alternate handling to get correct
results (the naive x/y => x*(1.0/y) strategy no longer works).

If you consider FDIV as having to get the rounding correct--that
method NEVER EVER worked: but you don't even bother getting FMUL
correctly rounded.....
-------------------

Not in the DAZ/FTZ mode, granted.

For the IEEE emulation mode, the idea is to try to patch things up as
needed such that FMUL is correct (similar to the matter of dealing with subnormal numbers).

But, yeah, the DAZ/FTZ mode may typically also give incorrectly rounded
FMUL as well as incorrectly rounded FDIV.

In both cases, the issue appears to go away though if the intermediate computations are done at Binary128 precision. But, this is its own pros/cons thing.

So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's
Complement FSUB, mostly because the alternative was poor even for SIMD.

2.0 - 3.0 => -0.999999

Is kinda obvious in its suck.

Note: IEEE 754 delivers the right answer, BTW...

Yes, but I was pointing out mostly the problem of trying to cheap out
too much.

While for SIMD it initially seems like one can cheap out really hard,
basic integer arithmetic scenarios failing to give exact results, and a tendency for values to drift towards zero, can start to have fairly
obvious and visible effects.

So, there is a limit here...

It can escape notice if limited solely to graphics and audio tasks, but
as soon as one starts trying to use it for something much more demanding than pixel colors or audio mixing, it falls on its face.

One place it becomes obvious pretty quickly is if doing physics
calculations or rotation math, where objects' positions and rotations
will start to steadily drift. They tend to be adjusted every frame,
based on things like applying forces and time-steps.

If the objects all start slowly rotating and sliding towards the origin,
the suck is evident.

So, will need to draw a line here.


From BGB@cr88192@gmail.com to comp.arch on Mon Apr 20 01:08:38 2026

From Newsgroup: comp.arch

On 4/19/2026 9:47 PM, Robert Finch wrote:

On 2026-04-19 7:04 p.m., BGB wrote:

On 4/19/2026 4:47 PM, MitchAlsup wrote:

BGB <cr88192@gmail.com> posted:

On 4/19/2026 4:28 AM, Michael S wrote:

On Sun, 19 Apr 2026 03:54:40 -0500
BGB <cr88192@gmail.com> wrote:

Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
and similar for floating point.

So, it isn't exactly unheard of...

As I said above, I think that IEEE 754 definitions for basic ops are
great.
I strongly oppose "flush subnormals to zero".

I also agree with the formats and basic ops, main sticking point is that
subnormals make some things more complicated and expensive for hardware.

And yet we have single chips with 64 cores, each core containing 5
(sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
L2 and L3 caches. In real silicon technology, this means one can put
600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
a single die.

I submit that back when it was hard to put a whole core on a chip
you had a shred of an argument, now you do not--we grew out of those
constraints.

Realistically, anyone making a hobbyist class core can't make anything
like what you describe.

I made a 56-core system on an A7-200T. Granted only small 16-bit CPUs
with no FP. Strange thing, I could never get more than about 48 cores to work.

I am also mostly targeting FPGAs smaller than the A7-200T.

But, yeah. I can currently go dual-core with a hardware rasterizer and
similar on the A7-200T.

Mostly limited to Single-core on the A7-100T.
And, need to strip features to fit on the S7-50.

However, I have noted that when one *can* fit on the S7-50, it is also possible to clock it a little higher.

A big chunk of resources, besides FPU and similar, is mostly taken up by
the L1 caches.
~ 22%: L1 caches (I$ and D$ and similar)
~ 10%: Main FPU (Binary64)
~ 5%: SIMD Unit (4x Binary32)
~ 7%: Decoder
~ 4%: 64-bit Int MUL/DIV (Shift-and-Add)
~ 13%: Integer ALU and similar (3 of them, ~ 4% each).
~ 20%: Register Files
~ 19%: Everything else...

With the CPU core using roughly 70% of the total resource budget of the
FPGA (L2, HW interfaces, etc, being most of the rest).

Most of the BRAM is eaten up by the L1 and L2 caches.
29%: L1 caches and TLB
58%: L2 cache
13%: Display Hardware Stuff.
Font/Palette RAM
Raster Cache
...

I think taking the target audience into consideration is important. If
one is trying to produce an example for a low-cost FPGA then there are limits to what can be done. Most people looking for starting-out type examples are not going to look at superscalar machines with FP ops. For
the more complex stuff people expect that larger, more expensive
hardware is required.

Well, or the A7-100T can apparently run a RV32GC SWeRV core at 33 MHz.

My core is at least more feature-rich than the SWeRV, though SWeRV does
seem to get better IPC. In this case it is In-Order, 2-wide superscalar,
with a 9 stage pipeline.

So, seemingly not doing too badly...

There was a Commodore64 / Commodore128 clone using the A7-200T, which
was almost tempting as an FPGA dev-board (they already had most of the
useful computer-style peripheral interfaces, etc).

But, one fatal flaw for my uses:
No External RAM chip, for emulating the C64/C128 they could do the whole
thing in Block-RAM on this FPGA, so, they did so.

I guess this was contrast with David Murray (The 8-Bit Guy) and his
"Commander X16" project, which was mostly using all DIP chips (apart
from the display interface, or VERA, which uses an FPGA, IIRC an S7-25
or similar).

If it were me, I wouldn't have bothered with an YM/OPL chip if going
with an FPGA for the VERA, and probably would have also run the sound and
music on the FPGA. Well, actually, I probably would have just gone all
FPGA, as by that point, a slightly bigger FPGA is likely cheaper than
sourcing a bunch of legacy DIP chips (even if most are still technically in-production by NXP and similar).

Like, one is maybe leaving themselves open if they are depending on a
mix of "New Old Stock" chips and for NXP to keep on making clones of
various 40 year old chips and similar.

Nor is there likely to be much more than niche demand for "Modern built machine that exists as a semi-compatible replica of the Commodore 64".

I think getting IEEE correct results does not use much more logic than simpler approaches. One is talking small percentages. What is the difference? Hundreds of LUTs in an FPGA with tens of thousands of LUTs available? 64-bit FMA is using about 3600 LUTs with sub-normals and rounding.

Besides LUTs, there is also latency (I would likely need around 10
cycles or so for such a unit).

Also format conversion needs renormalization to deal with subnormals, so
the faster 1 cycle converters effectively get replaced by needing a full
pass through the FMA (or, alternatively, logic to detect which case it
is, and either "fast or slow path" it). In the fast/cheap paths, the
format converters essentially just moving bits around.

Then there is the issue that the FMUL and FADD parts need to be wider to give Single-Rounded Results.

Possible, but not particularly fast or cheap in this case.

As noted:
The other option is to do a cheap FPU that "usually" gives the correct results, and is able to detect and raise a fault in the cases when it
can't (letting software take over and then run the "correct" math using
large integers).
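
A minimal sketch of that control flow, with the fault modeled as a Python exception. Everything here is hypothetical (`cheap_fmul`, `NeedsEmulation`), and the "cheap" path just uses the host multiply, so only the detect-and-fall-back structure is illustrated. The slow path uses exact rational arithmetic, which Python rounds correctly on conversion back to a float:

```python
import math
import sys
from fractions import Fraction

MIN_NORMAL = sys.float_info.min

class NeedsEmulation(Exception):
    """Stands in for the fault raised by the cheap FPU."""

def _is_subnormal(x):
    return x != 0.0 and abs(x) < MIN_NORMAL

def cheap_fmul(x, y):
    """Fast path: handles normal-range cases, faults on anything else."""
    if not (math.isfinite(x) and math.isfinite(y)):
        return x * y                      # Inf/NaN handled directly
    if _is_subnormal(x) or _is_subnormal(y):
        raise NeedsEmulation              # subnormal input
    r = x * y
    if abs(r) < MIN_NORMAL and x != 0.0 and y != 0.0:
        raise NeedsEmulation              # result underflowed the normal range
    return r

def fmul(x, y):
    """What software sees: fast path plus trap-and-emulate fallback."""
    try:
        return cheap_fmul(x, y)
    except NeedsEmulation:
        # Exact product as a big rational, then one correct rounding.
        return float(Fraction(x) * Fraction(y))

print(fmul(1.5, 2.0), fmul(5e-324, 2.0), fmul(2.0 ** -600, 2.0 ** -500))
```

The appeal of the scheme is that the slow path only needs to be correct, not fast, provided the fast path reliably detects every case it cannot handle.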

If one is not looking for IEEE compatibility there may be other
approaches to FP that might use fewer resources. Two's complement representation?

Going over to different formats is a much bigger issue for software.

Software tends to interact enough with the FPU formats that using
different / non-standard formats is going to "throw a wrench into things".

But, that said, a format like, say:
I24.E8
I52.E12

Which treats the mantissa as a combination of potentially non-normalized signed integer value and an exponent, could potentially allow for a
cheaper FPU, if also using some special cases.

Would still be pros/cons though:
While you could split FADD/FMUL and re-normalization into separate
steps, some other types of instructions could either no longer rely on
the mantissa always being normalized, or one would need to make mantissa normalization a mandatory extra step in many cases.

Would allow for cheaper FADD/FSUB logic, and for less latency (since the normalization stage goes away here).

Pretty much no one did floating-point this way IIRC.

In some ways, it would also display a bit more "jank", for example if
chaining multiple operations without a normalization step results in a
loss of precision as the mantissa scale and exponent drift out of sync.
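
A toy model of the idea, assuming a value is a pair (signed 24-bit mantissa, exponent) meaning mantissa * 2^exponent, with normalization as an optional separate step. The layout and helper names are guesses for illustration only, not a worked-out format:

```python
MANT_BITS = 24                       # signed mantissa width, from "I24"
M_MIN, M_MAX = -(1 << (MANT_BITS - 1)), (1 << (MANT_BITS - 1)) - 1

def fit(m, e):
    """Shift right until the mantissa fits the signed field."""
    while not (M_MIN <= m <= M_MAX):
        m >>= 1
        e += 1
    return m, e

def fadd(a, b):
    """Align to the larger exponent and add; no normalization step."""
    (ma, ea), (mb, eb) = a, b
    e = max(ea, eb)
    return fit((ma >> (e - ea)) + (mb >> (e - eb)), e)

def fmul(a, b):
    """Multiply mantissas, add exponents; no normalization step."""
    return fit(a[0] * b[0], a[1] + b[1])

def normalize(v):
    """Separate step: shift left until the top magnitude bit is used."""
    m, e = v
    if m == 0:
        return 0, 0
    while -(1 << (MANT_BITS - 2)) <= m < (1 << (MANT_BITS - 2)):
        m <<= 1
        e -= 1
    return m, e

def value(v):
    return v[0] * 2.0 ** v[1]

a = (1, 0)     # 1.0, held with a tiny mantissa and a large exponent
b = (3, -4)    # 0.1875
print(value(fadd(a, b)))                         # alignment shifts b away
print(value(fadd(normalize(a), normalize(b))))   # normalized first
```

In the unnormalized add, aligning `b` to `a`'s exponent shifts all of its bits out, which is the "mantissa scale and exponent drifting out of sync" problem in miniature.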

Don't necessarily need to go the direction of making the FPU even more
jank though.

...


From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 20 17:17:43 2026

From Newsgroup: comp.arch

I was about to increase the number of header types again, so that extra additional instruction sets could be combined with the full instruction
set.
I did add the new header type, but I removed an old one. And, as a bonus,
that let me remove the 16-bit short instruction format.

John Savard

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Apr 20 17:47:27 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

There is the method of logical effort, see
https://en.wikipedia.org/wiki/Logical_effort . I have not made
much effort to do calculations using that method.

Yes, I haven't actually used it either.
Sutherland has examples of the gate merging I'm referring to.
Section 4.4 Asymmetric logic gates figure 4.3 has an example of
((A and B) nor C)
which merges the AND and NOR gates, so instead of 2 gates and 8 transistors
it's 1 gate and 6 transistors.

See https://en.wikipedia.org/wiki/AND-OR-invert#/media/File:AOI21_complex_vs_standard_gates.svg
for an example.

AOI (and their dual, OAI) gates are quite cool.

An alternative would be to use an actual library as an example.
A company called Nangate released an open-sourced library (google
for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
for which delay calculations can be done as example, for example
using Berkeley ABC. That program can also do optimizations
(although it cannot handle gates with more than one output, such as
full adders, and has weaknesses in stability). I haven't tried to
model wire delays with this.

A while ago I was rummaging about and found the individual gate
delay info in the open source Process Design Kit (PDK) files.

https://skywater-pdk.readthedocs.io/en/main/ https://github.com/google/skywater-pdk

That looks interesting.

In CMOS sometimes one is able to smoosh gates together and eliminate
gate delays, but the rules for when smooshing is allowed are not
obvious to me. I just assumed that it all sorts out in SPICE simulation.

AOI and friends also work in TTL, I believe.

Yes but you don't get to merge gates together to shorten the delay.
You only get to choose from the packages available
and for most situations just scan the spec sheet and use
the max of all the propagation delays.

Sure, if you design a chip on a silicon wafer you have much more
freedom.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Paul Clayton@paaronclayton@gmail.com to comp.arch on Wed Apr 22 00:00:47 2026

From Newsgroup: comp.arch

On 4/17/26 9:11 PM, MitchAlsup wrote:
[snip]

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of a restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'.

I thought that some designs used intentional clock skew. If the
natural places to divide pipeline stages results in different
logic depths, a skewed clock would enable some stages to borrow
time from others (I think).

Perhaps this is not called 'skew'.

(I have also read that pipeline stage delay can be kept constant
and area/power traded with time, i.e., a normally longer stage
can spend area/power to reduce delay and a normally shorter
stage can save area/power by spending the delay slack.)

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Apr 22 18:19:50 2026

From Newsgroup: comp.arch

Paul Clayton <paaronclayton@gmail.com> posted:

On 4/17/26 9:11 PM, MitchAlsup wrote:
[snip]

In fast designs, there is an entire team charged with buffering and
routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
edge and falling edge with less than 1 gate of delay 'skew' across
the whole chip using wires that have more than 1 gate of delay when
jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
machine the size of a restaurant refrigerator using wires with 2ns/foot
of delay. In ASIC designs, we assume (starting out) that there will
be 1/2 clock of skew in the 'clock'.

I thought that some designs used intentional clock skew. If the
natural places to divide pipeline stages results in different
logic depths, a skewed clock would enable some stages to borrow
time from others (I think).

In the not so distant past, Intel would use as many as 10 clock edges
to carefully time logic blocks. I think <essentially> everyone else
uses only 2 clock edges {rising and falling} and most only use {rising}.

Perhaps this is not called 'skew'.

Skew is uncontrolled displacement of clock edge {early or late}. Skew
is only harmful when a sending block and a receiving block have different
skew, leaving the logic insufficient time to do its function.

Offset is controlled displacement of clock edge.
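
The effect of a controlled offset on an unbalanced pipeline can be sketched with a simple setup check. This is only an illustration under stated assumptions: the delays and period are invented, times are integer picoseconds, and setup time, clock-to-Q delay and hold checks are ignored.

```python
PERIOD_PS = 1000            # hypothetical clock period
STAGE_DELAYS_PS = [1200, 800]  # hypothetical logic delays of two adjacent stages

def stage_slack(period, logic_delay, launch_offset=0, capture_offset=0):
    """Setup slack of one stage: time available between the launching edge
    and the capturing edge, minus the logic delay. Negative means failure."""
    return (period + capture_offset - launch_offset) - logic_delay

def slacks(mid_offset):
    """Registers R0 -> stage 0 -> R1 -> stage 1 -> R2.
    Only R1's clock is displaced, by mid_offset; R0 and R2 are at offset 0."""
    return [
        stage_slack(PERIOD_PS, STAGE_DELAYS_PS[0], 0, mid_offset),
        stage_slack(PERIOD_PS, STAGE_DELAYS_PS[1], mid_offset, 0),
    ]

print(slacks(0))    # no offset: the long stage misses timing
print(slacks(200))  # R1 clocked 200 ps late: the long stage borrows from the short one
```

Delaying the middle register's clock gives the longer stage extra time and takes the same amount from the shorter one. If the same displacement were uncontrolled it would be skew, and it could just as easily land on the wrong side.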

(I have also read that pipeline stage delay can be kept constant
and area/power traded with time, i.e., a normally longer stage
can spend area/power to reduce delay and a normally shorter
stage can save area/power by spending the delay slack.)

All sorts of engineering tricks are played "around the clock edge"
to make timing.

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 01:34:13 2026

From Newsgroup: comp.arch

On Tue, 17 Mar 2026 18:16:16 +0000, MitchAlsup wrote:

5 pounds of sand does not fit in a 4 pound bag !

Indeed.

The basic load-store instruction set takes up about 75% of the opcode
space of 32-bit instructions.

Including pairs of short instructions, sharing a 32-bit word, takes up
about 25% of the opcode space of 32-bit instructions.

The trouble is, though, I need a few other things.

I needed 32-bit operate instructions and additional memory-reference instructions. In my current iteration, I squeeze them out of a few unused opcodes in the short instructions. I had to squash the operate
instructions down to half the space I had been using for them to do this.

Operate instructions: about 1/128 of the opcode space; extra memory-
reference instructions: about 1/64 of the opcode space.

Also, I wanted a header that took up 1/16 of the opcode space, as the preferred simplest way to call for variable-length instructions. I had two spare opcodes in the 32-bit load-store instructions, but I had wanted to
hang on to them for one extra instruction. I finally decided to limit the destination registers for the load address instruction so I could grab
both opcodes.
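
As a sanity check on the bookkeeping, the fractions quoted above can be added up. This is a sketch only: the figures are the approximate ones from this post, and it assumes the operate and extra memory-reference space is carved out of the short-instruction quarter while the header comes out of the load-store three quarters, as described.

```python
from fractions import Fraction as F

# Approximate shares of the 32-bit opcode space, as quoted in the post.
LOAD_STORE = F(3, 4)
SHORT_PAIRS = F(1, 4)
OPERATE = F(1, 128)
EXTRA_MEM_REF = F(1, 64)
HEADER = F(1, 16)

# Assumption: where each extra item takes its space from.
budget = {
    "load-store": LOAD_STORE - HEADER,
    "paired short instructions": SHORT_PAIRS - OPERATE - EXTRA_MEM_REF,
    "32-bit operate": OPERATE,
    "extra memory-reference": EXTRA_MEM_REF,
    "variable-length header": HEADER,
}

total = sum(budget.values())
for name, share in budget.items():
    print(f"{name:28s} {share}")
print("total:", total)
```

With these assumptions the pieces account for the whole space with nothing left over, which matches the remark that the opcode space has run out.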

Now I really have run out of opcode space as far as general 32-bit instructions are concerned. I had to toss out the feature I had briefly
added, of an alternate instruction set where memory operations were
aligned, so that paired short instructions without register restrictions
could be included at 50% of the opcode space. (That was because I didn't
have a spare bit in the header for the type of code this would have been
most useful with; of course, since an alternate instruction set is in a separate opcode space, I could still have it, just at a higher overhead
cost of requesting it for a block; it just was no longer worth having, or
so it seems.)

John Savard

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 03:11:33 2026

From Newsgroup: comp.arch

On Fri, 01 May 2026 01:34:13 +0000, quadi wrote:

I had to toss out the feature I had briefly
added, of an alternate instruction set where memory operations were
aligned, so that paired short instructions without register restrictions could be included at 50% of the opcode space.

It turns out I did have enough room in the opcode space used for headers
to indicate the use of this instruction set for all three of the header
types where I had used a bit for it.

But there is another more fundamental problem: the source of the extra
opcode space for operate instructions and additional memory-reference instructions is now different from what it was, and so I would have to
give those instructions new opcodes in order to fit in that alternate instruction set which differ from those they have in the regular one.

At the moment, I don't think that's worth the trouble.

John Savard

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 05:16:41 2026

From Newsgroup: comp.arch

On Fri, 01 May 2026 03:11:33 +0000, quadi wrote:

At the moment, I don't think that's worth the trouble.

Instead, I ended up adding something else which resulted in my having to
toss out a header type I had tossed out before, because I added a new type
of header which demanded extra bits.

Having squeezed the instruction set so much to fit it into the available
space, I felt that what was most desperately needed was a convenient and
low-overhead way to switch into an alternate set of 32-bit instructions
(which was already present, but only available for large headers for
variable-length instruction code).

In addition to adding the new header, I added some additional needed
instructions to the alternate instruction set - and corrected a mistake in
it as well.

John Savard

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 14:18:10 2026

From Newsgroup: comp.arch

On Fri, 01 May 2026 05:16:41 +0000, quadi wrote:

In addition to adding the new header, I added some additional needed instructions to the alternate instruction set - and corrected a mistake
in it as well.

It turned out that there was still another mistake. I left out the much-
needed Subroutine Jump with Offset instruction.

And now that the architecture includes both a plain Subroutine Jump instruction _and_ a Subroutine Jump with Offset instruction, I needed to explain what each one *did* carefully. Because without such an
explanation, it would not be clear how the ordinary subroutine jump
without an offset could even _work_, given how headers and pseudo-
immediates lead to non-executable matter being placed within code.

So I noted that a regular subroutine jump instruction makes use of
information within the current instruction block to correctly make the
start of the next executable instruction the return address. Since it
doesn't fetch the _next_ block, though, if it's the last executable instruction in a block, return is merely to the start of the following instruction block.

Jumping to the start of an instruction block, in general, from any branch instruction of any kind, causes control to be transferred to the first executable instruction in the block as identified by its header (or lack thereof). Otherwise, branching to a location within an instruction block
that is identified as not executable by the block header causes an error.

The Subroutine Jump with Offset contains an explicit offset to add to the return address, so it doesn't attempt to adjust the return address to be
sure it is to something executable; here, the compiler takes care of that
for greater efficiency.
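
The return-address rules above can be sketched as a small model. Everything concrete here is an assumption for illustration: a block is treated as a list of 32-bit slots with a per-slot "executable" flag derived from its header, the example block layout is invented, and the offset form is taken to add its offset to the address just past the call.

```python
WORD_BYTES = 4  # assumption: every slot is one 32-bit word

def plain_return_address(block_base, executable, call_slot):
    """Return address for a plain Subroutine Jump sitting in call_slot:
    the next executable slot in the same block, if there is one."""
    for slot in range(call_slot + 1, len(executable)):
        if executable[slot]:
            return block_base + slot * WORD_BYTES
    # Last executable instruction in the block: return to the next block's start.
    return block_base + len(executable) * WORD_BYTES

def landing_address(block_base, executable, target):
    """Where a branch to `target` inside this block actually lands."""
    if target == block_base:
        first = executable.index(True)  # first executable slot per the header
        return block_base + first * WORD_BYTES
    slot = (target - block_base) // WORD_BYTES
    if not executable[slot]:
        raise ValueError("branch into non-executable part of block")
    return target

def offset_return_address(call_address, offset):
    """Subroutine Jump with Offset: no adjustment, the compiler supplies it."""
    return call_address + WORD_BYTES + offset

# Hypothetical 8-word block: slot 0 is a header, slots 3 and 5..7 are pseudo-immediates.
BLOCK = [False, True, True, False, True, False, False, False]
print(hex(plain_return_address(0x1000, BLOCK, 2)))  # skips the non-executable slot 3
print(hex(plain_return_address(0x1000, BLOCK, 4)))  # falls through to the next block
```

A return to the start of the following block then goes through the same rule as any other branch to a block start, so it ends up at that block's first executable instruction.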

John Savard

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri May 1 17:36:27 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

On Tue, 17 Mar 2026 18:16:16 +0000, MitchAlsup wrote:

5 pounds of sand does not fit in a 4 pound bag !

Indeed.

The basic load-store instruction set takes up about 75% of the opcode
space of 32-bit instructions.

Including pairs of short instructions, sharing a 32-bit word, takes up
about 25% of the opcode space of 32-bit instructions.

The trouble is, though, I need a few other things.

An architecture is as much about what you leave out as what you leave in.

I needed 32-bit operate instructions and additional memory-reference instructions. In my current iteration, I squeeze them out of a few unused opcodes in the short instructions. I had to squash the operate
instructions down to half the space I had been using for them to do this.

Operate instructions: about 1/128 of the opcode space; extra memory- reference instructions: about 1/64 of the opcode space.

Also, I wanted a header that took up 1/16 of the opcode space, as the preferred simplest way to call for variable-length instructions. I had two spare opcodes in the 32-bit load-store instructions, but I had wanted to hang on to them for one extra instruction. I finally decided to limit the destination registers for the load address instruction so I could grab
both opcodes.

Now I really have run out of opcode space as far as general 32-bit instructions are concerned. I had to toss out the feature I had briefly added, of an alternate instruction set where memory operations were
aligned, so that paired short instructions without register restrictions could be included at 50% of the opcode space. (That was because I didn't have a spare bit in the header for the type of code this would have been most useful with; of course, since an alternate instruction set is in a separate opcode space, I could still have it, just at a higher overhead
cost of requesting it for a block; it just was no longer worth having, or
so it seems.)

John Savard


From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 19:00:20 2026

From Newsgroup: comp.arch

On Fri, 01 May 2026 17:36:27 +0000, MitchAlsup wrote:

An architecture is as much about what you leave out as what you leave
in.

That certainly is true.

And no matter what I do, since there are an infinity of possibilities, I
will always have left out more than I've included.

In Concertina II, it certainly does seem like I've included a lot. The ISA could be said to be bulging at the seams, like an overstuffed suitcase. I
can see how that would seem to be a bad choice.

Where is the emphasis in Concertina II? What does it prefer to leave in,
and what is it content to leave out?

The Motorola 68000 and the 80386 had groups of eight registers. The IBM
360 had sixteen integer registers, but only four floating-point ones. Most RISC processors have 32 registers.

Traditional processors had instructions that performed an arithmetic
operation from memory into a register. RISC processors do arithmetic only
in registers, with operations to load and store from memory.

The System/360 addressed memory with a base register and an index register
in addition to a 12-bit displacement. Most microprocessors use 16-bit displacements, but usually only let you use one register with it.

I've tried to encompass all the features of these different processors as
much as I could.

Accessing variables in arrays needs an index register for which element in
the array is being accessed, and a base register in addition to the small displacement in the instruction.

I felt I couldn't leave _that_ out.

So I did leave other things out.

I stuck with load/store instructions instead of memory operate
instructions in the primary instruction set.

For both base and index registers, I used three-bit fields in the
instruction to specify them. So while there were 32 integer registers for integer arithmetic, and 32 floating-point registers for floating-point arithmetic, only seven of the integer registers could be index registers,
and only seven of the integer registers could be base registers for 16-bit displacements. (Another seven work with 12-bit displacements, and another seven with 20-bit displacements, and one other works with 15-bit displacements. This way, a register contains an address pointing to a
block of data of a given size, all of which can be accessed by the instructions using it as a base register.)

With varying amounts of overhead, though, one can use memory operate instructions, 20 bit displacements, and register banks with 128 registers
like the Itanium.

If something is good stuff that will make programs run faster, I want to include it.

I left out stack-oriented instructions, memory with type labels... no inspiration from the Burroughs 5000 here!

But if I can squeeze in base-index addressing and register banks with 32 registers, then I feel I've included two important things that will avoid programs being less efficient than they could be.

The block structure let me
- with modest overhead, switch to a different set of choices, so that one
ISA could serve a variety of applications
- have variable length instructions without changing an instruction set of 32-bit instructions designed to be in pure 32-bit code like with RISC,
instead of restricting the 32-bit instructions to 50% or less of the
opcode space (instead, they're 75%, because even in header-less RISC-like mode, I saw having operate instructions that only take up 16 bits as too important to exclude).
- have immediate values of all data types, without that making indicating instruction lengths more complicated.

Your argument for immediates made sense, but most ISAs exclude having
general immediates as too complicated - so I tried to find a highly conventional solution.

I definitely tried to lean away from super-CISC like Burroughs or even the VAX. I wanted to combine plain CISC (the 360) with RISC - the efficiencies both CISC and RISC provide. Block headers let the lengths of instructions
be determined in parallel, which seems fast... even though you are correct that instruction decoding is done so far ahead of time, it's not really a bottleneck. But the block structure also saves on opcode space.

John Savard

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 19:25:59 2026

From Newsgroup: comp.arch

On Fri, 01 May 2026 19:00:20 +0000, quadi wrote:

On Fri, 01 May 2026 17:36:27 +0000, MitchAlsup wrote:

An architecture is as much about what you leave out as what you leave
in.

In Concertina II, it certainly does seem like I've included a lot. The
ISA could be said to be bulging at the seams, like an overstuffed
suitcase. I can see how that would seem to be a bad choice.

To summarize - I've tried to include in Concertina II the features that obviously improve performance, and are common in many well-known ISAs.

That led me to include base-index addressing, 16-bit displacements, and
banks of 32 registers. Combining all three of these was difficult enough,
so I did leave out memory operate instructions from the main instruction
set.

I did make one clear choice: when instruction lengths are variable,
they're variable in multiples of 16 bits. I liked the 360 and the 68020, I thought 16-bit and 48-bit instructions were efficient... but 8 bits was overkill; I didn't see x86 or the VAX as models to emulate.

And, yes, in some areas it may seem that I'm avoiding choices. But that is
a choice - a choice to have an ISA that doesn't dictate to the programmer
or the implementor one way of doing things. So one can choose a
programming style and an implementation that are best suited to one's task.

John Savard

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Fri May 1 12:29:01 2026

From Newsgroup: comp.arch

On 5/1/2026 12:00 PM, quadi wrote:

On Fri, 01 May 2026 17:36:27 +0000, MitchAlsup wrote:

An architecture is as much about what you leave out as what you leave
in.

That certainly is true.

And no matter what I do, since there are an infinity of possibilities, I
will always have left out more than I've included.

In Concertina II, it certainly does seem like I've included a lot. The ISA could be said to be bulging at the seams, like an overstuffed suitcase. I
can see how that would seem to be a bad choice.

Where is the emphasis in Concertina II? What does it prefer to leave in,
and what is it content to leave out?

The Motorola 68000 and the 80386 had groups of eight registers. The IBM
360 had sixteen integer registers, but only four floating-point ones. Most RISC processors have 32 registers.

Traditional processors had instructions that performed an arithmetic operation from memory into a register. RISC processors do arithmetic only
in registers, with operations to load and store from memory.

The System/360 addressed memory with a base register and an index register
in addition to a 12-bit displacement. Most microprocessors use 16-bit displacements, but usually only let you use one register with it.

I've tried to encompass all the features of these different processors as much as I could.

Accessing variables in arrays needs an index register for which element in the array is being accessed, and a base register in addition to the small displacement in the instruction.

I felt I couldn't leave _that_ out.

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them
doesn't save you anything - you still have to add to a register to
increment the element number. In fact it costs a little in the hardware
as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why
include it?
--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 01:41:53 2026

From Newsgroup: comp.arch

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them
doesn't save you anything - you still have to add to a register to
increment the element number. In fact it costs a little in the hardware
as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why
include it?

It's true that address calculation, especially for multi-dimensional
arrays, involves extra steps.

Base-index addressing doesn't force two additions every time; one chooses whether or not an instruction is indexed.

But when one is referring to an array element, it saves adding the displacement either to the address or the base value by means of an
explicit add instruction. One doesn't save a register by not having
indexing. One is still used to contain the modified address.

John Savard


From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 02:17:00 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them doesn't save you anything - you still have to add to a register to increment the element number. In fact it costs a little in the hardware
as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why include it?

It's true that address calculation, especially for multi-dimensional
arrays, involves extra steps.

Base-index addressing doesn't force two additions every time; one chooses whether or not an instruction is indexed.

People forget about the extra instructions:: lack of base+index causes
a) longer latency
b) more instructions
c) larger code footprint
d) compiler has to work harder
e)...

So the 2%-4% of its use causes 4%-8% more instructions which can be
eliminated for 1-extra gate of delay (3-input adder versus 2-input).

It is a more delicate balance than one presupposes.

Given base+index+displacement there are never any support instructions
in memory access. Given displacement can be {16-bits, 32-bits,
or 64-bits} all of memory is accessible in a single instruction...
ALWAYS !! This gets rid of another 3%-ish of instruction footprint
tipping the balance from 6%-ish (average of above) to 12%-ish (with
these additional savings), tipping the balance towards "put it in".
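
The "support instruction" being counted here can be sketched as follows. The register names, the temporary `t0`, and the pseudo-assembly in the comments are made up for illustration, and the index is assumed to be a byte offset already (no scaling).

```python
def load_base_index_disp(regs, base, index, disp):
    """One instruction: the 3-input adder forms base + index + disp.
    Returns (effective address, instructions issued)."""
    return regs[base] + regs[index] + disp, 1

def load_base_disp_only(regs, base, index, disp, temp="t0"):
    """Two instructions when only base + disp is available:
    an explicit ADD into a temporary, then the load itself."""
    regs = dict(regs)                        # leave the caller's registers alone
    regs[temp] = regs[base] + regs[index]    # ADD t0, base, index
    return regs[temp] + disp, 2              # LD  rd, [t0 + disp]

regs = {"r1": 0x1000, "r2": 0x40}
print(load_base_index_disp(regs, "r1", "r2", 8))
print(load_base_disp_only(regs, "r1", "r2", 8))
```

Both paths reach the same address. The second costs an extra instruction and ties up a temporary register, which is the trade against one extra gate of delay in the adder.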

But when one is referring to an array element, it saves adding the displacement either to the address or the base value by means of an
explicit add instruction. One doesn't save a register by not having indexing. One is still used to contain the modified address.

John Savard


From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 05:09:08 2026

From Newsgroup: comp.arch

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong?

This is a kind of question that is hard to answer.

Do I think that I'm smarter than all those guys who designed RISC
processors? No, of course not.

But a lot of smart guys worked on the System/360 also. So the question may
be whether or not my goals are different from theirs.

After all, when the very first RISC machines came out, they didn't include floating-point arithmetic; while that was partly due to enough gates not
yet being available, the rationale was given that floating-point
arithmetic couldn't be done in one cycle.

That was quickly rejected as silly.

Index registers were considered a good idea back when they were originally introduced. It meant you could redirect an instruction to point somewhere
else without modifying the instruction in memory.

Base registers became a necessity once computer memories got so large -
over 64K locations or thereabouts - that it wasn't practical to put whole addresses in instructions. So the base register, although it works the
same way as an index register, does something different - an index
register might be incremented once per loop, while base registers are left alone.

So accessing an array requires one basically to copy a base register value into another register, and add the index to it. That's an extra
instruction. It may not be needed for every array access, as you can still increment that modified base value. (Hmm. So since an addition is removed
from address calculation in the instruction, one _could_ claim that
lacking index/base addressing forces an *optimization* to be done. I'll
have to think about that.)

John Savard

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 2 06:19:43 2026

From Newsgroup: comp.arch

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:

On 5/1/2026 12:00 PM, quadi wrote:

The System/360 addressed memory with a base register and an index register in addition to a 12-bit displacement.

The S/360 has general-purpose registers,

Most microprocessors use 16-bit
displacements, but usually only let you use one register with it.

How do you count "most"? There are a lot of AMD64 processors that
offer reg+reg*[1248]+offset.

But every other modern architecture decided that they didn't need
separate base registers.

The S/360 does not have separate base registers.

One might consider the FS and GS registers of AMD64 to be dedicated
base registers. AFAIK FS is used for thread-local variables in some
OSs. But for single-threaded code, I have never seen any compiler use
FS or GS, and have not seen any assembly language program (other than
those for demonstrating their existence) use FS or GS, either. So
there seems to be little need for separate base registers.

Concerning addressing modes that involve GPRs, there are a lot of
statistics around about their use, and you can make your own
relatively easily by observing the usage of addressing modes on AMD64.
Note that for some registers, AMD64 requires the use of a displacement
even if it is 0 (that's because the encoding that one would expect for
the displacementless use of these registers has another meaning).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From BGB@cr88192@gmail.com to comp.arch on Sat May 2 02:37:14 2026

From Newsgroup: comp.arch

On 5/1/2026 9:17 PM, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them
doesn't save you anything - you still have to add to a register to
increment the element number. In fact it costs a little in the hardware as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why
include it?

It's true that address calculation, especially for multi-dimensional
arrays, involves extra steps.

Base-index addressing doesn't force two additions every time; one chooses
whether or not an instruction is indexed.

People forget about the extra instructions:: lack of base+index causes
a) longer latency
b) more instructions
c) larger code footprint
d) compiler has to work harder
e)...

So the 2%-4% of its use causes 4%-8% more instructions which can be eliminated for 1-extra gate of delay (3-input adder versus 2-input).

It is a more delicate balance than one presupposes.

Given base+index+displacement there are never any support instructions
in memory access. Given displacement can be {16-bits, 32-bits,
or 64-bits} all of memory is accessible in a single instruction...
ALWAYS !! This gets rid of another 3%-ish of instruction footprint
tipping the balance from 6%-ish (average of above) to 12%-ish (with
these additional savings), tipping the balance towards "put it in".

It is a thing:
Base+Displacement: Very Common;
Base+Index: 2nd most common;
Base+Index+Displacement: Uncommon;
Load/Store with Inc/Dec: Uncommon (In general, *1)

*1: If one uses it as PUSH/POP and then uses PUSH/POP for prologs and
epilogs, it would be common. Otherwise, usage falls off a cliff. Despite
C having special operator syntax for this, it is infrequently used, and
PUSH/POP is typically nearly the only scenario where it emerges naturally
and is the most efficient way to approach the problem. One could
ask "what about memory and string operations?" but then find that,
while auto-increment intuitively makes sense here, it is often not the
most efficient way to implement these even when these operations are present.

Typically, the first two addressing modes eating nearly the entire
load/store pie, as it were.

Can also note though that also for Load/Store displacements, the vast
majority of accesses are within a limit of 1K..4K, so a larger
displacement is overkill except for certain use-cases or certain base registers.

For a scaled displacement, the sweet spot being around 9 or 10 bits.
5 or 6-bits: Not quite enough.
7-bits: Sorta works, high miss rate;
8-bits: OK, still misses a lot;
9-bits: Good;
10 bits: Good.

Nearly all displacements are normally aligned to the element size, so
using a raw byte displacement for 32 or 64 bit items is effectively
throwing the bits away (would need 12 or 13 bits for similar effectiveness).
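
The arithmetic behind the scaled-versus-raw comparison can be sketched as below. It assumes unsigned displacements and power-of-two element sizes; the helper names are made up for illustration.

```python
def reach_bytes(disp_bits, elem_size, scaled=True):
    """Span of memory an unsigned displacement field can cover."""
    span = 1 << disp_bits
    return span * elem_size if scaled else span

def equivalent_byte_bits(disp_bits, elem_size):
    """Bits a raw byte displacement needs to match a scaled one.
    Assumes elem_size is a power of two."""
    return disp_bits + (elem_size.bit_length() - 1)

for bits in (9, 10):
    for size in (4, 8):
        print(f"{bits}-bit scaled x{size}: reach {reach_bytes(bits, size)} bytes, "
              f"same as a {equivalent_byte_bits(bits, size)}-bit byte displacement")
```

For 64-bit items, a scaled 9- or 10-bit field covers what a 12- or 13-bit byte displacement would, which lines up with most accesses falling within 1K..4K.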

In my case, GP and PC being the main cases where one needs larger displacements.

There ended up being a few instructions in my case with a special
GP+Disp16 addressing mode.

Not for PC though, but in this case:
Spread relative to PC was too large;
Also not common enough to justify spending a significant chunk of
encoding space on it.

Though, this is where something like jumbo-prefixes worked well:
If 99% of the time, the small displacement works, and 1% of the time,
you can jump to 33 bits or so, which has a nearly 100% hit rate (and
when it doesn't hit, you are typically in need of absolute addressing).

...


From BGB@cr88192@gmail.com to comp.arch on Sat May 2 03:33:02 2026

From Newsgroup: comp.arch

On 5/2/2026 1:19 AM, Anton Ertl wrote:

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:

On 5/1/2026 12:00 PM, quadi wrote:

The System/360 addressed memory with a base register and an index register in addition to a 12-bit displacement.

The S/360 has general-purpose registers,

Most microprocessors use 16-bit
displacements, but usually only let you use one register with it.

How do you count "most"? There are a lot of AMD64 processors that
offer reg+reg*[1248]+offset.

Also, in AMD64, your choice is mostly between 8 and 32 bit displacements.

The 16-bit displacements were only really a thing in 16-bit mode, and in
32 and 64 bit mode one needs a prefix to encode a 16-bit displacement,
which isn't really worth it (would only save 1 byte over the 32-bit displacement).

Ironically, statistical distributions would also favor 8/32 over either
8/16 or 16/32 in this case (where 8-bits would hit often enough to make
it a good choice over 16 bits, and 16-bits would miss often enough to
leave a need for a 32-bit case, but 32-bits would hardly ever miss).

This is, even with a byte-scaled 8-bit displacement having a comparably
poor hit rate.

Comparably (excluding things like M68K or similar), Disp16 seems to be infrequent.

But every other modern architecture decided that they didn't need
separate base registers.

The S/360 does not have separate base registers.

One might consider the FS and GS registers of AMD64 to be dedicated
base registers. AFAIK FS is used for thread-local variables in some
OSs. But for single-threaded code, I have never seen any compiler use
FS or GS, and have not seen any assembly language program (other than
those for demonstrating their existence) use FS or GS, either. So
there seems to be little need for separate base registers.

The role that FS/GS serves can be instead served by having an ABI
register for this purpose.

For example, X4/TP in RISC-V, etc.

Concerning addressing modes that involve GPRs, there are a lot of
statistics around about their use, and you can make your own
relatively easily by observing the usage of addressing modes on AMD64.
Note that for some registers, AMD64 requires the use of a displacement
even if it is 0 (that's because the encoding that one would expect for
the displacementless use of these registers has another meaning).

Yeah, Mod/RM byte stuff is a little wonky here...

Also, the encodings sort of punish one for using ESP as a base
register despite it being one of the most frequently used base
registers (effectively one more often needs to pay the cost of using
a SIB byte). Though, traditional x86 ABIs used EBP rather than ESP for
accessing stack locals (but the use of frame pointers mostly went away
with the 64-bit ABIs).

Say:
[rAX/rCX/rDX/rBX/rSI/rDI ] : 1-byte
[rAX/rCX/rDX/rBX/rBP/rSI/rDI+Disp8 ] : 2-bytes
[rAX/rCX/rDX/rBX/rBP/rSI/rDI+Disp32] : 5-bytes
[rSP+Disp8 ]: 3 bytes (SIB tax)
[rSP+Disp32]: 6 bytes (SIB tax)

rSP being an escape-case to the SIB byte, and the would-be [rBP]
encoding an Abs32 case in 32-bit x86, and [RIP+Disp32] in X64.

Then:
[Rb+Ri*{1/2/4/8}]
[Rb+Ri*{1/2/4/8}+Disp8]
[Rb+Ri*{1/2/4/8}+Disp32]
Via the SIB.
Where:
If Ri==rSP, this encodes Ri=ZERO.
...

Where, because decoding doesn't care about REX, similar wonk applies to
R12 and R13.
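
As a rough illustration of the byte counts listed above, here is a minimal Python sketch. It is not from the original post; the names REGS and addr_bytes are made up for this example. It counts only the Mod/RM, SIB and displacement bytes for a plain [base+disp] operand in 64-bit mode (no index register, no prefixes or opcode bytes), and assumes the displacement fits in 32 bits.

```python
# Register numbers 0..15; only the low 3 bits go into ModRM.rm,
# so r12 aliases rsp and r13 aliases rbp for the special cases.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def addr_bytes(base, disp=0):
    """Bytes of Mod/RM + SIB + displacement needed for [base+disp]."""
    low3 = REGS.index(base) & 7
    size = 1                      # the Mod/RM byte itself
    if low3 == 4:                 # rm=100 is the SIB escape (rsp, r12)
        size += 1
    if disp == 0 and low3 != 5:   # mod=00: no displacement byte needed
        return size
    # rbp/r13 with mod=00 would mean [RIP+Disp32], so a zero Disp8 is forced
    if -128 <= disp <= 127:
        return size + 1           # mod=01: Disp8
    return size + 4               # mod=10: Disp32


for reg, disp in [("rax", 0), ("rax", 8), ("rax", 1000),
                  ("rsp", 8), ("rsp", 1000), ("rbp", 0), ("r13", 0)]:
    print(f"[{reg}+{disp}]: {addr_bytes(reg, disp)} byte(s)")
```

The rsp and r12 rows show the "SIB tax", and the rbp and r13 rows show the forced zero displacement.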

...

But, yeah:
0..N prefix bytes
Optional REX prefix
1/2 byte main opcode
optional Mod/RM (depends on opcode)
optional 1/4/8 byte main immediate (depends on opcode).

Sort of amazing they managed to make instruction decoding scale as well
as it has.

- anton

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 12:15:50 2026

From Newsgroup: comp.arch

On Sat, 02 May 2026 06:19:43 +0000, Anton Ertl wrote:

The S/360 has general-purpose registers,

The S/360 does not have separate base registers.

That's true. But I think he was talking about the fact that S/360
memory-reference instructions had one field to specify a general register to use
as the index register, and another field to specify a general register to
use as the base register, while most modern architectures only have _one_
field to specify a register the contents of which are to be added to the
displacement.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 05:37:21 2026

From Newsgroup: comp.arch

On 5/1/2026 10:09 PM, quadi wrote:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong?

This is a kind of question that is hard to answer.

Do I think that I'm smarter than all those guys who designed RISC
processors? No, of course not.

But a lot of smart guys worked on the System/360 also. So the question may
be whether or not my goals are different from theirs.

Not goals exactly, but constraints. Remember, S/360 had no memory
address relocation hardware such as a hardware base register or paging.
The addresses in a program were real memory addresses. Thus address say
1,000 in a program was real memory address 1,000. So absent base
registers, you couldn't have more than one program in memory at the same
time, since program1's address 1,000 would have referred to the same
real memory address as program2's address 1,000. That's why each
program started with a BALR instruction to put the real address of where
the program was loaded into a base register and memory reference
instructions had a base register field to add that address to the one specified in the instruction.
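
To make the relocation argument above concrete, here is a small Python sketch of S/360-style address formation. It is an illustration only: the function name, the choice of R12, and the load addresses 0x5000 and 0x9000 are assumptions made up for this example, not anything from the post.

```python
ADDRESS_MASK = 0xFFFFFF  # S/360 addresses are 24 bits wide


def effective_address(regs, base, disp, index=0):
    """Base + index + 12-bit displacement; register number 0 contributes zero."""
    if not 0 <= disp < 4096:
        raise ValueError("displacement must fit in 12 unsigned bits")
    b = regs[base] if base else 0
    x = regs[index] if index else 0
    return (b + x + disp) & ADDRESS_MASK


# Two copies of the same code, loaded at different (hypothetical) addresses.
regs_a = [0] * 16
regs_b = [0] * 16
regs_a[12] = 0x5000   # roughly what BALR would leave in R12 for program A
regs_b[12] = 0x9000   # same instruction stream, loaded elsewhere

addr_a = effective_address(regs_a, base=12, disp=1000)
addr_b = effective_address(regs_b, base=12, disp=1000)
print(hex(addr_a), hex(addr_b))
```

The same displacement of 1000 lands at two different real addresses purely because the base register holds a different value in each program.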

Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to
different real memory addresses, thus don't need to specify a base
address in every memory reference instruction.

After all, when the very first RISC machines came out, they didn't include floating-point arithmetic; while that was partly due to enough gates not
yet being available, the rationale was given that floating-point
arithmetic couldn't be done in one cycle.

That was quickly rejected as silly.

Index registers were considered a good idea back when they were originally introduced. It meant you could redirect an instruction to point somewhere else without modifying the instruction in memory.

Base registers became a necessity once computer memories got so large -
over 64K locations or thereabouts - that it wasn't practical to put whole addresses in instructions.

No. If that were true then every other CPU design that supported more
than your 64K locations, including current ones, would require explicit
base address registers specifiers in instructions. The use of virtual
memory, e.g. paging, obviates that requirement.

So the base register, although it works the
same way as an index register, does something different - an index
register might be incremented once per loop, while base registers are left alone.

So accessing an array requires one basically to copy a base register value into another register, and add the index to it.

No. Do you think that current CPU designs require that? They do not.
You simply load the starting address of the array into an index register
and add to that as needed.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21f-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Sat May 2 15:44:39 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> writes:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong?

This is a kind of question that is hard to answer.

Do I think that I'm smarter than all those guys who designed RISC
processors? No, of course not.

But a lot of smart guys worked on the System/360 also.

It was a product of the times. We've advanced well beyond
that in the intervening half-century. Really, the 360 operating
systems were relatively crude and difficult to use compared
with the contemporaneous competition.

It's true, however, that no CEO was ever fired for buying IBM.

Although per Google:

Origin: The saying gained prominence during the 1980s,
when IT decisions were driven by Fear, Uncertainty, and
Doubt (FUD) tactics, aiming to avoid personal accountability.

<snip>

Index registers were considered a good idea back when they were originally
introduced. It meant you could redirect an instruction to point somewhere
else without modifying the instruction in memory.

The earliest incarnations of such were often not 'registers' per-se, but
rather reserved locations in memory (c.f. PDP-8 'TAD I'). The Electrodata
220 had a 'B' register - the predecessor Electrodata 205 was the first commercial computer to offer an Index register (with the idea inspired
by the Manchester Mark I).

Base registers became a necessity once computer memories got so large -
over 64K locations or thereabouts - that it wasn't practical to put whole
addresses in instructions.

That may be true for the IBM 360 (although they could have updated the arch), but
clearly there were contemporaneous systems (B3500, for example) which
supported direct access to 1 million locations without needing index registers (although it did have three of them), and more exotic architectures like
the stack-based B5500 and successors.

--- Synchronet 3.21f-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat May 2 15:57:51 2026

From Newsgroup: comp.arch

On 2026-05-02, Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:

On 5/1/2026 10:09 PM, quadi wrote:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong?

This is a kind of question that is hard to answer.

Do I think that I'm smarter than all those guys who designed RISC
processors? No, of course not.

But a lot of smart guys worked on the System/360 also. So the question may
be whether or not my goals are different from theirs.

Not goals exactly, but constraints. Remember, S/360 had no memory
address relocation hardware such as a hardware base register or paging.
The addresses in a program were real memory addresses. Thus address say 1,000 in a program was real memory address 1,000. So absent base
registers, you couldn't have more than one program in memory at the same time, since program1's address 1,000 would have referred to the same
real memory address as program2's address 1,000.

Small quibble: This depends on what your loader does. IIRC
(I would have to re-read John Levine's book on linkers and loaders
to be sure) it could do relocation on program start. Not sure if
they could have done away with the base registers completely, though.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 09:45:27 2026

From Newsgroup: comp.arch

On 5/2/2026 8:44 AM, Scott Lurndal wrote:

quadi <quadibloc@ca.invalid> writes:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong?

This is a kind of question that is hard to answer.

Do I think that I'm smarter than all those guys who designed RISC
processors? No, of course not.

But a lot of smart guys worked on the System/360 also.

It was a product of the times. We've advanced well beyond
that in the intervening half-century.

Well, 60 years, but who's counting? :-) But you are absolutely right
about the advancements.

Really, the 360 operating
systems were relatively crude and difficult to use compared
with the contemporaneous competition.

Agreed. But they had the disadvantage of the requirement of providing a family of sort of compatible OSs for a wide range of computer models.

snip

Index registers were considered a good idea back when they were originally
introduced. It meant you could redirect an instruction to point somewhere
else without modifying the instruction in memory.

Yes.

The earliest incarnations of such were often not 'registers' per-se, but rather reserved locations in memory (c.f. PDP-8 'TAD I'). The Electrodata 220 had a 'B' register - the predecessor Electrodata 205 was the first commercial computer to offer an Index register (with the idea inspired
by the Manchester Mark I).

I don't know when the Electrodata 205 came out, but the Univac 1107 offered real index registers in about 1962, certainly predating the PDP-8.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 10:01:35 2026

From Newsgroup: comp.arch

On 5/1/2026 7:17 PM, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them
doesn't save you anything - you still have to add to a register to
increment the element number. In fact it costs a little in the hardware
as the addressing requires two adds (Base+index+displacement) versus one
(index + displacement). And it uses up a register needlessly. So why
include it?

It's true that address calculation, especially for multi-dimensional
arrays, involves extra steps.

Base-index addressing doesn't force two additions every time; one chooses
whether or not an instruction is indexed.

People forget about the extra instructions:: lack of base+index causes
a) longer latency
b) more instructions
c) larger code footprint
d) compiler has to work harder
e)...

So the 2%-4% of its use causes 4%-8% more instructions which can be eliminated for 1-extra gate of delay (3-input adder versus 2-input).

It is a more delicate balance than one presupposes.

I can believe that.

Given base+index+displacement there are never any support instructions
in memory access. Given displacement can be {16-bits, 32-bits,
or 64-bits} all of memory is accessible in a single instruction...

Yes, but adding the specification of a base register takes instruction
bits away from somewhere else, typically the displacement. So the
S/360s choice to use them reduced the displacement to 12 bits. So
larger programs required use of multiple base registers, which required loading them, i.e. extra support instructions, and increased register
pressure (though with the availability of storage to storage
instructions, that was less of an issue)
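
A quick back-of-the-envelope sketch of that point, in Python. The helper name is hypothetical, and it assumes one base register per 4K window with no overlap between windows, which is a simplification.

```python
def base_registers_needed(span_bytes, disp_bits=12):
    """How many base registers it takes to cover span_bytes of storage
    when each base register reaches 2**disp_bits bytes past its value."""
    reach = 1 << disp_bits               # 4096 bytes for a 12-bit displacement
    return -(-span_bytes // reach)       # ceiling division


for size in (4096, 4097, 10000, 65536):
    print(size, "bytes ->", base_registers_needed(size), "base register(s)")
```

With a 16-bit displacement (disp_bits=16) the same 64K span fits under a single register, which is the trade-off being discussed: bits spent on a second register field are bits not spent on reach.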

ALWAYS !! This gets rid of another 3%-ish of instruction footprint
tipping the balance from 6%-ish (average of above) to 12%-ish (with
these additional savings, tipping the balance towards "put it in".

Then why did just about every modern architecture, including your My
66000, omit them?
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 17:25:14 2026

From Newsgroup: comp.arch

On Sat, 02 May 2026 05:37:21 -0700, Stephen Fuld wrote:

Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to
different real memory addresses, thus don't need to specify a base
address in every memory reference instruction.

Actually, that conclusion isn't quite right. This would work for a CPU
like Intel's 432. But I intend programs to be able to work with large
linear address spaces bigger than 64K, bigger than the displacement field
in an instruction. That means a base register is still needed despite
hardware paging features being potentially present.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 2 17:35:08 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> writes:

But a lot of smart guys worked on the System/360 also.

But they did not design immediate operands into the architecture. I
wonder why that is. It increases the instruction count by about 50%.

After all, when the very first RISC machines came out, they didn't include
floating-point arithmetic;

They actually did. The RISC workstations from HP, Sun, MIPS, SGI
etc. all included an FPU. Only Acorn was in a different market that
did not require providing an FPU.

while that was partly due to enough gates not
yet being available, the rationale was given that floating-point
arithmetic couldn't be done in one cycle.

Who gave that rationale? AFAIK they had FPUs that could start an FP
operation at every cycle, and as far as latency is concerned, every
RISC implementation has instructions with latency >1 cycle already in
their integer subset.

Base registers became a necessity once computer memories got so large -
over 64K locations or thereabouts - that it wasn't practical to put whole
addresses in instructions.

IA-32 is a counterexample for your claim. You can use the direct
addressing mode for the whole address space. The replacement of the
direct addressing mode on AMD64 is not something involving a base
register, but offset(%rip).

So accessing an array requires one basically to copy a base register value
into another register, and add the index to it. That's an extra
instruction. It may not be needed for every array access, as you can still
increment that modified base value. (Hmm. So since an addition is removed
from address calculation in the instruction, one _could_ claim that
lacking index/base addressing forces an *optimization* to be done. I'll
have to think about that.)

If you look at the code for MIPS/Alpha/RISC-V with their single
addressing mode offset(reg), you will find that the compilers tend to
keep array cursors in registers, and tend to update them on every
iteration. With a little luck, the loop-end check can be transformed
into an array-end check, and the index variable is eliminated through
an optimization called induction variable elimination. I have
recently seen that in wasm code produced by clang.
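
For readers who have not seen the transformation, here is a hedged Python sketch of what induction variable elimination does to a simple summing loop. It only models the address arithmetic over a byte buffer; the function names and the 4-byte little-endian element layout are assumptions for illustration, not actual compiler output.

```python
import struct


def sum_indexed(mem, base, n):
    """Index form: the address is recomputed as base + 4*i on every iteration."""
    total = 0
    for i in range(n):
        (value,) = struct.unpack_from("<i", mem, base + 4 * i)
        total += value
    return total


def sum_cursor(mem, base, n):
    """After induction variable elimination: the index i is gone; only a
    cursor and an end address remain, and the loop test is an array-end check."""
    total = 0
    cursor, end = base, base + 4 * n
    while cursor != end:
        (value,) = struct.unpack_from("<i", mem, cursor)
        total += value
        cursor += 4
    return total


mem = bytes(8) + struct.pack("<5i", 1, 2, 3, 4, 5)  # array starts at offset 8
print(sum_indexed(mem, 8, 5), sum_cursor(mem, 8, 5))
```

The second form needs only offset(reg) addressing with a zero offset, which is why single-addressing-mode ISAs do reasonably well on loops of this shape.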

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 11:05:06 2026

From Newsgroup: comp.arch

On 5/2/2026 10:25 AM, quadi wrote:

On Sat, 02 May 2026 05:37:21 -0700, Stephen Fuld wrote:

Modern CPUs, and I presume your design, use paging to allow multiple
occurrences of the same address (in different programs) to refer to
different real memory addresses, thus don't need to specify a base
address in every memory reference instruction.

Actually, that conclusion isn't quite right. This would work for a CPU
like Intel's 432. But I intend programs to be able to work with large
linear address spaces bigger than 64K, bigger than the displacement field
in an instruction. That means a base register is still needed despite hardware paging features being potentially present.

Why don't you just use an index register like just about every other architecture (except S/360 derivatives) does?
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21f-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 2 18:10:23 2026

From Newsgroup: comp.arch

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:

On 5/1/2026 10:09 PM, quadi wrote:

[...]

Modern CPUs, and I presume your design, use paging to allow multiple
occurrences of the same address (in different programs) to refer to
different real memory addresses, thus don't need to specify a base
address in every memory reference instruction.

And then we got ASLR, and now we have to live in a world again where
the code and the static data don't live in fixed locations.

So accessing an array requires one basically to copy a base register value
into another register, and add the index to it.

No. Do you think that current CPU designs require that? They do not.
You simply load the starting address of the array into an index register
and add to that as needed.

In the general case (i.e., when the array index is not the counter of
a counted loop), instruction sets like MIPS, Alpha, and RISC-V need
additional instructions for computing the address of the array
element, and only then use a load or store instruction to access the
element. However, these architectures are three-address
architectures, so the starting address of the array does not have to
be copied first in this process.
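
A rough Python model of the difference, for the general a[i] case. The function names, the register names, and the instruction counts in the return values are assumptions for illustration; the sketch ignores any ISA extensions that add shift-and-add instructions.

```python
def load_offset_reg_only(regs, mem, base, index, scale):
    """ISA with only offset(reg): shift, add, then load (three instructions)."""
    t = regs[index] << scale   # e.g. slli t, index, scale
    t = t + regs[base]         # e.g. add  t, t, base  (three-address: base is untouched)
    return mem[t], 3           # e.g. lw   dst, 0(t)


def load_base_index_scaled(regs, mem, base, index, scale, disp=0):
    """ISA with [base + index<<scale + disp]: one load instruction."""
    ea = regs[base] + (regs[index] << scale) + disp
    return mem[ea], 1


regs = {"a0": 0x1000, "a1": 3}   # a0 = array start, a1 = element index
mem = {0x100C: 42}               # element 3 of an array of 4-byte ints
print(load_offset_reg_only(regs, mem, "a0", "a1", 2))
print(load_base_index_scaled(regs, mem, "a0", "a1", 2))
```

Both paths read the same element; the base register a0 is never overwritten in either, which is the point about not needing a copy first.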

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 11:22:40 2026

From Newsgroup: comp.arch

On 5/2/2026 10:01 AM, Stephen Fuld wrote:

On 5/1/2026 7:17 PM, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them
doesn't save you anything - you still have to add to a register to
increment the element number. In fact it costs a little in the hardware
as the addressing requires two adds (Base+index+displacement) versus one
(index + displacement). And it uses up a register needlessly. So why
include it?

It's true that address calculation, especially for multi-dimensional
arrays, involves extra steps.

Base-index addressing doesn't force two additions every time; one chooses
whether or not an instruction is indexed.

People forget about the extra instructions:: lack of base+index causes
a) longer latency
b) more instructions
c) larger code footprint
d) compiler has to work harder
e)...

So the 2%-4% of its use causes 4%-8% more instructions which can be
eliminated for 1-extra gate of delay (3-input adder versus 2-input).

It is a more delicate balance than one presupposes.

I can believe that.

Given base+index+displacement there are never any support instructions
in memory access. Given displacement can be {16-bits, 32-bits,
or 64-bits} all of memory is accessible in a single instruction...

Yes, but adding the specification of a base register takes instruction
bits away from somewhere else, typically the displacement. So the
S/360s choice to use them reduced the displacement to 12 bits. So
larger programs required use of multiple base registers, which required loading them, i.e. extra support instructions, and increased register pressure (though with the availability of storage to storage
instructions, that was less of an issue)

ALWAYS !! This gets rid of another 3%-ish of instruction footprint
tipping the balance from 6%-ish (average of above) to 12%-ish (with
these additional savings, tipping the balance towards "put it in".

Then why did just about every modern architecture, including your My
66000, omit them?

Apologies! I see that for loads and stores, your design does offer such modes.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 18:33:00 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong?

This is a kind of question that is hard to answer.

Do I think that I'm smarter than all those guys who designed RISC processors? No, of course not.

But a lot of smart guys worked on the System/360 also. So the question may be whether or not my goals are different from theirs.

Yes, you include all 360 kinds of data and about 50% more to cover
more bases.

After all, when the very first RISC machines came out, they didn't include floating-point arithmetic; while that was partly due to enough gates not
yet being available, the rationale was given that floating-point
arithmetic couldn't be done in one cycle.

MIPS had FP, SPARC had FP, Mc88K had FP, Clipper had FP.
What RISCs are you speaking of ??

That was quickly rejected as silly.

Index registers were considered a good idea back when they were originally introduced. It meant you could redirect an instruction to point somewhere else without modifying the instruction in memory.

The index registers of /360 required compiler strength reduction
and loop invariant treatment. Scaled index registers do not.

Base registers became a necessity once computer memories got so large -
over 64K locations or thereabouts - that it wasn't practical to put whole addresses in instructions.

And yet My 66000 CAN !!

So the base register, although it works the
same way as an index register, does something different - an index
register might be incremented once per loop, while base registers are left alone.

So are displacements.

So accessing an array requires one basically to copy a base register value into another register, and add the index to it. That's an extra
instruction.

You are making assumptions that are not necessary. Why don't you
spell them out.

It may not be needed for every array access, as you can still increment that modified base value. (Hmm. So since an addition is removed from address calculation in the instruction, one _could_ claim that
lacking index/base addressing forces an *optimization* to be done. I'll
have to think about that.)

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 18:48:48 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

On Sat, 02 May 2026 06:19:43 +0000, Anton Ertl wrote:

The S/360 has general-purpose registers,

The S/360 does not have separate base registers.

That's true. But I think he was talking about the fact that S/360
memory-reference instructions had one field to specify a general register to use
as the index register, and another field to specify a general register to
use as the base register, while most modern architectures only have _one_
field to specify a register the contents of which are to be added to the
displacement.

And you have put your finger on what is wrong with most modern ISAs.

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Sat May 2 13:58:52 2026

From Newsgroup: comp.arch

On 5/2/2026 1:22 PM, Stephen Fuld wrote:

On 5/2/2026 10:01 AM, Stephen Fuld wrote:

On 5/1/2026 7:17 PM, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them
doesn't save you anything - you still have to add to a register to
increment the element number. In fact it costs a little in the hardware
as the addressing requires two adds (Base+index+displacement) versus one
(index + displacement). And it uses up a register needlessly. So why
include it?

It's true that address calculation, especially for multi-dimensional
arrays, involves extra steps.

Base-index addressing doesn't force two additions every time; one chooses
whether or not an instruction is indexed.

People forget about the extra instructions:: lack of base+index causes
a) longer latency
b) more instructions
c) larger code footprint
d) compiler has to work harder
e)...

So the 2%-4% of its use causes 4%-8% more instructions which can be
eliminated for 1-extra gate of delay (3-input adder versus 2-input).

It is a more delicate balance than one presupposes.

I can believe that.

Given base+index+displacement there are never any support instructions
in memory access. Given displacement can be {16-bits, 32-bits,
or 64-bits} all of memory is accessible in a single instruction...

Yes, but adding the specification of a base register takes instruction
bits away from somewhere else, typically the displacement. So the
S/360s choice to use them reduced the displacement to 12 bits. So
larger programs required use of multiple base registers, which
required loading them, i.e. extra support instructions, and increased
register pressure (though with the availability of storage to storage
instructions, that was less of an issue)

ALWAYS !! This gets rid of another 3%-ish of instruction footprint
tipping the balance from 6%-ish (average of above) to 12%-ish (with
these additional savings, tipping the balance towards "put it in".

Then why did just about every modern architecture, including your My
66000, omit them?

Apologies! I see that for loads and stores, your design does offer such modes.

FWIW:
XG2 and XG3 also essentially include [Rb+Ri*Sc+Disp] as an optional feature.

Just in my own stats, I didn't see them coming up often enough in
practice to justify having them as part of the core ISA, nor necessarily enabled in HW (they are, alongside the Load-Op stuff).

Theoretically can do:
ADDS.L (SP, R12, 16), R14
For:
R14+=((int *)(SP+16))[R12]

But, this is niche...

--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 19:00:22 2026

From Newsgroup: comp.arch

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:

On 5/1/2026 7:17 PM, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

But every other modern architecture decided that they didn't need
separate base registers. Do you think they were wrong? Having them
doesn't save you anything - you still have to add to a register to
increment the element number. In fact it costs a little in the hardware
as the addressing requires two adds (Base+index+displacement) versus one
(index + displacement). And it uses up a register needlessly. So why
include it?

It's true that address calculation, especially for multi-dimensional
arrays, involves extra steps.

Base-index addressing doesn't force two additions every time; one chooses
whether or not an instruction is indexed.

People forget about the extra instructions:: lack of base+index causes
a) longer latency
b) more instructions
c) larger code footprint
d) compiler has to work harder
e)...

So the 2%-4% of its use causes 4%-8% more instructions which can be eliminated for 1-extra gate of delay (3-input adder versus 2-input).

It is a more delicate balance than one presupposes.

I can believe that.

Given base+index+displacement there are never any support instructions
in memory access. Given displacement can be {16-bits, 32-bits,
or 64-bits} all of memory is accessible in a single instruction...

Yes, but adding the specification of a base register takes instruction
bits away from somewhere else, typically the displacement. So the

In My 66000 case there is an instruction format of [base+disp16]
and there is a different instruction format of [base+index<<scale]
which can have {disp32 or disp64} optionally appended as a constant.

S/360s choice to use them reduced the displacement to 12 bits. So

We now understand that this is a less than optimal choice.

larger programs required use of multiple base registers,

This is a side effect of branching using standard memory address
format, and the small displacement, ameliorated with positive only
12-bit displacements.

which required loading them, i.e. extra support instructions, and increased register pressure (though with the availability of storage to storage
instructions, that was less of an issue)

Which is why /360 is (or should be) not considered a "great ISA"
to copy.

ALWAYS !! This gets rid of another 3%-ish of instruction footprint
tipping the balance from 6%-ish (average of above) to 12%-ish (with
these additional savings, tipping the balance towards "put it in".

Then why did just about every modern architecture, including your My
66000, omit them?

I was arguing that My 66000 has them while most modern ISAs do not.
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 19:02:43 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

On Sat, 02 May 2026 05:37:21 -0700, Stephen Fuld wrote:

Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to different real memory addresses, thus don't need to specify a base
address in every memory reference instruction.

Actually, that conclusion isn't quite right. This would work for a CPU
like Intel's 432. But I intend programs to be able to work with large
linear address spaces bigger than 64K, bigger than the displacement field
in an instruction. That means a base register is still needed despite

My 66000 can directly address all 64-bits of VAS without using a base
register. {A special feature when DISP64 has R0 as its base register}.

hardware paging features being potentially present.

Paging is always on in My 66000--even as one comes out of reset.

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 19:10:24 2026

From Newsgroup: comp.arch

anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:

On 5/1/2026 10:09 PM, quadi wrote:

[...]

Modern CPUs, and I presume your design, use paging to allow multiple
occurrences of the same address (in different programs) to refer to
different real memory addresses, thus don't need to specify a base
address in every memory reference instruction.

And then we got ASLR, and now we have to live in a world again where
the code and the static data don't live in fixed locations.

In My 66000's case, the code does not know whether it is in ASLR mode
(or not); things that are accessed via ASLR go through GOT[].

So accessing an array requires one basically to copy a base register value
into another register, and add the index to it.

No. Do you think that current CPU designs require that? They do not.
You simply load the starting address of the array into an index register >and add to that as needed.

In the general case (i.e., when the array index is not the counter of
a counted loop), instruction sets like MIPS, Alpha, and RISC-V need additional instructions for computing the address of the array
element, and only then use a load or store instruction to access the
element. However, these architectures are three-address
architectures, so the starting address of the array does not have to
be copied first in this process.
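
A minimal sketch of the point above, modelling the address computation in Python rather than in any real assembly language. The function name and the example values are made up for illustration; the RISC-V mnemonics in the comments are only there to show where the extra instructions come from.

```python
def element_address(base, index, elem_size_log2):
    """Address of array[index] on a three-address machine.

    The scaled index goes into a fresh temporary, so the register
    holding `base` is never overwritten and never has to be copied.
    """
    scaled = index << elem_size_log2   # e.g. RISC-V: slli t0, idx, 2
    return base + scaled               # e.g. RISC-V: add  t0, t0, base
                                       # followed by:  lw   t1, 0(t0)

base = 0x1000                          # hypothetical start of an int32 array
addr = element_address(base, 5, 2)     # element 5, 4-byte elements
print(hex(addr), hex(base))            # base is still usable afterwards
```

On a two-address machine the add would overwrite one of its sources, which is where the extra register copy would come from.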

Agreed.

- anton

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 02:34:59 2026

From Newsgroup: comp.arch

I've made one more little addition to the instruction set.

Now the 48-bit long instructions can be placed within a 64-bit pair of 32-bit
"instructions" that can be placed within code without headers.

At first, when I found the opcode space to do that with, it seemed there
was a conflict which prevented these 64-bit encapsulated instructions from appearing in code with variable-length instructions.

Yes, normally they wouldn't be needed there, since in that code, 48-bit instructions can be expressed in just 48 bits. But those existing 48-bit instructions require prefix bits to distinguish them from the regular 32-bit
instructions in the code. The encapsulation format has room for any arbitrary combination of 48 bits. So _additional_ 48-bit instructions
which don't have the necessary prefix could be defined, which can only
appear in encapsulated form in either kind of code. (So they're really additional 64-bit instructions, but I call them 48-bit because of the fact that they're placed in association with the real 48-bit instructions.)
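
To make the bit budget concrete, here is a rough sketch of one way 48 arbitrary bits could be carried in a pair of 32-bit words. The layout is purely hypothetical and is NOT the actual Concertina II encoding, which this post does not spell out: it assumes each word spends its top 8 bits on a marker (the values TAG_FIRST and TAG_SECOND are invented) and carries 24 payload bits.

```python
MASK24 = (1 << 24) - 1
TAG_FIRST = 0xA5    # hypothetical marker for the first word of a pair
TAG_SECOND = 0x5A   # hypothetical marker for the second word of a pair

def encapsulate(insn48):
    """Split a 48-bit instruction into two tagged 32-bit words."""
    if not 0 <= insn48 < (1 << 48):
        raise ValueError("instruction does not fit in 48 bits")
    hi = (insn48 >> 24) & MASK24
    lo = insn48 & MASK24
    return (TAG_FIRST << 24) | hi, (TAG_SECOND << 24) | lo

def extract(word0, word1):
    """Recover the 48-bit instruction from a tagged pair of words."""
    if word0 >> 24 != TAG_FIRST or word1 >> 24 != TAG_SECOND:
        raise ValueError("not an encapsulated 48-bit instruction")
    return ((word0 & MASK24) << 24) | (word1 & MASK24)

w0, w1 = encapsulate(0x123456789ABC)
print(hex(w0), hex(w1), hex(extract(w0, w1)))
```

Whatever the real split is, the pair has 64 - 48 = 16 bits left over to tell the decoder that the two words belong together.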

Fortunately, though, I was able to straighten things out and eliminate the conflict without doing violence to the bit-mappings in the instruction set.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 02:26:33 2026

From Newsgroup: comp.arch

On Sat, 02 May 2026 18:33:00 +0000, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

After all, when the very first RISC machines came out, they didn't
include floating-point arithmetic; while that was partly due to enough
gates not yet being available, the rationale was given that
floating-point arithmetic couldn't be done in one cycle.

MIPS had FP, SPARC had FP, Mc88K had FP, Clipper had FP.
Which RISCs are you speaking of?

I'm remembering an article in Scientific American which explained the
concept of RISC, written by the designer of one of the very first RISC processors (probably MIPS).

Yes, MIPS has FP now, but possibly the very first MIPS processor didn't.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 02:45:01 2026

From Newsgroup: comp.arch

On Mon, 04 May 2026 02:34:59 +0000, quadi wrote:

Yes, normally they wouldn't be needed there, since in that code, 48-bit instructions can be expressed in just 48 bits. But those existing 48-bit instructions require prefix bits to distinguish them from the regular
32-bit instructions in the code. The encapsulation format has room for
any arbitrary combination of 48 bits. So _additional_ 48-bit
instructions which don't have the necessary prefix could be defined,
which can only appear in encapsulated form in either kind of code. (So they're really additional 64-bit instructions, but I call them 48-bit
because of the fact that they're placed in association with the real
48-bit instructions.)

And, of course, in the event that I ever do define any instructions in
this category, I could always define a new header format in which a
particular combination of bits in a prefix field indicates that the 16-bit zone to which it corresponds contains the start of a 48-bit instruction
and not one of any other length, which would then give them the right to
be called 48-bit instructions.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon May 4 05:45:44 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> writes:

Yes, MIPS has FP now, but possibly the very first MIPS processor didn't.

The very first MIPS processor was the R2000, and, as Wikipedia says,

|The chipset consisted of the R2000 microprocessor, R2010
|floating-point accelerator, and four R2020 write buffer chips.

I don't know if the R2000 could work without R2010, but I am pretty
sure that there never was a machine with an R2000, but without R2010.

Later, when MIPS processors became cheap enough for embedded
computing, there probably were MIPS processors without FPU, but that's
not because of RISC principles or something, but because of the
non-existing willingness of the customers to pay for that
functionality.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.21f-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon May 4 05:59:04 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

Performance Effects of Architectural Complexity in the Intel 432, 1988
https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

Wow... paved with good intentions and all that.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 16:38:23 2026

From Newsgroup: comp.arch

On Mon, 04 May 2026 05:59:04 +0000, Thomas Koenig wrote:

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

Performance Effects of Architectural Complexity in the Intel 432, 1988
https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

Wow... paved with good intentions and all that.

But the sign over the door of Hell says:

Abandon all hope, ye who enter here.

The iAPX 432 didn't doom computing permanently. It was just a learning experience. Which Intel survived.

It learned not to make that mistake again.

There was the 860 - from which I temporarily took some inspiration.

There was the Itanium.

Oh, dear. We _are_ doomed to eternal suffering, after all: what Intel took from all these learning experiences was to never try to deviate from x86
ever again!

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 16:39:01 2026

From Newsgroup: comp.arch

On Sun, 03 May 2026 11:47:32 -0400, EricP wrote:

Performance Effects of Architectural Complexity in the Intel 432, 1988
https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

Thank you very much for the informative link.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon May 4 18:06:18 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> posted:

I've made one more little addition to the instruction set.

I just had an epiphany wrt Concertina II

Now the 48-bit long instructions can be placed within a 64-bit pair of 32-bit "instructions" that can be placed within code without headers.

Headers are nothing more than mode-bits that change every block of code.
This means that ISA will be exceptionally difficult to verify, and
as you have found: very difficult to encode.

At first, when I found the opcode space to do that with, it seemed there
was a conflict which prevented these 64-bit encapsulated instructions from appearing in code with variable-length instructions.

Yes, normally they wouldn't be needed there, since in that code, 48-bit instructions can be expressed in just 48 bits. But those existing 48-bit instructions require prefix bits to distinguish them from the regular 32-bit
instructions in the code. The encapsulation format has room for any arbitrary combination of 48 bits. So _additional_ 48-bit instructions
which don't have the necessary prefix could be defined, which can only appear in encapsulated form in either kind of code. (So they're really additional 64-bit instructions, but I call them 48-bit because of the fact that they're placed in association with the real 48-bit instructions.)

Fortunately, though, I was able to straighten things out and eliminate the conflict without doing violence to the bit-mappings in the instruction set.

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Mon May 4 13:46:04 2026

From Newsgroup: comp.arch

On 5/4/2026 1:06 PM, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

I've made one more little addition to the instruction set.

I just had an epiphany wrt Concertina II

Now the 48-bit long instructions can be placed within a 64-bit pair of 32-bit "instructions" that can be placed within code without headers.

Headers are nothing more than mode-bits that change every block of code.
This means that ISA will be exceptionally difficult to verify, and
as you have found: very difficult to encode.

I had an idea for a vaguely similar sort of feature in the 'F3 Block'
that I had called DCB's (or "Dynamically Configurable Blocks") where
encodings could be selected from a potentially open-ended set and mapped
into the 32-bit space at runtime.

I ended up putting it in the newer spec I am working on that these
should not be used in statically compiled binaries (or, basically, configurable instructions that are only really allowed for JIT compiled
code).

Meanwhile, recently decided to give my XG3 ISA spec to CoPilot and see
what it had to say about it.

Apparently it decided to assert that it has a complexity more like
x86-64 than it does like RISC-V.

I will disagree partly, as to what extent I had worked with trying to
emulate x86, or looking into writing a decoder in Verilog, it is a much
bigger PITA.

Even if, yes, when fully decked out (with all of the features enabled)
it will effectively have some instructions with both an immediate and displacement, and a similar [Rb+Ri*Sc+Disp] addressing mode.

It got more optimistic once I explained its role a little more, namely
that it is not intended as a RISC-V replacement rather more as a way to
have code (in a RISC-V based mode) that would have higher performance
for certain types of tasks (like having an OpenGL implementation that
isn't dead slow).

...

At first, when I found the opcode space to do that with, it seemed there
was a conflict which prevented these 64-bit encapsulated instructions from appearing in code with variable-length instructions.

Yes, normally they wouldn't be needed there, since in that code, 48-bit
instructions can be expressed in just 48 bits. But those existing 48-bit
instructions require prefix bits to distinguish them from the regular 32-bit
instructions in the code. The encapsulation format has room for any
arbitrary combination of 48 bits. So _additional_ 48-bit instructions
which don't have the necessary prefix could be defined, which can only
appear in encapsulated form in either kind of code. (So they're really
additional 64-bit instructions, but I call them 48-bit because of the fact that they're placed in association with the real 48-bit instructions.)

Fortunately, though, I was able to straighten things out and eliminate the conflict without doing violence to the bit-mappings in the instruction set.

John Savard

--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 21:43:33 2026

From Newsgroup: comp.arch

On Mon, 04 May 2026 02:45:01 +0000, quadi wrote:

And, of course, in the event that I ever do define any instructions in
this category, I could always define a new header format in which a particular combination of bits in a prefix field indicates that the
16-bit zone to which it corresponds contains the start of a 48-bit instruction and not one of any other length, which would then give them
the right to be called 48-bit instructions.

On a previous attempt, I couldn't find this post, so I started a new
thread where I mention that, without defining any new 48-bit instructions
yet, I did add - within an existing header - the option of indicating 48-bit
instructions for this purpose.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Mon May 4 20:00:42 2026

From Newsgroup: comp.arch

Later, when MIPS processors became cheap enough for embedded
computing, there probably were MIPS processors without FPU,

Definitely: my ASUS WL-700gE (some kind of cheap "wifi router + NAS"
from 20 years ago) had a MIPS processor and it lacked an FPU. I know
because I was using it as a jukebox (running MusicPD) and I had to be
extra careful to build it with the "idec" Vorbis decoder that was
written specially to avoid floating point operations. Plus I had to be
careful to avoid any resampling because that too tended to use the FPU.

IIRC, FPU instructions were supported via traps, so software would still "work", but it was excruciatingly slow (unusable for real-time use such
as media playback).

=== Stefan
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Tue May 5 04:33:53 2026

From Newsgroup: comp.arch

In connection with the changes described in these posts, I had needed to
cut the opcode space available to operate instructions in half.
In re-examining the opcode space available, I've found that I could have three-quarters instead of just half of the original opcode space
available, and this let me add back much of what I had lost in that area.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Tue May 5 14:02:39 2026

From Newsgroup: comp.arch

On Tue, 05 May 2026 04:33:53 +0000, quadi wrote:

In connection with the changes described in these posts, I had needed to
cut the opcode space available to operate instructions in half.
In re-examining the opcode space available, I've found that I could have three-quarters instead of just half of the original opcode space
available, and this let me add back much of what I had lost in that
area.

I've managed to add back one other thing I had previously had to remove:
the alternate instruction format for the VLIW style of code.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Wed May 6 01:06:04 2026

From Newsgroup: comp.arch

On Sat, 02 May 2026 18:33:00 +0000, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

So accessing an array requires one basically to copy a base register
value into another register, and add the index to it. That's an extra
instruction.

You are making assumptions that are not necessary. Why don't you spell
them out.

I was thinking about accessing one array element randomly in isolation.
Later on, though, I realized that if one is stepping through an array sequentially in a loop, one register is indeed good enough, and thus saves
an addition.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Wed May 6 01:10:39 2026

From Newsgroup: comp.arch

On Mon, 04 May 2026 18:06:18 +0000, MitchAlsup wrote:

quadi <quadibloc@ca.invalid> posted:

I've made one more little addition to the instruction set.

I just had an epiphany wrt Concertina II

I keep always adding stuff, and it will never be finished?

Perhaps. I've left a lot of room, now, in auxiliary opcode spaces, to add
a bunch more stuff.

But the main standard opcode space is now bursting at the seams, and yet despite that I've managed to put back a couple of things I had previously, with regret, had to remove to make space.

So, while I've said it before, and it hasn't happened, it seems like I'm finally at the point where I can start fleshing out the design by listing
the opcodes for the various instructions, and explaining the fancy data
types.

Whether or not I'm even capable of going beyond that to the next steps
you've recommended is something to be seen later.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Wed May 6 17:19:26 2026

From Newsgroup: comp.arch

On Wed, 06 May 2026 01:10:39 +0000, quadi wrote:

So, while I've said it before, and it hasn't happened, it seems like I'm finally at the point where I can start fleshing out the design by
listing the opcodes for the various instructions, and explaining the
fancy data types.

After posting that, I ended up making just one more tiny addition... and noticed, when doing that, that there was a big addition that also should
be provided for now, rather than later.

The tiny addition - the U bit, so that now 34-bit instructions, instead of just being memory-reference operate instructions that don't set the
condition codes, could instead be load/store instructions - but with 31
instead of 7 possible index registers.

The big one? I had enough bits for a field with which to specify an
alternate set of instructions to be used together with the existing ones
in this batch of variable-length code headers.

But now I have moved on - to a correction of an out-of-date diagram for
the 48-bit instructions.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Wed May 6 18:59:40 2026

From Newsgroup: comp.arch

In article <10tai1v$3q9vl$1@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:

Oh, dear. We _are_ doomed to eternal suffering, after all: what
Intel took from all these learning experiences was to never try to
deviate from x86 ever again!

They abandoned x86S <https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html>. But it won't be _eternal_ suffering. Eventually, x86 will become obsolete.

John
--- Synchronet 3.21f-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Thu May 7 23:15:46 2026

From Newsgroup: comp.arch

On Wed, 06 May 2026 18:58:00 +0100, John Dallman wrote:

They abandoned x86S

Which I thought was a _good_ idea, not a bad one. Because upwards compatibility with the huge pool of software out there is the only excuse
for sticking with x86.

John Savard
--- Synchronet 3.21f-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri May 8 18:12:48 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-03 09:22, quadi wrote:

On Sun, 15 Mar 2026 14:35:00 +0000, John Dallman wrote:

iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K
instruction bits in a segment, or 8K bytes. The idea was that no
subroutine or function ever needed to be bigger than that.
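
The arithmetic behind that limit, as a small sketch. It assumes "64K instruction bits" means a 16-bit bit offset (2**16 addressable bits); the names below are illustrative only and are not taken from any iAPX 432 documentation.

```python
OFFSET_BITS = 16                      # assumption: 64K = 2**16 bit offsets

max_bits = 1 << OFFSET_BITS           # addressable instruction bits per segment
max_bytes = max_bits // 8             # the same span expressed in bytes

def byte_and_bit(bit_offset):
    """Split a bit offset into (byte index, bit position within that byte)."""
    if not 0 <= bit_offset < max_bits:
        raise ValueError("offset outside the segment")
    return divmod(bit_offset, 8)

print(max_bits, max_bytes)            # 65536 bits, 8192 bytes (8K)
print(byte_and_bit(65535))            # the last addressable bit
```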

It's worse than I thought.

While the STRETCH had bit addressing, unlike the STRETCH this sounds genuinely perverse.

John Savard

Performance Effects of Architectural Complexity in the Intel 432, 1988
https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

There was a CMU paper on 432 that stated if Intel had used 1 more pin
that performance could have <about> doubled.
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 8 20:11:26 2026

From Newsgroup: comp.arch

On Fri, 08 May 2026 18:12:48 +0000, MitchAlsup wrote:

EricP <ThatWouldBeTelling@thevillage.com> posted:

Performance Effects of Architectural Complexity in the Intel 432, 1988
https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

There was a CMU paper on 432 that stated if Intel had used 1 more pin
that performance could have <about> doubled.

Ouch! Well, these days they wouldn't make that kind of mistake again.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri May 8 16:55:07 2026

From Newsgroup: comp.arch

On 2026-May-08 14:12, MitchAlsup wrote:

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-03 09:22, quadi wrote:

On Sun, 15 Mar 2026 14:35:00 +0000, John Dallman wrote:

iAPX 432 had instructions which weren't in whole bytes, and were
addressed by bit offset in a segment. You could only have 64K
instruction bits in a segment, or 8K bytes. The idea was that no
subroutine or function ever needed to be bigger than that.

It's worse than I thought.

While the STRETCH had bit addressing, unlike the STRETCH this sounds
genuinely perverse.

John Savard

Performance Effects of Architectural Complexity in the Intel 432, 1988
https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

There was a CMU paper on 432 that stated if Intel had used 1 more pin
that performance could have <about> doubled.

It is still going to have to chew through gobs of microcode to do anything.
I too had microcode on the brain back then. In 1976 I designed (but did not build) a microcoded cpu core using TTL AMD 2900 bit-slice components.

Lately I have been playing around with circa 1975 TTL paper cpu designs but done in a pipelined risc style. The instructions must be variable length because
memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12 bytes long. That is long enough to hold a 4 byte instruction specifier (opcode + registers) plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

My fetch unit paper design reads an 8-byte fetch block each clock
into a 32 byte circular prefetch buffer. The parser rotates the whole
32 byte buffer to align the instruction start with the length parser,
and a PLA examines the first 12 instruction bits to get the length.
It then validates that all the bytes are present in the buffer
and passes the 1 to 12 bytes + IP virtual address to the Decoder board.
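
The fetch-parse loop described above can be sketched as a small software model. This is only an illustration under stated assumptions: the length encoding is invented (low 4 bits of the first byte hold length minus one, whereas the real PLA looks at 12 bits whose encoding is not given here), the modular indexing stands in for the hardware rotator, and all names are hypothetical.

```python
FETCH_BLOCK = 8     # bytes fetched per clock
BUFFER_SIZE = 32    # circular prefetch buffer size in bytes
MAX_LEN = 12        # longest instruction in bytes

def insn_length(first_byte):
    """Hypothetical length decode: low nibble holds (length - 1)."""
    n = (first_byte & 0x0F) + 1
    if n > MAX_LEN:
        raise ValueError("invalid length encoding")
    return n

def make_insn(length, fill=0xEE):
    """Build a dummy instruction of the given length for testing."""
    return bytes([length - 1]) + bytes([fill]) * (length - 1)

def run_fetch_parse(code):
    """Return (parsed instructions, clocks used) for a byte stream."""
    buf = bytearray(BUFFER_SIZE)
    head = count = fetch_pos = clocks = 0
    parsed = []
    while fetch_pos < len(code) or count > 0:
        clocks += 1
        # Parse stage: at most one instruction per clock, and only
        # once every one of its bytes is present in the buffer.
        if count > 0:
            n = insn_length(buf[head])
            if count >= n:
                parsed.append(bytes(buf[(head + i) % BUFFER_SIZE]
                                    for i in range(n)))
                head = (head + n) % BUFFER_SIZE
                count -= n
            elif fetch_pos >= len(code):
                raise ValueError("truncated instruction at end of code")
        # Fetch stage: bring in one block if the buffer has room for it.
        if fetch_pos < len(code) and BUFFER_SIZE - count >= FETCH_BLOCK:
            block = code[fetch_pos:fetch_pos + FETCH_BLOCK]
            tail = (head + count) % BUFFER_SIZE
            for i, b in enumerate(block):
                buf[(tail + i) % BUFFER_SIZE] = b
            count += len(block)
            fetch_pos += len(block)
    return parsed, clocks

lengths = [1, 3, 12, 2, 5]
code = b"".join(make_insn(n) for n in lengths)
insns, clocks = run_fetch_parse(code)
print([len(i) for i in insns], clocks)
```

In this model the parser delivers one instruction per clock after a single clock of fetch latency, which is the 1 instruction/clock behaviour the design aims for; cache misses and branches are not modelled.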

This can do a sustained 5 MHz parse of 1 variable instruction/clock,
provided it hits the instruction cache. As most instructions are simple
and take 1 clock to execute, it should do sustained 5 MIPS.
Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
It might also fit onto the same size PCB as the 780 used, about 15" x 15",
but requires more edge connector pins for more buses (~300).
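
A back-of-the-envelope check of those figures, using only the numbers quoted in this post (5 MHz clock, 1 instruction/clock, 8-byte fetch block, ~3 byte average instruction, 0.5 MIPS for the 780). The variable names are illustrative, and the model ignores cache misses and branch bubbles.

```python
CLOCK_HZ = 5_000_000        # 5 MHz clock
INSNS_PER_CLOCK = 1         # sustained parse rate
FETCH_BLOCK_BYTES = 8       # bytes read into the prefetch buffer per clock
AVG_INSN_BYTES = 3          # expected average instruction length
MAX_INSN_BYTES = 12         # longest possible instruction
VAX_780_MIPS = 0.5          # figure quoted above for the 780

mips = CLOCK_HZ * INSNS_PER_CLOCK / 1e6
speedup = mips / VAX_780_MIPS

fetch_bandwidth = FETCH_BLOCK_BYTES * CLOCK_HZ                 # bytes/second supplied
avg_consumption = AVG_INSN_BYTES * INSNS_PER_CLOCK * CLOCK_HZ  # bytes/second parsed
worst_consumption = MAX_INSN_BYTES * INSNS_PER_CLOCK * CLOCK_HZ

print(mips, speedup)                       # 5.0 MIPS, 10x the 780
print(fetch_bandwidth > avg_consumption)   # fetch keeps ahead on average
print(fetch_bandwidth > worst_consumption) # but not for a run of 12-byte insns
```

So at the expected 3-byte average the 8-byte fetch block has headroom, while a long run of maximum-length instructions would drain the prefetch buffer in this simple model.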

On a separate board Decode will store the fetched instruction and
feed it through a bank of PLA chips, which controls tri-state buffers
to route signals into the Decode uOp output register.

If built my Fetch and Decode units could run 10x the speed of a 780,
using the exact same parts but just designed from a non-microcode
point of view.

--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri May 8 21:21:01 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:

On 2026-May-08 14:12, MitchAlsup wrote:

There was a CMU paper on 432 that stated if Intel had used 1 more pin
that performance could have <about> doubled.

It is still going to have to chew through gobs of microcode to do anything.
I too had microcode on the brain back then. In 1976 I designed (but did not build) a microcoded cpu core using TTL AMD 2900 bit-slice components.

Lately I have been playing around with circa 1975 TTL paper cpu designs but done in a pipelined risc style. The instructions must be variable length because
memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12 bytes long. That is long enough to hold a 4 byte instruction specifier (opcode + registers)
plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

My fetch unit paper design reads an 8-byte fetch block each clock
into a 32 byte circular prefetch buffer. The parser rotates the whole
32 byte buffer to align the instruction start with the length parser,
and a PLA examines the first 12 instruction bits to get the length.
It then validates that all the bytes are present in the buffer
and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

This can do a sustained 5 MHz parse of 1 variable instruction/clock,
provided it hits the instruction cache. As most instructions are simple
and take 1 clock to execute, it should do sustained 5 MIPS.
Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
It might also fit onto the same size PCB as the 780 used, about 15" x 15", but requires more edge connector pins for more buses (~300).

On a separate board Decode will store the fetched instruction and
feed it through a bank of PLA chips, which controls tri-state buffers
to route signals into the Decode uOp output register.

If built my Fetch and Decode units could run 10x the speed of a 780,
using the exact same parts but just designed from a non-microcode
point of view.

At what relative cost differential, i.e. what would a VAX-11/780
have cost if it had been built using your design?
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 9 17:07:48 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:
[reformatted to fit in 80-char lines with some slack for quote levels]

Lately I have been playing around with circa 1975 TTL paper cpu
designs but done in a pipelined risc style. The instructions must be
variable length because memory was so expensive in 1975. The key is
to not bottleneck in Fetch or Decode.

Cool!

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
using the same parts as are on the VAX-780 bill of materials or in
the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12
bytes long. That is long enough to hold a 4 byte instruction
specifier (opcode + registers) plus 8 bytes of immediate data. I
expect the average instruction to be ~3 bytes.

Interesting that you got this to work at 1 IPC. And, as the Skymont
shows, with enough resources such an instruction set can be made to
work at 9 IPC (the decoders of the Skymont are 3x3 wide, the renamer
is 8 wide).

Still, the question is how much, if any, code density advantage this
provides over something like RV32GC. In any case, given that RV32GC
code is smaller than VAX code
<2025Mar4.093916@mips.complang.tuwien.ac.at>, the code density of
RV32GC should be good enough, and the decoder may then need less
circuitry. I expect that your approach also gives some advantage in work/instruction, especially over RISC-V.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat May 9 19:41:10 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

Lately I have been playing around with circa 1975 TTL paper cpu designs but done in a pipelined risc style. The instructions must be variable length because
memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12 bytes long. That is long enough to hold a 4 byte instruction specifier (opcode + registers)
plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

That is quite impressive.

Could you share some details of the ISA, number of registers, what
do you use your 1-byte opcode for etc?

My fetch unit paper design reads an 8-byte fetch block each clock
into a 32 byte circular prefetch buffer. The parser rotates the whole
32 byte buffer to align the instruction start with the length parser,
and a PLA examines the first 12 instruction bits to get the length.
It then validates that all the bytes are present in the buffer
and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

This can do a sustained 5 MHz parse of 1 variable instruction/clock,
provided it hits the instruction cache. As most instructions are simple
and take 1 clock to execute, it should do sustained 5 MIPS.
Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
It might also fit onto the same size PCB as the 780 used, about 15" x 15", but requires more edge connector pins for more buses (~300).

On a separate board Decode will store the fetched instruction and
feed it through a bank of PLA chips, which controls tri-state buffers
to route signals into the Decode uOp output register.

If built my Fetch and Decode units could run 10x the speed of a 780,
using the exact same parts but just designed from a non-microcode
point of view.

--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun May 10 08:59:27 2026

From Newsgroup: comp.arch

On 2026-May-08 17:21, Scott Lurndal wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

On 2026-May-08 14:12, MitchAlsup wrote:

There was a CMU paper on 432 that stated if Intel had used 1 more pin
that performance could have <about> doubled.

It is still going to have to chew through gobs of microcode to do anything.
I too had microcode on the brain back then. In 1976 I designed (but did not build) a microcoded cpu core using TTL AMD 2900 bit-slice components.

Lately I have been playing around with circa 1975 TTL paper cpu designs but done in a pipelined risc style. The instructions must be variable length because
memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12 bytes long. That is long enough to hold a 4 byte instruction specifier (opcode + registers)
plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

My fetch unit paper design reads an 8-byte fetch block each clock
into a 32 byte circular prefetch buffer. The parser rotates the whole
32 byte buffer to align the instruction start with the length parser,
and a PLA examines the first 12 instruction bits to get the length.
It then validates that all the bytes are present in the buffer
and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

This can do a sustained 5 MHz parse of 1 variable instruction/clock,
provided it hits the instruction cache. As most instructions are simple
and take 1 clock to execute, it should do sustained 5 MIPS.
Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
It might also fit onto the same size PCB as the 780 used, about 15" x 15", but requires more edge connector pins for more buses (~300).

On a separate board Decode will store the fetched instruction and
feed it through a bank of PLA chips, which controls tri-state buffers
to route signals into the Decode uOp output register.

If built my Fetch and Decode units could run 10x the speed of a 780,
using the exact same parts but just designed from a non-microcode
point of view.

At what relative cost differential, i.e. what would a VAX-11/780
have cost if it had been built using your design?

Just eyeballing it, I'd say about the same.

IIRC the 780 sold for $250k USD plus $100K for 256kB DRAM memory.
The cpu core consisted of 20 pcb's, the optional FPU is 5 pcb's.
Each board appears to be 15" x 15" with 6 edge connectors
along the bottom with 2 rows of 20 pins, = 240 edge pins.
Each board plugs into a specific slot in the backplane
and the backplane does the board interconnect.
The densest board appears to be the Writable Control Store
which has about 16 rows by 16 columns of 16 pin DIPs.

780 had a single I & D cache, 8kB 2 way assoc., 8 byte lines,
write through, write no alloc, with DMA snooping and invalidates.
Cache takes up 2 boards, 1 for tags, 1 for data.
The TLB is 128 sets, 2 way assoc. PTE's.
I can't find where it says main memory access cycle time
just now but IIRC it was 1200 ns.
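
To make the cache numbers concrete, here is a small Python sketch that derives the set count and address split from the figures quoted above (8kB, 2 way, 8 byte lines). The function names are illustrative only, and nothing here models the write-through or snooping behaviour.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Derive line count, set count and address field widths for a cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 for powers of two
        "index_bits": sets.bit_length() - 1,
    }

def split_address(addr, geom):
    """Split an address into (tag, set index, byte offset within the line)."""
    off_bits, idx_bits = geom["offset_bits"], geom["index_bits"]
    offset = addr & ((1 << off_bits) - 1)
    index = (addr >> off_bits) & ((1 << idx_bits) - 1)
    tag = addr >> (off_bits + idx_bits)
    return tag, index, offset

geom = cache_geometry(8 * 1024, ways=2, line_bytes=8)
# 1024 lines in 512 sets: 3 offset bits, 9 index bits, the rest is tag
```

The tag width depends on the physical address width, which is left out here on purpose.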

I would keep some boards the same or with small changes.
I don't need the Writable Control Store, Prom Control Store,
and Microsequencer boards.

I would have a separate I-cache (2 boards) so it can fetch
and execute at the same time, but it is the same boards as
the D-cache so no design costs.

TLB is mostly the same except page size is 4kB and page table is
2 tables with 2 levels each, so page table walking is different.

Bus interface to the main memory bus (SBI) is the same.
As are the memory controllers and memory boards.

The design difference is in Fetch, Decode, register file,
and integer EXE stage, though the components would stay largely
the same in type and number.
32b ALU is 74181/74182 chips,
32b barrel shifter is 74S350 (aka AMD 25S10's) chips,

I don't know what 780 used for its integer multiplier
(I can't find it in the schematics).
It is possible to build a 1 clock 32b Wallace tree multiplier
in TTL but it would be expensive.
IIRC there were 16b flash multiplier chips available then
so using one of those MULU would take 5 or 6 clocks.
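
One way a single 16x16 multiplier could be reused for a 32b multiply is the usual partial-product split, sketched below in Python. This is an illustration under my own assumptions: `mul16` stands in for one pass through the multiplier chip, and the sketch does not model timing, so it says nothing about how the 5 or 6 clocks would be spent.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mul16(a, b):
    """Stand-in for one pass through a 16x16 -> 32 bit multiplier chip."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b

def mulu32_low(a, b):
    """Low 32 bits of an unsigned 32x32 product using three 16x16 multiplies."""
    al, ah = a & MASK16, a >> 16
    bl, bh = b & MASK16, b >> 16
    low = mul16(al, bl)
    # ah*bh only affects bits 32 and up, so it is skipped here; of the
    # cross terms only the low 16 bits survive the shift by 16.
    cross = (mul16(ah, bl) + mul16(al, bh)) & MASK16
    return (low + (cross << 16)) & MASK32

def mulu32_full(a, b):
    """Full 64 bit unsigned product using all four partial products."""
    al, ah = a & MASK16, a >> 16
    bl, bh = b & MASK16, b >> 16
    return (mul16(al, bl)
            + ((mul16(ah, bl) + mul16(al, bh)) << 16)
            + (mul16(ah, bh) << 32))
```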

I also need way more than 240 pcb edge connector pins because
there are many more buses operating concurrently.
350 or so would be nice.

--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun May 10 10:54:33 2026

From Newsgroup: comp.arch

On 2026-May-09 13:07, Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:
[reformatted to fit in 80-char lines with some slack for quote levels]

Lately I have been playing around with circa 1975 TTL paper cpu
designs but done in a pipelined risc style. The instructions must be
variable length because memory was so expensive in 1975. The key is
to not bottleneck in Fetch or Decode.

Cool!

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
using the same parts as are on the VAX-780 bill of materials or in
the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12
bytes long. That is long enough to hold a 4 byte instruction
specifier (opcode + registers) plus 8 bytes of immediate data. I
expect the average instruction to be ~3 bytes.

Interesting that you got this to work at 1 IPC. And, as the Skymont
shows, with enough resources such an instruction set can be made to
work at 9 IPC (the decoders of the Skymont are 3x3 wide, the renamer
is 8 wide).

The design is for the Fetch and Decode stages, and it is a paper design.
But those two stages each fit in a single pcb and would run at 1 IPC.
The VAX usage stats show that around 80% of the integer instruction
usage are simple, 2 source, 1 dest register, which with a 3 port 2R 1W
register file would also execute in 1 clock.

If on this model certain instructions need to stall the pipeline in
some stages, eg a 5 clock MUL, I am fine with that as they are low usage.

Still, the question is how much, if any, code density advantage this
provides over something like RV32GC. In any case, given that RV32GC
code is smaller than VAX code
<2025Mar4.093916@mips.complang.tuwien.ac.at>, the code density of
RV32GC should be good enough, and the decoder may then need less
circuitry. I expect that your approach also gives some advantage in work/instruction, especially over RISC-V.

- anton

The high usage integer operate instructions have a 12b opcode
and 3 x 4b register fields. A 4B instruction would waste 1B each.
In 1975 memory was so expensive I didn't think that would go
down well with customers. Also having immediate values avoids
all the risc constant pasting instructions.
The way I looked at this was that if I couldn't get a Fetch design
for variable length instructions to fit on 1 board or work at
1 IPC then I would have considered fixed 4B instructions.
As it turns out variable length does work - the key is having
a big alignment shifter that fits on the board.
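
The 3B versus 4B arithmetic above, spelled out as a small Python sketch. The helper names are mine, and the 1000-instruction figure is only an example input.

```python
def specifier_bytes(opcode_bits, reg_fields, reg_bits=4):
    """Bytes needed for opcode plus register fields, rounded up to whole bytes."""
    return (opcode_bits + reg_fields * reg_bits + 7) // 8

def wasted_bytes(n_instructions, fixed_len=4, opcode_bits=12, reg_fields=3):
    """Extra bytes a fixed-length encoding spends on n three-register instructions."""
    return n_instructions * (fixed_len - specifier_bytes(opcode_bits, reg_fields))

# 12b opcode + 3 x 4b registers = 24 bits = 3 bytes,
# so a fixed 4B format pads each such instruction with 1 byte.
three_reg = specifier_bytes(12, 3)
padding_per_1000 = wasted_bytes(1000)
```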

--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun May 10 15:59:48 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:

I don't know what 780 used for its integer multiplier
(I can't find it in the schematics).

Then I expect that it used microcode to do 1 multiplier bit per cycle.
IIRC that was the approach that SPARC took at first, and that was also
designed into HPPA (but then they added the FPU and multiplied by
using the multiplier of the FPU).
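
For reference, a minimal Python sketch of the "1 multiplier bit per cycle" approach: a plain shift-and-add loop where each iteration stands for one cycle. This is the textbook technique, not a reconstruction of the actual 780 microcode.

```python
def shift_add_mul32(multiplicand, multiplier):
    """Unsigned 32x32 -> 64 bit multiply, one multiplier bit per iteration.

    Returns (product, steps); steps counts loop iterations, standing in
    for microcode cycles in this simplified model.
    """
    product = 0
    steps = 0
    for bit in range(32):
        if (multiplier >> bit) & 1:
            product += multiplicand << bit   # add the shifted multiplicand
        steps += 1
    return product, steps
```

The point of the sketch is only that the step count is fixed by the operand width (32 here), regardless of the operand values.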

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun May 10 16:05:08 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> writes:

On 2026-May-09 13:07, Anton Ertl wrote:

Interesting that you got this to work at 1 IPC. And, as the Skymont
shows, with enough resources such an instruction set can be made to
work at 9 IPC (the decoders of the Skymont are 3x3 wide, the renamer
is 8 wide).

The design is for the Fetch and Decode stages, and it is a paper design.
But those two stages each fit in a single pcb and would run at 1 IPC.

Yes, my ramblings were about the future-proofness of such an
instruction set design.

Still, the question is how much, if any, code density advantage this
provides over something like RV32GC. In any case, given that RV32GC
code is smaller than VAX code
<2025Mar4.093916@mips.complang.tuwien.ac.at>, the code density of
RV32GC should be good enough, and the decoder may then need less
circuitry. I expect that your approach also gives some advantage in
work/instruction, especially over RISC-V.

- anton

The high usage integer operate instructions have a 12b opcode
and 3 x 4b register fields. A 4B instruction would waste 1B each.

The compressed instructions (the "C" in RV32GC) of RISC-V take 16bits
and are a subset of the regular instructions (i.e., if no appropriate compressed instruction exists, you just use the regular instruction).
The compressed instructions typically use one specifier for a source
and a destination, eliminating one register specifier, and in many
cases only have 3-bit register specifiers (for r8-r15, ordinary RISC-V
has a 5-bit register specifier). In some cases a register is fixed
(requiring 0b for the specifier). The immediate operands are also
shorter.

A 12b opcode looks like a lot to me. I have typically seen 6b on
RISCs, with some opcodes using an auxiliary field for further
refinement.

The disadvantage of RV32GC for your project would be that there is all
this wasted encoding space in the regular and compressed instructions
for the additional instructions of RV64GC.

OTOH, if you traveled back in time and the architecture was a success,
that would result in an easy step to extend the architecture to 64
bits.

Then again, as AMD64 and ARM A64 demonstrate, doing a separate, binary-incompatible 64-bit instruction set can work well. And the fact
that for MIPS, SPARC, HPPA and Power the 32-bit instruction set is
just a subset of the 64-bit instruction set has not been a decisive
factor for their success.

In 1975 memory was so expensive I didn't think that would go
down well with customers.

Supporting 1B and 3B instructions in addition to RV32GC's 2B and 4B instructions could help increase code density further, but makes it
more costly to build later CPUs with IPC>1.

Also having immediate values avoids
all the risc constant pasting instructions.

Yes, and this increases the work/instruction. The question is if the
benefit is worth the cost. The RISC designers, even those with
variable-width instructions, decided against this feature.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.22a-Linux NewsLink 1.2

From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Sun May 10 16:23:00 2026

From Newsgroup: comp.arch

EricP [2026-05-10 08:59:27] wrote:

On 2026-May-08 17:21, Scott Lurndal wrote:

At what relative cost differential, i.e. what would a VAX-11/780
have cost if it had been built using your design?

Just eyeballing it, I'd say about the same.

[...]

I also need way more than 240 pcb edge connector pins because
there are many more buses operating concurrently.
350 or so would be nice.

So, the question comes down to how much would it cost to increase the
240 pins to, say, 360.
[ My gut feeling is that it could be (have been) fairly expensive. ]

=== Stefan
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon May 11 07:47:53 2026

From Newsgroup: comp.arch

Stefan Monnier <monnier@iro.umontreal.ca> writes:

So, the question comes down to how much would it cost to increase the
240 pins to, say, 360.
[ My gut feeling is that it could be (have been) fairly expensive. ]

I am not an expert, but it seems to me that 120 extra pins on the
inter-board connectors would cost about as much as 120 pins of regular (socketed) chips per board. So if EricP can eliminate that many chips
or maybe a board, the cost should be the same. If not, it would be a
few percent higher. In absolute numbers, I expect that the increase
in cost would be <$1000.

The other question, of course, is, how much DEC could have charged for
a machine that is 10x faster than a VAX 11/780. I guess that they
could easily have recouped the additional cost, if any.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Mon May 11 09:02:23 2026

From Newsgroup: comp.arch

On 2026-May-10 16:23, Stefan Monnier wrote:

EricP [2026-05-10 08:59:27] wrote:

On 2026-May-08 17:21, Scott Lurndal wrote:

At what relative cost differential, i.e. what would a VAX-11/780
have cost if it had been built using your design?

Just eyeballing it, I'd say about the same.

[...]

I also need way more than 240 pcb edge connector pins because
there are many more buses operating concurrently.
350 or so would be nice.

So, the question comes down to how much would it cost to increase the
240 pins to, say, 360.
[ My gut feeling is that it could be (have been) fairly expensive. ]

=== Stefan

Yes, that's why I mention it.
The problem is that as the # of pins goes up, so does the insertion friction. At some point, which I don't know, the insertion force is high enough that
it can exceed the crush strength of the connector contact.
The contact gets crushed into the bottom of the connector and it is useless.

On a bus like a PC bus where a card can go in any slot,
you would (hopefully) just move to another slot.
But on a backplane design like the 780 where cards must go in
specific slots, you just trashed your whole backplane and now
have to disassemble the machine to replace it.

There are connectors called Zero Insertion Force connectors
where you put the card in, then twist a screw or something and
the connector closes like a clamp on the card contacts.

I did a quick search to see if there was anything like what
I need, just to get the price, but could not find anything.
If they exist I imagine they are considerably more expensive
but each cpu only needs 25 of them.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Mon May 11 16:21:34 2026

From Newsgroup: comp.arch

EricP wrote:

On 2026-May-10 16:23, Stefan Monnier wrote:

EricP [2026-05-10 08:59:27] wrote:

On 2026-May-08 17:21, Scott Lurndal wrote:

At what relative cost differential, i.e. what would a VAX-11/780
have cost if it had been built using your design?

Just eyeballing it, I'd say about the same.

[...]

I also need way more than 240 pcb edge connector pins because
there are many more buses operating concurrently.
350 or so would be nice.

So, the question comes down to how much would it cost to increase the
240 pins to, say, 360.
[ My gut feeling is that it could be (have been) fairly expensive. ]

=== Stefan

Yes, that's why I mention it.
The problem is that as the # of pins goes up, so does the insertion friction.
At some point, which I don't know, the insertion force is high enough that
it can exceed the crush strength of the connector contact.
The contact gets crushed into the bottom of the connector and it is
useless.

On a bus like a PC bus where a card can go in any slot,
you would (hopefully) just move to another slot.
But on a backplane design like the 780 where cards must go in
specific slots, you just trashed your whole backplane and now
have to disassemble the machine to replace it.

There are connectors called Zero Insertion Force connectors
where you put the card in, then twist a screw or something and
the connector closes like a clamp on the card contacts.

I've seen those both on EPROMs and CPUs, quite nice actually.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
--- Synchronet 3.22a-Linux NewsLink 1.2

From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Mon May 11 15:35:40 2026

From Newsgroup: comp.arch

In article <10tj6f2$2frpj$1@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:

On Wed, 06 May 2026 18:58:00 +0100, John Dallman wrote:

They abandoned x86S

Which I thought was a _good_ idea, not a bad one. Because upwards compatibility with the huge pool of software out there is the only
excuse for sticking with x86.

It was a plausible concept, but as best I remember from reading the white paper, they were fuzzy about just how much 32-bit software it would run,
if you weren't an expert on x86 operating modes and memory models.

If it had been "16-bit goes, all 32-bit stays, including operating
systems" they'd likely have got a lot more buy-in. But would that have
been a worthwhile simplification?

John
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon May 11 18:16:00 2026

From Newsgroup: comp.arch

anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

Stefan Monnier <monnier@iro.umontreal.ca> writes:

So, the question comes down to how much would it cost to increase the
240 pins to, say, 360.
[ My gut feeling is that it could be (have been) fairly expensive. ]

My guess (GUESS) is that packages have 4 components:
a) package area
b) die area
c) power dissipation capability
d) pin count

So adding 50% more pins keeping the other 3 fixed would add 25%/4 = 6%
but it would similarly add about 12% to the connector by which the
package is attached to its motherboard and some additional MB costs.

I am not an expert, but it seems to me that 120 extra pins on the
inter-board connectors would cost about as much as 120 pins of regular (socketed) chips per board. So if EricP can eliminate that many chips
or maybe a board, the cost should be the same. If not, it would be a
few percent higher. In absolute numbers, I expect that the increase
in cost would be <$1000.

It never ceased to amaze me that FABs are built at costs of $20B+ to manufacture $10 chips, while packages come from $100M plants making
$10 packages that accept 1 chip.

The other question, of course, is, how much DEC could have charged for
a machine that is 10x faster than a VAX 11/780. I guess that they
could easily have recouped the additional cost, if any.

- anton

--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Mon May 11 14:44:06 2026

From Newsgroup: comp.arch

On 2026-May-09 15:41, Thomas Koenig wrote:

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

Lately I have been playing around with circa 1975 TTL paper cpu designs but
done in a pipelined risc style. The instructions must be variable length because
memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12 bytes long.
That is long enough to hold a 4 byte instruction specifier (opcode + registers)
plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

That is quite impressive.

Could you share some details of the ISA, number of registers, what
do you use your 1-byte opcode for etc?

My fetch unit paper design reads an 8-byte fetch block each clock
into a 32 byte circular prefetch buffer. The parser rotates the whole
32 byte buffer to align the instruction start with the length parser,
and a PLA examines the first 12 instruction bits to get the length.
It then validates that all the bytes are present in the buffer
and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

This can do a sustained 5 MHz parse of 1 variable instruction/clock,
provided it hits the instruction cache. As most instructions are simple
and take 1 clock to execute, it should do sustained 5 MIPS.
Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
It might also fit onto the same size PCB as the 780 used, about 15" x 15",
but requires more edge connector pins for more buses (~300).

On a separate board Decode will store the fetched instruction and
feed it through a bank of PLA chips, which controls tri-state buffers
to route signals into the Decode uOp output register.

If built my Fetch and Decode units could run 10x the speed of a 780,
using the exact same parts but just designed from a non-microcode
point of view.

To answer your second question first,
I have 3 instructions that must be sized at 1 instruction granule,
which in this case is 1 byte, as they can occur on an arbitrary boundary:
ILLG Illegal is opcode 0, causes an Illegal instruction fault
NOP No Operation
BRKP Breakpoint

ILLG is used to pad non-executing space, NOP pads executing space,
BRKP can be deposited by a debugger at any instruction start.

Other instructions may be 1 granule, only these 3 must be.

The goal of my exercise was to see if Fetch and Decode units could
be designed with circa 1975 TTL that execute at 5 MHz getting 1 IPC.
The ISA is just a prototype to set some boundaries so that the
Fetch and Decode designs have something fixed to shoot for.

The Fetch unit cares little what the internal format of an instruction is.
It looks at no more than the first 12 bits to determine length.
It only cares how many bytes are in the instruction and whether they are
all present in the prefetch register. If they are, Fetch copies them to
the Decode input instruction register. It is Decode that looks at the
internal format of instructions and translates them into a uOp.

Provided Fetch and Decode meet their performance goal,
the instructions and their formats would be chosen such
that they are compatible with any limitations on those units.

The ISA is 32 bit integer and virtual address space.
Instruction Pointer register,
16 x 32b integer registers, 16 x 64b float registers.
No integer condition codes.

I wanted variable length instructions so it could have large immediates.
12 bytes was large enough to hold a 4B opcode and register numbers
with an 8B immediate that is either 1 fp64
or 2 int32 for compare and branch.

The Fetch unit parser uses a Signetics 82S100 FPLA to determine
the instruction length. The 82S100 has 16 inputs, 8 outputs,
and can match 48 product terms with 0, 1 or x dont-care bits.
The parser feeds the first 12 bits from the first 2 bytes plus
their 2 byte Valid flags and an error signal into the 82S100.
There is one input spare for future use.
The PLA outputs a 4b length from 0 to 12, plus 2 status bits
indicating the length is valid, or fetch unit stalled,
or fetch error, either page fault exception or HW error.
If the parse length is valid then fetch checks that all the
bytes of the instruction length are also valid.
If they are, it passes the instruction to the Decode input register.

To give an example, to encode the 3 1B instructions I define
(0 matches a 0, 1 matches a 1, x = a dont-care bit):

ILLG = bxxxx_0000_0000
NOP = bxxxx_1000_0000
BRKP = bxxxx_0100_0000
spare = bxxxx_1100_0000

Notice bits [5:0] of all are 0. So I define a PLA pattern
that matches that and spits out the length 1

11
1098_7654_3210
xxxx_xx00_0000 => 1

Now I want low usage frequency instructions like SYSENTER, SYSEXIT
that have no registers but I don't want to use up all my primary
byte code space so I continue in the second byte.

11
1098_7654_3210
0000_0010_0000 => 2

Now I want 3 byte instructions for the high usage 3 operand instructions
like ADD, SUB, AND, OR, etc. with 3 x 4b register fields.
There are many of them, lets say 64 or 4 groups of 16.

11
1098_7654_3210
xxxx_xx01_0000 => 3

So all of the integer and float 3 register instructions begin
with the bits [5:0] = 01_0000, and bits [11:6] select individual
instructions.
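
A hedged Python sketch of what packing and unpacking one of these 3B instructions could look like. Only the group bits [5:0] = 01_0000, the 6 bit selector in bits [11:6] and the three 4b register fields come from the description above; the placement of the registers in bits [23:12], the little-endian byte order and the function names are assumptions made for the example.

```python
GROUP_3B = 0b01_0000   # bits [5:0] shared by the whole 3-byte group

def encode_3r(op, rd1, rs2, rs3):
    """Pack a 6 bit selector and three 4 bit register numbers into 3 bytes."""
    assert 0 <= op < 64
    assert all(0 <= r < 16 for r in (rd1, rs2, rs3))
    # Register field positions are assumed, not taken from the post.
    word = GROUP_3B | (op << 6) | (rd1 << 12) | (rs2 << 16) | (rs3 << 20)
    return word.to_bytes(3, "little")

def decode_3r(raw):
    """Unpack (op, rd1, rs2, rs3) from the first 3 bytes of raw."""
    word = int.from_bytes(raw[:3], "little")
    if (word & 0x3F) != GROUP_3B:
        raise ValueError("not in the 3-byte register group")
    return ((word >> 6) & 0x3F,
            (word >> 12) & 0xF,
            (word >> 16) & 0xF,
            (word >> 20) & 0xF)
```

Which selector value means ADD, SUB and so on is deliberately left open here.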

Now I go back and define exactly what those instructions are:

ADD Add Rd1 = Rs2 + Rs3
ADDFS Add fault signed overflow Rd1 = Rs2 + Rs3
ADDFU Add fault unsigned overflow Rd1 = Rs2 + Rs3

SUB Subtract Rd1 = Rs2 - Rs3
SUBFS Subtract fault signed overflow Rd1 = Rs2 - Rs3
SUBFU Subtract fault unsigned overflow Rd1 = Rs2 - Rs3
...

Now I'll do conditional branches.
Conditional branches need a 4b register to test,
a 3b condition code to test for (EZ = equal zero, NZ = not zero, ...)
and either a short byte or long 32b word offset.

But wait... a 12b opcode, 4b register and a 1B offset
is a 3B instruction so I can take all the short conditional
branches and merge them with the other 3B instruction groups.

That leaves long branches which have a 12b opcode, 4b register,
and 4B offset and a length of 6 bytes.

11
1098_7654_3210
xxx0_0011_0000 => 6

But wait... a 12b opcode, 4b register and 4B immediate is the
same length as an ADDIW add immediate word with a single
source-dest register. So I make the long branch group bigger
by moving the dont-care point:

11
1098_7654_3210
xxxx_x011_0000 => 6

and now define all those instructions.
And so on.
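
The four length patterns above can be modelled as mask/value product terms. The Python below is a sketch of that idea only: it ignores the valid flags, the error input and the status outputs, it returns 0 when no term matches (my choice), and the way the 12 bits are assembled from the first two bytes in `first12` is an assumption.

```python
def pattern(text, length):
    """Compile a 12 character 0/1/x pattern (bit 11 first) to (mask, value, length)."""
    bits = text.replace("_", "")
    assert len(bits) == 12
    mask = value = 0
    for ch in bits:
        mask <<= 1
        value <<= 1
        if ch in "01":          # 'x' leaves both mask and value bits at 0
            mask |= 1
            value |= int(ch)
    return mask, value, length

# Product terms copied from the post
PLA_TERMS = [
    pattern("xxxx_xx00_0000", 1),   # ILLG, NOP, BRKP, spare
    pattern("0000_0010_0000", 2),   # low usage, no registers
    pattern("xxxx_xx01_0000", 3),   # 3 register ops and short branches
    pattern("xxxx_x011_0000", 6),   # long branches, ADDIW and friends
]

def parse_length(bits12):
    """Return the instruction length for the first 12 bits, or 0 if no term matches."""
    for mask, value, length in PLA_TERMS:
        if (bits12 & mask) == value:
            return length
    return 0

def first12(byte0, byte1):
    """Assumed packing: byte 0 is bits [7:0], low nibble of byte 1 is bits [11:8]."""
    return byte0 | ((byte1 & 0xF) << 8)
```

With these four terms the patterns are mutually exclusive, so the order in the list does not matter.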

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon May 11 20:51:39 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-09 15:41, Thomas Koenig wrote:

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

Lately I have been playing around with circa 1975 TTL paper cpu designs but
done in a pipelined risc style. The instructions must be variable length because
memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

The instructions are byte granular, variable length from 1 to 12 bytes long.
That is long enough to hold a 4 byte instruction specifier (opcode + registers)
plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

That is quite impressive.

Could you share some details of the ISA, number of registers, what
do you use your 1-byte opcode for etc?

My fetch unit paper design reads an 8-byte fetch block each clock
into a 32 byte circular prefetch buffer. The parser rotates the whole
32 byte buffer to align the instruction start with the length parser,
and a PLA examines the first 12 instruction bits to get the length.
It then validates that all the bytes are present in the buffer
and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

This can do a sustained 5 MHz parse of 1 variable instruction/clock,
provided it hits the instruction cache. As most instructions are simple
and take 1 clock to execute, it should do sustained 5 MIPS.
Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
It might also fit onto the same size PCB as the 780 used, about 15" x 15",
but requires more edge connector pins for more buses (~300).

On a separate board Decode will store the fetched instruction and
feed it through a bank of PLA chips, which controls tri-state buffers
to route signals into the Decode uOp output register.

If built my Fetch and Decode units could run 10x the speed of a 780,
using the exact same parts but just designed from a non-microcode
point of view.

To answer your second question first,
I have 3 instructions that must be sized at 1 instruction granule,
which in this case is 1 byte, as they can occur on an arbitrary boundary:
ILLG Illegal is opcode 0, causes an Illegal instruction fault
NOP No Operation
BRKP Breakpoint

ILLG is used to pad non-executing space, NOP pads executing space,
BRKP can be deposited by a debugger at any instruction start.

Other instructions may be 1 granule, only these 3 must be.

Point of order: these 3 only have to be the smallest granule of
the ISA (not necessarily 1 byte).

The goal of my exercise was to see if Fetch and Decode units could
be designed with circa 1975 TTL that execute at 5 MHz getting 1 IPC.
The ISA is just a prototype to set some boundaries so that the
Fetch and Decode designs have something fixed to shoot for.

Laudable.

The Fetch unit cares little what the internal format of an instruction is.
It looks at no more than the first 12 bits to determine length.
It only cares how many bytes are in the instruction and whether they are
all present in the prefetch register. If they are, Fetch copies them to
the Decode input instruction register.

This is the stage I call PARSE. Fetch is responsible for presenting
cache sub-lines to PARSE; PARSE slices instructions out of the words
and presents the parsed instructions to multiple parallel DECODE units.

It is Decode that looks at the
internal format of instructions and translates them into a uOp.

Provided Fetch and Decode meet their performance goal,
the instructions and their formats would be chosen such
that they are compatible with any limitations on those units.

The ISA is 32 bit integer and virtual address space.
Instruction Pointer register,
16 x 32b integer registers, 16 x 64b float registers.
No integer condition codes.

Program status {word to line} ??
Root pointer ??
...

I wanted variable length instructions so it could have large immediates.
12 bytes was large enough to hold a 4B opcode and register numbers
with an 8B immediate that is either 1 fp64
or 2 int32 for compare and branch.

After dropping ST #largeconst,[Rbase+Rindex<<scale+largeDISP]
My 66000 went from 5 word max down to 3 word max and the inst-
length decoder went from 40-gates (4 gates of delay) down to
5 gates and 2 gates of delay--at a measured cost of 0.27% more
instructions.

A good tradeoff !

The Fetch unit parser uses a Signetics 82S100 FPLA to determine
the instruction length. The 82S100 has 16 inputs, 8 outputs,
and can match 48 product terms with 0, 1 or x dont-care bits.
The parser feeds the first 12 bits from the first 2 bytes plus
their 2 byte Valid flags and an error signal into the 82S100.
There is one input spare for future use.
The PLA outputs a 4b length from 0 to 12, plus 2 status bits
indicating the length is valid, or fetch unit stalled,
or fetch error, either page fault exception or HW error.
If the parse length is valid then fetch checks that all the
bytes of the instruction length are also valid.
If they are, it passes the instruction to the Decode input register.

I should mention that the DECODE unit of Mc 88100 was a single
NOR plane {while a PLA is 2 NOR planes}.

To give an example, to encode the 3 1B instructions I define
(0 matches a 0, 1 matches a 1, x = a dont-care bit):

ILLG = bxxxx_0000_0000
NOP = bxxxx_1000_0000
BRKP = bxxxx_0100_0000
spare = bxxxx_1100_0000

Notice bits [5:0] of all are 0. So I define a PLA pattern
that matches that and spits out the length 1

Not bad!

11
1098_7654_3210
xxxx_xx00_0000 => 1

I don't see the PLA calculation in the above--can you explain ?

Now I want low usage frequency instructions like SYSENTER, SYSEXIT
that have no registers but I don't want to use up all my primary
byte code space so I continue in the second byte.

In comparison: My 66000 SVC (sysenter) SVR (sysexit) use the Rd
field as a count of the number of registers to copy from {actually
I just do not load over these} caller/returner to called/returned.
This allows completely separate register files at each privilege
level and the same SBI as ABI (98%-ile). My 66000 Linux has 0-6
argument registers to SVC and 0-2 result registers from SVR. All
more privileged registers from returner are overwritten by lesser
privileged registers for returned--avoiding information spillage.

11
1098_7654_3210
0000_0010_0000 => 2

Now I want 3 byte instructions for the high usage 3 operand instructions
like ADD, SUB, AND, OR, etc. with 3 x 4b register fields.
There are many of them, lets say 64 or 4 groups of 16.

11
1098_7654_3210
xxxx_xx01_0000 => 3

So all of the integer and float 3 register instructions begin
with the bits [5:0] = 01_0000, and bits [11:6] select individual instructions.

Now I go back and define exactly what those instructions are:

ADD Add Rd1 = Rs2 + Rs3
ADDFS Add fault signed overflow Rd1 = Rs2 + Rs3
ADDFU Add fault unsigned overflow Rd1 = Rs2 + Rs3

SUB Subtract Rd1 = Rs2 - Rs3
SUBFS Subtract fault signed overflow Rd1 = Rs2 - Rs3
SUBFU Subtract fault unsigned overflow Rd1 = Rs2 - Rs3
...

So, Mil-like

Now I'll do conditional branches.
Conditional branches need a 4b register to test,
a 3b condition code to test for (EZ = equal zero, NZ = not zero, ...)
and either a short byte or long 32b word offset.

16-bits<<2 cover over // well we have not yet compiled a subroutine
that exceeds this flow-control range within a subroutine--so, for
all practical purposes 100%.

For unconditional branches, 26-bits<<2 covers all known benchmarks
but fails on several DataBase code arrangements {minuscule %}.

Byte sized instruction granules cannot use the <<2 portion and
will suffer more "issues".
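
To put numbers on the displacement ranges being compared, a small Python sketch. It assumes signed two's complement displacements, which is my assumption, and the function name is illustrative.

```python
def branch_range(disp_bits, shift=0):
    """Reachable (lowest, highest) byte offsets for a signed displacement
    of disp_bits bits that is scaled left by shift bits."""
    lowest = -(1 << (disp_bits - 1)) << shift
    highest = ((1 << (disp_bits - 1)) - 1) << shift
    return lowest, highest

cond_word = branch_range(16, shift=2)     # 16 bits << 2: about +/-128 KiB
uncond_word = branch_range(26, shift=2)   # 26 bits << 2: about +/-128 MiB
short_byte = branch_range(8)              # 1B offset, byte granular
long_byte = branch_range(32)              # 4B offset, byte granular
```

With byte granules a displacement of the same width reaches a quarter of the distance, which is the cost being pointed out here.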

But wait... a 12b opcode, 4b register and a 1B offset
is a 3B instruction so I can take all the short conditional
branches and merge them with the other 3B instruction groups.

That leaves long branches which have a 12b opcode, 4b register,
and 4B offset and a length of 6 bytes.

11
1098_7654_3210
xxx0_0011_0000 => 6

We found that by dropping the conditionality it fits in 4-bytes
at the cost of 4-effective bits of displacement--where you are
paying 16-bits for it.

But wait... a 12b opcode, 4b register and 4B immediate is the
same length as an ADDIW add immediate word with a single
source-dest register. So I make the long branch group bigger
by moving the dont-care point:

11
1098_7654_3210
xxxx_x011_0000 => 6

and now define all those instructions.
And so on.

Cute...
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue May 12 07:54:48 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

There are connectors called Zero Insertion Force connectors
where you put the card in, then twist a screw or something and
the connector closes like a clamp on the card contacts.

I did a quicky search to see if there were anything like what
I need, just to get the price, but could not find anything.
If they exist I imagine they are considerably more expensive
but each cpu only needs 25 of them.

There are a couple of basic patents from just around that timeframe,
e.g. https://patents.google.com/patent/US4080032A (for
chips, not for circuit boards), which was laid open in 1978.
https://patents.google.com/patent/US4189200A was filed in 1978.
Seems they were just a bit too late for introduction with your
VAX alternative.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Tue May 12 15:58:07 2026

From Newsgroup: comp.arch

On Mon, 11 May 2026 15:34:00 +0100, John Dallman wrote:

If it had been "16-bit goes, all 32-bit stays, including operating
systems" they'd likely have got a lot more buy-in. But would that have
been a worthwhile simplification?

As far as I'm concerned, they made the same mistake with EM64T as they did
with the 80286. Not only should 16-bit stay, but they should have had it
so that 16-bit was as easily accessible from 64-bit as it was from 32-bit,
so as to retain all 16-bit software running perfectly and transparently in
64-bit editions of Windows.

Upwards compatibility, where software is not open-source, so users are dependent on binaries running as-is without recompilation, should be
regarded as absolutely mandatory. Total upwards compatibility.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue May 12 18:16:05 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> writes:

On Mon, 11 May 2026 15:34:00 +0100, John Dallman wrote:

If it had been "16-bit goes, all 32-bit stays, including operating
systems" they'd likely have got a lot more buy-in. But would that have
been a worthwhile simplification?

As far as I'm concerned, they made the same mistake with EM64T as they did
with the 80286. Not only should 16-bit stay, but they should have had it
so that 16-bit was as easily accessible from 64-bit as it was from 32-bit,
so as to retain all 16-bit software running perfectly and transparently in
64-bit editions of Windows.

As far as I'm concerned, the sooner they get rid of all the 32-bit
legacy crap (e.g. segments, booting through real-mode, protected-mode
and long-mode), the better.

x86S is the future...

https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html
--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Tue May 12 15:35:22 2026

From Newsgroup: comp.arch

On 2026-May-10 11:59, Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

I don't know what 780 used for its integer multiplier
(I can't find it in the schematics).

Then I expect that it used microcode to do 1 multiplier bit per cycle.
IIRC that was the approach that SPARC took at first, and that was also designed into HPPA (but then they added the FPU and multiplied by
using the multiplier of the FPU).

- anton

I think so. There is one line in the hardware manual that says
the optional FPU also "enhances the performance of integer multiply".

The FPU technical manual describes how the FPU does its multiplies.
It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
in a loop, and sums the partial products.
There are two sets of ROMs so it can interleave the lookups.

Note that there are TTL parts available in 1975 to build a
Wallace tree multiplier that can do 32b x 32b in ~130 ns.

From FP780_Technical_Description_1978-12

"In order to obtain fast multiplication, a pipeline technique is used (Figure 2-21 ).
The multiplier is divided into 4-bit nibbles. The nibbles are then accessed consecutively
by a counter-multiplexer combination (least significant nibble first) and each nibble
operates on up to 32 bits of multiplicand. The MCAND bus and MPLIER nibbles are used to
address the ROMs. The banks of ROMs provide a 4 X 4 primitive with 2-way interleaving.
The data is latched (ROM.STORE) and applied to the inputs of 4-bit adders (PALU).
These adders combine the ROM data to form a partial product, storing the carryout
of each 4-bit section, to be added in on the next cycle. The partial product is latched
in PPROD and passed to another row of adders (AALU) which accumulate the final product,
again, saving the carries. Thus, when the pipeline is operating, there are four processes
cycling at the same time:

1. Select ROM addresses
2. Latch ROM data
3. Form partial product
4. Accumulate final product.

After the final product is calculated, the stored carries from both stages are
combined with the accumulated product using full carry look-ahead to produce the
final answer in a single precision (float) operation. In double precision,
this result is stored and used during the generation of the final answer
during the second pass.

Each of the pipeline processes, with the exception of accessing ROM data
(which occurs in each bank of ROMs on 100 ns) occurs at 50 ns intervals."
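
The arithmetic behind that description (not the timing) can be sketched as follows. This models only the 4 x 4 ROM lookup and the summing of partial products, least significant multiplier nibble first; the carry-save adders, the 2-way ROM interleaving and the pipeline stages are not modelled, and all names here are made up for illustration.

```python
# 16 x 16 table standing in for the 4b x 4b product ROMs.
ROM = [[a * b for b in range(16)] for a in range(16)]


def nibbles(x, count):
    # Split x into `count` 4-bit nibbles, least significant first.
    return [(x >> (4 * i)) & 0xF for i in range(count)]


def rom_multiply(multiplicand, multiplier, bits=32):
    n = bits // 4
    product = 0
    # One pass per multiplier nibble, least significant nibble first.
    for j, m in enumerate(nibbles(multiplier, n)):
        partial = 0
        # Each multiplier nibble operates on every multiplicand nibble.
        for i, c in enumerate(nibbles(multiplicand, n)):
            partial += ROM[c][m] << (4 * i)
        # Accumulate the partial product at its nibble position.
        product += partial << (4 * j)
    return product
```

For 32-bit operands that is 8 passes of 8 lookups each, which is why the hardware bothers to interleave two ROM banks.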

--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue May 12 20:22:08 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-10 11:59, Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

I don't know what 780 used for its integer multiplier
(I can't find it in the schematics).

Then I expect that it used microcode to do 1 multiplier bit per cycle.
IIRC that was the approach that SPARC took at first, and that was also designed into HPPA (but then they added the FPU and multiplied by
using the multiplier of the FPU).

- anton

I think so. There is one line in the hardware manual that says
the optional FPU also "enhances the performance of integer multiply".

The FPU technical manual describes how the FPU does its multiplies.
It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
in a loop, and sums the partial products.
There are two sets of ROMs so it can interleave the lookups.

Note that there are TTL parts available in 1975 to build a
Wallace tree multiplier that can do 32b x 32b in ~130 ns.

s/Wallace/Dadda/

So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
you add the input multiplexing and output selection.

From FP780_Technical_Description_1978-12

"In order to obtain fast multiplication, a pipeline technique is used (Figure 2-21 ).
The multiplier is divided into 4-bit nibbles. The nibbles are then accessed consecutively
by a counter-multiplexer combination (least significant nibble first) and each nibble
operates on up to 32 bits of multiplicand. The MCAND bus and MPLIER nibbles are used to
address the ROMs. The banks of ROMs provide a 4 X 4 primitive with 2-way interleaving.
The data is latched (ROM.STORE) and applied to the inputs of 4-bit adders (PALU).
These adders combine the ROM data to form a partial product, storing the carryout
of each 4-bit section, to be added in on the next cycle. The partial product is latched
in PPROD and passed to another row of adders (AALU) which accumulate the final product,
again, saving the carries. Thus, when the pipeline is operating, there are four processes
cycling at the same time:

1. Select ROM addresses
2. Latch ROM data
3. Form partial product
4. Accumulate final product.

After the final product is calculated, the stored carries from both stages are
combined with the accumulated product using full carry look-ahead to produce the
final answer in a single precision (float) operation. In double precision, this result is stored and used during the generation of the final answer during the second pass.

Each of the pipeline processes, with the exception of accessing ROM data
(which occurs in each bank of ROMs on 100 ns) occurs at 50 ns intervals."

--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Tue May 12 17:47:37 2026

From Newsgroup: comp.arch

On 2026-May-12 16:22, MitchAlsup wrote:

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-10 11:59, Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

I don't know what 780 used for its integer multiplier
(I can't find it in the schematics).

Then I expect that it used microcode to do 1 multiplier bit per cycle.
IIRC that was the approach that SPARC took at first, and that was also
designed into HPPA (but then they added the FPU and multiplied by
using the multiplier of the FPU).

- anton

I think so. There is one line in the hardware manual that says
the optional FPU also "enhances the performance of integer multiply".

The FPU technical manual describes how the FPU does its multiplies.
It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
in a loop, and sums the partial products.
There are two sets of ROMs so it can interleave the lookups.

Note that there are TTL parts available in 1975 to build a
Wallace tree multiplier that can do 32b x 32b in ~130 ns.

s/Wallace/Dadda/

So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
you add the input multiplexing and output selection.

The TI logic databook labels the parts as Wallace tree.

"The 'S274 is a basic 4-bit-by-4-bit parallel
multiplier in a single package, and as such, no
additional components are required to obtain an 8-bit
product.
...
The 'LS275 and 'S275 expandable bit-slice Wallace
trees have been designed to accept up to seven
bit-slice inputs and two carry inputs from previous
slices for reduction to four lines."

--- Synchronet 3.22a-Linux NewsLink 1.2

From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Tue May 12 22:53:40 2026

From Newsgroup: comp.arch

In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:

As far as I'm concerned, they made the same mistake with EM64T as
they did with the 80286. Not only should 16-bit stay, but they
should have had it so that 16-bit was as easily accessible from
64-bit as it was from 32-bit, so as to retain all 16-bit software
running perfectly and transparently in 64-bit editions of Windows.

When x86-64 was released, AMD were at pains to point out that running
16-bit environments was possible. It was Microsoft who decided to drop
16-bit support from 64-bit Windows.

John
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue May 12 22:12:30 2026

From Newsgroup: comp.arch

jgd@cix.co.uk (John Dallman) writes:

In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:

As far as I'm concerned, they made the same mistake with EM64T as
they did with the 80286. Not only should 16-bit stay, but they
should have had it so that 16-bit was as easily accessible from
64-bit as it was from 32-bit, so as to retain all 16-bit software
running perfectly and transparently in 64-bit editions of Windows.

When x86-64 was released, AMD were at pains to point out that running
16-bit environments was possible. It was Microsoft who decided to drop
16-bit support from 64-bit Windows.

I think you need to cite a source for that.

The AMD64 Opterons did not support the segment limit registers
when released.

Some incomplete support was added in the second generation of
Opteron because VMware and XEN had been using the segment limit registers for virtualization (shadow page tables). The support was sufficient
to support XEN temporarily until the NPT (nested page table)
and Pacifica (SVM) features were added to the processor, but
it was not complete enough to support random 16-bit applications.

https://www.pagetable.com/?p=25

Intel never bothered to implement 16-bit segmentation support
in x86_64; instead they created the EPT[*] for virtualization.

[*] AMD's nested page table had the exact same format as the
processor page table, while the Intel EPT had a unique
entry format, different from the processor page tables.
--- Synchronet 3.22a-Linux NewsLink 1.2

From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Wed May 13 19:34:40 2026

From Newsgroup: comp.arch

In article <iBNMR.2$2K1.1@fx17.iad>, scott@slp53.sl.home (Scott Lurndal)
wrote:

When x86-64 was released, AMD were at pains to point out that
running 16-bit environments was possible. It was Microsoft who
decided to drop 16-bit support from 64-bit Windows.

I think you need to cite a source for that.

The AMD64 Opterons did not support the segment limit registers
when released.

I suspect my informant may have been exaggerating, or I've mis-remembered.

John
--- Synchronet 3.22a-Linux NewsLink 1.2

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Wed May 13 18:07:41 2026

From Newsgroup: comp.arch

scott@slp53.sl.home (Scott Lurndal) writes:

jgd@cix.co.uk (John Dallman) writes:

In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:

As far as I'm concerned, they made the same mistake with EM64T as
they did with the 80286. Not only should 16-bit stay, but they
should have had it so that 16-bit was as easily accessible from
64-bit as it was from 32-bit, so as to retain all 16-bit software
running perfectly and transparently in 64-bit editions of Windows.

When x86-64 was released, AMD were at pains to point out that running
16-bit environments was possible. It was Microsoft who decided to drop
16-bit support from 64-bit Windows.

I think you need to cite a source for that.

The AMD64 Opterons did not support the segment limit registers
when released.

When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily available for it, and all the 32-bit and 16-bit stuff I used ran
nicely. For the game operating system, I always lagged behind, so
maybe I was at W98 when I switched the hardware, and IIRC W98 gave low
frame rates on the new hardware, so I switched to WME, or somesuch,
which fixed the problem. All the 32-bit and 16-bit stuff that was
still in WME worked on the Athlon 64 (which is the same
microarchitecture (K8) as the early Opterons).

If the segment limit registers were not supported, they obviously were
not needed.

But my guess is that you confuse this with the fact that the long mode
(unlike legacy mode, in which the 32-bit OSs run) does not support
most segmentation features of IA-32, and that it supports the FS and GS
segment registers without limiting the
segment (so they are just base registers in long mode). But, as
mentioned, in legacy mode all segmentation is fully supported.

However, if you have an OS that runs in long mode (i.e., a 64-bit OS),
you cannot run real-mode programs (no real mode, virtual 8086 mode, or
unreal mode), only 16-bit protected mode according to <https://en.wikipedia.org/wiki/X86-64#Operating_modes>. So an OS
would have to switch to legacy mode to get to these modes. That page
also says:

|However, such [real mode] programs may be started from an operating
|system running in long mode on processors supporting VT-x or AMD-V by
|creating a virtual processor running in the desired mode.

Some incomplete support was added in the second generation of
Opteron because VMware and XEN had been using the segment limit registers for
virtualization (shadow page tables). The support was sufficient
to support XEN temporarily until the NPT (nested page table)
and Pacifica (SVM) features were added to the processor, but
it was not complete enough to support random 16-bit applications.

https://www.pagetable.com/?p=25

|While VMware could still virtualize 32 bit operating systems on AMD64
|CPUs, they could not virtualize 64 bit operating systems, because they
|required segment limits.

So it's not that the Opterons did not support any pre-existing
functionality (no earlier 64-bit OSs for AMD64 existed, because
Opteron was the first AMD64 processor), just that VMware could not use
the same virtualization technique for 64-bit OSs that they could use
for 32-bit OSs and had been using on earlier 32-bit processors and
that they could still use in legacy mode.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Wed May 13 18:47:50 2026

From Newsgroup: comp.arch

jgd@cix.co.uk (John Dallman) writes:

In article <iBNMR.2$2K1.1@fx17.iad>, scott@slp53.sl.home (Scott Lurndal)
wrote:

When x86-64 was released, AMD were at pains to point out that
running 16-bit environments was possible. It was Microsoft who
decided to drop 16-bit support from 64-bit Windows.

I think you need to cite a source for that.

The AMD64 Opterons did not support the segment limit registers
when released.

I suspect my informant may have been exaggerating, or I've mis-remembered.

As Anton pointed out, I should have added "in long mode".
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed May 13 20:51:08 2026

From Newsgroup: comp.arch

anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

scott@slp53.sl.home (Scott Lurndal) writes:

jgd@cix.co.uk (John Dallman) writes:

In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi)
wrote:

As far as I'm concerned, they made the same mistake with EM64T as
they did with the 80286. Not only should 16-bit stay, but they
should have had it so that 16-bit was as easily accessible from
64-bit as it was from 32-bit, so as to retain all 16-bit software
running perfectly and transparently in 64-bit editions of Windows.

When x86-64 was released, AMD were at pains to point out that running
16-bit environments was possible. It was Microsoft who decided to drop
16-bit support from 64-bit Windows.

I think you need to cite a source for that.

The AMD64 Opterons did not support the segment limit registers
when released.

When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily available for it, and all the 32-bit and 16-bit stuff I used ran
nicely. For the game operating system, I always lagged behind, so
maybe I was at W98 when I switched the hardware, and IIRC W98 gave low
frame rates on the new hardware, so I switched to WME, or somesuch,
which fixed the problem. All the 32-bit and 16-bit stuff that was
still in WME worked on the Athlon 64 (which is the same
microarchitecture (K8) as the early Opterons).

If the segment limit registers were not supported, they obviously were
not needed.

At that point in time, you are correct. That time was pre-virtualization
and pre-multi-threaded applications.

But my guess is that you confuse this with the fact that the long mode (unlike legacy mode, in which the 32-bit OSs run) does not support
most segmentation features of IA-32, and that the FS and GS
segmentation registers it supports does so without limiting the
segment (so they are just base registers in long mode). But, as
mentioned, in legacy mode all segmentation is fully supported.

However, if you have an OS that runs in long mode (i.e., a 64-bit OS),
you cannot run real-mode programs (no real mode, virtual 8086 mode, or
unreal mode), only 16-bit protected mode according to <https://en.wikipedia.org/wiki/X86-64#Operating_modes>. So an OS
would have to switch to legacy mode to get to these modes. That page
also says:

|However, such [real mode] programs may be started from an operating
|system running in long mode on processors supporting VT-x or AMD-V by
|creating a virtual processor running in the desired mode.

Some incomplete support was added in the second generation of
Opteron because VMware and XEN had been using the segment limit registers for
virtualization (shadow page tables). The support was sufficient
to support XEN temporarily until the NPT (nested page table)
and Pacifica (SVM) features were added to the processor, but
it was not complete enough to support random 16-bit applications.

https://www.pagetable.com/?p=25

|While VMware could still virtualize 32 bit operating systems on AMD64
|CPUs, they could not virtualize 64 bit operating systems, because they
|required segment limits.

So it's not that the Opterons did not support any pre-existing
functionality (no earlier 64-bit OSs for AMD64 existed, because
Opteron was the first AMD64 processor), just that VMware could not use
the same virtualization technique for 64-bit OSs that they could use
for 32-bit OSs and had been using on earlier 32-bit processors and
that they could still use in legacy mode.

By allowing 1 segment to contain data, VMware virtualization was improved*.

By allowing 1 segment to contain a pointer, multi-threaded applications could
address thread-local memory more easily.

(*) on the other hand, had x86 ISA contained only 1 instruction that
needed to trap when virtualization support was necessary, that segment
register would not have been needed, either.

- anton

--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Thu May 14 21:10:32 2026

From Newsgroup: comp.arch

John Dallman <jgd@cix.co.uk> schrieb:

In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:

As far as I'm concerned, they made the same mistake with EM64T as
they did with the 80286. Not only should 16-bit stay, but they
should have had it so that 16-bit was as easily accessible from
64-bit as it was from 32-bit, so as to retain all 16-bit software
running perfectly and transparently in 64-bit editions of Windows.

When x86-64 was released, AMD were at pains to point out that running
16-bit environments was possible. It was Microsoft who decided to drop
16-bit support from 64-bit Windows.

Given that Dosemu runs much faster in emulation than on the original
hardware, that is not such a big loss.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri May 15 09:03:43 2026

From Newsgroup: comp.arch

On 2026-May-11 16:51, MitchAlsup wrote:

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-09 15:41, Thomas Koenig wrote:

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

Lately I have been playing around with circa 1975 TTL paper cpu designs but
done in a pipelined risc style. The instructions must be variable length because
memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

The Fetch unit cares little what the internal format of an instruction is.
It only looks at the first 12 bits at most to determine length.
It only cares how many bytes are in the instruction and are they all
present in the prefetch register. If they are Fetch copies them to
the Decode input instruction register.

This is the stage I call PARSE. Fetch is responsible for presenting
cache sub-lines to PARSE, PARSE slices instructions out of the words
and presents parsed instructions to multiple parallel DECODE units.

Yes, I call it that too.
The Fetch unit consists of two sub units, the Prefetcher and the Parser.
These are independent machines that coordinate their activity through
the circular prefetch buffer.

More below...

The Fetch unit parser uses a Signetics 82S100 FPLA to determine
the instruction length. The 82S100 has 16 inputs, 8 outputs,
and can match 48 product terms with 0, 1 or x dont-care bits.
The parser feeds the first 12 bits from the first 2 bytes plus
their 2 byte Valid flags and an error signal into the 82S100.
There is one input spare for future use.
The PLA outputs a 4b length from 0 to 12, plus 2 status bits
indicating the length is valid, or fetch unit stalled,
or fetch error, either page fault exception or HW error.
If the parse length is valid then fetch checks that all the
bytes of the instruction length are also valid.
If they are it passes the instruction to Decode input register.

I should mention that the DECODE unit of Mc 88100 was a single
NOR plane {while a PLA is 2 NOR planes}.

The key to the design was figuring out a shifter that can
rotate a 16 or 32 byte buffer that fits on the board.

Originally I looked at using 8:1 muxes to build a 16B shifter
but it would have taken 288 16-pin chips for just the shifter,
which was more than the whole pcb limit of 256 16-pin chips.

The solution came when I had the idea of using the VAX-780
barrel shifter chips, the AMD 25S10 7:4 bit shifter.

Datasheets for AMD 25S10
https://www.datasheetarchive.com/?q=am25s10

It takes one of these chips to build a 4 bit barrel shifter,
two layers of 4 chips = 8 total for a 16-bit barrel shifter,
three layers of 16 chips = 48 total for a 64-bit
barrel shifter.

I repurposed them to rotate the 32B prefetch register.
The prefetch register holds 9-bit values, 8 data + 1 valid flag,
that I call VB's or Valid Bytes.
It takes just 72 of these chips to build a shifter for 16 VB's,
and 143 chips for 32 VB's. And those fit on the pcb!
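
For the 16 VB case, the layering can be sketched like this. It is my reading of the description, not the actual schematic: I assume each layer is a row of 4-output shifter chips, the first layer rotating by 0..3 positions and the second by 0, 4, 8 or 12, with one such shifter per bit of the 9-bit VB. The function names are hypothetical.

```python
def rotate_layer(entries, step):
    # One shifter layer: rotate left so entries[step] moves to position 0.
    step %= len(entries)
    return entries[step:] + entries[:step]


def rotate16(entries, amount):
    """Rotate a 16-entry buffer in two layers, as two rows of
    4-position shifters would (fine shift first, then coarse)."""
    assert len(entries) == 16 and 0 <= amount < 16
    fine = rotate_layer(entries, amount & 0x3)    # 0, 1, 2 or 3 places
    return rotate_layer(fine, amount & 0xC)       # 0, 4, 8 or 12 places


def chips_16vb(bits_per_vb=9, outputs_per_chip=4, layers=2, entries=16):
    # Chips per layer per bit plane, times layers, times bit planes.
    return (entries // outputs_per_chip) * layers * bits_per_vb
```

With those assumptions the count for 16 VB's comes out at 8 chips per bit plane and 72 in total, matching the figure quoted above.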

With that in place I considered two Fetch unit designs,
one called LilBuf16 has a 16 VB circular prefetch buffer,
and BigBuf32 has a 32 VB circular prefetch buffer.

Suffice it to say that while LilBuf16 requires 1/2 the
chips for buffer and shifter, it has much more complex
control logic. The prefetcher is moving a write pointer
over the circular prefetch buffer and copying in as many
bytes as possible in each clock, while the parser is
also moving its read pointer. To make it work required
creating a variable sized write mask over the buffer.
And all it would take is one long instruction to
drain the prefetch buffer and cause a stall.

The control logic for BigBuf32 was much simpler as
it prefetches 8 byte blocks and writes the whole block
in at once when it finds an empty slot.
In just 4 clocks it can fill the 32B prefetch buffer.

Design for BigBuf32

I-cache
[#######################]
^ |
Cache Cmd | v 8B Fetch Blocks
& Phy Addr | --------------------------------
| | | | |
PreFetch | v v v v
Counter [##########] 32B PreFetch Reg [=FB3=|=FB2=|=FB1=|=FB0=]
VA & PA | ^ ^ | | | |
| | | v v v v
| | -------->[########]-->[########### 32B Rotate #########]
| | ^ Parse VA | | | | | |
| | | Counter | v v v v v
| | | | [PLA]-->[=12B Validate Present=]
| | | | | |
TLB Cmd | | Phy | | | |
& Vir Addr | | Addr | | | |
v ^ ^ v v v
To TLB Jump Inst Inst Inst Bytes 1..12 + Status
Bus VA VA Len
To Decode Input Inst Register

Fetching starts when a new address and priv mode comes in over
the Jump Bus. The Jump Bus is an open collector (wired-OR) bus
that runs to all the stages that can generate jump addresses:
Decode, RegRead, JBU Jump-Branch-Unit, and Writeback-Retire.

The jump address overwrites the parse and prefetch counters
and resets the phy-addr-valid flag in prefetcher.
Prefetcher sees it has a new VA so it requests a translation
from the TLB, and saves the PA in the prefetch counter.

Whenever Prefetcher sees a block valid flag on the prefetch
register is clear, or is going to be made clear in this cycle,
it copies in the next sequential 8-byte fetch block and
sets the block valid flag.

The Parser uses the parse VA to rotate the prefetch register
contents so that the instruction start VB aligns with the
parse PLA inputs. The PLA looks at VB0 and possibly VB1 bits
and spits out an instruction length and some status bits.

If the parse status is valid then it validates that the valid
flags are set on all the VB's in the instruction length.
The validator is just a 16:1 mux and a bunch of AND gates.
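
In software terms the parse step described above amounts to something like the sketch below. The length function is a toy stand-in for the PLA (the real instruction encoding is not given here), a VB is modelled as a (byte, valid) tuple, and returning None stands for a stall; all names are hypothetical.

```python
def toy_length(b0, b1):
    """Hypothetical stand-in for the PLA: the low nibble of the first
    byte is taken as the instruction length. b1 is unused in this toy."""
    length = b0 & 0x0F
    return length if 1 <= length <= 12 else None


def parse_step(prefetch, parse_offset, length_of=toy_length):
    """prefetch is a list of (byte, valid) VBs forming the circular buffer;
    parse_offset is the buffer position of the instruction's first VB."""
    n = len(prefetch)
    # Rotate so the instruction's first VB lands at position 0.
    rotated = [prefetch[(parse_offset + i) % n] for i in range(n)]
    (b0, v0), (b1, v1) = rotated[0], rotated[1]
    if not v0:
        return None                      # first byte not present: stall
    length = length_of(b0, b1 if v1 else None)
    if length is None:
        return None                      # no valid length decoded
    window = rotated[:length]
    # Validator: every VB covered by the length must have its flag set.
    if not all(valid for _, valid in window):
        return None                      # bytes still missing: stall
    return bytes(byte for byte, _ in window)
```

An instruction that wraps around the end of the 32 VB buffer is handled by the rotate, so the validator only ever looks at positions 0..length-1.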

The valid instruction bytes, its virtual address, its length
and other status info is passed to Decoder instruction register.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 13:54:59 2026

From Newsgroup: comp.arch

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-10 11:59, Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

I don't know what 780 used for its integer multiplier
(I can't find it in the schematics).

Then I expect that it used microcode to do 1 multiplier bit per cycle.
IIRC that was the approach that SPARC took at first, and that was also
designed into HPPA (but then they added the FPU and multiplied by
using the multiplier of the FPU).

- anton

I think so. There is one line in the hardware manual that says
the optional FPU also "enhances the performance of integer multiply".

The FPU technical manual describes how the FPU does its multiplies.
It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
in a loop, and sums the partial products.
There are two sets of ROMs so it can interleave the lookups.

Note that there are TTL parts available in 1975 to build a
Wallace tree multiplier that can do 32b x 32b in ~130 ns.

130 ns? That sounds quite fast. Do you happen to have details
of such a design?

s/Wallace/Dadda/

So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
you add the input multiplexing and output selection.

But that is only for multiplying two nibbles with 32 bits.

By using 64 74S274 and a corresponding number of 74S275, 74S183 and
74S283 (full adders for the corners and for speeding up incoming
carries) plus a carry lookahead cascade of 74S181 and 74S182, it
would probably have been quite possible to build a two-cycle 32*32
multiplier for a VAX-like computer on two boards (each of which
can hold around 120 chips), pipelined for one result per cycle.

It is also interesting that they chose ROMs instead of 74S274s
with the same function, probably due to price or power.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri May 15 11:30:46 2026

From Newsgroup: comp.arch

On 2026-May-15 09:54, Thomas Koenig wrote:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

EricP <ThatWouldBeTelling@thevillage.com> posted:

On 2026-May-10 11:59, Anton Ertl wrote:

EricP <ThatWouldBeTelling@thevillage.com> writes:

I don't know what 780 used for its integer multiplier
(I can't find it in the schematics).

Then I expect that it used microcode to do 1 multiplier bit per cycle.
IIRC that was the approach that SPARC took at first, and that was also
designed into HPPA (but then they added the FPU and multiplied by
using the multiplier of the FPU).

- anton

I think so. There is one line in the hardware manual that says
the optional FPU also "enhances the performance of integer multiply".

The FPU technical manual describes how the FPU does its multiplies.
It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
in a loop, and sums the partial products.
There are two sets of ROMs so it can interleave the lookups.

Note that there are TTL parts available in 1975 to build a
Wallace tree multiplier that can do 32b x 32b in ~130 ns.

130 ns? That sounds quite fast. Do you happen to have details
of such a design?

I thought so too but it says (the 74LS times are in () )

"When SN74S274 is Combined With SN74H183
(or SN74LS183) and Schottky Look-Ahead
Adders, Multiplication Times are Typically:
16-Bit Product in 75 ns (79 ns)
32-Bit Product in 116 ns (132 ns)"

Starting at page 7-391, after technical specs it
shows a number of example configurations.

54/74 Family MSI/LSI Circuits 1976
Manual is page indexed and searchable https://www.bitsavers.org/components/ti/_dataBooks/1976_TI_The_TTL_Data_Book_2ed/07.pdf

s/Wallace/Dadda/

So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
you add the input multiplexing and output selection.

But that is only for multiplying two nibbles with 32 bits.

By using 64 74S274 and a corresponding number of 74S275, 74S183 and
74S283 (full adders for the corners and for speeding up incoming
carries) plus a carry lookahead cascade of 74S181 and 74S182, it
would probably have been quite possible to build a two-cycle 32*32
multiplier for a VAX-like computer on two boards (each of which
can hold around 120 chips), pipelined for one result per cycle.
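
To see where a figure of 64 multiplier chips can come from, the following Python sketch counts the 4x4 partial products of an N x N multiply and the height of each bit column that the adder tree then has to reduce. It assumes each 74S274 delivers one 8-bit product of two 4-bit slices; the function names are hypothetical, and the sketch says nothing about timing or about how many 74S275/74S183 parts the reduction needs.

```python
def partial_product_count(operand_bits, slice_bits=4):
    """Number of slice x slice multipliers needed for an N x N multiply."""
    slices = operand_bits // slice_bits
    return slices * slices


def column_heights(operand_bits, slice_bits=4):
    """Number of partial-product bits in each result column before reduction."""
    slices = operand_bits // slice_bits
    heights = [0] * (2 * operand_bits)
    for i in range(slices):
        for j in range(slices):
            low = slice_bits * (i + j)  # weight of this product's lowest bit
            for bit in range(low, low + 2 * slice_bits):
                heights[bit] += 1
    return heights
```

Calling partial_product_count(32) gives the 8 x 8 grid of slice products, and max(column_heights(32)) shows how many bits pile up in the middle columns, which is what the adder tree has to compress.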

It is also interesting that they chose ROMs instead of 74S274s
with the same function, probably due to price or power.

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 15 16:02:37 2026

From Newsgroup: comp.arch

On Wed, 13 May 2026 18:07:41 +0000, Anton Ertl wrote:

When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily available for it, and all the 32-bit and 16-bit stuff I used ran nicely.

Yes. You can run 32-bit Windows 7 on a 64-bit CPU in 32-bit mode, and
16-bit programs will run just as well.

But apparently without rebooting to get into 32-bit mode, there really is
a hardware reason why 16-bit software can't easily be made to work from
64-bit mode. Whatever the hardware reason is, in my opinion that's a mistake
on the part of the hardware designers.

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Fri May 15 16:05:46 2026

From Newsgroup: comp.arch

On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

Given that Dosemu runs much faster in emulation than on the original hardware, that is not such a big loss.

That is not the right comparison. If my old 16-bit software doesn't run at full *native* speed on my shiny new 64-bit computer, then I still have to
run out and buy new software to get my work done faster.

Although it is correct to say that Microsoft could have done something to
fix this, even with the hardware issue as it is. They could have let
people buy just one copy of Windows, and install it twice on the same
computer - so as to be able to boot into either the 32-bit version or the 64-bit version.

John Savard

--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 17:08:59 2026

From Newsgroup: comp.arch

EricP <ThatWouldBeTelling@thevillage.com> schrieb:

On 2026-May-15 09:54, Thomas Koenig wrote:

130 ns? That sounds quite fast. Do you happen to have details
of such a design?

I thought so too but it says (the 74LS times are in () )

"When SN74S274 is Combined With SN74H183
(or SN74LS183) and Schottky Look-Ahead
Adders, Multiplication Times are Typically:
                                 ^^^^^^^^^

16-Bit Product in 75 ns (79 ns)
32-Bit Product in 116 ns (132 ns)"

Starting at page 7-391, after technical specs it
shows a number of example configurations.

Therein lies the rub - they probably just added their typical and
not their max values.

54/74 Family MSI/LSI Circuits 1976
Manual is page indexed and searchable https://www.bitsavers.org/components/ti/_dataBooks/1976_TI_The_TTL_Data_Book_2ed/07.pdf

Yep.

My grandfather used to have the TI handbook on his desk.
He developed (and filed a patent for) an instrument for measuring
fuel consumption for cars, and in order to display the customary
L/100 km in Europe, he had to divide two numbers (gasoline
consumption and speed) for which he built the circuitry out of
74xx chips. But the main focus was the mechanical device.
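
For readers not used to the unit, the division mentioned above can be sketched in a few lines of Python. This only shows the arithmetic, assuming fuel flow in litres per hour and speed in km/h; the function name and units are my own choice for illustration and have nothing to do with how the 74xx circuitry actually worked.

```python
def litres_per_100km(fuel_flow_l_per_h, speed_km_per_h):
    """Convert fuel flow (L/h) and speed (km/h) into consumption in L/100 km."""
    if speed_km_per_h <= 0:
        # Consumption per distance is undefined when the car is not moving.
        raise ValueError("speed must be positive")
    return 100.0 * fuel_flow_l_per_h / speed_km_per_h
```

The guard for zero speed matters in practice: a stationary car with the engine idling has no meaningful per-distance figure.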

How he didn't blow up his house (including me, I liked to "help"
him as a young child) I don't know.

It was probably too early to have market success, now every
car has something like that.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 18:19:13 2026

From Newsgroup: comp.arch

quadi <quadibloc@ca.invalid> schrieb:

On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

Given that Dosemu runs much faster in emulation than on the original
hardware, that is not such a big loss.

That is not the right comparison. If my old 16-bit software doesn't run at full *native* speed on my shiny new 64-bit computer, then I still have to run out and buy new software to get my work done faster.

At work, I actually use a 16-bit MS-DOS program, which was written
in the early 1990s. Originally, it ran for 15-20 minutes; now it
runs in far less than 10 seconds (I never timed it, it is fast).
A newer 64-bit version is available, but I actually don't use it
because the old one works well, and I'm simply too lazy to learn
the new quirks, when I am quite used to the old quirks. It may also
be faster by a factor of 5, I don't care.

By comparison: It takes ages for me to open an Excel file,
and whenever I change something in a certain complex Excel file,
like moving around a text box in a graph, it decides to recalculate,
taking maybe 30 seconds for me to see that the text box is not
where it should be. (And yes, I have switched off calculation, but
graphics doesn't care).

Although it is correct to say that Microsoft could have done something to fix this, even with the hardware issue as it is. They could have let
people buy just one copy of Windows, and install it twice on the same computer - so as to be able to boot into either the 32-bit version or the 64-bit version.

Haha.

Forcing people to ditch their own computers because they are not
Windows 11 compatible is the Microsoft way.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri May 15 20:07:35 2026

From Newsgroup: comp.arch

Thomas Koenig <tkoenig@netcologne.de> posted:

quadi <quadibloc@ca.invalid> schrieb:

On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

Given that Dosemu runs much faster in emulation than on the original
hardware, that is not such a big loss.

That is not the right comparison. If my old 16-bit software doesn't run at full *native* speed on my shiny new 64-bit computer, then I still have to run out and buy new software to get my work done faster.

At work, I actually use a 16-bit MS-DOS program, which was written
in the early 1990s. Originally, it ran for 15-20 minutes; now it
runs in far less than 10 seconds (I never timed it, it is fast).
A newer 64-bit version is available, but I actually don't use it
because the old one works well, and I'm simply too lazy to learn
the new quirks, when I am quite used to the old quirks. It may also
be faster by a factor of 5, I don't care.

I have an automobile engine simulator* written in eXcel. When run on
my 33 MHz 486, I had time to get up from my chair, walk to the kitchen,
open the fridge, grab a beer, walk back, and the calculations were just finishing. When I got a 200 MHz Pentium Pro the same calculations took
a blink of an eye.

(*) 15 spreadsheets, more than 10,000 equations, somewhere around
10% of them using SQRT(), SIN(), COS(), EXP(), and LOG().

By comparison: It takes ages for me to open an Excel file,
and whenever I change something in a certain complex Excel file,
like moving around a text box in a graph, it decides to recalculate,

You can (CAN) turn off automatic recalculation...

taking maybe 30 seconds for me to see that the text box is not
where it should be. (And yes, I have switched off calculation, but
graphics doesn't care).

Although it is correct to say that Microsoft could have done something to fix this, even with the hardware issue as it is. They could have let people buy just one copy of Windows, and install it twice on the same computer - so as to be able to boot into either the 32-bit version or the 64-bit version.

Haha.

Forcing people to ditch their own computers because they are not
Windows 11 compatible is the Microsoft way.

With Intel and AMD support.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 22:19:24 2026

From Newsgroup: comp.arch

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Thomas Koenig <tkoenig@netcologne.de> posted:

quadi <quadibloc@ca.invalid> schrieb:

On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

Given that Dosemu runs much faster in emulation than on the original
hardware, that is not such a big loss.

That is not the right comparison. If my old 16-bit software doesn't run at
full *native* speed on my shiny new 64-bit computer, then I still have to
run out and buy new software to get my work done faster.

At work, I actually use a 16-bit MS-DOS program, which was written
in the early 1990s. Originally, it ran for 15-20 minutes; now it
runs in far less than 10 seconds (I never timed it, it is fast).
A newer 64-bit version is available, but I actually don't use it
because the old one works well, and I'm simply too lazy to learn
the new quirks, when I am quite used to the old quirks. It may also
be faster by a factor of 5, I don't care.

I have an automobile engine simulator* written in eXcel. When run on
my 33 MHz 486, I had time to get up from my chair, walk to the kitchen,
open the fridge, grab a beer, walk back, and the calculations were just finishing. When I got a 200 MHz Pentium Pro the same calculations took
a blink of an eye.

(*) 15 spreadsheets, more than 10,000 equations, somewhere around
10% of them using SQRT(), SIN(), COS(), EXP(), and LOG().

By comparison: It takes ages for me to open an Excel file,
and whenever I change something in a certain complex Excel file,
like moving around a text box in a graph, it decides to recalculate,

You can (CAN) turn off automatic recalculation...

And I did. It turns off the calculations in the sheet, but
apparently it still wants to recalculate things when I shift
something in a graph... and that I cannot turn off.

But of course, if I use spill formulas with things like sorting,
and then display sorted data... obviously my fault. Woe betide
anyone who has more than, let's say, 20000 or 30000 data points in
a column. That is obviously too much for a laptop with 16 GB main
memory and eight cores running Microsoft software and Windows 11.

Now that I have moved the work on that sheet to my 512 GB workstation
with 48 Xeon cores, things have gotten tolerable (but just barely).

Using Python in Excel sends data to the Microsoft cloud, and does
not work without Internet access. Yuck.

I could try to use VBA, but who on Earth wants to? Plus Excel macros
are notoriously unsafe, and it is a good idea not to use them, and
not to encourage people to switch them on by default.

External programs - sure, I could write a Fortran or ... program to
do this, but then I could no longer distribute it to colleagues
and expect it to work.

taking maybe 30 seconds for me to see that the text box is not
where it should be. (And yes, I have switched off calculation, but
graphics doesn't care).

Although it is correct to say that Microsoft could have done something to
fix this, even with the hardware issue as it is. They could have let
people buy just one copy of Windows, and install it twice on the same
computer - so as to be able to boot into either the 32-bit version or the
64-bit version.

Haha.

Forcing people to ditch their own computers because they are not
Windows 11 compatible is the Microsoft way.

With Intel and AMD support.

Or the other way - Microsoft wanted to give their partners
a shot in the arm. Wintel lives... (only very few of
these perfectly working machines will have somebody installing
Linux on them, I think).
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.arch on Sat May 16 11:20:21 2026

From Newsgroup: comp.arch

On 15/05/2026 18:02, quadi wrote:

On Wed, 13 May 2026 18:07:41 +0000, Anton Ertl wrote:

When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily
available for it, and all the 32-bit and 16-bit stuff I used ran nicely.

Yes. You can run 32-bit Windows 7 on a 64-bit CPU in 32-bit mode, and
16-bit programs will run just as well.

But apparently without rebooting to get into 32-bit mode, there really is
a hardware reason why 16-bit software can't easily be made to work from
64-bit mode. Whatever the hardware reason is, in my opinion that's a mistake
on the part of the hardware designers.

John Savard

I don't know how easy or not it is to implement the support, but I don't remember problems running 16-bit Windows programs on 64-bit XP on an
Athlon. And I've had occasion to run 16-bit Windows programs with Wine
on 64-bit Linux. Maybe this all requires some special effort under the
hood, but it seems to be a solved problem - lack of 16-bit support in
modern 64-bit Windows appears to be a non-technical decision.

--- Synchronet 3.22a-Linux NewsLink 1.2

From quadi@quadibloc@ca.invalid to comp.arch on Mon May 18 05:24:26 2026

From Newsgroup: comp.arch

On Mon, 04 May 2026 18:06:18 +0000, MitchAlsup wrote:

Headers are nothing more than mode-bits that change every block of code.
This means that ISA will be exceptionally difficult to verify, and as
you have found: very difficult to encode.

Your second sentence may well be true.

Your first sentence is definitely true, but it's a feature, not a bug. I intentionally exploited my block headers to achieve what would otherwise require mode bits - allowing instructions to be shorter, because one could switch between alternate sets of instructions - without the great danger
of mode bits for security: someone branching into code written to execute
in one mode while the machine's state specifies a different mode, thus
making code perform unintended actions.
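
A toy Python model may make the distinction concrete. Everything in it is hypothetical: the two instruction sets "A" and "B", the opcode meanings, and the assumption that control only ever enters a block at its header. It is not the Concertina II encoding, just a sketch of why a per-block header cannot disagree with the code it governs, while a global mode bit can.

```python
# Hypothetical toy ISA: the same opcode number means different things
# in two alternate instruction sets.
SETS = {
    "A": {0: "add", 1: "load"},
    "B": {0: "store", 1: "jump"},
}


def decode_with_mode_bit(code, mode):
    """Global mode: the meaning of the code depends on machine state at entry."""
    return [SETS[mode][op] for op in code]


def decode_with_block_header(blocks, entry_block):
    """Block header: each block names its own instruction set."""
    header, ops = blocks[entry_block]
    return [SETS[header][op] for op in ops]
```

With the mode bit, the opcode sequence [0, 1] decodes as add/load or as store/jump depending on state left behind by whoever ran before. With the header, the block ("A", [0, 1]) always decodes as add/load, whatever ran before it.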

John Savard
--- Synchronet 3.22a-Linux NewsLink 1.2
