• Re: Concertina II Instead

    From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 13:46:33 2026
    From Newsgroup: comp.arch

    On 4/16/2026 11:15 AM, BGB wrote:
    On 4/15/2026 5:44 PM, Bill Findlay wrote:
    On 15 Apr 2026, David Brown wrote
    (in article <10roqep$16j1j$1@dont-email.me>):

    On 15/04/2026 17:36, quadi wrote:
    On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    One should also note: in the history of this system (~late 1930s)
    to present: only 2 properly registered FA guns have been used in any
    crimes. {Anyone with a brain would say this is a pretty good record}
    Anyone with a non-USAn brain would say this is utterly insane.

    Utterly insane would be if the same procedure applied to thermonuclear
    warheads.

    Of course, much about the consequences of the Second Amendment indeed does
    appear insane. The sensible thing to do would be to repeal it, rather than
    pretend it doesn't exist, or it doesn't mean what it says, and hope the
    Supreme Court will look the other way.

    [wise words omitted]

    That's just my two cents - coming from someone in a country ...
    where we have far more real-world freedoms than the USA.
    That is the bit they really can't fathom.


    ?...

    But, AFAIK, the UK is the place that went and banned:
      Sharp points on knives;
      Sharp points on scissors;

    Better put corks on the forks! This scene from Dirty Rotten Scoundrels
    always cracked me up:

    (Dirty Rotten Scoundrels (1988) - Dinner With Ruprecht Scene (6/12) | Movieclips)

    https://youtu.be/SKDX-qJaJ08

    corks on the forks to prevent him from hurting himself and/or others... ;^D

      Buying solder without having certifications;
        So, it is effectively sold black-market in small amounts,
          to the electronics hobbyists.


    [...]
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Apr 16 20:49:07 2026
    From Newsgroup: comp.arch


    scott@slp53.sl.home (Scott Lurndal) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:



    You can tell if a person with a gun knows about gun safety: as he first
    touches the gun, he checks to see if it is unloaded and, if not, unloads
    it prior to letting anyone else touch it.

    While true, if someone tried to hand me a bolt-action rifle with
    the bolt in place, I'd refuse. Likewise any magazine fed
    weapon should not have a magazine installed and the action should
    be open, if possible, before it is handed to anyone.

    Look up "Walker Trigger". If you carry a rifle with a cartridge in the
    chamber, bolt locked, safety on, there are situations where, when the
    safety is turned off, the rifle will spontaneously (and negligently)
    fire all by itself. Carry the rifle with no cartridge in the chamber and
    the bolt unlocked for maximum safety.

    Which brings up gun safety rule 2: never point a gun at something
    you are not willing to destroy.
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 14:59:02 2026
    From Newsgroup: comp.arch

    On 4/15/2026 9:25 AM, Dan Cross wrote:
    In article <SlNDR.276690$4wI6.88606@fx24.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 15/04/2026 13:30, Dan Cross wrote:
    In article <n48bl0Fdbm4U1@mid.individual.net>,
    moi <findlaybill@blueyonder.co.uk> wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    <snip>

    Broadly speaking, they're a pain, and pretty much only useful as
    either a suppression weapon or for guarding avenues of approach
    to fixed defensive positions.

    They're also extremely expensive to use and maintain: at $2.00 or
    more _per round_, the cost rises rapidly.

    Full-auto weapons are also heavy, and so is the ammo. Carrying
    even a SAW is no fun after a few hours; a 240 or Ma Deuce? Not
    happening. A Mk19? Forget about it.

    A fully auto shotgun, say an AA-12 with the barrel mag?




    Mythbusters shot a minigun on a couple episodes and the ammo cost
    (more than a decade ago) was huge.

    Indeed.



    I suppose that the proponents will point to smaller weapons
    systems, like submachine guns, that are full-auto but shoot
    conventional cartridges (the Thompson shoots .45ACP, for example)
    but they often don't realize that the kickback makes them
    difficult to control. Even the 3-round burst on the M4/M16
    pattern weapons will kick you off target almost instantly; that
    mode is only useful as a direct fire alternative for suppression
    to support advancing infantry in a complex attack scenario when
    indirect fire is not viable, you don't have combined-arms
    support, or you don't have crew-served weapons. It's like a
    method of last resort.

    I don't know why anyone would want a full-auto weapon as
    anything other than a museum piece or a novelty. I strongly
    suspect that such people have never heard a shot fired in anger
    before.

    - Dan C.


  • From Bill Findlay@findlaybill@blueyonder.co.uk to comp.arch on Thu Apr 16 23:00:37 2026
    From Newsgroup: comp.arch

    On 16 Apr 2026, BGB wrote
    (in article <10rr8vi$1sh0n$1@dont-email.me>):

    On 4/15/2026 5:44 PM, Bill Findlay wrote:
    On 15 Apr 2026, David Brown wrote
    (in article <10roqep$16j1j$1@dont-email.me>):

    On 15/04/2026 17:36, quadi wrote:
    On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    One should also note: in the history of this system (~late 1930s) to present: only 2 properly registered FA guns have been used in any
    crimes. {Anyone with a brain would say this is a pretty good record}

    Anyone with a non-USAn brain would say this is utterly insane.

    Utterly insane would be if the same procedure applied to thermonuclear warheads.

    Of course, much about the consequences of the Second Amendment indeed does
    appear insane. The sensible thing to do would be to repeal it, rather than
    pretend it doesn't exist, or it doesn't mean what it says, and hope the Supreme Court will look the other way.

    [wise words omitted]

    That's just my two cents - coming from someone in a country ...
    where we have far more real-world freedoms than the USA.
    That is the bit they really can't fathom.

    ?...

    But, AFAIK, the UK is the place that went and banned:
    Sharp points on knives;
    Sharp points on scissors;
    Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
    to the electronics hobbyists.
    ...
    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);
    Where corporations can lead search-and-seizure operations for claimed IP violations;

    It is clear from that claptrap that in fact you know very little.
    (MAGA shills like JD are not a trustworthy source of information.)
    --
    Bill Findlay

  • From BGB@cr88192@gmail.com to comp.arch on Thu Apr 16 18:53:47 2026
    From Newsgroup: comp.arch

    On 4/16/2026 5:00 PM, Bill Findlay wrote:
    On 16 Apr 2026, BGB wrote
    (in article <10rr8vi$1sh0n$1@dont-email.me>):

    On 4/15/2026 5:44 PM, Bill Findlay wrote:
    On 15 Apr 2026, David Brown wrote
    (in article <10roqep$16j1j$1@dont-email.me>):

    On 15/04/2026 17:36, quadi wrote:
    On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    One should also note: in the history of this system (~late 1930s)
    to present: only 2 properly registered FA guns have been used in any
    crimes. {Anyone with a brain would say this is a pretty good record}
    Anyone with a non-USAn brain would say this is utterly insane.

    Utterly insane would be if the same procedure applied to thermonuclear
    warheads.

    Of course, much about the consequences of the Second Amendment indeed does
    appear insane. The sensible thing to do would be to repeal it, rather than
    pretend it doesn't exist, or it doesn't mean what it says, and hope the
    Supreme Court will look the other way.

    [wise words omitted]

    That's just my two cents - coming from someone in a country ...
    where we have far more real-world freedoms than the USA.
    That is the bit they really can't fathom.

    ?...

    But, AFAIK, the UK is the place that went and banned:
    Sharp points on knives;
    Sharp points on scissors;
    Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
    to the electronics hobbyists.
    ...
    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);
    Where corporations can lead search-and-seizure operations for claimed IP
    violations;

    It is clear from that claptrap that in fact you know very little.
    (MAGA shills like JD are not a trustworthy source of information.)


    I am not really part of the MAGA crowd.
    I am not really into politics in general...

    But, this is still what people say online about the UK and CA and similar...


  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 00:12:01 2026
    From Newsgroup: comp.arch

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <NdaER.1511$r_k6.609@fx38.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    <snip>

    [*] recently transferred to the CH-53E fleet due to the imminent
    retirement of the F-16C fleet.

    The Marine Corps doesn't fly the F-16. :-) Perhaps you mean
    the F/A-18 or the Harrier?

    Brain fart. His squadron flies the F/A-18C and D models. The
    E's and F's will remain in the active fleet along with the F-35,
    but the final days of the C and D models are in sight.

    No problem; I can see wanting to switch over to helos from fixed
    wing. It's a different world.

    His eventual goal is to get his A&P. Figures helos will be good
    experience.


    I got to visit the flight line in 2024, very interesting.

    Cool.

    Prior visit to a Marine base was at 29 Palms in the 1980s, visiting
    a cousin. He was living on-base in married housing and told
    me to avoid the well-lit compound several miles east of the housing area, which
    was secured and managed by the NOP.

    Ah, the stumps. I remember the first time I got there, stepping
    off a bus (we'd just flown from NC, having completed post Parris
    Island training at Camp Lejeune) and immediately seeing tumble
    weed blowing down the main drag. "Oh my god; I have to stay
    here for a YEAR?!" (This was before I became an officer.)

    I wonder which part your cousin meant; perhaps Camp Wilson,
    which is an active training area (and pretty much nothing else,
    though there is a very small PX there selling pogey bait).


    I just looked on google maps and they've pretty much censored
    the entire base in both the map and satellite views.

  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 00:13:25 2026
    From Newsgroup: comp.arch

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 4/16/2026 11:52 AM, Scott Lurndal wrote:

    Brain fart. His squadron flies the F/A-18C and D models. The
    E's and F's will remain in the active fleet along with the F-35,

    Can he fly the F-35?

    Only if he gets a ride in the back seat. Which, since the F-35
    is a single seater, is not gonna happen.
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 02:06:04 2026
    From Newsgroup: comp.arch

    On Thu, 16 Apr 2026 13:15:13 -0500, BGB wrote:

    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);

    Almost all countries other than the United States limit freedom of speech
    by excluding the incitement of hatred towards minority groups.
    This is perhaps a natural result of Europe having had World War II fought
    on its own soil, and so they consider it a matter of survival to prevent
    the rise of another movement similar to Nazism.

    Given current political trends in Europe, I have to say it's a pity they didn't think that one way to prevent the rise of bigoted extremist
    movements would have been not to have such a liberal immigration policy
    that the demographic consequences would end up being an annoyance to a lot
    of ordinary people not previously inclined to bigotry.

    John Savard
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 19:08:41 2026
    From Newsgroup: comp.arch

    On 4/16/2026 5:13 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 4/16/2026 11:52 AM, Scott Lurndal wrote:

    Brain fart. His squadron flies the F/A-18C and D models. The
    E's and F's will remain in the active fleet along with the F-35,

    Can he fly the F-35?

    Only if he gets a ride in the back seat. Which, since the F-35
    is a single seater, is not gonna happen.

    Oh damn! Shit. He can fly the F-16?
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Apr 16 19:09:48 2026
    From Newsgroup: comp.arch

    On 4/16/2026 5:13 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 4/16/2026 11:52 AM, Scott Lurndal wrote:

    Brain fart. His squadron flies the F/A-18C and D models. The
    E's and F's will remain in the active fleet along with the F-35,

    Can he fly the F-35?

    Only if he gets a ride in the back seat. Which, since the F-35
    is a single seater, is not gonna happen.

    Neat! He is around that tech!
  • From BGB@cr88192@gmail.com to comp.arch on Thu Apr 16 22:09:21 2026
    From Newsgroup: comp.arch

    On 4/16/2026 9:06 PM, quadi wrote:
    On Thu, 16 Apr 2026 13:15:13 -0500, BGB wrote:

    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);

    Almost all countries other than the United States limit freedom of speech
    by excluding the incitement of hatred towards minority groups.
    This is perhaps a natural result of Europe having had World War II fought
    on its own soil, and so they consider it a matter of survival to prevent
    the rise of another movement similar to Nazism.


    OK. This at least seems like a sensible place where limits could be
    imposed, like if a person is advocating for violence against a group of people, or promoting criminal activity (of the sort where actual
    peoples' health or well-being is concerned).

    Though, the line here gets fuzzy.


    I think the claim though was that people were getting arrested for
    things being said that had "offended the political elites" or
    disagreeing with the official party line on various policies or something.

    But, yeah, if it is people promoting doing stuff like what happened in
    WWII, this is more understandable.


    But, yeah, I guess in the US, people going around and doing the whole "neo-Nazi" and "white supremacist" thing has become a bit of an issue...

    This sort of thing can get worrying sometimes. In recent years it does
    seem as if the racists have been winning here.



    Sorta reminds me of some years ago, people were making a lot of fuss
    about BO, like saying he wasn't really an American and was actually
    allying with the Islamists and stuff...

    But, like, there were no real scandals going on, and it was mostly
    uneventful. Then, with DJT, it is like it all turns into a raging
    crap-storm (with an endless stream of infighting, scandals, etc) and
    everyone else is like "Basically cool I guess, keep up the good work".


    To admit something, though possibly an unpopular/controversial position,
    kinda hoped KH would have won. Had I known how big of a crap-storm it
    was all going to be, might have taken a stronger stance on the issue (vs
    just going along passively...). Alas... Though, possibly it would have
    still sucked either way.


    But, yeah, politics is kinda confusing and sucks in this way.



    Given current political trends in Europe, I have to say it's a pity they didn't think that one way to prevent the rise of bigoted extremist
    movements would have been not to have such a liberal immigration policy
    that the demographic consequences would end up being an annoyance to a lot
    of ordinary people not previously inclined to bigotry.


    Possibly true.


    From what I had heard, usual issue is mostly that people from the
    middle east come in and start bombing stuff and trying to push for
    Sharia law everywhere.

    I think also there was a thing where Germany realized this was not cool
    and basically put a ban on allowing anyone to try to impose Sharia law.


    But, say, there is a limit here, like one person trying to impose their religious rules on others isn't cool (nor is them trying to forcibly
    convert others, etc).


    Or, at least within the US context, this makes me think that the whole
    thing of trying to enforce pro-life policies, or putting people through
    "gay conversion therapy" and such, may be doing more harm than good.

    Like, in terms of trying to control peoples' moral behavior, it is
    making the situation worse for them than had they been left to make
    their own choices in these area; even if one still views these things as
    moral faults, and would similarly assume keeping the personal freedoms
    to consider them as moral faults. Like, the line being not so much what
    a person thinks or does for themselves, but where it crosses to imposing
    on others.

    ...


  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 04:49:41 2026
    From Newsgroup: comp.arch

    On Wed, 15 Apr 2026 15:12:19 +0000, quadi wrote:

    Now, I took out one excessively elaborate header format, and restored a feature to the architecture that I took out when pruning header types,
    but this time in a different form, associated with a different header
    type.

    And now I rearranged the headers a bit, to defragment their opcode space.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 05:21:57 2026
    From Newsgroup: comp.arch

    On Fri, 17 Apr 2026 04:49:41 +0000, quadi wrote:

    And now I rearranged the headers a bit, to defragment their opcode
    space.

    This led me to feel I had the perfect spot in which to re-introduce to
    this iteration of the Concertina II architecture that most bizarre feature
    of the architecture which was cited as one of its defining features... I
    think of it as an exotic and strictly optional feature, existing mainly to enhance emulation, while the ability to go from RISC to CISC to VLIW is
    what defines Concertina II.

    John Savard
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 17 05:37:25 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Thomas Koenig <tkoenig@netcologne.de> posted:


    I first used the G3 at the Bundeswehr. 600 rounds per minute, 20
    rounds per magazine. Firing that weapon at full auto will empty
    it in a tad less than two seconds (note fencepost error here :-)

    Firing short bursts or single shots will keep any enemy down for
    a far longer time than that.

    But a squad at the time had a MG3 (1200 rounds per minute) with it.
    But firing that continuously is also not a good idea because

    a) ammunition: 24 grams per round means that you run through 0.48
    kg (a bit more than a pound :-) per second of ammunition, which
    somebody has to carry

    b) you have to exchange barrels and locks after a certain number
    of rounds to prevent overheating.

    Where the military definition of overheating is: "The bullets no longer
    go anywhere close to where the barrel is pointing".

    In this case, not directly. The problem is wear on the barrel
    and the lock. If you fire too many rounds in too short a time,
    the rifling will wear down.

    Hence, replacement barrels (and the asbestos rags for exchanging
    them).

    5 rounds at 20 second intervals is enough to heat a sniper barrel
    to the point it is not "accurate enough". Now, imagine those 5
    rounds in 0.5 seconds ...

    Snipers and machine gunners have different tactical tasks :-)
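
As a quick check of the arithmetic quoted above, here is a minimal Python sketch. It assumes the cyclic rates are constant and that the first round leaves at t = 0 (the fencepost correction mentioned in the post). The function names are illustrative and not from the thread; only the input figures (600 and 1200 rounds per minute, 20 rounds per magazine, 24 grams per round) come from the posts.

```python
# Back-of-the-envelope check of the figures quoted in the thread.
# Function names are hypothetical; only the input numbers come from the posts.

def time_to_empty(rounds_per_minute: float, magazine_size: int) -> float:
    """Seconds between the first and the last shot of a full magazine.

    The first round leaves at t = 0, so only (magazine_size - 1)
    intervals elapse: this is the fencepost correction.
    """
    interval = 60.0 / rounds_per_minute  # seconds between consecutive shots
    return (magazine_size - 1) * interval


def ammo_mass_per_second(rounds_per_minute: float, grams_per_round: float) -> float:
    """Kilograms of ammunition consumed per second of continuous fire."""
    rounds_per_second = rounds_per_minute / 60.0
    return rounds_per_second * grams_per_round / 1000.0


g3_seconds = time_to_empty(600, 20)            # 600 rpm, 20-round magazine
mg3_kg_per_s = ammo_mass_per_second(1200, 24)  # 1200 rpm, 24 g per round

print(f"Magazine empty after about {g3_seconds:.1f} s")
print(f"Ammunition consumed: about {mg3_kg_per_s:.2f} kg/s")
```

Under these assumptions the magazine lasts 19 intervals of 0.1 s, which matches "a tad less than two seconds", and 20 rounds per second at 24 g each matches the quoted 0.48 kg per second.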
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 08:05:55 2026
    From Newsgroup: comp.arch

    On 16/04/2026 20:15, BGB wrote:
    On 4/15/2026 5:44 PM, Bill Findlay wrote:
    On 15 Apr 2026, David Brown wrote
    (in article <10roqep$16j1j$1@dont-email.me>):

    On 15/04/2026 17:36, quadi wrote:
    On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    One should also note: in the history of this system (~late 1930s)
    to present: only 2 properly registered FA guns have been used in any
    crimes. {Anyone with a brain would say this is a pretty good record}
    Anyone with a non-USAn brain would say this is utterly insane.

    Utterly insane would be if the same procedure applied to thermonuclear
    warheads.

    Of course, much about the consequences of the Second Amendment indeed does
    appear insane. The sensible thing to do would be to repeal it, rather than
    pretend it doesn't exist, or it doesn't mean what it says, and hope the
    Supreme Court will look the other way.

    [wise words omitted]

    That's just my two cents - coming from someone in a country ...
    where we have far more real-world freedoms than the USA.
    That is the bit they really can't fathom.


    ?...


    Note - I come from the UK originally, but live in Norway.

    But, AFAIK, the UK is the place that went and banned:
      Sharp points on knives;
      Sharp points on scissors;
      Buying solder without having certifications;
        So, it is effectively sold black-market in small amounts,
          to the electronics hobbyists.
      ...

    Complete nonsense.

    But both the UK and Norway have restrictions on people carrying around
    deadly weapons of all sorts. The freedom not to be stabbed, shot, or otherwise injured or killed trumps the freedom to carry such weapons.

    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);

    Only thoughtless people are calling it that. You've been watching too
    much Fox News - a channel that describes itself as "entertainment"
    without any obligation to tell the truth.

    The freedom of innocent people not to suffer abuse, hatred and prejudice trumps the freedom of nasty little sods who think they have the right to
    abuse others. And inciting hatred or encouraging others to commit
    criminal behaviour is just as much a crime in the USA as the UK.

    Where corporations can lead search-and-seizure operations for claimed IP violations;

    No, they can't.

    And in European countries, unlike the USA, corporations don't get to lie
    and cheat then claim "freedom of speech".

    We are free to live safely. We have the freedom to send our kids to
    school without worrying if they will survive the day. We have the
    freedom of knowing that we won't lose our jobs just because the boss is
    having a bad hair day. And losing a job does not mean losing our
    health. And we have freedom to vote, knowing that votes count equally.
    (Well, the UK parliament elections still have a way to go here, but the Scottish and Norwegian elections have fair votes.)

    No country is perfect by any means, but Europeans live far freer lives.
    We might not have the freedom to own guns so our kids can accidentally
    kill each other, but overall we win out.

    Remember, freedoms are always a balance, not an absolute. Lots of types
    of freedom for one person reduce other freedoms for other people.

    ...


    So, sorta like California but worse...

    California apparently banned the 60/40 lead/tin stuff IIRC, but still
    allows people to freely possess lead-free solder (so people apparently
    need to smuggle the 60/40 into CA if they want to use it). Everywhere
    else, 60/40 is OK. Well, and CA has the "age verification" controversy,
    etc.

    I can't answer for California, but if I want leaded solder I can just
    order some. But the regulations against the use of lead in general are
    a good thing - the freedom to drink water without lead trumps the
    freedom of a handful of people to use cheaper and lower temperature
    soldering irons.


    Could be wrong, this is from memory and stuff I heard on the internet.


    You should be a lot more careful about what you watch on the internet.


  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 08:45:18 2026
    From Newsgroup: comp.arch

    On 16/04/2026 18:31, Dan Cross wrote:
    In article <10rqrkf$1nbrp$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 16/04/2026 13:59, Dan Cross wrote:
    In article <10rqag7$1in0h$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    [snip]
    (The service members have all their other equipment at
    home too - they store their own uniforms and other stuff, and are
    responsible for washing and repairs or ordering replacements after the
    exercises.)

    The same is true in the US military. I always found it annoying
    that I had to make space for my issued equipment at home, but
    c'est la vie.

    Personally, I have no military experience at all (not counting school
    cadets). But one of my sons joined the National Guard after his
    military service. He has a neat solution to the problem of space for his
    equipment - he keeps it in his old bedroom in our house, not in his own
    flat :-)

    Seems like your son should take his obligations a bit more
    seriously: keeping his equipment at your house, when it's
    supposed to be in his dwelling, sounds like a rules violation.


    I don't know the details of the regulations, but I can assure you that
    it is entirely within them. As a student, his flat is considered
    temporary accommodation and our house is his permanent home. His
    National Guard base is in our area, not where he is a student.

    Maybe the regulations in the USA are different. Maybe there are
    different standards about how quickly you can be called up and need to
    deploy.

    What's good for the goose is good for the gander.

    It also means that there are fewer guns stored in
    concentrated places as potential targets for robberies. And in the
    event of a real invasion, it's a lot easier to smuggle around and
    distribute firing pins to service members than to pass around guns from
    a central armoury.

    There are tradeoffs, however: it is also easier for a bad actor
    to source a weapon by breaking into a private home.

    It is plausible, but AFAIK it is extremely rare here. The solid
    majority of criminals here don't want guns, and would probably not take
    one if they found one in a house they were robbing. As a house burglar,
    you can't easily sell a gun, you don't need one for defence, you don't
    need one to threaten anyone - it just increases your chance of being
    shot yourself, and increases your punishment if you get caught. There are
    guns in the more serious narcotics gangs, but those are handguns - they
    have no use for military weapons.

    FWIW, handguns are military weapons.

    Sorry - I meant pistols, rather than rifle-sized weapons. (Of course
    pistols are also used in the military.)


    My T/E weapons in Afghanistan were a 9mm handgun and an M4
    carbine. I was embedded with an Afghan Army Unit, and was armed
    pretty much at all time. This was at a time when we knew there
    were Taliban infiltrators in the army. Indeed, I mentioned
    working with UK forces: such an infiltrator threw a hand grenade
    into a tent full of British soldiers, who were sleeping at the
    time (this was in the middle of the night) and sprayed it with
    fire from an AK-47, killing three and wounding several more.
    That is to say, we were at a relatively low, but constant and
    higher than baseline level of risk that most ISAF troops were
    not subject to. Despite that, and despite being directly and
    indirectly threatened myself on multiple occasions, I only
    carried my pistol unless I was outside the wire.

    We also don't really go in for shoot-to-kill in Norway.

    Note that I'm talking about the use of deadly force to prevent a
    bad actor from forcibly taking military-grade arms. That is
    exactly the sort of thing that deadly force arguably _should_ be
    authorized for.

    There is a distinction (which I did not make, but should have) between
    authorising deadly force, and encouraging it. As a last resort, it
    makes sense in situations like this. But it should be very much the
    last resort. Real-life criminals are not like in the movies - if
    military guards point guns at them, they will put their hands in the air
    and no one needs to be shot. (It's a different matter for criminals
    high on drugs and unable to think rationally, but they don't try to raid
    military armouries.)

    Of course, there is a continuum of force, and despite recent
    idiots in charge of the US military asserting otherwise, the
    rules of engagement and the laws of warfare are taken _very_
    seriously.

    It's nice to know - especially with the current muppets at the top of
    the USA chain.


    But the military are not the police. Full stop. If someone is
    trying to attack a military armory with the intention of seizing
    arms, they're not likely to be some petty criminal, but if they
    obviously are, they will likely be subdued uninjured. The point
    of mentioning that deadly force is authorized when protecting
    those kinds of assets is to point out the relative value of the
    assets themselves. To wit: weapons are dangerous, and in the
    wrong hands, even more dangerous. They can, and should, be
    protected.

    OK.


    Btw: as a veteran, one of the things that _really_ bothers me in
    the US is when the police, in particular, refer to members of
    the general public as "civilians". Words have meaning;
    "civilian" refers to someone who is not a member of a military.
    The police are definitionally civilians.


    Agreed.

    If shots need to be fired, the primary aim should
    be to persuade the bad guys to surrender, not to kill them.

    Is it better if the enemy surrenders? Sure. But this idea that
    you are going to shoot to wound in a combat scenario is not
    realistic. As you said, it's not like in the movies.


    I think that there has been a bit of a disparity in the situations we
    have been imagining. I agree entirely that shooting to wound in a
    combat situation is not realistic. It just seemed to me that you were
    suggesting armoury guards were moving to combat mode a lot more quickly
    than I thought appropriate.

    But more broadly, from a military perspective, this doesn't make
    a lot of sense to me, because it ignores the human factors at
    play in the fog of war.

    Due to the sympathetic physiological reaction to stress, one
    tends to lose one's fine motor skills in a combat-type situation
    and it can be difficult to remember even the most basic bodily
    functions: loss of urinary and sphincter control are common,
    for example (hence the expression, "scared shitless"). Moreover
    long experience in human history shows that it is impossible to
    know a priori how one will react: some people are ridiculously
    calm in combat, others are not.

    While I have (as I said) no military experience, I have a fair bit of
    martial art experience - and what you describe is entirely correct.

    Martial arts are art forms, like dance, not combat training.
    I really wish people who study them would internalize that. You
    get one good punch on the street; you are not experts on actual
    warfare, lethal or otherwise. You should take care not to
    extrapolate your study of an art form to things you have no
    direct experience of.

    I am entirely aware of that. I know what martial arts are, and are not.
    I know the difference between martial arts that are useful in real
    fights, and those that are not - and where the key training differences are.

    And I am fully aware that the stuff I have done is not combat training,
    and I have no intention of getting into a real fight. I am confident
    that I could handle myself in a real fight better than might be expected
    by my appearance, as a result of my martial art training - but since I
    am short, grey-haired and rather round, that's a very low bar. It takes
    a great deal of appropriate training to overcome differences of size,
    strength and age - training that I do not have.

    But I do know how that all works, and I do understand the difference
    between sport fights, sparring, real random fights, and serious combat.


    (I'm sorry to have to snip the rest of this. I have read your entire
    post with interest, and found it enlightening, but it is simply taking
    too much time at the moment. I might get a chance to comment more
    later. It is also seriously off-topic! Thank you for the informative
    posts.)


    Or you can listen to someone who's actually done it and who is
    telling you that the intent (which yes, is explained to the
    troops) is to demonstrate that we take all of this _very_
    seriously, that human fallibility means that people can and do
    make mistakes, and that we build processes and procedures to try
    and mitigate or avoid those mistakes.

    Long experience has shown that mutual inspections are better
    than relying on self-affirmation.

    Just to be clear here - mutual inspections are fine and often a good
    thing. It is the idea of standing in line while a commander of some
    sort pats you down that is not. Maybe I just misunderstood what you
    were saying.


  • From BGB@cr88192@gmail.com to comp.arch on Fri Apr 17 02:20:43 2026
    From Newsgroup: comp.arch

    On 4/17/2026 1:05 AM, David Brown wrote:
    On 16/04/2026 20:15, BGB wrote:
    On 4/15/2026 5:44 PM, Bill Findlay wrote:
    On 15 Apr 2026, David Brown wrote
    (in article <10roqep$16j1j$1@dont-email.me>):

    On 15/04/2026 17:36, quadi wrote:
    On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    One should also note: in the history of this system (~late 1930s)
    to present: only 2 properly registered FA guns have been used in any
    crimes. {Anyone with a brain would say this is a pretty good record}
    Anyone with a non-USAn brain would say this is utterly insane.

    Utterly insane would be if the same procedure applied to thermonuclear
    warheads.

    Of course, much about the consequences of the Second Amendment
    indeed does
    appear insane. The sensible thing to do would be to repeal it,
    rather than
    pretend it doesn't exist, or it doesn't mean what it says, and hope the
    Supreme Court will look the other way.

    [wise words omitted]

    That's just my two cents - coming from someone in a country ...
    where we have far more real-world freedoms than the USA.
    That is the bit they really can't fathom.


    ?...


    Note - I come from the UK originally, but live in Norway.

    But, AFAIK, the UK is the place that went and banned:
      Sharp points on knives;
      Sharp points on scissors;
      Buying solder without having certifications;
        So, it is effectively sold black-market in small amounts,
          to the electronics hobbyists.
      ...

    Complete nonsense.

    But both the UK and Norway have restrictions on people carrying around
    deadly weapons of all sorts. The freedom not to be stabbed, shot, or
    otherwise injured or killed trumps the freedom to carry such weapons.


    Fair enough.

    Later went and asked Gemini about it, it said that the laws restrict
    carrying things with sharp tips (like knives and similar) rather than possession of them (say, at a person's house).

    Apparently, the idea that it was a ban on all pointy things was an over-generalization that floats around on the internet.

    ...


    And, where a person can be arrested, for stuff they say on social
    media (or "thought crime" as some are calling it);

    Only thoughtless people are calling it that. You've been watching too
    much Fox News - a channel that describes itself as "entertainment"
    without any obligation to tell the truth.


    Actually, mostly, it has been a mix of YouTube videos/shorts and
    Twitter/X threads...


    The freedom of innocent people not to suffer abuse, hatred and prejudice
    trumps the freedom of nasty little sods who think they have the right to
    abuse others. And inciting hatred or encouraging others to commit
    criminal behaviour is just as much a crime in the USA as the UK.


    OK.

    Some people were making it sound like they were opposing peoples'
    abilities to have and express opinions in general (though they didn't
    usually specify on what sorts of topics).


    Originally, seemed like, it could have been something like, say:
    Person says something bad about a political leader or similar;
    Political leader sees it and feels insulted and has person arrested.
    Or, something of this sort, ...



    But, yeah, in the US people usually say the First Amendment guarantees peoples' freedom to have and express opinions about whatever.

    Though, OTOH, I guess things like social media platforms still have the ability to ban people from the platform if they go around spreading hate-speech or similar.

    Apparently I guess that was a problem in the past with DJT, then he got
    banned off of Twitter, then he started his own social network, then Elon bought Twitter and renamed it X, and DJT got back on there, ...


    Where corporations can lead search-and-seizure operations for claimed
    IP violations;

    No, they can't.

    And in European countries, unlike the USA, corporations don't get to lie
    and cheat then claim "freedom of speech".



    Saw a thing not too long ago talking about how apparently Sega left some Nintendo devkits in an old office building, then abandoned the building,
    and later the building owners sold off all the old junk that was left in
    the buildings.

    Story goes that the guy who bought up some of the old junk posted about
    it on the internet, and then Sega + Nintendo + UK Police went in and
    arrested the guy and took all his stuff (then they released the guy, but
    he didn't get the stuff back), because the idea was that him having the devkits was considered as theft of intellectual property.


    Eg (finding a few videos talking about one of the incidents): https://www.youtube.com/watch?v=Sy9Eb8J0xGk https://www.youtube.com/watch?v=NU040CTdJI0

    There was another video I saw in the past talking about it originally,
    but I didn't see them now (haven't watched through the videos I found
    now to compare details).

    ...



    Well, contrast I guess if people post leaked closed-source code on
    GitHub or similar, the companies that own the code may issue DMCA
    take-downs or similar, but will not generally raid the person's house.

    There was some guy though (with some balls) who was releasing
    ported/modded versions of some previously released SuperMario64 code.


    Not personally inclined to look at it or mess with it though.




    We are free to live safely. We have the freedom to send our kids to
    school without worrying if they will survive the day. We have the
    freedom of knowing that we won't lose our jobs just because the boss is
    having a bad hair day. And losing a job does not mean losing our
    health. And we have freedom to vote, knowing that votes count equally.
    (Well, the UK parliament elections still have a way to go here, but the
    Scottish and Norwegian elections have fair votes.)

    No country is perfect by any means, but Europeans live far freer lives.
    We might not have the freedom to own guns so our kids can accidentally
    kill each other, but overall we win out.

    Remember, freedoms are always a balance, not an absolute. Lots of types
    of freedom for one person reduce other freedoms for other people.


    OK.


    ...


    So, sorta like California but worse...

    California apparently banned the 60/40 lead/tin stuff IIRC, but still
    allows people to freely possess lead-free solder (so people apparently
    need to smuggle the 60/40 into CA if they want to use it). Everywhere
    else, 60/40 is OK. Well, and CA has the "age verification"
    controversy, etc.

    I can't answer for California, but if I want leaded solder I can just
    order some. But the regulations against the use of lead in general are
    a good thing - the freedom to drink water without lead trumps the
    freedom of a handful of people to use cheaper and lower temperature soldering irons.


    Where I am, solder is sold on Amazon or similar...

    Had seen videos where people made it seem like solder was some sort of black-market contraband.

    Looking around, apparently the restrictions were specifically on
    lead-based solder though, rather than restricting all solder.


    In the case of California, did see something not too long ago saying
    that they were trying to get something passed to ban personal ownership
    of 3D printers and CNC machines.

    Though, I didn't see anyone else talking about this, so it seemed
    unconfirmed.


    There is a lot more talking going on about the CA "OS age verification"
    bull, and now something saying that the US federal people are now
    looking into something similar (groan, this has a risk to potentially
    ruin open source and make everything suck...). Hopefully it goes the way
    of past proposals for bans on OSS and RISC-V and similar, ...

    So, not like US is exactly perfect either...




    Could be wrong, this is from memory and stuff I heard on the internet.


    You should be a lot more careful about what you watch on the internet.


    Possibly.

    Then again, I guess a lot of the news I had seen had also been fed
    through the lens of video game commentators and similar.


  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Apr 17 08:51:12 2026
    From Newsgroup: comp.arch

    David Brown <david.brown@hesbynett.no> writes:
    On 16/04/2026 20:15, BGB wrote:
    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);

    Only thoughtless people are calling it that. You've been watching too
    much Fox News

    Or he has been watching too much Youtube (or the like). Every time
    somebody tells me that he thinks that something is true that isn't, or
    makes a strange judgement, as in this case, and I ask them where they
    got that from, they tell me "Youtube".

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Michael S@already5chosen@yahoo.com to comp.arch on Fri Apr 17 15:05:41 2026
    From Newsgroup: comp.arch

    On Fri, 17 Apr 2026 08:51:12 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    David Brown <david.brown@hesbynett.no> writes:
    On 16/04/2026 20:15, BGB wrote:
    And, where a person can be arrested, for stuff they say on social
    media (or "thought crime" as some are calling it);

    Only thoughtless people are calling it that. You've been watching
    too much Fox News

    Or he has been watching too much Youtube (or the like). Every time
    somebody tells me that he thinks that something is true that isn't, or
    makes a strange judgement, as in this case, and I ask them where they
    got that from, they tell me "Youtube".

    - anton

    Pay attention that David Brown didn't say that things mentioned by BGB
    don't actually happen in the UK. He was merely disagreeing with Orwellian
    naming.

  • From Bill Findlay@findlaybill@blueyonder.co.uk to comp.arch on Fri Apr 17 13:09:54 2026
    From Newsgroup: comp.arch

    On 17 Apr 2026, BGB wrote
    (in article <10rrsqc$22m9c$1@dont-email.me>):

    On 4/16/2026 5:00 PM, Bill Findlay wrote:
    On 16 Apr 2026, BGB wrote
    (in article <10rr8vi$1sh0n$1@dont-email.me>):
    ...

    But, AFAIK, the UK is the place that went and banned:
    Sharp points on knives;
    Sharp points on scissors;
    Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
    to the electronics hobbyists.
    ...
    And, where a person can be arrested, for stuff they say on social media (or "thought crime" as some are calling it);
    Where corporations can lead search-and-seizure operations for claimed IP violations;

    It is clear from that claptrap that in fact you know very little.
    (MAGA shills like JD are not a trustworthy source of information.)

    I am not really part of the MAGA crowd.
    I am not really into politics in general...

    But, this is still what people say online about the UK and CA and similar...

    So, my evaluation of your words was spot on.
    --
    Bill Findlay

  • From Bill Findlay@findlaybill@blueyonder.co.uk to comp.arch on Fri Apr 17 13:11:59 2026
    From Newsgroup: comp.arch

    On 17 Apr 2026, David Brown wrote
    (in article <10rsik4$287kv$1@dont-email.me>):

    On 16/04/2026 20:15, BGB wrote:
    On 4/15/2026 5:44 PM, Bill Findlay wrote:
    On 15 Apr 2026, David Brown wrote
    (in article <10roqep$16j1j$1@dont-email.me>):

    On 15/04/2026 17:36, quadi wrote:
    On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    One should also note: in the history of this system (~late 1930s) to present: only 2 properly registered FA guns have been used in any
    crimes. {Anyone with a brain would say this is a pretty good record}

    Anyone with a non-USAn brain would say this is utterly insane.

    Utterly insane would be if the same procedure applied to thermonuclear
    warheads.

    Of course, much about the consequences of the Second Amendment
    indeed does
    appear insane. The sensible thing to do would be to repeal it,
    rather than
    pretend it doesn't exist, or it doesn't mean what it says, and hope the
    Supreme Court will look the other way.

    [wise words omitted]

    That's just my two cents - coming from someone in a country ...
    where we have far more real-world freedoms than the USA.
    That is the bit they really can't fathom.

    ?...

    Note - I come from the UK originally, but live in Norway.

    But, AFAIK, the UK is the place that went and banned:
    Sharp points on knives;
    Sharp points on scissors;
    Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
    to the electronics hobbyists.
    ...

    Complete nonsense.

    But both the UK and Norway have restrictions on people carrying around
    deadly weapons of all sorts. The freedom not to be stabbed, shot, or otherwise injured or killed trumps the freedom to carry such weapons.

    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);

    Only thoughtless people are calling it that. You've been watching too
    much Fox News - a channel that describes itself as "entertainment"
    without any obligation to tell the truth.

    The freedom of innocent people not to suffer abuse, hatred and prejudice trumps the freedom of nasty little sods who think they have the right to abuse others. And inciting hatred or encouraging others to commit
    criminal behaviour is just as much a crime in the USA as the UK.

    Where corporations can lead search-and-seizure operations for claimed IP violations;

    No, they can't.

    And in European countries, unlike the USA, corporations don't get to lie
    and cheat then claim "freedom of speech".

    We are free to live safely. We have the freedom to send our kids to
    school without worrying if they will survive the day. We have the
    freedom of knowing that we won't lose our jobs just because the boss is having a bad hair day. And losing a job does not mean losing our
    health. And we have freedom to vote, knowing that votes count equally.
    (Well, the UK parliament elections still have a way to go here, but the Scottish and Norwegian elections have fair votes.)

    No country is perfect by any means, but Europeans live far freer lives.
    We might not have the freedom to own guns so our kids can accidentally
    kill each other, but overall we win out.

    Remember, freedoms are always a balance, not an absolute. Lots of types
    of freedom for one person reduce other freedoms for other people.

    ...


    So, sorta like California but worse...

    California apparently banned the 60/40 lead/tin stuff IIRC, but still allows people to freely possess lead-free solder (so people apparently
    need to smuggle the 60/40 into CA if they want to use it). Everywhere
    else, 60/40 is OK. Well, and CA has the "age verification" controversy, etc.

    I can't answer for California, but if I want leaded solder I can just
    order some. But the regulations against the use of lead in general are
    a good thing - the freedom to drink water without lead trumps the
    freedom of a handful of people to use cheaper and lower temperature
    soldering irons.


    Could be wrong, this is from memory and stuff I heard on the internet.

    You should be a lot more careful about what you watch on the internet.

    Bravo! You put a lot more work into that than I could
    stomach in response to Fuck News talking points.
    Thank you.
    --
    Bill Findlay

  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 14:18:45 2026
    From Newsgroup: comp.arch

    On 17/04/2026 09:20, BGB wrote:
    On 4/17/2026 1:05 AM, David Brown wrote:
    On 16/04/2026 20:15, BGB wrote:
    On 4/15/2026 5:44 PM, Bill Findlay wrote:
    On 15 Apr 2026, David Brown wrote
    (in article <10roqep$16j1j$1@dont-email.me>):

    On 15/04/2026 17:36, quadi wrote:
    On Wed, 15 Apr 2026 03:32:00 +0100, moi wrote:
    On 15/04/2026 01:44, MitchAlsup wrote:

    One should also note: in the history of this system (~late 1930s)
    to present: only 2 properly registered FA guns have been used in any
    crimes. {Anyone with a brain would say this is a pretty good
    record}

    Anyone with a non-USAn brain would say this is utterly insane.

    Utterly insane would be if the same procedure applied to
    thermonuclear
    warheads.

    Of course, much about the consequences of the Second Amendment
    indeed does
    appear insane. The sensible thing to do would be to repeal it,
    rather than
    pretend it doesn't exist, or it doesn't mean what it says, and
    hope the
    Supreme Court will look the other way.

    [wise words omitted]

    That's just my two cents - coming from someone in a country ...
    where we have far more real-world freedoms than the USA.
    That is the bit they really can't fathom.


    ?...


    Note - I come from the UK originally, but live in Norway.

    But, AFAIK, the UK is the place that went and banned:
      Sharp points on knives;
      Sharp points on scissors;
      Buying solder without having certifications;
        So, it is effectively sold black-market in small amounts,
          to the electronics hobbyists.
      ...

    Complete nonsense.

    But both the UK and Norway have restrictions on people carrying around
    deadly weapons of all sorts. The freedom not to be stabbed, shot, or
    otherwise injured or killed trumps the freedom to carry such weapons.


    Fair enough.

    Later went and asked Gemini about it, it said that the laws restrict carrying things with sharp tips (like knives and similar) rather than possession of them (say, at a person's house).

    I don't think any LLM is going to be a good source of information here.
    Gemini might not be as bad as Grok, but AI will often miss the point,
    and be heavily influenced by the kinds of drivel that is often published
    on the net.

    Basically, the laws say that if you are caught with a large screwdriver
    that you are using to stab or threaten people, you will be charged and
    treated as though you were carrying a knife for that purpose. Older
    laws banned carrying knives and the like in public places - newer laws
    target weapons, where a "weapon" is anything that you use or plan to use
    for violence or threats of violence.


    Apparently, the idea that it was a ban on all pointy things was an
    over-generalization that floats around on the internet.


    Exactly. Social media rewards people who greatly exaggerate things to
    make them sound dramatic. And political extremists love fear-mongering,
    with a total disregard for the truth or subtleties of reality.

    ...


    And, where a person can be arrested, for stuff they say on social
    media (or "thought crime" as some are calling it);

    Only thoughtless people are calling it that. You've been watching
    too much Fox News - a channel that describes itself as
    "entertainment" without any obligation to tell the truth.


    Actually, mostly, it has been a mix of YouTube videos/shorts and
    Twitter/X threads...


    If he were still alive, Goebbels would have banned Twatter for having
    too much right-wing propaganda. It is not healthy to use it as a source
    of information unless you want the calendar dates for KKK meetings.

    YouTube shorts are not much better. Research has shown again and again
    that social media algorithms exaggerate from people's opinions and
    interests - they provide echo chambers that move you further and further
    from central or balanced positions. It is somewhat inevitable -
    balanced opinions are not particularly interesting or engaging, so they
    are not popular and don't generate revenue for the social media platform
    or the content creator. The real world is seldom as exciting as
    people's imaginations.

    I'd recommend looking at a reality check site like snopes.com to get an
    idea of how easily people get duped. Being smart isn't enough (you are
    a very smart guy) - you need to understand how you are being
    manipulated. Ask yourself "Cui bono" ? Who is benefiting, in money,
    power or influence?


    The freedom of innocent people not to suffer abuse, hatred and
    prejudice trumps the freedom of nasty little sods who think they have
    the right to abuse others. And inciting hatred or encouraging others
    to commit criminal behaviour is just as much a crime in the USA as the
    UK.


    OK.

    Some people were making it sound like they were opposing peoples'
    abilities to have and express opinions in general (though they didn't usually specify on what sorts of topics).


    Originally, seemed like, it could have been something like, say:
      Person says something bad about a political leader or similar;
      Political leader sees it and feels insulted and has person arrested.
    Or, something of this sort, ...


    That does happen in some countries. There are plenty of dictatorships,
    or partial dictatorships, where that kind of thing goes on. Journalists
    are banned from official press conferences or buildings because they ask awkward questions or publish things unflattering to the wannabe king. Politicians get attacked for saying things like people have to follow
    laws. The USA has gone seriously downhill in that respect in the last
    16 months or so. Some other parts of the world are worse - sometimes
    much worse. But most European countries have very strong freedom of
    speech protection as long as you are not harming other people with your speech. (Freedom of speech must be weighed against freedom /from/
    speech - just like freedom of religion and freedom from religion.)



    But, yeah, in the US people usually say the First Amendment guarantees peoples' freedom to have and express opinions about whatever.


    Many people do think that, as I understand it. But that's not what the constitution says. In particular, it only limits what Congress and
    federal authorities can do to stop people expressing themselves - it
    does not in any way require non-government entities to allow people
    to say anything they want. Media (newspapers, TV, social media, etc.)
    can impose whatever limitations they want.

    Though, OTOH, I guess things like social media platforms still have the ability to ban people from the platform if they go around spreading hate-speech or similar.

    Correct. Equally, they are allowed to encourage hate-speech and ban
    people arguing against it.

    There are plenty of restrictions to free speech in the USA - you can't
    incite violence, for example. And something you say might be considered conspiracy to commit a crime. Things you say might fall foul of other
    laws, such as harassment, psychological abuse, prejudice, etc. Other countries' laws might put more emphasis on freedom from hate speech than
    the USA, but the idea that the USA has freedom of speech and Europeans
    do not is wrong.


    Apparently I guess that was a problem in the past with DJT, then he got banned off of Twitter, then he started his own social network, then Elon bought Twitter and renamed it X, and DJT got back on there, ...


    Where corporations can lead search-and-seizure operations for claimed
    IP violations;

    No, they can't.

    And in European countries, unlike the USA, corporations don't get to
    lie and cheat then claim "freedom of speech".



    Saw a thing not too long ago talking about how apparently Sega left some Nintendo devkits in an old office building, then abandoned the building,
    and later the building owners sold off all the old junk that was left in
    the buildings.

    Story goes that the guy who bought up some of the old junk posted about
    it on the internet, and then Sega + Nintendo + UK Police went in and arrested the guy and took all his stuff (then they released the guy, but
    he didn't get the stuff back), because the idea was that him having the devkits was considered as theft of intellectual property.


    So what you are saying here is that the /police/ conducted a raid and
    seizure in connection with suspected unlawfully obtained goods and/or industrial espionage, acting on information provided by a company that believed it was a victim of the crime? And that when the dust settled,
    they realised that there was no intentional crime?

    That's not the breakdown of free society as you implied - it's the
    police doing their job of enforcing the law, but possibly making the
    wrong judgement call. Police in every country have to make decisions
    based on limited information, and sometimes they make the wrong
    decision. (I don't know this case, and can't say if they were wrong or right.)


    Eg (finding a few videos talking about one of the incidents): https://www.youtube.com/watch?v=Sy9Eb8J0xGk https://www.youtube.com/watch?v=NU040CTdJI0


    I do not want to click on your links and follow you down your rabbit
    hole. The Youtube algorithm has learned that I like maths and physics
    videos, some computing stuff, some comedy, linguistics, etc. I'd rather
    it didn't think I was interested in conspiracy theories about the
    breakdown of every country that does not follow Maga philosophies.


    Remember, freedoms are always a balance, not an absolute. Lots of
    types of freedom for one person reduce other freedoms for other people.


    OK.


    ...


    So, sorta like California but worse...

    California apparently banned the 60/40 lead/tin stuff IIRC, but still
    allows people to freely possess lead-free solder (so people
    apparently need to smuggle the 60/40 into CA if they want to use it).
    Everywhere else, 60/40 is OK. Well, and CA has the "age verification"
    controversy, etc.

    I can't answer for California, but if I want leaded solder I can just
    order some. But the regulations against the use of lead in general
    are a good thing - the freedom to drink water without lead trumps the
    freedom of a handful of people to use cheaper and lower temperature
    soldering irons.


    Where I am, solder is sold on Amazon or similar...

    Had seen videos where people made it seem like solder was some sort of black-market contraband.

    So you know you can get solder in the post in a couple of days, but you
    have seen someone on a video saying it has been banned and you have to
    smuggle it on the black market - and you believed the video, not your
    own experience?


    Looking around, apparently the restrictions were specifically on
    lead-based solder though, rather than restricting all solder.


    Of course. Lead is a neuropoison, and has caused significant reduction
    in mental capacity (and other health problems) for vast numbers of
    people. An argument has been made that the fall of the Roman Empire can
    be partially blamed on lead pipes and lead dishes. Lead from petrol has caused massive low-level poisoning. Lead in groundwater causes
    poisoning. It makes a lot of sense to have regulations to reduce the
    use of lead in other situations - such as solder - where there are
    perfectly good alternatives.


    In the case of California, did see something not too long ago saying
    that they were trying to get something passed to ban personal ownership
    of 3D printers and CNC machines.

    Though, I didn't see anyone else talking about this, so it seemed unconfirmed.


    Don't misunderstand me - sometimes it really is a case of politicians
    doing stupid things. Sometimes it is for personal profit or the result
    of lobbying, often it is well-intentioned but not matched by an
    understanding of the implications. Here the lawmakers saw that people
    can make gun parts on 3-D printers, and wanted to stop that from being possible (fair enough as an aim). The resulting proposed legislation
    would have banned all personal usage of 3-D printers - a real "throw the
    baby out with the bathwater" solution.


    There is a lot more talking going on about the CA "OS age verification" bull, and now something saying that the US federal people are now
    looking into something similar (groan, this has a risk to potentially
    ruin open source and make everything suck...). Hopefully it goes the way
    of past proposals for bans on OSS and RISC-V and similar, ...

    So, not like US is exactly perfect either...

    The situation here is that social media is not suitable or safe for
    kids. (It is not safe or suitable for adults either, but it's harder to
    argue that politically - "think of the kids" is always good for votes.)
    Social media companies don't want to do anything about this, such as
    making any kind of realistic registration or age checks, and they
    certainly don't
    want to have to remove the dangerous crap they host. So they punt the
    problem - they promise politicians lots of money if the politicians
    impose laws saying the OS or other platforms must handle the age
    verification. I don't know where this one will all end. (There's an
    easy solution - social media companies could charge $10 a year per
    account, payable only via credit card. That would immediately solve
    many of the problems they cause.)







    Could be wrong, this is from memory and stuff I heard on the internet.


    You should be a lot more careful about what you watch on the internet.


    Possibly.

    Then again, I guess a lot of the news I had seen had also been fed
    through the lens of video game commentators and similar.


    Video game commentators are good for comments on video games.

    You would do well to look at sites like ground.news or allsides.com that
    make a specific point of showing news from multiple different sites to
    help you understand the biases. Or look at multiple news sources that
    are publicly funded but independent of any direct government control,
    such as the BBC website. For privately owned news sources, find one
    that charges you money - "free" sites still charge you, but in hidden
    ways. No one site is perfect - you have to combine them.



  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 14:47:50 2026
    From Newsgroup: comp.arch

    On 17/04/2026 14:05, Michael S wrote:
    On Fri, 17 Apr 2026 08:51:12 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    David Brown <david.brown@hesbynett.no> writes:
    On 16/04/2026 20:15, BGB wrote:
    And, where a person can be arrested, for stuff they say on social
    media (or "thought crime" as some are calling it);

    Only thoughtless people are calling it that. You've been watching
    too much Fox News

    Or he has been watching too much Youtube (or the like). Every time
    somebody tells me that he thinks that something is true that isn't, or
    makes a strange judgement, as in this case, and I ask them where they
    got that from, they tell me "Youtube".


    That does seem to be one of BGB's sources, yes. There's a lot of good
    and interesting stuff on Youtube too, but you have to pick your choices actively - not just let algorithms and autoplay wander through popular
    videos.


    Pay attention that David Brown didn't say that things mentioned by BGB
    don't actually happen in UK. He was merely disagreeing with Orwellian naming.


    While that is true of what I wrote there, I do actually think a lot of
    what BGB wrote was incorrect. Many parts had a grain of truth that had
    been exaggerated well beyond what was reasonable - but the grain of
    truth was still there. Some laws and regulations are significantly
    different between the USA and the UK, and it is entirely reasonable to disagree with or disapprove of some of them.

    The UK does not have "thought crime" in any sense. But it is true that
    it is possible to make postings on social media that constitute criminal behaviour in the UK. It is also true in the USA. (For example,
    threatening the president is a crime, no matter how unrealistic or
    laughable the threat may be.) I think it is fair to say that it is
    easier for a post to be a crime in the UK than the USA, but there is not
    as much of a difference as some people (and some Youtubers!) think.

  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 15:01:12 2026
    From Newsgroup: comp.arch

    On 17/04/2026 14:11, Bill Findlay wrote:
    On 17 Apr 2026, David Brown wrote
    (in article <10rsik4$287kv$1@dont-email.me>):

    On 16/04/2026 20:15, BGB wrote:

    Could be wrong, this is from memory and stuff I heard on the internet.

    You should be a lot more careful about what you watch on the internet.

    Bravo! You put a lot more work into that than I could
    stomach in response to Fuck News talking points.
    Thank you.


    I've "known" BGB for many years on Usenet. He is very intelligent, but sometimes a bit too quick to trust the wrong sources. I hope I can
    encourage him to be more careful about what to trust and what not to
    trust. (And that includes not believing things just because /I/ say so
    either - I can be wrong too.)

    Fun fact - I heard about the "ground.news" site I mentioned because they sponsored some Youtube videos I have watched :-)

  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Apr 17 16:07:21 2026
    From Newsgroup: comp.arch

    David Brown wrote:
    On 17/04/2026 09:20, BGB wrote:
    On 4/17/2026 1:05 AM, David Brown wrote:
    Complete nonsense.

    But both the UK and Norway have restrictions on people carrying
    around deadly weapons of all sorts. The freedom not to be stabbed,
    shot, or otherwise injured or killed trumps the freedom to carry such
    weapons.


    Fair enough.

    Later went and asked Gemini about it, it said that the laws restrict
    carrying things with sharp tips (like knives and similar) rather than
    possession of them (say, at a person's house).

    I don't think any LLM is going to be a good source of information here. Gemini might not be as bad as Grok, but AI will often miss the point,
    and be heavily influenced by the kinds of drivel that is often published
    on the net.

    Basically, the laws say that if you are caught with a large screwdriver
    that you are using to stab or threaten people, you will be charged and
    treated as though you were carrying a knife for that purpose. Older
    laws banned carrying knives and the like in public places - newer laws
    target weapons, where a "weapon" is anything that you use or plan to use
    for violence or threats of violence.

    Norway did not use to have any restrictions at all on tools, like
    knives/axes/scythes up to and including shotguns.

    The current regulations have explicit exceptions for knife carry in
    public places when those knives (or swords!) are part of uniform or
    traditional dress. I.e. around May 17th it is perfectly fine to carry
    large amounts of metal (mostly silver) through airport security, but
    they might tell you that the silver-decorated knife should go in checked
    luggage.

    Similarly, Scouts' knives are fine anywhere.

    Regarding firearms the Norwegian regulations are still way less strict
    than the UK, pretty much anyone without a mental illness or felony
    record can legally own a handgun:

    You just need to start by becoming a member of a local pistol shooting
    club, then turn up regularly to practice using club guns (at least 10
    times) over a year, then pass a police security vetting which
    checks for those mental/felony bans.

    At this point you can legally buy something like a Glock or 1911 and get
    the serial number on your credit-card sized ownership card.

    If you are very active, then you can get separate permits for a primary
    and spare gun for each of the competition classes you regularly compete
    in, I know people with 10+ handguns in their gun safe.

    However, unlike the US, there is absolutely no way to get either an open
    or concealed carry permit unless you are military or police.

    Any handgun you own _must_ be stored in a proper gun safe, it can only
    be brought out for cleaning and to transport it to the shooting range.

    During that transport, the gun cannot be in the front seat with you, it
    has to stay in the trunk or back seat, unloaded of course, and still in
    its carrying box.
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Apr 17 16:57:20 2026
    From Newsgroup: comp.arch

    On 17/04/2026 16:07, Terje Mathisen wrote:
    David Brown wrote:
    On 17/04/2026 09:20, BGB wrote:
    On 4/17/2026 1:05 AM, David Brown wrote:
    Complete nonsense.

    But both the UK and Norway have restrictions on people carrying
    around deadly weapons of all sorts. The freedom not to be stabbed,
    shot, or otherwise injured or killed trumps the freedom to carry
    such weapons.


    Fair enough.

    Later went and asked Gemini about it, it said that the laws restrict
    carrying things with sharp tips (like knives and similar) rather than
    possession of them (say, at a person's house).

    I don't think any LLM is going to be a good source of information
    here. Gemini might not be as bad as Grok, but AI will often miss the
    point, and be heavily influenced by the kinds of drivel that is often
    published on the net.

    Basically, the laws say that if you are caught with a large
    screwdriver that you are using to stab or threaten people, you will be
    charged and treated as though you were carrying a knife for that
    purpose. Older laws banned carrying knives and the like in public
    places - newer laws target weapons, where a "weapon" is anything that
    you use or plan to use for violence or threats of violence.

    Norway did not use to have any restrictions at all on tools, like knives/axes/scythes up to and including shotguns.

    The current regulations have explicit exceptions for knife carry in
    public places when those knives (or swords!) are part of uniform or
    traditional dress. I.e. around May 17th it is perfectly fine to carry
    large amounts of metal (mostly silver) through airport security, but
    they might tell you that the silver-decorated knife should go in checked
    luggage.


    I wear my sgian-dubh with my kilt. Despite the name, it's not very
    hidden. It used to be legal to have one in the cabin of planes in the
    UK, as long as it stayed in your sock.

    Similarly, Scouts' knives are fine anywhere.

    Regarding firearms the Norwegian regulations are still way less strict
    than the UK, pretty much anyone without a mental illness or felony
    record can legally own a handgun:

    You just need to start by becoming a member of a local pistol shooting
    club, then turn up regularly to practice using club guns (at least 10
    times) over a year, then pass a police security vetting which
    checks for those mental/felony bans.

    At this point you can legally buy something like a Glock or 1911 and get
    the serial number on your credit-card sized ownership card.

    If you are very active, then you can get separate permits for a primary
    and spare gun for each of the competition classes you regularly compete
    in, I know people with 10+ handguns in their gun safe.

    However, unlike the US, there is absolutely no way to get either an open
    or concealed carry permit unless you are military or police.

    Indeed. And you have to keep them locked in a gun safe, which the
    police can check at short notice.

    Similarly, you can own a hunting rifle if you pass the hunting tests and
    are vetted by the police.


    Any handgun you own _must_ be stored in a proper gun safe, it can only
    be brought out for cleaning and to transport it to the shooting range.

    During that transport, the gun cannot be in the front seat with you, it
    has to stay in the trunk or back seat, unloaded of course, and still in
    its carrying box.


    Basically, in Norway you can have guns for sport or hunting, but not for threatening or shooting people.

    The gun laws in the UK are a lot more restrictive (after all, there's
    not nearly as much scope for hunting in most of the UK). Farmers can
    get shotgun licenses, but I think if you have a pistol for sport it has
    to be kept at the pistol club, not at home. (I have not looked at the
    rules in detail, so I could be wrong or out-dated.)


  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Apr 17 17:30:07 2026
    From Newsgroup: comp.arch

    David Brown wrote:
    On 17/04/2026 16:07, Terje Mathisen wrote:
    David Brown wrote:
    On 17/04/2026 09:20, BGB wrote:
    On 4/17/2026 1:05 AM, David Brown wrote:
    Complete nonsense.

    But both the UK and Norway have restrictions on people carrying
    around deadly weapons of all sorts. The freedom not to be
    stabbed, shot, or otherwise injured or killed trumps the freedom to
    carry such weapons.


    Fair enough.

    Later went and asked Gemini about it, it said that the laws restrict
    carrying things with sharp tips (like knives and similar) rather
    than possession of them (say, at a person's house).

    I don't think any LLM is going to be a good source of information
    here. Gemini might not be as bad as Grok, but AI will often miss the
    point, and be heavily influenced by the kinds of drivel that is often
    published on the net.

    Basically, the laws say that if you are caught with a large
    screwdriver that you are using to stab or threaten people, you will
    be charged and treated as though you were carrying a knife for that
    purpose. Older laws banned carrying knives and the like in public
    places - newer laws target weapons, where a "weapon" is anything that
    you use or plan to use for violence or threats of violence.

    Norway did not use to have any restrictions at all on tools, like
    knives/axes/scythes up to and including shotguns.

    The current regulations have explicit exceptions for knife carry in
    public places when those knives (or swords!) are part of uniform or
    traditional dress. I.e. around May 17th it is perfectly fine to carry
    large amounts of metal (mostly silver) through airport security, but
    they might tell you that the silver-decorated knife should go in
    checked luggage.


    I wear my sgian-dubh with my kilt. Despite the name, it's not very
    hidden. It used to be legal to have one in the cabin of planes in the
    UK, as long as it stayed in your sock.

    Similarly, Scouts' knives are fine anywhere.

    Regarding firearms the Norwegian regulations are still way less strict
    than the UK, pretty much anyone without a mental illness or felony
    record can legally own a handgun:

    You just need to start by becoming a member of a local pistol shooting
    club, then turn up regularly to practice using club guns (at least 10
    times) over a year, then pass a police security vetting which
    checks for those mental/felony bans.

    At this point you can legally buy something like a Glock or 1911 and
    get the serial number on your credit-card sized ownership card.

    If you are very active, then you can get separate permits for a
    primary and spare gun for each of the competition classes you
    regularly compete in, I know people with 10+ handguns in their gun safe.

    However, unlike the US, there is absolutely no way to get either an
    open or concealed carry permit unless you are military or police.

    Indeed. And you have to keep them locked in a gun safe, which the
    police can check at short notice.

    Similarly, you can own a hunting rifle if you pass the hunting tests and
    are vetted by the police.


    Any handgun you own _must_ be stored in a proper gun safe, it can only
    be brought out for cleaning and to transport it to the shooting range.

    During that transport, the gun cannot be in the front seat with you,
    it has to stay in the trunk or back seat, unloaded of course, and
    still in its carrying box.


    Basically, in Norway you can have guns for sport or hunting, but not for threatening or shooting people.

    The gun laws in the UK are a lot more restrictive (after all, there's
    not nearly as much scope for hunting in most of the UK). Farmers can
    get shotgun licenses, but I think if you have a pistol for sport it has
    to be kept at the pistol club, not at home. (I have not looked at the
    rules in detail, so I could be wrong or out-dated.)

    It's worse as far as I know:

    Even Olympic shooters in the UK cannot own handguns, they have to be
    active military, with the gun(s) kept on base.

    Non-military shooters need to cross the channel in order to practise in
    France instead.

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 15:45:07 2026
    From Newsgroup: comp.arch

    David Brown <david.brown@hesbynett.no> writes:
    On 16/04/2026 20:15, BGB wrote:
    On 4/15/2026 5:44 PM, Bill Findlay wrote:

    <snip>

    California apparently banned the 60/40 lead/tin stuff IIRC, but still
    allows people to freely possess lead-free solder (so people apparently
    need to smuggle the 60/40 into CA if they want to use it). Everywhere
    else, 60/40 is OK. Well, and CA has the "age verification" controversy,
    etc.

    I can't answer for California,

    I can. 60/40 solder is NOT banned in California. It is not allowed,
    however, to be used in drinking water plumbing applications (for fairly
    obvious reasons) and it must be labeled as potentially hazardous per
    prop 65.

    Most of what BGB wrote above seems completely wrong. 30 seconds with
    google before he/she posts facts would be a good habit for him/her to adopt.

  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Apr 17 15:55:49 2026
    From Newsgroup: comp.arch

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 4/16/2026 5:13 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 4/16/2026 11:52 AM, Scott Lurndal wrote:

    Brain fart. His squadron flies the F/A-18C and D models. The
    E's and F's will remain in the active fleet along with the F-35,

    Can he fly the F-35?

    Only if he gets a ride in the back seat. Which, since the F35
    is a single seater, is not gonna happen.

    Oh damn! Shit. He can fly the f-16?

    DAGS "ordie".
  • From BGB@cr88192@gmail.com to comp.arch on Fri Apr 17 12:37:04 2026
    From Newsgroup: comp.arch

    On 4/17/2026 8:01 AM, David Brown wrote:
    On 17/04/2026 14:11, Bill Findlay wrote:
    On 17 Apr 2026, David Brown wrote
    (in article <10rsik4$287kv$1@dont-email.me>):

    On 16/04/2026 20:15, BGB wrote:

    Could be wrong, this is from memory and stuff I heard on the internet.

    You should be a lot more careful about what you watch on the internet.

    Bravo! You put a lot more work into that than I could
    stomach in response to Fuck News talking points.
    Thank you.


    I've "known" BGB for many years on Usenet. He is very intelligent, but
    sometimes a bit too quick to trust the wrong sources. I hope I can
    encourage him to be more careful about what to trust and what not to
    trust. (And that includes not believing things just because /I/ say so
    either - I can be wrong too.)

    Fun fact - I heard about the "ground.news" site I mentioned because they sponsored some Youtube videos I have watched :-)


    Not usually been one for fact checking, as my interest areas are still (mostly) technical (rather than political or legal), so whatever
    political leaning I would have mostly shouldn't matter.



    But, yeah, sometimes it gets a little sketchy if it is just someone
    rambling on top of gameplay footage, like Subway Surfers, or similar.
    Back in the day, we had Audiosurf, but seems that Subway Surfers has
    overtaken this role.

    Some people have also used games like COD or Minecraft, but watching
    someone play these games from a first person POV for any length of time
    causes motion sickness.

    At one point, people doing videos with talking over the top of clips of Skibidi Toilet was pretty popular (during the high point of Skibidi
    Toilet, but its popularity seems to have weakened as of late).


    The video of the guy ranting about CA taking away personal ownership of
    3D printers and CNC machines was over the top of video of him working on
    one of his craft projects. Usually though, for stuff like this, there
    would be repetition from multiple sources (in a case of "one guy saying something, possibly all just nothing, many people saying the same thing, possibly true" sense).


    Sometimes there is a lot of stuff with people fighting over "Evolution
    vs Young Earth Creationism" and similar. Sometimes people arguing for
    the Earth being a flat disk (obviously wrong), ...

    All sorts of stuff going on...

    Sometimes does wander into international politics territory.




    Sort of reminded how for a while there were lots of YouTube sponsored
    segments for a company claiming to sell Lordship status...

    Then other people saying it was a scam, because that was "not how it
    worked". IIRC idea was that they bought a farm somewhere and were
    "selling" it in 1ft^2 parcels or similar, planting a little flag on each
    one, and then selling the people the title of "Lord Whatever" under the premise of them having their flag planted on a parcel of land in the UK
    or such...

    Had to look, couldn't initially remember name off-hand: https://en.wikipedia.org/wiki/Established_Titles

    Then the whole thing went away.
    Apparently a combination of public controversy and also the UK
    government apparently also being like "that is not how that works".

    ...


  • From BGB@cr88192@gmail.com to comp.arch on Fri Apr 17 13:13:34 2026
    From Newsgroup: comp.arch

    On 4/17/2026 7:09 AM, Bill Findlay wrote:
    On 17 Apr 2026, BGB wrote
    (in article <10rrsqc$22m9c$1@dont-email.me>):

    On 4/16/2026 5:00 PM, Bill Findlay wrote:
    On 16 Apr 2026, BGB wrote
    (in article <10rr8vi$1sh0n$1@dont-email.me>):
    ...

    But, AFAIK, the UK is the place that went and banned:
    Sharp points on knives;
    Sharp points on scissors;
    Buying solder without having certifications;
    So, it is effectively sold black-market in small amounts,
    to the electronics hobbyists.
    ...
    And, where a person can be arrested, for stuff they say on social media
    (or "thought crime" as some are calling it);
    Where corporations can lead search-and-seizure operations for claimed IP
    violations;

    It is clear from that claptrap that in fact you know very little.
    (MAGA shills like JD are not a trustworthy source of information.)

    I am not really part of the MAGA crowd.
    I am not really into politics in general...

    But, this is still what people say online about the UK and CA and similar...

    So, my evaluation of your words was spot on.

    If you mean to say that you think I am also part of the MAGA crowd (or
    idolize DJT or JDV), I will disagree...


    But, if you mean it in the sense that I have no particular expertise in international law or similar, then probably true enough (not exactly
    that I have much basis to disagree on this point).

    Most of what I have gathered on this topic has mostly been from the
    "streams of random people talking about stuff on the internet" (usually
    mixed in with other stuff, particularly if doom scrolling on X or
    similar...).


    And, sometimes, one can have something to play in the background while
    they are working on code or similar, that doesn't require all that much attention, ... A lot of times, YouTube videos where people just sort of
    ramble on about some topic can work well, when not just listening to
    music or similar.


    But, in general, rarely did people really say much about either the UK
    or CA, if they talk about them at all, these are a few places that
    people seem to like to rip on.

    The other side here being much more into going on about the whole thing
    in Gaza (and human rights violations, ...), TX oppressing reproductive
    rights, various places opposing people's ability to express their
    preferred gender identity, etc. These two areas rarely overlap though on
    a single topic. Though, admittedly, I don't really understand the whole "gender identity" thing all that well, doesn't strongly interact with my
    own experience.


    Well, and the places they do overlap is usually in conflict over the perception of public figures like DJT (like whether he is hero or demon,
    ...), and similar...


    ...



  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Apr 17 21:33:25 2026
    From Newsgroup: comp.arch

    On Fri, 17 Apr 2026 05:21:57 +0000, quadi wrote:

    This led me to feel I had the perfect spot in which to re-introduce to
    this iteration of the Concertina II architecture that most bizarre
    feature of the architecture which was cited as one of its defining features... I think of it as an exotic and strictly optional feature, existing mainly to enhance emulation, while the ability to go from RISC
    to CISC to VLIW is what defines Concertina II.

    And now I've increased the number of header types by one, so that the
    extra options can be combined with the full instruction set, rather than
    these additions being disjoint from each other so that one can only choose
    one but not the other. If you need both at once, they're available at the
    cost of a little more overhead.

    John Savard
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Apr 17 21:38:49 2026
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:

    But let's get back to gate delays, please.

    Let's.

    How do people actually count gate delays, and how useful is it?
    Different gates have different delays (obviously), so counting
    an inverter the same as a three-input NOR gate (independent of
    fan-out, even) seems to be a large simplification which may be
    useful for a fairly rough approximation, but not that much better.

    Or am I missing something?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 18 01:11:49 2026
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    John Levine <johnl@taugh.com> schrieb:

    But let's get back to gate delays, please.

    Let's.

    How do people actually count gate delays, and how useful is it?
    Different gates have different delays (obviously), so counting
    an inverter the same as a three-input NOR gate (independent of
    fan-out, even) seems to be a large simplification which may be
    useful for a fairly rough approximation, but not that much better.

    Or am I missing something?

    There is the "standard" FO4 counting scheme where 1 gate drives
    4 other gate inputs, and in this scheme, a D-type flip-flop was
    2.5 gates of delay.

    As one can guess, gates can be sized: as small as obeys the FAB
    design rules, to <basically> as big as one can afford. Naturally,
    as gates get bigger, they can drive bigger loads--BUT they also
    present bigger loads to the gates driving them.

    Conway figured out that the fastest way to "buffer up" a signal
    was to use inverters staged in the ratio of 1:e:e^2:e^3... with
    e being the standard 2.7... base of natural logarithms. Rounding
    e up to 3 degrades speed by less than 1%, rounding up to 4 only
    slows down 10% or so--so, most buffering is done at FO4, where
    a minimum sized gate drives an inverter 4× as big which would then
    drive another inverter 16× as big...
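
    The stage-ratio trade-off above can be checked with a few lines of
    arithmetic. This is only a sketch under a simplified model that is an
    assumption here, not something stated in the post: each stage's delay is
    taken as proportional to its fan-out f, a total load ratio C needs
    ln(C)/ln(f) stages, and parasitic self-loading is ignored, so the total
    delay scales as f/ln(f). The function names are made up for illustration.

```python
import math


def relative_chain_delay(stage_ratio: float) -> float:
    """Delay of a tapered inverter chain per unit of ln(total load ratio).

    Assumed model: per-stage delay is proportional to the fan-out f, and
    ln(C)/ln(f) stages are needed, so total delay is proportional to
    f / ln(f). Parasitic self-loading is ignored.
    """
    return stage_ratio / math.log(stage_ratio)


def penalty_vs_e(stage_ratio: float) -> float:
    """Fractional slowdown relative to the ideal stage ratio e."""
    return relative_chain_delay(stage_ratio) / relative_chain_delay(math.e) - 1.0


if __name__ == "__main__":
    for f in (2.0, math.e, 3.0, 4.0, 5.0):
        print(f"stage ratio {f:.3f}: {penalty_vs_e(f) * 100:5.2f}% slower than e")
```

    Under this idealised model a ratio of 3 costs well under 1% and a ratio
    of 4 costs roughly 6%, which is in the same ballpark as the figures
    quoted above. Real gates have parasitic capacitance that this sketch
    leaves out, so measured numbers will differ.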

    Each transistor between a power connection and the signal connection
    basically adds its own transconductance to the electrical path
    (ignoring body effect). Knowing that De Morgan's laws apply, we
    instantly see that a Nand gate is simply a Nor gate with inverted
    inputs. A Nand gate has its serial string of FETs between signal
    and ground and a parallel path from signal to Vdd, while a Nor has
    its serial FETs between signal and Vdd and its parallel path between
    signal and ground. To deal with these serial paths, the transistors
    are widened; a 2Nand has N-channels 2× as wide and can use 1×
    P-channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
    {Nors are similar but reverse Ns and Ps} Somewhere along the line,
    the parallel path FETs have to get widened because of all the
    serial-path diffusion capacitance (to maintain roughly equal pull-up
    and pull-down).
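
    To put rough numbers on why an inverter and a 3-input gate should not
    count as the same "gate delay", the sizing rule above can be turned into
    input load per pin. This is a sketch only: it assumes a unit inverter
    with a 1× N-channel and a 1× P-channel (an assumption for illustration;
    real processes usually widen the P device), and the function names are
    hypothetical.

```python
def nand_widths(n_inputs: int) -> tuple[float, float]:
    """(N width, P width) per input for an n-input Nand.

    Follows the sizing rule described in the post: the series N-channels
    are scaled by the stack height, the parallel P-channels stay at 1x.
    """
    return float(n_inputs), 1.0


def relative_input_load(n_inputs: int) -> float:
    """Gate capacitance seen at one input, relative to a unit inverter.

    Assumes capacitance is proportional to total transistor width on the
    pin, and that the unit inverter is 1x N plus 1x P (an assumption).
    """
    n_width, p_width = nand_widths(n_inputs)
    inverter_load = 1.0 + 1.0
    return (n_width + p_width) / inverter_load


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n}-input: {relative_input_load(n):.2f}x the inverter input load")
```

    With these assumptions a 3Nand presents twice the load of an inverter on
    each input, so whatever drives it is slowed accordingly; a flat gate
    count hides that.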

    Soon, one realizes that one needs SPICE simulation with accurate
    models to push the edge--just like when pushing the Young's Modulus
    in engineering models.

    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of a restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'.
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sat Apr 18 10:56:56 2026
    From Newsgroup: comp.arch

    On 17/04/2026 19:37, BGB wrote:
    On 4/17/2026 8:01 AM, David Brown wrote:
    On 17/04/2026 14:11, Bill Findlay wrote:
    On 17 Apr 2026, David Brown wrote
    (in article <10rsik4$287kv$1@dont-email.me>):

    On 16/04/2026 20:15, BGB wrote:

    Could be wrong, this is from memory and stuff I heard on the internet.

    You should be a lot more careful about what you watch on the internet.

    Bravo! You put a lot more work into that than I could
    stomach in response to Fuck News talking points.
    Thank you.


    I've "known" BGB for many years on Usenet. He is very intelligent,
    but sometimes a bit too quick to trust the wrong sources. I hope I
    can encourage him to be more careful about what to trust and what not
    to trust. (And that includes not believing things just because /I/
    say so either - I can be wrong too.)

    Fun fact - I heard about the "ground.news" site I mentioned because
    they sponsored some Youtube videos I have watched :-)


    Not usually been one for fact checking, as my interest areas are still (mostly) technical (rather than political or legal), so whatever
    political leaning I would have mostly shouldn't matter.


    Fact checking is also important in technical fields!

    But while you might not be particularly interested in politics, politics
    and other aspects of societies and countries around the world still
    affect you. It is still good to have some rough ideas about what is
    going on in the world - and how to tell if something is true or not (or
    at least likely to be true or not).



    But, yeah, sometimes it gets a little sketchy if it is just someone
    rambling on top of gameplay footage, like Subway Surfers, or similar.
    Back in the day, we had Audiosurf, but seems that Subway Surfers has overtaken this role.

    Some people have also used games like COD or Minecraft, but watching
    someone play these games from a first person POV for any length of time causes motion sickness.

    At one point, people doing videos with talking over the top of clips of Skibidi Toilet was pretty popular (during the high point of Skibidi
    Toilet, but its popularity seems to have weakened as of late).


    The video of the guy ranting about CA taking away personal ownership of
    3D printers and CNC machines was over the top of video of him working on
    one of his craft projects. Usually though, for stuff like this, there
    would be repetition from multiple sources (in a case of "one guy saying something, possibly all just nothing, many people saying the same thing, possibly true" sense).


    One particular politician is famous for prefacing many of his boldest
    and most absurd lies with "many people are saying...". Many people can
    say the same thing, and still be wrong. (Just look at religion around
    the world. Many people say one thing. Many people say something
    totally different. They can't all be right - many people are wrong.)

    "Proof by repeated assertion" is not a valid argument, whether it is one person saying something many times, or many people saying the same thing.



    Sometimes there is a lot of stuff with people fighting over "Evolution
    vs Young Earth Creationism" and similar. Sometimes people arguing for
    the Earth being a flat disk (obviously wrong), ...


    I'm glad that you at least consider "flat Earth" to be obviously wrong.
    The same applies to any kind of "young earth" idea.

    All sorts of stuff going on...

    Sometimes does wander into international politics territory.




    Sort of reminded how for a while there were lots of YouTube sponsored segments for a company claiming to sell Lordship status...


    It is possible to argue that believing the nonsense you have heard about
    is harmless - though I would say making yourself look foolish can be considered "harm". But scams con people out of real money. Fair enough
    if it is a small amount of money, and clearly nonsense, bought as a joke
    - like buying insurance against alien kidnapping. Please be careful
    about any kinds of scams you come across.

    Then other people saying it was a scam, because that was "not how it worked". IIRC idea was that they bought a farm somewhere and were
    "selling" it in 1ft^2 parcels or similar, planting a little flag on each one, and then selling the people the title of "Lord Whatever" under the premise of them having their flag planted on a parcel of land in the UK
    or such...

    Had to look, couldn't initially remember name off-hand: https://en.wikipedia.org/wiki/Established_Titles

    Then the whole thing went away.
    Apparently a combination of public controversy and also the UK
    government apparently also being like "that is not how that works".

    ...



  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Apr 18 11:11:35 2026
    From Newsgroup: comp.arch

    BGB wrote:
    Sort of reminded how for a while there were lots of YouTube sponsored segments for a company claiming to sell Lordship status...

    Then other people saying it was a scam, because that was "not how it worked". IIRC idea was that they bought a farm somewhere and were
    "selling" it in 1ft^2 parcels or similar, planting a little flag on each one, and then selling the people the title of "Lord Whatever" under the premise of them having their flag planted on a parcel of land in the UK
    or such...

    Had to look, couldn't initially remember name off-hand: https://en.wikipedia.org/wiki/Established_Titles

    Then the whole thing went away.
    Apparently a combination of public controversy and also the UK
    government apparently also being like "that is not how that works".

    Something similar to this actually happened, in Norway:

    At one point, after establishing that all men could vote, it was
    understood that this of course only meant men with a tie to the land,
    i.e. farmers, millers, blacksmiths, industrialists etc.

    In reaction, a group working to make voting rights really universal
    bought up large tracts of worthless swamp/marshland and split it into
    square foot parcels. Armed with an ownership certificate for said plot,
    you could not be denied your voting rights.

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sat Apr 18 11:37:01 2026
    From Newsgroup: comp.arch

    On 18/04/2026 11:11, Terje Mathisen wrote:
    BGB wrote:
    Sort of reminded how for a while there were lots of YouTube sponsored
    segments for a company claiming to sell Lordship status...

    Then other people saying it was a scam, because that was "not how it
    worked". IIRC idea was that they bought a farm somewhere and were
    "selling" it in 1ft^2 parcels or similar, planting a little flag on
    each one, and then selling the people the title of "Lord Whatever"
    under the premise of them having their flag planted on a parcel of
    land in the UK or such...

    Had to look, couldn't initially remember name off-hand:
    https://en.wikipedia.org/wiki/Established_Titles

    Then the whole thing went away.
    Apparently a combination of public controversy and also the UK
    government apparently also being like "that is not how that works".

    Something similar to this actually happened, in Norway:


    Well, similar in that it was parcelling of land - dissimilar in that it
    was not all a scam!

    At one point, after establishing that all men could vote, it was
    understood that this of course only meant men with a tie to the land,
    i.e. farmers, millers, blacksmiths, industrialists etc.

    In reaction, a group working to make voting rights really universal
    bought up large tracts of worthless swamp/marshland and split it into
    square foot parcels. Armed with an ownership certificate for said plot,
    you could not be denied your voting rights.


    I've also seen this done in the UK as a way to protect land from
    developers. A forest (or whatever land is to be protected) is bought
    then parcelled up and sold to thousands of people. If someone wants to destroy the forest to build houses, factories, or whatever, they need to
    find all these owners and buy from each of them individually.

  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Apr 18 10:08:10 2026
    From Newsgroup: comp.arch

    David Brown <david.brown@hesbynett.no> schrieb:

    I've also seen this done in the UK as a way to protect land from
    developers. A forest (or whatever land is to be protected) is bought
    then parcelled up and sold to thousands of people. If someone wants to destroy the forest to build houses, factories, or whatever, they need to find all these owners and buy from each of them individually.

    You can also buy a square foot of land to get a (worthless, but
    amusing) title, for example as "Laird" in Scotland.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sat Apr 18 14:18:29 2026
    From Newsgroup: comp.arch

    On 18/04/2026 12:08, Thomas Koenig wrote:
    David Brown <david.brown@hesbynett.no> schrieb:

    I've also seen this done in the UK as a way to protect land from
    developers. A forest (or whatever land is to be protected) is bought
    then parcelled up and sold to thousands of people. If someone wants to
    destroy the forest to build houses, factories, or whatever, they need to
    find all these owners and buy from each of them individually.

    You can also buy a square foot of land to get a (worthless, but
    amusing) title, for example as "Laird" in Scotland.


    No, you can't.

    You can give some money to a bunch of scammers who say that you can buy
    a title, but it has no connection with reality.

    BGB already gave this example, with more information about how it was a
    scam.

  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sat Apr 18 13:01:31 2026
    From Newsgroup: comp.arch

    On 2026-Apr-17 21:11, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    John Levine <johnl@taugh.com> schrieb:

    But let's get back to gate delays, please.

    Let's.

    How do people actually count gate delays, and how useful is it?
    Different gates have different delays (obviously), so counting
    an inverter the same as a three-input NOR gate (independent of
    fan-out, even) seems to be a large simplification which may be
    useful for a fairly rough approximation, but not that much better.

    Or am I missing something?

    There is the "standard" FO4 counting scheme where 1 gate drives
    4 other gate inputs, and in this scheme, a D-type flip-flop was
    2.5 gates of delay.

    As one can guess, gates can be sized: as small as obeys the FAB
    design rules, to <basically> as big as one can afford. Naturally,
    as gates get bigger, they can drive bigger loads--BUT they also
    present bigger loads to the gates driving them.

    Conway figured out that the fastest way to "buffer up" a signal
    was to use inverters staged in the ratio of 1:e:e^2:e^3... with
    e being the standard 2.7... base of natural logarithms. Rounding
    e up to 3 degrades speed by less than 1%, rounding up to 4 only
    slows down 10% or so--so, most buffering is done at FO4, where
    a minimum sized gate drives an inverter 4× as big which would then
    drive another inverter 16× as big...

    Each transistor between a power connection and the signal
    connection basically adds its own transconductance to the electrical
    path (ignoring body effect). Knowing that deMorgan's laws apply, we
    instantly see that a Nand gate is simply a Nor gate with inverted
    inputs. A Nand gate has its serial string of FETs between signal
    and ground and a parallel path from signal to Vdd, while a Nor has
    its serial FETs between signal and Vdd and its parallel path between
    signal and ground. To deal with these serial paths, the transistors
    are lengthened; a 2Nand has N-channels 2× as wide and can use 1×
    P channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
    {Nors are similar but reverse Ns and Ps} Somewhere along the line,
    the parallel path FETs have to get lengthened because of all the
    serial path diffusion capacitance (to maintain rather equal pull up
    and pull down).

    Soon, one realizes that one needs SPICE simulation with accurate
    models to push the edge--just like when pushing the Young's Modulus
    in engineering models.

    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    The part I don't see is the rules for combinatorial gates.
    There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
    where multiple gates are combined in one but at a lower gate delay.

    For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
    because it really is an INV, an AND and an OR,
    but in CMOS those seem to be just 1 or 1.5 gate delays.

    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    I find this makes it more difficult to just look at a CMOS logic circuit
    and know whether it will fit within a 20 gate delay stage budget.

  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 18 17:59:39 2026
    From Newsgroup: comp.arch


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-Apr-17 21:11, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    John Levine <johnl@taugh.com> schrieb:

    But let's get back to gate delays, please.

    Let's.

    How do people actually count gate delays, and how useful is it?
    Different gates have different delays (obviously), so counting
    an inverter the same as a three-input NOR gate (independent of
    fan-out, even) seems to be a large simplification which may be
    useful for a fairly rough approximation, but not that much better.

    Or am I missing something?

    There is the "standard" FO4 counting scheme where 1 gate drives
    4 other gate inputs, and in this scheme, a D-type flip-flop was
    2.5 gates of delay.

    As one can guess, gates can be sized: as small as obeys the FAB
    design rules, to <basically> as big as one can afford. Naturally,
    as gates get bigger, they can drive bigger loads--BUT they also
    present bigger loads to the gates driving them.

    Conway figured out that the fastest way to "buffer up" a signal
    was to use inverters staged in the ratio of 1:e:e^2:e^3... with
    e being the standard 2.7... base of natural logarithms. Rounding
    e up to 3 degrades speed by less than 1%, rounding up to 4 only
    slows down 10% or so--so, most buffering is done at FO4, where
    a minimum sized gate drives an inverter 4× as big which would then
    drive another inverter 16× as big...

    Each transistor between a power connection and the signal
    connection basically adds its own transconductance to the electrical
    path (ignoring body effect). Knowing that deMorgan's laws apply, we
    instantly see that a Nand gate is simply a Nor gate with inverted
    inputs. A Nand gate has its serial string of FETs between signal
    and ground and a parallel path from signal to Vdd, while a Nor has
    its serial FETs between signal and Vdd and its parallel path between
    signal and ground. To deal with these serial paths, the transistors
    are lengthened; a 2Nand has N-channels 2× as wide and can use 1×
    P channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
    {Nors are similar but reverse Ns and Ps} Somewhere along the line,
    the parallel path FETs have to get lengthened because of all the
    serial path diffusion capacitance (to maintain rather equal pull up
    and pull down).

    Soon, one realizes that one needs SPICE simulation with accurate
    models to push the edge--just like when pushing the Young's Modulus
    in engineering models.

    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    The part I don't see is the rules for combinatorial gates.
    There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
    where multiple gates are combined in one but at a lower gate delay.

    For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
    because it really is an INV, an AND and an OR,
    but in CMOS those seem to be just 1 or 1.5 gate delays.

    A 4:1 Mux is a 2222AOI gate--one Ssllooww gate--but still 1 gate.

    One slow gate is generally faster than 2 gates (and less power)
    because only 1 signal has to move (Vdd->gnd or gnd->Vdd) instead
    of more than one. Each signal moving is limited by the
    transconductance of the FET stack and the capacitance being driven.
    We call the rise/fall time the edge speed.

    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    Almost always by deMorganizing the logic.

    I find this makes it more difficult to just look at a CMOS logic circuit
    and know whether it will fit within a 20 gate delay stage budget.

    If the gate delay count is less than 20, there is "some" sizing of
    those gates which will result in minimum delay.
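
    The staged-inverter figures quoted above can be checked with a
    back-of-the-envelope model. The sketch below assumes the simplest
    case: each stage's delay is proportional to its stage ratio f, and
    parasitic (self-load) delay is ignored, so a chain that builds up a
    fixed overall load ratio F needs ln(F)/ln(f) stages and the total
    delay scales as f/ln(f). The function name is hypothetical.

```python
import math

def relative_chain_delay(f):
    """Delay of a tapered inverter chain with stage ratio f, relative to f = e.

    Assumes delay per stage is proportional to f and ignores parasitic delay,
    so total delay for a fixed overall load ratio scales as f / ln(f).
    """
    return (f / math.log(f)) / math.e

if __name__ == "__main__":
    for f in (math.e, 3.0, 4.0):
        penalty = (relative_chain_delay(f) - 1.0) * 100.0
        print(f"stage ratio {f:.3f}: {penalty:.1f}% slower than ratio e")
```

    In this idealized model a ratio of 3 costs well under 1% and a
    ratio of 4 costs roughly 6%, the same ballpark as the figures
    quoted above. Once parasitic delay is included the optimum ratio
    moves above e, which is one reason FO4 is a reasonable practical
    choice.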
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Apr 18 19:23:47 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    The part I don't see is the rules for combinatorial gates.
    There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
    where multiple gates are combined in one but at a lower gate delay.

    For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
    because it really is an INV, an AND and an OR,
    but in CMOS those seem to be just 1 or 1.5 gate delays.

    There is the method of logical effort, see https://en.wikipedia.org/wiki/Logical_effort . I have not made
    much effort to do calculations using that method.
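
    As a rough sketch of what such a calculation looks like: in the
    logical-effort model each stage's delay is d = g*h + p (in units of
    tau), where g is the logical effort, h the electrical effort
    (fan-out) and p the parasitic delay. The g and p values below are
    the usual textbook figures for static CMOS with a 2:1 P/N width
    ratio; they are assumptions here, not numbers from any particular
    library, and the function names are made up for illustration.

```python
# Logical-effort delay estimate, d = g*h + p, normalized to an FO4 inverter.
# The (g, p) pairs are textbook values for static CMOS with a 2:1 P/N ratio;
# real cell libraries will differ.
GATES = {
    "inv":   (1.0,     1.0),
    "nand2": (4.0 / 3, 2.0),
    "nor2":  (5.0 / 3, 2.0),
    "nand3": (5.0 / 3, 3.0),
    "nor3":  (7.0 / 3, 3.0),
}

def stage_delay(gate, fanout):
    """Delay of one gate in units of tau, driving `fanout` copies of itself."""
    g, p = GATES[gate]
    return g * fanout + p

def in_fo4(gate, fanout=4):
    """The same delay expressed in FO4 inverter delays."""
    return stage_delay(gate, fanout) / stage_delay("inv", 4)

if __name__ == "__main__":
    for name in GATES:
        print(f"{name:6s} {in_fo4(name):.2f} FO4")
```

    Under these assumptions a three-input NOR at fan-out 4 comes out at
    roughly 2.5 inverter FO4 delays, which suggests that counting every
    gate as "1" is only a coarse approximation.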

    An alternative would be to use an actual library as an example.
    A company called Nangate released an open-source library (google
    for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
    on which delay calculations can be done, for example using
    Berkeley ABC. That program can also do optimizations (although it
    cannot handle gates with more than one output, such as full
    adders, and has weaknesses in stability). I haven't tried to
    model wire delays with this.

    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    AOI and friends also work in TTL, I believe.

    I find this makes it more difficult to just look at a CMOS logic circuit
    and know whether it will fit within a 20 gate delay stage budget.

    An interesting question :-)
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From BGB@cr88192@gmail.com to comp.arch on Sat Apr 18 15:05:10 2026
    From Newsgroup: comp.arch

    On 4/18/2026 3:56 AM, David Brown wrote:
    On 17/04/2026 19:37, BGB wrote:
    On 4/17/2026 8:01 AM, David Brown wrote:
    On 17/04/2026 14:11, Bill Findlay wrote:
    On 17 Apr 2026, David Brown wrote
    (in article <10rsik4$287kv$1@dont-email.me>):

    On 16/04/2026 20:15, BGB wrote:

    Could be wrong, this is from memory and stuff I heard on the
    internet.

    You should be a lot more careful about what you watch on the internet.
    Bravo! You put a lot more work into that than I could
    stomach in response to Fuck News talking points.
    Thank you.


    I've "known" BGB for many years on Usenet. He is very intelligent,
    but sometimes a bit too quick to trust the wrong sources. I hope I
    can encourage him to be more careful about what to trust and what not
    to trust. (And that includes not believing things just because /I/
    say so either - I can be wrong too.)

    Fun fact - I heard about the "ground.news" site I mentioned because
    they sponsored some Youtube videos I have watched :-)


    Not usually been one for fact checking, as my interest areas are still
    (mostly) technical (rather than political or legal), so whatever
    political leaning I would have mostly shouldn't matter.


    Fact checking is also important in technical fields!


    It is mostly information gathering and testing though.

    If one has a crazy idea, one can test it, and if it is a bad idea, it generally doesn't work.


    But while you might not be particularly interested in politics, politics
    and other aspects of societies and countries around the world still
    affect you. It is still good to have some rough ideas about what is
    going on in the world - and how to tell if something is true or not (or
    at least likely to be true or not).


    Can usually do searches, but if searches mostly seem to agree with an
    idea, or there is little to say that it is wrong, well...

    Could have maybe done searches before the posts that started all this,
    but alas.

    Was just like "AFAIK", since it was low confidence, "YOLO, I guess...".




    Sometimes, one can be lazy and ask AI models like Gemini or Grok or similar.

    Gemini's response was mostly in saying that the issue was mostly about
    people carrying things with sharp points rather than owning them, so say (gist):
    Pointed knife in kitchen: OK;
    Pointed knife in hand when walking around: Bad;
    Blunt tipped knife in hand while walking: OK;
    ...

    Apparently, maybe OK if the knife/scissors/etc having a cable/chain to
    the table to prevent someone from walking off with it (sorta like the
    pens chained to the tables in some public settings).

    Would have also tracked with the previous implication of people still
    needing blunt tipped scissors (say, if they needed something they could
    freely move from one location to another, ...).

    With I guess, exceptions for formal/ceremonial stuff, where people could
    still carry things with pointed tips (like swords or ceremonial daggers
    or such).

    I think, folding the question of solder into the mix, was also mention
    that one may need blunt tipped soldering irons to not risk running
    against the law.

    Well, and that there were mostly restrictions on lead/tin solder rather
    than against solder in general.

    ...

    Was, like, "OK..."


    Had noted that one UK based guy on YouTube ("Explaining Computers") had frequently used pointed scissors, was not sure if this was some sort of low-level crime, or if the scissors were "grandfathered in" or something
    (say, if one had them from whenever the restriction goes into effect, a
    person could still keep using it; sorta like the cars that had run on
    Leaded Gas...).

    Another YouTuber based in Australia ("Dave Jones") liked to open
    packages with an oversized knife (with pointed tip), and had a few times
    waved the knife around and made comments along the lines of "this is for
    the UK viewers", seemingly implying its status as a banned artifact.


    Though there were pictures of there being spikes on park benches and
    similar, but this wasn't entirely inconsistent if the restriction is
    more one of the points being mobile (since on a park bench, it doesn't
    move). People don't like there being spikes on park benches, which also follows, ...


    But, yeah, if this is not how it works, fair enough.
    Off hand, it was like, what information I had seen had pointed in a
    different direction.

    Didn't think to ask how far it extended, like whether or not such
    restrictions applied to the shape of the tines on forks, etc. Though,
    IME, most forks tend to have square-tipped tines (and one with
    sharp-tipped tines would be a needless safety hazard), so maybe this was
    N/A.

    ...



    Can note that where I am living, there are no such restrictions of this
    sort.

    There are different sorts of restrictions though, like someone not being allowed to carry guns or similar into places like schools and
    courthouses (and concealed carry requires a permit, ...).


    Likewise, until fairly recently, weed was illegal...

    Then it changed, and now the place is almost overrun with stores selling it.





    But, yeah, sometimes it gets a little sketchy if it is just someone
    rambling on top of gameplay footage, like Subway Surfers, or similar.
    Back in the day, we had Audiosurf, but seems that Subway Surfers has
    overtaken this role.

    Some people have also used games like COD or Minecraft, but watching
    someone play these games from a first person POV for any length of
    time causes motion sickness.

    At one point, people doing videos with talking over the top of clips
    of Skibidi Toilet was pretty popular (during the high point of Skibidi
    Toilet, but its popularity seems to have weakened as of late).


    The video of the guy ranting about CA taking away personal ownership
    of 3D printers and CNC machines was over the top of video of him
    working on one of his craft projects. Usually though, for stuff like
    this, there would be repetition from multiple sources (in a case of
    "one guy saying something, possibly all just nothing, many people
    saying the same thing, possibly true" sense).


    One particular politician is famous for prefacing many of his boldest
    and most absurd lies with "many people are saying...". Many people can
    say the same thing, and still be wrong. (Just look at religion around
    the world. Many people say one thing. Many people say something
    totally different. They can't all be right - many people are wrong.)

    "Proof by repeated assertion" is not a valid argument, whether it is one person saying something many times, or many people saying the same thing.


    Possibly...

    But, OTOH, if one person is saying something and no one else is saying
    it, there is a higher probability that that person had pulled something
    out of thin air.

    And, if something has a wide reaching impact (like, say, authorities
    taking everyone's 3D printers away), one would expect this to make a bit
    more noise on the internet.




    Sometimes there is a lot of stuff with people fighting over "Evolution
    vs Young Earth Creationism" and similar. Sometimes people arguing for
    the Earth being a flat disk (obviously wrong), ...


    I'm glad that you at least consider "flat Earth" to be obviously wrong.
    The same applies to any kind of "young earth" idea.


    some things don't hold up:
    Flat Earth:
    Pretty much all of the evidence is against it;
    Young Earth Creationism:
    Hard to reconcile with geology and physics;
    Depends on a particular interpretation of Genesis,
    but, easier to assert that this is not the correct interpretation.
    A lot of the "alien Conspiracy" stuff:
    Would require implausible levels of coordination to hide,
    if their claims were true;
    Has similar logical problems as "Flat Earth" and similar.
    A lot of the stories about various "cryptids"/etc:
    Alas, if bigfoot/yeti/etc were around,
    there would likely be more confirmed physical evidence.


    Like, people being like "hey, these comets are actually alien spacecraft
    and transmitting coded radio signals" (yeah, no). Like, the last two
    high profile comets that went by both getting people claiming they were
    alien spacecraft, etc.



    Ambiguous areas:
    Ghosts and similar;
    Unlike cryptids, a ghost would not leave physical evidence;
    There are explanations for why ghosts are usually no-show.
    Parapsychology stuff.
    One can assume it is rare, if it exists, rather than widespread;
    There are reasons to believe it fails whenever tested empirically;
    ...

    So, say, unlike on movies/TV, the "true" ESP'ers would be very rare, and
    with a skill-set that can appear or disappear depending on whether its
    existence would affect the causal outcomes of measurements (so whenever
    tested, it would behave as-if it doesn't exist so that causality holds).

    Though, a lot of this falls into the areas of "untestable".


    In my case, I have some anomalous sensory experiences, but most are
    likely explainable in terms of neurology and psychology rather than
    external reality. It can become very difficult to pin down internal
    subjective experiences to known external effects.


    But, otherwise it is similar to a question similar to, say, whether TV
    shows like "Star Trek" can be reconciled with known physics:
    Current answer leans towards no.


    Well, and if by some chance a "haunted location" starts dipping into
    "Scooby Doo" territory, and people go in to look at it objectively,
    chances are they will find a property owner or similar trying to pull a
    fast one. Well, and if you had settings like in the set-ups for various
    horror movies ("No one goes there, and if they do, they don't come
    back!"). Well, paranormal investigator types would be on that stuff, and
    if these people kept disappearing, this itself would likely draw attention.

    ...



    On the other hand:
    Politicians have done something stupid, and now everyone in a given area
    needs to do something stupid as a result to not be seen as a criminal.

    Is much more believable.
    The existence of stupid laws is much more well known and observable.



    All sorts of stuff going on...

    Sometimes does wander into international politics territory.




    Sort of reminded how for a while there were lots of YouTube sponsored
    segments for a company claiming to sell Lordship status...


    It is possible to argue that believing the nonsense you have heard about
    is harmless - though I would say making yourself look foolish can be
    considered "harm". But scams con people out of real money. Fair enough
    if it is a small amount of money, and clearly nonsense, bought as a joke
    - like buying insurance against alien kidnapping. Please be careful
    about any kinds of scams you come across.


    Would not likely go for something like this anyways, as whether or not
    it were true, some might call it "douchemaxing".


    Or, say, "You finna get tha drip going with that skibidi douchemax
    rizz?" or such ("Well, I think I just might.", proceeds to get an overly
    large purple and gold faux fur robe or similar).

    ...




    Otherwise, XG3 has started beating out both XG1 and RV64GC+Jx in some
    cases in terms of code density, and I am left to realize:
    I can't explain how.

    The deltas in instruction count seem to be lower than what could fully
    account for the absence of 16-bit instructions.


    So, say:
    XG3 vs RV64GC+Jx:
    XG3: ~ 11% fewer instructions
    RV64GC+Jx: ~ 40% of the instructions become 16 bit (saving ~ 20%).

    Assuming a simplified reference case of 75K instructions:
    Reference Case: 300K at 4B/Instr
    16/32 case : 240K
    11% fewer case: 267K

    Projected winner: 16/32 (or, in this case, RV64GC).
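
    As a rough sketch of the arithmetic above (the function name, and the
    simplification that every instruction is either 2 or 4 bytes, are
    assumptions for illustration only, not anything from the actual
    toolchain):

```python
# Back-of-envelope code-size model using the figures quoted above.
# Hypothetical helper: every instruction is 4 bytes unless it falls in
# the fraction that can be encoded in 16 bits (2 bytes).

def code_bytes(instr_count, frac_16bit=0.0, instr_reduction=0.0):
    """Estimate total code bytes for a simplified 16/32-bit encoding model."""
    n = instr_count * (1.0 - instr_reduction)
    return n * (frac_16bit * 2 + (1.0 - frac_16bit) * 4)

reference = code_bytes(75_000)                      # all 32-bit encodings
rv64gc = code_bytes(75_000, frac_16bit=0.40)        # ~40% become 16-bit
xg3 = code_bytes(75_000, instr_reduction=0.11)      # ~11% fewer instructions

print(round(reference), round(rv64gc), round(xg3))
```

    This only reproduces the projection; it says nothing about why the
    measured results differ from it.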

    Actual model is more complicated in that this case is 16/32/64 vs 32/64/96.


    Though, at present, this quirk has mostly appeared with Heretic and
    ROTT. But, the actual stats don't appear much different, and it seems
    like "back of the envelope" RV64GC would still be expected to hold the lead.


    Still requires more analysis it seems.

    ...

  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Apr 18 21:02:58 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 4/18/2026 3:56 AM, David Brown wrote:
    -----------------------
    "Proof by repeated assertion" is not a valid argument, whether it is one person saying something many times, or many people saying the same thing.


    Possibly...

    But, OTOH, if one person is saying something and no one else is saying
    it, there is a higher probability that that person had pulled something
    out of thin air.

    This coming from the one person here who does not subscribe to the
    minutia of IEEE 754 while accepting the formats and arithmetic
    definitions.
    --------------

    some things don't hold up:
      Flat Earth:
        Pretty much all of the evidence is against it;
      Young Earth Creationism:
        Hard to reconcile with geology and physics;
        Depends on a particular interpretation of Genesis,
          but, easier to assert that this is not the correct interpretation.
      A lot of the "alien Conspiracy" stuff:
        Would require implausible levels of coordination to hide,
          if their claims were true;
        Has similar logical problems as "Flat Earth" and similar.
      A lot of the stories about various "cryptids"/etc:
        Alas, if bigfoot/yeti/etc were around,
          there would likely be more confirmed physical evidence.
      deNorms are not needed in well written FP arithmetic
      Flush to Zero is a perfectly acceptable tool
    ...

  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Apr 19 01:08:23 2026
    From Newsgroup: comp.arch

    On Sat, 18 Apr 2026 21:02:58 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/18/2026 3:56 AM, David Brown wrote:
    -----------------------
    "Proof by repeated assertion" is not a valid argument, whether it
    is one person saying something many times, or many people saying
    the same thing.

    Possibly...

    But, OTOH, if one person is saying something and no one else is
    saying it, there is a higher probability that that person had
    pulled something out of thin air.

    This coming from the one person here who does not subscribe to the
    minutia of IEEE 754 while accepting the formats and arithmetic
    definitions.

    I don't know what exactly you mean by "subscribe" and "minutia", but
    if you mean what I am guessing you mean then my own position is pretty
    close to that.
    More precisely, I think IEEE 754 formats and definitions of basic ops
    are mostly great, with exception of omission of very useful rsqrt
    primitive.
    I think that 90% of IEEE 754 exceptions are useless crap and in one or
    two places it's worse than that.
    I think that effort-to-reward ratio of non-default rounding modes is
    pretty low and the choice of mandatory non-default modes is sub-optimal.
    I think that common practice of FP control and FP status shared
    across different supported precisions is wrong. Although, in this case
    it's more a fault of language bindings of 754 rather than of 754 itself.
    But 754 leaves the issue underspecified which is also no good.

    So, if BGB shares my view then there are already two of us.



  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 00:49:51 2026
    From Newsgroup: comp.arch

    In article <lVeER.284152$4wI6.209127@fx24.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <NdaER.1511$r_k6.609@fx38.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    <snip>

    [*] recently transferred to the CH-53E fleet due to the imminent
    retirement of the F-16C fleet.

    The Marine Corps doesn't fly the F-16. :-) Perhaps you mean
    the F/A-18 or the Harrier?

    Brain fart. His squadron flies the F/A-18C and D models. The
    E's and F's will remain in the active fleet along with the F-35,
    but the final days of the C and D models are in sight.

    No problem; I can see wanting to switch over to helos from fixed
    wing. It's a different world.

    His eventual goal is to get his A&P. Figures helos will be good
    experience.

    Nice. I'm sure he's absolutely correct that it's good
    experience for getting into the airlines once he's out of the
    Corps.

    I got to visit the flight line in 2024, very interesting.

    Cool.

    Prior visit to a Marine base was at 29 Palms in the 1980s, visiting
    a cousin. He was living on-base in married housing and told
    me to avoid the well-lit compound several miles east of the housing area, which
    was secured and managed by the NOP.

    Ah, the stumps. I remember the first time I got there, stepping
    off a bus (we'd just flown from NC, having completed post Parris
    Island training at Camp Lejeune) and immediately seeing tumble
    weed blowing down the main drag. "Oh my god; I have to stay
    here for a YEAR?!" (This was before I became an officer.)

    I wonder which part your cousin meant; perhaps Camp Wilson,
    which is an active training area (and pretty much nothing else,
    though there is a very small PX there selling pogey bait).


    I just looked on google maps and they've pretty much censored
    the entire base in both the map and satellite views.

    Probably out of embarrassment...one of the most prominent
    features of 29 Palms is "Lake Bandini": the waste-water treatment
    facility at the bottom of the hill that the base is on. (The
    old joke goes, "Don't eat the fish out of Lake Bandini"). :-D

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 01:02:36 2026
    From Newsgroup: comp.arch

    In article <10rsktu$287kv$2@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 16/04/2026 18:31, Dan Cross wrote:
    In article <10rqrkf$1nbrp$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 16/04/2026 13:59, Dan Cross wrote:
    In article <10rqag7$1in0h$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    [snip]
    (The service members have all their other equipment at
    home too - they store their own uniforms and other stuff, and are
    responsible for washing and repairs or ordering replacements after the
    exercises.)

    The same is true in the US military. I always found it annoying
    that I had to make space for my issued equipment at home, but
    c'est la vie.

    Personally, I have no military experience at all (not counting school
    cadets). But one of my sons joined the National Guard after his
    military service. He has a neat solution to problem of space for his
    equipment - he keeps it in his old bedroom in our house, not in his own
    flat :-)

    Seems like your son should take his obligations a bit more
    seriously: keeping his equipment at your house, when it's
    supposed to be in his dwelling, sounds like a rules violation.


    I don't know the details of the regulations, but I can assure you that
    it is entirely within them. As a student, his flat is considered
    temporary accommodation and our house is his permanent home. His
    National Guard base is in our area, not where he is a student.

    Maybe the regulations in the USA are different. Maybe there are
    different standards about how quickly you can be called up and need to
    deploy.

    Misrepresenting your home address, such as using your parents'
    home when you don't actually live there, temporary accommodation
    or not, is not something the US military looks upon favorably.

    What's good for the goose is good for the gander.

    It also means that there are fewer guns stored in
    concentrated places as potential targets for robberies. And in the
    event of a real invasion, it's a lot easier to smuggle around and
    distribute firing pins to service members than to pass around guns from
    a central armoury.

    There are tradeoffs, however: it is also easier for a bad actor
    to source a weapon by breaking into a private home.

    It is plausible, but AFAIK it is extremely rare here. The solid
    majority of criminals here don't want guns, and would probably not take
    one if they found one in a house they were robbing. As a house burglar,
    you can't easily sell a gun, you don't need one for defence, you don't
    need one to threaten anyone - it just increases your chance of being
    shot yourself, and increases your punishment if you get caught. There are
    guns in the more serious narcotics gangs, but those are handguns - they
    have no use for military weapons.

    FWIW, handguns are military weapons.

    Sorry - I meant pistols, rather than rifle-sized weapons. (Of course
    pistols are also used in the military.)

    The "9mm handgun" I referred to is a pistol. It's the standard
    issue sidearm for officers in the US Marine Corps.

    [snip]
    Of course, there is a continuum of force, and despite recent
    idiots in charge of the US military asserting otherwise, the
    rules of engagement and the laws of warfare are taken _very_
    seriously.

    It's nice to know - especially with the current muppets at the top of
    the USA chain.

    Yes, they are horrible.

    [snip]
    If shots need to be fired, the primary aim should
    be to persuade the bad guys to surrender, not to kill them.

    Is it better if the enemy surrenders? Sure. But this idea that
    you are going to shoot to wound in a combat scenario is not
    realistic. As you said, it's not like in the movies.

    I think that there has been a bit of a disparity in the situations we
    have been imagining. I agree entirely that shooting to wound in a
    combat situation is not realistic. It just seemed to me that you were
    suggesting armoury guards were moving to combat mode a lot more quickly
    than I thought appropriate.

    [snip]

    Just to be clear here - mutual inspections are fine and often a good
    thing. It is the idea of standing in line while a commander of some
    sort pats you down that is not. Maybe I just misunderstood what you
    were saying.

    Some of your comments, such as repeatedly asserting the
    ridiculous notion that people are "bribing" armorers so they can
    avoid "doing their jobs" lead me to wonder whether you are
    deliberately misrepresenting what I am saying so that you can
    feel yourself morally superior to an American.

    I have, I think, been patient with my responses, but in this
    and my previous message, my patience is slipping.

    - Dan C.

  • From Robert Finch@robfi680@gmail.com to comp.arch on Sat Apr 18 21:04:02 2026
    From Newsgroup: comp.arch

    On 2026-04-18 6:08 p.m., Michael S wrote:
    On Sat, 18 Apr 2026 21:02:58 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/18/2026 3:56 AM, David Brown wrote:
    -----------------------
    "Proof by repeated assertion" is not a valid argument, whether it
    is one person saying something many times, or many people saying
    the same thing.

    Possibly...

    But, OTOH, if one person is saying something and no one else is
    saying it, there is a higher probability that that person had
    pulled something out of thin air.

    This coming from the one person here who does not subscribe to the
    minutia of IEEE 754 while accepting the formats and arithmetic
    definitions.

    I don't know what exactly you mean by "subscribe" and "minutia", but
    if you mean what I am guessing you mean then my own position is pretty
    close to that.
    More precisely, I think IEEE 754 formats and definitions of basic ops
    are mostly great, with exception of omission of very useful rsqrt
    primitive.
    I think that 90% of IEEE 754 exceptions are useless crap and in one or
    two places it's worse than that.
    I think that effort-to-reward ratio of non-default rounding modes is
    pretty low and the choice of mandatory non-default modes is sub-optimal.
    I think that common practice of FP control and FP status shared
    across different supported precisions is wrong. Although, in this case
    it's more a fault of language bindings of 754 rather than of 754 itself.
    But 754 leaves the issue underspecified which is also no good.

    So, if BGB shares my view then there are already two of us.



    Lack of knowledge of the minutia may be preventing a decent
    implementation of the standard for my CPU. For example, the CPU has a
    number of floating-point ops that do not update the status register:
    sign-inject, compares, convert from lower to higher precision and
    others. I am going by common sense: if an instruction cannot raise an
    exception, then the flags are not affected. But common sense does not
    always prevail.

    The IEEE 754 standard seems somewhat inaccessible to me. How do I join
    IEEE if I am not a student or a professional engineer? Otherwise, how do
    I get access to IEEE docs?

    Thus the EIII 457 standard was born...

  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.arch on Sat Apr 18 20:51:49 2026
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> writes:

    According to quadi <quadibloc@ca.invalid>:

    On Wed, 15 Apr 2026 16:44:04 +0000, John Levine wrote:

    Then in the 1960s some well organized revisionists ignored what
    it says, pretended it meant an individual right to have guns
    everywhere, and managed to find a majority of right wing supreme
    court justices willing to sign on.

    I'm afraid that I can't agree with you on this. ...

    Of course, it's possible subordinate clauses were used differently
    back in the eighteenth century, but I'd need evidence to buy into
    that theory.

    The evidence is that for over 150 years, everyone agreed that it meant
    state militias. There were two Supreme Court decisions in 1876 and
    1886 that confirmed the rights of states to regulate militias, one in
    1939 saying that a sawed off shotgun wasn't the kind of arm that the
    2nd was intended to protect, and one in 1980 confirming that it was OK
    for states to forbid convicted felons from owning guns.

    I'm not aware of anyone claiming it was an individual right that the
    states could not regulate until the 1960 revisionists, and no court
    decision until Heller in 2008 which reversed the previous century and
    a half's precedent. Heller was decided 5-4, over strong dissents.

    Has anyone seen comp.arch around here somewhere? I seem to have
    wandered into rec.guns.
  • From BGB@cr88192@gmail.com to comp.arch on Sun Apr 19 03:54:40 2026
    From Newsgroup: comp.arch

    On 4/18/2026 5:08 PM, Michael S wrote:
    On Sat, 18 Apr 2026 21:02:58 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/18/2026 3:56 AM, David Brown wrote:
    -----------------------
    "Proof by repeated assertion" is not a valid argument, whether it
    is one person saying something many times, or many people saying
    the same thing.

    Possibly...

    But, OTOH, if one person is saying something and no one else is
    saying it, there is a higher probability that that person had
    pulled something out of thin air.

    This coming from the one person here who does not subscribe to the
    minutia of IEEE 754 while accepting the formats and arithmetic
    definitions.

    I don't know what exactly you mean by "subscribe" and "minutia", but
    if you mean what I am guessing you mean then my own position is pretty
    close to that.

    Yeah, and also disagreeing with some details of IEEE 754 semantics is
    not in the same category as belief in things like "Flat Earth" or "Alien Abductions" or similar...


    I suspect I am far from the only person that has asserted that DAZ/FTZ
    is often preferable for optimizing an FPU for implementation cost (or
    that an FPU optimized primarily for speed and logic cost can still be
    useful).

    Though, my stance on these things has softened when I have noted that
    the full version can be supported in software without "too horrible"
    impact to performance on a naive implementation. This mainly requires
    the FPU to consistently have a way to detect and trap in cases where
    full semantics are requested but could not be delivered natively.

    Granted, it is tradeoffs, and there are corner cases where things could
    get ugly if not handled well.


    More precisely, I think IEEE 754 formats and definitions of basic ops
    are mostly great, with exception of omission of very useful rsqrt
    primitive.

    Yeah, rsqrt can be useful.

    Say:
    1.0/sqrt(x)
    Being one of the more common use cases of sqrt, and combining them into
    a single operation could give a significant speedup over two operations
    on an implementation where neither operation, in itself, is all that fast.

    Personally, I would likely add Ssqrt() and Ssqr(), for signed square
    root and signed square.


    But, most of these operations can be defined in terms of the others, and
    some might consider regarding ssqrt and ssqr as the more fundamental
    operators to be unorthodox (and doesn't jive well if one considers the
    real number line as being embedded within the complex plane).

    But, for an FPU, the whole thing exists within an approximation of the
    real number line (and the complex plane effectively doesn't exist).

    Granted, nevermind if some algebraic rules (such as the distributive
    property) would start to break down in this system when these operators
    are involved. For sanity sake, would assume ssqr(x) to be distinct from
    x*x, but ssqr(ssqrt(x)) => x and with a domain of all reals, ...

    Likewise:
    1.0/ssqrt(x)
    Remains well-defined for everything other than x ~= 0.
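
    A minimal sketch of these operators, assuming ssqrt(x) means
    sign(x)*sqrt(|x|) and ssqr(x) means sign(x)*x*x (my reading of the
    description above; the Python names are hypothetical and are not the
    ISA's mnemonics):

```python
import math

def rsqrt(x):
    """Reciprocal square root, written as the two-operation form 1.0/sqrt(x)."""
    return 1.0 / math.sqrt(x)

def ssqrt(x):
    """Signed square root: square root of the magnitude, with the sign of x."""
    return math.copysign(math.sqrt(abs(x)), x)

def ssqr(x):
    """Signed square: square of the magnitude, with the sign of x."""
    return math.copysign(x * x, x)

print(ssqrt(-9.0), ssqr(-3.0), ssqr(ssqrt(-16.0)), 1.0 / ssqrt(-4.0))
```

    With these definitions ssqr() differs from x*x for negative inputs,
    and the ssqr(ssqrt(x)) round trip holds for the exact squares used here.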


    In my case, there are a few quick dirty ops as well:
    FSQRTA: Square Root, Approximate
    FRCPA: Reciprocal Approximate


    I think that 90% of IEEE 754 exceptions are useless crap and in one or
    two places it's worse than that.
    I think that effort-to-reward ratio of non-default rounding modes is
    pretty low and the choice of mandatory non-default modes is sub-optimal.
    I think that common practice of FP control and FP status shared
    across different supported precisions is wrong. Although, in this case
    it's more a fault of language bindings of 754 rather than of 754 itself.
    But 754 leaves the issue underspecified which is also no good.


    Shared FPU status is also a complaint of mine.

    Though, sadly, if one wants to be able to pull off full IEEE math, it is
    to some extent unavoidable.



    I would personally prefer it if there were some way to specify rounding
    mode at the language-level when needed (likely along with a way to
    specify whether it is preferable to have subnormals and similar, or
    whether DAZ/FTZ is acceptable).

    I generally assume RNE as a sensible default mode for fixed-mode
    instructions, with RTZ as a good second place. The other rounding modes
    are more niche and would generally be best used in special cases.

    Using a dynamic rounding mode held in an FPU status register effectively
    makes every use-case worse here, because nearly everywhere else RNE is
    the best option, except for the one-off operations where one wants
    something else.


    Although RTZ is the cheapest to implement case, IMO it would not be
    acceptable as a default fixed rounding mode because it tends to result
    in a drift-towards-zero which can become very obvious (and if one tries
    to compensate for the drift-towards-zero, then it just as easily becomes
    a drift away from 0). Better to have RNE as default because it doesn't
    result in values drifting over time.
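
    A rough illustration of the bias behind that drift. This is a sketch
    only: it rounds to an integer grid as a stand-in for rounding to FP
    precision, and the sample range and seed are arbitrary assumptions:

```python
import math
import random

def mean_rounding_error(round_fn, samples):
    """Average of (rounded - exact) over the samples; ~0 means unbiased."""
    return sum(round_fn(x) - x for x in samples) / len(samples)

rng = random.Random(12345)                      # fixed seed, arbitrary choice
samples = [rng.uniform(0.0, 100.0) for _ in range(10_000)]

rtz_bias = mean_rounding_error(math.trunc, samples)  # round toward zero
rne_bias = mean_rounding_error(round, samples)       # Python round() is half-to-even

print(f"RTZ mean error: {rtz_bias:+.4f}")
print(f"RNE mean error: {rne_bias:+.4f}")
```

    The RTZ error is consistently toward zero (about half a grid step on
    average), while the RNE errors largely cancel out.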

    Well, except when converting to an integer, where generally everything
    expects RTZ and anything other than RTZ is likely to be a crap storm,
    even if for many use-cases, rounding towards negative infinity would
    likely be better for this case since it keeps the number line on an evenly-spaced grid with respect to 0.
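
    A small sketch of the difference for float-to-integer conversion (the
    sample values are arbitrary):

```python
import math

xs = [-1.5, -0.5, 0.5, 1.5]

# Round toward zero: both -0.5 and 0.5 land on 0, so the bucket around
# zero is twice as wide as every other bucket.
truncated = [math.trunc(x) for x in xs]

# Round toward negative infinity: every bucket has the same width.
floored = [math.floor(x) for x in xs]

print(truncated)
print(floored)
```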


    The FPU status register thing turns into a bigger mess if one is trying
    to apply it to things like SIMD and similar. For now, I am ignoring this
    (SIMD is being treated as inherently non-IEEE in this area).


    Sadly, short of the ugliness of dropping a bunch of attribute modifiers
    on C, there is no good way to retrofit this. And, using the existing
    mechanisms would require making use of a dynamic rounding mode.


    One thought though is, say:
      Lax FP and fenv_access=false:
        Use quick and dirty ops with a fixed rounding mode;
      Lax FP and fenv_access=true:
        Use dynamic rounding mode;
        Here, IEEE emulation is disabled by default.
      Strict FP (set as a compiler flag):
        Use operations which use dynamic rounding mode;
        Set IEEE emulation to true on init.

    In my ISA, there were different instructions for these cases:
    FADD/FSUB/FMUL: Fixed to RNE and DAZ/FTZ, no flags updates.
    FADDA/FSUBA/FMULA: May also use reduced precision.
    FADDG/FSUBG/FMULG: Uses dynamic rounding mode;
    The Imm5fp/Imm6fp encodings also use dynamic rounding;
    These operations will update the flags.

    Currently, the FDIV, FSQRT, and FMAC ops, which are optional, will also
    be assumed to use dynamic rounding mode rules.

    At present, HW FMAC may exist, but is Double-Rounded when in DAZ/FTZ
    mode. Setting IEEE Mode may enable single-rounded FMAC via trap-and-emulate.


    There was an issue of where to put the FPU status bits in my case:
    For a while, had put it in the high bits of GBR/GP, but this was a
    problem. Reloading this register would tend to stomp the status; doing
    a save/reload, with the FPU status copied from the old version, would
    introduce dynamic scoping rules. While this worked for a while, it is
    incompatible with the rules for fenv_access in C, which assume a global
    state and not a dynamically-scoped state.



    So, if BGB shares my view then there are already two of us.


    Did see someone on the RISC-V sig-fp mailing list also favoring DAZ/FTZ and
    similar for floating point.

    So, it isn't exactly unheard of...



    Otherwise, still looking at the "OK, why is XG3 now somehow beating
    RV64GC+Jx on code density?" mystery. At present, I still lack a solid explanation for this one.

    Well, unless it is maybe something like 32-bit:
    MOV.L (GP, Disp16u*4), Rd //32 bit (XG3)
    vs:
    J21I + LW Xd, Disp12s(GP) //64 bit (JX)
    LUI + ADD(16b) + LW Xd, Disp12s(Xt) //80 bit (GC)
    LUI + ADD + LW Xd, Disp12s(Xt) //96 bit (G)
    ...

    This was a factor before, but on its own is nothing new.
    Wouldn't expect this to be carried by just having a slightly more
    compact encoding to load/store global variables though.

    Still looking at it, trying to figure out.



    Would be ironic though if using a 32/64/96 bit encoding scheme just so happened to end up as being the most compact option though...


    Then again, recently someone was saying that SuperH halved the binary
    sizes vs its fixed-length 32-bit RISC predecessors, which wasn't
    really true, as the limitations of the ISA typically meant needing 60%
    more instructions for similar work.

    Where:
    1.6 * 2 => 3.2 (SH-4 scenario)
    0.8 * 4 => 3.2 (RV-C scenario)
    So, roughly break-even with a 16/32 in terms of code size, except the
    16/32 approach is going to be faster.

    Well, unless one did a RISC with 32 bit encodings but only 2R
    instruction encodings, but this would suck...

    Current models imply I am looking at (for XG3):
    0.9 * 4 => 3.6 (~ 10% fewer instructions)

    Where, 3.6 > 3.2 ...

    But, this may need another fudge factor for the percentage of
    instructions in RV+JX that need jumbo prefixes.


    So: from a program of ~ 99k instructions, 3.2k prefixes.
    ... No, this doesn't cover it.

    Checks, RV-C: 19k instructions.
    This is around 19% of instructions, and 19% is less than 40%.

    Recalc:
    0.9 * 4 * 1.03 => 3.7
    And, 3.6 < 3.7, so XG3 wins...

    As, it would appear in this case, RV-C is under-performing the estimate.
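
    The recalculation above can be sketched as follows. This assumes the
    1.03 factor is the jumbo-prefix overhead (3.2k prefixes over 99k
    instructions, each prefix costing one extra 32-bit word) and that a
    16-bit instruction saves exactly half of a 32-bit one; the helper name
    is made up for illustration:

```python
def size_factor(frac_16bit):
    """Relative code size when a fraction of 32-bit instructions become 16-bit."""
    return 1.0 - frac_16bit / 2

total_instrs = 99_000
rvc_instrs = 19_000        # instructions that actually got 16-bit encodings
jumbo_prefixes = 3_200     # extra prefix words in RV+Jx (assumed 32 bits each)

# Average bytes per reference instruction under each scenario.
rvc_projected = size_factor(0.40) * 4
rvc_observed = (size_factor(rvc_instrs / total_instrs) * 4
                * (1 + jumbo_prefixes / total_instrs))
xg3 = 0.9 * 4              # ~10% fewer instructions, all 32-bit

print(f"RV-C projected: {rvc_projected:.2f}")
print(f"RV-C observed:  {rvc_observed:.2f}")
print(f"XG3:            {xg3:.2f}")
```

    Under these assumptions the projected RV-C figure beats XG3, while the
    observed one does not, which matches the conclusion above.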

    ...

  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Apr 19 12:28:10 2026
    From Newsgroup: comp.arch

    On Sun, 19 Apr 2026 03:54:40 -0500
    BGB <cr88192@gmail.com> wrote:


    Did see someone on the RISC-V sig-fp mailing list also favoring DAZ/FTZ
    and similar for floating point.

    So, it isn't exactly unheard of...


    As I said above, I think that IEEE 754 definitions for basic ops are
    great.
    I strongly oppose "flush subnormals to zero".

    I am not sure what DAZ is. Does it abbreviate "De-normals are zero"?
    I.e. subnormals are not only never produced as the result of arithmetic
    ops but also silently converted to zero when taken as input?
    If that is what DAZ means then I oppose it even more strongly than FTZ.
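
    For reference, the usual reading is that DAZ ("denormals are zero")
    applies to inputs and FTZ ("flush to zero") applies to results. A
    minimal software model of the two, assuming binary64 and using made-up
    helper names (this emulates the behaviour; it does not change any
    hardware mode):

```python
import math
import sys

MIN_NORMAL = sys.float_info.min   # smallest positive normal binary64 value

def is_subnormal(x):
    """True for nonzero values below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < MIN_NORMAL

def daz(x):
    """DAZ: a subnormal input is read as a zero of the same sign."""
    return math.copysign(0.0, x) if is_subnormal(x) else x

def ftz(x):
    """FTZ: a subnormal result is replaced by a zero of the same sign."""
    return math.copysign(0.0, x) if is_subnormal(x) else x

def mul_daz_ftz(a, b):
    """Multiply with DAZ applied to both inputs and FTZ applied to the result."""
    return ftz(daz(a) * daz(b))

tiny = MIN_NORMAL / 4             # a subnormal value
print(tiny * 4.0 == MIN_NORMAL)   # full IEEE behaviour keeps the value
print(mul_daz_ftz(tiny, 4.0))     # DAZ discards the subnormal input
print(mul_daz_ftz(MIN_NORMAL, 0.5))  # FTZ discards the subnormal result
```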



  • From David Brown@david.brown@hesbynett.no to comp.arch on Sun Apr 19 12:56:52 2026
    From Newsgroup: comp.arch

    On 19/04/2026 03:02, Dan Cross wrote:

    Some of your comments, such as repeatedly asserting the
    ridiculous notion that people are "bribing" armorers so they can
    avoid "doing their jobs" lead me to wonder whether you are
    deliberately misrepresenting what I am saying so that you can
    feel yourself morally superior to an American.


    No, that was not my intention at all - not remotely. I had interpreted
    your original comments as suggesting that you could bribe the armourer
    to ignore that you had not done your duty properly, and for an "under
    the table" fee he would cover up for you. That is very different from
    paying him for a service, which is how I now understand it after your follow-up comments.

    This misinterpretation may well have come at least partly due to
    cultural differences between America and Norway. (A key difference is
    tipping culture - paying cash tips is standard in a wide range of circumstances in the USA, but very rare in Norway. In the USA, a
    similar concept could perhaps extend to paying for a service from an
    armourer. In Norway, paying cash directly to the armourer would
    definitely be highly suspicious). But those are differences, not superiorities in either direction.

    Of course I can prefer the way things work here, but preferences are not objective. Even when we can look at comparisons of factual, objective measures between societies (say, the rates of gun deaths in different countries), these should not be used without understanding a wider
    context. And they are not at all personal - an average Norwegian is statistically less likely to be shot than an average American, but it is
    not because of anything /I/ do or anything /you/ do. There is no
    question of "moral superiority" involved.

    I have, I think, been patient with my responses, but in this
    and my previous message, my patience is slipping.


    It is best to close off this thread branch. I have found your posts on
    the topic informative and they have given me a better understanding of
    some things, especially some aspects of military practice in the USA,
    and they have corrected some of my misunderstandings.



  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 13:35:17 2026
    From Newsgroup: comp.arch

    In article <10s2cdk$3t2hf$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 19/04/2026 03:02, Dan Cross wrote:
    [snip]
    I have, I think, been patient with my responses, but in this
    and my previous message, my patience is slipping.

    It is best to close off this thread branch. [...]

    Very well. I'll accept your words on their face, and as you and
    others have pointed out (apologies to John Levine and Tim
    Rentsch) this is all wildly off-topic. Not uncommon in
    comp.arch, I'm afraid.

    Mitch's recent post about gate delays was far more interesting
    to me than anything related to the military.

    - Dan C.

  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Apr 19 17:50:16 2026
    From Newsgroup: comp.arch

    Dan Cross wrote:
    In article <10rsktu$287kv$2@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    I don't know the details of the regulations, but I can assure you that
    it is entirely within them. As a student, his flat is considered
    temporary accommodation and our house is his permanent home. His
    National Guard base is in our area, not where he is a student.

    Maybe the regulations in the USA are different. Maybe there are
    different standards about how quickly you can be called up and need to
    deploy.

    Misrepresenting your home address, such as using your parents'
    home when you don't actually live there, temporary accommodation
    or not, is not something the US military looks upon favorably.

    Didn't you read his comment? Here in Norway, any student's home address
    is considered to be wherever she/he lived before starting to study.

    The only exception is if you, like I did, actually buy a flat/apartment
    near the university, at that point this was considered my primary residence.

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 18:18:12 2026
    From Newsgroup: comp.arch


    Tim Rentsch <tr.17687@z991.linuxsc.com> posted:

    John Levine <johnl@taugh.com> writes:

    According to quadi <quadibloc@ca.invalid>:

    On Wed, 15 Apr 2026 16:44:04 +0000, John Levine wrote:

    Then in the 1960s some well organized revisionists ignored what
    it says, pretended it meant an individual right to have guns
    everywhere, and managed to find a majority of right wing supreme
    court justices willing to sign on.

    I'm afraid that I can't agree with you on this. ...

    Of course, it's possible subordinate clauses were used differently
    back in the eighteenth century, but I'd need evidence to buy into
    that theory.

    The evidence is that for over 150 years, everyone agreed that it meant state militias. There were two Supreme Court decisions in 1876 and
    1886 that confirmed the rights of states to regulate militias, one in
    1939 saying that a sawed off shotgun wasn't the kind of arm that the
    2nd was intended to protect, and one in 1980 confirming that it was OK
    for states to forbid convicted felons from owning guns.

    I'm not aware of anyone claiming it was an individual right that the
    states could not regulate until the 1960 revisionists, and no court decision until Heller in 2008 which reversed the previous century and
    a half's precedent. Heller was decided 5-4, over strong dissents.

    Has anyone seen comp.arch around here somewhere? I seem to have
    wandered into rec.guns.

    Hiding behind cover in the back of the room...
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Apr 19 14:37:49 2026
    From Newsgroup: comp.arch

    On 2026-Apr-18 13:59, MitchAlsup wrote:

    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-Apr-17 21:11, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    John Levine <johnl@taugh.com> schrieb:

    But let's get back to gate delays, please.

    Let's.

    How do people actually count gate delays, and how useful is it?
    Different gates have different delays (obviously), so counting
    an inverter the same as a three-input NOR gate (independent of
    fan-out, even) seems to be a large simplification which may be
    useful for a fairly rough approximation, but not that much better.

    Or am I missing something?

    There is the "standard" FO4 counting scheme where 1 gate drives
    4 other gate inputs, and in this scheme, a D-type flip-flop was
    2.5 gates of delay.

    As one can guess, gates can be sized: as small as obeys the FAB
    design rules, to <basically> as big as one can afford. Naturally,
    as gates get bigger, they can drive bigger loads--BUT they also
    present bigger loads to the gates driving them.

    Conway figured out that the fastest way to "buffer up" a signal
    was to use inverters staged in the ratio of 1:e:e^2:e^3... with
    e being the standard 2.7... base of natural logarithms. Rounding
    e up to 3 degrades speed by less than 1%, rounding up to 4 only
    slows down 10% or so--so, most buffering is done at FO4, where
    a minimum sized gate drives an inverter 4× as big which would then
    drive another inverter 16× as big...
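
    A rough sketch of the stage-ratio argument above, assuming the simplest
    possible model: each stage's delay is proportional to its fan-out f, and
    driving a fixed total load takes ln(F)/ln(f) stages, so total delay scales
    as f/ln(f). Self-loading (parasitic) delay is deliberately ignored, and
    the function name is made up for illustration.

```python
import math

def relative_buffer_delay(stage_ratio: float) -> float:
    """Delay of a tapered inverter chain with the given stage ratio,
    relative to the ideal ratio e, in a model without self-loading.

    Driving a total capacitance ratio F takes ln(F)/ln(f) stages, each
    with delay proportional to f, so delay ~ f / ln(f) (times ln F).
    """
    return (stage_ratio / math.log(stage_ratio)) / math.e

for ratio in (math.e, 3.0, 4.0):
    penalty = (relative_buffer_delay(ratio) - 1.0) * 100.0
    print(f"stage ratio {ratio:.3f}: {penalty:.1f}% slower than ratio e")
```

    In this idealized model a ratio of 3 costs well under 1% and a ratio of 4
    costs roughly 6%; real parasitics, which the model leaves out, shift both
    the optimum and the penalties.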

    Each transistor between a power connection and the signal connection
    basically adds its own transconductance to the electrical path
    (ignoring body effect). Knowing that deMorgan's laws apply, we
    instantly see that a Nand gate is simply a Nor gate with inverted
    inputs. A Nand gate has its serial string of FETs between signal
    and ground and a parallel path from signal to Vdd, while a Nor has
    its serial FETs between signal and Vdd and its parallel path between
    signal and ground. To deal with these serial paths, the transistors
    are lengthened; a 2Nand has N-channels 2× as wide and can use 1×
    P channels, a 3Nand has 3× N-channels and still 1× P-channels, ...
    {Nors are similar but reverse Ns and Ps} Somewhere along the line,
    the parallel path FETs have to get lengthened because of all the
    serial path diffusion capacitance (to maintain rather equal pull up
    and pull down).

    Soon, one realizes that one needs SPICE simulation with accurate
    models to push the edge--just like when pushing the Young's Modulus
    in engineering models.

    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    The part I don't see is the rules for combinatorial gates.
    There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
    where multiple gates are combined in one but at a lower gate delay.

    For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
    because it really is an INV, an AND and a OR,
    but in CMOS those seem to be just 1 or 1.5 gate delays.

    A 4:1 Mux is a 2222AOI gate--one Ssllooww gate--but still 1 gate.

    One slow gate is generally faster than 2 gates (and less power)
    because only 1 signal has to move (Vdd->gnd or gnd->Vdd) instead
    of more than one. Each signal moving is limited by the transcon-
    ductance of the FET stack and the capacitance being driven. We
    call the rise/fall time the edge speed.

    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    Almost always by deMorganizing the logic.

    I'm referring to where you merge gates together.
    For example, an XOR is (A nand (not B)) nand ((not A) nand B)
    which is 2 INV (4T) and 3 NAND gates (12T) with a total of
    16 transistors and a delay of 3.
    The 3 NAND can merge into a single gate of 8 transistors
    and a total of 12T and a delay of 2.

    Presumably this merging of gates can continue to some point
    but what that point is isn't clear to me.
    That makes it difficult to look at a logic diagram and
    know how many gates are going to merge that way.
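
    As a sanity check on the discrete decomposition above, a small sketch
    that verifies the NAND/INV expression really is XOR and tallies the
    transistor count. The per-gate counts (2T per inverter, 4T per 2-input
    NAND) are the usual static-CMOS assumption; the helper names are
    illustrative only.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    return 1 - (a & b)

def inv(a: int) -> int:
    return 1 - a

def xor_from_nands(a: int, b: int) -> int:
    # (A nand (not B)) nand ((not A) nand B)
    return nand(nand(a, inv(b)), nand(inv(a), b))

# Assumed static-CMOS costs: 2 transistors per inverter, 4 per 2-input NAND.
INV_T, NAND2_T = 2, 4
discrete_transistors = 2 * INV_T + 3 * NAND2_T
gate_levels = 3  # inverter -> first-level NAND -> output NAND

for a, b in product((0, 1), repeat=2):
    assert xor_from_nands(a, b) == a ^ b

print(discrete_transistors, "transistors,", gate_levels, "gate levels")
```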




    I find this makes it more difficult to just look at a CMOS logic circuit
    and know whether it will fit within a 20 gate delay stage budget.

    If the gate delay count is less than 20, there is "some" sizing of
    those gates which will result in minimum delay.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Sun Apr 19 19:04:19 2026
    From Newsgroup: comp.arch

    In article <10s2tjo$2pko$1@dont-email.me>,
    Terje Mathisen <terje.mathisen@tmsw.no> wrote:
    Dan Cross wrote:
    In article <10rsktu$287kv$2@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    I don't know the details of the regulations, but I can assure you that
    it is entirely within them. As a student, his flat is considered
    temporary accommodation and our house is his permanent home. His
    National Guard base is in our area, not where he is a student.

    Maybe the regulations in the USA are different. Maybe there are
    different standards about how quickly you can be called up and need to
    deploy.

    Misrepresenting your home address, such as using your parents'
    home when you don't actually live there, temporary accommodation
    or not, is not something the US military looks upon favorably.

    Didn't you read his comment?

    Yes, I did. Did you read the rest of the thread?

    Here in Norway, any student's home address
    is considered to be wherever she/he lived before starting to study.

    The only exception is if you, like I did, actually buy a flat/apartment
    near the university, at that point this was considered my primary residence.

    Great. Now, perhaps, we can let this subthread go.

    - Dan C.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Apr 19 14:14:39 2026
    From Newsgroup: comp.arch

    On 4/19/2026 4:28 AM, Michael S wrote:
    On Sun, 19 Apr 2026 03:54:40 -0500
    BGB <cr88192@gmail.com> wrote:


    Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
    and similar for floating point.

    So, it isn't exactly unheard of...


    As I said above, I think that IEEE 754 definitions for basic ops are
    great.
    I strongly oppose "flush subnormals to zero".



    I also agree with the formats and basic ops, main sticking point is that subnormals make some things more complicated and expensive for hardware.


    Main problem case in practice (where differences between subnormals and
    FTZ semantics becomes visible) involves dividing something by a value
    very close to zero. This requires alternate handling to get correct
    results (the naive x/y => x*(1.0/y) strategy no longer works).

    Though, a naive (if slower) option is to handle the divide as before,
    but have the divide operator quietly promote things to Binary128 (with
    the final result back-converted to Binary64).

    While implementing the full FDIV operator using N-R would also work in
    this case, in a "trap-and-emulate on subnormal numbers" strategy, it is
    slower than going through Binary128.
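
    A minimal sketch of the tiny-divisor case in Binary64, using Python
    floats (which follow the host's IEEE doubles). The flush_to_zero helper
    is a hypothetical software model of DAZ/FTZ, not any particular FPU's
    behavior, and the operand values are chosen only to make the effect
    visible.

```python
import math
import sys

MIN_NORMAL = sys.float_info.min  # smallest normal Binary64 value, 2**-1022

def flush_to_zero(v: float) -> float:
    """Hypothetical model of DAZ/FTZ: treat any subnormal as a signed zero."""
    if v != 0.0 and abs(v) < MIN_NORMAL:
        return math.copysign(0.0, v)
    return v

x = math.ldexp(1.0, -1000)   # a normal number
y = math.ldexp(1.0, -1030)   # a subnormal divisor

exact_quotient = x / y            # 2**30, exactly representable
via_reciprocal = x * (1.0 / y)    # 1.0/y would be 2**1030, which overflows

# Gradual underflow keeps a - b nonzero whenever a != b ...
a, b = 1.5 * MIN_NORMAL, MIN_NORMAL
difference = a - b
# ... while flushing the subnormal result loses that property.
flushed_difference = flush_to_zero(difference)

print(exact_quotient, via_reciprocal, difference, flushed_difference)
```

    With subnormal support the true quotient is finite, while the
    reciprocal-then-multiply route overflows; under DAZ the divisor itself
    would be read as zero.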



    But, as noted, one can go either way here.

    My current leaning is to support subnormals at the ISA level (so, it may appear to software as-if the FPU has full IEEE FPU operations), but with
    some traps and restrictions needed to get IEEE results (there are some FPU-related instructions which exist in XG1/2/3 which will need to be essentially disallowed in strict-FP mode).

    It can then be chosen at compile time whether the preference is to have
    more accurate math or better performance.


    But, I may need to devise some sort of test program to validate that
    results in this mode are correct.


    I am not sure what is DAZ. Does it abbreviate "De-normals are zero" ?
    I.e. subnormal not only never produced as result of arithmetic ops but
    also silently converted to zero when taken as input?
    If that what DAZ means then I oppose it even stronger than FTZ.


    DAZ/FTZ: Means "Denormals-As-Zero" / "Flush-To-Zero"

    So, yeah... This trades mathematical purity for cheapness.



    There is a potentially cheaper-still option that could maybe be called
    "Clamp Exponent to Zero" (don't know an official term off-hand,
    apparently may also be called "Dirty Underflow" or similar). Generally
    no one uses this though, as this one crosses an "unacceptable level of
    suck" threshold.

    Say:
    (x*0.0) == (y*0.0)
    Being true with either IEEE or DAZ+FTZ, but would not be true with
    DAZ+CEZ. Could paper over this, but then one is just pushing the costs
    around from one place to another.

    Though, potentially, this is one minor case where having an FTZ mode
    adds cost:
    An implementation that solely used trap-and-emulate for full IEEE
    behavior could use CEZ behavior and it would have been hidden. Whereas
    FTZ means that one needs logic to detect that the exponent has gone out
    of range and to force the mantissa to 0 (but would have still needed
    this logic to deal with overflow to Inf).


    The cheapest options, as noted:
    Fixed RTZ (round towards zero);
    DAZ+CEZ
    Handles FSUB using ones-complement arithmetic;
    ...


    These may go below a certain minimum threshold of acceptable except for specifically low-precision SIMD operations (such as Binary16).

    But, even for contexts where one is using Binary32, this is frequently unacceptable.


    So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's Complement FSUB, mostly because the alternative was poor even for SIMD.

    2.0 - 3.0 => -0.999999

    Is kinda obvious in its suck.

    So, regardless of exact strategy, things like 2.0 - 3.0 should still
    ideally give an exact -1.0 ...


    ...


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Apr 19 16:04:25 2026
    From Newsgroup: comp.arch

    On 2026-Apr-18 15:23, Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    The part I don't see is the rules for combinatorial gates.
    There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
    where multiple gates are combined in one but at a lower gate delay.

    For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
    because it really is an INV, an AND and a OR,
    but in CMOS those seem to be just 1 or 1.5 gate delays.

    There is the method of logical effort, see https://en.wikipedia.org/wiki/Logical_effort . I have not made
    much effort to do calculations using that method.

    Yes, I haven't actually used it either.
    Sutherland has examples of the gate merging I'm referring to.
    Section 4.4 Asymmetric logic gates figure 4.3 has an example where
    ((A and B) nor C)
    merges the AND and NOR gates so instead of 2 gates and 8 transistors
    it's 1 gate and 6 transistors.
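
    For anyone wanting to try the logical-effort method on a small path, a
    sketch under textbook assumptions: inverter g=1, p=1 and 2-input NAND
    g=4/3, p=2 (values for a 2:1 P/N width ratio), branching ignored, and
    gates sized for equal stage effort. The function and constant names are
    made up for illustration.

```python
import math

# (logical effort g, parasitic delay p) per gate, in units of tau.
# Textbook values assuming a 2:1 PMOS/NMOS width ratio.
INV = (1.0, 1.0)
NAND2 = (4.0 / 3.0, 2.0)

def path_delay(stages, electrical_effort, branching=1.0):
    """Minimum path delay D = N * F**(1/N) + P, reached when the gates
    are sized so that every stage bears the same effort."""
    n = len(stages)
    path_logical_effort = math.prod(g for g, _ in stages)
    path_effort = path_logical_effort * branching * electrical_effort
    parasitic = sum(p for _, p in stages)
    return n * path_effort ** (1.0 / n) + parasitic

FO4 = path_delay([INV], electrical_effort=4.0)  # one fanout-of-4 inverter
xor_path = path_delay([INV, NAND2, NAND2], electrical_effort=4.0)

print(f"INV-NAND-NAND path: {xor_path:.2f} tau = {xor_path / FO4:.2f} FO4")
```

    The point of the exercise is that the answer comes out in FO4 units
    rather than in a raw gate count, which is closer to what a stage budget
    actually constrains.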

    An alternative would be to use an actual library as an example.
    A company called Nangate released an open-sourced library (google
    for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
    for which delay calculations can be done as example, for example
    using Berkeley ABC. That program can also do optimizations
    (although it cannot handle gates with more than one output, such as
    full adders, and has weaknesses in stability). I haven't tried to
    model wire delays with this.

    A while ago I was rummaging about and found the individual gate
    delay info in the open source Process Design Kit (PDK) files.

    https://skywater-pdk.readthedocs.io/en/main/
    https://github.com/google/skywater-pdk

    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    AOI and friends also work in TTL, I believe.

    Yes but you don't get to merge gates together to shorten the delay.
    You only get to choose from the packages available
    and for most situations just scan the spec sheet and use
    the max of the all propagation delays.

    I find this makes it more difficult to just look at a CMOS logic circuit
    and know whether it will fit within a 20 gate delay stage budget.

    An interesting question :-)

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 21:39:12 2026
    From Newsgroup: comp.arch


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-Apr-18 13:59, MitchAlsup wrote:
    ------------------
    A 4:1 Mux is a 2222AOI gate--one Ssllooww gate--but still 1 gate.

    One slow gate is generally faster than 2 gates (and less power)
    because only 1 signal has to move (Vdd->gnd or gnd->Vdd) instead
    of more than one. Each signal moving is limited by the transcon-
    ductance of the FET stack and the capacitance being driven. We
    call the rise/fall time the edge speed.

    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    Almost always by deMorganizing the logic.

    I'm referring to where you merge gates together.
    For example, an XOR is (A nand (not B)) nand ((not A) nand B)
    which is 2 INV (4T) and 3 NAND gates (12T) with a total of
    16 transistors and a delay of 3.

    Consider what we call a 2-stack:

    _|
    a___|
    |_
    |
    _|
    b___|
    |_
    |

    The 2-stack will conduct when both FETs are On and not otherwise.

    Consider 4{2-stacks} 2 from Vdd to signal, 2 from
    gnd to signal.

    By having P-channel a[0] = x_true and b[0] = y_true
    a[1] = x_false and b[1] = y_false

    When x == y the stack-pair will pull up.

    By having N-channel a[2] = x_true and b[2] = y_false
    a[3] = x_false and b[3] = y_true

    When x != y the stack-pair will pull down.

    Presto: a (positive logic) 8-FET XOR gate--its fault is that
    it requires true/complement inputs (an inverter of delay or
    about 1/3 of a gate-delay.) So, this XOR gate in positive only
    signaling would have the output drive of a 2-NAND gate and
    the delay of 1.3 2-NAND gate.

    By rearranging the input terms an XNOR gate is accomplished.
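
    A switch-level sketch of the four 2-stacks as wired above, assuming the
    usual convention that a P-channel FET conducts when its gate is low and
    an N-channel FET conducts when its gate is high. It only checks the two
    pull-up/pull-down statements and that the output node is always driven
    by exactly one network; the function names are illustrative.

```python
from itertools import product

def p_stack_on(gate_a: int, gate_b: int) -> bool:
    # A series pair of P-channel FETs conducts only when both gates are low.
    return gate_a == 0 and gate_b == 0

def n_stack_on(gate_a: int, gate_b: int) -> bool:
    # A series pair of N-channel FETs conducts only when both gates are high.
    return gate_a == 1 and gate_b == 1

def eight_fet_gate(x: int, y: int):
    """Return (pull_up, pull_down) for the four 2-stacks wired as described."""
    x_true, x_false = x, 1 - x
    y_true, y_false = y, 1 - y
    pull_up = p_stack_on(x_true, y_true) or p_stack_on(x_false, y_false)
    pull_down = n_stack_on(x_true, y_false) or n_stack_on(x_false, y_true)
    return pull_up, pull_down

for x, y in product((0, 1), repeat=2):
    pull_up, pull_down = eight_fet_gate(x, y)
    assert pull_up != pull_down      # always driven, never fighting
    assert pull_up == (x == y)       # pulls up when x == y
    assert pull_down == (x != y)     # pulls down when x != y
```

    Swapping the true/complement wires of one input exchanges the pull-up
    and pull-down conditions, which is the rearrangement that turns one
    polarity of the gate into the other.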

    Since the pull up stack and the pull down stack are both 2 series
    transistors, the pull up has transconductance 1/2 of the single
    FET; we compensate by making the FETs 2× as long.

    In full custom logic, we use 4-FET stacks pulling down (4-NAND)
    but we restrict ourselves to 3-FET stacks pulling up due to both
    the P-Channel conductance (holes weigh more than electrons) and
    the body effect of P-channels being greater than N-channels.

    The 3 NAND can merge into a single gate of 8 transistors
    and a total of 12T and a delay of 2.

    The N-stack is our N-AND term. Parallel stacks are your OR
    term. CMOS simply requires that all signals are always driven
    {up or down}.

    Presumably this merging of gates can continue to some point
    but what that point is isn't clear to me.

    4 N-channels in series and 3 P-channels in series.

    That makes it difficult to look at a logic diagram and
    know how many gates are going to merge that way.

    You do it for a decade, and it becomes akin to breathing.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 21:47:12 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 4/19/2026 4:28 AM, Michael S wrote:
    On Sun, 19 Apr 2026 03:54:40 -0500
    BGB <cr88192@gmail.com> wrote:


    Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
    and similar for floating point.

    So, it isn't exactly unheard of...


    As I said above, I think that IEEE 754 definitions for basic ops are
    great.
    I strongly oppose "flush subnormals to zero".



    I also agree with the formats and basic ops, main sticking point is that subnormals make some things more complicated and expensive for hardware.

    And yet we have single chips with 64 cores, each core containing 5
    (sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
    L2 and L3 caches. In real silicon technology, this means one can put
    600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
    a single die.

    I submit that back when it was hard to put a whole core on a chip
    you had a shred of an argument, now you do not--we grew out of those constraints.

    Main problem case in practice (where differences between subnormals and
    FTZ semantics becomes visible) involves dividing something by a value
    very close to zero. This requires alternate handling to get correct
    results (the naive x/y => x*(1.0/y) strategy no longer works).

    If you consider FDIV as having to get the rounding correct--that
    method NEVER EVER worked: but you don't even bother getting FMUL
    correctly rounded.....
    -------------------
    So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's Complement FSUB, mostly because the alternative was poor even for SIMD.

    2.0 - 3.0 => -0.999999

    Is kinda obvious in its suck.

    Note: IEEE 754 delivers the right answer, BTW...
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 19 21:48:39 2026
    From Newsgroup: comp.arch


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-Apr-18 15:23, Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    The part I don't see is the rules for combinatorial gates.
    There also seem to be combinatorial gates like XOR or AND-OR-INV or MUX
    where multiple gates are combined in one but at a lower gate delay.

    For example, in TTL an XOR or a 2:1 or 4:1 mux has 3 or 4 gate delays
    because it really is an INV, an AND and a OR,
    but in CMOS those seem to be just 1 or 1.5 gate delays.

    There is the method of logical effort, see https://en.wikipedia.org/wiki/Logical_effort . I have not made
    much effort to do calculations using that method.

    Yes, I haven't actually used it either.
    Sutherland has examples of the gate merging I'm referring to.

    It is all Stacks (see previous thread post) and deMorganizing.

    Section 4.4 Asymmetric logic gates figure 4.3 has an example where
    ((A and B) nor C)
    merges the AND and NOR gates so instead of 2 gates and 8 transistors
    it's 1 gate and 6 transistors.

    An alternative would be to use an actual library as an example.
    A company called Nangate released an open-sourced library (google
    for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
    for which delay calculations can be done as example, for example
    using Berkeley ABC. That program can also do optimizations
    (although it cannot handle gates with more than one output, such as
    full adders, and has weaknesses in stability). I haven't tried to
    model wire delays with this.

    A while ago I was rummaging about and found the individual gate
    delay info in the open source Process Design Kit (PDK) files.

    https://skywater-pdk.readthedocs.io/en/main/
    https://github.com/google/skywater-pdk

    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    AOI and friends also work in TTL, I believe.

    Yes but you don't get to merge gates together to shorten the delay.
    You only get to choose from the packages available
    and for most situations just scan the spec sheet and use
    the max of the all propagation delays.

    I find this makes it more difficult to just look at a CMOS logic circuit
    and know whether it will fit within a 20 gate delay stage budget.

    An interesting question :-)

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Apr 19 18:04:48 2026
    From Newsgroup: comp.arch

    On 4/19/2026 4:47 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/19/2026 4:28 AM, Michael S wrote:
    On Sun, 19 Apr 2026 03:54:40 -0500
    BGB <cr88192@gmail.com> wrote:


    Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
    and similar for floating point.

    So, it isn't exactly unheard of...


    As I said above, I think that IEEE 754 definitions for basic ops are
    great.
    I strongly oppose "flush subnormals to zero".



    I also agree with the formats and basic ops, main sticking point is that
    subnormals make some things more complicated and expensive for hardware.

    And yet we have single chips with 64 cores, each core containing 5
    (sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
    L2 and L3 caches. In real silicon technology, this means one can put
    600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
    a single die.

    I submit that back when it was hard to put a whole core on a chip
    you had a shred of an argument, now you do not--we grew out of those constraints.


    Realistically, anyone making a hobbyist class core can't make anything
    like what you describe.

    And, if doing it on an (affordable) FPGA, can only do what an FPGA can realistically manage. Which isn't even anywhere near the level of the
    SoC that fits in a typical cellphone at this point.

    more like late 1990s logic complexity at early 1990s clock speeds.



    Well, and, for PCs, most people don't have 64 cores either...
    Typical consumer-grade CPUs being more like 8 or 16 cores.

    Also typical GPU FPUs are low precision (with trying to use "double" in
    a GPGPU context often resulting in a fairly steep performance penalty).

    For "enterprise" stuff, that is more "money to burn", which is a very different scenario.


    Well, then there are NPUs, which seem to be going in a more "mixed
    FP8/FP16" direction. But, doing FMA at "FP8*FP8+FP16" is also, a very different situation.


    Well, and if your task is mostly bandwidth bound, maximal FPU precision
    is not the priority.


    Even in my case, I am still left partly battling memory bandwidth walls
    (like, say, frequent use of FP8 or FP16 in my case not being about
    trying to save RAM; rather trying to optimize for D$ and I$ and memory bandwidth and similar).

    Which is ironic in a way, given that seemingly I am in a better place
    relative to memory-bandwidth compared with clock-speed vs a lot of
    late-90s and early-2000s CPUs (well, because running a 16-bit wide DDR
    chip at 50 MHz, isn't *that* drastically slower than a 64-bit wide
    SO-DIMM running at 67 MHz from the early 2000s; or at least if comparing
    50 MHz vs 1400 MHz for the CPU core).

    Though, there is still the limit of what sorts of use-cases one can fit
    into the limited precision of these formats.

    ...


    Main problem case in practice (where differences between subnormals and
    FTZ semantics becomes visible) involves dividing something by a value
    very close to zero. This requires alternate handling to get correct
    results (the naive x/y => x*(1.0/y) strategy no longer works).

    If you consider FDIV as having to get the rounding correct--that
    method NEVER EVER worked: but you don't even bother getting FMUL
    correctly rounded.....
    -------------------

    Not in the DAZ/FTZ mode, granted.


    For the IEEE emulation mode, the idea is to try to patch things up as
    needed such that FMUL is correct (similar to the matter of dealing with subnormal numbers).

    But, yeah, the DAZ/FTZ mode may typically also give incorrectly rounded
    FMUL as well as incorrectly rounded FDIV.


    In both cases, the issue appears to go away though if the intermediate computations are done at Binary128 precision. But, this is its own
    pros/cons thing.


    So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's
    Complement FSUB, mostly because the alternative was poor even for SIMD.

    2.0 - 3.0 => -0.999999

    Is kinda obvious in its suck.

    Note: IEEE 754 delivers the right answer, BTW...


    Yes, but I was pointing out mostly the problem of trying to cheap out
    too much.

    While for SIMD it initially seems like one can cheap out really hard,
    basic integer arithmetic scenarios failing to give exact results, and a tendency for values to drift towards zero, can start to have fairly
    obvious and visible effects.

    So, there is a limit here...


    It can escape notice if limited solely to graphics and audio tasks, but
    as soon as one starts trying to use it for something much more demanding
    than pixel colors or audio mixing, it falls on its face.


    One place it becomes obvious pretty quickly is if doing physics
    calculations or rotation math, where objects' positions and rotations
    will start to steadily drift. They tend to be adjusted every frame,
    based on things like applying forces and time-steps.

    If the objects all start slowly rotating and sliding towards the origin,
    the suck is evident.
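
    A toy illustration of the drift described above, assuming a made-up
    software quantizer with an 11-bit significand (roughly Binary16-sized)
    and an unbounded exponent range. It rotates a unit vector repeatedly,
    rounding every multiply and add either toward zero or to nearest-even;
    the helper names, step count, and angle are arbitrary choices, not
    anything from an actual FPU.

```python
import math

def quantize(value: float, bits: int, mode: str) -> float:
    """Round value to a `bits`-bit significand, toward zero ("rtz")
    or to nearest-even ("rne"). The exponent range is left unbounded."""
    mantissa, exponent = math.frexp(value)   # value == mantissa * 2**exponent
    scaled = mantissa * (1 << bits)
    kept = math.trunc(scaled) if mode == "rtz" else round(scaled)
    return math.ldexp(float(kept), exponent - bits)

def rotated_norm(mode: str, steps: int = 1000, angle: float = 0.1,
                 bits: int = 11) -> float:
    """Rotate (1, 0) `steps` times, rounding every multiply and add,
    and return the length of the resulting vector."""
    c, s = math.cos(angle), math.sin(angle)

    def q(v: float) -> float:
        return quantize(v, bits, mode)

    x, y = 1.0, 0.0
    for _ in range(steps):
        x, y = q(q(x * c) - q(y * s)), q(q(x * s) + q(y * c))
    return math.hypot(x, y)

print("RNE length after rotation:", rotated_norm("rne"))
print("RTZ length after rotation:", rotated_norm("rtz"))
```

    An exact rotation would keep the length at 1; with truncation every
    rounding error points the same way, so the vector steadily shrinks
    toward the origin.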


    So, will need to draw a line here.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Sun Apr 19 22:47:45 2026
    From Newsgroup: comp.arch

    On 2026-04-19 7:04 p.m., BGB wrote:
    On 4/19/2026 4:47 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/19/2026 4:28 AM, Michael S wrote:
    On Sun, 19 Apr 2026 03:54:40 -0500
    BGB <cr88192@gmail.com> wrote:


    Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ
    and similar for floating point.

    So, it isn't exactly unheard of...


    As I said above, I think that IEEE 754 definitions for basic ops are
    great.
    I strongly oppose "flush subnormals to zero".



    I also agree with the formats and basic ops, main sticking point is that
    subnormals make some things more complicated and expensive for hardware.

    And yet we have single chips with 64 cores, each core containing 5
    (sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
    L2 and L3 caches. In real silicon technology, this means one can put
    600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
    a single die.

    I submit that back when it was hard to put a whole core on a chip
    you had a shred of an argument, now you do not--we grew out of those
    constraints.


    Realistically, anyone making a hobbyist class core can't make anything
    like what you describe.

    I made a 56-core system on an A7-200T. Granted only small 16-bit CPUs
    with no FP. Strange thing, I could never get more than about 48 cores to
    work.

    I think taking the target audience into consideration is important. If
    one is trying to produce an example for a low-cost FPGA then there are
    limits to what can be done. Most people looking for starting-out type
    examples are not going to look at superscalar machines with FP ops. For
    the more complex stuff people expect that larger, more expensive
    hardware is required.

    I think getting IEEE correct results does not use much more logic than
    simpler approaches. One is talking small percentages. What is the
    difference? Hundreds of LUTs in an FPGA with tens of thousands of LUTs available? 64-bit FMA is using about 3600 LUTs with sub-normals and
    rounding.

    If one is not looking for IEEE compatibility there may be other
    approaches to FP that might use fewer resources. Two's complement representation?

    And, if doing it on an (affordable) FPGA, can only do what an FPGA can realistically manage. Which isn't even anywhere near the level of the
    SoC that fits in a typical cellphone at this point.

    more like late 1990s logic complexity at early 1990s clock speeds.



    Well, and, for PCs, most people don't have 64 cores either...
    Typical consumer-grade CPUs being more like 8 or 16 cores.

    Also typical GPU FPUs are low precision (with trying to use "double" in
    a GPGPU context often resulting in a fairly steep performance penalty).

    For "enterprise" stuff, that is more "money to burn", which is a very different scenario.


    Well, then there are NPUs, which seem to be going in a more "mixed
    FP8/FP16" direction. But, doing FMA at "FP8*FP8+FP16" is also a very
    different situation.


    Well, and if your task is mostly bandwidth bound, maximal FPU precision
    is not the priority.


    Even in my case, I am still left partly battling memory bandwidth walls (like, say, frequent use of FP8 or FP16 in my case not being about
    trying to save RAM; rather trying to optimize for D$ and I$ and memory bandwidth and similar).

    Which is ironic in a way, given that seemingly I am in a better place
    relative to memory bandwidth compared with clock speed than a lot of
    late-90s and early-2000s CPUs (well, because running a 16-bit wide DDR
    chip at 50 MHz isn't *that* drastically slower than a 64-bit wide
    SO-DIMM running at 67 MHz from the early 2000s; or at least if comparing
    50 MHz vs 1400 MHz for the CPU core).

    Though, there is still the limit of what sorts of use-cases one can fit
    into the limited precision of these formats.

    ...


    Main problem case in practice (where differences between subnormals and
    FTZ semantics become visible) involves dividing something by a value
    very close to zero. This requires alternate handling to get correct
    results (the naive x/y => x*(1.0/y) strategy no longer works).
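    A small Python sketch of why the reciprocal shortcut is not a drop-in
    replacement for division. Python floats are IEEE 754 Binary64 with
    subnormals enabled, so this only shows the IEEE side of the comparison;
    the helper name div_via_recip is made up for illustration.

```python
def div_via_recip(x, y):
    """Naive division: multiply by the rounded reciprocal (two roundings)."""
    return x * (1.0 / y)

# Ordinary values: the reciprocal is rounded first, so the product can end up
# one ulp away from the correctly rounded quotient.
print(3.0 / 10.0, div_via_recip(3.0, 10.0))

# Divisor very close to zero: 1.0/y overflows to infinity even though x/y
# itself is perfectly representable.
tiny = 5e-324  # smallest positive subnormal Binary64 value
print(tiny / tiny, div_via_recip(tiny, tiny))
```

    The second case is the one that needs separate handling once subnormal
    divisors are allowed; under DAZ the divisor would simply have been zero.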

    If you consider FDIV as having to get the rounding correct--that
    method NEVER EVER worked: but you don't even bother getting FMUL
    correctly rounded.....
    -------------------

    Not in the DAZ/FTZ mode, granted.


    For the IEEE emulation mode, the idea is to try to patch things up as
    needed such that FMUL is correct (similar to the matter of dealing with subnormal numbers).

    But, yeah, the DAZ/FTZ mode may typically also give incorrectly rounded
    FMUL as well as incorrectly rounded FDIV.


    In both cases, the issue appears to go away though if the intermediate computations are done at Binary128 precision. But, this is its own pros/cons thing.


    So, for SIMD, I mostly ended up going with Fixed-RNE, DAZ+FTZ, and Two's Complement FSUB, mostly because the alternative was poor even for SIMD.

    2.0 - 3.0 => -0.999999

    Is kinda obvious in its suck.
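    One plausible way to land on a result like that (an assumption; the post
    does not spell out the mechanism) is to subtract significands by inverting
    the subtrahend but leaving out the +1 carry-in that two's complement needs.
    A sketch with a 24-bit Binary32-style significand; fsub_mantissa is a
    hypothetical helper, not anyone's actual hardware:

```python
MANT_BITS = 24  # Binary32-style significand width, hidden bit included

def fsub_mantissa(a, b, carry_in):
    """Subtract aligned significands a - b (with a >= b) as a + ~b + carry_in."""
    mask = (1 << MANT_BITS) - 1
    return (a + (~b & mask) + carry_in) & mask

# 3.0 and 2.0 share exponent 1, so their significands are 1.5 and 1.0
# scaled by 2**23. The larger magnitude goes first; the sign is applied after.
three = 0xC00000
two = 0x800000
scale = 2.0 ** (1 - 23)  # value of one significand unit at exponent 1

exact = fsub_mantissa(three, two, carry_in=1) * scale  # two's complement
cheap = fsub_mantissa(three, two, carry_in=0) * scale  # carry-in omitted
print(-exact, -cheap)
```

    The cheap variant comes out one unit short in the last place, which is
    enough to break exact small-integer arithmetic.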

    Note: IEEE 754 delivers the right answer, BTW...


    Yes, but I was pointing out mostly the problem of trying to cheap out
    too much.

    While for SIMD it initially seems like one can cheap out really hard,
    basic integer arithmetic scenarios failing to give exact results, and a tendency for values to drift towards zero, can start to have fairly
    obvious and visible effects.

    So, there is a limit here...


    It can escape notice if limited solely to graphics and audio tasks, but
    as soon as one starts trying to use it for something much more demanding than pixel colors or audio mixing, it falls on its face.


    One place it becomes obvious pretty quickly is if doing physics
    calculations or rotation math, where objects' positions and rotations
    will start to steadily drift. They tend to be adjusted every frame,
    based on things like applying forces and time-steps.

    If the objects all start slowly rotating and sliding towards the origin,
    the suck is evident.
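    A rough sketch of the drift effect, assuming the cheap FPU truncates
    results toward zero instead of rounding (that is an assumption about the
    cause, chosen because it is easy to model). A coordinate is pushed out by
    d and pulled back by d once per "frame"; f32_rne, f32_trunc and wobble
    are illustrative names.

```python
import struct

def f32_rne(x):
    """Round a Python float to Binary32 with round-to-nearest-even."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_trunc(x):
    """Round a Python float to Binary32 by truncating toward zero."""
    r = f32_rne(x)
    if abs(r) > abs(x):  # nearest rounding went away from zero: step back one ulp
        bits = struct.unpack("<I", struct.pack("<f", r))[0]
        r = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return r

def wobble(rounder, steps=10000):
    """Add d then subtract d, rounding after each operation."""
    p = 1.0
    d = f32_rne(0.001)
    for _ in range(steps):
        p = rounder(p + d)
        p = rounder(p - d)
    return p

print("RNE:     ", wobble(f32_rne))
print("truncate:", wobble(f32_trunc))
```

    With truncation every inexact step loses a little magnitude and never
    gains any back, so the coordinate creeps toward zero.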


    So, will need to draw a line here.



    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Mon Apr 20 01:08:38 2026
    From Newsgroup: comp.arch

    On 4/19/2026 9:47 PM, Robert Finch wrote:
    On 2026-04-19 7:04 p.m., BGB wrote:
    On 4/19/2026 4:47 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 4/19/2026 4:28 AM, Michael S wrote:
    On Sun, 19 Apr 2026 03:54:40 -0500
    BGB <cr88192@gmail.com> wrote:


    Did see someone on the RISC-V sig-fp mailing also favoring DAZ/FTZ and similar for floating point.

    So, it isn't exactly unheard of...


    As I said above, I think that IEEE 754 definitions for basic ops are great.
    I strongly oppose "flush subnormals to zero".



    I also agree with the formats and basic ops, main sticking point is that
    subnormals make some things more complicated and expensive for hardware.

    And yet we have single chips with 64 cores, each core containing 5
    (sometimes 9) FMAC in 64-bit sizes--where most of the chip is actually
    L2 and L3 caches. In real silicon technology, this means one can put
    600 64-bit FMACs on a die. In the GPU world, they put 2000 FMACs on
    a single die.

    I submit that back when it was hard to put a whole core on a chip
    you had a shred of an argument, now you do not--we grew out of those
    constraints.


    Realistically, anyone making a hobbyist class core can't make anything
    like what you describe.

    I made a 56-core system on an A7-200T. Granted only small 16-bit CPUs
    with no FP. Strange thing, I could never get more than about 48 cores to work.


    I am also mostly targeting FPGAs smaller than the A7-200T.

    But, yeah. I can currently go dual-core with a hardware rasterizer and
    similar on the A7-200T.

    Mostly limited to Single-core on the A7-100T.
    And, need to strip features to fit on the S7-50.


    However, I have noted that when one *can* fit on the S7-50, it is also possible to clock it a little higher.


    A big chunk of resources, besides FPU and similar, is mostly taken up by
    the L1 caches.
    ~ 22%: L1 caches (I$ and D$ and similar)
    ~ 10%: Main FPU (Binary64)
    ~ 5%: SIMD Unit (4x Binary32)
    ~ 7%: Decoder
    ~ 4%: 64-bit Int MUL/DIV (Shift-and-Add)
    ~ 13%: Integer ALU and similar (3 of them, ~ 4% each).
    ~ 20%: Register Files
    ~ 19%: Everything else...

    With the CPU core using roughly 70% of the total resource budget of the
    FPGA (L2, HW interfaces, etc, being most of the rest).

    Most of the BRAM is eaten up by the L1 and L2 caches.
    29%: L1 caches and TLB
    58%: L2 cache
    13%: Display Hardware Stuff.
      Font/Palette RAM
      Raster Cache
      ...


    I think taking the target audience into consideration is important. If
    one is trying to produce an example for a low-cost FPGA then there are limits to what can be done. Most people looking for starting-out type examples are not going to look at superscalar machines with FP ops. For
    the more complex stuff people expect that larger, more expensive
    hardware is required.


    Well, or the A7-100T can apparently run a RV32GC SWeRV core at 33 MHz.

    My core is at least more feature-rich than the SWeRV, though SWeRV does
    seem to get better IPC. In this case it is In-Order, 2-wide superscalar,
    with a 9 stage pipeline.

    So, seemingly not doing too badly...


    There was a Commodore64 / Commodore128 clone using the A7-200T, which
    was almost tempting as an FPGA dev-board (they already had most of the
    useful computer-style peripheral interfaces, etc).

    But, one fatal flaw for my uses:
    No External RAM chip, for emulating the C64/C128 they could do the whole
    thing in Block-RAM on this FPGA, so, they did so.


    I guess this was in contrast with David Murray (The 8-Bit Guy) and his
    "Commander X16" project, which was mostly using all DIP chips (apart
    from the display interface, or VERA, which uses an FPGA, IIRC an S7-25
    or similar).

    If it were me, I wouldn't have bothered with a YM/OPL chip if going
    with an FPGA for the VERA, and would probably have also run the sound and
    music on the FPGA. Well, actually, I probably would have just gone all
    FPGA, as by that point, a slightly bigger FPGA is likely cheaper than
    sourcing a bunch of legacy DIP chips (even if most are still technically in-production by NXP and similar).

    Like, one is maybe leaving themselves open if they are depending on a
    mix of "New Old Stock" chips and for NXP to keep on making clones of
    various 40 year old chips and similar.

    Nor is there likely to be much more than niche demand for "Modern built machine that exists as a semi-compatible replica of the Commodore 64".



    I think getting IEEE correct results does not use much more logic than simpler approaches. One is talking small percentages. What is the difference? Hundreds of LUTs in an FPGA with tens of thousands of LUTs available? 64-bit FMA is using about 3600 LUTs with sub-normals and rounding.


    Besides LUTs, there is also latency (I would likely need around 10
    cycles or so for such a unit).


    Also format conversion needs renormalization to deal with subnormals, so
    the faster 1 cycle converters effectively get replaced by needing a full
    pass through the FMA (or, alternatively, logic to detect which case it
    is, and either "fast or slow path" it). In the fast/cheap paths, the
    format converters essentially just moving bits around.
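    As an illustration of "just moving bits around", here is a sketch of a
    Binary16 to Binary32 widening that handles normal values by re-biasing
    the exponent and shifting the fraction, and flushes subnormal inputs to
    zero (DAZ). The function names are made up; the field layouts are the
    standard IEEE 754 ones.

```python
import struct

def half_to_single_fast(h):
    """Binary16 bits -> Binary32 bits by moving fields; subnormals flushed (DAZ)."""
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0:       # zero or subnormal: treated as signed zero
        return sign << 31
    if exp == 0x1F:    # Inf/NaN: widen the exponent to all ones
        return (sign << 31) | (0xFF << 23) | (frac << 13)
    # Normal: re-bias the exponent (127 - 15 = 112), left-align the fraction.
    return (sign << 31) | ((exp + 112) << 23) | (frac << 13)

def half_bits(x):
    """Bit pattern of x as Binary16."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def single_value(bits):
    """Value of a Binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(single_value(half_to_single_fast(half_bits(-2.5))))  # normal input
print(single_value(half_to_single_fast(0x0001)))           # subnormal input
```

    Handling the subnormal input properly means finding the leading one and
    shifting by a variable amount, which is the renormalization step the
    fast path avoids.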

    Then there is the issue that the FMUL and FADD parts need to be wider to give Single-Rounded Results.

    Possible, but not particularly fast or cheap in this case.


    As noted:
    The other option is to do a cheap FPU that "usually" gives the correct results, and is able to detect and raise a fault in the cases when it
    can't (letting software take over and then run the "correct" math using
    large integers).
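    A minimal software model of that split, assuming the fast path only
    promises correct results when operands and result stay in the normal
    range. Everything here is a stand-in: fmul_cheap uses Python's own
    multiply in place of the cheap hardware, and fmul_slow uses exact
    rational arithmetic in place of a large-integer trap handler.

```python
import struct
from fractions import Fraction

class NeedsSoftware(Exception):
    """Raised by the cheap path when it cannot guarantee a correct result."""

def is_subnormal(x):
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0

def fmul_cheap(a, b):
    """Stand-in for a fast FPU that only handles normal-range cases."""
    if is_subnormal(a) or is_subnormal(b):
        raise NeedsSoftware
    r = a * b
    if is_subnormal(r) or (r == 0.0 and a != 0.0 and b != 0.0):
        raise NeedsSoftware  # result underflowed out of the normal range
    return r

def fmul_slow(a, b):
    """Exact product in big-integer rationals, rounded once at the end."""
    return float(Fraction(a) * Fraction(b))

def fmul(a, b):
    try:
        return fmul_cheap(a, b)
    except NeedsSoftware:
        return fmul_slow(a, b)

print(fmul(2.0, 3.0))      # fast path
print(fmul(5e-324, 2.0))   # subnormal operand: falls back to software
```

    The design bet is that the fallback is rare enough that its cost does
    not matter for typical workloads.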


    If one is not looking for IEEE compatibility there may be other
    approaches to FP that might use fewer resources. Two's complement representation?


    Going over to different formats is a much bigger issue for software.

    Software tends to interact enough with the FPU formats that using
    different / non-standard formats is going to "throw a wrench into things".



    But, that said, a format like, say:
    I24.E8
    I52.E12

    Which treats the mantissa as a combination of potentially non-normalized signed integer value and an exponent, could potentially allow for a
    cheaper FPU, if also using some special cases.

    Would still be pros/cons though:
    While you could split FADD/FMUL and re-normalization into separate
    steps, some other types of instructions could either no longer rely on
    the mantissa always being normalized, or one would need to make mantissa normalization a mandatory extra step in many cases.

    Would allow for cheaper FADD/FSUB logic, and for less latency (since the normalization stage goes away here).


    Pretty much no one did floating-point this way IIRC.

    In some ways, it would also display a bit more "jank", for example if
    chaining multiple operations without a normalization step results in a
    loss of precision as the mantissa scale and exponent drift out of sync.
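    A toy model of that loss, assuming values are (signed integer mantissa,
    exponent) pairs and that addition aligns to the larger exponent without
    renormalizing afterwards. The names and the 24-bit width are
    illustrative only.

```python
MANT_BITS = 24

def fadd_raw(a, b):
    """Add two (mantissa, exponent) pairs without renormalizing the result."""
    (ma, ea), (mb, eb) = a, b
    e = max(ea, eb)
    # Align to the larger exponent; bits shifted out of the other operand are lost.
    return ((ma >> (e - ea)) + (mb >> (e - eb)), e)

def normalize(v):
    """Shift the mantissa up (and the exponent down) until it fills the field."""
    m, e = v
    if m == 0:
        return (0, 0)
    while abs(m) < (1 << (MANT_BITS - 2)):  # leave headroom for the sign
        m, e = m * 2, e - 1
    return (m, e)

def value(v):
    m, e = v
    return m * 2.0 ** e

# Cancellation leaves a tiny mantissa paired with a comparatively large exponent.
diff = fadd_raw((0x400001, 0), (-0x400000, 0))   # represents 1.0 as (1, 0)
small = (3, -2)                                  # represents 0.75

print(value(fadd_raw(diff, small)))              # chained without normalizing
print(value(fadd_raw(normalize(diff), small)))   # normalized first
```

    Without the normalization step the 0.75 is shifted out entirely, even
    though the mantissa field has plenty of unused bits.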


    Don't necessarily need to go the direction of making the FPU even more
    jank though.

    ...


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon Apr 20 17:17:43 2026
    From Newsgroup: comp.arch

    I was about to increase the number of header types again, so that extra additional instruction sets could be combined with the full instruction
    set.
    I did add the new header type, but I removed an old one. And, as a bonus,
    that let me remove the 16-bit short instruction format.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Apr 20 17:47:27 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    There is the method of logical effort, see
    https://en.wikipedia.org/wiki/Logical_effort . I have not made
    much effort to do calculations using that method.

    Yes, I haven't actually used it either.
    Sutherland has examples of the gate merging I'm referring to.
    Section 4.4 Asymmetric logic gates figure 4.3 has an example of
    ((A and B) nor C)
    which merges the AND and NOR gates, so instead of 2 gates and 8 transistors
    it's 1 gate and 6 transistors.

    See https://en.wikipedia.org/wiki/AND-OR-invert#/media/File:AOI21_complex_vs_standard_gates.svg
    for an example.
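    For reference, a quick check that the merged AOI21 gate computes the
    same function as the separate AND followed by NOR. This only covers
    logical equivalence; it says nothing about delay or transistor sizing.

```python
from itertools import product

def aoi21(a, b, c):
    """AND-OR-invert: one complex gate computing NOT((a AND b) OR c)."""
    return int(not ((a and b) or c))

def and_then_nor(a, b, c):
    """The same function built from two separate gates."""
    and_out = a & b
    return int(not (and_out | c))

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, aoi21(a, b, c), and_then_nor(a, b, c))
```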

    AOI (and their dual, OAI) gates are quite cool.


    An alternative would be to use an actual library as an example.
    A company called Nangate released an open-sourced library (google
    for NangateOpenCellLibrary_typical.lib ), based on a 45 nm process,
    for which delay calculations can be done as example, for example
    using Berkeley ABC. That program can also do optimizations
    (although it cannot handle gates with more than one output, such as
    full adders, and has weaknesses in stability). I haven't tried to
    model wire delays with this.

    A while ago I was rummaging about and found the individual gate
    delay info in the open source Process Design Kit (PDK) files.

    https://skywater-pdk.readthedocs.io/en/main/
    https://github.com/google/skywater-pdk

    That looks interesting.


    In CMOS sometimes one is able to smoosh gates together and eliminate
    gate delays, but the rules for when smooshing is allowed are not
    obvious to me. I just assumed that it all sorts out in SPICE simulation.

    AOI and friends also work in TTL, I believe.

    Yes but you don't get to merge gates together to shorten the delay.
    You only get to choose from the packages available
    and for most situations just scan the spec sheet and use
    the max of the all propagation delays.

    Sure, if you design a chip on a silicon wafer you have much more
    freedom.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Paul Clayton@paaronclayton@gmail.com to comp.arch on Wed Apr 22 00:00:47 2026
    From Newsgroup: comp.arch

    On 4/17/26 9:11 PM, MitchAlsup wrote:
    [snip]
    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    I thought that some designs used intentional clock skew. If the
    natural places to divide pipeline stages results in different
    logic depths, a skewed clock would enable some stages to borrow
    time from others (I think).

    Perhaps this is not called 'skew'.

    (I have also read that pipeline stage delay can be kept constant
    and area/power traded with time, i.e., a normally longer stage
    can spend area/power to reduce delay and a normally shorter
    stage can save area/power by spending the delay slack.)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Apr 22 18:19:50 2026
    From Newsgroup: comp.arch


    Paul Clayton <paaronclayton@gmail.com> posted:

    On 4/17/26 9:11 PM, MitchAlsup wrote:
    [snip]
    In fast designs, there is an entire team charged with buffering and
    routing the CLOCK so that every gate in 10^2 mm^2 receives its rising
    edge and falling edge with less than 1 gate of delay 'skew' across
    the whole chip using wires that have more than 1 gate of delay when
    jumping over 30 gates. CRAY-1 had less than 1ns of clock skew in a
    machine the size of restaurant refrigerator using wires with 2ns/foot
    of delay. In ASIC designs, we assume (starting out) that there will
    be 1/2 clock of skew in the 'clock'

    I thought that some designs used intentional clock skew. If the
    natural places to divide pipeline stages results in different
    logic depths, a skewed clock would enable some stages to borrow
    time from others (I think).

    In the not so distant past, Intel would use as many as 10 clock edges
    to carefully time logic blocks. I think <essentially> everyone else
    uses only 2 clock edges {rising and falling} and most only use {rising}.

    Perhaps this is not called 'skew'.

    Skew is uncontrolled displacement of clock edge {early or late}. Skew
    is only harmful when a sending block and a receiving block have different
    skew, leaving the logic insufficient time to do its function.

    Offset is controlled displacement of clock edge.
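    A back-of-envelope sketch of how a controlled offset lets one stage
    borrow time from its neighbour. It assumes a simple model where stage i
    sits between register i and register i+1 and only the setup-style
    constraint delay <= period + offset[i+1] - offset[i] matters; setup and
    hold margins and uncontrolled skew are ignored. All numbers are made up.

```python
def min_period(stage_delays, clock_offsets):
    """Smallest period T with delay[i] <= T + offset[i+1] - offset[i] for all i."""
    return max(
        d - (clock_offsets[i + 1] - clock_offsets[i])
        for i, d in enumerate(stage_delays)
    )

delays_ps = [1200, 800]  # unbalanced logic depth in two adjacent stages

print(min_period(delays_ps, [0, 0, 0]))     # no offset
print(min_period(delays_ps, [0, 200, 0]))   # middle register clocked 200 ps late
```

    Clocking the middle register late gives the long stage 200 ps more and
    the short stage 200 ps less, so both fit in the average of the two delays.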

    (I have also read that pipeline stage delay can be kept constant
    and area/power traded with time, i.e., a normally longer stage
    can spend area/power to reduce delay and a normally shorter
    stage can save area/power by spending the delay slack.)

    All sorts of engineering tricks are played "around the clock edge"
    to make timing.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 01:34:13 2026
    From Newsgroup: comp.arch

    On Tue, 17 Mar 2026 18:16:16 +0000, MitchAlsup wrote:

    5 pounds of sand does not fit in a 4 pound bag !

    Indeed.

    The basic load-store instruction set takes up about 75% of the opcode
    space of 32-bit instructions.

    Including pairs of short instructions, sharing a 32-bit word, takes up
    about 25% of the opcode space of 32-bit instructions.

    The trouble is, though, I need a few other things.

    I needed 32-bit operate instructions and additional memory-reference instructions. In my current iteration, I squeeze them out of a few unused opcodes in the short instructions. I had to squash the operate
    instructions down to half the space I had been using for them to do this.

    Operate instructions: about 1/128 of the opcode space; extra
    memory-reference instructions: about 1/64 of the opcode space.
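    To put those fractions in terms of bits: a group that owns 1/2^k of the
    32-bit encodings has 32-k bits left over for register fields,
    displacements and sub-opcodes. A small sketch using the shares quoted
    in this post; free_bits is an illustrative helper.

```python
from fractions import Fraction

WORD_BITS = 32

def free_bits(share):
    """Bits left for operands when a group owns `share` (1/2**k) of the encodings."""
    k = share.denominator.bit_length() - 1
    if share != Fraction(1, 1 << k):
        raise ValueError("share must be of the form 1/2**k")
    return WORD_BITS - k

shares = {
    "operate instructions": Fraction(1, 128),
    "extra memory-reference": Fraction(1, 64),
    "variable-length header": Fraction(1, 16),
}
for name, share in shares.items():
    print(f"{name}: {free_bits(share)} bits")

# The two large groups already account for the whole space.
print(Fraction(3, 4) + Fraction(1, 4))
```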

    Also, I wanted a header that took up 1/16 of the opcode space, as the preferred simplest way to call for variable-length instructions. I had two spare opcodes in the 32-bit load-store instructions, but I had wanted to
    hang on to them for one extra instruction. I finally decided to limit the destination registers for the load address instruction so I could grab
    both opcodes.

    Now I really have run out of opcode space as far as general 32-bit instructions are concerned. I had to toss out the feature I had briefly
    added, of an alternate instruction set where memory operations were
    aligned, so that paired short instructions without register restrictions
    could be included at 50% of the opcode space. (That was because I didn't
    have a spare bit in the header for the type of code this would have been
    most useful with; of course, since an alternate instruction set is in a separate opcode space, I could still have it, just at a higher overhead
    cost of requesting it for a block; it just was no longer worth having, or
    so it seems.)

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 03:11:33 2026
    From Newsgroup: comp.arch

    On Fri, 01 May 2026 01:34:13 +0000, quadi wrote:

    I had to toss out the feature I had briefly
    added, of an alternate instruction set where memory operations were
    aligned, so that paired short instructions without register restrictions could be included at 50% of the opcode space.

    It turns out I did have enough room in the opcode space used for headers
    to indicate the use of this instruction set for all three of the header
    types where I had used a bit for it.

    But there is another more fundamental problem: the source of the extra
    opcode space for operate instructions and additional memory-reference instructions is now different from what it was, and so I would have to
    give those instructions new opcodes in order to fit in that alternate instruction set which differ from those they have in the regular one.

    At the moment, I don't think that's worth the trouble.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 05:16:41 2026
    From Newsgroup: comp.arch

    On Fri, 01 May 2026 03:11:33 +0000, quadi wrote:

    At the moment, I don't think that's worth the trouble.

    Instead, I ended up adding something else which resulted in my having to
    toss out a header type I had tossed out before, because I added a new type
    of header which demanded extra bits.
    Having squeezed the instruction set so much to fit it in to the available space, I felt that what was most desperately needed was a convenient and low-overhead way to switch into an alternate set of 32-bit instructions
    (which was already present, but only available for large headers for variable-length instruction code).
    In addition to adding the new header, I added some additional needed instructions to the alternate instruction set - and corrected a mistake in
    it as well.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 14:18:10 2026
    From Newsgroup: comp.arch

    On Fri, 01 May 2026 05:16:41 +0000, quadi wrote:

    In addition to adding the new header, I added some additional needed instructions to the alternate instruction set - and corrected a mistake
    in it as well.

    It turned out that there was still another mistake. I left out the
    much-needed Subroutine Jump with Offset instruction.

    And now that the architecture includes both a plain Subroutine Jump instruction _and_ a Subroutine Jump with Offset instruction, I needed to explain what each one *did* carefully. Because without such an
    explanation, it would not be clear how the ordinary subroutine jump
    without an offset could even _work_, given how headers and
    pseudo-immediates lead to non-executable matter being placed within code.

    So I noted that a regular subroutine jump instruction makes use of
    information within the current instruction block to correctly make the
    start of the next executable instruction the return address. Since it
    doesn't fetch the _next_ block, though, if it's the last executable instruction in a block, return is merely to the start of the following instruction block.

    Jumping to the start of an instruction block, in general, from any branch instruction of any kind, causes control to be transferred to the first executable instruction in the block as identified by its header (or lack thereof). Otherwise, branching to a location within an instruction block
    that is identified as not executable by the block header causes an error.

    The Subroutine Jump with Offset contains an explicit offset to add to the return address, so it doesn't attempt to adjust the return address to be
    sure it is to something executable; here, the compiler takes care of that
    for greater efficiency.
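    A sketch of the return-address rule for the plain subroutine jump as
    described above. The block layout is hypothetical (eight 32-bit slots,
    header in slot 0, pseudo-immediates at the end), and exec_map stands in
    for whatever the hardware derives from the block header.

```python
def return_address(exec_map, block, slot):
    """Return (block, slot) to resume at after a plain subroutine jump.

    exec_map[block] holds one boolean per slot, True where the slot contains
    an executable instruction rather than a header or pseudo-immediate.
    """
    for s in range(slot + 1, len(exec_map[block])):
        if exec_map[block][s]:
            return (block, s)
    # Last executable instruction in the block: return to the start of the
    # following block, whose own header then selects the first instruction.
    return (block + 1, 0)

# Hypothetical layout: slot 0 is a header, slots 6-7 hold pseudo-immediates.
exec_map = [
    [False, True, True, True, True, True, False, False],
    [True] * 8,
]
print(return_address(exec_map, 0, 2))  # next slot in the same block
print(return_address(exec_map, 0, 5))  # skips the pseudo-immediates
```

    The Subroutine Jump with Offset skips this search and adds a
    compiler-supplied offset instead.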

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri May 1 17:36:27 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Tue, 17 Mar 2026 18:16:16 +0000, MitchAlsup wrote:

    5 pounds of sand does not fit in a 4 pound bag !

    Indeed.

    The basic load-store instruction set takes up about 75% of the opcode
    space of 32-bit instructions.

    Including pairs of short instructions, sharing a 32-bit word, takes up
    about 25% of the opcode space of 32-bit instructions.

    The trouble is, though, I need a few other things.

    An architecture is as much about what you leave out as what you leave in.

    I needed 32-bit operate instructions and additional memory-reference instructions. In my current iteration, I squeeze them out of a few unused opcodes in the short instructions. I had to squash the operate
    instructions down to half the space I had been using for them to do this.

    Operate instructions: about 1/128 of the opcode space; extra memory-reference instructions: about 1/64 of the opcode space.

    Also, I wanted a header that took up 1/16 of the opcode space, as the preferred simplest way to call for variable-length instructions. I had two spare opcodes in the 32-bit load-store instructions, but I had wanted to hang on to them for one extra instruction. I finally decided to limit the destination registers for the load address instruction so I could grab
    both opcodes.

    Now I really have run out of opcode space as far as general 32-bit instructions are concerned. I had to toss out the feature I had briefly added, of an alternate instruction set where memory operations were
    aligned, so that paired short instructions without register restrictions could be included at 50% of the opcode space. (That was because I didn't have a spare bit in the header for the type of code this would have been most useful with; of course, since an alternate instruction set is in a separate opcode space, I could still have it, just at a higher overhead
    cost of requesting it for a block; it just was no longer worth having, or
    so it seems.)

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 19:00:20 2026
    From Newsgroup: comp.arch

    On Fri, 01 May 2026 17:36:27 +0000, MitchAlsup wrote:

    An architecture is as much about what you leave out as what you leave
    in.

    That certainly is true.

    And no matter what I do, since there are an infinity of possibilities, I
    will always have left out more than I've included.

    In Concertina II, it certainly does seem like I've included a lot. The ISA could be said to be bulging at the seams, like an overstuffed suitcase. I
    can see how that would seem to be a bad choice.

    Where is the emphasis in Concertina II? What does it prefer to leave in,
    and what is it content to leave out?

    The Motorola 68000 and the 80386 had groups of eight registers. The IBM
    360 had sixteen integer registers, but only four floating-point ones. Most RISC processors have 32 registers.

    Traditional processors had instructions that performed an arithmetic
    operation from memory into a register. RISC processors do arithmetic only
    in registers, with operations to load and store from memory.

    The System/360 addressed memory with a base register and an index register
    in addition to a 12-bit displacement. Most microprocessors use 16-bit displacements, but usually only let you use one register with it.

    I've tried to encompass all the features of these different processors as
    much as I could.

    Accessing variables in arrays needs an index register for which element in
    the array is being accessed, and a base register in addition to the small displacement in the instruction.

    I felt I couldn't leave _that_ out.

    So I did leave other things out.

    I stuck with load/store instructions instead of memory operate
    instructions in the primary instruction set.

    For both base and index registers, I used three-bit fields in the
    instruction to specify them. So while there were 32 integer registers for integer arithmetic, and 32 floating-point registers for floating-point arithmetic, only seven of the integer registers could be index registers,
    and only seven of the integer registers could be base registers for 16-bit displacements. (Another seven work with 12-bit displacements, and another seven with 20-bit displacements, and one other works with 15-bit displacements. This way, a register contains an address pointing to a
    block of data of a given size, all of which can be accessed by the instructions using it as a base register.)
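    For scale, the block sizes those displacement widths can cover, assuming
    unsigned byte displacements (an assumption; the post does not say whether
    they are signed or scaled). The count of seven registers per group fits
    a 3-bit field with one encoding kept for something else, which is also
    only a guess at the encoding.

```python
def reach_bytes(displacement_bits):
    """Bytes addressable from a base register with an unsigned byte displacement."""
    return 1 << displacement_bits

for bits in (12, 15, 16, 20):
    print(f"{bits}-bit displacement: {reach_bytes(bits)} bytes")

FIELD_BITS = 3
usable_registers = (1 << FIELD_BITS) - 1  # one encoding assumed reserved
print(usable_registers)
```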

    With varying amounts of overhead, though, one can use memory operate instructions, 20 bit displacements, and register banks with 128 registers
    like the Itanium.

    If something is good stuff that will make programs run faster, I want to include it.

    I left out stack-oriented instructions, memory with type labels... no inspiration from the Burroughs 5000 here!

    But if I can squeeze in base-index addressing and register banks with 32 registers, then I feel I've included two important things that will avoid programs being less efficient than they could be.

    The block structure let me
    - with modest overhead, switch to a different set of choices, so that one
    ISA could serve a variety of applications
    - have variable length instructions without changing an instruction set of 32-bit instructions designed to be in pure 32-bit code like with RISC,
    instead of restricting the 32-bit instructions to 50% or less of the
    opcode space (Instead, they're 75%, because even in header-less RISC-like mode, I saw having operate instructions that only take up 16 bits too important to exclude).
    - have immediate values of all data types, without that making indicating instruction lengths more complicated.

    Your argument for immediates made sense, but most ISAs exclude having
    general immediates as too complicated - so I tried to find a highly conventional solution.

    I definitely tried to lean away from super-CISC like Burroughs or even the VAX. I wanted to combine plain CISC (the 360) with RISC - the efficiencies both CISC and RISC provide. Block headers let the lengths of instructions
    be determined in parallel, which seems fast... even though you are correct that instruction decoding is done so far ahead of time, it's not really a bottleneck. But the block structure also saves on opcode space.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 1 19:25:59 2026
    From Newsgroup: comp.arch

    On Fri, 01 May 2026 19:00:20 +0000, quadi wrote:
    On Fri, 01 May 2026 17:36:27 +0000, MitchAlsup wrote:

    An architecture is as much about what you leave out as what you leave
    in.

    In Concertina II, it certainly does seem like I've included a lot. The
    ISA could be said to be bulging at the seams, like an overstuffed
    suitcase. I can see how that would seem to be a bad choice.

    To summarize - I've tried to include in Concertina II the features that obviously improve performance, and are common in many well-known ISAs.

    That led me to include base-index addressing, 16-bit displacements, and
    banks of 32 registers. Combining all three of these was difficult enough,
    so I did leave out memory operate instructions from the main instruction
    set.

    I did make one clear choice: when instruction lengths are variable,
    they're variable in multiples of 16 bits. I liked the 360 and the 68020, I thought 16-bit and 48-bit instructions were efficient... but 8 bits was overkill; I didn't see x86 or the VAX as models to emulate.

    And, yes, in some areas it may seem that I'm avoiding choices. But that is
    a choice - a choice to have an ISA that doesn't dictate to the programmer
    or the implementor one way of doing things. So one can choose a
    programming style and an implementation that are best suited to one's task.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Fri May 1 12:29:01 2026
    From Newsgroup: comp.arch

    On 5/1/2026 12:00 PM, quadi wrote:
    On Fri, 01 May 2026 17:36:27 +0000, MitchAlsup wrote:

    An architecture is as much about what you leave out as what you leave
    in.

    That certainly is true.

    And no matter what I do, since there are an infinity of possibilities, I
    will always have left out more than I've included.

    In Concertina II, it certainly does seem like I've included a lot. The ISA could be said to be bulging at the seams, like an overstuffed suitcase. I
    can see how that would seem to be a bad choice.

    Where is the emphasis in Concertina II? What does it prefer to leave in,
    and what is it content to leave out?

    The Motorola 68000 and the 80386 had groups of eight registers. The IBM
    360 had sixteen integer registers, but only four floating-point ones. Most RISC processors have 32 registers.

    Traditional processors had instructions that performed an arithmetic operation from memory into a register. RISC processors do arithmetic only
    in registers, with operations to load and store from memory.

    The System/360 addressed memory with a base register and an index register
    in addition to a 12-bit displacement. Most microprocessors use 16-bit displacements, but usually only let you use one register with it.

    I've tried to encompass all the features of these different processors as much as I could.

    Accessing variables in arrays needs an index register for which element in the array is being accessed, and a base register in addition to the small displacement in the instruction.

    I felt I couldn't leave _that_ out.

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them
    doesn't save you anything - you still have to add to a register to
    increment the element number. In fact it costs a little in the hardware
    as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why
    include it?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 01:41:53 2026
    From Newsgroup: comp.arch

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them
    doesn't save you anything - you still have to add to a register to
    increment the element number. In fact it costs a little in the hardware
    as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why
    include it?

    It's true that address calculation, especially for multi-dimensional
    arrays, involves extra steps.

    Base-index addressing doesn't force two additions every time; one chooses whether or not an instruction is indexed.

    But when one is referring to an array element, it saves adding the displacement either to the address or the base value by means of an
    explicit add instruction. One doesn't save a register by not having
    indexing. One is still used to contain the modified address.

    John Savard

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 02:17:00 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them doesn't save you anything - you still have to add to a register to increment the element number. In fact it costs a little in the hardware
    as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why include it?

    It's true that address calculation, especially for multi-dimensional
    arrays, involves extra steps.

    Base-index addressing doesn't force two additions every time; one chooses whether or not an instruction is indexed.

    People forget about the extra instructions:: lack of base+index causes
    a) longer latency
    b) more instructions
    c) larger code footprint
    d) compiler has to work harder
    e)...

    So the 2%-4% of its use causes 4%-8% more instructions which can be
    eliminated for 1-extra gate of delay (3-input adder versus 2-input).

    It is a more delicate balance than one presupposes.

    Given base+index+displacement there are never any support instructions
    in memory access. Given displacement can be {16-bits, 32-bits,
    or 64-bits} all of memory is accessible in a single instruction...
    ALWAYS !! This gets rid of another 3%-ish of instruction footprint
    tipping the balance from 6%-ish (average of above) to 12%-ish (with
    these additional savings), tipping the balance towards "put it in".
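
    A rough sketch of the instruction-count side of that argument, in Python.
    The register names, the load_fused/load_split helpers, and the 64-bit
    wraparound are assumptions made for illustration, not any particular ISA:

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address arithmetic


def load_fused(regs, base, index, scale, disp):
    """Base+index+displacement mode: one load, 3-input add inside it."""
    addr = (regs[base] + (regs[index] << scale) + disp) & MASK64
    return addr, 1  # (effective address, instructions issued)


def load_split(regs, base, index, scale, disp):
    """Register+displacement mode only: support instructions come first."""
    count = 0
    tmp = regs[index]
    if scale:
        tmp = (tmp << scale) & MASK64  # explicit shift of the index
        count += 1
    tmp = (tmp + regs[base]) & MASK64  # explicit add into a scratch register
    count += 1
    addr = (tmp + disp) & MASK64       # the load itself, using reg+disp
    count += 1
    return addr, count


regs = {"a": 0x1000, "i": 5}  # hypothetical array base and element index
print(load_fused(regs, "a", "i", 3, 16))
print(load_split(regs, "a", "i", 3, 16))
```

    Both forms reach the same address; the split form just spends extra
    instructions (and a scratch register) getting there. How often that
    happens in real code is the statistical question, which this sketch does
    not try to answer.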

    But when one is referring to an array element, it saves adding the displacement either to the address or the base value by means of an
    explicit add instruction. One doesn't save a register by not having indexing. One is still used to contain the modified address.

    John Savard

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 05:09:08 2026
    From Newsgroup: comp.arch

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong?

    This is a kind of question that is hard to answer.

    Do I think that I'm smarter than all those guys who designed RISC
    processors? No, of course not.

    But a lot of smart guys worked on the System/360 also. So the question may
    be whether or not my goals are different from theirs.

    After all, when the very first RISC machines came out, they didn't include floating-point arithmetic; while that was partly due to enough gates not
    yet being available, the rationale was given that floating-point
    arithmetic couldn't be done in one cycle.

    That was quickly rejected as silly.

    Index registers were considered a good idea back when they were originally introduced. It meant you could redirect an instruction to point somewhere
    else without modifying the instruction in memory.

    Base registers became a necessity once computer memories got so large -
    over 64K locations or thereabouts - that it wasn't practical to put whole addresses in instructions. So the base register, although it works the
    same way as an index register, does something different - an index
    register might be incremented once per loop, while base registers are left alone.

    So accessing an array requires one basically to copy a base register value into another register, and add the index to it. That's an extra
    instruction. It may not be needed for every array access, as you can still increment that modified base value. (Hmm. So since an addition is removed
    from address calculation in the instruction, one _could_ claim that
    lacking index/base addressing forces an *optimization* to be done. I'll
    have to think about that.)

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 2 06:19:43 2026
    From Newsgroup: comp.arch

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 5/1/2026 12:00 PM, quadi wrote:
    The System/360 addressed memory with a base register and an index register
    in addition to a 12-bit displacement.

    The S/360 has general-purpose registers,

    Most microprocessors use 16-bit
    displacements, but usually only let you use one register with it.

    How do you count "most"? There are a lot of AMD64 processors that
    offer reg+reg*[1248]+offset.

    But every other modern architecture decided that they didn't need
    separate base registers.

    The S/360 does not have separate base registers.

    One might consider the FS and GS registers of AMD64 to be dedicated
    base registers. AFAIK FS is used for thread-local variables in some
    OSs. But for single-threaded code, I have never seen any compiler use
    FS or GS, and have not seen any assembly language program (other than
    those for demonstrating their existence) use FS or GS, either. So
    there seems to be little need for separate base registers.

    Concerning addressing modes that involve GPRs, there are a lot of
    statistics around about their use, and you can make your own
    relatively easily by observing the usage of addressing modes on AMD64.
    Note that for some registers, AMD64 requires the use of a displacement
    even if it is 0 (that's because the encoding that one would expect for
    the displacementless use of these registers has another meaning).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat May 2 02:37:14 2026
    From Newsgroup: comp.arch

    On 5/1/2026 9:17 PM, MitchAlsup wrote:

    quadi <quadibloc@ca.invalid> posted:

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them
    doesn't save you anything - you still have to add to a register to
    increment the element number. In fact it costs a little in the hardware
    as the addressing requires two adds (Base+index+displacement) versus one
    (index + displacement). And it uses up a register needlessly. So why
    include it?

    It's true that address calculation, especially for multi-dimensional
    arrays, involves extra steps.

    Base-index addressing doesn't force two additions every time; one chooses
    whether or not an instruction is indexed.

    People forget about the extra instructions:: lack of base+index causes
    a) longer latency
    b) more instructions
    c) larger code footprint
    d) compiler has to work harder
    e)...

    So the 2%-4% of its use causes 4%-8% more instructions which can be eliminated for 1-extra gate of delay (3-input adder versus 2-input).

    It is a more delicate balance than one presupposes.

    Given base+index+displacement there are never any support instructions
    in memory access. Given displacement can be {16-bits, 32-bits,
    or 64-bits} all of memory is accessible in a single instruction...
    ALWAYS !! This gets rid of another 3%-ish of instruction footprint
    tipping the balance from 6%-ish (average of above) to 12%-ish (with
    these additional savings), tipping the balance towards "put it in".


    It is a thing:
    Base+Displacement: Very Common;
    Base+Index: 2nd most common;
    Base+Index+Displacement: Uncommon;
    Load/Store with Inc/Dec: Uncommon (In general, *1)

    *1: If one uses it as PUSH/POP and then uses PUSH/POP for prologs and
    epilogs, it would be common. Otherwise, usage falls off a cliff. Despite
    C having special operator syntax for this, it is infrequently used, and
    PUSH/POP is typically nearly the only scenario where it emerges naturally
    and is the most efficient way to approach the problem. One could
    ask "what about memory and string operations?" but then find that
    while auto-increment intuitively makes sense here, it is often not the
    most efficient way to implement these, even with these operations being
    present.


    Typically, the first two addressing modes eat nearly the entire
    load/store pie, as it were.


    Can also note that for Load/Store displacements, the vast
    majority of accesses are within a limit of 1K..4K, so a larger
    displacement is overkill except for certain use-cases or certain base
    registers.

    For a scaled displacement, the sweet spot is around 9 or 10 bits:
    5 or 6-bits: Not quite enough.
    7-bits: Sorta works, high miss rate;
    8-bits: OK, still misses a lot;
    9-bits: Good;
    10 bits: Good.

    Nearly all displacements are normally aligned to the element size, so
    using a raw byte displacement for 32 or 64 bit items is effectively
    throwing the bits away (would need 12 or 13 bits for similar
    effectiveness).
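
    To put numbers on that, a small sketch of the arithmetic. It assumes an
    unsigned displacement and power-of-two element sizes; both helper names
    are made up for illustration:

```python
def reach_bytes(disp_bits, elem_size, scaled=True):
    """Bytes spanned by an unsigned displacement field of disp_bits bits."""
    slots = 1 << disp_bits
    return slots * elem_size if scaled else slots


def byte_disp_bits_for(disp_bits, elem_size):
    """Bits a raw byte displacement needs to match a scaled field's reach."""
    return disp_bits + (elem_size.bit_length() - 1)  # log2 for powers of two


for bits in (9, 10):
    for size in (4, 8):
        print(bits, size, reach_bytes(bits, size), byte_disp_bits_for(bits, size))
```

    For 64-bit items, a 9-bit scaled field reaches as far as a 12-bit byte
    displacement, and a 10-bit scaled field as far as a 13-bit one.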



    In my case, GP and PC are the main cases where one needs larger
    displacements.

    There ended up being a few instructions in my case with a special
    GP+Disp16 addressing mode.

    Not for PC though, but in this case:
    Spread relative to PC was too large;
    Also not common enough to justify spending a significant chunk of
    encoding space on it.


    Though, this is where something like jumbo-prefixes worked well:
    99% of the time, the small displacement works, and the other 1% of the
    time, you can jump to 33 bits or so, which has a nearly 100% hit rate (and
    when it doesn't hit, you are typically in need of absolute addressing).

    ...



    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat May 2 03:33:02 2026
    From Newsgroup: comp.arch

    On 5/2/2026 1:19 AM, Anton Ertl wrote:
    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 5/1/2026 12:00 PM, quadi wrote:
    The System/360 addressed memory with a base register and an index register
    in addition to a 12-bit displacement.

    The S/360 has general-purpose registers,

    Most microprocessors use 16-bit
    displacements, but usually only let you use one register with it.

    How do you count "most"? There are a lot of AMD64 processors that
    offer reg+reg*[1248]+offset.


    Also, in AMD64, your choice is mostly between 8 and 32 bit displacements.

    The 16-bit displacements were only really a thing in 16-bit mode, and in
    32 and 64 bit mode one needs a prefix to encode a 16-bit displacement,
    which isn't really worth it (would only save 1 byte over the 32-bit displacement).

    Ironically, statistical distributions would also favor 8/32 over either
    8/16 or 16/32 in this case (where 8-bits would hit often enough to make
    it a good choice over 16 bits, and 16-bits would miss often enough to
    leave a need for a 32-bit case, but 32-bits would hardly ever miss).

    This is, even with a byte-scaled 8-bit displacement having a comparably
    poor hit rate.


    Comparably (excluding things like M68K or similar), Disp16 seems to be infrequent.



    But every other modern architecture decided that they didn't need
    separate base registers.

    The S/360 does not have separate base registers.

    One might consider the FS and GS registers of AMD64 to be dedicated
    base registers. AFAIK FS is used for thread-local variables in some
    OSs. But for single-threaded code, I have never seen any compiler use
    FS or GS, and have not seen any assembly language program (other than
    those for demonstrating their existence) use FS or GS, either. So
    there seems to be little need for separate base registers.


    The role that FS/GS serves can be instead served by having an ABI
    register for this purpose.

    For example, X4/TP in RISC-V, etc.




    Concerning addressing modes that involve GPRs, there are a lot of
    statistics around about their use, and you can make your own
    relatively easily by observing the usage of addressing modes on AMD64.
    Note that for some registers, AMD64 requires the use of a displacement
    even if it is 0 (that's because the encoding that one would expect for
    the displacementless use of these registers has another meaning).


    Yeah, Mod/RM byte stuff is a little wonky here...


    Also that the encodings sort of punish one for using ESP as a base
    register despite it being one of the most frequently used registers as a
    base register (effectively one more often needs to pay the cost of using
    a SIB byte). Though, traditional x86 ABIs used EBP rather than ESP for accessing stack locals (but, the use of frame-pointers mostly went away
    with the 64-bit ABIs).

    Say:
    [rAX/rCX/rDX/rBX/rSI/rDI ] : 1-byte
    [rAX/rCX/rDX/rBX/rBP/rSI/rDI+Disp8 ] : 2-bytes
    [rAX/rCX/rDX/rBX/rBP/rSI/rDI+Disp32] : 5-bytes
    [rSP+Disp8 ]: 3 bytes (SIB tax)
    [rSP+Disp32]: 6 bytes (SIB tax)

    rSP being an escape-case to the SIB byte, and the would-be [rBP]
    encoding an Abs32 case in 32-bit x86, and [RIP+Disp32] in X64.

    Then:
    [Rb+Ri*{1/2/4/8}]
    [Rb+Ri*{1/2/4/8}+Disp8]
    [Rb+Ri*{1/2/4/8}+Disp32]
    Via the SIB.
    Where:
    If Ri==rSP, this encodes Ri=ZERO.
    ...

    Where, because decoding doesn't care about REX, similar wonk applies to
    R12 and R13.
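
    The base-register-only cases above can be sketched as a byte counter.
    This is a simplified model covering just [base] and [base+disp] in 64-bit
    mode, picking the shortest encoding as an assembler normally would; it
    counts only Mod/RM, SIB, and displacement bytes (no opcode, prefixes, or
    REX), and the function name is made up:

```python
REG = {name: n for n, name in enumerate(
    "rax rcx rdx rbx rsp rbp rsi rdi r8 r9 r10 r11 r12 r13 r14 r15".split())}


def addressing_bytes(base, disp=0):
    """Mod/RM + SIB + displacement bytes for [base+disp] in 64-bit mode."""
    low3 = REG[base] & 7         # only 3 bits land in the Mod/RM r/m field
    sib = 1 if low3 == 4 else 0  # rSP/R12: r/m=100 escapes to a SIB byte
    if disp == 0 and low3 != 5:  # rBP/R13: mod=00, r/m=101 is RIP+Disp32
        disp_bytes = 0
    elif -128 <= disp <= 127:
        disp_bytes = 1           # Disp8 (also the forced zero for rBP/R13)
    else:
        disp_bytes = 4           # Disp32
    return 1 + sib + disp_bytes


for reg in ("rax", "rsp", "rbp", "r12", "r13"):
    print(reg, [addressing_bytes(reg, d) for d in (0, 8, 1000)])
```

    Since only the low 3 bits of the register number are checked, R12 pays
    the same SIB tax as rSP, and R13 needs the same explicit zero
    displacement as rBP.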

    ...


    But, yeah:
    0..N prefix bytes
    Optional REX prefix
    1/2 byte main opcode
    optional Mod/RM (depends on opcode)
    optional 1/4/8 byte main immediate (depends on opcode).

    Sort of amazing they managed to make instruction decoding scale as well
    as it has.


    - anton

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 12:15:50 2026
    From Newsgroup: comp.arch

    On Sat, 02 May 2026 06:19:43 +0000, Anton Ertl wrote:

    The S/360 has general-purpose registers,

    The S/360 does not have separate base registers.

    That's true. But I think he was talking about the fact that S/360
    memory-reference instructions had one field to specify a general register
    to use as the index register, and another field to specify a general
    register to use as the base register, while most modern architectures only
    have _one_ field to specify a register the contents of which are to be
    added to the displacement.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 05:37:21 2026
    From Newsgroup: comp.arch

    On 5/1/2026 10:09 PM, quadi wrote:
    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong?

    This is a kind of question that is hard to answer.

    Do I think that I'm smarter than all those guys who designed RISC
    processors? No, of course not.

    But a lot of smart guys worked on the System/360 also. So the question may
    be whether or not my goals are different from theirs.

    Not goals exactly, but constraints. Remember, S/360 had no memory
    address relocation hardware such as a hardware base register or paging.
    The addresses in a program were real memory addresses. Thus address say
    1,000 in a program was real memory address 1,000. So absent base
    registers, you couldn't have more than one program in memory at the same
    time, since program1's address 1,000 would have referred to the same
    real memory address as program2's address 1,000. That's why each
    program started with a BALR instruction to put the real address of where
    the program was loaded into a base register and memory reference
    instructions had a base register field to add that address to the one specified in the instruction.
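
    As a simplified sketch of that scheme: the model below treats the base
    register as simply holding the program's load address (BALR actually
    leaves the address of the following instruction there), and the choice of
    register 12 and the two load addresses are arbitrary:

```python
ADDR_MASK = (1 << 24) - 1  # S/360 addresses are 24 bits wide


def s360_ea(gpr, b, x, d):
    """Effective address of an RX-format operand: base + index + disp."""
    if not 0 <= d < 4096:
        raise ValueError("displacement must fit in 12 bits")
    base = gpr[b] if b != 0 else 0   # register 0 as base means "no base"
    index = gpr[x] if x != 0 else 0  # register 0 as index means "no index"
    return (base + index + d) & ADDR_MASK


def loaded_at(load_address, base_reg=12):
    """Register file as the program sees it once its base register is set."""
    gpr = [0] * 16
    gpr[base_reg] = load_address
    return gpr


program1 = loaded_at(0x005000)  # hypothetical load addresses
program2 = loaded_at(0x009000)
print(hex(s360_ea(program1, 12, 0, 1000)))
print(hex(s360_ea(program2, 12, 0, 1000)))
```

    The same instruction, with the same displacement of 1,000, lands on
    different real addresses depending only on what was put in the base
    register.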

    Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to
    different real memory addresses, thus don't need to specify a base
    address in every memory reference instruction.



    After all, when the very first RISC machines came out, they didn't include floating-point arithmetic; while that was partly due to enough gates not
    yet being available, the rationale was given that floating-point
    arithmetic couldn't be done in one cycle.

    That was quickly rejected as silly.

    Index registers were considered a good idea back when they were originally introduced. It meant you could redirect an instruction to point somewhere else without modifying the instruction in memory.

    Base registers became a necessity once computer memories got so large -
    over 64K locations or thereabouts - that it wasn't practical to put whole addresses in instructions.

    No. If that were true then every other CPU design that supported more
    than your 64K locations, including current ones, would require explicit
    base address register specifiers in instructions. The use of virtual
    memory, e.g. paging, obviates that requirement.


    So the base register, although it works the
    same way as an index register, does something different - an index
    register might be incremented once per loop, while base registers are left alone.

    So accessing an array requires one basically to copy a base register value into another register, and add the index to it.

    No. Do you think that current CPU designs require that? They do not.
    You simply load the starting address of the array into an index register
    and add to that as needed.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Sat May 2 15:44:39 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> writes:
    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong?

    This is a kind of question that is hard to answer.

    Do I think that I'm smarter than all those guys who designed RISC
    processors? No, of course not.

    But a lot of smart guys worked on the System/360 also.

    It was a product of the times. We've advanced well beyond
    that in the intervening half-century. Really, the 360 operating
    systems were relatively crude and difficult to use compared
    with the contemporaneous competition.

    It's true, however, that no CEO was ever fired for buying IBM.

    Although per Google:


    Origin: The saying gained prominence during the 1980s,
    when IT decisions were driven by Fear, Uncertainty, and
    Doubt (FUD) tactics, aiming to avoid personal accountability.


    <snip>

    Index registers were considered a good idea back when they were originally
    introduced. It meant you could redirect an instruction to point somewhere
    else without modifying the instruction in memory.

    The earliest incarnations of such were often not 'registers' per-se, but
    rather reserved locations in memory (c.f. PDP-8 'TAD I'). The Electrodata
    220 had a 'B' register - the predecessor Electrodata 205 was the first commercial computer to offer an Index register (with the idea inspired
    by the Manchester Mark I).


    Base registers became a necessity once computer memories got so large -
    over 64K locations or thereabouts - that it wasn't practical to put whole
    addresses in instructions.

    That may be true for the IBM 360 (although they could have updated the arch), but
    clearly there were contemporaneous systems (B3500, for example) which
    supported direct access to 1 million locations without needing index registers (although it did have three of them), and more exotic architectures like
    the stack-based B5500 and successors.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat May 2 15:57:51 2026
    From Newsgroup: comp.arch

    On 2026-05-02, Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:
    On 5/1/2026 10:09 PM, quadi wrote:
    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong?

    This is a kind of question that is hard to answer.

    Do I think that I'm smarter than all those guys who designed RISC
    processors? No, of course not.

    But a lot of smart guys worked on the System/360 also. So the question may
    be whether or not my goals are different from theirs.

    Not goals exactly, but constraints. Remember, S/360 had no memory
    address relocation hardware such as a hardware base register or paging.
    The addresses in a program were real memory addresses. Thus address say 1,000 in a program was real memory address 1,000. So absent base
    registers, you couldn't have more than one program in memory at the same time, since program1's address 1,000 would have referred to the same
    real memory address as program2's address 1,000.

    Small quibble: This depends on what your loader does. IIRC
    (I would have to re-read John Levine's book on linkers and loaders
    to be sure) it could do relocation on program start. Not sure if
    they could have done away with the base registers completely, though.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 09:45:27 2026
    From Newsgroup: comp.arch

    On 5/2/2026 8:44 AM, Scott Lurndal wrote:
    quadi <quadibloc@ca.invalid> writes:
    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong?

    This is a kind of question that is hard to answer.

    Do I think that I'm smarter than all those guys who designed RISC
    processors? No, of course not.

    But a lot of smart guys worked on the System/360 also.

    It was a product of the times. We've advanced well beyond
    that in the intervening half-century.

    Well, 60 years, but who's counting? :-) But you are absolutely right
    about the advancements.

    Really, the 360 operating
    systems were relatively crude and difficult to use compared
    with the contemporaneous competition.

    Agreed. But they had the disadvantage of the requirement of providing a family of sort of compatible OSs for a wide range of computer models.

    snip

    Index registers were considered a good idea back when they were originally
    introduced. It meant you could redirect an instruction to point somewhere
    else without modifying the instruction in memory.

    Yes.

    The earliest incarnations of such were often not 'registers' per-se, but
    rather reserved locations in memory (c.f. PDP-8 'TAD I'). The Electrodata
    220 had a 'B' register - the predecessor Electrodata 205 was the first
    commercial computer to offer an Index register (with the idea inspired
    by the Manchester Mark I).

    I don't know when the Electrodata 205 came out, but the Univac 1107
    offered real index registers in about 1962, certainly predating the PDP-8.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 10:01:35 2026
    From Newsgroup: comp.arch

    On 5/1/2026 7:17 PM, MitchAlsup wrote:

    quadi <quadibloc@ca.invalid> posted:

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them
    doesn't save you anything - you still have to add to a register to
    increment the element number. In fact it costs a little in the hardware
    as the addressing requires two adds (Base+index+displacement) versus one
    (index + displacement). And it uses up a register needlessly. So why
    include it?

    It's true that address calculation, especially for multi-dimensional
    arrays, involves extra steps.

    Base-index addressing doesn't force two additions every time; one chooses
    whether or not an instruction is indexed.

    People forget about the extra instructions:: lack of base+index causes
    a) longer latency
    b) more instructions
    c) larger code footprint
    d) compiler has to work harder
    e)...

    So the 2%-4% of its use causes 4%-8% more instructions which can be eliminated for 1-extra gate of delay (3-input adder versus 2-input).

    It is a more delicate balance than one presupposes.

    I can believe that.

    Given base+index+displacement there are never any support instructions
    in memory access. Given displacement can be {16-bits, 32-bits,
    or 64-bits} all of memory is accessible in a single instruction...

    Yes, but adding the specification of a base register takes instruction
    bits away from somewhere else, typically the displacement. So the
    S/360's choice to use them reduced the displacement to 12 bits. So
    larger programs required use of multiple base registers, which required
    loading them, i.e. extra support instructions, and increased register
    pressure (though with the availability of storage to storage
    instructions, that was less of an issue).


    ALWAYS !! This gets rid of another 3%-ish of instruction footprint
    tipping the balance from 6%-ish (average of above) to 12%-ish (with
    these additional savings), tipping the balance towards "put it in".

    Then why did just about every modern architecture, including your My
    66000, omit them?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Sat May 2 17:25:14 2026
    From Newsgroup: comp.arch

    On Sat, 02 May 2026 05:37:21 -0700, Stephen Fuld wrote:

    Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to
    different real memory addresses, thus don't need to specify a base
    address in every memory reference instruction.

    Actually, that conclusion isn't quite right. This would work for a CPU
    like Intel's 432. But I intend programs to be able to work with large
    linear address spaces bigger than 64K, bigger than the displacement field
    in an instruction. That means a base register is still needed despite
    hardware paging features being potentially present.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 2 17:35:08 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> writes:
    But a lot of smart guys worked on the System/360 also.

    But they did not design immediate operands into the architecture. I
    wonder why that is. It increases the instruction count by about 50%.

    After all, when the very first RISC machines came out, they didn't include
    floating-point arithmetic;

    They actually did. The RISC workstations from HP, Sun, MIPS, SGI
    etc. all included an FPU. Only Acorn was in a different market that
    did not require providing an FPU.

    while that was partly due to enough gates not
    yet being available, the rationale was given that floating-point
    arithmetic couldn't be done in one cycle.

    Who gave that rationale? AFAIK they had FPUs that could start an FP
    operation at every cycle, and as far as latency is concerned, every
    RISC implementation has instructions with latency >1 cycle already in
    their integer subset.

    Base registers became a necessity once computer memories got so large -
    over 64K locations or thereabouts - that it wasn't practical to put whole
    addresses in instructions.

    IA-32 is a counterexample for your claim. You can use the direct
    addressing mode for the whole address space. The replacement of the
    direct addressing mode on AMD64 is not something involving a base
    register, but offset(%rip).

    So accessing an array requires one basically to copy a base register value
    into another register, and add the index to it. That's an extra
    instruction. It may not be needed for every array access, as you can still
    increment that modified base value. (Hmm. So since an addition is removed
    from address calculation in the instruction, one _could_ claim that
    lacking index/base addressing forces an *optimization* to be done. I'll
    have to think about that.)

    If you look at the code for MIPS/Alpha/RISC-V with their single
    addressing mode offset(reg), you will find that the compilers tend to
    keep array cursors in registers, and tend to update them on every
    iteration. With a little luck, the loop-end check can be transformed
    into an array-end check, and the index variable is eliminated through
    an optimization called induction variable elimination. I have
    recently seen that in wasm code produced by clang.
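
    A toy model of what that transformation does to the address stream,
    written in Python rather than compiler output. Both function names are
    made up, and the element size is assumed positive:

```python
def indexed_addresses(base, n, size):
    """Source-level loop: keeps index i, each access forms base + i*size."""
    out = []
    i = 0
    while i < n:                     # loop-end check on the index
        out.append(base + i * size)
        i += 1
    return out


def cursor_addresses(base, n, size):
    """After induction variable elimination: only a cursor and an end address."""
    out = []
    p = base
    end = base + n * size            # computed once, before the loop
    while p != end:                  # array-end check replaces the index check
        out.append(p)                # access is plain offset(reg) with offset 0
        p += size                    # the cursor update on every iteration
    return out


print(indexed_addresses(0x1000, 4, 8) == cursor_addresses(0x1000, 4, 8))
```

    The second loop touches the same addresses without ever holding the
    index, so a single reg+offset addressing mode suffices and the index
    register is freed.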

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 11:05:06 2026
    From Newsgroup: comp.arch

    On 5/2/2026 10:25 AM, quadi wrote:
    On Sat, 02 May 2026 05:37:21 -0700, Stephen Fuld wrote:

    Modern CPUs, and I presume your design, use paging to allow multiple
    occurrences of the same address (in different programs) to refer to
    different real memory addresses, thus don't need to specify a base
    address in every memory reference instruction.

    Actually, that conclusion isn't quite right. This would work for a CPU
    like Intel's 432. But I intend programs to be able to work with large
    linear address spaces bigger than 64K, bigger than the displacement field
    in an instruction. That means a base register is still needed despite hardware paging features being potentially present.

    Why don't you just use an index register, as just about every other architecture (except the S/360 derivatives) does?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 2 18:10:23 2026
    From Newsgroup: comp.arch

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 5/1/2026 10:09 PM, quadi wrote:
    [...]
    Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to different real memory addresses, thus don't need to specify a base
    address in every memory reference instruction.

    And then we got ASLR, and now we have to live in a world again where
    the code and the static data don't live in fixed locations.

    So accessing an array requires one basically to copy a base register value into another register, and add the index to it.

    No. Do you think that current CPU designs require that? They do not.
    You simply load the starting address of the array into an index register
    and add to that as needed.

    In the general case (i.e., when the array index is not the counter of
    a counted loop), instruction sets like MIPS, Alpha, and RISC-V need
    additional instructions for computing the address of the array
    element, and only then use a load or store instruction to access the
    element. However, these architectures are three-address
    architectures, so the starting address of the array does not have to
    be copied first in this process.
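
    As a rough illustration of that general case, here is the usual three-instruction sequence modelled in Python, with RISC-V-style mnemonics in the comments. The register names, the dictionary-based memory and the helper load_element are hypothetical; the sketch assumes 4-byte elements and only base integer instructions.

```python
def load_element(regs, mem, base, index, tmp, dest):
    """Model a[i] with 4-byte elements on a machine whose only
    addressing mode is offset(reg). Returns the instruction count."""
    regs[tmp] = regs[index] << 2           # slli tmp, index, 2
    regs[tmp] = regs[base] + regs[tmp]     # add  tmp, base, tmp
    regs[dest] = mem[regs[tmp]]            # lw   dest, 0(tmp)
    return 3


# Hypothetical state: an array of words at address 0x1000, index 3.
regs = {"a0": 0x1000, "a1": 3, "t0": 0, "t1": 0}
mem = {0x1000 + 4 * k: 10 * k for k in range(8)}

count = load_element(regs, mem, base="a0", index="a1", tmp="t0", dest="t1")
print(count, regs["t1"], hex(regs["a0"]))
```

    Because the add writes its result to a third register, the base in a0 is still intact afterwards, so no separate copy instruction is needed.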

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat May 2 11:22:40 2026
    From Newsgroup: comp.arch

    On 5/2/2026 10:01 AM, Stephen Fuld wrote:
    On 5/1/2026 7:17 PM, MitchAlsup wrote:

    quadi <quadibloc@ca.invalid> posted:

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them
    doesn't save you anything - you still have to add to a register to
    increment the element number. In fact it costs a little in the
    hardware as the addressing requires two adds (Base+index+displacement)
    versus one (index + displacement). And it uses up a register
    needlessly. So why include it?

    It's true that address calculation, especially for multi-dimensional
    arrays, involves extra steps.

    Base-index addressing doesn't force two additions every time; one
    chooses whether or not an instruction is indexed.

    People forget about the extra instructions:: lack of base+index causes
    a) longer latency
    b) more instructions
    c) larger code footprint
    d) compiler has to work harder
    e)...

    So the 2%-4% of its use causes 4%-8% more instructions which can be
    eliminated for 1-extra gate of delay (3-input adder versus 2-input).

    It is a more delicate balance than one presupposes.

    I can believe that.
    Given base+index+displacement there are never any support
    instructions in memory access. Given displacement can be {16-bits, 32-bits,
    or 64-bits} all of memory is accessible in a single instruction...

    Yes, but adding the specification of a base register takes instruction
    bits away from somewhere else, typically the displacement. So the
    S/360s choice to use them reduced the displacement to 12 bits. So
    larger programs required use of multiple base registers, which required loading them, i.e. extra support instructions, and increased register pressure (though with the availability of storage to storage
    instructions, that was less of an issue)
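
    A small worked sketch of that arithmetic in Python. It assumes the S/360 convention of an unsigned 12-bit displacement, so one base register reaches 4096 bytes; the helper names and the idea of laying the bases out at consecutive 4096-byte steps are illustrative assumptions, not a description of any particular program.

```python
DISP_BITS = 12                      # S/360 displacement field width
REACH = 1 << DISP_BITS              # 4096 bytes; displacements are unsigned


def base_registers_needed(region_bytes):
    """Base registers needed so that every byte of a contiguous region
    is reachable as base + disp12 (index register not used for reach)."""
    return (region_bytes + REACH - 1) // REACH


def split_address(offset):
    """Map an offset within the region to (base number, displacement),
    assuming base k points at k * 4096 from the region start."""
    return offset >> DISP_BITS, offset & (REACH - 1)


for size in (4096, 4097, 10000):
    print(size, base_registers_needed(size))
print(split_address(5000))
```

    Each additional base register has to be loaded and kept live, which is the extra support code and register pressure mentioned above.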


    ALWAYS !! This gets rid of another 3%-ish of instruction footprint
    tipping the balance from 6%-ish (average of above) to 12%-ish (with
    these additional savings), tipping the balance towards "put it in".

    Then why did just about every modern architecture, including your My
    66000, omit them?

    Apologies! I see that for loads and stores, your design does offer such modes.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 18:33:00 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong?

    This is a kind of question that is hard to answer.

    Do I think that I'm smarter than all those guys who designed RISC processors? No, of course not.

    But a lot of smart guys worked on the System/360 also. So the question may be whether or not my goals are different from theirs.

    Yes, you include all 360 kinds of data and about 50% more to cover
    more bases.

    After all, when the very first RISC machines came out, they didn't include floating-point arithmetic; while that was partly due to enough gates not
    yet being available, the rationale was given that floating-point
    arithmetic couldn't be done in one cycle.

    MIPS had FP, SPARC had FP, Mc88K had FP, Clipper had FP.
    What RISCs are you speaking of ??

    That was quickly rejected as silly.

    Index registers were considered a good idea back when they were originally introduced. It meant you could redirect an instruction to point somewhere else without modifying the instruction in memory.

    The index registers of /360 required compiler strength reduction
    and loop invariant treatment. Scaled index registers do not.

    Base registers became a necessity once computer memories got so large -
    over 64K locations or thereabouts - that it wasn't practical to put whole addresses in instructions.

    And yet My 66000 CAN !!

    So the base register, although it works the
    same way as an index register, does something different - an index
    register might be incremented once per loop, while base registers are left alone.

    So are displacements.

    So accessing an array requires one basically to copy a base register value into another register, and add the index to it. That's an extra
    instruction.

    You are making assumptions that are not necessary. Why don't you
    spell them out.

    It may not be needed for every array access, as you can still increment that modified base value. (Hmm. So since an addition is removed from address calculation in the instruction, one _could_ claim that
    lacking index/base addressing forces an *optimization* to be done. I'll
    have to think about that.)

    John Savard
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 18:48:48 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Sat, 02 May 2026 06:19:43 +0000, Anton Ertl wrote:

    The S/360 has general-purpose registers,

    The S/360 does not have separate base registers.

    That's true. But I think he was talking about the fact that S/360 memory-reference instructions had one field to specify a general register to use
    as the index register, and another field to specify a general register to use as the base register, while most modern architectures only have _one_ field to specify a register the contents of which are to be added to the displacement.

    And you have put your finger on what is wrong with most modern ISAs.

    John Savard
  • From BGB@cr88192@gmail.com to comp.arch on Sat May 2 13:58:52 2026
    From Newsgroup: comp.arch

    On 5/2/2026 1:22 PM, Stephen Fuld wrote:
    On 5/2/2026 10:01 AM, Stephen Fuld wrote:
    On 5/1/2026 7:17 PM, MitchAlsup wrote:

    quadi <quadibloc@ca.invalid> posted:

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them
    doesn't save you anything - you still have to add to a register to
    increment the element number. In fact it costs a little in the
    hardware as the addressing requires two adds (Base+index+displacement)
    versus one (index + displacement). And it uses up a register
    needlessly. So why include it?

    It's true that address calculation, especially for multi-dimensional
    arrays, involves extra steps.

    Base-index addressing doesn't force two additions every time; one
    chooses whether or not an instruction is indexed.

    People forget about the extra instructions:: lack of base+index causes
    a) longer latency
    b) more instructions
    c) larger code footprint
    d) compiler has to work harder
    e)...

    So the 2%-4% of its use causes 4%-8% more instructions which can be
    eliminated for 1-extra gate of delay (3-input adder versus 2-input).

    It is a more delicate balance than one presupposes.

    I can believe that.
    Given base+index+displacement there are never any support
    instructions in memory access. Given displacement can be {16-bits, 32-bits,
    or 64-bits} all of memory is accessible in a single instruction...

    Yes, but adding the specification of a base register takes instruction
    bits away from somewhere else, typically the displacement. So the
    S/360s choice to use them reduced the displacement to 12 bits. So
    larger programs required use of multiple base registers, which
    required loading them, i.e. extra support instructions, and increased
    register pressure (though with the availability of storage to storage
    instructions, that was less of an issue)


    ALWAYS !! This gets rid of another 3%-ish of instruction footprint
    tipping the balance from 6%-ish (average of above) to 12%-ish (with
    these additional savings), tipping the balance towards "put it in".

    Then why did just about every modern architecture, including your My
    66000, omit them?

    Apologies! I see that for loads and stores, your design does offer such modes.


    FWIW:
    XG2 and XG3 also essentially include [Rb+Ri*Sc+Disp] as an optional feature.

    Just in my own stats, I didn't see them coming up often enough in
    practice to justify having them as part of the core ISA, nor necessarily enabled in HW (they are, alongside the Load-Op stuff).

    Theoretically can do:
    ADDS.L (SP, R12, 16), R14
    For:
    R14+=((int *)(SP+16))[R12]

    But, this is niche...
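
    A hedged Python model of what that load-op form computes, read off the C expression above. The helper names, the dictionary-based memory and the starting register values are invented for illustration; the scale of 4 is assumed from the `int *` cast, and any wrap-around or sign-extension behaviour of the real ADDS.L is deliberately not modelled.

```python
def effective_address(rb, ri, scale, disp):
    """[Rb + Ri*Sc + Disp] as a plain integer address computation."""
    return rb + ri * scale + disp


def adds_l(regs, mem, rb, ri, disp, rn, scale=4):
    """Rn += 32-bit word at [Rb + Ri*4 + Disp]; returns the address used.
    Overflow handling of the real instruction is not modelled."""
    ea = effective_address(regs[rb], regs[ri], scale, disp)
    regs[rn] += mem[ea]
    return ea


# Hypothetical state: a small int array living at SP+16.
regs = {"SP": 0x8000, "R12": 5, "R14": 100}
mem = {0x8000 + 16 + 4 * k: k * k for k in range(8)}

ea = adds_l(regs, mem, "SP", "R12", 16, "R14")
print(hex(ea), regs["R14"])
```

    Without the combined mode, the same effect takes a separate address computation and load before the add.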





  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 19:00:22 2026
    From Newsgroup: comp.arch


    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:

    On 5/1/2026 7:17 PM, MitchAlsup wrote:

    quadi <quadibloc@ca.invalid> posted:

    On Fri, 01 May 2026 12:29:01 -0700, Stephen Fuld wrote:

    But every other modern architecture decided that they didn't need
    separate base registers. Do you think they were wrong? Having them
    doesn't save you anything - you still have to add to a register to
    increment the element number. In fact it costs a little in the hardware as the addressing requires two adds (Base+index+displacement) versus one (index + displacement). And it uses up a register needlessly. So why
    include it?

    It's true that address calculation, especially for multi-dimensional
    arrays, involves extra steps.

    Base-index addressing doesn't force two additions every time; one chooses whether or not an instruction is indexed.

    People forget about the extra instructions:: lack of base+index causes
    a) longer latency
    b) more instructions
    c) larger code footprint
    d) compiler has to work harder
    e)...

    So the 2%-4% of its use causes 4%-8% more instructions which can be eliminated for 1-extra gate of delay (3-input adder versus 2-input).

    It is a more delicate balance than one presupposes.

    I can believe that.
    Given base+index+displacement there are never any support
    instructions in memory access. Given displacement can be {16-bits, 32-bits,
    or 64-bits} all of memory is accessible in a single instruction...

    Yes, but adding the specification of a base register takes instruction
    bits away from somewhere else, typically the displacement. So the

    In My 66000 case there is an instruction format of [base+disp16]
    and there is a different instruction format of [base+index<<scale]
    which can have {disp32 or disp64} optionally appended as a constant.

    S/360s choice to use them reduced the displacement to 12 bits. So

    We now understand that this is a less than optimal choice.

    larger programs required use of multiple base registers,

    This is a side effect of branching using standard memory address
    format, and the small displacement, ameliorated with positive only
    12-bit displacements.
    which required loading them, i.e. extra support instructions, and increased register pressure (though with the availability of storage to storage
    instructions, that was less of an issue)

    Which is why /360 is (or should be) not considered a "great ISA"
    to copy.


    ALWAYS !! This gets rid of another 3%-ish of instruction footprint
    tipping the balance from 6%-ish (average of above) to 12%-ish (with
    these additional savings), tipping the balance towards "put it in".

    Then why did just about every modern architecture, including your My
    66000, omit them?

    I was arguing that My 66000 has them while most modern ISAs do not.
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 19:02:43 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Sat, 02 May 2026 05:37:21 -0700, Stephen Fuld wrote:

    Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to different real memory addresses, thus don't need to specify a base
    address in every memory reference instruction.

    Actually, that conclusion isn't quite right. This would work for a CPU
    like Intel's 432. But I intend programs to be able to work with large
    linear address spaces bigger than 64K, bigger than the displacement field
    in an instruction. That means a base register is still needed despite

    My 66000 can directly address all 64-bits of VAS without using a base
    register. {A special feature when DISP64 has R0 as its base register}.

    hardware paging features being potentially present.

    Paging is always on in My 66000--even as one comes out of reset.

    John Savard
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat May 2 19:10:24 2026
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 5/1/2026 10:09 PM, quadi wrote:
    [...]
    Modern CPUs, and I presume your design, use paging to allow multiple occurrences of the same address (in different programs) to refer to different real memory addresses, thus don't need to specify a base
    address in every memory reference instruction.

    And then we got ASLR, and now we have to live in a world again where
    the code and the static data don't live in fixed locations.

    In My 66000 case, the code does not know whether it is in ASLR mode
    (or not); things that are accessed via ASLR go through GOT[].

    So accessing an array requires one basically to copy a base register value into another register, and add the index to it.

    No. Do you think that current CPU designs require that? They do not.
    You simply load the starting address of the array into an index register and add to that as needed.

    In the general case (i.e., when the array index is not the counter of
    a counted loop), instruction sets like MIPS, Alpha, and RISC-V need additional instructions for computing the address of the array
    element, and only then use a load or store instruction to access the
    element. However, these architectures are three-address
    architectures, so the starting address of the array does not have to
    be copied first in this process.

    Agreed.

    - anton
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 02:34:59 2026
    From Newsgroup: comp.arch

    I've made one more little addition to the instruction set.

    Now the 48-bit long instructions can be placed within a 64-bit pair of
    32-bit "instructions" that can be placed within code without headers.

    At first, when I found the opcode space to do that with, it seemed there
    was a conflict which prevented these 64-bit encapsulated instructions from appearing in code with variable-length instructions.

    Yes, normally they wouldn't be needed there, since in that code, 48-bit instructions can be expressed in just 48 bits. But those existing 48-bit instructions require prefix bits to distinguish them from the regular
    32-bit instructions in the code. The encapsulation format has room for any arbitrary combination of 48 bits. So _additional_ 48-bit instructions
    which don't have the necessary prefix could be defined, which can only
    appear in encapsulated form in either kind of code. (So they're really additional 64-bit instructions, but I call them 48-bit because of the fact that they're placed in association with the real 48-bit instructions.)
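
    To make the bit budget concrete, here is a purely hypothetical encapsulation sketched in Python. The actual Concertina II layout is not given in this post; the split into two 8-bit tags plus two 24-bit halves, the tag values 0xE1 and 0xE2, and the function names are all invented for illustration. The only thing taken from the text is that a 64-bit pair of 32-bit words leaves room for any 48-bit payload.

```python
TAG_FIRST = 0xE1     # hypothetical 8-bit marker for the first 32-bit word
TAG_SECOND = 0xE2    # hypothetical 8-bit marker for the second 32-bit word
MASK24 = (1 << 24) - 1


def encapsulate(insn48):
    """Wrap an arbitrary 48-bit pattern in two tagged 32-bit words."""
    if not 0 <= insn48 < (1 << 48):
        raise ValueError("payload must fit in 48 bits")
    first = (TAG_FIRST << 24) | (insn48 >> 24)
    second = (TAG_SECOND << 24) | (insn48 & MASK24)
    return first, second


def decapsulate(first, second):
    """Recover the 48-bit payload, checking both hypothetical tags."""
    if first >> 24 != TAG_FIRST or second >> 24 != TAG_SECOND:
        raise ValueError("not an encapsulated pair")
    return ((first & MASK24) << 24) | (second & MASK24)


pair = encapsulate(0x123456789ABC)
print([hex(w) for w in pair], hex(decapsulate(*pair)))
```

    Since the 16 tag bits alone identify the pair, the 48 payload bits need no prefix of their own, which is what would allow additional instructions that exist only in encapsulated form.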

    Fortunately, though, I was able to straighten things out and eliminate the conflict without doing violence to the bit-mappings in the instruction set.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 02:26:33 2026
    From Newsgroup: comp.arch

    On Sat, 02 May 2026 18:33:00 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:

    After all, when the very first RISC machines came out, they didn't
    include floating-point arithmetic; while that was partly due to enough
    gates not yet being available, the rationale was given that
    floating-point arithmetic couldn't be done in one cycle.

    MIPS had FP, SPARC had FP, Mc88K had FP, Clipper had FP.
    What RISCs are you speaking of ??

    I'm remembering an article in Scientific American which explained the
    concept of RISC, written by the designer of one of the very first RISC processors (probably MIPS).

    Yes, MIPS has FP now, but possibly the very first MIPS processor didn't.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 02:45:01 2026
    From Newsgroup: comp.arch

    On Mon, 04 May 2026 02:34:59 +0000, quadi wrote:

    Yes, normally they wouldn't be needed there, since in that code, 48-bit instructions can be expressed in just 48 bits. But those existing 48-bit instructions require prefix bits to distinguish them from the regular
    32-bit instructions in the code. The encapsulation format has room for
    any arbitrary combination of 48 bits. So _additional_ 48-bit
    instructions which don't have the necessary prefix could be defined,
    which can only appear in encapsulated form in either kind of code. (So they're really additional 64-bit instructions, but I call them 48-bit
    because of the fact that they're placed in association with the real
    48-bit instructions.)

    And, of course, in the event that I ever do define any instructions in
    this category, I could always define a new header format in which a
    particular combination of bits in a prefix field indicates that the 16-bit zone to which it corresponds contains the start of a 48-bit instruction
    and not one of any other length, which would then give them the right to
    be called 48-bit instructions.

    John Savard
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon May 4 05:45:44 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> writes:
    Yes, MIPS has FP now, but possibly the very first MIPS processor didn't.

    The very first MIPS processor was the R2000, and, as Wikipedia says,

    |The chipset consisted of the R2000 microprocessor, R2010
    |floating-point accelerator, and four R2020 write buffer chips.

    I don't know if the R2000 could work without R2010, but I am pretty
    sure that there never was a machine with an R2000, but without R2010.

    Later, when MIPS processors became cheap enough for embedded
    computing, there probably were MIPS processors without FPU, but that's
    not because of RISC principles or something, but because of the
    non-existing willingness of the customers to pay for that
    functionality.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon May 4 05:59:04 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    Performance Effects of Architectural Complexity in the Intel 432, 1988
    https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

    Wow... paved with good intentions and all that.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 16:38:23 2026
    From Newsgroup: comp.arch

    On Mon, 04 May 2026 05:59:04 +0000, Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    Performance Effects of Architectural Complexity in the Intel 432, 1988
    https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

    Wow... paved with good intentions and all that.

    But the sign over the door of Hell says:

    Abandon all hope, ye who enter here.

    The iAPX 432 didn't doom computing permanently. It was just a learning experience. Which Intel survived.

    It learned not to make that mistake again.

    There was the 860 - from which I temporarily took some inspiration.

    There was the Itanium.

    Oh, dear. We _are_ doomed to eternal suffering, after all: what Intel took from all these learning experiences was to never try to deviate from x86
    ever again!

    John Savard

  • From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 16:39:01 2026
    From Newsgroup: comp.arch

    On Sun, 03 May 2026 11:47:32 -0400, EricP wrote:

    Performance Effects of Architectural Complexity in the Intel 432, 1988
    https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

    Thank you very much for the informative link.

    John Savard
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon May 4 18:06:18 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    I've made one more little addition to the instruction set.

    I just had an epiphany wrt Concertina II

    Now the 48-bit long instructions can be placed within a 64-bit pair of 32-bit "instructions" that can be placed within code without headers.

    Headers are nothing more than mode-bits that change every block of code.
    This means that ISA will be exceptionally difficult to verify, and
    as you have found: very difficult to encode.

    At first, when I found the opcode space to do that with, it seemed there
    was a conflict which prevented these 64-bit encapsulated instructions from appearing in code with variable-length instructions.

    Yes, normally they wouldn't be needed there, since in that code, 48-bit instructions can be expressed in just 48 bits. But those existing 48-bit instructions require prefix bits to distinguish them from the regular
    32-bit instructions in the code. The encapsulation format has room for any arbitrary combination of 48 bits. So _additional_ 48-bit instructions
    which don't have the necessary prefix could be defined, which can only appear in encapsulated form in either kind of code. (So they're really additional 64-bit instructions, but I call them 48-bit because of the fact that they're placed in association with the real 48-bit instructions.)

    Fortunately, though, I was able to straighten things out and eliminate the conflict without doing violence to the bit-mappings in the instruction set.

    John Savard
  • From BGB@cr88192@gmail.com to comp.arch on Mon May 4 13:46:04 2026
    From Newsgroup: comp.arch

    On 5/4/2026 1:06 PM, MitchAlsup wrote:

    quadi <quadibloc@ca.invalid> posted:

    I've made one more little addition to the instruction set.

    I just had an epiphany wrt Concertina II

    Now the 48-bit long instructions can be placed within a 64-bit pair of 32-bit "instructions" that can be placed within code without headers.

    Headers are nothing more than mode-bits that change every block of code.
    This means that ISA will be exceptionally difficult to verify, and
    as you have found: very difficult to encode.


    I had an idea for a vaguely similar sort of feature in the 'F3 Block'
    that I had called DCB's (or "Dynamically Configurable Blocks") where
    encodings could be selected from a potentially open-ended set and mapped
    into the 32-bit space at runtime.

    I ended up putting it in the newer spec I am working on that these
    should not be used in statically compiled binaries (or, basically, configurable instructions that are only really allowed for JIT compiled
    code).


    Meanwhile, recently decided to give my XG3 ISA spec to CoPilot and see
    what it had to say about it.

    Apparently it decided to assert that it has a complexity more like
    x86-64 than it does like RISC-V.


    I will disagree partly, as to what extent I had worked with trying to
    emulate x86, or looking into writing a decoder in Verilog, it is a much
    bigger PITA.


    Even if, yes, when fully decked out (with all of the features enabled)
    it will effectively have some instructions with both an immediate and displacement, and a similar [Rb+Ri*Sc+Disp] addressing mode.


    It got more optimistic once I explained its role a little more, namely
    that it is not intended as a RISC-V replacement rather more as a way to
    have code (in a RISC-V based mode) that would have higher performance
    for certain types of tasks (like having an OpenGL implementation that
    isn't dead slow).

    ...


    At first, when I found the opcode space to do that with, it seemed there
    was a conflict which prevented these 64-bit encapsulated instructions from
    appearing in code with variable-length instructions.

    Yes, normally they wouldn't be needed there, since in that code, 48-bit
    instructions can be expressed in just 48 bits. But those existing 48-bit
    instructions require prefix bits to distinguish them from the regular
    32-bit instructions in the code. The encapsulation format has room for any
    arbitrary combination of 48 bits. So _additional_ 48-bit instructions
    which don't have the necessary prefix could be defined, which can only
    appear in encapsulated form in either kind of code. (So they're really
    additional 64-bit instructions, but I call them 48-bit because of the fact
    that they're placed in association with the real 48-bit instructions.)

    Fortunately, though, I was able to straighten things out and eliminate the
    conflict without doing violence to the bit-mappings in the instruction set.
    John Savard

  • From quadi@quadibloc@ca.invalid to comp.arch on Mon May 4 21:43:33 2026
    From Newsgroup: comp.arch

    On Mon, 04 May 2026 02:45:01 +0000, quadi wrote:

    And, of course, in the event that I ever do define any instructions in
    this category, I could always define a new header format in which a particular combination of bits in a prefix field indicates that the
    16-bit zone to which it corresponds contains the start of a 48-bit instruction and not one of any other length, which would then give them
    the right to be called 48-bit instructions.

    On a previous attempt, I couldn't find this post, so I started a new
    thread where I mention that, without defining any new 48-bit instructions
    yet, I did add - within an existing header - the option of indicating
    48-bit instructions for this purpose.

    John Savard
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Mon May 4 20:00:42 2026
    From Newsgroup: comp.arch

    Later, when MIPS processors became cheap enough for embedded
    computing, there probably were MIPS processors without FPU,

    Definitely: my ASUS WL-700gE (some kind of cheap "wifi router + NAS"
    from 20 years ago) had a MIPS processor and it lacked an FPU. I know
    because I was using it as a jukebox (running MusicPD) and I had to be
    extra careful to build it with the "idec" Vorbis decoder that was
    written specially to avoid floating point operations. Plus I had to be
    careful to avoid any resampling because that too tended to use the FPU.

    IIRC, FPU instructions were supported via traps, so software would still "work", but it was excruciatingly slow (unusable for real-time use such
    as media playback).


    === Stefan
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue May 5 04:33:53 2026
    From Newsgroup: comp.arch

    In connection with the changes described in these posts, I had needed to
    cut the opcode space available to operate instructions in half.
    In re-examining the opcode space available, I've found that I could have three-quarters instead of just half of the original opcode space
    available, and this let me add back much of what I had lost in that area.

    John Savard
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue May 5 14:02:39 2026
    From Newsgroup: comp.arch

    On Tue, 05 May 2026 04:33:53 +0000, quadi wrote:

    In connection with the changes described in these posts, I had needed to
    cut the opcode space available to operate instructions in half.
    In re-examining the opcode space available, I've found that I could have
    three-quarters instead of just half of the original opcode space
    available, and this let me add back much of what I had lost in that
    area.

    I've managed to add back one other thing I had previously had to remove:
    the alternate instruction format for the VLIW style of code.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Wed May 6 01:06:04 2026
    From Newsgroup: comp.arch

    On Sat, 02 May 2026 18:33:00 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:

    So accessing an array requires one basically to copy a base register
    value into another register, and add the index to it. That's an extra
    instruction.

    You are making assumptions that are not necessary. Why don't you spell
    them out.

    I was thinking about accessing one array element randomly in isolation.
    Later on, though, I realized that if one is stepping through an array
    sequentially in a loop, one register is indeed good enough, and thus saves
    an addition.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Wed May 6 01:10:39 2026
    From Newsgroup: comp.arch

    On Mon, 04 May 2026 18:06:18 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:

    I've made one more little addition to the instruction set.

    I just had an epiphany wrt Concertina II

    I keep always adding stuff, and it will never be finished?

    Perhaps. I've left a lot of room, now, in auxiliary opcode spaces, to add
    a bunch more stuff.

    But the main standard opcode space is now bursting at the seams, and yet
    despite that I've managed to put back a couple of things I had previously,
    with regret, had to remove to make space.

    So, while I've said it before, and it hasn't happened, it seems like I'm
    finally at the point where I can start fleshing out the design by listing
    the opcodes for the various instructions, and explaining the fancy data
    types.

    Whether or not I'm even capable of going beyond that to the next steps
    you've recommended is something to be seen later.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Wed May 6 17:19:26 2026
    From Newsgroup: comp.arch

    On Wed, 06 May 2026 01:10:39 +0000, quadi wrote:

    So, while I've said it before, and it hasn't happened, it seems like I'm
    finally at the point where I can start fleshing out the design by
    listing the opcodes for the various instructions, and explaining the
    fancy data types.

    After posting that, I ended up making just one more tiny addition... and
    noticed, when doing that, that there was a big addition that also should
    be provided for now, rather than later.

    The tiny addition - the U bit, so that 34-bit instructions, instead of
    just being memory-reference operate instructions that don't set the
    condition codes, can now be load/store instructions - but with 31 instead
    of 7 possible index registers.

    The big one? I had enough bits for a field with which to specify an
    alternate set of instructions to be used together with the existing ones
    in this batch of variable-length code headers.

    But now I have moved on - to a correction of an out-of-date diagram for
    the 48-bit instructions.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Wed May 6 18:59:40 2026
    From Newsgroup: comp.arch

    In article <10tai1v$3q9vl$1@dont-email.me>, quadibloc@ca.invalid (quadi)
    wrote:

    Oh, dear. We _are_ doomed to eternal suffering, after all: what
    Intel took from all these learning experiences was to never try to
    deviate from x86 ever again!

    They abandoned x86S
    <https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html>.
    But it won't be _eternal_ suffering. Eventually, x86 will become obsolete.

    John
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu May 7 23:15:46 2026
    From Newsgroup: comp.arch

    On Wed, 06 May 2026 18:58:00 +0100, John Dallman wrote:

    They abandoned x86S

    Which I thought was a _good_ idea, not a bad one. Because upwards
    compatibility with the huge pool of software out there is the only excuse
    for sticking with x86.

    John Savard
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri May 8 18:12:48 2026
    From Newsgroup: comp.arch


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-03 09:22, quadi wrote:
    On Sun, 15 Mar 2026 14:35:00 +0000, John Dallman wrote:

    iAPX 432 had instructions which weren't in whole bytes, and were
    addressed by bit offset in a segment. You could only have 64K
    instruction bits in a segment, or 8K bytes. The idea was that no
    subroutine or function ever needed to be bigger than that.

    It's worse than I thought.

    While the STRETCH had bit addressing, unlike the STRETCH this sounds
    genuinely perverse.

    John Savard

    Performance Effects of Architectural Complexity in the Intel 432, 1988
    https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf


    There was a CMU paper on 432 that stated if Intel had used 1 more pin
    that performance could have <about> doubled.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 8 20:11:26 2026
    From Newsgroup: comp.arch

    On Fri, 08 May 2026 18:12:48 +0000, MitchAlsup wrote:
    EricP <ThatWouldBeTelling@thevillage.com> posted:

    Performance Effects of Architectural Complexity in the Intel 432, 1988
    https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

    There was a CMU paper on 432 that stated if Intel had used 1 more pin
    that performance could have <about> doubled.

    Ouch! Well, these days they wouldn't make that kind of mistake again.

    John Savard



    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri May 8 16:55:07 2026
    From Newsgroup: comp.arch

    On 2026-May-08 14:12, MitchAlsup wrote:

    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-03 09:22, quadi wrote:
    On Sun, 15 Mar 2026 14:35:00 +0000, John Dallman wrote:

    iAPX 432 had instructions which weren't in whole bytes, and were
    addressed by bit offset in a segment. You could only have 64K
    instruction bits in a segment, or 8K bytes. The idea was that no
    subroutine or function ever needed to be bigger than that.

    It's worse than I thought.

    While the STRETCH had bit addressing, unlike the STRETCH this sounds
    genuinely perverse.

    John Savard

    Performance Effects of Architectural Complexity in the Intel 432, 1988
    https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf


    There was a CMU paper on 432 that stated if Intel had used 1 more pin
    that performance could have <about> doubled.

    It is still going to have to chew through gobs of microcode to do
    anything. I too had microcode on the brain back then. In 1976 I designed
    (but did not build) a microcoded cpu core using TTL AMD 2900 bit-slice
    components.

    Lately I have been playing around with circa 1975 TTL paper cpu designs
    but done in a pipelined risc style. The instructions must be variable
    length because memory was so expensive in 1975. The key is to not
    bottleneck in Fetch or Decode.

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
    using the same parts as are on the VAX-780 bill of materials or in
    the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12
    bytes long. That is long enough to hold a 4 byte instruction
    specifier (opcode + registers) plus 8 bytes of immediate data. I
    expect the average instruction to be ~3 bytes.

    My fetch unit paper design reads an 8-byte fetch block each clock
    into a 32 byte circular prefetch buffer. The parser rotates the whole
    32 byte buffer to align the instruction start with the length parser,
    and a PLA examines the first 12 instruction bits to get the length.
    It then validates that all the bytes are present in the buffer
    and passes the 1 to 12 bytes + IP virtual address to the Decoder board.
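
    The fetch-parse flow described above can be modelled in a few lines.
    This is only a behavioural sketch, not the TTL design: the length
    encoding is not given in the post, so length_of() below is a
    hypothetical stand-in for the PLA (it assumes the low 4 bits of the
    first byte hold length - 1), and a deque stands in for the rotating
    32 byte circular buffer.

```python
from collections import deque

FETCH_BLOCK = 8    # bytes read per clock (from the post)
BUFFER_SIZE = 32   # prefetch buffer capacity in bytes (from the post)
MAX_LEN = 12       # longest instruction in bytes (from the post)


def length_of(first_byte):
    """Hypothetical stand-in for the length PLA.

    The real PLA examines the first 12 instruction bits. The encoding is
    not published, so this sketch assumes the low 4 bits of the first
    byte hold (length - 1).
    """
    length = (first_byte & 0x0F) + 1
    if length > MAX_LEN:
        raise ValueError("illegal length encoding")
    return length


class FetchParser:
    def __init__(self, memory, start=0):
        self.memory = memory
        self.fetch_addr = start   # next address to read from memory
        self.ip = start           # address of the instruction at the head
        self.buffer = deque()     # head of the deque = aligned instruction start

    def clock(self):
        """One clock: refill if a whole block fits, then try to parse."""
        if len(self.buffer) + FETCH_BLOCK <= BUFFER_SIZE:
            block = self.memory[self.fetch_addr:self.fetch_addr + FETCH_BLOCK]
            self.buffer.extend(block)
            self.fetch_addr += len(block)
        if not self.buffer:
            return None
        length = length_of(self.buffer[0])
        if len(self.buffer) < length:
            return None           # not all bytes present yet: stall
        raw = bytes(self.buffer.popleft() for _ in range(length))
        ip, self.ip = self.ip, self.ip + length
        return ip, raw            # what would be handed to Decode


# Three instructions of 1, 3 and 12 bytes under the assumed encoding.
program = bytes([0x00, 0x02, 0xAA, 0xBB, 0x0B] + list(range(11)))
parser = FetchParser(program)
parsed = [parser.clock() for _ in range(4)]
```

    In this model each of the first three clocks hands one instruction
    to Decode, and the fourth returns None because the buffer is empty.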

    This can do a sustained 5 MHz parse of 1 variable instruction/clock,
    provided it hits the instruction cache. As most instructions are simple
    and take 1 clock to execute, it should do sustained 5 MIPS.
    Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
    It might also fit onto the same size PCB as the 780 used, about 15" x 15",
    but requires more edge connector pins for more buses (~300).
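
    The arithmetic behind those numbers, as a small sketch. All inputs
    are the figures quoted above; the worst-case line is an inference
    from them (a run of maximum-length instructions), not something the
    design has been measured on.

```python
CLOCK_HZ = 5_000_000          # 5 MHz clock
VAX_780_MIPS = 0.5            # rate quoted for the 780
FETCH_BYTES_PER_CLOCK = 8     # one fetch block per clock
AVG_INSTR_BYTES = 3           # expected average instruction length
MAX_INSTR_BYTES = 12          # longest instruction


def mips(clock_hz, instructions_per_clock):
    """Millions of instructions per second at a given clock and IPC."""
    return clock_hz * instructions_per_clock / 1e6


peak_mips = mips(CLOCK_HZ, 1.0)                 # 1 instruction per clock
speedup = peak_mips / VAX_780_MIPS              # relative to the 780

# How many instructions per clock the fetch bandwidth could supply.
avg_supply = FETCH_BYTES_PER_CLOCK / AVG_INSTR_BYTES
worst_supply = FETCH_BYTES_PER_CLOCK / MAX_INSTR_BYTES

print(f"peak: {peak_mips} MIPS, {speedup}x the 780")
print(f"fetch supply: {avg_supply:.2f} instr/clock average, "
      f"{worst_supply:.2f} worst case")
```

    At the expected ~3 byte average the 8-byte fetch block supplies
    more than one instruction per clock, so Fetch is not the limit;
    only a sustained run of 12 byte instructions would drop below 1.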

    On a separate board Decode will store the fetched instruction and
    feed it through a bank of PLA chips, which controls tri-state buffers
    to route signals into the Decode uOp output register.

    If built, my Fetch and Decode units could run 10x the speed of a 780,
    using the exact same parts but just designed from a non-microcode
    point of view.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri May 8 21:21:01 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    On 2026-May-08 14:12, MitchAlsup wrote:
    04Readings/I432.pdf


    There was a CMU paper on 432 that stated if Intel had used 1 more pin
    that performance could have <about> doubled.

    It is still going to have to chew through gobs of microcode to do
    anything. I too had microcode on the brain back then. In 1976 I designed
    (but did not build) a microcoded cpu core using TTL AMD 2900 bit-slice
    components.

    Lately I have been playing around with circa 1975 TTL paper cpu designs
    but done in a pipelined risc style. The instructions must be variable
    length because memory was so expensive in 1975. The key is to not
    bottleneck in Fetch or Decode.

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
    using the same parts as are on the VAX-780 bill of materials or in
    the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12
    bytes long. That is long enough to hold a 4 byte instruction
    specifier (opcode + registers) plus 8 bytes of immediate data. I
    expect the average instruction to be ~3 bytes.

    My fetch unit paper design reads an 8-byte fetch block each clock
    into a 32 byte circular prefetch buffer. The parser rotates the whole
    32 byte buffer to align the instruction start with the length parser,
    and a PLA examines the first 12 instruction bits to get the length.
    It then validates that all the bytes are present in the buffer
    and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

    This can do a sustained 5 MHz parse of 1 variable instruction/clock,
    provided it hits the instruction cache. As most instructions are simple
    and take 1 clock to execute, it should do sustained 5 MIPS.
    Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
    It might also fit onto the same size PCB as the 780 used, about 15" x 15",
    but requires more edge connector pins for more buses (~300).

    On a separate board Decode will store the fetched instruction and
    feed it through a bank of PLA chips, which controls tri-state buffers
    to route signals into the Decode uOp output register.

    If built my Fetch and Decode units could run 10x the speed of a 780,
    using the exact same parts but just designed from a non-microcode
    point of view.

    At what relative cost differential, i.e. what would a VAX-11/780
    have cost if it had been built using your design?
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat May 9 17:07:48 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    [reformatted to fit in 80-char lines with some slack for quote levels]
    Lately I have been playing around with circa 1975 TTL paper cpu
    designs but done in a pipelined risc style. The instructions must be
    variable length because memory was so expensive in 1975. The key is
    to not bottleneck in Fetch or Decode.

    Cool!

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
    using the same parts as are on the VAX-780 bill of materials or in
    the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12
    bytes long. That is long enough to hold a 4 byte instruction
    specifier (opcode + registers) plus 8 bytes of immediate data. I
    expect the average instruction to be ~3 bytes.

    Interesting that you got this to work at 1 IPC. And, as the Skymont
    shows, with enough resources such an instruction set can be made to
    work at 9 IPC (the decoders of the Skymont are 3x3 wide, the renamer
    is 8 wide).

    Still, the question is how much, if any, code density advantage this
    provides over something like RV32GC. In any case, given that RV32GC
    code is smaller than VAX code
    <2025Mar4.093916@mips.complang.tuwien.ac.at>, the code density of
    RV32GC should be good enough, and the decoder may then need less
    circuitry. I expect that your approach also gives some advantage in
    work/instruction, especially over RISC-V.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat May 9 19:41:10 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    Lately I have been playing around with circa 1975 TTL paper cpu designs
    but done in a pipelined risc style. The instructions must be variable
    length because memory was so expensive in 1975. The key is to not
    bottleneck in Fetch or Decode.

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
    using the same parts as are on the VAX-780 bill of materials or in
    the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12
    bytes long. That is long enough to hold a 4 byte instruction
    specifier (opcode + registers) plus 8 bytes of immediate data. I
    expect the average instruction to be ~3 bytes.

    That is quite impressive.

    Could you share some details of the ISA, number of registers, what
    do you use your 1-byte opcode for etc?

    My fetch unit paper design reads an 8-byte fetch block each clock
    into a 32 byte circular prefetch buffer. The parser rotates the whole
    32 byte buffer to align the instruction start with the length parser,
    and a PLA examines the first 12 instruction bits to get the length.
    It then validates that all the bytes are present in the buffer
    and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

    This can do a sustained 5 MHz parse of 1 variable instruction/clock,
    provided it hits the instruction cache. As most instructions are simple
    and take 1 clock to execute, it should do sustained 5 MIPS.
    Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
    It might also fit onto the same size PCB as the 780 used, about 15" x 15",
    but requires more edge connector pins for more buses (~300).

    On a separate board Decode will store the fetched instruction and
    feed it through a bank of PLA chips, which controls tri-state buffers
    to route signals into the Decode uOp output register.

    If built my Fetch and Decode units could run 10x the speed of a 780,
    using the exact same parts but just designed from a non-microcode
    point of view.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun May 10 08:59:27 2026
    From Newsgroup: comp.arch

    On 2026-May-08 17:21, Scott Lurndal wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    On 2026-May-08 14:12, MitchAlsup wrote:
    04Readings/I432.pdf


    There was a CMU paper on 432 that stated if Intel had used 1 more pin
    that performance could have <about> doubled.

    It is still going to have to chew through gobs of microcode to do
    anything. I too had microcode on the brain back then. In 1976 I designed
    (but did not build) a microcoded cpu core using TTL AMD 2900 bit-slice
    components.

    Lately I have been playing around with circa 1975 TTL paper cpu designs
    but done in a pipelined risc style. The instructions must be variable
    length because memory was so expensive in 1975. The key is to not
    bottleneck in Fetch or Decode.

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
    using the same parts as are on the VAX-780 bill of materials or in
    the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12
    bytes long. That is long enough to hold a 4 byte instruction
    specifier (opcode + registers) plus 8 bytes of immediate data. I
    expect the average instruction to be ~3 bytes.

    My fetch unit paper design reads an 8-byte fetch block each clock
    into a 32 byte circular prefetch buffer. The parser rotates the whole
    32 byte buffer to align the instruction start with the length parser,
    and a PLA examines the first 12 instruction bits to get the length.
    It then validates that all the bytes are present in the buffer
    and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

    This can do a sustained 5 MHz parse of 1 variable instruction/clock,
    provided it hits the instruction cache. As most instructions are simple
    and take 1 clock to execute, it should do sustained 5 MIPS.
    Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
    It might also fit onto the same size PCB as the 780 used, about 15" x 15",
    but requires more edge connector pins for more buses (~300).

    On a separate board Decode will store the fetched instruction and
    feed it through a bank of PLA chips, which controls tri-state buffers
    to route signals into the Decode uOp output register.

    If built my Fetch and Decode units could run 10x the speed of a 780,
    using the exact same parts but just designed from a non-microcode
    point of view.

    At what relative cost differential, i.e. what would a VAX-11/780
    have cost if it had been built using your design?

    Just eyeballing it, I'd say about the same.

    IIRC the 780 sold for $250k USD plus $100K for 256kB DRAM memory.
    The cpu core consisted of 20 pcb's, the optional FPU is 5 pcb's.
    Each board appears to be 15" x 15" with 6 edge connectors
    along the bottom with 2 rows of 20 pins, = 240 edge pins.
    Each board plugs into a specific slot in the backplane
    and the backplane does the board interconnect.
    The densest board appears to be the Writable Control Store
    which has about 16 rows by 16 columns of 16 pin DIPs.

    780 had a single I & D cache, 8kB 2 way assoc., 8 byte lines,
    write through, write no alloc, with DMA snooping and invalidates.
    Cache takes up 2 boards, 1 for tags, 1 for data.
    The TLB is 128 sets, 2 way assoc. PTE's.
    I can't find where it says main memory access cycle time
    just now but IIRC it was 1200 ns.
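
    For reference, the cache geometry above (8kB, 2 way, 8 byte lines)
    works out to 512 sets. A sketch of the usual tag/index/offset split
    follows; which address bits the 780 actually wired to the cache is
    an assumption here, this is just the textbook decomposition.

```python
CACHE_BYTES = 8 * 1024   # 8kB total
WAYS = 2                 # 2 way set associative
LINE_BYTES = 8           # 8 byte lines

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2 of the line size
INDEX_BITS = SETS.bit_length() - 1          # log2 of the number of sets


def split_address(addr):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(SETS, OFFSET_BITS, INDEX_BITS)
print(split_address(0x12345))
```

    A separate I-cache built from the same boards would use the same
    split, which is part of why it adds no design cost.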

    I would keep some boards the same or with small changes.
    I don't need the Writable Control Store, Prom Control Store,
    and Microsequencer boards.

    I would have a separate I-cache (2 boards) so it can fetch
    and execute at the same time, but it is the same boards as
    the D-cache so no design costs.

    TLB is mostly the same except page size is 4kB and page table is
    2 tables with 2 levels each, so page table walking is different.

    Bus interface to the main memory bus (SBI) is the same.
    As are the memory controllers and memory boards.

    The design difference is in Fetch, Decode, register file,
    and integer EXE stage, though the components would stay largely
    the same in type and number.
    32b ALU is 74181/74182 chips,
    32b barrel shifter is 74S350 (aka AMD 25S10's) chips.

    I don't know what 780 used for its integer multiplier
    (I can't find it in the schematics).
    It is possible to build a 1 clock 32b Wallace tree multiplier
    in TTL but it would be expensive.
    IIRC there were 16b flash multiplier chips available then
    so using one of those MULU would take 5 or 6 clocks.
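
    One way a single 16b multiplier could be time-shared for a 32b MULU
    is the schoolbook split into four 16x16 partial products. This is a
    sketch of that decomposition only; the mapping of four multiplier
    passes plus accumulate to "5 or 6 clocks" is an assumption, and
    mul16() is a hypothetical stand-in for the chip.

```python
MASK16 = 0xFFFF


def mul16(a, b):
    """Stand-in for one pass through a 16b x 16b -> 32b multiplier chip."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mulu32(a, b):
    """Unsigned 32b x 32b -> 64b product from four 16b partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    # (operand, operand, weight) for each pass through the multiplier
    passes = [
        (a_lo, b_lo, 0),
        (a_hi, b_lo, 16),
        (a_lo, b_hi, 16),
        (a_hi, b_hi, 32),
    ]
    acc = 0
    for x, y, shift in passes:
        acc += mul16(x, y) << shift   # accumulate at the right weight
    return acc


print(hex(mulu32(0xFFFFFFFF, 0xFFFFFFFF)))
```

    If only the low 32 bits of the result are needed, the a_hi x b_hi
    pass can be dropped, since its weight puts it entirely above bit 31.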

    I also need way more than 240 pcb edge connector pins because
    there are many more buses operating concurrently.
    350 or so would be nice.



    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun May 10 10:54:33 2026
    From Newsgroup: comp.arch

    On 2026-May-09 13:07, Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    [reformatted to fit in 80-char lines with some slack for quote levels]
    Lately I have been playing around with circa 1975 TTL paper cpu
    designs but done in a pipelined risc style. The instructions must be
    variable length because memory was so expensive in 1975. The key is
    to not bottleneck in Fetch or Decode.

    Cool!

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline
    using the same parts as are on the VAX-780 bill of materials or in
    the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12
    bytes long. That is long enough to hold a 4 byte instruction
    specifier (opcode + registers) plus 8 bytes of immediate data. I
    expect the average instruction to be ~3 bytes.

    Interesting that you got this to work at 1 IPC. And, as the Skymont
    shows, with enough resources such an instruction set can be made to
    work at 9 IPC (the decoders of the Skymont are 3x3 wide, the renamer
    is 8 wide).

    The design is for the Fetch and Decode stages, and it is a paper design.
    But those two stages each fit in a single pcb and would run at 1 IPC.
    The VAX usage stats show that around 80% of the integer instruction
    usage are simple, 2 source, 1 dest register, which with a 3 port 2R 1W
    register file would also execute in 1 clock.

    If on this model certain instructions need to stall the pipeline in
    some stages, eg a 5 clock MUL, I am fine with that as they are low usage.

    Still, the question is how much, if any, code density advantage this
    provides over something like RV32GC. In any case, given that RV32GC
    code is smaller than VAX code
    <2025Mar4.093916@mips.complang.tuwien.ac.at>, the code density of
    RV32GC should be good enough, and the decoder may then need less
    circuitry. I expect that your approach also gives some advantage in
    work/instruction, especially over RISC-V.

    - anton

    The high usage integer operate instructions have a 12b opcode
    and 3 x 4b register fields. A 4B instruction would waste 1B each.
    In 1975 memory was so expensive I didn't think that would go
    down well with customers. Also having immediate values avoids
    all the risc constant pasting instructions.
    The way I looked at this was that if I couldn't get a Fetch design
    for variable length instructions to fit on 1 board or work at
    1 IPC then I would have considered fixed 4B instructions.
    As it turns out variable length does work - the key is having
    a big alignment shifter that fits on the board.
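
    To make the 3 byte figure concrete: 12 + 3 x 4 = 24 bits. A sketch
    of packing such an instruction follows. The field order (opcode,
    then destination, then two sources) and the big-endian byte order
    are assumptions for illustration; the post does not specify them.

```python
OPCODE_BITS = 12
REG_BITS = 4
INSTR_BYTES = (OPCODE_BITS + 3 * REG_BITS) // 8   # 24 bits -> 3 bytes


def encode(opcode, rd, rs1, rs2):
    """Pack a 12b opcode and three 4b register numbers into 3 bytes."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 12 bits")
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < (1 << REG_BITS):
            raise ValueError("register number does not fit in 4 bits")
    word = (opcode << 12) | (rd << 8) | (rs1 << 4) | rs2
    return word.to_bytes(INSTR_BYTES, "big")


def decode(raw):
    """Unpack 3 bytes into (opcode, rd, rs1, rs2)."""
    if len(raw) != INSTR_BYTES:
        raise ValueError("expected exactly 3 bytes")
    word = int.from_bytes(raw, "big")
    return (word >> 12) & 0xFFF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


encoded = encode(0xABC, 1, 2, 3)
print(encoded.hex(), decode(encoded))
```

    Padding the same fields out to a fixed 4B word would leave the
    fourth byte unused, which is the 1B per instruction of waste.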


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun May 10 15:59:48 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    I don't know what 780 used for its integer multiplier
    (I can't find it in the schematics).

    Then I expect that it used microcode to do 1 multiplier bit per cycle.
    IIRC that was the approach that SPARC took at first, and that was also
    designed into HPPA (but then they added the FPU and multiplied by
    using the multiplier of the FPU).
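
    For illustration, a one-bit-per-cycle multiply is the classic
    shift-and-add loop. This is a generic sketch, not the 780 microcode
    or the actual SPARC multiply-step semantics; the cycle count simply
    counts loop iterations.

```python
def multiply_bit_serial(multiplicand, multiplier, width=32):
    """Unsigned shift-and-add multiply, one multiplier bit per iteration.

    Returns (product, cycles); cycles is always `width` in this sketch.
    """
    if not (0 <= multiplicand < (1 << width) and 0 <= multiplier < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    product = 0
    cycles = 0
    for _ in range(width):
        if multiplier & 1:            # examine the current multiplier bit
            product += multiplicand   # add the shifted multiplicand
        multiplicand <<= 1            # next bit has twice the weight
        multiplier >>= 1
        cycles += 1
    return product, cycles


print(multiply_bit_serial(6, 7))
```

    At one bit per cycle a 32b multiply costs on the order of 32
    cycles, compared with the 5 or 6 clocks estimated for a 16b
    multiplier chip.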

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun May 10 16:05:08 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    On 2026-May-09 13:07, Anton Ertl wrote:
    Interesting that you got this to work at 1 IPC. And, as the Skymont
    shows, with enough resources such an instruction set can be made to
    work at 9 IPC (the decoders of the Skymont are 3x3 wide, the renamer
    is 8 wide).

    The design is for the Fetch and Decode stages, and it is a paper design.
    But those two stages each fit in a single pcb and would run at 1 IPC.

    Yes, my ramblings were about the future-proofness of such an
    instruction set design.

    Still, the question is how much, if any, code density advantage this
    provides over something like RV32GC. In any case, given that RV32GC
    code is smaller than VAX code
    <2025Mar4.093916@mips.complang.tuwien.ac.at>, the code density of
    RV32GC should be good enough, and the decoder may then need less
    circuitry. I expect that your approach also gives some advantage in
    work/instruction, especially over RISC-V.

    - anton

    The high usage integer operate instructions have a 12b opcode
    and 3 x 4b register fields. A 4B instruction would waste 1B each.

    The compressed instructions (the "C" in RV32GC) of RISC-V take 16bits
    and are a subset of the regular instructions (i.e., if no appropriate
    compressed instruction exists, you just use the regular instruction).
    The compressed instructions typically use one specifier for a source
    and a destination, eliminating one register specifier, and in many
    cases only have 3-bit register specifiers (for r8-r15, ordinary RISC-V
    has a 5-bit register specifier). In some cases a register is fixed
    (requiring 0b for the specifier). The immediate operands are also
    shorter.
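
    Two of these points can be shown in a short sketch: how a RISC-V
    decoder tells a 16-bit instruction from a 32-bit one (the two
    lowest bits), and how a 3-bit register specifier maps onto r8-r15.
    The function names are made up for this example, and encodings
    longer than 32 bits are deliberately not handled.

```python
def instruction_length(first_halfword):
    """Length in bytes of the instruction starting with this 16-bit parcel."""
    if (first_halfword & 0b11) != 0b11:
        return 2                       # compressed (C extension) instruction
    if ((first_halfword >> 2) & 0b111) != 0b111:
        return 4                       # regular 32-bit instruction
    raise ValueError("encoding longer than 32 bits, not handled in this sketch")


def compressed_register(field3):
    """Map a 3-bit compressed register specifier to its register number."""
    if not 0 <= field3 <= 7:
        raise ValueError("specifier must fit in 3 bits")
    return 8 + field3                  # 0..7 select r8..r15


print(instruction_length(0x4501))      # low bits 01: a compressed instruction
print(instruction_length(0x0013))      # low halfword of addi x0, x0, 0
print([compressed_register(f) for f in range(8)])
```

    The length is known from 2 bits of the first parcel, against the
    12 bits the PLA examines in the design discussed here.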

    A 12b opcode looks like a lot to me. I have typically seen 6b on
    RISCs, with some opcodes using an auxiliary field for further
    refinement.

    The disadvantage of RV32GC for your project would be that there is all
    this wasted encoding space in the regular and compressed instructions
    for the additional instructions of RV64GC.

    OTOH, if you traveled back in time and the architecture was a success,
    that would result in an easy step to extend the architecture to 64
    bits.

    Then again, as AMD64 and ARM A64 demonstrate, doing a separate,
    binary-incompatible 64-bit instruction set can work well. And the fact
    that for MIPS, SPARC, HPPA and Power the 32-bit instruction set is
    just a subset of the 64-bit instruction set has not been a decisive
    factor for their success.

    In 1975 memory was so expensive I didn't think that would go
    down well with customers.

    Supporting 1B and 3B instructions in addition to RV32GC's 2B and 4B
    instructions could help increase code density further, but makes it
    more costly to build later CPUs with IPC>1.

    Also having immediate values avoids
    all the risc constant pasting instructions.

    Yes, and this increases the work/instruction. The question is if the
    benefit is worth the cost. The RISC designers, even those with
    variable-width instructions, decided against this feature.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Sun May 10 16:23:00 2026
    From Newsgroup: comp.arch

    EricP [2026-05-10 08:59:27] wrote:
    On 2026-May-08 17:21, Scott Lurndal wrote:
    At what relative cost differential, i.e. what would a VAX-11/780
    have cost if it had been built using your design?
    Just eyeballing it, I'd say about the same.
    [...]
    I also need way more than 240 pcb edge connector pins because
    there are many more buses operating concurrently.
    350 or so would be nice.

    So, the question comes down to how much would it cost to increase the
    240 pins to, say, 360.
    [ My gut feeling is that it could be (have been) fairly expensive. ]


    === Stefan
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Mon May 11 07:47:53 2026
    From Newsgroup: comp.arch

    Stefan Monnier <monnier@iro.umontreal.ca> writes:
    So, the question comes down to how much would it cost to increase the
    240 pins to, say, 360.
    [ My gut feeling is that it could be (have been) fairly expensive. ]

    I am not an expert, but it seems to me that 120 extra pins on the
    inter-board connectors would cost about as much as 120 pins of regular
    (socketed) chips per board. So if EricP can eliminate that many chips
    or maybe a board, the cost should be the same. If not, it would be a
    few percent higher. In absolute numbers, I expect that the increase
    in cost would be <$1000.

    The other question, of course, is, how much DEC could have charged for
    a machine that is 10x faster than a VAX 11/780. I guess that they
    could easily have recouped the additional cost, if any.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Mon May 11 09:02:23 2026
    From Newsgroup: comp.arch

    On 2026-May-10 16:23, Stefan Monnier wrote:
    EricP [2026-05-10 08:59:27] wrote:
    On 2026-May-08 17:21, Scott Lurndal wrote:
    At what relative cost differential, i.e. what would a VAX-11/780
    have cost if it had been built using your design?
    Just eyeballing it, I'd say about the same.
    [...]
    I also need way more than 240 pcb edge connector pins because
    there are many more buses operating concurrently.
    350 or so would be nice.

    So, the question comes down to how much would it cost to increase the
    240 pins to, say, 360.
    [ My gut feeling is that it could be (have been) fairly expensive. ]


    === Stefan

    Yes, that's why I mention it.
    The problem is that as the # of pins goes up, so does the insertion friction.
    At some point, which I don't know, the insertion force is high enough that
    it can exceed the crush strength of the connector contact.
    The contact gets crushed into the bottom of the connector and it is useless.

    On a bus like a PC bus where a card can go in any slot,
    you would (hopefully) just move to another slot.
    But on a backplane design like the 780 where cards must go in
    specific slots, you just trashed your whole backplane and now
    have to disassemble the machine to replace it.

    There are connectors called Zero Insertion Force connectors
    where you put the card in, then twist a screw or something and
    the connector closes like a clamp on the card contacts.

    I did a quicky search to see if there were anything like what
    I need, just to get the price, but could not find anything.
    If they exist I imagine they are considerably more expensive
    but each cpu only needs 25 of them.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Mon May 11 16:21:34 2026
    From Newsgroup: comp.arch

    EricP wrote:
    On 2026-May-10 16:23, Stefan Monnier wrote:
    EricP [2026-05-10 08:59:27] wrote:
    On 2026-May-08 17:21, Scott Lurndal wrote:
    At what relative cost differential, i.e. what would a VAX-11/780
    have cost if it had been built using your design?
    Just eyeballing it, I'd say about the same.
    [...]
    I also need way more than 240 pcb edge connector pins because
    there are many more buses operating concurrently.
    350 or so would be nice.

    So, the question comes down to how much would it cost to increase the
    240 pins to, say, 360.
    [ My gut feeling is that it could be (have been) fairly expensive. ]


    === Stefan

    Yes, that's why I mention it.
    The problem is that as the # of pins goes up, so does the insertion friction.
    At some point, which I don't know, the insertion force is high enough that
    it can exceed the crush strength of the connector contact.
    The contact gets crushed into the bottom of the connector and it is
    useless.

    On a bus like a PC bus where a card can go in any slot,
    you would (hopefully) just move to another slot.
    But on a backplane design like the 780 where cards must go in
    specific slots, you just trashed your whole backplane and now
    have to disassemble the machine to replace it.

    There are connectors called Zero Insertion Force connectors
    where you put the card in, then twist a screw or something and
    the connector closes like a clamp on the card contacts.

    I've seen those both on EPROMs and CPUs, quite nice actually.

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Mon May 11 15:35:40 2026
    From Newsgroup: comp.arch

    In article <10tj6f2$2frpj$1@dont-email.me>, quadibloc@ca.invalid (quadi)
    wrote:

    On Wed, 06 May 2026 18:58:00 +0100, John Dallman wrote:
    They abandoned x86S
    Which I thought was a _good_ idea, not a bad one. Because upwards
    compatibility with the huge pool of software out there is the only
    excuse for sticking with x86.

    It was a plausible concept, but as best I remember from reading the white
    paper, they were fuzzy about just how much 32-bit software it would run,
    if you weren't an expert on x86 operating modes and memory models.

    If it had been "16-bit goes, all 32-bit stays, including operating
    systems" they'd likely have got a lot more buy-in. But would that have
    been a worthwhile simplification?

    John
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon May 11 18:16:00 2026
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Stefan Monnier <monnier@iro.umontreal.ca> writes:
    So, the question comes down to how much would it cost to increase the
    240 pins to, say, 360.
    [ My gut feeling is that it could be (have been) fairly expensive. ]

    My guess (GUESS) is that packages have 4 cost components:
    a) package area
    b) die area
    c) power dissipation capability
    d) pin count

    So adding 50% more pins keeping the other 3 fixed would add 25%/4 = 6%
    but it would similarly add about 12% to the connector by which the
    package is attached to its motherboard and some additional MB costs.

    I am not an expert, but it seems to me that 120 extra pins on the
    inter-board connectors would cost about as much as 120 pins of regular
    (socketed) chips per board. So if EricP can eliminate that many chips
    or maybe a board, the cost should be the same. If not, it would be a
    few percent higher. In absolute numbers, I expect that the increase
    in cost would be <$1000.

    It never ceased to amaze me that FABs are built at costs of $20B+ to
    manufacture $10 chips, while packages come from $100M plants making
    $10 packages that accept 1 chip.

    The other question, of course, is, how much DEC could have charged for
    a machine that is 10x faster than a VAX 11/780. I guess that they
    could easily have recouped the additional cost, if any.

    - anton
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Mon May 11 14:44:06 2026
    From Newsgroup: comp.arch

    On 2026-May-09 15:41, Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    Lately I have been playing around with circa 1975 TTL paper cpu designs but
    done in a pipelined risc style. The instructions must be variable length because
    memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
    parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12 bytes long.
    That is long enough to hold a 4 byte instruction specifier (opcode + registers)
    plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

    That is quite impressive.

    Could you share some details of the ISA, number of registers, what
    do you use your 1-byte opcode for etc?

    My fetch unit paper design reads an 8-byte fetch block each clock
    into a 32 byte circular prefetch buffer. The parser rotates the whole
    32 byte buffer to align the instruction start with the length parser,
    and a PLA examines the first 12 instruction bits to get the length.
    It then validates that all the bytes are present in the buffer
    and passes the 1 to 12 bytes + IP virtual address to the Decoder board.
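
    The buffering described above can be sketched in software. This is only a
    behavioural model under stated assumptions, not the TTL design: the class
    name, method names and the stall-by-returning-None convention are all
    hypothetical, and the instruction length is supplied by the caller rather
    than by a PLA. It also lets fetch blocks land at any byte offset, which
    real hardware would probably not do.

```python
class PrefetchBuffer:
    """Behavioural model of a 32-byte circular prefetch buffer (hypothetical names)."""

    SIZE = 32      # bytes held in the circular buffer
    BLOCK = 8      # bytes delivered by one fetch
    MAX_LEN = 12   # longest instruction in bytes

    def __init__(self):
        self.data = bytearray(self.SIZE)
        self.head = 0    # index of the first byte of the next instruction
        self.count = 0   # number of valid bytes starting at head

    def can_fill(self):
        """True if there is room for one more 8-byte fetch block."""
        return self.SIZE - self.count >= self.BLOCK

    def fill(self, block):
        """Append one fetch block after the last valid byte, wrapping around."""
        if len(block) != self.BLOCK or not self.can_fill():
            raise ValueError("need exactly one block and room to store it")
        tail = (self.head + self.count) % self.SIZE
        for i, byte in enumerate(block):
            self.data[(tail + i) % self.SIZE] = byte
        self.count += self.BLOCK

    def take(self, length):
        """Return the next instruction if all its bytes are present, else None."""
        if not 1 <= length <= self.MAX_LEN or length > self.count:
            return None  # models a stall: wait for another fetch block
        # Reading from head with wrap-around plays the role of the rotate
        # that aligns the instruction start with the length parser.
        insn = bytes(self.data[(self.head + i) % self.SIZE] for i in range(length))
        self.head = (self.head + length) % self.SIZE
        self.count -= length
        return insn
```

    In this model a take() that returns None stands for the case where the
    parser knows the length but not all bytes have arrived yet.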

    This can do a sustained 5 MHz parse of 1 variable instruction/clock,
    provided it hits the instruction cache. As most instructions are simple
    and take 1 clock to execute, it should do sustained 5 MIPS.
    Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
    It might also fit onto the same size PCB as the 780 used, about 15" x 15",
    but requires more edge connector pins for more buses (~300).

    On a separate board Decode will store the fetched instruction and
    feed it through a bank of PLA chips, which controls tri-state buffers
    to route signals into the Decode uOp output register.

    If built my Fetch and Decode units could run 10x the speed of a 780,
    using the exact same parts but just designed from a non-microcode
    point of view.

    To answer your second question first,
    I have 3 instructions that must be sized at 1 instruction granule,
    which in this case is 1 byte, as they can occur on an arbitrary boundary:
    ILLG Illegal is opcode 0, causes an Illegal instruction fault
    NOP No Operation
    BRKP Breakpoint

    ILLG is used to pad non-executing space, NOP pads executing space,
    BRKP can be deposited by a debugger at any instruction start.

    Other instructions may be 1 granule, only these 3 must be.

    The goal of my exercise was to see if Fetch and Decode units could
    be designed with circa 1975 TTL that execute at 5 MHz getting 1 IPC.
    The ISA is just a prototype to set some boundaries so that the
    Fetch and Decode designs have something fixed to shoot for.

    The Fetch unit cares little what the internal format of an instruction is.
    It looks at no more than the first 12 bits to determine length.
    It only cares how many bytes are in the instruction and are they all
    present in the prefetch register. If they are Fetch copies them to
    the Decode input instruction register. It is Decode that looks at the
    internal format of instructions and translates them into a uOp.

    Provided Fetch and Decode meet their performance goal,
    the instructions and their formats would be chosen such
    that they are compatible with any limitations on those units.

    The ISA is 32 bit integer and virtual address space.
    Instruction Pointer register,
    16 x 32b integer registers, 16 x 64b float registers.
    No integer condition codes.

    I wanted variable length instructions so it could have large immediates.
    12 bytes was large enough to hold a 4B opcode and register numbers
    with an 8B immediate that is either 1 fp64
    or 2 int32 for compare and branch.

    The Fetch unit parser uses a Signetics 82S100 FPLA to determine
    the instruction length. The 82S100 has 16 inputs, 8 outputs,
    and can match 48 product terms with 0, 1 or x dont-care bits.
    The parser feeds the first 12 bits from the first 2 bytes plus
    their 2 byte Valid flags and an error signal into the 82S100.
    There is one input spare for future use.
    The PLA outputs a 4b length from 0 to 12, plus 2 status bits
    indicating the length is valid, or fetch unit stalled,
    or fetch error, either page fault exception or HW error.
    If the parse length is valid then fetch checks that all the
    bytes of the instruction length are also valid.
    If they are it passes the instruction to Decode input register.

    To give an example, to encode the 3 1B instructions I define
    (0 matches a 0, 1 matches a 1, x = a dont-care bit):

    ILLG = bxxxx_0000_0000
    NOP = bxxxx_1000_0000
    BRKP = bxxxx_0100_0000
    spare = bxxxx_1100_0000

    Notice bits [5:0] of all are 0. So I define a PLA pattern
    that matches that and spits out the length 1

    11
    1098_7654_3210
    xxxx_xx00_0000 => 1

    Now I want low usage frequency instructions like SYSENTER, SYSEXIT
    that have no registers but I don't want to use up all my primary
    byte code space so I continue in the second byte.

    11
    1098_7654_3210
    0000_0010_0000 => 2

    Now I want 3 byte instructions for the high usage 3 operand instructions
    like ADD, SUB, AND, OR, etc. with 3 x 4b register fields.
    There are many of them, let's say 64 or 4 groups of 16.

    11
    1098_7654_3210
    xxxx_xx01_0000 => 3

    So all of the integer and float 3 register instructions begin
    with the bits [5:0] = 01_0000, and bits [11:6] select individual
    instructions.

    Now I go back and define exactly what those instructions are:

    ADD Add Rd1 = Rs2 + Rs3
    ADDFS Add fault signed overflow Rd1 = Rs2 + Rs3
    ADDFU Add fault unsigned overflow Rd1 = Rs2 + Rs3

    SUB Subtract Rd1 = Rs2 - Rs3
    SUBFS Subtract fault signed overflow Rd1 = Rs2 - Rs3
    SUBFU Subtract fault unsigned overflow Rd1 = Rs2 - Rs3
    ...

    Now I'll do conditional branches.
    Conditional branches need a 4b register to test,
    a 3b condition code to test for (EZ = equal zero, NZ = not zero, ...)
    and either a short byte or long 32b word offset.

    But wait... a 12b opcode, 4b register and a 1B offset
    is a 3B instruction so I can take all the short conditional
    branches and merge them with the other 3B instruction groups.

    That leaves long branches which have a 12b opcode, 4b register,
    and 4B offset and a length of 6 bytes.

    11
    1098_7654_3210
    xxx0_0011_0000 => 6

    But wait... a 12b opcode, 4b register and 4B immediate is the
    same length as an ADDIW add immediate word with a single
    source-dest register. So I make the long branch group bigger
    by moving the dont-care point:

    11
    1098_7654_3210
    xxxx_x011_0000 => 6

    and now define all those instructions.
    And so on.
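
    The pattern matching above can be modelled in a few lines. This is a
    sketch only, assuming just the four patterns quoted in this post; the
    function names and the return value of 0 for "no match" are illustrative
    choices, not part of the design. A real 82S100 ORs its product terms
    rather than testing them in order, but these four patterns are mutually
    exclusive, so the result is the same.

```python
# Ternary patterns over instruction bits [11:0], written MSB first.
# '0' and '1' must match, 'x' is a don't-care. Copied from the examples above.
PATTERNS = [
    ("xxxx_xx00_0000", 1),  # ILLG, NOP, BRKP, spare
    ("0000_0010_0000", 2),  # low-usage group continued in the second byte
    ("xxxx_xx01_0000", 3),  # 3-register ops and short conditional branches
    ("xxxx_x011_0000", 6),  # long branches and 4-byte-immediate ops
]


def compile_pattern(text):
    """Turn a ternary pattern string into a (mask, value) pair of integers."""
    bits = text.replace("_", "")
    mask = int("".join("0" if b == "x" else "1" for b in bits), 2)
    value = int("".join("1" if b == "1" else "0" for b in bits), 2)
    return mask, value


TERMS = [(compile_pattern(pattern), length) for pattern, length in PATTERNS]


def instruction_length(first12):
    """Return the instruction length in bytes, or 0 if no term matches."""
    for (mask, value), length in TERMS:
        if (first12 & mask) == value:
            return length
    return 0
```

    Moving the don't-care point, as in the last step above, only changes the
    mask of one term; the number of product terms used stays the same.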


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon May 11 20:51:39 2026
    From Newsgroup: comp.arch


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-09 15:41, Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    Lately I have been playing around with circa 1975 TTL paper cpu designs but
    done in a pipelined risc style. The instructions must be variable length because
    memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
    parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

    The instructions are byte granular, variable length from 1 to 12 bytes long.
    That is long enough to hold a 4 byte instruction specifier (opcode + registers)
    plus 8 bytes of immediate data. I expect the average instruction to be ~3 bytes.

    That is quite impressive.

    Could you share some details of the ISA, number of registers, what
    do you use your 1-byte opcode for etc?

    My fetch unit paper design reads an 8-byte fetch block each clock
    into a 32 byte circular prefetch buffer. The parser rotates the whole
    32 byte buffer to align the instruction start with the length parser,
    and a PLA examines the first 12 instruction bits to get the length.
    It then validates that all the bytes are present in the buffer
    and passes the 1 to 12 bytes + IP virtual address to the Decoder board.

    This can do a sustained 5 MHz parse of 1 variable instruction/clock,
    provided it hits the instruction cache. As most instructions are simple
    and take 1 clock to execute, it should do sustained 5 MIPS.
    Remember the 780 was 5 MHz but only executes at 0.5 MIPS.
    It might also fit onto the same size PCB as the 780 used, about 15" x 15",
    but requires more edge connector pins for more buses (~300).

    On a separate board Decode will store the fetched instruction and
    feed it through a bank of PLA chips, which controls tri-state buffers
    to route signals into the Decode uOp output register.

    If built my Fetch and Decode units could run 10x the speed of a 780,
    using the exact same parts but just designed from a non-microcode
    point of view.

    To answer your second question first,
    I have 3 instructions that must be sized at 1 instruction granule,
    which in this case is 1 byte, as can occur on arbitrary boundary:
    ILLG Illegal is opcode 0, causes an Illegal instruction fault
    NOP No Operation
    BRKP Breakpoint

    ILLG is used to pad non-executing space, NOP pads executing space,
    BRKP can be deposited by a debugger at any instruction start.

    Other instructions may be 1 granule, only these 3 must be.

    Point of order: these 3 only have to be the smallest granule of
    the ISA (not necessarily 1 byte).

    The goal of my exercise was to see if Fetch and Decode units could
    be designed with circa 1975 TTL that execute at 5 MHz getting 1 IPC.
    The ISA is just a prototype to set some boundaries so that the
    Fetch and Decode designs have something fixed to shoot for.

    Laudable.

    The Fetch unit cares little what the internal format of an instruction is.
    It looks at no more than the first 12 bits to determine length.
    It only cares how many bytes are in the instruction and are they all
    present in the prefetch register. If they are Fetch copies them to
    the Decode input instruction register.

    This is the stage I call PARSE. Fetch is responsible for presenting
    cache sub-lines to PARSE; PARSE slices instructions out of the words
    and presents parsed instructions to multiple parallel DECODE units.

    It is Decode that looks at the
    internal format of instructions and translates them into a uOp.

    Provided Fetch and Decode meet their performance goal,
    the instructions and their formats would be chosen such
    that they are compatible with any limitations on those units.

    The ISA is 32 bit integer and virtual address space.
    Instruction Pointer register,
    16 x 32b integer registers, 16 x 64b float registers.
    No integer condition codes.

    Program status {word to line} ??
    Root pointer ??
    ...

    I wanted variable length instructions so it could have large immediates.
    12 bytes was large enough to hold a 4B opcode and register numbers
    with an 8B immediate that is either 1 fp64
    or 2 int32 for compare and branch.

    After dropping ST #largeconst,[Rbase+Rindex<<scale+largeDISP]
    My 66000 went from 5 word max down to 3 word max and the inst-
    length decoder went from 40-gates (4 gates of delay) down to
    5 gates and 2 gates of delay--at a measured cost of 0.27% more
    instructions.

    A good tradeoff !

    The Fetch unit parser uses a Signetics 82S100 FPLA to determine
    the instruction length. The 82S100 has 16 inputs, 8 outputs,
    and can match 48 product terms with 0, 1 or x dont-care bits.
    The parser feeds the first 12 bits from the first 2 bytes plus
    their 2 byte Valid flags and an error signal into the 82S100.
    There is one input spare for future use.
    The PLA outputs a 4b length from 0 to 12, plus 2 status bits
    indicating the length is valid, or fetch unit stalled,
    or fetch error, either page fault exception or HW error.
    If the parse length is valid then fetch checks that all the
    bytes of the instruction length are also valid.
    If they are it passes the instruction to Decode input register.

    I should mention that the DECODE unit of Mc 88100 was a single
    NOR plane {while a PLA is 2 NOR planes}.

    To give an example, to encode the 3 1B instructions I define
    (0 matches a 0, 1 matches a 1, x = a dont-care bit):

    ILLG = bxxxx_0000_0000
    NOP = bxxxx_1000_0000
    BRKP = bxxxx_0100_0000
    spare = bxxxx_1100_0000

    Notice bits [5:0] of all are 0. So I define a PLA pattern
    that matches that and spits out the length 1

    Not bad!

    11
    1098_7654_3210
    xxxx_xx00_0000 => 1

    I don't see the PLA calculation in the above--can you explain ?

    Now I want low usage frequency instructions like SYSENTER, SYSEXIT
    that have no registers but I don't want to use up all my primary
    byte code space so I continue in the second byte.

    In comparison: My 66000 SVC (sysenter) SVR (sysexit) use the Rd
    field as a count of the number of registers to copy from {actually
    I just do not load over these} caller/returner to called/returned.
    This allows completely separate register files at each privilege
    level and the same SBI as ABI (98%-ile). My 66000 Linux has 0-6
    argument registers to SVC and 0-2 result registers from SVR. All
    more privileged registers from returner are overwritten by lesser
    privileged registers for returned--avoiding information spillage.

    11
    1098_7654_3210
    0000_0010_0000 => 2

    Now I want 3 byte instructions for the high usage 3 operand instructions
    like ADD, SUB, AND, OR, etc. with 3 x 4b register fields.
    There are many of them, lets say 64 or 4 groups of 16.

    11
    1098_7654_3210
    xxxx_xx01_0000 => 3

    So all of the integer and float 3 register instructions begin
    with the bits [5:0] = 01_0000, and bits [11:6] select individual
    instructions.

    Now I go back and define exactly what those instructions are:

    ADD Add Rd1 = Rs2 + Rs3
    ADDFS Add fault signed overflow Rd1 = Rs2 + Rs3
    ADDFU Add fault unsigned overflow Rd1 = Rs2 + Rs3

    SUB Subtract Rd1 = Rs2 - Rs3
    SUBFS Subtract fault signed overflow Rd1 = Rs2 - Rs3
    SUBFU Subtract fault unsigned overflow Rd1 = Rs2 - Rs3
    ...
    So, Mil-like

    Now I'll do conditional branches.
    Conditional branches need a 4b register to test,
    a 3b condition code to test for (EZ = equal zero, NZ = not zero, ...)
    and either a short byte or long 32b word offset.

    16-bits<<2 cover over // well we have not yet compiled a subroutine
    that exceeds this flow-control range within a subroutine--so, for
    all practical purposes 100%.

    For unconditional branches, 26-bits<<2 covers all known benchmarks
    but fails on several DataBase code arrangements {minuscule %}.

    Byte sized instruction granules cannot use the <<2 portion and
    will suffer more "issues".

    But wait... a 12b opcode, 4b register and a 1B offset
    is a 3B instruction so I can take all the short conditional
    branches and merge them with the other 3B instruction groups.

    That leaves long branches which have a 12b opcode, 4b register,
    and 4B offset and a length of 6 bytes.

    11
    1098_7654_3210
    xxx0_0011_0000 => 6

    We found that by dropping the conditionality it fits in 4-bytes
    at the cost of 4-effective bits of displacement--where you are
    paying 16-bits for it.

    But wait... a 12b opcode, 4b register and 4B immediate is the
    same length as an ADDIW add immediate word with a single
    source-dest register. So I make the long branch group bigger
    by moving the dont-care point:

    11
    1098_7654_3210
    xxxx_x011_0000 => 6

    and now define all those instructions.
    And so on.

    Cute...
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue May 12 07:54:48 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    There are connectors called Zero Insertion Force connectors
    where you put the card in, then twist a screw or something and
    the connector closes like a clamp on the card contacts.

    I did a quicky search to see if there were anything like what
    I need, just to get the price, but could not find anything.
    If they exist I imagine they are considerably more expensive
    but each cpu only needs 25 of them.

    There are a couple of basic patents from just around that timeframe,
    e.g. https://patents.google.com/patent/US4080032A (for
    chips, not for circuit boards) which was laid-open in 1978.
    https://patents.google.com/patent/US4189200A was filed in 1978.
    Seems they were just a bit too late for introduction with your
    VAX alternative.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue May 12 15:58:07 2026
    From Newsgroup: comp.arch

    On Mon, 11 May 2026 15:34:00 +0100, John Dallman wrote:

    If it had been "16-bit goes, all 32-bit stays, including operating
    systems" they'd likely have got a lot more buy-in. But would that have
    been a worthwhile simplification?

    As far as I'm concerned, they made the same mistake with EM64T as they did
    with the 80286. Not only should 16-bit stay, but they should have had it
    so that 16-bit was as easily accessible from 64-bit as it was from 32-bit,
    so as to retain all 16-bit software running perfectly and transparently in
    64-bit editions of Windows.

    Upwards compatibility, where software is not open-source, so users are
    dependent on binaries running as-is without recompilation, should be
    regarded as absolutely mandatory. Total upwards compatibility.

    John Savard
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue May 12 18:16:05 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> writes:
    On Mon, 11 May 2026 15:34:00 +0100, John Dallman wrote:

    If it had been "16-bit goes, all 32-bit stays, including operating
    systems" they'd likely have got a lot more buy-in. But would that have
    been a worthwhile simplification?

    As far as I'm concerned, they made the same mistake with EM64T as they did
    with the 80286. Not only should 16-bit stay, but they should have had it
    so that 16-bit was as easily accessible from 64-bit as it was from 32-bit,
    so as to retain all 16-bit software running perfectly and transparently in
    64-bit editions of Windows.

    As far as I'm concerned, the sooner they get rid of all the 32-bit
    legacy crap (e.g. segments, booting through real-mode, protected-mode
    and long-mode), the better.

    x86S is the future...

    https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Tue May 12 15:35:22 2026
    From Newsgroup: comp.arch

    On 2026-May-10 11:59, Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    I don't know what 780 used for its integer multiplier
    (I can't find it in the schematics).

    Then I expect that it used microcode to do 1 multiplier bit per cycle.
    IIRC that was the approach that SPARC took at first, and that was also
    designed into HPPA (but then they added the FPU and multiplied by
    using the multiplier of the FPU).

    - anton

    I think so. There is one line in the hardware manual that says
    the optional FPU also "enhances the performance of integer multiply".

    The FPU technical manual describes how the FPU does its multiplies.
    It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
    in a loop, and sums the partial products.
    There are two sets of ROMs so it can interleave the lookups.
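
    As a rough illustration of that nibble-at-a-time scheme, here is a sketch
    that multiplies using nothing but a 16 x 16 lookup table and additions.
    It assumes unsigned operands and models only the arithmetic: the 2-way
    ROM interleaving, the saved carries and the pipeline timing of the real
    FPU are not represented, and the names are made up for the example.

```python
# A 4-bit x 4-bit product table, standing in for the multiplier ROMs.
ROM_4X4 = [[a * b for b in range(16)] for a in range(16)]


def multiply_by_nibbles(multiplicand, multiplier, width=32):
    """Multiply two unsigned width-bit values using only ROM lookups and adds."""
    mask = (1 << width) - 1
    multiplicand &= mask
    multiplier &= mask
    nibbles = width // 4
    product = 0
    # Walk the multiplier one nibble at a time, least significant first.
    for i in range(nibbles):
        m_nib = (multiplier >> (4 * i)) & 0xF
        partial = 0
        # Each multiplier nibble operates on every nibble of the multiplicand.
        for j in range(nibbles):
            c_nib = (multiplicand >> (4 * j)) & 0xF
            partial += ROM_4X4[c_nib][m_nib] << (4 * j)
        # Accumulate the partial product at the weight of this nibble.
        product += partial << (4 * i)
    return product
```

    With width=32 the outer loop runs 8 times, one pass per multiplier
    nibble, which is why the hardware version benefits from pipelining the
    lookup, partial-product and accumulate steps.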

    Note that there are TTL parts available in 1975 to build a
    Wallace tree multiplier that can do 32b x 32b in ~130 ns.

    From FP780_Technical_Description_1978-12

    "In order to obtain fast multiplication, a pipeline technique is used (Figure 2-21 ).
    The multiplier is divided into 4-bit nibbles. The nibbles are then accessed consecutively
    by a counter-multiplexer combination (least significant nibble first) and each nibble
    operates on up to 32 bits of multiplicand. The MCAND bus and MPLIER nibbles are used to
    address the ROMs. The banks of ROMs provide a 4 X 4 primitive with 2-way interleaving.
    The data is latched (ROM.STORE) and applied to the inputs of 4-bit adders (PALU).
    These adders combine the ROM data to form a partial product, storing the carryout
    of each 4-bit section, to be added in on the next cycle. The partial product is latched
    in PPROD and passed to another row of adders (AALU) which accumulate the final product,
    again, saving the carries. Thus, when the pipeline is operating, there are four processes
    cycling at the same time:

    1. Select ROM addresses
    2. Latch ROM data
    3. Form partial product
    4. Accumulate final product.

    After the final product is calculated, the stored carries from both stages are
    combined with the accumulated product using full carry look-ahead to produce the
    final answer in a single precision (float) operation. In double precision,
    this result is stored and used during the generation of the final answer
    during the second pass.

    Each of the pipeline processes, with the exception of accessing ROM data
    (which occurs in each bank of ROMs on 100 ns) occurs at 50 ns intervals."


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue May 12 20:22:08 2026
    From Newsgroup: comp.arch


    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-10 11:59, Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    I don't know what 780 used for its integer multiplier
    (I can't find it in the schematics).

    Then I expect that it used microcode to do 1 multiplier bit per cycle.
    IIRC that was the approach that SPARC took at first, and that was also
    designed into HPPA (but then they added the FPU and multiplied by
    using the multiplier of the FPU).

    - anton

    I think so. There is one line in the hardware manual that says
    the optional FPU also "enhances the performance of integer multiply".

    The FPU technical manual describes how the FPU does its multiplies.
    It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
    in a loop, and sums the partial products.
    There are two sets of ROMs so it can interleave the lookups.

    Note that there are TTL parts available in 1975 to build a
    Wallace tree multiplier that can do 32b x 32b in ~130 ns.

    s/Wallace/Dadda/

    So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
    you add the input multiplexing and output selection.

    From FP780_Technical_Description_1978-12

    "In order to obtain fast multiplication, a pipeline technique is used (Figure 2-21 ).
    The multiplier is divided into 4-bit nibbles. The nibbles are then accessed consecutively
    by a counter-multiplexer combination (least significant nibble first) and each nibble
    operates on up to 32 bits of multiplicand. The MCAND bus and MPLIER nibbles are used to
    address the ROMs. The banks of ROMs provide a 4 X 4 primitive with 2-way interleaving.
    The data is latched (ROM.STORE) and applied to the inputs of 4-bit adders (PALU).
    These adders combine the ROM data to form a partial product, storing the carryout
    of each 4-bit section, to be added in on the next cycle. The partial product is latched
    in PPROD and passed to another row of adders (AALU) which accumulate the final product,
    again, saving the carries. Thus, when the pipeline is operating, there are four processes
    cycling at the same time:

    1. Select ROM addresses
    2. Latch ROM data
    3. Form partial product
    4. Accumulate final product.

    After the final product is calculated, the stored carries from both stages are
    combined with the accumulated product using full carry look-ahead to produce the
    final answer in a single precision (float) operation. In double precision,
    this result is stored and used during the generation of the final answer
    during the second pass.

    Each of the pipeline processes, with the exception of accessing ROM data
    (which occurs in each bank of ROMs on 100 ns) occurs at 50 ns intervals."


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Tue May 12 17:47:37 2026
    From Newsgroup: comp.arch

    On 2026-May-12 16:22, MitchAlsup wrote:

    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-10 11:59, Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    I don't know what 780 used for its integer multiplier
    (I can't find it in the schematics).

    Then I expect that it used microcode to do 1 multiplier bit per cycle.
    IIRC that was the approach that SPARC took at first, and that was also
    designed into HPPA (but then they added the FPU and multiplied by
    using the multiplier of the FPU).

    - anton

    I think so. There is one line in the hardware manual that says
    the optional FPU also "enhances the performance of integer multiply".

    The FPU technical manual describes how the FPU does its multiplies.
    It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
    in a loop, and sums the partial products.
    There are two sets of ROMs so it can interleave the lookups.

    Note that there are TTL parts available in 1975 to build a
    Wallace tree multiplier that can do 32b x 32b in ~130 ns.

    s/Wallace/Dadda/

    So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
    you add the input multiplexing and output selection.

    The TI logic databook labels the parts as Wallace tree.

    "The 'S274 is a basic 4-bit-by-4-bit parallel
    multiplier in a single package, and as such, no
    additional components are required to obtain an 8-bit
    product.
    ...
    The 'LS275 and 'S275 expandable bit-slice Wallace
    trees have been designed to accept up to seven
    bit-slice inputs and two carry inputs from previous
    slices for reduction to four lines."


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Tue May 12 22:53:40 2026
    From Newsgroup: comp.arch

    In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi)
    wrote:

    As far as I'm concerned, they made the same mistake with EM64T as
    they did with the 80286. Not only should 16-bit stay, but they
    should have had it so that 16-bit was as easily accessible from
    64-bit as it was from 32-bit, so as to retain all 16-bit software
    running perfectly and transparently in 64-bit editions of Windows.

    When x86-64 was released, AMD were at pains to point out that running
    16-bit environments was possible. It was Microsoft who decided to drop
    16-bit support from 64-bit Windows.

    John
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Tue May 12 22:12:30 2026
    From Newsgroup: comp.arch

    jgd@cix.co.uk (John Dallman) writes:
    In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:

    As far as I'm concerned, they made the same mistake with EM64T as
    they did with the 80286. Not only should 16-bit stay, but they
    should have had it so that 16-bit was as easily accessible from
    64-bit as it was from 32-bit, so as to retain all 16-bit software
    running perfectly and transparently in 64-bit editions of Windows.

    When x86-64 was released, AMD were at pains to point out that running
    16-bit environments was possible. It was Microsoft who decided to drop
    16-bit support from 64-bit Windows.

    I think you need to cite a source for that.

    The AMD64 Opterons did not support the segment limit registers
    when released.

    Some incomplete support was added in the second generation of
    Opteron because VMware and XEN had been using the segment limit registers for virtualization (shadow page tables). The support was sufficient
    to support XEN temporarily until the NPT (nested page table)
    and Pacifica (SVM) features were added to the processor, but
    it was not complete enough to support random 16-bit applications.

    https://www.pagetable.com/?p=25

    Intel never bothered to implement 16-bit segmentation support
    in x86_64; instead they created the EPT[*] for virtualization.


    [*] AMD's nested page table had the exact same format as the
    processor page table, while the Intel EPT had an unique
    entry format, different from the processor page tables.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Wed May 13 19:34:40 2026
    From Newsgroup: comp.arch

    In article <iBNMR.2$2K1.1@fx17.iad>, scott@slp53.sl.home (Scott Lurndal)
    wrote:

    When x86-64 was released, AMD were at pains to point out that
    running 16-bit environments was possible. It was Microsoft who
    decided to drop 16-bit support from 64-bit Windows.

    I think you need to cite a source for that.

    The AMD64 Opterons did not support the segment limit registers
    when released.

    I suspect my informant may have been exaggerating, or I've mis-remembered.


    John
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Wed May 13 18:07:41 2026
    From Newsgroup: comp.arch

    scott@slp53.sl.home (Scott Lurndal) writes:
    jgd@cix.co.uk (John Dallman) writes:
    In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:

    As far as I'm concerned, they made the same mistake with EM64T as
    they did with the 80286. Not only should 16-bit stay, but they
    should have had it so that 16-bit was as easily accessible from
    64-bit as it was from 32-bit, so as to retain all 16-bit software
    running perfectly and transparently in 64-bit editions of Windows.

    When x86-64 was released, AMD were at pains to point out that running
    16-bit environments was possible. It was Microsoft who decided to drop
    16-bit support from 64-bit Windows.

    I think you need to cite a source for that.

    The AMD64 Opterons did not support the segment limit registers
    when released.

    When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily available for it, and all the 32-bit and 16-bit stuff I used ran
    nicely. For the game operating system, I always lagged behind, so
    maybe I was at W98 when I switched the hardware, and IIRC W98 gave low
    frame rates on the new hardware, so I switched to WME, or somesuch,
    which fixed the problem. All the 32-bit and 16-bit stuff that was
    still in WME worked on the Athlon 64 (which is the same
    microarchitecture (K8) as the early Opterons).

    If the segment limit registers were not supported, they obviously were
    not needed.

    But my guess is that you confuse this with the fact that the long mode
    (unlike legacy mode, in which the 32-bit OSs run) does not support
    most segmentation features of IA-32, and that it supports the FS and GS
    segment registers without limiting the segment (so they are just
    base registers in long mode). But, as
    mentioned, in legacy mode all segmentation is fully supported.
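The difference between the two modes can be illustrated with a much simplified address-calculation model. The names `Segment`, `SegmentLimitFault` and `linear_address` are invented for this sketch, and real descriptors carry more state (type, privilege level, granularity, expand-down) that is ignored here.

```python
from dataclasses import dataclass


class SegmentLimitFault(Exception):
    """Raised when an offset lies beyond the segment limit (legacy mode only)."""


@dataclass
class Segment:
    base: int
    limit: int  # highest valid offset; only consulted in legacy mode


def linear_address(seg, offset, long_mode):
    """Simplified linear-address computation for an FS- or GS-relative access."""
    if long_mode:
        # Long mode: the segment register acts as a plain base register,
        # so no limit check is performed.
        return (seg.base + offset) & 0xFFFFFFFFFFFFFFFF
    # Legacy mode: the offset is checked against the limit first.
    if offset > seg.limit:
        raise SegmentLimitFault(f"offset {offset:#x} exceeds limit {seg.limit:#x}")
    return (seg.base + offset) & 0xFFFFFFFF
```

In this model the same out-of-range access faults in legacy mode but simply yields base plus offset in long mode, which is the missing behaviour that the limit-based virtualization technique relied on.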

    However, if you have an OS that runs in long mode (i.e., a 64-bit OS),
    you cannot run real-mode programs (no real mode, virtual 8086 mode, or
    unreal mode), only 16-bit protected mode according to <https://en.wikipedia.org/wiki/X86-64#Operating_modes>. So an OS
    would have to switch to legacy mode to get to these modes. That page
    also says:

    |However, such [real mode] programs may be started from an operating
    |system running in long mode on processors supporting VT-x or AMD-V by
    |creating a virtual processor running in the desired mode.

    Some incomplete support was added in the second generation of
    Opteron because VMware and XEN had been using the segment limit registers for
    virtualization (shadow page tables). The support was sufficient
    to support XEN temporarily until the NPT (nested page table)
    and Pacifica (SVM) features were added to the processor, but
    it was not complete enough to support random 16-bit applications.

    https://www.pagetable.com/?p=25

    |While VMware could still virtualize 32 bit operating systems on AMD64
    |CPUs, they could not virtualize 64 bit operating systems, because they
    |required segment limits.

    So it's not that the Opterons did not support any pre-existing
    functionality (no earlier 64-bit OSs for AMD64 existed, because
    Opteron was the first AMD64 processor), just that VMware could not use
    the same virtualization technique for 64-bit OSs that they could use
    for 32-bit OSs and had been using on earlier 32-bit processors and
    that they could still use in legacy mode.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Wed May 13 18:47:50 2026
    From Newsgroup: comp.arch

    jgd@cix.co.uk (John Dallman) writes:
    In article <iBNMR.2$2K1.1@fx17.iad>, scott@slp53.sl.home (Scott Lurndal) wrote:

    When x86-64 was released, AMD were at pains to point out that
    running 16-bit environments was possible. It was Microsoft who
    decided to drop 16-bit support from 64-bit Windows.

    I think you need to cite a source for that.

    The AMD64 Opterons did not support the segment limit registers
    when released.

    I suspect my informant may have been exaggerating, or I've mis-remembered.

    As Anton pointed out, I should have added "in long mode".
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed May 13 20:51:08 2026
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    scott@slp53.sl.home (Scott Lurndal) writes:
    jgd@cix.co.uk (John Dallman) writes:
    In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:

    As far as I'm concerned, they made the same mistake with EM64T as
    they did with the 80286. Not only should 16-bit stay, but they
    should have had it so that 16-bit was as easily accessible from
    64-bit as it was from 32-bit, so as to retain all 16-bit software
    running perfectly and transparently in 64-bit editions of Windows.

    When x86-64 was released, AMD were at pains to point out that running
    16-bit environments was possible. It was Microsoft who decided to drop
    16-bit support from 64-bit Windows.

    I think you need to cite a source for that.

    The AMD64 Opterons did not support the segment limit registers
    when released.

    When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily available for it, and all the 32-bit and 16-bit stuff I used ran
    nicely. For the game operating system, I always lagged behind, so
    maybe I was at W98 when I switched the hardware, and IIRC W98 gave low
    frame rates on the new hardware, so I switched to WME, or somesuch,
    which fixed the problem. All the 32-bit and 16-bit stuff that was
    still in WME worked on the Athlon 64 (which is the same
    microarchitecture (K8) as the early Opterons).

    If the segment limit registers were not supported, they obviously were
    not needed.

    At that point in time, you are correct. That time was pre-virtualization
    and pre-multi-threaded applications.

    But my guess is that you confuse this with the fact that the long mode (unlike legacy mode, in which the 32-bit OSs run) does not support
    most segmentation features of IA-32, and that it supports the FS and GS
    segment registers without limiting the segment (so they are just
    base registers in long mode). But, as
    mentioned, in legacy mode all segmentation is fully supported.

    However, if you have an OS that runs in long mode (i.e., a 64-bit OS),
    you cannot run real-mode programs (no real mode, virtual 8086 mode, or
    unreal mode), only 16-bit protected mode according to <https://en.wikipedia.org/wiki/X86-64#Operating_modes>. So an OS
    would have to switch to legacy mode to get to these modes. That page
    also says:

    |However, such [real mode] programs may be started from an operating
    |system running in long mode on processors supporting VT-x or AMD-V by
    |creating a virtual processor running in the desired mode.

    Some incomplete support was added in the second generation of
    Opteron because VMware and XEN had been using the segment limit registers for
    virtualization (shadow page tables). The support was sufficient
    to support XEN temporarily until the NPT (nested page table)
    and Pacifica (SVM) features were added to the processor, but
    it was not complete enough to support random 16-bit applications.

    https://www.pagetable.com/?p=25

    |While VMware could still virtualize 32 bit operating systems on AMD64
    |CPUs, they could not virtualize 64 bit operating systems, because they
    |required segment limits.

    So it's not that the Opterons did not support any pre-existing
    functionality (no earlier 64-bit OSs for AMD64 existed, because
    Opteron was the first AMD64 processor), just that VMware could not use
    the same virtualization technique for 64-bit OSs that they could use
    for 32-bit OSs and had been using on earlier 32-bit processors and
    that they could still use in legacy mode.

    By allowing 1 segment to contain data, VMware virtualization was improved*.

    By allowing 1 segment to contain a pointer, multi-threaded applications could address thread-local memory more easily.

    (*) on the other hand, had x86 ISA contained only 1 instruction that
    needed to trap when virtualization support was necessary, that segment
    register would not have been needed, either.

    - anton
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Thu May 14 21:10:32 2026
    From Newsgroup: comp.arch

    John Dallman <jgd@cix.co.uk> schrieb:
    In article <10tvimf$22n8v$1@dont-email.me>, quadibloc@ca.invalid (quadi) wrote:

    As far as I'm concerned, they made the same mistake with EM64T as
    they did with the 80286. Not only should 16-bit stay, but they
    should have had it so that 16-bit was as easily accessible from
    64-bit as it was from 32-bit, so as to retain all 16-bit software
    running perfectly and transparently in 64-bit editions of Windows.

    When x86-64 was released, AMD were at pains to point out that running
    16-bit environments was possible. It was Microsoft who decided to drop
    16-bit support from 64-bit Windows.

    Given that Dosemu runs much faster in emulation than on the original
    hardware, that is not such a big loss.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri May 15 09:03:43 2026
    From Newsgroup: comp.arch

    On 2026-May-11 16:51, MitchAlsup wrote:

    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-09 15:41, Thomas Koenig wrote:
    EricP <ThatWouldBeTelling@thevillage.com> schrieb:

    Lately I have been playing around with circa 1975 TTL paper cpu designs but
    done in a pipelined risc style. The instructions must be variable length because
    memory was so expensive in 1975. The key is to not bottleneck in Fetch or Decode.

    Last month I designed a TTL fetch-parse unit for a risc-ish pipeline using the same
    parts as are on the VAX-780 bill of materials or in the 1976 TI logic data book.

    The Fetch unit cares little what the internal format of an instruction is.
    It looks at no more than the first 12 bits to determine length.
    It only cares how many bytes are in the instruction and are they all
    present in the prefetch register. If they are Fetch copies them to
    the Decode input instruction register.

    This is the stage I call PARSE. Fetch is responsible for presenting
    cache sub-lines to PARSE, PARSE slices instructions out of the words
    and presents parsed instructions to multiple parallel DECODE units.

    Yes, I call it that too.
    The Fetch unit consists of two sub units, the Prefetcher and the Parser.
    These are independent machines that coordinate their activity through
    the circular prefetch buffer.

    More below...

    The Fetch unit parser uses a Signetics 82S100 FPLA to determine
    the instruction length. The 82S100 has 16 inputs, 8 outputs,
    and can match 48 product terms with 0, 1 or x don't-care bits.
    The parser feeds the first 12 bits from the first 2 bytes plus
    their 2 byte Valid flags and an error signal into the 82S100.
    There is one input spare for future use.
    The PLA outputs a 4b length from 0 to 12, plus 2 status bits
    indicating the length is valid, or fetch unit stalled,
    or fetch error, either page fault exception or HW error.
    If the parse length is valid then fetch checks that all the
    bytes of the instruction length are also valid.
    If they are it passes the instruction to Decode input register.
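As a rough illustration of how such a length PLA might behave, here is a small software model of product-term matching with don't-care bits. Everything about the encoding is an assumption made up for this sketch: the input bit layout (`V0`, `V1`, `ERR`), the status codes and the four example terms do not come from the actual design.

```python
from typing import NamedTuple

# Hypothetical input layout: bits 0..11 are the first 12 instruction bits,
# then one valid flag per byte and an error signal. Bit 15 is left spare.
V0, V1, ERR = 1 << 12, 1 << 13, 1 << 14

# Hypothetical output layout: bits 0..3 hold the length, bits 4..5 the status.
STATUS_STALL, STATUS_VALID, STATUS_ERROR = 0 << 4, 1 << 4, 2 << 4


class Term(NamedTuple):
    care: int    # 1 bits are compared, 0 bits are don't-care
    value: int   # required values of the compared bits
    output: int  # output bits driven when the term matches


# Invented encoding: the low 2 bits of byte 0 select the length, and bit 8
# (in byte 1) refines it for the longest forms.
TERMS = [
    Term(ERR, ERR, STATUS_ERROR),
    Term(ERR | V0 | 0b11, V0 | 0b00, STATUS_VALID | 2),
    Term(ERR | V0 | 0b11, V0 | 0b01, STATUS_VALID | 4),
    Term(ERR | V0 | V1 | 0x100 | 0b11, V0 | V1 | 0b11, STATUS_VALID | 8),
    Term(ERR | V0 | V1 | 0x100 | 0b11, V0 | V1 | 0x100 | 0b11, STATUS_VALID | 12),
]


def pla_eval(terms, inputs):
    """OR together the outputs of every product term that matches the inputs."""
    out = 0
    for term in terms:
        if (inputs & term.care) == term.value:
            out |= term.output
    return out
```

When no term matches, for example because a needed byte is not yet valid, the output is 0, which this sketch treats as the stalled status.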

    I should mention that the DECODE unit of Mc 88100 was a single
    NOR plane {while a PLA is 2 NOR planes}.

    The key to the design was figuring out a shifter that can
    rotate a 16 or 32 byte buffer that fits on the board.

    Originally I looked at using 8:1 muxes to build a 16B shifter
    but it would have taken 288 16-pin chips for just the shifter,
    which was more than the whole pcb limit of 256 16-pin chips.

    The solution came when I had the idea of using the VAX-780
    barrel shifter chips, the AMD 25S10 7:4 bit shifter.

    Datasheets for AMD 25S10
    https://www.datasheetarchive.com/?q=am25s10

    It takes one of these chips to build a 4 bit barrel shifter,
    two layers of 4 chips = 8 total for a 16-bit barrel shifter,
    three layers of 16 chips = 48 total for a 64-bit
    barrel shifter.

    I repurposed them to rotate the 32B prefetch register.
    The prefetch register holds 9-bit values, 8 data + 1 valid flag,
    that I call VB's or Valid Bytes.
    It takes just 72 of these chips to build a shifter for 16 VB's,
    and 143 chips for 32 VB's. And those fit on the pcb!
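The chip counts quoted above for the full barrel shifters follow a simple pattern, sketched below. The functions `barrel_shifter_chips` and `vb_rotator_chips` are illustrative names, and the one-rotator-per-bit-plane arrangement is an assumption that happens to reproduce the 72-chip figure for 16 VBs. The 143-chip figure for 32 VBs is not reproduced, since 32 is not a power of 4 and the post does not describe how that shifter is arranged.

```python
def barrel_shifter_chips(width_bits):
    """Chips needed for a full barrel shifter built from 4-bit slices that
    each shift by 0..3 positions. Assumes width_bits is a power of 4."""
    layers = 0
    span = 1
    while span < width_bits:
        span *= 4      # each layer multiplies the reachable shift range by 4
        layers += 1
    if span != width_bits:
        raise ValueError("width_bits must be a power of 4 in this sketch")
    return layers * (width_bits // 4)


def vb_rotator_chips(n_vbs, bits_per_vb=9):
    """Assumed arrangement: one n_vbs-position rotator per bit plane of the
    9-bit valid bytes (8 data bits plus 1 valid flag)."""
    return bits_per_vb * barrel_shifter_chips(n_vbs)
```

With these assumptions, widths of 4, 16 and 64 bits give 1, 8 and 48 chips, and 16 VBs give 9 x 8 = 72 chips.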

    With that in place I considered two Fetch unit designs,
    one called LilBuf16 has a 16 VB circular prefetch buffer,
    and BigBuf32 has a 32 VB circular prefetch buffer.

    Suffice it to say that while LilBuf16 requires 1/2 the
    chips for buffer and shifter, it has much more complex
    control logic. The prefetcher is moving a write pointer
    over the circular prefetch buffer and copying in as many
    bytes as possible in each clock, while the parser is
    also moving its read pointer. To make it work required
    creating a variable sized write mask over the buffer.
    And all it would take is one long instruction to
    drain the prefetch buffer and cause a stall.

    The control logic for BigBuf32 was much simpler as
    it prefetches 8 byte blocks and writes the whole block
    in at once when it finds an empty slot.
    In just 4 clocks it can fill the 32B prefetch buffer.

    Design for BigBuf32

    I-cache
    [#######################]
    ^ |
    Cache Cmd | v 8B Fetch Blocks
    & Phy Addr | --------------------------------
    | | | | |
    PreFetch | v v v v
    Counter [##########] 32B PreFetch Reg [=FB3=|=FB2=|=FB1=|=FB0=]
    VA & PA | ^ ^ | | | |
    | | | v v v v
    | | -------->[########]-->[########### 32B Rotate #########]
    | | ^ Parse VA | | | | | |
    | | | Counter | v v v v v
    | | | | [PLA]-->[=12B Validate Present=]
    | | | | | |
    TLB Cmd | | Phy | | | |
    & Vir Addr | | Addr | | | |
    v ^ ^ v v v
    To TLB Jump Inst Inst Inst Bytes 1..12 + Status
    Bus VA VA Len
    To Decode Input Inst Register


    Fetching starts when a new address and priv mode comes in over
    the Jump Bus. The Jump Bus is an open collector (wired-OR) bus
    that runs to all the stages that can generate jump addresses:
    Decode, RegRead, JBU Jump-Branch-Unit, and Writeback-Retire.

    The jump address overwrites the parse and prefetch counters
    and resets the phy-addr-valid flag in prefetcher.
    Prefetcher sees it has a new VA so it requests a translation
    from the TLB, and saves the PA in the prefetch counter.

    Whenever Prefetcher sees a block valid flag on the prefetch
    register is clear, or is going to be made clear in this cycle,
    it copies in the next sequential 8-byte fetch block and
    sets the block valid flag.

    The Parser uses the parse VA to rotate the prefetch register
    contents so that the instruction start VB aligns with the
    parse PLA inputs. The PLA looks at VB0 and possibly VB1 bits
    and spits out an instruction length and some status bits.

    If the parse status is valid then it validates that the valid
    flags are set on all the VB's in the instruction length.
    The validator is just a 16:1 mux and a bunch of AND gates.

    The valid instruction bytes, its virtual address, its length
    and other status info is passed to Decoder instruction register.
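The interplay of prefetcher and parser described above can be approximated in software. The sketch below is a loose behavioural model only: the class `BigBuf32`, the `length_of` callback standing in for the PLA, and the toy encoding in `demo_length` are all assumptions, and the TLB, physical addresses, fault status and same-cycle timing are left out.

```python
BLOCK, NBLOCKS, MAX_LEN = 8, 4, 12
BUF = BLOCK * NBLOCKS  # 32 valid bytes (VBs) in the circular prefetch buffer


class BigBuf32:
    def __init__(self, memory, start_va, length_of):
        self.memory = memory        # byte string standing in for the I-cache
        self.length_of = length_of  # first byte -> length, standing in for the PLA
        self.data = [0] * BUF
        self.jump(start_va)

    def jump(self, va):
        """A new address from the jump bus overwrites both counters."""
        self.parse_va = va
        self.prefetch_va = va - va % BLOCK
        self.valid = [False] * BUF

    def prefetch_step(self):
        """Copy in the next sequential 8-byte block if its slot is empty."""
        slot = (self.prefetch_va // BLOCK) % NBLOCKS
        lo = slot * BLOCK
        if any(self.valid[lo:lo + BLOCK]) or self.prefetch_va >= len(self.memory):
            return False
        chunk = self.memory[self.prefetch_va:self.prefetch_va + BLOCK]
        for i, byte in enumerate(chunk):
            self.data[lo + i] = byte
            self.valid[lo + i] = True
        self.prefetch_va += BLOCK
        return True

    def parse_step(self):
        """Return (va, instruction bytes), or None when the parser must stall."""
        start = self.parse_va % BUF
        # "Rotate" so that the instruction start lines up with position 0.
        window = [(start + i) % BUF for i in range(MAX_LEN)]
        if not self.valid[window[0]]:
            return None
        length = self.length_of(self.data[window[0]])
        if not all(self.valid[i] for i in window[:length]):
            return None
        va, new_va = self.parse_va, self.parse_va + length
        inst = bytes(self.data[i] for i in window[:length])
        # Free every block the parse pointer has now moved past.
        for blk in range(va // BLOCK, new_va // BLOCK):
            lo = (blk % NBLOCKS) * BLOCK
            self.valid[lo:lo + BLOCK] = [False] * BLOCK
        self.parse_va = new_va
        return va, inst


def demo_length(first_byte):
    """Toy encoding for the demo: the low nibble of byte 0 is the length."""
    return first_byte & 0x0F
```

Calling `prefetch_step()` and `parse_step()` alternately walks through an instruction stream; a 12-byte instruction simply stalls until a second block has arrived.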



    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 13:54:59 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-10 11:59, Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    I don't know what 780 used for its integer multiplier
    (I can't find it in the schematics).

    Then I expect that it used microcode to do 1 multiplier bit per cycle.
    IIRC that was the approach that SPARC took at first, and that was also
    designed into HPPA (but then they added the FPU and multiplied by
    using the multiplier of the FPU).

    - anton

    I think so. There is one line in the hardware manual that says
    the optional FPU also "enhances the performance of integer multiply".

    The FPU technical manual describes how the FPU does its multiplies.
    It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
    in a loop, and sums the partial products.
    There are two sets of ROMs so it can interleave the lookups.

    Note that there are TTL parts available in 1975 to build a
    Wallace tree multiplier that can do 32b x 32b in ~130 ns.

    130 ns? That sounds quite fast. Do you happen to have details
    of such a design?


    s/Wallace/Dadda/

    So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
    you add the input multiplexing and output selection.

    But that is only for multiplying two nibbles with 32 bits.

    By using 64 74S274 and a corresponding number of 74S275, 74S183 and
    74S283 (full adders for the corners and for speeding up incoming
    carries) plus a carry lookahead cascade of 74S181 and 74S182, it
    would have been probably quite possible to build a two-cycle 32*32
    multiplier for a VAX-like computer on two boards (each of which
    can hold around 120 chips), pipelined for one result per cycle.

    It is also interesting that they chose ROMs instead of 74S274s
    with the same function, probably due to price or power.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Fri May 15 11:30:46 2026
    From Newsgroup: comp.arch

    On 2026-May-15 09:54, Thomas Koenig wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    EricP <ThatWouldBeTelling@thevillage.com> posted:

    On 2026-May-10 11:59, Anton Ertl wrote:
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    I don't know what 780 used for its integer multiplier
    (I can't find it in the schematics).

    Then I expect that it used microcode to do 1 multiplier bit per cycle.
    IIRC that was the approach that SPARC took at first, and that was also
    designed into HPPA (but then they added the FPU and multiplied by
    using the multiplier of the FPU).

    - anton

    I think so. There is one line in the hardware manual that says
    the optional FPU also "enhances the performance of integer multiply".

    The FPU technical manual describes how the FPU does its multiplies.
    It chops the values into 4b nibbles and looks up 4b x 4b values in ROMs
    in a loop, and sums the partial products.
    There are two sets of ROMs so it can interleave the lookups.

    Note that there are TTL parts available in 1975 to build a
    Wallace tree multiplier that can do 32b x 32b in ~130 ns.

    130 ns? That sounds quite fast. Do you happen to have details
    of such a design?

    I thought so too but it says (the 74LS times are in () )

    "When SN74S274 is Combined With SN74H183
    (or SN74LS183) and Schottky Look-Ahead
    Adders, Multiplication Times are Typically:
    16-Bit Product in 75 ns (79 ns)
    32-Bit Product in 116 ns (132 ns)"

    Starting at page 7-391, after technical specs it
    shows a number of example configurations.

    54/74 Family MSI/LSI Circuits 1976
    Manual is page indexed and searchable https://www.bitsavers.org/components/ti/_dataBooks/1976_TI_The_TTL_Data_Book_2ed/07.pdf


    s/Wallace/Dadda/

    So, at 5MHz == 200ns this is probably a 2-cycle multiplier after
    you add the input multiplexing and output selection.

    But that is only for multiplying two nibbles with 32 bits.

    By using 64 74S274 and a corresponding number of 74S275, 74S183 and
    74S283 (full adders for the corners and for speeding up incoming
    carries) plus a carry lookahead cascade of 74S181 and 74S182, it
    would have been probably quite possible to build a two-cycle 32*32
    multiplier for a VAX-like computer on two boards (each of which
    can hold around 120 chips), pipelined for one result per cycle.

    It is also interesting that they chose ROMs instead of 74S274s
    with the same function, probably due to price or power.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 15 16:02:37 2026
    From Newsgroup: comp.arch

    On Wed, 13 May 2026 18:07:41 +0000, Anton Ertl wrote:

    When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily available for it, and all the 32-bit and 16-bit stuff I used ran nicely.

    Yes. You can run 32-bit Windows 7 on a 64-bit CPU in 32-bit mode, and
    16-bit programs will run just as well.

    But apparently without rebooting to get into 32-bit mode, there really is
    a hardware reason why 16-bit software can't easily be made to work from
    64-bit mode. Whatever the hardware reason is, in my opinion that's a mistake
    on the part of the hardware designers.

    John Savard
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri May 15 16:05:46 2026
    From Newsgroup: comp.arch

    On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

    Given that Dosemu runs much faster in emulation than on the original hardware, that is not such a big loss.

    That is not the right comparison. If my old 16-bit software doesn't run at full *native* speed on my shiny new 64-bit computer, then I still have to
    run out and buy new software to get my work done faster.

    Although it is correct to say that Microsoft could have done something to
    fix this, even with the hardware issue as it is. They could have let
    people buy just one copy of Windows, and install it twice on the same
    computer - so as to be able to boot into either the 32-bit version or the 64-bit version.

    John Savard

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 17:08:59 2026
    From Newsgroup: comp.arch

    EricP <ThatWouldBeTelling@thevillage.com> schrieb:
    On 2026-May-15 09:54, Thomas Koenig wrote:

    130 ns? That sounds quite fast. Do you happen to have details
    of such a design?

    I thought so too but it says (the 74LS times are in () )

    "When SN74S274 is Combined With SN74H183
    (or SN74LS183) and Schottky Look-Ahead
    Adders, Multiplication Times are Typically:
    ^^^^^^^^^
    16-Bit Product in 75 ns (79 ns)
    32-Bit Product in 116 ns (132 ns)"

    Starting at page 7-391, after technical specs it
    shows a number of example configurations.

    Therein lies the rub - they probably just added their typical and
    not their max values.

    54/74 Family MSI/LSI Circuits 1976
    Manual is page indexed and searchable https://www.bitsavers.org/components/ti/_dataBooks/1976_TI_The_TTL_Data_Book_2ed/07.pdf

    Jep.

    My grandfather used to have the TI handbook on his desk.
    He developed (and filed a patent for) an instrument for measuring
    fuel consumption for cars, and in order to display the customary
    L/100 km in Europe, he had to divide two numbers (gasoline
    consumption and speed) for which he built the circuitry out of
    74xx chips. But the main focus was the mechanical device.

    How he didn't blow up his house (including me, I liked to "help"
    him as a young child) I don't know.

    It was probably too early to have market success, now every
    car has something like that.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 18:19:13 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> schrieb:
    On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

    Given that Dosemu runs much faster in emulation than on the original
    hardware, that is not such a big loss.

    That is not the right comparison. If my old 16-bit software doesn't run at full *native* speed on my shiny new 64-bit computer, then I still have to run out and buy new software to get my work done faster.

    At work, I actually use a 16-bit MS-DOS program, which was written
    in the early 1990s. Originally, it had run 15-20 minutes; now it
    runs in far less than 10 seconds (I never timed it, it is fast).
    A newer 64-bit version is available, but I actually don't use it
    because the old one works well, and I'm simply too lazy to learn
    the new quirks, when I am quite used to the old quirks. It may also
    be faster by a factor of 5, I don't care.

    By comparison: It takes ages for me to open an Excel file,
    and whenever I change something in a certain complex Excel file,
    like moving around a text box in a graph, it decides to recalculate,
    taking maybe 30 seconds for me to see that the text box is not
    where it should be. (And yes, I have switched off calculation, but
    graphics doesn't care).

    Although it is correct to say that Microsoft could have done something to fix this, even with the hardware issue as it is. They could have let
    people buy just one copy of Windows, and install it twice on the same computer - so as to be able to boot into either the 32-bit version or the 64-bit version.

    Haha.

    Forcing people to ditch their own computers because they are not
    Windows 11 compatible is the Microsoft way.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri May 15 20:07:35 2026
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    quadi <quadibloc@ca.invalid> schrieb:
    On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

    Given that Dosemu runs much faster in emulation than on the original
    hardware, that is not such a big loss.

    That is not the right comparison. If my old 16-bit software doesn't run at full *native* speed on my shiny new 64-bit computer, then I still have to run out and buy new software to get my work done faster.

    At work, I actually use a 16-bit MS-DOS program, which was written
    in the early 1990s. Originally, it had run 15-20 minutes; now it
    runs in far less than 10 seconds (I never timed it, it is fast).
    A newer 64-bit version is available, but I actually don't use it
    because the old one works well, and I'm simply too lazy to learn
    the new quirks, when I am quite used to the old quirks. It may also
    be faster by a factor of 5, I don't care.

    I have an automobile engine simulator* written in eXcel. When run on
    my 33 MHz 486, I had time to get up from my chair, walk to the kitchen,
    open the fridge, grab a beer, walk back, and the calculations were just finishing. When I got a 200 MHz Pentium Pro the same calculations took
    a blink of an eye.

    (*) 15 spread sheets, more than 10,000 equations somewhere around
    10% of them using SQRT(), SIN(), COS(), EXP(), and LOG().

    By comparison: It takes ages for me to open an Excel file,
    and whenever I change something in a certain complex Excel file,
    like moving around a text box in a graph, it decides to recalculate,

    You can (CAN) turn off automatic recalculation...

    taking maybe 30 seconds for me to see that the text box is not
    where it should be. (And yes, I have switched off calculation, but
    graphics doesn't care).

    Although it is correct to say that Microsoft could have done something to fix this, even with the hardware issue as it is. They could have let people buy just one copy of Windows, and install it twice on the same computer - so as to be able to boot into either the 32-bit version or the 64-bit version.

    Haha.

    Forcing people to ditch their own computers because they are not
    Windows 11 compatible is the Microsoft way.

    With Intel and AMD support.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri May 15 22:19:24 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    quadi <quadibloc@ca.invalid> schrieb:
    On Thu, 14 May 2026 21:10:32 +0000, Thomas Koenig wrote:

    Given that Dosemu runs much faster in emulation than on the original
    hardware, that is not such a big loss.

    That is not the right comparison. If my old 16-bit software doesn't run at
    full *native* speed on my shiny new 64-bit computer, then I still have to
    run out and buy new software to get my work done faster.

    At work, I actually use a 16-bit MS-DOS program, which was written
    in the early 1990s. Originally, it had run 15-20 minutes; now it
    runs in far less than 10 seconds (I never timed it, it is fast).
    A newer 64-bit version is available, but I actually don't use it
    because the old one works well, and I'm simply too lazy to learn
    the new quirks, when I am quite used to the old quirks. It may also
    be faster by a factor of 5, I don't care.

    I have an automobile engine simulator* written in eXcel. When run on
    my 33 MHz 486, I had time to get up from my chair, walk to the kitchen,
    open the fridge, grab a beer, walk back, and the calculations were just
    finishing. When I got a 200 MHz Pentium Pro the same calculations took
    a blink of an eye.

    (*) 15 spreadsheets, more than 10,000 equations, somewhere around
    10% of them using SQRT(), SIN(), COS(), EXP(), and LOG().

    By comparison: It takes ages for me to open an Excel file,
    and whenever I change something in a certain complex Excel file,
    like moving around a text box in a graph, it decides to recalculate,

    You can (CAN) turn off automatic recalculation...

    And I did. It turns off the calculations in the sheet, but
    apparently it still wants to recalculate things when I shift
    something in a graph... and that I cannot turn off.

    But of course, if I use spill formulas with things like sorting,
    and then display sorted data... obviously my fault. Woe betide
    anyone who has more than, let's say, 20000 or 30000 data points in
    a column. That is obviously too much for a laptop with 16 GB main
    memory and eight cores running Microsoft software and Windows 11.

    Now I have moved the work on that sheet to my 512 GB workstation
    with 48 Xeon cores, things have gotten tolerable (but just barely).

    Using Python in Excel sends data to the Microsoft cloud, and does
    not work without Internet access. Yuck.

    I could try to use VBA, but who on Earth wants to? Plus Excel macros
    are notoriously unsafe, and it is a good idea not to use them, and
    not to encourage people to switch them on by default.

    External programs - sure, I could write a Fortran or ... program to
    do this, but then I could no longer distribute it to colleagues
    and expect it to work.


    taking maybe 30 seconds for me to see that the text box is not
    where it should be. (And yes, I have switched off calculation, but
    graphics doesn't care).

    Although it is correct to say that Microsoft could have done something to
    fix this, even with the hardware issue as it is. They could have let
    people buy just one copy of Windows, and install it twice on the same
    computer - so as to be able to boot into either the 32-bit version or the
    64-bit version.

    Haha.

    Forcing people to ditch their own computers because they are not
    Windows 11 compatible is the Microsoft way.

    With Intel and AMD support.

    Or the other way - Microsoft wanted to get their partners
    a shot in the arm. Wintel lives... (only very few of
    these absolutely working machines will have somebody installing
    Linux on them, I think).
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sat May 16 11:20:21 2026
    From Newsgroup: comp.arch

    On 15/05/2026 18:02, quadi wrote:
    On Wed, 13 May 2026 18:07:41 +0000, Anton Ertl wrote:

    When I bought my Athlon 64 in October 2003, no 64-bit OSs were readily
    available for it, and all the 32-bit and 16-bit stuff I used ran nicely.

    Yes. You can run 32-bit Windows 7 on a 64-bit CPU in 32-bit mode, and
    16-bit programs will run just as well.

    But apparently without rebooting to get into 32-bit mode, there really is
    a hardware reason why 16-bit software can't easily be made to work from
    64-bit mode. Whatever the hardware reason is, in my opinion that's a mistake
    on the part of the hardware designers.

    John Savard

    I don't know how easy or not it is to implement the support, but I don't
    remember problems running 16-bit Windows programs on 64-bit XP on an
    Athlon. And I've had occasion to run 16-bit Windows programs with Wine
    on 64-bit Linux. Maybe this all requires some special effort under the
    hood, but it seems to be a solved problem - lack of 16-bit support in
    modern 64-bit Windows appears to be a non-technical decision.
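
For reference, a rough sketch of the hardware side, as I understand the AMD64 architecture: once the CPU is in long mode, the L and D bits of the code-segment descriptor pick the sub-mode, and 16-bit protected-mode code segments are still available in compatibility mode. What long mode drops is virtual-8086 mode, which is what real-mode DOS programs relied on. The function name and the returned strings below are made up for illustration only.

```python
def long_mode_code_segment(l_bit: int, d_bit: int) -> str:
    """Classify a code segment while the CPU is in long mode.

    l_bit and d_bit are the L and D bits of the code-segment descriptor.
    Illustrative helper only, not taken from any real tool or OS.
    """
    if l_bit == 1 and d_bit == 0:
        return "64-bit mode"
    if l_bit == 1 and d_bit == 1:
        return "reserved"
    if d_bit == 1:
        return "compatibility mode, 32-bit default operand size"
    # L = 0, D = 0: 16-bit protected-mode code under a 64-bit OS
    return "compatibility mode, 16-bit default operand size"


for l_bit in (0, 1):
    for d_bit in (0, 1):
        print(l_bit, d_bit, long_mode_code_segment(l_bit, d_bit))
```

So a 64-bit OS can, in principle, set up such segments for 16-bit protected-mode programs; real-mode DOS code needs emulation or virtualization instead.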

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon May 18 05:24:26 2026
    From Newsgroup: comp.arch

    On Mon, 04 May 2026 18:06:18 +0000, MitchAlsup wrote:

    Headers are nothing more than mode-bits that change every block of code.
    This means that ISA will be exceptionally difficult to verify, and as
    you have found: very difficult to encode.

    Your second sentence may well be true.

    Your first sentence is definitely true, but it's a feature, not a bug. I intentionally exploited my block headers to achieve what would otherwise require mode bits - allowing instructions to be shorter, because one could switch between alternate sets of instructions - without the great danger
    of mode bits for security: someone branching into code written to execute
    in one mode while the machine's state specifies a different mode, thus
    making code perform unintended actions.
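
A toy sketch of the distinction, not the actual Concertina II encoding: the opcode tables, header values, and block layout below are all invented, and it assumes control only ever enters a block at its start. With a header, the meaning of a block depends only on the block itself; with a mode bit, it depends on whatever state the machine was in when control arrived.

```python
# Toy model only: tables, header values and block layout are hypothetical.
TABLES = {
    0: {0x1: "add", 0x2: "load"},    # instruction set selected by header 0
    1: {0x1: "fmul", 0x2: "store"},  # alternate set selected by header 1
}


def decode_with_header(block):
    """The first word of the block is its header and selects the table."""
    header, *opcodes = block
    return [TABLES[header][op] for op in opcodes]


def decode_with_mode_bit(opcodes, mode):
    """The table is chosen by machine state that was set somewhere else."""
    return [TABLES[mode][op] for op in opcodes]


block = [1, 0x1, 0x2]  # header 1, followed by two opcodes

# Header-based: the result is fixed by the block's own contents.
print(decode_with_header(block))

# Mode-bit-based: the same opcodes decode differently depending on the
# mode in effect when the branch landed here.
print(decode_with_mode_bit(block[1:], mode=0))
print(decode_with_mode_bit(block[1:], mode=1))
```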

    John Savard
    --- Synchronet 3.22a-Linux NewsLink 1.2