• Linus Torvalds on bad architectural features

    From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 3 08:58:32 2025
    From Newsgroup: comp.arch

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus
    Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |
    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.
    |
    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform
    |
    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
    |
    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!
    |
    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.
    |
    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.
    |
    | - make exceptions asynchronous.
    |
    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!
    |
    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.
    |
    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 05:40:22 2025
    From Newsgroup: comp.arch

    On 10/3/2025 3:58 AM, Anton Ertl wrote:
    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):


    Yeah...

    Sadly I kinda feel called out here.
    Wouldn't necessarily get Torvalds' seal of approval...


    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |

    Sorta applies to my core...
    Though the L1D$ also remembers the Phys-Addr and uses this for Write-Back.


    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.
    |

    Avoided in BJX2 Core.

    Would apply to my smaller BSR1 and B32V cores (aligned only = cheaper).


    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform
    |

    Former true of SuperH.
    Both true of BJX1.
    Latter true of BJX2 XG1/XG2.

    Not true of XG3, which went over to superscalar.

    WEX Bundling may have been a mistake in retrospect...



    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
    |
    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |

    Avoided.


    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!
    |

    Avoided.

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.
    |

    Errm, true of BJX2.

    Though TLB misses are nowhere near that frequent (if they were, performance would be unusable dog crap).


    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |

    Also true of my core.


    It also now pretends to have Binary128, pretty much entirely by software traps.

    But, trapping has less code footprint, so if sinl/cosl/... are used,
    they won't burn as much space in ".text" with the function calls (and if
    I can trap out of RISC-V mode, then it can use 128-bit math and a few
    other features that don't exist in RV64, so it isn't necessarily slower
    than using a function call).



    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.
    |

    But, makes hardware cheaper...


    | - make exceptions asynchronous.
    |
    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |

    Avoided:
    TLB Miss handling really needs precise exceptions in order to work
    correctly.


    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!
    |
    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.
    |
    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    Ignoring HOBs in pointers except in certain edge cases?...

    I have mixed feelings about having put FPU status in HOBs of SP
    (possible foot gun).

    Weak coherence, with special rituals needed to actually get caches
    flushed?...

    Bit-slicing certain address calculations so the relevant structures have mandatory alignment?...

    Interrupt entry is basically just a glorified branch-with-mode change,
    so the ISR handler has to go through a convoluted sequence to get to
    where it can start saving off the registers?...

    ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Fri Oct 3 13:46:45 2025
    From Newsgroup: comp.arch

    On Fri, 03 Oct 2025 08:58:32 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |

    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    I see nothing wrong (and plenty right) about SAS as long as address
    space is big enough.
    I.e. not 47-48 bits, and preferably not even 56 bits. Considering the
    near-death of Moore's Law, 58 or 60 bits should be enough for SAS for
    the next 50 years. Maybe even for 100.

    SAS does not allow a few tricks that people play today with aliases, but
    none of these tricks is really important for performance and all are detrimental to sanity.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 11:26:11 2025
    From Newsgroup: comp.arch

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    Stefan
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:41:34 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.

    Avoided.

    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.

    Avoided.

    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform

    Avoided

    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

    Avoided

    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!

    Avoided

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.

    Avoided--and mine are even coherent so you don't even have to shoot
    them down.

    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.

    Avoided.

    | - make exceptions asynchronous.

    Avoided

    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!

    Avoided

    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.

    Avoided

    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    A clean sweep.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:42:35 2025
    From Newsgroup: comp.arch


    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Fri Oct 3 16:18:47 2025
    From Newsgroup: comp.arch

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    For its entire existence, PA-RISC HP-UX supported virtual indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it can only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack, and some basic data pages, to avoid some
    initial faults for the exec() case at least.
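
    As a concrete illustration of the "immediately exec()'s" pattern
    described above (a minimal C sketch; the command run and the
    housekeeping are arbitrary examples):

      /* Classic fork-then-exec: the child only does housekeeping on its
         copied state before exec() replaces the address space, so very
         few pages ever actually get copied (whether COW or COA). */
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/wait.h>

      int main(void) {
          pid_t pid = fork();
          if (pid < 0) { perror("fork"); return 1; }
          if (pid == 0) {                       /* child */
              close(STDIN_FILENO);              /* housekeeping, e.g. close()'s */
              execlp("ls", "ls", "-l", (char *)NULL);
              _exit(127);                       /* reached only if exec failed */
          }
          waitpid(pid, NULL, 0);                /* parent */
          return 0;
      }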

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 15:44:26 2025
    From Newsgroup: comp.arch

    Kent Dickey [2025-10-03 16:18:47] wrote:
    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    The problem is not how/when you do the "copy", but the fact that once
    the data at address A has been changed, address A in the child process
    and address A in the parent don't contain the same value. This is fundamentally at odds with SASOS and with virtually-indexed&tagged
    caches. The usual workaround is to augment the virtual addresses with
    some kind of "address-space ID" (ASID).

    That in turn makes it harder to share read-write memory between
    processes (Mill's approach tried to accommodate that by augmenting only
    *some* addresses with an ASID, but not all), and requires flushing the
    cache when an ASID is re-used for another process (which can happen
    rather often because the size of the ASID is usually limited to a small
    number of bits).
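
    A toy sketch of that lookup in C (the layout and names are hypothetical,
    not any real machine's): because the ASID participates in the tag match,
    the same VA from two processes hits different entries, and re-using an
    ASID without a flush would let the new process hit the old owner's lines.

      #include <stdint.h>
      #include <stdbool.h>

      #define NSETS 256
      #define LINE   64

      struct vline { bool valid; uint16_t asid; uint64_t vtag; };
      struct vline cache[NSETS];

      /* Virtually indexed, virtually tagged: no physical address used. */
      bool vivt_hit(uint16_t asid, uint64_t va) {
          struct vline *l = &cache[(va / LINE) % NSETS];
          return l->valid && l->asid == asid
                          && l->vtag == va / (LINE * NSETS);
      }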


    Stefan
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 16:19:12 2025
    From Newsgroup: comp.arch

    On 10/3/2025 10:41 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus
    Torvalds (extracted from
    <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.

    Avoided.

    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.

    Avoided.

    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform

    Avoided

    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

    Avoided

    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!

    Avoided

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.

    Avoided--and mine are even coherent so you don't even have to shoot
    them down.

    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.

    Avoided.

    | - make exceptions asynchronous.

    Avoided

    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!

    Avoided

    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.

    Avoided

    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    A clean sweep.


    The alternative position might be:
    All jank is acceptable so long as it doesn't significantly impede
    performance or negatively impact userland.

    Or, maybe, actively embracing the "full jank route".

    Possibly Torvalds wouldn't exactly approve though...


    Well, except for aligned-only and big-endian, where there are better
    reasons not to go that way. Better IMO to just leave everything LE and
    then use byte-swap instructions for the rare case one needs to access a big-endian variable.

    Well, and then be annoyed that C lacks any standard way to specify the
    endianness of variables or pointers; and the need to have compiler
    builtins which map to htonl/ntohl/htons/ntohs/... (with the usual
    annoyance that one also needs a generic function fallback in the
    background for the case where someone wants to take the function pointer
    of one of these functions; sorta like with memcpy and similar).
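
    A hedged sketch of that arrangement (GCC/Clang builtins; the helper
    names are made up): an inline path the compiler folds to a byte-swap
    instruction, plus an out-of-line definition so taking a function
    pointer still works, much as with memcpy.

      #include <stdint.h>

      static inline uint32_t my_ntohl_inline(uint32_t v) {
      #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
          return __builtin_bswap32(v);   /* single BSWAP/REV instruction */
      #else
          return v;                      /* big-endian host: identity */
      #endif
      }

      /* Out-of-line fallback so &my_ntohl is a real function pointer. */
      uint32_t my_ntohl(uint32_t v) { return my_ntohl_inline(v); }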



    If I were to try to go in a "jank reducing" direction, probably:
    Use XG3 as a design base;
    Comparably cleaner and more orthogonal than XG1 and XG2.
    Eliminate Modal stuff;
    Maybe drop the RISC-V conjoined-twin thing;
    Hardware page walker and fully IEEE FPU?...
    Probably also add cache coherence.
    Mandate zero or sign extended registers as the default (like x86-64);
    Put FPU status/control into its own register or similar (*1).
    ...

    Though, unclear is if a "good" core by these definitions could be done
    without a significant negative impact on FPGA resource budget.



    *1: Sticking it into the HOBs of either GP or SP is ugly, and has an unreasonable level of footgun potential. So, this is pretty high on my
    "I probably need to change this before it ends up getting stuck this way permanently" thing (in which case, would go back to SP[63:48] being
    hard-wired to 0).

    This is probably one of those "going to change once I come up with a
    better option" situations.

    Don't really want to define a new CR for this, but need a place to put
    it that:
    May be exposed to userland without creating problems;
    May be saved/restored on context switches.

    Actually, relocating it to the HOBs of TBR could almost work here:
    Already preserved on context switch;
    Not directly visible to RISC-V or XG3 via normal registers;
    TP is a shadow of TBR in TestKern, but TP is its own register here.

    In this case, might change TP from "Read Only in userland" to "Fault on attempt to modify low 48-bits in Userland".
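
    A hypothetical sketch of the packing (the helper names, and the 16-bit
    width implied by the SP[63:48] remark above, are my assumptions):

      #include <stdint.h>

      #define ADDR48_MASK ((UINT64_C(1) << 48) - 1)

      /* FPU status/control in TBR[63:48]; low 48 bits stay a pointer. */
      static inline uint64_t tbr_with_fpcr(uint64_t tbr, uint16_t fpcr) {
          return (tbr & ADDR48_MASK) | ((uint64_t)fpcr << 48);
      }
      static inline uint16_t tbr_fpcr(uint64_t tbr) {
          return (uint16_t)(tbr >> 48);
      }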


    Exposure to RISC-V land being the bigger problem, as compilers like GCC
    are not going to be aware of "various registers may have weird crap
    squirreled into the HOBs" type issues.

    Granted, Link-Registers have weird stuff in the HOBs, but generally GCC doesn't poke at the link register. But, then again, there is still the
    "glibc violently explodes if I try to use it" issue, and I can't prove
    this is not due to the wacky link registers or similar (would have to
    more carefully examine it to make sure it isn't doing something weird
    here). If it turns out that glibc messes with the link register, may
    need to figure out a way to make RV mode work with bare-pointer link registers.


    ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 17:42:19 2025
    From Newsgroup: comp.arch

    On 10/3/2025 10:26 AM, Stefan Monnier wrote:
    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    You can... just sort of not support full "fork()"; or support it in a
    way similar to how it works on uClinux and Cygwin. Namely, you can use
    it, but trying to use it for anything more than a fork immediately
    followed by an "exec*" call or similar is probably going to break something.

    Well, or anything that depends on "fork()" isn't going to work; and the preferable way to spawn new process instances is something along the
    lines of a "CreateProcessEx()" style mechanism.




    As can be noted, I had designed my ABIs with the assumption of a single address space.

    Generally, it ended up as 48-bit since, even within the limits of an FPGA
    with only 128MB of actual RAM or so, a 32-bit VAS can get a bit cramped (32 bits is only really enough for a single program in an address space, if that).


    My "break glass" feature for 48-bits being insufficient for a single
    address space was expanding the VAS to 96 bits, though even this was a
    bit wonky:
    Low 32-bits: Real address bits;
    Next 24 bits: Just sorta mash all the HOBs together and hope it doesn't
    break.

    Where, say, extending the L1 cache tags by 8 bits is a lot cheaper than extending them by 48 bits, and offers a sufficiently low probability of aliasing.


    So, in the 96-bit mode:
    0000_00000000-0000_00000000..0000_00000000-7FFF_FFFFFFFF:
    Preserved exactly if no higher addresses used.
    Anything else: YMMV.

    There is a non-zero risk of random 4GB regions aliasing based on the
    whims of the XOR, as actually storing full 96-bit addresses is steep.
    The page-tables and TLB could support full-width 96-bit addresses, so
    the main problem area would be trying to use two addresses at the same
    time where they would map to the same location in the L1 cache.

    However, if one assumes a scenario where each program is confined to a
    slice of the bigger 96-bit space, then the XOR's all even out and the
    address space is consistent (the risk mostly appearing when using
    addresses not within the same 48-bit "quadrant").
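
    The folding itself isn't spelled out above, but one plausible reading
    (a sketch; the byte-wise XOR is my assumption) reduces the 48 high bits
    to the 8 extra tag bits:

      #include <stdint.h>

      /* Fold VA[95:48] down to 8 bits for the extended L1 tag. Distinct
         48-bit quadrants usually differ in the folded tag, leaving the
         small aliasing probability described above. */
      static inline uint8_t fold_hobs(uint64_t hi48) {
          uint8_t tag = 0;
          for (int i = 0; i < 48; i += 8)
              tag ^= (uint8_t)(hi48 >> i);   /* XOR the six bytes */
          return tag;
      }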


    Theoretically, the OS's ASLR could keep track of this and not assign
    address ranges that would alias with previously used address ranges (via
    a lookup table).

    Kinda similar crap to the "PE loader may not load a PE to an address
    that crosses a 4GB boundary" rule, because it adds cost to have
    direct branches and PC increment deal with more than 4GB.
    Well, sorta:
    PC increment still has a 4GB window;
    Branches are either 16MB window (via branch predictor);
    Or, +/- 8GB, via normal address calc.
    Branch predictor detecting carry-out and not handling the branch.
    Was 4GB originally, but the above trick allowed being cheaper here.
    However, crossing a 16MB barrier has a performance penalty.
    Statistically low probability of ".text" crossing such a barrier.


    Arguably, all still kinda crap though...


    For now, 48-bits is plenty for my uses.

    I considered possible options for 64-bit VAS support (within the 96-bit
    mode), but annoyingly, if done in an affordable way, it would likely not
    allow program code outside the low 48 bits, or arrays crossing a 48-bit boundary (so, still slightly jank).



    Though, IMHO, still better than what MIPS did, IIRC:
    PC1[63:28] = PC0[63:28]
    PC1[27: 2] = JAL_Addr[25:0]
    PC1[ 1: 0] = 0

    Or, say, you have a 256MB barrier that may not be crossed, and the
    loader would need to rebase within said 256 MB.

    Information is inconsistent for conditional branches, where some
    information implies it is simply adding the displacement (scaled by 4),
    and other info implies:
    Copy high bits unchanged;
    Add low-order bits;
    Address may wrap if it crosses some ill-defined address barrier.

    They seemingly missed an opportunity to go cheaper for Bcc here, say:
    PC1[63:20] = PC0[63:20]
    PC1[19:14] = PC0[19:14] + SExt(Bcc_Addr[15:12])
    PC1[13: 2] = Bcc_Addr[11:0]
    PC1[ 1: 0] = 0
    Then, say, one only needs to do a 6-bit addition for the conditional
    branch instruction.
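
    Transcribing both calculations into C (a direct rendering of the bit
    equations above; function names are just for illustration):

      #include <stdint.h>

      /* MIPS-style JAL: PC1[63:28]=PC0[63:28], PC1[27:2]=index, hence
         the 256MB region a jump may not cross. */
      uint64_t jal_target(uint64_t pc0, uint32_t index26) {
          return (pc0 & ~UINT64_C(0x0FFFFFFF)) | ((uint64_t)index26 << 2);
      }

      /* The cheaper Bcc scheme: only PC[19:14] passes through an adder. */
      uint64_t bcc_target(uint64_t pc0, uint16_t disp16) {
          uint64_t hi  = pc0 & ~UINT64_C(0xFFFFF);              /* PC1[63:20] */
          int64_t  s4  = (int64_t)(int16_t)disp16 >> 12;        /* SExt(disp[15:12]) */
          uint64_t mid = (((pc0 >> 14) + (uint64_t)s4) & 0x3F) << 14; /* 6-bit add */
          uint64_t lo  = (uint64_t)(disp16 & 0x0FFF) << 2;      /* PC1[13:2] */
          return hi | mid | lo;                                 /* PC1[1:0] = 0 */
      }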

    Trying to rebase a program at load time being "there be dragons here" territory.


    ...




    Stefan

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Sat Oct 4 04:36:28 2025
    From Newsgroup: comp.arch

    In article <jwvo6qoui1m.fsf-monnier+comp.arch@gnu.org>,
    Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    Stefan

    Copy-on-Access gives you 100% compatibility with all fork() semantics.

    You can define SAS in a way that almost defeats virtual addresses, but
    let's assume we have 48-bit virtual address space and 16-bit ASID, for
    an effective 64-bit SAS. We'll have every process using a different ASID.
    And we'll assume the ASID affects dcache indexing so we have to handle that.

    First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. We'll assume they cannot see each other's
    data in the dcache due to the virtual indexes being different. So
    ASID=1, VA=0x1000 maps to a different dcache index than ASID=2,
    VA=0x1000 even if they map to the same physical address. The ASID=2
    process starts (for the sake of a simple explanation) with no pages
    mapped, except it maps all the read-only instruction pages from ASID=1
    as ASID=2. (Note it doesn't matter if these are at different
    instruction and/or data cache indexes since it's always read-only). All
    data pages from the ASID=1 process are made invalid (in the page table,
    and removed from the TLB). Now ASID=1 and ASID=2 are running
    simultaneously. If the ASID=1 process touches any data page, the OS
    copies the contents of that original physical page to a new page, and
    makes that new page available to the ASID=2 process. This copy is the
    real trick: in the dumbest possible implementation, the OS flushes the
    data to DRAM, then copies it to the new physical address, and flushes
    that to DRAM. But systems with caches with virtual aliasing generally
    provide ways to handle the aliasing in a more efficient way to do this
    copying in the caches, at least in the L2 cache. Once the copy of the
    one page is done, the OS then makes the corresponding ASID=1 page
    writeable, and continues. Similarly, if the ASID=2 process touches a
    page, it gets a copy of the ASID=1 page (which ASID=1 has not touched
    yet), and then the OS gives the ASID=1 process write access to that
    page. Basically, both processes are "paging in" the ASID=1 pages.

    ASID=1 keeps all of its physical pages. ASID=2 gets a copy of all the
    physical pages from ASID=1 that it touches.

    Note that COW has to go and make all pages of the initial process read-only, which might be more work than to just make all pages invalid.
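
    Either way, the contract fork() must preserve is easy to demonstrate
    from userland (a minimal runnable C example: both processes print the
    same &x, but see different contents after the child's write):

      #include <stdio.h>
      #include <unistd.h>
      #include <sys/wait.h>

      int main(void) {
          int x = 1;
          pid_t pid = fork();
          if (pid == 0) {          /* child */
              x = 2;               /* faults; child gets its own copy */
              printf("child:  &x=%p x=%d\n", (void *)&x, x);
              _exit(0);
          }
          waitpid(pid, NULL, 0);
          printf("parent: &x=%p x=%d\n", (void *)&x, x);  /* prints x=1 */
          return 0;
      }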

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 18:36:45 2025
    From Newsgroup: comp.arch

    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As
    Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 19:00:17 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS,

    Don't forget all the home computers. It might be debatable if they
    should be called "system", though.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat Oct 4 12:31:49 2025
    From Newsgroup: comp.arch

    On 10/4/2025 11:36 AM, John Levine wrote:
    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT.

    Isn't the AS/400, or whatever it is called now, a SAS?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 01:05:17 2025
    From Newsgroup: comp.arch

    On Sat, 4 Oct 2025 18:36:45 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    I don't think anyone would call a system that gives each process a
    completely new address space a single address space system.

    Agreed.

    Making
    the ASID part of the translated address is one of many ways of
    implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,


    What would you call OS/400 (nowadays, IBM i)?

    each of which provided a single full sized
    address space in which they essentially ran their real memory
    predecessors MFT and MVT. As Lynn has often told us, operating
    system bloat forced them quickly to go to MVS, an address space per
    process.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.


    IIRC, Windows CE supported SAS mode of operation just fine without such limitations.





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 22:44:52 2025
    From Newsgroup: comp.arch

    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,

    What would you call OS/400 (nowadays, IBM i)?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without such
    limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 17:57:16 2025
    From Newsgroup: comp.arch

    On 10/4/2025 5:44 PM, John Levine wrote:
    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,

    What would you call OS/400 (nowadays, IBM i)?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.


    FWIW, I suspect that the number of programs that use "fork()" without immediately calling "exec*()" is probably fairly small.

    AFAIK, programs that depend on full "fork()" semantics won't generally
    work on Cygwin either, as IIRC it is just sort of faked by copying the
    local stack frame and spawning out a new thread that terminates on the "exec*()" call.

    Apart from non-PIE ELF or similar, not much else doesn't work in an SAS. Though, ABI tweaks are needed to make things efficient (e.g., not needing
    to load in a new copy of the binaries for every new process).


    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without such
    limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.


    Not sure if 16-bit protected mode segmentation counts as SAS though.
    MS-DOS, maybe, as one could do address math on the segments.
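
    The address math in question, for real-mode MS-DOS (one line of C):

      #include <stdint.h>

      /* Every real-mode seg:off pair names a 20-bit linear address, so
         the whole machine is effectively one address space. */
      static inline uint32_t linear(uint16_t seg, uint16_t off) {
          return ((uint32_t)seg << 4) + off;   /* seg*16 + offset */
      }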


    FWIW, some of my own engineering efforts here took inspiration from
    Windows CE.

    Like, the way I am using the "Global Pointer" directory entry in the
    PE/COFF headers wasn't entirely a novel innovation on my end, ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 02:18:26 2025
    From Newsgroup: comp.arch

    On Sat, 4 Oct 2025 22:44:52 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    It appears that Michael S <already5chosen@yahoo.com> said:

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when
    the system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without
    such limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.


    It's not the same.
    CE supported preemptive multitasking (arguably, better than the likes of
    NT or the majority of popular Unixes, at least as long as we are talking about non-SMP) and memory protection, both protection of the kernel from user
    processes and of user processes from each other.

    I never took a look at CE support for Virtual Memory. Probably it was
    quite weak, if there was support at all. The only CE-based product I
    ever did had absolutely no need for Virtual Memory.

    However I am pretty sure that they utilized paging hardware for
    management of physical memory, removing fear of fragmentation.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lynn Wheeler@lynn@garlic.com to comp.arch on Sat Oct 4 14:17:32 2025
    From Newsgroup: comp.arch


    John Levine <johnl@taugh.com> writes:
    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    they had two kinds of bloat. original decision to add virtual memory was because of MVT storage management problems, having to specify each
    region (concurrent execution) four times larger than actually used, as a
    result a typical 1mbyte 370/165 only ran four concurrent regions,
    insufficient to keep system busy and justified. Going to 16mbyte virtual address space (VS2/SVS) allowed concurrent regions to be increased by
    factor of four (sort of like running MVT in a 16mbyte CP67 virtual
    machine ... aka CP67, precursor to VM370), with little or no paging
    ... although capped at 15 because of the 4-bit storage protect keys.

    Problem was that as systems got larger/faster needed to move past 15
    concurrent regions ... which resulted in giving each concurrently
    executing region/program, their own 16mbyte virtual address space
    (VS2/MVS). However, OS/360 & descendents were heavily pointer passing
    APIs (creating a different problem) and so they mapped a 8mbyte image of
    the MVS kernel into every 16mbyte virtual address space (leaving
    8mbytes). Then because each subsystem was moved into their separate
    16mbyte virtual address space, the 1mbyte "Common Segment Area" (CSA)
    was mapped into every virtual address space for passing arguments/data
    back and forth between applications and subsystems (leaving 7mbytes).

    Then because the space requirements for passing arguments/data back and
    forth was somewhat proportional to number of subsystems and concurrently running regions/applications, the CSA started to explode becoming the
    Common System Area (CSA) running 5-6mbytes (leaving 2-3mbytes for regions/applications) and threatening to become 8mbytes (leaving zero
    for regions/applications). At the same time the number of concurrently
    running applications space requirements was exceeding 16mbytes real
    address ... and 2nd half 70s, 3033s were retrofitted for 64mbytes real addressing by taking two unused bits in page table entry and prefixing
    them to the 12bit (4k) real page number for 14bits or 64mbyte
    (instructions were still 16mbyte, but virtual pages could be
    loaded and run "above the 16mbyte line").

    Then part of 370/xa "access registers" was retrofitted to 3033 for dual
    address space mode. Calls to subsystems, could move the caller's address
    space pointer into the secondary address space register and the
    subsystem address space pointer was moved into primary. Subsystems then
    could access the caller's (secondary) virtual address space w/o needing
    data be passed back&forth in CSA. For 370/xa, program call/return
    instructions could perform the address space primary/secondary switches
    all in hardware.

    I had also started pontificating that lot of OS/360 had heavily
    leveraged I/O system to compensate for limited real storage (and
    descendents had inherited it). In early 80s, I wrote a tome that
    relative system disk I/O throughput had declined by an order of
    magnitude (disk throughput got 3-5 times faster while systems got 40-50
    times faster (major motivation for constantly needing an increasing
    number of concurrently executing programs). Disk division executive took exception and directed the division performance organization to refute
    my claims. After a couple weeks, they came back and basically said that
    I had slightly understated the problem. They then respun the analysis
    for SHARE (user group) presentation on how to configure/manage disks for improved system throughput (16Aug1984, SHARE 63, B874).

    3033 above the "16mbyte" line hack: There were problems with parts of
    system that required virtual pages below the "16mbyte line". Introduced
    with 370 was I/O channel program IDALs that were full-word
    addresses. Somebody came up with the idea of using IDALs to write a virtual
    page (above 16mbyte) to disk and then read it back into an address
    <16mbyte. I gave them a hack using a virtual address space table that
    filled in page table entries with the >16mbyte page number and <16mbyte
    page number, and used the MVCL instruction to copy the virtual page from
    above the 16mbyte line to below the line.
    --
    virtualization experience starting Jan1968, online at home since Mar1970
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Oct 5 13:02:29 2025
    From Newsgroup: comp.arch

    John Levine wrote:
    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,
    What would you call OS/400 (nowadays, IBM i)?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.

    For operating systems like VMS and WNT that cannot fork (duplicate a
    parent virtual space into a child), POSIX allows spawn() instead.

    Spawn is equivalent to fork()/exec() and CreateProcess() in that
    it creates a new address space, loads an exe, and starts a thread.
    Like fork() and WNT CreateProcess(), spawn() allows open file descriptor handles to be passed to the child.

    https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/spawn.h.html

    https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
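
    A minimal usage sketch (the standard posix_spawn API; the spawned
    command and the descriptor handed down are arbitrary examples):

      #include <spawn.h>
      #include <stdio.h>
      #include <sys/wait.h>
      #include <unistd.h>

      extern char **environ;

      int main(void) {
          posix_spawn_file_actions_t fa;
          posix_spawn_file_actions_init(&fa);
          /* pass a descriptor to the child, here our stdout as its stderr */
          posix_spawn_file_actions_adddup2(&fa, STDOUT_FILENO, STDERR_FILENO);

          char *argv[] = { "ls", "-l", NULL };
          pid_t pid;
          int rc = posix_spawnp(&pid, "ls", &fa, NULL, argv, environ);
          posix_spawn_file_actions_destroy(&fa);
          if (rc != 0) { fprintf(stderr, "posix_spawnp: %d\n", rc); return 1; }
          waitpid(pid, NULL, 0);    /* new address space, exe, and thread */
          return 0;
      }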



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From George Neuner@gneuner2@comcast.net to comp.arch on Mon Oct 6 06:54:10 2025
    From Newsgroup: comp.arch

    On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
    Dickey) wrote:

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
    SAS - which is that copied /pointers/ remain referencing objects in
    the original process. Under the multi-space model of Unix/Linux,
    after a fork the copied pointers should be referencing the copied
    objects in the new process.

    Lacking a way to identify and fixup pointer values, under SAS by
    simply copying data (COW or COA) you end up unintentionally /sharing/
    data.


    For its entire existence, PA-RISC HP-UX supported virtual indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it can only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack, and some basic data pages, to avoid some
    initial faults for the exec() case at least.

    fork-exec is not a problem. fork alone is.

    How did HP-UX on PA-RISC handle fork?


    Kent

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 15:49:10 2025
    From Newsgroup: comp.arch

    In article <10brpft$23go$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    Sorry, bad terminology. I just mean all addresses under ASID=2 are
    invalid.

    In my example, all processes can peek inside any other process's address
    space, by just forming the 64-bit virtual address. The ASID thing is
    just a convention, so I wouldn't have to type 16 digit hex numbers over and over.

    [snip]

    The last widely used single address space systems I can think of were OS/VS1
    and OS/VS2 SVS, each of which provided a single full sized address space in
    which they essentially ran their real memory predecessors MFT and MVT. As
    Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    HP-UX on PA-RISC from 1986-2004 or so was effectively a SAS computer. In 32-bit CPUs, the virtual address space was 48 bits, and normal user code could form any 48-bit address, and this was used for shared libraries and shared
    code (processes running the same executable shared the same virtual address space for the executable). In 64-bit mode, it works mostly as I described. There were 32-bit Space registers which were OR'ed into the upper bits of
    the 64-bit virtual address, to give the global 64-bit system address.
    It was an OS convention to limit the Space values to the upper 16 bits or so, and it could change it to whatever it wanted.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.



    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 16:44:52 2025
    From Newsgroup: comp.arch

    In article <ne67ekdeej48s8jp7jh1ahda32qmiphm0p@4ax.com>,
    George Neuner <gneuner2@comcast.net> wrote:
    On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
    Dickey) wrote:

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
    SAS - which is that copied /pointers/ remain referencing objects in
    the original process. Under the multi-space model of Unix/Linux,
    after a fork the copied pointers should be referencing the copied
    objects in the new process.

    Lacking a way to identify and fix up pointer values, under SAS by
    simply copying data (COW or COA) you end up unintentionally /sharing/
    data.


    For its entire existence, PA-RISC HP-UX supported virtually indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it could only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    that since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack and some basic data pages, to avoid some
    initial faults, for the exec() case at least.

    fork-exec is not a problem. fork alone is.

    How did HP-UX on PA-RISC handle fork?


    Kent

    This is what I was saying: if you define SAS to only mean that each
    process is living at a unique address, and it knows its full address,
    then I don't wish to discuss that SAS. That's like running without
    virtual memory.

    If you define SAS such that all processes can see other running processes'
    addresses, and can directly read/write each other's addresses (with
    protection, obviously), then that's the SAS HP PA-RISC ran in.

    HP PA-RISC 64-bit creates a 64-bit global virtual address. Each process
    by convention lives in a smaller part of that, let's say a 48-bit space.
    Each process has 8 32-bit Space Registers (not general registers, and
    some are not writeable by the user, but 5 are writeable) which are OR'ed
    into bits [63:32] of the VA formed by loads and stores to
    form the GVA. Of GVA bits [63:32], it's an OS convention how many bits
    are effectively the ASID and how many are VA bits for the process.
    The GVA is mostly transparent to the user process--they can read the Space Registers and figure it out if they want to, but this was not usual.

    [The architecture defines Space registers as up to 64-bit, so there's a 96-bit GVA, but the hardware only implemented 32-bit Space registers with a 64-bit GVA].
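    A minimal C sketch of the address formation as described above (an
    illustration of the mechanism, not the actual PA-RISC hardware logic;
    the ASID/VA split within the upper bits is the OS convention mentioned
    in the text):

        #include <stdint.h>

        /* Form a 64-bit global virtual address (GVA) by OR'ing a 32-bit
           Space Register into bits [63:32] of the VA from a load/store. */
        uint64_t form_gva(uint32_t space_reg, uint64_t va)
        {
            return va | ((uint64_t)space_reg << 32);
        }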

    Note that at any time, user code can set Space Register 1 to 0, form
    the address 0x12345678_12345670 in a register, and try to read and write
    that address. This will generally fail due to a Protection ID scheme, but
    some Space Register values were reserved for shared libraries to share the
    code at the same GVA in all processes.

    So fork() is easy--no pointers in memory or registers are affected; the
    OS assigns a new ASID, puts that in the upper bits of the Space
    Registers for the new process, and it's off. But all HP PA-RISC CPUs have
    virtually indexed caches, where the ASID is mixed in with lower address
    bits to "hash" the cache lookup. So it needed to do COA: since the new
    ASID is different, the same VA wouldn't see the cached data of the
    old process.

    Note that the OS sees all processes at once. If it wants to read from
    one process and write to another, it can just do Load, then Store.

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Thu Oct 9 13:57:06 2025
    From Newsgroup: comp.arch

    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl wrote:

    Apparently someone wants to create a big-endian RISC-V, and someone
    proposed adding support to that to Linux.

    I had previously seen Linus' specific response to that: support should not
    be added now, as that would be promoting fragmentation of RISC-V; but if
    it were implemented and widely used, of course it would have to
    be supported in Linux.

    John Savard
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Thu Oct 9 21:41:03 2025
    From Newsgroup: comp.arch

    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    It makes it easier to understand core dumps, as numbers are stored just as they are written.

    But more importantly, it means that binary integers are ordered the same
    way as packed decimal integers, which are ordered the same way as integers
    in character text form.
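    To see the claim concretely, a small C snippet that dumps the bytes of
    an integer in address order:

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            uint32_t x = 0x12345678;
            const unsigned char *p = (const unsigned char *)&x;
            /* big-endian memory:    12 34 56 78 (reads like the written number)
               little-endian memory: 78 56 34 12 */
            for (int i = 0; i < 4; i++)
                printf("%02x ", p[i]);
            putchar('\n');
            return 0;
        }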

    As for the _rest_ of the items, though, all of them are indeed bad things.

    But some are worse than others.

    | - only do aligned memory accesses

    Nearly all memory accesses are, or could be, aligned. Performance is
    improved if they are. As long as there's some provision to handle
    unaligned data, such as a move characters instruction, data structures can
    be dealt with for things like communications formats.
    I'm not saying it isn't bad, just that it was excusable before we had as
    many transistors available as we do now.
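    A sketch of the software fallback in question: reading a 32-bit
    little-endian value at an arbitrary address using only byte loads,
    which is what compilers typically emit on strict-alignment targets:

        #include <stdint.h>

        /* Compose an unaligned 32-bit little-endian load from four
           single-byte loads, which are always aligned. */
        uint32_t load32_unaligned(const uint8_t *p)
        {
            return (uint32_t)p[0]
                 | (uint32_t)p[1] << 8
                 | (uint32_t)p[2] << 16
                 | (uint32_t)p[3] << 24;
        }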

    | - expose your pipeline details in the ISA

    The original MIPS did this. This is bad indeed, as whatever you do in this direction won't be applicable to later iterations of the ISA as technology advances.

    Failing to support the entire IEEE 754 floating-point standard just needs
    to be documented. Expecting software to fake it being implemented is not reasonable: as long as denormals instead produce zero as the result, one
    just has an inferior floating-point format, not a computer that doesn't
    work.
    Once again, bad, but not all that terrible.

    But anything that means that programs could randomly fail because
    interrupts don't properly save or restore the entire machine state...
    *that* is catastrophically bad, and hardly compares to his other examples.

    John Savard

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Oct 9 22:10:10 2025
    From Newsgroup: comp.arch


    John Savard <quadibloc@invalid.invalid> posted:

    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    It makes it easier to understand core dumps, as numbers are stored just as they are written.

    But more importantly, it means that binary integers are ordered the same
    way as packed decimal integers, which are ordered the same way as integers in character text form.

    Not true:: packed decimal in LE is stored in the same order as binary.
    Bytes at higher addresses are more significant.

    As for the _rest_ of the items, though, all of them are indeed bad things.

    But some are worse than others.

    | - only do aligned memory accesses

    Nearly all memory access are, or could be, aligned. Performance is
    improved if they are. As long as there's some provision to handle
    unaligned data, such as a move characters instruction, data structures can be dealt with for things like communications formats.
    I'm not saying it isn't bad, just that it was excusable before we had as many transistors available as we do now.

    I am (AM) a BE guy through and through--but even I can read the writing
    on the wall. BE is dead and will remain an ever shrinking niche. Making
    My 66000 architecture LE was <indeed> painful; but ultimately the correct decision.

    | - expose your pipeline details in the ISA

    The original MIPS did this. This is bad indeed, as whatever you do in this direction won't be applicable to later iterations of the ISA as technology advances.

    We {the original RISC generation 1 architects} would have all dropped
    delayed branches if we believed everyone else would do so. But we knew
    they wouldn't, so we couldn't allow ourselves to lose 20% perf, so we
    all jumped off the same cliff like lemmings. That was in the 1-wide
    generation; by the 2-wide generation we knew it was bad architecture,
    and by the 4-wide generation we would have all been better off without it.

    I do not think any of us would do that to our projects again. I advise
    you not to either.

    Failing to support the entire IEEE 754 floating-point standard just needs
    to be documented. Expecting software to fake it being implemented is not reasonable: as long as denormals instead produce zero as the result, one just has an inferior floating-point format, not a computer that doesn't work. Once again, bad, but not all that terrible.

    No, just no. There are enough transistors today to "do the right thing"
    a) full 754-2019 support
    b) misaligned memory
    c) hardware table-walkers
    d) HyperVisor support
    e) an infinite number of interrupt tables

    But anything that means that programs could randomly fail because
    interrupts don't properly save or restore the entire machine state...
    *that* is catastrophically bad, and hardly compares to his other examples.

    We now need to provide for situations where the Guest OS fails, or
    where the Host OS fails, and the system remains up and running while
    only a few applications die off and the Guest OS or Host OS reboots
    from checkpoints.

    John Savard

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Oct 9 22:21:12 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    It makes it easier to understand core dumps, as numbers are stored just as
    they are written.

    Any good dump analyzer will happily bswap the value before converting
    it into a printable form on a little-endian system, just to make it
    readable (when dumping in other than 8-bit units, of course).

    The only benefit in modern days for big-endian is that network
    protocols are in big-endian form. Not a big issue with modern
    LE CPUs, where byteswap is a single cycle instruction.
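    A minimal sketch of that boundary conversion, using the standard POSIX
    helpers (the memcpy also sidesteps alignment restrictions):

        #include <arpa/inet.h>
        #include <stdint.h>
        #include <string.h>

        /* Read a big-endian 32-bit field from a network packet; ntohl()
           compiles to a single byte-swap instruction on LE CPUs and to
           nothing on BE CPUs. */
        uint32_t read_be32(const unsigned char *packet)
        {
            uint32_t v;
            memcpy(&v, packet, sizeof v);   /* avoids unaligned access */
            return ntohl(v);                /* network (BE) -> host order */
        }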
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 10 08:30:03 2025
    From Newsgroup: comp.arch

    scott@slp53.sl.home (Scott Lurndal) writes:
    The only benefit in modern days for big-endian is that network
    protocols are in big-endian form. Not a big issue with modern
    LE CPUs, where byteswap is a single cycle instruction.

    Clever architects put the byte swap in the load and store
    instructions, where the byte-swapping is just an addition to the
    handling of misaligned loads and stores, which itself is an addition
    to the handling of smaller-than-transfer-width accesses. PowerPC has
    such instructions.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Oct 10 15:02:17 2025
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    scott@slp53.sl.home (Scott Lurndal) writes:
    The only benefit in modern days for big-endian is that network
    protocols are in big-endian form. Not a big issue with modern
    LE CPUs, where byteswap is a single cycle instruction.

    Clever architects put the byte swap in the load and store
    instructions, where the byte-swapping is just an addition to the
    handling of misaligned loads and stores, which itself is an addition
    to the handling of smaller-than-transfer-width accesses. PowerPC has
    such instructions.

    Even better, hardware network accelerators bypass the CPU entirely
    when working with packets.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 11 07:18:16 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more discussion has been expended on the topic than these merits justify <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>), little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Any big-endian architecture will suffer from less software support,
    and conversely, if software wants to include support for this
    hardware, that results in extra development effort, i.e., extra cost
    (not for all software, but for some). And Linus Torvalds is not
    willing to expend this effort, not even if the initial patches for
    supporting such an architecture come for free, because the additional
    effort would be ongoing.

    IBM has recognized the signs of the times, and added full-blown
    little-endian support to Power (including unaligned accesses), and in
    their Linux efforts retracted their support for the big-endian Power
    and threw their weight behind little-endian Power.

    Standardization has lots of merits, and deviating from an established
    standard is a step one should not take lightly.

    But more importantly, it means that binary integers are ordered the same
    way as packed decimal integers, which are ordered the same way as integers
    in character text form.

    Says who? In a course we were a group of five who had to write some
    program dealing with BCD numbers in 80286 assembly language. We
    divided the work up, with each one writing some routines. Eventually,
    on integration testing, we found that half of the group had
    interpreted the numbers to be represented in little-endian order
    (because the CPU was little-endian), and the other half had
    interpreted them to be represented in big-endian order (because that
    results in more readable memory dumps); and none of us thought that
    any of the others would implement the other byte order. So no, the
    byte order of BCD numbers is not obvious.
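    To make the ambiguity concrete, here are the two byte orders the group
    members assumed, for the packed-BCD number 1234 stored in two bytes
    (nibble order within a byte is yet another convention):

        #include <stdint.h>

        uint8_t bcd_be[2] = { 0x12, 0x34 };  /* most significant byte first  */
        uint8_t bcd_le[2] = { 0x34, 0x12 };  /* least significant byte first */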

    | - only do aligned memory accesses

    Nearly all memory accesses are, or could be, aligned. Performance is
    improved if they are. As long as there's some provision to handle
    unaligned data, such as a move characters instruction, data structures can
    be dealt with for things like communications formats.
    I'm not saying it isn't bad, just that it was excusable before we had as
    many transistors available as we do now.

    Again, the merit of supporting unaligned accesses in this day and age
    is that more software will run on your hardware, and the demerit of
    not doing it is that extra software effort is required for some
    software to support it, as you outline.

    Failing to support the entire IEEE 754 floating-point standard just needs
    to be documented. Expecting software to fake it being implemented is not
    reasonable: as long as denormals instead produce zero as the result, one
    just has an inferior floating-point format, not a computer that doesn't
    work.

    Software that expects a-b == 0.0 to give the same result as a==b (as
    guaranteed by IEEE 754 40 years ago) won't work. What do you mean
    by "not a computer that doesn't work" if the computer does not run
    the software with the intended results?
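    A small C illustration of the identity at stake (hypothetical values;
    whether the flush actually happens depends on the FPU mode):

        #include <stdio.h>

        int main(void)
        {
            double a = 5e-324;   /* smallest positive denormal double */
            double b = 1e-323;   /* a different denormal, b != a */
            /* IEEE 754 gradual underflow guarantees b - a == 0.0 iff
               a == b, so this prints 0.  Hardware that flushes denormal
               results to zero computes b - a == 0.0 even though a != b,
               and prints 1. */
            printf("%d\n", b - a == 0.0 && a != b);
            return 0;
        }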

    I take pride in the portability of my software, but for things that
    have been settled in the mainstream (byte order, alignment, IEEE FP,
    among other things), there must be a very good reason to support
    deviants. E.g., RWX mappings have worked on every OS since the
    beginning of mmap(), and are necessary for JITs. Trying to mmap RWX
    fails on MacOS on Apple Silicon (it works on the same MacOS version on
    Intel hardware, and it works on the same Apple Silicon under Linux, so
    this is a voluntary removal of a capability by Apple). As a result,
    the development version of Gforth did not work on MacOS on Apple
    Silicon for several years.
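    For concreteness, this is the kind of mapping a JIT needs; the single
    mmap() below works on typical Unices but fails on MacOS on Apple
    Silicon, where Apple's documented JIT path (MAP_JIT plus per-thread
    toggling via pthread_jit_write_protect_np()) must be used instead.
    A sketch, not Gforth's actual code:

        #include <sys/mman.h>
        #include <stddef.h>

        /* Allocate a read-write-execute buffer for JIT-generated code. */
        void *alloc_jit_buffer(size_t len)
        {
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            return p == MAP_FAILED ? NULL : p;
        }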

    My plan for fixing that was to just disable the JIT compiler and fall
    back to the threaded code interpreter on that OS, but Bernd Paysan
    actually decided to jump through the hoops that Apple sets up for
    people writing JIT compilers. The result is a speedup by a factor 2-3
    (times are run-times in seconds):

    sieve  bubble  matrix  fib    fft
    0.108  0.107   0.071   0.119  0.057  threaded code on Mac Mini M1 MacOS
    0.052  0.041   0.027   0.038  0.018  JIT compiler on Mac Mini M1 MacOS
    0.029  0.034   0.015   0.044  0.015  JIT compiler on Core i5-1135G7 Linux

    For comparison, I also provided numbers for laptop hardware
    contemporary with the M1.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sun Oct 12 02:37:40 2025
    From Newsgroup: comp.arch

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Garrrgghhhhhhhh, not this again.

    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more
    discussion has been expended on the topic than these merits justify
    <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>),
    little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Yup. I really wish the arguments about which order is "more natural"
    would stop since they're just people's cultural preconceptions. I
    imagine that if my first language were Arabic or Hebrew, I would find left-to-right big-endian core dumps much less readable than the
    familiar looking right-to-left little-endian ones.

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    IEN 137 said everything worth saying about this topic 45 years ago.

    https://www.rfc-editor.org/ien/ien137.txt
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 07:13:57 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    This has a reverse side: Little-endian having effectively won,
    software often does not work on big-endian systems out of the box
    any more. I suspect this is why IBM effectively chose little-endian
    for POWER, but AIX is big-endian (and will remain so for the foreseeable
    future).

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_ margin):
    The Datapoint 2200.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 09:51:38 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    This has a reverse side: Little-endian having effectively won,
    software often does not work on big-endian systems out of the box
    any more. I suspect this is why IBM effectively chose little-endian
    for POWER, but AIX is big-endian (and will remain so for the foreseeable
    future).

    If someone chooses to buy a big-endian system nowadays, they hopefully
    know about these problems. If they need a particular piece of
    software, they hopefully are able to sponsor porting it to the
    big-endian system.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_ margin):
    The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    But the actual reason why little-endian has won is that all the
    big-endian architectures either have been cancelled (HPPA, MIPSeb,
    SPARC), switched to little-endian (Power on Linux), or are retreating
    to a niche (Power on AIX, S390x).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 10:14:08 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_ margin):
    The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 13:56:25 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 10:14:08 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work
    on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_
    margin): The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    Arm. It was designed as the CPU for the successor of the 6502-based BBC Micro.

    But does the 6502 really have "byte order" in hardware? Or just "soft"
    conventions of the BBC BASIC interpreter?


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 11:38:39 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> schrieb:
    On Sun, 12 Oct 2025 10:14:08 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work
    on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_
    margin): The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    Arm.

    That does not have many architectural features from the 6502 :-)

    It was designed as CPU for successor of 6502-based BBC Micro.

    But does 6502 really have "byte order" in hardware? Or just "soft" conventions of BBC BASIC interpreter?

    Yes, the 6502 is little-endian, which you can see in its instruction
    formats and the way the pointers in the zero page were stored.
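    A sketch of the (zp),Y effective-address calculation that makes this
    visible: the low byte of the pointer sits at the lower address.

        #include <stdint.h>

        uint8_t mem[65536];   /* simulated 6502 memory */

        /* Effective address for the 6502 (zp),Y addressing mode: fetch a
           16-bit little-endian pointer from the zero page (which wraps
           within the page), then add Y. */
        uint16_t ea_indirect_y(uint8_t zp, uint8_t y)
        {
            uint16_t ptr = mem[zp] | (uint16_t)mem[(uint8_t)(zp + 1)] << 8;
            return ptr + y;
        }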
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 15:31:21 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    On Sun, 12 Oct 2025 10:14:08 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not
    work on big-endian systems, and little-endian has won, is there
    really anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is
    arguably the most influential of all times (or at least has the
    highest ratio of influence to recognition level, but that by a
    _huge_ margin): The Datapoint 2200.

    Another widely-used architecture today inherited its byte order
    from the 6502.

    Which one?

    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    Also, both processors appear to share a philosophy of design driven by
    practicality rather than by theoretical principles. They are what they
    are because that was the maximum that comfortably fit into available
    budgets of all sorts, rather than because of "closing the semantic gap"
    or, conversely, "reducing the instruction set".



    It was designed as CPU for successor of 6502-based BBC Micro.

    But does 6502 really have "byte order" in hardware? Or just "soft" conventions of BBC BASIC interpreter?

    Yes, the 6502 is little-endian,
    which you can see in its instruction formats

    That does not count. Instruction encoding is orthogonal to the question
    of byte order during execution. I have seen various combinations,
    including encodings that have no particular order, i.e., an immediate
    field scattered across the instruction word. Not that I remember which
    architecture it was.

    and the way the pointers in the zero page were stored.


    Yes, I see.
    Indirect addressing modes are clearly LE.
    In case of JMP instruction 16-bit LE pointer does not even have to be in
    zero page.




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 13:31:22 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 13:36:51 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    Which is what is relevant for the question at hand. The intention of
    the ARM architects was to produce a CPU for their successor of the BBC
    Micro, and they certainly mentioned the prominent role of the 6502 as inspiration in their accounts; they obviously did not try to create a
    32-bit 6502, but at least they did not change the byte order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags? My impression was that ARM instruction
    sets always set NZCV together, which makes OoO implementation quite a
    bit cheaper.

    Looking in Zaks' 6502 book, I find that SBC sets NVZC, whereas CMP
    only sets NZC (and lots of other instructions only set NZ). I expect
    that this difference between SBC and CMP cost a transistor or two. I
    wonder why they did that. Only setting NZ on, e.g., INC/INX/INY
    probably also cost some transistors, but allowed one to keep C in, e.g.,
    a long-addition loop.

    Yes, the 6502 is little-endian,
    which you can see in its instruction formats

    That does not count. Instruction encoding is orthogonal to the question
    of byte order during execution. I have seen various combinations,
    including encodings that have no particular order, i.e., an immediate
    field scattered across the instruction word. Not that I remember which
    architecture it was.

    In many (e.g., HPPA, RISC-V, funny constant encodings on ARM A64).
    However, on the 6502 it is significant, because the instructions are
    read byte-by-byte. They switched from the 6800's big-endian order to
    little-endian because the latter was cheaper and faster to implement,
    especially in the instructions. For the data, they could have
    accessed two-byte data backwards and become big-endian (but with the
    address pointing to the LSB, and the MSB being at address-1) without
    much difficulty. The unusual address could be hidden by the assembler
    (i.e., if you write "lda (2),y", that would be encoded as $b1 $3).

    Indirect addressing modes are clearly LE.
    In case of JMP instruction 16-bit LE pointer does not even have to be in
    zero page.

    JSR stores the return address in little-endian order and RTS loads the
    address to return to in little-endian order.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 15:10:02 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value via char * and read via int *) to
    be an error or not, if it is not directly observable in a limited
    number of test runs on a little-endian system? Your comment would
    suggest not.
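    The classic instance of such a mistake, for concreteness
    (strict-aliasing questions aside):

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            uint32_t v = 0;
            unsigned char *p = (unsigned char *)&v;
            p[0] = 0x01;      /* writer assumes "first byte is the LSB" */
            /* little-endian reads 0x00000001; big-endian reads 0x01000000,
               so tests on a little-endian machine may never expose the
               mixup */
            printf("0x%08x\n", v);
            return 0;
        }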


    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 15:48:02 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.

    If no test can be devised that shows unintended behaviour on the
    little-endian system, then I consider the program as delivered to be
    working.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    |armeb - Big-endian OABI port targeting the linksys NSLU2 and
    |similar. Interest fell after a method was determined for running
    |little-endian Linux systems on the NSLU2. Active during the sarge
    |timeframe and now abandoned.

    It would be cool for people who want to test portability to big-endian
    systems if one could actually configure, say, a Raspi 5 for big-endian operation, and have a big-endian Linux distribution running on it, but
    who is going to pay the developers for all this work?

    And given that little-endian has won, why would one want to be able to
    port to big-endian? Sure, there is a certain satisfaction in doing
    pointless work, and one could extend this to supporting
    word-addressable machines, 36-bit machines, sign-magnitude and
    ones-complement machines, and decimal-arithmetic machines. But it's
    better to spend one's time on useful features.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Oct 12 16:11:27 2025
    From Newsgroup: comp.arch


    John Levine <johnl@taugh.com> posted:

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Garrrgghhhhhhhh, not this again.

    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more
    discussion has been expended on the topic than these merits justify
    <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>),
    little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Yup. I really wish the arguments about which order is "more natural"
    would stop since they're just people's cultural preconceptions. I
    imagine that if my first language were Arabic or Hebrew, I would find left-to-right big-endian core dumps much less readable than the
    familiar looking right-to-left little-endian ones.

    Top to bottom works for Japanese and Chinese. Yet I hear no
    appetite for TB byte order.

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    IEN 137 said everything worth saying about this topic 45 years ago.

    https://www.rfc-editor.org/ien/ien137.txt

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 16:25:51 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.

    If no test can be devised that shows unintended behaviour on the little-endian system, then I consider the program as delivered to be
    working.

    That isn't what I was saying.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Testing, by its very nature, is incomplete. The theoretical
    possibility that a test can be derived does not help in practice.

    I believe you have written programs. Did you ever put in a bug
    that your existing testing framework did not catch?


    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.
    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    That's circular reasoning.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 19:56:32 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 15:10:02 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not
    work on big-endian systems, and little-endian has won, is there
    really anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on
    little-endian systems, you don't need a big-endian system to find
    the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.


    Another widely-used architecture today inherited its byte order
    from the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).


    Once, many years ago, I encountered an ARMv7-AR processor (a TI MCU
    based on a Cortex-R4 core) that was BE-only. I am still not sure whether
    it violates the ARM standard or not.
    I never encountered an ARMv7-M that was not LE-only.
    For Arm v8-A and v9-A, the formal requirements are hard to understand.
    But in practice nobody makes cores that do not support LE or do not
    power up in LE mode. Maybe some of them can be switched into BE later.
    But why?





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 17:02:17 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> schrieb:

    But in practice nobody makes cores that do not support LE or do not
    power up in LE mode. Maybe some of them can be switched into BE later.
    But why?

    Somebody may want to port software from a big-endian system like
    zSystem, AIX or SPARC, and may not want to go to the trouble of
    making this code endian-clean.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 20:13:57 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 13:36:51 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    Which is what is relevant for the question at hand. The intention of
    the ARM architects was to produce a CPU for their successor of the BBC
    Micro, and they certainly mentioned the prominent role of the 6502 as inspiration in their accounts; they obviously did not try to create a
    32-bit 6502, but at least they did not change the byte order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags?

    Sorry, my mistake. On the 6502, Z is not the only flag that is affected
    by non-arithmetic instructions. N is affected as well.
    Also, apart from different flags-handling by INC/DEC, which is
    fully expected, there are differences in logical, shift, and even in
    compare instructions.
    So, the two architectures are further apart in flags handling than I
    thought.

    Convenient reference here: http://www.6502.org/users/obelisk/6502/instructions.html


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 17:25:30 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.

    If no test can be devised that shows unintended behaviour on the
    little-endian system, then I consider the program as delivered to be
    working.

    That isn't what I was saying.

    Correct: That's what I am saying.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Testing, by its very nature, is incomplete. The theoretical
    possibility that a test can be derived does not help in practice.

    Maybe not, but that's not my point: If no such test can be devised,
    would you call it a bug? Why?

    As for practice: Does testing on big-endian systems help in practice?
    Not in my experience. I don't remember ever finding a bug of the kind
    you indicate by testing on a big-endian system (and my primary laptop
    was big-endian until 2011), not a byte-order portability bug, much
    less something that I would consider a bug if portability to
    big-endian systems was not a goal.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    That's circular reasoning.

    You may think so, but the lack of big-endian ARM systems makes my
    point.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 17:47:41 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 13:36:51 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags?

    Sorry, my mistake. On the 6502, Z is not the only flag that is affected
    by non-arithmetic instructions. N is affected as well.
    Also, apart from different flags-handling by INC/DEC, which is
    fully expected, there are differences in logical, shift, and even in
    compare instructions.
    So, the two architectures are further apart in flags handling than I
    thought.

    But I don't think that the ARM architects considered that to be a
    problem. The instructions were different anyway, and they did not
    want to have an 8086-style 6502->ARM assembly-language translator, did
    they?

    Anyway, for an OoO implementation the important question is if ARM
    always updates all of NZCV at the same time, or if it is selective in
    the updates.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Oct 12 13:04:18 2025
    From Newsgroup: comp.arch

    On 10/12/2025 11:11 AM, MitchAlsup wrote:

    John Levine <johnl@taugh.com> posted:

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Garrrgghhhhhhhh, not this again.


    At this point, the main use-case of BE is file formats: some people
    occasionally still use it there, because it is somehow perceived as
    better for file interchange, despite pretty much none of the computers
    still in use having it as the native format.

    ...


    Well, and UTF-16 may use a BOM, so it can go either way, except when it doesn't use the BOM. And, UTF-8 sometimes uses a BOM though this is
    often an unwanted aberration when one mostly just wants ASCII text (and
    not all programs that read text-files as input deal gracefully with a BOM).

    Though, this wonkiness mostly came up if one tried to edit files in
    Visual Studio or Notepad. Most other text editors have the sense to not
    just randomly use UTF-16 or insert a BOM (now with the generally
    accepted default of assuming UTF-8 for non-ASCII characters, or, if not
    valid as UTF-8, assuming 1252).

    Well, statistically there is a low probability of confusing 1252
    with UTF-8, as only certain statistically-unlikely byte combinations
    would result in valid UTF-8 sequences.

    There was a problem in the past of programs sometimes unintentionally
    parsing ASCII as UTF-16, usually resulting in a mess of CJK characters,
    as apparently some MS tools would mistakenly parse ASCII as UTF-16 if it
    was an even number of bytes and "could" be parsed as UTF-16 (vs.,
    say, detecting stuff that was unlikely to be valid ASCII). Apparently, a
    partial workaround (also in some MS tools) was that, if not explicitly
    forced to ASCII, they would detect this scenario when saving and instead
    save as UTF-16.

    Well, then with the annoyance that if one edits a file in VS or similar,
    it might be magically turned into UTF-16. Well, except in newer VS,
    which has mostly gone over to UTF-8 + BOM.

    ...


    Though, partly for these reasons, BGBCC is BOM-aware, generally
    normalizing code files internally to UTF-8 (BOM-free), and also CR+LF to
    CR-only, ... But this does mean that file-load requests need to
    distinguish between text and binary files on import.


    Ironically though, dual-endian formats with a reversible magic aren't
    very popular, possibly because people realize that even if such a
    format can be in the native endian, dealing with reversible endianness
    is a bigger pain than just picking one or the other.

    Well, with possibly ELF and COFF as the main examples of formats that
    went this way (except PE/COFF, which is pretty much always LE). Say,
    for example, the machine IDs serving both to identify the architecture
    and the endianness.

    Though, plain COFF lacks other magic numbers, and a lot of tools
    interpret a COFF or PE/COFF with an unknown machine ID as an unknown
    format.
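    ELF, for instance, marks the byte order with a fixed one-byte header
    field rather than a reversible magic, so a reader checks it once. A
    sketch, using the definitions from <elf.h> as found on Linux/glibc:

        #include <elf.h>

        /* Decide the byte order of the rest of an ELF file from the
           e_ident[EI_DATA] byte in its header. */
        const char *elf_byte_order(const unsigned char ident[EI_NIDENT])
        {
            switch (ident[EI_DATA]) {
            case ELFDATA2LSB: return "little-endian";
            case ELFDATA2MSB: return "big-endian";
            default:          return "invalid";
            }
        }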


    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more
    discussion has been expended on the topic than these merits justify
    <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>),
    little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Yup. I really wish the arguments about which order is "more natural"
    would stop since they're just people's cultural preconceptions. I
    imagine that if my first language were Arabic or Hebrew, I would find
    left-to-right big-endian core dumps much less readable than the
    familiar looking right-to-left little-endian ones.

    Top to bottom works for Japanese and Chinese. Yet I hear not
    appetite for TB byte order.


    Also, IIRC, the current "most significant digit first" ordering was
    itself partly a historical artifact:
    The number notation (along with algebraic notation) was partly derived
    from imported Arabic stuff.

    They wrote right-to-left; westerners wrote left-to-right.
    When imported, the notation kept the same relative order (so the digits
    were not flipped to match the writing order). So effectively everyone in
    the western world is using them backwards of the ordering in the original
    context in which they were developed.

    Effectively, the numbers are little endian when read right to left, or
    big endian when read left to right.

    In this case, it could be argued that little endian is more natural...

    Well, and/or that hex-dumps should have been right-to-left, so that
    the digits would have come out in the expected order for little-endian
    systems (never mind that all of the ASCII text would then be
    backwards).
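
    Say, dumping a word byte-by-byte (output assumes a little-endian
    machine):

        #include <stdio.h>

        int main(void)
        {
            unsigned int x = 0x12345678;
            const unsigned char *p = (const unsigned char *)&x;
            int i;
            /* Memory order on LE: 78 56 34 12 -- read the dump
               right-to-left and the digits come out "in order". */
            for (i = 0; i < (int)sizeof x; i++)
                printf("%02X ", p[i]);
            printf("\n");
            return 0;
        }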

    Then again, roman numerals:
    IV=4, VI=6: Decrement on Left, Increment on Right
    But, MCV=1105
    Bigger precedes smaller.
    And, MCM=1900
    With an order violation encoding a decrement.
    ...
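
    A minimal sketch of that rule as code (a hypothetical roman_to_int;
    an order violation flips the digit's sign):

        /* Values normally descend left to right and add up; a smaller
           digit before a larger one is subtracted instead. */
        static int digit(char c)
        {
            switch (c) {
            case 'I': return 1;    case 'V': return 5;
            case 'X': return 10;   case 'L': return 50;
            case 'C': return 100;  case 'D': return 500;
            case 'M': return 1000; default:  return 0;
            }
        }

        static int roman_to_int(const char *s)
        {
            int total = 0;
            for (; *s; s++) {
                int v = digit(*s);
                total += (digit(s[1]) > v) ? -v : v;
            }
            return total;
        }

    So roman_to_int("MCV") gives 1105 and roman_to_int("MCM") gives 1900,
    matching the above.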

    Well, and classical Greek numerals were also written starting at the
    highest digit:
    alpha-theta: 1..9
    iota-koppa: 10, 20, 30, ...
    rho-sampi: 100, 200, 300, ...
    ...

    Etc...


    So, the western world may have already had a preference for BE;
    then, when importing the Arabic numeral system, keeping the original
    digit order on paper would have made more sense than transposing it
    to match the writing system.


    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    IEN 137 said everything worth saying about this topic 45 years ago.

    https://www.rfc-editor.org/ien/ien137.txt


  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Oct 12 19:31:11 2025
    From Newsgroup: comp.arch


    Michael S <already5chosen@yahoo.com> posted:

    On Sun, 12 Oct 2025 13:36:51 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    Which is what is relevant for the question at hand. The intention of
    the ARM architects was to produce a CPU for their successor to the
    BBC Micro, and they certainly mentioned the prominent role of the
    6502 as inspiration in their accounts; they obviously did not try to
    create a 32-bit 6502, but at least they did not change the byte
    order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM the Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags?

    Sorry, my mistake. On the 6502, Z is not the only flag that is
    affected by non-arithmetic instructions; N is affected as well.
    Also, apart from the different flags-handling by INC/DEC, which is
    fully expected, there are differences in logical, shift, and even in
    compare instructions.

    Just more reasons either to have::
    a) a bit in the instruction that controls whether flags are modified
    OR
    b) no condition codes at all

    In Athlon and Opteron there was more Reservation Station logic for flags
    than for operands {logic not flip-flops}

    So, the two architectures are farther apart in flags handling than I
    thought.

    Convenient reference here: http://www.6502.org/users/obelisk/6502/instructions.html


  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 20:03:21 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say, store a value via char * and read it via int *) to
    be an error or not, if it is not directly observable in a limited
    number of test runs on a little-endian system? Your comment would
    suggest not.
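
    (As an aside, a minimal sketch of such a mismatch, assuming a 32-bit
    int: the program below prints 1 on little-endian but 16777216 on
    big-endian, so code that expects 1 works silently on LE and blows up
    on BE.)

        #include <stdio.h>

        int main(void)
        {
            unsigned int v = 0;
            unsigned char *p = (unsigned char *)&v;
            p[0] = 1;            /* store through the char view...       */
            printf("%u\n", v);   /* ...read via int: 1 on LE, 2^24 on BE */
            return 0;
        }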

    If no test can be devised that shows unintended behaviour on the
    little-endian system, then I consider the program as delivered to be
    working.

    That isn't what I was saying.

    Correct: That's what I am saying.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Testing, by its very nature, is incomplete. The theoretical
    possibility that a test can be devised does not help in practice.

    Maybe not, but that's not my point: If no such test can be devised,
    would you call it a bug? Why?

    As for practice: Does testing on big-endian systems help in practice?
    Not in my experience.

    And, of course, your experience is all-encompassing and the whole
    source of wisdom, at least as far as you know.

    Then again, I know that you do not care for anything like standards
    adherence or portability, as long as your own personal pet projects
    are running well. This just confirms it.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    That's circular reasoning.

    You may think so,

    Definitely.

    but the lack of big-endian ARM systems makes my
    point.

    Not really. Why did the ARM architects put this in?
    They need not have done so...
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From John Levine@johnl@taugh.com to comp.arch on Sun Oct 12 21:07:25 2025
    From Newsgroup: comp.arch

    According to Thomas Koenig <tkoenig@netcologne.de>:
    John Levine <johnl@taugh.com> schrieb:

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    I'd think that Linux on Hercules, the open-source IBM mainframe
    emulator, would do the trick. It really works; not super fast, but so
    what.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 21:07:15 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    [configurable byte order]
    Why did the ARM architects put this in?
    They need not have done so...

    It's cheap to add (at least the cheapo version, and I expect that's
    the one that ARM provided), several other architectures supported it,
    and when they added this feature, it was not clear that little-endian
    would win.

    And Linksys actually used big-endian mode in their NSLU2 NAS
    (discontinued 2008), so maybe Intel got a customer thanks to this
    feature of ARM (or maybe they would have gone with the Xscale CPU
    anyway, and used it little-endian if the big-endian mode had not
    existed).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Robert Swindells@rjs@fdy2.co.uk to comp.arch on Mon Oct 13 17:26:00 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 21:07:15 GMT, Anton Ertl wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    [configurable byte order]
    Why did the ARM architects put this in?
    They need not have done so...

    It's cheap to add (at least the cheapo version, and I expect that's the
    one that ARM provided), several other architectures supported it, and
    when they added this feature, it was not clear that little-endian would
    win.

    And Linksys actually used big-endian mode in their NSLU2 NAS
    (discontinued 2008), so maybe Intel got a customer thanks to this
    feature of ARM (or maybe they would have gone with the Xscale CPU
    anyway, and used it little-endian if the big-endian mode had not
    existed).

    The Intel IXP CPU in the NSLU2 device was designed for networking
    applications; it ran in big-endian mode by default to reduce byte
    swapping of IP buffers.

    They also had network offload coprocessors that looked at the same
    data.
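
    That is, IP header fields are big-endian on the wire; on a
    little-endian host the standard conversions must swap, while in
    big-endian mode they compile to nothing. A minimal illustration with
    the usual socket macros:

        #include <arpa/inet.h>
        #include <stdio.h>

        int main(void)
        {
            /* htonl()/ntohl() byte-swap on little-endian hosts and are
               no-ops on big-endian ones -- hence running the IXP
               big-endian avoided the swaps on every packet. */
            unsigned int wire = htonl(1500);   /* host -> network order */
            printf("%u\n", ntohl(wire));       /* back to host: 1500    */
            return 0;
        }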
