• Linus Torvalds on bad architectural features

    From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 3 08:58:32 2025
    From Newsgroup: comp.arch

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus
    Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |
    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.
    |
    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform
    |
    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
    |
    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!
    |
    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.
    |
    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.
    |
    | - make exceptions asynchronous.
    |
    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!
    |
    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.
    |
    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 05:40:22 2025
    From Newsgroup: comp.arch

    On 10/3/2025 3:58 AM, Anton Ertl wrote:
    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):


    Yeah...

    Sadly I kinda feel called out here.
    Wouldn't necessarily get Torvalds' seal of approval...


    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |

    Sorta applies to my core...
    Though the L1D$ also remembers the Phys-Addr and uses this for Write-Back.


    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.
    |

    Avoided in BJX2 Core.

    Would apply to my smaller BSR1 and B32V cores (aligned only = cheaper).


    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform
    |

    Former true of SuperH.
    Both true of BJX1.
    Latter true of BJX2 XG1/XG2.

    Not true of XG3, which went over to superscalar.

    WEX Bundling may have been a mistake in retrospect...



    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.
    |
    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |

    Avoided.


    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!
    |

    Avoided.

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.
    |

    Errm, true of BJX2.

    Though TLB misses are nowhere near that frequent (if they were, performance would be unusable dog crap).


    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |

    Also true of my core.


    It also now pretends to have Binary128, pretty much entirely by software traps.

    But, trapping has less code footprint, so if sinl/cosl/... are used,
    they won't burn as much space in ".text" with the function calls (and if
    I can trap out of RISC-V mode, then it can use 128-bit math and a few
    other features that don't exist in RV64, so it isn't necessarily slower
    than using a function call).



    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.
    |

    But, makes hardware cheaper...


    | - make exceptions asynchronous.
    |
    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |

    Avoided:
    TLB Miss handling really needs precise exceptions in order to work
    correctly.


    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!
    |
    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.
    |
    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    Ignoring HOBs in pointers except in certain edge cases?...

    I have mixed feelings about having put FPU status in HOBs of SP
    (possible foot gun).

    Weak coherence, with special rituals needed to actually get caches
    flushed?...

    Bit-slicing certain address calculations so the relevant structures have mandatory alignment?...

    Interrupt entry is basically just a glorified branch-with-mode change,
    so the ISR handler has to go through a convoluted sequence to get to
    where it can start saving off the registers?...

    ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Fri Oct 3 13:46:45 2025
    From Newsgroup: comp.arch

    On Fri, 03 Oct 2025 08:58:32 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    |

    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    I see nothing wrong (and plenty right) about SAS as long as address
    space is big enough.
    I.e. not 47-48 bits, and preferably not even 56 bits. Considering the
    near-death of Moore's Law, 58 or 60 bits should be enough for SAS for
    the next 50 years. Maybe even for 100.

    SAS does not allow a few tricks that people play today with aliases, but
    none of these tricks is really important for performance and all are detrimental to sanity.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 11:26:11 2025
    From Newsgroup: comp.arch

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    Stefan
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:41:34 2025
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus Torvalds (extracted from <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.

    Avoided.

    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.

    Avoided.

    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform

    Avoided

    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

    Avoided

    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!

    Avoided

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.

    Avoided--and mine are even coherent so you don't even have to shoot
    them down.

    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.

    Avoided.

    | - make exceptions asynchronous.

    Avoided

    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!

    Avoided

    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.

    Avoided

    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    A clean sweep.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Oct 3 15:42:35 2025
    From Newsgroup: comp.arch


    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces. Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Fri Oct 3 16:18:47 2025
    From Newsgroup: comp.arch

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    For its entire existence, PA-RISC HP-UX supported virtual indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it can only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack, and some basic data pages, to avoid some
    initial faults for the exec() case at least.
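
    As a concrete illustration of the "immediately exec()'s" pattern
    described above (a minimal C sketch; the command run and the
    housekeeping are arbitrary examples):

      /* Classic fork-then-exec: the child only does housekeeping on its
         copied state before exec() replaces the address space, so very
         few pages ever actually get copied (whether COW or COA). */
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/wait.h>

      int main(void) {
          pid_t pid = fork();
          if (pid < 0) { perror("fork"); return 1; }
          if (pid == 0) {                       /* child */
              close(STDIN_FILENO);              /* housekeeping, e.g. close()'s */
              execlp("ls", "ls", "-l", (char *)NULL);
              _exit(127);                       /* reached only if exec failed */
          }
          waitpid(pid, NULL, 0);                /* parent */
          return 0;
      }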

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Fri Oct 3 15:44:26 2025
    From Newsgroup: comp.arch

    Kent Dickey [2025-10-03 16:18:47] wrote:
    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    The problem is not how/when you do the "copy", but the fact that once
    the data at address A has been changed, address A in the child process
    and address A in the parent don't contain the same value. This is fundamentally at odds with SASOS and with virtually-indexed&tagged
    caches. The usual workaround is to augment the virtual addresses with
    some kind of "address-space ID" (ASID).

    That in turn makes it harder to share read-write memory between
    processes (Mill's approach tried to accommodate that by augmenting only
    *some* addresses with an ASID, but not all), and requires flushing the
    cache when an ASID is re-used for another process (which can happen
    rather often because the size of the ASID is usually limited to a small
    number of bits).
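
    A toy sketch of that lookup in C (the layout and names are hypothetical,
    not any real machine's): because the ASID participates in the tag match,
    the same VA from two processes hits different entries, and re-using an
    ASID without a flush would let the new process hit the old owner's lines.

      #include <stdint.h>
      #include <stdbool.h>

      #define NSETS 256
      #define LINE   64

      struct vline { bool valid; uint16_t asid; uint64_t vtag; };
      struct vline cache[NSETS];

      /* Virtually indexed, virtually tagged: no physical address used. */
      bool vivt_hit(uint16_t asid, uint64_t va) {
          struct vline *l = &cache[(va / LINE) % NSETS];
          return l->valid && l->asid == asid
                          && l->vtag == va / (LINE * NSETS);
      }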


    Stefan
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 16:19:12 2025
    From Newsgroup: comp.arch

    On 10/3/2025 10:41 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Apparently someone wants to create a big-endian RISC-V, and someone
proposed adding support for that to Linux. This has evoked the
    following design guideline for designing bad architectures from Linus
    Torvalds (extracted from
    <https://lwn.net/ml/all/CAHk-=wji-hEV1U1x92TLsrPbpSPqDD7Cgv2YwzeL-mMbM7iaRA@mail.gmail.com/>):

    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:
    |
    | - virtually tagged caches
    |
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    |
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.

    Avoided.

    | - only do aligned memory accesses
    |
    | Bonus point for not even faulting, and just loading and storing
    |garbage instead.

    Avoided.

    | - expose your pipeline details in the ISA
    |
    | Delayed branch slots or explicit instruction grouping is a great
    |way to show that you eat crayons for breakfast before you start
    |designing your hardware platform

    Avoided

    | - extended memory windows
    |
    | It was good enough for 8-bit machines in order to address more
    |memory, and became a HIGHMEM.SYS staple in the DOS world, and then got
    |taken up by both x86 and arm in their 32-bit days as HIGHMEM support.

    Avoided

    | It has decades of history, and an architecture cannot be called
    |truly awful if it doesn't support some kind of HIGHMEM crap.
    |
    | - register windows. It's like extended memory, but for your registers!
    |
    | Please make sure to also have hardware support for filling and
    |spilling them, but make it limited enough that system software has to
    |deal with faults at critical times. Nesting exceptions is joyful!
    |
    | Bonus points if they are rotating and overflowing them silently
    |just corrupts data. Keep those users on their toes!

    Avoided

    | - in fact, require software fallbacks for pretty much anything unusual.
    |
    | TLB fills? They might only happen every ten or twenty instructions,
    |so make them fault to some software implementation to really show your
    |mad hardware skillz.

    Avoided--and mine are even coherent so you don't even have to shoot
    them down.

    | denormals or any other FP precision issues? No, no, don't waste
    |hardware on getting it right, software people *LOVE* to clean up after
    |you.
    |
    | Remember: your mom picked up your dirty laundry from your floor,
    |and software people are like the super-moms of the world.

    Avoided.

    | - make exceptions asynchronous.

    Avoided

    | That's another great way to make sure people stay on their toes.
    |Make sure machine check exceptions can happen in any context, so that
    |you are guaranteed to have a dead machine any time anything goes
    |wrong.
    |
    | But you should also take the non-maskability of NMI to heart, and
    |make sure that software cannot possibly write code that is truly
    |atomic. Because the NM is NMI is what makes it great!

    Avoided

    | Floating point! Make sure that the special case you don't deal with
    |in hardware are also delayed so that the software people have extra
    |joy in trying to figure out just WTF happened. See the previous entry:
    |they live for that stuff.

    Avoided

    |I'm sure I've forgotten many other points. And I'm sure that hardware
    |people will figure it out!


    A clean sweep.


    The alternative position might be:
    All jank is acceptable so long as it doesn't significantly impede
    performance or negatively impact userland.

    Or, maybe, actively embracing the "full jank route".

    Possibly Torvalds wouldn't exactly approve though...


    Well, except for aligned-only and big-endian, where there are better
    reasons not to go that way. Better IMO to just leave everything LE and
    then use byte-swap instructions for the rare case one needs to access a big-endian variable.

    Well, and then be annoyed that C lacks any standard way to specify the
    endianness of variables or pointers; and the need to have compiler
    builtins which map to htonl/ntohl/htons/ntohs/... (with the usual
    annoyance that one also needs a generic function fallback in the
    background for the case where someone wants to take the function pointer
    of one of these functions; sorta like with memcpy and similar).
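
    A hedged sketch of that arrangement (GCC/Clang builtins; the helper
    names are made up): an inline path the compiler folds to a byte-swap
    instruction, plus an out-of-line definition so taking a function
    pointer still works, much as with memcpy.

      #include <stdint.h>

      static inline uint32_t my_ntohl_inline(uint32_t v) {
      #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
          return __builtin_bswap32(v);   /* single BSWAP/REV instruction */
      #else
          return v;                      /* big-endian host: identity */
      #endif
      }

      /* Out-of-line fallback so &my_ntohl is a real function pointer. */
      uint32_t my_ntohl(uint32_t v) { return my_ntohl_inline(v); }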



    If I were to try to go in a "jank reducing" direction, probably:
    Use XG3 as a design base;
    Comparably cleaner and more orthogonal than XG1 and XG2.
    Eliminate Modal stuff;
    Maybe drop the RISC-V conjoined-twin thing;
    Hardware page walker and fully IEEE FPU?...
    Probably also add cache coherence.
    Mandate zero or sign extended registers as the default (like x86-64);
    Put FPU status/control into its own register or similar (*1).
    ...

    Though, unclear is if a "good" core by these definitions could be done
    without a significant negative impact on FPGA resource budget.



    *1: Sticking it into the HOBs of either GP or SP is ugly, and has an unreasonable level of footgun potential. So, this is pretty high on my
    "I probably need to change this before it ends up getting stuck this way permanently" thing (in which case, would go back to SP[63:48] being
    hard-wired to 0).

    This is probably one of those "going to change once I come up with a
    better option" situations.

    Don't really want to define a new CR for this, but need a place to put
    it that:
    May be exposed to userland without creating problems;
    May be saved/restored on context switches.

    Actually, relocating it to the HOBs of TBR could almost work here:
    Already preserved on context switch;
    Not directly visible to RISC-V or XG3 via normal registers;
    TP is a shadow of TBR in TestKern, but TP is its own register here.

    In this case, might change TP from "Read Only in userland" to "Fault on attempt to modify low 48-bits in Userland".
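
    A hypothetical sketch of the packing (the helper names, and the 16-bit
    width implied by the SP[63:48] remark above, are my assumptions):

      #include <stdint.h>

      #define ADDR48_MASK ((UINT64_C(1) << 48) - 1)

      /* FPU status/control in TBR[63:48]; low 48 bits stay a pointer. */
      static inline uint64_t tbr_with_fpcr(uint64_t tbr, uint16_t fpcr) {
          return (tbr & ADDR48_MASK) | ((uint64_t)fpcr << 48);
      }
      static inline uint16_t tbr_fpcr(uint64_t tbr) {
          return (uint16_t)(tbr >> 48);
      }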


    Exposure to RISC-V land being the bigger problem, as compilers like GCC
    are not going to be aware of "various registers may have weird crap
    squirreled into the HOBs" type issues.

    Granted, Link-Registers have weird stuff in the HOBs, but generally GCC doesn't poke at the link register. But, then again, there is still the
    "glibc violently explodes if I try to use it" issue, and I can't prove
    this is not due to the wacky link registers or similar (would have to
    more carefully examine it to make sure it isn't doing something weird
    here). If it turns out that glibc messes with the link register, may
    need to figure out a way to make RV mode work with bare-pointer link registers.


    ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Oct 3 17:42:19 2025
    From Newsgroup: comp.arch

    On 10/3/2025 10:26 AM, Stefan Monnier wrote:
    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    You can... just sort of not support full "fork()"; or support it in a
    way similar to how it works on uClinux and Cygwin. Namely, you can use
    it, but trying to use it for anything more than a fork immediately
    followed by an "exec*" call or similar is probably going to break something.

    Well, or anything that depends on "fork()" isn't going to work; and the preferable way to spawn new process instances is something along the
    lines of a "CreateProcessEx()" style mechanism.




    As can be noted, I had designed my ABIs with the assumption of a single address space.

    Generally, it ended up as 48-bit since, even within the limits of an FPGA
    with only 128MB of actual RAM or so, a 32-bit VAS can get a bit cramped (32 bits is only really enough for a single program in an address space, if that).


    My "break glass" feature for 48-bits being insufficient for a single
    address space was expanding the VAS to 96 bits, though even this was a
    bit wonky:
    Low 32-bits: Real address bits;
    Next 24 bits: Just sorta mash all the HOBs together and hope it doesn't
    break.

    Where, say, extending the L1 cache tags by 8 bits is a lot cheaper than extending them by 48 bits, and offers a sufficiently low probability of aliasing.


    So, in the 96-bit mode:
    0000_00000000-0000_00000000..0000_00000000-7FFF_FFFFFFFF:
    Preserved exactly if no higher addresses used.
    Anything else: YMMV.

    There is a non-zero risk of random 4GB regions aliasing based on the
    whims of the XOR, as actually storing full 96-bit addresses is steep.
    The page-tables and TLB could support full-width 96-bit addresses, so
    the main problem area would be trying to use two addresses at the same
    time where they would map to the same location in the L1 cache.

    However, if one assumes a scenario where each program is confined to a
    slice of the bigger 96-bit space, then the XOR's all even out and the
    address space is consistent (the risk mostly appearing when using
    addresses not within the same 48-bit "quadrant").
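
    The folding itself isn't spelled out above, but one plausible reading
    (a sketch; the byte-wise XOR is my assumption) reduces the 48 high bits
    to the 8 extra tag bits:

      #include <stdint.h>

      /* Fold VA[95:48] down to 8 bits for the extended L1 tag. Distinct
         48-bit quadrants usually differ in the folded tag, leaving the
         small aliasing probability described above. */
      static inline uint8_t fold_hobs(uint64_t hi48) {
          uint8_t tag = 0;
          for (int i = 0; i < 48; i += 8)
              tag ^= (uint8_t)(hi48 >> i);   /* XOR the six bytes */
          return tag;
      }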


    Theoretically, the OS's ASLR could keep track of this and not assign
    address ranges that would alias with previously used address ranges (via
    a lookup table).

    Kinda similar crap to the "PE loader may not load a PE to an address
    that crosses a 4GB boundary" rule, because it adds cost to have
    direct branches and PC increment deal with more than 4GB.
    Well, sorta:
    PC increment still has a 4GB window;
    Branches are either 16MB window (via branch predictor);
    Or, +/- 8GB, via normal address calc.
    Branch predictor detecting carry-out and not handling the branch.
    Was 4GB originally, but the above trick allowed being cheaper here.
    However, crossing a 16MB barrier has a performance penalty.
    Statistically low probability of ".text" crossing such a barrier.


    Arguably, all still kinda crap though...


    For now, 48-bits is plenty for my uses.

    I considered possible options for 64-bit VAS support (within the 96-bit
    mode), but annoyingly, if done in an affordable way, it would likely not
    allow program code outside the low 48 bits, or arrays crossing a 48-bit boundary (so, still slightly jank).



    Though, IMHO, still better than what MIPS did, IIRC:
    PC1[63:28] = PC0[63:28]
    PC1[27: 2] = JAL_Addr[25:0]
    PC1[ 1: 0] = 0

    Or, say, you have a 256MB barrier that may not be crossed, and the
    loader would need to rebase within said 256 MB.

    Information is inconsistent for conditional branches, where some
    information implies it is simply adding the displacement (scaled by 4),
    and other info implies:
    Copy high bits unchanged;
    Add low-order bits;
    Address may wrap if it crosses some ill-defined address barrier.

    They seemingly missed an opportunity to go cheaper for Bcc here, say:
    PC1[63:20] = PC0[63:20]
    PC1[19:14] = PC0[19:14] + SExt(Bcc_Addr[15:12])
    PC1[13: 2] = Bcc_Addr[11:0]
    PC1[ 1: 0] = 0
    Then, say, one only needs to do a 6-bit addition for the conditional
    branch instruction.
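
    Transcribing both calculations into C (a direct rendering of the bit
    equations above; function names are just for illustration):

      #include <stdint.h>

      /* MIPS-style JAL: PC1[63:28]=PC0[63:28], PC1[27:2]=index, hence
         the 256MB region a jump may not cross. */
      uint64_t jal_target(uint64_t pc0, uint32_t index26) {
          return (pc0 & ~UINT64_C(0x0FFFFFFF)) | ((uint64_t)index26 << 2);
      }

      /* The cheaper Bcc scheme: only PC[19:14] passes through an adder. */
      uint64_t bcc_target(uint64_t pc0, uint16_t disp16) {
          uint64_t hi  = pc0 & ~UINT64_C(0xFFFFF);              /* PC1[63:20] */
          int64_t  s4  = (int64_t)(int16_t)disp16 >> 12;        /* SExt(disp[15:12]) */
          uint64_t mid = (((pc0 >> 14) + (uint64_t)s4) & 0x3F) << 14; /* 6-bit add */
          uint64_t lo  = (uint64_t)(disp16 & 0x0FFF) << 2;      /* PC1[13:2] */
          return hi | mid | lo;                                 /* PC1[1:0] = 0 */
      }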

    Trying to rebase a program at load time being "there be dragons here" territory.


    ...




    Stefan

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Sat Oct 4 04:36:28 2025
    From Newsgroup: comp.arch

    In article <jwvo6qoui1m.fsf-monnier+comp.arch@gnu.org>,
    Stefan Monnier <monnier@iro.umontreal.ca> wrote:
    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.


    Stefan

    Copy-on-Access gives you 100% compatibility with all fork() semantics.

    You can define SAS in a way that almost defeats virtual addresses, but
    let's assume we have 48-bit virtual address space and 16-bit ASID, for
    an effective 64-bit SAS. We'll have every process using a different ASID.
    And we'll assume the ASID affects dcache indexing so we have to handle that.

    First process is ASID=1. It forks, and the child is ASID=2. It is a completely new address space. We'll assume they cannot see each other's
    data in the dcache due to the virtual indexes being different. So
    ASID=1, VA=0x1000 maps to a different dcache index than ASID=2,
    VA=0x1000 even if they map to the same physical address. The ASID=2
    process starts (for the sake of a simple explanation) with no pages
    mapped, except it maps all the read-only instruction pages from ASID=1
    as ASID=2. (Note it doesn't matter if these are at different
    instruction and/or data cache indexes since it's always read-only). All
    data pages from the ASID=1 process are made invalid (in the page table,
    and removed from the TLB). Now ASID=1 and ASID=2 are running
    simultaneously. If the ASID=1 process touches any data page, the OS
    copies the contents of that original physical page to a new page, and
    makes that new page available to the ASID=2 process. This copy is the
    real trick: in the dumbest possible implementation, the OS flushes the
    data to DRAM, then copies it to the new physical address, and flushes
    that to DRAM. But systems with caches with virtual aliasing generally
    provide ways to handle the aliasing in a more efficient way to do this
    copying in the caches, at least in the L2 cache. Once the copy of the
    one page is done, the OS then makes the corresponding ASID=1 page
    writeable, and continues. Similarly, if the ASID=2 process touches a
    page, it gets a copy of the ASID=1 page (which ASID=1 has not touched
    yet), and then the OS gives the ASID=1 process write access to that
    page. Basically, both processes are "paging in" the ASID=1 pages.

    ASID=1 keeps all of its physical pages. ASID=2 gets a copy of all the
    physical pages from ASID=1 that it touches.

    Note that COW has to go and make all pages of the initial process read-only, which might be more work than to just make all pages invalid.
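
    Either way, the contract fork() must preserve is easy to demonstrate
    from userland (a minimal runnable C example: both processes print the
    same &x, but see different contents after the child's write):

      #include <stdio.h>
      #include <unistd.h>
      #include <sys/wait.h>

      int main(void) {
          int x = 1;
          pid_t pid = fork();
          if (pid == 0) {          /* child */
              x = 2;               /* faults; child gets its own copy */
              printf("child:  &x=%p x=%d\n", (void *)&x, x);
              _exit(0);
          }
          waitpid(pid, NULL, 0);
          printf("parent: &x=%p x=%d\n", (void *)&x, x);  /* prints x=1 */
          return 0;
      }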

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 18:36:45 2025
    From Newsgroup: comp.arch

    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As
    Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Oct 4 19:00:17 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS,

    Don't forget all the home computers. It might be debatable if they
    should be called "system", though.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sat Oct 4 12:31:49 2025
    From Newsgroup: comp.arch

    On 10/4/2025 11:36 AM, John Levine wrote:
    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    I don't think anyone would call a system that gives each process a completely new address space a single address space system. Making the ASID part of the translated address is one of many ways of implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT.

    Isn't the AS/400, or whatever it is called now, a SAS?
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 01:05:17 2025
    From Newsgroup: comp.arch

    On Sat, 4 Oct 2025 18:36:45 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    I don't think anyone would call a system that gives each process a
    completely new address space a single address space system.

    Agreed.

    Making
    the ASID part of the translated address is one of many ways of
    implementing a conventional address space per process system.

    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,


    What would you call OS/400 (nowadays, IBM i)?

    each of which provided a single full sized
    address space in which they essentially ran their real memory
    predecessors MFT and MVT. As Lynn has often told us, operating
    system bloat forced them quickly to go to MVS, an address space per
    process.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.


    IIRC, Windows CE supported SAS mode of operation just fine without such limitations.





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sat Oct 4 22:44:52 2025
    From Newsgroup: comp.arch

    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,

    What would you call OS/400 (nowadays, IBM i)?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without such
    limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Oct 4 17:57:16 2025
    From Newsgroup: comp.arch

    On 10/4/2025 5:44 PM, John Levine wrote:
    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,

    What would you call OS/400 (nowadays, IBM i)?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.


    FWIW, I suspect that the number of programs that use "fork()" without immediately calling "exec*()" is probably fairly small.

    AFAIK, programs that depend on full "fork()" semantics won't generally
    work on Cygwin either, as IIRC it is just sort of faked by copying the
    local stack frame and spawning out a new thread that terminates on the "exec*()" call.

    Apart from non-PIE ELF or similar, not much else doesn't work in an SAS. Though, ABI tweaks are needed to make things efficient (e.g., not needing
    to load in a new copy of the binaries for every new process).


    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without such
    limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.


    Not sure if 16-bit protected mode segmentation counts as SAS though.
    MS-DOS, maybe, as one could do address math on the segments.
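
    The address math in question, for real-mode MS-DOS (one line of C):

      #include <stdint.h>

      /* Every real-mode seg:off pair names a 20-bit linear address, so
         the whole machine is effectively one address space. */
      static inline uint32_t linear(uint16_t seg, uint16_t off) {
          return ((uint32_t)seg << 4) + off;   /* seg*16 + offset */
      }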


    FWIW, some of my own engineering efforts here took inspiration from
    Windows CE.

    Like, the way I am using the "Global Pointer" directory entry in the
    PE/COFF headers wasn't entirely a novel innovation on my end, ...


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 5 02:18:26 2025
    From Newsgroup: comp.arch

    On Sat, 4 Oct 2025 22:44:52 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    It appears that Michael S <already5chosen@yahoo.com> said:

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when
    the system is built.

    IIRC, Windows CE supported SAS mode of operation just fine without
    such limitations.

    For that matter, so did MS-DOS and Windows up through 3.0.


    It's not the same.
    CE supported preemptive multitasking (arguably, better than the likes of
    NT or the majority of popular Unixes, at least as long as we are talking about non-SMP) and memory protection, both protection of the kernel from user
    processes and of user processes from each other.

    I never took a look at CE support for Virtual Memory. Probably it was
    quite weak, if there was support at all. The only CE-based product I
    ever did had absolutely no need for Virtual Memory.

    However I am pretty sure that they utilized paging hardware for
    management of physical memory, removing fear of fragmentation.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lynn Wheeler@lynn@garlic.com to comp.arch on Sat Oct 4 14:17:32 2025
    From Newsgroup: comp.arch


    John Levine <johnl@taugh.com> writes:
    The last widely used single address space systems I can think of were OS/VS1 and OS/VS2 SVS, each of which provided a single full sized address space in which they essentially ran their real memory predecessors MFT and MVT. As Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    they had two kinds of bloat. original decision to add virtual memory was because of MVT storage management problems, having to specify each
    region (concurrent execution) four times larger than actually used, as a
    result a typical 1mbyte 370/165 only ran four concurrent regions,
    insufficient to keep system busy and justified. Going to 16mbyte virtual address space (VS2/SVS) allowed concurrent regions to be increased by
    factor of four (sort of like running MVT in a 16mbyte CP67 virtual
    machine ... aka CP67, precursor to VM370), with little or no paging
    ... although capped at 15 because of the 4-bit storage protect keys.

    Problem was that as systems got larger/faster needed to move past 15
    concurrent regions ... which resulted in giving each concurrently
    executing region/program, their own 16mbyte virtual address space
    (VS2/MVS). However, OS/360 & descendents were heavily pointer passing
    APIs (creating a different problem) and so they mapped a 8mbyte image of
    the MVS kernel into every 16mbyte virtual address space (leaving
    8mbytes). Then because each subsystem was moved into their separate
    16mbyte virtual address space, the 1mbyte "Common Segment Area" (CSA)
    was mapped into every virtual address space for passing arguments/data
    back and forth between applications and subsystems (leaving 7mbytes).

    Then because the space requirements for passing arguments/data back and
    forth was somewhat proportional to number of subsystems and concurrently running regions/applications, the CSA started to explode becoming the
    Common System Area (CSA) running 5-6mbytes (leaving 2-3mbytes for regions/applications) and threatening to become 8mbytes (leaving zero
    for regions/applications). At the same time the number of concurrently
    running applications space requirements was exceeding 16mbytes real
    address ... and 2nd half 70s, 3033s were retrofitted for 64mbytes real addressing by taking two unused bits in page table entry and prefixing
    them to the 12bit (4k) real page number for 14bits or 64mbyte
    (instructions were still 16mbyte, but virtual pages could be
    loaded and run "above the 16mbyte line").

    Then part of 370/xa "access registers" was retrofitted to 3033 for dual
    address space mode. Calls to subsystems, could move the caller's address
    space pointer into the secondary address space register and the
    subsystem address space pointer was moved into primary. Subsystems then
    could access the caller's (secondary) virtual address space w/o needing
    data be passed back&forth in CSA. For 370/xa, program call/return
    instructions could perform the address space primary/secondary switches
    all in hardware.

    I had also started pontificating that lot of OS/360 had heavily
    leveraged I/O system to compensate for limited real storage (and
    descendents had inherited it). In early 80s, I wrote a tome that
    relative system disk I/O throughput had declined by an order of
    magnitude (disk throughput got 3-5 times faster while systems got 40-50
    times faster (major motivation for constantly needing an increasing
    number of concurrently executing programs). Disk division executive took exception and directed the division performance organization to refute
    my claims. After a couple weeks, they came back and basically said that
    I had slightly understated the problem. They then respun the analysis
    for SHARE (user group) presentation on how to configure/manage disks for improved system throughput (16Aug1984, SHARE 63, B874).

    3033 above the "16mbyte" line hack: There were problems with parts of
    system that required virtual pages below the "16mbyte line". Introduced
    with 370 was I/O channel program IDALs that were full-word
    addresses. Somebody came up with the idea of using IDALs to write a virtual
    page (above 16mbyte) to disk and then read it back into an address
    <16mbyte. I gave them a hack using a virtual address space table that
    filled in page table entries with the >16mbyte page number and <16mbyte
    page number, and used the MVCL instruction to copy the virtual page from
    above the 16mbyte line to below the line.
    --
    virtualization experience starting Jan1968, online at home since Mar1970
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From EricP@ThatWouldBeTelling@thevillage.com to comp.arch on Sun Oct 5 13:02:29 2025
    From Newsgroup: comp.arch

    John Levine wrote:
    It appears that Michael S <already5chosen@yahoo.com> said:
    The last widely used single address space systems I can think of were
    OS/VS1 and OS/VS2 SVS,
    What would you call OS/400 (nowadays, IBM i)?

    I haven't looked at it for a while but I think you're right.
    They have POSIX compatible APIs, wonder how that works.

    For operating systems like VMS and WNT that cannot fork (duplicate a
    parent virtual space into a child), POSIX allows spawn() instead.

    Spawn is equivalent to fork()/exec() and CreateProcess() in that
    it creates a new address space, loads an exe, and starts a thread.
    Like fork() and WNT CreateProcess(), spawn() allows open file descriptor handles to be passed to the child.

    https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/spawn.h.html

    https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
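
    A minimal usage sketch (the standard posix_spawn API; the spawned
    command and the descriptor handed down are arbitrary examples):

      #include <spawn.h>
      #include <stdio.h>
      #include <sys/wait.h>
      #include <unistd.h>

      extern char **environ;

      int main(void) {
          posix_spawn_file_actions_t fa;
          posix_spawn_file_actions_init(&fa);
          /* pass a descriptor to the child, here our stdout as its stderr */
          posix_spawn_file_actions_adddup2(&fa, STDOUT_FILENO, STDERR_FILENO);

          char *argv[] = { "ls", "-l", NULL };
          pid_t pid;
          int rc = posix_spawnp(&pid, "ls", &fa, NULL, argv, environ);
          posix_spawn_file_actions_destroy(&fa);
          if (rc != 0) { fprintf(stderr, "posix_spawnp: %d\n", rc); return 1; }
          waitpid(pid, NULL, 0);    /* new address space, exe, and thread */
          return 0;
      }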



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From George Neuner@gneuner2@comcast.net to comp.arch on Mon Oct 6 06:54:10 2025
    From Newsgroup: comp.arch

    On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
    Dickey) wrote:

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
    SAS - which is that copied /pointers/ remain referencing objects in
    the original process. Under the multi-space model of Unix/Linux,
    after a fork the copied pointers should be referencing the copied
    objects in the new process.

    Lacking a way to identify and fixup pointer values, under SAS by
    simply copying data (COW or COA) you end up unintentionally /sharing/
    data.


    For its entire existence, PA-RISC HP-UX supported virtual indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it can only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack, and some basic data pages, to avoid some
    initial faults for the exec() case at least.

    fork-exec is not a problem. fork alone is.

    How did HP-UX on PA-RISC handle fork?


    Kent

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 15:49:10 2025
    From Newsgroup: comp.arch

    In article <10brpft$23go$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
    It appears that Kent Dickey <kegs@provalid.com> said:
    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. ...

    First process is ASID=1. It forks, and the child is ASID=2. It is a
    completely new address space. ...

    Sorry, bad terminology. I just mean all addresses under ASID=2 are
    invalid.

    In my example, all processes can peek inside any other process's address
    space, by just forming the 64-bit virtual address. The ASID thing is
    just a convention, so I wouldn't have to type 16 digit hex numbers over and over.

    [snip]

    The last widely used single address space systems I can think of were OS/VS1
    and OS/VS2 SVS, each of which provided a single full sized address space in
    which they essentially ran their real memory predecessors MFT and MVT. As
    Lynn has often told us, operating system bloat forced them quickly to go
    to MVS, an address space per process.

    HP-UX on PA-RISC from 1986-2004 or so was effectively a SAS computer. In 32-bit CPUs, the virtual address space was 48 bits, and normal user code could form any 48-bit address, and this was used for shared libraries and shared
    code (processes running the same executable shared the same virtual address space for the executable). In 64-bit mode, it works mostly as I described. There were 32-bit Space registers which were OR'ed into the upper bits of
    the 64-bit virtual address, to give the global 64-bit system address.
    It was an OS convention to limit the Space values to the upper 16 bits or so, and it could change it to whatever it wanted.

    I suppose there could still be single address space realtime or
    embedded systems where all the programs to be run are known when the
    system is built.



    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Mon Oct 6 16:44:52 2025
    From Newsgroup: comp.arch

    In article <ne67ekdeej48s8jp7jh1ahda32qmiphm0p@4ax.com>,
    George Neuner <gneuner2@comcast.net> wrote:
    On Fri, 3 Oct 2025 16:18:47 -0000 (UTC), kegs@provalid.com (Kent
    Dickey) wrote:

    In article <1759506155-5857@newsgrouper.org>,
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Stefan Monnier <monnier@iro.umontreal.ca> posted:

    | - virtually tagged caches
    | You can't really claim to be worst-of-the-worst without virtually
    |tagged caches.
    | Tears of joy as you debug cache alias issues and of flushing caches
    |on context switches.
    That is only true if one insists on OS with Multiple Address Spaces.
    Virtually tagged caches are fine for Single Address Space (SAS) OS.

    AFAIK, the main problem with SASOS is "backward compatibility", most
    importantly with `fork`. The Mill people proposed a possible solution,
    which seemed workable, but it's far from clear to me whether it would
    work well enough if you want to port, say, Debian to such
    an architecture.

    SASOS seems like a bridge too far.


    Stefan

    Fork is not a problem with virtual tagged caches or SAS. Normal fork
    starts the child with a copy of the parent's address mapping, and uses
    "Copy on Write" (COW) to create unique pages as soon as either process
    does a write.

    Copy-On-Write (or Copy-On-Access) doesn't solve the fork problem in
    SAS - which is that copied /pointers/ remain referencing objects in
    the original process. Under the multi-space model of Unix/Linux,
    after a fork the copied pointers should be referencing the copied
    objects in the new process.

    Lacking a way to identify and fix up pointer values, under SAS by
    simply copying data (COW or COA) you end up unintentionally /sharing/
    data.


    For its entire existence, PA-RISC HP-UX supported virtually indexed
    caches in a SAS, and implemented fork using Copy On Access. As soon as
    the child process touched any page for read or write, it got a copy, so
    it could only access its own pages (not counting read-only instruction
    pages). This works fine, and it's not a performance issue. The love
    folks have for COW is overblown. Real code either immediately exec()'s
    (maybe doing some close()'s and other housekeeping first) or starts
    writing lots of pages doing what it wants to do as a new process. Note
    that since the OS knows it needs to copy pages, it can pre-copy a bunch of
    pages, such as the stack and some basic data pages, to avoid some
    initial faults, for the exec() case at least.

    fork-exec is not a problem. fork alone is.

    How did HP-UX on PA-RISC handle fork?


    Kent

    This is what I was saying: if you define SAS to only mean that each
    process is living at a unique address, and it knows its full address,
    then I don't wish to discuss that SAS. That's like running without
    virtual memory.

    If you define SAS such that all processes can see other running processes'
    addresses, and can directly read/write each other's addresses (with
    protection, obviously), then that's the SAS HP PA-RISC ran in.

    HP PA-RISC 64-bit creates a 64-bit global virtual address. Each process
    by convention lives in a smaller part of that, let's say a 48-bit space.
    Each process has 8 32-bit Space Registers (not general registers, and
    some are not writeable by the user, but 5 are writeable) which are OR'ed
    into bits [63:32] of the VA formed by loads and stores to
    form the GVA. Of GVA bits [63:32], it's an OS convention how many bits
    are effectively the ASID and how many are VA bits for the process.
    The GVA is mostly transparent to the user process--they can read the Space Registers and figure it out if they want to, but this was not usual.

    [The architecture defines Space registers as up to 64-bit, so there's a 96-bit GVA, but the hardware only implemented 32-bit Space registers with a 64-bit GVA].
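    A minimal C sketch of the address formation as described above (an
    illustration of the mechanism, not the actual PA-RISC hardware logic;
    the ASID/VA split within the upper bits is the OS convention mentioned
    in the text):

        #include <stdint.h>

        /* Form a 64-bit global virtual address (GVA) by OR'ing a 32-bit
           Space Register into bits [63:32] of the VA from a load/store. */
        uint64_t form_gva(uint32_t space_reg, uint64_t va)
        {
            return va | ((uint64_t)space_reg << 32);
        }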

    Note that at any time, user code can set Space Register 1 to 0, form
    the address 0x12345678_12345670 in a register, and try to read and write
    that address. This will generally fail due to a Protection ID scheme, but
    some Space Register values were reserved for shared libraries to share the
    code at the same GVA in all processes.

    So fork() is easy--no pointers in memory or registers are affected; the
    OS assigns a new ASID, puts that in the upper bits of the Space
    Registers for the new process, and it's off. But all HP PA-RISC CPUs have
    virtually indexed caches, where the ASID is mixed in with lower address
    bits to "hash" the cache lookup. So it needed to do COA: since the new
    ASID is different, the same VA wouldn't see the cached data of the
    old process.

    Note that the OS sees all processes at once. If it wants to read from
    one process and write to another, it can just do Load, then Store.

    Kent
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Thu Oct 9 13:57:06 2025
    From Newsgroup: comp.arch

    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl wrote:

    Apparently someone wants to create a big-endian RISC-V, and someone
    proposed adding support to that to Linux.

    I had previously seen Linus' specific response to that: support should not
    be added now, as that would be promoting fragmentation of RISC-V; but if
    it were implemented and widely used, of course it would have to
    be supported in Linux.

    John Savard
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Thu Oct 9 21:41:03 2025
    From Newsgroup: comp.arch

    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    It makes it easier to understand core dumps, as numbers are stored just as they are written.

    But more importantly, it means that binary integers are ordered the same
    way as packed decimal integers, which are ordered the same way as integers
    in character text form.
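    To see the claim concretely, a small C snippet that dumps the bytes of
    an integer in address order:

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            uint32_t x = 0x12345678;
            const unsigned char *p = (const unsigned char *)&x;
            /* big-endian memory:    12 34 56 78 (reads like the written number)
               little-endian memory: 78 56 34 12 */
            for (int i = 0; i < 4; i++)
                printf("%02x ", p[i]);
            putchar('\n');
            return 0;
        }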

    As for the _rest_ of the items, though, all of them are indeed bad things.

    But some are worse than others.

    | - only do aligned memory accesses

    Nearly all memory accesses are, or could be, aligned. Performance is
    improved if they are. As long as there's some provision to handle
    unaligned data, such as a move characters instruction, data structures can
    be dealt with for things like communications formats.
    I'm not saying it isn't bad, just that it was excusable before we had as
    many transistors available as we do now.
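    A sketch of the software fallback in question: reading a 32-bit
    little-endian value at an arbitrary address using only byte loads,
    which is what compilers typically emit on strict-alignment targets:

        #include <stdint.h>

        /* Compose an unaligned 32-bit little-endian load from four
           single-byte loads, which are always aligned. */
        uint32_t load32_unaligned(const uint8_t *p)
        {
            return (uint32_t)p[0]
                 | (uint32_t)p[1] << 8
                 | (uint32_t)p[2] << 16
                 | (uint32_t)p[3] << 24;
        }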

    | - expose your pipeline details in the ISA

    The original MIPS did this. This is bad indeed, as whatever you do in this direction won't be applicable to later iterations of the ISA as technology advances.

    Failing to support the entire IEEE 754 floating-point standard just needs
    to be documented. Expecting software to fake it being implemented is not reasonable: as long as denormals instead produce zero as the result, one
    just has an inferior floating-point format, not a computer that doesn't
    work.
    Once again, bad, but not all that terrible.

    But anything that means that programs could randomly fail because
    interrupts don't properly save or restore the entire machine state...
    *that* is catastrophically bad, and hardly compares to his other examples.

    John Savard

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Oct 9 22:10:10 2025
    From Newsgroup: comp.arch


    John Savard <quadibloc@invalid.invalid> posted:

    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    It makes it easier to understand core dumps, as numbers are stored just as they are written.

    But more importantly, it means that binary integers are ordered the same
    way as packed decimal integers, which are ordered the same way as integers in character text form.

    Not true:: packed decimal in LE is stored in the same order as binary.
    Bytes at higher addresses are more significant.

    As for the _rest_ of the items, though, all of them are indeed bad things.

    But some are worse than others.

    | - only do aligned memory accesses

    Nearly all memory access are, or could be, aligned. Performance is
    improved if they are. As long as there's some provision to handle
    unaligned data, such as a move characters instruction, data structures can be dealt with for things like communications formats.
    I'm not saying it isn't bad, just that it was excusable before we had as many transistors available as we do now.

    I am (AM) a BE guy through and through--but even I can read the writing
    on the wall. BE is dead and will remain an ever shrinking niche. Making
    My 66000 architecture LE was <indeed> painful; but ultimately the correct decision.

    | - expose your pipeline details in the ISA

    The original MIPS did this. This is bad indeed, as whatever you do in this direction won't be applicable to later iterations of the ISA as technology advances.

    We {the original RISC generation 1 architects} would have all dropped
    delayed branches if we believed everyone else would do so. But we knew
    they wouldn't, so we couldn't allow ourselves to lose 20% perf, so we
    all jumped off the same cliff like lemmings. That was in the 1-wide
    generation; by the 2-wide generation we knew it was bad architecture,
    and by the 4-wide generation we would have all been better off without it.

    I do not think any of us would do that to our projects again. I advise
    you not to either.

    Failing to support the entire IEEE 754 floating-point standard just needs
    to be documented. Expecting software to fake it being implemented is not reasonable: as long as denormals instead produce zero as the result, one just has an inferior floating-point format, not a computer that doesn't work. Once again, bad, but not all that terrible.

    No, just no. There are enough transistors today to "do the right thing"
    a) full 754-2019 support
    b) misaligned memory
    c) hardware table-walkers
    d) HyperVisor support
    e) an infinite number of interrupt tables

    But anything that means that programs could randomly fail because
    interrupts don't properly save or restore the entire machine state...
    *that* is catastrophically bad, and hardly compares to his other examples.

    We now need to provide for situations where the Guest OS fails, or
    where the Host OS fails, and the system remains up and running while
    only a few applications die off and the Guest OS or Host OS reboots
    from checkpoints.

    John Savard

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Oct 9 22:21:12 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    It makes it easier to understand core dumps, as numbers are stored just as
    they are written.

    Any good dump analyzer will happily bswap the value before converting
    it into a printable form on a little-endian system, just to make it
    readable (when dumping in other than 8-bit units, of course).

    The only benefit in modern days for big-endian is that network
    protocols are in big-endian form. Not a big issue with modern
    LE CPUs, where byteswap is a single cycle instruction.
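    A minimal sketch of that boundary conversion, using the standard POSIX
    helpers (the memcpy also sidesteps alignment restrictions):

        #include <arpa/inet.h>
        #include <stdint.h>
        #include <string.h>

        /* Read a big-endian 32-bit field from a network packet; ntohl()
           compiles to a single byte-swap instruction on LE CPUs and to
           nothing on BE CPUs. */
        uint32_t read_be32(const unsigned char *packet)
        {
            uint32_t v;
            memcpy(&v, packet, sizeof v);   /* avoids unaligned access */
            return ntohl(v);                /* network (BE) -> host order */
        }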
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Fri Oct 10 08:30:03 2025
    From Newsgroup: comp.arch

    scott@slp53.sl.home (Scott Lurndal) writes:
    The only benefit in modern days for big-endian is that network
    protocols are in big-endian form. Not a big issue with modern
    LE CPUs, where byteswap is a single cycle instruction.

    Clever architects put the byte swap in the load and store
    instructions, where the byte-swapping is just an addition to the
    handling of misaligned loads and stores, which itself is an addition
    to the handling of smaller-than-transfer-width accesses. PowerPC has
    such instructions.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Oct 10 15:02:17 2025
    From Newsgroup: comp.arch

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    scott@slp53.sl.home (Scott Lurndal) writes:
    The only benefit in modern days for big-endian is that network
    protocols are in big-endian form. Not a big issue with modern
    LE CPUs, where byteswap is a single cycle instruction.

    Clever architects put the byte swap in the load and store
    instructions, where the byte-swapping is just an addition to the
    handling of misaligned loads and stores, which itself is an addition
    to the handling of smaller-than-transfer-width accesses. PowerPC has
    such instructions.

    Even better, hardware network accelerators bypass the CPU entirely
    when working with packets.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Oct 11 07:18:16 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more discussion has been expended on the topic than these merits justify <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>), little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Any big-endian architecture will suffer from less software support,
    and conversely, if software wants to include support for this
    hardware, that results in extra development effort, i.e., extra cost
    (not for all software, but for some). And Linus Torvalds is not
    willing to expend this effort, not even if the initial patches for
    supporting such an architecture come for free, because the additional
    effort would be ongoing.

    IBM has recognized the signs of the times, and added full-blown
    little-endian support to Power (including unaligned accesses), and in
    their Linux efforts retracted their support for the big-endian Power
    and threw their weight behind little-endian Power.

    Standardization has lots of merits, and deviating from an established
    standard is a step one should not take lightly.

    But more importantly, it means that binary integers are ordered the same
    way as packed decimal integers, which are ordered the same way as integers
    in character text form.

    Says who? In a course we were a group of five who had to write some
    program dealing with BCD numbers in 80286 assembly language. We
    divided the work up, with each one writing some routines. Eventually,
    on integration testing, we found that half of the group had
    interpreted the numbers to be represented in little-endian order
    (because the CPU was little-endian), and the other half had
    interpreted them to be represented in big-endian order (because that
    results in more readable memory dumps); and none of us thought that
    any of the others would implement the other byte order. So no, the
    byte order of BCD numbers is not obvious.
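    To make the ambiguity concrete, here are the two byte orders the group
    members assumed, for the packed-BCD number 1234 stored in two bytes
    (nibble order within a byte is yet another convention):

        #include <stdint.h>

        uint8_t bcd_be[2] = { 0x12, 0x34 };  /* most significant byte first  */
        uint8_t bcd_le[2] = { 0x34, 0x12 };  /* least significant byte first */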

    | - only do aligned memory accesses

    Nearly all memory accesses are, or could be, aligned. Performance is
    improved if they are. As long as there's some provision to handle
    unaligned data, such as a move characters instruction, data structures can
    be dealt with for things like communications formats.
    I'm not saying it isn't bad, just that it was excusable before we had as
    many transistors available as we do now.

    Again, the merit of supporting unaligned accesses in this day and age
    is that more software will run on your hardware, and the demerit of
    not doing it is that extra software effort is required for some
    software to support it, as you outline.

    Failing to support the entire IEEE 754 floating-point standard just needs
    to be documented. Expecting software to fake it being implemented is not
    reasonable: as long as denormals instead produce zero as the result, one
    just has an inferior floating-point format, not a computer that doesn't
    work.

    Software that expects a-b == 0.0 to give the same result as a==b (as
    guaranteed by IEEE 754 40 years ago) won't work. What do you mean
    by "not a computer that doesn't work" if the computer does not run
    the software with the intended results?
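    A small C illustration of the identity at stake (hypothetical values;
    whether the flush actually happens depends on the FPU mode):

        #include <stdio.h>

        int main(void)
        {
            double a = 5e-324;   /* smallest positive denormal double */
            double b = 1e-323;   /* a different denormal, b != a */
            /* IEEE 754 gradual underflow guarantees b - a == 0.0 iff
               a == b, so this prints 0.  Hardware that flushes denormal
               results to zero computes b - a == 0.0 even though a != b,
               and prints 1. */
            printf("%d\n", b - a == 0.0 && a != b);
            return 0;
        }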

    I take pride in the portability of my software, but for things that
    have been settled in the mainstream (byte order, alignment, IEEE FP,
    among other things), there must be a very good reason to support
    deviants. E.g., RWX mappings have worked on every OS since the
    beginning of mmap(), and are necessary for JITs. Trying to mmap RWX
    fails on MacOS on Apple Silicon (it works on the same MacOS version on
    Intel hardware, and it works on the same Apple Silicon under Linux, so
    this is a voluntary removal of a capability by Apple). As a result,
    the development version of Gforth did not work on MacOS on Apple
    Silicon for several years.
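    For concreteness, this is the kind of mapping a JIT needs; the single
    mmap() below works on typical Unices but fails on MacOS on Apple
    Silicon, where Apple's documented JIT path (MAP_JIT plus per-thread
    toggling via pthread_jit_write_protect_np()) must be used instead.
    A sketch, not Gforth's actual code:

        #include <sys/mman.h>
        #include <stddef.h>

        /* Allocate a read-write-execute buffer for JIT-generated code. */
        void *alloc_jit_buffer(size_t len)
        {
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            return p == MAP_FAILED ? NULL : p;
        }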

    My plan for fixing that was to just disable the JIT compiler and fall
    back to the threaded code interpreter on that OS, but Bernd Paysan
    actually decided to jump through the hoops that Apple sets up for
    people writing JIT compilers. The result is a speedup by a factor 2-3
    (times are run-times in seconds):

    sieve  bubble  matrix  fib    fft
    0.108  0.107   0.071   0.119  0.057  threaded code on Mac Mini M1 MacOS
    0.052  0.041   0.027   0.038  0.018  JIT compiler on Mac Mini M1 MacOS
    0.029  0.034   0.015   0.044  0.015  JIT compiler on Core i5-1135G7 Linux

    For comparison, I also provided numbers for laptop hardware
    contemporary with the M1.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sun Oct 12 02:37:40 2025
    From Newsgroup: comp.arch

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Garrrgghhhhhhhh, not this again.

    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more
    discussion has been expended on the topic than these merits justify
    <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>),
    little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Yup. I really wish the arguments about which order is "more natural"
    would stop since they're just people's cultural preconceptions. I
    imagine that if my first language were Arabic or Hebrew, I would find left-to-right big-endian core dumps much less readable than the
    familiar looking right-to-left little-endian ones.

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    IEN 137 said everything worth saying about this topic 45 years ago.

    https://www.rfc-editor.org/ien/ien137.txt
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 07:13:57 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    This has a reverse side: Little-endian having effectively won,
    software often does not work on big-endian systems out of the box
    any more. I suspect this is why IBM effectively chose little-endian
    for POWER, but AIX is big-endian (and will remain so for the foreseeable
    future).

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_ margin):
    The Datapoint 2200.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 09:51:38 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    This has a reverse side: Little-endian having effectively won,
    software often does not work on big-endian systems out of the box
    any more. I suspect this is why IBM effectively chose little-endian
    for POWER, but AIX is big-endian (and will remain so for the foreseeable
    future).

    If someone chooses to buy a big-endian system nowadays, they hopefully
    know about these problems. If they need a particular piece of
    software, they hopefully are able to sponsor porting it to the
    big-endian system.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_ margin):
    The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    But the actual reason why little-endian has won is that all the
    big-endian architectures either have been cancelled (HPPA, MIPSeb,
    SPARC), switched to little-endian (Power on Linux), or are retreating
    to a niche (Power on AIX, S390x).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 10:14:08 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_ margin):
    The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 13:56:25 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 10:14:08 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work
    on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_
    margin): The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    Arm. It was designed as the CPU for the successor of the 6502-based BBC Micro.

    But does the 6502 really have "byte order" in hardware? Or just "soft"
    conventions of the BBC BASIC interpreter?


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 11:38:39 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> schrieb:
    On Sun, 12 Oct 2025 10:14:08 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work
    on big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is arguably
    the most influential of all times (or at least has the highest
    ratio of influence to recognition level, but that by a _huge_
    margin): The Datapoint 2200.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    Arm.

    That does not have many architectural features from the 6502 :-)

    It was designed as CPU for successor of 6502-based BBC Micro.

    But does 6502 really have "byte order" in hardware? Or just "soft" conventions of BBC BASIC interpreter?

    Yes, the 6502 is little-endian, which you can see in its instruction
    formats and the way the pointers in the zero page were stored.
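    A sketch of the (zp),Y effective-address calculation that makes this
    visible: the low byte of the pointer sits at the lower address.

        #include <stdint.h>

        uint8_t mem[65536];   /* simulated 6502 memory */

        /* Effective address for the 6502 (zp),Y addressing mode: fetch a
           16-bit little-endian pointer from the zero page (which wraps
           within the page), then add Y. */
        uint16_t ea_indirect_y(uint8_t zp, uint8_t y)
        {
            uint16_t ptr = mem[zp] | (uint16_t)mem[(uint8_t)(zp + 1)] << 8;
            return ptr + y;
        }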
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 15:31:21 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    On Sun, 12 Oct 2025 10:14:08 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not
    work on big-endian systems, and little-endian has won, is there
    really anything wrong with the software?

    A type mismatch? I think so.

    And of course, this is all due to an architecture which is
    arguably the most influential of all times (or at least has the
    highest ratio of influence to recognition level, but that by a
    _huge_ margin): The Datapoint 2200.

    Another widely-used architecture today inherited its byte order
    from the 6502.

    Which one?

    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    Also, both processors appear to share a philosophy of design driven by
    practicality rather than by theoretical principles. They are what they
    are because that was the maximum that comfortably fit into available
    budgets of all sorts, rather than because of "closing the semantic gap"
    or, conversely, "reducing the instruction set".



    It was designed as CPU for successor of 6502-based BBC Micro.

    But does 6502 really have "byte order" in hardware? Or just "soft" conventions of BBC BASIC interpreter?

    Yes, the 6502 is little-endian,
    which you can see in its instruction formats

    That does not count. Instruction encoding is orthogonal to the question
    of byte order during execution. I have seen various combinations,
    including encodings that have no particular order, i.e., an immediate
    field scattered across the instruction word. Not that I remember which
    architecture it was.

    and the way the pointers in the zero page were stored.


    Yes, I see.
    Indirect addressing modes are clearly LE.
    In case of JMP instruction 16-bit LE pointer does not even have to be in
    zero page.




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 13:31:22 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 13:36:51 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    Which is what is relevant for the question at hand. The intention of
    the ARM architects was to produce a CPU for their successor of the BBC
    Micro, and they certainly mentioned the prominent role of the 6502 as inspiration in their accounts; they obviously did not try to create a
    32-bit 6502, but at least they did not change the byte order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags? My impression was that ARM instruction
    sets always set NZCV together, which makes OoO implementation quite a
    bit cheaper.

    Looking in Zaks' 6502 book, I find that SBC sets NVZC, whereas CMP
    only sets NZC (and lots of other instructions only set NZ). I expect
    that this difference between SBC and CMP cost a transistor or two. I
    wonder why they did that. Only setting NZ on, e.g., INC/INX/INY
    probably also cost some transistors, but allowed one to keep C in, e.g.,
    a long-addition loop.

    Yes, the 6502 is little-endian,
    which you can see in its instruction formats

    That does not count. Instruction encoding is orthogonal to the question
    of byte order during execution. I have seen various combinations,
    including encodings that have no particular order, i.e., an immediate
    field scattered across the instruction word. Not that I remember which
    architecture it was.

    In many (e.g., HPPA, RISC-V, funny constant encodings on ARM A64).
    However, on the 6502 it is significant, because the instructions are
    read byte-by-byte. They switched from the 6800's big-endian order to
    little-endian because the latter was cheaper and faster to implement,
    especially in the instructions. For the data, they could have
    accessed two-byte data backwards and become big-endian (but with the
    address pointing to the LSB, and the MSB being at address-1) without
    much difficulty. The unusual address could be hidden by the assembler
    (i.e., if you write "lda (2),y", that would be encoded as $b1 $3).

    Indirect addressing modes are clearly LE.
    In case of JMP instruction 16-bit LE pointer does not even have to be in
    zero page.

    JSR stores the return address in little-endian order and RTS loads the
    address to return to in little-endian order.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 15:10:02 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value via char * and read via int *) to
    be an error or not, if it is not directly observable in a limited
    number of test runs on a little-endian system? Your comment would
    suggest not.
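    The classic instance of such a mistake, for concreteness
    (strict-aliasing questions aside):

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            uint32_t v = 0;
            unsigned char *p = (unsigned char *)&v;
            p[0] = 0x01;      /* writer assumes "first byte is the LSB" */
            /* little-endian reads 0x00000001; big-endian reads 0x01000000,
               so tests on a little-endian machine may never expose the
               mixup */
            printf("0x%08x\n", v);
            return 0;
        }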


    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 15:48:02 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.

    If no test can be devised that shows unintended behaviour on the
    little-endian system, then I consider the program as delivered to be
    working.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    |armeb - Big-endian OABI port targeting the linksys NSLU2 and
    |similar. Interest fell after a method was determined for running
    |little-endian Linux systems on the NSLU2. Active during the sarge
    |timeframe and now abandoned.

    It would be cool for people who want to test portability to big-endian
    systems if one could actually configure, say, a Raspi 5 for big-endian operation, and have a big-endian Linux distribution running on it, but
    who is going to pay the developers for all this work?

    And given that little-endian has won, why would one want to be able to
    port to big-endian? Sure, there is a certain satisfaction in doing
    pointless work, and one could extend this to supporting
    word-addressable machines, 36-bit machines, sign-magnitude and
    ones-complement machines, and decimal-arithmetic machines. But it's
    better to spend one's time on useful features.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Oct 12 16:11:27 2025
    From Newsgroup: comp.arch


    John Levine <johnl@taugh.com> posted:

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Garrrgghhhhhhhh, not this again.

    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more
    discussion has been expended on the topic than these merits justify
    <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>),
    little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Yup. I really wish the arguments about which order is "more natural"
    would stop since they're just people's cultural preconceptions. I
    imagine that if my first language were Arabic or Hebrew, I would find left-to-right big-endian core dumps much less readable than the
    familiar looking right-to-left little-endian ones.

    Top to bottom works for Japanese and Chinese. Yet I hear no
    appetite for TB byte order.

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    IEN 137 said everything worth saying about this topic 45 years ago.

    https://www.rfc-editor.org/ien/ien137.txt

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 16:25:51 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.

    If no test can be devised that shows unintended behaviour on the little-endian system, then I consider the program as delivered to be
    working.

    That isn't what I was saying.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Testing, by its very nature, is incomplete. The theoretical
    possibility that a test can be derived does not help in practice.

    I believe you have written programs. Did you ever put in a bug
    that your existing testing framework did not catch?


    Another widely-used architecture today inherited its byte order from
    the 6502.

    Which one?

    ARM A32, and then T32 and A64.
    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    That's circular reasoning.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 19:56:32 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 15:10:02 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    If the only thing wrong with the software is that it does not
    work on big-endian systems, and little-endian has won, is there
    really anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on
    little-endian systems, you don't need a big-endian system to find
    the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.


    Another widely-used architecture today inherited its byte order
    from the 6502.

    Which one?

    ARM A32, and then T32 and A64.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).


    Once, many years ago, I encountered an ARMv7-AR processor (a TI MCU
    based on a Cortex-R4 core) that was BE-only. I am still not sure whether
    it violates the ARM standard or not.
    I never encountered an ARMv7-M that was not LE-only.
    For Arm v8-A and v9-A, the formal requirements are hard to understand.
    But in practice nobody makes cores that do not support LE or do not
    power up in LE mode. Maybe some of them can be switched into BE later.
    But why?





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 17:02:17 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> schrieb:

    But in practice nobody makes cores that do not support LE or do not
    power up in LE mode. Maybe some of them can be switched into BE later.
    But why?

    Somebody may want to port software from a big-endian system like
    zSystem, AIX or SPARC, and may not want to go to the trouble of
    making this code endian-clean.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Oct 12 20:13:57 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 13:36:51 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    Which is what is relevant for the question at hand. The intention of
    the ARM architects was to produce a CPU for their successor of the BBC
    Micro, and they certainly mentioned the prominent role of the 6502 as inspiration in their accounts; they obviously did not try to create a
    32-bit 6502, but at least they did not change the byte order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags?

    Sorry, my mistake. On the 6502, Z is not the only flag that is affected
    by non-arithmetic instructions. N is affected as well.
    Also, apart from different flags-handling by INC/DEC, which is
    fully expected, there are differences in logical, shift, and even in
    compare instructions.
    So, the two architectures are further apart in flags handling than I
    thought.

    Convenient reference here: http://www.6502.org/users/obelisk/6502/instructions.html


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 17:25:30 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say store a value to char * and read via int *) to
    be an error or not, if it is not directly observable on limited
    number of test runs on a little-endian system? Your comment would
    suggest not.

    If no test can be devised that shows unintended behaviour on the
    little-endian system, then I consider the program as delivered to be
    working.

    That isn't what I was saying.

    Correct: That's what I am saying.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Testing, by its very nature, is incomplete. The theoretical
    possibility that a test can be derived does not help in practice.

    Maybe not, but that's not my point: If no such test can be devised,
    would you call it a bug? Why?

    As for practice: Does testing on big-endian systems help in practice?
    Not in my experience. I don't remember ever finding a bug of the kind
    you indicate by testing on a big-endian system (and my primary laptop
    was big-endian until 2011), not a byte-order portability bug, much
    less something that I would consider a bug if portability to
    big-endian systems was not a goal.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    That's circular reasoning.

    You may think so, but the lack of big-endian ARM systems makes my
    point.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 17:47:41 2025
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 13:36:51 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    CZVN flags are superficially similar, although there is an important
    difference - on ARM Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags?

    Sorry, my mistake. On the 6502, Z is not the only flag that is affected
    by non-arithmetic instructions. N is affected as well.
    Also, apart from different flags-handling by INC/DEC, which is
    fully expected, there are differences in logical, shift, and even in
    compare instructions.
    So, the two architectures are further apart in flags handling than I
    thought.

    But I don't think that the ARM architects considered that to be a
    problem. The instructions were different anyway, and they did not
    want to have an 8086-style 6502->ARM assembly-language translator, did
    they?

    Anyway, for an OoO implementation the important question is if ARM
    always updates all of NZCV at the same time, or if it is selective in
    the updates.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sun Oct 12 13:04:18 2025
    From Newsgroup: comp.arch

    On 10/12/2025 11:11 AM, MitchAlsup wrote:

    John Levine <johnl@taugh.com> posted:

    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    John Savard <quadibloc@invalid.invalid> writes:
    On Fri, 03 Oct 2025 08:58:32 +0000, Anton Ertl quoted:
    |If somebody really wants to create bad hardware in this day and age,
    |please do make it big-endian, and also add the following very
    |traditional features for sh*t-for-brains hardware:

    I think that for a computer to be big-endian is a good thing.

    Garrrgghhhhhhhh, not this again.


    At this point, the main use-case of BE is file formats: some people
    occasionally still use it there, because it is somehow perceived as
    better for file interchange, despite pretty much none of the computers
    still in use having it as the native format.

    ...


    Well, and UTF-16 may use a BOM, so it can go either way, except when it doesn't use the BOM. And, UTF-8 sometimes uses a BOM though this is
    often an unwanted aberration when one mostly just wants ASCII text (and
    not all programs that read text-files as input deal gracefully with a BOM).

    Though, this wonkiness mostly came up if one tried to edit files in
    Visual Studio or Notepad. Most other text editors have the sense to not
    just randomly use UTF-16 or insert a BOM (now with the generally
    accepted default of assuming UTF-8 for non-ASCII characters, or, if not
    valid as UTF-8, assuming 1252).

    Well, statistically there is a low probability of confusing 1252
    with UTF-8, as only certain statistically-unlikely byte combinations
    would result in valid UTF-8 sequences.

    There was a problem in the past of programs sometimes unintentionally
    parsing ASCII as UTF-16, usually resulting in a mess of CJK characters,
    as apparently some MS tools would mistakenly parse ASCII as UTF-16 if it
    was an even number of bytes and "could" be parsed as UTF-16 (vs.,
    say, detecting stuff that was unlikely to be valid ASCII). Apparently, a
    partial workaround (also in some MS tools) was that, if not explicitly
    forced to ASCII, they would detect this scenario when saving and instead
    save as UTF-16.

    Well, then with the annoyance that if one edits a file in VS or similar,
    it might be magically turned into UTF-16. Well, except in newer VS,
    which has mostly gone over to UTF-8 + BOM.

    ...


    Though, partly for these reasons, BGBCC is BOM-aware, generally
    normalizing code files internally to UTF-8 (BOM-free), and also CR+LF to
    CR-only, ... But this does mean that file-load requests need to
    distinguish between text and binary files on import.


    Ironically though, dual-endian formats with a reversible magic aren't
    very popular, possibly because people realize that even if such a
    format can be in the native endian, dealing with reversible endianness
    is a bigger pain than just picking one or the other.

    Well, with possibly ELF and COFF as the main examples of formats that
    went this way (except PE/COFF, which is pretty much always LE). Say,
    for example, the machine IDs serving both to identify the architecture
    and the endianness.

    Though, plain COFF lacks other magic numbers, and a lot of tools
    interpret a COFF or PE/COFF with an unknown machine ID as an unknown
    format.
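    ELF, for instance, marks the byte order with a fixed one-byte header
    field rather than a reversible magic, so a reader checks it once. A
    sketch, using the definitions from <elf.h> as found on Linux/glibc:

        #include <elf.h>

        /* Decide the byte order of the rest of an ELF file from the
           e_ident[EI_DATA] byte in its header. */
        const char *elf_byte_order(const unsigned char ident[EI_NIDENT])
        {
            switch (ident[EI_DATA]) {
            case ELFDATA2LSB: return "little-endian";
            case ELFDATA2MSB: return "big-endian";
            default:          return "invalid";
            }
        }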


    Whatever the technical merits of different byte orders may be (and the
    names "big-endian" and "little-endian" already indicate that far more
    discussion has been expended on the topic than these merits justify
    <https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu#History_and_politics>),
    little-endian has won, and that's its major merit, and big-endian's
    major demerit.

    Yup. I really wish the arguments about which order is "more natural"
    would stop since they're just people's cultural preconceptions. I
    imagine that if my first language were Arabic or Hebrew, I would find
    left-to-right big-endian core dumps much less readable than the
    familiar looking right-to-left little-endian ones.

    Top to bottom works for Japanese and Chinese. Yet I hear not
    appetite for TB byte order.


    Also, IIRC, the current "most significant digit first" ordering was
    itself partly a historical artifact:
    The number notation (along with algebraic notation) was partly derived
    from imported Arabic stuff.

    They wrote right-to-left; westerners wrote left-to-right.
    When imported, the notation kept the same relative order (so the digits
    were not flipped to match the writing order). So effectively everyone in
    the western world is using them backwards of the ordering in the original
    context in which they were developed.

    Effectively, the numbers are little endian when read right to left, or
    big endian when read left to right.

    In this case, it could be argued that little endian is more natural...

    Well, and/or that hex-dumps should have been right-to-left, so that
    the digits would have come out in the expected order for little-endian
    systems (never mind that all of the ASCII text would then be
    backwards).
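
    Say, dumping a word byte-by-byte (output assumes a little-endian
    machine):

        #include <stdio.h>

        int main(void)
        {
            unsigned int x = 0x12345678;
            const unsigned char *p = (const unsigned char *)&x;
            int i;
            /* Memory order on LE: 78 56 34 12 -- read the dump
               right-to-left and the digits come out "in order". */
            for (i = 0; i < (int)sizeof x; i++)
                printf("%02X ", p[i]);
            printf("\n");
            return 0;
        }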

    Then again, roman numerals:
    IV=4, VI=6: Decrement on Left, Increment on Right
    But, MCV=1105
    Bigger precedes smaller.
    And, MCM=1900
    With an order violation encoding a decrement.
    ...
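
    A minimal sketch of that rule as code (a hypothetical roman_to_int;
    an order violation flips the digit's sign):

        /* Values normally descend left to right and add up; a smaller
           digit before a larger one is subtracted instead. */
        static int digit(char c)
        {
            switch (c) {
            case 'I': return 1;    case 'V': return 5;
            case 'X': return 10;   case 'L': return 50;
            case 'C': return 100;  case 'D': return 500;
            case 'M': return 1000; default:  return 0;
            }
        }

        static int roman_to_int(const char *s)
        {
            int total = 0;
            for (; *s; s++) {
                int v = digit(*s);
                total += (digit(s[1]) > v) ? -v : v;
            }
            return total;
        }

    So roman_to_int("MCV") gives 1105 and roman_to_int("MCM") gives 1900,
    matching the above.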

    Well, and classical Greek numerals were also written starting at the
    highest digit:
    alpha-theta: 1..9
    iota-koppa: 10, 20, 30, ...
    rho-sampi: 100, 200, 300, ...
    ...

    Etc...


    So, the western world may have already had a preference for BE;
    then, when importing the Arabic numeral system, keeping the original
    digit order on paper would have made more sense than transposing it
    to match the writing system.


    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    IEN 137 said everything worth saying about this topic 45 years ago.

    https://www.rfc-editor.org/ien/ien137.txt


  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Oct 12 19:31:11 2025
    From Newsgroup: comp.arch


    Michael S <already5chosen@yahoo.com> posted:

    On Sun, 12 Oct 2025 13:36:51 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 12 Oct 2025 11:38:39 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    Arm.

    That does not have many architectural features from the 6502 :-)

    It has the same byte order.

    Which is what is relevant for the question at hand. The intention of
    the ARM architects was to produce a CPU for their successor to the
    BBC Micro, and they certainly mentioned the prominent role of the
    6502 as inspiration in their accounts; they obviously did not try to
    create a 32-bit 6502, but at least they did not change the byte
    order.

    CZVN flags are superficially similar, although there is an important
    difference - on ARM the Z flag is not affected by non-arithmetic
    instructions.

    What about the other flags?

    Sorry, my mistake. On the 6502, Z is not the only flag that is
    affected by non-arithmetic instructions; N is affected as well.
    Also, apart from the different flags-handling by INC/DEC, which is
    fully expected, there are differences in logical, shift, and even in
    compare instructions.

    Just more reasons either to have::
    a) a bit in the instruction that controls whether flags are modified
    OR
    b) no condition codes at all

    In Athlon and Opteron there was more Reservation Station logic for flags
    than for operands {logic not flip-flops}

    So, the two architectures are farther apart in flags handling than I
    thought.

    Convenient reference here: http://www.6502.org/users/obelisk/6502/instructions.html


  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Oct 12 20:03:21 2025
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    If the only thing wrong with the software is that it does not work on
    big-endian systems, and little-endian has won, is there really
    anything wrong with the software?

    A type mismatch? I think so.

    If there is really something wrong with the software on little-endian
    systems, you don't need a big-endian system to find the mistake.

    Would you consider a type mistake (access through the wrong type
    of pointer, say, store a value via char * and read it via int *) to
    be an error or not, if it is not directly observable in a limited
    number of test runs on a little-endian system? Your comment would
    suggest not.
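
    (As an aside, a minimal sketch of such a mismatch, assuming a 32-bit
    int: the program below prints 1 on little-endian but 16777216 on
    big-endian, so code that expects 1 works silently on LE and blows up
    on BE.)

        #include <stdio.h>

        int main(void)
        {
            unsigned int v = 0;
            unsigned char *p = (unsigned char *)&v;
            p[0] = 1;            /* store through the char view...       */
            printf("%u\n", v);   /* ...read via int: 1 on LE, 2^24 on BE */
            return 0;
        }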

    If no test can be devised that shows unintended behaviour on the
    little-endian system, then I consider the program as delivered to be
    working.

    That isn't what I was saying.

    Correct: That's what I am saying.

    If a test can be devised that shows unintended behaviour on the
    little-endian system, then there is no need for testing on a
    big-endian system.

    Testing, by its very nature, is incomplete. The theoretical
    possibility that a test can be devised does not help in practice.

    Maybe not, but that's not my point: If no such test can be devised,
    would you call it a bug? Why?

    As for practice: Does testing on big-endian systems help in practice?
    Not in my experience.

    And, of course, your experience is all-encompassing and the whole
    source of wisdom, at least as far as you know.

    Then again, I know that you do not care for anything like standards
    adherence or portability, as long as your own personal pet projects
    are running well. This just confirms it.

    https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Endianness
    says endianness can be configurable (unless you mean something else
    by A64).

    Which has zero relevance, because everyone in their right mind
    configures their machine little-endian.
    <https://wiki.debian.org/ArmPorts> says:

    That's circular reasoning.

    You may think so,

    Definitely.

    but the lack of big-endian ARM systems makes my
    point.

    Not really. Why did the ARM architects put this in?
    They need not have done so...
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From John Levine@johnl@taugh.com to comp.arch on Sun Oct 12 21:07:25 2025
    From Newsgroup: comp.arch

    According to Thomas Koenig <tkoenig@netcologne.de>:
    John Levine <johnl@taugh.com> schrieb:

    But as you correctly said, the fight is over, little-endian has won,
    let's argue about something else.

    There is something to be said for at least having a big-endian
    system around to test programs: If people mismatch types, there
    is a chance that it will blow up on a big-endian system and work
    silently on a little-endian system.

    I'd think that Linux on Hercules, the open-source IBM mainframe
    emulator, would do the trick. It really works; not super fast, but so
    what.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Oct 12 21:07:15 2025
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    [configurable byte order]
    Why did the ARM architects put this in?
    They need not have done so...

    It's cheap to add (at least the cheapo version, and I expect that's
    the one that ARM provided), several other architectures supported it,
    and when they added this feature, it was not clear that little-endian
    would win.

    And Linksys actually used big-endian mode in their NSLU2 NAS
    (discontinued 2008), so maybe Intel got a customer thanks to this
    feature of ARM (or maybe they would have gone with the Xscale CPU
    anyway, and used it little-endian if the big-endian mode had not
    existed).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Robert Swindells@rjs@fdy2.co.uk to comp.arch on Mon Oct 13 17:26:00 2025
    From Newsgroup: comp.arch

    On Sun, 12 Oct 2025 21:07:15 GMT, Anton Ertl wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    [configurable byte order]
    Why did the ARM architects put this in?
    They need not have done so...

    It's cheap to add (at least the cheapo version, and I expect that's the
    one that ARM provided), several other architectures supported it, and
    when they added this feature, it was not clear that little-endian would
    win.

    And Linksys actually used big-endian mode in their NSLU2 NAS
    (discontinued 2008), so maybe Intel got a customer thanks to this
    feature of ARM (or maybe they would have gone with the Xscale CPU
    anyway, and used it little-endian if the big-endian mode had not
    existed).

    The Intel IXP CPU in the NSLU2 device was designed for networking
    applications; it ran in big-endian mode by default to reduce byte
    swapping of IP buffers.

    They also had network offload coprocessors that looked at the same
    data.
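
    That is, IP header fields are big-endian on the wire; on a
    little-endian host the standard conversions must swap, while in
    big-endian mode they compile to nothing. A minimal illustration with
    the usual socket macros:

        #include <arpa/inet.h>
        #include <stdio.h>

        int main(void)
        {
            /* htonl()/ntohl() byte-swap on little-endian hosts and are
               no-ops on big-endian ones -- hence running the IXP
               big-endian avoided the swaps on every packet. */
            unsigned int wire = htonl(1500);   /* host -> network order */
            printf("%u\n", ntohl(wire));       /* back to host: 1500    */
            return 0;
        }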
