• Re: what's a mainframe, was is Vax addressing sane today

    From Lynn Wheeler@21:1/5 to Terje Mathisen on Fri Sep 13 09:54:45 2024
    Terje Mathisen <terje.mathisen@tmsw.no> writes:
    Novell's System Fault Tolerant NetWare 386 (around 1990) supported two complete servers acting like one, so that any hardware component could
    fail and the system would keep running, with nothing noticed by the
    clients, even those that were in the middle of an update/write
    request.

    late 80s, get HA/6000 project, originally for NYTimes to move their
    newspaper system (ATEX) off VAXCluster to RS/6000. I then rename it
    HA/CMP when I start doing technical/scientific scale-up with national
    labs and commercial scale-up with RDBMS vendors (Oracle, Sybase,
    Informix, Ingres) that had VAXCluster support in same source base with
    Unix (I do distributed lock manager that supported VAXCluster semantics
    to ease ports). https://en.wikipedia.org/wiki/IBM_High_Availability_Cluster_Multiprocessing

    IBM had been marketing the S/88, a rebranded Stratus fault-tolerant
    machine. Then the S/88 product administrator starts taking us around to
    their customers. https://en.wikipedia.org/wiki/Stratus_Technologies
    Also has me write a section for the corporate continuous availability
    strategy document ... however, it gets pulled when both Rochester
    (AS/400, I-systems) and POK (mainframe) complain that they couldn't meet
    the requirements.

    Early Jan92, in a meeting with the Oracle CEO, AWD/Hester tells Ellison that we
    would have 16-processor clusters by mid-92 and 128-processor clusters by
    year-end 92. Within a couple weeks (end Jan92), cluster scale-up is transferred
    for announce as IBM Supercomputer (scientific/technical *ONLY*) and we
    are told we can't work on anything with more than four processors (we
    leave IBM a few months later).

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to John Levine on Fri Sep 13 10:38:40 2024
    John Levine <johnl@taugh.com> writes:
    That's fine for workloads that work that way.

    Airline reservation systems historically ran on mainframes because when they were invented
    that's all there was (original SABRE ran on two 7090s) and they are business critical so
    they need to be very reliable.

    About 30 years ago some guys at MIT realized that route and fare search, which are some of
    the most demanding things that CRS do, are easy to parallelize and don't have to be
    particularly reliable -- if your search system crashes and restarts and reruns the search
    and the result is a couple of seconds late, that's OK. So they started ITA software which
    used racks of PC servers running parallel applications written in Lisp (they were from
    MIT) and blew away the competition.

    However, that's just the search part. Actually booking the seats and selling tickets stays
    on a mainframe or an Oracle system because double booking or giving away free tickets would
    be really bad.

    There's also a rule of thumb about databases that says one system of performance 100 is
    much better than 100 systems of performance 1 because those 100 systems will spend all
    their time contending for database locks.

    after leaving IBM, was brought into the largest airline res system to look
    at ten impossible things they can't do. Got started with "ROUTES" (about
    25% of the mainframe workload); they gave me a full softcopy of OAG (all scheduled commercial flt segments in the world) ... couple weeks later
    came back with ROUTES that implemented their impossible things. The
    mainframe had tech trade-offs from the 60s, and starting from scratch I
    could make totally different tech trade-offs; it initially ran 100 times
    faster, then after implementing the impossible stuff still ran ten times
    faster (than their mainframe systems). Showed that ten RS/6000-990s could
    handle the workload for every flt and every airline in the world.

    Part of the issue was that they extensively massaged the data on a
    mainframe MVS/IMS system and then on Sunday night rebuilt the mainframe
    "TPF" (limited data management services) system from the MVS/IMS
    system. That was all eliminated.

    Fare search was harder because it started being "tuned" by some real
    time factors.

    Could move it all to RS/6000 HA/CMP. Then some very non-technical issues kicked in (like the large staff involved in the data massaging). Trivia: I
    had done a bunch of sleight of hand for the HA/CMP RDBMS distributed lock
    manager scale-up for 128-processor clusters.


    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Michael S on Fri Sep 13 21:43:06 2024
    Michael S <already5chosen@yahoo.com> schrieb:
    On Fri, 13 Sep 2024 11:20:06 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    10-15 years ago I talked to another speaker at a conference, he
    told me that he was working on high-end open source LDAP software
    using _very_ large memory DBs: Their system allowed one US cell
    phone company to keep every SIM card (~100M) on a single system,
    while a similar-size competitor had been forced to fall back on
    17-way sharding (presumably using a hash of the SIM id).

    Keeping databases in memory is definitely a thing now... see SAP HANA.

    Any architectural implications for this?

    Browsing through the SAP pages, it seems they used Intel's Optane
    persistent memory, but that is no longer manufactured (?). But
    having fast, persistent storage is definitely an advantage for
    databases.

    Large memory: Of course.

    On the ISA level... these databases run on x86, so that seems to
    be good enough.

    Anything else?


    Another thing that SAP HANA seems to use more intensely than anybody
    else is Intel TSX. TSX (at least the RTM part; I am not sure about the HLE
    part) is still present in the latest Xeon generation, but is strongly de-emphasized.

    Sounds like a market niche... Mitch, how good is your ESM for
    in-memory databases?
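
    A minimal sketch (not from the thread) of the lock-elision pattern that
    RTM enables, which is roughly what an in-memory engine leaning on TSX
    would do; the intrinsics are Intel's <immintrin.h> RTM ones, everything
    else (names, the fallback lock) is hypothetical:

        /* Try a hardware transaction first; fall back to a plain lock on abort.
         * Compile with -mrtm. */
        #include <immintrin.h>
        #include <stdatomic.h>

        static atomic_int fallback_lock = 0;        /* 0 = free, 1 = held */

        void update_row(long *row, long delta)
        {
            unsigned status = _xbegin();            /* start a transaction */
            if (status == _XBEGIN_STARTED) {
                /* Read the lock word so it joins the read set: if anyone
                 * takes the fallback path, this transaction aborts. */
                if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
                    _xabort(0xff);
                *row += delta;                      /* executes transactionally */
                _xend();                            /* commit */
                return;
            }
            /* Aborted (conflict, capacity, lock held, ...): take the lock. */
            while (atomic_exchange_explicit(&fallback_lock, 1, memory_order_acquire))
                ;
            *row += delta;
            atomic_store_explicit(&fallback_lock, 0, memory_order_release);
        }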

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Lynn Wheeler on Fri Sep 13 22:14:11 2024
    On Fri, 13 Sep 2024 09:05:33 -1000, Lynn Wheeler wrote:

    I had also started pontificating that relative disk throughput had gotten
    an order of magnitude slower (disks got 3-5 times faster while systems
    got 40-50 times faster) since 360 announce.

    Out of curiosity, did you have figures on how closely the filesystem could
    get to using all of theoretical disk I/O bandwidth?

    I ask because, in the Unix world, this was pretty terrible until
    Berkeley’s FFS (“Fast File System”) came along.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to already5chosen@yahoo.com on Fri Sep 13 21:50:15 2024
    It appears that Michael S <already5chosen@yahoo.com> said:
    There's also a rule of thumb about databases that says one system of
    performance 100 is much better than 100 systems of performance 1
    because those 100 systems will spend all their time contending for
    database locks.

    How many transactions per minute does world's biggest company need at
    peak hours?

    Ten years ago Visa could process 56,000 messages/second. It must be a
    lot more now. I think a transaction is two or four messages depending
    on the transaction type.

    Isn't this number small relative to the capabilities of
    even a 15-year-old dual-Xeon server with a few dozen spinning-rust disks?

    Uh, no, it is not.


    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Thomas Koenig on Fri Sep 13 23:12:24 2024
    On Fri, 13 Sep 2024 21:43:06 +0000, Thomas Koenig wrote:

    Michael S <already5chosen@yahoo.com> schrieb:
    On Fri, 13 Sep 2024 11:20:06 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Anything else?


    Another thing that SAP HANA seems to use more intensely than anybody
    else is Intel TSX. TSX (at least the RTM part; I am not sure about the HLE
    part) is still present in the latest Xeon generation, but is strongly
    de-emphasized.

    Sounds like a market niche... Mitch, how good is your ESM for
    in-memory databases?

    I do not think the in-memory part has anything to do with ESM
    ATOMIC behavior.

    I have no actual data, all I have is mental analyses.

    The real thing about ESM is that it allows one to code in such a way
    as to need FEWER ATOMIC events--because each event can do more work,
    one thereby needs fewer events.

    1) You can acquire several cache lines and perform a single event
    that would take a more typical ISA multiple ATOMIC instructions.
    This attacks the exponent of how rapidly things degrade under
    contention.

    2) secondly if a higher privilege thread contends with a lower thread
    the higher privileged thread wins.

    3) amongst equally privileged threads the one(s) that have made more
    forward progress succeed while those just getting started fail.

    4) There are ways for SW to get a count of the amount of interference
    so that each thread can choose more wisely and contention is reduced
    on subsequent tries. There are some ATOMIC things for which this takes
    a BigO( n**3 ) and makes it BigO( 3 ) {yes, constant time}. A more
    typical use, with new contenders coming and going randomly, goes from
    BigO( n**3 ) to between BigO( n*ln(ln(n)) ) and BigO( n*ln(n) ).

    HOWEVER:: if one uses ESM to simply implement locking behavior, only
    part 1) above applies. That is, if one uses ESM to create your standard {test&set, test&test&set, LoadLocked-StoreConditional, CAS, DCAS,
    DCADS, TCADS,...}, then getting a performing kernel depends on how
    the SW is written, not necessarily on how HW performs ESM.
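
    As a minimal illustration of point 1) above (not My 66000 code, and the
    names are made up): on a conventional single-location-atomic ISA, an
    update that spans several cache lines, such as unlinking a node from a
    doubly-linked list, has to be guarded by a separate lock word, so every
    update costs at least one extra ATOMIC event beyond the stores
    themselves. The ESM claim is that the participating lines could be
    claimed and updated as one event, with no lock word at all.

        #include <pthread.h>

        struct node { struct node *prev, *next; };

        static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

        void unlink_node(struct node *n)
        {
            pthread_mutex_lock(&list_lock);   /* the extra atomic RMW event */
            n->prev->next = n->next;          /* touches n->prev's cache line */
            n->next->prev = n->prev;          /* touches n->next's cache line */
            pthread_mutex_unlock(&list_lock);
        }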

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Sat Sep 14 01:47:03 2024
    On Fri, 13 Sep 2024 12:22:17 +0300, Michael S wrote:

    How many transactions per minute does world's biggest company need at
    peak hours?

    A few years ago, I read an article about Facebook’s setup. At the time,
    they had about a billion users who were active at least once a month. So
    that would have been over 300 postings per second, sustained.

    They were using MySQL with memcached, and I think they already had HHVM
    (their custom PHP implementation) then as well.

    Mainframes? Never heard of them.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Sat Sep 14 01:44:14 2024
    On Fri, 13 Sep 2024 11:20:06 -0000 (UTC), Thomas Koenig wrote:

    Keeping databases in memory is definitely a thing now... see SAP HANA.

    memcached might have been there before SAP.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Dallman on Sat Sep 14 01:48:26 2024
    On Fri, 13 Sep 2024 16:18 +0100 (BST), John Dallman wrote:

    So there's real demand for systems with huge capacity. Not very many of
    them, but they have large budgets.

    Did somebody say “cloud” ... ?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Levine on Sat Sep 14 09:21:46 2024
    John Levine <johnl@taugh.com> writes:
    It appears that Michael S <already5chosen@yahoo.com> said:
    There's also a rule of thumb about databases that says one system of
    performance 100 is much better than 100 systems of performance 1
    because those 100 systems will spend all their time contending for
    database locks.

    How many transactions per minute does world's biggest company need at
    peak hours?

    Ten years ago Visa could process 56,000 messages/second. It must be a
    lot more now. I think a transaction is two or four messages depending
    on the transaction type.

    Isn't this number small relative to the capabilities of
    even a 15-year-old dual-Xeon server with a few dozen spinning-rust disks?

    Uh, no, it is not.

    The way I would design this for a machine with that little IOPS is as
    an in-memory database, with transactions written to a log on RAID-1
    (on two or three of the HDDs), and a snapshot of the in-memory
    database written to disk repeatedly, with copy-on-write to get a
    consistent snapshot. The 8 cores of a 2009-vintage dual-Xeon
    machine should be easily capable of doing it, but the question is if
    the machine has enough RAM for the database. Our dual-Xeon system
    from IIRC 2007 has 24GB of RAM, not sure how big it could be
    configured; OTOH, we have a single-Xeon system from 2009 or so with
    32GB of RAM (and there were bigger Xeons in the market at the time).
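
    A minimal sketch of that design (all names hypothetical): transactions
    are applied to an in-memory table and appended to a write-ahead log
    that would live on the RAID-1 pair, while fork() gives a copy-on-write
    snapshot that the child can write out as a consistent image while the
    parent keeps processing.

        #include <stdio.h>
        #include <unistd.h>

        #define NACCOUNTS 1000000
        static long balance[NACCOUNTS];             /* the database lives in RAM */

        void apply_tx(FILE *log, int acct, long delta)
        {
            balance[acct] += delta;                 /* in-memory update */
            fprintf(log, "%d %ld\n", acct, delta);  /* append to the log ... */
            fflush(log);
            fsync(fileno(log));                     /* ... and force it to disk */
        }

        void snapshot(const char *path)
        {
            if (fork() == 0) {                      /* child sees a COW-frozen copy */
                FILE *f = fopen(path, "w");
                if (f) {
                    fwrite(balance, sizeof balance, 1, f);
                    fclose(f);
                }
                _exit(0);                           /* parent continues serving */
            }
        }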

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Anton Ertl on Sat Sep 14 09:59:58 2024
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    [in-memory database]

    but the question is if
    the machine has enough RAM for the database. Our dual-Xeon system
    from IIRC 2007 has 24GB of RAM, not sure how big it could be
    configured; OTOH, we have a single-Xeon system from 2009 or so with
    32GB of RAM (and there were bigger Xeons in the market at the time).

    The minimum requirement of SAP HANA is 64 GB of memory, but typical
    ranges are from 256GB to 1TB.

    Interestingly enough, it will run on selected systems, which only
    have Intel processors, and little-endian POWER 8 to 10. No AMD,
    no ARM, no zSystem.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Sat Sep 14 09:42:00 2024
    On Fri, 13 Sep 2024 21:50:15 -0000 (UTC), John Levine wrote:

    Ten years ago Visa could process 56,000 messages/second.

    That may sound better than it is. After all, most of those transactions would tend to be geographically localized.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kent Dickey@21:1/5 to Anton Ertl on Fri Sep 20 18:35:26 2024
    In article <2024Sep10.094353@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Brett <ggtgp@yahoo.com> writes:
    Speaking of complex things, have you looked at Swift output, as it checks all operations for overflow?

    You could add an exception type for that, saving huge numbers of correctly predicted branch instructions.

    The future of programming languages is type safe with checks, you need to get on that bandwagon early.

    MIPS got on that bandwagon early. It has, e.g., add (which traps on
    signed overflow) in addition to addu (which performs modulo
    arithmetic). It has been abandoned and replaced by RISC-V several
    years ago.

    Alpha got on that bandwagon early. It's a descendant of MIPS, but it
    renamed add into addv, and addu into add. It has been canceled around
    the year 2000.

    [ More details about architectures without trapping overflow instructions ]

    Trapping on overflow is basically useless other than as a debug aid,
    which clearly nobody values. If you take Rust's approach, and only
    detect overflow in debug builds, then you already don't care about
    performance.

    If you want to do almost anything at all other than core dump on
    overflow, you need to branch to recovery code. And although it's
    theoretically possible to recover from the trap, it's worse than any
    other approach. So it's added hardware that's HARDER for software to
    use. No surprise it's gone away.

    IA64 went down this road--trapping on speculation failures. It was a
    huge disaster--trying to recover through an exception handler mechanism
    is slow and painful, for the reasons I'll lay out for overflow
    exceptions.

    Let's look at how you might want to handle overflows when they happen:

    1) Your language supports seamlessly transitioning to BigInts on
    overflow. Then each operation that could overflow needs to call
    a special bit of code to change to BigInt and then continue the
    calculation. This code must exist, even if a trapping
    instruction doesn't need an explicit branch to it. Some
    mechanism is needed to call this code.

    2) You need to call an exception handler, and the routine with the overflow
    is ended. We need to know which exception handler to call.

    3) You want to clamp the value to a reasonable range and continue. The
    reasonable values need to be looked up somewhere.

    4) You just want to crash the program. If a debugger is attached, it can
    say where the overflow occurred.

    Trapping on overflow instructions really are only useful for #4. Let's
    look at how the other cases could be handled, with a) meaning using
    branches, and b) meaning using a trapping instruction.

    1a) (BigInt): After doing an operation which could overflow, use a
    conditional branch to jump to code to convert to BigInt, which
    then jumps back. Overhead is basically the branch-on-overflow
    instruction.

    1b) (BigInt with traps). Hardware traps to the OS, which needs to prepare
    the required structures describing the exception (all regs and
    the address), and then call the signal handler. The signal
    handler needs to look up the address of the trap with a table
    describing what to do for this particular operation which
    overflowed. Each table entry needs to describe, in detail, what
    registers are involved (the sources and the dest), and where to
    return once the BigInt has been created. This requires massive
    changes to the compiler (and possibly linker) to prepare these
    tables. The compiler must guarantee that changing the dest
    register to a pointer to BigInt works properly (otherwise,
    special code needs to be emitted for each potentially trapping
    instruction to try to recover).

    2a) (Try/Catch): After doing an operation which could overflow, use a
    conditional branch to jump to the catch block.

    2b) (Try/Catch with traps). Repeat all the OS work and call the signal
    handler. Now, it just needs a table entry describing where to
    jump to in order to enter the catch block. Almost all the complexity of
    1b), but without needing the register details.

    3a) (Clamp): After doing an operation which could overflow, use a
    conditional branch to do the MIN/MAX operations to bring it back
    within range and then jump back.

    3b) (Clamp with trap): Basically the same as 1b), but there's an alternative
    if the clamps are global (MAX_INT/MIN_INT). The exception handler
    can read the instruction which trapped, figure out the source and
    dest registers, re-do the calculation, and clamp the destination
    to MIN or MAX, and return to just after the instruction which
    trapped.

    4a) (Crash): Every operation that could overflow needs a conditional branch
    after it to branch to a crashing instruction (or a branch over
    an undefined instruction if there's no overflow).

    4b) (Crash with trap): Use operations which trap on overflow. This takes
    no new instructions and costs no performance.

    Basically, all a) cases are:

        result = op_that_might_overflow();
        if (overflow_happened) {
            /* handle the overflow */
        }
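
    (As a concrete sketch of the a) pattern, here case 3a, using the
    GCC/Clang overflow builtins; the function itself is made up:)

        #include <limits.h>

        int add_clamped(int a, int b)
        {
            int sum;
            if (__builtin_add_overflow(a, b, &sum))     /* branch-on-overflow */
                sum = (a > 0) ? INT_MAX : INT_MIN;      /* handle the overflow */
            return sum;
        }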

    Trapping-on-overflow instructions are clearly useless for languages
    which care about overflow. To save one branch instruction, an entry is
    needed to describe how to handle the overflow, which is certainly larger
    than a branch instruction. And the code to "handle the overflow" is
    needed in any case. And this assumes some sort of instant lookup--of the
    1000 overflow instructions, we need a hash table to look up the address,
    which is more overhead.

    Trapping on overflow instructions are useful as a debug aid for
    languages which don't care about overflow--but then you're optimizing
    something nearly useless. It also might be helpful if global clamping
    to MIN/MAX was useful (and I don't think it is).

    Instruction sets which make detecting overflow difficult (say, RISC-V),
    would do well to make branch-on-overflow efficient and easy. But adding trap-on-overflow instructions is a waste of effort.

    Note that using traps on data access violations which are "fixed" by
    signal handlers CAN work out. They are slow, but as long as the
    exception handler can fix the access violation and return right to the instruction which failed (without needing to know ANYTHING about that instruction in particular), this can work fine. But integer overflow
    doesn't work like that--it's generally not possible to figure out
    in the trap handler what to do without more information.

    Kent

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Kent Dickey on Fri Sep 20 22:00:28 2024
    On Fri, 20 Sep 2024 18:35:26 +0000, Kent Dickey wrote:

    In article <2024Sep10.094353@mips.complang.tuwien.ac.at>,

    Alpha got on that bandwagon early. It's a descendant of MIPS, but it renamed add into addv, and addu into add. It has been canceled around
    the year 2000.

    [ More details about architectures without trapping overflow
    instructions ]

    Trapping on overflow is basically useless other than as a debug aid,
    which clearly nobody values. If you take Rust's approach, and only
    detect overflow in debug builds, then you already don't care about performance.

    If you want to do almost anything at all other than core dump on
    overflow, you need to branch to recovery code. And although it's theoretically possible to recover from the trap, it's worse than any
    other approach. So it's added hardware that's HARDER for software to
    use. No surprise it's gone away.

    Note: Linux does not even have an "Integer Overflow" signal, while
    it does have a "FP exception" signal.

    But then IEEE 754 exception semantics make even less sense than
    Linux signals. ...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sat Sep 21 01:09:43 2024
    On Fri, 20 Sep 2024 22:00:28 +0000, MitchAlsup1 wrote:

    But then IEEE 754 exception semantics make even less sense than Linux signals. ...

    Note that what IEEE 754 calls an “exception” is just a bunch of status
    bits reporting on the current state of the computation: there is no
    implication of some transfer of control elsewhere.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Kent Dickey on Sat Sep 21 01:12:11 2024
    On Fri, 20 Sep 2024 18:35:26 -0000 (UTC), Kent Dickey wrote:

    3) You want to clamp the value to a reasonable range and continue. The
    reasonable values need to be looked up somewhere.

    This won’t work. The values outside the range are by definition non-representable, so comparisons against them are useless.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Sat Sep 21 01:52:32 2024
    On Sat, 21 Sep 2024 1:09:43 +0000, Lawrence D'Oliveiro wrote:

    On Fri, 20 Sep 2024 22:00:28 +0000, MitchAlsup1 wrote:

    But then IEEE 754 exception semantics make even less sense than Linux
    signals. ...

    Note that what IEEE 754 calls an “exception” is just a bunch of status bits reporting on the current state of the computation: there is no implication of some transfer of control elsewhere.

    Then how do you implement the alternate exception model ???
    which IS part of 754-2008 and 754-2019

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Sat Sep 21 01:51:21 2024
    On Sat, 21 Sep 2024 1:12:11 +0000, Lawrence D'Oliveiro wrote:

    On Fri, 20 Sep 2024 18:35:26 -0000 (UTC), Kent Dickey wrote:

    3) You want to clamp the value to a reasonable range and continue. The
    reasonable values need to be looked up somewhere.

    This won’t work. The values outside the range are by definition non-representable, so comparisons against them are useless.

    When a range is 0..10 both -1 and 11 are representable in
    the arithmetic of ALL computers, just not in the language
    specifying the range.

    So you are talking a language issue not a computer arithmetic
    issue.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Niklas Holsti@21:1/5 to All on Sat Sep 21 10:56:24 2024
    On 2024-09-21 4:51, MitchAlsup1 wrote:
    On Sat, 21 Sep 2024 1:12:11 +0000, Lawrence D'Oliveiro wrote:

    On Fri, 20 Sep 2024 18:35:26 -0000 (UTC), Kent Dickey wrote:

    3) You want to clamp the value to a reasonable range and continue. The reasonable values need to be looked up somewhere.

    This won’t work. The values outside the range are by definition non-
    representable, so comparisons against them are useless.

    When a range is 0..10 both -1 and 11 are representable in
    the arithmetic of ALL computers, just not in the language
    specifying the range.


    For "11" I agree; for "-1" I disagree.

    If the program was written (in whatever language) with the assumption
    that the data type in question is unsigned, then it cannot represent -1
    in the program's view of the bits. The bits that represent -1 in a
    signed two's complement view represent a large positive value in the
    unsigned view that the code uses.

    Now if the error condition that was trapped or detected was an attempt
    to produce a negative value like -1 for an unsigned data type, that
    error condition is of course representable separately; it does not have
    to be encoded by an out-of-range value in the data type itself.
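
    (A tiny illustration of that point: the same 32 bits read as -1 in the
    signed view and as a large positive value in the unsigned view.)

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            int32_t  s = -1;
            uint32_t u = (uint32_t)s;              /* same bit pattern, reinterpreted */
            printf("%d %u\n", (int)s, (unsigned)u); /* prints: -1 4294967295 */
            return 0;
        }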

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sat Sep 21 08:17:17 2024
    On Sat, 21 Sep 2024 01:52:32 +0000, MitchAlsup1 wrote:

    On Sat, 21 Sep 2024 1:09:43 +0000, Lawrence D'Oliveiro wrote:

    On Fri, 20 Sep 2024 22:00:28 +0000, MitchAlsup1 wrote:

    But then IEEE 754 exception semantics make even less sense than Linux
    signals. ...

    Note that what IEEE 754 calls an “exception” is just a bunch of status bits reporting on the current state of the computation: there is no
    implication of some transfer of control elsewhere.

    Then how do you implement the alternate exception model ??? which IS
    part of 754-2008 and 754-2019

    Section 8.3 of the 2008 spec says:

    NOTE 2 — Immediate alternate exception handling for an exception
    can be implemented by traps or, for exceptions listed in Clause 7
    other than underflow, by testing status flags after each operation
    or at the end of the associated block. Thus for exceptions listed
    in Clause 7 other than underflow, immediate exception handling can
    be implemented with the same mechanism as delayed exception
    handling, if no better implementation mechanism is available.

    So explicit testing of flag bits is permitted. Note that the special case
    for underflow mentioned is that the exception signalled is “inexact”, not “underflow”.
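
    A minimal sketch of that "test the flags afterwards" style, using C's
    <fenv.h> view of the IEEE 754 status flags (the function is made up):

        #include <fenv.h>
        #include <stdio.h>

        #pragma STDC FENV_ACCESS ON

        double checked_div(double a, double b)
        {
            feclearexcept(FE_ALL_EXCEPT);           /* start with clean flags */
            double q = a / b;
            if (fetestexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW))
                fprintf(stderr, "IEEE 754 exception flag raised\n");
            return q;                               /* flags only; no transfer of control */
        }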

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sat Sep 21 08:18:06 2024
    On Sat, 21 Sep 2024 01:51:21 +0000, MitchAlsup1 wrote:

    On Sat, 21 Sep 2024 1:12:11 +0000, Lawrence D'Oliveiro wrote:

    On Fri, 20 Sep 2024 18:35:26 -0000 (UTC), Kent Dickey wrote:

    3) You want to clamp the value to a reasonable range and continue.
    The reasonable values need to be looked up somewhere.

    This won’t work. The values outside the range are by definition non-
    representable, so comparisons against them are useless.

    When a range is 0..10 both -1 and 11 are representable in the arithmetic
    of ALL computers, just not in the language specifying the range.

    That’s an ”out of subrange” error, not an “overflow” error.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to ThatWouldBeTelling@thevillage.com on Fri Sep 27 19:52:58 2024
    On Wed, 25 Sep 2024 12:54:18 -0400, EricP
    <ThatWouldBeTelling@thevillage.com> wrote:

    For me error detection of all kinds is useful. It just happens
    to not be conveniently supported in C so no one tries it in C.

    GCC's -trapv option is not useful for a variety of reasons.
    1) it's slow, about a 50% performance hit
    2) it's always on for a compilation unit, which is not what programmers need
    as it triggers for many false positives so people turn it off.

    Things like that are why some companies have a code policy that allows
    just one function per file.

    Still a problem if you need <whatever the relevant flag does> only in
    one or a few places.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to EricP on Sat Sep 28 02:25:21 2024
    On Thu, 26 Sep 2024 13:13:02 -0400, EricP wrote:

    I've always paid for mine. My first C compiler came with the WinNT 3.5
    beta in 1992 for $99 and included the development kit,
    editor, source code debugger, tools, and documentation.
    A few hundred bucks is not going to hurt my business.

    Given that GCC offers more features and generates better code than MSVC,
    the money may not matter to your business, but the quality of the product
    will.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Dallman on Fri Oct 4 15:07:17 2024
    jgd@cix.co.uk (John Dallman) writes:
    In article <2024Oct3.085754@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    If the RISC companies failed to keep up, they only have themselves to
    blame. It seems to me that a number of RISC companies had difficulties
    with managing the larger projects that the growing die areas allowed.

    Another contributing factor was Itanium, which was quite successful at disrupting the development cycles of the RISC architectures.

    That's the question. It seems to me that many struggled even before,
    and jumped ship to IA-64 ASAP.

    Alpha suffered from DEC's mis-management, which led to DEC being taken
    over by Compaq. They killed Alpha when Itanium first began to work, and before it was clear that it was a turkey.

    Alpha suffered before. The 21264 was late, and did not keep up in the
    clock race. While they had higher clock rates than the competition up
    to the EV56 (1996), the OoO EV6 appeared with a lower clock rate than
    the in-order EV56 (while the OoO Pentium Pro had a higher clock rate
    than the in-order Pentium available at the same time), and did not
    scale as well with smaller processes as the Intel and AMD CPUs, which
    were making huge strides in those years. Intel then had the 2000MHz
    Pentium 4, and AMD the 1200MHz Athlon in 2000 (and 1400MHz by the time
    Alpha was canceled); unfortunately, release dates for EV6 variants at
    different clock rates are not documented on Wikipedia, so
    unfortunately I cannot make a table of Alpha vs. Intel and AMD clock
    rates by year.

    PA-RISC was intended by HP to be replaced by Itanium. They managed that,
    but their success was limited because Linux on x86-64 was so much more cost-effective.

    Reportedly they thought early on that they could not afford to keep
    their own line competitive, so they started the IA-64 project with
    Intel. Interestingly, they also designed the OoO PA-8000, which was
    introduced at the same time as the Pentium Pro, and they used the same microarchitecture until they introduced the PA-8900 almost 10 years
    later, which showed a more evolutionary approach than most others used
    in those years.

    IBM kept POWER development going through the Itanium period, which is a significant reason why it's still going.

    With the Power 4+ (2003) it also got competitive clock rates
    (although, judging by the PowerPC 970, I wonder what the IPC was).

    SGI went into Itanium hard and neglected MIPS development, which never recovered. It had been losing in the performance race anyway.

    The followon project "Beast" for the R10000 failed (was canceled), and
    then SGI management was happy to jump ship to Itanium, and in the
    meantime they only respun the R10000 into R12000, R14000, R16000.

    Sun kept SPARC development going, but made a different mistake, by
    spreading their development resources over too many projects. The ones
    that succeeded did so too slowly, and they fell behind.

    Intel, HP, SGI and AMD went to OoO in 1995/1996, Alpha in 1998, Power
    at the latest with Power3 in 1998, only Sun kept doing in-order stuff,
    and took until 2011 to finally get an OoO CPU out the door in the form
    of the SPARC T4 (their Rock project was also OoO, but was canceled).
    They also had relatively low clock rates before that (which changed
    with the SPARC T5). Fujitsu managed better, introducing the OoO
    SPARC64 V in 2002, and also with competitive clock rates.

    Also, Linux ate
    their web-infrastructure market rather quickly.

    Well, SPARC survived much longer than most others, despite being
    technically a lot behind.

    Power still survives, maybe only because it has a common basis with
    iSeries (or whatever it is called now). Similarly, s390x survives
    because of its software legacy.

    Linux could not have had the success it did without the large range of powerful and cheap hardware designed to run Windows.

    It was first developed on a 386, and many of the early co-developers
    also had IA-32 machines. But the 386 certainly was not designed to
    run Windows. The 386 project was finished before Windows 1.0 was
    released in November 1985, and nobody used Windows 1.0 or 2.0, so why
    would anybody design a processor for those? Windows only became
    popular with 3.0 in 1990 (after the release of the 486, which was
    therefore not designed for Windows, either). When I bought my first
    PC (with a 486) in 1993, it ran DOS (for games) and Linux (for
    everything else).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Anton Ertl on Fri Oct 4 19:44:40 2024
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    Alpha suffered before. The 21264 was late, and did not keep up in the
    clock race.

    https://www.star.bnl.gov/public/daq/HARDWARE/21264_data_sheet.pdf
    gives the clock rate as varying between 466 and 600 MHz, and
    Wikipedia gives the clock frequency of the Pentium Pro as between
    150 and 200 MHz. The Pentium II Overdrive, according to Wikipedia,
    had up to 333 MHz.

    Is this information wrong?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to D'Oliveiro on Fri Oct 4 21:53:00 2024
    In article <vdnef0$3uaeh$5@dont-email.me>, ldo@nz.invalid (Lawrence
    D'Oliveiro) wrote:

    On Thu, 3 Oct 2024 23:49 +0100 (BST), John Dallman wrote:

    Given all of IBM's missteps, it's mildly surprising they got that
    one right. Even a stopped clock is right once a day ...

    IBM doesn't often repeat a mistake. They've made all the ordinary ones,
    so nowadays they usually invent new ones.

    SGI decided to embrace the platform that was eating their market,
    and try to sell Windows NT boxes. Trouble is, those NT boxes, while
    only a fraction of the cost of an IRIX-based product, still cost
    about 3× what other NT machines were going for.

    SGI had a lengthy internal conflict about Windows NT. One group of pro-NT people left and founded NetPower, whose idea was to build really fast workstations running NT on MIPS. We had one for a while, and were
    persuading Microsoft to fix a bug from the MIPSPro code generator for the
    third time (we'd also had it on DEC MIPS/Ultrix, and SGI Irix) when the
    Pentium Pro was released, and NetPower suddenly went very quiet.

    Then there were the SGI Visual Workstations, which ran NT on x86. The
    first generation of them were quite nice, but needed a very custom HAL,
    and hence couldn't be upgraded to later versions of Windows once SGI
    abandoned them.

    The later generations were ordinary PCs - the one I had as a deskside for
    a while was made by Mitsubishi - with an Nvidia graphics card. The only
    SGI added value was their OpenGL driver, and that didn't seem to justify
    the price if you were buying them.

    By this time, SGI had a department of downsizing, whose job was to get
    rid of departments and sites. Being an American company, this department
    fought for power and budget share, and nobody inside the company seemed
    to think that this would spell doom for SGI.

    They could still have sold SPARC hardware running Linux. I can
    remember comments saying Linux ran better on that hardware than
    Sun's own SunOS/Solaris did.

    They would not have faced up to that. There was an interesting incident
    with Solaris on x86. Since the Linux and Solaris kernel interfaces are
    somewhat similar, somebody at Sun decided to try making the Solaris
    kernel capable of acting as a Linux kernel, so that they could run a
    Linux userland and applications on the same machine as the Solaris
    userland and applications.

    So they hired some Linux people, but they didn't get good ones. A year
    later, their Linux people came back to Sun with a huge set of patches
    that amounted to patching a lot of the Linux kernel into Solaris, and
    didn't do it at all well. The Solaris kernel people looked at it a bit
    and said "Hell, no! This will destabilise Solaris!" They weren't
    exaggerating.

    So that year was wasted, and the project was restarted with some of the
    Solaris people involved, to explain how their kernel worked. Quite a
    while later you could install the 32-bit Red Hat Enterprise Linux 3.0
    userland and most applications would run, but not all. This was not a
    success, and was dropped.

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to John Dallman on Fri Oct 4 21:37:25 2024
    jgd@cix.co.uk (John Dallman) writes:
    In article <vdnef0$3uaeh$5@dont-email.me>, ldo@nz.invalid (Lawrence D'Oliveiro) wrote:

    On Thu, 3 Oct 2024 23:49 +0100 (BST), John Dallman wrote:

    Then there were the SGI Visual Workstations, which ran NT on x86. The
    first generation of them were quite nice, but needed a very custom HAL,
    and hence couldn't be upgraded to later versions of Windows once SGI abandoned them.

    I left in early 2000, just after it was introduced. I was using
    a 2P Octane at the time (with the 24" Sony monitor).


    By this time, SGI had a department of downsizing, whose job was to get
    rid of departments and sites. Being an American company, this department fought for power and budget share, and nobody inside the company seemed
    to think that this would spell doom for SGI.

    They [SGI ed.] could still have sold SPARC hardware running Linux. I can
    remember comments saying Linux ran better on that hardware than
    Sun's own SunOS/Solaris did.

    They would not have faced up to that.

    Some of the SGI engineers were fond of loud noises, and one day took
    a Sun pizza box into the parking lot with some M-80s. Got
    a visit a bit later from the Secret Service, as AF-1 was next
    door at Moffett that day.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Thomas Koenig on Fri Oct 4 21:48:12 2024
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

    Alpha suffered before. The 21264 was late, and did not keep up in the
    clock race.

    https://www.star.bnl.gov/public/daq/HARDWARE/21264_data_sheet.pdf
    gives the clock rate as varying between 466 and 600 MHz, and
    Wikipedia gives the clock frequency of the Pentium Pro as between
    150 and 200 MHz. The Pentium II Overdrive, according to Wikipedia,
    had up to 333 MHz.

    Is this information wrong?

    No, but it misses context: The Pentium Pro was available in late 1995.
    The 21264 was officially available in 1998, but when we ordered a
    machine with a 500MHz 21264 (and needed it delivered before the end of
    the year for budget reasons), they delivered a machine with a 21164a,
    and then in the next year upgraded it to the 21264 (which probably
    meant replacing the motherboard, not just the CPU package).

    Intel released a 450MHz Pentium II in 1998, and the 500MHz Pentium III
    on February 28, 1999. AMD released the 600MHz Athlon on June 23,
    1999, and won the GHz race with the 1000MHz Athlon on March 6, 2000,
    with Intel's Pentium III following on March 8. Meanwhile, the Alphas
    could not keep up in MHz numbers, but I have no firm dates, only
    memories from that time.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Anton Ertl on Fri Oct 4 22:49:26 2024
    On Fri, 4 Oct 2024 7:05:34 +0000, Anton Ertl wrote:

    George Neuner <gneuner2@comcast.net> writes:
    <snipping>

    I don't agree with all of that, however. E.g., when discussing a VAX instruction similar to IA-32's REP MOVS, he considers it to be a big advantage that the operands of REP MOVS are in registers. That
    appears wrong to me; you either have to keep REP MOVS in decoding (and
    thus stop decoding any later instructions) until you know the value of
    that register coming out of the OoO engine, making REP MOVS a mostly serializing instruction. Or you have a separate OoO logic for REP
    MOVS that keeps generating loads and stores inside the OoO engine. If
    you have the latter in the VAX, it does not make much difference if
    the operand is on a register or memory. The possibility of trapping
    during REP MOVS (or the VAX variant) complicates things, though: the
    first part of the REP MOVS has to be committed, and the registers
    written to the architectural state, and then execution has to start
    again with the REP MOVS. Does not seem much harder on the VAX to me, however.

    My 66000 has a MemMove, a single 1-word instruction
    that leaves DECODE and enters into one MEMory unit, where it proceeds
    to AGEN and Read, AGEN and Write, leaving the rest of the function
    units proceeding to whatever is next.

    One thing I did different, here, none of the 3 registers is modified,
    yet I retain the ability to take exception and re-play the instruction
    from where it left off {in state never visible to the instruction
    stream except via DECODE stage.}

    - anton

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Chris M. Thomasson on Fri Oct 4 22:54:55 2024
    On Fri, 4 Oct 2024 19:36:41 +0000, Chris M. Thomasson wrote:

    On 10/3/2024 11:36 PM, Chris M. Thomasson wrote:
    On 10/3/2024 9:23 PM, George Neuner wrote:
    On Fri, 4 Oct 2024 00:48:43 -0000 (UTC), Lawrence D'Oliveiro
    <ldo@nz.invalid> wrote:

    On Thu, 03 Oct 2024 06:57:54 GMT, Anton Ertl wrote:

    If the RISC companies failed to keep up, they only have themselves to blame.

    That’s all past history, anyway. RISC very much rules today, and it
    is x86 that is struggling to keep up.

    You are, of course, aware that the complex "x86" instruction set is an
    illusion and that the hardware essentially has been a load-store RISC
    with a complex decoder on the front end since the Pentium Pro landed
    in 1995.

    Yeah. Wrt memory barriers, one is allowed to release a spinlock on "x86"
    with a simple store.

    The fact that one can release a spinlock using a simple store means that
    it's basically load-acquire release-store.

    So a load will do a load then have an implied acquire barrier.

    A store will do an implied release barrier then perform the store.

    How does the store know it needs to do this when the locking
    instruction is more than a pipeline depth away from the
    store release ?? So, Locked LD (or something) happens at
    1,000,000 cycles, and the corresponding store happens at
    10,000,000 cycles (9,000,000 locked).

    This release behavior is okay for releasing a spinlock with a simple
    store, MOV.

    It may be OK for SW but it causes all kinds of grief for HW.
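
    A minimal C11 sketch of the pattern being discussed (acquire on lock,
    release on unlock); on x86's TSO model the unlock compiles to the plain
    MOV store mentioned above, while the same source still gets whatever
    barrier a weaker memory model needs:

        #include <stdatomic.h>

        static atomic_flag lock = ATOMIC_FLAG_INIT;

        void spin_lock(void)
        {
            while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
                ;                                   /* spin until it was clear */
        }

        void spin_unlock(void)
        {
            atomic_flag_clear_explicit(&lock, memory_order_release);  /* plain store on x86 */
        }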

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Fri Oct 4 23:30:03 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    On Fri, 4 Oct 2024 7:05:34 +0000, Anton Ertl wrote:

    George Neuner <gneuner2@comcast.net> writes:
    <snipping>


    My 66000 has a MemMove instruction consisting of a 1 word instruction,
    that leaves DECODE and enters into one MEMory unit, where it proceeds
    to AGEN and Read, AGEN and Write, leaving the rest of the function
    units proceeding to whatever is next.

    One thing I did different, here, none of the 3 registers is modified,
    yet I retain the ability to take exception and re-play the instruction
    from where it left off {in state never visible to the instruction
    stream except via DECODE stage.}

    What happens if the exception handler reschedules the CPU to
    a different task before returning from the exception?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to Anton Ertl on Sat Oct 5 00:13:24 2024
    On Fri, 04 Oct 2024 07:05:34 GMT, anton@mips.complang.tuwien.ac.at
    (Anton Ertl) wrote:

    George Neuner <gneuner2@comcast.net> writes:
    You are, of course, aware that the complex "x86" instruction set is an illusion and that the hardware essentially has been a load-store RISC
    with a complex decoder on the front end since the Pentium Pro landed
    in 1995.

    Repeating nonsense does not make it any truer, and this nonsense has
    been repeated since at least the Pentium Pro (1995), maybe already
    since the 486 (1989). CISC and RISC are about the instruction set,
    not about the implementation. And even if you look at the
    implementation, it's not true: The P6 has microinstructions that are
    ~100 bits long, whereas RISCs have 32-bit and 16-bit instructions.
    The K7 has load-store microinstructions; RISCs don't have that.

    Anton, you know very well that the hardware does not execute the "x86" instruction set but only /emulates/ it. The decoder translates x86 instructions into sequences of microinstructions that perform the
    equivalent operations. The fact that some simple instructions
    translate one to one does not change this.


    In more recent CPUs, AMD tends to work with macro-instructions between
    the decoder and the reorder buffer (i.e., in the part that in the
    Pentium Pro may have been used as the justification for the RISC
    claim); these macro instructions are load-and-op and read-modify-write instructions.

    John Mashey has written about the difference between CISC and RISC
    repeatedly <https://homepages.cwi.nl/%7Erobertl/mash/RISCvsCISC>, and
    he gives good criteria for classifying instruction sets as RISC or
    CISC, and by his criteria the 80286 and IA-32 instruction sets of the
    Pentium Pro clearly both are CISCs. I have recently <2024Jan12.145502@mips.complang.tuwien.ac.at> used his criteria on instruction sets that Mashey did not classify (mostly because they
    were done after his table), and by these criteria AMD64 is clearly a
    CISC, while ARM A64 and RISC-V are clearly RISCs.

    In searching for whether he has written something specific about
    IA-32, I found <https://yarchive.net/comp/vax.html>, which is an
    earlier instance of the recent discussion of whether it would have
    been better for DEC to stick with VAX, do an OoO implementation and
    extend the architecture to 64 bits, like Intel has done: <https://yarchive.net/comp/vax.html>. He also discusses the problems
    of IA-32 there, but mainly in pointing out how much smaller they were
    than the VAX ones.

    I don't agree with all of that, however. E.g., when discussing a VAX instruction similar to IA-32's REP MOVS, he considers it to be a big advantage that the operands of REP MOVS are in registers. That
    appears wrong to me; you either have to keep REP MOVS in decoding (and
    thus stop decoding any later instructions) until you know the value of
    that register coming out of the OoO engine, making REP MOVS a mostly serializing instruction. Or you have a separate OoO logic for REP
    MOVS that keeps generating loads and stores inside the OoO engine. If
    you have the latter in the VAX, it does not make much difference if
    the operand is on a register or memory. The possibility of trapping
    during REP MOVS (or the VAX variant) complicates things, though: the
    first part of the REP MOVS has to be committed, and the registers
    written to the architectural state, and then execution has to start
    again with the REP MOVS. Does not seem much harder on the VAX to me, however.

    - anton

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to George Neuner on Sat Oct 5 08:01:23 2024
    George Neuner <gneuner2@comcast.net> writes:
    On Fri, 04 Oct 2024 07:05:34 GMT, anton@mips.complang.tuwien.ac.at
    (Anton Ertl) wrote:

    George Neuner <gneuner2@comcast.net> writes:
    You are, of course, aware that the complex "x86" instruction set is an illusion and that the hardware essentially has been a load-store RISC with a complex decoder on the front end since the Pentium Pro landed
    in 1995.

    Repeating nonsense does not make it any truer, and this nonsense has
    been repeated since at least the Pentium Pro (1995), maybe already
    since the 486 (1989). CISC and RISC are about the instruction set,
    not about the implementation. And even if you look at the
    implementation, it's not true: The P6 has microinstructions that are
    ~100 bits long, whereas RISCs have 32-bit and 16-bit instructions.
    The K7 has load-store microinstructions; RISCs don't have that.

    Anton, you know very well that the hardware does not execute the "x86" instruction set but only /emulates/ it. The decoder translates x86 instructions into sequences of microinstructions that perform the
    equivalent operations. The fact that some simple instructions
    translate one to one does not change this.

    I know that the hardware does not execute the "x86" instruction set,
    because there is no "x86" instruction set. There is the 80286
    instruction set, the IA-32 instruction set, and the AMD64 instruction
    set (and the boundary between 286 and IA-32 is squishy, but that
    between those and AMD64 is hard).

    As for the point you are trying to make, I know quite a bit about how
    the instruction execution is implemented on various IA-32 and AMD64 implementations. Whether you call it execution or emulation, IA-32
    and AMD64 are still the instruction sets of all of them, and there is
    no way to execute (or emulate) other instruction sets, and no way to
    run programs written in macro-ops, micro-ops, ROPs, or whatever they
    may be called. That's even true for the Transmeta implementations
    (although doing other instruction sets would have been possible there
    and IIRC was demonstrated once). Moreover, these
    implementation-specific things change from one implementation to the
    next, and that includes the implementations by Transmeta.

    For the 6502 or the MIPS R2000 we don't consider the instruction set
    to be emulated, either, and they have a decoder that translates the instructions into sequences of signals to various units (i.e., microinstructions), too.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Fri Oct 11 14:20:20 2024
    On 11/10/2024 03:46, Lawrence D'Oliveiro wrote:
    On Mon, 7 Oct 2024 22:26:58 +0300, Michael S wrote:

    On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    ARM was rather late to the RISC game, this might have been literally
    true.

    ARM was rather early to the RISC game. Shipped for profit since late
    1986.

    Shipped in an actual PC, the Acorn Archimedes range.

    That was the first time I ever saw a 3D shaded rendition of a flag waving,
    on a computer, generated in real time. No other machine could do it,
    unless you got up to the really expensive Unix workstation class (e.g.
    SGI, custom Evans & Sutherland hardware etc).

    The Acorn Archimedes was /way/ ahead of anything in the PC / x86 world,
    both in hardware and software. It could emulate an 80286 PC almost as
    fast as real PCs that you could buy at the time for a higher price than
    the Archimedes.

    The demo that impressed me most was drawing full-screen Mandelbrot sets
    in a second or two, compared to several minutes for a typical PC at the
    time. It meant you could do real-time zooming and flying around in the set.

    My first encounter with ARM assembly was enhancing that demo program for
    higher screen resolution and deeper zooming.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Terje Mathisen on Sat Oct 12 08:23:39 2024
    Terje Mathisen <terje.mathisen@tmsw.no> writes:
    Maybe all add/sub/etc opcodes that are immediately followed by an INTO could be fused into a single ADDO/SUBO/etc version that takes zero extra
    cycles as long as the trap part isn't hit?

    On Intel P-cores add/inc/sub etc. have been fused with a following
    JO/JNO into one uop for quite a while (I guess since Sandy Bridge
    (2011)).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to EricP on Sat Oct 12 08:45:57 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    But then, RISC processors mostly started using exceptions for housekeeping
    - SPARC for register window sliding, Alpha for byte, word and misaligned memory access

    On Alpha the assembler expands byte, word and unaligned access
    mnemonics into sequences of machine instructions; if you compile for
    BWX extensions, byte and word mnemonics get compiled into BWX
    instructions. If the machine does not have the BWX extensions and it encounters a BWX instruction, the result is an illegal instruction
    signal at least on Linux. This terminates your typical program, so
    it's not at all frequent.

    Concerning unaligned accesses, if you use a load or store that
    requires alignment, Digital OSF/1 (and the later versions with various
    names) by default produced a signal rather than fixing it up, so again
    programs are typically terminated, and the exception is not at all
    frequent. There is a system call and a tool (uac) that allows telling
    the OS to fix up unaligned accesses, but it played no role in my
    experience while I was still using Digital OSF/1 (including its
    successors).

    On Linux the default behaviour was to fix up the unaligned accesses
    and to log that in the system log. There were a few such messages in
    the log per day, so that obviously was not a frequent occurrence,
    either. I wrote a program that allowed me to change the behaviour <https://www.complang.tuwien.ac.at/anton/uace.c>, mainly because I
    wanted to get a signal when an unaligned access happens.

    As for the unaligned-access mnemonics, these were obviously barely
    used: I found that gas generates wrong code for ustq several years
    after Alpha was introduced, so obviously no software running under
    Linux has used this mnemonic.

    The solution for Alpha was to add back the byte and word instructions,
    and add misaligned access support to all memory ops.

    Alpha added BWX instructions, but not because it had used trapping to
    emulate them earlier; old or portable binaries continued to use
    instruction sequences. Alpha traps when you do, e.g., an unaligned
    ldq in all Alpha implementations I have had contact with (up to a
    800MHz 21264B).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to EricP on Sat Oct 12 09:18:23 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    Kent Dickey wrote:
    [...]
    GCC's -trapv option is not useful for a variety of reasons.
    1) it's slow, about a 50% performance hit
    2) it's always on for a compilation unit, which is not what programmers need
    as it triggers for many false positives so people turn it off.
    ...
    So why should any hardware include an instruction to trap-on-overflow?

    Because ALL the negative speed and code size consequences do not occur.

    Looking at <https://godbolt.org/z/oMhW55YsK> and selecting MIPS clang
    18.1.0, I get a 15-instruction sequence which does not include add
    (the trap-on-overflow version).

    MIPS gcc 14.2.0 generates a sequence that includes

    jal __addvsi3

    i.e., just as for x86-64. Similar for MIPS64 with these compilers.

    Interestingly, with RISC-V rv64gc clang 18.1.0, the sequence is much
    shorter than for MIPS clang 18.1.0, even though RV64GC has no specific
    way of checking overflow at all.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)