• Re: compilers and architecture, Just How Bad Was The Intel IAPX432?

    From John Levine@johnl@taugh.com to alt.folklore.computers on Sat May 30 19:24:12 2026
    From Newsgroup: alt.folklore.computers

    According to Peter Flass <Peter@Iron-Spring.com>:
    addressing - all of which are nightmares for a compiler writer in
    1981. The 8086 succeeded partly because its architecture was simple
    enough that existing compiler technology could target it competently.

    This is the general consensus. [I think I have this right, but it's at
    least approximately right] ...

    I worked on a lot of PC software in the 1980s and I agree. We had C
    compilers that generated pretty good code. We basically punted on the
    segment stuff via medium model code. The whole program shared the same
    data segment. Each module was a code segment, so there were fast short
    calls within a module and slower but less frequent far calls between
    modules. We had a few assembler routines that let us fetch and store
    data outside the default data segment. The 8086 had only a 1MB address
    space, so there were bank switching hacks ("expanded memory") to
    address data beyond that.
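
    For anyone who never had to deal with it, here is a minimal sketch of
    the real-mode address arithmetic behind the 1MB limit. The function
    name is made up for illustration; the only facts assumed are the
    standard ones, that the 8086 forms a physical address as
    segment * 16 + offset and wraps the result at 20 bits.

```python
# Sketch of 8086 real-mode addressing. The names here are hypothetical,
# chosen for illustration only.

ADDRESS_BITS = 20                        # 20 address lines -> 1 MB
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1   # 0xFFFFF


def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # The segment is shifted left 4 bits (multiplied by 16), the offset is
    # added, and the sum is truncated to 20 bits.
    return ((segment << 4) + offset) & ADDRESS_MASK


# Two different segment:offset pairs can name the same byte.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
```

    With one shared data segment, data pointers are plain 16-bit offsets.
    That is what made it possible to mostly ignore segments for data,
    at the cost of a 64KB limit per segment.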

    Both cases suggest that processor design has a social component.
    It's not enough for hardware to be capable in principle. The
    compiler ecosystem, the existing codebase, the developers who
    have to target it all matter as much as the instruction set.
    The 432 might have been a good architecture that arrived in a
    world that couldn't build software for it yet.

    That was the lesson of the IBM 801. They had some of the best compiler
    people in the world working with hardware designers who built a machine
    that only had the instructions that the compiler could use. That led
    them to a simple RISC architecture with a lot of registers and a
    compiler that used novel (at the time, now standard) graph coloring to
    allocate the registers. When they retargeted their PL.8 compiler to
    S/360 they found it still generated excellent code, I think because the
    simple instructions it used tended to run faster than the complex ones
    it didn't, and their register allocator was just as effective.
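
    For readers who haven't seen graph coloring allocation, a minimal
    sketch of the general simplify/select idea follows. It is not the PL.8
    allocator, whose details aren't given in this thread. The function
    name, the input format (a symmetric dict of interference sets) and the
    spill heuristic (highest remaining degree) are all assumptions made
    for the example.

```python
# Sketch of graph-coloring register allocation (simplify/select style).
# Illustration only; names and the spill heuristic are hypothetical.

def color_registers(interference, num_regs):
    """Assign registers so that interfering variables never share one.

    interference: dict mapping each variable to the set of variables that
    are live at the same time. It must be symmetric, and every neighbor
    must also appear as a key.
    Returns (assignment, spilled): a dict of variable -> register index
    and the set of variables left without a register.
    """
    work = {v: set(ns) for v, ns in interference.items()}
    stack = []
    spilled = set()

    # Simplify: repeatedly remove a node with fewer than num_regs remaining
    # neighbors. Such a node can always be colored later.
    while work:
        node = next((v for v in sorted(work) if len(work[v]) < num_regs), None)
        if node is None:
            # Nothing is trivially colorable: spill the busiest node.
            node = max(sorted(work), key=lambda v: len(work[v]))
            spilled.add(node)
        else:
            stack.append(node)
        for neighbor in work.pop(node):
            work[neighbor].discard(node)

    # Select: color nodes in reverse removal order, taking the lowest
    # register not already used by a colored neighbor.
    assignment = {}
    while stack:
        node = stack.pop()
        used = {assignment[n] for n in interference[node] if n in assignment}
        assignment[node] = next(r for r in range(num_regs) if r not in used)
    return assignment, spilled


graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(color_registers(graph, 3))
print(color_registers(graph, 2))
```

    With three registers the triangle a-b-c colors cleanly. With two,
    something has to spill, which is why having a lot of registers
    mattered so much to this approach.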

    Rich Alderson's point about PDP-6 byte pointers is apt too.
    A lot of the 432's "advanced" features had precedent in 1960s
    architectures. What was new was cramming all of them into one
    chip at once.

    I think you will find very few architectural features that weren't in use somewhere in the 1960s.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From thresh3@thresh3@fastmail.com (Lev) to alt.folklore.computers on Sun May 31 07:03:55 2026
    From Newsgroup: alt.folklore.computers

    John Levine wrote:
    I worked on a lot of PC software in the 1980s and I agree. We had C
    compilers that generated pretty good code. We basically punted on the
    segment stuff via medium model code.

    This is the part that interests me most. The 8086 won partly because
    you could ignore its worst features. Medium model let you pretend
    segments weren't there for most purposes. The 432 didn't have that
    escape hatch - you had to use its object system for everything.

    That was the lesson of the IBM 801. They had some of the best compiler
    people in the world working with hardware designers who built a machine
    that only had the instructions that the compiler could use.

    The 801 story is a good counterexample to the 432 in both directions.
    Same era, same idea of co-designing hardware and software, radically
    different outcomes. The 801 team simplified toward what compilers
    could actually do. The 432 team built what compilers should
    theoretically want and then waited for the compilers to catch up.

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.

    I think you will find very few architectural features that weren't
    in use somewhere in the 1960s.

    Fair point. The Burroughs B5000 had tagged architecture and
    capability-based addressing in 1961. The 432 was less innovative
    than Intel's marketing suggested. What was new was the ambition
    of cramming it all into silicon at that price point for that market.

    Lev
  • From Peter Flass@Peter@Iron-Spring.com to alt.folklore.computers on Sun May 31 07:57:23 2026
    From Newsgroup: alt.folklore.computers

    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting) language, all 360 (and up) compilers handled both decimal and string instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively.

    On the other hand, nearly all computers support a few basic instructions
    - load, store, binary arithmetic, etc. It's pretty simple for a compiler
    to target a RISC-like subset of an instruction set, and thus be easily portable. What gets lost is the efficiency of using better, native instructions, although I would expect that version 2 of the ported
    compiler would make these improvements where they make sense.

    Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal arithmetic with variable-length operands. I think even the instruction
    counter was decimal.

  • From scott@scott@slp53.sl.home (Scott Lurndal) to alt.folklore.computers on Sun May 31 17:02:37 2026
    From Newsgroup: alt.folklore.computers

    Peter Flass <Peter@Iron-Spring.com> writes:
    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting)
    language, all 360 (and up) compilers handled both decimal and string
    instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively.

    On the other hand, nearly all computers support a few basic instructions
    - load, store, binary arithmetic, etc. It's pretty simple for a compiler
    to target a RISC-like subset of an instruction set, and thus be easily
    portable. What gets lost is the efficiency of using better, native
    instructions, although I would expect that version 2 of the ported
    compiler would make these improvements where they make sense.

    Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal
    arithmetic with variable-length operands. I think even the instruction
    counter was decimal.

    Everything on the medium systems was decimal, except for disk sector
    addresses in later years (after disks supported more than 1 million
    sectors); thus the B2D and D2B instructions were added specifically
    for putting the disk address in an I/O descriptor.

    The stack pointer, the instruction counter, indirect field
    lengths, index registers - all BCD.

    Note that outside of the sign digit (C positive, D negative),
    undigits (A-F) were rarely used and caused the arithmetic
    instructions to fault, and if in addresses, caused an address
    error to be signaled. An exception was the NULL link
    value (@EEEEEE@), convenient as it allowed list entries
    at address zero.
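
    A minimal sketch of the undigit rule, for illustration. It models a
    field as a string with one character per 4-bit digit and does not
    reproduce the actual Medium Systems instruction semantics. The names
    (UndigitFault, bcd_to_int, int_to_bcd) are made up for the example and
    are only loosely in the spirit of D2B and B2D.

```python
# Sketch of BCD field handling: one character per 4-bit digit.
# All names are hypothetical; this is not the real instruction set.

DECIMAL_DIGITS = set("0123456789")
UNDIGITS = set("ABCDEF")


class UndigitFault(Exception):
    """Raised when a field that must be decimal contains an undigit (A-F)."""


def bcd_to_int(field: str) -> int:
    """Convert an unsigned decimal field to a binary integer."""
    if not field:
        raise ValueError("empty field")
    for digit in field.upper():
        if digit in UNDIGITS:
            raise UndigitFault(f"undigit {digit!r} in field {field!r}")
        if digit not in DECIMAL_DIGITS:
            raise ValueError(f"not a 4-bit digit: {digit!r}")
    return int(field)


def int_to_bcd(value: int, width: int) -> str:
    """Convert a non-negative integer to a fixed-width decimal field."""
    if value < 0 or value >= 10 ** width:
        raise OverflowError(f"{value} does not fit in {width} decimal digits")
    return str(value).zfill(width)


print(int_to_bcd(1234, 6))     # 001234
print(bcd_to_int("001234"))    # 1234
```

    In this model a link value of EEEEEE can never be mistaken for a valid
    address, since any attempt to use it as one raises the fault.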
  • From Bob Eager@throwaway0008@eager.cx to alt.folklore.computers on Sun May 31 17:29:19 2026
    From Newsgroup: alt.folklore.computers

    On Sun, 31 May 2026 07:57:23 -0700, Peter Flass wrote:

    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight. Nobody
    was emitting the fancy string instructions or decimal arithmetic unless
    they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting) language, all 360 (and up) compilers handled both decimal and string instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively.

    On the other hand, nearly all computers support a few basic instructions
    - load, store, binary arithmetic, etc. It's pretty simple for a compiler
    to target a RISC-like subset of an instruction set, and thus be easily portable. What gets lost is the efficiency of using better, native instructions, although I would expect that version 2 of the ported
    compiler would make these improvements where they make sense.

    Well, maybe not Burroughs, where the Medium Systems (3x00) used decimal arithmetic with variable-length operands. I think even the instruction counter was decimal.

    Also see the Singer System Ten.
  • From Lynn Wheeler@lynn@garlic.com to alt.folklore.computers on Sun May 31 13:52:34 2026
    From Newsgroup: alt.folklore.computers

    John Levine <johnl@taugh.com> writes:
    That was the lesson of the IBM 801. They had some of the best compiler
    people in the world working with hardware designers who built a
    machine that only had the instructions that the compiler could
    use. That led them to a simple RISC architecture with a lot of
    registers and a compiler that used novel (at the time, now standard)
    graph coloring to allocate the registers. When they retargeted their
    PL.8 compiler to S/360 they found it still generated excellent code, I
    think because the simple instructions it used tended to run faster
    than the complex ones it didn't, and their register allocator was just
    as effective.


    Early last decade, I got asked to track down decision to add virtual
    memory to all 370. Basically (os/360) MVT storage management was so bad
    that REGION sizes frequently had to be specified four times larger than
    used. As result a typical 1mbyte, 370/165 would only run four concurrent regions, throughput insufficient to keep system busy and
    justified. Going to 16mbyte virtual address space could increase number
    of concurrent regions by factor of four (capped at 15 because of 4bit
    storage protect key) with little or no paging (similar to running MVT in
    CP67 16mbyte virtual machine). I had dropped by Ludlow doing the initial implementation, using 360/67 (pending 370 engineering system with
    virtual memory). He was doing little bit of code to create virtual
    memory tables and some simple paging. Biggest issue was EXCP/SVC0 was
    now being passed channel programs with virtual addresses and channels
    required real addresses (similar to CP67 running virtual machines), and
    he borrows CP67 CCWTRANS integrated into

    One of my hobbies after joining IBM was enhanced production operating
    systems for internal datacenters (HONE, online branch office
    sales&marketing support, was one of the 1st and long time
    customers). With decision to add virtual memory to all 370s, also
    including doing VM370. In transition of CP67->VM370, lots of stuff was simplified or dropped (including SMP support). I then start adding a lot
    of stuff back into VM370R2-base, including kernel reorged needed for SMP support (but not full SMP). Then with VM370R3-base, I put lot more stuff
    back in, including SMP support, originally for HONE so they could
    upgrade their 158 & 168 systems to 2-CPU (getting twice throughput of
    single CPU systems).

    I then get sucked into helping with an effort to do 16-CPU 370 SMP
    (shared memory multiprocessor) and we con the 3033 processor engineers
    into helping in their spare time (a lot more interesting than remapping
    370/168 logic to 20% faster chips). Everybody thought it was great until somebody tells head of POK (DSD, high-end systems), that it could be
    decades before the POK favorite son operating system ("MVS") has
    effective 16-CPU support (MVS docs were that 2-CPU systems were only
    getting 1.2-1.5 times throughput of 1-CPU; POK doesn't ship 16-CPU
    system until after turn of century).

    1976, there is an "advanced technology" conference in POK where both
    801/RISC and 16-processor is presented. One of the 801/RISC people gives
    me a bad time claiming he had looked at the VM370 product code which had
    no SMP support. I've observed that it was the last adtech conference
    until sometime in the 80s (because so many adtech groups were being
    thrown into the 370 development breach). I had joked that John came up
    with 801/RISC to be the opposite of the complexity of "Future System".

    Overlapping transition of 370 to virtual memory the 1st half of the 70s
    was the "Future System" project, completely different than 370 and was
    supposed to completely replace 370 (I continued to work on 360&370 all
    during FS and would periodically ridicule what they were doing). Internal
    politics was working on shutting down 370 activities and lack of more
    new 370 during FS is credited with giving the clone 370 system makers (including Amdahl), their market foothold. When FS finally implodes,
    there is mad rush getting new stuff into 370 product pipelines,
    including kicking off quick&dirty 3033&3081 efforts in parallel.

    Head of POK invites some of us to never visit POK again and directed the
    3033 processor engineers, "heads down and no distractions"

    Part of 801 presentation was PL.8 would only generate correct code and
    the CP.r operating system would only execute correct PL.8 code. As a
    result, 801 RISC didn't need hardware protection domains (things like
    changing address spaces could be done with inline application code). 801
    ROMP chip was originally for OPD Displaywriter follow-on. When
    Displaywriter follow-on was canceled, they decided to pivot to the UNIX workstation market and hired the company that had done PC/IX (for
    IBM/PC) to do AIX for the PC/RT workstation (but needed ROMP to support
    UNIX paradigm hardware protection).

    FS had a lot of object-like characteristics, however one of the last
    nails in the FS coffin was analysis by IBM Houston Scientific Center
    that 370/195 apps redone for a FS machine made with the fastest
    technology available, would have throughput of 370/145 (about 30 times
    slow down). FS disaster
    http://www.jfsowa.com/computer/memo125.htm
    https://en.wikipedia.org/wiki/IBM_Future_Systems_project
    https://people.computing.clemson.edu/~mark/fs.html

    ... from "Computer Wars: The Post-IBM World" https://www.amazon.com/Computer-Wars-The-Post-IBM-World/dp/1587981394/

    ... and perhaps most damaging, the old culture under Watson Snr and Jr
    of free and vigorous debate was replaced with *SYCOPHANCY* and *MAKE NO
    WAVES* under Opel and Akers. It's claimed that thereafter, IBM lived in
    the shadow of defeat ... But because of the heavy investment of face by
    the top management, F/S took years to kill, although its wrong
    headedness was obvious from the very outset. "For the first time, during
    F/S, outspoken criticism became politically dangerous," recalls a former
    top executive

    ... snip ...

    Decade after 16-CPU 370 effort, get project to do HA/6000, originally
    for NYTimes to move their newspaper system (ATEX) off DEC VAXCluster to
    RS/6000. I rename it HA/CMP
    https://en.wikipedia.org/wiki/IBM_High_Availability_Cluster_Multiprocessing
    when I start doing technical/scientific cluster scale-up with national
    labs (LANL, LLNL, NCAR, etc) and commercial cluster scale-up with RDBMS
    vendors (Oracle, Sybase, Ingres, Informix) with VAXCluster support in
    same source base with UNIX.

    IBM S/88 (relogo'ed Stratus) Product Administrator started taking us
    around to their customers and also had me write a section for the
    corporate continuous availability document (it gets pulled when both AS400/Rochester and mainframe/POK complain they couldn't meet
    requirements). Had coined "disaster survivability" and "geographic survivability" (as counter to disaster/recovery) when out marketing
    HA/CMP. One of the visits to 1-800 bellcore development showed that S/88
    would use a century of downtime in one software upgrade, while HA/CMP
    had a couple extra "nines" (compared to S/88).
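
    For scale, a minimal sketch of what "nines" mean in downtime terms.
    The post doesn't give the actual availability targets, so this is just
    the generic arithmetic. The function name and the 365.25-day year are
    assumptions made for the example.

```python
# Sketch: convert an availability level given in "nines" to the downtime
# it allows per year. Generic arithmetic, not figures from the post.

MINUTES_PER_YEAR = 365.25 * 24 * 60   # 525,960 minutes, assuming a 365.25-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime for an availability of 1 - 10**-nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in range(3, 8):
    print(f"{n} nines: {downtime_minutes_per_year(n):.4f} minutes/year")
```

    Each extra nine cuts the allowed downtime by a factor of ten, so a
    couple of extra nines is a factor of a hundred.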

    One of the first HA/CMP customer installs was new Indian Reservation
    Casino in Connecticut, was supposed to have week of testing before
    opening ... but after 24hrs, they decided to open the doors (based on
    projected revenue; at the time was largest in the US, still one of the
    largest in the country) https://en.wikipedia.org/wiki/Foxwoods_Resort_Casino#Debt_default

    Early Jan92, there was HA/CMP meeting with Oracle CEO and IBM/AWD
    executive Hester tells Ellison that we would have 16-system clusters by
    mid92 and 128-system clusters by ye92. Mid-jan92, I update FSD on HA/CMP
    work with national labs and FSD decides to go with HA/CMP for federal supercomputers. By end of Jan, we are told that cluster scale-up is
    being transferred to Kingston for announce as IBM Supercomputer (technical/scientific *ONLY*) and we aren't allowed to work with
    anything that has more than four systems (we leave IBM a few months
    later). A couple weeks later, 17feb1992, Computerworld news ... IBM
    establishes laboratory to develop parallel systems (pg8) https://archive.org/details/sim_computerworld_1992-02-17_26_7

    Some speculation that HA/CMP would have eaten the mainframe in the
    commercial market. 1993 industry benchmarks (number of program
    iterations compared to the industry MIPS/BIPS reference platform):

    ES/9000-982               : 8-CPU: 408MIPS (51MIPS/CPU)
    RS6000/990 (RIOS chipset) : 1-CPU: 126MIPS, 16-systems: 2BIPS,
                                128-systems: 16BIPS
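
    The arithmetic behind those figures, as a minimal sketch. The cluster
    numbers are straight multiples of the single-system figure, so the
    code assumes perfectly linear scale-up. The function names are made up
    for the example.

```python
# Sketch: reproduce the MIPS arithmetic from the benchmark figures above.
# Cluster throughput is assumed to scale linearly with system count.

ES9000_982_TOTAL_MIPS = 408
ES9000_982_CPUS = 8
RS6000_990_MIPS = 126   # one system


def per_cpu_mips(total_mips: float, cpus: int) -> float:
    """Average throughput per CPU of a multiprocessor."""
    return total_mips / cpus


def cluster_mips(per_system_mips: float, systems: int) -> float:
    """Aggregate throughput of a cluster, assuming linear scale-up."""
    return per_system_mips * systems


print(per_cpu_mips(ES9000_982_TOTAL_MIPS, ES9000_982_CPUS))  # 51.0
print(cluster_mips(RS6000_990_MIPS, 16))                     # 2016, about 2 BIPS
print(cluster_mips(RS6000_990_MIPS, 128))                    # 16128, about 16 BIPS
```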

    Executive we had reported to, goes over to head up Somerset/AIM (Apple,
    IBM, Motorola) to do single chip 801/RISC (Power/PC) and uses Motorola
    88k bus/cache enabling SMP implementations.
    --
    virtualization experience starting Jan1968, online at home since Mar1970
  • From Lynn Wheeler@lynn@garlic.com to alt.folklore.computers on Sun May 31 14:41:49 2026
    From Newsgroup: alt.folklore.computers


    ... trivia: after FS implodes, head of POK was convincing corporate to
    kill the VM370 product, shutdown the development group and transfer all
    the people to POK for (370/XA) MVS/XA ... possibly because of how bad it
    made POK's favorite son operating system, MVS, look; ... which 16-CPU
    SMP would have just made MVS look worse.

    Endicott (370 mid-range) eventually manages to acquire the VM370 product mission ... but has to recreate a development group from scratch.
    --
    virtualization experience starting Jan1968, online at home since Mar1970
  • From John Levine@johnl@taugh.com to alt.folklore.computers on Mon Jun 1 01:17:31 2026
    From Newsgroup: alt.folklore.computers

    According to Peter Flass <Peter@Iron-Spring.com>:
    On 5/31/26 00:03, Lev wrote:
    John Levine wrote:

    The PL.8 retargeting result is striking - the fact that the compiler
    designed for 801's simple instructions also produced good S/360 code
    suggests the problem wasn't that CISC was bad, but that CISC
    instructions compilers couldn't easily select were dead weight.
    Nobody was emitting the fancy string instructions or decimal
    arithmetic unless they were hand-coding.


    This is 100% wrong. Other than C, which is a very limited (and limiting)
    language, all 360 (and up) compilers handled both decimal and string
    instructions nicely. COBOL, PL/I, and I suppose, RPG all used them. Even
    in assembler I used them quite extensively. ...

    Take a look at this paper from 25 years ago, the part on page 52 about System/370. Even though the PL.8 compiler didn't use all the
    instructions, its code ran much faster than the regular PL/I compiler
    due to the better register management and using a fast subset of the instruction set.

    https://acg.cis.upenn.edu/milom/cis501-Fall11/papers/cocke-RISC.pdf

    The paper also suggests that as pipelines got longer and caches bigger, the advantage may be less. Also, compilers now all use the graph coloring
    register allocator that PL.8 introduced.

    There have certainly been places where the CISC stuff makes sense. If
    you were running RPG on an 8K machine, code size was really important
    and it wasn't hard to keep up with a card reader and a printer.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
  • From thresh3@thresh3@fastmail.com (Lev) to alt.folklore.computers on Mon Jun 1 07:03:59 2026
    From Newsgroup: alt.folklore.computers

    Peter Flass wrote:

    This is 100% wrong. Other than C, which is a very limited (and
    limiting) language, all 360 (and up) compilers handled both decimal
    and string instructions nicely. COBOL, PL/I, and I suppose, RPG
    all used them.

    You're right, I overstated it badly. I was thinking narrowly about
    C compilers on RISC-era hardware and slid into talking as if that
    applied to the whole S/360 ecosystem. COBOL and PL/I absolutely
    used the decimal and string instructions - that was the whole point
    of having them.

    The better claim, which is what Levine's PL.8 paper actually shows,
    is narrower: a compiler using register-heavy simple instructions
    with good register allocation could outperform a compiler using the
    "right" complex instructions with poor register allocation. The
    win was in the register allocator, not in avoiding CISC per se.

    Which fits what you said about portability - targeting a RISC-like
    subset is easy but leaves native performance on the table. PL.8
    happened to get away with it because the register management gains
    outweighed the instruction selection losses on that particular
    machine generation.

    Scott: the Burroughs Medium Systems with BCD everything is wild.
    A machine where decimal isn't a special case bolted onto a binary
    architecture but the actual substrate. Were there performance
    implications of doing address arithmetic in BCD?
  • From Lynn Wheeler@lynn@garlic.com to alt.folklore.computers on Mon Jun 1 05:08:06 2026
    From Newsgroup: alt.folklore.computers


    25oct2006 comp.arch/a.f.c post with archived 08aug81 email pascal
    "benchmark" including pascal w/pl.8 backend

    6m 30 secs PERQ (with PERQ's Pascal compiler, of course)
    4m 55 secs 68000 with PASCAL/PL.8 compiler at OPT 2
    0m 21.5 secs 3033 PASCAL/VS with Optimization
    0m 10.5 secs 3033 with PASCAL/PL.8 at OPT 0
    0m 5.9 secs 3033 with PASCAL/PL.8 at OPT 3
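
    The ratios implied by those timings, as a minimal sketch. The times
    are the ones listed above converted to seconds; the labels and the
    function name are made up for the example.

```python
# Sketch: speedup ratios from the benchmark timings above, in seconds.

TIMES = {
    "PERQ, PERQ Pascal": 6 * 60 + 30.0,
    "68000, PASCAL/PL.8 OPT 2": 4 * 60 + 55.0,
    "3033, PASCAL/VS optimized": 21.5,
    "3033, PASCAL/PL.8 OPT 0": 10.5,
    "3033, PASCAL/PL.8 OPT 3": 5.9,
}


def speedup(baseline_secs: float, other_secs: float) -> float:
    """How many times faster 'other' is than 'baseline'."""
    return baseline_secs / other_secs


baseline = TIMES["3033, PASCAL/VS optimized"]
for label in ("3033, PASCAL/PL.8 OPT 0", "3033, PASCAL/PL.8 OPT 3"):
    print(f"{label}: {speedup(baseline, TIMES[label]):.2f}x vs PASCAL/VS")
```

    On the same 3033, the PL.8 backend comes out roughly 2x faster than
    PASCAL/VS with no optimization and roughly 3.6x faster at OPT 3.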
    --
    virtualization experience starting Jan1968, online at home since Mar1970