• Re: Why VAX Was the Ultimate CISC and Not RISC

    From Anton Ertl@21:1/5 to Lawrence D'Oliveiro on Sat Mar 1 11:58:17 2025
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    Could the VAX have been designed as a
    RISC architecture to begin with? Because not doing so meant that, just
over a decade later, RISC architectures took over the “real computer” market and wiped the floor with DEC’s flagship architecture, performance-wise.

    The answer was no, the VAX could not have been done as a RISC
    architecture. RISC wasn’t actually price-performance competitive until
    the latter 1980s:

    RISC didn’t cross over CISC until 1985. This occurred with the
    availability of large SRAMs that could be used for caches.

    Like other USA-based computer architects, Bell ignores ARM, which
    outperformed the VAX without using caches and was much easier to
    design.

    As for code size, we see significantly smaller code for RISC
    instruction sets with 16/32-bit encodings such as ARM T32/A32 and
    RV64GC than for all CISCs, including AMD64, i386, and S390x <2024Jan4.101941@mips.complang.tuwien.ac.at>. I doubt that VAX fares
    so much better in this respect that its code is significantly smaller
    than for these CPUs.

    Bottom line: If you sent, e.g., me and the needed documents back in
    time to the start of the VAX project, and gave me a magic wand that
    would convince the DEC management and workforce that I know how to
design their next architecture, and how to compile for it, I would
give the implementation team RV32GC as the architecture to implement, and
    that they should use pipelining for that, and of course also give that
    to the software people.

    As a result, DEC would have had an architecture that would have given
    them superior performance, they would not have suffered from the
    infighting of VAX9000 vs. PRISM etc. (and not from the wrong decision
    to actually build the VAX9000), and might still be going strong to
    this day. They would have been able to extend RV32GC to RV64GC
    without problems, and produce superscalar and OoO implementations.

    OTOH, DEC had great success with the VAX for a while, and their demise
    may have been unavoidable given their market position: Their customers (especially the business customers of VAXen) went to them instead of
    IBM, because they wanted something less costly, and they continued
    onwards to PCs running Linux when they provided something less costly.
    So DEC would also have needed to outcompete Intel and the PC market to
    succeed (and IBM eventually got out of that market).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From Lawrence D'Oliveiro@21:1/5 to All on Sat Mar 1 08:09:35 2025
    Found this paper <https://gordonbell.azurewebsites.net/Digital/Bell_Retrospective_PDP11_paper_c1998.htm>
    at Gordon Bell’s website. Talking about the VAX, which was designed as
    the ultimate “kitchen-sink” architecture, with every conceivable
    feature to make it easy for compilers (and humans) to generate code,
    he explains:

    The VAX was designed to run programs using the same amount of
    memory as they occupied in a PDP-11. The VAX-11/780 memory range
    was 256 Kbytes to 2 Mbytes. Thus, the pressure on the design was
    to have very efficient encoding of programs. Very efficient
    encoding of programs was achieved by having a large number of
    instructions, including those for decimal arithmetic, string
    handling, queue manipulation, and procedure calls. In essence, any
    frequent operation, such as the instruction address calculations,
    was put into the instruction-set. VAX became known as the
    ultimate, Complex (Complete) Instruction Set Computer. The Intel
    x86 architecture followed a similar evolution through various
    address sizes and architectural fads.

    The VAX project started roughly around the time the first RISC
    concepts were being researched. Could the VAX have been designed as a
    RISC architecture to begin with? Because not doing so meant that, just
    over a decade later, RISC architectures took over the “real computer” market and wiped the floor with DEC’s flagship architecture, performance-wise.

    The answer was no, the VAX could not have been done as a RISC
    architecture. RISC wasn’t actually price-performance competitive until
    the latter 1980s:

    RISC didn’t cross over CISC until 1985. This occurred with the
    availability of large SRAMs that could be used for caches. It
    should be noted at the time the VAX-11/780 was introduced, DRAMs
    were 4 Kbits and the 8 Kbyte cache used 1 Kbits SRAMs. Memory
    sizes continued to improve following Moore’s Law, but it wasn’t
    till 1985, that Reduced Instruction Set Computers could be built
    in a cost-effective fashion using SRAM caches. In essence RISC
    traded off cache memories built from SRAMs for the considerably
    faster, and less expensive Read Only Memories that held the more
    complex instructions of VAX (Bell, 1986).

  • From Anton Ertl@21:1/5 to Anton Ertl on Sat Mar 1 17:59:51 2025
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    If you sent, e.g., me and the needed documents back in
    time to the start of the VAX project, and gave me a magic wand that
    would convince the DEC management and workforce that I know how to
design their next architecture, and how to compile for it, I would
    give the implementation team RV32GC as architecture to implement, and
    that they should use pipelining for that, and of course also give that
    to the software people.

    There was also the question of PDP-11 compatibility. I would solve
    that by adding a PDP-11 decoder that produces RV32G instructions (or
    maybe the microcode that the RV32G decoder produces). Low-end models
    may get a dynamic binary translator instead.
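
To make that concrete, here is a toy sketch (mine; nothing DEC ever considered) of what such a front-end decoder or dynamic binary translator would do for the simplest case, a register-to-register PDP-11 ADD mapped onto an RV32I ADD. The two instruction encodings are the architecturally defined ones; everything else (the function names, the 1:1 register mapping, handling only register mode) is invented for illustration.

#include <stdint.h>
#include <stdio.h>

/* Encode an RV32I R-type instruction (funct7 | rs2 | rs1 | funct3 | rd | opcode). */
static uint32_t rv32_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                            uint32_t funct3, uint32_t rd, uint32_t opcode)
{
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* Translate PDP-11 "ADD Rs, Rd" (both operands in register mode) into
   RV32I "ADD xd, xd, xs"; returns 0 for anything this toy does not handle. */
static uint32_t translate_pdp11_add(uint16_t insn)
{
    if ((insn & 0170000) != 0060000)          /* 06SSDD: ADD src,dst */
        return 0;
    unsigned src_mode = (insn >> 9) & 7, src_reg = (insn >> 6) & 7;
    unsigned dst_mode = (insn >> 3) & 7, dst_reg = insn & 7;
    if (src_mode != 0 || dst_mode != 0)       /* register mode only here */
        return 0;
    return rv32_r_type(0, src_reg, dst_reg, 0, dst_reg, 0x33);  /* ADD */
}

int main(void)
{
    uint16_t add_r1_r2 = 0060102;             /* PDP-11: ADD R1, R2 */
    printf("%08x\n", (unsigned)translate_pdp11_add(add_r1_r2));
    /* prints 00110133, i.e. RV32I "add x2, x2, x1" */
    return 0;
}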

    OTOH, DEC had great success with the VAX for a while, and their demise
may have been unavoidable given their market position: Their customers (especially the business customers of VAXen) went to them instead of
    IBM, because they wanted something less costly, and they continued
    onwards to PCs running Linux when they provided something less costly.
So DEC would also have needed to outcompete Intel and the PC market to succeed (and IBM eventually got out of that market).

    OTOH, HP was also a big player in the mini and later workstation
    market, and they managed to survive, albeit by eventually splitting
    themselves into HPE for the big iron, and the other part for the PCs
    and printers. But it may be the exception that proves the rule.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From MitchAlsup1@21:1/5 to Anton Ertl on Sat Mar 1 18:03:21 2025
    On Sat, 1 Mar 2025 11:58:17 +0000, Anton Ertl wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    Could the VAX have been designed as a
    RISC architecture to begin with? Because not doing so meant that, just
over a decade later, RISC architectures took over the “real computer” market and wiped the floor with DEC’s flagship architecture, performance-wise.

    The answer was no, the VAX could not have been done as a RISC
architecture. RISC wasn’t actually price-performance competitive until the latter 1980s:

    RISC didn’t cross over CISC until 1985. This occurred with the
    availability of large SRAMs that could be used for caches.

    Like other USA-based computer architects, Bell ignores ARM, which outperformed the VAX without using caches and was much easier to
    design.

    Was ARM around when VAX was being designed (~1973) ??

    "The Case for the Reduced Instruction Set Computer" was after
    1980 as a point of temporal reference.

    As for code size, we see significantly smaller code for RISC
    instruction sets with 16/32-bit encodings such as ARM T32/A32 and
    RV64GC than for all CISCs, including AMD64, i386, and S390x <2024Jan4.101941@mips.complang.tuwien.ac.at>. I doubt that VAX fares
    so much better in this respect that its code is significantly smaller
    than for these CPUs.

VAX's advantage was that it executed fewer instructions (VAX executed
only 65% of the number of instructions the R2000 executed).

    My 66000 only needs 70% of the instructions RISC-V requires. Thus
    it is within spitting distance of VAX instruction count while still
    being almost a RISC architecture.

    Bottom line: If you sent, e.g., me and the needed documents back in
    time to the start of the VAX project, and gave me a magic wand that
    would convince the DEC management and workforce

You would also have to convince the Computer Science department at
CMU, where a lot of VAX ideas were dreamed up based on the success
of the PDP-11.

    that I know how to
design their next architecture, and how to compile for it, I would
    give the implementation team RV32GC as architecture to implement, and
    that they should use pipelining for that, and of course also give that
    to the software people.

    A pipelined machine in 1978 would have had 50% to 100% more circuit
    boards than VAX 11/780, making it a lot more expensive.

    As a result, DEC would have had an architecture that would have given
    them superior performance, they would not have suffered from the
    infighting of VAX9000 vs. PRISM etc. (and not from the wrong decision
    to actually build the VAX9000), and might still be going strong to
    this day. They would have been able to extend RV32GC to RV64GC
    without problems, and produce superscalar and OoO implementations.

    The design point you target for the original VAX would have taken
    significantly longer to design, debug, and ship.

    OTOH, DEC had great success with the VAX for a while, and their demise
    may have been unavoidable given their market position: Their customers (especially the business customers of VAXen) went to them instead of
    IBM, because they wanted something less costly, and they continued
    onwards to PCs running Linux when they provided something less costly.
    So DEC would also have needed to outcompete Intel and the PC market to succeed (and IBM eventually got out of that market).

    Unclear.

    - anton

  • From Thomas Koenig@21:1/5 to mitchalsup@aol.com on Sat Mar 1 20:01:01 2025
    MitchAlsup1 <mitchalsup@aol.com> schrieb:
    On Sat, 1 Mar 2025 11:58:17 +0000, Anton Ertl wrote:

    Like other USA-based computer architects, Bell ignores ARM, which
    outperformed the VAX without using caches and was much easier to
    design.

    Was ARM around when VAX was being designed (~1973) ??

    ARM was designed starting in 1983, if Wikipedia is to be believed.

    The only ones experimenting (successfully) with RISC at the time
    the VAX was designed were IBM with the 801, and they were kept
    from realizing their full potential by IBM's desire to not hurt
    their /370 business.

  • From EricP@21:1/5 to Lawrence D'Oliveiro on Sat Mar 1 14:40:55 2025
    Lawrence D'Oliveiro wrote:
    Found this paper <https://gordonbell.azurewebsites.net/Digital/Bell_Retrospective_PDP11_paper_c1998.htm>
    at Gordon Bell’s website. Talking about the VAX, which was designed as
    the ultimate “kitchen-sink” architecture, with every conceivable
    feature to make it easy for compilers (and humans) to generate code,
    he explains:

    The VAX was designed to run programs using the same amount of
    memory as they occupied in a PDP-11. The VAX-11/780 memory range
    was 256 Kbytes to 2 Mbytes. Thus, the pressure on the design was
    to have very efficient encoding of programs. Very efficient
    encoding of programs was achieved by having a large number of
    instructions, including those for decimal arithmetic, string
    handling, queue manipulation, and procedure calls. In essence, any
    frequent operation, such as the instruction address calculations,
    was put into the instruction-set. VAX became known as the
    ultimate, Complex (Complete) Instruction Set Computer. The Intel
    x86 architecture followed a similar evolution through various
    address sizes and architectural fads.

    The VAX project started roughly around the time the first RISC
    concepts were being researched. Could the VAX have been designed as a
    RISC architecture to begin with? Because not doing so meant that, just
    over a decade later, RISC architectures took over the “real computer” market and wiped the floor with DEC’s flagship architecture, performance-wise.

    The answer was no, the VAX could not have been done as a RISC
    architecture. RISC wasn’t actually price-performance competitive until
    the latter 1980s:

    RISC didn’t cross over CISC until 1985. This occurred with the
    availability of large SRAMs that could be used for caches. It
    should be noted at the time the VAX-11/780 was introduced, DRAMs
    were 4 Kbits and the 8 Kbyte cache used 1 Kbits SRAMs. Memory
    sizes continued to improve following Moore’s Law, but it wasn’t
    till 1985, that Reduced Instruction Set Computers could be built
    in a cost-effective fashion using SRAM caches. In essence RISC
    traded off cache memories built from SRAMs for the considerably
    faster, and less expensive Read Only Memories that held the more
    complex instructions of VAX (Bell, 1986).

    If you look at the VAX 8800 or NVAX uArch you see that even in 1990 it was still taking multiple clocks to serially decode each instruction and
    that basically stalls away any benefits a pipeline might have given.

If they had only put in *the things they actually used*
(as shown by DEC's own instruction usage stats from 1982),
and left out all the things that they rarely or never used,
it would have had 50 or so opcodes instead of 305,
at most one operand that addressed memory on arithmetic and logic opcodes,
with 3 address modes (register, register address, register offset address)
instead of 0 to 5 variable-length operands with 13 address modes each
(most combinations of which are either silly, redundant, or illegal).

Then they would have been able to parse instructions in one clock,
which makes pipelining a possible consideration,
and simplifies the uArch so that it can all fit on one chip,
which allows it to compete with RISC.
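
To illustrate the difference, a toy C model (my own, not DEC's logic): with variable-length operand specifiers, each specifier's length has to be examined before the next one can even be located, so the walk is inherently serial; with a fixed 32-bit format, every field falls out of one word at once, which is what makes one-clock decode plausible. The specifier subset and the fixed format below are made up for illustration.

#include <stdint.h>
#include <stdio.h>

/* Toy operand-specifier lengths: register = 1 byte, byte displacement = 2,
   longword displacement = 5.  The real VAX has 13 modes; three are enough
   to show that each length determines where the next specifier starts. */
static int specifier_len(uint8_t mode_byte)
{
    switch (mode_byte >> 4) {
    case 0x5: return 1;     /* Rn */
    case 0xA: return 2;     /* byte_disp(Rn) */
    case 0xE: return 5;     /* long_disp(Rn) */
    default:  return 1;
    }
}

/* Serial walk over a variable-length instruction: one step per specifier,
   because the next specifier's address is unknown until this one is decoded. */
static int vax_like_length(const uint8_t *p, int noperands)
{
    int offset = 1;                           /* skip the opcode byte */
    for (int i = 0; i < noperands; i++)
        offset += specifier_len(p[offset]);
    return offset;                            /* total instruction length */
}

/* Made-up fixed 32-bit format: all fields drop out of one word in parallel. */
static void fixed32_decode(uint32_t insn)
{
    unsigned opcode =  insn >> 26;
    unsigned rd     = (insn >> 21) & 31;
    unsigned rs     = (insn >> 16) & 31;
    unsigned imm16  =  insn & 0xFFFF;
    printf("op=%u rd=%u rs=%u imm=%u\n", opcode, rd, rs, imm16);
}

int main(void)
{
    /* opcode byte plus three specifiers, ADDL3-style: r0, 4(r1), 8(r2) */
    uint8_t bytes[] = { 0xC1, 0x50, 0xA1, 0x04, 0xA2, 0x08 };
    printf("length = %d bytes\n", vax_like_length(bytes, 3));
    fixed32_decode(0x20410004u);              /* arbitrary fixed-format word */
    return 0;
}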

It was designed the way it was because DEC had
microcode and microprogramming on the brain.
In this 1975 paper Bell and Strecker say it over and over and over.
They were looking at the CPU design as one large parsing machine
and not as a set of parallel hardware tasks.

    This was their mental mindset just before they started the VAX design:

    What Have We Learned From PDP11, Bell Strecker, 1975 https://gordonbell.azurewebsites.net/Digital/Bell_Strecker_What_we%20_learned_fm_PDP-11c%207511.pdf

  • From John Levine@21:1/5 to All on Sat Mar 1 20:46:29 2025
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    The answer was no, the VAX could not have been done as a RISC
architecture. RISC wasn’t actually price-performance competitive until the latter 1980s:

    RISC didn’t cross over CISC until 1985. This occurred with the
    availability of large SRAMs that could be used for caches.

Like other USA-based computer architects, Bell ignores ARM, which outperformed the VAX without using caches and was much easier to
    design.

    That's not a fair comparison. VAX design started in 1975 and shipped in 1978. The first ARM design started in 1983 with working silicon in 1985. It was a decade later.

On the other hand, I think some things were shortsighted even at the time. As Bell's paper said, they knew about Moore's law but didn't believe it. If they had believed it they could have made the instructions a little less dense and a lot easier to decode and pipeline. STRETCH did pipelining in the 1950s, so they should have been aware of it and considered that future machines could use it.

As someone else noted, they had microcode on the brain and the VAX instruction set is clearly designed to be decoded by microcode one byte at a time. Address modes can have side-effects so you have to decode them serially or have a big honking hazard scheme. They probably also assumed that microcode ROM would
be faster than RAM, which even in 1975 was not particularly true. Rather than putting every possible instruction into microcode, have a fast subroutine call and make them subroutines which can be cached and pipelined.
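
To illustrate the side-effect problem with a toy model (mine, not VAX microcode): in something like ADDL3 (R2)+, (R2)+, R3, the effective address of the second specifier depends on the autoincrement performed while evaluating the first, so the specifiers cannot be resolved in parallel without an interlock.

#include <stdint.h>
#include <stdio.h>

static uint32_t reg[16];                      /* toy register file */

/* Evaluate one "(Rn)+" specifier: return the effective address and bump
   the register by the operand size -- the side effect in question. */
static uint32_t autoincrement(int rn, int size)
{
    uint32_t ea = reg[rn];
    reg[rn] += size;
    return ea;
}

int main(void)
{
    reg[2] = 0x1000;
    uint32_t ea1 = autoincrement(2, 4);       /* first operand */
    uint32_t ea2 = autoincrement(2, 4);       /* depends on the first */
    printf("ea1=%#x ea2=%#x\n", (unsigned)ea1, (unsigned)ea2);  /* 0x1000 0x1004 */
    return 0;
}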



    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Lawrence D'Oliveiro@21:1/5 to EricP on Sat Mar 1 23:19:24 2025
    On Sat, 01 Mar 2025 14:40:55 -0500, EricP wrote:

    If you look at the VAX 8800 or NVAX uArch you see that even in 1990 it
    was still taking multiple clocks to serially decode each instruction and
    that basically stalls away any benefits a pipeline might have given.

    How many clocks did Alpha take to process each instruction? Because I
    recall the initial chips had clock speeds several times that of the RISC competition, but performance, while competitive, was not several times
    greater.

  • From MitchAlsup1@21:1/5 to EricP on Sat Mar 1 22:30:32 2025
    On Sat, 1 Mar 2025 19:40:55 +0000, EricP wrote:

    Lawrence D'Oliveiro wrote:
    Found this paper
    <https://gordonbell.azurewebsites.net/Digital/Bell_Retrospective_PDP11_paper_c1998.htm>
    at Gordon Bell’s website. Talking about the VAX, which was designed as
    the ultimate “kitchen-sink” architecture, with every conceivable
    feature to make it easy for compilers (and humans) to generate code,
    he explains:

    The VAX was designed to run programs using the same amount of
    memory as they occupied in a PDP-11. The VAX-11/780 memory range
    was 256 Kbytes to 2 Mbytes. Thus, the pressure on the design was
    to have very efficient encoding of programs. Very efficient
    encoding of programs was achieved by having a large number of
    instructions, including those for decimal arithmetic, string
    handling, queue manipulation, and procedure calls. In essence, any
    frequent operation, such as the instruction address calculations,
    was put into the instruction-set. VAX became known as the
    ultimate, Complex (Complete) Instruction Set Computer. The Intel
    x86 architecture followed a similar evolution through various
    address sizes and architectural fads.

    The VAX project started roughly around the time the first RISC
    concepts were being researched. Could the VAX have been designed as a
    RISC architecture to begin with? Because not doing so meant that, just
    over a decade later, RISC architectures took over the “real computer”
    market and wiped the floor with DEC’s flagship architecture,
    performance-wise.

    The answer was no, the VAX could not have been done as a RISC
    architecture. RISC wasn’t actually price-performance competitive until
    the latter 1980s:

    RISC didn’t cross over CISC until 1985. This occurred with the
    availability of large SRAMs that could be used for caches. It
    should be noted at the time the VAX-11/780 was introduced, DRAMs
    were 4 Kbits and the 8 Kbyte cache used 1 Kbits SRAMs. Memory
    sizes continued to improve following Moore’s Law, but it wasn’t
    till 1985, that Reduced Instruction Set Computers could be built
    in a cost-effective fashion using SRAM caches. In essence RISC
    traded off cache memories built from SRAMs for the considerably
    faster, and less expensive Read Only Memories that held the more
    complex instructions of VAX (Bell, 1986).

    If you look at the VAX 8800 or NVAX uArch you see that even in 1990 it
    was
    still taking multiple clocks to serially decode each instruction and
    that basically stalls away any benefits a pipeline might have given.

If they had only put in *the things they actually used*
(as shown by DEC's own instruction usage stats from 1982),
and left out all the things that they rarely or never used,
    it would have had 50 or so opcodes instead of 305,
    at most one operand that addressed memory on arithmetic and logic
    opcodes
    with 3 address modes (register, register address, register offset
    address)
    instead of 0 to 5 variable length operands with 13 address modes each
    (most combinations of which are either silly, redundant, or illegal).

Except for the 1 memory operand per instruction, the above paragraph
accurately describes My 66000 ISA.

Then they would have been able to parse instructions in one clock,
which makes pipelining a possible consideration,
and simplifies the uArch so that it can all fit on one chip,
which allows it to compete with RISC.

    If VAX had stuck with PDP-11 address modes and simply added the
    {Byte, Half, Word, Double} accesses it would have been a lot easier
    to pipeline.

    The reason it was designed the way it was, was because DEC had
    microcode and microprogramming on the brain.

    As did most of academia at the time.

In this 1975 paper Bell and Strecker say it over and over and over.
    They were looking at the cpu design as one large parsing machine
    and not as a set of parallel hardware tasks.

    Orthogonality, Regularity, Expressibility, ...

    This was their mental mindset just before they started the VAX design:

    What Have We Learned From PDP11, Bell Strecker, 1975 https://gordonbell.azurewebsites.net/Digital/Bell_Strecker_What_we%20_learned_fm_PDP-11c%207511.pdf

  • From Anton Ertl@21:1/5 to John Levine on Sat Mar 1 22:25:26 2025
    John Levine <johnl@taugh.com> writes:
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
The answer was no, the VAX could not have been done as a RISC architecture. RISC wasn’t actually price-performance competitive until the latter 1980s:

    RISC didn’t cross over CISC until 1985. This occurred with the
    availability of large SRAMs that could be used for caches.

Like other USA-based computer architects, Bell ignores ARM, which outperformed the VAX without using caches and was much easier to
    design.

That's not a fair comparison. VAX design started in 1975 and shipped in 1978. The first ARM design started in 1983 with working silicon in 1985. It was a decade later.

    The point is that ARM outperformed VAX without using caches. DRAM
    with 800ns cycle time was available in 1971 (the Nova 800 used it).
    By 1977, when the VAX 11/780 was released, certainly faster DRAM was
    available.

So I think that, for a VAX-11/780-priced machine, they could have had
a pipelined RISC that reads instructions from two 32-bit-wide DRAM
banks alternately, resulting in maybe 3-4 32-bit words of instructions
delivered per microsecond for straight-line code without loads or
stores. And in RV32GC many instructions take only 16 bits, so these
3-4 words contain maybe 5-6 instructions. So that might be 5-6 peak
MIPS, maybe 3 average MIPS, compared to 0.5 VAX MIPS. Some VAX
instructions have to be replaced with several RISC instructions, so
let's say these 3 RISC MIPS correspond to 2 VAX MIPS. That would
still be faster than the VAX 11/780, which reportedly delivered about
0.5 MIPS.
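
Spelling out that back-of-envelope chain (every number is one of the guesses above, not a measurement):

#include <stdio.h>

int main(void)
{
    double words_per_us   = 3.5;   /* "3-4 32-bit words per microsecond" */
    double insns_per_word = 1.5;   /* RV32GC: many instructions are 16 bits */
    double peak_mips      = words_per_us * insns_per_word;   /* ~5.25 */
    double avg_mips       = 3.0;   /* allowing for loads, stores, branches */
    double vax_equiv      = avg_mips * 2.0 / 3.0;  /* 3 RISC MIPS ~ 2 VAX MIPS */

    printf("peak ~%.1f MIPS, average ~%.1f MIPS, ~%.1f VAX-equivalent MIPS\n",
           peak_mips, avg_mips, vax_equiv);
    printf("versus the VAX-11/780's reported ~0.5 MIPS\n");
    return 0;
}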

    The other thing is that the VAX 11/780 (released 1977) had a 2KB
    cache, so Bell's argument that caches were only available around 1985
    does not hold water on that end, either. So my 1977 RISC project
    would have used that cache, too, increasing the performance of the
    result even more.

    Yes, commercial RISCs only happened in 1986 or so, but there is no
    technical reason for that, only that commercial architects did not
    believe in such things at the time. It took research projects from
    several sources until the concept had enough credibility to be taken
    seriously. That's why I asked for the magic wand for my time-travel
    project.

    It's interesting that this lack of credibility apparently includes
    IBM, whose research lab pioneered the concept. They produced the IBM
    801 with 15MHz clock, probably around the time of the first VAX, but
    the IBM 801 had no MMU; not sure what RAM technology they used.

    IBM tried to commercialize it in the ROMP in the IBM RT PC; Wikipedia
    says: "The architectural work on the ROMP began in late spring of
    1977, as a spin-off of IBM Research's 801 RISC processor ... The first
    examples became available in 1981, and it was first used commercially
    in the IBM RT PC announced in January 1986. ... The delay between the completion of the ROMP design, and introduction of the RT PC was
    caused by overly ambitious software plans for the RT PC and its
    operating system (OS)." And IBM then designed a new RISC, the
    RS/6000, which was released in 1990.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Sun Mar 2 00:16:06 2025
    On Sat, 01 Mar 2025 11:58:17 GMT, Anton Ertl wrote:

    Like other USA-based computer architects, Bell ignores ARM, which outperformed the VAX without using caches and was much easier to design.

    While those ARM chips were legendary for their low power consumption (and
    low transistor count), those Archimedes machines were not exactly low-
    cost, as I recall.

    Without caches, did they have to use faster (and therefore more expensive) memory? Or did they fall back on the classic “wait states”?

  • From MitchAlsup1@21:1/5 to BGB on Sun Mar 2 01:02:04 2025
    On Sat, 1 Mar 2025 22:29:27 +0000, BGB wrote:

    On 3/1/2025 5:58 AM, Anton Ertl wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    ------------------------------
    Would likely need some new internal operators to deal with bit-array operations and similar, with bit-ranges allowed as a pseudo-value type
    (may exist in constant expressions but will not necessarily exist as an actual value type at runtime).
    Say:
    val[63:32]
    Has the (63:32) as a BitRange type, which then has special semantics
    when used as an array index on an integer type, ...

    Mc 88K and My 66000 both have bit-vector operations.

The previous idea for bitfield extract/insert had turned into a
composite BITMOV instruction that could potentially do both operations
in a single instruction (along with moving a bitfield directly between
two registers).

    Using CARRY and extract + insert, one can extract a field spanning
    a doubleword and then insert it into another pair of doublewords.
    1 pseudo-instruction, 2 actual instructions.

Idea here is that it may do essentially a combination of a shift and a masked bit-select, say:
    Low 8 bits of immediate encode a shift in the usual format:
    Signed 8-bit shift amount, negative is right shift.
    High bits give a pair of bit-offsets used to compose a bit-mask.
    These will MUX between the shifted value and another input value.

    You want the offset (a 6-bit number) and the size (another 6-bit number)
    in order to identify the field in question.

I am still not sure whether this would make sense in hardware, but it is
not entirely implausible to implement in the Verilog.

In the extract case, you have the shifter before the masker;
in the insert case, you have the masker before the shifter,
followed by a merge (OR). Both maskers use the size. Offset
goes only to the shifter.
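
As a software model of the offset/size form being discussed (a sketch of the dataflow, not the MC 88K or My 66000 encoding), extract and insert come out as:

#include <stdint.h>
#include <stdio.h>

/* Mask of the low `size` bits, size in 1..64. */
static uint64_t low_mask(unsigned size)
{
    return (size >= 64) ? ~0ull : ((1ull << size) - 1);
}

/* Extract `size` bits starting at `offset`: shifter before masker. */
static uint64_t extract(uint64_t src, unsigned offset, unsigned size)
{
    return (src >> offset) & low_mask(size);
}

/* Insert the low `size` bits of `fld` into `dst` at `offset`:
   masker before shifter, then an OR merge. */
static uint64_t insert(uint64_t dst, uint64_t fld,
                       unsigned offset, unsigned size)
{
    uint64_t m = low_mask(size);
    return (dst & ~(m << offset)) | ((fld & m) << offset);
}

int main(void)
{
    uint64_t v  = 0x0123456789ABCDEFull;
    uint64_t hi = extract(v, 32, 32);         /* v[63:32] = 0x01234567 */
    uint64_t w  = insert(0, hi, 8, 32);       /* place it at bit 8 */
    printf("%016llx %016llx\n",
           (unsigned long long)hi, (unsigned long long)w);
    return 0;
}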

    Would likely be a 2 or 3 cycle operation, say:
    EX1: Do a Shift and Mask Generation;
    May reuse the normal SHAD unit for the shift;
    Mask-Gen will be specialized logic;
    EX2:
    Do the MUX.
    EX3:
    Present MUX result as output (passed over from EX2).

    I have done these in 1 cycle ...

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Sun Mar 2 02:40:45 2025
    On Sat, 01 Mar 2025 22:25:26 GMT, Anton Ertl wrote:

    The other thing is that the VAX 11/780 (released 1977) had a 2KB cache,
    so Bell's argument that caches were only available around 1985 does not
    hold water on that end, either.

    It was about the sizes of the caches, and hence their contribution to the
    cost.

  • From Lynn Wheeler@21:1/5 to Anton Ertl on Sat Mar 1 18:29:50 2025
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    IBM tried to commercialize it in the ROMP in the IBM RT PC; Wikipedia
    says: "The architectural work on the ROMP began in late spring of
    1977, as a spin-off of IBM Research's 801 RISC processor ... The first examples became available in 1981, and it was first used commercially
    in the IBM RT PC announced in January 1986. ... The delay between the completion of the ROMP design, and introduction of the RT PC was
    caused by overly ambitious software plans for the RT PC and its
    operating system (OS)." And IBM then designed a new RISC, the
    RS/6000, which was released in 1990.

    ROMP originally for DISPLAYWRITER follow-on ... running CP.r operating
    system and PL.8 programming language. ROMP was minimal 801, didn't have supervisor/problem mode ... at the time their claim was PL.8 would only generate correct code and CP.r would only load/execute correct programs.
    They claimed 40bit addressing ... 32 bit addresses ... but top four bits selected 16 "segment registers" that contained 12bit
    segment-identifiers. ... aka 28bit segment displacement and 12bit
    segment-ids (40bits) .... and any inline code could change segment
    register value ... as easily as could load any general register.

    When follow-on to DISPLAYWRITER was canceled, they pivoted to UNIX
    workstation market and got the company that had done AT&T unix port to
    IBM/PC for PC/IX ... to do AIX. Now ROMP needed supervisor/problem mode
    and inline code could no longer change segment register values
    ... needed to have supervisor call.

    Folklore is they also had 200 PL.8 programmers and needed something for
them to do, so they gen'ed an abstract virtual machine system ("VRM") (implemented in PL.8) and had the AIX port be done to the abstract virtual
    machine definition (instead of real hardware) .... claiming that the
    combined effort would be less (total effort) than having the outside
    company do the AIX port to the real hardware (also putting in a lot of
    IBM SNA communication support).

The IBM Palo Alto group had been working on a UCB BSD port to 370, but was
redirected to do it instead to bare ROMP hardware ... doing it with far
fewer resources than the VRM+AIX+SNA effort.

The move to RS/6000 & RIOS (large multi-chip) doubled the 12bit segment-id
to a 24bit segment-id (and some left-over description talked about it
being 52bit addressing), eliminated the VRM ... and added in some
amount of BSDisms.

AWD had done their own cards for the PC/RT (16bit AT) bus, including a
4mbit token-ring card. Then for RS/6000 microchannel, AWD was told they
couldn't do their own cards, but had to use PS2 microchannel cards. The
communication group was fiercely fighting off client/server and
distributed computing and had seriously performance knee-capped the PS2
cards, including the ($800) 16mbit token-ring card (a PS2 microchannel
card with lower throughput than the PC/RT 4mbit TR card). There was a
joke that a PC/RT 4mbit TR server had higher throughput than an
RS/6000 16mbit TR server. There was also a joke that the RS6000/730 with
VMEbus was a workaround for corporate politics, making it possible to install
high-performance workstation cards.

    We got the HA/6000 project in 1988 (approved by Nick Donofrio),
originally for NYTimes to move their newspaper system off VAXCluster to
RS/6000. I rename it HA/CMP
<https://en.wikipedia.org/wiki/IBM_High_Availability_Cluster_Multiprocessing>
when I start doing technical/scientific cluster scale-up with national
labs (LLNL, LANL, NCAR, etc) and commercial cluster scale-up with RDBMS
vendors (Oracle, Sybase, Ingres, Informix, which had vaxcluster support in
the same source base with unix). The S/88 product administrator then starts
    taking us around to their customers and also has me do a section for the corporate continuous availability strategy document ... it gets pulled
    when both Rochester/AS400 and POK/(high-end mainframe) complain they
    couldn't meet the requirements.

    Early Jan1992 have a meeting with Oracle CEO and IBM/AWD Hester tells
    Ellison we would have 16-system clusters by mid92 and 128-system
    clusters by ye92. Then late Jan92, cluster scale-up is transferred for
    announce as IBM Supercomputer (for technical/scientific *ONLY*) and we
    are told we can't work on anything with more than four processors (we
leave IBM a few months later). Contributing was that the mainframe DB2 DBMS
group were complaining that, if we were allowed to continue, it would be at
least five years ahead of them.

Neither ROMP nor RIOS supported bus/cache consistency for multiprocessor
operation. The executive we reported to went over to head up ("AIM" -
Apple, IBM, Motorola) Somerset for single chip 801/risc ... but also
    leaves Somerset for president of (SGI owned) MIPS.

    trivia: I also had HSDT project (started in early 80s), T1 and faster
    computer links, both terrestrial and satellite ... which included custom designed TDMA satellite system done on the other side of the pacific
    ... and put in 3-node system. two 4.5M dishes, one in San Jose and one
    in Yorktown Research (hdqtrs, east coast) and a 7M dish in Austin (where
    much of the RIOS design was going on). San Jose also got an EVE, a
    superfast hardware VLSI logic simulator (scores of times faster than
    existing simultion) ... and it was claimed that Austin being able to use
    the EVE in San Jose, helped bring RIOS in a year early.

    --
    virtualization experience starting Jan1968, online at home since Mar1970
