• What do we call non-pipelined designs?

    From Marcus@21:1/5 to All on Sun Dec 8 23:10:15 2024
    I usually (and simplistically) divide CPU designs (implementations) into
    two main categories:

    - Pipelined
    - Non-pipelined

    Of course, there is a sliding scale at play, but let's not get into that debate.

    My question is: What is the best name for non-pipelined designs?

    I'm thinking about CPU:s that transition through several states (one
    clock cycle after another) when executing a single instruction (e.g.
    FETCH + DECODE + EXECUTE), and where instruction and data typically
    share the same memory interface.
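    To make the kind of design being described concrete, here is a minimal
    sketch of such a machine. The accumulator ISA (LOAD/ADD/STORE/HALT), the
    three state names, and the one-cycle-per-state timing are all assumptions
    made for illustration; they do not model any real CPU.

```python
from enum import Enum


class State(Enum):
    FETCH = 0
    DECODE = 1
    EXECUTE = 2


def run(memory, max_cycles=100):
    """Hypothetical accumulator machine: one state per clock cycle.

    Instructions and data live in the same `memory` list, so instruction
    fetches and data accesses share one memory interface and can never
    happen in the same cycle.
    """
    pc, acc, state = 0, 0, State.FETCH
    word = op = arg = None
    cycles = 0
    while cycles < max_cycles:
        cycles += 1
        if state is State.FETCH:
            word = memory[pc]          # instruction read on the shared port
            pc += 1
            state = State.DECODE
        elif state is State.DECODE:
            op, arg = word             # split into opcode and operand address
            state = State.EXECUTE
        else:  # State.EXECUTE
            if op == "LOAD":
                acc = memory[arg]      # data read on the same port, later cycle
            elif op == "ADD":
                acc += memory[arg]
            elif op == "STORE":
                memory[arg] = acc
            elif op == "HALT":
                return acc, cycles
            state = State.FETCH        # next fetch starts only after execute
    raise RuntimeError("cycle limit reached")


# Four instructions followed by three data words, all in one memory.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
acc, cycles = run(program)
print(acc, cycles)
```

    Because the next FETCH never begins before the previous EXECUTE has
    finished, every instruction in this sketch costs three cycles and nothing
    overlaps.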

    /Marcus

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Marcus on Sun Dec 8 23:05:40 2024
    On Sun, 8 Dec 2024 22:10:15 +0000, Marcus wrote:

    I usually (and simplistically) divide CPU designs (implementations) into
    two main categories:

    - Pipelined
    - Non-pipelined

    Of course, there is a sliding scale at play, but let's not get into that debate.

    My question is: What is the best name for non-pipelined designs?

    If any portion of the design fetches the next instruction before the
    last calculation of the previous instruction, then the design is
    pipelined. CDC 6600 had a pipelined front end and serially reusable
    calculation units. (Also note that under this definition the 6800,
    68000, and 8086 were partially pipelined architectures.)

    I'm thinking about CPU:s that transition through several states (one
    clock cycle after another) when executing a single instruction (e.g.
    FETCH + DECODE + EXECUTE), and where instruction and data typically
    share the same memory interface.

    Given that one can take an off-the-shelf (rather cheap) FPGA and
    implement a fully pipelined RISC ISA implementation:

    Why, in this day and age, would anyone want to even consider doing
    something less pipelined than that ?!?!?

    /Marcus

  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Dec 9 03:43:14 2024
    On Sun, 8 Dec 2024 23:05:40 +0000, MitchAlsup1 wrote:

    Why, in this day and age, would anyone want to even consider doing
    something less pipelined than that ?!?!?

    Here’s one reason (tell me if I’m wrong): I remember some CPU designs used CMOS and probably some other hardware magic I don’t understand to create chips that could run at any speed down to 0Hz.

    That is, you could slow down the clock and even stop it completely at any
    point in instruction execution (keep the power on) to pause the program,
    then start it up again and the program would resume from that point.

    Would that work with a pipeline? Actually I suppose it would.

  • From David Schultz@21:1/5 to Lawrence D'Oliveiro on Mon Dec 9 11:20:55 2024
    On 12/8/24 9:43 PM, Lawrence D'Oliveiro wrote:
    On Sun, 8 Dec 2024 23:05:40 +0000, MitchAlsup1 wrote:

    Why, in this day and age, would anyone want to even consider doing
    something less pipelined than that ?!?!?

    Here’s one reason (tell me if I’m wrong): I remember some CPU designs used
    CMOS and probably some other hardware magic I don’t understand to create chips that could run at any speed down to 0Hz.

    It required zero hardware magic. It was a design choice.

    A dynamic latch required fewer transistors (area) but imposed a minimum
    clock speed to keep the stored charge from leaking away to uselessness.
    A static latch had no minimum speed but required more transistors.

    In those early days with CPU transistor counts in the thousands, dynamic latches were a good and fairly common choice.

    --
    http://davesrocketworks.com
    David Schultz

  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Tue Dec 10 02:05:58 2024
    On Mon, 9 Dec 2024 3:43:14 +0000, Lawrence D'Oliveiro wrote:

    On Sun, 8 Dec 2024 23:05:40 +0000, MitchAlsup1 wrote:

    Why, in this day and age, would anyone want to even consider doing
    something less pipelined than that ?!?!?

    Here’s one reason (tell me if I’m wrong): I remember some CPU designs used
    CMOS and probably some other hardware magic I don’t understand to create
    chips that could run at any speed down to 0Hz.

    This is just standard static CMOS design. It is the dynamic stuff
    which is faster but cannot be slowed down to 0 Hz.

    That is, you could slow down the clock and even stop it completely at
    any point in instruction execution (keep the power on) to pause the
    program, then start it up again and the program would resume from that
    point.

    Would that work with a pipeline? Actually I suppose it would.

    A purely static pipeline, yes. But the trick is the ability to steal
    0-k clocks from a single pipeline, without messing with the phase
    accuracy of the rest of the chip.

  • From Marcus@21:1/5 to All on Sat Dec 14 10:27:47 2024
    On 2024-12-09 00:05, MitchAlsup1 wrote:
    On Sun, 8 Dec 2024 22:10:15 +0000, Marcus wrote:

    I usually (and simplistically) divide CPU designs (implementations) into
    two main categories:

    - Pipelined
    - Non-pipelined

    Of course, there is a sliding scale at play, but let's not get into that
    debate.

    My question is: What is the best name for non-pipelined designs?

    If any portion of the design fetches the next instruction before the
    last calculation of the previous instruction, then the design is
    pipelined. CDC 6600 had a pipelined front end and serially reusable
    calculation units. (Also note that under this definition the 6800,
    68000, and 8086 were partially pipelined architectures.)

    Yes, it's certainly a scale, where most implementations have *some*
    pipelining and *some* unit reuse. I'm thinking about what to call the
    two ends of that scale.


    I'm thinking about CPU:s that transition through several states (one
    clock cycle after another) when executing a single instruction (e.g.
    FETCH + DECODE + EXECUTE), and where instruction and data typically
    share the same memory interface.

    Given that one can take an off-the-shelf (rather cheap) FPGA and
    implement a fully pipelined RISC ISA implementation:

    Why, in this day and age, would anyone want to even consider doing
    something less pipelined than that ?!?!?

    My question is more about the nomenclature, not about merits of
    different design choices.

    It's mostly about overall design, where on one end (e.g. the MIPS)
    you focus on instruction throughput at the cost of resource duplication,
    while on the other end (e.g. 6502) you focus on resource reuse at the
    cost of lower performance.

    /Marcus

  • From MitchAlsup1@21:1/5 to Marcus on Sun Dec 15 01:14:42 2024
    On Sat, 14 Dec 2024 9:27:47 +0000, Marcus wrote:

    On 2024-12-09 00:05, MitchAlsup1 wrote:
    On Sun, 8 Dec 2024 22:10:15 +0000, Marcus wrote:

    I usually (and simplistically) divide CPU designs (implementations) into
    two main categories:

    - Pipelined
    - Non-pipelined

    Of course, there is a sliding scale at play, but let's not get into that
    debate.

    My question is: What is the best name for non-pipelined designs?

    If any portion of the design fetches the next instruction before the
    last calculation of the previous instruction, then the design is
    pipelined. CDC 6600 had a pipelined front end and serially reusable
    calculation units. (Also note that under this definition the 6800,
    68000, and 8086 were partially pipelined architectures.)

    Yes, it's certainly a scale, where most implementations have *some* pipelining and *some* unit reuse. I'm thinking about what to call the
    two ends of that scale.

    Is there a problem with calling one end heavily pipelined and the
    other end non-pipelined? At least the nomenclature will
    not make the spectrum less visible.


    I'm thinking about CPU:s that transition through several states (one
    clock cycle after another) when executing a single instruction (e.g.
    FETCH + DECODE + EXECUTE), and where instruction and data typically
    share the same memory interface.

    Given that one can take an off-the-shelf (rather cheap) FPGA and
    implement a fully pipelined RISC ISA implementation:

    Why, in this day and age, would anyone want to even consider doing
    something less pipelined than that ?!?!?

    My question is more about the nomenclature, not about merits of
    different design choices.

    We could invent some kind of metric such as transistor count divided
    by Flip-Flop count. Quotients above 120 are pipelined; quotients
    below 60 are unlikely to be pipelined; quotients above 240 are
    deeply pipelined.
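    As a rough sketch, the suggested metric could be written down as follows.
    The thresholds are taken exactly as stated in the post; the function name,
    the "unclassified" label for the 60 to 120 range (which the post does not
    cover), and the counts in the usage lines are assumptions for illustration
    only, not measurements of real chips.

```python
def classify_by_quotient(transistor_count, flip_flop_count):
    """Classify a design by transistors per flip-flop, per the post's thresholds."""
    if flip_flop_count <= 0:
        raise ValueError("flip_flop_count must be positive")
    quotient = transistor_count / flip_flop_count
    if quotient > 240:
        return "deeply pipelined"
    if quotient > 120:
        return "pipelined"
    if quotient < 60:
        return "unlikely to be pipelined"
    # The post gives no verdict for quotients from 60 to 120.
    return "unclassified"


# Made-up counts, chosen only to exercise each branch.
print(classify_by_quotient(30_000, 100))  # quotient 300
print(classify_by_quotient(15_000, 100))  # quotient 150
print(classify_by_quotient(9_000, 100))   # quotient 90
print(classify_by_quotient(5_000, 100))   # quotient 50
```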

    It's mostly about overall design, where on one end (e.g. the MIPS)
    you focus on instruction throughput at the cost of resource duplication, while on the other end (e.g. 6502) you focus on resource reuse at the
    cost of lower performance.

    /Marcus

  • From Thomas Koenig@21:1/5 to Robert Finch on Thu Dec 26 12:57:46 2024
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

  • From EricP@21:1/5 to Thomas Koenig on Thu Dec 26 13:54:42 2024
    Thomas Koenig wrote:
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

    It is possible to do everything for a risc style ISA in one clock but
    it would need a Harvard architecture with separate instruction and
    data memory because it would have to read the instruction memory and
    also LD [reg]->reg or ST reg->[reg] data memory within the same clock.

    So the only flip-flops would be in the 3-port register file and
    the RIP register, and everything between instruction read and result
    write is combinatorial logic. The critical timing path would be
    2x the mem access time plus combinatorial logic.
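    The timing estimate above can be sketched as simple arithmetic. The
    function names and the 10 ns / 5 ns figures are hypothetical, picked only
    to show the calculation; real numbers depend entirely on the memory and
    logic technology.

```python
def single_cycle_period_ns(t_mem_access_ns, t_logic_ns):
    """Minimum clock period for the single-cycle design sketched above.

    One instruction-memory access and one data-memory access happen
    back to back in the same cycle, plus the combinatorial logic in between.
    """
    return 2 * t_mem_access_ns + t_logic_ns


def max_clock_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# Hypothetical figures: 10 ns memory access, 5 ns of combinatorial logic.
period = single_cycle_period_ns(10.0, 5.0)
print(period, max_clock_mhz(period))
```

    Every instruction pays for this worst-case path, even one that never
    touches data memory, which is the price of fitting everything into a
    single clock.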

  • From MitchAlsup1@21:1/5 to EricP on Thu Dec 26 20:22:01 2024
    On Thu, 26 Dec 2024 18:54:42 +0000, EricP wrote:

    Thomas Koenig wrote:
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

    It is possible to do everything for a risc style ISA in one clock but

    ??? LDs in 1 cycle
    ??? IMUL in 1 cycle
    ??? IDIV in 1 cycle
    ??? L1 miss in 1 cycle
    ??? FP <any> in 1 cycle

    it would need a Harvard architecture with separate instruction and
    data memory because it would have to read the instruction memory and
    also LD [reg]->reg or ST reg->[reg] data memory within the same clock.

    So the only flip-flops would be in the 3-port register file and
    the RIP register, and everything between instruction read and result
    write is combinatorial logic. The critical timing path would be
    2x the mem access time plus combinatorial logic.

  • From EricP@21:1/5 to All on Thu Dec 26 16:27:00 2024
    MitchAlsup1 wrote:
    On Thu, 26 Dec 2024 18:54:42 +0000, EricP wrote:

    Thomas Koenig wrote:
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

    It is possible to do everything for a risc style ISA in one clock but

    ??? LDs in 1 cycle
    ??? IMUL in 1 cycle
    ??? IDIV in 1 cycle
    ??? L1 miss in 1 cycle
    ??? FP <any> in 1 cycle

    Luxury! Why in my day...

    it would need a Harvard architecture with separate instruction and
    data memory because it would have to read the instruction memory and
    also LD [reg]->reg or ST reg->[reg] data memory within the same clock.

    So the only flip-flops would be in the 3-port register file and
    the RIP register, and everything between instruction read and result
    write is combinatorial logic. The critical timing path would be
    2x the mem access time plus combinatorial logic.

    Sure, for a minimal RISC like the original HP PA-RISC,
    which had no multiply because that took multiple clocks.

    The memory would be SRAM, with read data available T_read_access after
    the address is presented, and writes performed at the rising clock edge,
    T_write_access after the write address is presented. There is no need for
    a Ready/Wait signal from the SRAM because the design guarantees that the
    timing is met.

    But if you're not in a hurry, both IMUL and IDIV can be done combinatorially.
    So could FP if you really want to. They just give you a long critical path.

    There is no L1 because that implies a cache miss means multiple clocks
    which violates the design requirement. However it would be easy enough to implement a Wait signal from memory to inhibit the next clock until Ready.
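    A minimal sketch of the Wait idea: the CPU still completes one instruction
    per CPU clock, but the clock edge is withheld while memory holds Wait. The
    function name and the per-instruction stall counts are assumptions for
    illustration, not a model of any particular memory system.

```python
def run_with_wait(wait_ticks_per_instruction):
    """Count CPU clocks and elapsed base-clock ticks with a Wait signal.

    wait_ticks_per_instruction[i] is the (hypothetical) number of base-clock
    ticks for which memory asserts Wait during instruction i.
    """
    base_ticks = 0
    cpu_clocks = 0
    for stalls in wait_ticks_per_instruction:
        base_ticks += stalls  # CPU clock inhibited while Wait is asserted
        base_ticks += 1       # memory Ready: the clock edge is let through
        cpu_clocks += 1       # the instruction completes on that edge
    return cpu_clocks, base_ticks


# Four instructions; the second and fourth hit slow memory.
cpu_clocks, base_ticks = run_with_wait([0, 2, 0, 1])
print(cpu_clocks, base_ticks)
```

    From the CPU's point of view it is still one clock per instruction; only
    the wall-clock time per instruction varies.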
