• Re: What do we call non-pipelined designs?

    From Thomas Koenig@21:1/5 to Robert Finch on Thu Dec 26 12:57:46 2024
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Thomas Koenig on Thu Dec 26 13:54:42 2024
    Thomas Koenig wrote:
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

    It is possible to do everything for a risc style ISA in one clock but
    it would need a Harvard architecture with separate instruction and
    data memory because it would have to read the instruction memory and
    also LD [reg]->reg or ST reg->[reg] data memory within the same clock.

    So the only flip-flops would be in the 3-port register file and
    the RIP register, and everything between instruction read and result
    write is combinatorial logic. The critical timing path would be
    2x the mem access time plus combinatorial logic.
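
    A rough worked example of that critical path, as a sketch only. The
    delay figures are made-up placeholders (not measurements of any real
    part), and the function name is hypothetical. The minimum clock period
    is taken as instruction read, plus combinatorial logic, plus data access:

```python
# Hypothetical delays in nanoseconds; illustrative placeholders only.
T_MEM_ACCESS_NS = 10.0   # assumed SRAM access time (instruction or data)
T_LOGIC_NS = 8.0         # assumed decode + register read + ALU + result mux


def single_cycle_period_ns(t_mem_ns: float, t_logic_ns: float) -> float:
    """Minimum clock period: instruction read, then logic, then data access."""
    return 2 * t_mem_ns + t_logic_ns


period = single_cycle_period_ns(T_MEM_ACCESS_NS, T_LOGIC_NS)
max_clock_mhz = 1000.0 / period  # 1000 ns per microsecond
print(f"period = {period} ns, max clock = {max_clock_mhz:.1f} MHz")
```

    With these assumed numbers the two memory accesses dominate the period,
    which is the point of the "2x the mem access time" term.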

  • From MitchAlsup1@21:1/5 to EricP on Thu Dec 26 20:22:01 2024
    On Thu, 26 Dec 2024 18:54:42 +0000, EricP wrote:

    Thomas Koenig wrote:
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

    It is possible to do everything for a risc style ISA in one clock but

    ??? LDs in 1 cycle
    ??? IMUL in 1 cycle
    ??? IDIV in 1 cycle
    ??? L1 miss in 1 cycle
    ??? FP <any> in 1 cycle

    it would need a Harvard architecture with separate instruction and
    data memory because it would have to read the instruction memory and
    also LD [reg]->reg or ST reg->[reg] data memory within the same clock.

    So the only flip-flops would be in the 3-port register file and
    the RIP register, and everything between instruction read and result
    write is combinatorial logic. The critical timing path would be
    2x the mem access time plus combinatorial logic.

  • From EricP@21:1/5 to All on Thu Dec 26 16:27:00 2024
    MitchAlsup1 wrote:
    On Thu, 26 Dec 2024 18:54:42 +0000, EricP wrote:

    Thomas Koenig wrote:
    Robert Finch <robfi680@gmail.com> schrieb:

    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    Not any more.

    Non-pipelined
    designs perform everything in one long clock cycle.

    Earlier architectures had several clock cycles per instruction,
    also without pipelining. I think the single-clock CPUs mostly
    serve as an example for educational purposes.

    It is possible to do everything for a risc style ISA in one clock but

    ??? LDs in 1 cycle
    ??? IMUL in 1 cycle
    ??? IDIV in 1 cycle
    ??? L1 miss in 1 cycle
    ??? FP <any> in 1 cycle

    Luxury! Why in my day...

    it would need a Harvard architecture with separate instruction and
    data memory because it would have to read the instruction memory and
    also LD [reg]->reg or ST reg->[reg] data memory within the same clock.

    So the only flip-flops would be in the 3-port register file and
    the RIP register, and everything between instruction read and result
    write is combinatorial logic. The critical timing path would be
    2x the mem access time plus combinatorial logic.

    Sure, for a minimal risc like the original HP-PA RISC
    which had no multiply because that took multiple clocks.

    The memory would be SRAM: read data is available T_read_access after the
    address is presented, and writes are performed at the rising clock edge,
    T_write_access after the write address is presented. There is no need for
    a Ready/Wait signal from the SRAM because the design guarantees that the
    timing is met.

    But if you're not in a hurry, both IMUL and IDIV can be done combinatorially.
    So could FP if you really want to. They just give you a long critical path.
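
    To illustrate the long critical path, here is a minimal software model
    of an unsigned multiplier built as an unrolled shift-and-add array. It is
    a sketch under assumptions: the function name and the 8-bit default width
    are hypothetical, and each loop iteration stands for one row of adders
    wired in series, with no registers in between.

```python
def comb_multiply(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Unsigned multiply modeled as an unrolled shift-and-add array.

    Returns (product, adder_rows). adder_rows counts the adder rows the
    signal passes through, a crude stand-in for logic depth.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    adder_rows = 0
    for i in range(width):
        # Row i adds (a << i) when bit i of b is set, otherwise adds zero.
        partial = (a << i) if (b >> i) & 1 else 0
        product += partial
        adder_rows += 1
    return product, adder_rows


print(comb_multiply(13, 11))
```

    The number of rows in this naive array grows linearly with the operand
    width, so the clock period has to stretch to cover all of them.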

    There is no L1 because that implies a cache miss means multiple clocks
    which violates the design requirement. However it would be easy enough
    to implement a Wait signal from memory to inhibit the next clock until
    Ready.

  • From Marcus@21:1/5 to Robert Finch on Sat Jan 11 19:13:50 2025
    On 2024-12-26, Robert Finch wrote:
    On 2024-12-08 5:10 p.m., Marcus wrote:
    I usually (and simplistically) divide CPU designs (implementations) into
    two main categories:

    - Pipelined
    - Non-pipelined

    Of course, there is a sliding scale at play, but let's not get into that
    debate.

    My question is: What is the best name for non-pipelined designs?

    I'm thinking about CPUs that transition through several states (one
    clock cycle after another) when executing a single instruction (e.g.
    FETCH + DECODE + EXECUTE), and where instruction and data typically
    share the same memory interface.

    /Marcus
    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    True. I'm talking about a niche here.

    Non-pipelined designs perform everything in one long clock cycle.

    The designs I'm thinking about are mostly multi-cycle, i.e. one
    instruction takes several cycles to complete.

    Otherwise, there are two major classes of pipelined designs,
    non-overlapped pipeline and overlapped pipeline. Some designs are
    partially overlapped pipelined.

    For the sake of the argument, what should we call:

    * Intel 8008 [1]
    * Olof Kindgren's SERV [2]
    * MOS 6502 [3]

    ?

    There may be some pipelining in parts of these designs, but the key
    point I'm trying to get at is that the CPU typically goes through a
    sequence of states when executing an instruction, and it is typically
    "busy" for more than one clock cycle while executing one instruction.
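
    The kind of state sequencing described above can be sketched as a tiny
    software model. This is an illustration only: the three states come from
    the FETCH + DECODE + EXECUTE example, while the toy program and the
    function name are hypothetical and do not model any of the listed CPUs.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()


# Hypothetical toy program: mnemonics only, one entry per instruction.
PROGRAM = ["LOAD", "ADD", "STORE"]


def run(program: list[str]) -> list[tuple[int, int, str]]:
    """Return a trace of (cycle, pc, state) for a non-overlapped machine."""
    trace = []
    cycle = 0
    for pc, _opcode in enumerate(program):
        # One state per clock; the next FETCH waits until EXECUTE is done.
        for state in (State.FETCH, State.DECODE, State.EXECUTE):
            trace.append((cycle, pc, state.name))
            cycle += 1
    return trace


for cycle, pc, state in run(PROGRAM):
    print(f"cycle {cycle}: pc={pc} {state}")
```

    Every instruction occupies the machine for three consecutive clocks, and
    no two instructions are ever in flight at the same time.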

    /Marcus

    [1] https://en.wikipedia.org/wiki/Intel_8008
    [2] https://serv.readthedocs.io/en/latest/internals.html#instruction-life-cycle
    [3] https://en.wikipedia.org/wiki/MOS_Technology_6502

  • From Michael S@21:1/5 to ze@zerandconsulting.com on Sun Jan 12 16:05:42 2025
    On Sun, 12 Jan 2025 13:44:44 +0000
    ze@zerandconsulting.com (Ze) wrote:


    We were just doing toy CPUs to learn on; I doubt anybody needs to do
    multi-cycle designs anymore. Those are from a time when gates were
    precious.


    To remove your doubts: https://www.intel.com/content/www/us/en/products/details/fpga/intellectual-property/processors-peripherals/niosv.html

  • From Ze@21:1/5 to All on Sun Jan 12 13:44:44 2025
    At university in the early 2000s we would have called it a multi-cycle
    design.

    We had to design and implement an 8-bit multi-cycle CPU on an FPGA, using
    the various design techniques in the FPGA software. It was only a toy,
    with separate data and instruction memory, that reused the ALU for
    calculating the instruction pointer etc. I can't remember whether we had
    call/return instructions or just branch and jump instructions, or whether
    it had push/pop instructions. I do remember it used 16-bit instruction
    words and was roughly based on RISC principles. I also wrote a quick and
    dirty assembler so I didn't have to manually translate assembly to binary
    for the program ROMs.

    I remember it going:

    Single-cycle: instructions take a single cycle and don't overlap.

    Multi-cycle: instructions take multiple cycles and don't overlap,
    optionally reusing parts (e.g. ALU, register file) on different cycles;
    different instructions could potentially take different numbers of cycles.

    Pipelined: instructions take multiple cycles, but multiple instructions
    overlap, so hazards need to be dealt with.
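
    The three categories can be compared with a small cycle-count sketch.
    All function names and numbers here are hypothetical, and the pipelined
    case assumes ideal overlap with hazard bubbles lumped into one
    stall_cycles input:

```python
def single_cycle_cycles(n_instr: int) -> int:
    """One (long) clock per instruction, no overlap."""
    return n_instr


def multi_cycle_cycles(cycles_per_instr: list[int]) -> int:
    """Instructions run back to back; each may take a different cycle count."""
    return sum(cycles_per_instr)


def pipelined_cycles(n_instr: int, stages: int, stall_cycles: int = 0) -> int:
    """Ideal overlap: fill the pipeline once, then one instruction per clock.

    stall_cycles lumps together all hazard bubbles (an assumed input).
    """
    if n_instr == 0:
        return 0
    return stages + (n_instr - 1) + stall_cycles


n = 100
print("single-cycle:", single_cycle_cycles(n))
print("multi-cycle: ", multi_cycle_cycles([4] * n))
print("pipelined:   ", pipelined_cycles(n, stages=5, stall_cycles=10))
```

    Cycle counts alone don't settle which is faster, because the single-cycle
    clock period has to cover the whole instruction while the other two can
    use a much shorter clock.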

    We used the Hennessy and Patterson book, 4th edition if I remember
    correctly (the first one, not the second, more advanced one), so I assume
    those are the terms used in it.

    We were just doing toy CPUs to learn on; I doubt anybody needs to do
    multi-cycle designs anymore. Those are from a time when gates were
    precious.


    If one laid it out like a pipeline but didn't have flip-flops or
    latches between the stages, would it still be pipelined? I.e. is it
    (single-cycle, multi-cycle) x (non-pipelined, pipelined) or
    (single-cycle, multi-cycle, pipelined)? Then we could have degrees of
    pipelining, e.g. lightly or heavily, by number of stages or by number of
    gates (FO4 etc.) of delay.

    Nick

  • From EricP@21:1/5 to Marcus on Sun Jan 12 12:47:35 2025
    Marcus wrote:
    On 2024-12-26, Robert Finch wrote:
    On 2024-12-08 5:10 p.m., Marcus wrote:
    I usually (and simplistically) divide CPU designs (implementations) into
    two main categories:

    - Pipelined
    - Non-pipelined

    Of course, there is a sliding scale at play, but let's not get into that
    debate.

    My question is: What is the best name for non-pipelined designs?

    I'm thinking about CPUs that transition through several states (one
    clock cycle after another) when executing a single instruction (e.g.
    FETCH + DECODE + EXECUTE), and where instruction and data typically
    share the same memory interface.

    /Marcus
    According to my understanding of “pipelined” most designs are
    pipelined. There are not very many non-pipelined designs.

    True. I'm talking about a niche here.

    Non-pipelined designs perform everything in one long clock cycle.

    The designs I'm thinking about are mostly multi-cycle, i.e. one
    instruction takes several cycles to complete.

    Otherwise, there are two major classes of pipelined designs,
    non-overlapped pipeline and overlapped pipeline. Some designs are
    partially overlapped pipelined.

    For the sake of the argument, what should we call:

    * Intel 8008 [1]
    * Olof Kindgren's SERV [2]
    * MOS 6502 [3]

    ?

    There may be some pipelining in parts of these designs, but the key
    point I'm trying to get at is that the CPU typically goes through a
    sequence of states when executing an instruction, and it is typically
    "busy" for more than one clock cycle while executing one instruction.

    /Marcus

    [1] https://en.wikipedia.org/wiki/Intel_8008
    [2] https://serv.readthedocs.io/en/latest/internals.html#instruction-life-cycle
    [3] https://en.wikipedia.org/wiki/MOS_Technology_6502


    I don't think there was a name for non-pipelined or non-superscalar
    as it is implied if not explicitly labeled as otherwise.
    When cpu designers go to the trouble of adding concurrency features
    like these to have multiple instructions in-flight at once
    then they usually tout them to potential customers.

    Sequential would be applicable for no concurrency though I don't
    think you'd find many people saying they bought a "sequential cpu".

    Note that the 6502 in certain cases could overlap the fetch of the next
    instruction with the execution of the current one. So even though it
    was a first generation microprocessor, with a combination microprogrammed
    and hardware micro-sequencer, it was slightly pipelined.
    And they touted it as such.

  • From Marcus@21:1/5 to Michael S on Wed Jan 15 08:18:31 2025
    On 2025-01-12, Michael S wrote:
    On Sun, 12 Jan 2025 13:44:44 +0000
    ze@zerandconsulting.com (Ze) wrote:


    We were just doing toy CPUs to learn on; I doubt anybody needs to do
    multi-cycle designs anymore. Those are from a time when gates were
    precious.


    To remove your doubts: https://www.intel.com/content/www/us/en/products/details/fpga/intellectual-property/processors-peripherals/niosv.html




    Also: https://github.com/olofk/serv
