• LUTstructions: Self-loading FPGA-based Reconfigurable Instructions

    From John Levine <johnl@taugh.com> to comp.arch on Wed Feb 25 18:56:31 2026
    From Newsgroup: comp.arch

    This paper was posted to arXiv yesterday. I'm not sure how useful
    it really is, but it's an interesting idea.

    Abstract

    General-purpose processors feature a limited number of instructions
    based on an instruction set. They can be numerous, such as with vector extensions that include hundreds or thousands of instructions, but
    this comes at a cost; they are often unable to express arbitrary tasks efficiently. This paper explores the concept of having reconfigurable instructions by incorporating reconfigurable areas in a softcore. It
    follows a relatively-recently proposed computer architecture concept
    for seamlessly loading instruction implementation-carrying bitstreams
    from main memory. The resulting softcore is entirely evaluated on an
    FPGA, essentially having an FPGA-on-an-FPGA for the instruction implementations, with no notable operating frequency overhead. This is
    achieved with a custom FPGA architecture called LUTstruction, which is
    tailored towards low-latency for custom instructions and wide
    reconfiguration, as well as a soft implementation for the purposes of architectural exploration.

    https://arxiv.org/abs/2602.20802
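    To make the abstract's "seamlessly loading instruction implementation-carrying
    bitstreams from main memory" a bit more concrete, here is a toy Python model.
    It is a sketch under stated assumptions, not the paper's design: the class
    name ReconfigurableUnit, the LRU replacement of regions, the fixed load cost
    and the single-cycle execution are all hypothetical, and each "bitstream" is
    simply a Python function standing in for configured logic.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical: a "bitstream" is modelled as a function of two 32-bit operands.
# It stands in for configured logic; it is not the paper's bitstream format.
Bitstream = Callable[[int, int], int]
MASK32 = 0xFFFFFFFF


@dataclass
class ReconfigurableUnit:
    """Toy model: custom opcodes whose implementations live in main memory
    and are loaded into a small number of reconfigurable regions on demand."""
    memory: Dict[int, Bitstream]   # opcode -> bitstream stored in "main memory"
    regions: int = 2               # hypothetical number of reconfigurable regions
    load_cycles: int = 1000        # hypothetical cost of loading one bitstream
    resident: "OrderedDict[int, Bitstream]" = field(default_factory=OrderedDict)
    cycles: int = 0                # running cycle count

    def execute(self, opcode: int, a: int, b: int) -> int:
        if opcode in self.resident:
            # Hit: implementation already configured; mark it most recently used.
            self.resident.move_to_end(opcode)
        else:
            # Miss: evict the least recently used region if all are occupied,
            # then "load" the bitstream from memory and pay the load cost.
            if len(self.resident) >= self.regions:
                self.resident.popitem(last=False)
            self.resident[opcode] = self.memory[opcode]
            self.cycles += self.load_cycles
        self.cycles += 1  # assume every configured instruction takes one cycle
        return self.resident[opcode](a & MASK32, b & MASK32) & MASK32


if __name__ == "__main__":
    unit = ReconfigurableUnit(memory={
        0: lambda a, b: a + b,
        1: lambda a, b: bin(a ^ b).count("1"),  # Hamming distance
        2: lambda a, b: (a * b) >> 16,          # fixed-point multiply
    })
    print(unit.execute(0, 2, 3), unit.execute(1, 0b1010, 0b0110), unit.cycles)
    # prints: 5 2 2002
```

    In this model every miss costs a full reconfiguration, so the scheme only
    pays off when the working set of custom instructions fits in the available
    regions.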
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From scott@slp53.sl.home (Scott Lurndal) to comp.arch on Wed Feb 25 19:48:27 2026

    John Levine <johnl@taugh.com> writes:
    >This paper was posted to arXiv yesterday. I'm not sure how useful
    >it really is, but it's an interesting idea.
    >
    >Abstract
    >
    >General-purpose processors feature a limited number of instructions
    >based on an instruction set. They can be numerous, such as with vector
    >extensions that include hundreds or thousands of instructions, but
    >this comes at a cost; they are often unable to express arbitrary tasks
    >efficiently. This paper explores the concept of having reconfigurable
    >instructions by incorporating reconfigurable areas in a softcore. It
    >follows a relatively-recently proposed computer architecture concept
    >for seamlessly loading instruction implementation-carrying bitstreams
    >from main memory. The resulting softcore is entirely evaluated on an
    >FPGA, essentially having an FPGA-on-an-FPGA for the instruction
    >implementations, with no notable operating frequency overhead. This is
    >achieved with a custom FPGA architecture called LUTstruction, which is
    >tailored towards low-latency for custom instructions and wide
    >reconfiguration, as well as a soft implementation for the purposes of
    >architectural exploration.
    >
    >https://arxiv.org/abs/2602.20802

    Sounds a bit like the Burroughs small systems (B1700 - B1900),
    where the instruction set was reconfigured per-task based
    on the application language (e.g. COBOL used a different
    instruction set than FORTRAN or RPG) using a writable control store.

    https://en.wikipedia.org/wiki/Burroughs_B1700
  • From John Levine <johnl@taugh.com> to comp.arch on Wed Feb 25 19:56:26 2026

    According to Scott Lurndal <slp53@pacbell.net>:
    >>tailored towards low-latency for custom instructions and wide
    >>reconfiguration, as well as a soft implementation for the purposes of
    >>architectural exploration.
    >>
    >>https://arxiv.org/abs/2602.20802
    >
    >Sounds a bit like the Burroughs small systems (B1700 - B1900),
    >where the instruction set was reconfigured per-task ...

    Kind of, but if you read the paper you'll see it's a RISC-V reloading the
    FPGA that implements instructions that do specific calculations, not a
    whole new microcode load.

    Also, the LUT bit is look-up tables, less flexible but presumably faster
    than full microinstructions.
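    A k-input LUT is just a 2^k-entry truth table: the configuration bits
    loaded from the bitstream decide which Boolean function it computes, and
    evaluation is a single table lookup whatever that function is. The Python
    sketch below models one such LUT; make_lut and the bit ordering (input 0
    as the least significant index bit) are illustrative choices, not anything
    taken from the LUTstruction paper.

```python
from typing import Callable


def make_lut(k: int, truth_table: int) -> Callable[..., int]:
    """Model a k-input LUT configured with a 2**k-bit truth table.

    Bit i of truth_table is the output for the input combination whose
    binary encoding is i, with input 0 as the least significant bit.
    """
    if not 0 <= truth_table < (1 << (1 << k)):
        raise ValueError("truth table must fit in 2**k bits")

    def lut(*inputs: int) -> int:
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs, got {len(inputs)}")
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position   # the inputs select one table entry
        return (truth_table >> index) & 1    # the output is that stored bit

    return lut


# "Configuring" the LUT is just choosing the 16 bits. Here: 4-input parity,
# i.e. output 1 when an odd number of inputs are 1.
PARITY4 = sum(1 << i for i in range(16) if bin(i).count("1") % 2)
xor4 = make_lut(4, PARITY4)

if __name__ == "__main__":
    print(hex(PARITY4), xor4(1, 0, 1, 1))  # prints: 0x6996 1
```

    Reconfiguring the same LUT as a 4-input AND only means loading 0x8000
    instead. What a real fabric adds on top of this is the routing between
    LUTs, which this sketch ignores.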
  • From MitchAlsup <user5857@newsgrouper.org.invalid> to comp.arch on Thu Feb 26 01:37:40 2026


    John Levine <johnl@taugh.com> posted:

    >This paper was posted to arXiv yesterday. I'm not sure how useful
    >it really is, but it's an interesting idea.
    >
    >Abstract
    >
    >General-purpose processors feature a limited number of instructions
    >based on an instruction set.

    Measured in the thousands, sometimes higher.

    >They can be numerous, such as with vector extensions that include
    >hundreds or thousands of instructions, but
    >this comes at a cost; they are often unable to express arbitrary tasks
    >efficiently.

    A general-purpose machine should be good at general-purpose algorithms.

    Remember way back when Herman was nagging us for exponentially
    distributed random number generators, but was not allowed to tell us
    the algorithm, or to make such a random number generator ?!?

    >This paper explores the concept of having reconfigurable
    >instructions by incorporating reconfigurable areas in a softcore. It

    How is this usefully different than writeable microcode ?!?

    >follows a relatively-recently proposed computer architecture concept
    >for seamlessly loading instruction implementation-carrying bitstreams
    >from main memory.

    Sounds like a fine way to grant control of your computer over to an
    untrusted bystander.

    >The resulting softcore is entirely evaluated on an
    >FPGA, essentially having an FPGA-on-an-FPGA for the instruction
    >implementations, with no notable operating frequency overhead. This is
    >achieved with a custom FPGA architecture called LUTstruction, which is
    >tailored towards low-latency for custom instructions and wide
    >reconfiguration, as well as a soft implementation for the purposes of
    >architectural exploration.

    Way back when the first people tried this, they kept running into the
    situation where the new instructions just barely did not fit. Then
    later when they did, they added more and they no longer fit. Repeat
    ad infinitum.

    You would certainly never do this in a system that deals with
    money or transactions without years of testing.


    >https://arxiv.org/abs/2602.20802

  • From anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Thu Feb 26 07:59:34 2026

    John Levine <johnl@taugh.com> writes:
    >Also, the LUT bit is look-up tables, less flexible but presumably faster
    >than full microinstructions.

    Microinstructions use the full-custom hardware of the CPU running at
    5GHz or so. LUTstructions use FPGA LUTs, resulting in clock rates on
    the order of 50-300MHz. There are some things that take many
    (micro)instructions based on the building blocks available to
    (micro)instructions, and that can be done in one or a few cycles in
    FPGAs. But of course such a task must be hard enough for conventional
    instructions that the factor 20-100 speed disadvantage of FPGAs can be
    overcome. There are not that many
    such useful things that satisfy that criterion, and if they are useful
    to a lot of people, they tend to be cast into custom silicon at some
    point (e.g., the AESENC instruction in AMD64 or video encoding and
    decoding hardware in various GPUs). That's why the dream that people
    have of adding an FPGA to a CPU to provide additional special-purpose
    instructions somehow has never come true (at least in the mass market),
    although people were hopeful when AMD bought one FPGA company (Xilinx)
    and Intel bought the other one (Altera).
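    The "factor 20-100" can be turned into a rough break-even count: an
    operation that takes c cycles in the FPGA fabric at f_fpga beats n core
    instructions at a given IPC on a core clocked at f_cpu only if
    c / f_fpga < n / (IPC * f_cpu). The Python sketch below solves for n. The
    function name and the optional handoff_cycles term (core cycles assumed
    lost moving operands to and from the fabric) are hypothetical, and real
    designs add further costs such as reconfiguration time.

```python
def break_even_instructions(cpu_hz: float, fpga_hz: float,
                            fpga_cycles: int = 1, ipc: float = 1.0,
                            handoff_cycles: float = 0.0) -> float:
    """Core instructions an FPGA-implemented operation must replace to break even.

    Time spent in the fabric, expressed in core cycles:
        fpga_cycles * cpu_hz / fpga_hz + handoff_cycles
    Core cycles needed for n ordinary instructions: n / ipc.
    Equating the two and solving for n gives the value returned.
    """
    if cpu_hz <= 0 or fpga_hz <= 0 or ipc <= 0:
        raise ValueError("clock rates and IPC must be positive")
    fabric_core_cycles = fpga_cycles * cpu_hz / fpga_hz + handoff_cycles
    return fabric_core_cycles * ipc


if __name__ == "__main__":
    for fpga_mhz in (50, 300):
        for ipc in (1.0, 4.0):
            n = break_even_instructions(5e9, fpga_mhz * 1e6, ipc=ipc)
            print(f"{fpga_mhz} MHz fabric, IPC {ipc}: break-even at {n:.1f} instructions")
```

    With a 5GHz core, a single FPGA cycle and no handoff cost, the fabric has
    to replace about 17 instructions at 300MHz and 100 at 50MHz; assuming an
    IPC of 4 pushes that to about 67 and 400.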

    The bottom line is that I see it exactly the other way round: FPGAs
    are flexible but slow, microinstructions steering full-custom hardware
    are less flexible but fast, instructions steering full-custom hardware
    are inflexible at the individual instruction level, but can be
    composed into flexible routines, and that tends to be slightly faster
    than microcoded instructions (because CPUs have overhead when entering microcode).

    There has been some work on connecting blocks of custom hardware using
    FPGA-like mechanisms, and some of that has made it into real FPGAs, and
    I expect that the flexibility/speed balance of that lands somewhere
    between using LUTs and using microinstructions.

    I have only looked at your posting, not the paper itself, but I don't
    see anything in the abstract that would change this.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Michael S <already5chosen@yahoo.com> to comp.arch on Thu Feb 26 11:04:06 2026

    On Thu, 26 Feb 2026 01:37:40 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:


    >Remember way back when Herman was nagging us for exponentially
    >distributed random number generators, but was not allowed to tell
    >us the algorithm, or to make such a random number generator ?!?


    No, I don't remember.
    I don't even remember a poster named Herman.
    A Google archive search suggests that there was a poster named Herman
    Rubin, but his signature suggested that he worked in the Statistics
    Department of Purdue University. I take that to mean he was allowed to
    tell us anything he chose to tell.


  • From Michael S <already5chosen@yahoo.com> to comp.arch on Thu Feb 26 12:14:31 2026

    On Wed, 25 Feb 2026 18:56:31 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    >This paper was posted to arXiv yesterday. I'm not sure how useful
    >it really is, but it's an interesting idea.
    >
    >Abstract
    >
    >General-purpose processors feature a limited number of instructions
    >based on an instruction set. They can be numerous, such as with vector
    >extensions that include hundreds or thousands of instructions, but
    >this comes at a cost; they are often unable to express arbitrary tasks
    >efficiently. This paper explores the concept of having reconfigurable
    >instructions by incorporating reconfigurable areas in a softcore. It
    >follows a relatively-recently proposed computer architecture concept
    >for seamlessly loading instruction implementation-carrying bitstreams
    >from main memory. The resulting softcore is entirely evaluated on an
    >FPGA, essentially having an FPGA-on-an-FPGA for the instruction
    >implementations, with no notable operating frequency overhead. This is
    >achieved with a custom FPGA architecture called LUTstruction, which is
    >tailored towards low-latency for custom instructions and wide
    >reconfiguration, as well as a soft implementation for the purposes of
    >architectural exploration.
    >
    >https://arxiv.org/abs/2602.20802


    Altera's Nios2 soft core had the ability to add a few custom instructions
    from day one.

    https://docs.altera.com/r/docs/683242/current/nios-ii-custom-instruction-user-guide/nios-ii-custom-instruction-overview
    PDF https://docs.altera.com/api/khub/maps/ClLnyg5z3HhMFQEw58M_xg/attachments/8jsoEfIYIyv660Zy30Zu0Q-ClLnyg5z3HhMFQEw58M_xg/content?download=true

    I cannot say for sure, but my understanding is that very few Nios2
    customers found this feature useful.


    Nios2 has been officially dead for a few years. Nios5 (a RISC-V variant)
    is its official heir.
    It also provides a custom instruction capability: https://docs.altera.com/r/docs/773194/current/an-977-nios-v-processor-custom-instruction/nios-v-processor-custom-instruction-overview

    It looks like Nios5 custom instructions are less flexible than those of
    Nios2. In particular, the simplest and probably the most useful kind,
    "combinatorial" custom instructions, is gone.




