• Re: instruction ordering, was Memory ordering (Re: Multi-precision addition ...)

    From John Levine@johnl@taugh.com to comp.arch on Fri Dec 12 01:41:41 2025
    From Newsgroup: comp.arch

    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Dec 11 18:27:48 2025
    From Newsgroup: comp.arch

    On 12/11/2025 5:41 PM, John Levine wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.


    That would suck! Back when I used to code in SPARC assembly language, I
    had full control over my delay slots. Actually, IIRC, putting a MEMBAR instruction in a delay slot is VERY bad.
  • From John Levine@johnl@taugh.com to comp.arch on Fri Dec 12 02:48:19 2025
    From Newsgroup: comp.arch

    According to Chris M. Thomasson <chris.m.thomasson.1@gmail.com>:
    On 12/11/2025 5:41 PM, John Levine wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    That would suck! Back when I used to code in SPARC assembly language, I
    had full control over my delay slots. Actually, IIRC, putting a MEMBAR
    instruction in a delay slot is VERY bad.

    I think they were smart enough only to move instructions that wouldn't cause problems.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Dec 12 08:14:47 2025
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    Thinking of it a bit more, the optimizing assemblers for drum memory
    computers like the IBM 650 or the LGP-30 of Mel the Programmer
    fame moved around instructions so the next one would be under the
    head when the previous one was done executing.

    Random-access memory made this redundant :-)
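
    A minimal sketch of that placement idea, in Python. It is not the
    algorithm of any real assembler (SOAP on the 650 or otherwise); the
    function names, the per-instruction timings and the drum size are
    all made up for illustration. It assumes each instruction names the
    address of its successor and that the drum advances one word per
    word-time.

```python
def place_optimally(exec_times, drum_size, start=0):
    """Assign a drum address to each instruction of a straight-line program.

    exec_times[i] is the (hypothetical) number of word-times instruction i
    needs before the next instruction can be fetched.
    """
    if len(exec_times) > drum_size:
        raise ValueError("program does not fit on the drum")
    free = [True] * drum_size
    addresses = []
    addr = start % drum_size
    for t in exec_times:
        # Skip forward to the first free word at or after the ideal spot.
        while not free[addr]:
            addr = (addr + 1) % drum_size
        free[addr] = False
        addresses.append(addr)
        # The word that is under the head once this instruction is done.
        addr = (addr + t) % drum_size
    return addresses


def rotational_wait(addresses, exec_times, drum_size):
    """Total word-times spent waiting for the next instruction to come around."""
    wait = 0
    for i in range(len(addresses) - 1):
        ready_at = (addresses[i] + exec_times[i]) % drum_size
        wait += (addresses[i + 1] - ready_at) % drum_size
    return wait


times = [3, 5, 2, 4]
optimal = place_optimally(times, drum_size=50)
sequential = list(range(len(times)))
print(optimal, rotational_wait(optimal, times, 50))
print(sequential, rotational_wait(sequential, times, 50))
```

    With these made-up timings the optimised layout never waits for the
    drum, while storing the same four instructions at consecutive
    addresses costs almost a full revolution per instruction.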
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Fri Dec 12 13:05:43 2025
    From Newsgroup: comp.arch

    In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    I've seen things like this, as well, particularly on machines
    with multiple delay slots, where this detail was hidden from the
    programmer. Or at least I have a vague memory of this; perhaps
    I'm hallucinating.

    More dangerous are linkers that do LTO and decide to elide code
    that, no, really, I actually need for reasons that are not
    apparent to the toolchain.

    - Dan C.

  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Dec 12 15:28:30 2025
    From Newsgroup: comp.arch

    On 12/12/2025 14:05, Dan Cross wrote:
    In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    I've seen things like this, as well, particularly on machines
    with multiple delay slots, where this detail was hidden from the
    programmer. Or at least I have a vague memory of this; perhaps
    I'm hallucinating.


    I've seen a few assemblers that do fancy things with jumps and branches
    - giving you generic conditional branch pseudo-instructions that get
    turned into different types of real instructions depending on the
    distance needed for the jumps and the ranges supported by the
    instructions. And there are plenty that have pseudo-instructions for
    loading immediates into registers that generate whatever sequence of
    load immediate, shift-and-or, etc., are needed.
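
    As a rough illustration of the load-immediate case, here is a
    Python sketch loosely modelled on the "lui" plus "addi" pairing
    used for 32-bit constants on RISC-V. The function names and the
    textual output format are hypothetical, and a real assembler has
    more cases (64-bit values, compressed encodings) than this covers.

```python
def expand_li(rd, value):
    """Expand 'li rd, value' for a 32-bit value into lui/addi-style steps."""
    value &= 0xFFFFFFFF
    # addi takes a signed 12-bit immediate, so sign-extend the low 12 bits.
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000
    # The rest comes from the upper 20 bits, adjusted for a negative lo.
    hi = ((value - lo) >> 12) & 0xFFFFF
    if hi == 0:
        return [f"addi {rd}, zero, {lo}"]
    seq = [f"lui {rd}, {hi:#x}"]
    if lo != 0:
        seq.append(f"addi {rd}, {rd}, {lo}")
    return seq


def evaluate(seq):
    """Tiny interpreter for the two mnemonics above, to check an expansion."""
    reg = 0
    for insn in seq:
        op, operands = insn.split(maxsplit=1)
        imm = int(operands.split(",")[-1], 0)
        if op == "lui":
            reg = (imm << 12) & 0xFFFFFFFF
        else:  # addi: add the signed immediate to zero or to rd
            base = 0 if "zero" in operands else reg
            reg = (base + imm) & 0xFFFFFFFF
    return reg


for v in (5, 0x1000, 0x12345678, 0x12345FFF):
    print(hex(v), expand_li("a0", v))
```

    The one subtle step is the carry: because the low part is
    sign-extended, a constant such as 0x12345FFF needs the upper part
    rounded up to 0x12346 and a low part of -1.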


    More dangerous are linkers that do LTO and decide to elide code
    that, no, really, I actually need for reasons that are not
    apparent to the toolchain.


    IME you have control over the details - either using directives in the assembly, or in the linker control files. Of course that might mean
    modifying code that you hoped to use untouched, and it's not hard to
    forget to add a "keep" or "retain" directive.

    I've found link-time dead code elimination quite useful when I have one
    code base but different binary builds - sometimes all you need is a
    different linker file.
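
    For readers who have not looked inside a linker: section garbage
    collection is essentially a reachability walk. The Python sketch
    below is an illustration only (the section names and the
    live_sections helper are hypothetical), in the spirit of GNU ld's
    --gc-sections with KEEP() entries in the linker script.

```python
def live_sections(references, entry, keep=()):
    """Return the set of sections a garbage-collecting link would retain.

    references maps a section name to the sections it refers to.
    keep lists sections retained unconditionally, like a KEEP directive.
    """
    live = set()
    worklist = [entry, *keep]
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        # Everything a live section refers to is live as well.
        worklist.extend(references.get(name, ()))
    return live


refs = {
    ".text.main": [".text.uart_init"],
    ".text.uart_init": [],
    ".text.spi_init": [],               # unused in this build
    ".vectors": [".text.isr_timer"],    # only the hardware refers to this
    ".text.isr_timer": [],
}
print(sorted(live_sections(refs, ".text.main")))
print(sorted(live_sections(refs, ".text.main", keep=[".vectors"])))
```

    Without the keep entry the vector table and its handler vanish,
    because nothing the linker can see refers to them; that is the
    forgotten-directive failure described above.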


  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Fri Dec 12 16:25:42 2025
    From Newsgroup: comp.arch

    In article <10hh8qe$2v9lm$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 12/12/2025 14:05, Dan Cross wrote:
    In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    I've seen things like this, as well, particularly on machines
    with multiple delay slots, where this detail was hidden from the
    programmer. Or at least I have a vague memory of this; perhaps
    I'm hallucinating.


    I've seen a few assemblers that do fancy things with jumps and branches
    - giving you generic conditional branch pseudo-instructions that get
    turned into different types of real instructions depending on the
    distance needed for the jumps and the ranges supported by the
    instructions. And there are plenty that have pseudo-instructions for
    loading immediates into registers that generate whatever sequence of
    load immediate, shift-and-or, etc., are needed.


    More dangerous are linkers that do LTO and decide to elide code
    that, no, really, I actually need for reasons that are not
    apparent to the toolchain.


    IME you have control over the details - either using directives in the
    assembly, or in the linker control files. Of course that might mean
    modifying code that you hoped to use untouched, and it's not hard to
    forget to add a "keep" or "retain" directive.

    Provided, of course, that you have access to both the assembly
    and the linker configuration for a given program. Sometimes you
    don't (e.g., if the code in question is in some higher-level
    language) or the linker configuration is just some default.

    For example, the Plan 9 C compiler delegated actual instruction
    selection to the linker; the compiler emitted a high(er)-level
    representation of the operation. This made the linker free to
    perform peephole optimization, potentially eliding important
    instructions (like writes to MMIO regions). Fortunately, the
    Plan 9 authors understood this so effectively all globals were
    volatile, but when porting that code to standard C, one had to
    exercise some care.
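
    To make the hazard concrete, here is a toy dead-store peephole in
    Python. It is not the Plan 9 linker's actual pass; the operation
    tuples, the volatile flag and the device address are all
    hypothetical and exist only for this sketch.

```python
def eliminate_dead_stores(ops):
    """Drop a store that is overwritten before any load of the same address.

    ops is a list of tuples: ("store", addr, value, is_volatile) or
    ("load", addr). Volatile stores are always kept, since they may be
    MMIO writes whose side effects the optimiser cannot see.
    """
    kept = []
    pending = {}  # addr -> index in kept of a store no load has observed yet
    for op in ops:
        if op[0] == "load":
            pending.pop(op[1], None)    # the earlier store is now observed
            kept.append(op)
            continue
        _, addr, _value, is_volatile = op
        if addr in pending:
            kept[pending[addr]] = None  # overwritten unread, so dead
        kept.append(op)
        if is_volatile:
            pending.pop(addr, None)     # never a candidate for removal
        else:
            pending[addr] = len(kept) - 1
    return [op for op in kept if op is not None]


UART_DATA = 0x40000000  # made-up device register address
plain = [("store", UART_DATA, 1, False), ("store", UART_DATA, 2, False)]
marked = [("store", UART_DATA, 1, True), ("store", UART_DATA, 2, True)]
print(eliminate_dead_stores(plain))   # the first byte never reaches the device
print(eliminate_dead_stores(marked))  # both writes survive
```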

    I've found link-time dead code elimination quite useful when I have one
    code base but different binary builds - sometimes all you need is a
    different linker file.

    Agreed, it _is_ useful. But sometimes it's inappropriate.

    - Dan C.

  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Dec 12 19:17:16 2025
    From Newsgroup: comp.arch


    John Levine <johnl@taugh.com> posted:

    According to Chris M. Thomasson <chris.m.thomasson.1@gmail.com>:
    On 12/11/2025 5:41 PM, John Levine wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    That would suck! Back when I used to code in SPARC assembly language, I
    had full control over my delay slots. Actually, IIRC, putting a MEMBAR
    instruction in a delay slot is VERY bad.

    I think they were smart enough only to move instructions that wouldn't cause problems.

    Many early RISC assemblers were in charge of moving instructions around
    subject to not altering register dependencies and not altering control
    flow dependencies. This allowed those assemblers to move code across
    memory instructions, across long latency calculation instructions,
    branch instructions, including delay slots; and redefine what "program
    order" now is. A bad side effect of exposing the pipeline to SW.
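
    A small Python sketch of that kind of dependency check, applied to
    the simplest case of filling a single branch delay slot. It is not
    taken from any particular assembler; the Insn record and both
    helper names are hypothetical, and memory dependencies (and
    barriers such as MEMBAR) are deliberately not modelled.

```python
from collections import namedtuple

# text is the assembly source; reads/writes are tuples of register names.
Insn = namedtuple("Insn", "text reads writes")


def independent(a, b):
    """True if a and b can be swapped without changing register dataflow."""
    raw = set(a.writes) & set(b.reads)   # b reads what a writes
    war = set(a.reads) & set(b.writes)   # b overwrites what a still reads
    waw = set(a.writes) & set(b.writes)  # both write the same register
    return not (raw or war or waw)


def fill_delay_slot(block):
    """Move the instruction before a trailing branch into its delay slot.

    block is a list of Insn whose last entry is the branch. If the
    instruction just before the branch is independent of it, that
    instruction goes after the branch; otherwise the slot gets a nop.
    """
    *body, branch = block
    if body and independent(body[-1], branch):
        return body[:-1] + [branch, body[-1]]
    return body + [branch, Insn("nop", (), ())]


add = Insn("add r3, r1, r2", ("r1", "r2"), ("r3",))
dec = Insn("sub r4, r4, 1", ("r4",), ("r4",))
beq = Insn("beq r3, r0, loop", ("r3", "r0"), ())
print([i.text for i in fill_delay_slot([add, dec, beq])])
print([i.text for i in fill_delay_slot([dec, add, beq])])
```

    In the first call the decrement moves into the slot; in the second
    the add feeds the branch condition, so it stays put and a nop is
    emitted instead.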

    We mostly have gotten away from this due to "smart" instruction queueing.
  • From David Brown@david.brown@hesbynett.no to comp.arch on Fri Dec 12 21:12:05 2025
    From Newsgroup: comp.arch

    On 12/12/2025 17:25, Dan Cross wrote:
    In article <10hh8qe$2v9lm$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 12/12/2025 14:05, Dan Cross wrote:
    In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    I've seen things like this, as well, particularly on machines
    with multiple delay slots, where this detail was hidden from the
    programmer. Or at least I have a vague memory of this; perhaps
    I'm hallucinating.


    I've seen a few assemblers that do fancy things with jumps and branches
    - giving you generic conditional branch pseudo-instructions that get
    turned into different types of real instructions depending on the
    distance needed for the jumps and the ranges supported by the
    instructions. And there are plenty that have pseudo-instructions for
    loading immediates into registers that generate whatever sequence of
    load immediate, shift-and-or, etc., are needed.


    More dangerous are linkers that do LTO and decide to elide code
    that, no, really, I actually need for reasons that are not
    apparent to the toolchain.


    IME you have control over the details - either using directives in the
    assembly, or in the linker control files. Of course that might mean
    modifying code that you hoped to use untouched, and it's not hard to
    forget to add a "keep" or "retain" directive.

    Provided, of course, that you have access to both the assembly
    and the linker configuration for a given program. Sometimes you
    don't (e.g., if the code in question is in some higher-level
    language) or the linker configuration is just some default.

    I've managed so far in my own work, but I suppose I work at a lower
    level than most. I don't think it is common for C or C++ programmers to
    know much about linker control files.


    For example, the Plan 9 C compiler delegated actual instruction
    selection to the linker; the compiler emitted a high(er)-level
    representation of the operation. This made the linker free to
    perform peephole optimization, potentially eliding important
    instructions (like writes to MMIO regions). Fortunately, the
    Plan 9 authors understood this so effectively all globals were
    volatile, but when porting that code to standard C, one had to
    exercise some care.

    I've found link-time dead code elimination quite useful when I have one
    code base but different binary builds - sometimes all you need is a
    different linker file.

    Agreed, it _is_ useful. But sometimes it's inappropriate.


    Indeed.


  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Dec 12 21:02:14 2025
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Many early RISC assemblers were in charge of moving instructions around subject to not altering register dependencies and not altering control
    flow dependencies. This allowed those assemblers to move code across
    memory instructions, across long latency calculation instructions,
    branch instructions, including delay slots; and redefine what "program order" now is. A bad side effect of exposing the pipeline to SW.

    I never heard of that one.

    Sounds like bad design - that should be done by the compiler,
    not the assembler. It is fine for the compiler to have pipeline
    descriptions in the cost model of the CPU under a specific -march
    or -mtune flag.

    (Yes, it is preferred that performance should be rather good for
    code generated for a generic microarchitecture).

    We mostly have gotten away from this due to "smart" instruction queueing.

    What is that?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Dec 12 22:05:14 2025
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Many early RISC assemblers were in charge of moving instructions around subject to not altering register dependencies and not altering control
    flow dependencies. This allowed those assemblers to move code across
    memory instructions, across long latency calculation instructions,
    branch instructions, including delay slots; and redefine what "program order" now is. A bad side effect of exposing the pipeline to SW.

    I never heard of that one.

    Sounds like bad design - that should be done by the compiler,
    not the assembler. It is fine for the compiler to have pipeline
    descriptions in the cost model of the CPU under a specific -march
    or -mtune flag.

    (Yes, it is preferred that performance should be rather good for
    code generated for a generic microarchitecture).

    We mostly have gotten away from this due to "smart" instruction queueing.

    What is that?

    Reservation stations {Value capturing and value free}, Scoreboards,
    Dispatch stacks, and similar.
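
    A toy Python model of the common idea behind those structures: the
    hardware holds instructions in a queue and issues the oldest one
    whose operands are ready, so software no longer has to order code
    around latencies. This is a sketch under stated assumptions, not a
    model of any real machine: it issues one instruction per cycle,
    tracks only true (read-after-write) dependencies as if registers
    were renamed, and ignores functional units and memory ordering.
    The latencies and the issue_order name are made up.

```python
import math


def issue_order(program, latency):
    """Simulate a one-wide issue queue that picks the oldest ready instruction.

    program is a list of (opcode, dest, sources) in program order.
    Returns (cycle, index) pairs in the order the instructions issue.
    """
    # Link each source to the earlier instruction that produces it.
    last_writer = {}
    deps = []
    for idx, (_opcode, dest, sources) in enumerate(program):
        deps.append([last_writer[r] for r in sources if r in last_writer])
        last_writer[dest] = idx

    done_at = {}                      # index -> cycle its result is available
    waiting = list(range(len(program)))
    schedule = []
    cycle = 0
    while waiting:
        for idx in waiting:           # oldest first
            if all(done_at.get(p, math.inf) <= cycle for p in deps[idx]):
                done_at[idx] = cycle + latency[program[idx][0]]
                schedule.append((cycle, idx))
                waiting.remove(idx)
                break                 # at most one issue per cycle
        cycle += 1
    return schedule


program = [
    ("load", "r1", ["r10"]),
    ("add", "r2", ["r1", "r1"]),      # needs the load result
    ("add", "r3", ["r11", "r12"]),    # independent of the load
    ("add", "r4", ["r3", "r2"]),
]
print(issue_order(program, {"load": 3, "add": 1}))
```

    The independent add (index 2) slips ahead of the add that is
    waiting on the load, which is the reordering an early RISC
    assembler would otherwise have had to do in the source.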
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Fri Dec 12 14:19:29 2025
    From Newsgroup: comp.arch

    On 12/12/2025 2:05 PM, MitchAlsup wrote:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Many early RISC assemblers were in charge of moving instructions around
    subject to not altering register dependencies and not altering control
    flow dependencies. This allowed those assemblers to move code across
    memory instructions, across long latency calculation instructions,
    branch instructions, including delay slots; and redefine what "program
    order" now is. A bad side effect of exposing the pipeline to SW.

    I never heard of that one.

    Sounds like bad design - that should be done by the compiler,
    not the assembler. It is fine for the compiler to have pipeline
    descriptions in the cost model of the CPU under a specific -march
    or -mtune flag.

    (Yes, it is preferred that performance should be rather good for
    code generated for a generic microarchitecture).

    We mostly have gotten away from this due to "smart" instruction queueing.

    What is that?

    Reservation stations {Value capturing and value free}, Scoreboards,
    Dispatch stacks, and similar.

    IIRC, over on the PPC, wrt LL/SC, it was the reservation granule. I
    think it could be larger than an L2 cache line. So any interference in
    that granule could cause the LL/SC to fail. This can lead to livelock
    if the program's data is not aligned and/or padded correctly.
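
    A back-of-the-envelope Python sketch of the padding arithmetic.
    The 128-byte granule is an assumed figure for illustration only
    (the real size is implementation-specific), and the helper names
    are hypothetical.

```python
def same_granule(addr_a, addr_b, granule):
    """True if two byte addresses fall in the same reservation granule."""
    return addr_a // granule == addr_b // granule


def padded_size(size, granule):
    """Round size up to a whole number of granules."""
    return -(-size // granule) * granule


def layout(sizes, granule, base=0):
    """Place objects back to back, each starting on its own granule boundary."""
    offsets = []
    offset = padded_size(base, granule)
    for size in sizes:
        offsets.append(offset)
        offset += padded_size(size, granule)
    return offsets


GRANULE = 128  # assumed size, not a documented value for any specific CPU
# Two 8-byte words packed together share a granule, so a store to one
# can kill a reservation held on the other.
print(same_granule(0x1000, 0x1008, GRANULE))
# Padded out, each word gets a granule to itself.
a, b = layout([8, 8], GRANULE, base=0x1000)
print(hex(a), hex(b), same_granule(a, b, GRANULE))
```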
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Fri Dec 12 14:22:30 2025
    From Newsgroup: comp.arch

    On 12/11/2025 6:48 PM, John Levine wrote:
    According to Chris M. Thomasson <chris.m.thomasson.1@gmail.com>:
    On 12/11/2025 5:41 PM, John Levine wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Heck, there are assemblers that rearrange code like this too much--
    until they can be taught not to.

    Any example? This would definitely go against what I would consider
    to be reasonable for an assembler. gdb certainly does not do so.

    On machines with delayed branches I've seen assemblers that move
    instructions into the delay slot. Can't think of any others off hand.

    That would suck! Back when I used to code in SPARC assembly language, I
    had full control over my delay slots. Actually, IIRC, putting a MEMBAR
    instruction in a delay slot is VERY bad.

    I think they were smart enough only to move instructions that wouldn't cause problems.




    I would check the disassembly to see if anything funny happened. Also,
    when my assembled code was used from C, back before C11/C++11, I would
    turn off link time optimization, and check again. This was way back,
    around 25 years ago. My lock/wait-free code was highly sensitive. If
    something thought it could "optimize" it, well, that was NOT good.