Forum: Too Lazy BBS

Re: instruction ordering, was Memory ordering (Re: Multi-precision addition ...)

From John Levine@johnl@taugh.com to comp.arch on Fri Dec 12 01:41:41 2025

From Newsgroup: comp.arch

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
--- Synchronet 3.21a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Dec 11 18:27:48 2025

From Newsgroup: comp.arch

On 12/11/2025 5:41 PM, John Levine wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

That would suck! Back when I used to code in SPARC assembly language, I
had full control over my delay slots. Actually, IIRC, putting a MEMBAR instruction in a delay slot is VERY bad.

From John Levine@johnl@taugh.com to comp.arch on Fri Dec 12 02:48:19 2025

From Newsgroup: comp.arch

According to Chris M. Thomasson <chris.m.thomasson.1@gmail.com>:

On 12/11/2025 5:41 PM, John Levine wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

That would suck! Back when I used to code in SPARC assembly language, I
had full control over my delay slots. Actually, IIRC, putting a MEMBAR
instruction in a delay slot is VERY bad.

I think they were smart enough only to move instructions that wouldn't cause problems.
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Dec 12 08:14:47 2025

From Newsgroup: comp.arch

John Levine <johnl@taugh.com> schrieb:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

Thinking of it a bit more, the optimizing assemblers for drum memory
computers like the IBM 650 or the LGP-30 of Mel the Programmer
fame moved around instructions so the next one would be under the
head when the previous one was done executing.

Random-access memory made this redundant :-)
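
To make the drum-placement idea concrete, here is a minimal sketch in C. The geometry (50 words per track) and the instruction cost in "word-times" are assumptions chosen for illustration, and the function names are hypothetical; this is not a model of any particular assembler.

```c
#include <assert.h>

/* Assumed geometry: number of words that pass under the head per revolution. */
enum { WORDS_PER_TRACK = 50 };

/* Track position that will be under the head once an instruction fetched
   from position `pos` has finished executing. `exec_word_times` is a
   hypothetical cost, measured in words passing under the head. */
static unsigned best_next_position(unsigned pos, unsigned exec_word_times)
{
    assert(pos < WORDS_PER_TRACK);
    /* +1 because the head has already moved past the word it just read. */
    return (pos + 1 + exec_word_times) % WORDS_PER_TRACK;
}

/* Word-times spent waiting for the drum if the next instruction is stored
   at `next` rather than at the optimal position. */
static unsigned wait_word_times(unsigned pos, unsigned exec_word_times,
                                unsigned next)
{
    unsigned ready = best_next_position(pos, exec_word_times);
    return (next + WORDS_PER_TRACK - ready) % WORDS_PER_TRACK;
}
```

Under these assumptions, an instruction at position 10 that takes 4 word-times wants its successor at position 15. Storing the successor sequentially at position 11 would cost 46 word-times of waiting, which is almost a full revolution, and that is the waste an optimizing assembler of this kind was trying to avoid.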
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Fri Dec 12 13:05:43 2025

From Newsgroup: comp.arch

In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

I've seen things like this, as well, particularly on machines
with multiple delay slots, where this detail was hidden from the
programmer. Or at least I have a vague memory of this; perhaps
I'm hallucinating.

More dangerous are linkers that do LTO and decide to elide code
that, no, really, I actually need for reasons that are not
apparent to the toolchain.

- Dan C.


From David Brown@david.brown@hesbynett.no to comp.arch on Fri Dec 12 15:28:30 2025

From Newsgroup: comp.arch

On 12/12/2025 14:05, Dan Cross wrote:

In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

I've seen things like this, as well, particularly on machines
with multiple delay slots, where this detail was hidden from the
programmer. Or at least I have a vague memory of this; perhaps
I'm hallucinating.

I've seen a few assemblers that do fancy things with jumps and branches
- giving you generic conditional branch pseudo-instructions that get
turned into different types of real instructions depending on the
distance needed for the jumps and the ranges supported by the
instructions. And there are plenty that have pseudo-instructions for
loading immediates into registers that generate whatever sequence of
load immediate, shift-and-or, etc., are needed.
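
As a rough illustration of both kinds of pseudo-instruction, here is a small C sketch. The encodings are hypothetical (a short branch with a signed 8-bit offset, and a 32-bit constant built from two 16-bit halves); real instruction sets differ in ranges and in how the halves are combined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical branch encodings: a short form with a signed 8-bit offset
   and a long form for everything else. */
typedef enum { BR_SHORT, BR_LONG } branch_form;

/* Choose the branch form from the distance between branch and target. */
static branch_form pick_branch_form(int64_t from, int64_t to)
{
    int64_t distance = to - from;
    return (distance >= -128 && distance <= 127) ? BR_SHORT : BR_LONG;
}

/* A 32-bit constant split into 16-bit halves, as a "load upper" plus
   "or immediate" style sequence would need (assumed instruction shapes). */
typedef struct {
    uint16_t hi;
    uint16_t lo;
    int      needs_two; /* 1 if both halves are non-zero */
} imm_split;

static imm_split split_immediate(uint32_t value)
{
    imm_split s;
    s.hi = (uint16_t)(value >> 16);
    s.lo = (uint16_t)(value & 0xFFFFu);
    s.needs_two = (s.hi != 0 && s.lo != 0);
    return s;
}

/* Recombine the halves; used to check that the split loses nothing. */
static uint32_t rebuild_immediate(imm_split s)
{
    return ((uint32_t)s.hi << 16) | s.lo;
}
```

A real assembler has to iterate on the branch choice, because lengthening one branch moves every label after it and can push other branches out of short range.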

More dangerous are linkers that do LTO and decide to elide code
that, no, really, I actually need for reasons that are not
apparent to the toolchain.

IME you have control over the details - either using directives in the assembly, or in the linker control files. Of course that might mean
modifying code that you hoped to use untouched, and it's not hard to
forget to add a "keep" or "retain" directive.
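
For readers who have not met these directives, a minimal C sketch of the source-level half follows. It assumes GCC or Clang, where `__attribute__((used))` asks the compiler to emit a symbol even if nothing visibly references it; the `KEEP_SYMBOL` macro and `boot_hook` function are hypothetical names.

```c
#include <assert.h>

/* GCC/Clang extension: `used` tells the compiler to emit the symbol even
   though nothing in this translation unit appears to reference it.
   On other compilers the macro expands to nothing (assumption: they need
   their own mechanism). */
#if defined(__GNUC__)
#define KEEP_SYMBOL __attribute__((used))
#else
#define KEEP_SYMBOL
#endif

/* Hypothetical hook that is only reached in ways the toolchain cannot see,
   for example through a table built by the linker script. */
KEEP_SYMBOL static int boot_hook(void)
{
    return 42;
}
```

This only covers the compiler. If the linker is also discarding unreferenced sections, the matching step with GNU ld is a `KEEP(...)` around the input section in the linker script.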

I've found link-time dead code elimination quite useful when I have one
code base but different binary builds - sometimes all you need is a
different linker file.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.arch on Fri Dec 12 16:25:42 2025

From Newsgroup: comp.arch

In article <10hh8qe$2v9lm$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 12/12/2025 14:05, Dan Cross wrote:

In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

I've seen things like this, as well, particularly on machines
with multiple delay slots, where this detail was hidden from the
programmer. Or at least I have a vague memory of this; perhaps
I'm hallucinating.

I've seen a few assemblers that do fancy things with jumps and branches
- giving you generic conditional branch pseudo-instructions that get
turned into different types of real instructions depending on the
distance needed for the jumps and the ranges supported by the
instructions. And there are plenty that have pseudo-instructions for
loading immediates into registers that generate whatever sequence of
load immediate, shift-and-or, etc., are needed.

More dangerous are linkers that do LTO and decide to elide code
that, no, really, I actually need for reasons that are not
apparent to the toolchain.

IME you have control over the details - either using directives in the
assembly, or in the linker control files. Of course that might mean
modifying code that you hoped to use untouched, and it's not hard to
forget to add a "keep" or "retain" directive.

Provided, of course, that you have access to both the assembly
and the linker configuration for a given program. Sometimes you
don't (e.g., if the code in question is in some higher-level
language) or the linker configuration is just some default.

For example, the Plan 9 C compiler delegated actual instruction
selection to the linker; the compiler emitted a high(er)-level
representation of the operation. This made the linker free to
perform peephole optimization, potentially eliding important
instructions (like writes to MMIO regions). Fortunately, the
Plan 9 authors understood this so effectively all globals were
volatile, but when porting that code to standard C, one had to
exercise some care.
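
A minimal sketch of the "some care" needed in standard C: device registers have to be accessed through `volatile`-qualified lvalues, otherwise an optimizer may merge or drop stores that look redundant. The register here is a stand-in variable and the function names are hypothetical; on real hardware the address would come from the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped control register (assumption: on real
   hardware this would be a fixed address, not an ordinary variable). */
static volatile uint32_t fake_ctrl_reg;

/* Pulse a reset bit. Because `ctrl` points to volatile storage, both
   stores must be performed in order; without volatile, a compiler or
   link-time optimizer could legally keep only the final store. */
static void device_pulse_reset(volatile uint32_t *ctrl)
{
    *ctrl = 1u; /* assert reset */
    *ctrl = 0u; /* release reset */
}

/* Read the register; volatile forces an actual load on every call. */
static uint32_t device_read(volatile uint32_t *ctrl)
{
    return *ctrl;
}
```

With a plain `uint32_t *` the first store in `device_pulse_reset` is a dead store as far as the language is concerned, which is exactly the kind of instruction an optimizing linker or compiler would feel free to remove.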

I've found link-time dead code elimination quite useful when I have one
code base but different binary builds - sometimes all you need is a
different linker file.

Agreed, it _is_ useful. But sometimes it's inappropriate.

- Dan C.


From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Dec 12 19:17:16 2025

From Newsgroup: comp.arch

John Levine <johnl@taugh.com> posted:

According to Chris M. Thomasson <chris.m.thomasson.1@gmail.com>:

On 12/11/2025 5:41 PM, John Levine wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

That would suck! Back when I used to code in SPARC assembly language, I
had full control over my delay slots. Actually, IIRC, putting a MEMBAR
instruction in a delay slot is VERY bad.

I think they were smart enough only to move instructions that wouldn't cause problems.

Many early RISC assemblers were in charge of moving instructions around
subject to not altering register dependencies and not altering control
flow dependencies. This allowed those assemblers to move code across
memory instructions, across long latency calculation instructions,
branch instructions, including delay slots; and redefine what "program
order" now is. A bad side effect of exposing the pipeline to SW.

We mostly have gotten away from this due to "smart" instruction queueing.
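
A rough C sketch of the kind of legality test such a reordering pass needs. The instruction summary (`insn`), the register bitmasks, and `can_swap` are hypothetical. The rule that two memory operations are never swapped is an added conservative assumption, not something the description above says those assemblers applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instruction summary: which registers are read and
   written (one bit per register), plus two flags. */
typedef struct {
    uint32_t reads;
    uint32_t writes;
    bool     is_branch;
    bool     touches_memory;
} insn;

#define REG(n) (UINT32_C(1) << (n))

/* May two adjacent instructions `a` (first) and `b` (second) be swapped
   without changing register or control-flow dependencies? */
static bool can_swap(insn a, insn b)
{
    if (a.is_branch || b.is_branch)
        return false;                 /* control dependency */
    if (a.touches_memory && b.touches_memory)
        return false;                 /* added conservative memory rule */
    if (a.writes & b.reads)
        return false;                 /* read-after-write */
    if (a.reads & b.writes)
        return false;                 /* write-after-read */
    if (a.writes & b.writes)
        return false;                 /* write-after-write */
    return true;
}
```

Dropping the memory rule gives a pass that respects only register and control dependencies, and that is where the visible "program order" of loads and stores can change behind the programmer's back.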

From David Brown@david.brown@hesbynett.no to comp.arch on Fri Dec 12 21:12:05 2025

From Newsgroup: comp.arch

On 12/12/2025 17:25, Dan Cross wrote:

In article <10hh8qe$2v9lm$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 12/12/2025 14:05, Dan Cross wrote:

In article <10hfrsl$145v$1@gal.iecc.com>, John Levine <johnl@taugh.com> wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

I've seen things like this, as well, particularly on machines
with multiple delay slots, where this detail was hidden from the
programmer. Or at least I have a vague memory of this; perhaps
I'm hallucinating.

I've seen a few assemblers that do fancy things with jumps and branches
- giving you generic conditional branch pseudo-instructions that get
turned into different types of real instructions depending on the
distance needed for the jumps and the ranges supported by the
instructions. And there are plenty that have pseudo-instructions for
loading immediates into registers that generate whatever sequence of
load immediate, shift-and-or, etc., are needed.

More dangerous are linkers that do LTO and decide to elide code
that, no, really, I actually need for reasons that are not
apparent to the toolchain.

IME you have control over the details - either using directives in the
assembly, or in the linker control files. Of course that might mean
modifying code that you hoped to use untouched, and it's not hard to
forget to add a "keep" or "retain" directive.

Provided, of course, that you have access to both the assembly
and the linker configuration for a given program. Sometimes you
don't (e.g., if the code in question is in some higher-level
language) or the linker configuration is just some default.

I've managed so far in my own work, but I suppose I work at a lower
level than most. I don't think it is common for C or C++ programmers to
know much about linker control files.

For example, the Plan 9 C compiler delegated actual instruction
selection to the linker; the compiler emitted a high(er)-level
representation of the operation. This made the linker free to
perform peephole optimization, potentially eliding important
instructions (like writes to MMIO regions). Fortunately, the
Plan 9 authors understood this so effectively all globals were
volatile, but when porting that code to standard C, one had to
exercise some care.

I've found link-time dead code elimination quite useful when I have one
code base but different binary builds - sometimes all you need is a
different linker file.

Agreed, it _is_ useful. But sometimes it's inappropriate.

Indeed.


From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Dec 12 21:02:14 2025

From Newsgroup: comp.arch

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Many early RISC assemblers were in charge of moving instructions around subject to not altering register dependencies and not altering control
flow dependencies. This allowed those assemblers to move code across
memory instructions, across long latency calculation instructions,
branch instructions, including delay slots; and redefine what "program order" now is. A bad side effect of exposing the pipeline to SW.

I never heard of that one.

Sounds like bad design - that should be done by the compiler,
not the assembler. It is fine for the compiler to have pipeline
descriptions in the cost model of the CPU under a specific -march
or -mtune flag.

(Yes, it is preferred that performance should be rather good for
code generated for a generic microarchitecture).

We mostly have gotten away from this due to "smart" instruction queueing.

What is that?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Dec 12 22:05:14 2025

From Newsgroup: comp.arch

Thomas Koenig <tkoenig@netcologne.de> posted:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Many early RISC assemblers were in charge of moving instructions around subject to not altering register dependencies and not altering control
flow dependencies. This allowed those assemblers to move code across
memory instructions, across long latency calculation instructions,
branch instructions, including delay slots; and redefine what "program order" now is. A bad side effect of exposing the pipeline to SW.

I never heard of that one.

Sounds like bad design - that should be done by the compiler,
not the assembler. It is fine for the compiler to have pipeline
descriptions in the cost model of the CPU under a specific -march
or -mtune flag.

(Yes, it is preferred that performance should be rather good for
code generated for a generic microarchitecture).

We mostly have gotten away from this due to "smart" instruction queueing.

What is that?

Reservation stations {Value capturing and value free}, Scoreboards,
Dispatch stacks, and similar.
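
As a very simplified illustration of the scoreboard idea only, here is a C sketch that tracks one "write pending" bit per register and holds back any instruction that touches a register with an outstanding write. The names are hypothetical, and a real scoreboard also tracks functional units and much more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per register: set while an issued instruction has not yet
   written its result to that register. */
typedef struct {
    uint32_t pending_writes;
} scoreboard;

/* An instruction may issue only if none of the registers it reads or
   writes has a write still outstanding. */
static bool sb_can_issue(const scoreboard *sb, uint32_t reads, uint32_t writes)
{
    return ((reads | writes) & sb->pending_writes) == 0;
}

/* Mark the destination registers of a newly issued instruction as busy. */
static void sb_issue(scoreboard *sb, uint32_t writes)
{
    sb->pending_writes |= writes;
}

/* Clear the busy bits once the instruction has written its results. */
static void sb_complete(scoreboard *sb, uint32_t writes)
{
    sb->pending_writes &= ~writes;
}
```

The point of the sketch is that the hardware discovers at run time which instructions are ready, so the assembler no longer needs to know the pipeline latencies to produce a correct schedule.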

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Fri Dec 12 14:19:29 2025

From Newsgroup: comp.arch

On 12/12/2025 2:05 PM, MitchAlsup wrote:

Thomas Koenig <tkoenig@netcologne.de> posted:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Many early RISC assemblers were in charge of moving instructions around
subject to not altering register dependencies and not altering control
flow dependencies. This allowed those assemblers to move code across
memory instructions, across long latency calculation instructions,
branch instructions, including delay slots; and redefine what "program
order" now is. A bad side effect of exposing the pipeline to SW.

I never heard of that one.

Sounds like bad design - that should be done by the compiler,
not the assembler. It is fine for the compiler to have pipeline
descriptions in the cost model of the CPU under a specific -march
or -mtune flag.

(Yes, it is preferred that performance should be rather good for
code generated for a generic microarchitecture).

We mostly have gotten away from this due to "smart" instruction queueing.

What is that?

Reservation stations {Value capturing and value free}, Scoreboards,
Dispatch stacks, and similar.

IIRC, over on the PPC, wrt LL/SC, it was the reservation granule. I
think it could be larger than an L2 cache line. So, any interference in
that granule could cause LL/SC to fail. This can lead to livelock if the
program's data was not aligned and/or padded correctly.
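
A minimal C11 sketch of the padding idea. The granule size of 128 bytes is an assumption made for illustration (the real value is implementation-specific), and `padded_counter`, `counters`, and `bump` are hypothetical names.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed reservation granule size in bytes; check the processor
   documentation for the real value. */
#define RESERVATION_GRANULE 128

/* Each counter is aligned to the granule, which also rounds the struct
   size up to a full granule, so no two counters share one. */
typedef struct {
    alignas(RESERVATION_GRANULE) atomic_uint value;
} padded_counter;

/* Hypothetical per-thread counters, one granule apart. */
static padded_counter counters[2];

/* Atomic increment; on an LL/SC machine this is typically compiled to a
   load-reserve / store-conditional retry loop. Returns the new value. */
static unsigned bump(padded_counter *c)
{
    return atomic_fetch_add(&c->value, 1u) + 1u;
}
```

With this layout, a store by one thread to `counters[1]` cannot land in the granule holding `counters[0]`, so it cannot knock out another thread's reservation on that counter.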

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Fri Dec 12 14:22:30 2025

From Newsgroup: comp.arch

On 12/11/2025 6:48 PM, John Levine wrote:

According to Chris M. Thomasson <chris.m.thomasson.1@gmail.com>:

On 12/11/2025 5:41 PM, John Levine wrote:

According to Thomas Koenig <tkoenig@netcologne.de>:

MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

Heck, there are assemblers that rearrange code like this too much--
until they can be taught not to.

Any example? This would definitely go against what I would consider
to be reasonable for an assembler. gdb certainly does not do so.

On machines with delayed branches I've seen assemblers that move
instructions into the delay slot. Can't think of any others off hand.

That would suck! Back when I used to code in SPARC assembly language, I
had full control over my delay slots. Actually, IIRC, putting a MEMBAR
instruction in a delay slot is VERY bad.

I think they were smart enough only to move instructions that wouldn't cause problems.

I would check the disassembly to see if anything funny happened. Also,
when my assembled code was used in C, back before C/C++11, I would turn
off link-time optimization and check again. This was way back, around
25 years ago. My lock-free/wait-free code was highly sensitive. If
something thought it could "optimize" it, well, that was NOT good.
