On 11/13/25 5:13 PM, MitchAlsup wrote:
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
[snip]
What I wanted to write was "And assembly language is
architecture-specific".
I have worked on a single machine with several different ASM "compilers". Believe me, one asm can be different than another asm.
But it is absolutely true that asm is architecture specific.
Is that really *absolutely* true? Architecture usually includes binary encoding (and memory order model and perhaps other non-assembly details).
I do not know if being able to have an interrupt in the middle of an assembly instruction is a violation of the assembly contract. (In
theory, a few special cases might be handled such that the assembly instruction that breaks into more than one machine instruction is
handled similarly to breaking instructions into µops.) There might not
be any practical case where all the sub-instructions of an assembly instruction are also assembly instructions (especially not if
retaining instruction size compatibility, which would be difficult
with such assembly instruction fission anyway).
Self-modifying assembly obviously breaks with different encodings (as
would using instruction encodings as data).
If the assembly instructions were different sizes, control flow
instructions could be broken if addresses or explicit displacements
were used rather than abstract labels (which might not be allowed or
merely considered bad practice). Jump tables would also be affected
(such could also be fixed automatically if the jump table location and format are known).
Obviously, one could also do the equivalent of complete binary recompilation, which would usually not be considered the role of an assembler.
I _feel_ that if only the opcode encoding is changed (a very tiny
difference that would only affect using code as data) that one could
rightly state that the new architecture uses the same assembly. I
doubt there could be any economic justification for only changing the
opcode encoding, but theoretically such could have multiple
architectures with the same assembly.
If one allows changing the placement of constants, register
specifiers, and opcodes (without changing the machine code size of any assembly instruction) to still be the same assembly language (which I consider reasonable), the benefit of a new encoding might be
measurable (albeit tiny and not worthwhile).
If one allows assembly instructions to change in size as well as
encoding (but retain even interrupt semantics), the assembler could
still be very simple (which might justify still calling it an assembler).
If the assembly language includes macros (single assembly instruction
that is assembled into multiple machine instructions), interrupt
granularity should not be considered part of compatibility, in my
opinion. Yes, behavior would change because some uninterruptable
assembly instructions would become interruptable, but the mapping was already not simple.
If one allows pipeline reorganization in the assembler (as I think was considered a possibility for handling explicit pipelines that
changed), then size changes would be allowed in which case substantial encoding changes should be allowed.
I do not think assembly language considered the possible effects of
memory order model. (Have all x86 implementations been compatible? I
think the specification changed, but I do not know if compatibility
was broken.)
Upward compatibility is also a factor. Since one could say that adding assembly instructions to an assembly language does not change the
language (like adding machine instructions does not change the
architecture in terms of name (upwardly compatible family?)), one
could argue that increasing the number of registers could maintain the
same "assembly language" as well as increasing the size of registers.
In addition to the definition for "assembly language" one also needs
to define "architecture". In a very strict sense, x86-64 is not a
single architecture -- every different set of machine instructions
would constitute a different architecture. Intel has sold incompatible architectures within the same design by fusing off functionality and
has even had different application cores in the same chip have
different instruction support (though that seems to have bitten Intel).
AMD and Intel also differ slightly in architecture for one or two application-level instructions (as well as virtualization
differences), but are considered the same architecture.
Architecture seems to be used in the fuzzy sense rather than the
strict sense of 100% timing-independent compatibility,
so it seems reasonable to have a fuzzier sense of assembly language to include at
least encoding changes. It seems reasonable to me for "assembly
language" to mean the preferred language for simple mapping to machine instructions (which can include idioms rCo different spellings of the
same machine instruction rCo and macros).
In My 66000, the compiler produces an abstract address. After linking,
when the address/offset/displacement is manifest, the linker determines
the size of the instruction.
Perhaps the RISC-V binutils team are simply incompetent, but
I think it is far more likely that linker relaxation is simply
a very difficult task to get right, and the problem lies mainly
with the specification, not with those tasked with implementing it.
Paul Clayton <paaronclayton@gmail.com> writes:
An even more common example (numbering in the 100M to 1B range?) is x86 processors with interruptible REP MOVS/STOS/LODS instructions.
On 11/13/25 5:13 PM, MitchAlsup wrote:
[snip]
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
What I wanted to write was "And assembly language is
architecture-specific".
I have worked on a single machine with several different ASM "compilers". Believe me, one asm can be different than another asm.
But it is absolutely true that asm is architecture specific.
Is that really *absolutely* true? Architecture usually includes binary encoding (and memory order model and perhaps other non-assembly details).
I do not know if being able to have an interrupt in the middle of an
assembly instruction is a violation of the assembly contract. (In
theory, a few special cases might be handled such that the assembly
instruction that breaks into more than one machine instruction is
handled similarly to breaking instructions into µops.) There might not
be any practical case where all the sub-instructions of an assembly
instruction are also assembly instructions (especially not if
retaining instruction size compatibility, which would be difficult
with such assembly instruction fission anyway).
The classic case is the VAX MOVC3/MOVC5 instructions. An interrupt
could occur during the move and simply restart the instruction
(the register operands having been updated as each byte was moved).
Paul Clayton <paaronclayton@gmail.com> posted:
reasonable to have a fuzzier sense of assembly language to include at
least encoding changes. It seems reasonable to me for "assembly
language" to mean the preferred language for simple mapping to machine instructions (which can include idioms -- different spellings of the
same machine instruction -- and macros).
The modern sense of ASM is that it is an ASCII version of binary.
The old sense where ASM was a language that could do anything and
everything (via Macros) has slipped into the past.
In article <10lbcg1$3uh8h$1@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:
I _feel_ that if only the opcode encoding is changed (a very tiny
difference that would only affect using code as data) that one
could rightly state that the new architecture uses the same
assembly.
That would, however, raise questions and doubts among everyone who was
aware of the different instruction encodings. You would do far better to
say that the new architecture is compatible at the assembler source level, but not at the binary level.
I doubt there could be any economic justification for
only changing the opcode encoding, but theoretically such could
have multiple architectures with the same assembly.
There was a threatened case of this in the early years of this century.
Intel admitted to themselves that AMD64 was trouncing Itanium in the marketplace, and they needed to do 64-bit x86 or see their company shrink dramatically. However, they did not want to do an AMD-compatible x86-64.
They wanted to use a different instruction encoding and have deliberate binary incompatibility.
This was crazy from the network externalities point of view. It was an anti-competitive move, requiring software vendors to do separate builds
for Intel and AMD, hoping that they would not bother with AMD builds.
Microsoft killed this idea, by refusing to support any such
Intel-specific 64-bit x86. They could not prevent Intel doing it, but
there would not be Windows for it. Intel had to climb down.
I do not think assembly language considered the possible effects of
memory order model. (Have all x86 implementations been compatible?
I think the specification changed, but I do not know if
compatibility was broken.)
In general, the assembly programmer is responsible for considering the
memory model, not the language implementation.
In addition to the definition for "assembly language" one also
needs to define "architecture".
Actually, the world seems to get on OK without such clear definitions.
The obscurity of assembly language tends to limit its use to those who
really need to use it, and who are prepared to use a powerful but
unforgiving tool.
Intel has sold incompatible architectures within the same design
by fusing off functionality and has even had different application
cores in the same chip have different instruction support (though
that seems to have bitten Intel).
Well, different ISA support in different cores in the same processor
package is just dumb[1]. It reflects a delusion that Intel has suffered
since at least the late 1990s: that software is specific to particular generations of their chips, and there's a new release with significant changes for each new generation. Plenty of Intel people know that is true
for motherboard firmware, but not for operating systems or application software. But the company carries on behaving that way.
[1] See the Cell processor for an extreme example.
On 1/28/26 10:34 AM, John Dallman wrote:
In article <10lbcg1$3uh8h$1@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:
Would the Intel-64 have been assembly compatible with AMD64? I
would have guessed that not just encodings would have been
different. If one wants to maintain market friction, supporting
the same assembly seems counterproductive.
This was crazy from the network externalities point of view. It was an anti-competitive move, requiring software vendors to do separate builds
for Intel and AMD, hoping that they would not bother with AMD builds.
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Microsoft killed this idea, by refusing to support any such
Intel-specific 64-bit x86. They could not prevent Intel doing it, but
there would not be Windows for it. Intel had to climb down.
Which was actually a sane action not just from the hassle to
Microsoft of supporting yet another ISA but the confusion of
users (Intel64 and AMD64 both run x86-32 binaries but neither
Intel64 nor AMD64 run the other's binaries!) which would impact
Microsoft (and PC OEMs) more than Intel.
I do not think assembly language considered the possible effects of
memory order model. (Have all x86 implementations been compatible?
I think the specification changed, but I do not know if
compatibility was broken.)
In general, the assembly programmer is responsible for considering the memory model, not the language implementation.
Yes, but for a single-threaded application this is not a factor --
so such would be more compatible. It is not clear if assembly
programmers would use less efficient abstractions (like locks) to
handle concurrency in which case a different memory model might
not impact correctness. On the one hand, assembly is generally
chosen because C provides insufficient performance (or
expressiveness), which would imply that assembly programmers
would not want to leave any performance on the table and would
exploit the memory model. On the other hand, the assembly
programmer mindset may often be more serial and the performance
cost of using higher abstractions for concurrency may be lower
than the debugging costs of being clever relative to using
cleverness for other optimizations.
In addition to the definition for "assembly language" one also
needs to define "architecture".
Actually, the world seems to get on OK without such clear definitions.
The obscurity of assembly language tends to limit its use to those who really need to use it, and who are prepared to use a powerful but unforgiving tool.
Yes, the niche effect helps to avoid diversity of meaning across
users and across time. I suspect jargon also changes less rapidly
than common language both because there is less interaction and
there is more pressure to be formal in expression.
Intel has sold incompatible architectures within the same design
by fusing off functionality and has even had different application
cores in the same chip have different instruction support (though
that seems to have bitten Intel).
Well, different ISA support in different cores in the same processor package is just dumb[1]. It reflects a delusion that Intel has suffered since at least the late 1990s: that software is specific to particular generations of their chips, and there's a new release with significant changes for each new generation. Plenty of Intel people know that is true for motherboard firmware, but not for operating systems or application software. But the company carries on behaving that way.
Paul Clayton <paaronclayton@gmail.com> posted:[...]
On 1/28/26 10:34 AM, John Dallman wrote:
In article <10lbcg1$3uh8h$1@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:
Would the Intel-64 have been assembly compatible with AMD64? I
Andy Glew indicated similar but not exact enough.
Andy also stated that Microsoft forced Intel's hand towards x86-64.
Currently, assembly-level compatibility does not seem worthwhile.
Software is usually distributed as machine-code binaries, not as
assembly.
Would the Intel-64 have been assembly compatible with AMD64? I
would have guessed that not just encodings would have been
different. If one wants to maintain market friction, supporting
the same assembly seems counterproductive.
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think).
It is not clear if assembly programmers would use less efficient
abstractions (like locks) to handle concurrency in which case
a different memory model might not impact correctness.
I do not think ISA heterogeneity is necessarily problematic.
I suspect it might require more system-level organization (similar
to Apple).
Even without ISA heterogeneity, optimal scheduling
seems to be a hard problem. Energy/power and delay/performance
preferences are not typically expressed. The abstraction of each
program owning the machine seems to discourage nice behavior (pun
intended).
(I thought Intel marketed their initial 512-bit SIMD processors
as GPGPUs with x86 compatibility, so the idea of having a
general purpose ISA morphed into a GPU-like ISA had some
fascination after Cell.)
Paul Clayton <paaronclayton@gmail.com> posted:
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
and making optimal battle plans for the Little Big Horn battle to come.
One can still buy a milling machine built in 1937 and run it in his shop.
Can one even do this for software from the previous decade ??
MS wants you to buy Office every time you buy a new PC.
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger pull-
down bars.
Is it any wonder users want the 1937 milling machine model ???
On 2/5/26 2:02 PM, MitchAlsup wrote:
Paul Clayton <paaronclayton@gmail.com> posted:
[snip]
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
and making optimal battle plans for the Little Big Horn battle to come.
Rather than making battle plans for how to annihilate each
other, perhaps finding a better solution than ratting each
other out in the prisoner's dilemma.
[snip]
One can still buy a milling machine built in 1937 and run it in his shop. Can one even do this for software from the previous decade ??
Yes, but dependency on (proprietary) servers for some games has
made them (unnecessarily) unplayable.
From what I understand, one can still run WordPerfect under a
DOS emulator on modern x86-64.
With the poor security of much software, even OSes, one might
want to contain any legacy software in a more secured
environment.
Preventing automatic update is perhaps more of a hassle. Some
people have placed software in a virtual machine that has no
networking to avoid software breaking.
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur automatically.
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger pull-
down bars.
Ah, but they are just beginning to include advertising. Imagine
every time one uses the mouse (to indicate to the computer that
the user's eyes are focused on a particular place) an
advertisement appears and follows the cursor movement. Even just
having menu entries that are advertisements would be kind of
annoying, but one would be able to get rid of those by leasing
the premium edition (until one needs to lease the platinum
edition, then the "who wants to remain a millionaire" edition).
Is it any wonder users want the 1937 milling machine model ???
Have no fear; soon you may be merely leasing your computer.
Computers need to have the latest spyware so that advertisements
can be appropriately targeted and adblocking must be made
impossible.
Paul Clayton <paaronclayton@gmail.com> posted:
On 2/5/26 2:02 PM, MitchAlsup wrote:
[snip]
Paul Clayton <paaronclayton@gmail.com> posted:
Cooperating with AMD to develop a more sane encoding while
supporting low overhead for old binaries would have been better
for customers (I think). However, doing what is best generally
for customers is not necessarily the most profitable action.
Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
and making optimal battle plans for the Little Big Horn battle to come.
Rather than making battle plans for how to annihilate each
other, perhaps finding a better solution than ratting each
other out in the prisoner's dilemma.
[snip]
One can still buy a milling machine built in 1937 and run it in his shop. Can one even do this for software from the previous decade ??
Yes, but dependency on (proprietary) servers for some games has
made them (unnecessarily) unplayable.
From what I understand, one can still run WordPerfect under a
DOS emulator on modern x86-64.
With the poor security of much software, even OSes, one might
want to contain any legacy software in a more secured
environment.
Preventing automatic update is perhaps more of a hassle. Some
people have placed software in a virtual machine that has no
networking to avoid software breaking.
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur
automatically.
When I need a tool--I buy that tool--I never rent that tool.
Name one feature I would want from office365 that was not already
present in office from <say> 1998.
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger pull-
down bars.
Ah, but they are just beginning to include advertising. Imagine
every time one uses the mouse (to indicate to the computer that
the user's eyes are focused on a particular place) an
advertisement appears and follows the cursor movement. Even just
having menu entries that are advertisements would be kind of
annoying, but one would be able to get rid of those by leasing
the premium edition (until one needs to lease the platinum
edition, then the "who wants to remain a millionaire" edition).
Why would I or anyone want advertising in office ????????
Is it any wonder users want the 1937 milling machine model ???
Have no fear; soon you may be merely leasing your computer.
Computers need to have the latest spyware so that advertisements
can be appropriately targeted and adblocking must be made
impossible.
I am the kind of guy that turns off "telemetry" and places advertisers
in /hosts file.
On 09/02/2026 20:33, MitchAlsup wrote:
Paul Clayton <paaronclayton@gmail.com> posted:
From what I understand, one can still run WordPerfect under a
DOS emulator on modern x86-64.
With the poor security of much software, even OSes, one might
want to contain any legacy software in a more secured
environment.
Most old software did not have poor security. It was secure by not
having features that could be abused - and thus no need to worry about
extra layers to protect said features. MS practically invented the
concept of insecure applications like word processors - they put in
unnecessary levels of automation and macros, integrated them with email
(especially their already hopelessly insecure programs), and so on. No
real user has any need for "send this document by email" in their word
processor - but spam robots loved it. (MS even managed to figure out a
way to let font files have executable malware in them.) If you go back
to older tools that did the job they were supposed to do, without trying
to do everything else, security is a non-issue for most software.
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur
automatically.
When I need a tool--I buy that tool--I never rent that tool.
On 2/9/26 2:33 PM, MitchAlsup wrote:
Paul Clayton <paaronclayton@gmail.com> posted:
On 2/5/26 2:02 PM, MitchAlsup wrote:
[snip]
MS wants you to buy Office every time you buy a new PC.
I thought MS wanted everyone to use Office365. It is harder to
force people to get a new computer, but a monthly fee will recur
automatically.
When I need a tool--I buy that tool--I never rent that tool.
Name one feature I would want from office365 that was not already
present in office from <say> 1998.
I do not know if MS can legally cancel your MS Office license, and I
doubt the few "software pirates" who continue to use an unsupported ("invalid") version would be worth MS' time and effort to prevent such people from using such software.
However, there seems to be a strong trend toward "you shall own nothing."
MS, then moves all the menu items to different pull downs and
makes it difficult to adjust to the new SW--and then it has the
gall to chew up valuable screen space with ever larger pull-
down bars.
Ah, but they are just beginning to include advertising. Imagine
every time one uses the mouse (to indicate to the computer that
the user's eyes are focused on a particular place) an
advertisement appears and follows the cursor movement. Even just
having menu entries that are advertisements would be kind of
annoying, but one would be able to get rid of those by leasing
the premium edition (until one needs to lease the platinum
edition, then the "who wants to remain a millionaire" edition).
Why would I or anyone want advertising in office ????????
Why would anyone want advertising in a Windows Start Menu?
For Microsoft such provides a bit more revenue/profit as businesses seem willing to pay for such advertisements. Have you ever heard "You are not
the consumer; you are the product"?
I think I read that some streaming services have added
advertising to their (formerly) no-advertising subscriptions, so
the suggested lease term inflation is not completely
unthinkable.
Is it any wonder users want the 1937 milling machine model ???
Have no fear; soon you may be merely leasing your computer.
Computers need to have the latest spyware so that advertisements
can be appropriately targeted and adblocking must be made
impossible.
I am the kind of guy that turns off "telemetry" and places advertisers
in /hosts file.
If all new computers are "leased" (where tampering with the
device -- or not connecting it to the Internet such that it can
phone home -- revokes "ownership" and not merely warranty and one
agrees to a minimum use [to ensure that enough ads are viewed]),
ordinary users (who cannot assemble devices from commodity
parts) would not have a choice. If governments enforce the
rights of corporations to protect their businesses by outlawing
sale of computer components to anyone who would work around the
cartel, owning a computer could become illegal. Governments have
an interest in having all domestic computers be both secure and
to facilitate domestic surveillance, so mandating features that
remove freedom and require an upgrade cycle (which is also good
for the economy) has some attraction.
I doubt people like you are a sufficient threat to profits that
such extreme measures will be used, but the world (and
particularly the U.S.) seems to be becoming somewhat dystopian.
This is getting kind of off-topic and is certainly not something I want
to think about.
I remember reading about the 8080-to-8086 assembly translator. I
did not know that CP/M and MS-DOS were similar enough to
facilitate porting, so that note was interesting to me.
Intel presumably thought Itanium would be the only merchant
64-bit ISA that mattered (and this would exclude AMD) and
that the masses could use 32-bit until less expensive Itanium
processors were possible.
I agree that such would add complexity, but there is already
complexity for power saving with same-ISA heterogeneity. NUMA-
awareness, cache sharing, and cache warmth also complicate
scheduling, so the question becomes how much extra complexity
does such introduce.
I still feel an attraction to a market-oriented resource
management such that threads could both minimize resource use
(that might be more beneficial to others) and get more than a
fair-share of resources that are important.
In article <10n2u02$270jc$5@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:
I remember reading about the 8080-to-8086 assembly translator. I
did not know that CP/M and MS-DOS were similar enough to
facilitate porting, so that note was interesting to me.
/Early/ MS-DOS. That used CP/M-like File Control Blocks, and didn't have hierarchical directories. It didn't really support hard disks. The
CP/M-style APIs all carried on existing after MS-DOS 2.0 introduced a new
set of APIs that were more suitable for high-level languages, but they weren't much used in new software.
Intel presumably thought Itanium would be the only merchant
64-bit ISA that mattered (and this would exclude AMD) and
that the masses could use 32-bit until less expensive Itanium
processors were possible.
Pretty much. Then the struggle to make Itanium run fast became the overpowering concern, until they gave up and concentrated on x86-64,
claiming that Itanium would be back in a few years.
I don't think many people took that claim seriously. Some years later, an Intel marketing man was quite shocked to hear that, and that the world
had simply been humouring them.
I agree that such would add complexity, but there is already
complexity for power saving with same ISA heterogeneity. NUMA-
awareness, cache sharing, and cache warmth also complicate
scheduling, so the question becomes how much extra complexity
does such introduce.
If the behaviour of Apple's OSes is any guide, complexity is avoided as
far as possible.
I still feel an attraction to a market-oriented resource
management such that threads could both minimize resource use
(that might be more beneficial to others) and get more than a
fair-share of resources that are important.
The difficulty there is that developers will have a very hard time
creating /measurable/ speed-ups that apply across a wide range of
different configurations. Companies will therefore be reluctant to put developer hours into it that could go into features that customers are
asking for.
John
/Early/ MS-DOS. That used CP/M-like File Control Blocks, and didn't have
hierarchical directories. ...
My own limited experience with MS-DOS programming mostly showed them
using integer file-handles and a vaguely Unix-like interface for file IO
at the "int 21h" level.
According to BGB <cr88192@gmail.com>:
/Early/ MS-DOS. That used CP/M-like File Control Blocks, and didn't have
hierarchical directories. ...
My own limited experience with MS-DOS programming mostly showed them
using integer file-handles and a vaguely Unix-like interface for file IO
at the "int 21h" level.
Yeah, Mark Zbikowski added them along with the tree-structured file system in DOS 2.0.
He was at Yale when I was, using a Unix 7th edition system I was supporting.
My own limited experience with MS-DOS programming mostly showed
them using integer file-handles and a vaguely Unix-like interface
for file IO at the "int 21h" level.
Which is, ironically, in conflict with the "FILE *" interface used
by C's stdio API.
Well, apart from some vague (unconfirmed) memories of being exposed
to Pascal via the "Mac Programmer's Workbench" thing at one point
and being totally lost (was very confused, used a CLI but the CLI
commands didn't make sense).
In a way, it showed that they screwed up the design pretty hard
that x86-64 ended up being the faster and more efficient option...
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
Well, or something like PowerPC, but then again, IBM still had
difficulty keeping PPC competitive, so dunno. Then again, I think
IBM's PPC issues were more related to trying to keep up in the chip
fab race that was still going strong at the time, rather than an
ISA design issue.
They did. They really did.
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
Those huge register files had a lot to do with the low code density. They
had two much bigger problems, though.
They'd correctly understood that the low speed of affordable dynamic RAM
as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was to have the compiler schedule loads well in advance. They assumed, without evidence, that a compiler with
plenty of time to think could schedule loads better than hardware doing
it dynamically. It's an appealing idea, but it's wrong.
It might be possible to do that effectively in a single-core,
single-thread, single-task system that isn't taking many (if any)
interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
the interrupts, context switches or other applications at compile time,
so the compiler has no idea what is in cache and what isn't. I don't understand why HP and Intel didn't realise this. It took me years, but I
am no CPU designer.
Speculative execution addresses that problem quite effectively. We don't
have a better way, almost thirty years after Itanium design decisions
were taken. They didn't want to do speculative execution, and they chose
an instruction format and register set that made adding it later hard. If
it was ever tried, nothing was released that had it AFAIK.
The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
enough ILP to keep those pipelines fed, or they'd just eat cache capacity
and memory bandwidth executing no-ops ... in a very bulky instruction set. They didn't have a general way to extract enough ILP. Nobody does,
even
now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.
There was also an architectural misfeature with floating-point advance
loads that could make them disappear entirely if there was a call
instruction between an advance-load instruction and the corresponding check-load instruction. That cost me a couple of weeks working out and reporting the bug, which was unfixable. The only work-around was to
re-issue all outstanding floating-point advance-load instructions
after each call returned. The effective code density went down further,
and there were lots of extra read instructions issued.
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998. RISC-V has
not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.
Well, or something like PowerPC, but then again, IBM still had
difficulty keeping PPC competitive, so dunno. Then again, I think
IBM's PPC issues were more related to trying to keep up in the chip
fab race that was still going strong at the time, rather than an
ISA design issue.
I think that was fabs, rather than architecture.
While I was providing libraries for PowerPC (strictly, POWER4, POWER5 and POWER6, one after another) it always had rather decent performance for its clockspeed and process.
John
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
Now, 30 years later the compilers are still in the position of having
made LITTLE progress.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
On 2/19/2026 1:59 PM, John Levine wrote:
According to BGB-a <cr88192@gmail.com>:
/Early/ MS-DOS. That used CP/M-like File Control Blocks, and didn't have hierarchical directories. ...
My own limited experience with MS-DOS programming mostly showed them
using integer file-handles and a vaguely Unix-like interface for file IO
at the "int 21h" level.
Yeah, Mark Zbikowski added them along with the tree-structured file
system in DOS 2.0.
He was at Yale when I was, using a Unix 7th edition system I was
supporting.
Looks it up...
Yeah, in my case, I didn't exist yet when the MS-DOS 2.x line came out...
Did exist for the 3.x line though.
I don't remember much from those years though.
Some fragmentary memories implied that (in that era) I had mostly been
watching shows like Care Bears and similar (but, looking at it at a
later age, found it mostly unwatchable). I think also shows like Smurfs
and Ninja Turtles and similar, etc.
Like, at some point, memory breaks down into sort of an amorphous mass
of things from TV shows all just sort of mashed together. Not much
stable memory of things other than fragments of TV shows and such.
Not sure what the experience is like for most people though.
My memory from before the age of 4 is extremely spotty, just a couple of
situations that made a lasting impact.
My own limited experience with MS-DOS programming mostly showed
them using integer file-handles and a vaguely Unix-like interface
for file IO at the "int 21h" level.
Which is, ironically, in conflict with the "FILE *" interface used
by C's stdio API.
However, it's entirely concordant with Unix's lower-level file
descriptors, as used in the read() and write() calls.
<https://en.wikipedia.org/wiki/File_descriptor> <https://en.wikipedia.org/wiki/Read_(system_call)>
The FILE* interface is normally implemented on top of the lower-level
calls, with a buffer in the process' address space, managed by the C
run-time library. The file descriptor is normally a member of the FILE structure.
MS-DOS is not a great design, but it isn't crazy either.
Well, apart from some vague (unconfirmed) memories of being exposed
to Pascal via the "Mac Programmer's Workbench" thing at one point
and being totally lost (was very confused, used a CLI but the CLI
commands didn't make sense).
I used it very briefly. It was a very weird CLI, seemingly designed by someone opposed to the basic idea of a CLI.
In a way, it showed that they screwed up the design pretty hard
that x86-64 ended up being the faster and more efficient option...
They did. They really did.
I guess one question is if they had any other particular drawbacks
other than, say:
Their code density was one of the worst around;
128 registers is a little excessive;
128 predicate register bits is a bit WTF;
Those huge register files had a lot to do with the low code density. They
had two much bigger problems, though.
They'd correctly understood that the low speed of affordable dynamic RAM
as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was to have the compiler schedule loads well in advance. They assumed, without evidence, that a compiler with
plenty of time to think could schedule loads better than hardware doing
it dynamically. It's an appealing idea, but it's wrong.
It might be possible to do that effectively in a single-core,
single-thread, single-task system that isn't taking many (if any)
interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
the interrupts, context switches or other applications at compile time,
so the compiler has no idea what is in cache and what isn't. I don't understand why HP and Intel didn't realise this. It took me years, but I
am no CPU designer.
Speculative execution addresses that problem quite effectively. We don't
have a better way, almost thirty years after Itanium design decisions
were taken. They didn't want to do speculative execution, and they chose
an instruction format and register set that made adding it later hard. If
it was ever tried, nothing was released that had it AFAIK.
The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
enough ILP to keep those pipelines fed, or they'd just eat cache capacity
and memory bandwidth executing no-ops ... in a very bulky instruction set. They didn't have a general way to extract enough ILP. Nobody does, even
now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.
There was also an architectural misfeature with floating-point advance
loads that could make them disappear entirely if there was a call
instruction between an advance-load instruction and the corresponding check-load instruction. That cost me a couple of weeks working out and reporting the bug, which was unfixable. The only work-around was to
re-issue all outstanding floating-point advance-load instructions
after each call returned. The effective code density went down further,
and there were lots of extra read instructions issued.
I guess it is more of an open question of what would have happened,
say, if Intel had gone for an ISA design more like ARM64 or RISC-V
or something.
ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998. RISC-V has
not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.
Well, or something like PowerPC, but then again, IBM still had
difficulty keeping PPC competitive, so dunno. Then again, I think
IBM's PPC issues were more related to trying to keep up in the chip
fab race that was still going strong at the time, rather than an
ISA design issue.
I think that was fabs, rather than architecture. While I was providing libraries for PowerPC (strictly, POWER4, POWER5 and POWER6, one after another) it always had rather decent performance for its clockspeed and process.
John
On 2/19/2026 5:10 PM, John Dallman wrote:
------------------------------------
This can be used to add resistance against stack-stomping via buffer overflows, but is potentially risky with RISC-V:
AUIPC X1, AddrHi
JALR X0, AddrLo(X1)
Can nuke the process, when officially it is allowed (vs forcing the use
of a different register to encode a long branch).
Like, how about one not try to bake in assumptions about 1-cycle ALU and 2-cycle Load being practical?...
Vs, say, 2-cycle ALU ops and 3-cycle Loads; with an ideal of putting 5 instructions between an instruction that generates a result and the instruction that consumes the result as this is more likely to work with in-order superscalar.
But, then one runs into the issue that if a basic operation then
requires a multi-op sequence, the implied latency goes up considerably
(say, could call this "soft latency", or SL).
So, for example, it means that, say:
2-instruction sign extension:
RV working assumption: 2 cycles
Hard latency (2c ALU): 4 cycles
Soft latency: 12 cycles.
For a 3-op sequence, the effective soft-latency goes up to 18, ...
And, in cases where the soft-latency significantly exceeds the total
length of the loop body, it is no longer viable to schedule the loop efficiently.
So, in this case, an indexed-load instruction has an effective 9c SL, whereas SLLI+ADD+LD has a 21 cycle SL.
where, in this case, the goal of something like the WEXifier is to
minimize this soft-latency cost (in cases where a dependency is seen,
any remaining soft-latency is counted as penalty).
But, then again, maybe the concept of this sort of "soft latency" seems
a bit alien.
Granted, not sure how this maps over to OoO, but had noted that even
with modern CPUs, there still seems to be benefit from assuming a sort
of implicit high latency for instructions over assuming a lower latency.
*1: Where people argue that if each vendor can do a CPU with their own custom ISA variants and without needing to license or get approval from
a central authority, that invariably everything would decay into an incoherent mess where there is no binary compatibility between
processors from different vendors (usual implication being that people
are then better off staying within the ARM ecosystem to avoid RV's lawlessness).
BGB <cr88192@gmail.com> posted:
On 2/19/2026 5:10 PM, John Dallman wrote:
------------------------------------
This can be used to add resistance against stack-stomping via buffer
overflows, but is potentially risky with RISC-V:
AUIPC X1, AddrHi
JALR X0, AddrLo(X1)
Can nuke the process, when officially it is allowed (vs forcing the use
of a different register to encode a long branch).
That should be:
AUIPC x1,hi(offset)
JALR x0,lo(offset)
using:
SETHI x1,AddrHi
JALR x0,AddrLo
would work.
---------------------
Like, how about one not try to bake in assumptions about 1-cycle ALU and
2-cycle Load being practical?...
for the above to work:
ALU is < ½ cycle leaving ½ cycle output drive and ½ cycle input mux.
SRAM is ½ cycle, AGEN to SRAM decode is ½ cycle, SRAM output to shifter
is < ½ cycle, and set-selection is ½ cycle; leaving ½ cycle for output drive.
Vs, say, 2-cycle ALU ops and 3-cycle Loads; with an ideal of putting 5
instructions between an instruction that generates a result and the
instruction that consumes the result as this is more likely to work with
in-order superscalar.
1-cycle ALU with 3 cycle LD is not very hard at 16-gates per cycle.
2-cycle LD is absolutely impossible with 1-cycle addr-in to data-out
SRAM. So, we generally consider any design with 2-cycle LD to be
frequency limited.
But, then one runs into the issue that if a basic operation then
requires a multi-op sequence, the implied latency goes up considerably
(say, could call this "soft latency", or SL).
So, for example, it means that, say:
2-instruction sign extension:
RV working assumption: 2 cycles
Hard latency (2c ALU): 4 cycles
Soft latency: 12 cycles.
For a 3-op sequence, the effective soft-latency goes up to 18, ...
One of the reasons a 16-gate design works better in practice than
a 12-gate design. And why a 1-cycle ALU, 3-cycle LD runs at higher
frequency.
And, in cases where the soft-latency significantly exceeds the total
length of the loop body, it is no longer viable to schedule the loop
efficiently.
In software, there remains no significant problem running the loop
in HW.
So, in this case, an indexed-load instruction has an effective 9c SL,
whereas SLLI+ADD+LD has a 21 cycle SL.
3-cycle indexed LD with cache hit in many µarchitectures--with scaled
indexing. This is one of the driving influences of "raising" the
semantic content of LD/ST instructions to [Rbase+Rindex<<sc+Disp].
where, in this case, the goal of something like the WEXifier is to
minimize this soft-latency cost (in cases where a dependency is seen,
any remaining soft-latency is counted as penalty).
But, then again, maybe the concept of this sort of "soft latency" seems
a bit alien.
Those ISAs without scaled indexing have longer effective latency through cache than those with; those without full-range Disp have similar problems; those without both are effectively adding 3-4 cycles to LD latency.
Which is why the size of the execution windows grew from 60-ish to 300-ish
to double performance--the ISA is adding latency and the size of execution window is the easiest way to absorb such latency.
{{60-ish ~= Athlon; 300-ish ~= M4}}
Granted, not sure how this maps over to OoO, but had noted that even
with modern CPUs, there still seems to be benefit from assuming a sort
of implicit high latency for instructions over assuming a lower latency.
Execution window size is how it maps.
*1: Where people argue that if each vendor can do a CPU with their own
custom ISA variants and without needing to license or get approval from
a central authority, that invariably everything would decay into an
incoherent mess where there is no binary compatibility between
processors from different vendors (usual implication being that people
are then better off staying within the ARM ecosystem to avoid RV's
lawlessness).
RISC-V seems to be "eating" a year (or a bit more) to bring this mess into
a coherent framework.
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
The odd thing is that these were hardware companies betting on "someone
else" solving their problem, yet if compiler people truly had managed to
solve those problems, then other hardware companies could have taken
advantage just as well.
To me the main question is whether they were truly confused and just got
lucky (lucky because they still managed to sell their idea enough that
most RISC companies folded),
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are theoretically allowed in the ISA, even if GCC doesn't use them.
Well, and in terms of typical ASM notation, there is this mess:
(Rb) / @Rb / @(Rb) //load/store register
(Rb, Disp) / Disp(Rb) //load/store disp
@(Rb, Disp) / @(Disp, Rb) //load/store disp (but with @)
Then:
(Rb, Ri) //indexed (element sized index)
Ri(Rb) //indexed (byte-scaled index)
(Rb, Ri, Sc) //indexed with scale
Disp(Rb, Ri) //indexed with displacement
Disp(Rb, Ri, Sc) //indexed with displacement and scale
Then:
@Rb+ / (Rb)+ //post-increment
@-Rb / -(Rb) //pre-decrement
@Rb- / (Rb)- //post-decrement
@+Rb / +(Rb) //pre-increment
And, in some variants, all the registers prefixed with '%'.
Stefan Monnier <monnier@iro.umontreal.ca> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
That did not solve memory latency, but that's a problem even for OoO
cores.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
The HP side had people like Bob Rau (Cydrome) and Josh Fisher
(Multiflow), and given their premise, the architecture is ok; somewhat
on the complex side, but they wanted to cover all the good ideas from
earlier designs; after all, it was to be the one architecture to rule
them all (especially performancewise). You cannot leave out a feature
that a competitor could then add to outperform IA-64.
To me the main question is whether they were truly confused and just got
lucky (lucky because they still managed to sell their idea enough that
most RISC companies folded),
I think most RISC companies had troubles scaling. They were used to
small design teams spinning out simple RISCs in a short time, and did
not have the organization to deal with the much larger projects that
OoO superscalars required.
And while everybody inventing their own architecture may have looked
like a good idea when developing an architecture and its implementations
was cheap, it looked like a bad deal when development costs started to
ramp up in the mid-90s. That's
why HP went to Intel, and other companies (in particular, SGI) took
this as an exit strategy from the own-RISC business.
DEC had increasing delays in their chips, and eventually could not
make enough money with them and had to sell themselves to Compaq (who
also could not sustain the effort and sold themselves to HP (who
canceled Alpha development)). I doubt that IA-64 played a big role in
that game.
Back to IA-64: At the time, when OoO was just starting, the premise of
IA-64 looked plausible. Why wouldn't they see a fast clock rate and
higher ILP from explicit parallelism than conventional architectures
would see from OoO (apparently complex, and initially without anything
like IA-64's ALAT)?
- anton
BGB <cr88192@gmail.com> posted:
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are
theoretically allowed in the ISA, even if GCC doesn't use them.
This is why one must decode all 32-bits of each instruction--so that
there is no hole in the decoder that would allow the core to do some-
thing not directly specified in ISA. {And one of the things that make
an industrial-quality ISA so hard to fully specify.}
---------------------
Well, and in terms of typical ASM notation, there is this mess:
(Rb) / @Rb / @(Rb) //load/store register
(Rb, Disp) / Disp(Rb) //load/store disp
@(Rb, Disp) / @(Disp, Rb) //load/store disp (but with @)
Then:
(Rb, Ri) //indexed (element sized index)
Ri(Rb) //indexed (byte-scaled index)
(Rb, Ri, Sc) //indexed with scale
Disp(Rb, Ri) //indexed with displacement
Disp(Rb, Ri, Sc) //indexed with displacement and scale
Then:
@Rb+ / (Rb)+ //post-increment
@-Rb / -(Rb) //pre-decrement
@Rb- / (Rb)- //post-decrement
@+Rb / +(Rb) //pre-increment
And, in some variants, all the registers prefixed with '%'.
Leading to SERIAL DECODE--which is BAD.
-----------------------
On 2/21/2026 2:15 PM, MitchAlsup wrote:
Whether your ISA can be attacked with Spectre and/or Meltdown;
BGB <cr88192@gmail.com> posted:
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are
theoretically allowed in the ISA, even if GCC doesn't use them.
This is why one must decode all 32-bits of each instruction--so that
there is no hole in the decoder that would allow the core to do some-
thing not directly specified in ISA. {And one of the things that make
an industrial-quality ISA so hard to fully specify.}
---------------------
Sometimes there is a tension:
What is theoretically allowed in the ISA;
What is the theoretically expected behavior in some abstract model;
What stuff is actually used by compilers;
What features or behaviors does one want;
...
Implementing RISC-V strictly as per an abstract model would both limit efficiency and hinder some use-cases.
Then it comes down to "what do compilers do" and "what unintended behaviors could an ASM programmer stumble onto".
anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
The HP side had people like Bob Rau (Cydrome) and Josh Fisher
(Multiflow), and given their premise, the architecture is ok; somewhat
on the complex side, but they wanted to cover all the good ideas from
earlier designs; after all, it was to be the one architecture to rule
them all (especially performancewise). You cannot leave out a feature
that a competitor could then add to outperform IA-64.
In this time period, performance was doubling every 14 months, so if a
feature added x performance it MUST avoid adding more than x/14 months
to the schedule. If IA-64 was 2 years earlier, it would have been
competitive--sadly it was not.
Stefan Monnier <monnier@iro.umontreal.ca> writes:
MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
The major problem was that the premise was wrong. They assumed that
in-order would give them a clock rate edge, but that was not the case,
right from the start (The 1GHz Itanium II (released July 2002)
competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
XP (released June 2002)). They also assumed that explicit parallelism
would provide at least as much ILP as hardware scheduling of OoO CPUs,
but that was not the case for general-purpose code, and in any case,
they needed a lot of additional ILP to make up for their clock speed disadvantage.
The odd thing is that these were hardware companies betting on "someone
else" solving their problem, yet if compiler people truly had managed to
solve those problems, then other hardware companies could have taken
advantage just as well.
I am sure they had patents on stuff like the advanced load and the
ALAT, so no, other hardware companies would have had a hard time.
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
For 128 predicate registers, this part doesn't make as much sense:
*1: Where people argue that if each vendor can do a CPU with their
own custom ISA variants and without needing to license or get
approval from a central authority, that invariably everything would
decay into an incoherent mess where there is no binary
compatibility between processors from different vendors (usual
implication being that people are then better off staying within
the ARM ecosystem to avoid RV's lawlessness).
BGB <cr88192@gmail.com> posted:
On 2/21/2026 2:15 PM, MitchAlsup wrote:
Whether your ISA can be attacked with Spectre and/or Meltdown;
BGB <cr88192@gmail.com> posted:
On 2/20/2026 5:49 PM, MitchAlsup wrote:
----------------------------
There is a non-zero risk though when one disallows uses that are
theoretically allowed in the ISA, even if GCC doesn't use them.
This is why one must decode all 32-bits of each instruction--so that
there is no hole in the decoder that would allow the core to do some-
thing not directly specified in ISA. {And one of the things that make
an industrial-quality ISA so hard to fully specify.}
---------------------
Sometimes there is a tension:
What is theoretically allowed in the ISA;
What is the theoretically expected behavior in some abstract model;
What stuff is actually used by compilers;
What features or behaviors does one want;
...
Whether your DRAM can be attacked with RowHammer;
Whether your call/return interface can be attacked with:
{ Return-Oriented Programming, Buffer Overflows, ...}
That is; whether you care if your system provides a decently robust programming environment.
I happen to care. Apparently, most do not.
Implementing RISC-V strictly as per an abstract model would both limit
efficiency and hinder some use-cases.
One can make an argument that it is GOOD to limit attack vectors, and
provide a system that is robust in the face of attacks.
Then it comes down to "what do compilers do" and "what unintended
behaviors could an ASM programmer stumble onto".
In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
For 128 predicate registers, this part doesn't make as much sense:
I suspect they wanted to re-use some logic.
The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them
which I was entirely unable to understand, with the manual in front of me
and pencil and paper to try examples. Fortunately, it never occurred in
code generated by any of the compilers I used.
*1: Where people argue that if each vendor can do a CPU with their
own custom ISA variants and without needing to license or get
approval from a central authority, that invariably everything would
decay into an incoherent mess where there is no binary
compatibility between processors from different vendors (usual
implication being that people are then better off staying within
the ARM ecosystem to avoid RV's lawlessness).
The importance of binary compatibility is very much dependent on the
market sector you're addressing. It's absolutely vital for consumer apps
and games. It's much less important for current "AI" where each vendor
has their own software stack anyway. RISC-V seems to be far more
interested in the latter at present.
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard).
Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).
The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them which I was entirely unable to understand, with the manual in front of me and pencil and paper to try examples. Fortunately, it never occurred in code generated by any of the compilers I used.
Possibly.
I had also looked into a more limited set of predicate registers at one point, but this fizzled in favor of just using GPRs.
So, as noted:
I have 1 predicate bit (T bit);
Had looked into expanding it to 2 predicate bits (using an S bit as a
second predicate), but this went nowhere.
And, in middle and high-school, they just sorta forced me to sit through
normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned
more if they just let me spend the time doing whatever else).
For most people, school attempts to give the students just enough knowledge that they are not burdens on society.
MitchAlsup <user5857@newsgrouper.org.invalid> writes:
And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while
vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
Well, and I apparently missed the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).
For most people, school attempts to give the students just enough knowledge that they are not burdens on society.
My high school (1970s, when the split was K-7, 7-9, 10-12) had
four "communities".
Traditional
Career
Work Study
Flexible Individual Learning (FIL)
The college-bound were generally part of the
FIL community. Career included business classes,
traditional was more like the olden days and
Work Study included off-school apprenticeships,
shop classes, electronics training, etc.
Students mostly took classes with peers in their
community (there were over 400 in my graduating class).
Worked rather well, but ended up segregating students
by income level as well as IQ, so
the school district changed that in the
80s in the interest of equality treating the
entire high school as a single community. The
quality of the education received diminished
thereafter, IMO.
BGB <cr88192@gmail.com> posted:
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through
normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned
more if they just let me spend the time doing whatever else).
For most people, school attempts to give the students just enough knowledge that they are not burdens on society.
-------------------------
The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them
which I was entirely unable to understand, with the manual in front of me and pencil and paper to try examples. Fortunately, it never occurred in
code generated by any of the compilers I used.
It could have been a case where the obvious logic decoding "that" field in the instruction allowed for "a certain pattern" to perform what they described
in the spec. I did some of this in Mc 88100, and this is what taught me never to do it again or allow anyone else to do it again.
Possibly.
I had also looked into a more limited set of predicate registers at one
point, but this fizzled in favor of just using GPRs.
So, as noted:
I have 1 predicate bit (T bit);
Had looked into expanding it to 2 predicate bits (using an S bit as a
second predicate), but this went nowhere.
I have tried several organizations over the last 40 years of practice::
In my Humble and Honest Opinion, the only constructs predicates should support are singular comparisons and comparisons using && and || with De Morgan-izing logic {~}--not because other forms are unuseful, but because those are the constructs programmers use when writing code.
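A concrete version of that point: the negation of an && chain of comparisons is an || chain of the complemented comparisons (and vice versa), so a predicate mechanism that covers single compares, &&, ||, and complement {~} also covers the negated forms for free. A quick exhaustive sketch:

```python
# De Morgan on comparison chains: !(p && q) == (!p || !q), where
# complementing a comparison just flips its condition.
def source_form(a, b):
    return not (a < 0 and b < 0)      # !(p && q)

def demorganized(a, b):
    return a >= 0 or b >= 0           # !p || !q

# Exhaustive check over a small range of operand values.
assert all(source_form(a, b) == demorganized(a, b)
           for a in range(-3, 4) for b in range(-3, 4))
```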
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard). Well, and I apparently missed
the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).
...
But, it seems like a case of:
By implication, I am smart, because if I wasn't, even my own (sometimes pointless) hobby interests would have been out of reach.
Like, not a world of difficulty justifying them, or debating whether or
not something is worth doing, but likely not something someone could do
at all.
Or, maybe, like encountering things that seem confusing isn't such a
rare experience (or that people have learned how to deal more
productively with things they can see but don't understand?...).
But, there is a thing I have noted:
I had a few times mentioned to people about finding that certain AIs had gotten smart enough to start understanding how a 5/6 bit finite state machine to predict repeating 1-4 bit patterns would be constructed.
Then, I try to describe it, and realize that for the people I mention it
to, the difficulty isn't imagining how one would go about filling in the
table and getting all of the 4-bit patterns to fit into 32 possible
states; many seem to have difficulty understanding how such a finite state machine would operate in the first place.
Even so, this part seems like something that pretty much anyone should be able to understand.
Initially, I had used this as a test case for the AIs because it posed "moderate difficulty" for problems which could be reasonably completely described in a chat prompt (and is not overly generic).
Nevermind if it is still a pain to generate tables by hand, and my
attempts at hand-generated tables have tended to have worse adaptation
rates than those generated using genetic algorithms (can be more clean looking, but tend to need more input bits to reach the target state if
the pattern changes).
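For the curious, the kind of machine being described can be illustrated with a history-indexed table. This is a simplified stand-in (state = last 4 input bits, with a learned next-bit per state), not BGB's actual fixed, GA-generated 5/6-bit tables; it only shows how such a state machine operates.

```python
# Illustrative stand-in for a small pattern-predicting state machine.
# State = the last 4 input bits (16 states); each state remembers the
# bit that followed it most recently. Any stream repeating with period
# <= 4 settles into a short cycle of states, so after a brief warm-up
# the stored bits predict the stream exactly.

class BitPatternPredictor:
    def __init__(self):
        self.state = 0          # last 4 bits, LSB = most recent
        self.table = [0] * 16   # learned next-bit per state

    def predict(self):
        return self.table[self.state]

    def update(self, actual):
        self.table[self.state] = actual                  # learn on every step
        self.state = ((self.state << 1) | actual) & 0xF  # shift in the new bit

def accuracy(pattern, steps):
    p, hits = BitPatternPredictor(), 0
    for i in range(steps):
        bit = pattern[i % len(pattern)]
        hits += (p.predict() == bit)
        p.update(bit)
    return hits
```

Feeding it the period-3 pattern 1,1,0 mispredicts only a handful of times during warm-up and is exact thereafter.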
Sometimes I feel like a poser.
Other things, it seems, I had taken for granted.
Seems, sometimes, that if I were "actually smart", I would have figured
out some way to make better and more efficient use of my span of existence.
BGB <cr88192@gmail.com> posted:
On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:
Does imply that my younger self was notable, and not seen as just
some otherwise worthless nerd.
Educators who are any good notice the weird kids who are actually smart.
Sometimes I question if I really am though.
Like, some evidence says I am, but by most metrics of "life success" I
have done rather poorly.
And, in middle and high-school, they just sorta forced me to sit through
normal classes (which sucked really hard)
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
MitchAlsup wrote:
In my case, I remember sitting in the back of advanced algebra class
(mostly senior HS people, me a sophomore) doing chemistry homework while
vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
the way. Moral, don't be bored in class, do something useful instead.
I used a double physics time slot (i.e two 50-min time slots with a
5-min break between them) in exactly the same way, except that I
calculated ~24 digits of pi using the Taylor series for atan(1/5) and atan(1/239). The latter part was much faster of course!
Doing long divisions by 25 and (n^2+n) took the majority of the time.
Terje
PS. I re-implemented the exact same algorithm, using base 1e10, on the
very first computer I got access to, a Univac 110x in University. This
was my first ever personal piece of programming.
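Terje's calculation can be reproduced in a few lines. This is a sketch of Machin's formula pi = 16*atan(1/5) - 4*atan(1/239), with the arctan Taylor series evaluated in fixed-point integer arithmetic; Terje used base-1e10 limbs, while Python's big integers make one flat fixed-point value the simplest stand-in:

```python
# atan(1/x) = sum_{k>=0} (-1)^k / ((2k+1) * x^(2k+1)),
# evaluated with all values scaled up by `scale` (truncating division).
def atan_inv(x, scale):
    power = scale // x       # scale / x^(2k+1)
    total, k = 0, 0
    while power:
        term = power // (2 * k + 1)
        total += -term if k % 2 else term
        power //= x * x      # step to the next odd power of 1/x
        k += 1
    return total

def machin_pi(digits):
    guard = 10                        # guard digits absorb truncation error
    scale = 10 ** (digits + guard)
    pi = 16 * atan_inv(5, scale) - 4 * atan_inv(239, scale)
    return pi // 10 ** guard          # pi scaled by 10**digits
```

The atan(1/239) series indeed converges in far fewer terms than the atan(1/5) one, since each term shrinks by a factor of 239^2 rather than 25.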
On 11/5/25 3:52 PM, MitchAlsup wrote:
Robert Finch <robfi680@gmail.com> posted:
On 2025-11-05 1:47 a.m., Robert Finch wrote:-----------
I am now modifying Qupls2024 into Qupls2026 rather than starting a
completely new ISA. The big difference is Qupls2024 uses 64-bit
instructions and Qupls2026 uses 48-bit instructions making the code 25%
more compact with no real loss of operations.
Qupls2024 also used 8-bit register specs. This was a bit of overkill and not really needed. Register specs are reduced to 6 bits. Right away, that reduced most instructions by eight bits.
4 register specifiers: check.
I decided I liked the dual operations that some instructions supported,
which need a wide instruction format.
With 48-bits, if you can get 2 instructions 50% of the time, you are only 12% bigger than a 32-bit ISA.
I must be misunderstanding your math; if half of the
6-byte instructions are two operations, I think that
means 12 bytes would have three operations which is
the same as for a 32-bit ISA.
Perhaps you meant for every two instructions, there
is a 50% chance neither can be "fused" and a 50%
chance they can be fused with each other; this would
get four operations in 18 bytes, which _is_ 12.5%
bigger. That seems an odd expression, as if the
ability to fuse was not quasi-independent.
It could just be that one of us has a "thought-O".
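For reference, the two readings can be checked with a little arithmetic, taking 4 bytes per operation as the 32-bit baseline:

```python
# Two readings of "2 instructions 50% of the time" for a 6-byte ISA,
# versus a 32-bit ISA at 4 bytes per operation.

# Reading 1: half of all 6-byte instructions carry two operations.
ops_per_insn = 0.5 * 1 + 0.5 * 2       # 1.5 ops per instruction
bytes_per_op_1 = 6 / ops_per_insn      # 4.0 -> same size as 32-bit

# Reading 2: per pair of operations, a 50% chance the pair fuses into
# one 2-op instruction (6 bytes) and a 50% chance it stays as two
# separate instructions (12 bytes): 4 ops per 18 bytes on average.
bytes_per_op_2 = (6 + 12) / 4          # 4.5 -> 12.5% bigger
```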
One gotcha is that 64-bit constant overrides need to be modified. For
Qupls2024 a 64-bit constant override could be specified using only a
single additional instruction word. This is not possible with 48-bit
instruction words. Qupls2024 only allowed a single additional constant
word. I may maintain this for Qupls2026, but that means that a max
constant override of 48-bits would be supported. A 64-bit constant can
still be built up in a register using the add-immediate with shift
instruction. It is ugly and takes about three instructions.
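That three-step build-up can be sketched as follows; the 24-bit immediate width here is a hypothetical stand-in for illustration, not an actual Qupls2026 field size:

```python
# Decompose a 64-bit constant into three shifted immediates, assuming
# (hypothetically) a 24-bit immediate field in each add-with-shift step.
def build64(value):
    steps = []
    acc = 0
    for shift in (48, 24, 0):                  # chunks of 16+24+24 bits
        chunk = (value >> shift) & 0xFFFFFF
        steps.append((hex(chunk), shift))      # one add-imm-with-shift each
        acc |= chunk << shift                  # what the register accumulates
    return acc, steps
```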
It was that sticking problem of constants that drove most of My 66000
ISA style--variable length and how to encode access to these constants
and routing thereof.
Motto: never execute any instructions fetching or building constants.
I am guessing that having had experience with x86
(and the benefit of predecode bits), you recognized
that VLE need not be horribly complex to parse.
My 66000 does not use "start bits", but the length
is quickly decoded from the first word and the
critical information is in mostly fixed locations
in the first word.
(One might argue that opcode
can be in two locations depending on if the
instruction uses a 16-bit immediate or not,
assuming I remember that correctly.)
Obviously, something like DOUBLE could provide
extra register operands to a complex instruction,
though there may not be any operation needing
five register inputs. Similarly, opcode refinement
(that does not affect operation routing) could be
placed into an "immediate". I think you do not
expect to need such tricks because reduced
number of instructions is a design principle and
there is lots of opcode space remaining, but I
feel these also allow the ISA to be extended in
unexpected directions.
I think that motto could be generalized to "do
not do at decode time what can be done at
compile time" (building immediates could be
"executed" in decode). There are obvious limits
to that principle; e.g., one would not encode
instructions as control bits, i.e., "predecoded",
in order to avoid decode work. For My 66000
immediates, reducing decode work also decreases
code size.
Discerning when to apply a transformation and if/
where to cache the result seems useful. E.g., a
compiler caches the source code to machine code
transformation inside an executable binary. My
66000's Virtual Vector Method implementations
are expected, from what I understand, to cache
fetch and decode work and simplify operand
routing.
Caching branch prediction information in an
instruction seems to be viewed generally as not
worth much since dynamic predictors are generally
more accurate.
Static prediction by branch
"type" (e.g., forward not-taken) can require no
additional information. (Branch prediction
_directives_ are somewhat different. Such might
be used to reduce the time for a critical path,
but average time is usually a greater concern.)
My first was a simple BASIC "hello world" program in 1974 on a
Burroughs B5500 (remotely, via again an ASR-33) which we had
for a week in 7th grade math class.
I was quite proud when I managed to factorize 123456789, which
took some time.
Scott Lurndal <scott@slp53.sl.home> schrieb:
My first was a simple BASIC "hello world" program in 1974 on a
Burroughs B5500 (remotely, via again an ASR-33) which we had
for a week in 7th grade math class.
I started out on my father's first programmable pocket calculator,
a Casio model with 38 steps (I think).
I was quite proud when I managed to factorize 123456789, which
took some time.
On 01/03/2026 13:18, Thomas Koenig wrote:
Scott Lurndal <scott@slp53.sl.home> schrieb:
My first was a simple BASIC "hello world" program in 1974 on a
Burroughs B5500 (remotely, via again an ASR-33) which we had
for a week in 7th grade math class.
I started out on my father's first programmable pocket calculator,
a Casio model with 38 steps (I think).
Would that have been a Casio fx-3600P? I bought one of these as a teenager, and used it non-stop. 38 steps of program space was not a
lot, but I remember making a library for complex number calculations for it.
I was quite proud when I managed to factorize 123456789, which
took some time.
I used mine to find formulas for numerical integration (like Simpson's
rule, but higher order). Basically useless, but fun!
Stefan Monnier <monnier@iro.umontreal.ca> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.
I can't remember seeing such arguments coming from compiler people, tho.
Actually, the IA-64 people could point to the work on VLIW (in
particular, Multiflow (trace scheduling) and Cydrome (software
pipelining)), which in turn is based on the work on compilers for
microcode.
That did not solve memory latency, but that's a problem even for OoO
cores.
I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
sides to "dump in" their favorite ideas. A recipe for disaster.
The HP side had people like Bob Rau (Cydrome) and Josh Fisher
(Multiflow), and given their premise, the architecture is ok; somewhat
on the complex side, but they wanted to cover all the good ideas from
earlier designs; after all, it was to be the one architecture to rule
them all (especially performancewise). You cannot leave out a feature
that a competitor could then add to outperform IA-64.
The major problem was that the premise was wrong. They assumed that
in-order would give them a clock rate edge, but that was not the case,
right from the start (The 1GHz Itanium II (released July 2002)
competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
XP (released June 2002)). They also assumed that explicit parallelism
would provide at least as much ILP as hardware scheduling of OoO CPUs,
but that was not the case for general-purpose code, and in any case,
they needed a lot of additional ILP to make up for their clock speed disadvantage.