• Re: Tonight's Tradeoff

    From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Wed Jan 28 02:10:21 2026
    From Newsgroup: comp.arch

    Paul Clayton <paaronclayton@gmail.com> writes:
    On 11/13/25 5:13 PM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    [snip]
    What I wanted to write was "And assembly language is
    architecture-specific".

    I have worked on a single machine with several different ASM "compilers".
    Believe me, one asm can be different than another asm.

    But it is absolutely true that asm is architecture specific.

    Is that really *absolutely* true? Architecture usually includes binary
    encoding (and memory order model and perhaps other non-assembly details).

    I do not know if being able to have an interrupt in the middle of an
    assembly instruction is a violation of the assembly contract. (In
    theory, a few special cases might be handled such that the assembly
    instruction that breaks into more than one machine instruction is
    handled similarly to breaking instructions into μops.) There might not
    be any practical case where all the sub-instructions of an assembly
    instruction are also assembly instructions (especially not if
    retaining instruction size compatibility, which would be difficult
    with such assembly instruction fission anyway).

    The classic case is the VAX MOVC3/MOVC5 instructions. An interrupt
    could occur during the move and simply restart the instruction
    (the register operands having been updated as each byte was moved).

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Wed Jan 28 15:34:00 2026
    From Newsgroup: comp.arch

    In article <10lbcg1$3uh8h$1@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:

    I _feel_ that if only the opcode encoding is changed (a very tiny
    difference that would only affect using code as data) that one
    could rightly state that the new architecture uses the same
    assembly.

    That would, however, raise questions and doubts among everyone who was
    aware of the different instruction encodings. You would do far better to
    say that the new architecture is compatible at the assembler source level,
    but not at the binary level.

    I doubt there could be any economic justification for
    only changing the opcode encoding, but theoretically such could
    have multiple architectures with the same assembly.

    There was a threatened case of this in the early years of this century.
    Intel admitted to themselves that AMD64 was trouncing Itanium in the marketplace, and they needed to do 64-bit x86 or see their company shrink dramatically. However, they did not want to do an AMD-compatible x86-64.
    They wanted to use a different instruction encoding and have deliberate
    binary incompatibility.

    This was crazy from the network externalities point of view. It was an anti-competitive move, requiring software vendors to do separate builds
    for Intel and AMD, hoping that they would not bother with AMD builds.

    Microsoft killed this idea, by refusing to support any such
    Intel-specific 64-bit x86. They could not prevent Intel doing it, but
    there would not be Windows for it. Intel had to climb down.

    I do not think assembly language considered the possible effects of
    memory order model. (Have all x86 implementations been compatible?
    I think the specification changed, but I do not know if
    compatibility was broken.)

    In general, the assembly programmer is responsible for considering the
    memory model, not the language implementation.

    In addition to the definition for "assembly language" one also
    needs to define "architecture".

    Actually, the world seems to get on OK without such clear definitions.
    The obscurity of assembly language tends to limit its use to those who
    really need to use it, and who are prepared to use a powerful but
    unforgiving tool.

    Intel has sold incompatible architectures within the same design
    by fusing off functionality and has even had different application
    cores in the same chip have different instruction support (though
    that seems to have bitten Intel).

    Well, different ISA support in different cores in the same processor
    package is just dumb[1]. It reflects a delusion that Intel has suffered
    since at least the late 1990s: that software is specific to particular generations of their chips, and there's a new release with significant
    changes for each new generation. Plenty of Intel people know that is true
    for motherboard firmware, but not for operating systems or application software. But the company carries on behaving that way.

    [1] See the Cell processor for an extreme example.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Jan 28 19:19:06 2026
    From Newsgroup: comp.arch


    Paul Clayton <paaronclayton@gmail.com> posted:

    On 11/13/25 5:13 PM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    [snip]
    What I wanted to write was "And assembly language is
    architecture-specific".

    I have worked on a single machine with several different ASM "compilers". Believe me, one asm can be different than another asm.

    But it is absolutely true that asm is architecture specific.

    Is that really *absolutely* true? Architecture usually includes binary encoding (and memory order model and perhaps other non-assembly details).

    If/when you use an instruction available only in a more modern
    implementation, then you HAVE made this program incompatible with
    prior implementations.

    I do not know if being able to have an interrupt in the middle of an assembly instruction is a violation of the assembly contract. (In
    theory, a few special cases might be handled such that the assembly instruction that breaks into more than one machine instruction is
    handled similarly to breaking instructions into μops.) There might not
    be any practical case where all the sub-instructions of an assembly instruction are also assembly instructions (especially not if
    retaining instruction size compatibility, which would be difficult
    with such assembly instruction fission anyway).

    The real question is whether you support multiple memory instructions
    Since we all seem to be supporting multiple data instructions (Vector
    and SIMD).

    In My 66000 case, Memory to Memory move is performed using indexing
    from starting points {From and To}, Program status cache line maintains
    the index when a MM is interrupted, so we can restart in the middle.
    {Essentially no different from VAX except it's an index instead of a
    set of pointers}
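    The index-based variant described above can be sketched the same way:
    the base pointers {From, To} stay fixed and only a single index
    advances, so the program state saved across an interrupt is one index
    rather than a set of walking pointers. A toy model, not the My 66000
    implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of an index-based restartable memory-to-memory move: the
 * bases are immutable and a lone index records progress, so restart
 * state is minimal.  Toy model only. */
struct mm_state {
    const unsigned char *from;  /* fixed source base       */
    unsigned char *to;          /* fixed destination base  */
    size_t index;               /* progress: next element  */
    size_t count;               /* total elements to move  */
};

/* Move at most `budget` bytes, then pretend an interrupt arrives.
 * Returns nonzero while a restart is still needed. */
static int mm_step(struct mm_state *st, size_t budget)
{
    while (st->index < st->count && budget-- > 0) {
        st->to[st->index] = st->from[st->index];
        st->index++;              /* the only state that changes */
    }
    return st->index < st->count;
}
```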

    Self-modifying assembly obviously breaks with different encodings (as
    would using instruction encodings as data).

    If the assembly instructions were different sizes, control flow
    instructions could be broken if addresses or explicit displacements
    were used rather than abstract labels (which might not be allowed or
    merely considered bad practice). Jump tables would also be affected
    (such could also be fixed automatically if the jump table location and format is known).

    In My 66000, jump tables are PIC.
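    The usual way to make a jump table position independent is to store
    signed offsets relative to the table itself rather than absolute
    target addresses, so relocating the whole image leaves the table
    contents untouched. A hypothetical sketch, with "addresses" as plain
    integers rather than real code addresses:

```c
#include <assert.h>
#include <stdint.h>

/* PIC jump table sketch: entries hold (target - table base), so the
 * table needs no relocation when the image is loaded elsewhere.
 * The offsets below are made-up values for illustration. */
static const int32_t offsets[3] = { 20, 44, 68 };

/* target = table base + table entry, wherever the image landed */
static uint32_t dispatch(uint32_t table_base, unsigned selector)
{
    return table_base + (uint32_t)offsets[selector];
}
```

    Rebasing the image just changes table_base; every target shifts by
    the same delta with no fixups to the table entries.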

    Obviously, one could also do the equivalent of complete binary
    recompilation, which would usually not be considered the role of an
    assembler.

    I _feel_ that if only the opcode encoding is changed (a very tiny
    difference that would only affect using code as data) that one could
    rightly state that the new architecture uses the same assembly. I
    doubt there could be any economic justification for only changing the
    opcode encoding, but theoretically such could have multiple
    architectures with the same assembly.

    If one allows changing the placement of constants, register
    specifiers, and opcodes (without changing the machine code size of any assembly instruction) to still be the same assembly language (which I consider reasonable), the benefit of a new encoding might be
    measurable (albeit tiny and not worthwhile).

    If one allows assembly instructions to change in size as well as
    encoding (but retain even interrupt semantics), the assembler could
    still be very simple (which might justify still calling it an assembler).

    In My 66000, compiler produces an abstract address. After linking when
    the address/offset/displacement is manifest, Linker determines the size
    of the instruction.

    If the assembly language includes macros (single assembly instruction
    that is assembled into multiple machine instructions), interrupt
    granularity should not be considered part of compatibility, in my
    opinion. Yes, behavior would change because some uninterruptable
    assembly instructions would become interruptable, but the mapping was already not simple.

    I do not think anyone would think that converting macros into multiple instructions in any way prevents interrupts from happening anywhere.

    If one allows pipeline reorganization in the assembler (as I think was considered a possibility for handling explicit pipelines that
    changed), then size changes would be allowed in which case substantial encoding changes should be allowed.

    S/substantial/moderate/

    I do not think assembly language considered the possible effects of
    memory order model. (Have all x86 implementations been compatible? I
    think the specification changed, but I do not know if compatibility
    was broken.)

    Agree with previous responder: programmer programs to memory model
    not ASM.

    Upward compatibility is also a factor. Since one could say that adding assembly instructions to an assembly language does not change the
    language (like adding machine instructions does not change the
    architecture in terms of name (upwardly compatible family?)), one
    could argue that increasing the number of registers could maintain the
    same "assembly language" as well as increasing the size of registers.

    But the use of said addition prevents this program from running on
    previous implementations.

    In addition to the definition for "assembly language" one also needs
    to define "architecture". In a very strict sense, x86-64 is not a
    single architecture -- every different set of machine instructions
    would constitute a different architecture. Intel has sold incompatible
    architectures within the same design by fusing off functionality and
    has even had different application cores in the same chip have
    different instruction support (though that seems to have bitten Intel).

    The ISA is less than 1/3rd of an architecture:: you have
    a) Memory management
    b) exception management
    c) interrupt management
    d) system check management
    e) PCIe Root Complex management
    f) peripheral management
    g) power management
    h) frequency management
    i) virtualization
    j) Boot considerations
    ...

    AMD and Intel also differ slightly in architecture for one or two application-level instructions (as well as virtualization
    differences), but are considered the same architecture.

    Requiring different builds for any sense of compatible performance.

    Architecture seems to be used in the fuzzy sense rather than the
    strict sense of 100% timing-independent compatibility,

    You are only considering 1/3rd of what architecture IS.

    so it seems reasonable to have a fuzzier sense of assembly language to
    include at least encoding changes. It seems reasonable to me for
    "assembly language" to mean the preferred language for simple mapping
    to machine instructions (which can include idioms -- different
    spellings of the same machine instruction -- and macros).

    The modern sense of ASM is that it is an ASCII version of binary.
    The old sense where ASM was a language that could do anything and
    everything (via Macros) has slipped into the past.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Thu Jan 29 07:13:17 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    In My 66000, compiler produces an abstract address. After linking when
    the address/offset/displacement is manifest, Linker determines the size
    of the instruction.

    Maybe eventually.

    Right now, the assembler adjusts sizes when it has the information
    (including the size of jump tables, for example). Unresolved
    symbols are left in a size according to the memory model specified
    to the assembler.

    A linker *can* do linker relaxation, the RISC-V toolchain does so.
    However, they have opened a huge can of worms with this, for several
    reasons, for example changing debug tables in the linker and bugs
    for corner cases where special alignment was needed, which is not
    uncommon on embedded systems (I believe).

    Perhaps the RISC-V binutils team are simply incompetent, but
    I think it is far more likely that linker relaxation is simply
    a very difficult task to get right, and the problem lies mainly
    with the specification, not with those tasked with implementing it.
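    Part of what makes relaxation hard is that it is a fixed-point
    problem: widening one branch moves later code, which can push a
    *different* branch out of range, and so on until nothing changes. A
    toy sketch of the core loop, with a hypothetical encoding whose short
    branch form is 2 bytes and reaches +/-127 bytes (none of this is the
    RISC-V scheme, which additionally rewrites relocations and debug
    info):

```c
#include <assert.h>

#define SHORT_REACH 127
enum kind { BYTES, LABEL, BRANCH };
struct item { enum kind k; int n; };  /* n: byte count, or label id */

/* One pass: lay everything out at current sizes, then widen any short
 * branch whose displacement no longer fits.  Returns the number of
 * branches widened (0 => fixed point reached). */
static int relax_pass(const struct item *it, int cnt, int *bsize)
{
    int addr[64], label_addr[16];
    int pc = 0, changed = 0, i;
    for (i = 0; i < cnt; i++) {           /* layout with current sizes */
        addr[i] = pc;
        if (it[i].k == BYTES) pc += it[i].n;
        else if (it[i].k == LABEL) label_addr[it[i].n] = pc;
        else pc += bsize[i];
    }
    for (i = 0; i < cnt; i++)             /* widen branches that miss */
        if (it[i].k == BRANCH && bsize[i] == 2) {
            int disp = label_addr[it[i].n] - (addr[i] + 2);
            if (disp > SHORT_REACH || disp < -SHORT_REACH - 1) {
                bsize[i] = 4;             /* moves all later code */
                changed++;
            }
        }
    return changed;
}

/* Iterate to the fixed point; returns how many passes widened something. */
static int relax(const struct item *it, int cnt, int *bsize)
{
    int passes = 0, i;
    for (i = 0; i < cnt; i++) bsize[i] = (it[i].k == BRANCH) ? 2 : 0;
    while (relax_pass(it, cnt, bsize)) passes++;
    return passes;
}
```

    Starting short and only ever growing guarantees termination; the real
    trouble begins when alignment directives and debug tables have to be
    kept consistent with code that keeps moving.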
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Thu Jan 29 12:30:09 2026
    From Newsgroup: comp.arch

    Thomas Koenig [2026-01-29 07:13:17] wrote:
    Perhaps the RISC-V binutils team are simply incompetent, but
    I think it is far more likely that linker relaxation is simply
    a very difficult task to get right, and the problem lies mainly
    with the specification, not with those tasked with implementing it.

    My gut feeling is that adjusting instruction sizes after you generated
    the machine code is just a bad idea. In theory it can be done, but
    I'd expect there's always a better solution to the problem it's
    trying to solve (e.g. delay the generation of the machine code, or just
    use pessimistically-sized instructions).


    === Stefan
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Feb 1 17:51:07 2026
    From Newsgroup: comp.arch

    Scott Lurndal wrote:
    Paul Clayton <paaronclayton@gmail.com> writes:
    On 11/13/25 5:13 PM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    [snip]
    What I wanted to write was "And assembly language is
    architecture-specific".

    I have worked on a single machine with several different ASM "compilers".
    Believe me, one asm can be different than another asm.

    But it is absolutely true that asm is architecture specific.

    Is that really *absolutely* true? Architecture usually includes binary
    encoding (and memory order model and perhaps other non-assembly details).

    I do not know if being able to have an interrupt in the middle of an
    assembly instruction is a violation of the assembly contract. (In
    theory, a few special cases might be handled such that the assembly
    instruction that breaks into more than one machine instruction is
    handled similarly to breaking instructions into μops.) There might not
    be any practical case where all the sub-instructions of an assembly
    instruction are also assembly instructions (especially not if
    retaining instruction size compatibility, which would be difficult
    with such assembly instruction fission anyway).

    The classic case is the VAX MOVC3/MOVC5 instructions. An interrupt
    could occur during the move and simply restart the instruction
    (the register operands having been updated as each byte was moved).

    An even more common example (numbering in the 100M to 1B range?) is x86
    processors with interruptible REP MOVS/STOS/LODS instructions.
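    REP MOVS keeps its whole state in RSI/RDI/RCX, which is exactly what
    makes it interruptible: an interrupt between iterations leaves the
    three registers describing the remaining work. A minimal GCC/Clang
    inline-asm sketch (x86-64 only; the fallback is there so the sketch
    compiles elsewhere):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* rep movsb: copies n bytes, advancing RSI/RDI and counting RCX down.
 * The "+" constraints tell the compiler those registers are updated,
 * mirroring the architectural state an interrupt would preserve. */
static void rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback for non-x86 builds */
#endif
}
```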
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Feb 1 18:01:13 2026
    From Newsgroup: comp.arch

    MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:
    reasonable to have a fuzzier sense of assembly language to include at
    least encoding changes. It seems reasonable to me for "assembly
    language" to mean the preferred language for simple mapping to machine
    instructions (which can include idioms -- different spellings of the
    same machine instruction -- and macros).

    The modern sense of ASM is that it is an ASCII version of binary.
    The old sense where ASM was a language that could do anything and
    everything (via Macros) has slipped into the past.

    In my current world, asm is what I use for inline kernels that cannot
    be directly described in Rust (or C/C++), letting the compiler handle
    all the scaffolding that would have been handled by asm MACROs 40
    years ago.
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Paul Clayton@paaronclayton@gmail.com to comp.arch on Wed Feb 4 22:31:23 2026
    From Newsgroup: comp.arch

    On 1/28/26 10:34 AM, John Dallman wrote:
    In article <10lbcg1$3uh8h$1@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:

    I _feel_ that if only the opcode encoding is changed (a very tiny
    difference that would only affect using code as data) that one
    could rightly state that the new architecture uses the same
    assembly.

    That would, however, raise questions and doubts among everyone who was
    aware of the different instruction encodings. You would do far better to
    say that the new architecture is compatible at the assembler source level, but not at the binary level.

    I tend to agree. I was arguing semantics (what is assembly?) not
    best practice.

    Currently, assembly-level compatibility does not seem worthwhile.
    Software is usually distributed as machine code binaries not as
    assembly, and software is usually developed in at least a C-level
    language rather than assembly. In the past, easy translation of
    assembly to support a new machine language would be useful, but
    this seems not to be the case now.



    I doubt there could be any economic justification for
    only changing the opcode encoding, but theoretically such could
    have multiple architectures with the same assembly.

    There was a threatened case of this in the early years of this century.
    Intel admitted to themselves that AMD64 was trouncing Itanium in the marketplace, and they needed to do 64-bit x86 or see their company shrink dramatically. However, they did not want to do an AMD-compatible x86-64.
    They wanted to use a different instruction encoding and have deliberate binary incompatibility.

    Would the Intel-64 have been assembly compatible with AMD64? I
    would have guessed that not just encodings would have been
    different. If one wants to maintain market friction, supporting
    the same assembly seems counterproductive.

    This was crazy from the network externalities point of view. It was an anti-competitive move, requiring software vendors to do separate builds
    for Intel and AMD, hoping that they would not bother with AMD builds.

    Cooperating with AMD to develop a more sane encoding while
    supporting low overhead for old binaries would have been better
    for customers (I think). However, doing what is best generally
    for customers is not necessarily the most profitable action.

    Microsoft killed this idea, by refusing to support any such
    Intel-specific 64-bit x86. They could not prevent Intel doing it, but
    there would not be Windows for it. Intel had to climb down.

    Which was actually a sane action not just from the hassle to
    Microsoft of supporting yet another ISA but the confusion of
    users (Intel64 and AMD64 both run x86-32 binaries but neither
    Intel64 nor AMD64 run the other's binaries!) which would impact
    Microsoft (and PC OEMs) more than Intel.

    I do not think assembly language considered the possible effects of
    memory order model. (Have all x86 implementations been compatible?
    I think the specification changed, but I do not know if
    compatibility was broken.)

    In general, the assembly programmer is responsible for considering the
    memory model, not the language implementation.

    Yes, but for a single-threaded application this is not a factor --
    so such would be more compatible. It is not clear if assembly
    programmers would use less efficient abstractions (like locks) to
    handle concurrency in which case a different memory model might
    not impact correctness. On the one hand, assembly is generally
    chosen because C provides insufficient performance (or
    expressiveness), which would imply that assembly programmers
    would not want to leave any performance on the table and would
    exploit the memory model. On the other hand, the assembly
    programmer mindset may often be more serial and the performance
    cost of using higher abstractions for concurrency may be lower
    than the debugging costs of being clever relative to using
    cleverness for other optimizations.
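    The tradeoff above in a nutshell: this message-passing idiom happens
    to be correct on x86 with plain stores (TSO never reorders the two
    stores), but a weakly ordered machine needs the explicit
    release/acquire pairing -- and supplying it is the assembly
    programmer's job, not the assembler's. A hypothetical sketch in C11
    atomics; producer() would run on a separate thread:

```c
#include <assert.h>
#include <stdatomic.h>

static int data;                 /* ordinary payload                */
static atomic_int ready;         /* flag that publishes the payload */

static void producer(void)
{
    data = 42;                                              /* write payload */
    atomic_store_explicit(&ready, 1, memory_order_release); /* then publish  */
}

static int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                        /* spin until the flag is visible */
    return data;                 /* acquire makes the payload visible too */
}
```

    Demote both atomics to memory_order_relaxed and the code is still
    fine on x86 but broken on a weakly ordered machine -- the kind of
    latent incompatibility the paragraph above is describing.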

    In addition to the definition for "assembly language" one also
    needs to define "architecture".

    Actually, the world seems to get on OK without such clear definitions.
    The obscurity of assembly language tends to limit its use to those who
    really need to use it, and who are prepared to use a powerful but
    unforgiving tool.

    Yes, the niche effect helps to avoid diversity of meaning across
    users and across time. I suspect jargon also changes less rapidly
    than common language both because there is less interaction and
    there is more pressure to be formal in expression.

    Intel has sold incompatible architectures within the same design
    by fusing off functionality and has even had different application
    cores in the same chip have different instruction support (though
    that seems to have bitten Intel).

    Well, different ISA support in different cores in the same processor
    package is just dumb[1]. It reflects a delusion that Intel has suffered
    since at least the late 1990s: that software is specific to particular generations of their chips, and there's a new release with significant changes for each new generation. Plenty of Intel people know that is true
    for motherboard firmware, but not for operating systems or application software. But the company carries on behaving that way.

    I do not think ISA heterogeneity is necessarily problematic. I
    suspect it might require more system-level organization (similar
    to Apple). Even without ISA heterogeneity, optimal scheduling
    seems to be a hard problem. Energy/power and delay/performance
    preferences are not typically expressed. The abstraction of each
    program owning the machine seems to discourage nice behavior (pun
    intended).

    Intel seems to be conflicted between encouraging software use of
    features and extracting profit from those users who benefit more
    from certain features. Maximizing availability of an
    architectural feature encourages software to adopt the feature,
    but limiting availability allows charging more for enabling a
    feature.


    [1] See the Cell processor for an extreme example.

    I thought Cell was almost an embedded system. The SIMD-focused
    processors were more like GPUs, I thought, and intended to be
    used as such. For games, this might have made sense. However,
    I think this was before General Purpose GPU was a thing.
    (I thought Intel marketed their initial 512-bit SIMD processors
    as GPGPUs with x86 compatibility, so the idea of having a
    general purpose ISA morphed into a GPU-like ISA had some
    fascination after Cell.)
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 5 19:02:14 2026
    From Newsgroup: comp.arch


    Paul Clayton <paaronclayton@gmail.com> posted:

    On 1/28/26 10:34 AM, John Dallman wrote:
    In article <10lbcg1$3uh8h$1@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:
    -----------------

    Would the Intel-64 have been assembly compatible with AMD64? I

    Andy Glew indicated similar but not exact enough.
    Andy also stated that Microsoft forced Intel's hand towards x86-64.

    would have guessed that not just encodings would have been
    different. If one wants to maintain market friction, supporting
    the same assembly seems counterproductive.

    It was, in essence, the control register model, the nested paging,
    and other sundry non-ISA components.

    This was crazy from the network externalities point of view. It was an anti-competitive move, requiring software vendors to do separate builds
    for Intel and AMD, hoping that they would not bother with AMD builds.

    Cooperating with AMD to develop a more sane encoding while
    supporting low overhead for old binaries would have been better
    for customers (I think). However, doing what is best generally
    for customers is not necessarily the most profitable action.

    Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
    and making optimal battle plans for the Little Big Horn battle to come.

    Microsoft killed this idea, by refusing to support any such
    Intel-specific 64-bit x86. They could not prevent Intel doing it, but
    there would not be Windows for it. Intel had to climb down.

    Which was actually a sane action not just from the hassle to
    Microsoft of supporting yet another ISA but the confusion of
    users (Intel64 and AMD64 both run x86-32 binaries but neither
    Intel64 nor AMD64 run the other's binaries!) which would impact
    Microsoft (and PC OEMs) more than Intel.

    I do not think assembly language considered the possible effects of
    memory order model. (Have all x86 implementations been compatible?
    I think the specification changed, but I do not know if
    compatibility was broken.)

    In general, the assembly programmer is responsible for considering the memory model, not the language implementation.

    Yes, but for a single-threaded application this is not a factor --
    so such would be more compatible. It is not clear if assembly
    programmers would use less efficient abstractions (like locks) to
    handle concurrency in which case a different memory model might
    not impact correctness. On the one hand, assembly is generally
    chosen because C provides insufficient performance (or
    expressiveness), which would imply that assembly programmers
    would not want to leave any performance on the table and would
    exploit the memory model. On the other hand, the assembly
    programmer mindset may often be more serial and the performance
    cost of using higher abstractions for concurrency may be lower
    than the debugging costs of being clever relative to using
    cleverness for other optimizations.

    In addition to the definition for "assembly language" one also
    needs to define "architecture".

    Actually, the world seems to get on OK without such clear definitions.
    The obscurity of assembly language tends to limit its use to those who really need to use it, and who are prepared to use a powerful but unforgiving tool.

    Yes, the niche effect helps to avoid diversity of meaning across
    users and across time. I suspect jargon also changes less rapidly
    than common language both because there is less interaction and
    there is more pressure to be formal in expression.

    Intel has sold incompatible architectures within the same design
    by fusing off functionality and has even had different application
    cores in the same chip have different instruction support (though
    that seems to have bitten Intel).

    Well, different ISA support in different cores in the same processor package is just dumb[1]. It reflects a delusion that Intel has suffered since at least the late 1990s: that software is specific to particular generations of their chips, and there's a new release with significant changes for each new generation. Plenty of Intel people know that is true for motherboard firmware, but not for operating systems or application software. But the company carries on behaving that way.

    One can still buy a milling machine built in 1937 and run it in his shop.
    Can one even do this for software from the previous decade ??

    MS wants you to buy Office every time you buy a new PC.
    MS then moves all the menu items to different pull-downs and
    makes it difficult to adjust to the new SW -- and then it has the
    gall to chew up valuable screen space with ever larger pull-down
    bars.

    Is it any wonder users want the 1937 milling machine model ???

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.arch on Thu Feb 5 14:35:57 2026
    From Newsgroup: comp.arch

    On 2/5/2026 11:02 AM, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    On 1/28/26 10:34 AM, John Dallman wrote:
    In article <10lbcg1$3uh8h$1@dont-email.me>, paaronclayton@gmail.com (Paul
    Clayton) wrote:
    -----------------

    Would the Intel-64 have been assembly compatible with AMD64? I

    Andy Glew indicated similar but not exact enough.
    Andy also stated that MicroSoft forced Intel's hand towards x86-64.
    [...]

    Side note (sorry for injecting ;^o ): I had the pleasure to converse
    with Andy Glew on this very group. Very nice indeed. All about DWCAS and
    fun things. This is a nice group.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Fri Feb 6 15:54:00 2026
    From Newsgroup: comp.arch

    In article <10m12ue$2t2k5$1@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:

    Currently, assembly-level compatibility does not seem worthwhile.

    Not now, no. There was one case where it was valuable: the assembler
    source translator for 8080 to 8086. That plus the resemblance of early
    MS-DOS to CP/M meant that CP/M software written in assembler could be got working on the early IBM PC and compatibles more rapidly than new
    software could be developed in high-level languages. That was one of the factors in the runaway success of PC-compatible machines in the early
    1980s.

    Software is usually distributed as machine code binaries not as
    assembly,

    Or as source code...

    Would the Intel-64 have been assembly compatible with AMD64? I
    would have guessed that not just encodings would have been
    different. If one wants to maintain market friction, supporting
    the same assembly seems counterproductive.

    It would hardly have mattered. Very little assembler is written for
    64-bit architectures.

    Cooperating with AMD to develop a more sane encoding while
    supporting low overhead for old binaries would have been better
    for customers (I think).

    Intel didn't admit to themselves they needed to do 64-bit x86 until AMD64
    was thrashing them in the market. Far too late for collaborative design
    by then.

    It is not clear if assembly programmers would use less efficient
    abstractions (like locks) to handle concurrency in which case
    a different memory model might not impact correctness.

    You are thinking of doing application programming in assembler. That's
    pretty much extinct these days. Use of assembler to implement locks or
    other concurrency-control mechanisms in an OS or a language run-time
    library is far more likely.

    I've been doing low-level parts of application development for over 40
    years. In 1983-86, I was working in assembler, or needed to have a very
    close awareness of the assembler code being generated by a higher-level
    language. In 1987-1990, I needed to be able to call assembler-level OS
    functions from C code. Since then, the only coding I've done in
    assembler has been to generate hardware error conditions for testing
    error handlers. I've read and debugged lots of compiler-generated
    assembler to report compiler bugs, but that has become far less common
    over time.

    I do not think ISA heterogeneity is necessarily problematic.

    It requires the OS scheduler to be ISA-aware, and to never, /ever/ put a
    thread onto a core that can't run the relevant ISA. That will inevitably
    make the scheduler more complicated and thus increase system overheads.

    I suspect it might require more system-level organization (similar
    to Apple).

    Have you ever tried to optimise multi-threaded performance on a modern
    Apple system with a mixture of Performance and Efficiency cores? I have,
    and it's a lot harder than Apple give the impression it will be.

    Apple make an assumption: that you will use their "Grand Central Dispatch" threading model. That requires multi-threaded code to be structured as a one-direction pipeline of work packets, with buffers between them, and
    one thread/core per pipeline stage. That's a sensible model for some
    kinds of work, but not all kinds. It also requires compiler extensions
    which don't exist on other compilers. So you have to fall back to POSIX
    threads to get flexibility and portability.

    If you're using POSIX threads, the scheduler seems to assign threads to
    cores randomly. So your worker threads spend a lot of time on Efficiency
    cores. Those are in different clusters from the Performance cores, which
    means that communications between threads (via locks) are very slow.
    Using Apple's performance category attributes for threads has no obvious
    effect on this.

    The way to fix this is to find out how many Performance cores there are
    in a Performance cluster (which wasn't possible until macOS 12) and use
    that many threads. Then you need to reach below the POSIX threading layer
    to the underlying BSD thread layer. There, you can set an association
    number on your threads, which tells the scheduler to try to run them in
    the same cluster. Then you get stable and near-optimal performance. But
    finding out how to do this is fairly hard, and few seem to have managed it.

    Even without ISA heterogeneity, optimal scheduling
    seems to be a hard problem. Energy/power and delay/performance
    preferences are not typically expressed. The abstraction of each
    program owning the machine seems to discourage nice behavior (pun
    intended).

    Allowing processes to find out the details of other processes' resource
    usage makes life very complicated, and introduces new opportunities for security bugs.

    (I thought Intel marketed their initial 512-bit SIMD processors
    as GPGPUs with x86 compatibility, so the idea of having a
    general purpose ISA morphed into a GPU-like ISA had some
    fascination after Cell.)

    Larrabee turned out to be a pretty bad GPU, and a pretty bad set of CPUs.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Paul Clayton@paaronclayton@gmail.com to comp.arch on Sun Feb 8 18:22:46 2026
    From Newsgroup: comp.arch

    On 2/5/26 2:02 PM, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    [snip]
    Cooperating with AMD to develop a more sane encoding while
    supporting low overhead for old binaries would have been better
    for customers (I think). However, doing what is best generally
    for customers is not necessarily the most profitable action.

    Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
    and making optimal battle plans for the Little Bighorn battle to come.

    Rather than making battle plans for how to annihilate each
    other, perhaps finding a better solution than ratting each
    other out in the prisoner's dilemma.

    [snip]
    One can still buy a milling machine built in 1937 and run it in his shop.
    Can one even do this for software from the previous decade ??

    Yes, but dependency on (proprietary) servers for some games has
    made them (unnecessarily) unplayable.

    From what I understand, one can still run WordPerfect under a
    DOS emulator on modern x86-64.

    With the poor security of much software, even OSes, one might
    want to contain any legacy software in a more secured
    environment.

    Preventing automatic update is perhaps more of a hassle. Some
    people have placed software in a virtual machine that has no
    networking to avoid software breaking.

    MS wants you to buy Office every time you buy a new PC.

    I thought MS wanted everyone to use Office365. It is harder to
    force people to get a new computer, but a monthly fee will recur
    automatically.

    MS then moves all the menu items to different pull downs and
    makes it difficult to adjust to the new SW--and then it has the
    gall to chew up valuable screen space with ever larger
    pull-down bars.

    Ah, but they are just beginning to include advertising. Imagine
    every time one uses the mouse (to indicate to the computer that
    the user's eyes are focused on a particular place) an
    advertisement appears and follows the cursor movement. Even just
    having menu entries that are advertisements would be kind of
    annoying, but one would be able to get rid of those by leasing
    the premium edition (until one needs to lease the platinum
    edition, then the "who wants to remain a millionaire" edition).

    Is it any wonder users want the 1937 milling machine model ???

    Have no fear; soon you may be merely leasing your computer.
    Computers need to have the latest spyware so that advertisements
    can be appropriately targeted and adblocking must be made
    impossible.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon Feb 9 19:33:36 2026
    From Newsgroup: comp.arch


    Paul Clayton <paaronclayton@gmail.com> posted:

    On 2/5/26 2:02 PM, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    [snip]
    Cooperating with AMD to develop a more sane encoding while
    supporting low overhead for old binaries would have been better
    for customers (I think). However, doing what is best generally
    for customers is not necessarily the most profitable action.

    Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
    and making optimal battle plans for the Little Bighorn battle to come.

    Rather than making battle plans for how to annihilate each
    other, perhaps finding a better solution than ratting each
    other out in the prisoner's dilemma.

    [snip]
    One can still buy a milling machine built in 1937 and run it in his shop. Can one even do this for software from the previous decade ??

    Yes, but dependency on (proprietary) servers for some games has
    made them (unnecessarily) unplayable.

    From what I understand, one can still run WordPerfect under a
    DOS emulator on modern x86-64.

    With the poor security of much software, even OSes, one might
    want to contain any legacy software in a more secured
    environment.

    Preventing automatic update is perhaps more of a hassle. Some
    people have placed software in a virtual machine that has no
    networking to avoid software breaking.

    MS wants you to buy Office every time you buy a new PC.

    I thought MS wanted everyone to use Office365. It is harder to
    force people to get a new computer, but a monthly fee will recur automatically.

    When I need a tool--I buy that tool--I never rent that tool.

    Name one feature I would want from office365 that was not already
    present in office from <say> 1998.

    MS then moves all the menu items to different pull downs and
    makes it difficult to adjust to the new SW--and then it has the
    gall to chew up valuable screen space with ever larger
    pull-down bars.

    Ah, but they are just beginning to include advertising. Imagine
    every time one uses the mouse (to indicate to the computer that
    the user's eyes are focused on a particular place) an
    advertisement appears and follows the cursor movement. Even just
    having menu entries that are advertisements would be kind of
    annoying, but one would be able to get rid of those by leasing
    the premium edition (until one needs to lease the platinum
    edition, then the "who wants to remain a millionaire" edition).

    Why would I or anyone want advertising in office ????????

    Is it any wonder users want the 1937 milling machine model ???

    Have no fear; soon you may be merely leasing your computer.
    Computers need to have the latest spyware so that advertisements
    can be appropriately targeted and adblocking must be made
    impossible.

    I am the kind of guy that turns off "telemetry" and places advertisers
    in /hosts file.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Paul Clayton@paaronclayton@gmail.com to comp.arch on Mon Feb 9 21:18:20 2026
    From Newsgroup: comp.arch

    On 2/9/26 2:33 PM, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    On 2/5/26 2:02 PM, MitchAlsup wrote:

    [snip]
    MS wants you to buy Office every time you buy a new PC.

    I thought MS wanted everyone to use Office365. It is harder to
    force people to get a new computer, but a monthly fee will recur
    automatically.

    When I need a tool--I buy that tool--I never rent that tool.

    Name one feature I would want from office365 that was not already
    present in office from <say> 1998.

    I do not know if MS can legally cancel your MS Office license,
    and I doubt the few "software pirates" who continue to use an
    unsupported ("invalid") version would be worth MS' time and
    effort to prevent such people from using such software.

    However, there seems to be a strong trend toward "you shall own
    nothing."

    MS then moves all the menu items to different pull downs and
    makes it difficult to adjust to the new SW--and then it has the
    gall to chew up valuable screen space with ever larger
    pull-down bars.

    Ah, but they are just beginning to include advertising. Imagine
    every time one uses the mouse (to indicate to the computer that
    the user's eyes are focused on a particular place) an
    advertisement appears and follows the cursor movement. Even just
    having menu entries that are advertisements would be kind of
    annoying, but one would be able to get rid of those by leasing
    the premium edition (until one needs to lease the platinum
    edition, then the "who wants to remain a millionaire" edition).

    Why would I or anyone want advertising in office ????????

    Why would anyone want advertising in a Windows Start Menu?

    For Microsoft such provides a bit more revenue/profit as
    businesses seem willing to pay for such advertisements. Have you
    ever heard "You are not the consumer; you are the product"?

    I think I read that some streaming services have added
    advertising to their (formerly) no-advertising subscriptions, so
    the suggested lease term inflation is not completely
    unthinkable.

    Is it any wonder users want the 1937 milling machine model ???

    Have no fear; soon you may be merely leasing your computer.
    Computers need to have the latest spyware so that advertisements
    can be appropriately targeted and adblocking must be made
    impossible.

    I am the kind of guy that turns off "telemetry" and places advertisers
    in /hosts file.

    If all new computers are "leased" (where tampering with the
    device -- or not connecting it to the Internet such that it can
    phone home -- revokes "ownership" and not merely warranty and one
    agrees to a minimum use [to ensure that enough ads are viewed]),
    ordinary users (who cannot assemble devices from commodity
    parts) would not have a choice. If governments enforce the
    rights of corporations to protect their businesses by outlawing
    sale of computer components to anyone who would work around the
    cartel, owning a computer could become illegal. Governments have
    an interest in having all domestic computers be both secure and
    to facilitate domestic surveillance, so mandating features that
    remove freedom and require an upgrade cycle (which is also good
    for the economy) has some attraction.

    I doubt people like you are a sufficient threat to profits that
    such extreme measures will be used, but the world (and
    particularly the U.S.) seems to be becoming somewhat dystopian.

    This is getting kind of off-topic and is certainly not something
    I want to think about.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Feb 10 17:53:10 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Why would I or anyone want advertising in office ????????

    It is enough if Microsoft wants it... Oh, they'll call it
    "information" or "tips". This was already displayed it in the
    start menu on my work computer some time ago because of some
    IT failure (they failed to turn it off).
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From George Neuner@gneuner2@comcast.net to comp.arch on Tue Feb 10 14:13:57 2026
    From Newsgroup: comp.arch

    On Mon, 09 Feb 2026 19:33:36 GMT, MitchAlsup
    <user5857@newsgrouper.org.invalid> wrote:


    Name one feature I would want from office365 that was not already
    present in office from <say> 1998.

    YMMV, but I'd say OpenDocument (ISO 26300) support.

    Like you, I stayed with Office97 for a long time. I jumped to 2013
    for awhile, briefly toyed with OpenOffice, and finally went to
    LibreOffice and never looked back.

    The biggest problem with Microsoft Office was/is that its various
    versions all had backward incompatibilities, so they could (and did)
    F_ up even working with their own .doc files.


    Why would I or anyone want advertising in office ????????

    LibreOffice and OpenOffice don't have advertising.

    Yes, you do need to get used to different menus / things you use
    frequently being in different places.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Wed Feb 11 15:05:34 2026
    From Newsgroup: comp.arch

    On 09/02/2026 20:33, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    On 2/5/26 2:02 PM, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    [snip]
    Cooperating with AMD to develop a more sane encoding while
    supporting low overhead for old binaries would have been better
    for customers (I think). However, doing what is best generally
    for customers is not necessarily the most profitable action.

    Yes, imagine Custer (Intel) and AMD (Sioux) sitting down together
    and making optimal battle plans for the Little Bighorn battle to come.

    Rather than making battle plans for how to annihilate each
    other, perhaps finding a better solution than ratting each
    other out in the prisoner's dilemma.

    [snip]
    One can still buy a milling machine built in 1937 and run it in his shop.
    Can one even do this for software from the previous decade ??

    Yes, but dependency on (proprietary) servers for some games has
    made them (unnecessarily) unplayable.

    From what I understand, one can still run WordPerfect under a
    DOS emulator on modern x86-64.

    With the poor security of much software, even OSes, one might
    want to contain any legacy software in a more secured
    environment.

    Most old software did not have poor security. It was secure by not
    having features that could be abused - and thus no need to worry about
    extra layers to protect said features. MS practically invented the
    concept of insecure applications like word processors - they put
    unnecessary levels of automation and macros, integrated it with email (especially their already hopelessly insecure programs), and so on. No
    real user has any need for "send this document by email" in their word processor - but spam robots loved it. (MS even managed to figure out a
    way to let font files have executable malware in them.) If you go back
    to older tools that did the job they were supposed to do, without trying
    to do everything else, security is a non-issue for most software.

    The 1930's milling machine is safe because it is a milling machine. If
    MS made milling machines, they'd come with built-in beer fridges, TV
    screens and a subscription to sports channels - and in response to
    complaints of users chopping their fingers off, they'd add six layers of security gates that can't be passed without a Windows phone, controlled
    by a HAL 9000 that won't let you mill anything without first begging the
    IT department for permission. Of course, there would still be a small
    hatch at the back where you can put your remaining fingers in to get
    chopped off.


    Preventing automatic update is perhaps more of a hassle. Some
    people have placed software in a virtual machine that has no
    networking to avoid software breaking.

    MS wants you to buy Office every time you buy a new PC.

    I thought MS wanted everyone to use Office365. It is harder to
    force people to get a new computer, but a monthly fee will recur
    automatically.

    When I need a tool--I buy that tool--I never rent that tool.


    Nice in theory (and I fully agree with the aim), but it's getting
    steadily more difficult in practice.

    Name one feature I would want from office365 that was not already
    present in office from <say> 1998.


    Do you mean a /useful/ feature? That makes it a lot harder. What about
    that dancing paper clip? I haven't had any MS Office installed on a PC
    since Word for Windows 2.0 on Win3.11. (I have been a LibreOffice user
    since its StarOffice ancestor - not that I use office suite software
    much.)

    MS then moves all the menu items to different pull downs and
    makes it difficult to adjust to the new SW--and then it has the
    gall to chew up valuable screen space with ever larger
    pull-down bars.

    Ah, but they are just beginning to include advertising. Imagine
    every time one uses the mouse (to indicate to the computer that
    the user's eyes are focused on a particular place) an
    advertisement appears and follows the cursor movement. Even just
    having menu entries that are advertisements would be kind of
    annoying, but one would be able to get rid of those by leasing
    the premium edition (until one needs to lease the platinum
    edition, then the "who wants to remain a millionaire" edition).

    Why would I or anyone want advertising in office ????????

    Why would MS care what /users/ want?


    Is it any wonder users want the 1937 milling machine model ???

    Have no fear; soon you may be merely leasing your computer.
    Computers need to have the latest spyware so that advertisements
    can be appropriately targeted and adblocking must be made
    impossible.

    I am the kind of guy that turns off "telemetry" and places advertisers
    in /hosts file.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From George Neuner@gneuner2@comcast.net to comp.arch on Thu Feb 12 10:27:00 2026
    From Newsgroup: comp.arch



    Realize that I'm responding to posts from different people below. I
    hope the attribution is correct.



    On Wed, 11 Feb 2026 15:05:34 +0100, David Brown
    <david.brown@hesbynett.no> wrote:

    On 09/02/2026 20:33, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    From what I understand, one can still run WordPerfect under a
    DOS emulator on modern x86-64.

    Yes you can.


    With the poor security of much software, even OSes, one might
    want to contain any legacy software in a more secured
    environment.

    Most old software did not have poor security. It was secure by not
    having features that could be abused - and thus no need to worry about
    extra layers to protect said features. MS practically invented the
    concept of insecure applications like word processors - they put
    unnecessary levels of automation and macros, integrated it with email
    (especially their already hopelessly insecure programs), and so on. No
    real user has any need for "send this document by email" in their word
    processor - but spam robots loved it. (MS even managed to figure out a
    way to let font files have executable malware in them.) If you go back
    to older tools that did the job they were supposed to do, without trying
    to do everything else, security is a non-issue for most software.

    Automation and macros? By that definition, you could argue that
    WordStar invented insecurity (on micros), and everyone else followed
    its bad example.

    [You also could argue that the TECO editor on Unix was the origin (a
    decade before WordStar), but the Unix environment made it more
    difficult to cause any /major/ havoc with a dangerous editor macro.]

    Adding networking to CP/M, or DOS, or [early] Windows, just amplified
    the problem by making it easier to share and exchange files. The
    insecure OSes, combined with too powerful macro systems, made it
    relatively easy to destroy the whole system.


    MS wants you to buy Office every time you buy a new PC.

    I thought MS wanted everyone to use Office365. It is harder to
    force people to get a new computer, but a monthly fee will recur
    automatically.

    When I need a tool--I buy that tool--I never rent that tool.

    There won't be a choice. Sooner or later, Microsoft will stop selling
    software with perpetual licenses.

    Yet another reason to stop using their stuff.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Wed Feb 18 15:51:04 2026
    From Newsgroup: comp.arch

    On 2/9/2026 8:18 PM, Paul Clayton wrote:
    On 2/9/26 2:33 PM, MitchAlsup wrote:

    Paul Clayton <paaronclayton@gmail.com> posted:

    On 2/5/26 2:02 PM, MitchAlsup wrote:

    [snip]
    MS wants you to buy Office every time you buy a new PC.

    I thought MS wanted everyone to use Office365. It is harder to
    force people to get a new computer, but a monthly fee will recur
    automatically.

    When I need a tool--I buy that tool--I never rent that tool.

    Name one feature I would want from office365 that was not already
    present in office from <say> 1998.

    I do not know if MS can legally cancel your MS Office license, and I
    doubt the few "software pirates" who continue to use an unsupported ("invalid") version would be worth MS' time and effort to prevent such people from using such software.

    However, there seems to be a strong trend toward "you shall own nothing."

    MS then moves all the menu items to different pull downs and
    makes it difficult to adjust to the new SW--and then it has the
    gall to chew up valuable screen space with ever larger
    pull-down bars.

    Ah, but they are just beginning to include advertising. Imagine
    every time one uses the mouse (to indicate to the computer that
    the user's eyes are focused on a particular place) an
    advertisement appears and follows the cursor movement. Even just
    having menu entries that are advertisements would be kind of
    annoying, but one would be able to get rid of those by leasing
    the premium edition (until one needs to lease the platinum
    edition, then the "who wants to remain a millionaire" edition).

    Why would I or anyone want advertising in office ????????

    Why would anyone want advertising in a Windows Start Menu?

    For Microsoft such provides a bit more revenue/profit as businesses seem willing to pay for such advertisements. Have you ever heard "You are not
    the consumer; you are the product"?

    I think I read that some streaming services have added
    advertising to their (formerly) no-advertising subscriptions, so
    the suggested lease term inflation is not completely
    unthinkable.


    Better at this point to just use LibreOffice or similar...

    Well, and to not use Windows 11 ...


    For now, my main PC still uses Windows 10, and at this point would
    almost rather jump ship to Linux if need be, than go to Windows 11.

    Like, MS has become hell bent on turning Windows 11 into a trash fire.


    Is it any wonder users want the 1937 milling machine model ???

    Have no fear; soon you may be merely leasing your computer.
    Computers need to have the latest spyware so that advertisements
    can be appropriately targeted and adblocking must be made
    impossible.

    I am the kind of guy that turns off "telemetry" and places advertisers
    in /hosts file.

    If all new computers are "leased" (where tampering with the
    device -- or not connecting it to the Internet such that it can
    phone home -- revokes "ownership" and not merely warranty and one
    agrees to a minimum use [to ensure that enough ads are viewed]),
    ordinary users (who cannot assemble devices from commodity
    parts) would not have a choice. If governments enforce the
    rights of corporations to protect their businesses by outlawing
    sale of computer components to anyone who would work around the
    cartel, owning a computer could become illegal. Governments have
    an interest in having all domestic computers be both secure and
    to facilitate domestic surveillance, so mandating features that
    remove freedom and require an upgrade cycle (which is also good
    for the economy) has some attraction.

    I doubt people like you are a sufficient threat to profits that
    such extreme measures will be used, but the world (and
    particularly the U.S.) seems to be becoming somewhat dystopian.

    This is getting kind of off-topic and is certainly not something I want
    to think about.

    Ironically, this sort of thing, and also locking down computers enough
    that they only allow basic user programs and disallow "side loading"
    etc, was an element in some of my sci-fi stories.

    But, not exactly an optimistic point.

    And, there was effectively an illicit black market for unconstrained
    computers and computer parts (mostly salvaged). Where, a computer built
    mostly from salvaged parts from old electronics would be worth more than
    a new computer available though official channels (and within legal
    limits in terms of OS and hardware specs).


    Though, in such a world, owning an unconstrained computer would be seen
    as both illegal and dangerous.

    But, not all sci-fi needs to be utopic or optimistic (this itself seems
    like a trap, both that people assume this as a default, or that people
    can mistake overt dystopias as an ideal to strive for).

    But, then if one includes "obvious bad" things (like a bunch of WWII
    type stuff; mass euthanasia and so on), then almost invariably someone
    thinks that one is endorsing it and gets offended about it.

    Well, and sometimes one needs to be able to reference and depict bad
    things in order to denounce them as such.

    Granted, things are not so great when one is more prone to dealing with
    "gray on gray morality" rather than a more clear cut "battle of good
    versus evil" theme. Reality more tends towards the former, but society
    prefers the latter, and choosing the latter more often leads one into a
    trap (more so if one assumes that the protagonists' side is always
    necessarily the "good" one).

    Say, flip the perspective, and tell a story from the POV of the villain,
    and almost invariably they become an anti-hero even if they continue
    pretty much the exact same actions they would have taken if a villain
    from the hero's POV (more so if they are humanized in any way, or their actions are given any sort of justification, even if said justification
    is purely self-serving and egocentric).

    Alas...


    ...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Thu Feb 19 08:02:00 2026
    From Newsgroup: comp.arch

    In article <10n2u02$270jc$5@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:

    I remember reading about the 8080 -> 8086 assembly translator. I
    did not know that CP/M and MS-DOS were similar enough to
    facilitate porting, so that note was interesting to me.

    /Early/ MS-DOS. That used CP/M-like File Control Blocks, and didn't
    have hierarchical directories. It didn't really support hard disks.
    The CP/M-style APIs all carried on existing after MS-DOS 2.0
    introduced a new set of APIs that were more suitable for high-level
    languages, but they weren't much used in new software.

    Intel presumably thought Itanium would be the only merchant
    64-bit ISA that mattered (and this would exclude AMD) and
    that the masses could use 32-bit until less expensive Itanium
    processors were possible.

    Pretty much. Then the struggle to make Itanium run fast became the
    overpowering concern, until they gave up and concentrated on x86-64,
    claiming that Itanium would be back in a few years.

    I don't think many people took that claim seriously. Some years later, an
    Intel marketing man was quite shocked to hear that, and that the world
    had simply been humouring them.

    I agree that such would add complexity, but there is already
    complexity for power saving with same ISA heterogeneity. NUMA-
    awareness, cache sharing, and cache warmth also complicate
    scheduling, so the question becomes how much extra complexity
    does such introduce.

    If the behaviour of Apple's OSes are any guide, complexity is avoided as
    far as possible.

    I still feel an attraction to a market-oriented resource
    management such that threads could both minimize resource use
    (that might be more beneficial to others) and get more than a
    fair-share of resources that are important.

    The difficulty there is that developers will have a very hard time
    creating /measurable/ speed-ups that apply across a wide range of
    different configurations. Companies will therefore be reluctant to put developer hours into it that could go into features that customers are
    asking for.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Thu Feb 19 05:53:20 2026
    From Newsgroup: comp.arch

    On 2/19/2026 2:02 AM, John Dallman wrote:
    In article <10n2u02$270jc$5@dont-email.me>, paaronclayton@gmail.com (Paul Clayton) wrote:

    I remember reading about the 8080 _ 8086 assembly translator. I
    did not know that CP/M and MS-DOS were similar enough to
    facilitate porting, so that note was interesting to me.

    /Early/ MS-DOS. That used CP/M-like File Control Blocks, and didn't have hierarchical directories. It didn't really support hard disks. The
    CP/M-style APIs all carried on existing after MS-DOS 2.0 introduced a new
    set of APIs that were more suitable for high-level languages, but they weren't much used in new software.


    My own limited experience with MS-DOS programming mostly showed them
    using integer file-handles and a vaguely Unix-like interface for file IO
    at the "int 21h" level.

    Which is, ironically, in conflict with the "FILE *" interface used by
    C's stdio API.

    But, I do remember a mechanism involving shared FCB structs existing on MS-DOS, but AFAIK it was mostly unused in favor of the use of integer
    handles.

    But, it has been a very long time since I have messed around much with
    DOS (well, and in my childhood it was mostly 6.20 and 6.22 and similar,
    or 7.00 if using the version that came with Win95).


    Though, I have a vague memory of sometimes setting up chimeric versions
    of DOS, mostly intentionally installing 7.00 on top of 6.22 because
    there were a lot of additional programs that existed within a 6.22
    install that were absent from 7.00.

    IIRC, in 6.22 it had QBASIC and EDIT was a thin wrapper over QBASIC,
    whereas 7.00 had dropped QBASIC and made EDIT self-contained, or
    something to this effect. So, in this sense, it made sense to install
    6.22 first and then 7.00 on top of it to get a DOS install that still
    had things like QBASIC and similar.


    Though, this was before switching over to dual-booting Slackware and
    NT4, and then running Cygwin on NT4 (was my general setup in
    middle-school before switching to Win2K in high-school). By that point,
    had mostly abandoned QBASIC (but, QBASIC was used some in elementary
    school).

    Well, apart from some vague (unconfirmed) memories of being exposed to
    Pascal via the "Mac Programmer's Workbench" thing at one point and being totally lost (was very confused; it used a CLI, but the CLI commands didn't
    make sense). Memory was like a Macintosh II with an external HDD and magneto-optical drive (*). Seemingly, the hardware exists and matches my memory, so presumably I saw it, but the memories also don't make much sense.

    *: Like, it used disks that were sort of like giant versions of the 3.5" floppies, holding something resembling a rainbow-colored CD-ROM (disk protected by the sliding door). Or, say, as if one took a 3.5" floppy and
    scaled it up to be a little larger than a 5.25" floppy (and it held something resembling a rainbow-patterned CD-ROM).

    These things were a novelty as basically none of the other computers
    used them (everything else using normal floppies or CD-ROMs). Like, some
    sort of weird alien tech.


    Like, someone brought it in and had me try to use it (at the elementary school), and I didn't get it (like, it was confusing, and/or I was too
    stupid to use it at the time).

    Well, nothing like this ever happened again; I guess I had kinda blown it
    pretty hard at the time. The person took the computer with them and
    left. I am not sure why they came by (was managed by a guy who wasn't
    one of the usual teachers).


    Pretty much everything else ran DOS, apart from some Apple II/E and an
    Apple II/GS and similar (where the II/GS was kinda like the Mac, but
    with no MPW). The II/E's could also do BASIC, or one could boot up games
    like "Oregon Trail" and similar.

    Or, sorta timeline:
    Early years: Mostly played NES and watched TV.
    Like, mostly Super Mario Bros and similar.
    Well, and TV shows like "Captain N" and "Super Mario Super Show".
    Lots of NES related stuff going on in this era.
    Started elementary school:
    Disturbing experience of being around other people;
    And like, none of them could read or similar (*1);
    Teacher was surprised, went and got librarian, ...
    Started messing around with Apple II/e's;
    Some BASIC on the II/e.
    Encounter with the guy with the Mac II;
    PCs were around, typically with 5.25" floppy drives;
    TV shows included things like the Sonic the Hedgehog cartoons.
    Also ReBoot and similar.
    Well, and "Star Trek: TNG".
    Parents got a PC: Had Win 3.11 and a CD-ROM drive.
    Mostly played games like Wolfenstein 3D and similar.
    QBASIC era started;
    Got my own PC;
    Started writing some stuff in real-mode ASM;
    Started moving from QBASIC to C;
    Windows 95 appeared;
    Moved (~ end of Elementary School era, following 6th grade);
    Tried using Win95, but it sucked.
    Jumped to NT4;
    Started Middle School.
    TV Shows in this era: "Star Trek: Voyager" and "Deep Space Nine".
    High School:
    Jumped to Windows 2000.
    "Star Trek: Enterprise" (but... it sucked...).
    ...

    *1: Was likely unexpected that I would make an issue about no one being
    able to read, but I think at the time, I was not particularly impressed
    by being shown cards with letters and similar, so...

    I don't remember my very early years though (my span of memory seemingly starting in the NES era).


    But, alas, my life since then was basically a failure...

    I guess back then, maybe people didn't realize it yet.

    Well, starting in middle school, stuff sucked a lot more. Just had to
    sit around classes and basically say nothing to draw undue attention,
    which sucked (was nicer when people just let me go off and mess with
    computers and similar). Still never really did much schoolwork though,
    just did tests when they came along (though, this strategy was an epic
    fail for college level calculus classes though... I just sorta figured I
    was too stupid for this stuff...).

    ...


    Intel presumably thought Itanium would be the only merchant
    64-bit ISA that mattered (and this would exclude AMD) and
    that the masses could use 32-bit until less expensive Itanium
    processors were possible.

    Pretty much. Then the struggle to make Itanium run fast became the overpowering concern, until they gave up and concentrated on x86-64,
    claiming that Itanium would be back in a few years.

    I don't think many people took that claim seriously. Some years later, an Intel marketing man was quite shocked to hear that, and that the world
    had simply been humouring them.


    In a way, it showed that they screwed up the design pretty hard that
    x86-64 ended up being the faster and more efficient option...

    I guess one question is if they had any other particular drawbacks other
    than, say:
    Their code density was one of the worst around;
    128 registers is a little excessive;
    128 predicate register bits is a bit WTF;
    ...


    I guess it is more of an open question of what would have happened, say,
    if Intel had gone for an ISA design more like ARM64 or RISC-V or something.

    These don't seem like they would have been too out-there, but then
    again, at the time there were also some of the WTF design choices that
    existed in MIPS and Alpha, so no guarantee it wouldn't have been screwed up.


    Well, or something like PowerPC, but then again, IBM still had
    difficulty keeping PPC competitive, so dunno. Then again, I think IBM's
    PPC issues were more related to trying to keep up in the chip fab race
    that was still going strong at the time, rather than an ISA design issue.


    I agree that such would add complexity, but there is already
    complexity for power saving with same ISA heterogeneity. NUMA-
    awareness, cache sharing, and cache warmth also complicate
    scheduling, so the question becomes how much extra complexity
    does such introduce.

    If the behaviour of Apple's OSes are any guide, complexity is avoided as
    far as possible.


    Unnecessary complexity is best avoided, as it often comes back to bite
    later.


    I still feel an attraction to a market-oriented resource
    management such that threads could both minimize resource use
    (that might be more beneficial to others) and get more than a
    fair-share of resources that are important.

    The difficulty there is that developers will have a very hard time
    creating /measurable/ speed-ups that apply across a wide range of
    different configurations. Companies will therefore be reluctant to put developer hours into it that could go into features that customers are
    asking for.

    John

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Thu Feb 19 19:59:11 2026
    From Newsgroup: comp.arch

    According to BGB <cr88192@gmail.com>:
    /Early/ MS-DOS. That used CPM-like File Control Blocks, and didn't have
    hierarchical directories. ...

    My own limited experience with MS-DOS programming mostly showed them
    using integer file-handles and a vaguely Unix-like interface for file IO
    at the "int 21h" level.

    Yeah, Mark Zbikowski added them along with the tree-structured file system in DOS 2.0.
    He was at Yale when I was, using a Unix 7th edition system I was supporting.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Thu Feb 19 17:04:59 2026
    From Newsgroup: comp.arch

    On 2/19/2026 1:59 PM, John Levine wrote:
    According to BGB <cr88192@gmail.com>:
    /Early/ MS-DOS. That used CPM-like File Control Blocks, and didn't have
    hierarchical directories. ...

    My own limited experience with MS-DOS programming mostly showed them
    using integer file-handles and a vaguely Unix-like interface for file IO
    at the "int 21h" level.

    Yeah, Mark Zbikowski added them along with the tree-structured file system in DOS 2.0.
    He was at Yale when I was, using a Unix 7th edition system I was supporting.


    Looks it up...


    Yeah, my case, I didn't exist yet when the MS-DOS 2.x line came out...

    Did exist for the 3.x line though.
    I don't remember much from those years though.


    Some fragmentary memories implied that (in that era) had mostly been
    watching shows like Care Bears and similar (but, looking at it at a
    later age, found it mostly unwatchable). I think also shows like Smurfs
    and Ninja Turtles and similar, etc.

    Like, at some point, memory breaking down into sort of an amorphous mass
    of things from TV shows all just sort of got mashed together. Not much
    stable memory of things other than fragments of TV shows and such.


    Not sure what the experience is like for most people though.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Thu Feb 19 23:10:00 2026
    From Newsgroup: comp.arch


    My own limited experience with MS-DOS programming mostly showed
    them using integer file-handles and a vaguely Unix-like interface
    for file IO at the "int 21h" level.

    Which is, ironically, in conflict with the "FILE *" interface used
    by C's stdio API.

    However, it's entirely concordant with Unix's lower-level file
    descriptors, as used in the read() and write() calls.

    <https://en.wikipedia.org/wiki/File_descriptor> <https://en.wikipedia.org/wiki/Read_(system_call)>

    The FILE* interface is normally implemented on top of the lower-level
    calls, with a buffer in the process' address space, managed by the C
    run-time library. The file descriptor is normally a member of the FILE structure.
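    A minimal sketch of that arrangement (assuming POSIX open()/read()/close(); the MYFILE struct and my_* names are made up for illustration, not the real stdio internals): the descriptor lives inside the stream object, and read() is only called when the user-space buffer needs refilling.

```c
/* Hypothetical FILE*-style buffered reader over a Unix file descriptor.
 * Not real stdio internals; just the structure described above. */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

typedef struct {
    int fd;                  /* underlying file descriptor, as in real FILE */
    unsigned char buf[4096]; /* user-space buffer managed by the "library"  */
    size_t pos, len;         /* read position and count of valid bytes      */
} MYFILE;

int my_open(MYFILE *f, const char *path) {
    f->fd = open(path, O_RDONLY);
    f->pos = f->len = 0;
    return f->fd < 0 ? -1 : 0;
}

int my_getc(MYFILE *f) {
    if (f->pos == f->len) {  /* buffer exhausted: one syscall refills it */
        ssize_t n = read(f->fd, f->buf, sizeof f->buf);
        if (n <= 0)
            return -1;       /* EOF or error, like stdio's EOF */
        f->len = (size_t)n;
        f->pos = 0;
    }
    return f->buf[f->pos++];
}

void my_close(MYFILE *f) { close(f->fd); }
```

    Most reads then cost only an index increment; the syscall (or, under DOS, the int 21h call) happens once per buffer-full.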

    MS-DOS is not a great design, but it isn't crazy either.

    Well, apart from some vague (unconfirmed) memories of being exposed
    to Pascal via the "Mac Programmer's Workbench" thing at one point
    and being totally lost (was very confused, used a CLI but the CLI
    commands didn't make sense).

    I used it very briefly. It was a very weird CLI, seemingly designed by
    someone opposed to the basic idea of a CLI.

    In a way, it showed that they screwed up the design pretty hard
    that x86-64 ended up being the faster and more efficient option...

    They did. They really did.

    I guess one question is if they had any other particular drawbacks
    other than, say:
    Their code density was one of the worst around;
    128 registers is a little excessive;
    128 predicate register bits is a bit WTF;

    Those huge register files had a lot to do with the low code density. They
    had two much bigger problems, though.

    They'd correctly understood that the low speed of affordable dynamic RAM
    as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was to have the compiler schedule loads
    well in advance. They assumed, without evidence, that a compiler with
    plenty of time to think could schedule loads better than hardware doing
    it dynamically. It's an appealing idea, but it's wrong.

    It might be possible to do that effectively in a single-core,
    single-thread, single-task system that isn't taking many (if any)
    interrupts. In a multi-core system, running a complex operating system,
    several multi-threaded applications, and taking frequent interrupts and
    context switches, it is _not possible_. There is no knowledge of any of
    the interrupts, context switches or other applications at compile time,
    so the compiler has no idea what is in cache and what isn't. I don't
    understand why HP and Intel didn't realise this. It took me years, but I
    am no CPU designer.

    Speculative execution addresses that problem quite effectively. We don't
    have a better way, almost thirty years after the Itanium design decisions
    were taken. They didn't want to do speculative execution, and they chose
    an instruction format and register set that made adding it later hard. If
    it was ever tried, nothing was released that had it AFAIK.

    The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
    enough ILP to keep those pipelines fed, or they'd just eat cache capacity
    and memory bandwidth executing no-ops ... in a very bulky instruction set.
    They didn't have a general way to extract enough ILP. Nobody does, even
    now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.

    There was also an architectural misfeature with floating-point advance
    loads that could make them disappear entirely if there was a call
    instruction between an advance-load instruction and the corresponding check-load instruction. That cost me a couple of weeks working out and reporting the bug, which was unfixable. The only work-around was to
    re-issue all outstanding floating-point advance-load instructions
    after each call returned. The effective code density went down further,
    and there were lots of extra read instructions issued.

    I guess it is more of an open question of what would have happened,
    say, if Intel had gone for an ISA design more like ARM64 or RISC-V
    or something.

    ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998. RISC-V has
    not demonstrated really high performance yet, and it's been around long
    enough that I'm starting to doubt it ever will.

    Well, or something like PowerPC, but then again, IBM still had
    difficulty keeping PPC competitive, so dunno. Then again, I think
    IBM's PPC issues were more related to trying to keep up in the chip
    fab race that was still going strong at the time, rather than an
    ISA design issue.

    I think that was fabs, rather than architecture. While I was providing libraries for PowerPC (strictly, POWER4, POWER5 and POWER6, one after
    another) it always had rather decent performance for its clockspeed and process.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Feb 20 00:06:35 2026
    From Newsgroup: comp.arch


    jgd@cix.co.uk (John Dallman) posted:


    They did. They really did.

    I guess one question is if they had any other particular drawbacks
    other than, say:
    Their code density was one of the worst around;
    128 registers is a little excessive;
    128 predicate register bits is a bit WTF;

    Those huge register files had a lot to do with the low code density. They
    had two much bigger problems, though.

    They'd correctly understood that the low speed of affordable dynamic RAM
    as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was to have the compiler schedule loads well in advance. They assumed, without evidence, that a compiler with
    plenty of time to think could schedule loads better than hardware doing
    it dynamically. It's an appealing idea,

    possibly

    but it's wrong.

    at best we can say that their version failed to provide performance.
    The future may well prove it flat-out wrong at some point.

    It might be possible to do that effectively in a single-core,
    single-thread, single-task system that isn't taking many (if any)
    interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
    the interrupts, context switches or other applications at compile time,
    so the compiler has no idea what is in cache and what isn't. I don't understand why HP and Intel didn't realise this. It took me years, but I
    am no CPU designer.

    At the time of conception, there were many arguments that {sooner or
    later} compilers COULD figure stuff like this out. Now, 30 years later,
    the compilers are still in the position of having made LITTLE progress.

    I suspect a big part of the problem was tension between Intel and HP,
    where the only political solution was allowing the architects from both
    sides to "dump in" their favorite ideas. A recipe for disaster.

    Speculative execution addresses that problem quite effectively. We don't
    have a better way, almost thirty years after Itanium design decisions
    were taken. They didn't want to do speculative execution, and they chose
    an instruction format and register set that made adding it later hard. If
    it was ever tried, nothing was released that had it AFAIK.

    The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
    enough ILP to keep those pipelines fed, or they'd just eat cache capacity
    and memory bandwidth executing no-ops ... in a very bulky instruction set. They didn't have a general way to extract enough ILP. Nobody does,

    Reservation stations* provide such--but they do not use a multiplicity of
    in order pipelines.

    (*) and similar.

    even
    now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.

    There was also an architectural misfeature with floating-point advance
    loads that could make them disappear entirely if there was a call
    instruction between an advance-load instruction and the corresponding check-load instruction. That cost me a couple of weeks working out and reporting the bug, which was unfixable. The only work-around was to
    re-issue all outstanding floating-point advance-load instructions
    after each call returned. The effective code density went down further,
    and there were lots of extra read instructions issued.

    LoL, I guess I am surprised that the same could not happen at interrupt
    or exception....

    I guess it is more of an open question of what would have happened,
    say, if Intel had gone for an ISA design more like ARM64 or RISC-V
    or something.

    ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998. RISC-V has
    not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.

    In my humble opinion, there is a lot less wrong with ARM than RISC-V

    Well, or something like PowerPC, but then again, IBM still had
    difficulty keeping PPC competitive, so dunno. Then again, I think
    IBM's PPC issues were more related to trying to keep up in the chip
    fab race that was still going strong at the time, rather than an
    ISA design issue.

    I think that was fabs, rather than architecture.

    I suspect it was the cash-flow the product produced that limited
    "development" ... Whereas x86 and ARM have the kind of cash flow
    that allows/supports whatever the designers can invent that adds
    performance.

    While I was providing libraries for PowerPC (strictly, POWER4, POWER5 and POWER6, one after another) it always had rather decent performance for its clockspeed and process.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Thu Feb 19 22:35:45 2026
    From Newsgroup: comp.arch

    At the time of conception, there were many arguments that {sooner or
    later} compilers COULD figure stuff like this out.

    I can't remember seeing such arguments coming from compiler people, tho.

    Now, 30 years later the compilers are still in the position of having
    made LITTLE progress.

    And, to be honest, compiler people had been working on similar problems
    for 30 years already, so most compiler people aren't surprised that 30
    more made no significant difference.

    I suspect a big part of the problem was tension between Intel and HP,
    where the only political solution was allowing the architects from both
    sides to "dump in" their favorite ideas. A recipe for disaster.

    The odd thing is that these were hardware companies betting on "someone
    else" solving their problem, yet if compiler people truly had managed to
    solve those problems, then other hardware companies could have taken
    advantage just as well.

    So from a commercial strategy it made very little sense.

    To me the main question is whether they were truly confused and just got
    lucky (lucky because they still managed to sell their idea enough that
    most RISC companies folded), or whether they truly understood that the
    actual technical success of the architecture didn't matter and that it
    was just a clever way to kill the RISC architectures.


    === Stefan
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Feb 20 15:14:08 2026
    From Newsgroup: comp.arch

    BGB wrote:
    On 2/19/2026 1:59 PM, John Levine wrote:
    According to BGB-a <cr88192@gmail.com>:
    /Early/ MS-DOS. That used CPM-like File Control Blocks, and didn't have hierarchical directories. ...

    My own limited experience with MS-DOS programming mostly showed them
    using integer file-handles and a vaguely Unix-like interface for file IO
    at the "int 21h" level.

    Yeah, Mark Zbikowski added them along with the tree-structured file
    system in DOS 2.0.
    He was at Yale when I was, using a Unix 7th edition system I was
    supporting.


    Looks it up...


    Yeah, my case, I didn't exist yet when the MS-DOS 2.x line came out...

    Did exist for the 3.x line though.
    I don't remember much from those years though.


    Some fragmentary memories implied that (in that era) had mostly been watching shows like Care Bears and similar (but, looking at it at a
    later age, found it mostly unwatchable). I think also shows like Smurfs
    and Ninja Turtles and similar, etc.

    Like, at some point, memory breaking down into sort of an amorphous mass
    of things from TV shows all just sort of got mashed together. Not much stable memory of things other than fragments of TV shows and such.


    Not sure what the experience is like for most people though.
    My memory from before the age of 4 is extremely spotty, just a couple of situations that made a lasting impact.
    By the time MSDOS 2.0 came out I had already handed in my MSEE thesis. :-)
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Feb 20 15:29:27 2026
    From Newsgroup: comp.arch

    On 2/19/2026 5:10 PM, John Dallman wrote:
    My own limited experience with MS-DOS programming mostly showed
    them using integer file-handles and a vaguely Unix-like interface
    for file IO at the "int 21h" level.

    Which is, ironically, in conflict with the "FILE *" interface used
    by C's stdio API.

    However, it's entirely concordant with Unix's lower-level file
    descriptors, as used in the read() and write() calls.

    <https://en.wikipedia.org/wiki/File_descriptor> <https://en.wikipedia.org/wiki/Read_(system_call)>

    The FILE* interface is normally implemented on top of the lower-level
    calls, with a buffer in the process' address space, managed by the C
    run-time library. The file descriptor is normally a member of the FILE structure.

    MS-DOS is not a great design, but it isn't crazy either.


    Yeah.


    Well, apart from some vague (unconfirmed) memories of being exposed
    to Pascal via the "Mac Programmer's Workbench" thing at one point
    and being totally lost (was very confused, used a CLI but the CLI
    commands didn't make sense).

    I used it very briefly. It was a very weird CLI, seemingly designed by someone opposed to the basic idea of a CLI.


    My vague memory was that its commands were just sort of straight-up paradoxical. I don't remember much about them now though, other than
    being confused trying to look at them (or getting anything to happen).


    But, yeah, at my level of mental development at the time, whole
    experience was confusing. Also it using external drives (sorta like on
    the Apple II) but connected up with what looked like printer cables.

    But, I don't really know exactly why the guy with the computer showed
    up, or why he left, but he didn't seem pleased in any case.


    Exact timeline is fuzzy, but I do remember enough familiarity with
    MS-DOS to recognize they were almost completely different.

    And it was unlike the Apple II/e, which had essentially just used BASIC.


    But, either way, the experience (of MPW weirdness) was not something I
    would have been ready for at that stage of development.

    Well, and apparently a detail I missed in all of this, being that one
    didn't just do a SHIFT+RETURN, but it was apparently necessary to select
    the text for the command one wanted to run (with the mouse) before
    hitting SHIFT+RETURN (or, hitting the keys without selecting something
    first does nothing). Could be related to my difficulties/bewilderment at
    the time (compared with DOS, which was more like "type command and hit ENTER").

    Somehow I didn't remember anything about the "select command first"
    part. More seeing it like "click on the command window and do keyboard shortcuts" and then having it not work.


    But, I guess, some memories of mine, namely the thing of needing to do a ritual of dragging the drive to the trash-can and then also pushing a
    button on the front of the drive, are reasonably correct for those drives.

    Well, vs the 3.5" drive: Drag to trash, it ejects itself.

    Or, DOS/Windows/etc, wait until drive stops, press button to eject disk.
    Was very important in this case though to drag the drive to the trash
    and then wait for the light on the drive to go off, then press the eject button (and with a good solid press, the disk ejects).

    Also, it using a black-and-white monitor in an era where most others
    around were color (though with a typically lower screen resolution).



    Does seem like a sort of weird almost surreal memory.

    Does imply that my younger self was notable, and not seen as just some otherwise worthless nerd.

    Even if I totally failed at the tasks the guy had wanted from me.

    So, I was confused, and the guy left in frustration.



    In a way, it showed that they screwed up the design pretty hard
    that x86-64 ended up being the faster and more efficient option...

    They did. They really did.


    Yeah.


    I guess one question is if they had any other particular drawbacks
    other than, say:
    Their code density was one of the worst around;
    128 registers is a little excessive;
    128 predicate register bits is a bit WTF;

    Those huge register files had a lot to do with the low code density. They
    had two much bigger problems, though.

    They'd correctly understood that the low speed of affordable dynamic RAM
    as compared to CPUs running at hundreds of MHz was the biggest barrier to making code run fast. Their solution was to have the compiler schedule loads well in advance. They assumed, without evidence, that a compiler with
    plenty of time to think could schedule loads better than hardware doing
    it dynamically. It's an appealing idea, but it's wrong.


    My CPU core doesn't do speculative prefetch either, but this seems more
    like a "big OoO CPU" feature.

    There is a sort of very limited/naive prefetch, where if it
    guesses that one line of a line pair is likely to be followed by an
    access to the following line in the pair (via heuristics), it will
    prefetch the following line. This can help with things like linear
    memory walks.


    Could be better if there was a good/reliable way to detect linear walks.

    Say, ideal case would be that in linear walk scenarios, most of the
    memory fetches for the walk are via prefetches (while limiting the
    number of hard misses).

    For the L1 I$, one can assume linear walking by default.

    Though, arguably the effectiveness of a prefetch is reduced in cases
    where the hard-miss is likely to happen before the result of the
    prefetch arrives (even if it is an L2 hit), but does maybe give the L2
    cache a few cycles of "heads up" in the case of an L2 miss.
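    As a toy model of that sort of heuristic (plain C, purely illustrative, not the actual core): on a demand access to line N, if line N-1 is already resident, assume a linear walk and pull in line N+1. On a pure linear walk, every access after the first two then lands on a prefetched line.

```c
/* Toy next-line prefetcher model (illustrative only): a "cache" of
 * one presence bit per line, plus a demand-miss counter. */
#include <stdbool.h>

#define LINES 1024
bool cached[LINES];     /* presence bit per cache line */
int demand_misses;      /* misses seen by the pipeline */

void access_line(int n) {
    if (!cached[n]) {   /* demand miss: fetch the line */
        demand_misses++;
        cached[n] = true;
    }
    /* heuristic: previous line resident => looks like a linear walk,
     * so prefetch the following line ahead of the demand stream */
    if (n > 0 && n + 1 < LINES && cached[n - 1])
        cached[n + 1] = true;
}
```

    Walking lines 0..99 with this model takes only two demand misses (lines 0 and 1); everything after that is covered by the prefetch, which is the ideal case described above.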


    In my case, as noted, I ended up using 64 registers, but can note:
    32 is near-optimal for generic code;
    Works well for 32-bit instruction words;
    64 deals better with high-pressure scenarios;
    Is a little tight for 32-bit instruction words;
    128 is likely invariably overkill
    Not particularly viable with 32-bit instruction words.


    Using register-paired types does result in "spikes" in register
    pressure, and is a strong case where supporting 64 registers makes sense
    (eg, so code generation doesn't get "owned"/"pwnt" when dealing with
    int128 or paired-128-bit-SIMD).

    Though, in the case of paired 128-bit ops, the even-registers-only rule
    does have a side benefit of allowing for use of 5-bit register fields
    while accessing all 64 registers (though still leaves a pain point when accessing one of the 64-bit halves of the pair, say if it happens to be
    on the "wrong side" in the case of an ISA like RISC-V).


    For 128 predicate registers, this part doesn't make as much sense:
    Typically, 1 predicate bit is sufficient;
    When exploring schemes for more advanced predication (Eg, 2 or 7/8
    predicate registers), they didn't really even hit break-even.

    Even if going for an IA64 like approach, probably made more sense to
    have gone with an 8-register config, say:
    P0: Hard-wired to 1/True
    P1..P7: Dynamic Predicates


    But, as noted, it was uncommon to find scenarios where having more than
    a single predicate bit offered enough of an advantage over one predicate
    bit to make it worthwhile, so the single-bit scheme seemed to remain the
    most viable (with some more complex scenarios instead using GPRs for
    boolean logic tasks, even if using a GPR for boolean logic tasks is
    arguably wasteful).

    For XG3, had ended up with a scenario where directing Boolean operations
    to X0/R0 was understood as updating the predicate bit:
    SLT, SGE, SEQ, SNE: Rd=X0, Sets/Clears SR.T
    AND/OR: Rd=X0, Also modifies SR.T (understood as a Boolean op).
    Contrast (Rd=X0):
    ADD/ADDI: NOP
    LHU/LWU: Reserved for Mode-Hops (XG3 supported) / NOP (unsupported).
    LHU: Jumps to RV64GC Mode (behaves like a JALR with Rd=X0)
    LWU: Jumps to XG3 Mode (behaves like a JALR with Rd=X0)
    Both being fall-through if XG3 is not supported.
    If it doesn't branch, it means only RISC-V ops are supported.


    Currently, the detection features are not used, as they only really make
    sense in a mixed-mode binary that could potentially be used on a plain
    RISC-V target.

    But, in other contexts, the typical pattern is to use pointer tagging,
    where:
    (0)=0: Jump to an address within the same mode, (63:48) ignored
    (1)=1: Jump with possible mode change, (63:48)=mode

    One other special feature is that the mode bits also encode a tag, which
    can be used to mark a pointer with the current process (with a value
    assigned by an RNG), with the LSB also being required to be set, if Rs1==X1.

    This can be used to add resistance against stack-stomping via buffer overflows, but is potentially risky with RISC-V:
    AUIPC X1, AddrHi
    JALR X0, AddrLo(X1)
    Can nuke the process, when officially it is allowed (vs forcing the use
    of a different register to encode a long branch).

    Where, for other contexts, AUIPC would necessarily need to produce an
    untagged address.
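    As a rough model of the tagging scheme described above (the field positions, the RNG-assigned per-process tag, and the check policy are my assumptions based on the description, not a spec):

```c
#include <stdint.h>

/* Illustrative model: bit 0 marks a tagged jump target, and bits 63:48
   carry a per-process tag chosen at random at process start. */
#define TAG_SHIFT 48
#define TAG_MASK  (0xFFFFull << TAG_SHIFT)

uint64_t tag_pointer(uint64_t addr, uint16_t process_tag)
{
    /* Stamp the process tag into the high bits and set the LSB. */
    return (addr & ~TAG_MASK) | ((uint64_t)process_tag << TAG_SHIFT) | 1u;
}

int check_pointer(uint64_t ptr, uint16_t process_tag)
{
    /* A tagged (LSB=1) pointer must carry the current process tag; a
       mismatch suggests the value was overwritten, e.g. by a buffer
       overflow stomping a saved link register. */
    if ((ptr & 1u) == 0)
        return 1;   /* untagged: same-mode jump, high bits ignored */
    return ((ptr & TAG_MASK) >> TAG_SHIFT) == process_tag;
}
```

    Under this model, an AUIPC-built address (which would carry PC-relative high bits rather than the tag) fails the check when it lands in a checked register, which is the RISC-V hazard described above.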


    It might be possible to do that effectively in a single-core,
    single-thread, single-task system that isn't taking many (if any)
    interrupts. In a multi-core system, running a complex operating system, several multi-threaded applications, and taking frequent interrupts and context switches, it is _not possible_. There is no knowledge of any of
    the interrupts, context switches or other applications at compile time,
    so the compiler has no idea what is in cache and what isn't. I don't understand why HP and Intel didn't realise this. It took me years, but I
    am no CPU designer.


    No idea there, but either way, seems like a difficult problem.


    Speculative execution addresses that problem quite effectively. We don't
    have a better way, almost thirty years after Itanium design decisions
    were taken. They didn't want to do speculative execution, and they chose
    an instruction format and register set that made adding it later hard. If
    it was ever tried, nothing was released that had it AFAIK.

    The other problem was that they had three (or six, or twelve) in-order pipelines running in parallel. That meant the compilers had to provide
    enough ILP to keep those pipelines fed, or they'd just eat cache capacity
    and memory bandwidth executing no-ops ... in a very bulky instruction set. They didn't have a general way to extract enough ILP. Nobody does, even
    now. They just assumed that with an army of developers they'd find enough heuristics to make it work well enough. They didn't.


    Yeah...

    In my case, there is only 1 pipeline per core for now.
    But ISA is still mostly RISC-like.


    Not so much the 128-bits with 3-instructions thing, and then needing to
    NOP pad if one can't find 3 useful instructions which fit into the pipeline.

    My compiler would probably also be pretty awful if trying to target IA64.


    Though did get around to re-adding a repurposed version of the WEXifier
    for XG3 and RV, though its purpose was a little different in that these
    ISA's have no way to flag for parallel execution, so the purpose is more
    to shuffle instructions around to try to reduce register-RAW
    dependencies and to help out the in-order superscalar stuff.


    There was also an architectural misfeature with floating-point advance
    loads that could make them disappear entirely if there was a call
    instruction between an advance-load instruction and the corresponding check-load instruction. That cost me a couple of weeks working out and reporting the bug, which was unfixable. The only work-around was to
    re-issue all outstanding floating-point advance-load instructions
    after each call returned. The effective code density went down further,
    and there were lots of extra read instructions issued.

    I guess it is more of an open question of what would have happened,
    say, if Intel had gone for an ISA design more like ARM64 or RISC-V
    or something.

    ARM64 seems to me to be the product of a lot more experience with speculatively-executing processors than was available in 1998. RISC-V has
    not demonstrated really high performance yet, and it's been around long enough that I'm starting to doubt it ever will.


    There seem to be some questionable design choices here, and also a lot
    of foot dragging for things that could help.


    They also seem to be relatively focused on the assumption of CPUs having low-latency ALU and Memory-Load ops, which seems like a dangerous
    assumption to make.


    Like, how about one not try to bake in assumptions about 1-cycle ALU and 2-cycle Load being practical?...

    Vs, say, 2-cycle ALU ops and 3-cycle Loads; with an ideal of putting 5 instructions between an instruction that generates a result and the instruction that consumes the result as this is more likely to work with in-order superscalar.


    But, then one runs into the issue that if a basic operation then
    requires a multi-op sequence, the implied latency goes up considerably
    (say, could call this "soft latency", or SL).

    So, for example, it means that, say:
    2-instruction sign extension:
    RV working assumption: 2 cycles
    Hard latency (2c ALU): 4 cycles
    Soft latency: 12 cycles.
    For a 3-op sequence, the effective soft-latency goes up to 18, ...

    And, in cases where the soft-latency significantly exceeds the total
    length of the loop body, it is no longer viable to schedule the loop efficiently.

    So, in this case, an indexed-load instruction has an effective 9c SL,
    whereas SLLI+ADD+LD has a 21 cycle SL.
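    The numbers above are consistent with this "soft latency" being the hard latency scaled by a factor of 3 (matching the stated ideal of several independent instructions between producer and consumer). A small sanity check under that assumption, with the per-op latencies taken from the text:

```c
/* Reconstructing the arithmetic in the examples above, assuming
   (as the numbers imply) that soft latency = hard latency * 3:
   each producing cycle ideally hides three issue slots of
   independent work on this in-order design. */
enum { ALU_LAT = 2, LOAD_LAT = 3, SL_SCALE = 3 };

int soft_latency(int n_alu_ops, int n_loads)
{
    int hard = n_alu_ops * ALU_LAT + n_loads * LOAD_LAT;
    return hard * SL_SCALE;
}
```

    This reproduces the figures quoted: a 2-op sign extension gives 12, a 3-op sequence 18, a single indexed load 9, and SLLI+ADD+LD 21.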


    where, in this case, the goal of something like the WEXifier is to
    minimize this soft-latency cost (in cases where a dependency is seen,
    any remaining soft-latency is counted as penalty).

    But, then again, maybe the concept of this sort of "soft latency" seems
    a bit alien.


    Granted, not sure how this maps over to OoO, but had noted that even
    with modern CPUs, there still seems to be benefit from assuming a sort
    of implicit high latency for instructions over assuming a lower latency.



    Well, or something like PowerPC, but then again, IBM still had
    difficulty keeping PPC competitive, so dunno. Then again, I think
    IBM's PPC issues were more related to trying to keep up in the chip
    fab race that was still going strong at the time, rather than an
    ISA design issue.

    I think that was fabs, rather than architecture. While I was providing libraries for PowerPC (strictly, POWER4, POWER5 and POWER6, one after another) it always had rather decent performance for its clockspeed and process.


    OK.

    I guess it is a question here of what if IBM had outsourced their fab
    stuff earlier.

    Though, there is still the potential downside of licensing based
    production (say, if they went for something more like in the ARM model),
    which is possibly worse than the argued threat of vendor-based market fragmentation (the usual counter-argument against RISC-V, *1).

    *1: Where people argue that if each vendor can do a CPU with their own
    custom ISA variants and without needing to license or get approval from
    a central authority, that invariably everything would decay into an
    incoherent mess where there is no binary compatibility between
    processors from different vendors (usual implication being that people
    are then better off staying within the ARM ecosystem to avoid RV's lawlessness).


    John


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Feb 20 23:49:54 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 2/19/2026 5:10 PM, John Dallman wrote:
    ------------------------------------
    This can be used to add resistance against stack-stomping via buffer overflows, but is potentially risky with RISC-V:
    AUIPC X1, AddrHi
    JALR X0, AddrLo(X1)
    Can nuke the process, when officially it is allowed (vs forcing the use
    of a different register to encode a long branch).

    That should be:
    AUIPC x1,hi(offset)
    JALR x0,lo(offset)

    using:
    SETHI x1,AddrHi
    JALR x0,AddrLo

    would work.

    ---------------------
    Like, how about one not try to bake in assumptions about 1-cycle ALU and 2-cycle Load being practical?...

    for the above to work::
    ALU is < -+ cycle leaving -+ cycle output drive and -+ cycle input mux
    SRAM is -+ cycle, AGEN to SRAM decode is -+ cycle, SRAM output to shifter
    is < -+ cycle, and set-selection is -+ cycle; leaving -+ cycle for output drive.

    Vs, say, 2-cycle ALU ops and 3-cycle Loads; with an ideal of putting 5 instructions between an instruction that generates a result and the instruction that consumes the result as this is more likely to work with in-order superscalar.

    1-cycle ALU with 3 cycle LD is not very hard at 16-gates per cycle.
    2-cycle LD is absolutely impossible with 1-cycle addr-in to data-out
    SRAM. So, we generally consider any design with 2-cycle LD to be
    frequency limited.

    But, then one runs into the issue that if a basic operation then
    requires a multi-op sequence, the implied latency goes up considerably
    (say, could call this "soft latency", or SL).

    So, for example, it means that, say:
    2-instruction sign extension:
    RV working assumption: 2 cycles
    Hard latency (2c ALU): 4 cycles
    Soft latency: 12 cycles.
    For a 3-op sequence, the effective soft-latency goes up to 18, ...

    One of the reasons a 16-gate design works better in practice than
    a 12-gate design. And why a 1-cycle ALU, 3-cycle LD runs at higher
    frequency.

    And, in cases where the soft-latency significantly exceeds the total
    length of the loop body, it is no longer viable to schedule the loop efficiently.

    In software, there remains no significant problem running the loop
    in HW.

    So, in this case, an indexed-load instruction has an effective 9c SL, whereas SLLI+ADD+LD has a 21 cycle SL.

    3-cycle indexed LD with cache hit in many µArchitectures--with scaled
    indexing. This is one of the driving influences of "raising" the
    semantic content of LD/ST instructions to [Rbase+Rindex<<sc+Disp]

    where, in this case, the goal of something like the WEXifier is to
    minimize this soft-latency cost (in cases where a dependency is seen,
    any remaining soft-latency is counted as penalty).

    But, then again, maybe the concept of this sort of "soft latency" seems
    a bit alien.

    Those ISAs without scaled indexing have longer effective latency through
    cache than those with: those without full range Disp have similar problems: those without both are effectively adding 3-4 cycles to LD latency.

    Which is why the size of the execution windows grew from 60-ish to 300-ish
    to double performance--the ISA is adding latency and the size of execution window is the easiest way to absorb such latency.
    {{60-ish ~= Athlon; 300-ish ~= M4}}

    Granted, not sure how this maps over to OoO, but had noted that even
    with modern CPUs, there still seems to be benefit from assuming a sort
    of implicit high latency for instructions over assuming a lower latency.

    Execution window size is how it maps.

    *1: Where people argue that if each vendor can do a CPU with their own custom ISA variants and without needing to license or get approval from
    a central authority, that invariably everything would decay into an incoherent mess where there is no binary compatibility between
    processors from different vendors (usual implication being that people
    are then better off staying within the ARM ecosystem to avoid RV's lawlessness).

    RISC-V seems to be "eating" a year (or a bit more) to bring this mess into
    a coherent framework.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Feb 21 01:00:05 2026
    From Newsgroup: comp.arch

    On 2/20/2026 5:49 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/19/2026 5:10 PM, John Dallman wrote:
    ------------------------------------
    This can be used to add resistance against stack-stomping via buffer
    overflows, but is potentially risky with RISC-V:
    AUIPC X1, AddrHi
    JALR X0, AddrLo(X1)
    Can nuke the process, when officially it is allowed (vs forcing the use
    of a different register to encode a long branch).

    That should be:
    AUIPC x1,hi(offset)
    JALR x0,lo(offset)

    using:
    SETHI x1,AddrHi
    JALR x0,AddrLo

    would work.


    Usual notation seems to be that AUIPC uses a direct-immediate notation
    (eg "AUIPC X1, 0x12345"), and "JALR X0, 0x678(X1)".

    Though, GAS and friends can use:
    AUIPC X1, %hi(symbol)
    JALR X0, X1, %lo(symbol)

    Well, and for JALR:
    JALR Xd, Disp(Xs)
    JALR Xd, Xs, Disp
    Being basically equivalent.
    ...

    If expressing it using a symbol rather than a literal displacement...

    But, either way, it is using X1 that was the relevant point here, which
    is technically allowed in RISC-V, but would explode if one tries to
    constrain X1 to being used as a link register and then also uses
    enforced tag checking in this case.

    In BGBCC's native ASM notation, the symbol case would typically be
    expressed as:
    BRA symbol
    Which then implies the 2-op form if the "symbol is within +/- 1MB" check fails. But, would differ in that this pseudo-op will stomp X5.



    But, yeah, RISC-V ASM notation conventions seem to get a little
    confusing sometimes...

    But, errm, my point wasn't so much about RISC-V's ASM syntax patterns.


    There is a non-zero risk though when one disallows uses that are
    theoretically allowed in the ISA, even if GCC doesn't use them.


    Though, the reason to sanity-check X1 is that it is pretty much
    universally used as the link register, and sanitizing the link-register
    can be used to trap on potential stack-corruption in buffer overflow
    exploits (more so with a compiler that tends not to use stack canary
    checks).


    Well, and in terms of typical ASM notation, there is this mess:
    (Rb) / @Rb / @(Rb) //load/store register
    (Rb, Disp) / Disp(Rb) //load/store disp
    @(Rb, Disp) / @(Disp, Rb) //load/store disp (but with @)
    Then:
    (Rb, Ri) //indexed (element sized index)
    Ri(Rb) //indexed (byte-scaled index)
    (Rb, Ri, Sc) //indexed with scale
    Disp(Rb, Ri) //indexed with displacement
    Disp(Rb, Ri, Sc) //indexed with displacement and scale
    Then:
    @Rb+ / (Rb)+ //post-increment
    @-Rb / -(Rb) //pre-decrement
    @Rb- / (Rb)- //post-decrement
    @+Rb / +(Rb) //pre-increment

    And, in some variants, all the registers prefixed with '%'.

    Comparably, the Intel style notation is more consistent, but I don't
    necessarily want to also throw Intel notation into this particular mix.


    Well, more so as there is an implicit visual hint, say in x86:
    movl 128(%ebx), %eax
    mov eax, [ebx+128]
    Where the notation partly also keys one into the register ordering, but
    if one had Intel style memory notation while using AT&T style ordering,
    this would be a problem (confusing mess).


    Well, or the other messy feature that BGBCC tries to infer the register
    order based on which mnemonics are used:
    OP Rd, Rs1, Rs2 //used if RV mnemonics dominate
    OP Rs1, Rs2, Rd //otherwise

    If [] notation were supported, it would likely signal "dest, source" ordering (like Intel x86 and ARM), though in this case
    [Rb+Disp] and [Rb,Disp] likely being treated as analogous.


    But, alas, kind of a mess...

    And, if trying to mix/match styles, "there be dragons here"...



    ---------------------
    Like, how about one not try to bake in assumptions about 1-cycle ALU and
    2-cycle Load being practical?...

    for the above to work::
    ALU is < -+ cycle leaving -+ cycle output drive and -+ cycle input mux
    SRAM is -+ cycle, AGEN to SRAM decode is -+ cycle, SRAM output to shifter
    is < -+ cycle, and set-selection is -+ cycle; leaving -+ cycle for output drive.

    Vs, say, 2-cycle ALU ops and 3-cycle Loads; with an ideal of putting 5
    instructions between an instruction that generates a result and the
    instruction that consumes the result as this is more likely to work with
    in-order superscalar.

    1-cycle ALU with 3 cycle LD is not very hard at 16-gates per cycle.
    2-cycle LD is absolutely impossible with 1-cycle addr-in to data-out
    SRAM. So, we generally consider any design with 2-cycle LD to be
    frequency limited.


    My stuff mostly assumes:
    ADD and similar: 2 cycles
    Load: 3 cycles.

    In this case, some 1 cycle ops exist:
    MOV Rs, Rd / MV Xd, Xs
    MOV Imm, Rd / LI Xd, Imm

    For the RV and XG3 decoders, some special instructions are decoded as
    one of the above:
    ADDI Xd, Xs, 0 => MV
    ADDI Xd, X0, Imm => LI

    But, most remain as 2/3 cycle.
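    The ADDI special cases above can be sketched against the standard RISC-V I-type encoding (opcode 0x13, funct3 000); the classification names are mine:

```c
#include <stdint.h>

enum op_class { OP_OTHER, OP_MV, OP_LI };

/* Sketch of the decode special-casing described above: a standard
   RV ADDI is steered to a cheaper 1-cycle MV or LI form when its
   operands allow; everything else stays on the normal latency path. */
enum op_class classify_addi(uint32_t insn)
{
    uint32_t opcode = insn & 0x7F;
    uint32_t funct3 = (insn >> 12) & 0x7;
    uint32_t rs1    = (insn >> 15) & 0x1F;
    int32_t  imm    = (int32_t)insn >> 20;   /* sign-extended imm[11:0] */

    if (opcode != 0x13 || funct3 != 0)
        return OP_OTHER;
    if (imm == 0)
        return OP_MV;                        /* ADDI Xd, Xs, 0  => MV  */
    if (rs1 == 0)
        return OP_LI;                        /* ADDI Xd, X0, imm => LI */
    return OP_OTHER;                         /* plain 2-cycle ADDI     */
}
```

    In a real decoder this would be a mux select rather than a function call, but the field extraction is the same.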

    A few instructions had a 4 cycle latency, mostly those which combined a
    Load with a format-conversion or similar.


    But, then one runs into the issue that if a basic operation then
    requires a multi-op sequence, the implied latency goes up considerably
    (say, could call this "soft latency", or SL).

    So, for example, it means that, say:
    2-instruction sign extension:
    RV working assumption: 2 cycles
    Hard latency (2c ALU): 4 cycles
    Soft latency: 12 cycles.
    For a 3-op sequence, the effective soft-latency goes up to 18, ...

    One of the reasons a 16-gate design works better in practice than
    a 12-gate design. And why a 1-cycle ALU, 3-cycle LD runs at higher
    frequency.


    OK.

    I ended up going for a slightly lower clock speed and slightly more
    complex operations because often this resulted in better performance.

    And, while I could probably run an RV32IM core at 100 MHz, I would need
    to pay in other areas.


    And, in cases where the soft-latency significantly exceeds the total
    length of the loop body, it is no longer viable to schedule the loop
    efficiently.

    In software, there remains no significant problem running the loop
    in HW.


    Another traditional option is modulo scheduling, but actually doing so
    in the compiler is more complex (and BGBCC does not do so).

    Can often do a "poor man's" version in C, which, while a less elegant
    solution, often works out better in practice.


    One can try to effectively unroll the loop enough that the latency can
    be covered efficiently, but then the issue may become one of running out
    of working registers, and one doesn't want to unroll so much that the
    code starts thrashing, which tends to hurt worse than the potential loss
    of ILP by being narrower.
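    The "poor man's" C-level version mentioned above might look like the following: unroll a reduction across independent accumulators, so each add depends on a value produced several instructions earlier rather than on the immediately preceding add. The unroll factor of 4 is an arbitrary choice for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Manual unrolling with multiple accumulators: four independent
   dependency chains cover the ALU latency without the compiler
   having to do modulo scheduling. */
int64_t sum_unrolled(const int64_t *a, size_t n)
{
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)            /* scalar tail */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

    Pushing the factor much higher runs into exactly the register-pressure and code-thrashing tradeoff described above.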



    So, in this case, an indexed-load instruction has an effective 9c SL,
    whereas SLLI+ADD+LD has a 21 cycle SL.

    3-cycle indexed LD with cache hit in many µArchitectures--with scaled indexing. This is one of the driving influences of "raising" the
    semantic content of LD/ST instructions to [Rbase+Rindex<<sc+Disp]


    Yeah, pretty much, or at least [Rb+Ri<<Sc], with the full [Rb+Ri<<Sc+Disp] case often being uncommon IME.

    Well, with a possible exception of [GP+Ri<<Sc+Disp] which would see a localized spike due to:
    someGlobalArray[index]

    As-is, this case tends to manifest in my case as, say:
    LEA.Q (GP, Disp), R5
    MOV.Q (R5, R10), R11
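    For reference, the [Rb+Ri<<Sc+Disp] form under discussion computes the same effective address that the shift-and-add fallback synthesizes across multiple dependent instructions; a one-line model:

```c
#include <stdint.h>

/* Model of the [Rbase + (Rindex << scale) + disp] addressing form.
   A core without it must build the same address with a separate shift
   and add (e.g. SLLI+ADD+LD in RV terms), paying extra dependent-op
   latency before the load can even begin. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t disp)
{
    return base + (index << scale) + (uint64_t)disp;
}
```

    The LEA.Q + MOV.Q pair above is the two-instruction split of exactly this computation, folding the displacement into the first op.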


    where, in this case, the goal of something like the WEXifier is to
    minimize this soft-latency cost (in cases where a dependency is seen,
    any remaining soft-latency is counted as penalty).

    But, then again, maybe the concept of this sort of "soft latency" seems
    a bit alien.

    Those ISAs without scaled indexing have longer effective latency through cache than those with: those without full range Disp have similar problems: those without both are effectively adding 3-4 cycles to LD latency.

    Which is why the size of the execution windows grew from 60-ish to 300-ish
    to double performance--the ISA is adding latency and the size of execution window is the easiest way to absorb such latency.
    {{60-ish ~= Athlon; 300-ish ~= M4}}


    OK.

    FWIW, there are reasons I have indexed addressing and jumbo-prefixes for larger immediate values and displacements.

    But, seemingly, the idea of deviating from 2R1W and 16/32 instruction encodings fills the RISC-V people with fear.



    Granted, not sure how this maps over to OoO, but had noted that even
    with modern CPUs, there still seems to be benefit from assuming a sort
    of implicit high latency for instructions over assuming a lower latency.

    Execution window size is how it maps.


    OK.


    *1: Where people argue that if each vendor can do a CPU with their own
    custom ISA variants and without needing to license or get approval from
    a central authority, that invariably everything would decay into an
    incoherent mess where there is no binary compatibility between
    processors from different vendors (usual implication being that people
    are then better off staying within the ARM ecosystem to avoid RV's
    lawlessness).

    RISC-V seems to be "eating" a year (or a bit more) to bring this mess into
    a coherent framework.

    Yeah, and while ARC drags their feet,
    Qualcomm/Huawei/ByteDance/T-Head/... each go off and do similar things
    but in different ways...

    If I were organizing it, would likely handle it differently, by having a nested structure:
    Formal / Frozen //parts of the ISA that are fully settled.
    Semi-formal / non-frozen //details subject to change.
    provisional / experimental //very unstable.
    vendor-specific //excluded from standardization.


    In the provisional space, encodings could be defined, but could be
    reclaimed if the feature is "dead"; but would be in encoding blocks
    where they could be standardized later.

    The main difference being that in the provisional space, there would be a
    semi-official website listing registered encodings, rather than these
    encodings being scattered in the ISA documentation for the various
    vendor processors (requiring digging through a bunch of PDFs and so on to
    try to figure out which encodings are already in-use).


    Then sometimes there are encodings that are defined in a way that doesn't
    make sense, like apparently there is a RISC-V core from MIPS
    Technologies, where they went and added Load/Store Pair, but with two
    data source/dest register fields and a very small displacement.

    This contrasts strongly with, say, having even-pair registers and a
    non-tiny displacement (displacement needs to be at least big enough to
    cover a typical load/store area for a prolog/epilog).


    I had at first reused LDU/SDU encodings, but then the proposal for
    Load/Store indexed didn't go with my encoding scheme, but a different
    one that also used LDU/SDU (but, maybe this is ultimately a better place
    to put them; vs my approach of shoving them in an odd corner within the
    'A' extension's block).



    I ended up migrating Load/Store pair to the FLQ/FSQ encodings, partly as
    I had no intention to implement the Q extension as-is (and I needed
    somewhere to relocate them to). But, then this led to the "Pseudo Q" idea.


    In my case, there is XG3, but I consider this in a different category
    as, while it retains compatibility with RV64G and partial with RV64GC
    (mostly by providing for interoperability); it is in some ways a notable departure from "pure" RISC-V (well, in that modes, tagged pointers, and swapping out RV-C encodings for a different 32-bit encoding space, are
    not particularly small additions if one didn't already have a CPU
    designed in this way).


    ...





    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Feb 21 18:41:12 2026
    From Newsgroup: comp.arch

    Stefan Monnier <monnier@iro.umontreal.ca> writes:
    At the time of conception, there were many arguments that {sooner or
    later} compilers COULD figure stuff like this out.

    I can't remember seeing such arguments coming from compiler people, tho.

    Actually, the IA-64 people could point to the work on VLIW (in
    particular, Multiflow (trace scheduling) and Cydrome (software
    pipelining)), which in turn is based on the work on compilers for
    microcode.

    That did not solve memory latency, but that's a problem even for OoO
    cores.

    I suspect a big part of the problem was tension between Intel and HP
    where the only political solution was allowing the architects from both
    sides to "dump in" their favorite ideas. A recipe for disaster.

    The HP side had people like Bob Rau (Cydrome) and Josh Fisher
    (Multiflow), and given their premise, the architecture is ok; somewhat
    on the complex side, but they wanted to cover all the good ideas from
    earlier designs; after all, it was to be the one architecture to rule
    them all (especially performancewise). You cannot leave out a feature
    that a competitor could then add to outperform IA-64.

    The major problem was that the premise was wrong. They assumed that
    in-order would give them a clock rate edge, but that was not the case,
    right from the start (The 1GHz Itanium II (released July 2002)
    competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
    XP (released June 2002)). They also assumed that explicit parallelism
    would provide at least as much ILP as hardware scheduling of OoO CPUs,
    but that was not the case for general-purpose code, and in any case,
    they needed a lot of additional ILP to make up for their clock speed disadvantage.

    The odd thing is that these were hardware companies betting on "someone
    else" solving their problem, yet if compiler people truly had managed to solve those problems, then other hardware companies could have taken advantage just as well.

    I am sure they had patents on stuff like the advanced load and the
    ALAT, so no, other hardware companies would have had a hard time.

    To me the main question is whether they were truly confused and just got lucky (lucky because they still managed to sell their idea enough that
    most RISC companies folded),

    I think most RISC companies had troubles scaling. They were used to
    small design teams spinning out simple RISCs in a short time, and did
    not have the organization to deal with the much larger projects that
    OoO superscalars required. And while everybody inventing their own architecture may have looked like a good idea when developing an
    architecture and its implementations was cheap, it looked like a bad
    deal when development costs started to ramp up in the mid-90s. That's
    why HP went to Intel, and other companies (in particular, SGI) took
    this as an exit strategy from the own-RISC business.

    DEC had increasing delays in their chips, and eventually could not
    make enough money with them and had to sell themselves to Compaq (who
    also could not sustain the effort and sold themselves to HP (who
    canceled Alpha development)). I doubt that IA-64 played a big role in
    that game.

    Back to IA-64: At the time, when OoO was just starting, the premise of
    IA-64 looked plausible. Why wouldn't they see a fast clock rate and
    higher ILP from explicit parallelism than conventional architectures
    would see from OoO (apparently complex, and initially without anything
    like IA-64's ALAT)?

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Feb 21 20:15:34 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 2/20/2026 5:49 PM, MitchAlsup wrote:
    ----------------------------

    There is a non-zero risk though when one disallows uses that are theoretically allowed in the ISA, even if GCC doesn't use them.

    This is why one must decode all 32-bits of each instruction--so that
    there is no hole in the decoder that would allow the core to do
    something not directly specified in ISA. {And one of the things that
    make an industrial quality ISA so hard to fully specify.}
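    The full-decode discipline described above can be sketched as a validity check where every bit position is examined and anything not explicitly specified is rejected (so it can raise an illegal-instruction fault). Only a token subset of the standard RV32I major opcodes is shown here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Reject any 32-bit pattern not explicitly specified, instead of
   letting it fall through a decoder hole and execute "something". */
bool decode_valid(uint32_t insn)
{
    uint32_t opcode = insn & 0x7F;
    uint32_t funct3 = (insn >> 12) & 0x7;

    switch (opcode) {
    case 0x13:                              /* OP-IMM */
        if (funct3 == 1 || funct3 == 5) {   /* SLLI/SRLI/SRAI (RV32):  */
            uint32_t funct7 = insn >> 25;   /* funct7 must be checked  */
            return funct7 == 0x00 || (funct3 == 5 && funct7 == 0x20);
        }
        return true;                        /* other funct3 all defined */
    case 0x37:                              /* LUI   */
    case 0x17:                              /* AUIPC */
        return true;
    default:
        return false;            /* no holes: everything else faults */
    }
}
```

    The point is the `default` arm and the funct7 check: an undefined funct7 pattern is a fault, not a don't-care that aliases to a defined shift.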
    ---------------------

    Well, and in terms of typical ASM notation, there is this mess:
    (Rb) / @Rb / @(Rb) //load/store register
    (Rb, Disp) / Disp(Rb) //load/store disp
    @(Rb, Disp) / @(Disp, Rb) //load/store disp (but with @)
    Then:
    (Rb, Ri) //indexed (element sized index)
    Ri(Rb) //indexed (byte-scaled index)
    (Rb, Ri, Sc) //indexed with scale
    Disp(Rb, Ri) //indexed with displacement
    Disp(Rb, Ri, Sc) //indexed with displacement and scale
    Then:
    @Rb+ / (Rb)+ //post-increment
    @-Rb / -(Rb) //pre-decrement
    @Rb- / (Rb)- //post-decrement
    @+Rb / +(Rb) //pre-increment

    And, in some variants, all the registers prefixed with '%'.

    Leading to SERIAL DECODE--which is BAD.
    -----------------------
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Feb 21 20:38:51 2026
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    Stefan Monnier <monnier@iro.umontreal.ca> writes:
    At the time of conception, there were many arguments that {sooner or
    later} compilers COULD figure stuff like this out.

    I can't remember seeing such arguments coming from compiler people, tho.

    Actually, the IA-64 people could point to the work on VLIW (in
    particular, Multiflow (trace scheduling) and Cydrome (software
    pipelining)), which in turn is based on the work on compilers for
    microcode.

    That did not solve memory latency, but that's a problem even for OoO
    cores.

    I suspect a big part of the problem was tension between Intel and HP,
    where the only political solution was allowing the architects from both
    sides to "dump in" their favorite ideas. A recipe for disaster.

    The HP side had people like Bob Rau (Cydrome) and Josh Fisher
    (Multiflow), and given their premise, the architecture is ok; somewhat
    on the complex side, but they wanted to cover all the good ideas from
    earlier designs; after all, it was to be the one architecture to rule
    them all (especially performancewise). You cannot leave out a feature
    that a competitor could then add to outperform IA-64.

    In this time period, performance was doubling every 14 months, so if a
    feature added x performance it MUST avoid adding more than x/14 months
    to the schedule. If IA-64 was 2 years earlier, it would have been
    competitive--sadly it was not.
    ---------------------
    To me the main question is whether they were truly confused and just got
    lucky (lucky because they still managed to sell their idea enough that
    most RISC companies folded),

    I think most RISC companies had troubles scaling. They were used to
    small design teams spinning out simple RISCs in a short time, and did
    not have the organization to deal with the much larger projects that
    OoO superscalars required.

    Most RISC teams did not have the cubic dollars of revenue to afford the
    team size needed for GBOoO design--nor, BTW, the management expertise
    to run such a large organization efficiently.

    And while everybody inventing their own architecture may have looked like a good idea when developing an
    architecture and its implementations was cheap,

    1-wide, and a bit of 2-wide.

    it looked like a bad
    deal when development costs started to ramp up in the mid-90s. That's
    why HP went to Intel, and other companies (in particular, SGI) took
    this as an exit strategy from the own-RISC business.

    DEC had increasing delays in their chips, and eventually could not
    make enough money with them and had to sell themselves to Compaq (who
    also could not sustain the effort and sold themselves to HP (who
    canceled Alpha development)). I doubt that IA-64 played a big role in
    that game.

    Back to IA-64: At the time, when OoO was just starting, the premise of
    IA-64 looked plausible. Why wouldn't they see a fast clock rate and
    higher ILP from explicit parallelism than conventional architectures
    would see from OoO (apparently complex, and initially without anything
    like IA-64's ALAT)?

    - anton
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Sat Feb 21 14:59:54 2026
    From Newsgroup: comp.arch

    On 2/21/2026 2:15 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/20/2026 5:49 PM, MitchAlsup wrote:
    ----------------------------

    There is a non-zero risk though when one disallows uses that are
    theoretically allowed in the ISA, even if GCC doesn't use them.

    This is why one must decode all 32-bits of each instruction--so that
    there is no hole in the decoder that would allow the core to do some-
    thing not directly specified in ISA. {And one of the things that make
    an industrial quality ISA so hard to fully specify.}}
    ---------------------

    Sometimes there is a tension:
    What is theoretically allowed in the ISA;
    What is the theoretically expected behavior in some abstract model;
    What stuff is actually used by compilers;
    What features or behaviors does one want;
    ...

    Implementing RISC-V strictly as per an abstract model would both limit
    efficiency and hinder some use-cases.

    Then it comes down to "what do compilers do" and "what behaviors could
    an ASM programmer stumble onto unintentionally".

    Stuff like "Program misbehaves or crashes on a fairly mundane piece of
    code" are preferably avoided.


    Alternatives being, say:
    Define which behaviors programs are allowed to rely on;
    Be slightly conservative with how one defines edge cases;
    Avoid over-defining things too far outside the scope of what is actually relevant.

    Sometimes design elegance can become a trap.


    But, OTOH, having special cases for some instructions based on which
    registers or immediate values are used isn't exactly clean or elegant.

    Like, yeah:
    Using X0 or X1 here invokes magic;
    Instruction doesn't work unless X0 or X1;
    ...
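    For a concrete instance of register-number "magic": RISC-V's JAL/JALR
    encodings carry return-address-stack hints keyed on whether rd/rs1 is a
    link register (x1 or x5). A sketch of that hint table from the
    unprivileged spec (the action strings here are illustrative labels, not
    spec wording):

```python
LINK = {1, 5}  # x1 (ra) and x5 (t0) act as link registers for RAS hints

def jalr_ras_hint(rd, rs1):
    """Return-address-stack action implied by JALR rd, rs1 operands."""
    if rd in LINK and rs1 in LINK:
        # rd == rs1 is treated as a plain call; distinct link registers
        # hint a coroutine-style swap: pop the old return, push the new.
        return "push" if rd == rs1 else "pop, then push"
    if rd in LINK:
        return "push"   # looks like a call
    if rs1 in LINK:
        return "pop"    # looks like a return
    return "none"       # plain indirect jump

print(jalr_ras_hint(1, 10))  # push  (call through x10)
print(jalr_ras_hint(0, 1))   # pop   (standard return)
```

    So the same JALR opcode means four different things to the branch
    predictor depending purely on which registers appear in it.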



    Well, and in terms of typical ASM notation, there is this mess:
    (Rb) / @Rb / @(Rb) //load/store register
    (Rb, Disp) / Disp(Rb) //load/store disp
    @(Rb, Disp) / @(Disp, Rb) //load/store disp (but with @)
    Then:
    (Rb, Ri) //indexed (element sized index)
    Ri(Rb) //indexed (byte-scaled index)
    (Rb, Ri, Sc) //indexed with scale
    Disp(Rb, Ri) //indexed with displacement
    Disp(Rb, Ri, Sc) //indexed with displacement and scale
    Then:
    @Rb+ / (Rb)+ //post-increment
    @-Rb / -(Rb) //pre-decrement
    @Rb- / (Rb)- //post-decrement
    @+Rb / +(Rb) //pre-increment

    And, in some variants, all the registers prefixed with '%'.

    Leading to SERIAL DECODE--which is BAD.
    -----------------------

    Depends on what ISA the ASM syntax is actually attached to...
    If it is a VAX, yeah, true enough.

    Seemingly most of these syntax variants go back to PDP-11 and VAX origins.

    Then some quirks, like '%' on register names, apparently mostly came
    from the M68K branch:
    PDP/VAX: No '%'
    M68K: Added '%'
    GAS on x86: Mostly kept using M68K notation.


    Then apparently the '@' thing partly originated with either Hitachi or
    Texas Instruments (along with putting '.' in many of the instruction
    mnemonics).

    So, if working backwards, could drop all the '@' variants, along with
    '%', ...


    Where, apparently, the syntax scheme I had mostly ended up using for
    BGBCC and my own stuff, ended up partly mutating back towards the
    original PDP/VAX style syntax.

    Namely, was using:
    (Rb)
    (Rb, Rb)
    (Rb, Disp) | Disp(Rb)
    ...

    Though, had mostly kept the dotted names vs reverting to dot-free names.
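    The syntax variants above all denote the same few underlying modes; as
    a rough illustration of the assembler-side ambiguity (a sketch with
    made-up regex patterns, not any real assembler's grammar), the two
    common displacement spellings can be folded to one canonical form:

```python
import re

# Two historical spellings of the same load/store-with-displacement mode.
DISP_PAREN = re.compile(r"^\((\w+),\s*(-?\w+)\)$")   # (Rb, Disp)
PAREN_DISP = re.compile(r"^(-?\w+)\((\w+)\)$")       # Disp(Rb)

def parse_mem(operand):
    """Normalize a displacement memory operand to a (base, disp) pair."""
    m = DISP_PAREN.match(operand)
    if m:
        return m.group(1), m.group(2)
    m = PAREN_DISP.match(operand)
    if m:
        return m.group(2), m.group(1)
    raise ValueError("unrecognized addressing mode: " + operand)

print(parse_mem("(R3, 8)"))   # ('R3', '8')
print(parse_mem("8(R3)"))     # ('R3', '8')
```

    Every extra spelling is another pattern the assembler must try in turn,
    which is the "serial decode" complaint in miniature.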


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Feb 21 22:56:47 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 2/21/2026 2:15 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/20/2026 5:49 PM, MitchAlsup wrote:
    ----------------------------

    There is a non-zero risk though when one disallows uses that are
    theoretically allowed in the ISA, even if GCC doesn't use them.

    This is why one must decode all 32-bits of each instruction--so that
    there is no hole in the decoder that would allow the core to do some-
    thing not directly specified in ISA. {And one of the things that make
    an industrial quality ISA so hard to fully specify.}}
    ---------------------

    Sometimes there is a tension:
    What is theoretically allowed in the ISA;
    What is the theoretically expected behavior in some abstract model;
    What stuff is actually used by compilers;
    What features or behaviors does one want;
    ...
    Whether your ISA can be attacked with Spectre and/or Meltdown;
    Whether your DRAM can be attacked with RowHammer;
    Whether your call/return interface can be attacked with:
    { Return-Oriented Programming, Buffer Overflows, ...}

    That is; whether you care if your system provides a decently robust
    programming environment.

    I happen to care. Apparently, most do not.

    Implementing RISC-V strictly as per an abstract model would both limit
    efficiency and hinder some use-cases.

    One can make an argument that it is GOOD to limit attack vectors, and
    provide a system that is robust in the face of attacks.

    Then it comes down to "what do compilers do" and "what behaviors could
    an ASM programmer stumble onto unintentionally".

    eve at best.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sun Feb 22 13:37:04 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:
    The HP side had people like Bob Rau (Cydrome) and Josh Fisher
    (Multiflow), and given their premise, the architecture is ok; somewhat
    on the complex side, but they wanted to cover all the good ideas from
    earlier designs; after all, it was to be the one architecture to rule
    them all (especially performancewise). You cannot leave out a feature
    that a competitor could then add to outperform IA-64.

    In this time period, performance was doubling every 14 months, so if a
    feature added x performance it MUST avoid adding more than x/14 months
    to the schedule. If IA-64 was 2 years earlier, it would have been
    competitive--sadly it was not.

    No, if a feature adds a year in development time, you start a year
    earlier (or alternatively target a release a year later).

    Intel adds features to AMD64 (or "Intel 64", as they call it) all the
    time, usually with little immediate performance impact, but they
    managed to keep their schedules, at least while the process advances
    also kept to their schedules (which broke down around 2016).

    For IA-64, Intel/HP later did not add ISA features, and it still did
    not result in competitive performance for general-purpose code.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Sun Feb 22 11:51:36 2026
    From Newsgroup: comp.arch

    Anton Ertl [2026-02-21 18:41:12] wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> writes:
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
    At the time of conception, there were many arguments that {sooner or
    later} compilers COULD figure stuff like this out.
    I can't remember seeing such arguments coming from compiler people, tho.
    Actually, the IA-64 people could point to the work on VLIW (in
    particular, Multiflow (trace scheduling) and Cydrome (software
    pipelining)), which in turn is based on the work on compilers for
    microcode.

    Of course, compiler people have worked on such problems and solved some
    cases. But what I wrote above is that "I can't remember seeing
    ... compiler people" claiming that "{sooner or later} compilers COULD
    figure stuff like this out".

    The major problem was that the premise was wrong. They assumed that
    in-order would give them a clock rate edge, but that was not the case,
    right from the start (The 1GHz Itanium II (released July 2002)
    competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
    XP (released June 2002)). They also assumed that explicit parallelism
    would provide at least as much ILP as hardware scheduling of OoO CPUs,
    but that was not the case for general-purpose code, and in any case,
    they needed a lot of additional ILP to make up for their clock speed disadvantage.

    Definitely.

    The odd thing is that these were hardware companies betting on "someone
    else" solving their problem, yet if compiler people truly had managed to
    solve those problems, then other hardware companies could have taken
    advantage just as well.
    I am sure they had patents on stuff like the advanced load and the
    ALAT, so no, other hardware companies would have had a hard time.

    I'm pretty sure that if compiler people ever solve the problems that
    plagued the Itanium, those same solutions can bring similar benefits to architectures using other (non-patented) mechanisms.


    === Stefan
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Sun Feb 22 21:52:00 2026
    From Newsgroup: comp.arch

    In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:

    Does imply that my younger self was notable, and not seen as just
    some otherwise worthless nerd.

    Educators who are any good notice the weird kids who are actually smart.

    For 128 predicate registers, this part doesn't make as much sense:

    I suspect they wanted to re-use some logic.

    The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them
    which I was entirely unable to understand, with the manual in front of me
    and pencil and paper to try examples. Fortunately, it never occurred in
    code generated by any of the compilers I used.

    *1: Where people argue that if each vendor can do a CPU with their
    own custom ISA variants and without needing to license or get
    approval from a central authority, that invariably everything would
    decay into an incoherent mess where there is no binary
    compatibility between processors from different vendors (usual
    implication being that people are then better off staying within
    the ARM ecosystem to avoid RV's lawlessness).

    The importance of binary compatibility is very much dependent on the
    market sector you're addressing. It's absolutely vital for consumer apps
    and games. It's much less important for current "AI" where each vendor
    has their own software stack anyway. RISC-V seems to be far more
    interested in the latter at present.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Tue Feb 24 17:32:45 2026
    From Newsgroup: comp.arch

    On 2/21/2026 4:56 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/21/2026 2:15 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/20/2026 5:49 PM, MitchAlsup wrote:
    ----------------------------

    There is a non-zero risk though when one disallows uses that are
    theoretically allowed in the ISA, even if GCC doesn't use them.

    This is why one must decode all 32-bits of each instruction--so that
    there is no hole in the decoder that would allow the core to do some-
    thing not directly specified in ISA. {And one of the things that make
    an industrial quality ISA so hard to fully specify.}}
    ---------------------

    Sometimes there is a tension:
    What is theoretically allowed in the ISA;
    What is the theoretically expected behavior in some abstract model;
    What stuff is actually used by compilers;
    What features or behaviors does one want;
    ...
    Whether your ISA can be attacked with Spectre and/or Meltdown;
    Whether your DRAM can be attacked with RowHammer;
    Whether your call/return interface can be attacked with:
    { Return-Oriented Programming, Buffer Overflows, ...}

    That is; whether you care if your system provides a decently robust programming environment.

    I happen to care. Apparently, most do not.


    There is a way at least, as noted, to optionally provide some additional protection against buffer overflows (in a compiler that does not use
    stack canaries, eg, GCC).

    But, as noted, it disallows AUIPC+JALR from using X1 in this way.
    Even if compiler output does generally use X5 for this case.


    Implementing RISC-V strictly as per an abstract model would both limit
    efficiency and hinder some use-cases.

    One can make an argument that it is GOOD to limit attack vectors, and
    provide a system that is robust in the face of attacks.


    This was a partial motivation for deviating from the abstract model.

    Deviating from the abstract model in some cases allows closing down
    attack vectors.


    Then it comes down to "what do compilers do" and "what behaviors could
    an ASM programmer stumble onto unintentionally".

    eve at best.


    Possibly, but there are some things a case can be made for disallowing:
    Using X1 for things other than as a Link Register;
    Disallowing JAL and JALR with Rd other than X0 or X1;
    Disallowing most instructions, other than a few special cases, from
    having X0 or X1 as a destination.

    RISC-V has a lot of "Hint" instructions, but a case can be made for
    making many of them illegal (where trying to use them will result in an exception; rather than simply ignoring them).

    In some other cases, it may be justified to disallow (and generate an exception for) things which can be expressed in the ISA, technically,
    but don't actually make sense for a program to make use of (some amount
    of edge cases that result in NOPs, or sometimes non-NOP behaviors which
    don't actually make sense); but are more likely to appear in undesirable
    cases (such as the CPU executing random garbage as instructions).

    ...


    Say, for example, the normal/canonical RISC-V NOP can't be expressed
    without 0x00 (NUL) bytes, whereas many other HINT type instructions can
    be encoded without NUL bytes.

    If someone can't as easily compose a NUL-byte-free NOP-slide, it makes
    it harder to inject shell code via ASCII strings (as does hindering the ability to tamper with return addresses), avoiding casual use of RWX
    memory, etc.
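    To make the NUL-byte point concrete, here is a sketch of the RV32I
    ADDI encoder: the canonical NOP (ADDI x0,x0,0) is three zero bytes plus
    0x13, while other rd=x0 hint-space encodings can avoid NUL bytes
    entirely (the 0x111 immediate below is an arbitrary choice for
    illustration):

```python
def addi(rd, rs1, imm):
    """Encode an RV32I ADDI instruction (I-type, opcode 0x13, funct3 0)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13

nop  = addi(0, 0, 0)       # canonical NOP
hint = addi(0, 1, 0x111)   # also architecturally a no-op (rd = x0)

print(hex(nop),  nop.to_bytes(4, "little"))   # three of four bytes are NUL
print(hex(hint), hint.to_bytes(4, "little"))  # no NUL bytes at all
```

    So the canonical NOP cannot appear in a NUL-terminated string, but
    nearby hint encodings can, which is exactly the NOP-slide concern.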


    The JAL/JALR Rd=X0|X1-only case is one of those cases where one can
    argue a use-case exists, but is so rarely used as to make its native
    existence in hardware (or in an ISA design) difficult to justify. In
    effect, supporting it in HW adds non-zero cost, programs don't actually
    use it, and it burns 4 bits of encoding space you aren't really getting
    back (and they could have used it for something more useful, say, giving
    JAL a 16MB range or something).

    While one can't change the encoding now, they can essentially just turn
    the generic case into a trap and call it done.

    ...



    Though, yes, even if one nails all this down, there are still often
    other (non-memory) attack vectors (such as attacking program logic).

    Saw something not too long ago where there was an RCE exploit for some
    random system (which operated via an HTTP server and HTTP requests; I
    think for "enterprise supply-chain stuff" or something), where the
    exploit was basically the ability to execute arbitrary shell commands
    via expressing them as an HTTP request (with said server effectively
    running as "setuid root" or similar).

    Or, basically, something so insecure that someone could (in theory) hack
    it by typing specific URLs into a web browser or something (or maybe
    using "wget" via a bash script).

    Like, something like:
    http : //someaddress/cgi-bin/system.cgi?cmd=sshd%20...
    (Say, spawn an SSH server so they can stop using HTTP requests).
    Leaving people to just ROFLOL about how bad it was...

    And, how did the original product operate?... Mostly by sending
    unsecured shell commands over HTTP.


    So, alas...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Thu Feb 26 14:54:07 2026
    From Newsgroup: comp.arch

    On 2/22/2026 3:52 PM, John Dallman wrote:
    In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:

    Does imply that my younger self was notable, and not seen as just
    some otherwise worthless nerd.

    Educators who are any good notice the weird kids who are actually smart.


    Sometimes I question if I really am though.

    Like, some evidence says I am, but by most metrics of "life success" I
    have done rather poorly.


    And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard). Well, and I apparently missed
    the point of school, thinking it was more of an endurance thing with
    sort of a vague pretense of education (and I probably would have learned
    more if they just let me spend the time doing whatever else).

    ...



    But, it seems like a case of:
    By implication, I am smart, because if I wasn't, even my own (sometimes pointless) hobby interests would have been out of reach.

    Like, not a world of difficulty justifying them, or debating whether or
    not something is worth doing, but likely not something someone could do
    at all.


    Or, maybe, like encountering things that seem confusing isn't such a
    rare experience (or that people have learned how to deal more
    productively with things they can see but don't understand?...).


    But, there is a thing I have noted:
    I had a few times mentioned to people that certain AIs had gotten
    smart enough to start understanding how a 5/6 bit finite state
    machine to predict repeating 1-4 bit patterns would be constructed.

    Then, I try to describe it, and then realize that for the people I try
    to mention it to, it isn't that they have difficulty imagining how one
    would go about filling in the table and getting all of the 4 bit
    patterns to fit into 32 possible states. Many seem to have difficulty understanding how such a finite state machine would operate in the first place.


    Even though this part seems like something that pretty much anyone
    should be able to understand.

    Initially, I had used this as a test case for the AIs because it posed "moderate difficulty" for problems which could be reasonably completely described in a chat prompt (and is not overly generic).

    Nevermind if it is still a pain to generate tables by hand, and my
    attempts at hand-generated tables have tended to have worse adaptation
    rates than those generated using genetic algorithms (can be more clean looking, but tend to need more input bits to reach the target state if
    the pattern changes).
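    As a functional sketch of the predictor being described (the window of
    recent bits standing in for the FSM state, and the smallest-period scan
    standing in for the hand- or GA-generated table; the details are
    illustrative, not the actual tables):

```python
def predict_next(window):
    """Predict the next bit of a repeating 1..4-bit pattern.

    window: the last 6 observed bits (oldest first) -- the 'state'.
    Finds the smallest period p consistent with the window and echoes
    the bit from p positions back.
    """
    for p in (1, 2, 3, 4):
        if all(window[i] == window[i - p] for i in range(p, len(window))):
            return window[-p]
    return window[-1]  # no short period seen: fall back to the last bit

def accuracy(pattern, n=64):
    """Fraction of correct predictions on a repeated pattern, post warm-up."""
    bits = (pattern * (n // len(pattern) + 1))[:n]
    hits = sum(predict_next(bits[i - 6:i]) == bits[i] for i in range(6, n))
    return hits / (n - 6)

for pat in ([1], [1, 0], [1, 0, 1], [1, 1, 0, 0]):
    print(pat, accuracy(pat))  # each repeating pattern is predicted exactly
```

    Compiling this rule down to a fixed 32-entry next-state/prediction
    table is then the part that is fiddly to do by hand.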


    Sometimes I feel like a poser.
    Other things, it seems, I had taken for granted.

    Seems sometimes that if I were "actually smart", I would have figured
    out some way to make better and more efficient use of my span of
    existence.


    For 128 predicate registers, this part doesn't make as much sense:

    I suspect they wanted to re-use some logic.

    The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them
    which I was entirely unable to understand, with the manual in front of me
    and pencil and paper to try examples. Fortunately, it never occurred in
    code generated by any of the compilers I used.


    Possibly.

    I had also looked into a more limited set of predicate registers at one
    point, but this fizzled in favor of just using GPRs.

    So, as noted:
    I have 1 predicate bit (T bit);
    Had looked into expanding it to 2 predicate bits (using an S bit as a
    second predicate), but this went nowhere.


    Had at another time looked into schemes for having a combination of 8x
    1-bit predicate registers with operations that could update the T bit.
    My initial attempt was an x87 style stack machine, and this was a fail.
    A later design attempt would have added U0..U7 as 8x 1-bit registers.

    Though, just ended up instead going with GPRs for this (following a
    pattern more like RISC-V). Though, in XG3, some operations can be
    directed at R0/X0 to update the T bit.


    In RV-like terms:
    SLT, SGE, SEQ, SNE, SLTU, SGEU
    AND, OR //more recent
    Where, 'AND' partly takes over the role of the 2R "TST" instruction.
    AND X0, X10, X11

    Though, for now using AND/OR directed to X0 for bitwise predication will
    be specific to XG3 encodings.

    Say, because someone in their great wisdom decided to use ORI and
    similar directed to X0 in the RISC-V encoding space to encode the
    prefetch instructions.

    Personally, I would have used, say:
    LB X0, Disp(Xs)
    Or similar, since presumably any sane prefetch needs to be able to
    access the memory it is prefetching from, and load-as-prefetch makes
    more sense to me than ORI as prefetch, but alas...

    Then again, LHU/LWU encoding with X0 as an implicit branch for an
    optional feature is similarly suspect (and carries the risk of "what if someone else puts some other behavior here"?...).


    *1: Where people argue that if each vendor can do a CPU with their
    own custom ISA variants and without needing to license or get
    approval from a central authority, that invariably everything would
    decay into an incoherent mess where there is no binary
    compatibility between processors from different vendors (usual
    implication being that people are then better off staying within
    the ARM ecosystem to avoid RV's lawlessness).

    The importance of binary compatibility is very much dependent on the
    market sector you're addressing. It's absolutely vital for consumer apps
    and games. It's much less important for current "AI" where each vendor
    has their own software stack anyway. RISC-V seems to be far more
    interested in the latter at present.


    Probably true...


    Likely also the space of customized CPU design / experimentation is much
    more accepting of fragmentation, where in more mainline "user oriented" hardware, it would be a bigger issue.

    Still I sit around waiting to see if the whole RISC-V indexed load thing (zilx) becomes an actual extension.

    In my working version, did end up going and implementing support for it
    within BGBCC (and in my emulator and CPU core), but am still partly
    waiting on whether it gains actual approval from the ARC.

    Most recent news I saw was basically one of the people involved
    complaining that it would show no significant performance benefit for
    SPEC running on OoO processor implementations.

    Say, vs Doom on an in-order CPU, where it makes a much bigger difference.

    Sometimes, maybe, SPEC on high-end CPUs should not be the primary
    arbiter (more so when most of the CPUs are likely to end up in segments
    where in-order tends to dominate).

    ...

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Feb 27 19:27:21 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 2/22/2026 3:52 PM, John Dallman wrote:
    In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:

    Does imply that my younger self was notable, and not seen as just
    some otherwise worthless nerd.

    Educators who are any good notice the weird kids who are actually smart.


    Sometimes I question if I really am though.

    Like, some evidence says I am, but by most metrics of "life success" I
    have done rather poorly.


    And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard)

    In my case, I remember sitting in the back of advanced algebra class
    (mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
    a typical algebra problem. Then she called on me, I looked up at the board
    and in less than a second I rattled off the answer skipping 5 steps along
    the way. Moral, don't be bored in class, do something useful instead.

    Well, and I apparently missed
    the point of school, thinking it was more of an endurance thing with
    sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).

    For most people, school attempts to give the students just enough knowledge that they are not burdens on society.
    -------------------------
    The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them which I was entirely unable to understand, with the manual in front of me and pencil and paper to try examples. Fortunately, it never occurred in code generated by any of the compilers I used.

    It could have been a case where the obvious logic decoding "that" field in
    the instruction allowed for "a certain pattern" to perform what they described in the spec. I did some of this in Mc 88100, and this is what taught me never to do it again or allow anyone else to do it again.

    Possibly.

    I had also looked into a more limited set of predicate registers at one point, but this fizzled in favor of just using GPRs.

    So, as noted:
    I have 1 predicate bit (T bit);
    Had looked into expanding it to 2 predicate bits (using an S bit as a
    second predicate), but this went nowhere.

    I have tried several organizations over the last 40 years of practice::
    In my Humble and Honest Opinion, the only constructs predicates should
    support are singular comparisons and comparisons using && and || with
    DeMorganizing logic {~}--not because other forms are unuseful, but
    because those are the constructs programmers use writing code.
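    The claim that {&&, ||, ~} suffices for negated compound conditions is
    just De Morgan's laws, which a small exhaustive check illustrates
    (a sketch of the rewrites a predicating compiler would lean on):

```python
from itertools import product

# De Morgan's laws: negation converts between the && and || forms,
# so a predicate scheme supporting {&&, ||, ~} covers negated compounds.
for a, b in product((False, True), repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
    assert (not (a or b)) == ((not a) and (not b))
print("De Morgan rewrites hold for all inputs")
```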

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Feb 27 19:57:45 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:



    And, in middle and high-school, they just sorta forced me to sit through
    normal classes (which sucked really hard)

    In my case, I remember sitting in the back of advanced algebra class
    (mostly senior HS people, me a sophomore) doing chemistry homework while
    vaguely listening to the teacher fail to get various students to solve
    a typical algebra problem. Then she called on me, I looked up at the board
    and in less than a second I rattled off the answer skipping 5 steps along
    the way. Moral, don't be bored in class, do something useful instead.

    Well, and I apparently missed
    the point of school, thinking it was more of an endurance thing with
    sort of a vague pretense of education (and I probably would have learned
    more if they just let me spend the time doing whatever else).

    For most people, school attempts to give the students just enough
    knowledge that they are not burdens on society.

    My high school (1970s, when the split was K-7, 7-9, 10-12) had
    four "communities".

    Traditional
    Career
    Work Study
    Flexible Individual Learning (FIL)

    The college-bound were generally part of the
    FIL community. Career included business classes,
    traditional was more like the olden days and
    Work Study included off-school apprenticeships,
    shop classes, electronics training, etc.

    Students mostly took classes with peers in their
    community (there were over 400 in my graduating class).

    Worked rather well, but ended up segregating students
    by income level as well as IQ, so
    the school district changed that in the
    80s in the interest of equality treating the
    entire high school as a single community. The
    quality of the education received diminished
    thereafter, IMO.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Feb 27 16:14:17 2026
    From Newsgroup: comp.arch

    On 2/27/2026 1:57 PM, Scott Lurndal wrote:
    MitchAlsup <user5857@newsgrouper.org.invalid> writes:



    And, in middle and high-school, they just sorta forced me to sit through >>> normal classes (which sucked really hard)

    In my case, I remember sitting in the back of advanced algebra class
    (mostly senior HS people, me a sophomore) doing chemistry homework while
    vaguely listening to the teacher fail to get various students to solve
    a typical algebra problem. Then she called on me, I looked up at the board >> and in less than a second I rattled off the answer skipping 5 steps along
    the way. Moral, don't be bored in class, do something useful instead.

Well, and I apparently missed the point of school, thinking it was more of an endurance thing with
sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).

For most people, school attempts to give the students just enough knowledge that they are not burdens on society.

    My high school (1970s, when the split was K-7, 7-9, 10-12) had
    four "communities".

    Traditional
    Career
    Work Study
    Flexible Individual Learning (FIL)

    The college-bound were generally part of the
    FIL community. Career included business classes,
    traditional was more like the olden days and
    Work Study included off-school apprenticeships,
    shop classes, electronics training, etc.

    Students mostly took classes with peers in their
    community (there were over 400 in my graduating class).

    Worked rather well, but ended up segregating students
    by income level as well as IQ, so
    the school district changed that in the
    80s in the interest of equality treating the
    entire high school as a single community. The
    quality of the education received diminished
    thereafter, IMO.

AFAIK, the high schools I went to back then had 2 groups:
    Normal;
    Special Education.

    I think initially there would have been some "AP" classes, but these
    were eliminated because of "No Child Left Behind" or similar (easier to
    fold everyone into the same classes for sake of standardized testing).

    ...


    Well, they also had other things going on at the time:
    Entering the building involved a checkpoint and showing an ID (to be
    scanned with a handheld barcode reader, *1);
    The building was typically partitioned off with metal gates and checkpoints;
    At certain times they would open the gates to allow freer movement,
    other times one would need to show ID (which would be logged) and let
    through using a smaller gate;
    During classes, typically also guards would patrol the halls along with
    dogs, and if one were in the hall during class and ran into one, they
    would need to show ID and a hall-pass and similar (as a sort of
    print-out ticket identifying the class and teacher and the
    date/time-issued, etc, sorta like a receipt one gets in a store, where
    one needed to ask the teacher to leave the room, and the teachers'
    computers would often have receipt printers; well, I guess as opposed to
    using a laser printer to print a hall pass, *2);
    ...

    *1: Back in these days, barcodes were still the go-to technology, not
    having yet been replaced by the use of QR codes and similar.

*2: I guess it depended on how the cost dynamics worked out between using the laser printer (and a full sheet of paper) vs also giving each teacher a thermal printer in addition to a laser printer (for the off-chance of students needing to use the bathroom or similar?...).



    Note that getting to/from the school generally involved the use of
    school busses (and, at the end of the day, the goal was mostly to make
    it out of the building and onto the correct bus before the bus leaves;
    note that there was generally no time to stop or loiter, or one would
    miss the bus).

    Or, if the teacher delayed dismissing everyone at the final bell, one
    could also miss the bus.

    ...


    Had noticed that some of this was typically lacking in TV show
    depictions of high-schools, which often show people moving freely and socializing; and not so much the use of guards and checkpoints (or flows
    of students each along their respective side of the hall, and needing to
    weave through the crowd at intersections, where the flow would become
    more turbulent).

    Well, and say, one needing to try to make it efficiently through the
    halls as to not be late for the next class (where, say, hallway crowding
    would sometimes make it difficult to cross the building within the 5
    minute time limit).


Well, one could also maybe try to stop by the bathroom between classes, but doing so would greatly increase the likelihood of being late; it was a tradeoff between tardiness and needing to bother the teacher for a pass to use the bathroom, so "lesser of two evils" or such.


    ...


    Not sure what modern high-schools are like though.


    Or such...

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Feb 27 17:01:22 2026
    From Newsgroup: comp.arch

    On 2/27/2026 1:27 PM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:
    Does imply that my younger self was notable, and not seen as just
    some otherwise worthless nerd.

Educators who are any good notice the weird kids who are actually smart.

    Sometimes I question if I really am though.

    Like, some evidence says I am, but by most metrics of "life success" I
    have done rather poorly.


    And, in middle and high-school, they just sorta forced me to sit through
    normal classes (which sucked really hard)

    In my case, I remember sitting in the back of advanced algebra class
    (mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
    a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
    the way. Moral, don't be bored in class, do something useful instead.


    I didn't really do much in terms of assignments.


    No one really called on me either (but, IME, calling on students to
    answer questions wasn't really a thing).

    Mostly things were pretty one way.

    I think at one point, there was a thing in one class where things got
    heated between the teacher and one of the students, like I think they
were getting in an argument about GWB's invasion of Afghanistan or
    something (and whether or not the invasion was justified or similar);
    she sent him to the office, and principal sent him back.

    Usually, expected role was to sit passively, do assignments as assigned,
    and say nothing.


    I think, there was another time where a science teacher was talking
    about stuff in class, and started getting agitated, and deviated from
    the contents in the textbook, expressing her disagreement with
    naturalistic evolution and started going on about intelligent design and similar.

    At the time, I wasn't entirely sure what to make of this, she was
    putting her job on the line by doing this (I am not sure what happened
    with her after this).

    Like, it wasn't usually a thing that the teachers would go against the textbook.


    I didn't do much at the time, I think at the time I didn't expect that I
    would still be around this far into the future (decades later).


    Well, and I apparently missed
    the point of school, thinking it was more of an endurance thing with
    sort of a vague pretense of education (and I probably would have learned
    more if they just let me spend the time doing whatever else).

    For most people, school attempts to give the students just enough knowledge that they are not burdens on society.
    -------------------------

    Probably.

    I think the general assumption at the time was that people would either
    go on to entry-level jobs, or some would go on to college.

    Well, and then find that none of these jobs really wanted to hire anyone.

    Like, stores aren't going to hire more people to work the registers if
    they already have enough people working the registers. Well, or the
    people who went to do inventory or warehouse jobs, etc.


The tricks Itanium could do with combinations of predicate registers were pretty weird. There was at least one instruction for manipulating them
which I was entirely unable to understand, with the manual in front of me and pencil and paper to try examples. Fortunately, it never occurred in
code generated by any of the compilers I used.

    It could have been a case where the obvious logic decoding "that" field in the instruction allowed for "a certain pattern" to perform what they described
    in the spec. I did some of this in Mc 88100, and this is what taught me never to do it again or allow anyone else to do it again.


    I haven't looked all that deeply into IA-64 predicate handling, partly
    as I had done it in a different way.


    Possibly.

    I had also looked into a more limited set of predicate registers at one
    point, but this fizzled in favor of just using GPRs.

    So, as noted:
    I have 1 predicate bit (T bit);
    Had looked into expanding it to 2 predicate bits (using an S bit as a
    second predicate), but this went nowhere.

    I have tried several organizations over the last 40 years of practice::
In my Humble and Honest Opinion, the only constructs predicates should support are singular comparisons and comparisons using && and || with deMorganizing logic {~}--not because other forms are unuseful, but because those are the constructs programmers use writing code.
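The deMorganized rewrites can be sanity-checked exhaustively in a couple of lines (trivial, but it pins down the {~} transformation meant here):

```python
# De Morgan's laws over all boolean pairs: these let compound && / ||
# conditions be inverted into whichever sense the predicate encodes.
for a in (False, True):
    for b in (False, True):
        assert (not (a and b)) == ((not a) or (not b))
        assert (not (a or b)) == ((not a) and (not b))
print("ok")
```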



    In this case, the pattern could have been expanded:
    OP //unconditional
    OP?T //T is Set
    OP?F //T is Clear
    OP?ST //S is Set
    OP?SF //S is Clear

    Analysis benefit of S bit here? None.
    The only notable nominal benefit of multiple predicate bits would be in
    the 1 true / 1 false case, but this is already handled by the ?T / ?F
    scheme, which (unlike IA-64) would not need multiple predicate bits for
    the THEN and ELSE branch.

    I considered U bits, but this went nowhere.
    These could have been U0..U7 as 1 bit flags, but would still need an
    operation to direct them into T to use for actual predication.


    Even if U-bit instructions were added, they couldn't save much over, say
    (RV like notation for XG3, *1):
    SGE X10, X18, 1
    SLT X11, X19, 10
    AND X0, X10, X11 //T=(X10&X11)!=0
    OP?T ...
    OP?F ...
    ...
    For:
    if((x>0) && (x<10))
    { ... }
    else
    { ... }
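The correspondence between the C condition and the compare/AND sequence can be checked directly (a sketch of the predicate value only, not the ISA semantics; it assumes X18 and X19 both hold x here):

```python
# Verify that the SGE/SLT/AND sequence above computes the same
# predicate as the C condition, for a range of integer inputs.
for x in range(-5, 20):
    x10 = int(x >= 1)        # SGE X10, X18, 1
    x11 = int(x < 10)        # SLT X11, X19, 10 (X19 assumed to hold x too)
    t = (x10 & x11) != 0     # AND X0, X10, X11 sets T
    assert t == ((x > 0) and (x < 10))
print("ok")
```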

    Or, "if(!x) ...":
    SEQ X0, X18, X0
    OP1?T


*1: This particular pattern is N/E in XG1 or XG2, and N/A in RISC-V.
    In XG1/XG2, there were 2R instructions for some of these cases instead
    (but these were dropped in favor of using X0/Zero to encode the same
    intent, but as noted XG1 and XG2 lack a Zero register).

    In theory, could revive more of XG2's 2R instructions in XG3, but the alternative here being to just leave them as disallowed and use
    zero-register encodings to signal the same intent (in the name of
    simplifying the ISA design).


    Things get maybe more complex for nested branching:
    if(x<y)
    {
        if(x<z)
            ... else ...
    }else
    {
        ...
    }
    But, this is more a compiler/code-structuring issue than an
    actual/significant ISA limitation. And, in most other cases, complex
    nested branches represent cases too bulky to benefit from predication
    (much past a small number of instructions, loses out to the use of
    branches).


    Where using GPRs here achieves the same basic effect without needing to
    add any new encodings or special-case handling to direct comparison
    output into the U bits. Though, ultimately, the U bits ended up used
    more just as a way to optionally detect LR stomping.

    ...


    Nevermind if XG3 still falls slightly behind XG2 for code-density.
    Harder to nail this down exactly, possibly:
    Usage of 8-arg ABI over 16 arg ABI (*);
    Slightly fewer callee save registers (28 vs 31);
    Loss of various 2R instructions and similar;
    ...

    *: Had temporarily moved to a 16-arg ABI, but ended up reverting this
    choice as the number of ABI related issues was non-zero. Did keep a
    register assignment change that went from 24 to 28 callee save registers.


    Code density and performance still beat out my extended variants of
    RISC-V though.

    ...


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Feb 28 16:41:53 2026
    From Newsgroup: comp.arch

    BGB wrote:
    On 2/22/2026 3:52 PM, John Dallman wrote:
    In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:

    Does imply that my younger self was notable, and not seen as just
    some otherwise worthless nerd.

    Educators who are any good notice the weird kids who are actually smart.


    Sometimes I question if I really am though.

    Like, some evidence says I am, but by most metrics of "life success" I
    have done rather poorly.


    And, in middle and high-school, they just sorta forced me to sit through normal classes (which sucked really hard). Well, and I apparently missed
    the point of school, thinking it was more of an endurance thing with
    sort of a vague pretense of education (and I probably would have learned more if they just let me spend the time doing whatever else).

    ...



    But, it seems like a case of:
    By implication, I am smart, because if I wasn't, even my own (sometimes pointless) hobby interests would have been out of reach.

    Like, not a world of difficulty justifying them, or debating whether or
    not something is worth doing, but likely not something someone could do
    at all.


    Or, maybe, like encountering things that seem confusing isn't such a
    rare experience (or that people have learned how to deal more
    productively with things they can see but don't understand?...).


    But, there is a thing I have noted:
    I had a few times mentioned to people about finding that certain AIs had gotten smart enough to start understanding how a 5/6 bit finite state machine to predict repeating 1-4 bit patterns would be constructed.

    Then, I try to describe it, and then realize that for the people I try
    to mention it to, it isn't that they have difficulty imagining how one
    would go about filling in the table and getting all of the 4 bit
    patterns to fit into 32 possible states. Many seem to have difficulty understanding how such a finite state machine would operate in the first place.


Even so, this part seems like something that pretty much anyone should be able to understand.

    Initially, I had used this as a test case for the AIs because it posed "moderate difficulty" for problems which could be reasonably completely described in a chat prompt (and is not overly generic).

    Nevermind if it is still a pain to generate tables by hand, and my
    attempts at hand-generated tables have tended to have worse adaptation
    rates than those generated using genetic algorithms (can be more clean looking, but tend to need more input bits to reach the target state if
    the pattern changes).
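As an aside, the operating principle such a predictor relies on can be sketched in a few lines. This is not BGB's actual construction (his is a fixed 32-state table, typically bred with a genetic algorithm); it is a toy variant that learns its table on the fly, but it shows the state/table mechanism that seems to be the sticking point:

```python
# Toy table-driven predictor (illustrative only, not the fixed
# 32-state FSM described above): state = last 4 input bits, and
# each state remembers the bit that last followed it. Any repeating
# pattern of period <= 4 is predicted perfectly once one full
# repetition has been observed in every phase.
class TablePredictor:
    def __init__(self):
        self.state = 0          # last 4 bits seen, newest in bit 0
        self.table = [0] * 16   # per-state predicted next bit

    def predict(self):
        return self.table[self.state]

    def update(self, actual):
        self.table[self.state] = actual                   # learn
        self.state = ((self.state << 1) | actual) & 0xF   # shift in bit

pattern = [1, 0, 1, 1]   # a repeating 4-bit pattern
p = TablePredictor()
hits = 0
for i in range(40):
    bit = pattern[i % 4]
    hits += (p.predict() == bit)
    p.update(bit)
print(hits)   # 34: mispredicts only during warm-up, then locks on
```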


    Sometimes I feel like a poser.
    Other things, it seems, I had taken for granted.

    Seems sometimes if I were "actually smart", would have figured out some
    way to make better and more efficient use of my span of existence.

    BGB, please don't give up!

    I think it is very obvious to all the regulars here that you are
    obviously very bright, otherwise you would never even have started most
    of the projects you've told us about, not to mention actually making
    them work.

Yes, on a number of occasions I have thought that maybe you were
    attacking the wrong set of problems, but personally I've been very
    impressed, for several years now.

    Just keep on doing what you find interesting!

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Feb 28 16:57:00 2026
    From Newsgroup: comp.arch

    MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/22/2026 3:52 PM, John Dallman wrote:
In article <10nak0a$nrac$2@dont-email.me>, cr88192@gmail.com (BGB) wrote:
    Does imply that my younger self was notable, and not seen as just
    some otherwise worthless nerd.

Educators who are any good notice the weird kids who are actually smart.

    Sometimes I question if I really am though.

    Like, some evidence says I am, but by most metrics of "life success" I
    have done rather poorly.


    And, in middle and high-school, they just sorta forced me to sit through
    normal classes (which sucked really hard)

    In my case, I remember sitting in the back of advanced algebra class
    (mostly senior HS people, me a sophomore) doing chemistry homework while vaguely listening to the teacher fail to get various students to solve
    a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
    the way. Moral, don't be bored in class, do something useful instead.

I used a double physics time slot (i.e. two 50-min time slots with a
5-min break between them) in exactly the same way, except that I
calculated ~24 digits of pi using the Taylor series for atan(1/5) and atan(1/239). The latter part was much faster of course!

    Doing long divisions by 25 and (n^2+n) took the majority of the time.

    Terje
    PS. I re-implemented the exact same algorithm, using base 1e10, on the
    very first computer I got access to, a Univac 110x in University. This
    was my first ever personal piece of programming.
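For reference, the identity in question is Machin's: pi = 16*atan(1/5) - 4*atan(1/239). A scaled-integer sketch of the same computation (my code, loosely in the spirit of the base-1e10 version, not the original program):

```python
# Machin's formula, evaluated in scaled integer arithmetic with a
# few guard digits, as a rough analogue of the base-1e10 approach.
def atan_inv(x, digits):
    """arctan(1/x), scaled by 10**(digits+5), via the Taylor series."""
    scale = 10 ** (digits + 5)      # 5 guard digits
    term = scale // x               # first term: 1/x
    total, n, sign = term, 1, 1
    while term:
        term //= x * x              # next power: 1/x^(2k+1)
        n += 2
        sign = -sign
        total += sign * (term // n)
    return total

digits = 24
pi = 16 * atan_inv(5, digits) - 4 * atan_inv(239, digits)
print(str(pi)[:digits + 1])   # 3141592653589793238462643
```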
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Sat Feb 28 17:36:47 2026
    From Newsgroup: comp.arch

    Terje Mathisen <terje.mathisen@tmsw.no> writes:
    MitchAlsup wrote:

    In my case, I remember sitting in the back of advanced algebra class
    (mostly senior HS people, me a sophomore) doing chemistry homework while
    vaguely listening to the teacher fail to get various students to solve
a typical algebra problem. Then she called on me, I looked up at the board and in less than a second I rattled off the answer skipping 5 steps along
    the way. Moral, don't be bored in class, do something useful instead.

I used a double physics time slot (i.e. two 50-min time slots with a
5-min break between them) in exactly the same way, except that I
calculated ~24 digits of pi using the Taylor series for atan(1/5) and atan(1/239). The latter part was much faster of course!

    Coincidentally, I did the same exercise with the taylor
    series, albeit after school when I had access to the
    ASR-33 remotely dialed into either a PDP-8 (TSS/8.24) or
    an HP-3000 (MPE). I might have a listing of the
    PDP-8 basic program around in a box somewhere.



    Doing long divisions by 25 and (n^2+n) took the majority of the time.

    Terje
    PS. I re-implemented the exact same algorithm, using base 1e10, on the
    very first computer I got access to, a Univac 110x in University. This
    was my first ever personal piece of programming.

    My first was a simple BASIC "hello world" program in 1974 on a
    Burroughs B5500 (remotely, via again an ASR-33) which we had
    for a week in 7th grade math class.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon Feb 9 19:09:50 2026
    From Newsgroup: comp.arch


    Paul Clayton <paaronclayton@gmail.com> posted:

    On 11/5/25 3:52 PM, MitchAlsup wrote:

    Robert Finch <robfi680@gmail.com> posted:

    On 2025-11-05 1:47 a.m., Robert Finch wrote:
    -----------
    I am now modifying Qupls2024 into Qupls2026 rather than starting a
    completely new ISA. The big difference is Qupls2024 uses 64-bit
    instructions and Qupls2026 uses 48-bit instructions making the code 25%
    more compact with no real loss of operations.

Qupls2024 also used 8-bit register specs. This was a bit of overkill and not really needed. Register specs are reduced to 6-bits. Right-away that reduced most instructions eight bits.

    4 register specifiers: check.

    I decided I liked the dual operations that some instructions supported,
    which need a wide instruction format.

    With 48-bits, if you can get 2 instructions 50% of the time, you are only 12% bigger than a 32-bit ISA.

    I must be misunderstanding your math; if half of the
    6-byte instructions are two operations, I think that
    means 12 bytes would have three operations which is
    the same as for a 32-bit ISA.

    Perhaps you meant for every two instructions, there
    is a 50% chance neither can be "fused" and a 50%
    chance they can be fused with each other; this would
    get four operations in 18 bytes, which _is_ 12.5%
    bigger. That seems an odd expression, as if the
    ability to fuse was not quasi-independent.

    It could just be that one of us has a "thought-O".
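For what it's worth, the two readings can be put side by side (my arithmetic, only restating the interpretations above):

```python
# Reading 1: half of all 48-bit instructions carry two operations.
ops_per_inst = 0.5 * 1 + 0.5 * 2        # 1.5 ops per 6-byte instruction
bytes_per_op = 6 / ops_per_inst         # 4.0 bytes/op, same as 32-bit

# Reading 2: per pair of instructions, a 50% chance they fuse into
# one 6-byte instruction, else they stay two (12 bytes, still 2 ops).
bytes_per_4_ops = 6 + 12                # one fused + one unfused pair
overhead = bytes_per_4_ops / 16 - 1     # vs 4 ops * 4 bytes -> 0.125

print(bytes_per_op, overhead)   # 4.0 0.125
```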

    One gotcha is that 64-bit constant overrides need to be modified. For
    Qupls2024 a 64-bit constant override could be specified using only a
    single additional instruction word. This is not possible with 48-bit
    instruction words. Qupls2024 only allowed a single additional constant
    word. I may maintain this for Qupls2026, but that means that a max
    constant override of 48-bits would be supported. A 64-bit constant can
    still be built up in a register using the add-immediate with shift
    instruction. It is ugly and takes about three instructions.

    It was that sticking problem of constants that drove most of My 66000
    ISA style--variable length and how to encode access to these constants
    and routing thereof.

    Motto: never execute any instructions fetching or building constants.

    I am guessing that having had experience with x86
    (and the benefit of predecode bits), you recognized
    that VLE need not be horribly complex to parse.
    My 66000 does not use "start bits", but the length
    is quickly decoded from the first word and the
    critical information is in mostly fixed locations
    in the first word.

My 66000 Constants are available when:
inst<31> = 0 and
inst<30> != inst<29> and
inst<6> = 1, where
inst<6:5> = 10 means a 32-bit constant and
inst<6:5> = 11 means a 64-bit constant.
6 total gates and 2 gates of delay gives unary 3-bit
instruction length.
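Writing the quoted conditions out directly (function name mine; this only restates the bit tests given above):

```python
# Decode of trailing-constant length from the first instruction word,
# per the conditions quoted above: inst<31>=0, inst<30> != inst<29>,
# inst<6>=1; then inst<6:5>=10 -> 32-bit, inst<6:5>=11 -> 64-bit.
def constant_words(inst):
    bit = lambda n: (inst >> n) & 1
    if bit(31) == 0 and bit(30) != bit(29) and bit(6) == 1:
        return 2 if bit(5) else 1    # 64-bit vs 32-bit constant
    return 0                         # no trailing constant

print(constant_words((1 << 29) | (1 << 6)))             # 1 (32-bit)
print(constant_words((1 << 29) | (1 << 6) | (1 << 5)))  # 2 (64-bit)
print(constant_words(1 << 31))                          # 0
```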

(One might argue that opcode
can be in two locations depending on if the
instruction uses a 16-bit immediate or not --
assuming I remember that correctly.)

    Obviously, something like DOUBLE could provide
    extra register operands to a complex instruction,
    though there may not be any operation needing
    five register inputs. Similarly, opcode refinement
    (that does not affect operation routing) could be
    placed into an "immediate". I think you do not
    expect to need such tricks because reduced
    number of instructions is a design principle and
    there is lots of opcode space remaining, but I
    feel these also allow the ISA to be extended in
    unexpected directions.

I think that motto could be generalized to "do
not do at decode time what can be done at
compile or link time" (building immediates could be
"executed" in decode). There are obvious limits
to that principle; e.g., one would not encode
instructions as control bits, i.e., "predecoded",
in order to avoid decode work. For My 66000
immediates, reducing decode work also decreases
code size.

    Discerning when to apply a transformation and if/
    where to cache the result seems useful. E.g., a
    compiler caches the source code to machine code
    transformation inside an executable binary. My
    66000's Virtual Vector Method implementations
    are expected, from what I understand, to cache
    fetch and decode work and simplify operand
    routing.

    First v in vVM is lower case.

    Caching branch prediction information in an
    instruction seems to be viewed generally as not
    worth much since dynamic predictors are generally
    more accurate.

    Yes. If your branch predictor is having problems
    then use predication for flow control.

    Static prediction by branch
    "type" (e.g., forward not-taken) can require no
    additional information. (Branch prediction
    _directives_ are somewhat different. Such might
    be used to reduce the time for a critical path,
    but average time is usually a greater concern.)
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Mar 1 12:18:10 2026
    From Newsgroup: comp.arch

    Scott Lurndal <scott@slp53.sl.home> schrieb:

    My first was a simple BASIC "hello world" program in 1974 on a
    Burroughs B5500 (remotely, via again an ASR-33) which we had
    for a week in 7th grade math class.

    I started out on my father's first programmable pocket calculator,
    a Casio model with 38 steps (I think).

    I was quite proud when I managed to factorize 123456789, which
    took some time.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Andy Valencia@vandys@vsta.org to comp.arch on Sun Mar 1 07:55:48 2026
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    I was quite proud when I managed to factorize 123456789, which
    took some time.

    Out of curiosity, I just used /usr/bin/factor: 3 3 3607 3803

    Which took 3ms. :)
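Plain trial division gets there almost as fast, for a number this small (a routine sketch):

```python
# Simple trial division; 123456789 = 3 * 3 * 3607 * 3803.
def factorize(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # remaining prime cofactor
    return factors

print(factorize(123456789))   # [3, 3, 3607, 3803]
```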

    Andy Valencia
    Home page: https://www.vsta.org/andy/
    To contact me: https://www.vsta.org/contact/andy.html
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sun Mar 1 19:19:45 2026
    From Newsgroup: comp.arch

    On 01/03/2026 13:18, Thomas Koenig wrote:
    Scott Lurndal <scott@slp53.sl.home> schrieb:

    My first was a simple BASIC "hello world" program in 1974 on a
    Burroughs B5500 (remotely, via again an ASR-33) which we had
    for a week in 7th grade math class.

    I started out on my father's first programmable pocket calculator,
    a Casio model with 38 steps (I think).


    Would that have been a Casio fx-3600P ? I bought one of these as a
    teenager, and used it non-stop. 38 steps of program space was not a
    lot, but I remember making a library for complex number calculations for it.

    I was quite proud when I managed to factorize 123456789, which
    took some time.

    I used mine to find formulas for numerical integration (like Simpson's
    rule, but higher order). Basically useless, but fun!

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Mar 1 20:24:04 2026
    From Newsgroup: comp.arch

    David Brown <david.brown@hesbynett.no> schrieb:
    On 01/03/2026 13:18, Thomas Koenig wrote:
    Scott Lurndal <scott@slp53.sl.home> schrieb:

    My first was a simple BASIC "hello world" program in 1974 on a
    Burroughs B5500 (remotely, via again an ASR-33) which we had
    for a week in 7th grade math class.

    I started out on my father's first programmable pocket calculator,
    a Casio model with 38 steps (I think).


    Would that have been a Casio fx-3600P ? I bought one of these as a teenager, and used it non-stop. 38 steps of program space was not a
    lot, but I remember making a library for complex number calculations for it.

    Either the fx-180P or the fx-3600P.


    I was quite proud when I managed to factorize 123456789, which
    took some time.

    I used mine to find formulas for numerical integration (like Simpson's
    rule, but higher order). Basically useless, but fun!

    Later, I had a fx-602P, which was a much larger beast. For this,
    I programmed a whole "Kurvendiskussion" (not sure what the English
    term is, it entails finding roots, extrema and inflection points),
    learning about the non-joys of numeric differentiation in the process.
    I deleted this before my final exams, though :-)

    I still have a list of programs I wrote back then, including a
    Moon Lander, although I lost the calculator when studying.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From kegs@kegs@provalid.com (Kent Dickey) to comp.arch on Sun Mar 1 21:12:39 2026
    From Newsgroup: comp.arch

    In article <2026Feb21.194112@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> writes:
At the time of conception, there were many arguments that {sooner or
later} compilers COULD figure stuff like this out.

I can't remember seeing such arguments coming from compiler people, tho.

    Actually, the IA-64 people could point to the work on VLIW (in
    particular, Multiflow (trace scheduling) and Cydrome (software
    pipelining)), which in turn is based on the work on compilers for
    microcode.

    That did not solve memory latency, but that's a problem even for OoO
    cores.

    I suspect a big part of the problem was tension between Intel and HP
where the only political solution was allowing the architects from both
    sides to "dump in" their favorite ideas. A recipe for disaster.

    The HP side had people like Bob Rau (Cydrome) and Josh Fisher
    (Multiflow), and given their premise, the architecture is ok; somewhat
    on the complex side, but they wanted to cover all the good ideas from
    earlier designs; after all, it was to be the one architecture to rule
    them all (especially performancewise). You cannot leave out a feature
    that a competitor could then add to outperform IA-64.

    The major problem was that the premise was wrong. They assumed that
    in-order would give them a clock rate edge, but that was not the case,
    right from the start (The 1GHz Itanium II (released July 2002)
    competed with 2.53GHz Pentium 4 (released May 2002) and 1800MHz Athlon
    XP (released June 2002)). They also assumed that explicit parallelism
    would provide at least as much ILP as hardware scheduling of OoO CPUs,
    but that was not the case for general-purpose code, and in any case,
they needed a lot of additional ILP to make up for their clock speed disadvantage.

    As I've said before: I worked at HP during IA64, and it was not driven
    by technical issues, but rather political/financial issues.

    On HP's side, IA64 was driven by HP Labs, which was an independent group
    doing technical investigations without any clear line to products. They
    had to "sell" their ideas to the HP development groups, who could ignore them. They managed to get some upper level HP managers interested in IA64,
    and took that directly to Intel. The HP internal development groups (the
    ones making CPUs and server/workstation chipsets) did almost nothing with
    IA64 until after Intel announced the IA64 agreement.

    IA64 was called PrecisionArchitecture-WideWord (PA-WW) by HP Labs as a
    follow on to PA-RISC. The initial version of PA-WW had no register
    interlocks whatsoever, code had to be written to know the L1 and L2
    cache latency, and not touch the result registers too soon. This was
    laughed out of the room, and they came back with interlocks in the next iteration. This happened in 1993-1994, which was before the Out-of-Order
    RISCs came to market (but they were in development in HP and Intel), so the IA64 decisions were being made in the time window before folks really got to see what OoO could do.

    Also on HP's side, we had our own fab, which was having trouble keeping
    up with the rest of the industry. Designers felt performance was not
    predictable, and the fab's costs were escalating. The fab was going to
    have trouble getting to 180nm and beyond. So HP wanted access to
    Intel's fabs, and that was part of the IA64 deal--we could make PA-RISC
    chips on Intel's fabs for a long time.

    On Intel's side, Intel was divided very strongly geographically. At the
    time, Hillsboro was "winning" in the x86 CPU area, and Santa Clara was
    on the outs (I think they did the i860 and other failures like that).
    So when Santa Clara heard of IA64, they jumped on the opportunity--a
    way to leap past Hillsboro. IA64 also solved the AMD problem--with all
    new IA64 patents, AMD couldn't clone it the way they had cloned x86, so
    management was interested. Technically, IA64 just had to be "as good
    as" x86 to make it worthwhile to jump to a new architecture that
    removed their competitor. I can see how even smart folks could get
    sucked into thinking "architecture doesn't matter, and this new one
    prevents clones, so we should do it to eventually make more money".

    Both companies had selfish mid-level managers who saw a way to pad
    their resumes and leap to VP of engineering almost anywhere else. And
    they were right--on HP's side, I think every manager involved moved to
    a promotion at another company just before Merced came out. So IA64
    was not going to get canceled--the managers didn't want to admit they
    were wrong.

    Both companies also saw IA64 as a way to kill off the RISC competitors.
    And on this point, they were right: IA64 did kill the RISC minicomputer
    market.

    The technical merits of IA64 don't make the top 5 in either company's
    list of reasons for doing it.

    But HP using Intel's fabs didn't work out well. HP's first CPU on
    Intel's fabs was the 360MHz PA-8500. This was a disappointing step up
    from the 240MHz PA-8200 (which was partly speed-limited by external L1
    cache memory running at 4ns; the 8500 moved the L1 cache on-chip,
    removing that limit). It turned out Intel's fab advantage was
    consistency and yield, not speed, so it took tuning to get the speed
    up. Intel did this tuning with large teams, which was not easy for HP
    to replicate. And by this time, IBM was marketing a 180nm copper-wire
    SOI process which WAS much faster (and yields weren't a concern for
    HP), so after getting the PA-8500 up to 550MHz with a lot of work, HP
    jumped to IBM as a fab, and the speeds went up to 750MHz and then
    875MHz with some light tuning (and a lot less work).

    Everyone technically minded knew IA64 was not that great, but both
    companies had their reasons to do it anyway.

    Kent
    --- Synchronet 3.21d-Linux NewsLink 1.2