• Combining Practicality with Perfection

    From John Savard@quadibloc@invalid.invalid to comp.arch on Sat Jan 31 18:28:36 2026
    From Newsgroup: comp.arch

    I had looked into unusual memory architectures to allow a computer to be designed which had single-precision floats that were 36 bits long, so that
    it would be possible more often to avoid recourse to double precision, and which had double-precision floats that were 60 bits long, also a multiple
    of 12, because it wasn't necessary to have all the precision of 64-bit
    floats.

    Also thrown in were 48-bit floats, which were designed to have 11-digit precision and a range just exceeding 10^-99 to 10^99, so as to be
    comparable with what scientific calculators offer.
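    As a back-of-the-envelope check of those goals, here is a sketch assuming a plain sign/exponent/significand layout; the 1+10+37 field split is my guess at widths that meet the stated targets, not the article's actual format:

```python
import math

# Hypothetical 48-bit layout (an assumption, not the author's actual split):
# 1 sign bit + 10 exponent bits + 37 stored significand bits.
SIGN, EXP, FRAC = 1, 10, 37
assert SIGN + EXP + FRAC == 48

digits = (FRAC + 1) * math.log10(2)            # hidden bit: 38 effective bits
max_dec_exp = (2**(EXP - 1) - 1) * math.log10(2)

print(f"~{digits:.1f} decimal digits")         # ~11.4: meets the 11-digit goal
print(f"range ~10^+/-{max_dec_exp:.0f}")       # well past 10^+/-99
```

    A 9-bit exponent would only reach about 10^77, short of the stated range, which is why the sketch assumes 10 exponent bits.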

    While it was interesting to examine the possible ways this could be
    managed, all the possibilities involved awkwardness and complexity - as
    might be expected.

    So how could I achieve my original goals while avoiding awkwardness?

    Well, I came up with this:

    Have floating-point formats that are either 36 bits long or 72 bits long.

    That way, the 36-bit format is available, and longer formats, being twice
    as long, are easy to fetch from memory.

    One of the 72-bit formats has the same significand (or mantissa) length as
    the 48-bit floats in my idealized computer. But no bits are wasted;
    instead, the exponent field is just enlarged.

    It's still a conventional floating-point format, where the lengths of the exponent and significand are fixed, unlike John Gustafson's posits. But
    this gives it the advantage that either a computation will fail, or the precision of all the intermediate results will be the same as that of the final result; no catastrophic loss of precision will pass by unnoticed.

    The other 72-bit format has a significand the same size as that of IEEE
    754 64-bit floats. Offering lower precision, the same as that of a 60-bit float... would, no doubt, be too tough a sell. So the exponent field,
    while not as large as that of the other format, would still be 8 bits
    longer than usual, which, no doubt, would be helpful.
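    Taking "usual" to mean IEEE double's 11-bit exponent, the second 72-bit format tiles exactly: 1 sign + 19 exponent + 52 significand bits. A quick check of how far the exponent range stretches (a sketch inferred from the description, not a stated spec):

```python
import math

# Second 72-bit format as described: IEEE-double significand (52 stored bits)
# with an exponent field 8 bits longer than double's 11 bits.
SIGN, EXP, FRAC = 1, 11 + 8, 52
assert SIGN + EXP + FRAC == 72                      # fields tile 72 bits exactly

double_reach = (2**(11 - 1) - 1) * math.log10(2)    # ~10^308 for IEEE double
wide_reach = (2**(EXP - 1) - 1) * math.log10(2)     # ~10^78900 here

print(f"double: ~10^{double_reach:.0f}, 72-bit: ~10^{wide_reach:.0f}")
```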

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 5 01:57:22 2026
    From Newsgroup: comp.arch


    John Savard <quadibloc@invalid.invalid> posted:

    I had looked into unusual memory architectures to allow a computer to be designed which had single-precision floats that were 36 bits long, so that it would be possible more often to avoid recourse to double precision, and which had double-precision floats that were 60 bits long, also a multiple
    of 12, because it wasn't necessary to have all the precision of 64-bit floats.

    Does any machine on sale today (selling at least 100,000 machines/year)
    provide 36-bit or 60-bit or 72-bit FP ?!?

    If you want to build a 12-bit-base machine, go ahead--just don't
    expect much takeup.

    Also thrown in were 48-bit floats, which were designed to have 11-digit precision and a range just exceeding 10^-99 to 1^99, so as to be
    comparable with what scientific calculators offer.

    While it was interesting to examine the possible ways this could be
    managed, all the possibilities involved awkwardness and complexity - as might be expected.

    There really is something special about 2^(3+n) data sizes.
    It _IS_ what everyone wants...

    So how could I achieve my original goals while avoiding awkwardness?

    Avoid non 8^n design points altogether.

    Well, I came up with this:

    Have floating-point formats that are either 36 bits long or 72 bits long.

    Ok, better than the above, 12^n -> {12, 24, 48, 96}
    WOOPS no 36, 60 or 72 !!!
    ............................6^n -> {6, 12, 24, 48, 96} still does not work.

    That way, the 36-bit format is available, and longer formats, being twice
    as long, are easy to fetch from memory.

    Your problem is that 36 is not a 2^ of anything !

    One of the 72-bit formats has the same significand (or mantissa) length as the 48-bit floats in my idealized computer. But no bits are wasted;
    instead, the exponent field is just enlarged.

    72-bit FP (ala IEEE754 rules) is arguably better than Posits.

    It's still a conventional floating-point format, where the lengths of the exponent and significand are fixed, unlike John Gustafson's posits. But
    this gives it the advantage that either a computation will fail, or the precision of all the intermediate results will be the same as that of the final result; no catastrophic loss of precision will pass by unnoticed.

    The other 72-bit format has a significand
    ?? fraction ??
    the same size as that of IEEE
    754 64-bit floats. Offering lower precision, the same as that of a 60-bit float... would, no doubt, be too tough a sell. So the exponent field,
    while not as large as that of the other format, would still be 8 bits
    longer than usual, which, no doubt would be helpful.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 5 06:06:05 2026
    From Newsgroup: comp.arch

    On Thu, 05 Feb 2026 01:57:22 +0000, MitchAlsup wrote:
    John Savard <quadibloc@invalid.invalid> posted:

    I had looked into unusual memory architectures to allow a computer to
    be designed which had single-precision floats that were 36 bits long,
    so that it would be possible more often to avoid recourse to double
    precision, and which had double-precision floats that were 60 bits
    long, also a multiple of 12, because it wasn't necessary to have all
    the precision of 64-bit floats.

    Any machine on sale today (selling at les 100,000 machines/year)
    provide 36-bit or 60-bit or 72-bit FP ?!?

    Not that I know of. Of course, there's Univac, which still sells machines supporting their old 36-bit architecture.

    If you want to build a 12-bit-base machine, go ahead--just don't expect
    much takeup.

    That's indeed the problem, so I tried to address the problem.

    So how could I achieve my original goals while avoiding awkwardness?

    Avoid non 8^n design points altogether.

    That, unfortunately, couldn't achieve my original goals.

    Well, I came up with this:

    Have floating-point formats that are either 36 bits long or 72 bits
    long.

    Ok, better than the above, 12^n -> {12, 24, 48, 96}
    WOOPS no 36, 60 or 72 !!!
    ............................6^n -> {6, 12, 24, 48, 96} still does not
    work.

    The idea is now there's a 9-bit byte, and everything is built around that 9-bit byte. Although 9 is not a power of two, all other lengths are 9
    times a power of two, so binary addressing of these bytes and two-byte and four-byte and eight-byte quantities remains just as simple as on a pure
    2^n machine.
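    The addressing claim can be sketched: the factor of 9 appears only when converting a byte address to a physical bit position, while alignment arithmetic stays pure power-of-two. A minimal illustration (not any real ISA):

```python
BITS_PER_BYTE = 9  # the hypothetical machine's byte width

def bit_offset(byte_addr: int) -> int:
    # Only the final byte-address-to-bit conversion involves the factor 9.
    return byte_addr * BITS_PER_BYTE

def align_down(byte_addr: int, size_bytes: int) -> int:
    # Sizes in bytes (1, 2, 4, 8) are still powers of two, so alignment
    # masking works exactly as on an 8-bit-byte machine.
    return byte_addr & ~(size_bytes - 1)

print(bit_offset(4))       # 36: one 36-bit word into memory
print(align_down(13, 4))   # 12: nearest 4-byte (36-bit) boundary
```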

    Since 2^n machines with *bit addressing* are just about as rare as 36-bit
    and 60-bit machines... now my proposal is "just as good".

    I _still_ don't _really_ expect much takeup, even though my floats have
    sizes that seem to match the precisions those engaged in scientific
    computing were fond of.

    One of the 72-bit formats has the same significand (or mantissa) length
    as the 48-bit floats in my idealized computer. But no bits are wasted;
    instead, the exponent field is just enlarged.

    72-bit FP (ala IEEE754 rules) is arguably better than Posits.

    At least one bit of positivity.

    The other 72-bit format has a significand

    ?? fraction ??

    A floating-point number usually has three parts; a sign, an exponent
    (which includes its own sign) and...

    a coefficient or mantissa or fraction... which is now referred to, in the
    IEEE standard, as a "significand", so I guess we have to get used to the
    new official name for it.

    John Savard

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Thu Feb 5 08:10:58 2026
    From Newsgroup: comp.arch

    On 2026-02-05 1:06 a.m., quadi wrote:
    On Thu, 05 Feb 2026 01:57:22 +0000, MitchAlsup wrote:
    John Savard <quadibloc@invalid.invalid> posted:

    I had looked into unusual memory architectures to allow a computer to
    be designed which had single-precision floats that were 36 bits long,
    so that it would be possible more often to avoid recourse to double
    precision, and which had double-precision floats that were 60 bits
    long, also a multiple of 12, because it wasn't necessary to have all
    the precision of 64-bit floats.

    Any machine on sale today (selling at les 100,000 machines/year)
    provide 36-bit or 60-bit or 72-bit FP ?!?

    Not that I know of. Of course, there's Univac, which still sells machines supporting their old 36-bit architecture.

    If you want to build a 12-bit-base machine, go ahead--just don't expect
    much takeup.

    That's indeed the problem, so I tried to address the problem.

    So how could I achieve my original goals while avoiding awkwardness?

    Avoid non 8^n design points altogether.

    That, unfortunately, couldn't achieve my original goals.

    Well, I came up with this:

    Have floating-point formats that are either 36 bits long or 72 bits
    long.

    Ok, better than the above, 12^n -> {12, 24, 48, 96}
    WOOPS no 36, 60 or 72 !!!
    ............................6^n -> {6, 12, 24, 48, 96} still does not
    work.

    The idea is now there's a 9-bit byte, and everything is built around that 9-bit byte. Although 9 is not a power of two, all other lengths are 9
    times a power of two, so binary addressing of these bytes and two-byte and four-byte and eight-byte quantities remains just as simple as on a pure
    2^n machine.

    Since 2^n machines with *bit addressing* are just about as rare as 36-bit
    and 60-bit machines... now my proposal is "just as good".

    I _still_ don't _really_ expect much takeup, even though my floats have
    sizes that seem to match the precisions those engaged in scientific
    computing were fond of.

    One of the 72-bit formats has the same significand (or mantissa) length
    as the 48-bit floats in my idealized computer. But no bits are wasted;
    instead, the exponent field is just enlarged.

    72-bit FP (ala IEEE754 rules) is arguably better than Posits.

    At least one bit of positivity.

    The other 72-bit format has a significand

    ?? fraction ??

    A floating-point number usually has three parts; a sign, an exponent
    (which includes its own sign) and...

    a coefficient or mantissa or fraction... which is now referred to, in the IEEE standard, as a "significand", so I guess we have to get used to the
    new official name for it.

    John Savard

    I do not see anything wrong with an odd sized machine. One just has to
    accept that it would not be accepted by the community at large. One
    would be expending a lot of effort. It may be considered an artistic
    endeavour.

    I have toyed with the idea of 10- or 11-bit bytes, as there could be byte
    error correction on them if they used 16 bits.

    An issue is that 2^n floats work very well. Some of the approximations
    are more costly to achieve (larger tables) with a wider byte. Estimating
    a reciprocal to eight bits is less costly than 9 or 10 bits. I would not
    do anything other than 32/64/128 bits floats even in a machine with odd
    sized bytes.

    I have made a couple of machines (not really finished off) with odd byte
    sizes and it is a ton of work to get the software working. Best to stick
    with eight bits.


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Feb 5 14:50:47 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> writes:
    On Thu, 05 Feb 2026 01:57:22 +0000, MitchAlsup wrote:
    John Savard <quadibloc@invalid.invalid> posted:

    I had looked into unusual memory architectures to allow a computer to
    be designed which had single-precision floats that were 36 bits long,
    so that it would be possible more often to avoid recourse to double
    precision, and which had double-precision floats that were 60 bits
    long, also a multiple of 12, because it wasn't necessary to have all
    the precision of 64-bit floats.

    Any machine on sale today (selling at les 100,000 machines/year)
    provide 36-bit or 60-bit or 72-bit FP ?!?

    Not that I know of. Of course, there's Univac, which still sells machines supporting their old 36-bit architecture.

    The last Unisys CMOS machines were shipped over a decade ago. Modern 2200 systems are all emulated on standard x86 cores running under linux.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 5 23:58:52 2026
    From Newsgroup: comp.arch

    On Thu, 05 Feb 2026 06:06:05 +0000, quadi wrote:

    The idea is now there's a 9-bit byte, and everything is built around
    that 9-bit byte. Although 9 is not a power of two, all other lengths are
    9 times a power of two, so binary addressing of these bytes and two-byte
    and four-byte and eight-byte quantities remains just as simple as on a
    pure 2^n machine.

    Since 2^n machines with *bit addressing* are just about as rare as
    36-bit and 60-bit machines... now my proposal is "just as good".

    I have given more thought to interoperability with the 8-bit world.

    Giving it additional numeric types which are stored normally in registers,
    but which are stored in memory using only the least significant eight bits
    of each nine-bit byte, would allow it to exchange data with conventional machines based on the eight-bit byte.
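    The compatibility store can be sketched as packing each octet of an external value into the low 8 bits of a 9-bit byte, leaving every top bit zero. A 32-bit integer is illustrative (names and endianness are my choices, not the proposal's):

```python
def store_u32(value: int) -> list[int]:
    # A 32-bit compatibility integer occupies four 9-bit bytes,
    # big-endian, 8 payload bits per byte; each byte's top bit stays zero.
    return [(value >> (8 * i)) & 0xFF for i in (3, 2, 1, 0)]

def load_u32(four_bytes: list[int]) -> int:
    # Reassemble on the 9-bit machine, ignoring each byte's top bit.
    v = 0
    for b in four_bytes:
        v = (v << 8) | (b & 0xFF)
    return v

assert load_u32(store_u32(0xDEADBEEF)) == 0xDEADBEEF
```

    The same octet stream, read on an 8-bit-byte machine, is an ordinary 32-bit integer, which is the whole point of the format.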

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Fri Feb 6 16:37:00 2026
    From Newsgroup: comp.arch

    In article <10m3ars$3loo1$1@dont-email.me>, quadibloc@ca.invalid (quadi)
    wrote:

    Giving it additional numeric types which are stored normally in
    registers, but which are stored in memory using only the least
    significant eight bits of each nine-bit byte, would allow it to
    exchange data with conventional machines based on the eight-bit
    byte.

    Isn't that going to create opcode space pressure?

    It's certainly going to create some interesting new types of bad data if pointers to the two types of data get confused.

    How are you planning to handle UTF-8, UTF-16 and UTF-32 character data? Creating UTF-9, UTF-18 and UTF-36 seems like pointless complexity.


    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Feb 6 22:32:25 2026
    From Newsgroup: comp.arch

    On Fri, 06 Feb 2026 16:37:00 +0000, John Dallman wrote:

    Isn't that going to create opcode space pressure?

    Well, that will be less of an issue in an architecture where the
    instructions are stored in wider memory.

    How are you planning to handle UTF-8, UTF-16 and UTF-32 character data? Creating UTF-9, UTF-18 and UTF-36 seems like pointless complexity.

    I think UTF-9 was described in an April 1st RFC. But I agree with that.

    Essentially, I am now thinking that a CPU with this architecture might
    have its primary application as a numerical co-processor for a
    conventional CPU. This would provide the opportunity for carrying out computations with extra exponent range or higher precision without having
    to switch to a much larger floating-point format, thus avoiding loss of
    speed.

    One would need to create a new kind of RAM module to support a 144-bit
    wide data bus, but it would be unrealistic to create new video cards and
    so on.

    So it would have its own FORTRAN compiler - that would be the highest
    priority in software development, after some kind of operating system for
    the compiler to run within. Well, maybe porting a C compiler would need to come first, to allow everything else to be ported.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Feb 6 22:34:24 2026
    From Newsgroup: comp.arch

    On Fri, 06 Feb 2026 22:32:25 +0000, quadi wrote:
    On Fri, 06 Feb 2026 16:37:00 +0000, John Dallman wrote:

    How are you planning to handle UTF-8, UTF-16 and UTF-32 character data?
    Creating UTF-9, UTF-18 and UTF-36 seems like pointless complexity.

    I think UTF-9 was described in an April 1st RFC.

    Ah, yes. Here we are:

    https://www.ietf.org/rfc/rfc4042.txt

    RFC 4042.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Feb 6 22:40:56 2026
    From Newsgroup: comp.arch

    On Fri, 06 Feb 2026 16:37:00 +0000, John Dallman wrote:

    How are you planning to handle UTF-8, UTF-16 and UTF-32 character data?

    Character strings would normally be handled as sequences of nine-bit bytes.

    Given that the various compatibility formats for numbers would place them
    in the least significant eight bits of successive bytes, this is also how eight-bit characters would be handled; they would be placed in successive nine-bit bytes.

    Then they would be converted to 31-bit numbers representing Unicode characters; presumably as normal 36-bit integers, but they could be placed
    in 32-bit compatibility-form integers if they were going back out to a conventional system.
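    As a sketch of the scheme described above: with one UTF-8 code unit in the low 8 bits of each 9-bit byte, ordinary 8-bit decoding logic applies after masking (the function name is mine, for illustration):

```python
def decode_utf8_9bit(nine_bit_bytes: list[int]) -> list[int]:
    # Each 9-bit byte carries one UTF-8 code unit in its low 8 bits;
    # masking off the top bit recovers the ordinary octet stream.
    octets = bytes(b & 0xFF for b in nine_bit_bytes)
    # Code points (at most 21 bits today, 31 bits in the original
    # UTF-8 design) fit easily in 36-bit integers.
    return [ord(c) for c in octets.decode("utf-8")]

print(decode_utf8_9bit([0xC3, 0xA9]))   # [233], i.e. U+00E9 'é'
```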

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sat Feb 7 00:57:08 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Fri, 06 Feb 2026 16:37:00 +0000, John Dallman wrote:

    Isn't that going to create opcode space pressure?

    Well, that will be less of an issue in an architecture where the instructions are stored in wider memory.

    I would think that with 36-bit instructions, you have the OpCode space
    to 'blow'...

    How are you planning to handle UTF-8, UTF-16 and UTF-32 character data? Creating UTF-9, UTF-18 and UTF-36 seems like pointless complexity.

    I think UTF-9 was described in an April 1st RFC. But I agree with that.

    Essentially, I am now thinking that a CPU with this architecture might
    have its primary application as a numerical co-processor for a
    conventional CPU. This would provide the opportunity for carrying out computations with extra exponent range or higher precision without having
    to switch to a much larger floating-point format, thus avoiding loss of speed.

    One would need to create a new kind of RAM module to support a 144-bit
    wide data bus, but it would be unrealistic to create new video cards and
    so on.

    So it would have its own FORTRAN compiler - that would be the highest priority in software development, after some kind of operating system for the compiler to run within. Well, maybe porting a C compiler would need to come first, to allow everything else to be ported.

    C and FORTRAN will put you in a position where everything is 9×2^n in
    size, so you might as well just design a 72-bit machine. 72-bit registers, 72-bit Virtual Address Space, ...

    And then have the MMUs do translations between 8-bit Byte-world and 9-bit Byte-world. Everything in CPU-land is 72-bits... Done right, LDs and STs through certain PTEs "translate" between 64-bit world and 72-bit world.

    If you have the MMU doing the translation, you do not need 8×2^n instruction calculations--saving OpCode space {from Concertina III !}

    John Savard

    Why Quadi ??
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Sat Jan 31 22:36:00 2026
    From Newsgroup: comp.arch

    In article <10llhkk$34374$2@dont-email.me>, quadibloc@invalid.invalid
    (John Savard) wrote:

    Have floating-point formats that are either 36 bits long or 72 bits
    long.

    That way, the 36-bit format is available, and longer formats, being
    twice as long, are easy to fetch from memory.
    ...
    The other 72-bit format has a significand the same size as that of
    IEEE 754 64-bit floats. Offering lower precision, the same as that
    of a 60-bit float... would, no doubt, be too tough a sell. So the
    exponent field, while not as large as that of the other format,
    would still be 8 bits longer than usual, which, no doubt would be
    helpful.

    If you're going to have non-IEEE-standard formats, that seems a way for
    them to be more acceptable. Accurate representation of smaller values
    without underflow is good.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Sat Feb 7 02:45:35 2026
    From Newsgroup: comp.arch

    On Sat, 07 Feb 2026 00:57:08 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:

    John Savard

    Why Quadi ??

    http://www.quadibloc.com/crypto/co0407.htm

    John Savard

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Mon Feb 9 22:46:44 2026
    From Newsgroup: comp.arch

    On Thu, 05 Feb 2026 23:58:52 +0000, quadi wrote:

    I have given more thought to interoperability with the 8-bit world.

    Giving it additional numeric types which are stored normally in
    registers,
    but which are stored in memory using only the least significant eight
    bits of each nine-bit byte, would allow it to exchange data with
    conventional machines based on the eight-bit byte.

    I have now added a new page to my site,

    http://www.quadibloc.com/arch/per16.htm

    where this is explained more completely with illustrations.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue Feb 10 03:49:51 2026
    From Newsgroup: comp.arch

    On Mon, 09 Feb 2026 22:46:44 +0000, quadi wrote:

    On Thu, 05 Feb 2026 23:58:52 +0000, quadi wrote:

    I have now added a new page to my site,

    http://www.quadibloc.com/arch/per16.htm

    where this is explained more completely with illustrations.

    I have further updated that page to show how this principle can be
    extended to connect the 36-bit word computer not only to a 32-bit word computer, but also to a 24-bit word computer, and I mention that integer formats as well as floating-point ones of this type are needed.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Tue Feb 10 19:10:45 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Mon, 09 Feb 2026 22:46:44 +0000, quadi wrote:

    On Thu, 05 Feb 2026 23:58:52 +0000, quadi wrote:

    I have now added a new page to my site,

    http://www.quadibloc.com/arch/per16.htm

    where this is explained more completely with illustrations.

    I have further updated that page to show how this principle can be
    extended to connect the 36-bit word computer not only to a 32-bit word computer, but also to a 24-bit word computer, and I mention that integer formats as well as floating-point ones of this type are needed.

    Over the last couple of days, I have come to the conclusion that
    your job, in the near future, is to sell the idea of a 72-bit
    computer architecture.

    Figure out some way to have 9-bit, 12-bit things (without resorting
    to 3-bit base things) and you have access to {9, 12, 18, 24, 30, 36,
    42, 48, 54, 60, 66, 72} data sizes.

    Provide a means to access 8×2^n via PTE translation, and presto, you
    have all the (odd) data sizes you have been struggling with for Oh so
    long.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Wed Feb 11 23:04:20 2026
    From Newsgroup: comp.arch

    On Tue, 10 Feb 2026 19:10:45 +0000, MitchAlsup wrote:

    Over the last couple of days, I have come to the conclusion that your
    job, in the near future, is to sell the idea of a 72-bit computer architecture.

    Here, you are raising the one question that I have been merrily avoiding
    as irrelevant. Although it is anything but irrelevant in one way, as it directly deals with the value of all this in the real world.

    Trying to persuade the world to switch from 32 bits to 36 bits? How could
    I be anything other than an amusing crank if I did that?

    I remember having read one article in a computer magazine where someone mentioned that an unfortunate result of the transition from the IBM 7090
    to the IBM System/360 was that a lot of FORTRAN programs that were able to
    use ordinary real numbers had to be switched over to double precision to
    yield acceptable results.

    And I noticed that a lot of mathematical tables from the old days went up
    to 10 digit accuracy, and scientific calculators had 10 digit displays, calculating internally to a slightly higher precision.

    And a passing remark in Petr Beckmann's "A History of Pi" about how even
    using pi to the accuracy of a computer double precision number was 'artificial' encouraged me to think of trimming down double precision a
    bit - say by one digit, to match the precision of numbers in the Control
    Data 6600, with which scientists seemed to have been quite content in its
    day.

    All this was a rather slim basis on which to conclude that our 32-bit and 64-bit floats ought to be replaced by 36-bit, 48-bit, and 60-bit floats.

    And in the days that immediately followed the emergence of the IBM System/
    360, of course, transistors were still *expensive*. So it made sense to be concerned about optimizing floating-point formats, so that their precision
    was as much as necessary, but no more - so that a computer with as few transistors as possible could perform calculations as fast as possible to
    get the results needed.

    But now? Powerful microprocessors are cheap. The cost of buying a custom specialized part would be so high as to completely eliminate the potential savings of using 36-bit floats instead of 64-bit floats when they might do.

    So the only way a benefit would result... is if 36/72 bits became the ubiquitous new standard! I suppose that _could_ happen, if it were widely acknowledged that the requirements of scientific computing would be better
    met in that case.

    So it seems as if it's impossible for the 36/72 bit transition to start on
    a small scale, with something that fills a niche demand, because the lower production volumes would create higher costs that entirely negate the
    value for the niche.

    Except...

    Speaking of niche products, there's the SX-Aurora TSUBASA from NEC... it
    looks like a video card, but it's actually the last surviving *vector* supercomputer in the Cray tradition!

    As it happens, I encountered - in my years as a grad student - a computer add-on from Floating-Point Systems which, so that it could be attached to (then still existing) 36-bit computers or 18-bit minis in addition to the 32-bit and 16-bit ones... used 38-bit floating-point numbers internally.

    And Cray-style vector instructions are one thing I've been including in my various hypothetical architectures, on the grounds that they're about the
    only architectural feature aimed at providing more power that (some) mainframes had that isn't routine in micros these days. Of course, though, you've noted that it can't really be effective without huge memory
    bandwidth, which is impractical to provide.

    And the SX-Aurora TSUBASA has internal memory, which may even be HBM, so
    that removes the issue that standard memory modules are designed around
    the 32/64/128/256 -bit data bus width.

    So vector modules are a potential niche that could run in 36 bits while connecting to a 32 bit world - and making 36 bits connect to 32 bits is,
    of course, just what my latest brainstorm was dealing with.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Feb 11 23:55:29 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Tue, 10 Feb 2026 19:10:45 +0000, MitchAlsup wrote:

    Over the last couple of days, I have come to the conclusion that your
    job, in the near future, is to sell the idea of a 72-bit computer architecture.

    Here, you are raising the one question that I have been merrily avoiding
    as irrelevant. Although it is anything but irrelevant in one way, as it directly deals with the value of all this in the real world.

    Trying to persuade the world to switch from 32 bits to 36 bits? How could
    I be anything other than an amusing crank if I did that?

    You do not.

    Remember I said you could use PTEs to access 32-bit data (or 36-bit data)
    and thus, calculations instructions do not need 32-bit-ed-ness. So, the
    whole data-path is 36/72/144-bits wide.

    Properly arranged, your disk files remain 32-bit, internet 32-bits, ...

    I remember having read one article in a computer magazine where someone mentioned that an unfortunate result of the transition from the IBM 7090
    to the IBM System/360 was that a lot of FORTRAN programs that were able to use ordinary real numbers had to be switched over to double precision to yield acceptable results.

    I remember that, too.

    And I noticed that a lot of mathematical tables from the old days went up
    to 10 digit accuracy, and scientific calculators had 10 digit displays, calculating internally to a slightly higher precision.

    And a passing remark in Petr Beckmann's "A History of Pi" about how even using pi to the accuracy of a computer double precision number was 'artificial' encouraged me to think of trimming down double precision a
    bit - say by one digit, to match the precision of numbers in the Control Data 6600, with which scientists seemed to have been quite content in its day.

    Scientists of the day were happy with 60-bit CRAY-quality FP. They were
    not happy with IBM 32-bit, and sort-of-OK with Univac 36-bit.

    All this was a rather slim basis on which to conclude that our 32-bit and 64-bit floats ought to be replaced by 36-bit, 48-bit, and 60-bit floats.

    36/72-bit formats have the property that the square of a 32/64-bit float does not overflow !!
    avoiding all sorts of IEEE_HYPOT() problems.
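    The overflow point is easy to demonstrate in IEEE double (which is what Python floats are): the naive sum-of-squares blows up, and hypot implementations must rescale to work around it; a wider exponent field would make that workaround unnecessary:

```python
import math

x = y = 1e200
naive = math.sqrt(x * x + y * y)   # x*x overflows double's exponent to inf
assert math.isinf(naive)

# math.hypot rescales internally to dodge the intermediate overflow --
# exactly the trouble a wider exponent field would avoid.
assert math.isclose(math.hypot(x, y), x * math.sqrt(2))
```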

    And in the days that immediately followed the emergence of the IBM System/ 360, of course, transistors were still *expensive*. So it made sense to be concerned about optimizing floating-point formats, so that their precision was as much as necessary, but no more - so that a computer with as few transistors as possible could perform calculations as fast as possible to get the results needed.

    So optimized that IBM forgot about the Guard digit !!!

    But now? Powerful microprocessors are cheap. The cost of buying a custom specialized part would be so high as to completely eliminate the potential savings of using 36-bit floats instead of 64-bit floats when they might do.

    So the only way a benefit would result... is if 36/72 bits became the ubiquitous new standard! I suppose that _could_ happen, if it were widely acknowledged that the requirements of scientific computing would be better met in that case.

    In this case YOU have to ask YOURSELF why are you providing any of those strange data sizes AT ALL ??? That is who is Concertina for ???

    So it seems as if it's impossible for the 36/72 bit transition to start on
    a small scale, with something that fills a niche demand, because the lower production volumes would create higher costs that entirely negate the
    value for the niche.

    Except...

    Speaking of niche products, there's the SX-Aurora TSUBASA from NEC... it looks like a video card, but it's actually the last surviving *vector* supercomputer in the Cray tradition!

    As it happens, I encountered - in my years as a grad student - a computer add-on from Floating-Point Systems which, so that it could be attached to (then still existing) 36-bit computers or 18-bit minis in addition to the 32-bit and 16-bit ones... used 38-bit floating-point numbers internally.

    And Cray-style vector instructions are one thing I've been including in my various hypothetical architectures, on the grounds that they're about the
    only architectural feature aimed at providing more power that (some) mainframes had that isn't routine in micros these days. Of course, though, you've noted that it can't really be effective without huge memory bandwidth, which is impractical to provide.

    And the SX-Aurora TSUBASA has internal memory, which may even be HBM, so that removes the issue that standard memory modules are designed around
    the 32/64/128/256 -bit data bus width.

    So vector modules are a potential niche that could run in 36 bits while connecting to a 32 bit world - and making 36 bits connect to 32 bits is,
    of course, just what my latest brainstorm was dealing with.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From David Schultz@david.schultz@earthlink.net to comp.arch on Wed Feb 11 17:57:29 2026
    From Newsgroup: comp.arch

    On 2/11/26 5:04 PM, quadi wrote:
    I remember having read one article in a computer magazine where someone mentioned that an unfortunate result of the transition from the IBM 7090
    to the IBM System/360 was that a lot of FORTRAN programs that were able to use ordinary real numbers had to be switched over to double precision to
    yield acceptable results.

    This reminds me of when I took a numerical analysis course. (The many
    ways that computer calculations can go wrong and how to deal with it.)
    The professor said that the school's IBM (360 or 370, ca. 1980) was
    perfect for the course because of the defects in its floating point
    system. Guard digits and rounding sorts of things as near as I can recall.
    --
    http://davesrocketworks.com
    David Schultz
    "Gag me with a Smurf"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Thu Feb 12 00:04:37 2026
    From Newsgroup: comp.arch

    According to David Schultz <david.schultz@earthlink.net>:
    This reminds me of when I took a numerical analysis course. (The many
    ways that computer calculations can go wrong and how to deal with it.)
    The professor said that the schools IBM (360 or 370, ca. 1980) was
    perfect for the course because of the defects in its floating point
    system. Guard digits and rounding sorts of things as near as I can recall.

    The 360's floating point is a famous and somewhat puzzling failure, considering how much else they got right.

    It does hex normalization rather than binary. They assumed that
    leading digits are evenly distributed so there'd be on average one
    zero bit, but in fact they're geometrically distributed, so on average
    there's two. They got one bit back by making the exponent units of 16
    rather than 2, but that's still one bit gone. They also truncated
    rather than rounded results, another bit gone.

    Originally there were no guard digits, which made the results comically
    bad but IBM retrofitted them at great cost to all the installed machines.

    IEEE floating point can be seen as a reaction to that, how do you use
    the same number of bits but get good results.
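    The cost of hex normalization and truncation is easy to model. The sketch below quantizes a double to a six-hex-digit, hex-normalized, truncating fraction in the style of S/360 single precision; `s360_single` is an illustrative name, and the real format's exponent range and encoding are ignored:

```python
import math
import random

def s360_single(x):
    """Model of IBM S/360 single precision: a six-hex-digit fraction
    in [1/16, 1), a hexadecimal exponent, and truncation.  (A sketch:
    the real format's exponent range and sign encoding are ignored.)"""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    m, e = math.frexp(abs(x))            # abs(x) = m * 2**e, m in [0.5, 1)
    he = (e + 3) // 4                    # hex exponent: abs(x)/16**he in [1/16, 1)
    frac = abs(x) / 16.0 ** he
    frac = math.floor(frac * 16 ** 6) / 16 ** 6   # chop to 24 fraction bits
    return sign * frac * 16.0 ** he

# worst-case relative error over random samples approaches 2**-20,
# several bits worse than a rounded binary format of the same width
random.seed(1)
worst = max(abs(x - s360_single(x)) / x
            for x in (random.uniform(1.0, 1e6) for _ in range(10000)))
```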
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 12 02:04:58 2026
    From Newsgroup: comp.arch


    John Levine <johnl@taugh.com> posted:

    According to David Schultz <david.schultz@earthlink.net>:
    This reminds me of when I took a numerical analysis course. (The many
    ways that computer calculations can go wrong and how to deal with it.)
    The professor said that the schools IBM (360 or 370, ca. 1980) was
    perfect for the course because of the defects in its floating point >system. Guard digits and rounding sorts of things as near as I can recall.

    The 360's floating point is a famous and somewhat puzzling failure, considering
    how much else they got right.

    It does hex normalization rather than binary. They assumed that
    leading digits are evenly distributed so there's be on average one
    zero bit, but in fact they're geometrically distributed, so on average there's two. They got one bit back by making the exponent units of 16
    rather than 2, but that's still one bit gone. It truncated rather than rounding, another bit gone. They also truncated rather than rounding results.

    Originally there wre no guard digits which made the results comically
    bad but IBM retrofitted them at great cost to all the installed machines.

    IEEE floating point can be seen as a reaction to that, how do you use
    the same number of bits but get good results.

    VAX got this correct too (the VAX format not the one inherited from
    PDP-11/45; PDP-11/40* FP was worse). VAX FP is arguably as good as
    IEEE 754 with the exception that more IEEE numbers have reciprocals
    due to the change in exponent bias by 1. {{One can STILL argue whether deNormals were a plus or a minus in IEEE}}

    CMU had a PDP-11/40 with writable control store in 1974. I programmed it
    to do PDP-11/45 FP instead of PDP-11/40 FP as a Jr. project.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Thu Feb 12 03:31:02 2026
    From Newsgroup: comp.arch

    According to MitchAlsup <user5857@newsgrouper.org.invalid>:

    John Levine <johnl@taugh.com> posted:

    According to David Schultz <david.schultz@earthlink.net>:
    This reminds me of when I took a numerical analysis course. (The many
    ways that computer calculations can go wrong and how to deal with it.)
    The professor said that the schools IBM (360 or 370, ca. 1980) was
    perfect for the course because of the defects in its floating point
    system. Guard digits and rounding sorts of things as near as I can recall.
    The 360's floating point is a famous and somewhat puzzling failure, considering
    how much else they got right.

    It does hex normalization rather than binary. They assumed that
    leading digits are evenly distributed so there's be on average one
    zero bit, but in fact they're geometrically distributed, so on average
    there's two. They got one bit back by making the exponent units of 16
    rather than 2, but that's still one bit gone. It truncated rather than
    rounding, another bit gone. They also truncated rather than rounding
    results.

    Oh I forgot that using hex exponents meant there was no hidden bit, so
    in practice it lost three bits of precision on every operation. There was
    a great deal of grumbling that people with 709x Fortran codes had to
    make everything double precision to keep getting reasonably good results.

    Originally there wre no guard digits which made the results comically
    bad but IBM retrofitted them at great cost to all the installed machines.

    IEEE floating point can be seen as a reaction to that, how do you use
    the same number of bits but get good results.

    VAX got this correct too (the VAX format not the one inherited from PDP-11/45; PDP-11/40* FP was worse). ...

    The VAX is the first machine I know that used the hidden bit trick to
    get an extra bit of significance. The PDP-6/10 was pretty close but
    their format was two's complement which meant no hidden bit but they
    could use integer comparisons on normalized floating point numbers.
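    IEEE 754 kept a version of that property: for non-negative finite values the bit patterns, read as unsigned integers, sort in the same order as the numbers. A quick Python check on binary64:

```python
import random
import struct

def as_uint(x: float) -> int:
    """Reinterpret a binary64 value's bit pattern as an unsigned integer."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

random.seed(42)
xs = sorted(random.uniform(0.0, 1e12) for _ in range(1000))
# integer order of the bit patterns matches numeric order
assert [as_uint(v) for v in xs] == sorted(as_uint(v) for v in xs)
```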
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Wed Feb 11 19:50:00 2026
    From Newsgroup: comp.arch

    On 2/11/2026 3:04 PM, quadi wrote:

    snip


    And I noticed that a lot of mathematical tables from the old days went up
    to 10 digit accuracy, and scientific calculators had 10 digit displays, calculating internally to a slightly higher precision.

    The ten digit displays came from the design of the first electric
    calculators, made by such companies as Friden and Monroe in the 1940s
    and 50s. They had ten rows of numeric keys (0-9), so that the
    operator, who presumably had ten fingers (including thumbs) could
    operate them quickly. So 10 digits sort of became standard. When
    computers came along, and the designers wanted to use binary for them,
    they needed 35 bits (including sign) to hold the ten digits. Going with
    36 bits allowed six six-bit characters. The requirement from the US
    Navy (a major customer) for that precision led to the Univac 1100 series being a 36-bit machine. Once you have 36-bit integers, you might
    as well use 36-bit floating point numbers, and then 72-bit double
    precision floating point numbers, as the 1100 series did.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 12 05:51:03 2026
    From Newsgroup: comp.arch

    On Wed, 11 Feb 2026 17:57:29 -0600, David Schultz wrote:
    On 2/11/26 5:04 PM, quadi wrote:

    I remember having read one article in a computer magazine where someone
    mentioned that an unfortunate result of the transition from the IBM
    7090 to the IBM System/360 was that a lot of FORTRAN programs that were
    able to use ordinary real numbers had to be switched over to double
    precision to yield acceptable results.

    This reminds me of when I took a numerical analysis course. (The many
    ways that computer calculations can go wrong and how to deal with it.)
    The professor said that the schools IBM (360 or 370, ca. 1980) was
    perfect for the course because of the defects in its floating point
    system. Guard digits and rounding sorts of things as near as I can
    recall.

    Mitch Alsup mentioned that there was no guard digit in the floating-point arithmetic units of the various IBM System/360 models when they were
    initially released. However, this was so serious an omission, as was
    soon noted in practice, that IBM quickly modified the design and
    refitted all the units in the field.
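    The effect of the missing guard digit is easy to reproduce in a toy three-significant-digit decimal format; `toy_sub` below is an invention for illustrating the alignment problem, not a model of the actual S/360 datapath:

```python
from decimal import Decimal, ROUND_DOWN

def toy_sub(a, b, prec=3, guard=True):
    """Subtract b from a (0 < b <= a) in a toy decimal float with
    `prec` significant digits: b is aligned to a's exponent and
    chopped to prec digit positions, plus one guard digit if enabled."""
    a, b = Decimal(a), Decimal(b)
    keep = prec + (1 if guard else 0)
    ulp = Decimal(10) ** (a.adjusted() - keep + 1)  # smallest kept place
    b_kept = (b / ulp).to_integral_value(rounding=ROUND_DOWN) * ulp
    return a - b_kept

# 1.00 - 0.999 should be 0.001; without a guard digit the shifted-out
# trailing 9 is lost before the subtract, and the answer is ten times
# too large
assert toy_sub('1.00', '0.999', guard=True)  == Decimal('0.001')
assert toy_sub('1.00', '0.999', guard=False) == Decimal('0.01')
```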

    Even after this was done, though, since the exponent in IBM floating-point
    was a power of 16 rather than 2, and since floating-point calculations
    were truncated rather than rounded on the System/360, its floating-point
    was still considered to be less than the greatest.

    There were workarounds, though, which people have mostly forgotten about
    due to the ubiquity of IEEE 754 floating-point these days. A famous
    numerical analysis textbook which explained how to cope with the problems caused by substandard floating point formats was _Floating-Point
    Computation_ by Pat H. Sterbenz.

    John Savard

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 12 05:55:14 2026
    From Newsgroup: comp.arch

    On Wed, 11 Feb 2026 19:50:00 -0800, Stephen Fuld wrote:
    On 2/11/2026 3:04 PM, quadi wrote:

    And I noticed that a lot of mathematical tables from the old days went
    up to 10 digit accuracy, and scientific calculators had 10 digit
    displays, calculating internally to a slightly higher precision.

    The ten digit displays came from the design of the first electric calculators, made by such companies as Friden and Monroe in the 1940s
    and 50s). They had ten rows of numeric keys (0-9), so that the
    operator, who presumably had ten fingers (including thumbs) could
    operate them quickly.

    So you're saying that the tendency of log tables and the like to go up to
    a maximum of ten digits precision wasn't because ten digits were needed
    for, say, celestial mechanics or something like that, so my premise, that
    ten significant digits is what scientific computation usually needs, as
    reflected in the design of calculators and math tables, is completely
    mistaken.

    Drat!

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 12 06:04:49 2026
    From Newsgroup: comp.arch

    On Wed, 11 Feb 2026 23:55:29 +0000, MitchAlsup wrote:
    quadi <quadibloc@ca.invalid> posted:
    All this was a rather slim basis on which to conclude that our 32-bit
    and 64-bit floats ought to be replaced by 36-bit, 48-bit, and 60-bit
    floats.

    36/72-bit have the property that 32/64-bit^2 does not overflow !!
    avoiding all sorts of IEEE_HYPOT() problems.

    If one's primary floating-point format has a certain length of exponent
    and mantissa, then a second format is needed, with at least twice the
    mantissa length and an exponent field that's one bit longer, to hold
    intermediate results when exact squares are sometimes needed.

    That was a point I had not thought of.
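    For IEEE as it stands, this already holds between binary32 and binary64: double's exponent field is three bits wider than single's, more than the one-bit minimum, so the square of any finite float32 value fits in a double. A quick Python check (0x7F7FFFFF is the binary32 largest-finite bit pattern):

```python
import math
import struct

# decode the largest finite binary32 value from its bit pattern
big = struct.unpack('<f', struct.pack('<I', 0x7F7FFFFF))[0]
sq = big * big                 # evaluated in binary64 (11-bit exponent)
assert math.isfinite(sq)       # the square fits, with room to spare
```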

    So the only way a benefit would result... is if 36/72 bits became the
    ubiquitous new standard! I suppose that _could_ happen, if it were
    widely acknowledged that the requirements of scientific computing would
    be better met in that case.

    In this case YOU have to ask YOURSELF why are you providing any of those strange data sizes AT ALL ??? That is who is Concertina for ???

    As I've noted, the strange data sizes were provided on the basis that they would be preferable for scientific computation.

    And so I continued on to suggest a possibility might be to address a niche market (just like ARM became a thing in an x86 world by going into smartphones) - use the strange data sizes to perform computations
    internally in a numerical vector coprocessor... your internal values are a
    few bits more precise, so your final answer, back in a standard data type,
    is more accurate.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Thu Feb 12 08:45:03 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever encountered.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Feb 12 10:49:47 2026
    From Newsgroup: comp.arch

    On Thu, 12 Feb 2026 02:04:58 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
    John Levine <johnl@taugh.com> posted:

    According to David Schultz <david.schultz@earthlink.net>:
    This reminds me of when I took a numerical analysis course. (The
    many ways that computer calculations can go wrong and how to deal
    with it.) The professor said that the schools IBM (360 or 370, ca.
    1980) was perfect for the course because of the defects in its
    floating point system. Guard digits and rounding sorts of things
    as near as I can recall.

    The 360's floating point is a famous and somewhat puzzling failure, considering how much else they got right.

    It does hex normalization rather than binary. They assumed that
    leading digits are evenly distributed so there's be on average one
    zero bit, but in fact they're geometrically distributed, so on
    average there's two. They got one bit back by making the exponent
    units of 16 rather than 2, but that's still one bit gone. It
    truncated rather than rounding, another bit gone. They also
    truncated rather than rounding results.

    Originally there wre no guard digits which made the results
    comically bad but IBM retrofitted them at great cost to all the
    installed machines.

    IEEE floating point can be seen as a reaction to that, how do you
    use the same number of bits but get good results.

    VAX got this correct too (the VAX format not the one inherited from PDP-11/45; PDP-11/40* FP was worse). VAX FP is arguably as good as
    IEEE 754 with the exception that more IEEE numbers have reciprocals
    due to the change in exponent bias by 1. {{One can STILL argue whether deNormals were a plus or a minus in IEEE}}
    From the perspective of stability of convergence of a few common
    algorithms, denormals are a significant plus.
    From the perspective of minimizing surprises it is also a plus. On VAX
    (a > b) does not necessarily guarantee (a-b > 0).
    I wonder in which situation it can be seen as a minus?
    There are several things that I don't like about the IEEE-754 Standard,
    but none of them is related to the format of binary numbers.
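    The (a > b) versus (a-b > 0) guarantee is exactly what gradual underflow buys: with IEEE subnormals, the difference of two distinct finite numbers never rounds to zero. A Python check:

```python
import sys

a = sys.float_info.min * 1.5   # just above the smallest normal number
b = sys.float_info.min
d = a - b                      # exact result is subnormal

assert a > b and d > 0         # flush-to-zero would have given d == 0
assert d < sys.float_info.min  # d is indeed below the normal range
```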

    CMU had a PDP-11/40 with writable control store 1974. I programmed it
    to do PDP-11/45 FP instead of PDP-11/40 FP as a Jr. project.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Feb 12 10:53:58 2026
    From Newsgroup: comp.arch

    On Wed, 11 Feb 2026 17:57:29 -0600
    David Schultz <david.schultz@earthlink.net> wrote:

    On 2/11/26 5:04 PM, quadi wrote:
    I remember having read one article in a computer magazine where
    someone mentioned that an unfortunate result of the transition from
    the IBM 7090 to the IBM System/360 was that a lot of FORTRAN
    programs that were able to use ordinary real nubers had to be
    switched over to double precision to yield acceptable results.

    This reminds me of when I took a numerical analysis course. (The many
    ways that computer calculations can go wrong and how to deal with
    it.) The professor said that the schools IBM (360 or 370, ca. 1980)
    was perfect for the course because of the defects in its floating
    point system. Guard digits and rounding sorts of things as near as I
    can recall.


    Was not quality of arithmetic of CDC machines of the 70s even worse
    than that of IBM ?

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Feb 12 15:54:46 2026
    From Newsgroup: comp.arch

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 2/11/2026 3:04 PM, quadi wrote:

    snip


    And I noticed that a lot of mathematical tables from the old days went up
    to 10 digit accuracy, and scientific calculators had 10 digit displays,
    calculating internally to a slightly higher precision.

    The ten digit displays came from the design of the first electric >calculators, made by such companies as Friden and Monroe in the 1940s
    and 50s). They had ten rows of numeric keys (0-9), so that the
    operator, who presumably had ten fingers (including thumbs) could
    operate them quickly. So 10 digits sort of became standard. When
    computers came along, and the designers wanted to use binary for them,

    When computers came along, they used 40 bits to store 10 BCD digits
    (e.g. the ElectroData 220 (44-bit) from the mid 50s and the successor Burroughs machines, the B300 and B3500). The B3500 extended the maximum operand size
    to 100 BCD digits. 80's versions of the B3500 had a 40-bit memory
    bus (operating on 10 digits at a time).


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Thu Feb 12 08:40:36 2026
    From Newsgroup: comp.arch

    On 2/11/2026 9:55 PM, quadi wrote:
    On Wed, 11 Feb 2026 19:50:00 -0800, Stephen Fuld wrote:
    On 2/11/2026 3:04 PM, quadi wrote:

    And I noticed that a lot of mathematical tables from the old days went
    up to 10 digit accuracy, and scientific calculators had 10 digit
    displays, calculating internally to a slightly higher precision.

    The ten digit displays came from the design of the first electric
    calculators, made by such companies as Friden and Monroe in the 1940s
    and 50s). They had ten rows of numeric keys (0-9), so that the
    operator, who presumably had ten fingers (including thumbs) could
    operate them quickly.

    So you're saying that the tendency of log tables and the like to go up to
    a maximum of ten digits precision wasn't because ten digits were needed
    for, say, celestial mechanics or something like that, so my premise that
    ten significant digits was what scientific computation usually needs, as reflected in the design of calculators and math tables is completely mistaken.

    See

    https://en.wikipedia.org/wiki/36-bit_computing#History
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Feb 12 16:51:07 2026
    From Newsgroup: comp.arch

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 2/11/2026 9:55 PM, quadi wrote:
    On Wed, 11 Feb 2026 19:50:00 -0800, Stephen Fuld wrote:
    On 2/11/2026 3:04 PM, quadi wrote:

    And I noticed that a lot of mathematical tables from the old days went >>>> up to 10 digit accuracy, and scientific calculators had 10 digit
    displays, calculating internally to a slightly higher precision.

    The ten digit displays came from the design of the first electric
    calculators, made by such companies as Friden and Monroe in the 1940s
    and 50s). They had ten rows of numeric keys (0-9), so that the
    operator, who presumably had ten fingers (including thumbs) could
    operate them quickly.

    So you're saying that the tendency of log tables and the like to go up to
    a maximum of ten digits precision wasn't because ten digits were needed
    for, say, celestial mechanics or something like that, so my premise that
    ten significant digits was what scientific computation usually needs, as
    reflected in the design of calculators and math tables is completely
    mistaken.

    See

    https://en.wikipedia.org/wiki/36-bit_computing#History

    The Burroughs class-1 machines from the early 1900s were
    built in several widths, but the bulk of them which were
    sold to banks, etc. had 9 columns, which were often treated
    as fixed point operating on values in pennies.

    A typical column:

    https://americanhistory.si.edu/collections/object/nmah_690198
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Thu Feb 12 08:52:43 2026
    From Newsgroup: comp.arch

    On 2/12/2026 7:54 AM, Scott Lurndal wrote:
    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 2/11/2026 3:04 PM, quadi wrote:

    snip


    And I noticed that a lot of mathematical tables from the old days went up >>> to 10 digit accuracy, and scientific calculators had 10 digit displays,
    calculating internally to a slightly higher precision.

    The ten digit displays came from the design of the first electric
    calculators, made by such companies as Friden and Monroe in the 1940s
    and 50s). They had ten rows of numeric keys (0-9), so that the
    operator, who presumably had ten fingers (including thumbs) could
    operate them quickly. So 10 digits sort of became standard. When
    computers came along, and the designers wanted to use binary for them,

    When computers came along, they used 40 bits to store 10 BCD digits
    (e.g. the electrodata 220 (44 bit) from the mid 50s and the successor Burroughs
    machines (B300, B3500). The B3500 extended the maximum operand size
    to 100 BCD digits. 80's versions of the B3500 had a 40-bit memory
    bus (operating on 10 digits at a time).

    In the early days of computers, there was a distinction between
    "business" computers and "scientific" computers. Many (most?) of the
    business computers were decimal (e.g. the ones you mentioned and some
    IBM lines) and character oriented. Conversely, many of the scientific computers were binary and often used 36-bit words.

    https://en.wikipedia.org/wiki/36-bit_computing#History

    These often used 6 bit characters and conveniently used octal.

    But as the late great Tom Lehrer said, "Base eight is just like base ten
    if you have no thumbs!"

    Of course, the IBM S/360 line was designed to provide one architecture
    for both major uses. It also introduced the eight bit character (byte)
    and thus naturally hexadecimal.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 12 17:09:16 2026
    From Newsgroup: comp.arch


    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals became
    a low cost addition. {And that has been my point--you seem to have
    forgotten the -2008 part of the argument}

    - anton
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 12 17:10:45 2026
    From Newsgroup: comp.arch


    Michael S <already5chosen@yahoo.com> posted:

    On Thu, 12 Feb 2026 02:04:58 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    John Levine <johnl@taugh.com> posted:

    According to David Schultz <david.schultz@earthlink.net>:
    This reminds me of when I took a numerical analysis course. (The
    many ways that computer calculations can go wrong and how to deal
    with it.) The professor said that the schools IBM (360 or 370, ca. 1980) was perfect for the course because of the defects in its
    floating point system. Guard digits and rounding sorts of things
    as near as I can recall.

    The 360's floating point is a famous and somewhat puzzling failure, considering how much else they got right.

    It does hex normalization rather than binary. They assumed that
    leading digits are evenly distributed so there's be on average one
    zero bit, but in fact they're geometrically distributed, so on
    average there's two. They got one bit back by making the exponent
    units of 16 rather than 2, but that's still one bit gone. It
    truncated rather than rounding, another bit gone. They also
    truncated rather than rounding results.

    Originally there wre no guard digits which made the results
    comically bad but IBM retrofitted them at great cost to all the
    installed machines.

    IEEE floating point can be seen as a reaction to that, how do you
    use the same number of bits but get good results.

    VAX got this correct too (the VAX format not the one inherited from PDP-11/45; PDP-11/40* FP was worse). VAX FP is arguably as good as
    IEEE 754 with the exception that more IEEE numbers have reciprocals
    due to the change in exponent bias by 1. {{One can STILL argue whether deNormals were a plus or a minus in IEEE}}

    From the perspective of stability of convergence of few common
    algorithms denormals are significant plus.
    From the perspective of minimizing surprises it is also plus. On VAX
    (a > b) does not necessarily guarantee (a-b > 0).
    I wonder in which situation it can be seen as a minus?

    a-b underflows and takes a trap.

    There are several things that I don't like about IEEE-754 Standard, but
    none of them related to format of binary numbers.


    CMU had a PDP-11/40 with writable control store 1974. I programmed it
    to do PDP-11/45 FP instead of PDP-11/40 FP as a Jr. project.



    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 12 18:37:49 2026
    From Newsgroup: comp.arch

    On Thu, 12 Feb 2026 08:52:43 -0800, Stephen Fuld wrote:
    On 2/12/2026 7:54 AM, Scott Lurndal wrote:

    When computers came along, they used 40 bits to store 10 BCD digits
    (e.g. the electrodata 220 (44 bit) from the mid 50s and the successor
    Burroughs machines (B300, B3500). The B3500 extended the maximum
    operand size to 100 BCD digits. 80's versions of the B3500 had a
    40-bit memory bus (operating on 10 digits at a time).

    In the early days of computers, there was a distinction between
    "business" computers and "scientific" computers. Many (most?) of the business computers were decimal) e.g. the ones you mentioned and some
    IBM lines) and character oriented. Conversely, many of the scientific computers were binary and often used 36 bit words.

    https://en.wikipedia.org/wiki/36-bit_computing#History

    These often used 6 bit characters and conveniently used octal.

    But in the _really_ early days of computers, ones that did binary
    arithmetic also often came with 40-bit words. Like EDVAC.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 12 18:40:07 2026
    From Newsgroup: comp.arch

    On Thu, 12 Feb 2026 10:53:58 +0200, Michael S wrote:

    Was not quality of arithmetic of CDC machines of the 70s even worse than
    that of IBM ?

    I don't know about that. But I do know that despite having a power-of-two exponent, quality of arithmetic on the Cray I was pretty terrible.

    John Savard

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Thu Feb 12 18:49:01 2026
    From Newsgroup: comp.arch

    On Thu, 12 Feb 2026 08:40:36 -0800, Stephen Fuld wrote:
    On 2/11/2026 9:55 PM, quadi wrote:

    So you're saying that the tendency of log tables and the like to go up
    to a maximum of ten digits precision wasn't because ten digits were
    needed for, say, celestial mechanics or something like that, so my
    premise that ten significant digits was what scientific computation
    usually needs, as reflected in the design of calculators and math
    tables is completely mistaken.

    See

    https://en.wikipedia.org/wiki/36-bit_computing#History

    After Seymour Cray left Univac to participate in founding Control Data,
    their first product was the CDC 1604 computer, which had a 48-bit word
    length. As it had a 36-bit mantissa (not including the sign of the number),
    it had two bits more than needed for 10-digit precision (10 bits give 1,024 combinations, so 10 bits give 3 digits; thus 34 bits give ten digits if you don't need a bit for the sign; you were indeed right about 35 bits being the minimum).

    So the idea that ten digits is good for integers moved over to ten digits
    is good for floating-point at that stage.
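    The bit counts here follow from log2(10) ≈ 3.32 bits per decimal digit, which a couple of lines confirm:

```python
import math

bits_per_digit = math.log2(10)                 # ~3.32 bits per decimal digit
assert math.ceil(10 * bits_per_digit) == 34    # ten digits fit in 34 bits
# plus a sign bit gives the 35-bit minimum; and the 1604's 36-bit
# mantissa delivers floor(36 * log10(2)) = 10 full decimal digits
assert math.floor(36 * math.log10(2)) == 10
```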

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 12 20:02:08 2026
    From Newsgroup: comp.arch


    quadi <quadibloc@ca.invalid> posted:

    On Thu, 12 Feb 2026 10:53:58 +0200, Michael S wrote:

    Was not quality of arithmetic of CDC machines of the 70s even worse than that of IBM ?

    I don't know about that. But I do know that despite having a power-of-two exponent, quality of arithmetic on the Cray I was pretty terrible.

    Unlike the CDC 6600/7600, the CRAY multiply did not have a full 'tree'
    of multiplication logic--leading to all sorts of "headaches" for
    numerical analysts. Given a 4×4 set of multiplication gates,
    CRAY only used 80%-odd of the required macros to get multiplication
    'right'.

    John Savard

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Feb 12 23:38:52 2026
    From Newsgroup: comp.arch

    On Thu, 12 Feb 2026 17:10:45 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Michael S <already5chosen@yahoo.com> posted:

    On Thu, 12 Feb 2026 02:04:58 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    John Levine <johnl@taugh.com> posted:

    According to David Schultz <david.schultz@earthlink.net>:
    This reminds me of when I took a numerical analysis course.
    (The many ways that computer calculations can go wrong and how
    to deal with it.) The professor said that the schools IBM (360
    or 370, ca. 1980) was perfect for the course because of the
    defects in its floating point system. Guard digits and
    rounding sorts of things as near as I can recall.

    The 360's floating point is a famous and somewhat puzzling
    failure, considering how much else they got right.

    It does hex normalization rather than binary. They assumed that
    leading digits are evenly distributed so there'd be on average
    one zero bit, but in fact they're geometrically distributed, so
    on average there are two. They got one bit back by making the
    exponent units of 16 rather than 2, but that's still one bit
    gone. They also truncated rather than rounding results, another
    bit gone.
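    The geometric distribution of leading digits can be simulated directly.
    A Python sketch, assuming log-uniformly distributed magnitudes (the
    Benford-like model behind the claim), counting the zero bits that hex
    normalization leaves in front of the fraction:

```python
import math
import random

random.seed(0)

def leading_zero_bits(frac):
    """Zero bits before the first 1 bit of a hex-normalized fraction."""
    n = 0
    while frac < 0.5:       # a binary-normalized fraction lies in [1/2, 1)
        frac *= 2
        n += 1
    return n

total = 0
N = 100_000
for _ in range(N):
    x = 2.0 ** random.uniform(-20.0, 20.0)   # log-uniform magnitude
    e = math.ceil(math.log(x, 16))           # smallest e with x <= 16**e
    frac = x / 16.0 ** e                     # hex-normalized: in (1/16, 1]
    total += leading_zero_bits(frac)

print(total / N)   # average zero bits wasted per hex-normalized value
```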

    Originally there were no guard digits, which made the results
    comically bad, but IBM retrofitted them at great cost to all the
    installed machines.

    IEEE floating point can be seen as a reaction to that, how do
    you use the same number of bits but get good results.

    VAX got this correct too (the VAX format not the one inherited
    from PDP-11/45; PDP-11/40* FP was worse). VAX FP is arguably as
    good as IEEE 754 with the exception that more IEEE numbers have reciprocals due to the change in exponent bias by 1. {{One can
    STILL argue whether deNormals were a plus or a minus in IEEE}}

    From the perspective of stability of convergence of few common
    algorithms denormals are significant plus.
    From the perspective of minimizing surprises it is also plus. On
    VAX (a > b) does not necessarily guarantee (a-b > 0).
    I wonder in which situation it can be seen as a minus?

    a-b underflows and takes a trap.


    Then, don't take traps on underflow. It's most certainly not a
    default behavior.
    If somebody decided to enable trap then hopefully he knows what he is
    doing.


    There are several things that I don't like about IEEE-754 Standard,
    but none of them related to format of binary numbers.


    CMU had a PDP-11/40 with writable control store 1974. I
    programmed it to do PDP-11/45 FP instead of PDP-11/40 FP as a Jr. project.





    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Sat Feb 14 16:33:41 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> wrote:

    I remember having read one article in a computer magazine where someone
    mentioned that an unfortunate result of the transition from the IBM 7090
    to the IBM System/360 was that a lot of FORTRAN programs that were able
    to use ordinary real numbers had to be switched over to double precision
    to yield acceptable results.

    Note that the IBM floating point format effectively lost about 3 bits of
    accuracy compared to the modern 32-bit format. I am not sure how much
    they lost compared to the IBM 7090 but it looks like it was at least 5
    bits. Assuming that accuracy requirements are uniformly distributed
    between 20 and say 60 bits, we can estimate that a loss of 5 bits
    affected about 25% (or more) of applications that could run using 36
    bits. That is "a lot" of programs.

    But it does not mean that 36 bits are somehow magical. Simply, given a
    36-bit machine, the original author had extra motivation to make sure
    that the program ran in 36-bit floating point.

    And I noticed that a lot of mathematical tables from the old days went up
    to 10 digit accuracy, and scientific calculators had 10 digit displays, calculating internally to a slightly higher precision.

    There were various accuracies. I have (or maybe had) 100-digit
    logarithm tables. The trouble with high-accuracy tables is that with
    naive use a k-digit table needs 10^k positions, so it quickly becomes
    unmanageably big. The usual way is to have a main table and a table of
    correction coefficients to allow easy interpolation. That means that a
    2k-digit table needs 10^k positions. With k=5 we get a reasonably sized
    table. So 10-digit tables are the largest ones that still have
    reasonable size and are easy to use.
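    The main-table-plus-correction scheme can be sketched in a few lines of
    Python. This is an illustration of the k-digit-table-gives-2k-digits
    argument, not a reconstruction of any particular historical table;
    linear interpolation stands in for the correction coefficients:

```python
import math

# A table of log10 at spacing 10**-3 over [1, 10): ~10**3 positions
# per leading digit, the analogue of a k-digit table with k = 3.
h = 1e-3
table = [math.log10(1.0 + i * h) for i in range(9001)]

def log10_from_table(x):
    """log10 via the coarse table plus a linear-interpolation 'correction'."""
    i = int((x - 1.0) / h)
    x0 = 1.0 + i * h
    t = (x - x0) / h
    return (1.0 - t) * table[i] + t * table[i + 1]

# A k = 3 table recovers roughly 2k = 6-8 correct digits, which is why
# 10-digit tables (k = 5) were the practical limit.
err = abs(log10_from_table(3.14159) - math.log10(3.14159))
print(err)   # well below 1e-7 here
```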

    Concerning calculators, in one case internal accuracy was about 14
    digits, which agrees with using an 8-byte BCD format (with 2-digit
    exponents).

    The early machines that I know about had really varying accuracies,
    ranging from 23 bits through 43 bits.

    AFAICS now the main consumer of FLOPS is graphics. Most graphics
    would be happy with lower accuracy, but 32-bit is what is available.
    There is one important thing that needs higher accuracy, namely
    determining the orientation of a triangle, which requires computing the
    sign of the determinant of a 3 by 3 matrix. Assuming 16 significant bits
    of input data (adequate for most graphic uses) the determinant needs
    about 48 significant bits, so it works with 64-bit doubles and would not
    work with 36-bit or 32-bit floating point numbers.
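    The orientation test is easy to demonstrate. With homogeneous
    coordinates the 3-by-3 determinant reduces to a 2-by-2 one; the Python
    sketch below (point values chosen by me to sit near collinearity)
    rounds every intermediate to IEEE binary32 to show the sign being lost:

```python
import struct

def f32(x):
    """Round a Python float (binary64) to IEEE binary32."""
    return struct.unpack('f', struct.pack('f', x))[0]

def orient(ax, ay, bx, by, cx, cy, r=lambda v: v):
    """Sign of the orientation determinant (bx-ax)(cy-ay) - (by-ay)(cx-ax),
    with `r` applied to every intermediate to mimic a narrower format."""
    d = r(r(r(bx - ax) * r(cy - ay)) - r(r(by - ay) * r(cx - ax)))
    return (d > 0) - (d < 0)

# 16-bit coordinates chosen to be nearly collinear: the true determinant
# is -1, which needs ~32 significant bits of intermediate precision.
pts = (0, 0, 32768, 32767, 65535, 65533)
print(orient(*pts))           # prints: -1 (binary64 gets the sign right)
print(orient(*pts, r=f32))    # prints: 0  (binary32 reports collinear)
```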

    The second big user is audio. Some audio fans want high accuracy;
    32-bit integer is probably good enough for them, 36-bit floating
    point is probably kind of borderline, and 32-bit float is not
    good enough. But it seems that most audio uses lower accuracy,
    in particular 32-bit floats.

    So, the biggest users of floating point probably have no demand
    for 36-bit floats.

    If you look at other uses, then note that GPU makers used to
    offer single-precision-only GPUs as the main option and fast double
    precision as an expensive extra. So they understand the need
    for better precision but are reluctant to offer it as a
    standard feature. 36-bit is clearly much more exotic than that.

    BTW, you seem to care about floating point. My personal interests
    go mainly toward integer calculations. More precisely, I am
    interested in exact results, that is, symbolic computation.
    In symbolic computation the main way to speed up computation is to
    compute modulo a prime number. For this, 61-bit self-complement
    arithmetic would be nice (self-complement means arithmetic modulo
    2^n - 1, and 2^61 - 1 is a prime). 64-bit self-complement would
    be of some use, but not so good, because 2^64 - 1 has several
    small factors. It would be good to have 3-argument arithmetic
    operations of the form

    (x op y) mod p

    where p could be of restricted form, say 2^64 - a, where a has a
    restricted range, for example 10-bit or 16-bit. IIUC for
    addition and subtraction the circuit cost of such an operation could
    be quite low; the main cost would be the data path to provide a, and
    encoding space. For multiplication the cost would be higher, as
    one would need to compute the high bits of the product, shift them
    down, multiply by a and subtract, then perform a final correction
    as in the case of addition. Still, that would probably add about
    20% of cost compared to normal multiplication, so not so high a
    cost if you need the operation. Considering that the alternative
    uses several instructions, some of them expensive ones
    (either division or an extra multiplication), such an instruction
    could be an attractive one.
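    The fold-the-high-bits trick that makes mod (2^n - 1) arithmetic cheap
    can be sketched in Python; the function names are mine, and this is the
    software analogue of the proposed hardware operation:

```python
P = (1 << 61) - 1   # the Mersenne prime 2^61 - 1

def mod_p(x):
    """Reduce a nonnegative integer mod 2^61 - 1.  Because 2^61 = 1 (mod P),
    the high bits can simply be folded down and added -- no division."""
    while x >> 61:
        x = (x >> 61) + (x & P)
    return x - P if x >= P else x

def addm(a, b):
    return mod_p(a + b)

def mulm(a, b):
    return mod_p(a * b)

# Matches ordinary modular arithmetic:
assert mulm(P - 2, P - 3) == ((P - 2) * (P - 3)) % P
print(addm(P - 1, 5), mulm(1 << 60, 4))   # prints: 4 2
```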
    --
    Waldek Hebisch
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sat Feb 14 18:03:07 2026
    From Newsgroup: comp.arch

    MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The
    additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever
    encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals became
    a low cost addition. {And that has been my point--you seem to have
    forgotten the -2008 part of the argument}

    Not at all forgotten, I just didn't notice that caveat in the current discussion. Mea Culpa!

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Sat Feb 14 18:12:37 2026
    From Newsgroup: comp.arch

    According to Waldek Hebisch <antispam@fricas.org>:
    quadi <quadibloc@ca.invalid> wrote:

    I remember having read one article in a computer magazine where someone
    mentioned that an unfortunate result of the transition from the IBM 7090
    to the IBM System/360 was that a lot of FORTRAN programs that were able to >> use ordinary real nubers had to be switched over to double precision to
    yield acceptable results.

    Note that IBM floating point format effectively lost about 3 bits of
    accuracy compared to modern 32-bit format. I am not sure how much they
    lost compared to IBM 7090 but it looks that it was at least 5 bits.
    Assuming that accuracy requirements are uniformly distributed between
    20 and say 60 bits, we can estimate that loss of 5 bits affected about
    25% (or more) of applications that could run using 36-bits. That is
    "a lot" of programs.

    But it does not mean that 36-bits are somewhat magical. Simply, given
    36-bit machine original author had extra motivation to make sure that
    the program run in 36-bit floating point.

    It's worse than that, because the 360's floating point had wobbling precision. Depending on the number of leading zero bits in the fraction it could lose anywhere from 1 to 5 bits of precision compared to a rounded binary format. Hence the badness of the result depended more than usual on the input
    data.

    IBM had excellent numerical analysts who wrote the widely used Scientific Subroutine Package which got decent results with 360 arithmetic:

    https://bitsavers.org/pdf/ibm/ssp/GH20-0205-4-SSP-programmers_Aug70.pdf
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From jgd@jgd@cix.co.uk (John Dallman) to comp.arch on Sun Feb 15 14:37:00 2026
    From Newsgroup: comp.arch

    In article <10mq853$106co$1@paganini.bofh.team>, antispam@fricas.org
    (Waldek Hebisch) wrote:

    Note that IBM floating point format effectively lost about 3 bits of
    accuracy compared to modern 32-bit format. I am not sure how much
    they lost compared to IBM 7090 but it looks that it was at least 5
    bits.

    It's somewhat worse than that. Because the mantissa is in whole hex
    digits, accuracy is lost in 4-bit lumps during a calculation. And because
    normalisation is of whole hex digits, and Benford's Law applies, accuracy
    and its loss are quite data-dependent.

    But it does not mean that 36-bits are somewhat magical.

    Definitely not.

    Quadi, have your computer architectures included IBM 360 floating point support? There is probably more demand for that than for 36-bit these
    days.

    John
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 15 16:53:10 2026
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> schrieb:

    Oh I forgot that using hex exponents meant there was no hidden bit, so
    in practice it lost three bits of precision on every operation. There was
    a great deal of grumbling that people with 709x Fortran codes had to
    make everything double precision to keep getting reasonably good results.

    Hacker's Delight phrases this as

    "When IBM introduced the System/360 computer in 1964, numerical
    analysts were horrified at the loss of precision of single-precision arithmetic."

    and then goes on to show how the distribution of floating point
    values effectively reduces the precision of one quarter of floating
    point values to 21 bits. (There's a name for it, somebody's law, but it
    escapes me at the moment.)
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Tue Feb 17 01:16:33 2026
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> wrote:
    According to Waldek Hebisch <antispam@fricas.org>:
    quadi <quadibloc@ca.invalid> wrote:

    I remember having read one article in a computer magazine where someone
    mentioned that an unfortunate result of the transition from the IBM 7090
    to the IBM System/360 was that a lot of FORTRAN programs that were able
    to use ordinary real numbers had to be switched over to double precision
    to yield acceptable results.

    Note that the IBM floating point format effectively lost about 3 bits of
    accuracy compared to the modern 32-bit format. I am not sure how much
    they lost compared to the IBM 7090 but it looks like it was at least 5
    bits. Assuming that accuracy requirements are uniformly distributed
    between 20 and say 60 bits, we can estimate that a loss of 5 bits
    affected about 25% (or more) of applications that could run using 36
    bits. That is "a lot" of programs.

    But it does not mean that 36 bits are somehow magical. Simply, given a
    36-bit machine, the original author had extra motivation to make sure
    that the program ran in 36-bit floating point.

    It's worse than that, because the 360's floating point had wobbling precision.
    Depending on the number of leading zero bits in the fraction it could lose anywhere from 1 to 5 bits of precision compared to a rounded binary format. Hence the badness of the result depended more than usual on the input
    data.

    Well, IBM format had twice the range of IEEE format, so effectively one
    bit moved from mantissa to exponent. Looking at representable values,
    except at the low end of the range only normalized values matter. In
    hex format 15/16 of bit patterns are normalized, which is better than
    binary without a hidden bit and marginally worse than binary with a
    hidden bit. One hex order of magnitude has 15/16 representable values
    compared to binary without a hidden bit and with IEEE range, and
    15/32 representable values compared to IEEE. This hex order of magnitude
    corresponds to 4 binary orders of magnitude, and each binary order
    of magnitude has the same number of values. So the hex block beginning
    with 1 has 1/16 of the values compared to all bit patterns of the given
    hex order of magnitude, while the corresponding IEEE binary order of
    magnitude has 1/2 of the bit patterns compared to the given hex order of
    magnitude. Which gives 8 times bigger density for IEEE binary, that is
    3 bits of accuracy. IBM truncated, which loses one extra bit, so AFAICS
    the worst case for IBM hex is a loss of 4 bits. At the high end of a
    hex order of magnitude the density is the same, but still there is
    one bit of loss due to truncation. So actually, the loss varies between
    1 and 4 bits. The simple average is a 2.5-bit loss, but 3 bits is more
    realistic, because once you lose a bit, performing following operations
    with better accuracy will not compensate for the loss.
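    The 1-to-4-bit loss estimate can be checked numerically. A Python
    sketch, assuming 24 significant bits on both sides (6 hex fraction
    digits, truncated, versus a rounded binary significand with hidden bit)
    and log-uniform data:

```python
import math
import random

random.seed(1)

def ibm_hex_single(x):
    """Hex float a la IBM 360 single: 6 hex fraction digits, truncated."""
    e = math.ceil(math.log(x, 16))      # smallest e with x <= 16**e
    frac = x / 16.0 ** e                # hex-normalized: in (1/16, 1]
    return math.floor(frac * 16**6) / 16**6 * 16.0 ** e

def binary_rounded(x, bits=24):
    """Binary significand of `bits` bits (hidden bit included), rounded."""
    m, e = math.frexp(x)                # m in [0.5, 1)
    return round(m * 2**bits) / 2**bits * 2.0 ** e

worst_hex = worst_bin = 0.0
for _ in range(100_000):
    x = 2.0 ** random.uniform(0.0, 20.0)
    worst_hex = max(worst_hex, abs(ibm_hex_single(x) - x) / x)
    worst_bin = max(worst_bin, abs(binary_rounded(x) - x) / x)

# Worst-case relative error of truncated hex vs rounded binary, in bits:
print(math.log2(worst_hex / worst_bin))   # close to the 4 bits argued above
```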

    Note that 1 bit is due to using truncation in arithmetic, which is
    independent of format. 1 bit is due to exponent range. Hex makes
    IBM's choice of range natural, but if they really wanted they could
    halve the exponent range and add one bit to the mantissa. So, compared
    to a binary machine using truncation, no hidden bit and the same
    range as IBM hex, one loses 1 bit in the worst case and gains 2 bits
    in the best case. So, IBM's choice was bad, but at that time others
    made bad choices too.
    --
    Waldek Hebisch
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Tue Feb 17 01:24:17 2026
    From Newsgroup: comp.arch

    According to Waldek Hebisch <antispam@fricas.org>:
    Well, IBM format had twice the range of IEEE format, so effectively one
    bit moved from mantissa to exponent. Looking at representable values,
    except at the low end of the range only normalized values matter. In
    hex format 15/16 of values are normalized, ...

    That's the same mistake IBM made when they designed the 360's FP.
    Leading fraction digits are geometrically distributed, not linearly.
    (Look at a slide rule to see what I mean.)

    There are on average two leading zeros so only half of the values are normalized.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Tue Feb 17 16:21:44 2026
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> wrote:
    According to Waldek Hebisch <antispam@fricas.org>:
    Well, IBM format had twice the range of IEEE format, so effectively one
    bit moved from mantissa to exponent. Looking at representable values,
    except at the low end of the range only normalized values matter. In
    hex format 15/16 of values are normalized, ...

    That's the same mistake IBM made when they designed the 360's FP.
    Leading fraction digits are geometrically distributed, not linearly.
    (Look at a slide rule to see what I mean.)

    If you have read and understood what I wrote (and you snipped), you
    would see that I handle the distribution of numbers. Hint: the point of
    talking about hex orders of magnitude and binary orders of magnitude
    is to compare both distributions.

    There are on average two leading zeros so only half of the values are normalized.

    No. By _definition_ a hex floating point number is normalized if and
    only if its leading hex digit is different from zero. It is easy
    to check that different normalized hex bit patterns produce different
    values.
    --
    Waldek Hebisch
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue Feb 17 18:57:18 2026
    From Newsgroup: comp.arch

    On Sun, 15 Feb 2026 14:37:00 +0000, John Dallman wrote:

    Quadi, have your computer architectures included IBM 360 floating point support? There is probably more demand for that than for 36-bit these
    days.

    Yes, in fact they have. The goal there is to facilitate data interchange
    and emulation, not to provide better quality floating-point arithmetic... since, of course, it provides rather the opposite, as has been discussed
    in this thread.

    The original CISC Concertina I architecture went further; it had the goal
    of being able to natively emulate the floating-point of just about every computer ever made.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Tue Feb 17 19:09:53 2026
    From Newsgroup: comp.arch

    On Tue, 17 Feb 2026 01:24:17 +0000, John Levine wrote:
    According to Waldek Hebisch <antispam@fricas.org>:

    Well, IBM format had twice the range of IEEE format, so effectively one
    bit moved from mantissa to exponent. Looking at representable values,
    except at the low end of the range only normalized values matter. In hex
    format 15/16 of values are normalized, ...

    That's the same mistake IBM made when they designed the 360's FP.
    Leading fraction digits are geometrically distributed, not linearly.
    (Look at a slide rule to see what I mean.)

    This is Benford's Law, and there was an interesting discussion of it in
    the December, 1969 issue of _Scientific American_ - in an article, not in Martin Gardner's _Mathematical Games_ column, as I would have expected.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From John Levine@johnl@taugh.com to comp.arch on Tue Feb 17 19:20:33 2026
    From Newsgroup: comp.arch

    According to Waldek Hebisch <antispam@fricas.org>:
    There are on average two leading zeros so only half of the values are
    normalized.

    No. By _definition_ hex floating point number is normalized if and
    only if its leading hex digit is different than zero.

    I wrote sloppily. On average a normalized hex FP number has two leading
    zeros so you lose another bit compared to binary, in addition to what you
    lose by no hidden bit and no rounding.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Tue Feb 17 19:52:46 2026
    From Newsgroup: comp.arch

    John Levine <johnl@taugh.com> wrote:
    According to Waldek Hebisch <antispam@fricas.org>:
    There are on average two leading zeros so only half of the values are
    normalized.

    No. By _definition_ hex floating point number is normalized if and
    only if its leading hex digit is different than zero.

    I wrote sloppily. On average a normalized hex FP number has two leading zeros so you lose another bit compared to binary, in addition to what you lose by no hidden bit and no rounding.

    That is almost what I wrote, except that I sketched a proof that
    hex FP loses that one bit _in the worst case_, and the average is
    better. In the case of IBM hex float the tradeoff between range and
    mantissa bits leads to another bit lost from accuracy, so 4 bits in the
    worst case (but the range is twice as large as IEEE floats). To
    summarize: 1 bit of loss (compared to binary with no hidden bit) due to
    the uneven distribution of hex, 1 bit of loss due to the impossibility
    of using a hidden bit in hex, 1 bit of loss due to the larger range,
    1 bit of loss due to the lack of rounding.
    --
    Waldek Hebisch
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Tue Feb 17 20:43:35 2026
    From Newsgroup: comp.arch

    quadi <quadibloc@ca.invalid> wrote:
    On Sun, 15 Feb 2026 14:37:00 +0000, John Dallman wrote:

    Quadi, have your computer architectures included IBM 360 floating point
    support? There is probably more demand for that than for 36-bit these
    days.

    Yes, in fact they have. The goal there is to facilitate data interchange
    and emulation, not to provide better quality floating-point arithmetic... since, of course, it provides rather the opposite, as has been discussed
    in this thread.

    The original CISC Concertina I architecture went further; it had the goal
    of being able to natively emulate the floating-point of just about every computer ever made.

    That was probably already written, but since you are revising your
    design it may be worth stating some facts. If you have a 64-bit
    machine with convenient access to 32-bit, 16-bit and 8-bit parts,
    you can store any number of bits between 4 and 64 wasting at most
    50% of storage and have simple access to each item. So in terms
    of memory use you are trying to avoid this 50% loss. In practice
    the loss will be much smaller because:

    - power-of-2 quantities are quite popular
    - when a program needs a large number of items of some other size,
    the programmer is likely to use packing/unpacking routines, keeping
    data in a space-efficient packed format most of the time and unpacking
    it for processing
    - a machine with fast bit-extract/bit-insert instructions can perform
    most operations quite fast even on packed data

    so the possible gain in memory consumption is quite low. Given that
    non-standard memory modules and support chips tend to be much more
    expensive than standard ones, economically attempting such savings
    makes no sense.
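    Packing routines of the kind mentioned above are short to write. A
    Python sketch (names and layout are mine) that stores 36-bit items with
    no wasted bits, two items per 72 bits:

```python
def pack36(values):
    """Pack 36-bit unsigned integers into bytes, two values per 9 bytes."""
    acc, nbits, out = 0, 0, bytearray()
    for v in values:
        assert 0 <= v < (1 << 36)
        acc = (acc << 36) | v
        nbits += 36
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
    if nbits:                   # odd count: pad out the final half byte
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack36(data, count):
    """Recover `count` 36-bit values packed by pack36."""
    acc = int.from_bytes(data, 'big')
    total = len(data) * 8
    return [(acc >> (total - 36 * (i + 1))) & ((1 << 36) - 1)
            for i in range(count)]

xs = [0, 1, (1 << 36) - 1, 12345678901]
packed = pack36(xs)
print(len(packed))              # 18 bytes for four values, vs 32 in 64-bit slots
assert unpack36(packed, len(xs)) == xs
```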

    Of course, there is also the question of speed. The argument above shows
    that the loss of speed on access itself can be quite small. So what
    remains is the speed of processing data. As long as you do processing
    on power-of-2 sized items (that is, unusual sizes are limited to
    storage), the loss of speed can be modest; basically a dedicated 36-bit
    machine can probably do 2 times as many 36-bit float operations
    as a standard machine can do 64-bit operations. Practically, this
    loss will be smaller than the loss of storage, but still does not look
    significant enough to warrant development of a special machine.

    Things are somewhat different when you want bit-accurate results
    using old formats. Here already one's-complement arithmetic has
    significant overhead on a two's-complement machine. And emulating
    old floating point formats is more expensive. OTOH, modern
    machines are much faster than old ones. For example a modern CPU
    seems to be more than 1000 times faster than a real CDC-6600, so
    even slow emulation is likely to be faster than the real machine,
    which means that the emulated machine can do the work of the original
    one.

    So to summarize: practical considerations leave rather small space
    for a machine using non-power-of-two formats, and it is rather
    unlikely that any design can fit there.

    Of course, there is a very good reason to explore non-mainstream
    approaches, namely having fun. But once you realize that
    mainstream designs make their choices for good reasons,
    exploring alternatives gets less fun (at least for me).
    --
    Waldek Hebisch
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Feb 18 00:50:52 2026
    From Newsgroup: comp.arch


    antispam@fricas.org (Waldek Hebisch) posted:

    quadi <quadibloc@ca.invalid> wrote:
    On Sun, 15 Feb 2026 14:37:00 +0000, John Dallman wrote:

    Quadi, have your computer architectures included IBM 360 floating point
    support? There is probably more demand for that than for 36-bit these
    days.

    Yes, in fact they have. The goal there is to facilitate data interchange and emulation, not to provide better quality floating-point arithmetic... since, of course, it provides rather the opposite, as has been discussed in this thread.

    The original CISC Concertina I architecture went further; it had the goal of being able to natively emulate the floating-point of just about every computer ever made.

    That was probably already written, but since you are revising your
    design it may be worth stating some facts. If you have 64-bit
    machine with convenient access to 32-bit, 16-bit and 8-bit parts
    you can store any number of bits between 4 and 64 wasting at most
    50% of storage and have simple access to each item. So in terms
    of memory use you are trying to avoid this 50% loss. In practice
    loss will be much smaller because:

    - power of 2 quantities are quite popular
    - when program needs large number of items of some other size
    programmer is likely to use packing/unpacking routines, keeping
    data in a space-efficient packed format most of the time and unpacking
    it for processing
    - machine with fast bit-extract/bit-insert instruction can perform
    most operation quite fast even on packed data

    so possible gain in memory consumption is quite low. Given that
    non-standard memory modules and support chips tend to be much more
    expensive than standard ones, economically attempting such savings
    make no sense.

    Of course, that is also question of speed. The argument above shows
    that loss of speed on access itself can be quite small. So what
    remains is speed of processing data. As long as you do processing
    on power of 2 sized items (that is unusual sizes are limited to
    storage), loss of speed can be modest, basically dedicated 36-bit
    machine probably can do 2 times as much 36-bit float operations
    as standard machine can do 64-bit operations. Practically, this
    loss will be smaller than the loss of storage, but still does not look significant
    enough to warrant developement of special machine.

    Things are somewhat different when you want bit-accurate result
    using old formats. Here already one's-complement arithmetic has
    significant overhead on a two's-complement machine.

    The only useful difference between 1's-complement and 2's-complement in
    ADD is the end-around carry, and the adder will have the same number
    of gates and the same gates of delay. So, in theory, one could make
    a {1's- or 2's-} complement adder at the cost of 1 gate of delay and
    one logic gate.
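    The end-around carry is a one-line fold in software. A Python sketch of
    a ones'-complement adder (bit width and helper names are mine):

```python
def add_ones(a, b, bits=16):
    """Ones'-complement addition of two raw n-bit patterns:
    add, then fold the carry-out back in (the end-around carry)."""
    mask = (1 << bits) - 1
    s = a + b
    return ((s & mask) + (s >> bits)) & mask

def to_ones(v, bits=16):
    """Signed value -> n-bit ones'-complement pattern."""
    return v if v >= 0 else (~(-v)) & ((1 << bits) - 1)

def from_ones(p, bits=16):
    """n-bit ones'-complement pattern -> signed value (-0 reads as 0)."""
    return p if p < (1 << (bits - 1)) else -((~p) & ((1 << bits) - 1))

print(from_ones(add_ones(to_ones(-3), to_ones(5))))   # prints: 2
```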

    And emulating
    old floating point formats is more expensive. OTOH, modern
    machines are much faster than old ones. For example modern CPU
    seem to be more than 1000 times faster than real CDC-6600, so
    even slow emulation is likely to be faster than real machine,
    which means that the emulated machine can do the work of the original
    one.

    Access to 64×64->128 is the key unit of processing.

    So to summarize: practical consideration leave rather small space
    for machine using non-power-of-two formats, and it is rather
    unlikely that any design can fit there.

    Of course, there is a very good reason to explore non-mainstream
    approaches, namely having fun. But once you realize that
    mainstream designs make their choices for good reasons,
    exploring alternatives gets less fun (at least for me).

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Wed Feb 18 08:52:27 2026
    From Newsgroup: comp.arch

    On Tue, 17 Feb 2026 20:43:35 +0000, Waldek Hebisch wrote:

    But once you realize that mainstream
    designs make their choices for good reasons,
    exploring alternatives gets less funny (at least for me).

    At one time, back in the past, the mainstream computers had word lengths
    such as 12 bits, 18 bits, 24 bits, 30 bits, 36 bits, 48 bits, 60 bits...
    all multiples of 6 bits.

    The reason for this was that computers needed a character set with
    letters, numbers, and various special characters - and a six-bit
    character, with 64 possibilities, was adequate for that.

    As technology advanced, and computer power became cheaper, it became
    possible to think of using computers for more applications. Using an
    eight-bit character allowed the use of lower-case characters, removing
    a limitation of the older computers that could become annoying in
    the future. Of course, a 7-bit character would also be enough for that -
    and at least one company, ASI, actually made computers with word lengths
    that were multiples of 7 bits.

    Even before System/360, IBM made a computer built around a 64-bit word,
    the STRETCH. It was intended to be a very powerful scientific computer,
    but it also had the very rare feature of bit addressing - which a
    power-of-two word length made much more practical.

    Hardly any architectures provide bit addressing these days, though.

    Nonetheless, a character set that includes lower case is a good reason.
    Since a 36-bit word works better with addressable 9-bit characters than
    with 6-bit ones, nothing is really lost by going to 36 bits.

    Of course, there's another good reason for sticking with 32-bit or 64-bit designs: because that's what everyone else is using, standard memory
    modules have data buses corresponding to such widths, possibly with extra
    bits for ECC.

    To me, those don't seem to be enough "good reasons" to absolutely preclude
    different word lengths. But there would definitely have to be a real
    benefit to justify the cost and effort of using a different length. It seems
    to me there is a real benefit, in that the available data sizes in the
    32-bit world aren't optimized to the needs of scientific computation.

    But it's quite correct to feel this real benefit isn't enough to make
    machines oriented around the 36-bit word length likely.

    John Savard

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Wed Feb 18 08:40:38 2026
    From Newsgroup: comp.arch

    On 2026-02-18 3:52 a.m., quadi wrote:
    On Tue, 17 Feb 2026 20:43:35 +0000, Waldek Hebisch wrote:

    But once you realize that mainstream
    designs make their choices for good reasons,
    exploring alternatives gets less fun (at least for me).

    At one time, back in the past, the mainstream computers had word lengths
    such as 12 bits, 18 bits, 24 bits, 30 bits, 36 bits, 48 bits, 60 bits...
    all multiples of 6 bits.

    The reason for this was that computers needed a character set with
    letters, numbers, and various special characters - and a six-bit
    character, with 64 possibilities, was adequate for that.


    John Savard

    Maybe we should switch to 18-bit bytes to support UNICODE.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Thu Feb 19 02:10:07 2026
    From Newsgroup: comp.arch

    On 2/12/2026 11:09 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The
    additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever
    encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals became
    a low cost addition. {And that has been my point--you seem to have
    forgotten the -2008 part or the argument}

    And, can note, this is assuming that one actually pays the cost of
    native hardware FMAC.


    Well, and the secondary irony that it is mainly cost-added for FMUL,
    whereas FADD almost invariably has the necessary support hardware already.

    But:
    FMUL is expensive operation + cheap normalizer (if no denormals);
    FADD is cheap operation with expensive normalizer.

    FMAC then is gluing the costs of the two units together, but:
    With roughly the latency of both;
    The need to be significantly wider internally to deal with some cases.

    So, FMAC is a single unit that costs more than both units taken
    separately, and with a higher latency.



    FMAC does suddenly get a bit cheaper if its scope is limited to
    FP8*FP8+FP16, but this operation is a bit niche.


    This one makes a lot of sense for NN's, but I haven't gotten my NN tech
    working well enough to make a strong use-case for it.

    Where, in terms of algorithmic or behavioral complexity relative to computational efficiency, NN's are significantly behind what is possible
    with genetic algorithms or genetic programming.


    So, for computational efficiency of the result:
    Hand-written native code, best efficiency;
    Genetic algorithm, moderate efficiency;
    Neural Net, very inefficient.

    The merit of NNs could then be if one could make them adaptive in some practical way:
    Native code: No adaptation apart from specific algos;
    Genetic algorithms: Only when running the evolver, static otherwise;
    NN's: Could be made adaptable in theory, usually fixed in practice.

    And, adaptation process:
    Native: None, maybe manual fiddling by programmer;
    Genetic algo: Initially very slow, gradually converges on answer;
    NNs, via genetic algorithm: Slow, but converges toward an answer;
    NNs, via backprop: Rapid adaptation initially, then hits a plateau.

    Backprop is seemingly prone to get stuck at a non-optimal solution, and
    then is hard pressed to make any further progress. Seemingly isn't
    really able to "fix" any obvious structural defects once it hits a
    plateau, but can sometimes jump up or down between various nearby
    options (when obvious suboptimal patterns persist).

    Some tricks that work with GA-NN's don't really work with backprop, and
    my initial attempts to glue GA handling onto backprop have not been
    effective. Also it seems to need at least FP16 weights for training to
    work effectively (though, one other option being FP8 with a bias
    counter; but this is effectively analogous to using a non-standard
    S.E4.M11 format).


    Seemingly, my own efforts are getting stuck at the level of very
    inefficiently solving very mundane issues, nowhere near the success
    being seen by more mainstream efforts.

    Nor, as of yet, even anything particularly interesting...



    Had started making some progress in other types of areas though, for
    example:
    Figured out a practical way to get below 16kbps for audio...

    By using 8kHz ADPCM and then using lookup table and reversed LZ search trickery to make the audio more LZ compressible (without changing the
    storage format).

    Or, basically, ADPCM encoding strategy like:
    Lookup a match for the last 4 bytes;
    Look for the longest backwards match (last N bytes);
    Evaluate if the next byte for pattern is within an error limit;
    Select based on combination of error and length
    Longer matches permit more error than shorter ones.
    Check a pattern table,
    seeing if anything is within an acceptable error limit;
    Use pattern if so.
    Else:
    Figure out best-match for next 6 samples,
    using this to encode next 4 samples (1 byte).

    Was able to get around a 20-30% reduction in bitrate, or around 12 kbps typical, before loss of audio quality becomes unacceptable (starts
    breaking down in obvious ways).


    Did version for 4-bit ADPCM, which can get a roughly similar reduction,
    or around 24 kbps, though trying to push it much lower makes 2-bit ADPCM preferable.

    A slightly higher reduction rate is possible if the baseline sample-rate
    is increased to 16kHz, but still doesn't get as low as when using 8 kHz.



    Note that it is possible to just use a pattern table directly to give an equivalent of 8 kbps ADPCM (each byte encoding an index into an 8-sample table, which is then decoded as 2-bit ADPCM), but the audio quality is unacceptably poor (for much of any use-case).
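
    For reference, a single 2-bit ADPCM decode step (this is a generic,
    hypothetical coder sketched for illustration, not BGB's actual
    bitstream or step table) looks roughly like:

    ```c
    #include <stdint.h>

    typedef struct { int pred; int step; } Adpcm2;

    static int16_t clamp16(int v) {
        if (v >  32767) return  32767;
        if (v < -32768) return -32768;
        return (int16_t)v;
    }

    /* decode one 2-bit code: bit 0 = magnitude, bit 1 = sign;
       the step size adapts up after large codes, down after small ones */
    int16_t adpcm2_decode(Adpcm2 *st, unsigned code) {
        int delta = (code & 1) ? st->step * 2 : st->step;
        if (code & 2) delta = -delta;
        st->pred = clamp16(st->pred + delta);
        st->step = (code & 1) ? st->step * 2 : st->step / 2;
        if (st->step < 1)    st->step = 1;
        if (st->step > 8192) st->step = 8192;
        return (int16_t)st->pred;
    }
    ```

    A pattern-table byte would then just index a table of eight such
    2-bit codes, decoded in sequence.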


    Though, all this was mostly dusting off an experiment from last year,
    and putting it to use in my packaging tool (inside BGBCC).

    Mostly it is a case of:
    It is "good enough" to at least allow for optional super-compression of
    ADPCM without breaking the existing decoders.


    ...





    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 19 17:30:50 2026
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 2/12/2026 11:09 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The
    additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever
    encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals became
    a low cost addition. {And that has been my point--you seem to have forgotten the -2008 part or the argument}

    And, can note, this is assuming that one actually pays the cost of
    native hardware FMAC.

    It is exceedingly difficult to get an IEEE quality rounded result if
    not done in HW.

    Well, and the secondary irony that it is mainly cost-added for FMUL,
    whereas FADD almost invariably has the necessary support hardware already.

    But:
    FMUL is expensive operation + cheap normalizer (if no denormals);
    FADD is cheap operation with expensive normalizer.

    FMAC then is gluing the costs of the two units together, but:
    With roughly the latency of both;
    The need to be significantly wider internally to deal with some cases.

    The add stage after the multiplication tree is <essentially> 2× as wide.
    FMUL needs a 108-bit 2-input adder;
    FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
    The multiplication tree is the same, the normalizer is larger.


    So, FMAC is a single unit that costs more than both units taken
    separately, and with a higher latency.

    Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
    Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Thu Feb 19 20:15:29 2026
    From Newsgroup: comp.arch

    On Thu, 19 Feb 2026 17:30:50 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:
    BGB <cr88192@gmail.com> posted:

    On 2/12/2026 11:09 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in
    hardware. The additional hardware cost (or the cost of trapping
    and software emulation) has been the only argument against
    denormals that I ever encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals
    became a low cost addition. {And that has been my point--you seem
    to have forgotten the -2008 part or the argument}

    And, can note, this is assuming that one actually pays the cost of
    native hardware FMAC.

    It is exceedingly difficult to get an IEEE quality rounded result if
    not done in HW.

    Well, and the secondary irony that it is mainly cost-added for
    FMUL, whereas FADD almost invariably has the necessary support
    hardware already.

    But:
    FMUL is expensive operation + cheap normalizer (if no denormals);
    FADD is cheap operation with expensive normalizer.

    FMAC then is gluing the costs of the two units together, but:
    With roughly the latency of both;
    The need to be significantly wider internally to deal with some
    cases.

    The add stage after the multiplication tree is <essentially> 2× as wide.
    FMUL needs a 108-bit 2-input adder;
    FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
    The multiplication tree is the same, the normalizer is larger.


    So, FMAC is a single unit that costs more than both units taken separately, and with a higher latency.

    Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
    Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).

    Arm Inc. application processor cores have FMAC latency=4 for the
    multiplicands, but 2 for the accumulator.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Thu Feb 19 18:49:22 2026
    From Newsgroup: comp.arch

    John Dallman <jgd@cix.co.uk> schrieb:

    Quadi, have your computer architectures included IBM 360 floating point support? There is probably more demand for that than for 36-bit these
    days.

    It has been quite a few decades since the last large-scale
    scientific calculations in IBM hex float; I believe it must have
    been the Japanese vector computers (one of which I worked on in
    the mid to late 1990s). It is probably safe to say that any
    hex float these days is embedded firmly in the z ecosystem.

    Since every laptop these days has more performance than the old
    vector computers, I very much doubt that there is significant data
    saved in that format. Same thing for VAX floating point formats.

    Big- vs. little-endian data is a more recent issue. Around 20 years
    ago, I wrote code to convert between big- and little-endian data
    for gfortran. This is also quite irrelevant today.

    The last conversion issue I had a hand in was for IBM's "double
    double" 128-bit real. Now POWER supports this as IEEE in hardware
    (if not very fast), but this ABI change is very painful.

    There could, however, be a niche for 36-bit reals - graphics cards.
    I have recently discovered a GPU solver in a commercial package that
    I use, and it has an option for using 32-bit reals. 36-bit reals
    could extend the usefulness of such a solver.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 19 19:55:40 2026
    From Newsgroup: comp.arch


    Michael S <already5chosen@yahoo.com> posted:

    On Thu, 19 Feb 2026 17:30:50 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/12/2026 11:09 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in
    hardware. The additional hardware cost (or the cost of trapping
    and software emulation) has been the only argument against
    denormals that I ever encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals
    became a low cost addition. {And that has been my point--you seem
    to have forgotten the -2008 part or the argument}

    And, can note, this is assuming that one actually pays the cost of native hardware FMAC.

    It is exceedingly difficult to get an IEEE quality rounded result if
    not done in HW.

    Well, and the secondary irony that it is mainly cost-added for
    FMUL, whereas FADD almost invariably has the necessary support
    hardware already.

    But:
    FMUL is expensive operation + cheap normalizer (if no denormals);
    FADD is cheap operation with expensive normalizer.

    FMAC then is gluing the costs of the two units together, but:
    With roughly the latency of both;
    The need to be significantly wider internally to deal with some
    cases.

    The add stage after the multiplication tree is <essentially> 2× as wide.
    FMUL needs a 108-bit 2-input adder;
    FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
    The multiplication tree is the same, the normalizer is larger.


    So, FMAC is a single unit that costs more than both units taken separately, and with a higher latency.

    Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
    Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).


    Arm Inc. application processor cores have FMAC latency=4 for the
    multiplicands, but 2 for the accumulator.

    Thank you for that tidbit of information.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From quadi@quadibloc@ca.invalid to comp.arch on Fri Feb 20 08:14:46 2026
    From Newsgroup: comp.arch

    On Wed, 18 Feb 2026 08:40:38 -0500, Robert Finch wrote:

    Maybe we should switch to 18-bit bytes to support UNICODE.

    It's true that Unicode has expanded beyond the old 16-bit Basic
    Multilingual Plane. But while all currently-defined characters would fit
    in 18 bits, code points as large as 31 bits were once envisaged; that is
    what the original UTF-8 design supports.
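
    The original 31-bit UTF-8 scheme (RFC 2279, before Unicode was capped
    at U+10FFFF) maps code-point magnitude to byte count like this (a
    sketch):

    ```c
    #include <stdint.h>

    /* byte length of a code point under the original 31-bit UTF-8
       scheme of RFC 2279 (modern UTF-8 stops at 4 bytes / U+10FFFF) */
    int utf8_len(uint32_t cp) {
        if (cp < 0x80)       return 1;   /*  7 bits */
        if (cp < 0x800)      return 2;   /* 11 bits */
        if (cp < 0x10000)    return 3;   /* 16 bits: the old BMP */
        if (cp < 0x200000)   return 4;   /* 21 bits: all current Unicode */
        if (cp < 0x4000000)  return 5;   /* 26 bits */
        return 6;                        /* up to 31 bits */
    }
    ```

    So every currently assigned character fits in 4 bytes, while the
    original design left room for 5- and 6-byte sequences.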

    If 9-bit bytes are used for simple applications, it certainly will be true that 18-bit halfwords will be an available data type.

    John Savard
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Feb 20 05:08:28 2026
    From Newsgroup: comp.arch

    On 2/19/2026 11:30 AM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/12/2026 11:09 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The
    additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever
    encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals became
    a low cost addition. {And that has been my point--you seem to have
    forgotten the -2008 part or the argument}

    And, can note, this is assuming that one actually pays the cost of
    native hardware FMAC.

    It is exceedingly difficult to get an IEEE quality rounded result if
    not done in HW.

    Likely depends.


    Can use the trick of bumping to the next size up and use that for
    computation.

    So, for Binary32 compute it as Binary64, and for Binary64 compute it as Binary128.
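
    For a single multiply the bump-one-size-up trick is sound: a binary32
    product has at most 48 significand bits, so the binary64 product is
    exact and only the final narrowing rounds. A minimal sketch (single
    operations only; fused or chained operations are another matter):

    ```c
    /* binary32 multiply emulated in binary64: the double product is
       exact (24 + 24 = 48 <= 53 significand bits), so the single cast
       back to float performs the one and only rounding */
    float fmul32_via64(float a, float b) {
        return (float)((double)a * (double)b);
    }
    ```

    E.g. (2^24-1) * (2^24-1) = 2^48 - 2^25 + 1 is held exactly in the
    double, then rounds once to 2^48 - 2^25, matching native float
    multiplication.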

    Can special case the "Binary64 * Binary64 => Binary128" case to save
    cost over using a native Binary128 multiply.

    For Binary128 multiply, can also make sense to detect and special-case
    the "low order bits are zero" case:
    If low-order bits are zero, can use a multiply that only produces the
    high 128 bits;
    Vs a transient 128*128=>256 bit, and then needing to round.



    Relative cost is lower if one is already paying the cost of a trap
    handler or similar (except that if the ISA supports it, you really don't
    want the compiler to combine these operations).

    So, one can maybe document that, if using a compiler like GCC, one should use "-ffp-contract=off -fno-fdiv", ...



    Well, and the secondary irony that it is mainly cost-added for FMUL,
    whereas FADD almost invariably has the necessary support hardware already.
    But:
    FMUL is expensive operation + cheap normalizer (if no denormals);
    FADD is cheap operation with expensive normalizer.

    FMAC then is gluing the costs of the two units together, but:
    With roughly the latency of both;
    The need to be significantly wider internally to deal with some cases.

    The add stage after the multiplication tree is <essentially> 2× as wide.
    FMUL needs a 108-bit 2-input adder;
    FMAC needs a 160-bit 3-input adder and a 52-bit incrementor.
    The multiplication tree is the same, the normalizer is larger.


    A 160-bit 3-way adder happening "quickly" is still kinda asking a lot though...


    Though, granted, the first step is deciding to do a full-width
    multiply, and not discard the low-order results.

    Granted discarding the low results reduces rounding accuracy, but a way
    to fake full IEEE rounding was to detect this case and have the FMUL
    raise a fault (similar to denormal/underflow handling). Though, does
    mean there is a performance penalty if multiplying numbers where the
    low-order bits in both values are non-zero.


    In my ISA, the exact behavior depends on instruction and rounding mode.
    In RISC-V mode, it is partly based on the instruction's rounding mode
    and flag settings.

    For reasons, full IEEE emulation can't safely be enabled until after setting up virtual memory and similar.


    The handling of the RISC-V F/D extensions was non-standard in my case,
    though not in a way that affects GCC output (it seems to exclusively use
    the DYN rounding mode in instructions, assuming the rounding mode to be
    handled via CSRs). Also, ironically, and contrasting with the seeming
    design of these extensions, these registers are so rarely accessed in
    practice that it seemed most sensible to use trap-and-emulate for the CSRs.


    Granted, there are limits to corner cutting:
    If a design does not produce exact results in cases where it is trivial
    to verify that an exact answer exists (i.e., no rounding is required),
    IMO this is below the minimum limit for a usable general-purpose FPU.



    So, FMAC is a single unit that costs more than both units taken
    separately, and with a higher latency.

    Prior RISC processors did FMUL in 3-4 cycles (mostly 4).
    Later RISC processors and x86 did FMAC in 4-cycles (occasionally 5).


    Trying to push the latency down would be pretty bad for timing, unless
    there is some cheaper way to implement FPUs that I am not aware of.

    In my case:
    FMADD.D, RM=DYN: Trap
    FMADD.D, RM=RNE, 10-cycle, double-rounded (non-standard)
    FMADD.S, RM=DYN, 10-cycle (mimics single rounding, *)
    *: Happens internally at Binary64 precision.

    It could be possible to handle FMADD.D RM=DYN the same way as RNE
    internally, but then trap if the inputs would potentially give a
    non-IEEE result. Though, for now, trapping is the cheaper solution in
    terms of HW cost.




    The one exception is FP8*FP8 + FP16, but mostly because it is possible
    to do FP8*FP8 under 1 cycle.

    But, still not free here; and overly niche. Ended up going with a
    cheaper option of simply having an SIMD FP8*FP8=>FP16 multiply op (which
    still ends up as a 2-cycle op, because...).


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Feb 20 15:22:05 2026
    From Newsgroup: comp.arch

    BGB wrote:
    On 2/19/2026 11:30 AM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/12/2026 11:09 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The
    additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever
    encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals became
    a low cost addition. {And that has been my point--you seem to have
    forgotten the -2008 part or the argument}

    And, can note, this is assuming that one actually pays the cost of
    native hardware FMAC.

    It is exceedingly difficult to get an IEEE quality rounded result if
    not done in HW.

    Likely depends.


    Can use the trick of bumping to the next size up and use that for computation.

    So, for Binary32 compute it as Binary64, and for Binary64 compute it as Binary128.
    Neither of those work!
    I believed this to be true but I was shown the error of my thinking by
    more knowledgeable people in the 754 working group. I.e. they had a very
    simple/small example where doing the calculation in the next higher
    precision would still cause double rounding errors.
    Also note that Mitch has stated multiple times that you need ~160
    mantissa bits during FMAC double calculations.
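
    A concrete failing case is easy to construct (this is my own worked
    example, not the working group's): pick a and b so that a*b equals
    2^-24 + 2^-54 exactly; adding 1.0 in binary64 rounds the 2^-54 away,
    and the second rounding then lands on the wrong side of the tie.
    Assuming a correctly rounded libm fmaf:

    ```c
    #include <math.h>

    /* emulate binary32 FMA in binary64; volatile keeps the compiler from
       contracting the multiply-add into a real hardware fma */
    float fmaf_via_double(float a, float b, float c) {
        volatile double p = (double)a * (double)b;  /* exact: <= 48-bit product */
        volatile double s = p + (double)c;          /* first rounding */
        return (float)s;                            /* second rounding */
    }

    /* counterexample: 13325 * 80581 = 2^30 + 1, so with
         a = 13325 * 2^-25, b = 80581 * 2^-29, c = 1.0
       a*b = 2^-24 + 2^-54 exactly. The exact sum 1 + 2^-24 + 2^-54
       rounds (once) up to 1 + 2^-23, but via binary64 it first rounds
       down to 1 + 2^-24, and the tie then rounds to even: 1.0f. */
    ```

    So fmaf(a, b, 1.0f) gives 1 + 2^-23, while fmaf_via_double(a, b, 1.0f)
    gives exactly 1.0f.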
    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Feb 20 15:26:24 2026
    From Newsgroup: comp.arch

    On 2/20/2026 8:22 AM, Terje Mathisen wrote:
    BGB wrote:
    On 2/19/2026 11:30 AM, MitchAlsup wrote:

    BGB <cr88192@gmail.com> posted:

    On 2/12/2026 11:09 AM, MitchAlsup wrote:

    anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    {{One can STILL argue whether
    deNormals were a plus or a minus in IEEE}}

    I am surprised to read that from you, who has always written that
    denormals can be implemented cheaply and efficiently in hardware. The
    additional hardware cost (or the cost of trapping and software
    emulation) has been the only argument against denormals that I ever
    encountered.

    It is only after IEEE 754-2008 came with FMAC that deNormals became
    a low cost addition. {And that has been my point--you seem to have
    forgotten the -2008 part or the argument}

    And, can note, this is assuming that one actually pays the cost of
    native hardware FMAC.

    It is exceedingly difficult to get an IEEE quality rounded result if
    not done in HW.

    Likely depends.


    Can use the trick of bumping to the next size up and use that for
    computation.

    So, for Binary32 compute it as Binary64, and for Binary64 compute it
    as Binary128.

    Neither of those work!

    I believed this to be true but I was shown the error of my thinking by
    more knowledgeable people in the 754 working group. I.e. they had a very
    simple/small example where doing the calculation in the next higher
    precision would still cause double rounding errors.

    Also note that Mitch has stated multiple times that you need ~160
    mantissa bits during FMAC double calculations.


    Could look into this, the next option being to use a makeshift 192-bit FP
    format with a 176-bit mantissa (likely cheaper than going all the way to
    224 bits).

    This is slow/annoying, but not really likely a "hard" problem (when one
    is already doing this stuff in software in a trap handler).


    So, potentially:
    Binary32 -> FP96 (truncated Binary128, still stored as Binary128)
    Binary64 -> FP192 (extended Binary128)
    Binary128 -> FP384 (likewise)
    Big/ugly, but no one says this needs to be fast...


    Might end up on a sort of "TODO list"...


    In any case, actual native hardware support for single-rounded FMA is
    unlikely to happen in my case.


    ...

    Terje


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Thu Feb 12 14:28:27 2026
    From Newsgroup: comp.arch

    MitchAlsup wrote:

    John Levine <johnl@taugh.com> posted:

    According to David Schultz <david.schultz@earthlink.net>:
    This reminds me of when I took a numerical analysis course. (The many
    ways that computer calculations can go wrong and how to deal with it.)
    The professor said that the school's IBM (360 or 370, ca. 1980) was
    perfect for the course because of the defects in its floating point
    system. Guard digits and rounding sorts of things, as near as I can recall.
    The 360's floating point is a famous and somewhat puzzling failure, considering
    how much else they got right.

    It does hex normalization rather than binary. They assumed that
    leading digits are evenly distributed so there'd be on average one
    zero bit, but in fact they're geometrically distributed, so on average
    there are two. They got one bit back by making the exponent units of 16
    rather than 2, but that's still one bit gone. They also truncated
    rather than rounded results, costing another bit.
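
    The wasted bits are easy to count: a normalized base-16 significand
    only guarantees its top hex digit is nonzero, so that digit can carry
    up to three leading zero bits. A quick sketch:

    ```c
    /* leading zero bits inside the top hex digit (1..15) of a
       normalized base-16 significand: 0 to 3 bits of precision lost */
    int wasted_bits(unsigned top_hex_digit) {
        int n = 0;
        while (!(top_hex_digit & 0x8)) {  /* shift until the top bit is set */
            top_hex_digit <<= 1;
            n++;
        }
        return n;
    }
    ```

    A top digit of 1 wastes three bits, 2-3 waste two, 4-7 waste one, and
    only 8-15 waste none; with geometrically distributed leading digits
    the average loss is the roughly two bits described above.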

    Originally there were no guard digits, which made the results comically
    bad, but IBM retrofitted them at great cost to all the installed machines.

    IEEE floating point can be seen as a reaction to that, how do you use
    the same number of bits but get good results.

    VAX got this correct too (the VAX format, not the one inherited from the
    PDP-11/45; PDP-11/40* FP was worse). VAX FP is arguably as good as
    IEEE 754, with the exception that more IEEE numbers have representable
    reciprocals due to the change in exponent bias by 1. {{One can STILL
    argue whether deNormals were a plus or a minus in IEEE}}

    You _can_ argue about that, but as you've told us on numerous occasions,
    it doesn't actually cost any clock cycles, and there are a few
    zero-seeking algorithms which would not be trivially stable without
    subnormals.

    Finally, having subnormals means that any possible bit pattern between
    negative NaN and positive NaN has a meaningful real value (or +/- Inf),
    so you can compare them, increment them, etc. without worry.

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21d-Linux NewsLink 1.2