• Re: Software proofs, was Are there different programming languages that are compiled to the same intermediate language?

    From Spiros Bousbouras@spibou@gmail.com to comp.compilers on Sun Feb 5 15:14:02 2023
    From Newsgroup: comp.compilers

    On 01 Feb 2023 08:07:24 GMT
    arnold@freefriends.org (Aharon Robbins) wrote:
    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    We prove it by hand? I don't know if anyone has done this for the
    provers mentioned, but it feels like a feasible task, and once it's done,
    it's done, i.e. a theorem prover is not the kind of software which
    continually gets updated.

    [It's a perfectly reasonable question. Alan Perlis, who was my thesis advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    It depends on what we mean by "bug"; in mathematics we wouldn't say "bug"
    but "error".

    First you have *typos*, which are situations where one meant to write, for
    example, i but wrote j instead. It may be that mathematical proofs have more
    of those than computer programmes, because in computer programmes some will
    be caught by the compiler, for example if j has not been declared. But the
    essential difference is that a typo is harmless in mathematics: the reader
    of the proof will either notice it and understand what was intended, or even
    unconsciously replace what is written with what was intended. In computer
    programmes this kind of typo can be catastrophic. Hence in mathematics we
    wouldn't say that a proof has errors just because it has a typo; we would
    say it has a typo.

    For other errors, logic errors, I would guess that computer programmes have
    more errors/bugs than mathematical proofs. This assertion is based on
    personal experience and instinct to some extent, having written both many
    programmes and many proofs. But I will attempt to give some of what I think
    are the reasons:

    1:
    In mathematics you work with a formalism right from the start, but in
    computer programming one often has only an intuitive understanding of what
    the program is meant to achieve, and a working programme may be considered
    to have a bug when its behaviour turns out not to match that intuitive
    understanding closely enough. The difference in the level of formalism is
    not just about the programmes but about the programming languages too. In
    the programming language standards I've read, the language is not described
    in precise mathematical detail. The grammar usually is, but not the other
    parts of the language.

    For example, from the C standard:
    The result of the binary + operator is the sum of the operands.

    Even just for integer arguments, this is nowhere near a complete definition;
    to piece together a complete one, you must combine information scattered
    across faraway parts of the standard. This would never occur in a
    mathematics book or paper, where the complete definition of a function or
    operator or whatever (including the precise set on which it is defined)
    would appear in one place. This sort of thing, where one has to piece
    together information from many parts of the document to arrive at a complete
    definition, seems to be common in computer language standards, and I have
    no idea why it happens; I don't think anything about specifying a
    programming language forces things to be done like this. But the result is
    that programmers will often work with a mix of precise specification and
    intuitive understanding of the language they're using.
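As a concrete illustration of the scattering, even a small sketch of what `+` "really does" draws on three separate parts of the standard: the integer promotions, the usual arithmetic conversions, and the undefined behaviour of signed overflow. (The section numbers below are from C11; the function names are mine, chosen for illustration.)

```c
#include <limits.h>

/* C11 6.3.1.1: integer promotions -- unsigned char operands are
   promoted to int before the addition, so 200 + 100 is 300, with
   no wraparound at 255. */
int add_uchars(unsigned char a, unsigned char b) {
    return a + b;
}

/* C11 6.3.1.8: usual arithmetic conversions -- the int operand is
   converted to unsigned int, so -1 becomes UINT_MAX and the sum
   wraps around modulo 2^N. */
unsigned int add_mixed(int s, unsigned int u) {
    return s + u;
}

/* C11 6.5p5: signed overflow is undefined behaviour, so a complete
   definition of + must also say when the result is *not* defined. */
int add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;   /* the sum would overflow: refuse to compute it */
    *out = a + b;
    return 1;
}
```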

    2:
    In mathematics a correct proof *is* the goal, but in computer programming some set of behaviours is the goal, and that set of behaviours may not even be the ultimate goal; the desire to make a profit is. For example

    en.wikipedia.org/wiki/Pac-Man:
    Due to an integer overflow, the 256th level loads improperly, rendering
    it impossible to complete.

    But this didn't prevent the game from being highly successful. "Donkey Kong" also had a bug which prevented progress from some point onwards.

    What the goal is also influences the kind of people who specialise in
    computer programming versus those who write mathematical proofs. My
    impression is that there exist computer programmers of the "trial and
    error" type, who write some code which seems to approximate what they have
    in mind, test it, write some more code, and so forth. Eventually they
    arrive at something which seems to match their mental model and consider
    themselves done. They would have no interest in, and possibly no ability
    for, writing a proof of correctness. As a practical matter, they may be
    perfectly productive and write code which works for the desired cases.

    3:
    There is a kind of bug/error for which there's no mathematical analogue:

    From "Expert C Programming" by Peter van der Linden :

    One release of an ANSI C compiler had an interesting bug. The symbol
    table was accessed by a hash function that computed a likely place from
    which to start a serial search. The computation was copiously commented,
    even describing the book the algorithm came from. Unfortunately, the
    programmer omitted to close the comment! The entire hash initial value
    calculation thus fell inside the continuing comment,


    [...]

    The entire calculation of the initial hash value was omitted, so the
    table was always searched serially from the zeroth element! As a result,
    symbol table lookup (a very frequent operation in a compiler) was much
    slower than it should have been. This was never found during testing
    because it only affected the speed of a lookup, not the result. This is
    why some compilers complain if they notice an opening comment in a
    comment string. The error was eventually found in the course of looking
    for a different bug. Inserting the closing comment resulted in an
    immediate compilation speedup of 15%!
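A minimal reconstruction of that failure mode (the identifiers and table size are made up; van der Linden doesn't give the actual code). The unclosed comment silently swallows the hash computation, yet the function still compiles cleanly:

```c
#define TABLE_SIZE 1024

/* Where the search *should* start for a given key. */
int probe_correct(int key) {
    int h = 0;
    /* Hash the key to a likely start position (algorithm from
       Knuth, TAOCP vol. 3). */
    h = key % TABLE_SIZE;
    return h;
}

/* The buggy version: the comment above the calculation is never
   closed, so the assignment below it is commented out too. */
int probe_buggy(int key) {
    int h = 0;
    /* Hash the key to a likely start position (algorithm from
       Knuth, TAOCP vol. 3).
    h = key % TABLE_SIZE;
    */
    (void)key;
    return h;   /* always 0: every lookup scans from element zero */
}
```

Modern compilers will warn about the `/*` nested inside the comment, which is exactly the diagnostic the excerpt mentions.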

    [The history of formal description of programming languages is not
    encouraging. Back in the 1970s the ANSI PL/I standard was defined in
    terms of formal operations on trees, but even so it had bugs/errors.
    Also, per your comment on addition, in a lot of cases the practical
    definition turns into whatever the underlying machine's instruction
    does. Until recently the C standard was deliberately unclear about
    whether arithmetic was ones- or twos-complement. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.compilers on Sun Feb 5 16:14:57 2023
    From Newsgroup: comp.compilers

    [...]
    [The history of formal description of programming languages is not encouraging. Back in the 1970s the ANSI PL/I standard was defined in
    terms of formal operations on trees, but even so it had bugs/errors.
    Also, per your comment on addition, in a lot of cases the practical definition turns into whatever the underlying machine's instruction
    does. Until recently the C standard was deliberately unclear about
    whether arithmetic was ones- or twos-complement. -John]

    The C standard still doesn't mandate two's-complement.

    The 1990 edition of the C standard was vague about integer
    representations, though it did impose some requirements. It required
    a "pure binary" representation, and it required, for example,
    a value that is within the range of both int and unsigned int to
    have the same representation in both. That excluded some of the
    more exotic possibilities. (It mentioned 2's complement a few times,
    but only as an example.)

    The 1999 standard explicitly required one of three representations:
    sign and magnitude, two's complement, or one's complement. It also
    explicitly permitted padding bits (bits that don't contribute
    to the value, and that can either be ignored or create trap
    representations).

    The 2011 standard corrected the spelling of "ones' complement",
    but otherwise didn't change anything significant. (The 2017 edition
    was a minor update.)

    The upcoming 2023 standard mandates two's complement. And it
    requires INT_MIN to be exactly -INT_MAX-1; previously INT_MIN could
    be equal to -INT_MAX, and -INT_MAX-1 could be a trap representation.
    It still permits padding bits and trap representations.

    Note that two's complement *representation* doesn't imply two's
    complement *behavior* on overflow. Signed integer overflow still
    has undefined behavior.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for XCOM Labs
    void Void(void) { Void(); } /* The recursive call of the void */
  • From gah4@gah4@u.washington.edu to comp.compilers on Mon Feb 6 13:26:00 2023
    From Newsgroup: comp.compilers

    On Sunday, February 5, 2023 at 6:01:04 PM UTC-8, Keith Thompson wrote:

    (snip)

    The upcoming 2023 standard mandates two's complement. And it
    requires INT_MIN to be exactly -INT_MAX-1; previously INT_MIN could
    be equal to -INT_MAX, and -INT_MAX-1 could be a trap representation.
    It still permits padding bits and trap representations.

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.


    Note that two's complement *representation* doesn't imply two's
    complement *behavior* on overflow. Signed integer overflow still
    has undefined behavior.

    Overflow behavior mostly depends on the hardware.

    Many computers, such as the IBM S/360 and successors, have a bit that
    enables or disables the fixed point overflow exception. (For IBM, it
    also applies to decimal overflow.)

    For IA32, that is x86, there is the INTO instruction, which is put
    after any instruction that could generate overflow, and causes the
    interrupt if the overflow bit is set. Compilers would have to generate
    that instruction.

    It seems, though, that INTO has gone away in x86-64 mode.

    I don't know that there is any replacement, other than a conditional
    branch based on the overflow flag.

    Fortran programmers rely on fixed point overflow, as they have been
    doing for 60 years, and don't have unsigned.

    (There are two routines in TeX that Knuth suggests replacing with
    assembly code. Those do multiply and divide, with 32 bit fixed point
    values, 16 bits after the binary point. Pascal has no option for
    avoiding the overflow trap, and it takes a lot of Pascal code to do
    the multiply and divide!)
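The pain Knuth describes disappears on a machine with a double-width multiply. A sketch of TeX-style `scaled` arithmetic (16 bits after the binary point; the type and function names are mine, not TeX's) using a 64-bit intermediate:

```c
#include <stdint.h>

typedef int32_t scaled;            /* 16 integer bits, 16 fraction bits */
#define UNITY ((scaled)1 << 16)    /* the value 1.0 in scaled form */

/* Multiply two scaled values via a 64-bit intermediate -- the step
   that takes many lines of overflow-dodging code in pure Pascal. */
scaled scaled_mul(scaled a, scaled b) {
    return (scaled)(((int64_t)a * b) >> 16);
}

/* Divide, again via a 64-bit intermediate so no bits are lost
   before the quotient is formed. */
scaled scaled_div(scaled a, scaled b) {
    return (scaled)(((int64_t)a << 16) / b);
}
```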
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Tue Feb 7 14:31:26 2023
    From Newsgroup: comp.compilers

    On 2/6/23 10:26 PM, gah4 wrote:

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.

    AFAIK IEEE-754 floating point numbers still use sign-magnitude
    representation.
    Then the same representation for integral numbers might have advantages in
    computations.

    DoDi
    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Wed Feb 8 01:10:49 2023
    From Newsgroup: comp.compilers

    On Tuesday, February 7, 2023 at 6:30:40 PM UTC-8, Hans-Peter Diettrich wrote:
    On 2/6/23 10:26 PM, gah4 wrote:

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.

    AFAIK IEEE-754 floating point numbers still use sign-magnitude representation.
    Then the same representation for integral numbers might have advantages in computations.

    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]

    Yes, I meant integer representation.

    Well, I have been wondering for years when we get a C compiler
    for the 7090 so we can test out sign-magnitude integers.

    I think the 7090 does 16 bit integers, at least that is what
    its Fortran compilers did, stored in 36 bit words.

    As for floating point, the PDP-10 uses a two's complement
    floating point format. It does two's complement on the
    whole 36 bit word. The result is that fixed point compare
    instructions work on floating point values.
    [The 704x/709x series did 36 bit sign-magnitude arithmetic. Fortran
    integers were limited to 15 bits plus a sign, probably because that
    was the size of addresses, and they expected integer arithmetic to
    be used only for counting and subscripts. In 709 Fortran II they
    expanded them to 17 bits, in 7090 Fortran IV they were finally a
    full word. -John]
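The PDP-10 trick survives in IEEE arithmetic with a twist: because IEEE floats are sign-magnitude with the exponent field above the fraction, non-negative finite floats compare correctly as plain integers, while negative ones compare in reverse. A sketch (the helper names are mine; assumes the usual binary32 layout):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit unsigned integer. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* For non-negative finite floats, ordering the bit patterns as
   integers matches ordering the values: the exponent sits above the
   fraction, so a bigger value always has a bigger bit pattern. */
int less_by_bits(float a, float b) {
    return float_bits(a) < float_bits(b);
}
```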

  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Wed Feb 8 10:19:54 2023
    From Newsgroup: comp.compilers

    Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:
    AFAIK IEEE-754 floating point numbers still use sign-magnitude
    representation.
    Then the same representation for integral numbers might have advantages in
    computations.

    Such as? Anyway, whatever these advantages may be, they have not been
    enough to prevent the extermination of sign-magnitude integers.

    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]

    With a ones-complement or two's-complement mantissa the hidden bit
    would just have the same sign as the sign bit, so this trick is not
    tied to sign-magnitude representation.

    Some years ago someone sketched a two's-complement representation for
    FP (that also includes the exponent), but I did not quite get the
    details. Anyway, I expect that whatever the advantages of that
    representation are, they are not enough to justify the transition
    cost.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [PDP-6/10 floating point was two's complement. As someone else recently noted, that
    meant they could use fixed point compare instructions. -John]
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Wed Feb 8 15:24:55 2023
    From Newsgroup: comp.compilers

    On 2/7/23 2:31 PM, Hans-Peter Diettrich wrote:
    On 2/6/23 10:26 PM, gah4 wrote:

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.

    AFAIK IEEE-754 floating point numbers still use sign-magnitude representation.
    Then the same representation for integral numbers might have advantages in computations.

    DoDi
    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]

    That's correct, the imprecise representation of FP numbers allows for
    such tricks. The hidden bit trick can be used again with the FP
    exponents, as I outlined in my Dynamic Floating Point Exponents proposal <https://figshare.com/articles/preprint/Dynamic_Exponents_for_Floating_Point_Numbers-2022-04-07_pdf/19548187>.

    DoDi
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Fri Feb 10 19:11:31 2023
    From Newsgroup: comp.compilers

    On 2/8/23 11:19 AM, Anton Ertl wrote:
    With a ones-complement or two's-complement mantissa the hidden bit
    would just have the same sign as the sign bit, so this trick is not
    tied to sign-magnitude representation.

    Please explain the provenance or purpose of that hidden bit with
    integral numbers. How can integral values be *normalized* so that a
    previously required bit can be hidden? Sign extension to a higher number
    of bits does not increase the value or accuracy of an integral number.

    DoDi
    [He said "mantissa", so it's floating point. I've certainly seen scaled integer arithmetic, but normalized integers other than +/- zero in systems
    with signed zeros seems unlikely. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Fri Feb 10 23:47:40 2023
    From Newsgroup: comp.compilers

    On Friday, February 10, 2023 at 10:18:49 AM UTC-8, Hans-Peter Diettrich wrote:

    (snip)

    Please explain the provenance or purpose of that hidden bit with
    integral numbers. How can integral values be *normalized* so that a previously required bit can be hidden? Sign extension to a higher number
    of bits does not increase the value or accuracy of an integral number.

    [He said "mantissa", so it's floating point. I've certainly seen scaled integer arithmetic, but normalized integers other than +/- zero in systems with signed zeros seems unlikely. -John]

    Normalized binary floating point, with hidden one, is pretty common.
    I knew IBM S/360 floating point for some years before learning about
    those, and it seemed surprising at the time.

    As for integers, though, there are some processors with a floating
    point format that does not left normalize values.

    Some CDC processors, if the value can be shifted and normalized as
    an integer value without losing bits on either end, choose that form.
    Even more, the exponent is zero in that case. I think some
    Burroughs processors also did that.

    The result of doing that is that, for values in the appropriate range,
    the floating point instructions work for integer values. No instructions
    are needed to convert (in range) integers to floating point.

    There is so much fun history to the different floating point
    formats used over the years. Now almost forgotten.
    [I am not aware of any hidden bit formats before IEEE but the 704
    manual noted that normalized mantissas always have a 1 in the high
    bit so it wasn't a big leap. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Sat Feb 11 00:01:33 2023
    From Newsgroup: comp.compilers

    (snip, I wrote)

    But also, the 704 Fortran, and I believe still the 7090,
    indexes arrays from the end of memory toward the beginning.

    And our moderator wrote:
    [It did because for reasons I have never been able to figure out,
    the 70x series subtracted rather than added the contents of
    an index register to get the effective address. -John]

    I suppose so, but it was also convenient. Compilers (or linkers)
    generate code starting from the bottom of memory, and allocate
    variables starting from the top. (Different for different sized
    machines.) Since Fortran COMMON can be a different length, or at least
    blank COMMON in later versions, allocating it from the end of memory
    works well.

    You can extend common from one subroutine to the next, or with
    chaining, to another whole program.

    So, was Fortran designed around the hardware,
    to allocate memory from the end?

    Or maybe just lucky.
    [Fortran was definitely designed around the 704. I have done a lot of
    looking to try and find out where the subtracted index registers came
    from, and while there are a lot of guesses, there is nothing written
    down. The three index registers were 1,2, and 4, and if you specified
    more than one, it OR'ed them together and subtracted the result, which
    was really strange. There were a lot of other machines with index
    registers but none I know of that subtracted or OR'ed. I have also
    been unable to tell whether the OR'ing was deliberate or just a result
    of minimizing the number of tubes that they then documented. It was
    likely useless since the 7094 had 7 index registers and a flag in
    case you wanted the old behavior. -John]
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Sat Feb 11 22:34:25 2023
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> writes:
    [I am not aware of any hidden bit formats before IEEE

    The VAX FP formats have a hidden bit <https://nssdc.gsfc.nasa.gov/nssdc/formats/VAXFloatingPoint.htm>, and
    AFAIK VAX F and G are pretty close to IEEE binary32 and binary64.

    According to <https://home.kpn.nl/jhm.bonten/computers/bitsandbytes/wordsizes/hidbit.htm> already Zuse used a hidden bit in FP numbers on his machines (1940s
    and 1950s).

    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [Oh, right, I forgot about Zuse. He invented a lot of stuff that other
    people reinvented later. Squinting at my VAX architecture handbook, the
    formats have the same layout as IEEE but the exponents are excess 128 and
    1024 rather than 127 and 1023 and there's no infinities or denormals. -John]
  • From drb@drb@ihatespam.msu.edu (Dennis Boone) to comp.compilers on Sun Feb 12 04:37:07 2023
    From Newsgroup: comp.compilers

    The three index registers were 1,2, and 4, and if you specified more
    than one, it OR'ed them together and subtracted the result, which was
    really strange. There were a lot of other machines with index registers
    but none I know of that subtracted or OR'ed. I have also been unable to tell whether the OR'ing was deliberate or just a result of minimizing
    the number of tubes that they then documented.

    Not having to have the circuitry to make sure multiple registers
    were never gated onto the internal bus seems like a likely reason.

    But one _could_ also conceivably use multiple index registers for
    multiple dimensions, with careful allocation.

    De
    [As I said, I've found lots of guesses but no documents. Multiple
    dimensions only work if each dimension is a power of two which,
    considering the 704's tiny memory, seems unlikely. My guess is the
    same as yours, fewer tubes, but it's just a guess. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Sat Feb 11 22:48:41 2023
    From Newsgroup: comp.compilers

    (our moderator wrote)

    [Oh, right, I forgot about Zuse. He invented a lot of stuff that other
    people reinvented later. Squinting at my VAX architecture handbook, the formats have the same layout as IEEE but the exponents are excess 128 and 1024 rather than 127 and 1023 and there's no infinities or denormals. -John]

    I always forget the difference between them.

    VAX has the binary point to the left of the hidden one, and IEEE to the right. So that makes up for the one difference in exponent bias.

    But having no infinities or NaNs means that the usable exponent range extends one higher.

    I think that is the way it works.

    I never had problems reading S/360 format from hex dumps, where
    the exponent is in whole hexadecimal digits and there is no hidden one.