• Re: Software proofs, was Are there different programming languages that are compiled to the same intermediate language?

    From Spiros Bousbouras@spibou@gmail.com to comp.compilers on Sun Feb 5 15:14:02 2023
    From Newsgroup: comp.compilers

    On 01 Feb 2023 08:07:24 GMT
    arnold@freefriends.org (Aharon Robbins) wrote:
    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    We prove it by hand? I don't know if anyone has done this for the
    provers mentioned, but it feels like a feasible task, and once it's done,
    it's done, i.e. a theorem prover is not the kind of software which
    continually gets updated.

    [It's a perfectly reasonable question. Alan Perlis, who was my thesis advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    It depends on what we mean by "bug"; in mathematics we wouldn't say "bug"
    but "error".

    First you have *typos*, which are situations where one meant to write, for
    example, i but wrote j instead. It may be that mathematical proofs have more
    of those than computer programmes, because in computer programmes some will
    be caught by the compiler, for example if j has not been declared. But the
    essential difference is that a typo is harmless in mathematics: the reader
    of the proof will either notice it and understand what was intended, or even
    unconsciously replace what is written with what was intended. In computer
    programmes this kind of typo can be catastrophic. Hence in mathematics we
    wouldn't say that a proof has errors just because it has a typo; we would
    say it has a typo.

    For other errors, logic errors, I would guess that computer programmes have
    more errors/bugs than mathematical proofs. This assertion is based on
    personal experience and instinct to some extent, having written both many
    programmes and many proofs. But I will attempt to give some of what I think
    are the reasons:

    1:
    In mathematics you work with a formalism right from the start, but in
    computer programming one often has only an intuitive understanding of what
    the program is meant to achieve, and a working programme may be considered
    to have a bug when its behaviour turns out not to match that intuitive
    understanding closely enough. The difference in the level of formalism is
    not just about the programmes but about the programming languages too. In
    the programming language standards I've read, the language is not described
    in precise mathematical detail. The grammar usually is, but not the other
    parts of the language.

    For example, from the C standard:
    The result of the binary + operator is the sum of the operands.

    Even just for integer arguments, this is nowhere near a complete definition;
    to piece together a complete one, you must combine information scattered
    across faraway parts of the standard. This would never occur in a
    mathematics book or paper, where the complete definition of a function or
    operator or whatever (including the precise set on which it is defined)
    would appear in one place. This sort of thing, where one has to piece
    together information from many parts of the document to arrive at a complete
    definition, seems to be common in computer language standards, and I have
    no idea why it happens; I don't think anything about specifying a
    programming language forces things to be done like this. But the result is
    that programmers will often work with a mix of precise specification and
    intuitive understanding of the language they're using.
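As a concrete illustration of the scattering, even a small sketch of what `+` "really does" draws on three separate parts of the standard: the integer promotions, the usual arithmetic conversions, and the undefined behaviour of signed overflow. (The section numbers below are from C11; the function names are mine, chosen for illustration.)

```c
#include <limits.h>

/* C11 6.3.1.1: integer promotions -- unsigned char operands are
   promoted to int before the addition, so 200 + 100 is 300, with
   no wraparound at 255. */
int add_uchars(unsigned char a, unsigned char b) {
    return a + b;
}

/* C11 6.3.1.8: usual arithmetic conversions -- the int operand is
   converted to unsigned int, so -1 becomes UINT_MAX and the sum
   wraps around modulo 2^N. */
unsigned int add_mixed(int s, unsigned int u) {
    return s + u;
}

/* C11 6.5p5: signed overflow is undefined behaviour, so a complete
   definition of + must also say when the result is *not* defined. */
int add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;   /* the sum would overflow: refuse to compute it */
    *out = a + b;
    return 1;
}
```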

    2:
    In mathematics a correct proof *is* the goal, but in computer programming some set of behaviours is the goal, and that set of behaviours may not even be the ultimate goal; the desire to make a profit is. For example

    en.wikipedia.org/wiki/Pac-Man:
    Due to an integer overflow, the 256th level loads improperly, rendering
    it impossible to complete.

    But this didn't prevent the game from being highly successful. "Donkey Kong" also had a bug which prevented progress from some point onwards.

    What the goal is also influences the kind of people who specialise in
    computer programming versus those who write mathematical proofs. My
    impression is that there exist computer programmers of the "trial and
    error" type, who write some code which seems to approximate what they have
    in mind, test it, write some more code, and so forth. Eventually they
    arrive at something which seems to match their mental model and consider
    themselves done. They would have no interest in, and possibly no ability
    for, writing a proof of correctness. As a practical matter, they may be
    perfectly productive and write code which works for the desired cases.

    3:
    There is a kind of bug/error for which there's no mathematical analogue:

    From "Expert C Programming" by Peter van der Linden :

    One release of an ANSI C compiler had an interesting bug. The symbol
    table was accessed by a hash function that computed a likely place from
    which to start a serial search. The computation was copiously commented,
    even describing the book the algorithm came from. Unfortunately, the
    programmer omitted to close the comment! The entire hash initial value
    calculation thus fell inside the continuing comment,


    [...]

    The entire calculation of the initial hash value was omitted, so the
    table was always searched serially from the zeroth element! As a result,
    symbol table lookup (a very frequent operation in a compiler) was much
    slower than it should have been. This was never found during testing
    because it only affected the speed of a lookup, not the result. This is
    why some compilers complain if they notice an opening comment in a
    comment string. The error was eventually found in the course of looking
    for a different bug. Inserting the closing comment resulted in an
    immediate compilation speedup of 15%!
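A minimal reconstruction of that failure mode (the identifiers and table size are made up; van der Linden doesn't give the actual code). The unclosed comment silently swallows the hash computation, yet the function still compiles cleanly:

```c
#define TABLE_SIZE 1024

/* Where the search *should* start for a given key. */
int probe_correct(int key) {
    int h = 0;
    /* Hash the key to a likely start position (algorithm from
       Knuth, TAOCP vol. 3). */
    h = key % TABLE_SIZE;
    return h;
}

/* The buggy version: the comment above the calculation is never
   closed, so the assignment below it is commented out too. */
int probe_buggy(int key) {
    int h = 0;
    /* Hash the key to a likely start position (algorithm from
       Knuth, TAOCP vol. 3).
    h = key % TABLE_SIZE;
    */
    (void)key;
    return h;   /* always 0: every lookup scans from element zero */
}
```

Modern compilers will warn about the `/*` nested inside the comment, which is exactly the diagnostic the excerpt mentions.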

    [The history of formal description of programming languages is not
    encouraging. Back in the 1970s the ANSI PL/I standard was defined in
    terms of formal operations on trees, but even so it had bugs/errors.
    Also, per your comment on addition, in a lot of cases the practical
    definition turns into whatever the underlying machine's instruction
    does. Until recently the C standard was deliberately unclear about
    whether arithmetic was ones- or twos-complement. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.compilers on Sun Feb 5 16:14:57 2023
    From Newsgroup: comp.compilers

    [...]
    [The history of formal description of programming languages is not encouraging. Back in the 1970s the ANSI PL/I standard was defined in
    terms of formal operations on trees, but even so it had bugs/errors.
    Also, per your comment on addition, in a lot of cases the practical definition turns into whatever the underlying machine's instruction
    does. Until recently the C standard was deliberately unclear about
    whether arithmetic was ones- or twos-complement. -John]

    The C standard still doesn't mandate two's-complement.

    The 1990 edition of the C standard was vague about integer
    representations, though it did impose some requirements. It required
    a "pure binary" representation, and it required, for example,
    a value that is within the range of both int and unsigned int to
    have the same representation in both. That excluded some of the
    more exotic possibilities. (It mentioned 2's complement a few times,
    but only as an example.)

    The 1999 standard explicitly required one of three representations:
    sign and magnitude, two's complement, or one's complement. It also
    explicitly permitted padding bits (bits that don't contribute
    to the value, and that can either be ignored or create trap
    representations).

    The 2011 standard corrected the spelling of "ones' complement",
    but otherwise didn't change anything significant. (The 2017 edition
    was a minor update.)

    The upcoming 2023 standard mandates two's complement. And it
    requires INT_MIN to be exactly -INT_MAX-1; previously INT_MIN could
    be equal to -INT_MAX, and -INT_MAX-1 could be a trap representation.
    It still permits padding bits and trap representations.

    Note that two's complement *representation* doesn't imply two's
    complement *behavior* on overflow. Signed integer overflow still
    has undefined behavior.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for XCOM Labs
    void Void(void) { Void(); } /* The recursive call of the void */
  • From gah4@gah4@u.washington.edu to comp.compilers on Mon Feb 6 13:26:00 2023
    From Newsgroup: comp.compilers

    On Sunday, February 5, 2023 at 6:01:04 PM UTC-8, Keith Thompson wrote:

    (snip)

    The upcoming 2023 standard mandates two's complement. And it
    requires INT_MIN to be exactly -INT_MAX-1; previously INT_MIN could
    be equal to -INT_MAX, and -INT_MAX-1 could be a trap representation.
    It still permits padding bits and trap representations.

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.


    Note that two's complement *representation* doesn't imply two's
    complement *behavior* on overflow. Signed integer overflow still
    has undefined behavior.

    Overflow behavior mostly depends on the hardware.

    Many computers, such as the IBM S/360 and successors, have a bit that
    enables or disables the fixed point overflow exception. (For IBM, it
    also applies to decimal overflow.)

    For IA32, that is x86, there is the INTO instruction, which is put
    after any instruction that could generate overflow, and causes the
    interrupt if the overflow bit is set. Compilers would have to generate
    that instruction.

    It seems, though, that INTO has gone away in x86-64 mode.

    I don't know that there is any replacement, other than a conditional
    branch based on the overflow flag.

    Fortran programmers rely on fixed point overflow, as they have been
    doing for 60 years, and don't have unsigned.

    (There are two routines in TeX that Knuth suggests replacing with
    assembly code. Those do multiply and divide, with 32 bit fixed point
    values, 16 bits after the binary point. Pascal has no option for
    avoiding the overflow trap, and it takes a lot of Pascal code to do
    the multiply and divide!)
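The pain Knuth describes disappears on a machine with a double-width multiply. A sketch of TeX-style `scaled` arithmetic (16 bits after the binary point; the type and function names are mine, not TeX's) using a 64-bit intermediate:

```c
#include <stdint.h>

typedef int32_t scaled;            /* 16 integer bits, 16 fraction bits */
#define UNITY ((scaled)1 << 16)    /* the value 1.0 in scaled form */

/* Multiply two scaled values via a 64-bit intermediate -- the step
   that takes many lines of overflow-dodging code in pure Pascal. */
scaled scaled_mul(scaled a, scaled b) {
    return (scaled)(((int64_t)a * b) >> 16);
}

/* Divide, again via a 64-bit intermediate so no bits are lost
   before the quotient is formed. */
scaled scaled_div(scaled a, scaled b) {
    return (scaled)(((int64_t)a << 16) / b);
}
```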
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Tue Feb 7 14:31:26 2023
    From Newsgroup: comp.compilers

    On 2/6/23 10:26 PM, gah4 wrote:

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.

    AFAIK IEEE-754 floating point numbers still use sign-magnitude
    representation.
    Then the same representation for integral numbers might have advantages in
    computations.

    DoDi
    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Wed Feb 8 01:10:49 2023
    From Newsgroup: comp.compilers

    On Tuesday, February 7, 2023 at 6:30:40 PM UTC-8, Hans-Peter Diettrich wrote:
    On 2/6/23 10:26 PM, gah4 wrote:

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.

    AFAIK IEEE-754 floating point numbers still use sign-magnitude representation.
    Then the same representation for integral numbers might have advantages in computations.

    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]

    Yes, I meant integer representation.

    Well, I have been wondering for years when we get a C compiler
    for the 7090 so we can test out sign-magnitude integers.

    I think the 7090 does 16 bit integers, at least that is what
    its Fortran compilers did, stored in 36 bit words.

    As for floating point, the PDP-10 uses a two's complement
    floating point format. It does two's complement on the
    whole 36 bit word. The result is that fixed point compare
    instructions work on floating point values.
    [The 704x/709x series did 36 bit sign-magnitude arithmetic. Fortran
    integers were limited to 15 bits plus a sign, probably because that
    was the size of addresses, and they expected integer arithmetic to
    be used only for counting and subscripts. In 709 Fortran II they
    expanded them to 17 bits, in 7090 Fortran IV they were finally a
    full word. -John]
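The PDP-10 trick survives in IEEE arithmetic with a twist: because IEEE floats are sign-magnitude with the exponent field above the fraction, non-negative finite floats compare correctly as plain integers, while negative ones compare in reverse. A sketch (the helper names are mine; assumes the usual binary32 layout):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit unsigned integer. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* For non-negative finite floats, ordering the bit patterns as
   integers matches ordering the values: the exponent sits above the
   fraction, so a bigger value always has a bigger bit pattern. */
int less_by_bits(float a, float b) {
    return float_bits(a) < float_bits(b);
}
```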

  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Wed Feb 8 10:19:54 2023
    From Newsgroup: comp.compilers

    Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:
    AFAIK IEEE-754 floating point numbers still use sign-magnitude
    representation.
    Then the same representation for integral numbers might have advantages in
    computations.

    Such as? Anyway, whatever these advantages may be, they have not been
    enough to prevent the extermination of sign-magnitude integers.

    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]

    With a ones-complement or two's-complement mantissa the hidden bit
    would just have the same sign as the sign bit, so this trick is not
    tied to sign-magnitude representation.

    Some years ago someone sketched a two's-complement representation for
    FP (that also includes the exponent), but I did not quite get the
    details. Anyway, I expect that whatever the advantages of that
    representation are, they are not enough to justify the transition
    cost.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [PDP-6/10 floating point was two's complement. As someone else recently noted, that
    meant they could use fixed point compare instructions. -John]
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Wed Feb 8 15:24:55 2023
    From Newsgroup: comp.compilers

    On 2/7/23 2:31 PM, Hans-Peter Diettrich wrote:
    On 2/6/23 10:26 PM, gah4 wrote:

    Too bad for those CDC computers, and Unisys computers.
    The last sign-magnitude machines I know of are the IBM 7090 and 7094.

    AFAIK IEEE-754 floating point numbers still use sign-magnitude representation.
    Then the same representation for integral numbers might have advantages in computations.

    DoDi
    [I presume the sign-magnitude is to enable the hidden bit trick,
    which doesn't apply in unscaled integers. -John]

    That's correct, the imprecise representation of FP numbers allows for
    such tricks. The hidden bit trick can be used again with the FP
    exponents, as I outlined in my Dynamic Floating Point Exponents proposal <https://figshare.com/articles/preprint/Dynamic_Exponents_for_Floating_Point_Numbers-2022-04-07_pdf/19548187>.

    DoDi
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Fri Feb 10 19:11:31 2023
    From Newsgroup: comp.compilers

    On 2/8/23 11:19 AM, Anton Ertl wrote:
    With a ones-complement or two's-complement mantissa the hidden bit
    would just have the same sign as the sign bit, so this trick is not
    tied to sign-magnitude representation.

    Please explain the provenance or purpose of that hidden bit with
    integral numbers. How can integral values be *normalized* so that a
    previously required bit can be hidden? Sign extension to a higher number
    of bits does not increase the value or accuracy of an integral number.

    DoDi
    [He said "mantissa", so it's floating point. I've certainly seen scaled integer arithmetic, but normalized integers other than +/- zero in systems
    with signed zeros seems unlikely. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Fri Feb 10 23:47:40 2023
    From Newsgroup: comp.compilers

    On Friday, February 10, 2023 at 10:18:49 AM UTC-8, Hans-Peter Diettrich wrote:

    (snip)

    Please explain the provenance or purpose of that hidden bit with
    integral numbers. How can integral values be *normalized* so that a previously required bit can be hidden? Sign extension to a higher number
    of bits does not increase the value or accuracy of an integral number.

    [He said "mantissa", so it's floating point. I've certainly seen scaled integer arithmetic, but normalized integers other than +/- zero in systems with signed zeros seems unlikely. -John]

    Normalized binary floating point, with hidden one, is pretty common.
    I knew IBM S/360 floating point for some years before learning about
    those, and it seemed surprising at the time.

    As for integers, though, there are some processors with a floating
    point format that does not left normalize values.

    Some CDC processors, if the value can be shifted and normalized as
    an integer value without losing bits on either end, choose that form.
    Even more, the exponent is zero in that case. I think some
    Burroughs processors also did that.

    The result of doing that is that, for values in the appropriate range,
    the floating point instructions work for integer values. No instructions
    are needed to convert (in range) integers to floating point.

    There is so much fun history to the different floating point
    formats used over the years. Now almost forgotten.
    [I am not aware of any hidden bit formats before IEEE but the 704
    manual noted that normalized mantissas always have a 1 in the high
    bit so it wasn't a big leap. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Sat Feb 11 00:01:33 2023
    From Newsgroup: comp.compilers

    (snip, I wrote)

    But also, the 704 Fortran, and I believe still the 7090,
    indexes arrays from the end of memory toward the beginning.

    And our moderator wrote:
    [It did because for reasons I have never been able to figure out,
    the 70x series subtracted rather than added the contents of
    an index register to get the effective address. -John]

    I suppose so, but it was also convenient. Compilers (or linkers)
    generate code starting from the bottom of memory, and allocate
    variables starting from the top. (Different for different sized
    machines.) Since Fortran COMMON can be a different length, or at least
    blank COMMON in later versions, allocating it from the end of memory
    works well.

    You can extend common from one subroutine to the next, or with
    chaining, to another whole program.

    So, was Fortran designed around the hardware,
    to allocate memory from the end?

    Or maybe just lucky.
    [Fortran was definitely designed around the 704. I have done a lot of
    looking to try and find out where the subtracted index registers came
    from, and while there are a lot of guesses, there is nothing written
    down. The three index registers were 1,2, and 4, and if you specified
    more than one, it OR'ed them together and subtracted the result, which
    was really strange. There were a lot of other machines with index
    registers but none I know of that subtracted or OR'ed. I have also
    been unable to tell whether the OR'ing was deliberate or just a result
    of minimizing the number of tubes that they then documented. It was
    likely useless since the 7094 had 7 index registers and a flag in
    case you wanted the old behavior. -John]
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Sat Feb 11 22:34:25 2023
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> writes:
    [I am not aware of any hidden bit formats before IEEE

    The VAX FP formats have a hidden bit <https://nssdc.gsfc.nasa.gov/nssdc/formats/VAXFloatingPoint.htm>, and
    AFAIK VAX F and G are pretty close to IEEE binary32 and binary64.

    According to <https://home.kpn.nl/jhm.bonten/computers/bitsandbytes/wordsizes/hidbit.htm> already Zuse used a hidden bit in FP numbers on his machines (1940s
    and 1950s).

    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [Oh, right, I forgot about Zuse. He invented a lot of stuff that other
    people reinvented later. Squinting at my VAX architecture handbook, the
    formats have the same layout as IEEE but the exponents are excess 128 and
    1024 rather than 127 and 1023 and there's no infinities or denormals. -John]
  • From drb@drb@ihatespam.msu.edu (Dennis Boone) to comp.compilers on Sun Feb 12 04:37:07 2023
    From Newsgroup: comp.compilers

    The three index registers were 1,2, and 4, and if you specified more
    than one, it OR'ed them together and subtracted the result, which was
    really strange. There were a lot of other machines with index registers
    but none I know of that subtracted or OR'ed. I have also been unable to tell whether the OR'ing was deliberate or just a result of minimizing
    the number of tubes that they then documented.

    Not having to have the circuitry to make sure multiple registers
    were never gated onto the internal bus seems like a likely reason.

    But one _could_ also conceivably use multiple index registers for
    multiple dimensions, with careful allocation.

    De
    [As I said, I've found lots of guesses but no documents. Multiple
    dimensions only work if each dimension is a power of two which,
    considering the 704's tiny memory, seems unlikely. My guess is the
    same as yours, fewer tubes, but it's just a guess. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Sat Feb 11 22:48:41 2023
    From Newsgroup: comp.compilers

    (our moderator wrote)

    [Oh, right, I forgot about Zuse. He invented a lot of stuff that other
    people reinvented later. Squinting at my VAX architecture handbook, the formats have the same layout as IEEE but the exponents are excess 128 and 1024 rather than 127 and 1023 and there's no infinities or denormals. -John]

    I always forget the difference between them.

    VAX has the binary point to the left of the hidden one, and IEEE to the right. So that makes up for the one difference in exponent bias.

    But having no infinities or NaNs means that the usable exponent range extends one higher.

    I think that is the way it works.

    I never had problems reading S/360 format from hex dumps, where
    the exponent is in whole hexadecimal digits and there is no hidden one.