• Are there different programming languages that are compiled to the same intermediate language?

    From Roger L Costello@costello@mitre.org to comp.compilers on Sun Jan 29 15:49:14 2023
    From Newsgroup: comp.compilers

    Hi Folks,

    Programming_Language_1 has a compiler which generates Intermediate_Code_1.
    Many backends are created for Intermediate_Code_1.

    Time passes ....

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.
    2. Create a translator which converts instances of Programming_Language_2 into Programming_Language_1 instances.

    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?
    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    What is the answer to Sally's questions?

    /Roger
    [I think the answer either way is that you don't, and you try to have a test suite that is as comprehensive as possible. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.compilers on Sun Jan 29 22:14:51 2023
    From Newsgroup: comp.compilers

    Roger L Costello <costello@mitre.org> schrieb:
    Hi Folks,

    Programming_Language_1 has a compiler which generates Intermediate_Code_1. Many backends are created for Intermediate_Code_1.

    Time passes ....

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.

    Compilers have been doing that for a very long time. I remember
    reading that the IBM PL.8 compiler used a target-independent
    intermediate representation, and gcc and LLVM both adopt this.
    Makes sense - most of the optimizations are target-independent.
    Typically, SSA (static single assignment) is used these days
    because it offers many optimization opportunities.

    2. Create a translator which converts instances of Programming_Language_2 into
    Programming_Language_1 instances.

    That is also an approach, which works if the Programming_Language_2
    is sufficiently low-level, and yet sufficiently flexible, to allow
    a reasonable representation. In practice, this usually means C,
    where things like pointer arithmetic can be used to implement
    multi-dimensional arrays.

    Fortran is a prime example. Both the well-known f2c translator, based
    on the Bell Labs Fortran 77 compiler, and the NAG Fortran compiler use
    C as intermediate language.

    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?

    Usually, she can't.

    Compiler front ends are well-known to have bugs, and it would be
    very hard to prove their correctness, especially if the bugs are
    in semantics, which are usually not described in a formal way.

    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?

    See above.

    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    Same problem.

    [Target-independent intermediate forms work OK when the source
    languages are semantically similar, e.g. C and Fortran, and range of
    target architectures is limited, e.g., two's complement machines with
    8-bit byte flat addressing. Every few years someone comes up with the
    bright idea to do an all purpose intermediate language, which never
    works. See UNCOL around 1960 for the first time that idea failed.

    By the way, f2c is based on f77 which compiled Fortran into the intermediate language the PDP-11 C compiler used. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From arnold@arnold@freefriends.org (Aharon Robbins) to comp.compilers on Mon Jan 30 08:03:12 2023
    From Newsgroup: comp.compilers

    Our esteemed moderator writes:

    By the way, f2c is based on f77 which compiled Fortran into the intermediate language the PDP-11 C compiler used. -John]

    Just to be clear, it was the intermediate language of Johnson's PCC,
    which was first targeted at the Interdata, then the PDP-11 and Vax,
    not Dennis Ritchie's original PDP-11 compiler.

    Arnold
    [Good point. I wrote a competing F77 compiler INfort which did use the
    Ritchie intermediate language. Someone else later modified it to use
    pcc. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From William Fahle@billfahle@gmail.com to comp.compilers on Mon Jan 30 02:00:42 2023
    From Newsgroup: comp.compilers

    On Sunday, January 29, 2023 at 10:58:58 AM UTC-6, Roger L Costello wrote:
    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?
    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    What is the answer to Sally's questions?

    /Roger
    [I think the answer either way is that you don't, and you try to have a test suite that is as comprehensive as possible. -John]

    This is Post's Correspondence Problem, which has been proven to be
    equivalent to the Halting Problem, and is thus undecidable.
    [Remember that while the halting problem is insoluble in general, it's
    often soluble in specific cases, e.g. "halt" or "foo: goto foo".
    Nevertheless, I would be quite surprised if anyone could prove a
    non-toy translator correct. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Roger L Costello@costello@mitre.org to comp.compilers on Mon Jan 30 14:36:25 2023
    From Newsgroup: comp.compilers

    I wrote:

    2. Create a translator which converts instances of
    Programming_Language_2 into
    Programming_Language_1 instances.

    Thomas Koenig responded:

    That is also an approach, which works if the
    Programming_Language_2 is sufficiently low-level

    Is that a typo? Do you mean Programming_Language_1 is sufficiently low level? Recall that the approach is to translate instances of Programming_Language_2
    to instances of Programming_Language_1.

    I will assume you meant Programming_Language_1. So you are saying that the target language (Programming_Language_1) must be a lower level language than the source language (Programming_Language_2), right?

    John Levine wrote:

    Target-independent intermediate forms work OK
    when the source languages are semantically similar

    How does one determine whether two source languages are semantically similar?

    /Roger
    [Not to snark too much, but you know it when you see it. C and Fortran
    are similar, COBOL and Lisp are not. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.compilers on Mon Jan 30 17:36:30 2023
    From Newsgroup: comp.compilers

    On 2023-01-29, Roger L Costello <costello@mitre.org> wrote:
    Hi Folks,

    Programming_Language_1 has a compiler which generates Intermediate_Code_1. Many backends are created for Intermediate_Code_1.

    Time passes ....

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.
    2. Create a translator which converts instances of Programming_Language_2 into
    Programming_Language_1 instances.

    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?
    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    What is the answer to Sally's questions?

    The answer is that proving a compiler correct is the same as proving
    any other software correct. What can make it easier than some
    software is that there is no real-time, nondeterministic behavior. A
    compiler should produce exactly the same result every time it is
    invoked with the same inputs.

    Also, compilers can be written using pure functional techniques, and
    this is actually "easy" to do, compared to some other kinds of
    tasks. Recursion is natural for compiling because the input has a
    nested, recursive structure, which the compiler follows.

    Functional programs are easier to prove correct than imperative or
    object-oriented programs, which work by mutating state in variables
    and objects.

    The compiler is expressed as a recursive function which handles all the
    cases that occur (syntax shapes being translated). You can then argue
    that it is correct by induction: inspect that each case is performing a
    correct transformation from source language to the intermediate language
    for the construct which it is handling, and that it's correctly invoking
    itself recursively for the sub-expressions that the input contains.
    Also included must be an argument showing that the recursion terminates.

    E.g., if you know that your compiler handles, say, an if/else statement
    correctly, and some kind of for loop also correctly, and it's all
    clean recursion, you can confidently argue that if statements nested
    in loops, and vice versa, or in each other, are also handled correctly.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    [I have bad news for you -- there are real compilers that produce
    different results on different runs, due to things like being loaded
    at different memory addresses so that hash tables of pointers end up
    in different orders. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Martin Ward@mwardgkc@gmail.com to comp.compilers on Tue Jan 31 14:04:39 2023
    From Newsgroup: comp.compilers

    On 30/01/2023 10:00, John Levine wrote:
    [Remember that while the halting problem is insoluble in general, it's
    often soluble in specific cases, e.g. "halt" or "foo: goto foo".

    More usefully, there are methods for transforming a formal
    specification into an executable implementation which
    preserve semantic equivalence, and therefore are guaranteed
    to produce a terminating program (for input states for which
    the specification is defined). For example:

    "Provably Correct Derivation of Algorithms Using FermaT"
    Martin Ward and Hussein Zedan
    Formal Aspects of Computing, September 2014.
    Volume 26, Issue 5, Pages 993--1031, September 2014, ISSN: 0934-5043
    doi:10.1007/s00165-013-0287-2
    http://www.gkc.org.uk/martin/papers/trans-prog-t.pdf

    Nevertheless, I would be quite surprised if anyone could prove a
    non-toy translator correct. -John]
    According to this paper: "The state of the art [as of 2014] is
    the CompCert compiler, which compiles almost all of the C language
    to PowerPC, ARM and x86 assembly and has been mechanically verified
    using the Coq proof assistant."

    Xavier Leroy. "Formally verifying a compiler: Why? How? How far?".
    CGO 2011 - 9th Annual IEEE/ACM International Symposium on Code
    Generation and Optimization, Apr 2011, Chamonix, France.
    doi:10.1109/CGO.2011.5764668, hal-01091800

    https://hal.inria.fr/hal-01091800

    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
    G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Tue Jan 31 22:59:17 2023
    From Newsgroup: comp.compilers

    On Sunday, January 29, 2023 at 8:58:58 AM UTC-8, Roger L Costello wrote:

    (snip)

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.

    2. Create a translator which converts instances of Programming_Language_2 into
    Programming_Language_1 instances.

    The complication with this question is the definition of intermediate language.

    Some compilers have a front end, middle end, and back end, with some form of intermediate code between each. They might be more or less general.

    Depending on the compiler writer (or writers), there may be thought and time
    to make them more general, allowing for other languages.

    But okay, choice 2, especially if language 1 has a standard, can be
    useful. (I was almost going to shorten "Programming_Language_1" to
    PL-1, but then realized that some might read that wrong.)

    That allows it to be used with other compilers, even ones not written yet.
    On the other hand, it is restricted by the features of that language, which sometimes make things complicated.

    In the case of choice 1, you might be able to add new features to the intermediate language, especially if you can join with its project.

    Usually, though, there are some features that don't translate to
    language 1. Or translate in an extremely complicated way.

    Usually the translated code is unreadable to most people who
    actually know the target language.

    Some examples. TeX was written in Pascal. Well, not really.
    It was written in Web, a language and preprocessor that generates
    Pascal code to be compiled. At some point, it was desired to have
    it in C instead. A program was written to translate much of
    Pascal into C, but some things were not so easy. So instead,
    the features of the Web preprocessor were used, to convert to
    a Pascal-like language, easier to convert to C. This was made
    possible by the macro facilities of Web, and with its change
    file feature.

    Another one is f2c, which translates Fortran to C, but uses
    special library routines and otherwise not so readable output.

    Could we have a standard intermediate language, with all
    features needed, or that ever will be needed?
    [The answer to that last question is no. It's a very old bad idea,
    with the first failed attempt UNCOL in 1958. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From arnold@arnold@freefriends.org (Aharon Robbins) to comp.compilers on Wed Feb 1 08:07:24 2023
    From Newsgroup: comp.compilers

    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    I ask in all seriousness.

    Thanks,

    Arnold

    In article <23-01-092@comp.compilers>, Martin Ward <martin@gkc.org.uk> wrote:
    On 30/01/2023 10:00, John Levine wrote:
    [Remember that while the halting problem is insoluble in general, it's
    often soluble in specific cases, e.g. "halt" or "foo: goto foo".

    More usefully, there are methods for transforming a formal
    specification into an executable implementation which
    preserve semantic equivalence, and therefore are guaranteed
    to produce a terminating program (for input states for which
    the specification is defined). ...
    [It's a perfectly reasonable question. Alan Perlis, who was my thesis
    advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Martin Ward@mwardgkc@gmail.com to comp.compilers on Thu Feb 2 15:44:17 2023
    From Newsgroup: comp.compilers

    On 01/02/2023 08:07, Aharon Robbins wrote:
    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    A theorem prover generates a proof of the theorem (if it succeeds).
    Checking the correctness of a proof is a much simpler task
    than finding the proof in the first place and can be carried
    out independently by different teams using different methods.
    Appel and Haken's proof of the four colour theorem, for example,
    involved a significant element of computer checking which was
    independently double checked with different programs and computers.

    [It's a perfectly reasonable question. Alan Perlis, who was my thesis advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    Mathematicians publish proofs all the time and only a tiny percentage
    of published proofs turn out to have errors. Programmers release
    programs all the time and a vanishingly small percentage of these
    turn out to be free from all bugs. Alan Perlis may not have been able
    to think of a reason why this should be the case, but it is,
    nevertheless, the case.

    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
    G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
    [Computer programs tend to be a lot longer than mathematical proofs. I
    realize there are some 500 page proofs, but there are a whole lot of
    500 page programs. It is my impression that in proofs, as in programs,
    the longer and more complicated they are, the more likely they are to
    have bugs. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Thu Feb 2 16:24:11 2023
    From Newsgroup: comp.compilers

    (snip, I wrote)

    Could we have a standard intermediate language, with all
    features needed, or that ever will be needed?

    [The answer to that last question is no. It's a very old bad idea,
    with the first failed attempt UNCOL in 1958. -John]

    I have wondered, though, about a standardized intermediate
    for a processor family. One could write compilers to generate it,
    and then updated processors come along with updated code
    generators. Or even distribute intermediate code, and the
    installer generates the right version for the processor.

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.
    Since the whole idea is that the processor depends on the
    code generator doing things in the right order. That is, out
    of order execution, but determined at compile time. Failure
    to do that meant failure for the whole idea.

    [Someone comes up with an intermediate language that works for a few
    source languages and a few targets, and usually publishes a paper
    about his breakthrough. Then people try to add more front ends and
    back ends, the intermediate language gets impossibly complex and
    buggy, and the project is quietly forgotten. I'd think the back end
    problem is a lot easier now than it was in 1958 since everything is
    twos complement arithmetic in 8-bit bytes, but the front end is still
    daunting.
    If you don't push it too far it's certainly possible to do a lot of
    work in a largely machine-independent way as LLVM does. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From arnold@arnold@skeeve.com (Aharon Robbins) to comp.compilers on Fri Feb 3 08:26:45 2023
    From Newsgroup: comp.compilers

    Dr. Ward replied to me privately also, but since this went to the group,
    I'll say the same thing here that I replied to him.

    In article <23-02-005@comp.compilers>, Martin Ward <martin@gkc.org.uk> wrote:
    On 01/02/2023 08:07, Aharon Robbins wrote:
    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    A theorem prover generates a proof of the theorem (if it succeeds).
    Checking the correctness of a proof is a much simpler task
    than finding the proof in the first place and can be carried
    out independently by different teams using different methods.
    Appel and Haken's proof of the four colour theorem, for example,
    involved a significant element of computer checking which was
    independently double checked with different programs and computers.

    This tells me what a theorem prover does, and why it's useful. It
    does not answer my question: How do we know that the theorem prover
    itself is correct and bug free?

    [It's a perfectly reasonable question. Alan Perlis, who was my thesis
    advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    And this is a different point from my question.

    Mathematicians publish proofs all the time and only a tiny percentage
    of published proofs turn out to have errors. Programmers release
    programs all the time and a vanishingly small percentage of these
    turn out to be free from all bugs. Alan Perlis may not have been able
    to think of a reason why this should be the case, but it is,
    nevertheless, the case.

    I don't argue this. But I think mathematical theorems and their
    proofs are different qualitatively from real world large programs.
    I do think there's room for mathematical techniques to help
    improve software development, but I don't see that done down
    in the trenches, and I've been down in the trenches for close
    to 40 years.

    Arnold
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Fri Feb 3 11:44:03 2023
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> writes:
    I have wondered, though, about a standardized intermediate
    for a processor family. One could write compilers to generate it,
    and then updated processors come along with updated code
    generators. Or even distribute intermediate code, and the
    installer generates the right version for the processor.

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.

    I dispute the latter claim. My take is that IA-64 failed because the
    original assumption that in-order performance would exceed OoO
    performance was wrong. OoO processors surpassed in-order CPUs; they
    managed to get higher clock rates (my guess is that this is due to
    them having smaller feedback loops) and they benefit from better
    branch prediction, which extends to 512-instruction reorder buffers on
    recent Intel CPUs, far beyond what compilers can achieve on IA-64.
    The death knell for IA-64 competitiveness was the introduction of SIMD
    instruction set extensions, which made OoO CPUs surpass IA-64 even in
    those vectorizable codes where IA-64 had been competitive.

    Since the whole idea is that the processor depends on the
    code generator doing things in the right order. That is, out
    of order execution, but determined at compile time. Failure
    to do that meant failure for the whole idea.

    But essentially all IA-64 CPUs sold were variations of the McKinley
    microarchitecture as far as performance characteristics were
    concerned, especially during the time when IA-64 was still perceived
    as relevant. The next microarchitecture, Poulson, was only released in
    2012, when IA-64 had already lost.

    [Someone comes up with an intermediate language that works for a few
    source languages and a few targets, and usually publishes a paper
    about his breakthrough. Then people try to add more front ends and
    back ends, the intermediate language gets impossibly complex and
    buggy, and the project is quietly forgotten. I'd think the back end
    problem is a lot easier now than it was in 1958 since everything is
    twos complement arithmetic in 8-bit bytes

    Yes. Computer architecture converges. 20 years ago we still had to
    worry about alignment and byte ordering, nowadays alignment has been
    settled in general-purpose CPUs (no alignment restrictions), and byte
    ordering is mostly settled to little-endian (the exception being s390/s390x).

    If you don't push it too far it's certainly possible to do a lot of
    work in a largely machine-independent way as LLVM does. -John]

    LLVM mostly supports what the hardware supports, but there is at least
    one exception: LLVM IR divides the code into functions that you can
    only call, with LLVM handling the calling convention. When someone
    needs to deviate from the calling convention, like the Haskell/C--
    people, LLVM provides some possibilities, but from what I have read,
    they had a hard time pulling it off.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [I'm with you on IA-64 and VLIW. I knew the Multiflow people pretty
    well and it is evident in retrospect that while it was a good fit
    with what you could build in the 1980s, hardware soon reached the
    point where it could do better what VLIW compilers had done. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Fri Feb 3 19:13:17 2023
    From Newsgroup: comp.compilers

    On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:

    (snip, I wrote)

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.

    I dispute the latter claim. My take is that IA-64 failed because the
    original assumption that in-order performance would exceed OoO
    performance was wrong. OoO processors surpassed in-order CPUs; they
    managed to get higher clock rates (my guess is that this is due to
    them having smaller feedback loops) and they benefit from better
    branch prediction, which extends to 512-instruction reorder buffers on
    recent Intel CPUs, far beyond what compilers can achieve on IA-64.
    The death knell for IA-64 competitiveness was the introduction of SIMD
    instruction set extensions, which made OoO CPUs surpass IA-64 even in
    those vectorizable codes where IA-64 had been competitive.

    I got interested in OoO in the days of the IBM 360/91, which is early
    in the line of OoO processors. Among others, the 91 has imprecise
    interrupts, where the stored address is not the instruction after the
    interrupt cause.

    But okay, the biggest failure of Itanium is that it was two years or
    so behind schedule when it came out. And part of it, as I remember,
    was the need to implement x86 instructions, too.

    Since the whole idea is that the processor depends on the
    code generator doing things in the right order. That is, out
    of order execution, but determined at compile time. Failure
    to do that meant failure for the whole idea.

    But essentially all IA-64 CPUs sold were variations of the McKinley
    microarchitecture as far as performance characteristics were
    concerned, especially during the time when IA-64 was still perceived
    as relevant. The next microarchitecture, Poulson, was only released in
    2012, when IA-64 had already lost.

    But is it the whole idea of compile-time instruction scheduling the
    cause of the failure, or just the way they did it?

    The 360/91 had some interesting problems. One is that it had 16-way
    interleaved memory with a cycle time of 13 processor cycles, and the
    goal of one instruction per clock cycle. That means it is dependent on
    memory address ordering, which is hard to know at compile time.

    The slightly later 360/85, without the fancy OoO processor, but with
    cache memory (the first machine with cache!) was about as fast
    on real problems.

    Otherwise, the 91 has a limited number of reservation stations,
    limiting how far OoO it can go. All OoO processors have a limit
    to how far they can go. But the compiler does not have that limit.

    Now that transistors are cheap, and one can throw a large
    number into reorder buffers and such, one can build really deep
    pipelines.

    But the reason for bringing this up is that if Intel had defined an
    intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    Even more, design for it could have been done in parallel with the
    processor, making both work well together.

    Reminds me of the 8087 virtual register stack. The 8087 has
    eight registers, but they were supposed to be a virtual stack.
    They would be copied to/from memory on stack overflow
    or underflow. But no-one tried to write the interrupt routine
    until after the hardware was made, and it turns out not to
    be possible. I never knew why that wasn't fixed for the 287
    or 387, though.
    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns. I doubt that has
    changed. And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Sat Feb 4 02:55:01 2023
    From Newsgroup: comp.compilers

    On Friday, February 3, 2023 at 7:51:41 PM UTC-8, gah4 wrote:

    (snip on Itanium and OoO execution)

    The 360/91 had some interesting problems. One is that it had 16-way
    interleaved memory with a cycle time of 13 processor cycles, and the
    goal of one instruction per clock cycle. That means it is dependent on
    memory address ordering, which is hard to know at compile time.

    (snip)

    But the reason for bringing this up, is that if Intel had a defined intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    (snip, our moderator wrote)
    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns. I doubt that has
    changed. And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    So, 50 years ago we had the 360/91, and some CDC competition.

    About 45 years ago, we had the Cray-1 and successor vector processors.

    So, yes, SIMD processors.

    Then 30 years ago, Fortran added the FORALL statement, to make writing
    vector code easier. Except that it doesn't.

    FORALL, and array assignment statements, always operate such that
    the effect is as if the whole right-hand side is evaluated before the left
    side is changed. Unless the whole thing fits in a vector register,
    the compiler is still stuck.

    And just about that time, vector processors went out of style.

    (There is now DO CONCURRENT, which might be better.)

    Much matrix code goes sequentially through memory, and programmers
    know that. (Or they are supposed to know that.)

    Early Fortran had the FREQUENCY statement, with which one could
    tell the compiler the likely iteration count for DO loops and the
    branch probability for branch instructions. Those would help with
    static branch predictors, but the statement is long gone now.

    It would seem, though, that a statement telling the compiler the
    expected memory access pattern would help.

    And the Cray-1, like the 360/91, had 16-way interleaved
    memory, 64 bits wide.
    [Legend says that in one of the early Fortran compilers, the FREQUENCY
    statement was accidentally implemented backward and nobody noticed. We
    have found over and over that programmers' intuitions about the way
    their programs perform are quite poor, and we're much better off saying
    what we mean and letting the compiler figure out how to do low-level
    optimizations. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Sun Feb 5 17:59:03 2023
    From Newsgroup: comp.compilers

    Martin Ward <mwardgkc@gmail.com> writes:
    On 01/02/2023 08:07, Aharon Robbins wrote:> I've never understood this.
    [It's a perfectly reasonable question. Alan Perlis, who was my thesis
    advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    Mathematicians publish proofs all the time and only a tiny percentage
    of published proofs turn out to have errors.

    Even at face value I would like to see some evidence for this claim.

    A comparable statement would be "Computer scientists publish papers
    about programs all the time, and only a tiny percentage of these
    papers turn out to have errors".

    There is a difference between how a mathematical proof is used and how
    a program is used.

    1) A program is usually fed to a machine for execution. A published
    proof is read by a number (>=0) of mathematicians, who fill in a lot
    of what the author has left out. If you feed the published proof to
    an automatic proof checker (of your choice), how many of the published
    proofs need no additions and no changes (bug fixes) before the proof
    checker verifies the proof? I guess there is a reason why Wikipedia
    has no page on "proof checker", but suggests "proof assistant": "a
    software tool to assist with the development of formal proofs by
    human-machine collaboration."

    2) A program has to satisfy the requirements of its users, while a
    published proof is limited to proving a published theorem. One
    example of the difference is that undefinedness is totally acceptable
    in mathematics, while it is a bug in programs (interestingly, a
    significant number of compiler writers take the mathematical view in
    what they provide to programmers, but consider a program in their
    language that exercises that undefined behaviour to be buggy).
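    Point 1) can be made concrete: a proof assistant accepts nothing
    until every step is machine-checkable. A minimal Lean 4 sketch
    (illustrative):

```lean
-- The checker demands an explicit justification for each claim;
-- "clearly" or "left to the reader" is rejected.
theorem add_comm_example (m n : Nat) : m + n = n + m :=
  Nat.add_comm m n   -- cite the library lemma; omit it and checking fails
```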

    The size argument that our esteemed moderator provided also has to be considered.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Tue Feb 7 08:35:00 2023
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> writes:
    On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:

    (snip, I wrote)

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.

    I dispute the latter claim. My take is that IA-64 failed because the
    original assumption that in-order performance would exceed OoO
    performance was wrong. OoO processors surpassed in-order CPUs; they
    managed to get higher clock rates (my guess is that this is due to
    them having smaller feedback loops) and they benefit from better
    branch prediction, which extends to 512-instruction reorder buffers on
    recent Intel CPUs, far beyond what compilers can achieve on IA-64.
    The death knell for IA-64 competitiveness was the introduction of SIMD
    instruction set extensions, which made OoO CPUs surpass IA-64 even in
    those vectorizable codes where IA-64 had been competitive.
    ...
    But okay, the biggest failure of Itanium is that it was two years or
    so behind schedule when it came out.

    If it had had superior performance when McKinley came out in 2002,
    that would have been just a minor speed bump on the way to success.

    And part of it, as I remember, was the need to implement x86
    instructions, too.

    Certainly, backwards compatibility is paramount in the markets that it
    was intended for. And the 386 compatibility also had to be close to
    competitive. But it wasn't.

    But is it the whole idea of compile-time instruction scheduling the
    cause of the failure, or just the way they did it?

    It seems to me that in-order execution combined with architectural
    features for reducing dependencies and with compiler scheduling
    turned out to be inferior to out-of-order execution. Moreover, in
    those areas where the compiler works well (numerical code), it also
    works well enough for using SIMD instructions (auto-vectorization),
    which could be added relatively cheaply to the hardware.

    All OoO processors have a limit
    to how far they can go. But the compiler does not have that limit.

    And the compiler can make more sophisticated scheduling decisions
    based on the critical path, while the hardware scheduler picks the
    oldest ready instruction. These were the ideas that seduced Intel,
    HP, and Transmeta to invest huge amounts of money into EPIC ideas.

    But the compiler has other limits. It cannot schedule across indirect
    calls (used in object-oriented dispatch) or across compilation-unit
    boundaries (in particular, calls to and returns from shared
    libraries). Another important limit is the predictability of
    branches. Static branch prediction using profiles has ~10%
    mispredictions, while (hardware) dynamic branch prediction has a much
    lower misprediction rate (I remember numbers like 3% (for the same
    benchmarks that have 10% mispredictions with static branch prediction)
    in papers from the last century; I expect that this has improved even
    more in the meantime). If the compiler mispredicts, it will schedule
    instructions from the wrong path, instructions that will be useless
    for execution.

    In the end a compiler typically can schedule across a few dozen
    instructions, while hardware can schedule across a few hundred.
    Compiler scheduling works well for simple loops, and that's where
    IA-64 shone, but only doing loops well is not good enough for
    general-purpose software.

    Now, since transistors are cheap now, and one can throw a large
    number into reorder buffers and such, one can build really deep
    pipelines.

    It's interesting that Intel managed to produce their first OoO CPU
    (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
    first Itanium) at 25.4M transistors was too large for the 250nm and
    they had to switch to 180nm (contributing to the delays). So, while
    the theory was that the EPIC principle would reduce the hardware
    complexity (to allow adding more functional units for increased
    performance), in Itanium practice the hardware was more complex, and
    the performance advantages did not appear.

    But the reason for bringing this up is that if Intel had a defined
    intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    Something like this happened roughly at the same time with LLVM.
    There were other initiatives, but LLVM was the one that succeeded.

    There was the Open Research Compiler for IA-64 from Intel and the
    Chinese Academy of Sciences.

    SGI released their compiler targeted at IA-64 as Open64.

    Even more, design for it could have been done in parallel with the
    processor, making both work well together.

    Intel, HP, and others worked on compilers in parallel to the hardware
    work. It's just that the result did not perform as well for
    general-purpose code as OoO processors with conventional compilers.

    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns.

    Multiflow and Cydrome built computers for numerical computations (aka
    HPC aka (mini-)supercomputing). These tend to spend a lot of time in
    simple loops with high iteration counts and have statically
    predictable branches. They found compiler techniques like trace
    scheduling (Multiflow) that work well for code with predictable
    branches, and modulo scheduling (Cydrome) that work well for simple
    loops with high iteration counts. The IA-64 and Transmeta architects
    wanted to extend these successes to general-purpose computing, but it
    did not work out.

    Concerning memory access patterns: While they are not particularly
    predictable in general-purpose code, most general-purpose code
    benefits quite a lot from caches (more than numerical code), so I
    don't think that this was a big problem for IA-64.

    Some people mention varying latencies due to caches as a problem, but
    the choice of a small L1 cache (16KB compared to 64KB for the earlier
    21264 and K7 CPUs) for McKinley indicates that average latency was
    more of a problem for IA-64 than varying latencies.

    And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
    programs where IA-64 outdid the mainstream.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Wed Feb 8 01:04:38 2023
    From Newsgroup: comp.compilers

    On Tuesday, February 7, 2023 at 6:24:44 PM UTC-8, Anton Ertl wrote:

    (snip)

    All OoO processors have a limit
    to how far they can go. But the compiler does not have that limit.

    And the compiler can make more sophisticated scheduling decisions
    based on the critical path, while the hardware scheduler picks the
    oldest ready instruction. These were the ideas that seduced Intel,
    HP, and Transmeta to invest huge amounts of money into EPIC ideas.

    But the compiler has other limits. It cannot schedule across indirect
    calls (used in object-oriented dispatch), across compilation unit
    boundaries (in particular, calls to and returns from shared
    libraries).

    OK, maybe we are getting closer.
    Many processors now have speculative execution.
    If you don't know what else to do, execute some instructions
    just in case, but don't change actual memory or registers yet.

    And Itanium doesn't do that. I don't see, though, why it wouldn't be
    possible to have EPIC and also speculative execution.


    Another important limit is the predictability of
    branches. Static branch prediction using profiles has ~10%
    mispredictions, while (hardware) dynamic branch prediction has a much
    lower misprediction rate (I remember numbers like 3% (for the same
    benchmarks that have 10% mispredictions with static branch prediction)
    in papers from the last century; I expect that this has improved even
    more in the meantime). If the compiler mispredicts, it will schedule
    instructions from the wrong path, instructions that will be useless
    for execution.

    OK, but we just discussed speculative execution. So you can execute
    two paths or maybe three.


    In the end a compiler typically can schedule across a few dozen
    instructions, while hardware can schedule across a few hundred.
    Compiler scheduling works well for simple loops, and that's where
    IA-64 shone, but only doing loops well is not good enough for
    general-purpose software.

    OK, loops were important for the 360/91, meant for floating-point
    number crunching. (Only floating point is OoO.)
    And small inner loops are usual in those programs. It also has a
    loop mode, in which it keeps the instructions in a buffer and
    doesn't have to keep fetching them.

    But also, the 360/91 was designed to run code written for any
    S/360 model. So, no special ordering, and even self-modifying
    code. Even more, S/360 only has four floating-point registers,
    so register renaming was important. Instructions had to be
    ordered around register reuse, which out-of-order execution and
    register renaming could fix.

    Okay, so I am not saying that EPIC has to get all the ordering,
    but only close enough. So, current ones can keep hundreds of
    instructions in execution at the same time?

    Now, since transistors are cheap now, and one can throw a large
    number into reorder buffers and such, one can build really deep
    pipelines.

    It's interesting that Intel managed to produce their first OoO CPU
    (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
    first Itanium) at 25.4M transistors was too large for the 250nm and
    they had to switch to 180nm (contributing to the delays). So, while
    the theory was that the EPIC principle would reduce the hardware
    complexity (to allow adding more functional units for increased
    performance), in Itanium practice the hardware was more complex, and
    the performance advantages did not appear.

    I remember lots of stories about how the PPro didn't do well with the
    16-bit code still in much of DOS and Windows. Enough that it was
    slower than the Pentium. I am not sure by now how much OoO the PPro does.

    But the reason for bringing this up is that if Intel had a defined
    intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    Something like this happened roughly at the same time with LLVM.
    There were other initiatives, but LLVM was the one that succeeded.

    There was the Open Research Compiler for IA-64 from Intel and the
    Chinese Academy of Sciences.

    SGI released their compiler targeted at IA-64 as Open64.

    Even more, design for it could have been done in parallel with the
    processor, making both work well together.

    Intel, HP, and others worked on compilers in parallel to the hardware
    work. It's just that the result did not perform as well for
    general-purpose code as OoO processors with conventional compilers.

    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns.

    Multiflow and Cydrome built computers for numerical computations (aka
    HPC aka (mini-)supercomputing). These tend to spend a lot of time in
    simple loops with high iteration counts and have statically
    predictable branches. They found compiler techniques like trace
    scheduling (Multiflow) that work well for code with predictable
    branches, and modulo scheduling (Cydrome) that work well for simple
    loops with high iteration counts. The IA-64 and Transmeta architects
    wanted to extend these successes to general-purpose computing, but it
    did not work out.

    I sometimes work on computational physics problems. The ones with
    tight loops of floating point operations. I believe that some Itanium
    systems went into a large cluster somewhere. I got an RX2600
    for $100 some years ago, when there were many on eBay.
    (You could even buy three at a time.)

    But yes, I suspect that for web servers they aren't especially good.

    Concerning memory access patterns: While they are not particularly predictable in general-purpose code, most general-purpose code
    benefits quite a lot from caches (more than numerical code), so I
    don't think that this was a big problem for IA-64.

    Some people mention varying latencies due to caches as a problem, but
    the choice of a small L1 cache (16KB compared to 64KB for the earlier
    21264 and K7 CPUs) for McKinley indicates that average latency was
    more of a problem for IA-64 than varying latencies.

    And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
    programs where IA-64 outdid the mainstream.

    The Cray series of vector processors seems long gone by now.
    --- Synchronet 3.21b-Linux NewsLink 1.2