Hi Folks,
Programming_Language_1 has a compiler which generates Intermediate_Code_1. Many backends are created for Intermediate_Code_1.
Time passes ....
Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:
1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.
2. Create a translator which converts instances of Programming_Language_2 into Programming_Language_1 instances.
Sally is concerned. She asks herself:
- With either approach, how do I prove that my mapping is correct?
- For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
- For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?
What is the answer to Sally's questions?
/Roger
[I think the answer either way is that you don't, and you try to have a
test suite that is as comprehensive as possible. -John]
[By the way, f2c is based on f77, which compiled Fortran into the
intermediate language the PDP-11 C compiler used. -John]
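A concrete way to make such a test suite bite is differential testing: run the original program and its translation on the same inputs and compare results. A minimal sketch, with stand-in Python functions playing the roles of the two program instances (all names here are invented for illustration):

```python
import random

# Differential-testing sketch: we cannot prove the translation correct,
# but we can compare the source program and its translation on many inputs.
def source_program(x):           # stand-in for a Programming_Language_2 program
    return x * x + 1

def translated_program(x):       # stand-in for its Programming_Language_1 translation
    return x * x + 1             # (a correct translation, for this sketch)

def differential_test(p, q, trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-10**6, 10**6)
        if p(x) != q(x):
            return x             # counterexample found
    return None                  # no divergence observed (not a proof!)

assert differential_test(source_program, translated_program) is None
```

This never establishes equivalence; it only fails to find a divergence, which is exactly the "you don't prove it" answer above.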
> 2. Create a translator which converts instances of
> Programming_Language_2 into
> Programming_Language_1 instances.

That is also an approach, which works if Programming_Language_2 is
sufficiently low-level. Target-independent intermediate forms work OK
when the source languages are semantically similar.
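To make approach 2 tangible, here is a deliberately trivial instance of it: a toy translator from an invented prefix-tuple language (standing in for Programming_Language_2) to C-like infix text (standing in for Programming_Language_1):

```python
# Toy "approach 2" translator: a tiny prefix expression language
# (nested tuples) translated into C-like infix source text.
OPS = {"add": "+", "sub": "-", "mul": "*"}

def translate(expr):
    if isinstance(expr, int):
        return str(expr)
    op, lhs, rhs = expr
    return "(%s %s %s)" % (translate(lhs), OPS[op], translate(rhs))

print(translate(("add", 1, ("mul", 2, 3))))  # (1 + (2 * 3))
```

Even at this scale, Sally's question stands: nothing in the code above demonstrates that the emitted text means the same thing as the input tuple.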
> Hi Folks,
> Programming_Language_1 has a compiler which generates Intermediate_Code_1. Many backends are created for Intermediate_Code_1.
> Time passes ....
> Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:
> 1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.
> 2. Create a translator which converts instances of Programming_Language_2 into Programming_Language_1 instances.
> Sally is concerned. She asks herself:
> - With either approach, how do I prove that my mapping is correct?
> - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
> - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?
[Remember that while the halting problem is insoluble in general, it's
often soluble in specific cases, e.g. "halt" or "foo: goto foo".
Nevertheless, I would be quite surprised if anyone could prove a
non-toy translator correct. -John]
> Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:
> 1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.
> 2. Create a translator which converts instances of Programming_Language_2 into Programming_Language_1 instances.
> [Remember that while the halting problem is insoluble in general, it's
> often soluble in specific cases, e.g. "halt" or "foo: goto foo". ...]
More usefully, there are methods for transforming a formal
specification into an executable implementation which
preserve semantic equivalence, and therefore are guaranteed
to produce a terminating program (for input states for which
the specification is defined). ...
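A toy example of the kind of transformation meant here, with a recursive definition playing the role of the specification and a loop the derived implementation (checked, not proved, over a range of inputs; both functions are invented for illustration):

```python
# Specification: a direct recursive definition (executable but inefficient).
def sum_spec(n):
    return 0 if n == 0 else n + sum_spec(n - 1)

# Derived implementation: the same function transformed into a loop.
# A semantics-preserving transformation guarantees the two agree
# wherever the specification is defined (here: n >= 0).
def sum_impl(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

assert all(sum_spec(n) == sum_impl(n) for n in range(200))
```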
Could we have a standard intermediate language, with all
features needed, or that ever will be needed?
[The answer to that last question is no. It's a very old bad idea,
with the first failed attempt being UNCOL in 1958. -John]
I've never understood this. Isn't there a chicken and egg problem?
How do we know that the theorem prover is correct and bug free?
A theorem prover generates a proof of the theorem (if it succeeds).
Checking the correctness of a proof is a much simpler task
than finding the proof in the first place and can be carried
out independently by different teams using different methods.
Appel and Haken's proof of the four colour theorem, for example,
involved a significant element of computer checking which was
independently double checked with different programs and computers.
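The asymmetry is easy to illustrate with colouring itself: finding a colouring is the hard part, while checking a proposed one takes a few lines that independent teams can each rewrite from scratch. A sketch (graph and colourings invented for illustration):

```python
# Checking a proposed graph colouring is far easier than finding one:
# just verify every edge joins differently coloured vertices.
def is_valid_coloring(edges, coloring, num_colors=4):
    return (all(c in range(num_colors) for c in coloring.values())
            and all(coloring[u] != coloring[v] for u, v in edges))

# A square with one diagonal (3-colourable, hence 4-colourable):
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
assert is_valid_coloring(edges, {0: 0, 1: 1, 2: 2, 3: 1})
assert not is_valid_coloring(edges, {0: 0, 1: 0, 2: 2, 3: 1})  # edge (0,1) clashes
```

The checker is so short that bugs in it are far less likely than bugs in the search procedure, and several independently written checkers can cross-validate each other.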
[It's a perfectly reasonable question. Alan Perlis, who was my thesis
advisor, never saw any reason to believe that a thousand line proof
was any more likely to be bug-free than a thousand line program.
-John]
Mathematicians publish proofs all the time and only a tiny percentage
of published proofs turn out to have errors. Programmers release
programs all the time and a vanishingly small percentage of these
turn out to be free from all bugs. Alan Perlis may not have been able
to think of a reason why this should be the case, but it is,
nevertheless, the case.
I have wondered, though, about a standardized intermediate language
for a processor family. One could write compilers to generate it,
and then updated processors come along with updated code
generators. Or even distribute intermediate code, and have the
installer generate the right version for the processor.
This would have been especially useful for Itanium, which
(mostly) failed due to problems with code generation.
The whole idea is that the processor depends on the code generator
doing things in the right order: in effect, out-of-order execution,
but determined at compile time. Failure to do that meant failure for
the whole idea.
[Someone comes up with an intermediate language that works for a few
source languages and a few targets, and usually publishes a paper
about his breakthrough. Then people try to add more front ends and
back ends, the intermediate language gets impossibly complex and
buggy, and the project is quietly forgotten. I'd think the back end
problem is a lot easier now than it was in 1958 since everything is
twos complement arithmetic in 8-bit bytes.
If you don't push it too far it's certainly possible to do a lot of
work in a largely machine-independent way as LLVM does. -John]
> This would have been especially useful for Itanium, which
> (mostly) failed due to problems with code generation.
I dispute the latter claim. My take is that IA-64 failed because the
original assumption that in-order performance would exceed OoO
performance was wrong. OoO processors surpassed in-order CPUs; they
managed to get higher clock rates (my guess is that this is due to
them having smaller feedback loops) and they benefit from better
branch prediction, which extends to 512-instruction reorder buffers on
recent Intel CPUs, far beyond what compilers can achieve on IA-64.
The death knell for IA-64 competitiveness was the introduction of SIMD
instruction set extensions, which made OoO CPUs surpass IA-64 even in
those vectorizable codes where IA-64 had been competitive.
> Since the whole idea is that the processor depends on the
> code generator doing things in the right order. That is, out
> of order execution, but determined at compile time. Failure
> to do that meant failure for the whole idea.
But essentially all IA-64 CPUs sold were variations of the McKinley
microarchitecture as far as performance characteristics were
concerned, especially during the time when IA-64 was still perceived
as relevant. The next microarchitecture, Poulson, was only released in
2012, when IA-64 had already lost.
The 360/91 had some interesting problems. One is that it had 16-way
interleaved memory with a cycle time of 13 processor cycles, and the
goal of one instruction per clock cycle. That means it is dependent on
memory address ordering, which is hard to know at compile time.
But the reason for bringing this up is that if Intel had a defined
intermediate code, and supplied the back end that used it,
and even more, could update that back end later, that would have
been very convenient for compiler writers.
[Multiflow found that VLIW compile-time instruction scheduling was
swell for code with predictable memory access patterns, much less so
for code with data-dependent access patterns. I doubt that has
changed. And if the memory access is that predictable, you can
likely use SIMD instructions instead. -John]
On 01/02/2023 08:07, Aharon Robbins wrote:
> I've never understood this.
> [It's a perfectly reasonable question. Alan Perlis, who was my thesis
> advisor, never saw any reason to believe that a thousand line proof
> was any more likely to be bug-free than a thousand line program.
> -John]
Mathematicians publish proofs all the time and only a tiny percentage
of published proofs turn out to have errors.
On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:
(snip, I wrote)
>> This would have been especially useful for Itanium, which
>> (mostly) failed due to problems with code generation.
> I dispute the latter claim. My take is that IA-64 failed because the
> original assumption that in-order performance would exceed OoO
> performance was wrong. OoO processors surpassed in-order CPUs; they
> managed to get higher clock rates (my guess is that this is due to
> them having smaller feedback loops) and they benefit from better
> branch prediction, which extends to 512-instruction reorder buffers on
> recent Intel CPUs, far beyond what compilers can achieve on IA-64.
> The death knell for IA-64 competitiveness was the introduction of SIMD
> instruction set extensions which made OoO CPUs surpass IA-64 even in
> those vectorizable codes where IA-64 had been competitive.
But okay, the biggest failure of Itanium is that it was two years or
so behind schedule when it came out.
And part of it, as well as I remember, was the need to implement x86
instructions, too.
But is it the whole idea of compile-time instruction scheduling the
cause of the failure, or just the way they did it?
All OoO processors have a limit
to how far they can go. But the compiler does not have that limit.
Now, since transistors are cheap now, and one can throw a large
number into reorder buffers and such, one can build really deep
pipelines.
> But the reason for bringing this up is that if Intel had a defined
> intermediate code, and supplied the back end that used it,
> and even more, could update that back end later, that would have
> been very convenient for compiler writers.
Even more, design for it could have been done in parallel with the
processor, making both work well together.
[Multiflow found that VLIW compile-time instruction scheduling was
swell for code with predictable memory access patterns, much less so
for code with data-dependent access patterns.
And if the memory access is that predictable, you can
likely use SIMD instructions instead. -John]
> All OoO processors have a limit
> to how far they can go. But the compiler does not have that limit.
And the compiler can make more sophisticated scheduling decisions
based on the critical path, while the hardware scheduler picks the
oldest ready instruction. These were the ideas that seduced Intel,
HP, and Transmeta to invest huge amounts of money into EPIC ideas.
But the compiler has other limits. It cannot schedule across indirect
calls (used in object-oriented dispatch), nor across compilation unit
boundaries (in particular, calls to and returns from shared
libraries).
Another important limit is the predictability of
branches. Static branch prediction using profiles has ~10%
mispredictions, while (hardware) dynamic branch prediction has a much
lower misprediction rate: I remember numbers like 3% (for the same
benchmarks that have 10% mispredictions with static branch prediction)
in papers from the last century, and I expect that this has improved
even more in the meantime. If the compiler mispredicts, it will
schedule instructions from the wrong path, instructions that will be
useless for execution.
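The static/dynamic gap can be reproduced in miniature. A sketch (not modelling any real CPU) comparing profile-based static prediction against a textbook two-bit saturating counter, on a branch whose bias flips partway through the run:

```python
# Static profile-based prediction vs. a 2-bit saturating counter, on a
# branch that is taken for a while and then flips to not-taken.
outcomes = [True] * 600 + [False] * 400

# Static: the profile says the branch is mostly taken, so always predict taken.
static_mispredicts = sum(o is not True for o in outcomes)

# Dynamic: 2-bit counter in 0..3; predict taken when counter >= 2.
counter, dynamic_mispredicts = 2, 0
for o in outcomes:
    predicted_taken = counter >= 2
    if predicted_taken != o:
        dynamic_mispredicts += 1
    counter = min(counter + 1, 3) if o else max(counter - 1, 0)

print(static_mispredicts, dynamic_mispredicts)  # 400 2
assert dynamic_mispredicts < static_mispredicts
```

The static predictor pays for every post-flip execution, while the counter re-learns the new bias after two mispredictions; real workloads are noisier, but the direction of the gap is the same.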
In the end a compiler typically can schedule across a few dozen
instructions, while hardware can schedule across a few hundred.
Compiler scheduling works well for simple loops, and that's where
IA-64 shone, but only doing loops well is not good enough for
general-purpose software.
> Now, since transistors are cheap now, and one can throw a large
> number into reorder buffers and such, one can build really deep
> pipelines.
It's interesting that Intel managed to produce their first OoO CPU
(the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
first Itanium) at 25.4M transistors was too large for the 250nm
process and they had to switch to 180nm (contributing to the delays).
So, while
the theory was that the EPIC principle would reduce the hardware
complexity (to allow adding more functional units for increased
performance), in Itanium practice the hardware was more complex, and
the performance advantages did not appear.
> But the reason for bringing this up is that if Intel had a defined
> intermediate code, and supplied the back end that used it,
> and even more, could update that back end later, that would have
> been very convenient for compiler writers.
Something like this happened roughly at the same time with LLVM.
There were other initiatives, but LLVM was the one that succeeded.
There was the Open Research Compiler for IA-64 from Intel and the
Chinese Academy of Sciences.
SGI released their compiler targeted at IA-64 as Open64.
> Even more, design for it could have been done in parallel with the
> processor, making both work well together.
Intel, HP, and others worked on compilers in parallel to the hardware
work. It's just that the result did not perform as well for
general-purpose code as OoO processors with conventional compilers.
> [Multiflow found that VLIW compile-time instruction scheduling was
> swell for code with predictable memory access patterns, much less so
> for code with data-dependent access patterns.
Multiflow and Cydrome built computers for numerical computations (aka
HPC aka (mini-)supercomputing). These tend to spend a lot of time in
simple loops with high iteration counts and have statically
predictable branches. They found compiler techniques like trace
scheduling (Multiflow) that work well for code with predictable
branches, and modulo scheduling (Cydrome) that work well for simple
loops with high iteration counts. The IA-64 and Transmeta architects
wanted to extend these successes to general-purpose computing, but it
did not work out.
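For readers unfamiliar with modulo scheduling, its core arithmetic is small: the initiation interval of the pipelined loop is bounded below both by resource pressure and by loop-carried dependence recurrences. A sketch with invented numbers for a toy loop:

```python
from math import ceil

# Modulo scheduling lower bound: MII = max(resource MII, recurrence MII).
def resource_mii(op_counts, unit_counts):
    # e.g. 4 ALU ops on 2 ALUs need at least ceil(4/2) = 2 cycles/iteration
    return max(ceil(op_counts[k] / unit_counts[k]) for k in op_counts)

def recurrence_mii(cycles):
    # each dependence cycle: (summed latency, summed iteration distance)
    return max(ceil(latency / distance) for latency, distance in cycles)

ops = {"alu": 4, "mem": 2}      # operations per loop iteration
units = {"alu": 2, "mem": 1}    # functional units available
cycles = [(3, 1)]               # a loop-carried chain: latency 3, distance 1
mii = max(resource_mii(ops, units), recurrence_mii(cycles))
print(mii)  # 3
```

Here the recurrence, not the resources, limits the loop to one iteration every 3 cycles, which is why the technique shines on simple loops with no tight loop-carried dependences and helps little elsewhere.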
Concerning memory access patterns: while they are not particularly
predictable in general-purpose code, most general-purpose code
benefits quite a lot from caches (more than numerical code), so I
don't think that this was a big problem for IA-64.
Some people mention varying latencies due to caches as a problem, but
the choice of a small L1 cache (16KB compared to 64KB for the earlier
21264 and K7 CPUs) for McKinley indicates that average latency was
more of a problem for IA-64 than varying latencies.
> And if the memory access is that predictable, you can
> likely use SIMD instructions instead. -John]
Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
programs where IA-64 outdid the mainstream.