• Are there different programming languages that are compiled to the same intermediate language?

    From Roger L Costello@costello@mitre.org to comp.compilers on Sun Jan 29 15:49:14 2023
    From Newsgroup: comp.compilers

    Hi Folks,

    Programming_Language_1 has a compiler which generates Intermediate_Code_1.
    Many backends are created for Intermediate_Code_1.

    Time passes ....

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.
    2. Create a translator which converts instances of Programming_Language_2 into Programming_Language_1 instances.

    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?
    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    What is the answer to Sally's questions?

    /Roger
    [I think the answer either way is that you don't, and you try to have a test suite that is as comprehensive as possible. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.compilers on Sun Jan 29 22:14:51 2023
    From Newsgroup: comp.compilers

    Roger L Costello <costello@mitre.org> schrieb:
    Hi Folks,

    Programming_Language_1 has a compiler which generates Intermediate_Code_1. Many backends are created for Intermediate_Code_1.

    Time passes ....

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.

    Compilers have been doing that for a very long time. I remember
    reading that the IBM PL.8 compiler used a target-independent
    intermediate representation, and gcc and LLVM both adopt this.
    Makes sense - most of the optimizations are target-independent.
    Typically, SSA (static single assignment) is used these days
    because it offers many optimization opportunities.

    2. Create a translator which converts instances of Programming_Language_2 into
    Programming_Language_1 instances.

    That is also an approach, which works if the Programming_Language_2
    is sufficiently low-level, and yet sufficiently flexible, to allow
    a reasonable representation. In practice, this usually means C,
    where things like pointer arithmetic can be used to implement
    multi-dimensional arrays.

    Fortran is a prime example. Both the well-known f2c translator, based
    on the Bell Labs Fortran 77 compiler, and the NAG Fortran compiler use
    C as intermediate language.

    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?

    Usually, she can't.

    Compiler front ends are well-known to have bugs, and it would be
    very hard to prove their correctness, especially if the bugs are
    in semantics, which are usually not described in a formal way.

    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?

    See above.

    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    Same problem.

    [Target-independent intermediate forms work OK when the source
    languages are semantically similar, e.g. C and Fortran, and range of
    target architectures is limited, e.g., two's complement machines with
    8-bit byte flat addressing. Every few years someone comes up with the
    bright idea to do an all purpose intermediate language, which never
    works. See UNCOL around 1960 for the first time that idea failed.

    By the way, f2c is based on f77 which compiled Fortran into the intermediate language the PDP-11 C compiler used. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From arnold@arnold@freefriends.org (Aharon Robbins) to comp.compilers on Mon Jan 30 08:03:12 2023
    From Newsgroup: comp.compilers

    Our esteemed moderator writes:

    By the way, f2c is based on f77 which compiled Fortran into the intermediate language the PDP-11 C compiler used. -John]

    Just to be clear, it was the intermediate language of Johnson's PCC,
    which was first targeted at the Interdata, then the PDP-11 and Vax,
    not Dennis Ritchie's original PDP-11 compiler.

    Arnold
    [Good point. I wrote a competing F77 compiler INfort which did use the
    Ritchie intermediate language. Someone else later modified it to use
    pcc. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From William Fahle@billfahle@gmail.com to comp.compilers on Mon Jan 30 02:00:42 2023
    From Newsgroup: comp.compilers

    On Sunday, January 29, 2023 at 10:58:58 AM UTC-6, Roger L Costello wrote:
    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?
    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    What is the answer to Sally's questions?

    /Roger
    [I think the answer either way is that you don't, and you try to have a test suite that is as comprehensive as possible. -John]

    This is Post's Correspondence Problem, which has been proven to be
    equivalent to the Halting Problem, and is thus undecidable.
    [Remember that while the halting problem is insoluble in general, it's
    often soluble in specific cases, e.g. "halt" or "foo: goto foo".
    Nevertheless, I would be quite surprised if anyone could prove a
    non-toy translator correct. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Roger L Costello@costello@mitre.org to comp.compilers on Mon Jan 30 14:36:25 2023
    From Newsgroup: comp.compilers

    I wrote:

    2. Create a translator which converts instances of
    Programming_Language_2 into
    Programming_Language_1 instances.

    Thomas Koenig responded:

    That is also an approach, which works if the
    Programming_Language_2 is sufficiently low-level

    Is that a typo? Do you mean Programming_Language_1 is sufficiently low level? Recall that the approach is to translate instances of Programming_Language_2
    to instances of Programming_Language_1.

    I will assume you meant Programming_Language_1. So you are saying that the target language (Programming_Language_1) must be a lower level language than the source language (Programming_Language_2), right?

    John Levine wrote:

    Target-independent intermediate forms work OK
    when the source languages are semantically similar

    How does one determine whether two source languages are semantically similar?

    /Roger
    [Not to snark too much, but you know it when you see it. C and Fortran
    are similar, COBOL and Lisp are not. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.compilers on Mon Jan 30 17:36:30 2023
    From Newsgroup: comp.compilers

    On 2023-01-29, Roger L Costello <costello@mitre.org> wrote:
    Hi Folks,

    Programming_Language_1 has a compiler which generates Intermediate_Code_1. Many backends are created for Intermediate_Code_1.

    Time passes ....

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.
    2. Create a translator which converts instances of Programming_Language_2 into
    Programming_Language_1 instances.

    Sally is concerned. She asks herself:

    - With either approach, how do I prove that my mapping is correct?
    - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language?
    - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?

    What is the answer to Sally's questions?

    The answer is that proving a compiler correct is the same as proving
    any other software correct. What can make it easier than some
    software is that there is no real-time, nondeterministic behavior. A
    compiler should produce exactly the same result every time it is
    invoked with the same inputs.

    Also, compilers can be written using pure functional techniques, and
    this is actually "easy" to do, compared to some other kinds of
    tasks. Recursion is natural for compiling because the input has a
    nested, recursive structure, which the compiler follows.

    Functional programs are easier to prove correct than imperative or
    object-oriented programs, which work by mutating state in variables
    and objects.

    The compiler is expressed as a recursive function which handles all the
    cases that occur (syntax shapes being translated). You can then argue
    that it is correct by induction: inspect that each case is performing a
    correct transformation from source language to the intermediate language
    for the construct which it is handling, and that it's correctly invoking
    itself recursively for the sub-expressions that the input contains.
    Also included must be an argument showing that the recursion terminates.

    E.g., if you know that your compiler handles, say, an if/else statement
    correctly, and some kind of for loop also correctly, and it's all
    clean recursion, you can confidently argue that if statements nested
    in loops, and vice versa, or in each other, are also handled correctly.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    [I have bad news for you -- there are real compilers that produce
    different results on different runs, due to things like being loaded
    at different memory addresses so that hash tables of pointers end up
    in different orders. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Martin Ward@mwardgkc@gmail.com to comp.compilers on Tue Jan 31 14:04:39 2023
    From Newsgroup: comp.compilers

    On 30/01/2023 10:00, John Levine wrote:
    [Remember that while the halting problem is insoluble in general, it's
    often soluble in specific cases, e.g. "halt" or "foo: goto foo".

    More usefully, there are methods for transforming a formal
    specification into an executable implementation which
    preserve semantic equivalence, and therefore are guaranteed
    to produce a terminating program (for input states for which
    the specification is defined). For example:

    "Provably Correct Derivation of Algorithms Using FermaT"
    Martin Ward and Hussein Zedan
    Formal Aspects of Computing, September 2014.
    Volume 26, Issue 5, Pages 993--1031, September 2014, ISSN: 0934-5043
    doi:10.1007/s00165-013-0287-2
    http://www.gkc.org.uk/martin/papers/trans-prog-t.pdf

    Nevertheless, I would be quite surprised if anyone could prove a
    non-toy translator correct. -John]
    According to this paper: "The state of the art [as of 2014] is
    the CompCert compiler, which compiles almost all of the C language
    to PowerPC, ARM and x86 assembly and has been mechanically verified
    using the Coq proof assistant."

    Xavier Leroy. "Formally verifying a compiler: Why? How? How far?".
    CGO 2011 - 9th Annual IEEE/ACM International Symposium on Code
    Generation and Optimization, Apr 2011, Chamonix, France.
    doi:10.1109/CGO.2011.5764668, hal-01091800

    https://hal.inria.fr/hal-01091800

    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
    G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Tue Jan 31 22:59:17 2023
    From Newsgroup: comp.compilers

    On Sunday, January 29, 2023 at 8:58:58 AM UTC-8, Roger L Costello wrote:

    (snip)

    Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches:

    1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1.

    2. Create a translator which converts instances of Programming_Language_2 into
    Programming_Language_1 instances.

    The complication with this question is the definition of intermediate language.

    Some compilers have a front end, middle end, and back end, with some form of intermediate code between each. They might be more or less general.

    Depending on the compiler writer (or writers), there may be thought and time
    to make them more general, allowing for other languages.

    But okay, choice 2, especially if language 1 has a standard, can be
    useful. (I was almost going to shorten "Programming_Language_1" to
    PL-1, but then realized that some might read that wrong.)

    That allows it to be used with other compilers, even ones not written yet.
    On the other hand, it is restricted by the features of that language, which sometimes make things complicated.

    In the case of choice 1, you might be able to add new features to the intermediate language, especially if you can join with its project.

    Usually, though, there are some features that don't translate to
    language 1. Or translate in an extremely complicated way.

    Usually the translated code is unreadable to most people who
    actually know the target language.

    Some examples. TeX was written in Pascal. Well, not really.
    It was written in Web, a language and preprocessor that generates
    Pascal code to be compiled. At some point, it was desired to have
    it in C instead. A program was written to translate much of
    Pascal into C, but some things were not so easy. So instead,
    the features of the Web preprocessor were used, to convert to
    a Pascal-like language, easier to convert to C. This was made
    possible by the macro facilities of Web, and with its change
    file feature.

    Another one is f2c, which translates Fortran to C, but uses
    special library routines and otherwise not so readable output.

    Could we have a standard intermediate language, with all
    features needed, or that ever will be needed?
    [The answer to that last question is no. It's a very old bad idea,
    with the first failed attempt UNCOL in 1958. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From arnold@arnold@freefriends.org (Aharon Robbins) to comp.compilers on Wed Feb 1 08:07:24 2023
    From Newsgroup: comp.compilers

    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    I ask in all seriousness.

    Thanks,

    Arnold

    In article <23-01-092@comp.compilers>, Martin Ward <martin@gkc.org.uk> wrote:
    On 30/01/2023 10:00, John Levine wrote:
    [Remember that while the halting problem is insoluble in general, it's
    often soluble in specific cases, e.g. "halt" or "foo: goto foo".

    More usefully, there are methods for transforming a formal
    specification into an executable implementation which
    preserve semantic equivalence, and therefore are guaranteed
    to produce a terminating program (for input states for which
    the specification is defined). ...
    [It's a perfectly reasonable question. Alan Perlis, who was my thesis
    advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Martin Ward@mwardgkc@gmail.com to comp.compilers on Thu Feb 2 15:44:17 2023
    From Newsgroup: comp.compilers

    On 01/02/2023 08:07, Aharon Robbins wrote:
    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    A theorem prover generates a proof of the theorem (if it succeeds).
    Checking the correctness of a proof is a much simpler task
    than finding the proof in the first place and can be carried
    out independently by different teams using different methods.
    Appel and Haken's proof of the four colour theorem, for example,
    involved a significant element of computer checking which was
    independently double checked with different programs and computers.

    [It's a perfectly reasonable question. Alan Perlis, who was my thesis advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    Mathematicians publish proofs all the time and only a tiny percentage
    of published proofs turn out to have errors. Programmers release
    programs all the time and a vanishingly small percentage of these
    turn out to be free from all bugs. Alan Perlis may not have been able
    to think of a reason why this should be the case, but it is,
    nevertheless, the case.

    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
    G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
    [Computer programs tend to be a lot longer than mathematical proofs. I
    realize there are some 500 page proofs, but there are a whole lot of
    500 page programs. It is my impression that in proofs, as in programs,
    the longer and more complicated they are, the more likely they are to
    have bugs. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Thu Feb 2 16:24:11 2023
    From Newsgroup: comp.compilers

    (snip, I wrote)

    Could we have a standard intermediate language, with all
    features needed, or that ever will be needed?

    [The answer to that last question is no. It's a very old bad idea,
    with the first failed attempt UNCOL in 1958. -John]

    I have wondered, though, about a standardized intermediate
    for a processor family. One could write compilers to generate it,
    and then updated processors come along with updated code
    generators. Or even distribute intermediate code, and the
    installer generates the right version for the processor.

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.
    Since the whole idea is that the processor depends on the
    code generator doing things in the right order. That is, out
    of order execution, but determined at compile time. Failure
    to do that meant failure for the whole idea.

    [Someone comes up with an intermediate language that works for a few
    source languages and a few targets, and usually publishes a paper
    about his breakthrough. Then people try to add more front ends and
    back ends, the intermediate language gets impossibly complex and
    buggy, and the project is quietly forgotten. I'd think the back end
    problem is a lot easier now than it was in 1958 since everything is
    twos complement arithmetic in 8-bit bytes, but the front end is still
    daunting.
    If you don't push it too far it's certainly possible to do a lot of
    work in a largely machine-independent way as LLVM does. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From arnold@arnold@skeeve.com (Aharon Robbins) to comp.compilers on Fri Feb 3 08:26:45 2023
    From Newsgroup: comp.compilers

    Dr. Ward replied to me privately also, but since this went to the group,
    I'll say the same thing here that I replied to him.

    In article <23-02-005@comp.compilers>, Martin Ward <martin@gkc.org.uk> wrote:
    On 01/02/2023 08:07, Aharon Robbins wrote:
    I've never understood this. Isn't there a chicken and egg problem?
    How do we know that the theorem prover is correct and bug free?

    A theorem prover generates a proof of the theorem (if it succeeds).
    Checking the correctness of a proof is a much simpler task
    than finding the proof in the first place and can be carried
    out independently by different teams using different methods.
    Appel and Haken's proof of the four colour theorem, for example,
    involved a significant element of computer checking which was
    independently double checked with different programs and computers.

    This tells me what a theorem prover does, and why it's useful. It
    does not answer my question: How do we know that the theorem prover
    itself is correct and bug free?

    [It's a perfectly reasonable question. Alan Perlis, who was my thesis
    advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    And this is a different point from my question.

    Mathematicians publish proofs all the time and only a tiny percentage
    of published proofs turn out to have errors. Programmers release
    programs all the time and a vanishingly small percentage of these
    turn out to be free from all bugs. Alan Perlis may not have been able
    to think of a reason why this should be the case, but it is,
    nevertheless, the case.

    I don't argue this. But I think mathematical theorems and their
    proofs are different qualitatively from real world large programs.
    I do think there's room for mathematical techniques to help
    improve software development, but I don't see that done down
    in the trenches, and I've been down in the trenches for close
    to 40 years.

    Arnold
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Fri Feb 3 11:44:03 2023
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> writes:
    I have wondered, though, about a standardized intermediate
    for a processor family. One could write compilers to generate it,
    and then updated processors come along with updated code
    generators. Or even distribute intermediate code, and the
    installer generates the right version for the processor.

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.

    I dispute the latter claim. My take is that IA-64 failed because the
    original assumption that in-order performance would exceed OoO
    performance was wrong. OoO processors surpassed in-order CPUs; they
    managed to get higher clock rates (my guess is that this is due to
    them having smaller feedback loops) and they benefit from better
    branch prediction, which extends to 512-instruction reorder buffers on
    recent Intel CPUs, far beyond what compilers can achieve on IA-64.
    The death knell for IA-64 competitiveness was the introduction of SIMD
    instruction set extensions, which made OoO CPUs surpass IA-64 even in
    those vectorizable codes where IA-64 had been competitive.

    Since the whole idea is that the processor depends on the
    code generator doing things in the right order. That is, out
    of order execution, but determined at compile time. Failure
    to do that meant failure for the whole idea.

    But essentially all IA-64 CPUs sold were variations of the McKinley
    microarchitecture as far as performance characteristics were
    concerned, especially during the time when IA-64 was still perceived
    as relevant. The next microarchitecture, Poulson, was only released in
    2012, when IA-64 had already lost.

    [Someone comes up with an intermediate language that works for a few
    source languages and a few targets, and usually publishes a paper
    about his breakthrough. Then people try to add more front ends and
    back ends, the intermediate language gets impossibly complex and
    buggy, and the project is quietly forgotten. I'd think the back end
    problem is a lot easier now than it was in 1958 since everything is
    twos complement arithmetic in 8-bit bytes

    Yes. Computer architecture converges. 20 years ago we still had to
    worry about alignment and byte ordering, nowadays alignment has been
    settled in general-purpose CPUs (no alignment restrictions), and byte
    ordering is mostly settled to little-endian (the exception being s390/s390x).

    If you don't push it too far it's certainly possible to do a lot of
    work in a largely machine-independent way as LLVM does. -John]

    LLVM mostly supports what the hardware supports, but there is at least
    one exception: LLVM IR divides the code into functions that you can
    only call, with LLVM handling the calling convention. When someone
    needs to deviate from the calling convention, like the Haskell/C--
    people, LLVM provides some possibilities, but from what I have read,
    they had a hard time pulling it off.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [I'm with you on IA-64 and VLIW. I knew the Multiflow people pretty
    well and it is evident in retrospect that while it was a good fit
    with what you could build in the 1980s, hardware soon reached the
    point where it could do better what VLIW compilers had done. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Fri Feb 3 19:13:17 2023
    From Newsgroup: comp.compilers

    On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:

    (snip, I wrote)

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.

    I dispute the latter claim. My take is that IA-64 failed because the
    original assumption that in-order performance would exceed OoO
    performance was wrong. OoO processors surpassed in-order CPUs; they
    managed to get higher clock rates (my guess is that this is due to
    them having smaller feedback loops) and they benefit from better
    branch prediction, which extends to 512-instruction reorder buffers on
    recent Intel CPUs, far beyond what compilers can achieve on IA-64.
    The death knell for IA-64 competitiveness was the introduction of SIMD
    instruction set extensions, which made OoO CPUs surpass IA-64 even in
    those vectorizable codes where IA-64 had been competitive.

    I got interested in OoO in the days of the IBM 360/91, which is early
    in the line of OoO processors. Among others, the 91 has imprecise
    interrupts, where the stored address is not the instruction after the
    interrupt cause.

    But okay, the biggest failure of Itanium is that it was two years or
    so behind schedule when it came out. And part of it, as I remember,
    was the need to implement x86 instructions, too.

    Since the whole idea is that the processor depends on the
    code generator doing things in the right order. That is, out
    of order execution, but determined at compile time. Failure
    to do that meant failure for the whole idea.

    But essentially all IA-64 CPUs sold were variations of the McKinley
    microarchitecture as far as performance characteristics were
    concerned, especially during the time when IA-64 was still perceived
    as relevant. The next microarchitecture, Poulson, was only released in
    2012, when IA-64 had already lost.

    But is it the whole idea of compile-time instruction scheduling the
    cause of the failure, or just the way they did it?

    The 360/91 had some interesting problems. One is that it had 16-way
    interleaved memory with a cycle time of 13 processor cycles, and the
    goal of one instruction per clock cycle. That means it is dependent on
    memory address ordering, which is hard to know at compile time.

    The slightly later 360/85, without the fancy OoO processor, but with
    cache memory (the first machine with cache!) was about as fast
    on real problems.

    Otherwise, the 91 has a limited number of reservation stations,
    limiting how far OoO it can go. All OoO processors have a limit
    to how far they can go. But the compiler does not have that limit.

    Now that transistors are cheap, and one can throw a large
    number into reorder buffers and such, one can build really deep
    pipelines.

    But the reason for bringing this up is that if Intel had defined an
    intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    Even more, design for it could have been done in parallel with the
    processor, making both work well together.

    Reminds me of the 8087 virtual register stack. The 8087 has
    eight registers, but they were supposed to be a virtual stack.
    They would be copied to/from memory on stack overflow
    or underflow. But no-one tried to write the interrupt routine
    until after the hardware was made, and it turns out not to
    be possible. I never knew why that wasn't fixed for the 287
    or 387, though.
    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns. I doubt that has
    changed. And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Sat Feb 4 02:55:01 2023
    From Newsgroup: comp.compilers

    On Friday, February 3, 2023 at 7:51:41 PM UTC-8, gah4 wrote:

    (snip on Itanium and OoO execution)

    The 360/91 had some interesting problems. One is that it had 16-way
    interleaved memory with a cycle time of 13 processor cycles, and the
    goal of one instruction per clock cycle. That means it is dependent on
    memory address ordering, which is hard to know at compile time.

    (snip)

    But the reason for bringing this up, is that if Intel had a defined intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    (snip, our moderator wrote)
    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns. I doubt that has
    changed. And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    So, 50 years ago we had the 360/91, and some CDC competition.

    About 45 years ago, we had the Cray-1 and successor vector processors.

    So, yes, SIMD processors.

    Then 30 years ago, Fortran added the FORALL statement, to make writing
    vector code easier. Except that it doesn't.

    FORALL, and array assignment statements, always operate such that
    the effect is as if the whole right-hand side is evaluated before the left
    side is changed. Unless the whole thing fits in a vector register,
    the compiler is still stuck.

    And just about that time, vector processors went out of style.

    (There is now DO CONCURRENT, which might be better.)

    Much matrix code goes sequentially through memory, and programmers
    know that. (Or they are supposed to know that.)

    Early Fortran had the FREQUENCY statement, with which one could
    tell the compiler the likely iteration count for DO loops and the
    branch probability for branch instructions. Those would help with
    static branch predictors, but the statement is long gone now.

    It would seem, though, that a statement telling the compiler the
    expected memory access pattern would help.

    And the Cray-1, like the 360/91, had 16-way interleaved
    memory, 64 bits wide.
    [Legend says that in one of the early Fortran compilers, the FREQUENCY
    statement was accidentally implemented backward and nobody noticed. We
    have found over and over that programmers' intuitions about the way
    their programs perform are quite poor, and we're much better off saying
    what we mean and letting the compiler figure out how to do low-level
    optimizations. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Sun Feb 5 17:59:03 2023
    From Newsgroup: comp.compilers

    Martin Ward <mwardgkc@gmail.com> writes:
    On 01/02/2023 08:07, Aharon Robbins wrote:> I've never understood this.
    [It's a perfectly reasonable question. Alan Perlis, who was my thesis
    advisor, never saw any reason to believe that a thousand line proof
    was any more likely to be bug-free than a thousand line program.
    -John]

    Mathematicians publish proofs all the time and only a tiny percentage
    of published proofs turn out to have errors.

    Even at face value I would like to see some evidence for this claim.

    A comparable statement would be "Computer scientists publish papers
    about programs all the time, and only a tiny percentage of these
    papers turn out to have errors".

    There is a difference between how a mathematical proof is used and how
    a program is used.

    1) A program is usually fed to a machine for execution. A published
    proof is read by a number (>=0) of mathematicians, who fill in a lot
    of what the author has left out. If you feed the published proof to
    an automatic proof checker (of your choice), how many of the published
    proofs need no additions and no changes (bug fixes) before the proof
    checker verifies the proof? I guess there is a reason why Wikipedia
    has no page on "proof checker", but suggests "proof assistant": "a
    software tool to assist with the development of formal proofs by
    human-machine collaboration."

    2) A program has to satisfy the requirements of its users, while a
    published proof is limited to proving a published theorem. One
    example of the difference is that undefinedness is totally acceptable
    in mathematics, while it is a bug in programs (interestingly, a
    significant number of compiler writers take the mathematical view in
    what they provide to programmers, but consider a program in their
    language that exercises that undefined behaviour to be buggy).
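    Point 1) can be made concrete: a proof assistant accepts nothing
    until every step is machine-checkable. A minimal Lean 4 sketch
    (illustrative):

```lean
-- The checker demands an explicit justification for each claim;
-- "clearly" or "left to the reader" is rejected.
theorem add_comm_example (m n : Nat) : m + n = n + m :=
  Nat.add_comm m n   -- cite the library lemma; omit it and checking fails
```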

    The size argument that our esteemed moderator provided also has to be considered.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Tue Feb 7 08:35:00 2023
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> writes:
    On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:

    (snip, I wrote)

    This would have been especially useful for Itanium, which
    (mostly) failed due to problems with code generation.

    I dispute the latter claim. My take is that IA-64 failed because the
    original assumption that in-order performance would exceed OoO
    performance was wrong. OoO processors surpassed in-order CPUs; they
    managed to get higher clock rates (my guess is that this is due to
    them having smaller feedback loops) and they benefit from better
    branch prediction, which extends to 512-instruction reorder buffers on
    recent Intel CPUs, far beyond what compilers can achieve on IA-64.
    The death knell for IA-64 competitiveness was the introduction of SIMD
    instruction set extensions, which made OoO CPUs surpass IA-64 even in
    those vectorizable codes where IA-64 had been competitive.
    ...
    But okay, the biggest failure of Itanium is that it was two years or
    so behind schedule when it came out.

    If it had had superior performance when McKinley came out in 2002,
    that would have been just a minor speed bump on the way to success.

    And part of it, as I remember, was the need to implement x86
    instructions, too.

    Certainly, backwards compatibility is paramount in the markets that it
    was intended for. And the 386 compatibility also had to be close to
    competitive. But it wasn't.

    But is it the whole idea of compile-time instruction scheduling the
    cause of the failure, or just the way they did it?

    It seems to me that in-order execution combined with architectural
    features for reducing dependencies and with compiler scheduling
    turned out to be inferior to out-of-order execution. Moreover, in
    those areas where the compiler works well (numerical code), it also
    works well enough for using SIMD instructions (auto-vectorization),
    which could be added relatively cheaply to the hardware.

    All OoO processors have a limit
    to how far they can go. But the compiler does not have that limit.

    And the compiler can make more sophisticated scheduling decisions
    based on the critical path, while the hardware scheduler picks the
    oldest ready instruction. These were the ideas that seduced Intel,
    HP, and Transmeta to invest huge amounts of money into EPIC ideas.

    But the compiler has other limits. It cannot schedule across indirect
    calls (used in object-oriented dispatch) or across compilation-unit
    boundaries (in particular, calls to and returns from shared
    libraries). Another important limit is the predictability of
    branches. Static branch prediction using profiles has ~10%
    mispredictions, while (hardware) dynamic branch prediction has a much
    lower misprediction rate (I remember numbers like 3% (for the same
    benchmarks that have 10% mispredictions with static branch prediction)
    in papers from the last century; I expect that this has improved even
    more in the meantime). If the compiler mispredicts, it will schedule
    instructions from the wrong path, instructions that will be useless
    for execution.

    In the end a compiler typically can schedule across a few dozen
    instructions, while hardware can schedule across a few hundred.
    Compiler scheduling works well for simple loops, and that's where
    IA-64 shone, but only doing loops well is not good enough for
    general-purpose software.

    Now, since transistors are cheap now, and one can throw a large
    number into reorder buffers and such, one can build really deep
    pipelines.

    It's interesting that Intel managed to produce their first OoO CPU
    (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
    first Itanium) at 25.4M transistors was too large for the 250nm and
    they had to switch to 180nm (contributing to the delays). So, while
    the theory was that the EPIC principle would reduce the hardware
    complexity (to allow adding more functional units for increased
    performance), in Itanium practice the hardware was more complex, and
    the performance advantages did not appear.

    But the reason for bringing this up is that if Intel had a defined
    intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    Something like this happened roughly at the same time with LLVM.
    There were other initiatives, but LLVM was the one that succeeded.

    There was the Open Research Compiler for IA-64 from Intel and the
    Chinese Academy of Sciences.

    SGI released their compiler targeted at IA-64 as Open64.

    Even more, design for it could have been done in parallel with the
    processor, making both work well together.

    Intel, HP, and others worked on compilers in parallel to the hardware
    work. It's just that the result did not perform as well for
    general-purpose code as OoO processors with conventional compilers.

    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns.

    Multiflow and Cydrome built computers for numerical computations (aka
    HPC aka (mini-)supercomputing). These tend to spend a lot of time in
    simple loops with high iteration counts and have statically
    predictable branches. They found compiler techniques like trace
    scheduling (Multiflow) that work well for code with predictable
    branches, and modulo scheduling (Cydrome) that work well for simple
    loops with high iteration counts. The IA-64 and Transmeta architects
    wanted to extend these successes to general-purpose computing, but it
    did not work out.

    Concerning memory access patterns: While they are not particularly
    predictable in general-purpose code, most general-purpose code
    benefits quite a lot from caches (more than numerical code), so I
    don't think that this was a big problem for IA-64.

    Some people mention varying latencies due to caches as a problem, but
    the choice of a small L1 cache (16KB compared to 64KB for the earlier
    21264 and K7 CPUs) for McKinley indicates that average latency was
    more of a problem for IA-64 than varying latencies.

    And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
    programs where IA-64 outdid the mainstream.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Wed Feb 8 01:04:38 2023
    From Newsgroup: comp.compilers

    On Tuesday, February 7, 2023 at 6:24:44 PM UTC-8, Anton Ertl wrote:

    (snip)

    All OoO processors have a limit
    to how far they can go. But the compiler does not have that limit.

    And the compiler can make more sophisticated scheduling decisions
    based on the critical path, while the hardware scheduler picks the
    oldest ready instruction. These were the ideas that seduced Intel,
    HP, and Transmeta to invest huge amounts of money into EPIC ideas.

    But the compiler has other limits. It cannot schedule across indirect
    calls (used in object-oriented dispatch), across compilation unit
    boundaries (in particular, calls to and returns from shared
    libraries).

    OK, maybe we are getting closer.
    Many processors now have speculative execution.
    If you don't know what else to do, execute some instructions
    just in case, but don't change actual memory or registers yet.

    And Itanium doesn't do that. I don't see, though, why it wouldn't be
    possible to have EPIC and also speculative execution.


    Another important limit is the predictability of
    branches. Static branch prediction using profiles has ~10%
    mispredictions, while (hardware) dynamic branch prediction has a much
    lower misprediction rate (I remember numbers like 3% (for the same
    benchmarks that have 10% mispredictions with static branch prediction)
    in papers from the last century; I expect that this has improved even
    more in the meantime). If the compiler mispredicts, it will schedule
    instructions from the wrong path, instructions that will be useless
    for execution.

    OK, but we just discussed speculative execution. So you can execute
    two paths or maybe three.


    In the end a compiler typically can schedule across a few dozen
    instructions, while hardware can schedule across a few hundred.
    Compiler scheduling works well for simple loops, and that's where
    IA-64 shone, but only doing loops well is not good enough for
    general-purpose software.

    OK, loops were important for the 360/91, meant for floating-point
    number crunching. (Only floating point is OoO.)
    And small inner loops are usual in those programs. It also has a
    loop mode, in which it keeps the instructions in a buffer and
    doesn't have to keep fetching them.

    But also, the 360/91 was designed to run code written for any
    S/360 model. So, no special ordering, and even self-modifying
    code. Even more, S/360 only has four floating-point registers,
    so register renaming was important. Instructions had to be
    ordered around register reuse, which out-of-order execution and
    register renaming could fix.

    Okay, so I am not saying that EPIC has to get all the ordering,
    but only close enough. So, current ones can keep hundreds of
    instructions in execution at the same time?

    Now, since transistors are cheap now, and one can throw a large
    number into reorder buffers and such, one can build really deep
    pipelines.

    It's interesting that Intel managed to produce their first OoO CPU
    (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
    first Itanium) at 25.4M transistors was too large for the 250nm and
    they had to switch to 180nm (contributing to the delays). So, while
    the theory was that the EPIC principle would reduce the hardware
    complexity (to allow adding more functional units for increased
    performance), in Itanium practice the hardware was more complex, and
    the performance advantages did not appear.

    I remember lots of stories about how the PPro didn't do well with the
    16-bit code still in much of DOS and Windows. Enough that it was
    slower than the Pentium. I am not sure by now how much OoO the PPro does.

    But the reason for bringing this up is that if Intel had a defined
    intermediate code, and supplied the back end that used it,
    and even more, could update that back end later, that would have
    been very convenient for compiler writers.

    Something like this happened roughly at the same time with LLVM.
    There were other initiatives, but LLVM was the one that succeeded.

    There was the Open Research Compiler for IA-64 from Intel and the
    Chinese Academy of Sciences.

    SGI released their compiler targeted at IA-64 as Open64.

    Even more, design for it could have been done in parallel with the
    processor, making both work well together.

    Intel, HP, and others worked on compilers in parallel to the hardware
    work. It's just that the result did not perform as well for
    general-purpose code as OoO processors with conventional compilers.

    [Multiflow found that VLIW compile-time instruction scheduling was
    swell for code with predictable memory access patterns, much less so
    for code with data-dependent access patterns.

    Multiflow and Cydrome built computers for numerical computations (aka
    HPC aka (mini-)supercomputing). These tend to spend a lot of time in
    simple loops with high iteration counts and have statically
    predictable branches. They found compiler techniques like trace
    scheduling (Multiflow) that work well for code with predictable
    branches, and modulo scheduling (Cydrome) that work well for simple
    loops with high iteration counts. The IA-64 and Transmeta architects
    wanted to extend these successes to general-purpose computing, but it
    did not work out.

    I sometimes work on computational physics problems. The ones with
    tight loops of floating point operations. I believe that some Itanium
    systems went into a large cluster somewhere. I got an RX2600
    for $100 some years ago, when there were many on eBay.
    (You could even buy three at a time.)

    But yes, I suspect that for web servers they aren't especially good.

    Concerning memory access patterns: While they are not particularly predictable in general-purpose code, most general-purpose code
    benefits quite a lot from caches (more than numerical code), so I
    don't think that this was a big problem for IA-64.

    Some people mention varying latencies due to caches as a problem, but
    the choice of a small L1 cache (16KB compared to 64KB for the earlier
    21264 and K7 CPUs) for McKinley indicates that average latency was
    more of a problem for IA-64 than varying latencies.

    And if the memory access is that predictable, you can
    likely use SIMD instructions instead. -John]

    Yes, SIMD ate EPIC's lunch on the numerical program side, leaving few
    programs where IA-64 outdid the mainstream.

    The Cray series of vector processors seems long gone by now.
    --- Synchronet 3.21b-Linux NewsLink 1.2