• Compilation Quotient (CQ): A Metric for the Compilation Hardness of Programming Languages

    From John R Levine <johnl@taugh.com> to comp.compilers on Mon Jun 10 14:21:36 2024
    From Newsgroup: comp.compilers

    This preprint from TU Delft and ETH Zurich generates small programs from
    the grammars of several popular programs, and calculates CQ, which is
    roughly the percentage (0-100) that compile, intended as a proxy for how
    hard the languages are to write. C has a CQ of 48, Rust barely above
    zero.

    In the discussion at the end they say "A programmer's task is to write
    programs that compile", which I think summarizes the basic problem with
    the paper. Take a look.

    https://arxiv.org/abs/2406.04778

    Regards,
    John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
    Please consider the environment before reading this e-mail. https://jl.ly
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Jon Chesterfield <jonathanchesterfield@gmail.com> to comp.compilers on Mon Jun 10 19:20:08 2024

    Curious paper, thank you.


    The probability that a program generated by the grammar fails semantic
    analysis does seem an interesting value. Estimating it by sampling from
    a property-based tester seems reasonable too.
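    To make the estimation concrete, here is a minimal sketch of the
    sampling idea. The toy grammar (short assignment programs) and the
    single semantic rule (a right-hand-side name must already be assigned)
    are invented for illustration and are far simpler than the paper's
    actual setup:

    ```python
    import random

    # Toy "grammar": a program is 1-3 assignments (lhs, rhs), where rhs is
    # either a name or a literal.  The invented semantic rule is
    # declare-before-use: an rhs name must have been assigned earlier.
    NAMES = ["a", "b", "c", "d"]

    def sample_program(rng):
        """Sample a random program from the toy grammar."""
        return [(rng.choice(NAMES), rng.choice(NAMES + ["1", "2"]))
                for _ in range(rng.randint(1, 3))]

    def passes_sema(stmts):
        """Semantic analysis: every rhs name must already be defined."""
        defined = set()
        for lhs, rhs in stmts:
            if rhs.isalpha() and rhs not in defined:
                return False
            defined.add(lhs)
        return True

    def cq_estimate(n=10000, seed=0):
        """Percentage of sampled programs that pass semantic analysis."""
        rng = random.Random(seed)
        ok = sum(passes_sema(sample_program(rng)) for _ in range(n))
        return 100.0 * ok / n
    ```

    The estimate converges on the true pass probability of the grammar as
    the sample count grows, which is all the CQ number is.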

    I don't think this says anything meaningful about the experience of
    programming in one of these languages, as grammar and semantic errors
    are both reported early. It probably does indicate cases that a given
    language could detect earlier by changing its grammar.

    Jon
    [I had two other thoughts. One was that you can tell C was written when
    parsing was still hard enough that you didn't want to bulk the parsers
    up with semantic stuff. The other was that in the languages where it is
    hard to write a valid program, how much more likely is it that the
    program actually works once you get it to compile? -John]

  • From Derek <derek-nospam@shape-of-code.com> to comp.compilers on Mon Jun 10 20:30:37 2024

    John,

    > This preprint from TU Delft and ETH Zurich generates small programs from
    > the grammars of several popular programs, and calculates CQ, which is
    > roughly the percentage (0-100) that compile, intended as a proxy for how
    > hard the languages are to write. C has a CQ of 48, Rust barely above
    > zero.

    The paper "Programming Languages vs. Fat Fingers"
    https://www2.dmst.aueb.gr/dds/blog/20121205/index.html
    made small changes to existing code, in various languages,
    and then measured how many compiled, ran, and produced
    the correct output.
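    That staging of small random changes -- compile, run, check the output
    -- can be sketched roughly as below. The reference program, the
    one-character mutation operator, and the use of Python's own compiler
    as the checker are all invented for illustration, not the setup of
    that experiment:

    ```python
    import contextlib
    import io
    import random
    import string

    # Toy stand-ins: a small "existing program" and its known-good output.
    REF_SRC = "print(sum(i * i for i in range(10)))\n"
    EXPECTED = "285\n"

    def mutate(src, rng):
        """One small change: replace a single character at random."""
        i = rng.randrange(len(src))
        # string.printable[:94] is letters, digits and punctuation only.
        return src[:i] + rng.choice(string.printable[:94]) + src[i + 1:]

    def classify(src):
        """Stage a mutant: does it compile, run, and print the right answer?"""
        try:
            code = compile(src, "<mutant>", "exec")
        except SyntaxError:
            return "compile-error"
        out = io.StringIO()
        try:
            with contextlib.redirect_stdout(out):
                exec(code, {})
        except Exception:
            return "runtime-error"
        return "correct" if out.getvalue() == EXPECTED else "wrong-output"

    def tally(n=500, seed=0):
        """Count mutants per stage, as in the fat-fingers experiment."""
        rng = random.Random(seed)
        counts = {}
        for _ in range(n):
            kind = classify(mutate(REF_SRC, rng))
            counts[kind] = counts.get(kind, 0) + 1
        return counts
    ```

    The interesting ratios are between the buckets: how many typos the
    compiler catches, versus how many slip through to a wrong answer.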
  • From Derek <derek-nospam@shape-of-code.com> to comp.compilers on Tue Jun 11 00:28:18 2024

    John,

    > [I had two other thoughts. One was that you can tell C was written when
    > parsing was still hard enough that you didn't want to bulk the parsers
    > up with semantic stuff. The other was that in the languages where it is
    > hard to write a valid program, how much more likely is it that the
    > program actually works once you get it to compile? -John]

    C was created after Algol 68, whose two-level grammar contained
    syntax and semantics. Algol 68 programs automatically generated from
    the language grammar should compile just fine. I suspect that output
    would be rare, because generating the code needed to produce output
    would be uncommon, the path to it being the end result of a
    drunkard's walk.

    C had a kind-of conventional grammar, whereas the Algol 68 grammar is
    certainly not conventional (it might even be unique).
    [I never heard of any other language using VW-grammars. In C's
    defense, the early compilers -John]
  • From anton <anton@mips.complang.tuwien.ac.at> to comp.compilers on Tue Jun 11 07:57:46 2024

    John Levine:
    > [I had two other thoughts. One was that you can tell C was written when
    > parsing was still hard enough that you didn't want to bulk the parsers
    > up with semantic stuff.

    To me it looks the other way 'round: syntax specification formalisms
    such as BNF inspired programming language designers to put a lot of
    stuff in syntax, because that was formal. E.g., Algol 60
    differentiates between booleans and other values on the syntax level.
    Algol 68 introduced Van Wijngaarden grammars to specify the type
    system and the syntax in one syntactic formalism.

    Other, later languages have reduced the scope of syntax (often only
    slightly), and specify the type system as a separate entity.
    Interestingly, I am not aware of a widely successful formalism for
    type systems, even though many programming languages specify static
    type systems and their implementations have to perform static type
    checking (plus there is also dynamic type checking).

    > The other was that in the languages where it is
    > hard to write a valid program, how much more likely is it that the
    > program actually works once you get it to compile? -John]

    That is the promise of programming languages that make it hard to get
    a program to compile: get it to compile, and it is usually correct. I
    am not aware of any empirical evidence that supports this promise.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
  • From Martin Ward <mwardgkc@gmail.com> to comp.compilers on Tue Jun 11 15:06:07 2024

    On 10/06/2024 13:21, John R Levine wrote:

    > C has a CQ of 48, Rust barely above zero.

    > In the discussion at the end they say "A programmer's task is to
    > write programs that compile", which I think summarizes the basic
    > problem with the paper. Take a look.


    CQ is, very approximately, a measure of how likely it is that
    a compiler will detect a typo in your code (using "typo" in
    the broadest sense of: you are thinking of one program but
    actually type in something vaguely similar but different).

    "Almost any random garbage is a valid program in our language"
    does not appear to me to be a particularly attractive feature
    of a language.


    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
    G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
  • From Derek <derek-nospam@shape-of-code.com> to comp.compilers on Tue Jun 11 22:45:30 2024

    John, Anton,

    >> The other was that in the languages where it is
    >> hard to write a valid program, how much more likely is it that the
    >> program actually works once you get it to compile? -John]

    > That is the promise of programming languages that make it hard to get
    > a program to compile: get it to compile, and it is usually correct. I
    > am not aware of any empirical evidence that supports this promise.

    Requiring that variables are defined before use
    decreases incorrectness (which is not a marketable term).

    There is a tiny amount of evidence that strong typing may be a benefit:
    https://shape-of-code.com/2014/08/27/evidence-for-the-benefits-of-strong-typing-where-is-it/

    Cost effectiveness of benefits is a question that
    researchers avoid (it smacks of grubby usefulness).

    If you are interested in evidence, check out my book,
    Evidence-based Software Engineering, which discusses what is
    currently known about software engineering, based on an analysis
    of all the publicly available data. The pdf, code, and all data
    are freely available here:
    http://knosof.co.uk/ESEUR/

    If you know of any interesting software engineering
    data that I don't have, please tell me about it.
  • From Hans-Peter Diettrich <DrDiettrich1@netscape.net> to comp.compilers on Wed Jun 12 11:27:21 2024

    On 6/10/24 2:21 PM, John R Levine wrote:
    > generates small programs from
    > the grammars of several popular programs,

    I think that the *syntactic grammar* of program *languages* is meant:

    > The key idea is to measure the compilation success rates of programs
    > sampled from context-free grammars.

    Then I wonder how valid random programs can ever be generated for
    languages that require a declaration before use of an identifier,
    which is clearly a *semantic* issue. A CQ of 48 for C indicates to me
    that certain semantic rules have been built into the program generator.

    Or what have I misunderstood?
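    A generator with such a built-in semantic rule typically threads a
    symbol table through generation, so that every reference is to an
    already-declared name by construction. A hypothetical sketch (emitting
    C-like declarations; not the paper's actual generator):

    ```python
    import random

    def gen_program(rng, n_stmts=5):
        """Generate C-like declarations.  The declare-before-use rule is
        enforced by the generator itself: the right-hand side is only ever
        drawn from names declared on earlier lines, or a literal."""
        declared, lines = [], []
        for i in range(n_stmts):
            name = f"v{i}"
            if declared and rng.random() < 0.7:
                rhs = rng.choice(declared)   # reference an earlier declaration
            else:
                rhs = str(rng.randint(0, 9)) # or fall back to a literal
            lines.append(f"int {name} = {rhs};")
            declared.append(name)
        return "\n".join(lines)
    ```

    Every program this produces passes a declare-before-use check, so the
    semantic rule never shows up as a compilation failure -- which is what
    building semantics into the generator means.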

    DoDi
    [The paper describes the grammars they use. The C grammar requires that
    declarations precede other statements, so that's easy to get right. -John]