Forum: Too Lazy BBS

Re: Constants and undefined behavior

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Fri Jun 5 10:49:28 2026

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-05 01:49, Dan Cross wrote:

[...]

[...]

[ ... (INT_MAX+1)*0 ]

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

I'm curious about that "violation"; a violation would require
(at least) two sorts of logical preconditions. - The first is
that all *sequentially* (literally) evaluated sub-expression
values are representable as value - INT_MAX+1 certainly can't
be represented in generated code that conforms to the abstract
*mathematical* value - but is that necessary if _the whole_
expression is (mathematically) just 0 (because of the final
factor). And the second (related) is whether the order of the
sub-expression evaluation is relevant; if we'd assume the
expression evaluation to be considered from right to left then
it would be irrelevant what's inside the parenthesis.

If the expression were evaluated right to left, it would still
compute INT_MAX+1, which is UB.

Let's look at an example where it's not in a context that requires a
constant expression:

int n;
n = (INT_MAX+1)*0;

In the abstract machine, the RHS is evaluated by adding INT_MAX
and 1 (which overflows, UB) and then multiplying the result by 0.

A compiler is allowed, but not required, to reduce the assignment to
`n = 0;`. If it does so, then no overflow occurs at run time --
but the definedness of the behavior is determined independent of
any optimizations. The C standard does not require any particular
behavior. It can set n to 0 because that's a valid consequence
of UB.
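
As an illustration (not part of the original posts): code that needs the sum without relying on undefined behavior can test the operands before adding. This is only a sketch; `checked_add` is a hypothetical helper name, not something the standard provides.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the
   mathematical sum fits in an int; returns false (leaving *out alone)
   when the addition would overflow. The range test itself never
   overflows, so the behavior is defined for all inputs. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

With this helper, the `INT_MAX + 1` step is reported as a failure instead of being evaluated, so the question of what multiplying the result by 0 would yield never arises.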

Let's take an example where it's definitely in a context that
requires an integer constant expression:

switch (0) {
    case (INT_MAX+1)*0:
        break;
}

The wording in 6.6 (Constant expressions) is slightly vague.
For example, I would assume that any subexpression of a constant
expression must be a constant expression, but it doesn't actually
say so.

But since, in the abstract machine, (INT_MAX+1)*0 doesn't yield
any defined value, I'd say it violates the constraint that "Each
constant expression shall evaluate to a constant that is in the
range of representable values for its type".

The alternative would be for it to be a constant expression for
implementations that are able to recognize that anything multiplied
by zero is zero (analysis that compilers aren't required to perform),
and not for others.

On the other hand, "An implementation may accept other forms of
constant expressions; however, it is implementation-defined whether
they are an integer constant expression." That probably allows,
but does not require, an implementation to treat (INT_MAX+1)*0 as
a constant expression with the value 0.
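
For contrast, here is a sketch (an editorial illustration, with the hypothetical function name `classify`) of case labels whose subexpressions all stay in range. Every intermediate value is representable as int, so the 6.6 range constraint is satisfied without relying on any implementation-defined extension.

```c
#include <limits.h>

/* Each case label is an integer constant expression in which every
   subexpression evaluates to a value representable as int. */
int classify(int x)
{
    switch (x) {
    case (INT_MAX - 1) + 1:   /* evaluates to INT_MAX, no overflow */
        return 2;
    case 0 * (INT_MAX - 1):   /* evaluates to 0 */
        return 1;
    default:
        return 0;
    }
}
```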

From the standard quotes I cannot really recognize that these
preconditions, how to determine UB/errors/violations, would be
necessary.

I'm no native speaker and I fear my question as formulated was
hard to understand. It's basically the question of the standard
implying (INT_MAX+1)*0 to be analyzed sequentially as written
or whether it could as well analyze it from right to left and
thus recognizing no problem, since from the mathematical view -
but also practically - a concrete representable value of a here
irrelevant sub-expression isn't necessary. Or another try of a
(paraphrased) formulation; for the determination of constraint
violations does the expression have strict (sort of) sequencing
points _after each term_ (and each left-to-right sub-expression
has to be well-defined) or can it be valued/analyzed as a whole
not putting any preconditions about evaluation order etc. when
determining the overall value?

PS: One yet non-considered question that was part of my original
post was: "Is there any rationale from the _software designer_'s perspective?"

From a programmer's perspective, it's good to have consistent
rules rather than leaving the decision of whether an expression
is a constant expression up to the undocumented vagaries of how
clever a compiler happens to be.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.22a-Linux NewsLink 1.2

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Jun 6 03:10:20 2026

From Newsgroup: comp.lang.c

In article <10vt7b9$pi3s$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsnl7$lkmu$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
int zero = (INT_MAX+1)*0;
return zero;
}

int
main(){
return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

N3220 5.1.2.4, Program semantics.

It defines the *observable behavior* of a program, which consists of
accesses to volatile objects, data written to files, and I/O dynamics of
interactive devices.

Yes, but it does so for strictly-conforming programs with no UB.

It does so for programs in general, not just strictly conforming
ones. If a program has undefined behavior, all bets are off,
but for example a program that evaluates `printf("%d\n", INT_MAX)`
is not strictly conforming, but it's fully subject to 5.1.2.4.

To understand conformance, we have to jump over to section 4,
which explicitly says that, 'Undefined behavior is otherwise
indicated in this document by the words "undefined behavior" or
by the omission of any explicit definition of behavior.' As it
does not say that a program with an instance of undefined
behavior in an integer constant expression that is not executed
must otherwise behave in any given manner, what the program does
is undefined. A constraint violation mandates a diagnostic, but
beyond that, the standard is (AFAICT) silent.

I don't think an integer constant expression can have undefined
behavior. INT_MAX+1 and 1/0 are not constant expressions, because
neither "evaluate(s) to a constant that is in the range of
representable values for its type".

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

That's a bold claim, but I think I see why you're saying that.

The program in question, quoted above, has:

int zero = (INT_MAX+1)*0;

`(INT_MAX+1)*0` is not a constant expression, not because of the
overflow, but because a constant expression is not required in
that context. "constant-expression" is defined by a production in
the grammar (it reduces to "conditional-expression"). Even in

int n = 42;

42 is not a constant expression, because the grammar doesn't
call for a constant expression in that context -- even though it
looks like one. Similarly, in `a + b * c`, `a + b` looks like an
additive expression, but it isn't one. (Not a perfect analogy.)

Right; I see what you mean. In this case, the
`assignment-expression` production applies, not
`constant-expression`.
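
To make the grammatical distinction concrete, here is an editorial sketch (not from the original posts) that puts a few contexts whose grammar calls for a constant-expression next to a block-scope initializer, which does not. The names `LIMIT`, `flags`, `buffer`, and `answer` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Contexts whose grammar calls for a constant-expression: */
enum { LIMIT = INT_MAX - 1 };                     /* enumerator value        */
static_assert(LIMIT + 1 == INT_MAX, "in range");  /* static assertion        */
struct flags { unsigned ready : 1 + 1; };         /* bit-field width         */
char buffer[2 * 4];                               /* file-scope array bound  */

/* A context whose grammar does not: in a block-scope initializer the
   42 is syntactically an assignment-expression, even though it looks
   like a constant. */
int answer(void)
{
    int n = 42;
    return n;
}
```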

Undefined Behavior, in turn, is not defined as specific only to
execution: the standard simply says that it is "behavior, upon
use of a *nonportable or erroneous program construct*..." for
which there are no requirements, and there are examples of
things that are explicitly UB at translation time, such as
improperly terminated lexemes and so forth.

Yes, there are constructs that are explicitly UB at translation time.
(I think that's unfortunate, and there are efforts to clear up some
such cases in C2y.)

It's unclear to me how it could be any other way. If UB was
_only_ an issue at runtime, then how could a compiler take
advantage of it to perform optimizations during translation?
We know that compilers do this.

Signed integer overflow is not one of those constructs.

This I'm not sure I agree with. If the compiler detects signed
integer overflow in (perhaps not relevant in _this_ example) an
integer constant expression, I still don't see anything that
makes that anything other than UB. It's a constraint violation,
sure, but nothing says it is not also UB.

Any undefined behavior from evaluating INT_MAX+1 happens during
execution (barring constraint violations).

I'm not sure the standard says that. The standard says this
happens during _evaluation_, and that evaluation must be
performed in accordance with the rules of the abstract syntax
machine. But it doesn't precisely specify _when_ evaluation
takes place, and in particular, there are places in the standard
that explicitly mention evaluation during translation. I still
don't see anything that prohibits a compiler from evaluating
that expression at compile time (indeed, it clearly does, as it
generates a diagnostic about the overflow).

I suppose that changes the matter: does the language merely
leave that unspecified, in which case, this program is not
strictly conforming, or does it say that it _cannot_ make any
translation-time decisions about it? I cannot find a satisfying
argument for the latter.

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

It satisfies the requirements for an integer constant expression in
6.6p8, but it violates the constraint in 6.6p4. (I presume that an
"integer constant expression" must be a "constant expression".)
But since "constant-expression" is a grammatical production,
it doesn't have to satisfy that constraint, and no diagnostic
is required. (A warning is certainly permitted.)

Fair point. Its grammatical position makes it an
assignment-expression. I clearly misinterpreted that before.

Similarly, this:
int n = INT_MAX + 1;
at block scope doesn't require a diagnostic, though of course it
has undefined behavior -- but at file scope, the initializer is a
constant expression, so that would be a constraint violation.

Right. The semantics of this are defined in sec 6.7.11 para 5.
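
The file-scope versus block-scope difference can be sketched with in-range initializers (an editorial illustration; `file_scope_max` and `block_scope_max` are hypothetical names). The same token sequence is a constant expression in one position and an ordinary run-time expression in the other.

```c
#include <limits.h>

/* File scope: the object has static storage duration, so the
   initializer must be a constant expression and must evaluate to a
   representable value. */
int file_scope_max = INT_MAX - 1 + 1;

int block_scope_max(void)
{
    /* Block scope, automatic storage: the initializer is an ordinary
       expression, evaluated each time the function is called. */
    int n = INT_MAX - 1 + 1;
    return n;
}
```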

Moreover, sec 6.6 para 17 says that, "the semantic rules for
evaluation of a constant expression are the same as for
nonconstant expressions." This brings us back to 5.1.2.4,
though I submit that para (4) is a stronger argument for what
you and Tim are saying, as it reads in part, "An actual
implementation is not required to evaluate part of an expression
if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a
function or through volatile access to an object)." I interpret
this to mean that, if the implementation can determine that
there is no way that `foo` can be called, it does not _have_ to
evaluate the above expression. However, it must satisfy the
range constraint from section 6.6, so it likely will, and in any
event, the standard does not say that it, "shall not" evaluate
it, or when.

Overflow in a constant expression is not undefined behavior. It's a
constraint violation. But that doesn't apply here, because the
initializer is not a constant expression. (Sorry if I'm repeating
myself.)

Where does it say that UB and constraint violations are mutually
exclusive? I don't see any such statement in the standard. Am
I missing it?

The standard says that if a constraint is violated, a diagnostic
must be emitted, regardless of whether or not the constraint
violation is the result of something that is UB or not; that is, if
a constraint violation occurs due to something that is UB, the
implementation must still emit a diagnostic: UB is not an escape
hatch from that requirement.

It also says, 'If a "shall" or "shall not" requirement that
appears outside of a constraint or runtime-constraint is
violated, the behavior is undefined. Undefined behavior is
otherwise indicated in this document by the words "undefined
behavior" or by the omission of any explicit definition of
behavior.' However, that does not preclude such behavior being
undefined; it just means that the words "shall" and "shall not"
in a constraint violation do not a priori describe behavior vis
definition.

Once the compiler does that, if it does, and observes UB, the
standard is silent on what requirements it imposes, which means
the behavior is undefined. I see no reason it couldn't arrange
to invoke `foo` at that point.

Any UB in the program would occur during execution,

I suppose; but it's not clear to me that UB is tied _only_ to
execution time.

The standard is explicit that there _are_ things that are
evaluated at translation time, like the initializer for an
object with storage class `constexpr`. It is not clear to me that
a compiler is otherwise _prohibited_ from evaluating an
expression during translation; indeed, one could imagine it
doing so to perform constant folding, and I do not believe there
exists any normative text defining it as such.

I realize this is an extreme interpretation, and not one that is
widely shared. Personally, I think it's rather silly.

However, I think that is _a_ danger of the informality of the C
specification; it does not define the semantics of the abstract
machine in the formally precise way that, say, the SML spec
defines that language's semantics. Rather, it informally
specifies them in prose, and that prose is ambiguous.

Probably much good would be done if C's semantics _were_
rigorously defined, but they are not. Thus, they are open to
radical interpretation, and as extreme as those may be, I do not
see how the normative text of the standard explicitly
_prohibits_ them.

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

Regardless, I think you highlighted an actual problem with the
spec; I don't think that behavior is _explicitly_ prohibited,
therefore, it is likely undefined, but at a minimum unspecified,
whether it actually could happen. If the argument against that
is that this renders the language essentially unusable, then
my response is, "yeah, well, welcome to programming in C in the
2020s." Most compilers would never be that extreme, but I see
no evidence that it would be an invalid reading of the
literal text of the standard if they did.

So no, I do not see how execution according to the rules of the
abstract machine is not guaranteed, here. I certainly see no
way in which this can be regarded as a strictly conforming
program.

foo()'s behavior would be undefined if it were called. It *isn't*
called, so there's no actual UB. The program does not violate any
of the other requirements for strict conformance.

I understand _what_ you're saying: despite the expression itself
manifesting undefined behavior, in this case it's not UB because
`foo` is never executed. What I'm saying is that I don't see
anything in the standard that restricts UB to _only_ executed
code. A reputable compiler obviously instruments `foo` with
code to trap into ubsan; if it's not UB, since it's not
executed, then why do so? Granted, that's not evidence of
anything other than the behavior of those compilers, but still.

It is clearly the _intent_ that this be a strictly conforming
program. The C standard, as an imprecise, informal document,
cannot guarantee it.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Yes, it does. An OS can print "PROGRAM STOPPED", but not as part
of the execution of the program. On my system, a shell prompt is
printed after a program terminates, but not by the program. If I
execute a "hello, world" program with its output redirected to a file
(on a system that supports that), the resulting file cannot contain
"PROGRAM STOPPED". The requirements in 5.1.2.4 specify both what
the execution of a program must do and what it must not do.

Files are a separate case. There's no guarantee that the
standard output refers to a file; it may well refer to an
"interactive device", the semantics of which are (necessarily)
unspecified.

Here's an example: consider an interactive user who uses a
screen reader device. Suppose that user makes use of an
implementation that includes runtime support for that device,
and that precedes invocation of `main` with a command sequence
causing the screen reader to (perhaps) change intonation; and
succeeds return from main by outputting another command sequence
that resets to the original state.

I do not see how C could prohibit that, assuming that the
implementation takes care to detect whether standard output
really refers to the screen reader, and does not emit the control
sequences if output is redirected to a file. Another user who
runs that same program without a screen reader may see the
standard text printed on the screen, without the control
sequence sandwich.

I don't think a conforming implementation can prohibit that kind
of thing.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Remember that linking is translation phase 8. The compiler is not
the entire implementation.

Exactly my point. The compiler cannot know how `foo` might be
used, or how the translated object might be exercised.
I don't see how it could possibly know that, given that `foo`
has external linkage.

We were presented with a complete translation unit that included a
function definition for "main". It's a complete program. There's no
valid way for some other program to call foo. If OS provided such
a mechanism, it would be outside the scope of C.

Given an excessively pedantic and literal reading of the text of
the standard, I don't think an implementation is explicitly
prohibited from evaluating the initializer at translation time,
deducing that the behavior is undefined, and blaming it on the
program, at which point, all bets are off.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

I don't know what "_must_ treat with UB" means.

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid. But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. It can
reject it if it can prove that it *always* has undefined behavior
during execution.

What I'm saying is that, `foo` has undefined behavior _period_.
That's manifest in an integer constant expression, whether it is
executed at runtime or not. I believe that the standard forces
the expression to be evaluated at translation time, via the
"shall" mandate when checking the constraint on the range in sec
6.6 para 4. Further, that evaluation must happen in accordance
with the rules of the abstract machine, as per 5.1.2.4 para 17.
The diagnostic is mandated, as is the translation-time
evaluation. The expression itself manifestly exhibits UB,
and so therefore the result of the rest of the translation is
undefined.

foo is a function. foo does not have undefined behavior; it has no
behavior at all. A *call* to foo during execution has undefined
behavior. (`foo;` is a statement-expression that does nothing;
it does not have undefined behavior.)

The _evaluation_ of that expression in `foo` has undefined
behavior. The standard does not say that it _cannot_ be
evaluated at translation time.

[SNIP]

I think the question of whether the initializer is a
constant-expression or not has caused some not entirely relevant
confusion.

Here's another example that avoids that issue.

#include <limits.h>

int foo(void) {
    int zero;
    zero = INT_MAX;
    zero ++;
    zero *= 0;
    return zero;
}

int main(void) {
    return 0;
}

Given my grammatical argument above, I would say that this program
has no constant expressions.

Agreed, if by "constant expressions" you mean those mandated to
use the `constant-expression` grammatical production.

Whether that argument is correct or
not, it certainly has no constant expressions that violate any
constraint or that have undefined behavior. Evaluating `zero ++`
(which doesn't even pretend to be a constant expression) would have
run-time undefined behavior -- *if* foo() were ever called.
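
For comparison, an editorial sketch (not from the original posts): the same sequence of operations on an unsigned type is well defined even if the function is called, because unsigned arithmetic wraps modulo `UINT_MAX + 1`. The name `foo_unsigned` is hypothetical.

```c
#include <limits.h>

/* Unsigned counterpart of foo(): every step has defined behavior. */
unsigned foo_unsigned(void)
{
    unsigned zero;
    zero = UINT_MAX;
    zero++;        /* wraps to 0: defined for unsigned types */
    zero *= 0;
    return zero;
}
```

This only highlights that the undefined behavior under discussion is specific to signed overflow; it says nothing about whether an uncalled `foo` affects conformance.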

Let me turn this around in two ways: suppose that the
translation unit _only_ included `foo`. Could the compiler
deduce that the behavior of `foo`, if called, is undefined? If
not, why not?

Second, suppose that `foo` _were_ called, could the compiler
replace this with a program that was the equivalent of,
`int main(void) {printf("check your nose"); abort();}`? If so
why? If not, why not?

And given this translation unit, I don't think there's any way to
construct a multi-TU program that calls foo, so a compiler *can*
determine that foo is never called (but there's no requirement to
do so, or to make any use of that information).

This is the crux of my point, as well. There's no requirement
for the translator to _not_ evaluate the expression and become
privy to UB.

Would it be stupid if a compiler did that? Yes. Do existing
compilers do so? No, not that I'm aware of. Would some dweeb
nerd compiler douche who thinks this would make a compiler
benchmark some microfraction of a percent faster take advantage
of that? I absolutely think so, yes.

- Dan C.


From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Fri Jun 5 23:50:49 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vt7b9$pi3s$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsnl7$lkmu$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
int zero = (INT_MAX+1)*0;
return zero;
}

int
main(){
return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

N3220 5.1.2.4, Program semantics.

It defines the *observable behavior* of a program, which consists of
accesses to volatile objects, data written to files, and I/O dynamics of
interactive devices.

Yes, but it does so for strictly-conforming programs with no UB.

It does so for programs in general, not just strictly conforming
ones. If a program has undefined behavior, all bets are off,
but for example a program that evaluates `printf("%d\n", INT_MAX)`
is not strictly conforming, but it's fully subject to 5.1.2.4.

To understand conformance, we have to jump over to section 4,
which explicitly says that, 'Undefined behavior is otherwise
indicated in this document by the words "undefined behavior" or
by the omission of any explicit definition of behavior.' As it
does not say that a program with an instance of undefined
behavior in an integer constant expression that is not executed
must otherwise behave in any given manner, what the program does
is undefined. A constraint violation mandates a diagnostic, but
beyond that, the standard is (AFAICT) silent.

I don't think an integer constant expression can have undefined
behavior. INT_MAX+1 and 1/0 are not constant expressions, because
neither "evaluate(s) to a constant that is in the range of
representable values for its type".

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

That's a bold claim, but I think I see why you're saying that.

The program in question, quoted above, has:

int zero = (INT_MAX+1)*0;

`(INT_MAX+1)*0` is not a constant expression, not because of the
overflow, but because a constant expression is not required in
that context. "constant-expression" is defined by a production in
the grammar (it reduces to "conditional-expression"). Even in

int n = 42;

42 is not a constant expression, because the grammar doesn't
call for a constant expression in that context -- even though it
looks like one. Similarly, in `a + b * c`, `a + b` looks like an
additive expression, but it isn't one. (Not a perfect analogy.)

Right; I see what you mean. In this case, the
`assignment-expression` production applies, not
`constant-expression`.

Undefined Behavior, in turn, is not defined as specific only to
execution: the standard simply says that it is "behavior, upon
use of a *nonportable or erroneous program construct*..." for
which there are no requirements, and there are examples of
things that are explicitly UB at translation time, such as
improperly terminated lexemes and so forth.

Yes, there are constructs that are explicitly UB at translation time.
(I think that's unfortunate, and there are efforts to clear up some
such cases in C2y.)

It's unclear to me how it could be any other way. If UB was
_only_ an issue at runtime, then how could a compiler take
advantage of it to perform optimizations during translation?
We know that compilers do this.

There are instances of undefined behavior that depend on specific
characteristics of a source file, not on run-time behavior.
The first example I found (N3220) is in the description of
translation phase 4, 5.1.1.2:

If a character sequence that matches the syntax of a universal
character name is produced by token concatenation (6.10.5.3), the
behavior is undefined.

That's something that can be detected during compilation. It would
be far better if it were either well defined or a syntax rule
violation. And in fact the latest C2y draft doesn't have that
wording. There's an ongoing effort to clean up this kind of thing.
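
For readers unfamiliar with token concatenation, here is a sketch of the well-defined case (an editorial illustration; `CONCAT`, `counter`, and `read_counter` are hypothetical names). The quoted N3220 wording concerns only pastes whose result matches the syntax of a universal character name, and that form is deliberately not shown as live code.

```c
/* Well-defined token pasting: the two operands are joined into a
   single ordinary identifier during translation phase 4. */
#define CONCAT(a, b) a##b

int CONCAT(coun, ter) = 3;   /* declares: int counter = 3; */

int read_counter(void)
{
    return counter;
}
```

Whether a paste is well formed can be decided entirely from the source text, which is what makes it a translation-time question rather than a run-time one.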

That's not the kind of UB I'm talking about.

Signed integer overflow is not one of those constructs.

This I'm not sure I agree with. If the compiler detects signed
integer overflow in (perhaps not relevant in _this_ example) an
integer constant expression, I still don't see anything that
makes that anything other than UB. It's a constraint violation,
sure, but nothing says it is not also UB.

An implementation can choose to successfully translate a program that
violates a constraint. In my opinion, the resulting program has (or
should be considered to have) undefined behavior, but the standard
doesn't explicitly say so. My argument is based on the definition
of "constraint": "restriction, either syntactic or semantic,
by which the exposition of language elements is interpreted".
If a constraint is violated, I argue that there is no basis for
interpreting the exposition of language elements, and therefore no
definition of the behavior.

Other interpretations are possible.

So if an overflow in an ICE has undefined behavior, it's merely
an instance of this more general principle, which might even not
be valid.

An unambiguous case is:

case INT_MAX+1:

That's a constraint violation. The expression is required to be an
ICE, but it doesn't "evaluate to a constant that is in the range
of representable values for its type" (unless you want to argue
that it can evaluate to INT_MIN for a particular implementation,
but I really dislike the implications of that). If there's UB,
it's because of the constraint violation. (In fact I'd expect most
compilers to reject it, so there's no behavior at all.)
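
A minimal sketch of that distinction, not taken from the thread (the
helper name classify is hypothetical). The in-range labels are valid
integer constant expressions; the out-of-range label is left in a
comment because a conforming compiler has to diagnose it and will
probably refuse to translate it.

```c
#include <limits.h>

/* Hypothetical helper: classifies a value using case labels that are
   valid integer constant expressions. */
int classify(int value)
{
    switch (value) {
    case INT_MAX:            /* representable as int: fine */
        return 1;
    case (INT_MAX - 1) / 2:  /* every subexpression is representable: fine */
        return 2;
    /* case INT_MAX + 1:        constraint violation (6.6p4), diagnostic required */
    default:
        return 0;
    }
}
```

Uncommenting the third label should produce at least a warning on any
conforming compiler; whether translation continues is up to the
implementation.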

On the other hand, this:

int n = INT_MAX;
n++;

has undefined behavior and is not a constraint violation. A note on the definition of "undefined behavior" says:

Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during
translation or program execution in a documented manner
characteristic of the environment (with or without the issuance of a
diagnostic message), to terminating a translation or execution (with
the issuance of a diagnostic message).

So a compiler can reject it *if* it can prove that the undefined
behavior will always occur. The standard is not 100% clear about
whether it can be rejected if the code is never executed, or is
executed conditionally, but I think that's not permitted, or at least
it shouldn't be. Rejecting code because the compiler can't prove
the behavior is undefined has some very unpleasant implications.

Any undefined behavior from evaluating INT_MAX+1 happens during
execution (barring constraint violations).

I'm not sure the standard says that. The standard says this
happens during _evaluation_, and that evaluation must be
performed in accordance with the rules of the abstract syntax
machine. But it doesn't precisely specify _when_ evaluation
takes place, and in particular, there are places in the standard
that explicitly mention evaluation during translation. I still
don't see anything that prohibits a compiler from evaluating
that expression at compile time (indeed, it clearly does, as it
generates a diagnostic about the overflow).

I suppose that changes the matter: does the language merely
leave that unspecified, in which case, this program is not
strictly conforming, or does it say that it _cannot_ make any
translation-time decisions about it? I cannot find a satisfying
argument for the latter.

Ok, given:

case INT_MAX+1:

a compiler could issue the required diagnostic for the constraint
violation as a non-fatal warning, then generate code that executes
an ADD instruction with operands INT_MAX and 1. That would be
conforming but silly. The compiler has to determine that INT_MAX+1
overflows anyway so it can issue the diagnostic.

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

It satisfies the requirements for an integer constant expression in
6.6p8, but it violates the constraint in 6.6p4. (I presume that an
"integer constant expression" must be a "constant expression".)
But since "constant-expression" is a grammatical production,
it doesn't have to satisfy that constraint, and no diagnostic
is required. (A warning is certainly permitted.)

Fair point. Its grammatical position makes it an
assignment-expression. I clearly misinterpreted that before.

Similarly, this:
int n = INT_MAX + 1;
at block scope doesn't require a diagnostic, though of course it
has undefined behavior -- but at file scope, the initializer is a
constant expression, so that would be a constraint violation.
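
A rough sketch of the two contexts, assuming a typical implementation
where UINT_MAX is greater than INT_MAX. The names file_scope_ok and
block_scope_demo are made up for illustration. The overflowing forms
are left in comments, and the block-scope case is shown with unsigned
arithmetic, which wraps instead of overflowing.

```c
#include <limits.h>

/* File scope: the initializer must be a constant expression, so the
   6.6 constraints apply. This one is in range. */
int file_scope_ok = INT_MAX - 1;

/* int file_scope_bad = INT_MAX + 1;   constraint violation at file scope */

int block_scope_demo(void)
{
    /* Block scope, automatic storage: no constant expression is required.
       Written as `int n = INT_MAX + 1;` this would need no diagnostic but
       would have undefined behavior when executed. Unsigned arithmetic is
       used here so the sketch itself stays well defined. */
    unsigned int wrapped = (unsigned int)INT_MAX + 1u;

    /* Nonzero when UINT_MAX > INT_MAX (an assumption, true on typical systems). */
    return wrapped > (unsigned int)INT_MAX;
}
```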

Right. The semantics of this are defined in sec 6.7.11 para 5.

Moreover, sec 6.6 para 17 says that, "the semantic rules for
evaluation of a constant expression are the same as for
nonconstant expressions." This brings us back to 5.1.2.4,
though I submit that para (4) is a stronger argument for what
you and Tim are saying, as it reads in part, "An actual
implementation is not required to evaluate part of an expression
if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a
function or through volatile access to an object)." I interpret
this to mean that, if the implementation can determine that
there is no way that `foo` can be called, it does not _have_ to
evaluate the above expression. However, it must satisfy the
range constraint from section 6.6, so it likely will, and in any
event, the standard does not say that it, "shall not" evaluate
it, or when.

Overflow in a constant expression is not undefined behavior. It's a
constraint violation. But that doesn't apply here, because the
initializer is not a constant expression. (Sorry if I'm repeating
myself.)

Where does it say that UB and constraint violations are mutually
exclusive? I don't see any such statement in the standard. Am
I missing it?

It doesn't.

As a practical matter, when I look at C code, if it violates a
constraint, I typically don't care about its behavior. I want it
to be rejected at compile time (unless it's deliberately taking
advantage of a documented extension). I'll fix it rather than
worrying about its behavior.

(Unless the code has somehow gotten into production and it's my
job to analyze how it misbehaves.)

Yes, a program that violates a constraint can have run-time behavior if
the compiler chooses not to reject it, and that behavior may be
undefined.

The standard says that if a constraint is violated, a diagnostic
must be emitted, regardless of whether or not the constraint
violation is the result of something that is UB or not; that is, if
a constraint violation occurs due to something that is UB, the
implementation must still emit a diagnostic: UB is not an escape
hatch from that requirement.

Right.

It also says, 'If a "shall" or "shall not" requirement that
appears outside of a constraint or runtime-constraint is
violated, the behavior is undefined. Undefined behavior is
otherwise indicated in this document by the words "undefined
behavior" or by the omission of any explicit definition of
behavior.' However, that does not preclude such behavior being
undefined; it just means that the words "shall" and "shall not"
in a constraint violation do not a priori describe behavior vis
definition.

Right.

Once the compiler does that, if it does, and observes UB, the
standard is silent on what requirements it imposes, which means
the behavior is undefined. I see no reason it couldn't arrange
to invoke `foo` at that point.

Any UB in the program would occur during execution,

I suppose; but it's not clear to me that UB is tied _only_ to
execution time.

The standard is explicit that there _are_ things that are
evaluated at translation time, like the initializer for an
object with storage class `constexpr`. It is not clear to me that
a compiler is otherwise _prohibited_ from evaluating an
expression during translation; indeed, one could imagine it
doing so to perform constant folding, and I do not believe there
exists any normative text defining it as such.

Certainly a compiler can, but need not, evaluate any expression at
compile time if it's able to:

int n;
n = 2 + 2;

I'd be surprised to see an ADD instruction in the generated code, but
a naive compiler could certainly generate one. For that matter, a
perverse compiler could generate code that adds 3 and 1 or divides 28
by 7. Anything that implements the required *observable behavior*
(5.1.2.4 Program semantics) is acceptable. Executing an ADD
instruction is not part of the observable behavior.
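
To make that concrete, here is a small sketch (the function names are
hypothetical) of three ways generated code could arrive at the value
stored by `n = 2 + 2;`. A caller cannot tell them apart, which is all
the "as if" rule asks for.

```c
/* Three hypothetical realizations of `n = 2 + 2;`. */

/* What most compilers would emit: the folded constant. */
int four_folded(void)
{
    return 4;
}

/* A perverse but conforming alternative: add 3 and 1. */
int four_added(void)
{
    int a = 3, b = 1;
    return a + b;
}

/* Equally perverse: divide 28 by 7. */
int four_divided(void)
{
    int a = 28, b = 7;
    return a / b;
}
```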

I realize this is an extreme interpretation, and not one that is
widely shared. Personally, I think it's rather silly.

However, I think that is _a_ danger of the informality of the C
specification; it does not define the semantics of the abstract
machine in the formally precise way that, say, the SML spec
defines that language's semantics. Rather, it informally
specifies them in prose, and that prose is ambiguous.

There have been attempts to define C's semantics formally, but
those attempts are not part of the standard. Fully defining C's
semantics formally rather than in English would, I imagine, be a
*lot* of work -- and fewer people would be able to understand
the specification or work on it.

Probably much good would be done if C's semantics _were_
rigorously defined, but they are not. Thus, they are open to
radical interpretation, and as extreme as those may be, I do not
see how the normative text of the standard explicitly
_prohibits_ them.

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Regardless, I think you highlighted an actual problem with the
spec; I don't think that behavior is _explicitly_ prohibited,
therefore, it is likely undefined, but at a minimum unspecified,
whether it actually could happen. If the argument against that
is that this renders the language essentially unusable, then
my response is, "yeah, well, welcome to programming in C in the
2020s." Most compilers would never be that extreme, but I see
no evidence that it would be an invalid reading of the
literal text of the standard if they did.

So no, I do not see how execution according to the rules of the
abstract machine is guaranteed, here. I certainly see no
way in which this can be regarded as a strictly conforming
program.

foo()'s behavior would be undefined if it were called. It *isn't*
called, so there's no actual UB. The program does not violate any
of the other requirements for strict conformance.

I understand _what_ you're saying: despite the expression itself
manifesting undefined behavior, in this case it's not UB because
`foo` is never executed. What I'm saying is that I don't see
anything in the standard that restricts UB to _only_ executed
code. A reputable compiler obviously instruments `foo` with
code to trap into ubsan; if it's not UB, since it's not
executed, then why do so? Granted, that's not evidence of
anything other than the behavior of those compilers, but still.

Probably the compiler generated the trap code because it didn't
(yet?) know whether foo is ever called. If it were clever enough
to prove that foo is never called, it could generate no code for
it at all.

The note on the definition of undefined behavior is a bit vague.
It permits terminating a translation in response to UB, but that
doesn't address exactly when it can do so. I believe it can do so
only when it can prove that the UB always occurs, but that's not
clearly stated.

However, the behavior of the program as a whole is clearly defined.
It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

Another argument (subject to interpretation of wording): Undefined
behavior is "behavior, **upon use** of a nonportable or erroneous
program construct or of erroneous data, for which this document
imposes no requirements". The overflowing expression within foo()
is never *used*, so there is no undefined behavior.

To put it another way, undefined behavior is behavior. Something
that never occurs is not behavior.

It is clearly the _intent_ that this be a strictly conforming
program. The C standard, as an imprecise, informal document,
cannot guarantee it.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Yes, it does. An OS can print "PROGRAM STOPPED", but not as part
of the execution of the program. On my system, a shell prompt is
printed after a program terminates, but not by the program. If I
execute a "hello, world" program with its output redirected to a file
(on a system that supports that), the resulting file cannot contain
"PROGRAM STOPPED". The requirements in 5.1.2.4 specify both what
the execution of a program must do and what it must not do.

Files are a separate case. There's no guarantee that the
standard output refers to a file; it may well refer to an
"interactive device", the semantics of which are (necessarily)
unspecified.

The requirements for "observable behavior" cover both files and
interactive devices.

Here's an example: consider an interactive user who uses a
screen reader device. Suppose that user makes use of an
implementation that includes runtime support for that device,
and that precedes invocation of `main` with a command sequence
causing the screen reader to (perhaps) change intonation; and
succeeds return from main by outputting another command sequence
that resets to the original state.

I do not see how C could prohibit that, assuming that the
implementation takes care to detect whether standard output
really refers to the screen reader, and does not emit the control
sequences if output is redirected to a file. Another user who
runs that same program without a screen reader may see the
standard text printed on the screen, without the control
sequence sandwich.

I don't think a conforming implementation can prohibit that kind
of thing.

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Remember that linking is translation phase 8. The compiler is not
the entire implementation.

Exactly my point. The compiler cannot know how `foo` might be
used, or how the translated object might be exercised.
I don't see how it could possibly know that, given that `foo`
has external linkage.

We were presented with a complete translation unit that included a
function definition for "main". It's a complete program. There's no
valid way for some other program to call foo. If an OS provided such
a mechanism, it would be outside the scope of C.

Given an excessively pedantic and literal reading of the text of
the standard, I don't think an implementation is explicitly
prohibited from evaluating the initializer at translation time,
deducing that the behavior is undefined, and blaming it on the
program, at which point, all bets are off.

An implementation can certainly evaluate the initializer at
translation time, deduce that the behavior would be undefined
*if the initializer were evaluated*, and blame it on the program.
That doesn't mean it can reject a strictly conforming program.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

I don't know what "_must_ treat with UB" means.

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid. But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. It can
reject it if it can prove that it *always* has undefined behavior
during execution.

What I'm saying is that, `foo` has undefined behavior _period_.
That's manifest in an integer constant expression, whether it is
executed at runtime or not. I believe that the standard forces
the expression to be evaluated at translation time, via the
"shall" mandate when checking the constraint on the range in sec
6.6 para 4. Further, that evaluation must happen in accordance
with the rules of the abstract machine, as per 5.1.2.4 para 17.
The diagnostic is mandated, as is the translation-time
evaluation. The expression itself manifestly exhibits UB,
and so therefore the result of the rest of the translation is
undefined.

foo is a function. foo does not have undefined behavior; it has no
behavior at all. A *call* to foo during execution has undefined
behavior. (`foo;` is a statement-expression that does nothing;
it does not have undefined behavior.)

The _evaluation_ of that expression in `foo` has undefined
behavior. The standard does not say that it _cannot_ be
evaluated at translation time.

If a compiler sees a subexpression INT_MAX+1 it can attempt to
evaluate it at compile time. But it can't just blindly add the
values if overflow would cause a fatal trap, crashing the compiler.
That would be a serious compiler bug. The behavior *of the compiler*
is not undefined.
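
One common way to do that, sketched here under the assumption that the
compiler itself is written in C (the name checked_add is hypothetical):
test whether the sum is representable before performing the addition,
so the compiler's own arithmetic never overflows.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only if the mathematical result
   fits in int. Returns true and stores the sum in *out on success;
   returns false and leaves *out untouched if the addition would
   overflow. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;           /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;           /* would fall below INT_MIN */
    *out = a + b;               /* safe: the result is representable */
    return true;
}
```

With this, folding INT_MAX+1 reports failure instead of trapping, and
the compiler can then issue whatever warning or diagnostic applies.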

[SNIP]

I think the question of whether the initializer is a
constant-expression or not has caused some not entirely relevant
confusion.

Here's another example that avoids that issue.

#include <limits.h>

int foo(void) {
int zero;
zero = INT_MAX;
zero ++;
zero *= 0;
return zero;
}

int main(void) {
return 0;
}

Given my grammatical argument above, I would say that this program
has no constant expressions.

Agreed, if by "constant expressions" you mean those mandated to
use the `constant-expression` grammatical production.

Yes, that's what I mean by it.

Whether that argument is correct or
not, it certainly has no constant expressions that violate any
constraint or that have undefined behavior. Evaluating `zero ++`
(which doesn't even pretend to be a constant expression) would have
run-time undefined behavior -- *if* foo() were ever called.

Let me turn this around in two ways: suppose that the
translation unit _only_ included `foo`. Could the compiler
deduce that the behavior of `foo`, if called, is undefined? If
not, why not?

Certainly.

Second, suppose that `foo` _were_ called, could the compiler
replace this with a program that was the equivalent of,
`int main(void) {printf("check your nose"); abort();}`? If so
why? If not, why not?

Yes, if foo were called in every possible execution of the program,
the program's behavior would be undefined. The compiler could also
reject it.

And given this translation unit, I don't think there's any way to
construct a multi-TU program that calls foo, so a compiler *can*
determine that foo is never called (but there's no requirement to
do so, or to make any use of that information).

This is the crux of my point, as well. There's no requirement
for the translator to _not_ evaluate the expression and become
privy to UB.

I believe there is. The program is strictly conforming, which means,
among other things, that it does not produce output depending on any
undefined behavior. There is no undefined behavior because foo() is
never called.

A *strictly conforming program* shall use only those features of the
language and library specified in this document. It shall not
produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit.

...

A *conforming hosted implementation* shall accept any strictly
conforming program.

An implementation that rejects the program quoted at the top of this
article is non-conforming.

Would it be stupid if a compiler did that? Yes. Do existing
compilers do so? No, not that I'm aware of. Would some dweeb
nerd compiler douche who thinks this would make a compiler
benchmark some microfraction of a percent faster take advantage
of that? I absolutely think so, yes.

And I'd submit a bug report.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.22a-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Jun 6 15:47:07 2026

From Newsgroup: comp.lang.c

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

Right. This question came up years ago in a Defect Report. The
response from the Committee was basically the same as what you
said: the 6.6 constraints for constant expressions apply only in
situations where the C standard expressly requires a constant
expression. (I don't have the DR in front of me; I'm summarizing
based on memory, but am confident the actual wording is consistent
with what I just said.)

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Jun 6 16:15:05 2026

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

PS: One yet non-considered question that was part of my original
post was: "Is there any rationale from the _software designer_'s
perspective?"

I didn't respond to your original question because it was based on a
misconception. Whether a given expression is a constant expression,
in the sense of needing to satisfy the constraints of 6.6, depends
not on the form of the expression but on the context in which it
appears. The 6.6 constraints apply only in situations where the C
standard expressly requires a constant expression. Other cases,
such as a use like this

int
whatever(){
int r = (int)(-1u/2) + 1;
return r;
}

do not need to satisfy the 6.6 constraints, because the C standard
doesn't require a constant expression in that context. (Note that
the initializing expression for 'r' does overflow the range of int
in implementations where UINT_MAX == INT_MAX*2+1.)
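
A small sketch for checking that on a given implementation without
evaluating the overflowing addition (the function name is
hypothetical). It assumes only that -1u/2 is UINT_MAX/2, which the
rules for unsigned arithmetic guarantee.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical check: reports whether (int)(-1u/2) + 1 would overflow
   int on this implementation. The addition itself is never performed. */
bool initializer_would_overflow(void)
{
    unsigned int half = -1u / 2;   /* UINT_MAX / 2, well defined */

    /* The conversion to int yields INT_MAX exactly when half equals
       INT_MAX; adding 1 to that is the overflow. */
    return half == (unsigned int)INT_MAX;
}
```

On the usual implementations, where unsigned int has one more value
bit than int, this should return true.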

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sat Jun 6 16:36:14 2026

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

Right. This question came up years ago in a Defect Report. The
response from the Committee was basically the same as what you
said: the 6.6 constraints for constant expressions apply only in
situations where the C standard expressly requires a constant
expression. (I don't have the DR in front of me; I'm summarizing
based on memory, but am confident the actual wording is consistent
with what I just said.)

C99 DR 261 looks similar to what you're talking about.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_261.htm

The Committee Response section says:

In general, the interpretation of an expression for constantness
is context sensitive. For any expression which contains only
constants:

- If the syntax or context only permits a constant expression, the
constraints of 6.6#3 and 6.6#4 shall apply.
- Otherwise, if the expression meets the requirements of 6.6
(including any form accepted in accordance with 6.6#10), it is a
constant expression.
- Otherwise it is not a constant expression.

That's close to what I claimed, but the second bullet point differs.
My claim was that, given:

n = 2+2;

2+2 is not a constant expression because the grammar doesn't require
a constant expression in that context. The Committee's opinion
(at least at the time) was that it is a constant expression because
it meets the requirements of 6.6.

But I *think* it's a distinction without a difference. Calling 2+2
a constant expression has no effect on the semantics, and does not
require or forbid the implementation from, for example, generating
an ADD instruction. The distinction would matter for an expression
that has UB and/or does not yield a value of the type, but that
falls through to the third bullet.

I found another interesting tidbit, C90 DR 031, relevant to another
point I made elsethread:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_031.html

case (INT_MAX*4)/4: is a constraint violation.
When subclause 6.4 says on page 55, lines 11-12:
Each constant expression shall evaluate to a constant that is in
the range of representable values for its type.
the Committee's judgement of the intent is that the
``representable'' requirement applies to each subexpression of a
constant expression, as shown in the third example. A constant
expression is meant as defined by the syntax rules.
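
The per-subexpression reading can be sketched as a tiny constant
folder. Everything here is hypothetical (the struct expr layout, the
EXPR_* names, fold, example_folds), and the multiplication check
assumes long long is wider than int. The point is only that
(INT_MAX+1)*0 fails at the INT_MAX+1 step even though the mathematical
value of the whole expression is 0.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression tree, for illustration only. */
enum expr_kind { EXPR_CONST, EXPR_ADD, EXPR_MUL };

struct expr {
    enum expr_kind kind;
    int value;                     /* used when kind == EXPR_CONST */
    const struct expr *lhs, *rhs;  /* used for EXPR_ADD and EXPR_MUL */
};

/* Folds e into *out. Returns false if any subexpression is not
   representable as int; no overflowing operation is performed. */
bool fold(const struct expr *e, int *out)
{
    int a, b;
    long long wide;

    if (e->kind == EXPR_CONST) {
        *out = e->value;
        return true;
    }
    if (!fold(e->lhs, &a) || !fold(e->rhs, &b))
        return false;              /* an operand already failed */
    if (e->kind == EXPR_ADD) {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return false;
        *out = a + b;
        return true;
    }
    /* EXPR_MUL: compute in a wider type (assumes long long is wider). */
    wide = (long long)a * b;
    if (wide > INT_MAX || wide < INT_MIN)
        return false;
    *out = (int)wide;
    return true;
}

/* Builds (INT_MAX + 1) * 0 and reports whether it folds cleanly. */
bool example_folds(void)
{
    struct expr max  = { EXPR_CONST, INT_MAX, NULL, NULL };
    struct expr one  = { EXPR_CONST, 1, NULL, NULL };
    struct expr zero = { EXPR_CONST, 0, NULL, NULL };
    struct expr sum  = { EXPR_ADD, 0, &max, &one };
    struct expr prod = { EXPR_MUL, 0, &sum, &zero };
    int result;

    return fold(&prod, &result);
}
```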

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Jun 6 16:43:53 2026

From Newsgroup: comp.lang.c

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

Right. This question came up years ago in a Defect Report. The
response from the Committee was basically the same as what you
said: the 6.6 constraints for constant expressions apply only in
situations where the C standard expressly requires a constant
expression. (I don't have the DR in front of me; I'm summarizing
based on memory, but am confident the actual wording is consistent
with what I just said.)

C99 DR 261 looks similar to what you're talking about.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_261.htm

The Committee Response section says:

In general, the interpretation of an expression for constantness
is context sensitive. For any expression which contains only
constants:

- If the syntax or context only permits a constant expression, the
constraints of 6.6#3 and 6.6#4 shall apply.
- Otherwise, if the expression meets the requirements of 6.6
(including any form accepted in accordance with 6.6#10), it is a
constant expression.
- Otherwise it is not a constant expression.

That's close to what I claimed, but the second bullet point differs.
My claim was that, given:

n = 2+2;

2+2 is not a constant expression because the grammar doesn't require
a constant expression in that context. The Committee's opinion
(at least at the time) was that it is a constant expression because
it meets the requirements of 6.6.

But I *think* it's a distinction without a difference. [...]

Right. The key point is that the constraints need to be satisfied
only in situations where the C standard expressly requires a
constant expression. Whether a given expression is called a
"constant expression" doesn't matter; all that does matter is
whether the constraints need to be satisfied.

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sat Jun 6 17:41:34 2026

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

That's close to what I claimed, but the second bullet point differs.
My claim was that, given:

n = 2+2;

2+2 is not a constant expression because the grammar doesn't require
a constant expression in that context. The Committee's opinion
(at least at the time) was that it is a constant expression because
it meets the requirements of 6.6.

But I *think* it's a distinction without a difference. [...]

Right. The key point is that the constraints need to be satisfied
only in situations where the C standard expressly requires a
constant expression. Whether a given expression is called a
"constant expression" doesn't matter; all that does matter is
whether the constraints need to be satisfied.

Well, it matters a little bit, at least to me, even though the
distinction doesn't seem to affect the validity or semantics of
any C code.

A clear and unambiguous definition of what is or is not a "constant
expression" would make the language just a bit easier to understand
and explain. I'd even be satisfied with the definition given in
the DR *if* it were clearly expressed in the standard.

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Jun 6 18:06:37 2026

From Newsgroup: comp.lang.c

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
int zero = (INT_MAX+1)*0;
return zero;
}

int
main(){
return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

[...]

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid.

Right.

But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. [...]

Right.

In your example, `foo` clearly exhibits UB; I think your
argument is whether that has a realized effect or not, since the
UB is not invoked. I'm saying that in general a compiler cannot
possibly know that when it compiles `foo`, and is free to assume
the worst.

foo() exhibits UB if and only if it's called during execution.

Right.

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 8 02:20:51 2026

From Newsgroup: comp.lang.c

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

So there are two things that are at play here.

First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.

Indeed, I would go so far as to suggest that _most_ instances of
UB are detected and used (by the translator) during translation.

So to say that, "this program doesn't have UB because the
statement that contains UB is never executed" doesn't make a lot
of sense to me. It would be closer to being correct if one said
"this program is unaffected by UB since the expression that has
UB is never evaluated when the program executes": again, in this
case (as, I suspect, in most cases) the UB simply _is_: the
expression `INT_MAX + 1` does not become well-defined just
because it is never executed.

Second, there's this notion that the standard is just
underspecified with respect to these matters, specifically, it
does not _prohibit_ a translation from implementing an emulator
for the abstract machine that evaluates code at translation
time. Indeed, I suspect that _most_ compilers do something
largely analogous to that; that's how they detect UB so that
they can take advantage of it when optimizing. But if that's
the case, then nothing prohibits them from relieving themselves
of their obligation to follow the standard once they observe
that some bit of code has UB.

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

foo()'s behavior would be undefined if it were called. It *isn't*
called, so there's no actual UB. The program does not violate any
of the other requirements for strict conformance.

I understand _what_ you're saying: despite the expression itself
manifesting undefined behavior, in this case it's not UB because
`foo` is never executed. What I'm saying is that I don't see
anything in the standard that restricts UB to _only_ executed
code. A reputable compiler obviously instruments `foo` with
code to trap into ubsan; if it's not UB, since it's not
executed, then why do so? Granted, that's not evidence of
anything other than the behavior of those compilers, but still.

Probably the compiler generated the trap code because it didn't
(yet?) know whether foo is ever called. If it were clever enough
to prove that foo is never called, it could generate no code for
it at all.

The note on the definition of undefined behavior is a bit vague.
It permits terminating a translation in response to UB, but that
doesn't address exactly when it can do so. I believe it can do so
only when it can prove that the UB always occurs, but that's not
clearly stated.

Exactly. That it's not clearly stated makes me believe that it
is open to interpretation.

However, the behavior of the program as a whole is clearly defined.

Is it? I am unable to locate where the standard _actually says
that it is_. That is my whole point.

It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

I have yet to find or be shown a way in which the standard
actually guarantees that.

Another argument (subject to interpretation of wording): Undefined
behavior is "behavior, **upon use** of a nonportable or erroneous
program construct or of erroneous data, for which this document
imposes no requirements". The overflowing expression within foo()
is never *used*, so there is no undefined behavior.

To put it another way, undefined behavior is behavior. Something
that never occurs is not behavior.

And yet the standard does not say that. That is an
interpretation; I assume it is universally shared, but if we
want to limit ourselves to what the standard _actually says_ it
is woefully underspecified in this regard.

There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.

It is clearly the _intent_ that this be a strictly conforming
program. The C standard, as an imprecise, informal document,
cannot guarantee it.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Yes, it does. An OS can print "PROGRAM STOPPED", but not as part
of the execution of the program. On my system, a shell prompt is
printed after a program terminates, but not by the program. If I
execute a "hello, world" program with its output redirected to a file
(on a system that supports that), the resulting file cannot contain
"PROGRAM STOPPED". The requirements in 5.1.2.4 specify both what
the execution of a program must do and what it must not do.

Files are a separate case. There's no guarantee that the
standard output refers to a file; it may well refer to an
"interactive device", the semantics of which are (necessarily)
unspecified.

The requirements for "observable behavior" cover both files and
interactive devices.

Ok, but irrelevant.

Here's an example: consider an interactive user who uses a
screen reader device. Suppose that user makes use of an
implementation that includes runtime support for that device,
and that precedes invocation of `main` with a command sequence
causing the screen reader to (perhaps) change intonation; and
succeeds return from main by outputting another command sequence
that resets to the original state.

I do not see how C could prohibit that, assuming that the
implementation takes care to detect whether standard output
really refers to the screen reader, and does not emit the control
sequences if output is redirected to a file. Another user who
runs that same program without a screen reader may see the
standard text printed on the screen, without the control
sequence sandwich.

I don't think a conforming implementation can prohibit that kind
of thing.

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Exactly. It could also emit the string, "GOODBYE WORLD."

[snip for size]
Given an excessively pedantic and literal reading of the text of
the standard, I don't think an implementation is explicitly
prohibited from evaluating the initializer at translation time,
deducing that the behavior is undefined, and blaming it on the
program, at which point, all bets are off.

An implementation can certainly evaluate the initializer at
translation time, deduce that the behavior would be undefined
*if the initializer were evaluated*, and blame it on the program.
That doesn't mean it can reject a strictly conforming program.

This is circular reasoning. You're saying that something that
is provably UB in this program cannot prevent that program from
being strictly conforming because the program is strictly
conforming.

This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.

Does any compiler actually do this? No, probably not. Does the
standard explicitly prevent it? I haven't seen an argument for
that position that does not rely on either history or a subjective
interpretation.

foo is a function. foo does not have undefined behavior; it has no
behavior at all. A *call* to foo during execution has undefined
behavior. (`foo;` is a statement-expression that does nothing;
it does not have undefined behavior.)

The _evaluation_ of that expression in `foo` has undefined
behavior. The standard does not say that it _cannot_ be
evaluated at translation time.

If a compiler sees a subexpression INT_MAX+1 it can attempt to
evaluate it at compile time. But it can't just blindly add the
values if overflow would cause a fatal trap, crashing the compiler.
That would be a serious compiler bug. The behavior *of the compiler*
is not undefined.

I did not say that the behavior of the _compiler_ is undefined.

I said that a translator is not prohibited from evaluating the
expression at translation time, observing that the behavior is
undefined, and erroring out. There is no reason a translator
cannot include a simple emulator for the abstract machine as
specified in the standard for that purpose; its behavior would
not be undefined, but it could detect undefined behavior.

[SNIP]
I think the question of whether the initializer is a
constant-expression or not has caused some not entirely relevant
confusion.

Here's another example that avoids that issue.

#include <limits.h>

int foo(void) {
    int zero;
    zero = INT_MAX;
    zero ++;
    zero *= 0;
    return zero;
}

int main(void) {
    return 0;
}

Given my grammatical argument above, I would say that this program
has no constant expressions.

Agreed, if by "constant expressions" you mean those mandated to
use the `constant-expression` grammatical production.

Yes, that's what I mean by it.

Whether that argument is correct or
not, it certainly has no constant expressions that violate any
constraint or that have undefined behavior. Evaluating `zero ++`
(which doesn't even pretend to be a constant expression) would have
run-time undefined behavior -- *if* foo() were ever called.

Let me turn this around in two ways: suppose that the
translation unit _only_ included `foo`. Could the compiler
deduce that the behavior of `foo`, if called, is undefined? If
not, why not?

Certainly.

Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.

Second, suppose that `foo` _were_ called, could the compiler
replace this with a program that was the equivalent of,
`int main(void) {printf("check your nose"); abort();}`? If so
why? If not, why not?

Yes, if foo were called in every possible execution of the program,
the program's behavior would be undefined. The compiler could also
reject it.

UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.

And given this translation unit, I don't think there's any way to
construct a multi-TU program that calls foo, so a compiler *can*
determine that foo is never called (but there's no requirement to
do so, or to make any use of that information).

This is the crux of my point, as well. There's no requirement
for the translator to _not_ evaluate the expression and become
privy to UB.

I believe there is. The program is strictly conforming, which means,
among other things, that it does not produce output depending on any
undefined behavior. There is no undefined behavior because foo() is
never called.

You _say_ the program is strictly conforming. The brunt of what
I am saying is that I do not believe that the text of the
standard actually guarantees that. It is an assumption. Not an
unreasonable one, mind, but it's not guaranteed.

A *strictly conforming program* shall use only those features of the
language and library specified in this document. It shall not
produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit.

...

A *conforming hosted implementation* shall accept any strictly
conforming program.

An implementation that rejects the program quoted at the top of this
article is non-conforming.

So any program that produces no output at all is strictly
conforming? Then what about this?

#include <limits.h>

int
zero(void)
{
    return (INT_MAX + 1) * 0;
}

int
main(void)
{
    (void)zero();
    return 0;
}

This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirements not to
do so, so perhaps the _absence_ of output depends on UB.

Would it be stupid if a compiler did that? Yes. Do existing
compilers do so? No, not that I'm aware of. Would some dweeb
nerd compiler douche who thinks this would make a compiler
benchmark some microfraction of a percent faster take advantage
of that? I absolutely think so, yes.

And I'd submit a bug report.

I would go further and chuck that compiler in the trashcan.

However, I can find no normative text in the standard preventing
it.

In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere near as
precise as it should be.

I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.

- Dan C.


From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 7 22:34:52 2026

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. [...]

Right.

Expanding on that, there is no requirement even to try to
prove such a conjecture. An implementation could simply
give a warning like "there may be undiagnosed constraint
violations in this compilation", and accept the TU no
matter what (except of course for the dreaded #error
preprocessing directive, which if encountered in a live
portion of the translation must result in a rejection).

I presume none of what I'm saying here is news to the usual
suspects; mostly I'm saying it just to remind myself.

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 8 12:39:04 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

A call to foo() would have undefined behavior if it occurred. There
is no call to foo().

Similarly:

int a = ..., b = ...;
int c;
if (b != 0) {
    c = a / b;
}
else {
    c = 0;
}

A division by zero would have undefined behavior if it occurred,
but it never occurs. A compiler cannot reject the above code
because of UB that never happens.

[...]

It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

I have yet to find or be shown a way in which the standard
actually guarantees that.

How does the standard guarantee *anything*?

This strictly conforming program:

int main(void) { return 0; }

when executed returns a status of 0 from main and does nothing else.
Adding an uncalled function to the same source file doesn't change
that.

[...]

There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.

The standard doesn't say what UB is meant for. It says what UB
*is*, and what constructs lead to it (by omission in some cases).
Any optimization tricks played by compiler implementers must be
based on that specification.

[...]

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Exactly. It could also emit the string, "GOODBYE WORLD."

No, it couldn't. It must emit "hello, world\n" in some form.
It must emit the character 'h' as represented in the execution
character set, followed by 'e', and so on.

[...]

This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.

It's not UB if it's never called. Behavior that doesn't happen is
not behavior.

I did not presuppose that the program is strictly conforming.
I read the source code and determined that it meets the standard's
definition of a strictly conforming program.

[...]

Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.

The qualification "if called" is the whole point.

[...]

UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.

"UB can time-travel" is perhaps an oversimplification. An example is
a bug that occurred in the Linux kernel, something like:

void func(int *ptr) {
    do_something_with(*ptr);
    if (ptr != NULL) {
        blah();
    }
}

The compiler, on seeing the expression `*ptr`, assumed that `ptr` is
not null, and elided the test on the following line.

But even assuming that's valid, a compiler absolutely cannot assume that
an instance of UB always executes when, according to the semantics of the
program, it provably never executes.

[...]

So any program that produces no output at all is strictly
conforming? Then what about this?

#include <limits.h>

int
zero(void)
{
    return (INT_MAX + 1) * 0;
}

int
main(void)
{
    (void)zero();
    return 0;
}

That's an interesting point. A more terse example:

#include <limits.h>
int main(void) {
    int unused = INT_MAX + 1;
}

This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirements not to
do so, so perhaps the _absence_ of output depends on UB.

The program clearly has undefined behavior when executed, but no
output depends on that undefined behavior. In my humble opinion,
this demonstrates a flaw in the standard's definition of "strictly
conforming program". (As a programmer: Don't do that.)

[...]

In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere near as
precise as it should be.

I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.

"There are only two kinds of languages: the ones people complain
about and the ones nobody uses."
-- Bjarne Stroustrup
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 8 23:15:48 2026

From Newsgroup: comp.lang.c

In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.

A call to foo() would have undefined behavior if it occurred.

What I'm really trying to get at is that the behavior of
`int zero = (INT_MAX + 1)*0;` is undefined in all cases. There
is no input for which it is valid at all. It is qualitatively
different than other examples where UB cannot be detected
_except_ at runtime.

In particular, it does not become defined just because it's in a
function that is not called; the behavior is UB on its face. It
is utterly meaningless as far as C is concerned; it is what
Regehr calls a "Type 3" function in his taxonomy at
https://blog.regehr.org/archives/213: it literally has no
definition.

There
is no call to foo().

What I am further saying is that I do not see where the C
standard puts additional constraints on an implementation so
that it _must_ accept a program with such a construct in it, as
sensible as that may otherwise be (I actually don't think that
is very sensible, but that's my opinion). The specific wording
of the standard appears to allow a compiler to halt translation
if it observes that expression, whether it's in a function that
is called or not.

I readily concede that I may be wrong. But the arguments I have
heard opposing this interpretation are not well-supported by the
text. I would be happy if someone could provide such an
argument that did not ultimately rely on either intuition or
assumptions about reasonable behavior, but so far, none have
been proffered.

Similarly:

int a = ..., b = ...;
int c;
if (b != 0) {
    c = a / b;
}
else {
    c = 0;
}

A division by zero would have undefined behavior if it occurred,
but it never occurs. A compiler cannot reject the above code
because of UB that never happens.

This I also agree with. But assuming this is in some function
that is otherwise well-defined, this is what Regehr calls a
"Type-1" function: there is no input for which it is undefined.

In this regard, it is qualitatively different than the `foo`
example that is the subject of this thread. I suggest that that
qualitative difference actually matters.

[...]

It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

I have yet to find or be shown a way in which the standard
actually guarantees that.

How does the standard guarantee *anything*?

The thrust of what I have been driving at is that the standard
actually guarantees a lot less than people take for granted.

This strictly conforming program:

int main(void) { return 0; }

when executed returns a status of 0 from main and does nothing else.

Actually, does it? It also implicitly closes the standard
input, output, and error streams. That could have side effects.

Adding an uncalled function to the same source file doesn't change
that.

But it's not _just_ an uncalled function. It's an uncalled
function that is manifestly gibberish because there is no input
for which that expression is well-defined.

I have not found evidence that the standard explicitly prohibits
a pathological compiler from doing something unexpected in that
case. An adversarial read of the standard could allow a
compiler to treat this in a manner similar to a syntax error.

[...]

There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.

The standard doesn't say what UB is meant for. It says what UB
*is*, and what constructs lead to it (by omission in some cases).
Any optimization tricks played by compiler implementers must be
based on that specification.

Yes. Just so. And it also says that anything not explicitly
stated in the standard is UB.

As we all know, the definition of UB in the standard is,
"behavior, upon use of a nonportable or erroneous program
construct or of erroneous data, for which this document imposes
no requirements."

Behavior is defined as, "external appearance or action". Note
that this does not explicitly state that "behavior" is only
applicable during execution, and we know that the standard, as
written today, says that some behaviors are "undefined" _at
translation time_. I cannot find something forbidding an
implementation from interpreting "external appearance or action"
to refer to the success or failure of translation and production
of an associated artifact. Translation phase 7 then says that,
after all of the preprocessing and so forth, "the resulting
tokens are syntactically and semantically analyzed and
translated as a translation unit." As written, a compiler could
certainly detect that that expression, whether executed or not,
is UB.

Indeed, sec 3.5.3 para 2, "Note 1 to entry", explicitly mentions
terminating translation as one of a few sample "undefined
behaviors". It doesn't say that the compiler _has_ to do that,
but does not say that it _must not_, either.

Sec 3.5.3 para 4 ("Note 3 to entry") is the closest I see to
mandating the interpretation you and Rentsch have taken, but
that is specific to _execution time_, not _translation time_,
and the latter is not outright banned from responding to UB: the
text of the standard imposes no requirements in this context.
Dare I say that the translation-time behavior is undefined?

[...]

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Exactly. It could also emit the string, "GOODBYE WORLD."

No, it couldn't. It must emit "hello, world\n" in some form.
It must emit the character 'h' as represented in the execution
character set, followed by 'e', and so on.

I didn't say that it wouldn't; I was referring specifically to
the behavior on closing stdout. You are right, it must emit
something corresponding to, "hello, world\n"; but what it does
after that is up to the implementation. We agree that it could
emit a terminal reset sequence; there is no reason that sequence
couldn't be, "GOODBYE WORLD." It'd be a weird one, but it's not
impossible.

[...]

This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.

It's not UB if it's never called. Behavior that doesn't happen is
not behavior.

See above. The standard simply does not say that. The standard
merely says that behavior is something that manifests as
"external appearance or action." Translation is certainly an
action with an "external appearance" and nothing says that
behavior _during translation_ is any less "behavior" than
behavior during execution. In fact, the standard explicitly
mentions undefined behavior and translation.

I did not presuppose that the program is strictly conforming.

Well, you kinda did: you said that the program is strictly
conforming, and then said that it must be accepted because it is
strictly conforming. That acceptance is predicated on it being
strictly conforming.

I read the source code and determined that it meets the standard's
definition of a strictly conforming program.

I have presented what I think is an equally valid, alternative
reading of the text of the standard where that does not hold.

That reading is, admittedly, adversarial. That does not mean it
is wrong. I am saying that this is a weakness of the standard,
not a good interpretation.

40 years ago people thought the idea of a post-modern
compiler time-travelling in the pursuit of optimization when UB
is detected during translation was an adversarial read of the
standard. And yet, here we are.

[...]

Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.

The qualification "if called" is the whole point.

Except it's not. The behavior of that expression is simply
undefined; whether executed or not, there's no way it _could_ be
defined.

[...]

UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.

"UB can time-travel" is perhaps an oversimplification.

An example is
a bug that occurred in the Linux kernel, something like:

void func(int *ptr) {
    do_something_with(*ptr);
    if (ptr != NULL) {
        blah();
    }
}

The compiler, on seeing the expression `*ptr`, assumed that `ptr` is
not null, and elided the test on the following line.

But even assuming that's valid, a compiler absolutely cannot assume that
an instance of UB always executes when, according to the semantics of the
program, it provably never executes.

Time travel is a term of art, here. I posted this elsewhere in
the thread, and I think he does a much better job explaining it
than I can:
https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

Reading a bit more, I think that C23 sec 3.5.3 para 4 appears
to be trying to rein that in. Hope springs eternal.

[...]

So any program that produces no output at all is strictly
conforming? Then what about this?

#include <limits.h>

int
zero(void)
{
    return (INT_MAX + 1) * 0;
}

int
main(void)
{
    (void)zero();
    return 0;
}

That's an interesting point. A more terse example:

#include <limits.h>
int main(void) {
    int unused = INT_MAX + 1;
}

Sure. Or consider this program:

```
#include <limits.h>

int
foo(int a)
{
    extern int int_max;
    int_max = INT_MAX + 1;
    return int_max;
}

int
main(void)
{
    return 0;
}
```

Suppose that no definition for `int_max` is provided; is this a
strictly conforming program? Consider section 6.9.1, which
describes external definitions. The relevant paragraph is 5,
which reads in part, "If an identifier declared with external
linkage is used in an expression somewhere in the entire program
there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one."

But as has been argued, `int_max` is not actually _used_, since
`foo` is never called. If that holds, then this ought to be
accepted by a conforming implementation. Yet this fails to
build with both gcc and clang; clearly both consider `int_max`
to be "used". Ok, so what about this?

#include <limits.h>

int
foo(int a)
{
    extern int int_max;
    if ((INT_MAX + 1)*0) {
        int_max = INT_MAX + 1;
    }
    return 0;
}

int
main(void)
{
    return 0;
}

This _does_ build.

So it appears that, at least for `gcc` and `clang`, merely not
calling `foo` is insufficient.

This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirement not to
do so, so perhaps the _absence_ of output depends on UB.

The program clearly has undefined behavior when executed, but no
output depends on that undefined behavior. In my humble opinion,
this demonstrates a flaw in the standard's definition of "strictly
conforming program". (As a programmer: Don't do that.)
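
One way to follow that advice is to test for overflow before adding. The following is a minimal sketch; the name `checked_add` is hypothetical and chosen only for this illustration. It compares against `INT_MAX` and `INT_MIN` first, so the addition is performed only when the result is representable:

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true when the sum fits in int;
   returns false (leaving *out untouched) when it would overflow. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

With this helper, `checked_add(INT_MAX, 1, &r)` reports failure instead of evaluating `INT_MAX + 1`.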

That's kind of what I'm saying. Though this interpretation
hinges on whether the absence of output can be defined as output
in some sense; in this case, the compiler could emit code that
says, "this program has UB", and I think that would be fine with
respect to the standard.

But the standard says that an implementation can stop
translating a program if it detects UB, and nothing appears to
limit that to functions that have been called from `main`.

[...]

In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere close as
precise as it should be.

I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.

"There are only two kinds of languages: the ones people complain
about and the ones nobody uses."

Yup.

- Dan C.

--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Jun 9 01:25:04 2026

From Newsgroup: comp.lang.c

Dan Cross <cross@spitfire.i.gajendra.net> wrote:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

So there are two things that are at play here.

First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.

I think that this paragraph (and several others in this post and
other posts) represents a fundamental misunderstanding. This may
be due to the way the C standard is written. AFAIK the Extended Pascal
standard (once you translate the terminology) states the same things as
C about UB, but in a clearer way. Some relevant parts below:

: 3.1 Dynamic-violation
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected up to,
: but not beyond, execution of the declaration, definition, or
: statement that exhibits (see clause 6) the dynamic-violation.

: 3.2 Error
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected.
...
: 5.1 Processors
...
: e) be able to determine whether or not the program violates any
: requirements of this International Standard, where such a violation is
: not designated an error or dynamic-violation,
...

: 5.2 Programs
...
: b) if it conforms at level 1, use only those features of the language
: specified in clause 6;

UB in the C standard corresponds to 'error' in the Pascal standard. And
(by the clause above) a program is allowed to use only defined features;
trying to use something that has no definition (undefined by
omission of a definition) is automatically an error.

Overflow in arithmetic in Pascal is an error, as is accessing
the wrong variant of a variant record. Because of this, accessing a
variable using the wrong type is an error in Pascal.

Since valid programs shall contain no errors (as defined above),
a Pascal compiler may optimize assuming that the user program contains
no errors. This is the same as a C compiler optimizing on the assumption
that there is no undefined behaviour in the program.

Of course, C is a different language from Pascal, and in particular
C contains more "dangerous" constructs that may lead to
undefined behaviour.

However, the fundamental thing remains: detecting undefined behaviour
("errors") at compile time is hard in general, and compilers are
not obliged to do so. But they may optimize trusting that the program
contains no undefined behaviour ("no error").

Indeed, I would go so far as to suggest that _most_ instances of
UB are detected and used (by the translator) during translation.

I think it is different: the compiler _assumes_ no undefined behaviour
and optimizes accordingly. But when there is undefined behaviour,
the program behaves in unexpected ways at runtime. Also, when
you assume a false thing, you can logically derive anything
from it, so there is no limit to the possible damage.
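
A small sketch of what optimizing on that assumption can look like. The function names are hypothetical, and whether a given compiler actually performs the simplification is an assumption that depends on the compiler and its options. Since signed overflow is undefined, the comparison in `next_is_greater` may legitimately be folded to a constant:

```c
#include <limits.h>

/* Because signed overflow is undefined, a compiler may treat x + 1 > x as
   always true and compile this to "return 1". For x == INT_MAX the
   behavior is undefined, so no particular result can be relied on. */
static int next_is_greater(int x)
{
    return x + 1 > x;
}

/* Defined for every int value: compares against INT_MAX, never overflows. */
static int has_successor(int x)
{
    return x < INT_MAX;
}
```

For every `x` below `INT_MAX` the two functions agree; only `has_successor` has a defined answer at `INT_MAX`.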

So to say that, "this program doesn't have UB because the
statement that contains UB is never executed" doesn't make a lot
of sense to me. It would be closer to being correct if one said
"this program is unaffected by UB since the expression that has
UB is never evaluated when the program executes": again, in this
case (as, I suspect, in most cases) the UB simply _is_: the
expression `INT_MAX + 1` does not become well-defined just
because it is never executed.

Well, what is interesting to users is the runtime behaviour of programs,
and undefined behaviour is usually a runtime thing (troubles that
can be easily detected at compile time are usually constraint
violations, which should be detected at compile time). The fact that
some undefined behaviour can be detected at compile time does
not change this. And AFAICS there was a very deliberate decision to
allow programs that contain code that would be undefined behaviour
if executed, but are considered OK if such code is not executed.

BTW: The Pascal wording is different, but the Pascal standard contains
an identical provision and the Pascal validation suite contains explicit
tests of this sort.

Second, there's this notion that the standard is just
underspecified with respect to these matters, specifically, it
does not _prohibit_ a translator from implementing an emulator
for the abstract machine that evaluates code at translation
time. Indeed, I suspect that _most_ compilers do something
largely analogous to that; that's how they detect UB so that
they can take advantage of it when optimizing. But if that's
the case, then nothing prohibits them from relieving themselves
of their obligation to follow the standard once they observe
that some bit of code has UB.

As I wrote, this is different. Compilers routinely compute some
constant expressions at compile time, constant here meaning that
the expression does not depend on runtime values. Compilers track
ranges of variables. But this is done under the assumption that
there is no undefined behaviour. For example, in the loop:

for(int i = 1; i > 0; i++) {
    ...
}

absent assignments to i in the loop body, the compiler may infer that
'i > 0' always holds and skip the test. If, however, there is undefined
behaviour, then the compiler may infer any nonsense. This may look as if
the compiler detected undefined behaviour, but compilers typically do not
check the consistency of their inferences. In fact, intermediate results
and useless facts are quickly discarded, so detecting undefined
behaviour via inconsistency of inferred facts would significantly
increase memory use and probably also compile time.
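
For comparison, a sketch of a loop over the positive values that never relies on `i` wrapping to a negative number. The function name `count_positive` and the `limit` parameter are hypothetical, introduced only for this illustration. The loop ends through an explicit bound test, so `i++` is never evaluated when `i` equals `INT_MAX`:

```c
#include <limits.h>

/* Counts the values 1..limit without ever computing INT_MAX + 1.
   The loop exits via the explicit bound test instead of relying on
   i wrapping to a negative value. */
static long count_positive(int limit)
{
    long count = 0;
    for (int i = 1; i <= limit; i++) {
        count++;
        if (i == INT_MAX) {
            break;  /* stop before i++ could overflow */
        }
    }
    return count;
}
```

Here the compiler has no undefined behaviour to reason from, so the exit condition it sees is the one the programmer wrote.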

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

As already noted, the C standard explicitly forbids such behaviour.

BTW: There were past discussions of the same thing, and other people
quoted the relevant passage, which is quite explicit. I am not
going to search for it, but it is in the standard.

Is it? I am unable to locate where the standard _actually says
that it is_. That is my whole point.

Sorry, I looked in the place given by other people, but I do not
remember the exact location. I would say that once you find the right
place and read it carefully it is pretty clear.

And yet the standard does not say that. That is an
interpretation; I assume it is universally shared, but if we
want to limit ourselves to what the standard _actually says_ it
is woefully underspecified in this regard.

There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied.

Originally C was defined by a single implementation which did not
do much optimisation. But clearly, starting from the first
C standard, undefined behaviour had the same meaning as a Pascal
error: permission for compilers to optimize on the assumption that
it does not happen. The issue was well understood in the seventies.
Already in the late sixties Fortran compilers could do interesting
optimizations, not expected by naive users. In the seventies the
majority (or at least the most influential) view was that it is
the programmer's responsibility to obey the language rules and that
the compiler should optimize on the assumption that the rules are obeyed.
C reflects this point of view.

One can discuss whether such a point of view is valid now, but C
is a product of such thinking.

This is circular reasoning. You're saying that something that
is provably UB in this program cannot prevent that program from
being strictly conforming because the program is strictly
conforming.

What you wrote above is similar to the standard's wording, except that
the standard formulates it much better, closer to "code that would
cause undefined behaviour if executed does not prevent an otherwise
strictly conforming program from being strictly conforming".

In the past there were discussions about when an implementation can
reject a program. I do not remember what the conclusion was in the case
where the implementation can prove that the program must cause undefined
behaviour but the program otherwise violates no constraints. Probably
it can reject it, but I am not sure. But if there is any possibility
that the program may execute without undefined behaviour (including
when it contains code that would cause undefined behaviour if executed),
then the implementation should accept such a program.

This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.

As I wrote, there is a quite explicit statement in the standard
which says the opposite of what you wrote above: the mere presence of
code that would cause undefined behaviour if executed does not
make a program non-conforming.

In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere close as
precise as it should be.

I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.

I do not think operational semantics is the best way to define C.
A naive operational semantics would define too many things, so
one would need serious work to define what should be defined
and leave undefined what should be undefined.

My personal favorite is axiomatic semantics. IMO it is quite well
adapted to defining programming languages. A substantial part of
Pascal was given axiomatic semantics in the Alagic and Arbib book.
The Pascal standard does not explicitly use axiomatic semantics, but
my impression was that with manageable effort it could be rewritten
using axiomatic semantics. The modern C standard is bigger, partly due
to the library, partly because C has many more operators. But the
problem seems to be mostly the quantity of text needed.

I think that compiler writers would welcome axiomatic semantics;
it would make their work simpler. More precisely, compiler
writers must now themselves translate the standard text into a
formulation similar to axiomatic semantics. Having official semantics
would make their work simpler. It would prevent implementing
some optimizations based on misunderstanding of the standard,
but if such optimizations were deemed worthy they could
implement them as a nonstandard thing (like -ffast-math now)
and lobby for a change in the standard.

I think the biggest trouble is normal programmers. They already
struggle with the current standard text. A more formal presentation
could alienate even folks who are now able to explain the standard's
rules to other programmers.
--
Waldek Hebisch

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 8 18:51:58 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.

I started to compose a followup, but I found that I was mostly
repeating things I've already written.

I see no semantic difference between code in a function that's never
called and code that simply isn't in the program. Neither allows
an implementation to reject a strictly conforming program -- and
yes, the program we've been discussing is as strictly conforming as
`int main(void){}`.

There's nothing special about functions as units of a program
subject to undefined behavior. These two programs are semantically
equivalent:

    void foo(void) { do_something(); }
    int main(void) { foo(); }

and

    int main(void) { do_something(); }

A simpler demonstration program might be:

#include <limits.h>
int main(void) {
    return 0;
    INT_MAX+1;
}

I assert that it is strictly conforming.

The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)
It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 8 23:05:24 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? [...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

I have read through much of what has been said in the subthread
following this posting. I expect I will not be responding to much
of it; my overall sense is that the discussion is mostly confused.
I would like to say one thing here, and see if that helps things.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

The logic here is backwards. The C standard is prescriptive: it
says what _does_ happen, not what _doesn't_ happen. If one wants
to establish that some "action" takes place, it is necessary to
find a passage, or passages, in the C standard that, if all are
taken together, shows that the "action" occurs, or at least that it
can occur. The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur. If there is no such chain of reasoning, naming
the pertinent passages in the C standard, to establish a possible
call, then there is no possible call. In other words the burden of
proof for a claim that some action could occur rests on whoever is
making the claim; there is no need to look for something in the C
standard that says something cannot occur.

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Tue Jun 9 09:46:01 2026

From Newsgroup: comp.lang.c

In article <1107rk3$3ldg4$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.

I started to compose a followup, but I found that I was mostly
repeating things I've already written.

Yeah, I feel we're going around in circles, here.

I see no semantic difference between code in a function that's never
called and code that simply isn't in the program. Neither allows
an implementation to reject a strictly conforming program -- and
yes, the program we've been discussing is as strictly conforming as
`int main(void){}`.

That's the crux of the issue. I'm not convinced that it is. I
can see an argument for it (and it's a pretty strong one) but I
can see an argument against, and the standard as written is
underspecified in my opinion. Really, that's it.

There's nothing special about functions as units of a program
subject to undefined behavior. These two programs are semantically
equivalent:

    void foo(void) { do_something(); }
    int main(void) { foo(); }

and

    int main(void) { do_something(); }

A simpler demonstration program might be:

#include <limits.h>
int main(void) {
    return 0;
    INT_MAX+1;
}

I assert that it is strictly conforming.

The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)

That specific instance is not, no; that's in a note as you point
out. I believe deriving it from the normative text is based on
UB imposing no requirement at all on the implementation.

It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.

...assuming the program is strictly conforming.

I have arrived at the same place you are with your "42 is not an
expression" example. The wording of the standard could be
improved to avoid things like this.

- Dan C.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Tue Jun 9 10:19:21 2026

From Newsgroup: comp.lang.c

In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[snip]

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

The logic here is backwards. The C standard is prescriptive: it
says what _does_ happen, not what _doesn't_ happen.

The definition of undefined behavior in the standard says that
it _imposes no requirements._ It is explicit that it says it
mandates neither "what _does_ happen" nor "what _doesn't_
happen."

If one wants
to establish that some "action" takes place, it is necessary to
find a passage, or passages, in the C standard that, if all are
taken together, shows that the "action" occurs, or at least that it
can occur.

So you're saying that the proverbial nasal demons quip about UB
is incorrect, since it's not prescribed by the standard. Thanks
for clarifying that.

The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.

Actually, no, a reference to a function is not necessary. A
well-publicized issue in a C++ compiler a couple of years ago
was something along the lines of this:

```
#include <stdio.h>
void foo(void);
int
main(void)
{
    for (;;);
}

void
foo(void)
{
    printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

If there is no such chain of reasoning, naming
the pertinent passages in the C standard, to establish a possible
call, then there is no possible call. In other words the burden of
proof for a claim that some action could occur rests on whoever is
making the claim; there is no need to look for something in the C
standard that says something cannot occur.

See above.

- Dan C.


From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 15:07:54 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1107rk3$3ldg4$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)

That specific instance is not, no; that's in a note as you point
out. I believe deriving it from the normative text is based on
UB imposing no requirement at all on the implementation.

No, the standard imposes no requirements on the *behavior*.
It still imposes requirements on the implementation.

The requirements imposed on an implementation are of a different
kind than the requirements imposed on a running program.
(An implementation might not even be written in C.)

For example, if a program dies with a segfault, it's likely due to
the program having undefined behavior. If a compiler dies with a
segfault, it's always a bug in the compiler (though the standard
doesn't say this).

If, as I suggest, the word "behavior" ("external appearance or
action") refers only to the behavior of a running program, then I
don't see how the non-normative permission to terminate a translation
follows from any normative text.

One possible argument is the statement in Section 4 that "A
*conforming hosted implementation* shall accept any strictly
conforming program", which *might* imply that a conforming hosted
implementation is permitted to reject (not accept) any program that
is not strictly conforming. I'm not comfortable with that argument.

It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.

...assuming the program is strictly conforming.

Or deriving the fact that a program is strictly conforming by reading
the program and the definition of "strictly conforming program".

I have arrived at the same place you are with your "42 is not an
expression" example. The wording of the standard could be
improved to avoid things like this.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 15:12:42 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

Actually, no, a reference to a function is not necessary. A
well-publicized issue in a C++ compiler a couple of years ago
was something along the lines of this:

```
#include <stdio.h>
void foo(void);
int
main(void)
{
    for (;;);
}

void
foo(void)
{
    printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

[...]

That doesn't make sense to me. Do you have a citation to this incident,
and is it relevant to C?

There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),
but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.
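
A sketch of the distinction behind point (a), as I read the rule: the permission to assume termination covers only iteration statements whose controlling expression is not a constant expression and which perform no I/O, volatile accesses, synchronization, or atomic operations. An omitted controlling expression, as in `for (;;)`, is replaced by a nonzero constant, so that loop falls outside the rule. The function names below are hypothetical, chosen only for this illustration:

```c
/* Controlling expression omitted, hence a nonzero constant: the
   implementation may NOT assume this loop terminates. Not meant to
   be called; it is here only for contrast. */
void spin_forever(void)
{
    for (;;) {
    }
}

/* Controlling expression (n != 1) is not a constant expression, and the
   loop does no I/O, volatile access, or synchronization: the
   implementation MAY assume it terminates. Unsigned arithmetic wraps,
   so the body itself has no undefined behavior. */
unsigned collatz_steps(unsigned n)
{
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 != 0) ? 3 * n + 1 : n / 2;
        steps++;
    }
    return steps;
}
```

Neither loop form says anything about calling a function that the program never references.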
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Tue Jun 9 18:29:38 2026

From Newsgroup: comp.lang.c

On 2026-06-08 21:25, Waldek Hebisch wrote:

Dan Cross <cross@spitfire.i.gajendra.net> wrote:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

...

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

So there are two things that are at play here.

First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.

The committee has decided otherwise. The committee's resolution to DR
109 said:

"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be
called, the example given must be successfully translated by a
conforming implementation."

The module in question defined a function with a line that contained
the expression-statement

1/0;

and that statement was absolutely guaranteed to be executed if the
function was called. However, since the module did not contain any calls
to that function, the committee ruled that an implementation was not
allowed to refuse to translate it.

If linked to another module that contained a call to that function,
whether or not the implementation could refuse translation depends upon
what could be said about the call:

1. If the call to that function was guaranteed to be executed upon
starting the program, the implementation may refuse translation.

2. If the call to that function was guaranteed to never be executed, the
undefined behavior associated with 1/0 has no effect.

3. If the call to that function might or might not be executed, the
undefined behavior associated with 1/0 cannot have effect until
execution of that call becomes inevitable.
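
A minimal sketch of the third case, under stated assumptions: `foo` and `maybe_call_foo` are hypothetical names, and the divisor is kept in a local variable (rather than the literal `1/0`) only to avoid compile-time warnings. It tries to show that the translation unit has to be accepted, and that every execution in which `call_it` is false has fully defined behavior.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the function discussed above: the division
   by zero is undefined behavior only if this function is executed. */
int foo(void)
{
    int zero = 0;
    return 1 / zero;
}

/* Case 3: whether foo() runs depends on a value known only at run time.
   Executions where call_it is false never reach the division. */
int maybe_call_foo(bool call_it)
{
    if (call_it)
        return foo();   /* the undefined behavior becomes inevitable here */
    return 0;
}
```

Calling `maybe_call_foo(false)` returns 0; only a call with `true` reaches the division.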

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 16:01:14 2026

From Newsgroup: comp.lang.c

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

The committee has decided otherwise. The committee's resolution to DR
109 said:

"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be
called, the example given must be successfully translated by a
conforming implementation."

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_109.html

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 12:36:28 2026

From Newsgroup: comp.lang.c

In article <110a5vr$b2kq$5@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

The committee has decided otherwise. The committee's resolution to DR
109 said:

"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be
called, the example given must be successfully translated by a
conforming implementation."

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_109.html

[...]

That does appear to settle the matter definitively, thanks.

Ok, I was wrong and I concede that the program we've been
discussing is strictly conforming, regardless of however
antagonistic a reader of the standard may be.

- Dan C.

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 14:37:01 2026

From Newsgroup: comp.lang.c

In article <110a34q$b2kq$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:

```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}

void
foo(void)
{
printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

[...]

That doesn't make sense to me. Do you have a citation to this incident,

Yes: https://godbolt.org/z/d1WP4KP99

There was such an outcry when this was discovered that the C++
standard was modified to add a note explicitly allowing,
"trivial infinite loops, which cannot be removed or reordered."
https://eel.is/c++draft/intro.progress

That change is commit 29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e (https://github.com/cplusplus/draft/commit/29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e)
in response to P2809: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2809r3.html

and is it relevant to C?

Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),

The program above meets the criteria in sec 6.8.6.1 para 4 that
allows an implementation to assume that the loop terminates.
Godbolt link: https://godbolt.org/z/q46o5cYGM
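
Why the loop in weird.c cannot terminate in the abstract machine can be checked with a small sketch. The helper name `k_reaches_one` is hypothetical, and the function only replays a bounded number of iterations; the actual argument is the parity one stated in the comment.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic wraps modulo UINT_MAX + 1, which is even exactly
   when UINT_MAX is odd, so adding 2 never changes the parity of k. */
static_assert(UINT_MAX % 2u == 1u, "unsigned modulus is even");

/* Hypothetical helper: replay the first `steps` iterations of the loop
   from weird.c and report whether k ever equals 1. */
bool k_reaches_one(unsigned long steps)
{
    unsigned int k = 0;
    for (unsigned long i = 0; i < steps; i++) {
        if (k == 1)
            return true;
        k += 2;
    }
    return false;
}
```

Since k starts at 0 and stays even, `k != 1` is always true. The controlling expression is not a constant expression and the loop performs none of the listed operations, which is why the termination assumption is available to the implementation.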

but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.

Regehr alluded to this with his taxonomy of undefined functions.
For a function that is always undefined (a "Type 3" function), a
compiler is under no obligation to even produce a return
instruction for it, and the behavior of a call to such a
function is totally undefined. Nothing stops it from cascading
into whatever the linker happens to put after it.

Therefore, given UB, it is not necessary to have a reference to
some function in a program's source text in order for it to be
executed.
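
For readers who have not seen the taxonomy, here is a rough sketch of the three categories as I understand them. The function names are hypothetical and the bodies are illustrations only, not code taken from Regehr's article.

```c
#include <limits.h>

/* Type 1: behavior is defined for every input, because the two
   problematic cases are filtered out before dividing. */
int type1_div(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return 0;
    return a / b;
}

/* Type 2: defined for some inputs, undefined for others
   (b == 0, or a == INT_MIN with b == -1). */
int type2_div(int a, int b)
{
    return a / b;
}

/* Type 3: undefined for every input, since the signed addition always
   overflows. A compiler need not emit a return for such a function. */
int type3_overflow(int unused)
{
    (void)unused;
    int big = INT_MAX;
    return big + 1;
}
```

Only `type1_div` can be called with arbitrary arguments; `type2_div` is fine as long as the caller avoids the undefined inputs, and `type3_overflow` has no call whose behavior is defined.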

- Dan C.

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 18:30:53 2026

From Newsgroup: comp.lang.c

In article <110bsqd$9ab$1@reader1.panix.com>,
Dan Cross <cross@spitfire.i.gajendra.net> wrote:

In article <110a34q$b2kq$2@kst.eternal-september.org>,
[snip]
Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

Replying to myself here, but...this is another example of weird
behavior:

```
term% cat boo.c
#include <limits.h>

int
monstartup(void)
{
return INT_MAX + 1;
}

int
main(void)
{
return 0;
}
term% clang --version | sed 1q
FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2)
term% clang -Wall -Wextra -pedantic -pedantic-errors -pg -fsanitize=undefined -o boo boo.c
boo.c:6:17: warning: overflow in expression; result is -2'147'483'648 with type 'int' [-Winteger-overflow]
6 | return INT_MAX + 1;
| ~~~~~~~~^~~
1 warning generated.
term% ./boo
boo.c:6:17: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior boo.c:6:17
term%
```

(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)

- Dan C.

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 14:47:10 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110a34q$b2kq$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:

```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}

void
foo(void)
{
printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

[...]

That doesn't make sense to me. Do you have a citation to this incident,

Yes: https://godbolt.org/z/d1WP4KP99

There was such an outcry when this was discovered that the C++
standard was modified to add a note explicitly allowing,
"trivial infinite loops, which cannot be removed or reordered."
https://eel.is/c++draft/intro.progress

That change is commit 29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e (https://github.com/cplusplus/draft/commit/29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e)
in response to P2809: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2809r3.html

So the reason the behavior was conforming was that the behavior of
the infinite loop is undefined. I dislike the way the C++ standard
expresses this. It says "The implementation *may assume* that any
thread will eventually do one of the following" (emphasis added).
More on that later in the context of the similar C rule.

and is it relevant to C?

Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),

The program above meets the criteria in sec 6.8.6.1 para 4 that
allows an implementation to assume that the loop terminates.
Godbolt link: https://godbolt.org/z/q46o5cYGM

Right. ("for (;;);" in the original program does not.)

Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.

The rule in C is (6.8.6.1p4):

An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
- input/output operations
- accessing a volatile object
- synchronization or atomic operations.

`for (;;)` is treated as having a constant controlling expression.

This covers more cases than the C++ rule.
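
To make the conditions concrete, here is a hedged sketch of a loop that falls outside the rule. The name `spin_until` and its parameters are hypothetical. Because the controlling expression reads a volatile object on every iteration, the implementation is not given permission to assume that the loop terminates.

```c
#include <stdbool.h>

/* Hypothetical helper: spin until *stop becomes true or max_spins
   iterations have run, and return the number of iterations executed.
   The volatile read in the controlling expression is one of the
   operations listed in 6.8.6.1p4, so no termination assumption applies. */
unsigned long spin_until(const volatile bool *stop, unsigned long max_spins)
{
    unsigned long spins = 0;
    while (!*stop && spins < max_spins)
        spins++;
    return spins;
}
```

By contrast, a loop whose controlling expression is a constant expression, such as `while (1)`, is also excluded from the assumption, while a loop over ordinary non-volatile variables with no I/O or atomics is covered by it.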

I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".

In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:

#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}

the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined
behavior it could do anything.

A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, only to have it behave
very differently because of one obscure subclause in the standard.

but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.

Regehr alluded to this with his taxonomy of undefined functions.
For a function that is always undefined (a "Type 3" function), a
compiler is under no obligation to even produce a return
instruction for it, and the behavior of a call to such a
function is totally undefined. Nothing stops it from cascading
into whatever the linker happens to put after it.

Therefore, given UB, it is not necessary to have a reference to
some function in a program's source text in order for it to be
executed.

Of course. Given UB, anything can happen. There's nothing special
about a function that's never called in that context. It just
happens to be the way it showed up in the C++ incident.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 14:55:00 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

Replying to myself here, but...this is another example of weird
behavior:

```
term% cat boo.c
#include <limits.h>

int
monstartup(void)
{
return INT_MAX + 1;
}

int
main(void)
{
return 0;
}

[SNIP]

(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)

I agree that the program is strictly conforming.

I don't know the details, but I think "monstartup" is a special name,
and that the program would behave as expected if a different name
were used. Since "monstartup" is not reserved, an implementation
that visibly treats it specially is not conforming.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Wed Jun 10 15:11:46 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]

The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
[...]

This is comp.lang.c. My comments were only about C, and not
about C++. But of course you already knew that.

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 22:44:26 2026

From Newsgroup: comp.lang.c

In article <86ldcm82ql.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]

The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
[...]

This is comp.lang.c. My comments were only about C, and not
about C++. But of course you already knew that.

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

- Dan C.

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 16:19:34 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.1p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

Of course since the behavior is undefined, *anything* could happen.
I don't know what happened inside clang (or the minds of its
maintainers) that caused it to generate code that executes a
statement in the body of a function that's never called, but that's
just one of the infinitely many allowed behaviors. A quick look at the
generated code indicates that there's no x86-64 "retq" instruction
for either main() or hello(), and apparently control falls through
from the end of main() to the body of hello(). That seems weird.

It might just be a bug (but not one that, as far as I can tell,
violates the C standard).

A function whose body contains a construct that would have undefined
behavior if the function were called (not the case here) does not
cause undefined behavior if there are no calls to the function.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 23:32:47 2026

From Newsgroup: comp.lang.c

In article <110cmfk$116qm$3@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

Replying to myself here, but...this is another example of weird
behavior:

```
term% cat boo.c
#include <limits.h>

int
monstartup(void)
{
return INT_MAX + 1;
}

int
main(void)
{
return 0;
}

[SNIP]

(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)

I agree that the program is strictly conforming.

I don't know the details, but I think "monstartup" is a special name,
and that the program would behave as expected if a different name
were used. Since "monstartup" is not reserved, an implementation
that visibly treats it specially is not conforming.

That's why it's cheating: `monstartup` is a function called from
the C runtime when using the `gprof` profiler, before `main` is
called, and I just happen to know that the csu code will call a
function by that name if compiled with profiling enabled. Thus,
this program can tickle the UB in `monstartup` in some weird
configurations. This is outside of the domain of strictly
defined C, but it is the sort of thing that happens in the real
world. Caveat emptor.

- Dan C.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 16:49:09 2026

From Newsgroup: comp.lang.c

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal specifications.

Janis

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Jun 11 15:20:01 2026

From Newsgroup: comp.lang.c

In article <110eht5$1naub$5@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.

One hopes that a formal specification (that's a term of art, and
implies something that's mathematically precise) would be
accompanied by a commentary for more casual reading. However,
the truly precise, formal specification would be considered
definitive.

I think the odds of this ever happening for C are slim to none,
but it would be useful.

- Dan C.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 17:34:35 2026

From Newsgroup: comp.lang.c

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of a
programmer ("C" or else). - Semantics should be well defined, and then
clear to the programmer.

In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

[...]

So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern
compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.

Languages shall be sensibly and clearly defined. For bad designs (or
bad standards) the language or standard should be blamed, and not the
critics badly and inappropriately despised as ''"modern compilers are
evil" crowd''. - Programmers are at the final end of the "food chain".
And there's a lot of horrible pits in the C-language where programmers
"made the mistake" to fall in; don't blame them, neither the ones who
silently suffer nor the ones who shout out.

Janis

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 17:45:31 2026

From Newsgroup: comp.lang.c

On 2026-06-10 16:37, Dan Cross wrote:

[...]

Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

Wow, that's really fascinating! (In a bad sense.)

And (in clang) just an effect of the '-O1' (as I notice).

I may have missed the "programming language design" wisdom of the
past decades. Back then we had the conception that "optimization"
is a method to transform a program to a _functionally equivalent_
code (one that is faster, requires less memory, or some such).

Janis

[...]

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 18:08:39 2026

From Newsgroup: comp.lang.c

On 2026-06-11 17:20, Dan Cross wrote:

In article <110eht5$1naub$5@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.

One hopes that a formal specification (that's a term of art, and
implies something that's mathematically precise) would be
accompanied by a commentary for more casual reading.

Commentaries generally make sense, and they are one possibility
to serve the needs also of programmers. But a more formal text
would also help the authors of textbooks to provide a clearer
description for those programmers that are repelled by standards
papers.

However,
the truly precise, formal specification would be considered
definitive.

Yes. (That's what I intended to express.)

I think the odds of this ever happening for C are slim to none,
but it would be useful.

I agree. (And I don't wait for that; I'm taking "C" as it is.)

Janis

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Thu Jun 11 16:30:45 2026

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal specifications.

You snipped most of what I wrote. I certainly would prefer standard
that is less lawyerish and more mathematical, say written in similar
way to Pascal standard. But there is a _big_ gap between normal
mathematical text and a formal mathematical text (and let me note that
Pascal standard is less formal than normal mathematics). Normal
mathematical text depends on human understanding to disambiguate
and bridge small inconsistencies. Formal one has parts which
are there only because authors were not able to avoid
ambiguity in simpler way. And once things are written in a way
that is well fit to formalism they tend to be much less
understandable to uninitiated.
--
Waldek Hebisch

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 20:52:30 2026

From Newsgroup: comp.lang.c

On 2026-06-11 18:30, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That does not necessarily affect their willingness to accept more formal
specifications.

You snipped most of what I wrote.

Yes, because I acknowledged it by my above one-line remark already
(and I didn't want to waste space unnecessarily). (No offense!)

I intended to comment just on the one paragraph above, with its
assumption that it may be an inherent problem to programmers.

To elaborate only a bit more...
There's folks who have problems with "lawyer's speech" standards.
There's folks who have problems with formal mathematical standards.
But, as to my observation, there's *no* strict or natural hierarchy
that one would imply the other.

You said: "They already struggle with current standard text."
as if there would be a strict "one implies the other" fact; there
isn't one, or to be more cautious, "there isn't necessarily one".
(I used the wording "necessarily" already in my original comment.)

I certainly would prefer a standard
that is less lawyerish and more mathematical, say written in a similar
way to the Pascal standard. But there is a _big_ gap between normal
mathematical text and a formal mathematical text (and let me note that
the Pascal standard is less formal than normal mathematics).

I agree.

Normal
mathematical text depends on human understanding to disambiguate
and bridge small inconsistencies. A formal one has parts which
are there only because the authors were not able to avoid
ambiguity in a simpler way. And once things are written in a way
that is well suited to formalism they tend to be much less
understandable to the uninitiated.

(I'll leave that uncommented. - I've said all I intended to say.)

Janis


From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 13:29:10 2026

From Newsgroup: comp.lang.c

David Brown <david.brown@hesbynett.no> writes:
[...]

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result
of bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

How about a loop that has a non-constant condition, but that is
not expected to terminate in normal usage?

while (! something_really_bad_happened()) {
    sleep(1);
}
self_destruct();

A compiler could "assume" that the loop terminates, even if
something_really_bad never happens, and that assumption could result in
a call to self_destruct(). There are probably better ways to do that,
but it's straightforward code with seemingly obvious semantics that
an implementation is permitted to make unwarranted assumptions about.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 15:38:41 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of
the loop, and a compiler can elide the entire loop anyway.

[...]

As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.

Personally, I don't think C should be in the business of doing
such things. But it is what it is.

I agree.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Thu Jun 11 23:07:00 2026

From Newsgroup: comp.lang.c

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of

...unless 'i' or 'n' is modified in the body of

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 16:28:47 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.
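
For readers wondering why that loop cannot terminate in the abstract
machine: unsigned arithmetic wraps modulo 2^N, so starting from 0 and
adding 2 only ever produces even values, and 1 is never reached. What
follows is only a sketch of that argument. It uses an 8-bit counter as
a stand-in for `unsigned int` so that the whole cycle can be walked,
and the helper name `reaches` is hypothetical, not something from the
thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: starting at 0 and stepping by 2 with 8-bit
   wraparound, is `target` ever visited? */
bool reaches(uint8_t target)
{
    uint8_t k = 0;
    do {
        if (k == target)
            return true;
        k = (uint8_t)(k + 2);   /* wraps modulo 256 */
    } while (k != 0);           /* back at 0: the full cycle was walked */
    return false;
}
```

Under these assumptions `reaches(1)` is false while `reaches(254)` is
true; the same parity argument applies to `unsigned int` of any width.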

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

[SNIP]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Jun 11 23:46:14 2026

From Newsgroup: comp.lang.c

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

- Dan C.


From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 17:43:52 2026

From Newsgroup: comp.lang.c

scott@slp53.sl.home (Scott Lurndal) writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of

...unless 'i' or 'n' is modified in the body of

Touché.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 18:29:54 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

No, those are not the two choices. An assumption made by an
implementation is not behavior ("external appearance or action").
An implementation might invoke some behavior as a result of some
assumption.

If a loop doesn't terminate and the implementation assumes that
it does, the standard says nothing about the resulting behavior.
It doesn't provide two or more options for the actual behavior.
That's classic UB.

We've seen cases here where the actual behavior is falling through
into a function that's never called. That's certainly not a
possibility provided by the standard.
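
For contrast, here is a sketch of something that is unspecified in the
standard's sense: the order in which the operands of `+` are evaluated.
The standard allows either order, so there really are two
possibilities to choose from, and nothing else may happen. All names
below are made up for illustration.

```c
/* Records the order in which the two calls below actually ran. */
static int call_order[2];
static int calls_made;

static int left(void)  { call_order[calls_made++] = 1; return 10; }
static int right(void) { call_order[calls_made++] = 2; return 20; }

/* The operands of + may be evaluated in either order (unspecified),
   but both calls happen and the sum is the same either way. */
int sum_in_some_order(void)
{
    calls_made = 0;
    return left() + right();
}
```

A conforming implementation may record {1, 2} or {2, 1} in
`call_order`, but it must pick one of those and return 30. A loop that
is wrongly assumed to terminate comes with no such list of permitted
outcomes.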
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Fri Jun 12 01:54:09 2026

From Newsgroup: comp.lang.c

In article <110fnem$1s3nm$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

No, those are not the two choices. An assumption made by an
implementation is not behavior ("external appearance or action").
An implementation might invoke some behavior as a result of some
assumption.

If a loop doesn't terminate and the implementation assumes that
it does, the standard says nothing about the resulting behavior.
It doesn't provide two or more options for the actual behavior.
That's classic UB.

We've seen cases here where the actual behavior is falling through
into a function that's never called. That's certainly not a
possibility provided by the standard.

Ok, fair point.

- Dan C.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Fri Jun 12 02:02:51 2026

From Newsgroup: comp.lang.c

In article <110fddl$1pooi$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of
the loop, and a compiler can elide the entire loop anyway.

Yes. Scott alluded to the rest; what if the actual body had set
the exit condition for the loop, and had been optimized away?

For example, given `DOTHING` as above:

for (int i = 0; i < n; ) {
    if (DOTHING) {
        // Something complex here...
        i++;
    }
}

Here, as before, the compiler is allowed to assume that the loop
_would_ terminate, and thus elide it, as before. Of course, it
is not forced to _guarantee_ that happens because it can't solve
the halting problem.
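
A self-contained version of that fragment, as a sketch only: the
function name `run_loop` and the `done` counter are hypothetical
additions so that there is something to observe. Built without
`DEBUG`, any call with n > 0 never leaves the loop in the abstract
machine, which is the case the "may be assumed to terminate" wording
covers. Calling it with n <= 0 is well defined in either configuration.

```c
#include <stdbool.h>

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

/* Hypothetical wrapper around the loop above; returns how many
   iterations actually did the "something complex" step. */
int run_loop(int n)
{
    int done = 0;
    for (int i = 0; i < n; ) {
        if (DOTHING) {
            done++;   /* stands in for "something complex" */
            i++;      /* the only thing that advances the loop */
        }
    }
    return done;
}
```

With `DEBUG` defined, `run_loop(3)` does three iterations. Without it,
only calls whose loop body never runs, such as `run_loop(0)`, are safe
to reason about.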

[...]

As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.

Personally, I don't think C should be in the business of doing
such things. But it is what it is.

I agree.

Yup.

It is one of the reasons C is no longer my favorite language.

- Dan C.


From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Fri Jun 12 02:20:11 2026

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-11 18:30, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That does not necessarily affect their willingness to accept more formal
specifications.

You snipped most of what I wrote.

Yes, because I acknowledged it by my above one-line remark already
(and I didn't want to waste space unnecessarily). (No offense!)

I intended to comment just on the one paragraph above, with its
assumption that it may be an inherent problem to programmers.

But this paragraph was closely linked to the text above. Dan Cross
wanted formal semantics and my paragraph was responding to this.
I think that the lawyerish style of the current C standard is mostly
inertia, and making the standard more mathematical would improve it.
But giving formal semantics in the standard would mean a significantly
bigger change.

To elaborate only a bit more...
There's folks who have problems with "lawyer's speech" standards.
There's folks who have problems with formal mathematical standards.
But, as to my observation, there's *no* strict or natural hierarchy
that one would imply the other.

You said: "They already struggle with current standard text."
as if there would be a strict "one implies the other" fact; there
isn't one, or to be more cautious, "there isn't necessarily one".
(I used the wording "necessarily" already in my original comment.)

I certainly would prefer a standard
that is less lawyerish and more mathematical, say written in a similar
way to the Pascal standard. But there is a _big_ gap between normal
mathematical text and a formal mathematical text (and let me note that
the Pascal standard is less formal than normal mathematics).

I agree.

Normal
mathematical text depends on human understanding to disambiguate
and bridge small inconsistencies. A formal one has parts which
are there only because the authors were not able to avoid
ambiguity in a simpler way. And once things are written in a way
that is well suited to formalism they tend to be much less
understandable to the uninitiated.

(I'll leave that uncommented. - I've said all I intended to say.)

Janis

--
Waldek Hebisch

From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jun 12 10:58:07 2026

From Newsgroup: comp.lang.c

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of a
programmer ("C" or else). - Semantics should be well defined, and then
clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not be
known to all C programmers, but I think they are only relevant in a very
small number of cases.

In my experience, infinite loops are generally very clearly written -
either as "for (;;)" loops or "while (true)" loops - or they are the
result of bugs in the code that accidentally run forever. If the loop
is accidentally infinite, the programmer will already be expecting it
to run the code after the loop.

[...]

So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the
"modern compilers are evil" crowd, I really do not see it as an issue
in practice. There are many other mistakes programmers can make, or
UB that they hit accidentally - this is a drop in the ocean IMHO.

Languages shall be sensibly and clearly defined. For bad designs (or
bad standards) the language or standard should be blamed, and not the
critics badly and inappropriately despised as ''"modern compilers are
evil" crowd''. - Programmers are at the final end of the "food chain".
And there's a lot of horrible pits in the C-language where programmers
"made the mistake" to fall in; don't blame them, neither the ones who
silently suffer nor the ones who shout out.

I agree that standards should be clear, and standards documents should
be held accountable if they are not. There's no doubt that the C
standards are not perfect (Keith's "42 is not an expression" is an
example of that).

But it is less obvious that the language should be blamed for bad
design. As a wise man here said, "C is what it is". The reasons for
design decisions might be lost to history, inappropriate for a modern
language, or forced for compatibility reasons - but the language stands
with the rules it has. I don't know of anyone who uses a mainstream
programming language for serious work and does not think at least some
of its design decisions are bad - "bad" is highly subjective, depending
on both the programmer and the type of work they do. Just like for any
programming language, if you are programming in C, then you need to be
aware of the pitfalls of C or steer well clear of where pitfalls might be.

Ultimately, programming languages are subject to the equivalent of
market forces - the choice of language to use for a particular task is a
matter of weighing up what you think are the good and bad points for
available alternatives. As the incumbent in many situations, C of
course has an unfair advantage - but with enough incentive, people move
to other languages with their own benefits, disadvantages, and "bad"
design decisions. This is a slow process, but it is the only way forward.

As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

It is always best if compilers are able to warn you about problems in
your code - such as UB - and avoid surprising results. But I don't
think it is practical to expect them to catch everything, and too many
warnings will flood you with false positives. (gcc used to have a
warning for when code was elided - as the compiler got stronger and
gained more optimisations, the warning was dropped because eliding code
happened far too often to warn about.)


From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jun 12 11:02:24 2026

From Newsgroup: comp.lang.c

On 11/06/2026 22:29, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result
of bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

How about a loop that has a non-constant condition, but that is
not expected to terminate in normal usage?

while (! something_really_bad_happened()) {
    sleep(1);
}
self_destruct();

A compiler could "assume" that the loop terminates, even if
something_really_bad never happens, and that assumption could result in
a call to self_destruct(). There are probably better ways to do that,
but it's straightforward code with seemingly obvious semantics that
an implementation is permitted to make unwarranted assumptions about.

The compiler can only assume that if it knows that the controlling
expression - the call to "something_really_bad_happened()" - does not
contain any IO operations, volatile accesses or atomic operations.


From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jun 12 11:37:39 2026

From Newsgroup: comp.lang.c

On 12/06/2026 01:46, Dan Cross wrote:

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
      |                                                          ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.1p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

- Dan C.

I think perhaps there are both undefined and unspecified aspects here.

The implementation may assume the loop terminates - that means, to me,
that there are no requirements for what happens if the loop does not
terminate. Not terminating would be UB.

However, I don't support clang's reasoning after that in this case. As
I see it, a compiler can reason that the loop terminates and then
executes "return 0;" because the non-terminating situation is UB and
cannot occur. Thus it can skip the loop and go straight to "return 0;".
Alternatively, it can reason that the non-terminating situation is UB
and we don't care what happens if it does not terminate - so "return 0;"
would be fine in that case too, simplifying the generated code.

But it seems that clang is reasoning that it can assume the loop
terminates, and it can prove that the loop does not terminate, and this
contradiction means that anything is allowed (including skipping all
code generation). The code has two conflicting semantics - it is an
infinite loop, and it is a terminating loop. I think the standards say
that the compiler /may/ consider the terminating loop interpretation as
correct, thus giving just "return 0;", or it may choose not to consider
that it terminates, and generate an infinite loop. Clang appears to
think that it can pick both options at once, which would give
contradictory behaviour, and therefore jump straight to UB.

I would say that the best behaviour for a compiler here would be to give
a warning, then it should pick one or the other of the defined behaviours.
(gcc picks the infinite loop, but does not give any warning.) I cannot
say for sure that clang's behaviour is incorrect - but it is certainly
very unhelpful and poor quality of implementation.

(I also think that it makes sense for compilers to use the "ud2" or
similar "undefined behaviour" trap instructions in cases where they know
an execution path is definitely UB and doing so does not affect the
efficiency of non-UB paths.)


From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Fri Jun 12 12:27:00 2026

From Newsgroup: comp.lang.c

David Brown <david.brown@hesbynett.no> writes:

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.

I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
is specified in terms of what an implementation may "assume", not in
terms of the semantics of the program. One can conclude that this
means that the program has undefined behavior if the assumption is
violated, but that's not directly stated. I don't know how many C
programmers know the standard well enough to reach that conclusion.
I'm not even 100% sure it's accurate.

The permission was added in C11 with little fanfare. It's not
mentioned in the list of major changes in the C11 Foreword.
The cases where it applies may be rarer than I had assumed, but
it at least has the potential to break existing code that was well
defined in C99.

The rationale is to provide more opportunities for optimization,
but it's not at all clear (at least to me) that it's particularly
successful. If cases where it can cause problems are rare, then
presumably cases where it's actually useful are rare. (That may
be an oversimplification.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jun 13 12:36:15 2026

From Newsgroup: comp.lang.c

On 12/06/2026 21:27, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.

I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
is specified in terms of what an implementation may "assume", not in
terms of the semantics of the program. One can conclude that this
means that the program has undefined behavior if the assumption is
violated, but that's not directly stated. I don't know how many C
programmers know the standard well enough to reach that conclusion.
I'm not even 100% sure it's accurate.

The permission was added in C11 with little fanfare. It's not
mentioned in the list of major changes in the C11 Foreword.
The cases where it applies may be rarer than I had assumed, but
it at least has the potential to break existing code that was well
defined in C99.

The rationale is to provide more opportunities for optimization,
but it's not at all clear (at least to me) that it's particularly
successful. If cases where it can cause problems are rare, then
presumably cases where it's actually useful are rare. (That may
be an oversimplification.)

I agree on that last point. I doubt if any code would suffer if the
paragraph were removed entirely from the standard. And while I also
don't think much real-world code is at risk of problems from its
inclusion in the standard, as long as there is some risk of problems
with existing correct code, or some risk of confusion or
misunderstanding on the part of programmers reading the standard, then
it would be better if that paragraph had not been added to the standard
at all.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Jun 13 12:02:24 2026

From Newsgroup: comp.lang.c

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

- Dan C.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Jun 13 12:03:47 2026

From Newsgroup: comp.lang.c

In article <110hmi7$2e85g$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

David Brown <david.brown@hesbynett.no> writes:

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.

I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
is specified in terms of what an implementation may "assume", not in
terms of the semantics of the program. One can conclude that this
means that the program has undefined behavior if the assumption is
violated, but that's not directly stated. I don't know how many C
programmers know the standard well enough to reach that conclusion.
I'm not even 100% sure it's accurate.

The permission was added in C11 with little fanfare. It's not
mentioned in the list of major changes in the C11 Foreword.
The cases where it applies may be rarer than I had assumed, but
it at least has the potential to break existing code that was well
defined in C99.

Another example of something that was previously well-defined
and is now UB, I guess. :-/

The rationale is to provide more opportunities for optimization,
but it's not at all clear (at least to me) that it's particularly
successful. If cases where it can cause problems are rare, then
presumably cases where it's actually useful are rare. (That may
be an oversimplification.)

I'm not sure that's the rationale: rather, it's to guarantee
forward progress. Again, that's not really the language's
purview.

- Dan C.


From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.c on Sat Jun 13 12:13:13 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and

Might like to have a look at the video

"Garbage In, Garbage Out, Arguing about Undefined Behavior
with Nasal Demons" (2016) by Chandler Carruth.

IIRC it essentially takes the point of your friend, but maybe adds
some explanations. At 15' in, it discusses the suggestion to
"define all the behavior". It's for C++, but I think some of it
might apply to C as well. At 24' come some examples.


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Jun 13 12:44:29 2026

From Newsgroup: comp.lang.c

In article <video-20260613131240@ram.dialup.fu-berlin.de>,
Stefan Ram <ram@zedat.fu-berlin.de> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and

Might like to have a look at the video

"Garbage In, Garbage Out, Arguing about Undefined Behavior
with Nasal Demons" (2016) by Chandler Carruth.

IIRC it essentially takes the point of your friend, but maybe adds
some explanations. At 15' in, it discusses the suggestion to
"define all the behavior". It's for C++, but I think some of it
might apply to C as well. At 24' come some examples.

I'm not a huge fan of Carruth.

- Dan C.


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Jun 13 14:57:52 2026

From Newsgroup: comp.lang.c

On 2026-06-12 04:20, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-11 18:30, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From my own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as has been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That does not necessarily affect their willingness to accept more
formal specifications.

You snipped most of what I wrote.

Yes, because I acknowledged it by my above one-line remark already
(and I didn't want to waste space unnecessarily). (No offense!)

I intended to comment just on the one paragraph above, with its
assumption that it may be an inherent problem to programmers.

But this paragraph was closely linked to the text above. Dan Cross
wanted formal semantics and my paragraph was responding to this.
I think that the lawyerish style of the current C standard is mostly
inertia, and making the standard more mathematical would improve it. But
giving formal semantics in the standard would mean a significantly bigger
change.

Yes, you said that, and I had acknowledged that; meanwhile twice.

I'm not sure why you persistently insist on any relation to your
previous text when all that *I* wanted to comment on in your post
was just _one aspect_ in your last paragraph, which was:

I think biggest trouble is normal programmers.
They already struggle with current standard text.

And I expressed that I refute that view and I explained my view.

If you think your statement about "normal programmers" (whatever
you imply with "normal") is correct and my perception with people
is in any way wrong we can discuss that.

(On your other text I see nothing that we'd need to discuss.)

Janis

[...]


From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jun 13 18:32:24 2026

From Newsgroup: comp.lang.c

On 13/06/2026 14:02, Dan Cross wrote:

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)

But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.

Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

I think Regehr has made some good points in his writings, but I do not
agree with him on everything.

As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.

It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.

However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as a uint32_t for some reason. Casting a float * to a uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that. I am
happy knowing that I cannot divide by 0, or find the square root of a
negative number (in the real domain). I am happy knowing that I cannot
add two ints if their sum overflows the range of their type, and that I
cannot call a function with a different number or type of parameters
than its definition. I have a great deal of difficulty seeing how
things could be any different, other than in a managed language with
significant overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.

Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.
It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.

(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.

A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgeable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.
In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.
It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

- Dan C.

Agreed.

--- Synchronet 3.22a-Linux NewsLink 1.2

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sun Jun 14 14:33:33 2026

From Newsgroup: comp.lang.c

In article <110k0mp$329k6$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 13/06/2026 14:02, Dan Cross wrote:

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.

Yes.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)

But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.

Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

Even once it was a part of C, the concept was communicated
poorly.

Some people seem to delight in this, believing precision in
interpreting the standard in abstruse ways is an expression of
deep technical expertise; but it really is not.

Yes, UB is created by programmers. However, in large systems,
it may be that it was created inadvertently; someone makes a
change that subtly invalidates some invariant that an unknown
caller far away in the code base (or in another one that relies
on the change via an indirect dependency) and now you've got UB;
locally, everything appears correct; but it's the combination
where the UB manifests.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

I think Regehr has made some good points in his writings, but I do not
agree with him on everything.

As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.

This example makes little sense to me. If you don't want
integer overflow, then don't overflow; the techniques for
avoiding it are pretty well known. But why is it specifically
better that it is UB, rather than trapping in debug
builds, or having IB semantics based on the underlying machine?
It seems to me that the burden on the programmer is the same.
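
For readers who have not seen those techniques, here is a minimal sketch of one of them: test against the limits before adding, so the overflowing addition never executes. The name `checked_add` and its signature are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *sum and returns true only if the
   mathematical sum fits in an int. The range test happens before the
   addition, so no signed overflow is ever evaluated. */
bool checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);   /* caller precondition, checked in debug builds */

    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;      /* would overflow; the caller decides what to do */
    }
    *sum = a + b;
    return true;
}
```

The same check is needed whether overflow is UB, a trap, or wrapping; the definitions differ only in what happens when the check is forgotten.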

It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.

I disagree. I think almost all non-trivial programs have UB to
a greater or lesser extent, whether they intend to or not.

However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as a uint32_t for some reason. Casting a float * to a uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.
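
A minimal sketch of the memcpy() variant described above, assuming a platform where float is 32 bits wide. The function name `float_bits` is hypothetical; whether the copy compiles down to a single register move depends entirely on the implementation, as the paragraph says.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: returns the object representation of f.
   memcpy copies bytes, so no float object is accessed through a
   uint32_t lvalue and the aliasing rules are not violated. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an implementation using IEEE 754 single precision, `float_bits(1.0f)` yields 0x3F800000. The UB version would be `*(uint32_t *)&f`.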

This is kind of my point. If you need a fast way to convery

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.

UB is literally the opposite of well-defined.

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

or find the square root of a negative number (in the real
domain).

Yup. That should be a trap.

I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,

Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).

and that I cannot call a function with a different number or
type of parameters than its definition.

Yup. That should be a compile-time error.

I have a great deal of difficulty seeing how things could be
any different, other than in a managed language with significant
overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.

There are existence proofs of other languages that can, and do,
do these things, and do them well. I hate to keep beating this
drum, but I think Rust does well here: in safe Rust, UB is a
compile-time error; in *unsafe* Rust, there are tools to help
find where programmers violate the language's invariants.

Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.

I never said that it did.

It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.

So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.

(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)

It's not just that it's nowhere close to being as well-defined
as it should be, it's because the language as defined permits
behavior that varies far too widely, specifically because of UB.

Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.

That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.

This is one reason the committee is trying to rein some of this
in.

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.

A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgeable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.

I think that's simplistic; not many programmers actively want to
"avoid the consequences of their mistakes." Do you really
believe that they do? If so, why?

Conversely, there *is* this kind of machismo attitude among many
C programmers that it requires a superior intellect to truly
understand this language, and those who do not (or who make any
mistake in their understanding) are simply unworthy. I have
repeatedly observed this over many decades now, and when I see
it, I think that it is odious.

My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.

In fairness, I think the current members of the committee
recognize this.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.

Obviously in a production setting tools should be tested and
qualified. But the danger posed by UB adds unacceptable risk on
large projects, and the burden for updating a toolchain is too
high. That is as much an indictment of the language as of any
particular project.

As a counter example, there was the Harvey project, which was a
fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
C; we accounted for this by having CI build with 6 separate
compilers; this flushed out a lot of bugs.

I am surprised that more projects do not adopt canary CI builds
against newer toolchains.

In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).

Fun fact: part of the reason Google got involved in clang and
LLVM development was because the vendor toolchain for a
particular microcontroller used in Android phones was buggy and
would crash (that is, the compiler itself crashed). The
solution was not to live with it; it was to build a better
toolchain.

Google could afford to do that; I recognize not many
organizations can.

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.

See above. Those people may well have written the code before C
was standardized and before UB as we know it now existed. Also,
by definition UB is not an error.

It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.

Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.

_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
when written, but `b` is a signed int. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int` and
there exist values of `a` and `b` where `a*b` overflows: UB.
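
The type change can be made visible without triggering any overflow. The following is a minimal sketch, assuming a C11 compiler and a platform where `int` is wider than `unsigned short`; the typedef and function names are hypothetical stand-ins for the "before" and "after" of such a change.

```c
#include <assert.h>

/* Evaluates to 1 if the expression has type int, 0 otherwise.
   _Generic inspects only the type; its operand is never evaluated. */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

typedef unsigned int   count_before_t;  /* hypothetical original typedef */
typedef unsigned short count_after_t;   /* hypothetical narrowed typedef */

/* Assumption stated in the text above: int can hold every unsigned short. */
static_assert(IS_INT((count_after_t)0 * 0),
              "assumes int is wider than unsigned short");

int product_is_int_before(void)
{
    count_before_t a = 0;
    int b = 0;
    return IS_INT(a * b);   /* b converts to unsigned int: wraps, no UB */
}

int product_is_int_after(void)
{
    count_after_t a = 0;
    int b = 0;
    return IS_INT(a * b);   /* a promotes to int: overflow is now UB */
}
```

The expression `a * b` is spelled identically in both functions; only the typedef differs, which is why a local review of the multiplication would not catch it.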

The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.

Whose fault is that?

And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

Agreed.

...but be careful blaming the programmer.

- Dan C.


From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.c on Sun Jun 14 17:22:22 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

I'm not a huge fan of Carruth.

(Text after "| " below was generated by a chatbot asked to explain
narrow contracts and the reduction of efficiency by defining UB.)

(Let me guess: You are not a huge fan of chatbots either!
Ok, that was easy.)

Chandler talked about how narrow contracts allow optimizations.

| - Wide Contract: The function guarantees to handle all possible inputs
| gracefully, usually by returning an error code or throwing an
| exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
|
| - Narrow Contract: The function only guarantees correct behavior if
| the caller meets specific preconditions. If the preconditions are
| violated, the behavior is undefined.
|
| When is it appropriate to have a narrow contract? Always, when
| performance, memory footprint, or direct hardware control are
| paramount. In operating system kernels, embedded systems, real-time
| applications, and high-performance computing, the overhead of
| validating every pointer, checking every array bound, and verifying
| every integer range is unacceptable. C assumes the programmer is
| competent and knows the state of their own data. Narrow contracts
| shift the burden of correctness from runtime execution to compile-time
| reasoning and programmer discipline.

Chandler also explained how defining the behavior of certain UB operations
would require less efficient code to be generated.

| The hardware: Some architectures silently wrap on overflow, some trap
| and halt the CPU, and some have no concept of the operation at all.
| Forcing a single, defined behavior (like "always wrap around") would
| require compilers to insert expensive emulation code on architectures
| that don't support it natively, destroying C's "trust the hardware"
| philosophy.
|
| Or, consider a loop:
|
| for (int i = 0; i < n; i++) {
| arr[i] = 0;
| }
|
| If out-of-bounds array access had defined behavior, the compiler would
| have to insert a bounds check ("if (i >= array_length)") on every single
| iteration. Because out-of-bounds access is UB, the compiler can assume
| n is always within bounds. This allows it to unroll the loop,
| vectorize it using SIMD instructions, and process 8 or 16 elements per
| CPU cycle, yielding massive performance gains.

Well, there are some tests that can be taken out of loops (as
in Java), but other tests can't.
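
A minimal sketch of the first case, where the bounds test can be taken out of the loop because the index range is known before the loop starts. The function name `zero_prefix` and the explicit `len` parameter are assumptions made for illustration; C arrays do not carry their length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: zeroes the first n elements of arr, which holds
   len elements. Returns false, touching nothing, if n exceeds len. */
bool zero_prefix(int *arr, size_t len, size_t n)
{
    assert(arr != NULL);   /* narrow part: precondition, debug builds only */

    if (n > len) {
        return false;      /* wide part: one bounds test, hoisted out of the loop */
    }
    for (size_t i = 0; i < n; i++) {
        arr[i] = 0;        /* i < n <= len, so every access is in bounds */
    }
    return true;
}
```

A test cannot be hoisted this way when the index is computed from data inside the loop body, for example `arr[idx[i]] = 0`, where each `idx[i]` has to be checked individually.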


From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Jun 14 22:02:50 2026

From Newsgroup: comp.lang.c

On 14/06/2026 16:33, Dan Cross wrote:

In article <110k0mp$329k6$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 13/06/2026 14:02, Dan Cross wrote:

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.

Yes.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)

But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.

Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

Even once it was a part of C, the concept was communicated
poorly.

It is certainly the case that C code has been written for a long time.
And it is certainly the case that some C code was written long ago, and
is still used on systems today. But I think it is important to keep in
mind that the solid majority of C code is relatively recent. Very
little pre-C90 code is ever compiled with modern tools. Code that is
old and still in use is important code, but modern code and modern tools
should not be kept back because of it.

Maybe there is scope for compilers to have better options for handling
old code, other than the usual "Use -O0 to avoid optimising on UB"
solution. You could come a long way with a "treat all variables as
volatile" flag, for example.

Some people seem to delight in this, believing precision in
interpreting the standard in abstruse ways is an expression of
deep technical expertise; but it really is not.

Agreed.

Yes, UB is created by programmers. However, in large systems,
it may be that it was created inadvertently; someone makes a
change that subtly invalidates some invariant that an unknown
caller far away in the code base (or in another one that relies
on the change via an indirect dependency) and now you've got UB;
locally, everything appears correct; but it's the combination
where the UB manifests.

That can certainly happen. But that's just bugs in the code. I don't
see why UB should be considered as something special here. People
making changes to existing code sometimes misunderstand things, or
accidentally break something that worked before. That's life as a
programmer, and there are techniques to reduce the risk - code reviews,
linters, testing regimes, etc. Nothing gives 100% guarantees, and
everything has to weigh risks, consequences, costs and resources. UB is
not special here.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

I think Regehr has made some good points in his writings, but I do not
agree with him on everything.

As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.

This example makes little sense to me. If you don't want
integer overflow, then don't overflow; the techniques for
avoiding it are pretty well known. But why is it specifically
better that it is UB, rather than trapping in debug
builds, or having IB semantics based on the underlying machine?
It seems to me that the burden on the programmer is the same.

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen. If signed integer overflow were
defined as wrapping, then compilers could not put in traps to catch the
errors because as far as the language is concerned, they are not errors.
If they are defined as causing traps, then that's the semantics -
compilers could not optimise code assuming overflow does not happen,
unless it can prove there is no overflow.

And making it defined behaviour gives programmers the mistaken idea that
they don't need to avoid overflow because there is no UB.

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and it
allows tools to help programmers avoid these mistakes, and it allows
compilers to give programmers the most efficient results from known good
code rather than adding unnecessary run-time checks that are never
triggered.
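
A small, commonly cited illustration of that choice, offered as a sketch rather than a claim about any particular compiler version. The function name is hypothetical. With GCC or Clang, `-fwrapv` selects wrapping, `-ftrapv` or `-fsanitize=signed-integer-overflow` selects run-time detection, and the default leaves the compiler free to assume the overflow never happens.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical example. Precondition: x < INT_MAX.
   When signed overflow is undefined, a compiler may fold the comparison
   to 1 for every valid x. Under wrapping semantics it could not, because
   INT_MAX + 1 would wrap to INT_MIN and the result would be 0. */
int next_is_greater(int x)
{
    assert(x < INT_MAX);   /* the narrow contract, checked in debug builds */
    return x + 1 > x;
}
```

For every input that satisfies the precondition the three choices agree; they differ only for the input that was a bug to begin with.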

It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.

I disagree. I think almost all non-trivial programs have UB to
a greater or lesser extent, whether they intend to or not.

However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as a uint32_t for some reason. Casting a float * to a uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.
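A minimal sketch of the two non-UB approaches mentioned above, memcpy() and a type-punning union. The function names are hypothetical, and the code assumes float is 32 bits wide, which the C standard does not guarantee:

```c
#include <stdint.h>
#include <string.h>

/* Return the object representation of a float as a 32-bit unsigned
   integer. Assumes sizeof(float) == sizeof(uint32_t); the static
   assertion rejects platforms where that does not hold. */
uint32_t float_bits(float f)
{
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "float must be 32 bits wide");
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copies the bytes; no aliasing violation */
    return u;
}

/* The same conversion through a union. C (unlike C++) allows reading a
   union member other than the one most recently stored. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}
```

Whether the memcpy() call is compiled to a single register move or to a real library call depends on the compiler and its options, which is exactly the trade-off under discussion.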

This is kind of my point. If you need a fast way to convert

(I think you missed a bit of your answer here?)

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.

UB is literally the opposite of well-defined.

I want good definitions of things that should be defined. Things that
cannot have good definitions, are fine left undefined. A language
standard should not be trying to define the behaviour of /everything/.

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

or find the square root of a negative number (in the real
domain).

Yup. That should be a trap.

For some programs, yes. For others, no.

I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,

Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).

I agree that wrapping semantics should be something you have to ask for.
(As an aside, I think it is a mistake for languages to have types that
have wrapping semantics - it's the operations that should wrap, not the
types. Zig gets it right by distinguishing between "x + y" and "x +% y".)
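To illustrate "the operation, not the type, decides what happens on overflow" in C terms, here is a rough sketch. The function names are hypothetical; the checked version tests the operands before adding, so the overflowing addition is never executed:

```c
#include <limits.h>
#include <stdbool.h>

/* Checked addition: stores a + b in *out and returns true, or returns
   false (leaving *out untouched) if the sum is not representable. */
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Explicitly wrapping addition, done on unsigned operands where
   wrap-around (arithmetic modulo UINT_MAX + 1) is defined by the
   language. */
unsigned add_wrapping(unsigned a, unsigned b)
{
    return a + b;
}
```

The caller picks the semantics by picking the function, which is roughly what Zig's "x + y" versus "x +% y" expresses at the operator level.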

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for it.

and that I cannot call a function with a different number or
type of parameters than its definition.

Yup. That should be a compile-time error.

There I agree entirely. The build model of compiling units to separate
object files without any information beyond symbol names made sense 50
years ago - we should be doing far better now. (We /can/ do far better,
but it requires conventions in the way you write your C code and the
options used when compiling or linting the program.)

I have a great deal of difficulty seeing how things could be
any different, other than in a managed language with significant
overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.

There are existence proofs of other languages that can, and do,
do these things, and do them well. I hate to keep beating this
drum, but I think Rust does well here: in safe Rust, UB is a
compile-time error; in *unsafe* Rust, there are tools to help
find where programmers violate the language's invariants.

Certainly it is possible to eliminate a number of things that are UB in
C. UB that is not necessary, or not useful, is a bad thing in a language.

But I think it is equally bad to give things a definition simply to be
able to say there is no UB. It is, IMHO, entirely /wrong/ of a language
to define integer overflow as wrapping simply so that it is not UB. I
do not see a guaranteed incorrect result that likely has catastrophic
consequences in a program as being better than UB. (I believe Rust
defines integer overflow as trapping in "debug" mode and wrapping in
"release" mode, which I think is a horrendous idea.)

Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.

I never said that it did.

I didn't say you said it did :-)

It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.

So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.

I am all in favour of compile-time checks and rejecting code with errors
(not just UB) as soon as possible. The "perfect" language is one where
you really can follow the old Ada saying - if you can make it compile,
it's ready to ship.

I don't live dangerously by not having run-time checks on integer
overflows. I make sure my code does not have them, so checks are
unnecessary. For some of my code, if it "panicked" somewhere in
calculations, that would be a disaster - when you have code controlling
power electronics, a sudden stop can mean short-circuits and components
releasing their magic grey smoke.

Thinking that run-time checks will save you from UB is wishful thinking.
How are you going to have run-time checks that a pointer parameter
points to a valid object of the right type? You can check for a
null-pointer, but that's about it. Some things that are potential UB in
C are inherent in the type of language - checking for such problems (at
compile-time or run-time) needs a language that has a different way of
handling objects and pointers so that you cannot have arbitrary pointers
to arbitrary objects.

C is not a language suitable for such run-time or compile-time checks -
it is a language for getting the highest efficiency because the
programmer takes responsibility for getting things right. You are
correct that large programs normally have bugs (of which UB is just one
class) - the risk of bugs goes up with the size of the code base. The
corollary is that C is not a language suitable for large programs.

Rust, I think, reduces the risk of some kinds of bugs. So does C++,
when used carefully. Most code, however, is best written in languages
where these issues cannot occur - or at least where checks can be done
without a measurable impact. For example, if you use Python, you never
have integer overflow, and you never have invalid pointers.

(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)

It's not just that it's nowhere close to being as well-defined
as it should be, it's because the language as defined permits
behavior that varies far too widely, specifically because of UB.

Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.

That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.

I'd be happy for the C standard to say that signed integer overflow is a
bug, or that code is not allowed to overflow its integer arithmetic. I
would not be happy if it said compilers must trap on the bug or handle
it in some specific way - what happens when a bug is reached is still
UB. And if the wording of the standard were changed to call it a "bug"
rather than "UB", it would make absolutely zero difference to the way I
write my code.

This is one reason the committee is trying to rein some of this
in.

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.

A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgeable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.

I think that's simplistic; not many programmers actively want to
"avoid the consequences of their mistakes." Do you really
believe that they do? If so, why?

It was badly worded - I meant that programmers do not want mistakes that
they might make to lead to additional problems. We can all appreciate
and expect that if we make a mistake in code with an incorrect
calculation, that will give incorrect output, or perhaps a crash in the
program. But we hope that it will not lead to corruption of a
filesystem, or an exploitable security hole - something out of
proportion with the mistake.

Conversely, there *is* this kind of machismo attitude among many
C programmers that it requires a superior intellect to truly
understand this language, and those who do not (or who make any
mistake in their understanding) are simply unworthy. I have
repeatedly observed this over many decades now, and when I see
it, I think that it is odious.

In my field, people usually put a lot of effort into writing code simply
and clearly. You avoid mistakes not by being "clever", but by being
meticulous and careful. I don't think successful C programming requires
greater intellect, knowledge or experience compared to other programming
languages - but it /does/ require an appropriate attitude. You are
working with sharp knives - pay attention to what you are doing, and
you'll be fine.

My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.

I was avoiding signed integer overflow long before I had read any C
standards or even knew about the term "UB". Programming in C does not
need a lawyer knowledge of the language. It is just like programming in
any other programming language - use features that you know are correct,
and if you want to do something and don't know how to do so correctly,
look it up.

In fairness, I think the current members of the committee
recognize this.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.

Obviously in a production setting tools should be tested and
qualified. But the danger posed by UB adds unacceptable risk on
large projects, and the burden for updating a toolchain is too
high. That is as much an indictment of the language as of any
particular project.

As a counter example, there was the Harvey project, which was a
fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
C; we accounted for this by having CI build with 6 separate
compilers; this flushed out a lot of bugs.

I am surprised that more projects do not adopt canary CI builds
against newer toolchains.

In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).

Fun fact: part of the reason Google got involved in clang and
LLVM development was because the vendor toolchain for a
particular microcontroller used in android phones was buggy and
would crash (that is, the compiler itself crashed). The
solution was not to live with it; it was to build a better
toolchain.

Buggy toolchains are always a pain. (So is buggy hardware -
microcontrollers and cpus have their errors too.)

Google could afford to do that; I recognize not many
organizations can.

Unfortunately that's true.

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.

See above. Those people may well have written the code before C
was standardized and before UB as we know it now existed. Also,
by definition UB is not an error.

It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.

Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.

_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` was known to be `unsigned int`
when written, but `b` is a signed int. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the integer promotion rules kick
in so that `a*b` happens with both factors as `signed int`, and
there exist values of `a` and `b` where `a*b` overflows: UB.

The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.
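The type change described above can be made visible without executing any overflowing arithmetic. This is a sketch only: the typedef and function names are hypothetical, and it assumes int is wider than short (otherwise unsigned short promotes to unsigned int and the hazard does not arise). `_Generic` inspects the type of `a * b` without evaluating it:

```c
/* Hypothetical typedefs standing in for the one that was changed. */
typedef unsigned int   wide_count_t;    /* original definition */
typedef unsigned short narrow_count_t;  /* definition after the change */

/* Name the type an expression has after the usual arithmetic
   conversions. The controlling expression is not evaluated. */
#define TYPE_NAME(expr) _Generic((expr), \
    unsigned int: "unsigned int",        \
    int:          "int",                 \
    default:      "other")

/* unsigned int * int: b is converted to unsigned int, so the product
   is computed in unsigned arithmetic and wraps instead of overflowing. */
const char *product_type_before(wide_count_t a, int b)
{
    return TYPE_NAME(a * b);
}

/* unsigned short * int: a is promoted to int, so the product is a
   signed multiplication that can overflow. */
const char *product_type_after(narrow_count_t a, int b)
{
    return TYPE_NAME(a * b);
}
```

The source text of `a * b` is identical in both functions; only the typedef differs.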

Whose fault is that?

There's no simple answer here.

But one thing is clear to me - "UB" is irrelevant here (and in many of
your points). It would not matter if everything had fully defined
behaviour. The point is that something is changed in one part of the
code that has unexpected consequences in another part of the code. Who
cares if there is UB or not? The issue is that the code does not work
as intended or expected. UB can provide situations where you have
unexpected bugs - but so can all sorts of other things.

And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

Agreed.

...but be careful blaming the programmer.

Or the language, or the tools.

--- Synchronet 3.22a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sun Jun 14 21:24:20 2026

From Newsgroup: comp.lang.c

ram@zedat.fu-berlin.de (Stefan Ram) writes:

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

I'm not a huge fan of Carruth.

(Text after "| " below was generated by a chatbot asked to explain
narrow contracts and the reduction of efficiency by defining UB.)

(Let me guess: You are not a huge fan of chatbots either!
Ok, that was easy.)

Chandler talked about how narrow contracts allow optimizations.

| - Wide Contract: The function guarantees to handle all possible inputs
| gracefully, usually by returning an error code or throwing an
| exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
|
| - Narrow Contract: The function only guarantees correct behavior if
| the caller meets specific preconditions. If the preconditions are
| violated, the behavior is undefined.
|
| When is it appropriate to have a narrow contract? Always, when
| performance, memory footprint, or direct hardware control are
| paramount. In operating system kernels, embedded systems, real-time
| applications, and high-performance computing, the overhead of
| validating every pointer, checking every array bound, and verifying
| every integer range is unacceptable.
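The two contract styles in the quoted text might look like this in C. This is a sketch: `ERR_NULL_PTR` is the name used in the quoted text, while its value, `SUM_OK`, and the function names are assumptions made for illustration:

```c
#include <stddef.h>

enum { SUM_OK = 0, ERR_NULL_PTR = -1 };  /* values are assumed */

/* Wide contract: every argument combination is handled. A null
   pointer produces an error code rather than undefined behaviour. */
int sum_wide(const int *values, size_t count, long long *total)
{
    if (values == NULL || total == NULL)
        return ERR_NULL_PTR;
    long long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    *total = sum;
    return SUM_OK;
}

/* Narrow contract. Precondition: values points to at least count
   readable ints. Nothing is checked; violating the precondition is
   undefined behaviour. */
long long sum_narrow(const int *values, size_t count)
{
    long long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum;
}
```

The narrow version saves one comparison per call and has no error path for the caller to handle; whether that saving matters is the point being argued.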

I have a recollection that a version of IBM's MVS operating
system did, indeed, validate input and output arguments to kernel
functions.

Indeed, Google says it was called MVS/SP and later MVS/XA (extended addressing).
--- Synchronet 3.22a-Linux NewsLink 1.2

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sun Jun 14 15:55:09 2026

From Newsgroup: comp.lang.c

David Brown <david.brown@hesbynett.no> writes:
[...]

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen.

No, it means that the implementation can make that choice (or allow you
to make that choice). A conforming compiler could generate code on the
assumption that signed overflow never happens, and not give the
programmer any options.

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious. And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Of course those kinds of checks are not in the "spirit of C".

[...]

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

What's the difference between these programs?

[...]

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for
it.

I don't want to pay the price of checking for syntax errors when I know
my code is syntactically correct. But I never know that, because I'm
fallible.

I admit that's not a very strong argument. There are real differences
between compile-time and run-time checks.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jun 15 10:09:56 2026

From Newsgroup: comp.lang.c

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen.

No, it means that the implementation can make that choice (or allow you
to make that choice). A conforming compiler could generate code on the
assumption that signed overflow never happens, and not give the
programmer any options.

Sure. But if it were not UB, then a conforming implementation could not
make such choices or give me such choices. UB does not mean that I
definitely have such choices (as my poor wording implies), but that
implementations are able to give me the choice.

If the standards had said integer overflow was IB, then that puts limits
on what the compiler can do - and therefore on what it can do to help
the programmer. Exactly what options it had would depend on the wording
of the standard, such as whether it required an "implementation-defined
value" or, like narrowing conversions to signed integer types, "either
the result is implementation-defined or an implementation-defined signal
is raised". However, even in that latter case I think it would be more
confusing for a lot of programmers - many programmers, quite reasonably,
have an intuition that "UB" means "don't do this" or "this is not legal
in C". They also have the intuition that "IB" means "this works
according to the underlying hardware". If the standards had said
integer overflow was IB, most programmers would immediately assume that
meant wrapping behaviour.

More interesting, I think, is the possible future "erroneous behaviour"
marker. My understanding is that it lets the compiler have traps or
other run-time detection, or provide unspecified values, while making it
clear that erroneous behaviour is a result of software bugs.

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.

Of course those kinds of checks are not in the "spirit of C".

Indeed.

And if we want to move away from the "spirit of C", then I think we
should move away from the /language/ of C. In C, people do not expect
exceptions or sudden jumps from their code - they expect that if there
is checking for errors, it is explicit in the code. In many other
languages, there is a much clearer understanding that lots of things can
fail and cause immediate exits from the function - and code is
(hopefully!) written to handle that.

[...]

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

What's the difference between these programs?

There are disadvantages in having a trap. It can (depending on
hardware) mean extra code to detect the zero - usually that run-time
cost is negligible, but sometimes it is not. It will mean extra code to
handle the exception - again, often but not always negligible. Those
costs apply even if the programmer has made sure that division by zero
never occurs. And if a trap is thrown, what then? I think that a
programmer that is careful enough to see that a division expression
might throw, and handle the trap or exception appropriately, is going to
be careful enough to avoid the problem in the first place. So the trap
is going to be unexpected and handled badly. A badly handled division
by zero exception left the USS Yorktown dead in the water for three hours.

Is it better /not/ to trap? There is no general rule. If you have
tried to divide by zero, something has gone wrong before the division,
and there are no good answers to what will go wrong afterwards.
Sometimes it is possible to do damage limitation - sometimes not.

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.
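As a small illustration of "handle the problem where the zero is created" rather than after the failed division, consider the sketch below. The function name is hypothetical; the possible zero is the sample count, and it is rejected before any arithmetic depends on it:

```c
#include <stdbool.h>
#include <stddef.h>

/* Compute the arithmetic mean of count samples. The only possible
   zero divisor is count, so it is checked at the point where it
   enters the computation. Returns false for an empty sample set and
   leaves *mean untouched. */
bool mean_of(const int *samples, size_t count, double *mean)
{
    if (count == 0)
        return false;            /* no division is ever attempted */
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    *mean = sum / (double)count;
    return true;
}
```

Once the caller has dealt with the empty case, a trap on the division itself would have nothing left to catch.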

(There are other ways of handling such things, like the use of NaNs in
floating point, or extending your integers with some kind of "invalid"
indicator.)

[...]

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for
it.

I don't want to pay the price of checking for syntax errors when I know
my code is syntactically correct. But I never know that, because I'm
fallible.

Checking for syntax errors is cheap - PC computing power is, in this
context, pretty much free and unlimited. If I am using a target
environment where run-time resources are plentiful, I would not be using
C in the first place.

I admit that's not a very strong argument. There are real differences
between compile-time and run-time checks.

Perhaps I work in a field where that difference is more extreme than for
many programmers, and I thus feel it more than most.

--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 15 10:43:25 2026

From Newsgroup: comp.lang.c

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

<snip>

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.

IMO a reasonable and easy definition is: a computation either delivers
the mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
More formally, an optimization is not allowed to introduce a
stronger precondition, but may weaken it.
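A rough sketch of that rule applied to the "a + b - a" example quoted above. Here a "trap" is modelled as a false return value, and all function names are hypothetical:

```c
#include <limits.h>
#include <stdbool.h>

/* Checked primitives: return false instead of overflowing. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

bool checked_sub(int a, int b, int *out)
{
    if ((b > 0 && a < INT_MIN + b) || (b < 0 && a > INT_MAX + b))
        return false;
    *out = a - b;
    return true;
}

/* Naive bottom-up evaluation of a + b - a: fails whenever the
   intermediate sum a + b is not representable. */
bool naive_eval(int a, int b, int *out)
{
    int t;
    return checked_add(a, b, &t) && checked_sub(t, a, out);
}

/* The simplified form b: never fails, and gives the same value as
   naive_eval whenever naive_eval succeeds. It has a weaker
   precondition, so the rule above permits this rewrite. */
bool simplified_eval(int a, int b, int *out)
{
    (void)a;
    *out = b;
    return true;
}
```

With a = INT_MAX and b = 1, naive_eval reports failure while simplified_eval yields the mathematically correct 1. The reverse rewrite, from the simplified form to the naive one, would add a failure case and so would not be allowed.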

<snip>

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.

What is the value of the certification required for some software? If
the programmer did a good job then the program will work correctly.
A trap gives assurance that the programmer indeed correctly handled
a tricky problem. And once you know that computation works
according to math rules, other forms of verification are easier.

You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example working heavy machine
is dangerous, stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.
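
A small C sketch of the "safe state" idea, with everything in it assumed for illustration: the controller struct, the state names, and the convention that an output of 0 means "actuator off". The product is formed in 64 bits so that the range test itself cannot overflow; an out-of-range result latches the machine into the stopped state instead of commanding a garbage value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum machine_state { STATE_RUNNING, STATE_SAFE_STOP };

struct controller {
    enum machine_state state;
    int32_t output;             /* last commanded actuator value */
};

/* One control step: output = gain * error.
   Anything the program did not expect (here, overflow of the product)
   triggers the transition to the safe state. */
void control_step(struct controller *c, int32_t gain, int32_t error)
{
    assert(c != NULL);
    if (c->state == STATE_SAFE_STOP)
        return;                 /* latched until something resets it */

    /* int32_t * int32_t always fits in int64_t. */
    int64_t wide = (int64_t)gain * error;
    if (wide > INT32_MAX || wide < INT32_MIN) {
        c->state = STATE_SAFE_STOP;
        c->output = 0;          /* assumption: 0 means "actuator off" */
        return;
    }
    c->output = (int32_t)wide;
}
```

Whether "stop" really is safe depends on the machine; the sketch only shows the control flow, not a recommendation for any particular system.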

In general computation, if you need correct value and have some
time there are options which may involve re-doing computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data, traps allow identification of
such data (doing it in other way may be computationally quite
expensive).
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jun 15 16:01:32 2026

From Newsgroup: comp.lang.c

On 15/06/2026 12:43, Waldek Hebisch wrote:

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

<snip>

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.

IMO reasonable and easy definition is: computation either delivers
mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
In more formal way optimization is not allowed to introduce
stronger precondition, but may weaken it.

It is always the case that an implementation can weaken preconditions
and strengthen postconditions and remain correct - though it might then
be less efficient than you expect. But if you are /requiring/ a weaker
precondition and /requiring/ a strong postcondition - such as by
insisting on traps on overflow - you are changing the function or
operation specification, and it is not necessarily a good thing.

In C, the integer addition operation "c = a + b;" has a precondition :

(a + b) <= INT_MAX, (a + b) >= INT_MIN

It has the postcondition :

c == a + b

Saying that it must trap if there is overflow weakens the precondition
to any "a" and "b", but makes the postcondition much more complicated.
It means it is no longer true that the result of an addition operation
is the sum of the operands. Addition is no longer a "pure" function -
now it has side-effects that are completely unpredictable at the site of
use. Programmers can no longer rely on the timing of the operation,
stack usage, interaction with other code, or even that the operation
ever finishes.
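
To make the precondition and postcondition concrete, here is a hedged C sketch. The function names are invented for this example and a false return value stands in for a trap. The first function is the precondition written so that the test itself cannot overflow; the second is addition with the narrow specification; the third accepts any operands, at the price of a two-case postcondition that every caller must handle.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Precondition of "c = a + b" for int: INT_MIN <= a + b <= INT_MAX,
   rearranged so that no intermediate can overflow. */
bool add_in_range(int a, int b)
{
    return (b >= 0) ? (a <= INT_MAX - b) : (a >= INT_MIN - b);
}

/* Narrow specification: the caller guarantees add_in_range(a, b).
   Postcondition: the result is the mathematical sum. */
int add_assuming_no_overflow(int a, int b)
{
    assert(add_in_range(a, b));     /* debug-only check, gone under NDEBUG */
    return a + b;
}

/* Widened specification: any a and b are accepted, but the postcondition
   now reads "either *sum is the mathematical sum, or false was returned". */
bool add_reporting_overflow(int a, int b, int *sum)
{
    assert(sum != NULL);
    if (!add_in_range(a, b))
        return false;               /* stands in for the trap */
    *sum = a + b;
    return true;
}
```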

If your code is correct, and overflow never happens, then this is all a
big disadvantage in terms of understanding and analysing the code. And
it does not in any way reduce the effort needed to be sure that your
inputs are appropriate for getting the desired results of the operation.

Trapping like this can certainly be useful for debugging. But as a
general feature it gives a false sense of security, complicates
mathematical analysis, introduces massive additional possible code path
choices which are either real or almost certainly untested in practice,
or not real (because the compiler can see they are not taken) and
untestable. That is not qualitatively worse than "who knows what will
happen" UB, but it is not significantly better.

<snip>

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.

What is value of certification required for some software? If
programmer did good job then program will work correctly.

Yes.

Trap give assurance that programmer indeed correctly handled
tricky problem.

No, it certainly does not. And one of the reasons to dislike traps is
that it makes people think like that. A trap can only happen if the
programmer did /not/ handle the problem correctly. And I expect that if
the programmer is able to write an appropriate specific trap handler for
the failing expression (rather than a program-global "crash with error
message" handler), then he/she would be able to avoid the problem in the
first place.

Sometimes, of course, you are trying to write code that has some input
which is supposed to be correct, but you are not sure - and you can't
change the calling code. How you handle that situation will depend on
the program and the situation. But I don't see trapping as "correct
handling" unless the whole program is written with the expectation of
traps for error handling. You might, however, end up deciding that
trapping is the least bad option.

And once you know that computation works
according to math rules other forms of verification are easier.

You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example working heavy machine
is dangerous, stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.

I think if you are /not/ concerned with high efficiency in the code,
then you should be seriously questioning the choice of C as the language
in the first place. And even if you use C, there are often things you
can do to avoid having problems in the first place. The obvious one for
integer overflow is to make more use of bigger types.
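
As one illustration of "bigger types" (a sketch, with the function name midpoint_i32 made up for this example): the naive (lo + hi) / 2 can overflow in 32-bit arithmetic when both operands are large, whereas forming the sum in 64 bits cannot, and the halved result always fits back into 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Midpoint of two 32-bit values, computed in a wider type.
   The 64-bit sum cannot overflow, and the quotient lies between
   lo and hi, so the narrowing conversion is safe. */
int32_t midpoint_i32(int32_t lo, int32_t hi)
{
    int64_t mid = ((int64_t)lo + (int64_t)hi) / 2;
    assert(mid >= INT32_MIN && mid <= INT32_MAX);
    return (int32_t)mid;
}
```

This removes the overflow by construction, so there is nothing left to trap. It only works while a wider type is available and cheap enough on the target.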

In general computation, if you need correct value and have some
time there are options which may involve re-doing computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data, traps allow identification of
such data (doing it in other way may be computationally quite
expensive).


From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 15 17:52:09 2026

From Newsgroup: comp.lang.c

In article <8_EXR.112952$Mm3.81340@fx33.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

ram@zedat.fu-berlin.de (Stefan Ram) writes:

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

I'm not a huge fan of Carruth.

(Text after "| " below was generated by a chatbot asked to explain
narrow contracts and the reduction of efficiency by defining UB.)

(Let me guess: You are not a huge fan of chatbots either!
Ok, that was easy.)

Chandler talked about how narrow contracts allow optimizations.

| - Wide Contract: The function guarantees to handle all possible inputs
| gracefully, usually by returning an error code or throwing an
| exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
|
| - Narrow Contract: The function only guarantees correct behavior if
| the caller meets specific preconditions. If the preconditions are
| violated, the behavior is undefined.
|
| When is it appropriate to have a narrow contract? Always, when
| performance, memory footprint, or direct hardware control are
| paramount. In operating system kernels, embedded systems, real-time
| applications, and high-performance computing, the overhead of
| validating every pointer, checking every array bound, and verifying
| every integer range is unacceptable.
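
A short C sketch of the two contract styles described above. The function names and the ERR_NONE enumerator are assumptions made for this example; ERR_NULL_PTR is taken from the quoted text. Both functions sum an array. The wide version tests its pointers and returns an error code; the narrow version documents its precondition with an assert that disappears under NDEBUG, leaving violation undefined.

```c
#include <assert.h>
#include <stddef.h>

enum status { ERR_NONE = 0, ERR_NULL_PTR = 1 };

/* Wide contract: null pointers are handled and reported to the caller. */
enum status sum_wide(const int *data, size_t n, long long *out)
{
    if (data == NULL || out == NULL)
        return ERR_NULL_PTR;
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    *out = total;
    return ERR_NONE;
}

/* Narrow contract: the caller promises that data points to n readable
   ints. Breaking that promise is undefined behaviour; the assert only
   documents the precondition in debug builds. */
long long sum_narrow(const int *data, size_t n)
{
    assert(data != NULL);
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Neither version can verify that the array really has n elements; in C that part of the contract stays narrow whatever the function checks.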

I have a recollection that a version of IBM's MVS operating
system did, indeed, validate input and output arguments to kernel
functions.

Indeed, Google says it was called MVS/SP and later MVS/XA (Extended
Architecture).

The Midori folks at Microsoft added bounds checking to all array
accesses in M# (the safe language they wrote Midori in). They
expected performance to be awful; when they provided it, the
overhead was pretty much undetectable: the cost was in the
noise.

- Dan C.


From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 15 17:57:31 2026

From Newsgroup: comp.lang.c

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 12:43, Waldek Hebisch wrote:

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

<snip>

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.

IMO reasonable and easy definition is: computation either delivers
mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
In more formal way optimization is not allowed to introduce
stronger precondition, but may weaken it.

It is always the case that an implementation can weaken preconditions
and strengthen postconditions and remain correct - though it might then
be less efficient than you expect. But if you are /requiring/ a weaker
precondition and /requiring/ a strong postcondition - such as by
insisting on traps on overflow - you are changing the function or
operation specification, and it is not necessarily a good thing.

In C, the integer addition operation "c = a + b;" has a precondition :

(a + b) <= INT_MAX, (a + b) >= INT_MIN

It has the postcondition :

c == a + b

Saying that it must trap if there is overflow weakens the precondition
to any "a" and "b", but makes the postcondition much more complicated.

No. Precondition is the same. Postcondition has additional term
"computation finished with no traps".

It means it is no longer true that the result of an addition operation
is the sum of the operands.

Opposite of that: no traps means that regardless of precondition
the result of an addition operation is the sum of the operands.

Addition is no longer a "pure" function -
now it has side-effects that are completely unpredictable at the site of
use. Programmers can no longer rely on the timing of the operation,
stack usage, interaction with other code, or even that the operation
ever finishes.

The difference is that without traps programmers do not know if
arithmetic operations give correct result. With traps they do
not know if program will successfully finish, but if it
finishes they know that arithmetic gave correct results.

If your code is correct, and overflow never happens, then this is all a
big disadvantage in terms of understanding and analysing the code. And
it does not in any way reduce the effort needed to be sure that your
inputs are appropriate for getting the desired results of the operation.

One needs to use correct formulas, there is no way around that.
Without traps programmer must analyse ranges of all intermediate
expressions. That is tedious and error prone. People work
around that by activating traps during testing, but it is
quite hard to find worst case values, so errors may be
easily missed during testing. Having traps active during
production runs means that you may discover problem. You
apparently think that ignoring possible problems at
runtime is good thing. For simple programs you may analyze
it well enough to be sure that nothing bad happens at
runtime, but in general computing we use a lot of "interesting"
programs which are too complex to analyse. We hope that
they will run OK, but have no proof. Sometimes hope is
based on statistical tests and on low probability input
program may fail. Traps are useful to make sure that
wrong results will not propagate further.

Trapping like this can certainly be useful for debugging. But as a
general feature it gives a false sense of security, complicates
mathematical analysis, introduces massive additional possible code path
choices which are either real or almost certainly untested in practice,
or not real (because the compiler can see they are not taken) and
untestable.

You get extra code paths only if you attempt to handle traps.
Trapping of overflows gives you assurance that in computation that
you did and which finished with no traps there were no errors of
certain kind (that is wrong results due to overflow). That is
really not different than insistence on static types. Neither
assures you of no bugs, but each tells you that some bugs
did not happen. Of course, trapping at runtime is less
satisfactory than compile time checking, but tight a priori
bounds on ranges are notoriously hard to obtain, so trapping
is the best we can have for high performance software with
current state of art.

That is not qualitatively worse than "who knows what will
happen" UB, but it is not significantly better.

<snip>

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.

What is value of certification required for some software? If
programmer did good job then program will work correctly.

Yes.

Trap give assurance that programmer indeed correctly handled
tricky problem.

No, it certainly does not. And one of the reasons to dislike traps is
that it makes people think like that. A trap can only happen if the
programmer did /not/ handle the problem correctly.

Yes.

And I expect that if
the programmer is able to write an appropriate specific trap handler for
the failing expression (rather than a program-global "crash with error
message" handler), then he/she would be able to avoid the problem in the
first place.

Rather non-specific trap handler could work as "redo the computation
in arbitrary precision". If problem (like division by zero) persists,
then there is logic bug, otherwise it means that precision was
inadequate and problem is resolved.
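
A hedged C sketch of that "redo at higher precision" handler. Standard C has no arbitrary-precision integers, so 64-bit arithmetic stands in for them here; the function names are invented, a false return value stands in for the trap, and the array is assumed to be short enough (well under 2^31 elements) that the 64-bit sum cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fast path: 32-bit sum. Returns false (the stand-in for a trap) as soon
   as an intermediate result would not fit. */
bool sum32(const int32_t *v, size_t n, int32_t *out)
{
    assert(v != NULL && out != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if ((v[i] > 0 && acc > INT32_MAX - v[i]) ||
            (v[i] < 0 && acc < INT32_MIN - v[i]))
            return false;
        acc += v[i];
    }
    *out = acc;
    return true;
}

/* Slow path: the same computation at higher precision. */
int64_t sum64(const int32_t *v, size_t n)
{
    assert(v != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Non-specific handler: try the fast path, redo on a "trap". */
int64_t sum_with_retry(const int32_t *v, size_t n, bool *redone)
{
    assert(redone != NULL);
    int32_t narrow;
    if (sum32(v, n, &narrow)) {
        *redone = false;
        return narrow;
    }
    *redone = true;
    return sum64(v, n);
}
```

For the input {INT32_MAX, 1, -5} the fast path fails on an intermediate value even though the final sum fits in 32 bits, and the retry resolves it. That is the "precision was inadequate" case rather than a logic bug.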

However, you should think about such traps similarly to parity errors
which can be signaled by some hardware. There is a low but nonzero
probability that such an error can occur. A parity check gives you a
reasonable chance to detect it. Handling is at least as problematic
as with overflow. Absence of parity traps gives you less info: no
overflow traps means no overflow, while no parity traps means only that
parity was correct. The intent of a parity check is to discover bit
errors, and they are possible even with correct parity. So, do you
think that parity checks inside MCUs are useless?
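
The parity argument can be shown in a few lines of C. This is a software model with invented function names, not how any particular MCU implements it: a single flipped bit changes the parity and is detected, while two flipped bits leave the parity unchanged and pass the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Even-parity bit for one byte: 1 if the number of set bits is odd. */
unsigned parity_bit(uint8_t byte)
{
    unsigned p = 0;
    while (byte) {
        p ^= byte & 1u;
        byte >>= 1;
    }
    return p;
}

/* True if the stored parity bit still matches the data. */
bool parity_ok(uint8_t byte, unsigned stored_parity)
{
    return parity_bit(byte) == stored_parity;
}

/* Usage: one flipped bit is caught, two flipped bits are not. */
void parity_demo(void)
{
    uint8_t word = 0x5A;                    /* 0101 1010, four bits set */
    unsigned p = parity_bit(word);

    assert(parity_ok(word, p));
    assert(!parity_ok((uint8_t)(word ^ 0x01), p));  /* detected */
    assert(parity_ok((uint8_t)(word ^ 0x03), p));   /* slips through */
}
```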

Sometimes, of course, you are trying to write code that has some input
which is supposed to be correct, but you are not sure - and you can't
change the calling code. How you handle that situation will depend on
the program and the situation. But I don't see trapping as "correct
handling" unless the whole program is written with the expectation of
traps for error handling. You might, however, end up deciding that
trapping is the least bad option.

And once you know that computation works
according to math rules other forms of verification are easier.

You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example working heavy machine
is dangerous, stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.

I think if you are /not/ concerned with high efficiency in the code,

Well, if efficiency does not matter traps can be implemented as
a software layer above the language. Or one can use arbitrary
precision arithmetic. Traps matter when efficiency matters,
so they should be implemented in place giving best efficiency,
at best in CPU and if that is not possible then in optimizing
compiler.

then you should be seriously questioning the choice of C as the language
in the first place. And even if you use C, there are often things you
can do to avoid having problems in the first place. The obvious one for
integer overflow is to make more use of bigger types.

Which may be best choice if efficiency is not important. But
some calculations require surprisingly large accuracy to avoid
overflow. Worse, in vast majority of cases lower accuracy
may be adequate, so there is pressure to use "sufficient"
accuracy overlooking special cases.

In general computation, if you need correct value and have some
time there are options which may involve re-doing computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data, traps allow identification of
such data (doing it in other way may be computationally quite
expensive).

--
Waldek Hebisch

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 15 19:26:16 2026

From Newsgroup: comp.lang.c

Prefatory: I think we're largely in agreement; I'll just add a
few notes, but snip most of the rest.

In article <110n1db$3sbck$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
Maybe there is scope for compilers to have better options for handling
old code, other than the usual "Use -O0 to avoid optimising on UB"
solution. You could come a long way with a "treat all variables as
volatile" flag, for example.

The problem is, the language doesn't make any guarantees here,
and the compilers get to decide. If you're lucky, the compiler
gives you some control via flags or pragmas or something, but
if you're not lucky, it doesn't and the guarantees you can rely
on are just too weak.

[snip]
That can certainly happen. But that's just bugs in the code. I don't
see why UB should be considered as something special here.

Because unlike many bugs, which are clearly bugs, UB is just the
absence of defined behavior. So the output of a program
can change in subtle ways with no changes to the code,
only changes to the compiler or how it is invoked.

People
making changes to existing code sometimes misunderstand things, or
accidentally break something that worked before. That's life as a
programmer, and there are techniques to reduce the risk - code reviews,
linters, testing regimes, etc. Nothing gives 100% guarantees, and
everything has to weigh risks, consequences, costs and resources. UB is
not special here.

Yes. My point with this line is that UB doesn't show up because
programmers are just careless, and "just write code more
carefully" doesn't scale any better than, "have you tried just
writing code without bugs?"

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen. If signed integer overflow were
defined as wrapping, then compilers could not put in traps to catch the
errors because as far as the language is concerned, they are not errors.

If "you" means the compiler, then sure. If "you" means the
programmer, then you are lucky if you get to choose that, but
it is not guaranteed that you will have that kind of flexibility
available.

If they are defined as causing traps, then that's the semantics -
compilers could not optimise code assuming overflow does not happen,
unless it can prove there is no overflow.

And making it defined behaviour gives programmers the mistaken idea that
they don't need to avoid overflow because there is no UB.

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and it
allows tools to help programmers avoid these mistakes, and it allows
compilers to give programmers the most efficient results from known good
code rather than adding unnecessary run-time checks that are never
triggered.

But it doesn't say that. It says, "no guarantees; whatever
happens happens."

This is the thing: the correct answer is whatever the language
defines it to be. The language could say, "this is an error" or
it could say, "we do whatever the hardware does." But making it
UB isn't a statement of anything. UB is a refusal to make a
statement.

[snip]
(I think you missed a bit of your answer here?)

(I did, but I was just going to say something about memcpy; it
wasn't that interesting. :-/)

[snip]
I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.

UB is literally the opposite of well-defined.

I want good definitions of things that should be defined. Things that
cannot have good definitions, are fine left undefined. A language
standard should not be trying to define the behaviour of /everything/.

I accept that there will be some number of things that one
cannot reasonably define when creating a programming language.
But that set should be small;

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

No. I don't accept that division by zero is ever acceptable in
a real program. What purpose would be served by _not_ trapping?
Most hardware will do it anyway.
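
A minimal C sketch of what a checked division could look like; the names checked_div and div_or_trap are hypothetical. Besides a zero divisor, int division has one more failing case, INT_MIN / -1, whose mathematical result does not fit in int. The second function turns a failed check into abort(), which is one possible reading of "trap".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Division that reports failure instead of invoking undefined behaviour.
   Fails for b == 0 and for INT_MIN / -1. */
bool checked_div(int a, int b, int *q)
{
    assert(q != NULL);
    if (b == 0)
        return false;
    if (a == INT_MIN && b == -1)
        return false;
    *q = a / b;
    return true;
}

/* "Always trap" flavour: a failed check terminates the program. */
int div_or_trap(int a, int b)
{
    int q;
    if (!checked_div(a, b, &q))
        abort();
    return q;
}
```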

or find the square root of a negative number (in the real
domain).

Yup. That should be a trap.

For some programs, yes. For others, no.

Same as above. If you want a NaN to be a possibility, you should
use an operation that lets you get that, `unchecked_sqrt()` or
something.

I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,

Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).

I agree that wrapping semantics should be something you have to ask for.
(As an aside, I think it is a mistake for languages to have types that
have wrapping semantics - it's the operations that should wrap, not the
types. Zig gets it right by distinguishing between "x + y" and "x +% y".)

Yes. Rust has this as well, in `.wrapping_add()` et al.
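
The same "operations, not types" split can be approximated in C with two functions over one type. This is a sketch and the function names are invented for it. The wrapping version does the arithmetic in uint32_t, where wraparound is defined, and copies the bits back with memcpy, which works because int32_t is required to be two's complement without padding. The checked version reports overflow through its return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Explicitly wrapping addition: modulo 2^32, never undefined. */
int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t out;
    memcpy(&out, &sum, sizeof out);     /* reinterpret the bit pattern */
    return out;
}

/* Explicitly checked addition on the same type. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out)
{
    assert(out != NULL);
    int64_t wide = (int64_t)a + b;      /* cannot overflow in 64 bits */
    if (wide > INT32_MAX || wide < INT32_MIN)
        return false;
    *out = (int32_t)wide;
    return true;
}
```

The caller picks the semantics at each call site, so a value of type int32_t carries no hidden wrapping behaviour of its own.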

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for it.

Then the language should give you the ability to explicitly ask
for the unchecked versions of those operations.

But I think it is equally bad to give things a definition simply to be
able to say there is no UB.

I'm not suggesting that one should do that. What I'm saying is
that it is possible to conceive of a language that lets you
write robust, complex programs with strong guarantees about the
behavior of code, without UB. That doesn't mean that the
language is devoid of all notions of undefined behavior, but
rather that unless you ask for it, using UB is an error.

It is, IMHO, entirely /wrong/ of a language
to define integer overflow as wrapping simply so that it is not UB. I
do not see a guaranteed incorrect result that likely has catastrophic
consequences in a program as being better than UB.

We've discussed this before, and I understand your perspective
on it, but I feel it necessary to reiterate that I do not share
that perspective.

Defining arithmetic to be modular is perfectly acceptable. It
is not "wrong". Defining arithmetic on explicitly sized types
to use 2's complement semantics similarly. C defined arithmetic
overflow for signed types to be UB because when it was
standardized, machines existed that had different behavior and
representations for signed types. Why didn't they make it IB?
I don't know.

The world is different now.

(I believe Rust
defines integer overflow as trapping in "debug" mode and wrapping in
"release" mode, which I think is a horrendous idea.)

I agree that's kind of a wart. It's basically what you get with
UB in C.

In my opinion, the right call is providing an `unchecked_add`
and forcing the caller to wrap that in an `unsafe` block, while
normal `+` is always checked unless the compiler can deduce that
overflow cannot happen.

https://doc.rust-lang.org/std/primitive.u32.html#method.unchecked_add

So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.

I am all in favour of compile-time checks and rejecting code with errors
(not just UB) as soon as possible. The "perfect" language is one where
you really can follow the old Ada saying - if you can make it compile,
it's ready to ship.

I don't live dangerously by not having run-time checks on integer
overflows. I make sure my code does not have them, so checks are
unnecessary. For some of my code, if it "panicked" somewhere in
calculations, that would be a disaster - when you have code controlling
power electronics, a sudden stop can mean short-circuits and components
releasing their magic grey smoke.

This doesn't follow. If you have validated that the code cannot
overflow, and you are confident in that, then the code won't
panic due to overflow. So arguing against the validation seems
superfluous.

And of course if the compiler can validate that your code is
free of overflow (perhaps by examining your checks) then it
needn't insert the checks, so there is no runtime overhead.
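
A hedged C sketch of the kind of code where that deduction is straightforward. SENSOR_MAX and add_readings are assumptions for this example (a 12-bit reading is just a convenient illustration). Once both operands have been validated against a small range, the sum is bounded by 8190, which fits in int on every conforming implementation, so a checked "+" at that point would need no run-time test. Whether a given compiler actually proves this depends on its value-range analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SENSOR_MAX 4095     /* assumed 12-bit range, purely illustrative */

/* Adds two validated readings. After the range test, a + b is at most
   8190, below the minimum INT_MAX of 32767, so it cannot overflow. */
bool add_readings(int a, int b, int *sum)
{
    assert(sum != NULL);
    if (a < 0 || a > SENSOR_MAX || b < 0 || b > SENSOR_MAX)
        return false;       /* invalid input rejected at the boundary */
    *sum = a + b;           /* provably in range: no overflow check needed */
    return true;
}
```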

Thinking that run-time checks will save you from UB is wishful thinking.
How are you going to have run-time checks that a pointer parameter
points to a valid object of the right type?

In strongly-typed languages with non-nullable references and
lifetimes as a first-class property of an object, the compiler
does that for you, statically, at compile-time.

You can check for a
null-pointer, but that's about it. Some things that are potential UB in
C are inherent in the type of language - checking for such problems (at
compile-time or run-time) needs a language that has a different way of
handling objects and pointers so that you cannot have arbitrary pointers
to arbitrary objects.

C is not a language suitable for such run-time or compile-time checks -

I agree.

it is a language for getting the highest efficiency because the
programmer takes responsibility for getting things right.

Paradoxically, this is not true. Consider pointers: because
they can be invalid, they have to be checked before dereference.
Contrast to non-nullable references in e.g. Rust; since their
mere existence implies that they refer to a valid object, they
do not need to be checked for nullity, misalignment, etc. Thus,
the better-defined language with stronger guarantees can afford
opportunities for optimization that don't exist in the
lower-level language riddled with UB.

You are
correct that large programs normally have bugs (of which UB is just one
class) - the risk of bugs goes up with the size of the code base. The
corollary is that C is not a language suitable for large programs.

Sadly, I now agree.

Rust, I think, reduces the risk of some kinds of bugs. So does C++,
when used carefully. Most code, however, is best written in languages
where these issues cannot occur - or at least where checks can be done
without a measurable impact. For example, if you use Python, you never
have integer overflow, and you never have invalid pointers.

If you use Rust, and restrict yourself as far as practical to
the safe subset, you never have invalid pointers, either. Nor
do you have uninitialized variables, or double-frees, or data
races. Entire categories of problems --- and their expensive
runtime checks --- are simply eliminated.

[snip]
Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.

That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.

I'd be happy for the C standard to say that signed integer overflow is a bug, or that code is not allowed to overflow its integer arithmetic. I would not be happy if it said compilers must trap on the bug or handle
it in some specific way - what happens when a bug is reached is still
UB. And if the wording of the standard were changed to call it a "bug" rather than "UB", it would make absolutely zero difference to the way I write my code.

This is an example of two people who are not sharing a
vocabulary around UB. I have no real commentary on that; I just
think it is interesting.

[snip]
In my field, people usually put a lot of effort into writing code simply
and clearly. You avoid mistakes not by being "clever", but by being meticulous and careful. I don't think successful C programming requires greater intellect, knowledge or experience compared to other programming languages - but it /does/ require an appropriate attitude. You are
working with sharp knives - pay attention to what you are doing, and
you'll be fine.

50 years of experience shows us that that simply isn't true.
"Pay attention" and "be careful" just don't work.

My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.

I was avoiding signed integer overflow long before I had read any C standards or even knew about the term "UB". Programming in C does not
need a lawyer knowledge of the language. It is just like programming in
any other programming language - use features that you know are correct,
and if you want to do something and don't know how to do so correctly,
look it up.

Right. But the issue is that the source of truth, the standard,
is ambiguous in places and opaque in others. Sussing out the
true semantics of a thing can mean cross-referencing half a dozen
different places, and this newsgroup sees cases where people who
are clearly intelligent, and who have an aptitude for
programming in C, can disagree on the specific meaning of things
in the standard.

Frankly, I think much of that is a waste of time. Let's have
better definitions, and more rigorous exposition.

[snip]
Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.

_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
when written, but `b` is a signed int. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int` and
there exist values of `a` and `b` where `a*b` overflows: UB.
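
A minimal sketch of that scenario, assuming a platform where `int` is wider than `unsigned short` (typical, but not guaranteed). The typedef name `count_t`, the macro and the function names are hypothetical, invented for illustration only. The `_Generic` selection reports the type the product actually has after the usual arithmetic conversions:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical typedef standing in for the one in the scenario.
   Before the change it would have been `unsigned int`. */
typedef unsigned short count_t;

/* Assumption made explicit: unsigned short promotes to (signed) int. */
static_assert(USHRT_MAX < INT_MAX,
              "sketch assumes unsigned short promotes to int");

/* 1 if the expression has type int, 0 if unsigned int, -1 otherwise. */
#define IS_SIGNED_INT(e) \
    _Generic((e), int: 1, unsigned int: 0, default: -1)

/* Before the typedef change: unsigned int * int is done as unsigned int,
   so the product wraps modulo UINT_MAX + 1 and cannot be UB. */
int product_is_signed_before(void) {
    unsigned int a = 2;
    int b = 3;
    return IS_SIGNED_INT(a * b);
}

/* After the change: unsigned short is promoted to int first, so the
   product is signed and can overflow, which is UB. */
int product_is_signed_after(void) {
    count_t a = 2;
    int b = 3;
    return IS_SIGNED_INT(a * b);
}

/* One possible repair: force the unsigned arithmetic explicitly,
   which restores the old wrapping behaviour. */
unsigned int product_unsigned(count_t a, int b) {
    return (unsigned int)a * (unsigned int)b;
}
```

Nothing at the multiplication site changed between the two versions; only the typedef did, which is why the problem is easy to miss in review.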

The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.

Whose fault is that?

There's no simple answer here.

But one thing is clear to me - "UB" is irrelevant here (and in many of
your points). It would not matter if everything had fully defined behaviour. The point is that something is changed in one part of the
code that has unexpected consequences in another part of the code. Who cares if there is UB or not? The issue is that the code does not work
as intended or expected. UB can provide situations where you have unexpected bugs - but so can all sorts of other things.

UB is the essential characteristic here. With a better defined
language, these issues are either compile-time failures, or they
become immediately apparent during testing. In the face of
C-style UB, however, they become spooky action at a distance;
the realized effect of the change may not manifest as a bug for
many years.

And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

Agreed.

...but be careful blaming the programmer.

Or the language, or the tools.

I push back on both of these.

There's an old saw that goes, "a good craftsman never blames his
tools." (I dislike it, but that's how it usually goes.)

But there's an unstated corollary: a good craftsman also
maintains and carefully selects the tools for the job at hand.
You don't smooth a rough-cut board with a screwdriver, nor do
you turn a bolt with a hammer. And you don't use a chainsaw
without a guard.

- Dan C.

--- Synchronet 3.22a-Linux NewsLink 1.2

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jun 15 21:59:14 2026

From Newsgroup: comp.lang.c

On 14/06/2026 16:33, Dan Cross wrote:
...

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

"undefined behavior", defined as "behavior ... for which this
international standard imposes no requirements", was introduced by the
first standard. However, before there was a standard there was K&R C,
the closest thing they had to a standard. And though the phrase
"undefined behavior" was not in use, there was "behavior for which K&R C imposes no requirements". In fact, there was a great deal more of it,
since K&R C was not written as carefully and precisely as the first
standard, so it left a great deal more behavior that was "undefined by
omission of any relevant definition" than there was in the first standard.

--- Synchronet 3.22a-Linux NewsLink 1.2

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Tue Jun 16 04:59:38 2026

From Newsgroup: comp.lang.c

In article <110qali$3q27m$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

On 14/06/2026 16:33, Dan Cross wrote:
...

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

"undefined behavior", defined as "behavior ... for which this
international standard imposes no requirements", was introduced by the
first standard. However, before there was a standard there was K&R C,
the closest thing they had to a standard. And though the phrase
"undefined behavior" was not in use, there was "behavior for which K&R C imposes no requirements". In fact, there was a great deal more of it,
since K&R C was not written as carefully and precisely as the first
standard, so it left a great deal more behavior that was "undefined by omission of any relevant definition" than there was in the first standard.

I am guessing that there was supposed to be a point in there
somewhere, but I can't find it.

- Dan C.

--- Synchronet 3.22a-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Jun 16 10:10:21 2026

From Newsgroup: comp.lang.c

On 15/06/2026 19:57, Waldek Hebisch wrote:

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 12:43, Waldek Hebisch wrote:

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

<snip>

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.

IMO a reasonable and easy definition is: computation either delivers
a mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
More formally, optimization is not allowed to introduce
a stronger precondition, but may weaken it.

It is always the case that an implementation can weaken preconditions
and strengthen postconditions and remain correct - though it might then
be less efficient than you expect. But if you are /requiring/ a weaker
precondition and /requiring/ a strong postcondition - such as by
insisting on traps on overflow - you are changing the function or
operation specification, and it is not necessarily a good thing.

In C, the integer addition operation "c = a + b;" has a precondition :

(a + b) <= INT_MAX, (a + b) >= INT_MIN

It has the postcondition :

c == a + b

Saying that it must trap if there is overflow weakens the precondition
to any "a" and "b", but makes the postcondition much more complicated.
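
As a rough illustration of that precondition, here is a small sketch in standard C. The function names are hypothetical. The check is arranged so that it cannot itself overflow, and the `assert` is only a debugging aid that disappears when `NDEBUG` is defined; it does not turn addition into a trapping operation:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True when the mathematical sum a + b lies in [INT_MIN, INT_MAX],
   i.e. when the precondition of plain C addition holds.
   The comparisons are rearranged so no intermediate value overflows. */
bool add_precondition_holds(int a, int b) {
    if (b > 0) {
        return a <= INT_MAX - b;
    }
    if (b < 0) {
        return a >= INT_MIN - b;
    }
    return true; /* adding zero can never overflow */
}

/* Plain addition: the caller is responsible for the precondition.
   Given the precondition, the postcondition is simply result == a + b. */
int add(int a, int b) {
    assert(add_precondition_holds(a, b)); /* debug-only check */
    return a + b;
}
```

A caller that cannot establish the precondition from its own knowledge of the values can call `add_precondition_holds()` first and decide locally what to do.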

No. Precondition is the same. Postcondition has additional term "computation finished with no traps".

That's back where we started, with no defined behaviour if "a + b" is
too big - that is the specification for normal C addition. When you say
that addition should either deliver the correct result for suitable "a"
and "b", and trap for other values, you now have an operation that
accepts any "a" and "b", and has a postcondition that includes traps.
You have changed the function, and changed its specification,
pre-conditions and post-conditions.

It means it is no longer true that the result of an addition operation
is the sum of the operands.

Opposite of that: no traps means that regardless of precondition
the result of an addition operation is the sum of the operands.

Your change means that either the result is no traps and a correct sum,
/or/ it is a trap and no valid sum (what you get returned as the "sum"
will depend on how you define all this).

I think, perhaps, what you mean here is that if you do something like "x
= a + b;", and the execution makes it through the addition and does the assignment, then "x" is guaranteed to be equal to the sum of "a" and
"b". That is fair enough - without such guarantees, traps, exceptions,
etc., would be a completely useless concept.

Addition is no longer a "pure" function -
now it has side-effects that are completely unpredictable at the site of
use. Programmers can no longer rely on the timing of the operation,
stack usage, interaction with other code, or even that the operation
ever finishes.

The difference is that without traps programmers do not know if
arithmetic operations give correct result.

They do know - if the code is written correctly. They know the result
is correct because they know they have fulfilled the pre-conditions. It
is the caller code that has the responsibility to make sure the
pre-conditions hold.

If the programmer does not know if the pre-conditions will hold before
the call, then they don't know what their code will do. And that is not
a good situation to be in - the possibility of some unknown jump to
somewhere else in the code does not make it better.

Note that all of this is different from run-time failures that might
occur in the normal course of the program, outside of the knowledge or
control of the calling code. C++ exceptions, or C error return codes,
are fine for things like a "read file" function in the case when the
file does not exist. That is not the result of a bug in the code.
(Well, it might be, but it doesn't have to be.) It is an expected
situation that can be handled.

Traps on UB are unexpected situations resulting from bugs in code. They
can be helpful for fault-finding, and may have some uses in damage
limitation.

With traps they do
not know if program will successfully finish, but if it
finishes they know that arithmetic gave correct results.

This is achievable in a controlled manner, without traps.
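
One way this might look in practice, as a sketch only: the overflow is reported through a return value that the caller must inspect, rather than through a trap. This assumes GCC or Clang, since `__builtin_add_overflow` is a compiler extension and not standard C. The names `checked_add` and `trapping_add` are hypothetical; the second is included only to contrast the two styles.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Controlled style: returns true and stores the sum if it is
   representable, returns false otherwise. No trap, no UB.
   __builtin_add_overflow is a GCC/Clang extension that returns
   true when the result does NOT fit in *sum. */
bool checked_add(int a, int b, int *sum) {
    assert(sum != NULL);
    return !__builtin_add_overflow(a, b, sum);
}

/* Trapping style, for comparison: either the correct sum is
   returned or the program is terminated. */
int trapping_add(int a, int b) {
    int sum;
    if (!checked_add(a, b, &sum)) {
        abort();
    }
    return sum;
}
```

With `checked_add()` the failure path is visible at the call site and can be tested; with `trapping_add()` it is an implicit exit from wherever the addition happens to be.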

If your code is correct, and overflow never happens, then this is all a
big disadvantage in terms of understanding and analysing the code. And
it does not in any way reduce the effort needed to be sure that your
inputs are appropriate for getting the desired results of the operation.

One needs to use correct formulas, there is no way around that.
Without traps the programmer must analyse ranges of all intermediate
expressions. That is tedious and error prone.

Then do a better job of it - or find ways that are not as tedious.

The main reasons for getting integer overflow are :

1. Using unsanitised input.

2. Using types that are too small.

3. Not having a clear idea of what kinds of values you are dealing with,
and what you are doing with them.

The way to avoid 1 is obvious. The way to avoid 2 is obvious (except in
the very rare situations where 64-bit integers are not big enough). The
way to avoid 3 is obvious. (Sometimes the details of implementing these
fixes are not minor, but the principle is clear.)
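
A small sketch of the three points together, under assumptions chosen purely for illustration: the function `percent_of` and the limit `MAX_PERCENT` are hypothetical. The input is range-checked first, the intermediate product is computed in a 64-bit type, and the final narrowing is safe because the range of the result is known:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical domain limit, for illustration only. */
enum { MAX_PERCENT = 100 };

/* Computes value * percent / 100, truncating toward zero.
   Returns false (and leaves *out untouched) on unsanitised input. */
bool percent_of(int32_t value, int32_t percent, int32_t *out) {
    assert(out != NULL);

    /* 1. Sanitise the input before doing arithmetic with it. */
    if (percent < 0 || percent > MAX_PERCENT) {
        return false;
    }

    /* 2. Use a type that is big enough: |value * percent| is at most
          2^31 * 100, which fits comfortably in int64_t. */
    int64_t wide = (int64_t)value * percent;

    /* 3. Know the range: since 0 <= percent <= 100, the quotient is
          no larger in magnitude than value, so it fits in int32_t. */
    *out = (int32_t)(wide / 100);
    return true;
}
```

No overflow check is needed on the arithmetic itself here, because the range analysis is done once, at the point where the values enter the function.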

People work
around that by activating traps during testing, but it is
quite hard to find worst case values, so errors may be
easily missed during testing. Having traps active during
production runs means that you may discover problem. You
apparently think that ignoring possible problems at
runtime is good thing.

No, ignoring problems is never a good thing. Writing code that doesn't
run the risk of problems is a good thing.

And I can agree that sometimes leaving traps enabled in released code
can be helpful - there are situations where you can't practically remove
the risk of overflows, and it is better to crash out reliably than risk running on with faulty data. It is, however, also the case that
sometimes traps will cause far more problems than incorrect data would. (Noting that UB does not guarantee "incorrect data" - it can do
anything. Wrapping semantics, or unspecified value semantics, would do
that.)

For simple programs you may analyze
it well enough to be sure that nothing bad happens at
runtime, but in general computing we use a lot of "interesting"
programs which are too complex to analyse. We hope that
they will run OK, but have no proof. Sometimes hope is
based on statistical tests and on low probability input
program may fail. Traps are useful to make sure that
wrong results will not propagate further.

This is why you break your code down into manageable and understandable
parts - functions, classes (for some languages), modules / translation
units, files, directories, libraries. Yes, there can be interactions
that can be very difficult to test well - testing is not easy.

Code over a certain size is likely to contain bugs - programmers are
rarely infallible, and even when they are ( :-) ), the customer
specifying the program is not.

But we are talking here about a specific class of bugs - UB that can be detected by trap options in code generation or cpu hardware, which
basically means integer overflows, divide by 0, dereferencing null
pointers, and shift by inappropriate amounts. Those bugs are avoidable
- I really do not see them as a concern. Trapping won't help all the
other bugs - buffer overflows, unterminated strings, index out of range, misunderstanding the specifications, mixing up parameter order in
function calls, data races, logical errors, memory resource ownership
mixups, and everything else.

So your traps on arithmetic overflow is crippling the efficiency of calculations (and efficiency of calculations is a big reason for picking
C in the first place) to give unexpected crashes when easily preventable mistakes occur - while doing nothing to aid the big risks.

Trapping like this can certainly be useful for debugging. But as a
general feature it gives a false sense of security, complicates
mathematical analysis, introduces massive additional possible code path
choices which are either real or almost certainly untested in practice,
or not real (because the compiler can see they are not taken) and
untestable.

You get extra code paths only if you attempt to handle traps.

Unhandled traps are also a code path.

Trapping of overflows gives you assurance that in computation that
you did and which finished with no traps there were no errors of
certain kind (that is wrong results due to overflow). That is
really not different than insistence on static types.

They are not remotely the same - the distinction between compile-time
and runtime is critical.

Neither
assures you of no bugs, but each tells you that some bugs
did not happen. Of course, trapping at runtime is less
satisfactory than compile time checking, but tight a priori
bounds on ranges are notoriously hard to obtain, so trapping
is the best we can have for high performance software with
current state of art.

That is not qualitatively worse than "who knows what will
happen" UB, but it is not significantly better.

<snip>

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.

What is value of certification required for some software? If
programmer did good job then program will work correctly.

Yes.

Traps give assurance that programmer indeed correctly handled
tricky problem.

No, it certainly does not. And one of the reasons to dislike traps is
that it makes people think like that. A trap can only happen if the
programmer did /not/ handle the problem correctly.

Yes.

And I expect that if
the programmer is able to write an appropriate specific trap handler for
the failing expression (rather than a program-global "crash with error
message" handler), then he/she would be able to avoid the problem in the
first place.

Rather non-specific trap handler could work as "redo the computation
in arbitrary precision". If problem (like division by zero) persists,
then there is logic bug, otherwise it means that precision was
inadequate and problem is resolved.
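
A rough sketch of the "redo at higher precision" idea, with the caveats stated up front: C has no built-in arbitrary precision, so a 64-bit retry stands in for it here; the overflow is detected by an explicit check rather than a hardware trap; and `__builtin_mul_overflow` / `__builtin_add_overflow` are GCC/Clang extensions. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fast path: dot product in 32-bit arithmetic.
   Returns false as soon as any step would overflow. */
static bool dot32(const int32_t *x, const int32_t *y, int n, int32_t *out) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        int32_t prod;
        if (__builtin_mul_overflow(x[i], y[i], &prod) ||
            __builtin_add_overflow(acc, prod, &acc)) {
            return false;
        }
    }
    *out = acc;
    return true;
}

/* Slow path: same computation in 64-bit arithmetic. */
static bool dot64(const int32_t *x, const int32_t *y, int n, int64_t *out) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        int64_t prod;
        if (__builtin_mul_overflow(x[i], y[i], &prod) ||
            __builtin_add_overflow(acc, prod, &acc)) {
            return false; /* still too big: precision is not the only problem */
        }
    }
    *out = acc;
    return true;
}

/* Try the narrow computation first; redo it wider only if it overflowed. */
bool dot(const int32_t *x, const int32_t *y, int n, int64_t *out) {
    assert(n >= 0);
    int32_t narrow;
    if (dot32(x, y, n, &narrow)) {
        *out = narrow;
        return true;
    }
    return dot64(x, y, n, out);
}
```

If the wide path also fails, the caller learns that more precision alone did not resolve the problem, which matches the distinction drawn above between inadequate precision and a logic bug.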

If you are talking here about using traps as a testing and debugging
aid, helping the developer spot problems and improve their code, then I
agree - that's a good thing.

If you are talking about some kind of automatic handling, then that is
totally out of scope for a language like C. It would be much more
appropriate to use a higher level managed language and higher level
arithmetic (like support for arbitrary precision integers) in the first
place.

However, you should think about such traps similarly to a parity error
which can be signaled by some hardware. There is a low but nonzero
probability that such an error can occur. A parity check gives you
a reasonable chance to detect it.

That's not an unreasonable comparison. Parity checks used to be popular
- they are almost non-existent in communication protocols now. You
either have something that you know works correctly, or you use much
better methods - multiple ECC bits, CRCs, FEC, or whatever, according to
the balance of cost, error rates, consequences of data loss, etc.

Handling is at least as problematic
as with overflow. Absence of traps gives you less info: no
overflow traps mean no overflow, no parity traps means that
parity was correct, but the intent of a parity check is to discover bit
errors, and they are possible even with correct parity. So, do you
think that parity check inside MCU-s are useless?

Yes, for the most part. A parity check is almost always either
unnecessary, or not nearly enough.

Sometimes, of course, you are trying to write code that has some input
which is supposed to be correct, but you are not sure - and you can't
change the calling code. How you handle that situation will depend on
the program and the situation. But I don't see trapping as "correct
handling" unless the whole program is written with the expectation of
traps for error handling. You might, however, end up deciding that
trapping is the least bad option.

And once you know that computation works
according to math rules other forms of verification are easier.

You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example working heavy machine
is dangerous, stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.

I think if you are /not/ concerned with high efficiency in the code,

Well, if efficiency does not matter traps can be implemented as
a software layer above the language. Or one can use arbitrary
precision arithmetic. Traps matter when efficiency matters,
so they should be implemented in place giving best efficiency,
at best in CPU and if that is not possible then in optimizing
compiler.

then you should be seriously questioning the choice of C as the language
in the first place. And even if you use C, there are often things you
can do to avoid having problems in the first place. The obvious one for
integer overflow is to make more use of bigger types.

Which may be best choice if efficiency is not important. But
some calculations require surprisingly large accuracy to avoid
overflow. Worse, in vast majority of cases lower accuracy
may be adequate, so there is pressure to use "sufficient"
accuracy overlooking special cases.

In general computation, if you need correct value and have some
time there are options which may involve re-doing computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data; traps allow identification of
such data (doing it in another way may be computationally quite
expensive).

--- Synchronet 3.22a-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 21 14:26:08 2026

From Newsgroup: comp.lang.c

antispam@fricas.org (Waldek Hebisch) writes:

Dan Cross <cross@spitfire.i.gajendra.net> wrote:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just
because it can't prove that there will be no UB. If it could,
every signed or floating-point arithmetic operation with unknown
operand values would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

So there are two things that are at play here.

First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.

I think that this paragraph (and several others in this post and
other posts) represents a fundamental misunderstanding. This may
be due to the way the C standard is written. AFAIK the Extended Pascal
standard (once you translate terminology) states the same things as
C about UB, but in a clearer way. Some relevant parts below:

: 3.1 Dynamic-violation
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected up to,
: but not beyond, execution of the declaration, definition, or
: statement that exhibits (see clause 6) the dynamic-violation.

: 3.2 Error
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected.
...
: 5.1 Processors
...
: e) be able to determine whether or not the program violates any
: requirements of this International Standard, where such a violation is
: not designated an error or dynamic-violation,
...

: 5.2 Programs
...
: b) if it conforms at level 1, use only those features of the language
: specified in clause 6;

UB in C standard corresponds with 'error' in Pascal standard. [...]

Does it? In C a syntax error is undefined behavior, but it
requires a diagnostic. (I don't mean to single out just syntax
errors; there are other examples.)
--- Synchronet 3.22a-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 21 15:26:35 2026

From Newsgroup: comp.lang.c

antispam@fricas.org (Waldek Hebisch) writes:

[...]

I think that lawyerish style of current C standard is mostly
inertia,

I wouldn't use a term like lawyerish to describe the text in the
ISO C standard. Can you explain what quality you mean to ascribe
to "lawyerish" writing in the C standard without using any term
related to lawyering or legal documents?

and making standard more mathematical would improve it.

Could you elaborate on that statement? In what ways would giving
a more mathematical treatment of C semantics improve the quality
of the ISO C document? How would doing that advance the stated
purposes or goals of the C standard?

But giving formal semantic in the standard would mean
significantly bigger change.

Due to the nature of C, I believe it is effectively impossible to
give a formal mathematical definition of the semantics of C. Do
you think such a thing is feasible or practicable? If so can you
explain the reasoning behind your thinking?
--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 22 03:40:56 2026

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

antispam@fricas.org (Waldek Hebisch) writes:

[...]

I think that lawyerish style of current C standard is mostly
inertia,

I wouldn't use a term like lawyerish to describe the text in the
ISO C standard. Can you explain what quality you mean to ascribe
to "lawyerish" writing in the C standard without using any term
related to lawyering or legal documents?

Sorry no, I can not. My point is that you need to treat
C standard almost like legal document and I can not explain
this without using proper terminology.

and making standard more mathematical would improve it.

Could you elaborate on that statement? In what ways would giving
a more mathematical treatment of C semantics improve the quality
of the ISO C document? How would doing that advance the stated
purposes or goals of the C standard?

There are many aspect of mathematical treatment. One is care
about terminology, namely that terms are either reasonably
clearly marked as "primitive" (and assumed to be understood
by readers) or are precisely defined. Related is that
words can be taken as written, without needing to look at
intent or similar legal style arguments. You may think that
C standard already possesses such properties, but a recent
example, namely the definition of expression, nicely illustrates
current problems. With mathematical treatment expression
would be part of C program derived from corresponding
grammar rule and that would resolve the problem. In the
past in this group there were several discussions about
various parts of C standard, and there were cases where
standard wording looked genuinely confusing. I am not
prepared to dig into those discussions, but my impression
was that in some cases mathematical treatment would make
things clearer.

But giving formal semantic in the standard would mean
significantly bigger change.

Due to the nature of C, I believe it is effectively impossible to
give a formal mathematical definition of the semantics of C. Do
you think such a thing is feasible or practicable? If so can you
explain the reasoning behind your thinking?

I think that this is possible given dedicated team of qualified
people doing the work. I do not know if it is practically
possible to assemble needed team. I already mentioned axiomatic
semantics. There is C grammar and we need to assign semantics
to various production rules. We do this by assigning preconditions
and postconditions to the rules. In much simpler cases this
was done. C is a bigger language and the rules are more complicated,
but that for me looks like a quantitative problem, that is there is
more work and result will be bigger. Clearly, this would
require buy-in from the standard body. Namely, formalization
is likely to uncover many unclear places in C standard and
ensuring that formalization matches the standard would require
resolution by the standard body. It is quite possible that
standard body would refuse to cooperate. To explain this more,
let me mention past discussion about Extended Pascal in a
different forum. I was looking at types of constants, but in
specific case rules looked contradictory, so I asked a
question. One response was from former commitee member (this
was several years after Pascal standard was ratified), he
basicaly said that type of constants does not matter. Which
was mostly true, but my reason for asking the question was
that validity of programs depended on types of constants.
Something similar may happen during formalization:
formalization may discover unclear places in C standard
which C commitee considers irrelevant in practice and
refuses to clarify.
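
To make the precondition/postcondition idea concrete, here is a minimal
sketch in C for one rule only, the additive expression `a + b` on int.
It is an illustration, not a proposed formalization: the names add_pre,
add_post and add_with_contract are hypothetical, and the postcondition
check assumes long long is wider than int (enforced by the static
assertion).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption of this sketch: long long can hold every int sum exactly. */
_Static_assert(LLONG_MAX / 2 >= INT_MAX, "long long must be wider than int");

/* Hypothetical precondition for `a + b` on int:
   the mathematical sum must be representable as an int. */
bool add_pre(int a, int b)
{
    if (b > 0)
        return a <= INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    if (b < 0)
        return a >= INT_MIN - b;   /* INT_MIN - b cannot overflow here */
    return true;
}

/* Hypothetical postcondition: the result equals the mathematical sum,
   computed in the wider type. */
bool add_post(int a, int b, int result)
{
    return (long long)result == (long long)a + (long long)b;
}

/* Hoare-triple reading: { add_pre(a, b) }  r = a + b  { add_post(a, b, r) } */
int add_with_contract(int a, int b)
{
    assert(add_pre(a, b));      /* outside the precondition nothing is promised */
    int r = a + b;
    assert(add_post(a, b, r));
    return r;
}
```

In this reading, a call such as add_with_contract(INT_MAX, 1) falls
outside the precondition, which is roughly where the standard says the
behavior is undefined; the sketch merely turns that into a failed
assertion.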

BTW: Authors of some tools already need and have formal
semantics for a language rather close to C. Namely, the
CompCert compiler matches conditions in source
code with machine code, and for that it needs a reasonably
good approximation to the formal semantics of the language implemented
by a C compiler (more precisely gcc). Microsoft developed
formal checking tools, and those too need formal semantics.
But since the goals are different, neither gives the semantics of
standard C.
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 22 03:56:24 2026

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

antispam@fricas.org (Waldek Hebisch) writes:

Dan Cross <cross@spitfire.i.gajendra.net> wrote:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just
because it can't prove that there will be no UB. If it could,
every signed or floating-point arithmetic operation with unknown
operand values would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

So there are two things that are at play here.

First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.

I think that this paragraph (and several others in this post and
other posts) represents a fundamental misunderstanding. This may
be due to the way the C standard is written. AFAIK the Extended Pascal
standard (once you translate the terminology) states the same things as
C about UB, but in a clearer way. Some relevant parts below:

: 3.1 Dynamic-violation
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected up to,
: but not beyond, execution of the declaration, definition, or
: statement that exhibits (see clause 6) the dynamic-violation.

: 3.2 Error
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected.
...
: 5.1 Processors
...
: e) be able to determine whether or not the program violates any
: requirements of this International Standard, where such a violation is
: not designated an error or dynamic-violation,
...

: 5.2 Programs
...
: b) if it conforms at level 1, use only those features of the language
: specified in clause 6;

UB in C standard corresponds with 'error' in Pascal standard. [...]

Does it? In C a syntax error is undefined behavior, but it
requires a diagnostic. (I don't mean to single out just syntax
errors; there are other examples.)

I mean typical UB, especially cases that people complain about.
It does not help that C uses the same term in a few other cases,
which are really different.
--
Waldek Hebisch
--- Synchronet 3.22a-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Jun 24 16:58:36 2026

From Newsgroup: comp.lang.c

I'm entering this sub-thread late and yet I haven't finished reading
all posts. - While I lately noticed some convergence of opinions and
facts this post appears to me to mostly fall back again; but I won't
re-open the discussion. I just want to comment on a single exposition.

On 2026-06-15 10:09, David Brown wrote:

On 15/06/2026 00:55, Keith Thompson wrote:

[...]

[...]

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

(I don't expect the complete investigation report on the Ariane 5
incident to be represented or explained, but picking a few facts is
not only an oversimplification here, it leads to a misrepresentation
of the case and to inappropriate reasoning and conclusions.)

Throwing an exception in a system that should have a well-defined and
safe behavior is of course stupid. Exceptions are there to catch them
and handle them with appropriate actions to mitigate or fix any issue.

Not UB, but well defined software and well defined system behavior is
the key! That should be not only in aviation and life-critical systems
but (ideally) also in "ordinary" software development with used tools.

The problem with the Ariane 5 was a sequence and combination of events.

But the _primary cause_ had not been the [technical] interrupt. It was
the fact that the *requirements* (the flight trajectories) changed from
Ariane 4 to Ariane 5 and that they didn't adjust the system accordingly
but just re-used formerly designed system components unchanged.

(This actually reminds me more of (or resembles?) the case that Dan
narrated, of using old software systems with new tools that
"unexpectedly" fail in a new compiler environment because of a
component in another place that was just "invisible" at the place
where the problem got triggered.)

In retrospect it is clear that all the software components with their
contracts should have been double-checked against the (new) Ariane 5
requirements - that hadn't been done and that was the problem source!

(There's a reason why they use Ada and not "C" in such areas; the
rocket might otherwise have exploded on the launching-ramp already. ;-)

Janis

[...]

--- Synchronet 3.22a-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Jun 24 17:45:22 2026

From Newsgroup: comp.lang.c

On 2026-06-16 10:10, David Brown wrote:

On 15/06/2026 19:57, Waldek Hebisch wrote:

[...]

No, ignoring problems is never a good thing. Writing code that doesn't
run the risk of problems is a good thing.

Sure.

And I can agree that sometimes leaving traps enabled in released code
can be helpful - there are situations where you can't practically remove
the risk of overflows, and it is better to crash out reliably than risk
running on with faulty data. It is, however, also the case that
sometimes traps will cause far more problems than incorrect data would.
(Noting that UB does not guarantee "incorrect data" - it can do
anything. Wrapping semantics, or unspecified value semantics, would do
that.)

Hmm.. - not sure what you mean (and imply with) "crash out reliably".

Having been engaged in server systems software development a crash
had never been an accepted option. And that's certainly also true
with life-critical applications and costly operations (upthread you
had mentioned Ariane 5). You should always avoid crashes and catch
exceptions. The point is what you can then do with that information,
and that depends on the actual application case; report it, retry it,
retry with alternative methods or adapted conditions, emulate the
result, estimate it, ask supervisor process, switch devices, etc.

I'm well aware that wrong data may also be bad, be it from wrong
algorithms, a technical overflow situation, unreliable data sources,
or unreliable processing (not excluding effects of UB).

I'm really not sure whether to consider "not handling an exception"
better or worse than "not handling data errors"; usually you don't
want either. So both should be prevented (if possible) or acted upon
(if getting a notice about it).
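
As a minimal sketch of "acting upon" an overflow instead of either
crashing or silently continuing, the C fragment below detects the
overflow before it happens (so no UB is ever executed) and lets the
caller choose what to do with that information. The names add_checked,
add_with_policy and the two policies are hypothetical, chosen only for
this illustration; real systems would pick actions suited to the
application (retry, alternative method, asking a supervisor process,
and so on).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies, for illustration only. */
enum overflow_policy {
    POLICY_REPORT,    /* only signal the problem; returned value is a dummy */
    POLICY_SATURATE   /* "estimate it": clamp to the nearest int limit */
};

/* Stores a + b in *out and returns true if the sum is representable;
   otherwise returns false and performs no addition at all. */
bool add_checked(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Returns the exact sum when possible and sets *exact to true.
   On overflow, sets *exact to false and applies the chosen policy. */
int add_with_policy(int a, int b, enum overflow_policy policy, bool *exact)
{
    int r;
    assert(exact != NULL);
    if (add_checked(a, b, &r)) {
        *exact = true;
        return r;
    }
    *exact = false;
    if (policy == POLICY_SATURATE)
        return b > 0 ? INT_MAX : INT_MIN;
    return 0;   /* POLICY_REPORT: the caller must inspect *exact */
}
```

Either way the caller gets a notice (the flag) rather than a trap or a
silently wrong value; which policy is appropriate remains an
application decision.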

Janis

[...]

--- Synchronet 3.22a-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Wed Jun 24 12:27:47 2026

From Newsgroup: comp.lang.c

On 6/24/2026 8:45 AM, Janis Papanagnou wrote:

On 2026-06-16 10:10, David Brown wrote:

On 15/06/2026 19:57, Waldek Hebisch wrote:

[...]

No, ignoring problems is never a good thing. Writing code that
doesn't run the risk of problems is a good thing.

Sure.

And I can agree that sometimes leaving traps enabled in released code
can be helpful - there are situations where you can't practically
remove the risk of overflows, and it is better to crash out reliably
than risk running on with faulty data. It is, however, also the case
that sometimes traps will cause far more problems than incorrect data
would. (Noting that UB does not guarantee "incorrect data" - it can do
anything. Wrapping semantics, or unspecified value semantics, would
do that.)

Hmm.. - not sure what you mean (and imply with) "crash out reliably".

Having been engaged in server systems software development a crash
had never been an accepted option. And that's certainly also true
with life-critical applications and costly operations (upthread you
had mentioned Ariane 5). You should always avoid crashes and catch
exceptions.

Right. Also, fwiw, I had a calibration system for my server framework
that would artificially crash a system while keeping logs. On reboot, it
read the results and calibrated itself.

The point is what you can then do with that information,

and that depends on the actual application case; report it, retry it,
retry with alternative methods or adapted conditions, emulate the
result, estimate it, ask supervisor process, switch devices, etc.

I'm well aware that wrong data may also be bad, be it from wrong
algorithms, a technical overflow situation, unreliable data sources,
or unreliable processing (not excluding effects of UB).

I'm really not sure whether to consider "not handling an exception"
better or worse than "not handling data errors"; usually you don't
want either. So both should be prevented (if possible) or acted upon
(if getting a notice about it).

Janis

[...]

--- Synchronet 3.22a-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 29 05:41:10 2026

From Newsgroup: comp.lang.c

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard.

The term but not the concept, which was there since the
early days of C -- at least since K&R in 1978, and very
likely earlier (I haven't reviewed any of the earlier
descriptions of the language).
--- Synchronet 3.22a-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 29 06:27:13 2026

From Newsgroup: comp.lang.c

antispam@fricas.org (Waldek Hebisch) writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

antispam@fricas.org (Waldek Hebisch) writes:

[...]

UB in C standard corresponds with 'error' in Pascal standard. [...]

Does it? In C a syntax error is undefined behavior, but it
requires a diagnostic. (I don't mean to single out just syntax
errors; there are other examples.)

I mean typical UB, especially cases that people complain about.
[...]

Then you should say what you mean, rather than leaving it
for other people to guess.
--- Synchronet 3.22a-Linux NewsLink 1.2

From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 29 15:23:31 2026

From Newsgroup: comp.lang.c

In article <86mrwd7c49.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard.

The term but not the concept, which was there since the
early days of C -- at least since K&R in 1978, and very
likely earlier (I haven't reviewed any of the earlier
descriptions of the language).

How much time elapsed before your response?

If you cannot respond in a timely manner (read: within a week),
then please do not respond at all.

That said, not really. I've read K&R, both editions, and the
first really doesn't define a concept that gives such supreme
latitude to the compiler. They merely acknowledged that there
existed things for which they could not give a good behavioral
definition. The way that UB is defined and used in 2026 was
absent in K&R in 1978.

- Dan C.

--- Synchronet 3.22a-Linux NewsLink 1.2
