In the SEI CERT C Coding Standard we read:
"According to the C Standard, Annex J, J.2 [ISO/IEC 9899:2024],
the behavior of a program is undefined in the circumstances outlined
in the following table."
The table has 221 numbered cases and can be found here:
<https://wiki.sei.cmu.edu/confluence/display/c/CC.%2BUndefined%2BBehavior>
According to the C Standard Committee (paraphrasing) "You may eat
from any tree in the garden of coding, except for any of the 221
trees of undefined behaviour. If you eat from any of the 221 trees
of undefined behaviour your program may die, either immediately or at
some unspecified time in the future, or may do absolutely anything at
any future time. You must study the Book of the Knowledge of Defined
and Undefined (the 758 page C23 standard document) to learn exactly
how to recognise each of the 221 trees of undefined behaviour.
Please pay the cashier $250.00 to purchase a copy of the Book
of the Knowledge of Defined and Undefined".
[When a language is 50 years old and there is a mountain of legacy code that
they really don't want to break, it accumulates a lot of cruft.
On the other hand, there's the python approach in which they deprecate and
remove little used and crufty features, but old python code doesn't work any
more unless you go back and update it every year or two. -John]
On 2025-08-20, Martin Ward <mwardgkc@gmail.com> wrote:
In the SEI CERT C Coding Standard we read:
"According to the C Standard, Annex J, J.2 [ISO/IEC 9899:2024],
the behavior of a program is undefined in the circumstances outlined
in the following table."
The table has 221 numbered cases and can be found here:
<https://wiki.sei.cmu.edu/confluence/display/c/CC.%2BUndefined%2BBehavior>
According to the C Standard Committee (paraphrasing) "You may eat
from any tree in the garden of coding, except for any of the 221
trees of undefined behaviour. If you eat from any of the 221 trees
of undefined behaviour your program may die, either immediately or at
some unspecified time in the future, or may do absolutely anything at
any future time. You must study the Book of the Knowledge of Defined
and Undefined (the 758 page C23 standard document) to learn exactly
how to recognise each of the 221 trees of undefined behaviour.
Please pay the cashier $250.00 to purchase a copy of the Book
of the Knowledge of Defined and Undefined".
The list is incomplete.
All platform-specific headers and functions are effectively documented
extensions replacing undefined behavior, which another implementation could
neglect to define, or define arbitrarily (including in evil ways).
Once you grok the fact that almost all real work in C takes place via
undefined behavior (very few programs are maximally portable and strictly
conforming) you stop sweating it.
[When a language is 50 years old and there is a mountain of legacy code that
they really don't want to break, it accumulates a lot of cruft. If we were
starting now we'd get something more like Go.
On the other hand, there's the python approach in which they deprecate and
remove little used and crufty features, but old python code doesn't work any
more unless you go back and update it every year or two. -John]
Under "4. Conformance", the C standard says:
"""
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is undefined.
Undefined behaviour is otherwise indicated in this International
Standard by the words "undefined behavior" or by the omission of any
explicit definition of behavior. There is no difference in emphasis
among these three; they all describe "behavior that is undefined".
"""
So no list could ever be complete here, since anything whose behaviour
is not defined in the C standards is undefined behaviour. I have
always found that slightly at odds with the definition under "3.
Terms, definitions, and symbols" of "behavior, upon use of a
nonportable or erroneous program construct or of erroneous data, for
which this International Standard imposes no requirements". In my
mind, things like externally defined functions (used correctly) could
be considered UB by the section 4 definitions but not by the section 3 definitions.
UB (both definitions) is an essential part of all programming languages
- after all, if you have a bug in your code, you have UB, and no
programming language has made it impossible to write bugs in your code.
C just has some things that are undefined in C but defined in some other languages, and it is a bit more open and honest about UB than many
language definitions.
David Brown <david.brown@hesbynett.no> writes:
[...]
Under "4. Conformance", the C standard says:
"""
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is undefined.
Undefined behaviour is otherwise indicated in this International
Standard by the words "undefined behavior" or by the omission of any
explicit definition of behavior. There is no difference in emphasis
among these three; they all describe "behavior that is undefined".
"""
So no list could ever be complete here, since anything whose behaviour
is not defined in the C standards is undefined behaviour. I have
always found that slightly at odds with the definition under "3.
Terms, definitions, and symbols" of "behavior, upon use of a
nonportable or erroneous program construct or of erroneous data, for
which this International Standard imposes no requirements". In my
mind, things like externally defined functions (used correctly) could
be considered UB by the section 4 definitions but not by the section 3
definitions.
I don't see an inconsistency.
A C program that includes a non-standard header that's not part of
the program (e.g., `#include <windows.h>`) and calls a function
declared in that header has undefined behavior as far as the C
standard is concerned. The program could be compiled in a conforming
environment that has its own <windows.h> header with a declaration
for a different implementation of the same name.
That's undefined behavior under both the section 3 definition (use
of a nonportable program construct) and the section 4 definition
(the omission of any explicit definition of behavior).
[...]
UB (both definitions) is an essential part of all programming languages
- after all, if you have a bug in your code, you have UB, and no
programming language has made it impossible to write bugs in your code.
C just has some things that are undefined in C but defined in some other
languages, and it is a bit more open and honest about UB than many
language definitions.
No, a bug in your code is not necessarily undefined behavior. It could easily be code whose behavior is well defined by the language standard,
but that behavior isn't what the programmer intended.
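A small sketch of that distinction, with hypothetical function names that do not come from the thread: the first function has a plain logic bug (it skips element 0), yet every operation in it is well defined by the C standard, so the standard fully specifies the wrong answer.

```c
#include <stddef.h>

/* Intended: sum of all n elements.
   Bug: the loop starts at 1, so a[0] is never added.
   Nothing here is undefined: unsigned arithmetic wraps by definition and
   every index used is within bounds (for n >= 1). */
unsigned sum_buggy_but_defined(const unsigned *a, size_t n)
{
    unsigned total = 0;
    for (size_t i = 1; i < n; i++)
        total += a[i];
    return total;
}

/* What the programmer meant to write. */
unsigned sum_intended(const unsigned *a, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

For the array {1, 2, 3} the buggy version returns 5 and the intended one returns 6, on every conforming implementation.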
Martin Ward <mwardgkc@gmail.com> writes:
[actually, John Levine writes:]
[When a language is 50 years old and there is a mountain of legacy code that
they really don't want to break, it accumulates a lot of cruft.
But there is a very vocal group of people who argue that programs that
exercise undefined behaviour are already broken (and they often use
stronger words than that) and that compilers are allowed to (and
should) compile them to code that behaves differently than earlier
compilers that the new compiler supposedly is just a new version of.
So according to this argument, when something that the legacy code
does is declared as undefined behaviour, this breaks this program.
And the practice is that the people in C compiler maintenance reject
bug reports as RESOLVED INVALID when the code exercises undefined
behaviour, even when the code works as intended in earlier versions of
the compiler and when the breakage could be easily fixed (e.g., for <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804> and <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709> by using movdqu
instead of movdqa).
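The instruction names suggest an alignment issue: movdqa requires an aligned address, movdqu does not. As a hedged illustration of that general category (this is not the code from those bug reports, and the function name is made up), dereferencing a pointer cast to a wider type is undefined when the address is not suitably aligned, while copying the bytes with memcpy is defined for any address:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t from an arbitrary byte address.
   The tempting alternative, *(const uint32_t *)p, is undefined behaviour
   when p is not suitably aligned for uint32_t; memcpy has no such
   requirement, and compilers typically turn it into a single load. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling load_u32(buf + 1) on a byte buffer is fine even though buf + 1 is unlikely to be 4-byte aligned.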
But they do not always do so: The SATD function from the SPEC benchmark
464.h264ref exercises undefined behaviour, and a pre-release version
of gcc-4.8 generated code that did not behave as intended. The
release version of gcc-4.8 compiled 464.h264ref as intended (but later
a similar case that was not in a SPEC program <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66875> was rejected as
RESOLVED INVALID).
When I brought this up, the reactions ranged from
flat-out denial that it ever happened (despite it being widely
publicized <https://lwn.net/Articles/544123/>) through a claim that
the "optimization" turned out to have no benefit (and yet the similar
case mentioned above still was "optimized" in a later gcc version) to
a statement along the lines that 464.h264ref is a relevant benchmark.
The last reaction seems to be the most plausible to me. The people
working on the optimizers tend to evaluate their performance on a
number of benchmarks, i.e., "relevant benchmarks", and of course these benchmarks must be compiled as intended, so that's what happens. My
guess is that "relevant benchmarks" are industry standard benchmarks
like SPEC, but also programs coming from paying customers.
They also have their test suites of programs for regression testing,
and any behavioural change in these programs that is visible in this regression testing probably leads to applying the optimization in a
less aggressive way.
How do tests get added into the regression test suite? Ideally, if
somebody reports a case where a program behaves in one way in an
earlier version of the same compiler and differently in a later
version, that program and its original behaviour should usually be
added to the test suite <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>, but in gcc
this does not happen (see the bug reports linked to above).
Apparently gcc has some other criteria for adding programs to the test
suite.
So, is C still usable when you do not maintain one of those programs
that are considered to be relevant by C compiler maintainers? My
experience is that the amount of breakage for the code I maintain has
been almost non-existent in the last 15 years. A big part of that is
that we use lots of flags to tell the compiler that certain behaviour
is defined even if the C standard does not define it.
Currently we
try the following flags with the versions of gcc or clang that support
them:
-fno-gcse -fcaller-saves -fno-defer-pop -fno-inline -fwrapv
-fchar-unsigned -fno-strict-aliasing -fno-cse-follow-jumps
-fno-reorder-blocks -fno-reorder-blocks-and-partition
-fno-toplevel-reorder -fno-trigraphs -falign-labels=1 -falign-loops=1
-falign-jumps=1 -fno-delete-null-pointer-checks -fcf-protection=none
-fno-tree-vectorize -mllvm=--tail-dup-indirect-size=0
Some of these flags just disable certain transformations; in those
cases there is no flag for defining the language in the way that our
program relies on, but only the optimization transforms it in a way
that is contrary to our intentions. In other cases, in particular -fno-tree-vectorize, using the flag just avoids slowdowns from the "optimization".
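To make the "flag defines the language" case concrete, here is a minimal sketch around -fwrapv, one of the flags listed; the function names are hypothetical. Signed overflow is undefined in standard C, so a compiler may simplify the first test to "always false". With -fwrapv, gcc and clang treat signed arithmetic as wrapping, and the test means what it appears to mean. The second function gives the same answer without relying on any flag.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on wraparound. For x == INT_MAX the addition overflows, which is
   undefined in standard C; only with -fwrapv is the result guaranteed to
   wrap to INT_MIN and make the comparison true. */
bool next_overflows_wrapv(int x)
{
    return x + 1 < x;
}

/* Flag-independent equivalent: no overflow is ever performed. */
bool next_overflows_portable(int x)
{
    return x == INT_MAX;
}
```

For every x other than INT_MAX both functions are well defined and return false; they can differ only in the one case the standard leaves undefined.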
Another big part of the lack of breakage experience is probably the
code in the regression tests of the compiler, whatever criteria
are used for including this code. I.e., our code rides in the
slipstream of this code.
On the other hand, there's the python approach in which they deprecate and
remove little used and crufty features, but old python code doesn't work any
more unless you go back and update it every year or two. -John]
Is it so bad with Python? From what I read, after the huge problems
that Python had with migrating the existing code base from Python2 to
Python3 (where Python3 was intentionally not backwards compatible with Python2), they had decided not to make backwards-incompatible changes
to the language in the future.
On 21/08/2025 21:53, Keith Thompson wrote:[...]
If you declare and call a function "foo" that is written in fully
portable C code, but not part of the current translation unit being
compiled (perhaps it has been separately compiled or included in a
library), then it would be UB by the section 4 definition (since the C standards don't say anything about what "foo" does, nor does your code).
But the code that calls "foo" is portable and not erroneous, so it is
not UB by the section 3 definition.
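The situation being described might look like the following sketch. The name "foo" is from the post; the helper twice_foo and the body given to foo are made up for illustration. The definition is placed in the same listing only so that it builds as one file; in the scenario under discussion it would live in a separately compiled translation unit or library.

```c
/* caller.c: all that this translation unit knows about foo is its
   declaration. Nothing here, and nothing in the C standard, says what
   foo does. */
int foo(int x);

int twice_foo(int x)
{
    return 2 * foo(x);
}

/* lib.c (hypothetical): a definition the caller never sees at
   translation time. */
int foo(int x)
{
    return x + 1;
}
```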
Add to that, the C standard has a specific term for features that are non-portable but not undefined behaviour - "implementation-defined behaviour". Code that relies on "int" being 32-bit is not portable, but
it is not UB when compiled on implementations for which "int" /is/ 32-bit.
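One common way to make such a dependence explicit is a compile-time check. This is a sketch assuming C11 or later, with a made-up function name: the code below needs an int that can hold 86400, which a 16-bit int cannot, and the assertion turns a silent portability problem into a build failure.

```c
#include <assert.h>
#include <limits.h>

/* The width of int is implementation-defined. Refuse to compile on
   implementations where it is not the 32-bit range this code assumes. */
static_assert(INT_MAX == 2147483647, "this code assumes a 32-bit int");

/* 24 * 60 * 60 is evaluated in int; with a 16-bit int the multiplication
   would overflow, so the assertion above guards it. */
int seconds_per_day(void)
{
    return 24 * 60 * 60;
}
```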
No, a bug in your code is not necessarily undefined behavior. It could
easily be code whose behavior is well defined by the language standard,
but that behavior isn't what the programmer intended.
When I write code, /I/ define what the behaviour of the code should be.
A bug in the code means it is not acting according to my definitions -
it is UB. It may still be acting according to the definitions of the C abstract machine given in the C standards (you are correct there). Even
if it has C-standard UB, it will still be acting according to the
definitions of the target machine's instruction set. Behaviour is
defined on multiple levels, only one of which is the C standard.
On 20/08/2025 14:06, John wrote:
When a language is 50 years old and there is a mountain of legacy code that
they really don't want to break, it accumulates a lot of cruft. If we were
starting now we'd get something more like Go.
On the other hand, there's the python approach in which they deprecate and
remove little used and crufty features, but old python code doesn't work any
[...]
On 23/08/2025 00:11, Keith Thompson wrote:
...
David Brown <david.brown@hesbynett.no> writes:
On 21/08/2025 21:53, Keith Thompson wrote:[...]
If you declare and call a function "foo" that is written in fully
portable C code, but not part of the current translation unit being
compiled (perhaps it has been separately compiled or included in a
library), then it would be UB by the section 4 definition (since the C
standards don't say anything about what "foo" does, nor does your code).
...
The C standard does not define how this linking or combining is done - it
only covers certain specific aspects of the linking that relate directly
to C. The behaviour of the function "foo" here is not defined in the C
standards, and if the source code is not available when translating a
different translation unit, the behaviour of "foo" is undefined.
I remember having an immensely frustrating discussion on this issue a
couple of decades ago.
On 2025-08-25 22:13, James Kuyper wrote:
...
I remember having an immensely frustrating discussion on this issue
a couple of decades ago.
The discussion was on comp.std.c, the Subject: was "clrsc and UB", and
my participation in the discussion started 2002-02-05.
[Yeah, it's not like this is a new topic. -John]
On Tue, 26 Aug 2025 13:41:14 -0400...
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
The discussion was on comp.std.c, the Subject: was "clrsc and UB", and
my participation in the discussion started 2002-02-05.
[Yeah, it's not like this is a new topic. -John]
Don't you mean "clrscr and UB" ?