Hello,
While I consider myself a reasonably good C programmer, I still
have difficulties understanding undefined behavior.
I wonder if anyone in this NG could help me.
Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
So let's focus on https://godbolt.org/z/48bn19Tsb
For the lazy, I report it here:
int table[4] = {0};

int exists_in_table(int v)
{
    // return true in one of the first 4 iterations
    // or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
This is compiled (with no warning whatsoever) into:
exists_in_table:
        mov     eax, 1
        ret
table:
        .zero   16
Well, this is *obviously* wrong. And sure, so is the original code,
but I find it hard to think that the compiler isn't able to notice it,
given that it is even "exploiting" it to produce very efficient code.
I understand the formalism: the resulting assembly is formally
"correct", in that UB implies that anything can happen.
Yet I can't think of any situation where the resulting assembly
could be considered sensible. The compiled function will
basically return 1 for any input, and the final program will be
buggy.
Wouldn't it be more sensible to have a compilation error, or
at least a warning? The compiler will be happy even with -Wall
-Wextra -Werror.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
I mean, yes I would find the problem, thanks to my 100% coverage
unit testing, but couldn't the compiler give me a hint?
Could someone walk me through this reasoning? I know there is a lot of
thinking behind it, yet everything seems very incorrect to me!
I'm in deep cognitive dissonance here! :) Help!
On Thu, 1 Jan 2026 22:54:05 +0100
highcrew <high.crew3868@fastmail.com> wrote:
For the lazy, I report it here:[...]
int table[4] = {0};

int exists_in_table(int v)
{
    // return true in one of the first 4 iterations
    // or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
This is compiled (with no warning whatsoever) into:
exists_in_table:
        mov     eax, 1
        ret
table:
        .zero   16
IMHO, for a compiler that eliminated all comparisons (I assume that it
was gcc -O2/-O3), the absence of a warning is a bug.
It's worth reporting.
highcrew <high.crew3868@fastmail.com> wrote:
You do not get the formalism: the compiler applies a lot of
transformations which are supposed to be correct for programs obeying
the C rules. However, the compiler does not understand the program.
It may notice details that you missed, but it acts essentially
blindly on the information it has. And most transformations have only
limited info (storing everything that the compiler infers would take
a lot of memory, and searching all that info would take a lot of time).
The code that you see is the result of many transformations, possibly
hundreds or more. The result is a consequence of all the steps,
but it could be hard to isolate a single "silly" step.
[...]
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
By using C you implicitly gave "yes" as an answer.
However, it /does/ make sense to ask whether the compiler could have
been more helpful in pointing out your mistake - and clearly, in theory
at least, the answer is yes.
[...]
I had a little look in the gcc bugzilla, but could not find any issue
that directly matches this case. So I think it is worthwhile if you
file it in as a gcc bug. (It is not technically a "bug", but it is
definitely an "opportunity for improvement".) If the gcc passes make
it hard to implement as a normal warning, it may still be possible to
add it to the "-fanalyzer" passes.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
You are asking if you want the generated code to be efficiently wrong or inefficiently wrong?
Sometimes that means things can appear simple and obvious to the user,
but would require unwarranted effort to implement in the compiler.
You can also make use of run-time sanitizers that are ideal for catching this sort of thing (albeit with an inevitable speed overhead).
<https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>
Wouldn't it be more sensible to have a compilation error, or
at least a warning?
On 1/2/26 6:53 AM, Waldek Hebisch wrote:
highcrew <high.crew3868@fastmail.com> wrote:
You do not get the formalism: the compiler applies a lot of
transformations which are supposed to be correct for programs obeying
the C rules. However, the compiler does not understand the program.
It may notice details that you missed, but it acts essentially
blindly on the information it has. And most transformations have only
limited info (storing everything that the compiler infers would take
a lot of memory, and searching all that info would take a lot of time).
The code that you see is the result of many transformations, possibly
hundreds or more. The result is a consequence of all the steps,
but it could be hard to isolate a single "silly" step.
[...]
Thanks for your answer.
So you are basically saying that spotting such a problem is
way more difficult than optimizing it? And indeed so difficult that the compiler fails at it?
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
By using C you implicitly gave "yes" as an answer.
Wait, I don't think that makes sense.
If we are talking about a legitimate limitation of the compilers, as you
seem to suggest, then it is a different situation.
Perhaps it would be more proper to say that, by using C, one implicitly
accepts the burden of writing UB-free code.
The compiler can't guarantee that it will detect UB, so the contract
is: you get a correct program if you write correct code.
On 1/2/26 10:31 AM, David Brown wrote:
However, it /does/ make sense to ask whether the compiler could have
been more helpful in pointing out your mistake - and clearly, in
theory at least, the answer is yes.
[...]
Thanks for the clarification.
Yes, I'm wondering exactly whether the compiler shouldn't reject that
code to begin with. I'm not expecting to enter wrong code and get a
working program. That would be dark magic.
So you are basically confirming what I inferred from Waldek Hebisch's
answer: it is actually quite hard for the compiler to spot it. So we
live with it.
I had a little look in the gcc bugzilla, but could not find any issue
that directly matches this case. So I think it is worthwhile if you
file it in as a gcc bug. (It is not technically a "bug", but it is
definitely an "opportunity for improvement".) If the gcc passes make
it hard to implement as a normal warning, it may still be possible to
add it to the "-fanalyzer" passes.
Erm... I will consider filing an "opportunity for improvement"
ticket then, thank you.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
You are asking if you want the generated code to be efficiently wrong
or inefficiently wrong?
I was asking whether it is reasonable to accept as valid a program
which is wrong, and then to optimize its wrong behavior.
What I could not grasp is the difficulty of the job.
To quote your own words:
"Sometimes that means things can appear simple and obvious to the user,
but would require unwarranted effort to implement in the compiler."
You can also make use of run-time sanitizers that are ideal for
catching this sort of thing (albeit with an inevitable speed overhead).
<https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>
Yes, I'm aware of these instruments, but I'm not very knowledgeable
about them. I'd like to learn more, and I'll need to spend time doing so.
Thanks!
Yes, I'm aware of these instruments, but I'm not very knowledgeable
about them. I'd like to learn more, and I'll need to spend time doing so.
The tools here can be useful. Of course it is best when you can find
bugs earlier, at the static analysis stage (I am a big fan of lots of
compiler warnings), but the "-fsanitize" options are the next step for
a lot of development. They are of limited value in my own work (small
embedded systems - there's often no console for log messages, and much
less possibility of "hardware accelerated" error detection such as
creative use of a processor's MMU), but for PC programming they can be
a great help.
Thanks!
It's been a good thread - on-topic, interesting discussion, people have
got a better understanding of a few things, there's an opportunity to
contribute to better C development tools, and no flames. I look forward
to your next question!
On 1/3/26 1:42 PM, David Brown wrote:
Yes, I'm aware of these instruments, but I'm not very knowledgeable
about them. I'd like to learn more, and I'll need to spend time doing so.
The tools here can be useful. Of course it is best when you can find
bugs earlier, at the static analysis stage (I am a big fan of lots of
compiler warnings), but the "-fsanitize" options are the next step for
a lot of development. They are of limited value in my own work (small
embedded systems - there's often no console for log messages, and much
less possibility of "hardware accelerated" error detection such as
creative use of a processor's MMU), but for PC programming they can be
a great help.
Agreed.
I happen to work with embedded systems as well, and while I came late
to the party (all the possible checks were already put in place by
colleagues who came before me - they took the fun part!), I can see
the value of sanitizers even if the code will later run on embedded
systems.
That's why I say I'd like to learn more: I'm merely a user of them.
Following these thoughts, I started to wonder: the code I reported at
the beginning of the thread, built with -O2, effectively copes with
UB by replacing the function with the equivalent of `return 1`.
What if I build it with -O2 and -fsanitize=address?
Will the instrumentation be able to catch it, given that there's
nothing inherently bad about a `return 1` (minus the fact that it's
not what the developer intended)?
Well, what do you know? -fsanitize=address seems to interfere with
optimizations, at least on my system. Link it, run it, and I get a
nice segfault.
Now the circle is closed!
On 2026-01-01, Michael S <already5chosen@yahoo.com> wrote:
On Thu, 1 Jan 2026 22:54:05 +0100
highcrew <high.crew3868@fastmail.com> wrote:
[...]
IMHO, for a compiler that eliminated all comparisons (I assume that
it was gcc -O2/-O3), the absence of a warning is a bug.
A bug against which requirement, articulated where?
And it has nothing to do with the C standard and what is considered
UB by the standard and what is not.
It has everything to do with it, unfortunately. It literally has
nothing to do with anything else, in fact.
That function either finds a match in the four array elements and
returns 1, or else its behavior is undefined.
Therefore there is no situation under which it is /required/ to return anything other than 1.
You literally cannot write a test case which tests for the "return 0",
such that the test case has well-defined behavior.
All well-defined test cases can only test for 1 being returned.
And that is satisfied by machine code which unconditionally returns 1.
There is no requirement anywhere that this function elicit a
diagnostic; not in ISO C and not in any GCC documentation.
Therefore your bug report would have to be not about the compiler
behavior but about the lack of the requirement.
This is a difficult problem: writing the requirement /in a good way/
that covers many cases is not easy, and that's before you implement
anything in the compiler.
On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
For the situation in your program, it would be unacceptable to have implementations stop translating.
We really want just a warning (at least by default; in specific
projects and situations, developers could elect to treat certain
warnings as fatal, even standard-required warnings.)
The second new thing is that to diagnose this, we need to make
diagnosis dependent on reachability.
We want a rule which is something like "whenever the body of
a function, or an initializing expression for an external definition
reaches an expression which has unconditional undefined behavior
that is not an unreachability assertion and not a documented
extension, a warning diagnostic must be issued".
This kind of diagnostic would be a good thing in my opinion; just
nobody has stepped up to the plate because of the challenges:
- introducing the concept of a warning versus error diagnostic.
- defining a clear set of rules for trivial reachability which
can catch the majority of these situations without too much
complexity. (The C++ rules for functions that return a value
reaching their end without a return statement can be used
as inspiration here.)
- specifying exactly what "statically obvious" undefined behavior
is and how to positively determine that a certain expression
exhibits it.
Well, this is *obviously* wrong.
In your case undefined behavior happens when `i` reaches 4. Hence the
compiler is free to assume that `i` is guaranteed to never reach 4.
This means that the `if` condition is guaranteed to become true at
some lower value of `i` (i.e. the compiler is free to assume that the
calling code made a promise to never pass a `v` that is not present
in `table`). This immediately means that the function will always
return 1.
[Be careful snipping attributions. Make sure you have enough left for
all levels of quotation. The following paragraph was written by you,
not by me.]
I have a horrible question now, but that's for a
separate question...
On 1/4/26 12:15 AM, highcrew wrote:
I have a horrible question now, but that's for a
separate question...
And the question is:
Embedded systems. Address 0x00000000 is mapped to the flash.
I want to assign a pointer to 0x00000000 and dereference it to
read the first word.
That's UB.
How do I?
On 1/4/26 12:15 AM, highcrew wrote:
I have a horrible question now, but that's for a
separate question...
And the question is:
Embedded systems. Address 0x00000000 is mapped to the flash.
I want to assign a pointer to 0x00000000 and dereference it to
read the first word.
That's UB.
How do I?
The compiler on that embedded system is, of course, aware of the
fact that address 0x00000000 is perfectly valid and should be left accessible. So, for that reason, the compiler is supposed to choose
some other physical representation for null pointers ...
On Sat, 3 Jan 2026 17:24:54 -0800, Andrey Tarasevich wrote:
The compiler on that embedded system is, of course, aware of the
fact that address 0x00000000 is perfectly valid and should be left
accessible. So, for that reason, the compiler is supposed to choose
some other physical representation for null pointers ...
What if the entire machine address space is valid? Are C pointer types
supposed to add an extra "invalid" value on top of that?
On Sat 1/3/2026 3:25 PM, highcrew wrote:
On 1/4/26 12:15 AM, highcrew wrote:
I have a horrible question now, but that's for a
separate question...
And the question is:
Embedded systems. Address 0x00000000 is mapped to the flash.
I want to assign a pointer to 0x00000000 and dereference it to
read the first word.
That's UB.
How do I?
Well, the first question would be: what is the physical null pointer representation in that C implementation on that embedded system?
...
On 2026-01-03 21:19, Lawrence D'Oliveiro wrote:
What if the entire machine address space is valid? Are C pointer
types supposed to add an extra "invalid" value on top of that?
Either that, or set aside one piece of addressable memory that is
not available to user code. Note, in particular, that it might be a
piece of memory used by the implementation of C, or by the operating
system. In which case, the undefined behavior that can occur as a
result of dereferencing a null pointer would take the form of messing
up the C runtime or the operating system.
This particular example is explained in several places, e.g.:
https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633
Perhaps a slightly better explanation of the same example:
https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a
- Paul
On 1/2/26 11:52 PM, Kaz Kylheku wrote:
On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
For the situation in your program, it would be unacceptable to have
implementations stop translating.
I can somehow get the idea that it is difficult for the compiler
to spot the issue, but why do you think it would be unacceptable
to stop translating?
We really want just a warning (at
least by default; in specific project and situations, developers
could elect to treat certain warnings as fatal, even standard-required
warnings.)
Even a warning would be enough though. Btw, my typical way of
working is to enable -Werror while developing, but I don't like
to force it in general. That would be an interesting digression,
but definitely OT.
The second new thing is that to diagnose this, we need to make
diagnosis dependent on reachability.
We want a rule which is something like "whenever the body of
a function, or an initializing expression for an external definition
reaches an expression which has unconditional undefined behavior
that is not an unreachability assertion and not a documented
extension, a warning diagnostic must be issued".
That's an interesting perspective: reachability.
Would you say that the incriminated piece of code is UB only if it
is reachable in the final program, and that it is therefore acceptable
to keep it as long as it is unreachable?
Now that I think of it, the __builtin_unreachable() implemented
by popular compilers is technically UB *if reached* :)
This kind of diagnostic would be a good thing in my opinion; just
nobody has stepped up to the plate because of the challenges:
- introducing the concept of a warning versus error diagnostic.
- defining a clear set of rules for trivial reachability which
  can catch the majority of these situations without too much
  complexity. (The C++ rules for functions that return a value
  reaching their end without a return statement can be used
  as inspiration here.)
- specifying exactly what "statically obvious" undefined behavior
  is and how to positively determine that a certain expression
  exhibits it.
Now I'm wondering how much work it requires to properly define
the rules that the standard mandates!
For me the main take-away is that the detection of certain UB
is non-trivial; it would be very evil if the standard mandated
some nearly-impossible task for the compiler!
(The C++ rules for functions that return a value
reaching their end without a return statement can be used
as inspiration here.)
C++ does *what*?? I'm definitely not up to speed with C++, but
I have totally missed that. Could you please tell me the name
of this bizarre feature? I *need* to look it up :D
On 2026-01-03 18:25, highcrew wrote:
On 1/4/26 12:15 AM, highcrew wrote:
I have a horrible question now, but that's for a
separate question...
And the question is:
Embedded systems. Address 0x00000000 is mapped to the flash.
I want to assign a pointer to 0x00000000 and dereference it to
read the first word.
That's UB.
Actually, that's not necessarily true. A null pointer is not required
to refer to the location with an address of 0. An integer constant
expression with a value of 0, converted to a pointer type, is
guaranteed to be a null pointer, but that pointer need not have a
representation that has all bits 0. However, an integer expression
that is not a constant expression, if converted to a pointer type, is
not required to be a null pointer - it could convert to an entirely
different pointer value.
So an implementation could allow it simply by reserving a pointer to
some other location (such as the last position in memory) as the
representation of a null pointer.
How do I?
Even on an implementation that uses a pointer representing a machine
address of 0 as a null pointer, such code can still work. In the C
standard, "undefined behavior" means that the C standard imposes no
requirements on the behavior. That doesn't prohibit other sources
from imposing requirements. On such a system, the implementation
could define the behavior as accessing the flash.
On 1/4/26 12:15 AM, highcrew wrote:
I have a horrible question now, but that's for a
separate question...
And the question is:
Embedded systems. Address 0x00000000 is mapped to the flash.
I want to assign a pointer to 0x00000000 and dereference it to
read the first word.
That's UB.
How do I?
Now I guess that an embedded compiler targeting that certain
architecture where dereferencing 0 makes sense will not treat
it as UB. But it is for sure a weird corner case.
On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:
On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:
What if the entire machine address space is valid? Are C pointer
types supposed to add an extra “invalid” value on top of that?
Either that, or set aside one piece of addressable memory that is
not available to user code. Note, in particular, that it might be a
piece of memory used by the implementation of C, or by the operating
system. In which case, the undefined behavior that can occur as a
result of dereferencing a null pointer would take the form of messing
up the C runtime or the operating system.
“Undefined behaviour” could also include “performing a valid memory access”, could it not.
On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:
On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:
What if the entire machine address space is valid? Are C pointer
types supposed to add an extra “invalid” value on top of that?
Either that, or set aside one piece of addressable memory that is
not available to user code. Note, in particular, that it might be
a piece of memory used by the implementation of C, or by the
operating system. In which case, the undefined behavior that can
occur as a result of dereferencing a null pointer would take the
form of messing up the C runtime or the operating system.
“Undefined behaviour” could also include “performing a valid memory
access”, could it not.
Of course. In fact, the single most dangerous thing that can occur
when code with undefined behavior is executed is that it does
exactly what you incorrectly believe it is required to do. As a
result, you fail to be warned of the error in your beliefs.
Not differently from halting problem: sure, it is theoretically
impossible to understand if a program will terminate, but in
practical terms, if you expect it to take less than 1 second and it
takes more than 10, you are already hitting ^C and conjecturing
that something went horribly wrong :D
On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:...
On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
“Undefined behaviour” could also include “performing a valid memory access”, could it not.
Of course. In fact, the single most dangerous thing that can occur
when code with undefined behavior is executed is that it does
exactly what you incorrectly believe it is required to do. As a
result, you fail to be warned of the error in your beliefs.
In this case, it’s not clear what choice you have.
Not differently from halting problem: sure, it is theoretically
impossible to understand if a program will terminate,
On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:
...
On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:
On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
“Undefined behaviour” could also include “performing a valid
memory access”, could it not.
Of course. In fact, the single most dangerous thing that can occur
when code with undefined behavior is executed is that it does
exactly what you incorrectly believe it is required to do. As a
result, you fail to be warned of the error in your beliefs.
In this case, it’s not clear what choice you have.
I may have lost the thread here - which choice are you talking
about?
On 2026-01-04 08:38, highcrew wrote:
...
Not differently from halting problem: sure, it is theoretically
impossible to understand if a program will terminate,
That's an incorrect characterization of the halting problem. There are
many programs where it's entirely feasible, and even easy, to determine
whether they will halt. What has been proven is that there must be some
programs for which it cannot be done.
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
On 2026-01-03 18:25, highcrew wrote:
On 1/4/26 12:15 AM, highcrew wrote:
I have a horrible question now, but that's for a
separate question...
And the question is:
Embedded systems. Address 0x00000000 is mapped to the flash.
I want to assign a pointer to 0x00000000 and dereference it to
read the first word.
That's UB.
Actually, that's not necessarily true. A null pointer is not required to
refer to the location with an address of 0. An integer constant
expression with a value of 0, converted to a pointer type, is guaranteed
to be a null pointer, but that pointer need not have a representation
that has all bits 0. However, an integer expression that is not a
constant expression, if converted to a pointer type, is not required to
be a null pointer - it could convert to an entirely different pointer value.
So an implementation could allow it simply by reserving a pointer to
some other location (such as the last position in memory) as the
representation of a null pointer.
How do I?
Even on an implementation that uses a pointer representing a machine
address of 0 as a null pointer, such code can still work. In the C
standard, "undefined behavior" means that the C standard imposes no
requirements on the behavior. That doesn't prohibit other sources from
imposing requirements. On such a system, it could define the behavior as
accessing the flash.
Indeed, every C compiler I've ever used has simply dereferenced a
pointer that has a value of zero. In user mode, the kernel will
generally trap and generate a SIGSEGV or equivalent. In kernel
mode, it will just work, assuming that the CPU is configured to
run with MMU disabled (or the MMU has a valid mapping for virtual
address zero).
On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:
On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:
What if the entire machine address space is valid? Are C pointer
types supposed to add an extra “invalid” value on top of that?
Either that, or set aside one piece of addressable memory that is
not available to user code. Note, in particular, that it might be a
piece of memory used by the implementation of C, or by the operating
system. In which case, the undefined behavior that can occur as a
result of dereferencing a null pointer would take the form of messing
up the C runtime or the operating system.
“Undefined behaviour” could also include “performing a valid memory
access”, could it not.
Of course. In fact, the single most dangerous thing that can occur when
code with undefined behavior is executed is that it does exactly what
you incorrectly believe it is required to do. As a result, you fail to
be warned of the error in your beliefs.
On Sun, 4 Jan 2026 16:53:16 -0500, James Kuyper wrote:...
On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:
In this case, it’s not clear what choice you have.
I may have lost the thread here - which choice are you talking
about?
What if the entire machine address space is valid? Are C pointer types supposed to add an extra “invalid” value on top of that?
On 1/4/26 2:10 AM, Paul J. Lucas wrote:
This particular example is explained in several places, e.g.:
https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633
Perhaps a slightly better explanation of the same example:
https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a
- Paul
Hey, thanks for the pointers.
I found the second a really good write up!
On 04/01/2026 00:25, highcrew wrote:
On 1/4/26 12:15 AM, highcrew wrote:
I have a horrible question now, but that's for a
separate question...
And the question is:
Embedded systems. Address 0x00000000 is mapped to the flash.
I want to assign a pointer to 0x00000000 and dereference it to
read the first word.
That's UB.
How do I?
Now I guess that an embedded compiler targeting that certain
architecture where dereferencing 0 makes sense will not treat
it as UB. But it is for sure a weird corner case.
There are some common misconceptions about null pointers in C. A "null pointer" is the result of converting a "null pointer constant", or
another "null pointer", to a pointer type. A null pointer constant is either an integer constant expression with the value 0 (such as the
constant 0, or "1 - 1"), or "nullptr" in C23. You can use "NULL" from <stddef.h> as a null pointer constant.
So if you write "int * p = 0;", then "p" holds a null pointer. If you
write "int * p = (int *) sizeof(*p); p--;" then "p" does not hold a null pointer, even though it will hold the value "0".
On virtually all real-world systems, including all embedded systems I
have ever known (and that's quite a few), null pointers correspond to
the address 0. But that does not mean that dereferencing a pointer
whose value is 0 is necessarily UB.
And even when dereferencing a pointer /is/ UB, a compiler can handle it
as defined if it wants.
I think that if you have a microcontroller with code at address 0, and a pointer of some object type (say, "const uint8_t * p" or "const uint32_t
* p") holding the address 0, then using that to read the flash at that address is UB. But it is not UB because "p" holds a null pointer - it
may or may not be a null pointer. It is UB because "p" does not point
to an object.
In practice, I have never seen an embedded compiler fail to do the
expected thing when reading flash from address 0. (Typical use-cases
are for doing CRC checks or signature checks on code, or for reading the initial stack pointer value or reset vector of the code.) If you want
to be more confident, use a pointer to volatile type.
Putting a volatile qualifier on p gives working code, but apparently
disables optimization. Also, this looks fragile. So if I needed
to access address 0 I would probably use an assembly routine to do this.
On 04/01/2026 12:51, highcrew wrote:
On 1/4/26 2:10 AM, Paul J. Lucas wrote:
Perhaps a slightly better explanation of the same example:
https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-
f30844f20e2a
That one starts off with a bit of a jumble of misconceptions.
To start with, "undefined behaviour" does not exist because of
compatibility issues or the merging of different C variations into one standard C.
The C standard is simply somewhat unusual in that it is more explicit
about UB than many languages' documentation. And being a language
intended for maximally efficient code, C leaves a number of things as UB where other languages might throw an exception or have other error
handling.
Implementation-defined behaviour is /not/ "bad" - pretty much all
programs rely on implementation-defined behaviour such as the size of
"int", character sets used, etc. Relying on implementation-defined
behaviour reduces the portability of code, but that is not necessarily a
bad thing.
And while it is true that UB is "worse" than either implementation-
defined behaviour or unspecified behaviour, it is not for either of the
reasons given. The *nix program "date" does not need to contain UB in
order to produce different results at different times.
It also makes the mistake common in discussions of UB optimisations of
concluding that the optimisation makes the code "wrong". Optimisations,
such as the example of the "assign_not_null" function, are "logically
valid" and /correct/ from the given source code. Optimisations have not
made the code "wrong", nor has the compiler. The source code is correct
for a given validity subset of its parameter types, and the object code
is correct for that same subset. If the source code is intended to work
over a wider range of inputs, then it is the source code that is wrong -
not the optimiser or the optimised code.
On Sun, 4 Jan 2026 14:38:00 +0100, highcrew wrote:
Not differently from halting problem: sure, it is theoretically
impossible to understand if a program will terminate, but in
practical terms, if you expect it to take less than 1 second and it
takes more than 10, you area already hitting ^C and conjecturing
that something went horribly wrong :D
What do Windows users hit instead of CTRL/C? Because CTRL/C means
something different to them, doesn’t it?
On 1/5/26 6:39 AM, David Brown wrote:
On 04/01/2026 12:51, highcrew wrote:
On 1/4/26 2:10 AM, Paul J. Lucas wrote:
Perhaps a slightly better explanation of the same example:
https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-
f30844f20e2a
That one starts off with a bit of a jumble of misconceptions.
To start with, "undefined behaviour" does not exist because of
compatibility issues or the merging of different C variations into one
standard C.
...
The C standard is simply somewhat unusual in that it is more explicit
about UB than many languages' documentation. And being a language
intended for maximally efficient code, C leaves a number of things as
UB where other languages might throw an exception or have other error
handling.
Other languages had the luxury of doing that. As the article pointed
out, C had existed for over a decade before the standard and there were
many programs in the wild that relied on their existing behaviors. By
this time, the C standard could not retroactively "throw an exception or
have other error handling" since it would have broken those programs, so
it _had_ to leave many things as UB explicitly. Hence, the article
isn't wrong.
Implementation-defined behaviour is /not/ "bad" - pretty much all
programs rely on implementation-defined behaviour such as the size of
"int", character sets used, etc. Relying on implementation-defined
behaviour reduces the portability of code, but that is not necessarily a
bad thing.
It's "bad" if a naive programmer isn't aware it's implementation defined
and just assumes it's defined however it's defined on his machine.
And while it is true that UB is "worse" than either implementation-
defined behaviour or unspecified behaviour, it is not for either of
the reasons given. The *nix program "date" does not need to contain
UB in order to produce different results at different times.
Sure, but the article didn't mean such cases.
It meant for cases like
incrementing a signed integer past INT_MAX. A program could
legitimately give different answers for the same line of code at
different times.
It also makes the mistake common in discussions of UB optimisations of
concluding that the optimisation makes the code "wrong".
Optimisations, such as the example of the "assign_not_null" function,
are "logically valid" and /correct/ from the given source code.
Optimisations have not made the code "wrong", nor has the compiler.
The source code is correct for a given validity subset of its
parameter types, and the object code is correct for that same subset.
If the source code is intended to work over a wider range of inputs,
then it is the source code that is wrong - not the optimiser or the
optimised code.
What the author meant is that optimization can make UB manifest in
more bizarre ways than not optimizing would. Code that contains UB
is always wrong.
I get the following assembly:
00000000 <read_at0>:
0: b108 cbz r0, 6 <read_at0+0x6>
2: 2000 movs r0, #0
4: 4770 bx lr
6: 6803 ldr r3, [r0, #0]
8: deff udf #255 @ 0xff
a: bf00 nop
So the compiler generates an actual access, but then, instead of
returning the value, it executes an undefined opcode. Without the test
for a null pointer I get a simple access to memory.
When it comes to invalid (or missing, in C++) `return` statements,
GCC tends to adhere to a "punitive" approach in optimized code - it
injects instructions to deliberately cause a crash/segfault in such
cases.
Clang on the other hand tends to stick to the uniform approach based
on the "UB cannot happen" methodology, i.e. your code sample would
be translated under "p is never null" assumption, and the function
will fold into a simple unconditional `return 0`.
Hello,
While I consider myself reasonably good as C programmer, I still
have difficulties in understanding undefined behavior.
I wonder if anyone in this NG could help me.
Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
So let's focus on https://godbolt.org/z/48bn19Tsb
For the lazy, I report it here:
int table[4] = {0};
int exists_in_table(int v)
{
// return true in one of the first 4 iterations
// or UB due to out-of-bounds access
for (int i = 0; i <= 4; i++) {
if (table[i] == v) return 1;
}
return 0;
}
This is compiled (with no warning whatsoever) into:
exists_in_table:
mov eax, 1
ret
table:
.zero 16
Well, this is *obviously* wrong. And sure, so is the original code,
but I find it hard to think that the compiler isn't able to notice it,
given that it is even "exploiting" it to produce very efficient code.
I understand the formalism: the resulting assembly is formally
"correct", in that UB implies that anything can happen.
Yet I can't think of any situation where the resulting assembly
could be considered sensible. The compiled function will
basically return 1 for any input, and the final program will be
buggy.
Wouldn't it be more sensible to have a compilation error, or
at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
I mean, yes I would find the problem, thanks to my 100% coverage
unit testing, but couldn't the compiler give me a hint?
Could someone drive me into this reasoning? I know there is a lot of thinking behind it, yet everything seems to me very incorrect!
I'm in deep cognitive dissonance here! :) Help!
highcrew <high.crew3868@fastmail.com> writes:
Hello,
While I consider myself reasonably good as C programmer, I still
have difficulties in understanding undefined behavior.
I wonder if anyone in this NG could help me.
Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
So let's focus on https://godbolt.org/z/48bn19Tsb
For the lazy, I report it here:
int table[4] = {0};
int exists_in_table(int v)
{
// return true in one of the first 4 iterations
// or UB due to out-of-bounds access
for (int i = 0; i <= 4; i++) {
if (table[i] == v) return 1;
}
return 0;
}
This is compiled (with no warning whatsoever) into:
exists_in_table:
mov eax, 1
ret
table:
.zero 16
Well, this is *obviously* wrong. And sure, so is the original code,
but I find it hard to think that the compiler isn't able to notice
it, given that it is even "exploiting" it to produce very efficient
code.
I understand the formalism: the resulting assembly is formally
"correct", in that UB implies that anything can happen.
Yet I can't think of any situation where the resulting assembly
could be considered sensible. The compiled function will
basically return 1 for any input, and the final program will be
buggy.
Wouldn't it be more sensible to have a compilation error, or
at least a warning? The compiler will be happy even with -Wall
-Wextra -Werror.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
I mean, yes I would find the problem, thanks to my 100% coverage
unit testing, but couldn't the compiler give me a hint?
Could someone drive me into this reasoning? I know there is a lot
of thinking behind it, yet everything seems to me very incorrect!
I'm in deep cognitive dissonance here! :) Help!
The important thing to realize is that the fundamental issue here
is not a technical question but a social question. In effect what
you are asking is "why doesn't gcc (or clang, or whatever) do what
I want or expect?". The answer is different people want or expect
different things. For some people the behavior described is
egregiously wrong and must be corrected immediately. For other
people the compiler is acting just as they think it should,
nothing to see here, just fix the code and move on to the next
bug. Different people have different priorities.
After observing that, I think the right question is something like
"Given that compilers act in these surprising ways, how should I
protect my code so that it doesn't fall prey to the death-by-UB
syndrome, or what can I do to diagnose a possibly death-by-UB
situation when a strange bug crops up?" I don't pretend to have
good answers to these questions. The best advice I can give
(besides seeking help from others with more experience) is to be
persistent, and to realize that the skills needed for combating a
death-by-UB syndrome are rather different from the skills needed
for regular programming. I have been in the situation of being
made responsible for finding and correcting a death-by-UB kind of
symptom, and what's worse in programming environment where I
didn't have a great deal of familiarity or experience. Despite
those drawbacks the bug got diagnosed and fixed, and I attribute
that result mostly to tenacity and by being willing to consider
unusual or unfamiliar points of view.
On Thu, 1 Jan 2026 22:54:05 +0100
highcrew <high.crew3868@fastmail.com> wrote:
Hello,
While I consider myself reasonably good as C programmer, I still
have difficulties in understanding undefined behavior.
I wonder if anyone in this NG could help me.
Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
So let's focus on https://godbolt.org/z/48bn19Tsb
For the lazy, I report it here:
  int table[4] = {0};
  int exists_in_table(int v)
  {
      // return true in one of the first 4 iterations
      // or UB due to out-of-bounds access
      for (int i = 0; i <= 4; i++) {
          if (table[i] == v) return 1;
      }
      return 0;
  }
This is compiled (with no warning whatsoever) into:
  exists_in_table:
          mov     eax, 1
          ret
  table:
          .zero   16
It is UB; what the implementation does is irrelevant.
Well, this is *obviously* wrong. And sure, so is the original code,
but I find it hard to think that the compiler isn't able to notice it, given that it is even "exploiting" it to produce very efficient code.
I understand the formalism: the resulting assembly is formally
"correct", in that UB implies that anything can happen.
Yet I can't think of any situation where the resulting assembly
could be considered sensible. The compiled function will
basically return 1 for any input, and the final program will be
buggy.
Wouldn't it be more sensible to have a compilation error, or
at least a warning? The compiler will be happy even with -Wall
-Wextra -Werror.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
I mean, yes I would find the problem, thanks to my 100% coverage
unit testing, but couldn't the compiler give me a hint?
Could someone drive me into this reasoning? I know there is a lot of thinking behind it, yet everything seems to me very incorrect!
I'm in deep cognitive dissonance here! :) Help!
Personally, I am not shocked by gcc behavior in this case. May be,
saddened, but not shocked.
I am shocked by slightly modified variant of it.
struct {
  int table[4];
  int other_table[4];
} bar;
int exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (bar.table[i] == v)
            return 1;
    }
    return 0;
}
An original variant is unlikely to be present in the code bases that I
care about professionally. But something akin to modified variant could
be present.
Godbolt shows that this behaviour was first introduced in gcc5. It was backported to gcc4 series in gcc 4.8
One of my suspect code bases currently at gcc 4.7. I was considering
moving to 5.3. In light of that example, I am likely not going to
do it.
Unless there is a magic flag that disables this optimization.
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
highcrew <high.crew3868@fastmail.com> writes:
Hello,
While I consider myself reasonably good as C programmer, I still
have difficulties in understanding undefined behavior.
I wonder if anyone in this NG could help me.
Let's take an example. There's plenty here:
https://en.cppreference.com/w/c/language/behavior.html
So let's focus on https://godbolt.org/z/48bn19Tsb
For the lazy, I report it here:
int table[4] = {0};
int exists_in_table(int v)
{
// return true in one of the first 4 iterations
// or UB due to out-of-bounds access
for (int i = 0; i <= 4; i++) {
if (table[i] == v) return 1;
}
return 0;
}
This is compiled (with no warning whatsoever) into:
exists_in_table:
mov eax, 1
ret
table:
.zero 16
Well, this is *obviously* wrong. And sure, so is the original code,
but I find it hard to think that the compiler isn't able to notice
it, given that it is even "exploiting" it to produce very efficient
code.
I understand the formalism: the resulting assembly is formally
"correct", in that UB implies that anything can happen.
Yet I can't think of any situation where the resulting assembly
could be considered sensible. The compiled function will
basically return 1 for any input, and the final program will be
buggy.
Wouldn't it be more sensible to have a compilation error, or
at least a warning? The compiler will be happy even with -Wall
-Wextra -Werror.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
I mean, yes I would find the problem, thanks to my 100% coverage
unit testing, but couldn't the compiler give me a hint?
Could someone drive me into this reasoning? I know there is a lot
of thinking behind it, yet everything seems to me very incorrect!
I'm in deep cognitive dissonance here! :) Help!
The important thing to realize is that the fundamental issue here
is not a technical question but a social question. In effect what
you are asking is "why doesn't gcc (or clang, or whatever) do what
I want or expect?". The answer is different people want or expect
different things. For some people the behavior described is
egregiously wrong and must be corrected immediately. For other
people the compiler is acting just as they think it should,
nothing to see here, just fix the code and move on to the next
bug. Different people have different priorities.
I have a hard time imagining the sort of people who would object if
the compiler generated the same code as today, but issued a diagnostic.
On 2026-01-09, Michael S <already5chosen@yahoo.com> wrote:
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
highcrew <high.crew3868@fastmail.com> writes:
Hello,
While I consider myself reasonably good as C programmer, I still
have difficulties in understanding undefined behavior.
I wonder if anyone in this NG could help me.
Let's take an example. There's plenty here:
https://en.cppreference.com/w/c/language/behavior.html
So let's focus on https://godbolt.org/z/48bn19Tsb
For the lazy, I report it here:
int table[4] = {0};
int exists_in_table(int v)
{
// return true in one of the first 4 iterations
// or UB due to out-of-bounds access
for (int i = 0; i <= 4; i++) {
if (table[i] == v) return 1;
}
return 0;
}
This is compiled (with no warning whatsoever) into:
exists_in_table:
mov eax, 1
ret
table:
.zero 16
Well, this is *obviously* wrong. And sure, so is the original
code, but I find it hard to think that the compiler isn't able
to notice it, given that it is even "exploiting" it to produce
very efficient code.
I understand the formalism: the resulting assembly is formally
"correct", in that UB implies that anything can happen.
Yet I can't think of any situation where the resulting assembly
could be considered sensible. The compiled function will
basically return 1 for any input, and the final program will be
buggy.
Wouldn't it be more sensible to have a compilation error, or
at least a warning? The compiler will be happy even with -Wall
-Wextra -Werror.
There's plenty of documentation, articles and presentations that
explain how this can make very efficient code... but nothing
will answer this question: do I really want to be efficiently
wrong?
I mean, yes I would find the problem, thanks to my 100% coverage
unit testing, but couldn't the compiler give me a hint?
Could someone drive me into this reasoning? I know there is a
lot of thinking behind it, yet everything seems to me very
incorrect! I'm in deep cognitive dissonance here! :) Help!
The important thing to realize is that the fundamental issue here
is not a technical question but a social question. In effect what
you are asking is "why doesn't gcc (or clang, or whatever) do what
I want or expect?". The answer is different people want or expect
different things. For some people the behavior described is
egregiously wrong and must be corrected immediately. For other
people the compiler is acting just as they think it should,
nothing to see here, just fix the code and move on to the next
bug. Different people have different priorities.
I have a hard time imagining the sort of people who would object if
the compiler generated the same code as today, but issued a
diagnostic.
If false positives occur for the diagnostic frequently, there
will be legitimate complaint.
If there is only a simple switch for it, it will get turned off
and then it no longer serves its purpose of catching errors.
There are all kinds of optimizations compilers commonly do that could
also be erroneous situations. For instance, eliminating dead code.
// code portable among several types of systems:
switch (sizeof var) {
case 2: ...
case 4: ...
case 8: ...
}
sizeof var is a compile time constant expected to be 2, 4 or 8 bytes.
The other cases are unreachable code.
Suppose every time the compiler eliminates unreachable code, it
issues a diagnostic "foo.c:42: 3 lines of unreachable code removed".
That would be annoying when the programmer knows about dead code
elimination and is counting on it.
We also have to consider that not all code is written directly by
hand.
Code generation techniques (including macros) can produce "weird" code
in some of their corner cases. The code is correct, and it would take
more complexity to identify those cases and generate more idiomatic
code; it is left to the compiler to clean up.
On 2026-01-09, Michael S <already5chosen@yahoo.com> wrote:
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
I have a hard time imagining the sort of people who would object
if the compiler generated the same code as today but also issued
a diagnostic.
Michael S <already5chosen@yahoo.com> writes:
On Fri, 09 Jan 2026 01:42:53 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
I have a hard time imagining the sort of people who would object
if the compiler generated the same code as today but also issued
a diagnostic.
It depends on what the tradeoffs are. For example, given a
choice, I would rather have an option to prevent this particular
death-by-UB optimization than an option to issue a diagnostic.
Having both costs more effort than having just one.
But there are limits to what is considered negotiable by the
worshippers of nasal demons and what is beyond that. A warning is
negotiable; turning off the transformation is most likely beyond it.
Michael S <already5chosen@yahoo.com> writes:
[...]
But there are limits to what is considered negotiable by the
worshippers of nasal demons and what is beyond that. A warning is
negotiable; turning off the transformation is most likely beyond it.
Your use of the word "worshippers" suggests a misunderstanding on
your part.
I certainly do not "worship" anything about C. I don't think
anyone else you've been talking to does either. I have a pretty
good understanding of it. There are plenty of things I don't
particularly like.
In the vast majority of my posts here, I simply try to explain what
the standard actually says and offer advice based on that.
[...]
On Thu, 1 Jan 2026 22:54:05 +0100
On related note.
struct bar1 {
    int table[4];
    int other_table[4];
};
struct bar2 {
    int other_table[4];
    int table[4];
};
int foo1(struct bar1* p, int v)
{
    for (int i = 0; i <= 4; ++i)
        if (p->table[i] == v)
            return 1;
    return 0;
}
int foo2(struct bar2* p, int v)
{
    for (int i = 0; i <= 4; ++i)
        if (p->table[i] == v)
            return 1;
    return 0;
}
According to C Standard, access to p->table[4] in foo1() is UB.
[O.T.]
I want to use a language (or, better, a standardized dialect of C) in
which the behavior in this case is defined, but I am bad at influencing
other people, so I cannot get what I want.
[/O.T.]
Now the question.
What does the Standard say about foo2()? Is there UB in foo2() as well?
According to C Standard, access to p->table[4] in foo1() is UB.
...
Now the question.
What The Standard says about foo2() ? Is there UB in foo2() as well?
gcc code generator does not think so.
On Mon 1/12/2026 6:28 AM, Michael S wrote:
According to C Standard, access to p->table[4] in foo1() is UB.
...
Now the question.
What The Standard says about foo2() ? Is there UB in foo2() as
well?
Yes, in the same sense as in `foo1`.
gcc code generator does not think so.
It definitely does.
However, since this is the trailing array member of the struct, GCC
does not want to accidentally suppress the classic "struct hack": it
assumes that the pointer passed to the function may well point to a
struct object allocated through the "struct hack" technique.
Here `table` plays the role of an FAM (flexible array member). A lot
of code depends on such patterns; it's rather standard practice in
communications programming, especially so in C90.
Add an extra field after the trailing array and `foo2` will also fold
into `return 1`, just like `foo1`.
Perhaps there's a switch in GCC that would outlaw the classic "struct hack"... But in any case, it is not prohibited by default for
compatibility with pre-C99 code.
On 12/01/2026 14:28, Michael S wrote:
On Thu, 1 Jan 2026 22:54:05 +0100
[...]
According to C Standard, access to p->table[4] in foo1() is UB.
[O.T.]
I want to use a language (or, better, a standardized dialect of C) in
which the behavior in this case is defined, but I am bad at influencing
other people, so I cannot get what I want.
[/O.T.]
So you want to deliberately read one element past the end because you
know it will be the first element of other_table?
I think it would then be better to write it like this:
struct bar1 {
    union {
        struct {
            int table[4];
            int other_table[4];
        };
        int xtable[8];
    };
};
int foo1(struct bar1* p, int v)
{
    for (int i = 0; i <= 4; ++i)
        if (p->xtable[i] == v)
            return 1;
    return 0;
}
At least your intent is signaled to whoever is reading your code.
But I don't know whether the UB goes away if you intend to write to
.table and .other_table and read those values via .xtable (I can't
remember the rules).
I'm not even sure there is no padding between .table and .other_table.
(In my systems language, the behaviour of your original foo1, in an equivalent program, is well-defined. But not of foo2, given that you
may read some garbage value beyond the struct, which may or may not
be within valid memory.)
Now the question.
What The Standard says about foo2() ? Is there UB in foo2() as
well?
Given that you may be reading garbage as I said, whether it is UB or
not is irrelevant; your program has a bug.
Unless you can add extra context which would make that reasonable.
For example, the struct is within an array, it's not the last
element, so it will read the first element of .other_table, and you
are doing this knowingly rather than through oversight.
It might well be UB, but that is a separate problem.
On Mon, 12 Jan 2026 15:58:15 +0000
bart <bc@freeuk.com> wrote:
On 12/01/2026 14:28, Michael S wrote:
On Thu, 1 Jan 2026 22:54:05 +0100
[...]
So you want to deliberately read one element past the end because you
know it will be the first element of other_table?
Yes. I primarily want it for multi-dimensional arrays.
On Mon, 12 Jan 2026 08:03:31 -0800
Andrey Tarasevich <noone@noone.net> wrote:
[...]
Yes, in the same sense as in `foo1`.
gcc code generator does not think so.
It definitely does.
Do you have citation from the Standard?
Michael S <already5chosen@yahoo.com> writes:
On Mon, 12 Jan 2026 08:03:31 -0800
Andrey Tarasevich <noone@noone.net> wrote:
[...]
Yes, in the same sense as in `foo1`.
gcc code generator does not think so.
It definitely does.
Right.
Do you have citation from the Standard?
The short answer is section 6.5.6 paragraph 8.
There is amplification in Annex J.2, roughly three pages
after the start of J.2. You can search for "an array
subscript is out of range", where there is a clarifying
example.
Normally the phrase "worshippers of nasal demons" in my posts refers
to a faction among the developers and maintainers of the gcc and clang
compilers. I think that's not an unusual use of the phrase, but I
could be wrong about it.
AFAIK, you are not a gcc or clang maintainer. So, not a "worshipper".
When I want to characterize [in derogatory fashion] people who have no
direct influence on the behavior of common software tools, but who
share the "worshippers'" attitude toward UB, I use the phrase
'language lawyers'.
"Language lawyers", at least, I understand, having frequently been
described as one myself. It means those who are knowledgeable about
what the standard actually says.
Michael S <already5chosen@yahoo.com> writes:...
On Mon, 12 Jan 2026 15:58:15 +0000
bart <bc@freeuk.com> wrote:
On 12/01/2026 14:28, Michael S wrote:
...struct bar1 {
int table[4];
int other_table[4];
};
So you want to deliberately read one element past the end because you
know it will be the first element of other_table?
Yes. I primarily want it for multi-dimensional arrays.
So declare it as int table[4][4].
On Mon, 12 Jan 2026 12:03:36 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 12 Jan 2026 08:03:31 -0800
Andrey Tarasevich <noone@noone.net> wrote:
[...]
Yes, in the same sense as in `foo1`.
gcc code generator does not think so.
It definitely does.
Right.
Maybe. But it's not expressed by the gcc code generator or by any
warnings. So, how can we know?
Do you have citation from the Standard?
The short answer is section 6.5.6 paragraph 8.
I am reading the N3220 draft: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
There, section 6.5.6 has no paragraph 8 :(
There is amplification in Annex J.2, roughly three pages
after the start of J.2. You can search for "an array
subscript is out of range", where there is a clarifying
example.
I see the following text:
"An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.7)."
That's what you had in mind?
On 12/01/2026 14:28, Michael S wrote:
On Thu, 1 Jan 2026 22:54:05 +0100
[...]
According to C Standard, access to p->table[4] in foo1() is UB.
[O.T.]
I want to use a language (or, better, a standardized dialect of C) in
which the behavior in this case is defined, but I am bad at influencing
other people, so I cannot get what I want.
OK - so how do you want it to be defined? I've used languages where
table[n] for n>3 would have exactly the same effect as table[3], and table[n] for n<0 would have exactly the same effect as table[0]. I've
seen algorithms that were actually simplified by relying upon this
behavior.
On 2026-01-12 15:02, Scott Lurndal wrote:
[...]
So declare it as int table[4][4].
Note that this suggestion does not make the behavior defined. It is
undefined behavior to dereference table[0]+4, and it is undefined
behavior to make any use of table[0]+5.
But I was interested in the "opinion" of the C Standard rather than
that of the gcc compiler.
Is it full nasal-demon UB or merely "implementation-defined behavior"?
Perhaps there's a switch in GCC that would outlaw the classic "struct
hack"... But in any case, it is not prohibited by default for
compatibility with pre-C99 code.
gcc indeed has something of this sort: -fstrict-flex-arrays=3
But at the moment it does not appear to affect code generation [in
this particular example].