• On Undefined Behavior

    From highcrew@high.crew3868@fastmail.com to comp.lang.c on Thu Jan 1 22:54:05 2026
    From Newsgroup: comp.lang.c

    Hello,

    While I consider myself a reasonably good C programmer, I still
    have difficulty understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to believe that the compiler is unable to notice it,
    given that it even "exploits" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.
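
    [Editorial aside, not part of the original post: the intended behavior
    comes back with a one-character fix to the loop condition; a sketch
    using the common sizeof idiom so the bound stays in sync with the
    array (the `_fixed` name is made up for illustration):]

```c
#include <stddef.h>

int table[4] = {0};

/* Corrected version (editorial sketch): the condition is i < 4, so
   table[4] is never read. Deriving the bound with sizeof keeps the
   loop correct if the array size ever changes. */
int exists_in_table_fixed(int v)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
```

    With this version, both return paths should survive optimization,
    since the compiler can no longer assume the loop always finds a match.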

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone walk me through this reasoning? I know there is a lot of
    thinking behind it, yet everything seems very incorrect to me!
    I'm in deep cognitive dissonance here! :) Help!
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Jan 2 00:26:16 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    IMHO, for a compiler that eliminated all the comparisons (I assume it
    was gcc -O2/-O3), the absence of a warning is a bug.
    It's worth reporting.

    And it has nothing to do with the C standard and what is considered
    UB by the standard and what is not. It's a bug from a practical POV,
    and I believe that the gcc maintainers will admit it and fix it.
    Eventually, that is. Not too quickly.






    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Thu Jan 1 17:49:40 2026
    From Newsgroup: comp.lang.c

    On 2026-01-01 16:54, highcrew wrote:
    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    I agree.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    The rule that this code violates is still violated if an array is
    accessed through a pointer, from a module which has no knowledge of the
    actual length of the array. As a result, it does not make sense for the
    standard to require diagnosis of all such violations.
    However, implementations are free to diagnose violations such as this
    one, where it would be perfectly feasible to do so. Whether or not
    implementations actually do so is considered a matter of "Quality of
    Implementation" (QoI), and therefore outside the scope of the standard.
    Generating code that is only justified because the behavior is
    undefined, and failing to diagnose the problem, seems to me to be very
    bad QoI.
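
    [Editorial aside: the point about pointers can be made concrete. Once
    the array decays to a pointer in another function, no compiler can see
    the bound, which is why the idiomatic C remedy is to pass the length
    explicitly. A sketch; the names here are made up for illustration:]

```c
#include <stddef.h>

/* The callee sees only a pointer, so no compiler could diagnose an
   out-of-bounds loop here. The bound must come from the caller, who
   still knows the real array size. */
int exists_in(const int *p, size_t n, int v)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] == v) return 1;
    }
    return 0;
}

int table4[4] = {1, 2, 3, 4};
```

    A caller would typically compute the length once, at the point where
    the real array type is still visible:
    `exists_in(table4, sizeof table4 / sizeof table4[0], v)`.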
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Thu Jan 1 23:57:21 2026
    From Newsgroup: comp.lang.c

    On 1/1/26 11:26 PM, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:
    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16

    [...]

    IMHO, for compiler that eliminated all comparisons (I assume that it was
    gcc -O2/-O3) an absence of warning is a bug.

    Correct: -O2. Same code compiled with -O0:
    https://godbolt.org/z/xE61j3PdM -> still no warnings, although the
    output will at least contain the `return 0`.

    Which doesn't mean it is better: the code is obviously wrong, and
    *I think* the compiler should whine.

    It's worth reporting.

    Do you think so? I think everybody already knows...

    First off, this example has been on the cppreference website forever.
    Secondly, Clang issues no warning either.

    Again, by this I don't mean the compilers are behaving correctly.
    I just think this situation is well known.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Fri Jan 2 05:53:13 2026
    From Newsgroup: comp.lang.c

    highcrew <high.crew3868@fastmail.com> wrote:
    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    You do not get the formalism: the compiler applies a lot of
    transformations which are supposed to be correct for programs obeying
    the C rules. However, the compiler does not understand the program.
    It may notice details that you missed, but it acts essentially blindly
    on the information it has. And most transformations have only limited
    info (storing everything that the compiler infers would take a lot of
    memory, and searching all that info would take a lot of time).

    The code that you see is the result of many transformations, possibly
    hundreds or more. The result is a consequence of all the steps,
    but it could be hard to isolate a single "silly" step.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    This case looks reasonably easy: when compiling 'exists_in_table'
    the compiler has the declaration of 'table' and knows its size is 4.
    The compiler probably generated its output after noticing that
    the loop would produce an out-of-bounds reference. So with some
    extra effort it should be possible to generate a diagnostic.
    But in general, instead of an array you may have a pointer without
    bound information. Or the upper bound may be a variable. As James
    wrote, for such reasons the C standard does not require a diagnostic.
    Also, in the past gcc and clang did not generate diagnostics
    in such situations. gcc is a very complex beast, and adding
    diagnostics now may require nontrivial effort.

    BTW: I expect that eventually gcc will warn. Ideologically,
    using various string functions can overflow buffers in
    similar ways. In the past such buffer overflows just generated
    some (possibly "working") code. Now most such uses report
    warnings. In fact, this problem looks like an outlier.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    By using C you implicitly gave "yes" as an answer.

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Since it gave no hint, it probably could not. In cases where it
    can, it warns (at least when you activate warnings).

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!

    --
    Waldek Hebisch
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jan 2 10:31:19 2026
    From Newsgroup: comp.lang.c

    On 01/01/2026 22:54, highcrew wrote:
    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.


    Since the original code is wrong, it doesn't make sense to ask if the generated object code is right or wrong. This principle was understood
    from the days of mechanical computers: <https://www.brainyquote.com/quotes/charles_babbage_141832>

    However, it /does/ make sense to ask whether the compiler could have
    been more helpful in pointing out your mistake - and clearly, in theory
    at least, the answer is yes.

    Compilers first parse the source code into an internal format. Then
    they run through this in a series of passes, detecting errors and
    generating warnings, manipulating the code for optimisation, generating
    new types of internal formats, and so on. For modern compilers there
    will be scores of passes. Ideally, the ordering of these passes is such
    that the pass that figures out the size of "table" and the range of the
    loop variable will come before the pass that issues out-of-bounds
    warnings, and that will come before the pass that figures out that the
    "return 0;" cannot be reached without passing through UB and thus the
    only possible return value for the function is 1. For many optimisation passes, once the compiler has figured out a way to simplify the code,
    some aspects (such as paths leading up to UB that have been eliminated)
    are lost, and the compiler can't see the problem (and therefore can't
    issue a diagnostic) later on.

    Unfortunately here, the passes are not ordered ideally for catching this source code bug. I think it is inevitable that such situations will
    arise - no matter how the passes are ordered, you are going to be able
    to find samples where there is a bug that can easily be spotted by a
    human reader but which is not diagnosed by the compiler. Change the
    pass order to one that will catch this situation, and another bug will
    now go undiagnosed. Compiler writers do strive to catch what they can,
    but they must also balance compile times, compiler complexity, and of
    course other demands on their own development time. Sometimes that
    means things can appear simple and obvious to the user, but would
    require unwarranted effort to implement in the compiler.

    I had a little look in the gcc bugzilla, but could not find any issue
    that directly matches this case. So I think it is worthwhile if you
    file it as a gcc bug. (It is not technically a "bug", but it is
    definitely an "opportunity for improvement".) If the gcc passes make it
    hard to implement as a normal warning, it may still be possible to add
    it to the "-fanalyzer" passes.


    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    So you are asking whether you want the generated code to be
    efficiently wrong, or inefficiently wrong?


    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    You can also make use of run-time sanitizers that are ideal for catching
    this sort of thing (albeit with an inevitable speed overhead).

    <https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>
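
    [Editorial aside: one concrete way to use these options on the
    thread's example (the file names here are assumed, not from the post)
    is to build with ASan/UBSan enabled, so the out-of-bounds read becomes
    a run-time report instead of fuel for optimization:]

```sh
# Hypothetical build of the example (file name assumed).
# The sanitizers instrument loads, so the table[4] read is caught at
# run time rather than silently exploited by the optimizer.
gcc -O0 -g -fsanitize=address,undefined -fno-sanitize-recover=all \
    exists_in_table.c -o exists_demo

# A lookup of a value not present in the table should now abort with a
# global-buffer-overflow report pointing at the table[i] read.
./exists_demo
```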



    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    Basically, gcc is a great tool, but it is not perfect. Reporting
    such issues can help improve it, but there are no guarantees.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Fri Jan 2 17:38:32 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 6:53 AM, Waldek Hebisch wrote:
    highcrew <high.crew3868@fastmail.com> wrote:

    You do not get the formalism: the compiler applies a lot of
    transformations which are supposed to be correct for programs obeying
    the C rules. However, the compiler does not understand the program.
    It may notice details that you missed, but it acts essentially blindly
    on the information it has. And most transformations have only limited
    info (storing everything that the compiler infers would take a lot of
    memory, and searching all that info would take a lot of time).

    The code that you see is the result of many transformations, possibly
    hundreds or more. The result is a consequence of all the steps,
    but it could be hard to isolate a single "silly" step.
    [...]

    Thanks for your answer.

    So you are basically saying that spotting such a problem is
    way more difficult than optimizing it? And indeed so difficult that the compiler fails at it?

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    By using C you implicitly gave "yes" as an answer.

    Wait, I don't think that makes sense.
    If we are talking about a legitimate limitation of the compilers, as you
    seem to suggest, then it is a different situation.

    Perhaps it would be more proper to say that, by using C, one implicitly
    accepts the burden of writing UB-free code.
    The compiler can't guarantee that it will detect UB, so the contract
    is: you get a correct program if you write correct code.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Fri Jan 2 17:51:38 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 10:31 AM, David Brown wrote:
    However, it /does/ make sense to ask whether the compiler could have
    been more helpful in pointing out your mistake - and clearly, in theory
    at least, the answer is yes.
    [...]

    Thanks for the clarification.

    Yes, I'm exactly wondering if the compiler shouldn't reject that code
    to begin with. I'm not expecting to enter wrong code and get a
    working program. That would be dark magic.

    So you are basically confirming what I inferred from Waldek Hebisch's
    answer: it is actually quite hard for the compiler to spot it. So we
    live with it.

    I had a little look in the gcc bugzilla, but could not find any issue
    that directly matches this case. So I think it is worthwhile if you
    file it as a gcc bug. (It is not technically a "bug", but it is
    definitely an "opportunity for improvement".) If the gcc passes make it
    hard to implement as a normal warning, it may still be possible to add
    it to the "-fanalyzer" passes.

    Erm... I will consider filing an "opportunity for improvement"
    ticket then, thank you.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    You are asking if you want the generated code to be efficiently wrong or inefficiently wrong?

    I was asking whether it is reasonable to accept as valid a program
    which is wrong, and to optimize it in its wrong behavior.

    What I could not grasp is the difficulty of the job.
    To quote your own words:

    "Sometimes that means things can appear simple and obvious to the user,
    but would require unwarranted effort to implement in the compiler."

    You can also make use of run-time sanitizers that are ideal for catching this sort of thing (albeit with an inevitable speed overhead).

    <https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>

    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    Thanks!
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.c on Fri Jan 2 22:52:21 2026
    From Newsgroup: comp.lang.c

    On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
    Wouldn't it be more sensible to have a compilation error, or
    at least a warning?

    It would in that case, but it's very difficult to specify that in the
    language standard as a requirement such that all implementations must
    produce a diagnostic.

    If we wanted to diagnose situations similar to the ones in your
    example program, we would have two departures from the way the
    language is now.

    The first is that we would need to formalize the concept of error
    versus warning diagnostics. Currently there are only diagnostics:
    when the implementation is required to diagnose a program, that
    program is deemed to be incorrect: it violates a syntax or semantic
    constraint rule. It is not required to be translated, and if it is
    translated and linked, it doesn't have well-defined behavior at all.

    Warnings are diagnostics for situations that are not confirmed
    incorrect, but only suspected.

    Compilers typically treat some ISO-C-required diagnostics as warnings.
    For instance, GNU C issues only a warning when pointers are converted
    without a cast. But ISO C allows it to stop translating.

    For the situation in your program, it would be unacceptable to have
    implementations stop translating. We really want just a warning (at
    least by default; in specific projects and situations, developers
    could elect to treat certain warnings as fatal, even standard-required
    warnings).

    The second new thing is that to diagnose this, we need to make
    diagnosis dependent on reachability.

    We want a rule which is something like "whenever the body of
    a function, or an initializing expression for an external definition
    reaches an expression which has unconditional undefined behavior
    that is not an unreachability assertion and not a documented
    extension, a warning diagnostic must be issued".

    This "reaches" has a problem because it requires the implementation to
    solve the halting problem. We should restrict that to just "statically
    obvious" or "trivial" reachability whereby without having to reason
    about unknown run-time values, we can infer that a statement or
    expression is unconditionally reached.

    This kind of diagnostic would be a good thing in my opinion; just
    nobody has stepped up to the plate because of the challenges:

    - introducing the concept of a warning versus error diagnostic.

    - defining a clear set of rules for trivial reachability which
    can catch the majority of these situations without too much
    complexity. (The C++ rules for functions that return value
    reaching their end without a return statement can be used
    as inspiration here.)

    - specifying exactly what "statically obvious" undefined behavior
    is, and how to positively determine that a certain expression
    exhibits it.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.c on Fri Jan 2 22:56:55 2026
    From Newsgroup: comp.lang.c

    On 2026-01-01, Michael S <already5chosen@yahoo.com> wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here:
    https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    IMHO, for a compiler that eliminated all the comparisons (I assume it
    was gcc -O2/-O3), the absence of a warning is a bug.

    A bug against which requirement, articulated where?

    And it has nothing to do with the C standard and what is considered
    UB by the standard and what is not.

    It has everything to do with it, unfortunately. It literally has nothing
    to do with anything else, in fact.

    That function either finds a match in the four array elements and
    returns 1, or else its behavior is undefined.

    Therefore there is no situation under which it is /required/ to return
    anything other than 1.

    You literally cannot write a test case which tests for the "return 0",
    such that the test case has well-defined behavior.

    All well-defined test cases can only test for 1 being returned.

    And that is satisfied by machine code which unconditionally returns 1.
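
    [Editorial aside: this can be checked directly with the thread's own
    function. Because `table` is all zeros, the only call whose behavior
    is defined is `exists_in_table(0)`, which returns 1 at i == 0 before
    the out-of-bounds read is reached; every other argument drives the
    loop to `table[4]`, so no well-defined test can observe a different
    return value:]

```c
int table[4] = {0};

/* The function exactly as posted: for any v not present in table, the
   loop reaches i == 4 and reads past the end (undefined behavior).
   For v == 0, it returns 1 on the very first iteration, so that call
   is well defined. */
int exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
```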

    There is no requirement anywhere that the function requires a
    diagnostic; not in ISO C and not in any GCC documentation.

    Therefore your bug report would have to be not about the compiler
    behavior but about the lack of the requirement.

    This is a difficult problem: writing the requirement /in a good way/
    that covers many cases is not easy, and that's before you implement
    anything in the compiler.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jan 3 13:30:50 2026
    From Newsgroup: comp.lang.c

    On 02/01/2026 17:38, highcrew wrote:
    On 1/2/26 6:53 AM, Waldek Hebisch wrote:
    highcrew <high.crew3868@fastmail.com> wrote:

    You do not get the formalism: the compiler applies a lot of
    transformations which are supposed to be correct for programs obeying
    the C rules. However, the compiler does not understand the program.
    It may notice details that you missed, but it acts essentially blindly
    on the information it has. And most transformations have only limited
    info (storing everything that the compiler infers would take a lot of
    memory, and searching all that info would take a lot of time).

    The code that you see is the result of many transformations, possibly
    hundreds or more. The result is a consequence of all the steps,
    but it could be hard to isolate a single "silly" step.
    [...]

    Thanks for your answer.

    So you are basically saying that spotting such a problem is
    way more difficult than optimizing it? And indeed so difficult that the compiler fails at it?

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    By using C you implicitly gave "yes" as an answer.


    When code contains UB, all guarantees are out the window. There is no
    "right" answer, so it doesn't really make sense to ask if you are
    getting the wrong answer efficiently or inefficiently. In effect,
    because you are not giving the compiler valid correct C code, it can't
    know what you really want - you can't reasonably complain about what you
    get in response. Compilers are not mind readers.

    (Of course compilers try to go beyond the requirements of the C
    language, to be helpful development aids rather than just "pure"
    compilers. And while good compilers /are/ helpful, there is always room
    for improvement.)

    Wait, I don't think that makes sense.
    If we are talking about a legitimate limitation of the compilers, as you
    seem to suggest, then it is a different situation.

    Perhaps it would be more proper to say that, by using C, one implicitly
    accepts the burden of writing UB-free code.
    The compiler can't guarantee that it will detect UB, so the contract
    is: you get a correct program if you write correct code.


    That's a good way to phrase it, IMHO.

    The C language standards (together with any documented extensions,
    modifications, or implementation-defined behaviour in the
    implementation) provide a contract. You, as the programmer, guarantee
    to write correct source code. In return, the compiler guarantees to
    generate correct object code. If you fail on your part, and write UB in
    your code, the compiler is under no obligation to give you anything
    useful in return.



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jan 3 13:42:35 2026
    From Newsgroup: comp.lang.c

    On 02/01/2026 17:51, highcrew wrote:
    On 1/2/26 10:31 AM, David Brown wrote:
    However, it /does/ make sense to ask whether the compiler could have
    been more helpful in pointing out your mistake - and clearly, in
    theory at least, the answer is yes.
    [...]

    Thanks for the clarification.

    Yes, I'm exactly wondering if the compiler shouldn't reject that code
    to begin with.  I'm not expecting to enter wrong code and get a
    working program.  That would be dark magic.

    So you are basically confirming what I inferred from Waldek Hebisch's
    answer: it is actually quite hard for the compiler to spot it. So we
    live with it.

    I think it is fair to say that if it were easy to detect this case
    within the structure of existing compilers, they would do so already.
    The fact that both gcc and clang fail to produce warnings, despite
    having significant differences in their internal structures and the
    details of their passes, shows that it is not as easy as it might
    appear. And James has explained why the C standards can't make
    detection of this kind of fault a language requirement.

    But it is still worth reporting the simple test case to gcc (and clang)
    to see if they are able to improve their warnings.


    I had a little look in the gcc bugzilla, but could not find any issue
    that directly matches this case.  So I think it is worthwhile if you
    file it as a gcc bug.  (It is not technically a "bug", but it is
    definitely an "opportunity for improvement".)  If the gcc passes make
    it hard to implement as a normal warning, it may still be possible to
    add it to the "-fanalyzer" passes.

    Erm... I will consider filing an "opportunity for improvement"
    ticket then, thank you.

    "Opportunity for improvement" tickets are the driving force behind a lot
    of open source software features, so please do file it. Post the
    bugzilla link here once you get some feedback - it would be good to see
    if there is likely to be an improvement in the warnings.


    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    You are asking if you want the generated code to be efficiently wrong
    or inefficiently wrong?

    I was asking whether it is reasonable to accept as valid a program
    which is wrong, and to optimize it in its wrong behavior.


    Yes, it is. That is perhaps unfortunate from the programmers'
    viewpoint, but "garbage in, garbage out" is unavoidable.

    What I could not grasp is the difficulty of the job.
    To quote your own words:

    "Sometimes that means things can appear simple and obvious to the user,
    but would require unwarranted effort to implement in the compiler."

    You can also make use of run-time sanitizers that are ideal for
    catching this sort of thing (albeit with an inevitable speed overhead).

    <https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>

    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    The tools here can be useful. Of course it is best when you can find
    bugs earlier, at the static analysis stage (I am a big fan of lots of
    compiler warnings), but the "-fsanitize" options are the next step for a
    lot of development. They are of limited value in my own work (small
    embedded systems - there's often no console for log messages, and much
    less possibility of "hardware accelerated" error detection such as
    creative use of a processor's MMU), but for PC programming they can be a
    great help.


    Thanks!


    It's been a good thread - on-topic, interesting discussion, people have
    got a better understanding of a few things, there's an opportunity to
    contribute to better C development tools, and no flames. I look forward
    to your next question!


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sat Jan 3 14:42:59 2026
    From Newsgroup: comp.lang.c

    On 1/3/26 1:42 PM, David Brown wrote:
    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    The tools here can be useful.  Of course it is best when you can find
    bugs earlier, at the static analysis stage (I am a big fan of lots of
    compiler warnings), but the "-fsanitize" options are the next step for a
    lot of development.  They are of limited value in my own work (small
    embedded systems - there's often no console for log messages, and much
    less possibility of "hardware accelerated" error detection such as
    creative use of a processor's MMU), but for PC programming they can be a
    great help.

    Agreed.

    I happen to work with embedded systems as well, and while I came late to
    the party (all the possible checks were already employed by colleagues
    who came before me; they took the fun part!), I can see the value of
    sanitizers even if the code will later run on embedded systems.
    That's why I say I'd like to learn more: I'm merely a user of them.

    Simulations use that code. Then the same code is compiled for the target platform.

    In the light of what I learned in this thread, the value of such testing
    is not just a mere "let's see if it crashes", but effectively bound to UB!
    Or to the attempt to rule out the existence of UB, I should rather say.

    Sanitizers to the rescue. Assuming UB is not in the code, we saddle the
    (cross-)compiler with the correctness of the output.

    Then - I'm reasoning as I write - the devil is in the details.
    Let's say that the compiler is unable to catch the issues, and so are
    the sanitizers. Then the UB-tainted source code goes to the target.
    Assuming the cross compiler is unable to catch it either, we have
    a garbage-in-garbage-out situation, affecting the product.

    Following these thoughts, I started to wonder: the code I reported at
    the beginning of the thread, built with -O2, is effectively coping with
    UB by replacing the function with the equivalent of `return 1`.
    What if I build it with -O2 and -fsanitize=address?
    Will the instrumentation be able to catch it, given that there's nothing
    inherently bad about a `return 1` (minus the fact that it's not what
    the developer intended)?

    $ cat x.c
    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }
    $ gcc -c -O2 x.c
    $ objdump --disassemble=exists_in_table x.o

    x.o: file format elf64-x86-64


    Disassembly of section .text:

    0000000000000000 <exists_in_table>:
    0: b8 01 00 00 00 mov $0x1,%eax
    5: c3 ret

    OK, this is the bad guy. ...now let's sanitize it.

    $ gcc -c -fsanitize=address -O2 x.c
    $ objdump --disassemble=exists_in_table x.o

    x.o: file format elf64-x86-64


    Disassembly of section .text:

    0000000000000000 <exists_in_table>:
    0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <exists_in_table+0x7>
    7: 48 8d 70 14 lea 0x14(%rax),%rsi
    b: 48 89 c2 mov %rax,%rdx
    e: 48 c1 ea 03 shr $0x3,%rdx
    12: 0f b6 8a 00 80 ff 7f movzbl 0x7fff8000(%rdx),%ecx
    19: 48 89 c2 mov %rax,%rdx
    1c: 83 e2 07 and $0x7,%edx
    1f: 83 c2 03 add $0x3,%edx
    22: 38 ca cmp %cl,%dl
    24: 7c 04 jl 2a <exists_in_table+0x2a>
    26: 84 c9 test %cl,%cl
    28: 75 1c jne 46 <exists_in_table+0x46>
    2a: 39 38 cmp %edi,(%rax)
    2c: 74 12 je 40 <exists_in_table+0x40>
    2e: 48 83 c0 04 add $0x4,%rax
    32: 48 39 f0 cmp %rsi,%rax
    35: 75 d4 jne b <exists_in_table+0xb>
    37: 31 c0 xor %eax,%eax
    39: c3 ret
    3a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
    40: b8 01 00 00 00 mov $0x1,%eax
    45: c3 ret
    46: 48 83 ec 08 sub $0x8,%rsp
    4a: 48 89 c7 mov %rax,%rdi
    4d: e8 00 00 00 00 call 52 <__odr_asan.table+0x12>

    Disassembly of section .text.exit:

    Disassembly of section .text.startup:

    Well, what do you know? -fsanitize=address seems to interfere with
    optimizations, at least on my system. Link it, run it, and I get a
    nice segfault.

    Now the circle is closed!

    Thanks!


    It's been a good thread - on-topic, interesting discussion, people have
    got a better understanding of a few things, there's an opportunity to
    contribute to better C development tools, and no flames.  I look forward
    to your next question!

    Thanks. I'm new here, and the community seems a good one! Happy to
    contribute.

    [Here I'm tempted to go OT with babbling about how nice it would be that
    usenet wasn't so underground, but I suspect that is probably what makes
    it good. In the small barrel, there is the good wine]
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Sat Jan 3 07:53:22 2026
    From Newsgroup: comp.lang.c

    On Thu 1/1/2026 1:54 PM, highcrew wrote:
    Let's take an example.  There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong.

    Once again, one equivalent definition of undefined behavior is: "The
    compiler is free to assume that conditions that lead to undefined
    behavior never occur".

    (And, as a corollary: if some stretch of code is always undefined,
    regardless of external conditions, the compiler is free to assume that
    the code is never executed.)

    The above is exactly how undefined behavior is used for optimizing code through static analysis.

    In your case, undefined behavior happens when `i` reaches 4. Hence the
    compiler is free to assume that `i` is guaranteed to never reach 4. This
    means that the `if` condition is guaranteed to become true at some lower
    value of `i` (i.e. the compiler is free to assume that the calling code
    made a promise to never pass a `v` that is not present in `table`). This
    immediately means that the function will always return 1.

    That's what you are observing.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.

    That is true, but it is a very broad and general formalism. The
    logic the compiler follows is not that broad or general. It is
    significantly more focused on the properties of the actual code. The
    compiler "deduces" the result as I described above.
    --
    Best regards,
    Andrey


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jan 3 17:51:08 2026
    From Newsgroup: comp.lang.c

    On 03/01/2026 14:42, highcrew wrote:
    On 1/3/26 1:42 PM, David Brown wrote:

    [Be careful snipping attributions. Make sure you have enough left for
    all levels of quotation. The following paragraph was written by you,
    not by me.]

    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    The tools here can be useful.  Of course it is best when you can find
    bugs earlier, at the static analysis stage (I am a big fan of lots of
    compiler warnings), but the "-fsanitize" options are the next step for
    a lot of development.  They are of limited value in my own work (small
    embedded systems - there's often no console for log messages, and much
    less possibility of "hardware accelerated" error detection such as
    creative use of a processor's MMU), but for PC programming they can be
    a great help.

    Agreed.

    I happen to work with embedded systems as well, and while I came late to
    the party (all the possible checks were already employed by colleagues
    who came before me; they took the fun part!), I can see the value of
    sanitizers even if the code will later run on embedded systems.
    That's why I say I'd like to learn more: I'm merely a user of them.

    <snip>

    Following these thoughts, I started to wonder: the code I reported at
    the beginning of the thread, built with -O2, is effectively coping with
    UB by replacing the function with the equivalent of `return 1`.
    What if I build it with -O2 and -fsanitize=address?
    Will the instrumentation be able to catch it, given that there's nothing
    inherently bad about a `return 1` (minus the fact that it's not what
    the developer intended)?

    <snip>

    Well, what do you know? -fsanitize=address seems to interfere with
    optimizations, at least on my system. Link it, run it, and I get a
    nice segfault.

    Now the circle is closed!


    The sanitizers effectively inject code into your source, before any
    optimisations are applied. You can imagine your code being transformed
    into something like:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            // Start of sanitizer code
            if (i < 0 || i > 3) halt_with_sanitizer_message();
            // End of sanitizer code
            if (table[i] == v) return 1;
        }
        return 0;
    }

    Then optimisations are applied as normal.

    I can strongly recommend <https://godbolt.org> as the tool of choice for
    investigating code generation. It only works well with small code
    sections, but it gives you very clear generated code, and lets you try
    it with hundreds of different compilers and compiler versions. It's far
    nicer than doing objdumps or using -Wa,ahsdl flags to generate listings.



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Jan 3 20:48:10 2026
    From Newsgroup: comp.lang.c

    On Fri, 2 Jan 2026 22:56:55 -0000 (UTC)
    Kaz Kylheku <046-301-5902@kylheku.com> wrote:

    On 2026-01-01, Michael S <already5chosen@yahoo.com> wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here:
    https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
    // return true in one of the first 4 iterations
    // or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
    if (table[i] == v) return 1;
    }
    return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
    mov eax, 1
    ret
    table:
    .zero 16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice
    it, given that it is even "exploiting" it to produce very
    efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot
    of thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    IMHO, for a compiler that eliminated all comparisons (I assume that
    it was gcc -O2/-O3) the absence of a warning is a bug.

    A bug against which requirement, articulated where?

    And it has nothing to do with C standard and what considered UB by
    the standard and what not.

    It has everything to do with it, unfortunately. It literally has
    nothing to do with anything else, in fact.

    That function either finds a match in the four array elements and
    returns 1, or else its behavior is undefined.

    Therefore there is no situation under which it is /required/ to return anything other than 1.

    You literally cannot write a test case which tests for the "return 0",
    such that the test case has well-defined behavior.

    All well-defined test cases can only test for 1 being returned.

    And that is satisfied by machine code which unconditionally returns 1.

    There is no requirement anywhere that the function requires a
    diagnostic; not in ISO C and not in any GCC documentation.

    Therefore your bug report would have to be not about the compiler
    behavior but about the lack of the requirement.

    This is a difficult problem: writing the requirement /in a good way/
    that covers many cases is not easy, and that's before you implement
    anything in the compiler.


    The text above is an example of language-lawyerish speak. I don't like
    it.

    A gcc maintainer would want to fix it, because they actually care about
    quality of implementation.
    Now, being in majority not atypical representatives of males of a great
    ape species, they care even more about always feeling right, having the
    last word, etc...
    Which means that the process of convincing them to make a fix requires
    wisdom.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sat Jan 3 23:47:02 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 11:52 PM, Kaz Kylheku wrote:
    On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
    For the situation in your program, it would be unacceptable to have implementations stop translating.

    I can somehow get the idea that it is difficult for the compiler
    to spot the issue, but why do you think it would be unacceptable
    to stop translating?

    We really want just a warning (at
    least by default; in specific projects and situations, developers
    could elect to treat certain warnings as fatal, even standard-required
    warnings.)

    Even a warning would be enough though. Btw, my typical way of
    working is to enable -Werror while developing, but I don't like
    to force it in general. That would be an interesting digression,
    but definitely OT.

    The second new thing is that to diagnose this, we need to make
    diagnosis dependent on reachability.

    We want a rule which is something like "whenever the body of
    a function, or an initializing expression for an external definition
    reaches an expression which has unconditional undefined behavior
    that is not an unreachability assertion and not a documented
    extension, a warning diagnostic must be issued".

    That's an interesting perspective: reachability.
    Would you say that the offending piece of code is UB only if it
    is reachable in the final program, and therefore it is acceptable
    to keep it as long as it is unreachable?

    Now that I think of it, the __builtin_unreachable() implemented
    by popular compilers is technically UB *if reached* :)

    This kind of diagnostic would be a good thing in my opinion; just
    nobody has stepped up to the plate because of the challenges:

    - introducing the concept of a warning versus error diagnostic.

    - defining a clear set of rules for trivial reachability which
    can catch the majority of these situations without too much
    complexity. (The C++ rules for functions that return value
    reaching their end without a return statement can be used
    as inspiration here.)

    - specifying exactly what "statically obvious" undefined behavior
    is and how to positively determine that a certain expression
    exhibits it.

    Now I'm wondering how much work it would require to properly define
    the rules that the standard mandates!

    As for me, the main take-away is that the detection of certain UB
    is non-trivial; it would be very evil if the standard mandated
    some nearly-impossible task for the compiler!


    (The C++ rules for functions that return value
    reaching their end without a return statement can be used
    as inspiration here.)

    C++ does *what*?? I'm definitely not up to speed with C++, but
    I totally have missed that. Could you please tell me the name
    of this bizarre feature? I *need* to look it up :D
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Jan 3 23:14:33 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100, highcrew wrote:

    Well, this is *obviously* wrong.

    I think it's quite a clever way for the compiler to say "fuck you" to
    the programmer who wrote that. ;)
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 00:15:48 2026
    From Newsgroup: comp.lang.c

    On 1/3/26 4:53 PM, Andrey Tarasevich wrote:
    In your case, undefined behavior happens when `i` reaches 4. Hence the
    compiler is free to assume that `i` is guaranteed to never reach 4. This
    means that the `if` condition is guaranteed to become true at some lower
    value of `i` (i.e. the compiler is free to assume that the calling code
    made a promise to never pass a `v` that is not present in `table`). This
    immediately means that the function will always return 1.

    OK, I totally have missed that there was a rational justification
    for `return 1`! Now I see that `return 1` is actually correct, and
    I'm quite surprised. Thank you for pointing it out!

    (turns out the compiler is in DENIAL of UB :P)

    Interestingly, if I keep in mind this standpoint, every single
    UB listed in https://en.cppreference.com/w/c/language/behavior.html
    starts to make a lot of sense. I can even *foresee* the
    behavior before reading it on the webpage! Damn, I think
    something clicked in my head now...

    * Signed overflow? UB *can't happen*, therefore `x + 1 > x` is always
    true

    * Access out of bounds, ...discussed above.

    * Uninitialized scalar:

    size_t f(int x)
    {
        size_t a;
        if (x) // either x nonzero or UB
            a = 42;
        return a;
    }

    Here we *deny* that the variable can be used uninitialized,
    so the assumed absence of UB implies that x is non-zero.
    The function definitely returns 42.

    * Uninitialized scalar 2:

    _Bool p; // uninitialized local variable
    if (p) // UB access to uninitialized scalar
        puts("p is true");
    if (!p) // UB access to uninitialized scalar
        puts("p is false");

    This is hard to tell... Schrödinger boolean?

    According to the webpage the program might print both
    "p is true" and "p is false". Could it be because
    the compiler has no choice but to take the UB route?

    There's no way to mark UB as not reachable.
    Fortunately the compiler will usually warn me about
    uninitialized variables.

    I can see a few bold cases down the line, e.g. "Access to
    pointer passed to realloc", where I start wondering whether I,
    as a human, could even predict that a certain pointer has
    passed through realloc.

    I have a horrible question now, but that's for a
    separate question...


    Conclusion: the original UB I've been asking about is
    not even a bug. It is the compiler dodging a conditional.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 00:20:06 2026
    From Newsgroup: comp.lang.c

    On 1/3/26 5:51 PM, David Brown wrote:
    [Be careful snipping attributions.  Make sure you have enough left for
    all levels of quotation.  The following paragraph was written by you,
    not by me.]

    I see. I apologize to the crowd; I think I did that a few times.
    And sorry for the apology-driven spam too.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 00:25:07 2026
    From Newsgroup: comp.lang.c

    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Now I guess that an embedded compiler targeting an architecture
    where dereferencing 0 makes sense will not treat it as UB. But it
    is for sure a weird corner case.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sat Jan 3 18:59:44 2026
    From Newsgroup: comp.lang.c

    On 2026-01-03 18:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    Actually, that's not necessarily true. A null pointer is not required to
    refer to the location with an address of 0. An integer constant
    expression with a value of 0, converted to a pointer type, is guaranteed
    to be a null pointer, but that pointer need not have a representation
    that has all bits 0. However, an integer expression that is not a
    constant expression, if converted to a pointer type, is not required to
    be a null pointer - it could convert to an entirely different pointer value.

    So an implementation could allow it simply by reserving a pointer to
    some other location (such as the last position in memory) as the
    representation of a null pointer.

    How do I?

    Even on an implementation that uses a pointer representing a machine
    address of 0 as a null pointer, such code can still work. In the C
    standard, "undefined behavior" means that the C standard imposes no requirements on the behavior. That doesn't prohibit other sources from
    imposing requirements. On such a system, it could define the behavior as accessing the flash.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Paul J. Lucas@paul@lucasmail.org to comp.lang.c on Sat Jan 3 17:10:22 2026
    From Newsgroup: comp.lang.c

    On 1/1/26 1:54 PM, highcrew wrote:

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This particular example is explained in several places, e.g.:

    https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a

    - Paul
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Sat Jan 3 17:24:54 2026
    From Newsgroup: comp.lang.c

    On Sat 1/3/2026 3:25 PM, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems.  Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Well, the first question would be: what is the physical null pointer representation in that C implementation on that embedded system?

    The null pointer in C is represented by the integer constant `0` at
    source code level only (!). The actual physical representation in the
    compiled code is not necessarily "address 0", contrary to popular
    misguided belief. It can be anything. It is typically supposed to be
    chosen as some appropriate "invalid address value" on the given platform.

    The compiler on that embedded system is, of course, aware of the fact
    that address 0x00000000 is perfectly valid and should be left
    accessible. So, for that reason, the compiler is supposed to choose some
    other physical representation for null pointers, like, say, address
    0xFFFFFFFF (just for one example). So, every time you write something like

    int *p = 0;

    the compiler will emit code that stores `0xFFFFFFFF` into `p`.

    In that implementation you will have no problem accessing address
    0x00000000. No UB. No problem.

    But if even under such circumstances the compiler decided to use address
    0x00000000 for physically representing null pointers (say, for some
    other important reasons)... well, then I guess the compiler will have no
    other choice but to extend the formal language specification and
    postulate that null pointer access is well-defined. There will be no
    optimizations based on UB associated with null pointer access. At least
    in some circumstances. That all would become implementation-defined, of
    course.
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 02:19:50 2026
    From Newsgroup: comp.lang.c

    On Sat, 3 Jan 2026 17:24:54 -0800, Andrey Tarasevich wrote:

    The compiler on that embedded system is, of course, aware of the
    fact that address 0x00000000 is perfectly valid and should be left accessible. So, for that reason, the compiler is supposed to choose
    some other physical representation for null pointers ...

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra "invalid" value on top of that?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sat Jan 3 21:31:20 2026
    From Newsgroup: comp.lang.c

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:
    On Sat, 3 Jan 2026 17:24:54 -0800, Andrey Tarasevich wrote:

    The compiler on that embedded system is, of course, aware of the
    fact that address 0x00000000 is perfectly valid and should be left
    accessible. So, for that reason, the compiler is supposed to choose
    some other physical representation for null pointers ...

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is not
    available to user code. Note, in particular, that it might be a piece
    of memory used by the implementation of C, or by the operating system.
    In which case, the undefined behavior that can occur as a result of
    dereferencing a null pointer would take the form of messing up the C
    runtime or the operating system.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Sat Jan 3 18:44:02 2026
    From Newsgroup: comp.lang.c

    On Sat 1/3/2026 5:24 PM, Andrey Tarasevich wrote:
    On Sat 1/3/2026 3:25 PM, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems.  Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Well, the first question would be: what is the physical null pointer
    representation in that C implementation on that embedded system?
    ...

    Although, on second thought, what I said above, while correct, is
    hardly relevant to the matter of using UB for optimizations.

    UB-based optimizations rely on static analysis of the code during
    compilation. At that stage the platform-specific physical
    representation of null pointers plays no role at all. The only thing
    that matters is the ability of the compiler to identify and track
    _logical_ null pointers through the program. E.g. for the compiler

    int *p = 0;

    is always a null pointer. And

    if (p != 0)

    always checks pointer `p` for being null. The actual physical
    representation of `p` does not come into the picture at all.

    The key point here is that only a compile-time zero (i.e. an integral
    constant expression with value zero) can be interpreted as a null
    pointer. A run-time zero cannot be.

    And the only issue that remains is your original request "I want to
    assign a pointer to 0x00000000 and dereference it to read the first
    word". Well, firstly, the language does not offer you any
    standard-defined features for accessing specific addresses. But in
    real-life it is usually done through explicitly converting an integer
    address to a pointer type. Since

    int *p = 0;

    has a reserved meaning and will not generally work as intended, one
    possible workaround would be

    uintptr_t a = 0;
    int *p = (int *) a;

    In the above case `p` will not be seen by the compiler as a logical
    null pointer.

    This is actually covered by the FAQ: https://c-faq.com/null/accessloc0.html
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 04:52:24 2026
    From Newsgroup: comp.lang.c

    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be a
    piece of memory used by the implementation of C, or by the operating
    system. In which case, the undefined behavior that can occur as a
    result of dereferencing a null pointer would take the form of messing
    up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 12:51:25 2026
    From Newsgroup: comp.lang.c

    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    This particular example is explained in several places, e.g.:

    https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a

    - Paul

    Hey, thanks for the pointers.
    I found the second a really good write up!
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Jan 4 12:58:03 2026
    From Newsgroup: comp.lang.c

    On 03/01/2026 23:47, highcrew wrote:
    On 1/2/26 11:52 PM, Kaz Kylheku wrote:
    On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
    For the situation in your program, it would be unacceptable to have
    implementations stop translating.

    I can somehow get the idea that it is difficult for the compiler
    to spot the issue, but why do you think it would be unacceptable
    to stop translating?


    A C compiler definitely should /not/ stop translating just because it
    finds UB like this - at least, not with "normal" compilation flags.
    (With additional flags, anything is allowable.)

    Run-time UB is only a problem if the running program attempts to execute
    it. So it is only really appropriate for it to be treated as a fatal compile-time error if the compiler knows for sure that it will be
    executed (i.e., it can trace all execution paths from the start of
    "main" and see that it is inevitably executed). That is clearly
    infeasible for the vast majority of such run-time UB.

    It is entirely normal that code is full of potential run-time UB:

    extern int xs[10];
    int foo(int i) { return xs[i]; }

    The function "foo" has potential UB - but the compiler should not stop translating just because you /might/ call it with an inappropriate argument.

    Functions that have unavoidable UB, such as the example in this thread,
    are not guaranteed to be called - the compiler cannot reasonably refuse
    to continue compiling. But it is also fair to say unavoidable UB in a function is almost certainly a mistake by the programmer, and a warning message (even without specifying any warning flags) would be a very nice
    thing for the compiler to give you.

    The C standard requires a diagnostic - a warning or fatal error
    message - for certain mistakes. It only does that for things that a
    compiler could reasonably be expected to identify, without having to
    simulate run-time conditions, or consider multiple translation units
    at once.



    We really want just a warning (at
    least by default; in specific projects and situations, developers
    could elect to treat certain warnings as fatal, even standard-required
    warnings.)

    Even a warning would be enough though.  Btw, my typical way of
    working is to enable -Werror while developing, but I don't like
    to force it in general.  That would be an interesting digression,
    but definitely OT.


    (I too have "-Werror" enabled, at least once my initial builds are
    somewhat solidified - it means you can't lose an important warning
    message somewhere in the output of your build process.)

    The second new thing is that to diagnose this, we need to make
    diagnosis dependent on reachability.

    We want a rule which is something like "whenever the body of
    a function, or an initializing expression for an external definition
    reaches an expression which has unconditional undefined behavior
    that is not an unreachability assertion and not a documented
    extension, a warning diagnostic must be issued".

    That's an interesting perspective: reachability.
    Would you say that the incriminated piece of code is UB only if it
    is reachable in the final program, therefore it is acceptable
    to keep it as long as unreachable?

    Now that I think of it, the __builtin_unreachable() implemented
    by popular compilers is technically UB *if reached* :)


    "unreachable()" is now standard, in C23. But that may be what Kaz is
    referring to as "an unreachability assertion".

    And AFAIUI gcc and clang/llvm have an "UB" instruction or statement in
    their internal formats, and will use that for "__builtin_unreachable()"
    and also when generating code from your example.

    This kind of diagnostic would be a good thing in my opinion; just
    nobody has stepped up to the plate because of the challenges:

    - introducing the concept of a warning versus error diagnostic.

    - defining a clear set of rules for trivial reachability which
      can catch the majority of these situations without too much
      complexity. (The C++ rules for functions that return a value
      reaching their end without a return statement can be used
      as inspiration here.)

    - specifying exactly what "statically obvious" undefined behavior
      is and how to positively determine that a certain expression
      exhibits it.

    Now I'm wondering how much work it requires to properly define
    the rules that the standard mandates!


    Um, the standard defines the rules - that's the point. So your question
    is really "how much work did it take to write the C standard?". I don't
    think that's what you meant.

    As for me the main take-away is that the detection of certain UB
    is non-trivial; it would be very evil if the standard mandated
    some nearly-impossible task for the compiler!


    The standard is quite lenient on what it requires from C compilers
    (though most don't follow all its rules by default). Static warnings
    are a matter of quality of implementation, not requirements of the
    language. This lets people write relatively small and simple C
    compilers if they want, while also giving big toolchains the freedom to
    add lots more checking and developer help.


    (The C++ rules for functions that return a value
      reaching their end without a return statement can be used
      as inspiration here.)

    C++ does *what*?? I'm definitely not up to speed with C++, but
    I have totally missed that.  Could you please tell me the name
    of this bizarre feature? I *need* to look it up :D


    I believe the difference is in the behaviour of a function that is
    declared to return a value (i.e., not "void") but which exits without
    returning a value. In C, this is allowed - but it is UB to attempt to
    use the non-existent return value. In C++, it is UB to fail to return
    a value - which is far easier for a compiler to diagnose.

    So if you have :

    int foo(void) { }

    int bar(void) { return foo(); }

    then in C++, the UB is in the definition of "foo", while in C it is in
    the run-time use of "foo" inside "bar".


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 14:38:00 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 11:56 PM, Kaz Kylheku wrote:
    You literally cannot write a test case which tests for the "return 0",
    such that the test case has well-defined behavior.

    All well-defined test cases can only test for 1 being returned.

    And that is satisfied by machine code which unconditionally returns 1.

    I appreciate the nuance, or at least think I understand what you are
    saying here. A test that aims at spotting UB is necessarily using
    UB-tainted code, so it might even pass against all odds.

    Then it looks to me like this is one of those situations where practice
    beats theory. Unit testing, sanitizers, fuzzers... these tools will
    reveal the defect with a very high likelihood.
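
    For instance (a sketch, assuming a gcc or clang toolchain with UBSan
    available as `cc`), the thread's original off-by-one is flagged at run
    time even though compilation stays silent:

```shell
# Write the thread's example to a file and run it under UBSan.
cat > oob.c <<'EOF'
int table[4] = {0};
int exists_in_table(int v) {
    for (int i = 0; i <= 4; i++)      /* off-by-one: table[4] is OOB */
        if (table[i] == v) return 1;
    return 0;
}
int main(void) { return exists_in_table(5); }
EOF
cc -O0 -fsanitize=undefined -fno-sanitize-recover=all oob.c -o oob
./oob || echo "UBSan caught the out-of-bounds access"
```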

    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate, but in practical
    terms, if you expect it to take less than 1 second and it takes more
    than 10, you are already hitting ^C and conjecturing that something
    went horribly wrong :D
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sun Jan 4 15:51:48 2026
    From Newsgroup: comp.lang.c

    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 2026-01-03 18:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    Actually, that's not necessarily true. A null pointer is not required
    to refer to the location with an address of 0. An integer constant
    expression with a value of 0, converted to a pointer type, is
    guaranteed to be a null pointer, but that pointer need not have a
    representation that has all bits 0. However, an integer expression
    that is not a constant expression, if converted to a pointer type, is
    not required to be a null pointer - it could convert to an entirely
    different pointer value.

    So an implementation could allow it simply by reserving a pointer to
    some other location (such as the last position in memory) as the
    representation of a null pointer.

    How do I?

    Even on an implementation that uses a pointer representing a machine
    address of 0 as a null pointer, such code can still work. In the C
    standard, "undefined behavior" means that the C standard imposes no
    requirements on the behavior. That doesn't prohibit other sources from
    imposing requirements. On such a system, it could define the behavior
    as accessing the flash.

    Indeed, every C compiler I've ever used has simply dereferenced a
    pointer that has a value of zero. In user mode, the kernel will
    generally trap and generate a SIGSEGV or equivalent. In kernel
    mode, it will just work, assuming that the CPU is configured to
    run with MMU disabled (or the MMU has a valid mapping for virtual
    address zero).

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sun Jan 4 15:56:20 2026
    From Newsgroup: comp.lang.c

    Lawrence D’Oliveiro <ldo@nz.invalid> writes:
    On Sat, 3 Jan 2026 17:24:54 -0800, Andrey Tarasevich wrote:

    The compiler on that embedded system is, of course, aware of the
    fact that address 0x00000000 is perfectly valid and should be left
    accessible. So, for that reason, the compiler is supposed to choose
    some other physical representation for null pointers ...

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra “invalid” value on top of that?

    In the Burroughs medium systems line, which is (was) a BCD machine addressed
    to the nibble, the bit pattern for a NULL pointer included 'undigits'
    (invalid BCD digits 0b1010 through 0b1111). Specifically, @CxEEEEEE@
    was the bit pattern for a NULL pointer on that architecture.

    In 40 years of OS, Hypervisor and firmware programming, I've never seen
    a C compiler that didn't dereference a NULL pointer when asked to on
    any modern CPU (x86, 88100, SPARC, MIPS, arm32, arm64).
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Jan 4 17:16:22 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 00:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems.  Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Now I guess that an embedded compiler targeting that certain
    architecture where dereferencing 0 makes sense will not treat
    it as UB.  But it is for sure a weird corner case.


    There are some common misconceptions about null pointers in C. A
    "null pointer" is the result of converting a "null pointer constant",
    or another "null pointer", to a pointer type. A null pointer constant
    is either an integer constant expression with the value 0 (such as the
    constant 0, or "1 - 1"), or "nullptr" in C23. You can use "NULL" from
    <stddef.h> as a null pointer constant.

    So if you write "int * p = 0;", then "p" holds a null pointer. If you
    write "int * p = (int *) sizeof(*p); p--;" then "p" does not hold a null pointer, even though it will hold the value "0".

    On virtually all real-world systems, including all embedded systems I
    have ever known (and that's quite a few), null pointers correspond to
    the address 0. But that does not mean that dereferencing a pointer
    whose value is 0 is necessarily UB.

    And even when dereferencing a pointer /is/ UB, a compiler can handle it
    as defined if it wants.

    I think that if you have a microcontroller with code at address 0, and a pointer of some object type (say, "const uint8_t * p" or "const uint32_t
    * p") holding the address 0, then using that to read the flash at that
    address is UB. But it is not UB because "p" holds a null pointer - it
    may or may not be a null pointer. It is UB because "p" does not point
    to an object.

    In practice, I have never seen an embedded compiler fail to do the
    expected thing when reading flash from address 0. (Typical use-cases
    are for doing CRC checks or signature checks on code, or for reading the initial stack pointer value or reset vector of the code.) If you want
    to be more confident, use a pointer to volatile type.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sun Jan 4 13:00:02 2026
    From Newsgroup: comp.lang.c

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be a
    piece of memory used by the implementation of C, or by the operating
    system. In which case, the undefined behavior that can occur as a
    result of dereferencing a null pointer would take the form of messing
    up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur when
    code with undefined behavior is executed is that it does exactly what
    you incorrectly believe it is required to do. As a result, you fail to
    be warned of the error in your beliefs.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 21:22:57 2026
    From Newsgroup: comp.lang.c

    On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:

    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be
    a piece of memory used by the implementation of C, or by the
    operating system. In which case, the undefined behavior that can
    occur as a result of dereferencing a null pointer would take the
    form of messing up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur
    when code with undefined behavior is executed is that it does
    exactly what you incorrectly believe it is required to do. As a
    result, you fail to be warned of the error in your beliefs.

    In this case, it’s not clear what choice you have.

    Call it a C language limitation ...
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 21:42:13 2026
    From Newsgroup: comp.lang.c

    On Sun, 4 Jan 2026 14:38:00 +0100, highcrew wrote:

    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate, but in
    practical terms, if you expect it to take less than 1 second and it
    takes more than 10, you are already hitting ^C and conjecturing
    that something went horribly wrong :D

    What do Windows users hit instead of CTRL/C? Because CTRL/C means
    something different to them, doesn’t it?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sun Jan 4 16:53:16 2026
    From Newsgroup: comp.lang.c

    On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:
    On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    ...
    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur
    when code with undefined behavior is executed is that it does
    exactly what you incorrectly believe it is required to do. As a
    result, you fail to be warned of the error in your beliefs.

    In this case, it’s not clear what choice you have.

    I may have lost the thread here - which choice are you talking about?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sun Jan 4 16:58:40 2026
    From Newsgroup: comp.lang.c

    On 2026-01-04 08:38, highcrew wrote:
    ...
    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate,

    That's an incorrect characterization of the halting problem. There are
    many programs where it's entirely feasible, and even easy, to
    determine whether they will halt. What has been proven is that there
    must be some programs for which it cannot be done.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Jan 5 00:16:06 2026
    From Newsgroup: comp.lang.c

    On Sun, 4 Jan 2026 16:53:16 -0500, James Kuyper wrote:

    On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:

    On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    ...
    “Undefined behaviour” could also include “performing a valid
    memory access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur
    when code with undefined behavior is executed is that it does
    exactly what you incorrectly believe it is required to do. As a
    result, you fail to be warned of the error in your beliefs.

    In this case, it’s not clear what choice you have.

    I may have lost the thread here - which choice are you talking
    about?

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra “invalid” value on top of that?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 08:49:14 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 22:58, James Kuyper wrote:
    On 2026-01-04 08:38, highcrew wrote:
    ...
    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate,

    That's an incorrect characterization of the halting problem. There are
    many programs where it's entirely feasible, and even easy, to
    determine whether they will halt. What has been proven is that there
    must be some programs for which it cannot be done.

    That is also imprecise. The halting problem is about proving that
    there is no /single/ algorithm (or equivalently, program) that can
    determine the halting status of /all/ programs. It is not about the
    existence of a program whose halting status cannot be determined - it
    is that for any systematic method you might use to determine the
    halting status of programs, there are always programs for which that
    method won't work.

    In the context of static error checking for runtime UB, this means that
    no matter how smart a static analyser is, you can always write a program
    with runtime UB that the analyser won't identify for you. You can then
    extend that analyser to cover this new case, but no matter how great you
    make your analyser, there will always be programs with UB that it can't identify.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 08:55:54 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 16:51, Scott Lurndal wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 2026-01-03 18:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    Actually, that's not necessarily true. A null pointer is not required to
    refer to the location with an address of 0. An integer constant
    expression with a value of 0, converted to a pointer type, is guaranteed
    to be a null pointer, but that pointer need not have a representation
    that has all bits 0. However, an integer expression that is not a
    constant expression, if converted to a pointer type, is not required to
    be a null pointer - it could convert to an entirely different pointer
    value.
    So an implementation could allow it simply by reserving a pointer to
    some other location (such as the last position in memory) as the
    representation of a null pointer.

    How do I?

    Even on an implementation that uses a pointer representing a machine
    address of 0 as a null pointer, such code can still work. In the C
    standard, "undefined behavior" means that the C standard imposes no
    requirements on the behavior. That doesn't prohibit other sources from
    imposing requirements. On such a system, it could define the behavior as
    accessing the flash.

    Indeed, every C compiler I've ever used has simply dereferenced a
    pointer that has a value of zero. In user mode, the kernel will
    generally trap and generate a SIGSEGV or equivalent. In kernel
    mode, it will just work, assuming that the CPU is configured to
    run with MMU disabled (or the MMU has a valid mapping for virtual
    address zero).


    The context (embedded systems with flash at address 0) implies you don't
    have signals, an MMU, or other "big OS" features. While embedded
    systems over a certain size usually have some kind of memory protection
    unit, and interrupts/traps/exceptions for address or bus errors, you can
    be very confident that these will not trigger on attempts to read from
    address 0 if that is part of the normal code address area - the
    protection systems are not that fine-grained. (You might, while trying
    to catch a bad pointer bug, put a read watchpoint at address 0 in your debugger.)


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 09:07:16 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 19:00, James Kuyper wrote:
    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be a
    piece of memory used by the implementation of C, or by the operating
    system. In which case, the undefined behavior that can occur as a
    result of dereferencing a null pointer would take the form of messing
    up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur when
    code with undefined behavior is executed is that it does exactly what
    you incorrectly believe it is required to do. As a result, you fail to
    be warned of the error in your beliefs.

    I don't think that is the most dangerous thing that could happen with
    UB. Code that works as you expected during testing but fails after
    deployment is much worse. If the UB always results in the effect you
    intended, then the generated object code is correct for the task -
    even if the source code is unknowingly non-portable.

    And sometimes - especially in low-level embedded programming - getting
    the effect you want with the efficiency you want means knowingly writing
    code that has UB as far as C is concerned, but which results in the
    desired object code. Such code is inherently non-portable, but so is a
    lot of low-level embedded code. And you need to check the generated
    object code carefully, document it well, comment it well, and add any compile-time checks you can for compiler versions and other protection
    against someone re-using the code later without due consideration.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 5 06:41:13 2026
    From Newsgroup: comp.lang.c

    On 2026-01-04 19:16, Lawrence D’Oliveiro wrote:
    On Sun, 4 Jan 2026 16:53:16 -0500, James Kuyper wrote:

    On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:
    ...
    In this case, it’s not clear what choice you have.

    I may have lost the thread here - which choice are you talking
    about?

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    An implementation of C (keep in mind that the implementation includes
    the compiler, the linker, and the C standard library) can use any
    location it wants for a null pointer, just so long as it makes sure
    that no C object accessible to the user is stored in that location. No
    user-defined object should be allocated in that location, and no
    standard library function (such as malloc() or asctime()) may return
    a pointer to that location. If memory is tight, the implementation may
    use that location to store anything that is never supposed to be
    accessible to the user.
    Alternatively, pointers can be larger than needed to store just a
    machine address, and at least one bit of the extra space can be reserved
    to identify the pointer as null.
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 15:39:28 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 12:51, highcrew wrote:
    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    This particular example is explained in several places, e.g.:

    https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633


    At a cursory read, that article looks okay. The lesson to learn is
    "look before you leap" - don't use data if you are not sure it is valid,
    and certainly don't add new uses of the data (such as debug prints) just before validity checks!

    It does, however, perpetuate the myth that there is a clear distinction between "classical compilers" or "non-optimising compilers" and
    "optimising compilers". That is not true - for any two standards-conforming
    compilers (or selections of flags for the same compiler), the
    same source code is equally defined or undefined. Source code with UB
    has UB whether it is "optimised" or not, though the colour of the
    resulting nasal daemons may vary.


    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a


    That one starts off with a bit of a jumble of misconceptions.


    To start with, "undefined behaviour" does not exist because of
    compatibility issues or the merging of different C variations into one standard C. It is a fundamental principle in programming because many computing functions are, mathematically, partial functions - they can
    only give a sensible defined result for some inputs. While it can
    sometimes be possible to verify the validity of inputs, it is often
    infeasible or at least very costly, especially in non-managed (compiled) languages. Pointer dereference, for example, only has defined behaviour
    if the pointer points to a valid object - otherwise the result is
    meaningless (even if some assembly code can be generated). Garbage in, garbage out - see the Babbage quotation.
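The partial-function point can be made concrete with integer division, which C defines only for a nonzero divisor and a representable quotient. A minimal caller-side check (a sketch; the function name is mine, not from the thread):

```c
#include <limits.h>
#include <stdbool.h>

/* Division is a partial function in C: the quotient is defined only
   when the divisor is nonzero and the result is representable
   (INT_MIN / -1 overflows on two's-complement machines). */
bool checked_div(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;   /* outside the defined domain: report, don't divide */
    *out = a / b;
    return true;
}
```

The check costs a branch per call, which is exactly the overhead C chooses not to impose by default.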

    The C standard is simply somewhat unusual in that it is more explicit
    about UB than many languages' documentation. And being a language
    intended for maximally efficient code, C leaves a number of things as UB
    where other languages might throw an exception or have other error handling.

    The definition given for "implementation defined behaviour" and
    "unspecified behaviour" is poor. (IMHO the comp.lang.c FAQ is
    inaccurate here.) In particular, "unspecified behaviour" does not need
    to be consistent. For example, the order of evaluation of function
    arguments is unspecified, and can be done in different orders at
    different call sites - even in identical source code. It can even be re-ordered between different invocations of the same code - perhaps due
    to complicated inter-procedural optimisations, inlining, code cloning,
    and constant propagation.

    It then goes on to say that the order of evaluation of the operands of
    "+" is implementation defined, when it is in fact a good example of unspecified behaviour that is /not/ implementation defined.

    Implementation defined behaviour is /not/ "bad" - pretty much all
    programs rely on implementation-defined behaviour such as the size of
    "int", character sets used, etc. Relying on implementation-defined
    behaviour reduces the portability of code, but that is not necessarily a
    bad thing.

    And while it is true that UB is "worse" than either
    implementation-defined behaviour or unspecified behaviour, it is not for either of the reasons given. The *nix program "date" does not need to
    contain UB in order to produce different results at different times.


    The examples of UB, and the consequences of them, are better.

    It also makes the mistake common in discussions of UB optimisations of concluding that the optimisation makes the code "wrong". Optimisations,
    such as the example of the "assign_not_null" function, are "logically
    valid" and /correct/ from the given source code. Optimisations have not
    made the code "wrong", nor has the compiler. The source code is correct
    for a given validity subset of its parameter types, and the object code
    is correct for that same subset. If the source code is intended to work
    over a wider range of inputs, then it is the source code that is wrong -
    not the optimiser or the optimised code.


    - Paul

    Hey, thanks for the pointers.
    I found the second a really good write up!


    I've seen worse, but it could be better.

  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Jan 6 13:08:57 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> wrote:
    On 04/01/2026 00:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Now I guess that an embedded compiler targeting that certain
    architecture where dereferencing 0 makes sense will not treat
    it as UB. But it is for sure a weird corner case.


    There are some common misconceptions about null pointers in C. A "null pointer" is the result of converting a "null pointer constant", or
    another "null pointer", to a pointer type. A null pointer constant is either an integer constant expression with the value 0 (such as the
    constant 0, or "1 - 1"), or "nullptr" in C23. You can use "NULL" from <stddef.h> as a null pointer constant.

    So if you write "int * p = 0;", then "p" holds a null pointer. If you
    write "int * p = (int *) sizeof(*p); p--;" then "p" does not hold a null pointer, even though it will hold the value "0".

    On virtually all real-world systems, including all embedded systems I
    have ever known (and that's quite a few), null pointers correspond to
    the address 0. But that does not mean that dereferencing a pointer
    whose value is 0 is necessarily UB.

    And even when dereferencing a pointer /is/ UB, a compiler can handle it
    as defined if it wants.

    I think that if you have a microcontroller with code at address 0, and a pointer of some object type (say, "const uint8_t * p" or "const uint32_t
    * p") holding the address 0, then using that to read the flash at that address is UB. But it is not UB because "p" holds a null pointer - it
    may or may not be a null pointer. It is UB because "p" does not point
    to an object.

    In practice, I have never seen an embedded compiler fail to do the
    expected thing when reading flash from address 0. (Typical use-cases
    are for doing CRC checks or signature checks on code, or for reading the initial stack pointer value or reset vector of the code.) If you want
    to be more confident, use a pointer to volatile type.

    For curiosity I tried the following:

    #include <stdint.h>

    uint32_t
    read_at0(uint32_t * p) {
        if (!p) {
            return *p;   /* deliberate read through a null pointer */
        } else {
            return 0;
        }
    }

    that is, we read through a pointer only when it is a null pointer.
    Using gcc-12 with command line:

    arm-none-eabi-gcc -O3 -fverbose-asm -fno-builtin -Wall -g -mthumb -mcpu=cortex-m3 -c ts_null.c

    I get the following assembly:

    00000000 <read_at0>:
    0: b108 cbz r0, 6 <read_at0+0x6>
    2: 2000 movs r0, #0
    4: 4770 bx lr
    6: 6803 ldr r3, [r0, #0]
    8: deff udf #255 @ 0xff
    a: bf00 nop

    So the compiler generates an actual access, but then, instead of
    returning the value, it executes an undefined opcode. Without the
    test for a null pointer I get a simple access to memory.

    So at least with gcc the access works as long as the compiler does
    not know that it is accessing a null pointer. But if the compiler
    can infer that the pointer is null, the generated code may do
    strange things.

    Putting a volatile qualifier on p gives working code, but apparently
    disables optimization. Also, this looks fragile. So if I needed
    to access address 0, I would probably use an assembly routine to do it.
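For completeness, a sketch of the idiom many embedded code bases use instead of inline assembly: routing the address through uintptr_t and a volatile-qualified pointer. This relies on implementation-defined integer-to-pointer conversion (C11 6.3.2.3), not on behaviour the standard guarantees, and the helper name is mine:

```c
#include <stdint.h>

/* Read a 32-bit word from an arbitrary machine address.
   The integer-to-pointer conversion is implementation-defined,
   and the volatile qualifier forces an actual load instruction. */
static inline uint32_t read_word_at(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```

On the microcontroller discussed above one would call read_word_at(0x00000000u); on a hosted system the helper can only be exercised on addresses of real objects.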
    --
    Waldek Hebisch
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Jan 6 21:59:56 2026
    From Newsgroup: comp.lang.c

    On Tue, 6 Jan 2026 13:08:57 -0000 (UTC), Waldek Hebisch wrote:

    Putting a volatile qualifier on p gives working code, but apparently
    disables optimization. Also, this looks fragile. So if I needed
    to access address 0, I would probably use an assembly routine to do it.

    Seems to be a fundamental C language limitation, wouldn't you say?
  • From Paul J. Lucas@paul@lucasmail.org to comp.lang.c on Tue Jan 6 18:08:22 2026
    From Newsgroup: comp.lang.c

    On 1/5/26 6:39 AM, David Brown wrote:
    On 04/01/2026 12:51, highcrew wrote:
    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-
    f30844f20e2a

    That one starts off with a bit of a jumble of misconceptions.


    To start with, "undefined behaviour" does not exist because of
    compatibility issues or the merging of different C variations into one standard C.

    ...

    The C standard is simply somewhat unusual in that it is more explicit
    about UB than many languages' documentation. And being a language
    intended for maximally efficient code, C leaves a number of things as UB where other languages might throw an exception or have other error
    handling.

    Other languages had the luxury of doing that. As the article pointed
    out, C had existed for over a decade before the standard and there were
    many programs in the wild that relied on their existing behaviors. By
    this time, the C standard could not retroactively "throw an exception or
    have other error handling" since it would have broken those programs, so
    it _had_ to leave many things as UB explicitly. Hence, the article
    isn't wrong.

    Implementation defined behaviour is /not/ "bad" - pretty much all
    programs rely on implementation-defined behaviour such as the size of
    "int", character sets used, etc. Relying on implementation-defined behaviour reduces the portability of code, but that is not necessarily a
    bad thing.

    It's "bad" if a naive programmer isn't aware it's implementation defined
    and just assumes it's defined however it's defined on his machine.

    And while it is true that UB is "worse" than either implementation-
    defined behaviour or unspecified behaviour, it is not for either of the reasons given. The *nix program "date" does not need to contain UB in order to produce different results at different times.

    Sure, but the article didn't mean such cases. It meant cases like incrementing a signed integer past INT_MAX. A program could
    legitimately give different answers for the same line of code at
    different times.
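A caller can keep the increment case inside defined territory by testing the bound first; a minimal sketch (the helper name is mine):

```c
#include <limits.h>
#include <stdbool.h>

/* Incrementing past INT_MAX is UB; check the bound before incrementing. */
bool checked_inc(int *x)
{
    if (*x == INT_MAX)
        return false;   /* refuse rather than overflow */
    ++*x;
    return true;
}
```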

    It also makes the mistake common in discussions of UB optimisations of
    concluding that the optimisation makes the code "wrong". Optimisations,
    such as the example of the "assign_not_null" function, are "logically
    valid" and /correct/ from the given source code. Optimisations have not
    made the code "wrong", nor has the compiler. The source code is correct
    for a given validity subset of its parameter types, and the object code
    is correct for that same subset. If the source code is intended to work
    over a wider range of inputs, then it is the source code that is wrong -
    not the optimiser or the optimised code.
    What the author meant is that optimization can make UB manifest in more
    bizarre ways than not optimizing would. Code that contains UB
    is always wrong.

    - Paul
  • From candycanearter07@candycanearter07@candycanearter07.nomail.afraid to comp.lang.c on Wed Jan 7 06:40:03 2026
    From Newsgroup: comp.lang.c

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 21:42 this Sunday (GMT):
    On Sun, 4 Jan 2026 14:38:00 +0100, highcrew wrote:

    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate, but in
    practical terms, if you expect it to take less than 1 second and it
    takes more than 10, you are already hitting ^C and conjecturing
    that something went horribly wrong :D

    What do Windows users hit instead of CTRL/C? Because CTRL/C means
    something different to them, doesn't it?


    ctrl-alt-delete?
    --
    user <candycane> is generated from /dev/urandom
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Jan 7 11:25:50 2026
    From Newsgroup: comp.lang.c

    On 07/01/2026 03:08, Paul J. Lucas wrote:
    On 1/5/26 6:39 AM, David Brown wrote:
    On 04/01/2026 12:51, highcrew wrote:
    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-
    f30844f20e2a

    That one starts off with a bit of a jumble of misconceptions.


    To start with, "undefined behaviour" does not exist because of
    compatibility issues or the merging of different C variations into one
    standard C.

    ...

    The C standard is simply somewhat unusual in that it is more explicit
    about UB than many languages' documentation. And being a language
    intended for maximally efficient code, C leaves a number of things as
    UB where other languages might throw an exception or have other error
    handling.

    Other languages had the luxury of doing that. As the article pointed
    out, C had existed for over a decade before the standard and there were
    many programs in the wild that relied on their existing behaviors. By
    this time, the C standard could not retroactively "throw an exception or
    have other error handling" since it would have broken those programs, so
    it _had_ to leave many things as UB explicitly. Hence, the article
    isn't wrong.


    UB as a /concept/ does not exist because of compatibility issues.
    Certain particular things may have been declared UB in C because of compatibility between different existing compilers or different targets (though it is more common for such things to be declared "implementation-defined" rather than UB). I am, however, having
    difficulty finding examples of that for run-time UB. (There are plenty
    of situations where there is UB that could be identified at compile-time
    or link time, but the standard does not require toolchains to diagnose.)

    The idea that something can be expressed in a programming language,
    without errors in syntax, but have no meaningful or correct behaviour,
    is not new, and not restricted to C. UB in C is not different from
    asking for the square root of a negative number in the real domain, or
    asking a kid to add 3 and 4 using the fingers of one hand.


    Implementation defined behaviour is /not/ "bad" - pretty much all
    programs rely on implementation-defined behaviour such as the size of
    "int", character sets used, etc. Relying on implementation-defined
    behaviour reduces the portability of code, but that is not necessarily a
    bad thing.

    It's "bad" if a naive programmer isn't aware it's implementation defined
    and just assumes it's defined however it's defined on his machine.


    Sure. But that applies to all portability issues - people make all
    sorts of assumptions about the system their code will be used on, of
    which the implementation-defined aspects of C are only a small part.

    And while it is true that UB is "worse" than either implementation-
    defined behaviour or unspecified behaviour, it is not for either of
    the reasons given. The *nix program "date" does not need to contain
    UB in order to produce different results at different times.

    Sure, but the article didn't mean such cases.

    If the author meant something different, he/she should have written
    something different.

    It meant cases like
    incrementing a signed integer past INT_MAX. A program could
    legitimately give different answers for the same line of code at
    different times.

    It could also give different answers for unspecified behaviour :

    #include <stdio.h>

    int first(void) { printf("1 "); return 1; }
    int second(void) { printf("2 "); return 2; }

    int main(void) { int x = first() + second(); printf("= %d\n", x); }

    The evaluation order of the operands of the addition - and therefore the
    order of the debug prints - is unspecified. Not only is the order not specified by the C standards, but it is not something that
    needs to be consistent even between different runs of the same code.

    So this "giving different answers" is not something special about UB.


    It also makes the mistake common in discussions of UB optimisations of
    concluding that the optimisation makes the code "wrong".
    Optimisations, such as the example of the "assign_not_null" function,
    are "logically valid" and /correct/ from the given source code.
    Optimisations have not made the code "wrong", nor has the compiler.
    The source code is correct for a given validity subset of its
    parameter types, and the object code is correct for that same subset.
    If the source code is intended to work over a wider range of inputs,
    then it is the source code that is wrong - not the optimiser or the
    optimised code.
    What the author meant is that optimization can make UB manifest in more bizarre ways than not optimizing would. Code that contains UB
    is always wrong.


    If the author meant something different from what he wrote, it would
    have been better if he wrote what he meant.

    Yes, in practice you /can/ get a wider variety of strange results from
    code with UB if you use a highly optimising compiler compared to a
    simple compiler. But there are no guarantees there - you can get
    strange results from UB when not optimising, and perhaps enabling
    optimisation will give you simpler and more consistent results (possibly
    the results you expected, possibly not).

    It is fine to tell people about some of the strange possibilities that
    can occur when you have UB. But anything that even sounds vaguely like
    a suggestion that you can mitigate the dangers of UB by disabling
    optimisation is bad. Far too many C programmers believe that.


  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Wed Jan 7 06:31:31 2026
    From Newsgroup: comp.lang.c

    On 2026-01-06 21:08, Paul J. Lucas wrote:
    ...
    What the author meant is that optimization can make UB manifest in more bizarre ways than not optimizing would. Code that contains UB
    is always wrong.

    "undefined behavior" is defined by the C standard as referring to
    behavior on which "this international standard imposes no requirements".
    It remains UB even if some other document imposes requirements on the
    behavior. In particular, if a given implementation implements an
    extension that gives defined behavior to code that the C standard does
    not, it's still UB, but it's entirely reasonable for users of that implementation to decide they want to use that extension.
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Jan 7 14:10:46 2026
    From Newsgroup: comp.lang.c

    On Tue, 6 Jan 2026 18:08:22 -0800
    "Paul J. Lucas" <paul@lucasmail.org> wrote:


    Other languages had the luxury of doing that. As the article pointed
    out, C had existed for over a decade before the standard and there
    were many programs in the wild that relied on their existing
    behaviors. By this time, the C standard could not retroactively
    "throw an exception or have other error handling" since it would have
    broken those programs, so it _had_ to leave many things as UB
    explicitly. Hence, the article isn't wrong.


    O.T.
    Rust has existed for 13 years without a standard. That did not prevent
    it from becoming more hyped than Ada in her heyday.

    Go has existed without a standard for how long? 20 years?
    But at least in the case of Go there is an official specification that
    is not rewritten every Tuesday.

  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Wed Jan 7 20:48:03 2026
    From Newsgroup: comp.lang.c

    On Tue 1/6/2026 5:08 AM, Waldek Hebisch wrote:

    I get the following assembly:

    00000000 <read_at0>:
    0: b108 cbz r0, 6 <read_at0+0x6>
    2: 2000 movs r0, #0
    4: 4770 bx lr
    6: 6803 ldr r3, [r0, #0]
    8: deff udf #255 @ 0xff
    a: bf00 nop

    So the compiler generates an actual access, but then, instead of
    returning the value, it executes an undefined opcode. Without the
    test for a null pointer I get a simple access to memory.


    When it comes to invalid (or missing, in C++) `return` statements, GCC
    tends to adhere to a "punitive" approach in optimized code - it injects instructions to deliberately cause a crash/segfault in such cases.

    Clang on the other hand tends to stick to the uniform approach based on
    the "UB cannot happen" methodology, i.e. your code sample would be
    translated under the "p is never null" assumption, and the function
    would fold into a simple unconditional `return 0`.
    --
    Best regards,
    Andrey
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Jan 8 23:56:16 2026
    From Newsgroup: comp.lang.c

    On Wed, 7 Jan 2026 20:48:03 -0800, Andrey Tarasevich wrote:

    When it comes to invalid (or missing, in C++) `return` statements,
    GCC tends to adhere to a "punitive" approach in optimized code - it
    injects instructions to deliberately cause a crash/segfault in such
    cases.

    Clang on the other hand tends to stick to the uniform approach based
    on the "UB cannot happen" methodology, i.e. your code sample would
    be translated under "p is never null" assumption, and the function
    will fold into a simple unconditional `return 0`.

    Which one is more likely to lead to unexpected, hard-to-debug results?
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Fri Jan 9 01:42:53 2026
    From Newsgroup: comp.lang.c

    highcrew <high.crew3868@fastmail.com> writes:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    [snip - original post quoted in full]
    Could someone drive me into this reasoning? I know there is a lot of thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!

    The important thing to realize is that the fundamental issue here
    is not a technical question but a social question. In effect what
    you are asking is "why doesn't gcc (or clang, or whatever) do what
    I want or expect?". The answer is different people want or expect
    different things. For some people the behavior described is
    egregiously wrong and must be corrected immediately. For other
    people the compiler is acting just as they think it should,
    nothing to see here, just fix the code and move on to the next
    bug. Different people have different priorities.

    After observing that, I think the right question is something like
    "Given that compilers act in these surprising ways, how should I
    protect my code so that it doesn't fall prey to the death-by-UB
    syndrome, or what can I do to diagnose a possibly death-by-UB
    situation when a strange bug crops up?" I don't pretend to have
    good answers to these questions. The best advice I can give
    (besides seeking help from others with more experience) is to be
    persistent, and to realize that the skills needed for combating a
    death-by-UB syndrome are rather different from the skills needed
    for regular programming. I have been in the situation of being
    made responsible for finding and correcting a death-by-UB kind of
    symptom, and, what's worse, in a programming environment where I
    didn't have a great deal of familiarity or experience. Despite
    those drawbacks the bug got diagnosed and fixed, and I attribute
    that result mostly to tenacity and to being willing to consider
    unusual or unfamiliar points of view.
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Jan 9 14:36:47 2026
    From Newsgroup: comp.lang.c

    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    [snip - original post quoted in full]

    The important thing to realize is that the fundamental issue here
    is not a technical question but a social question. In effect what
    you are asking is "why doesn't gcc (or clang, or whatever) do what
    I want or expect?". The answer is different people want or expect
    different things. For some people the behavior described is
    egregiously wrong and must be corrected immediately. For other
    people the compiler is acting just as they think it should,
    nothing to see here, just fix the code and move on to the next
    bug. Different people have different priorities.


    I have a hard time imagining the sort of people who would object if the
    compiler generated the same code as today, but issued a diagnostic.
    Probably in the same style that it often produces in similar situations:
    warning: array subscript 4 is above array bounds of 'int[4]'
    [-Warray-bounds]

  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Jan 9 15:54:48 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    [snip]
    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
    mov eax, 1
    ret
    table:
    .zero 16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    Personally, I am not shocked by gcc's behavior in this case. Saddened,
    maybe, but not shocked.
    What does shock me is a slightly modified variant of it.

    struct {
        int table[4];
        int other_table[4];
    } bar;

    int exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++) {
            if (bar.table[i] == v)
                return 1;
        }
        return 0;
    }

    The original variant is unlikely to be present in the code bases that
    I care about professionally, but something akin to the modified
    variant could be.
    Godbolt shows that this behaviour was first introduced in gcc 5 and
    backported to the gcc 4 series in gcc 4.8.

    One of my suspect code bases is currently at gcc 4.7. I was
    considering moving to 5.3. In light of that example, I am likely not
    going to do it.
    Unless there is a magic flag that disables this optimization.
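    For what it's worth, gcc does document an option in this area:
    -faggressive-loop-optimizations (enabled by default) is the switch
    that lets the optimizer derive loop trip counts from the assumption
    that no iteration invokes UB, and it has a negative form. Whether the
    negative form suppresses this exact transformation on any given gcc
    release is something to verify against that release; a
    build-configuration sketch, with the file name purely illustrative:

```shell
# Candidate "magic flag" (verify against your gcc version's docs and the
# emitted assembly): disable UB-based loop trip-count deduction.
gcc -O2 -fno-aggressive-loop-optimizations -S -o exists_in_table.s exists_in_table.c
```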

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From wij@wyniijj5@gmail.com to comp.lang.c on Sat Jan 10 00:08:24 2026
    From Newsgroup: comp.lang.c

    On Fri, 2026-01-09 at 15:54 +0200, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    <snip>

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    <snip>

    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.
    It is UB; what the implementation does is irrelevant.
    The for loop above is equivalent to:

    for (int i = 0; i <= 3; i++) {
        if (table[i] == v) return 1;
    }
    if (table[4] == v) {  // undefined behavior
        return 1;
    }
    // undefined behavior

    So always returning 1 is a correct compilation: every execution that
    would return 0 has to read table[4] first, which is UB, so there is
    no defined execution in which exists_in_table(v) returns anything
    other than 1.
    <snip>

    Personally, I am not shocked by gcc behavior in this case. May be,
    saddened, but not shocked.
    I am shocked by slightly modified variant of it.

    <snip>
    I am also shocked that many seemingly missed it.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.c on Fri Jan 9 20:14:04 2026
    From Newsgroup: comp.lang.c

    On 2026-01-09, Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    highcrew <high.crew3868@fastmail.com> writes:

    <snip>

    The important thing to realize is that the fundamental issue here
    is not a technical question but a social question. In effect what
    you are asking is "why doesn't gcc (or clang, or whatever) do what
    I want or expect?". The answer is different people want or expect
    different things. For some people the behavior described is
    egregiously wrong and must be corrected immediately. For other
    people the compiler is acting just as they think it should,
    nothing to see here, just fix the code and move on to the next
    bug. Different people have different priorities.


    I have hard time imagining sort of people that would have objections in
    case compiler generates the same code as today, but issues diagnostic.

    If false positives for the diagnostic occur frequently, there will
    be legitimate complaints.

    If there is a simple switch for it, the diagnostic will get turned
    off, and then it no longer serves its purpose of catching errors.

    There are all kinds of optimizations compilers commonly do that could
    also indicate erroneous situations. For instance, eliminating dead
    code.

    // code portable among several types of systems:

    switch (sizeof var) {
    case 2: ...
    case 4: ...
    case 8: ...
    }

    sizeof var is a compile-time constant, expected to be 2, 4 or 8.
    The other cases are unreachable code.

    Suppose every time the compiler eliminates unreachable code, it
    issues a diagnostic "foo.c:42: 3 lines of unreachable code removed".

    That would be annoying when the programmer knows about dead code
    elimination and is counting on it.

    We also have to consider that not all code is written directly by hand.

    Code generation techniques (including macros) can produce "weird" code
    in some of their corner cases. The code is correct, and it would take
    more complexity to identify those cases and generate more idiomatic
    code; it is left to the compiler to clean up.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Jan 10 18:19:05 2026
    From Newsgroup: comp.lang.c

    On Fri, 9 Jan 2026 20:14:04 -0000 (UTC)
    Kaz Kylheku <046-301-5902@kylheku.com> wrote:

    <snip>



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Jan 10 18:41:06 2026
    From Newsgroup: comp.lang.c

    On Fri, 9 Jan 2026 20:14:04 -0000 (UTC)
    Kaz Kylheku <046-301-5902@kylheku.com> wrote:

    On 2026-01-09, Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    <snip>


    I have hard time imagining sort of people that would have
    objections in case compiler generates the same code as today, but
    issues diagnostic.

    If false positives occur for the diagnostic frequently, there
    will be legitimate complaint.

    If there is only a simple switch for it, it will get turned off
    and then it no longer serves its purpose of catching errors.

    There are all kinds of optimizations compilers commonly do that could
    also be erroneous situations. For instance, eliminating dead code.


    <snip>

    I am not talking about some general abstraction, but about a specific
    case.
    Your example is irrelevant.
    -Warray-bounds has existed for a long time.
    -Warray-bounds=1 is part of the -Wall set.
    The message 'array subscript nnn is above array bounds' fits this
    particular case as well as any other case in which the compiler does
    not forget to issue it.
    Defending gcc's behavior of not issuing an enabled warning in a
    situation where the compiler has certainly detected the out-of-bounds
    access sounds like Stockholm syndrome.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.c on Sat Jan 10 17:08:43 2026
    From Newsgroup: comp.lang.c

    On 09/01/2026 20:14, Kaz Kylheku wrote:
    If there is only a simple switch for it, it will get turned off
    and then it no longer serves its purpose of catching errors.


    But it might still serve its purpose for assigning criminal liability.
    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jan 11 11:48:08 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    highcrew <high.crew3868@fastmail.com> writes:

    <snip>

    I have hard time imagining sort of people that would have objections
    in case compiler generates the same code as today, but issues
    diagnostic.

    It depends on what the tradeoffs are. For example, given a
    choice, I would rather have an option to prevent this particular
    death-by-UB optimization than an option to issue a diagnostic.
    Having both costs more effort than having just one.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Jan 11 22:52:56 2026
    From Newsgroup: comp.lang.c

    On Sun, 11 Jan 2026 11:48:08 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    highcrew <high.crew3868@fastmail.com> writes:

    <snip>

    I have hard time imagining sort of people that would have objections
    in case compiler generates the same code as today, but issues
    diagnostic.

    It depends on what the tradeoffs are. For example, given a
    choice, I would rather have an option to prevent this particular
    death-by-UB optimization than an option to issue a diagnostic.
    Having both costs more effort than having just one.

    Me too.
    But there are limits to what is considered negotiable by worshippers
    of nasal demons and what is beyond that. A warning is negotiable;
    turning off the transformation is most likely beyond.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sun Jan 11 22:53:53 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:
    [...]
    But there are limits to what considered negotiable by worshippers of
    nasal demons and what is beyond that. Warning is negotiable, turning
    off the transformation is most likely beyond.

    Your use of the word "worshippers" suggests a misunderstanding on
    your part.

    I certainly do not "worship" anything about C. I don't think
    anyone else you've been talking to does either. I have a pretty
    good understanding of it. There are plenty of things I don't
    particularly like.

    In the vast majority of my posts here, I simply try to explain what
    the standard actually says and offer advice based on that.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 11:44:43 2026
    From Newsgroup: comp.lang.c

    On Sun, 11 Jan 2026 22:53:53 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:
    [...]
    But there are limits to what considered negotiable by worshippers of
    nasal demons and what is beyond that. Warning is negotiable, turning
    off the transformation is most likely beyond.

    Your use of the word "worshippers" suggests a misunderstanding on
    your part.

    I certainly do not "worship" anything about C. I don't think
    anyone else you've been talking to does either. I have a pretty
    good understanding of it. There are plenty of things I don't
    particularly like.

    In the vast majority of my posts here, I simply try to explain what
    the standard actually says and offer advice based on that.


    About my personal vocabulary.

    Normally, the phrase "worshippers of nasal demons" in my posts refers
    to a faction among the developers and maintainers of the gcc and
    clang compilers. I think that it's not an unusual use of the phrase,
    but I could be wrong about that.

    AFAIK, you are not a gcc or clang maintainer, so not a "worshipper".
    When I want to characterize [in derogatory fashion] people who have
    no direct influence on the behavior of common software tools, but
    share the attitude of "worshippers" toward UBs, I use the phrase
    'language lawyers'.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 16:28:57 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    <snip>


    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is
    UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at
    influencing other people, so I cannot get what I want.
    [/O.T.]

    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well? gcc's code generator does not think so.

            .file   "ub.c"
            .text
            .p2align 4
            .globl  foo1
            .def    foo1;   .scl    2;      .type   32;     .endef
            .seh_proc foo1
    foo1:
            .seh_endprologue
            movl    $1, %eax
            ret
            .seh_endproc
            .p2align 4
            .globl  foo2
            .def    foo2;   .scl    2;      .type   32;     .endef
            .seh_proc foo2
    foo2:
            .seh_endprologue
            leaq    16(%rcx), %rax
            addq    $36, %rcx
    .L5:
            cmpl    %edx, (%rax)
            je      .L6
            addq    $4, %rax
            cmpq    %rcx, %rax
            jne     .L5
            xorl    %eax, %eax
            ret
            .p2align 4,,10
            .p2align 3
    .L6:
            movl    $1, %eax
            ret
            .seh_endproc
            .ident  "GCC: (Rev8, Built by MSYS2 project) 15.2.0"

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From bart@bc@freeuk.com to comp.lang.c on Mon Jan 12 15:58:15 2026
    From Newsgroup: comp.lang.c

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    <snip>

    According to C Standard, access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use language (or, better, standardize dialect of C) in which behavior in this case is defined, but I am bad at influencing other
    people. So can not get what I want.
    [/O.T.]


    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?

    I think then it would be better writing it like this:

    struct bar1 {
        union {
            struct {
                int table[4];
                int other_table[4];
            };
            int xtable[8];
        };
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->xtable[i] == v)
                return 1;
        return 0;
    }

    At least your intent is signaled to whoever is reading your code.

    But I don't know if the UB goes away if you intend writing to .table
    and .other_table and reading those values via .xtable (I can't
    remember the rules).

    I'm not even sure that there is no padding between .table and
    .other_table.
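    The padding worry can at least be checked at compile time. A minimal
    sketch (using the union variant of struct bar1 from above; the standard
    does not forbid padding here, so these assertions are a safety net, not
    a guarantee):

```c
#include <stddef.h>

struct bar1 {
    union {
        struct {
            int table[4];
            int other_table[4];
        };
        int xtable[8];
    };
};

/* Compile-time checks: if the implementation inserted padding between
   table and other_table, or made the union larger than 8 ints,
   compilation fails instead of the code silently misbehaving. */
_Static_assert(offsetof(struct bar1, other_table) == 4 * sizeof(int),
               "padding between table and other_table");
_Static_assert(sizeof(struct bar1) == 8 * sizeof(int),
               "unexpected padding in struct bar1");
```

    (_Static_assert requires C11 or later; C23 also spells it
    static_assert.)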

    (In my systems language, the behaviour of your original foo1, in an
    equivalent program, is well-defined. But not of foo2, given that you may
    read some garbage value beyond the struct, which may or may not be
    within valid memory.)


    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as well?

    Given that you may be reading garbage as I said, whether it is UB or not
    is irrelevant; your program has a bug.

    Unless you can add extra context which would make that reasonable. For example, the struct is within an array, it's not the last element, so it
    will read the first element of .other_table, and you are doing this
    knowingly rather than through oversight.

    It might well be UB, but that is a separate problem.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Mon Jan 12 08:03:31 2026
    From Newsgroup: comp.lang.c

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does. However, since this is the trailing array member of
    the struct, GCC does not want to accidentally suppress the classic
    "struct hack". It assumes that quite possibly the pointer passed to the function points to a struct object allocated through the "struct hack" technique.

    Add an extra field after the trailing array and `foo2` will also fold
    into `return 1`, just like `foo1`.

    Perhaps there's a switch in GCC that would outlaw the classic "struct
    hack"... But in any case, it is not prohibited by default for
    compatibility with pre-C99 code.
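    For reference, the C99 flexible array member is the standardized
    descendant of the "struct hack". A minimal sketch (the struct and
    function names here are illustrative, not from the thread):

```c
#include <stdlib.h>

/* C99 flexible array member: the last member has no declared bound,
   and the usable length is whatever the allocation provides. */
struct vec {
    int len;
    int data[];  /* flexible array member */
};

static struct vec *vec_alloc(int n)
{
    /* Room for the header plus n trailing ints. */
    struct vec *v = malloc(sizeof *v + (size_t)n * sizeof v->data[0]);
    if (v)
        v->len = n;
    return v;
}
```

    With this declaration, v->data[i] is well defined for 0 <= i < n,
    which is exactly the access pattern GCC is being careful not to break
    for a trailing array.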
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 19:36:52 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Do you have citation from the Standard?

    However, since this is the trailing array member
    of the struct, GCC does not want to accidentally suppress the classic "struct hack". It assumes that quite possibly the pointer passed to
    the function points to a struct object allocated through the "struct
    hack" technique.

    That much I understand myself.
    Here table plays the role of a flexible array member (FAM). A lot of
    code depends on this pattern; it's rather standard practice in
    communications programming, especially so in C90, where there were no
    FAMs, and in C++, where FAMs do not exist even today. A production
    compiler like gcc really has no option except to handle it as expected
    by millions of programmers.

    But I was interested in the "opinion" of the C Standard rather than of
    the gcc compiler.
    Is it full nasal UB or merely "implementation-defined behavior"?


    Add an extra field after the trailing array and `foo2` will also fold
    into `return 1`, just like `foo1`.

    Perhaps there's a switch in GCC that would outlaw the classic "struct hack"... But in any case, it is not prohibited by default for
    compatibility with pre-C99 code.


    gcc indeed has something of this sort: -fstrict-flex-arrays=3.
    But at the moment it does not appear to affect code generation [in
    this particular example].



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 20:08:21 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.
    [/O.T.]


    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays. Making the same
    pattern defined in 'struct' is less important in practice, but desirable
    for consistency between arrays and structures.

    I think then it would be better writing it like this:

    struct bar1 {
        union {
            struct {
                int table[4];
                int other_table[4];
            };
            int xtable[8];
        };
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->xtable[i] == v)
                return 1;
        return 0;
    }

    At least your intent is signaled to whomever is reading your code.


    If we were using a language or dialect in which the behavior is
    defined, why would you consider the second variant better?
    I don't mean in this particular, very simplified example, but
    generally, where the layout is more complicated.

    But I don't know if the UB goes away if you intend writing to .table
    and .other_table and reading those values via .xtable (I can't
    remember the rules).

    I'm not even sure that there is no padding between .table and
    .other_table.

    Considering that they are both 'int', I don't think that could happen,
    even in standard C. In "my" dialect, padding in such a situation can
    be explicitly disallowed by the standard.


    (In my systems language, the behaviour of your original foo1, in an equivalent program, is well-defined. But not of foo2, given that you
    may read some garbage value beyond the struct, which may or may not
    be within valid memory.)


    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Given that you may be reading garbage as I said, whether it is UB or
    not is irrelevant; your program has a bug.

    Whether there is a bug or not depends on what the caller passed to
    foo2(). There are a great many programs around that do similar things
    and contain no bugs. Most typically, the caller creates the argument p
    by casting a char array that is long enough for the table member to
    hold more than 4 elements.
    Without seeing the code on the caller's side we can only guess, from
    the suspect way the code is written, that there is a bug. But we can't
    be sure.


    Unless you can add extra context which would make that reasonable.
    For example, the struct is within an array, it's not the last
    element, so it will read the first element of .other_table, and you
    are doing this knowingly rather than through oversight.

    It might well be UB, but that is a separate problem.




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Jan 12 20:02:20 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.
    [/O.T.]


    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays.

    So declare it as int table[4][4].

    $ cat /tmp/a.c
    #include <stdio.h>
    int table[4][4] = { {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, {13,14,15,16} };

    int main(int argc, const char **argv, const char **envp)
    {
        printf("%d\n", table[3][2]);
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a
    15

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jan 12 12:03:36 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Right.

    Do you have citation from the Standard?

    The short answer is section 6.5.6 paragraph 8.

    There is amplification in Annex J.2, roughly three pages
    after the start of J.2. You can search for "an array
    subscript is out of range", where there is a clarifying
    example.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 22:41:07 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 12:03:36 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Right.


    Maybe. But it's not expressed by the gcc code generator or by any
    warnings. So, how can we know?

    Do you have citation from the Standard?

    The short answer is section 6.5.6 paragraph 8.


    I am reading the N3220 draft:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
    There, section 6.5.6 has no paragraph 8 :(

    There is amplification in Annex J.2, roughly three pages
    after the start of J.2. You can search for "an array
    subscript is out of range", where there is a clarifying
    example.

    I see the following text:
    "An array subscript is out of range, even if an object is apparently
    accessible with the given subscript (as in the lvalue expression
    a[1][7] given the declaration int a[4][5]) (6.5.7)."

    Is that what you had in mind?





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Russell Kuyper Jr.@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 12 20:29:40 2026
    From Newsgroup: comp.lang.c

    On 2026-01-12 04:44, Michael S wrote:
    ...
    Normally phrase "worshippers of nasal demons" in my posts refers to
    faction among developers and maintainers of gcc and clang compilers. I
    think that it's not an unusual use of the phrase, but I can be wrong
    about it.

    Which faction would that be? I'm sure there's more than one to choose
    from. An example of what they've done that, in your opinion, justifies
    that description might also be helpful.

    ...
    AFAIK, you are not a gcc or clang maintainer. So, not a "worshipper".
    When I want to characterize [in derogatory fashion] people that have
    no direct influence on the behavior of common software tools, but
    share the attitude of "worshippers" toward UB, then I use the phrase
    "language lawyers".

    "Language lawyers", at least, I understand, having frequently been
    described as one myself. It means those who are knowledgeable about
    what the standard allows and prohibits, both for programs and for
    implementations. I'm not sure why you'd consider them "worshippers" of
    UB; they are characterized as language lawyers because they know
    precisely when the behavior is or is not UB - but that says nothing
    about whether they approve of UB or not. They would still be language
    lawyers whether they approved of UB or despised it.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Russell Kuyper Jr.@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 12 20:35:09 2026
    From Newsgroup: comp.lang.c

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.

    OK - so how do you want it to be defined? I've used languages where
    table[n] for n>3 would have exactly the same effect as table[3], and
    table[n] for n<0 would have exactly the same effect as table[0]. I've
    seen algorithms that were actually simplified by relying upon this
    behavior.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Russell Kuyper Jr.@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 12 21:09:25 2026
    From Newsgroup: comp.lang.c

    On 2026-01-12 15:02, Scott Lurndal wrote:
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    ...
    struct bar1 {
        int table[4];
        int other_table[4];
    };
    ...
    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays.

    So declare it as int table[4][4].


    Note that this suggestion does not make the behavior defined. It is
    undefined behavior to dereference table[0]+4, and it is undefined
    behavior to make any use of table[0]+5.
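    One way to make the "walk the whole thing linearly" pattern well
    defined is not to use a 2D array at all: declare flat storage and
    compute the row/column index explicitly. A sketch under that
    assumption (the names here are illustrative):

```c
#define ROWS 4
#define COLS 4

/* One flat object: any index in [0, ROWS*COLS) is in bounds, so a
   linear scan over all 16 elements involves no out-of-bounds lvalue. */
static int table_flat[ROWS * COLS];

static int get(int r, int c)
{
    return table_flat[r * COLS + c];  /* 2D view via explicit arithmetic */
}

static int exists_anywhere(int v)
{
    for (int i = 0; i < ROWS * COLS; i++)
        if (table_flat[i] == v)
            return 1;
    return 0;
}
```

    The trade-off is purely syntactic: the [r][c] notation is replaced by
    an index computation, but both the per-row and the whole-array scans
    stay inside one array object.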

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Jan 13 09:12:14 2026
    From Newsgroup: comp.lang.c

    On 12/01/2026 21:41, Michael S wrote:
    On Mon, 12 Jan 2026 12:03:36 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Right.


    Maybe. But it's not expressed by the gcc code generator or by any
    warnings. So, how can we know?

    Do you have citation from the Standard?

    The short answer is section 6.5.6 paragraph 8.


    I am reading N3220 draft https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
    Here section 6.5.6 has no paragraph 8 :(


    The C standards managed to keep section numbers and even paragraph
    numbers consistent between versions for a long time, but there are a
    number of differences in C23. 6.5.6p8 in, for example, C11, is 6.5.7p9
    in N3220. (N3220 is an early draft of the next version of C, C2y, and
    is far from complete. The best C23 draft is N3096, where the relevant
    paragraph is 6.5.6p9.)

    There is amplification in Annex J.2, roughly three pages
    after the start of J.2. You can search for "an array
    subscript is out of range", where there is a clarifying
    example.

    I see the following text:
    "An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
    a[1][7] given the declaration int a[4][5]) (6.5.7)."

    Is that what you had in mind?


    I can't read Tim's mind, but it is certainly an example that /I/ think
    is pretty clear. The list of undefined behaviours in J.2 is
    non-normative (meaning it does not define the rules of the language,
    it just tries to explain or list them), and it is not complete (lots
    of things are UB without being listed, simply because the standard
    does not define behaviours for them). But the list in J.2 can be a
    very useful summary of UBs, and it can be easier to follow than the
    referenced sections in the normative text.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Jan 13 11:07:27 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 20:35:09 -0500
    "James Russell Kuyper Jr." <jameskuyper@alumni.caltech.edu> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.

    OK - so how do you want it to be defined? I've used languages where
    table[n] for n>3 would have exactly the same effect as table[3], and table[n] for n<0 would have exactly the same effect as table[0]. I've
    seen algorithms that were actually simplified by relying upon this
    behavior.

    I want "my" dialect to be based on an abstract machine with a flat
    memory model. All variables, except for automatic variables whose
    address was never taken by the program, are laid out in one big
    implicit array of char.
    For my purposes, a Harvard abstract machine is sufficient.
    I am sure there are multiple people who would want an option for a von
    Neumann abstract machine, i.e. for program code to be laid over the
    same implicit array as variables, with as many things defined in the
    standard as practically possible. My aspirations do not go that far.

    In the specific case of 'struct bar1', it means that I want
    p->table[4:7] to be the absolute equivalent of p->other_table[0:3].
    For p->table[n] where n < 0 or n > 7, I want the generated code to
    access the respective locations in the implicit underlying array.
    Whether the resulting behavior is defined or undefined would depend on
    the specifics of the caller.

    If you say that "my" dialect is less optimizable than standard C, then
    my answer is "Yes, I know, and I don't care".

    If you say that "my" dialect removes certain potential for detection
    of buffer overflows by the compiler, then my answer is "Generally,
    yes, and it's not great, but I consider it a fair price". Note that
    there are still plenty of places where the compiler can warn, as with
    the majority of automatic and static arrays. In other situations,
    bounds checking can be enabled on the spot by a special attribute.
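    Standard C already grants a small piece of this flat-memory view:
    within a single object, access through a character type is always
    defined (C17 6.5p7), even though int-typed indexing from one member
    into the next is not. A sketch, reusing struct bar1 from the thread:

```c
#include <stddef.h>

struct bar1 {
    int table[4];
    int other_table[4];
};

/* Walking the whole struct byte by byte through unsigned char* is
   defined behavior: character-type access to an object's
   representation never violates the aliasing rules. */
static int byte_sum(const struct bar1 *p)
{
    const unsigned char *bytes = (const unsigned char *)p;
    int sum = 0;
    for (size_t i = 0; i < sizeof *p; i++)
        sum += bytes[i];
    return sum;
}
```

    The flat-memory dialect would extend this "one big array of bytes"
    view beyond single objects; standard C stops at the object boundary.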




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Jan 13 11:31:55 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 21:09:25 -0500
    "James Russell Kuyper Jr." <jameskuyper@alumni.caltech.edu> wrote:

    On 2026-01-12 15:02, Scott Lurndal wrote:
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    ...
    struct bar1 {
        int table[4];
        int other_table[4];
    };
    ...
    So you want to deliberately read one element past the end because
    you know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays.

    So declare it as int table[4][4].


    Note that this suggestion does not make the behavior defined. It is
    undefined behavior to dereference table[0]+4, and it is undefined
    behavior to make any use of table[0]+5.


    Note that Scott didn't suggest that dereferencing table[0][4] in his
    example is defined.
    Not that I understood what he wanted to suggest :(




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Tue Jan 13 08:11:15 2026
    From Newsgroup: comp.lang.c

    On Mon 1/12/2026 9:36 AM, Michael S wrote:
    But I was interested in the "opinion" of the C Standard rather than of
    the gcc compiler.
    Is it full nasal UB or merely "implementation-defined behavior"?

    It is full nasal UB per the standard. And, of course, it is as
    "implementation-defined" as any other UB, in the sense that the
    standard permits implementations to _extend_ the language in any way
    they please, as long as they don't forget to issue diagnostics when
    diagnostics are required (by the standard).

    Perhaps there's a switch in GCC that would outlaw the classic "struct
    hack"... But in any case, it is not prohibited by default for
    compatibility with pre-C99 code.


    gcc indeed has something of this sort: -fstrict-flex-arrays=3.
    But at the moment it does not appear to affect code generation [in
    this particular example].

    Yeah... I tried both the command-line setting and the attribute. No
    effect on the code though.
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2