• On Undefined Behavior

    From highcrew@high.crew3868@fastmail.com to comp.lang.c on Thu Jan 1 22:54:05 2026
    From Newsgroup: comp.lang.c

    Hello,

    While I consider myself a reasonably good C programmer, I still
    have difficulty understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to believe that the compiler is unable to notice it,
    given that it even "exploits" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.
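
    [Editorial aside, not part of the original post: the intended behavior
    comes back with a one-character fix to the loop condition; a sketch
    using the common sizeof idiom so the bound stays in sync with the
    array (the `_fixed` name is made up for illustration):]

```c
#include <stddef.h>

int table[4] = {0};

/* Corrected version (editorial sketch): the condition is i < 4, so
   table[4] is never read. Deriving the bound with sizeof keeps the
   loop correct if the array size ever changes. */
int exists_in_table_fixed(int v)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
```

    With this version, both return paths should survive optimization,
    since the compiler can no longer assume the loop always finds a match.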

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone walk me through this reasoning? I know there is a lot of
    thinking behind it, yet everything seems very incorrect to me!
    I'm in deep cognitive dissonance here! :) Help!
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Jan 2 00:26:16 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    IMHO, for a compiler that eliminated all the comparisons (I assume it
    was gcc -O2/-O3), the absence of a warning is a bug.
    It's worth reporting.

    And it has nothing to do with the C standard and what is considered
    UB by the standard and what is not. It's a bug from a practical POV,
    and I believe that the gcc maintainers will admit it and fix it.
    Eventually, that is. Not too quickly.






    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Thu Jan 1 17:49:40 2026
    From Newsgroup: comp.lang.c

    On 2026-01-01 16:54, highcrew wrote:
    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    I agree.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    The rule that this code violates is still violated if an array is
    accessed through a pointer, from a module which has no knowledge of the
    actual length of the array. As a result, it does not make sense for the
    standard to require diagnosis of all such violations.
    However, implementations are free to diagnose violations such as this
    one, where it would be perfectly feasible to do so. Whether or not
    implementations actually do so is considered a matter of "Quality of
    Implementation" (QoI), and therefore outside the scope of the standard.
    Generating code that is only justified because the behavior is
    undefined, and failing to diagnose the problem, seems to me to be very
    bad QoI.
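
    [Editorial aside: the point about pointers can be made concrete. Once
    the array decays to a pointer in another function, no compiler can see
    the bound, which is why the idiomatic C remedy is to pass the length
    explicitly. A sketch; the names here are made up for illustration:]

```c
#include <stddef.h>

/* The callee sees only a pointer, so no compiler could diagnose an
   out-of-bounds loop here. The bound must come from the caller, who
   still knows the real array size. */
int exists_in(const int *p, size_t n, int v)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] == v) return 1;
    }
    return 0;
}

int table4[4] = {1, 2, 3, 4};
```

    A caller would typically compute the length once, at the point where
    the real array type is still visible:
    `exists_in(table4, sizeof table4 / sizeof table4[0], v)`.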
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Thu Jan 1 23:57:21 2026
    From Newsgroup: comp.lang.c

    On 1/1/26 11:26 PM, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:
    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16

    [...]

    IMHO, for compiler that eliminated all comparisons (I assume that it was
    gcc -O2/-O3) an absence of warning is a bug.

    Correct: -O2. Same code compiled with -O0:
    https://godbolt.org/z/xE61j3PdM -> still no warnings, although the
    output will at least contain the `return 0`.

    Which doesn't mean it is better: the code is obviously wrong, and
    *I think* the compiler should whine.

    It's worth reporting.

    Do you think so? I think everybody already knows...

    First off, this example has been on the cppreference website forever.
    Secondly, Clang issues no warning either.

    Again, by this I don't mean the compilers are behaving correctly.
    I just think this situation is well known.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Fri Jan 2 05:53:13 2026
    From Newsgroup: comp.lang.c

    highcrew <high.crew3868@fastmail.com> wrote:
    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    You do not get the formalism: the compiler applies a lot of
    transformations which are supposed to be correct for programs obeying
    the C rules. However, the compiler does not understand the program.
    It may notice details that you missed, but it acts essentially blindly
    on the information it has. And most transformations have only limited
    info (storing everything that the compiler infers would take a lot of
    memory, and searching all that info would take a lot of time).

    The code that you see is the result of many transformations, possibly
    hundreds or more. The result is a consequence of all the steps,
    but it could be hard to isolate a single "silly" step.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    This case looks reasonably easy: when compiling 'exists_in_table'
    the compiler has the declaration of 'table' and knows its size is 4.
    The compiler probably generated its output after noticing that
    the loop would produce an out-of-bounds reference. So with some
    extra effort it should be possible to generate a diagnostic.
    But in general, instead of an array you may have a pointer without
    bound information. Or the upper bound may be a variable. As James
    wrote, for such reasons the C standard does not require a diagnostic.
    Also, in the past gcc and clang did not generate diagnostics
    in such situations. gcc is a very complex beast, and adding
    diagnostics now may require nontrivial effort.

    BTW: I expect that eventually gcc will warn. Ideologically,
    using various string functions can overflow buffers in
    similar ways. In the past such buffer overflows just generated
    some (possibly "working") code. Now most such uses report
    warnings. In fact, this problem looks like an outlier.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    By using C you implicitly gave "yes" as an answer.

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Since it gave no hint, it probably could not. In cases where it
    can, it warns (at least when you activate warnings).

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!

    --
    Waldek Hebisch
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jan 2 10:31:19 2026
    From Newsgroup: comp.lang.c

    On 01/01/2026 22:54, highcrew wrote:
    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.


    Since the original code is wrong, it doesn't make sense to ask if the generated object code is right or wrong. This principle was understood
    from the days of mechanical computers: <https://www.brainyquote.com/quotes/charles_babbage_141832>

    However, it /does/ make sense to ask whether the compiler could have
    been more helpful in pointing out your mistake - and clearly, in theory
    at least, the answer is yes.

    Compilers first parse the source code into an internal format. Then
    they run through this in a series of passes, detecting errors and
    generating warnings, manipulating the code for optimisation, generating
    new types of internal formats, and so on. For modern compilers there
    will be scores of passes. Ideally, the ordering of these passes is such
    that the pass that figures out the size of "table" and the range of the
    loop variable will come before the pass that issues out-of-bounds
    warnings, and that will come before the pass that figures out that the
    "return 0;" cannot be reached without passing through UB and thus the
    only possible return value for the function is 1. For many optimisation passes, once the compiler has figured out a way to simplify the code,
    some aspects (such as paths leading up to UB that have been eliminated)
    are lost, and the compiler can't see the problem (and therefore can't
    issue a diagnostic) later on.

    Unfortunately here, the passes are not ordered ideally for catching this source code bug. I think it is inevitable that such situations will
    arise - no matter how the passes are ordered, you are going to be able
    to find samples where there is a bug that can easily be spotted by a
    human reader but which is not diagnosed by the compiler. Change the
    pass order to one that will catch this situation, and another bug will
    now go undiagnosed. Compiler writers do strive to catch what they can,
    but they must also balance compile times, compiler complexity, and of
    course other demands on their own development time. Sometimes that
    means things can appear simple and obvious to the user, but would
    require unwarranted effort to implement in the compiler.

    I had a little look in the gcc bugzilla, but could not find any issue
    that directly matches this case. So I think it is worthwhile if you
    file it as a gcc bug. (It is not technically a "bug", but it is
    definitely an "opportunity for improvement".) If the gcc passes make it
    hard to implement as a normal warning, it may still be possible to add
    it to the "-fanalyzer" passes.


    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    So you are asking whether you want the generated code to be
    efficiently wrong, or inefficiently wrong?


    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    You can also make use of run-time sanitizers that are ideal for catching
    this sort of thing (albeit with an inevitable speed overhead).

    <https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>
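
    [Editorial aside: one concrete way to use these options on the
    thread's example (the file names here are assumed, not from the post)
    is to build with ASan/UBSan enabled, so the out-of-bounds read becomes
    a run-time report instead of fuel for optimization:]

```sh
# Hypothetical build of the example (file name assumed).
# The sanitizers instrument loads, so the table[4] read is caught at
# run time rather than silently exploited by the optimizer.
gcc -O0 -g -fsanitize=address,undefined -fno-sanitize-recover=all \
    exists_in_table.c -o exists_demo

# A lookup of a value not present in the table should now abort with a
# global-buffer-overflow report pointing at the table[i] read.
./exists_demo
```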



    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    Basically, gcc is a great tool, but it is not perfect. Reporting
    such issues can help improve it, but there are no guarantees.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Fri Jan 2 17:38:32 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 6:53 AM, Waldek Hebisch wrote:
    highcrew <high.crew3868@fastmail.com> wrote:

    You do not get the formalism: the compiler applies a lot of
    transformations which are supposed to be correct for programs obeying
    the C rules. However, the compiler does not understand the program.
    It may notice details that you missed, but it acts essentially blindly
    on the information it has. And most transformations have only limited
    info (storing everything that the compiler infers would take a lot of
    memory, and searching all that info would take a lot of time).

    The code that you see is the result of many transformations, possibly
    hundreds or more. The result is a consequence of all the steps,
    but it could be hard to isolate a single "silly" step.
    [...]

    Thanks for your answer.

    So you are basically saying that spotting such a problem is
    way more difficult than optimizing it? And indeed so difficult that the compiler fails at it?

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    By using C you implicitly gave "yes" as an answer.

    Wait, I don't think that makes sense.
    If we are talking about a legitimate limitation of the compilers, as you
    seem to suggest, then it is a different situation.

    Perhaps it would be more proper to say that, by using C, one implicitly
    accepts the burden of writing UB-free code.
    The compiler can't guarantee that it will detect UB, so the contract
    is: you get a correct program if you write correct code.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Fri Jan 2 17:51:38 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 10:31 AM, David Brown wrote:
    However, it /does/ make sense to ask whether the compiler could have
    been more helpful in pointing out your mistake - and clearly, in theory
    at least, the answer is yes.
    [...]

    Thanks for the clarification.

    Yes, I'm exactly wondering if the compiler shouldn't reject that code
    to begin with. I'm not expecting to enter wrong code and get a
    working program. That would be dark magic.

    So you are basically confirming what I inferred from Waldek Hebisch's
    answer: it is actually quite hard for the compiler to spot it. So we
    live with it.

    I had a little look in the gcc bugzilla, but could not find any issue
    that directly matches this case. So I think it is worthwhile if you
    file it as a gcc bug. (It is not technically a "bug", but it is
    definitely an "opportunity for improvement".) If the gcc passes make it
    hard to implement as a normal warning, it may still be possible to add
    it to the "-fanalyzer" passes.

    Erm... I will consider filing an "opportunity for improvement"
    ticket then, thank you.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    You are asking if you want the generated code to be efficiently wrong or inefficiently wrong?

    I was asking whether it is reasonable to accept as valid a program
    which is wrong, and to optimize it in its wrong behavior.

    What I could not grasp is the difficulty of the job.
    To quote your own words:

    "Sometimes that means things can appear simple and obvious to the user,
    but would require unwarranted effort to implement in the compiler."

    You can also make use of run-time sanitizers that are ideal for catching this sort of thing (albeit with an inevitable speed overhead).

    <https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>

    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    Thanks!
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.c on Fri Jan 2 22:52:21 2026
    From Newsgroup: comp.lang.c

    On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
    Wouldn't it be more sensible to have a compilation error, or
    at least a warning?

    It would in that case, but it's very difficult to specify that in the
    language standard as a requirement such that all implementations must
    produce a diagnostic.

    If we wanted to diagnose situations similar to the ones in your
    example program, we would have two departures from the way the
    language is now.

    The first is that we would need to formalize the concept of error
    versus warning diagnostics. Currently there are only diagnostics:
    when the implementation is required to diagnose a program, that
    program is deemed to be incorrect: it violates a syntax or semantic
    constraint rule. It is not required to be translated, and if it is
    translated and linked, it doesn't have well-defined behavior at all.

    Warnings are diagnostics for situations that are not confirmed
    incorrect, but only suspected.

    Compilers typically treat some ISO-C-required diagnostics as warnings.
    For instance, GNU C issues only a warning when pointers are converted
    without a cast. But ISO C allows it to stop translating.

    For the situation in your program, it would be unacceptable to have
    implementations stop translating. We really want just a warning (at
    least by default; in specific projects and situations, developers
    could elect to treat certain warnings as fatal, even standard-required
    warnings).

    The second new thing is that to diagnose this, we need to make
    diagnosis dependent on reachability.

    We want a rule which is something like "whenever the body of
    a function, or an initializing expression for an external definition
    reaches an expression which has unconditional undefined behavior
    that is not an unreachability assertion and not a documented
    extension, a warning diagnostic must be issued".

    This "reaches" has a problem because it requires the implementation to
    solve the halting problem. We should restrict that to just "statically
    obvious" or "trivial" reachability whereby without having to reason
    about unknown run-time values, we can infer that a statement or
    expression is unconditionally reached.

    This kind of diagnostic would be a good thing in my opinion; just
    nobody has stepped up to the plate because of the challenges:

    - introducing the concept of a warning versus error diagnostic.

    - defining a clear set of rules for trivial reachability which
    can catch the majority of these situations without too much
    complexity. (The C++ rules for functions that return value
    reaching their end without a return statement can be used
    as inspiration here.)

    - specifying exactly what "statically obvious" undefined behavior
    is, and how to positively determine that a certain expression
    exhibits it.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.c on Fri Jan 2 22:56:55 2026
    From Newsgroup: comp.lang.c

    On 2026-01-01, Michael S <already5chosen@yahoo.com> wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here:
    https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    IMHO, for a compiler that eliminated all the comparisons (I assume it
    was gcc -O2/-O3), the absence of a warning is a bug.

    A bug against which requirement, articulated where?

    And it has nothing to do with the C standard and what is considered
    UB by the standard and what is not.

    It has everything to do with it, unfortunately. It literally has nothing
    to do with anything else, in fact.

    That function either finds a match in the four array elements and
    returns 1, or else its behavior is undefined.

    Therefore there is no situation under which it is /required/ to return
    anything other than 1.

    You literally cannot write a test case which tests for the "return 0",
    such that the test case has well-defined behavior.

    All well-defined test cases can only test for 1 being returned.

    And that is satisfied by machine code which unconditionally returns 1.
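
    [Editorial aside: this can be checked directly with the thread's own
    function. Because `table` is all zeros, the only call whose behavior
    is defined is `exists_in_table(0)`, which returns 1 at i == 0 before
    the out-of-bounds read is reached; every other argument drives the
    loop to `table[4]`, so no well-defined test can observe a different
    return value:]

```c
int table[4] = {0};

/* The function exactly as posted: for any v not present in table, the
   loop reaches i == 4 and reads past the end (undefined behavior).
   For v == 0, it returns 1 on the very first iteration, so that call
   is well defined. */
int exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
```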

    There is no requirement anywhere that the function requires a
    diagnostic; not in ISO C and not in any GCC documentation.

    Therefore your bug report would have to be not about the compiler
    behavior but about the lack of the requirement.

    This is a difficult problem: writing the requirement /in a good way/
    that covers many cases is not easy, and that's before you implement
    anything in the compiler.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jan 3 13:30:50 2026
    From Newsgroup: comp.lang.c

    On 02/01/2026 17:38, highcrew wrote:
    On 1/2/26 6:53 AM, Waldek Hebisch wrote:
    highcrew <high.crew3868@fastmail.com> wrote:

    You do not get the formalism: the compiler applies a lot of
    transformations which are supposed to be correct for programs obeying
    the C rules. However, the compiler does not understand the program.
    It may notice details that you missed, but it acts essentially blindly
    on the information it has. And most transformations have only limited
    info (storing everything that the compiler infers would take a lot of
    memory, and searching all that info would take a lot of time).

    The code that you see is the result of many transformations, possibly
    hundreds or more. The result is a consequence of all the steps,
    but it could be hard to isolate a single "silly" step.
    [...]

    Thanks for your answer.

    So you are basically saying that spotting such a problem is
    way more difficult than optimizing it? And indeed so difficult that the compiler fails at it?

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    By using C you implicitly gave "yes" as an answer.


    When code contains UB, all guarantees are out the window. There is no
    "right" answer, so it doesn't really make sense to ask if you are
    getting the wrong answer efficiently or inefficiently. In effect,
    because you are not giving the compiler valid correct C code, it can't
    know what you really want - you can't reasonably complain about what you
    get in response. Compilers are not mind readers.

    (Of course compilers try to go beyond the requirements of the C
    language, to be helpful development aids rather than just "pure"
    compilers. And while good compilers /are/ helpful, there is always room
    for improvement.)

    Wait, I don't think that makes sense.
    If we are talking about a legitimate limitation of the compilers, as you
    seem to suggest, then it is a different situation.

    Perhaps it would be more proper to say that, by using C, one implicitly
    accepts the burden of writing UB-free code.
    The compiler can't guarantee that it will detect UB, so the contract
    is: you get a correct program if you write correct code.


    That's a good way to phrase it, IMHO.

    The C language standards (together with any documented extensions,
    modifications, or implementation-defined behaviour in the
    implementation) provide a contract. You, as the programmer, guarantee
    to write correct source code. In return, the compiler guarantees to
    generate correct object code. If you fail on your part, and write UB in
    your code, the compiler is under no obligation to give you anything
    useful in return.



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jan 3 13:42:35 2026
    From Newsgroup: comp.lang.c

    On 02/01/2026 17:51, highcrew wrote:
    On 1/2/26 10:31 AM, David Brown wrote:
    However, it /does/ make sense to ask whether the compiler could have
    been more helpful in pointing out your mistake - and clearly, in
    theory at least, the answer is yes.
    [...]

    Thanks for the clarification.

    Yes, I'm exactly wondering if the compiler shouldn't reject that code
    to begin with.  I'm not expecting to enter wrong code and get a
    working program.  That would be dark magic.

    So you are basically confirming what I inferred from Waldek Hebisch's
    answer: it is actually quite hard for the compiler to spot it. So we
    live with it.

    I think it is fair to say that if it were easy to detect this case
    within the structure of existing compilers, they would do so already.
    The fact that both gcc and clang fail to produce warnings, despite
    having significant differences in their internal structures and the
    details of their passes, shows that it is not as easy as it might
    appear. And James has explained why the C standards can't make
    detection of this kind of fault a language requirement.

    But it is still worth reporting the simple test case to gcc (and clang)
    to see if they are able to improve their warnings.


    I had a little look in the gcc bugzilla, but could not find any issue
    that directly matches this case.  So I think it is worthwhile if you
    file it as a gcc bug.  (It is not technically a "bug", but it is
    definitely an "opportunity for improvement".)  If the gcc passes make
    it hard to implement as a normal warning, it may still be possible to
    add it to the "-fanalyzer" passes.

    Erm... I will consider filing an "opportunity for improvement"
    ticket then, thank you.

    "Opportunity for improvement" tickets are the driving force behind a lot
    of open source software features, so please do file it. Post the
    bugzilla link here once you get some feedback - it would be good to see
    if there is likely to be an improvement in the warnings.


    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    You are asking if you want the generated code to be efficiently wrong
    or inefficiently wrong?

    I was asking whether it is reasonable to accept as valid a program
    which is wrong, and to optimize it in its wrong behavior.


    Yes, it is. That is perhaps unfortunate from the programmers'
    viewpoint, but "garbage in, garbage out" is unavoidable.

    What I could not grasp is the difficulty of the job.
    To quote your own words:

    "Sometimes that means things can appear simple and obvious to the user,
    but would require unwarranted effort to implement in the compiler."

    You can also make use of run-time sanitizers that are ideal for
    catching this sort of thing (albeit with an inevitable speed overhead).

    <https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html>

    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    The tools here can be useful. Of course it is best when you can find
    bugs earlier, at the static analysis stage (I am a big fan of lots of
    compiler warnings), but the "-fsanitize" options are the next step for a
    lot of development. They are of limited value in my own work (small
    embedded systems - there's often no console for log messages, and much
    less possibility of "hardware accelerated" error detection such as
    creative use of a processor's MMU), but for PC programming they can be a
    great help.


    Thanks!


    It's been a good thread - on-topic, interesting discussion, people have
    got a better understanding of a few things, there's an opportunity to
    contribute to better C development tools, and no flames. I look forward
    to your next question!


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sat Jan 3 14:42:59 2026
    From Newsgroup: comp.lang.c

    On 1/3/26 1:42 PM, David Brown wrote:
    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    The tools here can be useful.  Of course it is best when you can find
    bugs earlier, at the static analysis stage (I am a big fan of lots of
    compiler warnings), but the "-fsanitize" options are the next step for a
    lot of development.  They are of limited value in my own work (small
    embedded systems - there's often no console for log messages, and much
    less possibility of "hardware accelerated" error detection such as
    creative use of a processor's MMU), but for PC programming they can be a
    great help.

    Agreed.

    I happen to work with embedded systems as well, and while I came late to
    the party (all the possible checks were already employed by colleagues
    who came before me; they took the fun part!), I can see the value of
    sanitizers even if the code will later run on embedded systems.
    That's why I say I'd like to learn more: I'm merely a user of them.

    Simulations use that code. Then the same code is compiled for the target platform.

    In the light of what I learned in this thread, the value of such testing
    is not just a mere "let's see if it crashes", but effectively bound to UB!
    Or to the attempt to rule out the existence of UB, I should rather say.

    Sanitizers to the rescue. Assuming UB is not in the code, we saddle the
    (cross-)compiler with the correctness of the output.

    Then - I'm reasoning as I write - the devil is in the details.
    Let's say that the compiler is unable to catch the issues, and so are
    the sanitizers. Then the UB-tainted source code goes to the target.
    Assuming the cross compiler is unable to catch it either, we have
    a garbage-in-garbage-out situation, affecting the product.

    Following these thoughts, I started to wonder: the code I reported at
    the beginning of the thread, built with -O2, is effectively coping with
    UB by replacing the function with the equivalent of `return 1`.
    What if I build it with -O2 and -fsanitize=address?
    Will the instrumentation be able to catch it, given that there's nothing
    inherently bad about a `return 1` (minus the fact that it's not what
    the developer intended)?

    $ cat x.c
    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }
    $ gcc -c -O2 x.c
    $ objdump --disassemble=exists_in_table x.o

    x.o: file format elf64-x86-64


    Disassembly of section .text:

    0000000000000000 <exists_in_table>:
    0: b8 01 00 00 00 mov $0x1,%eax
    5: c3 ret

    OK, this is the bad guy. ...now let's sanitize it.

    $ gcc -c -fsanitize=address -O2 x.c
    $ objdump --disassemble=exists_in_table x.o

    x.o: file format elf64-x86-64


    Disassembly of section .text:

    0000000000000000 <exists_in_table>:
    0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <exists_in_table+0x7>
    7: 48 8d 70 14 lea 0x14(%rax),%rsi
    b: 48 89 c2 mov %rax,%rdx
    e: 48 c1 ea 03 shr $0x3,%rdx
    12: 0f b6 8a 00 80 ff 7f movzbl 0x7fff8000(%rdx),%ecx
    19: 48 89 c2 mov %rax,%rdx
    1c: 83 e2 07 and $0x7,%edx
    1f: 83 c2 03 add $0x3,%edx
    22: 38 ca cmp %cl,%dl
    24: 7c 04 jl 2a <exists_in_table+0x2a>
    26: 84 c9 test %cl,%cl
    28: 75 1c jne 46 <exists_in_table+0x46>
    2a: 39 38 cmp %edi,(%rax)
    2c: 74 12 je 40 <exists_in_table+0x40>
    2e: 48 83 c0 04 add $0x4,%rax
    32: 48 39 f0 cmp %rsi,%rax
    35: 75 d4 jne b <exists_in_table+0xb>
    37: 31 c0 xor %eax,%eax
    39: c3 ret
    3a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
    40: b8 01 00 00 00 mov $0x1,%eax
    45: c3 ret
    46: 48 83 ec 08 sub $0x8,%rsp
    4a: 48 89 c7 mov %rax,%rdi
    4d: e8 00 00 00 00 call 52 <__odr_asan.table+0x12>

    Disassembly of section .text.exit:

    Disassembly of section .text.startup:

    Well, what do you know? -fsanitize=address seems to interfere with
    optimizations, at least on my system. Link it, run it, and I get a
    nice segfault.

    Now the circle is closed!

    Thanks!


    It's been a good thread - on-topic, interesting discussion, people have
    got a better understanding of a few things, there's an opportunity to
    contribute to better C development tools, and no flames.  I look forward
    to your next question!

    Thanks. I'm new here, and the community seems a good one! Happy to
    contribute.

    [Here I'm tempted to go OT with babbling about how nice it would be that
    usenet wasn't so underground, but I suspect that is probably what makes
    it good. In the small barrel, there is the good wine]
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Sat Jan 3 07:53:22 2026
    From Newsgroup: comp.lang.c

    On Thu 1/1/2026 1:54 PM, highcrew wrote:
    Let's take an example.  There's plenty here: https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
            mov     eax, 1
            ret
    table:
            .zero   16


    Well, this is *obviously* wrong.

    Once again, one equivalent definition of undefined behavior is: "The
    compiler is free to assume that conditions that lead to undefined
    behavior never occur".

    (And, as a corollary: if some stretch of code is always undefined,
    regardless of external conditions, the compiler is free to assume that
    the code is never executed.)

    The above is exactly how undefined behavior is used for optimizing code through static analysis.

    In your case, undefined behavior happens when `i` reaches 4. Hence the
    compiler is free to assume that `i` is guaranteed to never reach 4. This
    means that the `if` condition is guaranteed to become true at some lower
    value of `i` (i.e. the compiler is free to assume that the calling code
    made a promise to never pass a `v` that is not present in `table`). This
    immediately means that the function will always return 1.

    That's what you are observing.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.

    That is true, but it is a very broad and general formalism. The
    logic the compiler follows is not that broad or general. It is
    significantly more focused on the properties of the actual code. The
    compiler "deduces" the result as I described above.
    --
    Best regards,
    Andrey


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jan 3 17:51:08 2026
    From Newsgroup: comp.lang.c

    On 03/01/2026 14:42, highcrew wrote:
    On 1/3/26 1:42 PM, David Brown wrote:

    [Be careful snipping attributions. Make sure you have enough left for
    all levels of quotation. The following paragraph was written by you,
    not by me.]

    Yes, I'm aware of these instruments, but I'm not very knowledgeable
    about them. I'd like to learn more, and I'll need to spend time doing so.


    The tools here can be useful.  Of course it is best when you can find
    bugs earlier, at the static analysis stage (I am a big fan of lots of
    compiler warnings), but the "-fsanitize" options are the next step for
    a lot of development.  They are of limited value in my own work (small
    embedded systems - there's often no console for log messages, and much
    less possibility of "hardware accelerated" error detection such as
    creative use of a processor's MMU), but for PC programming they can be
    a great help.

    Agreed.

    I happen to work with embedded systems as well, and while I came late to
    the party (all the possible checks were already employed by colleagues
    who came before me; they took the fun part!), I can see the value of
    sanitizers even if the code will later run on embedded systems.
    That's why I say I'd like to learn more: I'm merely a user of them.

    <snip>

    Following these thoughts, I started to wonder: the code I reported at
    the beginning of the thread, built with -O2, is effectively coping with
    UB by replacing the function with the equivalent of `return 1`.
    What if I build it with -O2 and -fsanitize=address?
    Will the instrumentation be able to catch it, given that there's nothing
    inherently bad about a `return 1` (minus the fact that it's not what
    the developer intended)?

    <snip>

    Well, what do you know? -fsanitize=address seems to interfere with
    optimizations, at least on my system. Link it, run it, and I get a
    nice segfault.

    Now the circle is closed!


    The sanitizers effectively inject code into your source, before any
    optimisations are applied. You can imagine your code being transformed
    into something like:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            // Start of sanitizer code
            if (i < 0 || i > 3) halt_with_sanitizer_message();
            // End of sanitizer code
            if (table[i] == v) return 1;
        }
        return 0;
    }

    Then optimisations are applied as normal.

    I can strongly recommend <https://godbolt.org> as the tool of choice for
    investigating code generation. It only works well with small code
    sections, but it gives you very clear generated code, and lets you try
    it with hundreds of different compilers and compiler versions. It's far
    nicer than doing objdumps or using -Wa,ahsdl flags to generate listings.



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Jan 3 20:48:10 2026
    From Newsgroup: comp.lang.c

    On Fri, 2 Jan 2026 22:56:55 -0000 (UTC)
    Kaz Kylheku <046-301-5902@kylheku.com> wrote:

    On 2026-01-01, Michael S <already5chosen@yahoo.com> wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    Let's take an example. There's plenty here:
    https://en.cppreference.com/w/c/language/behavior.html
    So let's focus on https://godbolt.org/z/48bn19Tsb

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
    // return true in one of the first 4 iterations
    // or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
    if (table[i] == v) return 1;
    }
    return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
    mov eax, 1
    ret
    table:
    .zero 16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice
    it, given that it is even "exploiting" it to produce very
    efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot
    of thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    IMHO, for a compiler that eliminated all comparisons (I assume that
    it was gcc -O2/-O3) the absence of a warning is a bug.

    A bug against which requirement, articulated where?

    And it has nothing to do with C standard and what considered UB by
    the standard and what not.

    It has everything to do with it, unfortunately. It literally has
    nothing to do with anything else, in fact.

    That function either finds a match in the four array elements and
    returns 1, or else its behavior is undefined.

    Therefore there is no situation under which it is /required/ to return anything other than 1.

    You literally cannot write a test case which tests for the "return 0",
    such that the test case has well-defined behavior.

    All well-defined test cases can only test for 1 being returned.

    And that is satisfied by machine code which unconditionally returns 1.

    There is no requirement anywhere that the function requires a
    diagnostic; not in ISO C and not in any GCC documentation.

    Therefore your bug report would have to be not about the compiler
    behavior but about the lack of the requirement.

    This is a difficult problem: writing the requirement /in a good way/
    that covers many cases is not easy, and that's before you implement
    anything in the compiler.


    The text above is an example of language-lawyerish speak. I don't like
    it.

    A gcc maintainer would want to fix it, because they actually care about
    quality of implementation.
    Now, being in majority not atypical representatives of males of a great
    ape species, they care even more about always feeling right, having the
    last word, etc...
    Which means that the process of convincing them to make a fix requires
    wisdom.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sat Jan 3 23:47:02 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 11:52 PM, Kaz Kylheku wrote:
    On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
    For the situation in your program, it would be unacceptable to have implementations stop translating.

    I can somehow get the idea that it is difficult for the compiler
    to spot the issue, but why do you think it would be unacceptable
    to stop translating?

    We really want just a warning (at
    least by default; in specific projects and situations, developers
    could elect to treat certain warnings as fatal, even standard-required
    warnings.)

    Even a warning would be enough though. Btw, my typical way of
    working is to enable -Werror while developing, but I don't like
    to force it in general. That would be an interesting digression,
    but definitely OT.

    The second new thing is that to diagnose this, we need to make
    diagnosis dependent on reachability.

    We want a rule which is something like "whenever the body of
    a function, or an initializing expression for an external definition
    reaches an expression which has unconditional undefined behavior
    that is not an unreachability assertion and not a documented
    extension, a warning diagnostic must be issued".

    That's an interesting perspective: reachability.
    Would you say that the offending piece of code is UB only if it
    is reachable in the final program, and therefore it is acceptable
    to keep it as long as it is unreachable?

    Now that I think of it, the __builtin_unreachable() implemented
    by popular compilers is technically UB *if reached* :)

    This kind of diagnostic would be a good thing in my opinion; just
    nobody has stepped up to the plate because of the challenges:

    - introducing the concept of a warning versus error diagnostic.

    - defining a clear set of rules for trivial reachability which
    can catch the majority of these situations without too much
    complexity. (The C++ rules for functions that return value
    reaching their end without a return statement can be used
    as inspiration here.)

    - specifying exactly what "statically obvious" undefined behavior
    is and how to positively determine that a certain expression
    exhibits it.

    Now I'm wondering how much work it would require to properly define
    the rules that the standard mandates!

    As for me, the main take-away is that the detection of certain UB
    is non-trivial; it would be very evil if the standard mandated
    some nearly-impossible task for the compiler!


    (The C++ rules for functions that return value
    reaching their end without a return statement can be used
    as inspiration here.)

    C++ does *what*?? I'm definitely not up to speed with C++, but
    I totally have missed that. Could you please tell me the name
    of this bizarre feature? I *need* to look it up :D
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Jan 3 23:14:33 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100, highcrew wrote:

    Well, this is *obviously* wrong.

    I think it's quite a clever way for the compiler to say "fuck you" to
    the programmer who wrote that. ;)
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 00:15:48 2026
    From Newsgroup: comp.lang.c

    On 1/3/26 4:53 PM, Andrey Tarasevich wrote:
    In your case, undefined behavior happens when `i` reaches 4. Hence the
    compiler is free to assume that `i` is guaranteed to never reach 4. This
    means that the `if` condition is guaranteed to become true at some lower
    value of `i` (i.e. the compiler is free to assume that the calling code
    made a promise to never pass a `v` that is not present in `table`). This
    immediately means that the function will always return 1.

    OK, I totally have missed that there was a rational justification
    for `return 1`! Now I see that `return 1` is actually correct, and
    I'm quite surprised. Thank you for pointing it out!

    (turns out the compiler is in DENIAL of UB :P)

    Interestingly, if I keep in mind this standpoint, every single
    UB listed in https://en.cppreference.com/w/c/language/behavior.html
    starts to make a lot of sense. I can even *foresee* the
    behavior before reading it on the webpage! Damn, I think
    something clicked in my head now...

    * Signed overflow? UB *can't happen*, therefore `x + 1 > x` is always
    true

    * Access out of bounds, ...discussed above.

    * Uninitialized scalar:

    size_t f(int x)
    {
        size_t a;
        if (x) // either x nonzero or UB
            a = 42;
        return a;
    }

    Here we *deny* that the variable can be used uninitialized,
    so the assumed absence of UB implies that x is non-zero.
    The function definitely returns 42.

    * Uninitialized scalar 2:

    _Bool p; // uninitialized local variable
    if (p) // UB access to uninitialized scalar
        puts("p is true");
    if (!p) // UB access to uninitialized scalar
        puts("p is false");

    This is hard to tell... Schrödinger boolean?

    According to the webpage the program might print both
    "p is true" and "p is false". Could it be because
    the compiler has no choice but to take the UB route?

    There's no way to mark UB as not reachable.
    Fortunately the compiler will usually warn me about
    uninitialized variables.

    I can see a few bold cases down the line, e.g. "Access to
    pointer passed to realloc", where I start wondering whether I,
    as a human, could even predict that a certain pointer has
    passed through realloc.

    I have a horrible question now, but that's for a
    separate question...


    Conclusion: the original UB I've been asking about is
    not even a bug. It is the compiler dodging a conditional.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 00:20:06 2026
    From Newsgroup: comp.lang.c

    On 1/3/26 5:51 PM, David Brown wrote:
    [Be careful snipping attributions.  Make sure you have enough left for
    all levels of quotation.  The following paragraph was written by you,
    not by me.]

    I see. I apologize to the crowd; I think I did that a few times.
    And sorry for the apology-driven spam too.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 00:25:07 2026
    From Newsgroup: comp.lang.c

    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Now I guess that an embedded compiler targeting an architecture
    where dereferencing 0 makes sense will not treat it as UB. But it
    is for sure a weird corner case.
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sat Jan 3 18:59:44 2026
    From Newsgroup: comp.lang.c

    On 2026-01-03 18:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    Actually, that's not necessarily true. A null pointer is not required to
    refer to the location with an address of 0. An integer constant
    expression with a value of 0, converted to a pointer type, is guaranteed
    to be a null pointer, but that pointer need not have a representation
    that has all bits 0. However, an integer expression that is not a
    constant expression, if converted to a pointer type, is not required to
    be a null pointer - it could convert to an entirely different pointer value.

    So an implementation could allow it simply by reserving a pointer to
    some other location (such as the last position in memory) as the
    representation of a null pointer.

    How do I?

    Even on an implementation that uses a pointer representing a machine
    address of 0 as a null pointer, such code can still work. In the C
    standard, "undefined behavior" means that the C standard imposes no requirements on the behavior. That doesn't prohibit other sources from
    imposing requirements. On such a system, it could define the behavior as accessing the flash.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Paul J. Lucas@paul@lucasmail.org to comp.lang.c on Sat Jan 3 17:10:22 2026
    From Newsgroup: comp.lang.c

    On 1/1/26 1:54 PM, highcrew wrote:

    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This particular example is explained in several places, e.g.:

    https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a

    - Paul
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Sat Jan 3 17:24:54 2026
    From Newsgroup: comp.lang.c

    On Sat 1/3/2026 3:25 PM, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems.  Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Well, the first question would be: what is the physical null pointer representation in that C implementation on that embedded system?

    The null pointer in C is represented by the integer constant `0` at
    source code level only (!). The actual physical representation in the
    compiled code is not necessarily "address 0", contrary to popular
    misguided belief. It can be anything. It is typically supposed to be
    chosen as some appropriate "invalid address value" on the given platform.

    The compiler on that embedded system is, of course, aware of the fact
    that address 0x00000000 is perfectly valid and should be left
    accessible. So, for that reason, the compiler is supposed to choose some
    other physical representation for null pointers, like, say, address
    0xFFFFFFFF (just for one example). So, every time you write something like

    int *p = 0;

    the compiler will emit code that stores `0xFFFFFFFF` into `p`.

    In that implementation you will have no problem accessing address
    0x00000000. No UB. No problem.

    But if even under such circumstances the compiler decided to use address
    0x00000000 for physically representing null pointers (say, for some
    other important reasons)... well, then I guess the compiler will have no
    other choice but to extend the formal language specification and
    postulate that null pointer access is well-defined. There will be no
    optimizations based on UB associated with null pointer access. At least
    in some circumstances. That all would become implementation-defined, of
    course.
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 02:19:50 2026
    From Newsgroup: comp.lang.c

    On Sat, 3 Jan 2026 17:24:54 -0800, Andrey Tarasevich wrote:

    The compiler on that embedded system is, of course, aware of the
    fact that address 0x00000000 is perfectly valid and should be left accessible. So, for that reason, the compiler is supposed to choose
    some other physical representation for null pointers ...

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra "invalid" value on top of that?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sat Jan 3 21:31:20 2026
    From Newsgroup: comp.lang.c

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:
    On Sat, 3 Jan 2026 17:24:54 -0800, Andrey Tarasevich wrote:

    The compiler on that embedded system is, of course, aware of the
    fact that address 0x00000000 is perfectly valid and should be left
    accessible. So, for that reason, the compiler is supposed to choose
    some other physical representation for null pointers ...

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is not
    available to user code. Note, in particular, that it might be a piece
    of memory used by the implementation of C, or by the operating system.
    In which case, the undefined behavior that can occur as a result of
    dereferencing a null pointer would take the form of messing up the C
    runtime or the operating system.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Sat Jan 3 18:44:02 2026
    From Newsgroup: comp.lang.c

    On Sat 1/3/2026 5:24 PM, Andrey Tarasevich wrote:
    On Sat 1/3/2026 3:25 PM, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems.  Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Well, the first question would be: what is the physical null pointer
    representation in that C implementation on that embedded system?
    ...

    Although, on second thought, what I said above, while correct, is
    hardly relevant to the matter of using UB for optimizations.

    UB-based optimizations rely on static analysis of the code during
    compilation. At that stage the platform-specific physical
    representation of null pointers plays no role at all. The only thing
    that matters is the ability of the compiler to identify and track
    _logical_ null pointers through the program. E.g. for the compiler

    int *p = 0;

    is always a null pointer. And

    if (p != 0)

    always checks pointer `p` for being null. The actual physical
    representation of `p` does not come into the picture at all.

    The key point here is that only a compile-time zero (i.e. an integral
    constant expression with value zero) can be interpreted as a null
    pointer. A run-time zero cannot be.

    And the only issue that remains is your original request "I want to
    assign a pointer to 0x00000000 and dereference it to read the first
    word". Well, firstly, the language does not offer you any
    standard-defined features for accessing specific addresses. But in
    real-life it is usually done through explicitly converting an integer
    address to a pointer type. Since

    int *p = 0;

    has a reserved meaning and will not generally work as intended, one
    possible workaround would be

    uintptr_t a = 0;
    int *p = (int *) a;

    In the above case `p` will not be seen by the compiler as a logical
    null pointer.

    This is actually covered by the FAQ: https://c-faq.com/null/accessloc0.html
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 04:52:24 2026
    From Newsgroup: comp.lang.c

    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be a
    piece of memory used by the implementation of C, or by the operating
    system. In which case, the undefined behavior that can occur as a
    result of dereferencing a null pointer would take the form of messing
    up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 12:51:25 2026
    From Newsgroup: comp.lang.c

    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    This particular example is explained in several places, e.g.:

    https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a

    - Paul

    Hey, thanks for the pointers.
    I found the second a really good write up!
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Jan 4 12:58:03 2026
    From Newsgroup: comp.lang.c

    On 03/01/2026 23:47, highcrew wrote:
    On 1/2/26 11:52 PM, Kaz Kylheku wrote:
    On 2026-01-01, highcrew <high.crew3868@fastmail.com> wrote:
    For the situation in your program, it would be unacceptable to have
    implementations stop translating.

    I can somehow get the idea that it is difficult for the compiler
    to spot the issue, but why do you think it would be unacceptable
    to stop translating?


    A C compiler definitely should /not/ stop translating just because it
    finds UB like this - at least, not with "normal" compilation flags.
    (With additional flags, anything is allowable.)

    Run-time UB is only a problem if the running program attempts to execute
    it. So it is only really appropriate for it to be treated as a fatal compile-time error if the compiler knows for sure that it will be
    executed (i.e., it can trace all execution paths from the start of
    "main" and see that it is inevitably executed). That is clearly
    infeasible for the vast majority of such run-time UB.

    It is entirely normal that code is full of potential run-time UB:

    extern int xs[10];
    int foo(int i) { return xs[i]; }

    The function "foo" has potential UB - but the compiler should not stop translating just because you /might/ call it with an inappropriate argument.

    Functions that have unavoidable UB, such as the example in this thread,
    are not guaranteed to be called - the compiler cannot reasonably refuse
    to continue compiling. But it is also fair to say unavoidable UB in a function is almost certainly a mistake by the programmer, and a warning message (even without specifying any warning flags) would be a very nice
    thing for the compiler to give you.

    The C standard requires a diagnostic - a warning or fatal error
    message - for certain mistakes. It only does that for things that a
    compiler could reasonably be expected to identify, without having to
    simulate run-time conditions, or consider multiple translation units
    at once.



    We really want just a warning (at
    least by default; in specific projects and situations, developers
    could elect to treat certain warnings as fatal, even standard-required
    warnings.)

    Even a warning would be enough though.  Btw, my typical way of
    working is to enable -Werror while developing, but I don't like
    to force it in general.  That would be an interesting digression,
    but definitely OT.


    (I too have "-Werror" enabled, at least once my initial builds are
    somewhat solidified - it means you can't lose an important warning
    message somewhere in the output of your build process.)

    The second new thing is that to diagnose this, we need to make
    diagnosis dependent on reachability.

    We want a rule which is something like "whenever the body of
    a function, or an initializing expression for an external definition
    reaches an expression which has unconditional undefined behavior
    that is not an unreachability assertion and not a documented
    extension, a warning diagnostic must be issued".

    That's an interesting perspective: reachability.
    Would you say that the incriminated piece of code is UB only if it
    is reachable in the final program, therefore it is acceptable
    to keep it as long as unreachable?

    Now that I think of it, the __builtin_unreachable() implemented
    by popular compilers is technically UB *if reached* :)


    "unreachable()" is now standard, in C23. But that may be what Kaz is
    referring to as "an unreachability assertion".

    And AFAIUI gcc and clang/llvm have an "UB" instruction or statement in
    their internal formats, and will use that for "__builtin_unreachable()"
    and also when generating code from your example.

    This kind of diagnostic would be a good thing in my opinion; just
    nobody has stepped up to the plate because of the challenges:

    - introducing the concept of a warning versus error diagnostic.

    - defining a clear set of rules for trivial reachability which
      can catch the majority of these situations without too much
      complexity. (The C++ rules for functions that return a value
      reaching their end without a return statement can be used
      as inspiration here.)

    - specifying exactly what "statically obvious" undefined behavior
      is and how to positively determine that a certain expression
      exhibits it.

    Now I'm wondering how much work it requires to properly define
    the rules that the standard mandates!


    Um, the standard defines the rules - that's the point. So your question
    is really "how much work did it take to write the C standard?". I don't
    think that's what you meant.

    As for me the main take-away is that the detection of certain UB
    is non-trivial; it would be very evil if the standard mandated
    some nearly-impossible task for the compiler!


    The standard is quite lenient on what it requires from C compilers
    (though most don't follow all its rules by default). Static warnings
    are a matter of quality of implementation, not requirements of the
    language. This lets people write relatively small and simple C
    compilers if they want, while also giving big toolchains the freedom to
    add lots more checking and developer help.


    (The C++ rules for functions that return a value
      reaching their end without a return statement can be used
      as inspiration here.)

    C++ does *what*?? I'm definitely not up to speed with C++, but
    I have totally missed that.  Could you please tell me the name
    of this bizarre feature? I *need* to look it up :D


    I believe the difference is in the behaviour of a function that is
    declared to return a value (i.e., not "void") but which exits without
    returning a value. In C, this is allowed - but it is UB to attempt to
    use the non-existent return value. In C++, it is UB to fail to return
    a value - which is far easier for a compiler to diagnose.

    So if you have :

    int foo(void) { }

    int bar(void) { return foo(); }

    then in C++, the UB is in the definition of "foo", while in C it is in
    the run-time use of "foo" inside "bar".


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From highcrew@high.crew3868@fastmail.com to comp.lang.c on Sun Jan 4 14:38:00 2026
    From Newsgroup: comp.lang.c

    On 1/2/26 11:56 PM, Kaz Kylheku wrote:
    You literally cannot write a test case which tests for the "return 0",
    such that the test case has well-defined behavior.

    All well-defined test cases can only test for 1 being returned.

    And that is satisfied by machine code which unconditionally returns 1.

    I appreciate the nuance, or at least think I understand what you are
    saying here. A test that aims at spotting UB is necessarily using
    UB-tainted code, so it might even pass against all odds.

    Then it looks to me like this is one of those situations where practice
    beats theory. Unit testing, sanitizers, fuzzers... these tools will
    reveal the defect with a very high likelihood.
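
    For instance (a sketch, assuming a gcc or clang toolchain with UBSan
    available as `cc`), the thread's original off-by-one is flagged at run
    time even though compilation stays silent:

```shell
# Write the thread's example to a file and run it under UBSan.
cat > oob.c <<'EOF'
int table[4] = {0};
int exists_in_table(int v) {
    for (int i = 0; i <= 4; i++)      /* off-by-one: table[4] is OOB */
        if (table[i] == v) return 1;
    return 0;
}
int main(void) { return exists_in_table(5); }
EOF
cc -O0 -fsanitize=undefined -fno-sanitize-recover=all oob.c -o oob
./oob || echo "UBSan caught the out-of-bounds access"
```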

    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate, but in practical
    terms, if you expect it to take less than 1 second and it takes more
    than 10, you are already hitting ^C and conjecturing that something
    went horribly wrong :D
    --
    High Crew
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sun Jan 4 15:51:48 2026
    From Newsgroup: comp.lang.c

    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 2026-01-03 18:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    Actually, that's not necessarily true. A null pointer is not required
    to refer to the location with an address of 0. An integer constant
    expression with a value of 0, converted to a pointer type, is
    guaranteed to be a null pointer, but that pointer need not have a
    representation that has all bits 0. However, an integer expression
    that is not a constant expression, if converted to a pointer type, is
    not required to be a null pointer - it could convert to an entirely
    different pointer value.

    So an implementation could allow it simply by reserving a pointer to
    some other location (such as the last position in memory) as the
    representation of a null pointer.

    How do I?

    Even on an implementation that uses a pointer representing a machine
    address of 0 as a null pointer, such code can still work. In the C
    standard, "undefined behavior" means that the C standard imposes no
    requirements on the behavior. That doesn't prohibit other sources from
    imposing requirements. On such a system, it could define the behavior
    as accessing the flash.

    Indeed, every C compiler I've ever used has simply dereferenced a
    pointer that has a value of zero. In user mode, the kernel will
    generally trap and generate a SIGSEGV or equivalent. In kernel
    mode, it will just work, assuming that the CPU is configured to
    run with MMU disabled (or the MMU has a valid mapping for virtual
    address zero).

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sun Jan 4 15:56:20 2026
    From Newsgroup: comp.lang.c

    Lawrence D’Oliveiro <ldo@nz.invalid> writes:
    On Sat, 3 Jan 2026 17:24:54 -0800, Andrey Tarasevich wrote:

    The compiler on that embedded system is, of course, aware of the
    fact that address 0x00000000 is perfectly valid and should be left
    accessible. So, for that reason, the compiler is supposed to choose
    some other physical representation for null pointers ...

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra “invalid” value on top of that?

    In the Burroughs medium systems line, which is (was) a BCD machine addressed
    to the nibble, the bit pattern for a NULL pointer included 'undigits'
    (invalid BCD digits 0b1010 through 0b1111). Specifically, @CxEEEEEE@
    was the bit pattern for a NULL pointer on that architecture.

    In 40 years of OS, Hypervisor and firmware programming, I've never seen
    a C compiler that didn't dereference a NULL pointer when asked to on
    any modern CPU (x86, 88100, SPARC, MIPS, arm32, arm64).
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Jan 4 17:16:22 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 00:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems.  Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Now I guess that an embedded compiler targeting that certain
    architecture where dereferencing 0 makes sense will not treat
    it as UB.  But it is for sure a weird corner case.


    There are some common misconceptions about null pointers in C. A
    "null pointer" is the result of converting a "null pointer constant",
    or another "null pointer", to a pointer type. A null pointer constant
    is either an integer constant expression with the value 0 (such as the
    constant 0, or "1 - 1"), or "nullptr" in C23. You can use "NULL" from
    <stddef.h> as a null pointer constant.

    So if you write "int * p = 0;", then "p" holds a null pointer. If you
    write "int * p = (int *) sizeof(*p); p--;" then "p" does not hold a null pointer, even though it will hold the value "0".

    On virtually all real-world systems, including all embedded systems I
    have ever known (and that's quite a few), null pointers correspond to
    the address 0. But that does not mean that dereferencing a pointer
    whose value is 0 is necessarily UB.

    And even when dereferencing a pointer /is/ UB, a compiler can handle it
    as defined if it wants.

    I think that if you have a microcontroller with code at address 0, and a pointer of some object type (say, "const uint8_t * p" or "const uint32_t
    * p") holding the address 0, then using that to read the flash at that
    address is UB. But it is not UB because "p" holds a null pointer - it
    may or may not be a null pointer. It is UB because "p" does not point
    to an object.

    In practice, I have never seen an embedded compiler fail to do the
    expected thing when reading flash from address 0. (Typical use-cases
    are for doing CRC checks or signature checks on code, or for reading the initial stack pointer value or reset vector of the code.) If you want
    to be more confident, use a pointer to volatile type.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sun Jan 4 13:00:02 2026
    From Newsgroup: comp.lang.c

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be a
    piece of memory used by the implementation of C, or by the operating
    system. In which case, the undefined behavior that can occur as a
    result of dereferencing a null pointer would take the form of messing
    up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur when
    code with undefined behavior is executed is that it does exactly what
    you incorrectly believe it is required to do. As a result, you fail to
    be warned of the error in your beliefs.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 21:22:57 2026
    From Newsgroup: comp.lang.c

    On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:

    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be
    a piece of memory used by the implementation of C, or by the
    operating system. In which case, the undefined behavior that can
    occur as a result of dereferencing a null pointer would take the
    form of messing up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur
    when code with undefined behavior is executed is that it does
    exactly what you incorrectly believe it is required to do. As a
    result, you fail to be warned of the error in your beliefs.

    In this case, it’s not clear what choice you have.

    Call it a C language limitation ...
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Jan 4 21:42:13 2026
    From Newsgroup: comp.lang.c

    On Sun, 4 Jan 2026 14:38:00 +0100, highcrew wrote:

    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate, but in
    practical terms, if you expect it to take less than 1 second and it
    takes more than 10, you are already hitting ^C and conjecturing
    that something went horribly wrong :D

    What do Windows users hit instead of CTRL/C? Because CTRL/C means
    something different to them, doesn’t it?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sun Jan 4 16:53:16 2026
    From Newsgroup: comp.lang.c

    On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:
    On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    ...
    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur
    when code with undefined behavior is executed is that it does
    exactly what you incorrectly believe it is required to do. As a
    result, you fail to be warned of the error in your beliefs.

    In this case, it’s not clear what choice you have.

    I may have lost the thread here - which choice are you talking about?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sun Jan 4 16:58:40 2026
    From Newsgroup: comp.lang.c

    On 2026-01-04 08:38, highcrew wrote:
    ...
    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate,

    That's an incorrect characterization of the halting problem. There are
    many programs where it's entirely feasible, and even easy, to
    determine whether they will halt. What has been proven is that there
    must be some programs for which it cannot be done.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Jan 5 00:16:06 2026
    From Newsgroup: comp.lang.c

    On Sun, 4 Jan 2026 16:53:16 -0500, James Kuyper wrote:

    On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:

    On Sun, 4 Jan 2026 13:00:02 -0500, James Kuyper wrote:

    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    ...
    “Undefined behaviour” could also include “performing a valid
    memory access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur
    when code with undefined behavior is executed is that it does
    exactly what you incorrectly believe it is required to do. As a
    result, you fail to be warned of the error in your beliefs.

    In this case, it’s not clear what choice you have.

    I may have lost the thread here - which choice are you talking
    about?

    What if the entire machine address space is valid? Are C pointer types
    supposed to add an extra “invalid” value on top of that?
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 08:49:14 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 22:58, James Kuyper wrote:
    On 2026-01-04 08:38, highcrew wrote:
    ...
    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate,

    That's an incorrect characterization of the halting problem. There are
    many programs where it's entirely feasible, and even easy, to
    determine whether they will halt. What has been proven is that there
    must be some programs for which it cannot be done.

    That is also imprecise. The halting problem is about proving that
    there is no /single/ algorithm (or equivalently, program) that can
    determine the halting status of /all/ programs. It is not about the
    existence of a program whose halting status cannot be determined - it
    is that for any systematic method you might use to determine the
    halting status of programs, there are always programs for which that
    method won't work.

    In the context of static error checking for runtime UB, this means that
    no matter how smart a static analyser is, you can always write a program
    with runtime UB that the analyser won't identify for you. You can then
    extend that analyser to cover this new case, but no matter how great you
    make your analyser, there will always be programs with UB that it can't identify.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 08:55:54 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 16:51, Scott Lurndal wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 2026-01-03 18:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    Actually, that's not necessarily true. A null pointer is not required to
    refer to the location with an address of 0. An integer constant
    expression with a value of 0, converted to a pointer type, is guaranteed
    to be a null pointer, but that pointer need not have a representation
    that has all bits 0. However, an integer expression that is not a
    constant expression, if converted to a pointer type, is not required to
    be a null pointer - it could convert to an entirely different pointer
    value.
    So an implementation could allow it simply by reserving a pointer to
    some other location (such as the last position in memory) as the
    representation of a null pointer.

    How do I?

    Even on an implementation that uses a pointer representing a machine
    address of 0 as a null pointer, such code can still work. In the C
    standard, "undefined behavior" means that the C standard imposes no
    requirements on the behavior. That doesn't prohibit other sources from
    imposing requirements. On such a system, it could define the behavior as
    accessing the flash.

    Indeed, every C compiler I've ever used has simply dereferenced a
    pointer that has a value of zero. In user mode, the kernel will
    generally trap and generate a SIGSEGV or equivalent. In kernel
    mode, it will just work, assuming that the CPU is configured to
    run with MMU disabled (or the MMU has a valid mapping for virtual
    address zero).


    The context (embedded systems with flash at address 0) implies you don't
    have signals, an MMU, or other "big OS" features. While embedded
    systems over a certain size usually have some kind of memory protection
    unit, and interrupts/traps/exceptions for address or bus errors, you can
    be very confident that these will not trigger on attempts to read from
    address 0 if that is part of the normal code address area - the
    protection systems are not that fine-grained. (You might, while trying
    to catch a bad pointer bug, put a read watchpoint at address 0 in your debugger.)


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 09:07:16 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 19:00, James Kuyper wrote:
    On 2026-01-03 23:52, Lawrence D’Oliveiro wrote:
    On Sat, 3 Jan 2026 21:31:20 -0500, James Kuyper wrote:

    On 2026-01-03 21:19, Lawrence D’Oliveiro wrote:

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    Either that, or set aside one piece of addressable memory that is
    not available to user code. Note, in particular, that it might be a
    piece of memory used by the implementation of C, or by the operating
    system. In which case, the undefined behavior that can occur as a
    result of dereferencing a null pointer would take the form of messing
    up the C runtime or the operating system.

    “Undefined behaviour” could also include “performing a valid memory
    access”, could it not.

    Of course. In fact, the single most dangerous thing that can occur when
    code with undefined behavior is executed is that it does exactly what
    you incorrectly believe it is required to do. As a result, you fail to
    be warned of the error in your beliefs.

    I don't think that is the most dangerous thing that could happen with
    UB. Code that works as you expected during testing but fails after
    deployment is much worse. If the UB always results in the effect you
    intended, then the generated object code is correct for the task -
    even if the source code is unknowingly non-portable.

    And sometimes - especially in low-level embedded programming - getting
    the effect you want with the efficiency you want means knowingly writing
    code that has UB as far as C is concerned, but which results in the
    desired object code. Such code is inherently non-portable, but so is a
    lot of low-level embedded code. And you need to check the generated
    object code carefully, document it well, comment it well, and add any compile-time checks you can for compiler versions and other protection
    against someone re-using the code later without due consideration.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 5 06:41:13 2026
    From Newsgroup: comp.lang.c

    On 2026-01-04 19:16, Lawrence D’Oliveiro wrote:
    On Sun, 4 Jan 2026 16:53:16 -0500, James Kuyper wrote:

    On 2026-01-04 16:22, Lawrence D’Oliveiro wrote:
    ...
    In this case, it’s not clear what choice you have.

    I may have lost the thread here - which choice are you talking
    about?

    What if the entire machine address space is valid? Are C pointer
    types supposed to add an extra “invalid” value on top of that?

    An implementation of C (keep in mind that the implementation includes
    the compiler, the linker, and the C standard library) can use any
    location it wants for a null pointer, just so long as it makes sure
    that no C object accessible to the user is stored in that location. No
    user-defined object should be allocated in that location, and no
    standard library function (such as malloc() or asctime()) may return
    a pointer to that location. If memory is tight, the implementation may
    use that location to store anything that is never supposed to be
    accessible to the user.
    Alternatively, pointers can be larger than needed to store just a
    machine address, and at least one bit of the extra space can be reserved
    to identify the pointer as null.
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jan 5 15:39:28 2026
    From Newsgroup: comp.lang.c

    On 04/01/2026 12:51, highcrew wrote:
    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    This particular example is explained in several places, e.g.:

    https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633


    At a cursory read, that article looks okay. The lesson to learn is
    "look before you leap" - don't use data if you are not sure it is valid,
    and certainly don't add new uses of the data (such as debug prints) just before validity checks!

    It does, however, perpetuate the myth that there is a clear distinction between "classical compilers" or "non-optimising compilers" and
    "optimising compilers". That is not true - for any two standards-conforming
    compilers (or selections of flags for the same compiler), the
    same source code is equally defined or undefined. Source code with UB
    has UB whether it is "optimised" or not, though the colour of the
    resulting nasal daemons may vary.


    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-f30844f20e2a


    That one starts off with a bit of a jumble of misconceptions.


    To start with, "undefined behaviour" does not exist because of
    compatibility issues or the merging of different C variations into one standard C. It is a fundamental principle in programming because many computing functions are, mathematically, partial functions - they can
    only give a sensible defined result for some inputs. While it can
    sometimes be possible to verify the validity of inputs, it is often
    infeasible or at least very costly, especially in non-managed (compiled) languages. Pointer dereference, for example, only has defined behaviour
    if the pointer points to a valid object - otherwise the result is
    meaningless (even if some assembly code can be generated). Garbage in, garbage out - see the Babbage quotation.
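The partial-function point can be made concrete with integer division, which C defines only for a nonzero divisor and a representable quotient. A minimal caller-side check (a sketch; the function name is mine, not from the thread):

```c
#include <limits.h>
#include <stdbool.h>

/* Division is a partial function in C: the quotient is defined only
   when the divisor is nonzero and the result is representable
   (INT_MIN / -1 overflows on two's-complement machines). */
bool checked_div(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;   /* outside the defined domain: report, don't divide */
    *out = a / b;
    return true;
}
```

The check costs a branch per call, which is exactly the overhead C chooses not to impose by default.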

    The C standard is simply somewhat unusual in that it is more explicit
    about UB than many languages' documentation. And being a language
    intended for maximally efficient code, C leaves a number of things as UB
    where other languages might throw an exception or have other error handling.

    The definition given for "implementation defined behaviour" and
    "unspecified behaviour" is poor. (IMHO the comp.lang.c FAQ is
    inaccurate here.) In particular, "unspecified behaviour" does not need
    to be consistent. For example, the order of evaluation of function
    arguments is unspecified, and can be done in different orders at
    different call sites - even in identical source code. It can even be re-ordered between different invocations of the same code - perhaps due
    to complicated inter-procedural optimisations, inlining, code cloning,
    and constant propagation.

    It then goes on to say that the order of evaluation of the operands of
    "+" is implementation defined, when it is in fact a good example of unspecified behaviour that is /not/ implementation defined.

    Implementation defined behaviour is /not/ "bad" - pretty much all
    programs rely on implementation-defined behaviour such as the size of
    "int", character sets used, etc. Relying on implementation-defined
    behaviour reduces the portability of code, but that is not necessarily a
    bad thing.

    And while it is true that UB is "worse" than either
    implementation-defined behaviour or unspecified behaviour, it is not for either of the reasons given. The *nix program "date" does not need to
    contain UB in order to produce different results at different times.


    The examples of UB, and the consequences of them, are better.

    It also makes the mistake common in discussions of UB optimisations of concluding that the optimisation makes the code "wrong". Optimisations,
    such as the example of the "assign_not_null" function, are "logically
    valid" and /correct/ from the given source code. Optimisations have not
    made the code "wrong", nor has the compiler. The source code is correct
    for a given validity subset of its parameter types, and the object code
    is correct for that same subset. If the source code is intended to work
    over a wider range of inputs, then it is the source code that is wrong -
    not the optimiser or the optimised code.


    - Paul

    Hey, thanks for the pointers.
    I found the second a really good write up!


    I've seen worse, but it could be better.

  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Jan 6 13:08:57 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> wrote:
    On 04/01/2026 00:25, highcrew wrote:
    On 1/4/26 12:15 AM, highcrew wrote:
    I have a horrible question now, but that's for a
    separate question...

    And the question is:

    Embedded systems. Address 0x00000000 is mapped to the flash.
    I want to assign a pointer to 0x00000000 and dereference it to
    read the first word.
    That's UB.

    How do I?

    Now I guess that an embedded compiler targeting that certain
    architecture where dereferencing 0 makes sense will not treat
    it as UB. But it is for sure a weird corner case.


    There are some common misconceptions about null pointers in C. A "null pointer" is the result of converting a "null pointer constant", or
    another "null pointer", to a pointer type. A null pointer constant is either an integer constant expression with the value 0 (such as the
    constant 0, or "1 - 1"), or "nullptr" in C23. You can use "NULL" from <stddef.h> as a null pointer constant.

    So if you write "int * p = 0;", then "p" holds a null pointer. If you
    write "int * p = (int *) sizeof(*p); p--;" then "p" does not hold a null pointer, even though it will hold the value "0".

    On virtually all real-world systems, including all embedded systems I
    have ever known (and that's quite a few), null pointers correspond to
    the address 0. But that does not mean that dereferencing a pointer
    whose value is 0 is necessarily UB.

    And even when dereferencing a pointer /is/ UB, a compiler can handle it
    as defined if it wants.

    I think that if you have a microcontroller with code at address 0, and a pointer of some object type (say, "const uint8_t * p" or "const uint32_t
    * p") holding the address 0, then using that to read the flash at that address is UB. But it is not UB because "p" holds a null pointer - it
    may or may not be a null pointer. It is UB because "p" does not point
    to an object.

    In practice, I have never seen an embedded compiler fail to do the
    expected thing when reading flash from address 0. (Typical use-cases
    are for doing CRC checks or signature checks on code, or for reading the initial stack pointer value or reset vector of the code.) If you want
    to be more confident, use a pointer to volatile type.

    For curiosity I tried the following:

    #include <stdint.h>

    uint32_t
    read_at0(uint32_t * p) {
        if (!p) {
            return *p;   /* deliberate read through a null pointer */
        } else {
            return 0;
        }
    }

    that is, we read through a pointer only when it is a null pointer.
    Using gcc-12 with command line:

    arm-none-eabi-gcc -O3 -fverbose-asm -fno-builtin -Wall -g -mthumb -mcpu=cortex-m3 -c ts_null.c

    I get the following assembly:

    00000000 <read_at0>:
    0: b108 cbz r0, 6 <read_at0+0x6>
    2: 2000 movs r0, #0
    4: 4770 bx lr
    6: 6803 ldr r3, [r0, #0]
    8: deff udf #255 @ 0xff
    a: bf00 nop

    So the compiler generates an actual access, but then, instead of
    returning the value, it executes an undefined opcode. Without the
    test for a null pointer I get a simple access to memory.

    So at least with gcc the access works as long as the compiler does
    not know that it is accessing a null pointer. But if the compiler
    can infer that the pointer is null, the generated code may do
    strange things.

    Putting a volatile qualifier on p gives working code, but apparently
    disables optimization. Also, this looks fragile. So if I needed
    to access address 0, I would probably use an assembly routine to do it.
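For completeness, a sketch of the idiom many embedded code bases use instead of inline assembly: routing the address through uintptr_t and a volatile-qualified pointer. This relies on implementation-defined integer-to-pointer conversion (C11 6.3.2.3), not on behaviour the standard guarantees, and the helper name is mine:

```c
#include <stdint.h>

/* Read a 32-bit word from an arbitrary machine address.
   The integer-to-pointer conversion is implementation-defined,
   and the volatile qualifier forces an actual load instruction. */
static inline uint32_t read_word_at(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```

On the microcontroller discussed above one would call read_word_at(0x00000000u); on a hosted system the helper can only be exercised on addresses of real objects.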
    --
    Waldek Hebisch
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Jan 6 21:59:56 2026
    From Newsgroup: comp.lang.c

    On Tue, 6 Jan 2026 13:08:57 -0000 (UTC), Waldek Hebisch wrote:

    Putting a volatile qualifier on p gives working code, but apparently
    disables optimization. Also, this looks fragile. So if I needed
    to access address 0, I would probably use an assembly routine to do it.

    Seems to be a fundamental C language limitation, wouldn't you say?
  • From Paul J. Lucas@paul@lucasmail.org to comp.lang.c on Tue Jan 6 18:08:22 2026
    From Newsgroup: comp.lang.c

    On 1/5/26 6:39 AM, David Brown wrote:
    On 04/01/2026 12:51, highcrew wrote:
    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-
    f30844f20e2a

    That one starts off with a bit of a jumble of misconceptions.


    To start with, "undefined behaviour" does not exist because of
    compatibility issues or the merging of different C variations into one standard C.

    ...

    The C standard is simply somewhat unusual in that it is more explicit
    about UB than many languages' documentation. And being a language
    intended for maximally efficient code, C leaves a number of things as UB where other languages might throw an exception or have other error
    handling.

    Other languages had the luxury of doing that. As the article pointed
    out, C had existed for over a decade before the standard and there were
    many programs in the wild that relied on their existing behaviors. By
    this time, the C standard could not retroactively "throw an exception or
    have other error handling" since it would have broken those programs, so
    it _had_ to leave many things as UB explicitly. Hence, the article
    isn't wrong.

    Implementation defined behaviour is /not/ "bad" - pretty much all
    programs rely on implementation-defined behaviour such as the size of
    "int", character sets used, etc. Relying on implementation-defined behaviour reduces the portability of code, but that is not necessarily a
    bad thing.

    It's "bad" if a naive programmer isn't aware it's implementation defined
    and just assumes it's defined however it's defined on his machine.

    And while it is true that UB is "worse" than either implementation-
    defined behaviour or unspecified behaviour, it is not for either of the reasons given. The *nix program "date" does not need to contain UB in order to produce different results at different times.

    Sure, but the article didn't mean such cases. It meant cases like incrementing a signed integer past INT_MAX. A program could
    legitimately give different answers for the same line of code at
    different times.
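A caller can keep the increment case inside defined territory by testing the bound first; a minimal sketch (the helper name is mine):

```c
#include <limits.h>
#include <stdbool.h>

/* Incrementing past INT_MAX is UB; check the bound before incrementing. */
bool checked_inc(int *x)
{
    if (*x == INT_MAX)
        return false;   /* refuse rather than overflow */
    ++*x;
    return true;
}
```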

    It also makes the mistake common in discussions of UB optimisations of
    concluding that the optimisation makes the code "wrong". Optimisations,
    such as the example of the "assign_not_null" function, are "logically
    valid" and /correct/ from the given source code. Optimisations have not
    made the code "wrong", nor has the compiler. The source code is correct
    for a given validity subset of its parameter types, and the object code
    is correct for that same subset. If the source code is intended to work
    over a wider range of inputs, then it is the source code that is wrong -
    not the optimiser or the optimised code.
    What the author meant is that optimization can make UB manifest in more
    bizarre ways than not optimizing would. Code that contains UB
    is always wrong.

    - Paul
  • From candycanearter07@candycanearter07@candycanearter07.nomail.afraid to comp.lang.c on Wed Jan 7 06:40:03 2026
    From Newsgroup: comp.lang.c

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 21:42 this Sunday (GMT):
    On Sun, 4 Jan 2026 14:38:00 +0100, highcrew wrote:

    Not differently from halting problem: sure, it is theoretically
    impossible to understand if a program will terminate, but in
    practical terms, if you expect it to take less than 1 second and it
    takes more than 10, you are already hitting ^C and conjecturing
    that something went horribly wrong :D

    What do Windows users hit instead of CTRL/C? Because CTRL/C means
    something different to them, doesn't it?


    ctrl-alt-delete?
    --
    user <candycane> is generated from /dev/urandom
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Jan 7 11:25:50 2026
    From Newsgroup: comp.lang.c

    On 07/01/2026 03:08, Paul J. Lucas wrote:
    On 1/5/26 6:39 AM, David Brown wrote:
    On 04/01/2026 12:51, highcrew wrote:
    On 1/4/26 2:10 AM, Paul J. Lucas wrote:
    Perhaps a slightly better explanation of the same example:

    https://medium.com/@pauljlucas/undefined-behavior-in-c-and-c-
    f30844f20e2a

    That one starts off with a bit of a jumble of misconceptions.


    To start with, "undefined behaviour" does not exist because of
    compatibility issues or the merging of different C variations into one
    standard C.

    ...

    The C standard is simply somewhat unusual in that it is more explicit
    about UB than many languages' documentation. And being a language
    intended for maximally efficient code, C leaves a number of things as
    UB where other languages might throw an exception or have other error
    handling.

    Other languages had the luxury of doing that. As the article pointed
    out, C had existed for over a decade before the standard and there were
    many programs in the wild that relied on their existing behaviors. By
    this time, the C standard could not retroactively "throw an exception or
    have other error handling" since it would have broken those programs, so
    it _had_ to leave many things as UB explicitly. Hence, the article
    isn't wrong.


    UB as a /concept/ does not exist because of compatibility issues.
    Certain particular things may have been declared UB in C because of compatibility between different existing compilers or different targets (though it is more common for such things to be declared "implementation-defined" rather than UB). I am, however, having
    difficulty finding examples of that for run-time UB. (There are plenty
    of situations where there is UB that could be identified at compile-time
    or link time, but the standard does not require toolchains to diagnose.)

    The idea that something can be expressed in a programming language,
    without errors in syntax, but have no meaningful or correct behaviour,
    is not new, and not restricted to C. UB in C is not different from
    asking for the square root of a negative number in the real domain, or
    asking a kid to add 3 and 4 using the fingers of one hand.


    Implementation defined behaviour is /not/ "bad" - pretty much all
    programs rely on implementation-defined behaviour such as the size of
    "int", character sets used, etc. Relying on implementation-defined
    behaviour reduces the portability of code, but that is not necessarily a
    bad thing.

    It's "bad" if a naive programmer isn't aware it's implementation defined
    and just assumes it's defined however it's defined on his machine.


    Sure. But that applies to all portability issues - people make all
    sorts of assumptions about the system their code will be used on, of
    which the implementation-defined aspects of C are only a small part.

    And while it is true that UB is "worse" than either implementation-
    defined behaviour or unspecified behaviour, it is not for either of
    the reasons given. The *nix program "date" does not need to contain
    UB in order to produce different results at different times.

    Sure, but the article didn't mean such cases.

    If the author meant something different, he/she should have written
    something different.

    It meant cases like
    incrementing a signed integer past INT_MAX. A program could
    legitimately give different answers for the same line of code at
    different times.

    It could also give different answers for unspecified behaviour :

    #include <stdio.h>

    int first(void) { printf("1 "); return 1; }
    int second(void) { printf("2 "); return 2; }

    int main(void) { int x = first() + second(); printf("= %d\n", x); }

    The evaluation order of the operands of the addition - and therefore the
    order of the debug prints - is unspecified. Not only is the order not specified by the C standards, but it is not something that
    needs to be consistent even between different runs of the same code.

    So this "giving different answers" is not something special about UB.


    It also makes the mistake common in discussions of UB optimisations of
    concluding that the optimisation makes the code "wrong".
    Optimisations, such as the example of the "assign_not_null" function,
    are "logically valid" and /correct/ from the given source code.
    Optimisations have not made the code "wrong", nor has the compiler.
    The source code is correct for a given validity subset of its
    parameter types, and the object code is correct for that same subset.
    If the source code is intended to work over a wider range of inputs,
    then it is the source code that is wrong - not the optimiser or the
    optimised code.
    What the author meant is that optimization can make UB manifest in more bizarre ways than not optimizing would. Code that contains UB
    is always wrong.


    If the author meant something different from what he wrote, it would
    have been better if he wrote what he meant.

    Yes, in practice you /can/ get a wider variety of strange results from
    code with UB if you use a highly optimising compiler compared to a
    simple compiler. But there are no guarantees there - you can get
    strange results from UB when not optimising, and perhaps enabling
    optimisation will give you simpler and more consistent results (possibly
    the results you expected, possibly not).

    It is fine to tell people about some of the strange possibilities that
    can occur when you have UB. But anything that even sounds vaguely like
    a suggestion that you can mitigate the dangers of UB by disabling
    optimisation is bad. Far too many C programmers believe that.


  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Wed Jan 7 06:31:31 2026
    From Newsgroup: comp.lang.c

    On 2026-01-06 21:08, Paul J. Lucas wrote:
    ...
    What the author meant is that optimization can make UB manifest in more bizarre ways than not optimizing would. Code that contains UB
    is always wrong.

    "undefined behavior" is defined by the C standard as referring to
    behavior on which "this international standard imposes no requirements".
    It remains UB even if some other document imposes requirements on the
    behavior. In particular, if a given implementation implements an
    extension that gives defined behavior to code that the C standard does
    not, it's still UB, but it's entirely reasonable for users of that implementation to decide they want to use that extension.
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Jan 7 14:10:46 2026
    From Newsgroup: comp.lang.c

    On Tue, 6 Jan 2026 18:08:22 -0800
    "Paul J. Lucas" <paul@lucasmail.org> wrote:


    Other languages had the luxury of doing that. As the article pointed
    out, C had existed for over a decade before the standard and there
    were many programs in the wild that relied on their existing
    behaviors. By this time, the C standard could not retroactively
    "throw an exception or have other error handling" since it would have
    broken those programs, so it _had_ to leave many things as UB
    explicitly. Hence, the article isn't wrong.


    O.T.
    Rust has existed for 13 years without a standard. That did not prevent
    it from becoming more hyped than Ada in her heyday.

    Go has existed without a standard for how long? 20 years?
    But at least in the case of Go there is an official specification that
    is not rewritten every Tuesday.

  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Wed Jan 7 20:48:03 2026
    From Newsgroup: comp.lang.c

    On Tue 1/6/2026 5:08 AM, Waldek Hebisch wrote:

    I get the following assembly:

    00000000 <read_at0>:
    0: b108 cbz r0, 6 <read_at0+0x6>
    2: 2000 movs r0, #0
    4: 4770 bx lr
    6: 6803 ldr r3, [r0, #0]
    8: deff udf #255 @ 0xff
    a: bf00 nop

    So the compiler generates an actual access, but then, instead of
    returning the value, it executes an undefined opcode. Without the
    test for a null pointer I get a simple access to memory.


    When it comes to invalid (or missing, in C++) `return` statements, GCC
    tends to adhere to a "punitive" approach in optimized code - it injects instructions to deliberately cause a crash/segfault in such cases.

    Clang on the other hand tends to stick to the uniform approach based on
    the "UB cannot happen" methodology, i.e. your code sample would be
    translated under the "p is never null" assumption, and the function
    would fold into a simple unconditional `return 0`.
    --
    Best regards,
    Andrey
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Thu Jan 8 23:56:16 2026
    From Newsgroup: comp.lang.c

    On Wed, 7 Jan 2026 20:48:03 -0800, Andrey Tarasevich wrote:

    When it comes to invalid (or missing, in C++) `return` statements,
    GCC tends to adhere to a "punitive" approach in optimized code - it
    injects instructions to deliberately cause a crash/segfault in such
    cases.

    Clang on the other hand tends to stick to the uniform approach based
    on the "UB cannot happen" methodology, i.e. your code sample would
    be translated under "p is never null" assumption, and the function
    will fold into a simple unconditional `return 0`.

    Which one is more likely to lead to unexpected, hard-to-debug results?
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Fri Jan 9 01:42:53 2026
    From Newsgroup: comp.lang.c

    highcrew <high.crew3868@fastmail.com> writes:

    Hello,

    While I consider myself reasonably good as C programmer, I still
    have difficulties in understanding undefined behavior.
    I wonder if anyone in this NG could help me.

    [snip - original post quoted in full]
    Could someone drive me into this reasoning? I know there is a lot of thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!

    The important thing to realize is that the fundamental issue here
    is not a technical question but a social question. In effect what
    you are asking is "why doesn't gcc (or clang, or whatever) do what
    I want or expect?". The answer is different people want or expect
    different things. For some people the behavior described is
    egregiously wrong and must be corrected immediately. For other
    people the compiler is acting just as they think it should,
    nothing to see here, just fix the code and move on to the next
    bug. Different people have different priorities.

    After observing that, I think the right question is something like
    "Given that compilers act in these surprising ways, how should I
    protect my code so that it doesn't fall prey to the death-by-UB
    syndrome, or what can I do to diagnose a possibly death-by-UB
    situation when a strange bug crops up?" I don't pretend to have
    good answers to these questions. The best advice I can give
    (besides seeking help from others with more experience) is to be
    persistent, and to realize that the skills needed for combating a
    death-by-UB syndrome are rather different from the skills needed
    for regular programming. I have been in the situation of being
    made responsible for finding and correcting a death-by-UB kind of
    symptom, and, what's worse, in a programming environment where I
    didn't have a great deal of familiarity or experience. Despite
    those drawbacks the bug got diagnosed and fixed, and I attribute
    that result mostly to tenacity and to being willing to consider
    unusual or unfamiliar points of view.
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Jan 9 14:36:47 2026
    From Newsgroup: comp.lang.c

    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    [snip - original post quoted in full]

    The important thing to realize is that the fundamental issue here
    is not a technical question but a social question. In effect what
    you are asking is "why doesn't gcc (or clang, or whatever) do what
    I want or expect?". The answer is different people want or expect
    different things. For some people the behavior described is
    egregiously wrong and must be corrected immediately. For other
    people the compiler is acting just as they think it should,
    nothing to see here, just fix the code and move on to the next
    bug. Different people have different priorities.


    I have a hard time imagining the sort of people who would object if the
    compiler generated the same code as today, but issued a diagnostic.
    Probably in the same style that it often produces in similar situations:
    warning: array subscript 4 is above array bounds of 'int[4]'
    [-Warray-bounds]

  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Fri Jan 9 15:54:48 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    [snip]
    For the lazy, I report it here:

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    This is compiled (with no warning whatsoever) into:

    exists_in_table:
    mov eax, 1
    ret
    table:
    .zero 16


    Well, this is *obviously* wrong. And sure, so is the original code,
    but I find it hard to think that the compiler isn't able to notice it,
    given that it is even "exploiting" it to produce very efficient code.

    I understand the formalism: the resulting assembly is formally
    "correct", in that UB implies that anything can happen.
    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.

    Wouldn't it be more sensible to have a compilation error, or
    at least a warning? The compiler will be happy even with -Wall
    -Wextra -Werror.

    There's plenty of documentation, articles and presentations that
    explain how this can make very efficient code... but nothing
    will answer this question: do I really want to be efficiently
    wrong?

    I mean, yes I would find the problem, thanks to my 100% coverage
    unit testing, but couldn't the compiler give me a hint?

    Could someone drive me into this reasoning? I know there is a lot of
    thinking behind it, yet everything seems to me very incorrect!
    I'm in deep cognitive dissonance here! :) Help!


    Personally, I am not shocked by gcc's behavior in this case. Saddened,
    maybe, but not shocked.
    What does shock me is a slightly modified variant of it.

    struct {
        int table[4];
        int other_table[4];
    } bar;

    int exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++) {
            if (bar.table[i] == v)
                return 1;
        }
        return 0;
    }

    The original variant is unlikely to be present in the code bases that
    I care about professionally, but something akin to the modified
    variant could be.
    Godbolt shows that this behaviour was first introduced in gcc 5 and
    backported to the gcc 4 series in gcc 4.8.

    One of my suspect code bases is currently at gcc 4.7. I was
    considering moving to 5.3. In light of that example, I am likely not
    going to do it.
    Unless there is a magic flag that disables this optimization.
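    For what it's worth, gcc does document an option in this area:
    -faggressive-loop-optimizations (enabled by default) is the switch
    that lets the optimizer derive loop trip counts from the assumption
    that no iteration invokes UB, and it has a negative form. Whether the
    negative form suppresses this exact transformation on any given gcc
    release is something to verify against that release; a
    build-configuration sketch, with the file name purely illustrative:

```shell
# Candidate "magic flag" (verify against your gcc version's docs and the
# emitted assembly): disable UB-based loop trip-count deduction.
gcc -O2 -fno-aggressive-loop-optimizations -S -o exists_in_table.s exists_in_table.c
```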

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From wij@wyniijj5@gmail.com to comp.lang.c on Sat Jan 10 00:08:24 2026
    From Newsgroup: comp.lang.c

    On Fri, 2026-01-09 at 15:54 +0200, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    <snip>

    int table[4] = {0};
    int exists_in_table(int v)
    {
        // return true in one of the first 4 iterations
        // or UB due to out-of-bounds access
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return 1;
        }
        return 0;
    }

    <snip>

    Yet I can't think of any situation where the resulting assembly
    could be considered sensible. The compiled function will
    basically return 1 for any input, and the final program will be
    buggy.
    It is UB; what the implementation does is irrelevant.
    The for loop above is equivalent to:

    for (int i = 0; i <= 3; i++) {
        if (table[i] == v) return 1;
    }
    if (table[4] == v) {  // undefined behavior
        return 1;
    }
    // undefined behavior

    So always returning 1 is a correct compilation: every execution that
    would return 0 has to read table[4] first, which is UB, so there is
    no defined execution in which exists_in_table(v) returns anything
    other than 1.
    <snip>

    Personally, I am not shocked by gcc behavior in this case. May be,
    saddened, but not shocked.
    I am shocked by slightly modified variant of it.

    <snip>
    I am also shocked that many seemingly missed it.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.c on Fri Jan 9 20:14:04 2026
    From Newsgroup: comp.lang.c

    On 2026-01-09, Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    highcrew <high.crew3868@fastmail.com> writes:

    <snip>

    The important thing to realize is that the fundamental issue here
    is not a technical question but a social question. In effect what
    you are asking is "why doesn't gcc (or clang, or whatever) do what
    I want or expect?". The answer is different people want or expect
    different things. For some people the behavior described is
    egregiously wrong and must be corrected immediately. For other
    people the compiler is acting just as they think it should,
    nothing to see here, just fix the code and move on to the next
    bug. Different people have different priorities.


    I have hard time imagining sort of people that would have objections in
    case compiler generates the same code as today, but issues diagnostic.

    If false positives for the diagnostic occur frequently, there will
    be legitimate complaints.

    If there is a simple switch for it, the diagnostic will get turned
    off, and then it no longer serves its purpose of catching errors.

    There are all kinds of optimizations compilers commonly do that could
    also indicate erroneous situations. For instance, eliminating dead
    code.

    // code portable among several types of systems:

    switch (sizeof var) {
    case 2: ...
    case 4: ...
    case 8: ...
    }

    sizeof var is a compile-time constant, expected to be 2, 4 or 8.
    The other cases are unreachable code.

    Suppose every time the compiler eliminates unreachable code, it
    issues a diagnostic "foo.c:42: 3 lines of unreachable code removed".

    That would be annoying when the programmer knows about dead code
    elimination and is counting on it.

    We also have to consider that not all code is written directly by hand.

    Code generation techniques (including macros) can produce "weird" code
    in some of their corner cases. The code is correct, and it would take
    more complexity to identify those cases and generate more idiomatic
    code; it is left to the compiler to clean up.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Jan 10 18:19:05 2026
    From Newsgroup: comp.lang.c

    On Fri, 9 Jan 2026 20:14:04 -0000 (UTC)
    Kaz Kylheku <046-301-5902@kylheku.com> wrote:

    <snip>



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Jan 10 18:41:06 2026
    From Newsgroup: comp.lang.c

    On Fri, 9 Jan 2026 20:14:04 -0000 (UTC)
    Kaz Kylheku <046-301-5902@kylheku.com> wrote:

    On 2026-01-09, Michael S <already5chosen@yahoo.com> wrote:
    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    <snip>


    I have hard time imagining sort of people that would have
    objections in case compiler generates the same code as today, but
    issues diagnostic.

    If false positives occur for the diagnostic frequently, there
    will be legitimate complaint.

    If there is only a simple switch for it, it will get turned off
    and then it no longer serves its purpose of catching errors.

    There are all kinds of optimizations compilers commonly do that could
    also be erroneous situations. For instance, eliminating dead code.


    <snip>

    I am not talking about some general abstraction, but about a specific
    case.
    Your example is irrelevant.
    -Warray-bounds has existed for a long time.
    -Warray-bounds=1 is part of the -Wall set.
    The message 'array subscript nnn is above array bounds' fits this
    particular case as well as any other case in which the compiler does
    not forget to issue it.
    Defending gcc's behavior of not issuing an enabled warning in a
    situation where the compiler has certainly detected the out-of-bounds
    access sounds like Stockholm syndrome.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tristan Wibberley@tristan.wibberley+netnews2@alumni.manchester.ac.uk to comp.lang.c on Sat Jan 10 17:08:43 2026
    From Newsgroup: comp.lang.c

    On 09/01/2026 20:14, Kaz Kylheku wrote:
    If there is only a simple switch for it, it will get turned off
    and then it no longer serves its purpose of catching errors.


    But it might still serve its purpose for assigning criminal liability.
    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jan 11 11:48:08 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    highcrew <high.crew3868@fastmail.com> writes:

    <snip>

    I have hard time imagining sort of people that would have objections
    in case compiler generates the same code as today, but issues
    diagnostic.

    It depends on what the tradeoffs are. For example, given a
    choice, I would rather have an option to prevent this particular
    death-by-UB optimization than an option to issue a diagnostic.
    Having both costs more effort than having just one.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Jan 11 22:52:56 2026
    From Newsgroup: comp.lang.c

    On Sun, 11 Jan 2026 11:48:08 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 09 Jan 2026 01:42:53 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    highcrew <high.crew3868@fastmail.com> writes:

    <snip>

    I have hard time imagining sort of people that would have objections
    in case compiler generates the same code as today, but issues
    diagnostic.

    It depends on what the tradeoffs are. For example, given a
    choice, I would rather have an option to prevent this particular
    death-by-UB optimization than an option to issue a diagnostic.
    Having both costs more effort than having just one.

    Me too.
    But there are limits to what is considered negotiable by worshippers
    of nasal demons and what is beyond that. A warning is negotiable;
    turning off the transformation is most likely beyond.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sun Jan 11 22:53:53 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:
    [...]
    But there are limits to what considered negotiable by worshippers of
    nasal demons and what is beyond that. Warning is negotiable, turning
    off the transformation is most likely beyond.

    Your use of the word "worshippers" suggests a misunderstanding on
    your part.

    I certainly do not "worship" anything about C. I don't think
    anyone else you've been talking to does either. I have a pretty
    good understanding of it. There are plenty of things I don't
    particularly like.

    In the vast majority of my posts here, I simply try to explain what
    the standard actually says and offer advice based on that.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 11:44:43 2026
    From Newsgroup: comp.lang.c

    On Sun, 11 Jan 2026 22:53:53 -0800
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:
    [...]
    But there are limits to what considered negotiable by worshippers of
    nasal demons and what is beyond that. Warning is negotiable, turning
    off the transformation is most likely beyond.

    Your use of the word "worshippers" suggests a misunderstanding on
    your part.

    I certainly do not "worship" anything about C. I don't think
    anyone else you've been talking to does either. I have a pretty
    good understanding of it. There are plenty of things I don't
    particularly like.

    In the vast majority of my posts here, I simply try to explain what
    the standard actually says and offer advice based on that.


    About my personal vocabulary.

    Normally, the phrase "worshippers of nasal demons" in my posts refers
    to a faction among the developers and maintainers of the gcc and
    clang compilers. I think that it's not an unusual use of the phrase,
    but I could be wrong about that.

    AFAIK, you are not a gcc or clang maintainer, so not a "worshipper".
    When I want to characterize [in derogatory fashion] people who have
    no direct influence on the behavior of common software tools, but
    share the attitude of "worshippers" toward UBs, I use the phrase
    'language lawyers'.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 16:28:57 2026
    From Newsgroup: comp.lang.c

    On Thu, 1 Jan 2026 22:54:05 +0100
    highcrew <high.crew3868@fastmail.com> wrote:

    <snip>


    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is
    UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at
    influencing other people, so I cannot get what I want.
    [/O.T.]

    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well? gcc's code generator does not think so.

            .file   "ub.c"
            .text
            .p2align 4
            .globl  foo1
            .def    foo1;   .scl    2;      .type   32;     .endef
            .seh_proc foo1
    foo1:
            .seh_endprologue
            movl    $1, %eax
            ret
            .seh_endproc
            .p2align 4
            .globl  foo2
            .def    foo2;   .scl    2;      .type   32;     .endef
            .seh_proc foo2
    foo2:
            .seh_endprologue
            leaq    16(%rcx), %rax
            addq    $36, %rcx
    .L5:
            cmpl    %edx, (%rax)
            je      .L6
            addq    $4, %rax
            cmpq    %rcx, %rax
            jne     .L5
            xorl    %eax, %eax
            ret
            .p2align 4,,10
            .p2align 3
    .L6:
            movl    $1, %eax
            ret
            .seh_endproc
            .ident  "GCC: (Rev8, Built by MSYS2 project) 15.2.0"

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From bart@bc@freeuk.com to comp.lang.c on Mon Jan 12 15:58:15 2026
    From Newsgroup: comp.lang.c

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    <snip>

    According to C Standard, access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use language (or, better, standardize dialect of C) in which behavior in this case is defined, but I am bad at influencing other
    people. So can not get what I want.
    [/O.T.]


    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?

    I think then it would be better writing it like this:

    struct bar1 {
        union {
            struct {
                int table[4];
                int other_table[4];
            };
            int xtable[8];
        };
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->xtable[i] == v)
                return 1;
        return 0;
    }

    At least your intent is signaled to whoever is reading your code.

    But I don't know if the UB goes away if you intend writing to .table
    and .other_table and reading those values via .xtable (I can't
    remember the rules).

    I'm not even sure that there is no padding between .table and
    .other_table.
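    The padding worry can at least be checked at compile time. A minimal
    sketch (using the union variant of struct bar1 from above; the standard
    does not forbid padding here, so these assertions are a safety net, not
    a guarantee):

```c
#include <stddef.h>

struct bar1 {
    union {
        struct {
            int table[4];
            int other_table[4];
        };
        int xtable[8];
    };
};

/* Compile-time checks: if the implementation inserted padding between
   table and other_table, or made the union larger than 8 ints,
   compilation fails instead of the code silently misbehaving. */
_Static_assert(offsetof(struct bar1, other_table) == 4 * sizeof(int),
               "padding between table and other_table");
_Static_assert(sizeof(struct bar1) == 8 * sizeof(int),
               "unexpected padding in struct bar1");
```

    (_Static_assert requires C11 or later; C23 also spells it
    static_assert.)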

    (In my systems language, the behaviour of your original foo1, in an
    equivalent program, is well-defined. But not of foo2, given that you may
    read some garbage value beyond the struct, which may or may not be
    within valid memory.)


    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as well?

    Given that you may be reading garbage as I said, whether it is UB or not
    is irrelevant; your program has a bug.

    Unless you can add extra context which would make that reasonable. For example, the struct is within an array, it's not the last element, so it
    will read the first element of .other_table, and you are doing this
    knowingly rather than through oversight.

    It might well be UB, but that is a separate problem.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Mon Jan 12 08:03:31 2026
    From Newsgroup: comp.lang.c

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does. However, since this is the trailing array member of
    the struct, GCC does not want to accidentally suppress the classic
    "struct hack". It assumes that quite possibly the pointer passed to the function points to a struct object allocated through the "struct hack" technique.

    Add an extra field after the trailing array and `foo2` will also fold
    into `return 1`, just like `foo1`.

    Perhaps there's a switch in GCC that would outlaw the classic "struct
    hack"... But in any case, it is not prohibited by default for
    compatibility with pre-C99 code.
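    For reference, the C99 flexible array member is the standardized
    descendant of the "struct hack". A minimal sketch (the struct and
    function names here are illustrative, not from the thread):

```c
#include <stdlib.h>

/* C99 flexible array member: the last member has no declared bound,
   and the usable length is whatever the allocation provides. */
struct vec {
    int len;
    int data[];  /* flexible array member */
};

static struct vec *vec_alloc(int n)
{
    /* Room for the header plus n trailing ints. */
    struct vec *v = malloc(sizeof *v + (size_t)n * sizeof v->data[0]);
    if (v)
        v->len = n;
    return v;
}
```

    With this declaration, v->data[i] is well defined for 0 <= i < n,
    which is exactly the access pattern GCC is being careful not to break
    for a trailing array.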
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 19:36:52 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Do you have citation from the Standard?

    However, since this is the trailing array member
    of the struct, GCC does not want to accidentally suppress the classic "struct hack". It assumes that quite possibly the pointer passed to
    the function points to a struct object allocated through the "struct
    hack" technique.

    That much I understand myself.
    Here table plays the role of a flexible array member (FAM). A lot of
    code depends on this pattern; it's rather standard practice in
    communications programming, especially so in C90, where there were no
    FAMs, and in C++, where FAMs do not exist even today. A production
    compiler like gcc really has no option except to handle it as expected
    by millions of programmers.

    But I was interested in the "opinion" of the C Standard rather than of
    the gcc compiler.
    Is it full nasal UB or merely "implementation-defined behavior"?


    Add an extra field after the trailing array and `foo2` will also fold
    into `return 1`, just like `foo1`.

    Perhaps there's a switch in GCC that would outlaw the classic "struct hack"... But in any case, it is not prohibited by default for
    compatibility with pre-C99 code.


    gcc indeed has something of this sort: -fstrict-flex-arrays=3.
    But at the moment it does not appear to affect code generation [in
    this particular example].



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 20:08:21 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.
    [/O.T.]


    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays. Making the same
    pattern defined in 'struct' is less important in practice, but desirable
    for consistency between arrays and structures.

    I think then it would be better writing it like this:

    struct bar1 {
        union {
            struct {
                int table[4];
                int other_table[4];
            };
            int xtable[8];
        };
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->xtable[i] == v)
                return 1;
        return 0;
    }

    At least your intent is signaled to whomever is reading your code.


    If we were using a language or dialect in which the behavior is
    defined, why would you consider the second variant better?
    I don't mean in this particular, very simplified example, but
    generally, where the layout is more complicated.

    But I don't know if the UB goes away if you intend writing to .table
    and .other_table and reading those values via .xtable (I can't
    remember the rules).

    I'm not even sure that there is no padding between .table and
    .other_table.

    Considering that they are both 'int', I don't think that could happen,
    even in standard C. In "my" dialect, padding in such a situation can
    be explicitly disallowed by the standard.


    (In my systems language, the behaviour of your original foo1, in an equivalent program, is well-defined. But not of foo2, given that you
    may read some garbage value beyond the struct, which may or may not
    be within valid memory.)


    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Given that you may be reading garbage as I said, whether it is UB or
    not is irrelevant; your program has a bug.

    Whether there is a bug or not depends on what the caller passed to
    foo2(). There are a great many programs around that do similar things
    and contain no bugs. Most typically, the caller creates the argument p
    by casting a char array that is long enough for the table member to
    hold more than 4 elements.
    Without seeing the code on the caller's side we can only guess, from
    the suspect way the code is written, that there is a bug. But we can't
    be sure.


    Unless you can add extra context which would make that reasonable.
    For example, the struct is within an array, it's not the last
    element, so it will read the first element of .other_table, and you
    are doing this knowingly rather than through oversight.

    It might well be UB, but that is a separate problem.




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Jan 12 20:02:20 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.
    [/O.T.]


    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays.

    So declare it as int table[4][4].

    $ cat /tmp/a.c
    #include <stdio.h>
    int table[4][4] = { {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, {13,14,15,16} };

    int main(int argc, const char **argv, const char **envp)
    {
        printf("%d\n", table[3][2]);
        return 0;
    }
    $ cc -o /tmp/a /tmp/a.c
    $ /tmp/a
    15

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jan 12 12:03:36 2026
    From Newsgroup: comp.lang.c

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Right.

    Do you have citation from the Standard?

    The short answer is section 6.5.6 paragraph 8.

    There is amplification in Annex J.2, roughly three pages
    after the start of J.2. You can search for "an array
    subscript is out of range", where there is a clarifying
    example.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Jan 12 22:41:07 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 12:03:36 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Right.


    Maybe. But it's not expressed by the gcc code generator or by any
    warnings. So, how can we know?

    Do you have citation from the Standard?

    The short answer is section 6.5.6 paragraph 8.


    I am reading the N3220 draft:
    https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
    There, section 6.5.6 has no paragraph 8 :(

    There is amplification in Annex J.2, roughly three pages
    after the start of J.2. You can search for "an array
    subscript is out of range", where there is a clarifying
    example.

    I see the following text:
    "An array subscript is out of range, even if an object is apparently
    accessible with the given subscript (as in the lvalue expression
    a[1][7] given the declaration int a[4][5]) (6.5.7)."

    Is that what you had in mind?





    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Russell Kuyper Jr.@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 12 20:29:40 2026
    From Newsgroup: comp.lang.c

    On 2026-01-12 04:44, Michael S wrote:
    ...
    Normally phrase "worshippers of nasal demons" in my posts refers to
    faction among developers and maintainers of gcc and clang compilers. I
    think that it's not an unusual use of the phrase, but I can be wrong
    about it.

    Which faction would that be? I'm sure there's more than one to choose
    from. An example of what they've done that, in your opinion, justifies
    that description might also be helpful.

    ...
    AFAIK, you are not a gcc or clang maintainer. So, not a "worshipper".
    When I want to characterize [in derogatory fashion] people that have
    no direct influence on the behavior of common software tools, but
    share the attitude of "worshippers" toward UB, then I use the phrase
    "language lawyers".

    "Language lawyers", at least, I understand, having frequently been
    described as one myself. It means those who are knowledgeable about
    what the standard allows and prohibits, both for programs and for
    implementations. I'm not sure why you'd consider them "worshippers" of
    UB; they are characterized as language lawyers because they know
    precisely when the behavior is or is not UB - but that says nothing
    about whether they approve of UB or not. They would still be language
    lawyers whether they approved of UB or despised it.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Russell Kuyper Jr.@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 12 20:35:09 2026
    From Newsgroup: comp.lang.c

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.

    OK - so how do you want it to be defined? I've used languages where
    table[n] for n>3 would have exactly the same effect as table[3], and
    table[n] for n<0 would have exactly the same effect as table[0]. I've
    seen algorithms that were actually simplified by relying upon this
    behavior.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From James Russell Kuyper Jr.@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jan 12 21:09:25 2026
    From Newsgroup: comp.lang.c

    On 2026-01-12 15:02, Scott Lurndal wrote:
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    ...
    struct bar1 {
        int table[4];
        int other_table[4];
    };
    ...
    So you want to deliberately read one element past the end because you
    know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays.

    So declare it as int table[4][4].


    Note that this suggestion does not make the behavior defined. It is
    undefined behavior to dereference table[0]+4, and it is undefined
    behavior to make any use of table[0]+5.
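    One way to make the "walk the whole thing linearly" pattern well
    defined is not to use a 2D array at all: declare flat storage and
    compute the row/column index explicitly. A sketch under that
    assumption (the names here are illustrative):

```c
#define ROWS 4
#define COLS 4

/* One flat object: any index in [0, ROWS*COLS) is in bounds, so a
   linear scan over all 16 elements involves no out-of-bounds lvalue. */
static int table_flat[ROWS * COLS];

static int get(int r, int c)
{
    return table_flat[r * COLS + c];  /* 2D view via explicit arithmetic */
}

static int exists_anywhere(int v)
{
    for (int i = 0; i < ROWS * COLS; i++)
        if (table_flat[i] == v)
            return 1;
    return 0;
}
```

    The trade-off is purely syntactic: the [r][c] notation is replaced by
    an index computation, but both the per-row and the whole-array scans
    stay inside one array object.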

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Jan 13 09:12:14 2026
    From Newsgroup: comp.lang.c

    On 12/01/2026 21:41, Michael S wrote:
    On Mon, 12 Jan 2026 12:03:36 -0800
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 12 Jan 2026 08:03:31 -0800
    Andrey Tarasevich <noone@noone.net> wrote:

    On Mon 1/12/2026 6:28 AM, Michael S wrote:

    According to the C Standard, access to p->table[4] in foo1() is UB.
    ...
    Now the question.
    What does the Standard say about foo2()? Is there UB in foo2() as
    well?

    Yes, in the same sense as in `foo1`.

    gcc code generator does not think so.

    It definitely does.

    Right.


    Maybe. But it's not expressed by the gcc code generator or by any
    warnings. So, how can we know?

    Do you have citation from the Standard?

    The short answer is section 6.5.6 paragraph 8.


    I am reading N3220 draft https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
    Here section 6.5.6 has no paragraph 8 :(


    The C standards managed to keep section numbers and even paragraph
    numbers consistent between versions for a long time, but there are a
    number of differences in C23. 6.5.6p8 in, for example, C11, is 6.5.7p9
    in N3220. (N3220 is an early draft of the next version of C, C2y, and
    is far from complete. The best C23 draft is N3096, where the relevant
    paragraph is 6.5.6p9.)

    There is amplification in Annex J.2, roughly three pages
    after the start of J.2. You can search for "an array
    subscript is out of range", where there is a clarifying
    example.

    I see the following text:
    "An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
    a[1][7] given the declaration int a[4][5]) (6.5.7)."

    Is that what you had in mind?


    I can't read Tim's mind, but it is certainly an example that /I/ think
    is pretty clear. The list of undefined behaviours in J.2 is
    non-normative (meaning it does not define the rules of the language,
    it just tries to explain or list them), and it is not complete (lots
    of things are UB without being listed, simply because the standard
    does not define behaviours for them). But the list in J.2 can be a
    very useful summary of UBs, and it can be easier to follow than the
    referenced sections in the normative text.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Jan 13 11:07:27 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 20:35:09 -0500
    "James Russell Kuyper Jr." <jameskuyper@alumni.caltech.edu> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    On Thu, 1 Jan 2026 22:54:05 +0100

    On a related note.


    struct bar1 {
        int table[4];
        int other_table[4];
    };

    struct bar2 {
        int other_table[4];
        int table[4];
    };

    int foo1(struct bar1* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    int foo2(struct bar2* p, int v)
    {
        for (int i = 0; i <= 4; ++i)
            if (p->table[i] == v)
                return 1;
        return 0;
    }

    According to the C Standard, the access to p->table[4] in foo1() is UB.
    [O.T.]
    I want to use a language (or, better, a standardized dialect of C) in
    which the behavior in this case is defined, but I am bad at influencing
    other people, so I cannot get what I want.

    OK - so how do you want it to be defined? I've used languages where
    table[n] for n>3 would have exactly the same effect as table[3], and table[n] for n<0 would have exactly the same effect as table[0]. I've
    seen algorithms that were actually simplified by relying upon this
    behavior.

    I want "my" dialect to be based on an abstract machine with a flat
    memory model. All variables, except for automatic variables whose
    address was never taken by the program, are laid out in one big
    implicit array of char.
    For my purposes, a Harvard abstract machine is sufficient.
    I am sure there are multiple people who would want an option for a von
    Neumann abstract machine, i.e. for program code to be laid over the
    same implicit array as variables, with as many things defined in the
    standard as practically possible. My aspirations do not go that far.

    In the specific case of 'struct bar1', it means that I want
    p->table[4:7] to be the absolute equivalent of p->other_table[0:3].
    For p->table[n] where n < 0 or n > 7, I want the generated code to
    access the respective locations in the implicit underlying array.
    Whether the resulting behavior is defined or undefined would depend on
    the specifics of the caller.

    If you say that "my" dialect is less optimizable than standard C, then
    my answer is "Yes, I know, and I don't care".

    If you say that "my" dialect removes certain potential for detection
    of buffer overflows by the compiler, then my answer is "Generally,
    yes, and it's not great, but I consider it a fair price". Note that
    there are still plenty of places where the compiler can warn, as with
    the majority of automatic and static arrays. In other situations,
    bounds checking can be enabled on the spot by a special attribute.
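    Standard C already grants a small piece of this flat-memory view:
    within a single object, access through a character type is always
    defined (C17 6.5p7), even though int-typed indexing from one member
    into the next is not. A sketch, reusing struct bar1 from the thread:

```c
#include <stddef.h>

struct bar1 {
    int table[4];
    int other_table[4];
};

/* Walking the whole struct byte by byte through unsigned char* is
   defined behavior: character-type access to an object's
   representation never violates the aliasing rules. */
static int byte_sum(const struct bar1 *p)
{
    const unsigned char *bytes = (const unsigned char *)p;
    int sum = 0;
    for (size_t i = 0; i < sizeof *p; i++)
        sum += bytes[i];
    return sum;
}
```

    The flat-memory dialect would extend this "one big array of bytes"
    view beyond single objects; standard C stops at the object boundary.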




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.lang.c on Tue Jan 13 11:31:55 2026
    From Newsgroup: comp.lang.c

    On Mon, 12 Jan 2026 21:09:25 -0500
    "James Russell Kuyper Jr." <jameskuyper@alumni.caltech.edu> wrote:

    On 2026-01-12 15:02, Scott Lurndal wrote:
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 12 Jan 2026 15:58:15 +0000
    bart <bc@freeuk.com> wrote:

    On 12/01/2026 14:28, Michael S wrote:
    ...
    struct bar1 {
        int table[4];
        int other_table[4];
    };
    ...
    So you want to deliberately read one element past the end because
    you know it will be the first element of other_table?


    Yes. I primarily want it for multi-dimensional arrays.

    So declare it as int table[4][4].


    Note that this suggestion does not make the behavior defined. It is
    undefined behavior to dereference table[0]+4, and it is undefined
    behavior to make any use of table[0]+5.


    Note that Scott didn't suggest that dereferencing table[0][4] in his
    example is defined.
    Not that I understood what he wanted to suggest :(




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Andrey Tarasevich@noone@noone.net to comp.lang.c on Tue Jan 13 08:11:15 2026
    From Newsgroup: comp.lang.c

    On Mon 1/12/2026 9:36 AM, Michael S wrote:
    But I was interested in the "opinion" of the C Standard rather than of
    the gcc compiler.
    Is it full nasal UB or merely "implementation-defined behavior"?

    It is full nasal UB per the standard. And, of course, it is as
    "implementation-defined" as any other UB, in the sense that the
    standard permits implementations to _extend_ the language in any way
    they please, as long as they don't forget to issue diagnostics when
    diagnostics are required (by the standard).

    Perhaps there's a switch in GCC that would outlaw the classic "struct
    hack"... But in any case, it is not prohibited by default for
    compatibility with pre-C99 code.


    gcc indeed has something of this sort: -fstrict-flex-arrays=3.
    But at the moment it does not appear to affect code generation [in
    this particular example].

    Yeah... I tried both the command-line setting and the attribute. No
    effect on the code though.
    --
    Best regards,
    Andrey
    --- Synchronet 3.21a-Linux NewsLink 1.2