Yes. Btw, the fix is almost trivial:
```
uint16_t
mul(uint16_t a, uint16_t b)
{
unsigned int aa = a, bb = b;
return aa * bb;
}
```
[...] Victor Yodaiken even wrote
a paper about this: https://arxiv.org/pdf/2201.07845
Bart <bc@freeuk.com> wrote:
On 06/05/2026 20:35, Dan Cross wrote:
In article <10tflij$19d6u$1@dont-email.me>, Bart <bc@freeuk.com> wrote:[...]
This is C:
uint64_t F(uint64_t s, uint64_t t, uint64_t u, uint64_t v) ...
This is my language:
func F(u64 s, t, u, v)u64 ...
All the parameters are the same type. If you change that first u64, they
all change. The parameter names are 's t u v'.
Yes, in this case having single type for parameters is simpler.
They are the same types in the C too, but you have to work harder to
double-check they are in fact identical. If you change that type, then
you have to change it at multiple sites; if you forget one, the compiler
will not tell you.
Well, if there is a mismatch, then the compiler will tell you. If types
make sense, but are different than intended, then you have trouble.
But this trouble is not different from situation where you need to
change one type, but keep other unchanged. For example, if you
need
uint64_t F(uint64_t s, uint64_t t, uint32_t u, uint64_t v)
In such case C version is easier to modify correctly.
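For the all-same-type case, C can get a single point of change with a typedef. A minimal sketch; the alias name `word_t` and the function body are made up for illustration and are not from either poster:

```c
#include <stdint.h>

/* Hypothetical alias: change this one line to change every parameter
   and the return type of F together. */
typedef uint64_t word_t;

/* Placeholder body, only here so the sketch compiles and can be called. */
word_t F(word_t s, word_t t, word_t u, word_t v)
{
    return s + t + u + v;
}
```

The mixed case (one `uint32_t` parameter among `uint64_t` ones) is then written out explicitly, which is the situation where per-parameter types are easier to modify.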
So what happened since the late 70s? We're still doing independent compilation, still doing linking, still use makefiles.
In fact it's got a lot more complex rather than simpler.
[...]
On 08/05/2026 22:02, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Note that:
Let me put numbers to identify the claims:
* (1) There are no C-style header files.
* (2) There are no separate declarations needed, neither in shared headers
  nor as prototypes.
* (3) If a module is imported by 50 others, it is processed exactly once.
* (4) Project info (modules etc) is present in only one module of a project.
  (Most module schemes require import directives in every module.)
* (5) That means no external build system is needed.
Just that is HUGE.
Well, UCSD/Turbo Pascal, Ada, Modula 2, Oberon all have module system
satisfying 3 and 5. UCSD/Turbo Pascal and Oberon do not have
separate interface files, so satisfy 1 (Ada and Modula 2 have
separate interface files, but they have different nature than
C header files). Oberon satisfies 2. So only possibly novel part
is 4.
Looking at implementation decisions, there were strong voices for
separate interface files. So 1 is at least "subjective". I
would say that 2 is probably "arguably bad idea". Concerning 4,
I routinely use a module system where import directives are
sometimes unnecessary: basically 'import' means that names from
given module may be used without qualification. If somebody
decided to use only qualified names, then 'import' is not
necessary. Arguably, information saying from which module to
take given name (and given module may simultaneously use the
same name from multiple modules) is part of source code of
given module. You apparently think differently, but I want
to keep related things together, so want this information
included in module source. So for me your 4 is "arguably
bad idea".
I went through several versions of the module scheme. They all worked,
but with problems.
For example, with one version, each module of a 50-module project say,
would have a rag-tag collection of imports at the top, with whatever
subset of the other 49 was currently needed, that needed constant maintenance.
Then you renamed one module, or combined two, or split one into two, and
you had a lot of editing to do. This kind is very common.
You see the equivalent in C with long collections of header files, but
here you have to generate and maintain the header files too, and you
have to hunt for them within the file system.
Note that all languages that I mention are at roughly similar
level as C, all are more (Oberon) or less (Ada) niche now.
But I think that each of them has more users than your
language (original UCSD Pascal and Turbo Pascal are dead, but
there are Turbo Pascal compatible products in current development).
Also, users and developers of those languages probably think
that they "run rings around C", but do not come here to complain
how bad C is.
C sits at a particular level and mine is at about the same place.
For example, FreePascal transpiles to C; you don't hear of C transpiling
to FreePascal!
So C is that kind of language, and mine would also be if someone kindly wrote decent compilers for it. But for its main platform, it can also be used the same way (I'm doing that right now).
Anyway, when you come here and propose your language as alternative
to C,
I'm not pushing my languages at all; they are personal. I was replying
to this:
"You keep referring to your "systems language" as evidence that C
is bad. C may be bad; there are even ways that I think that C
_is_ bad. But your opinion is not evidence, and given that you
have shown it to be founded on misconceptions, it is utterly
irrelevant."
I'm showing I can create a decent language, one that has been tried and tested so I know what worked and what didn't. And that enables me to
make an informed comparison with C, which is used in the same space.
then you implicitly claim that your language is better than
Ada, Modula 2, Pascal and a lot of other languages which competed
with C.
Well, they didn't compete with C very well. Where were the real contenders?!
C was informal, and small,
and allowed you great freedom (more than
necessary), but the way it was presented was poor (syntax etc) and now
is very dated (header files and relying too much on its token-based
macros to fix shortcomings).
Anyway, I'm not making a comparison with those. If I hadn't devised my language (say my place of work provided the language to be used), then
most likely I would have been using C, not Pascal or Ada (which was
anyway still in the future).
Even more, since C programmers did not switch to other
languages earlier, your language must be _much_ better than
other ones. Do you realize how grandiose claim it is? Maybe
you do not mean this, but that is impression that you give.
No. I understand that in reality mine is a crappy little one-man
language that should have been put out of its misery at least 25 years
ago. It has no trendy modern features, no docs, no users, no libraries,
no nothing.
But one of the reasons it's still going is because C is, showing there
is still a demand for that class of primitive language. In that case I
know that niche pretty well!
In that case, then yes it is much more polished, is somewhat safer, with fewer quirks and fewer surprises. This is the simplest function pointer
type in C, and in my language:
void(*)(void)
ref proc
and the same type used to declare variable:
void(*fnptr)(void);
ref proc fnptr
and here, an array of 10 of those pointers:
void(*table[10])(void);
[10]ref proc table
But, this is the kicker: to write that last C version, I had to use a
tool to figure where the parentheses and square brackets go. What kind
of HLL is that?!
Unless you're going to argue that C's syntax has the edge, then my language
is indeed better.
I'm not claiming it's unique either (eg. my syntax was taken from
Algol68), but most modern alternatives are bigger and more ambitious.
[...]
Also, if I need to do massive change I would do something like
below:
for A in *.[ch]; do sed 's,file1,file2,' $A > $A.pp; done
for A in *.[ch]; do mv $A.pp $A; done
That is two commands to do simple mass renaming, regardless of
number of files involved. Could be done in one command, but
doing it in two steps gives me the opportunity to back up or check
things if I have any doubts about correctness of the first
command.
[...]
antispam@fricas.org (Waldek Hebisch) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Let me address an additional question, which may have been touched
on in other postings although I am not sure of that. What should we
do if we want to safely add signed integers, avoid nasal demons no
matter what, and are content with wrap-around semantics for cases
that "overflow"? Here is a function to do that:
#include <limits.h>   /* for INT_MAX */

signed
safely_add( signed a, signed b ){
    unsigned ua = a, ub = b, u = ua+ub;
    return u <= INT_MAX ? (int)u : -INT_MAX + (int)(u-INT_MAX-1) - 1;
}
No undefined behavior, no circumstances where ID behavior comes into
play, and gives desired answer in essentially all environments (the
exceptions are environments where UINT_MAX != INT_MAX*2+1, which is
almost non-existent today).
Now, same question, but for multiplication. The answer is almost
exactly the same:
signed
safely_multiply( signed a, signed b ){
    unsigned ua = a, ub = b, u = ua*ub;
    return u <= INT_MAX ? (int)u : -INT_MAX + (int)(u-INT_MAX-1) - 1;
}
Both gcc and clang compile these functions into one operation each
(along with 'ret').
Here is a little test driver folks may want to try:
#include <stdio.h>
int
main(){
for( signed i = -10000; i <= 10000; i++ ){
for( signed j = -10000; j <= 10000; j++ ){
signed p = i*j;
signed q = safely_multiply( i, j );
if( p == q ) continue;
printf( " %6d * %6d = %12d or %12d\n", i, j, p, q );
}
}
printf( " done.\n" );
return 0;
}
Compiling this with -S -O2 may give an amusing result, for those who
want to try it.
I did not try it, but I would expect 'safely_add' and 'safely_multiply'
to produce just single machine instruction for computation,
possibly inlined (at -O3 gcc should inline them, but -O2 is more
conservative).
When I compile with -S, both gcc and clang generate (besides the
retq) one instruction (that being a leal) for safely_add, and two
instructions (those being an imull and a movl) for safely_multiply,
at level -O1 or higher.
Without the 'if( p==q ) continue;' test, gcc will inline at -O1 and
higher, and clang will inline at -O2 and higher. After inlining,
both are smart enough to strength-reduce the loop, eliminating the multiplication in favor of an addition.
The amusing result happens when the 'if( p==q ) continue;' test is
left in, at level -O2 or higher. I'm not giving away what happens;
let me just say I was surprised and amused by the result.
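The same unsigned round-trip carries over to other operations. A minimal sketch for subtraction, following the pattern of safely_add and safely_multiply; the name `safely_subtract` is hypothetical, and the same assumption (UINT_MAX == INT_MAX*2+1) applies:

```c
#include <limits.h>

/* Hypothetical companion to safely_add/safely_multiply: subtract in
   unsigned arithmetic (which wraps by definition), then map the result
   back into the range of int without relying on implementation-defined
   conversion. Assumes UINT_MAX == INT_MAX*2+1. */
signed
safely_subtract( signed a, signed b ){
    unsigned ua = a, ub = b, u = ua-ub;
    return u <= INT_MAX ? (int)u : -INT_MAX + (int)(u-INT_MAX-1) - 1;
}
```

For example, `safely_subtract( INT_MIN, 1 )` wraps around to `INT_MAX` rather than invoking undefined behaviour.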
On 08/05/2026 17:58, Keith Thompson wrote:
If you have questions about the behavior of gcc's "-fwrapv" option,
a gcc forum is likely to be a better place to ask them.
Yeah, forget it.
All I get is, I can forget about C's UB for signed overflow, and pretend
it works like unsigned overflow, if I stipulate the '-fwrapv' option to
the compiler if it has one.
If it doesn't then it will most likely work like that anyway since it
won't be smart enough to do anything clever.
So the answer to my question is my C source must go hand-in-hand with
some stipulations about how it is built. That already happens with my generated code anyway as it assumes a 64-bit target.
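Some build stipulations can at least be written into the source so that a mismatched build fails to compile. A sketch using C11 `_Static_assert`; the particular assumptions listed and the helper `pointer_bits` are illustrative only, and note that nothing here can check for '-fwrapv' itself, which remains a matter of how the compiler is invoked:

```c
#include <limits.h>

/* Assumptions about the target, stated so the compiler can check them.
   These are examples; a real project would list its own. */
_Static_assert(CHAR_BIT == 8, "8-bit bytes assumed");
_Static_assert(sizeof(void *) * CHAR_BIT == 64, "64-bit target assumed");
_Static_assert(UINT_MAX == 2u * INT_MAX + 1u,
               "unsigned assumed to have one more value bit than int");

/* Hypothetical helper, present only so the sketch has something to call. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

On a target that violates any of the assertions, compilation stops with the given message instead of producing code that silently misbehaves.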
In article <10tls2u$39j7a$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 08/05/2026 22:02, Waldek Hebisch wrote:
[snip]
Note that all languages that I mention are at roughly similar
level as C, all are more (Oberon) or less (Ada) niche now.
But I think that each of them has more users than your
language (original UCSD Pascal and Turbo Pascal are dead, but
there are Turbo Pascal compatible products in current development).
Also, users and developers of those languages probably think
that they "run rings around C", but do not come here to complain
how bad C is.
C sits at a particular level and mine is at about the same place.
For example, FreePascal transpiles to C;
You mean `fpc`? I see no evidence for that. I just looked at https://www.freepascal.org and I see no documentation about the
compiler generating C code; it appears to generate object code
for the target platform, and the software requirements don't
mention a C compiler.
I'm showing I can create a decent language, one that has been tried and
tested so I know what worked and what didn't. And that enables me to
make an informed comparison with C, which is used in the same space.
No. Having a good understanding of C would enable you to make a
good comparison with C. But, again, you haven't demonstrated
that you have a good understanding of C, and you've expressed
negative interest in gaining such understanding, so whatever you
know about your own language is irrelevant.
Bart <bc@freeuk.com> wrote:
On 08/05/2026 22:02, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
Note that:
Let me put numbers to identify the claims:
* (1) There are no C-style header files.
* (2) There are no separate declarations needed, neither in shared headers
  nor as prototypes.
* (3) If a module is imported by 50 others, it is processed exactly once.
* (4) Project info (modules etc) is present in only one module of a project.
  (Most module schemes require import directives in every module.)
* (5) That means no external build system is needed.
Just that is HUGE.
Well, UCSD/Turbo Pascal, Ada, Modula 2, Oberon all have module system
satisfying 3 and 5. UCSD/Turbo Pascal and Oberon do not have
separate interface files, so satisfy 1 (Ada and Modula 2 have
separate interface files, but they have different nature than
C header files). Oberon satisfies 2. So only possibly novel part
is 4.
Looking at implementation decisions, there were strong voices for
separate interface files. So 1 is at least "subjective". I
would say that 2 is probably "arguably bad idea". Concerning 4,
I routinely use a module system where import directives are
sometimes unnecessary: basically 'import' means that names from
given module may be used without qualification. If somebody
decided to use only qualified names, then 'import' is not
necessary. Arguably, information saying from which module to
take given name (and given module may simultaneously use the
same name from multiple modules) is part of source code of
given module. You apparently think differently, but I want
to keep related things together, so want this information
included in module source. So for me your 4 is "arguably
bad idea".
I went through several versions of the module scheme. They all worked,
but with problems.
For example, with one version, each module of a 50-module project say,
would have a rag-tag collection of imports at the top, with whatever
subset of the other 49 was currently needed, that needed constant
maintenance.
Then you renamed one module, or combined two, or split one into two, and
you had a lot of editing to do. This kind is very common.
In my coding changes to module names happen at least order of
magnitude less frequently than other changes. And when you
need to change module names I do not see how your scheme saves
editing (if you put info in a single file, then you need to
edit one file, but change it in multiple places).
For example, FreePascal transpiles to C; you don't hear of C transpiling
to FreePascal!
I first see such a claim in your post. AFAIK Free Pascal always offered
a home-grown native backend. They used to have (and I probably still
have) their own internal linker, so that if you wanted you could directly
generate an executable (without creating intermediate .o or assembler
files). So it was closer to what you do than to a typical C compiler.
AFAIK Free Pascal can now use LLVM as a backend, but the last info
about this I have seen is that using LLVM is optional and that Free
Pascal's own backend will also be supported.
More generally, I do not see any relevance in your "transpiles to C"
argument. Translating from one language to another is a common and
valid way to implement a language.
What matters is what kinds of constructs are supported, and all languages
that I mention support low-level programming. Ada, Modula 2
and Oberon were used to write operating systems; at least in
the case of Modula 2 and Oberon there was no other language involved
(besides some small pieces of assembler). I am not sure if
there was any operating system written in Free Pascal, but
Free Pascal has all constructs needed, so there could be
if anybody wanted such system.
So technically each of languages that I mentioned could be
used to implement full software stack, starting from the
lowest level. If any of them gained enough popularity there
would be translators targeting this language.
BTW: IIUC one popular PC "database" system used Modula 2 behind
the scene: it offered its own language which was translated to
Modula 2 which in turn was compiled by Modula 2 compiler to
native code.
Well, they didn't compete with C very well. Where were the real contenders?!
Clearly they lost. IMO, there were strong non-technical reasons.
At purely technical level I see one thing favouring C: C allowed
better machine code from a simple compiler. Less technical
thing is that typical C implementation used operating system
linker, which ensured good interoperation with other languages.
Turbo Pascal insisted on using its own linker and the "main" program
had to be in Turbo Pascal. Consequently, if you wanted to
offer a library written in Turbo Pascal, such a library would
be usable only from Turbo Pascal. IIUC later there were
competing compilers allowing creation of normal object files.
But no wonder that library writers preferred other languages (like
C). Some technically good things lost because of too high a price
demanded by vendors. IMO Pascal had a problem because there
were several incompatible dialects.
C was informal, and small,
Ada was formalized and big, but the others were small too. In fact,
Oberon was quite small, but also was late and made a bunch of
bad choices (for example first implementation was for rather
obscure processor).
and allowed you great freedom (more than
I am not sure what "great freedom" means here. Even Ada, which
is considered the most strict among the languages that I mentioned, allows
doing any needed low-level tasks.
and here, an array of 10 of those pointers:
void(*table[10])(void);
[10]ref proc table
But, this is the kicker: to write that last C version, I had to use a
tool to figure where the parentheses and square brackets go. What kind
of HLL is that?!
I admit that in case above I would first declare pointer type and
only after that I would declare the array. Not nice, but if you
are bothered by this there are many languages that do not have
this problem.
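The two-step declaration described here can be sketched as follows. The names `handler_fn`, `bump` and `run_table` are made up for illustration; only the shape of the `table` declaration comes from the thread:

```c
#include <stddef.h>

/* Step 1: name the pointer-to-function type once. */
typedef void (*handler_fn)(void);

/* Step 2: the array declaration now reads like any other array.
   This has the same type as: void (*table[10])(void); */
static handler_fn table[10];

static int calls;                    /* counts invocations, for the demo */
static void bump(void) { calls++; }  /* trivial handler, for the demo */

/* Fill every slot with the demo handler, call each one, and return
   how many calls were made. */
int run_table(void)
{
    size_t n = sizeof table / sizeof table[0];
    calls = 0;
    for (size_t i = 0; i < n; i++)
        table[i] = bump;
    for (size_t i = 0; i < n; i++)
        table[i]();
    return calls;
}
```

With the typedef in place there is no need to work out where the parentheses and brackets go in the array declaration.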
Unless you're going to argue that C's syntax has the edge, then my language
is indeed better.
Well, you apparently are not getting simple thing: C is "good
enough".
That is, problems with C are (or at least were) not deal
breakers. Deal breakers are:
- having a compiler for needed target
- possibly quality (speed and size) of object code
- compatibility with other software (linking, interlanguage calls
and similar)
Ada folks made reasonable argument that Ada gives about twice
productivity compared to C.
In other words, to compete with C language must offer really
large advantage: it is not enough to be better,
That may be
big enough to win. But you should understand that what
matter is whole ecosystem, including extra tools.
In article <10tk4sg$2l19a$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 08/05/2026 00:08, Dan Cross wrote:
K&R is a wonderful book for its exposition: well-written,
concise, and the prose is beautiful. Kernighan is an amazing
writer, and Ritchie was well-known for his patience and clear
explanations.
However, it is a product of its time. It dates from a simpler
era, when programmers were expected to use books like it as a
starting point, and subsequently gain mastery either through
careful study of the standard, or extensive practice. (I'm
referring specifically to K&R2, of course, since the first
edition predated the first version of the standard by a decade.)
Machines were smaller and simpler, then, and so were compilers.
I am sad to say that I don't think it has aged particularly well.
I like the way you put that. Sometimes people have a tendency to put
too much reverence on particular texts - such as imagining that K&R says
all that needs to be said about C, and treating any modern tool, text,
standard or program that diverges from it as some kind of heresy, or
"not following the spirit of C". Languages evolve - tools evolve,
programs evolve, standards evolve, requirements evolve. K&R was a
milestone in the history of programming languages, and a model for
technical writing in its field, but C today is not C of fifty years ago.
Thank you. Yes, I pretty much agree.
It is unfortunate that this situation may be UB. I personally think
"unsigned short" should promote to "unsigned int", not "int" -
promotions should be signedness preserving. I don't like the "promote
to int" at all. But opinions don't change the standards, and I suppose
there are historical reasons for the rules that were made here.
But I am not sure I agree that such cases are "easy to stumble into".
How often would code like that be written, where overflowing the
uint16_t would be correct behaviour in the code on a 16-bit int system?
It is certainly possible, but it is perhaps more likely that cases of
overflow in the 16-bit system were also bugs in the code - moving the
code to 32-bit systems could give different undesirable effects from the
bug. It could also happen to remove the effects of the bug by holding
results in 32-bit registers and leading to correct results in later
calculations - UB can work either way.
Sure. This was a bit of a contrived example, but you ask a good
question: how often might one want to write code like that?
In short, I don't know, but I can think of any number of hash
functions, checksums, etc, that may be implemented using 16-bit
arithmetic, and I can well see programmers wanting to take
advantage of the modular semantics afforded by using unsigned
types to do so. Every day? Probably not. But often enough.
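As an illustration of the kind of code meant here, a 16-bit multiplicative hash might look like the sketch below. The function name `hash16` and the multiplier 40503 are arbitrary choices for the example, not taken from any particular library:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit hash relying on arithmetic modulo 2^16. */
uint16_t hash16(const unsigned char *data, size_t len)
{
    const uint16_t k = 40503;   /* arbitrary odd multiplier */
    uint16_t h = 0;

    for (size_t i = 0; i < len; i++) {
        /* Widen to unsigned int before multiplying. Written as
           (h ^ data[i]) * k, both operands would be promoted to
           (signed) int where int is 32 bits, and a product such as
           65535 * 40503 does not fit, which is undefined behaviour. */
        unsigned int wide = ((unsigned int)h ^ data[i]) * (unsigned int)k;
        h = (uint16_t)wide;     /* reduce modulo 2^16 */
    }
    return h;
}
```

Because the multiplication is done in `unsigned int`, the result modulo 2^16 is the same whether `int` is 16 or 32 bits wide.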
One of the things I had to really internalize as an OS person is
that the universe of useful existing software is large. It
doesn't matter if I create the most beautiful abstractions for
them that are infinitely superior to whatever swill their code
is using now. If they don't get to run their program (or worse,
they have to make a bunch of invasive changes for no discernable
benefit from their perspective) because I know better about how
things ought to be done, they're not going to use whatever
system I'm working on unless they're forced. But even then they
will resent it and move to something else the first chance they
get (lookin' at you, DEC, Microsoft, IBM, and any number of
commercial Unix vendors).
Whatever _I_ think of the interfaces they chose to use is
immaterial; making it difficult for them wins me no friends.
This is one of the smart things Torvalds did with Linux: "don't
break userspace" (unless there's a really, really good reason)
probably did a lot to help make Linux popular.
Anyway, I think this is similar. It doesn't matter what anyone
thinks of whether one ought to prevent all overflow; the fact
that the language supports it for unsigned integers (though
with some surprising semantics for types of lower rank than
`int`) simply is what it is. And if someone has a program that
avails of those semantics, and that program is important to them
for whatever reason, then there's little choice but to hold
one's nose. I know you know this, of course, but I think it's
worth repeating every now and then.
Certainly, however, the fact that this expression could contain UB would
surprise many C programmers.
Yes. Btw, the fix is almost trivial:
```
uint16_t
mul(uint16_t a, uint16_t b)
{
unsigned int aa = a, bb = b;
return aa * bb;
}
```
But if a programmer is not already very familiar with the
language, it may look very odd.
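One way to see why the widening is needed is to ask the compiler what type the product of two `uint16_t` values actually has. A small sketch using C11 `_Generic`; the helper name `product_type_code` is made up for the example:

```c
#include <stdint.h>

/* Returns 1 if uint16_t * uint16_t has type int on this implementation,
   2 if it has type unsigned int, and 0 for anything else.
   _Generic only inspects the type; the multiplication is not evaluated. */
int product_type_code(void)
{
    uint16_t a = 0, b = 0;
    return _Generic(a * b, int: 1, unsigned int: 2, default: 0);
}
```

Where `int` is wider than 16 bits, both operands are promoted to `int` and the function returns 1, so the multiplication is signed and can overflow; where `int` is 16 bits, `uint16_t` is typically `unsigned int` itself and the function returns 2.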
I don't think that this is what the authors originally intended
(in fact, I'm quite certain it is not, based on conversations
I've had with them in the past; they very much wanted the
original semantics for integer promotion and did not like those
chosen by the ANSI committee).
K&R has not been updated in almost 40 years, and 40 years ago,
it reflected a very different language, and moreover, reflected
the spirit intended by the original authors. But, regardless of
the original intent, that is not the language we have _today_.
I just picked my copy up off the shelf. The pages are yellowed
and the corners heavily dogeared; but flipping through it is
like seeing an old friend. Then I put it back on the shelf: you
can never go home again.
You make that sound so sad!
It is bittersweet. I have fond memories of times spent with
that copy of that book. I learned a lot from it, and it had an
outsized role in shaping my career and my development as an
engineer.
I met Dennis Ritchie several times. I think he would be pleased
and satisfied to know how many people look at K&R with fondness
and appreciation, but perhaps more so how many have outgrown it,
as well. I worked in the same office as Kernighan for a while,
and occasionally ate breakfast with him. I managed to overcome
my embarrassment enough one morning and asked him to sign my copy
of K&R1, and I could tell he very much appreciated it; I'm
certain he feels much the way I just described. (Sadly, I never
asked Dennis to sign my copy before he passed away.)
(My copy is in a box in the loft somewhere. I guess it is really one of
these books that should always be on the bookshelf, even if I never look
at it again.)
Absolutely.
To get a taste for the flavor of SML, you may find https://www.cs.cmu.edu/~rwh/isml/book.pdf interesting.
The Fibonacci example in this "M" language is:
|func fib(n)=
| if n<3 then
| 1
| else
| fib(n-1)+fib(n-2)
| fi
|end
[sic: as a sequence, the Fibonacci numbers are undefined for
$n<0$, but this is a pedagogical example, so let's ignore that]
In SML, this same program could be written as:
```
fun fib(n) =
if n<3 then
1
else
fib(n-1) + fib(n-2)
```
Here I'm trying to follow his style. In a somewhat more
idiomatic style, it would probably be written like:
```
fun fib n =
if n < 3 then 1
else fib (n - 1) + fib (n - 2)
```
Though typically one would use pattern matching so as to more
closely match the mathematical definition of the Fibonacci
numbers, expressed as a recurrence relation:
```
fun fib 0 = 1
| fib 1 = 1
| fib n = fib (n - 1) + fib (n - 2)
```
(Origin 0, not 1.)
But note that so far all of these programs are exponential in
time (and use stack space linear in n). A more robust version,
mirroring Harper's, that runs in linear time and space is:
```
exception Range of string
fun fib n =
let fun fib' 0 = (1, 0)
| fib' 1 = (1, 1)
| fib' n =
let val (a, b) = fib' (n - 1)
in (a + b, a)
end
in if n >= 0
then #1 (fib' n)
else raise Range "fib: n must be non-negative"
end
```
A tail recursive version that runs in linear time and constant
space is:
```
exception Range of string
fun fib n =
let fun fib' 0 a _ = a
| fib' n a b = fib' (n - 1) (a + b) a
in if n >= 0
then fib' n 1 0
else raise Range "fib: n must be non-negative"
end
```
Of course, in C, one might write the last as something like:
```
unsigned int
fib(unsigned int n)
{
unsigned int a = 1, b = 0;
while (n-- > 0) {
unsigned int sum = a + b;
b = a;
a = sum;
}
return a;
}
```
On 09/05/2026 02:57, Dan Cross wrote:
In article <10tls2u$39j7a$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 08/05/2026 22:02, Waldek Hebisch wrote:
[snip]
Note that all languages that I mention are at roughly similar
level as C, all are more (Oberon) or less (Ada) niche now.
But I think that each of them has more users than your
language (original UCSD Pascal and Turbo Pascal are dead, but
there are Turbo Pascal compatible products in current development).
Also, users and developers of those languages probably think
that they "run rings around C", but do not come here to complain
how bad C is.
C sits at a particular level and mine is at about the same place.
For example, FreePascal transpiles to C;
You mean `fpc`? I see no evidence for that. I just looked at
https://www.freepascal.org and I see no documentation about the
compiler generating C code; it appears to generate object code
for the target platform, and the software requirements don't
mention a C compiler.
When I tried it about a decade ago, it appeared to use a C backend from
what I can remember. The same with Nim, or GHC. Or languages like
Euphoria or Seed7 which are interpreted, but can lower to C as an option.
But it is also possible I mixed it up with FreeBasic (I tried both).
Programs however move on. If I download it now, then it does bundle
something called 'gcc.exe', but it is a stub: it can load C files, but
it is missing 'cc1'.
The point is that some HLL X is commonly transpiled to C, either for
bootstrapping, or for early versions or as an option.
C rarely transpiles to some other HLL X, for the purposes of
implementing C.
But it sometimes does when language X wants to migrate
existing C code to X.
I'm showing I can create a decent language, one that has been tried and
tested so I know what worked and what didn't. And that enables me to
make an informed comparison with C, which is used in the same space.
No. Having a good understanding of C would enable you to make a
good comparison with C. But, again, you haven't demonstrated
that you have a good understanding of C, and you've expressed
negative interest in gaining such understanding, so whatever you
know about your own language is irrelevant.
As I kept saying, anybody can subjectively compare any language with any
other as it pertains to their sphere, their experience and their
requirements, down to individual features.
In my case:
* Pretty much all coding I did, outside of assembly and scripting, was
for applications that anyone else would have used C for.
* ALL of that was achieved via the features of my own languages
* All the generated code was done via my own tools right down to the binary
So for a particular micro-task, to get it from concept A in the source
code to B in the binary executable for machine M, I know exactly how I
expect it to work.
I can then compare that with using C to try and get from A to B.
I don't care how it does it internally or what are the reasons why it
might give different behaviour.
There are reasonable adjustments you need to make to switch languages,
and there are unreasonable ones, such as needing to become a guru in
the new language.
Or having to use workarounds because your code has to work without UB on
the DS9000, even though you are only interested in M, which has the same
characteristics as all other target machines you are likely to use.
And for my language, you can substitute 'X'.
So I refute your claim that somebody can't make a comparison or express
a preference without such in-depth knowledge.
In article <10tj2h0$20gfo$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
...
The `realloc` thing was a particularly egregious example of a
thing that started life well-defined, then became IB, and then
UB; it's relevant because it shows the committee is willing to
weaken the language's guarantees about what is well-defined
over time, but I admit that that is rare.
When was it ever well-defined?
The C89 standard says:
"If the size of the space requested is zero, the behavior is
implementation-defined; the value returned shall be either a null
pointer or a unique pointer."
Prior to C89, the closest thing there was to a standard was K&R, which
didn't mention realloc() (or most of the rest of what became the C
standard library).
In section 7.10.3.4 ("The `realloc` function"), the last
sentence of the "Description" reads: "If `size` is zero and
`ptr` is not a null pointer, the object it points to is freed."
That statement is explicit, and unambiguous.
The text you quoted is from the prefatory material at the top
of section 7.10.3 ("Memory management functions") and clearly
applies to `malloc` and `calloc`.
I suppose one could make an argument to support it applying
to `realloc` as well because it doesn't explicitly *exclude* it,
but that would be a stretch.
I counter with two points: a) the
language in realloc is more specific, and thus should supersede
the general statement in the earlier introductory text, and b)
the language in 7.10.3 is talking about size requested for
allocation, but the language in 7.10.3.4 says that, in the case
it describes, the behavior is to _free_. In that specific case,
no size is "being requested" a la the 7.10.3 language, and thus
the statement about behavior in 7.10.3 does not apply.
The bottom line is that, despite the 7.10.3 wording, C89
explicitly defined `realloc(ptr, 0);` as equivalent to
`free(ptr)` when `ptr != NULL`.
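Whichever reading of C89 one prefers, code that has to behave the same across implementations and standard revisions can sidestep the question by never passing a zero size to `realloc`. A minimal sketch, assuming a hypothetical helper named `resize_or_free` (it is not from the thread or from any library):

```c
#include <stdlib.h>

/*
 * Hypothetical helper: resize a heap block without ever calling
 * realloc(ptr, 0).
 *
 *  size == 0: frees ptr (free(NULL) is a no-op) and returns NULL.
 *  size  > 0: defers to realloc; on failure realloc returns NULL
 *             and leaves the original block untouched.
 */
void *
resize_or_free(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, size);
}
```

With this shape a NULL result means "freed" when the caller asked for zero bytes and "allocation failed" otherwise, so the caller can tell the two apart from the size it passed.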
Examples of statically typed languages include SML, Haskell,
Rust, etc. Those are also all strongly typed.
In article <10tn3so$3j8hc$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
When I tried it about a decade ago, it appeared to use a C backend from
what I can remember. The same with Nim, or GHC. Or languages like
Euphoria or Seed7 which are interpreted, but can lower to C as an option.
That's not true for GHC, either. It does use an intermediate
representation language called "Cmm" that is described as a,
"simple, C like language". But that is not C, and the compiler
either generates native code or LLVM IR.
I don't know about Nim. A cursory glance indicates that it has
backends targeting a number of languages in the C family (C,
C++, Objective-C) and JavaScript. The C backend seems to be the
default.
But it is also possible I mixed it up with FreeBasic (I tried both).
FreeBASIC appears to generate native code.
Programs however move on. If I download it now, then it does bundle
something called 'gcc.exe', but it is a stub: it can load C files, but
it is missing 'cc1'.
This doesn't seem particularly relevant to anything.
However,
you may be confused because some of these tools may invoke
`gcc` (or similar) as a command driver to invoke the platform
assembler and/or linker.
Why would it? C compilers are ubiquitous.
You don't care about C _as it is defined_. You only care about
how _you think it should work based on your intuition_. Your
incredulity at its definition not matching your expectations has
no bearing on anything at all.
There are reasonable adjustments you need to make to switch languages,
and there are unreasonable ones, such as needing to become a guru in
the new language.
It strikes me that you need to know the language if you want to
use and discuss it.
On 09/05/2026 16:18, Dan Cross wrote:
In article <10tn3so$3j8hc$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Programs however move on. If I download it now, then it does bundle
something called 'gcc.exe', but it is a stub: it can load C files, but
it is missing 'cc1'.
This doesn't seem particularly relevant to anything.
We're drifting from my point, which is that C is in that small category,
of a deceptively simple and malleable language that would be a good fit
for a target language.
I'm saying mine would be in that group, which is why I'm doing
comparisons with Pascal or Ada which have been brought up.
However,
you may be confused because some of these tools may invoke
`gcc` (or similar) as a command driver to invoke the platform
assembler and/or linker.
Probably. But I don't know what FPC looked like when I first tried it.
Why would it? C compilers are ubiquitous.
For the major platforms, so are compilers for dozens of languages.
You don't care about C _as it is defined_. You only care about
how _you think it should work based on your intuition_. Your
incredulity at its definition not matching your expectations has
no bearing on anything at all.
If you disagree with an opinion of mine, would it make any difference
if I knew the C standard inside out? You are hardly going to change your
mind.
Suppose I proposed for example that C should deprecate, then ban, the
ability to write:
   A[i]
   B[i][j]
respectively as:
   i[A]
   j[i[B]]
(The last one is a little mind-blowing, as it turns one 2D array access
(two consecutive 1D accesses) into two /nested/ 1D accesses.)
Basically, it would mean addition between pointers and integers would
not be commutative: P + i, but not i + P.
You will either agree with this or not. But I can't see that it requires
any deep knowledge of the standard to make such a proposal, or why
somebody would require that of me in order to even consider it.
On 08/05/2026 22:47, Scott Lurndal wrote:
antispam@fricas.org (Waldek Hebisch) writes:
Bart <bc@freeuk.com> wrote:
* There are no C-style header files (1)
* There are no separate declarations needed, neither in shared headers
  nor as prototypes (2)
* If a module is imported by 50 others, it is processed exactly once (3)
* Project info (modules etc) is present in only one module of a project.
  (Most module schemes require import directives in every module) (4)
* That means no external build system is needed. (5)
Just that is HUGE.
All of this was the norm in the late 1970s. And it may be HUGE
to you, but clearly it's more YAWN to everyone else.
So what happened since the late 70s?
On 09/05/2026 06:50, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 08/05/2026 22:02, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
What matters is what kinds of constructs are supported, and all languages
that I mention support low-level programming. Ada, Modula 2
and Oberon were used to write operating systems; at least in
the case of Modula 2 and Oberon there was no other language involved
(besides some small pieces of assembler). I am not sure if
there was any operating system written in Free Pascal, but
Free Pascal has all constructs needed, so there could be
if anybody wanted such a system.
So technically each of the languages that I mentioned could be
used to implement a full software stack, starting from the
lowest level. If any of them gained enough popularity there
would be translators targeting this language.
BTW: IIUC one popular PC "database" system used Modula 2 behind
the scenes: it offered its own language which was translated to
Modula 2, which in turn was compiled by a Modula 2 compiler to
native code.
I last used Pascal to any great extent in 1980, in a college
environment. It was a teaching language.
and allowed you great freedom (more than
I am not sure what "great freedom" means here. Even Ada, which
is considered the strictest of the languages that I mentioned, allows
doing any needed low-level tasks.
You can completely by-pass the type system including overriding a
function signature with another. You can access any memory address. You
can access the code bytes of any function.
You can pass control (call as a function) to any arbitrary address. You
can jump to any address (via a GNU extension).
You can execute any inline assembly (probably another extension).
Can you do that with Ada? Then good on it, but I'd imagine you'd need to jump through a few hoops.
I would doubt it very much with Oberon.
Mine of course allows all that.
In C these days, you need to work around the UB that most of the
above probably is. That is my beef with it.
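One common instance of such a workaround is inspecting an object's representation. A pointer cast such as `*(uint32_t *)&f` runs into the aliasing rules, whereas copying the bytes with `memcpy` is well-defined. A minimal sketch, assuming `float` is a 32-bit IEEE 754 type (true on most current targets, but not guaranteed by the standard); the function name `float_bits` is made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check of the size assumption (C11). */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Hypothetical helper: return the object representation of a float
 * as a 32-bit unsigned integer without a type-punned access. */
uint32_t
float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* byte copy, so no aliasing violation */
    return u;
}
```

On an IEEE 754 target, `float_bits(1.0f)` yields `0x3F800000`. Compilers typically reduce the `memcpy` to a register move, so the defined form need not cost anything.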
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Examples of statically typed languages include SML, Haskell,
Rust, etc. Those are also all strongly typed.
Rust is not generally considered to be strongly typed.
Rust has
raw pointers and unsafe functions, both of which (can) violate
type safety.
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
Some people use them extensively. Some people have little use for
size-specific types. Some people want size-specific types, but for some
reason (good or bad) want to use C90 rather than C99. Some people like
the <stdint.h> types but for some reason (good or bad) are unable to use
them in certain cases. Some people dislike the <stdint.h> types, but
use them anyway.
Your language would not be a good fit, because it is a home-made
personal language with no traction.
Why would it? C compilers are ubiquitous.
For the major platforms, so are compilers for dozens of languages.
Almost invariably, C is the first language to be targeted for compilers
for a platform. It does not matter whether you like that or not, it is
a fact.
Basically, it would mean addition between pointers and integers would
not be commutative: P + i, but not i + P.
You will either agree with this or not. But I can't see that it
requires any deep knowledge of the standard to make such a proposal,
or why somebody would require that of me in order to even consider it.
An opinion about preferences for a particular piece of syntax does not
need deep knowledge beyond that bit of code. An opinion on whether it
would be a good idea to change the standard to fit that preference, or
on what other people's preferences might be, or any unexpected
consequences or impacts of such a change - /that/ requires a deep
knowledge.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[sic: as a sequence, the Fibonacci numbers are undefined for
$n<0$, but this is a pedagogical example, so let's ignore that]
A comment on that further down...
[snip]
(Origin 0, not 1.)
fibonacci(0) is 0. There is no other.
[snip]
Here is my current favorite fast fibonacci function (which happens
to be written in a functional and tail-recursive style):
    static ULL ff( ULL, ULL, unsigned, unsigned );
    static unsigned lone( unsigned );
    ULL
    ffibonacci( unsigned n ){
        return  ff( 1, 0, lone( n ), n );
    }
    ULL
    ff( ULL a, ULL b, unsigned m, unsigned n ){
        ULL c = a+b;
        return
            m & n  ?  ff( (a+c)*b, b*b+c*c, m>>1, n )  :
            m      ?  ff( a*a+b*b, (a+c)*b, m>>1, n )  :
            /*****/   b;
    }
    unsigned
    lone( unsigned n ){
        return  n |= n>>1, n |= n>>2, n |= n>>4, n ^ n>>1;
    }
Much faster than the linear version.
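A note on why the posted recursion works: each call holds a = F(k-1) and b = F(k), and consumes one bit of n from the top using the doubling identities F(2k) = F(k)(F(k-1) + F(k+1)) and F(2k+1) = F(k)^2 + F(k+1)^2. The sketch below restates the posted functions so they compile on their own and adds a linear reference for cross-checking. Two things are assumptions, not part of the post: the `ULL` typedef (taken to be `unsigned long long`) and the helper name `fib_linear`. Note also that `lone` as posted only smears the top bit across eight positions, so it is valid for n < 256; that is ample, since F(93) is the largest value that fits in 64 bits.

```c
typedef unsigned long long ULL;   /* assumed; not shown in the post */

static ULL ff(ULL, ULL, unsigned, unsigned);
static unsigned lone(unsigned);

/* Fast version, as posted: walks the bits of n from the top down. */
ULL
ffibonacci(unsigned n)
{
    return ff(1, 0, lone(n), n);
}

/* Invariant: a == F(k-1), b == F(k) for the bits consumed so far. */
static ULL
ff(ULL a, ULL b, unsigned m, unsigned n)
{
    ULL c = a + b;                                /* F(k+1) */
    return
        m & n ? ff((a+c)*b, b*b+c*c, m>>1, n) :   /* k -> 2k+1 */
        m     ? ff(a*a+b*b, (a+c)*b, m>>1, n) :   /* k -> 2k   */
                b;
}

/* Isolate the highest set bit of n (valid for n < 256). */
static unsigned
lone(unsigned n)
{
    return n |= n>>1, n |= n>>2, n |= n>>4, n ^ n>>1;
}

/* Linear reference used only for cross-checking. */
ULL
fib_linear(unsigned n)
{
    ULL a = 0, b = 1;             /* F(0), F(1) */
    while (n-- > 0) {
        ULL t = a + b;
        a = b;
        b = t;
    }
    return a;
}
```

The fast version makes one call per bit of n (at most eight calls here), where the reference loop runs n times.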
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[concerning UB when multiplying 16-bit unsigneds]
Yes. Btw, the fix is almost trivial:
```
uint16_t
mul(uint16_t a, uint16_t b)
{
unsigned int aa = a, bb = b;
return aa * bb;
}
```
Easier:
    uint16_t
    mul( unsigned a, unsigned b ){
        return  a*b;
    }
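Both versions avoid the same trap: where `int` is 32 bits, `uint16_t` operands are promoted to signed `int` before the multiplication, so 65535 * 65535 overflows `int`, which is undefined. Doing the arithmetic in `unsigned` makes it wrap instead. A third way to spell it, as a sketch only; the name `mul_wrap` is invented here, and it assumes `uint16_t` exists on the target:

```c
#include <stdint.h>

/* Multiply two 16-bit values modulo 2^16 without signed overflow.
 * The leading 1u pulls the operands into unsigned int through the
 * usual arithmetic conversions, so the product wraps instead of
 * overflowing a promoted int. */
uint16_t
mul_wrap(uint16_t a, uint16_t b)
{
    return (uint16_t)(1u * a * b);
}
```

The interface keeps the `uint16_t` parameters of the first version, while the unsigned arithmetic of the second happens inside the expression.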
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <10tj2h0$20gfo$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
...
[snip]
Prior to C89, the closest thing there was to a standard was K&R, which
didn't mention realloc() (or most of the rest of what became the C
standard library).
In section 7.10.3.4 ("The `realloc` function"), the last
sentence of the "Description" reads: "If `size` is zero and
`ptr` is not a null pointer, the object it points to is freed."
That statement is explicit, and unambiguous.
The text you quoted is from the prefatory material at the top
of section 7.10.3 ("Memory management functions") and clearly
applies to `malloc` and `calloc`.
I suppose one could make an argument to support it applying
to `realloc` as well because it doesn't explicitly *exclude* it,
but that would be a stretch.
Not at all. The rule in the C standard is that statements in a
higher node of the hierarchy apply to all the child nodes unless
a particular child node explicitly alters it.
I counter with two points: a) the
language in realloc is more specific, and thus should supersede
the general statement in the earlier introductory text, and b)
the language in 7.10.3 is talking about size requested for
allocation, but the language in 7.10.3.4 says that, in the case
it describes, the behavior is to _free_. In that specific case,
no size is "being requested" a la the 7.10.3 language, and thus
the statement about behavior in 7.10.3 does not apply.
The two provisions are not in conflict. The semantic description
in the realloc() section says the block is free()'d, but doesn't
say anything about the return value. The general prelude higher
up describes what is returned when the size requested is zero.
These two passages are talking about different things, and are
not in conflict with each other, and both apply.
The bottom line is that, despite the 7.10.3 wording, C89
explicitly defined `realloc(ptr, 0);` as equivalent to
`free(ptr)` when `ptr != NULL`.
You are simply wrong.
There is different wording in C99, and
that newer wording is not a change but a clarification of the
earlier wording in C89.
Such clarifications often occur in the
C99 standard.
I last used Pascal to any great extent in 1980, in a college
environment. It was a teaching language.
[...]
(My language is also a personal endeavour, but I'm not inflicting it on
the world, just sharing some ideas.)
[...]
Programming in Ada is like doing so with one hand tied behind your back.
FBC seems to include a working gcc.exe program.
If I do 'fbc64 -R hello.bas' then it produces the C file below.
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Programs however move on. If I download it now, then it does bundle
something called 'gcc.exe', but it is a stub: it can load C files, but
it is missing 'cc1'.
This doesn't seem particularly relevant to anything.
We're drifting from my point, which is that C is in that small
category, of a deceptively simple and malleable language that would be
a good fit for a target language.
I'm saying mine would be in that group, which is why I'm doing
comparisons with Pascal or Ada which have been brought up.
If you disagree with an opinion of mine, would it make any
difference if I knew the C standard inside out? You are hardly going
to change your mind.
Suppose I proposed for example that C should deprecate, then ban, the
ability to write:
A[i]
B[i][j]
respectively as:
i[A]
j[i[B]]
(The last one is a little mind-blowing, as it turns one 2D array
access (two consecutive 1D accesses) into two /nested/ 1D accesses.)
Basically, it would mean addition between pointers and integers would
not be commutative: P + i, but not i + P.
You will either agree with this or not. But I can't see that it
requires any deep knowledge of the standard to make such a proposal,
or why somebody would require that of me in order to even consider it.
There are reasonable adjustments you need to make to switch languages,
and there are unreasonable ones, such as needing to become a guru in
the new language.
It strikes me that you need to know the language if you want to
use and discuss it.
You want EVERYBODY who uses C to know the standard in as much depth as
KT, JK and TR? (Maybe a few others too but they don't seem that
bothered about it.)
(I've just tried the above proposal in my C compiler. It took half a
minute to find where I had to comment out 4 lines to make it work.
As it happens, because this ability has been there a long time, some
programs use it, for example from sqlite:
nPage = nPageHeader = get4byte(28+(u8*)pPage1->aData);
So this change is not going to happen, and people will continue
writing quirky things like 3["ABCDEF"] just for the hell of it.
This is the story of C.)
Now look at what's involved in splitting a C module into two.
That is a joke. Unix and C (and C compilers and libraries) are so
closely intertwined that you cannot separate them.
I'd say then that that gave C an unfair advantage.
C pretends to be a safe language by saying all those naughty things
are UB and should be avoided, at the same time, C compilers can be
made to do all that.
There is a need for a language at the level of C, with small scope,
small footprint (it can be implemented in 200KB or less; show me a
200KB Rustc), with lots of rope to be able to do what you like.
On 2026-05-09 14:10, Bart wrote:
I last used Pascal to any great extent in 1980, in a college
environment. It was a teaching language.
Back these days. It was also a language that had been used in
critical environments; hereabouts, for example, in a nuclear
reprocessing plant.
These are both application areas where you'll hardly find any
products of privately developed language like yours, I'm sure.
[...]
(My language is also a personal endeavour, but I'm not inflicting it
on the world, just sharing some ideas.)
...ideas you borrowed from other languages and just assembled
them - as it seems, per design principle, arbitrarily - to your
personal liking.
[...]
Programming in Ada is like doing so with one hand tied behind your back.
I suppose you prefer the "freedom" of assembly.
In article <86mry8so39.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
<snip>
Here is my current favorite fast fibonacci function (which happens
to be written in a functional and tail-recursive style):
    static ULL ff( ULL, ULL, unsigned, unsigned );
    static unsigned lone( unsigned );
    ULL
    ffibonacci( unsigned n ){
        return  ff( 1, 0, lone( n ), n );
    }
    ULL
    ff( ULL a, ULL b, unsigned m, unsigned n ){
        ULL c = a+b;
        return
            m & n  ?  ff( (a+c)*b, b*b+c*c, m>>1, n )  :
            m      ?  ff( a*a+b*b, (a+c)*b, m>>1, n )  :
            /*****/   b;
    }
    unsigned
    lone( unsigned n ){
        return  n |= n>>1, n |= n>>2, n |= n>>4, n ^ n>>1;
    }
Much faster than the linear version.
Very nice. 64-bit `unsigned long long` overflows for n>93, so I
question how much it matters in practice, though; surely if
calling this frequently you simply cache it in some kind of
table?
I wondered how this compared to Binet's Formula, using floating
point:
```
#include <math.h>	/* sqrtl, powl, llroundl */

unsigned long long
binet_fib(unsigned int n)
{
	const long double sqrt5 = sqrtl(5.);
	long double fn =
	    (powl(1. + sqrt5, n) - powl(1. - sqrt5, n)) /
	    (powl(2., n) * sqrt5);
	return llroundl(fn);
}
```
Sadly, my quick test suggests accuracy suffers (presumably due
to floating point) for the larger representable values in the
sequence; specifically, n>90. As a result I didn't bother
attempting to benchmark it.
In article <86ik8wsifm.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
There is different wording in C99, and
that newer wording is not a change but a clarification of the
earlier wording in C89.
You need to go read n2464.
Bart <bc@freeuk.com> writes:
[snip]
There is a need for a language at the level of C, with small scope,
small footprint (it can be implemented in 200KB or less; show me a
200KB Rustc), with lots of rope to be able to do what you like.
I have no such need. If you do, well, you've implemented languages
before. Go for it.
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
[...]
That is a joke. Unix and C (and C compilers and libraries) are so
closely intertwined that you cannot separate them.
I'd say then that that gave C an unfair advantage.
It gave C an advantage. I don't know what you think is "unfair"
about it.
[...]
C pretends to be a safe language by saying all those naughty things
are UB and should be avoided, at the same time, C compilers can be
made to do all that.
C does not pretend to be a "safe language".
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
That is a joke. Unix and C (and C compilers and libraries) are so
closely intertwined that you cannot separate them.
I'd say then that that gave C an unfair advantage.
It gave C an advantage. I don't know what you think is "unfair"
about it.
The context was why C became the dominant language for systems
programming. I offered that as an example. If it helped C over a
potential rival which wasn't used to implement a major OS, then it
strikes me as an unfair advantage.
Suppose Unix was implemented in some other language, then if C was
still more successful over rivals, that would have been fairer.
C pretends to be a safe language by saying all those naughty things
are UB and should be avoided, at the same time, C compilers can be
made to do all that.
C does not pretend to be a "safe language".
So, C can be unsafe even when you avoid all UB? Examples?
I suppose this depends on what you mean by unsafe. Take this:
m = monthnames[month];
d = daynames[day];
Suppose month and day indices got swapped by mistake, but both are
still within bounds; is this the kind of 'unsafe' in C that some
languages can fix through stricter typing?
But then, how about this one:
d1 = daynames[day1];
d2 = daynames[day2];
A type system can't stop day1 and day2 being swapped; it can still go wrong.
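The first case can be made a compile-time error even in C by giving each kind of index its own type. A minimal sketch; the `Month` and `Day` wrapper structs, the table contents and the function names are all invented for illustration:

```c
#include <stddef.h>

/* Distinct wrapper types: a Month cannot be passed where a Day is
 * expected, so swapping the two is rejected by the compiler. */
typedef struct { unsigned v; } Month;   /* 0..11 */
typedef struct { unsigned v; } Day;     /* 0..6  */

static const char *const monthnames[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};
static const char *const daynames[7] = {
    "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
};

/* Return the name, or NULL when the index is out of range. */
const char *
month_name(Month m)
{
    return m.v < 12 ? monthnames[m.v] : NULL;
}

const char *
day_name(Day d)
{
    return d.v < 7 ? daynames[d.v] : NULL;
}
```

Calling `month_name` with a `Day` argument fails to compile. Swapping two `Day` values, as in the second example, still goes undetected, which matches the point being made.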
Right, you don't know what to call it. I think the term you're
probably looking for is "translation unit".
If you have something to say about splitting a C translation unit
(something I don't think I've ever had a need to do),
perhaps because
you've had difficulties doing so yourself, feel free to elaborate.
On Sat, 09 May 2026 17:33:51 -0700
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Right, you don't know what to call it. I think the term you're
probably looking for is "translation unit".
If you have something to say about splitting a C translation unit
(something I don't think I've ever had a need to do),
That surprises me greatly.
In my practice refactoring that includes splitting translation units is rather common.
Or, maybe, I misunderstood your above sentence and you meant that you
never had a need *to say* something about splitting etc...?
perhaps because
you've had difficulties doing so yourself, feel free to elaborate.
That is a joke. Unix and C (and C compilers and libraries) are so
closely intertwined that you cannot separate them.
Bart <bc@freeuk.com> writes:
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
Right, you don't know what to call it. I think the term you're
probably looking for is "translation unit".
If you have something to say about splitting a C translation unit
(something I don't think I've ever had a need to do), perhaps because
you've had difficulties doing so yourself, feel free to elaborate.
Question: Does C, as you claim, "pretend to be a safe language"?
Can you cite a source to support that claim? If you were willing
to read the C standard, I'd refer you to first few paragraphs of
Annex K, introduced in C11.
You made a false statement. I've made plenty of mistakes here
myself. Acknowledging them would substantially increase your
credibility.
On 09/05/2026 16:18, Dan Cross wrote:
In article <10tn3so$3j8hc$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
[snip]
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not valid.)
This doesn't seem particularly relevant to anything.
We're drifting from my point, which is that C is in that small category,
of a deceptively simple and malleable language that would be a good fit
for a target language.
I'm saying mine would be in that group, which is why I'm doing
comparisons with Pascal or Ada which have been brought up.
Why would it? C compilers are ubiquitous.
For the major platforms, so are compilers for dozens of languages.
You don't care about C _as it is defined_. You only care about
how _you think it should work based on your intuition_. Your
incredulity at its definition not matching your expectations has
no bearing on anything at all.
If you disagree with an opinion of mine, would it make any difference
if I knew the C standard inside out? You are hardly going to change your
mind.
Suppose I proposed for example that C should deprecate, then ban, the
ability to write:
A[i]
B[i][j]
respectively as:
i[A]
j[i[B]]
(The last one is a little mind-blowing, as it turns one 2D array access
(two consecutive 1D accesses) into two /nested/ 1D accesses.)
Basically, it would mean addition between pointers and integers would
not be commutative: P + i, but not i + P.
You will either agree with this or not. But I can't see that it requires
any deep knowledge of the standard to make such a proposal, or why
somebody would require that of me in order to even consider it.
There are reasonable adjustments you need to make to switch languages,
and there are unreasonable ones, such as needing to become a guru in
the new language.
It strikes me that you need to know the language if you want to
use and discuss it.
You want EVERYBODY who uses C to know the standard in as much depth as
KT, JK and TR? (Maybe a few others too but they don't seem that bothered
about it.)
(I've just tried the above proposal in my C compiler. It took half a
minute to find where I had to comment out 4 lines to make it work.
As it happens, because this ability has been there a long time, some
programs use it, for example from sqlite:
nPage = nPageHeader = get4byte(28+(u8*)pPage1->aData);
So this change is not going to happen, and people will continue writing
quirky things like 3["ABCDEF"] just for the hell of it.
This is the story of C.)
Output from fbc64 -R hello.bas:
-----------------------------
[snip]
Looks like C to me!
On 10/05/2026 01:33, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
Right, you don't know what to call it. I think the term you're
probably looking for is "translation unit".
A source file isn't a translation unit. A translation unit is the
primary source file with all the includes flattened out (and I guess
with all the comments removed and all macros expanded), and that does
not happen until compile time.
It's not what I see when I look at file.c in my editor.
Now you're going to tell me I'm wrong according to the standard.
If you have something to say about splitting a C translation unit
(something I don't think I've ever had a need to do), perhaps because
you've had difficulties doing so yourself, feel free to elaborate.
You've never had a module - sorry source file - sorry 'translation
unit' get too big for one file?
But you will surely know everything that might need doing if such a
file needs splitting into two or more files.
My point had been that in my module scheme, it would be less work.
Question: Does C, as you claim, "pretend to be a safe language"?
Can you cite a source to support that claim? If you were willing
to read the C standard, I'd refer you to first few paragraphs of
Annex K, introduced in C11.
You made a false statement. I've made plenty of mistakes here
myself. Acknowledging them would substantially increase your
credibility.
You know what, if all possible answers to all C-related questions were contained within the C standard, why does this group even exist?
Just post a link to the standard document and be done with it.
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
So, what's the figure?
I see this pattern frequently (sometimes every other project seemingly)
so they are unpopular for some. And we don't know if people using
uint8_t etc are doing so because they genuinely like it or feel obliged
to use it.
(Tim Rentsch also seems to avoid it here.)
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Some people use them extensively. Some people have little use for
size-specific types. Some people want size-specific types, but for some
reason (good or bad) want to use C90 rather than C99. Some people like
the <stdint.h> types but for some reason (good or bad) are unable to use
them in certain cases. Some people dislike the <stdint.h> types, but
use them anyway.
So they can be problematic. And they are optional which is another matter.
Your language would not be a good fit, because it is a home-made
personal language with no traction.
I'm not suggesting take up of it. The point is that it is in that same
category.
Why would it? C compilers are ubiquitous.
For the major platforms, so are compilers for dozens of languages.
Almost invariably, C is the first language to be targeted for compilers
for a platform. It does not matter whether you like that or not, it is
a fact.
If you are looking for an HLL to target for a new language, this is
not going to be a brand-new platform.
It will be an established one with lots of choices.
Basically, it would mean addition between pointers and integers would
not be commutative: P + i, but not i + P.
You will either agree with this or not. But I can't see that it
requires any deep knowledge of the standard to make such a proposal,
or why somebody would require that of me in order to even consider it.
An opinion about preferences for a particular piece of syntax does not
need deep knowledge beyond that bit of code. An opinion on whether it
would be a good idea to change the standard to fit that preference, or
on what other people's preferences might be, or any unexpected
consequences or impacts of such a change - /that/ requires a deep
knowledge.
Well I made that change and the first app I tried failed because it relied
on 'i + P', if not 'A[i]', but C doesn't allow you to separate those.
The next two were OK, but the fourth also used it:
add32le(p + 2, x + s1->plt->data - p);
(From Tiny C sources.) So it looks like use of 'i + P' is already too
widespread even to deprecate it.
It would have needed to be banned from the start. Then that line would
simply have been written as:
add32le(p + 2, s1->plt->data - p + x);
At least, I made the change and tested it on real programs.
In article <10tnmk6$3os5b$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
[...]
Suppose I proposed for example that C should deprecate, then ban, the
ability to write:
A[i]
B[i][j]
respectively as:
i[A]
j[i[B]]
(The last one is a little mind-blowing, as it turns one 2D array access
(two consecutive 1D accesses) into two /nested/ 1D accesses.)
I haven't really given it much thought. This is an historical
artifact that came from B via "nb", where pointers were denoted
as `A[]`, so A[i] = A + i = i + A = i[A] because arithmetic in
the integers is commutative, and B was word-oriented.
It's a cute parlor trick, surprising to a few who haven't looked
closely at the history, but no deep mystery. I would not mourn
if it were removed from the language.
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist.html
Basically, it would mean addition between pointers and integers would
not be commutative: P + i, but not i + P.
No, it would not mean that. It would merely mean that the
syntax for array accesses was divorced from its early history.
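The two questions are separable because, as the language is currently defined, `E1[E2]` means `(*((E1)+(E2)))`; the odd-looking forms fall out of that definition rather than the other way round. A short sketch of the equivalences under discussion (the function names are made up for illustration):

```c
#include <stddef.h>

/* All four expressions designate the same element, because
 * E1[E2] is defined as *((E1)+(E2)) and pointer + integer
 * addition is commutative. Returns 1 when they all agree. */
int
same_element(const int *A, size_t i)
{
    return &A[i] == &i[A]
        && &A[i] == A + i
        && &A[i] == i + A;
}

/* The 2D case: B[i][j] rewritten with both subscripts reversed. */
int
same_element_2d(int (*B)[3], size_t i, size_t j)
{
    return &B[i][j] == &j[i[B]];
}
```

A rule that rejected an integer to the left of `[` could be written purely as a constraint on the subscript operator, leaving `i + A` in the last comparison untouched.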
You will either agree with this or not. But I can't see that it requires
any deep knowledge of the standard to make such a proposal, or why
somebody would require that of me in order to even consider it.
In observing your behavior, this fits the pattern of being about
the place where your argument breaks down. Chesterton's fence
applies, of course, but I do not think you are wrong to
question whether that surprising syntax should endure. But it
is your conclusion about commutativity of pointer arithmetic
that fails. You go from something that is, at least, open to
reasonable debate, and draw a specious conclusion that you then
assert as fact.
[...]
On 09/05/2026 23:25, Janis Papanagnou wrote:
On 2026-05-09 14:10, Bart wrote:
(My language is also a personal endeavour, but I'm not inflicting it
on the world, just sharing some ideas.)
...ideas you borrowed from other languages and just assembled
them - as it seems, per design principle, arbitrarily - to your
personal liking.
This sounds like a putdown.
[...]
I looked through mine, and I've identified a dozen or more features that
are either novel (at least I hadn't seen them elsewhere), or adapted in
a different way.
[...]
So, where's /your/ language? [...]
[...]
Programming in Ada is like doing so with one hand tied behind your back.
I suppose you prefer the "freedom" of assembly.
I hate assembly. I prefer a HLL a couple of steps up. C could have been
that language, but I got spoiled by a decade of using my private one.
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
[...]
[...]
That is a joke. Unix and C (and C compilers and libraries) are so
closely intertwined that you cannot separate them.
I'd say then that that gave C an unfair advantage.
It gave C an advantage. I don't know what you think is "unfair"
about it.
The context was why C became the dominant language for systems
programming. I offered that as an example. If it helped C over a
potential rival which wasn't used to implement a major OS, then it
strikes me as an unfair advantage.
[...]
On 10/05/2026 01:33, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
Right, you don't know what to call it. I think the term you're
probably looking for is "translation unit".
A source file isn't a translation unit. [...]
It's not what I see when I look at file.c in my editor.
In article <10tlvaa$1l93l$16@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-09 00:43, Dan Cross wrote:
[...]
Sure. This was a bit of a contrived example, but you ask a good
question: how often might one want to write code like that?
In short, I don't know, but I can think of any number of hash
functions, checksums, etc, that may be implemented using 16-bit
arithmetic, and I can well see programmers wanting to take
advantage of the modular semantics afforded by using unsigned
types to do so. Every day? Probably not. But often enough.
I mentioned it before but it may have got lost in the lots of text
typically exchanged here; for hash functions a modulus based on
powers of two has *bad* _distribution properties_, so it's not
a sensible example or plausible rationale to vindicate modular
arithmetic for the few special cases (m=8, 16, 32, 64, etc.).
Maybe, maybe not, depending on the exact hashing function and
the values it uses. Since K&R2 came up elsewhere, consider the
hash function presented on pp. 128-129:
/* hash: form hash value for string s */
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}
I wrote about collisions in this function a long time ago: https://pub.gajendra.net/2012/09/notes_on_collisions_in_a_common_string_hashing_function
In this case, the important characteristic with respect to
distribution is that the multiplier is relatively prime to the
modulus. Their choice of multiplier is 31, which is a prime
number, and thus co-prime to every positive modulus that is not
a multiple of 31. They happened to choose 101 (also prime) for
`HASHSIZE`, but
assuming reasonably random input, the pathological behavior you
are referring to would be avoided even if the modulus were (say)
128.
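The coprimality condition can be checked mechanically. The sketch below is only an illustration: gcd and hash_mod are hypothetical names, the table size is passed as a parameter instead of the HASHSIZE macro, and the character is converted through unsigned char (a small deviation from the book's code) so the result does not depend on whether plain char is signed.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* K&R2-style string hash with the modulus supplied by the caller. */
unsigned hash_mod(const char *s, unsigned modulus)
{
    unsigned hashval = 0;

    assert(modulus != 0);
    for (; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % modulus;
}
```

Here gcd(31, 101) and gcd(31, 128) are both 1, whereas gcd(31, 62) is 31, so a multiplier of 31 is only a poor match for table sizes that are multiples of 31. Whether a particular table size distributes real keys well still depends on the input.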
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
So, what's the figure?
I see this pattern frequently (sometimes every other project seemingly)
so they are unpopular for some. And we don't know if people using
uint8_t etc are doing so because they genuinely like it or feel obliged
to use it.
(Tim Rentsch also seems to avoid it here.)
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Some people use them extensively. Some people have little use for
size-specific types. Some people want size-specific types, but for
some reason (good or bad) want to use C90 rather than C99. Some
people like the <stdint.h> types but for some reason (good or bad) are
unable to use them in certain cases. Some people dislike the
<stdint.h> types, but use them anyway.
So they can be problematic. And they are optional which is another matter.
Your language would not be a good fit, because it is a home-made
personal language with no traction.
I'm not suggesting take up of it. The point is that it is in that same category.
Why would it? C compilers are ubiquitous.
For the major platforms, so are compilers for dozens of languages.
Almost invariably, C is the first language to be targeted for
compilers for a platform. It does not matter whether you like that or
not, it is a fact.
If you are looking for an HLL to target for a new language, this is
not going to be a brand-new platform.
It will be an established one with lots of choices.
Basically, it would mean addition between pointers and integers would
not be commutative: P + i, but not i + P.
You will either agree with this or not. But I can't see that it
requires any deep knowledge of the standard to make such a proposal,
or why somebody would require that of me in order to even consider it.
An opinion about preferences for a particular piece of syntax does not
need deep knowledge beyond that bit of code. An opinion on whether it
would be a good idea to change the standard to fit that preference, or
on what other peoples' preferences might be, or any unexpected
consequences or impacts of such a change - /that/ requires a deep
knowledge.
Well I made that change and the first app I tried failed because it relied
on 'i + P', if not 'A[i]', but C doesn't allow you to separate those.
The next two were OK, but the fourth also used it:
  add32le(p + 2, x + s1->plt->data - p);
(From Tiny C sources.) So it looks like use of 'i + P' is already too widespread even to deprecate it.
It would have needed to be banned from the start. Then that line would simply have been written as:
  add32le(p + 2, s1->plt->data - p + x);
At least, I made the change and tested it on real programs.
On 09/05/2026 20:20, Bart wrote:[...]
Well I made that change and the first app I tried failed because
it relied on 'i + P', if not 'A[i]', but C doesn't allow you to
separate those.
The next two were OK, but the fourth also used it:
  add32le(p + 2, x + s1->plt->data - p);
(From Tiny C sources.) So it looks like use of 'i + P' is already too
widespread even to deprecate it.
It would have needed to be banned from the start. Then that line
would simply have been written as:
  add32le(p + 2, s1->plt->data - p + x);
At least, I made the change and tested it on real programs.
So you discovered that your knowledge was too superficial to give an
informed opinion, and after learning more, you discovered that
something you thought should "obviously" be changed in C, cannot be
changed. I guess that's progress!
In article <10tnmk6$3os5b$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
One might find it silly to stop at a traffic signal in the
middle of the night, when it is easy to see there is no other
traffic and obviously no one else is nearby. But if you decide not
to stop, don't be upset when a cop pulls you over and gives you
a ticket.
(I've just tried the above proposal in my C compiler. It took half a
minute to find where I had to comment out 4 lines to make it work.
And did you break commutative arithmetic on pointers when you
were at it?
As it happens, because this ability has been there a long time, some
programs use it, for example from sqlite:
nPage = nPageHeader = get4byte(28+(u8*)pPage1->aData);
That's not using subscripting at all.
So this change is not going to happen, and people will continue writing
quirky things like 3["ABCDEF"] just for the hell of it.
This is the story of C.)
No, it's not.
This appears to be another of your misunderstandings.
There are reasons to dislike the semantic quirk of array subscripts
inherited from nb. But once again, your conclusion is specious.
Output from fbc64 -R hello.bas:
-----------------------------
[snip]
Looks like C to me!
Looks like a non-sequitur to me.
- Dan C.
Bart <bc@freeuk.com> writes:
My point had been that in my module scheme, it would be less work.
Good for you.
So you don't have a problem you're trying to solve, and you don't
want advice about how to do something.
You know what, if all possible answers to all C-related questions were
contained within the C standard, why does this group even exist?
Just post a link to the standard document and be done with it.
This group exists because not all possible answers to all C-related
questions are contained within the C standard. You pretend that
someone has made such a ridiculous claim, but unless I missed
something nobody has.
I'll try this again. You claimed that C "pretends to be a safe
language". That was a false claim. Will you either provide evidence
that it was correct or acknowledge that it was incorrect?
It happens that the first few paragraphs of Annex K are relevant
to your statement. If you inferred from that remark that I think
"all possible answers to all C-related questions were contained
within the C standard", that was a very wrong and silly inference.
I expect that you will refuse yet again to respond, but I'm prepared to
be pleasantly surprised.
So, C can be unsafe even when you avoid all UB? Examples?
On 2026-05-10 01:45, Bart wrote:
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
I suppose the word you wanted to use is "translation unit".
Mind that I'm not familiar with the "C Standard"; I know that term
only from listening to the discussions here. You have been in this
newsgroup much longer than I have, so I'd have expected that you'd
(meanwhile) know terms like that. Especially given that you make so
many posts, let me ask you: do you read (and perceive) the posts
you are replying to? Given your posting history, my guess would be
that you don't really read them, or don't understand them, or probably
are not even interested in them.
The context was why C became the dominant language for systems
programming. I offered that as an example. If it helped C over a
potential rival which wasn't used to implement a major OS, then it
strikes me as an unfair advantage.
Keith already said that it was an advantage. Insisting on an "unfair"
qualification is inappropriate, especially without ethical measure
and without any substantial evidence. (That wording reminds me of the
wording in the communication style of the current POTUS.)
On 2026-05-10 01:16, Bart wrote:
On 09/05/2026 23:25, Janis Papanagnou wrote:
On 2026-05-09 14:10, Bart wrote:
(My language is also a personal endeavour, but I'm not inflicting it
on the world, just sharing some ideas.)
...ideas you borrowed from other languages and just assembled
them - as it seems, per design principle, arbitrarily - to your
personal liking.
This sounds like a putdown.
Since you appear to have suggested in your previous post that you're
sharing primarily the ideas and given that I don't see any idea that
you'd have actually invented, and recognizing that you always only
ever spoke about "your languages" in all your posts; yes, this is a
clear putdown of your achievements concerning substantial, novel
ideas.
[...]
I looked through mine, and I've identified a dozen or more features
that are either novel (at least I hadn't seen them elsewhere), or
adapted in a different way.
I'm not interested in "adaptions", but feel free to post novel ideas
you developed; yet I haven't seen any, and I'm honestly interested in
new ideas.
[...]
So, where's /your/ language? [...]
What makes you think that I'd need to write my own language, given that there's a plethora of languages of all kinds and paradigms existing?
On 2026-05-09 03:36, Dan Cross wrote:
In article <10tlvaa$1l93l$16@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-09 00:43, Dan Cross wrote:
[...]
Sure. This was a bit of a contrived example, but you ask a good
question: how often might one want to write code like that?
In short, I don't know, but I can think of any number of hash
functions, checksums, etc, that may be implemented using 16-bit
arithmetic, and I can well see programmers wanting to take
advantage of the modular semantics afforded by using unsigned
types to do so. Every day? Probably not. But often enough.
I mentioned it before but it may have got lost in the lots of text
typically exchanged here; for hash functions a modulus based on
powers of two has *bad* _distribution properties_, so it's not
a sensible example or plausible rationale to vindicate modular
arithmetic for the few special cases (m=8, 16, 32, 64, etc.).
Maybe, maybe not, depending on the exact hashing function and
the values it uses. Since K&R2 came up elsewhere, consider the
hash function presented on pp. 128-129:
(I don't have that version available so the reference doesn't
help me much.)
/* hash: form hash value for string s */
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}
The item in question would be 'HASHSIZE'. I cannot infer from
that code whether a prime, a CPU-wordsize, or something else
has been defined for that entity.
Seeking in my older K&R translation I found a similar (but not
the same, a more primitive) function that has a HASHSIZE of 100
defined. (Clearly not a good choice.)
On 10/05/2026 10:29, David Brown wrote:
On 09/05/2026 20:20, Bart wrote:
Well I made that change and the first app I tried failed because
it relied on 'i + P', if not 'A[i]', but C doesn't allow you to separate
those.
The next two were OK, but the fourth also used it:
  add32le(p + 2, x + s1->plt->data - p);
(From Tiny C sources.) So it looks like use of 'i + P' is already too
widespread even to deprecate it.
It would have needed to be banned from the start. Then that line would
simply have been written as:
  add32le(p + 2, s1->plt->data - p + x);
At least, I made the change and tested it on real programs.
So you discovered that your knowledge was too superficial to give an
informed opinion,
So, what did I miss? And about what; the prevalence of i+P arithmetic in
C codebases? I suspect you didn't know that either.
and after learning more, you discovered that something
you thought should "obviously" be changed in C, cannot be changed. I
guess that's progress!
Well it /can/ be changed, but it would be too draconian when dealing
with legacy code.
It requires constructs like i[A] to be deprecated, while still allowing
i + A.
That is also possible, but is not as simple a change, since C currently
requires them to be interchangeable, and that is baked in to my compiler.
On 2026-05-10 03:35, Dan Cross wrote:
[snip]
In observing your behavior, this fits the pattern of being about
the place where your argument breaks down. Chesterton's fence
applies, of course, but I do not think you are wrong to
question whether that surprising syntax should endure. But it
is your conclusion about commutativity of pointer arithmetic
that fails. You go from something that is, at least, open to
reasonable debate, and draw a specious conclusion that you then
assert as fact.
There's a truth in Chesterton's Fence. But I have my doubts when it is
applied to the case here; basically asking Bart to inform himself
about the original rationales for that option/rule or "fence".
On 10/05/2026 02:35, Dan Cross wrote:
In article <10tnmk6$3os5b$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
[snip]
(I've just tried the above proposal in my C compiler. It took half a
minute to find where I had to comment out 4 lines to make it work.
And did you break commutative arithmetic on pointers when you
were at it?
Yes, that's exactly how it works, since A[i] is reduced to *(A + i). I
commented out the bit of code that swapped operands when the pointer was
on the right, as in (i + A).
As it happens, because this ability has been there a long time, some
programs use it, for example from sqlite:
nPage = nPageHeader = get4byte(28+(u8*)pPage1->aData);
That's not using subscripting at all.
In C, you can write i[A] /because/ it is exactly equivalent to *(i + A),
and pointer addition (between T* and int) is commutative.
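To make the distinction concrete, here is a small sketch of integer-plus-pointer arithmetic with no subscript operator involved. read_be32 is a hypothetical helper written for this illustration; it is not sqlite's get4byte, whose definition is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit big-endian value from 4 bytes. */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Both operand orders of pointer addition name the same address. */
int offsets_agree(const unsigned char *base, int i)
{
    return (i + base) == (base + i);
}
```

A call such as read_be32(4 + buf) has the same shape as the sqlite line quoted above: the integer is on the left of the +, and the result is passed on as a pointer.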
So this change is not going to happen, and people will continue writing
quirky things like 3["ABCDEF"] just for the hell of it.
This is the story of C.)
No, it's not.
This appears to be another of your misunderstandings.
What is the misunderstanding?
There are reasons to dislike the semantic quirk of array subscripts
inherited from nb. But once again, your conclusion is specious.
There are cruder ways to stop people writing i[A] whilst still allowing
(i + P). But it would be more of a hack.
(In my languages, i + P is simply not allowed, while A[i] is not reduced
to pointer arithmetic at the AST level. For one thing, arrays have an
arbitrary lower bound so the mapping isn't as simple.)
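As a rough illustration of why a lower bound complicates the mapping, here is how such an array might be emulated in C. This is a sketch under assumed names (based_array, based_at); it says nothing about how Bart's compiler actually represents arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of an int array whose first valid index is `lower`. */
struct based_array {
    int *data;      /* storage for the elements           */
    long lower;     /* index that corresponds to data[0]  */
    size_t length;  /* number of elements                 */
};

/* Return a pointer to element i, or NULL if i is out of range. */
int *based_at(struct based_array a, long i)
{
    unsigned long offset;

    assert(a.data != NULL);
    if (i < a.lower)
        return NULL;
    /* Unsigned subtraction yields the true distance without overflow. */
    offset = (unsigned long)i - (unsigned long)a.lower;
    if (offset >= a.length)
        return NULL;
    return a.data + offset;
}
```

With lower set to 1, based_at(a, 1) yields the first element, so the index no longer equals the pointer offset and a form like i[A] has no obvious meaning.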
Output from fbc64 -R hello.bas:
-----------------------------
[snip]
Looks like C to me!
Looks like a non-sequitur to me.
You suggested FBC didn't transpile to C. I actually tried it to find
On 10/05/2026 05:39, Janis Papanagnou wrote:
[snip]
What makes you think that I'd need to write my own language, given that
there's a plethora of languages of all kinds and paradigms existing?
So where's the one that works like mine?
And why are there so many new ones still appearing? Most of them you
will not know about.
On 10/05/2026 13:29, Bart wrote:
I am not saying /I/ have the in-depth knowledge required to give a good
argument for changing the standards here - I am merely saying that /you/
don't have that knowledge.
Suppose I proposed for example that C should deprecate, then ban, the ability to write:...
But I can't see that it requires any deep knowledge of the standard to make such a proposal, or why somebody would require that of me in order
It requires constructs like i[A] to be deprecated, while still
allowing i + A.
That is also possible, but is not as simple a change, since C
currently requires them to be interchangeable, and that is baked in to
my compiler.
Not only do you not have the knowledge required to give an informed
opinion about making this particular change to the standards, you don't
have the knowledge required to give an informed opinion about making
/any/ changes to the standard, the C language, or implementations.
This is not like making changes to your personal little languages or
your toy C compiler.
In article <10tpq7e$a6kp$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 10/05/2026 10:29, David Brown wrote:
On 09/05/2026 20:20, Bart wrote:
Well I made that change and the first app I tried failed because
it relied on 'i + P', if not 'A[i]', but C doesn't allow you to separate
those.
The next two were OK, but the fourth also used it:
  add32le(p + 2, x + s1->plt->data - p);
(From Tiny C sources.) So it looks like use of 'i + P' is already too
widespread even to deprecate it.
It would have needed to be banned from the start. Then that line would
simply have been written as:
  add32le(p + 2, s1->plt->data - p + x);
At least, I made the change and tested it on real programs.
So you discovered that your knowledge was too superficial to give an
informed opinion,
So, what did I miss? And about what; the prevalence of i+P arithmetic in
C codebases? I suspect you didn't know that either.
Apparently, you missed the changes afoot in the committee to do
exactly what everyone has been telling you: deprecate `i[A]` but
preserve `i + A`.
and after learning more, you discovered that something
you thought should "obviously" be changed in C, cannot be changed. I
guess that's progress!
Well it /can/ be changed, but it would be too draconian when dealing
with legacy code.
It requires constructs like i[A] to be deprecated, while still allowing
i + A.
How is that draconian?
That is also possible, but is not as simple a change, since C currently
requires them to be interchangeable, and that is baked in to my compiler.
Sounds like a problem for you and your compiler.
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
[snip]
The context was why C became the dominant language for systems
programming. I offered that as an example. If it helped C over a
potential rival which wasn't used to implement a major OS, then it
strikes me as an unfair advantage.
C does not pretend to be a "safe language".
So, C can be unsafe even when you avoid all UB? Examples?
I suppose this depends on what you mean by unsafe. Take this:
m = monthnames[month];
d = daynames[day];
Suppose month and day indices got swapped by mistake, but both are still
within bounds; is this the kind of 'unsafe' in C that some languages can
fix through stricter typing?
But then, how about this one:
d1 = daynames[day1];
d2 = daynames[day2];
A type system can't stop day1 and day2 being swapped; it can still go wrong.
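For what it is worth, C itself can catch the first kind of mix-up if the indices are wrapped in distinct struct types. The following is a sketch only: month_index, day_index and the two lookup functions are hypothetical names, and the tables are filled in here purely so the example is complete.

```c
#include <assert.h>

/* Hypothetical wrapper types; distinct struct types do not convert
   to one another implicitly, so mixing them up is a compile-time error. */
struct month_index { int value; };  /* 0 = January .. 11 = December */
struct day_index   { int value; };  /* 0 = Sunday  .. 6  = Saturday */

static const char *const monthnames[12] = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
};
static const char *const daynames[7] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};

const char *month_name(struct month_index m)
{
    assert(m.value >= 0 && m.value < 12);
    return monthnames[m.value];
}

const char *day_name(struct day_index d)
{
    assert(d.value >= 0 && d.value < 7);
    return daynames[d.value];
}
```

Passing a struct day_index to month_name is rejected by the compiler, but two struct day_index values can still be swapped without complaint, which is exactly the limit described in the second example.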
Bart <bc@freeuk.com> wrote:
On 09/05/2026 06:50, Waldek Hebisch wrote:
I last used Pascal to any great extent in 1980, in a college
environment. It was a teaching language.
That was the original goal. But Pascal quickly got serious use.
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
So, what's the figure?
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Some people use them extensively. Some people have little use for
size-specific types. Some people want size-specific types, but for some
reason (good or bad) want to use C90 rather than C99. Some people like
the <stdint.h> types but for some reason (good or bad) are unable to use
them in certain cases. Some people dislike the <stdint.h> types, but
use them anyway.
So they can be problematic.
On 10/05/2026 05:39, Janis Papanagnou wrote:
Originally my language was created to run on a bare board with very
little memory and no existing software /at all/, not even an assembler.
/You/ try it.
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
The context was why C became the dominant language for systems
programming. I offered that as an example. If it helped C over a
potential rival which wasn't used to implement a major OS, then it
strikes me as an unfair advantage.
Suppose Unix was implemented in some other language, then if C was still
more successful over rivals, that would have been fairer.
Michael S <already5chosen@yahoo.com> writes:
On Sat, 09 May 2026 17:33:51 -0700
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Right, you don't know what to call it. I think the term you're
probably looking for is "translation unit".
If you have something to say about splitting a C translation unit
(something I don't think I've ever had a need to do),
That surprises me greatly.
In my practice refactoring that includes splitting translation units is
rather common.
Or, maybe, I misunderstood your above sentence and you meant that you
never had a need *to say* something about splitting etc...?
perhaps because
you've had difficulties doing so yourself, feel free to elaborate.
I didn't give it a lot of thought, but I haven't done a lot of
refactoring of C projects. My experience is of course not universal,
and may not be representative.
Bart <bc@freeuk.com> writes:
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
So, what's the figure?
One doesn't understand your question. Is 'figure' some Britishism
in this context? Or do you expect David to provide an accurate
percentage describing the preferences of every C programmer on
the planet (or in orbit, if any of the current station occupants
can program in C :-).
Personally, for my working code, the stdint types are used
extensively.
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Strawman. Please provide examples of "somebody inventing a new type name
for uint8_t" (post standardization). One swallow doesn't make a summer, so a single example
from some obscure project you found on the WWW isn't particularly
instructive.
Some people use them extensively. Some people have little use for
size-specific types. Some people want size-specific types, but for some
reason (good or bad) want to use C90 rather than C99. Some people like
the <stdint.h> types but for some reason (good or bad) are unable to use
them in certain cases. Some people dislike the <stdint.h> types, but
use them anyway.
So they can be problematic.
That's not what David said, or even implied.
In article <86o6isuegr.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[snip]
It's important to understand the perspectives of different groups
of participants in the C ecosystem. There are three main groups:
If you're a programmer, you hate undefined behavior, and avoid it
like the plague.
If you're a compiler writer, you love undefined behavior, because
it lets you do whatever you want.
If you're a member of the ISO C standards committee (and I admit
that to a degree I am speculating here), you think of undefined
behavior as a balancing test, of needing to weigh the tensions
inherent in what the first two groups would prefer.
This, I think, is the tragedy of C ("tragedy" in the dramatic,
Shakespearean sense).
[long exposition on the history of C]
My point here is that the users and developers of the language
were the same group, [elaboration]
But, as you pointed out, this is no longer the case. The two are
now distinct, with very different goals. [a consequence of which
is C usage is less uniform (my paraphrase)]
I think this is fair: pretty much no production OS is written in
pure ISO C, if they're written in C at all: they all use compiler
flags or custom toolchains to enable various extensions and pin
down aspects of UB they depend on in one form or another.
And this is the tragedy. This isn't how it started, and I don't
think the folks who created the language wanted it to go down this
way, but here we are. [rest omitted]
On 09/05/2026 00:43, Dan Cross wrote:
In article <10tk4sg$2l19a$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 08/05/2026 00:08, Dan Cross wrote:
K&R is a wonderful book for its exposition: well-written,
concise, and the prose is beautiful. Kernighan is an amazing
writer, and Ritchie was well-known for his patience and clear
explanations.
However, it is a product of its time. It dates from a simpler
era, when programmers were expected to use books like it as a
starting point, and subsequently gain mastery either through
careful study of the standard, or extensive practice. (I'm
referring specifically to K&R2, of course, since the first
edition predated the first version of the standard by a decade.)
Machines were smaller and simpler, then, and so were compilers.
I am sad to say that I don't think it has aged particularly well.
I like the way you put that. Sometimes people have a tendency to put
too much reverence on particular texts - such as imagining that K&R says
all that needs to be said about C, and treating any modern tool, text,
standard or program that diverges from it as some kind of heresy, or
"not following the spirit of C". Languages evolve - tools evolve,
programs evolve, standards evolve, requirements evolve. K&R was a
milestone in the history of programming languages, and a model for
technical writing in its field, but C today is not C of fifty years ago.
Thank you. Yes, I pretty much agree.
It is unfortunate that this situation may be UB. I personally think
"unsigned short" should promote to "unsigned int", not "int" -
promotions should be signedness preserving. I don't like the "promote
to int" at all. But opinions don't change the standards, and I suppose
there are historical reasons for the rules that were made here.
But I am not sure I agree that such cases are "easy to stumble into".
How often would code like that be written, where overflowing the
uint16_t would be correct behaviour in the code on a 16-bit int system?
It is certainly possible, but it is perhaps more likely that cases of
overflow in the 16-bit system were also bugs in the code - moving the
code to 32-bit systems could give different undesirable effects from the
bug. It could also happen to remove the effects of the bug by holding
results in 32-bit registers and leading to correct results in later
calculations - UB can work either way.
Sure. This was a bit of a contrived example, but you ask a good
question: how often might one want to write code like that?
I think the particularly interesting thing about asking how often code
like this occurs, is that the potential impact of an oddity may be
higher for things that aren't often used. Most C programmers will
fairly quickly learn that overflowing signed arithmetic is UB and try to
avoid it - but the rarity of this example means that people are less
likely to realise it is UB.
In short, I don't know, but I can think of any number of hash
functions, checksums, etc, that may be implemented using 16-bit
arithmetic, and I can well see programmers wanting to take
advantage of the modular semantics afforded by using unsigned
types to do so. Every day? Probably not. But often enough.
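As a rough illustration of the kind of code being discussed, here is a minimal sketch of a 16-bit multiplicative hash. The function name `hash16` and the multiplier 40503 are made up for this example and do not come from any particular library. On a system with 32-bit `int`, writing `h * k` with two `uint16_t` operands would promote both to `int`, and 65535 * 40503 does not fit in a 32-bit `int`; doing the multiplication in `unsigned int` keeps the arithmetic modular.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit multiplicative hash; the multiplier is arbitrary. */
uint16_t hash16(const unsigned char *data, size_t len)
{
    const uint16_t k = 40503u;
    uint16_t h = 0;

    for (size_t i = 0; i < len; i++) {
        /* Convert to unsigned int before multiplying so the product is
           computed with unsigned (wrapping) arithmetic, then reduce it
           modulo 2^16 by converting back to uint16_t. */
        h = (uint16_t)((unsigned int)h * k + data[i]);
    }
    return h;
}
```

The result is the same whether `unsigned int` is 16 or 32 bits wide, because only the low 16 bits of each step are kept.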
I can imagine situations in the microcontroller world (as usual, many of
my examples come from there!) where code that was originally written for
8-bit or 16-bit devices was moved to 32-bit devices. Microcontroller
programmers are big users of fixed-size integer types - sometimes a good
thing, sometimes not.
One of the things I had to really internalize as an OS person is
that the universe of useful existing software is large. It
doesn't matter if I create the most beautiful abstractions for
them that are infinitely superior to whatever swill their code
is using now. If they don't get to run their program (or worse,
they have to make a bunch of invasive changes for no discernable
benefit from their perspective) because I know better about how
things ought to be done, they're not going to use whatever
system I'm working on unless they're forced. But even then they
will resent it and move to something else the first chance they
get (lookin' at you, DEC, Microsoft, IBM, and any number of
commercial Unix vendors).
Whatever _I_ think of the interfaces they chose to use is
immaterial; making it difficult for them wins me no friends.
This is one of the smart things Torvalds did with Linux: "don't
break userspace" (unless there's a really, really good reason)
probably did a lot to help make Linux popular.
Anyway, I think this is similar. It doesn't matter what anyone
thinks of whether one ought to prevent all overflow; the fact
is that the language supports it for unsigned integers (though
with some surprising semantics for types of lower rank than
`int`) simply is what it is. And if someone has a program that
avails of those semantics, and that program is important to them
for whatever reason, then there's little choice but to hold
one's nose. I know you know this, of course, but I think it's
worth repeating every now and then.
Agreed. Knowing the semantics (and knowing when no semantics are
defined) is more important than exactly what the semantics are. For any
real language, there are always going to be things you disagree with or
think could be done differently, but you live with it anyway. Just look
at the C or C++ standards committee voting records - very few changes
get voted through unanimously.
(I guess that's why Bart is so deliriously impressed with his own
language - as the language's only designer, implementer, and user, it
presumably fits his preferences quite well. Real-world languages are
more of a compromise.)
Certainly, however, the fact that this expression could contain UB would
surprise many C programmers.
Yes. Btw, the fix is almost trivial:
```
uint16_t
mul(uint16_t a, uint16_t b)
{
unsigned int aa = a, bb = b;
return aa * bb;
}
```
But we must be careful - copying the same pattern to uint32_t would then
be incorrect if unsigned int is smaller than 32 bits. (Still no UB,
though.)
A general pattern could be:
T mul(T a, T b) {
return (a + 0u) * b;
}
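To make the general pattern concrete, here is a small sketch that instantiates it for two fixed-width types. The names `mul16` and `mul32` are invented for this example. Adding `0u` makes the usual arithmetic conversions pick an unsigned type of at least `unsigned int` rank, so the multiplication is always done with wrapping unsigned arithmetic, whatever the width of `int`.

```c
#include <stdint.h>

/* (a + 0u) has type unsigned int when uint16_t promotes to int, so the
   product is computed with unsigned, wrapping arithmetic. */
uint16_t mul16(uint16_t a, uint16_t b)
{
    return (uint16_t)((a + 0u) * b);
}

/* If uint32_t is wider than unsigned int, (a + 0u) keeps the wider
   unsigned type, so no bits of the operands are lost before multiplying. */
uint32_t mul32(uint32_t a, uint32_t b)
{
    return (uint32_t)((a + 0u) * b);
}
```

For instance, `mul16(0xFFFF, 0xFFFF)` yields 1, because (2^16 - 1)^2 = 2^32 - 2^17 + 1, which is 1 modulo 2^16.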
On 10/05/2026 15:58, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Strawman. Please provide examples of "somebody inventing a new type name
for uint8_t" (post standardization). One swallow doesn't make a summer,
so a single example from some obscure project you found on the WWW isn't
particularly instructive.
You invite people to give examples, but then immediately qualify that by
putting restrictions on quantity and popularity so that they can never win!
For other people's benefit:
typedef uint8_t byte;
(From: https://github.com/arduino/ArduinoCore-avr/blob/master/cores/arduino/Arduino.h)
typedef int64_t mz_int64;
(From a compression library called "miniz")
typedef uint32_t Uint32;
(From SDL2 header files)
Is it always safe and not undefined behavior to do:
int i;
long l;
i = (int)l;
as long as you have first verified that 'l' is within
the range between INT_MIN and INT_MAX? Thanks.
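A minimal sketch of the checked conversion being asked about follows. The helper name `long_to_int` is hypothetical. Under the standard's integer conversion rules, converting a value that is representable in the destination type leaves the value unchanged; only the out-of-range case is implementation-defined (or raises an implementation-defined signal), so the range check is what makes the cast fully portable.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores l in *out and returns true only if l fits
   in an int; otherwise leaves *out untouched and returns false. */
bool long_to_int(long l, int *out)
{
    if (l < INT_MIN || l > INT_MAX)
        return false;
    *out = (int)l;  /* l is representable as int, so the value is preserved */
    return true;
}
```

On platforms where `long` and `int` have the same range, the check can never fail and a compiler is free to remove it.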
On 10/05/2026 13:39, Dan Cross wrote:
In article <10tpq7e$a6kp$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 10/05/2026 10:29, David Brown wrote:
On 09/05/2026 20:20, Bart wrote:
Well I made that change and the first app I tried failed because it
relied on 'i + P', if not 'A[i]', but C doesn't allow you to separate
those.
The next two were OK, but the fourth also used it:
    add32le(p + 2, x + s1->plt->data - p);
(From Tiny C sources.) So it looks like use of 'i + P' is already too
widespread even to deprecate it.
It would have needed to be banned from the start. Then that line would
simply have been written as:
    add32le(p + 2, s1->plt->data - p + x);
At least, I made the change and tested it on real programs.
So you discovered that your knowledge was too superficial to give an
informed opinion,
So, what did I miss? And about what; the prevalence of i+P arithmetic in
C codebases? I suspect you didn't know that either.
Apparently, you missed the changes afoot in the committee to do
exactly what everyone has been telling you: deprecate `i[A]` but
preserve `i + A`.
The current standard says that those have to be tied together: 6.5.2.1p2.
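For readers who have not looked at that paragraph: it defines `E1[E2]` as `(*((E1)+(E2)))`, which is why the two spellings are currently interchangeable. A small sketch (the function name `same_element` is invented for this example) checks that all four forms designate the same object:

```c
#include <stddef.h>

/* Returns 1 if A[i], i[A], *(A + i) and *(i + A) all name the same
   element, as the current definition of subscripting requires. */
int same_element(const int *A, size_t i)
{
    return &A[i] == &i[A]   /* subscript with the operands swapped */
        && &A[i] == A + i   /* pointer + integer */
        && &A[i] == i + A;  /* integer + pointer */
}
```

Deprecating `i[A]` while keeping `i + A` would only affect the second spelling; the last two are ordinary pointer addition.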
However, I also, independently (and off the top of my head), came up
with a proposal that is actually being considered.
Well done, Bart!
Oh, hang on, EVERY SINGLE THING I SAY AND DO HERE IS WRONG. I forgot
that part.
and after learning more, you discovered that something
you thought should "obviously" be changed in C, cannot be changed. I
guess that's progress!
Well it /can/ be changed, but it would be too draconian when dealing
with legacy code.
It requires constructs like i[A] to be deprecated, while still allowing
i + A.
How is that draconian?
If implemented by removing pointer+int commutativity, too many programs
would fail. My first attempt did that since I wanted to honour 6.5.2.1p2.
If I broke 6.5.2.1p2, then it was more successful. Programs using i[A]
are much rarer than those using i+P.
That is also possible, but is not as simple a change, since C currently
requires them to be interchangeable, and that is baked in to my compiler.
Sounds like a problem for you and your compiler.
Always with the positives! You just have to keep bullying don't you.
It turns out that disallowing i[A] while keeping i+P was even simpler
because of the way my compiler works.
On 10/05/2026 14:03, David Brown wrote:
On 10/05/2026 13:29, Bart wrote:...
[snip]
Suppose I proposed for example that C should deprecate, then ban, the
ability to write:
But I can't see that it requires any deep knowledge of the standard to
make such a proposal, or why somebody would require that of me in order
to even consider it.
Now it turns out that the C committee is actually looking at such a
proposal. But funnily enough, no one has given me credit for that.
[snip]
My only mistake was thinking that C REQUIRED indexing syntax to be tied
to pointer arithmetic, but as far as I know, it currently does do that,
and will do so for some years yet.
But if we're allowed to separate them, then OK I'll have another go at
my compiler. It turns out to be even simpler: I had to modify three lines
of code.
Now P+i and i+P are still allowed, but not i[A], only A[i]. All the
tests I tried before now still work.
So my toy compiler implements part of C2y!
The interesting thing is that to achieve it, I had to ignore my
knowledge of the current C standard (specifically 6.5.2.1p2).
SO *NOW* TELL ME WHAT I DID WRONG.
It seems to me that you guys just want to constantly pick on me for specious
reasons.
I wouldn't call it a tragedy, in fact just the opposite. If C had
stayed in its original environment it never would have become as
ubiquitous and widespread as it is today. The original ecosystem
doesn't scale. By letting C, and also Unix, enter the public
sphere, a great benefit accrued to the world at large.
On 10/05/2026 14:03, David Brown wrote:
On 10/05/2026 13:29, Bart wrote:
I am not saying /I/ have the in-depth knowledge required to give a
good argument for changing the standards here - I am merely saying
that /you/ don't have that knowledge.
So, this is a mystery: I am at fault for not knowing X, but you are not at
fault for not knowing X?!
Bart <bc@freeuk.com> writes:
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
So, what's the figure?
One doesn't understand your question. Is 'figure' some Britishism
in this context? Or do you expect David to provide an accurate
percentage describing the preferences of every C programmer on
the planet (or in orbit, if any of the current station occupants
can program in C :-).
Personally, for my working code, the stdint types are used
extensively.
Bart <bc@freeuk.com> writes:
On 10/05/2026 15:58, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Strawman. Please provide examples of "somebody inventing a new type name
for uint8_t" (post standardization). One swallow doesn't make a summer,
so a single example from some obscure project you found on the WWW isn't
particularly instructive.
You invite people to give examples, but then immediately qualify that by
putting restrictions on quantity and popularity so that they can never win!
For other people's benefit:
typedef uint8_t byte;
They're explicitly using uint8_t specifically for the purpose
it was intended for. The fact that they have an alias could
be for dozens of reasons, including code reuse or compatibility with
older C compilers that didn't yet support <stdint.h> (with suitable
preprocessor code to define uint8_t on targets that don't support
<stdint.h>). See autotools.
They didn't do this because the programmer disliked uint8_t
or the stdint.h types in general.
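As a rough idea of what such preprocessor code might look like, here is a minimal sketch. It is not taken from Arduino, autotools, or any other project; the fallback branch and the `low_byte` helper are assumptions made for illustration, and real configure-generated headers are considerably more thorough.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>             /* C99 and later: use the standard header */
#else
#include <limits.h>
#if UCHAR_MAX != 0xFF
#error "this fallback assumes an 8-bit unsigned char"
#endif
typedef unsigned char uint8_t;  /* pre-C99 fallback definition */
#endif

/* Project-local alias, in the style of the Arduino example. */
typedef uint8_t byte;

/* Hypothetical helper showing the alias in use. */
byte low_byte(unsigned int x)
{
    return (byte)(x & 0xFFu);
}
```

Code written against `byte` then compiles unchanged whichever branch was taken.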
Yes, that's a major problem with all 64 bit Unices. Use Windows with
that. On Windows long and int have the same size.
In article <10tq1pi$d08i$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 10/05/2026 14:03, David Brown wrote:
On 10/05/2026 13:29, Bart wrote:...
[snip]
Suppose I proposed for example that C should deprecate, then ban, the
ability to write:
But I can't see that it requires any deep knowledge of the standard to
make such a proposal, or why somebody would require that of me in order
to even consider it.
Now it turns out that the C committee is actually looking at such a
proposal. But funnily enough, no one has given me credit for that.
Why would anyone give you credit for something the committee
came up with?
You may have independently come up with a similar idea, or even
the same idea, but I see no evidence the committee was aware of
that.
On 10/05/2026 15:58, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
So, what's the figure?
One doesn't understand your question. Is 'figure' some Britishism
in this context? Or do you expect David to provide an accurate
percentage describing the preferences of every C programmer on
the planet (or in orbit, if any of the current station occupants
can program in C :-).
Personally, for my working code, the stdint types are used
extensively.
You know, I could well be right, and nobody does like them, apart of
course from people here. Instead they could just be tolerated.
I doubt whether they are loved, otherwise we'd see those _t suffixes in
other languages too because they look so good.
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Strawman. Please provide examples of "somebody inventing a new type name
for uint8_t" (post standardization). One swallow doesn't make a summer,
so a single example from some obscure project you found on the WWW isn't
particularly instructive.
You invite people to give examples, but then immediately qualify that by
putting restrictions on quantity and popularity so that they can never win!
That's not what David said, or even implied.
Problematic in being a mess leading to a mix of classic, stdint and
user-defined types. Compare with the use of comparable types in C#, D,
Java, Zig, Rust and Go.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[..I am summarizing parts in an effort to get to key aspects..]
In article <86o6isuegr.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[snip]
It's important to understand the perspectives of different groups
of participants in the C ecosystem. There are three main groups:
If you're a programmer, you hate undefined behavior, and avoid it
like the plague.
If you're a compiler writer, you love undefined behavior, because
it lets you do whatever you want.
If you're a member of the ISO C standards committee (and I admit
that to a degree I am speculating here), you think of undefined
behavior as a balancing test, of needing to weigh the tensions
inherent in what the first two groups would prefer.
This, I think, is the tragedy of C ("tragedy" in the dramatic,
Shakespearean sense).
[long exposition on the history of C]
My point here is that the users and developers of the language
were the same group, [elaboration]
But, as you pointed out, this is no longer the case. The two are
now distinct, with very different goals. [a consequence of which
is C usage is less uniform (my paraphrase)]
I think this is fair: pretty much no production OS is written in
pure ISO C, if they're written in C at all: they all use compiler
flags or custom toolchains to enable various extensions and pin
down aspects of UB they depend on in one form or another.
And this is the tragedy. This isn't how it started, and I don't
think the folks who created the language wanted it to go down this
way, but here we are. [rest omitted]
I wouldn't call it a tragedy, in fact just the opposite.
If C had
stayed in its original environment it never would have become as
ubiquitous and widespread as it is today. The original ecosystem
doesn't scale. By letting C, and also Unix, enter the public
sphere, a great benefit accrued to the world at large.
[snip] Moreover such non-standard
language usages are not limited to C -- the Rust language is also
used in the linux kernel, and there too some non-standard language
features are used in kernel code.
I don't mean to compare C and Rust. My position here is only that,
in my view, the complaints raised about C are misplaced. Others
are welcome to their own views on the subject.
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types,
even though standardised from C99 which also introduced 'long long'
used here. That is another bugbear. Oh, I forgot, my criticism is
not valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have
you checked with every C programmer, and every person writing
systems that generate C code, and checked that none of them like the
<stdint.h> types? No? I thought not.
So, what's the figure?
I see this pattern frequently (sometimes every other project
seemingly) so they are unpopular for some. And we don't know if people
using uint8_t etc are doing so because they genuinely like it or feel
obliged to use it.
(Tim Rentsch also seems to avoid it here.)
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Michael S <already5chosen@yahoo.com> writes:
On Sat, 09 May 2026 17:33:51 -0700
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Right, you don't know what to call it. I think the term you're
probably looking for is "translation unit".
If you have something to say about splitting a C translation unit
(something I don't think I've ever had a need to do),
That surprises me greatly.
In my practice refactoring that includes splitting translation units
is rather common.
Or maybe I misunderstood your above sentence and you meant that
you never had a need *to say* something about splitting etc...?
perhaps because
you've had difficulties doing so yourself, feel free to elaborate.
I didn't give it a lot of thought, but I haven't done a lot of
refactoring of C projects. My experience is of course not universal,
and may not be representative.
I don't recall refactoring existing code, primarily because the
original programmers used multiple translation units logically
dividing the code into functionally related segments, where necessary,
from the start.
An experienced C programmer uses independent translation
units without even thinking about it, when the application
is non-trivial. For many reasons, including reusability,
maintainability and collaboration.
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
On 10/05/2026 03:01, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
My point had been that in my module scheme, it would be less work.
Good for you.
So you don't have a problem you're trying to solve, and you don't
want advice about how to do something.
You keep forgetting context. It was a throwaway remark in a brief
discussion WH and I were having about module schemes.
(And I still don't know what to call a 'primary source file'; that is,
one of these files:
gcc one.c two.c three.c
and not a .h file, or a .c or other file that is included indirectly.)
You know what, if all possible answers to all C-related questions were
contained within the C standard, why does this group even exist?
Just post a link to the standard document and be done with it.
This group exists because not all possible answers to all C-related
questions are contained within the C standard. You pretend that
someone has made such a ridiculous claim, but unless I missed
something nobody has.
I'll try this again. You claimed that C "pretends to be a safe
language". That was a false claim. Will you either provide evidence
that it was correct or acknowledge that it was incorrect?
It happens that the first few paragraphs of Annex K are relevant
to your statement. If you inferred from that remark that I think
"all possible answers to all C-related questions were contained
within the C standard", that was a very wrong and silly inference.
I expect that you will refuse yet again to respond, but I'm prepared
to be pleasantly surprised.
I've glanced at appendix K.1 and saw nothing relevant there. It's
about exceeding array bounds.
I am assuming that doing that would be UB.
My question was (it is always important to keep context!):
So, C can be unsafe even when you avoid all UB? Examples?
Really it comes down to what 'unsafe' means in a language, and in C,
whether it is tied to UB or can be more general.
But since 'unsafe' is not defined in the standard (not in N1570
anyway, where it is used casually in only one instance), I expect you
don't know, and wouldn't want to speculate.
On 2026-05-10 16:15, Kalevi Kolttonen wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
If uint8_t exists, CHAR_BIT must be 8, and unsigned char must therefore
meet the requirements to be the type that uint8_t is a typedef for.
However, the standard doesn't mandate it. If, for example, a machine supported two different 8-bit types, with the order of the bits from low
to high reversed between them, uint8_t could be one of those types, and unsigned char could be the other - the C standard imposes no
requirements that would be broken by that choice.
This is just a possibility allowed by the standard, not something
we're ever likely to see in practice.
On 10/05/2026 14:03, David Brown wrote:
On 10/05/2026 13:29, Bart wrote:
I am not saying /I/ have the in-depth knowledge required to give a
good argument for changing the standards here - I am merely saying
that /you/ don't have that knowledge.
So, this is a mystery: I am at fault for not knowing X, but you are not at
fault for not knowing X?!
This particular thing was just some simple example I'd thought up:
Suppose I proposed for example that C should deprecate, then ban, the
ability to write:
...
But I can't see that it requires any deep knowledge of the standard
to make such a proposal, or why somebody would require that of me in
order to even consider it.
Now it turns out that the C committee is actually looking at such a
proposal.
But funnily enough, no one has given me credit for that.
It requires constructs like i[A] to be deprecated, while still
allowing i + A.
That is also possible, but is not as simple a change, since C
currently requires them to be interchangeable, and that is baked in
to my compiler.
Not only do you not have the knowledge required to give an informed
opinion about making this particular change to the standards, you
don't have the knowledge required to give an informed opinion about
making /any/ changes to the standard, the C language, or
implementations.
This is not like making changes to your personal little languages or
your toy C compiler.
Why, what's the difference? At least I attempted to make the change to
see what would happen, and I tried it out on some real non-toy
code-bases.
My only mistake was thinking that C REQUIRED indexing syntax to be
tied to pointer arithmetic, but as far as I know, it currently does do
that, and will do so for some years yet.
But if we're allowed to separate them, then OK I'll have another go at
my compiler. It turns out to be even simpler: I had to modify three lines
of code.
Now P+i and i+P are still allowed, but not i[A], only A[i]. All the
tests I tried before now still work.
So my toy compiler implements part of C2y!
The interesting thing is that to achieve it, I had to ignore my
knowledge of the current C standard (specifically 6.5.2.1p2).
SO *NOW* TELL ME WHAT I DID WRONG.
It seems to me that you guys just want to constantly pick on me for
specious reasons.
On 10/05/2026 06:00, Janis Papanagnou wrote:
On 2026-05-10 01:45, Bart wrote:
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
I suppose the word you wanted to use is "translation unit".
No. That is a technical term used within the C standard and relates to
a subsequent representation of your source code within a compiler.
It is also C-specific. What is the generic term for one of the
discrete source files of a program?
Bart <bc@freeuk.com> writes:
I've glanced at appendix K.1 and saw nothing relevant there. It's
about exceeding array bounds.
Your false claim was that C "pretends" to be a safe language.
C pretends to be a safe language by saying all those naughty things
are UB and should be avoided, at the same time, C compilers can be made
to do all that.
So, C can be unsafe even when you avoid all UB? Examples?
Do you still falsely claim that C pretends
to be safe?
Do you acknowledge that you were wrong? Was it a
deliberate exaggeration? Was it a deliberate lie?
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Yes, that's a major problem with all 64 bit Unices. Use Windows with
that. On Windows long and int have the same size.
I have used Linux since the summer of 1998 and would never
ever even consider installing Windows. It is so disgusting.
Bart <bc@freeuk.com> writes:
On 10/05/2026 05:39, Janis Papanagnou wrote:
Originally my language was created to run on a bare board with very
little memory and no existing software /at all/, not even an assembler.
/You/ try it.
Typical project in undergraduate computer science programs;
in my era, one wrote a recursive descent compiler for a
subset of Pascal[*] (or C) - the course took a single academic
quarter.
Bart <bc@freeuk.com> writes:
[...]
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Bart, you claimed here that literally *nobody* likes stdint.h types.
I like stdint.h types.
Your claim is therefore false.
Will you acknowledge that simple fact?
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
And operations on unsigned char are well defined,
including wrap-around. So I fail to see any
difference between unsigned char and uint8_t.
On 11/05/2026 00:11, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
I've glanced at appendix K.1 and saw nothing relevant there. It's
about exceeding array bounds.
Your false claim was that C "pretends" to be a safe language.
People keep jumping to conclusions without asking for
clarification. This is what I said (quite a few posts back):
C pretends to be a safe language by saying all those naughty things
are UB and should be avoided, at the same time, C compilers can be
made to do all that.
(I see now you quoted this yourself; I could have saved some time!)
The assumption made here
is that unsafe-ness arises in C from UB. Then
I suggest that, while the language itself washes its hands of it, it
lets the compiler do the dirty work (as well as pushing the
responsibility to the user, by allowing the compiler to do something
that is UB).
In a later follow-up to you I ask:
So, C can be unsafe even when you avoid all UB? Examples?
And yet later I ask for clarification for what it means to be 'unsafe'
and gave some examples of my own. I don't recall that being answered.
Do you still falsely claim that C pretends
to be safe?
"When you avoid all UB". You keep forgetting this bit.
Well, first tell me what it means for a language to be 'unsafe'. That
term has not been defined. Is it only what happens when UB is invoked,
or can it be at any time?
If you think I was wrong, then you can politely suggest that and offer
some enlightenment. Why become aggressive and give me the third
degree? Sometimes I feel like I'm in the dock.
So, reading between the lines, you seem to be suggesting that C /can/
be an unsafe language (whatever that means) whether or not UB is
involved.
Do you acknowledge that you were wrong? Was it a
deliberate exaggeration? Was it a deliberate lie?
Please stop this. If you don't agree with what I said, then post a
counter-argument.
You should also look at the context: I was explaining the various
underhand, 'unsafe' things that are possible in C, which give it an
edge over competitors for systems work, then I suggest that many of
those are likely to be UB so not officially sanctioned.
Phew! (Mopping sweaty brow with a handkerchief.)
Yes, I have heard that argument before. I am unconvinced that the
"value preserving" choice actually has any real advantages. I also
think it is a misnomer - it implies that "unsigned preserving" would
not preserve values, which is wrong.
Kalevi Kolttonen <kalevi@kolttonen.fi> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
And operations on unsigned char are well defined,
including wrap-around. So I fail to see any
difference between unsigned char and uint8_t.
If a machine has bytes bigger than 8 bits, then uint8_t will not
exist, so trying to use uint8_t will fail at compile time, which
may be a good thing if the code depends on the size being exactly 8
bits.
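A minimal sketch of that compile-time dependence, assuming a C11 compiler. The helper name `low_octet` is made up for illustration; the point is only that the translation unit refuses to build where bytes are wider than 8 bits or where uint8_t is absent.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint8_t, UINT8_MAX */

/* Refuse to compile on platforms whose bytes are not exactly 8 bits wide. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* <stdint.h> defines UINT8_MAX only when uint8_t itself exists. */
#ifndef UINT8_MAX
#error "uint8_t is not available on this implementation"
#endif

/* Hypothetical helper: keep the low 8 bits of a value. */
static uint8_t low_octet(unsigned int v)
{
    return (uint8_t)(v & 0xFFu);
}
```

Code written against unsigned char would compile silently on such a machine and then behave differently, which is the case the explicit check guards against.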
On 11/05/2026 01:10, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Bart, you claimed here that literally *nobody* likes stdint.h types.
I like stdint.h types.
Your claim is therefore false.
Will you acknowledge that simple fact?
From Google:
3. Hyperbole (Exaggeration) for Emphasis
"Nobody" is frequently used in hyperbolic statements to emphasize that
almost no one was there, or that the number of people was negligible.
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
And operations on unsigned char are well defined,
including wrap-around. So I fail to see any
difference between unsigned char and uint8_t.
Bart <bc@freeuk.com> writes:
On 11/05/2026 01:10, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Bart, you claimed here that literally *nobody* likes stdint.h types.
I like stdint.h types.
Your claim is therefore false.
Will you acknowledge that simple fact?
From Google:
3. Hyperbole (Exaggeration) for Emphasis
"Nobody" is frequently used in hyperbolic statements to emphasize that
almost no one was there, or that the number of people was negligible.
Yes, thank you, I know what hyperbole means. You're obviously saying
that your statement was hyperbole.
You used the word "nobody". You've repeatedly defended your
statement at great length and raised other questions, like asking
for "figures".
You could have said, in response to the first criticism of your
statement, that it was merely hyperbole.
It would have saved us
all a great deal of time.
On 11/05/2026 01:42, Keith Thompson wrote:[...]
You could have said, in response to the first criticism of your
statement, that it was merely hyperbole.
I think we've been here before. I was talking figuratively, and didn't
feel the need to point that out.
On 11/05/2026 00:11, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
I've glanced at appendix K.1 and saw nothing relevant there. It's
about exceeding array bounds.
Your false claim was that C "pretends" to be a safe language.
People keep jumping to conclusions without asking for clarification.
This is what I said (quite a few posts back):
C pretends to be a safe language by saying all those naughty things are UB and should be avoided, at the same time, C compilers can be made
to do all that.
(I see now you quoted this yourself; I could have saved some time!)
The assumption made here is that unsafe-ness arises in C from UB. Then I suggest that, while the language itself washes its hands of it, it lets
the compiler do the dirty work (as well as pushing the responsibility to
the user, by allowing the compiler to do something that is UB).
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 2026-05-10 16:15, Kalevi Kolttonen wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
If uint8_t exists, CHAR_BIT must be 8, and unsigned char must therefore
meet the requirements to be the type that uint8_t is a typedef for.
However, the standard doesn't mandate it. If, for example, a machine
supported two different 8-bit types, with the order of the bits from low
to high reversed between them, uint8_t could be one of those types, and
unsigned char could be the other - the C standard imposes no
requirements that would be broken by that choice.
This is not something you're likely to ever see, just a possibility
allowed by the standard that we're extremely unlikely to see.
I see, thanks. So from a practical point of view today, they
appear pretty identical.
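For anyone who wants to check their own implementation, here is a small sketch assuming C11 `_Generic`. The macro and function names are invented for this example. It reports whether uint8_t is a typedef for unsigned char on the compiler at hand; the standard, as described above, does not promise either answer.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* uint8_t */

/* If uint8_t exists it has no padding bits and CHAR_BIT is 8, so it occupies one byte. */
static_assert(sizeof(uint8_t) == 1, "uint8_t must occupy exactly one byte");

/* Evaluates to 1 when the argument's type is exactly unsigned char, else 0. */
#define IS_UNSIGNED_CHAR(x) _Generic((x), unsigned char: 1, default: 0)

/* Hypothetical helper: is uint8_t the same type as unsigned char here? */
static int uint8_is_unsigned_char(void)
{
    return IS_UNSIGNED_CHAR((uint8_t)0);
}
```

On mainstream compilers this is expected to return 1, but that is an observation about those compilers rather than a guarantee.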
On 11/05/2026 00:11, Keith Thompson wrote:
[snip]
People keep jumping to conclusions without asking for clarification.
This is what I said (quite a few posts back):
C pretends to be a safe language by saying all those naughty things are UB and should be avoided, at the same time, C compilers can be made
to do all that.
(I see now you quoted this yourself; I could have saved some time!)
The assumption made here is that unsafe-ness arises in C from UB. Then I
suggest that, while the language itself washes its hands of it, it lets
the compiler do the dirty work (as well as pushing the responsibility to
the user, by allowing the compiler to do something that is UB).
In a later follow-up to you I ask:
So, C can be unsafe even when you avoid all UB? Examples?
And yet later I ask for clarification for what it means to be 'unsafe'
and gave some examples of my own. I don't recall that being answered.
Do you still falsely claim that C pretends
to be safe?
"When you avoid all UB". You keep forgetting this bit.
Well, first tell me what it means for a language to be 'unsafe'. That
term has not been defined. Is it only what happens when UB is invoked,
or can it be at any time?
If you think I was wrong, then you can politely suggest that and offer
some enlightenment. Why become aggressive and give me the third degree?
Sometimes I feel like I'm in the dock.
So, reading between the lines, you seem to be suggesting that C /can/ be
an unsafe language (whatever that means) whether or not UB is involved.
Do you acknowledge that you were wrong? Was it a
deliberate exaggeration? Was it a deliberate lie?
Please stop this. If you don't agree with what I said, then post a
counter-argument.
You should also look at the context: I was explaining the various
underhand, 'unsafe' things that are possible in C, which give it an edge
over competitors for systems work, then I suggest that many of those are
likely to be UB so not officially sanctioned.
Phew! (Mopping sweaty brow with a handkerchief.)
The compiler doesn't "do something that is UB." The compiler
detects that something in a program is undefined behavior and
does something as a result (that "something" may be nothing).
[snip]
I think that your formulation "allowing the compiler to do
something that is UB" is quite misleading. Standard says
that some things are UB. If UB appears in a program, it
is programmer who put it there.
Essential part of UB is
that it is programmer responsibility to avoid UB.
Specific compiler may be helpful by detecting UB or
defining some useful behaviour, but in general compiler
is allowed to proceed blindly, trusting that there is
no UB in the source.
On 11/05/2026 00:11, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
I've glanced at appendix K.1 and saw nothing relevant there. It's
about exceeding array bounds.
Your false claim was that C "pretends" to be a safe language.
People keep jumping to conclusions without asking for clarification.
This is what I said (quite a few posts back):
C pretends to be a safe language by saying all those naughty things are UB and should be avoided, at the same time, C compilers can be made
to do all that.[...]
Bart <bc@freeuk.com> writes:
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
C doesn't have a concept of 'module' per se. Perhaps you're looking
for "translation unit"?
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial. For many reasons, including reusability,
maintainability and collaboration. There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
The context was why C became the dominant language for systems
programming. I offered that as an example. If it helped C over a
potential rival which wasn't used to implement a major OS, then it
strikes me as an unfair advantage.
Suppose Unix was implemented in some other language, then if C was still
more successful over rivals, that would have been fairer.
Fair? What is your definition of "fair" with respect to programming languages?
In article <86mry8so39.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[sic: as a sequence, the Fibonacci numbers are undefined for
$n<0$, but this is a pedagogical example, so let's ignore that]
A comment on that further down...
[snip]
(Origin 0, not 1.)
fibonacci(0) is 0. There is no other.
You are correct, and I was incorrect stating that Fib(n) is
undefined for n<0. [...]
[snip]
Here is my current favorite fast fibonacci function (which happens
to be written in a functional and tail-recursive style):
typedef unsigned long long ULL; /* assumed definition; not shown in the post */
static ULL ff( ULL, ULL, unsigned, unsigned );
static unsigned lone( unsigned );
ULL
ffibonacci( unsigned n ){
return ff( 1, 0, lone( n ), n );
}
ULL
ff( ULL a, ULL b, unsigned m, unsigned n ){
ULL c = a+b;
return
m & n ? ff( (a+c)*b, b*b+c*c, m>>1, n ) :
m ? ff( a*a+b*b, (a+c)*b, m>>1, n ) :
/*****/ b;
}
unsigned
lone( unsigned n ){
return n |= n>>1, n |= n>>2, n |= n>>4, n ^ n>>1;
}
Much faster than the linear version.
Very nice. 64-bit `unsigned long long` overflows for n>93, so I
question how much it matters in practice, though; surely if
calling this frequently you simply cache it in some kind of
table?
I wondered how this compared to Binet's Formula, using floating
point:
```
#include <math.h>   /* sqrtl, powl, llroundl */

unsigned long long
binet_fib(unsigned int n)
{
const long double sqrt5 = sqrtl(5.);
long double fn =
(powl(1. + sqrt5, n) - powl(1. - sqrt5, n)) /
(powl(2., n) * sqrt5);
return llroundl(fn);
}
```
Sadly, my quick test suggests accuracy suffers (presumably due
to floating point) for the larger representable values in the
sequence; specifically, n>90. As a result I didn't bother
attempting to benchmark it.
Dan Cross <cross@spitfire.i.gajendra.net> wrote:
In article <86mry8so39.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
<snip>
Here is my current favorite fast fibonacci function (which happens
to be written in a functional and tail-recursive style):
static ULL ff( ULL, ULL, unsigned, unsigned );
static unsigned lone( unsigned );
ULL
ffibonacci( unsigned n ){
return ff( 1, 0, lone( n ), n );
}
ULL
ff( ULL a, ULL b, unsigned m, unsigned n ){
ULL c = a+b;
return
m & n ? ff( (a+c)*b, b*b+c*c, m>>1, n ) :
m ? ff( a*a+b*b, (a+c)*b, m>>1, n ) :
/*****/ b;
}
unsigned
lone( unsigned n ){
return n |= n>>1, n |= n>>2, n |= n>>4, n ^ n>>1;
}
Much faster than the linear version.
Very nice. 64-bit `unsigned long long` overflows for n>93, so I
question how much it matters in practice, though; surely if
calling this frequently you simply cache it in some kind of
table?
I wondered how this compared to Binet's Formula, using floating
point:
```
#include <math.h>   /* sqrtl, powl, llroundl */

unsigned long long
binet_fib(unsigned int n)
{
const long double sqrt5 = sqrtl(5.);
long double fn =
(powl(1. + sqrt5, n) - powl(1. - sqrt5, n)) /
(powl(2., n) * sqrt5);
return llroundl(fn);
}
```
Sadly, my quick test suggests accuracy suffers (presumably due
to floating point) for the larger representable values in the
sequence; specifically, n>90. As a result I didn't bother
attempting to benchmark it.
Fast versions of Fibonacci depend on fast computation of a matrix
power (of a two by two matrix). One way to get a fast matrix power
is to diagonalize and use floating point (which is essentially
what Binet's Formula does), but as you noted this needs extra
precision. Tim's version looks like a somewhat obscure variant
of fast matrix powering.
This has the advantage of doing all computations on integers.
Of course, to make sense this must use increased precision,
preferably arbitrary precision arithmetic.
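To make the integer route concrete, here is a minimal sketch using the well-known fast-doubling identities, which amount to repeated squaring of the two by two Fibonacci matrix. It is not Tim's code, and the name `fib_doubling` is made up. It works in plain `unsigned long long`, so assuming a 64-bit type the result is exact only for n <= 93 and wraps modulo 2^64 beyond that.

```c
#include <assert.h>

/* Fast doubling:
     F(2k)   = F(k) * (2*F(k+1) - F(k))
     F(2k+1) = F(k)^2 + F(k+1)^2
   The bits of n are consumed from the most significant end. */
static unsigned long long fib_doubling(unsigned n)
{
    unsigned long long a = 0, b = 1;   /* (F(k), F(k+1)) with k = 0 */
    unsigned bit = ~(~0u >> 1);        /* highest value bit of unsigned */

    for (; bit != 0; bit >>= 1) {
        unsigned long long c = a * (2 * b - a);   /* F(2k)   */
        unsigned long long d = a * a + b * b;     /* F(2k+1) */
        if (n & bit) {
            a = d;        /* k becomes 2k+1 */
            b = c + d;
        } else {
            a = c;        /* k becomes 2k */
            b = d;
        }
    }
    return a;
}

/* Small usage check against values that are easy to verify by hand. */
static void fib_doubling_selfcheck(void)
{
    assert(fib_doubling(0) == 0);
    assert(fib_doubling(10) == 55);
}
```

Because only addition, subtraction and multiplication are used, intermediate wrap-around in unsigned arithmetic does not disturb the low 64 bits of the answer. Swapping the type for an arbitrary precision integer would lift the n <= 93 limit.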
[ C's characteristics ]
To allow one to shoot themselves in the foot! Both feet. ;^)
[...]
Do people understand mine? 90% of my posts are about defending my
position especially when attacked on multiple fronts.
I can say something and immediately I get attacked and accused of not knowing this or that, by people who get the wrong end of the stick or
pick up on a choice of word I used.
The context was why C became the dominant language for systems
programming. I offered that as an example. If it helped C over a
potential rival which wasn't used to implement a major OS, then it
strikes me as an unfair advantage.
Keith already said that it was an advantage. Insisting on an "unfair"
qualification is inappropriate, especially without ethical measure
and without any substantial evidence. (That wording reminds me of the
wording in the communication style of the current POTUS.)
Hmm, weren't Microsoft accused of unfair practice by bundling their
browsers with Windows?
In article <10tp4o8$1l93k$7@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-09 03:36, Dan Cross wrote:
Maybe, maybe not, depending on the exact hashing function and
the values it uses. Since K&R2 came up elsewhere, consider the
hash function they presented on pp 128-129:
(I don't have that version available so the reference doesn't
help me much.)
I mean, I gave you the function; you quoted it. :-)
[...]
On 2026-05-08 06:43, David Brown wrote:
...
Yes, I have heard that argument before. I am unconvinced that the
"value preserving" choice actually has any real advantages. I also
think it is a misnomer - it implies that "unsigned preserving" would
not preserve values, which is wrong.
Unsigned-preserving rules would convert a signed value which might be negative to unsigned type more frequently than the value preserving
rules do. Such a conversion is not value-preserving.
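A small sketch of the difference, assuming a platform where int can represent every unsigned char value (checked with a static assertion). The function names are hypothetical. The second function imitates what unsigned-preserving rules would do by converting the signed operand explicitly.

```c
#include <assert.h>   /* static_assert, assert */
#include <limits.h>   /* UCHAR_MAX, INT_MAX */

/* Assumption: unsigned char promotes to int, as on mainstream platforms. */
static_assert(UCHAR_MAX <= INT_MAX, "unsigned char must promote to int");

/* Value-preserving (what C does): c promotes to int, i keeps its value. */
static int narrow_unsigned_exceeds(unsigned char c, int i)
{
    return c > i;
}

/* Imitation of unsigned-preserving: the signed operand is converted to
   unsigned int, so a negative i becomes a very large value. */
static int unsigned_preserving_exceeds(unsigned char c, int i)
{
    return (unsigned int)c > (unsigned int)i;
}

/* Usage check: 1 > -1 holds only when the negative value is preserved. */
static void promotion_selfcheck(void)
{
    assert(narrow_unsigned_exceeds(1, -1) == 1);
    assert(unsigned_preserving_exceeds(1, -1) == 0);
}
```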
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 2026-05-10 16:15, Kalevi Kolttonen wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
If uint8_t exists, CHAR_BIT must be 8, and unsigned char must therefore
meet the requirements to be the type that uint8_t is a typedef for.
However, the standard doesn't mandate it. If, for example, a machine
supported two different 8-bit types, with the order of the bits from low
to high reversed between them, uint8_t could be one of those types, and
unsigned char could be the other - the C standard imposes no
requirements that would be broken by that choice.
This is not something you're likely to ever see, just a possibility
allowed by the standard that we're extremely unlikely to see.
I see, thanks. So from a practical point of view today, they
appear pretty identical.
On 11/05/2026 00:11, Keith Thompson wrote:
In a later follow-up to you I ask:
So, C can be unsafe even when you avoid all UB? Examples?
And yet later I ask for clarification for what it means to be 'unsafe'
and gave some examples of my own. I don't recall that being answered.
On 10/05/2026 15:58, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 09/05/2026 17:38, David Brown wrote:
On 09/05/2026 18:16, Bart wrote:
(You can also see from this that /nobody/ likes stdint.h types, even
though standardised from C99 which also introduced 'long long' used
here. That is another bugbear. Oh, I forgot, my criticism is not
valid.)
Don't you realise that when you write things like that, you are only
demonstrating why so many people do not take you seriously? Have you
checked with every C programmer, and every person writing systems that
generate C code, and checked that none of them like the <stdint.h>
types? No? I thought not.
So, what's the figure?
One doesn't understand your question. Is 'figure' some britishism
in this context? Or do you expect David to provide an accurate
percentage describing the preferences of every C programmer on
the planet (or in orbit, if any of the current station occupants
can program in C :-).
Personally, for my working code, the stdint types are used
extensively.
You know, I could well be right, and nobody does like them, apart of
course from people here. Instead they could just be tolerated.
I doubt whether they are loved, otherwise we'd see those _t suffixes in other languages too because they look so good.
Perhaps try asking why somebody would invent a new type name for uint8_t
at all.
Strawman. Please provide examples of "somebody inventing a new type name
for uint8_t" (post standardization). One swallow doesn't make a
summer, so a single example
from some obscure project you found on the WWW isn't particularly
instructive.
You invite people to give examples, but then immediately qualify that by putting restrictions on quantity and popularity so that they can never win!
For other people's benefit:
  typedef uint8_t byte;
(From: https://github.com/arduino/ArduinoCore-avr/blob/master/cores/arduino/Arduino.h)
  typedef int64_t mz_int64;
(From a compression library called "miniz")
  typedef uint32_t Uint32;
(From SDL2 header files)
On 2026-05-10 14:37, Dan Cross wrote:
In article <10tp4o8$1l93k$7@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-09 03:36, Dan Cross wrote:
Maybe, maybe not, depending on the exact hashing function and
the values it uses. Since K&R2 came up elsewhere, consider the
hash function they presented on pp 128-129:
(I don't have that version available so the reference doesn't
help me much.)
I mean, I gave you the function; you quoted it. :-)
Erm, no. I referred to something from an earlier K&R release.
The algorithm was different from the one you posted,
and the
modulus was also different; using 100 (2*2*5*5) vs. 101 (this
is a prime) makes a difference. - It seems the newer K&R that
you were referring to used a better modulus than the old book.
Never mind.
Bart <bc@freeuk.com> wrote:
On 11/05/2026 00:11, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
I've glanced at appendix K.1 and saw nothing relevant there. It's
about exceeding array bounds.
Your false claim was that C "pretends" to be a safe language.
People keep jumping to conclusions without asking for clarification.
This is what I said (quite a few posts back):
C pretends to be a safe language by saying all those naughty things are UB and should be avoided, at the same time, C compilers can be made
to do all that.
(I see now you quoted this yourself; I could have saved some time!)
The assumption made here is that unsafe-ness arises in C from UB. Then I
suggest that, while the language itself washes its hands of it, it lets
the compiler do the dirty work (as well as pushing the responsibility to
the user, by allowing the compiler to do something that is UB).
You are seriously confused by what other people consider as
"safe language".
First, I do not think it is possible to
give a satisfactory definition of safety; one either gets the idea
or not. One popular attempt at a definition is that a language
is safe if no untrapped errors are possible. Of course, this
definition has trouble because then one needs to say what
an error is. A reasonable definition could be that there is an
error if the program is doing a different thing than intended by
its creator. But as you noted there are errors that
a language implementation cannot reasonably detect, so clearly
the attempt above plus this definition of error is not satisfactory.
So we need to restrict what we consider to be an error.
When talking about language safety a possible (and popular)
approach is to restrict errors to things that break language
rules, like using out-of-bounds array indices or overflow
in C signed arithmetic.
Now, if you look at UB, UB in particular means that
implementation is not obliged to detect errors. So
UB in language definition means that language is more
or less unsafe.
I think that your formulation "allowing the compiler to do
something that is UB" is quite misleading. Standard says
that some things are UB. If UB appears in a program, it
is programmer who put it there. Essential part of UB is
that it is programmer responsibility to avoid UB.
Specific compiler may be helpful by detecting UB or
defining some useful behaviour, but in general compiler
is allowed to proceed blindly, trusting that there is
no UB in the source.
Coming back to safety, defining errors as violations of
language rules is not fully satisfactory either. Namely,
using a language that "allows anything", like assembler,
there will be no violation of language rules, but clearly
such a language does not help in detecting errors. So
to meaningfully talk about language safety there must
be rules such that some classes of error lead to
a violation of a rule, and the violation must be detected. C
has type rules, and violations of type rules will
detect some errors at compile time. But by design C
does not require any error detection at runtime, so
it clearly is unsafe.
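As an illustration of what "detection at runtime" would have to look like when the programmer supplies it by hand, here is a minimal sketch of a bounds-checked array read. The name `checked_get` and its interface are invented for this example; nothing in the C standard provides or requires such a check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked accessor: returns 1 and stores arr[idx] in *out
   when idx is within bounds, otherwise returns 0 and leaves *out alone. */
static int checked_get(const int *arr, size_t len, size_t idx, int *out)
{
    if (arr == NULL || out == NULL || idx >= len)
        return 0;            /* the rule violation is detected, not undefined */
    *out = arr[idx];
    return 1;
}

/* Usage check: an in-range read succeeds, an out-of-range read is refused. */
static void checked_get_selfcheck(void)
{
    int data[3] = { 10, 20, 30 };
    int v = 0;
    assert(checked_get(data, 3, 1, &v) == 1 && v == 20);
    assert(checked_get(data, 3, 3, &v) == 0 && v == 20);
}
```

A plain `data[3]` in the same place would be undefined behaviour, and the implementation is under no obligation to notice.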
Now, unqualified "safe" is really a fuzzy concept, as
there is no hope of detecting all errors and while
detecting some errors is theoretically possible
cost of checking could be prohibitive. So basically
"safe" boils down to "due diligence": language rules
forbid things that are recognized as likely to be
errors and language uses state of the art methods
to detect or prevent violations of the rules.
Let me add that basically from the time when Pascal
was invented it was known how to define a language
rich enough to do most real world tasks, having rules
which eliminate a substantial fraction of errors and
where _all_ violations of language rules are detected.
So languages that allow undetected violations of rules
are considered more or less unsafe.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
The compiler doesn't "do something that is UB." The compiler
detects that something in a program is undefined behavior and
does something as a result (that "something" may be nothing).
Or the compiler *doesn't* detect that something in the program
has undefined behavior, but assumes that the behavior is defined,
and generates code consistent with that assumption. A big part of
the rationale behind "undefined behavior" is that compilers don't
have to detect it.
An example:
#include <stdio.h>
#include <time.h>
#include <limits.h>
int main(void) {
int n = time(NULL) > 0 ? INT_MAX : 0;
printf("n=%d, n+1=%d, ", n, n+1);
printf("%d %s %d\n",
n+1,
n+1 > n ? ">" : n+1 == n ? "==" : "<",
n);
}
With different compilers and optimization settings, I get any of the following outputs on my system:
n=2147483647, n+1=1, 1 > 2147483647
n=2147483647, n+1=-2147483648, -2147483648 < 2147483647
n=2147483647, n+1=-2147483648, -2147483648 > 2147483647
I'm fairly sure that none of the compilers detect that there will
be undefined behavior at run time. The fact that time(NULL) is
greater than 0 is not something I'd expect a compiler to assume.
(That's why I added that to the program.) Rather, some compilers
assume that the behavior is defined, and therefore that n + 1 must
be greater than n.
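For contrast, here is a sketch of how the addition can be written so that no overflow is ever evaluated. The range test happens before the add, so the behaviour is defined for every input. The name `checked_add` is hypothetical, and this is only one of several possible approaches.

```c
#include <assert.h>
#include <limits.h>   /* INT_MAX, INT_MIN */

/* Hypothetical helper: stores a + b in *sum and returns 1 when the result
   fits in int; returns 0 and leaves *sum untouched when it would overflow. */
static int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}

/* Usage check: INT_MAX + 1 is rejected instead of being computed. */
static void checked_add_selfcheck(void)
{
    int s = 0;
    assert(checked_add(INT_MAX, 1, &s) == 0);
    assert(checked_add(1, 2, &s) == 1 && s == 3);
}
```

With this in place of `n+1`, a compiler has no room to assume anything about the overflowing case, because that case never reaches the addition.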
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
C doesn't have a concept of 'module' per se. Perhaps you're looking
for "translation unit"?
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial. For many reasons, including reusability,
maintainability and collaboration. There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
I wonder if his system has pre-compiled header support.
On 2026-05-10 13:44, Bart wrote:
(How likely do you think is it that it's the fault of the hostile
environment and your personality and communication or the level of
expertise has nothing to do with it?)
On 11/05/2026 05:28, Chris M. Thomasson wrote:
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
[snip]
An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial. For many reasons, including reusability,
maintainability and collaboration. There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
I wonder if his system has pre-compiled header support.
SL is talking nonsense.
Because sometimes I use tools that transpile whole programs of dozens of
modules into a single C source, for the purpose of compiling into an
executable (another single file!), he thinks I advocate writing and
developing projects in such a single file too!
Nobody has a problem with distributing an EXE file as one monolithic file.
But if EXEs are a problem, due to AV, or to mistrust, then the next step
back might be some textual format that could be ASM, IR, or C. Then the
end-user can apply that final step themselves.
I've used both ASM and C, but the latter is preferable as local
optimisations can be applied.
That is not however the original source. Scott Lurndal cannot grasp this
when that file happens to be 'C'.
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
And operations on unsigned char are well defined,
including wrap-around. So I fail to see any
difference between unsigned char and uint8_t.
kalevi@kolttonen.fi (Kalevi Kolttonen) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
On at least one system with a working C compiler,
a byte is 9 bits, not 8. If I wanted an 8-bit datum
on that system, I'd have to use uint8_t.
(Now, I haven't used that system in decades, but it
still exists and powers a large fraction of the
world's airline reservation and operational functions).
And operations on unsigned char are well defined,
including wrap-around. So I fail to see any
difference between unsigned char and uint8_t.
Indeed. Although from my perspective, the use of the
stdint types clearly documents the programmer's
intent, whereas a typedef such as BYTE or WORD
is inherently ambiguous and would require a programmer
to look up the definition of such types in the
application to determine the original programmer's intent.
On 2026-05-10 20:10, Keith Thompson wrote:
...
Bart, you claimed here that literally *nobody* likes stdint.h types.
I like stdint.h types.
Me too.
[... ]Rust, which /people/ sometimes claim will give you
bug-free programs once you managed to get it to compile, [...]
To get back to C and UB, if that 'safe' line isn't on the boundary
between non-UB and UB, then what does the boundary mean? Is it just
deterministic vs. non-deterministic behaviour?
This is back to the other topic as to what makes a practical systems
language.
On 11/05/2026 05:28, Chris M. Thomasson wrote:
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
C doesn't have a concept of 'module' per se. Perhaps you're looking
for "translation unit"?
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial. For many reasons, including reusability,
maintainability and collaboration. There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
I wonder if his system has pre-compiled header support.
SL is talking nonsense.
Because sometimes I use tools that transpile whole programs of dozens of
modules into a single C source, for the purpose of compiling into an
executable (another single file!), he thinks I advocate writing and
developing projects in such a single file too!
Nobody has a problem with distributing an EXE file as one monolithic file.
In article <10tsfvd$11qhe$4@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 05:28, Chris M. Thomasson wrote:
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
[snip]
An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial. For many reasons, including reusability,
maintainability and collaboration. There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
I wonder if his system has pre-compiled header support.
SL is talking nonsense.
No, he's really not.
Because sometimes I use tools that transpile whole programs of dozens of
modules into a single C source, for the purpose of compiling into an
executable (another single file!), he thinks I advocate writing and
developing projects in such a single file too!
You are the one making a big deal out of the fact that whole
programs are in single source files.
Executable object files (to use the ELF terminology) are a
completely different matter.
Nobody has a problem with distributing an EXE file as one monolithic file.
Actually, many do.
If you are only concerned with a single (as you called it)
"monolithic" "EXE" file, then yeah, it's tautologically true
that that is a single file.
That is not however the original source. Scott Lurndal cannot grasp this
when that file happens to be 'C'.
Sure he can. SQLite does that. It's a well-known technique.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
You are moving the goalposts because you were using your own
terminology and got pushbacks, and you seem constitutionally
incapable of accepting when people tell you what you wrote is
ambiguous or incorrect.
scott@slp53.sl.home (Scott Lurndal) writes:
kalevi@kolttonen.fi (Kalevi Kolttonen) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
On at least one system with a working C compiler,
a byte is 9 bits, not 8. If I wanted an 8-bit datum
on that system, I'd have to use uint8_t.
If a byte is 9 bits (ie, if CHAR_BIT == 9) there cannot
be a uint8_t type. The fixed-width types are not allowed
to have padding bits.
Indeed. Although from my perspective, the use of the
stdint types clearly documents the programmer's
intent, whereas a typedef such as BYTE or WORD
is inherently ambiguous and would require a programmer
to look up the definition of such types in the
application to determine the original programmer's intent.
BYTE and WORD are poor choices for type names, no doubt
about that. On the other hand, in many or most cases
so are [u]intNN_t; they simultaneously convey both too
little and too much information. There is a certain kind
of programming where the fixed-width types are genuinely
helpful; unfortunately though they are used a lot more
widely than circumstances where they are helpful.
Bart <bc@freeuk.com> writes:
On 11/05/2026 05:28, Chris M. Thomasson wrote:
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 09/05/2026 23:47, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Now look at what's involved in splitting a C module into two.
C doesn't have "modules".
You want to be /that/ pedantic?
This is exactly why I said that the C standard is your thing. If
somebody uses a term that doesn't appear in the standard, then it
doesn't exist.
C doesn't have a concept of 'module' per se.  Perhaps you're looking
for "translation unit"?
So, what is involved in splitting a ... I don't even know what to call
it - a single .c 'source file'? Well, a lot of messy work.
An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial.  For many reasons, including reusability,
maintainability and collaboration.  There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit.  It sounds like you've never actually worked
with either a team, or a non-trivial application.
I wonder if his system has pre-compiled header support.
SL is talking nonsense.
Really.
Because sometimes I use tools that transpile whole programs of dozens of
modules into a single C source, for the purpose of compiling into an
executable (another single file!), he thinks I advocate writing and
developing projects in such a single file too!
Nobody has a problem with distributing an EXE file as one monolithic file.
Actually, many (if not most) of us distribute applications. The application my CPOE ships includes a fairly small ELF (7MB text) executable, more than fifty shared objects (DLL in windows terminology), manual pages (nroff), several small stand-alone utilities and other collateral.
A single ELF executable is very seldom shipped stand-alone in the real world.
On 10/05/2026 16:47, Bart wrote:[...]
This I just discovered by chance. It's a small Reddit language project
which here transpiles to C:
        self.emit("// Core types");
        self.emit("typedef int64_t Int;");
        self.emit("typedef int8_t Int8;");
        self.emit("typedef int16_t Int16;");
        self.emit("typedef int32_t Int32;");
        self.emit("typedef int64_t Int64;");
        self.emit("typedef uint64_t UInt;");
        self.emit("typedef uint8_t UInt8;");
        ...
You'd think that if transpiling to C anyway, they can tolerate using "int64_t" in the generated C. But apparently not.
I don't blame them; I do the same:
typedef signed char             i8;
typedef short                   i16;
typedef int                     i32;
typedef long long int           i64;
typedef unsigned char           u8;
typedef unsigned short          u16;
typedef unsigned int            u32;
typedef unsigned long long int  u64;
In this case however I don't use any standard headers.
[...]
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
On 2026-05-10 20:10, Keith Thompson wrote:
...
Bart, you claimed here that literally *nobody* likes stdint.h types.
I like stdint.h types.
Me too.
Me three.
Although "like" and "dislike" are emotions, not logic. Those
types are part of the language, and they should be used when appropriate.
Bonita Montero <Bonita.Montero@gmail.com> wrote:
Yes, that's a major problem with all 64 bit Unices. Use Windows with
that. On Windows long and int have the same size.
I have used Linux since the summer of 1998 and would never
ever even consider installing Windows. It is so disgusting.
Clearly he actually thinks I'm advocating using a single source file for
any kind of project, for actual development rather than a distribution medium. Either that or he's deliberately spewing misinformation.
On 11/05/2026 02:31, James Kuyper wrote:
On 2026-05-08 06:43, David Brown wrote:
...
Yes, I have heard that argument before. I am unconvinced that the
"value preserving" choice actually has any real advantages. I also
think it is a misnomer - it implies that "unsigned preserving" would
not preserve values, which is wrong.
Unsigned-preserving rules would convert a signed value which might be
negative to unsigned type more frequently than the value preserving
rules do. Such a conversion is not value-preserving.
If you have a signed value, you have a signed type. Unsigned-preserving
rules are also signed-preserving - smaller unsigned types promote to
bigger unsigned types, while smaller signed types promote to bigger
signed types. I don't think anyone ever suggested smaller signed types
should promote to larger unsigned types.
Perhaps I am being bone-headed here and missing something obvious.
(Given that the C committee put in a lot of effort and came to a
different conclusion, it seems very likely that I'm missing something.)
Unsigned-preserving promotions would, AFAICS, preserve value and
signedness :
unsigned short -> unsigned int
signed short -> signed int
Value-preserving promotions would preserve values too :
unsigned short -> signed int
signed short -> signed int
The unsigned-preserving promotions could also safely be applied even if
short is the same size as int - that is not the case for the "always
promote to signed int" rules.
This I just discovered by chance. It's a small Reddit language project[...]
which here transpiles to C:
self.emit("// Core types");
self.emit("typedef int64_t Int;");
self.emit("typedef int8_t Int8;");
self.emit("typedef int16_t Int16;");
self.emit("typedef int32_t Int32;");
self.emit("typedef int64_t Int64;");
self.emit("typedef uint64_t UInt;");
self.emit("typedef uint8_t UInt8;");
...
You'd think that if transpiling to C anyway, they can tolerate using
"int64_t" in the generated C. But apparently not.
On 11/05/2026 03:48, Keith Thompson wrote:[...]
With different compilers and optimization settings, I get any of the
following outputs on my system:
n=2147483647, n+1=1, 1 > 2147483647
n=2147483647, n+1=-2147483648, -2147483648 < 2147483647
n=2147483647, n+1=-2147483648, -2147483648 > 2147483647
I'm fairly sure that none of the compilers detect that there will
be undefined behavior at run time. The fact that time(NULL) is
greater than 0 is not something I'd expect a compiler to assume.
(That's why I added that to the program.) Rather, some compilers
assume that the behavior is defined, and therefore that n + 1 must
be greater than n.
I expected an output that looks like that middle line, which is the
most intuitive if you accept that integers have a limited capacity and
will wrap, when represented as 32-bit two's complement.
Scott said this:
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
Clearly he actually thinks I'm advocating using a single source file
for any kind of project, for actual development rather than a
distribution medium. Either that or he's deliberately spewing
misinformation.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
kalevi@kolttonen.fi (Kalevi Kolttonen) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
On at least one system with a working C compiler,
a byte is 9 bits, not 8. If I wanted an 8-bit datum
on that system, I'd have to use uint8_t.
If a byte is 9 bits (ie, if CHAR_BIT == 9) there cannot
be a uint8_t type. The fixed-width types are not allowed
to have padding bits.
That was a 36-bit system. It could easily create a
uint8_t value from 1/9th of two 72-bit words;
so no padding bits required.
Indeed. Although from my perspective, the use of the
stdint types clearly documents the programmer's
intent, whereas a typedef such as BYTE or WORD
is inherently ambiguous and would require a programmer
to look up the definition of such types in the
application to determine the original programmer's intent.
BYTE and WORD are poor choices for type names, no doubt
about that. On the other hand, in many or most cases
so are [u]intNN_t; they simultaneously convey both too
little and too much information. There is a certain kind
of programming where the fixed-width types are genuinely
helpful; unfortunately though they are used a lot more
widely than circumstances where they are helpful.
The programming I do
(mainly kernel programming, SoC simulation,
firmware) all naturally require the fixed-width types.
For other apps, int, long, float, double are preferred
to INT, LONG, FLOAT, DOUBLE (which seems to be the
way windows programmers code)[*]
[*] which probably dates back to 16-bit windows
and their methods of maintaining backward compatibility
across two subsequent (32, 64) x86 processor architectures
plus MIPS et alia.
On at least one system with a working C compiler,[...]
a byte is 9 bits, not 8. If I wanted an 8-bit datum
on that system, I'd have to use uint8_t.
(Now, I haven't used that system in decades, but it
still exists and powers a large fraction of the
world's airline reservation and operational functions).
Bart <bc@freeuk.com> writes:
[...]
This I just discovered by chance. It's a small Reddit language project[...]
which here transpiles to C:
self.emit("// Core types");
self.emit("typedef int64_t Int;");
self.emit("typedef int8_t Int8;");
self.emit("typedef int16_t Int16;");
self.emit("typedef int32_t Int32;");
self.emit("typedef int64_t Int64;");
self.emit("typedef uint64_t UInt;");
self.emit("typedef uint8_t UInt8;");
...
You'd think that if transpiling to C anyway, they can tolerate using
"int64_t" in the generated C. But apparently not.
So what?
Some people like the <stdint.h> types. Some people don't. Everyone
here knows that. Showing us yet another example of someone renaming
them proves nothing.
What is your point?
BYTE and WORD are poor choices for type names, no doubt[...]
about that.
Bart <bc@freeuk.com> writes:
On 11/05/2026 03:48, Keith Thompson wrote:[...]
With different compilers and optimization settings, I get any of the
following outputs on my system:
n=2147483647, n+1=1, 1 > 2147483647
n=2147483647, n+1=-2147483648, -2147483648 < 2147483647
n=2147483647, n+1=-2147483648, -2147483648 > 2147483647
I'm fairly sure that none of the compilers detect that there will
be undefined behavior at run time. The fact that time(NULL) is
greater than 0 is not something I'd expect a compiler to assume.
(That's why I added that to the program.) Rather, some compilers
assume that the behavior is defined, and therefore that n + 1 must
be greater than n.
I expected an output that looks like that middle line, which is the
most intuitive if you accept that integers have a limited capacity and
will wrap, when represented as 32-bit two's complement.
The program has undefined behavior.
The situations they were thinking about were things like this:
unsigned short a = 8;
int b = -5;
long c = a * b;
With value-preserving semantics, `c` is 40.
On the other hand,
with unsigned-preserving semantics, assuming a 64-bit `long` and
32-bit `int`, `c` is 4294967256; logical enough, but one could
see how that might be surprising for someone unfamiliar with the
language.
On 11/05/2026 19:12, Keith Thompson wrote:[...]
The program has undefined behavior.
Even when -fwrapv is applied?
Bart <bc@freeuk.com> writes:
[...]
Scott said this:
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
Clearly he actually thinks I'm advocating using a single source file
for any kind of project, for actual development rather than a
distribution medium. Either that or he's deliberately spewing
misinformation.
Apparently Scott misunderstood something you wrote. That's not
at all surprising. You could have calmly and briefly corrected
Scott's error rather than arguing about it at great length.
Something like "No, I don't advocate using a single source file
for actual development" would have been more than sufficient.
No, he's really not.
You are the one making a big deal out of the fact that whole
programs are in single source files.
Sure he can. SQLite does that. It's a well-known technique.
You are moving the goalposts because you were using your own
terminology and got pushbacks, and you seem constitutionally
incapable of accepting when people tell you what you wrote is
ambiguous or incorrect.
On 11/05/2026 14:54, Dan Cross wrote:
In article <10tsfvd$11qhe$4@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 05:28, Chris M. Thomasson wrote:
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
[snip]
An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial.  For many reasons, including reusability,
maintainability and collaboration.  There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit.  It sounds like you've never actually worked
with either a team, or a non-trivial application.
I wonder if his system has pre-compiled header support.
SL is talking nonsense.
No, he's really not.
Because sometimes I use tools that transpile whole programs of dozens of
modules into a single C source, for the purpose of compiling into an
executable (another single file!), he thinks I advocate writing and
developing projects in such a single file too!
You are the one making a big deal out of the fact that whole
programs are in single source files.
Only for special purposes such as for distribution or as intermediate files.
But when such a file happens to be C source code, people here seem to
get confused, and think my original program source actually exists as a
single 80,000-line module.
Executable object files (to use the ELF terminology) are a
completely different matter.
Nobody has a problem with distributing an EXE file as one monolithic file.
Actually, many do.
If you are only concerned with a single (as you called it)
"monolithic" "EXE" file, then yeah, it's tautologically true
that that is a single file.
That's not what I mean by monolithic.
A complete application will consist of one or more EXEs, and each may
dynamically link to DLLs, either external libraries or also part of the
application.
I'm talking about a single EXE or DLL file, which is created by
compiling dozens or hundreds of individual source files.
Suppose, for some reason, a prebuilt binary isn't practical, what is the
alternative? Supply original source code which is, say 100 modules?
Then you get the nightmarish build systems you associate with C and
especially Linux.
Why can't the original source be reduced down to one monolithic file?
Advantages:
* You only need supply one file 'prog.c'; not sprawling directories
* The build process then is nearly as simple as compiling hello.c
* A compiler can also do whole-program optimisations
* Where the original source is in an obscure language, people don't need a compiler for that language (another EXE) and can use one they have and trust
That is not however the original source. Scott Lurndal cannot grasp this
when that file happens to be 'C'.
Sure he can. SQLite does that. It's a well-known technique.
SQLite3 is about 100 different source files, which have gone through an
amalgamation process to produce an easy-to-deploy single file. It is not
what the developers work with.
Scott said this:
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
Clearly he actually thinks I'm advocating using a single source file for
any kind of project, for actual development rather than a distribution
medium. Either that or he's deliberately spewing misinformation.
You're both clever chaps, and I think you know perfectly well what is
happening. So shame on you.
You are moving the goalposts because you were using your own
terminology and got pushbacks, and you seem constitutionally
incapable of accepting when people tell you what you wrote is
ambiguous or incorrect.
I explained the single file thing multiple times. It never seems to get
through. Or people don't bother reading my explanations.
In that case this will probably cut no ice either.
Below is the list of 77 files that comprise my C compiler.
[snip]
In article <M0nMR.786566$G7x8.651226@fx15.iad>,[...]
Scott Lurndal <slp53@pacbell.net> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
If a byte is 9 bits (ie, if CHAR_BIT == 9) there cannot
be a uint8_t type. The fixed-width types are not allowed
to have padding bits.
That was a 36-bit system. It could easily create a
uint8_t value from 1/9th of two 72-bit words;
so no padding bits required.
I think the issue is the standard's section on, "representation
of types" (sec 6.2.6.1 para 4 in `n3220`), which requires that
anything that's not a `char` type (`(signed|unsigned)? char`)
must be represented by a multiple of `CHAR_BIT` bits.
So if `CHAR_BIT` is 9, then since the exact-width types do not
permit padding bits (sec 7.22, para 1), then `uint8_t` cannot
be defined on such a system since there is no (integer) multiple
of 9 that gives 8.
Granted, that section does not explicitly say that it needs to
be an *integer* multiple of `CHAR_BIT`, but it implies it, and
section 5.2.5.3.2 says that `CHAR_BIT` is the, "number of bits
for smallest object that is not a bit-field (byte)".
So it is not clear to me that the definition of `byte` in the C
standard comports with that of some 36-bit machines, where bytes
can be of variable width; that would have to be some kind of
non-standard extension.
On 11/05/2026 19:20, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Scott said this:
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
Clearly he actually thinks I'm advocating using a single source file
for any kind of project, for actual development rather than a
distribution medium. Either that or he's deliberately spewing
misinformation.
Apparently Scott misunderstood something you wrote. That's not
at all surprising. You could have calmly and briefly corrected
Scott's error rather than arguing about it at great length.
I wasn't replying to Scott. And in fact this correction has been made
multiple times in the past; it doesn't help.
Something like "No, I don't advocate using a single source file
for actual development" would have been more than sufficient.
I was replying to Dan Cross who took SL's side and made these comments:
DC:
No, he's really not.
You are the one making a big deal out of the fact that whole
programs are in single source files.
Sure he can. SQLite does that. It's a well-known technique.
(Here DC is getting things mixed up)
You are moving the goalposts because you were using your own
terminology and got pushbacks, and you seem constitutionally
incapable of accepting when people tell you what you wrote is
ambiguous or incorrect.
It's difficult to keep calm when people post bullying and patronising
garbage like this.
Still, I believe the post I made was civil, and accurate.
[...]
The situations they were thinking about were things like this:
unsigned short a = 8;
int b = -5;
long c = a * b;
With value-preserving semantics, `c` is 40.
You mean -40.
On 11/05/2026 19:20, Keith Thompson wrote:[...]
Bart <bc@freeuk.com> writes:
[...]
Scott said this:
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit. It sounds like you've never actually worked
with either a team, or a non-trivial application.
Clearly he actually thinks I'm advocating using a single source file
for any kind of project, for actual development rather than a
distribution medium. Either that or he's deliberately spewing
misinformation.
Apparently Scott misunderstood something you wrote. That's not
at all surprising. You could have calmly and briefly corrected
Scott's error rather than arguing about it at great length.
I wasn't replying to Scott. And in fact this correction has been made multiple times in the past; it doesn't help.
On 11/05/2026 19:05, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
This I just discovered by chance. It's a small Reddit language project[...]
which here transpiles to C:
self.emit("// Core types");
self.emit("typedef int64_t Int;");
self.emit("typedef int8_t Int8;");
self.emit("typedef int16_t Int16;");
self.emit("typedef int32_t Int32;");
self.emit("typedef int64_t Int64;");
self.emit("typedef uint64_t UInt;");
self.emit("typedef uint8_t UInt8;");
...
You'd think that if transpiling to C anyway, they can tolerate using
"int64_t" in the generated C. But apparently not.
So what?
Some people like the <stdint.h> types. Some people don't. Everyone
here knows that. Showing us yet another example of someone renaming
them proves nothing.
What is your point?
I was asked for multiple examples of somebody defining aliases for
stdint.h types. This was one more, and not cherry-picked either.
On 11/05/2026 02:44, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 11/05/2026 00:11, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
I've glanced at appendix K.1 and saw nothing relevant there. It's
about exceeding array bounds.
Your false claim was that C "pretends" to be a safe language.
People keep jumping to conclusions without asking for clarification.
This is what I said (quite a few posts back):
C pretends to be a safe language by saying all those naughty things
are UB and should be avoided, at the same time, C compilers can be made
to do all that.
(I see now you quoted this yourself; I could have saved some time!)
The assumption made here is that unsafe-ness arises in C from UB. Then I
suggest that, while the language itself washes its hands of it, it lets
the compiler do the dirty work (as well as pushing the responsibility to
the user, by allowing the compiler to do something that is UB).
You are seriously confused by what other people consider as
"safe language".
Yes. You say that as though I shouldn't be ...
First, I do not think it is possible to
give satisfactory definition of safety, either get the idea
or not. One popular attempt at definition is that language
is safe if no untrapped errors are possible. Of course, this
definition has trouble because then one needs to say what
an error is. Reasonable definition could be that there is an
error if program is doing different thing than intended by
its creator. But as you noted there are errors that
language implementation can not reasonably detect so clearly
attempt above + this definition of error is not satisfactory.
So we need to restrict what we consider to be an error.
When talking about language safety possible (and popular)
approach is restrict errors to things that break language
rules, like using out of bound array indices or overflow
in C signed arithmetic.
Now, if you look at UB, UB in particular means that
implementation is not obliged to detect errors. So
UB in language definition means that language is more
or less unsafe.
I think that your formulation "allowing the compiler to do
something that is UB" is quite misleading. Standard says
that some things are UB. If UB appears in a program, it
is programmer who put it there. Essential part of UB is
that it is programmer responsibility to avoid UB.
Specific compiler may be helpful by detecting UB or
defining some useful behaviour, but in general compiler
is allowed to proceed blindly, trusting that there are
no UB in the source.
Coming back to safety, defining errors as violations of
language rules is not fully satisfactory too. Namely,
using language that "allow anything", like assembler,
there will be no violation of language rules, but clearly
such language does not help in detecting error. So
to meaningfully talk about language safety there must
be rules such that some classes of error lead to
violation of rule and violation must be detected. C
has type rules and violations of type rules will
detect some errors at compile time. But by design C
does not require any error detection at runtime so
clearly is unsafe.
Now, unqualified "safe" is really a fuzzy concept, as
there is no hope of detecting all errors and while
detecting some errors is theoretically possible
cost of checking could be prohibitive. So basically
"safe" boils down to "due diligence": language rules
forbid things that are recognized as likely to be
errors and language uses state of the art methods
to detect or prevent violations of the rules.
Let me add that basically from time where Pascal
were invented it was known how to define a language
rich enough to do most real world task, having rules
which eliminate substantial fraction of errors and
where _all_ violations of language rules are detected.
... but then you do a very good job of demonstrating why anyone could be confused!
But thank you engaging in the topic and providing some examples.
Assembly language is a good one. Clearly it does have some rules, but if some program manages to assemble, it doesn't mean it has no bugs,
including dangerous ones.
Other languages will have a line drawn elsewhere, as they have more
rules, stricter typing etc. Some, like Rust, which /people/ sometimes
claim will give you bug-free programs once you managed to get it to
compile, have it near the opposite end.
To get back to C and UB, if that 'safe' line isn't on the boundary
between non-UB and UB, then what does the boundary mean? Is it just deterministic vs. non-deterministic behaviour?
So languages that allow undetected violations of rules
are considered more or less unsafe.
This is back to the other topic as to what makes a practical systems language.
scott@slp53.sl.home (Scott Lurndal) writes:
[...]
On at least one system with a working C compiler,[...]
a byte is 9 bits, not 8. If I wanted an 8-bit datum
on that system, I'd have to use uint8_t.
(Now, I haven't used that system in decades, but it
still exists and powers a large fraction of the
world's airline reservation and operational functions).
What system is that?
In article <10tstnn$17jmo$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 14:54, Dan Cross wrote:
You're both clever chaps, and I think you know perfectly well what is
happening. So shame on you.
Consider that, perhaps, your use of terminology is so muddled
and unclear that we do not, in fact, "know perfectly well what
is happening."
I can't speak for Scott, of course, but from where I am sitting,
you seem to be very uninformed about how these things work
generally, and you're using your own, made-up terminology.
Sometimes, that terminology conflicts with standard terminology,
and confusion results. You seem to think this is people
deliberately trying to misinterpret you.
In article <M0nMR.786566$G7x8.651226@fx15.iad>,[...]
Scott Lurndal <slp53@pacbell.net> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
If a byte is 9 bits (ie, if CHAR_BIT == 9) there cannot
be a uint8_t type. The fixed-width types are not allowed
to have padding bits.
That was a 36-bit system. It could easily create a
uint8_t value from 1/9th of two 72-bit words;
so no padding bits required.
I think the issue is the standard's section on, "representation
of types" (sec 6.2.6.1 para 4 in `n3220`), which requires that
anything that's not a `char` type (`(signed|unsigned)? char`)
must be represented by a multiple of `CHAR_BIT` bits.
So if `CHAR_BIT` is 9, then since the exact-width types do not
permit padding bits (sec 7.22, para 1), then `uint8_t` cannot
be defined on such a system since there is no (integer) multiple
of 9 that gives 8.
Exactly. (We can confidently infer that it must be an integer
multiple because it refers to "the size of an object of that type,
in bytes", and the sizeof operator yields an integer value, and
because it wouldn't make sense otherwise.)
Granted, that section does not explicitly say that it needs to
be an *integer* multiple of `CHAR_BIT`, but it implies it, and
section 5.2.5.3.2 says that `CHAR_BIT` is the, "number of bits
for smallest object that is not a bit-field (byte)".
So it is not clear to me that the definition of `byte` in the C
standard comports with that of some 36-bit machines, where bytes
can be of variable width; that would have to be some kind of
non-standard extension.
It's crystal clear that a C "byte" has a fixed width, and that it's
inconsistent with any kind of variable-width "byte".
You might have, say, a 36-bit machine that can work with 6-bit,
9-bit, or 12-bit "bytes", but a conforming C implementation must
choose a constant value for CHAR_BIT (and it can't be 6).
A C-like implementation that has CHAR_BIT==6 or that supports
variable-width bytes might be useful, but it wouldn't be conforming.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <10tstnn$17jmo$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 14:54, Dan Cross wrote:
You're both clever chaps, and I think you know perfectly well what is
happening. So shame on you.
Consider that, perhaps, your use of terminology is so muddled
and unclear that we do not, in fact, "know perfectly well what
is happening."
I can't speak for Scott, of course, but from where I am sitting,
you seem to be very uninformed about how these things work
generally, and you're using your own, made-up terminology.
Sometimes, that terminology conflicts with standard terminology,
and confusion results. You seem to think this is people
deliberately trying to misinterpret you.
Indeed. I misunderstood him, my apologies.
Software distribution is a problem that was solved decades ago.
Whether early shell archives (shar) or tar/cpio,
.rpm/.deb et alia or even windows installers,
it's a problem that's been solved many times;
'shar' is even a single text file.
All of which must, of course, be unpacked before building the
code, although with shar (and .rpm/.deb), the software can
be built as part of the installation process automatically.
Bart seems to be advocating a distribution mechanism where one
feeds the distributed file directly into a compiler without
being required to unpack an archive first.
On 11/05/2026 19:12, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
On 11/05/2026 03:48, Keith Thompson wrote:[...]
With different compilers and optimization settings, I get any of the
following outputs on my system:
n=2147483647, n+1=1, 1 > 2147483647
n=2147483647, n+1=-2147483648, -2147483648 < 2147483647
n=2147483647, n+1=-2147483648, -2147483648 > 2147483647
I'm fairly sure that none of the compilers detect that there will
be undefined behavior at run time. The fact that time(NULL) is
greater than 0 is not something I'd expect a compiler to assume.
(That's why I added that to the program.) Rather, some compilers
assume that the behavior is defined, and therefore that n + 1 must
be greater than n.
I expected an output that looks like that middle line, which is the
most intuitive if you accept that integers have a limited capacity and
will wrap, when represented as 32-bit two's complement.
The program has undefined behavior.
Even when -fwrapv is applied?
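For readers following along, here is a minimal sketch of how the `n + 1` case can be handled without depending on `-fwrapv` or any other compiler flag. It is not the program under discussion (which is not shown in this part of the thread); the function names `can_increment` and `wrapping_increment` are made up for illustration.

```c
#include <limits.h>

/* Returns 1 if n + 1 can be computed without signed overflow. */
int can_increment(int n)
{
    return n < INT_MAX;
}

/* Increment with explicit wraparound: the overflowing case is handled
   by hand, so no signed overflow (and no undefined behavior) occurs. */
int wrapping_increment(int n)
{
    return (n == INT_MAX) ? INT_MIN : n + 1;
}
```

With these, `wrapping_increment(INT_MAX)` yields `INT_MIN` on any conforming implementation, which is the wrapped value that appears in the second and third output lines quoted above, and the comparison against `n` is then ordinary defined arithmetic.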
In article <10tt6sr$1asao$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 19:05, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
This I just discovered by chance. It's a small Reddit language project
which here transpiles to C:[...]
self.emit("// Core types");
self.emit("typedef int64_t Int;");
self.emit("typedef int8_t Int8;");
self.emit("typedef int16_t Int16;");
self.emit("typedef int32_t Int32;");
self.emit("typedef int64_t Int64;");
self.emit("typedef uint64_t UInt;");
self.emit("typedef uint8_t UInt8;");
...
You'd think that if transpiling to C anyway, they can tolerate using
"int64_t" in the generated C. But apparently not.
So what?
Some people like the <stdint.h> types. Some people don't. Everyone
here knows that. Showing us yet another example of someone renaming
them proves nothing.
What is your point?
I was asked for multiple examples of somebody defining aliases for
stdint.h types. This was one more, and not cherry-picked either.
No you weren't. You were asked to prove that "nobody" likes
them.
Strawman. Please provide examples of "somebody inventing a new type name
for uint8_t" (post standardization). One swallow doesn't make a summer,
so a single example from some obscure project you found on the WWW isn't
particularly instructive.
In article <10tstnn$17jmo$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 14:54, Dan Cross wrote:
In article <10tsfvd$11qhe$4@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 05:28, Chris M. Thomasson wrote:
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
[snip]
CMT: An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial.  For many reasons, including reusability,
maintainability and collaboration.  There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit.  It sounds like you've never actually worked
with either a team, or a non-trivial application.
BC (me): I wonder if his system has pre-compiled header support.
********************************
SL is talking nonsense.
No, he's really not.
Only for special purposes such as for distribution or as intermediate files.
What? No, that wasn't the context at all.
On 11/05/2026 20:04, Dan Cross wrote:
In article <10tt6sr$1asao$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 19:05, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
This I just discovered by chance. It's a small Reddit language project
which here transpiles to C:[...]
self.emit("// Core types");
self.emit("typedef int64_t Int;");
self.emit("typedef int8_t Int8;");
self.emit("typedef int16_t Int16;");
self.emit("typedef int32_t Int32;");
self.emit("typedef int64_t Int64;");
self.emit("typedef uint64_t UInt;");
self.emit("typedef uint8_t UInt8;");
...
You'd think that if transpiling to C anyway, they can tolerate using
"int64_t" in the generated C. But apparently not.
So what?
Some people like the <stdint.h> types. Some people don't. Everyone
here knows that. Showing us yet another example of someone renaming
them proves nothing.
What is your point?
I was asked for multiple examples of somebody defining aliases for
stdint.h types. This was one more, and not cherry-picked either.
No you weren't. You were asked to prove that "nobody" likes
them.
I was asked this:
SL:
Strawman. Please provide examples of "somebody inventing a new type name
for uint8_t" (post standardization). One swallow doesn't make a summer,
so a single example from some obscure project you found on the WWW isn't
particularly instructive.
2 or 3 posts back.
On 11/05/2026 19:44, Dan Cross wrote:
In article <10tstnn$17jmo$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 14:54, Dan Cross wrote:
In article <10tsfvd$11qhe$4@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 11/05/2026 05:28, Chris M. Thomasson wrote:
On 5/10/2026 8:42 AM, Scott Lurndal wrote:
[snip]
********************************
SL:
CMT: An experienced C programmer uses independent translation units
without even thinking about it, when the application is
non-trivial.  For many reasons, including reusability,
maintainability and collaboration.  There are codebases that
have well over a million SLOC.
You are the only programmer who has ever claimed
that an entire application must be contained within a single
translation unit.  It sounds like you've never actually worked
with either a team, or a non-trivial application.
I wonder if his system has pre-compiled header support.
BC (me):
********************************
SL is talking nonsense.
No, he's really not.
The context here is what I've marked between rows of asterisks above.
Scott Lurndal seems to think I prefer applications to be within one
source file. I said that is nonsense because it isn't true.
Only for special purposes such as for distribution or as intermediate files.
What? No, that wasn't the context at all.
In this case it was, but it depends on past history where I've advocated
/distribution/ of programs, if they can't be binaries, as a
self-contained C source file which has been generated from the original
sources. I first did that in 2014.
Everyone here, not just SL, always seems to think that it means I'm
suggesting developing using such files, despite my explaining it many times.
I assumed your remark ('No, he's really not') was about that. If not
then I misunderstood and I apologise.
Bart <bc@freeuk.com> wrote:
This is back to the other topic as to what makes a practical systems
language.
With current state of art, if you need to work with hardware,
then you need unsafe features. Modern tendency is that only
operating system (including device drivers) has "unrestricted"
access to hardware (I put unrestricted in quotes to account
for things like hypervisors). At higher level it seems that
safe languages allow to do all needed work. Of course, this
may require more effort from programmers, but that is manageable
and there are indications that on average safe languages
may require less effort. There may be loss of efficiency.
C++ preached safe and "zero runtime cost", but in reality
safety features have some overhead. You seem to be quite
satisfied having half of speed of optimized C. It seems that
safe languages can deliver that. The battle is about the last
few percent of performance and there are disagreements if
those last few percent matter.
Anyway, once you are above OS level languages like SML or
OCaml add strong safety features and offer performance within
small factor around performance of optimized C. Languages
based on JVM claim slightly better performance and comparable
safety. Those languages depend on garbage collection which may
introduce unacceptable delays. But there are now methods
to make delays smaller (used in Erlang and Go), and
methods that burn more machine cycles but completely eliminate
delays (parallel garbage collection). Rust got a lot of
good press because it promises memory safety (which previously
needed garbage collector) for manual memory management.
Concerning "system programming", for long time many developers
believed that safety is needed only in very special applications
and that in general purpose systems bugs are tolerable.
Internet slightly challenged this, highlighting need for
security. But even after industry got serious about security,
they still considered language safety almost as unneeded
luxury. That changed in recent times. There is one thing
when Joe Random Hacker encrypts disk of user computer and
demands ransom, basically all powers that could change this
considered such things as user problem. But now hackers from
hostile country can break critical systems (shut down
electricity in whole county, destroy electric plants, stop
vital pipeline from operating, etc) and governments got
more serious.
In article <10truhq$tqbj$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 11/05/2026 02:31, James Kuyper wrote:
On 2026-05-08 06:43, David Brown wrote:
...
Yes, I have heard that argument before. I am unconvinced that the
"value preserving" choice actually has any real advantages. I also
think it is a misnomer - it implies that "unsigned preserving" would
not preserve values, which is wrong.
Unsigned-preserving rules would convert a signed value which might be
negative to unsigned type more frequently than the value preserving
rules do. Such a conversion is not value-preserving.
If you have a signed value, you have a signed type. Unsigned-preserving
rules are also signed-preserving - smaller unsigned types promote to
bigger unsigned types, while smaller signed types promote to bigger
signed types. I don't think anyone ever suggested smaller signed types
should promote to larger unsigned types.
Perhaps I am being bone-headed here and missing something obvious.
(Given that the C committee put in a lot of effort and came to a
different conclusion, it seems very likely that I'm missing something.)
The C89 rationale document is useful here, specifically section
3.2.1.1.
It describes the tradeoffs between unsigned-preserving and
value-preserving semantics that the committee considered when
making the decision to codify value-preserving behavior. Of
note to this discussion is the following:
|Both schemes give the same answer in the vast majority of
|cases, and both give the same effective result in even more
|cases in implementations with twos complement arithmetic and
|quiet wraparound on signed overflow -- that is, in most current
|implementations.
This suggests the committee felt that it was rare that signed
integer overflow was treated specially by compilers, and that
the equivalent of `-fwrapv` was the dominant case, and would
continue to be in the future. (Oh, those sweet summer
children....)
The text continues with descriptions of operations where the
promotion of `unsigned char` and `unsigned short` values yield
results that the committee dubbed, "questionably signed." That
is, places where interpreting the sign of the result is
ambiguous given the two different semantics.
They highlight that the same ambiguity arises with operations
mixing `unsigned int` and `signed int`, but state that (to use
their words), the "unsigned preserving rules greatly increase
the number of situations where `unsigned int` confronts `signed
int` to yield a questionably signed result, whereas the value
preserving rules minimize such confrontations. Thus, the value
preserving rules were considered to be safer for the novice, or
unwary, programmer."
They do go on to note that this is a, "quiet change", at odds
with contemporary Unix compilers, and say, "This is considered
the most serious semantic change made by the Committee to a
widespread current practice." Indeed.
Unsigned-preserving promotions would, AFAICS, preserve value and
signedness :
unsigned short -> unsigned int
signed short -> signed int
Value-preserving promotions would preserve values too :
unsigned short -> signed int
signed short -> signed int
The unsigned-preserving promotions could also safely be applied even if
short is the same size as int - that is not the case for the "always
promote to signed int" rules.
The situations they were thinking about were things like this:
unsigned short a = 8;
int b = -5;
long c = a * b;
With value-preserving semantics, `c` is -40. On the other hand,
with unsigned-preserving semantics, assuming a 64-bit `long` and
32-bit `int`, `c` is 4294967256; logical enough, but one could
see how that might be surprising for someone unfamiliar with the
language.
What they do not appear to have anticipated are compiler
developers who would exploit the undefined nature of signed
integer overflow so aggressively that things like taking the
product of two 16-bit `unsigned short` values and assigning it
to a variable of unsigned type might yield unexpected results
(like a saturated product).
And I sincerely believe that they never thought that anyone
would use "undefined behavior" as a cudgel to justify such
behavior, even if a compiler would technically be operating
within the bounds of the standard if it did so. Talk about
being surprising for the novice or unwary....
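As a hedged illustration of the 16-bit case: with 16-bit `unsigned short` and 32-bit `int`, both operands are promoted to `int`, and a product such as 65535 * 65535 overflows it. One common way to keep the multiplication unsigned is to convert the operands first. The names `mul16_wide` and `mul16_wrap` below are invented for this sketch.

```c
#include <stdint.h>

/* Full 32-bit product of two 16-bit values.  Converting the operands to
   uint32_t first keeps the multiplication out of signed int on
   implementations where int is 32 bits wide. */
uint32_t mul16_wide(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* The same product reduced modulo 65536. */
uint16_t mul16_wrap(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}
```

The sketch assumes the exact-width types exist on the implementation, which (as discussed elsewhere in the thread) is not guaranteed.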
On balance, I agree with you that they should have chosen
unsigned-preserving semantics. Perhaps it would have led to
more situations where `unsigned int` "confronts" a `signed int`
that is negative (like they're about to throw down outside a bar
over a spilled drink or something), but in retrospect, I think
that's relatively easy to explain, while the value preserving
semantics lead to more UB and different questions: from the
novice perspective, it is very reasonable to ask why
`(unsigned short)8 * -5 == -40` but
`(unsigned)8 * -5 == 4294967256`.
- Dan C.
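The two results in the example can be reproduced in present-day C. The sketch below is an illustration only: the function names are made up, `long long` is used instead of `long` so that the result is wide enough on platforms with a 32-bit `long`, and the unsigned-preserving behavior is emulated with explicit conversions, since C itself uses value-preserving promotion. It assumes `int` can represent every `unsigned short` value.

```c
#include <limits.h>

/* What C actually does: `a` is promoted to int (when int can hold every
   unsigned short value), so the multiplication is signed. */
long long product_value_preserving(unsigned short a, int b)
{
    return a * b;
}

/* Emulates unsigned-preserving promotion by converting both operands to
   unsigned int, so the multiplication wraps modulo UINT_MAX + 1. */
long long product_unsigned_preserving(unsigned short a, int b)
{
    return (unsigned int)a * (unsigned int)b;
}
```

For `a == 8` and `b == -5` the first function returns -40; the second returns `UINT_MAX - 39`, which is 4294967256 when `unsigned int` is 32 bits wide.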
b) Note that he said, "post standardization." You have provided
no data about any of the projects you cited and when they
adopted whatever their alternative type names are, or whether or
not they target platforms and/or compilers that are not
standards conforming. For all we know, that was done
pre-standardization of those names, or an important platform is
something that doesn't support `<stdint.h>` for some obscure
reason.
Bart <bc@freeuk.com> writes:
[...]
This I just discovered by chance. It's a small Reddit language project[...]
which here transpiles to C:
self.emit("// Core types");
self.emit("typedef int64_t Int;");
self.emit("typedef int8_t Int8;");
self.emit("typedef int16_t Int16;");
self.emit("typedef int32_t Int32;");
self.emit("typedef int64_t Int64;");
self.emit("typedef uint64_t UInt;");
self.emit("typedef uint8_t UInt8;");
...
You'd think that if transpiling to C anyway, they can tolerate using
"int64_t" in the generated C. But apparently not.
So what?
Some people like the <stdint.h> types.
Some people don't. Everyone
here knows that. Showing us yet another example of someone renaming
them proves nothing.
What is your point?
On 2026-05-10 20:10, Keith Thompson wrote:
...
Bart, you claimed here that literally *nobody* likes stdint.h types.
I like stdint.h types.
Me too.
kalevi@kolttonen.fi (Kalevi Kolttonen) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
On at least one system with a working C compiler,
a byte is 9 bits, not 8. If I wanted an 8-bit datum
on that system, I'd have to use uint8_t.
(Now, I haven't used that system in decades, but it
still exists and powers a large fraction of the
world's airline reservation and operational functions).
And operations on unsigned char are well defined,
including wrap-around. So I fail to see any
difference between unsigned char and uint8_t.
Indeed. Although from my perspective, the use of the
stdint types clearly documents the programmer's
intent, whereas a typedef such as BYTE or WORD
is inherently ambiguous and would require a programmer
to look up the definition of such types in the
application to determine the original programmer's intent.
On 2026-05-11 06:21, Chris M. Thomasson wrote:
[ C's characteristics ]
To allow one to shoot themselves in the foot! Both feet. ;^)
To stress that picture...
"C" allows you to shoot yourself in your foot, but if you
manage to shoot in both of your feet with a single bullet
then it's the programmer's fault!
For other apps, int, long, float, double are preferred
to INT, LONG, FLOAT, DOUBLE (which seems to be the
way windows programmers code)[*]
[*] which probably dates back to 16-bit windows
and their methods of maintaining backward compatibility
across two subsequent (32, 64) x86 processor architectures
plus MIPS et alia.
In article <10tt85b$1adha$9@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <M0nMR.786566$G7x8.651226@fx15.iad>,[...]
Scott Lurndal <slp53@pacbell.net> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
If a byte is 9 bits (ie, if CHAR_BIT == 9) there cannot
be a uint8_t type. The fixed-width types are not allowed
to have padding bits.
That was a 36-bit system. It could easily create a
uint8_t value from 1/9th of two 72-bit words;
so no padding bits required.
I think the issue is the standard's section on, "representation
of types" (sec 6.2.6.1 para 4 in `n3220`), which requires that
anything that's not a `char` type (`(signed|unsigned)? char`)
must be represented by a multiple of `CHAR_BIT` bits.
So if `CHAR_BIT` is 9, then since the exact-width types do not
permit padding bits (sec 7.22, para 1), then `uint8_t` cannot
be defined on such a system since there is no (integer) multiple
of 9 that gives 8.
Exactly. (We can confidently infer that it must be an integer
multiple because it refers to "the size of an object of that type,
in bytes", and the sizeof operator yields an integer value, and
because it wouldn't make sense otherwise.)
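The argument can be checked against a given implementation with a few lines of C. This is a sketch, not text from the standard: the helper names are hypothetical, and it relies only on the fact that `<stdint.h>` defines `UINT8_MAX` when, and only when, `uint8_t` is provided.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width in bits of an object of `size` bytes: always a whole multiple
   of CHAR_BIT, because sizeof yields an integer. */
size_t object_bits(size_t size)
{
    return size * CHAR_BIT;
}

/* Returns 1 if this implementation agrees with the argument above:
   where uint8_t exists, it has no padding bits, so it must occupy
   exactly one byte and CHAR_BIT must be 8. */
int uint8_is_consistent(void)
{
#ifdef UINT8_MAX
    return CHAR_BIT == 8 && sizeof(uint8_t) == 1;
#else
    return 1; /* no uint8_t on this implementation, nothing to check */
#endif
}
```

On a hypothetical implementation with `CHAR_BIT == 9`, the `#else` branch would be the one compiled, because no exact 8-bit type could be provided.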
Granted, that section does not explicitly say that it needs to
be an *integer* multiple of `CHAR_BIT`, but it implies it, and
section 5.2.5.3.2 says that `CHAR_BIT` is the, "number of bits
for smallest object that is not a bit-field (byte)".
So it is not clear to me that the definition of `byte` in the C
standard comports with that of some 36-bit machines, where bytes
can be of variable width; that would have to be some kind of
non-standard extension.
It's crystal clear that a C "byte" has a fixed width,
I'm not sure that's actually true, but am willing to accept it
at face value.
But I take exception with the assertion that it is "crystal
clear". It is a conclusion that is inferred, not explicit,
though it is likely the only possible conclusion one can arrive
at considering the full set of constraints imposed by the
standard as a whole.
One can imagine a system where "bytes" are variable length, but
tagged with their size, addressable, where the minimal width is
7, and `CHAR_BIT` is defined as the maximal allowable byte size,
and `m x CHAR_BIT` permits `m` to be rational. With sufficient
contortions, one _may_ be able to force this round peg of a
non-existent machine into the square hole of standards
conformance, though I **strongly** suspect there is some other
requirement that invalidates the idea (as it rightly should).
Such a machine does not exist. And since thinking about it is
not useful other than as an academic thought exercise, I cannot
motivate myself to go find the disconfirming passages in the
standard. I shall simply trust that they exist, or that in any
event it does not matter, and lose no sleep over the matter.
and that it's
inconsistent with any kind of variable-width "byte".
You might have, say, a 36-bit machine that can work with 6-bit,
9-bit, or 12-bit "bytes",
Or 7 bits. Or mixed within a word. 36-bit machines got pretty
funky.
but a conforming C implementation must
choose a constant value for CHAR_BIT (and it can't be 6).
(nb, because 6 bits is insufficient to represent the characters
in the "basic character set", which has 94 characters in it [as
of N3220], and sec 3.7 defines a byte as an, "addressable unit
of data storage large enough to hold any member of the basic
character set of the execution environment").
Curiously, that section does not require it to be fixed; it
arguably should.
A C-like implementation that has CHAR_BIT==6 or that supports
variable-width bytes might be useful, but it wouldn't be conforming.
More fundamentally for conformance for word-oriented 36-bit
machines, bytes are not usually (ever?) directly addressed.
Rather, the containing word is the addressable unit, and a byte
accessed from within the word via special instructions or some
sort of descriptor.
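To make the word-plus-descriptor idea concrete, here is a rough sketch that models a 36-bit word in the low bits of a `uint64_t` and extracts a byte of arbitrary width from it. The layout (a position counted in bits from the right, plus a size) and the name `load_byte` are assumptions chosen for illustration, not a description of any particular machine's instruction set.

```c
#include <stdint.h>

/* Mask for the low 36 bits of the 64-bit container. */
#define WORD36_MASK ((UINT64_C(1) << 36) - 1)

/* Extracts a byte of `size` bits that has `pos` bits to its right
   within a 36-bit word.  Returns 0 for fields that do not fit. */
uint64_t load_byte(uint64_t word, unsigned pos, unsigned size)
{
    if (size == 0 || size > 36 || pos > 36 - size)
        return 0;
    return ((word & WORD36_MASK) >> pos) & ((UINT64_C(1) << size) - 1);
}
```

For the octal word `0123456701234`, `load_byte(w, 27, 9)` gives `0123` and `load_byte(w, 30, 6)` gives `012`: the same word read with two different byte widths, which is exactly what a single fixed `CHAR_BIT` cannot express.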
Perhaps the main "mistake" (where "mistake" means "I personally think[...]
C would be nicer for my own use if things were different") is that
when mixing operations between signed int and unsigned int, the signed
int is converted to unsigned. I suspect that in real-world code,
unsigned int values that are within the range of signed int are common
- and that negative signed int values are more common than unsigned
int values that are out of range of signed int. Any common type here,
unless it is larger than the two original types, is going to get some
things wrong - but I think that converging on signed int as the common
type would be wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like
yours.
scott@slp53.sl.home (Scott Lurndal) writes:
[...]
Software distribution is a problem that was solved decades ago.
For certain values of "solved".
Whether early shell archives (shar) or tar/cpio,
.rpm/.deb et alia or even windows installers,
it's a problem that's been solved many times;
'shar' is even a single text file.
All of which must, of course, be unpacked before building the
code, although with shar (and .rpm/.deb), the software can
be built as part of the installation process automatically.
Bart seems to be advocating a distribution mechanism where one
feeds the distributed file directly into a compiler without
being required to unpack an archive first.
And that's a valid mechanism. It is, for example, one of
several source distribution mechanisms used by SQLite. But the
"amalgamation" (a) is distributed in compressed form (which unpacks
to 4 source files last time I looked), and (b) is not the form
used for development and maintenance. It's generated from a more
conventional collection of source files in a directory tree.
I don't think anyone had advocated distributing single source files
as the best method in general; that was a simple misunderstanding.
On 11/05/2026 22:16, Keith Thompson wrote:[...]
I don't think anyone had advocated distributing single source files
as the best method in general; that was a simple misunderstanding.
The misunderstanding was in mistakenly suggesting I'd advocated
maintaining original, maintainable source for any scale of program in
one source file.
In article <10tpt9j$c3i4$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 10/05/2026 05:39, Janis Papanagnou wrote:
[snip]
What makes you think that I'd need to write an own language given that
there's a plethora of languages of all kinds and paradigms existing.
So where's the one that works like mine?
I mean, Rust does exactly what you were just describing.
And why are there so many new ones still appearing? Most of them you
will not know about.
Consider the possibility that you may be unique in the world in
possessing the combination of requirements and aesthetic
judgement that makes you feel you need a language like yours.
As for new languages, there are a number of reasons. Most of
them are not particularly relevant here.
At this point, you may consider doing what Keith suggested, and
moving further discussion of your language to comp.lang.misc.
As I wrote, safety is about ability to avoid or detect errors.
I don't have much of a problem with the things that C can do, but with
how it does it, its syntax, its ancient baggage, its quirks, its
folklore, its Unix-centric ecosystem, its pointless UBs, its
insistence in working with every oddball processor, its solving every
shortcoming with macros, its adherents who will defend every
misfeature to the death...
It is also frustrating looking at C forums and people thinking they
are too stupid to grasp something when it's language that could have
been better.
On 10/05/2026 14:05, Dan Cross wrote:
In article <10tpt9j$c3i4$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 10/05/2026 05:39, Janis Papanagnou wrote:
[snip]
What makes you think that I'd need to write an own language given that
there's a plethora of languages of all kinds and paradigms existing.
So where's the one that works like mine?
I mean, Rust does exactly what you were just describing.
Rust could hardly be more different than mine.
And why are there so many new ones still appearing? Most of them you
will not know about.
Consider the possibility that you may be unique in the world in
possessing the combination of requirements and aesthetic
judgement that makes you feel you need a language like yours.
My language fills the same niche that C does.
I don't have much of a problem with the things that C can do, but with
how it does it, its syntax, its ancient baggage, its quirks, its
folklore, its Unix-centric ecosystem, its pointless UBs, its insistence
in working with every oddball processor, its solving every shortcoming
with macros, its adherents who will defend every misfeature to the death...
Maybe the answer is to just create my own language?! I did exactly that,
and didn't have to deal with C for 10-15 years, but you can't get
away from it because it's everywhere.
It is also frustrating looking at C forums and people thinking they are
too stupid to grasp something when it's language that could have been
better.
As for new languages, there are a number of reasons. Most of
them are not particularly relevant here.
At this point, you may consider doing what Keith suggested, and
moving further discussion of your language to comp.lang.misc.
Sure, a pretty much dead group.
antispam@fricas.org (Waldek Hebisch) writes:
[discussing the notion of "safe" programs]
As I wrote, safety is about ability to avoid or detect errors.
In the functional programming community the usual statement is
"Well-typed programs cannot go wrong."
I think a good way of
understanding this is that, if a program stays inside the
safe limits of the language, the program can produce wrong
answers, but it cannot produce meaningless answers.
Of course, that has nothing to do with failing hardware, etc.
In article <868q9ppg4o.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
[discussing the notion of "safe" programs]
As I wrote, safety is about ability to avoid or detect errors.
In the functional programming community the usual statement is
"Well-typed programs cannot go wrong."
This is only concerning _type safety_.
David Brown <david.brown@hesbynett.no> writes:
[...]
Perhaps the main "mistake" (where "mistake" means "I personally think[...]
C would be nicer for my own use if things were different") is that
when mixing operations between signed int and unsigned int, the signed
int is converted to unsigned. I suspect that in real-world code,
unsigned int values that are within the range of signed int are common
- and that negative signed int values are more common than unsigned
int values that are out of range of signed int. Any common type here,
unless it is larger than the two original types, is going to get some
things wrong - but I think that converging on signed int as the common
type would be wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like
yours.
If I were designing a new C-like language, I'd probably avoid the
issue of signed-preserving vs. value-preserving altogether. I might
say operations where one operand is signed and the other is unsigned
are not allowed; if you need that, you can cast one of the operands.
The C committee decided to impose a more or less reasonable rule on
all such operations; I might require the programmer to decide what
to do in each case. (There might be an exception for constants,
so that u+1 doesn't require a cast; I haven't thought through the
implications of that.)
I'd also define operations on narrow types, so the promotion rules
become unnecessary.
Of course C can't be changed in this way without breaking tons of
existing code.
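A small sketch of the kind of mixed-signedness operation being discussed, and of what "cast one of the operands" looks like when the programmer decides each case. The function names are hypothetical; the first shows what `s < u` means under C's current conversion rules (with the conversion written out), the second compares the mathematical values.

```c
/* The comparison as C performs it: s is converted to unsigned int, so
   a negative s becomes a large positive value. */
int less_converted(int s, unsigned int u)
{
    return (unsigned int)s < u;
}

/* The comparison on the mathematical values of both operands:
   a negative s is less than any unsigned value. */
int less_by_value(int s, unsigned int u)
{
    return s < 0 || (unsigned int)s < u;
}
```

With `s == -1` and `u == 1u`, the first function reports "not less" while the second reports "less"; a language that rejected the mixed comparison would force the author to pick one of these explicitly.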
On 11/05/2026 23:30, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
Perhaps the main "mistake" (where "mistake" means "I personally think[...]
C would be nicer for my own use if things were different") is that
when mixing operations between signed int and unsigned int, the signed
int is converted to unsigned. I suspect that in real-world code,
unsigned int values that are within the range of signed int are common
- and that negative signed int values are more common than unsigned
int values that are out of range of signed int. Any common type here,
unless it is larger than the two original types, is going to get some
things wrong - but I think that converging on signed int as the common
type would be wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like
yours.
If I were designing a new C-like language, I'd probably avoid the
issue of signed-preserving vs. value-preserving altogether. I might
say operations where one operand is signed and the other is unsigned
are not allowed; if you need that, you can cast one of the operands.
I'd be with you on that.
However, I think you'd quickly run into inconveniences and annoyances
with integer constants - you'd want "x * 2" to work regardless of the
signedness of x's type. I am no Ada expert, and it's OT anyway, but I
believe in Ada the type of integer constants adapts to fit when used
like this - you'd need something similar to make the hypothetical K&B
C language work well. Integer constants would have to be
"questionably signed", not signed or unsigned. (Maybe "adaptively
typed" might be a better term, and include the size of the type as
well as the signedness.)
Of course C can't be changed in this way without breaking tons of
existing code.
The curse of popularity.
On Sun, 10 May 2026 20:30:24 -0400
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 2026-05-10 20:10, Keith Thompson wrote:
...
Bart, you claimed here that literally *nobody* likes stdint.h types.
I like stdint.h types.
Me too.
I don't just like stdint.h*;
I also hate it when C programmers define their own fixed-width integer
types.
That I wouldn't do myself, but it is o.k.:
typedef int32_t sample_index;
typedef uint32_t sample_value;
That would raise my blood pressure:
typedef int32_t s32;
typedef uint32_t u32;
typedef uint8_t octet;
------------
* PTR macros are something else. Those I hate.
And all those *_fast and *_least types... Not that I hate them, but
it certainly shows a lack of taste.
On 11/05/2026 19:07, Dan Cross wrote:
[snip]
The C89 rationale document is useful here, specifically section
3.2.1.1.
It describes the tradeoffs between unsigned-preserving and
value-preserving semantics that the committee considered when
making the decision to codify value-preserving behavior. Of
note to this discussion is the following:
|Both schemes give the same answer in the vast majority of
|cases, and both give the same effective result in even more
|cases in implementations with twos complement arithmetic and
|quiet wraparound on signed overflow -- that is, in most current
|implementations.
Yes, I've read the rationale here, and I'm still not convinced I
understand their reasoning.
[snip]
The situations they were thinking about were things like this:
unsigned short a = 8;
int b = -5;
long c = a * b;
With value-preserving semantics, `c` is -40. On the other hand,
with unsigned-preserving semantics, assuming a 64-bit `long` and
32-bit `int`, `c` is 4294967256; logical enough, but one could
see how that might be surprising for someone unfamiliar with the
language.
Thanks for that example.
Perhaps the main "mistake" (where "mistake" means "I personally think C
would be nicer for my own use if things were different") is that when
mixing operations between signed int and unsigned int, the signed int is
converted to unsigned. I suspect that in real-world code, unsigned int
values that are within the range of signed int are common - and that
negative signed int values are more common than unsigned int values that
are out of range of signed int. Any common type here, unless it is
larger than the two original types, is going to get some things wrong -
but I think that converging on signed int as the common type would be
wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like yours.
David Brown <david.brown@hesbynett.no> writes:
[...]
Perhaps the main "mistake" (where "mistake" means "I personally think[...]
C would be nicer for my own use if things were different") is that
when mixing operations between signed int and unsigned int, the signed
int is converted to unsigned. I suspect that in real-world code,
unsigned int values that are within the range of signed int are common
- and that negative signed int values are more common than unsigned
int values that are out of range of signed int. Any common type here,
unless it is larger than the two original types, is going to get some
things wrong - but I think that converging on signed int as the common
type would be wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like
yours.
If I were designing a new C-like language, I'd probably avoid the
issue of signed-preserving vs. value-preserving altogether. I might
say operations where one operand is signed and the other is unsigned
are not allowed; if you need that, you can cast one of the operands.
The C committee decided to impose a more or less reasonable rule on
all such operations; I might require the programmer to decide what
to do in each case. (There might be an exception for constants,
so that u+1 doesn't require a cast; I haven't thought through the
implications of that.)
I'd also define operations on narrow types, so the promotion rules
become unnecessary.
Of course C can't be changed in this way without breaking tons of
existing code.
On 11/05/2026 23:30, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
Perhaps the main "mistake" (where "mistake" means "I personally think[...]
C would be nicer for my own use if things were different") is that
when mixing operations between signed int and unsigned int, the signed
int is converted to unsigned. I suspect that in real-world code,
unsigned int values that are within the range of signed int are common
- and that negative signed int values are more common than unsigned
int values that are out of range of signed int. Any common type here,
unless it is larger than the two original types, is going to get some
things wrong - but I think that converging on signed int as the common
type would be wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like
yours.
If I were designing a new C-like language, I'd probably avoid the
issue of signed-preserving vs. value-preserving altogether. I might
say operations where one operand is signed and the other is unsigned
are not allowed; if you need that, you can cast one of the operands.
I'd be with you on that.
However, I think you'd quickly run into inconveniences and annoyances
with integer constants - you'd want "x * 2" to work regardless of the
signedness of x's type. I am no Ada expert, and it's OT anyway, but I
believe in Ada the type of integer constants adapts to fit when used
like this - you'd need something similar to make the hypothetical K&B C
language work well. Integer constants would have to be "questionably
signed", not signed or unsigned. (Maybe "adaptively typed" might be a
better term, and include the size of the type as well as the signedness.)
The C committee decided to impose a more or less reasonable rule on
all such operations; I might require the programmer to decide what
to do in each case. (There might be an exception for constants,
so that u+1 doesn't require a cast; I haven't thought through the
implications of that.)
Certainly the rules work - even if I might have preferred something
different, you can learn the rules and write correct code using them.
Lots of people do!
I'd also define operations on narrow types, so the promotion rules
become unnecessary.
<aol> Me too! </aol>
I might start using the _BitInt types, once the versions of gcc I need
for the targets I need have good support for them.
Of course C can't be changed in this way without breaking tons of
existing code.
The curse of popularity.
In article <10ttvng$1j579$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 10/05/2026 14:05, Dan Cross wrote:
In article <10tpt9j$c3i4$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 10/05/2026 05:39, Janis Papanagnou wrote:
[snip]
What makes you think that I'd need to write my own language given that
there's a plethora of languages of all kinds and paradigms existing?
So where's the one that works like mine?
I mean, Rust does exactly what you were just describing.
Rust could hardly be more different than mine.
You were describing what Rust calls `include_str!` and
`include_bytes!`. That's what I was referring to.
https://doc.rust-lang.org/std/macro.include_str.html
https://doc.rust-lang.org/std/macro.include_bytes.html
It is also frustrating looking at C forums and people thinking they are
too stupid to grasp something when it's the language that could have been
better.
The problem you keep encountering here, specifically, is that by
your own admission you do not know C, the language, well enough
to accurately understand what would have made it a "language
that could have been better."
I don't just like stdint.h*;
I also hate it when C programmers define their own fixed-width integer
types.
[...]
That would raise my blood pressure:
typedef int32_t s32;
typedef uint32_t u32;
typedef uint8_t octet;
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <868q9ppg4o.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
[discussing the notion of "safe" programs]
As I wrote, safety is about ability to avoid or detect errors.
In the functional programming community the usual statement is
"Well-typed programs cannot go wrong."
This is only concerning _type safety_.
I didn't mean to imply anything different.
In article <10tuhmt$1o3bp$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 11/05/2026 23:30, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
Perhaps the main "mistake" (where "mistake" means "I personally think[...]
C would be nicer for my own use if things were different") is that
when mixing operations between signed int and unsigned int, the signed
int is converted to unsigned. I suspect that in real-world code,
unsigned int values that are within the range of signed int are common
- and that negative signed int values are more common than unsigned
int values that are out of range of signed int. Any common type here,
unless it is larger than the two original types, is going to get some
things wrong - but I think that converging on signed int as the common
type would be wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like
yours.
If I were designing a new C-like language, I'd probably avoid the
issue of signed-preserving vs. value-preserving altogether. I might
say operations where one operand is signed and the other is unsigned
are not allowed; if you need that, you can cast one of the operands.
I'd be with you on that.
However, I think you'd quickly run into inconveniences and annoyances
with integer constants - you'd want "x * 2" to work regardless of the
signedness of x's type. I am no Ada expert, and it's OT anyway, but I
believe in Ada the type of integer constants adapts to fit when used
like this - you'd need something similar to make the hypothetical K&B C
language work well. Integer constants would have to be "questionably
signed", not signed or unsigned. (Maybe "adaptively typed" might be a
better term, and include the size of the type as well as the signedness.)
I think the term you are looking for is "strongly typed". :-)
That is, types are verifiably compatible. In a strongly- and
statically-typed language (that is, one where the types of
objects are known at compile time), it's possible to be both
expressive and precise. There are plenty of examples of such
languages, but the common characteristic is that they (usually)
_infer_ the type of an expression based on the types of the
operands; there are well-known, formally sound, techniques for
doing this.
With respect to literal constants, this would simply mean that
the literal would be considered to be of the inferred type of
the expression it was in: if no such inference could be made
(for instance, the types are fundamentally incompatible), then
the compiler would fail, flagging the type incompatibility as an
error.
So, if this were a fragment of a program in a hypothetical C
dialect that was strongly typed and used type inference,
unsigned int a = 5;
unsigned int c = a * 2;
both `5` and `2` would be inferred to have type `unsigned int`,
since both are representable as unsigned ints. However,
unsigned int c = a * -2;
would be a compile time error, since the resulting type of the
expression must be `unsigned int`, but `-2` is not an unsigned
integer: it would have to be explicitly converted first.
The C committee decided to impose a more or less reasonable rule on
all such operations; I might require the programmer to decide what
to do in each case. (There might be an exception for constants,
so that u+1 doesn't require a cast; I haven't thought through the
implications of that.)
Certainly the rules work - even if I might have preferred something
different, you can learn the rules and write correct code using them.
Lots of people do!
Yes, there are many examples of this, so it is obviously true.
However, I don't think there are many large projects written in
C where there isn't undefined behavior lurking somewhere, and
the amount of effort required to learn _all_ the rules of the
language is unnecessarily large.
I think it is fair to say that there are people who wear their
knowledge of the C standard as a badge of honor and look down at
those who desire a simpler language or who do not know the rules
as well. Some of that is fair (we see examples in this group of
some who not only refuse to learn the rules of the language, but
revel in their ignorance).
But that doesn't mean that all of the criticism is wrong, and
the frequency at which it happens that people run into UB is
also an indictment of the language. Put it this way: it may be
the programmer's fault that they relied on UB, but that it is so
evidently hard to learn and internalize the rules is also the
fault of the language. It is not wrong to wish it were better.
I'd also define operations on narrow types, so the promotion rules
become unnecessary.
<aol> Me too! </aol>
I might start using the _BitInt types, once the versions of gcc I need
for the targets I need have good support for them.
Of course C can't be changed in this way without breaking tons of
existing code.
The curse of popularity.
The curse of history!
- Dan C.
In article <10ttem6$1daks$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 11/05/2026 19:07, Dan Cross wrote:
[snip]
The C89 rationale document is useful here, specifically section
3.2.1.1.
It describes the tradeoffs between unsigned-preserving and
value-preserving semantics that the committee considered when
making the decision to codify value-preserving behavior. Of
note to this discussion is the following:
|Both schemes give the same answer in the vast majority of
|cases, and both give the same effective result in even more
|cases in implementations with twos complement arithmetic and
|quiet wraparound on signed overflow -- that is, in most current
|implementations.
Yes, I've read the rationale here, and I'm still not convinced I
understand their reasoning.
Nor am I.
[snip]
The situations they were thinking about were things like this:
unsigned short a = 8;
int b = -5;
long c = a * b;
With value-preserving semantics, `c` is -40. On the other hand,
with unsigned-preserving semantics, assuming a 64-bit `long` and
32-bit `int`, `c` is 4294967256; logical enough, but one could
see how that might be surprising for someone unfamiliar with the
language.
Thanks for that example.
Perhaps the main "mistake" (where "mistake" means "I personally think C
would be nicer for my own use if things were different") is that when
mixing operations between signed int and unsigned int, the signed int is
converted to unsigned. I suspect that in real-world code, unsigned int
values that are within the range of signed int are common - and that
negative signed int values are more common than unsigned int values that
are out of range of signed int. Any common type here, unless it is
larger than the two original types, is going to get some things wrong -
but I think that converging on signed int as the common type would be
wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like yours.
If I understand what you're saying -- and correct me if I'm
wrong -- it sounds like you are suggesting sign-preserving
semantics for all types.
I'm sure they must have at least talked about that. Where did
the idea go? I'm speculating, but I think they were trying to
thread a needle here, and felt that redefining the semantics for
types ranked with `int` and higher would be a bridge too far. I
keep saying I had (and still have) a lot of sympathy for the
committee: they were charged with imposing order on an unruly
situation, balancing many competing organizations and interests,
all while preserving compatibility with existing practice and
implementations, and (as they put it) retaining the "character"
of C. This is an unenviable position to be in.
I imagine the committee felt that, by the time the standards
process was in full swing, the ship had sailed on changing the
rules for values of type `int` or types of higher ranks, and
they could only reasonably address promotion of lesser ranked
types to that of `int`. They acknowledged that the
sign-preserving promotion rules were a big semantic difference
from established practice; had they attempted to mandate
sign-preserving rules for arithmetic involving the `int` family
of types, they likely would have faced a serious revolt.
And as they said in the rationale, in _most_ cases, it doesn't
matter; for `int`/`unsigned int` even less so. For instance,
assume a platform with 32-bit `int`. Then the behavior of this
code is implementation-defined, but documented to have the same
predictable result across most conforming compilers:
unsigned int a = 8;
int b = -5;
int c = a * b;
To wit, `b` is converted to `unsigned int` per the rules set
forth in the standard prior to the multiplication; the product
is taken in the ring $Z/2^nZ$ where $n$ is the bit-width of
`unsigned int` (in this example, 32); the product is then
converted to `int` by the initialization, but per the rules for
unsigned-to-signed conversions, the result is
implementation-defined (since the product is greater than
`INT_MAX`, the largest value a 32-bit 2s complement `int` can
represent). However, almost all real implementations will
define this using twos complement semantics with no change to
representation, and assign the resulting value to `c`.
This is, surely, by far the most common case.
So, for all _practical_ purposes, the interpretation of the
product as signed or unsigned only matters in the handful of
cases listed in the rationale: using the result in a comparison,
right-shifting the result or widening it (in which case
sign-extension matters, now that all the world's a 2s complement
machine) and so on.
And in cases where the compiler permits silent wrapping on
signed overflow, as I firmly believe they expected to be the
near-universal case, they made the argument that it mattered
even less.
Of course, we understand the consequences of these decisions
much better now, 40 years after the fact. But I really don't
think they thought things would unfold the way they have, with
UB taking such a prominent role as a basis for optimization.
- Dan C.
On 12/05/2026 15:10, Dan Cross wrote:
In article <10ttem6$1daks$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
[snip]
Perhaps the main "mistake" (where "mistake" means "I personally think C
would be nicer for my own use if things were different") is that when
mixing operations between signed int and unsigned int, the signed int is
converted to unsigned. I suspect that in real-world code, unsigned int
values that are within the range of signed int are common - and that
negative signed int values are more common than unsigned int values that
are out of range of signed int. Any common type here, unless it is
larger than the two original types, is going to get some things wrong -
but I think that converging on signed int as the common type would be
wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like yours.
If I understand what you're saying -- and correct me if I'm
wrong -- it sounds like you are suggesting sign-preserving
semantics for all types.
Yes. (Although I might not have thought through all the consequences of
this - so it's possible that I'll later realise or learn that it would
have been a bad idea.)
[snip]
I imagine the committee felt that, by the time the standards
process was in full swing, the ship had sailed on changing the
rules for values of type `int` or types of higher ranks, and
they could only reasonably address promotion of lesser ranked
types to that of `int`. They acknowledged that the
sign-preserving promotion rules were a big semantic difference
from established practice; had they attempted to mandate
sign-preserving rules for arithmetic involving the `int` family
of types, they likely would have faced a serious revolt.
And as they said in the rationale, in _most_ cases, it doesn't
matter; for `int`/`unsigned int` even less so. For instance,
assume a platform with 32-bit `int`. Then the behavior of this
code is implementation-defined, but documented to have the same
predictable result across most conforming compilers:
I don't know much about early C compilers (other than briefly trying C
on a home computer in my teens, ANSI C was established by the time I
first used C). Did any / many early C compilers guarantee wrapping for
signed integer arithmetic? It is not a guarantee I have seen in any of
the embedded C compiler manuals I have read, though some of these
compilers were far too weakly optimising for it to have made a difference.
unsigned int a = 8;
int b = -5;
int c = a * b;
To wit, `b` is converted to `unsigned int` per the rules set
forth in the standard prior to the multiplication; the product
is taken in the ring $Z/2^nZ$ where $n$ is the bit-width of
`unsigned int` (in this example, 32); the product is then
converted to `int` by the initialization, but per the rules for
unsigned-to-signed conversions, the result is
implementation-defined (since the product is greater than
`INT_MAX`, the largest value a 32-bit 2s complement `int` can
represent). However, almost all real implementations will
define this using twos complement semantics with no change to
representation, and assign the resulting value to `c`.
This is, surely, by far the most common case.
Yes, you end up with the same answer of -40, when "c" is an "int". But
if "c" is "long" (like in your first example), and that is bigger than
"int", the answer is 4294967256 which is almost certainly not what the
programmer intended. If the common type for "a * b" had been signed
int, rather than unsigned int, then you'd get -40 whether "c" is "int"
or "long". And you'd get it more directly, with less IB.
So, for all _practical_ purposes, the interpretation of the
product as signed or unsigned only matters in the handful of
cases listed in the rationale: using the result in a comparison,
right-shifting the result or widening it (in which case
sign-extension matters, now that all the world's a 2s complement
machine) and so on.
And in cases where the compiler permits silent wrapping on
signed overflow, as I firmly believe they expected to be the
near-universal case, they made the argument that it mattered
even less.
Of course, we understand the consequences of these decisions
much better now, 40 years after the fact. But I really don't
think they thought things would unfold the way they have, with
UB taking such a prominent role as a basis for optimization.
Well, as they say, making predictions is hard - especially about the future!
In article <10tvefc$1vmna$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 12/05/2026 15:10, Dan Cross wrote:
In article <10ttem6$1daks$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
[snip]
I imagine the committee felt that, by the time the standards
process was in full swing, the ship had sailed on changing the
rules for values of type `int` or types of higher ranks, and
they could only reasonably address promotion of lesser ranked
types to that of `int`. They acknowledged that the
sign-preserving promotion rules were a big semantic difference
from established practice; had they attempted to mandate
sign-preserving rules for arithmetic involving the `int` family
of types, they likely would have faced a serious revolt.
And as they said in the rationale, in _most_ cases, it doesn't
matter; for `int`/`unsigned int` even less so. For instance,
assume a platform with 32-bit `int`. Then the behavior of this
code is implementation-defined, but documented to have the same
predictable result across most conforming compilers:
I don't know much about early C compilers (other than briefly trying C
on a home computer in my teens, ANSI C was established by the time I
first used C). Did any / many early C compilers guarantee wrapping for
signed integer arithmetic? It is not a guarantee I have seen in any of
the embedded C compiler manuals I have read, though some of these
compilers were far too weakly optimising for it to have made a difference.
I think "guarantee" is too strong of a word; after all, there
was no standard in which to make a guarantee, but that was how
very early C compilers operated in practice. They were very
primitive, probably in part because of the paucity of the
machine they were developed on, so one really could imagine the
instructions that would be emitted in response to a given line
of code (C's unwarranted reputation as a "high-level assembler"
likely comes from this).
Pre-typesetter C, in particular, was pretty wild, though the
basic skeletal structure of the language as we know it had
mostly settled by then. Still, if one looks at the 6th Edition
Unix kernel source code, one will frequently find things like
this (excerpted from the DN-11 driver):
```
struct dn {
struct {
char dn_stat;
char dn_reg;
} dn11[3];
}
#define DNADDR 0175200
dnopen(dev, flag)
{
register struct dn *dp;
register int rdev;
rdev = dev.d_minor;
dp = &DNADDR->dn11[rdev];
if (dp->dn_reg&(PWI|DLO))
u.u_error = ENXIO;
else {
DNADDR->dn11[0].dn_stat =| MENABLE;
dp->dn_stat = IENABLE|MENABLE|CRQ;
}
}
```
Notice what the struct member references are made against: not
just a variable with no declared type, but an integer
constant. In early C, all `struct` members shared a
single common namespace, so the language assumed that if it saw
`->member`, the thing on the left side of `->` must be a pointer
to an instance of whatever `struct` definition contained
`member`. On the PDP-11, an integer literal was taken as an
absolute address in the virtual address space of the program, as
defined by the settings in its segmentation registers. In the
kernel, this is basically a physical address.
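For comparison, a rough sketch of how the same kind of register access has to be spelled in C since standardization: the integer must be converted explicitly to a pointer of the right `struct` type. The struct layout mirrors the excerpt; `dn_stat_at` and its `base` parameter are my own inventions, so that the address is supplied by the caller instead of being hard-wired to 0175200.

```c
#include <stdint.h>

struct dn {
    struct {
        volatile char dn_stat;
        volatile char dn_reg;
    } dn11[3];
};

/* Return a pointer to the status register of unit `unit` of the device
 * whose register block starts at address `base`. The cast is the step
 * that early C performed implicitly. */
volatile char *dn_stat_at(uintptr_t base, int unit)
{
    struct dn *dev = (struct dn *)base;
    return &dev->dn11[unit].dn_stat;
}
```

The member names no longer share one global namespace, so `dev->dn11` only compiles because `dev` really is a `struct dn *`.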
unsigned int a = 8;
int b = -5;
int c = a * b;
To wit, `b` is converted to `unsigned int` per the rules set
forth in the standard prior to the multiplication; the product
is taken in the ring $Z/2^nZ$ where $n$ is the bit-width of
`unsigned int` (in this example, 32); the product is then
converted to `int` by the initialization, but per the rules for
unsigned-to-signed conversions, the result is
implementation-defined (since the product is greater than
`INT_MAX`, the largest value a 32-bit 2s complement `int` can
represent). However, almost all real implementations will
define this using twos complement semantics with no change to
representation, and assign the resulting value to `c`.
This is, surely, by far the most common case.
Yes, you end up with the same answer of -40, when "c" is an "int". But
if "c" is "long" (like in your first example), and that is bigger than
"int", the answer is 4294967256 which is almost certainly not what the
programmer intended. If the common type for "a * b" had been signed
int, rather than unsigned int, then you'd get -40 whether "c" is "int"
or "long". And you'd get it more directly, with less IB.
But you'd have more UB, because you'd run into signed overflow
more often (assuming they preserved that as UB in this
hypothetical alternate reality).
If, instead, they had defined
the language to have unsigned-preserving semantics and defined
the behavior of unsigned to signed conversion to be the inverse
of signed to unsigned conversion, then you'd get the same result
without the IB.
So, for all _practical_ purposes, the interpretation of the
product as signed or unsigned only matters in the handful of
cases listed in the rationale: using the result in a comparison,
right-shifting the result or widening it (in which case
sign-extension matters, now that all the world's a 2s complement
machine) and so on.
And in cases where the compiler permits silent wrapping on
signed overflow, as I firmly believe they expected to be the
near-universal case, they made the argument that it mattered
even less.
Of course, we understand the consequences of these decisions
much better now, 40 years after the fact. But I really don't
think they thought things would unfold the way they have, with
UB taking such a prominent role as a basis for optimization.
Well, as they say, making predictions is hard - especially about the future!
Lol. Thanks, Steincke.
- Dan C.
On 12/05/2026 16:05, Dan Cross wrote:
[snip]
I think the term you are looking for is "strongly typed". :-)
Sure - I want this all to be strongly typed, but the question is what
should the type of integer constants / integer literals be? Ada calls
them "universal_integer" type, which might be a good name. (I don't
think there's a need to do too much bikeshedding for a purely
hypothetical language, however.)
That is, types are verifiably compatible. In a strongly- and
statically-typed language (that is, one where the types of
objects are known at compile time), it's possible to be both
expressive and precise. There are plenty of examples of such
languages, but the common characteristic is that they (usually)
_infer_ the type of an expression based on the types of the
operands; there are well-known, formally sound, techniques for
doing this
Yes. I'd want the hypothetical language to be more strongly typed than C.
With respect to literal constants, this would simply mean that
the literal would be considered to be of the inferred type of
the expression it was in: if no such inference could be made
(for instance, the types are fundamentally incompatible), then
the compiler would fail, flagging the type incompatibility as an
error.
Yes.
So, if this were a fragment of a program in a hypothetical C
dialect that was strongly typed and used type inference,
unsigned int a = 5;
unsigned int c = a * 2;
both `5` and `2` would be inferred to have type `unsigned int`,
since both are representable as unsigned ints. However,
unsigned int c = a * -2;
would be a compile time error, since the resulting type of the
expression must be `unsigned int`, but `-2` is not an unsigned
integer: it would have to be explicitly converted first.
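For contrast with the hypothetical dialect, here is a minimal sketch of what standard C does with the same expression today. The function name `scale_by_minus_two` is made up for illustration. The literal `-2` has type `int`, is converted to `unsigned int` by the usual arithmetic conversions, and no diagnostic is required (though some compilers can warn if asked).

```c
#include <limits.h>

/* Hypothetical helper: multiplies an unsigned value by the int literal -2. */
unsigned int scale_by_minus_two(unsigned int a)
{
    /*
     * -2 has type int. The usual arithmetic conversions turn it into
     * UINT_MAX - 1 before the multiply, and the product then wraps
     * modulo UINT_MAX + 1.
     */
    return a * -2;
}
```

With `a` equal to 5 the result is `UINT_MAX - 9`, which is 4294967286 where `unsigned int` is 32 bits wide.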
That would be good.
I think there'd be a fair bit of overlap in our personal perfected
versions or dialects of C - but I'm sure there would be differences too.
The C committee decided to impose a more or less reasonable rule on
all such operations; I might require the programmer to decide what
to do in each case. (There might be an exception for constants,
so that u+1 doesn't require a cast; I haven't thought through the
implications of that.)
Certainly the rules work - even if I might have preferred something
different, you can learn the rules and write correct code using them.
Lots of people do!
Yes, there are many examples of this, so it is obviously true.
However, I don't think there are many large projects written in
C where there isn't undefined behavior lurking somewhere, and
the amount of effort required to learn _all_ the rules of the
language is unnecessarily large.
I think it is fair to say that there are people who wear their
knowledge of the C standard as a badge of honor and look down at
those who desire a simpler language or who do not know the rules
as well. Some of that is fair (we see examples in this group of
some who not only refuse to learn the rules of the language, but
revel in their ignorance).
But that doesn't mean that all of the criticism is wrong, and
the frequency at which it happens that people run into UB is
also an indictment of the language. Put it this way: it may be
the programmer's fault that they relied on UB, but that it is so
evidently hard to learn and internalize the rules is also the
fault of the language. It is not wrong to wish it were better.
It is not wrong to wish C were better - with hindsight, there are many
ways in which a slightly different language would have kept the
advantages of C while reducing at least some risks of errors (whether UB
or not).
But I think that a lot of the UB you might find in large projects
would be bugs in the code regardless of how that UB might have been
defined.
That is, even if signed integer arithmetic overflow had been
fully defined, you'd still get the wrong answer and the program has a
bug. The same with dereferencing a null pointer, or a buffer overflow,
or using the value of an uninitialised local variable. That is, if you
write your code so that it would have been bug-free in a language that
did not have these UB's, the C code would be the same.
The exceptions here would be cases where a programmer wrongly assumes
something has defined behaviour, and writes code according to that
assumption. Thus if they write code that assumes reading an
uninitialised local variable returns 0, or has an unspecified (but not
undefined) value, or that assumes signed integer overflow is defined as
wrapping - /then/ the C language's UB can surprise them in a way other
languages generally do not. I don't think there are other situations
where you could hit UB while expecting defined behaviour. (But as we
know, there are a few situations where the signed integer overflow can
be hiding unexpectedly, like uint16_t * uint16_t.)
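A minimal sketch of that hidden case, assuming an implementation where `int` is wider than 16 bits: both `uint16_t` operands are promoted to signed `int`, so `65535 * 65535` can overflow `int`. Converting to `unsigned int` first keeps the arithmetic unsigned. The name `mul_u16` is illustrative only.

```c
#include <stdint.h>

/*
 * Multiply two 16-bit values and keep the low 16 bits.
 * Converting to unsigned int first means the product wraps instead of
 * overflowing a promoted signed int.
 */
uint16_t mul_u16(uint16_t a, uint16_t b)
{
    unsigned int ua = a;
    unsigned int ub = b;
    return (uint16_t)(ua * ub);
}
```

The conversion back to `uint16_t` is a well-defined reduction modulo 65536, so `mul_u16(65535, 65535)` yields 1.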
On 12/05/2026 17:57, Dan Cross wrote:
I think "guarantee" is too strong of a word; after all, there
was no standard in which to make a guarantee, but that was how
very early C compilers operated in practice. They were very
primitive, probably in part because of the paucity of the
machine they were developed on, so one really could imagine the
instructions that would be emitted in response to a given line
of code (C's unwarranted reputation as a "high-level assembler"
likely comes from this).
Perhaps "documented" would be better than "guaranteed". I realise that
in many situations, even highly optimising compilers generate signed
integer arithmetic operations that wrap. But to me, it's important what
the documentation says. The C standard says signed integer overflow
is UB - if a C compiler's manual does not document what the compiler
does with overflow, you can't rely on any particular behaviour. But if
the manual says "signed integer overflow follows the target processor's
behaviour" and you know that is wrapping (no traps or other
"interesting" stuff), that's fine. Before the C standard, then of
course the compiler manual (and any referenced documents) would be the
only source of information on the semantics.
Very occasionally, I'll rely on "what happens in practice" - if there is
no good and efficient way to avoid it and I can be sure from testing and
examining generated assembly code that everything works as I want.
Pre-typesetter C, in particular, was pretty wild, though the
basic skeletal structure of the language as we know it had
mostly settled by then. Still, if one looks at the 6th Edition
Unix kernel source codes, one will frequently find things like
this (excerpted from the DN-11 driver):
```
struct dn {
struct {
char dn_stat;
char dn_reg;
} dn11[3];
}
#define DNADDR 0175200
dnopen(dev, flag)
{
register struct dn *dp;
register int rdev;
rdev = dev.d_minor;
dp = &DNADDR->dn11[rdev];
if (dp->dn_reg&(PWI|DLO))
u.u_error = ENXIO;
else {
DNADDR->dn11[0].dn_stat =| MENABLE;
dp->dn_stat = IENABLE|MENABLE|CRQ;
}
}
```
Notice the pointer that the struct member references are made
against, not just a variable with no declared type, but against
an integer constant: in early C, all `struct` members shared a
single common namespace; so the language assumed if it saw
`->member`, the thing on the left side of `->` must be a pointer
to an instance of whatever `struct` definition contained
`member`. On the PDP-11, an integer literal was taken as an
absolute address in the virtual address space of the program, as
defined by the settings in its segmentation registers. In the
kernel, this is basically a physical address.
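For comparison, a rough sketch of how the same access might be spelled in modern C, where an integer has to be converted to a pointer explicitly and member names belong to their own struct. The `DEMO_` bit values are placeholders and are not taken from the V6 source; `DN_AT` and `dn_enable` are hypothetical names.

```c
#include <stdint.h>

/* Same layout as the V6 declaration quoted above. */
struct dn {
    struct {
        char dn_stat;
        char dn_reg;
    } dn11[3];
};

/* Placeholder bit values for illustration only. */
#define DEMO_MENABLE 04
#define DEMO_IENABLE 0100

/* Modern C needs an explicit conversion before an integer is used as a pointer. */
#define DN_AT(addr) ((volatile struct dn *)(uintptr_t)(addr))

/* Sets the enable bits for one unit; `base` stands in for DNADDR. */
void dn_enable(volatile struct dn *base, int unit)
{
    base->dn11[0].dn_stat |= DEMO_MENABLE;   /* V6 spelled this operator "=|" */
    base->dn11[unit].dn_stat = DEMO_IENABLE | DEMO_MENABLE;
}
```

On the real hardware the call would look something like `dn_enable(DN_AT(0175200), unit)`; on a hosted system one can pass the address of an ordinary `struct dn` object to try it out.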
Yes, I knew that's how structs worked before (though I have never had to
work with any code from that time). I notice also it has "=|" rather
than "|=".
And it seems to have been written at a time when space characters still
cost real money :-)
unsigned int a = 8;
int b = -5;
int c = a * b;
To wit, `b` is promoted to `unsigned int` per the rules set
forth in the standard prior to the multiplication; the product
is taken in some ring $Z/2^nZ$ where $n$ is the bit-width of
`unsigned int` (in this example, 32); the product then undergoes
conversion to `signed int`, but per the rules for
unsigned-to-signed conversions, the result is
implementation-defined (since the product is outside of the
range of the positive subset of 32-bit numbers in two's complement
representation). However, almost all real implementations will
define this using two's complement semantics with no change to
representation, and assign the resulting value to `c`.
This is, surely, by far the most common case.
Yes, you end up with the same answer of -40, when "c" is an "int". But
if "c" is "long" (like in your first example), and that is bigger than
"int", the answer is 4294967256 which is almost certainly not what the
programmer intended. If the common type for "a * b" had been signed
int, rather than unsigned int, then you'd get -40 whether "c" is "int"
or "long". And you'd get it more directly, with less IB.
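A small sketch of the widening case, assuming `long long` is wider than `unsigned int` (it uses `long long` rather than `long` because `long` is only 32 bits on some platforms). The function names are made up for illustration.

```c
#include <limits.h>

/* The multiply is done in unsigned int; only the wrapped result is widened. */
long long widen_after_unsigned_multiply(unsigned int a, int b)
{
    return a * b;
}

/* Widening an operand first keeps the whole multiplication signed. */
long long widen_before_multiply(unsigned int a, int b)
{
    return (long long)a * b;
}
```

With `a` equal to 8 and `b` equal to -5, the first function returns `UINT_MAX - 39` (4294967256 for a 32-bit `unsigned int`) and the second returns -40.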
But you'd have more UB, because you'd run into signed overflow
more often (assuming they preserved that as UB in this
hypothetical alternate reality).
Would you get more signed overflow in practice? And in particular,
would you get more signed overflow UB in places where you would not have
a bug in the code anyway? There would certainly be more cases of signed
integer arithmetic, whereas moving to a common unsigned type means more
unsigned integer arithmetic. But I don't see signed integer arithmetic
as a risk of UB in itself - it is only a risk of UB if you are working
with inappropriate values.
I think perhaps this is getting a bit speculative - we can't really give
quantitative values for the risk of problems with particular expressions
in existing C code. I believe the conclusion is simply that the C
committee chose the rules that they thought, at the time, gave the most
consistent results with the least risk of introducing new problems in
existing code written for a variety of slightly different C dialects.
Four decades later I disagree with some of those decisions, but there's
nothing to be done about it now.
In article <10tvmp7$23t17$1@dont-email.me>,<snip>
David Brown <david.brown@hesbynett.no> wrote:
Pre-typesetter C, in particular, was pretty wild, though the
basic skeletal structure of the language as we know it had
mostly settled by then. Still, if one looks at the 6th Edition
Unix kernel source codes, one will frequently find things like
this (excerpted from the DN-11 driver):
```
struct dn {
struct {
char dn_stat;
char dn_reg;
} dn11[3];
}
#define DNADDR 0175200
dnopen(dev, flag)
{
register struct dn *dp;
register int rdev;
rdev = dev.d_minor;
dp = &DNADDR->dn11[rdev];
if (dp->dn_reg&(PWI|DLO))
u.u_error = ENXIO;
else {
DNADDR->dn11[0].dn_stat =| MENABLE;
dp->dn_stat = IENABLE|MENABLE|CRQ;
}
}
```
Notice the pointer that the struct member references are made
against, not just a variable with no declared type, but against
an integer constant: in early C, all `struct` members shared a
single common namespace; so the language assumed if it saw
`->member`, the thing on the left side of `->` must be a pointer
to an instance of whatever `struct` definition contained
`member`. On the PDP-11, an integer literal was taken as an
absolute address in the virtual address space of the program, as
defined by the settings in its segmentation registers. In the
kernel, this is basically a physical address.
Yes, I knew that's how structs worked before (though I have never had to
work with any code from that time). I notice also it has "=|" rather
than "|=".
Yes. This is in Dennis Ritchie's C history paper; apparently it
was due to something they did in the lexical analyzer in B, on
the PDP-7.
And it seems to have been written at a time when space characters still
cost real money :-)
Heh. They preserved the density of that style in Plan 9, too.
Michael S <already5chosen@yahoo.com> writes:
I not just like stdint.h*
I also hate when C programmers define their own fixed-width integer
types.
[...]
That would raise my blood pressure:
typedef int32_t s32;
typedef uint32_t u32;
typedef uint8_t octet;
Can you say what it is about them that you don't like?
Or why you don't like them? Are the reasons the same
in all three cases, or is octet different?
In article <10tvmp7$23t17$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
[snip]
And it seems to have been written at a time when space characters still
cost real money :-)
Heh. They preserved the density of that style in Plan 9, too.
The code of that era was generally terse. Here's an interesting
fragment from v6 ls.c:
readdir(dir)
char *dir;
{
static struct {
int dinode;
char dname[14];
} dentry;
register char *p;
register int j;
register struct lbuf *ep;
if (fopen(dir, &inf) < 0) {
printf("%s unreadable\n", dir);
return;
}
tblocks = 0;
for(;;) {
p = &dentry;
for (j=0; j<16; j++)
*p++ = getc(&inf);
if (dentry.dinode==0
|| aflg==0 && dentry.dname[0]=='.')
continue;
if (dentry.dinode == -1)
break;
ep = gstat(makename(dir, dentry.dname), 0);
if (ep->lnum != -1)
ep->lnum = dentry.dinode;
for (j=0; j<14; j++)
ep->lname[j] = dentry.dname[j];
}
close(inf.fdes);
}
As an aside, I think this addresses your question/gripe about
why 'ls' ignored all dot files by default. While the intent
was to hide '.' and '..', ls(1) simply looked at the first
byte. I don't think user-created 'dot-files' were common in the v6
days, and when they did become common, it was _because_ of that
shortcut in V6 ls(1).
It's worth noting that Ken? used "aflg==0" rather than '!aflg' :-)
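To make the shortcut concrete, here is a small sketch. Since `&&` binds tighter than `||`, the quoted condition groups as `dinode==0 || (aflg==0 && dname[0]=='.')`. The helpers below isolate the name test; their names are hypothetical, and they take NUL-terminated strings for simplicity, whereas the V6 `dname` is a fixed 14-byte field.

```c
#include <string.h>

/* The V6 test quoted above: hide any name whose first byte is '.', unless -a. */
int hidden_first_byte(const char *name, int aflg)
{
    return aflg == 0 && name[0] == '.';
}

/* A narrower test that hides only "." and "..". */
int hidden_dot_entries_only(const char *name, int aflg)
{
    return aflg == 0 && (strcmp(name, ".") == 0 || strcmp(name, "..") == 0);
}
```

A name such as `.profile` is hidden by the first test but not by the second, which is the difference that let dot-files become a convention.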
Bart <bc@freeuk.com> writes:
[...]
I don't have much of a problem with the things that C can do, but with
how it does it, its syntax, its ancient baggage, its quirks, its
folklore, its Unix-centric ecosystem, its pointless UBs, its
insistence in working with every oddball processor, its solving every
shortcoming with macros, its adherents who will defend every
misfeature to the death...
You're mostly wrong about that last point. Many of us spend a
great deal of time and effort here *explaining* how C is defined
and how best to use it.
To explain is not to defend. What will it take for you to understand
that?
[...]
It is also frustrating looking at C forums and people thinking they
are too stupid to grasp something when it's language that could have
been better.
(I'm going to assume I parsed that sentence correctly.)
Nobody has said that C couldn't have been better. But it could
hardly have been more successful. As Dennis Ritchie himself said,
"C is quirky, flawed, and an enormous success."
[...]
Even half a century ago, there were big companies and lots of clever
people, who could have cranked out a suitable systems language of
equal capability to C in their sleep, but with fewer rough edges.
I wonder why they didn't?
On 12/05/2026 02:37, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
I don't have much of a problem with the things that C can do, but with
how it does it, its syntax, its ancient baggage, its quirks, its
folklore, its Unix-centric ecosystem, its pointless UBs, its
insistence in working with every oddball processor, its solving every
shortcoming with macros, its adherents who will defend every
misfeature to the death...
You're mostly wrong about that last point. Many of us spend a
great deal of time and effort here *explaining* how C is defined
and how best to use it.
I don't see the connection with my point. I haven't said that people
don't explain things.
But it does seem that every poor feature in C is an invaluable asset
to somebody, that must never be fixed.
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
To explain is not to defend. What will it take for you to understand
that?
[...]
It is also frustrating looking at C forums and people thinking they
are too stupid to grasp something when it's language that could have
been better.
(I'm going to assume I parsed that sentence correctly.)
Nobody has said that C couldn't have been better. But it could
hardly have been more successful. As Dennis Ritchie himself said,
"C is quirky, flawed, and an enormous success."
Yeah, it's one of the great mysteries. Even half a century ago, there
were big companies and lots of clever people, who could have cranked
out a suitable systems language of equal capability to C in their
sleep, but with fewer rough edges.
I wonder why they didn't? Maybe they would have been aiming too high
even then? (Instead we got Smalltalk and Ada.)
On 12/05/2026 15:05, Dan Cross wrote:[...]
I think it is fair to say that there are people who wear their
knowledge of the C standard as a badge of honor and look down at
those who desire a simpler language or who do not know the rules
as well. Some of that is fair (we see examples in this group of
some who not only refuse to learn the rules of the language, but
revel in their ignorance).
Take for example C's set of operator precedences.
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
a + b ? c - d : e * f
a ? b ? c : d ? e : f : g
then parentheses would be used to make things clearer. (I haven't
checked these are valid, but that is the point; it is hard to see!)
But why shouldn't people be expected to learn the rules? Why is it
OK to 'revel' in not knowing the basics here, but not when unnecessary
UBs are involved where rules are harder and which depend on runtime
inputs?
Bart <bc@freeuk.com> writes:
On 12/05/2026 15:05, Dan Cross wrote:[...]
I think it is fair to say that there are people who wear their
knowledge of the C standard as a badge of honor and look down at
those who desire a simpler language or who do not know the rules
as well. Some of that is fair (we see examples in this group of
some who not only refuse to learn the rules of the language, but
revel in their ignorance).
Take for example C's set of operator precedences.
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
a + b ? c - d : e * f
a ? b ? c : d ? e : f : g
then parentheses would be used to make things clearer. (I haven't
checked these are valid, but that is the point; it is hard to see!)
Some C programmers make it a point to know all the operator
precedences by heart. I don't. I know most of them, but I
occasionally have to look them up. (My method is to look at the
subsection headers in 6.5 "Expressions", and look at the grammar
when I need more detail. Others prefer to use tables.)
But why shouldn't people be expected to learn the rules? Why is it
OK to 'revel' in not knowing the basics here, but not when unnecessary
UBs are involved where rules are harder and which depend on runtime
inputs?
There's nothing wrong with adding parentheses to make an expression
clearer. It doesn't imply an unwillingness to learn the rules,
just consideration for one's audience.
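For what it's worth, both expressions are valid C. A sketch of how they group, with hypothetical function names; the parenthesized versions are meant to be equivalent to the bare ones, since `?:` has lower precedence than the arithmetic operators and associates to the right.

```c
/* First expression as written, relying on precedence alone. */
int first_bare(int a, int b, int c, int d, int e, int f)
{
    return a + b ? c - d : e * f;
}

/* The same expression with the grouping spelled out. */
int first_parenthesized(int a, int b, int c, int d, int e, int f)
{
    return (a + b) ? (c - d) : (e * f);
}

/* Second expression as written. */
int second_bare(int a, int b, int c, int d, int e, int f, int g)
{
    return a ? b ? c : d ? e : f : g;
}

/* The same expression with the nesting spelled out. */
int second_parenthesized(int a, int b, int c, int d, int e, int f, int g)
{
    return a ? (b ? c : (d ? e : f)) : g;
}
```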
On 12/05/2026 02:37, Keith Thompson wrote:<snip>
Bart <bc@freeuk.com> writes:
[...]
Nobody has said that C couldn't have been better. But it could
hardly have been more successful. As Dennis Ritchie himself said,
"C is quirky, flawed, and an enormous success."
Yeah, it's one of the great mysteries. Even half a century ago, there
were big companies and lots of clever people, who could have cranked out
a suitable systems language of equal capability to C in their sleep, but
with fewer rough edges.
On Tue, 12 May 2026 22:32:30 +0100
Bart <bc@freeuk.com> wrote:
Even half a century ago, there were big companies and lots of clever
people, who could have cranked out a suitable systems language of
equal capability to C in their sleep, but with fewer rough edges.
I wonder why they didn't?
I am reminded of Seymour Cray's rebuttal to Tom Watson:
"I understand that in the laboratory developing this system there are
only 34 people, 'including the janitor.' Of these, 14 are engineers and
4 are programmers, and only one has a Ph. D., a relatively junior
programmer. To the outsider, the laboratory appeared to be cost
conscious, hard working and highly motivated.
Contrasting this modest effort with our own vast development
activities, I fail to understand why we have lost our industry
leadership position by letting someone else offer the world's most
powerful computer."
"It seems like Mr. Watson has answered his own question."
Bart <bc@freeuk.com> writes:
On 12/05/2026 02:37, Keith Thompson wrote:
[...]
Yeah, it's one of the great mysteries. Even half a century ago, there
were big companies and lots of clever people, who could have cranked out
a suitable systems language of equal capability to C in their sleep, but
with fewer rough edges.
Those clever people _were_ cranking out suitable systems languages
by the bucketful. PL/1, Algol derivatives, proprietary internal
languages (Burroughs SPRITE and BPL languages), HP-3000 SPL
(Systems Programming Language - I used SPL in the late 70s) and
on the academic side, Modula, Ada, Pascal (yes, it could be
a systems programming language, cf. VAX-11 Pascal).
[...]
[Dropping comp.lang.misc, since this is only about C.]
Bart <bc@freeuk.com> writes:
[...][...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
Not at all. "switch" was originally implemented in a way that,
I suspect, was easier for the compiler to implement (basically
a scoped computed goto), and for an audience of programmers who,
to exaggerate slightly, could shout across the room and ask Dennis
Ritchie questions about it.
I suspect many or most C programmers
would prefer it to have been designed without default fallthrough.
It stays the way it is because changing it *would break existing
code*. Worse, some seemingly reasonable ways of changing it would
mean that existing code is still valid but with different semantics.
Note that the current method for using multiple cases relies on
implicit fallthrough.
Certainly a "better" switch statement could do that differently,
but it's something that would have to be addressed. And since
the existing switch statement *works*, and can be used reasonably
safely if the programmer exercises a reasonable amount of care,
and since compilers can and do warn about questionable uses, it
hasn't been seen as worth fixing. As far as I know, nobody has
submitted a proposal to change it.
Except that C23 adds a "fallthrough" attribute that, while it
doesn't change the semantics of the switch statement, allows a
programmer to tell the compiler that a fallthrough was intentional.
A compiler can choose to warn about an unmarked fallthrough and
remain silent when it sees the "fallthrough" attribute.
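A short sketch of both points: several labels sharing one body through implicit fallthrough, and an intentional fallthrough marked with the C23 attribute. The `FALLTHROUGH` macro and the function names are assumptions made for this example; the macro expands to nothing useful on compilers that do not report C23.

```c
/* Use the C23 attribute where the compiler reports C23, otherwise a no-op. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#  define FALLTHROUGH [[fallthrough]]
#else
#  define FALLTHROUGH ((void)0)
#endif

/* Several case labels share one body; this relies on implicit fallthrough. */
int is_vowel(char c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}

/* Counts how many stages remain, falling through on purpose at each step. */
int stages_remaining(int stage)
{
    int steps = 0;
    switch (stage) {
    case 0:
        steps++;
        FALLTHROUGH;
    case 1:
        steps++;
        FALLTHROUGH;
    case 2:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

The semantics are identical with or without the attribute; it only gives the compiler a way to tell a deliberate fallthrough from a forgotten `break`.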
[...]
On 2026-05-08 06:43, David Brown wrote:
...
Yes, I have heard that argument before. I am unconvinced that the
"value preserving" choice actually has any real advantages. I also
think it is a misnomer - it implies that "unsigned preserving" would
not preserve values, which is wrong.
Unsigned-preserving rules would convert a signed value which might be
negative to unsigned type more frequently than the value-preserving
rules do.
For the early C compiler on the PDP-11, the 'int' type was
16-bits, implicitly signed, and the code generator simply emitted
available arithmetic instructions.
It was the only C compiler at the time, any guarantees would have
been implicit in the choice of target architecture.
I mostly wrote unix kernel code using the v6 compiler, rather than
writing code that did any heavy math, so whether value was
preserved or sign was preserved wasn't something I, as a kernel
programmer, routinely considered.
On Tue, 12 May 2026 07:12:00 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
I not just like stdint.h*
I also hate when C programmers define their own fixed-width
integer types.
[...]
That would raise my blood pressure:
typedef int32_t s32;
typedef uint32_t u32;
typedef uint8_t octet;
Can you say what it is about them that you don't like?
They increase mental load for casual reader of the code.
IMHO, for no good reason.
Or why you don't like them? Are the reasons the same
in all three cases, or is octet different?
The same in all three cases.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
kalevi@kolttonen.fi (Kalevi Kolttonen) writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
In almost all cases where uint8_t
might be used, unsigned char works just as well.
Why "almost"? Where is the difference if any?
As far as I know, ISO guarantees that
sizeof(unsigned char) is always 1 byte.
On at least one system with a working C compiler,
a byte is 9 bits, not 8. If I wanted an 8-bit datum
on that system, I'd have to use uint8_t.
If a byte is 9 bits (ie, if CHAR_BIT == 9) there cannot
be a uint8_t type. The fixed-width types are not allowed
to have padding bits.
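A sketch of how code can check this at compile time, relying on the rule that `UINT8_MAX` is defined in `<stdint.h>` only when `uint8_t` itself is provided. The function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if this implementation provides uint8_t, 0 otherwise. */
int has_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if unsigned char (the C byte) is exactly 8 bits wide. */
int byte_is_octet(void)
{
    return CHAR_BIT == 8;
}
```

On a conforming implementation the two functions should agree: no object type can be narrower than `CHAR_BIT` bits, and an 8-bit `unsigned char` has no padding, so it qualifies as `uint8_t`.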
That was a 36-bit system. It could easily create a
uint8_t value from 1/9th of two 72-bit words;
so no padding bits required.
Indeed. Although from my perspective, the use of the
stdint types clearly documents the programmer's
intent, whereas a typedef such as BYTE or WORD
is inherently ambiguous and would require a programmer
to look up the definition of such types in the
application to determine the original programmer's intent.
BYTE and WORD are poor choices for type names, no doubt
about that. On the other hand, in many or most cases
so are [u]intNN_t; they simultaneously convey both too
little and too much information. There is a certain kind
of programming where the fixed-width types are genuinely
helpful; unfortunately though they are used a lot more
widely than circumstances where they are helpful.
The programming I do
(mainly kernel programming, SoC simulation,
firmware) all naturally require the fixed-width types.
For other apps, int, long, float, double are preferred
to INT, LONG, FLOAT, DOUBLE (which seems to be the
way windows programmers code)[*]
[*] which probably dates back to 16-bit windows
and their methods of maintaining backward compatibility
across two subsequent (32, 64) x86 processor architectures
plus MIPS et alia.
It is not clear to me that `longjmp` out of a non-nested signal
handler is still well-defined as of C11, though it is explicitly
stated to be so in C89.
scott@slp53.sl.home (Scott Lurndal) writes:[...]
The programming I do
(mainly kernel programming, SoC simulation,
firmware) all naturally require the fixed-width types.
Right. Code that interacts very closely with hardware is one of
those cases where the fixed-width types make sense.
[...]
[...]
Take for example C's set of operator precedences.
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
    a + b ? c - d : e * f
    a ? b ? c : d ? e : f : g
then parentheses would be used to make things clearer. (I haven't
checked these are valid, but that is the point; it is hard to see!)
But why shouldn't people be expected to learn the rules?
[...]
[...]
Yes, there are many examples of this, so it is obviously true.
However, I don't think there are many large projects written in
C where there isn't undefined behavior lurking somewhere, and
the amount of effort required to learn _all_ the rules of the
language is unnecessarily large.
I think it is fair to say that there are people who wear their
knowledge of the C standard as a badge of honor and look down at
those who desire a simpler language or who do not know the rules
as well. Some of that is fair (we see examples in this group of
some who not only refuse to learn the rules of the language, but
revel in their ignorance).
But that doesn't mean that all of the criticism is wrong, and
the frequency at which it happens that people run into UB is
also an indictment of the language. Put it this way: it may be
the programmer's fault that they relied on UB, but that it is so
evidently hard to learn and internalize the rules is also the
fault of the langauge. It is not wrong to wish it were better.
[...]
[...]
Perhaps the main "mistake" (where "mistake" means "I personally think C
would be nicer for my own use if things were different") is that when
mixing operations between signed int and unsigned int, the signed int is
converted to unsigned. I suspect that in real-world code, unsigned int
values that are within the range of signed int are common - and that
negative signed int values are more common than unsigned int values that
are out of range of signed int. Any common type here, unless it is
larger than the two original types, is going to get some things wrong -
but I think that converging on signed int as the common type would be
wrong less often. And if that had been the rule, then
unsigned-preserving promotion would be correct too in examples like yours.
[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
BYTE and WORD are poor choices for type names, no doubt
about that.
[...]
WORD is certainly ambiguous (unless, I suppose, it's sufficiently
obvious from the context). But I don't have a problem with BYTE,
or preferably byte, as a type name as long as it really is a byte.
C does have a byte type; it just happens to spell it "unsigned char".
But I don't object to something like
typedef unsigned char byte;
and I've used it myself.
In article <10tvmp7$23t17$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
Would you get more signed overflow in practice? And in particular,
would you get more signed overflow UB in places where you would not have
a bug in the code anyway. There would certainly be more cases of signed
integer arithmetic, whereas moving to a common unsigned type means more
unsigned integer arithmetic. But I don't see signed integer arithmetic
as a risk of UB in itself - it is only a risk UB if you are working with
inappropriate values.
I suspect you would, if only because one of the major motivating
factors for using unsigned arithmetic in practice is to have the
full bit-range of the type available. [...]
cross@spitfire.i.gajendra.net (Dan Cross) writes:[...]
In article <10tvmp7$23t17$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
And it seems to have been written at a time when space characters still
cost real money :-)
Heh. They preserved the density of that style in Plan 9, too.
The code of that era was generally terse. [...]
In article <10tpq7e$a6kp$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
[...]
Apparently, you missed the changes afoot in the committee to do
exactly what everyone has been telling you: deprecate `i[A]` but
preserve `i + A`.
On 2026-05-13 00:35, Keith Thompson wrote:
[Dropping comp.lang.misc, since this is only about C.]
Bart <bc@freeuk.com> writes:
[...][...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed. And
both semantics were needed, they have been used. (Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed.
(Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
Not at all. "switch" was originally implemented in a way that,
I suspect, was easier for the compiler to implement (basically
a scoped computed goto), and for an audience of programmers who,
to exaggerate slightly, could shout across the room and ask Dennis
Ritchie questions about it.
Computed 'goto' was actually considered quite "high standards"
back then, quite some prominent languages provided these. (Hard
to believe given all the options that appeared later.)
I suspect many or most C programmers
would prefer it to have been designed without default fallthrough.
The explicit and clumsy 'break' is what syntactically annoys me,
but it's also no drama, to be clear.
It stays the way it is because changing it *would break existing
code*. Worse, some seemingly reasonable ways of changing it would
mean that existing code is still valid but with different semantics.
Indeed. And that's the crucial point. A simple "dislike"-criticism
without acknowledging the practical side effects is pointless.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
BYTE and WORD are poor choices for type names, no doubt
about that.
[...]
WORD is certainly ambiguous (unless, I suppose, it's sufficiently
obvious from the context). But I don't have a problem with BYTE,
or preferably byte, as a type name as long as it really is a byte.
[...]
BYTE is a poor choice for a type name because it looks like a
macro.
A lower-case version, byte, is a poor choice for a type name,
because it is both confusing and ambiguous.
Confusing, because for a very long time and for a huge segment of
the programming community, the term byte is synonymous with eight
bits, but in C that need not be true.
[...]
Bart <bc@freeuk.com> writes:
[.. I am cutting 100ish lines as they don't bear on my response ..]
Take for example C's set of operator precedences.
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
a + b ? c - d : e * f
a ? b ? c : d ? e : f : g
then parentheses would be used to make things clearer. (I haven't
checked these are valid, but that is the point; it is hard to see!)
But shouldn't people be expected to learn the rules? Why is it
OK to 'revel' in not knowing the basics here, but not when unnecessary
UBs are involved where rules are harder and which depend on runtime
inputs?
If you want people to take you seriously, you need to find more
compelling examples. I am both familiar with and comfortable with
the syntax of C expressions, and even I would never write such
expressions as the two shown above.
These lines look like they
were written by someone in junior high school (or these days,
probably elementary school).
Whether you mean to or not, this
example gives the impression of offering a strawman argument, and
it's only natural for people to react to that by dismissing your
comments, or even dismissing them altogether. Is that what you
want? To be dismissed? Or do you hope to actually communicate
with people? If so I recommend looking for a better framing of
your views and ideas.
On 2026-05-13 13:07, Tim Rentsch wrote:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
BYTE and WORD are poor choices for type names, no doubt
about that.
[...]
WORD is certainly ambiguous (unless, I suppose, it's sufficiently
obvious from the context). But I don't have a problem with BYTE,
or preferably byte, as a type name as long as it really is a byte.
[...]
BYTE is a poor choice for a type name because it looks like a
macro.
A lower-case version, byte, is a poor choice for a type name,
because it is both confusing and ambiguous.
Confusing, because for a very long time and for a huge segment of
the programming community, the term byte is synonymous with eight
bits, but in C that need not be true.
Actually, it was more an issue in the "intermediate epoch", when
terminology spread to the non-expert home-users who considered
a byte to be 8 bit on their typical PC systems while not knowing
anything from the professional IT world before (with 6, 7, 9 bit
entities). Nowadays I'd consider it less an issue since these
systems seem to have (mostly?) vanished. There was a reason why
the standards back then introduced and used the term "octet" for
the common 8-bit entities, to avoid ambiguity and misunderstanding.
What's technically defined for the "C" language in the respective
standard documents is an own thing, not necessarily equivalent to
the respective application semantics expressed by some C-program,
although I'd always prefer "octet" for that (and avoid "byte").
Janis
On 2026-05-12 16:33, Bart wrote:
[...]
Take for example C's set of operator precedences.
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
    a + b ? c - d : e * f
    a ? b ? c : d ? e : f : g
then parentheses would be used to make things clearer. (I haven't
checked these are valid, but that is the point; it is hard to see!)
What has that example to do with ["obscure"] _operator precedence_?
Ternary conditionals are actually expressions that are sensibly
defined in "C" (i.e. concerning their precedence ranking).
     a + b ? c - d
           : e * f
     a ?
         b ? c
           : d ? e
               : f
       : g
For complex expressions you can, as a *responsible* programmer, use
various means to not (not deliberately) write obfuscated expressions;
you can indent code, use parentheses[*], or you can decompose it to
(semantic or technical) identified sub-units.
[*] Parentheses would IMO make your layout in your example above not
in any way better,
just yet more overloaded. (So forcing parenthesis
[in "your language"] is certainly addressing the wrong problem here.)
Your complaint, as so often, fails to work on so many levels. It
tells, yet again, more about you than about the "C" language.
But shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
Janis
PS: There *is* a specific issue in C's operator precedence ranking
but it's not the ternary conditional.
On 13/05/2026 03:48, Tim Rentsch wrote:
Bart <bc@freeuk.com> writes:
[.. I am cutting 100ish lines as they don't bear on my response ..]
Take for example C's set of operator precedences.
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
a + b ? c - d : e * f
a ? b ? c : d ? e : f : g
then parentheses would be used to make things clearer. (I haven't
checked these are valid, but that is the point; it is hard to see!)
But shouldn't people be expected to learn the rules? Why is it
OK to 'revel' in not knowing the basics here, but not when unnecessary
UBs are involved where rules are harder and which depend on runtime
inputs?
If you want people to take you seriously, you need to find more
compelling examples. I am both familiar with and comfortable with
the syntax of C expressions, and even I would never write such
expressions as the two shown above.
No? I actually had your posted examples in mind. I can't remember you
using parentheses. I can remember you not being sympathetic to readers
of your code and expected them to be as familiar with precedence as
you are.
These lines look like they
were written by someone in junior high school (or these days,
probably elementary school).
The lines are not meant to mean anything, just sequences of terms and operators. You can think of them as exercises where you add
parentheses to make them unambiguous.
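As a worked version of that exercise, here is a sketch that pairs each of the two expressions with a fully parenthesized form following the C grammar for ?: (the middle operand is a full expression, and the operator groups right to left). The function names are invented for the example.

```c
/* First expression, as written and with explicit grouping. */
int plain1(int a, int b, int c, int d, int e, int f)
{ return a + b ? c - d : e * f; }

int paren1(int a, int b, int c, int d, int e, int f)
{ return (a + b) ? (c - d) : (e * f); }

/* Second expression, as written and with explicit grouping. */
int plain2(int a, int b, int c, int d, int e, int f, int g)
{ return a ? b ? c : d ? e : f : g; }

int paren2(int a, int b, int c, int d, int e, int f, int g)
{ return a ? (b ? c : (d ? e : f)) : g; }

/* Count inputs on which the plain and parenthesized forms disagree. */
int mismatches(void)
{
    int n = 0;
    for (int a = -1; a <= 1; a++)
        for (int b = -1; b <= 1; b++)
            for (int d = -1; d <= 1; d++) {
                n += plain1(a, b, 10, d, 4, 5) != paren1(a, b, 10, d, 4, 5);
                n += plain2(a, b, 1, d, 2, 3, 4) != paren2(a, b, 1, d, 2, 3, 4);
            }
    return n;
}
```

mismatches() returns 0, since the compiler parses both spellings the same way; the parentheses only help the reader.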
A bit like adding punctuation here:
"that that is is that that is not is not is that it it is"
Whether you mean to or not, this
example gives the impression of offering a strawman argument, and
it's only natural for people to react to that by dismissing your
comments, or even dismissing them altogether. Is that what you
want? To be dismissed? Or do you hope to actually communicate
with people? If so I recommend looking for a better framing of
your views and ideas.
Now this is getting silly. Can no one here engage in a civil
discussion without reducing to insults and casting aspersions?
On 13/05/2026 02:26, Janis Papanagnou wrote:[ code snipped ]
On 2026-05-13 00:35, Keith Thompson wrote:
[Dropping comp.lang.misc, since this is only about C.]
Bart <bc@freeuk.com> writes:
[...][...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed. And
both semantics were needed, they have been used. (Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
Well, no other language (save C++) implements switch like C does.
I'm not sure you appreciate how bizarre it actually is. Here is a piece
of code from a 'Sieve' benchmark:
This is perfectly valid C code, if meaningless.
[...]
BTW switch fallthrough is necessary so that you can do this:
    switch (a) {
    case 'A': case 'B': case 'C': .... // deal with A/B/C
Without fall-through behavior, it would exit after that case 'A': label. This is how crude it is.
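A complete, compilable version of that fragment might look like the sketch below. The function name and return values are assumptions made for the example, not part of the original post.

```c
/* Returns 1 for 'A', 'B' or 'C', 2 for 'D', and 0 for anything else. */
int classify(int ch)
{
    switch (ch) {
    case 'A': case 'B': case 'C':   /* three labels on one statement */
        return 1;
    case 'D':
        return 2;
    default:
        return 0;
    }
}
```

The labels for 'A' and 'B' have no statements of their own, so control simply continues into the code after the 'C' label.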
Remember my saying people defend its misfeatures to the death? Your post
is a perfect example!
[...]
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed.
Ha, ha, ha! This is exactly my point. 99% of the time, at least, you
want very simple, boring semantics and properly structured syntax, just
as I offer in my languages and others do in their switch/match statements.
[...]
(Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
You think my view is limited, and genuinely think C switch is superior
to how it works in other languages?
[...]
Then this is definitely a wind-up.
Not at all. "switch" was originally implemented in a way that,
I suspect, was easier for the compiler to implement (basically
a scoped computed goto), and for an audience of programmers who,
to exaggerate slightly, could shout across the room and ask Dennis
Ritchie questions about it.
Computed 'goto' was actually considered quite "high standards"
Switch is not really computed goto.
[...]I acknowledge if you say that it is a drama *for you*. (And I'm not
[...]
The explicit and clumsy 'break' is what syntactically annoys me,
but it's also no drama, to be clear.
I disagree, it /IS/ a drama where you have to keep remembering to write it.
It stays the way it is because changing it *would break existing
code*. Worse, some seemingly reasonable ways of changing it would
mean that existing code is still valid but with different semantics.
Indeed. And that's the crucial point. A simple "dislike"-criticism
without acknowledging the practical side effects is pointless.
I understand the problems of changing it in the 21st century rather than much earlier on.
People could simply agree with me that it is a terrible
language feature.
[...]
Bart <bc@freeuk.com> writes:
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
    a + b ? c - d : e * f
    a ? b ? c : d ? e : f : g
[...]
The lines are not meant to mean anything, just sequences of terms and operators. You can think of them as exercises where you add parentheses
to make them unambiguous.
[...]
On 2026-05-13 01:21, Scott Lurndal wrote:
Bart <bc@freeuk.com> writes:
On 12/05/2026 02:37, Keith Thompson wrote:
[...]
Yeah, it's one of the great mysteries. Even half a century ago, there
were big companies and lots of clever people, who could have cranked out
a suitable systems language of equal capability to C in their sleep, but
with fewer rough edges.
Those clever people _were_ cranking out suitable systems languages
by the bucketful. PL/1, Algol derivatives, proprietary internal
languages (Burroughs SPRITE and BPL languages), HP-3000 SPL (Systems
Programming Language - I used SPL in the late 70s) and
on the academic side, modula, ADA, Pascal (yes, it could be
a systems programming language, c.f. VAX-11 Pascal).
[...]
I wonder about why you put Ada just in the "academic box".
On 2026-05-12 16:33, Bart wrote:
[snip]
But shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
scott@slp53.sl.home (Scott Lurndal) writes:
For the early C compiler on the PDP-11, the 'int' type was
16-bits, implicitly signed, and the code generator simply emitted
available arithmetic instructions.
It was the only C compiler at the time, any guarantees would have
been implicit in the choice of target architecture.
I mostly wrote unix kernel code using the v6 compiler, rather than
writing code that did any heavy math, so whether value was
preserved or sign was preserved wasn't something I, as a kernel
programmer, routinely considered.
If int was only 16 bits, I expect promotion considerations didn't
come up very often.
On 2026-05-12 20:09, Dan Cross wrote:
In article <10tvmp7$23t17$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
Would you get more signed overflow in practice? And in particular,
would you get more signed overflow UB in places where you would not have
a bug in the code anyway. There would certainly be more cases of signed
integer arithmetic, whereas moving to a common unsigned type means more
unsigned integer arithmetic. But I don't see signed integer arithmetic
as a risk of UB in itself - it is only a risk of UB if you are working with
inappropriate values.
I suspect you would, if only because one of the major motivating
factors for using unsigned arithmetic in practice is to have the
full bit-range of the type available. [...]
Hmm.. - I'm using 'unsigned' typically to express the domain of the
application values (not to "wrest" some more values out of a type).
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
On 2026-05-08 06:43, David Brown wrote:
...
Yes, I have heard that argument before. I am unconvinced that the
"value preserving" choice actually has any real advantages. I also
think it is a misnomer - it implies that "unsigned preserving" would
not preserve values, which is wrong.
Unsigned-preserving rules would convert a signed value which might be
negative to unsigned type more frequently than the value preserving
rules do.
This statement is wrong.
An "unsigned preserving" promotion rule
converts a signed value to a signed value and an unsigned value to
an unsigned value. The value being converted stays the same in both
cases. Both an "unsigned preserving" promotion and a so-called
"value preserving" promotion preserve the value of the operand being
promoted (and converted).
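To make the distinction concrete, here is a sketch that inspects the type an unsigned short operand ends up with after promotion. It relies on C11 _Generic; the macro and function names are invented for the example.

```c
#include <limits.h>

/* 1 if the expression has type int, 2 if unsigned int, 0 otherwise. */
#define TYPE_CODE(x) _Generic((x), int: 1, unsigned int: 2, default: 0)

/* Type code of an unsigned short operand after integer promotion. */
int promoted_code(void)
{
    unsigned short us = 1;
    return TYPE_CODE(us + 0);
}

/* 1 if a - b compares as negative. Under the standard's value-preserving
   rule both operands promote to int when int can hold every unsigned
   short value, so 1 - 2 is -1. */
int diff_is_negative(unsigned short a, unsigned short b)
{
    return a - b < 0;
}
```

On an implementation where USHRT_MAX <= INT_MAX, promoted_code() is 1 and diff_is_negative(1, 2) is 1. Under an unsigned-preserving rule the promoted operand would have the same value but type unsigned int, so the subtraction would wrap instead of going negative. The promoted value is identical either way; what differs is the type the following arithmetic is done in.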
[snip]
Bart <bc@freeuk.com> writes:
[...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed. And
both semantics were needed, they have been used. (Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
On 2026-05-13 00:35, Keith Thompson wrote:
Except that C23 adds a "fallthrough" attribute that, while it
doesn't change the semantics of the switch statement, allows a
programmer to tell the compiler that a fallthrough was intentional.
A compiler can choose to warn about an unmarked fallthrough and
remain silent when it sees the "fallthrough" attribute.
We considered it generally good style to write /* fall-through */
at such places in our software as an explicit visible hint (and
that is even more "bulky" than the explicit 'break'). I'm thus not
astonished about these new features.
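A sketch of how the attribute and the older comment convention can coexist in one source file. The FALLTHROUGH macro and the function name are assumptions for illustration; the version test simply checks for a compiler in C23 mode.

```c
/* Use the C23 attribute when compiling as C23; otherwise expand to an
   empty expression and rely on the traditional comment for readers. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#  define FALLTHROUGH [[fallthrough]]
#else
#  define FALLTHROUGH ((void)0)   /* fall-through */
#endif

/* Counts how many of the numbered steps run when starting at step n:
   starting at 3 runs steps 3, 2 and 1. */
int steps_from(int n)
{
    int count = 0;
    switch (n) {
    case 3:
        count++;
        FALLTHROUGH;
    case 2:
        count++;
        FALLTHROUGH;
    case 1:
        count++;
        break;
    default:
        break;
    }
    return count;
}
```

The attribute changes nothing about what the switch does; it only documents intent in a form a compiler can check.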
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <10tpq7e$a6kp$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
[...]
Apparently, you missed the changes afoot in the committee to do
exactly what everyone has been telling you: deprecate `i[A]` but
preserve `i + A`.
Not deprecate but deem it obsolescent. A very different thing.
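For readers who have not met the construct under discussion: in current C, subscripting is defined in terms of pointer addition, so all of the spellings below denote the same element. A minimal sketch, with a made-up function name:

```c
/* Returns 1 if all four spellings select the same array element. */
int all_spellings_agree(void)
{
    int A[4] = {10, 20, 30, 40};
    int i = 2;
    return A[i] == 30
        && i[A] == 30          /* the form the proposal targets */
        && *(A + i) == 30
        && *(i + A) == 30;     /* the form that would be preserved */
}
```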
In article <10u1j2h$1l93l$31@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-12 16:33, Bart wrote:
[snip]
But shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
Programmers _should_ absolutely learn the rules. But in C,
there are many of them, and some of them are deceptively subtle.
_A_ rule that programmers can remember quite easily, however,
is that parentheses generally carry very high precedence, and
so when in doubt, wrapping something in parens can aid
understanding (for the programmer and the maintainer). The key
is to find balance between extreme terseness and extreme
verbosity, both of which can feel obfuscating.
There was a time when I knew and had memorized the precedence of
all operators in C. I remember most, but have forgotten some
that I use less frequently; I suspect many programmers are in
the same (or a similar) situation. If I am writing code and can
not immediately remember the precedence of some operator in some
expression, I apply parentheses.
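One pairing where that habit pays off is a bitwise operator next to an equality test, since == binds tighter than & in C. This is a sketch with hypothetical names, not an example taken from the thread.

```c
#define FLAG 0x4u

/* Parsed as x & (FLAG == 0), i.e. x & 0, so this is always 0. */
int flag_clear_unparenthesized(unsigned x)
{
    return x & FLAG == 0;
}

/* Tests what was presumably intended: is the FLAG bit clear in x? */
int flag_clear_parenthesized(unsigned x)
{
    return (x & FLAG) == 0;
}
```

Many compilers warn about the first form when warnings are enabled, which is another reference to lean on when memory fails.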
On Sun, 10 May 2026 20:30:24 -0400...
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
On 2026-05-10 20:10, Keith Thompson wrote:
...I like stdint.h types.
Me too.
And all those *_fast and *_least types... Not that I hate them, but
it certainly shows a lack of taste.
On 2026-05-13 05:47, Tim Rentsch wrote:
scott@slp53.sl.home (Scott Lurndal) writes:[...]
The programming I do
(mainly kernel programming, SoC simulation,
firmware) all naturally require the fixed-width types.
Right. Code that interacts very closely with hardware is one of
those cases where the fixed-width types make sense.
Another common one - also "low-level" but different - are data types
exchanged through communication protocols.
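As a sketch of the protocol case, here is one way to put a 32-bit field on the wire in big-endian order regardless of host byte order. The function names are invented, and the code assumes an implementation that provides uint8_t and uint32_t.

```c
#include <stdint.h>

/* Store v into buf in big-endian (network) byte order. */
void put_u32_be(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Read a big-endian 32-bit value back out of buf. */
uint32_t get_u32_be(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Returns 1 if v survives an encode/decode round trip. */
int roundtrip_ok(uint32_t v)
{
    uint8_t buf[4];
    put_u32_be(buf, v);
    return get_u32_be(buf) == v;
}
```

The casts to uint32_t before shifting matter: without them each byte would promote to int, and shifting a byte with its top bit set left by 24 could overflow a 32-bit int.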
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
It is not clear to me that `longjmp` out of a non-nested signal
handler is still well-defined as of C11, though it is explicitly
stated to be in C89.
It seems you are misunderstanding what the standards are saying.
The description of longjmp() says (paraphrasing) that it restores
the environment where the relevant setjmp() was done.
There is
in C89 a passage about returning from signal handlers and so
forth, but that is followed by a carveout for nested signal
handlers, which in C89 is undefined behavior. (I assume that
also holds for C90 but I haven't verified that.)
Starting in C99, any mention of interrupts and signal handlers was
removed, along with the carveout.
Because there is a definition
for what longjmp() does, the behavior is defined, and there is no
undefined behavior (not counting things like doing a longjmp()
with a jmp_buf that wasn't set up, etc). Removing the mention of
interrupts and signals, and also removing the carveout, only makes
longjmp() more defined, not less.
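For reference, the basic mechanism being argued about looks like the sketch below. It deliberately does not involve a signal handler, so it says nothing about the disputed case; the names are hypothetical. Note that setjmp() is used directly as the controlling expression of the if, which is one of the contexts the standard permits.

```c
#include <setjmp.h>

static jmp_buf env;
static volatile int last_code;   /* carries the code across the jump */

/* Record a code and jump back to wherever setjmp(env) was called. */
static void fail_deep(int code)
{
    last_code = code;
    longjmp(env, 1);   /* does not return */
}

/* Returns 0 on the normal path, or the code recorded by fail_deep(). */
int run(int code)
{
    if (setjmp(env) != 0)
        return last_code;   /* reached via longjmp */
    if (code != 0)
        fail_deep(code);
    return 0;
}
```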
On 13/05/2026 16:31, Dan Cross wrote:
In article <10u1j2h$1l93l$31@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-12 16:33, Bart wrote:
[snip]
But shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
Programmers _should_ absolutely learn the rules. But in C,
there are many of them, and some of them are deceptively subtle.
_A_ rule that programmers can remember quite easily, however,
is that parentheses generally carry very high precedence, and
so when in doubt, wrapping something in parens can aid
understanding (for the programmer and the maintainer). The key
is to find balance between extreme terseness and extreme
verbosity, both of which can feel obfuscating.
There was a time when I knew and had memorized the precedence of
all operators in C. I remember most, but have forgotten some
that I use less frequently; I suspect many programmers are in
the same (or a similar) situation. If I am writing code and can
not immediately remember the precedence of some operator in some
expression, I apply parentheses.
I don't think it is necessary to /learn/ all the rules of a language -
but it is necessary to be aware of them, and to know how well you know
them. It's fine not to be sure of all the precedence rules in a
language (and some languages have many more operators than C, or
stranger precedence rules). You only need to know the ones you rely on
regularly, and the ones you have to read regularly. If you occasionally
come across something different, then you can look it up. There's no
point in filling your head with knowledge that you almost never need.
So there is usually no need to know the precedence rules for mixing
relational operators, shift operators and bitwise and/or operators, or
whatever, if you put parentheses in your own code or split the complex
expression into multiple variables. (With the caveat that you mentioned
earlier that both too few and too many parentheses make code harder to
understand.)
But you might have to understand code written which relies on more of
the details - you need to be aware of what you know, and what you have
to look up, in order to understand the code. The risk comes not from
ignorance of the precedence rules, but from thinking you know them when
you have misremembered them. Self-awareness of your own knowledge,
along with convenient and reliable references, is vital.
In article <86cxyzlfc7.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <10tpq7e$a6kp$3@dont-email.me>, Bart <bc@freeuk.com> wrote:
[...]
Apparently, you missed the changes afoot in the committee to do
exactly what everyone has been telling you: deprecate `i[A]` but
preserve `i + A`.
Not deprecate but deem it obsolescent. A very different thing.
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3517.htm is
the original issue. It links to n3380, available online at
https://open-std.org/JTC1/SC22/WG14/www/docs/n3380.htm.
Note that `n3380` is dated 202410 and is accompanied by this
comment: "Do not remove index[array], yet, but deprecate it."
Note also the poll and results from the Minneapolis meeting:
"https://open-std.org/JTC1/SC22/WG14/www/docs/n3380.htm" (10
voted yes, 1 no, 8 abstain; result is "direction").
This is the sense in which I used that word, not the sense in
the standard.
On 2026-05-08 18:23, Tim Rentsch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
NB: As opposed to other people I've never considered the K&R bible
a good book;
K&R is meant to be an introduction to C, written in an informal and
sometimes tutorial style. [...]
Acknowledged.
[...]
for every other statement it gave it created one more question
that just wasn't answered, so I remember. [...]
This is actually very unlike other languages I learned, where
simple coding situations either work as documented or lead to an
error.
C was typical for its era; that's just how languages were in those
days. In any case this complaint is about the language, not about
the book.
It was a comment about the books available to learn a language.
(Of course K&R was sufficient to sit down and hack up programs.
But to repeat the main point: "for every other statement it gave
it created one more question that just wasn't answered". YMMV.)
[.. the language chosen for comparison is Simula 67] There's good
tutorials (like "SIMULA BEGIN"), [...]. [Other Simula documents
were mentioned but they aren't relevant to my comments which were
only about The C Programming Language, by Kernighan and Ritchie]
On 2026-05-09 00:43, Dan Cross wrote:
[...]
Sure. This was a bit of a contrived example, but you ask a good
question: how often might one want write code like that?
In short, I don't know, but I can think of any number of hash
functions, checksums, etc, that may be implemented using 16-bit
arithmetic, and I can well see programmers wanting to take
advantage of the modular semantics afforded by using unsigned
types to do so. Every day? Probably not. But often enough.
I mentioned it before but it may have got lost in the large amount of text
typically exchanged here; for hash functions a modulus based on
powers of two has *bad* _distribution properties_, so it's not
a sensible example or plausible rationale to vindicate modular
arithmetic for the few special cases (m=8, 16, 32, 64, etc.).
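One common way the distribution problem shows up is when keys share a power-of-two factor with the modulus. The sketch below illustrates that particular situation only and may not be exactly the property meant above; the function name is invented, and the modulus must not exceed 64.

```c
/* Count the distinct residues hit by the keys 0, step, 2*step, ...
   (n keys in total) when reduced modulo `modulus` (at most 64). */
int buckets_used(unsigned step, unsigned n, unsigned modulus)
{
    unsigned char seen[64] = {0};
    int used = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned b = (i * step) % modulus;
        if (!seen[b]) {
            seen[b] = 1;
            used++;
        }
    }
    return used;
}
```

With 100 keys that are multiples of 8, a modulus of 16 reaches only 2 residues, while a modulus of 17 reaches all 17.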
In article <864ikdp9lk.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <868q9ppg4o.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
[discussing the notion of "safe" programs]
As I wrote, safety is about ability to avoid or detect errors.
In the functional programming community the usual statement is
"Well-typed programs cannot go wrong."
This is only concerning _type safety_.
I didn't mean to imply anything different.
Looking at what you wrote:
|I think a good way of understanding this is that, if
|a program stays inside the safe limits of the language,
|the program can produce wrong answers, but it cannot
|produce meaningless answers.
You are wrong.
A well-typed program _can_ produce meaningless answers; those
answers will have a well-defined type, but it is impossible to
say whether the value produced has any meaning with respect to
the program's intended purpose. Moreover, the "safe limits of
the language", whatever those may be, have nothing to do with
it.
There is another difference worth noting. A byte is a unit of
storage, whereas octet is a measure of information. The word
byte is inherently about memory; the word octet is inherently
about value (eight bits of information). For this reason too
the name 'octet' is a better choice for a type name than 'byte'.
In article <8633zwm5h9.fsf@linuxsc.com>,[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
If int was only 16 bits, I expect promotion considerations didn't
come up very often.
Presumably they came up all the time; `char` was used as a small
integer frequently. But there was no `unsigned` type so
whether it was promoted to an `int` or `unsigned int` was moot.
In article <86lddnlvtr.fsf@linuxsc.com>,[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Starting in C99, any mention of interrupts and signal handlers was
removed, along with the carveout.
This is wrong. Section 7.14 of C23 talks about signals and
signal handlers at length.
I never mentioned "interrupts" at all (traditionally, Unix
signals, which formed the basis for C signals, are not
interrupts in the conventional sense. Modern systems will
sometimes make use of interprocessor-interrupts to hasten their
delivery, however).
I think you are talking about _only_ the description of
`longjmp`. I am actually talking about the standard considered
in total. I only mentioned "non-nested" signal handler because
C90 was explicit in saying that `longjmp` from a _nested_
signal handler was UB.
On 13/05/2026 02:26, Janis Papanagnou wrote:
On 2026-05-13 00:35, Keith Thompson wrote:
[Dropping comp.lang.misc, since this is only about C.]
Bart <bc@freeuk.com> writes:
[...][...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed. And
both semantics were needed, they have been used. (Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
Well, no other language (save C++) implements switch like C does.
I'm not sure you appreciate how bizarre it actually is.
Here is a piece of code from a 'Sieve' benchmark:[snip]
Now, let's wrap 'switch' around it:
This is perfectly valid C code, if meaningless.
The original code is 4 nested statements, but the switch's 'case'
labels can go literally anywhere within that structure. Even 'default'
can go anywhere and be mixed up with the other cases.
Further, if you wanted to apply 'break' to one of those case-blocks,
it wouldn't work as it would pertain to one of those nested loops.
I made this point before but it was brushed off. The C authors
couldn't think of an alternate keyword so there remains this conflict.
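The 'break' conflict can be shown with a much smaller case than the snipped benchmark. This is a sketch with made-up names, not the original code.

```c
/* In mode 1, sums v[0..n-1] up to the first negative element and then
   adds 1000; in any other mode returns -1. */
int sum_until_negative(int mode, const int *v, int n)
{
    int sum = 0;
    switch (mode) {
    case 1:
        for (int i = 0; i < n; i++) {
            if (v[i] < 0)
                break;       /* leaves the for loop, not the switch */
            sum += v[i];
        }
        sum += 1000;         /* still runs: control is still in case 1 */
        break;               /* this one leaves the switch */
    default:
        sum = -1;
        break;
    }
    return sum;
}

/* Fixed input for a quick check: 1 + 2, stop at -1, then + 1000. */
int demo_sum(void)
{
    const int v[4] = {1, 2, -1, 5};
    return sum_until_negative(1, v, 4);
}
```

demo_sum() returns 1003: the inner break ended the loop only, so getting out of the case block from inside the loop needs a flag, a goto, or a return.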
BTW switch fallthrough is necessary so that you can do this:
switch (a) {
case 'A': case 'B': case 'C': .... // deal with A/B/C
Without fall-through behavior, it would exit after that case 'A':
label. This is how crude it is.
Remember my saying people defend its misfeatures to the death? Your
post is a perfect example!
Ha, ha, ha! This is exactly my point. 99% of the time, at least, you
want very simple, boring semantics and properly structured syntax,
just as I offer in my languages and others do in their switch/match statements.
The explicit and clumsy 'break' is what syntactically annoys me,
but it's also no drama, to be clear.
I disagree, it /IS/ a drama where you have to keep remembering to write it.
It stays the way it is because changing it *would break existing
code*. Worse, some seemingly reasonable ways of changing it would
mean that existing code is still valid but with different semantics.
Indeed. And that's the crucial point. A simple "dislike"-criticism
without acknowledging the practical side effects is pointless.
I understand the problems of changing it in the 21st century rather
than much earlier on. People could simply agree with me that it is a
terrible language feature.
It would also have been perfectly possible to leave 'switch' alone and instead introduce a new kind of statement.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:[...]
I wonder about why you put Ada just in the "academic box".
Because the first ADA compiler I used came from NYU. :-)
In article <10u0k0k$1l93l$30@dont-email.me>,[...]
It's easy to get wrong. Other languages accommodate both
semantics using alternation in the selector arm. For example,
one might imagine an hypothetical syntax, something like:
switch (a) {
case 1 || 2 || 3 || 4: whatever();
default: other();
}
...with no `break` to end each `case`.
You couldn't use it to build Duff's Device, but I'm not sure
that even Duff would call that a loss.
On 2026-05-13 05:47, Tim Rentsch wrote:
scott@slp53.sl.home (Scott Lurndal) writes:[...]
The programming I do
(mainly kernel programming, SoC simulation,
firmware) all naturally require the fixed-width types.
Right. Code that interacts very closely with hardware is one of
those cases where the fixed-width types make sense.
Another common one - also "low-level" but different - are data types exchanged through communication protocols.
In article <10u0k0k$1l93l$30@dont-email.me>,[...]
It's easy to get wrong. Other languages accommodate both
semantics using alternation in the selector arm. For example,
one might imagine an hypothetical syntax, something like:
switch (a) {
case 1 || 2 || 3 || 4: whatever();
default: other();
}
...with no `break` to end each `case`.
That's already valid syntax.
If C's switch statement were to be
changed, it would have to use something that's currently a syntax
error. Perhaps something like
case 1, case 2, case 3, case 4: whatever();
Oh, I know, we could reuse the "static" keyword!
You couldn't use it to build Duff's Device, but I'm not sure
that even Duff would call that a loss.
A "better" switch statement might have an explicit fallthrough
construct. (bash's "case" statement has this, more or less.)
Or you could use goto (yeah, I know).
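The remark that `case 1 || 2 || 3 || 4:` is already valid syntax can be checked directly. A minimal sketch follows; the function name `classify` is hypothetical. In current C the label expression is an ordinary integer constant expression whose value is 1, so it matches only 1.

```c
/* In today's C, 1 || 2 || 3 || 4 is an integer constant expression
 * with value 1, so the label below is simply "case 1:". */
int classify(int a)
{
    switch (a) {
    case 1 || 2 || 3 || 4:
        return 1;   /* reached only when a == 1 */
    default:
        return 0;   /* 2, 3 and 4 all land here */
    }
}
```

This is why any new alternation syntax would have to be something that is a syntax error today.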
scott@slp53.sl.home (Scott Lurndal) writes:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:[...]
I wonder about why you put Ada just in the "academic box".
Because the first ADA compiler I used came from NYU. :-)
It's Ada, not ADA. It's a person's name, not an acronym.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <864ikdp9lk.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <868q9ppg4o.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
antispam@fricas.org (Waldek Hebisch) writes:
[discussing the notion of "safe" programs]
As I wrote, safety is about ability to avoid or detect errors.
In the functional programming community the usual statement is
"Well-typed programs cannot go wrong."
This is only concerning _type safety_.
I didn't mean to imply anything different.
Looking at what you wrote:
|I think a good way of understanding this is that, if
|a program stays inside the safe limits of the language,
|the program can produce wrong answers, but it cannot
|produce meaningless answers.
You are wrong.
A well-typed program _can_ produce meaningless answers; those
answers will have a well-defined type, but it is impossible to
say whether the value produced has any meaning with respect to
the program's intended purpose. Moreover, the "safe limits of
the language", whatever those may be, have nothing to do with
it.
What you mean by meaningless isn't what I meant by meaningless.
In article <86lddnlvtr.fsf@linuxsc.com>,[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Starting in C99, any mention of interrupts and signal handlers was
removed, along with the carveout.
This is wrong. Section 7.14 of C23 talks about signals and
signal handlers at length.
Obviously, but that's clearly not what Tim meant.
His statement
was not wrong in context. (7.14 describes <signal.h>. It's not
plausible that Tim would think that had been removed.)
I never mentioned "interrupts" at all (traditionally, Unix
signals, which formed the basis for C signals, are not
interrupts in the conventional sense. Modern systems will
sometimes make use of interprocessor-interrupts to hasten their
delivery, however).
I think you are talking about _only_ the description of
`longjmp`. I am actually talking about the standard considered
in total. I only mentioned "non-nested" signal handler because
C90 was explicit in saying that that `longjmp` from a _nested_
signal handler was UB.
Yes, Tim was clearly talking only about the description of longjmp.
His statement wasn't wrong, just restricted to a certain context.
C90's description of longjmp includes a paragraph about interrupts
and signals. C99 removed that paragraph.
In article <8633zwm5h9.fsf@linuxsc.com>,[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
If int was only 16 bits, I expect promotion considerations didn't
come up very often.
Presumably they came up all the time; `char` was frequently used as
a small integer. But there was no `unsigned` type, so whether
it was promoted to an `int` or `unsigned int` was moot.
Very early C didn't have unsigned int, but the signedness of char was
effectively implementation-defined. From the 1975 C Reference Manual:
A char object may be used anywhere an int may be. In all
cases the char is converted to an int by propagating its sign
through the upper 8 bits of the resultant integer. This is
consistent with the two's complement representation used for
both characters and integers. (However, the sign-propagation
feature disappears in other implementations.)
In modern terms, the "other implementations" made plain char unsigned.
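The sign propagation described in that passage can still be observed in modern C. The sketch below is only illustrative (the `widen_*` helper names are invented here): it converts each flavour of char to int and reports which way plain char goes on the implementation at hand.

```c
#include <limits.h>

/* Widen each flavour of char to int. For signed char the sign is
 * propagated; for unsigned char the value is zero-extended. Plain
 * char behaves like one or the other, depending on the implementation. */
int widen_signed(signed char c)     { return c; }
int widen_unsigned(unsigned char c) { return c; }
int widen_plain(char c)             { return c; }

/* Nonzero when plain char is signed on this implementation. */
int plain_char_is_signed(void)      { return CHAR_MIN < 0; }
```

On an implementation where `plain_char_is_signed()` returns 0, `widen_plain` gives the "other implementations" behaviour the manual mentions.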
In article <10u2jpk$2t96p$6@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <10u0k0k$1l93l$30@dont-email.me>,[...]
It's easy to get wrong. Other languages accommodate both
semantics using alternation in the selector arm. For example,
one might imagine an hypothetical syntax, something like:
switch (a) {
case 1 || 2 || 3 || 4: whatever();
default: other();
}
...with no `break` to end each `case`.
That's already valid syntax.
It wasn't meant to be taken as a serious suggestion!
If C's switch statement were to be
changed, it would have to use something that's currently a syntax
error. Perhaps something like
case 1, case 2, case 3, case 4: whatever();
Sure, that's better.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <86lddnlvtr.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
Starting in C99, any mention of interrupts and signal handlers was
removed, along with the carveout.
This is wrong. Section 7.14 of C23 talks about signals and
signal handlers at length.
Obviously, but that's clearly not what Tim meant. His statement
was not wrong in context. (7.14 describes <signal.h>. It's not
plausible that Tim would think that had been removed.)
I never mentioned "interrupts" at all (traditionally, Unix
signals, which formed the basis for C signals, are not
interrupts in the conventional sense. Modern systems will
sometimes make use of interprocessor-interrupts to hasten their
delivery, however).
I think you are talking about _only_ the description of
`longjmp`. I am actually talking about the standard considered
in total. I only mentioned "non-nested" signal handler because
C90 was explicit in saying that that `longjmp` from a _nested_
signal handler was UB.
Yes, Tim was clearly talking only about the description of longjmp.
His statement wasn't wrong, just restricted to a certain context.
C90's description of longjmp includes a paragraph about interrupts
and signals. C99 removed that paragraph.
In article <86lddnlvtr.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
It is not clear to me that `longjmp` out of a non-nested signal
handler is still well-defined as of C11, though it is explicitly
stated to be so in C89.
It seems you are misunderstanding what the standards are saying.
You read my post with insufficient care, and failed to
understand what I wrote, and are responding to something I did
not say.
The description of longjmp() says (paraphrasing) that it restores
the environment where the relevant setjmp() was done.
Yes.
There is
in C89 a passage about returning from signal handlers and so
forth, but that is followed by a carveout for nested signal
handlers, which in C89 is undefined behavior. (I assume that
also holds for C90 but I haven't verified that.)
Yes.
Aside: surely it is well-known by now that the language in
C90 is verbatim identical to the language for C89 except for
some bits of the front matter that explain the provenance of the
standard originating from ANSI.
If you know of specific differences, or a reason this is known
to be incorrect, please point it out.
Starting in C99, any mention of interrupts and signal handlers was
removed, along with the carveout.
This is wrong. Section 7.14 of C23 talks about signals and
signal handlers at length.
I never mentioned "interrupts" at all (traditionally, Unix
signals, which formed the basis for C signals, are not
interrupts in the conventional sense. Modern systems will
sometimes make use of interprocessor-interrupts to hasten their
delivery, however).
I think you are talking about _only_ the description of
`longjmp`. I am actually talking about the standard considered
in total. I only mentioned "non-nested" signal handler because
C90 was explicit in saying that that `longjmp` from a _nested_
signal handler was UB.
Because there is a definition
for what longjmp() does, the behavior is defined, and there is no
undefined behavior (not counting things like doing a longjmp()
with a jmp_buf that wasn't set up, etc). Removing the mention of
interrupts and signals, and also removing the carveout, only makes
longjmp() more defined, not less.
I don't think you understood my statement.
Read section 7.14 of C23 carefully; it is not at all obvious
that a `longjmp` out of a signal handler is not _a priori_ UB.
By my reading, it's the opposite, in fact: I see no way to do
so without invoking UB.
I was asked for an example, beyond the behavior of
`realloc(ptr, 0)` with respect to whether it frees `ptr` if
`ptr` is non-null, where something that was explicitly
guaranteed by an earlier version of the standard was changed to
UB in a later version. This appears to be another example of such a
case.
By all means, correct me if you think I am mistaken, but your
explanation above was based on your own misinterpretation, not
otherwise relevant to the statement I had made, and incorrect
in fact (the standard did _not_ remove mention of signals).
Note, in the case of `longjmp` and signal handlers, I suspect it
doesn't much matter, because anyone doing something like that
anyway is almost invariably going to be targeting a system
that conforms to a standard like POSIX, which extends ISO C with
stronger guarantees for defined behavior in this specific area.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
There is another difference worth noting. A byte is a unit of
storage, whereas octet is a measure of information. The word
byte is inherently about memory; the word octet is inherently
about value (eight bits of information). For this reason too
the name 'octet' is a better choice for a type name than 'byte'.
The words "octet" and "byte" mean different things.
If I were to typedef "byte" as unsigned char, it would be because I
want to emphasize the fact that a byte object holds one fundamental
unit of data, not necessarily character data. And I'd probably
use it in a way that doesn't assume it's 8 bits (unless I have a
good reason not to need portability). C's conflation of character
types with "bytes" is IMHO unfortunate; a typedef makes it clearer
what the type is being used for.
I usually just use "unsigned char" and remember that it's one byte
(however many bits that is).
cross@spitfire.i.gajendra.net (Dan Cross) writes:[...]
In article <10u2jpk$2t96p$6@kst.eternal-september.org>,
If C's switch statement were to be
changed, it would have to use something that's currently a syntax
error. Perhaps something like
case 1, case 2, case 3, case 4: whatever();
Sure, that's better.
case 1 ... 4: whatever();
is a typical GCC extension (that we use heavily).
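For reference, a minimal sketch of the case-range form, assuming a compiler that accepts the GNU extension (GCC or Clang); the function name is hypothetical. The spaces around `...` matter with integer operands, since `1...4` written without them is lexed as a single malformed number.

```c
/* GNU C extension, not ISO C (assumed here: GCC or Clang).
 * Keep spaces around "..." so that "1...4" is not lexed as a
 * malformed number. */
int in_one_to_four(int a)
{
    switch (a) {
    case 1 ... 4:
        return 1;   /* a is 1, 2, 3 or 4 */
    default:
        return 0;
    }
}
```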
On 2026-05-13 00:35, Keith Thompson wrote:
[Dropping comp.lang.misc, since this is only about C.]
Bart <bc@freeuk.com> writes:
[...]
[...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed. And
both semantics were needed, they have been used. (Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <86lddnlvtr.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
It is not clear to me that `longjmp` out of a non-nested signal
handler is still well-defined as of C11, though it is explicitly
stated to be so in C89.
It seems you are misunderstanding what the standards are saying.
You read my post with insufficient care, and failed to
understand what I wrote, and are responding to something I did
not say.
The description of longjmp() says (paraphrasing) that it restores
the environment where the relevant setjmp() was done.
Yes.
There is
in C89 a passage about returning from signal handlers and so
forth, but that is followed by a carveout for nested signal
handlers, which in C89 is undefined behavior. (I assume that
also holds for C90 but I haven't verified that.)
Yes.
Aside: surely it is well-known by now that the language in
C90 is verbatim identical to the language for C89 except for
some bits of the front matter that explain the provenance of the
standard originating from ANSI.
If you know of specific differences, or a reason this is known
to be incorrect, please point it out.
Starting in C99, any mention of interrupts and signal handlers was
removed, along with the carveout.
This is wrong. Section 7.14 of C23 talks about signals and
signal handlers at length.
I never mentioned "interrupts" at all (traditionally, Unix
signals, which formed the basis for C signals, are not
interrupts in the conventional sense. Modern systems will
sometimes make use of interprocessor-interrupts to hasten their
delivery, however).
I think you are talking about _only_ the description of
`longjmp`. I am actually talking about the standard considered
in total. I only mentioned "non-nested" signal handler because
C90 was explicit in saying that that `longjmp` from a _nested_
signal handler was UB.
Because there is a definition
for what longjmp() does, the behavior is defined, and there is no
undefined behavior (not counting things like doing a longjmp()
with a jmp_buf that wasn't set up, etc). Removing the mention of
interrupts and signals, and also removing the carveout, only makes
longjmp() more defined, not less.
I don't think you understood my statement.
Read section 7.14 of C23 carefully; it is not at all obvious
that a `longjmp` out of a signal handler is not _a priori_ UB.
By my reading, it's the opposite, in fact: I see no way to do
so without invoking UB.
I was asked for an example, beyond the behavior of
`realloc(ptr, 0)` with respect to whether it frees `ptr` if
`ptr` is non-null, where something that was explicitly
guaranteed by an earlier version of the standard was changed to
UB in a later version. This appears to be another example of such a
case.
By all means, correct me if you think I am mistaken, but your
explanation above was based on your own misinterpretation, not
otherwise relevant to the statement I had made, and incorrect
in fact (the standard did _not_ remove mention of signals).
Note, in the case of `longjmp` and signal handlers, I suspect it
doesn't much matter, because anyone doing something like that
anyway is almost invariably going to be targeting a system
that conforms to a standard like POSIX, which extends ISO C with
stronger guarantees for defined behavior in this specific area.
I replied to Keith Thompson's reply downthread.
In article <10u1j2h$1l93l$31@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-12 16:33, Bart wrote:
[snip]
But shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
Programmers _should_ absolutely learn the rules. But in C,
there are many of them, and some of them are deceptively subtle.
_A_ rule that programmers can remember quite easily, however,
is that parentheses generally carry very high precedence, and
so when in doubt, wrapping something in parens can aid
understanding (for the programmer and the maintainer).
The key
is to find balance between extreme terseness and extreme
verbosity, both of which can feel obfuscating.
There was a time when I knew and had memorized the precedence of
all operators in C. I remember most, but have forgotten some
that I use less frequently; I suspect many programmers are in
the same (or a similar) situation. If I am writing code and can
not immediately remember the precedence of some operator in some
expression, I apply parentheses.
In article <10u1emq$1l93k$13@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-13 05:47, Tim Rentsch wrote:
scott@slp53.sl.home (Scott Lurndal) writes:[...]
The programming I do
(mainly kernel programming, SoC simulation,
firmware) all naturally require the fixed-width types.
Right. Code that interacts very closely with hardware is one of
those cases where the fixed-width types make sense.
Another common one - also "low-level" but different - are data types
exchanged through communication protocols.
Yes, in particular, networking protocols are often described in
terms of "octets", since many protocols date from the era in
which machines with differently sized bytes were still common.
E.g., much of the early work presaging TCP/IP was done on DEC
PDP-10 machines, which were 36-bit, word-oriented computers.
However, when discussing protocols (or hardware peripherals on
the local system, for that matter) it is important to exercise
care with respect to ordering of octets within multi-octet
data. For instance, IP networking "on the wire" uses Big-Endian
ordering to represent the fields in the IP datagram header,
while a processor might use Little-endian natively. Hence, one
must be sensitive to transforming between the two. It may be
easier to leave the packet data in an octet buffer, and extract
the fields one is interested in on the host from that.
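One way to do that extraction, as a sketch only: read the big-endian fields out of the octet buffer with shifts, which gives the same result whatever the host's native byte order. The helper names `get_be16` and `get_be32` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian field at offset off in an octet buffer. */
uint16_t get_be16(const uint8_t *buf, size_t off)
{
    return (uint16_t)(((unsigned)buf[off] << 8) | buf[off + 1]);
}

/* Read a 32-bit big-endian field at offset off. */
uint32_t get_be32(const uint8_t *buf, size_t off)
{
    return ((uint32_t)buf[off]     << 24) |
           ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] <<  8) |
            (uint32_t)buf[off + 3];
}
```

With an IPv4 header in `hdr`, `get_be16(hdr, 2)` would yield the total length and `get_be32(hdr, 12)` the source address, with no cast of the buffer to a struct and so no alignment or byte-order assumptions.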
On 2026-05-13 16:31, Dan Cross wrote:
In article <10u1j2h$1l93l$31@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-12 16:33, Bart wrote:
[snip]
But shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
Programmers _should_ absolutely learn the rules. But in C,
there are many of them, and some of them are deceptively subtle.
We agreed.
_A_ rule that programmers can remember quite easily, however,
is that parentheses generally carry very high precedence, and
so when in doubt, wrapping something in parens can aid
understanding (for the programmer and the maintainer).
I agree.
The key
is to find balance between extreme terseness and extreme
verbosity, both of which can feel obfuscating.
First, don't forget that there was no problem with precedence
existing in Bart's post; it was just an overloaded and badly
formatted composition in an example of ternary conditionals.
Now back to your statement. The point is that precedence rules
vary between programming languages. Folks can usually rely on
the precedence of * and / compared to + and - .
But being a
computer scientist there are also other characteristics one can
assume with respect to typical types; but weighed against the
design decisions of the language. For example I can live with
Pascal's and C's operator precedence, even though
they differ. But it's harder to live with a discrepancy,
a mis-ranking of a class of operators in "C". (I noticed that
already when I read K&R some time around 1985, but I first saw
that "officially" acknowledged not too long ago when someone
posted a link to a paper from, IIRC, some time in the 1990's
written by one of the authors of "C".) - And that discrepancy
detail in C's precedence ranking was actually the only reason
for me looking "regularly" into the precedence table of my K&R.
(The point is that - with the exception of & ^ | - the ranking
makes perfect sense and should be easily usable without doubt
by a concept-knowing programmer. But note that, historically,
a sort of "rationale" can be formulated for the discrepancy to
justify the given choice in context of specifically "C". But
still remember the "official" acknowledgement of an issue here.)
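A short sketch of where the ranking of & relative to == bites in practice. The flag value and function names below are hypothetical, chosen only to illustrate the grouping.

```c
/* Hypothetical flag bit, for illustration only. */
#define FLAG_READY 0x04u

/* Looks like "is the flag clear?", but == binds tighter than &,
 * so this computes status & (FLAG_READY == 0), i.e. status & 0. */
int flag_clear_wrong(unsigned status)
{
    return status & FLAG_READY == 0;
}

/* Parentheses give the intended grouping. */
int flag_clear_right(unsigned status)
{
    return (status & FLAG_READY) == 0;
}
```

The first version returns 0 for every input, which is why many compilers warn about it and why parenthesizing the bitwise operators is a common habit.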
There was a time when I knew and had memorized the precedence of
all operators in C. I remember most, but have forgotten some
that I use less frequently; I suspect many programmers are in
the same (or a similar) situation. If I am writing code and can
not immediately remember the precedence of some operator in some
expression, I apply parentheses.
Depending on the complexity of expressions that is a sensible
approach. (I do that as well were I think that it aids clarity.)
There is another difference worth noting. A byte is a unit of
storage, whereas octet is a measure of information. The word
byte is inherently about memory; the word octet is inherently
about value (eight bits of information). For this reason too
the name 'octet' is a better choice for a type name than 'byte'.
I must remember to start using "char unsigned" in preference to
"unsigned char". ;)
Well, I was developing software in the ISO/OSI universe, not so
much in the IETF/IP world. Endianness on the protocol level was
inherently no issue with (for example) the ASN.1/BER standards.
The "OSI-libraries" we used did the mapping from/to the machine
format. For our own local (non-OSI) protocols between different
systems we used existing functions (htonl, ntohl, etc.) for the
correct data-mapping.
Janis
In article <10u36pv$1l93k$18@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
Now back to your statement. The point is that precedence rules
vary between programming languages. Folks can usually rely on
the precedence of * and / compared to + and - .
I can think of at least two languages where you could not, but
yeah, that is usually true.
[...][...]
Yes, the fact that "break;" is used both for loops and for switch
statements is inconvenient. (csh uses "breaksw" for switch
statements.)
[...]
A "better" switch statement might have an explicit fallthrough
construct. (bash's "case" statement has this, more or less.)
Or you could use goto (yeah, I know).
We use:
# if __GNUC__ >= 7 // 'statement attributes' were new with GCC 7.x
# if defined(__cplusplus) && (__cplusplus >= 201103L) // C++11 or greater
# define XXX_FALLTHROUGH [[gnu::fallthrough]]
# else
# define XXX_FALLTHROUGH __attribute__ ((fallthrough))
# endif
# else // GCC 4.x, 5.x, 6.x, comment only!
# define XXX_FALLTHROUGH /* Fall Through */
# endif
Where 'XXX' is replaced by the app name.
switch (variable) {
case cond1:
break;
case cond:
do something
XXX_FALLTHROUGH
default:
do something else
}
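A compilable sketch of the same idea in plain C. `MY_FALLTHROUGH` is a hypothetical stand-in for the XXX_FALLTHROUGH macro quoted above, and `steps_from` is invented for the example. The macro is followed by a semicolon at the point of use, because the attribute form has to be a statement of its own.

```c
/* Hypothetical stand-in for the XXX_FALLTHROUGH macro quoted above. */
#if defined(__GNUC__) && __GNUC__ >= 7
#  define MY_FALLTHROUGH __attribute__((fallthrough))
#else
#  define MY_FALLTHROUGH ((void)0)  /* no attribute available: plain no-op */
#endif

/* Count how many steps run when entering at the given stage. */
int steps_from(int stage)
{
    int steps = 0;

    switch (stage) {
    case 0:
        steps++;
        MY_FALLTHROUGH;   /* deliberate: stage 0 also runs stage 1 */
    case 1:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```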
On 2026-05-13 21:28, Keith Thompson wrote:
[...][...]
Yes, the fact that "break;" is used both for loops and for switch
statements is inconvenient. (csh uses "breaksw" for switch
statements.)
If we'd have two distinct keywords (in a language, BTW, that tried
to avoid too many keywords in the first place!) there would be
complaints (and foremost from Bart, for sure); why they decided to
use two keywords to basically do "the same" thing, namely leaving
a control structure.
I don't see how using 'break' in more than one context would be in
any way "inconvenient".
although it's somewhat
quirky to imagine a thing that is physically "5 Bit"
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
There is another difference worth noting. A byte is a unit of
storage, whereas octet is a measure of information. The word
byte is inherently about memory; the word octet is inherently
about value (eight bits of information). For this reason too
the name 'octet' is a better choice for a type name than 'byte'.
The words "octet" and "byte" mean different things.
If I were to typedef "byte" as unsigned char, it would be because I
want to emphasize the fact that a byte object holds one fundamental
unit of data, not necessarily character data. And I'd probably
use it in a way that doesn't assume it's 8 bits (unless I have a
good reason not to need portability). C's conflation of character
types with "bytes" is IMHO unfortunate; a typedef makes it clearer
what the type is being used for.
It could, if someone happens to be looking at the typedef. More
often than not what is being looked at is a use of the name, and
not the typedef. Readers don't always have time to look up where
the name is defined, and that's why a good choice of name matters.
I usually just use "unsigned char" and remember that it's one byte
(however many bits that is).
I must remember to start using "char unsigned" in preference to
"unsigned char". ;)
On 5/13/2026 3:42 PM, Tim Rentsch wrote:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
There is another difference worth noting. A byte is a unit of
storage, whereas octet is a measure of information. The word
byte is inherently about memory; the word octet is inherently
about value (eight bits of information). For this reason too
the name 'octet' is a better choice for a type name than 'byte'.
The words "octet" and "byte" mean different things.
If I were to typedef "byte" as unsigned char, it would be because I
want to emphasize the fact that a byte object holds one fundamental
unit of data, not necessarily character data. And I'd probably
use it in a way that doesn't assume it's 8 bits (unless I have a
good reason not to need portability). C's conflation of character
types with "bytes" is IMHO unfortunate; a typedef makes it clearer
what the type is being used for.
It could, if someone happens to be looking at the typedef. More
often than not what is being looked at is a use of the name, and
not the typedef. Readers don't always have time to look up where
the name is defined, and that's why a good choice of name matters.
I usually just use "unsigned char" and remember that it's one byte
(however many bits that is).
I must remember to start using "char unsigned" in preference to
"unsigned char". ;)
Nothing wrong with unsigned char? right? ;^o
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 2026-05-13 21:28, Keith Thompson wrote:
[...][...]
Yes, the fact that "break;" is used both for loops and for switch
statements is inconvenient. (csh uses "breaksw" for switch
statements.)
If we'd have two distinct keywords (in a language, BTW, that tried
to avoid too many keywords in the first place!) there would be
complaints (and foremost from Bart, for sure); why they decided to
use two keywords to basically do "the same" thing, namely leaving
a control structure.
I don't see how using 'break' in more than one context would be in
any way "inconvenient".
With nested loops, "break" or "continue" always refers to the innermost
loop. With a switch statement inside a loop, "break" refers to the
switch statement, but "continue" refers to the loop.
It's obviously not impossible to deal with, but I find it mildly
annoying.
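A small sketch of that asymmetry; the function and its behaviour are made up purely for illustration. Inside the switch, 'continue' reaches the enclosing loop while 'break' only leaves the switch, so ending the loop from a case needs a flag (or a goto, or a return).

```c
/* Sum the positive entries of v, stopping at the first zero.
 * Inside the switch, "continue" targets the for loop, but "break"
 * only leaves the switch, so a flag is needed to stop the loop. */
int sum_until_zero(const int *v, int n)
{
    int sum = 0;
    int done = 0;

    for (int i = 0; i < n && !done; i++) {
        switch (v[i] > 0 ? 1 : (v[i] < 0 ? -1 : 0)) {
        case -1:
            continue;   /* skips to the next loop iteration */
        case 0:
            done = 1;   /* a bare "break" here would not end the loop */
            break;      /* leaves the switch only */
        default:
            sum += v[i];
            break;
        }
    }
    return sum;
}
```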
On 5/14/26 03:39, Janis Papanagnou wrote:
although it's somewhat
quirky to imagine a thing that is physically "5 Bit"
Is data on the wire a physical thing?
https://en.wikipedia.org/wiki/Baudot_code
In article <10u24l5$2oaav$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 13/05/2026 16:31, Dan Cross wrote:
In article <10u1j2h$1l93l$31@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-12 16:33, Bart wrote:
[snip]
But shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
Programmers _should_ absolutely learn the rules. But in C,
there are many of them, and some of them are deceptively subtle.
_A_ rule that programmers can remember quite easily, however,
is that parentheses generally carry very high precedence, and
so when in doubt, wrapping something in parens can aid
understanding (for the programmer and the maintainer). The key
is to find balance between extreme terseness and extreme
verbosity, both of which can feel obfuscating.
There was a time when I knew and had memorized the precedence of
all operators in C. I remember most, but have forgotten some
that I use less frequently; I suspect many programmers are in
the same (or a similar) situation. If I am writing code and can
not immediately remember the precedence of some operator in some
expression, I apply parentheses.
I don't think it is necessary to /learn/ all the rules of a language -
but it is necessary to be aware of them, and to know how well you know
them. It's fine not to be sure of all the precedence rules in a
language (and some languages have many more operators than C, or
stranger precedence rules). You only need to know the ones you rely on
regularly, and the ones you have to read regularly. If you occasionally
come across something different, then you can look it up. There's no
point in filling your head with knowledge that you almost never need.
So there is usually no need to know the precedence rules for mixing
relational operators, shift operators and bitwise and/or operators, or
whatever, if you put parentheses in your own code or split the complex
expression into multiple variables. (With the caveat that you mentioned
earlier that both too few and too many parentheses make code harder to
understand.)
But you might have to understand code written which relies on more of
the details - you need to be aware of what you know, and what you have
to look up, in order to understand the code. The risk comes not from
ignorance of the precedence rules, but from thinking you know them when
you have misremembered them. Self-awareness of your own knowledge,
along with convenient and reliable references, is vital.
Yes, I agree. The key is knowing when it's time to go to look
at a reference.
I like the way you put it.
I might go a bit further and say that it's fine not to know
every rule, but there's a qualitative difference between
acknowledging that and knowing that easy access to a reliable
reference is useful, and steadfastly refusing to learn the rules
because one considers them poor to begin with.
On 2026-05-14 03:39, Dan Cross wrote:
In article <10u36pv$1l93k$18@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
Now back to your statement. The point is that precedence rules
vary between programming languages. Folks can usually rely on
the precedence of * and / compared to + and - .
I can think of at least two languages where you could not, but
yeah, that is usually true.
Are you thinking about languages like Algol 68 where you
can explicitly define and re-define operator precedence,
or do you mean languages where they made just bad design
decisions?
Janis
In article <10u0k0k$1l93l$30@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
[snip]
Bart <bc@freeuk.com> writes:
[...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed. And
both semantics were needed, they have been used. (Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
It's easy to get wrong. Other languages accommodate both
semantics using alternation in the selector arm. For example,
one might imagine an hypothetical syntax, something like:
switch (a) {
case 1 || 2 || 3 || 4: whatever();
default: other();
}
...with no `break` to end each `case`.
You couldn't use it to build Duff's Device, but I'm not sure
that even Duff would call that a loss.
- Dan C.
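One caveat if that spelling were borrowed for C itself: `case 1 || 2 || 3 || 4:` already compiles today, because a case label takes an integer constant expression and `1 || 2 || 3 || 4` evaluates to 1. A small sketch contrasting it with the stacked labels C actually offers (all function names here are made up for illustration, and some compilers may warn about the first one):

```c
#include <assert.h>
#include <string.h>

/* Looks like alternation, but the label is the constant expression
   (1 || 2 || 3 || 4), whose value is 1. */
static const char *lookalike(int a)
{
    switch (a) {
    case 1 || 2 || 3 || 4:
        return "whatever";
    default:
        return "other";
    }
}

/* What standard C offers: stacked labels relying on fallthrough. */
static const char *stacked(int a)
{
    switch (a) {
    case 1:
    case 2:
    case 3:
    case 4:
        return "whatever";
    default:
        return "other";
    }
}

static void check_case_labels(void)
{
    assert(strcmp(lookalike(1), "whatever") == 0);
    assert(strcmp(lookalike(3), "other") == 0);    /* only 1 matches */
    assert(strcmp(stacked(3), "whatever") == 0);
    assert(strcmp(stacked(5), "other") == 0);
}
```

So a real extension along these lines would presumably need a separator that is not already a valid operator in a constant expression.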
On 14/05/2026 03:57, Janis Papanagnou wrote:
On 2026-05-14 03:39, Dan Cross wrote:
In article <10u36pv$1l93k$18@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
Now back to your statement. The point is that precedence rules
vary between programming languages. Folks can usually rely on
the precedence of * and / compared to + and - .
I can think of at least two languages where you could not, but
yeah, that is usually true.
Are you thinking about languages like Algol 68 where you
can explicitly define and re-define operator precedence,
or do you mean languages where they made just bad design
decisions?
Janis
<OT>
I believe APL does not have operator precedences, though I have never
written more than a one-line program in the language.
And in Forth, operators are all post-fix, so there are no precedences
there either.
But I'm curious which two languages Dan was referring to. (My guess is
that APL was one of them, but I don't know the other.)
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 2026-05-13 21:28, Keith Thompson wrote:
[...]
Yes, the fact that "break;" is used both for loops and for switch
statements is inconvenient. (csh uses "breaksw" for switch
statements.)
If we'd have two distinct keywords (in a language, BTW, that tried
to avoid too many keywords in the first place!) there would be
complaints (and foremost from Bart, for sure); why they decided to
use two keywords to basically do "the same" thing, namely leaving
a control structure.
I don't see how using 'break' in more than one context would be in
any way "inconvenient".
With nested loops, "break" or "continue" always refers to the innermost
loop. With a switch statement inside a loop, "break" refers to the
switch statement, but "continue" refers to the loop.
It's obviously not impossible to deal with, but I find it mildly
annoying.
On 2026-05-13 16:31, Dan Cross wrote:
In article <10u1j2h$1l93l$31@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-05-12 16:33, Bart wrote:
[snip]
But why shouldn't people be expected to learn the rules?
Programmers should certainly learn, know, apply, and obey the rules.
(If you don't understand that you may try to transform that truism
to your "car example".)
Programmers _should_ absolutely learn the rules. But in C,
there are many of them, and some of them are deceptively subtle.
We agreed.
_A_ rule that programmers can remember quite easily, however,
is that parentheses generally carry very high precedence, and
so when in doubt, wrapping something in parens can aid
understanding (for the programmer and the maintainer).
I agree.
The key
is to find balance between extreme terseness and extreme
verbosity, both of which can feel obfuscating.
First, don't forget that there was no problem with precedence
existing in Bart's post; it was just an overloaded and badly
formatted composition in an example of ternary conditionals.
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense
On 13/05/2026 16:57, Dan Cross wrote:
In article <10u0k0k$1l93l$30@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
[snip]
Bart <bc@freeuk.com> writes:
[...]
So the inconvenience of how 'switch' works is excused because
/sometimes/ you need fallthrough, or the one time in a thousand you
need Duff's device.
I don't see any inconvenience in "how it works"; it actually
allows programmers to implement both semantics as needed. And
both semantics were needed, they have been used. (Even if you
think your projection of your preferences and limited uses is
what should constitute the global software development world.)
It's easy to get wrong. Other languages accommodate both
semantics using alternation in the selector arm. For example,
one might imagine an hypothetical syntax, something like:
    switch (a) {
        case 1 || 2 || 3 || 4: whatever();
        default: other();
    }
...with no `break` to end each `case`.
You couldn't use it to build Duff's Device, but I'm not sure
that even Duff would call that a loss.
        - Dan C.
Anyone curious about how far C's switch statements can be used or
abused, might like to read about "Protothreads" :
<https://en.wikipedia.org/wiki/Protothread>
This is a conglomeration of Duff's Device on steroids with supporting
macros that gives you a limited type of stackless cooperative
multitasking with extremely low overhead. The library has seen real
usage in small embedded systems. Reactions to the underlying
implementation range from thinking it is a hideous abuse of a bad
language design, to elegant and very ingenious.
On 2026-05-14 00:42, Tim Rentsch wrote:
I must remember to start using "char unsigned" in preference to
"unsigned char". ;)
Despite the smiley I can't really interpret that. So an honest
question; is there a difference in those two, or what do you
want to express by that?
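For readers with the same question: as far as the language is concerned the two spellings name the same type, since C allows type specifiers to appear in any order. A quick check, assuming a C11 compiler for `_Generic` (the function names are invented for this example):

```c
#include <assert.h>

/* Returns 1 if `char unsigned` is the very same type as `unsigned char`. */
static int same_spelling(void)
{
    char unsigned x = 200;   /* specifiers may come in either order */
    return _Generic(x, unsigned char: 1, default: 0);
}

static void check_same_spelling(void)
{
    assert(same_spelling() == 1);
    assert(sizeof(char unsigned) == sizeof(unsigned char));
}
```

Whatever the remark was meant to convey, it does not appear to be a difference in meaning to the compiler.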
On 2026-05-13 20:00, Tim Rentsch wrote:
There is another difference worth noting. A byte is a unit of
storage, whereas octet is a measure of information. The word
byte is inherently about memory; the word octet is inherently
about value (eight bits of information). For this reason too
the name 'octet' is a better choice for a type name than 'byte'.
Well, I have a slightly different view; I suppose it's cultural.
I often see, specifically from the Anglo-American culture, that
they talk about, say, "8 bits"; and this has partly culturally
also spread across the ocean. - Here we try to distinguish the
units and the "metal"; the latter are formally substantives and
written with a capital letter. So we have units of "1 bit" or
"5 bit" entities (no 's' at the end). But seen as "metal" we
speak about "one Bit" or "five Bits" - although it's somewhat
quirky to imagine a thing that is physically "5 Bit", mostly it
is more accurate to say it's an entity of "5 bit" - and similar
with "1 byte". Because we use that also as _unit_ for 8 bit
entities. It gets complicated by us addressing the unit 'bit'
by a name, which is then "Bit". So the more accurate forms for
the _units_ are 5 bit or 8 byte. - As said, we may culturally
see that differently, and colloquially you nowadays also often
hear "5 Bits" or "8 Bytes" (as pluralized substantive), so it's
cumbersome to argue about that. - Only that "byte" is also a
unit (and not necessarily associated with memory) seems to be
our difference in how we view that.
On 2026-05-12 20:09, Dan Cross wrote:
[...] one of the major motivating
factors for using unsigned arithmetic in practice is to have the
full bit-range of the type available. [...]
Hmm.. - I'm using 'unsigned' typically to express the domain of the application values (not to "wrest" some more values out of a type).
On 2026-05-13 17:12, Scott Lurndal wrote:
We use:
# if __GNUC__ >= 7 // 'statement attributes' were new with GCC 7.x
# if defined(__cplusplus) && (__cplusplus >= 201103L) // C++11 or greater
# define XXX_FALLTHROUGH [[gnu::fallthrough]]
# else
# define XXX_FALLTHROUGH __attribute__ ((fallthrough))
# endif
# else // GCC 4.x, 5.x, 6.x, comment only!
# define XXX_FALLTHROUGH /* Fall Through */
# endif
Where 'XXX' is replaced by the app name.
switch (variable) {
case cond1:
break;
case cond:
do something
XXX_FALLTHROUGH
default:
do something else
}
Just a note aside; couldn't the XXX be automatically concatenated using
the CPP features? (I seem to recall we've done such things back then.)
I also wonder about the app-specific variants; wouldn't one version for
all apps have sufficed?
[...] [Remembering precedence in C is difficult because of]
a mis-ranking of a class of operators in "C". (I noticed that
already when I read K&R some time around 1985, but I first saw
that "officially" acknowledged not too long ago when someone
posted a link to a paper from, IIRC, some time in the 1990's
written by one of the authors of "C".) - And that discrepancy
detail in C's precedence ranking was actually the only reason
for me looking "regularly" into the precedence table of my K&R.
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense and should be easily usable without doubt
by a concept-knowing programmer. But note that, historically,
a sort of "rationale" can be formulated for the discrepancy to
justify the given choice in context of specifically "C". But
still remember the "official" acknowledgement of an issue here.)
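The mis-ranking in question is easiest to see in a mask test. A short sketch of how `&` binds more loosely than `==` (the flag value and function names are illustrative only, and compilers may warn about the first function):

```c
#include <assert.h>

#define FLAG_READY 0x4u   /* hypothetical flag bit */

/* Parsed as  status & (FLAG_READY == 0)  ->  status & 0  ->  always 0. */
static unsigned ready_clear_wrong(unsigned status)
{
    return status & FLAG_READY == 0;
}

/* Parentheses give the intended test: is the READY bit clear? */
static unsigned ready_clear(unsigned status)
{
    return (status & FLAG_READY) == 0;
}

static void check_mask_test(void)
{
    assert(ready_clear_wrong(0x0u) == 0);   /* wrong: bit is clear */
    assert(ready_clear(0x0u) == 1);
    assert(ready_clear(0x4u) == 0);
}
```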
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[..discussing C expression syntax..]
[...] [Remembering precedence in C is difficult because of]
a mis-ranking of a class of operators in "C". (I noticed that
already when I read K&R some time around 1985, but I first saw
that "officially" acknowledged not too long ago when someone
posted a link to a paper from, IIRC, some time in the 1990's
written by one of the authors of "C".) - And that discrepancy
detail in C's precedence ranking was actually the only reason
for me looking "regularly" into the precedence table of my K&R.
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense and should be easily usable without doubt
by a concept-knowing programmer. But note that, historically,
a sort of "rationale" can be formulated for the discrepancy to
justify the given choice in context of specifically "C". But
still remember the "official" acknowledgement of an issue here.)
I think it's easy to remember how expressions in C work with the
help of just a few memory aids:
1. unary operators are always ahead of binary operators, first
those on the right and then those on the left;
3. sizeof is greedy with respect to type names: sizeof (int)+1
is (sizeof (int))+1, not sizeof ((int)+1)
2. the bitwise operators form a sandwich enclosing the relational
operators and the equality operators - shift (<<,>>) on top,
and the three kinds of logical operations (&,^,|) underneath;
On 14/05/2026 16:37, Tim Rentsch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[..discussing C expression syntax..]
[...] [Remembering precedence in C is difficult because of]
a mis-ranking of a class of operators in "C". (I noticed that
already when I read K&R some time around 1985, but I first saw
that "officially" acknowledged not too long ago when someone
posted a link to a paper from, IIRC, some time in the 1990's
written by one of the authors of "C".) - And that discrepancy
detail in C's precedence ranking was actually the only reason
for me looking "regularly" into the precedence table of my K&R.
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense and should be easily usable without doubt
by a concept-knowing programmer. But note that, historically,
a sort of "rationale" can be formulated for the discrepancy to
justify the given choice in context of specifically "C". But
still remember the "official" acknowledgement of an issue here.)
I think it's easy to remember how expressions in C work with the
help of just a few memory aids:
1. unary operators are always ahead of binary operators, first
those on the right and then those on the left;
Unary operators aren't the problem. It's a mystery why they need to be
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.
3. sizeof is greedy with respect to type names: sizeof (int)+1
is (sizeof (int))+1, not sizeof ((int)+1)
This isn't a problem either: it works like a unary operator.
2. the bitwise operators form a sandwich enclosing the relational
operators and the equality operators - shift (<<,>>) on top,
and the three kinds of logical operations (&,^,|) underneath;
This is where the trouble starts: these make up 6 different levels.
Combinations of & ^ | are rare enough, as bitwise operations, that
you'd use parentheses anyway. They don't need 3 separate levels.
Comparison ones don't need 2 levels.
And shift operators don't really need their own level either. (Since
they scale numbers just like * and /, they can be lumped in with
those. Having 'a * 8 + b' mean the same as 'a << 3 + b' makes sense; currently they have quite different meanings.)
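To make the shift case concrete, here is a sketch of how C currently parses the two expressions (function names are made up; compilers typically suggest parentheses for the first one):

```c
#include <assert.h>

/* a << 3 + b  is parsed as  a << (3 + b): + binds tighter than <<. */
static unsigned shift_unparenthesized(unsigned a, unsigned b)
{
    return a << 3 + b;
}

/* a * 8 + b  is parsed as  (a * 8) + b. */
static unsigned scale_then_add(unsigned a, unsigned b)
{
    return a * 8 + b;
}

/* Explicit parentheses make the shift form agree with the multiply. */
static unsigned shift_then_add(unsigned a, unsigned b)
{
    return (a << 3) + b;
}

static void check_shift_precedence(void)
{
    assert(shift_unparenthesized(1, 2) == 32);   /* 1 << 5 */
    assert(scale_then_add(1, 2) == 10);
    assert(shift_then_add(1, 2) == 10);
}
```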
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 2026-05-13 17:12, Scott Lurndal wrote:
We use:
# if __GNUC__ >= 7 // 'statement attributes' were new with GCC 7.x
# if defined(__cplusplus) && (__cplusplus >= 201103L) // C++11 or higher
# define XXX_FALLTHROUGH [[gnu::fallthrough]]
# else
# define XXX_FALLTHROUGH __attribute__ ((fallthrough))
# endif
# else // GCC 4.x, 5.x, 6.x, comment only!
# define XXX_FALLTHROUGH /* Fall Through */
# endif
Where 'XXX' is replaced by the app name.
switch (variable) {
case cond1:
break;
case cond:
do something
XXX_FALLTHROUGH
default:
do something else
}
Just a note aside; couldn't the XXX be automatically concatenated using
the CPP features? (I seem to recall we've done such things back then.)
Not sure I understand your question. I used xxx above just
to obscure the name of the proprietary program that includes
the above file.
I also wonder about the app-specific variants; wouldn't one version for
all apps have sufficed?
There is a need to support gcc4 through gcc14 in that project. We've
subsequently raised the lower limit to gcc7. The project was started
in 2012.
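For anyone wanting to try the idea, a cut-down C-only sketch of such a macro in use might look like this. `APP_FALLTHROUGH` and `count_steps` are stand-in names (the real prefix was redacted), and the C++ branch is left out:

```c
#include <assert.h>

#if defined(__GNUC__) && (__GNUC__ >= 7)
#  define APP_FALLTHROUGH __attribute__ ((fallthrough))
#else   /* older compilers: the macro expands to a comment only */
#  define APP_FALLTHROUGH /* Fall Through */
#endif

static int count_steps(int variable)
{
    int steps = 0;
    switch (variable) {
    case 1:
        break;
    case 2:
        steps += 1;         /* "do something" */
        APP_FALLTHROUGH;    /* deliberate: continue into default */
    default:
        steps += 1;         /* "do something else" */
    }
    return steps;
}

static void check_fallthrough_macro(void)
{
    assert(count_steps(1) == 0);
    assert(count_steps(2) == 2);   /* fell through into default */
    assert(count_steps(3) == 1);
}
```

The trailing semicolon matters: the attribute form needs it to become a statement, and the comment-only form then degrades to an empty statement.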
scott@slp53.sl.home (Scott Lurndal) writes:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 2026-05-13 17:12, Scott Lurndal wrote:
We use:
# if __GNUC__ >= 7 // 'statement attributes' were new with GCC 7.x
# if defined(__cplusplus) && (__cplusplus >= 201103L) // C++11 or higher
# define XXX_FALLTHROUGH [[gnu::fallthrough]]
# else
# define XXX_FALLTHROUGH __attribute__ ((fallthrough))
# endif
# else // GCC 4.x, 5.x, 6.x, comment only!
# define XXX_FALLTHROUGH /* Fall Through */
# endif
Where 'XXX' is replaced by the app name.
switch (variable) {
case cond1:
break;
case cond:
do something
XXX_FALLTHROUGH
default:
do something else
}
Just a note aside; couldn't the XXX be automatically concatenated using
the CPP features? (I seem to recall we've done such things back then.)
Not sure I understand your question. I used xxx above just
to obscure the name of the proprietary program that includes
the above file.
I also wonder about the app-specific variants; wouldn't one version for
all apps have sufficed?
There is a need to support gcc4 through gcc14 in that project. We've
subsequently raised the lower limit to gcc7. The project was started
in 2012.
If instead you use
#define XXX_FALLTHROUGH GOHERE_( __LINE__ )
#define GOHERE_( n ) GOHERE__( n )
#define GOHERE__( n ) goto RIGHT_HYAR_##n; RIGHT_HYAR_##n:
and just give 'XXX_FALLTHROUGH;', how are the results? It
works fine in my tests.
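A small sketch of that goto-based variant in use, for anyone who wants to repeat the experiment. The three macros are as posted; `score_for` and the values are invented for the example, and it only demonstrates the control flow, not what any particular compiler diagnoses:

```c
#include <assert.h>

#define XXX_FALLTHROUGH GOHERE_( __LINE__ )
#define GOHERE_( n )    GOHERE__( n )
#define GOHERE__( n )   goto RIGHT_HYAR_##n; RIGHT_HYAR_##n:

static int score_for(int v)
{
    int score = 0;
    switch (v) {
    case 1:
        score += 10;
        XXX_FALLTHROUGH;   /* expands to: goto L; L: ;  (L unique per line) */
    default:
        score += 1;
    }
    return score;
}

static void check_goto_fallthrough(void)
{
    assert(score_for(1) == 11);   /* case 1 ran, then default */
    assert(score_for(7) == 1);    /* default only */
}
```

The two-level `GOHERE_`/`GOHERE__` indirection is what lets `__LINE__` expand to a number before it is pasted into the label name.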
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 2026-05-13 20:00, Tim Rentsch wrote:
There is another difference worth noting. A byte is a unit of
storage, whereas octet is a measure of information. The word
byte is inherently about memory; the word octet is inherently
about value (eight bits of information). For this reason too
the name 'octet' is a better choice for a type name than 'byte'.
Well, I have a slightly different view; I suppose it's cultural.
I often see, specifically from the Anglo-American culture, that
they talk about, say, "8 bits"; and this has partly culturally
also spread across the ocean. - Here we try to distinguish the
units and the "metal"; the latter are formally substantives and
written with a capital letter. So we have units of "1 bit" or
"5 bit" entities (no 's' at the end). But seen as "metal" we
speak about "one Bit" or "five Bits" - although it's somewhat
quirky to imagine a thing that is physically "5 Bit", mostly it
is more accurate to say it's an entity of "5 bit" - and similar
with "1 byte". Because we use that also as _unit_ for 8 bit
entities. It gets complicated by us addressing the unit 'bit'
by a name, which is then "Bit". So the more accurate forms for
the _units_ are 5 bit or 8 byte. - As said, we may culturally
see that differently, and colloquially you nowadays also often
hear "5 Bits" or "8 Bytes" (as pluralized substantive), so it's
cumbersome to argue about that. - Only that "byte" is also a
unit (and not necessarily associated with memory) seems to be
our difference in how we view that.
I don't know if I see what you're getting at here. My writing
follows standard usage in American English. Sometimes the names
of units are capitalized but for the most part they aren't. The
names of units are singular or plural when used as nouns (1 bit,
2 bits), but singular when used as adjectives (16-bit int).
There may be exceptions to those rules, I haven't thought about
it deeply.
My main point is that "byte" and "octet" are talking about
different kinds of things.
A computer might have 64k bytes of
RAM, but normally I wouldn't (and I think normally other people
wouldn't) say that a computer has 64k octets of RAM.
We might
say a computer has enough RAM to _hold_ 64k octets, but not that
it _has_ 64k octets. There's a semantic incongruity in the
latter case. Do you see what I mean?
[...] the issue being discussed was that multiple cases (that may
not be contiguous) depend on the default fallthrough behavior.
case 10:
case 20:
case 30:
whatever();
break;
In a hypothetical C-like language without default fallthrough, it
would make sense to invent a different syntax. For C repeating the
"case" keyword is slightly ugly, but probably not worth fixing.
Bart <bc@freeuk.com> writes:
On 14/05/2026 16:37, Tim Rentsch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[..discussing C expression syntax..]
[...] [Remembering precedence in C is difficult because of]
a mis-ranking of a class of operators in "C". (I noticed that
already when I read K&R some time around 1985, but I first saw
that "officially" acknowledged not too long ago when someone
posted a link to a paper from, IIRC, some time in the 1990's
written by one of the authors of "C".) - And that discrepancy
detail in C's precedence ranking was actually the only reason
for me looking "regularly" into the precedence table of my K&R.
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense and should be easily usable without doubt
by a concept-knowing programmer. But note that, historically,
a sort of "rationale" can be formulated for the discrepancy to
justify the given choice in context of specifically "C". But
still remember the "official" acknowledgement of an issue here.)
I think it's easy to remember how expressions in C work with the
help of just a few memory aids:
1. unary operators are always ahead of binary operators, first
those on the right and then those on the left;
Unary operators aren't the problem. It's a mystery why they need to be
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.
3. sizeof is greedy with respect to type names: sizeof (int)+1
is (sizeof (int))+1, not sizeof ((int)+1)
This isn't a problem either: it works like a unary operator.
2. the bitwise operators form a sandwich enclosing the relational
operators and the equality operators - shift (<<,>>) on top,
and the three kinds of logical operations (&,^,|) underneath;
This is where the trouble starts: these make up 6 different levels.
Combinations of & ^ | are rare enough, as bitwise operations, that
you'd use parentheses anyway. They don't need 3 separate levels.
Comparison ones don't need 2 levels.
And shift operators don't really need their own level either. (Since
they scale numbers just like * and /, they can be lumped in with
those. Having 'a * 8 + b' mean the same as 'a << 3 + b' makes sense;
currently they have quite different meanings.)
I wasn't trying to help you. I know that's a lost cause.
On 14/05/2026 18:00, Bart wrote:
On 14/05/2026 16:37, Tim Rentsch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[..discussing C expression syntax..]
[...] [Remembering precedence in C is difficult because of]
a mis-ranking of a class of operators in "C". (I noticed that
already when I read K&R some time around 1985, but I first saw
that "officially" acknowledged not too long ago when someone
posted a link to a paper from, IIRC, some time in the 1990's
written by one of the authors of "C".) - And that discrepancy
detail in C's precedence ranking was actually the only reason
for me looking "regularly" into the precedence table of my K&R.
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense and should be easily usable without doubt
by a concept-knowing programmer. But note that, historically,
a sort of "rationale" can be formulated for the discrepancy to
justify the given choice in context of specifically "C". But
still remember the "official" acknowledgement of an issue here.)
I think it's easy to remember how expressions in C work with the
help of just a few memory aids:
1. unary operators are always ahead of binary operators, first
those on the right and then those on the left;
Unary operators aren't the problem. It's a mystery why they need to be
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.
Unary operator precedence is certainly important. (*p)++ and *(p++)
are very different things, and *p++ can only mean one of them.
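The difference between the two readings of `*p++` can be checked directly. A minimal sketch (function names are illustrative):

```c
#include <assert.h>

/* *p++ is *(p++): yields the current element, then advances the pointer. */
static int read_and_advance(const int **pp)
{
    const int *p = *pp;
    int v = *p++;
    *pp = p;
    return v;
}

/* (*p)++ increments the pointed-to object and leaves the pointer alone. */
static int bump_target(int *p)
{
    return (*p)++;
}

static void check_deref_increment(void)
{
    int a[2] = { 10, 20 };
    const int *q = a;
    assert(read_and_advance(&q) == 10);
    assert(q == a + 1);            /* pointer moved, array untouched */
    assert(a[0] == 10);

    int n = 5;
    assert(bump_target(&n) == 5);  /* old value returned */
    assert(n == 6);                /* object incremented */
}
```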
On 14/05/2026 17:51, David Brown wrote:
On 14/05/2026 18:00, Bart wrote:
On 14/05/2026 16:37, Tim Rentsch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[..discussing C expression syntax..]
[...] [Remembering precedence in C is difficult because of]
a mis-ranking of a class of operators in "C". (I noticed that
already when I read K&R some time around 1985, but I first saw
that "officially" acknowledged not too long ago when someone
posted a link to a paper from, IIRC, some time in the 1990's
written by one of the authors of "C".) - And that discrepancy
detail in C's precedence ranking was actually the only reason
for me looking "regularly" into the precedence table of my K&R.
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense and should be easily usable without doubt
by a concept-knowing programmer. But note that, historically,
a sort of "rationale" can be formulated for the discrepancy to
justify the given choice in context of specifically "C". But
still remember the "official" acknowledgement of an issue here.)
I think it's easy to remember how expressions in C work with the
help of just a few memory aids:
1. unary operators are always ahead of binary operators, first
those on the right and then those on the left;
Unary operators aren't the problem. It's a mystery why they need to
be in a table at all. Nobody's going to think that '&a + b' means
'&(a + b)'.
Unary operator precedence is certainly important. (*p)++ and *(p++)
are very different things, and *p++ can only mean one of them.
Yes, but it is nothing to do with the precedences of binary operators**. This is pretty much universal.
Unary ops in charts share the one precedence level, and have their own
rules when there is a cluster of them around the same term.
That is, start on the one to the immediate right, and work left to
right, then the one on the immediate left and go right to left. If 'a b
c d' are unary operators, then:
  a b X c d
is evaluated as a(b(d(c(X)))).
However I couldn't find a valid example in C of successive post-fix
operators, so there will be at most one on the right. In that case,
'right to left' is accurate.
Still, Tim said '/those/' on the right, so I'd be interested if there
was in fact such an example; then I think you would have to evaluate
them 'inside-out' like my example.
But that may just have meant the '++ --' operators in general, not more
than one in any one example.
(** I don't count '.' as a binary operator.)
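On the open question of successive postfix operators: C's grammar groups subscripting, function calls, `.` and `->` together with postfix `++` and `--`, so chains do seem possible once those are counted. A sketch in which three postfix operators follow one operand and apply left to right, i.e. inside-out (the struct and function names are invented here):

```c
#include <assert.h>

struct counter_set { int hits[3]; };   /* hypothetical type */

/* p->hits[i]++ parses as ((p->hits)[i])++ : postfix operators bind
   left to right, moving outward from the operand. */
static int bump_hit(struct counter_set *p, int i)
{
    return p->hits[i]++;
}

static void check_postfix_chain(void)
{
    struct counter_set s = { { 0, 7, 0 } };
    assert(bump_hit(&s, 1) == 7);   /* old value returned */
    assert(s.hits[1] == 8);         /* element was incremented */
    assert(s.hits[0] == 0);         /* neighbours untouched */
}
```

Whether this counts as an example depends on whether one is willing to call `->` and `[]` operators in the same sense as `++`.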
On 14/05/2026 01:59, Janis Papanagnou wrote:
[...]
(The point is that - with the exception of & ^ | - the ranking
makes perfectly sense
It doesn't make sense even then; here are the remaining groups for
binary ops from high to low:
(* / %) (+ -) (<< >>) (< <= >= >) (== !=) (&&) (||) (=)
Why are the shift operators at that spot? This causes chaos in
expressions like 'a << 3 + b' which are parsed as 'a << (3 + b)'.
Why are == and != lower precedence than the other compare operators?
In which circumstances would that be an advantage? This is just a
pointless extra level, as such usage would be so unusual that you'd
use parentheses anyway.
TBF, while other languages may not have as many levels, they also have
questionable choices, because there are no standards.
At best it is generally agreed that there are 3 groups (4 including
assignment) again arranged from high to low:
1 School arithmetic which everyone knows
2 Comparisons
3 Logical (and, or)
4 (Assignment)
These should be intuitive, all that's left is the ordering within
group 1 and group 3, and also where these extra ops need to go:
<< >> & | ^
In the case of C, it also decided that ?: belongs in this chart of
/binary/ operators. (I suppose you can consider each of ? and : as a
binary operator...)
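One place where the separate level for == and != shows up is comparing the outcomes of two relational tests without parentheses. A sketch of how that parses today, offered as illustration rather than as an argument that the extra level is worth having (the function name is made up, and compilers tend to suggest parentheses here):

```c
#include <assert.h>

/* a < b == c < d  parses as  (a < b) == (c < d):
   true when both comparisons have the same outcome. */
static int same_ordering(int a, int b, int c, int d)
{
    return a < b == c < d;
}

static void check_same_ordering(void)
{
    assert(same_ordering(1, 2, 3, 4) == 1);   /* both true */
    assert(same_ordering(2, 1, 4, 3) == 1);   /* both false */
    assert(same_ordering(1, 2, 4, 3) == 0);   /* outcomes differ */
}
```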
In article <86pl2yi0n3.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
My main point is that "byte" and "octet" are talking about
different kinds of things.
Not really. "Octet" has always been understood to refer to the same
kind of thing that "byte" refers to.
The problem was that, at the time the term "octet" was coined,
the size of a byte (measured in bits) varied between different
computers, and sometimes on the same computer. When people
started getting serious about making computers talk to one
another, this became an issue: hence "octet", to have standard
nomenclature.
A computer might have 64k bytes of
RAM, but normally I wouldn't (and I think normally other people
wouldn't) say that a computer has 64k octets of RAM.
Some would, though it may sound a bit odd.
At this point, the term "byte" has been standardized by several
different bodies (IEC, ISO) to be synonymous with octet. The
continued use of "octet" by organizations like the IETF is
mostly a legacy curiosity.
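In C terms the distinction still surfaces in `<limits.h>`: a byte is `CHAR_BIT` bits wide, which the standard only requires to be at least 8, whereas an octet is exactly 8 bits. A small sketch of converting between the two when sizing a wire buffer (the helper name is made up for this example):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of 8-bit octets needed to carry `nbytes` C bytes. */
static size_t octets_for_bytes(size_t nbytes)
{
    return (nbytes * CHAR_BIT + 7) / 8;
}

static void check_octets(void)
{
    assert(CHAR_BIT >= 8);               /* guaranteed by the standard */
    assert(octets_for_bytes(0) == 0);
    assert(octets_for_bytes(4) >= 4);    /* equal when CHAR_BIT == 8 */
}
```

On the now-ubiquitous platforms where `CHAR_BIT` is 8 the function is the identity, which is roughly the point being made above.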
On 14/05/2026 03:58, Keith Thompson wrote:
[...]
With nested loops, "break" or "continue" always refers to the
innermost loop. With a switch statement inside a loop, "break"
refers to the switch statement, but "continue" refers to the loop.
It's obviously not impossible to deal with, but I find it mildly
annoying.
Break doing the two jobs is a flaw. 'break' and 'continue' being
inconsistent is a further one:
Suppose you have a loop, and within the loop, you have an if-else-if
chain within which are 'break' and 'continue' statements.
You decide that that if-else-if chain is better off as a switch. But
now, while 'continue' continues to do its job, 'break' silently
behaves differently.
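The refactoring hazard described above can be reproduced in a few lines. This is a sketch with invented function names and data, showing the same `continue` and `break` bodies before and after the chain becomes a switch:

```c
#include <assert.h>

/* if/else chain: `break` leaves the loop at the first negative value. */
static int sum_with_if(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] == 0)
            continue;
        else if (v[i] < 0)
            break;
        sum += v[i];
    }
    return sum;
}

/* Same bodies moved into a switch: `continue` still targets the loop,
   but `break` now only leaves the switch, so negatives get summed. */
static int sum_with_switch(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        switch ((v[i] > 0) - (v[i] < 0)) {   /* -1, 0 or +1 */
        case 0:
            continue;
        case -1:
            break;
        }
        sum += v[i];
    }
    return sum;
}

static void check_break_refactor(void)
{
    const int v[] = { 1, 0, -5, 2 };
    assert(sum_with_if(v, 4) == 1);        /* stopped at -5 */
    assert(sum_with_switch(v, 4) == -2);   /* 1 + (-5) + 2 */
}
```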
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...] the issue being discussed was that multiple cases (that may
not be contiguous) depend on the default fallthrough behavior.
case 10:
case 20:
case 30:
whatever();
break;
In a hypothetical C-like language without default fallthrough, it
would make sense to invent a different syntax. For C repeating the
"case" keyword is slightly ugly, but probably not worth fixing.
Both gcc and clang, with options -std=c99 -pedantic -Wall -Wextra,
accept the code below and give no diagnostics:
#include <stdio.h>
#include "cases.h"
int
main( int argc, char *argv[] ){
switch( argc-1 ){
cases( 2, 3, 5, 8 ):
The cases() macro is a fairly straightforward application of variadic
macros, as follows:
#define ARGS_N_(...) \
ARGS_N_X_( __VA_ARGS__, \
09, 08, 07, 06, 05, 04, 03, 02, 01, 00 \
)
#define ARGS_N_X_( dummy, _9, _8, _7, _6, _5, _4, _3, _2, _1, ... ) _1
#define cases(...) casesx_( ARGS_N_( __VA_ARGS__ ), __VA_ARGS__ )
#define casesx_( N, ... ) casesy_( N, __VA_ARGS__ )
#define casesy_( N, ... ) cases_ ## N ## _( __VA_ARGS__ )
#define cases_01_(a) case a
#define cases_02_(a,...) case a : cases_01_( __VA_ARGS__ )
#define cases_03_(a,...) case a : cases_02_( __VA_ARGS__ )
#define cases_04_(a,...) case a : cases_03_( __VA_ARGS__ )
#define cases_05_(a,...) case a : cases_04_( __VA_ARGS__ )
#define cases_06_(a,...) case a : cases_05_( __VA_ARGS__ )
#define cases_07_(a,...) case a : cases_06_( __VA_ARGS__ )
#define cases_08_(a,...) case a : cases_07_( __VA_ARGS__ )
#define cases_09_(a,...) case a : cases_08_( __VA_ARGS__ )
It's easy to see how to extend this definition to allow more cases, if
that is needed.
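For anyone who wants to try the technique in isolation, here is a cut-down sketch that supports at most four labels. The macro structure follows the definitions above; the reduced argument counter and the helper function in_set() are assumptions added for the example.

```c
/* Reduced variant of the cases() macro, limited to four labels to keep
   the sketch short. The counting trick is the same as in the full
   version: the argument list shifts the numbers so that _1 lands on the
   argument count. */
#define ARGS_N_(...)  ARGS_N_X_(__VA_ARGS__, 04, 03, 02, 01, 00)
#define ARGS_N_X_(dummy, _4, _3, _2, _1, ...)  _1

#define cases(...)        casesx_(ARGS_N_(__VA_ARGS__), __VA_ARGS__)
#define casesx_(N, ...)   casesy_(N, __VA_ARGS__)
#define casesy_(N, ...)   cases_ ## N ## _(__VA_ARGS__)

#define cases_01_(a)      case a
#define cases_02_(a, ...) case a : cases_01_(__VA_ARGS__)
#define cases_03_(a, ...) case a : cases_02_(__VA_ARGS__)
#define cases_04_(a, ...) case a : cases_03_(__VA_ARGS__)

/* Hypothetical helper: returns 1 for 2, 3, 5 or 8, otherwise 0. */
static int in_set(int n)
{
    switch (n) {
    cases(2, 3, 5, 8):
        return 1;
    default:
        return 0;
    }
}
```

The label line expands to "case 2 : case 3 : case 5 : case 8", and the colon written after the macro call completes the last label.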
Personally I would rather have something that works in C99, etc, now,
than to wait for some possible change at some point in the indefinite
future.
Bart <bc@freeuk.com> writes:
On 14/05/2026 01:59, Janis Papanagnou wrote:[...]
(The point is that - with the exception of & ^ | - the ranking
makes perfect sense
It doesn't make sense even then; here are the remaining groups for
binary ops from high to low:
(* / %) (+ -) (<< >>) (< <= >= >) (== !=) (&&) (||) (=)
Why are the shift operators at that spot? This causes chaos in
expressions like 'a << 3 + b' which are parsed as 'a << (3 + b)'.
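A small sketch of the shift example, for readers who want to check it. The function names are made up for the illustration.

```c
/* What a reader might expect 'a << 3 + b' to mean. */
static unsigned shift_then_add(unsigned a, unsigned b)
{
    return (a << 3) + b;
}

/* What C actually does: '+' binds tighter than '<<', so the expression
   below is a << (3 + b). gcc and clang can warn about the missing
   parentheses here. */
static unsigned as_written(unsigned a, unsigned b)
{
    return a << 3 + b;
}
```

With a = 1 and b = 1 the first function gives 9 and the second gives 16.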
I've never heard anyone claim that C's operator precedence rules are
ideal. They aren't. But they can't be changed without breaking
existing code, so there's little point in complaining about it.
TBF, while other languages may not have as many levels, they also have
questionable choices, because there are no standards.
Plenty of other languages have standards.
At best it is generally agreed that there are 3 groups (4 including
assignment) again arranged from high to low:
Agreed by whom?
1 School arithmetic which everyone knows
2 Comparisons
3 Logical (and, or)
4 (Assignment)
Bart <bc@freeuk.com> writes:
Why are == and != lower precedence than the other compare operators?
In which circumstances would that be an advantage? This is just a
pointless extra level, as such usage would be so unusual that you'd
use parentheses anyway.
I suggest that it doesn't matter why. It is what it is. And yes,
I'd add parentheses in the unlikely event that I needed to write
an expression that uses both equality and comparison operators
(unless I were writing deliberately obfuscated code, which I've
been known to do).
[...]
These should be intuitive, all that's left is the ordering within
group 1 and group 3, and also where these extra ops need to go:
<< >> & | ^
[...][...]
Bart <bc@freeuk.com> writes:
[...]
Unary operators aren't the problem. It's a mystery why they need to be[...]
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.
It would be silly for an operator precedence table to omit the
operators that "everybody knows". If I had a table that didn't
show *all* the operators, I'd look for a better table (like the
one in K&R2).
On 14/05/2026 01:59, Janis Papanagnou wrote:
[...]
(The point is that - with the exception of & ^ | - the ranking
makes perfect sense
It doesn't make sense even then; [...]
[...]
TBF, while other languages may not have as many levels, they also have questionable choices, because there are no standards.
[...]
In the case of C, it also decided that ?: belongs in this chart of
/binary/ operators. (I suppose you can consider each of ? and : as a
binary operator...)
On 14/05/2026 16:37, Tim Rentsch wrote:
[...]
  1. unary operators are always ahead of binary operators, first
     those on the right and then those on the left;
Unary operators aren't the problem. It's a mystery why they need to be
in a table at all. [...]
[...]
On 15/05/2026 00:40, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Unary operators aren't the problem. It's a mystery why they need to be[...]
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.
It would be silly for an operator precedence table to omit the
operators that "everybody knows". If I had a table that didn't
show *all* the operators, I'd look for a better table (like the
one in K&R2).
Do you need to know the precedence of a unary operator (say applied to
any of these terms) in order to correctly parse this:
a op1 b op2 c
?
'a b c' are terms, and 'op1 op2' are operators. You need to know their relative precedences in order to correctly parse this as either '(a
op1 b) op2 c' or 'a op1 (b op2 c)'. Any unary ops on those terms don't
affect that.
On 14/05/2026 20:19, Bart wrote:
[...]
I certainly agree it would be odd if there were binary arithmetic
operators with higher precedence than unary operators. [...]
On 2026-05-14 20:50, David Brown wrote:
On 14/05/2026 20:19, Bart wrote:
[...]
I certainly agree it would be odd if there were binary arithmetic
operators with higher precedence than unary operators. [...]
I've just posted an example; exponentiation. It depends on the
language. (Below examples from Algol 68 and Awk...)
$ genie -p '-2^4'
                 +16
$ awk 'BEGIN{print -2^4}'
-16
But (I think; unless that has changed) "C" has no exponentiation
operator, so it doesn't apply here at least.
But if you start with any new language you have to inspect the
documentation about its operators and their precedence rules.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
At this point, the term "byte" has been standardized by several
different bodies (IEC, ISO) to be synonymous with octet. The
continued use of "octet" by organizations like the IETF is
mostly a legacy curiosity.
Has it? The ISO C and C++ standards certainly do not use "byte"
to mean exactly 8 bits. ISO/IEC 2382 says:
byte
string that consists of a number of bits, treated as a unit, and
usually representing a character or a part of a character
Note 1 to entry: The number of bits in a byte is fixed for a given
data processing system.
Note 2 to entry: The number of bits in a byte is usually 8.
and
octet
8-bit byte
byte that consists of eight bits
<https://www.iso.org/obp/ui/#iso:std:iso-iec:2382:ed-1:v2:en>
The latter implies that you can't have octets on a system with,
say, 16-bit bytes, which doesn't match what I would have expected.
I would think it would be reasonable to say that a system with
16-bit bytes has, say, 32k bytes or 64k octets of memory. But C
doesn't use the word "octet", so this is at best marginally topical.
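The "32k bytes or 64k octets" arithmetic can be written down directly. The sketch below is an illustration only; both function names are invented for it, and it assumes the byte width is a multiple of 8.

```c
#include <limits.h>   /* CHAR_BIT: number of bits in a C byte */

/* Converts a size in bytes to octets for a machine whose bytes are
   bits_per_byte wide. Assumes bits_per_byte is a multiple of 8. */
static unsigned long bytes_to_octets(unsigned long bytes,
                                     unsigned bits_per_byte)
{
    return bytes * (bits_per_byte / 8);
}

/* Same conversion for the implementation compiling this code. */
static unsigned long host_bytes_to_octets(unsigned long bytes)
{
    return bytes_to_octets(bytes, CHAR_BIT);
}
```

bytes_to_octets(32768, 16) yields 65536, which matches the 32k bytes / 64k octets example when "k" is read as 1024.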
[...]
It's not really helped by having a table that combines
precedences of different kinds of operator.
[...]
Anyone curious about how far C's switch statements can be used or
abused, might like to read about "Protothreads" :
<https://en.wikipedia.org/wiki/Protothread>
This is a conglomeration of Duff's Device on steroids with supporting
macros that gives you a limited type of stackless cooperative
multitasking with extremely low overhead. The library has seen real
usage in small embedded systems. Reactions to the underlying
implementation range from thinking it is a hideous abuse of a bad
language design, to elegant and very ingenious.
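To give a flavour of the trick, here is a minimal sketch of a switch-based resumable function. It is not the Protothreads API: the CO_* macro names, struct co, and count_to_three() are all invented for this illustration, and the real library differs in detail.

```c
/* State that must survive between calls: the resume point and any
   "local" variables (an ordinary local would be lost on return). */
struct co { int line; int i; };

/* CO_BEGIN opens a switch on the saved line number. CO_YIELD records
   the current line, returns 1, and plants a case label at the same
   spot so the next call jumps straight back in. CO_END closes the
   switch, resets the state and returns 0. */
#define CO_BEGIN(c)  switch ((c)->line) { case 0:
#define CO_YIELD(c)  do { (c)->line = __LINE__; return 1; case __LINE__:; } while (0)
#define CO_END(c)    } (c)->line = 0; return 0

/* Yields 1, 2, 3 through *out on successive calls, then returns 0. */
static int count_to_three(struct co *c, int *out)
{
    CO_BEGIN(c);
    for (c->i = 1; c->i <= 3; c->i++) {
        *out = c->i;
        CO_YIELD(c);
    }
    CO_END(c);
}
```

Starting from a zero-initialised struct co, the first three calls return 1 and store 1, 2 and 3; the fourth returns 0. The case label inside the for loop is what makes this a relative of Duff's Device.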
In article <86pl2yi0n3.fsf@linuxsc.com>,[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
My main point is that "byte" and "octet" are talking about
different kinds of things.
Not really. It has always been understood to refer to the same
kind of thing that "byte" refers to.
I agree, at least for the way I understand the terms. For me,
"octet" and "byte" refer to the same kind of thing. The difference
is that an "octet" is specifically 8 bits, and a "byte" is a
fundamental unit of storage for a given system (commonly 8 bits).
ISO/IEC 2382 happens to agree with me.
The problem was that, at the time the term "octet" was coined,
the size of a byte (measured in bits) varied between different
computers, and sometimes on the same computer. When people
started getting serious about making computers talk to one
another, this became an issue: hence octet to have standard
nomenclature.
A computer might have 64k bytes of
RAM, but normally I wouldn't (and I think normally other people
wouldn't) say that a computer has 64k octets of RAM.
Some would, though it may sound a bit odd.
Agreed. "64k bytes" is certainly more common, but "64k octets"
means essentially the same thing while being more specific.
Also, the "k" suffix formally means 1000, but is often used to mean
1024, which is why we have "Ki", "kibi" to denote a power of two
explicitly.
[...]
At this point, the term "byte" has been standardized by several
different bodies (IEC, ISO) to be synonymous with octet. The
continued use of "octet" by organizations like the IETF is
mostly a legacy curiosity.
Has it?
The ISO C and C++ standards certainly do not use "byte"
to mean exactly 8 bits.
ISO/IEC 2382 says:
byte
string that consists of a number of bits, treated as a unit, and
usually representing a character or a part of a character
Note 1 to entry: The number of bits in a byte is fixed for a given
data processing system.
Note 2 to entry: The number of bits in a byte is usually 8.
and
octet
8-bit byte
byte that consists of eight bits
<https://www.iso.org/obp/ui/#iso:std:iso-iec:2382:ed-1:v2:en>
The latter implies that you can't have octets on a system with,
say, 16-bit bytes, which doesn't match what I would have expected.
I would think it would be reasonable to say that a system with
16-bit bytes has, say, 32k bytes or 64k octets of memory. But C
doesn't use the word "octet", so this is at best marginally topical.
It's not really helped by having a table that combines[...]
precedences of different kinds of operator.
In article <10u5m4m$uo0d$3@kst.eternal-september.org>,[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
At this point, the term "byte" has been standardized by several
different bodies (IEC, ISO) to be synonymous with octet. The
continued use of "octet" by organizations like the IETF is
mostly a legacy curiosity.
Has it?
Yes. IEC 80000-13 declares them to be synonyms.
In article <10u5m4m$uo0d$3@kst.eternal-september.org>,[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
At this point, the term "byte" has been standardized by several
different bodies (IEC, ISO) to be synonymous with octet. The
continued use of "octet" by organizations like the IETF is
mostly a legacy curiosity.
Has it?
Yes. IEC 80000-13 declares them to be synonyms.
Interesting. It's odd that ISO/IEC 2382 and ISO/IEC 80000-13
disagree with each other.
<https://www.iso.org/standard/87648.html>
IEC 80000-13:2025 Quantities and units
Part 13: Information science and technology
I'm not going to spend 115 Swiss Francs (currently
146.39 USD) to get a copy.
If you have a copy, can you quote the relevant wording?
On 2026-05-14 20:50, David Brown wrote:
On 14/05/2026 20:19, Bart wrote:
[...]
I certainly agree it would be odd if there were binary arithmetic
operators with higher precedence than unary operators. [...]
I've just posted an example; exponentiation. It depends on the
language. (Below examples from Algol 68 and Awk...)
$ genie -p '-2^4'
                 +16
$ awk 'BEGIN{print -2^4}'
-16
But (I think; unless that has changed) "C" has no exponentiation
operator, so it doesn't apply here at least.
But if you start with any new language you have to inspect the
documentation about its operators and their precedence rules.
Janis
On 15/05/2026 00:40, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Unary operators aren't the problem. It's a mystery why they need to be[...]
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.
It would be silly for an operator precedence table to omit the
operators that "everybody knows". If I had a table that didn't
show *all* the operators, I'd look for a better table (like the
one in K&R2).
Do you need to know the precedence of a unary operator (say applied to
any of these terms) in order to correctly parse this:
   a op1 b op2 c
?
'a b c' are terms, and 'op1 op2' are operators. You need to know their relative precedences in order to correctly parse this as either '(a op1
b) op2 c' or 'a op1 (b op2 c)'. Any unary ops on those terms don't
affect that.
On 15/05/2026 02:31, Bart wrote:[...]
Do you need to know the precedence of a unary operator (say applied
to any of these terms) in order to correctly parse this:
   a op1 b op2 c
?
'a b c' are terms, and 'op1 op2' are operators. You need to know
their relative precedences in order to correctly parse this as
either '(a op1 b) op2 c' or 'a op1 (b op2 c)'. Any unary ops on
those terms don't affect that.
That argument makes no sense.
$ awk 'BEGIN{print -2^4}'
-16
On 15/05/2026 02:31, Bart wrote:
On 15/05/2026 00:40, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Unary operators aren't the problem. It's a mystery why they need to be[...]
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.
It would be silly for an operator precedence table to omit the
operators that "everybody knows". If I had a table that didn't
show *all* the operators, I'd look for a better table (like the
one in K&R2).
Do you need to know the precedence of a unary operator (say applied to
any of these terms) in order to correctly parse this:
    a op1 b op2 c
?
'a b c' are terms, and 'op1 op2' are operators. You need to know their
relative precedences in order to correctly parse this as either '(a
op1 b) op2 c' or 'a op1 (b op2 c)'. Any unary ops on those terms don't
affect that.
That argument makes no sense.
You don't need to know where binary "-" fits in the precedence ordering
in order to correctly parse :
    a + b / c
However, I doubt if you would be happy with a table of operators that
omitted binary minus.
Yes, it would be possible to draw tables of C operator precedence where
you had separate tables for each type of operator, and then a separate
description of how they fit together. But it is a lot simpler, clearer
and easier to use if you have a table that includes them all.
"switch" was originally implemented in a way that, I suspect, was
easier for the compiler to implement
"switch" was originally implemented in a way that, I suspect, was
easier for the compiler to implement
It would also have been familiar from BCPL. When C was designed, switch
would have been recognised as a direct equivalent of BCPL's SWITCHON
construct:
https://archive.org/details/bcpl_20200522/page/19/mode/2up
There's no equivalent of break in that version of BCPL; if you look at
example code from that era (e.g. the Xerox Alto BCPL manuals), the
convention was to use GOTO at the end of each case with a label after
the block. Later versions of BCPL have ENDCASE which works like break:
https://archive.org/details/DTIC_ADA003599/page/41/mode/2up
(and I wonder whether this was influenced by C).
On 15/05/2026 09:32, David Brown wrote:
On 15/05/2026 02:31, Bart wrote:
On 15/05/2026 00:40, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Unary operators aren't the problem. It's a mystery why they need to be
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.[...]
It would be silly for an operator precedence table to omit the
operators that "everybody knows". If I had a table that didn't
show *all* the operators, I'd look for a better table (like the
one in K&R2).
Do you need to know the precedence of a unary operator (say applied to
any of these terms) in order to correctly parse this:
    a op1 b op2 c
?
'a b c' are terms, and 'op1 op2' are operators. You need to know their
relative precedences in order to correctly parse this as either '(a
op1 b) op2 c' or 'a op1 (b op2 c)'. Any unary ops on those terms don't
affect that.
That argument makes no sense.
You don't need to know where binary "-" fits in the precedence ordering
in order to correctly parse :
    a + b / c
However, I doubt if you would be happy with a table of operators that
omitted binary minus.
What do you mean by 'binary "-"' and 'binary minus'?
Are they both the
operator in "x - y" or did one or both mean the unary negation operator
in "-z"?
Yes, it would be possible to draw tables of C operator precedence where
you had separate tables for each type of operator, and then a separate
description of how they fit together. But it is a lot simpler, clearer
and easier to use if you have a table that includes them all.
Disagree: in C, the only thing I've used the precedence table for is for
the relative precedence of op1 and op2 in examples like mine.
It's not even useful inside C compilers; you tend to follow the grammar
rather than have it table-driven using such a chart.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
"switch" was originally implemented in a way that, I suspect, was
easier for the compiler to implement
It would also have been familiar from BCPL. When C was designed, switch
would have been recognised as a direct equivalent of BCPL's SWITCHON construct:
https://archive.org/details/bcpl_20200522/page/19/mode/2up
There's no equivalent of break in that version of BCPL; if you look at example code from that era (e.g. the Xerox Alto BCPL manuals), the
convention was to use GOTO at the end of each case with a label after
the block. Later versions of BCPL have ENDCASE which works like break:
https://archive.org/details/DTIC_ADA003599/page/41/mode/2up
(and I wonder whether this was influenced by C).
In article <10u5m4m$uo0d$3@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <86pl2yi0n3.fsf@linuxsc.com>,[...]
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Also, the "k" suffix formally means 1000, but is often used to mean
1024, which is why we have "Ki", "kibi" to denote a power of two
explicitly.
Yes. K, M, G, etc, have always been the SI indicators for
powers of 10, not powers of 2. The "Ki", "Mi", "Gi", etc, forms
are (as I understand it) relatively new.
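The difference between the two readings of "64k" is easy to spell out. The names below are invented for this sketch.

```c
/* SI prefix kilo (k) versus the binary prefix kibi (Ki). */
enum { KILO = 1000, KIBI = 1024 };

static long kilobytes_to_bytes(long n) { return n * KILO; }
static long kibibytes_to_bytes(long n) { return n * KIBI; }
```

kilobytes_to_bytes(64) is 64000, while kibibytes_to_bytes(64) is 65536, which is the figure usually meant by "64k of RAM".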
[...]
At this point, the term "byte" has been standardized by several
different bodies (IEC, ISO) to be synonymous with octet. The
continued use of "octet" by organizations like the IETF is
mostly a legacy curiosity.
Has it?
Yes. IEC 80000-13 declares them to be synonyms.
The ISO C and C++ standards certainly do not use "byte"
to mean exactly 8 bits.
Indeed. I don't blame them. I suspect there are some DSP chips
or weird one-off processors with oddball byte sizes, even now.
I would think it would be reasonable to say that a system with
16-bit bytes has, say, 32k bytes or 64k octets of memory. But C
doesn't use the word "octet", so this is at best marginally topical.
I wonder. For word oriented systems, it was common to describe
memory in terms of words (e.g., "the KL-10B processor with
extended addressing supports a maximum of 4 MW of memory...").
Similarly, even for byte-addressed machines, like the PDP-11,
memory capacities were often described in terms of 16-bit words
("this machine has 256 KW of memory", aka, 512 KB). [Of course,
these machines all predate common use of the "Ki" and "Mi"
units.] Anyway, there is some precedent for using the machine
specific sizes in discussion, though I agree generally that
using octets makes sense in this context.
None of this has much to do with C, though, as you point out.
- Dan C.
On 15/05/2026 09:32, David Brown wrote:
On 15/05/2026 02:31, Bart wrote:
On 15/05/2026 00:40, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Unary operators aren't the problem. It's a mystery why they need to be
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.[...]
It would be silly for an operator precedence table to omit the
operators that "everybody knows". If I had a table that didn't
show *all* the operators, I'd look for a better table (like the
one in K&R2).
Do you need to know the precedence of a unary operator (say applied
to any of these terms) in order to correctly parse this:
    a op1 b op2 c
?
'a b c' are terms, and 'op1 op2' are operators. You need to know
their relative precedences in order to correctly parse this as either
'(a op1 b) op2 c' or 'a op1 (b op2 c)'. Any unary ops on those terms
don't affect that.
That argument makes no sense.
You don't need to know where binary "-" fits in the precedence
ordering in order to correctly parse :
    a + b / c
However, I doubt if you would be happy with a table of operators that
omitted binary minus.
What do you mean by 'binary "-"' and 'binary minus'? Are they both the operator in "x - y" or did one or both mean the unary negation operator
in "-z"?
In my example, 'a b c' each represent arbitrary terms. These are
examples of such terms:
   -x
   &x
   ++x[i]
   x(i, j)
   (x + y)
   x.m--
   -(+(-(sizeof(x))))
In article <10u6t2n$4mai$1@dont-email.me>, Bart <bc@freeuk.com> wrote:
On 15/05/2026 09:32, David Brown wrote:
On 15/05/2026 02:31, Bart wrote:
On 15/05/2026 00:40, Keith Thompson wrote:
Bart <bc@freeuk.com> writes:
[...]
Unary operators aren't the problem. It's a mystery why they need to be
in a table at all. Nobody's going to think that '&a + b' means '&(a +
b)'.[...]
It would be silly for an operator precedence table to omit the
operators that "everybody knows". If I had a table that didn't
show *all* the operators, I'd look for a better table (like the
one in K&R2).
Do you need to know the precedence of a unary operator (say applied to
any of these terms) in order to correctly parse this:
    a op1 b op2 c
?
'a b c' are terms, and 'op1 op2' are operators. You need to know their
relative precedences in order to correctly parse this as either '(a
op1 b) op2 c' or 'a op1 (b op2 c)'. Any unary ops on those terms don't
affect that.
That argument makes no sense.
You don't need to know where binary "-" fits in the precedence ordering
in order to correctly parse :
    a + b / c
However, I doubt if you would be happy with a table of operators that
omitted binary minus.
What do you mean by 'binary "-"' and 'binary minus'?
As a binary operator, `-` refers to subtraction. Note that the
expression quoted above does not contain subtraction. Therefore
one does not need to know the precedence of the subtraction
operator to parse that expression.
'binary "-"' and 'binary minus' mean the same thing; in the
former he used a literal `-` character, and in the latter he
substituted the name of the symbol.
Are they both the
operator in "x - y" or did one or both mean the unary negation operator
in "-z"?
It says it right there on the tin, dude. It ain't that hard.
Disagree: in C, the only thing I've used the precedence table for is for
the relative precedence of op1 and op2 in examples like mine.
Not the flex you think it is....
On 15/05/2026 12:38, Bart wrote:
On 15/05/2026 09:32, David Brown wrote:
However, I doubt if you would be happy with a table of operators that
omitted binary minus.
What do you mean by 'binary "-"' and 'binary minus'? Are they both the
operator in "x - y" or did one or both mean the unary negation
operator in "-z"?
I meant "binary minus", written either "minus" or "-". If I had meant
unary minus or negation, I would not have written "binary".
In my example, 'a b c' each represent arbitrary terms. These are
examples of such terms:
    -x
    &x
    ++x[i]
    x(i, j)
    (x + y)
    x.m--
    -(+(-(sizeof(x))))
Yes. So?
What you wrote is that if you have an expression where only the binary operators are of relevance or interest, you only need a table of the
binary operators in order to understand the interaction between them.
I pointed out that the same logic applies if you have an expression
where only some binary operators are used - you only need a table with
those binary operators in order to understand the interactions.
There is no benefit in having multiple tables for the normal operators
in a language - it is simpler and clearer to put them all in one table. Isolating the unary operators is no more logical or useful than
isolating the binary minus operator.
Prefix and Postfix ops together, when clustered around a specific term,
have their own set of rules. They are quite different from binary ops.
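A few concrete cases of how prefix, postfix and binary operators group in C may help here. The sketch below is illustrative; the function names are invented, and it shows only the grouping that the C grammar prescribes.

```c
#include <stddef.h>

/* '*p++' groups as '*(p++)': the postfix ++ applies to p, not to *p.
   Returns the element read first, times 100, plus the element p
   points at afterwards. */
static int star_postinc(void)
{
    int arr[] = {10, 20};
    int *p = arr;
    int first = *p++;
    return first * 100 + *p;
}

/* '(int)x * 2' groups as '((int)x) * 2': the cast is a unary operator
   and is applied before the binary '*'. */
static int cast_then_multiply(double x)
{
    return (int)x * 2;
}

/* 'sizeof x + 1' groups as '(sizeof x) + 1', not 'sizeof (x + 1)'. */
static size_t sizeof_then_add(void)
{
    char x = 0;
    return sizeof x + 1;
}
```

star_postinc() returns 1020, cast_then_multiply(1.5) returns 2 rather than 3, and sizeof_then_add() returns 2. In each case postfix operators bind first, then prefix ones, then the binary operator.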
scott@slp53.sl.home (Scott Lurndal) writes:
cross@spitfire.i.gajendra.net (Dan Cross) writes:[...]
In article <10u2jpk$2t96p$6@kst.eternal-september.org>,
If C's switch statement were to be
changed, it would have to use something that's currently a syntax
error. Perhaps something like
case 1, case 2, case 3, case 4: whatever();
Sure, that's better.
case 1...4: whatever();
is a typical GCC extension (that we use heavily).
Yes, and the C2y draft adopts that syntax.
(One possible reason it wasn't adopted sooner is that `case 'a'...'z'`
doesn't necessarily work if the letters are not contiguous, for
example in EBCDIC.)
On 15/05/2026 14:54, Bart wrote:
Prefix and Postfix ops together, when clustered around a specific term,
have their own set of rules. They are quite different from binary ops.
I'm going to bail out here. This is not going anywhere.
Either people don't understand the subject, or are pretending not to, or
just want to have a go.
I have quite a bit of knowledge and practical experience of the subject,
even if people here don't like to admit that, but I'm poor at getting
the point across.
I will reconsider my claim that precedence tables don't need to include
anything beyond binary ops, if somebody can give a reference to such a
table in the C standard.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
(One possible reason it wasn't adopted sooner is that `case 'a'...'z'`
doesn't necessarily work if the letters are not contiguous, for
example in EBCDIC.)
I think that's a bit far-fetched. Regular expressions
have the same EBCDIC related issues (i.e. the discontinuous
nature of the EBCDIC alpha translations); yet, there are
no other defined characters in the gaps between the
alpha groups in EBCDIC, so [a-z] or "case 'a'...'z':" would
probably work just fine in most cases to match lowercase EBCDIC
alpha text.
Worst case, it could be coded as
case 'a'...'i': /* FALLTHROUGH */
case 'j'...'r': /* FALLTHROUGH */
case 's'...'z':
do something;
break;
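As a compilable version of the split-range idea, the sketch below uses the GNU C case-range extension accepted by gcc and clang. It is not valid ISO C99 or C11, and the function names are invented for the example.

```c
/* Lowercase test written as three ranges, so that it does not depend
   on 'a'..'z' being contiguous in the execution character set.
   Relies on the GNU C case-range extension. */
static int is_lower_letter(int c)
{
    switch (c) {
    case 'a' ... 'i':
    case 'j' ... 'r':
    case 's' ... 'z':
        return 1;
    default:
        return 0;
    }
}

/* With integer constants the spaces around '...' are required:
   '1...4' would be lexed as a single, invalid preprocessing number. */
static int in_one_to_four(int n)
{
    switch (n) {
    case 1 ... 4:
        return 1;
    default:
        return 0;
    }
}
```

With -pedantic in a pre-C2y mode, gcc and clang can be expected to diagnose the ranges as an extension.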
On 15/05/2026 14:54, Bart wrote:
Prefix and Postfix ops together, when clustered around a specific
term, have their own set of rules. They are quite different from
binary ops.
I'm going to bail out here. This is not going anywhere.
Either people don't understand the subject, or are pretending not to,
or just want to have a go.
I have quite a bit of knowledge and practical experience of the
subject, even if people here don't like to admit that, but I'm poor at getting the point across.
I will reconsider my claim that precedence tables don't need to
include anything beyond binary ops, if somebody can give a reference
to such a table in the C standard.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
cross@spitfire.i.gajendra.net (Dan Cross) writes:[...]
In article <10u2jpk$2t96p$6@kst.eternal-september.org>,
If C's switch statement were to be
changed, it would have to use something that's currently a syntax
error. Perhaps something like
case 1, case 2, case 3, case 4: whatever();
Sure, that's better.
case 1...4: whatever();
is a typical GCC extension (that we use heavily).
Yes, and the C2y draft adopts that syntax.
(One possible reason it wasn't adopted sooner is that `case 'a'...'z'`
doesn't necessarily work if the letters are not contiguous, for
example in EBCDIC.)
I think that's a bit far-fetched. Regular expressions
have the same EBCDIC related issues (i.e. the discontinuous
nature of the EBCDIC alpha translations); yet, there are
no other defined characters in the gaps between the
alpha groups in EBCDIC, so [a-z] or "case 'a'...'z':" would
probably work just fine in most cases to match lowercase EBCDIC
alpha text.
Worst case, it could be coded as
case 'a'...'i': /* FALLTHROUGH */
case 'j'...'r': /* FALLTHROUGH */
case 's'...'z':
do something;
break;
Bart <bc@freeuk.com> writes:[...]
I will reconsider my claim that precedence tables don't need to
include anything beyond binary ops, if somebody can give a reference
to such a table in the C standard.
That's disingenuous. You know, because you've been told several
times in this thread, that there is precedence table in the C
standard. You also know that there is a precedence table, that
includes unary, postfix, binary, and ternary operators, in K&R2.
gcc and C2Y use "..." rather than ".." because it's an existing token,
used in variadic function declarations. `1..4` is actually a
preprocessing number, resulting in a syntax error when it's
converted to an integer constant.
Bart <bc@freeuk.com> writes:
On 15/05/2026 14:54, Bart wrote:
Prefix and Postfix ops together, when clustered around a specific
term, have their own set of rules. They are quite different from
binary ops.
I'm going to bail out here. This is not going anywhere.
Either people don't understand the subject, or are pretending not to,
or just want to have a go.
I have quite a bit of knowledge and practical experience of the
subject, even if people here don't like to admit that, but I'm poor at
getting the point across.
I think that most of us found your idea that precedence tables
should exclude unary ops to be so bizarre that we weren't sure you
actually meant it.
I will reconsider my claim that precedence tables don't need to
include anything beyond binary ops, if somebody can give a reference
to such a table in the C standard.
That's disingenuous. You know, because you've been told several
times in this thread, that there is precedence table in the C
standard.
You also know that there is a precedence table, that
includes unary, postfix, binary, and ternary operators, in K&R2.
Your personal preference for a precedence table that excludes unary
and postfix operators is perfectly valid for you. Other people's
preference for a table that includes all the operators is perfectly
valid for them. (The evidence so far suggests that the latter
includes everyone but you.)
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <10u2jpk$2t96p$6@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <10u0k0k$1l93l$30@dont-email.me>,[...]
It's easy to get wrong. Other languages accommodate both
semantics using alternation in the selector arm. For example,
one might imagine an hypothetical syntax, something like:
switch (a) {
case 1 || 2 || 3 || 4: whatever();
default: other();
}
...with no `break` to end each `case`.
That's already valid syntax.
It wasn't meant to be taken as a serious suggestion!
If C's switch statement were to be
changed, it would have to use something that's currently a syntax
error. Perhaps something like
case 1, case 2, case 3, case 4: whatever();
Sure, that's better.
case 1...4: whatever();
is a typical GCC extension (that we use heavily).
On 15/05/2026 20:23, Keith Thompson wrote:[...]
I think that most of us found your idea that precedence tables
should exclude unary ops to be so bizarre that we weren't sure you
actually meant it.
Well, I find it bizarre that they should!
That's disingenuous. You know, because you've been told several
times in this thread, that there is precedence table in the C
standard.
Whereabouts?
scott@slp53.sl.home (Scott Lurndal) writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
cross@spitfire.i.gajendra.net (Dan Cross) writes:[...]
In article <10u2jpk$2t96p$6@kst.eternal-september.org>,
If C's switch statement were to be
changed, it would have to use something that's currently a syntax
error. Perhaps something like
case 1, case 2, case 3, case 4: whatever();
Sure, that's better.
case 1 ... 4: whatever();
is a typical GCC extension (that we use heavily).
Yes, and the C2y draft adopts that syntax.
(One possible reason it wasn't adopted sooner is that `case 'a'...'z'`
doesn't necessarily work if the letters are not contiguous, for
example in EBCDIC.)
I think that's a bit far-fetched. Regular expressions
have the same EBCDIC related issues (i.e. the discontinuous
nature of the EBCDIC alpha translations); yet, there are
no other defined characters in the gaps between the
alpha groups in EBCDIC, so [a-z] or "case 'a'...'z':" would
probably work just fine in most cases to match lowercase EBCDIC
alpha text.
I understand there are different versions of EBCDIC. According to
the table in the Wikipedia article, '~' is between 'r' and 's',
'}' is between 'I' and 'J', and '\\' is between 'R' and 'S'.
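To make the gap concrete, here is a rough sketch that can be run on an ASCII machine. It assumes the code points of one common variant, IBM code page 037 (other EBCDIC variants differ), and the enum and function names are made up for illustration.

```c
/* Code points assumed from EBCDIC code page 037; other variants differ. */
enum {
    EBCDIC_a = 0x81, EBCDIC_i = 0x89,
    EBCDIC_j = 0x91, EBCDIC_r = 0x99,
    EBCDIC_s = 0xA2, EBCDIC_z = 0xA9,
    EBCDIC_TILDE = 0xA1   /* sits between 'r' and 's' */
};

/* What a naive `case 'a' ... 'z':` would accept on such a system:
   every code from 'a' through 'z', gaps included. */
int in_naive_a_to_z(unsigned char c)
{
    return c >= EBCDIC_a && c <= EBCDIC_z;
}

/* Lowercase letters only: three separate contiguous runs. */
int is_ebcdic_lower(unsigned char c)
{
    return (c >= EBCDIC_a && c <= EBCDIC_i)
        || (c >= EBCDIC_j && c <= EBCDIC_r)
        || (c >= EBCDIC_s && c <= EBCDIC_z);
}
```

Under that assumption the naive range spans 41 code points for 26 letters, and '~' is one of the extras: in_naive_a_to_z(EBCDIC_TILDE) is true while is_ebcdic_lower(EBCDIC_TILDE) is false.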
In fact EBCDIC, though not mentioned by name, was part of the reason for
not supporting case ranges. Quoting the ANSI C Rationale:
[...]The fewer precedence groups you have the more parentheses you will
This is information about Go [...]
[snip table]
[...]
On 2026-05-15 22:39, Bart wrote:
[...]The fewer precedence groups you have the more parentheses you will
have to use in expressions.
And vice versa. - The actual choice is
a decision of the respective language designers.
Actually, ranges of about ten levels seem to be not uncommon amongst programming languages to provide a sensible, widely accepted grouping. Language designers seem to be trying to avoid a flood of unnecessary parentheses in programs.
And here's a precedence table for another language that facilitates...
that there's no parenthesis at all necessary to obtain unambiguous expressions:
  lvl  op
  ---------
    1   !!
    2   !-o
    3   !$
   61   =/
   62   =+
   63   =~
   64   ==
And here a variant with only one level of operator precedence,
all listed in a single group[*]
  !! !-o !$ !% !/ !+ !~ != -o! -o-o -o$ -o% -o/ -o+ -o~ -o=
  $! $-o $$ $% $/ $+ $~ $= %! %-o %$ %% %/ %+ %~ %=
  /! /-o /$ /% // /+ /~ /= +! +-o +$ +% +/ ++ +~ +=
  ~! ~-o ~$ ~% ~/ ~+ ~~ ~= =! =-o =$ =% =/ =+ =~ ==
Bart <bc@freeuk.com> writes:
The one for the ?: operator is particularly obscure, so in an
expression like one of these:
    a + b ? c - d : e * f
    a ? b ? c : d ? e : f : g
[...]
The lines are not meant to mean anything, just sequences of terms and operators. You can think of them as exercises where you add parentheses
to make them unambiguous.
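As a worked answer to that exercise, the sketch below pairs each unparenthesized form with the grouping that C's grammar gives it: `?:` binds more loosely than the arithmetic operators, and a nested conditional attaches to the nearest enclosing `?`/`:` pair. The function names are hypothetical.

```c
/* As written in the post, with no parentheses. */
int cond1(int a, int b, int c, int d, int e, int f)
{
    return a + b ? c - d : e * f;
}

/* The grouping C's grammar assigns to cond1. */
int cond1_explicit(int a, int b, int c, int d, int e, int f)
{
    return (a + b) ? (c - d) : (e * f);
}

/* As written in the post, with no parentheses. */
int cond2(int a, int b, int c, int d, int e, int f, int g)
{
    return a ? b ? c : d ? e : f : g;
}

/* The grouping C's grammar assigns to cond2: the middle operand is a
   full expression, and the trailing `: g` pairs with the first `?`. */
int cond2_explicit(int a, int b, int c, int d, int e, int f, int g)
{
    return a ? (b ? c : (d ? e : f)) : g;
}
```

For example, cond2(1, 0, 10, 1, 20, 30, 40) yields 20 in both forms: a is nonzero, b is zero, so the inner `d ? e : f` decides.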
In the case of C, it also decided that ?: belongs in this chart of
/binary/ operators. (I suppose you can consider each of ? and : as a
binary operator...)
On 2026-05-14 13:32, Bart wrote:
...
In the case of C, it also decided that ?: belongs in this chart of
/binary/ operators. (I suppose you can consider each of ? and : as a
binary operator...)
When and where was that decided? There's a big difference between
putting ?: in a chart along with binary and unary operators (which
happened in section 2.12 of the 1st edition of K&R) and putting it in a
chart of binary operators. To the best of my knowledge, the latter never happened. Could you please identify where it did?
If you read my post again, you'll find that 'this chart' most likely
refers to my suggested chart containing those 4 groups I mentioned.
That chart contains a set of binary (and infix) operators that also
appear in K&R2. I'm saying that whoever put that together in K&R2
decided that ?: belonged in this set.
(Personally I don't, as I don't consider a ?:-like feature to be an
operator,
but even if it was, a 3-way operator is a poor fit when all
the others are 2-way.)