Could you identify which document guarantees that every Unicode locale contains "UTF-8"?
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?
How else would it work? Bytes have to be 8-bit.
Lawrence D’Oliveiro <ldo@nz.invalid> writes:
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?
How else would it work? Bytes have to be 8-bit.
I can't figure out what point you're trying to make.
Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.
Take a look at the article to which you replied. How does your
followup have anything to do with it?
One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".
On 12/24/2025 12:22 AM, Keith Thompson wrote:
Lawrence D’Oliveiro <ldo@nz.invalid> writes:
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode
locale contains "UTF-8"?
How else would it work? Bytes have to be 8-bit.
I can't figure out what point you're trying to make.
Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.
Take a look at the article to which you replied. How does your
followup have anything to do with it?
One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".
Did C never work on the 6 bit machines such as the Univac 1108 (36
bit) or the CDC 7600 (60 bit) ?
Lynn
Lynn McGuire <lynnmcguire5@gmail.com> writes:
On 12/24/2025 12:22 AM, Keith Thompson wrote:
Lawrence D’Oliveiro <ldo@nz.invalid> writes:
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?
How else would it work? Bytes have to be 8-bit.
I can't figure out what point you're trying to make.
Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.
Take a look at the article to which you replied. How does your
followup have anything to do with it?
One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".
Did C never work on the 6 bit machines such as the Univac 1108 (36 bit)
Yes, there is a C compiler for the Univac machines. The byte size is
9 bits.
On 12/24/2025 11:11 AM, Scott Lurndal wrote:
Lynn McGuire <lynnmcguire5@gmail.com> writes:
On 12/24/2025 12:22 AM, Keith Thompson wrote:
Lawrence D’Oliveiro <ldo@nz.invalid> writes:
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode
locale contains "UTF-8"?
How else would it work? Bytes have to be 8-bit.
I can't figure out what point you're trying to make.
Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.
Take a look at the article to which you replied. How does your
followup have anything to do with it?
One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".
Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)
Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.
I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.
But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.
Lynn
On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:
On 12/24/2025 11:11 AM, Scott Lurndal wrote:
Lynn McGuire <lynnmcguire5@gmail.com> writes:
Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)
Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.
I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.
But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.
Does not the same apply even stronger to your original suggestion to
use 6-bit characters?
There might be documents specifying locale naming standards, but I'm not aware of any.
Michael S <already5chosen@yahoo.com> writes:
On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:
Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)
Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.
I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.
Indeed, that would be the case.
You know, you can always look this stuff up.
https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series#Data_formats
On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:
On 12/24/2025 11:11 AM, Scott Lurndal wrote:
Lynn McGuire <lynnmcguire5@gmail.com> writes:
On 12/24/2025 12:22 AM, Keith Thompson wrote:
Lawrence D’Oliveiro <ldo@nz.invalid> writes:
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode
locale contains "UTF-8"?
How else would it work? Bytes have to be 8-bit.
I can't figure out what point you're trying to make.
Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.
Take a look at the article to which you replied. How does your
followup have anything to do with it?
One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".
Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)
Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.
I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.
But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.
Lynn
Does not the same apply even stronger to your original suggestion to
use 6-bit characters?
On 12/25/2025 2:49 AM, Michael S wrote:
On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:
On 12/24/2025 11:11 AM, Scott Lurndal wrote:
Lynn McGuire <lynnmcguire5@gmail.com> writes:
On 12/24/2025 12:22 AM, Keith Thompson wrote:
Lawrence D’Oliveiro <ldo@nz.invalid> writes:
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every
Unicode locale contains "UTF-8"?
How else would it work? Bytes have to be 8-bit.
I can't figure out what point you're trying to make.
Obviously bytes in C have to be *at least* 8 bits, but I don't
see the relevance.
Take a look at the article to which you replied. How does your
followup have anything to do with it?
One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".
Did C never work on the 6 bit machines such as the Univac 1108
(36 bit)
Yes, there is a C compiler for the Univac machines. The byte
size is 9 bits.
I get the feeling that you are messing with me. That would be
four 9 bit characters per 36 bit word.
But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.
Lynn
Does not the same apply even stronger to your original suggestion to
use 6-bit characters?
Those 6 bit characters, upper case only, were on the 36 bit (Univac
1108) or 60 bit (CDC 7600) machines. Those machines were native 6
bit bytes, at 6 bytes per word or 10 bytes per word.
Those machines were with 8 bit
characters. And now we have the 64 bit machines with 8 bit
characters.
Lynn
We will have 128 bit machines soon in the relative sense, if not
already.
On 12/25/2025 2:49 AM, Michael S wrote:
On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:
Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.
I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.
But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.
Lynn
Does not the same apply even stronger to your original suggestion to
use 6-bit characters?
Those 6 bit characters, upper case only, were on the 36 bit (Univac
1108) or 60 bit (CDC 7600) machines. Those machines were native 6 bit
bytes, at 6 bytes per word or 10 bytes per word.
There is an ISO standard for 6-bit characters (ISO 646).
On 2025-12-27 20:17, Scott Lurndal wrote:
There is an ISO standard for 6-bit characters (ISO 646).
I think you're confusing something here. ISO 646 (AKA IA5) is
a set of 7 bit character sets (with national variants) mostly
resembling ASCII.
Janis
On Sat, 27 Dec 2025 20:47:37 +0100, Janis Papanagnou wrote:
On 2025-12-27 20:17, Scott Lurndal wrote:
There is an ISO standard for 6-bit characters (ISO 646).
I think you're confusing something here. ISO 646 (AKA IA5) is
a set of 7 bit character sets (with national variants) mostly
resembling ASCII.
Janis
If you trust Wikipedia, the article "Six Bit character code" (https://en.wikipedia.org/wiki/Six-bit_character_code) mentions
that "ISO Recommendation R 646-1967" included a 6bit code, which
was dropped when they issued ISO 646-1973.
I have neither standard available, so I can't confirm. However,
I should note that the ECMA still publishes (and makes freely
available) their version of ISO 646: ECMA-006 "7 Bit coded
Character Set".
On Sat, 27 Dec 2025 20:47:37 +0100, Janis Papanagnou wrote:
On 2025-12-27 20:17, Scott Lurndal wrote:
There is an ISO standard for 6-bit characters (ISO 646).
I think you're confusing something here. ISO 646 (AKA IA5) is
a set of 7 bit character sets (with national variants) mostly
resembling ASCII.
If you trust Wikipedia, the article "Six Bit character code" (https://en.wikipedia.org/wiki/Six-bit_character_code) mentions
that "ISO Recommendation R 646-1967" included a 6bit code, which
was dropped when they issued ISO 646-1973.
[...]
We did not move to Fortran 77 until 1990 or so since the mainframe
vendors charged a lot more to use the F77 compiler instead of the
F66 compiler, compile time was way slower also.
Using the way you look at it (width of machine = width of its widest
data register) we already have commodity general-purpose 512-bit
machines for exactly ten years.
But that's a wrong way to look.
Apparently, ECMA still publishes (and makes available for free download) their 6-bit character set standard: ECMA-001 (https://ecma-international.org/publications-and-standards/standards/ecma-1/)
On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:
Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?
How else would it work? ...
...Bytes have to be 8-bit.
On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:
...Bytes have to be 8-bit.
Incorrect - the only requirement is that CHAR_BIT >= 8. There are
real systems where CHAR_BIT == 16. There have been real machines
where CHAR_BIT==9 would have been the most reasonable option.
On Wed, 31 Dec 2025 18:04:59 -0500, James Kuyper wrote:
On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:
...Bytes have to be 8-bit.
Incorrect - the only requirement is that CHAR_BIT >= 8. There are
real systems where CHAR_BIT == 16. There have been real machines
where CHAR_BIT==9 would have been the most reasonable option.
Those are sizes of “characters”, not “bytes” though, are they.