Forum: Too Lazy BBS

Re: Unicode...

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Wed Dec 24 06:17:44 2025

From Newsgroup: comp.lang.c

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode locale contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Dec 23 22:22:46 2025

From Newsgroup: comp.lang.c

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.21a-Linux NewsLink 1.2

From Lynn McGuire@lynnmcguire5@gmail.com to comp.lang.c on Wed Dec 24 01:41:30 2025

From Newsgroup: comp.lang.c

On 12/24/2025 12:22 AM, Keith Thompson wrote:

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".

Did C never work on the 6 bit machines such as the Univac 1108 (36 bit)
or the CDC 7600 (60 bit) ?

Lynn

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Dec 24 11:24:04 2025

From Newsgroup: comp.lang.c

On Wed, 24 Dec 2025 01:41:30 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

On 12/24/2025 12:22 AM, Keith Thompson wrote:

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode
locale contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".

Did C never work on the 6 bit machines such as the Univac 1108 (36
bit) or the CDC 7600 (60 bit) ?

Lynn

It depends on definition of the word C.
The requirement for CHAR_BIT > 7 was not present in K&R C. IIRC, it
first came in C90.
Also, what prevents C90 compiler from using 36-bit char on Univac 1108
and 60-bit bytes on CDC 7600? Methinks, it would be very reasonable.
By chance, that* was a choice made both by TI and by Analog for C
compilers of their word-addressable DSPs.
* - not specifically 36 or 60 bits, but CHAR_BIT = native word width.
--- Synchronet 3.21a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Wed Dec 24 17:11:43 2025

From Newsgroup: comp.lang.c

Lynn McGuire <lynnmcguire5@gmail.com> writes:

On 12/24/2025 12:22 AM, Keith Thompson wrote:

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".

Did C never work on the 6 bit machines such as the Univac 1108 (36 bit)

Yes, there is a C compiler for the Univac machines. The byte size is
9 bits.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lynn McGuire@lynnmcguire5@gmail.com to comp.lang.c on Thu Dec 25 02:00:16 2025

From Newsgroup: comp.lang.c

On 12/24/2025 11:11 AM, Scott Lurndal wrote:

Lynn McGuire <lynnmcguire5@gmail.com> writes:

On 12/24/2025 12:22 AM, Keith Thompson wrote:

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".

Did C never work on the 6 bit machines such as the Univac 1108 (36 bit)

Yes, there is a C compiler for the Univac machines. The byte size is
9 bits.

I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.

But the machinations to store that unnatural 9 bits would be crazy. I
doubt that would be supported in hardware.

Lynn

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Thu Dec 25 10:49:01 2025

From Newsgroup: comp.lang.c

On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

On 12/24/2025 11:11 AM, Scott Lurndal wrote:

Lynn McGuire <lynnmcguire5@gmail.com> writes:

On 12/24/2025 12:22 AM, Keith Thompson wrote:

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode
locale contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".

Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)

Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.

I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.

But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.

Lynn

Does not the same apply even stronger to your original suggestion to
use 6-bit characters?
--- Synchronet 3.21a-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Dec 25 10:22:11 2025

From Newsgroup: comp.lang.c

On 2025-12-25 09:49, Michael S wrote:

On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

On 12/24/2025 11:11 AM, Scott Lurndal wrote:

Lynn McGuire <lynnmcguire5@gmail.com> writes:

Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)

Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.

I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.

But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.

Does not the same apply even stronger to your original suggestion to
use 6-bit characters?

I don't recall whether the mainframes I used - and which of them - had
actually a "C" compiler; I think our 360-clone(?) at least had one. All
I can say is that it seems natural to support characters of appropriate
sizes. Our CDC (175 or 176; 60 bit) had used in Pascal 6 bit characters
(the 'text' data type was a 'packed array [1..10] of character'). And
I'd suppose that a 36 bit based architecture might use 9 bit characters
(or maybe use the spare bit just for error checking, or ignore it?).
Anyway, in my K&R version there's the "Honeywell 6000" hardware listed
with a 9 bit 'char' type.

Janis

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Fri Dec 26 02:03:15 2025

From Newsgroup: comp.lang.c

On Wed, 19 Nov 2025 09:08:10 -0500, James Kuyper wrote:

There might be documents specifying locale naming standards, but I'm not aware of any.

The GNU convention is perhaps the most popular.

<https://wiki.wlug.org.nz/LocaleName>
--- Synchronet 3.21a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Fri Dec 26 16:28:10 2025

From Newsgroup: comp.lang.c

Michael S <already5chosen@yahoo.com> writes:

On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)

Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.

I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.

Indeed, that would be the case.

You know, you can always look this stuff up.

https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series#Data_formats
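
To make the "four 9 bit characters per 36 bit word" arithmetic concrete, here is a minimal sketch in portable C. It only models a 36-bit word inside a 64-bit integer; the names `word36`, `quarter_word` and `pack_quarters` are invented for illustration, and numbering the quarters from the most significant end is an assumption of this sketch, not a claim about the Univac's own conventions.

```c
#include <assert.h>
#include <stdint.h>

/* A 36-bit word modelled in the low 36 bits of a 64-bit integer
   (hypothetical type name, for illustration only). */
typedef uint64_t word36;

#define WORD36_MASK UINT64_C(0xFFFFFFFFF)   /* 36 one-bits */

/* Return quarter-word q of w, where q == 0 is the most significant
   9 bits (an ordering chosen for this sketch). */
static unsigned quarter_word(word36 w, unsigned q)
{
    assert(q < 4);
    return (unsigned)((w >> (9 * (3 - q))) & 0x1FF);
}

/* Build a 36-bit word from four 9-bit values, first value in the top bits. */
static word36 pack_quarters(const unsigned q[4])
{
    word36 w = 0;
    for (int i = 0; i < 4; i++)
        w = (w << 9) | (q[i] & 0x1FF);
    return w & WORD36_MASK;
}
```

Four fields of 9 bits fill the word exactly (4 * 9 == 36), so no bits are wasted and each field can hold values up to 511.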
--- Synchronet 3.21a-Linux NewsLink 1.2

From Lynn McGuire@lynnmcguire5@gmail.com to comp.lang.c on Sat Dec 27 00:25:51 2025

From Newsgroup: comp.lang.c

On 12/26/2025 10:28 AM, Scott Lurndal wrote:

Michael S <already5chosen@yahoo.com> writes:

On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)

Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.

I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.

Indeed, that would be the case.

You know, you can always look this stuff up.

https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series#Data_formats

Wild. I wrote Fortran IV/66 software on Univac 1108 from 1975 to 1980
and never knew that it had quarter word instructions. We stored 6
characters in the 36 bit words (all upper case) until we ported to the
IBM 370 in 1978 or 1979 when we had to switch to four characters per word.

You know, we ported to the Prime 450 in 1977 when we bought one. If I
remember correctly, the Prime was a 32 bit word / 8 bit byte machine so
we did the 4 characters max for an integer on that port, not the IBM 370
port. All those years run together now so I am not sure which and what
port happened when at all.

It was a major change in our software and used a lot more RAM in storing
characters in integer arrays. We did not move to Fortran 77 until 1990
or so since the mainframe vendors charged a lot more to use the F77
compiler instead of the F66 compiler, compile time was way slower also.

Lynn

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lynn McGuire@lynnmcguire5@gmail.com to comp.lang.c on Sat Dec 27 00:29:47 2025

From Newsgroup: comp.lang.c

On 12/25/2025 2:49 AM, Michael S wrote:

On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

On 12/24/2025 11:11 AM, Scott Lurndal wrote:

Lynn McGuire <lynnmcguire5@gmail.com> writes:

On 12/24/2025 12:22 AM, Keith Thompson wrote:

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode
locale contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't see
the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".

Did C never work on the 6 bit machines such as the Univac 1108 (36
bit)

Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.

I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.

But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.

Lynn

Does not the same apply even stronger to your original suggestion to
use 6-bit characters?

Those 6 bit characters, upper case only, were on the 36 bit (Univac
1108) or 60 bit (CDC 7600) machines. Those machines were native 6 bit
bytes, at 6 bytes per word or 10 bytes per word.

Those machines were superseded by the 32 bit machines with 8 bit
characters. And now we have the 64 bit machines with 8 bit characters.
We will have 128 bit machines soon in the relative sense, if not already.

Lynn

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sat Dec 27 18:08:38 2025

From Newsgroup: comp.lang.c

On Sat, 27 Dec 2025 00:29:47 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

On 12/25/2025 2:49 AM, Michael S wrote:

On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

On 12/24/2025 11:11 AM, Scott Lurndal wrote:

Lynn McGuire <lynnmcguire5@gmail.com> writes:

On 12/24/2025 12:22 AM, Keith Thompson wrote:

Lawrence D’Oliveiro <ldo@nz.invalid> writes:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every
Unicode locale contains "UTF-8"?

How else would it work? Bytes have to be 8-bit.

I can't figure out what point you're trying to make.

Obviously bytes in C have to be *at least* 8 bits, but I don't
see the relevance.

Take a look at the article to which you replied. How does your
followup have anything to do with it?

One of several points that you snipped is that locale names can
contain the string "utf8", not "UTF-8".

Did C never work on the 6 bit machines such as the Univac 1108
(36 bit)

Yes, there is a C compiler for the Univac machines. The byte
size is 9 bits.

I get the feeling that you are messing with me. That would be
four 9 bit characters per 36 bit word.

But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.

Lynn

Does not the same apply even stronger to your original suggestion to
use 6-bit characters?

Those 6 bit characters, upper case only, were on the 36 bit (Univac
1108) or 60 bit (CDC 7600) machines. Those machines were native 6
bit bytes, at 6 bytes per word or 10 bytes per word.

In what way were 6-bit bytes "native" ?
I don't know much about either machine and not in the mood to look up,
but would be surprised if it was much more than software convention.
Especially so in CDC case.

Those machines were superseded by the 32 bit machines with 8 bit
characters. And now we have the 64 bit machines with 8 bit
characters.

Lynn

I think that you are looking at it from the wrong angle. The right angle
is not "superseded by the 32 bit machines", but "word-addressable
machines were superseded by the octet-addressable machines".

We will have 128 bit machines soon in the relative sense,
if not already.

Using the way you look at it (width of machine = width of its widest
data register) we already have commodity general-purpose 512-bit
machines for exactly ten years.
But that's a wrong way to look.
--- Synchronet 3.21a-Linux NewsLink 1.2

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sat Dec 27 19:17:46 2025

From Newsgroup: comp.lang.c

Lynn McGuire <lynnmcguire5@gmail.com> writes:

On 12/25/2025 2:49 AM, Michael S wrote:

On Thu, 25 Dec 2025 02:00:16 -0600
Lynn McGuire <lynnmcguire5@gmail.com> wrote:

Yes, there is a C compiler for the Univac machines. The byte size
is 9 bits.

I get the feeling that you are messing with me. That would be four 9
bit characters per 36 bit word.

But the machinations to store that unnatural 9 bits would be crazy.
I doubt that would be supported in hardware.

Lynn

Does not the same apply even stronger to your original suggestion to
use 6-bit characters?

Those 6 bit characters, upper case only, were on the 36 bit (Univac
1108) or 60 bit (CDC 7600) machines. Those machines were native 6 bit
bytes, at 6 bytes per word or 10 bytes per word.

6-bit (DEC SixBit) characters were also supported by the PDP-8 (12-bit words), the PDP-9 (18-bit words) and the PDP-10 (36-bit words).
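
For readers who have not met SIXBIT: the DEC variant covers ASCII codes 0x20 through 0x5F (space, digits, punctuation, upper case) by subtracting 0x20, which is why it is upper case only. The sketch below packs six such characters into one 36-bit word held in a `uint64_t`. The function names are invented for illustration, and mapping unrepresentable characters to a blank is a choice made for this sketch, not documented behaviour of any DEC software.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* The subtraction trick below only works with an ASCII execution set. */
static_assert(' ' == 0x20 && 'A' == 0x41, "sketch assumes ASCII");

/* Map one character to its 6-bit code; lower case is folded to upper
   case and anything outside 0x20..0x5F becomes a blank (sketch policy). */
static unsigned sixbit_from_ascii(char c)
{
    unsigned char u = (unsigned char)toupper((unsigned char)c);
    if (u < 0x20 || u > 0x5F)
        u = ' ';
    return u - 0x20u;
}

/* Pack up to six characters into the low 36 bits of the result,
   first character in the most significant 6 bits, blank padded. */
static uint64_t sixbit_pack(const char *s)
{
    uint64_t w = 0;
    for (int i = 0; i < 6; i++) {
        char c = *s ? *s++ : ' ';
        w = (w << 6) | sixbit_from_ascii(c);
    }
    return w;
}

/* Reverse of sixbit_pack: writes six characters plus a terminator. */
static void sixbit_unpack(uint64_t w, char out[7])
{
    for (int i = 0; i < 6; i++)
        out[i] = (char)(((w >> (6 * (5 - i))) & 0x3F) + 0x20);
    out[6] = '\0';
}
```

Packing "hello" and unpacking it again yields "HELLO " (upper case, blank padded), which matches the all-upper-case storage described earlier in the thread.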

Most Burroughs systems also used 6-bit containers for BCDIC (alphanumeric)
coding before EBCDIC extended BCDIC to 8-bits. Not to mention older
IBM systems (e.g. 702/704 series and the widely sold 1401's). The Burroughs
Large Systems (48-bit word size) could store 8 BCDIC or 6 EBCDIC characters
per word. Medium systems addressed to the nibble (digit), so there wasn't an
architectural word size per se; bytes were just stored as two consecutive
digits starting at an even address (there wasn't a native 6-bit type, so
BCDIC characters were stored in two consecutive nibbles).

There is an ISO standard for 6-bit characters (ISO 646).

7-track magnetic tape drives only stored 6-bit characters (plus parity).
--- Synchronet 3.21a-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Dec 27 20:47:37 2025

From Newsgroup: comp.lang.c

On 2025-12-27 20:17, Scott Lurndal wrote:

There is an ISO standard for 6-bit characters (ISO 646).

I think you're confusing something here. ISO 646 (AKA IA5) is
a set of 7 bit character sets (with national variants) mostly
resembling ASCII.

Janis

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lew Pitcher@lew.pitcher@digitalfreehold.ca to comp.lang.c on Sat Dec 27 20:03:34 2025

From Newsgroup: comp.lang.c

On Sat, 27 Dec 2025 20:47:37 +0100, Janis Papanagnou wrote:

On 2025-12-27 20:17, Scott Lurndal wrote:

There is an ISO standard for 6-bit characters (ISO 646).

I think you're confusing something here. ISO 646 (AKA IA5) is
a set of 7 bit character sets (with national variants) mostly
resembling ASCII.

Janis

If you trust Wikipedia, the article "Six Bit character code" (https://en.wikipedia.org/wiki/Six-bit_character_code) mentions
that "ISO Recommendation R 646-1967" included a 6bit code, which
was dropped when they issued ISO 646-1973.

I have neither standard available, so I can't confirm. However,
I should note that the ECMA still publishes (and makes freely
available) their version of ISO 646: ECMA-006 "7 Bit coded
Character Set".
--
Lew Pitcher
"In Skills We Trust"
Not LLM output - I'm just like this.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Lew Pitcher@lew.pitcher@digitalfreehold.ca to comp.lang.c on Sat Dec 27 20:05:33 2025

From Newsgroup: comp.lang.c

On Sat, 27 Dec 2025 20:03:34 +0000, Lew Pitcher wrote:

On Sat, 27 Dec 2025 20:47:37 +0100, Janis Papanagnou wrote:

On 2025-12-27 20:17, Scott Lurndal wrote:

There is an ISO standard for 6-bit characters (ISO 646).

I think you're confusing something here. ISO 646 (AKA IA5) is
a set of 7 bit character sets (with national variants) mostly
resembling ASCII.

Janis

If you trust Wikipedia, the article "Six Bit character code" (https://en.wikipedia.org/wiki/Six-bit_character_code) mentions
that "ISO Recommendation R 646-1967" included a 6bit code, which
was dropped when they issued ISO 646-1973.

I have neither standard available, so I can't confirm. However,
I should note that the ECMA still publishes (and makes freely
available) their version of ISO 646: ECMA-006 "7 Bit coded
Character Set".

Apparently, ECMA still publishes (and makes available for free download)
their 6-bit characterset standard: ECMA-001 (https://ecma-international.org/publications-and-standards/standards/ecma-1/)
--
Lew Pitcher
"In Skills We Trust"
Not LLM output - I'm just like this.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Dec 27 22:43:47 2025

From Newsgroup: comp.lang.c

On 2025-12-27 21:03, Lew Pitcher wrote:

On Sat, 27 Dec 2025 20:47:37 +0100, Janis Papanagnou wrote:

On 2025-12-27 20:17, Scott Lurndal wrote:

There is an ISO standard for 6-bit characters (ISO 646).

I think you're confusing something here. ISO 646 (AKA IA5) is
a set of 7 bit character sets (with national variants) mostly
resembling ASCII.

If you trust Wikipedia, the article "Six Bit character code" (https://en.wikipedia.org/wiki/Six-bit_character_code) mentions
that "ISO Recommendation R 646-1967" included a 6bit code, which
was dropped when they issued ISO 646-1973.

Contents of an historic superseded ISO "Recommendation" is of
no relevance if we want to *classify* the "ISO Standard 646".
And more so if we are looking specifically for 6 bit character
sets; there are such beasts, but in specific other places[*].
But ISO 646 in all its existing national variances is purely 7
bit.

(Above still quoted statement just needed correction, and I
don't think it's worth any further discussion beyond that.)

Janis

[*] Note that there's other character codes "standards", also 5
or 6 bit, used in telephone, telegraph, and similar contexts -
the CCITT (now ITU-T) provided some; their standards are called "Recommendations", BTW, and often adopted by ISO as "Standards".
Many are irrelevant nowadays or deprecated. (See [**] and search
the page for "Character String Types", if you're interested.)

[**] https://www.oss.com/asn1/resources/asn1-made-simple/asn1-quick-reference.html

[...]

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Dec 29 23:34:22 2025

From Newsgroup: comp.lang.c

On Sat, 27 Dec 2025 00:25:51 -0600, Lynn McGuire wrote:

We did not move to Fortran 77 until 1990 or so since the mainframe
vendors charged a lot more to use the F77 compiler instead of the
F66 compiler, compile time was way slower also.

Just in time for Fortran-90 to come out ...
--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Dec 29 23:38:36 2025

From Newsgroup: comp.lang.c

On Sat, 27 Dec 2025 18:08:38 +0200, Michael S wrote:

Using the way you look at it (width of machine = width of its widest
data register) we already have commodity general-purpose 512-bit
machines for exactly ten years.
But that's a wrong way to look.

In the days before byte-addressability, the “word length” of a machine
was a (mostly) pretty obvious thing to determine.

Byte addressability did muddy the waters somewhat. For example, the
original Motorola 68000 processor was widely considered to be
“16-bit”, even though it had 32-bit address fields and 32-bit
architectural registers, and the whole instruction set design was
clearly meant to be a cut-down 32-bit architecture. (And indeed the
later full-32-bit 68020 processor differed in its instruction set
mainly in the filling in of a few gaps.)

Whereas, “64-bit” processors were considered to be “64-bit” seemingly
based on their support for 64-bit addresses. Being able to operate on
64-bit quantities was not enough.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Dec 29 23:39:23 2025

From Newsgroup: comp.lang.c

On Sat, 27 Dec 2025 20:05:33 -0000 (UTC), Lew Pitcher wrote:

Apparently, ECMA still publishes (and makes available for free download) their 6-bit characterset standard: ECMA-001 (https://ecma-international.org/publications-and-standards/standards/ecma-1/)

Does it say on the cover: “My First ECMA Spec” ... ? ;)
--- Synchronet 3.21a-Linux NewsLink 1.2

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Wed Dec 31 18:04:59 2025

From Newsgroup: comp.lang.c

On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Sorry for the delay in my response. We spent the last week in Orlando,
without access to usenet.

Could you identify which document guarantees that every Unicode locale
contains "UTF-8"?

How else would it work? ...

I think you're missing the point of my question. Take a look at the list
from my message - none of the locales on my system have names that
contain "UTF-8". Most of them contain "utf8" instead, and many have neither.

As I mentioned in my last message, there are also many different
possible encodings for Unicode. MS has used both UCS-2 and UTF-16.
Chinese government systems use GB18030. Would you expect locales using
those encodings to have names that contained "UTF-8"?
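
The "utf8" versus "UTF-8" spelling issue can be illustrated with a small, tolerant comparison. This is only a sketch: it assumes the common `language_territory.codeset@modifier` layout for locale names, which the C standard does not guarantee, and `locale_name_says_utf8` is a name invented here. On POSIX systems, asking `nl_langinfo(CODESET)` after `setlocale()` is probably a more direct way to learn the encoding than parsing the name at all.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the codeset part of a locale name spells UTF-8 in any of the
   usual ways ("utf8", "UTF-8", "utf_8", ...). Names with no codeset
   part, such as "C" or "en_US", give false: the name says nothing. */
static bool locale_name_says_utf8(const char *name)
{
    static const char want[] = "utf8";
    size_t i = 0;

    assert(name != NULL);
    const char *p = strchr(name, '.');
    if (p == NULL)
        return false;

    /* Compare up to the optional "@modifier", ignoring case and
       the '-' or '_' separators. */
    for (p++; *p != '\0' && *p != '@'; p++) {
        if (*p == '-' || *p == '_')
            continue;
        if (i >= 4 || tolower((unsigned char)*p) != want[i])
            return false;
        i++;
    }
    return i == 4;
}
```

A false result does not mean the locale is not Unicode; a locale using GB18030 or UTF-16, as mentioned above, would also return false here.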

...Bytes have to be 8-bit.

Incorrect - the only requirement is that CHAR_BIT >= 8. There are real
systems where CHAR_BIT == 16. There have been real machines where
CHAR_BIT==9 would have been the most reasonable option.

I'm not sure why you mentioned that, however. Why do you think it's
relevant? UCS-2 and UTF-16 have no problem existing on machines with
8-bit bytes; they just occupy two such bytes. UTF-32 would occupy 4 of them.
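
As a rough illustration of how the encoding forms map onto 8-bit units, the sketch below counts octets per code point. The function names are invented for this example. Note that it counts octets, not C bytes: on an implementation with CHAR_BIT == 16 a UTF-16 code unit could fit in a single byte. It also shows one detail the paragraph above glosses over, namely that UTF-16 needs four octets (a surrogate pair) for code points above U+FFFF, whereas UCS-2 cannot represent those at all.

```c
#include <assert.h>
#include <stdint.h>

/* A Unicode scalar value: at most U+10FFFF and not a surrogate. */
static int is_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Octets needed to encode cp in UTF-8. */
static int utf8_octets(uint32_t cp)
{
    assert(is_scalar(cp));
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Octets needed in UTF-16: one 16-bit unit, or a surrogate pair. */
static int utf16_octets(uint32_t cp)
{
    assert(is_scalar(cp));
    return cp < 0x10000 ? 2 : 4;
}

/* UTF-32 always uses one 32-bit unit. */
static int utf32_octets(uint32_t cp)
{
    assert(is_scalar(cp));
    return 4;
}
```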

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Wed Dec 31 23:11:26 2025

From Newsgroup: comp.lang.c

On Wed, 31 Dec 2025 18:04:59 -0500, James Kuyper wrote:

On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:

...Bytes have to be 8-bit.

Incorrect - the only requirement is that CHAR_BIT >= 8. There are
real systems where CHAR_BIT == 16. There have been real machines
where CHAR_BIT==9 would have been the most reasonable option.

Those are sizes of “characters”, not “bytes” though, are they.
--- Synchronet 3.21a-Linux NewsLink 1.2

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Wed Dec 31 18:36:35 2025

From Newsgroup: comp.lang.c

On 2025-12-31 18:11, Lawrence D’Oliveiro wrote:

On Wed, 31 Dec 2025 18:04:59 -0500, James Kuyper wrote:

On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:

...Bytes have to be 8-bit.

Incorrect - the only requirement is that CHAR_BIT >= 8. There are
real systems where CHAR_BIT == 16. There have been real machines
where CHAR_BIT==9 would have been the most reasonable option.

Those are sizes of “characters”, not “bytes” though, are they.

No. In the C standard, a byte is a unit of measurement for memory. The
size of a byte is CHAR_BIT bits, which is implementation-defined. A
character is something that you can store in memory. A byte is required
to be large enough to store any member of the basic character set of the execution environment. Since the basic character set need only contain
96 characters, and CHAR_BIT is required to be >=8, that's not an onerous requirement.
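
The relationship described above (a byte is CHAR_BIT bits, and sizeof counts bytes) can be written down as compile-time checks. This is a minimal sketch assuming a C11 or later compiler; `storage_bits` is a helper name invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Both of these hold on every conforming implementation. */
static_assert(CHAR_BIT >= 8, "the C standard requires CHAR_BIT >= 8");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Bits of storage occupied by an object whose sizeof is size_in_bytes.
   The answer depends on the implementation's CHAR_BIT, so
   storage_bits(sizeof(int)) need not be a multiple of 8. */
static size_t storage_bits(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}
```

On a typical desktop system `storage_bits(1)` is 8; on an implementation with 9-bit or 16-bit bytes it would be 9 or 16, while `sizeof(char)` stays 1 in every case.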

--- Synchronet 3.21a-Linux NewsLink 1.2
