Forum: Too Lazy BBS

Re: Browser-only HMAC-based toy cipher demo (DrMoron) - now live with URL-encoded ciphertext

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Sun Feb 15 11:53:00 2026

From Newsgroup: sci.crypt

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1> <https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

This is not meant for real secrets, just something I built for fun and
learning.
Feedback, critique, or curiosity welcome.

Unfortunately, I don't have enough time to look at your code, but
I wanted to at least make the above comment.

thank you and totally fair enough for sure.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Mon Feb 16 22:01:26 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Unicode has plural underlying representations that all produce the same "visible characters" for a surprisingly large number of code points.


From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 12:21:10 2026

From Newsgroup: sci.crypt

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same "visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Now, let me try encrypting it using the same plaintext for a password:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=b9fd17dc8961db67d635dc20d8374e63ce66521770a30bb7f96e280acaef43d76687a8eb3799393403a47dca1d31f5cecc636ccdec94700b3a364c472ca2b927dea46c75a3ecd1810d5734c15cb700d5f90a106bb3fc0f7f5fdb1b48eec077860102dfbab3e308afafba45113d8f4b7712343d1b608b5b21992c

Seems to work fine.

Can you verify it on your end? Thanks.

https://i.ibb.co/VWWFLt1P/image.png


From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 13:17:46 2026

From Newsgroup: sci.crypt

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Unicode has plural underlying representations that all produce the same "visible characters" for a surprisingly large number of code points.

Another test:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ba06cf92dce13bb9d2b5f2fd81f92d1094bce5e88d01800356e27e3823e58ab343adeaa1d919357dbabb66b13b815e5d3418d01fd82a6ae3c20cb54422e6e5923d4cade18f06d7f76b35c8207e2779631c21b3e57637262adc3b8e1b7e6bb07c09d7e7c89d20a203a834bfa80407370bba17db92da96728ef7e6e303cd2cd8

Plaintext:

Can you see this?

+u reA raa reo reu reR Efii Efc#N+A

From Marcel Logen@333200007110-0201@ybtra.de to sci.crypt on Tue Feb 17 23:41:21 2026

From Newsgroup: sci.crypt

Chris M. Thomasson in sci.crypt:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Look at this example
(made in an UTF-8 terminal window (Linux & bash)):

| $ echo 61 cc 88 0a | xxd -r -p
| ä

| $ echo c3 a4 0a | xxd -r -p
| ä

The raw byte sequences "0x61cc88" and "0xc3a4" both produce the
output "ä" (German "a umlaut").
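
The same comparison can be sketched in Python (an illustration only, not part of DrMoron; the variable names are made up here). It decodes both byte sequences from the example above, shows that the resulting strings differ, and shows that NFC normalization maps them to one byte sequence:

```python
import unicodedata

# The two byte sequences from the xxd example above (without the newline).
decomposed = bytes.fromhex("61cc88").decode("utf-8")   # 'a' + U+0308 combining diaeresis
precomposed = bytes.fromhex("c3a4").decode("utf-8")    # U+00E4, a single code point

# Both render as a-umlaut, but they are different strings.
print(len(decomposed), len(precomposed))
print(decomposed == precomposed)

# After NFC normalization they are the same string, hence the same bytes.
nfc_a = unicodedata.normalize("NFC", decomposed)
nfc_b = unicodedata.normalize("NFC", precomposed)
print(nfc_a == nfc_b, nfc_a.encode("utf-8").hex())
```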

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Marcel
--
Tue Feb 17 23:41:21 2026 CET (1771368081)
pc-731
87 ms17 c87s
Lines: 46

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 15:25:10 2026

From Newsgroup: sci.crypt

On 2/17/2026 2:41 PM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Look at this example
(made in an UTF-8 terminal window (Linux & bash)):

| $ echo 61 cc 88 0a | xxd -r -p
| ä

| $ echo c3 a4 0a | xxd -r -p
| ä

The raw byte sequences "0x61cc88" and "0xc3a4" both produce the
output "ä" (German "a umlaut").

I hope I am not missing something in my code. Wrt a plaintext of:
____________
Composed ä, decomposed ä
Symbol µ, letter μ
Omega Ω, Ohm Ω
____________

I get:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=b565723fb62f13d0f92b4dd66223afd80b3f6cdbc3cd0eeab4cec1d2ce37f622bcc635f20ac6f4245c028b0f639432a51fdbd39441fc6e60b334d46199dbd2cf3f5300a3bba9c80fcee5fbe24a730c60d4c951e5fbcbd966f4244b8ef9b4b0d60529c15104b93deb2576e2d07816b9956f0a2c03b55ada9bcac8dcce259ab79978112bd61b5f6c274085a6d1e3

____________
Composed ä, decomposed ä
Symbol µ, letter μ
Omega Ω, Ohm Ω
____________

Actually, I need to put in a way for me to enter raw hex bytes as a
plaintext. That would help the online version in a sense. I bet there is subtle flaw in there. Thanks!
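
A round trip through the same tool cannot reveal the problem, because the bytes never change between encrypt and decrypt. A small Python sketch (hypothetical names; it assumes the three test pairs above are a-umlaut composed vs. decomposed, micro sign vs. Greek mu, and Greek Omega vs. Ohm sign) shows which pairs become equal under NFC and which need the stronger NFKC form:

```python
import unicodedata

# Assumed test pairs: look-alike characters with different code points.
pairs = {
    "a-umlaut": ("\u00e4", "a\u0308"),   # precomposed vs. 'a' + combining diaeresis
    "micro/mu": ("\u00b5", "\u03bc"),    # MICRO SIGN vs. GREEK SMALL LETTER MU
    "omega/ohm": ("\u03a9", "\u2126"),   # GREEK CAPITAL OMEGA vs. OHM SIGN
}

def same_after(form: str, left: str, right: str) -> bool:
    """Return True if both strings are equal after the given normalization."""
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)

results = {
    name: {form: same_after(form, left, right) for form in ("NFC", "NFKC")}
    for name, (left, right) in pairs.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

The micro sign and Greek mu are only compatibility equivalents, so NFC keeps them apart while NFKC merges them. Which form a tool should apply is a design decision this sketch does not settle.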

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

I bet I missed something. It sucks to try to get a plaintext data in raw bytes. Or I am missing a much easier way.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 15:35:01 2026

From Newsgroup: sci.crypt

On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:

On 2/17/2026 2:41 PM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

[...]
I think TextEncoder().encode()/decode() helps me out here a bit. Still,
I need to put in a special plaintext box, or a radio button that treats
the existing plaintext as raw hex bytes. Any thoughts? Thanks.

[...]

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 15:50:18 2026

From Newsgroup: sci.crypt

On 2/17/2026 3:35 PM, Chris M. Thomasson wrote:

On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:

On 2/17/2026 2:41 PM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

[...]
I think TextEncoder().encode()/decode() helps me out here a bit. Still,
I need to put in a special plaintext box, or a radio button that treats
the existing plaintext as raw hex bytes. Any thoughts? Thanks.

[...]

NFC normalization? So, I copy paste in two different unicode's from
website saying they are different, the paste can make them all use the
same code?

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 16:37:04 2026

From Newsgroup: sci.crypt

On 2/15/2026 1:20 AM, Chris M. Thomasson wrote:

I've been working on a small educational cipher experiment called DrMoron.

[...]

Hey now. Fwiw, here is a plaintext, between the lines:
___________________
This is a test...

123 rUi456 Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
___________________

Okay. I run it through my C version and get the hex ciphertext:

daf3e9d725a004ee2178ad997b6aacb8ef7017fa59c06078f22c8fbcdf04833ba82f5202b81168ef88dd1e7faf5f66ae8d8885637aa5928ea5c1ff64658d938ad8c0b0b72154350dcd766b4aabbffba1d7c7fd9e4b93ce7df9280d4e03e72308cf7f0043aa821311c92ed669dcc7fd65eabd345bd1f852e3304cfbf7244afeda4b98fd91268084a0befae4c8ff3ac9f3443579cd0b1d6bf54b3f37

I run it through my online version, hit decrypt, and get:
___________________
This is a test...

123 rUi456 Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
___________________

YEAH!!!

That is cool.

I really need to add in a checkbox for a user to treat the data in the plaintext textarea as raw hex bytes.
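
One possible shape for such a "hex byte mode" is sketched below in Python. The function name and the `hex_mode` flag are hypothetical, not taken from the DrMoron source. In text mode the field is encoded as UTF-8; in hex mode it is parsed as hex digits with optional whitespace:

```python
def text_field_to_bytes(text: str, hex_mode: bool) -> bytes:
    """Convert the contents of a text field to the raw bytes fed to the cipher.

    hex_mode=True  -> treat the field as hex digits, e.g. "61 cc 88"
    hex_mode=False -> treat the field as text and encode it as UTF-8
    """
    if hex_mode:
        # Drop all whitespace so "61 cc 88" and "61cc88" are accepted alike.
        cleaned = "".join(text.split())
        # bytes.fromhex raises ValueError on odd length or non-hex characters.
        return bytes.fromhex(cleaned)
    return text.encode("utf-8")


print(text_field_to_bytes("61 cc 88", hex_mode=True).hex())
print(text_field_to_bytes("\u00e4", hex_mode=False).hex())
```

With this split, the decomposed and precomposed forms of a character can be entered deliberately as hex and compared against what the text mode produces.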


From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:34:53 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its encrypting.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:39:19 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

I bet there is subtle flaw in there. Thanks!

There is, but it is not 'subtle' for anyone who has more than a passing familiarity with Unicode encodings. It's only subtle to those who
don't have much understanding of Unicode encodings.

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

I bet I missed something. It sucks to try to get a plaintext data in raw bytes. Or I am missing a much easier way.

You missed the example Marcel posted.

Two very different byte sequences, both that display the exact same
character.

So someone that uses that "character" in a password is at the mercy of
which byte sequence their tool/OS uses when they enter that character.

From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:41:49 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:

On 2/17/2026 2:41 PM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

[...]
I think TextEncoder().encode()/decode() helps me out here a bit. Still,
I need to put in a special plaintext box, or a radio button that treats
the existing plaintext as raw hex bytes. Any thoughts? Thanks.

[...]

Only if it also applies Unicode normalization in the process.

And of course, that still leaves you at the mercy of the Unicode
standards committee changing the manner of normalization three versions
from now, whereupon things change again.

From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:43:18 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/17/2026 3:35 PM, Chris M. Thomasson wrote:

On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:

On 2/17/2026 2:41 PM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

[...]
I think TextEncoder().encode()/decode() helps me out here a bit. Still,
I need to put in a special plaintext box, or a radio button that treats
the existing plaintext as raw hex bytes. Any thoughts? Thanks.

[...]

NFC normalization? So, I copy paste in two different unicode's from
website saying they are different, the paste can make them all use the
same code?

Well, if you mean Near Field Communication, not likely.

Unicode normalization. Unicode has a whole host of rules for
normalizing the variant encodings such that one can compare unicode
strings and determine if they represent the same characters or not.
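
To make this concrete for a password-keyed tool, here is a minimal Python sketch. It assumes NFC is the chosen form and uses a plain HMAC-SHA256 tag as a stand-in for the cipher (it is not DrMoron's actual construction; the function names are made up). It shows the password being normalized before it is turned into key bytes:

```python
import hashlib
import hmac
import unicodedata

def password_to_key(password: str, normalize: bool = True) -> bytes:
    """Turn a password string into key bytes, optionally NFC-normalizing first."""
    if normalize:
        password = unicodedata.normalize("NFC", password)
    return password.encode("utf-8")

def tag(password: str, message: bytes, normalize: bool = True) -> str:
    """HMAC-SHA256 of message under the password-derived key (stand-in for the cipher)."""
    return hmac.new(password_to_key(password, normalize), message, hashlib.sha256).hexdigest()

composed = "\u00e4"      # a-umlaut as one code point
decomposed = "a\u0308"   # 'a' + combining diaeresis

# Without normalization the two passwords give different keys.
print(tag(composed, b"msg", normalize=False) == tag(decomposed, b"msg", normalize=False))
# With NFC applied first, both spellings give the same key.
print(tag(composed, b"msg") == tag(decomposed, b"msg"))
```

In a browser the analogous step would be calling `password.normalize("NFC")` on the string before passing it to `TextEncoder().encode()`, since `TextEncoder` itself only encodes to UTF-8 and does not normalize.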

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Wed Feb 18 12:19:56 2026

From Newsgroup: sci.crypt

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its encrypting.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.

Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte encodings.
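
For reference, real byte sequences for these cases can be printed with a short Python sketch (illustrative only; the codec names are Python's, and `cp037` is used here as one common EBCDIC code page):

```python
# Pilcrow (U+00B6) under several real encodings.
pilcrow = "\u00b6"
encoded = {name: pilcrow.encode(name).hex()
           for name in ("utf-8", "utf-16-le", "cp1252", "mac_roman")}
for name, hexbytes in encoded.items():
    print(f"{name:10} {hexbytes}")

# "IBM" in ASCII versus one EBCDIC code page.
ibm_ascii = "IBM".encode("ascii").hex()
ibm_ebcdic = "IBM".encode("cp037").hex()
print(ibm_ascii, ibm_ebcdic)
```

The same character yields different bytes under each encoding, which is a separate issue from normalization: here the code point is identical and only the byte-level encoding differs.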
--
Ben.

From Richard Harnden@richard.nospam@gmail.invalid to sci.crypt on Wed Feb 18 19:08:29 2026

From Newsgroup: sci.crypt

On 18/02/2026 12:19, Ben Bacarisse wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.

Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte encodings.

For UTF-8, "á", for example, could be any of:

The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.

I think that's what Rich meant.


From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Wed Feb 18 12:32:21 2026

From Newsgroup: sci.crypt

On 2/17/2026 8:34 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its encrypting.

Okay.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

BAM! It will not decrypt. I really need to make a checkbox or something
that denotes "hex byte mode"... This will allow the plaintext and password textareas to use hex bytes directly...

I did it on my local machine, not uploaded it to the web yet... I have,
say a plaintext treated as hex bytes with the different encodings, hit decrypt, and they show as the correct symbols even though the hex representations are different. Humm, need to ponder...


From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Wed Feb 18 12:36:58 2026

From Newsgroup: sci.crypt

On 2/17/2026 8:39 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

I bet there is subtle flaw in there. Thanks!

There is, but it is not 'subtle' for anyone who has more than a passing familiarity with Unicode encodings. It's only subtle to those who
don't have much understanding of Unicode encodings.

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

I bet I missed something. It sucks to try to get a plaintext data in raw
bytes. Or I am missing a much easier way.

You missed the example Marcel posted.

Two very different byte sequences, both that display the exact same character.

That would nail the password.

So someone that uses that "character" in a password is at the mercy of
which byte sequence their tool/OS uses when they enter that character.

Yup. Need a raw hex byte mode... :^)

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Wed Feb 18 22:57:28 2026

From Newsgroup: sci.crypt

Richard Harnden <richard.nospam@gmail.invalid> writes:

On 18/02/2026 12:19, Ben Bacarisse wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.
Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte
encodings.

For UTF-8, "á", for example, could be any of:

The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.

I think that's what Rich meant.

My first thought, but then why would pilcrow have different encodings
between Windows and Mac? That's an entirely different scenario. That's
why I would have preferred an example, not a couple of made-up sequences
that are not Unicode encodings at all!

BTW, all programs should reject over-long encodings in input and none
should ever generate any as output so, again, it does not account for the
Windows/Mac scenario. A system that generated one when the user is
typing a password is seriously broken.
--
Ben.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:36:43 2026

From Newsgroup: sci.crypt

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings!

I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).

There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.

Nope. For most of the accented characters (e.g., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

This is not to do with normalisation but, I think, two systems using
different Unicode encodings.

That can bite him too, but often when someone starts talking about
"unicode passwords" it is because their native language uses accented
characters, and they would like something like pässword (that's
password, but with a-umlaut instead of a), without realizing there
are two different sequences of code points that will create ä and at
least three different ways to encode those code points to bytes.
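
For a real rather than made-up case, the a-umlaut point can be checked directly. What follows is a minimal Python sketch, not anything taken from Chris's demo; the helper name `byte_forms` is invented for illustration, and the three encodings are simply the common UTF forms. It prints the bytes for both code point sequences under each encoding.

```python
# Sketch only: the two code point sequences for a-umlaut under three UTF encodings.
precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # U+0061 "a" followed by U+0308 COMBINING DIAERESIS

def byte_forms(text):
    """Return the hex bytes of text under UTF-8, UTF-16LE and UTF-32LE."""
    return {codec: text.encode(codec).hex()
            for codec in ("utf-8", "utf-16-le", "utf-32-le")}

forms = {"precomposed": byte_forms(precomposed),
         "decomposed": byte_forms(decomposed)}

for name, row in forms.items():
    for codec, hex_bytes in row.items():
        print(f"{name:12} {codec:10} {hex_bytes}")
```

Both strings render as the same character, yet the loop prints six different byte strings, and a hash fed any one of them has no way to know the other five were "the same password".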

I'm guessing because you don't give two actual encodings that can
represent a pilcrow.

The whole point of "made up" is it was all "made up", because, frankly,
I wasn't going to bother putting in the effort to find the two
different encodings.

Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte
encodings.

Yep, which is what Chris was missing when he started talking about
using "unicode passwords".
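
The legacy-encoding version of the problem is easy to reproduce with real code pages. This is a small Python sketch under stated assumptions: the thread names no specific code pages, so `cp037` (one common EBCDIC page), `cp1252` (Windows Western European) and `mac_roman` (classic Mac OS) are my picks for illustration.

```python
# Sketch only: same visible text, different bytes under legacy code pages.
ibm_ascii = "IBM".encode("ascii")     # what an ASCII system would store
ibm_ebcdic = "IBM".encode("cp037")    # cp037 is one common EBCDIC code page

pilcrow = "\u00b6"                    # U+00B6 PILCROW SIGN
pilcrow_windows = pilcrow.encode("cp1252")    # Windows Western European
pilcrow_mac = pilcrow.encode("mac_roman")     # classic Mac OS Roman
pilcrow_utf8 = pilcrow.encode("utf-8")        # what most systems use today

print("IBM    :", ibm_ascii.hex(), "vs", ibm_ebcdic.hex())
print("pilcrow:", pilcrow_windows.hex(), pilcrow_mac.hex(), pilcrow_utf8.hex())
```

So a pilcrow really can come out as different bytes on a Windows and a Mac system, but only when the two sides use different legacy single-byte code pages, which is an encoding mismatch rather than a normalization one.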
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:39:03 2026

From Newsgroup: sci.crypt

Richard Harnden <richard.nospam@gmail.invalid> wrote:

On 18/02/2026 12:19, Ben Bacarisse wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.

Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte
encodings.

For UTF-8, "á", for example, could be any of:

The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.

Technically a spec. violation, as the spec. requires the shortest
encoding be used. So a proper system shouldn't overlong encode, but
someone "trying to find edge conditions" just might ship in an overlong
encoding to see what happens.

I think that's what Rich meant.

Yep, exactly what I meant.
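
The two real byte sequences quoted for a-acute can be reproduced, and mapped onto each other, with Python's standard `unicodedata` module. This is only an illustrative sketch; the variable names are mine.

```python
import unicodedata

precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # "a" followed by U+0301 COMBINING ACUTE ACCENT

# Same glyph on screen, different UTF-8 bytes, and the strings compare unequal.
print(precomposed.encode("utf-8").hex())   # c3a1
print(decomposed.encode("utf-8").hex())    # 61cc81
print(precomposed == decomposed)           # False

# Normalizing both sides to one form (NFC or NFD) removes the ambiguity.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)   # True True
```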
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:43:39 2026

From Newsgroup: sci.crypt

Ben Bacarisse <ben@bsb.me.uk> wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

On 18/02/2026 12:19, Ben Bacarisse wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.

Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte
encodings.

For UTF-8, "á", for example, could be any of:

The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.

I think that's what Rich meant.

My first thought, but then why would pilcrow have different encodings
between Windows and Mac?

You are still being "too literal". "Made up" means everything from my
example was "made up", including the fact that I said "pilcrow" (I
wanted a character that wasn't a standard ascii character, and that was
the first "name" that came to mind).

But "made up" means none of it was actual fact, it was just to try to
get Chris to understand that the same characters on screen (or in his
password entry box) could become very different byte sequences
depending upon what the user program, its libraries, and/or the OS
might choose to do.

That's an entirely different scenario. That's why I would have
preferred an example, not a couple of made-up sequences that are not
Unicode encodings at all!

That's fair, but I was not going to bother doing the leg work to work
out the example.

BTW, all programs should reject over-long encodings in input and none
should ever generate any as output so, again, it does not account for the
Windows/Mac scenario. A system that generated one when the user is
typing a password is seriously broken.

Agreed. So Chris should not need to worry about 'overlong' sequences.
Except when a pen-tester is attacking his project, and then the
pen-tester might just inject overlong sequences to see what happens.
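
As a rough illustration of rejecting overlong input, here is a Python sketch that leans on the built-in strict UTF-8 decoder, which refuses non-shortest forms. The helper name `is_strict_utf8` is hypothetical, and how Chris's browser code actually receives its bytes is not something the thread states.

```python
def is_strict_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed, shortest-form UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

shortest = b"\xc3\xa1"        # the normal 2-byte encoding of U+00E1
overlong = b"\xe0\x83\xa1"    # the same code point padded out to 3 bytes
overlong_slash = b"\xc0\xaf"  # classic overlong form of "/" (U+002F)

for sample in (shortest, overlong, overlong_slash):
    print(sample.hex(), is_strict_utf8(sample))
```

A check like this only matters where raw bytes arrive from outside (a file, a hex field, a URL parameter); text typed into a normal input box should never contain overlong forms in the first place.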
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:45:40 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/17/2026 8:34 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

Okay.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

BAM! It will not decrypt. I really need to make a checkbox or something
that denotes "hex byte mode"... This will allow plaintext and password
textareas to use hex bytes directly...

I did it on my local machine, not uploaded it to the web yet... I have,
say a plaintext treated as hex bytes with the different encodings, hit
decrypt, and they show as the correct symbols even though the hex
representations are different. Humm, need to ponder...

If in the end you want the ability of a user to be able to enter any
string of binary bytes then some form of "binary input" (hex, base64,
base85, etc.) will be needed if you expect them to type them in.

If you let them upload a file as the "key" then you don't need a hex
(or other) input mode.
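
A "binary input" mode of the kind described above might look roughly like this in Python. It is a sketch, not Chris's code: the function name `parse_binary_input` and the mode strings are invented, and only hex and base64 are covered.

```python
import base64

def parse_binary_input(text: str, mode: str) -> bytes:
    """Turn typed hex or base64 text into raw key bytes (hypothetical helper)."""
    if mode == "hex":
        # bytes.fromhex skips ASCII whitespace and raises ValueError on bad digits.
        return bytes.fromhex(text)
    if mode == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text, validate=True)
    raise ValueError(f"unknown input mode: {mode!r}")

key_from_hex = parse_binary_input("c3 a1", "hex")
key_from_b64 = parse_binary_input("w6E=", "base64")
print(key_from_hex == key_from_b64)   # True: both are the bytes c3 a1
```

Because the user states the bytes explicitly, no encoding or normalization step sits between what they typed and what the cipher sees.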
--- Synchronet 3.21b-Linux NewsLink 1.2

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Thu Feb 19 11:52:23 2026

From Newsgroup: sci.crypt

Rich <rich@example.invalid> writes:

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings!

I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).

There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.

Nope. For most of the accented characters (e.g., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.

When I started reading your post that's what I thought you might be
getting at but your example -- pilcrow -- put a stop to that. Surely you
could not be talking about combining characters and diacriticals given
that example. So in trying to make sense of your example I decided you
must be talking about different encodings.

Anyway, I should have stuck with that thought despite the example. Even
so, I would probably still have posted a "different encodings" remark as
that is just as much a problem (though one that is reducing over time)
as Unicode equivalence and compatibility.
--
Ben.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Thu Feb 19 12:36:26 2026

From Newsgroup: sci.crypt

On 2/18/2026 7:45 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/17/2026 8:34 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

Okay.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

BAM! It will not decrypt. I really need to make a checkbox or something
that denotes "hex byte mode"... This will allow plaintext and password
textareas to use hex bytes directly...

I did it on my local machine, not uploaded it to the web yet... I have,
say a plaintext treated as hex bytes with the different encodings, hit
decrypt, and they show as the correct symbols even though the hex
representations are different. Humm, need to ponder...

If in the end you want the ability of a user to be able to enter any
string of binary bytes then some form of "binary input" (hex, base64,
base85, etc.) will be needed if you expect them to type them in.

If you let them upload a file as the "key" then you don't need a hex
(or other) input mode.

I was thinking about that. It would not be "uploaded" to the server
because my code is client only. It would need to allow the user to look
for, I guess, any file. Then use its contents as a Password, raw bytes.
Should work. At least I should put in a warning about using the password
with unicode... Thanks! :^)

Password with unicode = can of oh shit's!, not just a can of worms?
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Fri Feb 20 04:05:19 2026

From Newsgroup: sci.crypt

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings!

I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).

There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.

Nope. For most of the accented characters (e.g., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.

When I started reading your post that's what I thought you might be
getting at but your example -- pilcrow -- put a stop to that. Surely you
could not be talking about combining characters and diacriticals given
that example. So in trying to make sense of your example I decided you
must be talking about different encodings.

That's a fair criticism. I should have used an accented char as the
fake character. And yes, there is the additional 'encodings' issue
too.

Anyway, I should have stuck with that thought despite the example. Even
so, I would probably still have posted a "different encodings" remark as
that is just as much a problem (though one that is reducing over time)
as Unicode equivalence and compatibility.

Yes, I did not think to mention that. At the same time, it was aimed
at Chris, and combining "different sets of codepoints represent the same
character" with "different UTF encodings convert the same code points to
different byte strings" would likely have sent him off on an unrelated
tangent.

But you are correct, UTF-8 has almost taken over, but the other
encodings do still show up occasionally, and a different encoding will
cause his "Unicode" passwords to also fail.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Fri Feb 20 04:08:20 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/18/2026 7:45 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/17/2026 8:34 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.

<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

Okay.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

BAM! It will not decrypt. I really need to make a checkbox or something
that denotes "hex byte mode"... This will allow plaintext and password
textareas to use hex bytes directly...

I did it on my local machine, not uploaded it to the web yet... I have,
say a plaintext treated as hex bytes with the different encodings, hit
decrypt, and they show as the correct symbols even though the hex
representations are different. Humm, need to ponder...

If in the end you want the ability of a user to be able to enter any
string of binary bytes then some form of "binary input" (hex, base64,
base85, etc.) will be needed if you expect them to type them in.

If you let them upload a file as the "key" then you don't need a hex
(or other) input mode.
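A "binary input" mode of the kind described above could be sketched roughly like this. This is only an illustration in Python, not the demo's actual JavaScript; the function name `parse_binary_input` and the mode strings are made up for the example.

```python
import base64


def parse_binary_input(text: str, mode: str) -> bytes:
    """Turn typed-in text into raw bytes (hypothetical helper, not from the demo)."""
    compact = "".join(text.split())  # tolerate spaces and newlines in the typed text
    if mode == "hex":
        return bytes.fromhex(compact)            # ValueError on non-hex input
    if mode == "base64":
        return base64.b64decode(compact, validate=True)
    if mode == "base85":
        return base64.b85decode(compact)
    raise ValueError(f"unknown input mode: {mode!r}")


# The same three bytes, entered two different ways.
print(parse_binary_input("80 55 54", "hex"))
print(parse_binary_input("gFVU", "base64"))
```

Either way the hasher receives exactly the bytes the user spelled out, so nothing depends on how a platform happens to encode a character.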

I was thinking about that. It would not be "uploaded" to the server
because my code is client only. It would need to allow the user to look
for, I guess, any file. Then use its contents as a Password, raw bytes. Should work. At least I should put in a warning about using the password with unicode... Thanks! :^)

Password with unicode = can of oh shit's!, not just a can of worms?

Unicode passwords bring the trouble of "looks the same on screen" but
might be "very different set of bytes on the wire". And your hashes
only see the "bytes on the wire".
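To make the "same on screen, different bytes on the wire" point concrete, here is a small sketch in Python. The password "pässword", the message, and the use of HMAC-SHA-256 are all assumptions picked for illustration; this is not the cipher under discussion.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical password: the same visible text in two Unicode spellings.
precomposed = "p\u00e4ssword"                            # a-umlaut as one code point, U+00E4
decomposed = unicodedata.normalize("NFD", precomposed)   # 'a' followed by U+0308 combining diaeresis

key_a = precomposed.encode("utf-8")
key_b = decomposed.encode("utf-8")


def tag(key: bytes) -> str:
    """HMAC-SHA-256 of a fixed message under the given key bytes."""
    return hmac.new(key, b"same message", hashlib.sha256).hexdigest()


print(key_a.hex(" "))              # 9 bytes
print(key_b.hex(" "))              # 10 bytes
print(tag(key_a) == tag(key_b))    # the keys differ, so the tags do not match
```

Both strings render identically, yet they are different keys as far as any hash is concerned.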
--- Synchronet 3.21b-Linux NewsLink 1.2

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Fri Feb 20 13:07:40 2026

From Newsgroup: sci.crypt

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!
--
Ben.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Fri Feb 20 21:01:09 2026

From Newsgroup: sci.crypt

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work, might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.
--
Ben.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 12:52:46 2026

From Newsgroup: sci.crypt

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native language. It can help people choose longer ones while still remembering them. Also Unicode can increase the entropy without needing longer passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#"
as a password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I can from here, but can anybody else?

https://i.ibb.co/v4VG3k6K/image.png

thanks Ben. :^)

I still think it scary because of what I read in this thread. Might
work, might not??? Perhaps? Humm... ;^o

--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 13:08:14 2026

From Newsgroup: sci.crypt

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give different bytes for the same visual symbols, that would murder the
password and you could not decrypt... Right?
--- Synchronet 3.21b-Linux NewsLink 1.2

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Fri Feb 20 22:03:54 2026

From Newsgroup: sci.crypt

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give different bytes for the same visual symbols, that would murder the password and you could not decrypt... Right?

Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?
--
Ben.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 14:07:28 2026

From Newsgroup: sci.crypt

On 2/20/2026 2:03 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?

Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?

I wanted to see if you can post those visual symbols as a password and
make it not decrypt the plaintext correctly. It would give an example of
the potential problem we are discussing here? Ala, Rich and Marcel's
point? My hash would not work if it's not bit exact...
--- Synchronet 3.21b-Linux NewsLink 1.2

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Sat Feb 21 00:01:42 2026

From Newsgroup: sci.crypt

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 2:03 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?

Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?

I wanted to see if you can post those visual symbols as a password and
make it not decrypt the plaintext correctly. It would give an example of
the potential problem we are discussing here? Ala, Rich and Marcel's
point? My hash would not work if it's not bit exact...

I don't think any of the symbols I posted can illustrate Rich's point
and even if I had chosen ones that did, I would be surprised if copying
and pasting would trigger normalisation. But let's say it's a
concern... Then a good way to illustrate that point would be a website
that just showed what bytes are pasted into an input field. Having hash
code behind it just obscures the problem. The problem would not lie in
the hash but with the system that's doing the copy and paste (and the
details of how the HTML form is handled server side).
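As a rough command-line stand-in for such a "show me the bytes" page, something like the following Python would do. The helper name `describe` is invented here, and UTF-8 is assumed as the default encoding.

```python
import unicodedata


def describe(text: str, encoding: str = "utf-8") -> list[str]:
    """One row per code point: U+XXXX, its encoded bytes in hex, and its Unicode name."""
    rows = []
    for ch in text:
        raw = ch.encode(encoding).hex(" ")
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"U+{ord(ch):04X}  {raw:<11}  {name}")
    return rows


# A pilcrow, then the two spellings of a-umlaut.
for sample in ("\u00b6", "\u00e4", "a\u0308"):
    for row in describe(sample):
        print(row)
    print()
```

Pasting a password through a tool like this on two different systems would show directly whether the code points or bytes changed along the way, with no hashing involved.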
--
Ben.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 21:05:48 2026

From Newsgroup: sci.crypt

On 2/19/2026 8:05 PM, Rich wrote:

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings!

I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).

There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.

Nope. For most of the accented characters (i.e., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.

When I started reading your post that's what I thought you might be
getting at but your example -- pilcrow -- put a stop to that. Surely you
could not be talking about combining characters and diacriticals given
that example. So in trying to make sense of your example I decided you
must be talking about different encodings.

That's a fair criticism. I should have used an accented char as the
fake character. And yes, there is the additional 'encodings' issue
too.

Anyway, I should have stuck with that thought despite the example. Even
so, I would probably still have posted a "different encodings" remark as
that is just as much a problem (though one that is reducing over time)
as Unicode equivalence and compatibility.

Yes, I did not think to mention that. At the same time, it was aimed
at Chris, and combining a "different sets of codepoints represent same character" and "different UTF encodings convert same code points to
different byte strings" would likely have sent him off on an unrelated tangent.

But you are correct, UTF-8 has almost taken over, but the other
encodings do still show up occasionally, and a different encoding will
cause his "Unicode" passwords to also fail.

Agreed.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 21:54:22 2026

From Newsgroup: sci.crypt

On 2/19/2026 8:08 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/18/2026 7:45 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/17/2026 8:34 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

0x80 0x55 0x54

might display a Pilcrow symbol

And this byte sequence:

0x81 0x23 0x88 0x23

might *also* display as a Pilcrow symbol

So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.

Okay.

You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.

They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.

BAM! It will not decrypt. I really need to make a checkbox or something
that denotes "hex byte mode"... This will allow plaintext and password
textareas to use hex bytes directly...

I did it on my local machine, not uploaded it to the web yet... I have,
say a plaintext treated as hex bytes with the different encodings, hit
decrypt, and they show as the correct symbols even though the hex
representations are different. Humm, need to ponder...

If in the end you want the ability of a user to be able to enter any
string of binary bytes then some form of "binary input" (hex, base64,
base85, etc.) will be needed if you expect them to type them in.

If you let them upload a file as the "key" then you don't need a hex
(or other) input mode.

I was thinking about that. It would not be "uploaded" to the server
because my code is client only. It would need to allow the user to look
for, I guess, any file. Then use its contents as a Password, raw bytes.
Should work. At least I should put in a warning about using the password
with unicode... Thanks! :^)

Password with unicode = can of oh shit's!, not just a can of worms?

Unicode passwords bring the trouble of "looks the same on screen" but
might be "very different set of bytes on the wire". And your hashes
only see the "bytes on the wire".

bingo. Tango! Delta. Niner! Over and out.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 22:25:24 2026

From Newsgroup: sci.crypt

On 2/15/2026 1:20 AM, Chris M. Thomasson wrote:
[...]

Wrt the online version:

Two bags of bytes on the local system (files), nothing sent to the
server. One bag of raw bytes for the password, and another one for the plaintext. I can do it. It's been a while since I blew through the
javascript trees... ;^D rofl.
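A quick sketch of the "bag of bytes" idea, in Python rather than the browser JavaScript, just to pin down the shape of it. The function name `tag_with_key_file`, the file name `key.bin`, and the use of plain HMAC-SHA-256 are placeholders, not the actual cipher.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def tag_with_key_file(key_path: Path, message: bytes) -> str:
    """Use a file's raw contents as the key; no text decoding happens anywhere."""
    key = key_path.read_bytes()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Demo with a throwaway key file containing bytes that are not valid UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "key.bin"
    key_file.write_bytes(bytes([0x00, 0xFF, 0x80, 0x0A]))
    print(tag_with_key_file(key_file, b"hello"))
```

Since the key never passes through a text field, normalization and encoding questions do not come up at all.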

Using unicode for the password was fucking moronic. Sorry. DrMoron? wow...

Sorry.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 22:41:50 2026

From Newsgroup: sci.crypt

On 2/20/2026 4:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 2:03 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?

Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?

I wanted to see if you can post those visual symbols as a password and make
it not decrypt the plaintext correctly. It would give an example of the
potential problem we are discussing here? Ala, Rich and Marcel's point? My
hash would not work if it's not bit exact...

I don't think any of the symbols I posted can illustrate Rich's point
and even if I had chosen ones that did, I would be surprised if copying
and pasting would trigger normalisation. But let's say it's a
concern... Then a good way to illustrate that point would be a website
that just showed what bytes are pasted into an input field. Having hash
code behind it just obscures the problem. The problem would not lie in
the hash but with the system that's doing the copy and paste (and the
details of how the HTML form is handled server side).

Using unicode for a password is scary. DrMoron? Yikes!

https://youtu.be/q3qDESAvzh0?list=RDRijB8wnJCN0
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Sat Feb 21 18:43:05 2026

From Newsgroup: sci.crypt

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/19/2026 8:05 PM, Rich wrote:

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Ben Bacarisse <ben@bsb.me.uk> wrote:

Rich <rich@example.invalid> writes:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/16/2026 2:01 PM, Rich wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 2/15/2026 3:54 AM, Marcel Logen wrote:

Chris M. Thomasson in sci.crypt:

Features:

[...]

arbitrary Unicode passwords supported

BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>

I "think" my code handles it okay... Humm... thanks.

Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.

Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything

Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.

Let me try a plaintext with some unicode that encrypts using the default
key.

This seems to work:

Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

Of course, because you only have one "password".

You missed the entire point:

Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.

So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:

I don't think it's helpful to use made-up encodings!

I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).

There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.

Nope. For most of the accented characters (i.e., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.

When I started reading your post that's what I thought you might be
getting at but your example -- pilcrow -- put a stop to that. Surely you
could not be talking about combining characters and diacriticals given
that example. So in trying to make sense of your example I decided you
must be talking about different encodings.

That's a fair criticism. I should have used an accented char as the
fake character. And yes, there is the additional 'encodings' issue
too.

Anyway, I should have stuck with that thought despite the example. Even
so, I would probably still have posted a "different encodings" remark as
that is just as much a problem (though one that is reducing over time)
as Unicode equivalence and compatibility.

Yes, I did not think to mention that. At the same time, it was aimed
at Chris, and combining a "different sets of codepoints represent same
character" and "different UTF encodings convert same code points to
different byte strings" would likely have sent him off on an unrelated
tangent.

But you are correct, UTF-8 has almost taken over, but the other
encodings do still show up occasionally, and a different encoding will
cause his "Unicode" passwords to also fail.

Agreed.

Thing is, provided you have some way to know which "encoding" is being provided, you can convert those differing "encodings" to a single
common one (i.e., if you get UTF-16 encoding, you can convert it to
UTF-8 if your input is expected to be UTF-8).
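
A short sketch of that conversion step, assuming Python 3 and that the source encoding is already known (the helper name to_utf8 and the sample password are hypothetical):

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes in a known encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# The same text supplied as UTF-16 (little endian, no BOM).
utf16_bytes = "p\u00e4ssword".encode("utf-16-le")
utf8_bytes = to_utf8(utf16_bytes, "utf-16-le")

print(utf16_bytes.hex())  # differs from the UTF-8 bytes below
print(utf8_bytes.hex())
```

This only unifies the encoding. It does not change which code points are present, so it does not replace normalization.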

That is more straightforward than proper normalization (the Unicode spec
doc for normalizing is rather lengthy).

Reality is, if you want to support "unicode passwords" you do really
need to do both. Convert the supplied encoding to the encoding your
'hasher' accepts, *and* perform normalization on the code points within
the "password" you are provided.

So you might want to look for some library code that would handle both
for you.
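
A sketch of doing both steps with standard library code, assuming Python 3. The function name password_key_bytes is hypothetical, and the choice of NFC (rather than NFKC or another form) and of SHA-256 as the stand-in "hasher" are assumptions made for the example, not something the thread settles:

```python
import hashlib
import unicodedata


def password_key_bytes(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Return a canonical byte string for a password.

    1. Decode from the supplied encoding (strict: bad input raises).
    2. Normalize the code points to NFC.
    3. Encode as UTF-8 for the hasher.
    """
    text = raw.decode(source_encoding)
    normalized = unicodedata.normalize("NFC", text)
    return normalized.encode("utf-8")


# Precomposed and combining spellings of "a with acute" in UTF-8,
# plus the precomposed spelling supplied as UTF-16.
from_precomposed = password_key_bytes(b"\xc3\xa1")
from_combining = password_key_bytes(b"a\xcc\x81")
from_utf16 = password_key_bytes("\u00e1".encode("utf-16-le"), "utf-16-le")

digests = {hashlib.sha256(b).hexdigest()
           for b in (from_precomposed, from_combining, from_utf16)}
print(len(digests))  # 1: all three inputs hash identically
```

Whatever form is chosen, it has to be applied the same way at both encryption and decryption time.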
--- Synchronet 3.21b-Linux NewsLink 1.2

From Rich@rich@example.invalid to sci.crypt on Sat Feb 21 18:46:32 2026

From Newsgroup: sci.crypt

Ben Bacarisse <ben@bsb.me.uk> wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 2:03 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?

Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?

I wanted to see if you can post those visual symbols as a password and make
it not decrypt the plaintext correctly. It would give an example of the
potential problem we are discussing here? Ala, Rich and Marcel's point? My
hash would not work if it's not bit exact...

I don't think any of the symbols I posted can illustrate Rich's point
and even if I had chosen ones that did, I would be surprised if copying
and pasting would trigger normalisation. But let's say it's a
concern... Then a good way to illustrate that point would be a website
that just showed what bytes are pasted into an input field. Having hash
code behind it just obscures the problem. The problem would not lie in
the hash but with the system that's doing the copy and paste (and the
details of how the HTML form is handled server side).

Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
care of that part for me:

For UTF-8, "á", for example, could be any of:

The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.

Now, proper conforming implementations should not issue "overlong"
encodings, so with the exception of pen-testers, Chris *should* be able
to ignore the "overlong" angle.

But Richard's post shows two different byte strings that produce the
identical visual character.
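
The two byte strings quoted from Richard's post can be decoded to see which code points each one carries. A small sketch, assuming Python 3 (variable names are made up for the example):

```python
import unicodedata

# The two UTF-8 byte strings quoted above.
precomposed_bytes = bytes.fromhex("c3a1")
decomposed_bytes = bytes.fromhex("61cc81")

# List the Unicode character names behind each byte string.
precomposed_names = [unicodedata.name(ch)
                     for ch in precomposed_bytes.decode("utf-8")]
decomposed_names = [unicodedata.name(ch)
                    for ch in decomposed_bytes.decode("utf-8")]

print(precomposed_names)  # ['LATIN SMALL LETTER A WITH ACUTE']
print(decomposed_names)   # ['LATIN SMALL LETTER A', 'COMBINING ACUTE ACCENT']
```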

--- Synchronet 3.21b-Linux NewsLink 1.2

From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Sun Feb 22 22:34:09 2026

From Newsgroup: sci.crypt

Rich <rich@example.invalid> writes:

Ben Bacarisse <ben@bsb.me.uk> wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 2:03 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes
"raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?

Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?

I wanted to see if you can post those visual symbols as a password and make
it not decrypt the plaintext correctly. It would give an example of the
potential problem we are discussing here? Ala, Rich and Marcel's point? My
hash would not work if it's not bit exact...

I don't think any of the symbols I posted can illustrate Rich's point
and even if I had chosen ones that did, I would be surprised if copying
and pasting would trigger normalisation. But let's say it's a
concern... Then a good way to illustrate that point would be a website
that just showed what bytes are pasted into an input field. Having hash
code behind it just obscures the problem. The problem would not lie in
the hash but with the system that's doing the copy and paste (and the
details of how the HTML form is handled server side).

Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
care of that part for me:

For UTF-8, "á", for example, could be any of:

The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually
needs.

I'm not sure why you are reposting this. My point to Chris was that his
website was not a good way to test if copy and paste is working as he
expects since it does not show the input bytes and just adds an extra
level of complexity to the test.

Now, proper conforming implementations should not issue "overlong"
encodings, so with the exception of pen-testers, Chris *should* be able
to ignore the "overlong" angle.

There are cases where overlong encodings can cause problems and there is nothing to stop these being generated maliciously so the usual advice is
that programs should check for these on input.
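
One way to make that input check concrete, sketched in Python 3, is to rely on a strict UTF-8 decoder, which rejects overlong forms. The helper name is_strict_utf8 is hypothetical, and other languages or decoders may be more lenient:

```python
def is_strict_utf8(raw: bytes) -> bool:
    """Return True only if raw is well-formed UTF-8 (no overlong forms)."""
    try:
        raw.decode("utf-8")  # Python's decoder is strict by default
        return True
    except UnicodeDecodeError:
        return False


shortest_a = b"\x61"                 # "a" in its only valid UTF-8 form
overlong_a = bytes([0xC1, 0xA1])     # overlong two-byte form of "a" (U+0061)

print(is_strict_utf8(shortest_a))    # True
print(is_strict_utf8(overlong_a))    # False
```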

But Richard's post shows two different byte strings that produce the identical visual character.

Yes. I didn't think this was in any doubt.
--
Ben.
--- Synchronet 3.21b-Linux NewsLink 1.2

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Mon Feb 23 12:54:51 2026

From Newsgroup: sci.crypt

On 2/22/2026 2:34 PM, Ben Bacarisse wrote:

Rich <rich@example.invalid> writes:

Ben Bacarisse <ben@bsb.me.uk> wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 2:03 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 1:01 PM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

On 2/20/2026 5:07 AM, Ben Bacarisse wrote:

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

Password with unicode = can of oh shit's!, not just a can of worms?

They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!

Well let me try with just copy-and-paste between the quotes
"raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:

"Ben Bacarisse"

Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:

https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.

I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o

The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.

Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?

Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?

I wanted to see if you can post those visual symbols as a password and make
it not decrypt the plaintext correctly. It would give an example of the
potential problem we are discussing here? Ala, Rich and Marcel's point? My
hash would not work if it's not bit exact...

I don't think any of the symbols I posted can illustrate Rich's point
and even if I had chosen ones that did, I would be surprised if copying
and pasting would trigger normalisation. But let's say it's a
concern... Then a good way to illustrate that point would be a website
that just showed what bytes are pasted into an input field. Having hash
code behind it just obscures the problem. The problem would not lie in
the hash but with the system that's doing the copy and paste (and the
details of how the HTML form is handled server side).

Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
care of that part for me:

For UTF-8, "á", for example, could be any of:

The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually
needs.

I'm not sure why you are reposting this. My point to Chris was that his
website was not a good way to test if copy and paste is working as he
expects since it does not show the input bytes and just adds an extra
level of complexity to the test.

Now, proper conforming implementations should not issue "overlong"
encodings, so with the exception of pen-testers, Chris *should* be able
to ignore the "overlong" angle.

There are cases where overlong encodings can cause problems and there is nothing to stop these being generated maliciously so the usual advice is
that programs should check for these on input.

But Richard's post shows two different byte strings that produce the
identical visual character.

Yes. I didn't think this was in any doubt.

Right. I need to allow a user to see the raw hex bytes from password and
the plaintext.
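
A rough sketch of such a raw-bytes view, written in Python 3 for illustration only (the demo itself is browser-based; the helper name hex_bytes is hypothetical, and UTF-8 is assumed as the encoding fed to the hash):

```python
def hex_bytes(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return text.encode("utf-8").hex(" ")


# Two passwords that look identical on screen but differ in bytes.
print(hex_bytes("\u00e1"))   # c3 a1
print(hex_bytes("a\u0301"))  # 61 cc 81
```

Showing this next to the password and plaintext fields would make it visible when a copy and paste delivers different bytes for the same visual symbols.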
--- Synchronet 3.21b-Linux NewsLink 1.2
