Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1> <https://www.unicode.org/faq/normalization#2>
This is not meant for real secrets, just something I built for fun and
learning.
Feedback, critique, or curiosity welcome.
Unfortunately, I don't have enough time to look at your code, but
I wanted to at least make the above comment.
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Unicode has plural underlying representations that all produce the same "visible characters" for a surprisingly large number of code points.
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
Chris M. Thomasson in sci.crypt:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Look at this example
(made in a UTF-8 terminal window (Linux & bash)):
| $ echo 61 cc 88 0a | xxd -r -p
| ä
| $ echo c3 a4 0a | xxd -r -p
| ä
The raw byte sequences "0x61cc88" and "0xc3a4" both produce the
output "ä" (German "a umlaut").
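A minimal JavaScript sketch of the same experiment, assuming a runtime with the standard TextEncoder (a browser or Node.js). The helper name utf8Hex is made up for illustration. It shows that the two spellings of "ä" reach a cipher as different bytes:

```javascript
// Two ways to spell "ä": precomposed U+00E4, and "a" followed by
// U+0308 (combining diaeresis). Both render as the same glyph.
const precomposed = "\u00E4";
const decomposed = "a\u0308";

// Hex dump of the UTF-8 bytes of a string (hypothetical helper).
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(utf8Hex(precomposed)); // c3 a4
console.log(utf8Hex(decomposed));  // 61 cc 88
```

TextEncoder encodes whatever code points it is given. It does not normalize, so any key derived from these bytes differs between the two spellings.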
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:
On 2/17/2026 2:41 PM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
[...]
I think TextEncoder().encode()/decode() helps me out here a bit. Still,
I need to put in a special plaintext box, or a radio button that treats
the existing plaintext as raw hex bytes. Any thoughts? Thanks.
[...]
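One possible shape for the "raw hex bytes" option, as a sketch only. The helper names hexToBytes and inputToBytes and the hexMode flag are hypothetical and not taken from the DrMoron code:

```javascript
// Hypothetical helper: parse "61 cc 88" or "61cc88" into raw bytes.
function hexToBytes(hex) {
  const clean = hex.replace(/\s+/g, "");
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) {
    throw new Error("expected an even number of hex digits");
  }
  const bytes = new Uint8Array(clean.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Hypothetical helper: decide how a textarea value becomes bytes.
// hexMode would come from the checkbox or radio button.
function inputToBytes(value, hexMode) {
  return hexMode ? hexToBytes(value) : new TextEncoder().encode(value);
}

console.log(inputToBytes("61 cc 88", true)); // bytes 0x61 0xcc 0x88
console.log(inputToBytes("a", false));       // byte 0x61
```

In hex mode the user controls the exact bytes, so the ambiguity of text entry is sidestepped for that input. Text mode still depends on which code points the platform delivers.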
I've been working on a small educational cipher experiment called DrMoron.
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
Let me try a plaintext with some unicode that encrypts using the default key.
This seems to work:
Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d
I bet there is subtle flaw in there. Thanks!
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
I bet I missed something. It sucks to try to get a plaintext data in raw bytes. Or I am missing a much easier way.
On 2/17/2026 3:35 PM, Chris M. Thomasson wrote:
On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:
On 2/17/2026 2:41 PM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
[...]
I think TextEncoder().encode()/decode() helps me out here a bit. Still,
I need to put in a special plaintext box, or a radio button that treats
the existing plaintext as raw hex bytes. Any thoughts? Thanks.
[...]
NFC normalization? So, if I copy and paste two different Unicode
sequences from a website that says they are different, can the paste
make them all use the same code?
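As far as I understand it, pasting does not normally do this for you. Normalization is a step the code has to apply before encoding. A minimal sketch, assuming the password arrives as a JavaScript string and that NFC is the chosen form (passwordToBytes is a hypothetical name, not part of the existing code):

```javascript
// Hypothetical helper: normalize to NFC, then encode as UTF-8.
// String.prototype.normalize and TextEncoder are standard APIs.
function passwordToBytes(password) {
  return new TextEncoder().encode(password.normalize("NFC"));
}

// Format bytes as hex for display.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(toHex(passwordToBytes("\u00E4")));  // c3 a4
console.log(toHex(passwordToBytes("a\u0308"))); // c3 a4
```

Both the encrypting and the decrypting side have to apply the same normalization form. Normalization works on strings, so a raw hex input mode would bypass it.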
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
Let me try a plaintext with some unicode that encrypts using the default
key.
This seems to work:
Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d
Of course, because you only have one "password".
You missed the entire point:
Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.
So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:
0x80 0x55 0x54
might display a Pilcrow symbol
And this byte sequence:
0x81 0x23 0x88 0x23
might *also* display as a Pilcrow symbol
So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.
You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.
They enter a Pilcrow symbol however one would do so on a Mac, but MacOS outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.
Rich <rich@example.invalid> writes:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
Let me try a plaintext with some unicode that encrypts using the default
key.
This seems to work:
Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d
Of course, because you only have one "password".
You missed the entire point:
Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.
So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:
I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.
0x80 0x55 0x54
might display a Pilcrow symbol
And this byte sequence:
0x81 0x23 0x88 0x23
might *also* display as a Pilcrow symbol
So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.
You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.
They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.
This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.
Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte encodings.
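For a concrete version of the "different encodings" case: the pilcrow is U+00B6, and its bytes depend on which encoding form is used. A small sketch (hex and utf16leBytes are hypothetical helpers written for this example):

```javascript
const pilcrow = "\u00B6";

// Format a byte array as space-separated hex.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

// UTF-16LE bytes, built from the string's 16-bit code units.
function utf16leBytes(text) {
  const out = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out.push(unit & 0xff, unit >> 8); // low byte first, then high byte
  }
  return out;
}

console.log(hex(new TextEncoder().encode(pilcrow))); // c2 b6
console.log(hex(utf16leBytes(pilcrow)));             // b6 00
```

This mismatch is about the choice of encoding, not normalization. The pilcrow is a single code point with no decomposition, so normalizing would not reconcile the two byte sequences.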
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
I bet there is subtle flaw in there. Thanks!
There is, but it is not 'subtle' for anyone who has more than a passing familiarity with Unicode encodings. It's only subtle to those who
don't have much understanding of Unicode encodings.
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
I bet I missed something. It sucks to try to get a plaintext data in raw
bytes. Or I am missing a much easier way.
You missed the example Marcel posted.
Two very different byte sequences, both that display the exact same character.
So someone that uses that "character" in a password is at the mercy of
which byte sequence their tool/OS uses when they enter that character.
On 18/02/2026 12:19, Ben Bacarisse wrote:
Rich <rich@example.invalid> writes:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
Let me try a plaintext with some unicode that encrypts using the default
key.
This seems to work:
Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d
Of course, because you only have one "password".
You missed the entire point:
Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.
So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:
I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.
0x80 0x55 0x54
might display a Pilcrow symbol
And this byte sequence:
0x81 0x23 0x88 0x23
might *also* display as a Pilcrow symbol
So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.
You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.
They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.
This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.
Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte
encodings.
For UTF-8, "á", for example, could be any of:
The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.
I think that's what Rich meant.
Richard Harnden <richard.nospam@gmail.invalid> writes:
On 18/02/2026 12:19, Ben Bacarisse wrote:
Rich <rich@example.invalid> writes:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
Let me try a plaintext with some unicode that encrypts using the default
key.
This seems to work:
Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d
Of course, because you only have one "password".
You missed the entire point:
Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.
So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:
I don't think it's helpful to use made-up encodings! There are
normalisation issues, but they are not at all as you present them. I
think you might be talking about completely different Unicode encodings,
like UTF-16 vs UTF-8.
0x80 0x55 0x54
might display a Pilcrow symbol
And this byte sequence:
0x81 0x23 0x88 0x23
might *also* display as a Pilcrow symbol
So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.
You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.
They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.
This is not to do with normalisation but, I think, two systems using
different Unicode encodings. I'm guessing because you don't give two
actual encodings that can represent a pilcrow.
Exactly the same problem has been around since the birth of computing.
If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
PDP-11 it won't work. It won't even work between Windows and Mac
machines when using any characters that don't have the same single-byte
encodings.
For UTF-8, "á", for example, could be any of:
The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.
I think that's what Rich meant.
My first thought, but then why would pilcrow have different encodings
between Windows and Mac?
That's an entirely different scenario. That's why I would have
preferred an example, not a couple of made-up sequences that are not
Unicode encodings at all!
BTW, all programs should reject over-long encodings in input and none
should ever generate any as output so, again, it does not apply to the
Windows/Mac scenario. A system that generated one when the user is
typing a password is seriously broken.
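Browsers and Node.js expose a strict decoder that can be used to check this. A sketch, assuming the standard TextDecoder with its fatal option. isValidUtf8 is a hypothetical helper name, and e0 83 a1 is an overlong three-byte spelling of U+00E1 chosen for this example:

```javascript
// Hypothetical helper: true only if the bytes are well-formed UTF-8.
// With { fatal: true }, TextDecoder throws instead of substituting U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array(bytes));
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUtf8([0xc3, 0xa1]));       // true  (precomposed á)
console.log(isValidUtf8([0x61, 0xcc, 0x81])); // true  (a + combining acute)
console.log(isValidUtf8([0xe0, 0x83, 0xa1])); // false (overlong form)
```

The two valid spellings remain a normalization question. The overlong form is simply malformed input and can be rejected before it reaches the cipher.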
On 2/17/2026 8:34 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/15/2026 3:54 AM, Marcel Logen wrote:
Chris M. Thomasson in sci.crypt:
Features:[...]
arbitrary Unicode passwords supported
BTW: Consider to read <https://www.unicode.org/faq/normalization>.
<https://www.unicode.org/faq/normalization#1>
<https://www.unicode.org/faq/normalization#2>
I "think" my code handles it okay... Humm... thanks.
Unless you are actively "normalizing" per the Unicode Consortium
recommendation, then it is nearly a 100% chance it *does not* handle it
ok.
Well, I am trying to get the password, plaintext to convert to raw bytes
for my algo to handle. Of course my C version handles anything
Unicode has plural underlying representations that all produce the same
"visible characters" for a surprisingly large number of code points.
Let me try a plaintext with some unicode that encrypts using the default
key.
This seems to work:
Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d
Of course, because you only have one "password".
You missed the entire point:
Unicode has plural underlying representations that all produce the
same "visible characters" for a surprisingly large number of code
points.
So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:
0x80 0x55 0x54
might display a Pilcrow symbol
And this byte sequence:
0x81 0x23 0x88 0x23
might *also* display as a Pilcrow symbol
So, you, on your Windows machine, enter a pilcrow as part of your
password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
encrypting.
Okay.
You send the result to a buddy using a Mac, and you tell him the
password is a Pilcrow symbol.
They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
password as you used to encrypt.
BAM! It will not decrypt. I really need to make a checkbox or something
that denotes "hex byte mode"... This will allow plaintext and password
textareas to use hex bytes directly...
I did it on my local machine, not uploaded it to the web yet... I have,
say a plaintext treated as hex bytes with the different encodings, hit
decrypt, and they show as the correct symbols even though the hex
representations are different. Humm, need to ponder...
Ben Bacarisse <ben@bsb.me.uk> wrote:
Rich <rich@example.invalid> writes:
So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:
I don't think it's helpful to use made-up encodings!
I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).
There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.
Nope. For most of the accented characters (i.e., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
BAM! It will not decrypt. I really need to make a checkbox or something
that denotes "hex byte mode"... This will allow plaintext and password
textarea's to use hex bytes directly...
I did it on my local machine, not uploaded it to the web yet... I have,
say a plaintext treated as hex bytes with the different encodings, hit
decrypt, and they show as the correct symbols even though the hex
representations are different. Humm, need to ponder...
If in the end you want the ability of a user to be able to enter any
string of binary bytes then some form of "binary input" (hex, base64,
base85, etc.) will be needed if you expect them to type them in.
If you let them upload a file as the "key" then you don't need a hex
(or other) input mode.
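A "binary input" mode along these lines could look roughly like the sketch below. The function name parse_key and the mode strings are assumptions made up for illustration. The thread's actual code is client-side and is not shown here, so this is Python only to show the idea.

```python
import base64

def parse_key(text: str, mode: str) -> bytes:
    """Convert user input into raw key bytes according to the chosen mode.

    mode is a hypothetical selector: "hex", "base64", or "text".
    """
    if mode == "hex":
        # bytes.fromhex skips ASCII whitespace and raises ValueError on bad digits.
        return bytes.fromhex(text)
    if mode == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text, validate=True)
    if mode == "text":
        return text.encode("utf-8")
    raise ValueError(f"unknown mode: {mode!r}")

# The same two key bytes entered three different ways.
print(parse_key("c3 a1", "hex") == b"\xc3\xa1")      # True
print(parse_key("w6E=", "base64") == b"\xc3\xa1")    # True
print(parse_key("\u00e1", "text") == b"\xc3\xa1")    # True
```

With hex or base64 the user controls the exact bytes, so what the keyboard or clipboard does to the text no longer matters.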
Rich <rich@example.invalid> writes:
Ben Bacarisse <ben@bsb.me.uk> wrote:
I don't think it's helpful to use made-up encodings!
I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).
There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.
Nope. For most of the accented characters (i.e., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.
When I started reading your post that's what I thought you might be
getting at but your example -- pilcrow -- put a stop to that. Surely you
could not be talking about combining characters and diacriticals given
that example. So in trying to make sense of your example I decided you
must be talking about different encodings.
Anyway, I should have stuck with that thought despite the example. Even
so, I would probably still have posted a "different encodings" remark as
that is just as much a problem (though one that is reducing over time)
as Unicode equivalence and compatibility.
On 2/18/2026 7:45 PM, Rich wrote:
If in the end you want the ability of a user to be able to enter any
string of binary bytes then some form of "binary input" (hex, base64,
base85, etc.) will be needed if you expect them to type them in.
If you let them upload a file as the "key" then you don't need a hex
(or other) input mode.
I was thinking about that. It would not be "uploaded" to the server
because my code is client only. It would need to allow the user to look
for, I guess, any file. Then use its contents as a Password, raw bytes.
Should work. At least I should put in a warning about using the password
with unicode... Thanks! :^)
Password with unicode = can of oh shit's!, not just a can of worms?
On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
Password with unicode = can of oh shit's!, not just a can of worms?
They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!
Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:
"Ben Bacarisse"
Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.
I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o
The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.
On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
I don't want to copy and paste stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.
I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o
The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.
Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?
Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?
On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?
Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?
I wanted to see if you can post those visual symbols as a password and
make it not decrypt the plaintext correctly. It would give an example of
the potential problem we are discussing here? A la Rich and Marcel's
point? My hash would not work if it's not bit exact...
Ben Bacarisse <ben@bsb.me.uk> wrote:
When I started reading your post that's what I thought you might be
getting at but your example -- pilcrow -- put a stop to that. Surely you
could not be talking about combining characters and diacriticals given
that example. So in trying to make sense of your example I decided you
must be talking about different encodings.
That's a fair criticism. I should have used an accented char as the
fake character. And yes, there is the additional 'encodings' issue
too.
Anyway, I should have stuck with that thought despite the example. Even
so, I would probably still have posted a "different encodings" remark as
that is just as much a problem (though one that is reducing over time)
as Unicode equivalence and compatibility.
Yes, I did not think to mention that. At the same time, it was aimed
at Chris, and combining "different sets of code points represent the same
character" with "different UTF encodings convert the same code points to
different byte strings" would likely have sent him off on an unrelated
tangent.
But you are correct, UTF-8 has almost taken over, but the other
encodings do still show up occasionally, and a different encoding will
cause his "Unicode" passwords to also fail.
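The separate "different encodings" issue is easy to see even for the pilcrow, which is a single code point (U+00B6) with no combining-character form. A small Python sketch follows. The helper name encoded_forms and the choice of three encodings are only for illustration.

```python
def encoded_forms(text: str) -> dict[str, str]:
    """Map a few encoding names to the hex of text encoded in each."""
    encodings = ("utf-8", "utf-16-le", "utf-16-be")
    return {name: text.encode(name).hex() for name in encodings}

pilcrow = "\u00b6"  # U+00B6 PILCROW SIGN

for name, hex_bytes in encoded_forms(pilcrow).items():
    print(name, hex_bytes)
# utf-8 c2b6
# utf-16-le b600
# utf-16-be 00b6
```

Same code point, three different byte strings, so a hash over the bytes only matches if both sides agree on the encoding.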
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
I was thinking about that. It would not be "uploaded" to the server
because my code is client only. It would need to allow the user to look
for, I guess, any file. Then use its contents as a Password, raw bytes.
Should work. At least I should put in a warning about using the password
with unicode... Thanks! :^)
Password with unicode = can of oh shit's!, not just a can of worms?
Unicode passwords bring the trouble of "looks the same on screen" but
might be "very different set of bytes on the wire". And your hashes
only see the "bytes on the wire".
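The "bytes on the wire" point can be made concrete with a small sketch. It is not taken from Chris's code: `password_bytes` and `tag` are hypothetical helpers, HMAC-SHA-256 merely stands in for whatever the cipher does with the password bytes, and NFC is only one reasonable choice of normalization form. Under those assumptions, two spellings of the same visible password give different results until they are normalized:

```python
import hashlib
import hmac
import unicodedata


def password_bytes(password: str) -> bytes:
    """Hypothetical helper: normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", password).encode("utf-8")


def tag(key: bytes) -> str:
    """Stand-in for any password-keyed step (assumption: HMAC-SHA-256)."""
    return hmac.new(key, b"Ben Bacarisse", hashlib.sha256).hexdigest()


precomposed = "caf\u00e9"   # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# Same glyphs on screen, different bytes, so the tags differ.
raw_differs = tag(precomposed.encode("utf-8")) != tag(decomposed.encode("utf-8"))

# After NFC normalization both spellings give the same bytes and the same tag.
normalized_matches = tag(password_bytes(precomposed)) == tag(password_bytes(decomposed))

print(raw_differs, normalized_matches)
```

Normalizing changes which byte strings a given password maps to, so it would have to be applied on both the encrypt and the decrypt side.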
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
Password with unicode = can of oh shit's!, not just a can of worms?
They introduce some difficulties, but there are advantages as well.
People want to use passwords and pass phrases that are in their native
language. It can help people choose longer ones while still remembering
them. Also Unicode can increase the entropy without needing longer
passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
many places will either reject it or insist I add a digit, an upper and
lower case letter and a symbol!
Well let me try with just copy-and-paste between the quotes
"raAEYc+-+rCR-i-+-#" as a
password for the plaintext, between the quotes:
"Ben Bacarisse"
Here is the link to some ciphertext... Can you decrypt it? copy paste in
the password, click the decrypt button:
https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
I still think it scary because of what I read in this thread. Might work,
might not??? Perhaps? Humm... ;^o
I don't want to copy and post stuff into a link. What purpose does that
serve? You know what your code does so you know what will happen for a
particular sequence of input bytes.
The trick to knowing if some software will do what you expect is to
understand the code, the inputs and the outputs. Copying and pasting
text won't teach you much about any of these things.
Well, if you are on a different system and the copy and paste might give
different bytes for the same visual symbols, that would murder the password
and you could not decrypt... Right?
Yes it would. But so what? If your code handles the input correctly,
why do you care if I can paste something into a website correctly?
I wanted to see if you can post those visual symbols as a password and make
it not decrypt the plaintext correctly. It would give an example of the
potential problem we are discussing here? Ala, Rich and Marcel's point? My
hash would not work if it's not bit exact...
I don't think any of the symbols I posted can illustrate Rich's point
and even if I had chosen ones that did, I would be surprised if copying
and pasting would trigger normalisation. But let's say it's a
concern... Then a good way to illustrate that point would be a website
that just showed what bytes are pasted into an input field. Having hash
code behind it just obscures the problem. The problem would not lie in
the hash but with the system that's doing the copy and paste (and the
details of how the HTML form is handled server side).
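A rough sketch of that "just show the bytes" idea, written here as a command-line helper and not as a web page. The function name `describe` is made up for illustration, and the sketch assumes the pasted text has already arrived as a Python string that will be encoded as UTF-8:

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one line per code point: U+XXXX, UTF-8 bytes in hex, name."""
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"U+{ord(ch):04X} {ch.encode('utf-8').hex()} {name}")
    return rows


# Two inputs that render identically but are different code point sequences.
for sample in ("\u00e1", "a\u0301"):
    print("\n".join(describe(sample)))
    print("--")
```

Pasting a suspect password through something like this on both machines would show whether the code points differ before any hashing gets involved.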
On 2/19/2026 8:05 PM, Rich wrote:
Ben Bacarisse <ben@bsb.me.uk> wrote:
Rich <rich@example.invalid> writes:
Ben Bacarisse <ben@bsb.me.uk> wrote:
Rich <rich@example.invalid> writes:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
On 2/16/2026 2:01 PM, Rich wrote:
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
So (all made up as I'm not going to bother digging into Unicode to drag
out the actual byte sequences) the byte sequence:
I don't think it's helpful to use made-up encodings!
I wasn't going to put in the effort to actually do the research to find
a real one (plus another poster already provided a real one).
There are normalisation issues, but they are not at all as you
present them. I think you might be talking about completely
different Unicode encodings, like UTF-16 vs UTF-8.
Nope. For most of the accented characters (i.e., something like the a
with umlaut) there is both a code point that is directly "a with umlaut"
and there are combining characters (a combining umlaut) that can be
combined with a standard "letter a" code point, to also make "a with
umlaut". But the "code point" version will be a different byte
sequence (in whatever UTF you pick) than the "combining character plus
non-accented letter" version.
When I started reading your post that's what I thought you might be
getting at but your example -- pilcrow -- put a stop to that. Surely you
could not be talking about combining characters and diacriticals given
that example. So in trying to make sense of your example I decided you
must be talking about different encodings.
That's a fair criticism. I should have used an accented char as the
fake character. And yes, there is the additional 'encodings' issue
too.
Anyway, I should have stuck with that thought despite the example. Even
so, I would probably still have posted a "different encodings" remark as
that is just as much a problem (though one that is reducing over time)
as Unicode equivalence and compatibility.
Yes, I did not think to mention that. At the same time, it was aimed
at Chris, and combining a "different sets of codepoints represent same
character" and "different UTF encodings convert same code points to
different byte strings" would likely have sent him off on an unrelated
tangent.
But you are correct, UTF-8 has almost taken over, but the other
encodings do still show up occasionally, and a different encoding will
cause his "Unicode" passwords to also fail.
Agreed.
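The separate "different UTF encodings" issue is easy to see with a single code point. This is only an illustrative sketch; the variable names are invented here, and the list of encodings is an arbitrary selection:

```python
# One code point (precomposed 'a with acute'), several byte strings.
text = "\u00e1"
encodings = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le")

# Map each encoding name to the hex form of the encoded bytes.
as_bytes = {name: text.encode(name).hex() for name in encodings}

for name, hexed in as_bytes.items():
    print(f"{name:10} {hexed}")
```

A password hashed from one of these byte strings will not match one hashed from another, even though the code point is identical.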
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
I wanted to see if you can post those visual symbols as a password and make
it not decrypt the plaintext correctly. It would give an example of the
potential problem we are discussing here? Ala, Rich and Marcel's point? My
hash would not work if it's not bit exact...
For UTF-8, "á", for example, could be any of:
The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually needs.
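The two well-formed byte strings above can be checked with Python's standard `unicodedata` module. This is a minimal sketch for verification only; the variable names are chosen here for illustration:

```python
import unicodedata

# Decode the two byte strings quoted above.
precomposed = bytes.fromhex("c3a1").decode("utf-8")    # U+00E1
combining = bytes.fromhex("61cc81").decode("utf-8")    # U+0061 U+0301

# They are different strings of different lengths...
print(precomposed == combining, len(precomposed), len(combining))

# ...but normalization maps each form onto the other.
nfc = unicodedata.normalize("NFC", combining)    # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose
print(nfc == precomposed, nfd == combining)
```

NFC composes and NFD decomposes; either works for comparison as long as both sides use the same form.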
Ben Bacarisse <ben@bsb.me.uk> wrote:
I don't think any of the symbols I posted can illustrate Rich's point
and even if I had chosen ones that did, I would be surprised if copying
and pasting would trigger normalisation. But let's say it's a
concern... Then a good way to illustrate that point would be a website
that just showed what bytes are pasted into an input field. Having hash
code behind it just obscures the problem. The problem would not lie in
the hash but with the system that's doing the copy and paste (and the
details of how the HTML form is handled server side).
Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
care of that part for me:
For UTF-8, "á", for example, could be any of:
The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually
needs.
Now, proper conforming implementations should not issue "overlong"
encodings, so with the exception of pen-testers, Chris *should* be able
to ignore the "overlong" angle.
But Richard's post shows two different byte strings that produce the identical visual character.
Rich <rich@example.invalid> writes:
Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
care of that part for me:
For UTF-8, "á", for example, could be any of:
The precomposed character: c3a1 - 11000011 10100001,
"a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
or some overlong encoding that takes up more bytes than it actually
needs.
I'm not sure why you are reposting this. My point to Chris was that his
website was not a good way to test if copy and paste is working as he
expects since it does not show the input bytes and just adds an extra level
of complexity to the test.
Now, proper conforming implementations should not issue "overlong"
encodings, so with the exception of pen-testers, Chris *should* be able
to ignore the "overlong" angle.
There are cases where overlong encodings can cause problems and there is
nothing to stop these being generated maliciously so the usual advice is
that programs should check for these on input.
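As a sketch of what such an input check might look like: Python's built-in strict UTF-8 decoder already rejects overlong forms, so a validity test can lean on it. The helper name `is_valid_utf8` and the sample byte strings are assumptions made for this example:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (overlong forms fail)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


shortest = bytes.fromhex("c3a1")            # U+00E1 in its shortest form
overlong_a = bytes.fromhex("c1a1")          # 'a' (U+0061) padded to 2 bytes
overlong_a_acute = bytes.fromhex("e083a1")  # U+00E1 padded to 3 bytes

for sample in (shortest, overlong_a, overlong_a_acute):
    print(sample.hex(), is_valid_utf8(sample))
```

A "hex byte mode" or file-as-key input bypasses text decoding entirely, so a check like this only matters where bytes are being interpreted as UTF-8 text.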
But Richard's post shows two different byte strings that produce the
identical visual character.
Yes. I didn't think this was in any doubt.