• Re: Browser-only HMAC-based toy cipher demo (DrMoron) - now live with URL-encoded ciphertext

    From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Sun Feb 15 11:53:00 2026
    From Newsgroup: sci.crypt

    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1> <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.


    This is not meant for real secrets - just something I built for fun and
    learning.
    Feedback, critique, or curiosity welcome.

    Unfortunately, I don't have enough time to look at your code, but
    I wanted to at least make the above comment.

    thank you and totally fair enough for sure.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Mon Feb 16 22:01:26 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Unicode has plural underlying representations that all produce the same "visible characters" for a surprisingly large number of code points.
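    This can be illustrated in a few lines of browser JavaScript (a minimal
    sketch, not taken from the DrMoron code):

```javascript
// Minimal sketch: the same visible "ä" has two underlying
// representations with different UTF-8 bytes.
const nfc = "\u00E4";   // precomposed LATIN SMALL LETTER A WITH DIAERESIS
const nfd = "a\u0308";  // "a" followed by COMBINING DIAERESIS

console.log(nfc === nfd);                         // false: different code points
console.log([...new TextEncoder().encode(nfc)]);  // [ 195, 164 ]
console.log([...new TextEncoder().encode(nfd)]);  // [ 97, 204, 136 ]

// After NFC normalization the two strings compare equal.
console.log(nfc.normalize("NFC") === nfd.normalize("NFC"));  // true
```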

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 12:21:10 2026
    From Newsgroup: sci.crypt

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d


    Now, let me try encrypting it using the same plaintext for a password:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=b9fd17dc8961db67d635dc20d8374e63ce66521770a30bb7f96e280acaef43d76687a8eb3799393403a47dca1d31f5cecc636ccdec94700b3a364c472ca2b927dea46c75a3ecd1810d5734c15cb700d5f90a106bb3fc0f7f5fdb1b48eec077860102dfbab3e308afafba45113d8f4b7712343d1b608b5b21992c

    Seems to work fine.

    Can you verify it on your end? Thanks.

    https://i.ibb.co/VWWFLt1P/image.png





    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 13:17:46 2026
    From Newsgroup: sci.crypt

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Unicode has plural underlying representations that all produce the same "visible characters" for a surprisingly large number of code points.


    Another test:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ba06cf92dce13bb9d2b5f2fd81f92d1094bce5e88d01800356e27e3823e58ab343adeaa1d919357dbabb66b13b815e5d3418d01fd82a6ae3c20cb54422e6e5923d4cade18f06d7f76b35c8207e2779631c21b3e57637262adc3b8e1b7e6bb07c09d7e7c89d20a203a834bfa80407370bba17db92da96728ef7e6e303cd2cd8

    Plaintext:

    Can you see this?

    +u reA raa reo reu reR Efii Efc#N+A
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Marcel Logen@333200007110-0201@ybtra.de to sci.crypt on Tue Feb 17 23:41:21 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson in sci.crypt:

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything

    Look at this example
    (made in a UTF-8 terminal window (Linux & bash)):

    | $ echo 61 cc 88 0a | xxd -r -p
    | ä

    | $ echo c3 a4 0a | xxd -r -p
    | ä

    The raw byte sequences "0x61cc88" and "0xc3a4" both produce the
    output "ä" (German "a umlaut").

    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Marcel
    --
    Tue Feb 17 23:41:21 2026 CET (1771368081)
    pc-731
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 15:25:10 2026
    From Newsgroup: sci.crypt

    On 2/17/2026 2:41 PM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything

    Look at this example
    (made in a UTF-8 terminal window (Linux & bash)):

    | $ echo 61 cc 88 0a | xxd -r -p
    | ä

    | $ echo c3 a4 0a | xxd -r -p
    | ä

    The raw byte sequences "0x61cc88" and "0xc3a4" both produce the
    output "ä" (German "a umlaut").

    I hope I am not missing something in my code. Wrt a plaintext of:
    ____________
    Composed ä, decomposed ä
    Symbol Å, letter Å
    Omega Ω, Ohm Ω
    ____________

    I get:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=b565723fb62f13d0f92b4dd66223afd80b3f6cdbc3cd0eeab4cec1d2ce37f622bcc635f20ac6f4245c028b0f639432a51fdbd39441fc6e60b334d46199dbd2cf3f5300a3bba9c80fcee5fbe24a730c60d4c951e5fbcbd966f4244b8ef9b4b0d60529c15104b93deb2576e2d07816b9956f0a2c03b55ada9bcac8dcce259ab79978112bd61b5f6c274085a6d1e3

    ____________
    Composed ä, decomposed ä
    Symbol Å, letter Å
    Omega Ω, Ohm Ω
    ____________


    Actually, I need to put in a way for me to enter raw hex bytes as a
    plaintext. That would help the online version in a sense. I bet there is
    a subtle flaw in there. Thanks!



    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    I bet I missed something. It sucks trying to get the plaintext data in as raw bytes. Or I am missing a much easier way.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 15:35:01 2026
    From Newsgroup: sci.crypt

    On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:
    On 2/17/2026 2:41 PM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    [...]
    I think TextEncoder().encode()/decode() helps me out here a bit. Still,
    I need to put in a special plaintext box, or a radio button that treats
    the existing plaintext as raw hex bytes. Any thoughts? Thanks.
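    A sketch of combining TextEncoder with normalization (the
    `passwordBytes` helper is hypothetical, not part of DrMoron):

```javascript
// Hypothetical helper: normalize to NFC before encoding, so visually
// identical passwords yield identical key bytes.
function passwordBytes(pw) {
  return new TextEncoder().encode(pw.normalize("NFC"));
}

// Both forms of "ä" now produce the same bytes: [ 195, 164 ]
console.log([...passwordBytes("\u00E4")]);   // precomposed
console.log([...passwordBytes("a\u0308")]);  // a + combining diaeresis
```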

    [...]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 15:50:18 2026
    From Newsgroup: sci.crypt

    On 2/17/2026 3:35 PM, Chris M. Thomasson wrote:
    On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:
    On 2/17/2026 2:41 PM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    [...]
    I think TextEncoder().encode()/decode() helps me out here a bit. Still,
    I need to put in a special plaintext box, or a radio button that treats
    the existing plaintext as raw hex bytes. Any thoughts? Thanks.

    [...]


    NFC normalization? So, if I copy and paste two different Unicode forms
    from a website that says they are different, normalization can make them
    both use the same code points?
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Tue Feb 17 16:37:04 2026
    From Newsgroup: sci.crypt

    On 2/15/2026 1:20 AM, Chris M. Thomasson wrote:
    I've been working on a small educational cipher experiment called DrMoron.

    [...]

    Hey now. Fwiw, here is a plaintext, between the lines:
    ___________________
    This is a test...

    123 rUi456 Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
    ___________________


    Okay. I run it through my C version and get the hex ciphertext:


    daf3e9d725a004ee2178ad997b6aacb8ef7017fa59c06078f22c8fbcdf04833ba82f5202b81168ef88dd1e7faf5f66ae8d8885637aa5928ea5c1ff64658d938ad8c0b0b72154350dcd766b4aabbffba1d7c7fd9e4b93ce7df9280d4e03e72308cf7f0043aa821311c92ed669dcc7fd65eabd345bd1f852e3304cfbf7244afeda4b98fd91268084a0befae4c8ff3ac9f3443579cd0b1d6bf54b3f37



    I run it through my online version, hit decrypt, and get:
    ___________________
    This is a test...

    123 rUi456 Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e
    ___________________

    YEAH!!!

    That is cool.

    I really need to add in a checkbox for a user to treat the data in the plaintext textarea as raw hex bytes.

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:34:53 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its encrypting.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:39:19 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    I bet there is a subtle flaw in there. Thanks!

    There is, but it is not 'subtle' for anyone who has more than a passing familiarity with Unicode encodings. It's only subtle to those who
    don't have much understanding of Unicode encodings.

    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    I bet I missed something. It sucks trying to get the plaintext data in as raw bytes. Or I am missing a much easier way.

    You missed the example Marcel posted.

    Two very different byte sequences, both that display the exact same
    character.

    So someone that uses that "character" in a password is at the mercy of
    which byte sequence their tool/OS uses when they enter that character.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:41:49 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:
    On 2/17/2026 2:41 PM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    [...]
    I think TextEncoder().encode()/decode() helps me out here a bit. Still,
    I need to put in a special plaintext box, or a radio button that treats
    the existing plaintext as raw hex bytes. Any thoughts? Thanks.

    [...]

    Only if it also applies Unicode normalization in the process.

    And of course, that still leaves you at the mercy of the Unicode
    standards committee changing the manner of normalization three versions
    from now, whereupon things change again.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Wed Feb 18 04:43:18 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/17/2026 3:35 PM, Chris M. Thomasson wrote:
    On 2/17/2026 3:25 PM, Chris M. Thomasson wrote:
    On 2/17/2026 2:41 PM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    [...]
    I think TextEncoder().encode()/decode() helps me out here a bit. Still,
    I need to put in a special plaintext box, or a radio button that treats
    the existing plaintext as raw hex bytes. Any thoughts? Thanks.

    [...]


    NFC normalization? So, if I copy and paste two different Unicode forms
    from a website that says they are different, normalization can make them
    both use the same code points?

    Well, if you mean Near Field Communication, not likely.

    Unicode normalization. Unicode has a whole host of rules for
    normalizing the variant encodings such that one can compare unicode
    strings and determine if they represent the same characters or not.
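    For instance, the Angstrom sign and the letter Å are distinct code
    points that render identically; NFC maps one to the other (a minimal
    sketch, not from the thread's code):

```javascript
// U+212B ANGSTROM SIGN vs U+00C5 LATIN CAPITAL LETTER A WITH RING
// ABOVE: distinct code points, same glyph. NFC maps U+212B to U+00C5.
console.log("\u212B" === "\u00C5");                   // false
console.log("\u212B".normalize("NFC") === "\u00C5");  // true
```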
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Wed Feb 18 12:19:56 2026
    From Newsgroup: sci.crypt

    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings! There are
    normalisation issues, but they are not at all as you present them. I
    think you might be talking about completely different Unicode encodings,
    like UTF-16 vs UTF-8.

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow. Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its encrypting.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    This is not to do with normalisation but, I think, two systems using
    different Unicode encodings. I'm guessing because you don't give two
    actual encodings that can represent a pilcrow.

    Exactly the same problem has been around since the birth of computing.
    If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
    PDP-11 it won't work. It won't even work between Windows and Mac
    machines when using any characters that don't have the same single-byte encodings.
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Richard Harnden@richard.nospam@gmail.invalid to sci.crypt on Wed Feb 18 19:08:29 2026
    From Newsgroup: sci.crypt

    On 18/02/2026 12:19, Ben Bacarisse wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings! There are
    normalisation issues, but they are not at all as you present them. I
    think you might be talking about completely different Unicode encodings,
    like UTF-16 vs UTF-8.

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    This is not to do with normalisation but, I think, two systems using different Unicode encodings. I'm guessing because you don't give two
    actual encodings that can represent a pilcrow.

    Exactly the same problem has been around since the birth of computing.
    If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her PDP-11 it won't work. It won't even work between Windows and Mac
    machines when using any characters that don't have the same single-byte encodings.


    For UTF-8, "á", for example, could be any of:

    The precomposed character: c3a1 - 11000011 10100001,
    "a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
    or some overlong encoding that takes up more bytes than it actually needs.

    I think that's what Rich meant.
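    The two valid forms can be checked with TextDecoder; an overlong
    encoding is invalid UTF-8 and a strict decoder rejects it (a sketch,
    not from the thread's code):

```javascript
// Decode the two valid byte sequences for "á"; a strict decoder throws
// on an overlong encoding such as 0xE0 0x83 0xA1 (U+00E1 packed into
// three bytes instead of the required two).
const dec = new TextDecoder("utf-8", { fatal: true });

console.log(dec.decode(new Uint8Array([0xC3, 0xA1])));        // precomposed á
console.log(dec.decode(new Uint8Array([0x61, 0xCC, 0x81])));  // a + combining acute
try {
  dec.decode(new Uint8Array([0xE0, 0x83, 0xA1]));
} catch (e) {
  console.log("overlong form rejected");
}
```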





    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Wed Feb 18 12:32:21 2026
    From Newsgroup: sci.crypt

    On 2/17/2026 8:34 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.

    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its encrypting.

    Okay.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    BAM! It will not decrypt. I really need to make a checkbox or something
    that denotes "hex byte mode"... This will allow the plaintext and
    password textareas to use hex bytes directly...

    I did it on my local machine; haven't uploaded it to the web yet. I take,
    say, a plaintext treated as hex bytes with the different encodings, hit
    decrypt, and they show as the correct symbols even though the hex
    representations are different. Humm, need to ponder...
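    The proposed "hex byte mode" might look something like this (a
    hypothetical helper, not the actual DrMoron code):

```javascript
// Hypothetical "raw hex bytes" mode: parse the textarea contents as
// hex pairs instead of UTF-8 encoding the text.
function hexToBytes(hex) {
  const clean = hex.replace(/\s+/g, "");  // allow spaces and newlines
  if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) {
    throw new Error("not a valid hex string");
  }
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

// Marcel's decomposed "ä": 0x61 0xCC 0x88
console.log([...hexToBytes("61 cc 88")]);  // [ 97, 204, 136 ]
```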


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Wed Feb 18 12:36:58 2026
    From Newsgroup: sci.crypt

    On 2/17/2026 8:39 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    I bet there is a subtle flaw in there. Thanks!

    There is, but it is not 'subtle' for anyone who has more than a passing familiarity with Unicode encodings. It's only subtle to those who
    don't have much understanding of Unicode encodings.

    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    I bet I missed something. It sucks trying to get the plaintext data in
    as raw bytes. Or I am missing a much easier way.

    You missed the example Marcel posted.

    Two very different byte sequences, both that display the exact same character.

    That would nail the password.


    So someone that uses that "character" in a password is at the mercy of
    which byte sequence their tool/OS uses when they enter that character.

    Yup. Need a raw hex byte mode... :^)
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Wed Feb 18 22:57:28 2026
    From Newsgroup: sci.crypt

    Richard Harnden <richard.nospam@gmail.invalid> writes:

    On 18/02/2026 12:19, Ben Bacarisse wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings! There are
    normalisation issues, but they are not at all as you present them. I
    think you might be talking about completely different Unicode encodings,
    like UTF-16 vs UTF-8.

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.
    This is not to do with normalisation but, I think, two systems using
    different Unicode encodings. I'm guessing because you don't give two
    actual encodings that can represent a pilcrow.
    Exactly the same problem has been around since the birth of computing.
    If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
    PDP-11 it won't work. It won't even work between Windows and Mac
    machines when using any characters that don't have the same single-byte
    encodings.
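
    Ben's EBCDIC example is easy to reproduce today; Python happens to ship a
    'cp500' codec (one EBCDIC variant), so the same three visible letters
    demonstrably become different bytes (a sketch, nothing from the demo):

```python
# "IBM" under ASCII vs EBCDIC (cp500): same visible characters,
# different bytes -- the same failure mode as mismatched Unicode
# encodings, just several decades older.
ascii_bytes = "IBM".encode("ascii")   # 49 42 4d
ebcdic_bytes = "IBM".encode("cp500")  # c9 c2 d4

print(ascii_bytes.hex())   # 49424d
print(ebcdic_bytes.hex())  # c9c2d4
assert ascii_bytes != ebcdic_bytes
```

    Any cipher keyed on "the characters of the password" inherits whichever
    of these byte strings the local system happens to produce.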


    For UTF-8, "á", for example, could be any of:

    The precomposed character: c3a1 - 11000011 10100001,
    "a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
    or some overlong encoding that takes up more bytes than it actually needs.

    I think that's what Rich meant.

    My first thought, but then why would pilcrow have different encodings
    between Windows and Mac? That's an entirely different scenario. That's
    why I would have preferred an example, not a couple of made-up sequences
    that are not Unicode encodings at all!

    BTW, all programs should reject over-long encodings in input and none
    should ever generate any as output so, again, it does not apply to the
    Windows/Mac scenario. A system that generated one when the user is
    typing a password is seriously broken.
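
    That rejection is easy to check with stock Python, whose strict UTF-8
    decoder refuses overlong forms (illustrative sketch, nothing
    demo-specific):

```python
# Overlong UTF-8: '/' (0x2F) illegally stretched to two bytes, C0 AF.
# A conforming decoder must reject it; Python's strict UTF-8 codec does.
overlong_slash = b"\xc0\xaf"
try:
    overlong_slash.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
assert rejected  # the overlong form never reaches the password hasher
```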
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:36:43 2026
    From Newsgroup: sci.crypt

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings!

    I wasn't going to put in the effort to actually do the research to find
    a real one (plus another poster already provided a real one).

    There are normalisation issues, but they are not at all as you
    present them. I think you might be talking about completely
    different Unicode encodings, like UTF-16 vs UTF-8.

    Nope. For most of the accented characters (i.e., something like the a
    with umlaut) there is both a code point that is directly "a with umlaut"
    and there are combining characters (a combining umlaut) that can be
    combined with a standard "letter a" code point, to also make "a with
    umlaut". But the "code point" version will be a different byte
    sequence (in whatever UTF you pick) than the "combining character plus
    non-accented letter" version.
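
    The precomposed-versus-combining distinction is easy to demonstrate with
    Python's unicodedata module (a sketch; in browser JavaScript,
    String.prototype.normalize plays the same role):

```python
import unicodedata

# Two canonically equivalent spellings of "a-acute": precomposed U+00E1
# versus "a" + combining acute U+0301. They render identically but
# encode to different UTF-8 byte strings.
precomposed = "\u00e1"
combining = "a\u0301"

print(precomposed.encode("utf-8").hex())  # c3a1
print(combining.encode("utf-8").hex())    # 61cc81

# Normalizing both to NFC before key derivation makes them agree:
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", combining)
assert nfc_a == nfc_b
```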

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does it's
    encrypting.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    This is not to do with normalisation but, I think, two systems using
    different Unicode encodings.

    That can also bite him too, but often when someone starts talking about
    "unicode passwords" it is because their native language uses accented
    characters, and they would like something like pässword (that's
    password, but with a-umlaut instead of a). But without realizing there
    are two different sets of codepoints that will create ä and at least
    about three different ways to encode those code points to bytes.

    I'm guessing because you don't give two actual encodings that can
    represent a pilcrow.

    The whole point of "made up" is it was all "made up", because, frankly,
    I wasn't going to bother putting in the effort to find the two
    different encodings.

    Exactly the same problem has been around since the birth of computing.
    If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
    PDP-11 it won't work. It won't even work between Windows and Mac
    machines when using any characters that don't have the same single-byte
    encodings.

    Yep, which is what Chris was missing when he started talking about
    using "unicode passwords".
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:39:03 2026
    From Newsgroup: sci.crypt

    Richard Harnden <richard.nospam@gmail.invalid> wrote:
    On 18/02/2026 12:19, Ben Bacarisse wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings! There are
    normalisation issues, but they are not at all as you present them. I
    think you might be talking about completely different Unicode encodings,
    like UTF-16 vs UTF-8.

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    This is not to do with normalisation but, I think, two systems using
    different Unicode encodings. I'm guessing because you don't give two
    actual encodings that can represent a pilcrow.

    Exactly the same problem has been around since the birth of computing.
    If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
    PDP-11 it won't work. It won't even work between Windows and Mac
    machines when using any characters that don't have the same single-byte
    encodings.


    For UTF-8, "á", for example, could be any of:

    The precomposed character: c3a1 - 11000011 10100001,
    "a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
    or some overlong encoding that takes up more bytes than it actually needs.

    Technically a spec. violation, as the spec. requires the shortest
    encoding be used. So a proper system shouldn't overlong encode, but
    someone "trying to find edge conditions" just might ship in an overlong
    encoding to see what happens.

    I think that's what Rich meant.

    Yep, exactly what I meant.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:43:39 2026
    From Newsgroup: sci.crypt

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Richard Harnden <richard.nospam@gmail.invalid> writes:

    On 18/02/2026 12:19, Ben Bacarisse wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings! There are
    normalisation issues, but they are not at all as you present them. I
    think you might be talking about completely different Unicode encodings,
    like UTF-16 vs UTF-8.

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.
    This is not to do with normalisation but, I think, two systems using
    different Unicode encodings. I'm guessing because you don't give two
    actual encodings that can represent a pilcrow.
    Exactly the same problem has been around since the birth of computing.
    If my password is "IBM" in EBCDIC and my DEC buddy types "IBM" into her
    PDP-11 it won't work. It won't even work between Windows and Mac
    machines when using any characters that don't have the same single-byte
    encodings.


    For UTF-8, "á", for example, could be any of:

    The precomposed character: c3a1 - 11000011 10100001,
    "a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
    or some overlong encoding that takes up more bytes than it actually needs.
    I think that's what Rich meant.

    My first thought, but then why would pilcrow have different encodings
    between Windows and Mac?

    You are still being "too literal". "Made up" means everything from my
    example was "made up", including the fact that I said "pilcrow" (I
    wanted a character that wasn't a standard ascii character, and that was
    the first "name" that came to mind).

    But "made up" means none of it was actual fact, it was just to try to
    get Chris to understand that the same characters on screen (or in his
    password entry box) could become very different byte sequences
    depending upon what the user program, its libraries, and/or the OS
    might choose to do.

    That's an entirely different scenario. That's why I would have
    preferred an example, not a couple of made-up sequences that are not
    Unicode encodings at all!

    That's fair, but I was not going to bother doing the leg work to work
    out the example.

    BTW, all programs should reject over-long encodings in input and none
    should ever generate any as output so, again, it does not apply to the
    Windows/Mac scenario. A system that generated one when the user is
    typing a password is seriously broken.

    Agreed. So Chris should not need to worry about 'overlong' sequences.
    Except when a pen-tester is attacking his project, and then the
    pen-tester might just inject overlong sequences to see what happens.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Thu Feb 19 03:45:40 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/17/2026 8:34 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    Okay.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    BAM! It will not decrypt. I really need to make a checkbox or something
    that denotes "hex byte mode"... This will allow plaintext and password
    textareas to use hex bytes directly...

    I did it on my local machine, not uploaded it to the web yet... I have,
    say, a plaintext treated as hex bytes with the different encodings, hit
    decrypt, and they show as the correct symbols even though the hex
    representations are different. Humm, need to ponder...

    If in the end you want the ability of a user to be able to enter any
    string of binary bytes then some form of "binary input" (hex, base64,
    base85, etc.) will be needed if you expect them to type them in.

    If you let them upload a file as the "key" then you don't need a hex
    (or other) input mode.
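
    The file-as-key idea fits in a few lines; in the browser demo it would be
    a FileReader/ArrayBuffer, but the principle is the same (illustrative
    Python, made-up function name):

```python
from pathlib import Path

# Using a file's raw bytes as the key sidesteps text encoding entirely:
# no normalization forms, no UTF variants, just the bytes on disk.
def key_from_file(path: str) -> bytes:
    return Path(path).read_bytes()  # no decoding step, so no ambiguity
```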
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Thu Feb 19 11:52:23 2026
    From Newsgroup: sci.crypt

    Rich <rich@example.invalid> writes:

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings!

    I wasn't going to put in the effort to actually do the research to find
    a real one (plus another poster already provided a real one).

    There are normalisation issues, but they are not at all as you
    present them. I think you might be talking about completely
    different Unicode encodings, like UTF-16 vs UTF-8.

    Nope. For most of the accented characters (i.e., something like the a
    with umlaut) there is both a code point that is directly "a with umlaut"
    and there are combining characters (a combining umlaut) that can be
    combined with a standard "letter a" code point, to also make "a with
    umlaut". But the "code point" version will be a different byte
    sequence (in whatever UTF you pick) than the "combining character plus
    non-accented letter" version.

    When I started reading your post that's what I thought you might be
    getting at but your example -- pilcrow -- put a stop to that. Surely you
    could not be talking about combining characters and diacriticals given
    that example. So in trying to make sense of your example I decided you
    must be talking about different encodings.

    Anyway, I should have stuck with that thought despite the example. Even
    so, I would probably still have posted a "different encodings" remark as
    that is just as much a problem (though one that is reducing over time)
    as Unicode equivalence and compatibility.
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Thu Feb 19 12:36:26 2026
    From Newsgroup: sci.crypt

    On 2/18/2026 7:45 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/17/2026 8:34 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    Okay.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    BAM! It will not decrypt. I really need to make a checkbox or something
    that denotes "hex byte mode"... This will allow plaintext and password
    textareas to use hex bytes directly...

    I did it on my local machine, not uploaded it to the web yet... I have,
    say a plaintext treated as hex bytes with the different encodings, hit
    decrypt, and they show as the correct symbols even though the hex
    representations are different. Humm, need to ponder...

    If in the end you want the ability of a user to be able to enter any
    string of binary bytes then some form of "binary input" (hex, base64,
    base85, etc.) will be needed if you expect them to type them in.

    If you let them upload a file as the "key" then you don't need a hex
    (or other) input mode.

    I was thinking about that. It would not be "uploaded" to the server
    because my code is client only. It would need to allow the user to look
    for, I guess, any file. Then use its contents as a Password, raw bytes.
    Should work. At least I should put in a warning about using the password
    with unicode... Thanks! :^)

    Password with unicode = can of oh shit's!, not just a can of worms?
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Fri Feb 20 04:05:19 2026
    From Newsgroup: sci.crypt

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings!

    I wasn't going to put in the effort to actually do the research to find
    a real one (plus another poster already provided a real one).

    There are normalisation issues, but they are not at all as you
    present them. I think you might be talking about completely
    different Unicode encodings, like UTF-16 vs UTF-8.

    Nope. For most of the accented characters (i.e., something like the a
    with umlaut) there is both a code point that is directly "a with umlaut"
    and there are combining characters (a combining umlaut) that can be
    combined with a standard "letter a" code point, to also make "a with
    umlaut". But the "code point" version will be a different byte
    sequence (in whatever UTF you pick) than the "combining character plus
    non-accented letter" version.
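
    Rich's point can be checked directly. A minimal Python sketch (standard
    library only; not part of the DrMoron code, just an illustration):

```python
import unicodedata

# Precomposed form: U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS)
precomposed = "\u00e4"
# Decomposed form: U+0061 (LETTER A) + U+0308 (COMBINING DIAERESIS)
decomposed = "a\u0308"

# Both display as the same accented "a", yet they are different
# code-point sequences and different UTF-8 byte strings.
print(precomposed == decomposed)          # False
print(precomposed.encode("utf-8").hex())  # c3a4
print(decomposed.encode("utf-8").hex())   # 61cc88

# NFC normalization maps the decomposed form onto the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

    Any byte-exact hash keyed on those two inputs will disagree unless the
    code normalizes first.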

    When I started reading your post that's what I thought you might be
    getting at but your example -- pilcrow -- put a stop to that. Surely you
    could not be talking about combining characters and diacriticals given
    that example. So in trying to make sense of your example I decided you
    must be talking about different encodings.

    That's a fair criticism. I should have used an accented char as the
    fake character. And yes, there is the additional 'encodings' issue
    too.

    Anyway, I should have stuck with that thought despite the example. Even
    so, I would probably still have posted a "different encodings" remark as
    that is just as much a problem (though one that is reducing over time)
    as Unicode equivalence and compatibility.

    Yes, I did not think to mention that. At the same time, it was aimed
    at Chris, and combining a "different sets of codepoints represent same character" and "different UTF encodings convert same code points to
    different byte strings" would likely have sent him off on an unrelated tangent.

    But you are correct, UTF-8 has almost taken over, but the other
    encodings do still show up occasionally, and a different encoding will
    cause his "Unicode" passwords to also fail.
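
    The encoding point is just as easy to demonstrate: the same code point
    serialized under different UTFs gives different byte strings, so a
    byte-exact hash disagrees with itself. A small sketch (hashlib.sha256
    here is only a stand-in for whatever hash the demo uses):

```python
import hashlib

pilcrow = "\u00b6"  # PILCROW SIGN, a single code point

# Same character, three encodings, three different byte strings.
print(pilcrow.encode("utf-8").hex())      # c2b6
print(pilcrow.encode("utf-16-le").hex())  # b600
print(pilcrow.encode("utf-32-le").hex())  # b6000000

# A byte-exact hash therefore differs across encodings.
print(hashlib.sha256(pilcrow.encode("utf-8")).hexdigest()
      == hashlib.sha256(pilcrow.encode("utf-16-le")).hexdigest())  # False
```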
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Fri Feb 20 04:08:20 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/18/2026 7:45 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/17/2026 8:34 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything.


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    Okay.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    BAM! It will not decrypt. I really need to make a checkbox or something
    that denotes "hex byte mode"... This will allow plaintext and password
    textarea's to use hex bytes directly...

    I did it on my local machine, not uploaded it to the web yet... I have,
    say a plaintext treated as hex bytes with the different encodings, hit
    decrypt, and they show as the correct symbols even though the hex
    representations are different. Humm, need to ponder...

    If in the end you want the ability of a user to be able to enter any
    string of binary bytes then some form of "binary input" (hex, base64,
    base85, etc.) will be needed if you expect them to type them in.

    If you let them upload a file as the "key" then you don't need a hex
    (or other) input mode.
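
    Rich's "binary input" idea can be sketched in a few lines. This is a
    hypothetical hex-mode helper (none of these names come from the DrMoron
    code), shown in Python rather than the demo's JavaScript for brevity:

```python
import binascii

def password_to_key_bytes(text: str, hex_mode: bool) -> bytes:
    """Turn the contents of a password field into raw key bytes."""
    if hex_mode:
        # In hex mode the user types the exact bytes ("c2b6" -> b'\xc2\xb6'),
        # sidestepping any Unicode representation ambiguity entirely.
        compact = "".join(text.split())  # tolerate spaces between byte pairs
        return binascii.unhexlify(compact)
    # Otherwise use the UTF-8 encoding of whatever the platform supplied.
    return text.encode("utf-8")

print(password_to_key_bytes("c2 b6", hex_mode=True))    # b'\xc2\xb6'
print(password_to_key_bytes("\u00b6", hex_mode=False))  # b'\xc2\xb6'
```

    A file picker read client-side achieves the same thing without typing,
    as Rich notes.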

    I was thinking about that. It would not be "uploaded" to the server
    because my code is client only. It would need to allow the user to look
    for, I guess, any file. Then use its contents as a Password, raw bytes. Should work. At least I should put in a warning about using the password with unicode... Thanks! :^)

    Password with unicode = can of oh shit's!, not just a can of worms?

    Unicode passwords bring the trouble of "looks the same on screen" but
    might be "very different set of bytes on the wire". And your hashes
    only see the "bytes on the wire".
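
    The "bytes on the wire" framing suggests the standard fix: normalize
    before keying. A sketch using HMAC-SHA256 as a stand-in for the demo's
    hasher (the thread does not specify its internals):

```python
import hashlib
import hmac
import unicodedata

MSG = b"some plaintext"

def mac_raw(password: str) -> str:
    # Keys the MAC with whatever bytes the platform handed us.
    return hmac.new(password.encode("utf-8"), MSG, hashlib.sha256).hexdigest()

def mac_normalized(password: str) -> str:
    # Normalizes to NFC first, so equivalent code-point sequences agree.
    nfc = unicodedata.normalize("NFC", password)
    return hmac.new(nfc.encode("utf-8"), MSG, hashlib.sha256).hexdigest()

precomposed = "\u00e4"  # "a with umlaut" as a single code point
decomposed = "a\u0308"  # the same visual symbol as letter + combining mark

print(mac_raw(precomposed) == mac_raw(decomposed))                # False
print(mac_normalized(precomposed) == mac_normalized(decomposed))  # True
```

    Normalization does not solve the cross-encoding problem, though; fixing
    one encoding (UTF-8) before hashing handles that part.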
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Fri Feb 20 13:07:40 2026
    From Newsgroup: sci.crypt

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?

    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer
    passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Fri Feb 20 21:01:09 2026
    From Newsgroup: sci.crypt

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?
    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native
    language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer
    passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

    I don't want to copy and post stuff into a link. What purpose does that
    serve? You know what your code does so you know what will happen for a particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work, might not??? Perhaps? Humm... ;^o

    The trick to knowing if some software will do what you expect is to
    understand the code, the inputs and the outputs. Copying and pasting
    text won't teach you much about any of these things.
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 12:52:46 2026
    From Newsgroup: sci.crypt

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?

    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native language. It can help people choose longer ones while still remembering them. Also Unicode can increase the entropy without needing longer passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#"
    as a password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

    I can from here, but can anybody else?

    https://i.ibb.co/v4VG3k6K/image.png

    thanks Ben. :^)

    I still think it scary because of what I read in this thread. Might
    work, might not??? Perhaps? Humm... ;^o

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 13:08:14 2026
    From Newsgroup: sci.crypt

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?
    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native
    language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer
    passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7

    I don't want to copy and post stuff into a link. What purpose does that
    serve? You know what your code does so you know what will happen for a
    particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o

    The trick to knowing if some software will do what you expect is to understand the code, the inputs and the outputs. Copying and pasting
    text won't teach you much about any of these things.


    Well, if you are on a different system and the copy and paste might give different bytes for the same visual symbols, that would murder the
    password and you could not decrypt... Right?
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Fri Feb 20 22:03:54 2026
    From Newsgroup: sci.crypt

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?
    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native
    language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer
    passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
    I don't want to copy and post stuff into a link. What purpose does that
    serve? You know what your code does so you know what will happen for a
    particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o
    The trick to knowing if some software will do what you expect is to
    understand the code, the inputs and the outputs. Copying and pasting
    text won't teach you much about any of these things.

    Well, if you are on a different system and the copy and paste might give different bytes for the same visual symbols, that would murder the password and you could not decrypt... Right?

    Yes it would. But so what? If your code handles the input correctly,
    why do you care if I can paste something into a website correctly?
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 14:07:28 2026
    From Newsgroup: sci.crypt

    On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?
    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native
    language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer
    passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
    I don't want to copy and post stuff into a link. What purpose does that
    serve? You know what your code does so you know what will happen for a
    particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o
    The trick to knowing if some software will do what you expect is to
    understand the code, the inputs and the outputs. Copying and pasting
    text won't teach you much about any of these things.

    Well, if you are on a different system and the copy and paste might give
    different bytes for the same visual symbols, that would murder the password
    and you could not decrypt... Right?

    Yes it would. But so what? If your code handles the input correctly,
    why do you care if I can paste something into a website correctly?


    I wanted to see if you can post those visual symbols as a password and
    make it not decrypt the plaintext correctly. It would give an example of
    the potential problem we are discussing here? A la Rich and Marcel's
    point? My hash would not work if it's not bit exact...
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Sat Feb 21 00:01:42 2026
    From Newsgroup: sci.crypt

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?
    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native
    language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer
    passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
    I don't want to copy and post stuff into a link. What purpose does that
    serve? You know what your code does so you know what will happen for a
    particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o
    The trick to knowing if some software will do what you expect is to
    understand the code, the inputs and the outputs. Copying and pasting
    text won't teach you much about any of these things.

    Well, if you are on a different system and the copy and paste might give
    different bytes for the same visual symbols, that would murder the password
    and you could not decrypt... Right?
    Yes it would. But so what? If your code handles the input correctly,
    why do you care if I can paste something into a website correctly?

    I wanted to see if you can post those visual symbols as a password and
    make it not decrypt the plaintext correctly. It would give an example of
    the potential problem we are discussing here? A la Rich and Marcel's
    point? My hash would not work if it's not bit exact...

    I don't think any of the symbols I posted can illustrate Rich's point
    and even if I had chosen ones that did, I would be surprised if copying
    and pasting would trigger normalisation. But let's say it's a
    concern... Then a good way to illustrate that point would be a website
    that just showed what bytes are pasted into an input field. Having hash
    code behind it just obscures the problem. The problem would not lie in
    the hash but with the system that's doing the copy and paste (and the
    details of how the HTML form is handled server side).
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 21:05:48 2026
    From Newsgroup: sci.crypt

    On 2/19/2026 8:05 PM, Rich wrote:
    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything.


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings!

    I wasn't going to put in the effort to actually do the research to find
    a real one (plus another poster already provided a real one).

    There are normalisation issues, but they are not at all as you
    present them. I think you might be talking about completely
    different Unicode encodings, like UTF-16 vs UTF-8.

    Nope. For most of the accented characters (i.e., something like the a
    with umlaut" there is both a code point that is directly "a with umlaut"
    and there are combining characters (a combining umlaut) that can be
    combined with a standard "letter a" code point, to also make "a with
    umlaut". But the "code point" version will be a different byte
    sequence (in whatever UTF you pick) than the "combining character plus
    non-accented letter" version.

    When I started reading your post that's what I thought you might be
    getting at but your example -- pilcrow -- put a stop to that. Surely you
    could not be talking about combining characters and diacriticals given
    that example. So in trying to make sense of your example I decided you
    must be talking about different encodings.

    That's a fair criticism. I should have used an accented char as the
    fake character. And yes, there is the additional 'encodings' issue
    too.

    Anyway, I should have stuck with that thought despite the example. Even
    so, I would probably still have posted a "different encodings" remark as
    that is just as much a problem (though one that is reducing over time)
    as Unicode equivalence and compatibility.

    Yes, I did not think to mention that. At the same time, it was aimed
    at Chris, and combining a "different sets of codepoints represent same character" and "different UTF encodings convert same code points to
    different byte strings" would likely have sent him off on an unrelated tangent.

    But you are correct, UTF-8 has almost taken over, but the other
    encodings do still show up occasionally, and a different encoding will
    cause his "Unicode" passwords to also fail.

    Agreed.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 21:54:22 2026
    From Newsgroup: sci.crypt

    On 2/19/2026 8:08 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/18/2026 7:45 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/17/2026 8:34 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>.
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything.


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.

    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    0x80 0x55 0x54

    might display a Pilcrow symbol

    And this byte sequence:

    0x81 0x23 0x88 0x23

    might *also* display as a Pilcrow symbol

    So, you, on your Windows machine, enter a pilcrow as part of your
    password. Windows outputs 0x80 0x55 0x54 as the encoding of a Pilcrow.
    Your "fancy hasher" sees 0x80 0x55 0x54 as the password and does its
    encrypting.

    Okay.

    You send the result to a buddy using a Mac, and you tell him the
    password is a Pilcrow symbol.

    They enter a Pilcrow symbol however one would do so on a Mac, but MacOS
    outputs the byte sequence 0x81 0x23 0x88 0x23 for the Pilcrow. Your
    fancy hasher encrypter sees 0x81 0x23 0x88 0x23 as the password, and
    returns "line noise" because 0x81 0x23 0x88 0x23 was not the same
    password as you used to encrypt.

    BAM! It will not decrypt. I really need to make a checkbox or something
    that denotes "hex byte mode"... This will allow plaintext and password
    textarea's to use hex bytes directly...

    I did it on my local machine, not uploaded it to the web yet... I have,
    say a plaintext treated as hex bytes with the different encodings, hit
    decrypt, and they show as the correct symbols even though the hex
    representations are different. Humm, need to ponder...

    If in the end you want the ability of a user to be able to enter any
    string of binary bytes then some form of "binary input" (hex, base64,
    base85, etc.) will be needed if you expect them to type them in.

    If you let them upload a file as the "key" then you don't need a hex
    (or other) input mode.

    I was thinking about that. It would not be "uploaded" to the server
    because my code is client only. It would need to allow the user to look
    for, I guess, any file. Then use its contents as a Password, raw bytes.
    Should work. At least I should put in a warning about using the password
    with unicode... Thanks! :^)

    Password with unicode = can of oh shit's!, not just a can of worms?

    Unicode passwords bring the trouble of "looks the same on screen" but
    might be "very different set of bytes on the wire". And your hashes
    only see the "bytes on the wire".

    bingo. Tango! Delta. Niner! Over and out.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 22:25:24 2026
    From Newsgroup: sci.crypt

    On 2/15/2026 1:20 AM, Chris M. Thomasson wrote:
    [...]

    Wrt the online version:

    Two bags of bytes on the local system (files), nothing sent to the
    server. One bag of raw bytes for the password, and another one for the plaintext. I can do it. It's been a while since I blew through the
    javascript trees... ;^D rofl.

    Using unicode for the password was fucking moronic. Sorry. DrMoron? wow...


    Sorry.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Fri Feb 20 22:41:50 2026
    From Newsgroup: sci.crypt

    On 2/20/2026 4:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms?
    They introduce some difficulties, but there are advantages as well.
    People want to use passwords and pass phrases that are in their native
    language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer
    passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
    I don't want to copy an post stuff into a link. What purpose does that >>>>> serve? You know what your code does so you know what will happen for a >>>>> particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o
    The trick to knowing if some software will do what you expect is to
    understand the code, the inputs and the outputs. Copying and pasting >>>>> text won't teach you much about any of these things.

    Well, if you are on a different system and the copy and paste might give >>>> different bytes for the same visual symbols, that would murder the password
    and you could not decrypt... Right?
    Yes it would. But so what? If your code handles the input correctly,
    why do you care if I can paste something into a website correctly?

    I wanted to see if you can post those visual symbols as a password and make >> it not decrypt the plaintext correctly. It would give an example of the
    potential problem we are discussing here? Ala, Rich and Marcel 's point? My >> hash would not work if its not bit exact...

    I don't think any of the symbols I posted can illustrate Rich's point
    and even if I had chosen ones that did, I would be surprised if copying
    and pasting would trigger normalisation. But let's say it's a
    concern... Then a good way to illustrate that point would be a website
    that just showed what bytes are pasted into an input field. Having hash
    code behind it just obscures the problem. The problem would not lie in
    the hash but with the system that's doing the copy and paste (and the
    details of how the HTML form is handled server side).


    Using unicode for a password is scary. DrMoron? Yikes!

    https://youtu.be/q3qDESAvzh0?list=RDRijB8wnJCN0
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Sat Feb 21 18:43:05 2026
    From Newsgroup: sci.crypt

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/19/2026 8:05 PM, Rich wrote:
    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    Rich <rich@example.invalid> writes:

    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/16/2026 2:01 PM, Rich wrote:
    Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
    On 2/15/2026 3:54 AM, Marcel Logen wrote:
    Chris M. Thomasson in sci.crypt:

    Features:
    [...]
    arbitrary Unicode passwords supported

    BTW: Consider to read <https://www.unicode.org/faq/normalization>. >>>>>>>>>>
    <https://www.unicode.org/faq/normalization#1>
    <https://www.unicode.org/faq/normalization#2>

    I "think" my code handles it okay... Humm... thanks.

    Unless you are actively "normalizing" per the Unicode Consortium
    recommendation, then it is nearly a 100% chance it *does not* handle it
    ok.

    Well, I am trying to get the password, plaintext to convert to raw bytes
    for my algo to handle. Of course my C version handles anything


    Unicode has plural underlying representations that all produce the same
    "visible characters" for a surprisingly large number of code points.
    Let me try a plaintext with some unicode that encrypts using the default
    key.

    This seems to work:

    Plaintext (Efiiriy ) pUopeopU2pUipU> -f-C-+-#-|-e

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=47426a0c48a31d3264f6ddef8fe52875fb7d88bf7353d1716f73341247184f533a9f3c076f89b9db54bf4b90487aed4c669b4f84983a87dc6b87239d09564a3363db3bc9c1dccfdbae2472e219cc8f8fc8582e24fa16467fa4649f9b52d89ee240b8ce1f18501c5bfa37c95347cbec0940f949f9fc92c5c6c12d

    Of course, because you only have one "password".

    You missed the entire point:

    Unicode has plural underlying representations that all produce the
    same "visible characters" for a surprisingly large number of code
    points.

    So (all made up as I'm not going to bother digging into Unicode to drag
    out the actual byte sequences) the byte sequence:

    I don't think it's helpful to use made-up encodings!

    I wasn't going to put in the effort to actually do the research to find
    a real one (plus another poster already provided a real one).

    There are normalisation issues, but they are not at all as you
    present them. I think you might be talking about completely
    different Unicode encodings, like UTF-16 vs UTF-8.

    Nope. For most of the accented characters (i.e., something like the a
    with umlaut) there is both a code point that is directly "a with umlaut"
    and there are combining characters (a combining umlaut) that can be
    combined with a standard "letter a" code point, to also make "a with
    umlaut". But the "code point" version will be a different byte
    sequence (in whatever UTF you pick) than the "combining character plus
    non-accented letter" version.

    When I started reading your post that's what I thought you might be
    getting at but your example -- pilcrow -- put a stop to that. Surely you
    could not be talking about combining characters and diacriticals given
    that example. So in trying to make sense of your example I decided you
    must be talking about different encodings.

    That's a fair criticism. I should have used an accented char as the
    fake character. And yes, there is the additional 'encodings' issue
    too.

    Anyway, I should have stuck with that thought despite the example. Even
    so, I would probably still have posted a "different encodings" remark as
    that is just as much a problem (though one that is reducing over time)
    as Unicode equivalence and compatibility.

    Yes, I did not think to mention that. At the same time, it was aimed
    at Chris, and combining a "different sets of codepoints represent same
    character" and "different UTF encodings convert same code points to
    different byte strings" would likely have sent him off on an unrelated
    tangent.

    But you are correct, UTF-8 has almost taken over, but the other
    encodings do still show up occasionally, and a different encoding will
    cause his "Unicode" passwords to also fail.

    Agreed.

    Thing is, provided you have some way to know which "encoding" is being provided, you can convert those differing "encodings" to a single
    common one (i.e., if you get UTF-16 encoding, you can convert it to
    UTF-8 if your input is expected to be UTF-8).

    That is more straightforward than proper normalization (the Unicode spec
    doc for normalizing is rather lengthy).

    Reality is, if you want to support "unicode passwords" you do really
    need to do both. Convert the supplied encoding to the encoding your
    'hasher' accepts, *and* perform normalization on the code points within
    the "password" you are provided.

    So you might want to look for some library code that would handle both
    for you.
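
    Rich's two steps map directly onto browser built-ins: `String.prototype.normalize("NFC")` for the code-point normalization, and `TextEncoder` to pin the encoding to UTF-8. A minimal sketch, assuming a browser or Node context (`passwordToBytes` is a made-up name, not from the thread):

```javascript
// Sketch only: normalize to NFC, then encode as UTF-8 bytes,
// before the bytes ever reach a hash. Both spellings of "á"
// come out as the same byte sequence.
function passwordToBytes(password) {
  // NFC collapses "a" + combining acute (U+0061 U+0301) into
  // the precomposed "á" (U+00E1).
  const normalized = password.normalize("NFC");
  // TextEncoder always emits UTF-8, settling the encoding question.
  return new TextEncoder().encode(normalized);
}

console.log(passwordToBytes("\u00E1"));  // Uint8Array of [0xc3, 0xa1]
console.log(passwordToBytes("a\u0301")); // same bytes after NFC
```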
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to sci.crypt on Sat Feb 21 18:46:32 2026
    From Newsgroup: sci.crypt

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms? >>>>>>> They introduce some difficulties, but there are advantages as well. >>>>>>> People want to use passwords and pass phrases that are in their native >>>>>>> language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer >>>>>>> passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and >>>>>>> lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in >>>>>> the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
    I don't want to copy an post stuff into a link. What purpose does that >>>>> serve? You know what your code does so you know what will happen for a >>>>> particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o
    The trick to knowing if some software will do what you expect is to
    understand the code, the inputs and the outputs. Copying and pasting >>>>> text won't teach you much about any of these things.

    Well, if you are on a different system and the copy and paste might give >>>> different bytes for the same visual symbols, that would murder the password
    and you could not decrypt... Right?
    Yes it would. But so what? If your code handles the input correctly,
    why do you care if I can paste something into a website correctly?

    I wanted to see if you can post those visual symbols as a password and make >> it not decrypt the plaintext correctly. It would give an example of the
    potential problem we are discussing here? Ala, Rich and Marcel 's point? My >> hash would not work if its not bit exact...

    I don't think any of the symbols I posted can illustrate Rich's point
    and even if I had chosen ones that did, I would be surprised if copying
    and pasting would trigger normalisation. But let's say it's a
    concern... Then a good way to illustrate that point would be a website
    that just showed what bytes are pasted into an input field. Having hash
    code behind it just obscures the problem. The problem would not lie in
    the hash but with the system that's doing the copy and paste (and the
    details of how the HTML form is handled server side).

    Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
    care of that part for me:

    For UTF-8, "á", for example, could be any of:

    The precomposed character: c3a1 - 11000011 10100001,
    "a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
    or some overlong encoding that takes up more bytes than it actually needs.

    Now, proper conforming implementations should not issue "overlong"
    encodings, so with the exception of pen-testers, Chris *should* be able
    to ignore the "overlong" angle.

    But Richard's post shows two different byte strings that produce the
    identical visual character.
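
    Those two byte strings are easy to check directly. A sketch of my own (not from Richard's post), printing the UTF-8 hex of each spelling and showing that NFC normalisation collapses them:

```javascript
// The precomposed and combining spellings of "á" differ as raw
// UTF-8 bytes, so an exact-match hash treats them as different
// passwords; NFC normalization makes them the same string.
const hex = (s) =>
  Array.from(new TextEncoder().encode(s))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join(" ");

const precomposed = "\u00E1"; // á as one code point
const combining = "a\u0301";  // "a" plus combining acute

console.log(hex(precomposed)); // "c3 a1"
console.log(hex(combining));   // "61 cc 81"
console.log(precomposed === combining); // false: raw strings differ
console.log(precomposed.normalize("NFC") === combining.normalize("NFC")); // true
```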

    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Ben Bacarisse@ben@bsb.me.uk to sci.crypt on Sun Feb 22 22:34:09 2026
    From Newsgroup: sci.crypt

    Rich <rich@example.invalid> writes:

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    Password with unicode = can of oh shit's!, not just a can of worms? >>>>>>>> They introduce some difficulties, but there are advantages as well. >>>>>>>> People want to use passwords and pass phrases that are in their native >>>>>>>> language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer >>>>>>>> passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes
    "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
    I don't want to copy an post stuff into a link. What purpose does that >>>>>> serve? You know what your code does so you know what will happen for a >>>>>> particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o
    The trick to knowing if some software will do what you expect is to >>>>>> understand the code, the inputs and the outputs. Copying and pasting >>>>>> text won't teach you much about any of these things.

    Well, if you are on a different system and the copy and paste might give >>>>> different bytes for the same visual symbols, that would murder the password
    and you could not decrypt... Right?
    Yes it would. But so what? If your code handles the input correctly, >>>> why do you care if I can paste something into a website correctly?

    I wanted to see if you can post those visual symbols as a password and make >>> it not decrypt the plaintext correctly. It would give an example of the
    potential problem we are discussing here? Ala, Rich and Marcel 's point? My >>> hash would not work if its not bit exact...

    I don't think any of the symbols I posted can illustrate Rich's point
    and even if I had chosen ones that did, I would be surprised if copying
    and pasting would trigger normalisation. But let's say it's a
    concern... Then a good way to illustrate that point would be a website
    that just showed what bytes are pasted into an input field. Having hash
    code behind it just obscures the problem. The problem would not lie in
    the hash but with the system that's doing the copy and paste (and the
    details of how the HTML form is handled server side).

    Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
    care of that part for me:

    For UTF-8, "á", for example, could be any of:

    The precomposed character: c3a1 - 11000011 10100001,
    "a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
    or some overlong encoding that takes up more bytes than it actually
    needs.

    I'm not sure why you are reposting this. My point to Chris was that his
    website was not a good way to test if copy and paste is working as he
    expects since it does not show the input bytes and just adds an extra level
    of complexity to the test.

    Now, proper conforming implementations should not issue "overlong" encodings, so with the exception of pen-testers, Chris *should* be able
    to ignore the "overlong" angle.

    There are cases where overlong encodings can cause problems and there is nothing to stop these being generated maliciously so the usual advice is
    that programs should check for these on input.

    But Richard's post shows two different byte strings that produce the identical visual character.

    Yes. I didn't think this was in any doubt.
    --
    Ben.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to sci.crypt on Mon Feb 23 12:54:51 2026
    From Newsgroup: sci.crypt

    On 2/22/2026 2:34 PM, Ben Bacarisse wrote:
    Rich <rich@example.invalid> writes:

    Ben Bacarisse <ben@bsb.me.uk> wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 2:03 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 1:01 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 2/20/2026 5:07 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes: >>>>>>>>>
    Password with unicode = can of oh shit's!, not just a can of worms? >>>>>>>>> They introduce some difficulties, but there are advantages as well. >>>>>>>>> People want to use passwords and pass phrases that are in their native
    language. It can help people choose longer ones while still remembering
    them. Also Unicode can increase the entropy without needing longer >>>>>>>>> passwords. Using raAEYc+-+rCR-i-+-# as a password is probably quite strong, though
    many places will either reject it or insist I add a digit, an upper and
    lower case letter and a symbol!


    Well let me try with just copy-and-paste between the quotes
    "raAEYc+-+rCR-i-+-#" as a
    password for the plaintext, between the quotes:

    "Ben Bacarisse"

    Here is the link to some ciphertext... Can you decrypt it? copy paste in
    the password, click the decrypt button:

    https://fractallife247.com/test/hmac_cipher/drmoron/?ct_hmac_cipher=ce71abab3bb661afc83142e0d7a59abfb80348bffcbe8c3659c00cf015d04d06f4d4f9795e6d5f40db058651609680a0e98a00b236ebf962fb264f5642bdf431402b322dd186a92655a6e9d551c0d4d3068fb7f4d1d7
    I don't want to copy an post stuff into a link. What purpose does that >>>>>>> serve? You know what your code does so you know what will happen for a >>>>>>> particular sequence of input bytes.

    I still think it scary because of what I read in this thread. Might work,
    might not??? Perhaps? Humm... ;^o
    The trick to knowing if some software will do what you expect is to >>>>>>> understand the code, the inputs and the outputs. Copying and pasting >>>>>>> text won't teach you much about any of these things.

    Well, if you are on a different system and the copy and paste might give >>>>>> different bytes for the same visual symbols, that would murder the password
    and you could not decrypt... Right?
    Yes it would. But so what? If your code handles the input correctly, >>>>> why do you care if I can paste something into a website correctly?

    I wanted to see if you can post those visual symbols as a password and make
    it not decrypt the plaintext correctly. It would give an example of the >>>> potential problem we are discussing here? Ala, Rich and Marcel 's point? My
    hash would not work if its not bit exact...

    I don't think any of the symbols I posted can illustrate Rich's point
    and even if I had chosen ones that did, I would be surprised if copying
    and pasting would trigger normalisation. But let's say it's a
    concern... Then a good way to illustrate that point would be a website
    that just showed what bytes are pasted into an input field. Having hash >>> code behind it just obscures the problem. The problem would not lie in
    the hash but with the system that's doing the copy and paste (and the
    details of how the HTML form is handled server side).

    Richard Harnden, in Message-ID: <10n52nf$2t5lv$1@dont-email.me>, took
    care of that part for me:

    For UTF-8, "á", for example, could be any of:

    The precomposed character: c3a1 - 11000011 10100001,
    "a" plus a combining-acute: 61cc81 - 01100001 11001100 10000001,
    or some overlong encoding that takes up more bytes than it actually
    needs.

    I'm not sure why you are reposting this. My point to Chris was that his
    website was not a good way to test if copy and paste is working as he
    expects since it does not show the input bytes and just adds an extra level
    of complexity to the test.

    Now, proper conforming implementations should not issue "overlong"
    encodings, so with the exception of pen-testers, Chris *should* be able
    to ignore the "overlong" angle.

    There are cases where overlong encodings can cause problems and there is nothing to stop these being generated maliciously so the usual advice is
    that programs should check for these on input.

    But Richard's post shows two different byte strings that produce the
    identical visual character.

    Yes. I didn't think this was in any doubt.


    Right. I need to allow a user to see the raw hex bytes of the password
    and the plaintext.
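
    Showing those raw hex bytes is only a few lines in the browser. A sketch under that assumption (`toHexDump` is a hypothetical helper name, and the event wiring is illustrative):

```javascript
// Sketch: dump the UTF-8 bytes of whatever was typed or pasted,
// before any hashing touches it.
function toHexDump(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(bytes)
    .map((b) => b.toString(16).padStart(2, "0"))
    .join(" ");
}

// In a page this might be wired to the password field, e.g.:
//   passwordInput.addEventListener("input", () => {
//     hexView.textContent = toHexDump(passwordInput.value);
//   });

console.log(toHexDump("abc"));     // "61 62 63"
console.log(toHexDump("a\u0301")); // "61 cc 81", visibly not "c3 a1"
```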
    --- Synchronet 3.21b-Linux NewsLink 1.2