• Re: Musings about Usernames in adduser and Debian

    From Gioele Barabucci@21:1/5 to Michal Politowski on Sun Dec 1 23:30:01 2024
    On 28/11/24 11:28, Michal Politowski wrote:
    POSIX explicitly limits itself to a subset of ASCII, so it is not going to mandate any normalization form. Are there other standards (or initiatives) in this area that you know of?

    What about RFC 8265?
    "Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords"
    https://datatracker.ietf.org/doc/html/rfc8265

    Thank you Michal for the pointer.

    RFC 8265 (and the associated RFC 8264 "PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols") looks like exactly what all login-related programs should
    implement in order to avoid the kind of errors described in <https://lists.debian.org/debian-devel/2024/11/msg00491.html>.

    But a cursory search shows that none of the current upstreams support
    (or mention) PRECIS. (It also shows that src:precis is a Java library
    squatting a bit on that package name... :))

    Regards,

    --
    Gioele Barabucci

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marc Haber@21:1/5 to G. Branden Robinson on Mon Dec 2 09:00:01 2024
    On Sun, Dec 01, 2024 at 09:16:03PM -0600, G. Branden Robinson wrote:
    These things are ugly, which is why I suppose they haven't caught on
    despite being around for decades, but I would guess that this problem
    space is such that there are no non-ugly solutions apart from "just
    stick to ASCII", which some people find ugly in a different way.

    The issue is that we didn't stick to ASCII. You CAN use UTF-8 in user
    names and it works.

    Apologies if I missed someone bringing up and rejecting Punycode in the previous ~41 messages in this thread.

    No one did. It doesn't make sense anyway (and I would not implement this
    in adduser), because we HAVE UTF-8 and it works. So the alternatives
    are really

    (1) Stick with the current way, having UTF-8 work but keeping it
    undocumented, hurling any breakage on the user
    (2) Document UTF-8 as working and consider breakage a bug
    (3) Forbid UTF-8

    Greetings
    Marc


    --
    -----------------------------------------------------------------------------
    Marc Haber         | "I don't trust Computers. They | Mail address in header
    Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
    Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

  • From Marc Haber@21:1/5 to nick black on Mon Dec 2 09:30:01 2024
    On Mon, Dec 02, 2024 at 01:35:05AM -0500, nick black wrote:
    WTF-8 extends UTF-8 to handle
    invalid UTF-16 input.

    WTF-8 is a seriously defined encoding? I have only encountered that name
    as a mocking name for a UTF-8 string that erroneously went through UTF-8 encoding a second time (double-UTF-8).
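    The double-encoding failure described above can be reproduced in a few lines. This is a minimal Python sketch that assumes the intermediate mistake is reading UTF-8 bytes as Latin-1 (Windows-1252 is another common culprit); the function names are hypothetical and not part of any tool discussed in this thread.

```python
def double_encode(text: str) -> bytes:
    """Encode text as UTF-8, misread those bytes as Latin-1, then encode again."""
    once = text.encode("utf-8")          # 'é' -> b'\xc3\xa9'
    misread = once.decode("latin-1")     # each byte becomes its own character: 'Ã©'
    return misread.encode("utf-8")       # ...which is then encoded a second time


def undo_double_encoding(data: bytes) -> str:
    """Reverse exactly one round of the mistake above (only valid if it happened)."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    doubled = double_encode("\u00e9")
    print(doubled)                        # b'\xc3\x83\xc2\xa9'
    print(doubled.decode("utf-8"))        # Ã©
    print(undo_double_encoding(doubled))  # é
```

    Pure ASCII input passes through unchanged, which is one reason this kind of bug tends to go unnoticed until a non-ASCII name shows up.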

    Greetings
    Marc

  • From Marc Haber@21:1/5 to nick black on Mon Dec 2 09:50:01 2024
    On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:
    Marc Haber left as an exercise for the reader:
    * any upstream tool could say "bad idea" and refuse patches,
    requiring their long term management,

    Depending on how important this tool is, we could get away without
    patching and probably not even documenting this failure.

    This kind of attitude seems self-defeating. Despite being
    *strongly* in favor of this effort, I would oppose it if it were
    strictly a Debian thing. We can inspire the move, but going it
    alone seems a recipe for present and future pain (think SSHing
    from/to Debian and a non-Debian machine).

    I bet that other distributions will also allow me to useradd a UTF-8
    name today. I don't think that we have patched useradd to allow this.

    * the Linux framebuffer console is pretty limited in what
    glyphs it has available, and the number of glyphs it can
    support,

    Probably, yes. But people working on the Linux framebuffer console are unlikely to actually use UTF-8 user names, so the only really bad

    With all due respect, this seems totally unsupported by anything
    other than vibes =].

    So you think that we should be stricter than we are today?

    * broken localization (or failure to call setlocale()) could be
    a bigger problem, especially for root/system accounts.

    I don't think we should allow UTF-8 characters in the string "root" or in system account names. And if a local admin decides to do so, Debian packages should still restrict themselves to using US-ASCII in their
    system accounts.

    Why? This would require multiple code paths for what seems to me a
    very questionable objective. You point out later in your
    response that there already exist diverging codepaths, but isn't
    unifying such things always a goal?

    I think that the distinction between system users and regular users is a
    good thing and that we should continue treating them differently.
    Strictly, it's only adduser (and useradd, UID only) having different
    code paths, the treatment in other software is identical.

    Even if we unify things (either by allowing strange characters in system
    user names, or by restricting regular user names to the western
    character set), adduser will need to keep the distinction because we
    assign UIDs from different ranges.

    Do you have a suggestion for a Perl regexp that allows this? My current development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".

    I do not. This is not a regex problem in my mind and experience;
    you need full access to complicated libraries.

    Adduser will have to stick to regexes for dependency reasons.

    Any such effort
    should go through Unicode Standard Annex #15 (UAX #15) canonicalization
    before being inspected at all.

    I have always assumed that canonicalization would be used for sorting
    and equality, while in the databases it is important to keep the
    difference between the Angstrom sign (the unit symbol) and the capital
    letter A with ring above. If we canonicalize everything, why do we have
    different codepoints for different semantics?

    Yes, I need to read your book.

    At that point, you're well past regular
    languages so far as I can tell. I do not see this goal as
    possible with small surgeries on the adduser code base, but
    rather something that requires work across the chain.

    So, "not for Trixie". And what would we do in Trixie? I think we need
    something that a single person can implement in spare time before
    Christmas. This is a rather limited amount of time that we have.

    It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
    properly set LANG and programs calling setlocale(). This, as
    alluded to above, has the potential for a big mess.
    Our default is C.UTF-8 and has been like that for a while.

    Yes, but that can be changed.

    By the local admin? Yes. That's why we (Linux distributions) should
    stick to US-ASCII user names for the accounts that are created by our
    packages. If a local admin creates UTF-8 user names but gives them a
    non-UTF-8 locale, then it's their fault, and if a user with a UTF-8 user
    name selects a non-UTF-8 locale, it's deliberate sabotage. I don't think
    we should care about that, and it's already possible today.

    With all due respect, I admire your gung-ho can-do spirit, but
    adduser alone is not IMHO the place. This is a major change
    requiring support from libraries, applications, and UI to do
    right, and thus wide buy-in. I love the idea, but it's not going
    to happen with a few Perl regexes. Please don't read this as
    commentary on you or your code.

    So your recommendation is to disallow things that we have allowed until recently, and maybe remove configurability to REALLY disallow it?

    Greetings
    Marc


  • From Michal Politowski@21:1/5 to All on Mon Dec 2 11:40:01 2024
    On Sun, 1 Dec 2024 23:27:09 +0100, Gioele Barabucci wrote:
    [...]
    But a cursory search shows that none of the current upstreams support (or mention) PRECIS. (It also shows that src:precis is a Java library squatting
    a bit on that package name... :))

    But at least it is an implementation of this PRECIS :)
    There is also python3-precis-i18n in the archive.

    --
    Michał Politowski

  • From Chris Hofstaedtler@21:1/5 to All on Mon Dec 2 16:30:02 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241202 09:43]:
    On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:
    Marc Haber left as an exercise for the reader:
    * any upstream tool could say "bad idea" and refuse patches,
    requiring their long term management,

    Depending on how important this tool is, we could get away without patching and probably not even documenting this failure.

    This kind of attitude seems self-defeating. Despite being
    *strongly* in favor of this effort, I would oppose it if it were
    strictly a Debian thing. We can inspire the move, but going it
    alone seems a recipe for present and future pain (think SSHing
    from/to Debian and a non-Debian machine).

    I bet that other distributions will also allow me to useradd a UTF-8
    name today. I don't think that we have patched useradd to allow this.

    We did. Debian carries (since "forever") a patch in useradd to turn
    off most name checking. (Trying to) remove this patch is what
    started this all.

    Observe:

    [root@cc65635fbf00 /]# cat /etc/os-release
    NAME="Fedora Linux"
    VERSION="40 (Container Image)"
    ...
    [root@cc65635fbf00 /]# useradd för
    useradd: invalid user name 'för': use --badname to ignore


    Not sure if mjt brought it up yet, but the sendmail interface will
    also need some solution for utf8 usernames (=email address local
    parts). However, it seems some sendmail implementations already
    cannot cope with utf8 gecos fields.

    Chris

  • From Marc Haber@21:1/5 to All on Tue Dec 3 17:30:01 2024
    Hi,

    thank you all for your contributions to this discussion. I have now
    finally understood¹ that it is not enough to try creating a UTF-8
    encoded user name and see that it correctly shows up in /etc/passwd to
    declare UTF-8 support. Please forgive me for not replying to all of you
    in this thread individually, I have read everything and if I didn't cater
    for your arguments in this message please feel free to remind me.

    https://lists.debian.org/debian-devel/2024/11/msg00491.html correctly
    outlines that homograph characters (such as é (UTF-8 0xC3 0xA9) and the
    lookalike é (0x65 0xCC 0x81)) are not only a nuisance. At the least,
    adduser should reject creating étienne if étienne already exists: those
    are different user names but look the same, and unless you cut and paste
    user names instead of typing them, you're bound to hit the wrong user
    depending on HOW you type and what input medium you use. Not good.
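    The two spellings of "étienne" are indeed different byte strings, matching the hex values given above, yet they become identical after NFC normalization. A short Python check using only the standard `unicodedata` module; `utf8_hex` is a hypothetical helper name introduced here.

```python
import unicodedata


def utf8_hex(s: str) -> str:
    """Show the UTF-8 bytes of s as space-separated lowercase hex."""
    return s.encode("utf-8").hex(" ")


precomposed = "\u00e9tienne"   # é as one code point, U+00E9
decomposed = "e\u0301tienne"   # e followed by U+0301 COMBINING ACUTE ACCENT

print(utf8_hex(precomposed[:1]))   # c3 a9
print(utf8_hex(decomposed[:2]))    # 65 cc 81
print(precomposed == decomposed)   # False: a plain comparison sees two names
print(unicodedata.normalize("NFC", precomposed)
      == unicodedata.normalize("NFC", decomposed))  # True
```

    Whether a tool should refuse such a collision outright or only warn is a policy question; detecting it is cheap once both sides are normalized.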

    https://wiki.debian.org/UserAccounts and https://wiki.debian.org/UserAccountsPhilosophy are updated accordingly.

    After understanding this, I must admit that what's currently left active
    on the adduser team (me) doesn't have the capacity to implement this
    properly and in time for trixie. To make things worse, the
    Unicode::Precis module, which should be in Debian as
    libunicode-precis-perl (but isn't) hasn't seen an upstream release in
    more than five years.

    Additionally, I am not currently in a position to write a proper
    checker for the RFC 8264 IdentifierClass (Section 4.2), since I don't
    have the time to work out which \p{Foo} character classes match the
    classes given in the RFC.
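    To give an idea of what such a checker involves, here is a Python sketch that approximates only part of the RFC 8264 IdentifierClass derivation: printable ASCII U+0021..U+007E is allowed (the ASCII7 rule), other characters must belong to the LetterDigits general categories that RFC 8264 takes from RFC 5892, and anything that NFKC changes is rejected (the HasCompat rule). It deliberately omits the Exceptions, BackwardCompatible, JoinControl, OldHangulJamo and PrecisIgnorableProperties rules and is not pinned to a Unicode version, so it is not a conforming implementation; the function name is hypothetical.

```python
import unicodedata

# General categories of the "LetterDigits" rule (RFC 5892, reused by RFC 8264).
# In Perl these would be \p{Ll}, \p{Lu}, \p{Lo}, \p{Nd}, \p{Lm}, \p{Mn}, \p{Mc}.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def roughly_identifier_class(name: str) -> bool:
    """Approximate, NON-conforming check against RFC 8264 IdentifierClass."""
    if not name:
        return False
    for ch in name:
        if 0x21 <= ord(ch) <= 0x7E:
            continue  # ASCII7: printable ASCII without space is allowed
        if unicodedata.normalize("NFKC", ch) != ch:
            return False  # HasCompat: the character has a compatibility mapping
        if unicodedata.category(ch) not in LETTER_DIGITS:
            return False  # spaces, symbols, punctuation, controls, unassigned...
    return True


for candidate in ["etienne", "\u00e9tienne", "\u2126mega", "\u03a9mega", "j doe", "\uff21lice"]:
    print(repr(candidate), roughly_identifier_class(candidate))
```

    The decomposed spelling of "étienne" also passes character by character; the RFC 8265 username profiles additionally apply NFC to the whole string, which is a separate step from this character check.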

    I would appreciate volunteers to help here, but first I need to bring
    some sense into adduser's current state of affairs to make an unstable
    upload that can eventually migrate to testing.

    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have
    - adduser --allow-all-names will just verbatim pass all user names to
    useradd.
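    The tiers for regular accounts in the list above could be modelled roughly as follows. This Python sketch is purely illustrative: adduser is written in Perl, and the ASCII pattern, the `Mode` names and the colon rule are assumptions rather than adduser's or systemd's actual rules (the colon is refused here only because it separates fields in /etc/passwd). The --system path is left out because the plan says it does not change.

```python
import re
from enum import Enum

# Hypothetical ASCII-only pattern; NOT adduser's real NAME_REGEX.
ASCII_NAME = re.compile(r"[a-z_][a-z0-9_-]*")


class Mode(Enum):
    DEFAULT = "default"                   # plain `adduser NAME`
    ALLOW_BAD_NAMES = "allow-bad-names"   # UTF-8 allowed, no normalization
    ALLOW_ALL_NAMES = "allow-all-names"   # handed to useradd verbatim


def accept_name(name: str, mode: Mode) -> bool:
    """Return True if this sketch would pass `name` on to useradd."""
    if mode is Mode.ALLOW_ALL_NAMES:
        return True  # no local check at all; useradd has the last word
    if mode is Mode.ALLOW_BAD_NAMES:
        # Any printable, non-space characters, kept exactly as typed.
        return bool(name) and ":" not in name and all(
            ch.isprintable() and not ch.isspace() for ch in name
        )
    return ASCII_NAME.fullmatch(name) is not None


print(accept_name("etienne", Mode.DEFAULT))
print(accept_name("\u00e9tienne", Mode.DEFAULT))
print(accept_name("\u00e9tienne", Mode.ALLOW_BAD_NAMES))
```

    Note that the middle tier accepts both spellings of "étienne" as distinct names, which is exactly why the plan asks local admins to be careful with it.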

    All this will be documented in the man page, in README.Debian and/or the
    Wiki after the code passes the test suite again.

    I'll probably deprecate --allow-bad-names in favor of something that
    doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    I would love to hear your opinion. Silence is agreement ;-)

    Greetings
    Marc


    ¹ RFC 8264, RFC 8265, and Unicode TR 15 linked in this thread were
    educational for me

  • From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 18:00:02 2024
    On Tue, Dec 03, 2024 at 05:46:00PM +0100, Gioele Barabucci wrote:
    On 03/12/24 17:20, Marc Haber wrote:
    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have

    Dear Marc,

    in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters
    are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

    While this seems the right thing to do, I think this should be done in
    useradd (pkg:shadow), in the respective upstream project, so that all
    Linux distributions get the same behavior.

    I have filed https://github.com/shadow-maint/shadow/issues/1138 in the
    general regard. Feel free to add what I forgot to mention there.

    I'd rather not have this can of worms in adduser, but I'd consider a
    patch.

    Greetings
    Marc

  • From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 17:50:01 2024
    On 03/12/24 17:20, Marc Haber wrote:
    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have

    Dear Marc,

    in preparation for a PRECIS future, couldn't adduser pass the usernames
    through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    [1] https://www.rfc-editor.org/rfc/rfc8264.html#section-5.2.4

    Regards,

    --
    Gioele Barabucci

  • From Étienne Mollier@21:1/5 to All on Tue Dec 3 20:50:01 2024
    Hi Marc,

    Marc Haber, on 2024-12-03:
    thank you all for your contributions to this discussion. I have now
    finally understood¹ that it is not enough to try creating an UTF-8
    encoded user name and see that it correctly shows up in /etc/passwd to declare UTF-8 support. Please forgive me for not replying to all of you
    in this thread individually, I have read everything and if I didnt cater
    for your arguments in this message please feel free to remind me.

    Thank you for having taken the time to investigate this issue,
    as a person concerned, I much appreciated it. Let's see whether
    I can contribute one last useful item.

    I'll probably deprecate --allow-bad-names in favor of something that
    doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Have a nice day, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from /dev/pts/5, please excuse my verbosity
    `- on air: DGM - Solitude

  • From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 21:40:01 2024
    On 03/12/24 17:59, Marc Haber wrote:
    in preparation for a PRECIS future, couldn't adduser pass the usernames
    through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

    What NFC alone will not solve are homograph collisions: a (U+0061 Latin
    small letter a) and а (U+0430 Cyrillic small letter a) are
    NFC-normalized to different codepoints.

    But these are two different scenarios: the former problem may (and does)
    arise without any wrongdoing from the user's side (a different OS, or a different string manipulation library, or a screen keyboard may produce
    a different é), the latter is an attack. The former is an
    interoperability issue, the latter is a security issue.
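    The three cases above can be checked directly with Python's standard `unicodedata` module. The Ohm sign U+2126 normalizes to U+03A9 GREEK CAPITAL LETTER OMEGA, while Latin a and Cyrillic а stay distinct after NFC; `codepoints` and `nfc` are small helpers introduced here for display.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


cases = [
    ("precomposed e-acute", "\u00e9"),
    ("e + combining acute", "e\u0301"),
    ("OHM SIGN", "\u2126"),
    ("Latin small a", "a"),
    ("Cyrillic small a", "\u0430"),
]

for label, s in cases:
    print(f"{label:20} {codepoints(s):14} -> {codepoints(nfc(s))}")
```

    Both e-acute variants end up as U+00E9 and the Ohm sign as U+03A9, whereas U+0061 and U+0430 are left untouched: the interoperability problem is solved by NFC, the homograph problem is not.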

    While this seems the right thing to do, I think this should be done in useradd (pkg:shadow), in the respective upstream project, so that all
    Linux distributions get the same behavior.

    That's probably the best approach.

    Thanks for taking the time to delve into this issue,

    --
    Gioele Barabucci

  • From Marc Haber@21:1/5 to All on Tue Dec 3 22:10:02 2024
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    Marc Haber, on 2024-12-03:
    I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 22:10:02 2024
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    On 03/12/24 17:59, Marc Haber wrote:
    in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

    Converting Ohm into an Omega is losing intended information, isn't it?

    Thanks for taking the time to delve into this issue,

    I have learned many things.

    Greetings
    Marc

  • From Étienne Mollier@21:1/5 to All on Tue Dec 3 22:30:01 2024
    Marc Haber, on 2024-12-03:
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    I avoided "unicode" as it would include ASCII and the safe subset
    documented by POSIX, and I also considered the unlikely case
    where something were to replace Unicode. "international" would
    make the name technology-agnostic, but there is still the issue
    of also covering the POSIX-safe subset… Borrowing the idea
    from the other branch of the thread, --allow-unsafe-characters
    sounds fine and would carry the idea that certain characters
    could cause issues if used in a login name.

    Have a nice day, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from /dev/pts/1, please excuse my verbosity
    `- on air: Atlas - Hemifran

  • From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 22:50:02 2024
    On Tue, Dec 03, 2024 at 10:18:46PM +0100, Gioele Barabucci wrote:
    Normalization is always lossy, at least in principle.

    Applications that employ normalization accept that tradeoff in order to gain something valuable: in this case the ability to have an Ohm sign codepoint as part of your username is traded for the ability to compare usernames across different OSes and applications.

    I don't know what's exactly in the standard, but my gut feeling says
    that I would probably store _exactly_ what was received, but normalize
    both sides before duplicate checking, sorting, comparing.

    If we normalized things away in storage, why would we have homographs
    in the first place? Why would I replace a Cyrillic a with a Latin a,
    destroying the idea of a "script"?
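    The "store exactly what was received, normalize both sides before comparing" approach could look like the following Python sketch. The `UserRegistry` class is hypothetical, and it assumes NFC as the comparison form (the form Gioele proposed; no form is named here). The stored value keeps the Ohm sign, yet a later Omega spelling is still caught as a duplicate.

```python
import unicodedata


def compare_key(name: str) -> str:
    """Normalized form used only for equality and sorting, never stored."""
    return unicodedata.normalize("NFC", name)


class UserRegistry:
    def __init__(self) -> None:
        self._names = {}  # compare_key -> name exactly as it was supplied

    def add(self, name: str) -> None:
        key = compare_key(name)
        if key in self._names:
            raise ValueError(
                f"{name!r} is equivalent to existing user {self._names[key]!r}"
            )
        self._names[key] = name  # verbatim: no normalization in storage

    def sorted_names(self) -> list:
        # Sort on the normalized key, but return the stored spellings.
        return [self._names[k] for k in sorted(self._names)]


if __name__ == "__main__":
    reg = UserRegistry()
    reg.add("\u2126mega")          # starts with OHM SIGN, stored as-is
    try:
        reg.add("\u03a9mega")      # GREEK CAPITAL OMEGA: same NFC key
    except ValueError as err:
        print(err)
```

    Keeping storage verbatim preserves the original code points, at the cost of every consumer having to apply the same normalization when comparing; a tool that compares raw bytes would still see two different names.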

    Greetings
    Marc

  • From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 22:20:01 2024
    On 03/12/24 22:02, Marc Haber wrote:
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    On 03/12/24 17:59, Marc Haber wrote:
    in preparation for a PRECIS future, couldn't adduser pass the usernames
    through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters
    are just different renderings of the same character), but not the
    Ohm-against-Omega issue, right?

    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9
    (omega).

    Converting Ohm into an Omega is losing intended information, isn't it?

    Normalization is always lossy, at least in principle.

    Applications that employ normalization accept that tradeoff in order to
    gain something valuable: in this case the ability to have an Ohm sign
    codepoint as part of your username is traded for the ability to compare usernames across different OSes and applications.

    Regards,

    --
    Gioele Barabucci

  • From Soren Stoutner@21:1/5 to All on Tue Dec 3 15:15:52 2024
    I appreciate your being careful and deliberate about this instead of rushing into a solution that brings unintended consequences. But I also appreciate your taking the time to engage with the issue instead of just ignoring it.

    On Tuesday, December 3, 2024 9:20:53 AM MST Marc Haber wrote:
    Hi,

    thank you all for your contributions to this discussion. I have now
    finally understood¹ that it is not enough to try creating an UTF-8
    encoded user name and see that it correctly shows up in /etc/passwd to declare UTF-8 support. Please forgive me for not replying to all of you
    in this thread individually, I have read everything and if I didnt cater
    for your arguments in this message please feel free to remind me.

    https://lists.debian.org/debian-devel/2024/11/msg00491.html correctly outlines that homograph characters (such as é (UTF-8 0xC3 0xA9 and the lookalike é 0x65 0xCC 0x81) are not only a nuisance. At the least,
    adduser should reject creating étienne if étienne already exists - those are different user names but look the same, and if you don't
    cut-and-paste user names instead of typing them you're bound to hit the
    wrong user depending on HOW you type and what input medium you use. Not
    good.

    https://wiki.debian.org/UserAccounts and https://wiki.debian.org/UserAccountsPhilosophy are updated accordingly.

    After understanding this, I must admit that what's currently left active
    on the adduser team (me) doesn't have the capacity to implement this
    properly and in time for trixie. To make things worse, the
    Unicode::Precis module, which should be in Debian as
    libunicode-precis-perl (but isn't) hasnt seen an upstream release in
    more than five years.

    Additionally, I don't see myself in the situation of writing a proper
    checker for the RFC 8264 IdentifierClass (Chapter 4.2) at the moment
    since I don't have the time to check out which \p{Foo} character classes match the classes given in the RFC.

    I would appreciate volunteers to help here, but first I need to bring
    some sense in adduser's current state of affairs to make an unstable
    upload that can eventuall migrate to testing.

    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have
    - adduser --allow-all-names will just verbatim pass all user names to
    useradd.

    All this will be documented in the man page, in README.Debian and/or the
    Wiki after the code passes the test suite again.

    I'll probably deprecate --allow-bad-names in favor of something that
    doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    I would love to hear your opinion. Silence is agreement ;-)

    Greetings
    Marc


    ¹ RFC 8264, RFC 8265, and Unicode TR 15, linked in this thread, were
    educational for me.


    --
    Soren Stoutner
    soren@debian.org

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Alejandro Colomar@21:1/5 to All on Thu Dec 5 14:34:21 2024
    Copy: debian-devel@lists.debian.org

    Hi Marc,

    Homograph attacks would be best mitigated in software reading
    /etc/passwd, alerting in their output or logs that the user name they
    just printed was composed of strange alphabets.
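    As a rough illustration of such an alert, a reader of /etc/passwd could
    flag names whose letters come from more than one script. The Python sketch
    below is a heuristic only: the standard unicodedata module does not expose
    the Unicode Script property, so it takes the first word of each letter's
    character name ("LATIN", "CYRILLIC", ...). `scripts_used` and
    `mixed_script_warning` are hypothetical helpers.

```python
import unicodedata


def scripts_used(name: str) -> set:
    """Crudely approximate the set of scripts whose letters appear in name.

    Heuristic: take the first word of each letter's Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...). Python's unicodedata does not expose
    the Script property, so this is only a stand-in for UTS #39 data.
    """
    scripts = set()
    for ch in name:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ", 1)[0])
    return scripts


def mixed_script_warning(name: str) -> str:
    """Return a warning for names mixing scripts, or '' if there is none."""
    scripts = scripts_used(name)
    if len(scripts) > 1:
        return f"warning: user name {name!r} mixes scripts {sorted(scripts)}"
    return ""


print(mixed_script_warning("alice") or "alice: ok")
print(mixed_script_warning("\u0430lice") or "ok")  # Cyrillic а followed by Latin "lice"
```

    Legitimate names can mix scripts too (Japanese names routinely combine kana
    and kanji), so a real implementation would need the UTS #39 mixed-script
    rules rather than this shortcut.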

    Software that reads /etc/passwd or /etc/shadow is quite sensitive, and
    should therefore be as simple as possible. More code, more bugs.

    The best mitigation for those attacks is to ban the names altogether.
    IMO, setuid programs should not accept Unicode.

    Have a lovely day!
    Alex

    --
    <https://www.alejandro-colomar.es/>

  • From Alejandro Colomar@21:1/5 to Marc on Thu Dec 5 15:53:36 2024
    Copy: debian-devel@lists.debian.org

    Marc wrote:
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    Marc Haber, on 2024-12-03:
    I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long-winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    I prefer "bad". It gives the implicit message that it's bad to use that
    flag. If you find it offensive, then how about --allow-unsafe-names?

    I oppose "unicode", "extended", or "international", as all of them
    remove the connotation that you should not use that flag.

    Anyway, I vote for removing the possibility of using unsafe names, and
    not even exposing a flag.

    Have a lovely day!
    Alex

  • From Stephan Seitz@21:1/5 to All on Thu Dec 5 16:40:02 2024
    Am Do, Dez 05, 2024 at 14:34:21 +0100 schrieb Alejandro Colomar:
    The best mitigation for those attacks is to ban the names altogether.
    IMO, setuid programs should not accept Unicode.

    Today, not many people want to live in the past and accept plain ASCII
    if their name needs a bigger character set.

    Stephan

    --
    | If your life was a horse, you'd have to shoot it. |

  • From Marc Haber@21:1/5 to All on Thu Dec 5 17:10:01 2024
    On Thu, 5 Dec 2024 14:34:21 +0100, Alejandro Colomar <alx@kernel.org>
    wrote:
    The best mitigation for those attacks is to ban the names altogether.
    IMO, setuid programs should not accept Unicode.

    Oh, Bugs by Code. Dangerous. We should stop producing code completely.
    No code, no bugs.

    Neither adduser nor useradd are setuid.

    --
    ----------------------------------------------------------------------------
    Marc Haber         | " Questions are the            | Mailadresse im Header
    Rhein-Neckar, DE   |   Beginning of Wisdom "        |
    Nordisch by Nature | Lt. Worf, TNG "Rightful Heir"  | Fon: *49 6224 1600402

  • From Stephan Seitz@21:1/5 to All on Thu Dec 5 17:20:01 2024
    Am Do, Dez 05, 2024 at 17:05:29 +0100 schrieb Marc Haber:
    Neither adduser nor useradd are setuid.

    To be fair, passwd is setuid. And I'm sure you are using it to set the password. So it has to survive a Unicode user name.

    Stephan

  • From Marc Haber@21:1/5 to nick black on Thu Dec 5 18:10:01 2024
    On Sat, Nov 23, 2024 at 02:48:10AM -0500, nick black wrote:
    I recommend Chapter 7 of my free book, "Hacking the Planet with
    Notcurses: A Guide to TUIs and Character Semigraphics" for the
    full story (as I understand it) regarding Unicode presentation: https://nick-black.com/htp-notcurses.pdf (starts on page 41).

    Thank you very much for providing this. The chapter has educated me.
    "The vast minimum of things you should know about Unicode."

    The time to read it was well spent.

    Greetings
    Marc

    P.S.: Sadly, this has gotten less than positive coverage on LWN. I
    apologize for the harm this discussion has done.

    --
    -----------------------------------------------------------------------------
    Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
    Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
    Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

  • From Ben Kallus@21:1/5 to All on Sun Dec 8 21:40:01 2024
    Hi everyone!

    I second calling it "allow-unsafe-names" for the following reasons:

    1. Many programs assume that usernames are so inert that they can be
    used in shell strings without proper escaping. For example, a user
    named $(touch /tmp/pwn) will create /tmp/pwn upon the first launch of
    an interactive bash, because the default bash PS1 interpolates the
    username before doing command substitution. adduser doesn't allow
    whitespace or forward slashes in usernames, even with
    --allow-all-names, but you can still get the same behavior with the
    username $(>`printf$IFS"\x2ftmp\x2fpwn"`). How this works is left as
    an exercise for the reader. Once you figure it out, see if you can
    out-golf us :)

    2. There's a path traversal bug in useradd (but not adduser) that can
    be triggered by usernames beginning with "../". For example, for the
    username "../bin/brangal", useradd will create a home directory at /home/../bin/brangal (i.e. /bin/brangal). This can be used to place a
    directory owned by the new user nearly anywhere on the system.
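    Both failure modes come from treating a user name as inert text. A minimal
    Python sketch of the two standard defenses, quoting before building shell
    strings and checking that a derived home directory stays under its base;
    `shell_quoted` and `home_dir_for` are hypothetical helpers, not code from
    adduser or useradd:

```python
import os.path
import shlex


def shell_quoted(username: str) -> str:
    """Quote a user name before splicing it into a shell command string."""
    return shlex.quote(username)


def home_dir_for(username: str, base: str = "/home") -> str:
    """Return base/username, refusing names that resolve outside base.

    Hypothetical helper, not useradd's code: it normalizes the joined
    path and requires the result to be a direct child of base.
    """
    base = os.path.normpath(base)
    candidate = os.path.normpath(os.path.join(base, username))
    if os.path.dirname(candidate) != base:
        raise ValueError(f"user name {username!r} escapes {base}")
    return candidate


print(shell_quoted("$(touch /tmp/pwn)"))  # single-quoted, so no command substitution
print(home_dir_for("alice"))              # /home/alice
try:
    home_dir_for("../bin/brangal")        # would normalize to /bin/brangal
except ValueError as err:
    print(err)
```

    Quoting only helps programs that build shell strings themselves; it does
    nothing for the PS1 case, where bash performs the expansion on its own.
    The path check is purely lexical (normpath does not follow symlinks).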

    -Ben Kallus && Jonah Weinbaum

  • From Chris Hofstaedtler@21:1/5 to All on Mon Dec 9 18:10:01 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241203 22:06]:
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    Marc Haber, on 2024-12-03:
    I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long-winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    I echo Alejandro's concerns. We should stop having the flag
    completely, not encourage using it.

    If the default restrictions are too tight, then we need to work on
    that. What we should not do is introduce a badly tested (because
    mostly unused) code path that will introduce bugs in all sorts of
    places.
    IOW: if we move towards better character support, we need to do that
    by allowing it always. Same for longer names.

    Chris

  • From Chris Hofstaedtler@21:1/5 to All on Mon Dec 9 18:20:01 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241205 18:06]:
    P.S.: Sadly, this has gotten less than positive coverage on LWN. I
    apologize for the harm this discussion has done.

    Marc, thank you for collecting the info on the wiki and starting
    this discussion. I'm sorry I was not able to participate more.

    However, I reject the idea that it is on you to apologize for LWN
    covering this discussion and the harm that might have come out of
    it. This is something we need to address in a wider forum. Otherwise
    we lose our ability to discuss anything (and then to change anything,
    ever).

    Best,
    Chris

  • From Marc Haber@21:1/5 to zeha@debian.org on Mon Dec 9 21:30:01 2024
    On Mon, 9 Dec 2024 18:04:52 +0100, Chris Hofstaedtler
    <zeha@debian.org> wrote:
    This was never on the table, and shadow upstream might even drop the
    entire "support" for having bad names.

    Just for the record, I consider this a kneejerk reaction that moves
    the world backwards. It's sad.

  • From Chris Hofstaedtler@21:1/5 to All on Tue Dec 10 12:20:01 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241209 21:21]:
    On Mon, 9 Dec 2024 18:08:33 +0100, Chris Hofstaedtler
    <zeha@debian.org> wrote:
    I echo Alejandro's concerns. We should stop having the flag
    completely, not encourage using it.

    I violently disagree. But I have to accept this.

    IOW: if we move towards better character support, we need to do that
    by allowing it always. Same for longer names.

    I think that our distinction between system users and "normal" users
    is fine. No one needs a package generating "weird" user names.

    I think we're speaking past each other here.

    Packages can already create absolutely broken usernames today, if
    they want.

    To me, the question is more, why do we have a flag that, if used,
    allows you to break /etc/{passwd,shadow,group,gshadow} completely?

    Chris

  • From Theodore Ts'o@21:1/5 to Gioele Barabucci on Tue Dec 10 13:50:01 2024
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (Greek capital omega) are NFC-normalized to U+03A9 (Greek capital omega).

    What NFC alone will not solve are homograph collisions: a (U+0061 Latin
    small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to different codepoints.

    NFC also doesn't solve various invisible characters (e.g., zero-width
    spaces, bidirectional control characters). For more information about
    all of the various security land mines, see[1]. I also suggest that
    people do a Google search on "CVE" and "Unicode". There has been at
    least one instance where we needed to make a kernel(!) change to
    address a security vulnerability, although we decided it wasn't
    super-critical because "no sane distribution actually enables the
    casefold feature on users' file systems by default".

    [1] https://www.unicode.org/reports/tr39/tr39-22.html
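    The distinction drawn above can be checked directly with Python's
    unicodedata module (using whatever Unicode version the interpreter ships).
    The sketch prints each sample's code points before and after NFC;
    `code_points` and `samples` are names made up for this example.

```python
import unicodedata


def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


samples = {
    "e + combining acute": "e\u0301",
    "OHM SIGN": "\u2126",
    "Latin small a": "a",
    "Cyrillic small a": "\u0430",
    "bob + ZERO WIDTH SPACE": "bob\u200b",
}

for label, text in samples.items():
    normalized = unicodedata.normalize("NFC", text)
    print(f"{label:24} {code_points(text):30} -> {code_points(normalized)}")
```

    The first two rows collapse (U+0065 U+0301 becomes U+00E9, U+2126 becomes
    U+03A9); the Latin/Cyrillic pair stays distinct and the zero-width space
    survives, which is exactly the gap NFC leaves open.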

    The other security consideration is the vast amount of
    code that you need to link into security-critical / setuid programs if
    you are going to use libunicode. (And yes, we do include libunicode
    in the kernel in order to support casefold.) If you are thinking
    about potentially enabling casefold by default on user file systems
    because Windows and MacOS do it, and we need to appeal to Gen Z'ers
    in order for Debian to stay relevant(tm) --- please don't. :-)

    So if we really do want to support Unicode in usernames, may I suggest
    that having someone implement the smallest possible Unicode
    canonicalization library, which also handles getting rid of all of the
    *other* Unicode security traps like invisible characters,
    bidirectional control characters, etc., and then having it
    subjected to rigorous security audits before we propose linking it
    into setuid programs, would be a Really Good Idea.

    This would also reduce bloat in the minimal Debian install required
    for installer images, docker containers, etc., since we wouldn't need
    to support things like Unicode sorting rules, Unicode case folding,
    conversion between the many different Unicode encoding forms, etc.
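    As a rough illustration of the "security traps" part of such a library,
    the Python sketch below rejects names that are not already in NFC and
    names containing characters from a small set of general categories
    (controls, format characters such as zero-width and bidi controls,
    surrogates, private use, unassigned code points, separators). The category
    list and `reject_reason` are assumptions for this example; a real library
    would follow UTS #39 or PRECIS, and anything meant for setuid programs
    would presumably not be written in Python.

```python
import unicodedata
from typing import Optional

# General categories refused in this sketch (an assumption, far narrower
# than UTS #39 or PRECIS):
#   Cc control, Cf format (zero-width, bidi controls), Cs surrogate,
#   Co private use, Cn unassigned, Zs/Zl/Zp separators.
FORBIDDEN_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Zs", "Zl", "Zp"}


def reject_reason(name: str) -> Optional[str]:
    """Return why name should be refused, or None if it passes."""
    if not name:
        return "empty name"
    if unicodedata.normalize("NFC", name) != name:
        return "not in NFC; store the normalized form instead"
    for ch in name:
        category = unicodedata.category(ch)
        if category in FORBIDDEN_CATEGORIES:
            return f"U+{ord(ch):04X} has forbidden category {category}"
    return None


for candidate in ["\u00e9tienne", "e\u0301tienne", "bob\u200b", "\u202eevil"]:
    print(repr(candidate), "->", reject_reason(candidate) or "ok")
```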

    Cheers,

    - Ted

    But these are two different scenarios: the former problem may (and does) arise without any wrongdoing from the user's side (a different OS, or a different string manipulation library, or a screen keyboard may produce a different é), the latter is an attack. The former is an interoperability issue, the latter is a security issue.

    While this seems the right thing to do, I think this should be done in useradd (pkg:shadow), in the respective upstream project, so that all
    Linux distributions get the same behavior.

    That's probably the best approach.

    Thanks for taking the time to delve into this issue,

    --
    Gioele Barabucci
  • From Gioele Barabucci@21:1/5 to Theodore Ts'o on Tue Dec 10 15:00:01 2024
    On 10/12/24 13:47, Theodore Ts'o wrote:
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (Greek capital omega) are NFC-normalized to U+03A9
    (Greek capital omega).

    What NFC alone will not solve are homograph collisions: a (U+0061 Latin
    small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to
    different codepoints.

    NFC also doesn't solve various invisible characters (e.g., zero-width
    spaces, bidirectional control characters). For more information about
    all of the various security land mines, see[1].

    NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.

    The IdentifierClass of RFC 8264 explicitly disallows all these "security
    land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3

    The "Security considerations" section is quite extensive (5 pages long): https://www.rfc-editor.org/rfc/rfc8264.html#section-12

    In general, the PRECIS RFCs are more prescriptive than Unicode UTS #39,
    so, should Unicode usernames ever happen, the PRECIS RFCs are the
    reference all programs should follow.
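    To give a feel for what following PRECIS would involve, here is a rough
    Python approximation of the RFC 8265 UsernameCaseMapped profile: width
    mapping, lowercasing, NFC, then a simplified IdentifierClass check (ASCII7
    plus the LetterDigits categories, with HasCompat and unassigned code points
    disallowed). It omits the RFC's exceptions, contextual rules and bidi rule,
    so it is not a conforming implementation; `username_case_mapped` and its
    helpers are hypothetical.

```python
import unicodedata

# Rough stand-in for RFC 8264 IdentifierClass "PVALID": ASCII7 (U+0021..U+007E)
# plus LetterDigits (general categories below). It ignores the RFC's
# Exceptions, BackwardCompatible, JoinControl, OldHangulJamo and contextual
# rules, and is not a conforming implementation.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def _width_map(ch: str) -> str:
    """Map a fullwidth/halfwidth character to its decomposition mapping."""
    decomposition = unicodedata.decomposition(ch)
    if decomposition.startswith(("<wide>", "<narrow>")):
        return "".join(chr(int(cp, 16)) for cp in decomposition.split()[1:])
    return ch


def _identifier_ok(ch: str) -> bool:
    if 0x21 <= ord(ch) <= 0x7E:                   # ASCII7
        return True
    if unicodedata.normalize("NFKC", ch) != ch:   # HasCompat -> disallowed
        return False
    return unicodedata.category(ch) in LETTER_DIGITS  # unassigned (Cn) fails too


def username_case_mapped(name: str) -> str:
    """Approximate RFC 8265 UsernameCaseMapped enforcement.

    Width mapping, lowercasing (str.lower() standing in for Unicode
    toLowerCase), NFC, then the approximate IdentifierClass check above.
    The RFC's bidi rule is omitted.
    """
    mapped = "".join(_width_map(ch) for ch in name).lower()
    mapped = unicodedata.normalize("NFC", mapped)
    if not mapped:
        raise ValueError("empty user name")
    for ch in mapped:
        if not _identifier_ok(ch):
            raise ValueError(f"U+{ord(ch):04X} is not allowed in a user name")
    return mapped


print(username_case_mapped("\uff33\uff41\uff4c\uff54\uff5a\uff45\uff52"))  # fullwidth "Saltzer"
print(username_case_mapped("E\u0301tienne"))
for bad in ["bob\u200b", "joe smith"]:
    try:
        username_case_mapped(bad)
    except ValueError as err:
        print(repr(bad), "->", err)
```

    Note that Latin "a" and Cyrillic "а" both pass this check; PRECIS leaves
    confusables to the application.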

    Regards,

    --
    Gioele Barabucci

  • From Theodore Ts'o@21:1/5 to Gioele Barabucci on Tue Dec 10 16:00:01 2024
    On Tue, Dec 10, 2024 at 02:52:05PM +0100, Gioele Barabucci wrote:
    NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.

    The IdentifierClass of RFC 8264 explicitly disallows all these "security
    land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3

    The "Security considerations" section is quite extensive (5 pages long): https://www.rfc-editor.org/rfc/rfc8264.html#section-12

    Oh, good. I was just getting worried when discussion on the list
    seemed to be treating NFC as a silver bullet, and people were
    suggesting that the canonicalization should be done both by readers
    and writers of /etc/passwd --- which would imply linking libunicode
    into setuid programs like sudo and login, with the (to my view)
    inevitable results of hilarity ensuing.

    As I look at RFC 8264, I note that it does not take a position about
    which version of Unicode should be considered canonical, and in fact
    talks about one of the features (tm) of RFC 8264 being that it is
    agile with respect to newer versions of Unicode.

    However, it should be noted that RFC 8264 also states that code points
    which are not defined in whatever version of Unicode is supported by
    "the application" shall be disallowed. From Debian's perspective,
    though, if we are going to take a position about what version of
    Unicode should be supported by "the application(s)" that read and
    write /etc/passwd, we *will* need to take a position on what version
    of Unicode should be supported, and therefore, what set of characters
    will be disallowed.

    It also means that we need to be careful about what happens when we
    want to upgrade to newer versions of Unicode in future versions of
    Debian. If the system administrator wants to support more than one
    version of Debian, then it would be advisable if the Unicode version
    is something which is configurable, especially if the passwd entries
    are being supplied via some kind of network protocol such as LDAP or
    Hesiod (for those people who remember MIT Project Athena :-P).
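    The "unassigned code points are disallowed" rule means the verdict on a
    name depends on the Unicode database of whichever program does the
    checking. A small Python sketch makes that dependency visible: Python
    reports its database version in `unicodedata.unidata_version`, and code
    points unassigned in that version have general category Cn.
    `unassigned_code_points` is a hypothetical helper.

```python
import unicodedata


def unassigned_code_points(name: str) -> list:
    """List code points in name that are unassigned (category Cn)
    in the Unicode version this Python build ships with."""
    return [f"U+{ord(ch):04X}" for ch in name if unicodedata.category(ch) == "Cn"]


print("Unicode data version:", unicodedata.unidata_version)
print(unassigned_code_points("alice"))
print(unassigned_code_points("x\uffffy"))  # U+FFFF is a noncharacter, never assigned
```

    A code point added in a newer Unicode release would show up as Cn on a
    host with an older database and pass on a newer one, so two machines
    sharing passwd data (for example via LDAP) could disagree about the same
    name.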

    There is also (admittedly, only an edge case) the question of what to do if a newer
    version of Unicode disallows or removes characters. This rarely
    happens, but it has in the past (in particular in the case of various
    security disasters, or in the case of characters getting deprecated in
    favor of newer characters, many of which are mentioned in RFC 8264).
    So we can probably just ignore this case and hope that the Unicode
    consortium will be more careful in the future, but I'd thought I'd
    just mention it.

    The bottom line is that while I am sympathetic to the desire to
    support Unicode --- heck, I was one of the primary drivers of getting
    libunicode into the kernel so we could support case folding for more
    than just the ASCII character set --- the meme of "One does not simply
    walk into Mordor" also applies to "adopting Unicode".

    And I am reminded of one of my IETF mentors, an
    Internationalization expert, who told me two decades ago that, late at
    night, in the bar after a standards meeting, one of the things that
    I18N folks would say, just amongst themselves, was, "It would be
    easier just to teach everyone English" --- and this was with I18N
    experts who understood everything that was involved in doing full I18N
    support. No doubt this was only half-joking, but I think the point is
    valid.

    So if we're going to do this, let's do it right. :-)

    - Ted

  • From Marc Haber@21:1/5 to zeha@debian.org on Tue Dec 10 15:30:01 2024
    On Tue, 10 Dec 2024 12:10:14 +0100, Chris Hofstaedtler
    <zeha@debian.org> wrote:
    To me, the question is more, why do we have a flag that, if used,
    allows you to break /etc/{passwd,shadow,group,gshadow} completely?

    The user-oriented solution would be to identify the things that break /etc/passwd and to forbid these. Just forbidding everything is heading
    the wrong direction.
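    Identifying "the things that break /etc/passwd" starts with the file format
    itself: seven colon-separated fields, one record per line, read back as
    NUL-terminated C strings. The Python sketch below covers only that
    structural layer; `passwd_problems` and `passwd_line` are hypothetical
    helpers, and homographs, shell metacharacters and path traversal are
    separate concerns it does not address.

```python
# Characters that corrupt the colon-separated, newline-terminated
# /etc/passwd format, or truncate the NUL-terminated C strings that
# getpwnam() returns. Structural breakage only.
STRUCTURAL_BREAKERS = {
    ":": "field separator",
    "\n": "record separator",
    "\0": "C string terminator",
}


def passwd_problems(name: str) -> list:
    """List the reasons a name would break /etc/passwd, if any."""
    if not name:
        return ["empty name"]
    return [f"contains {label} {ch!r}"
            for ch, label in STRUCTURAL_BREAKERS.items() if ch in name]


def passwd_line(name: str, uid: int, gid: int, gecos: str, home: str, shell: str) -> str:
    """Build one seven-field /etc/passwd record, refusing structural breakers."""
    problems = passwd_problems(name)
    if problems:
        raise ValueError(f"refusing {name!r}: " + "; ".join(problems))
    return ":".join([name, "x", str(uid), str(gid), gecos, home, shell])


print(passwd_line("\u00e9tienne", 1001, 1001, "", "/home/\u00e9tienne", "/bin/bash"))

# Without the check, a name containing colons shifts every later field:
unchecked = ":".join(["mallory:x:0:0", "x", "1002", "1002", "", "/home/mallory", "/bin/sh"])
print(unchecked.split(":")[2])  # a naive parser reads UID "0"
```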

    Greetings
    Marc
  • From Simon Josefsson@21:1/5 to Theodore Ts'o on Tue Dec 10 18:10:02 2024
    "Theodore Ts'o" <tytso@mit.edu> writes:

    However, it should be noted that RFC 8264 also states that code points
    which are not defined in whatever version of the Unicode supported by
    "the application" shall be disallowed. From Debian's perspective,
    though, if we are going to take a position about what version of
    Unicode should be supported by "the application(s)" that read and
    write /etc/passwd, we *will* need to take a position on what version
    of Unicode should be supported, and therefore, what set of characters
    will be disallowed.

    A possible position may be to treat code points that are the subject of
    version mismatching to be undefined. This is how IDNA resolved the same problem, and PRECIS inherited this. While I protested about that
    approach many years ago as libidn maintainer when IDNA2003 was
    hard-coded to use Unicode 3.2, I think today that the approach is
    reasonable since Unicode has maintained good stability. We've done a
    couple of Unicode version bumps in libidn2 and interop with other IDN implementations -- that typically always use some other Unicode version
    -- is good enough to not cause serious breakage. I would expect the
    same to be true for PRECIS usernames too. Hostnames are hashed and
    subject to string comparisons, just like usernames, so we have some
    experience to build on here.

    I would involve cross-distribution discussion about this though.
    Perhaps the /etc/passwd APIs affect some POSIX specifications, and a
    non-ASCII extension could be proposed.

    /Simon

  • From Theodore Ts'o@21:1/5 to Simon Josefsson on Tue Dec 10 19:20:01 2024
    On Tue, Dec 10, 2024 at 06:08:40PM +0100, Simon Josefsson wrote:
    I would involve cross-distribution discussion about this though.
    Perhaps the /etc/passwd APIs affect some POSIX specifications, and a non-ASCII extension could be proposed.

    Yeah, good point. If the scope is going to include passwd entries
    that are distributed via network protocols like LDAP, then we need to
    worry about sites that support other Linux distributions beyond just
    Debian --- or for that matter, sites that need to support Linux as
    well as legacy Unix systems like AIX or Solaris.

    Of course, we could just exclude them from the scope and say that if
    you are using LDAP, then you MUST only use ASCII characters in the
    username, given that POSIX has decided to run away from the I18N
    problems wrt usernames. That might be the simpler approach, unless
    we want to drive something that could eventually be adopted by POSIX.

    - Ted

  • From Marc Haber@21:1/5 to All on Tue Dec 10 21:30:01 2024
    On Tue, 10 Dec 2024 13:13:08 -0500, "Theodore Ts'o" <tytso@mit.edu>
    Yeah, good point. If the scope is going to include passwd entries
    that are distributed via network protocols like LDAP, then we need to
    worry about sites that support other Linux distributions beyond just
    Debian --- or for that matter, sites that need to support Linux as
    well as legacy Unix systems like AIX or Solaris.

    Even if we had full Unicode support for anything using /etc/passwd, a
    site is always free to restrict itself to US-ASCII usernames. Same with
    POSIX, in my understanding we would still be POSIX compliant if we had
    full Unicode support for usernames, because POSIX defines the minimum
    of things a system MUST support, but it is always free to support
    more. Or, at least I hope so.

    But things are moving by shadow upstream taking a user-hostile stance,
    willing to take away freedom. I must be fine with that because I
    cannot change it. But I don't need to like it.

    Greetings
    Marc
  • From Charles Plessy@21:1/5 to All on Wed Dec 11 02:10:01 2024
    Hello everybody,

    sorry if it is too naive, but is there an easy way to determine for a
    given Unicode string if it can be typed from a single keyboard layout or produced by a text-to-speech system? People who need a username for
    SSH, email and su will want to be able to input it. At the other
    end of the range of use cases, people can use a computer for years without seeing
    their username.

    If we take one step back and look at the future: will usernames
    still be a thing in 10 years? If not, then a simple heuristic that
    satisfies more than half of the users may be enough...

    Have a nice day,

    Charles


    --
    Charles Plessy Nagahama, Yomitan, Okinawa, Japan
    Debian Med packaging team http://www.debian.org/devel/debian-med
    Tooting from home https://framapiaf.org/@charles_plessy
    - You do not have my permission to use this email to train an AI -

  • From Jeremy Stanley@21:1/5 to Charles Plessy on Wed Dec 11 02:50:01 2024
    On 2024-12-11 10:04:44 +0900 (+0900), Charles Plessy wrote:
    [...]
    is there an easy way to determine for a given Unicode string if it
    can be typed from a single keyboard layout
    [...]

    Do keyboards with a "compose" key count? There's plenty of glyphs I
    can type which aren't depicted directly on my keyboard's keycaps,
    after all.
    --
    Jeremy Stanley

  • From Marc Haber@21:1/5 to All on Wed Dec 11 09:20:01 2024
    On Wed, 11 Dec 2024 10:04:44 +0900, Charles Plessy <plessy@debian.org>
    wrote:
    sorry if it is too naive, but is there an easy way to determine for a
    given Unicode string if it can be typed from a single keyboard layout or produced by a text-to-speech system? People who want a username because
    of SSH, email and su will want to be able to input it.

    That's easy, just choose a user name for YOU that YOU can type on YOUR keyboard. Why would anybody choose a username that is impossible to use
    in their own locale?

    Greetings
    Marc
  • From Theodore Ts'o@21:1/5 to Marc Haber on Thu Dec 12 17:10:01 2024
    On Tue, Dec 10, 2024 at 09:24:15PM +0100, Marc Haber wrote:

    But things are moving by shadow upstream taking a user-hostile stance, willing to take away freedom. I must be fine with that because I
    cannot change it. But I don't need to like it.

    As a suggestion, we might make more forward progress if we assume good
    faith and accept that other people might have different priorities
    than we do. I could easily see how shadow, being a security-related
    package, would consider encouraging something that could lead to
    security bugs or just other random breakage as "user-hostile".

    I am reminded of Professor Jerome Saltzer, who was responsible for the
    overall technical architecture for MIT's Project Athena, insisting
    that he be assigned the username Saltzer. He theorized that while
    this *would* cause breakage (for a long time, usernames were assumed
    to be always lowercase ASCII, given that e-mail localparts were
    case insensitive and usernames were case sensitive), since he was
    (a) a Professor, and (b) responsible for the technical architecture
    for Project Athena, programmers would be incentivized to fix the
    problems when they inevitably showed up. As I recall, we didn't
    let students choose mixed-case usernames for a while, since there was
    presumed to be breakage; Professor Saltzer's username was a special
    case.

    If there are brave people who want to use Unicode characters (for
    bonus points, they could try using "unofficial" characters such as the
    Klingon script), they could be the first to find bugs, and report
    them. And if they suffer from security breaches, they would know what
    they were getting into. (And we salute them for their courage. :-)

    Perhaps at some future stable Debian release (not Trixie), we could
    enable it by default. But I really do think we first need to do some
    technical work, including providing a minimal, security-oriented
    Unicode library that privileged programs can use, rather than adding
    libunicode as a required package.
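
    For illustration only, here is a minimal Python sketch of the kind of checks such a library might perform before a privileged tool accepts a name. The function name check_username and the policy it encodes are hypothetical. They are not shadow's or adduser's actual rules. Rejecting private-use code points would also reject Klingon pIqaD, which the ConScript Unicode Registry places in the Private Use Area.

```python
import unicodedata

# Characters with structural meaning in /etc/passwd and /etc/group
# (':' separates fields, ',' separates group members), plus '/',
# since usernames often become home directory names.
FORBIDDEN = {":", ",", "/", "\n"}


def check_username(name: str) -> list[str]:
    """Return a list of problems found in `name` (empty list = acceptable).

    Hypothetical policy for illustration; not shadow's actual rules.
    """
    if not name:
        return ["empty name"]
    problems = []
    if name.startswith("-"):
        problems.append("leading '-' can be mistaken for a command-line option")
    if name != unicodedata.normalize("NFC", name):
        problems.append("not in Unicode Normalization Form C")
    for ch in name:
        cat = unicodedata.category(ch)
        if ch in FORBIDDEN:
            problems.append(f"forbidden character {ch!r}")
        elif cat in ("Cc", "Cf", "Cs", "Co", "Cn"):
            # Control, format (e.g. bidi overrides, zero-width joiner),
            # surrogate, private-use and unassigned code points.
            problems.append(f"disallowed code point U+{ord(ch):04X} ({cat})")
        elif cat.startswith("Z"):
            # Spaces and line/paragraph separators.
            problems.append(f"whitespace/separator U+{ord(ch):04X}")
    return problems
```

    With this policy, "陈成" and "💩" pass, while a name containing U+202E (RIGHT-TO-LEFT OVERRIDE) does not. That code point can make a name display differently from how it compares.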

    Cheers,

    - Ted

  • From Henrik Ahlgren@21:1/5 to Marc Haber on Thu Dec 12 19:40:01 2024
    On Wed, 2024-12-11 at 09:11 +0100, Marc Haber wrote:
    That's easy, just choose a user name for YOU that YOU can type on YOUR keyboard. Why would anybody choose a username that is impossible to use
    in their own locale?

    I don't see many problems with single-user machines, especially security-related ones. But what about multi-user environments? Imagine, as a non-Chinese-speaking
    Westerner, needing to chown a file to a colleague called 陈成. Even if you have Pinyin input configured, you might not even know how to type it. (Of course, you have the same problem with filenames, which have essentially no restrictions. I know from experience how hard it is to type names in Arabic, which I can't read.)

  • From Marc Haber@21:1/5 to Henrik Ahlgren on Fri Dec 13 12:30:01 2024
    On Thu, Dec 12, 2024 at 08:21:15PM +0200, Henrik Ahlgren wrote:
    I don't see many problems with single-user machines, especially security-related ones. But what about multi-user environments? Imagine, as a non-Chinese-speaking Westerner, needing to chown a file to a colleague called 陈成.

    I would type "chown 陈成 <filename>", pasting the user name from the
    written request or probably from /etc/passwd. Or I would ask the system administrator for a solution.

    I see your argument, but I'd see that as an issue that the system administrator choosing the user names needs to solve. It's nothing that
    we as a distribution should solve.

    Greetings
    Marc

    --
    -----------------------------------------------------------------------------
    Marc Haber         | "I don't trust Computers. They | Mail address in header
    Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
    Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

  • From Marc Haber@21:1/5 to All on Fri Dec 13 12:30:01 2024
    On Thu, 12 Dec 2024 11:02:21 -0500, "Theodore Ts'o" <tytso@mit.edu>
    wrote:
    On Tue, Dec 10, 2024 at 09:24:15PM +0100, Marc Haber wrote:
    But things are moving by shadow upstream taking a user-hostile stance,
    willing to take away freedom. I must be fine with that because I
    cannot change it. But I don't need to like it.

    As a suggestion, we might make more forward progress if we assume good
    faith and accept that other people might have different priorities
    than we do. I could easily see how shadow, being a security-related
    package, would consider encouraging something that could lead to
    security bugs, or just other random breakage, as "user-hostile".

    They are planning to remove the --badname option from useradd, making
    it impossible to even try UTF-8 user names, without patching useradd.
    And if I was in Chris' shoes, I would probably refrain from doing so
    as well.

    And shadow would be the canonical place to do the PRECIS normalization
    at least for comparing usernames. That's something they wouldn't do.
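
    A rough Python sketch of what PRECIS-style comparison of usernames could look like. It is loosely based on the preparation steps that RFC 8265 defines for its UsernameCaseMapped profile: width mapping, then case mapping, then NFC. It deliberately omits the character-class validation and the Bidi Rule, so treat it as an approximation rather than a conforming implementation. The helper names are hypothetical.

```python
import unicodedata


def width_map(s: str) -> str:
    """Map fullwidth/halfwidth code points to their decomposition
    (e.g. U+FF21 FULLWIDTH LATIN CAPITAL LETTER A -> 'A')."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def username_key(name: str) -> str:
    """Comparison key: width mapping, then lowercasing, then NFC.

    Omits PRECIS IdentifierClass checks and the Bidi Rule, so this is
    NOT a conforming RFC 8265 implementation."""
    return unicodedata.normalize("NFC", width_map(name).lower())


def same_user(a: str, b: str) -> bool:
    """True if two names would be treated as the same account."""
    return username_key(a) == username_key(b)
```

    The point is that a precomposed "Ö" and "O" followed by U+0308 COMBINING DIAERESIS end up with the same key. Byte-wise comparison, which is what most tools do today, treats them as different users.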

    Perhaps at some future stable Debian release (not Trixie), we could
    enable it by default.

    There won't be such an option for us to enable.

    I need to be fine with that because I cannot change it. But I don't
    need to like it.

    Greetings
    Marc
    --
    -----------------------------------------------------------------------------
    Marc Haber         | " Questions are the     | Mail address in header
    Rhein-Neckar, DE   |  Beginning of Wisdom "  |
    Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Phone: *49 6224 1600402

  • From Stephan Seitz@21:1/5 to All on Fri Dec 13 13:10:01 2024
    On Thu, Dec 12, 2024 at 20:21:15 +0200, Henrik Ahlgren wrote:
    I don't see many problems with single-user machines, especially security-related ones. But what about multi-user environments? Imagine, as a non-Chinese-speaking Westerner, needing to chown a file to a colleague called 陈成. Even

    You are joking, aren’t you? You could use "getent passwd" and copy
    & paste the username. Or use the user ID.
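
    A minimal Python sketch of the "use the user ID" route. It resolves an owner given either as a name or as a number, then chowns by UID. It follows the POSIX chown(1) convention of trying the operand as a user name first. The helper names resolve_uid and chown_to_user are hypothetical.

```python
import os
import pwd


def resolve_uid(owner: str) -> int:
    """Resolve a user name (any string, e.g. '陈成') or a numeric UID string."""
    try:
        return pwd.getpwnam(owner).pw_uid
    except KeyError:
        # Like chown(1): an operand that is not a known user name but is
        # numeric is taken as a UID. isascii() rejects e.g. Arabic-Indic
        # digits, which str.isdigit() and int() would otherwise accept.
        if owner.isascii() and owner.isdigit():
            return int(owner)
        raise


def chown_to_user(path: str, owner: str) -> None:
    """Change the owner of `path`; gid -1 leaves the group unchanged."""
    os.chown(path, resolve_uid(owner), -1)
```

    For example, chown_to_user(path, "1144") works without anyone having to type the name that UID belongs to. Changing ownership still requires the usual privileges.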

    By that argument, passwd should refuse to set the password to "12345".

    And no one in this thread has said that you *have* to use non-ASCII
    usernames. But some people don’t want to give you a chance to do it.

    I don’t need non-ASCII for my name, but I would never use a system so
    utterly broken that, in 2024, it forces me to rewrite my name in ASCII.
    I bet there is no such problem on Windows systems.

    Stephan

    --
    | Stephan Seitz E-Mail: stse@rootsland.net |
    | If your life was a horse, you'd have to shoot it. |

  • From IOhannes m zmölnig@21:1/5 to All on Fri Dec 13 14:00:02 2024
    On 13 December 2024 13:08:01 CET, Stephan Seitz <stse+debian@rootsland.net> wrote:

    I don’t need non-ASCII for my name, but I would never use a system so utterly broken that, in 2024, it forces me to rewrite my name in ASCII. I bet there is no such problem on Windows systems.

    Stephan


    Incidentally, my kid's school rolled out their school laptops this week, which of course come with Windows 11 preinstalled (as a side note, I am now looking forward to four years of "digital competence training" consisting entirely of Windows (basics),
    PowerPoint, Word and Excel; but that's another story), and *of course* all usernames have been normalized to lowercase ASCII.
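
    To make "normalized to lowercase ASCII" concrete, here is a minimal Python sketch of one common lossy folding. It decomposes the name, drops combining marks, and discards whatever is still outside ASCII. This is an assumption about how such systems typically behave, not a description of what Windows or that school's tooling actually does. The function name is hypothetical.

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Lossy fold of a display name to a lowercase ASCII login name.

    Decompose (NFKD), drop combining marks, then drop anything that is
    still non-ASCII. Characters without a decomposition (e.g. 'ß', 'ø',
    or CJK ideographs) are simply lost.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", "ignore").decode("ascii").lower()
```

    Accented Latin names survive in a degraded form ("zmölnig" becomes "zmolnig"), while a name like "陈成" folds to an empty string. That is why such schemes tend to need a separate fallback for non-Latin names.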

    So my take is: "No. In Redmond you would use ASCII for usernames."

    Oh, and my name does have non-ASCII characters, and I have been using Unicode in my display name for the last 20 years.
    I do remember problems in the '90s.
    But those are long past.


    mfh.her.fsr
    IOhannes

  • From Stephan Seitz@21:1/5 to All on Fri Dec 13 14:40:01 2024
    On Fri, Dec 13, 2024 at 13:38:31 +0100, IOhannes m zmölnig wrote:
    Incidentally, my kid's school rolled out their school laptops this week, which of course come with Windows 11 preinstalled (as a side note, I am now looking forward to four years of "digital competence training"
    consisting entirely of Windows (basics), PowerPoint, Word and Excel; but that's another story), and *of course* all usernames have been
    normalized to lowercase ASCII.

    I’m quite sure I have never seen an Asian Windows where you had to use
    ASCII for your username.

    Stephan

    --
    | Stephan Seitz E-Mail: stse@rootsland.net |
    | If your life was a horse, you'd have to shoot it. |

  • From Michael Stone@21:1/5 to Marc Haber on Fri Dec 13 16:10:01 2024
    On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    They are planning to remove the --badname option from useradd, making
    it impossible to even try UTF-8 user names, without patching useradd.

    Or edit the passwd file (vipw), or use any non-passwd-file
    authentication mechanism, or use a different user management tool, etc.
    I think you're overemphasizing the importance of the useradd command
    here--it just acts as a convenience and sets some baseline policies;
    it's not actually essential for adding a user. If you don't like the
    policy that useradd sets...just don't use it.

  • From sre4ever@free.fr@21:1/5 to All on Fri Dec 13 15:30:01 2024
    Hi,

    On 2024-12-13 13:38, IOhannes m zmölnig wrote:

    and *of course* all usernames have been normalized to lowercase ASCII.

    I just took a look at some reasonably recent government-issued IDs, and
    it turns out the French ones normalized my name to uppercase whatever-some-clerk-had-on-their-typewriter-keyboard-late-last-millennium, dropping the accent from the second word of my name. My father's birth certificate is handwritten and has the accent. My Canadian IDs are
    better, as they retained the name as I wrote it in the application
    form. I don't remember whether the French online application forms for IDs
    allowed accents in names, but I would not be too surprised if they
    didn't. I might start a procedure to try to get that officially fixed in
    2025, as there is another issue with the way my name is registered with
    some administrations that occasionally complicates my life. I'm fairly confident the other issue will get fixed; much less so the accent one,
    even though the law should be on my side. Here that means I could well
    sue the government, win the lawsuit and the subsequent ones up to the
    ECJ and back, and still not get it fixed within my lifetime.

    I was going to write that on payment cards you can't have accents in
    your name. Wrong. I managed to get one that reproduced it. I don't use
    that one much online so I don't know if entering my name with the accent actually works somewhere when paying with that card.

    I would not try too hard to get non-ASCII characters into that convenient computer identifier often called "login name" rather than "user name".
    In practice you can't use them in the local part of most e-mail addresses, and not many
    people complain. You can't get them in IRC nicknames. You can't get them
    in the machine-readable part of your ICAO-compliant government-issued
    IDs. It's still better than just numbers. I'm fine with that as long as
    my name is properly written in the places that actually matter.

    If you need a name for that option, --allow-non-ascii should be neutral
    enough.

    --
    Julien Plissonneau Duquène

  • From Peter Pentchev@21:1/5 to Peter Pentchev on Fri Dec 13 18:10:01 2024
    On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    They are planning to remove the --badname option from useradd, making
    it impossible to even try UTF-8 user names, without patching useradd.

    Or edit the passwd file (vipw), or use any non-passwd-file authentication mechanism, or use a different user management tool, etc.
    I think you're overemphasizing the importance of the useradd command here--it just acts as a convenience and sets some baseline policies;
    it's not actually essential for adding a user. If you don't like the policy that useradd sets...just don't use it.

    In the context of the whole thread, are you suggesting that adduser(1)
    should be changed to use something other than useradd(8) under the hood?

    Sigh, that's adduser(8) too, of course.

    G'luck,
    Peter

    --
    Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
    PGP key: https://www.ringlet.net/roam/roam.key.asc
    Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13

  • From Peter Pentchev@21:1/5 to Michael Stone on Fri Dec 13 18:10:01 2024
    On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    They are planning to remove the --badname option from useradd, making
    it impossible to even try UTF-8 user names, without patching useradd.

    Or edit the passwd file (vipw), or use any non-passwd-file authentication mechanism, or use a different user management tool, etc.
    I think you're overemphasizing the importance of the useradd command
    here--it just acts as a convenience and sets some baseline policies;
    it's not actually essential for adding a user. If you don't like the policy that useradd sets...just don't use it.

    In the context of the whole thread, are you suggesting that adduser(1)
    should be changed to use something other than useradd(8) under the hood?

    G'luck,
    Peter

    --
    Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
    PGP key: https://www.ringlet.net/roam/roam.key.asc
    Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13

  • From Marc Haber@21:1/5 to Peter Pentchev on Fri Dec 13 21:40:01 2024
    On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    In the context of the whole thread, are you suggesting that adduser(1)
    should be changed to use something other than useradd(8) under the hood?

    adduser will not do that. Doing so is nonsense.

    Greetings
    Marc

    --
    -----------------------------------------------------------------------------
    Marc Haber         | "I don't trust Computers. They | Mail address in header
    Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
    Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

  • From Michael Stone@21:1/5 to Peter Pentchev on Sat Dec 14 05:10:01 2024
    On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    They are planning to remove the --badname option from useradd, making
    it impossible to even try UTF-8 user names, without patching useradd.

    Or edit the passwd file (vipw), or use any non-passwd-file authentication
    mechanism, or use a different user management tool, etc.
    I think you're overemphasizing the importance of the useradd command
    here--it just acts as a convenience and sets some baseline policies;
    it's not actually essential for adding a user. If you don't like the policy that useradd sets...just don't use it.

    In the context of the whole thread, are you suggesting that adduser(1)
    should be changed to use something other than useradd(8) under the hood?

    No, I'm suggesting that rhetoric asserting that any adduser/useradd
    policy could constrain people is overblown because users can be added to
    the system without using either of those tools. The tools' policies
    should reflect what is safest and most sensible for the majority of
    users, but if someone wants to do something different there is nothing
    stopping them from doing so.

    The claim at the top of this subthread is that some useradd change would prevent people from experimenting with UTF-8 usernames. As an exercise I
    just created UTF-8 users and groups entirely without useradd/adduser
    (using vipw and vigr):

    $ getent passwd 1144
    💩:*:1144:1144::/nowhere:/bin/false
    $ getent group 1144
    💩:*:1144:
    $ ls -l /tmp/samplefile
    -rw-r--r-- 1 💩 💩 0 Dec 13 22:42 /tmp/samplefile
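
    The passwd entry above consists of seven colon-separated fields, which is why it round-trips through vipw and getent unchanged. A minimal Python sketch of a parser makes this visible. The names PasswdEntry and parse_passwd_line are hypothetical. As far as the file format itself goes, a name only has to avoid ':' and newline. Any other restrictions come from the tools and programs that create or consume the file.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    passwd: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one passwd-format line into its seven fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        # A ':' inside the name (or any field) shifts every later field.
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    name, pw, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, pw, int(uid), int(gid), gecos, home, shell)
```

    Parsing "💩:*:1144:1144::/nowhere:/bin/false" gives a name of "💩" and a UID of 1144. A line whose name contains a colon is rejected because it no longer has exactly seven fields.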

    On an individual basis there aren't so many steps that creating a user
    manually is a big deal, or that a script dedicated to creating users
    according to the policies of a particular environment would be overly complicated. For a large organization I question the idea that user
    accounts would be managed by adduser/useradd at all.

  • From Peter Pentchev@21:1/5 to Michael Stone on Sat Dec 14 11:00:02 2024
    On Fri, Dec 13, 2024 at 11:01:43PM -0500, Michael Stone wrote:
    On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    They are planning to remove the --badname option from useradd, making it impossible to even try UTF-8 user names, without patching useradd.

    Or edit the passwd file (vipw), or use any non-passwd-file authentication mechanism, or use a different user management tool, etc.
    I think you're overemphasizing the importance of the useradd command here--it just acts as a convenience and sets some baseline policies;
    it's not actually essential for adding a user. If you don't like the policy
    that useradd sets...just don't use it.

    In the context of the whole thread, are you suggesting that adduser(1) should be changed to use something other than useradd(8) under the hood?

    No, I'm suggesting that rhetoric asserting that any adduser/useradd policy could constrain people is overblown because users can be added to the system without using either of those tools. The tools' policies should reflect what is safest and most sensible for the majority of users, but if someone wants to do something different there is nothing stopping them from doing so.
    [snip more about adding accounts without useradd/adduser]

    Thanks, that makes sense. Apologies if my reply came through as snarky.

    G'luck,
    Peter

    --
    Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
    PGP key: https://www.ringlet.net/roam/roam.key.asc
    Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
