POSIX explicitly limits itself to a subset of ASCII, so it is not going to mandate any normalization form. Are there other standards (or initiatives) in this area that you know of?
What about RFC 8265?
"Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords"
https://datatracker.ietf.org/doc/html/rfc8265
These things are ugly, which is why I suppose they haven't caught on
despite being around for decades, but I would guess that this problem
space is such that there are no non-ugly solutions apart from "just
stick to ASCII", which some people find ugly in a different way.
Apologies if I missed someone bringing up and rejecting Punycode in the previous ~41 messages in this thread.
WTF-8 extends UTF-8 to handle
invalid UTF-16 input.
Marc Haber left as an exercise for the reader:
* any upstream tool could say "bad idea" and refuse patches,
requiring their long term management,
Depending on how important this tool is, we could get away without
patching and probably not even documenting this failure.
This kind of attitude seems self-defeating. Despite being
*strongly* in favor of this effort, I would oppose it if it were
strictly a Debian thing. We can inspire the move, but going it
alone seems a recipe for present and future pain (think SSHing
from/to Debian and a non-Debian machine).
* the Linux framebuffer console is pretty limited in what
glyphs it has available, and the number of glyphs it can
support,
Probably, yes. But people working on the Linux framebuffer console are unlikely to actually use UTF-8 user names, so the only really bad
With all due respect, this seems totally unsupported by anything
other than vibes =].
* broken localization (or failure to call setlocale()) could be
a bigger problem, especially for root/system accounts.
I don't think we should allow UTF-8 characters in the string "root" or in system account names. And if a local admin decides to do so, Debian packages should still restrict themselves to using US-ASCII in their
system accounts.
Why? This would require multiple code paths for what seems to me a
very questionable objective. You point out later in your
response that there already exist diverging codepaths, but isn't
unifying such things always a goal?
Do you have a suggestion for a perl regexp that allows this? My current development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".
I do not. This is not a regex problem in my mind and experience;
you need full access to complicated libraries.
Any such effort
should go through Unicode Standard Annex #15 normalization before being
inspected at all.
At that point, you're well past regular
languages so far as I can tell. I do not see this goal as
possible with small surgeries on the adduser code base, but
rather something that requires work across the chain.
It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
properly set LANG and programs calling setlocale(). This, as
alluded to above, has the potential for a big mess.
Our default is C.UTF-8 and has been like that for a while.
Yes, but that can be changed.
With all due respect, I admire your gung ho candoit spirit, but
adduser alone is not IMHO the place. This is a major change
requiring support from libraries, applications, and UI to do
right, and thus wide buyin. I love the idea, but it's not going
to happen with a few Perl regexes. Please don't read this as
commentary on you or your code.
But a cursory search shows that none of the current upstreams support (or mention) PRECIS. (It also shows that src:precis is a Java library squatting
a bit on that package name... :))
On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:
Marc Haber left as an exercise for the reader:
* any upstream tool could say "bad idea" and refuse patches,
requiring their long term management,
Depending on how important this tool is, we could get away without patching and probably not even documenting this failure.
This kind of attitude seems self-defeating. Despite being
*strongly* in favor of this effort, I would oppose it if it were
strictly a Debian thing. We can inspire the move, but going it
alone seems a recipe for present and future pain (think SSHing
from/to Debian and a non-Debian machine).
I bet that other distributions will also allow me to useradd a UTF-8
name today. I don't think that we have patched useradd to allow this.
On 03/12/24 17:20, Marc Haber wrote:
What I intend to do in adduser for the next unstable upload is:
- adduser --system's user name validation will not change
- I'll make sure that adduser <normal user account> doesn't accept
UTF-8 user names, bringing it closer to systemd's notion of a valid
user name
- adduser --allow-bad-names will still allow UTF-8 usernames, not doing
normalization. I will document this and make it clear that the local
admin needs to make sure that they don't allow things they don't want
to have
Dear Marc,
in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?
RFC 8264 5.2.4 Normalization Rule states:
In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.
What I intend to do in adduser for the next unstable upload is:
- adduser --system's user name validation will not change
- I'll make sure that adduser <normal user account> doesn't accept
UTF-8 user names, bringing it closer to systemd's notion of a valid
user name
- adduser --allow-bad-names will still allow UTF-8 usernames, not doing
normalization. I will document this and make it clear that the local
admin needs to make sure that they don't allow things they don't want
to have
thank you all for your contributions to this discussion. I have now
finally understood¹ that it is not enough to try creating a UTF-8
encoded user name and see that it correctly shows up in /etc/passwd to declare UTF-8 support. Please forgive me for not replying to all of you
in this thread individually, I have read everything and if I didn't cater
for your arguments in this message please feel free to remind me.
I'll probably deprecate --allow-bad-names in favor of something that
doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
the Red Hat World uses --badname to allow such names as well.
in preparation for a PRECIS future, couldn't adduser pass the usernames
through NFC instead of doing no normalization?
RFC 8264 5.2.4 Normalization Rule states:
In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.
that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?
While this seems the right thing to do, I think this should be done in useradd (pkg:shadow), in the respective upstream project, so that all
Linux distributions get the same behavior.
Marc Haber, on 2024-12-03:
I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
the Red Hat World uses --badname to allow such names as well.
The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?
On 03/12/24 17:59, Marc Haber wrote:
in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?
RFC 8264 5.2.4 Normalization Rule states:
In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.
that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?
NFC would solve both of these "problems":
* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).
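
The two NFC claims above can be checked with a few lines of Python. This is only a sketch using the standard `unicodedata` module; the helper name `nfc` and the sample strings are made up for illustration and are not anything adduser or useradd does today.

```python
import unicodedata

def nfc(name):
    """Return the NFC form of a candidate user name (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)

precomposed = "\u00e9tienne"   # é as the single code point U+00E9
decomposed = "e\u0301tienne"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The raw strings differ, the NFC forms do not.
print(precomposed == decomposed)
print(nfc(precomposed) == nfc(decomposed))

# OHM SIGN (U+2126) is canonically equivalent to GREEK CAPITAL LETTER OMEGA,
# so NFC maps it to U+03A9.
print(f"U+{ord(nfc(chr(0x2126))):04X}")
```

The comparison is done on code points, so the result does not depend on how a terminal or font renders the two spellings.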
Thanks for taking the time to delve into this issue,
On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?
Extended sounds good, maybe even "unicode"? or "international"?
Normalization is always lossy, at least in principle.
Applications that employ normalization accept that tradeoff in order to gain something valuable: in this case the ability to have an Ohm sign codepoint as part of your username is traded for the ability to compare usernames across different OSes and applications.
On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
On 03/12/24 17:59, Marc Haber wrote:
in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?
RFC 8264 5.2.4 Normalization Rule states:
In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.
that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the
Ohm-against-Omega issue, right?
NFC would solve both of these "problems":
* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9
(omega).
Converting Ohm into an Omega is losing intended information, isn't it?
Hi,
thank you all for your contributions to this discussion. I have now
finally understood¹ that it is not enough to try creating a UTF-8
encoded user name and see that it correctly shows up in /etc/passwd to declare UTF-8 support. Please forgive me for not replying to all of you
in this thread individually, I have read everything and if I didn't cater
for your arguments in this message please feel free to remind me.
https://lists.debian.org/debian-devel/2024/11/msg00491.html correctly outlines that homograph characters (such as é (UTF-8 0xC3 0xA9) and the lookalike é (0x65 0xCC 0x81)) are not only a nuisance. At the least,
adduser should reject creating étienne if étienne already exists - those are different user names but look the same, and if you don't
cut-and-paste user names instead of typing them you're bound to hit the
wrong user depending on HOW you type and what input medium you use. Not
good.
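
As a rough illustration of the "reject étienne if étienne already exists" check, here is a minimal Python sketch. It assumes that comparing NFC forms is the chosen policy; the function name `collides` and the in-memory list of existing names are hypothetical, and a real tool would have to consult the actual user database.

```python
import unicodedata
from typing import Iterable

def collides(new_name: str, existing: Iterable[str]) -> bool:
    """True if new_name equals an existing name once both are in NFC."""
    key = unicodedata.normalize("NFC", new_name)
    return any(unicodedata.normalize("NFC", name) == key for name in existing)

existing = ["\u00e9tienne"]     # stored with the precomposed é
candidate = "e\u0301tienne"     # typed as e + combining acute accent

# The leading bytes match the two encodings quoted in the message above.
print(existing[0].encode("utf-8")[:2].hex(" "))   # c3 a9
print(candidate.encode("utf-8")[:3].hex(" "))     # 65 cc 81

print(collides(candidate, existing))
```

Note that this only catches canonically equivalent spellings. Lookalikes from different scripts pass through unchanged.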
https://wiki.debian.org/UserAccounts and https://wiki.debian.org/UserAccountsPhilosophy are updated accordingly.
After understanding this, I must admit that what's currently left active
on the adduser team (me) doesn't have the capacity to implement this
properly and in time for trixie. To make things worse, the
Unicode::Precis module, which should be in Debian as
libunicode-precis-perl (but isn't) hasn't seen an upstream release in
more than five years.
Additionally, I don't see myself in the situation of writing a proper
checker for the RFC 8264 IdentifierClass (Chapter 4.2) at the moment
since I don't have the time to check out which \p{Foo} character classes match the classes given in the RFC.
I would appreciate volunteers to help here, but first I need to bring
some sense in adduser's current state of affairs to make an unstable
upload that can eventually migrate to testing.
What I intend to do in adduser for the next unstable upload is:
- adduser --system's user name validation will not change
- I'll make sure that adduser <normal user account> doesn't accept
UTF-8 user names, bringing it closer to systemd's notion of a valid
user name
- adduser --allow-bad-names will still allow UTF-8 usernames, not doing
normalization. I will document this and make it clear that the local
admin needs to make sure that they don't allow things they don't want
to have
- adduser --allow-all-names will just verbatim pass all user names to
useradd.
All this will be documented in the man page, in README.Debian and/or the
Wiki after the code passes the test suite again.
I'll probably deprecate --allow-bad-names in favor of something that
doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
the Red Hat World uses --badname to allow such names as well.
I would love to hear your opinion. Silence is agreement ;-)
Greetings
Marc
¹ RFC 8264, RFC 8265, and Unicode TR 15 linked in this thread were
educational for me
Homograph attacks would be best mitigated in software reading
/etc/passwd, alerting in their output or logs that the user name they
just printed was composed of strange alphabets.
On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
Marc Haber, on 2024-12-03:
I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.
The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?
Extended sounds good, maybe even "unicode"? or "international"?
The best mitigation for those attacks is to ban the names altogether.
IMO, setuid programs should not accept Unicode.
The best mitigation for those attacks is to ban the names altogether.
IMO, setuid programs should not accept Unicode.
Neither adduser nor useradd are setuid.
I recommend Chapter 7 of my free book, "Hacking the Planet with
Notcurses: A Guide to TUIs and Character Semigraphics" for the
full story (as I understand it) regarding Unicode presentation: https://nick-black.com/htp-notcurses.pdf (starts on page 41).
On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
Marc Haber, on 2024-12-03:
I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.
The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?
Extended sounds good, maybe even "unicode"? or "international"?
P.S.: Sadly, this has gotten less than positive coverage on LWN. I
apologize for the harm this discussion has done.
This was never on the table, and shadow upstream might even drop the
entire "support" for having bad names.
On Mon, 9 Dec 2024 18:08:33 +0100, Chris Hofstaedtler
<zeha@debian.org> wrote:
I echo Alejandro's concerns. We should stop having the flag
completely, not encourage using it.
I violently disagree. But I have to accept this.
IOW: if we move towards better character support, we need to do that
by allowing it always. Same for longer names.
I think that our distinction between system users and "normal" users
is fine. No one needs a package generating "weird" user names.
NFC would solve both of these "problems":
* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).
What NFC alone will not solve are homograph collisions: a (U+0061 Latin
small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to different codepoints.
But these are two different scenarios: the former problem may (and does) arise without any wrongdoing from the user's side (a different OS, or a different string manipulation library, or a screen keyboard may produce a different é), the latter is an attack. The former is an interoperability issue, the latter is a security issue.
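
To make the homograph case concrete, the following Python sketch shows that NFC keeps Latin a and Cyrillic а apart, and flags names that mix scripts. The script detection here is a deliberately crude assumption (it takes the first word of each character's Unicode name); a real implementation would use the Unicode Script property, which the standard library does not expose. The name `rough_scripts` is hypothetical.

```python
import unicodedata

def rough_scripts(name):
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in name if ch.isalpha()}

latin = "paul"
mixed = "p\u0430ul"   # second letter is U+0430 CYRILLIC SMALL LETTER A

# NFC does not unify the two spellings.
print(unicodedata.normalize("NFC", latin) == unicodedata.normalize("NFC", mixed))

# A mixed-script name is at least detectable.
print(sorted(rough_scripts(latin)))
print(sorted(rough_scripts(mixed)))
```

A check like this only detects mixing within one name. A name written entirely in a lookalike script would still pass.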
While this seems the right thing to do, I think this should be done in useradd (pkg:shadow), in the respective upstream project, so that all
Linux distributions get the same behavior.
That's probably the best approach.
Thanks for taking the time to delve into this issue,
--
Gioele Barabucci
On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
NFC would solve both of these "problems":
* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9
(omega).
What NFC alone will not solve are homograph collisions: a (U+0061 Latin
small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to
different codepoints.
NFC also doesn't solve various invisible characters (e.g., zero-width
spaces, bidirectional control characters). For more information about
all of the various security land mines, see[1].
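
A small Python sketch of the invisible-character problem: zero-width spaces and bidirectional controls fall in the Unicode general category Cf (format), and NFC leaves them in place. The helper name `format_chars` is made up for this example, and treating all of Cf as suspicious is an assumption, not what RFC 8264 specifies in detail.

```python
import unicodedata

def format_chars(name):
    """List the code points in name whose general category is Cf."""
    return [f"U+{ord(ch):04X}" for ch in name
            if unicodedata.category(ch) == "Cf"]

sneaky = "ad\u200bmin"     # ZERO WIDTH SPACE hidden inside "admin"
reversed_ = "\u202eresu"   # starts with RIGHT-TO-LEFT OVERRIDE

# NFC does not remove the hidden character.
print(unicodedata.normalize("NFC", sneaky) == sneaky)

print(format_chars(sneaky))
print(format_chars(reversed_))
print(format_chars("admin"))
```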
NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.
The IdentifierClass of RFC 8264 explicitly disallows all these "security
land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3
The "Security considerations" section is quite extensive (5 pages long): https://www.rfc-editor.org/rfc/rfc8264.html#section-12
To me, the question is more, why do we have a flag that, if used,
allows you to break /etc/{passwd,shadow,group,gshadow} completely?
However, it should be noted that RFC 8264 also states that code points
which are not defined in whatever version of the Unicode supported by
"the application" shall be disallowed. From Debian's perspective,
though, if we are going to take a position about what version of
Unicode should be supported by "the application(s)" that read and
write /etc/passwd, we *will* need to take a position on what version
of Unicode should be supported, and therefore, what set of characters
will be disallowed.
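
The dependency on a Unicode version can be seen directly in Python, whose `unicodedata` module is tied to one specific version of the Unicode Character Database. This sketch reports code points that are unassigned (general category Cn) in that version; the helper name `unassigned` is hypothetical, and which code points count as unassigned will differ between interpreter releases.

```python
import unicodedata

def unassigned(name):
    """List code points in name that are unassigned in this Unicode version."""
    return [f"U+{ord(ch):04X}" for ch in name
            if unicodedata.category(ch) == "Cn"]

# The Unicode version this interpreter's database was built from.
print(unicodedata.unidata_version)

print(unassigned("marc"))
print(unassigned("x\u0378"))   # U+0378 is a reserved gap in the Greek block
```

Two machines with different database versions could therefore disagree about whether the same name is acceptable, which is the position problem described above.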
I would involve cross-distribution discussion about this though.
Perhaps the /etc/passwd APIs affect some POSIX specifications, and a non-ASCII extension could be proposed.
Yeah, good point. If the scope is going to include passwd entries
that are distributed via network protocols like LDAP, then we need to
worry about sites that support other Linux distributions beyond just
Debian --- or for that matter, sites that need to support Linux as
well as legacy Unix systems like AIX or Solaris.
is there an easy way to determine for a given Unicode string if it[...]
can be typed from a single keyboard layout
sorry if it is too naive, but is there an easy way to determine for a
given Unicode string if it can be typed from a single keyboard layout or produced by a text-to-speech system? People who want a username because
of SSH, email and su will want to be able to input it.
But things are moving by shadow upstream taking a user-hostile stance, willing to take away freedom. I must be fine with that because I
cannot change it. But I don't need to like it.
That's easy, just choose a user name for YOU that YOU can type on YOUR keyboard. Why would anybody choose a username that is impossible to use
in their own locale?
I don't see much problems with single-user machines, especially security related. But, think multi-user environments? Imagine, as a non-Chinese speaking Westerner, needing to chown a file to a colleague called 陈成.
On Tue, Dec 10, 2024 at 09:24:15PM +0100, Marc Haber wrote:
But things are moving by shadow upstream taking a user-hostile stance,
willing to take away freedom. I must be fine with that because I
cannot change it. But I don't need to like it.
As a suggestion, we might make more forward progress if we assume good
faith and accept that other people might have different priorities
than others. I could easily see shadow, being a security-related
package, would consider encouraging something that could lead to
security bugs or just other random breakage, as "user-hostile".
Perhaps at some future stable Debian release (not Trixie), we could
enable it by default.
I don't see much problems with single-user machines, especially security related. But, think multi-user environments? Imagine, as a non-Chinese speaking Westerner, needing to chown a file to a colleague called 陈成. Even
I don’t need non-ASCII for my name but I would never use a system that would force me to rewrite my name in ASCII because it is so utterly broken in 2024. I bet there is no problem on Windows systems.
Stephan
consisting entirely of Windows(basics), PowerPoint, Word and Excel; but that's another story), and *of course* all usernames have been
normalized to lowercase ASCII.
They are planning to remove the --badname option from useradd, making
it impossible to even try UTF-8 user names, without patching useradd.
and *of course* all usernames have been normalized to lowercase ASCII.
On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
They are planning to remove the --badname option from useradd, making
it impossible to even try UTF-8 user names, without patching useradd.
Or edit the passwd file (vipw), or use any non-passwd-file authentication mechanism, or use a different user management tool, etc.
I think you're overemphasizing the importance of the useradd command here--it just acts as a convenience and sets some baseline policies;
it's not actually essential for adding a user. If you don't like the policy that useradd sets...just don't use it.
In the context of the whole thread, are you suggesting that adduser(1)
should be changed to use something other than useradd(8) under the hood?
$ getent passwd 1144
💩:*:1144:1144::/nowhere:/bin/false
$ getent group 1144
💩:*:1144:
$ ls -l /tmp/samplefile
-rw-r--r-- 1 💩 💩 0 Dec 13 22:42 /tmp/samplefile
On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
[snip more about adding accounts without useradd/adduser]
On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
They are planning to remove the --badname option from useradd, making it impossible to even try UTF-8 user names, without patching useradd.
Or edit the passwd file (vipw), or use any non-passwd-file authentication mechanism, or use a different user management tool, etc.
I think you're overemphasizing the importance of the useradd command here--it just acts as a convenience and sets some baseline policies;
it's not actually essential for adding a user. If you don't like the policy
that useradd sets...just don't use it.
In the context of the whole thread, are you suggesting that adduser(1) should be changed to use something other than useradd(8) under the hood?
No, I'm suggesting that rhetoric asserting that any adduser/useradd policy could constrain people is overblown because users can be added to the system without using either of those tools. The tools' policies should reflect what is safest and most sensible for the majority of users, but if someone wants to do something different there is nothing stopping them from doing so.