**·····························< wxy8
·**·························**·< hij3
··**··········**·········**····< tuv7
···**···················**·**··< qrs6
···**····················**····< nop5
····**·······**················< abc1
···········**······**··········< efg2
············**·················< klm4

····**·······**················< abc1
···········**······**··········< efg2
·**·························**·< hij3
············**·················< klm4
···**····················**····< nop5
···**···················**·**··< qrs6
··**··········**·········**····< tuv7
**·····························< wxy8
I've been sorting punctuation characters on one Unix system and it
did not produce the expected result. Switching to another system did
it as expected.
The test program (it contains non-ASCII middle-dot characters) was
sort -t $'\t' <<EOT
One hypothesis was that it's some locale issue. So I've copied the
LC_* settings to the newer system and disabled them one by one.
Strangely, the one that was responsible for the effect was LC_TIME!
On the correct sorting system it was defined as
LC_TIME=de_DE.UTF-8@isodate
and the one that worked improperly had
LC_TIME=de_DE.UTF-8
Now I'm puzzled in many ways...
If anything, I'd expected LC_COLLATE to have an effect on sorting.
Then there's no locale with @isodate on that sort-defunct system.
And clearing that LC_TIME locale or removing the "@isodate" part
did not change anything; it needs that setting to a non-existing
locale file to work correctly on the otherwise not correctly
sorting system.
Does anyone have an idea what's going on here?
I'm reluctant to globally set LC_TIME=de_DE.UTF-8@isodate
(since there is no file with that name in the locale directories).
Thanks.
Janis
[*] Lines with additional other contents than the depicted payload
were sorted correctly.
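
Whether a name such as de_DE.UTF-8@isodate is actually installed can be checked from the shell. This is only a minimal sketch, assuming a system that provides the POSIX `locale` utility. The helper name `locale_installed` is made up for illustration, and the folding of "UTF-8" to "utf8" is an assumption about how some systems spell names in the output of `locale -a`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given locale name is listed by `locale -a`.
locale_installed() {
    # Lower-case and drop hyphens so "de_DE.UTF-8" also matches "de_DE.utf8"
    # (assumption: spelling differences are limited to case and hyphens).
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr -d '-' |
        grep -Fqx -- "$want"
}

# Show the settings currently in effect (LANG, LC_COLLATE, LC_TIME, ...).
command -v locale >/dev/null && locale

# Report whether the two LC_TIME values from the post are installed here.
for name in de_DE.UTF-8 de_DE.UTF-8@isodate; do
    if locale_installed "$name"; then
        echo "$name: installed"
    else
        echo "$name: not listed by locale -a"
    fi
done
```

If the second name is reported as not listed, any program that initializes its locale from the environment is being asked for a locale it cannot load.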
On 2025-02-19, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> If anything, I'd expected LC_COLLATE to have an effect on sorting.
> Then there's no locale with @isodate on that sort-defunct system.
> And clearing that LC_TIME locale or removing the "@isodate" part
> did not change anything; it needs that setting to a non-existing
> locale file to work correctly on the otherwise not correctly
> sorting system.
My working hypothesis would be that setting LC_TIME to a nonexistent
locale causes an error that invalidates the _whole_ locale setting
and causes a fallback to a default setting, likely the "C" locale.
You can check that sorting with LC_ALL=C or an invalid value like LC_ALL=foobar will produce your "correct" result.
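
That check could look roughly like the sketch below. The three sample lines are modelled on the test data in this thread (middle dots and asterisks), and the helper names `sample` and `tags` are made up for illustration. In the "C" locale sort compares raw bytes, and '*' (0x2A) sorts before the UTF-8 encoded middle dot (0xC2 0xB7). So only the LC_ALL=C result is predictable; the other two lines depend on the system.

```shell
#!/bin/sh
# Three sample lines modelled on the thread's test data.
sample() {
    printf '%s\n' \
        '····**·······**················< abc1' \
        '···········**······**··········< efg2' \
        '·**·························**·< hij3'
}

# Print only the trailing tag of each line, space-separated, for easy comparison.
tags() {
    awk '{ printf "%s ", $NF } END { print "" }'
}

# Locale taken from the environment (LANG, LC_COLLATE, ...).
echo "environment:   $(sample | sort | tags)"

# Plain byte order; LC_ALL overrides LANG and every LC_* variable.
echo "LC_ALL=C:      $(sample | LC_ALL=C sort | tags)"

# A deliberately invalid name, to see whether it behaves like "C" here.
echo "LC_ALL=foobar: $(sample | LC_ALL=foobar sort 2>/dev/null | tags)"
```

With LC_ALL=C the tags come out as hij3 abc1 efg2. If the LC_ALL=foobar line matches that and the environment line does not, that would support the fallback hypothesis.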
A corollary from this would be that your "sort-defunct" system uses
a different collation order than your "correctly" sorting system
for the de_DE.UTF-8 locale.
On the FreeBSD 14-STABLE system I'm typing this on, sorting your
example data with my typical C.UTF-8 locale produces your expected
result, sorting with de_DE.UTF-8 (or en_US.UTF-8) produces a different
order.
····**·······**················< abc1
···········**······**··········< efg2
·**·························**·< hij3
Also, I have no idea what could be considered the "correct" sorting
order for this.
In article <vp4f6o$288ui$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> I've been sorting punctuation characters on one Unix system and it
> did not produce the expected result. Switching to another system did
> it as expected.
> The test program (it contains non-ASCII middle-dot characters) was
> sort -t $'\t' <<EOT
Do you really have the '$' there?
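
In case the question is about the quoting: `$'\t'` is the ANSI-C quoting form understood by bash, ksh93 and zsh, but not by every /bin/sh. Here is a minimal sketch of a spelling that does not depend on it. The variable name `tab` and the two sample records are made up for illustration.

```shell
#!/bin/sh
# Get a literal tab without relying on $'\t'.
# Command substitution strips only trailing newlines, so the tab survives.
tab=$(printf '\t')

# Sort two tab-separated records on their second field.
printf 'b\t2\na\t1\n' | sort -t "$tab" -k 2,2
```

In a shell without `$'...'` support, the original command would most likely hand sort a multi-character separator, which sort rejects.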
I'm sure there was a reason why the setting is now "en_US" instead of
"de_DE" (like almost all the other LC_* settings), so I'm reluctant to change that.