Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
What I'm trying to do is iterating over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.
Could you employ printf to add leading zeros, sort lexicographically and
then remove the zeros?
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)
You did not yet mention what your final goal is with the numerically
sorted list.
In case this is in the end a renaming task, for this level of complexity
I would use the "wdired" mode of Emacs ("write directory edits") with
regex search and replace, or some other "multi-rename" tool from
the command line.
I'm using ksh here...
On Fri, 14 Jun 2024 09:31:18 +0200, Janis Papanagnou wrote:
I'm using ksh here...
At some point, you have to accept that trying to do everything in a shell language is not the best way to go, and that it is time to switch to a “real” programming language.
For example, Perl or Python could do this much more easily.
I don't know of this feature in Perl or Python; please provide some hint
if there is a feature like the one I need. Some code samples for demonstration of your point are also welcome.
On Sun, 16 Jun 2024 05:11:29 +0200, Janis Papanagnou wrote:
I don't know of this feature in Perl or Python; please provide some hint
if there is a feature like the one I need. Some code samples for
demonstration of your point are also welcome.
Python solution:
    import re

    items = \
        [
            "P1.HTM", "P10.HTM", "P11.HTM", "P2.HTM", "P3.HTM",
            "P4.HTM", "P5.HTM", "P6.HTM", "P7.HTM", "P8.HTM", "P9.HTM",
        ]
    print(items)
    print \
      (
        sorted
          (
            items,
            key = lambda f :
                tuple
                  (
                    (lambda : p, lambda : int(p))[i % 2 != 0]()
                    for i, p in enumerate(re.split("([0-9]+)", f))
                  )
          )
      )
output:
['P1.HTM', 'P10.HTM', 'P11.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM']
['P1.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM', 'P10.HTM', 'P11.HTM']
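For readers who do not know the idiom: the sort key above relies on re.split with a capturing group. The captured digit runs are kept in the result, so text and number segments alternate and the numbers always land at odd positions. A minimal sketch that makes the intermediate values visible (the helper name natural_key is only for illustration, it is not from the post):

```python
import re

def natural_key(name):
    """Split a name into alternating text / integer segments."""
    parts = re.split("([0-9]+)", name)
    # Odd positions hold the captured digit runs; compare those as integers.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

print(re.split("([0-9]+)", "A35P56.txt"))   # ['A', '35', 'P', '56', '.txt']
print(natural_key("A35P56.txt"))            # ('A', 35, 'P', 56, '.txt')
print(sorted(["P10.HTM", "P2.HTM", "P1.HTM"], key=natural_key))
```

Because position 0 is always a (possibly empty) string and position 1 always a number, two keys never compare a string against an integer at the same position.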
How would the main function look like that I could embed in my
call to make a numerically sorted list. Say, something like, for
example,
viewer $( p_sort P*.HTM )
where p_sort would be the Python code. - Note: this is no
appropriate solution since it would anyway not work correctly
for file names with embedded blanks and newlines. I just want to
get a closer understanding how you think this would be usable in
shell (or from shell). Thanks.
On 16.06.2024 00:49, Lawrence D'Oliveiro wrote:
*SKIP* [ 8 lines 2 levels deep] # borderly ad hominem
On Fri, 14 Jun 2024 09:31:18 +0200, Janis Papanagnou wrote:
I'm using ksh here...
(My approach is to take the appropriate tools and language from the
set of (a dozen, or so) "real" languages I know, plus from the set of
a handful of scripting languages that I know.)
For example, Perl or Python could do this much more easily.
I don't know of this feature in Perl or Python; please provide some
hint if there is a feature like the one I need. Some code samples for
demonstration of your point are also welcome.
How would the main function look like that I could embed in my call to
make a numerically sorted list.
with <v4ll54$3sd11$1@dont-email.me> Janis Papanagnou wrote:
[sorting file names numerically]
(Disclaimer: I'm ksh-ignorant) Speaking of features.
{14439:44} [0:0]% print -cC6 *
bar-20.baz bar-3.baz foo-10.bar foo-23.bar foo-5.bar
bar-21.baz bar-6.baz foo-13.bar foo-29.bar foo-6.bar
bar-24.baz bar-8.baz foo-1.bar foo-3.bar foo-7.bar
bar-26.baz foo-0.bar foo-22.bar foo-4.bar foo-8.bar
{14445:45} [0:0]% print -cC6 *(n)
bar-3.baz bar-21.baz foo-1.bar foo-6.bar foo-13.bar
bar-6.baz bar-24.baz foo-3.bar foo-7.bar foo-22.bar
bar-8.baz bar-26.baz foo-4.bar foo-8.bar foo-23.bar
bar-20.baz foo-0.bar foo-5.bar foo-10.bar foo-29.bar
That nymph between weapon and tool is 'glob qualifier' (acts at
'filename generation' phase). But! It's zsh.
That being said, as a
result of cross-pollination, something similar might be in ksh too. I
can't say where to dig through ksh-documentation.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com>:
How would the main function look like that I could embed in my call to
make a numerically sorted list. Say, something like, for example,
viewer $( p_sort P*.HTM )
where p_sort would be the Python code. - Note: this is no appropriate
solution since it would anyway not work correctly for file names with
embedded blanks and newlines. I just want to get a closer
understanding how you think this would be usable in shell (or from
shell). Thanks.
If ‘p_sort’ is designed to output the sorted file names separated by an ASCII NUL character rather than a newline then, using the GNU version of ‘xargs’, one can feed that output into ‘xargs’:
{
p_sort P*.HTM 3<&- |
xargs --null --no-run-if-empty -- sh -c \
'exec 0<&3 3<&- "$@"' sh \
viewer
} 3<&0
This will avoid the problems with funny characters (including blanks and linefeeds) in filenames processed by the shell.
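What such a ‘p_sort’ might look like is not spelled out in the thread. The following is only a sketch under assumptions: the names natural_key and nul_terminated are hypothetical, the sort key is the digit-run split shown earlier in the thread, and the file names are taken from the command-line arguments.

```python
#!/usr/bin/env python3
import os
import re
import sys

def natural_key(name):
    """Alternating text / integer segments; digit runs compare as numbers."""
    parts = re.split("([0-9]+)", name)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

def nul_terminated(names):
    """Return the names in natural order, each followed by a NUL byte."""
    ordered = sorted(names, key=natural_key)
    # os.fsencode round-trips file names that are not valid UTF-8.
    return b"".join(os.fsencode(n) + b"\0" for n in ordered)

if __name__ == "__main__":
    # Write raw bytes so blanks and newlines in names pass through untouched.
    sys.stdout.buffer.write(nul_terminated(sys.argv[1:]))
```

Saved as an executable named p_sort, its output could be fed to the xargs pipeline above, since every name ends in a NUL byte rather than a newline.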
On Sun, 16 Jun 2024 11:48:25 +0200, Janis Papanagnou wrote:
How would the main function look like that I could embed in my call to
make a numerically sorted list.
That’s what the code I posted does.
On Mon, 17 Jun 2024 08:32:21 +0200, Janis Papanagnou wrote:
... I've got the impression that it rather sorts only the
_hard-coded data_ ...
So get the data from the usual sources, e.g. os.listdir().
Anyway. Don't bother.
I can only lead the horse to water, I cannot make you drink.
... I've got the impression that it rather sorts only the
_hard-coded data_ ...
Anyway. Don't bother.
IOW; I'd have to learn Python completely to understand your code and get
the details properly.
In the present form it's just useless and off-topic here. But as said,
don't bother.
On Mon, 17 Jun 2024 09:44:55 +0200, Janis Papanagnou wrote:
IOW; I'd have to learn Python completely to understand your code and get
the details properly.
I give you a fish, you eat for a day. You learn to fish, you eat for a lifetime.
In the present form it's just useless and off-topic here. But as said,
don't bother.
Have you received a better offer yet?
I'm using ksh here...
I can set the shell parameters in numerical order
$ set {1..100}
then sort them _lexicographically_
$ set -s
Or do both in one
$ set -s {1..100}
I haven't found anything to sort them _numerically_ in shell.
What I'm trying to do is iterating over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.
Setting the files as shell arguments with P*.HTM will also produce lexicographical order.
The preceding files are just samples. It should work also if the
numbers are non-consecutive (say, 2, 10, 10000, 3333333) so that
iterating using a for-loop and building the list is not an option.
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)
But the primary question is; how to organize/iterate the arguments *numerically* _in shell_? (If that's possible in some simple way.)
N.B.: I prefer not to use external commands like 'sort' because of
the negative side effects and bulky code to handle newlines and
blanks in filenames, and messing around with quotes.
Janis
On 14/06/2024 at 08:31, Janis Papanagnou wrote:
I'm using ksh here...
I can set the shell parameters in numerical order
$ set {1..100}
then sort them _lexicographically_
$ set -s
Or do both in one
$ set -s {1..100}
I haven't found anything to sort them _numerically_ in shell.
What I'm trying to do is iterating over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.
Setting the files as shell arguments with P*.HTM will also produce
lexicographical order.
The preceding files are just samples. It should work also if the
numbers are non-consecutive (say, 2, 10, 10000, 3333333) so that
iterating using a for-loop and building the list is not an option.
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)
But the primary question is; how to organize/iterate the arguments
*numerically* _in shell_? (If that's possible in some simple way.)
N.B.: I prefer not to use external commands like 'sort' because of
the negative side effects and bulky code to handle newlines and
blanks in filenames, and messing around with quotes.
Janis
Can you use an array? E.g. (bash, I don't know ksh, but could be similar)
for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM P8.HTM P9.HTM; do
j=${i//[![:digit:]]}
files[j]="$i"
done
printf '%s\n' "${files[@]}"
P1.HTM
P2.HTM
P3.HTM
P4.HTM
P5.HTM
P6.HTM
P7.HTM
P8.HTM
P9.HTM
P10.HTM
P11.HTM
I'll have to work on names with two (or more?) numbers.
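To see why names with two numbers need more work, here is a small Python sketch of the same indexing idea (by_digits is a made-up name; it assumes every name contains at least one digit). Stripping the non-digits glues all numbers of a name together, so different names can end up in the same slot.

```python
import re

def by_digits(names):
    """Mimic files[j]="$i": index each name by its digits, read back in index order."""
    slots = {}
    for name in names:
        digits = re.sub(r"[^0-9]", "", name)   # like ${i//[![:digit:]]}
        slots[int(digits)] = name              # a later name with the same index overwrites
    return [slots[k] for k in sorted(slots)]

print(by_digits(["P1.HTM", "P10.HTM", "P11.HTM", "P2.HTM", "P9.HTM"]))
print(by_digits(["A1P23.txt", "A12P3.txt"]))   # both map to index 123: only one survives
```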
On 18.06.2024 15:32, Chris Elvidge wrote:
On 14/06/2024 at 08:31, Janis Papanagnou wrote:
I'm using ksh here...
I can set the shell parameters in numerical order
$ set {1..100}
then sort them _lexicographically_
$ set -s
Or do both in one
$ set -s {1..100}
I haven't found anything to sort them _numerically_ in shell.
What I'm trying to do is iterating over files, say,
P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
P8.HTM P9.HTM
in numerical order.
Setting the files as shell arguments with P*.HTM will also produce
lexicographical order.
The preceding files are just samples. It should work also if the
numbers are non-consecutive (say, 2, 10, 10000, 3333333) so that
iterating using a for-loop and building the list is not an option.
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)
But the primary question is; how to organize/iterate the arguments
*numerically* _in shell_? (If that's possible in some simple way.)
N.B.: I prefer not to use external commands like 'sort' because of
the negative side effects and bulky code to handle newlines and
blanks in filenames, and messing around with quotes.
Janis
Can you use an array? E.g. (bash, I don't know ksh, but could be similar)
Yes, Ksh supports both, indexed and associative arrays.
for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM P8.HTM P9.HTM; do
j=${i//[![:digit:]]}
files[j]="$i"
done
printf '%s\n' "${files[@]}"
P1.HTM
P2.HTM
P3.HTM
P4.HTM
P5.HTM
P6.HTM
P7.HTM
P8.HTM
P9.HTM
P10.HTM
P11.HTM
I'll have to work on names with two (or more?) numbers.
One thing that concerns me with arrays is that I seem to recall that
there was a limit in the number of array elements (which might be an
issue on lengthy lists of files). But some ad hoc tests seem to show
that if there's a limit it's not any more in the 1k/4k elements range
as it had been. (Bolski/Korn says their arrays support at least 4k.)
Janis
On 18/06/2024 at 15:38, Janis Papanagnou wrote:
[...]
One thing that concerns me with arrays is that I seem to recall that
there was a limit in the number of array elements (which might be an
issue on lengthy lists of files). But some ad hoc tests seem to show
that if there's a limit it's not any more in the 1k/4k elements range
as it had been. (Bolski/Korn says their arrays support at least 4k.)
I tested in ksh - works as written.
From here: https://unix.stackexchange.com/questions/195191/ksh-bash-maximum-size-of-an-array
<quote>
This simple script shows on my systems (Gnu/Linux and Solaris):
ksh88 limits the size to 2^12-1 (4095). (subscript out of range ).
Some older releases like the one on HP-UX limit the size to 1023.
ksh93 limits the size of a array to 2^22-1 (4194303), your mileage
may vary.
bash doesn't look to impose any hard-coded limit outside the one
dictated by the underlying memory resources available. For example bash
uses 1.3 GB of virtual memory for an array size of 18074340.
</quote>
On 16.06.2024 20:00, Eric Pozharski wrote:
with <v4ll54$3sd11$1@dont-email.me> Janis Papanagnou wrote:
That being said, as a result of cross-pollination, something similar
might be in ksh too. I can't say where to dig through
ksh-documentation.
Well, I don't know of any in Ksh. (That's my problem.)
So your array approach looks promising for one numeric key, and
it's a nice and terse solution.
Janis
with <v4okm3$h4cs$1@dont-email.me> Janis Papanagnou wrote:
On 16.06.2024 20:00, Eric Pozharski wrote:
with <v4ll54$3sd11$1@dont-email.me> Janis Papanagnou wrote:
*SKIP* [ 20 lines 3 levels deep]
That being said, as a result of cross-pollination, something similar
might be in ksh too. I can't say where to dig through
ksh-documentation.
Well, I don't know of any in Ksh. (That's my problem.)
Is that because of oh-my-bad documentation, or because ksh seeks a minimal feature set?
p.s. Lack of features is a feature by itself, there's that.
Helmut Waitzmann wrote:
If ‘p_sort’ is designed to output the sorted file names
separated by an ASCII NUL character rather than a newline
then, using the GNU version of ‘xargs’, one can feed that
output into ‘xargs’:
{
p_sort P*.HTM 3<&- |
xargs --null --no-run-if-empty -- sh -c \
'exec 0<&3 3<&- "$@"' sh \
viewer
} 3<&0
NUL as a record separator is also supported by several other
versions of xargs, and it is in the recently released
POSIX.1-2024 standard.
In all of those it is specified with -0, so using -0 is more
portable than the GNU-specific --null.
POSIX.1-2024 also has -r although I think that's not as widely
supported in current xargs implementations as -0. It should
become better supported over time, though, so again I would
suggest using -r rather than --no-run-if-empty for better future portability.
I've just tried a Unix tools based solution (with sed, sort, cut).
Up to and including the line containing 'shuf' is data generation,
the rest (starting with 'sed') extracts and sorts the data. I've
written it for TWO numeric sort keys (see printf format specifier)
for (( i=1; i<=50; i++ ))
do
  for (( j=2; j<=120; j+=3 ))
  do
    printf "a%db%dc.txt\n" i j
  done
done | shuf |
sed 's/[^0-9]*\([0-9]\+\)[^0-9]*\([0-9]\+\)[^0-9]*/\1\t\2\t&/' |
sort -t$'\t' -k1n -k2n | cut -f3-
For just one numeric argument this can be simplified (shorter sed
pattern, simpler sort -n command), and for more than two numeric
fields it can be modified to dynamically construct the sed pattern,
the sort option list, and the cut parameter, once at the beginning;
that way we could have a tool for arbitrary amounts of numeric keys
in the file name.
Note: this program doesn't handle pathological filenames (newlines).
Janis
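The sed/sort/cut pipeline follows a decorate, sort, undecorate pattern. Purely as an illustration of that pattern (not a replacement for the shell code), a rough Python equivalent for two numeric keys might look like this; sort_by_two_numbers is a hypothetical name, and the sketch assumes every name contains at least two numbers.

```python
import re

def sort_by_two_numbers(names):
    """Decorate each name with its first two numbers, sort, then drop the decoration."""
    decorated = []
    for name in names:
        first, second = re.findall(r"[0-9]+", name)[:2]
        decorated.append((int(first), int(second), name))   # like "\1\t\2\t&"
    decorated.sort(key=lambda row: row[:2])                  # like sort -k1n -k2n
    return [row[2] for row in decorated]                     # like cut -f3-

print(sort_by_two_numbers(["a10b2c.txt", "a2b11c.txt", "a2b5c.txt"]))
```

Since the names never leave the process as text lines, embedded newlines are not a problem here.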
On 18/06/2024 at 18:04, Janis Papanagnou wrote:
I've just tried a Unix tools based solution (with sed, sort, cut).
[...]
[...], and for more than two numeric
fields it can be modified to dynamically construct the sed pattern,
the sort option list, and the cut parameter, once at the beginning;
that way we could have a tool for arbitrary amounts of numeric keys in
the file name.
Note: this program doesn't handle pathological filenames (newlines).
If you're happy not handling pathological filenames:
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.
exnums() { j="$(sed 's/[^[:digit:]]\+/ /g' <<<"$@")"; printf '%s%s\n' "$j" "$@"; }
function replaces all non-digit sequences with a space, prints digit sequence(s) and original input.
for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k you use, fields separated with space. awk prints the last field of the input.
This "seems" to work with all manner of filenames from PNN.htm (as your original sequence) to p323dc45g12.htm, p324dc45g12.htm, p333dc45g12.htm
Seems to work in ksh, too.
On 19.06.2024 14:40, Chris Elvidge wrote:
On 18/06/2024 at 18:04, Janis Papanagnou wrote:
I've just tried a Unix tools based solution (with sed, sort, cut).
[...]
[...], and for more than two numeric
fields it can be modified to dynamically construct the sed pattern,
the sort option list, and the cut parameter, once at the beginning;
that way we could have a tool for arbitrary amounts of numeric keys in
the file name.
Note: this program doesn't handle pathological filenames (newlines).
If you're happy not handling pathological filenames:
Well, typically I can indeed ignore them. But it's better of course
to avoid situations where processing is compromised by such names.
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.
exnums() { j="$(sed 's/[^[:digit:]]\+/ /g' <<<"$@")"; printf '%s%s\n' "$j" "$@"; }
function replaces all non-digit sequences with a space, prints digit
sequence(s) and original input.
for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k you use, fields separated with space.
awk prints the last field of the input.
This "seems" to work with all manner of filenames from PNN.htm (as your
original sequence) to p323dc45g12.htm, p324dc45g12.htm, p333dc45g12.htm
Seems to work in ksh, too.
I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"
Janis
On 19/06/2024 at 14:11, Janis Papanagnou wrote:
On 19.06.2024 14:40, Chris Elvidge wrote:
On 18/06/2024 at 18:04, Janis Papanagnou wrote:
I've just tried a Unix tools based solution (with sed, sort, cut).
[...]
[...], and for more than two numeric fields it can be modified to
dynamically construct the sed pattern, the sort option list, and the
cut parameter, once at the beginning; that way we could have a tool
for arbitrary amounts of numeric keys in the file name.
Note: this program doesn't handle pathological filenames (newlines).
If you're happy not handling pathological filenames:
Well, typically I can indeed ignore them. But it's better of course to
avoid situations where processing is compromised by such names.
for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
to create the files.
exnums() { j="$(sed 's/[^[:digit:]]\+/ /g' <<<"$@")"; printf '%s%s\n' "$j" "$@"; }
function replaces all non-digit sequences with a space, prints digit
sequence(s) and original input.
for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
sort doesn't seem to care how many -k you use, fields separated with
space.
awk prints the last field of the input.
This "seems" to work with all manner of filenames from PNN.htm (as
your original sequence) to p323dc45g12.htm, p324dc45g12.htm,
p333dc45g12.htm
Seems to work in ksh, too.
I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"
Janis
Your way is still restricted to filenames with a known number of sets of digits, though (AFAICS). I.e. you pass N rather than finding it.
But it takes a long time to do it my way, a call to sed for each
filename, so I tried to cut down the time taken to do this and came up
with:
bash: exnums() { shopt -s extglob; j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }
ksh: exnums() { j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }
ksh seems to do the extglob needed for bash natively.
Removing the sed calls from exnums cuts the time taken from 37 secs to
under 1 sec with 2000+ files. ksh is faster than bash: it takes 50% of
the bash time.
Substituting a tab for the replacement space in j= and -t$'\t' in sort
would seem to allow spaces in filenames, too, as you originally had it.
I finally remembered which tool has "versionsort(3)" -- it's ls:
$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?
On 20.06.2024 01:45, vallor wrote:
I finally remembered which tool has "versionsort(3)" -- [...]
It's a pity that this function is a GNU extension, otherwise it could be
used to implement the desired function in shells (ksh, bash) as an
additional globbing option (like the zsh glob qualifier) or a new 'set' option to control the sorting.
Janis
I just posted a python program to comp.lang.python that sorts parameters using strverscmp(3).
On Thu, 20 Jun 2024 22:16:42 -0000 (UTC), vallor wrote:
I just posted a python program to comp.lang.python that sorts
parameters using strverscmp(3).
I already posted a snippet here which sorts strings containing any
number of decimal-numerical segments.
While I can't speak for others, something about the way you went about
that rubbed me the wrong way.
On Fri, 21 Jun 2024 04:20:47 -0000 (UTC), vallor wrote:
While I can't speak for others, something about the way you went about
that rubbed me the wrong way.
I solved the specific problem that seems to be the stumbling block, and
left the rest as an exercise for the reader.
That wasn't up to your particular high standards? You know what you can
do.
On Wed, 19 Jun 2024 16:06:37 +0100, Chris Elvidge <chris@x550c.mshome.net> wrote in <v4us5u$21bu3$1@dont-email.me>:
[...]
But it takes a long time to do it my way, a call to sed for each
filename, so I tried to cut down the time taken to do this and came up
with:
bash: exnums() { shopt -s extglob; j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }
ksh: exnums() { j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }
ksh seems to do the extglob needed for bash natively.
Removing the sed calls from exnums cuts the time taken from 37 secs to
under 1 sec with 2000+ files. ksh is faster than bash: it takes 50% of
the bash time.
Substituting a tab for the replacement space in j= and -t$'\t' in sort
would seem to allow spaces in filenames, too, as you originally had it.
I finally remembered which tool has "versionsort(3)" -- it's ls:
$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?
I finally remembered which tool has "versionsort(3)" -- it's ls:
$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?
I didn't realise it could work like that. Thanks.
In article <7cbgkkx149.ln2@slack15-a.local.uk>,
...
I finally remembered which tool has "versionsort(3)" -- it's ls:
$ ls -1
test10.txt
test1.txt
test2.txt
$ ls -v -1
test1.txt
test2.txt
test10.txt
Does that help?
I didn't realise it could work like that. Thanks.
To OP: Does "ls -v" meet your criteria?
On 19.06.2024 17:06, Chris Elvidge wrote:
On 19/06/2024 at 14:11, Janis Papanagnou wrote:
[...]
I tried the approach I outlined above... (here just echo'ing the
created parts)...
N=${1:-1}
sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
sed_r="\1\t"
sort_a="-k1n"
for (( n=2; n<=N; n++ ))
do
sed_a+="\([0-9]\+\)[^0-9]*"
sed_r+="\\${n}\t"
sort_a+=" -k${n}n"
done
cut_a="-f$((N+1))-"
echo "# The following commands would be connected by pipes:"
echo "sed 's/${sed_a}/${sed_r}&/'"
echo "sort -t$'\t' ${sort_a}"
echo "cut ${cut_a}"
Your way is still restricted to filenames with a known number of sets of
digits, though (AFAICS). I.e. you pass N rather than finding it.
Yes. Above is just a codified version of the method I described
(thus also the echo's). Whether it's provided as parameter N or
obtained, say, from one of the files is left unanswered. Myself
I'd prefer some solution where even file sets with mixed amounts
of numerical parts may be used; thus being able to handle lists
that are named like chapters, like 1, 1.1, 1.2, ..., 5.3.3
Slowly and continuously approaching the goal... :-)
Janis
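For the chapter-style case with mixed amounts of numerical parts, one possible approach (a sketch only, with the hypothetical helper name numeric_parts) is to collect all digit runs of a name into a tuple and compare the tuples. Tuples of different lengths compare element by element, so no N has to be passed or determined beforehand.

```python
import re

def numeric_parts(name):
    """All digit runs of a name as a tuple of ints, e.g. '5.3.3' -> (5, 3, 3)."""
    return tuple(int(n) for n in re.findall(r"[0-9]+", name))

chapters = ["1.2", "5.3.3", "1", "10", "1.1", "2"]
print(sorted(chapters, key=numeric_parts))
# ['1', '1.1', '1.2', '2', '5.3.3', '10']
```

Note that this key ignores the non-digit text entirely, so names that differ only in their text parts keep their input order.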
[...]
[...]
Originally you said:
(Ideally I'd also like to handle names with two numbers "A35P56.txt"
and irregular string components (lowercase, say, "page310ch1.txt"),
but that's just a nice-to-have. - I might make use of 'sort'?)
Does 'sort -V' help?
Seems to work with both spaces and newlines.