• Numerically sorted arguments (in shell)

    From Janis Papanagnou@21:1/5 to All on Fri Jun 14 09:31:18 2024
    I'm using ksh here...

    I can set the shell parameters in numerical order

    $ set {1..100}

    then sort them _lexicographically_

    $ set -s

    Or do both in one

    $ set -s {1..100}

    I haven't found anything to sort them _numerically_ in shell.

    What I'm trying to do is iterating over files, say,
    P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
    P8.HTM P9.HTM
    in numerical order.

    Setting the files as shell arguments with P*.HTM will also produce lexicographical order.

    The preceding files are just samples. It should work also if the
    numbers are non-consecutive (say, 2, 10, 10000, 3333333) so that
    iterating using a for-loop and building the list is not an option.

    (Ideally I'd also like to handle names with two numbers "A35P56.txt"
    and irregular string components (lowercase, say, "page310ch1.txt"),
    but that's just a nice-to-have. - I might make use of 'sort'?)


    But the primary question is; how to organize/iterate the arguments *numerically* _in shell_? (If that's possible in some simple way.)


    N.B.: I prefer not to use external commands like 'sort' because of
    the negative side effects and bulky code to handle newlines and
    blanks in filenames, and messing around with quotes.
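
    One way to get there with shell built-ins only is a selection sort over
    the positional parameters. What follows is a minimal sketch, assuming
    POSIX sh syntax; the names run_numeric and first_number are made up for
    this illustration and are not ksh features. It sorts on the first run of
    digits in each name only (names without digits count as 0), it is
    quadratic in the number of arguments, and it assumes the numbers fit
    into the shell's integer range.

```shell
# first_number NAME: set the global variable "num" to the first run of
# digits found in NAME (leading zeros stripped), or to 0 if there is none.
first_number() {
    num=${1#"${1%%[0-9]*}"}        # drop everything before the first digit
    num=${num%%[!0-9]*}            # keep only the leading run of digits
    while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
        num=${num#0}               # strip leading zeros
    done
    num=${num:-0}
}

# run_numeric CMD ARG...: run CMD once with all ARGs reordered by the
# first number in each ARG. Uses global variables (POSIX sh has no "local").
run_numeric() {
    cmd=$1; shift
    unsorted=$#                    # "$@" is kept as [unsorted items][sorted items]
    sorted=0
    while [ "$unsorted" -gt 0 ]; do
        # 1. Find the smallest key among the unsorted items.
        i=0
        for f do
            [ "$i" -lt "$unsorted" ] || break
            first_number "$f"
            if [ "$i" -eq 0 ] || [ "$num" -lt "$min_num" ]; then
                min=$f min_num=$num min_i=$i
            fi
            i=$((i + 1))
        done
        # 2. Rotate the unsorted items to the end, dropping the minimum.
        i=0
        while [ "$i" -lt "$unsorted" ]; do
            f=$1; shift
            [ "$i" -eq "$min_i" ] || set -- "$@" "$f"
            i=$((i + 1))
        done
        # 3. Rotate the sorted items to the end, so the unsorted ones lead again.
        i=0
        while [ "$i" -lt "$sorted" ]; do
            f=$1; shift
            set -- "$@" "$f"
            i=$((i + 1))
        done
        # 4. Append the minimum as the new last sorted item.
        set -- "$@" "$min"
        unsorted=$((unsorted - 1))
        sorted=$((sorted + 1))
    done
    "$cmd" "$@"
}
```

    With these definitions, "run_numeric viewer P*.HTM" would pass the files
    to the viewer in numerical order. Every name stays a single argument
    throughout, so blanks and newlines in file names need no special care.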

    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Axel Reichert@21:1/5 to Janis Papanagnou on Sat Jun 15 15:56:22 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    What I'm trying to do is iterating over files, say,
    P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
    P8.HTM P9.HTM
    in numerical order.

    Could you employ printf to add leading zeros, sort lexicographically and
    then remove the zeros?
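
    The zero-padding idea can be sketched as follows. This is an
    illustration only, assuming POSIX sh; pad_key is a hypothetical helper
    name, and the default width of 10 digits is an arbitrary choice. Each
    run of digits is padded to a fixed width, so that plain lexicographic
    order of the padded keys agrees with numerical order.

```shell
# pad_key NAME [WIDTH]: print NAME with every run of digits zero-padded
# to WIDTH digits (default 10). Uses global variables (no "local" in POSIX sh).
pad_key() {
    rest=$1
    width=${2:-10}
    out=
    while [ -n "$rest" ]; do
        prefix=${rest%%[0-9]*}         # leading non-digit text (may be empty)
        rest=${rest#"$prefix"}
        digits=${rest%%[!0-9]*}        # leading run of digits (may be empty)
        rest=${rest#"$digits"}
        out=$out$prefix
        if [ -n "$digits" ]; then
            # Strip leading zeros so printf does not read the number as octal.
            while [ "${#digits}" -gt 1 ] && [ "${digits#0}" != "$digits" ]; do
                digits=${digits#0}
            done
            out=$out$(printf "%0${width}d" "$digits")
        fi
    done
    printf '%s\n' "$out"
}
```

    For example, "pad_key P10.HTM" prints P0000000010.HTM. Since stripping
    the zeros again cannot tell a padded name from one that had leading
    zeros to begin with, it is probably safer to keep each padded key next
    to its original name than to try to undo the padding.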

    (Ideally I'd also like to handle names with two numbers "A35P56.txt"
    and irregular string components (lowercase, say, "page310ch1.txt"),
    but that's just a nice-to-have. - I might make use of 'sort'?)

    You did not yet mention what your final goal is with the numerically
    sorted list.

    In case this is in the end a renaming task, for this level of complexity
    I would use the "wdired" mode of Emacs ("write directory edits") with
    regexes for search and replace, or some other "multi-rename" tool from
    the command line.

    Best regards

    Axel

  • From Janis Papanagnou@21:1/5 to Axel Reichert on Sat Jun 15 16:43:50 2024
    On 15.06.2024 15:56, Axel Reichert wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    What I'm trying to do is iterating over files, say,
    P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM P7.HTM
    P8.HTM P9.HTM
    in numerical order.

    Could you employ printf to add leading zeros, sort lexicographically and
    then remove the zeros?

    Something like this (not using printf but a more elementary method) is
    actually what I'm currently doing. (It's not really complex but quite
    some data fiddling I wanted to avoid. I have that in a script and it's
    not a general solution but handles just simple cases like the sample
    data above, which is 95% of my usages so it's okay for that but not
    really satisfying for other or more general cases.)

    To get the original name back I think I'd have to store the original
    names along with the new names. (Which is something I've not yet done.)


    (Ideally I'd also like to handle names with two numbers "A35P56.txt"
    and irregular string components (lowercase, say, "page310ch1.txt"),
    but that's just a nice-to-have. - I might make use of 'sort'?)

    You did not yet mention what your final goal is with the numerically
    sorted list.

    The original application was that I simply wanted to sequentially skim
    through a number of files.[*] In the past (where possible) I've just
    renamed the files to let them have numbers of equal length (as noted
    above). But the general task I envision is that I don't want to change
    any name of data files but just be able to iterate over these files,
    or list them numerically sorted (and without the known issues of \n
    and blank handling).

    I thought that a contemporary shell would probably support that but I
    was astonished that (at least in ksh) it wasn't supported (as far as
    I saw).[**]


    In case this is in the end a renaming task, for this level of complexity
    I would use the "wdired" mode of Emacs ("write directory edits") with
    regexes for search and replace, or some other "multi-rename" tool from
    the command line.

    I've my own script to adjust numbers in files. But as said, I'd rather
    want to iterate or sort, like the lexicographic ordering in ksh

    set -s page*.gif

    (which in that example is anyway the default for wildcard patterns)
    something similar for numeric argument setting (or pattern expansion)

    set --numerical page*.gif

    To use that feature more widely it would be nice if the wild-card
    expansion could be controlled, say by

    set -o numerical

    Well, maybe that all makes no sense and should be tackled differently?
    But it's how it appears to me at the moment. (Feel free to enlighten
    me. :-)

    Janis

    [*] I occasionally have this task; the last time was when I wanted to
    read old typewriter documents that had been scanned page-wise as GIF
    files.

    [**] I haven't checked Zsh yet; that shell supports some non-standard
    modifiers in certain zsh-specific constructs, so it might well
    support this requirement too. (But Zsh is not the shell I'm using,
    so I'm primarily looking for a Ksh solution or a POSIX shell workaround.)

  • From Lawrence D'Oliveiro@21:1/5 to Janis Papanagnou on Sat Jun 15 22:49:05 2024
    On Fri, 14 Jun 2024 09:31:18 +0200, Janis Papanagnou wrote:

    I'm using ksh here...

    At some point, you have to accept that trying to do everything in a shell language is not the best way to go, and that it is time to switch to a “real” programming language.

    For example, Perl or Python could do this much more easily.

  • From Janis Papanagnou@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 05:11:29 2024
    On 16.06.2024 00:49, Lawrence D'Oliveiro wrote:
    On Fri, 14 Jun 2024 09:31:18 +0200, Janis Papanagnou wrote:

    I'm using ksh here...

    At some point, you have to accept that trying to do everything in a shell language is not the best way to go, and that it is time to switch to a “real” programming language.

    What argument are you trying to make up? What makes you think
    I'm doing "everything in a shell"?

    (My approach is to take the appropriate tools and language from
    the set of (a dozen, or so) "real" languages I know, plus from
    the set of a handful of scripting languages that I know.)

    For example, Perl or Python could do this much more easily.

    I don't know of this feature in Perl or Python; please provide
    some hint if there is a feature like the one I need. Some code
    samples for demonstration of your point are also welcome.

    (Only in case you missed it; I'm not [primarily] looking for a
    program; for the described task I'm fine with what I have done.
    Rather I'm looking for an inherent feature that supports what I
    described elsethread. And I like to have this feature in shell
    since a shell is my standard interface to my Unix system. HTH.)

    Janis

  • From Lawrence D'Oliveiro@21:1/5 to Janis Papanagnou on Sun Jun 16 04:56:39 2024
    On Sun, 16 Jun 2024 05:11:29 +0200, Janis Papanagnou wrote:

    I don't know of this feature in Perl or Python; please provide some hint
    if there is a feature like the one I need. Some code samples for demonstration of your point are also welcome.

    Python solution:

    import re

    items = \
        [
            "P1.HTM", "P10.HTM", "P11.HTM", "P2.HTM", "P3.HTM",
            "P4.HTM", "P5.HTM", "P6.HTM", "P7.HTM", "P8.HTM", "P9.HTM",
        ]

    print(items)

    print \
      (
        sorted
          (
            items,
            key = lambda f :
                tuple
                  (
                    (lambda : p, lambda : int(p))[i % 2 != 0]()
                    for i, p in enumerate(re.split("([0-9]+)", f))
                  )
          )
      )

    output:

    ['P1.HTM', 'P10.HTM', 'P11.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM']
    ['P1.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM', 'P10.HTM', 'P11.HTM']

  • From Janis Papanagnou@21:1/5 to Lawrence D'Oliveiro on Sun Jun 16 11:48:25 2024
    On 16.06.2024 06:56, Lawrence D'Oliveiro wrote:
    On Sun, 16 Jun 2024 05:11:29 +0200, Janis Papanagnou wrote:

    I don't know of this feature in Perl or Python; please provide some hint
    if there is a feature like the one I need. Some code samples for
    demonstration of your point are also welcome.

    Python solution:

    import re

    items = \
        [
            "P1.HTM", "P10.HTM", "P11.HTM", "P2.HTM", "P3.HTM",
            "P4.HTM", "P5.HTM", "P6.HTM", "P7.HTM", "P8.HTM", "P9.HTM",
        ]

    print(items)

    print \
      (
        sorted
          (
            items,
            key = lambda f :
                tuple
                  (
                    (lambda : p, lambda : int(p))[i % 2 != 0]()
                    for i, p in enumerate(re.split("([0-9]+)", f))
                  )
          )
      )

    output:

    ['P1.HTM', 'P10.HTM', 'P11.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM']
    ['P1.HTM', 'P2.HTM', 'P3.HTM', 'P4.HTM', 'P5.HTM', 'P6.HTM', 'P7.HTM', 'P8.HTM', 'P9.HTM', 'P10.HTM', 'P11.HTM']


    Thanks. Though I'm not familiar enough with Python to understand that
    code; it's too far from any language I've been using.

    The (for me) interesting question, though, is; how does it solve the
    task I had been addressing? - For convenience I reiterate one main application...

    I want from my shell command line interface call a viewer (or any
    other application) with a list of files. If in shell I do, e.g.,

    viewer P*.HTM

    the list gets sorted lexicographically. How would the main function
    look like that I could embed in my call to make a numerically sorted
    list. Say, something like, for example,

    viewer $( p_sort P*.HTM )

    where p_sort would be the Python code. - Note: this is no appropriate
    solution since it would anyway not work correctly for file names with
    embedded blanks and newlines. I just want to get a closer understanding
    how you think this would be usable in shell (or from shell). Thanks.

    Janis

  • From Helmut Waitzmann@21:1/5 to All on Sun Jun 16 21:52:58 2024
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com>:
    How would the main function look like that I could embed in my
    call to make a numerically sorted list. Say, something like, for
    example,


    viewer $( p_sort P*.HTM )

    where p_sort would be the Python code. - Note: this is no
    appropriate solution since it would anyway not work correctly
    for file names with embedded blanks and newlines. I just want to
    get a closer understanding how you think this would be usable in
    shell (or from shell). Thanks.


    If ‘p_sort’ is designed to output the sorted file names separated
    by an ASCII NUL character rather than a newline then, using the
    GNU version of ‘xargs’, one can feed that output into ‘xargs’:


    {
      p_sort P*.HTM 3<&- |
      xargs --null --no-run-if-empty -- sh -c \
        'exec 0<&3 3<&- "$@"' sh \
        viewer
    } 3<&0


    This will avoid the problems with funny characters (including
    blanks and linefeeds) in filenames processed by the shell.

  • From Eric Pozharski@21:1/5 to Janis Papanagnou on Sun Jun 16 18:00:11 2024
    with <v4ll54$3sd11$1@dont-email.me> Janis Papanagnou wrote:
    On 16.06.2024 00:49, Lawrence D'Oliveiro wrote:
    On Fri, 14 Jun 2024 09:31:18 +0200, Janis Papanagnou wrote:

    I'm using ksh here...
    *SKIP* [ 8 lines 2 levels deep] # borderly ad hominem
    (My approach is to take the appropriate tools and language from the
    set of (a dozen, or so) "real" languages I know, plus from the set of
    a handful of scripting languages that I know.)

    (Disclaimer: I'm ksh-ignorant) Speaking of features.

    {14439:44} [0:0]% print -cC6 *
    bar-20.baz bar-3.baz foo-10.bar foo-23.bar foo-5.bar
    bar-21.baz bar-6.baz foo-13.bar foo-29.bar foo-6.bar
    bar-24.baz bar-8.baz foo-1.bar foo-3.bar foo-7.bar
    bar-26.baz foo-0.bar foo-22.bar foo-4.bar foo-8.bar
    {14445:45} [0:0]% print -cC6 *(n)
    bar-3.baz bar-21.baz foo-1.bar foo-6.bar foo-13.bar
    bar-6.baz bar-24.baz foo-3.bar foo-7.bar foo-22.bar
    bar-8.baz bar-26.baz foo-4.bar foo-8.bar foo-23.bar
    bar-20.baz foo-0.bar foo-5.bar foo-10.bar foo-29.bar

    That nymph between weapon and tool is 'glob qualifier' (acts at
    'filename generation' phase). But! It's zsh. That being said, as a
    result of cross-pollination, something similar might be in ksh too. I
    can't say where to dig through ksh-documentation.

    For example, Perl or Python could do this much more easily.
    I don't know of this feature in Perl or Python; please provide some
    hint if there is a feature like the one I need. Some code samples for demonstration of your point are also welcome.

    Well, here's a hint:

    Rule#34: If you can imagine sex, there's a module for this.

    *CUT* [ 10 lines 1 level deep]

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom

  • From Lawrence D'Oliveiro@21:1/5 to Janis Papanagnou on Mon Jun 17 05:42:57 2024
    On Sun, 16 Jun 2024 11:48:25 +0200, Janis Papanagnou wrote:

    How would the main function look like that I could embed in my call to
    make a numerically sorted list.

    That’s what the code I posted does.

  • From Janis Papanagnou@21:1/5 to Eric Pozharski on Mon Jun 17 08:21:55 2024
    On 16.06.2024 20:00, Eric Pozharski wrote:
    with <v4ll54$3sd11$1@dont-email.me> Janis Papanagnou wrote:
    [sorting file names numerically]

    (Disclaimer: I'm ksh-ignorant) Speaking of features.

    {14439:44} [0:0]% print -cC6 *
    bar-20.baz bar-3.baz foo-10.bar foo-23.bar foo-5.bar
    bar-21.baz bar-6.baz foo-13.bar foo-29.bar foo-6.bar
    bar-24.baz bar-8.baz foo-1.bar foo-3.bar foo-7.bar
    bar-26.baz foo-0.bar foo-22.bar foo-4.bar foo-8.bar
    {14445:45} [0:0]% print -cC6 *(n)
    bar-3.baz bar-21.baz foo-1.bar foo-6.bar foo-13.bar
    bar-6.baz bar-24.baz foo-3.bar foo-7.bar foo-22.bar
    bar-8.baz bar-26.baz foo-4.bar foo-8.bar foo-23.bar
    bar-20.baz foo-0.bar foo-5.bar foo-10.bar foo-29.bar

    That nymph between weapon and tool is 'glob qualifier' (acts at
    'filename generation' phase). But! It's zsh.

    Yeah, I was positive that Zsh supports such a qualifier.

    That being said, as a
    result of cross-pollination, something similar might be in ksh too. I
    can't say where to dig through ksh-documentation.

    Well, I don't know of any in Ksh. (That's my problem.)

    Janis

  • From Janis Papanagnou@21:1/5 to Helmut Waitzmann on Mon Jun 17 08:30:54 2024
    On 16.06.2024 21:52, Helmut Waitzmann wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com>:
    How would the main function look like that I could embed in my call to
    make a numerically sorted list. Say, something like, for example,

    viewer $( p_sort P*.HTM )

    where p_sort would be the Python code. - Note: this is no appropriate
    solution since it would anyway not work correctly for file names with
    embedded blanks and newlines. I just want to get a closer
    understanding how you think this would be usable in shell (or from
    shell). Thanks.

    If ‘p_sort’ is designed to output the sorted file names separated by an ASCII NUL character rather than a newline then, using the GNU version of ‘xargs’, one can feed that output into ‘xargs’:

    {
    p_sort P*.HTM 3<&- |
    xargs --null --no-run-if-empty -- sh -c \
    'exec 0<&3 3<&- "$@"' sh \
    viewer
    } 3<&0


    This will avoid the problems with funny characters (including blanks and linefeeds) in filenames processed by the shell.

    I'm sure it does. You've actually shown a way to circumvent all the
    issues with $( ... ) . So I'd probably write a wrapper to make that
    code usable in a simpler way. Thanks.

    And for the 'p_sort' function I'll resort to some standard tools...

    Janis

  • From Janis Papanagnou@21:1/5 to Lawrence D'Oliveiro on Mon Jun 17 08:32:21 2024
    On 17.06.2024 07:42, Lawrence D'Oliveiro wrote:
    On Sun, 16 Jun 2024 11:48:25 +0200, Janis Papanagnou wrote:

    How would the main function look like that I could embed in my call to
    make a numerically sorted list.

    That’s what the code I posted does.

    Erm, really? - I've got the impression that it rather sorts only
    the _hard-coded data_, and not to work with arbitrary arguments.
    Anyway. Don't bother.

    Janis

  • From Janis Papanagnou@21:1/5 to Lawrence D'Oliveiro on Mon Jun 17 09:44:55 2024
    On 17.06.2024 09:16, Lawrence D'Oliveiro wrote:
    On Mon, 17 Jun 2024 08:32:21 +0200, Janis Papanagnou wrote:

    ... I've got the impression that it rather sorts only the
    _hard-coded data_ ...

    So get the data from the usual sources, e.g. os.listdir().

    Anyway. Don't bother.

    I can only lead the horse to water, I cannot make you drink.

    I wouldn't drink from a poisoned spring. IOW; I'd have to learn
    Python completely to understand your code and get the details
    properly. If it's as simple as you suggested, and since we're
    not in a Python NG, I thought you'd have been able to address
    the original question with your code (as a black box). In the
    present form it's just useless and off-topic here. But as said,
    don't bother.

    Janis

  • From Lawrence D'Oliveiro@21:1/5 to Janis Papanagnou on Mon Jun 17 07:16:40 2024
    On Mon, 17 Jun 2024 08:32:21 +0200, Janis Papanagnou wrote:

    ... I've got the impression that it rather sorts only the
    _hard-coded data_ ...

    So get the data from the usual sources, e.g. os.listdir().

    Anyway. Don't bother.

    I can only lead the horse to water, I cannot make you drink.

  • From Lawrence D'Oliveiro@21:1/5 to Janis Papanagnou on Tue Jun 18 08:26:45 2024
    On Mon, 17 Jun 2024 09:44:55 +0200, Janis Papanagnou wrote:

    IOW; I'd have to learn Python completely to understand your code and get
    the details properly.

    I give you a fish, you eat for a day. You learn to fish, you eat for a lifetime.

    In the present form it's just useless and off-topic here. But as said,
    don't bother.

    Have you received a better offer yet?

  • From Kenny McCormack@21:1/5 to ldo@nz.invalid on Tue Jun 18 11:14:19 2024
    In article <v4rgc5$18eq9$7@dont-email.me>,
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Mon, 17 Jun 2024 09:44:55 +0200, Janis Papanagnou wrote:

    IOW; I'd have to learn Python completely to understand your code and get
    the details properly.

    I give you a fish, you eat for a day. You learn to fish, you eat for a lifetime.

    In the present form it's just useless and off-topic here. But as said,
    don't bother.

    Have you received a better offer yet?

    Sounds like somebody's fee-fees got hurt here.

    --
    "Only a genius could lose a billion dollars running a casino."
    "You know what they say: the house always loses."
    "When life gives you lemons, don't pay taxes."
    "Grab 'em by the p***y!"

  • From Geoff Clare@21:1/5 to Helmut Waitzmann on Tue Jun 18 13:32:06 2024
    Helmut Waitzmann wrote:

    If ‘p_sort’ is designed to output the sorted file names separated
    by an ASCII NUL character rather than a newline then, using the
    GNU version of ‘xargs’, one can feed that output into ‘xargs’:

    {
    p_sort P*.HTM 3<&- |
    xargs --null --no-run-if-empty -- sh -c \
    'exec 0<&3 3<&- "$@"' sh \
    viewer
    } 3<&0

    NUL as a record separator is also supported by several other versions
    of xargs, and it is in the recently released POSIX.1-2024 standard.
    In all of those it is specified with -0, so using -0 is more
    portable than the GNU-specific --null.

    POSIX.1-2024 also has -r although I think that's not as widely
    supported in current xargs implementations as -0. It should become
    better supported over time, though, so again I would suggest using -r
    rather than --no-run-if-empty for better future portability.
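
    Put together, the quoted command with the short options might look
    like the sketch below. The function name view_sorted is made up for
    this illustration, and p_sort here is only a stand-in that emits its
    arguments NUL-terminated without sorting them, so that the sketch runs
    on its own. It assumes an xargs that supports both -0 and -r.

```shell
# Stand-in for the thread's hypothetical p_sort: prints each argument
# followed by a NUL byte. A real p_sort would emit them in sorted order.
p_sort() { printf '%s\0' "$@"; }

# view_sorted CMD FILE...: run CMD on the FILEs as ordered by p_sort.
view_sorted() {
    viewer_cmd=$1; shift
    {
        # fd 3 holds a copy of the original stdin; p_sort does not need it.
        p_sort "$@" 3<&- |
        # xargs reads the NUL-separated names from the pipe (-0) and runs
        # nothing if there are none (-r). The inner sh restores the original
        # stdin from fd 3, closes fd 3, and execs CMD with the names.
        xargs -0 -r sh -c 'exec 0<&3 3<&- "$@"' sh "$viewer_cmd"
    } 3<&0
}
```

    Restoring stdin from descriptor 3 matters for an interactive viewer,
    which would otherwise inherit the pipe from p_sort as its standard
    input instead of the terminal.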

    --
    Geoff Clare <netnews@gclare.org.uk>

  • From Chris Elvidge@21:1/5 to Janis Papanagnou on Tue Jun 18 14:32:15 2024
    On 14/06/2024 at 08:31, Janis Papanagnou wrote:
    [...]


    Can you use an array? E.g. (bash, I don't know ksh, but could be similar)

    for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM \
             P7.HTM P8.HTM P9.HTM; do
        j=${i//[![:digit:]]}
        files[j]="$i"
    done
    printf '%s\n' "${files[@]}"
    P1.HTM
    P2.HTM
    P3.HTM
    P4.HTM
    P5.HTM
    P6.HTM
    P7.HTM
    P8.HTM
    P9.HTM
    P10.HTM
    P11.HTM

    I'll have to work on names with two (or more?) numbers.

    --
    Chris Elvidge, England
    BART BUCKS, ARE NOT LEGAL TENDER

  • From Janis Papanagnou@21:1/5 to Chris Elvidge on Tue Jun 18 16:38:30 2024
    On 18.06.2024 15:32, Chris Elvidge wrote:
    On 14/06/2024 at 08:31, Janis Papanagnou wrote:
    [...]


    Can you use an array? E.g. (bash, I don't know ksh, but could be similar)

    Yes, Ksh supports both, indexed and associative arrays.


    for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM \
             P7.HTM P8.HTM P9.HTM; do
        j=${i//[![:digit:]]}
        files[j]="$i"
    done
    printf '%s\n' "${files[@]}"
    P1.HTM
    P2.HTM
    P3.HTM
    P4.HTM
    P5.HTM
    P6.HTM
    P7.HTM
    P8.HTM
    P9.HTM
    P10.HTM
    P11.HTM

    I'll have to work on names with two (or more?) numbers.

    One thing that concerns me with arrays is that I seem to recall that
    there was a limit in the number of array elements (which might be an
    issue on lengthy lists of files). But some ad hoc tests seem to show
    that if there's a limit it's not any more in the 1k/4k elements range
    as it had been. (Bolsky/Korn says their arrays support at least 4k.)

    Janis

  • From Janis Papanagnou@21:1/5 to Lawrence D'Oliveiro on Tue Jun 18 16:27:33 2024
    On 18.06.2024 10:26, Lawrence D'Oliveiro wrote:

    In the present form it's just useless and off-topic here. But as said,
    don't bother.

    Have you received a better offer yet?

    Better than /dev/random ? - No, not yet. But thanks for trying.

    (You gave the impression that using python would be "better".
    Obviously it isn't. - Why buy a non-standard tool/solution if
    I can solve the task with standard Unix tools myself. So don't
    bother.)

    Janis

  • From Chris Elvidge@21:1/5 to Janis Papanagnou on Tue Jun 18 16:19:45 2024
    On 18/06/2024 at 15:38, Janis Papanagnou wrote:
    On 18.06.2024 15:32, Chris Elvidge wrote:
    On 14/06/2024 at 08:31, Janis Papanagnou wrote:
    [...]


    Can you use an array? E.g. (bash, I don't know ksh, but could be similar)

    Yes, Ksh supports both, indexed and associative arrays.


    for i in P1.HTM P10.HTM P11.HTM P2.HTM P3.HTM P4.HTM P5.HTM P6.HTM \
             P7.HTM P8.HTM P9.HTM; do
        j=${i//[![:digit:]]}
        files[j]="$i"
    done
    printf '%s\n' "${files[@]}"
    P1.HTM
    P2.HTM
    P3.HTM
    P4.HTM
    P5.HTM
    P6.HTM
    P7.HTM
    P8.HTM
    P9.HTM
    P10.HTM
    P11.HTM

    I'll have to work on names with two (or more?) numbers.

    One thing that concerns me with arrays is that I seem to recall that
    there was a limit in the number of array elements (which might be an
    issue on lengthy lists of files). But some ad hoc tests seem to show
    that if there's a limit it's not any more in the 1k/4k elements range
    as it had been. (Bolsky/Korn says their arrays support at least 4k.)

    Janis


    I tested in ksh - works as written.

    From here: https://unix.stackexchange.com/questions/195191/ksh-bash-maximum-size-of-an-array

    <quote>

    This simple script shows on my systems (Gnu/Linux and Solaris):

    ksh88 limits the size to 2^12-1 (4095). (subscript out of range ).
    Some older releases like the one on HP-UX limit the size to 1023.

    ksh93 limits the size of a array to 2^22-1 (4194303), your mileage
    may vary.

    bash doesn't look to impose any hard-coded limit outside the one
    dictated by the underlying memory resources available. For example bash
    uses 1.3 GB of virtual memory for an array size of 18074340.

    </quote>

    --
    Chris Elvidge, England
    BART BUCKS, ARE NOT LEGAL TENDER

  • From Janis Papanagnou@21:1/5 to Chris Elvidge on Tue Jun 18 19:12:20 2024
    On 18.06.2024 17:19, Chris Elvidge wrote:
    On 18/06/2024 at 15:38, Janis Papanagnou wrote:
    [...]
    One thing that concerns me with arrays is that I seem to recall that
    there was a limit in the number of array elements (which might be an
    issue on lengthy lists of files). But some ad hoc tests seem to show
    that if there's a limit it's not any more in the 1k/4k elements range
    as it had been. (Bolsky/Korn says their arrays support at least 4k.)

    I tested in ksh - works as written.

    From here: https://unix.stackexchange.com/questions/195191/ksh-bash-maximum-size-of-an-array


    <quote>

    This simple script shows on my systems (Gnu/Linux and Solaris):

    ksh88 limits the size to 2^12-1 (4095). (subscript out of range ).
    Some older releases like the one on HP-UX limit the size to 1023.

    Yeah, I recall these limits from ksh88 on AIX and HP-UX.


    ksh93 limits the size of a array to 2^22-1 (4194303), your mileage
    may vary.

    I've tried on my system just with a million names (yet more
    filenames than we want to have in our directories in one place).


    bash doesn't look to impose any hard-coded limit outside the one
    dictated by the underlying memory resources available. For example bash
    uses 1.3 GB of virtual memory for an array size of 18074340.

    I didn't notice a limit for my old Bash or Ksh on my system.

    So your array approach looks promising for one numeric key, and
    it's a nice and terse solution.

    Janis


    </quote>


  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Tue Jun 18 19:04:41 2024
    I've just tried a Unix tools based solution (with sed, sort, cut).

    Up to and including the line containing 'shuf' is data generation,
    the rest (starting with 'sed') extracts and sorts the data. I've
    written it for TWO numeric sort keys (see printf format specifier)

    for (( i=1; i<=50; i++ ))
    do
    for (( j=2; j<=120; j+=3 ))
    do
    printf "a%db%dc.txt\n" i j
    done
    done |
    shuf |

    sed 's/[^0-9]*\([0-9]\+\)[^0-9]*\([0-9]\+\)[^0-9]*/\1\t\2\t&/' |
    sort -t$'\t' -k1n -k2n |
    cut -f3-

    For just one numeric argument this can be simplified (shorter sed
    pattern, simpler sort -n command), and for more than two numeric
    fields it can be modified to dynamically construct the sed pattern,
    the sort option list, and the cut parameter, once at the beginning;
    that way we could have a tool for arbitrary amounts of numeric keys
    in the file name.
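
    A minimal sketch of the one-key simplification mentioned above,
    assuming names that contain at least one digit run and no tabs or
    newlines. The function name sort_by_first_number is hypothetical, and
    a literal tab is used instead of '\t' so that the sed call does not
    rely on GNU-specific escapes.

```shell
# Prefix each name with its first digit run, sort on that prefix, strip it again.
sort_by_first_number() {
    tab=$(printf '\t')
    sed "s/^[^0-9]*\([0-9][0-9]*\)/\1${tab}&/" |
    sort -t "$tab" -k1,1n |
    cut -f2-
}

printf '%s\n' P10.HTM P2.HTM P11.HTM P1.HTM | sort_by_first_number
```

    With the four sample names this should list P1.HTM, P2.HTM, P10.HTM
    and P11.HTM in that order.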

    Note: this program doesn't handle pathological filenames (newlines).

    Janis


  • From Eric Pozharski@21:1/5 to Janis Papanagnou on Tue Jun 18 14:54:58 2024
    with <v4okm3$h4cs$1@dont-email.me> Janis Papanagnou wrote:
    On 16.06.2024 20:00, Eric Pozharski wrote:
    with <v4ll54$3sd11$1@dont-email.me> Janis Papanagnou wrote:

    *SKIP* [ 20 lines 3 levels deep]
    That being said, as a result of cross-pollination, something similar
    might be in ksh too. I can't say where to dig through
    ksh-documentation.

    Well, I don't know of any in Ksh. (That's my problem.)

    Is it because oh-my-bad documentation or ksh seeks minimal feature-set?

    p.s. Lack of features is a feature by itself, there's that.

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom

  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Tue Jun 18 19:23:33 2024
    On 18.06.2024 19:12, Janis Papanagnou wrote:

    So your array approach looks promising for one numeric key, and
    it's a nice and terse solution.

    Forgot to praise its nice property to also handle newlines in the
    filenames.
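
    The array approach praised here is not quoted in this message. The
    sketch below is only an assumption of what such a solution could look
    like in bash or ksh93: the digits of each name become the index of a
    sparse array, and expanding the array walks the indices in ascending
    order. The array name sorted is hypothetical, and names that share
    the same number would overwrite each other.

```shell
# bash or ksh93: indexed arrays are sparse and expand in ascending index order.
# Stand-in for "set -- P*.HTM"; the third name contains a newline on purpose.
set -- P10.HTM P2.HTM 'P3
.HTM' P1.HTM

unset sorted
for f in "$@"; do
    n=${f//[!0-9]/}                      # drop everything that is not a digit
    [ -n "$n" ] && sorted[10#$n]=$f      # 10# avoids octal for names like P08.HTM
done

for f in "${sorted[@]}"; do
    printf '<%s>\n' "$f"
done
```

    Because the names never pass through a line-oriented tool, blanks and
    newlines inside them need no special treatment.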

    Janis

  • From Janis Papanagnou@21:1/5 to Eric Pozharski on Wed Jun 19 02:18:50 2024
    On 18.06.2024 16:54, Eric Pozharski wrote:
    with <v4okm3$h4cs$1@dont-email.me> Janis Papanagnou wrote:
    On 16.06.2024 20:00, Eric Pozharski wrote:
    with <v4ll54$3sd11$1@dont-email.me> Janis Papanagnou wrote:

    [ zsh's glob qualifier for numerically sorted glob expansion ]

    *SKIP* [ 20 lines 3 levels deep]
    That being said, as a result of cross-pollination, something similar
    might be in ksh too. I can't say where to dig through
    ksh-documentation.

    Well, I don't know of any in Ksh. (That's my problem.)

    Is it because oh-my-bad documentation or ksh seeks minimal feature-set?

    Ksh really has a lot of features. But not this ["basic" (sort of)] one.

    (Well, I might as well have just missed it in the docs, but there's
    also the Bolsky/Korn book where I didn't see it. And I've been using
    that shell for so long. And I've also got no hints yet.)

    p.s. Lack of features is a feature by itself, there's that.

    Lacking features is certainly no feature of Ksh. ;-)

    Janis

  • From Kaz Kylheku@21:1/5 to Lawrence D'Oliveiro on Wed Jun 19 01:05:20 2024
    On 2024-06-18, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Mon, 17 Jun 2024 09:44:55 +0200, Janis Papanagnou wrote:

    IOW; I'd have to learn Python completely to understand your code and get
    the details properly.

    I give you a fish, you eat for a day. You learn to fish, you eat for a lifetime.

    You fall into the Python trap, and eat shit for a lifetime.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

  • From Helmut Waitzmann@21:1/5 to All on Wed Jun 19 02:22:59 2024
    Geoff Clare <geoff@clare.See-My-Signature.invalid>:
    Helmut Waitzmann wrote:

    If ‘p_sort’ is designed to output the sorted file names
    separated by an ASCII NUL character rather than a newline
    then, using the GNU version of ‘xargs’, one can feed that
    output into ‘xargs’:


    {
    p_sort P*.HTM 3<&- |
    xargs --null --no-run-if-empty -- sh -c \
    'exec 0<&3 3<&- "$@"' sh \
    viewer
    } 3<&0

    NUL as a record separator is also supported by several other
    versions of xargs, and it is in the recently released
    POSIX.1-2024 standard.


    I'm glad to read that.  I didn't know either.


    In all of those it is specified with -0, so using -0 is more
    portable than the GNU-specific --null.


    Yes, of course:  If ‘-0’ is in the POSIX standard, it is
    preferable over ‘--null’.


    POSIX.1-2024 also has -r although I think that's not as widely
    supported in current xargs implementations as -0. It should
    become better supported over time, though, so again I would
    suggest using -r rather than --no-run-if-empty for better future portability.


    I didn't know that ‘-0’ as well as ‘-r’ are more widely
    available (with the same semantics) than just in the GNU
    version.  To minimize the risk of hitting an ‘xargs’ version that
    happens to use the options ‘-0’ or ‘-r’ with different
    semantics than GNU ‘xargs’ does, I preferred the long options (in
    particular ‘--no-run-if-empty’) over the short ones.
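
    A short sketch of the short-option form discussed above, assuming an
    xargs that understands -0 and -r (GNU xargs does). The printf call is
    only a stand-in for a ‘p_sort’ that writes NUL-separated names, and
    the inner sh -c loop is a hypothetical consumer.

```shell
# Emit NUL-terminated names and hand them to a command;
# -r skips the command entirely when there is no input.
printf '%s\0' 'P1.HTM' 'P 2.HTM' 'P10.HTM' |
xargs -0 -r sh -c 'for f do printf "[%s]\n" "$f"; done' sh

# With empty input and -r, nothing is run at all.
printf '' | xargs -0 -r echo "this is never printed"
```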

  • From Chris Elvidge@21:1/5 to Janis Papanagnou on Wed Jun 19 13:40:28 2024
    On 18/06/2024 at 18:04, Janis Papanagnou wrote:
    I've just tried a Unix tools based solution (with sed, sort, cut).

    Up to and including the line containing 'shuf' is data generation,
    the rest (starting with 'sed') extracts and sorts the data. I've
    written it for TWO numeric sort keys (see printf format specifier)

    for (( i=1; i<=50; i++ ))
    do
    for (( j=2; j<=120; j+=3 ))
    do
    printf "a%db%dc.txt\n" i j
    done
    done |
    shuf |

    sed 's/[^0-9]*\([0-9]\+\)[^0-9]*\([0-9]\+\)[^0-9]*/\1\t\2\t&/' |
    sort -t$'\t' -k1n -k2n |
    cut -f3-

    For just one numeric argument this can be simplified (shorter sed
    pattern, simpler sort -n command), and for more than two numeric
    fields it can be modified to dynamically construct the sed pattern,
    the sort option list, and the cut parameter, once at the beginning;
    that way we could have a tool for arbitrary amounts of numeric keys
    in the file name.

    Note: this program doesn't handle pathological filenames (newlines).

    Janis


    If you're happy not handling pathological filenames:

    for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
    to create the files.

    exnums() { j="$(sed 's/[^[:digit:]]\+/ /g' <<<"$@")"; printf '%s%s\n' "$j" "$@"; }
    function replaces all non-digit sequences with a space, prints digit sequence(s) and original input.

    for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
    sort doesn't seem to care how many -k you use, fields separated with space.
    awk prints the last field of the input.

    This "seems" to work with all manner of filenames from PNN.htm (as your original sequence) to p323dc45g12.htm, p324dc45g12.htm, p333dc45g12.htm
    Seems to work in ksh, too.


    --
    Chris Elvidge, England
    TAR IS NOT A PLAYTHING

  • From Janis Papanagnou@21:1/5 to Chris Elvidge on Wed Jun 19 15:11:27 2024
    On 19.06.2024 14:40, Chris Elvidge wrote:
    On 18/06/2024 at 18:04, Janis Papanagnou wrote:
    I've just tried a Unix tools based solution (with sed, sort, cut).
    [...]
    [...], and for more than two numeric
    fields it can be modified to dynamically construct the sed pattern,
    the sort option list, and the cut parameter, once at the beginning;
    that way we could have a tool for arbitrary amounts of numeric keys in
    the file name.

    Note: this program doesn't handle pathological filenames (newlines).


    If you're happy not handling pathological filenames:

    Well, typically I can indeed ignore them. But it's better of course
    to avoid situations where processing is compromised by such names.


    for (( i=1; i<=50; i++ )); do for (( j=2; j<=120; j+=3 )); do touch "a${i}b${j}c.txt"; done; done
    to create the files.

    exnums() { j="$(sed 's/[^[:digit:]]\+/ /g' <<<"$@")"; printf '%s%s\n' "$j" "$@"; }
    function replaces all non-digit sequences with a space, prints digit sequence(s) and original input.

    for i in *; do exnums "$i"; done | sort -k1n -k2n -k3n -k4n | awk '{print $NF}'
    sort doesn't seem to care how many -k you use, fields separated with space. awk prints the last field of the input.

    This "seems" to work with all manner of filenames from PNN.htm (as your original sequence) to p323dc45g12.htm, p324dc45g12.htm, p333dc45g12.htm
    Seems to work in ksh, too.

    I tried the approach I outlined above... (here just echo'ing the
    created parts)...


    N=${1:-1}
    sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
    sed_r="\1\t"
    sort_a="-k1n"
    for (( n=2; n<=N; n++ ))
    do
    sed_a+="\([0-9]\+\)[^0-9]*"
    sed_r+="\\${n}\t"
    sort_a+=" -k${n}n"
    done
    cut_a="-f$((N+1))-"

    echo "# The following commands would be connected by pipes:"
    echo "sed 's/${sed_a}/${sed_r}&/'"
    echo "sort -t$'\t' ${sort_a}"
    echo "cut ${cut_a}"
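
    As a hedged sketch, the generated parts can also be wired into a
    running pipeline instead of being echoed. The function name
    numsort_keys is hypothetical; it assumes 1 to 9 keys, names without
    tabs or newlines, and at least N digit runs per name (names with
    fewer are passed through without a sort prefix).

```shell
# Read names on stdin and sort them by their first N digit runs.
numsort_keys() (
    n=$1
    tab=$(printf '\t')
    pat='[^0-9]*' rep='' keys='' i=1
    while [ "$i" -le "$n" ]; do
        pat="$pat"'\([0-9][0-9]*\)[^0-9]*'   # one capture group per key
        rep="$rep\\$i$tab"                   # \1<TAB>\2<TAB>...
        keys="$keys -k$i,${i}n"              # one numeric sort key per field
        i=$((i + 1))
    done
    # $keys is left unquoted on purpose so that it splits into options.
    sed "s/$pat/$rep&/" | sort -t "$tab" $keys | cut -f"$((n + 1))-"
)

printf '%s\n' a10b2c.txt a2b11c.txt a2b3c.txt | numsort_keys 2
```

    For the three sample names this should print a2b3c.txt, a2b11c.txt
    and a10b2c.txt.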


    Janis

  • From Chris Elvidge@21:1/5 to Janis Papanagnou on Wed Jun 19 16:06:37 2024
    On 19/06/2024 at 14:11, Janis Papanagnou wrote:
    [...]


    Your way is still restricted to filenames with a known number of sets of digits, though (AFAICS). I.e. you pass N rather than finding it.

    But it takes a long time to do it my way, a call to sed for each
    filename, so I tried to cut down the time taken to do this and came up with:

    bash: exnums() { shopt -s extglob; j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }

    ksh: exnums() { j="${@//+([^[:digit:]])/ }"; printf '%s%s\n' "$j" "$@"; }

    ksh seems to do the extglob needed for bash natively.

    Removing the sed calls from exnums cuts the time taken from 37 secs to
    under 1 sec with 2000+ files.
    ksh is faster than bash, taking about 50% of the bash time.

    Substituting a tab for the replacement space in j= and -t$'\t' in sort
    would seem to allow spaces in filenames, too, as you originally had it.
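
    A sketch of the tab-separated variant, under some assumptions: bash or
    ksh93, no tabs or newlines in the names, and at most four digit runs
    used as keys. The function name numsort4 is hypothetical. Unlike the
    exnums versions above it always prints exactly four key fields
    (missing ones padded with 0), so names that start or end with a digit
    do not shift the fields.

```shell
# bash or ksh93 (uses ${var//pattern/replacement} and a here-string).
numsort4() (
    tab=$(printf '\t')
    for name in "$@"; do
        keys=${name//[!0-9]/ }                 # every non-digit becomes a space
        read -r k1 k2 k3 k4 rest <<<"$keys"    # splitting keeps the digit runs
        printf '%s\t%s\t%s\t%s\t%s\n' \
            "${k1:-0}" "${k2:-0}" "${k3:-0}" "${k4:-0}" "$name"
    done |
    sort -t "$tab" -k1,1n -k2,2n -k3,3n -k4,4n |
    cut -f5-
)

numsort4 a10b2c.txt 'a2 b3c.txt' p323dc45g12.htm a2b11c.txt 7.txt
```

    No external command is started per file; only sort and cut run once
    for the whole list.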



    --
    Chris Elvidge, England
    SUBSTITUTE TEACHERS ARE NOT SCABS

  • From vallor@21:1/5 to All on Wed Jun 19 23:45:15 2024
    On Wed, 19 Jun 2024 16:06:37 +0100, Chris Elvidge <chris@x550c.mshome.net> wrote in <v4us5u$21bu3$1@dont-email.me>:

    [...]

    I finally remembered which tool has "versionsort(3)" -- it's ls:

    $ ls -1
    test10.txt
    test1.txt
    test2.txt

    $ ls -v -1
    test1.txt
    test2.txt
    test10.txt

    Does that help?

    --
    -v

  • From Janis Papanagnou@21:1/5 to Chris Elvidge on Thu Jun 20 06:34:02 2024
    On 19.06.2024 17:06, Chris Elvidge wrote:
    On 19/06/2024 at 14:11, Janis Papanagnou wrote:
    [...]

    I tried the approach I outlined above... (here just echo'ing the
    created parts)...


    N=${1:-1}
    sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
    sed_r="\1\t"
    sort_a="-k1n"
    for (( n=2; n<=N; n++ ))
    do
    sed_a+="\([0-9]\+\)[^0-9]*"
    sed_r+="\\${n}\t"
    sort_a+=" -k${n}n"
    done
    cut_a="-f$((N+1))-"

    echo "# The following commands would be connected by pipes:"
    echo "sed 's/${sed_a}/${sed_r}&/'"
    echo "sort -t$'\t' ${sort_a}"
    echo "cut ${cut_a}"


    Your way is still restricted to filenames with a known number of sets of digits, though (AFAICS). I.e. you pass N rather than finding it.

    Yes. Above is just a codified version of the method I described
    (thus also the echo's). Whether it's provided as parameter N or
    obtained, say, from one of the files is left unanswered. Myself
    I'd prefer some solution where even file sets with mixed amounts
    of numerical parts may be used; thus being able to handle lists
    that are named like chapters, like 1, 1.1, 1.2, ..., 5.3.3

    Slowly and continuously approaching the goal... :-)

    Janis

    [...]

  • From Janis Papanagnou@21:1/5 to vallor on Thu Jun 20 06:58:11 2024
    On 20.06.2024 01:45, vallor wrote:

    I finally remembered which tool has "versionsort(3)" -- [...]

    It's a pity that this function is a GNU extension, otherwise
    it could be used to implement the desired function in shells
    (ksh, bash) as an additional globbing option (like the zsh
    glob qualifier) or a new 'set' option to control the sorting.

    Janis

  • From Janis Papanagnou@21:1/5 to vallor on Thu Jun 20 06:43:13 2024
    On 20.06.2024 01:45, vallor wrote:
    [...]

    I finally remembered which tool has "versionsort(3)" -- it's ls:

    $ ls -1
    test10.txt
    test1.txt
    test2.txt

    $ ls -v -1
    test1.txt
    test2.txt
    test10.txt

    Does that help?

    Sure, thanks. - Just remember that it's a non-standard option. But
    for my GNU environment it's certainly a usable part of the solution.
    It seems to also handle multiple numeric components as desired (as
    versions usually have)

    $ ls | shuf | xargs ls -v
    1 1.2 1.11 2.1 2.10 10.1 10.10 11.1
    1.1 1.10 2 2.2 2.11 10.2 10.11 11.2


    Janis

  • From vallor@21:1/5 to janis_papanagnou+ng@hotmail.com on Thu Jun 20 22:16:42 2024
    On Thu, 20 Jun 2024 06:58:11 +0200, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote in <v50ct5$2e2lp$1@dont-email.me>:

    On 20.06.2024 01:45, vallor wrote:

    I finally remembered which tool has "versionsort(3)" -- [...]

    It's a pity that this function is a GNU extension, otherwise it could be
    used to implement the desired function in shells (ksh, bash) as an
    additional globbing option (like the zsh glob qualifier) or a new 'set' option to control the sorting.

    Janis

    I just posted a python program to comp.lang.python that sorts parameters
    using strverscmp(3). It also shell-escapes the (common) IFS characters.

    --
    -v

  • From Lawrence D'Oliveiro@21:1/5 to vallor on Fri Jun 21 03:37:51 2024
    On Thu, 20 Jun 2024 22:16:42 -0000 (UTC), vallor wrote:

    I just posted a python program to comp.lang.python that sorts parameters using strverscmp(3).

    I already posted a snippet here which sorts strings containing any number
    of decimal-numerical segments.

  • From vallor@21:1/5 to ldo@nz.invalid on Fri Jun 21 04:20:47 2024
    On Fri, 21 Jun 2024 03:37:51 -0000 (UTC), Lawrence D'Oliveiro
    <ldo@nz.invalid> wrote in <v52sif$3019v$1@dont-email.me>:

    On Thu, 20 Jun 2024 22:16:42 -0000 (UTC), vallor wrote:

    I just posted a python program to comp.lang.python that sorts
    parameters using strverscmp(3).

    I already posted a snippet here which sorts strings containing any
    number of decimal-numerical segments.

    While I can't speak for others, something about the way you
    went about that rubbed me the wrong way.

    More on-topic, I did post the code in comp.lang.python with a request
    for comments. Sometimes I feel like Manfred Mann's "The Demolition Man",
    where "I kill conversation as I walk into the room." It is no exception
    there, even though it would seem to be a good example of using a C binding
    with python's sort function. (Having very little experience with python programming, I couldn't really say with confidence -- but nobody has
    complained so far.)

    I wrote it to learn something about python, and to be a (q&d) shell
    utility that someone might find useful, possibly Janis. Since
    this is the shell newsgroup, and not the python programming
    newsgroup, I think discussing the finer points of python
    programming really doesn't belong here. YMMV.

    I'm also put-off by "snip-and-snark" Usenet. I know a few people who
    do that to tick-off the person they're conversing with. I'm guilty
    of doing that from time to time, but I hope it's only when
    the situation warrants it. If my opinion matters: it's a bad habit to
    do so by default.

    --
    -v

  • From Lawrence D'Oliveiro@21:1/5 to vallor on Fri Jun 21 05:41:01 2024
    On Fri, 21 Jun 2024 04:20:47 -0000 (UTC), vallor wrote:

    While I can't speak for others, something about the way you went about
    that rubbed me the wrong way.

    I solved the specific problem that seems to be the stumbling block, and
    left the rest as an exercise for the reader.

    That wasn’t up to your particular high standards? You know what you can
    do.

  • From Kenny McCormack@21:1/5 to ldo@nz.invalid on Fri Jun 21 07:31:53 2024
    In article <v533pd$318so$1@dont-email.me>,
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Fri, 21 Jun 2024 04:20:47 -0000 (UTC), vallor wrote:

    While I can't speak for others, something about the way you went about
    that rubbed me the wrong way.

    I solved the specific problem that seems to be the stumbling block, and
    left the rest as an exercise for the reader.

    That wasn’t up to your particular high standards? You know what you can
    do.

    Somebody's fee-fees are getting more than a little butt-hurt...

    Sad.

    --
    Those on the right constantly remind us that America is not a
    democracy; now they claim that Obama is a threat to democracy.

  • From Chris Elvidge@21:1/5 to vallor on Fri Jun 21 14:21:43 2024
    On 20/06/24 00:45, vallor wrote:
    [...]

    I finally remembered which tool has "versionsort(3)" -- it's ls:

    $ ls -1
    test10.txt
    test1.txt
    test2.txt

    $ ls -v -1
    test1.txt
    test2.txt
    test10.txt

    Does that help?


    I didn't realise it could work like that. Thanks.


    --
    Chris Elvidge
    England

  • From Kenny McCormack@21:1/5 to All on Mon Jun 24 13:22:28 2024
    In article <7cbgkkx149.ln2@slack15-a.local.uk>,
    ...
    I finally remembered which tool has "versionsort(3)" -- it's ls:

    $ ls -1
    test10.txt
    test1.txt
    test2.txt

    $ ls -v -1
    test1.txt
    test2.txt
    test10.txt

    Does that help?


    I didn't realise it could work like that. Thanks.

    To OP: Does "ls -v" meet your criteria?

    --
    Faced with the choice between changing one's mind and proving that there is
    no need to do so, almost everyone gets busy on the proof.

    - John Kenneth Galbraith -

  • From Janis Papanagnou@21:1/5 to Kenny McCormack on Mon Jun 24 16:01:03 2024
    On 24.06.2024 15:22, Kenny McCormack wrote:
    In article <7cbgkkx149.ln2@slack15-a.local.uk>,
    ...
    I finally remembered which tool has "versionsort(3)" -- it's ls:

    $ ls -1
    test10.txt
    test1.txt
    test2.txt

    $ ls -v -1
    test1.txt
    test2.txt
    test10.txt

    Does that help?


    I didn't realise it could work like that. Thanks.

    To OP: Does "ls -v" meet your criteria?

    Yes, at least partly. (Why are you asking?)

    It meets them in that it's readily available and usable
    as an external command. It does not provide the shell-internal
    globbing feature (like in Zsh) that would be
    nice and preferable.

    In the quoted form as 'ls -vQ' some pathological filenames
    are (seemingly) handled, but there's some hassle with the
    quotes in the subsequent processing steps to expect (or so
    I think). That's why an integrated form supported by shell
    would IMO be an advantage; so that we could simply write

    set -o numsortglob # <-- hypothetical shell feature
    for f in version*.gz
    do ...
    done

    At least the output from code like

    for f in $(ls -vQ) ; do printf "'%s'\n" "$f" ; done

    or

    ls -vQ | while IFS= read -r f ; do printf "'%s'\n" "$f" ; done

    indicates that there's still something to do, and without
    the 'ls -Q' quotes the (pathological) newlines are an issue
    (at least).

    I think it's a typical problem that would best be solved
    by a shell built-in feature. (External tools may take you
    part of the way, with additional effort, for common subsets
    of the task, but probably not in a bullet-proof manner.)

    Janis

  • From Chris Elvidge@21:1/5 to Janis Papanagnou on Mon Jun 24 15:22:24 2024
    On 20/06/2024 at 05:34, Janis Papanagnou wrote:
    On 19.06.2024 17:06, Chris Elvidge wrote:
    On 19/06/2024 at 14:11, Janis Papanagnou wrote:
    [...]

    I tried the approach I outlined above... (here just echo'ing the
    created parts)...


    N=${1:-1}
    sed_a="[^0-9]*\([0-9]\+\)[^0-9]*"
    sed_r="\1\t"
    sort_a="-k1n"
    for (( n=2; n<=N; n++ ))
    do
    sed_a+="\([0-9]\+\)[^0-9]*"
    sed_r+="\\${n}\t"
    sort_a+=" -k${n}n"
    done
    cut_a="-f$((N+1))-"

    echo "# The following commands would be connected by pipes:"
    echo "sed 's/${sed_a}/${sed_r}&/'"
    echo "sort -t$'\t' ${sort_a}"
    echo "cut ${cut_a}"


    Your way is still restricted to filenames with a known number of sets of
    digits, though (AFAICS). I.e. you pass N rather than finding it.

    Yes. Above is just a codified version of the method I described
    (thus also the echo's). Whether it's provided as parameter N or
    obtained, say, from one of the files is left unanswered. Myself
    I'd prefer some solution where even file sets with mixed amounts
    of numerical parts may be used; thus being able to handle lists
    that are named like chapters, like 1, 1.1, 1.2, ..., 5.3.3

    Slowly and continuously approaching the goal... :-)

    Janis

    [...]


    Originally you said:
    (Ideally I'd also like to handle names with two numbers "A35P56.txt"
    and irregular string components (lowercase, say, "page310ch1.txt"),
    but that's just a nice-to-have. - I might make use of 'sort'?)

    Does 'sort -V' help?
    Seems to work with both spaces and newlines.


    --
    Chris Elvidge, England
    NO ONE CARES WHAT MY DEFINITION OF "IS" IS

  • From Janis Papanagnou@21:1/5 to Chris Elvidge on Mon Jun 24 17:22:00 2024
    On 24.06.2024 16:22, Chris Elvidge wrote:
    [...]

    Originally you said:
    (Ideally I'd also like to handle names with two numbers "A35P56.txt"
    and irregular string components (lowercase, say, "page310ch1.txt"),
    but that's just a nice-to-have. - I might make use of 'sort'?)

    Does 'sort -V' help?
    Seems to work with both spaces and newlines.

    Yes, that would help like 'ls -v' does (presuming it behaves similarly;
    I haven't extensively tried 'sort -V' yet). (But using 'ls | sort -V'
    is not as terse as 'ls -v'.) The problem with both external tools
    is (as outlined upthread) the post-processing of the tool-generated
    list of data in shell context. So both tools take some burden from
    me (the sorting aspect is simply covered by an option), but they don't
    help me safely post-process the identified items. (A shell
    built-in could address that natively, i.e. more simply and more
    consistently.)
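
    One possible way to post-process a version-sorted list safely, as a
    sketch only: it assumes bash (for read -d '' and process substitution)
    and GNU sort, whose -V and -z options are extensions. The array name
    sorted is hypothetical. Names travel NUL-terminated, so blanks and
    newlines inside them survive.

```shell
# bash with GNU sort assumed: -V (version sort) and -z (NUL-terminated records).
# Stand-in for "set -- *"; the second name contains a newline on purpose.
set -- test10.txt 'test2
b.txt' test1.txt

sorted=()
while IFS= read -r -d '' f; do
    sorted+=("$f")
done < <(printf '%s\0' "$@" | sort -zV)

for f in "${sorted[@]}"; do
    printf '<%s>\n' "$f"
done
```

    This still relies on an external tool, so it does not replace the
    shell built-in wished for above; it only avoids the quoting step.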

    Janis
