• Splitting in shell (bash)

    From Kenny McCormack@21:1/5 to All on Sat Nov 9 16:19:17 2024
    There is a feature that is prominently missing from the shell language (I
    am speaking primarily of bash here) - which is the ability to split a
    string on a delimiter. This is a common operation in most other text-processing oriented languages (AWK, Perl, etc).

    First note/caveat: I'm not interested in any solution involving IFS, for
    two reasons:
    1) IFS-based solutions never work for me.
    2) Changing IFS is inherently dangerous, because, well, IFS itself is
    inherently dangerous. Yes, I know it has been somewhat de-fanged
    recently - but it is still dangerous.

    Anyway, the point of this thread is that I have recently developed a good solution for this, using bash's "mapfile" command. Suppose we have a
    string in a variable (foo) like: foo;bar;bletch

    I.e., with ; as the delimiter.

    This works well, with a couple of caveats:

    mapfile -td ';' <<< "$foo"
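    One detail worth checking with the here-string form: <<< appends a newline to the string, and with ';' as the delimiter that newline ends up inside the last element. The sketch below assumes bash 4.4 or later (where mapfile gained -d); the array names with_nl and parts are only illustrative. It compares the here-string with feeding the string through printf '%s', which adds no newline.

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.4 (mapfile -d).
foo='foo;bar;bletch'

# Here-string form from the post: <<< appends a newline, so the
# last element comes out as $'bletch\n' rather than "bletch".
mapfile -td ';' with_nl <<< "$foo"
declare -p with_nl

# printf '%s' adds no newline; mapfile still stores the final
# field even though no ';' follows it.
mapfile -td ';' parts < <(printf '%s' "$foo")
declare -p parts
```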

    Caveats:
    1) You can only have one, single character delimiter. It'd be nice if
    you could have a reg-exp, like in GAWK.
    2) If the output you're processing comes from a process, as is usually
    the case, special care must be taken:

    mapfile -td ';' < <(someprocess | awk 1 ORS=)

    The point is that since ';' is now the delimiter, the newline at the end of
    the line coming from someprocess is not treated as a delimiter, so it must
    be disposed of with the AWK script. It'd be nice if you could make both
    ';' and '\n' be recognized as delimiters.
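    Piping through tr '\n' ';' is one way (not from the original post) to have both ';' and '\n' act as delimiters. In this sketch someprocess is a hypothetical stand-in function that prints two lines. Note that with awk 1 ORS= multi-line output is glued together, so the second field comes out as "twothree".

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "someprocess": two lines, ';' inside each.
someprocess() { printf 'one;two\nthree;four\n'; }

# awk 1 ORS= drops the newlines entirely, joining "two" and "three".
mapfile -td ';' glued < <(someprocess | awk 1 ORS=)
declare -p glued

# tr turns each newline into ';', so both act as delimiters. The ';'
# that replaces the final newline does not add an empty last element.
mapfile -td ';' fields < <(someprocess | tr '\n' ';')
declare -p fields
```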

    But other than those two caveats, it works well.

    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Cancer

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lem Novantotto@21:1/5 to All on Sun Nov 10 00:51:58 2024
    On Sat, 9 Nov 2024 16:19:17 -0000 (UTC), Kenny McCormack wrote:

    First note/caveat: I'm not interested in any solution involving IFS, for
    two reasons:
    1) IFS-based solutions never work for me.
    2) Changing IFS is inherently dangerous, because, well, IFS itself is
    inherently dangerous. Yes, I know it has been somewhat de-fanged
    recently - but it is still dangerous.

    Sorry to bother you, but... could you please give me some hints? Why
    dangerous? Thanks. :-)

    Anyway, the point of this thread is that I have recently developed a
    good solution for this, using bash's "mapfile" command.
    [...]
    mapfile -td ';' <<< "$foo"

    Yes, mapfile is a good solution. Of course, when you use <<<, no word
    splitting is performed, so IFS is out of the question.

    Caveats:
    1) You can only have one, single character delimiter.

    But you could have more using IFS. Something like:

    | $ IFS=";:[ENTER]
    | "
    | $ declare myarray
    | $ for i in $(echo "one:two;three 3[ENTER]
    | four"); do myarray+=( $i ); done
    | $ echo ${myarray[@]}
    | one two three 3 four
    | $ echo ${myarray[2]}
    | three 3

    What would be wrong with it?
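    For comparison, here is a sketch of the same split as the transcript above that confines the IFS change to a single read command instead of setting it for the whole shell. The names s and myarray are illustrative, and this is one possible variant, not what was posted.

```shell
#!/usr/bin/env bash
s=$'one:two;three 3\nfour'

# The IFS assignment prefixes read, so it applies to that one command
# only. read -a does no pathname expansion, so a field like "*" stays
# literal. -d '' reads to end of input; read then returns 1 at EOF,
# which is expected here (hence "|| true" for scripts using set -e).
# Newline is IFS whitespace, so the newline <<< appends is dropped.
IFS=$';:\n' read -r -d '' -a myarray <<< "$s" || true
declare -p myarray
```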
    --
    Bye, Lem

  • From Lawrence D'Oliveiro@21:1/5 to Lem Novantotto on Sun Nov 10 01:25:49 2024
    On Sun, 10 Nov 2024 00:51:58 -0000 (UTC), Lem Novantotto wrote:

    But you could have more using IFS.

    IFS is the way to go.

    If you want your change to IFS to be only temporary, you can restrict it
    to a subshell by putting the code sequence in “( ... )”. But then you cannot pass variables back to the parent shell.

    Another option is to use a coproc command.
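    A sketch of the subshell approach that still gets the fields back into the parent shell: the IFS change lives inside a process substitution, the fields are written one per line, and mapfile reads them in the parent. This assumes no field contains a newline; s and parts are illustrative names.

```shell
#!/usr/bin/env bash
s='foo;bar;bletch'

# IFS=';' and set -f (no globbing of a field like "*") only affect
# the <( ... ) subshell. The fields come back over a pipe, one per
# line, and mapfile collects them in the parent shell.
mapfile -t parts < <( IFS=';'; set -f; printf '%s\n' $s )
declare -p parts
```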

  • From Kenny McCormack@21:1/5 to Kenny McCormack on Sun Nov 10 05:55:31 2024
    In article <vgo225$91aq$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    ...
    First note/caveat: I'm not interested in any solution involving IFS, for
    two reasons:
    1) IFS-based solutions never work for me.
    2) Changing IFS is inherently dangerous, because, well, IFS itself is inherently dangerous. Yes, I know it has been somewhat de-fanged
    recently - but it is still dangerous.
    ...
    This works well, with a couple of caveats:

    mapfile -td ';' <<< "$foo"

    Caveats:
    1) You can only have one, single character delimiter. It'd be nice if
    you could have a reg-exp, like in GAWK.
    2) If the output you're processing comes from a process, as is usually
    the case, special care must be taken:

    mapfile -td ';' < <(someprocess | awk 1 ORS=)

    It occurred to me after posting this that a somewhat simpler approach would
    be to just convert all the delimiters to newlines, like this:

    mapfile -t < <(someprocess | sed 's/;/\n/g')
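    A side note on the sed variant: \n in the replacement of s/// is understood by GNU sed, but other seds (for example the BSD sed on macOS) may not honor it. tr ';' '\n' performs the same one-character translation more portably. Because newline is now the only delimiter, any newlines already present in the output also act as separators. In this sketch someprocess is again a hypothetical stand-in.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "someprocess".
someprocess() { printf 'foo;bar;bletch\n'; }

# Turn every ';' into a newline, then read one element per line.
mapfile -t parts < <(someprocess | tr ';' '\n')
declare -p parts
```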

    --
    1) The only professionals who refer to their customers as "users" are
    computer guys and drug dealers.
    2) The only professionals who refer to their customers as "clients" are
    lawyers and prostitutes.

  • From Axel Reichert@21:1/5 to Kenny McCormack on Sun Nov 10 09:14:45 2024
    gazelle@shell.xmission.com (Kenny McCormack) writes:

    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    mapfile -td ';' < <(someprocess | awk 1 ORS=)

    [...]

    mapfile -t < <(someprocess | sed 's/;/\n/g')

    And in your original post you wrote:

    There is a feature that is prominently missing from the shell language
    (I am speaking primarily of bash here) - which is the ability to split
    a string on a delimiter. This is a common operation in most other
    text-processing oriented languages (AWK, Perl, etc).

    So why bother with a shell solution and why bother with avoiding IFS,
    when in the end you need to resort to AWK/sed anyway?

    Do not get me wrong, I am learning a lot in this thread here, much of
    the stuff is far beyond my level of expertise in shell programming, and
    it would be great to have a shell-only solution for your inquiry, even
    if only for "academic reasons" (because, say, the solution still to
    come may turn out to be too clumsy for daily use). I will applaud such
    a result, but for the time being I would be happy if you could elaborate somewhat more about your motivation for this exercise.

    Best regards

    Axel

  • From Kenny McCormack@21:1/5 to janis_papanagnou+ng@hotmail.com on Sun Nov 10 09:32:10 2024
    In article <vgptv9$ahrg$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    ...
    To add another thought to the original question, and since I know
    that the OP already has the relevant experience for that; for such
    a basic function writing a shell built-in could be appropriate.
    (That's not portable, but I think the OP doesn't care about that.)

    Not a bad idea, actually.

    In fact, one of my motivations for posting is the hope that a bash dev
    might see this and be motivated to add the functionality to the shell.
    Then I wouldn't have to write it myself.

    That and/or be motivated to fix the IFS handling in the shell.

    --
    Marshall: 10/22/51
    Jessica: 4/4/79

  • From Janis Papanagnou@21:1/5 to Axel Reichert on Sun Nov 10 10:21:44 2024
    On 10.11.2024 09:14, Axel Reichert wrote:
    gazelle@shell.xmission.com (Kenny McCormack) writes:

    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    mapfile -td ';' < <(someprocess | awk 1 ORS=)

    [...]

    mapfile -t < <(someprocess | sed 's/;/\n/g')

    And in your original post you wrote:

    There is a feature that is prominently missing from the shell language
    (I am speaking primarily of bash here) - which is the ability to split
    a string on a delimiter. This is a common operation in most other
    text-processing oriented languages (AWK, Perl, etc).

    So why bother with a shell solution and why bother with avoiding IFS,
    when in the end you need to resort to AWK/sed anyway?

    If we're on a targeted platform, looking for a solution for a simple
    function, but have had bad experiences with some technical detail
    on that platform, then we look in various directions to
    find any (sufficiently acceptable) workaround. - Sort of; I guess.

    (Personally I'd say that 'IFS' is not so bad that it must be avoided
    completely. But I'm not the OP.)


    Do not get me wrong, I am learning a lot in this thread here, much of
    the stuff is far beyond my level of expertise in shell programming, and
    it would be great to have a shell-only solution for your inquiry, even
    if only for "academic reasons" (because, say, the solution still to
    come may turn out to be too clumsy for daily use). I will applaud such
    a result, but for the time being I would be happy if you could elaborate somewhat more about your motivation for this exercise.

    Since the OP is (usually) very clear about ruling out some solutions
    as unacceptable to him, I haven't bothered replying. - But if you're
    asking, I'd probably try for a shell-only solution along the lines of
    doing a substitution like arr=( ${var//[$'\n';]/|} ) to fill an
    array, with 'IFS' defined appropriately (using the '|' here).
    (This is just the outline of an idea not a solution, so be careful
    with that fragment!)
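    One possible fleshing-out of that outline, offered as a sketch rather than a finished solution: each delimiter is mapped to '|' in two separate substitutions (sidestepping the bracket-expression form), then word splitting with IFS='|' builds the array. The function name and the global arr are illustrative assumptions; local - (bash 4.4+) and local IFS keep set -f and the IFS change from leaking out. A field that itself contains '|' would be split as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split $1 on ';' and newline into global arr.
split_semis_newlines() {
  local -                  # set -f below is undone on return
  local IFS='|'            # IFS change is undone on return
  set -f                   # no globbing of the unquoted expansion
  local tmp=${1//;/|}      # ';' -> '|'
  tmp=${tmp//$'\n'/|}      # newline -> '|'
  arr=( $tmp )             # unquoted on purpose: split on '|'
}

var=$'one;two\nthree;four'
split_semis_newlines "$var"
declare -p arr
```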

    To add another thought to the original question, and since I know
    that the OP already has the relevant experience for that; for such
    a basic function writing a shell built-in could be appropriate.
    (That's not portable, but I think the OP doesn't care about that.)

    Janis


    Best regards

    Axel


  • From Jerry Peters@21:1/5 to Lawrence D'Oliveiro on Sun Nov 17 00:43:31 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sun, 10 Nov 2024 00:51:58 -0000 (UTC), Lem Novantotto wrote:

    But you could have more using IFS.

    IFS is the way to go.

    If you want your change to IFS to be only temporary, you can restrict it
    to a subshell by putting the code sequence in “( ... )”. But then you cannot pass variables back to the parent shell.

    Another option is to use a coproc command.

    Or put it in a function and declare IFS local.
    ~$ xxx () { local IFS="$IFS|" ; echo "$IFS" ; }
    ~$ xxx

    |
    ~$ echo $IFS
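    Building on the local IFS idea, here is a sketch of a small reusable splitter. split_into is a hypothetical helper, not a bash builtin; it assumes bash 4.4 or later (for local - and namerefs) and writes into the array named by its first argument. As with the other IFS-based approaches, each delimiter is a single character, and the caller's array must not itself be named _split_out (the nameref would then refer to itself).

```shell
#!/usr/bin/env bash
# Hypothetical helper. Usage: split_into ARRAY_NAME DELIM_CHARS STRING
split_into() {
  local -n _split_out=$1   # nameref to the caller's array
  local IFS=$2             # restored automatically on return
  local -                  # likewise for set -f below
  set -f                   # keep a field like "*" from being globbed
  _split_out=( $3 )        # unquoted on purpose: word splitting does the work
}

declare -a parts
split_into parts ';' 'foo;bar;bletch'
declare -p parts
split_into parts $';\n' $'one;two\nthree'
declare -p parts
```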

  • From Janis Papanagnou@21:1/5 to Kenny McCormack on Sun Nov 10 14:05:29 2024
    On 10.11.2024 10:32, Kenny McCormack wrote:

    In fact, one of my motivations for posting is the hope that a bash dev
    might see this and be motivated to add the functionality to the shell.
    Then I wouldn't have to write it myself.

    :-)


    That and/or be motivated to fix the IFS handling in the shell.

    What specifically are you referring to? - I can't imagine that there's
    a bash-specific bug with IFS, since IFS handling is a very old concept
    in Unix shells. - OTOH, I cannot imagine changing long-existing
    behavior without breaking tons of code. This would at best [in bash]
    lead to yet another 'shopt' setting to be explicitly activated. - But
    how should it then behave?

    Janis

  • From Lem Novantotto@21:1/5 to All on Sun Nov 10 13:28:32 2024
    On Sun, 10 Nov 2024 01:25:49 -0000 (UTC), Lawrence D'Oliveiro wrote:

    IFS is the way to go.

    I agree.

    I've thought about it again, and apart from the fact that IFS can handle only one-char delimiters (which wasn't at issue in the OP's example)... I
    actually cannot think of any major issues with it (let alone ones that
    would need a fix). My bad, maybe.
    --
    Bye, Lem

  • From Kenny McCormack@21:1/5 to mail@axel-reichert.de on Mon Dec 9 12:32:41 2024
    In article <87msi74ia2.fsf@axel-reichert.de>,
    Axel Reichert <mail@axel-reichert.de> wrote:
    gazelle@shell.xmission.com (Kenny McCormack) writes:

    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    mapfile -td ';' < <(someprocess | awk 1 ORS=)

    [...]

    mapfile -t < <(someprocess | sed 's/;/\n/g')

    And in your original post you wrote:

    There is a feature that is prominently missing from the shell language
    (I am speaking primarily of bash here) - which is the ability to split
    a string on a delimiter. This is a common operation in most other
    text-processing oriented languages (AWK, Perl, etc).

    So why bother with a shell solution and why bother with avoiding IFS,
    when in the end you need to resort to AWK/sed anyway?

    Because I need the result in the shell script.

    (Obviously) Re-writing the whole app in AWK is not an option. Certainly
    not at this point in time.

    Do not get me wrong, I am learning a lot in this thread here, much of
    the stuff is far beyond my level of expertise in shell programming, and
    it would be great to have a shell-only solution for your inquiry, even
    if only for "academic reasons" (because, say, the solution still to
    come may turn out to be too clumsy for daily use). I will applaud such
    a result, but for the time being I would be happy if you could elaborate somewhat more about your motivation for this exercise.

    Primarily in the hopes that someone on the dev side of my chosen shell
    (bash) will see it and say to themselves "Hey, that's a good idea."

    I prefer to post about these sorts of things here (by "here", I mean
    Usenet). I'm too lazy to participate in the "official" channels for the
    FOSS software that I use.

    --
    Debating creationists on the topic of evolution is rather like trying to
    play chess with a pigeon --- it knocks the pieces over, craps on the
    board, and flies back to its flock to claim victory.
