• Cleaning up background processes

    From Christian Weisgerber@21:1/5 to All on Sun May 5 19:06:22 2024
    Is there a standard POSIX shell idiom to clean up background
    processes?

    You have a shell script that starts some background process with &.
    Now you want to make sure that the background process terminates
    when the shell script terminates. In particular, when it terminates
    due to special circumstances.

    A noninteractive shell doesn't use job control, so the background
    process shares the same process group. That's great! When somebody
    hits ^C, SIGINT is sent to the whole process group. The same for
    ^\ and SIGQUIT, and for SIGHUP when the modem hangs up^W^W^Wxterm
    is closed. So the background process will be terminated by default.

    Except... bash seems to block SIGINT for background processes. As
    does FreeBSD's sh with both SIGINT and SIGQUIT. What now?

    Also, the shell script is typically invoked from some implementation
    of make(1), which seems to add more complications.

    This seems like a sufficiently common problem that there must be a
    standard solution.

    --
    Christian "naddy" Weisgerber naddy@mips.inka.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Christian Weisgerber on Sun May 5 20:21:56 2024
    On Sun, 5 May 2024 19:06:22 -0000 (UTC), Christian Weisgerber wrote:

    You have a shell script that starts some background process with &.
    Now you want to make sure that the background process terminates
    when the shell script terminates.

    There is no standard POSIX solution to this. There is a Linux-specific
    solution, through the prctl(2) call
    <https://manpages.debian.org/2/prctl.en.html>: in a child process, you
    make a call with PR_SET_PDEATHSIG and specify a signal to be sent to
    the child if/when the parent terminates, such as SIGKILL.

    Note also the PR_SET_CHILD_SUBREAPER function: this is useful if a
    child process spawns its own child and then dies; this allows the
    subreaper process to get the notification when that child-of-a-child
    terminates, instead of passing that buck back to pid 1.

  • From Christian Weisgerber@21:1/5 to Christian Weisgerber on Sun May 5 19:52:18 2024
    On 2024-05-05, Christian Weisgerber <naddy@mips.inka.de> wrote:

    Except... bash seems to block SIGINT for background processes. As
    does FreeBSD's sh with both SIGINT and SIGQUIT. What now?

    On repeat experiment, both bash and FreeBSD sh only block SIGINT
    for background processes.
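
    One way to repeat this kind of experiment without a terminal is
    sketched below. It is only a probe under stated assumptions: sleep
    stands in for the real background process, probe_plain_bg and the
    one-second pauses are made up for illustration, and "kill -l" is
    used to turn a wait status back into a signal name.

```shell
#!/bin/sh
# Probe (hypothetical helper): start a background job from a
# non-interactive shell, send it SIGINT, then SIGTERM, and report which
# signal actually ended it.
probe_plain_bg() {
    sh -c '
        sleep 30 >/dev/null 2>&1 &     # stand-in for the real job
        pid=$!
        sleep 1                        # let the child start up
        kill -INT "$pid"               # roughly what ^C would deliver
        sleep 1                        # let the signal take effect
        kill -TERM "$pid" 2>/dev/null  # fallback if INT was ignored
        wait "$pid"
        kill -l "$?"                   # wait status -> signal name
    '
}

sig=$(probe_plain_bg)
echo "background job was ended by: $sig"
```

    A result of TERM would mean the job survived SIGINT and was only
    stopped by the fallback; INT would mean it kept the default action.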

    --
    Christian "naddy" Weisgerber naddy@mips.inka.de

  • From vallor@21:1/5 to naddy@mips.inka.de on Mon May 6 00:19:32 2024
    On Sun, 5 May 2024 19:06:22 -0000 (UTC), Christian Weisgerber <naddy@mips.inka.de> wrote in <slrnv3fm5e.jrj.naddy@lorvorc.mips.inka.de>:

    Is there a standard POSIX shell idiom to clean up background processes?

    You have a shell script that starts some background process with &. Now
    you want to make sure that the background process terminates when the
    shell script terminates. In particular, when it terminates due to
    special circumstances.

    A noninteractive shell doesn't use job control, so the background
    process shares the same process group. That's great! When somebody
    hits ^C, SIGINT is sent to the whole process group. The same for
    ^\ and SIGQUIT, and for SIGHUP when the modem hangs up^W^W^Wxterm
    is closed. So the background process will be terminated by default.

    Except... bash seems to block SIGINT for background processes. As
    does FreeBSD's sh with both SIGINT and SIGQUIT. What now?

    Also, the shell script is typically invoked from some implementation
    of make(1), which seems to add more complications.

    This seems like a sufficiently common problem that there must be a
    standard solution.

    I have scripts that need to kill off processes started with
    "&", and there are a couple of ways to do it.

    One way is "kill -1 0" -- sending the HUP signal with "0"
    as the process number will send the signal to the process group.

    Another way is to remember the process, such as:

    sleep 1000000 & # backgrounded process
    BACKGROUND=$!

    And then set a trap:

    trap "kill -1 $BACKGROUND" 0

    You can get even fancier with something like:

    hang_up_the_phone()
    {
        HPID=$1
        # kill -0 sends nothing; it only checks the process still exists
        kill -0 "$HPID" > /dev/null 2>&1 && kill -1 "$HPID"
    }

    trap "hang_up_the_phone $BACKGROUND" 0

    I have a script that uses both, if you'd like me
    to post it -- it runs xdaliclock in timer mode, as well
    as a sleep for the length of the timer. For that, the
    trap looks like this:

    trap "hang_up_the_phone $BACK1 ; \
    hang_up_the_phone $BACK2 ; " 0 CHLD;
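
    When there are more than two background processes, the same idea
    can be written with a list of PIDs and a single handler. This is a
    minimal sketch, not the script mentioned above: pids, start_bg and
    cleanup are hypothetical names, and sleep stands in for the real
    programs.

```shell
#!/bin/sh
# Track every background PID in one list and clean them all up in a
# single handler. The names (pids, start_bg, cleanup) are hypothetical.
pids=""

start_bg() {
    "$@" >/dev/null 2>&1 &
    pids="$pids $!"
}

cleanup() {
    for pid in $pids; do
        # kill -0 only tests whether the process can still be signalled
        kill -0 "$pid" 2>/dev/null && kill -HUP "$pid"
    done
    for pid in $pids; do
        wait "$pid" 2>/dev/null      # reap, so nothing lingers as a zombie
    done
    pids=""
}

trap cleanup 0                       # 0 is the exit condition

start_bg sleep 30
start_bg sleep 40
started=$pids

cleanup                              # called by hand here to show the effect

alive=0
for pid in $started; do
    kill -0 "$pid" 2>/dev/null && alive=$((alive + 1))
done
echo "still running after cleanup: $alive"
```

    In a real script the explicit cleanup call would be dropped and the
    trap left to do the work when the script exits.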

    --
    -v

  • From Kaz Kylheku@21:1/5 to Christian Weisgerber on Mon May 6 04:46:59 2024
    On 2024-05-05, Christian Weisgerber <naddy@mips.inka.de> wrote:
    Is there a standard POSIX shell idiom to clean up background
    processes?

    You have a shell script that starts some background process with &.
    Now you want to make sure that the background process terminates
    when the shell script terminates. In particular, when it terminates
    due to special circumstances.

    Maybe have an EXIT trap which calls wait?

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

  • From Kenny McCormack@21:1/5 to 643-408-1753@kylheku.com on Mon May 6 12:08:39 2024
    In article <20240505214609.114@kylheku.com>,
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    On 2024-05-05, Christian Weisgerber <naddy@mips.inka.de> wrote:
    Is there a standard POSIX shell idiom to clean up background
    processes?

    You have a shell script that starts some background process with &.
    Now you want to make sure that the background process terminates
    when the shell script terminates. In particular, when it terminates
    due to special circumstances.

    Maybe have an EXIT trap which calls wait?

    The fundamental underlying problem here is that the EXIT trap is only
    called on a "normal" exit. In particular, it does not get called under
    (at least) the following circumstances:

    1) User hits ^C causing the script to abort.

    2) Script exits via an "exec" statement.

    As I read it, that's what this thread is actually about.

    This is a problem I've often grappled with and I'm convinced that there is
    no universal solution.

    Having said that, I think we are all making our own assumptions about
    what the actual, underlying problem is. Given that OP is not a newbie,
    it would help a lot if he would clarify what exact situation he is
    dealing with, rather than have us all guess (which is SOP when the
    poster *is* a newbie).

    --
    Indeed, most .NET developers couldn't pass CS101 at a third-rate
    community college.
    - F. Russell -

  • From vallor@21:1/5 to McCormack on Wed May 8 21:04:58 2024
    On Mon, 6 May 2024 12:08:39 -0000 (UTC), gazelle@shell.xmission.com (Kenny McCormack) wrote in <v1ah87$l0a8$1@news.xmission.com>:

    In article <20240505214609.114@kylheku.com>,
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    On 2024-05-05, Christian Weisgerber <naddy@mips.inka.de> wrote:
    Is there a standard POSIX shell idiom to clean up background
    processes?

    You have a shell script that starts some background process with &.
    Now you want to make sure that the background process terminates when
    the shell script terminates. In particular, when it terminates due to
    special circumstances.

    Maybe have an EXIT trap which calls wait?

    The fundamental underlying problem here is that the EXIT trap is only
    called on a "normal" exit. In particular, it does not get called under
    (at least) the following circumstances:

    1) User hits ^C causing the script to abort.

    I'm sorry, but I've waited and nobody said anything, so
    I have to ask: Why couldn't you trap "kill -1 0" INT?

    2) Script exits via an "exec" statement.

    This is true, and I can't see any way around that.

    As I read it, that's what this thread is actually about.

    This is a problem I've often grappled with and I'm convinced that there
    is no universal solution.

    Having said that, I think we are all making our own assumptions about
    what the actual, underlying problem is. Given that OP is not a newbie,
    it would help a lot if he would clarify what exact situation he is
    dealing with, rather than have us all guess (which is SOP when the
    poster *is* a newbie).

    Wish they would clarify what they meant...

    --
    -v

  • From Janis Papanagnou@21:1/5 to vallor on Thu May 9 01:06:18 2024
    On 08.05.2024 23:04, vallor wrote:
    On Mon, 6 May 2024 12:08:39 -0000 (UTC), gazelle@shell.xmission.com (Kenny McCormack) wrote in <v1ah87$l0a8$1@news.xmission.com>:

    In article <20240505214609.114@kylheku.com>,
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    On 2024-05-05, Christian Weisgerber <naddy@mips.inka.de> wrote:
    Is there a standard POSIX shell idiom to clean up background
    processes?

    You have a shell script that starts some background process with &.
    Now you want to make sure that the background process terminates when
    the shell script terminates. In particular, when it terminates due to
    special circumstances.

    Maybe have an EXIT trap which calls wait?

    'wait' wouldn't abort the process (as the OP seems to have required).


    The fundamental underlying problem here is that the EXIT trap is only
    called on a "normal" exit. In particular, it does not get called under
    (at least) the following circumstances:

    1) User hits ^C causing the script to abort.

    I'm sorry, but I've waited and nobody said anything, so

    I hadn't answered because it doesn't work as had been advertised. With

    trap 'echo INT' INT
    trap 'echo EXIT' EXIT

    and hitting ^C I get both the signal (INT) and the pseudo-signal (EXIT),
    in that order (with ksh, bash) [and of course that makes sense]

    INT
    EXIT

    so it would be no problem, in my book.
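
    The two-trap experiment can be approximated without a terminal. In
    this sketch the inner shell sends SIGINT to itself, which is an
    assumption standing in for ^C: a real ^C goes to the whole
    foreground process group, not just to the shell.

```shell
#!/bin/sh
# Reproduce the two-trap experiment without a terminal: the inner shell
# sends SIGINT to itself instead of waiting for someone to type ^C.
out=$(sh -c '
    trap "echo INT" INT
    trap "echo EXIT" EXIT
    kill -INT $$        # self-signal, standing in for ^C
    :                   # the script carries on after the INT trap
')
echo "$out"
```

    Because the INT trap catches the signal, the shell keeps running and
    later exits normally, which is when the EXIT action fires. A script
    with only an EXIT trap and no INT trap is a different case and is
    not covered by this sketch.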

    But the pseudo-signal 'EXIT' is non-standard (to my knowledge), so you
    cannot generally rely on it, depending on your environment, and the OP
    asked for a "standard POSIX shell idiom".

    I have to ask: Why couldn't you trap "kill -1 0" INT?

    Storing PIDs and killing the respective processes on exit is something
    I had done in the past, making use of the process groups is something
    I'd also consider; both things you already proposed upthread, IIRC.

    Janis

    [...]

  • From Geoff Clare@21:1/5 to Janis Papanagnou on Fri May 10 13:35:37 2024
    Janis Papanagnou wrote:

    But the pseudo-signal 'EXIT' is non-standard (to my knowledge), so you
    cannot generally rely on it, depending on your environment, and the OP
    asked for a "standard POSIX shell idiom".

    Quoting POSIX.2-1992, 3.14.13:

    The condition can be EXIT; 0 (equivalent to EXIT); or a signal
    specified using a symbolic name, without the SIG prefix, as listed
    in Required Signals and Job Control Signals (Table 3-1 and Table 3-2
    in POSIX.1 {8}). (For example: HUP, INT, QUIT, TERM). Setting a
    trap for SIGKILL or SIGSTOP produces undefined results.

    It has hardly changed in the current standard (POSIX.1-2017):

    The condition can be EXIT, 0 (equivalent to EXIT), or a signal
    specified using a symbolic name, without the SIG prefix, as listed
    in the tables of signal names in the <signal.h> header defined in
    XBD Chapter 13; for example, HUP, INT, QUIT, TERM. Implementations
    may permit names with the SIG prefix or ignore case in signal
    names as an extension. Setting a trap for SIGKILL or SIGSTOP
    produces undefined results.
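
    As a small illustration of the quoted wording, the sketch below
    registers the same action once under the condition 0 and once under
    EXIT, each in its own throwaway shell. The variable names are made
    up for the example.

```shell
#!/bin/sh
# The same exit action registered under both spellings of the condition.
by_number=$(sh -c 'trap "echo cleanup ran" 0')
by_name=$(sh -c 'trap "echo cleanup ran" EXIT')
echo "0:    $by_number"
echo "EXIT: $by_name"
```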

    --
    Geoff Clare <netnews@gclare.org.uk>

  • From Janis Papanagnou@21:1/5 to Geoff Clare on Fri May 10 14:57:41 2024
    On 10.05.2024 14:35, Geoff Clare wrote:
    Janis Papanagnou wrote:

    But the pseudo-signal 'EXIT' is non-standard (to my knowledge), so you
    cannot generally rely on it, depending on your environment, and the OP
    asked for a "standard POSIX shell idiom".

    Quoting POSIX.2-1992, 3.14.13:

    The condition can be EXIT; 0 (equivalent to EXIT); or a signal
    specified using a symbolic name, without the SIG prefix, as listed
    in Required Signals and Job Control Signals (Table 3-1 and Table 3-2
    in POSIX.1 {8}). (For example: HUP, INT, QUIT, TERM). Setting a
    trap for SIGKILL or SIGSTOP produces undefined results.

    It has hardly changed in the current standard (POSIX.1-2017):

    The condition can be EXIT, 0 (equivalent to EXIT), or a signal
    specified using a symbolic name, without the SIG prefix, as listed
    in the tables of signal names in the <signal.h> header defined in
    XBD Chapter 13; for example, HUP, INT, QUIT, TERM. Implementations
    may permit names with the SIG prefix or ignore case in signal
    names as an extension. Setting a trap for SIGKILL or SIGSTOP
    produces undefined results.


    That's good. (And I was obviously misremembering.) Thanks!

    BTW, I'm astonished about the "undefined results" for KILL/STOP. Yes,
    on OS-level they cannot be caught but why undefined behavior on shell
    level; what is the reason or practical rationale for that?

    Janis

  • From Christian Weisgerber@21:1/5 to Kenny McCormack on Sat May 11 14:30:42 2024
    On 2024-05-06, Kenny McCormack <gazelle@shell.xmission.com> wrote:

    Having said that, I think we are all making our own assumptions about
    what the actual, underlying problem is. Given that OP is not a newbie,
    it would help a lot if he would clarify what exact situation he is
    dealing with, rather than have us all guess (which is SOP when the
    poster *is* a newbie).

    As part of a regression test suite, somebody wrote something like
    this in a script:

    ./http-server &
    trap "kill %1" HUP INT QUIT PIPE TERM
    sleep 1 # server starts up

    [... Tests ...]

    kill %1
    wait %1 # wait for http-server

    Using job control syntax in a noninteractive shell is already wrong,
    although, to my surprise, it works as intended in common shells.

    I've been looking at how to replace this with a portable-ish,
    reliable-ish solution. An obvious step is to use a standard
    background process reference $! instead of job control %1.
    Less obvious is the question how to deal with signals that abort
    the script, without leaving background processes hanging around.
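
    A rough sketch of that first step, with $! in place of %1, might
    look like the following. It is an illustration rather than the
    actual test script: "sleep 30" stands in for ./http-server,
    stop_server is a hypothetical helper, and the exit codes assume the
    usual signal numbers (HUP 1, INT 2, QUIT 3, PIPE 13, TERM 15).

```shell
#!/bin/sh
# Sketch of the test script using $! instead of %1. "sleep 30" is only a
# stand-in for ./http-server; stop_server is a hypothetical helper.
sleep 30 >/dev/null 2>&1 &
server=$!

stop_server() {
    trap - EXIT                       # do not run a second time on exit
    kill "$server" 2>/dev/null        # default signal is TERM
    wait "$server" 2>/dev/null        # wait for the server to go away
}

trap stop_server EXIT
# Turn the usual fatal signals into a normal exit so the EXIT trap runs.
trap 'exit 129' HUP
trap 'exit 130' INT
trap 'exit 131' QUIT
trap 'exit 141' PIPE
trap 'exit 143' TERM

sleep 1                               # server starts up

# [... Tests ...]

stop_server
if kill -0 "$server" 2>/dev/null; then
    server_state=running
else
    server_state=stopped
fi
echo "server is $server_state"
```

    Since the server is stopped with TERM from the script itself, this
    variant does not depend on whether the background process ignores
    SIGINT.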

    After remembering process groups, I thought I wouldn't need to trap
    any signal at all, because common interruptions (hangup, intr, quit)
    send signals to the whole group. Then I discovered that FreeBSD
    sh and bash start background processes with SIGINT ignored. Sigh.
    So I need to trap INT and convert it to some other signal.

    Another complication is that the script is typically invoked from
    make(1), so we have a make(1) process, an sh(1) process executing
    the script, potential children, and the background process it spawned,
    all in the same process group. That means that the approach "trap
    INT and manually signal process group" also signals make(1) with
    whatever signal handling that carries. And I haven't checked GNU
    make yet.

    I also had forgotten about "kill -<sig> 0" and was playing around
    with "kill -<sig> -$$", which is wrong when make(1) is the process
    group leader rather than sh(1).
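
    Whether "kill -<sig> -$$" can work at all depends on the shell
    being the process group leader, which can be checked with a sketch
    like this one. It assumes a ps that understands the POSIX options
    -o pgid= and -p; the variable names are made up, and neither kill
    is actually run, because both would also signal the script itself.

```shell
#!/bin/sh
# Compare this shell's PID with its process group ID. "kill -TERM -$$"
# only addresses the group when the two match; "kill -TERM 0" does not
# depend on that.
pgid=$(ps -o pgid= -p "$$" 2>/dev/null | tr -d ' ')
if [ -z "$pgid" ]; then
    leader=unknown                    # ps missing or options unsupported
elif [ "$pgid" = "$$" ]; then
    leader=yes
else
    leader=no
fi
echo "pid=$$ pgid=${pgid:-?} group leader: $leader"
```

    Run from make(1), a script would be expected to report "no" here,
    since the group ID would be that of some ancestor process.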

    --
    Christian "naddy" Weisgerber naddy@mips.inka.de

  • From Christian Weisgerber@21:1/5 to vallor on Sat May 11 14:37:57 2024
    On 2024-05-08, vallor <vallor@cultnix.org> wrote:

    I'm sorry, but I've waited and nobody said anything, so
    I have to ask: Why couldn't you trap "kill -1 0" INT?

    There is the cosmetic problem that you'll get termination notices
    referring to the new signal ("Hangup"), so I'll go with the less
    confusing

    trap "kill -TERM 0" INT

    but yes, it may be as simple as that. Let me run some tests...

    --
    Christian "naddy" Weisgerber naddy@mips.inka.de

  • From Kenny McCormack@21:1/5 to naddy@mips.inka.de on Sat May 11 19:36:45 2024
    In article <slrnv3v08i.17e3.naddy@lorvorc.mips.inka.de>,
    Christian Weisgerber <naddy@mips.inka.de> wrote:
    On 2024-05-06, Kenny McCormack <gazelle@shell.xmission.com> wrote:

    Having said that, I think we are all making our own assumptions about
    what the actual, underlying problem is. Given that OP is not a newbie,
    it would help a lot if he would clarify what exact situation he is
    dealing with, rather than have us all guess (which is SOP when the
    poster *is* a newbie).

    As part of a regression test suite, somebody wrote something like
    this in a script:

    ./http-server &
    trap "kill %1" HUP INT QUIT PIPE TERM
    sleep 1 # server starts up

    [... Tests ...]

    kill %1
    wait %1 # wait for http-server

    Using job control syntax in a non-interactive shell is already wrong,
    although, to my surprise, it works as intended in common shells.

    Thanks for the followup. I agree that using job control syntax in a
    (non-interactive) script is weird and to be avoided - even if it does
    "work". It should be good enough to just store the value of $! and
    then use that later on.

    I also believe that there is no "silver bullet" solution to this problem.
    There should be; it is a shame that there isn't. You really should not
    even find yourself in the situation of having to hunt down and kill
    wayward processes when something unexpected happens in a script, but
    the fact is you do.

    For an example of this, I give you: sshfs. I love sshfs, but it is a fact
    that if you try to sshfs-mount something that "isn't there" (for various
    values of "isn't there"), it creates a mess that takes several steps to
    undo (clean up). I have successfully tackled this problem, but it is
    non-trivial, which is not unexpected, considering how many moving parts
    are involved in an sshfs mount operation.

    I've been looking at how to replace this with a portable-ish,
    reliable-ish solution. An obvious step is to use a standard
    background process reference $! instead of job control %1.
    Less obvious is the question how to deal with signals that abort
    the script, without leaving background processes hanging around.

    Someone mentioned using prctl() and PR_SET_PDEATHSIG earlier. I think
    that's a good idea, albeit not universal and, of course, Linux-specific.
    I think it would work in your specific case of a wayward http server.

    --
    Faith doesn't give you the answers; it just stops you from asking the questions.

  • From Kenny McCormack@21:1/5 to naddy@mips.inka.de on Sat May 11 19:24:37 2024
    In article <slrnv3v0m5.17e3.naddy@lorvorc.mips.inka.de>,
    Christian Weisgerber <naddy@mips.inka.de> wrote:
    On 2024-05-08, vallor <vallor@cultnix.org> wrote:

    I'm sorry, but I've waited and nobody said anything, so
    I have to ask: Why couldn't you trap "kill -1 0" INT?

    There is the cosmetic problem that you'll get termination notices
    referring to the new signal ("Hangup"), so I'll go with the less
    confusing

    trap "kill -TERM 0" INT

    but yes, it may be as simple as that. Let me run some tests...

    I don't get it. Is there any significant difference between hitting it
    with TERM vs. HUP? I think either/both will generate the message to
    which you allude.

    Anyway, to answer the "Can't you just...?" question posed earlier, the
    answer is that there really is no general answer - one that will work in
    all cases, all the time. Answering the overall thread question is also
    both shell- and OS-specific.

    Personally, I'd be happy to have a good, solid solution that was both
    Linux- and bash-specific, but it sounds like OP is (unfortunately)
    running some version of BSD.

    I will expand on this further in my next post.

    --

    "This ain't my first time at the rodeo"

    is a line from the movie, Mommie Dearest, said by Joan Crawford at a board meeting.

  • From Christian Weisgerber@21:1/5 to Kenny McCormack on Sat May 11 22:08:16 2024
    On 2024-05-11, Kenny McCormack <gazelle@shell.xmission.com> wrote:

    I have to ask: Why couldn't you trap "kill -1 0" INT?

    trap "kill -TERM 0" INT

    I don't get it. Is there any significant difference between hitting it
    with TERM vs. HUP?

    I find "Terminated" less confusing than "Hangup", that's all.
    Of course it would be even better if I could keep the asynchronous
    process from ignoring SIGINT in the first place... Oh, maybe I can!

    Instead of

    foo &

    I can run

    (trap - INT; exec foo) &

    and indeed that seems to restore the default behavior, i.e., terminate
    the process, for both FreeBSD sh and bash. Anybody see any problem
    with that approach?
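
    A quick way to check the effect on a given shell is sketched here.
    It launches the job the same way, with sleep standing in for foo,
    and reports which signal ended it. probe_reset_bg and the pauses
    are hypothetical, and the SIGINT is sent with kill rather than
    typed at a terminal.

```shell
#!/bin/sh
# Probe (hypothetical helper): same launch as "(trap - INT; exec foo) &",
# with sleep standing in for foo. Reports which signal ended the job.
probe_reset_bg() {
    sh -c '
        (trap - INT; exec sleep 30 >/dev/null 2>&1) &
        pid=$!
        sleep 1                        # let the child reach exec
        kill -INT "$pid"
        sleep 1                        # let the signal take effect
        kill -TERM "$pid" 2>/dev/null  # only matters if INT was ignored
        wait "$pid"
        kill -l "$?"                   # wait status -> signal name
    '
}

sig=$(probe_reset_bg)
echo "job started with SIGINT reset was ended by: $sig"
```

    INT would indicate that the reset took effect; TERM would indicate
    a shell that still leaves SIGINT ignored in the subshell.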

    I'd also be interested in historical insights how this "ignore SIGINT
    for asynchronous processes" behavior came to be.

    --
    Christian "naddy" Weisgerber naddy@mips.inka.de

  • From Geoff Clare@21:1/5 to Christian Weisgerber on Mon May 13 14:14:40 2024
    Christian Weisgerber wrote:

    Instead of

    foo &

    I can run

    (trap - INT; exec foo) &

    and indeed that seems to restore the default behavior, i.e., terminate
    the process, for both FreeBSD sh and bash. Anybody see any problem
    with that approach?

    It doesn't/didn't work in some shells, according to the POSIX rationale
    (XRAT C.2.11 "Signals and Error Handling"):

    Historically, some shell implementations silently ignored attempts
    to use trap to set SIGINT or SIGQUIT to the default action or to
    set a trap for them after they have been set to be ignored by the
    shell when it executes an asynchronous subshell (and job control
    is disabled). This behavior is not conforming. For example, if a
    shell script containing the following line is run in the
    foreground at a terminal:

    (trap - INT; exec sleep 10) & wait

    and is then terminated by typing the interrupt character, this
    standard requires that the sleep command is terminated by the
    SIGINT signal.

    I don't know which shells were affected or whether any might still
    be in use (without the behaviour having been corrected).

    I'd also be interested in historical insights how this "ignore SIGINT
    for asynchronous processes" behavior came to be.

    That's how shells behaved before job control was invented. It was
    how you could interrupt a foreground process without the signal
    affecting background processes.

    --
    Geoff Clare <netnews@gclare.org.uk>

  • From Martijn Dekker@21:1/5 to All on Sun May 19 20:34:41 2024
    Op 10-05-2024 om 13:57 schreef Janis Papanagnou:
    BTW, I'm astonished about the "undefined results" for KILL/STOP. Yes,
    on OS-level they cannot be caught but why undefined behavior on shell
    level; what is the reason or practical rationale for that?

    I suspect it was to allow the shell to error out on an attempt to trap
    those signals. As far as I know, yash is the only shell that actually
    does that.

    --
    || modernish -- harness the shell
    || https://github.com/modernish/modernish
    ||
    || KornShell lives!
    || https://github.com/ksh93/ksh
