Package: cron
Version: 3.0pl1-162
Severity: normal
X-Debbugs-Cc:
debian+bugs@jasiak.dev
Dear Maintainer,
* What led up to the situation?
Installation of libpam-fscrypt and the following crontab after a few
hours:
* * * * * touch /foo
* * * * * touch /foo
# ... repeat more times to increase probability of deadlock
* * * * * touch /foo
* What was the outcome of this action?
# ps auxf
root 432 0.0 0.2 6608 2732 ? Ss 22:14 0:00 /usr/sbin/cron -f
root 4823 0.0 1.4 1090900 14112 ? Sl 22:21 0:00 \_ /usr/sbin/CRON -f
root 5174 0.0 1.2 960148 12016 ? Ss 22:21 0:00 | \_ /usr/sbin/CRON -f
root 8095 0.0 1.4 1091156 14180 ? Sl 22:25 0:00 \_ /usr/sbin/CRON -f
root 8337 0.0 1.2 1025940 12032 ? Ss 22:25 0:00 | \_ /usr/sbin/CRON -f
root 12228 0.0 1.4 1090904 13868 ? Sl 22:30 0:00 \_ /usr/sbin/CRON -f
root 12519 0.0 1.2 1025688 12036 ? Ss 22:30 0:00 \_ /usr/sbin/CRON -f
/usr/sbin/CRON processes are deadlocked (in pam_fscrypt.so) and
resources are not released until manual intervention.
* What outcome did you expect instead?
I expected that touch /foo will be executed and CRON processes will
exit.
* More debugging:
My understanding of ithe relevant cron control flow:
* [cron] decides that it's time to execute entry X
* [cron] fork is called
* [child] prepares environment
* [child] fork is called
* [child] calls wait and waits for [grandchild] death
* [grandchild] continues to prepare environment
* [grandchild] execve final command
Let's focus on the following pids: 4823 [child], 5174 [grandchild]:
# cat /proc/4823/task/*/stack
[<0>] do_wait+0x160/0x2f0
[<0>] kernel_wait4+0xb4/0x160
[<0>] __do_sys_wait4+0xa2/0xb0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Based on source code futex (and threads that are calling futex) are not
part of cron codebase.
# gdb -p 4823
(gdb) thread apply all bt
Thread 6 (Thread 0x7f619ffff6c0 (LWP 5167) "cron"):
#0 0x00007f61d20bbea3 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#1 0x00007f61d20867f6 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#2 0x00007f61d240df58 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#3 0x0000000000000080 in ?? ()
#4 0x0000000000000000 in ?? ()
Thread 5 (Thread 0x7f61a4ffd6c0 (LWP 5162) "cron"):
#0 0x00007f61d20bbea3 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#1 0x00007f61d20867f6 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#2 0x000000c000038d48 in ?? ()
#3 0x0000000000000080 in ?? ()
#4 0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7f61a57fe6c0 (LWP 5085) "cron"):
#0 0x00007f61d20bbea3 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#1 0x00007f61d20867f6 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#2 0x000000c000038948 in ?? ()
#3 0x0000000000000080 in ?? ()
#4 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f61a5fff6c0 (LWP 5084) "cron"):
#0 0x00007f61d20bbea3 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#1 0x00007f61d2086871 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#2 0x00007f61d23df9b8 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#3 0x0000000000000080 in ?? ()
#4 0x00007f61a5ffec38 in ?? ()
#5 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f61d1fff6c0 (LWP 4967) "cron"):
#0 0x00007f61d20bbea3 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#1 0x00007f61d20867f6 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#2 0x00007f61d23df608 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#3 0x0000000000000080 in ?? ()
#4 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f61d2985840 (LWP 4823) "cron"):
#0 0x00007f61d2b30bd7 in __GI___wait4 (pid=-1, stat_loc=0x7ffe3ad2aae4, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
#1 0x000055a12c53bbda in ?? ()
#2 0x000055a12c53c675 in ?? ()
#3 0x000055a12c53b678 in ?? ()
#4 0x000055a12c538d05 in ?? ()
#5 0x00007f61d2a8424a in __libc_start_call_main (main=main@entry=0x55a12c538860, argc=argc@entry=2, argv=argv@entry=0x7ffe3ad2c588) at ../sysdeps/nptl/libc_start_call_main.h:58
#6 0x00007f61d2a84305 in __libc_start_main_impl (main=0x55a12c538860, argc=2, argv=0x7ffe3ad2c588, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe3ad2c578) at ../csu/libc-start.c:360
#7 0x000055a12c538df1 in ?? ()
gdb confirms they are part of pam_fscrypt.so. I guess it's a golang
runtime.
# cat /proc/5174/task/*/stack
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
# gdb -p 5174
Thread 1 (Thread 0x7f61d2985840 (LWP 5174) "cron"):
#0 0x00007f61d20bbea3 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#1 0x00007f61d20867f6 in ?? () from /lib/x86_64-linux-gnu/security/pam_fscrypt.so
#2 0x000000c000038148 in ?? ()
#3 0x0000000000000080 in ?? ()
#4 0x0000000000000000 in ?? ()
grandchild has only one thread and it also waits in pam_fscrypt.so.
For the remaining deadlocked CRON processes results are the same.
In do_command.c both child [4823] and grandchild [5174] enters pam. My
theory is that on first enter to pam we start golang runtime and on
second enter if we forked when child runtime was in "bad" state
grandchild will just deadlock waiting for action from missing threads
(the child process is created with a single thread - the one that
called fork()).
The easy workaround that works for me is to modify pam config for cron (/etc/pam.d/cron):
# The PAM configuration file for the cron daemon
@include common-auth # this line needs to be replaced with content of
# common-auth without pam_fscrypt.so
# Sets the loginuid process attribute
session required pam_loginuid.so
# Read environment variables from pam_env's default files, /etc/environment
# and /etc/security/pam_env.conf.
session required pam_env.so
# In addition, read system locale information
session required pam_env.so envfile=/etc/default/locale
@include common-account
@include common-session-noninteractive
# Sets up user limits, please define limits for cron tasks
# through /etc/security/limits.conf
session required pam_limits.so
The same issue exist in sid - this report was prepared on clean stable installation.
As an alternative I tried with cronie where everything works fine -
I didn't find pam enter in grandchild.
Tested on:
stable:
cron version: 3.0pl1-162
libpam-fscrypt version: 0.3.3-1+b6
sid:
cron version: 3.0pl1-189
libpam-fscrypt version: 0.3.5-1
cronie: 1.7.2-3
-- Package-specific info:
--- EDITOR:
--- /usr/bin/editor:
/usr/bin/nano
--- /usr/bin/crontab:
-rwxr-sr-x 1 root crontab 43648 Mar 2 2023 /usr/bin/crontab
--- /var/spool/cron:
drwxr-xr-x 3 root root 4096 Dec 20 19:38 /var/spool/cron
--- /var/spool/cron/crontabs:
drwx-wx--T 2 root crontab 4096 Dec 27 22:30 /var/spool/cron/crontabs
--- /etc/cron.d:
drwxr-xr-x 2 root root 4096 Dec 20 19:49 /etc/cron.d
--- /etc/cron.daily:
drwxr-xr-x 2 root root 4096 Dec 20 19:52 /etc/cron.daily
--- /etc/cron.hourly:
drwxr-xr-x 2 root root 4096 Dec 20 19:38 /etc/cron.hourly
--- /etc/cron.monthly:
drwxr-xr-x 2 root root 4096 Dec 20 19:38 /etc/cron.monthly
--- /etc/cron.weekly:
drwxr-xr-x 2 root root 4096 Dec 20 19:52 /etc/cron.weekly
-- System Information:
Debian Release: 12.8
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 6.1.0-28-amd64 (SMP w/2 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTY