In a context with RAM+SWAP = 704 GiBytes (192 GiBytes being RAM,
512 GiBytes being SWAP) doing poudriere bulk -Ca builds at some
point ends up with reports like:
swp_pager_getswapspace(22): failed
and:
was killed: failed to reclaim memory
for 12 builders, MAKE_JOBS_NUMBER=3, TMPFS_BLACKLIST
in use, 32 FreeBSD cpus, etc.
For example:
. . .
Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)
Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to DOWN
Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to UP
Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed
Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:43:38 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:43:38 7950X3D-ZFS kernel: swp_pager_getswapspace(14): failed
Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:56:39 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:56:39 7950X3D-ZFS kernel: swp_pager_getswapspace(15): failed
Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory
I've not figured out a way to track down such messages
back to the relevant log file for the builds that were
killed. Neither the pid nor the jid appears in
the log files. Similarly, nothing in /var/log/messages
identifies the poudriere Job Id or anything similar.
(I've never happened to be actively monitoring when
the issue happened. So I've always ended up looking at
it after the fact.)
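
As a first step toward that after-the-fact analysis, the kill notices can at least be pulled out of /var/log/messages in a structured form. The following is only a minimal sketch: the regular expression is inferred from the three "was killed" lines quoted above, and the function name killed_processes is made up for illustration. It does not map a pid to a port by itself; it only collects the timestamp, pid, command name, and jid for later comparison with the poudriere logs.

```python
import re

# Pattern inferred from the "was killed" lines quoted above (an assumption,
# not a documented kernel message format).
KILLED_RE = re.compile(
    r"kernel: pid (?P<pid>\d+) \((?P<comm>[^)]*)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+), was killed: (?P<reason>.*)$"
)


def killed_processes(lines):
    """Yield (timestamp, pid, command, jid, reason) for each kill notice."""
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            # The syslog timestamp is the first three whitespace-separated fields.
            timestamp = " ".join(line.split()[:3])
            yield (timestamp, int(m["pid"]), m["comm"], int(m["jid"]),
                   m["reason"].strip())


# Sample input copied from the log excerpt above.
SAMPLE_LOG = [
    "Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space",
    "Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed",
    "Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory",
    "Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory",
    "Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory",
]

RECORDS = list(killed_processes(SAMPLE_LOG))
for record in RECORDS:
    print(record)
```

For the excerpt above, every killed process is a dot in jid 780, which narrows the search to ports whose builds run graphviz.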
It would be nice to be able to identify what specific
packages to try to rebuild for these, and to investigate
why the SWAP usage that had stayed under 2 GiBytes ended
up reaching 512 GiBytes during that period.
Dumb typo, missing the "u" in "gpu", so: graphics/sdl2_gpu
On Jul 23, 2025, at 01:42, Mark Millard <marklmi@yahoo.com> wrote:
A panic from the activity during another bulk -Ca
test led to a dump that provided enough context to
track down the package that was being built when
the issue occurred and what it was running that, in
turn, had the problem memory usage:
[2D:01:22:29] [06] [00:00:00] Building graphics/sdl2_gpu | sdl2_gpu-0.12.0
was using:
UID   PID  PPID  C PRI NI       VSZ      RSS MWCHAN STAT TT    TIME COMMAND
. . .
  0 79229 40923  4  59  0     23524     4148 wait   D    -  0:00.00 [sh]
  0 79230 79229  5  59  0     14208      172 wait   Ds   -  0:00.01 [make]
  0 79233 79230  4  59  0     14668      176 wait   D    -  0:00.00 [sh]
  0 79234 79233  5  59  0     14668      176 wait   D    -  0:00.00 [sh]
  0 79235 79234 12   0  0     16284      356 select D    -  0:00.01 [ninja]
  0 79236 79235 28  59  0    223048     1052 uwait  D    -  0:00.44 [doxygen]
  0 79272 79236 25  59  0 157589964 41424308 pfault D    -  3:25.33 [dot]
  0 79279 79236 31  59  0 157601740 41513520 pfault D    -  3:23.41 [dot]
  0 79289 79236 14  59  0 157589964 41361600 pfault D    -  3:22.72 [dot]
  0 79301 79236 18  49  0 157667276 41208476 pfault D    -  3:24.32 [dot]
. . .
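
To put the four [dot] rows in proportion to the 192 GiBytes of RAM and 704 GiBytes of RAM+SWAP, their VSZ and RSS columns can be summed. This is a rough sketch that assumes the VSZ and RSS figures are in KiBytes (the usual ps units on FreeBSD). The numbers are copied from the listing above, which comes from the panic run rather than the run that produced the earlier log excerpt.

```python
KIB_PER_GIB = 1024 * 1024

# (pid, VSZ in KiB, RSS in KiB) for the four [dot] processes in the listing.
# Assumption: ps reported both columns in KiB.
DOT_PROCS = [
    (79272, 157589964, 41424308),
    (79279, 157601740, 41513520),
    (79289, 157589964, 41361600),
    (79301, 157667276, 41208476),
]

total_vsz_gib = sum(vsz for _, vsz, _ in DOT_PROCS) / KIB_PER_GIB
total_rss_gib = sum(rss for _, _, rss in DOT_PROCS) / KIB_PER_GIB

print(f"total VSZ: {total_vsz_gib:.1f} GiB")  # total VSZ: 601.2 GiB
print(f"total RSS: {total_rss_gib:.1f} GiB")  # total RSS: 157.8 GiB
```

Under that assumption, each dot process has roughly 150 GiBytes of virtual address space, and the four together hold about 158 GiBytes resident out of 192 GiBytes of RAM. VSZ counts address space rather than pages actually backed by RAM or swap, so the 601 GiBytes figure is an upper bound on what these four processes could demand, not a measurement of swap in use.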
Part of the context was the /06/ text in:
. . .
root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root dot 79289 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79279 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79272 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
The /06/ path component identifies the [06] builder, and the "Building"
notice had made it to the disk before the panic happened. I could then
check the Makefile to see whether doxygen was used, and it was.
graphics/sdl2_gp historical build logs suggest problems exist.
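
The matching done by hand above (open-file path to builder number, builder number to "Building" notice) can be sketched in code. This is only an illustration: both regular expressions are inferred from the two excerpts quoted in this message rather than from any documented poudriere format, and the helper names are hypothetical.

```python
import re

# Inferred from: "[2D:01:22:29] [06] [00:00:00] Building graphics/sdl2_gpu | sdl2_gpu-0.12.0"
BUILDING_RE = re.compile(
    r"\[(?P<builder>\d+)\]\s+\[[\d:]+\]\s+Building\s+(?P<origin>\S+)\s+\|\s+(?P<pkg>\S+)"
)
# Inferred from paths of the form ".../data/.m/<master>/<builder>/dev"
PATH_RE = re.compile(r"/\.m/[^/]+/(?P<builder>\d+)/")


def builders_from_notices(lines):
    """Map builder id -> (origin, package); later notices overwrite earlier ones."""
    current = {}
    for line in lines:
        m = BUILDING_RE.search(line)
        if m:
            current[m["builder"]] = (m["origin"], m["pkg"])
    return current


def attribute(open_file_lines, builders):
    """Yield (command, pid, builder, origin) for lines that name a builder path."""
    for line in open_file_lines:
        m = PATH_RE.search(line)
        fields = line.split()
        if m and len(fields) >= 3:
            origin = builders.get(m["builder"], (None, None))[0]
            yield (fields[1], int(fields[2]), m["builder"], origin)


NOTICES = ["[2D:01:22:29] [06] [00:00:00] Building graphics/sdl2_gpu | sdl2_gpu-0.12.0"]
OPEN_FILES = [
    "root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r",
    "root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r",
]

RESULT = list(attribute(OPEN_FILES, builders_from_notices(NOTICES)))
for row in RESULT:
    print(row)
```

Because a builder slot is reused for many ports during a bulk run, only the most recent "Building" notice for a slot is kept, which matches the situation at the time of the dump.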