Forum: Too Lazy BBS

Odd "swp_pager_getswapspace(??): failed"s happen during bulk -Ca for RAM+SWAP=704 GiBytes

From Mark Millard@marklmi@yahoo.com to muc.lists.freebsd.ports on Wed Jul 23 01:42:02 2025

From Newsgroup: muc.lists.freebsd.ports

In a context with RAM+SWAP = 704 GiBytes (192 GiBytes being RAM,
512 GiBytes being SWAP) doing poudriere bulk -Ca builds at some
point ends up with reports like:

swp_pager_getswapspace(22): failed

and:

was killed: failed to reclaim memory

for 12 builders, MAKE_JOBS_NUMBER=3, TMPFS_BLACKLIST
in use, 32 FreeBSD cpus, etc.

For example:

. . .
Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)
Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to DOWN
Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to UP
Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed
Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:43:38 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:43:38 7950X3D-ZFS kernel: swp_pager_getswapspace(14): failed
Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:56:39 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:56:39 7950X3D-ZFS kernel: swp_pager_getswapspace(15): failed
Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory

I've not figured out a way to trace such messages back
to the relevant log files for the builds that were
killed. Neither the pid nor the jid appears in
those log files. Similarly, nothing in /var/log/messages
identifies the poudriere Job Id or the like.

(I've never happened to be actively monitoring when
the issue happened, so I've always ended up looking at
it after the fact.)

It would be nice to be able to identify which specific
packages to try to rebuild for these, and to investigate
why the SWAP usage, which had stayed under 2 GiBytes, ended
up reaching 512 GiBytes during that period.
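
As a starting point for that kind of after-the-fact analysis, here is a minimal Python sketch that pulls the timestamp, pid, command name, and jid out of the "was killed" kernel messages. The regular expression assumes the /var/log/messages line layout shown in the excerpt above. The names KILL_RE, parse_kills, and SAMPLE are hypothetical, made up for illustration. It only collects the fields; it does not by itself map a pid to a poudriere builder.

```python
import re

# Assumption: lines look like the /var/log/messages excerpt quoted above.
KILL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"pid\s+(?P<pid>\d+)\s+\((?P<comm>[^)]*)\),\s+jid\s+(?P<jid>\d+),\s+"
    r"uid\s+(?P<uid>\d+),\s+was killed:\s+(?P<reason>.*)$"
)


def parse_kills(lines):
    """Return one dict per 'was killed' kernel message found in lines."""
    kills = []
    for line in lines:
        match = KILL_RE.match(line.strip())
        if match is None:
            continue  # swap_pager notices, link state changes, etc.
        record = match.groupdict()
        for key in ("pid", "jid", "uid"):
            record[key] = int(record[key])
        kills.append(record)
    return kills


# Lines copied from the excerpt above.
SAMPLE = [
    "Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)",
    "Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space",
    "Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed",
    "Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory",
    "Jul 22 21:43:38 7950X3D-ZFS kernel: swp_pager_getswapspace(14): failed",
    "Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory",
]

if __name__ == "__main__":
    for kill in parse_kills(SAMPLE):
        print(kill["stamp"], kill["comm"], kill["pid"], kill["jid"], kill["reason"])
```

The timestamps collected this way could then be compared against the timestamps in the build logs, which is a manual step this sketch leaves open.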
===
Mark Millard
marklmi at yahoo.com
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-admin@muc.de
--- Synchronet 3.21a-Linux NewsLink 1.2

From Mark Millard@marklmi@yahoo.com to muc.lists.freebsd.ports on Sun Jul 27 00:33:16 2025

From Newsgroup: muc.lists.freebsd.ports

A panic from the activity during another bulk -Ca
test led to a dump that provided enough context to
track down the package that was being built when
the issue occurred and what it was running that, in
turn, had the problematic memory usage:

[2D:01:22:29] [06] [00:00:00] Building graphics/sdl2_gpu | sdl2_gpu-0.12.0

was using:

UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
. . .
0 79229 40923 4 59 0 23524 4148 wait D - 0:00.00 [sh]
0 79230 79229 5 59 0 14208 172 wait Ds - 0:00.01 [make]
0 79233 79230 4 59 0 14668 176 wait D - 0:00.00 [sh]
0 79234 79233 5 59 0 14668 176 wait D - 0:00.00 [sh]
0 79235 79234 12 0 0 16284 356 select D - 0:00.01 [ninja]
0 79236 79235 28 59 0 223048 1052 uwait D - 0:00.44 [doxygen]
0 79272 79236 25 59 0 157589964 41424308 pfault D - 3:25.33 [dot]
0 79279 79236 31 59 0 157601740 41513520 pfault D - 3:23.41 [dot]
0 79289 79236 14 59 0 157589964 41361600 pfault D - 3:22.72 [dot]
0 79301 79236 18 49 0 157667276 41208476 pfault D - 3:24.32 [dot]
. . .
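
To put the four [dot] lines in perspective, here is a small Python sketch that totals their VSZ and RSS columns. It assumes both columns are in KiB (1024-byte units), which is how ps normally reports them; the names DOT_PROCS and total_gib are hypothetical. Under that assumption the four processes add up to roughly 601 GiBytes of virtual size and roughly 158 GiBytes resident, against 192 GiBytes of RAM and 704 GiBytes of RAM+SWAP.

```python
# (pid, VSZ, RSS) copied from the four [dot] lines in the ps output above.
# Assumption: VSZ and RSS are in KiB (1024-byte units).
DOT_PROCS = [
    (79272, 157589964, 41424308),
    (79279, 157601740, 41513520),
    (79289, 157589964, 41361600),
    (79301, 157667276, 41208476),
]

KIB_PER_GIB = 1024 * 1024


def total_gib(procs, column):
    """Sum one column (1 = VSZ, 2 = RSS) and convert KiB to GiB."""
    return sum(proc[column] for proc in procs) / KIB_PER_GIB


vsz_gib = total_gib(DOT_PROCS, 1)
rss_gib = total_gib(DOT_PROCS, 2)

if __name__ == "__main__":
    print(f"dot VSZ total: {vsz_gib:.1f} GiB")
    print(f"dot RSS total: {rss_gib:.1f} GiB")
```

VSZ counts address space rather than pages actually touched, so the VSZ total is only an upper bound on what these processes could demand from RAM plus SWAP.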

Part of the context was the /06/ text in:
. . .
root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root dot 79289 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79279 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79272 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .

This identifies the [06] builder, and the "Building" notice had made it to
disk before the panic happened. I could then check the port's Makefile to
see whether doxygen was used, and it was. graphics/sdl2_gp historical build
logs suggest problems exist.
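
The matching described above can be sketched in Python. This is only an illustration: it assumes the builder number is the two-digit path component just before /dev in listing lines like the ones shown, and that "Building" notices have the layout of the one quoted. The names BUILDER_RE, BUILDING_RE, builders_for, and last_building are hypothetical.

```python
import re

# Assumption: the builder id is the two-digit directory just before /dev.
BUILDER_RE = re.compile(r"/(?P<builder>\d{2})/dev\b")
# Assumption: notices look like "[elapsed] [NN] [elapsed] Building origin | pkgname".
BUILDING_RE = re.compile(
    r"\[(?P<builder>\d{2})\]\s+\[[^\]]*\]\s+Building\s+(?P<origin>\S+)\s+\|\s+(?P<pkg>\S+)"
)


def builders_for(command, listing_lines):
    """Collect builder ids from listing lines whose command column matches."""
    ids = set()
    for line in listing_lines:
        fields = line.split()
        if len(fields) > 4 and fields[1] == command:
            match = BUILDER_RE.search(line)
            if match:
                ids.add(match.group("builder"))
    return ids


def last_building(builder, log_lines):
    """Return the origin from the last 'Building' notice for the given builder."""
    origin = None
    for line in log_lines:
        match = BUILDING_RE.search(line)
        if match and match.group("builder") == builder:
            origin = match.group("origin")
    return origin


# Lines copied from the excerpts above.
LISTING = [
    "root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r",
    "root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r",
]
LOG = ["[2D:01:22:29] [06] [00:00:00] Building graphics/sdl2_gpu | sdl2_gpu-0.12.0"]

if __name__ == "__main__":
    for builder in sorted(builders_for("dot", LISTING)):
        print(builder, last_building(builder, LOG))
```

This only works when such a listing is available, as it was here from the dump; the kernel's "was killed" messages alone do not carry the builder path.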
===
Mark Millard
marklmi at yahoo.com

From Mark Millard@marklmi@yahoo.com to muc.lists.freebsd.ports on Sun Jul 27 11:51:05 2025

From Newsgroup: muc.lists.freebsd.ports

On Jul 27, 2025, at 00:33, Mark Millard <marklmi@yahoo.com> wrote:

graphics/sdl2_gp historical build logs suggest problems exist.

Dumb typo, missing the "u" in "gpu", so: graphics/sdl2_gpu
===
Mark Millard
marklmi at yahoo.com
