• Odd "swp_pager_getswapspace(??): failed"s happen during bulk -Ca for RAM+SWAP=704 GiBytes

    From Mark Millard@marklmi@yahoo.com to muc.lists.freebsd.ports on Wed Jul 23 01:42:02 2025
    From Newsgroup: muc.lists.freebsd.ports

    In a context with RAM+SWAP = 704 GiBytes (192 GiBytes being RAM,
    512 GiBytes being SWAP), doing poudriere bulk -Ca builds at some
    point ends up with reports like:

    swp_pager_getswapspace(22): failed

    and:

    was killed: failed to reclaim memory

    for 12 builders, MAKE_JOBS_NUMBER=3, TMPFS_BLACKLIST
    in use, 32 FreeBSD cpus, etc.

    For example:

    . . .
    Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)
    Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to DOWN
    Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to UP
    Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space
    Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed
    Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory
    Jul 22 21:43:38 7950X3D-ZFS kernel: swap_pager: out of swap space
    Jul 22 21:43:38 7950X3D-ZFS kernel: swp_pager_getswapspace(14): failed
    Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory
    Jul 22 21:56:39 7950X3D-ZFS kernel: swap_pager: out of swap space
    Jul 22 21:56:39 7950X3D-ZFS kernel: swp_pager_getswapspace(15): failed
    Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory

    I've not figured out a way to track such messages
    back to the relevant log files for the builds that were
    killed. Neither the pid nor the jid appears in
    the log files. Similarly, nothing in /var/log/messages
    identifies the poudriere Job Id or the like.

    (I've never happened to be actively monitoring when
    the issue happened. So I've always ended up looking at
    it after the fact.)

    It would be nice to be able to identify what specific
    packages to try to rebuild for these, and to investigate
    why the SWAP usage that had stayed under 2 GiBytes ended
    up reaching 512 GiBytes during that period.
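
    As a starting point for that kind of after-the-fact analysis, here is a
    minimal Python sketch that pulls the "was killed: failed to reclaim
    memory" events out of kernel log lines shaped like the ones quoted above.
    The names KillEvent, KILL_RE, find_kill_events and SAMPLE are hypothetical,
    invented for this illustration. It only collects timestamp, pid, command
    and jid. Matching those timestamps against poudriere build logs is an
    assumption about how one might proceed, not something the log lines
    themselves provide.

```python
import re
from typing import Iterable, List, NamedTuple


class KillEvent(NamedTuple):
    timestamp: str  # syslog-style timestamp, e.g. "Jul 22 21:39:11"
    pid: int
    command: str
    jid: int


# Assumption: the line layout is exactly that of the log excerpt quoted in the post.
KILL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"pid\s+(?P<pid>\d+)\s+\((?P<cmd>[^)]*)\),\s+"
    r"jid\s+(?P<jid>\d+),\s+uid\s+\d+,\s+"
    r"was killed: failed to reclaim memory"
)


def find_kill_events(lines: Iterable[str]) -> List[KillEvent]:
    """Return one KillEvent per 'failed to reclaim memory' kill line."""
    events = []
    for line in lines:
        m = KILL_RE.match(line.strip())
        if m:
            events.append(
                KillEvent(m["ts"], int(m["pid"]), m["cmd"], int(m["jid"]))
            )
    return events


# Lines copied from the excerpt in the post.
SAMPLE = """\
Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)
Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space
Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed
Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory
Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory
"""

if __name__ == "__main__":
    for event in find_kill_events(SAMPLE.splitlines()):
        print(event)
```

    On the sample, all three kill events name the command "dot" in jid 780,
    which already hints at the kind of port involved even without a builder id.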
    ===
    Mark Millard
    marklmi at yahoo.com
    --
    Posted automagically by a mail2news gateway at muc.de e.V.
    Please direct questions, flames, donations, etc. to news-admin@muc.de
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mark Millard@marklmi@yahoo.com to muc.lists.freebsd.ports on Sun Jul 27 00:33:16 2025
    From Newsgroup: muc.lists.freebsd.ports


    A panic from the activity during another bulk -Ca
    test led to the dump providing enough context to
    track down the package that was being built when
    the issue occurred and what it was running that, in
    turn, had the problem memory usage:

    [2D:01:22:29] [06] [00:00:00] Building graphics/sdl2_gpu | sdl2_gpu-0.12.0

    was using:

    UID   PID  PPID  C PRI NI       VSZ      RSS MWCHAN STAT TT    TIME COMMAND
    . . .
      0 79229 40923  4  59  0     23524     4148 wait   D    -  0:00.00 [sh]
      0 79230 79229  5  59  0     14208      172 wait   Ds   -  0:00.01 [make]
      0 79233 79230  4  59  0     14668      176 wait   D    -  0:00.00 [sh]
      0 79234 79233  5  59  0     14668      176 wait   D    -  0:00.00 [sh]
      0 79235 79234 12   0  0     16284      356 select D    -  0:00.01 [ninja]
      0 79236 79235 28  59  0    223048     1052 uwait  D    -  0:00.44 [doxygen]
      0 79272 79236 25  59  0 157589964 41424308 pfault D    -  3:25.33 [dot]
      0 79279 79236 31  59  0 157601740 41513520 pfault D    -  3:23.41 [dot]
      0 79289 79236 14  59  0 157589964 41361600 pfault D    -  3:22.72 [dot]
      0 79301 79236 18  49  0 157667276 41208476 pfault D    -  3:24.32 [dot]
    . . .
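
    To put the four [dot] rows in perspective against the 192 GiBytes of RAM
    and 512 GiBytes of SWAP, here is a small Python sketch that totals their
    VSZ and RSS columns. It assumes both columns are in KiBytes, as ps(1)
    documents them on FreeBSD. The names DOT_ROWS, kib_to_gib and total_gib
    are hypothetical. Note that VSZ is address space, so the total is not
    necessarily all backed by RAM or swap.

```python
KIB_PER_GIB = 1024 * 1024

# (pid, vsz_kib, rss_kib) copied from the four [dot] rows of the ps listing.
# Assumption: VSZ and RSS are reported in KiBytes.
DOT_ROWS = [
    (79272, 157589964, 41424308),
    (79279, 157601740, 41513520),
    (79289, 157589964, 41361600),
    (79301, 157667276, 41208476),
]


def kib_to_gib(kib: int) -> float:
    """Convert a KiByte count to GiBytes."""
    return kib / KIB_PER_GIB


def total_gib(rows, column: int) -> float:
    """Sum one numeric column (1 = VSZ, 2 = RSS) across rows, in GiBytes."""
    return kib_to_gib(sum(row[column] for row in rows))


if __name__ == "__main__":
    print(f"total VSZ: {total_gib(DOT_ROWS, 1):.1f} GiBytes")  # 601.2
    print(f"total RSS: {total_gib(DOT_ROWS, 2):.1f} GiBytes")  # 157.8
```

    Under that unit assumption, the four dot processes together hold roughly
    158 GiBytes resident and about 601 GiBytes of virtual size, against a
    RAM+SWAP total of 704 GiBytes.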
    Part of the context was the /06/ text in:
    . . .
    root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
    root dot 79289 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
    . . .
    root dot 79279 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
    . . .
    root dot 79272 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
    . . .
    root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
    . . .
    This identifies the [06] builder, and the "Building" notice had made it to
    disk before the panic happened. I could then check the Makefile for
    whether doxygen was used, and it was. graphics/sdl2_gp historical build logs
    suggest problems exist.
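
    The builder lookup described above can be sketched in Python as follows.
    This is an illustration only: it assumes open-file listing lines shaped
    like the ones quoted (user, command, pid, then a path containing
    ".m/<master name>/<builder number>/"), and the names BUILDER_RE,
    group_by_builder and SAMPLE are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: builder directories appear as ".m/<master name>/<digits>/" in the path.
BUILDER_RE = re.compile(r"/\.m/(?P<master>[^/\s]+)/(?P<builder>\d+)(?=/|\s|$)")


def group_by_builder(lines: Iterable[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Map builder number (as text, e.g. "06") to a list of (command, pid)."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        m = BUILDER_RE.search(line)
        # Skip elision markers (". . .") and lines without a builder path.
        if m is None or len(fields) < 3 or not fields[2].isdigit():
            continue
        groups[m["builder"]].append((fields[1], int(fields[2])))
    return dict(groups)


# Lines copied from the listing in the post.
SAMPLE = """\
. . .
root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root dot 79289 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79279 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root dot 79272 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
"""

if __name__ == "__main__":
    for builder, procs in group_by_builder(SAMPLE.splitlines()).items():
        print(builder, procs)
```

    The resulting builder number can then be compared with the bracketed
    builder field of the "Building" lines, such as the [06] in the line for
    graphics/sdl2_gpu quoted earlier.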
    ===
    Mark Millard
    marklmi at yahoo.com
  • From Mark Millard@marklmi@yahoo.com to muc.lists.freebsd.ports on Sun Jul 27 11:51:05 2025
    From Newsgroup: muc.lists.freebsd.ports

    On Jul 27, 2025, at 00:33, Mark Millard <marklmi@yahoo.com> wrote:
    This identifies the [06] builder, and the "Building" notice had made it to
    disk before the panic happened. I could then check the Makefile for
    whether doxygen was used, and it was. graphics/sdl2_gp historical build logs suggest problems exist.
    Dumb typo, missing the "u" in "gpu", so: graphics/sdl2_gpu
    ===
    Mark Millard
    marklmi at yahoo.com