Since we upgraded to 14.3 last summer, we have been experiencing
numerous memory accounting issues on our NFS servers. These manifest
as a server *desperate* to free up memory despite having multiple
gigabytes of physical RAM available. (Some of these machines have 1
TiB of RAM, with more than 64 GiB free, and were swapping and invoking
the OOM-killer.)
I had a server deadlock just now after only three days of uptime with
32 GiB of free memory. Prior to the crash, about 70 GiB (of 128) was
used by the ARC, of which some 60 GiB was accounted for as
"evictable", and the load was pretty modest.
In DDB on the console, I noted:
pid ppid pgrp uid state wmesg wchan cmd
60673 60672 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60672 1 3008 0 S wait 0xfffffe031ee41560 nrpe
60670 1186 60670 0 Ds db->db_ 0xfffff8173309f1e8 sshd-session
60669 1202 1202 0 D voffloc 0xfffff8024db4966a perl
60668 60667 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60667 1 3008 0 S wait 0xfffffe031ee41000 nrpe
60665 60664 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60664 1 3008 0 S wait 0xfffffe031723a5c0 nrpe
60662 60661 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60661 1 3008 0 S wait 0xfffffe03172395a0 nrpe
60659 60658 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60658 1 3008 0 S wait 0xfffffe0317239040 nrpe
60656 60655 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60655 1 3008 0 S wait 0xfffffe0317238ae0 nrpe
60653 60652 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60652 1 3008 0 S wait 0xfffffe0317238580 nrpe
60650 60649 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60649 1 3008 0 S wait 0xfffffe0317238020 nrpe
60647 60646 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60646 1 3008 0 S wait 0xfffffe0317237ac0 nrpe
60644 60643 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60643 1 3008 0 S wait 0xfffffe0317237000 nrpe
60641 60640 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60640 1 3008 0 S wait 0xfffffe00d3cfa040 nrpe
60638 1202 1202 0 D voffloc 0xfffff8024db4966a perl
60637 1186 60637 0 Ds db->db_ 0xfffff8173309f1e8 sshd-session
60636 60635 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60635 1 3008 0 S wait 0xfffffe00d3cf9ae0 nrpe
60633 60632 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60632 1 3008 0 S wait 0xfffffe00d3cf9580 nrpe
60630 60629 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60629 1 3008 0 S wait 0xfffffe00d3cf9020 nrpe
60627 60626 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60626 1 3008 0 S wait 0xfffffe00d3cf8560 nrpe
60624 60623 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60623 1 3008 0 S wait 0xfffffe00d3cf8000 nrpe
60621 60620 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60620 1 3008 0 S wait 0xfffffe0317188060 nrpe
60618 60617 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60617 1 3008 0 S wait 0xfffffe0317187b00 nrpe
60615 60614 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60614 1 3008 0 S wait 0xfffffe03171875a0 nrpe
60612 60611 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60611 1 3008 0 S wait 0xfffffe0317186ae0 nrpe
60609 60608 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60608 1 3008 0 S wait 0xfffffe0317186580 nrpe
60606 1186 60606 0 Ds db->db_ 0xfffff8173309f1e8 sshd-session
60605 60604 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60604 1 3008 0 S wait 0xfffffe0317186020 nrpe
60602 60601 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60601 1 3008 0 S wait 0xfffffe0317185ac0 nrpe
60599 1202 1202 0 D voffloc 0xfffff8024db4966a perl
60598 60597 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60597 1 3008 0 S wait 0xfffffe0317185560 nrpe
60595 60594 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60594 1 3008 0 S wait 0xfffffe0317185000 nrpe
60592 60591 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60591 1 3008 0 S wait 0xfffffe031724c5c0 nrpe
60589 60588 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60588 1 3008 0 S wait 0xfffffe031724c060 nrpe
60586 60585 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60585 1 3008 0 S wait 0xfffffe031724b5a0 nrpe
60583 60582 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60582 1 3008 0 S wait 0xfffffe031724a580 nrpe
60580 60579 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60579 1 3008 0 S wait 0xfffffe031724a020 nrpe
60577 1186 60577 0 Ds aw.aew_ 0xfffffe0326e5a608 sshd-session
60576 60575 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60575 1 3008 0 S wait 0xfffffe0317249560 nrpe
60573 1202 1202 0 D aw.aew_ 0xfffffe0326df6478 perl
60572 60571 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60571 1 3008 0 S wait 0xfffffe0317249000 nrpe
5015 5010 5015 6263 Ss+ ttyin 0xfffff810aa50a8b0 zsh
5010 5006 5006 6263 S select 0xfffff8024ca966c0 sshd-session
5006 1186 5006 0 Ss select 0xfffff8024ca984c0 sshd-session
3008 1 3008 0 Ss select 0xfffff80209dc98c0 nrpe
2910 1 2910 0 Ds+ aw.aew_ 0xfffffe03274d66e8 getty
This getty is the one running on the console tty, which was stuck.
Note the wait channel is "aw.aew_cv", which is part of the logic for
evicting buffers from the ARC. Other threads are waiting for a
dbuf (ZFS disk buffer) object mutex.
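To see at a glance how many processes are stuck on each wait channel, the listing can be tallied mechanically. The following is a minimal sketch, not something used in the thread: it assumes the DDB "ps" output has been captured as text (for example from a serial console log) with one process per row in the column order shown in the header, and the function name tally_wait_channels is made up for illustration. Rows that do not have exactly eight fields are skipped.

```python
from collections import Counter

# A few rows copied from the DDB "ps" listing above.
SAMPLE = """\
pid ppid pgrp uid state wmesg wchan cmd
60673 60672 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60672 1 3008 0 S wait 0xfffffe031ee41560 nrpe
60668 60667 3008 0 D db->db_ 0xfffff8058173af68 nrpe
60638 1202 1202 0 D voffloc 0xfffff8024db4966a perl
60573 1202 1202 0 D aw.aew_ 0xfffffe0326df6478 perl
2910 1 2910 0 Ds+ aw.aew_ 0xfffffe03274d66e8 getty
"""


def tally_wait_channels(ps_text):
    """Count processes per (wmesg, wchan) pair in captured DDB ps output."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Expected columns: pid ppid pgrp uid state wmesg wchan cmd.
        # The header row and any malformed rows are ignored.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        wmesg, wchan = fields[5], fields[6]
        counts[(wmesg, wchan)] += 1
    return counts


if __name__ == "__main__":
    for (wmesg, wchan), n in tally_wait_channels(SAMPLE).most_common():
        print(f"{n:3d}  {wmesg:8s} {wchan}")
```

Grouping on the wchan address as well as the wmesg string distinguishes many processes waiting on one object (every nrpe child above sleeps on the same address) from processes waiting on different objects of the same kind.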
I'm currently planning on taking us to 14.4 later this spring, but it
would be nice to know if anyone else has seen this bug or has a fix.
I've tried dropping kern.maxvnodes and increasing
vfs.zfs.arc_free_target, with no change in symptoms.
This particular server is due to be replaced but the new disk array
(which was ordered in January) won't ship until late April per the
vendor.
-GAWollman
A less effective workaround was to set vfs.zfs.arc.min to some
reasonable value. That can prevent ARC from shrinking too far. You
could try that.
Another thing you could try is to run "vmstat -o" when the system is
in the problematic state.
<<On Mon, 16 Mar 2026 15:08:44 -0600, Alan Somers <asomers@freebsd.org> said:
A less effective workaround was to set vfs.zfs.arc.min to some
reasonable value. That can prevent ARC from shrinking too far. You
could try that.
So far as I can tell, the ARC doesn't actually shrink, and shouldn't
need to given the gigabytes of free physmem at the time (well,
immediately prior). Within 5 minutes of the crash, the total ARC size
was 70 GiB, c_max was 127 GiB, and c_min was 4 GiB -- in practice it's
never anywhere near that small. The first observation after the
server came back up, ARC size was already over 20 GiB.
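The ARC figures above can be checked against the floor in a repeatable way. This is a minimal sketch, assuming the counters are read as "name: value" text from the kstat.zfs.misc.arcstats sysctl tree (size, c_min, c_max); the sample values are illustrative, chosen to match the sizes quoted in this message, and were not captured from the server. The helper names parse_sysctl and arc_summary are hypothetical.

```python
GIB = 1024 ** 3

# Illustrative values only: 70 GiB ARC size, 127 GiB c_max, 4 GiB c_min,
# matching the figures quoted in the message, not real captured output.
SAMPLE = """\
kstat.zfs.misc.arcstats.size: 75161927680
kstat.zfs.misc.arcstats.c_max: 136365211648
kstat.zfs.misc.arcstats.c_min: 4294967296
"""


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict of integer values."""
    out = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            out[name.strip()] = int(value)
    return out


def arc_summary(stats, prefix="kstat.zfs.misc.arcstats."):
    """Report ARC size and limits in GiB, and whether the ARC sits at c_min."""
    size = stats[prefix + "size"]
    c_min = stats[prefix + "c_min"]
    c_max = stats[prefix + "c_max"]
    return {
        "size_gib": size / GIB,
        "c_min_gib": c_min / GIB,
        "c_max_gib": c_max / GIB,
        "at_floor": size <= c_min,  # True only if the ARC has shrunk to c_min
    }


if __name__ == "__main__":
    print(arc_summary(parse_sysctl(SAMPLE)))
```

Logging this every few seconds rather than every few minutes would help tell the two hypotheses apart: an ARC that never approaches c_min while the system still runs out of memory points away from the ARC shrinking too far.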
Either *something* is causing the kernel to think it has no free
memory when there's actually lots, or else something is causing the
kernel to allocate gigabytes of RAM much faster than we can observe it happening.
There's epsilon memory in the inactive queue on this system, before or
after the crash: it's so small I can't even see the line on the graph.
The 24-hour maximum is 268 MiB, or about 0.2% of RAM.
Another thing you could try is to run "vmstat -o" when the system is
in the problematic state.
What's the equivalent in DDB? No getty, no login.
-GAWollman
I don't know how to do it from ddb. But if you dump a core file,
BTW, Alan, mail to your freebsd.org mailbox bounces because you
forward to gmail.
<<On Mon, 16 Mar 2026 15:29:00 -0600, Alan Somers <asomers@freebsd.org> said:
I don't know how to do it from ddb. But if you dump a core file,
None of our systems are set up for that. They all have huge memory
and pretty tiny swap partitions, and in any case, they don't panic,
they just deadlock. Or the OOM killer just shoots all user processes;
these are nearly indistinguishable from a service provider's
perspective.
They're just NFS servers; they don't run anything else except what's necessary for monitoring and administration.
-GAWollman
I once saw a similar bug. In my case I had a process that mmap()ed
some very large files on fusefs, consuming lots of inactive pages.
And when the system comes under memory pressure, it asks ARC to evict
first. So the ARC would end up shrinking down to arc_min every time.
In my case, the solution was to set vfs.fusefs.data_cache_mode=0 . I
suspect that similar bugs could be possible with UFS or tmpfs, if they
have giant files that are mmaped().
On Mon, Mar 16, 2026 at 3:39 PM Garrett Wollman <wollman@bimajority.org> wrote:
None of our systems are set up for that. They all have huge memory
and pretty tiny swap partitions, and in any case, they don't panic,
they just deadlock. Or the OOM killer just shoots all user processes;
these are nearly indistinguishable from a service provider's
perspective.
Pretty tiny swap partitions? Maybe that's the problem. I recall kib@
On Mon, Mar 16, 2026 at 03:08:44PM -0600, Alan Somers wrote:
I once saw a similar bug. In my case I had a process that mmap()ed
some very large files on fusefs, consuming lots of inactive pages.
And when the system comes under memory pressure, it asks ARC to evict
first. So the ARC would end up shrinking down to arc_min every time.
In my case, the solution was to set vfs.fusefs.data_cache_mode=0 . I
suspect that similar bugs could be possible with UFS or tmpfs, if they
have giant files that are mmaped().
What are 'similar bugs with UFS or tmpfs'?
Can you please be more specific, what is the erroneous behavior?
I experienced this bug in 2021, and reproduced it on both FreeBSD 12.2
<<On Mon, 16 Mar 2026 15:55:59 -0600, Alan Somers <asomers@freebsd.org> said:
On Mon, Mar 16, 2026 at 3:39 PM Garrett Wollman <wollman@bimajority.org> wrote:
None of our systems are set up for that. They all have huge memory
and pretty tiny swap partitions, and in any case, they don't panic,
they just deadlock. Or the OOM killer just shoots all user processes;
these are nearly indistinguishable from a service provider's
perspective.
Pretty tiny swap partitions?
Tiny compared to RAM, typically 16 or 32 GiB. After all, these are
NFS servers; they shouldn't have more than a few dozen MiB of
swappable anonymous memory.(*) We're not going to put a 2T SSD as a
hopefully-never-to-be-used swap drive in a file server.
I configured a dump device on the server that crashed today; if it
crashes again when I'm at a keyboard I'll see if I can get it to write a
dump in the 32 GiB of swap that it has configured.
-GAWollman
(*) If the kernel erroneously thinks it's out of free memory and
swapping stuff out only opens up a few MiB, that would certainly
explain why it goes on to ARC eviction and eventual OOM. On this
server, after two hours of uptime, I see:
Device 1K-blocks Used Avail Capacity
/dev/gpt/swap0 33554432 32132 33522300 0%
You won't need a 2 TB SSD. By default, FreeBSD will make a mini dump,
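The swap figures quoted in the footnote above can be converted into more readable units. This is a minimal sketch, assuming swapinfo-style text with sizes in 1 KiB blocks as shown; the function name swap_usage is hypothetical and the sample is the single row from the message.

```python
# The swapinfo row quoted in the message (sizes are in 1 KiB blocks).
SAMPLE = """\
Device          1K-blocks     Used    Avail Capacity
/dev/gpt/swap0   33554432    32132 33522300     0%
"""


def swap_usage(swapinfo_text):
    """Return per-device swap totals in GiB and usage in MiB and percent."""
    rows = []
    for line in swapinfo_text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a device row.
        if len(fields) < 4 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        total_kib, used_kib = int(fields[1]), int(fields[2])
        if total_kib == 0:
            continue
        rows.append({
            "device": fields[0],
            "total_gib": total_kib / 1024 ** 2,
            "used_mib": used_kib / 1024,
            "used_pct": 100.0 * used_kib / total_kib,
        })
    return rows


if __name__ == "__main__":
    for row in swap_usage(SAMPLE):
        print(f"{row['device']}: {row['used_mib']:.1f} MiB used of "
              f"{row['total_gib']:.0f} GiB ({row['used_pct']:.3f}%)")
```

For the row above this works out to roughly 31 MiB used out of 32 GiB, which is in line with the "few dozen MiB of swappable anonymous memory" estimate.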
ZFS is documented to have the property: "This approach provides
coherency between memory-mapped and IO access at the expense of wasted
memory due to having two copies of the file in memory and extra
overhead caused by the need to copy the contents between the two copies."
(Chapter 10, page 548, last bullet item of the 2nd edition of the design
and implementation book.)
That's not relevant in this case, because no ZFS file was mmapped().