• Re: "failed to reclaim memory" with much free physmem

    From Mark Johnston@markj@freebsd.org to muc.lists.freebsd.stable on Wed Oct 22 11:45:54 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Tue, Sep 16, 2025 at 09:33:26PM -0400, Garrett Wollman wrote:
    <<On Fri, 12 Sep 2025 21:35:26 -0400, Garrett Wollman <wollman@bimajority.org> said:

    The point being that the ARC is supposed to respond to backpressure
    long before memory runs out. And again, we're talking about a system
    with 100 GiB of outright FREE physical memory. There's no possible
    way that can be fully allocated in less than 5 minutes -- the NICs
    aren't that fast and the servers aren't doing anything else.
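    For scale, the rate implied by "100 GiB in less than 5 minutes" can be
    worked out directly. The sketch below is only a back-of-the-envelope
    check, assuming the memory demand arrives at a constant rate;
    `required_rate` is a hypothetical helper, not part of any FreeBSD tool.
    Whether a given NIC could sustain the result depends on hardware the
    thread does not state.

```python
# Back-of-the-envelope check: what sustained rate would be needed to
# consume a given amount of free memory within a given time window?
GIB = 1024 ** 3  # bytes per GiB


def required_rate(free_bytes: int, seconds: float) -> tuple[float, float]:
    """Return (bytes per second, gigabits per second) needed to fill free_bytes."""
    bytes_per_sec = free_bytes / seconds
    gbit_per_sec = bytes_per_sec * 8 / 1e9  # decimal gigabits, as NICs are rated
    return bytes_per_sec, gbit_per_sec


if __name__ == "__main__":
    # Figures from the message: 100 GiB free, exhausted in under 5 minutes.
    bps, gbps = required_rate(100 * GIB, 5 * 60)
    print(f"{bps / 1e6:.0f} MB/s, {gbps:.2f} Gbit/s")
```

    Running it prints `358 MB/s, 2.86 Gbit/s`, i.e. roughly 2.9 Gbit/s of
    sustained demand for the whole five minutes.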

    The past couple of nights we've had failures of other NFS servers
    (same FreeBSD build, different hardware, different clients, different
    data). The most recent one, unlike the one I started this thread
    with, didn't get so far as to invoke the OOM killer -- it seems to
    have been stuck in arc_wait_for_eviction(). I wasn't in a position to
    get a backtrace, so I can't tell if this was the call from
    arc_get_data_impl() (which is called for every block allocated but
    normally just returns immediately) or the one from arc_lowmem() (which
    is ultimately called from the vm_lowmem event handler when the system
    is really out of memory).

    As with previous failures, this one was with plenty of physical memory
    seemingly available (20 GiB out of 96 GiB). Separate swap partition,
    of course, and after 34 minutes memory allocation is pretty much back
    to where it was before the crash.

    Sorry to chime in late. Is this a NUMA system by any chance? That is,
    what does sysctl vm.ndomains report?
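    The same value that `sysctl vm.ndomains` shows can also be read from a
    script. A minimal sketch, assuming a FreeBSD host and that vm.ndomains
    is exported as a C int; `read_int_sysctl` is a hypothetical helper that
    calls libc's sysctlbyname(3) through Python's ctypes, and it simply
    returns None on any other platform.

```python
import ctypes
import os
import sys
from typing import Optional


def read_int_sysctl(name: str) -> Optional[int]:
    """Read an integer sysctl (e.g. vm.ndomains) via sysctlbyname(3).

    Assumption: the sysctl is exported as a C int. Returns None when not
    running on FreeBSD, where this sysctl is not available.
    """
    if not sys.platform.startswith("freebsd"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)  # symbols already loaded, incl. libc
    fn = libc.sysctlbyname
    # int sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
    #                  const void *newp, size_t newlen);
    fn.argtypes = [ctypes.c_char_p, ctypes.c_void_p,
                   ctypes.POINTER(ctypes.c_size_t), ctypes.c_void_p,
                   ctypes.c_size_t]
    fn.restype = ctypes.c_int
    value = ctypes.c_int(0)
    size = ctypes.c_size_t(ctypes.sizeof(value))
    # Read-only query: no new value is supplied (newp=NULL, newlen=0).
    if fn(name.encode(), ctypes.byref(value), ctypes.byref(size), None, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), name)
    return value.value


if __name__ == "__main__":
    ndomains = read_int_sysctl("vm.ndomains")
    if ndomains is None:
        print("not FreeBSD; vm.ndomains unavailable")
    else:
        print(f"vm.ndomains = {ndomains}")
```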


    --
    Posted automagically by a mail2news gateway at muc.de e.V.
    Please direct questions, flames, donations, etc. to news-admin@muc.de
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Garrett Wollman@wollman@bimajority.org to muc.lists.freebsd.stable on Wed Oct 22 22:48:13 2025
    From Newsgroup: muc.lists.freebsd.stable

    <<On Wed, 22 Oct 2025 11:45:54 -0400, Mark Johnston <markj@freebsd.org> said:

    On Tue, Sep 16, 2025 at 09:33:26PM -0400, Garrett Wollman wrote:

    As with previous failures, this one was with plenty of physical memory
    seemingly available (20 GiB out of 96 GiB). Separate swap partition,
    of course, and after 34 minutes memory allocation is pretty much back
    to where it was before the crash.

    Sorry to chime in late. Is this a NUMA system by any chance? That is,
    what does sysctl vm.ndomains report?

    It's hard to roll back my short-term memory to where it was a month
    ago, but I checked several of our NFS servers of various vintages
    (some old and small, some new) and all of them show vm.ndomains == 2.

    Correction: one of them, a newer AMD server which hasn't crashed in
    this way, has vm.ndomains == 1. I suppose it may be a single-socket
    system, given that it's weird in a bunch of other ways.

    -GAWollman

  • From Garrett Wollman@wollman@bimajority.org to muc.lists.freebsd.stable on Wed Oct 22 22:53:10 2025
    From Newsgroup: muc.lists.freebsd.stable

    <<On Wed, 22 Oct 2025 22:48:13 -0400, Garrett Wollman <wollman@bimajority.org> said:

    It's hard to roll back my short-term memory to where it was a month
    ago, but I checked several of our NFS servers of various vintages
    (some old and small, some new) and all of them show vm.ndomains == 2.

    Someone who uses gmail please ping markj and let him know that I
    responded to this question. markj@freebsd.org is aliased to gmail
    so non-gmail users can't send mail to him through freebsd.org.

    -GAWollman