• Re: FYI; 14.3: A discord report of Wired Memory growing to 17 GiBytes over something like 60 days; ARC shrinks to, say, 1942 MiBytes

    From Sulev-Madis Silber@freebsd-stable-freebsd-org730@ketas.si.pri.ee to muc.lists.freebsd.stable on Tue Aug 12 04:24:31 2025

damn, if this is the same thing i experience on 13. i've battled it for a long time now
the worst version of this is when wired grows to the entire ram size. then the entire system becomes unusable, unless you never need userland
things that use mmap seem to be very good at triggering it, and having low storage device write speeds "helps" as well
i'm not the only one who reports it
the problem also seems hard to debug, otherwise it would have been fixed? i've seen a number of zfs related wtf's over a decade but they do eventually get fixed
in my case, i don't think it leaks. it does give memory back on memory pressure, but only to a point: wired stays super high and i can't see where it goes. it doesn't show up anywhere in the lines that the various stats utils give
i think zfs rightfully assumes that free ram is wasted ram and just caches something somewhere, and either never gives it back or does so very slowly
for example, i observe it being ok for a while after a reboot, but after you actually start using zfs, running scrubs and so on, it gets into a weird state where it's slower. nothing fails, it just stays like this
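fwiw, you can at least watch the arc's own view of its size and target via the kstats; a minimal sketch, assuming the stock openzfs sysctl names on freebsd:

sysctl kstat.zfs.misc.arcstats.size    # current arc size, bytes
sysctl kstat.zfs.misc.arcstats.c       # current arc target size
sysctl kstat.zfs.misc.arcstats.c_max   # configured ceiling (vfs.zfs.arc_max)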
    anyone else with such issues?
    On August 11, 2025 11:18:47 PM GMT+03:00, Mark Millard <marklmi@yahoo.com> wrote:
Context reported by notafet:

    14.3-RELEASE-p2 GENERIC on amd64 with ZFS in use

    RAM: looks to be 24 GiBytes, not explicitly mentioned
    SWAP: 8192 MiBytes
    (From using the image of top's figures.)

    Wired: 17 GiBytes
    ARC Total: 1942 MiBytes

    SWAP used: 1102 MiBytes

    The link to the storage channel's message is:

    https://discord.com/channels/727023752348434432/757305697527398481/1404367777904463914


    ===
    Mark Millard
    marklmi at yahoo.com


  • From Sulev-Madis Silber@freebsd-stable-freebsd-org730@ketas.si.pri.ee to muc.lists.freebsd.stable on Tue Aug 12 06:39:04 2025

    that's what i'm talking about!!!
    i do have swap
    i also have only 4g of ram
but the system is not swapping, except rarely
with less ram, one would expect just slower performance. but it doesn't become slower, it's just slow, unlike the way buildworld takes 9h here with less cpu power. that's acceptable. i can also build rust & clang here; that takes all the ram and 6g of swap. but this isn't that. the tests you see me running show that the machine doesn't really fail per se
    currently my machine shows:
    last pid: 2292; load averages: 0.25, 0.29, 0.30 up 1+02:56:04 05:44:57
    670 threads: 3 running, 626 sleeping, 41 waiting
    CPU 0: 3.1% user, 0.0% nice, 0.8% system, 0.0% interrupt, 96.1% idle
    CPU 1: 0.0% user, 0.0% nice, 3.1% system, 0.0% interrupt, 96.9% idle
    Mem: 273M Active, 1310M Inact, 144M Laundry, 1969M Wired, 155M Free
    ARC: 856M Total, 403M MFU, 183M MRU, 1775K Anon, 20M Header, 247M Other
    272M Compressed, 865M Uncompressed, 3.19:1 Ratio
    Swap: 16G Total, 1656M Used, 14G Free, 10% Inuse
nevermind that; i've also seen 22:1 compress ratios after a scrub, which is also a bit wtf. maybe it's good. i'm no (z)fs expert
but my main question is: what's taking the other 1.1g (1969m wired minus 856m arc)? i don't do anything to achieve this. the only culprit i can think of is zfs, i'm afraid
and in the report above it's 15g of non-arc kernel memory (17g wired minus roughly 2g of arc). iirc userland can wire things too, but still
it begs the question: my small box has over 25% of ram allocated to something i can't even find. it's not arc. what is it? at least i want to know where it's going. then we could decide if it's a bug
i know zfs also does write caching
but i mean, how do i know what the memory goes to, and where?
neither vmstat -m nor vmstat -z shows it, and i can't find anything else
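the closest i've gotten to an accounting is summing the uma zones myself; a rough sketch, assuming vmstat -z's usual 'name: size, limit, used, free, ...' layout:

# biggest wired consumers by uma zone: size * used, largest first
vmstat -z | awk -F '[:,] *' 'NR > 1 { printf "%12d %s\n", $2 * $4, $1 }' | sort -rn | head -15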
    zfs-stats -a from sysutils/zfs-stats gets me
    ...
    Kernel Memory: 375.51 MiB
    Data: 88.89% 333.79 MiB
    Text: 11.11% 41.71 MiB
    Kernel Memory Map: 3.76 GiB
    Size: 44.37% 1.67 GiB
    Free: 55.63% 2.09 GiB
    ...
that looks fairly reasonable. but where's the rest, the roughly 700mb, tho? that's not arc
or, in the discord case, 15g
caches are fine, but why does it slow down over time? why doesn't it give the memory back? i bet there are better methods than a reboot
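one thing that should beat a reboot, assuming the memory really is arc-side, is temporarily lowering the arc cap and letting the pagedaemon reclaim; a sketch (openzfs exposes vfs.zfs.arc_max read-write, and 0 means the auto default):

sysctl vfs.zfs.arc_max=268435456   # squeeze the arc target down to 256 MiB
sleep 60                           # give reclaim a moment to work
sysctl vfs.zfs.arc_max=0           # restore the automatic default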
funnily, even with that little ram and those pool sizes, my fs operations are surprisingly fast, despite the lacking hw
though once i got speeds like 40mb/s on an even faster machine
here i actually get 100mb/s write and 200mb/s read on a zfs 2-disk mirror:
06:11,root@green:/gold/files/tmp# dd if=/dev/urandom of=urandom bs=10m count=1000 status=progress
10433331200 bytes (10 GB, 9950 MiB) transferred 103.066s, 101 MB/s
1000+0 records in
1000+0 records out
10485760000 bytes transferred in 103.551315 secs (101261485 bytes/sec)
06:13,root@green:/gold/files/tmp# dd if=urandom of=/dev/null bs=10m status=progress
10412359680 bytes (10 GB, 9930 MiB) transferred 52.048s, 200 MB/s
1000+0 records in
1000+0 records out
10485760000 bytes transferred in 52.396408 secs (200123640 bytes/sec)
that shows this box still has considerable overall performance. the cpu is a c2d btw, and none of this was cached. so cpu, ram, os, hw, disks all seem quite fine and can handle it. i get near the expected raw physical disk speeds here
so where's the issue?
why are there mysterious slowdowns, and where does the ram go, etc
during those tests, wired and arc get even higher, basically most of ram, but nothing fails. but git on the ports tree could push it over if i let it. i think this is all related
i don't get what those issues are. judging from the low load avg and low swap usage, i'm not really using this machine, meaning i don't really need better hw here
probably sun engineers are rolling in their graves or beds when they see their high-performance, big-server zfs being used like this, not the intended way, but still
maybe there's a way for both? so zfs still works well on 100+ disk pools with 2tb of ram, and also on a box like this one
    and everything in the middle
    unfortunately i can't fix it. i can only report unexpected behaviour
    On August 12, 2025 5:16:00 AM GMT+03:00, "Edward Sanford Sutton, III" <mirror176@hotmail.com> wrote:
    On 8/11/25 18:24, Sulev-Madis Silber wrote:
damn, if this is the same thing i experience on 13. i've battled it for a long time now

the worst version of this is when wired grows to the entire ram size. then the entire system becomes unusable, unless you never need userland

things that use mmap seem to be very good at triggering it, and having low storage device write speeds "helps" as well

    i'm not the only one who reports it

the problem also seems hard to debug, otherwise it would have been fixed? i've seen a number of zfs related wtf's over a decade but they do eventually get fixed

in my case, i don't think it leaks. it does give memory back on memory pressure, but only to a point: wired stays super high and i can't see where it goes. it doesn't show up anywhere in the lines that the various stats utils give

If wired is taken up by ZFS ARC, it should be given up reasonably quickly, down to within ARC's bounds. Unfortunately ZFS ARC is not the only thing that uses memory as wired, and wired memory is not supposed to be swappable. That has been confusing for ZFS users coming from UFS, as UFS cache memory was more clearly marked as 'temporarily using RAM while we can' by tools like top.
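One way to put a number on the non-ARC portion is to subtract ARC from the total wired count; a minimal sketch, assuming the stock vm.stats and arcstats sysctls:

# wired bytes = wired page count * page size; the remainder after ARC is the "other" wired
wired=$(( $(sysctl -n vm.stats.vm.v_wire_count) * $(sysctl -n hw.pagesize) ))
arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
echo "non-ARC wired: $(( (wired - arc) / 1048576 )) MiB"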

i think zfs rightfully assumes that free ram is wasted ram and just caches something somewhere, and either never gives it back or does so very slowly

And without caching, ZFS performance can be quite horrible, as copy-on-write causes noticeable fragmentation in all but single-write files, while metadata similarly never seems to get regrouped efficiently after edits unless a full copy rewrites it.

for example, i observe it being ok for a while after a reboot, but after you actually start using zfs, running scrubs and so on, it gets into a weird state where it's slower. nothing fails, it just stays like this

I hadn't tracked it down specifically, but I see ZFS get slower too. My 'guess' is that it has trouble reasonably tracking which file metadata to keep in memory once heavy memory pressure has been applied to the system (firefox definitely, or 'sometimes' a large poudriere job, seem easy enough triggers). Performance stays down if the processes causing the memory pressure have not been closed, and it doesn't always seem to work well again even after closing them.
I see the biggest performance hits on fragmented ZFS filesystem metadata, which reads very slowly from magnetic disk.
I think atime updates were one cause of it, which is likely why installs have disabled atime on all but one zfs pool; I haven't tested whether relatime is still much of a source of it. I saw very high performance on a pool that received a backup of my full system (either 20MB/s or 200MB/s doing directory listings after a clean post-transfer reboot, I can't remember which) and found very bad performance checking it a short time later (<2MB/s was not uncommon after cron jobs ran, so likely that databases, permissions, pkg checks, etc. ran around 'accessing' the content). I had disabled atime for the backup pool and set it to read-only, trying to mitigate such unnecessary performance drops, but never did a lot of testing around it.
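If atime is a suspect, it is cheap to check and turn off per dataset with the standard zfs commands ('pool/dataset' below is a placeholder name):

zfs get -r atime pool            # see which datasets still update atime
zfs set atime=off pool/dataset   # stop the metadata write on every read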
Other disk modifications/deletions cause it too. Git runs slower over time when doing updates, when checking whether it's in a clean state, and even on 'git log' after a series of updates. Running `git gc` seems to help, but seemed less effective when run every time instead of waiting until a number of pulls and maybe other activities caused a slowdown first. The cache from ccache is another candidate for horrible performance. I have a 61GB ccache4 cache that has been through 301 cleanups and seems to be getting slower, but I've had a <20GB cache running more than 15x slower than this one after a good # of cleanups. Not permitting cleanups until you are willing to make a copy seems the best choice for performance. Running `ccache -X 13`, or any number greater than the original compression setting, seems to help reorder it somewhat, but requires ccache4 (more efficient, but it misses some compiler substitutions in poudriere builds, so less can be cache-accelerated).
I normally 'fix' the performance degradation by moving the impacted area aside and copying it back (ex: `mv /var/cache/ccache /var/cache/ccache.orig && cp -a /var/cache/ccache.orig /var/cache/ccache`, then rm -rf the original if the copy completes without error), but that sometimes becomes more troublesome if a folder is a dataset instead of just a folder. It gets silly looking for and managing candidates, as even things like a Firefox or Thunderbird profile get a lot slower with metadata fragmentation. I haven't tried tools that rewrite files in place to see if that is enough to get metadata rewritten well without full copies being temporarily stored.
I haven't tracked how much impact snapshots vs no snapshots have on slowing down datasets that have been through a lot of modification/deletion. Making copies this way will increase disk usage if dedupe and block cloning are not in use, and last I heard block-cloned copies expand out to their own blocks after ZFS replication.
I find that disabling and re-enabling swap devices after memory conditions are reasonable again helps, but I don't know if it's truly related. If you don't have swap, try adding some and note if there is any impact too.
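Concretely, something like this is what I mean by cycling swap; a sketch, noting that swapoff needs enough free RAM to absorb whatever is paged out, so check usage first:

swapinfo -h               # current swap usage
swapoff -a && swapon -a   # cycle every swap device listed in /etc/fstab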

    anyone else with such issues?



  • From Sulev-Madis Silber@freebsd-stable-freebsd-org730@ketas.si.pri.ee to muc.lists.freebsd.stable on Wed Aug 13 04:50:54 2025

i ran this one-liner sequence. it was a (poor) attempt to forcibly push ram empty. it mostly got empty. but i think i have a problem here. do you also see it? i have 532.07 mb of kernel memory going somewhere. i wouldn't be looking into this if it didn't get slower, or "overflow" in rare cases. is that a zfs write cache? note that this is on 13.5-RELEASE-p3; i currently have no other releases, stables or currents where i can perform the same tests as a comparison, nor any more powerful systems. i'm reading here that zfs can't give ram back if it's in the middle of "something". unsure what to think
04:32,ketas@green:~> stress-ng --vm 1 --vm-bytes 100% -t 10s ; top -b | head -7 ; zfs-stats -a | fgrep 'Kernel Memory:'
stress-ng: info: [61750] setting to a 10 secs run per stressor
stress-ng: info: [61750] dispatching hogs: 1 vm
stress-ng: info: [61814] vm: using 3.88G per stressor instance (total 3.88G of 3.88G available memory)
stress-ng: warn: [61750] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: info: [61750] skipped: 0
stress-ng: info: [61750] passed: 1: vm (1)
stress-ng: info: [61750] failed: 0
stress-ng: info: [61750] metrics untrustworthy: 0
stress-ng: info: [61750] successful run completed in 45.04 secs
last pid: 62582; load averages: 0.57, 0.92, 1.06 up 2+01:45:26 04:34:19
261 processes: 1 running, 260 sleeping
CPU: 6.4% user, 2.6% nice, 8.1% system, 0.6% interrupt, 82.2% idle
Mem: 93M Active, 20M Inact, 484K Laundry, 1246M Wired, 2492M Free
ARC: 392M Total, 156M MFU, 102M MRU, 2794K Anon, 3493K Header, 127M Other
38M Compressed, 224M Uncompressed, 5.88:1 Ratio
Swap: 16G Total, 2627M Used, 13G Free, 16% Inuse
Kernel Memory: 321.93 MiB
04:34,ketas@green:~>
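(the 532.07 mb falls out of those numbers: 1246 MiB wired - 392 MiB arc - 321.93 MiB kernel memory = 532.07 MiB that none of the stats account for)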
  • From Sulev-Madis Silber@freebsd-stable-freebsd-org730@ketas.si.pri.ee to muc.lists.freebsd.stable on Thu Aug 14 02:34:40 2025

can anybody explain to me how to cap zfs related memory usage in the kernel?
or even help me understand it
after i learned that a git pull on the ports tree with a ton of updates takes the machine down in seconds, i managed to find and adjust git to have this:
    [core]
    packedGitWindowSize = 1m
    packedGitLimit = 1m
    preloadIndex = false
    [diff]
    renameLimit = 1
i tried to understand what those packedgit*'s do, and i gather they are caches, and somehow mmap is involved. dovecot also uses mmap, which i disabled as things started to die and i nearly lost sshd. tho i have a software watchdog configured to reboot if all of userland dies. that seems to be enough, as the kernel never freezes
what i want, or maybe what we could have as an overridable default, is to limit zfs's apparent ability to exhaust all kernel memory. it doesn't matter how much ram the machine has; i think if the kernel already occupies 95% of ram, it's time to slow down zfs operations. the closest knob i know of is capping the arc, sketched below, but that bounds only the arc
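a minimal sketch of that arc cap, set either at boot or live (vfs.zfs.arc_max; note this does not cap non-arc zfs kernel allocations):

# /boot/loader.conf: cap the ARC target at 512 MiB from boot
vfs.zfs.arc_max="536870912"

# or on a running system (openzfs exposes it read-write; 0 restores the auto default)
sysctl vfs.zfs.arc_max=536870912
sysctl vfs.zfs.arc_max=0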
i'm used to this gradual slowdown, but apparently with zfs it's fast till the end. i wonder how it stops after filling all the ram tho: did the demand just stop because userland got killed off, or would it eventually panic the kernel? and how would one test that? in-kernel iscsi?
    unsure how many would ever need this filesystem "battleshort" mode
    that's a bug, right? i don't think the low ram or slow io would justify the failure either
i can think of a nasty failure mode where a vm with a not-uncommonly-low amount of ram could experience a temporary io slowdown due to bottlenecking somewhere, and this could kill whatever important thing a userland program is doing; instead of the usual lockup, which could resolve itself, one could just have immediate data loss
the io part is a wild guess too
why this is ever an issue, i don't even know. this is a standard setup
you could blame me for not having tested this on anything other than 13, but in one case i had a 100% success rate at taking the system down with git. fresh boot, pull, it's gone. that's not what one expects even from low-power systems
zfs has appealing options for embedded use too, like copies=3, compression, etc. even if it was originally designed not to target that (but why?), why can't it just pause io? all other fses do this. there are a number of real reasons to do this
seems like zfs has serious issues with saying no, and would rather die first
it's probably the first time in my life i've seen this
i don't know what the fix or workaround is
i get that zfs read and especially write operations are extremely complex, but can't they just wait or something? just like, iirc, arc is limited to 60% of ram by default because more might be too insane. zfs could complete whatever atomic operation it needs to do and then just not take new ones. io would stall, but it could resolve
or at least tell me i'm wrong and the problem is elsewhere
feels similar to this tho...