Context reported by notafet:
14.3-RELEASE-p2 GENERIC on amd64 with ZFS in use
RAM: looks to be 24 GiBytes, not explicitly mentioned
SWAP: 8192 MiBytes
(The figures below are taken from the posted image of top's output.)
Wired: 17 GiBytes
ARC Total: 1942 MiBytes
SWAP used: 1102 MiBytes
The link to the storage channel's message is:
https://discord.com/channels/727023752348434432/757305697527398481/1404367777904463914
On Mon, 11 Aug 2025 13:18:47 -0700
Mark Millard <marklmi@yahoo.com> wrote:
[ … ]
Since updating a couple of weeks ago to the latest -CURRENT at the time
(from a previous build a couple of months old), I've been experiencing a
similar but worse case of this. The machine has 64G of RAM; given 2 days it
will run out of memory and start killing off processes, with Wired memory
growing the whole time (at a rate of about 1G every 5s at some points).
The ZFS ARC never passes more than 20G the whole time.
For example, right now top reports:
Mem: 1031M Active, 14G Inact, 241M Laundry, 45G Wired, 404M Buf, 2157M Free
ARC: 17G Total, 2883M MFU, 11G MRU, 256K Anon, 167M Header, 3025M
(Time for another reboot or it'll have killed things off when I get
home from work in 8 hours)
I'm still trying to pin down what exactly it's related to. I've pretty much
eliminated ZFS as the issue, and a large poudriere ports build or a full
buildworld doesn't seem to push it up. It seems to be more closely related
to how many graphical things are open at once.
I've tried drm-61-kmod and drm-66-kmod for my amdgpu Polaris 10 video card,
but that seems to make no difference. The next thing I was planning to try
was switching to the vesa driver to see if that reduces things, but these
things take time to fit in around everything else.
Darrin
On Tue, 12 Aug 2025 16:10:32 +0930
Darrin Smith <beldin@beldin.org> wrote:
[ … ] with Wired Memory growing the whole time (at the rate of
about 1G every 5s at some points). [ … ]
Correction here: 1*M* every 5s... It lasts a day or two at least :D
[ … ]
Well, removing amdgpu altogether made no difference; Wired is still climbing
(I only suspected amdgpu because Wired climbed sharpest when I was logged in).
However, I have noticed that there is no noticeable growth when using a local
login (on ZFS); it's only the NFS-based users that seem to be causing Wired
to climb sharply.
This might be a hint. NFS uses metadata heavily. I'm not a ZFS guy, [ … ]
Start looking at differences in periodic shots of vmstat -z and
vmstat -m. It would not catch direct page allocators.
On Tue, 12 Aug 2025 12:57:39 +0300
Konstantin Belousov <kostikbel@gmail.com> wrote:
[ … ]
Ok, I hope I'm reading these outputs correctly...
Looking at vmstat -z, I am assuming the SIZE column shows the size of each
item in a zone and USED indicates the number of items in use? (A quick look
at vmstat.c, which pointed me to memstat_get_*, suggests I'm on the right
track.) Multiplying the two gives numbers of around the right order of
magnitude to match my memory.
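For reference, the used * size arithmetic described above can be reproduced
with libmemstat(3), the library that vmstat(8) itself is built on. The
following is a minimal sketch, assuming only the stock libmemstat interface
(nothing in it comes from this thread); build it with
cc -o zonebytes zonebytes.c -lmemstat and diff successive runs to spot zones
whose in-use bytes only ever grow:

/*
 * zonebytes.c - print "zone-name bytes-in-use" for every UMA zone,
 * i.e. the USED count times the per-item SIZE that vmstat -z reports.
 * Sketch only; build: cc -o zonebytes zonebytes.c -lmemstat
 */
#include <sys/types.h>
#include <sys/queue.h>

#include <err.h>
#include <memstat.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        struct memory_type_list *mtlp;
        struct memory_type *mtp;

        mtlp = memstat_mtl_alloc();
        if (mtlp == NULL)
                err(1, "memstat_mtl_alloc");
        /* Fetch UMA zone statistics, the data behind "vmstat -z". */
        if (memstat_sysctl_uma(mtlp, 0) < 0)
                errx(1, "memstat_sysctl_uma failed");
        for (mtp = memstat_mtl_first(mtlp); mtp != NULL;
            mtp = memstat_mtl_next(mtp)) {
                /* items in use times per-item size = bytes currently in use */
                printf("%s %ju\n", memstat_get_name(mtp),
                    (uintmax_t)(memstat_get_count(mtp) *
                    memstat_get_size(mtp)));
        }
        memstat_mtl_free(mtlp);
        return (0);
}

The same list can also be populated with malloc(9) type statistics (what
vmstat -m shows) via memstat_sysctl_malloc().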
I have taken 3 samples over the last 18 hours, in which time it looks like
about half of my memory has become wired, which seems a little excessive,
especially considering ZFS is only using about 6.5G according to top:
Mem: 1568M Active, 12G Inact, 656M Laundry, 36G Wired, 994M Buf, 12G Free
ARC: 6645M Total, 3099M MFU, 2617M MRU, 768K Anon, 49M Header, 877M
4995M Compressed, 5803M Uncompressed, 1.16:1 Ratio
Swap: 8192M Total, 198M Used, 7993M Free, 2% Inuse
In the middle of this range I was building about 1000 packages in poudriere,
so it's been busy.
Interestingly, the ZFS ARC size has actually dropped since 9 hours ago when I
took the 2nd measurement (it was about 15G then), but that was at the height
of the build, which suggests the ARC is expiring older stuff happily.
So, assuming used * size is the right calculation, I saw the following big
changes in vmstat -z:
vm_page:
18 hours ago (before build): 18159063040, 25473990656
9 hours ago (during build) : 27994304512, 29363249152
delta : +9835241472, +3889258496
recent sample : 14337658880, 35773743104
delta : -13656645632, +6410493952
NAMEI:
18 hours ago: 2 267 478 016
9 hours ago : 13 991 848 960
delta : +11 724 370 944
recent sample: 24 441 244 672
delta : +10 449 395 712
zfs_znode_cache:
18 hours ago: 370777296
9 hours ago : 975800816
delta : +605023520
recent sample: 156404656
delta : -819396160
VNODE:
18 hours ago: 440384120
9 hours ago : 952734200
delta : +512350080
recent sample: 159528160
delta : -793206040
Everything else comes out to smaller numbers, so I assume it's probably
not them.
If I'm getting the numbers right, I'm seeing various caches expiring after
the poudriere build finished. But that NAMEI figure seems to be growing quite
extensively still; I don't know if that's expected or not :)
I will keep watching these, and hopefully get a sample after the machine has
started killing processes.
If any gurus would like .xml dumps of the vmstat -z & -m outputs, I have them
available (XML is easier for me to import into a spreadsheet); I can email
them or upload them somewhere suitable.
Darrin
--
=b
On Wed, Aug 13, 2025 at 12:47 AM Darrin Smith <beldin@beldin.org> wrote:
[ … ]
Are you running the nfsd?
I ask because there might be a pretty basic blunder in the NFS server.
There are several places where the NFS server code calls namei() and
they don't do a NDFREE_PNBUF() after the call.
All but one of them is related to the pNFS server, so it would not
affect anyone (no one uses it), but one of them is used to update the
V4 export list (a function called nfsrv_v4rootexport()).
So Kostik, should there be a NDFREE_PNBUF() after a successful
namei() call to get rid of the buffer?
rick
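As a sketch of the pattern being described here (this is an illustrative
namei() caller, not the actual NFS server or nfsrv_v4rootexport() code, and
the helper name is made up):

/*
 * Illustrative only: a namei() caller that releases the pathname buffer.
 * Omitting the NDFREE_PNBUF() call on the success path leaks one NAMEI
 * zone allocation per lookup.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/namei.h>
#include <sys/vnode.h>

static int
example_lookup(const char *path)        /* hypothetical helper */
{
        struct nameidata nd;
        int error;

        NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF, UIO_SYSSPACE, path);
        error = namei(&nd);
        if (error != 0)
                return (error);         /* namei() frees the buffer on error */

        /* ... use the locked, referenced vnode nd.ni_vp here ... */

        NDFREE_PNBUF(&nd);              /* release the saved pathname buffer */
        vput(nd.ni_vp);                 /* drop the LOCKLEAF lock and reference */
        return (0);
}

On an error return namei() has already released the buffer, which is why the
NDFREE_PNBUF() belongs on the success path only.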
On Wed, Aug 13, 2025 at 05:42:55PM -0700, Rick Macklem wrote:
[ … ]
So, I basically answered the question myself. After mjg@'s commit
on Sep. 17, 2022 (5b5b7e2 in main), the buffer is always saved
unless there is an error return.
The "vmstat -z | fgrep NAMEI" count does increase by one each
time I send a SIGHUP to mountd.
This is fixed by adding a NDFREE_PNBUF().
However, one buffer each time exports are reloaded probably is
not the leak you guys are looking for.
Yes.
Definitely.
I am not sure what they reported (instead of raw output, some interpretation
was provided), but so far it seems to be just normal vnode caching. Perhaps
they can compare the number of vnodes allocated against the cap
kern.maxvnodes. The allocation count should not exceed maxvnodes
significantly.
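The comparison suggested here boils down to reading two sysctls. A minimal
userland sketch, assuming only the standard vfs.numvnodes and kern.maxvnodes
OIDs (the width check is a precaution because the kernel-side integer type
has varied across releases; little-endian is assumed):

/*
 * vnodecap.c - compare the number of vnodes currently allocated with the
 * kern.maxvnodes cap.  Sketch only; build: cc -o vnodecap vnodecap.c
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t
read_num(const char *name)
{
        uint64_t v = 0;
        size_t len = sizeof(v);

        if (sysctlbyname(name, &v, &len, NULL, 0) == -1)
                err(1, "sysctlbyname(%s)", name);
        if (len == sizeof(uint32_t)) {  /* 32-bit sysctl type, little-endian */
                uint32_t v32;

                memcpy(&v32, &v, sizeof(v32));
                v = v32;
        }
        return (v);
}

int
main(void)
{
        uint64_t numvnodes, maxvnodes;

        numvnodes = read_num("vfs.numvnodes");
        maxvnodes = read_num("kern.maxvnodes");
        printf("vfs.numvnodes = %ju, kern.maxvnodes = %ju (%.1f%% of the cap)\n",
            (uintmax_t)numvnodes, (uintmax_t)maxvnodes,
            maxvnodes != 0 ? 100.0 * (double)numvnodes / (double)maxvnodes : 0.0);
        return (0);
}

If vfs.numvnodes stays at or below the cap while Wired keeps growing, the
growth is coming from something other than vnode caching.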
On Thu, 14 Aug 2025 05:38:00 +0300
Konstantin Belousov <kostikbel@gmail.com> wrote:
[ … ]
I apologise for not just pasting the direct dumps, but I didn't think
multiple 246-line files would be appreciated.
All the raw data is at http://files.beldin.org/logs.
Unfortunately a power outage occurred here before I was able to reach the
memory-exhaustion level, so I will have to wait approximately another 2 days
to hit the problem again.
Darrin
On Wed, Aug 13, 2025 at 7:39 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
[ … ]
Hi Peter,
Peter Eriksson posted this to me a little while ago...
I wish I could upgrade our front-end servers from FreeBSD 13.5 btw -
but there is a very troublesome issue with ZFS on FreeBSD 14+ -
sometimes it runs amok and basically uses up all available RAM - and
then the system load goes thru the roof and the machine basically
grinds to a halt for _long_ periods - happens when we run our backup
rsync jobs.
https://github.com/openzfs/zfs/issues/17052
You referred to this, which is obviously Linux specific.
rick
On Thu, Aug 14, 2025 at 12:37 AM Darrin Smith <beldin@beldin.org> wrote:
[ … ]
So, in your case it looks like a NAMEI leak (buffers for file paths being
looked up).
A fix for a NAMEI leak was just committed to main.
Maybe you can update your kernel and see if this helps for
your problem?
rick
On Thu, 14 Aug 2025 09:16:27 -0700
Rick Macklem <rick.macklem@gmail.com> wrote:
[ … ]
Sounds like a possibility. I'll set the wheels in motion and see how
we go.
On 14 Aug 2025, at 06:35, Rick Macklem <rick.macklem@gmail.com> wrote:
[ … ]
I was seeing a problem just like this in the 14.3 betas during heavy workloads.
The problem was fixed at around BETA4, I believe by this commit. https://cgit.freebsd.org/src/commit/?h=releng/14.3&id=7a9ea03e4bbfee1b2192d9a5b4da89a53d3a2c14
Have you seen a similar problem on 14.3?
Regards,
Jan M.
On Fri, Aug 15, 2025 at 4:24 AM Jan Martin Mikkelsen <janm@transactionware.com> wrote:
. . .
The original report was for 14.3. Of course, they may have experienced something different than what you did.
(I tried to look at whatever it was on the discord channel, but got
nothing useful,
just some sort of "welcome to discord..." message.)
I do hope that Peter can try 14.3 and determine if he still sees the problem he reported.
Rick Macklem <rick.macklem_at_gmail.com> wrote on
Date: Sat, 16 Aug 2025 23:43:11 UTC :
[ … ]
Yea, discord is a need-to-login type of context. If
one is unlikely to want to establish such a login,
such discord URLs should likely be ignored. (I
originally established a login in order to get to
Solid Run's support for their HoneyComb [aarch64
based].)
As for the specific report, some of the information
was presented with images, making for a not great
fit for the mail-list. Thus the URL usage for those
that could readily use it.
So far, it has turned out that the message to the
mail-list mostly has prompted other leaks to be
found and fixed. Not bad for unintended
consequences.
===
Mark Millard
marklmi at yahoo.com
On Sat, Aug 16, 2025 at 9:22 PM Mark Millard <marklmi@yahoo.com> wrote:
[ … ]
I do have a discord login (for the NFSv4 bakeathons), but when I click on
what is in your original post, it just says something about downloading an
app and no text channels.
rick
Ahh, so, ignoring my URL copy/paste, go to the [ … ]
My normal desktop environment is macOS. In that environment, [ … ]