I thought at one point I read something about how articles could be compressed in the spool in INN, or some other news software, but I can't find that now, aside from compressing just overview. (Not thinking of compression for transit here but compression at rest.)
Digging around a bit, I found a thread from 1990 in news.software.b
which discussed this[1]; the consensus was that news articles individually
tend to be small enough that compressing them is not worth the effort.
Certainly on busy systems, it makes little sense, but for the use case
where disk space is limited and CPU is not fully utilized normally
anyways, it seems attractive, assuming you don't mind losing the ability
to easily grep through things or use other utilities with the spool
(which, as recently mentioned, are less common now).
Is anyone aware of a news package that has ever supported compression of articles in the spool? My theory is that articles today are larger than 35-40 years ago, both due to more headers and larger bodies. I did some tests on some articles I picked at random, and most seem to be in the 1KB-6KB range. I compressed a 930B article to 739B (20% savings) and a 5,730B article to 3,649B (37% savings). 25-35% seems typical.
Conceivably, on a system with limited disk space, one could increase retention by maybe around 25% just with this trick. Certainly, at some
point it starts to look attractive.
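As a rough check on the retention figure: with a fixed disk budget, saving a fraction s of the bytes lets the spool hold 1/(1-s) times as many articles. A minimal sketch of that arithmetic, assuming the savings apply uniformly to every article and ignoring filesystem block rounding (the function name is made up for illustration):

```python
def retention_gain(savings: float) -> float:
    """Fractional increase in retention for a fixed disk budget.

    savings is the fraction of bytes removed by compression (0.25 for 25%).
    """
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    # The same disk now holds 1 / (1 - savings) times as many bytes of articles.
    return 1 / (1 - savings) - 1


for s in (0.20, 0.25, 0.35):
    print(f"{s:.0%} savings -> {retention_gain(s):.0%} more retention")
```

Under those assumptions, byte savings of 25% would translate to roughly a third more retention, so the two percentages are not interchangeable.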
I'm toying with the concept for tradspool, though I imagine with CNFS or other multi-article files, compression would be even more effective.
[1] https://groups.google.com/g/news.software.b/c/jDdQlTLzzIw/m/XIWEHQpvgWsJ
On 6/17/26 20:11, InterLinked wrote:
I thought at one point I read something about how articles could be
compressed in the spool in INN, or some other news software, but I
can't find that now, aside from compressing just overview. (Not
thinking of compression for transit here but compression at rest.)
Digging around a bit, I found a thread from 1990 in news.software.b
which discussed this[1], consensus that news articles individually
tend to be small enough that compressing them is not worth the effort.
The economics favor the opposite on both counts with any modern system. I.e. the data would be compressible (especially with a trained dictionary),
and the "effort" is borderline free and optimizes harder-to-scale axes like
I/O bandwidth and page cache.
Certainly on busy systems, it makes little sense, but for the use case
where disk space is limited and CPU is not fully utilized normally
anyways, it seems attractive, assuming you don't mind losing the
ability to easily grep through things or use other utilities with the
spool (which, as recently mentioned, are less common now).
Again, the economics have flipped, and busy systems benefit more if it is done intelligently.
Is anyone aware of a news package that has ever supported compression
of articles in the spool? My theory is that articles today are larger
than 35-40 years ago, both due to more headers and larger bodies. I
did some
I would question that intuition; I don't think article size has changed much.
Here is a histogram on my tradspool:
<512B:        788,863  (0.95%)
512B-1K:    5,108,243  (6.16%)
1-2K:      31,046,227 (37.43%)
2-4K:      26,416,756 (31.85%)
4-8K:      11,161,756 (13.46%)
8-16K:      4,478,769  (5.40%)
16-32K:     1,522,232  (1.84%)
32-64K:       358,406  (0.43%)
>64K:           5,959  (0.01%)
Total: ~82.9 million articles
(Note: I cut off at 52k; the larger articles would have been sucked when I
was filling out some history on a couple of groups.)
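For anyone wanting to produce a comparable histogram on their own tradspool, here is a minimal sketch. It assumes the usual tradspool layout where each article is a file whose name is its article number; the bucket boundaries mirror the labels above, with a final bucket for everything of 64K and up. The function names are made up for illustration, and crossposted articles that appear as links in several group directories would be counted once per link.

```python
import os
from collections import Counter

# (exclusive upper bound in bytes, label)
BUCKETS = [
    (512, "<512B"), (1024, "512B-1K"), (2048, "1-2K"), (4096, "2-4K"),
    (8192, "4-8K"), (16384, "8-16K"), (32768, "16-32K"), (65536, "32-64K"),
]
LAST_LABEL = ">64K"


def bucket_for(size: int) -> str:
    """Return the histogram label for a file of the given size in bytes."""
    for limit, label in BUCKETS:
        if size < limit:
            return label
    return LAST_LABEL


def size_histogram(spool_root: str) -> Counter:
    """Walk the spool and count article files per size bucket."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        for name in filenames:
            if not name.isdigit():  # skip anything that is not an article number
                continue
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:  # article expired mid-walk, dangling link, etc.
                continue
            counts[bucket_for(size)] += 1
    return counts


def print_histogram(counts: Counter) -> None:
    """Print counts and percentages in bucket order."""
    total = sum(counts.values())
    for label in [lbl for _limit, lbl in BUCKETS] + [LAST_LABEL]:
        n = counts.get(label, 0)
        pct = 100 * n / total if total else 0.0
        print(f"{label:<8} {n:>12,} ({pct:.2f}%)")
    print(f"Total: {total:,} articles")
```

Usage would be something like `print_histogram(size_histogram("/path/to/spool/articles"))`, with the path adjusted to wherever the spool lives.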
tests on some articles I picked at random, and most seem to be in the
1KB-6KB range. I compressed a 930B article to 739B (20% savings) and a
5,730B article to 3,649B (37% savings). 25-35% seems typical.
Conceivably, on a system with limited disk space, one could increase
retention by maybe around 25% just with this trick. Certainly, at some
point it starts to look attractive.
I'm toying with the concept for tradspool, though I imagine with CNFS
or other multi-article files, compression would be even more effective.
Yes, something like CNFS will result in greater gains because an
untrained dictionary will span a larger working set.
Otherwise a lot of Usenet articles are small enough that you really need
a trained dictionary to get any ratio on individual articles.
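The difference between compressing articles one at a time and compressing many of them as one stream is easy to see with a toy experiment. This is only a sketch: the articles are synthetic, made-up text, and zlib from the Python standard library stands in for whatever compressor (LZ4, zstd) a real setup would use, so the absolute numbers mean nothing; only the direction of the gap is of interest.

```python
import zlib


def make_article(n: int) -> bytes:
    """Build a small synthetic article; all names here are invented."""
    return (
        f"Path: news.example.invalid!not-for-mail\r\n"
        f"From: user{n}@example.invalid\r\n"
        f"Newsgroups: example.test\r\n"
        f"Subject: Re: spool compression ({n})\r\n"
        f"Message-ID: <{n}@example.invalid>\r\n"
        f"\r\n"
        f"Reply number {n} in a made-up thread.\r\n"
    ).encode("ascii")


articles = [make_article(n) for n in range(200)]

raw = sum(len(a) for a in articles)
# One compressed stream per article: each starts with an empty history.
individual = sum(len(zlib.compress(a)) for a in articles)
# One stream over all articles: later articles can reference earlier ones.
batched = len(zlib.compress(b"".join(articles)))

print(f"raw:        {raw} bytes")
print(f"individual: {individual} bytes")
print(f"batched:    {batched} bytes")
```

The batched stream comes out far smaller because the repeated header names and boilerplate are only paid for once, which is the effect a multi-article container or a shared dictionary is after.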
I use block level LZ4 (ZFS) on my tradspool, and it is maybe a 10% improvement. The major issue with this setup is the filesystem metadata and data block per article, especially for small articles (the FS has no support for packing small files into the metadata node, stuffing
multiple small files into one data node, etc.). CNFS would result in a
much higher ratio.
On 6/18/2026 12:14 AM, Kevin Bowling wrote:
Even, I suspect it may not be enough to eliminate at least one block for many articles. So I think 25% was a bit optimistic on my part, maybe
10-20% is more realistic.
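The block-rounding point can be checked with the two sample articles from the original post. A minimal sketch, assuming a 4096-byte block size (a common default, but an assumption here) and a filesystem that gives every non-empty file at least one whole data block; the helper name is made up:

```python
import math


def blocks_used(size_bytes: int, block_size: int = 4096) -> int:
    """Data blocks occupied by one file, rounding up to whole blocks."""
    return max(1, math.ceil(size_bytes / block_size))


# (uncompressed, compressed) sizes in bytes, taken from the original post
samples = [(930, 739), (5730, 3649)]

for before, after in samples:
    saved = blocks_used(before) - blocks_used(after)
    print(f"{before}B -> {after}B: "
          f"{blocks_used(before)} -> {blocks_used(after)} blocks, {saved} saved")
```

Under these assumptions the 930B article saves nothing on disk despite shrinking by about 20%, while the 5,730B article drops from two blocks to one. Only articles that cross a block boundary after compression yield any real savings.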
I use block level LZ4 (ZFS) on my tradspool, and it is maybe a 10%
improvement. The major issue with this setup is the filesystem metadata
and data block per article, especially for small articles (the FS has no
support for packing small files into the metadata node, stuffing
multiple small files into one data node, etc.). CNFS would result in a
much higher ratio.
I'm toying with the concept for tradspool, though I imagine with CNFS or other multi-article files, compression would be even more effective.
In article <1110ubg$2ld8b$1@dont-email.me>,
InterLinked <nntp@phreaknet.org> wrote:
On 6/18/2026 12:14 AM, Kevin Bowling wrote:
Even, I suspect it may not be enough to eliminate at least one block for
many articles. So I think 25% was a bit optimistic on my part, maybe
10-20% is more realistic.
I use block level LZ4 (ZFS) on my tradspool, and it is maybe a 10%
improvement. The major issue with this setup is the filesystem metadata
and data block per article, especially for small articles (the FS has no
support for packing small files into the metadata node, stuffing
multiple small files into one data node, etc.). CNFS would result in a
much higher ratio.
Tradspool here, and I still get pretty good compression with LZ4:
NAME                            RATIO
tank/root/usr/local/news        1.88x
tank/root/usr/local/news/db     1.77x
tank/root/usr/local/news/spool  1.89x
tank/root/usr/local/news/tmp    1.00x
It's a small server and I don't carry binaries, so that should give a
decent idea of what LZ4 on actual text will give you. ("1.89x" is
what others would call "47%".) This is with 90-day retention, the
whole of /usr/local/news is less than 10 GiB.
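For comparing the ratios above with the percentage figures quoted earlier in the thread, the conversion is savings = 1 - 1/ratio. A tiny sketch (the function name is made up for illustration):

```python
def ratio_to_savings(ratio: float) -> float:
    """Convert a compression ratio such as 1.89 (from "1.89x") to fractional savings."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1 - 1 / ratio


for r in (1.00, 1.77, 1.88, 1.89):
    print(f"{r:.2f}x -> {ratio_to_savings(r):.0%} saved")
```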
There's more than enough CPU to use zstd or gzip but what's the point?
This is just our default zpool configuration, and there are a bunch of optimizations in ZFS that sit on top of `compression=on`.
InterLinked <nntp@phreaknet.org> writes:
I'm toying with the concept for tradspool, though I imagine with CNFS or
other multi-article files, compression would be even more effective.
As various people have implicitly noted but perhaps not said explicitly,
one of the things that has changed since the original news servers were written is that compression is now a file system feature for some file systems.
I would not, in 2026, implement application-level compression of things you're storing as simple files on disk unless you have enough knowledge to
use some application-specific compression mechanism (so, for instance,
image compression is still a different matter). If you're just using
standard general-purpose compression mechanisms like zstd, your time is almost certainly better spent finding an underlying file system that
handles the compression for you. The file system knows things like its
block layout strategy and can make good choices about opportunistic compression that the news software is not in a position to make.
It might still be worthwhile in some cases to compress blobs stored in databases or other similar cases, but even there I'd want to benchmark against a file system that natively implements compression.
ZFS is probably the most mature file system with this feature, but running
it on Linux can be a little complicated due to boring licensing reasons. btrfs is another option for transparent compression if you don't feel like dealing with ZFS, but isn't as mature. (That said, I've been using btrfs
on all my personal devices for years now without any trouble, although I
have avoided running a disk entirely out of space, which many people say btrfs doesn't always handle well.)
The other constraint with file systems is if you can use them at all. My Internet facing news server is just one service running on a Digital
Ocean droplet, and those only support ext4 and xfs for the primary disk.
zstd seems to work even better than compress; in some cases up to 50% compression for a single article even without a custom dictionary.
I realize this is probably another "feature" that has limited appeal...
but I'm sure there are a few others who may be using a VPS and want to
get the most out of it.
Train a dictionary across all the uncompressed
articles in a spool to get something representative for Usenet, and then
use that when compressing individual articles.
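To make the shared-dictionary idea concrete, here is a minimal sketch using the preset-dictionary support in Python's standard zlib module. This swaps in a different technique from what is described above: zlib does not train anything, so the dictionary here is a hand-written guess at common header text, standing in for a dictionary that zstd's training would derive from a real spool. The dictionary contents, sample article, and function names are all invented for illustration.

```python
import zlib

# Hand-written stand-in for a trained dictionary. zlib works best when the
# most commonly seen strings sit toward the end of the dictionary.
ZDICT = (
    b"Mime-Version: 1.0\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"User-Agent: "
    b"Organization: "
    b"References: "
    b"Date: "
    b"Message-ID: <"
    b"Subject: Re: "
    b"Newsgroups: "
    b"From: "
    b"Path: "
)


def compress_article(article: bytes, zdict: bytes = ZDICT) -> bytes:
    """Compress one article against the shared preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(article) + c.flush()


def decompress_article(blob: bytes, zdict: bytes = ZDICT) -> bytes:
    """Decompress; the exact same dictionary bytes must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


SAMPLE = (
    b"Path: news.example.invalid!not-for-mail\r\n"
    b"From: Example User <user@example.invalid>\r\n"
    b"Newsgroups: example.test\r\n"
    b"Subject: Re: compressing the spool\r\n"
    b"Message-ID: <1234@example.invalid>\r\n"
    b"Mime-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"A short made-up body.\r\n"
)

plain = zlib.compress(SAMPLE, 9)
with_dict = compress_article(SAMPLE)
print(f"raw {len(SAMPLE)}B, zlib {len(plain)}B, zlib+dict {len(with_dict)}B")
```

One design consequence worth noting: the dictionary becomes part of the on-disk format. Every article compressed against it is unreadable without the identical bytes, so a real implementation would need to version and preserve dictionaries for as long as any article compressed with them is retained.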
InterLinked <nntp@phreaknet.org> writes:
The other constraint with file systems is if you can use them at all. My
Internet facing news server is just one service running on a Digital
Ocean droplet, and those only support ext4 and xfs for the primary disk.
Ah! Yes, okay, I hadn't considered that, and that's going to be a
constraint.
zstd seems to work even better than compress; in some cases up to 50%
compression for a single article even without a custom dictionary.
Yeah, zstd is what I'd use these days.
I realize this is probably another "feature" that has limited appeal...
but I'm sure there are a few others who may be using a VPS and want to
get the most out of it.
The entirety of Usenet is features with limited appeal. :) I assume you're doing this as a hobby to have fun, and in that case I heartily encourage
you to do whatever makes you happy, including implementing things no one
else cares about. I have had so much fun in my life doing that.
I'm only giving you design advice as if this were a work project because I find it fun to kick around design questions. You should feel entirely free
to ignore me and do something that sounds more entertaining or rewarding
or just satisfying!
On Wed, 17 Jun 2026 23:11:57 -0400
InterLinked <nntp@phreaknet.org> wrote:
<snip>
I wouldn't bother with userland compression tools since that is one more thing that can break. Also be warned that the GNU coreutils and standard tools have been rewritten in Rust and are all severely buggy and borked. I wouldn't touch any compression tool with a Rust-rewrite dependency.
You could create a BTRFS or ZFS partition or image file and mount it with one of those file systems then place your spool on that volume. Enable automatic filesystem compression then go play golf or whatever.
If you are running on ext4 you can just create and mount an image file and format it for BTRFS or ZFS. I have used BTRFS image volumes that allowed me to fit 1.2+ TB of data on a 500GB drive.
InterLinked <nntp@phreaknet.org> wrote:
Train a dictionary across all the uncompressed articles in a spool to
get something representative for Usenet, and then use that when
compressing individual articles.
Replace header names with single-character tokens and omit the colon as
an understood character.
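One way the header-token idea could look, as a rough sketch only. The token table is invented for illustration and covers just a few headers; anything not in the table (unknown headers, folded continuation lines) is kept verbatim behind a leading ":" so the transformation stays reversible. It operates on header lines only and leaves the body alone.

```python
# Hypothetical token table: header name -> single-character token.
TOKENS = {
    "Path": "P",
    "From": "F",
    "Newsgroups": "N",
    "Subject": "S",
    "Message-ID": "M",
    "Date": "D",
    "References": "R",
}
NAMES = {token: name for name, token in TOKENS.items()}


def encode_headers(lines: list[str]) -> list[str]:
    """Replace known 'Name: value' lines with '<token>value'."""
    out = []
    for line in lines:
        name, sep, value = line.partition(": ")
        if sep and name in TOKENS:
            out.append(TOKENS[name] + value)  # colon and space are implied
        else:
            out.append(":" + line)  # escape: keep the line verbatim
    return out


def decode_headers(lines: list[str]) -> list[str]:
    """Invert encode_headers."""
    out = []
    for line in lines:
        if line.startswith(":"):
            out.append(line[1:])
        else:
            out.append(NAMES[line[0]] + ": " + line[1:])
    return out


headers = [
    "Path: news.example.invalid!not-for-mail",
    "From: Example User <user@example.invalid>",
    "Subject: Re: compressing the spool",
    "X-Made-Up-Header: kept verbatim",
    "\tfolded continuation line",
]
encoded = encode_headers(headers)
print(encoded)
```

Only exact, case-sensitive matches are tokenized here so that decoding reproduces the original bytes. Whether this gains much once a general-purpose compressor with a dictionary is already in place is a separate question that would need measuring.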