• Best spool for an archive server

    From Adam W. <aw@somewhere.invalid> to news.software.nntp on Sun Dec 28 00:42:58 2025

    Hi,

    I want to set up a news server (INN) with the archive of Polish
    Usenet. I already have most of the archives downloaded (in some weird
    format that I'll convert, sort by date, and feed to INN); it's around
    58 million articles (around 100 GB plus overview). Seems manageable.

    What would be the best spooling and overview method for this?

    Right now I'm thinking about creating a file, formatting it as some
    filesystem (which filesystem? I use ext4 for my everyday needs, but
    maybe something else is better for this?), tuning its parameters, and
    using tradspool.

    The file would be extended if needed, so the chosen filesystem has to
    be able to do that (ext4 can be extended if the underlying storage is
    extended, but I don't know about other filesystems).
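
    If I understand the tooling right, the loop-device route would look
    roughly like this (paths and sizes are placeholders):

        # create and format a spool image
        truncate -s 120G /srv/news-spool.img
        mkfs.ext4 /srv/news-spool.img
        mount -o loop /srv/news-spool.img /var/spool/news

        # later, to grow it:
        truncate -s +50G /srv/news-spool.img
        losetup -c /dev/loop0    # refresh the loop device's capacity
        resize2fs /dev/loop0     # grow ext4 online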

    If it's ext4, I also have a couple of questions on how best to tune
    it. This is what I want to do (a sketch of the corresponding mke2fs
    call follows the list) -- is it a good idea?

    1. Set the bytes per inode ratio to 1536 (very low, but it will give me 69 million inodes per 100 GB)
    2. Set the block size to 1024 (not too low?)
    3. Set the inode size to 128
    4. Set uid16 to disable 32-bit UIDs
    5. Disable large_file
    6. Set dir_index
    7. Set reserved blocks percentage to some low value (is 0% OK?)
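
    Concretely, I'm thinking of an mke2fs call like the one below (I
    couldn't find a feature flag matching item 4, uid16, so it's left
    out):

        mke2fs -t ext4 -i 1536 -b 1024 -I 128 -m 0 \
               -O dir_index,^large_file /srv/news-spool.img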

    Overview would be tradindexed; I think it will suffice.

    CNFS would be better if there were a way to throttle the server when
    it's about to wrap a buffer (I don't want to lose articles, ever;
    even if some massive flood were to overwhelm my storage, I want to
    add new buffers and then unthrottle the server), but is that even
    possible?

    Plus, in case of a flood, I'd have a problem with deleting articles
    from CNFS that I wouldn't have with tradspool...

    Another idea would be to use timecaf, but:

    1. It doesn't seem to be widely used, so it's also not very well
    tested. Or is it? How stable is it?

    2. Is there a way to rotate a .CF file when it's full (262144
    articles), instead of relying on arrival time? I want to feed new
    articles as fast as I can.

    3. Maybe there are some tools to write the .CF files directly for the
    initial load, instead of letting INN handle it? Then I'd just have to
    build the rest (history, overview).

    The server won't accept new articles from readers -- after the
    initial prefeeding from my archives, there will be only a single feed
    from my main server.
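
    If I read the documentation right, that part is just a single peer
    entry in incoming.conf, something like this (hostname is a
    placeholder):

        peer mainserver {
            hostname: news.example.org
        }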

    Suggestions are welcome.
  • From Andreas Kempe <kempe@lysator.liu.se> to news.software.nntp on Wed Dec 31 17:22:36 2025

    On 2025-12-28, Adam W. <aw@somewhere.invalid> wrote:
    > Hi,


    Hello,

    I'll share some thoughts from my experience administering the NNTP
    server I'm posting from.

    > I want to set up a news server (INN) with the archive of Polish
    > Usenet. I already have most of the archives downloaded (in some
    > weird format that I'll convert, sort by date, and feed to INN);
    > it's around 58 million articles (around 100 GB plus overview).
    > Seems manageable.
    >
    > What would be the best spooling and overview method for this?
    >
    > Right now I'm thinking about creating a file, formatting it as some
    > filesystem (which filesystem? I use ext4 for my everyday needs, but
    > maybe something else is better for this?), tuning its parameters,
    > and using tradspool.
    >
    > The file would be extended if needed, so the chosen filesystem has
    > to be able to do that (ext4 can be extended if the underlying
    > storage is extended, but I don't know about other filesystems).


    We use ZFS and I think it has a number of advantages over ext4. Maybe
    the largest one for this use case is that inodes are dynamically
    allocated so you don't have a static limit like with ext4. Other
    than that, you get nice features like snapshotting, scrubbing, easy
    creation of datasets, and transparent compression. Another good one
    when dealing with millions of small files is zfs {send,receive},
    which lets you transfer the file system at the block level, bypassing
    the need to iterate over millions of files and send them one by one
    -- granted, less of an issue if you store everything in a single
    file.
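
    As a rough sketch of the latter (pool, dataset, and host names are
    made up):

        zfs snapshot tank/news@migration
        zfs send tank/news@migration | ssh newhost zfs receive tank/news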

    ZFS does support growing the file system, but not shrinking it.

    In terms of tuning, you want to make sure the ashift setting matches
    your underlying storage's sector size. That's usually 4k on modern
    HDDs so ashift=12, i.e. 2^12 = 4096, even if they report a smaller
    logical sector size. The record size setting is 128k by default and
    probably doesn't need adjustment for your use-case since you do not
    plan on rewriting data. It defines the max size for individual file
    blocks and mostly affects sequential reading of large files and
    rewrite performance. You could experiment with setting it to a lower
    value if you want. It can be changed on-the-fly, but will only affect
    newly written files.
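
    For example, something along these lines (device and dataset names
    are placeholders; turning atime off is optional but usually sensible
    for a spool):

        zpool create -o ashift=12 tank /dev/sdb
        zfs create -o compression=lz4 -o atime=off tank/news
        # only if you want to experiment with smaller records:
        zfs set recordsize=32k tank/news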

    This rich feature set does come with the penalty of worse IOPS, but
    we store about 28 million articles in a tradspool on spinning rust
    and it works.

    Regardless of which file system you end up using, I would recommend
    against using a file for this, since you will incur double file
    system overhead that will likely tank your IOPS.

    Another option is XFS, which gives great IOPS, but I haven't used it
    in a server setting, so I can't vouch for its stability. It also
    doesn't suffer from ext4's static inode limit and can be grown.
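
    Growing it is a one-liner once the underlying device has been
    enlarged (device and mount point are placeholders):

        mkfs.xfs /dev/sdb1
        mount /dev/sdb1 /var/spool/news
        xfs_growfs /var/spool/news    # expands to fill the device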

    > If it's ext4, I also have a couple of questions on how best to tune
    > it. This is what I want to do -- is it a good idea?
    >
    > 1. Set the bytes per inode ratio to 1536 (very low, but it will
    >    give me 69 million inodes per 100 GB)
    > 2. Set the block size to 1024 (not too low?)
    > 3. Set the inode size to 128
    > 4. Set uid16 to disable 32-bit UIDs
    > 5. Disable large_file
    > 6. Set dir_index
    > 7. Set reserved blocks percentage to some low value (is 0% OK?)
    >
    > Overview would be tradindexed; I think it will suffice.


    We were running tradindexed and hit pretty serious performance
    issues. Switching to ovsqlite fixed that problem.
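
    If I remember correctly, the switch itself was just this in inn.conf,
    followed by an overview rebuild (check the ovsqlite and makehistory
    man pages rather than trusting my memory):

        ovmethod: ovsqlite

        # then, with the server throttled:
        makehistory -O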

    > CNFS would be better if there were a way to throttle the server
    > when it's about to wrap a buffer (I don't want to lose articles,
    > ever; even if some massive flood were to overwhelm my storage, I
    > want to add new buffers and then unthrottle the server), but is
    > that even possible?
    >
    > Plus, in case of a flood, I'd have a problem with deleting articles
    > from CNFS that I wouldn't have with tradspool...

    > Another idea would be to use timecaf, but:
    >
    > 1. It doesn't seem to be widely used, so it's also not very well
    >    tested. Or is it? How stable is it?
    >
    > 2. Is there a way to rotate a .CF file when it's full (262144
    >    articles), instead of relying on arrival time? I want to feed
    >    new articles as fast as I can.
    >
    > 3. Maybe there are some tools to write the .CF files directly for
    >    the initial load, instead of letting INN handle it? Then I'd
    >    just have to build the rest (history, overview).


    Unfortunately, I don't have a good answer to any of the questions
    above.

    > The server won't accept new articles from readers -- after the
    > initial prefeeding from my archives, there will be only a single
    > feed from my main server.
    >
    > Suggestions are welcome.
  • From Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> to news.software.nntp on Sun Jan 4 21:34:43 2026

    Hi Adam,

    > I want to set up a news server (INN) with the archive of Polish
    > Usenet. I already have most of the archives downloaded (in some
    > weird format that I'll convert, sort by date, and feed to INN);
    > it's around 58 million articles (around 100 GB plus overview).
    > Seems manageable.

    > Overview would be tradindexed; I think it will suffice.

    I would use ovsqlite because it may perform a bit faster with millions
    of articles in a single newsgroup.


    > CNFS would be better if there were a way to throttle the server
    > when it's about to wrap a buffer (I don't want to lose articles,
    > ever; even if some massive flood were to overwhelm my storage, I
    > want to add new buffers and then unthrottle the server), but is
    > that even possible?

    There is no such feature. Maybe a new keyword in cycbuff.conf would
    be worth having for such a use, like "nowrap:<buffer>[,<buffer>,...]"
    to list cyclic buffers that should not wrap. Unfortunately, it does
    not currently exist.
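
    In cycbuff.conf it could look like the sketch below; to be clear, the
    "nowrap" line is hypothetical and will not work in any current INN
    release:

        cycbuff:ONE:/spool/cycbuffs/one:512000
        metacycbuff:ARCHIVE:ONE
        nowrap:ONE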


    > Another idea would be to use timecaf, but:
    >
    > 1. It doesn't seem to be widely used, so it's also not very well
    >    tested. Or is it? How stable is it?

    I have been using timecaf for more than a decade with a few hierarchies
    and never noticed any issue.


    > 2. Is there a way to rotate a .CF file when it's full (262144
    >    articles), instead of relying on arrival time? I want to feed
    >    new articles as fast as I can.

    Unfortunately no. This is the only drawback I see with this storage method.


    > 3. Maybe there are some tools to write the .CF files directly for
    >    the initial load, instead of letting INN handle it? Then I'd
    >    just have to build the rest (history, overview).

    No tools exist for that.
    --
    Julien ÉLIE

    "I do not seek to know the answers; I seek to understand the
    questions."

  • From Adam W. <aw@somewhere.invalid> to news.software.nntp on Wed Jan 7 00:57:11 2026

    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

    > I would use ovsqlite because it may perform a bit faster with
    > millions of articles in a single newsgroup.

    Thanks, I'll use that.

    Thanks for the responses, Julien and Andreas. I'll be doing some
    experiments with ZFS and ext4 on a smaller article set, and we'll see
    how it works.