• Switching INN 2 storage format

    From Tanguy Ortolo@tanguy@ortolo.eu to news.software.nntp on Tue Feb 11 15:11:54 2025
    From Newsgroup: news.software.nntp

    Hello all,

    My news server running INN 2 stores all articles with the timecaf
    method. I am currently in the process of switching my file systems to
    btrfs (mainly to get bitrot detection, see below for more details about
    that).

    I do not expect timecaf and a CoW filesystem such as btrfs to play well
    together. Indeed, writing a new small article to a large file should
    inevitably fragment it. To avoid that, I disabled copy-on-write for the
    timecaf files, but that also disables file extent checksumming and
    therefore bitrot detection, defeating the main reason I am switching to
    btrfs in the first place.

    Therefore, I am considering switching to a news storage format better
    suited to a CoW filesystem. I think the best option would be
    timehash. Any thoughts on that?

    As I understand it, switching storage format for /new/ articles can
    be done simply by adding the new storage backend in first
    position in storage.conf, can it not?

    I do not plan on migrating existing articles, but simply to wait for
    them to expire, since there does not seem to exist any simple migration
    procedure. But if someone knows a way to do so, I would be interested.
    ;-)

    Thanks for reading!


    For those interested, I have two SSDs that are set up in a software
    RAID1 with Linux LVM lvmraid(7). This is very flexible and will support
    a drive failure... but not bitrot. Indeed, bitrot is detected by the
    LVM /scrubbing/ process, but it will not know which drive has the
    unaltered data (if any).

    Linux LVM RAID has an optional integrity layer that can identify
    corrupted data, but while it does fix it on-the-fly by querying the
    other drive, it does not report which drive is altering data. And it
    disables volume snapshotting.

    After doing some research, it seems btrfs does fix all this, since it
    maintains data checksums, and its scrubbing process does update
    per-drive error counters.

    However, that checksumming relies on btrfs' copy-on-write design. In
    practice, disabling CoW on some files, or even on an entire filesystem,
    also disables checksumming. Therefore, I am looking for a way to store
    news that would work well with btrfs' copy-on-write. :-)
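
The difference between a plain mirror comparison and a checksummed one can be sketched as follows. This is only an illustration of the reasoning in this post, not how LVM or btrfs are implemented: `zlib.crc32` stands in for whatever checksum the filesystem actually uses, and the block contents are made up.

```python
import zlib


def scrub_without_checksum(copy_a, copy_b):
    """A plain mirror can only tell whether the two copies differ."""
    return "ok" if copy_a == copy_b else "mismatch, good copy unknown"


def scrub_with_checksum(copy_a, copy_b, stored_crc):
    """With a stored checksum, each copy can be verified on its own."""
    good_a = zlib.crc32(copy_a) == stored_crc
    good_b = zlib.crc32(copy_b) == stored_crc
    if good_a and good_b:
        return "ok"
    if good_a:
        return "drive B corrupted, repair from A"
    if good_b:
        return "drive A corrupted, repair from B"
    return "both copies corrupted"


block = b"Path: news.example!not-for-mail\r\n"  # made-up block contents
crc = zlib.crc32(block)        # recorded when the block was written
rotten = b"Qath" + block[4:]   # simulated corruption on the second drive

print(scrub_without_checksum(block, rotten))
print(scrub_with_checksum(block, rotten, crc))
```

With the checksum available, the scrub can both pick the good copy and attribute the error to a specific drive, which is the per-drive reporting described above.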
    --
    Tanguy
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Tanguy Ortolo@tanguy@ortolo.eu to news.software.nntp on Thu Feb 13 15:14:04 2025
    From Newsgroup: news.software.nntp

    Actually, while digging into INN 2 storage formats, I am more and more
    inclined to switch to tradspool. The idea is that I prefer something
    simple, easy to explain and understand, to something more complex.

    Indeed, timecaf, as documented in
    <https://github.com/InterNetNews/inn/blob/main/storage/timecaf/README.CAF>,
    really looks like some kind of filesystem. And I am a bit disturbed by
    the idea of stacking such a filesystem on top of my actual filesystem,
    because storing files, even if they are small and there are many of
    them, seems like a good job for a regular filesystem.

    (By comparison, CNFS is rightly described as a specialized filesystem.)

    timehash relies more on the actual filesystem, with articles as
    individual files, sorted in directories depending on their reception
    date and time. Compared to timecaf and CNFS, it is supposed to be slower because manipulating small files is slower than updating larger ones.

    As for tradspool, with articles as individual files in directories that
    replicate the newsgroup hierarchy, it is supposed to be very slow with
    large groups, since it means manipulating files in directories with many
    files. And the expiration process is supposed to be slow as well, though
    I am not sure why it would be so.
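
As a small illustration of the tradspool layout described above (directories that replicate the newsgroup hierarchy, one file per article), here is a minimal sketch. The spool root and the article number are assumptions chosen for the example, and the function is not part of INN.

```python
import posixpath


def tradspool_path(spool_root, newsgroup, article_number):
    """Map a newsgroup and article number to a tradspool-style path."""
    # Each dot-separated component of the group name becomes a directory.
    return posixpath.join(spool_root, *newsgroup.split("."), str(article_number))


# Hypothetical spool root and article number, for illustration only.
print(tradspool_path("/var/spool/news/articles", "news.software.nntp", 1234))
# /var/spool/news/articles/news/software/nntp/1234
```

A busy group therefore ends up as one directory holding many files, which is where the historical slowness concern comes from.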

    What I am now wondering is how true the assumptions of slowness are
    with a modern filesystem such as btrfs. I just ran a test, creating a
    million small files (between 500 and 3000 bytes each) with random
    content. Listing is slow, but existence checking, file creation and
    deletion are not.
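
A test of this kind could look roughly like the sketch below. It is not the script used for the measurement above: the single flat directory, the file naming and the default count are assumptions, and results will depend heavily on caching and on the filesystem that holds the target directory.

```python
import os
import random
import tempfile
import time


def bench_small_files(root, count=1000, min_size=500, max_size=3000):
    """Time creation, existence checks, listing and deletion of small files."""
    names = [os.path.join(root, f"{i:07d}") for i in range(count)]
    timings = {}

    start = time.perf_counter()
    for name in names:
        with open(name, "wb") as fh:
            # Random content of a random size within the given bounds.
            fh.write(os.urandom(random.randint(min_size, max_size)))
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    found = sum(os.path.exists(name) for name in names)
    timings["exists"] = time.perf_counter() - start

    start = time.perf_counter()
    listed = len(os.listdir(root))
    timings["list"] = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        os.remove(name)
    timings["delete"] = time.perf_counter() - start

    return found, listed, timings


if __name__ == "__main__":
    # The default temporary directory may live on tmpfs; pass dir=... to
    # TemporaryDirectory to place it on the filesystem under test.
    with tempfile.TemporaryDirectory() as tmp:
        found, listed, timings = bench_small_files(tmp, count=10_000)
        for step, seconds in timings.items():
            print(f"{step:>7}: {seconds:.3f} s")
```

Raising `count` to 1_000_000 would match the scale mentioned above, at the cost of a much longer run.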
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Fri Feb 14 20:55:43 2025
    From Newsgroup: news.software.nntp

    Hi Tanguy,

    > As for tradspool, with articles as individual files in directories that
    > replicate the newsgroup hierarchy, it is supposed to be very slow with
    > large groups, since it means manipulating files in directories with many
    > files. And the expiration process is supposed to be slow as well, though
    > I am not sure why it would be so.

    I don't believe the disadvantages mentioned in the storage.conf manual
    page for tradspool still apply today. They used to, on older hardware
    and with higher traffic than today.
    The expiration process is not that slow, especially when using the
    delayrm flag with news.daily.

    I would just be inclined to change:

    "It takes a very fast file system and I/O system to keep up with current Usenet traffic volumes due to file system overhead. It requires a
    nightly expire program to delete old articles out of the news spool, a
    process that can slow down the server for several hours or more."

    to:

    "It needs a faster file system and I/O system than the cnfs and timecaf storage methods due to file system overhead. It also consumes more
    inodes and requires running a nightly expire program to delete old
    articles out of the news spool."


    > What I am now wondering is how true the assumptions of slowness are
    > with a modern filesystem such as btrfs.

    I don't think you will actually notice any slowness with tradspool.
    --
    Julien ÉLIE

    "Hard work never killed anybody, but why take a chance?" (Edgar Bergen)

  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Fri Feb 14 20:55:49 2025
    From Newsgroup: news.software.nntp

    Hi Tanguy,

    > Therefore, I am considering switching to a news storage format better
    > suited to a CoW filesystem. I think the best option would be
    > timehash. Any thoughts on that?

    If you are looking for a news storage format writing one article per
    file, either timehash or tradspool could be used.


    > As I understand it, switching storage format for /new/ articles can
    > be done simply by adding the new storage backend in first
    > position in storage.conf, can it not?

    Exactly. The first matching class found in the storage.conf file is
    used. You'll have to restart innd to take the modified file into account.
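
The "first matching class" rule can be illustrated with a small sketch. This is not INN's code: the entries are made-up values rather than a real storage.conf, only the newsgroup pattern is considered (real entries can be restricted by other criteria as well), and Python's `fnmatchcase` is only an approximation of INN's wildmat patterns.

```python
from fnmatch import fnmatchcase

# Entries in file order; the first one whose pattern matches wins.
# Illustrative values only, not taken from a real storage.conf.
STORAGE_ENTRIES = [
    {"method": "tradspool", "newsgroups": "*", "class": 1},
    {"method": "timecaf", "newsgroups": "*", "class": 0},
]


def pick_storage(newsgroup, entries=STORAGE_ENTRIES):
    """Return the first entry whose pattern matches, or None."""
    for entry in entries:
        if fnmatchcase(newsgroup, entry["newsgroups"]):
            return entry
    return None


print(pick_storage("news.software.nntp")["method"])
```

With the new entry listed first, every incoming article matches it before the older timecaf entry is ever considered, which is why the order in the file matters.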


    > I do not plan on migrating existing articles, but simply to wait for
    > them to expire, since there does not seem to exist any simple migration
    > procedure. But if someone knows a way to do so, I would be interested.

    There is a program named "respool" in the contrib directory
    <https://github.com/InterNetNews/inn/blob/main/contrib/respool.c> but I
    have never used it, so I do not know whether it works fine. Use at your
    own risk! :-)
    --
    Julien ÉLIE

    "- Say, Astérix! What a fuss over a little oil!
    - Yes, and let's hurry up and find a healer before it all turns to
    vinegar." (Astérix)

  • From Tanguy Ortolo@tanguy@ortolo.eu to news.software.nntp on Mon Feb 17 15:14:56 2025
    From Newsgroup: news.software.nntp

    Julien ÉLIE, 2025-02-14 20:55+0100:
    > If you are looking for a news storage format writing one article per
    > file, either timehash or tradspool could be used.

    I got that already. I was considering timehash for performance reasons,
    but your insight eventually convinced me to use tradspool. The simpler,
    the better. My time is more valuable than my CPU's time. ;-)

    Plus it will allow me to easily do some basic stats about article size
    and filesystem usage.
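
Basic statistics of that kind could be gathered with a sketch like the one below. The spool path is left to the caller, and the code assumes that article files are the ones whose names are purely numeric (the article number); anything else is skipped. Crossposted articles may be counted once per group if they appear as links in several directories.

```python
import os
import statistics


def article_size_stats(spool_root):
    """Collect size statistics for article files below spool_root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        for name in filenames:
            # Assumption: article files are named by their article number.
            if name.isdigit():
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    if not sizes:
        return {"count": 0}
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "mean_bytes": statistics.mean(sizes),
        "median_bytes": statistics.median(sizes),
        "max_bytes": max(sizes),
    }
```

Calling `article_size_stats()` on the directory of a single group instead of the whole spool gives per-group figures with the same function.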

    >> As I understand it, switching storage format for /new/ articles can
    >> be done simply by adding the new storage backend in first
    >> position in storage.conf, can it not?
    >
    > Exactly. The first matching class found in the storage.conf file is
    > used. You'll have to restart innd to take the modified file into account.

    Thanks for confirming. Of course I restarted innd; I never imagined such
    a change could be applied without restarting.

    > There is a program named "respool" in the contrib directory
    > <https://github.com/InterNetNews/inn/blob/main/contrib/respool.c> but I
    > have never used it, so I do not know whether it works fine. Use at your
    > own risk! :-)

    Or not use it at all. Keeping existing articles in timecaf is good enough;
    that worked for years after all. :-)
    --
    Tanguy
  • From Tanguy Ortolo@tanguy@ortolo.eu to news.software.nntp on Mon Feb 17 15:15:09 2025
    From Newsgroup: news.software.nntp

    Thanks, Julien!

    Julien ÉLIE, 2025-02-14 20:55+0100:
    > I don't believe the disadvantages mentioned in the storage.conf manual
    > page for tradspool still apply today. They used to, on older hardware
    > and with higher traffic than today.
    > The expiration process is not that slow, especially when using the
    > delayrm flag with news.daily.

    I thought so, thanks for confirming.

    > I would just be inclined to change:
    >
    > "It takes a very fast file system and I/O system to keep up with current
    > Usenet traffic volumes due to file system overhead. It requires a
    > nightly expire program to delete old articles out of the news spool, a
    > process that can slow down the server for several hours or more."
    >
    > to:
    >
    > "It needs a faster file system and I/O system than the cnfs and timecaf
    > storage methods due to file system overhead. It also consumes more
    > inodes and requires running a nightly expire program to delete old
    > articles out of the news spool."

    By the way, timecaf also requires a nightly expire program, does it not?
    --
    Tanguy
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Thu Feb 20 12:16:23 2025
    From Newsgroup: news.software.nntp

    Hi Tanguy,

    >> [tradspool]
    >> "It needs a faster file system and I/O system than the cnfs and timecaf
    >> storage methods due to file system overhead. It also consumes more
    >> inodes and requires running a nightly expire program to delete old
    >> articles out of the news spool."
    >
    > By the way, timecaf also requires a nightly expire program, does it not?

    Yes, the expire program is useful for storage backends that are not
    self-expiring (unlike CNFS, which is). I'll homogenize the wording for
    timehash and timecaf to mention that. For timecaf, the nightly expire
    program deletes old articles by either compacting CAF files if they
    still contain available articles, or removing them.
    --
    Julien ÉLIE

    "- By Poseidon! What a marvel!!!
    - By Neptune! What nerve!" (Astérix)
