However, the documentation for the active file in InterNetNews (INN)[2]
says that <high> is the "highest article number that has ever been used in that newsgroup". This implies it is NOT the same as the reported high
water mark, because the high water mark could decrease while <high> in
this context is monotonically increasing
- and the low and high water marks can be recomputed by scanning the
spool directory, while <high> in the active file cannot (and thus needs
to be stored persistently).
I've been perusing the source of InterNetNews (INN) to try to understand
how it behaves, as a reference. It refers to the active file <high> as
LAST in a few places, and this is used when assigning new article IDs in a group. This makes sense. For LIST COUNT and GROUP, it pulls from group
stats, which I believe is ultimately some kind of database backend that provides the reported water marks and article count. However, in the
response for LIST ACTIVE, it simply dumps the line from the active file as is. Yet, the RFC says the response format for LIST ACTIVE includes the reported high and low water marks.
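To make the fields under discussion concrete, here is a minimal sketch of reading one active file line. It assumes the four-field layout `<name> <high> <low> <status>` from the INN documentation cited above; the `ActiveEntry` type, the `parse_active_line` helper, and the sample line are made up for illustration and are not INN code.

```python
from typing import NamedTuple


class ActiveEntry(NamedTuple):
    name: str
    high: int    # highest number ever assigned ("LAST"); the article may no longer exist
    low: int     # only a hint at the low water mark
    status: str  # posting status flag, e.g. "y" or "m"


def parse_active_line(line: str) -> ActiveEntry:
    # Fields are whitespace-separated; the numbers are zero-padded decimal.
    name, high, low, status = line.split()
    return ActiveEntry(name, int(high), int(low), status)


# Made-up sample line in the assumed layout.
entry = parse_active_line("misc.test 0000003000 0000000001 y")
print(entry.name, entry.high, entry.low, entry.status)
```

A server that answers LIST ACTIVE by copying such lines verbatim is therefore handing out `high` as stored here, whatever the spool currently contains.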
1. If "LAST" is an internal value used for assigning article IDs, and
not the reported high water mark, then why is it being handed out as
such for LIST ACTIVE? I would think it would use the actual reported
high water mark, because if the high water mark article were deleted,
then the response would have the wrong high water mark.
2. The same page says <low> in the active file is ~the low water mark
but "not guaranteed to be accurate" and is just a hint. In INN, do the
values of <low> in the active file ever differ from the low water mark
in the group stats? Or are they distinct values like the active file
<high> (LAST) and the low water mark?
InterLinked <nntp@phreaknet.org> writes:
However, the documentation for the active file in InterNetNews (INN)[2]
says that <high> is the "highest article number that has ever been used in that newsgroup". This implies it is NOT the same as the reported high
water mark, because the high water mark could decrease while <high> in
this context is monotonically increasing
Correct, INN never decreases the high water mark under normal operations. Therefore, if the latest article in the group is deleted, the high water
mark will not decrease and will reference a non-existent article.
This is how most news servers have historically behaved, so the arguable implication in RFC 3977 that servers should decrease the high water mark
in this case is arguably a bug. In practice, it has no real effect since
news readers are required to handle this situation anyway, due to:
| The set of articles in a group may change after the GROUP command is
| carried out:
|
| o Articles may be removed from the group.
| o New articles may be added with article numbers greater than the
| reported high water mark. (If an article that was the one with
| the highest number has been removed and the high water mark has
| been adjusted accordingly, the next new article will not have the
| number one greater than the reported high water mark.)
which is implicitly incorporated by reference into LIST ACTIVE since it references that definition of high and low water marks (and logically has
to be the case regardless, since of course the state of the spool could change after LIST ACTIVE just as it could after GROUP).
- and the low and high water marks can be recomputed by scanning the
spool directory, while <high> in the active file cannot (and thus needs
to be stored persistently).
Yes, and I'm not sure INN's behavior when the news administrator
reconstructs the active file from the spool is strictly conforming in all edge cases. (For example, the low water mark could decrease, which is forbidden by RFC 3977.) In practice, the edge cases probably don't matter.
I've been perusing the source of InterNetNews (INN) to try to understand
how it behaves, as a reference. It refers to the active file <high> as
LAST in a few places, and this is used when assigning new article IDs in a group. This makes sense. For LIST COUNT and GROUP, it pulls from group
stats, which I believe is ultimately some kind of database backend that
provides the reported water marks and article count. However, in the
response for LIST ACTIVE, it simply dumps the line from the active file as is. Yet, the RFC says the response format for LIST ACTIVE includes the
reported high and low water marks.
In theory INN could construct a LIST ACTIVE response from the overview database. In practice, this is a very frequent operation and the current implementation is probably considerably faster than an overview-based implementation, for dubious benefit.
So, for your questions:
1. If "LAST" is an internal value used for assigning article IDs, and
not the reported high water mark, then why is it being handed out as
such for LIST ACTIVE? I would think it would use the actual reported
high water mark, because if the high water mark article were deleted,
then the response would have the wrong high water mark.
Because it's slow, basically. News readers like for LIST ACTIVE to be very fast with a large number of groups so that they can show unread article counts quickly on newsreader startup.
2. The same page says <low> in the active file is ~the low water mark
but "not guaranteed to be accurate" and is just a hint. In INN, do the
values of <low> in the active file ever differ from the low water mark
in the group stats? Or are they distinct values like the active file
<high> (LAST) and the low water mark?
I would never guarantee full integrity between all of INN's various
databases because they're all independent and updated non-transactionally,
so all sorts of weird things are true momentarily. In theory, the low
water mark in the active file should be eventually consistent with the low water mark in overview, but it will certainly vary while in the middle of nighly expire and may vary at other times that I'm not thinking of.
If I were writing a new news server from scratch today in 2026, I would
try very hard not to use INN's design of having four separate databases in three entirely different formats for the active file, the newsgroup descriptions, the overview, and the history. Surely there is some way to write a transactional database that could track all those things in a more reasonable but still performant way that doesn't require constantly
managing inconsistencies the way that INN does after some crashes or corruption. It used to be that all SQL databases were just too slow, particularly for history, but overview can be put in SQLite these days and I'm dubious that there is no standard database that could handle history given how much database optimization has happened over the years.
But INN will evolve slowly, if at all, because it basically works and a
lot of the bugs have been flushed out over the years and changing architectures is very hard. :)
To make an extreme example, if a group with a lot of articles had all of them except the low water mark article deleted (and "last" is 3000), you could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]
when in reality, this is the "most accurate" response:
211 1 1 1 misc.test
InterLinked <nntp@phreaknet.org> wrote or quoted:
To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you
could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]
when in reality, this is the "most accurate" response:
211 1 1 1 misc.test
A newsreader might already have read those 3000 articles and made
an internal note:
|In that group, I have seen everything up to 3000.
So when the newsserver then would go back to "211 1 1 1 misc.test",
the newsreader might miss the next 2999 articles because it deems them
"seen".
LISTGROUP and XHDR can be used to learn more about available articles.
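The scenario above can be sketched in a few lines. This assumes a hypothetical client that keeps a single per-group read mark (`seen_up_to`) and treats every article number at or below it as already seen; the function name and article numbers are invented for illustration.

```python
def unseen(available: list[int], seen_up_to: int) -> list[int]:
    # A client with a single read mark only considers higher numbers new.
    return [n for n in available if n > seen_up_to]


seen_up_to = 3000  # "I have seen everything up to 3000."

# If numbering restarted just above the lowered high water mark,
# the new articles would all look old to this client.
reused = unseen([2, 3, 4], seen_up_to)

# If the server keeps assigning from its stored "last" value,
# the new articles are recognised regardless of the reported high.
fresh = unseen([3001, 3002, 3003], seen_up_to)

print(reused)
print(fresh)
```

The problem only arises if article numbers are reused; a lower reported high water mark by itself does not hide anything from such a client.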
On 4/30/2026 8:05 PM, Russ Allbery wrote:
Therefore, if the latest article in the group is deleted, the high
water mark will not decrease and will reference a non-existent article.
This is how most news servers have historically behaved, so the
arguable implication in RFC 3977 that servers should decrease the high
water mark in this case is arguably a bug. In practice, it has no real
effect since news readers are required to handle this situation anyway,
due to:
| The set of articles in a group may change after the GROUP command is
| carried out:
|
| o Articles may be removed from the group.
Hmm, that's an interesting way to look at it - even if the article was deleted *before* the GROUP response is generated, the client wouldn't be
able to tell.
But the 3rd bullet in the phrase you cite (6.1.1.2) also says:
| o New articles may be added with article numbers greater than the
| reported high water mark. (If an article that was the one with
| the highest number has been removed and the high water mark has
| been adjusted accordingly, the next new article will not have the
| number one greater than the reported high water mark.)
To me, this implies the high water mark can (even "should") decrease when
the high water mark article is removed - in which case, the next article assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT
work in IMAP).
Of course, the phrasing doesn't mandate that the high water mark
decrease in this case, though it seems to allow for that option.
Yes, that makes sense. But this seems more like a "loophole" or
"shortcut"... did the RFC actually intend it work this way? Or everybody
was already doing it that way before RFC 3977, and that part of it has
just been ignored?
For context, I am working on my own NNTP implementation and I've really
been scratching my head about how to handle this case. It seems like if
I'm able to provide a more accurate response (a lower high water mark),
that would be preferred, but maybe there is a good reason not to do so?
(The obvious one being it requires extra bookkeeping).
To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even have 211 1 1 3000 misc.test to indicate there are definite gaps]
when in reality, this is the "most accurate" response:
211 1 1 1 misc.test
Though now that begs the question what to display if that last article (1) were then deleted. I presume in the first case, it would naturally be:
211 0 3000 2999 misc.test
And this is probably the best response. In the second case, it seems more ambiguous what the most logical reply would be, since you could start with either "last" or whatever the last true high water mark was (e.g. 211 0 0
1 misc.test).
That brings me to another observation: I've noticed that most inactive newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active file, and taking low = <high> and high = low - 1, return something more like 211
0 3000 2999 misc.test
The benefit there is that from the output you could see how many articles were in the group historically, from the high water mark, even if all have since been deleted - it's extra context that can be conveyed "for free".
Is there a reason INN just uses 1/0 instead? This seems like one case
where using <last> directly would actually really make sense for a client.
Okay, so performance > strict correctness (which is a reasonable answer,
when the client can't really say it wasn't correct).
Though I don't see why the implementation could not be such that it
would be just as fast - either perhaps through an in-memory cache of low/high/count for all groups, kept in sync with the active file, or
even more simply, storing this all in the active file itself, i.e. with
a format like:
<name> <last> <reportedhigh> <reportedlow> <count> <status>
I'm not proposing either of these for INN specifically, but wondering if either would make sense in the design of new software. If I had to
guess, maybe the active file hasn't been extended like this for compatibility/portability reasons?
The low/high water marks and count could be computed at startup by
scanning the directories, and then stored in memory, but now I'm kind of tempted by the idea of just having it all in an "extended" active file.
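As a rough sketch of the "extended" active file idea, assuming the hypothetical field order `<name> <last> <reportedhigh> <reportedlow> <count> <status>` given above and zero-padded fixed-width numbers (so a line could be updated in place without changing its length). None of this is an existing INN format; the class and field names are made up.

```python
from dataclasses import dataclass

WIDTH = 10  # assumed fixed width for the numeric fields


@dataclass
class ExtendedEntry:
    name: str
    last: int    # highest number ever assigned; source of the next article number
    high: int    # reported high water mark
    low: int     # reported low water mark
    count: int   # reported article count
    status: str

    def format(self) -> str:
        numbers = (self.last, self.high, self.low, self.count)
        padded = " ".join(f"{n:0{WIDTH}d}" for n in numbers)
        return f"{self.name} {padded} {self.status}"

    @classmethod
    def parse(cls, line: str) -> "ExtendedEntry":
        name, last, high, low, count, status = line.split()
        return cls(name, int(last), int(high), int(low), int(count), status)


# The "extreme example": only article 1 remains, but 3000 numbers were used.
entry = ExtendedEntry("misc.test", last=3000, high=1, low=1, count=1, status="y")
line = entry.format()
print(line)
```

With a layout like this, both LIST ACTIVE and GROUP could be answered from one line, at the cost of keeping three more fields in sync on every change to the group.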
1. It seems that convention is to "lie" about the high water mark and
just hand out "last" instead, for performance, at least the way INN is implemented (since the client can't tell that we lied). Considering it
feels against the *spirit* of the RFC, setting aside performance, do you foresee any problems with choosing to provide an accurate high water
mark? I can't see how it would break compatibility, since the RFC
already says the high water mark CAN decrease, even if nobody does it
today.
2. Is INN's active file (or file system more generally) intended to be portable with other news servers?
If not, it seems like I could just extend the active file to add the
"true" high water mark along with the article count, and then just use
that for both LIST ACTIVE and GROUP. Then I could be truthful with no performance hit.
There are two fairly obvious ways to handle the high water mark:
1. Keep low and high water marks in only one place, increment the high
water mark on every new article arrival as part of article numbering,
and never decrement it because it doubles as the source of the next
article number for that group.
2. Keep internal "next article number" data for each group but report the
high water mark based on what articles are in the spool at the time.
Historically, INN (and C News, I'm fairly sure) always did 1, so that was very widespread practice. I'm fairly sure that we wouldn't have chosen to declare it nonconformant. 2 is arguably more correct so the language
should be (and was) written to *allow* it, but we wouldn't have *required* it and ruled the historic INN behavior non-compliant. INN's ability to do 2
in theory based on the overview database information is new in INN 2.x as
I recall. Before that, OVcancel was not a thing, and there was no way to remove the information about the cancelled article from overview before
the next nightly expire, so there was no independent source of truth about the current article numbers beyond checking the spool.
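The two approaches can be contrasted with a small in-memory sketch. The `Group` class and its method names are hypothetical, the spool is modelled as a plain set of article numbers, and the empty-group fallback in option 2 is just one possible choice.

```python
class Group:
    def __init__(self, name: str) -> None:
        self.name = name
        self.next_number = 1              # internal counter; never decreases
        self.articles: set[int] = set()   # numbers currently in the spool

    def add(self) -> int:
        number = self.next_number
        self.next_number += 1
        self.articles.add(number)
        return number

    def remove(self, number: int) -> None:
        self.articles.discard(number)

    def high_option_1(self) -> int:
        # Option 1: report the last number ever assigned.
        return self.next_number - 1

    def high_option_2(self) -> int:
        # Option 2: report the highest article actually present.
        # For an empty group this sketch falls back to option 1's value.
        return max(self.articles, default=self.next_number - 1)


group = Group("misc.test")
for _ in range(3):
    group.add()
group.remove(3)                 # remove the highest-numbered article

print(group.high_option_1())    # 3
print(group.high_option_2())    # 2
print(group.add())              # 4, not "reported high + 1" under option 2
```

Under option 2 the next article number is not one greater than the reported high water mark, which is the situation the RFC's parenthetical describes.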
Thinking about the problem this morning, I do see a small but real
advantage to the client in getting an accurate high water mark: It means
that the count of unread articles derived purely from LIST ACTIVE will be more correct in the specific case that only the highest-numbered article
was removed. That in turn may save some spurious notification of unread messages. But counts based solely on LIST ACTIVE responses are going to be inaccurate for the more common case (for servers that support article
removal at all in their configuration) of an article that is *not* the highest-numbered article being removed. This is just the tradeoff of using LIST ACTIVE for article numbers; if the client wants more accurate information, it needs to use one of the other commands like OVER. But of course those are inherently heavier-weight, due to the increased amount of information returned and the requirement of a round trip per group.
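A small sketch of that tradeoff, assuming a hypothetical client that estimates unread counts from nothing but the low and high water marks and its own read mark (the function name and numbers are invented for illustration):

```python
def estimate_unread(low: int, high: int, read_mark: int) -> int:
    # Everything between the read mark and the high water mark is assumed
    # to exist, because LIST ACTIVE gives no information about gaps.
    first_unread = max(read_mark + 1, low)
    return max(high - first_unread + 1, 0)


read_mark = 2990

# Case 1: only the highest-numbered article (3000) was removed.
actual = len(set(range(2991, 3001)) - {3000})
print(estimate_unread(1, 3000, read_mark), actual)  # server reports "last"
print(estimate_unread(1, 2999, read_mark), actual)  # server reports the true high

# Case 2: an article in the middle (2995) was removed; the high water mark
# is 3000 either way, so the estimate is off regardless of server behavior.
actual = len(set(range(2991, 3001)) - {2995})
print(estimate_unread(1, 3000, read_mark), actual)
```

Only in the first case does an accurate high water mark make the estimate match reality.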
If you can provide a more accurate high water mark, I don't see any
drawback to doing so. The only possible downside that I can imagine is
that some client will be surprised by the high water mark decreasing,
since it has never seen a server that would do that, and might issue some sort of warning to the user. I suppose such a client could exist. But it would surprise me a bit; decreasing the high water mark is clearly allowed
by the RFC.
To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you
could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even have 211 1 1 3000 misc.test to indicate there are definite gaps]
Yup, and in the days when spam and spam cancels were fighting it out, it wasn't uncommon to see things like that happen in some groups.
when in reality, this is the "most accurate" response:
211 1 1 1 misc.test
Though now that begs the question what to display if that last article (1) were then deleted. I presume in the first case, it would naturally be:
211 0 3000 2999 misc.test
And this is probably the best response. In the second case, it seems more
ambiguous what the most logical reply would be, since you could start with either "last" or whatever the last true high water mark was (e.g. 211 0 0
1 misc.test).
*If* your server would never reinstate articles, the best response in the sense of giving the client the most information would be to increase the
low water mark and return a high of 2999 and a low of 3000, because the client can then forget about all of those deleted articles permanently.
But as the RFC says, if you might ever reinstate those articles, you're
not allowed to increase the low water mark like that, so I think the best response would be to return high 0 and low 1 if the articles may later reappear.
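The two cases can be written down as a sketch. The function and its `may_reinstate` flag are hypothetical; the response layout follows the GROUP examples used in this thread (`211 count low high group`), with an empty group reported as count 0 and high one less than low.

```python
def empty_group_response(name: str, last: int, low: int, may_reinstate: bool) -> str:
    if may_reinstate:
        # Removed articles might come back, so the low water mark must
        # stay where it was; report high = low - 1.
        return f"211 0 {low} {low - 1} {name}"
    # Articles are gone for good: move low up to "last" so the client
    # can forget everything below it.
    return f"211 0 {last} {last - 1} {name}"


print(empty_group_response("misc.test", last=3000, low=1, may_reinstate=False))
print(empty_group_response("misc.test", last=3000, low=1, may_reinstate=True))
```

The first call yields `211 0 3000 2999 misc.test` and the second `211 0 1 0 misc.test`, matching the two answers discussed above.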
That brings me to another observation: I've noticed that most inactive
newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active file, and taking low = <high> and high = low - 1, return something more like 211 0 3000 2999 misc.test
Is it possible that those groups have never received traffic on that
server? That's the response I would expect if the server has never stored
an article for that group.
If I were writing a news server from scratch, I would embrace modern databases as early as possible and not try to reinvent that wheel. Long experience with INN is that the reinvention of various databases is one of the hardest parts of INN to maintain and handing that all off to some suitable library or external service would be very attractive.
2. Is INN's active file (or file system more generally) intended to be
portable with other news servers?
Not really, no. Some of INN's on-disk data structures match the format of files specified in the RFC for convenience reasons, but most of INN's
on-disk data structures (apart from the spool if tradspool is used) are
very, very specific to INN.
If not, it seems like I could just extend the active file to add the
"true" high water mark along with the article count, and then just use
that for both LIST ACTIVE and GROUP. Then I could be truthful with no
performance hit.
If you are reworking the format, I would find a way to put the newsgroup description into the same file, because desynchronization between active
and newsgroups is a long-standing annoyance in INN. And at that point I
would consider some sort of structured database with fast writes. :)
On 5/1/2026 1:38 PM, Russ Allbery wrote:
Thinking about the problem this morning, I do see a small but real
advantage to the client in getting an accurate high water mark: It means
that the count of unread articles derived purely from LIST ACTIVE will be
more correct in the specific case that only the highest-numbered article
was removed. That in turn may save some spurious notification of unread
messages. But counts based solely on LIST ACTIVE responses are going to be
inaccurate for the more common case (for servers that support article
removal at all in their configuration) of an article that is *not* the
highest-numbered article being removed. This is just the tradeoff of using
LIST ACTIVE for article numbers; if the client wants more accurate
information, it needs to use one of the other commands like OVER. But of
course those are inherently heavier-weight, due to the increased amount of
information returned and the requirement of a round trip per group.
I think I'll definitely want to consider that angle - hitherto, I've
been directly using Eternal September in my newsreader (which is Mozilla-based) and I've noticed for some large groups, I see a very high count, and then when I click on the group, it changes radically. Just
now, I did a packet capture and I only see it using the GROUP command
(and not LIST ACTIVE at all), but from the configuration that INN
allows, I wonder if maybe Eternal September has their INN (for indeed,
they are using INN) configured to return estimated group counts in most cases, and thus my reader only sees the correct count when I click on
the group.
Could've been some other command, but that makes me desire even more strongly to always provide accurate counts as well, if nothing else to
avoid irritating me :)
How often does article reinstatement really occur and under what circumstances? Purely by the local newsmaster?
I probably wouldn't plan to reinstate articles that expired due to the server's local policy; are there any other reasons that might happen?
Undoing a cancel - is that a thing? (And beyond that, without some kind
of recycle bin, the article would have to be restored from some kind of backup.)
If NNTP had something analogous to UIDVALIDITY in IMAP, where one would normally increase the low water mark but could "reset" it in some
unforeseen circumstance, that would allow for both behaviors, but there
isn't as far as I know.
Is it possible that those groups have never received traffic on that
server? That's the response I would expect if the server has never
stored an article for that group.
It's possible, though it would surprise me a little - this was running
LIST ACTIVE on Eternal September's INN server, which I think has been
around for a while, but maybe some of these are super old groups that
have been inactive a long while.
For active groups, I do see low water marks that are greater than 1, so
for these groups, there's a commitment to not reinstate articles below
the present low water mark. So is article reinstatement in an empty
group vs non-empty really a special case? To allow unconditional reinstatement, the low water mark would always have to be 1, which is
not really that meaningful.
If I were writing a news server from scratch, I would embrace modern
databases as early as possible and not try to reinvent that wheel. Long
experience with INN is that the reinvention of various databases is one
of the hardest parts of INN to maintain and handing that all off to
some suitable library or external service would be very attractive.
Isn't the database more of a "cache" in INN, of technically
reconstructible data? (in contrast to the active file, which has <last>
which is not reconstructible).
For LIST responses, I don't see how using a database would be faster
than reading through one of these files, especially if you already have
to do ACL checks and wildmat matches on every group - I would think
those would be the bottleneck.
For articles within a group, .overview seems fairly efficient.
GROUP would require a linear scan of the active file for its response,
to find the group, and a database could be faster in that case, but
apart from single-group responses, is there any case where a database
would result in a noticeable speedup?
And at that point, maybe a simple hash table with pointers to the
beginning of the corresponding line in the active file would close the performance gap, without needing to add a database to the picture.
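A minimal sketch of that idea, assuming a flat active file in which the group name is the first space-separated field of each line. The helper names and sample lines are made up, and an in-memory `io.BytesIO` stands in for the real file.

```python
import io


def build_index(f) -> dict[bytes, int]:
    # Map each group name to the byte offset where its line starts.
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        index[line.split(b" ", 1)[0]] = offset
    return index


def lookup(f, index: dict[bytes, int], name: bytes):
    # One hash lookup plus one seek instead of a linear scan.
    offset = index.get(name)
    if offset is None:
        return None
    f.seek(offset)
    return f.readline().rstrip(b"\n")


active = io.BytesIO(
    b"misc.test 0000003000 0000000001 y\n"
    b"news.groups 0000000120 0000000100 m\n"
)
index = build_index(active)
line = lookup(active, index, b"news.groups")
print(index)
print(line)
```

The offsets stay valid only while lines keep their length and order, so in-place updates of fixed-width fields are fine, but adding or removing a group means rebuilding the index.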
Really, was more wondering about the active file than anything else. While not officially standardized anywhere, it seems in practice there are a few standardized files with standardized formats:
.active (LIST ACTIVE)
.active.times (LIST ACTIVE.TIMES)
.newsgroups (LIST NEWSGROUPS)
<group>/.overview (8 standardized fields)
<group>/<article number> for article naming
I think we've established there's no good reason the real high water mark couldn't be stored here, and I don't think there's any reason the count couldn't be either, since anything that changes the count updates the
active file already.
If you are reworking the format, I would find a way to put the
newsgroup description into the same file, because desynchronization
between active and newsgroups is a long-standing annoyance in INN. And
at that point I would consider some sort of structured database with
fast writes. :)
Hmm, could you elaborate a bit more on the kind of desynchronization that tends to happen?
If I recall, the RFC states that the list of groups from LIST ACTIVE and
LIST NEWSGROUPS can differ (though perhaps this was worded that way to prevent existing installations from violating the spec, not necessarily
to condone that practice? Ideally, would the list of groups always match identically? Or are there ever good reasons they should differ?)
If combining .newsgroups into .active, it makes me wonder, why not go
further and also combine .active.times into .active?
Were these initially separate simply because .active.times came later
and wanted to avoid breaking the format of .active?
<group> <last> <high> <low> <count> <creation epoch> <creator name> <description>
The only thing I can think of (and this applies to .newsgroups but not .active.times) is that if the description is changed, its length can
change, so now the whole active file needs to be rewritten.
But this is probably an uncommon enough occurrence (maybe even less
common than group creation or deletion?) that the performance
implication could be ignored.
The server really only has 946 articles[1]; yet, INN is reporting it has 29,344 (likely because this is larger than the value of groupexactcount,
so it just estimated it). I know the overview database has the count,
though I guess that value is not necessarily up to date, for reasons I
don't understand currently - presumably keeping it up to date would add non-constant overhead with INN's current architecture.
InterLinked <nntp@phreaknet.org> writes:
Isn't the database more of a "cache" in INN, of technically
reconstructible data? (in contrast to the active file, which has <last>
which is not reconstructible).
Well, I'm not sure I agree with the distinction you're making here, since
the active file *is* a database. INN has a whole bunch of databases, some
of which it stores as text files, but just because the format is a text
file doesn't make it not a database. INN definitely uses the active file like
a database (hence the zero-padding).
For articles within a group, .overview seems fairly efficient.
Well, we wrote a whole new overview mechanism because we didn't think it
was sufficiently efficient. :) Using only a flat .overview file can be extremely slow for very large groups when clients request only a subset of the records (which is very common; they usually only care about the latest messages).
In theory, a database may be able to do much faster prefix matching than a linear scan doing wildmat matching for, e.g., LIST ACTIVE news.*, but that would require converting wildmat expressions to something the database can understand with LIKE, which may not be possible in the general case.
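As a sketch of the conversion problem, the following handles only the simplest wildmat patterns (literal characters, `*`, and `?`) and gives up on anything else, returning `None` so that a caller could fall back to a linear scan. The function name and table are hypothetical, and SQLite is used purely for illustration.

```python
import sqlite3
from typing import Optional


def wildmat_to_like(pattern: str) -> Optional[str]:
    out = []
    for ch in pattern:
        if ch in "[]\\,!":
            # Character classes, escapes, pattern lists, and negation
            # are not handled by this sketch.
            return None
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in "%_":
            out.append("\\" + ch)   # literal in wildmat, special in LIKE
        else:
            out.append(ch)
    return "".join(out)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE groups (name TEXT PRIMARY KEY)")
db.executemany(
    "INSERT INTO groups VALUES (?)",
    [("news.groups",), ("news.admin.misc",), ("misc.test",), ("newsxgroups",)],
)

like = wildmat_to_like("news.*")
# Note: SQLite's LIKE is case-insensitive for ASCII by default.
rows = [
    r[0]
    for r in db.execute(
        "SELECT name FROM groups WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (like,),
    )
]
print(like, rows)
print(wildmat_to_like("comp.lang.[a-c]*"))
```

Whether the database can actually use an index for such a prefix match depends on the engine and its settings, so any speedup over the linear scan would need to be measured.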
And at that point, maybe a simple hash table with pointers to the
beginning of the corresponding line in the active file would close the
performance gap, without needing to add a database to the picture.
See, you're going down the same path that all the INN authors, myself included, have gone down: You can see a simple data structure that would solve the problem that you have and it seems more straightforward to just implement that than use a "full database" which feels like it would have a ton of overhead.
And you can do that! That's how INN works! I'm just saying that as someone with a lot of years of experience maintaining that code with a simple hash table and whatnot, a whole lot of time and bugs would have been saved by
just using an off-the-shelf database. At, of course, the cost of having to handle database transitions and implementation changes and BerkeleyDB
getting bought by Oracle and then killed and so forth.
Really, was more wondering about the active file than anything else. While not officially standardized anywhere, it seems in practice there are a few standardized files with standardized formats:
.active (LIST ACTIVE)
.active.times (LIST ACTIVE.TIMES)
.newsgroups (LIST NEWSGROUPS)
<group>/.overview (8 standardized fields)
<group>/<article number> for article naming
The last is often not used these days because it has a lot of poor performance properties.
There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably fine as configuration files with some in-memory representation, though,
since they're usually very small.
LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT
If combining .newsgroups into .active, it makes me wonder, why not go
further and also combine .active.times into .active?
Yes, indeed.
Were these initially separate simply because .active.times came later
and wanted to avoid breaking the format of .active?
Yup, exactly.
<group> <last> <high> <low> <count> <creation epoch> <creator name>
<description>
Note that you now have a space-separated file except for the last field
and you have a problem if you want to add another field that you didn't
think of originally. I would really want to store this as some sort of structured file because you have some fields there (at least the
description, maybe the creator name) that can contain a variety of characters.
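One possible shape for such structured storage, sketched with SQLite from the Python standard library. The table and column names are made up, the sample values are invented, and this is only an illustration of the idea, not a recommendation of a specific schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE newsgroups (
        name          TEXT PRIMARY KEY,
        last_number   INTEGER NOT NULL,  -- highest number ever assigned
        high_mark     INTEGER NOT NULL,  -- reported high water mark
        low_mark      INTEGER NOT NULL,  -- reported low water mark
        article_count INTEGER NOT NULL,
        created       INTEGER NOT NULL,  -- creation time, epoch seconds
        creator       TEXT NOT NULL,
        description   TEXT NOT NULL
    )
    """
)

# Free-form text (spaces, tabs) needs no special escaping or field ordering.
with db:
    db.execute(
        "INSERT INTO newsgroups VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("misc.test", 3000, 3000, 1, 946, 0, "usenet@example.org", "Testing, one two\tthree"),
    )

# A new article arrives: all counters change in one transaction.
with db:
    db.execute(
        "UPDATE newsgroups SET last_number = last_number + 1, "
        "high_mark = last_number + 1, article_count = article_count + 1 "
        "WHERE name = ?",
        ("misc.test",),
    )

# A field nobody thought of originally can be added later.
db.execute("ALTER TABLE newsgroups ADD COLUMN moderator TEXT")

row = db.execute(
    "SELECT last_number, high_mark, article_count, description "
    "FROM newsgroups WHERE name = ?",
    ("misc.test",),
).fetchone()
print(row)
```

The point of the sketch is that field boundaries, later schema changes, and multi-field updates are handled by the storage layer rather than by a hand-written line format.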
InterLinked <nntp@phreaknet.org> writes:
The server really only has 946 articles[1]; yet, INN is reporting it has
29,344 (likely because this is larger than the value of groupexactcount,
so it just estimated it). I know the overview database has the count,
though I guess that value is not necessarily up to date, for reasons I
don't understand currently - presumably keeping it up to date would add
non-constant overhead with INN's current architecture.
I don't know if it's the case here (I don't know if Eternal September even expires articles)
but historically another really common reason for this
pattern is that the very early article that's holding down the low water
mark was crossposted to some other group (traditionally *.answers) with a much longer retention and the articles after it have expired.
Note that the article count is not really useful to the news reader client under normal circumstances because the news reader often does not care about how many *total* articles the group contains. If the user has been reading the group (the common case), the news reader really cares about
how many *unread* articles the group has, and for that the article count
is basically useless. The article count as returned by NNTP is pretty much only useful for groups that you have never read, or haven't read for so
long that your read mark is below the low water mark.
On 5/1/2026 5:14 PM, Russ Allbery wrote:
Well, I'm not sure I agree with the distinction you're making here,
since the active file *is* a database. INN has a whole bunch of
databases, some of which it stores as text files, but just because the
format is a text file doesn't make it not a database. INN definitely uses
the active file like a database (hence the zero-padding).
Sorry, to be clear, I meant database in the sense of something like
SQLite or MySQL, not using a text file under direct control of the
program as a store.
Makes sense - I'm assuming the new mechanism is a database of each
article, effectively, so you can just select the articles of interest?
Thanks, this is helpful perspective. I think I still need to sleep on
this a bit but hearing about your experience here is really valuable.
Honestly, I was really set on just using flat files before but there are
some compelling reasons you've brought up. Maybe I'll abstract things in
a way such that I can start with flat files and add a DB (SQLite or
other) backend option later that could be used instead. I was trying to
avoid that complexity but it might be worth it.
If going the database route, I'm assuming you would just recommend
SQLite for everything? I would guess a regular RDBMS like MariaDB would
be overkill (and possibly cause issues if the server weren't local).
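One way to keep that door open is to hide the per-group metadata behind a small interface. What follows is only a minimal sketch: `GroupStore`, `ActiveFileStore` and `SqliteStore` are hypothetical names, the line format is a simplified "group high low status" layout, and none of it is taken from INN.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path
from typing import Protocol


class GroupStore(Protocol):
    """Hypothetical interface for per-group (high, low, status) metadata."""

    def get(self, group: str) -> tuple[int, int, str] | None: ...
    def put(self, group: str, high: int, low: int, status: str) -> None: ...


class ActiveFileStore:
    """Flat-file backend: one 'group high low status' line per group."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict[str, tuple[int, int, str]]:
        groups = {}
        if self.path.exists():
            for line in self.path.read_text().splitlines():
                name, high, low, status = line.split()
                groups[name] = (int(high), int(low), status)
        return groups

    def get(self, group):
        return self._load().get(group)

    def put(self, group, high, low, status):
        groups = self._load()
        groups[group] = (high, low, status)
        # Zero-padded numbers, like the active file lines quoted in this thread.
        lines = [f"{g} {h:010d} {l:010d} {s}" for g, (h, l, s) in sorted(groups.items())]
        self.path.write_text("\n".join(lines) + "\n")


class SqliteStore:
    """SQLite backend exposing the same two methods."""

    def __init__(self, database: str = ":memory:") -> None:
        self.db = sqlite3.connect(database)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS groups ("
            "name TEXT PRIMARY KEY, high INTEGER, low INTEGER, status TEXT)"
        )

    def get(self, group):
        row = self.db.execute(
            "SELECT high, low, status FROM groups WHERE name = ?", (group,)
        ).fetchone()
        return tuple(row) if row else None

    def put(self, group, high, low, status):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO groups VALUES (?, ?, ?, ?)",
                (group, high, low, status),
            )
```

Code that only talks to `get` and `put` does not need to know which backend is in use, so the flat-file version can ship first and the SQLite one can be added later.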
<group>/<article number> for article naming
The last is often not used these days because it has a lot of poor
performance properties.
You mean one file per article in the spool?
From the documentation, I thought the "tradspool" method in INN was the
most common deployment.
Yes, I skipped these since they're global and not "one entry per group"-
are there any others of those that I missed?
Is the "LIST MODERATORS" file all that is involved in moderation? I
didn't think there was any moderator info explicitly associated with
each group.
Some files already use tab, which I don't *think* is allowed in any of
the metadata to date? If it is, maybe a non-ASCII character like field separator would work.
Adding a field is something to think about. It would be a problem for databases too, though there are various migration tools for extending schemas, at least, and I'll grant that's one area where databases win
over plain text files. But regardless of the underlying format, I'd
prefer to invest enough time in the design up front to hopefully not
need any changes later. Since NNTP has been stable for quite a long time
now, I think that's realistic, unless there are new extensions in the
future which add more metadata - and I do have a few extensions in mind
for later but none would modify the group metadata.
Yes, this also makes sense, so now I wonder why my client gets confused
when this happens... I have a feeling it may not be doing the most intelligent thing, but it would be interesting to see if it has the same
issue when the count is accurate.
Sorry, to be clear, I meant database in the sense of something like
SQLite or MySQL, not using a text file under direct control of the
program as a store.
SQLite is by far the easiest to use because it's just a library that
stores its stuff in files on disk, which has lots of really nice
properties and makes it very easy to set up and maintain (not entirely trivial, but easy). But it is going to be slow. I suspect that an actual database server that is properly tuned will be faster than SQLite. You may not care. No one has cared enough for INN to write such a backend.
For a news server that requires minimum maintenance and can mostly just
be ignored, I would recommend CNFS. That's what I use personally. You lose some control and visibility and it's a bad choice if you never want
articles to expire, but it has the huge advantage that you'll never run
out of disk space (the worst thing that happens is that things expire a
bit faster), there's no expensive expire process, and it's really fast and light on resources.
It's rather irrelevant these days, although it does let the client mail a submission to a moderated group directly, which in theory would actually
be better in these days of spam filtering, DMARC, and similar problems
with the email relay system. Not that any clients do this. :)
In any case, I'll just do "tradspool" for now but leave the door open to adding others later.
where "news/group/name" is the name of the newsgroup to which the article was posted with each period changed to a slash, and "nnnnn" is the sequence number of the article in that newsgroup
On 5/1/2026 10:44 PM, Russ Allbery wrote:
For a news server that requires minimum maintenance and can mostly
just be ignored, I would recommend CNFS. That's what I use personally.
You lose some control and visibility and it's a bad choice if you never
want articles to expire, but it has the huge advantage that you'll
never run out of disk space (the worst thing that happens is that
things expire a bit faster), there's no expensive expire process, and
it's really fast and light on resources.
Interesting... CNFS has always seemed a bit "weird" to me - I see how it excels at certain properties, but not sure if I'm interested in
supporting it myself. My plan is really to run two news servers myself,
one in the "cloud", with expiration varying by group, and open to authenticated users, and one at home, for groups of interest, where
articles never expire (which would function as an archive, but also be
used by my local newsreader).
CNFS seems to work well if you have a set size you want to dedicate per group, but not as efficient for small/empty groups, or if you want to
expire by article count or age - maybe I'm missing something here
though.
I assume because the articles for a group are just in one big file,
articles also have to be duplicated to multiple of these blobs when cross-posted?
It's rather irrelevant these days, although it does let the client mail
a submission to a moderated group directly, which in theory would
actually be better in these days of spam filtering, DMARC, and similar
problems with the email relay system. Not that any clients do this. :)
In what sense is it irrelevant?
I hear people say moderated groups are dead, but I still subscribe to
one moderated group, comp.dcom.telecom, though I thought the news server forwarded it to the moderator, not the client directly. Is the server
not using the LIST MODERATORS data internally to send to the moderator?
Admittedly I need to learn more about how moderation works - I don't
think I've seen it discussed much in any RFCs since it's implementation rather than protocol related. But I would imagine when a new group
control message gets shared, it would have to contain moderation info,
and dynamically update the moderator info at that point.
On 5/2/2026 10:18 AM, InterLinked wrote:
In any case, I'll just do "tradspool" for now but leave the door open
to adding others later.
Looking at the documentation for the different storage methods, and for traditional spool, I noticed:
where "news/group/name" is the name of the newsgroup to which the
article was posted with each period changed to a slash, and "nnnnn" is
the sequence number of the article in that newsgroup
So for "misc.test" there would be a subfolder "test" within a subfolder "misc", not just one subfolder "misc.test".
I find this a bit curious, as in IMAP, subfolders work the other way - a folder that is logically a subfolder, "Parent > Sub" is typically named parent.sub in the root maildir, and all the folders are still siblings to each other on disk (except INBOX). Coming from more of an IMAP background,
I would have intuited to just use the group name literally for the folder, but I'm guessing there's a good reason not to do this?
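The two layouts being compared can be sketched in a few lines of Python. This is only an illustration of the naming rule quoted from the documentation; `tradspool_path` and `flat_path` are hypothetical helpers and ignore everything else a real spool has to handle.

```python
from pathlib import PurePosixPath


def tradspool_path(group: str, number: int) -> PurePosixPath:
    """Each period in the group name becomes a directory level."""
    return PurePosixPath(*group.split("."), str(number))


def flat_path(group: str, number: int) -> PurePosixPath:
    """IMAP-maildir-like alternative: one sibling directory per group."""
    return PurePosixPath(group, str(number))


print(tradspool_path("misc.test", 42))  # misc/test/42
print(flat_path("misc.test", 42))       # misc.test/42
```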
For LIST COUNTS and GROUP, it pulls from group stats. However, in the response for LIST ACTIVE, it simply dumps the line from the active file
as is.
I can't find any examples of newsgroups where the high water mark
article is deleted, so it's hard to poke at this behavior
To make an extreme example, if a group with a lot of articles had all of them except the low water mark article deleted (and "last" is 3000), you could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]
That brings me to another observation: I've noticed that most inactive newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active
file, and taking low = <high> and high = low - 1, return something more
like 211 0 3000 2999 misc.test
Is there a reason INN just uses 1/0 instead?
| o New articles may be added with article numbers greater than the
| reported high water mark. (If an article that was the one with
| the highest number has been removed and the high water mark has
| been adjusted accordingly, the next new article will not have the
| number one greater than the reported high water mark.)
To me, this implies the high water mark can (even "should") decrease when
the high water mark article is removed - in which case, the next article
assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT
work in IMAP).
So, I have to admit that I don't recall this explicitly coming up during
the RFC discussions, so I don't have a definitive answer for you about why
we worded it this way. I think if we'd noticed this at the time, we would have been a bit clearer about what clients should expect, so I think there
is a (minor) bug in the standard here.
Hi InterLinked,
To make an extreme example, if a group with a lot of articles had all
of them except the low water mark article deleted (and "last" is
3000), you could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]
Where do you read in RFC 3977 that the estimate "has to be at least 3000"?
The wording is:
   If the group is not empty, the estimate MUST be at least the actual
   number of articles available and MUST be no greater than one more
   than the difference between the reported low and high water marks.
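Read literally, the quoted rule gives the estimate a lower bound (the actual article count) and an upper bound (high - low + 1). A small Python check of that reading, with `estimate_is_valid` as a hypothetical helper name:

```python
def estimate_is_valid(estimate: int, low: int, high: int, actual: int) -> bool:
    """Apply the quoted rule for a non-empty group.

    The estimate must be at least the number of articles actually
    available and at most one more than (high - low).
    """
    return actual <= estimate <= high - low + 1


# One article left in a group whose marks are low=1, high=3000:
print(estimate_is_valid(3000, 1, 3000, actual=1))  # True
print(estimate_is_valid(1, 1, 3000, actual=1))     # True
print(estimate_is_valid(3001, 1, 3000, actual=1))  # False
```

Under this reading both `211 3000 1 3000 misc.test` and `211 1 1 3000 misc.test` satisfy the rule.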
That brings me to another observation: I've noticed that most inactive
newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active
file, and taking low = <high> and high = low - 1, return something
more like 211 0 3000 2999 misc.test
Is there a reason INN just uses 1/0 instead?
I confirm what Russ said: high = low - 1 is what INN replies for empty newsgroups which formerly received at least one article.
We even had a bug until recently: in versions prior to 2.7.1, INN returned low = high + 1, which was unfortunately wrong when high was 2^31-1... A pretty rare case though :)
It now returns high = low - 1 except of course when low is 0 (for
newsgroups which have never received any article).
You already spotted that as you referenced the related issue in the
GitHub tracker :)
Hi InterLinked,
For LIST COUNTS and GROUP, it pulls from group stats. However, in the
response for LIST ACTIVE, it simply dumps the line from the active
file as is.
Indeed, and to be more precise, if you give a newsgroup name as an
argument to LIST ACTIVE, this command will pull the information from the overview (like LIST COUNTS and GROUP).
You then may end up with things like that for an empty newsgroup:
GROUP trigofacile.test3
211 0 8 7 trigofacile.test3
LIST ACTIVE trigofacile.test3
215 Newsgroups in form "group high low status"
trigofacile.test3 0000000007 0000000008 y
.
LIST ACTIVE trigofacile.test3*
215 Newsgroups in form "group high low status"
trigofacile.test3 0000000008 0000000008 y
.
The "*" at the end of the last LIST ACTIVE command forces it to parse
the active file to look for matching newsgroup names.
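For a client, the two answers above parse identically but lead to different conclusions. A minimal Python sketch, assuming the "group high low status" layout shown in the 215 response; `ActiveEntry`, `parse_active_line` and `looks_empty` are hypothetical names:

```python
from typing import NamedTuple


class ActiveEntry(NamedTuple):
    group: str
    high: int
    low: int
    status: str


def parse_active_line(line: str) -> ActiveEntry:
    """Split one LIST ACTIVE line; zero-padded numbers parse as plain ints."""
    group, high, low, status = line.split()
    return ActiveEntry(group, int(high), int(low), status)


def looks_empty(entry: ActiveEntry) -> bool:
    """High below low is the 'empty group' signal discussed in this thread."""
    return entry.high < entry.low


from_overview = parse_active_line("trigofacile.test3 0000000007 0000000008 y")
from_active = parse_active_line("trigofacile.test3 0000000008 0000000008 y")
print(looks_empty(from_overview))  # True
print(looks_empty(from_active))    # False
```

The line taken from the active file does not look empty to this check, even though the group is, which is the discrepancy the example shows.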
Also, side question, why is it called the "overview database"? It seems
like OVDB is mainly used to satisfy responses for GROUP and LIST ACTIVE
with a single group as an argument. Yet, "overview" also traditionally
refers to the per-group overview file with a line for each message,
which stores the 8 (or more) headers used in the XOVER/OVER responses. I don't think there is a connection between the two, is there?
Sometimes I also see it referred to as "group stats" like you said,
which seems like a clearer term for what it is, but they seem to be interchangeable.
high = low - 1 is what INN replies for empty
newsgroups which formerly received at least one article.
Isn't this also true for empty newsgroups which have never received an article either? low=1 and high=0
To confirm my own understanding, the only reason we do LOW = LAST (which
is the same as LOW = HIGH in INN) and then HIGH = LOW - 1, rather than
LOW = HIGH + 1, is to account for overflow when LAST/HIGH is the max
article number?
the LOW = HIGH + 1
method also has the advantage of being one higher than the other way,
which you pointed out in the erratum. I kind of wonder if it would be
valid to do it this way, except in the case that HIGH is the max article number
Hi InterLinked,
high = low - 1 is what INN replies for empty newsgroups which
formerly received at least one article.
Isn't this also true for empty newsgroups which have never received an
article either? low=1 and high=0
When the newsgroup has never received an article, I assume the concept
of "low water mark" does not exist as there hasn't been any first
article. But yes, if you consider low=1 in that case, the formula is
the same.
Maybe the ideal would be to advertise low=0 and high=0 in that case
(allowed by RFC 3977 to represent an empty newsgroup), which would differentiate a newsgroup which has never received any article from
another one which has received only 1 article and is now empty.
Well, it matters to nobody, but it would make sense :)
To confirm my own understanding, the only reason we do LOW = LAST
(which is the same as LOW = HIGH in INN) and then HIGH = LOW - 1,
rather than LOW = HIGH + 1, is to account for overflow when LAST/HIGH
is the max article number?
I don't know whether that was the reason for the formula but yes, at
least it works with the max article number!
the LOW = HIGH + 1 method also has the advantage of being one higher
than the other way, which you pointed out in the erratum. I kind of
wonder if it would be valid to do it this way, except in the case that
HIGH is the max article number
Yes, it is valid. It respects the rule that "the high water mark will
be one less than the low water mark",
and when HIGH is the max article
number, you could use LOW = 2^31-1 and HIGH = LOW - 1 (the preferred way
per RFC 3977, as a SHOULD) or LOW = HIGH = 2^31-1 (an alternative way).
Actually, that reminds me what it was about the
erratum I didn't understand - a comment about server synchronization and
how the low water mark a client reads might decrease in this scenario.
Is anyone able to explain how that might happen?
I guess INN explicitly wants to make empty groups low=1/high=0 instead
of low=0/high=0.
Not expecting INN to change, of course, but I think I might do it this
way, as I would like to be as accurate as possible and provide as much "information" as possible in a response.
I'm a bit confused on this last point. It's valid to merely set low=high=2^31-1 to indicate a group is empty?
Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
of representing an empty group?
Looking at the erratum:
"The high water mark is one less than the low water mark for empty newsgroups. A major reason for doing it this way was to deal with
clusters of servers. If they're not perfectly synchronized, then
a cancel might be visible on one and not another. So if you connect
to the second one, it looks as if the article has been reinstated.
Wording it like this meant we didn't need special treatment of such
clusters. The low water mark cannot decrease."
If a newsgroup has only article number 12, and this article is cancelled
in cluster A a few seconds before it is in cluster B, a newsreader connecting to cluster A will see low water mark = 13, high water mark =
12 (empty newsgroup with low = high + 1) and if it disconnects and reconnects this time associated to cluster B before the cancel is
executed, it will see low water mark = high water mark = 12, thus having decreased.
When the high = low - 1 formula is used, it sees low water mark = 12 and high water mark = 11 on cluster A. The low water mark does not decrease.
Anyway, I agree that the problem is present in non-empty newsgroups if
the low water mark is updated on the fly. If cluster A has article 13,
and cluster B has articles 12 and 13, the low water mark will be
lower when connecting to cluster B...
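The article-12 scenario can be replayed with a short Python sketch. `reported_marks` is a hypothetical helper that models only what this example needs: a non-empty group reports its real lowest and highest numbers, and an empty one applies one of the two rules being compared.

```python
def reported_marks(articles: list[int], last: int, empty_rule: str) -> tuple[int, int]:
    """Return (low, high) as one server would report them.

    `last` is the highest article number ever assigned in the group.
    """
    if articles:
        return (min(articles), max(articles))
    if empty_rule == "low=high+1":
        return (last + 1, last)
    return (last, last - 1)  # the "high=low-1" rule


server_a: list[int] = []   # cancel of article 12 already processed
server_b = [12]            # cancel not yet processed

for rule in ("low=high+1", "high=low-1"):
    low_a, _ = reported_marks(server_a, 12, rule)
    low_b, _ = reported_marks(server_b, 12, rule)
    print(f"{rule}: low on A = {low_a}, low on B = {low_b}")
```

With the first rule a client moving from A to B sees the low water mark go from 13 back to 12; with the second it stays at 12 on both.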
I'm a bit confused on this last point. It's valid to merely set
low=high=2^31-1 to indicate a group is empty?
Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
of representing an empty group?
Yes, it is the third alternative allowed by RFC 3977, and I totally
agree it follows the same rule as a non-empty newsgroup. Very liberal :)
   o  The high water mark is greater than or equal to the low water
      mark.  The estimated article count might be zero or non-zero; if
      it is non-zero, the same requirements apply as for a non-empty
      group.
But doesn't that still break if there are multiple cancels during
that period? Even the "official" way of doing it *can* theoretically
break.
Initially I was doing LOW = LAST + 1 and then changed to LOW = LAST
simply because INN had, but now that I understand this a bit better, I
think I might change back to LOW = LAST + 1 and just handle 2^31-1
specially to prevent an illegal response (and also use 0 0 0 for an
empty group that never had any articles).
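That plan could look roughly like the Python sketch below. It describes the approach proposed in this message, not what INN does; `empty_group_marks` and `MAX_ARTICLE` are hypothetical names, and the fallback at the maximum number is the high = low - 1 form mentioned earlier in the thread.

```python
MAX_ARTICLE = 2**31 - 1  # largest article number discussed in the thread


def empty_group_marks(last: int) -> tuple[int, int, int]:
    """Return (count, low, high) for an empty group.

    `last` is the highest article number ever assigned (0 if none).
    """
    if last == 0:
        # Never received an article: report 0 0 0.
        return (0, 0, 0)
    if last >= MAX_ARTICLE:
        # LAST + 1 would exceed the maximum, so fall back to
        # low = LAST and high = low - 1.
        return (0, MAX_ARTICLE, MAX_ARTICLE - 1)
    # Normal case: low = LAST + 1, high = LAST.
    return (0, last + 1, last)


print(empty_group_marks(0))            # (0, 0, 0)
print(empty_group_marks(3000))         # (0, 3001, 3000)
print(empty_group_marks(MAX_ARTICLE))  # (0, 2147483647, 2147483646)
```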
Aside from 2^31-1, is there ever a case where one would use this?
Presumably something behaved this way historically, just can't fathom
why...
There is obviously loss of information in that the client can't tell
these two cases apart.
There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably fine as configuration files with some in-memory representation, though,
since they're usually very small.
LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT
On 5/1/2026 5:14 PM, Russ Allbery wrote:
There are a few other ones that aren't as widely used and are arguably
configuration instead, but that do need to be queryable. They're
probably fine as configuration files with some in-memory
representation, though, since they're usually very small.
LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT
Question about LIST DISTRIB.PATS - is Distribution widely used anymore in practice?
I'm wondering if maybe this is because clients never caught on to using
it so that's why they configured it that way.
InterLinked <nntp@phreaknet.org> writes:
On 5/1/2026 5:14 PM, Russ Allbery wrote:
There are a few other ones that aren't as widely used and are arguably
configuration instead, but that do need to be queryable. They're
probably fine as configuration files with some in-memory
representation, though, since they're usually very small.
LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT
Question about LIST DISTRIB.PATS - is Distribution widely used anymore in
practice?
Yes, it's used pretty extensively for private hierarchies to control distribution of articles that aren't intended to be propagated beyond the participating servers.
For private hierarchies, couldn't the incoming/outgoing feeds be
configured not to feed such groups to other servers not carrying the hierarchy? e.g.
*,!local.*
No, all the forwarding these days is handled by moderators.isc.org, and there's no place in a control message to document that.
Moderation is a horrible kludge. You will probably be appalled. :) It's a design from another era, and we never completed the work we were hoping to
do to try to make it less of a kludge
so it's very much something out of
an earlier era of the Internet when spam didn't exist.
On 5/2/2026 12:04 PM, Russ Allbery wrote:
No, all the forwarding these days is handled by moderators.isc.org, and
there's no place in a control message to document that. Moderation is a
horrible kludge. You will probably be appalled. :) It's a design from
another era, and we never completed the work we were hoping to do to
try to make it less of a kludge
Was this earlier work a mechanism for automatically distributing
moderation information using control messages, or something else?
Because %s changes the periods in a group name to dashes, the RFC warns
that groups differing only by periods/dashes would have identical
submission templates if only %s is used. In this case, the RFC says
"pattern template cannot be used... for these groups... explicit entries without a pattern will be required".
Since that sounds pretty definite, I'm wondering if that implies that %s
can only appear in the user part by itself or not (at least, the examples
in the RFC all have it by itself). The RFC never says %s has to be the
sole user part, so for example, is this legal?
local.*:prefix+%s@news.example.com
I feel like it would be, but the wording in the RFC that
says that a pattern template can't be used makes me wonder if it isn't.
To distinguish between local.foo.bar and local.foo-bar, for example, you could have:
local.*.*:period+%s@news.example.com
local.*-*:dash+%s@news.example.com
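The collision and the proposed workaround can be tried out with a small Python sketch. It assumes that entries are "pattern:template" lines, that the first matching entry is used, and that `%s` expands to the group name with periods changed to dashes. `submission_address` is a hypothetical helper, and Python's `fnmatchcase` only approximates wildmat matching.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def submission_address(group: str, entries: list[str]) -> str | None:
    """Expand the first entry whose pattern matches the group name."""
    for entry in entries:
        pattern, template = entry.split(":", 1)
        if fnmatchcase(group, pattern):
            return template.replace("%s", group.replace(".", "-"))
    return None


single = ["local.*:%s@news.example.com"]
split = [
    "local.*.*:period+%s@news.example.com",
    "local.*-*:dash+%s@news.example.com",
]

for group in ("local.foo.bar", "local.foo-bar"):
    print(group, submission_address(group, single), submission_address(group, split))
```

With the single entry both groups expand to the same address; with the two prefixed entries they differ. Names that mix both characters, such as local.a-b.c and local.a.b-c, still match the first pattern and collide, so this only separates the simple case.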
Also, a second question, I noticed in the LIST MODERATORS output from
Eternal September, comp.std.c++ has its own entry, going to std-cpp-submit@...
I can't recall any other groups with + in the name; does this exception
imply that '+' isn't allowed somewhere along the process for submission templates or moderators.isc.org, or is this just a coincidence?
InterLinked <nntp@phreaknet.org> writes:
For private hierarchies, couldn't the incoming/outgoing feeds be
configured not to feed such groups to other servers not carrying the
hierarchy? e.g.
*,!local.*
The above doesn't work properly due to crossposting.
It's possible to use @ wildcards carefully along with rejection patterns
in incoming.conf, but there are some caveats and it's relatively easy to
make a mistake. Distributions are somewhat simpler. The recommendation is
to use all of the mechanisms for effective defense in depth against misconfigurations.
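The crossposting problem can be illustrated with a small Python sketch. It assumes newsfeeds-style semantics in which, for each newsgroup, the last matching pattern decides, an article is offered if any of its groups is accepted, and a matching `@` pattern rejects the whole article. `wants_article` is a hypothetical helper, and `fnmatchcase` only approximates wildmat matching.

```python
from fnmatch import fnmatchcase


def wants_article(newsgroups: list[str], patterns: str) -> bool:
    """Decide whether a feed with these patterns would take the article."""
    wanted = False
    for group in newsgroups:
        verdict = None
        for pattern in patterns.split(","):
            if pattern.startswith("@"):
                if fnmatchcase(group, pattern[1:]):
                    verdict = "poison"
            elif pattern.startswith("!"):
                if fnmatchcase(group, pattern[1:]):
                    verdict = "reject"
            elif fnmatchcase(group, pattern):
                verdict = "accept"
        if verdict == "poison":
            return False  # one poisoned group blocks the whole article
        if verdict == "accept":
            wanted = True
    return wanted


crosspost = ["local.foo", "misc.test"]
print(wants_article(crosspost, "*,!local.*"))  # True: leaks via misc.test
print(wants_article(crosspost, "*,@local.*"))  # False
```

Under these assumptions the `!` form still lets the crossposted article through because misc.test is accepted on its own, while the `@` form blocks it.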
On 5/9/2026 4:27 PM, Russ Allbery wrote:
It's possible to use @ wildcards carefully along with rejection
patterns in incoming.conf, but there are some caveats and it's
relatively easy to make a mistake. Distributions are somewhat simpler.
The recommendation is to use all of the mechanisms for effective
defense in depth against misconfigurations.
Is the idea here that since a distribution is once per-message (which
could have multiple newsgroups, both local and non-local), adding the Distribution prevents posts from going to other servers if any local group is one of the newsgroups of a post?
For example, as soon as local.foo is seen, a distribution marking gets added,
which would then prevent the message from going to Usenet, even
if it includes groups that, had they been the sole newsgroup of a post,
would have gone to Usenet?
Although in this simple example, the cross-posted Usenet groups would
never reach Usenet, so from what I can tell, this only protects against "posting accidents" since a user wouldn't have a legitimate reason to
try cross-posting to both a local and public group.