Forum: Too Lazy BBS

High and low water marks vs active file

From InterLinked@nntp@phreaknet.org to news.software.nntp on Thu Apr 30 18:47:11 2026

From Newsgroup: news.software.nntp

Recently, I've been diving a bit deeper into the behavior of the high
and low water marks and the active file, and I'm a bit confused as to
how they relate.

RFC 3977 clearly spells out that the low and high water marks refer to
the smallest and largest numbered articles in a newsgroup, with the
caveat that for empty groups the high water mark is usually one less
than the low water mark (and the low water mark is the former high water mark[1]).

However, the documentation for the active file in InterNetNews (INN)[2]
says that <high> is the "highest article number that has ever been used
in that newsgroup". This implies it is NOT the same as the reported high
water mark, because the high water mark could decrease while <high> in
this context is monotonically increasing - and the low and high water
marks can be recomputed by scanning the spool directory, while <high> in
the active file cannot (and thus needs to be stored persistently).
Though it certainly doesn't help any that the name "high" is used for
this value as well.
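
To make the distinction concrete, here is a minimal Python sketch (not INN code; `reported_marks` and its arguments are names made up for illustration). It derives the reported water marks from whatever article numbers a spool scan would find, using the empty-group convention described above, and takes the persistent counter as a separate input because a scan cannot recover it. Reporting low 1 and high 0 for a group that never held an article is an assumption of the sketch.

```python
def reported_marks(article_numbers, last):
    """Return (count, low, high) as they could be recomputed from a spool scan.

    article_numbers: article numbers currently present in the group.
    last: highest number ever assigned (the active file's <high>); it must be
          stored persistently because the scan alone cannot reveal it.
    """
    present = sorted(article_numbers)
    if not present:
        # Empty group: low is the former high water mark, high is low - 1.
        # A group that never held an article reports low 1, high 0 (assumption).
        low = max(last, 1)
        return 0, low, low - 1
    return len(present), present[0], present[-1]


# Articles 1..5 were posted, then 4 and 5 were removed.
print(reported_marks({1, 2, 3}, last=5))  # (3, 1, 3)
# Every article removed: the scan finds nothing, only `last` remembers 5.
print(reported_marks(set(), last=5))      # (0, 5, 4)
```

In the first call the scan yields a high water mark of 3 even though the next article must still be numbered 6, which is exactly the gap between the two meanings of "high".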

I've been perusing the source of InterNetNews (INN) to try to understand
how it behaves, as a reference. It refers to the active file <high> as
LAST in a few places, and this is used when assigning new article IDs in
a group. This makes sense. For LIST COUNT and GROUP, it pulls from group stats, which I believe is ultimately some kind of database backend that provides the reported water marks and article count. However, in the
response for LIST ACTIVE, it simply dumps the line from the active file
as is. Yet, the RFC says the response format for LIST ACTIVE includes
the reported high and low water marks.

I can't find any examples of newsgroups where the high water mark
article is deleted, so it's hard to poke at this behavior, but it raises
the following questions:

1. If "LAST" is an internal value used for assigning article IDs, and
not the reported high water mark, then why is it being handed out as
such for LIST ACTIVE? I would think it would use the actual reported
high water mark, because if the high water mark article were deleted,
then the response would have the wrong high water mark.

2. The same page says <low> in the active file is ~the low water mark
but "not guaranteed to be accurate" and is just a hint. In INN, do the
values of <low> in the active file ever differ from the low water mark
in the group stats? Or are they distinct values like the active file
<high> (LAST) and the low water mark?

Am I misunderstanding anything here about either INN's behavior or the intention in the RFC? (And while I've used INN as an example, my
interest is more about "correct" news server behavior in general.)

Thanks!

[1] https://github.com/InterNetNews/inn/issues/250
[2] https://www.eyrie.org/~eagle/software/inn/docs/active.html
--- Synchronet 3.21f-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Thu Apr 30 17:05:24 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

However, the documentation for the active file in InterNetNews (INN)[2]
says that <high> is the "highest article number that has ever been used in that newsgroup". This implies it is NOT the same as the reported high
water mark, because the high water mark could decrease while <high> in
this context is monotonically increasing

Correct, INN never decreases the high water mark under normal operations. Therefore, if the latest article in the group is deleted, the high water
mark will not decrease and will reference a non-existent article.

This is how most news servers have historically behaved, so the arguable implication in RFC 3977 that servers should decrease the high water mark
in this case is arguably a bug. In practice, it has no real effect since
news readers are required to handle this situation anyway, due to:

| The set of articles in a group may change after the GROUP command is
| carried out:
|
| o Articles may be removed from the group.

which is implicitly incorporated by reference into LIST ACTIVE since it references that definition of high and low water marks (and logically has
to be the case regardless, since of course the state of the spool could
change after LIST ACTIVE just as it could after GROUP).

- and the low and high water marks can be recomputed by scanning the
spool directory, while <high> in the active file cannot (and thus needs
to be stored persistently).

Yes, and I'm not sure INN's behavior when the news administrator
reconstructs the active file from the spool is strictly conforming in all
edge cases. (For example, the low water mark could decrease, which is
forbidden by RFC 3977.) In practice, the edge cases probably don't matter.

I've been perusing the source of InterNetNews (INN) to try to understand
how it behaves, as a reference. It refers to the active file <high> as
LAST in a few places, and this is used when assigning new article IDs in a group. This makes sense. For LIST COUNT and GROUP, it pulls from group
stats, which I believe is ultimately some kind of database backend that provides the reported water marks and article count. However, in the
response for LIST ACTIVE, it simply dumps the line from the active file as is. Yet, the RFC says the response format for LIST ACTIVE includes the reported high and low water marks.

In theory INN could construct a LIST ACTIVE response from the overview database. In practice, this is a very frequent operation and the current implementation is probably considerably faster than an overview-based implementation, for dubious benefit.

So, for your questions:

1. If "LAST" is an internal value used for assigning article IDs, and
not the reported high water mark, then why is it being handed out as
such for LIST ACTIVE? I would think it would use the actual reported
high water mark, because if the high water mark article were deleted,
then the response would have the wrong high water mark.

Because it's slow, basically. News readers like for LIST ACTIVE to be very
fast with a large number of groups so that they can show unread article
counts quickly on newsreader startup.

2. The same page says <low> in the active file is ~the low water mark
but "not guaranteed to be accurate" and is just a hint. In INN, do the
values of <low> in the active file ever differ from the low water mark
in the group stats? Or are they distinct values like the active file
<high> (LAST) and the low water mark?

I would never guarantee full integrity between all of INN's various
databases because they're all independent and updated non-transactionally,
so all sorts of weird things are true momentarily. In theory, the low
water mark in the active file should be eventually consistent with the low water mark in overview, but it will certainly vary while in the middle of nightly expire and may vary at other times that I'm not thinking of.

If I were writing a new news server from scratch today in 2026, I would
try very hard not to use INN's design of having four separate databases in three entirely different formats for the active file, the newsgroup descriptions, the overview, and the history. Surely there is some way to
write a transactional database that could track all those things in a more reasonable but still performant way that doesn't require constantly
managing inconsistencies the way that INN does after some crashes or corruption. It used to be that all SQL databases were just too slow, particularly for history, but overview can be put in SQLite these days and
I'm dubious that there is no standard database that could handle history
given how much database optimization has happened over the years.

But INN will evolve slowly, if at all, because it basically works and a
lot of the bugs have been flushed out over the years and changing
architectures is very hard. :)
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

From InterLinked@nntp@phreaknet.org to news.software.nntp on Thu Apr 30 22:21:53 2026

From Newsgroup: news.software.nntp

On 4/30/2026 8:05 PM, Russ Allbery wrote:

InterLinked <nntp@phreaknet.org> writes:

However, the documentation for the active file in InterNetNews (INN)[2]
says that <high> is the "highest article number that has ever been used in that newsgroup". This implies it is NOT the same as the reported high
water mark, because the high water mark could decrease while <high> in
this context is monotonically increasing

Correct, INN never decreases the high water mark under normal operations. Therefore, if the latest article in the group is deleted, the high water
mark will not decrease and will reference a non-existent article.

This is how most news servers have historically behaved, so the arguable implication in RFC 3977 that servers should decrease the high water mark
in this case is arguably a bug. In practice, it has no real effect since
news readers are required to handle this situation anyway, due to:

| The set of articles in a group may change after the GROUP command is
| carried out:
|
| o Articles may be removed from the group.

Hmm, that's an interesting way to look at it - even if the article was
deleted *before* the GROUP response is generated, the client wouldn't be
able to tell.

But the 3rd bullet in the phrase you cite (6.1.1.2) also says:

| o New articles may be added with article numbers greater than the
| reported high water mark. (If an article that was the one with
| the highest number has been removed and the high water mark has
| been adjusted accordingly, the next new article will not have the
| number one greater than the reported high water mark.)

To me, this implies the high water mark can (even "should") decrease
when the high water mark article is removed - in which case, the next
article assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT work in IMAP).

Of course, the phrasing doesn't mandate that the high water mark
decrease in this case, though it seems to allow for that option.

which is implicitly incorporated by reference into LIST ACTIVE since it references that definition of high and low water marks (and logically has
to be the case regardless, since of course the state of the spool could change after LIST ACTIVE just as it could after GROUP).

Yes, that makes sense. But this seems more like a "loophole" or
"shortcut"... did the RFC actually intend it work this way? Or everybody
was already doing it that way before RFC 3977, and that part of it has
just been ignored?

For context, I am working on my own NNTP implementation and I've really
been scratching my head about how to handle this case. It seems like if
I'm able to provide a more accurate response (a lower high water mark),
that would be preferred, but maybe there is a good reason not to do so?
(The obvious one being it requires extra bookkeeping).

To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you
could have a response like:

211 3000 1 3000 misc.test

[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]

when in reality, this is the "most accurate" response:

211 1 1 1 misc.test

Though now that raises the question of what to display if that last article
(1) were then deleted. I presume in the first case, it would naturally be:

211 0 3000 2999 misc.test

And this is probably the best response. In the second case, it seems
more ambiguous what the most logical reply would be, since you could
start with either "last" or whatever the last true high water mark was
(e.g. 211 0 0 1 misc.test).

That brings me to another observation: I've noticed that most inactive newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active
file, and taking low = <high> and high = low - 1, return something more
like 211 0 3000 2999 misc.test

The benefit there is that from the output you could see how many
articles were in the group historically, from the high water mark, even
if all have since been deleted - it's extra context that can be conveyed
"for free". Is there a reason INN just uses 1/0 instead? This seems like
one case where using <last> directly would actually really make sense
for a client.

- and the low and high water marks can be recomputed by scanning the
spool directory, while <high> in the active file cannot (and thus needs
to be stored persistently).

Yes, and I'm not sure INN's behavior when the news administrator
reconstructs the active file from the spool is strictly conforming in all edge cases. (For example, the low water mark could decrease, which is forbidden by RFC 3977.) In practice, the edge cases probably don't matter.

I've been perusing the source of InterNetNews (INN) to try to understand
how it behaves, as a reference. It refers to the active file <high> as
LAST in a few places, and this is used when assigning new article IDs in a group. This makes sense. For LIST COUNT and GROUP, it pulls from group
stats, which I believe is ultimately some kind of database backend that
provides the reported water marks and article count. However, in the
response for LIST ACTIVE, it simply dumps the line from the active file as is. Yet, the RFC says the response format for LIST ACTIVE includes the
reported high and low water marks.

In theory INN could construct a LIST ACTIVE response from the overview database. In practice, this is a very frequent operation and the current implementation is probably considerably faster than an overview-based implementation, for dubious benefit.

So, for your questions:

1. If "LAST" is an internal value used for assigning article IDs, and
not the reported high water mark, then why is it being handed out as
such for LIST ACTIVE? I would think it would use the actual reported
high water mark, because if the high water mark article were deleted,
then the response would have the wrong high water mark.

Because it's slow, basically. News readers like for LIST ACTIVE to be very fast with a large number of groups so that they can show unread article counts quickly on newsreader startup.

Okay, so performance > strict correctness (which is a reasonable answer,
when the client can't really say it wasn't correct).

Though I don't see why the implementation could not be such that it
would be just as fast - either perhaps through an in-memory cache of low/high/count for all groups, kept in sync with the active file, or
even more simply, storing this all in the active file itself, i.e. with
a format like:

<name> <last> <reportedhigh> <reportedlow> <count> <status>

Performance-wise, since the active file has to be updated when new
articles are posted anyways, and deletions have to update the low water
mark anyways, overall # of writes would stay the same.

I'm not proposing either of these for INN specifically, but wondering if either would make sense in the design of new software. If I had to
guess, maybe the active file hasn't been extended like this for compatibility/portability reasons?

(For simplicity, I've also made the assumption articles won't be
manually deleted outside of the software's knowledge.)

2. The same page says <low> in the active file is ~the low water mark
but "not guaranteed to be accurate" and is just a hint. In INN, do the
values of <low> in the active file ever differ from the low water mark
in the group stats? Or are they distinct values like the active file
<high> (LAST) and the low water mark?

I would never guarantee full integrity between all of INN's various
databases because they're all independent and updated non-transactionally,
so all sorts of weird things are true momentarily. In theory, the low
water mark in the active file should be eventually consistent with the low water mark in overview, but it will certainly vary while in the middle of nightly expire and may vary at other times that I'm not thinking of.

If I were writing a new news server from scratch today in 2026, I would
try very hard not to use INN's design of having four separate databases in three entirely different formats for the active file, the newsgroup descriptions, the overview, and the history. Surely there is some way to write a transactional database that could track all those things in a more reasonable but still performant way that doesn't require constantly
managing inconsistencies the way that INN does after some crashes or corruption. It used to be that all SQL databases were just too slow, particularly for history, but overview can be put in SQLite these days and I'm dubious that there is no standard database that could handle history given how much database optimization has happened over the years.

But INN will evolve slowly, if at all, because it basically works and a
lot of the bugs have been flushed out over the years and changing architectures is very hard. :)

Thanks, this is helpful, since I'm basically writing a new news server
from scratch. Obviously, performance matters, though I don't want to prioritize that above all else - for now, this will be smaller scale
(private news hierarchies or subsets of Usenet).

The low/high water marks and count could be computed at startup by
scanning the directories, and then stored in memory, but now I'm kind of tempted by the idea of just having it all in an "extended" active file.

So, two more big questions, I guess:

1. It seems that convention is to "lie" about the high water mark and
just hand out "last" instead, for performance, at least the way INN is implemented (since the client can't tell that we lied). Considering it
feels against the *spirit* of the RFC, setting aside performance, do you foresee any problems with choosing to provide an accurate high water
mark? I can't see how it would break compatibility, since the RFC
already says the high water mark CAN decrease, even if nobody does it today.

(Edge case being if all articles are deleted, then using last makes
sense - though as I wondered above, I'm not sure if that's even what INN does.)

2. Is INN's active file (or file system more generally) intended to be portable with other news servers? If not, it seems like I could just
extend the active file to add the "true" high water mark along with the article count, and then just use that for both LIST ACTIVE and GROUP.
Then I could be truthful with no performance hit. Sure, I would have to
parse the line and omit "last" and "count", format the rest and return
it, but that seems minor and probably worth it (and as long as I'm
already formatting it at this point, I could return non-padded numbers
instead of zero-padded numbers, ultimately saving bandwidth for listing
all groups as a side benefit - of course, I'd still pad the file to
allow in-place edits).
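
A rough Python sketch of that idea follows. The six-field layout is the hypothetical extended format proposed above, not INN's active file, and `extended_line`, `list_active_line`, and the field width of 10 are illustrative assumptions.

```python
def extended_line(name, last, high, low, count, status, width=10):
    """Format one on-disk record; fixed-width numbers allow in-place edits."""
    return (f"{name} {last:0{width}d} {high:0{width}d} "
            f"{low:0{width}d} {count:0{width}d} {status}")


def list_active_line(line):
    """Turn one extended record into a LIST ACTIVE line: name high low status."""
    name, last, high, low, count, status = line.split()
    # int() drops the zero padding; <last> and <count> are never sent.
    return f"{name} {int(high)} {int(low)} {status}"


record = extended_line("misc.test", last=3000, high=1, low=1, count=1, status="y")
print(record)                    # misc.test 0000003000 0000000001 0000000001 0000000001 y
print(list_active_line(record))  # misc.test 1 1 y
```

Because every numeric field keeps the same width, a record can be rewritten in place without shifting the rest of the file, while the response sent to the client stays unpadded.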

I realize these are all edge cases, but I could see them arising and I
would prefer to be as correct as possible, especially if performance
won't be impacted much. But maybe (probably) there's something here that
I've not fully thought through...

Thanks!

From ram@ram@zedat.fu-berlin.de (Stefan Ram) to news.software.nntp on Fri May 1 10:04:59 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> wrote or quoted:

To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you
could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]
when in reality, this is the "most accurate" response:
211 1 1 1 misc.test

A newsreader might already have read those 3000 articles and made
an internal note:

|In that group, I have seen everything up to 3000.

. So when the newsserver then would go back to "211 1 1 1 misc.test",
the newsreader might miss the next 2999 articles because it deems them
"seen".

LISTGROUP and XHDR can be used to learn more about available articles.


From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 09:58:32 2026

From Newsgroup: news.software.nntp

On 5/1/2026 6:04 AM, Stefan Ram wrote:

InterLinked <nntp@phreaknet.org> wrote or quoted:

To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you
could have a response like:
211 3000 1 3000 misc.test
[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]
when in reality, this is the "most accurate" response:
211 1 1 1 misc.test

A newsreader might already have read those 3000 articles and made
an internal note:

|In that group, I have seen everything up to 3000.

Yes, but not if it's new to the group.

. So when the newsserver then would go back to "211 1 1 1 misc.test",
the newsreader might miss the next 2999 articles because it deems them
"seen".

LISTGROUP and XHDR can be used to learn more about available articles.

Yes, very true.

While all legitimate rationales, they still feel to me a bit like justifications for taking a shortcut. I know life would be simpler if I
took the same shortcut, but so far, it doesn't seem like there is
anything forcing me to either...

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 10:38:50 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

On 4/30/2026 8:05 PM, Russ Allbery wrote:

Therefore, if the latest article in the group is deleted, the high
water mark will not decrease and will reference a non-existent article.
This is how most news servers have historically behaved, so the
arguable implication in RFC 3977 that servers should decrease the high
water mark in this case is arguably a bug. In practice, it has no real
effect since news readers are required to handle this situation anyway,
due to:

| The set of articles in a group may change after the GROUP command is
| carried out:
|
| o Articles may be removed from the group.

Hmm, that's an interesting way to look at it - even if the article was deleted *before* the GROUP response is generated, the client wouldn't be
able to tell.

But the 3rd bullet in the phrase you cite (6.1.1.2) also says:

| o New articles may be added with article numbers greater than the
| reported high water mark. (If an article that was the one with
| the highest number has been removed and the high water mark has
| been adjusted accordingly, the next new article will not have the
| number one greater than the reported high water mark.)

To me, this implies the high water mark can (even "should") decrease when
the high water mark article is removed - in which case, the next article assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT
work in IMAP).

So, I have to admit that I don't recall this explicitly coming up during
the RFC discussions, so I don't have a definitive answer for you about why
we worded it this way. I think if we'd noticed this at the time, we would
have been a bit clearer about what clients should expect, so I think there
is a (minor) bug in the standard here.

What I can say is that the intent of RFC 3977 was to document existing
practice (which had moved on a lot since RFC 977) and add some new
features, but not to rule out the behavior of existing servers unless it
was clearly wrong in some way that would cause problems.

There are two fairly obvious ways to handle the high water mark:

1. Keep low and high water marks in only one place, increment the high
water mark on every new article arrival as part of article numbering,
and never decrement it because it doubles as the source of the next
article number for that group.

2. Keep internal "next article number" data for each group but report the
high water mark based on what articles are in the spool at the time.

Historically, INN (and C News, I'm fairly sure) always did 1, so that was
very widespread practice. I'm fairly sure that we wouldn't have chosen to declare it nonconformant. 2 is arguably more correct so the language
should be (and was) written to *allow* it, but we wouldn't have *required* it
and ruled the historic INN behavior non-compliant. INN's ability to do 2
in theory based on the overview database information is new in INN 2.x as
I recall. Before that, OVcancel was not a thing, and there was no way to
remove the information about the cancelled article from overview before
the next nightly expire, so there was no independent source of truth about
the current article numbers beyond checking the spool.
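
The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only, not INN's data structures; `GroupNumbers` and its method names are invented for the example, and the empty-group case is deliberately left out.

```python
import bisect


class GroupNumbers:
    """Per-group numbering state (illustrative sketch)."""

    def __init__(self):
        self.last = 0      # highest number ever assigned; never decreases
        self.present = []  # sorted article numbers currently in the spool

    def add_article(self):
        self.last += 1
        self.present.append(self.last)  # a new number is always the largest
        return self.last

    def remove_article(self, number):
        i = bisect.bisect_left(self.present, number)
        if i < len(self.present) and self.present[i] == number:
            del self.present[i]

    def high_strategy_1(self):
        # Approach 1: the numbering counter doubles as the high water mark.
        return self.last

    def high_strategy_2(self):
        # Approach 2: report the highest article actually present.
        # Empty groups need their own convention, so return None here.
        return self.present[-1] if self.present else None


group = GroupNumbers()
for _ in range(3):
    group.add_article()
group.remove_article(3)
print(group.high_strategy_1())  # 3
print(group.high_strategy_2())  # 2
print(group.add_article())      # 4
```

Under approach 2 the reported high drops to 2 after the removal, yet the next article is still numbered 4, not 3, which is the situation the parenthetical in 6.1.1.2 describes.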

I agree with you that this didn't really make it into the text, but I
think that's just a minor bug in the standard that we didn't catch at the
time.

Thinking about the problem this morning, I do see a small but real
advantage to the client in getting an accurate high water mark: It means
that the count of unread articles derived purely from LIST ACTIVE will be
more correct in the specific case that only the highest-numbered article
was removed. That in turn may save some spurious notification of unread messages. But counts based solely on LIST ACTIVE responses are going to be inaccurate for the more common case (for servers that support article
removal at all in their configuration) of an article that is *not* the highest-numbered article being removed. This is just the tradeoff of using
LIST ACTIVE for article numbers; if the client wants more accurate
information, it needs to use one of the other commands like OVER. But of
course those are inherently heavier-weight, due to the increased amount of information returned and the requirement of a round trip per group.
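
As a small worked example of that tradeoff (a sketch under assumptions: the client only remembers the highest article number it has read, and `unread_estimate` and `unread_actual` are names invented here):

```python
def unread_estimate(high, last_read):
    """Unread count a client can derive from LIST ACTIVE alone."""
    return max(0, high - last_read)


def unread_actual(present, last_read):
    """Unread count if the client knew exactly which articles exist."""
    return sum(1 for n in present if n > last_read)


last_read = 10

# Case A: 11 and 12 were posted, then the highest-numbered article (12) was removed.
present = set(range(1, 12))                        # articles 1..11
print(unread_estimate(12, last_read))              # 2, server never lowers high
print(unread_estimate(max(present), last_read))    # 1, server reports accurate high
print(unread_actual(present, last_read))           # 1

# Case B: 11 was removed instead; both kinds of server report high 12.
present = set(range(1, 11)) | {12}                 # articles 1..10 and 12
print(unread_estimate(max(present), last_read))    # 2
print(unread_actual(present, last_read))           # 1
```

The accurate high water mark only fixes case A; in case B the estimate is off by one no matter which value the server reports.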

Of course, the phrasing doesn't mandate that the high water mark
decrease in this case, though it seems to allow for that option.

Yes, I agree that decreasing the high water mark is definitely allowed.

Yes, that makes sense. But this seems more like a "loophole" or
"shortcut"... did the RFC actually intend it work this way? Or everybody
was already doing it that way before RFC 3977, and that part of it has
just been ignored?

The latter. You'll find that this is really common in the netnews RFCs:
there was such a long gap between the initial RFCs and the updates, and so
much changed about the implementations in a not-entirely-coherent way,
that the RFCs allow for a lot of variations of behavior to avoid declaring existing implementations nonconformant with the new standard except where
that seemed warranted.

The primary purpose of the RFC refresh cycle was not to try to clean up
all the existing implementations, but instead to document what the
behavior was in as clean of a way as possible so that new software knew
what it could rely on.

For context, I am working on my own NNTP implementation and I've really
been scratching my head about how to handle this case. It seems like if
I'm able to provide a more accurate response (a lower high water mark),
that would be preferred, but maybe there is a good reason not to do so?
(The obvious one being it requires extra bookkeeping).

If you can provide a more accurate high water mark, I don't see any
drawback to doing so. The only possible downside that I can imagine is
that some client will be surprised by the high water mark decreasing,
since it has never seen a server that would do that, and might issue some
sort of warning to the user. I suppose such a client could exist. But it
would surprise me a bit; decreasing the high water mark is clearly allowed
by the RFC.

To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you could have a response like:

211 3000 1 3000 misc.test

[and the count has to be at least 3000, per the RFC, so we can't even have 211 1 1 3000 misc.test to indicate there are definite gaps]

Yup, and in the days when spam and spam cancels were fighting it out, it
wasn't uncommon to see things like that happen in some groups.

when in reality, this is the "most accurate" response:

211 1 1 1 misc.test

Though now that raises the question of what to display if that last article (1) were then deleted. I presume in the first case, it would naturally be:

211 0 3000 2999 misc.test

And this is probably the best response. In the second case, it seems more ambiguous what the most logical reply would be, since you could start with either "last" or whatever the last true high water mark was (e.g. 211 0 0
1 misc.test).

*If* your server would never reinstate articles, the best response in the
sense of giving the client the most information would be to increase the
low water mark and return a high of 2999 and a low of 3000, because the
client can then forget about all of those deleted articles permanently.
But as the RFC says, if you might ever reinstate those articles, you're
not allowed to increase the low water mark like that, so I think the best response would be to return high 0 and low 1 if the articles may later reappear.
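
That choice could be sketched as follows. This is only an illustration: `empty_group_marks` and its parameters are hypothetical, and generalizing "low 1, high 0" to "keep the previously reported low" is an assumption beyond the example above.

```python
def empty_group_marks(last, prev_low, may_reinstate):
    """Return (low, high) to report for a group with no articles left.

    last: highest article number ever assigned in the group.
    prev_low: low water mark reported before the group became empty.
    may_reinstate: True if removed articles could reappear later.
    """
    if may_reinstate:
        # Low must not move past articles that might come back.
        low = prev_low
    else:
        # Nothing at or below `last` will return, so the client can forget it.
        low = max(last, prev_low)
    return low, low - 1


low, high = empty_group_marks(3000, prev_low=1, may_reinstate=False)
print(f"211 0 {low} {high} misc.test")  # 211 0 3000 2999 misc.test

low, high = empty_group_marks(3000, prev_low=1, may_reinstate=True)
print(f"211 0 {low} {high} misc.test")  # 211 0 1 0 misc.test
```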

That brings me to another observation: I've noticed that most inactive newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active file, and taking low = <high> and high = low - 1, return something more like 211
0 3000 2999 misc.test

Is it possible that those groups have never received traffic on that
server? That's the response I would expect if the server has never stored
an article for that group.

The benefit there is that from the output you could see how many articles were in the group historically, from the high water mark, even if all have since been deleted - it's extra context that can be conveyed "for free".
Is there a reason INN just uses 1/0 instead? This seems like one case
where using <last> directly would actually really make sense for a client.

I *think* that if it had ever received traffic (and the news administrator hadn't rebuilt the active file, etc.), you would see the result that you
are expecting.

Okay, so performance > strict correctness (which is a reasonable answer,
when the client can't really say it wasn't correct).

Exactly.

Though I don't see why the implementation could not be such that it
would be just as fast - either perhaps through an in-memory cache of low/high/count for all groups, kept in sync with the active file, or
even more simply, storing this all in the active file itself, i.e. with
a format like:

<name> <last> <reportedhigh> <reportedlow> <count> <status>

There is no reason that one could not make it fast. It's just extra
development work and extra bookkeeping that no one has implemented for
INN. The overview database, where the more accurate information is stored,
is optimized for per-group retrieval and, for some of the overview
backends currently implemented, iterating through all of the groups to get current low and high marks would be slow.

I'm not proposing either of these for INN specifically, but wondering if either would make sense in the design of new software. If I had to
guess, maybe the active file hasn't been extended like this for compatibility/portability reasons?

Yes, exactly, and just because this didn't seem important enough to put
effort into.

The low/high water marks and count could be computed at startup by
scanning the directories, and then stored in memory, but now I'm kind of tempted by the idea of just having it all in an "extended" active file.

If I were writing a news server from scratch, I would embrace modern
databases as early as possible and not try to reinvent that wheel. Long experience with INN is that the reinvention of various databases is one of
the hardest parts of INN to maintain and handing that all off to some
suitable library or external service would be very attractive.

1. It seems that convention is to "lie" about the high water mark and
just hand out "last" instead, for performance, at least the way INN is implemented (since the client can't tell that we lied). Considering it
feels against the *spirit* of the RFC, setting aside performance, do you foresee any problems with choosing to provide an accurate high water
mark? I can't see how it would break compatibility, since the RFC
already says the high water mark CAN decrease, even if nobody does it
today.

I suspect it would be fine to do that.

2. Is INN's active file (or file system more generally) intended to be portable with other news servers?

Not really, no. Some of INN's on-disk data structures match the format of
files specified in the RFC for convenience reasons, but most of INN's
on-disk data structures (apart from the spool if tradspool is used) are
very, very specific to INN.

If not, it seems like I could just extend the active file to add the
"true" high water mark along with the article count, and then just use
that for both LIST ACTIVE and GROUP. Then I could be truthful with no performance hit.

If you are reworking the format, I would find a way to put the newsgroup description into the same file, because desynchronization between active
and newsgroups is a long-standing annoyance in INN. And at that point I
would consider some sort of structured database with fast writes. :)
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 16:37:34 2026

From Newsgroup: news.software.nntp

On 5/1/2026 1:38 PM, Russ Allbery wrote:

There are two fairly obvious ways to handle the high water mark:

1. Keep low and high water marks in only one place, increment the high
water mark on every new article arrival as part of article numbering,
and never decrement it because it doubles as the source of the next
article number for that group.

2. Keep internal "next article number" data for each group but report the
high water mark based on what articles are in the spool at the time.

Historically, INN (and C News, I'm fairly sure) always did 1, so that was
very widespread practice. I'm fairly sure that we wouldn't have chosen to
declare it nonconformant. 2 is arguably more correct, so the language
should be (and was) written to *allow* it, but we wouldn't have *required*
it and ruled the historic INN behavior non-compliant. INN's ability to do 2
in theory based on the overview database information is new in INN 2.x as
I recall. Before that, OVcancel was not a thing, and there was no way to
remove the information about the cancelled article from overview before
the next nightly expire, so there was no independent source of truth about
the current article numbers beyond checking the spool.
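
The two approaches can be contrasted in a few lines of Python. This is only a sketch with invented names (`GroupNumbers`, `high_strategy_1`, `high_strategy_2`), using an in-memory set to stand in for the spool.

```python
class GroupNumbers:
    """Toy model of one group's article numbering (illustrative only)."""

    def __init__(self):
        self.last = 0          # next-number source; never decreases
        self.present = set()   # article numbers currently in the spool

    def add_article(self):
        self.last += 1
        self.present.add(self.last)
        return self.last

    def remove_article(self, number):
        self.present.discard(number)

    def high_strategy_1(self):
        # Approach 1: the numbering counter doubles as the high water mark.
        return self.last

    def high_strategy_2(self):
        # Approach 2: report what is actually present (None if empty).
        return max(self.present) if self.present else None
```

After adding three articles and removing number 3, approach 1 still reports 3 while approach 2 reports 2; in both cases the next article is numbered 4.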

I think I understand the landscape now - both are compliant but choose
to gravitate slightly more towards either performance or correctness.

I'm more surprised if it's the case that maybe this is the first time
anyone is considering #2 in a design.

Thinking about the problem this morning, I do see a small but real
advantage to the client in getting an accurate high water mark: It means
that the count of unread articles derived purely from LIST ACTIVE will be more correct in the specific case that only the highest-numbered article
was removed. That in turn may save some spurious notification of unread messages. But counts based solely on LIST ACTIVE responses are going to be inaccurate for the more common case (for servers that support article
removal at all in their configuration) of an article that is *not* the highest-numbered article being removed. This is just the tradeoff of using LIST ACTIVE for article numbers; if the client wants more accurate information, it needs to use one of the other commands like OVER. But of course those are inherently heavier-weight, due to the increased amount of information returned and the requirement of a round trip per group.
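
To make the client side concrete, here is a small Python sketch of the unread estimate a reader could derive from LIST ACTIVE alone. The function name and the `read_mark` parameter (highest article number the user has read) are assumptions for illustration, not taken from any particular newsreader.

```python
def unread_estimate(high, low, read_mark):
    """Upper bound on unread articles using only the water marks.

    Gaps between low and high are invisible here, so the result can
    overcount whenever articles in the middle have been removed.
    """
    first_unread = max(read_mark + 1, low)
    return max(0, high - first_unread + 1)
```

If the user has read through 99 and article 100 arrived and was then cancelled, a server reporting high 100 makes this return 1 (a spurious unread), while a server reporting the accurate high of 99 makes it return 0.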

I think I'll definitely want to consider that angle - hitherto, I've
been directly using Eternal September in my newsreader (which is Mozilla-based) and I've noticed for some large groups, I see a very high count, and then when I click on the group, it changes radically. Just
now, I did a packet capture and I only see it using the GROUP command
(and not LIST ACTIVE at all), but from the configuration that INN
allows, I wonder if maybe Eternal September has their INN (for indeed,
they are using INN) configured to return estimated group counts in most
cases, and thus my reader only sees the correct count when I click on
the group.

Could've been some other command, but that makes me desire even more
strongly to always provide accurate counts as well, if nothing else to
avoid irritating me :)

If you can provide a more accurate high water mark, I don't see any
drawback to doing so. The only possible downside that I can imagine is
that some client will be surprised by the high water mark decreasing,
since it has never seen a server that would do that, and might issue some sort of warning to the user. I suppose such a client could exist. But it would surprise me a bit; decreasing the high water mark is clearly allowed
by the RFC.

To make an extreme example, if a group with a lot of articles had all of
them except the low water mark article deleted (and "last" is 3000), you
could have a response like:

211 3000 1 3000 misc.test

[and the count has to be at least 3000, per the RFC, so we can't even have
211 1 1 3000 misc.test to indicate there are definite gaps]

Yup, and in the days when spam and spam cancels were fighting it out, it wasn't uncommon to see things like that happen in some groups.

when in reality, this is the "most accurate" response:

211 1 1 1 misc.test

Though now that begs the question what to display if that last article (1)
were then deleted. I presume in the first case, it would naturally be:

211 0 3000 2999 misc.test

And this is probably the best response. In the second case, it seems more
ambiguous what the most logical reply would be, since you could start with
either "last" or whatever the last true high water mark was (e.g. 211 0 0
1 misc.test).

*If* your server would never reinstate articles, the best response in the sense of giving the client the most information would be to increase the
low water mark and return a high of 2999 and a low of 3000, because the client can then forget about all of those deleted articles permanently.
But as the RFC says, if you might ever reinstate those articles, you're
not allowed to increase the low water mark like that, so I think the best response would be to return high 0 and low 1 if the articles may later reappear.
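
The choice described here can be written down as a tiny Python function. This is a sketch under the thread's own numbers: `empty_group_marks` is a hypothetical name, and using `last` itself (rather than `last + 1`) as the new low water mark simply follows the "211 0 3000 2999" example above; another implementation might choose differently.

```python
def empty_group_marks(old_low, last, may_reinstate):
    """Return (count, low, high) to report for a group with no articles."""
    if may_reinstate:
        # The low water mark must not move past articles that could return.
        low = old_low
    else:
        # Deleted articles are gone for good, so the client may forget them.
        low = last
    # Empty group convention: high is one less than low, count is zero.
    return 0, low, low - 1
```

With `old_low=1` and `last=3000`, this gives `(0, 3000, 2999)` when articles are never reinstated and `(0, 1, 0)` when they might be.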

Hmm - yet another fork in the road!

How often does article reinstatement really occur and under what circumstances? Purely by the local newsmaster? I probably wouldn't plan
to reinstate articles that expired due to the server's local policy; are
there any other reasons that might happen? Undoing a cancel - is that a
thing? (And beyond that, without some kind of recycle bin, the article
would have to be restored from some kind of backup.)

My personal uninformed preference at the moment is probably to refrain
from reinstating articles, if only because empty groups would then show
the historical high water mark count in LIST ACTIVE, which to me would
be *very* useful and interesting for statistical and informational purposes.

If NNTP had something analogous to UIDVALIDITY in IMAP, where one would normally increase the low water mark but could "reset" it in some
unforeseen circumstance, that would allow for both behaviors, but there
isn't as far as I know. I know you mentioned water marks can change in potentially non-compliant ways if a group is renumbered, so I guess INN
may not even be consistent in other cases.

That brings me to another observation: I've noticed that most inactive
newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active file,
and taking low = <high> and high = low - 1, return something more like
211 0 3000 2999 misc.test

Is it possible that those groups have never received traffic on that
server? That's the response I would expect if the server has never stored
an article for that group.

It's possible, though it would surprise me a little - this was running
LIST ACTIVE on Eternal September's INN server, which I think has been
around for a while, but maybe some of these are super old groups that
have been inactive a long while.

For active groups, I do see low water marks that are greater than 1, so
for these groups, there's a commitment to not reinstate articles below
the present low water mark. So is article reinstatement in an empty
group vs non-empty really a special case? To allow unconditional reinstatement, the low water mark would always have to be 1, which is
not really that meaningful. (So intuitively, I would feel that it makes
more sense to keep the low water mark as high as is legal at any given
point, assuming reinstatement isn't likely to occur.)

If I were writing a news server from scratch, I would embrace modern databases as early as possible and not try to reinvent that wheel. Long experience with INN is that the reinvention of various databases is one of the hardest parts of INN to maintain and handing that all off to some suitable library or external service would be very attractive.

Isn't the database more of a "cache" in INN, of technically
reconstructible data? (in contrast to the active file, which has <last>
which is not reconstructible).

For LIST responses, I don't see how using a database would be faster
than reading through one of these files, especially if you already have
to do ACL checks and wildmat matches on every group - I would think
those would be the bottleneck. For articles within a group, .overview
seems fairly efficient.

GROUP would require a linear scan of the active file for its response,
to find the group, and a database could be faster in that case, but
apart from single-group responses, is there any case where a database
would result in a noticeable speedup? And at that point, maybe a simple
hash table with pointers to the beginning of the corresponding line in
the active file would close the performance gap, without needing to add
a database to the picture.
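
For what it's worth, the hash-table idea is only a few lines in Python. The sketch below uses made-up helper names and an in-memory file; it maps each group name to the byte offset of its line so that GROUP can seek straight to it.

```python
import io


def build_offset_index(f):
    """Map group name -> byte offset of its line in a binary active file."""
    index = {}
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        if line.strip():
            name = line.split(None, 1)[0].decode("utf-8")
            index[name] = offset
    return index


def lookup(f, index, group):
    """Return the active line for one group, or None if it is unknown."""
    offset = index.get(group)
    if offset is None:
        return None
    f.seek(offset)
    return f.readline().decode("utf-8").rstrip("\n")


# Tiny in-memory stand-in for an active file with zero-padded numbers.
active = io.BytesIO(
    b"misc.test 0000003000 0000000001 y\n"
    b"news.software.nntp 0000000042 0000000007 y\n"
)
index = build_offset_index(active)
print(lookup(active, index, "news.software.nntp"))
```

The offsets stay valid only as long as lines never change length, which is presumably one reason to keep the numeric fields fixed-width; adding or removing a group means rebuilding the index.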

(I'm not opposed to a database if it really made sense, but it seems
like a few flat files can get the job done here just about as well - though
maybe I'm missing something obvious.)

2. Is INN's active file (or file system more generally) intended to be
portable with other news servers?

Not really, no. Some of INN's on-disk data structures match the format of
files specified in the RFC for convenience reasons, but most of INN's
on-disk data structures (apart from the spool if tradspool is used) are
very, very specific to INN.

Gotcha, makes sense.

Really, was more wondering about the active file than anything else.
While not officially standardized anywhere, it seems in practice there
are a few standardized files with standardized formats:

.active (LIST ACTIVE)
.active.times (LIST ACTIVE.TIMES)
.newsgroups (LIST NEWSGROUPS)
<group>/.overview (8 standardized fields)
<group>/<article number> for article naming

My plan was to go with these, and possibly bastardize .active in the
process in a way nobody else has done (adding <real high> and <count>).
This has the downside of deviating from the canonical format for the
file, which does seem to be pretty universal amongst existing software.
It just seems silly to me to add another file just to avoid breaking compatibility. (My thinking: worst case, if needed, a migration could
always be done using NNTP itself anyways, to another server - in which
case the format of my active file is nobody else's business.)

I think we've established there's no good reason the real high water
mark couldn't be stored here, and I don't think there's any reason the
count couldn't be either, since anything that changes the count updates
the active file already.

If not, it seems like I could just extend the active file to add the
"true" high water mark along with the article count, and then just use
that for both LIST ACTIVE and GROUP. Then I could be truthful with no
performance hit.

If you are reworking the format, I would find a way to put the newsgroup description into the same file, because desynchronization between active
and newsgroups is a long-standing annoyance in INN. And at that point I
would consider some sort of structured database with fast writes. :)

Hmm, could you elaborate a bit more on the kind of desynchronization
that tends to happen?

If I recall, the RFC states that the list of groups from LIST ACTIVE and
LIST NEWSGROUPS can differ (though perhaps this was worded that way to
prevent existing installations from violating the spec, not necessarily
to condone that practice? Ideally, would the list of groups always match identically? Or are there ever good reasons they should differ?)

If combining .newsgroups into .active, it makes me wonder, why not go
further and also combine .active.times into .active? Were these
initially separate simply because .active.times came later and wanted to
avoid breaking the format of .active, or for some other good reason? It
would seem there *could* theoretically be just one big global file, like so:

.active.extended

<group> <last> <high> <low> <count> <creation epoch> <creator name> <description>

The only thing I can think of (and this applies to .newsgroups but not .active.times) is that if the description is changed, its length can
change, so now the whole active file needs to be rewritten. But this is probably an uncommon enough occurrence (maybe even less common than
group creation or deletion?) that the performance implication could be ignored.

I had also previously assumed .newsgroups was separate because group descriptions contain spaces/tabs, which would complicate the parsing if combined with other stuff. But if I made it the last entry on each line,
it wouldn't pose an issue.
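
A quick Python sketch of parsing such a line, assuming single spaces between fields and that only the final description field may contain whitespace (so a creator name with spaces would break it). The function name and dictionary keys are hypothetical.

```python
def parse_extended(line):
    """Parse one line of the hypothetical .active.extended format."""
    # Split at most 7 times so the description keeps its internal spaces.
    fields = line.rstrip("\n").split(" ", 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    group, last, high, low, count, created, creator, description = fields
    return {
        "group": group,
        "last": int(last),
        "high": int(high),
        "low": int(low),
        "count": int(count),
        "created": int(created),
        "creator": creator,
        "description": description,
    }
```

The catch is that "last field swallows the rest" works for exactly one free-text field; a second one needs a different delimiter or a structured format.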

And of course, now all the LIST handlers would need to parse the file
and send the right info, but that's not a big deal either. Maybe
slightly more contention for the file with locking, is all I can think.

Most commands would then simply do a full scan of this file and get what
they need, either for all groups or just a specific group.

Writes (new or deleted articles) would generally update <last>, <high>,
<low>, and/or <count>, and while existing servers don't do that, they
*are* already updating *something* in the file (<last> for new posts,
and at least one of the water marks for deletions), so updating the
other metadata is effectively "free".
--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 16:56:57 2026

From Newsgroup: news.software.nntp

On 5/1/2026 4:37 PM, InterLinked wrote:

On 5/1/2026 1:38 PM, Russ Allbery wrote:

Thinking about the problem this morning, I do see a small but real
advantage to the client in getting an accurate high water mark: It means
that the count of unread articles derived purely from LIST ACTIVE will be
more correct in the specific case that only the highest-numbered article
was removed. That in turn may save some spurious notification of unread
messages. But counts based solely on LIST ACTIVE responses are going to be
inaccurate for the more common case (for servers that support article
removal at all in their configuration) of an article that is *not* the
highest-numbered article being removed. This is just the tradeoff of using
LIST ACTIVE for article numbers; if the client wants more accurate
information, it needs to use one of the other commands like OVER. But of
course those are inherently heavier-weight, due to the increased amount of
information returned and the requirement of a round trip per group.

I think I'll definitely want to consider that angle - hitherto, I've
been directly using Eternal September in my newsreader (which is Mozilla-based) and I've noticed for some large groups, I see a very high count, and then when I click on the group, it changes radically. Just
now, I did a packet capture and I only see it using the GROUP command
(and not LIST ACTIVE at all), but from the configuration that INN
allows, I wonder if maybe Eternal September has their INN (for indeed,
they are using INN) configured to return estimated group counts in most
cases, and thus my reader only sees the correct count when I click on
the group.

Could've been some other command, but that makes me desire even more strongly to always provide accurate counts as well, if nothing else to
avoid irritating me :)

And my newsreader just did exactly this annoying thing, and I captured
the commands. This is using comp.os.linux.misc, via Eternal September,
as an example:

My newsreader ran GROUP and got back:

211 29344 61512 90857

Then, it immediately ran XOVER 90857-90857 and got a response to that (I
don't think this is relevant though).

In a period of about 1 second, the unread count for the group went from
6, to 20-something thousand, back to 6. I think this is the phenomenon
you were describing, and yes, it annoys the heck out of me! But now it
clicks as to why it behaves like that.

The server really only has 946 articles[1]; yet, INN is reporting it has 29,344 (likely because this is larger than the value of groupexactcount,
so it just estimated it). I know the overview database has the count,
though I guess that value is not necessarily up to date, for reasons I
don't understand currently - presumably keeping it up to date would add non-constant overhead with INN's current architecture.
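
A quick check of the numbers in that capture, as a Python sketch (the helper name is made up). The reported count is within 2 of the width of the article number range, which is what one would expect from an estimate based on the water marks rather than on the articles actually present; whether INN derived it exactly that way here is an assumption.

```python
def parse_group_response(line):
    """Return (count, low, high) from a 211 GROUP response line."""
    parts = line.split()
    if len(parts) < 4 or parts[0] != "211":
        raise ValueError("not a 211 GROUP response")
    return int(parts[1]), int(parts[2]), int(parts[3])


count, low, high = parse_group_response("211 29344 61512 90857")
span = high - low + 1       # width of the number range: 29346
print(count, span, span - count)
```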

But if my reasoning is sound for the proposal I've been contemplating,
then I could easily write/read the count for "free", and my software
would avoid these sorts of "UI glitches" in readers.

[1] https://www.eternal-september.org/groups.php?hierarchy=comp
--- Synchronet 3.21f-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 14:14:21 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

How often does article reinstatement really occur and under what circumstances? Purely by the local newsmaster?

INN doesn't support it at all, although there are some corruption repair
tools that can create similar effects. It therefore never attempts to
reserve low water mark space.

I think people have discussed article reinstatement in theory, usually
around spam filtering scenarios where an article is quarantined as possible
spam and then later released. But I don't know if any server has actually implemented this, and therefore am not sure whether the discussion of this
in the NNTP RFC is theoretical or based on some implementation. It was
probably discussed at the time, but it's been more than 20 years and I
don't remember, sadly.

I probably wouldn't plan to reinstate articles that expired due to the server's local policy; are there any other reasons that might happen?
Undoing a cancel - is that a thing? (And beyond that, without some kind
of recycle bin, the article would have to be restored from some kind of backup.)

There is no control message to undo a cancel, but of course the local administrator can do anything the software allows.

If NNTP had something analogous to UIDVALIDITY in IMAP, where one would normally increase the low water mark but could "reset" it in some
unforeseen circumstance, that would allow for both behaviors, but there
isn't as far as I know.

Correct, there's no such concept in NNTP.

Is it possible that those groups have never received traffic on that
server? That's the response I would expect if the server has never
stored an article for that group.

It's possible, though it would surprise me a little - this was running
LIST ACTIVE on Eternal September's INN server, which I think has been
around for a while, but maybe some of these are super old groups that
have been inactive a long while.

There are definitely Big Eight groups that haven't gotten any traffic for
10-20 years. (Some moderated ones, at least.)

For active groups, I do see low water marks that are greater than 1, so
for these groups, there's a commitment to not reinstate articles below
the present low water mark. So is article reinstatement in an empty
group vs non-empty really a special case? To allow unconditional reinstatement, the low water mark would always have to be 1, which is
not really that meaningful.

Correct. The only requirement is to not increase the low water mark if
you'd reinstate one of those older articles. The empty group isn't a
special case. The special case is more "the articles were all removed down
to the low water mark by something other than expiration," since
presumably you would never reinstate expired articles, only ones removed
by some other mechanism that may be erroneous, like cancels (which can be forged if one isn't using canlock or the like).

If I were writing a news server from scratch, I would embrace modern
databases as early as possible and not try to reinvent that wheel. Long
experience with INN is that the reinvention of various databases is one
of the hardest parts of INN to maintain and handing that all off to
some suitable library or external service would be very attractive.

Isn't the database more of a "cache" in INN, of technically
reconstructible data? (in contrast to the active file, which has <last>
which is not reconstructible).

Well, I'm not sure I agree with the distinction you're making here, since
the active file *is* a database. INN has a whole bunch of databases, some
of which it stores as text files, but just because the format is a text
file doesn't mean it isn't a database. INN definitely uses the active file like
a database (hence the zero-padding).

In general, there is data in many of the databases that cannot be
reconstructed from the spool, such as article arrival time and the record
of rejected articles. Overview is a bit of a special case in that it
can generally be regenerated solely from the spool, but that's just one of
the (many) databases.

For LIST responses, I don't see how using a database would be faster
than reading through one of these files, especially if you already have
to do ACL checks and wildmat matches on every group - I would think
those would be the bottleneck.

I don't think the database would be faster necessarily. I think it would
be more maintainable and more consistent and have fewer of the numerous
bugs we've run into with INN over the years. Having transactions, for
instance, eliminates a whole set of corruption inconsistencies. Combining
active and newsgroups eliminates a whole class of synchronization issues
when processing control messages.
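
As a rough sketch of what that could look like with SQLite from Python: one row per group holds the numbering, the creation data, and the description, and each change runs in a transaction. The table and column names are invented for illustration and do not describe any existing server.

```python
import sqlite3

GROUPS_SCHEMA = """
CREATE TABLE groups (
    name          TEXT PRIMARY KEY,
    last_num      INTEGER NOT NULL DEFAULT 0,
    high_mark     INTEGER NOT NULL DEFAULT 0,
    low_mark      INTEGER NOT NULL DEFAULT 1,
    article_count INTEGER NOT NULL DEFAULT 0,
    status        TEXT NOT NULL,
    created       INTEGER NOT NULL,
    creator       TEXT NOT NULL,
    description   TEXT NOT NULL
)
"""


def create_group(conn, name, status, created, creator, description):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT INTO groups (name, status, created, creator, description) "
            "VALUES (?, ?, ?, ?, ?)",
            (name, status, created, creator, description),
        )


def remove_group(conn, name):
    # Deleting the row removes the description and creation data with it.
    with conn:
        conn.execute("DELETE FROM groups WHERE name = ?", (name,))
```

Because the description lives in the same row as the numbering, there is no second file that a crash in the middle of a newgroup or rmgroup could leave out of step.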

For articles within a group, .overview seems fairly efficient.

Well, we wrote a whole new overview mechanism because we didn't think it
was sufficiently efficient. :) Using only a flat .overview file can be extremely slow for very large groups when clients request only a subset of
the records (which is very common; they usually only care about the latest messages).

A simple flat .overview text file is how INN 1.x worked, and it does
indeed work fine for small groups. Everything works fine for small groups.
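
To illustrate the "subset of the records" case, here is a minimal Python/SQLite sketch in which overview lines are keyed by (group, article number), so a range request only touches the rows it needs. The schema and function names are assumptions for illustration and are not a description of INN's SQLite overview implementation.

```python
import sqlite3


def open_overview(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS overview ("
        " grp TEXT NOT NULL,"
        " artnum INTEGER NOT NULL,"
        " line TEXT NOT NULL,"
        " PRIMARY KEY (grp, artnum)"
        ") WITHOUT ROWID"
    )
    return conn


def over_range(conn, group, first, last):
    """Return [(artnum, line), ...] for first..last in one group."""
    rows = conn.execute(
        "SELECT artnum, line FROM overview "
        "WHERE grp = ? AND artnum BETWEEN ? AND ? ORDER BY artnum",
        (group, first, last),
    )
    return rows.fetchall()
```

With a flat per-group text file, answering a request for only the newest handful of articles generally means reading past every older line first; here the primary key lets the lookup start at the requested range.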

GROUP would require a linear scan of the active file for its response,
to find the group, and a database could be faster in that case, but
apart from single-group responses, is there any case where a database
would result in a noticeable speedup?

In theory, a database may be able to do much faster prefix matching than a linear scan doing wildmat matching for, e.g., LIST ACTIVE news.*, but that would require converting wildmat expressions to something the database can understand with LIKE, which may not be possible in the general case.
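
A sketch of what the easy half of that conversion might look like in Python with SQLite. The `groups` table, the function names, and the decision to give up on anything beyond `*` and `?` are assumptions; comma-separated lists and `!` negation are exactly the part that does not map onto a single LIKE.

```python
import sqlite3


def wildmat_to_like(pattern):
    """Translate a simple wildmat to a LIKE pattern, or None if unsupported."""
    # Lists, negation, character classes, and escapes are not handled here.
    if any(ch in pattern for ch in ",![]\\"):
        return None
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in "%_":
            out.append("\\" + ch)   # literal % or _ must be escaped for LIKE
        else:
            out.append(ch)
    return "".join(out)


def matching_groups(conn, wildmat):
    """Names matching a simple wildmat, or None if a linear scan is needed."""
    like = wildmat_to_like(wildmat)
    if like is None:
        return None
    # Note: SQLite's LIKE is case-insensitive for ASCII letters by default.
    rows = conn.execute(
        "SELECT name FROM groups WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (like,),
    )
    return [name for (name,) in rows]
```

Anything the translator rejects would fall back to the ordinary wildmat scan, so the database only speeds up the simple prefix-style patterns.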

And at that point, maybe a simple hash table with pointers to the
beginning of the corresponding line in the active file would close the performance gap, without needing to add a database to the picture.

See, you're going down the same path that all the INN authors, myself
included, have gone down: You can see a simple data structure that would
solve the problem that you have and it seems more straightforward to just implement that than use a "full database" which feels like it would have a
ton of overhead.

And you can do that! That's how INN works! I'm just saying that as someone
with a lot of years of experience maintaining that code with a simple hash table and whatnot, a whole lot of time and bugs would have been saved by
just using an off-the-shelf database. At, of course, the cost of having to handle database transitions and implementation changes and BerkeleyDB
getting bought by Oracle and then killed and so forth.

INN now has a SQLite overview implementation and I think that's the right direction to go.

Really, was more wondering about the active file than anything else. While not officially standardized anywhere, it seems in practice there are a few standardized files with standardized formats:

.active (LIST ACTIVE)
.active.times (LIST ACTIVE.TIMES)
.newsgroups (LIST NEWSGROUPS)
<group>/.overview (8 standardized fields)
<group>/<article number> for article naming

The last is often not used these days because it has a lot of poor
performance properties.

There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably
fine as configuration files with some in-memory representation, though,
since they're usually very small.

LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT

I think we've established there's no good reason the real high water mark couldn't be stored here, and I don't think there's any reason the count couldn't be either, since anything that changes the count updates the
active file already.

Yes, I agree that seems reasonable.

If you are reworking the format, I would find a way to put the
newsgroup description into the same file, because desynchronization
between active and newsgroups is a long-standing annoyance in INN. And
at that point I would consider some sort of structured database with
fast writes. :)

Hmm, could you elaborate a bit more on the kind of desynchronization that tends to happen?

Deleting the group doesn't remove the description line. The control
message processing dies in the middle because the server crashes and a
line gets added to one and not the other. The description gets added to
the newsgroups file more than once. Some of these are bugs, but that's
part of the point: It's theoretically possible to keep the files fully in
sync, but in practice this has been an area where INN has had tons of
bugs over the years.

If I recall, the RFC states that the list of groups from LIST ACTIVE and
LIST NEWSGROUPS can differ (though perhaps this was worded that way to prevent existing installations from violating the spec, not necessarily
to condone that practice? Ideally, would the list of groups always match identically? Or are there ever good reasons they should differ?)

Yeah, I think that may be toleration for lots and lots of historic bugs. Really, there's no reason why this should be the case except for the
narrow case of a group being added or removed between the two commands.

If combining .newsgroups into .active, it makes me wonder, why not go
further and also combine .active.times into .active?

Yes, indeed.

Were these initially separate simply because .active.times came later
and wanted to avoid breaking the format of .active,

Yup, exactly.

<group> <last> <high> <low> <count> <creation epoch> <creator name> <description>

Note that you now have a space-separated file except for the last field
and you have a problem if you want to add another field that you didn't
think of originally. I would really want to store this as some sort of structured file because you have some fields there (at least the
description, maybe the creator name) that can contain a variety of
characters.

The only thing I can think of (and this applies to .newsgroups but not .active.times) is that if the description is changed, its length can
change, so now the whole active file needs to be rewritten.

Yup, that too. That's the big argument for a database, which supports
updates without having to rewrite the whole file.

But this is probably an uncommon enough occurrence (maybe even less
common than group creation or deletion?) that the performance
implication could be ignored.

It's roughly as common as group creation or deletion in my experience.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

--- Synchronet 3.21f-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 14:18:42 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

The server really only has 946 articles[1]; yet, INN is reporting it has 29,344 (likely because this is larger than the value of groupexactcount,
so it just estimated it). I know the overview database has the count,
though I guess that value is not necessarily up to date, for reasons I
don't understand currently - presumably keeping it up to date would add non-constant overhead with INN's current architecture.

I don't know if it's the case here (I don't know if Eternal September even expires articles), but historically another really common reason for this pattern is that the very early article that's holding down the low water
mark was crossposted to some other group (traditionally *.answers) with a
much longer retention and the articles after it have expired.

Note that the article count is not really useful to the news reader client
under normal circumstances because the news reader often does not care
about how many *total* articles the group contains. If the user has been
reading the group (the common case), the news reader really cares about
how many *unread* articles the group has, and for that the article count
is basically useless. The article count as returned by NNTP is pretty much
only useful for groups that you have never read, or haven't read for so
long that your read mark is below the low water mark.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 20:29:28 2026

From Newsgroup: news.software.nntp

On 5/1/2026 5:14 PM, Russ Allbery wrote:

InterLinked <nntp@phreaknet.org> writes:

Isn't the database more of a "cache" in INN, of technically
reconstructible data? (in contrast to the active file, which has <last>
which is not reconstructible).

Well, I'm not sure I agree with the distinction you're making here, since
the active file *is* a database. INN has a whole bunch of databases, some
of which it stores as text files, but just because the format is a text
file doesn't mean it isn't a database. INN definitely uses the active file like
a database (hence the zero-padding).

Sorry, to be clear, I meant database in the sense of something like
SQLite or MySQL, not using a text file under direct control of the
program as a store.

For articles within a group, .overview seems fairly efficient.

Well, we wrote a whole new overview mechanism because we didn't think it
was sufficiently efficient. :) Using only a flat .overview file can be extremely slow for very large groups when clients request only a subset of the records (which is very common; they usually only care about the latest messages).

Makes sense - I'm assuming the new mechanism is a database of each
article, effectively, so you can just select the articles of interest?

In theory, a database may be able to do much faster prefix matching than a linear scan doing wildmat matching for, e.g., LIST ACTIVE news.*, but that would require converting wildmat expressions to something the database can understand with LIKE, which may not be possible in the general case.
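
To show why the general case is awkward, here is a small Python sketch that translates only the subset of wildmat that LIKE can express ("*" and "?") and gives up on everything else (character classes, backslash escapes, comma-separated lists, negation). The function name wildmat_to_like and the choice of backslash as the LIKE escape character are assumptions for illustration.

```python
from typing import Optional


def wildmat_to_like(pattern: str) -> Optional[str]:
    """Translate a simple wildmat into a SQL LIKE pattern.

    Returns None when the wildmat uses features LIKE cannot express, in
    which case the caller would fall back to a linear scan.
    """
    out = []
    for ch in pattern:
        if ch in "[]\\,!":
            # Character classes, escapes, pattern lists and negation have
            # no direct LIKE equivalent.
            return None
        if ch == "*":
            out.append("%")   # any run of characters
        elif ch == "?":
            out.append("_")   # exactly one character
        elif ch in "%_":
            out.append("\\" + ch)  # literal % or _ must be escaped in LIKE
        else:
            out.append(ch)
    return "".join(out)
```

A query would then look something like "WHERE name LIKE ? ESCAPE '\'" with the translated pattern bound as the parameter. Note that some engines (SQLite, for one) treat LIKE as case-insensitive for ASCII by default, which may or may not be what you want.
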

Another good point, thank you.

And at that point, maybe a simple hash table with pointers to the
beginning of the corresponding line in the active file would close the
performance gap, without needing to add a database to the picture.
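
A minimal Python sketch of that idea: scan the active file once, remember the byte offset where each group's line starts, and seek straight to it on lookup. The function names are hypothetical, and the sketch ignores the hard parts (keeping the index valid when lines are added or removed, locking, persistence).

```python
from typing import Dict, Optional


def build_offset_index(path: str) -> Dict[str, int]:
    """Map each group name to the byte offset of its line in the file."""
    index: Dict[str, int] = {}
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            if line.strip():
                # The group name is everything before the first space.
                name = line.split(b" ", 1)[0].decode("utf-8")
                index[name] = offset
            offset += len(line)
    return index


def lookup(path: str, index: Dict[str, int], group: str) -> Optional[str]:
    """Return the active line for one group without scanning the file."""
    offset = index.get(group)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8").rstrip("\n")
```
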

See, you're going down the same path that all the INN authors, myself included, have gone down: You can see a simple data structure that would solve the problem that you have and it seems more straightforward to just implement that than use a "full database" which feels like it would have a ton of overhead.

And you can do that! That's how INN works! I'm just saying that as someone with a lot of years of experience maintaining that code with a simple hash table and whatnot, a whole lot of time and bugs would have been saved by
just using an off-the-shelf database. At, of course, the cost of having to handle database transitions and implementation changes and BerkeleyDB
getting bought by Oracle and then killed and so forth.

Thanks, this is helpful perspective. I think I still need to sleep on
this a bit but hearing about your experience here is really valuable.

Honestly, I was really set on just using flat files before but there are
some compelling reasons you've brought up. Maybe I'll abstract things in
a way such that I can start with flat files and add a DB (SQLite or
other) backend option later that could be used instead. I was trying to
avoid that complexity but it might be worth it.

If going the database route, I'm assuming you would just recommend
SQLite for everything? I would guess a regular RDBMS like MariaDB would
be overkill (and possibly cause issues if the server weren't local).

Really, was more wondering about the active file than anything else. While not officially standardized anywhere, it seems in practice there are a few standardized files with standardized formats:

.active (LIST ACTIVE)
.active.times (LIST ACTIVE.TIMES)
.newsgroups (LIST NEWSGROUPS)
<group>/.overview (8 standardized fields)
<group>/<article number> for article naming

The last is often not used these days because it has a lot of poor performance properties.

You mean one file per article in the spool?
From the documentation, I thought the "tradspool" method in INN was the
most common deployment.

There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably fine as configuration files with some in-memory representation, though,
since they're usually very small.

LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT

Yes, I skipped these since they're global and not "one entry per group" -
are there any others of those that I missed?

Is the "LIST MODERATORS" file all that is involved in moderation? I
didn't think there was any moderator info explicitly associated with
each group.

If combining .newsgroups into .active, it makes me wonder, why not go
further and also combine .active.times into .active?

Yes, indeed.

Were these initially separate simply because .active.times came later
and wanted to avoid breaking the format of .active,

Yup, exactly.

<group> <last> <high> <low> <count> <creation epoch> <creator name>
<description>

Note that you now have a space-separated file except for the last field
and you have a problem if you want to add another field that you didn't
think of originally. I would really want to store this as some sort of structured file because you have some fields there (at least the
description, maybe the creator name) that can contain a variety of characters.

Some files already use tab, which I don't *think* is allowed in any of
the metadata to date? If it is, maybe a non-ASCII character like field separator would work.

Adding a field is something to think about. It would be a problem for databases too, though there are various migration tools for extending
schemas, at least, and I'll grant that's one area where databases win
over plain text files. But regardless of the underlying format, I'd
prefer to invest enough time in the design up front to hopefully not
need any changes later. Since NNTP has been stable for quite a long time
now, I think that's realistic, unless there are new extensions in the
future which add more metadata - and I do have a few extensions in mind
for later but none would modify the group metadata.

(Thinking now of the new possible file format, theoretically a new
command to return all this info at once might also be useful in the
future now that doing so would be efficient, e.g. LIST EVERYTHING or
whatever, instead of doing LIST ACTIVE, LIST ACTIVE.TIMES, LIST
NEWSGROUPS, etc. individually.)

From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 20:34:48 2026

From Newsgroup: news.software.nntp

On 5/1/2026 5:18 PM, Russ Allbery wrote:

InterLinked <nntp@phreaknet.org> writes:

The server really only has 946 articles[1]; yet, INN is reporting it has
29,344 (likely because this is larger than the value of groupexactcount,
so it just estimated it). I know the overview database has the count,
though I guess that value is not necessarily up to date, for reasons I
don't understand currently - presumably keeping it up to date would add
non-constant overhead with INN's current architecture.

I don't know if it's the case here (I don't know if Eternal September even expires articles)

They do: "Retention is currently 3 years for de.*, 160 days for the Big
8, 130 days for alt.* and 90 days for other hierarchies."

but historically another really common reason for this
pattern is that the very early article that's holding down the low water
mark was crossposted to some other group (traditionally *.answers) with a much longer retention and the articles after it have expired.

I don't think that would be the case here since I think they expire
everything eventually, but that's another interesting case to handle. I
know it's more efficient to symlink the same message in multiple
newsgroups, but now I wonder if it would be better to just duplicate
them so they can be handled individually...

Note that the article count is not really useful to the news reader client under normal circumstances because the news reader often does not care about how many *total* articles the group contains. If the user has been reading the group (the common case), the news reader really cares about
how many *unread* articles the group has, and for that the article count
is basically useless. The article count as returned by NNTP is pretty much only useful for groups that you have never read, or haven't read for so
long that your read mark is below the low water mark.

Yes, this also makes sense, so now I wonder why my client gets confused
when this happens... I have a feeling it may not be doing the most
intelligent thing but would be interesting to see if it has the same
issue when the count is accurate.

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 19:44:42 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

On 5/1/2026 5:14 PM, Russ Allbery wrote:

Well, I'm not sure I agree with the distinction you're making here,
since the active file *is* a database. INN has a whole bunch of
databases, some of which it stores as text files, but just because the
format is a text file doesn't make it not a database. INN definitely uses
the active file like a database (hence the zero-padding).

Sorry, to be clear, I meant database in the sense of something like
SQLite or MySQL, not using a text file under direct control of the
program as a store.

So, this is tricky. My installation of INN doesn't use any databases in
the sense of SQLite or MySQL. The spool is in CNFS, the overview is in tradindexed, and the history file is in hisv6, all of which are under the direct control of the program. But those are all binary structured file
formats with capabilities that make some specific types of queries fast.

A database is essentially just an abstraction over really good
implementations of a bunch of complex data structures. This is sort of
what I'm getting at in the overall discussion. Writing your own bespoke
data structures is a good idea if your needs are extremely simple or
extremely complicated (and specific to you), but there's a whole middle
space where the thousands of hours someone else has put into a generic, highly-tuned implementation of those algorithms is probably better.

People will definitely disagree over where those points are. And obviously writing your own bespoke stuff is fun, and to a large extent netnews is
just a hobby at this point, so I'm all in favor of people having fun.

Makes sense - I'm assuming the new mechanism is a database of each
article, effectively, so you can just select the articles of interest?

For ovsqlite, yes. For tradindexed, which is the backend I wrote many
years ago, there's still a .overview file as before (although it has a different name), but alongside it there is a binary index that records
some additional metadata (arrival time, for instance) and the offset and
length in the data file for each article. That allows something similar. There's also another file that stores information about each group in a
hash table that's written to disk.
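
A rough Python sketch of the general idea (a data file plus a fixed-size index record per article holding the offset and length), so that fetching one article's overview is two seeks instead of a scan. The record layout and function names here are invented for illustration and are not the actual tradindexed on-disk format; the sketch also assumes one index record per consecutive article number.

```python
import struct

# Hypothetical index record: arrival time, offset into the data file,
# length of the overview line. Fixed size, so record N is at N * size.
RECORD = struct.Struct("<QQI")


def append_article(data, index, arrived: int, overview_line: bytes) -> None:
    """Append an overview line and its index record to the two files."""
    offset = data.seek(0, 2)  # seek to end; returns the new position
    data.write(overview_line)
    index.seek(0, 2)
    index.write(RECORD.pack(arrived, offset, len(overview_line)))


def fetch(data, index, base: int, number: int):
    """Return the overview line for article `number`, or None.

    `base` is the article number described by the first index record.
    """
    if number < base:
        return None
    index.seek((number - base) * RECORD.size)
    raw = index.read(RECORD.size)
    if len(raw) < RECORD.size:
        return None  # past the end of the index
    _arrived, offset, length = RECORD.unpack(raw)
    data.seek(offset)
    return data.read(length)
```

A client asking for only the latest few records then costs a handful of seeks no matter how large the group is.
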

(You can see all the details in storage/tradindexed in the INN source
tree. It should be pretty well-commented.)

It's all very "I was in my 20s and was having a great deal of fun writing
data structures for a real-world problem." :)

Thanks, this is helpful perspective. I think I still need to sleep on
this a bit but hearing about your experience here is really valuable.

Honestly, I was really set on just using flat files before but there are
some compelling reasons you've brought up. Maybe I'll abstract things in
a way such that I can start with flat files and add a DB (SQLite or
other) backend option later that could be used instead. I was trying to
avoid that complexity but it might be worth it.

I do want to say that by all means, do whatever makes you the most happy
and feel free to ignore advice for what's the most maintainable or the
least effort or whatever! Really at this point in netnews's history I
think the most important thing is that people are having fun.

If going the database route, I'm assuming you would just recommend
SQLite for everything? I would guess a regular RDBMS like MariaDB would
be overkill (and possibly cause issues if the server weren't local).

SQLite is by far the easiest to use because it's just a library that
stores its stuff in files on disk, which has lots of really nice
properties and makes it very easy to set up and maintain (not entirely
trivial, but easy). But it is going to be slow. I suspect that an actual database server that is properly tuned will be faster than SQLite. You may
not care. No one has cared enough for INN to write such a backend.

<group>/<article number> for article naming

The last is often not used these days because it has a lot of poor
performance properties.

You mean one file per article in the spool?
From the documentation, I thought the "tradspool" method in INN was the
most common deployment.

Right. It probably is still the most common deployment, and it has a lot
of nice advantages, particularly for small servers. It's very human comprehensible without special tools, which is nice.

However, tradspool is extremely hard on file systems and disks, so for a
really large server it tends to be slow. It also forces a rather expensive expire process (deleting lots of articles is a lot of file system
operations!) if you want to expire articles.

For a news server that requires minimum maintenance and can mostly just
be ignored, I would recommend CNFS. That's what I use personally. You lose
some control and visibility and it's a bad choice if you never want
articles to expire, but it has the huge advantage that you'll never run
out of disk space (the worst thing that happens is that things expire a
bit faster), there's no expensive expire process, and it's really fast and light on resources.

Yes, I skipped these since they're global and not "one entry per group" -
are there any others of those that I missed?

No, I think you got them all.

Is the "LIST MODERATORS" file all that is involved in moderation? I
didn't think there was any moderator info explicitly associated with
each group.

It's rather irrelevant these days, although it does let the client mail a submission to a moderated group directly, which in theory would actually
be better in these days of spam filtering, DMARC, and similar problems
with the email relay system. Not that any clients do this. :)

Some files already use tab, which I don't *think* is allowed in any of
the metadata to date? If it is, maybe a non-ASCII character like field separator would work.

Tab is probably the best choice. Newsgroup descriptions should really be treated as full UTF-8 these days except for reserved characters, not that
all software has been updated to do that.
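
For what it's worth, a tab-separated version of the combined record proposed above could be sketched in Python like this. The field list follows the proposal in the quoted text; the class and function names are hypothetical, and rejecting tabs and line breaks inside fields is an assumption about what "reserved characters" would mean here.

```python
from dataclasses import astuple, dataclass


@dataclass
class GroupRecord:
    group: str
    last: int
    high: int
    low: int
    count: int
    created: int      # creation time, seconds since the epoch
    creator: str
    description: str  # treated as arbitrary UTF-8 minus reserved characters


def serialize(rec: GroupRecord) -> str:
    """Render one record as a tab-separated line."""
    fields = [str(value) for value in astuple(rec)]
    for value in fields:
        if any(c in value for c in "\t\r\n"):
            raise ValueError("field contains a reserved character")
    return "\t".join(fields) + "\n"


def parse(line: str) -> GroupRecord:
    """Parse one tab-separated line back into a record."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 8:
        raise ValueError("expected 8 tab-separated fields")
    group, last, high, low, count, created, creator, description = parts
    return GroupRecord(group, int(last), int(high), int(low), int(count),
                       int(created), creator, description)
```

Since spaces are no longer a delimiter, the description can sit anywhere in the line, although a strict field count still makes adding a field later a format change.
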

Adding a field is something to think about. It would be a problem for databases too, though there are various migration tools for extending schemas, at least, and I'll grant that's one area where databases win
over plain text files. But regardless of the underlying format, I'd
prefer to invest enough time in the design up front to hopefully not
need any changes later. Since NNTP has been stable for quite a long time
now, I think that's realistic, unless there are new extensions in the
future which add more metadata - and I do have a few extensions in mind
for later but none would modify the group metadata.

Yeah, this is one of those classic design trade-offs. I pretty much always build extensibility into everything I do these days because I've been
burned too many times, but you're not wrong about the unlikelihood of
major NNTP changes.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 19:48:18 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

Yes, this also makes sense, so now I wonder why my client gets confused
when this happens... I have a feeling it may not be doing the most intelligent thing but would be interesting to see if it has the same
issue when the count is accurate.

Yeah, I *suspect* it's cancelled (or otherwise removed, such as via NoCeM) articles that are confusing it and it's correcting its unread count when
it retrieves overview information, but I don't really know.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

From ram@ram@zedat.fu-berlin.de (Stefan Ram) to news.software.nntp on Sat May 2 09:18:33 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> wrote or quoted:

Sorry, to be clear, I meant database in the sense of something like
SQLite or MySQL, not using a text file under direct control of the
program as a store.

FWIW, I am aware of this definition by Ramez Elmasri:

|A database is a collection of related data. By data, we mean
|known facts that can be recorded and that have implicit meaning.
Ramez Elmasri (2011).

But what many people mean by "database" is a
/data base management system/ (DBMS).


From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 2 10:18:14 2026

From Newsgroup: news.software.nntp

On 5/1/2026 10:44 PM, Russ Allbery wrote:

SQLite is by far the easiest to use because it's just a library that
stores its stuff in files on disk, which has lots of really nice
properties and makes it very easy to set up and maintain (not entirely trivial, but easy). But it is going to be slow. I suspect that an actual database server that is properly tuned will be faster than SQLite. You may not care. No one has cared enough for INN to write such a backend.

Good to know. I think I'll start with a traditional file implementation
but leave the door open for allowing a DB implementation in the future.
That would be "fun" to experiment with.

For a news server that requires minimum maintenance and can mostly just
be ignored, I would recommend CNFS. That's what I use personally. You lose some control and visibility and it's a bad choice if you never want
articles to expire, but it has the huge advantage that you'll never run
out of disk space (the worst thing that happens is that things expire a
bit faster), there's no expensive expire process, and it's really fast and light on resources.

Interesting... CNFS has always seemed a bit "weird" to me - I see how it excels at certain properties, but not sure if I'm interested in
supporting it myself. My plan is really to run two news servers myself,
one in the "cloud", with expiration varying by group, and open to authenticated users, and one at home, for groups of interest, where
articles never expire (which would function as an archive, but also be
used by my local newsreader).

CNFS seems to work well if you have a set size you want to dedicate per
group, but not as efficient for small/empty groups, or if you want to
expire by article count or age - maybe I'm missing something here though.

I assume because the articles for a group are just in one big file,
articles also have to be duplicated to multiple of these blobs when cross-posted?

In any case, I'll just do "tradspool" for now but leave the door open to adding others later.

It's rather irrelevant these days, although it does let the client mail a submission to a moderated group directly, which in theory would actually
be better in these days of spam filtering, DMARC, and similar problems
with the email relay system. Not that any clients do this. :)

In what sense is it irrelevant? I hear people say moderated groups are
dead, but I still subscribe to one moderated group, comp.dcom.telecom,
though I thought the news server forwarded it to the moderator, not the
client directly. Is the server not using the LIST MODERATORS data
internally to send to moderator?

Admittedly I need to learn more about how moderation works - I don't
think I've seen it discussed much in any RFCs since it's implementation
rather than protocol related. But I would imagine when a new group
control message gets shared, it would have to contain moderation info,
and dynamically update the moderator info at that point.

From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 2 10:53:32 2026

From Newsgroup: news.software.nntp

On 5/2/2026 10:18 AM, InterLinked wrote:

In any case, I'll just do "tradspool" for now but leave the door open to adding others later.

Looking at the documentation for the different storage methods, and for traditional spool, I noticed:

where "news/group/name" is the name of the newsgroup to which the article was posted with each period changed to a slash, and "nnnnn" is the sequence number of the article in that newsgroup

So for "misc.test" there would be a subfolder "test" within a subfolder "misc", not just one subfolder "misc.test".
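
To make the two layouts concrete, a small Python sketch of the mapping (the spool root is a made-up path and the function names are hypothetical):

```python
from pathlib import PurePosixPath


def tradspool_path(spool_root: str, group: str, number: int) -> PurePosixPath:
    """Hierarchical layout: each period in the group name becomes a slash."""
    parts = group.split(".")
    if not all(parts):
        raise ValueError("group name has an empty component")
    return PurePosixPath(spool_root, *parts, str(number))


def flat_path(spool_root: str, group: str, number: int) -> PurePosixPath:
    """Flat layout: one directory per group, all siblings under the root."""
    return PurePosixPath(spool_root, group, str(number))
```

For article 42 in misc.test under a root of /var/spool/news, the first gives /var/spool/news/misc/test/42 and the second gives /var/spool/news/misc.test/42.
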

I find this a bit curious, as in IMAP, subfolders work the other way - a folder that is logically a subfolder, "Parent > Sub" is typically named parent.sub in the root maildir, and all the folders are still siblings
to each other on disk (except INBOX). Coming from more of an IMAP
background, I would have intuited to just use the group name literally
for the folder, but I'm guessing there's a good reason not to do this?

I'd guess performance has something to do with why the hierarchy is
actually a hierarchy on disk (a newsdir will probably be much, much
larger than a maildir), rather than all groups being siblings in the
root newsdir.

Is that about it, or are there other considerations that give this
method an advantage? For example, I can't think how this would make any particular operation more efficient - since I don't think "delete
hierarchy" is a thing. Likewise, there's usually no need to actually
scan over the contents of the root newsdir - that's why the active file exists. If anything, it might make group creation slightly less
efficient, since you have to create the ancestors if they don't exist
already, might end up with empty subfolders later if groups are deleted,
etc.

Was this just a historical convention, or are there any other compelling reasons to keep one method vs the other in a new system?

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 2 09:04:38 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

On 5/1/2026 10:44 PM, Russ Allbery wrote:

For a news server that requires minimum maintenance and can mostly
just be ignored, I would recommend CNFS. That's what I use personally.
You lose some control and visibility and it's a bad choice if you never
want articles to expire, but it has the huge advantage that you'll
never run out of disk space (the worst thing that happens is that
things expire a bit faster), there's no expensive expire process, and
it's really fast and light on resources.

Interesting... CNFS has always seemed a bit "weird" to me - I see how it excels at certain properties, but not sure if I'm interested in
supporting it myself. My plan is really to run two news servers myself,
one in the "cloud", with expiration varying by group, and open to authenticated users, and one at home, for groups of interest, where
articles never expire (which would function as an archive, but also be
used by my local newsreader).

Yeah, and if you never want articles to expire, CNFS is a bad choice. It's ideal for transit-only servers, which used to be a thing and probably
aren't as much any more because there's no point these days in having so
large a server farm that you need to separate transit and reading servers unless you're one of the few sites still trying to have a go at a
commercial Usenet service. I like it for small reading servers where you
don't care about keeping things around forever and don't have any
particular preferences on expiration other than "don't run out of disk
space."

CNFS seems to work well if you have a set size you want to dedicate per group, but not as efficient for small/empty groups, or if you want to
expire by article count or age - maybe I'm missing something here
though.

I assume because the articles for a group are just in one big file,
articles also have to be duplicated to multiple of these blobs when cross-posted?

CNFS doesn't use one file per group. Well, it *can*, you can configure it
all sorts of different ways, but the configuration that I use is one
logical file for the whole server. (It's actually divided into several
files, but for no good reason.) All the articles go into the same file,
and when it rolls over the earliest articles start getting overwritten by
order of arrival.
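
The behaviour can be modelled with a toy Python class: a fixed byte budget for the whole store, with the earliest arrivals discarded to make room for new ones. This is only a model of the expiration behaviour described above, not CNFS's on-disk format (which writes sequentially into a preallocated file and wraps around); the class and method names are invented.

```python
from collections import OrderedDict
from typing import Optional


class CyclicStore:
    """Toy fixed-capacity article store that overwrites by arrival order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        # Message-ID -> article bytes, kept in order of arrival.
        self.articles: "OrderedDict[str, bytes]" = OrderedDict()

    def store(self, message_id: str, article: bytes) -> None:
        if message_id in self.articles:
            raise KeyError("duplicate Message-ID")
        if len(article) > self.capacity:
            raise ValueError("article larger than the whole store")
        # Drop the earliest arrivals until the new article fits.
        while self.used + len(article) > self.capacity:
            _, old = self.articles.popitem(last=False)
            self.used -= len(old)
        self.articles[message_id] = article
        self.used += len(article)

    def fetch(self, message_id: str) -> Optional[bytes]:
        return self.articles.get(message_id)
```

Nothing here knows about groups or ages, which is why the store can never fill the disk and also why per-group retention needs separate buffers or a different backend.
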

So yes, it's complicated for fine-grained expiration control: You have to
move the articles you want to have a different expiration for into their
own CNFS files or into another storage backend like tradspool. When I was running a larger news server for more people, I had most groups in CNFS
and local groups that we kept forever in tradspool.

It's rather irrelevant these days, although it does let the client mail
a submission to a moderated group directly, which in theory would
actually be better in these days of spam filtering, DMARC, and similar
problems with the email relay system. Not that any clients do this. :)

In what sense is it irrelevant?

In the sense that all the news servers send the message to the moderator directly and clients never use that file, and also in the sense that we're doing a much better job these days of getting all the moderator addresses
into moderators.isc.org, so there's less need to have other rules than the default.

My memory on this is very vague, but I could have sworn that there were
some news readers that used this file to send mail to the moderator
directly some thirty years or more ago. It certainly used to be the case
that there were different moderator forwarding rules for different
hierarchies and moderators.uu.net (as it was back then) was only usable
for Big Eight groups.

It's still useful on the server side if you have local moderated groups.
But providing it to the client is basically pointless now.

I hear people say moderated groups are dead, but I still subscribe to
one moderated group, comp.dcom.telecom, though I thought the news server forwarded it to the moderator, not the client directly. Is the server
not using the LIST MODERATORS data internally to send to moderator?

No, it is, it's just that because it's handling that, there's no real
reason for the client to care.

I still moderate several groups and at least one of them has active
traffic, so moderated groups as a concept are not dead.

Admittedly I need to learn more about how moderation works - I don't
think I've seen it discussed much in any RFCs since it's implementation rather than protocol related. But I would imagine when a new group
control message gets shared, it would have to contain moderation info,
and dynamically update the moderator info at that point.

No, all the forwarding these days is handled by moderators.isc.org, and
there's no place in a control message to document that.

Moderation is a horrible kludge. You will probably be appalled. :) It's a design from another era, and we never completed the work we were hoping to
do to try to make it less of a kludge, so it's very much something out of
an earlier era of the Internet when spam didn't exist.

RFC 5537 is the place to go for all the details on that side of how
netnews works. There is indeed a protocol; it's just not part of NNTP. I
try to collect all the relevant RFCs on a couple of web pages; you may
find them useful:

https://www.eyrie.org/~eagle/usefor/
https://www.eyrie.org/~eagle/nntp/
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 2 09:08:23 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

On 5/2/2026 10:18 AM, InterLinked wrote:

In any case, I'll just do "tradspool" for now but leave the door open
to adding others later.

Looking at the documentation for the different storage methods, and for traditional spool, I noticed:

where "news/group/name" is the name of the newsgroup to which the
article was posted with each period changed to a slash, and "nnnnn" is
the sequence number of the article in that newsgroup

So for "misc.test" there would be a subfolder "test" within a subfolder "misc", not just one subfolder "misc.test".

I find this a bit curious, as in IMAP, subfolders work the other way - a folder that is logically a subfolder, "Parent > Sub" is typically named parent.sub in the root maildir, and all the folders are still siblings to each other on disk (except INBOX). Coming from more of an IMAP background,
I would have intuited to just use the group name literally for the folder, but I'm guessing there's a good reason not to do this?

You know, I have no idea why news servers do it this way. It does mean
that you don't have a directory for every newsgroup at the top level of
the spool, which was probably part of the reason since file systems traditionally had a lot of problems with directories containing lots of
files, although I'm dubious the number of newsgroups would be less than
the number of articles in the most active group.

But that's just been the way tradspool has been organized from before I
got on Usenet in 1993. So much so that it's even influenced Usenet group
naming (rather controversially) with *.misc renaming back in the day.

I suppose it also lets you move hierarchies to separate drives easily back
when drives were small enough that you'd have to worry about the disk
usage of a bunch of netnews articles. That's possibly still relevant for alt.binaries.* for anyone who carries it (although surely these days
people mostly use CNFS or something similar for the binary groups).
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Sun May 3 23:48:53 2026

From Newsgroup: news.software.nntp

Hi InterLinked,

For LIST COUNTS and GROUP, it pulls from group stats. However, in the response for LIST ACTIVE, it simply dumps the line from the active file
as is.

Indeed, and to be more precise, if you give a newsgroup name as an
argument to LIST ACTIVE, this command will pull the information from the overview (like LIST COUNTS and GROUP).

You then may end up with things like that for an empty newsgroup:

GROUP trigofacile.test3
211 0 8 7 trigofacile.test3

LIST ACTIVE trigofacile.test3
215 Newsgroups in form "group high low status"
trigofacile.test3 0000000007 0000000008 y
.

LIST ACTIVE trigofacile.test3*
215 Newsgroups in form "group high low status"
trigofacile.test3 0000000008 0000000008 y
.

The "*" at the end of the last LIST ACTIVE command forces it to parse
the active file to look for matching newsgroup names.
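
A short Python sketch for comparing the two sources, parsing the response lines from the transcript above. The function names are hypothetical; the field order follows the "211 count low high group" form of GROUP and the "group high low status" form of LIST ACTIVE.

```python
from typing import NamedTuple, Tuple


class GroupStatus(NamedTuple):
    count: int
    low: int
    high: int
    name: str


def parse_group_response(line: str) -> GroupStatus:
    """Parse a '211 count low high group' response line."""
    code, count, low, high, name = line.split()
    if code != "211":
        raise ValueError("not a 211 response")
    return GroupStatus(int(count), int(low), int(high), name)


def parse_active_line(line: str) -> Tuple[str, int, int, str]:
    """Parse a 'group high low status' line; note high comes before low."""
    name, high, low, status = line.split()
    return name, int(high), int(low), status


def marks_say_empty(low: int, high: int) -> bool:
    """True when the water marks alone indicate an empty group."""
    return high < low
```

On the transcript above, the GROUP response and the exact-name LIST ACTIVE line both give high 7 and low 8 (empty by the marks), while the wildmat form gives 8 and 8, which on its own looks like a group holding one article.
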

I can't find any examples of newsgroups where the high water mark
article is deleted, so it's hard to poke at this behavior

You could just send an article to misc.test and cancel it to see the
behaviour on a news server honouring such cancels.
You'll see the reported high water mark does not decrease with an INN news server.
--
Julien ÉLIE

« Happiness is wanting what you have. »


From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Sun May 3 23:49:16 2026

From Newsgroup: news.software.nntp

Hi InterLinked,

To make an extreme example, if a group with a lot of articles had all of them except the low water mark article deleted (and "last" is 3000), you could have a response like:

211 3000 1 3000 misc.test

[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]

Where do you read in RFC 3977 that the estimate "has to be at least 3000"?

The wording is:

If the group is not empty, the estimate MUST be at least the actual
number of articles available and MUST be no greater than one more
than the difference between the reported low and high water marks.
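
Expressed as a check in Python (the function name is hypothetical), the quoted wording bounds the estimate on both sides for a non-empty group:

```python
def estimate_is_valid(estimate: int, actual: int, low: int, high: int) -> bool:
    """Check the quoted RFC 3977 bounds on the GROUP estimate.

    Applies to non-empty groups only: the estimate must be at least the
    actual number of articles and at most high - low + 1.
    """
    return actual <= estimate <= high - low + 1
```

For a group with low 1, high 3000 and a single article remaining, any estimate from 1 to 3000 passes, so a response of 211 1 1 3000 misc.test is within the rules.
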

That brings me to another observation: I've noticed that most inactive newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active
file, and taking low = <high> and high = low - 1, return something more
like 211 0 3000 2999 misc.test
Is there a reason INN just uses 1/0 instead?

I confirm what Russ said: high = low - 1 is what INN replies for empty newsgroups which formerly received at least one article.

We even had a bug until recently: in versions prior to 2.7.1, INN
returned low = high + 1, which was unfortunately wrong when high was
2^31-1... A pretty rare case though :)
It now returns high = low - 1 except of course when low is 0 (for
newsgroups which have never received any article).
You already spotted that as you referenced the related issue in the
GitHub tracker :)
--
Julien ÉLIE

« Quand vous avez des ennuis, les gens qui vous appellent par sympathie
le font surtout pour avoir des détails. » (Edgar Watson Howe)

--- Synchronet 3.21f-Linux NewsLink 1.2

From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Sun May 3 23:49:23 2026

From Newsgroup: news.software.nntp

Hi Russ,

| o New articles may be added with article numbers greater than the
| reported high water mark. (If an article that was the one with
| the highest number has been removed and the high water mark has
| been adjusted accordingly, the next new article will not have the
| number one greater than the reported high water mark.)

To me, this implies the high water mark can (even "should") decrease when
the high water mark article is removed - in which case, the next article
assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT
work in IMAP).

So, I have to admit that I don't recall this explicitly coming up during
the RFC discussions, so I don't have a definitive answer for you about why
we worded it this way. I think if we'd noticed this at the time, we would have been a bit clearer about what clients should expect, so I think there
is a (minor) bug in the standard here.

"this implies the high water mark can (even "should") decrease" is
already said in RFC 3977 a few lines after the above bullet point:

the reported low water mark in the response MUST be no less than that
in any previous response for that newsgroup in this session, and it
SHOULD be no less than that in any previous response for that
newsgroup ever sent to any client.
[...]
No similar assumption can be made about the high water mark, as this
can decrease if an article is removed and then increase again if it
is reinstated or if new articles arrive.

The RFC states it "can" decrease.

There's also the use case of a slave server, which does not itself
compute the article number to use but relies on the one provided in the
Xref header field of the received article. Thus it can receive article
number 12 followed by article number 15, which is not 12 plus 1. That
could explain the wording that "new articles may be added with article
numbers greater than the reported high water mark", but indeed that use
case is not described in the parenthetical.
--
Julien ÉLIE

« Si l'amour est aveugle, il faut palper. »

--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Sun May 3 18:17:11 2026

From Newsgroup: news.software.nntp

On 5/3/2026 5:49 PM, Julien ÉLIE wrote:

Hi InterLinked,

To make an extreme example, if a group with a lot of articles had all
of them except the low water mark article deleted (and "last" is
3000), you could have a response like:

211 3000 1 3000 misc.test

[and the count has to be at least 3000, per the RFC, so we can't even
have 211 1 1 3000 misc.test to indicate there are definite gaps]

Where do you read in RFC 3977 that the estimate "has to be at least 3000"?

The wording is:

If the group is not empty, the estimate MUST be at least the actual
number of articles available and MUST be no greater than one more
than the difference between the reported low and high water marks.

That was what I was looking at, but I don't think my brain was working
when I read that, disregard :)

That brings me to another observation: I've noticed that most inactive
newsgroups in INN return high 0 and low 1 (at least for those I've
analyzed in responses for Usenet groups), which seemed odd to me as I
would have thought INN would naturally take the <high> in the active
file, and taking low = <high> and high = low - 1, return something
more like 211 0 3000 2999 misc.test
Is there a reason INN just uses 1/0 instead?

I confirm what Russ said: high = low - 1 is what INN replies for empty newsgroups which formerly received at least one article.

Isn't this also true for empty newsgroups which have never received an
article either? Per my earlier comment about seeing a bunch of low=1 and high=0 in the LIST ACTIVE response from Eternal September, e.g.:

LIST ACTIVE comp.dcom*
215 Newsgroups in form "group high low status"
comp.dcom.cabling 0000000000 0000000001 y
comp.dcom.cell-relay 0000000000 0000000001 y
comp.dcom.fax 0000000000 0000000001 y
comp.dcom.isdn.capi 0000000000 0000000001 y
comp.dcom.lans.ethernet 0000000000 0000000001 y
comp.dcom.lans.misc 0000000000 0000000001 y

We even had a bug until recently as for versions prior to 2.7.1, INN returned low = high + 1 which was unfortunately wrong when high was 2^31-1... A pretty rare case though :)
It now returns high = low - 1 except of course when low is 0 (for
newsgroups which have never received any article).
You already spotted that as you referenced the related issue in the
Github tracker :)

Yes, in fact, I found both the issue and the fix to be very helpful (as
well as the submitted erratum, the rejection to which was not very clear
to me initially) as I was thinking about this, prior to my initial post.
Had I not seen that, I would have gone ahead and made the same "mistake"
of doing LOW = HIGH + 1 (in fact, I had already started to do that).

The "wrong" way seemed more natural to me, because you can then set LOW
= HIGH + 1 and not have to worry about adjusting LOW when the next
article is assigned. Setting LOW = HIGH (well, LOW = LAST, to be
specific) and then HIGH = LOW - 1 isn't really intuitive.

To confirm my own understanding, the only reason we do LOW = LAST (which
is the same as LOW = HIGH in INN) and then HIGH = LOW - 1, rather than
LOW = HIGH + 1, is to account for overflow when LAST/HIGH is the max
article number?

Circling back to a previous point that it's ideal to set the low water
mark as high as "legally" valid at any given point, the LOW = HIGH + 1
method also has the advantage of being one higher than the other way,
which you pointed out in the erratum. I kind of wonder if it would be
valid to do it this way, except in the case that HIGH is the max article number (which seems unlikely to happen often, and when it does, the
group is saturated anyways). Not that I'm planning to do that, but maybe
that will help me understand something else I missed.
--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Sun May 3 18:22:56 2026

From Newsgroup: news.software.nntp

On 5/3/2026 5:48 PM, Julien ÉLIE wrote:

Hi InterLinked,

For LIST COUNTS and GROUP, it pulls from group stats. However, in the
response for LIST ACTIVE, it simply dumps the line from the active
file as is.

Indeed, and to be more precise, if you give a newsgroup name as an
argument to LIST ACTIVE, this command will pull the information from the overview (like LIST COUNTS and GROUP).

You may then end up with things like this for an empty newsgroup:

GROUP trigofacile.test3
211 0 8 7 trigofacile.test3

LIST ACTIVE trigofacile.test3
215 Newsgroups in form "group high low status"
trigofacile.test3 0000000007 0000000008 y
.

LIST ACTIVE trigofacile.test3*
215 Newsgroups in form "group high low status"
trigofacile.test3 0000000008 0000000008 y
.

The "*" at the end of the last LIST ACTIVE command forces it to parse
the active file to look for matching newsgroup names.

Thanks, I do remember noticing that when reading the code (optimization
for single group).

And this is because, I take it, the overview database is less up to date
than the active file, as far as the water marks go? (Or perhaps vice
versa, I don't think I figured out which is more up to date).

Also, side question, why is it called the "overview database"? It seems
like OVDB is mainly used to satisfy responses for GROUP and LIST ACTIVE
with a single group as an argument. Yet, "overview" also traditionally
refers to the per-group overview file with a line for each message,
which stores the 8 (or more) headers used in the XOVER/OVER responses. I
don't think there is a connection between the two, is there?

Sometimes I also see it referred to as "group stats" like you said,
which seems like a clearer term for what it is, but they seem to be interchangeable.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sun May 3 16:09:17 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

Also, side question, why is it called the "overview database"? It seems
like OVDB is mainly used to satisfy responses for GROUP and LIST ACTIVE
with a single group as an argument. Yet, "overview" also traditionally
refers to the per-group overview file with a line for each message,
which stores the 8 (or more) headers used in the XOVER/OVER responses. I don't think there is a connection between the two, is there?

No, that's the primary purpose of the overview database: answering OVER queries. In order to answer those queries, it turns out to also have the
most accurate information for GROUP (and LIST ACTIVE for a single group),
so it's also used for those purposes. But it was originally written for overview information.

Sometimes I also see it referred to as "group stats" like you said,
which seems like a clearer term for what it is, but they seem to be interchangeable.

That's just one thing that's stored in the overview database.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Mon May 4 22:52:48 2026

From Newsgroup: news.software.nntp

Hi InterLinked,

high = low - 1 is what INN replies for empty
newsgroups which formerly received at least one article.

Isn't this also true for empty newsgroups which have never received an article either? low=1 and high=0

When the newsgroup has never received an article, I assume the concept
of "low water mark" does not exist as there hasn't been any first
article. But yes, if you consider low=1 in that case, the formula is
the same.
Maybe the ideal would be to advertise low=0 and high=0 in that case
(allowed by RFC 3977 to represent an empty newsgroup), which would differentiate a newsgroup which has never received any article from
another one which has received only 1 article and is now empty.
Well, it matters to nobody, but it would make sense :)

To confirm my own understanding, the only reason we do LOW = LAST (which
is the same as LOW = HIGH in INN) and then HIGH = LOW - 1, rather than
LOW = HIGH + 1, is to account for overflow when LAST/HIGH is the max
article number?

I don't know whether that was the reason for the formula but yes, at
least it works with the max article number!

the LOW = HIGH + 1
method also has the advantage of being one higher than the other way,
which you pointed out in the erratum. I kind of wonder if it would be
valid to do it this way, except in the case that HIGH is the max article number

Yes, it is valid. It respects the rule that "the high water mark will
be one less than the low water mark", and when HIGH is the max article
number, you could use LOW = 2^31-1 and HIGH = LOW - 1 (the preferred way
per RFC 3977, as a SHOULD) or LOW = HIGH = 2^31-1 (an alternative way).
--
Julien ÉLIE

« Le bonheur, c'est vouloir ce que l'on a. »

--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Mon May 4 18:30:07 2026

From Newsgroup: news.software.nntp

On 5/4/2026 4:52 PM, Julien ÉLIE wrote:

Hi InterLinked,

high = low - 1 is what INN replies for empty newsgroups which
formerly received at least one article.

Just curious here - what's the rationale behind this, exactly?

Earlier Russ mentioned that *ideally*, you would want to provide as much information as possible. For a group with articles formerly, it seems
that would be:

* Use the last article number for the low water mark
* Set high to low - 1

Since INN doesn't support reinstating articles, there is no downside to advertising that as the low water mark, as it would only increase from
there. If I had to guess, is this because in INN, when a group is empty,
the high water mark is not present in overview and it would have to
check the active file, so 1/0 is used for efficiency?

(Or unless the last article number is the max allowed article number,
even the old way of just doing high = last and low = high + 1 seems to
be legal as well. Actually, that reminds me what it was about the
erratum I didn't understand - a comment about server synchronization and
how the low water mark a client reads might decrease in this scenario.
Is anyone able to explain how that might happen?)

Also, this answers a previous question I had about seeing a bunch of
groups with 1/0. Now I know that they indeed had articles at some point, because the response is 1/0, I have absolutely no information as to how
many articles the groups have had before going inactive.

Isn't this also true for empty newsgroups which have never received an
article either? low=1 and high=0

When the newsgroup has never received an article, I assume the concept
of "low water mark" does not exist as there hasn't been any first
article. But yes, if you consider low=1 in that case, the formula is
the same.
Maybe the ideal would be to advertise low=0 and high=0 in that case
(allowed by RFC 3977 to represent an empty newsgroup), which would differentiate a newsgroup which has never received any article from
another one which has received only 1 article and is now empty.
Well, it matters to nobody, but it would make sense :)

Actually, it's a good idea. It provides a newsreader with "more"
information than simply doing low=1/high=0 in both cases. Not that
software would know the difference / treat the cases differently, but a
human looking at group info would.

I did find it curious that in several places in INN, there are checks
like this one:

if (!count) {
    if (!low) low++;
    high = low - 1;
}

I guess INN explicitly wants to make empty groups low=1/high=0 instead
of low=0/high=0. I think it could just as well be:

if (!count && low) high = low - 1;

It seems to me ideally, a response of 0/0 on an empty group with no
articles, and low=last/high=low-1 would provide maximal information. In
that situation, low=1/high=0 would only occur in a group that only ever
had one article, which has since expired.

Not expecting INN to change, of course, but I think I might do it this
way, as I would like to be as accurate as possible and provide as much "information" as possible in a response.

To confirm my own understanding, the only reason we do LOW = LAST
(which is the same as LOW = HIGH in INN) and then HIGH = LOW - 1,
rather than LOW = HIGH + 1, is to account for overflow when LAST/HIGH
is the max article number?

I don't know whether that was the reason for the formula but yes, at
least it works with the max article number!

the LOW = HIGH + 1 method also has the advantage of being one higher
than the other way, which you pointed out in the erratum. I kind of
wonder if it would be valid to do it this way, except in the case that
HIGH is the max article number

Yes, it is valid. It respects the rule that "the high water mark will
be one less than the low water mark",

To clarify, I was talking about doing LOW = LAST + 1 normally, and LOW =
LAST, just for 2^31-1 (HIGH = LOW - 1 in both cases).

The effect of this would be that the responses for an
empty group where LAST = 2^31-1 and LAST = 2^31-2 would not be distinguishable. But again, the group is toast at that point so I'm not
sure if it really matters. And it would provide the advantage of being
able to have a low water mark that is one higher in all other cases, and
thus provides more meaning.

and when HIGH is the max article
number, you could use LOW = 2^31-1 and HIGH = LOW - 1 (the preferred way
per RFC 3977, as a SHOULD) or LOW = HIGH = 2^31-1 (an alternative way).

I'm a bit confused on this last point. It's valid to merely set low=high=2^31-1 to indicate a group is empty?

Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
of representing an empty group? That last case never made any sense to
me (high >= low and count can be anything), as that seems like it could
easily happen in non-empty groups. Maybe if it required count be 0, that
would be one thing, but I'm very puzzled by that qualifier - what cases (presumably) existed historically that resulted in a wording that an
empty group could have a non-empty article count, and high >= low? I'm
not really sure how I would tell if the group is empty or not.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Tue May 5 01:00:51 2026

From Newsgroup: news.software.nntp

Hi InterLinked,

Actually, that reminds me what it was about the
erratum I didn't understand - a comment about server synchronization and
how the low water mark a client reads might decrease in this scenario.
Is anyone able to explain how that might happen?

Looking at the erratum:
"The high water mark is one less than the low water mark for empty
newsgroups. A major reason for doing it this way was to deal with
clusters of servers. If they're not perfectly synchronized, then
a cancel might be visible on one and not another. So if you connect
to the second one, it looks as if the article has been reinstated.
Wording it like this meant we didn't need special treatment of such
clusters. The low water mark cannot decrease."

If a newsgroup has only article number 12, and this article is cancelled
in cluster A a few seconds before it is in cluster B, a newsreader
connecting to cluster A will see low water mark = 13, high water mark =
12 (empty newsgroup with low = high + 1) and if it disconnects and
reconnects this time associated to cluster B before the cancel is
executed, it will see low water mark = high water mark = 12, thus having decreased.
When the high = low - 1 formula is used, it sees low water mark = 12 and
high water mark = 11 on cluster A. The low water mark does not decrease.

Anyway, I agree that the problem is present in non-empty newsgroups if
the low water mark is updated on the fly. If cluster A has article 13,
and cluster B has articles 12 and 13, the low water mark will be
lower when connecting to cluster B...
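The cluster scenarios above can be traced with a few lines of C. This is only a sketch of the arithmetic, not of how any server stores its marks; the type `marks` and the function names are invented for the illustration.

```c
#include <stdint.h>

/* Reported water marks for one group on one server (hypothetical type). */
struct marks {
    uint64_t low;
    uint64_t high;
};

/* Empty group, older formula: low = last + 1, high = last. */
struct marks empty_low_is_last_plus_one(uint64_t last)
{
    struct marks m = { last + 1, last };
    return m;
}

/* Empty group, current formula: low = last, high = low - 1.
 * Assumes last >= 1. */
struct marks empty_high_is_low_minus_one(uint64_t last)
{
    struct marks m = { last, last - 1 };
    return m;
}

/* Non-empty group holding articles first..last. */
struct marks nonempty_marks(uint64_t first, uint64_t last)
{
    struct marks m = { first, last };
    return m;
}

/* True if a client that saw `before` and then `after` observes
 * a decreasing low water mark. */
int low_went_backwards(struct marks before, struct marks after)
{
    return after.low < before.low;
}
```

With article 12 cancelled on cluster A only, the older formula yields 13/12 on A followed by 12/12 on B, so the low water mark goes backwards; the current formula yields 12/11 then 12/12, which does not. The non-empty case (A holds 13, B holds 12 and 13) goes backwards regardless of the formula.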

I guess INN explicitly wants to make empty groups low=1/high=0 instead
of low=0/high=0.

Because low=1/high=0 is the preferred way per RFC 3977, mentioned as a
SHOULD.

Not expecting INN to change, of course, but I think I might do it this
way, as I would like to be as accurate as possible and provide as much "information" as possible in a response.

You could do that if you prefer. Feel free :)

I'm a bit confused on this last point. It's valid to merely set low=high=2^31-1 to indicate a group is empty?
Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
of representing an empty group?

Yes, it is the third alternative allowed by RFC 3977, and I totally
agree it follows the same rule as a non-empty newsgroup. Very liberal :)

o The high water mark is greater than or equal to the low water
mark. The estimated article count might be zero or non-zero; if
it is non-zero, the same requirements apply as for a non-empty
group.
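From the client side, the three representations could be told apart roughly as follows. This is a sketch of one possible reading of RFC 3977 section 6.1.1.2, with invented names (`group_state`, `classify_group`); it treats a zero estimate as proof of emptiness because a non-empty group must report an estimate at least as large as its actual article count.

```c
#include <stdint.h>

/* Hypothetical client-side verdict on a GROUP response. */
enum group_state {
    GROUP_EMPTY,         /* definitely empty */
    GROUP_INDETERMINATE, /* high >= low and count > 0: cannot tell */
    GROUP_MALFORMED      /* matches none of the allowed forms */
};

/* Arguments are in the order of "211 count low high group". */
enum group_state classify_group(uint64_t count, uint64_t low, uint64_t high)
{
    /* Form 1 (preferred): high = low - 1 with a zero estimate. */
    if (low > 0 && high == low - 1)
        return count == 0 ? GROUP_EMPTY : GROUP_MALFORMED;
    if (high < low)
        return GROUP_MALFORMED;
    /* Form 2 (0 0 0) and form 3 with a zero estimate. */
    if (count == 0)
        return GROUP_EMPTY;
    /* Form 3 with a non-zero estimate looks like a non-empty group. */
    return GROUP_INDETERMINATE;
}
```

Under this reading, "211 0 8 7" and "211 0 0 0" are both reported as empty, while "211 3000 1 3000" stays indeterminate: the group may hold one article or three thousand, or, under the third form, none at all.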
--
Julien ÉLIE

« Le café est un breuvage qui fait dormir quand on n'en prend pas. »
(Alphonse Allais)

--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Mon May 4 19:40:06 2026

From Newsgroup: news.software.nntp

On 5/4/2026 7:00 PM, Julien ÉLIE wrote:

Looking at the erratum:
"The high water mark is one less than the low water mark for empty newsgroups. A major reason for doing it this way was to deal with
clusters of servers. If they're not perfectly synchronized, then
a cancel might be visible on one and not another. So if you connect
to the second one, it looks as if the article has been reinstated.
Wording it like this meant we didn't need special treatment of such
clusters. The low water mark cannot decrease."

If a newsgroup has only article number 12, and this article is cancelled
in cluster A a few seconds before it is in cluster B, a newsreader connecting to cluster A will see low water mark = 13, high water mark =
12 (empty newsgroup with low = high + 1) and if it disconnects and reconnects this time associated to cluster B before the cancel is
executed, it will see low water mark = high water mark = 12, thus having decreased.
When the high = low - 1 formula is used, it sees low water mark = 12 and high water mark = 11 on cluster A. The low water mark does not decrease.

But doesn't that still break if there are multiple cancels during that
period? Say the group had articles 11 and 12, and both get cancelled.
Now the low water mark is either 12 or 13, depending on the
implementation. However, you connect to a server that hasn't processed
either cancel yet, and now the low water mark is 11 again.

I think I understand the scenario, but it seems that doesn't entirely
solve the problem either, just makes it less likely.

Anyway, I agree that the problem is present in non-empty newsgroups if
the low water mark is updated on the fly. If cluster A has article 13,
and cluster B has articles 12 and 13, the low water mark will be
lower when connecting to cluster B...

Yes, I think that's sort of the same scenario I was thinking above. It
doesn't even matter whether the group is empty. So the reason for
rejection in the erratum doesn't even pass muster, as even the
"official" way of doing it *can* theoretically break.

Initially I was doing LOW = LAST + 1 and then changed to LOW = LAST
simply because INN had, but now that I understand this a bit better, I
think I might change back to LOW = LAST + 1 and just handle 2^31-1
specially to prevent an illegal response (and also use 0 0 0 for an
empty group that never had any articles).

I'm a bit confused on this last point. It's valid to merely set
low=high=2^31-1 to indicate a group is empty?
Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
of representing an empty group?

Yes, it is the third alternative allowed by RFC 3977, and I totally
agree it follows the same rule as a non-empty newsgroup. Very liberal :)

o The high water mark is greater than or equal to the low water
mark. The estimated article count might be zero or non-zero; if
it is non-zero, the same requirements apply as for a non-empty
group.

Aside from 2^31-1, is there ever a case where one would use this?

I'm still having trouble seeing why case 3 is even necessary. Wouldn't
this be a legal sequence, in a world where LOW = LAST + 1 (the way INN
used to do it):

Article 2147483646 assigned, and then deleted:
LAST=2147483646
LOW=2147483647
HIGH=2147483646

Article 2147483647 assigned, and then deleted (so now group is full):
LAST=2147483647
LOW=2147483647 (floored at LAST, rather than LAST + 1, only in this case)
HIGH=2147483646

So the response in these two cases is actually identical; the client
can't tell them apart. But the response is still legal, since the low
water mark has not decreased, and HIGH is still LOW - 1. So if we can do
this, why bother with case 3 and set both LOW and HIGH to 2147483647? Presumably something behaved this way historically, just can't fathom why...

There is obviously loss of information in that the client can't tell
these two cases apart. However, in INN, a client also can't tell apart a
group that has never had any articles, and a group that had one article
which expired, since in both cases LOW=1 and HIGH=0, and that is legal
as well.

So if I understand correctly, I believe this approach provides "maximal" information to a client, while remaining fully legal:
1) If a group has only ever been empty, respond LOW=HIGH=0 (case 2 in RFC)
2) If LAST < 2147483647, respond LOW=LAST+1 and HIGH=LOW-1 (case 1, the
way INN used to)
3) If LAST = 2147483647, respond LOW=LAST and HIGH=LOW-1 (case 1, the
way INN does now, and preferred by the RFC, though for somewhat
unsatisfying reasons)
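The three rules listed above fit in one small function. The following C sketch assumes that `last` is the highest article number ever assigned in the group and that `last == 0` means the group never received an article; `MAX_ARTNUM`, `marks` and `empty_group_marks` are names chosen for the illustration only.

```c
#include <stdint.h>

#define MAX_ARTNUM 2147483647ULL /* 2^31-1 */

/* Water marks to report for an empty group (hypothetical type). */
struct marks {
    uint64_t low;
    uint64_t high;
};

/* Water marks for an EMPTY group, following the three rules above. */
struct marks empty_group_marks(uint64_t last)
{
    struct marks m;

    if (last == 0) {
        /* 1) never had an article: LOW = HIGH = 0 */
        m.low = 0;
        m.high = 0;
    } else if (last < MAX_ARTNUM) {
        /* 2) LOW = LAST + 1, HIGH = LOW - 1 */
        m.low = last + 1;
        m.high = last;
    } else {
        /* 3) LAST is the maximum: LOW = LAST, HIGH = LOW - 1 */
        m.low = last;
        m.high = last - 1;
    }
    return m;
}
```

For LAST = 2147483646 and LAST = 2147483647 this returns the same pair (2147483647, 2147483646), which is the indistinguishable case described above.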

The benefit of adding step #2 is that in most cases, we can provide a
more accurate low water mark - as you pointed out in the erratum.

Only in case 3 is the client unsure of a piece of information (whether
LAST is 2147483646 or 2147483647), and this is arguably the least
important case anyways.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Tue May 5 21:39:36 2026

From Newsgroup: news.software.nntp

Hi InterLinked,

But doesn't that still break if there are multiple cancels during
that period? Even the "official" way of doing it *can* theoretically
break.

Yes, it seems so indeed.

Initially I was doing LOW = LAST + 1 and then changed to LOW = LAST
simply because INN had, but now that I understand this a bit better, I
think I might change back to LOW = LAST + 1 and just handle 2^31-1
specially to prevent an illegal response (and also use 0 0 0 for an
empty group that never had any articles).

It would work.

Aside from 2^31-1, is there ever a case where one would use this?

Since you mention 2^31-1, I would like to point out that you should
handle article numbers up to 2^64-1 by design. INN unfortunately does
not, with tons of variables limited to a smaller size.
The idea is that a modern implementation should handle large article
numbers, advertise this with the MAXARTNUM capability (not standardized),
refrain from returning large article numbers so as not to choke clients
(last time I checked, Thunderbird froze with such large numbers), but
return large article numbers if the client says it copes with them.
By configuration, if instructed to do so, the server could use large
article numbers even if the client does not use the MAXARTNUM capability.

I once proposed in this newsgroup how it could be done:
https://groups.google.com/g/news.software.nntp/c/4_KjHu9GlBg/

Some news clients implemented it (at least flnews and tin) as a proof of concept.

Just to let you know of that as you seem to be interested in the subject :)

Presumably something behaved this way historically, just can't fathom
why...

There were lots of different and exotic NNTP implementations at that
time, and the RFC did its best not to declare them non-compliant, as Russ explained.

There is obviously loss of information in that the client can't tell
these two cases apart.

Sure, there is loss of information but I bet few people care about that.
Newsreaders don't present an empty newsgroup which never received any article any differently from an empty newsgroup which once received an article.

If you care, that's fine, and have fun with your implementation :)
--
Julien ÉLIE

« Mes opinions ont peut-être changé, mais pas le fait que j'ai raison. »
(Ashleigh Brilliant)

--- Synchronet 3.21f-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 9 10:56:50 2026

From Newsgroup: news.software.nntp

On 5/1/2026 5:14 PM, Russ Allbery wrote:

There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably fine as configuration files with some in-memory representation, though,
since they're usually very small.

LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT

Question about LIST DISTRIB.PATS - is Distribution widely used anymore
in practice? I noticed that Eternal September responds with just this:

10:local.*:local

... which makes me think they just don't care so respond with something simple. I would think a lot of effort would have to go into setting this
up so it would be meaningful and useful.
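For reference, each LIST DISTRIB.PATS line has three colon-separated fields: a weight, a wildmat pattern, and the value for the Distribution header field. A minimal C sketch of splitting such a line follows; `distrib_pat` and `parse_distrib_pat` are hypothetical names, the buffer sizes are arbitrary, and lines with an empty value field are not handled.

```c
#include <stdio.h>

/* One "weight:pattern:value" line (hypothetical structure). */
struct distrib_pat {
    int weight;
    char pattern[128];
    char value[64];
};

/* Returns 1 if all three fields were found, 0 otherwise. */
int parse_distrib_pat(const char *line, struct distrib_pat *out)
{
    return sscanf(line, "%d:%127[^:]:%63s",
                  &out->weight, out->pattern, out->value) == 3;
}
```

Applied to the line above, this gives weight 10, pattern "local.*" and value "local".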

I'm wondering if maybe this is because clients never caught on to using
it so that's why they configured it that way. I wasn't really paying
attention before but I also don't recall seeing this header much these
days. Are there any compelling reasons to respond otherwise, for either today's Usenet or local groups? And if someone is just going to respond
with that, is it better to have a simple LIST DISTRIB.PATS response like
that or just not support the category at all so as not to mislead the
client into thinking it has useful information to provide?

LIST MODERATORS I could see being non-trivial if you had local groups
that were moderated, and LIST OVERVIEW.FMT depending on the overview
file format; I'm less sure about DISTRIB.PATS.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 9 10:29:29 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

On 5/1/2026 5:14 PM, Russ Allbery wrote:

There are a few other ones that aren't as widely used and are arguably
configuration instead, but that do need to be queryable. They're
probably fine as configuration files with some in-memory
representation, though, since they're usually very small.

LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT

Question about LIST DISTRIB.PATS - is Distribution widely used anymore in practice?

Yes, it's used pretty extensively for private hierarchies to control distribution of articles that aren't intended to be propagated beyond the participating servers.

I'm wondering if maybe this is because clients never caught on to using
it so that's why they configured it that way.

As with LIST MODERATORS, it's more of an FYI to the client. The server
will add the Distribution header on POST.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
--- Synchronet 3.22a-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 9 15:25:11 2026

From Newsgroup: news.software.nntp

On 5/9/2026 1:29 PM, Russ Allbery wrote:

InterLinked <nntp@phreaknet.org> writes:

On 5/1/2026 5:14 PM, Russ Allbery wrote:

There are a few other ones that aren't as widely used and are arguably
configuration instead, but that do need to be queryable. They're
probably fine as configuration files with some in-memory
representation, though, since they're usually very small.

LIST DISTRIB.PATS
LIST MODERATORS
LIST OVERVIEW.FMT

Question about LIST DISTRIB.PATS - is Distribution widely used anymore in
practice?

Yes, it's used pretty extensively for private hierarchies to control distribution of articles that aren't intended to be propagated beyond the participating servers.

For private hierarchies, couldn't the incoming/outgoing feeds be
configured not to feed such groups to other servers not carrying the hierarchy? e.g.

*,!local.*

At least if I were setting up a private hierarchy, that's all I would
think to do. What does the Distribution header allow for in this case
that can't be done at the feed/group level? (One thought: perhaps an additional layer of protection to prevent propagation if one of the
other servers is not appropriately configured?)

RFC 1036 2.2.7 provides an example (which I know is obsolete, but I
assume the section on Distribution is still accurate, and RFC 5536 lacks detail in comparison). The example seems to show a kind of filtering
that is not purely per-group (Distribution: nj,ny) and that makes a bit
more sense to me, but only in the context of non-local groups that would normally go to a wide audience, e.g. all of Usenet. But if the server
adds the Distribution header purely based on the Newsgroups header, then
it seems kind of redundant to me (at least in a world where all servers
are configured as they should be).

It also seems that all servers would need to support the distributions
for things to work as intended. Is ensuring they exist everywhere they
need to be purely a manual process?
--- Synchronet 3.22a-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 9 13:27:10 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

For private hierarchies, couldn't the incoming/outgoing feeds be
configured not to feed such groups to other servers not carrying the hierarchy? e.g.

*,!local.*

The above doesn't work properly due to crossposting.

It's possible to use @ wildcards carefully along with rejection patterns
in incoming.conf, but there are some caveats and it's relatively easy to
make a mistake. Distributions are somewhat simpler. The recommendation is
to use all of the mechanisms for effective defense in depth against misconfigurations.

See:

https://www.eyrie.org/~eagle/faqs/soundness-inn.html

(This hierarchy is defunct, but the same technique is still in use.)
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
--- Synchronet 3.22a-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 15 21:31:22 2026

From Newsgroup: news.software.nntp

On 5/2/2026 12:04 PM, Russ Allbery wrote:

No, all the forwarding these days is handled by moderators.isc.org, and there's no place in a control message to document that.

Moderation is a horrible cludge. You will probably be appalled. :) It's a design from another era, and we never completed the work we were hoping to
do to try to make it less of a cludge

Was this earlier work a mechanism for automatically distributing
moderation information using control messages, or something else?

so it's very much something out of
an earlier era of the Internet when spam didn't exist.

So far, I do have one question, more of a technicality, from reading RFC
6048 2.4.3.

Because %s changes the periods in a group name to dashes, the RFC warns
that groups differing only by periods/dashes would have identical
submission templates if only %s is used. In this case, the RFC says
"pattern template cannot be used... for these groups... explicit entries without a pattern will be required".

Since that sounds pretty definite, I'm wondering if that implies that %s
can only appear in the user part by itself or not (at least, the
examples in the RFC all have it by itself). The RFC never says %s has to
be the sole user part, so for example, is this legal?

local.*:prefix+%s@news.example.com

For example, local.foo would go to prefix+local-foo@news.example.com

Is this legal? I feel like it would be, but the wording in the RFC that
says that a pattern template cannot be used makes me wonder if this isn't.

To distinguish between local.foo.bar and local.foo-bar, for example, you
could have:

local.*.*:period+%s@news.example.com
local.*-*:dash+%s@news.example.com

Bizarre submission template naming? Absolutely. (And this is a simple
example, I realize if the hierarchy were deeper than three levels in
this example, there could again be ambiguities.) But here, for a certain
set of similarly named groups, you would only need two patterns instead
of as many entries as you had groups. Is this legal, and the RFC is just
misleading when it says explicit entries are required?

(A more practical example; one might want to use addresses like newsmoderator+%s@news.example.com, so the whole domain's address space
is not reserved for moderation.)
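
A minimal sketch of the lookup described above, assuming the first matching entry wins and that %s expands to the group name with periods turned into dashes. Python's fnmatch stands in for wildmat here, and the two entries are the hypothetical ones from this post, not anything taken from a real moderators file:

```python
from fnmatch import fnmatchcase

# Hypothetical entries copied from the example in this post.
MODERATORS = [
    ("local.*.*", "period+%s@news.example.com"),
    ("local.*-*", "dash+%s@news.example.com"),
]

def submission_address(group, entries=MODERATORS):
    """Return the submission address for group, or None if nothing matches."""
    for pattern, template in entries:
        # Assumption: the first matching entry is the one that is used.
        if fnmatchcase(group, pattern):
            # %s becomes the group name with periods changed to dashes.
            return template.replace("%s", group.replace(".", "-"))
    return None

print(submission_address("local.foo.bar"))  # period+local-foo-bar@news.example.com
print(submission_address("local.foo-bar"))  # dash+local-foo-bar@news.example.com
```

Both groups expand %s to the same string (local-foo-bar), so it is only the literal prefix in each template that keeps the two addresses apart.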

Also, a second question, I noticed in the LIST MODERATORS output from
Eternal September, comp.std.c++ has its own entry, going to
std-cpp-submit@...

I can't recall any other groups with + in the name; does this exception
imply that '+' isn't allowed somewhere along the process for submission
templates or moderators.isc.org, or is this just a coincidence?
--- Synchronet 3.22a-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 15 18:37:36 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

On 5/2/2026 12:04 PM, Russ Allbery wrote:

No, all the forwarding these days is handled by moderators.isc.org, and
there's no place in a control message to document that. Moderation is a
horrible cludge. You will probably be appalled. :) It's a design from
another era, and we never completed the work we were hoping to do to
try to make it less of a cludge

Was this earlier work a mechanism for automatically distributing
moderation information using control messages, or something else?

We were hoping to standardize cryptographic signatures by moderators
(PGPMoose) and an encapsulation format for conveying messages to
moderators instead of intermixing mail and news in a way that causes tons
of problems for spam filtering.

Because %s changes the periods in a group name to dashes, the RFC warns
that groups differing only by periods/dashes would have identical
submission templates if only %s is used. In this case, the RFC says
"pattern template cannot be used... for these groups... explicit entries without a pattern will be required".

Since that sounds pretty definite, I'm wondering if that implies that %s
can only appear in the user part by itself or not (at least, the examples
in the RFC all have it by itself). The RFC never says %s has to be the
sole user part, so for example, is this legal?

local.*:prefix+%s@news.example.com

I think that would be fine.

Is this legal? I feel like it would be, but the wording in the RFC that
says that explicit entries can't be used makes me wonder if this isn't.

To distinguish between local.foo.bar and local.foo-bar, for example, you could have:

local.*.*:period+%s@news.example.com
local.*-*:dash+%s@news.example.com

I think we just didn't think of that. :) I don't see any obvious reason
why that wouldn't be legal.

Also, a second question, I noticed in the LIST MODERATORS output from
Eternal September, comp.std.c++ has its own entry, going to std-cpp-submit@...

I can't recall any other groups with + in the name; does this exception
imply that '+' isn't allowed somewhere along the process for submission
templates or moderators.isc.org, or is this just a coincidence?

Oh, interesting. I suspect that's working around a problem that + gets a special interpretation in a lot of email systems and maybe that was
causing some sort of problem.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
--- Synchronet 3.22a-Linux NewsLink 1.2

From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 15 21:39:27 2026

From Newsgroup: news.software.nntp

On 5/9/2026 4:27 PM, Russ Allbery wrote:

InterLinked <nntp@phreaknet.org> writes:

For private hierarchies, couldn't the incoming/outgoing feeds be
configured not to feed such groups to other servers not carrying the
hierarchy? e.g.

*,!local.*

The above doesn't work properly due to crossposting.

Ah, I see, if an article were posted to both local.foo and comp.foo, then
it would still get shared out to Usenet, despite this rule (thus leaking
the private post).
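
To make the leak concrete, here is a rough model of how I understand the feed pattern to behave. The assumptions are mine: the last matching sub-pattern decides for each group, a leading ! negates, and an article is offered to the peer if any one of its groups matches. The @ form is not modelled, and fnmatch stands in for wildmat:

```python
from fnmatch import fnmatchcase

def group_matches(group, patterns):
    """Last matching sub-pattern wins; a leading '!' negates the match."""
    matched = False
    for pat in patterns.split(","):
        negate = pat.startswith("!")
        if fnmatchcase(group, pat[1:] if negate else pat):
            matched = not negate
    return matched

def would_feed(newsgroups, patterns):
    """Assumption: the article is offered if ANY of its groups matches."""
    return any(group_matches(g, patterns) for g in newsgroups)

FEED = "*,!local.*"
print(would_feed(["local.foo"], FEED))              # False
print(would_feed(["local.foo", "comp.foo"], FEED))  # True
```

The second call is the crosspost case: comp.foo matches on its own, so the article goes out even though local.foo is excluded.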

It's possible to use @ wildcards carefully along with rejection patterns
in incoming.conf, but there are some caveats and it's relatively easy to
make a mistake. Distributions are somewhat simpler. The recommendation is
to use all of the mechanisms for effective defense in depth against misconfigurations.

Is the idea here that since a distribution applies once per message
(which could have multiple newsgroups, both local and non-local), adding
the Distribution header prevents a post from going to other servers even
if a non-local group is one of its newsgroups?

For example, as soon as local.foo is seen, a distribution gets added,
marking the message in a way that would then prevent it from going to
Usenet, even if it includes groups that, had they been the sole newsgroup
of a post, would have gone to Usenet?

Although in this simple example, the cross-posted article would never
reach the public Usenet groups, so from what I can tell, this only
protects against "posting accidents", since a user wouldn't have a
legitimate reason to try cross-posting to both a local and a public group.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 15 18:53:35 2026

From Newsgroup: news.software.nntp

InterLinked <nntp@phreaknet.org> writes:

On 5/9/2026 4:27 PM, Russ Allbery wrote:

It's possible to use @ wildcards carefully along with rejection
patterns in incoming.conf, but there are some caveats and it's
relatively easy to make a mistake. Distributions are somewhat simpler.
The recommendation is to use all of the mechanisms for effective
defense in depth against misconfigurations.

Is the idea here that since a distribution is once per-message (which
could have multiple newsgroups, both local and non-local), adding the Distribution prevents posts from going to other servers if any non-local group is one of the newsgroups of a post?

Right, the distribution is added by the server to all posts to local.* and
then you can exclude the distribution on all your outgoing feeds to anyone
you don't want to exchange local.* with.
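
A minimal sketch of that two-step idea, not INN's actual code: the distribution name "local" and both function names are made up for illustration, and only the exclusion case is modelled. Real feed configuration has more rules than this.

```python
from fnmatch import fnmatchcase

def add_distribution(headers):
    """Posting-side step: tag any article that touches local.* at all."""
    groups = headers["Newsgroups"].split(",")
    if any(fnmatchcase(g.strip(), "local.*") for g in groups):
        # Hypothetical distribution name; overwrites any existing value.
        headers = dict(headers, Distribution="local")
    return headers

def feed_allows(headers, excluded):
    """Feed-side step: refuse the article if it carries an excluded distribution."""
    raw = headers.get("Distribution", "")
    dists = {d.strip() for d in raw.split(",") if d.strip()}
    return not (dists & excluded)

crosspost = add_distribution({"Newsgroups": "local.foo,comp.foo"})
public = add_distribution({"Newsgroups": "comp.foo"})
print(feed_allows(crosspost, {"local"}))  # False
print(feed_allows(public, {"local"}))     # True
```

Because the tag sits on the message rather than on a group, the crosspost to comp.foo is held back along with the local.foo copy.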

For example, as soon as local.foo is seen, a distribution gets added,
marking the message in a way that would then prevent it from going to
Usenet, even if it includes groups that, had they been the sole newsgroup
of a post, would have gone to Usenet?

Yup.

Although in this simple example, the cross-posted Usenet groups would
never reach Usenet, so from what I can tell, this only protects against "posting accidents" since a user wouldn't have a legitimate reason to
try cross-posting to both a local and public group.

Yeah, and of course you can also use a filter to just block crossposts
directly. There are various ways to do it, but distribution has some nice
properties for old servers with no programmatic filter.
--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
--- Synchronet 3.22a-Linux NewsLink 1.2
