• High and low water marks vs active file

    From InterLinked@nntp@phreaknet.org to news.software.nntp on Thu Apr 30 18:47:11 2026
    From Newsgroup: news.software.nntp

    Recently, I've been diving a bit deeper into the behavior of the high
    and low water marks and the active file, and I'm a bit confused as to
    how they relate.

    RFC 3977 clearly spells out that the low and high water marks refer to
    the smallest and largest numbered articles in a newsgroup, with the
    caveat for empty groups when the high water mark is usually one less
    than the low water mark (and the low water mark is the former high
    water mark[1]).

    However, the documentation for the active file in InterNetNews (INN)[2]
    says that <high> is the "highest article number that has ever been used
    in that newsgroup". This implies it is NOT the same as the reported high
    water mark, because the high water mark could decrease while <high> in
    this context is monotonically increasing - and the low and high water
    marks can be recomputed by scanning the spool directory, while <high> in
    the active file cannot (and thus needs to be stored persistently).
    Though it certainly doesn't help any that the name "high" is used for
    this value as well.
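
    A rough Python sketch of that distinction, assuming a spool layout in
    which each article in a group is a file named by its article number (the
    function name scan_water_marks and the layout are illustrative
    assumptions, not INN code). The reported marks fall out of a directory
    scan; the last-assigned number does not, so it has to be kept separately:

```python
import os
import tempfile


def scan_water_marks(group_dir):
    """Return (low, high, count) from numerically named files, or None if empty."""
    numbers = [int(name) for name in os.listdir(group_dir) if name.isdigit()]
    if not numbers:
        return None
    return min(numbers), max(numbers), len(numbers)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        for n in (1, 2, 3):
            open(os.path.join(spool, str(n)), "w").close()
        last_assigned = 3  # hypothetical persisted counter, not derivable from the scan
        os.remove(os.path.join(spool, "3"))  # the highest-numbered article is deleted
        print(scan_water_marks(spool), last_assigned)
```

    After the deletion the scan can only see articles 1 and 2, so nothing in
    the directory reveals that 3 was ever assigned.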

    I've been perusing the source of InterNetNews (INN) to try to understand
    how it behaves, as a reference. It refers to the active file <high> as
    LAST in a few places, and this is used when assigning new article IDs in
    a group. This makes sense. For LIST COUNT and GROUP, it pulls from group
    stats, which I believe is ultimately some kind of database backend that
    provides the reported water marks and article count. However, in the
    response for LIST ACTIVE, it simply dumps the line from the active file
    as is. Yet, the RFC says the response format for LIST ACTIVE includes
    the reported high and low water marks.

    I can't find any examples of newsgroups where the high water mark
    article is deleted, so it's hard to poke at this behavior, but it begs
    the following questions:

    1. If "LAST" is an internal value used for assigning article IDs, and
    not the reported high water mark, then why is it being handed out as
    such for LIST ACTIVE? I would think it would use the actual reported
    high water mark, because if the high water mark article were deleted,
    then the response would have the wrong high water mark.

    2. The same page says <low> in the active file is ~the low water mark
    but "not guaranteed to be accurate" and is just a hint. In INN, do the
    values of <low> in the active file ever differ from the low water mark
    in the group stats? Or are they distinct values like the active file
    <high> (LAST) and the low water mark?

    Am I misunderstanding anything here about either INN's behavior or the intention in the RFC? (And while I've used INN as an example, my
    interest is more about "correct" news server behavior in general.)

    Thanks!

    [1] https://github.com/InterNetNews/inn/issues/250
    [2] https://www.eyrie.org/~eagle/software/inn/docs/active.html
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Thu Apr 30 17:05:24 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:

    However, the documentation for the active file in InterNetNews (INN)[2]
    says that <high> is the "highest article number that has ever been used in that newsgroup". This implies it is NOT the same as the reported high
    water mark, because the high water mark could decrease while <high> in
    this context is monotonically increasing

    Correct, INN never decreases the high water mark under normal operations. Therefore, if the latest article in the group is deleted, the high water
    mark will not decrease and will reference a non-existent article.

    This is how most news servers have historically behaved, so the arguable implication in RFC 3977 that servers should decrease the high water mark
    in this case is arguably a bug. In practice, it has no real effect since
    news readers are required to handle this situation anyway, due to:

    | The set of articles in a group may change after the GROUP command is
    | carried out:
    |
    | o Articles may be removed from the group.

    which is implicitly incorporated by reference into LIST ACTIVE since it references that definition of high and low water marks (and logically has
    to be the case regardless, since of course the state of the spool could
    change after LIST ACTIVE just as it could after GROUP).

    - and the low and high water marks can be recomputed by scanning the
    spool directory, while <high> in the active file cannot (and thus needs
    to be stored persistently).

    Yes, and I'm not sure INN's behavior when the news administrator
    reconstructs the active file from the spool is strictly conforming in all
    edge cases. (For example, the low water mark could decrease, which is
    forbidden by RFC 3977.) In practice, the edge cases probably don't matter.

    I've been perusing the source of InterNetNews (INN) to try to understand
    how it behaves, as a reference. It refers to the active file <high> as
    LAST in a few places, and this is used when assigning new article IDs in a group. This makes sense. For LIST COUNT and GROUP, it pulls from group
    stats, which I believe is ultimately some kind of database backend that provides the reported water marks and article count. However, in the
    response for LIST ACTIVE, it simply dumps the line from the active file as is. Yet, the RFC says the response format for LIST ACTIVE includes the reported high and low water marks.

    In theory INN could construct a LIST ACTIVE response from the overview database. In practice, this is a very frequent operation and the current implementation is probably considerably faster than an overview-based implementation, for dubious benefit.
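
    For reference, an active file line has the form
    <name> <high> <low> <status>, which is also the shape of a LIST ACTIVE
    response line, so passing it through untouched costs almost nothing. A
    minimal Python sketch of reading one such line (the ActiveEntry type and
    the sample line are made up for illustration):

```python
from typing import NamedTuple


class ActiveEntry(NamedTuple):
    name: str
    high: int    # highest number ever assigned (LAST); the article may be gone
    low: int     # hint for the lowest available article
    status: str


def parse_active_line(line):
    """Split one '<name> <high> <low> <status>' line into typed fields."""
    name, high, low, status = line.split()
    return ActiveEntry(name, int(high), int(low), status)


entry = parse_active_line("misc.test 0000003000 0000000001 y")
print(entry)
```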

    So, for your questions:

    1. If "LAST" is an internal value used for assigning article IDs, and
    not the reported high water mark, then why is it being handed out as
    such for LIST ACTIVE? I would think it would use the actual reported
    high water mark, because if the high water mark article were deleted,
    then the response would have the wrong high water mark.

    Because it's slow, basically. News readers like for LIST ACTIVE to be very
    fast with a large number of groups so that they can show unread article
    counts quickly on newsreader startup.

    2. The same page says <low> in the active file is ~the low water mark
    but "not guaranteed to be accurate" and is just a hint. In INN, do the
    values of <low> in the active file ever differ from the low water mark
    in the group stats? Or are they distinct values like the active file
    <high> (LAST) and the low water mark?

    I would never guarantee full integrity between all of INN's various
    databases because they're all independent and updated non-transactionally,
    so all sorts of weird things are true momentarily. In theory, the low
    water mark in the active file should be eventually consistent with the
    low water mark in overview, but it will certainly vary while in the
    middle of nightly expire and may vary at other times that I'm not
    thinking of.

    If I were writing a new news server from scratch today in 2026, I would
    try very hard not to use INN's design of having four separate databases in three entirely different formats for the active file, the newsgroup descriptions, the overview, and the history. Surely there is some way to
    write a transactional database that could track all those things in a more reasonable but still performant way that doesn't require constantly
    managing inconsistencies the way that INN does after some crashes or corruption. It used to be that all SQL databases were just too slow, particularly for history, but overview can be put in SQLite these days and
    I'm dubious that there is no standard database that could handle history
    given how much database optimization has happened over the years.

    But INN will evolve slowly, if at all, because it basically works and a
    lot of the bugs have been flushed out over the years and changing
    architectures is very hard. :)
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Thu Apr 30 22:21:53 2026
    From Newsgroup: news.software.nntp

    On 4/30/2026 8:05 PM, Russ Allbery wrote:
    InterLinked <nntp@phreaknet.org> writes:

    However, the documentation for the active file in InterNetNews (INN)[2]
    says that <high> is the "highest article number that has ever been used in
    that newsgroup". This implies it is NOT the same as the reported high
    water mark, because the high water mark could decrease while <high> in
    this context is monotonically increasing

    Correct, INN never decreases the high water mark under normal operations. Therefore, if the latest article in the group is deleted, the high water
    mark will not decrease and will reference a non-existent article.

    This is how most news servers have historically behaved, so the arguable implication in RFC 3977 that servers should decrease the high water mark
    in this case is arguably a bug. In practice, it has no real effect since
    news readers are required to handle this situation anyway, due to:

    | The set of articles in a group may change after the GROUP command is
    | carried out:
    |
    | o Articles may be removed from the group.

    Hmm, that's an interesting way to look at it - even if the article was
    deleted *before* the GROUP response is generated, the client wouldn't be
    able to tell.

    But the 3rd bullet in the phrase you cite (6.1.1.2) also says:

    | o New articles may be added with article numbers greater than the
    | reported high water mark. (If an article that was the one with
    | the highest number has been removed and the high water mark has
    | been adjusted accordingly, the next new article will not have the
    | number one greater than the reported high water mark.)

    To me, this implies the high water mark can (even "should") decrease
    when the high water mark article is removed - in which case, the next
    article assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT work in IMAP).

    Of course, the phrasing doesn't mandate that the high water mark
    decrease in this case, though it seems to allow for that option.

    which is implicitly incorporated by reference into LIST ACTIVE since it references that definition of high and low water marks (and logically has
    to be the case regardless, since of course the state of the spool could change after LIST ACTIVE just as it could after GROUP).

    Yes, that makes sense. But this seems more like a "loophole" or
    "shortcut"... did the RFC actually intend it work this way? Or everybody
    was already doing it that way before RFC 3977, and that part of it has
    just been ignored?

    For context, I am working on my own NNTP implementation and I've really
    been scratching my head about how to handle this case. It seems like if
    I'm able to provide a more accurate response (a lower high water mark),
    that would be preferred, but maybe there is a good reason not to do so?
    (The obvious one being it requires extra bookkeeping).

    To make an extreme example, if a group with a lot of articles had all of
    them except the low water mark article deleted (and "last" is 3000), you
    could have a response like:

    211 3000 1 3000 misc.test

    [and the count has to be at least 3000, per the RFC, so we can't even
    have 211 1 1 3000 misc.test to indicate there are definite gaps]

    when in reality, this is the "most accurate" response:

    211 1 1 1 misc.test

    Though now that begs the question what to display if that last article
    (1) were then deleted. I presume in the first case, it would naturally be:

    211 0 3000 2999 misc.test

    And this is probably the best response. In the second case, it seems
    more ambiguous what the most logical reply would be, since you could
    start with either "last" or whatever the last true high water mark was
    (e.g. 211 0 0 1 misc.test).

    That brings me to another observation: I've noticed that most inactive newsgroups in INN return high 0 and low 1 (at least for those I've
    analyzed in responses for Usenet groups), which seemed odd to me as I
    would have thought INN would naturally take the <high> in the active
    file, and taking low = <high> and high = low - 1, return something more
    like 211 0 3000 2999 misc.test

    The benefit there is that from the output you could see how many
    articles were in the group historically, from the high water mark, even
    if all have since been deleted - it's extra context that can be conveyed
    "for free". Is there a reason INN just uses 1/0 instead? This seems like
    one case where using <last> directly would actually really make sense
    for a client.

    - and the low and high water marks can be recomputed by scanning the
    spool directory, while <high> in the active file cannot (and thus needs
    to be stored persistently).

    Yes, and I'm not sure INN's behavior when the news administrator
    reconstructs the active file from the spool is strictly conforming in all edge cases. (For example, the low water mark could decrease, which is forbidden by RFC 3977.) In practice, the edge cases probably don't matter.

    I've been perusing the source of InterNetNews (INN) to try to understand
    how it behaves, as a reference. It refers to the active file <high> as
    LAST in a few places, and this is used when assigning new article IDs in
    a group. This makes sense. For LIST COUNT and GROUP, it pulls from group
    stats, which I believe is ultimately some kind of database backend that
    provides the reported water marks and article count. However, in the
    response for LIST ACTIVE, it simply dumps the line from the active file
    as is. Yet, the RFC says the response format for LIST ACTIVE includes the
    reported high and low water marks.

    In theory INN could construct a LIST ACTIVE response from the overview database. In practice, this is a very frequent operation and the current implementation is probably considerably faster than an overview-based implementation, for dubious benefit.

    So, for your questions:

    1. If "LAST" is an internal value used for assigning article IDs, and
    not the reported high water mark, then why is it being handed out as
    such for LIST ACTIVE? I would think it would use the actual reported
    high water mark, because if the high water mark article were deleted,
    then the response would have the wrong high water mark.

    Because it's slow, basically. News readers like for LIST ACTIVE to be very fast with a large number of groups so that they can show unread article counts quickly on newsreader startup.

    Okay, so performance > strict correctness (which is a reasonable answer,
    when the client can't really say it wasn't correct).

    Though I don't see why the implementation could not be such that it
    would be just as fast - either perhaps through an in-memory cache of low/high/count for all groups, kept in sync with the active file, or
    even more simply, storing this all in the active file itself, i.e. with
    a format like:

    <name> <last> <reportedhigh> <reportedlow> <count> <status>

    Performance-wise, since the active file has to be updated when new
    articles are posted anyways, and deletions have to update the low water
    mark anyways, overall # of writes would stay the same.
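
    As a sketch only, that extended line could be modelled in Python like
    this. The ExtendedEntry type, its field names and the 10-digit padding
    are my own assumptions for a new server, not an existing INN format:

```python
from dataclasses import dataclass


@dataclass
class ExtendedEntry:
    name: str
    last: int            # highest number ever assigned; never decreases
    reported_high: int   # highest article currently present
    reported_low: int
    count: int
    status: str

    def to_line(self):
        """Serialize with zero-padded numbers so the line length stays stable."""
        return (f"{self.name} {self.last:010d} {self.reported_high:010d} "
                f"{self.reported_low:010d} {self.count:010d} {self.status}")

    @classmethod
    def from_line(cls, line):
        """Parse '<name> <last> <reportedhigh> <reportedlow> <count> <status>'."""
        name, last, high, low, count, status = line.split()
        return cls(name, int(last), int(high), int(low), int(count), status)


entry = ExtendedEntry("misc.test", 3000, 1, 1, 1, "y")
print(entry.to_line())
```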

    I'm not proposing either of these for INN specifically, but wondering if either would make sense in the design of new software. If I had to
    guess, maybe the active file hasn't been extended like this for compatibility/portability reasons?

    (For simplicity, I've also made the assumption articles won't be
    manually deleted outside of the software's knowledge.)

    2. The same page says <low> in the active file is ~the low water mark
    but "not guaranteed to be accurate" and is just a hint. In INN, do the
    values of <low> in the active file ever differ from the low water mark
    in the group stats? Or are they distinct values like the active file
    <high> (LAST) and the low water mark?

    I would never guarantee full integrity between all of INN's various
    databases because they're all independent and updated non-transactionally,
    so all sorts of weird things are true momentarily. In theory, the low
    water mark in the active file should be eventually consistent with the
    low water mark in overview, but it will certainly vary while in the
    middle of nightly expire and may vary at other times that I'm not
    thinking of.

    If I were writing a new news server from scratch today in 2026, I would
    try very hard not to use INN's design of having four separate databases in three entirely different formats for the active file, the newsgroup descriptions, the overview, and the history. Surely there is some way to write a transactional database that could track all those things in a more reasonable but still performant way that doesn't require constantly
    managing inconsistencies the way that INN does after some crashes or corruption. It used to be that all SQL databases were just too slow, particularly for history, but overview can be put in SQLite these days and I'm dubious that there is no standard database that could handle history given how much database optimization has happened over the years.

    But INN will evolve slowly, if at all, because it basically works and a
    lot of the bugs have been flushed out over the years and changing architectures is very hard. :)

    Thanks, this is helpful, since I'm basically writing a new news server
    from scratch. Obviously, performance matters, though I don't want to prioritize that above all else - for now, this will be smaller scale
    (private news hierarchies or subsets of Usenet).

    The low/high water marks and count could be computed at startup by
    scanning the directories, and then stored in memory, but now I'm kind of tempted by the idea of just having it all in an "extended" active file.

    So, two more big questions, I guess:

    1. It seems that convention is to "lie" about the high water mark and
    just hand out "last" instead, for performance, at least the way INN is implemented (since the client can't tell that we lied). Considering it
    feels against the *spirit* of the RFC, setting aside performance, do you foresee any problems with choosing to provide an accurate high water
    mark? I can't see how it would break compatibility, since the RFC
    already says the high water mark CAN decrease, even if nobody does it today.

    (Edge case being if all articles are deleted, then using last makes
    sense - though as I wondered above, I'm not sure if that's even what INN does.)

    2. Is INN's active file (or file system more generally) intended to be portable with other news servers? If not, it seems like I could just
    extend the active file to add the "true" high water mark along with the article count, and then just use that for both LIST ACTIVE and GROUP.
    Then I could be truthful with no performance hit. Sure, I would have to
    parse the line and omit "last" and "count", format the rest and return
    it, but that seems minor and probably worth it (and as long as I'm
    already formatting it at this point, I could return non-padded numbers
    instead of zero-padded numbers, ultimately saving bandwidth for listing
    all groups as a side benefit - of course, I'd still pad the file to
    allow in-place edits).
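
    Roughly what I have in mind, as a Python sketch. It assumes a
    hypothetical extended line of the form
    <name> <last> <reportedhigh> <reportedlow> <count> <status> with every
    number padded to 10 digits, and uses an in-memory buffer in place of the
    real file:

```python
import io

WIDTH = 10  # digits reserved per number in the hypothetical extended file


def list_active_line(extended_line):
    """Drop <last> and <count>, strip padding: '<name> <high> <low> <status>'."""
    name, _last, high, low, _count, status = extended_line.split()
    return f"{name} {int(high)} {int(low)} {status}"


def overwrite_number(f, offset, value):
    """Rewrite one fixed-width number in place; the file length never changes."""
    f.seek(offset)
    f.write(f"{value:0{WIDTH}d}".encode("ascii"))


line = "misc.test 0000003000 0000002999 0000000001 0000002999 y\n"
f = io.BytesIO(line.encode("ascii"))
# <reportedhigh> is the second number: skip the name, a space, <last>, a space.
high_offset = len("misc.test ") + WIDTH + 1
overwrite_number(f, high_offset, 2998)
print(list_active_line(f.getvalue().decode("ascii")))
```

    The stored line keeps its padding, so an update is a single fixed-size
    write, while the client only ever sees the unpadded four-field form.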

    I realize these are all edge cases, but I could see them arising and I
    would prefer to be as correct as possible, especially if performance
    won't be impacted much. But maybe (probably) there's something here that
    I've not fully thought through...

    Thanks!
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to news.software.nntp on Fri May 1 10:04:59 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> wrote or quoted:
    To make an extreme example, if a group with a lot of articles had all of
    them except the low water mark article deleted (and "last" is 3000), you
    could have a response like:
    211 3000 1 3000 misc.test
    [and the count has to be at least 3000, per the RFC, so we can't even
    have 211 1 1 3000 misc.test to indicate there are definite gaps]
    when in reality, this is the "most accurate" response:
    211 1 1 1 misc.test

    A newsreader might already have read those 3000 articles and made
    an internal note:

    |In that group, I have seen everything up to 3000.

    . So when the newsserver then would go back to "211 1 1 1 misc.test",
    the newsreader might miss the next 2999 articles because it deems them
    "seen".

    LISTGROUP and XHDR can be used to learn more about available articles.
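
    A small Python sketch of the newsreader side, assuming a naive reader
    that stores only a single "seen up to" number per group (the function
    name unread_range is hypothetical). Whether articles are missed depends
    on which numbers the server hands out after the high water mark drops:

```python
def unread_range(seen_up_to, low, high):
    """Article numbers a naive reader would fetch: those above its marker."""
    start = max(seen_up_to + 1, low)
    return range(start, high + 1)


seen_up_to = 3000  # reader's saved state for misc.test

# Case A: the server reports high = 1 and later numbers new articles 2, 3, 4.
missed = list(unread_range(seen_up_to, 1, 4))

# Case B: the server keeps numbering from 3001 onwards.
fetched = list(unread_range(seen_up_to, 1, 3003))

print(missed, fetched)
```

    In case A the reader fetches nothing, because every new number is at or
    below its marker; in case B the new articles are found as usual.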


  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 09:58:32 2026
    From Newsgroup: news.software.nntp

    On 5/1/2026 6:04 AM, Stefan Ram wrote:
    InterLinked <nntp@phreaknet.org> wrote or quoted:
    To make an extreme example, if a group with a lot of articles had all of
    them except the low water mark article deleted (and "last" is 3000), you
    could have a response like:
    211 3000 1 3000 misc.test
    [and the count has to be at least 3000, per the RFC, so we can't even
    have 211 1 1 3000 misc.test to indicate there are definite gaps]
    when in reality, this is the "most accurate" response:
    211 1 1 1 misc.test

    A newsreader might already have read those 3000 articles and made
    an internal note:

    |In that group, I have seen everything up to 3000.

    Yes, but not if it's new to the group.

    . So when the newsserver then would go back to "211 1 1 1 misc.test",
    the newsreader might miss the next 2999 articles because it deems them
    "seen".

    LISTGROUP and XHDR can be used to learn more about available articles.

    Yes, very true.

    While all legitimate rationales, they still feel to me a bit like justifications for taking a shortcut. I know life would be simpler if I
    took the same shortcut, but so far, it doesn't seem like there is
    anything forcing me to either...
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 10:38:50 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:
    On 4/30/2026 8:05 PM, Russ Allbery wrote:

    Therefore, if the latest article in the group is deleted, the high
    water mark will not decrease and will reference a non-existent article.
    This is how most news servers have historically behaved, so the
    arguable implication in RFC 3977 that servers should decrease the high
    water mark in this case is arguably a bug. In practice, it has no real
    effect since news readers are required to handle this situation anyway,
    due to:

    | The set of articles in a group may change after the GROUP command is
    | carried out:
    |
    | o Articles may be removed from the group.

    Hmm, that's an interesting way to look at it - even if the article was deleted *before* the GROUP response is generated, the client wouldn't be
    able to tell.

    But the 3rd bullet in the phrase you cite (6.1.1.2) also says:

    | o New articles may be added with article numbers greater than the
    | reported high water mark. (If an article that was the one with
    | the highest number has been removed and the high water mark has
    | been adjusted accordingly, the next new article will not have the
    | number one greater than the reported high water mark.)

    To me, this implies the high water mark can (even "should") decrease when
    the high water mark article is removed - in which case, the next article assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT
    work in IMAP).

    So, I have to admit that I don't recall this explicitly coming up during
    the RFC discussions, so I don't have a definitive answer for you about why
    we worded it this way. I think if we'd noticed this at the time, we would
    have been a bit clearer about what clients should expect, so I think there
    is a (minor) bug in the standard here.

    What I can say is that the intent of RFC 3977 was to document existing
    practice (which had moved on a lot since RFC 977) and add some new
    features, but not to rule out the behavior of existing servers unless it
    was clearly wrong in some way that would cause problems.

    There are two fairly obvious ways to handle the high water mark:

    1. Keep low and high water marks in only one place, increment the high
    water mark on every new article arrival as part of article numbering,
    and never decrement it because it doubles as the source of the next
    article number for that group.

    2. Keep internal "next article number" data for each group but report the
    high water mark based on what articles are in the spool at the time.

    Historically, INN (and C News, I'm fairly sure) always did 1, so that was
    very widespread practice. I'm fairly sure that we wouldn't have chosen to
    declare it nonconformant. 2 is arguably more correct so the language
    should have been (and was) written to *allow* it, but we wouldn't have
    *required* it and ruled the historic INN behavior non-compliant. INN's
    ability to do 2 in theory based on the overview database information is
    new in INN 2.x as
    I recall. Before that, OVcancel was not a thing, and there was no way to
    remove the information about the cancelled article from overview before
    the next nightly expire, so there was no independent source of truth about
    the current article numbers beyond checking the spool.
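
    To make the two options concrete, here is a toy in-memory Python sketch.
    The Group class and its method names are invented for illustration and
    do not correspond to INN's code:

```python
class Group:
    """Toy in-memory group; `last` is the persisted numbering counter."""

    def __init__(self):
        self.last = 0
        self.articles = set()

    def add(self):
        self.last += 1               # never reused, never decremented
        self.articles.add(self.last)
        return self.last

    def remove(self, number):
        self.articles.discard(number)

    def high_option_1(self):
        """Option 1: report the numbering counter itself."""
        return self.last

    def high_option_2(self):
        """Option 2: report the highest article actually present."""
        return max(self.articles) if self.articles else None


g = Group()
for _ in range(3):
    g.add()
g.remove(3)
print(g.high_option_1(), g.high_option_2(), g.add())
```

    Under both options the next article gets the same number; only the
    reported high water mark differs once the top article is removed. The
    sketch leaves the empty-group case (None) to the caller.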

    I agree with you that this didn't really make it into the text, but I
    think that's just a minor bug in the standard that we didn't catch at the
    time.

    Thinking about the problem this morning, I do see a small but real
    advantage to the client in getting an accurate high water mark: It means
    that the count of unread articles derived purely from LIST ACTIVE will be
    more correct in the specific case that only the highest-numbered article
    was removed. That in turn may save some spurious notification of unread messages. But counts based solely on LIST ACTIVE responses are going to be inaccurate for the more common case (for servers that support article
    removal at all in their configuration) of an article that is *not* the highest-numbered article being removed. This is just the tradeoff of using
    LIST ACTIVE for article numbers; if the client wants more accurate
    information, it needs to use one of the other commands like OVER. But of
    course those are inherently heavier-weight, due to the increased amount of information returned and the requirement of a round trip per group.
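
    A quick Python sketch of that tradeoff, using invented numbers and
    hypothetical function names. The estimate is what a reader could derive
    from LIST ACTIVE alone; the exact count needs the per-article numbers
    that a command such as OVER would return:

```python
def estimate_unread(high, last_read):
    """Unread count as a reader might derive it from LIST ACTIVE alone."""
    return max(high - last_read, 0)


def exact_unread(present, last_read):
    """Unread count from the actual article numbers (e.g. via OVER)."""
    return sum(1 for n in present if n > last_read)


# Articles 1-10 were posted; 5 (mid-group) and 10 (the highest) were removed.
present = {1, 2, 3, 4, 6, 7, 8, 9}
last_read = 3

with_last_as_high = estimate_unread(10, last_read)
with_true_high = estimate_unread(9, last_read)
actual = exact_unread(present, last_read)
print(with_last_as_high, with_true_high, actual)
```

    An accurate high water mark removes the error caused by the missing top
    article, but the gap at 5 still makes the estimate one too high.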

    Of course, the phrasing doesn't mandate that the high water mark
    decrease in this case, though it seems to allow for that option.

    Yes, I agree that decreasing the high water mark is definitely allowed.

    Yes, that makes sense. But this seems more like a "loophole" or
    "shortcut"... did the RFC actually intend it work this way? Or everybody
    was already doing it that way before RFC 3977, and that part of it has
    just been ignored?

    The latter. You'll find that this is really common in the netnews RFCs:
    there was such a long gap between the initial RFCs and the updates, and so
    much changed about the implementations in a not-entirely-coherent way,
    that the RFCs allow for a lot of variations of behavior to avoid declaring existing implementations nonconformant with the new standard except where
    that seemed warranted.

    The primary purpose of the RFC refresh cycle was not to try to clean up
    all the existing implementations, but instead to document what the
    behavior was in as clean of a way as possible so that new software knew
    what it could rely on.

    For context, I am working on my own NNTP implementation and I've really
    been scratching my head about how to handle this case. It seems like if
    I'm able to provide a more accurate response (a lower high water mark),
    that would be preferred, but maybe there is a good reason not to do so?
    (The obvious one being it requires extra bookkeeping).

    If you can provide a more accurate high water mark, I don't see any
    drawback to doing so. The only possible downside that I can imagine is
    that some client will be surprised by the high water mark decreasing,
    since it has never seen a server that would do that, and might issue some
    sort of warning to the user. I suppose such a client could exist. But it
    would surprise me a bit; decreasing the high water mark is clearly allowed
    by the RFC.

    To make an extreme example, if a group with a lot of articles had all of
    them except the low water mark article deleted (and "last" is 3000), you could have a response like:

    211 3000 1 3000 misc.test

    [and the count has to be at least 3000, per the RFC, so we can't even have 211 1 1 3000 misc.test to indicate there are definite gaps]

    Yup, and in the days when spam and spam cancels were fighting it out, it
    wasn't uncommon to see things like that happen in some groups.

    when in reality, this is the "most accurate" response:

    211 1 1 1 misc.test

    Though now that begs the question what to display if that last article (1) were then deleted. I presume in the first case, it would naturally be:

    211 0 3000 2999 misc.test

    And this is probably the best response. In the second case, it seems more
    ambiguous what the most logical reply would be, since you could start with
    either "last" or whatever the last true high water mark was
    (e.g. 211 0 0 1 misc.test).

    *If* your server would never reinstate articles, the best response in the
    sense of giving the client the most information would be to increase the
    low water mark and return a high of 2999 and a low of 3000, because the
    client can then forget about all of those deleted articles permanently.
    But as the RFC says, if you might ever reinstate those articles, you're
    not allowed to increase the low water mark like that, so I think the best response would be to return high 0 and low 1 if the articles may later reappear.
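
    As a Python sketch of that rule, reusing the numbers from the example
    above (the function name and the may_reinstate flag are illustrative
    assumptions about how a new server might expose the choice):

```python
def empty_group_response(group, old_low, last, may_reinstate):
    """Build a 211 reply for a group whose last remaining article was removed."""
    if may_reinstate:
        low = old_low   # low must not move past articles that may come back
    else:
        low = last      # follows the numbers used in the example above
    high = low - 1      # empty group: high is one less than low
    return f"211 0 {low} {high} {group}"


print(empty_group_response("misc.test", 1, 3000, may_reinstate=True))
print(empty_group_response("misc.test", 1, 3000, may_reinstate=False))
```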

    That brings me to another observation: I've noticed that most inactive
    newsgroups in INN return high 0 and low 1 (at least for those I've
    analyzed in responses for Usenet groups), which seemed odd to me as I
    would have thought INN would naturally take the <high> in the active file,
    and taking low = <high> and high = low - 1, return something more like
    211 0 3000 2999 misc.test

    Is it possible that those groups have never received traffic on that
    server? That's the response I would expect if the server has never stored
    an article for that group.

    The benefit there is that from the output you could see how many articles were in the group historically, from the high water mark, even if all have since been deleted - it's extra context that can be conveyed "for free".
    Is there a reason INN just uses 1/0 instead? This seems like one case
    where using <last> directly would actually really make sense for a client.

    I *think* that if it had ever received traffic (and the news administrator hadn't rebuilt the active file, etc.), you would see the result that you
    are expecting.

    Okay, so performance > strict correctness (which is a reasonable answer,
    when the client can't really say it wasn't correct).

    Exactly.

    Though I don't see why the implementation could not be such that it
    would be just as fast - either perhaps through an in-memory cache of low/high/count for all groups, kept in sync with the active file, or
    even more simply, storing this all in the active file itself, i.e. with
    a format like:

    <name> <last> <reportedhigh> <reportedlow> <count> <status>

    There is no reason that one could not make it fast. It's just extra
    development work and extra bookkeeping that no one has implemented for
    INN. The overview database, where the more accurate information is stored,
    is optimized for per-group retrieval and, for some of the overview
    backends currently implemented, iterating through all of the groups to get current low and high marks would be slow.

    I'm not proposing either of these for INN specifically, but wondering if either would make sense in the design of new software. If I had to
    guess, maybe the active file hasn't been extended like this for compatibility/portability reasons?

    Yes, exactly, and just because this didn't seem important enough to put
    effort into.

    The low/high water marks and count could be computed at startup by
    scanning the directories, and then stored in memory, but now I'm kind of tempted by the idea of just having it all in an "extended" active file.

    If I were writing a news server from scratch, I would embrace modern
    databases as early as possible and not try to reinvent that wheel. Long experience with INN is that the reinvention of various databases is one of
    the hardest parts of INN to maintain and handing that all off to some
    suitable library or external service would be very attractive.

    1. It seems that convention is to "lie" about the high water mark and
    just hand out "last" instead, for performance, at least the way INN is implemented (since the client can't tell that we lied). Considering it
    feels against the *spirit* of the RFC, setting aside performance, do you foresee any problems with choosing to provide an accurate high water
    mark? I can't see how it would break compatibility, since the RFC
    already says the high water mark CAN decrease, even if nobody does it
    today.

    I suspect it would be fine to do that.

    2. Is INN's active file (or file system more generally) intended to be portable with other news servers?

    Not really, no. Some of INN's on-disk data structures match the format of
    files specified in the RFC for convenience reasons, but most of INN's
    on-disk data structures (apart from the spool if tradspool is used) are
    very, very specific to INN.

    If not, it seems like I could just extend the active file to add the
    "true" high water mark along with the article count, and then just use
    that for both LIST ACTIVE and GROUP. Then I could be truthful with no performance hit.

    If you are reworking the format, I would find a way to put the newsgroup description into the same file, because desynchronization between active
    and newsgroups is a long-standing annoyance in INN. And at that point I
    would consider some sort of structured database with fast writes. :)
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 16:37:34 2026
    From Newsgroup: news.software.nntp

    On 5/1/2026 1:38 PM, Russ Allbery wrote:
    There are two fairly obvious ways to handle the high water mark:

    1. Keep low and high water marks in only one place, increment the high
    water mark on every new article arrival as part of article numbering,
    and never decrement it because it doubles as the source of the next
    article number for that group.

    2. Keep internal "next article number" data for each group but report the
    high water mark based on what articles are in the spool at the time.

    Historically, INN (and C News, I'm fairly sure) always did 1, so that was
    very widespread practice. I'm fairly sure that we wouldn't have chosen to
    declare it nonconformant. 2 is arguably more correct, so the language
    should have been (and was) written to *allow* it, but we wouldn't have
    *required* it and ruled the historic INN behavior non-compliant. INN's
    ability to do 2 in theory based on the overview database information is
    new in INN 2.x as
    I recall. Before that, OVcancel was not a thing, and there was no way to remove the information about the cancelled article from overview before
    the next nightly expire, so there was no independent source of truth about the current article numbers beyond checking the spool.
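
The difference between the two options can be sketched with a toy in-memory group. This is only a model under stated assumptions: the `Group` class and its method names are hypothetical, and the set of article numbers stands in for whatever the spool or overview would report.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    last: int = 0                               # highest number ever assigned
    articles: set = field(default_factory=set)  # numbers currently present

    def add(self):
        """Assign the next article number; numbers are never reused."""
        self.last += 1
        self.articles.add(self.last)
        return self.last

    def remove(self, number):
        self.articles.discard(number)

    def high_option_1(self):
        # Option 1: the high water mark doubles as the numbering counter.
        return self.last

    def high_option_2(self):
        # Option 2: report what is actually present right now.
        # (Empty-group conventions are a separate question; fall back to last.)
        return max(self.articles) if self.articles else self.last
```

After adding three articles and removing article 3, option 1 still reports 3 while option 2 reports 2, and the next article is numbered 4 either way.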

    I think I understand the landscape now - both are compliant but choose
    to gravitate slightly more towards either performance or correctness.

    I'm more surprised if it's the case that maybe this is the first time
    anyone is considering #2 in a design.

    Thinking about the problem this morning, I do see a small but real
    advantage to the client in getting an accurate high water mark: It means
    that the count of unread articles derived purely from LIST ACTIVE will be more correct in the specific case that only the highest-numbered article
    was removed. That in turn may save some spurious notification of unread messages. But counts based solely on LIST ACTIVE responses are going to be inaccurate for the more common case (for servers that support article
    removal at all in their configuration) of an article that is *not* the highest-numbered article being removed. This is just the tradeoff of using LIST ACTIVE for article numbers; if the client wants more accurate information, it needs to use one of the other commands like OVER. But of course those are inherently heavier-weight, due to the increased amount of information returned and the requirement of a round trip per group.
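
A small worked example of that tradeoff, assuming a client that estimates unread articles as the reported high water mark minus the last article number it has read. The function names and the ten-article group are made up for illustration.

```python
def estimated_unread(reported_high, last_read):
    """Unread estimate a client can derive from LIST ACTIVE alone."""
    return max(0, reported_high - last_read)


def actual_unread(present, last_read):
    """Number of articles above last_read that really exist."""
    return sum(1 for n in present if n > last_read)


present = set(range(1, 11))      # articles 1..10
last_read = 8
top_removed = present - {10}     # the highest-numbered article was removed
middle_removed = present - {9}   # some other unread article was removed
```

When article 10 is removed, a server reporting `last` (10) leads to an estimate of 2 while an accurate high water mark (9) gives the true value of 1. When article 9 is removed instead, both kinds of server report 10 and the estimate is 2 against an actual 1.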

    I think I'll definitely want to consider that angle - hitherto, I've
    been directly using Eternal September in my newsreader (which is Mozilla-based) and I've noticed for some large groups, I see a very high count, and then when I click on the group, it changes radically. Just
    now, I did a packet capture and I only see it using the GROUP command
    (and not LIST ACTIVE at all), but from the configuration that INN
    allows, I wonder if maybe Eternal September has their INN (for indeed,
    they are using INN) configured to return estimated group counts in most
    cases, and thus my reader only sees the correct count when I click on
    the group.

    Could've been some other command, but that makes me desire even more
    strongly to always provide accurate counts as well, if nothing else to
    avoid irritating me :)

    If you can provide a more accurate high water mark, I don't see any
    drawback to doing so. The only possible downside that I can imagine is
    that some client will be surprised by the high water mark decreasing,
    since it has never seen a server that would do that, and might issue some sort of warning to the user. I suppose such a client could exist. But it would surprise me a bit; decreasing the high water mark is clearly allowed
    by the RFC.

    To make an extreme example, if a group with a lot of articles had all of
    them except the low water mark article deleted (and "last" is 3000), you
    could have a response like:

    211 3000 1 3000 misc.test

    [and the count has to be at least 3000, per the RFC, so we can't even have
    211 1 1 3000 misc.test to indicate there are definite gaps]

    Yup, and in the days when spam and spam cancels were fighting it out, it wasn't uncommon to see things like that happen in some groups.

    when in reality, this is the "most accurate" response:

    211 1 1 1 misc.test

    Though now that begs the question what to display if that last article (1)
    were then deleted. I presume in the first case, it would naturally be:

    211 0 3000 2999 misc.test

    And this is probably the best response. In the second case, it seems more
    ambiguous what the most logical reply would be, since you could start with
    either "last" or whatever the last true high water mark was (e.g. 211 0 0
    1 misc.test).

    *If* your server would never reinstate articles, the best response in the sense of giving the client the most information would be to increase the
    low water mark and return a high of 2999 and a low of 3000, because the client can then forget about all of those deleted articles permanently.
    But as the RFC says, if you might ever reinstate those articles, you're
    not allowed to increase the low water mark like that, so I think the best response would be to return high 0 and low 1 if the articles may later reappear.

    Hmm - yet another fork in the road!

    How often does article reinstatement really occur and under what circumstances? Purely by the local newsmaster? I probably wouldn't plan
    to reinstate articles that expired due to the server's local policy; are
    there any other reasons that might happen? Undoing a cancel - is that a
    thing? (And beyond that, without some kind of recycle bin, the article
    would have to be restored from some kind of backup.)

    My personal uninformed preference at the moment is probably to refrain
    from reinstating articles, if only because empty groups would then show
    the historical high water mark count in LIST ACTIVE, which to me would
    be *very* useful and interesting for statistical and informational purposes.

    If NNTP had something analogous to UIDVALIDITY in IMAP, where one would normally increase the low water mark but could "reset" it in some
    unforeseen circumstance, that would allow for both behaviors, but there
    isn't as far as I know. I know you mentioned water marks can change in potentially non-compliant ways if a group is renumbered, so I guess INN
    may not even be consistent in other cases.

    That brings me to another observation: I've noticed that most inactive
    newsgroups in INN return high 0 and low 1 (at least for those I've
    analyzed in responses for Usenet groups), which seemed odd to me as I
    would have thought INN would naturally take the <high> in the active file,
    and taking low = <high> and high = low - 1, return something more like
    211 0 3000 2999 misc.test

    Is it possible that those groups have never received traffic on that
    server? That's the response I would expect if the server has never stored
    an article for that group.

    It's possible, though it would surprise me a little - this was running
    LIST ACTIVE on Eternal September's INN server, which I think has been
    around for a while, but maybe some of these are super old groups that
    have been inactive a long while.

    For active groups, I do see low water marks that are greater than 1, so
    for these groups, there's a commitment to not reinstate articles below
    the present low water mark. So is article reinstatement in an empty
    group vs non-empty really a special case? To allow unconditional reinstatement, the low water mark would always have to be 1, which is
    not really that meaningful. (So intuitively, I would feel that it makes
    more sense to keep the low water mark as high as is legal at any given
    point, assuming reinstatement isn't likely to occur.)

    If I were writing a news server from scratch, I would embrace modern databases as early as possible and not try to reinvent that wheel. Long experience with INN is that the reinvention of various databases is one of the hardest parts of INN to maintain and handing that all off to some suitable library or external service would be very attractive.

    Isn't the database more of a "cache" in INN, of technically
    reconstructible data? (in contrast to the active file, which has <last>
    which is not reconstructible).

    For LIST responses, I don't see how using a database would be faster
    than reading through one of these files, especially if you already have
    to do ACL checks and wildmat matches on every group - I would think
    those would be the bottleneck. For articles within a group, .overview
    seems fairly efficient.

    GROUP would require a linear scan of the active file for its response,
    to find the group, and a database could be faster in that case, but
    apart from single-group responses, is there any case where a database
    would result in a noticeable speedup? And at that point, maybe a simple
    hash table with pointers to the beginning of the corresponding line in
    the active file would close the performance gap, without needing to add
    a database to the picture.
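
The hash-table idea above can be sketched as a dictionary from group name to the byte offset of its line. This assumes the conventional `<name> <high> <low> <status>` line layout; `build_offset_index` and `lookup` are hypothetical helpers, and an in-memory buffer stands in for the real file.

```python
import io


def build_offset_index(f):
    """Map each group name to the byte offset of its line in f (binary mode)."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        name = line.split(b" ", 1)[0].decode("utf-8")
        index[name] = offset
    return index


def lookup(f, index, group):
    """Seek straight to one group's line instead of scanning the file."""
    f.seek(index[group])
    name, high, low, status = f.readline().decode("utf-8").split()
    return name, int(high), int(low), status


active = io.BytesIO(
    b"misc.test 0000003000 0000000001 y\n"
    b"news.software.nntp 0000012345 0000012000 y\n"
)
index = build_offset_index(active)
```

The offsets stay valid only while lines keep their length and position, so any rewrite of the file (a group added, removed, or a variable-length field changed) means rebuilding the index.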

    (I'm not opposed to a database if it really made sense, but it seems
    like a few flat files can get the job done here about as well - though
    maybe I'm missing something obvious.)

    2. Is INN's active file (or file system more generally) intended to be
    portable with other news servers?

    Not really, no. Some of INN's on-disk data structures match the format of
    files specified in the RFC for convenience reasons, but most of INN's
    on-disk data structures (apart from the spool if tradspool is used) are
    very, very specific to INN.

    Gotcha, makes sense.

    Really, was more wondering about the active file than anything else.
    While not officially standardized anywhere, it seems in practice there
    are a few standardized files with standardized formats:

    .active (LIST ACTIVE)
    .active.times (LIST ACTIVE.TIMES)
    .newsgroups (LIST NEWSGROUPS)
    <group>/.overview (8 standardized fields)
    <group>/<article number> for article naming

    My plan was to go with these, and possibly bastardize .active in the
    process in a way nobody else has done (adding <real high> and <count>).
    This has the downside of deviating from the canonical format for the
    file, which does seem to be pretty universal amongst existing software.
    It just seems silly to me to add another file just to avoid breaking compatibility. (My thinking: worst case, if needed, a migration could
    always be done using NNTP itself anyways, to another server - in which
    case the format of my active file is nobody else's business.)

    I think we've established there's no good reason the real high water
    mark couldn't be stored here, and I don't think there's any reason the
    count couldn't be either, since anything that changes the count updates
    the active file already.

    If not, it seems like I could just extend the active file to add the
    "true" high water mark along with the article count, and then just use
    that for both LIST ACTIVE and GROUP. Then I could be truthful with no
    performance hit.

    If you are reworking the format, I would find a way to put the newsgroup description into the same file, because desynchronization between active
    and newsgroups is a long-standing annoyance in INN. And at that point I
    would consider some sort of structured database with fast writes. :)

    Hmm, could you elaborate a bit more on the kind of desynchronization
    that tends to happen?

    If I recall, the RFC states that the list of groups from LIST ACTIVE and
    LIST NEWSGROUPS can differ (though perhaps this was worded that way to
    prevent existing installations from violating the spec, not necessarily
    to condone that practice? Ideally, would the list of groups always match identically? Or are there ever good reasons they should differ?)

    If combining .newsgroups into .active, it makes me wonder, why not go
    further and also combine .active.times into .active? Were these
    initially separate simply because .active.times came later and wanted to
    avoid breaking the format of .active, or for some other good reason? It
    would seem there *could* theoretically be just one big global file, like so:

    .active.extended

    <group> <last> <high> <low> <count> <creation epoch> <creator name> <description>

    The only thing I can think of (and this applies to .newsgroups but not .active.times) is that if the description is changed, its length can
    change, so now the whole active file needs to be rewritten. But this is probably an uncommon enough occurrence (maybe even less common than
    group creation or deletion?) that the performance implication could be ignored.

    I had also previously assumed .newsgroups was separate because group descriptions contain spaces/tabs, which would complicate the parsing if combined with other stuff. But if I made it the last entry on each line,
    it wouldn't pose an issue.
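
Parsing such a line with the description as the final field might look like the sketch below. It assumes single spaces between fields and a creator name without spaces; `ExtendedEntry` and `parse_extended_line` are hypothetical names, and the sample values in the usage note are invented.

```python
from typing import NamedTuple


class ExtendedEntry(NamedTuple):
    group: str
    last: int
    high: int
    low: int
    count: int
    created: int
    creator: str
    description: str


def parse_extended_line(line):
    """Split on the first seven spaces only; the rest is the description."""
    parts = line.rstrip("\n").split(" ", 7)
    if len(parts) != 8:
        raise ValueError("expected 8 fields, got %d" % len(parts))
    group, last, high, low, count, created, creator, description = parts
    return ExtendedEntry(group, int(last), int(high), int(low),
                         int(count), int(created), creator, description)
```

For example, `parse_extended_line("misc.test 3000 1 1 1 1000000000 usenet@example.org For testing of network software.")` keeps the whole trailing text, spaces included, as the description.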

    And of course, now all the LIST handlers would need to parse the file
    and send the right info, but that's not a big deal either. Maybe
    slightly more contention for the file with locking, is all I can think.

    Most commands would then simply do a full scan of this file and get what
    they need, either for all groups or just a specific group.

    Writes (new or deleted articles) would generally update <last>, <high>,
    <low>, and/or <count>, and while existing servers don't do that, they
    *are* already updating *something* in the file (<last> for new posts,
    and at least one of the water marks for deletions), so updating the
    other metadata is effectively "free".
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 16:56:57 2026
    From Newsgroup: news.software.nntp

    On 5/1/2026 4:37 PM, InterLinked wrote:
    On 5/1/2026 1:38 PM, Russ Allbery wrote:
    Thinking about the problem this morning, I do see a small but real
    advantage to the client in getting an accurate high water mark: It means
    that the count of unread articles derived purely from LIST ACTIVE will be
    more correct in the specific case that only the highest-numbered article
    was removed. That in turn may save some spurious notification of unread
    messages. But counts based solely on LIST ACTIVE responses are going to be
    inaccurate for the more common case (for servers that support article
    removal at all in their configuration) of an article that is *not* the
    highest-numbered article being removed. This is just the tradeoff of using
    LIST ACTIVE for article numbers; if the client wants more accurate
    information, it needs to use one of the other commands like OVER. But of
    course those are inherently heavier-weight, due to the increased amount of
    information returned and the requirement of a round trip per group.

    I think I'll definitely want to consider that angle - hitherto, I've
    been directly using Eternal September in my newsreader (which is Mozilla-based) and I've noticed for some large groups, I see a very high count, and then when I click on the group, it changes radically. Just
    now, I did a packet capture and I only see it using the GROUP command
    (and not LIST ACTIVE at all), but from the configuration that INN
    allows, I wonder if maybe Eternal September has their INN (for indeed,
    they are using INN) configured to return estimated group counts in most
    cases, and thus my reader only sees the correct count when I click on
    the group.

    Could've been some other command, but that makes me desire even more strongly to always provide accurate counts as well, if nothing else to
    avoid irritating me :)

    And my newsreader just did exactly this annoying thing, and I captured
    the commands. This is using comp.os.linux.misc, via Eternal September,
    as an example:

    My newsreader ran GROUP and got back:

    211 29344 61512 90857

    Then, it immediately ran XOVER 90857-90857 and got a response to that (I
    don't think this is relevant though).

    In a period of about 1 second, the unread count for the group went from
    6, to 20-something thousand, back to 6. I think this is the phenomenon
    you were describing, and yes, it annoys the heck out of me! But now it
    clicks as to why it behaves like that.

    The server really only has 946 articles[1]; yet, INN is reporting it has 29,344 (likely because this is larger than the value of groupexactcount,
    so it just estimated it). I know the overview database has the count,
    though I guess that value is not necessarily up to date, for reasons I
    don't understand currently - presumably keeping it up to date would add non-constant overhead with INN's current architecture.

    But if my reasoning is sound for the proposal I've been contemplating,
    then I could easily write/read the count for "free", and my software
    would avoid these sorts of "UI glitches" in readers.
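
A quick arithmetic check on the reply quoted above, assuming the usual `211 count low high` field order for GROUP (the group name is absent from the quoted line). `parse_group_reply` is a hypothetical helper.

```python
def parse_group_reply(line):
    """Return (count, low, high) from a 211 GROUP reply."""
    code, count, low, high = line.split()[:4]
    if code != "211":
        raise ValueError("not a successful GROUP reply: " + line)
    return int(count), int(low), int(high)


count, low, high = parse_group_reply("211 29344 61512 90857")
span = high - low + 1   # most articles that could exist between the marks
```

The span between the marks is 29346, so the reported 29344 sits just under that bound. That is consistent with a count estimated from the water marks rather than counted exactly, and it is far above the 946 articles the server's web page lists.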

    [1] https://www.eternal-september.org/groups.php?hierarchy=comp
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 14:14:21 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:

    How often does article reinstatement really occur and under what circumstances? Purely by the local newsmaster?

    INN doesn't support it at all, although there are some corruption repair
    tools that can create similar effects. It therefore never attempts to
    reserve low water mark space.

    I think people have discussed article reinstatement in theory, usually
    around spam filtering scenarios where an article is quarantined as possible
    spam and then later released. But I don't know if any server has actually implemented this, and therefore am not sure whether the discussion of this
    in the NNTP RFC is theoretical or based on some implementation. It was
    probably discussed at the time, but it's been more than 20 years and I
    don't remember, sadly.

    I probably wouldn't plan to reinstate articles that expired due to the server's local policy; are there any other reasons that might happen?
    Undoing a cancel - is that a thing? (And beyond that, without some kind
    of recycle bin, the article would have to be restored from some kind of backup.)

    There is no control message to undo a cancel, but of course the local administrator can do anything the software allows.

    If NNTP had something analogous to UIDVALIDITY in IMAP, where one would normally increase the low water mark but could "reset" it in some
    unforeseen circumstance, that would allow for both behaviors, but there
    isn't as far as I know.

    Correct, there's no such concept in NNTP.

    Is it possible that those groups have never received traffic on that
    server? That's the response I would expect if the server has never
    stored an article for that group.

    It's possible, though it would surprise me a little - this was running
    LIST ACTIVE on Eternal September's INN server, which I think has been
    around for a while, but maybe some of these are super old groups that
    have been inactive a long while.

    There are definitely Big Eight groups that haven't gotten any traffic for
    10-20 years. (Some moderated ones, at least.)

    For active groups, I do see low water marks that are greater than 1, so
    for these groups, there's a commitment to not reinstate articles below
    the present low water mark. So is article reinstatement in an empty
    group vs non-empty really a special case? To allow unconditional reinstatement, the low water mark would always have to be 1, which is
    not really that meaningful.

    Correct. The only requirement is to not increase the low water mark if
    you'd reinstate one of those older articles. The empty group isn't a
    special case. The special case is more "the articles were all removed down
    to the low water mark by something other than expiration," since
    presumably you would never reinstate expired articles, only ones removed
    by some other mechanism that may be erroneous, like cancels (which can be forged if one isn't using canlock or the like).

    If I were writing a news server from scratch, I would embrace modern
    databases as early as possible and not try to reinvent that wheel. Long
    experience with INN is that the reinvention of various databases is one
    of the hardest parts of INN to maintain and handing that all off to
    some suitable library or external service would be very attractive.

    Isn't the database more of a "cache" in INN, of technically
    reconstructible data? (in contrast to the active file, which has <last>
    which is not reconstructible).

    Well, I'm not sure I agree with the distinction you're making here, since
    the active file *is* a database. INN has a whole bunch of databases, some
    of which it stores as text files, but just because the format is a text
    file doesn't mean it isn't a database. INN definitely uses the active file like
    a database (hence the zero-padding).
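
The point about zero-padding can be illustrated with a sketch: when the number fields have a fixed width, one field can be overwritten in place without moving any other byte in the file. The 10-digit width, the `<name> <high> <low> <status>` layout, and the `overwrite_high` helper are assumptions for illustration, and an in-memory buffer stands in for the real file.

```python
import io

FIELD_WIDTH = 10  # assumed width of the zero-padded number fields


def overwrite_high(f, line_offset, name, new_high):
    """Overwrite the first numeric field of one line in place."""
    digits = str(new_high).zfill(FIELD_WIDTH).encode("ascii")
    if len(digits) != FIELD_WIDTH:
        raise ValueError("number does not fit in the fixed-width field")
    f.seek(line_offset + len(name.encode("utf-8")) + 1)  # skip "<name> "
    f.write(digits)


active = io.BytesIO(
    b"misc.test 0000003000 0000000001 y\n"
    b"alt.test 0000000007 0000000001 y\n"
)
overwrite_high(active, 0, "misc.test", 3001)
```

The file length and the position of every other line are unchanged after the write, which is what makes this kind of update cheap compared with rewriting the file.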

    In general, there is data in many of the databases that cannot be
    reconstructed from the spool, such as article arrival time and the record
    of rejected articles. Overview is a bit of a special case in that it
    can generally be regenerated solely from the spool, but that's just one of
    the (many) databases.

    For LIST responses, I don't see how using a database would be faster
    than reading through one of these files, especially if you already have
    to do ACL checks and wildmat matches on every group - I would think
    those would be the bottleneck.

    I don't think the database would be faster necessarily. I think it would
    be more maintainable and more consistent and have fewer of the numerous
    bugs we've run into with INN over the years. Having transactions, for
    instance, eliminates a whole set of corruption inconsistencies. Combining
    active and newsgroups eliminates a whole class of synchronization issues
    when processing control messages.
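
As a rough sketch of what that could look like, the example below keeps the water marks and the description in one SQLite row, so creating or removing a group is a single transaction. The table layout and function names are hypothetical; this is not INN's schema.

```python
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS groups (
               name        TEXT PRIMARY KEY,
               last        INTEGER NOT NULL,
               high        INTEGER NOT NULL,
               low         INTEGER NOT NULL,
               count       INTEGER NOT NULL,
               status      TEXT NOT NULL,
               description TEXT NOT NULL
           )"""
    )
    return db


def create_group(db, name, status, description):
    # "with db" commits on success and rolls back if an exception is raised,
    # so the marks and the description cannot end up half-written.
    with db:
        db.execute(
            "INSERT INTO groups VALUES (?, 0, 0, 1, 0, ?, ?)",
            (name, status, description),
        )


def remove_group(db, name):
    # One DELETE removes the marks and the description together.
    with db:
        db.execute("DELETE FROM groups WHERE name = ?", (name,))
```

Because there is only one row per group, a removed group cannot leave a stale description behind, and a crash in the middle of a control message leaves either the whole row or nothing.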

    For articles within a group, .overview seems fairly efficient.

    Well, we wrote a whole new overview mechanism because we didn't think it
    was sufficiently efficient. :) Using only a flat .overview file can be extremely slow for very large groups when clients request only a subset of
    the records (which is very common; they usually only care about the latest messages).

    A simple flat .overview text file is how INN 1.x worked, and it does
    indeed work fine for small groups. Everything works fine for small groups.

    GROUP would require a linear scan of the active file for its response,
    to find the group, and a database could be faster in that case, but
    apart from single-group responses, is there any case where a database
    would result in a noticeable speedup?

    In theory, a database may be able to do much faster prefix matching than a linear scan doing wildmat matching for, e.g., LIST ACTIVE news.*, but that would require converting wildmat expressions to something the database can understand with LIKE, which may not be possible in the general case.
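
A sketch of that conversion for the easy cases only, assuming patterns that use nothing beyond `*` and `?`. Anything with negation, comma-separated alternatives, brackets, or backslashes is reported as untranslatable so the caller can fall back to a linear scan. `wildmat_to_like` and `groups_matching` are hypothetical helpers, and the `groups` table with a `name` column is assumed.

```python
import sqlite3


def wildmat_to_like(pattern):
    """Translate a simple wildmat to a LIKE pattern, or return None."""
    if any(c in pattern for c in "[]!,\\"):
        return None  # not expressible as a single LIKE in this sketch
    out = []
    for c in pattern:
        if c == "*":
            out.append("%")
        elif c == "?":
            out.append("_")
        elif c in "%_":
            out.append("\\" + c)  # literal % or _ must be escaped for LIKE
        else:
            out.append(c)
    return "".join(out)


def groups_matching(db, pattern):
    """Return matching names, or None if a linear wildmat scan is needed."""
    like = wildmat_to_like(pattern)
    if like is None:
        return None
    rows = db.execute(
        "SELECT name FROM groups WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (like,),
    )
    return [name for (name,) in rows]
```

One caveat with this approach: SQLite's LIKE is case-insensitive for ASCII letters by default, so this is not an exact match for wildmat semantics if group names ever differ only by case.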

    And at that point, maybe a simple hash table with pointers to the
    beginning of the corresponding line in the active file would close the performance gap, without needing to add a database to the picture.

    See, you're going down the same path that all the INN authors, myself
    included, have gone down: You can see a simple data structure that would
    solve the problem that you have and it seems more straightforward to just implement that than use a "full database" which feels like it would have a
    ton of overhead.

    And you can do that! That's how INN works! I'm just saying that as someone
    with a lot of years of experience maintaining that code with a simple hash table and whatnot, a whole lot of time and bugs would have been saved by
    just using an off-the-shelf database. At, of course, the cost of having to handle database transitions and implementation changes and BerkeleyDB
    getting bought by Oracle and then killed and so forth.

    INN now has a SQLite overview implementation and I think that's the right direction to go.

    Really, was more wondering about the active file than anything else. While not officially standardized anywhere, it seems in practice there are a few standardized files with standardized formats:

    .active (LIST ACTIVE)
    .active.times (LIST ACTIVE.TIMES)
    .newsgroups (LIST NEWSGROUPS)
    <group>/.overview (8 standardized fields)
    <group>/<article number> for article naming

    The last is often not used these days because it has a lot of poor
    performance properties.

    There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably
    fine as configuration files with some in-memory representation, though,
    since they're usually very small.

    LIST DISTRIB.PATS
    LIST MODERATORS
    LIST OVERVIEW.FMT

    I think we've established there's no good reason the real high water mark couldn't be stored here, and I don't think there's any reason the count couldn't be either, since anything that changes the count updates the
    active file already.

    Yes, I agree that seems reasonable.

    If you are reworking the format, I would find a way to put the
    newsgroup description into the same file, because desynchronization
    between active and newsgroups is a long-standing annoyance in INN. And
    at that point I would consider some sort of structured database with
    fast writes. :)

    Hmm, could you elaborate a bit more on the kind of desynchronization that tends to happen?

    Deleting the group doesn't remove the description line. The control
    message processing dies in the middle because the server crashes and a
    line gets added to one and not the other. The description gets added to
    the newsgroups file more than once. Some of these are bugs, but that's
    part of the point: It's theoretically possible to keep the files fully in
    sync, but in practice this has been an area where INN has had tons of
    bugs over the years.

    If I recall, the RFC states that the list of groups from LIST ACTIVE and
    LIST NEWSGROUPS can differ (though perhaps this was worded that way to prevent existing installations from violating the spec, not necessarily
    to condone that practice? Ideally, would the list of groups always match identically? Or are there ever good reasons they should differ?)

    Yeah, I think that may be toleration for lots and lots of historic bugs. Really, there's no reason why this should be the case except for the
    narrow case of a group being added or removed between the two commands.

    If combining .newsgroups into .active, it makes me wonder, why not go
    further and also combine .active.times into .active?

    Yes, indeed.

    Were these initially separate simply because .active.times came later
    and wanted to avoid breaking the format of .active,

    Yup, exactly.

    <group> <last> <high> <low> <count> <creation epoch> <creator name> <description>

    Note that you now have a space-separated file except for the last field
    and you have a problem if you want to add another field that you didn't
    think of originally. I would really want to store this as some sort of structured file because you have some fields there (at least the
    description, maybe the creator name) that can contain a variety of
    characters.
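
One way to picture a structured alternative is one JSON object per line. This is only an illustration of the idea, not a format anyone in the thread proposed; the field names, helper names, and sample values are invented.

```python
import json


def dump_record(record):
    """Serialize one group record as a single line of JSON."""
    return json.dumps(record, sort_keys=True)


def load_record(line):
    return json.loads(line)


record = {
    "group": "misc.test",
    "last": 3000,
    "high": 1,
    "low": 1,
    "count": 1,
    "created": 1000000000,
    "creator": "usenet@example.org",
    "description": "For testing\tof network software",
}
line = dump_record(record)
```

Tabs and other control characters in the description are escaped, so each record stays on one line, and a field added later is just another key that older readers can ignore.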

    The only thing I can think of (and this applies to .newsgroups but not .active.times) is that if the description is changed, its length can
    change, so now the whole active file needs to be rewritten.

    Yup, that too. That's the big argument for a database, which supports
    updates without having to rewrite the whole file.

    But this is probably an uncommon enough occurrence (maybe even less
    common than group creation or deletion?) that the performance
    implication could be ignored.

    It's roughly as common as group creation or deletion in my experience.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 14:18:42 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:

    The server really only has 946 articles[1]; yet, INN is reporting it has 29,344 (likely because this is larger than the value of groupexactcount,
    so it just estimated it). I know the overview database has the count,
    though I guess that value is not necessarily up to date, for reasons I
    don't understand currently - presumably keeping it up to date would add non-constant overhead with INN's current architecture.

    I don't know if it's the case here (I don't know if Eternal September even expires articles), but historically another really common reason for this pattern is that the very early article that's holding down the low water
    mark was crossposted to some other group (traditionally *.answers) with a
    much longer retention and the articles after it have expired.

    Note that the article count is not really useful to the news reader client under normal circumstances because the news reader often does not care
    about how many *total* articles the group contains. If the user has been reading the group (the common case), the news reader really cares about
    how many *unread* articles the group has, and for that the article count
    is basically useless. The article count as returned by NNTP is pretty much
    only useful for groups that you have never read, or haven't read for so
    long that your read mark is below the low water mark.
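
    As an illustration of that point, here is a rough Python sketch of how a
    client could estimate unread articles from the water marks and its own
    read mark, without using the article count at all. The function names
    and numbers are hypothetical, and because article numbers can have holes
    (cancelled or expired articles), both values are only upper bounds.

```python
def estimated_total(low, high):
    """Upper bound on the article count implied by the water marks alone."""
    return max(0, high - low + 1)


def estimated_unread(low, high, last_read):
    """Upper bound on unread articles for a reader whose read mark is last_read.

    Articles numbered at or below last_read are treated as read. Holes in
    the numbering are not known here, so the result can overcount.
    """
    first_unread = max(low, last_read + 1)
    return max(0, high - first_unread + 1)


low, high = 100, 1000  # made-up water marks
total = estimated_total(low, high)
caught_up = estimated_unread(low, high, last_read=990)
never_read = estimated_unread(low, high, last_read=0)
print(total, caught_up, never_read)
```

    Only in the last case, where the read mark is below the low water mark,
    does the estimate coincide with a group-wide figure.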
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 20:29:28 2026
    From Newsgroup: news.software.nntp

    On 5/1/2026 5:14 PM, Russ Allbery wrote:
    InterLinked <nntp@phreaknet.org> writes:
    Isn't the database more of a "cache" in INN, of technically
    reconstructible data? (in contrast to the active file, which has <last>
    which is not reconstructible).

    Well, I'm not sure I agree with the distinction you're making here, since
    the active file *is* a database. INN has a whole bunch of databases, some
    of which it stores as text files, but just because the format is a text
    file doesn't make it not a database. INN definitely uses the active file like
    a database (hence the zero-padding).

    Sorry, to be clear, I meant database in the sense of something like
    SQLite or MySQL, not using a text file under direct control of the
    program as a store.

    For articles within a group, .overview seems fairly efficient.

    Well, we wrote a whole new overview mechanism because we didn't think it
    was sufficiently efficient. :) Using only a flat .overview file can be extremely slow for very large groups when clients request only a subset of the records (which is very common; they usually only care about the latest messages).

    Makes sense - I'm assuming the new mechanism is a database of each
    article, effectively, so you can just select the articles of interest?

    In theory, a database may be able to do much faster prefix matching than a linear scan doing wildmat matching for, e.g., LIST ACTIVE news.*, but that would require converting wildmat expressions to something the database can understand with LIKE, which may not be possible in the general case.

    Another good point, thank you.
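
    For the simple cases, the wildmat-to-LIKE conversion mentioned above
    could look roughly like the Python sketch below. It is deliberately
    partial: only "*" and "?" are translated, and anything else (commas,
    negation, brackets, backslashes) returns None so the caller can fall
    back to a linear scan. The function name and table are hypothetical.
    Note also that SQLite's LIKE is case-insensitive for ASCII by default,
    and whether a prefix pattern can use an index depends on the engine and
    its collation settings.

```python
import sqlite3


def wildmat_to_like(pattern):
    """Translate a simple wildmat to a LIKE pattern, or return None.

    Only '*' and '?' are handled. Anything using ',', '!', '[' or a
    backslash is rejected so the caller can fall back to a linear scan.
    """
    if any(ch in pattern for ch in ",![\\"):
        return None
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in "%_":
            out.append("\\" + ch)  # keep literal; see ESCAPE in the query
        else:
            out.append(ch)
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE groups (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO groups VALUES (?)",
    [("news.software.nntp",), ("news.groups",), ("misc.test",), ("newsXgroups",)],
)

like = wildmat_to_like("news.*")
rows = conn.execute(
    "SELECT name FROM groups WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
    (like,),
).fetchall()
matched = [name for (name,) in rows]
print(like, matched)
```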

    And at that point, maybe a simple hash table with pointers to the
    beginning of the corresponding line in the active file would close the
    performance gap, without needing to add a database to the picture.
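
    A minimal sketch of that idea in Python, assuming one group per line
    with the group name as the first field. The helper names are
    hypothetical and an in-memory buffer stands in for the active file.

```python
import io


def build_offset_index(f):
    """Map each group name to the byte offset where its line starts."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        if line.strip():
            name = line.split(None, 1)[0]
            index[name.decode("ascii")] = offset
    return index


def lookup(f, index, group):
    """Seek straight to the group's line instead of scanning the file."""
    offset = index.get(group)
    if offset is None:
        return None
    f.seek(offset)
    return f.readline().decode("ascii").rstrip("\n")


# An in-memory stand-in for the active file; the entries are made up.
active = io.BytesIO(
    b"misc.test 0000000012 0000000001 y\n"
    b"news.software.nntp 0000000946 0000000001 y\n"
)
index = build_offset_index(active)
print(lookup(active, index, "news.software.nntp"))
```

    The index stays valid only as long as no line changes length, which is
    presumably part of why fixed-width, zero-padded numbers are attractive
    in this kind of file.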

    See, you're going down the same path that all the INN authors, myself included, have gone down: You can see a simple data structure that would solve the problem that you have and it seems more straightforward to just implement that than use a "full database" which feels like it would have a ton of overhead.

    And you can do that! That's how INN works! I'm just saying that as someone with a lot of years of experience maintaining that code with a simple hash table and whatnot, a whole lot of time and bugs would have been saved by
    just using an off-the-shelf database. At, of course, the cost of having to handle database transitions and implementation changes and BerkeleyDB
    getting bought by Oracle and then killed and so forth.

    Thanks, this is helpful perspective. I think I still need to sleep on
    this a bit but hearing about your experience here is really valuable.

    Honestly, I was really set on just using flat files before but there are
    some compelling reasons you've brought up. Maybe I'll abstract things in
    a way such that I can start with flat files and add a DB (SQLite or
    other) backend option later that could be used instead. I was trying to
    avoid that complexity but it might be worth it.

    If going the database route, I'm assuming you would just recommend
    SQLite for everything? I would guess a regular RDBMS like MariaDB would
    be overkill (and possibly cause issues if the server weren't local).

    Really, was more wondering about the active file than anything else. While
    not officially standardized anywhere, it seems in practice there are a few
    standardized files with standardized formats:

    .active (LIST ACTIVE)
    .active.times (LIST ACTIVE.TIMES)
    .newsgroups (LIST NEWSGROUPS)
    <group>/.overview (8 standardized fields)
    <group>/<article number> for article naming

    The last is often not used these days because it has a lot of poor performance properties.

    You mean one file per article in the spool?
    From the documentation, I thought the "tradspool" method in INN was the
    most common deployment.

    There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably fine as configuration files with some in-memory representation, though,
    since they're usually very small.

    LIST DISTRIB.PATS
    LIST MODERATORS
    LIST OVERVIEW.FMT

    Yes, I skipped these since they're global and not "one entry per group" -
    are there any others of those that I missed?

    Is the "LIST MODERATORS" file all that is involved in moderation? I
    didn't think there was any moderator info explicitly associated with
    each group.

    If combining .newsgroups into .active, it makes me wonder, why not go
    further and also combine .active.times into .active?

    Yes, indeed.

    Were these initially separate simply because .active.times came later
    and wanted to avoid breaking the format of .active,

    Yup, exactly.

    <group> <last> <high> <low> <count> <creation epoch> <creator name>
    <description>

    Note that you now have a space-separated file except for the last field
    and you have a problem if you want to add another field that you didn't
    think of originally. I would really want to store this as some sort of structured file because you have some fields there (at least the
    description, maybe the creator name) that can contain a variety of characters.

    Some files already use tab, which I don't *think* is allowed in any of
    the metadata to date? If it is, maybe a non-printing control character like the ASCII field separators would work.

    Adding a field is something to think about. It would be a problem for databases too, though there are various migration tools for extending
    schemas, at least, and I'll grant that's one area where databases win
    over plain text files. But regardless of the underlying format, I'd
    prefer to invest enough time in the design up front to hopefully not
    need any changes later. Since NNTP has been stable for quite a long time
    now, I think that's realistic, unless there are new extensions in the
    future which add more metadata - and I do have a few extensions in mind
    for later but none would modify the group metadata.
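
    To illustrate the two database advantages that came up (changing a
    description without rewriting everything, and extending the schema
    later), here is a small sketch using Python's sqlite3 module. The table
    layout, the column names, and the extra "moderated" column are
    hypothetical, loosely following the combined record proposed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE groups (
        name          TEXT PRIMARY KEY,
        last_number   INTEGER NOT NULL,
        high_mark     INTEGER NOT NULL,
        low_mark      INTEGER NOT NULL,
        article_count INTEGER NOT NULL,
        created       INTEGER NOT NULL,
        creator       TEXT NOT NULL,
        description   TEXT NOT NULL
    )
    """
)
# Made-up values for a single group.
conn.execute(
    "INSERT INTO groups VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("misc.test", 12, 12, 1, 12, 1777593600, "admin@example.invalid", "Testing"),
)

# A longer description is an ordinary UPDATE of one row.
conn.execute(
    "UPDATE groups SET description = ? WHERE name = ?",
    ("For testing of network software.", "misc.test"),
)

# A field nobody planned for can be added later with a default value.
conn.execute("ALTER TABLE groups ADD COLUMN moderated INTEGER NOT NULL DEFAULT 0")

row = conn.execute(
    "SELECT description, moderated FROM groups WHERE name = ?", ("misc.test",)
).fetchone()
print(row)
```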

    (Thinking now of the new possible file format, theoretically a new
    command to return all this info at once might also be useful in the
    future now that doing so would be efficient, e.g. LIST EVERYTHING or
    whatever, instead of doing LIST ACTIVE, LIST ACTIVE.TIMES, LIST
    NEWSGROUPS, etc. individually.)
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 1 20:34:48 2026
    From Newsgroup: news.software.nntp

    On 5/1/2026 5:18 PM, Russ Allbery wrote:
    InterLinked <nntp@phreaknet.org> writes:

    The server really only has 946 articles[1]; yet, INN is reporting it has
    29,344 (likely because this is larger than the value of groupexactcount,
    so it just estimated it). I know the overview database has the count,
    though I guess that value is not necessarily up to date, for reasons I
    don't understand currently - presumably keeping it up to date would add
    non-constant overhead with INN's current architecture.

    I don't know if it's the case here (I don't know if Eternal September even expires articles)

    They do: "Retention is currently 3 years for de.*, 160 days for the Big
    8, 130 days for alt.* and 90 days for other hierarchies."

    but historically another really common reason for this
    pattern is that the very early article that's holding down the low water
    mark was crossposted to some other group (traditionally *.answers) with a much longer retention and the articles after it have expired.

    I don't think that would be the case here since I think they expire
    everything eventually, but that's another interesting case to handle. I
    know it's more efficient to symlink the same message in multiple
    newsgroups, but now I wonder if it would be better to just duplicate
    them so they can be handled individually...

    Note that the article count is not really useful to the news reader client under normal circumstances because the news reader often does not care about how many *total* articles the group contains. If the user has been reading the group (the common case), the news reader really cares about
    how many *unread* articles the group has, and for that the article count
    is basically useless. The article count as returned by NNTP is pretty much only useful for groups that you have never read, or haven't read for so
    long that your read mark is below the low water mark.

    Yes, this also makes sense, so now I wonder why my client gets confused
    when this happens... I have a feeling it may not be doing the most
    intelligent thing but would be interesting to see if it has the same
    issue when the count is accurate.
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 19:44:42 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:
    On 5/1/2026 5:14 PM, Russ Allbery wrote:

    Well, I'm not sure I agree with the distinction you're making here,
    since the active file *is* a database. INN has a whole bunch of
    databases, some of which it stores as text files, but just because the
    format is a text file doesn't make it not a database. INN definitely uses
    the active file like a database (hence the zero-padding).

    Sorry, to be clear, I meant database in the sense of something like
    SQLite or MySQL, not using a text file under direct control of the
    program as a store.

    So, this is tricky. My installation of INN doesn't use any databases in
    the sense of SQLite or MySQL. The spool is in CNFS, the overview is in tradindexed, and the history file is in hisv6, all of which are under the direct control of the program. But those are all binary structured file
    formats with capabilities that make some specific types of queries fast.

    A database is essentially just an abstraction over really good
    implementations of a bunch of complex data structures. This is sort of
    what I'm getting at in the overall discussion. Writing your own bespoke
    data structures is a good idea if your needs are extremely simple or
    extremely complicated (and specific to you), but there's a whole middle
    space where the thousands of hours someone else has put into a generic, highly-tuned implementation of those algorithms is probably better.

    People will definitely disagree over where those points are. And obviously writing your own bespoke stuff is fun, and to a large extent netnews is
    just a hobby at this point, so I'm all in favor of people having fun.

    Makes sense - I'm assuming the new mechanism is a database of each
    article, effectively, so you can just select the articles of interest?

    For ovsqlite, yes. For tradindexed, which is the backend I wrote many
    years ago, there's still a .overview file as before (although it has a different name), but alongside it there is a binary index that records
    some additional metadata (arrival time, for instance) and the offset and
    length in the data file for each article. That allows something similar. There's also another file that stores information about each group in a
    hash table that's written to disk.

    (You can see all the details in storage/tradindexed in the INN source
    tree. It should be pretty well-commented.)
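
    For readers who don't want to dig through the source, the general idea
    of a data file plus an offset/length index can be sketched in Python as
    follows. This is not the actual tradindexed on-disk format; the record
    layout and function names are invented for illustration, and in-memory
    buffers stand in for the two files.

```python
import io
import struct

# Hypothetical fixed-size index record: article number, byte offset, length.
RECORD = struct.Struct("<QQI")


def append_article(data, index, number, overview_line):
    """Append one overview line and record where it landed."""
    payload = overview_line.encode("utf-8") + b"\n"
    data.seek(0, io.SEEK_END)
    offset = data.tell()
    data.write(payload)
    index.seek(0, io.SEEK_END)
    index.write(RECORD.pack(number, offset, len(payload)))


def read_range(data, index, first, last):
    """Return overview lines for article numbers in [first, last]."""
    index.seek(0)
    raw = index.read()
    lines = []
    for number, offset, length in RECORD.iter_unpack(raw):
        if first <= number <= last:
            data.seek(offset)
            lines.append(data.read(length).decode("utf-8").rstrip("\n"))
    return lines


data, index = io.BytesIO(), io.BytesIO()
for n in (1, 2, 5):
    append_article(data, index, n, f"{n}\tSubject {n}")
recent = read_range(data, index, 2, 5)
print(recent)
```

    This version still walks the whole index, but the index holds small
    fixed-size records, so a real implementation could jump or binary-search
    to the first wanted record instead of parsing every overview line.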

    It's all very "I was in my 20s and was having a great deal of fun writing
    data structures for a real-world problem." :)

    Thanks, this is helpful perspective. I think I still need to sleep on
    this a bit but hearing about your experience here is really valuable.

    Honestly, I was really set on just using flat files before but there are
    some compelling reasons you've brought up. Maybe I'll abstract things in
    a way such that I can start with flat files and add a DB (SQLite or
    other) backend option later that could be used instead. I was trying to
    avoid that complexity but it might be worth it.

    I do want to say that by all means, do whatever makes you the most happy
    and feel free to ignore advice for what's the most maintainable or the
    least effort or whatever! Really at this point in netnews's history I
    think the most important thing is that people are having fun.

    If going the database route, I'm assuming you would just recommend
    SQLite for everything? I would guess a regular RDBMS like MariaDB would
    be overkill (and possibly cause issues if the server weren't local).

    SQLite is by far the easiest to use because it's just a library that
    stores its stuff in files on disk, which has lots of really nice
    properties and makes it very easy to set up and maintain (not entirely
    trivial, but easy). But it is going to be slow. I suspect that an actual database server that is properly tuned will be faster than SQLite. You may
    not care. No one has cared enough for INN to write such a backend.

    <group>/<article number> for article naming

    The last is often not used these days because it has a lot of poor
    performance properties.

    You mean one file per article in the spool?
    From the documentation, I thought the "tradspool" method in INN was the
    most common deployment.

    Right. It probably is still the most common deployment, and it has a lot
    of nice advantages, particularly for small servers. It's very human comprehensible without special tools, which is nice.

    However, tradspool is extremely hard on file systems and disks, so for a
    really large server it tends to be slow. It also forces a rather expensive expire process (deleting lots of articles is a lot of file system
    operations!) if you want to expire articles.

    For a news server that requires minimum maintenance and can mostly just
    be ignored, I would recommend CNFS. That's what I use personally. You lose
    some control and visibility and it's a bad choice if you never want
    articles to expire, but it has the huge advantage that you'll never run
    out of disk space (the worst thing that happens is that things expire a
    bit faster), there's no expensive expire process, and it's really fast and light on resources.

    Yes, I skipped these since they're global and not "one entry per group" -
    are there any others of those that I missed?

    No, I think you got them all.

    Is the "LIST MODERATORS" file all that is involved in moderation? I
    didn't think there was any moderator info explicitly associated with
    each group.

    It's rather irrelevant these days, although it does let the client mail a submission to a moderated group directly, which in theory would actually
    be better in these days of spam filtering, DMARC, and similar problems
    with the email relay system. Not that any clients do this. :)

    Some files already use tab, which I don't *think* is allowed in any of
    the metadata to date? If it is, maybe a non-printing control character like the ASCII field separators would work.

    Tab is probably the best choice. Newsgroup descriptions should really be treated as full UTF-8 these days except for reserved characters, not that
    all software has been updated to do that.
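
    A tab-separated version of the combined record discussed above might be
    handled roughly like this in Python. The field order follows the
    proposed line format; the class and function names and the sample values
    are hypothetical. Tab, CR, and LF are treated as the reserved
    characters, and everything else is passed through as UTF-8 text.

```python
from dataclasses import dataclass

FIELDS = 8  # group, last, high, low, count, creation epoch, creator, description


@dataclass
class GroupRecord:
    group: str
    last: int
    high: int
    low: int
    count: int
    created: int
    creator: str
    description: str


def format_record(rec):
    """Serialize one record as a tab-separated line (without the newline)."""
    values = [rec.group, rec.last, rec.high, rec.low, rec.count,
              rec.created, rec.creator, rec.description]
    text = [str(v) for v in values]
    if any("\t" in t or "\n" in t or "\r" in t for t in text):
        raise ValueError("tab, CR and LF are reserved as separators")
    return "\t".join(text)


def parse_record(line):
    """Parse a line produced by format_record."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != FIELDS:
        raise ValueError(f"expected {FIELDS} fields, got {len(parts)}")
    group, last, high, low, count, created, creator, description = parts
    return GroupRecord(group, int(last), int(high), int(low), int(count),
                       int(created), creator, description)


# Made-up values; note the spaces in the creator and description fields.
rec = GroupRecord("misc.test", 12, 12, 1, 12, 1777593600,
                  "Example Admin <admin@example.invalid>",
                  "Tests, café and other UTF-8 text")
line = format_record(rec)
print(parse_record(line) == rec)
```

    Because the field count is checked on read, adding a ninth field later
    would still require a format version or a migration step, which is the
    extensibility concern raised earlier.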

    Adding a field is something to think about. It would be a problem for databases too, though there are various migration tools for extending schemas, at least, and I'll grant that's one area where databases win
    over plain text files. But regardless of the underlying format, I'd
    prefer to invest enough time in the design up front to hopefully not
    need any changes later. Since NNTP has been stable for quite a long time
    now, I think that's realistic, unless there are new extensions in the
    future which add more metadata - and I do have a few extensions in mind
    for later but none would modify the group metadata.

    Yeah, this is one of those classic design trade-offs. I pretty much always build extensibility into everything I do these days because I've been
    burned too many times, but you're not wrong about the unlikelihood of
    major NNTP changes.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 1 19:48:18 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:

    Yes, this also makes sense, so now I wonder why my client gets confused
    when this happens... I have a feeling it may not be doing the most intelligent thing but would be interesting to see if it has the same
    issue when the count is accurate.

    Yeah, I *suspect* it's cancelled (or otherwise removed, such as via NoCeM) articles that are confusing it and it's correcting its unread count when
    it retrieves overview information, but I don't really know.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to news.software.nntp on Sat May 2 09:18:33 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> wrote or quoted:
    Sorry, to be clear, I meant database in the sense of something like
    SQLite or MySQL, not using a text file under direct control of the
    program as a store.

    FWIW, I am aware of this definition by Ramez Elmasri:

    |A database is a collection of related data. By data, we mean
    |known facts that can be recorded and that have implicit meaning.
    Ramez Elmasri (2011).

    But what many people mean by "database" is a
    /database management system/ (DBMS).


  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 2 10:18:14 2026
    From Newsgroup: news.software.nntp

    On 5/1/2026 10:44 PM, Russ Allbery wrote:
    SQLite is by far the easiest to use because it's just a library that
    stores its stuff in files on disk, which has lots of really nice
    properties and makes it very easy to set up and maintain (not entirely trivial, but easy). But it is going to be slow. I suspect that an actual database server that is properly tuned will be faster than SQLite. You may not care. No one has cared enough for INN to write such a backend.

    Good to know. I think I'll start with a traditional file implementation
    but leave the door open for allowing a DB implementation in the future.
    That would be "fun" to experiment with.

    For a news server that requires minimum maintenance and can mostly just
    be ignored, I would recommend CNFS. That's what I use personally. You lose some control and visibility and it's a bad choice if you never want
    articles to expire, but it has the huge advantage that you'll never run
    out of disk space (the worst thing that happens is that things expire a
    bit faster), there's no expensive expire process, and it's really fast and light on resources.

    Interesting... CNFS has always seemed a bit "weird" to me - I see how it excels at certain properties, but not sure if I'm interested in
    supporting it myself. My plan is really to run two news servers myself,
    one in the "cloud", with expiration varying by group, and open to authenticated users, and one at home, for groups of interest, where
    articles never expire (which would function as an archive, but also be
    used by my local newsreader).

    CNFS seems to work well if you have a set size you want to dedicate per
    group, but not as efficient for small/empty groups, or if you want to
    expire by article count or age - maybe I'm missing something here though.

    I assume because the articles for a group are just in one big file,
    articles also have to be duplicated to multiple of these blobs when cross-posted?

    In any case, I'll just do "tradspool" for now but leave the door open to adding others later.

    It's rather irrelevant these days, although it does let the client mail a submission to a moderated group directly, which in theory would actually
    be better in these days of spam filtering, DMARC, and similar problems
    with the email relay system. Not that any clients do this. :)

    In what sense is it irrelevant? I hear people say moderated groups are
    dead, but I still subscribe to one moderated group, comp.dcom.telecom,
    though I thought the news server forwarded it to the moderator, not the
    client directly. Is the server not using the LIST MODERATORS data
    internally to send to moderator?

    Admittedly I need to learn more about how moderation works - I don't
    think I've seen it discussed much in any RFCs since it's implementation
    rather than protocol related. But I would imagine when a new group
    control message gets shared, it would have to contain moderation info,
    and dynamically update the moderator info at that point.
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 2 10:53:32 2026
    From Newsgroup: news.software.nntp

    On 5/2/2026 10:18 AM, InterLinked wrote:
    In any case, I'll just do "tradspool" for now but leave the door open to adding others later.

    Looking at the documentation for the different storage methods, and for traditional spool, I noticed:

    where "news/group/name" is the name of the newsgroup to which the article was posted with each period changed to a slash, and "nnnnn" is the sequence number of the article in that newsgroup

    So for "misc.test" there would be a subfolder "test" within a subfolder "misc", not just one subfolder "misc.test".
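
    Side by side, the two layouts being compared look like this in a short
    Python sketch. The function names and the "/spool" root are placeholders
    for illustration only.

```python
from pathlib import PurePosixPath


def tradspool_path(spool_root, group, article_number):
    """Hierarchical layout: each period in the group name becomes a slash."""
    return PurePosixPath(spool_root, *group.split("."), str(article_number))


def flat_path(spool_root, group, article_number):
    """Flat alternative: the group name is used literally as one directory."""
    return PurePosixPath(spool_root, group, str(article_number))


hier = str(tradspool_path("/spool", "misc.test", 42))
flat = str(flat_path("/spool", "misc.test", 42))
print(hier)
print(flat)
```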

    I find this a bit curious, as in IMAP, subfolders work the other way - a folder that is logically a subfolder, "Parent > Sub" is typically named parent.sub in the root maildir, and all the folders are still siblings
    to each other on disk (except INBOX). Coming from more of an IMAP
    background, I would have intuited to just use the group name literally
    for the folder, but I'm guessing there's a good reason not to do this?

    I'd guess performance has something to do with why the hierarchy is
    actually a hierarchy on disk (a newsdir will probably be much, much
    larger than a maildir), rather than all groups being siblings in the
    root newsdir.

    Is that about it, or are there other considerations that give this
    method an advantage? For example, I can't think how this would make any particular operation more efficient - since I don't think "delete
    hierarchy" is a thing. Likewise, there's usually no need to actually
    scan over the contents of the root newsdir - that's why the active file exists. If anything, it might make group creation slightly less
    efficient, since you have to create the ancestors if they don't exist
    already, might end up with empty subfolders later if groups are deleted,
    etc.

    Was this just a historical convention, or are there any other compelling reasons to keep one method vs the other in a new system?
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 2 09:04:38 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:
    On 5/1/2026 10:44 PM, Russ Allbery wrote:

    For a news server that requires minimum maintenance and can mostly
    just be ignored, I would recommend CNFS. That's what I use personally.
    You lose some control and visibility and it's a bad choice if you never
    want articles to expire, but it has the huge advantage that you'll
    never run out of disk space (the worst thing that happens is that
    things expire a bit faster), there's no expensive expire process, and
    it's really fast and light on resources.

    Interesting... CNFS has always seemed a bit "weird" to me - I see how it excels at certain properties, but not sure if I'm interested in
    supporting it myself. My plan is really to run two news servers myself,
    one in the "cloud", with expiration varying by group, and open to authenticated users, and one at home, for groups of interest, where
    articles never expire (which would function as an archive, but also be
    used by my local newsreader).

    Yeah, and if you never want articles to expire, CNFS is a bad choice. It's ideal for transit-only servers, which used to be a thing and probably
    aren't as much any more because there's no point these days in having so
    large a server farm that you need to separate transit and reading servers unless you're one of the few sites still trying to have a go at a
    commercial Usenet service. I like it for small reading servers where you
    don't care about keeping things around forever and don't have any
    particular preferences on expiration other than "don't run out of disk
    space."

    CNFS seems to work well if you have a set size you want to dedicate per group, but not as efficient for small/empty groups, or if you want to
    expire by article count or age - maybe I'm missing something here
    though.

    I assume because the articles for a group are just in one big file,
    articles also have to be duplicated to multiple of these blobs when cross-posted?

    CNFS doesn't use one file per group. Well, it *can*, you can configure it
    all sorts of different ways, but the configuration that I use is one
    logical file for the whole server. (It's actually divided into several
    files, but for no good reason.) All the articles go into the same file,
    and when it rolls over the earliest articles start getting overwritten by
    order of arrival.
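
    The behaviour described here (a fixed amount of space, with the oldest
    arrivals overwritten first) can be modelled with a toy Python class.
    This is only a model of the expiry behaviour, not of the real CNFS file
    format; the class name, its methods, and the message IDs are made up.

```python
from collections import OrderedDict


class CyclicStore:
    """Toy model of a fixed-size store that overwrites the oldest articles."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.articles = OrderedDict()  # message-id -> bytes, in arrival order

    def store(self, message_id, body):
        if message_id in self.articles:
            return False  # already stored; nothing to do
        if len(body) > self.capacity:
            raise ValueError("article is larger than the whole store")
        # Make room by dropping articles in order of arrival.
        while self.used + len(body) > self.capacity:
            _, old = self.articles.popitem(last=False)
            self.used -= len(old)
        self.articles[message_id] = body
        self.used += len(body)
        return True

    def retrieve(self, message_id):
        return self.articles.get(message_id)


store = CyclicStore(capacity_bytes=10)
store.store("<a@example.invalid>", b"aaaa")
store.store("<b@example.invalid>", b"bbbb")
store.store("<c@example.invalid>", b"cccc")  # forces the oldest article out
print(store.retrieve("<a@example.invalid>"), store.used)
```

    Storing never fails for lack of space, and there is no separate expire
    pass; the cost is that retention time depends entirely on traffic.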

    So yes, it's complicated for fine-grained expiration control: You have to
    move the articles you want to have a different expiration for into their
    own CNFS files or into another storage backend like tradspool. When I was running a larger news server for more people, I had most groups in CNFS
    and local groups that we kept forever in tradspool.

    It's rather irrelevant these days, although it does let the client mail
    a submission to a moderated group directly, which in theory would
    actually be better in these days of spam filtering, DMARC, and similar
    problems with the email relay system. Not that any clients do this. :)

    In what sense is it irrelevant?

    In the sense that all the news servers send the message to the moderator directly and clients never use that file, and also in the sense that we're doing a much better job these days of getting all the moderator addresses
    into moderators.isc.org, so there's less need to have other rules than the default.

    My memory on this is very vague, but I could have sworn that there were
    some news readers that used this file to send mail to the moderator
    directly some thirty years or more ago. It certainly used to be the case
    that there were different moderator forwarding rules for different
    hierarchies and moderators.uu.net (as it was back then) was only usable
    for Big Eight groups.

    It's still useful on the server side if you have local moderated groups.
    But providing it to the client is basically pointless now.

    I hear people say moderated groups are dead, but I still subscribe to
    one moderated group, comp.dcom.telecom, though I thought the news server forwarded it to the moderator, not the client directly. Is the server
    not using the LIST MODERATORS data internally to send to moderator?

    No, it is, it's just that because it's handling that, there's no real
    reason for the client to care.

    I still moderate several groups and at least one of them has active
    traffic, so moderated groups as a concept are not dead.

    Admittedly I need to learn more about how moderation works - I don't
    think I've seen it discussed much in any RFCs since it's implementation rather than protocol related. But I would imagine when a new group
    control message gets shared, it would have to contain moderation info,
    and dynamically update the moderator info at that point.

    No, all the forwarding these days is handled by moderators.isc.org, and
    there's no place in a control message to document that.

    Moderation is a horrible kludge. You will probably be appalled. :) It's a design from another era, and we never completed the work we were hoping to
    do to try to make it less of a kludge, so it's very much something out of
    an earlier era of the Internet when spam didn't exist.

    RFC 5537 is the place to go for all the details on that side of how
    netnews works. There is indeed a protocol; it's just not part of NNTP. I
    try to collect all the relevant RFCs on a couple of web pages; you may
    find them useful:

    https://www.eyrie.org/~eagle/usefor/
    https://www.eyrie.org/~eagle/nntp/
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 2 09:08:23 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:
    On 5/2/2026 10:18 AM, InterLinked wrote:

    In any case, I'll just do "tradspool" for now but leave the door open
    to adding others later.

    Looking at the documentation for the different storage methods, and for traditional spool, I noticed:

    where "news/group/name" is the name of the newsgroup to which the
    article was posted with each period changed to a slash, and "nnnnn" is
    the sequence number of the article in that newsgroup

    So for "misc.test" there would be a subfolder "test" within a subfolder "misc", not just one subfolder "misc.test".
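    A minimal sketch of that mapping in Python. The spool root below is a made-up example path and the function name is purely illustrative; neither is taken from INN:

```python
from pathlib import PurePosixPath


def tradspool_path(spool_root: str, newsgroup: str, article_number: int) -> PurePosixPath:
    """Map a newsgroup name and article number to a tradspool-style path."""
    # Each period in the group name becomes a directory separator.
    group_dir = PurePosixPath(*newsgroup.split("."))
    # The article itself is a file named after its sequence number.
    return PurePosixPath(spool_root) / group_dir / str(article_number)


# Hypothetical spool root, chosen only for the example.
print(tradspool_path("/var/spool/news/articles", "misc.test", 42))
```

    With this layout every hierarchy gets its own top-level directory, rather than every group being a sibling in one directory.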

    I find this a bit curious, as in IMAP, subfolders work the other way: a
    folder that is logically a subfolder, "Parent > Sub", is typically named
    parent.sub in the root maildir, and all the folders are still siblings of
    each other on disk (except INBOX). Coming from more of an IMAP background,
    my intuition would have been to just use the group name literally for the
    folder, but I'm guessing there's a good reason not to do this?

    You know, I have no idea why news servers do it this way. It does mean
    that you don't have a directory for every newsgroup at the top level of
    the spool, which was probably part of the reason since file systems traditionally had a lot of problems with directories containing lots of
    files, although I'm dubious the number of newsgroups would be less than
    the number of articles in the most active group.

    But that's just been the way tradspool has been organized from before I
    got on Usenet in 1993. So much so that it's even influenced Usenet group
    naming (rather controversially) with *.misc renaming back in the day.

    I suppose it also lets you move hierarchies to separate drives easily back
    when drives were small enough that you'd have to worry about the disk
    usage of a bunch of netnews articles. That's possibly still relevant for alt.binaries.* for anyone who carries it (although surely these days
    people mostly use CNFS or something similar for the binary groups).
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Sun May 3 23:48:53 2026
    From Newsgroup: news.software.nntp

    Hi InterLinked,

    For LIST COUNTS and GROUP, it pulls from group stats. However, in the response for LIST ACTIVE, it simply dumps the line from the active file
    as is.

    Indeed, and to be more precise, if you give a newsgroup name as an
    argument to LIST ACTIVE, this command will pull the information from the overview (like LIST COUNTS and GROUP).

    You then may end up with things like that for an empty newsgroup:

    GROUP trigofacile.test3
    211 0 8 7 trigofacile.test3

    LIST ACTIVE trigofacile.test3
    215 Newsgroups in form "group high low status"
    trigofacile.test3 0000000007 0000000008 y
    .

    LIST ACTIVE trigofacile.test3*
    215 Newsgroups in form "group high low status"
    trigofacile.test3 0000000008 0000000008 y
    .

    The "*" at the end of the last LIST ACTIVE command forces it to parse
    the active file to look for matching newsgroup names.
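    For comparing such responses programmatically, here is a small Python sketch of a parser for the two line formats shown above. The type and function names are made up for the example; note that GROUP orders the marks low then high, while LIST ACTIVE orders them high then low:

```python
from typing import NamedTuple, Optional, Tuple


class GroupInfo(NamedTuple):
    group: str
    low: int
    high: int
    count: Optional[int]  # LIST ACTIVE lines carry no article count


def parse_group_response(line: str) -> GroupInfo:
    """Parse a GROUP reply of the form '211 count low high group'."""
    code, count, low, high, group = line.split()[:5]
    if code != "211":
        raise ValueError("not a successful GROUP response: " + line)
    return GroupInfo(group, int(low), int(high), int(count))


def parse_list_active_line(line: str) -> Tuple[GroupInfo, str]:
    """Parse a LIST ACTIVE data line of the form 'group high low status'."""
    group, high, low, status = line.split()[:4]
    return GroupInfo(group, int(low), int(high), None), status


from_group = parse_group_response("211 0 8 7 trigofacile.test3")
single, _ = parse_list_active_line("trigofacile.test3 0000000007 0000000008 y")
wildmat, _ = parse_list_active_line("trigofacile.test3 0000000008 0000000008 y")

# The single-group form agrees with GROUP; the wildmat form reports a different high.
print(from_group.high, single.high, wildmat.high)
```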



    I can't find any examples of newsgroups where the high water mark
    article is deleted, so it's hard to poke at this behavior

    You could just send an article to misc.test and cancel it to see the
    behaviour on a news server honouring such cancels.
    You'll see that the reported high water mark does not decrease with an INN
    news server.
    --
    Julien ÉLIE

    « Le bonheur, c'est vouloir ce que l'on a. »

  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Sun May 3 23:49:16 2026
    From Newsgroup: news.software.nntp

    Hi InterLinked,

    To make an extreme example, if a group with a lot of articles had all of them except the low water mark article deleted (and "last" is 3000), you could have a response like:

    211 3000 1 3000 misc.test

    [and the count has to be at least 3000, per the RFC, so we can't even
    have 211 1 1 3000 misc.test to indicate there are definite gaps]

    Where do you read in RFC 3977 that the estimate "has to be at least 3000"?

    The wording is:

    If the group is not empty, the estimate MUST be at least the actual
    number of articles available and MUST be no greater than one more
    than the difference between the reported low and high water marks.
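    Read as a bound check, that wording can be sketched in Python as follows (the function name is illustrative only). Both example responses from the quoted message pass when only one article actually remains:

```python
def estimate_is_valid(estimate: int, low: int, high: int, actual: int) -> bool:
    """Check the quoted rule for a non-empty group.

    The estimate must be at least the actual number of articles and
    no greater than (high - low + 1).
    """
    return actual <= estimate <= high - low + 1


# Only article 1 is left, and the reported marks are low=1, high=3000.
print(estimate_is_valid(3000, 1, 3000, actual=1))  # "211 3000 1 3000 misc.test"
print(estimate_is_valid(1, 1, 3000, actual=1))     # "211 1 1 3000 misc.test"
print(estimate_is_valid(3001, 1, 3000, actual=1))  # above the upper bound
```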


    That brings me to another observation: I've noticed that most inactive newsgroups in INN return high 0 and low 1 (at least for those I've
    analyzed in responses for Usenet groups), which seemed odd to me as I
    would have thought INN would naturally take the <high> in the active
    file, and taking low = <high> and high = low - 1, return something more
    like 211 0 3000 2999 misc.test
    Is there a reason INN just uses 1/0 instead?

    I confirm what Russ said: high = low - 1 is what INN replies for empty newsgroups which formerly received at least one article.

    We even had a bug until recently as for versions prior to 2.7.1, INN
    returned low = high + 1 which was unfortunately wrong when high was
    2^31-1... A pretty rare case though :)
    It now returns high = low - 1 except of course when low is 0 (for
    newsgroups which have never received any article).
    You already spotted that as you referenced the related issue in the
    GitHub tracker :)
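    To make the overflow concrete, here is a rough Python sketch of the two ways of deriving the marks for an empty group. It is not INN's code: the function names are invented, and treating a never-used group as low=1, high=0 is an assumption based on the description above.

```python
MAX_ARTNUM = 2**31 - 1  # largest article number discussed in this thread


def marks_low_from_high(last: int):
    """Older approach: keep high = last and derive low = high + 1."""
    return last + 1, last  # (low, high)


def marks_high_from_low(last: int):
    """Newer approach: keep low = last and derive high = low - 1.

    A group that never received an article (last == 0) is bumped to
    low = 1 so that high does not become negative.
    """
    low = last if last else 1
    return low, low - 1


def fits(low: int, high: int) -> bool:
    """True when both marks stay within the permitted range."""
    return 0 <= low <= MAX_ARTNUM and 0 <= high <= MAX_ARTNUM


print(fits(*marks_low_from_high(MAX_ARTNUM)))  # low would be 2^31
print(fits(*marks_high_from_low(MAX_ARTNUM)))
```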
    --
    Julien ÉLIE

    « Quand vous avez des ennuis, les gens qui vous appellent par sympathie
    le font surtout pour avoir des détails. » (Edgar Watson Howe)

  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Sun May 3 23:49:23 2026
    From Newsgroup: news.software.nntp

    Hi Russ,

    | o New articles may be added with article numbers greater than the
    | reported high water mark. (If an article that was the one with
    | the highest number has been removed and the high water mark has
    | been adjusted accordingly, the next new article will not have the
    | number one greater than the reported high water mark.)

    To me, this implies the high water mark can (even "should") decrease when
    the high water mark article is removed - in which case, the next article
    assigned would indeed not have high + 1 (similar to how UIDs and UIDNEXT
    work in IMAP).

    So, I have to admit that I don't recall this explicitly coming up during
    the RFC discussions, so I don't have a definitive answer for you about why
    we worded it this way. I think if we'd noticed this at the time, we would have been a bit clearer about what clients should expect, so I think there
    is a (minor) bug in the standard here.

    "this implies the high water mark can (even "should") decrease" is
    already said in RFC 3977 a few lines after the above bullet point:

    the reported low water mark in the response MUST be no less than that
    in any previous response for that newsgroup in this session, and it
    SHOULD be no less than that in any previous response for that
    newsgroup ever sent to any client.
    [...]
    No similar assumption can be made about the high water mark, as this
    can decrease if an article is removed and then increase again if it
    is reinstated or if new articles arrive.


    The RFC states it "can" decrease.


    There's also the use case of a slave server which does not itself compute
    the article number to use, as it relies on the one provided in the
    Xref header field of the received article. Thus it can receive article
    number 12 followed by article number 15, which is not 12 plus 1. That
    could explain the wording that "new articles may be added with article
    numbers greater than the reported high water mark", but indeed that use
    case is not described in the parenthetical remark.
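    A small Python sketch of that slave behaviour, assuming the usual Xref field body of a server name followed by space-separated "group:number" entries. The host name and function name are made up for the example:

```python
from typing import Optional


def article_number_from_xref(xref_body: str, newsgroup: str) -> Optional[int]:
    """Return the article number listed for `newsgroup`, or None if absent."""
    # The first token names the server that generated the field; skip it.
    for location in xref_body.split()[1:]:
        group, _, number = location.rpartition(":")
        if group == newsgroup and number.isdigit():
            return int(number)
    return None


high = 0
for xref in (
    "news.example.org misc.test:12",
    "news.example.org misc.test:15 alt.test:340",
):
    number = article_number_from_xref(xref, "misc.test")
    # The slave stores the article under the number chosen upstream,
    # so the new number need not be the previous high water mark plus one.
    print(f"stored as {number}; previous high water mark was {high}")
    high = max(high, number)
```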
    --
    Julien ÉLIE

    « Si l'amour est aveugle, il faut palper. »

  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Sun May 3 18:17:11 2026
    From Newsgroup: news.software.nntp

    On 5/3/2026 5:49 PM, Julien ÉLIE wrote:
    Hi InterLinked,

    To make an extreme example, if a group with a lot of articles had all
    of them except the low water mark article deleted (and "last" is
    3000), you could have a response like:

    211 3000 1 3000 misc.test

    [and the count has to be at least 3000, per the RFC, so we can't even
    have 211 1 1 3000 misc.test to indicate there are definite gaps]

    Where do you read in RFC 3977 that the estimate "has to be at least 3000"?

    The wording is:

    If the group is not empty, the estimate MUST be at least the actual
    number of articles available and MUST be no greater than one more
    than the difference between the reported low and high water marks.

    That was what I was looking at, but I don't think my brain was working
    when I read that, disregard :)

    That brings me to another observation: I've noticed that most inactive
    newsgroups in INN return high 0 and low 1 (at least for those I've
    analyzed in responses for Usenet groups), which seemed odd to me as I
    would have thought INN would naturally take the <high> in the active
    file, and taking low = <high> and high = low - 1, return something
    more like 211 0 3000 2999 misc.test
    Is there a reason INN just uses 1/0 instead?

    I confirm what Russ said: high = low - 1 is what INN replies for empty newsgroups which formerly received at least one article.

    Isn't this also true for empty newsgroups which have never received an
    article either? Per my earlier comment about seeing a bunch of low=1 and high=0 in the LIST ACTIVE response from Eternal September, e.g.:

    LIST ACTIVE comp.dcom*
    215 Newsgroups in form "group high low status"
    comp.dcom.cabling 0000000000 0000000001 y
    comp.dcom.cell-relay 0000000000 0000000001 y
    comp.dcom.fax 0000000000 0000000001 y
    comp.dcom.isdn.capi 0000000000 0000000001 y
    comp.dcom.lans.ethernet 0000000000 0000000001 y
    comp.dcom.lans.misc 0000000000 0000000001 y

    We even had a bug until recently as for versions prior to 2.7.1, INN
    returned low = high + 1 which was unfortunately wrong when high was
    2^31-1... A pretty rare case though :)
    It now returns high = low - 1 except of course when low is 0 (for
    newsgroups which have never received any article).
    You already spotted that as you referenced the related issue in the
    GitHub tracker :)

    Yes, in fact, I found both the issue and the fix to be very helpful (as
    well as the submitted erratum, the rejection to which was not very clear
    to me initially) as I was thinking about this, prior to my initial post.
    Had I not seen that, I would have gone ahead and made the same "mistake"
    of doing LOW = HIGH + 1 (in fact, I had already started to do that).

    The "wrong" way seemed more natural to me, because you can then set LOW
    = HIGH + 1 and not have to worry about adjusting LOW when the next
    article is assigned. Setting LOW = HIGH (well, LOW = LAST, to be
    specific) and then HIGH = LOW - 1 isn't really intuitive.

    To confirm my own understanding, the only reason we do LOW = LAST (which
    is the same as LOW = HIGH in INN) and then HIGH = LOW - 1, rather than
    LOW = HIGH + 1, is to account for overflow when LAST/HIGH is the max
    article number?

    Circling back to a previous point that it's ideal to set the low water
    mark as high as "legally" valid at any given point, the LOW = HIGH + 1
    method also has the advantage of being one higher than the other way,
    which you pointed out in the erratum. I kind of wonder if it would be
    valid to do it this way, except in the case that HIGH is the max article number (which seems unlikely to happen often, and when it does, the
    group is saturated anyways). Not that I'm planning to do that, but maybe
    that will help me understand something else I missed.
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Sun May 3 18:22:56 2026
    From Newsgroup: news.software.nntp

    On 5/3/2026 5:48 PM, Julien ÉLIE wrote:
    Hi InterLinked,

    For LIST COUNTS and GROUP, it pulls from group stats. However, in the
    response for LIST ACTIVE, it simply dumps the line from the active
    file as is.

    Indeed, and to be more precise, if you give a newsgroup name as an
    argument to LIST ACTIVE, this command will pull the information from the overview (like LIST COUNTS and GROUP).

    You then may end up with things like that for an empty newsgroup:

    GROUP trigofacile.test3
    211 0 8 7 trigofacile.test3

    LIST ACTIVE trigofacile.test3
    215 Newsgroups in form "group high low status"
    trigofacile.test3 0000000007 0000000008 y
    .

    LIST ACTIVE trigofacile.test3*
    215 Newsgroups in form "group high low status"
    trigofacile.test3 0000000008 0000000008 y
    .

    The "*" at the end of the last LIST ACTIVE command forces it to parse
    the active file to look for matching newsgroup names.

    Thanks, I do remember noticing that when reading the code (optimization
    for single group).

    And this is because, I take it, the overview database is less up to date
    than the active file, as far as the water marks go? (Or perhaps vice
    versa, I don't think I figured out which is more up to date).

    Also, side question, why is it called the "overview database"? It seems
    like OVDB is mainly used to satisfy responses for GROUP and LIST ACTIVE
    with a single group as an argument. Yet, "overview" also traditionally
    refers to the overfile per-group file with a line for each message,
    which stores the 8 (or more) headers used in the XOVER/OVER responses. I
    don't think there is a connection between the two, is there?

    Sometimes I also see it referred to as "group stats" like you said,
    which seems like a clearer term for what it is, but the two terms seem
    to be used interchangeably.
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sun May 3 16:09:17 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:

    Also, side question, why is it called the "overview database"? It seems
    like OVDB is mainly used to satisfy responses for GROUP and LIST ACTIVE
    with a single group as an argument. Yet, "overview" also traditionally
    refers to the overfile per-group file with a line for each message,
    which stores the 8 (or more) headers used in the XOVER/OVER responses. I don't think there is a connection between the two, is there?

    No, that's the primary purpose of the overview database: answering OVER queries. In order to answer those queries, it turns out to also have the
    most accurate information for GROUP (and LIST ACTIVE for a single group),
    so it's also used for those purposes. But it was originally written for overview information.

    Sometimes I also see it referred to as "group stats" like you said,
    which seems like a clearer term for what it is, but the two terms seem
    to be used interchangeably.

    That's just one thing that's stored in the overview database.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Mon May 4 22:52:48 2026
    From Newsgroup: news.software.nntp

    Hi InterLinked,

    high = low - 1 is what INN replies for empty
    newsgroups which formerly received at least one article.

    Isn't this also true for empty newsgroups which have never received an article either? low=1 and high=0

    When the newsgroup has never received an article, I assume the concept
    of "low water mark" does not exist as there hasn't been any first
    article. But yes, if you consider low=1 in that case, the formula is
    the same.
    Maybe the ideal would be to advertise low=0 and high=0 in that case
    (allowed by RFC 3977 to represent an empty newsgroup), which would differentiate a newsgroup which has never received any article from
    another one which has received only 1 article and is now empty.
    Well, it matters to nobody, but it would make sense :)


    To confirm my own understanding, the only reason we do LOW = LAST (which
    is the same as LOW = HIGH in INN) and then HIGH = LOW - 1, rather than
    LOW = HIGH + 1, is to account for overflow when LAST/HIGH is the max
    article number?

    I don't know whether that was the reason for the formula but yes, at
    least it works with the max article number!


    the LOW = HIGH + 1
    method also has the advantage of being one higher than the other way,
    which you pointed out in the erratum. I kind of wonder if it would be
    valid to do it this way, except in the case that HIGH is the max article number

    Yes, it is valid. It respects the rule that "the high water mark will
    be one less than the low water mark", and when HIGH is the max article
    number, you could use LOW = 2^31-1 and HIGH = LOW - 1 (the preferred way
    per RFC 3977, as a SHOULD) or LOW = HIGH = 2^31-1 (an alternative way).
    --
    Julien ÉLIE

    « Le bonheur, c'est vouloir ce que l'on a. »

  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Mon May 4 18:30:07 2026
    From Newsgroup: news.software.nntp

    On 5/4/2026 4:52 PM, Julien ÉLIE wrote:
    Hi InterLinked,

    high = low - 1 is what INN replies for empty newsgroups which
    formerly received at least one article.

    Just curious here - what's the rationale behind this, exactly?

    Earlier Russ mentioned that *ideally*, you would want to provide as much information as possible. For a group with articles formerly, it seems
    that would be:

    * Use the last article number for the low water mark
    * Set high to low - 1

    Since INN doesn't support reinstating articles, there is no downside to advertising that as the low water mark, as it would only increase from
    there. If I had to guess, is this because in INN, when a group is empty,
    the high water mark is not present in overview and it would have to
    check the active file, so 1/0 is used for efficiency?

    (Or unless the last article number is the max allowed article number,
    even the old way of just doing high = last and low = high + 1 seems to
    be legal as well. Actually, that reminds me what it was about the
    erratum I didn't understand - a comment about server synchronization and
    how the low water mark a client reads might decrease in this scenario.
    Is anyone able to explain how that might happen?)

    Also, this answers a previous question I had about seeing a bunch of
    groups with 1/0. Now I know that they indeed had articles at some point, because the response is 1/0, I have absolutely no information as to how
    many articles the groups have had before going inactive.

    Isn't this also true for empty newsgroups which have never received an
    article either? low=1 and high=0

    When the newsgroup has never received an article, I assume the concept
    of "low water mark" does not exist as there hasn't been any first
    article. But yes, if you consider low=1 in that case, the formula is
    the same.
    Maybe the ideal would be to advertise low=0 and high=0 in that case
    (allowed by RFC 3977 to represent an empty newsgroup), which would differentiate a newsgroup which has never received any article from
    another one which has received only 1 article and is now empty.
    Well, it matters to nobody, but it would make sense :)

    Actually, it's a good idea. It provides a newsreader with "more"
    information than simply doing low=1/high=0 in both cases. Not that
    software would know the difference / treat the cases differently, but a
    human looking at group info would.

    I did find it curious that in several places in INN, there are checks
    like this one:

    if (!count) {
        if (!low) low++;
        high = low - 1;
    }

    I guess INN explicitly wants to make empty groups low=1/high=0 instead
    of low=0/high=0. I think it could just as well be:

    if (!count && low) high = low - 1

    It seems to me ideally, a response of 0/0 on an empty group with no
    articles, and low=last/high=low-1 would provide maximal information. In
    that situation, low=1/high=0 would only occur in a group that only ever
    had one article, which has since expired.

    Not expecting INN to change, of course, but I think I might do it this
    way, as I would like to be as accurate as possible and provide as much "information" as possible in a response.

    To confirm my own understanding, the only reason we do LOW = LAST
    (which is the same as LOW = HIGH in INN) and then HIGH = LOW - 1,
    rather than LOW = HIGH + 1, is to account for overflow when LAST/HIGH
    is the max article number?

    I don't know whether that was the reason for the formula but yes, at
    least it works with the max article number!


    the LOW = HIGH + 1 method also has the advantage of being one higher
    than the other way, which you pointed out in the erratum. I kind of
    wonder if it would be valid to do it this way, except in the case that
    HIGH is the max article number

    Yes, it is valid. It respects the rule that "the high water mark will
    be one less than the low water mark",

    To clarify, I was talking about doing LOW = LAST + 1 normally, and LOW =
    LAST, just for 2^31-1 (HIGH = LOW - 1 in both cases).

    The effect of this would be that the responses for an empty group where
    LAST = 2^31-1 and LAST = 2^31-2 would not be distinguishable. But again,
    the group is toast at that point so I'm not sure if it really matters.
    And it would provide the advantage of being able to have a low water
    mark that is one higher in all other cases, and thus provides more
    meaning.

    and when HIGH is the max article
    number, you could use LOW = 2^31-1 and HIGH = LOW - 1 (the preferred way
    per RFC 3977, as a SHOULD) or LOW = HIGH = 2^31-1 (an alternative way).

    I'm a bit confused on this last point. It's valid to merely set low=high=2^31-1 to indicate a group is empty?

    Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
    of representing an empty group? That last case never made any sense to
    me (high >= low and count can be anything), as that seems like it could
    easily happen in non-empty groups. Maybe if it required count be 0, that
    would be one thing, but I'm very puzzled by that qualifier - what cases (presumably) existed historically that resulted in a wording that an
    empty group could have a non-empty article count, and high >= low? I'm
    not really sure how I would tell if the group is empty or not.
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Tue May 5 01:00:51 2026
    From Newsgroup: news.software.nntp

    Hi InterLinked,

    Actually, that reminds me what it was about the
    erratum I didn't understand - a comment about server synchronization and
    how the low water mark a client reads might decrease in this scenario.
    Is anyone able to explain how that might happen?

    Looking at the erratum:
    "The high water mark is one less than the low water mark for empty
    newsgroups. A major reason for doing it this way was to deal with
    clusters of servers. If they're not perfectly synchronized, then
    a cancel might be visible on one and not another. So if you connect
    to the second one, it looks as if the article has been reinstated.
    Wording it like this meant we didn't need special treatment of such
    clusters. The low water mark cannot decrease."

    If a newsgroup has only article number 12, and this article is cancelled
    in cluster A a few seconds before it is in cluster B, a newsreader
    connecting to cluster A will see low water mark = 13, high water mark =
    12 (empty newsgroup with low = high + 1) and if it disconnects and
    reconnects this time associated to cluster B before the cancel is
    executed, it will see low water mark = high water mark = 12, thus having decreased.
    When the high = low - 1 formula is used, it sees low water mark = 12 and
    high water mark = 11 on cluster A. The low water mark does not decrease.
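    The scenario can be replayed with a few lines of Python. This is only a toy model with invented names: each server is reduced to the set of article numbers it still holds, plus the last number ever assigned.

```python
def reported_low(articles: set, last: int, empty_rule: str) -> int:
    """Low water mark a server would report under the given empty-group rule."""
    if articles:
        return min(articles)
    # Empty group: either low = last + 1 (older formula) or low = last.
    return last + 1 if empty_rule == "low=last+1" else last


last = 12
server_a = set()   # cancel of article 12 already processed
server_b = {12}    # cancel not yet processed

for rule in ("low=last+1", "low=last"):
    low_a = reported_low(server_a, last, rule)
    low_b = reported_low(server_b, last, rule)
    verdict = "decreased" if low_b < low_a else "did not decrease"
    print(f"{rule}: A reports low={low_a}, then B reports low={low_b} ({verdict})")
```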


    Anyway, I agree that the problem is present in non-empty newsgroups if
    the low water mark is updated on the fly. If cluster A has article 13,
    and cluster B has articles 12 and 13, the low water mark will be
    lower when connecting to cluster B...


    I guess INN explicitly wants to make empty groups low=1/high=0 instead
    of low=0/high=0.

    Because low=1/high=0 is the preferred way per RFC 3977, mentioned as a
    SHOULD.


    Not expecting INN to change, of course, but I think I might do it this
    way, as I would like to be as accurate as possible and provide as much "information" as possible in a response.

    You could do that if you prefer. Feel free :)


    I'm a bit confused on this last point. It's valid to merely set low=high=2^31-1 to indicate a group is empty?
    Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
    of representing an empty group?

    Yes, it is the third alternative allowed by RFC 3977, and I totally
    agree it follows the same rule as a non-empty newsgroup. Very liberal :)

    o The high water mark is greater than or equal to the low water
    mark. The estimated article count might be zero or non-zero; if
    it is non-zero, the same requirements apply as for a non-empty
    group.
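    From a client's point of view, the three permitted representations could be told apart roughly like this. It is a Python sketch with made-up labels; it also shows why the third form cannot be distinguished from a non-empty group by the numbers alone:

```python
def empty_group_form(count: int, low: int, high: int) -> str:
    """Classify a (count, low, high) triple against the three empty-group forms."""
    if count == 0 and low == 0 and high == 0:
        return "empty: all three numbers zero"
    if count == 0 and high == low - 1:
        return "empty: high = low - 1 (the SHOULD form)"
    if high >= low:
        # Third form: the same shape as a non-empty group.
        return "undetermined: may be empty or non-empty"
    return "not a permitted response"


print(empty_group_form(0, 8, 7))
print(empty_group_form(0, 0, 0))
print(empty_group_form(0, 2**31 - 1, 2**31 - 1))
print(empty_group_form(3000, 1, 3000))
```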
    --
    Julien ÉLIE

    « Le café est un breuvage qui fait dormir quand on n'en prend pas. »
    (Alphonse Allais)

  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Mon May 4 19:40:06 2026
    From Newsgroup: news.software.nntp

    On 5/4/2026 7:00 PM, Julien ÉLIE wrote:
    Looking at the erratum:
    "The high water mark is one less than the low water mark for empty newsgroups. A major reason for doing it this way was to deal with
    clusters of servers. If they're not perfectly synchronized, then
    a cancel might be visible on one and not another. So if you connect
    to the second one, it looks as if the article has been reinstated.
    Wording it like this meant we didn't need special treatment of such
    clusters. The low water mark cannot decrease."

    If a newsgroup has only article number 12, and this article is cancelled
    in cluster A a few seconds before it is in cluster B, a newsreader connecting to cluster A will see low water mark = 13, high water mark =
    12 (empty newsgroup with low = high + 1) and if it disconnects and reconnects this time associated to cluster B before the cancel is
    executed, it will see low water mark = high water mark = 12, thus having decreased.
    When the high = low - 1 formula is used, it sees low water mark = 12 and
    high water mark = 11 on cluster A. The low water mark does not decrease.

    But doesn't that still break if there are multiple cancels during that
    period? Say the group had articles 11 and 12, and both get cancelled.
    Now the low water mark is either 12 or 13, depending on the
    implementation. However, you connect to a server that hasn't processed
    either cancel yet, and now the low water mark is 11 again.

    I think I understand the scenario, but it seems that doesn't entirely
    solve the problem either, just makes it less likely.
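    A quick Python check of that two-cancel case, using a toy model with invented names in which a server is just the set of articles it still holds. Under either empty-group formula, the low water mark drops when moving from the caught-up server to the lagging one:

```python
def low_water_mark(articles: set, last: int, low_is_last_plus_one: bool) -> int:
    """Low water mark reported by a server holding `articles`."""
    if articles:
        return min(articles)
    return last + 1 if low_is_last_plus_one else last


caught_up = set()    # both cancels (articles 11 and 12) processed
lagging = {11, 12}   # neither cancel processed yet

for low_is_last_plus_one in (True, False):
    first = low_water_mark(caught_up, 12, low_is_last_plus_one)
    second = low_water_mark(lagging, 12, low_is_last_plus_one)
    print(f"first server low={first}, second server low={second}, "
          f"decreased={second < first}")
```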

    Anyway, I agree that the problem is present in non-empty newsgroups if
    the low water mark is updated on the fly. If cluster A has article 13,
    and cluster B has articles 12 and 13, the low water mark will be
    lower when connecting to cluster B...

    Yes, I think that's sort of the same scenario I was thinking of above. It
    doesn't even matter whether the group is empty. So the reason for
    rejection in the erratum doesn't even pass muster, as even the
    "official" way of doing it *can* theoretically break.

    Initially I was doing LOW = LAST + 1 and then changed to LOW = LAST
    simply because INN had, but now that I understand this a bit better, I
    think I might change back to LOW = LAST + 1 and just handle 2^31-1
    specially to prevent an illegal response (and also use 0 0 0 for an
    empty group that never had any articles).

    I'm a bit confused on this last point. It's valid to merely set
    low=high=2^31-1 to indicate a group is empty?
    Is this by chance somehow the 3rd case in RFC 3977 6.1.1.2 for methods
    of representing an empty group?

    Yes, it is the third alternative allowed by RFC 3977, and I totally
    agree it follows the same rule as a non-empty newsgroup. Very liberal :)

    o The high water mark is greater than or equal to the low water
    mark. The estimated article count might be zero or non-zero; if
    it is non-zero, the same requirements apply as for a non-empty
    group.


    Aside from 2^31-1, is there ever a case where one would use this?

    I'm still having trouble seeing why case 3 is even necessary. Wouldn't
    this be a legal sequence, in a world where LOW = LAST + 1 (the way INN
    used to do it):

    Article 2147483646 assigned, and then deleted:
    LAST=2147483646
    LOW=2147483647
    HIGH=2147483646

    Article 2147483647 assigned, and then deleted (so now group is full):
    LAST=2147483647
    LOW=2147483647 (floored at LAST, rather than LAST + 1, only in this case)
    HIGH=2147483646

    So the response in these two cases is actually identical; the client
    can't tell them apart. But the response is still legal, since the low
    water mark has not decreased, and HIGH is still LOW - 1. So if we can do
    this, why bother with case 3 and set both LOW and HIGH to 2147483647? Presumably something behaved this way historically, just can't fathom why...

    There is obviously loss of information in that the client can't tell
    these two cases apart. However, in INN, a client also can't tell apart a
    group that has never had any articles, and a group that had one article
    which expired, since in both cases LOW=1 and HIGH=0, and that is legal
    as well.

    So if I understand correctly, I believe this approach provides "maximal" information to a client, while remaining fully legal:
    1) If a group has only ever been empty, respond LOW=HIGH=0 (case 2 in RFC)
    2) If LAST < 2147483647, respond LOW=LAST+1 and HIGH=LOW-1 (case 1, the
    way INN used to)
    3) If LAST = 2147483647, respond LOW=LAST and HIGH=LOW-1 (case 1, the
    way INN does now, and preferred by the RFC, though for somewhat
    unsatisfying reasons)

    The benefit of adding step #2 is that in most cases, we can provide a
    more accurate low water mark - as you pointed out in the erratum.

    Only in case 3 is the client unsure of a piece of information (whether
    LAST is 2147483646 or 2147483647), and this is arguably the least
    important case anyways.
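
    The three-step rule above can be sketched as follows. This is only an
    illustration of the proposal in this post, not INN's code: the function
    name empty_group_marks and the constant MAX_ARTNUM are made up for the
    example, and LAST is assumed to be 0 for a group that has never had an
    article.

```python
# Largest article number considered here (2^31-1, i.e. 2147483647).
MAX_ARTNUM = 2**31 - 1


def empty_group_marks(last: int) -> tuple[int, int]:
    """Return the (low, high) water marks to report for an EMPTY group.

    `last` is the highest article number ever assigned in the group,
    assumed to be 0 when no article was ever assigned.
    """
    if last == 0:
        # Step 1: the group has only ever been empty.
        return (0, 0)
    if last < MAX_ARTNUM:
        # Step 2: LOW = LAST + 1, HIGH = LOW - 1.
        low = last + 1
    else:
        # Step 3: LAST + 1 would exceed the limit, so floor LOW at LAST.
        low = last
    return (low, low - 1)


if __name__ == "__main__":
    for last in (0, 5, MAX_ARTNUM - 1, MAX_ARTNUM):
        print(last, empty_group_marks(last))
```

    Running it shows the loss of information mentioned above: LAST=2147483646
    and LAST=2147483647 produce the same pair.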
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Julien ÉLIE@iulius@nom-de-mon-site.com.invalid to news.software.nntp on Tue May 5 21:39:36 2026
    From Newsgroup: news.software.nntp

    Hi InterLinked,

    But doesn't that still break if there are multiple cancels during
    that period? Even the "official" way of doing it *can* theoretically
    break.

    Yes, it seems so indeed.


    Initially I was doing LOW = LAST + 1 and then changed to LOW = LAST
    simply because INN had, but now that I understand this a bit better, I
    think I might change back to LOW = LAST + 1 and just handle 2^31-1
    specially to prevent an illegal response (and also use 0 0 0 for an
    empty group that never had any articles).

    It would work.


    Aside from 2^31-1, is there ever a case where one would use this?

    As you mention 2^31-1, I would like to point out that you should handle
    article numbers up to 2^64-1 by design. INN unfortunately does not, with
    tons of variables limited to the smaller size.
    The idea is that a modern implementation should handle large article
    numbers, advertise this with the MAXARTNUM capability (not standardized),
    avoid returning large article numbers so as not to choke clients (last
    time I checked, Thunderbird froze with such large numbers), but return
    large article numbers if the client says it copes with them.
    By configuration, if instructed to do so, the server could use large
    article numbers even if the client does not use the MAXARTNUM capability.

    I once proposed in this newsgroup how it could be done:
    https://groups.google.com/g/news.software.nntp/c/4_KjHu9GlBg/

    Some news clients implemented it (at least flnews and tin) as a proof of concept.

    Just to let you know of that as you seem to be interested in the subject :)


    Presumably something behaved this way historically, just can't fathom
    why...

    There were lots of different and exotic NNTP implementations at that
    time, and the RFC did its best not to declare them non-compliant, as Russ
    explained.


    There is obviously loss of information in that the client can't tell
    these two cases apart.

    Sure, there is loss of information, but I bet few people care about that.
    Newsreaders don't present an empty newsgroup that never received any
    article any differently from an empty newsgroup that once received an
    article.

    If you care, that's fine, and have fun with your implementation :)
    --
    Julien ÉLIE

    "My opinions may have changed, but not the fact that I am right."
    (Ashleigh Brilliant)

  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 9 10:56:50 2026
    From Newsgroup: news.software.nntp

    On 5/1/2026 5:14 PM, Russ Allbery wrote:
    There are a few other ones that aren't as widely used and are arguably configuration instead, but that do need to be queryable. They're probably fine as configuration files with some in-memory representation, though,
    since they're usually very small.

    LIST DISTRIB.PATS
    LIST MODERATORS
    LIST OVERVIEW.FMT

    Question about LIST DISTRIB.PATS - is Distribution widely used anymore
    in practice? I noticed that Eternal September responds with just this:

    10:local.*:local

    ... which makes me think they just don't care so respond with something simple. I would think a lot of effort would have to go into setting this
    up so it would be meaningful and useful.

    I'm wondering if maybe this is because clients never caught on to using
    it so that's why they configured it that way. I wasn't really paying
    attention before but I also don't recall seeing this header much these
    days. Are there any compelling reasons to respond otherwise, for either today's Usenet or local groups? And if someone is just going to respond
    with that, is it better to have a simple LIST DISTRIB.PATS response like
    that or just not support the category at all so as not to mislead the
    client into thinking it has useful information to provide?
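
    For reference, here is a rough sketch of how a client could use a
    LIST DISTRIB.PATS response such as the one above. It follows the
    selection rule from RFC 3977 (among the lines whose pattern matches the
    group, take the one with the highest weight). The function name
    pick_distribution is made up, fnmatchcase is only an approximation of
    wildmat matching (no comma-separated or negated patterns), and keeping
    the first line on a tie is an assumption.

```python
from fnmatch import fnmatchcase
from typing import Optional


def pick_distribution(group: str, distrib_pats: list[str]) -> Optional[str]:
    """Suggest a Distribution value for `group`, or None if nothing matches.

    Each entry of `distrib_pats` is one "weight:pattern:value" line.
    """
    best_weight = -1
    best_value = None
    for line in distrib_pats:
        weight_text, pattern, value = line.split(":", 2)
        weight = int(weight_text)
        # Keep the matching line with the highest weight seen so far.
        if fnmatchcase(group, pattern) and weight > best_weight:
            best_weight, best_value = weight, value
    return best_value


if __name__ == "__main__":
    response = ["10:local.*:local"]
    print(pick_distribution("local.test", response))
    print(pick_distribution("comp.lang.c", response))
```

    With the single line shown above, only groups under local.* get a
    suggestion; every other group gets none.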

    LIST MODERATORS I could see being non-trivial if you had local groups
    that were moderated, and LIST OVERVIEW.FMT depending on the overview
    file format; I'm less sure about this one.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 9 10:29:29 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:
    On 5/1/2026 5:14 PM, Russ Allbery wrote:

    There are a few other ones that aren't as widely used and are arguably
    configuration instead, but that do need to be queryable. They're
    probably fine as configuration files with some in-memory
    representation, though, since they're usually very small.

    LIST DISTRIB.PATS
    LIST MODERATORS
    LIST OVERVIEW.FMT

    Question about LIST DISTRIB.PATS - is Distribution widely used anymore in practice?

    Yes, it's used pretty extensively for private hierarchies to control distribution of articles that aren't intended to be propagated beyond the participating servers.

    I'm wondering if maybe this is because clients never caught on to using
    it so that's why they configured it that way.

    As with LIST MODERATORS, it's more of an FYI to the client. The server
    will add the Distribution header on POST.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Sat May 9 15:25:11 2026
    From Newsgroup: news.software.nntp

    On 5/9/2026 1:29 PM, Russ Allbery wrote:
    InterLinked <nntp@phreaknet.org> writes:
    On 5/1/2026 5:14 PM, Russ Allbery wrote:

    There are a few other ones that aren't as widely used and are arguably
    configuration instead, but that do need to be queryable. They're
    probably fine as configuration files with some in-memory
    representation, though, since they're usually very small.

    LIST DISTRIB.PATS
    LIST MODERATORS
    LIST OVERVIEW.FMT

    Question about LIST DISTRIB.PATS - is Distribution widely used anymore in
    practice?

    Yes, it's used pretty extensively for private hierarchies to control distribution of articles that aren't intended to be propagated beyond the participating servers.

    For private hierarchies, couldn't the incoming/outgoing feeds be
    configured not to feed such groups to other servers not carrying the hierarchy? e.g.

    *,!local.*

    At least if I were setting up a private hierarchy, that's all I would
    think to do. What does the Distribution header allow for in this case
    that can't be done at the feed/group level? (One thought: perhaps an additional layer of protection to prevent propagation if one of the
    other servers is not appropriately configured?)

    RFC 1036 2.2.7 provides an example (which I know is obsolete, but I
    assume the section on Distribution is still accurate, and RFC 5536 lacks detail in comparison). The example seems to show a kind of filtering
    that is not purely per-group (Distribution: nj,ny) and that makes a bit
    more sense to me, but only in the context of non-local groups that would normally go to a wide audience, e.g. all of Usenet. But if the server
    adds the Distribution header purely based on the Newsgroups header, then
    it seems kind of redundant to me (at least in a world where all servers
    are configured as they should be).

    It also seems that all servers would need to support the distributions
    for things to work as intended. Is ensuring they exist everywhere they
    need to be purely a manual process?
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Sat May 9 13:27:10 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:

    For private hierarchies, couldn't the incoming/outgoing feeds be
    configured not to feed such groups to other servers not carrying the hierarchy? e.g.

    *,!local.*

    The above doesn't work properly due to crossposting.

    It's possible to use @ wildcards carefully along with rejection patterns
    in incoming.conf, but there are some caveats and it's relatively easy to
    make a mistake. Distributions are somewhat simpler. The recommendation is
    to use all of the mechanisms for effective defense in depth against misconfigurations.

    See:

    https://www.eyrie.org/~eagle/faqs/soundness-inn.html

    (This hierarchy is defunct, but the same technique is still in use.)
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 15 21:31:22 2026
    From Newsgroup: news.software.nntp

    On 5/2/2026 12:04 PM, Russ Allbery wrote:
    No, all the forwarding these days is handled by moderators.isc.org, and there's no place in a control message to document that.

    Moderation is a horrible kludge. You will probably be appalled. :) It's a
    design from another era, and we never completed the work we were hoping to
    do to try to make it less of a kludge

    Was this earlier work a mechanism for automatically distributing
    moderation information using control messages, or something else?

    so it's very much something out of
    an earlier era of the Internet when spam didn't exist.

    So far, I do have one question, more of a technicality, from reading RFC
    6048 2.4.3.

    Because %s changes the periods in a group name to dashes, the RFC warns
    that groups differing only by periods/dashes would have identical
    submission templates if only %s is used. In this case, the RFC says
    "pattern template cannot be used... for these groups... explicit entries without a pattern will be required".

    Since that sounds pretty definite, I'm wondering if that implies that %s
    can only appear in the user part by itself or not (at least, the
    examples in the RFC all have it by itself). The RFC never says %s has to
    be the sole user part, so for example, is this legal?

    local.*:prefix+%s@news.example.com

    For example, local.foo would go to prefix+local-foo@news.example.com

    Is this legal? I feel like it would be, but the wording in the RFC that
    says that explicit entries are required makes me wonder if this isn't.

    To distinguish between local.foo.bar and local.foo-bar, for example, you
    could have:

    local.*.*:period+%s@news.example.com
    local.*-*:dash+%s@news.example.com

    Bizarre submission template naming? Absolutely. (And this is a simple
    example, I realize if the hierarchy were deeper than three levels in
    this example, there could again be ambiguities.) But here, for a certain
    set of similarly named groups, you would only need two patterns instead
    of as many entries as you had groups. Is this legal, and the RFC is just
    misleading when it says explicit entries are required?

    (A more practical example; one might want to use addresses like newsmoderator+%s@news.example.com, so the whole domain's address space
    is not reserved for moderation.)
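
    To make the collision and the proposed workaround concrete, here is a
    small sketch of the expansion. It is not taken from any real server:
    submission_address is a made-up name, fnmatchcase only approximates
    wildmat matching, the first matching line is assumed to win, and only %s
    is expanded.

```python
from fnmatch import fnmatchcase
from typing import Optional


def submission_address(group: str, moderators: list[str]) -> Optional[str]:
    """Expand the first "pattern:template" line whose pattern matches `group`."""
    for line in moderators:
        pattern, template = line.split(":", 1)
        if fnmatchcase(group, pattern):
            # %s is the group name with periods changed to dashes.
            return template.replace("%s", group.replace(".", "-"))
    return None


# A single pattern using a bare %s: both groups expand to the same address.
PLAIN = ["local.*:%s@news.example.com"]

# The two-pattern idea: a prefix tells the two spellings apart.
SPLIT = [
    "local.*.*:period+%s@news.example.com",
    "local.*-*:dash+%s@news.example.com",
]

if __name__ == "__main__":
    for group in ("local.foo.bar", "local.foo-bar"):
        print(group, submission_address(group, PLAIN), submission_address(group, SPLIT))
```

    With PLAIN both groups map to local-foo-bar@news.example.com, while with
    SPLIT they map to period+local-foo-bar@... and dash+local-foo-bar@...
    respectively.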

    Also, a second question, I noticed in the LIST MODERATORS output from
    Eternal September, comp.std.c++ has its own entry, going to
    std-cpp-submit@...

    I can't recall any other groups with + in the name; does this exception
    imply that '+' isn't allowed somewhere along the process for submission
    templates or moderators.isc.org, or is this just a coincidence?
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 15 18:37:36 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:
    On 5/2/2026 12:04 PM, Russ Allbery wrote:

    No, all the forwarding these days is handled by moderators.isc.org, and
    there's no place in a control message to document that. Moderation is a
    horrible kludge. You will probably be appalled. :) It's a design from
    another era, and we never completed the work we were hoping to do to
    try to make it less of a kludge

    Was this earlier work a mechanism for automatically distributing
    moderation information using control messages, or something else?

    We were hoping to standardize cryptographic signatures by moderators
    (PGPMoose) and an encapsulation format for conveying messages to
    moderators instead of intermixing mail and news in a way that causes tons
    of problems for spam filtering.

    Because %s changes the periods in a group name to dashes, the RFC warns
    that groups differing only by periods/dashes would have identical
    submission templates if only %s is used. In this case, the RFC says
    "pattern template cannot be used... for these groups... explicit entries without a pattern will be required".

    Since that sounds pretty definite, I'm wondering if that implies that %s
    can only appear in the user part by itself or not (at least, the examples
    in the RFC all have it by itself). The RFC never says %s has to be the
    sole user part, so for example, is this legal?

    local.*:prefix+%s@news.example.com

    I think that would be fine.

    Is this legal? I feel like it would be, but the wording in the RFC that
    says that explicit entries are required makes me wonder if this isn't.

    To distinguish between local.foo.bar and local.foo-bar, for example, you could have:

    local.*.*:period+%s@news.example.com
    local.*-*:dash+%s@news.example.com

    I think we just didn't think of that. :) I don't see any obvious reason
    why that wouldn't be legal.

    Also, a second question, I noticed in the LIST MODERATORS output from
    Eternal September, comp.std.c++ has its own entry, going to std-cpp-submit@...

    I can't recall any other groups with + in the name; does this exception
    imply that '+' isn't allowed somewhere along the process for submission
    templates or moderators.isc.org, or is this just a coincidence?

    Oh, interesting. I suspect that's working around the fact that + gets a
    special interpretation in a lot of email systems, and maybe that was
    causing some sort of problem.
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.
  • From InterLinked@nntp@phreaknet.org to news.software.nntp on Fri May 15 21:39:27 2026
    From Newsgroup: news.software.nntp

    On 5/9/2026 4:27 PM, Russ Allbery wrote:
    InterLinked <nntp@phreaknet.org> writes:

    For private hierarchies, couldn't the incoming/outgoing feeds be
    configured not to feed such groups to other servers not carrying the
    hierarchy? e.g.

    *,!local.*

    The above doesn't work properly due to crossposting.

    Ah, I see, if an article were crossposted to local.foo and comp.foo, then
    it would still get shared out to Usenet, despite this rule (thus leaking
    the private post).

    It's possible to use @ wildcards carefully along with rejection patterns
    in incoming.conf, but there are some caveats and it's relatively easy to
    make a mistake. Distributions are somewhat simpler. The recommendation is
    to use all of the mechanisms for effective defense in depth against misconfigurations.

    Is the idea here that since a distribution applies once per message (which
    could have multiple newsgroups, both local and non-local), adding the
    Distribution header prevents a post from going to other servers even if a
    non-local group is one of its newsgroups?

    For example, as soon as local.foo is seen, a distribution gets added,
    which would then prevent the message from going to Usenet, even
    if it includes groups that, had they been the sole newsgroup of a post,
    would have gone to Usenet?

    Although in this simple example, the cross-posted article would never
    reach the Usenet groups, so from what I can tell, this only protects
    against "posting accidents", since a user wouldn't have a legitimate
    reason to try cross-posting to both a local and a public group.
  • From Russ Allbery@eagle@eyrie.org to news.software.nntp on Fri May 15 18:53:35 2026
    From Newsgroup: news.software.nntp

    InterLinked <nntp@phreaknet.org> writes:
    On 5/9/2026 4:27 PM, Russ Allbery wrote:

    It's possible to use @ wildcards carefully along with rejection
    patterns in incoming.conf, but there are some caveats and it's
    relatively easy to make a mistake. Distributions are somewhat simpler.
    The recommendation is to use all of the mechanisms for effective
    defense in depth against misconfigurations.

    Is the idea here that since a distribution applies once per message (which
    could have multiple newsgroups, both local and non-local), adding the
    Distribution header prevents a post from going to other servers even if a
    non-local group is one of its newsgroups?

    Right, the distribution is added by the server to all posts to local.* and
    then you can exclude the distribution on all your outgoing feeds to anyone
    you don't want to exchange local.* with.

    For example, as soon as local.foo is seen, a distribution gets added,
    which would then prevent the message from going to Usenet, even
    if it includes groups that, had they been the sole newsgroup of a post,
    would have gone to Usenet?

    Yup.

    Although in this simple example, the cross-posted article would never
    reach the Usenet groups, so from what I can tell, this only protects
    against "posting accidents", since a user wouldn't have a legitimate
    reason to try cross-posting to both a local and a public group.

    Yeah, and of course you can also use a filter to just block crossposts
    directly. There are various ways to do it, but distribution has some nice
    properties for old servers with no programmatic filter.
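
    As a rough model of the mechanisms discussed in this subthread (not INN's
    actual matching code), the sketch below contrasts a per-group feed pattern
    with a per-article distribution check. The names feed_by_groups and
    feed_by_distribution are made up; the model assumes that the last matching
    pattern decides for each group, that one wanted group is enough to send
    the article, that an @ pattern rejects the whole article, and that
    fnmatchcase is a good enough stand-in for wildmat.

```python
from fnmatch import fnmatchcase


def feed_by_groups(newsgroups: list[str], patterns: list[str]) -> bool:
    """Decide whether to feed an article based on its Newsgroups list."""
    wanted = False
    for group in newsgroups:
        verdict = "no"
        for pattern in patterns:
            # "!" negates a match, "@" poisons the whole article.
            kind = pattern[0] if pattern[0] in "!@" else ""
            if fnmatchcase(group, pattern[len(kind):]):
                verdict = {"": "yes", "!": "no", "@": "poison"}[kind]
        if verdict == "poison":
            return False
        wanted = wanted or verdict == "yes"
    return wanted


def feed_by_distribution(distributions: list[str], excluded: set[str]) -> bool:
    """Refuse the article if any of its distributions is excluded."""
    return not any(d in excluded for d in distributions)


if __name__ == "__main__":
    crosspost = ["local.foo", "comp.foo"]
    print(feed_by_groups(crosspost, ["*", "!local.*"]))   # crosspost leaks
    print(feed_by_groups(crosspost, ["*", "@local.*"]))   # poisoned
    print(feed_by_distribution(["local"], {"local"}))     # blocked
```

    In this model the crosspost to local.foo and comp.foo passes the
    "*,!local.*" pattern because comp.foo is wanted, while both the @ pattern
    and the distribution check stop it.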
    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.