• Re: Archiving Usenet 2003-2025

    From Jason Evans@tis.a.secret@pm.me to news.software.misc on Sun Jun 22 19:34:39 2025
    From Newsgroup: news.software.misc

    On 6/13/25 1:36 PM, Billy G. (go-while) wrote:
    > Cool project idea, I already did the same.

    Can you provide a link to your archives or are they only on your news
    server? What newsgroup list did you use for gathering groups? I used the
    group list from isc.org
    (https://ftp.isc.org/pub/usenet/CONFIG/newsgroups) supplemented with
    Eternal September's list.

    Also, how did you get your archives? I developed a script to do this for
    me because I couldn't find a reliable way to do this otherwise. I also
    have the ability to download groups from a specific time frame which I
    hope to use every year to archive groups year-by-year. Are you doing
    something like that also?

    Jason
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Colin Macleod@user7@newsgrouper.org.invalid to news.software.misc on Mon Jun 23 10:58:28 2025

    Colin Macleod <user7@newsgrouper.org.invalid> posted:

    > "Billy G. (go-while)" <no-reply@no.spam> posted:

    >> Cool project idea, I already did the same.

    >> Here is everything you can get from archive.org and probably everything you can get from the biggest paid providers...

    > Impressive, some content goes back to 1983, before the "Great Renaming",
    > but checking comp.lang.tcl also shows a new message posted today.

    I tried Billy's server lux-feed1.newsdeef.eu again today, but now I just
    get "481 Denied" responses. Is it still available?
    --
    Colin Macleod ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ https://cmacleod.me.uk

    Is there anybody there?
    Is that a trick question? I'm here in spirit but not in body!
  • From Billy G. (go-while)@no-reply@no.spam to news.software.misc on Mon Jul 28 11:59:43 2025

    On 16.06.25 17:38, Colin Macleod wrote:
    > "Billy G. (go-while)" <no-reply@no.spam> posted:

    >> Cool project idea, I already did the same.

    >> Here is everything you can get from archive.org and probably everything
    >> you can get from the biggest paid providers...

    > Impressive, some content goes back to 1983, before the "Great Renaming",
    > but checking comp.lang.tcl also shows a new message posted today.
    > There are nearly half a million groups listed, but many appear to be
    > bogus, with names that are typos and no or minimal content.

    > Could I add your server to this list?


    Oh, I did not see this here!

    Yes, you can! :)

    That's everything that was available at archive.org,
    plus what I sucked from most providers...
    all spam included.
    --
    .......
    Billy G. (go-while)
    https://pugleaf.net
    @Newsgroup: rocksolid.nodes.help

  • From Billy G. (go-while)@no-reply@no.spam to news.software.misc on Mon Jul 28 12:25:09 2025

    On 23.06.25 02:34, Jason Evans wrote:
    > On 6/13/25 1:36 PM, Billy G. (go-while) wrote:
    >> Cool project idea, I already did the same.

    > Can you provide a link to your archives or are they only on your news
    > server? What newsgroup list did you use for gathering groups? I used the
    > group list from isc.org
    > (https://ftp.isc.org/pub/usenet/CONFIG/newsgroups) supplemented with
    > Eternal September's list.

    > Also, how did you get your archives? I developed a script to do this for
    > me because I couldn't find a reliable way to do this otherwise. I also
    > have the ability to download groups from a specific time frame which I
    > hope to use every year to archive groups year-by-year. Are you doing
    > something like that also?

    > Jason

    archive.org -> mbox2nntp ( https://github.com/go-while/mbox2nntp )

    https://archive.org/details/usenethistorical

    https://archive.org/details/usenet

    You need multiple TB of free space and plenty of time to download them all =)
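    mbox2nntp's internals aren't described here, so this is only a rough sketch of the first step such a pipeline needs: reading Message-IDs and raw article bytes out of an mbox file with Python's standard `mailbox` module. It assumes the archive.org download has already been decompressed to a plain mbox file; `iter_articles` is a hypothetical helper, and messages without a Message-ID are skipped because NNTP transfer needs one.

```python
import mailbox
from typing import Iterator, Tuple


def iter_articles(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (Message-ID, raw article bytes) for every message in an mbox file.

    Hypothetical helper: assumes a plain, already-decompressed mbox file.
    Messages without a Message-ID header are skipped.
    """
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            mid = msg.get("Message-ID")
            if not mid:
                continue  # cannot be offered via NNTP without an ID
            # as_bytes() omits the mbox "From " separator line by default.
            yield mid.strip(), msg.as_bytes()
    finally:
        box.close()


if __name__ == "__main__":
    import sys

    for mid, raw in iter_articles(sys.argv[1]):
        print(mid, len(raw))
```

    Each (ID, bytes) pair could then be offered to a news server one by one, which is the "-> nntp" half of the pipeline.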

    Text Usenet Archive
    Host: lux-feed1.newsdeef.eu
    Port: 119 or 563 SSL
    User: usenet
    Pass: archive
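    A minimal sketch of logging in to the server above over TLS, assuming the standard NNTP greeting codes (RFC 3977) and AUTHINFO USER/PASS (RFC 4643). It uses raw sockets because Python's `nntplib` was removed from the standard library in 3.13. A "481" reply like the one Colin reported would surface here as a `PermissionError`.

```python
import socket
import ssl
from typing import BinaryIO, Tuple

# Connection details as given in the post above.
HOST = "lux-feed1.newsdeef.eu"
PORT = 563  # NNTP over TLS; port 119 would be plain text


def parse_status(line: str) -> Tuple[int, str]:
    """Split an NNTP status line such as '281 Authentication accepted' into (code, text)."""
    code, _, text = line.strip().partition(" ")
    return int(code), text


def login(user: str = "usenet", password: str = "archive") -> Tuple[ssl.SSLSocket, BinaryIO]:
    """Open a TLS connection and authenticate with AUTHINFO USER/PASS.

    Returns the socket plus a buffered reader for subsequent responses.
    """
    ctx = ssl.create_default_context()
    raw = socket.create_connection((HOST, PORT), timeout=30)
    sock = ctx.wrap_socket(raw, server_hostname=HOST)
    reader = sock.makefile("rb")

    def read_status() -> Tuple[int, str]:
        return parse_status(reader.readline().decode("utf-8", "replace"))

    def command(cmd: str) -> Tuple[int, str]:
        sock.sendall(cmd.encode("utf-8") + b"\r\n")
        return read_status()

    code, text = read_status()
    if code not in (200, 201):  # 200: posting allowed, 201: posting prohibited
        raise ConnectionError(f"unexpected greeting: {code} {text}")
    code, text = command(f"AUTHINFO USER {user}")
    if code == 381:  # 381: password required
        code, text = command(f"AUTHINFO PASS {password}")
    if code != 281:  # 281: authentication accepted; 481 means rejected
        raise PermissionError(f"login failed: {code} {text}")
    return sock, reader
```

    Each of the many connections the server allows would run its own `login()`; certificate verification uses the system defaults, which may need adjusting if the server's certificate doesn't validate.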

    Storage has moved to SSD with a 25 Gbps uplink, and performance is crazy!

    You can use 1000 conns. The server will be happy to serve you!

    Please ask if you need more conns! :D
    --
    .......
    Billy G. (go-while)
    https://pugleaf.net
    @Newsgroup: rocksolid.nodes.help

  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.misc on Sat Aug 30 04:11:03 2025

    On Jun 13, 2025 at 1:36:57 PM CDT, "Billy G. (go-while)" <no-reply@no.spam> wrote:

    > Cool project idea, I already did the same.

    > Here is everything you can get from archive.org and probably everything
    > you can get from the biggest paid providers...

    > 10 TB of text, mostly unfiltered. Maybe some Google Groups spam is missing.

    > The archive is live and connected via peering, so there is nothing else to do; it archives on its own.

    > The server is written by me and lacks some commands.

    > Text Usenet Archive
    > Host: lux-feed1.newsdeef.eu
    > Port: 119 or 563 SSL
    > User: usenet
    > Pass: archive

    > Please don't hit it too hard; connections are limited anyway.

    > You can get me on Discord: https://discord.gg/rECSbHHFzp

    > If anybody can take a full copy: I'm happy to share!!!

    Hey Billy, I've been meaning to reach out to you. Mind contacting me via e-mail?

    I'd like to know if there is a more efficient way to obtain the archive than using suck/pullnews. I had been putting together an archive at news.blueworldhosting.com, but have a number of holes and never got around to seriously importing the mbox files from archive.org.

    I know you're in early development stages, but if you'd like someone to test pushing/streaming articles via NNTP, I'm interested. I have a lot of bandwidth and performant hardware, always a good test case for NNTP streaming.
  • From Billy G.@no-reply@no.spam to news.software.misc on Sat Aug 30 23:45:28 2025

    On 30.08.25 05:11, Jesse Rehmer wrote:

    >> If anybody can take a full copy: I'm happy to share!!!

    > Hey Billy, I've been meaning to reach out to you. Mind contacting me via e-mail?

    > I'd like to know if there is a more efficient way to obtain the archive than using suck/pullnews. I had been putting together an archive at news.blueworldhosting.com, but have a number of holes and never got around to seriously importing the mbox files from archive.org.

    > I know you're in early development stages, but if you'd like someone to test pushing/streaming articles via NNTP, I'm interested. I have a lot of bandwidth and performant hardware, always a good test case for NNTP streaming.

    Hi!

    Using suck is the worst way to download from the newsdeef archive.
    The overview is not a database but a flat file with offset indexes
    for every 100 articles only, so downloading by article number is slow.
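    The post doesn't spell out the on-disk format, so the following is only a hypothetical model of "offset indexes for every 100 articles": an index holding the byte offset of every 100th line of a flat overview file. It shows why a lookup by number costs a seek plus a linear scan of up to 99 lines. For simplicity it also assumes article numbers have no gaps, which real groups don't guarantee.

```python
import io
from typing import List, Optional

STRIDE = 100  # one index entry per 100 articles, as described in the post


def build_index(overview: io.BufferedIOBase) -> List[int]:
    """Record the byte offset of every STRIDE-th line of the overview file."""
    offsets: List[int] = []
    overview.seek(0)
    lineno = 0
    while True:
        pos = overview.tell()
        line = overview.readline()
        if not line:
            break
        if lineno % STRIDE == 0:
            offsets.append(pos)
        lineno += 1
    return offsets


def lookup(overview: io.BufferedIOBase, offsets: List[int],
           artnum: int, first: int = 1) -> Optional[bytes]:
    """Return the overview line for artnum, assuming line i holds article first + i."""
    idx = artnum - first
    if idx < 0 or idx // STRIDE >= len(offsets):
        return None
    overview.seek(offsets[idx // STRIDE])  # jump to the nearest indexed line
    for _ in range(idx % STRIDE):          # then scan forward line by line
        if not overview.readline():
            return None
    line = overview.readline()
    return line.rstrip(b"\r\n") or None
```

    Fetching a whole group by number repeats that seek-and-scan for every article, which is the cost Billy is warning about.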

    Articles are stored under the SHA-256 hash of their Message-ID.
    The best way is to request '(X)HDR message-id' in a group first,
    then fetch the articles by Message-ID: that gives maximum performance.
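    A sketch of that two-step approach, assuming the server answers `XHDR Message-ID` with the usual "<number> <message-id>" lines terminated by a lone ".": collect the IDs, then issue one `ARTICLE <message-id>` per ID. Fetching by ID presumably lets the server go straight from the hash to the stored article without touching the overview file. `storage_key` only illustrates the hashing Billy mentions; whether the server hashes the ID with or without angle brackets isn't stated, so including them here is an assumption.

```python
import hashlib
from typing import Iterable, List


def parse_xhdr(lines: Iterable[str]) -> List[str]:
    """Collect Message-IDs from the body of an 'XHDR Message-ID' response.

    Each body line is '<article-number> <message-id>'; the body ends with '.'.
    """
    mids: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":
            break
        _, _, mid = line.partition(" ")
        if mid.startswith("<") and mid.endswith(">"):
            mids.append(mid)
    return mids


def article_commands(mids: Iterable[str]) -> List[str]:
    """One 'ARTICLE <message-id>' command per ID (fetch by ID, not by number)."""
    return [f"ARTICLE {mid}\r\n" for mid in mids]


def storage_key(mid: str) -> str:
    """Hypothetical storage key: hex SHA-256 of the Message-ID string."""
    return hashlib.sha256(mid.encode("utf-8")).hexdigest()
```

    Because the ID list is built once per group, the ARTICLE commands can be spread over many connections without any of them needing the group's article numbers.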

    I have a tool that sends many groups concurrently to an NNTP server via IHAVE.
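    The tool itself isn't described here, so this is a sketch of the IHAVE exchange as RFC 3977 defines it, not of Billy's implementation. The article goes out with CRLF line endings and dot-stuffing, and the response codes decide whether it counts as sent, should be skipped, or should be retried later. `push_all` and the `send_group` callable it takes are hypothetical stand-ins for a per-group worker that owns its own connection.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# RFC 3977 IHAVE response codes.
SEND_IT = 335      # initial: send the article
NOT_WANTED = 435   # initial: article not wanted
TRY_LATER = 436    # initial or final: transfer not possible, try again later
TRANSFERRED = 235  # final: article transferred OK
REJECTED = 437     # final: transfer rejected, do not retry


def dot_stuff(article: bytes) -> bytes:
    """CRLF line endings, leading dots doubled, terminated by a lone '.' line."""
    lines = article.replace(b"\r\n", b"\n").split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()
    out = [b"." + line if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(out) + b"\r\n.\r\n"


def outcome(initial: int, final: int = 0) -> str:
    """Classify one IHAVE exchange as 'sent', 'skip' or 'retry'."""
    if initial == NOT_WANTED:
        return "skip"
    if initial == TRY_LATER or final == TRY_LATER:
        return "retry"
    if initial == SEND_IT and final == TRANSFERRED:
        return "sent"
    if initial == SEND_IT and final == REJECTED:
        return "skip"
    raise ValueError(f"unexpected IHAVE codes {initial}/{final}")


def push_all(groups: List[str], send_group: Callable[[str], int],
             workers: int = 8) -> Dict[str, int]:
    """Run send_group for many groups concurrently; map group -> its result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(groups, pool.map(send_group, groups)))
```

    Keeping one connection per group keeps each worker's IHAVE dialogue strictly sequential, which is what the protocol requires for a single connection.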

    I'll send you an email later.


    The import into the pugleaf databases is complete (only the *sex* groups are missing).

    The plan is to share the database snapshots via torrent (10 TB+).

    Source code is online too: https://github.com/go-while/go-pugleaf
    --
    .......
    Billy G. (go-while)
    https://pugleaf.net
    @Newsgroup: rocksolid.nodes.help
    irc.pugleaf.net:6697 (SSL) #lounge
    discord: https://discord.gg/rECSbHHFzp
  • From Billy G.@no-reply@no.spam to news.software.misc on Sat Aug 30 23:50:26 2025

    On 30.08.25 23:45, Billy G. wrote:
    > I have a tool that sends many groups concurrently to an NNTP server via IHAVE.

    Side note:
    INN will not accept many of the old articles because their dates are weird...
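    One way to handle this before pushing is to pre-check each article's Date header and set aside the ones that won't parse, so they can be inspected or fixed separately. A minimal sketch using Python's `email.utils.parsedate_to_datetime`; it does not reproduce INN's actual acceptance rules (its age cutoff, `artcutoff` in inn.conf, is a separate check), and `check_date` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def check_date(value: Optional[str]) -> Tuple[str, Optional[datetime]]:
    """Classify a Date header value as 'missing', 'unparseable' or 'ok'."""
    if not value or not value.strip():
        return "missing", None
    try:
        dt = parsedate_to_datetime(value.strip())
    except (TypeError, ValueError):  # older Pythons raise TypeError, 3.10+ ValueError
        return "unparseable", None
    if dt.tzinfo is None:
        # A zone of -0000 (or no zone) yields a naive datetime; assume UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return "ok", dt
```

    Articles classified as "unparseable" or "missing" could be queued separately instead of being offered via IHAVE and rejected.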
    --
    .......
    Billy G. (go-while)
    https://pugleaf.net
    @Newsgroup: rocksolid.nodes.help
    irc.pugleaf.net:6697 (SSL) #lounge
    discord: https://discord.gg/rECSbHHFzp
  • From Jesse Rehmer@jesse.rehmer@blueworldhosting.com to news.software.misc on Sat Aug 30 23:42:45 2025

    On Aug 30, 2025 at 5:45:28 PM CDT, "Billy G." <no-reply@no.spam> wrote:

    > On 30.08.25 05:11, Jesse Rehmer wrote:

    >>> If anybody can take a full copy: I'm happy to share!!!

    >> Hey Billy, I've been meaning to reach out to you. Mind contacting me via
    >> e-mail?

    >> I'd like to know if there is a more efficient way to obtain the archive
    >> than using suck/pullnews. I had been putting together an archive at
    >> news.blueworldhosting.com, but have a number of holes and never got around to
    >> seriously importing the mbox files from archive.org.

    >> I know you're in early development stages, but if you'd like someone to test
    >> pushing/streaming articles via NNTP, I'm interested. I have a lot of bandwidth
    >> and performant hardware, always a good test case for NNTP streaming.

    > Hi!

    > Using suck is the worst way to download from the newsdeef archive.
    > The overview is not a database but a flat file with offset indexes
    > for every 100 articles only, so downloading by article number is slow.

    > Articles are stored under the SHA-256 hash of their Message-ID.
    > The best way is to request '(X)HDR message-id' in a group first,
    > then fetch the articles by Message-ID: that gives maximum performance.

    > I have a tool that sends many groups concurrently to an NNTP server via IHAVE.

    FWIW, that's the default behavior of suck: it uses XHDR, builds a database of Message-IDs, and uses ARTICLE <MID> to fetch (among other things).

    If you have a tool to push, that's great, we can sort out details via e-mail.

    I'm aware of a few challenges getting the older messages accepted by INN. From what I've observed so far, it's primarily articles originating from ANews and BNews (based on what I'm seeing rejected, those seem to have been used mainly from 1981-1982). I'll have to sort out how to deal with that later.