From Newsgroup: news.software.nntp
Since there has been some discussion about archiving very old articles
with INN, I found this e-mail thread I forgot about from Retro Guy (RIP)
and wanted to share the knowledge in case it helps others. I haven't used
the script yet, and he was using rpost to post as a client and stripping a
lot of headers. Personally, I would leave the headers besides Date intact
and use rnews to inject them.
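For the rnews route mentioned above, a minimal sketch of building a batch might look like the following. The function name make_rnews_batch and the file names in the usage comment are placeholders, not part of the original notes. It assumes the articles are stored with their headers intact and without NNTP dot-stuffing, since an rnews batch carries each article verbatim after a "#! rnews <size in bytes>" line.

```shell
#!/bin/bash
# make_rnews_batch (hypothetical helper): read article paths, one per line,
# on stdin and write an rnews batch on stdout. Each article is preceded by
# a "#! rnews <size-in-bytes>" line.
make_rnews_batch() {
    local article size
    while IFS= read -r article; do
        [ -f "$article" ] || continue          # skip paths that are not files
        size=$(( $(wc -c < "$article") ))      # arithmetic strips any padding from wc
        printf '#! rnews %d\n' "$size"
        cat "$article"
    done
}

# Hypothetical usage (file names are placeholders):
#   make_rnews_batch < artlist > batch.rnews
#   rnews < batch.rnews
```

Note that datefix.php below doubles lone "." lines for posting over NNTP, so its output would need that step removed before being batched this way.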
-----
Here you go. I have attached some PHP scripts and my notes. Please read through the scripts before running them, just to be cautious. I may also have left some hardcoded paths that you would need to change.
I hope this all makes sense. I haven't really thought about it for a few months.
Oh, and no, I'm not working with archive.org historical archive. I'm using David Wiseman's UTZOO archive:
https://archive.org/details/utzoo-wiseman-usenet-archive
-----
How I think this works. It's been a few months since I did this so
hopefully I'm not overlooking something:
First, create 'artlist.in' in the script dir ($scriptdir).
artlist.in is a file containing the full path of every article you want to eventually import, one path per line (each file should hold a single article).
So, something like:
find /full/path/to/your/unmodified/files/ -type f > artlist.in
Create a directory in the script dir named out ($scriptdir/out). This directory will be written to by the next script, so it should be empty.
Now, run datefix.php from $scriptdir. It will read './artlist.in' and
write to './out' and './newsgroups.inc'.
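Since datefix.php silently skips any listed path that is not a regular file, it may be worth comparing counts after it finishes. This is only a sketch; the function name count_check is made up for illustration, and it assumes artlist.in ends with a newline (as find output does).

```shell
#!/bin/bash
# count_check LIST OUTDIR (hypothetical helper): compare the number of
# paths in LIST with the number of files found in OUTDIR.
count_check() {
    local list=$1 outdir=$2 expected actual
    expected=$(( $(wc -l < "$list") ))
    actual=$(( $(find "$outdir" -type f | wc -l) ))
    echo "listed: $expected, written: $actual"
    # Succeed only when every listed path produced an output file
    [ "$expected" -eq "$actual" ]
}

# Hypothetical usage, run from the script directory:
#   count_check artlist.in out || echo "some listed paths were skipped"
```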
Next, run get_groups.php. It will read './newsgroups.inc' and write './newsgroups.out'.
It takes all the 'Newsgroups: *' from all the articles, splits them into
one newsgroup per line, and writes them to newsgroups.out.
Next, run 'sort newsgroups.out | uniq > newsgroups.txt'.
This sorts all the newsgroups and then deletes duplicates.
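For illustration, the split and de-duplicate steps can also be sketched as one shell pipeline. This is an alternative to get_groups.php plus sort and uniq, not what the original notes used, and the function name split_groups is made up. 'sort -u' gives the same result as 'sort | uniq'.

```shell
#!/bin/bash
# split_groups (hypothetical helper): read Newsgroups values on stdin and
# print one unique, sorted newsgroup name per line.
split_groups() {
    tr ',\t' '  ' |        # commas and tabs become spaces
    tr -s ' ' '\n' |       # each run of spaces becomes one newline
    sed '/^$/d' |          # drop any empty lines
    LC_ALL=C sort -u       # sort and remove duplicates
}

# Hypothetical usage:
#   split_groups < newsgroups.inc > newsgroups.txt
```

Given the two input lines "net.general,net.misc" and "net.misc, net.unix", this prints net.general, net.misc, and net.unix, one per line.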
Then run the following shell script on your server (as news user) to
create the groups from 'newsgroups.txt':
-----
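#!/bin/bash
# Create every group listed in newsgroups.txt (one name per line)
while IFS= read -r WORD
do
    [ -n "$WORD" ] || continue    # skip blank lines
    echo "$WORD"
    ctlinnd newgroup "$WORD"
done < ./newsgroups.txt
echo "Done."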
-----
There WILL be messed up group names due, most likely, to people typing
them incorrectly when posting. Unless you want to read through the file
first and remove them, they will be created on the server.
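A rough way to flag the obviously broken names before creating anything is sketched below. The function name check_groups and the pattern are assumptions: the pattern only approximates newsgroup-name syntax (lowercase letters, digits, '+', '-' and '_' in at least two dot-separated components), so it catches stray punctuation and capitals but not a plausible-looking misspelling, and it would also flag any legitimate single-component group.

```shell
#!/bin/bash
# check_groups (hypothetical helper): read newsgroup names on stdin, print
# plausible names on stdout and suspicious ones on stderr.
check_groups() {
    local name
    # Assumed pattern: at least two components of [a-z0-9+_-] joined by dots
    local pattern='^[a-z0-9+_-]+(\.[a-z0-9+_-]+)+$'
    while IFS= read -r name; do
        if printf '%s\n' "$name" | LC_ALL=C grep -Eq "$pattern"; then
            printf '%s\n' "$name"
        else
            printf 'suspicious: %s\n' "$name" >&2
        fi
    done
}

# Hypothetical usage (output file names are placeholders):
#   check_groups < newsgroups.txt > newsgroups.clean 2> newsgroups.rejected
```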
Next, create a file named './artlist' that contains a list of every
article in './out' by full path name. One full path article filename per
line.
Here's a shell script example, but I'm sure you can already do this:
find /full/path/to/out/ -type f > artlist
Finally, try to post the articles. Write a script similar to (I used
rpost):
-----
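#!/bin/bash
# Server details
server="server.name"
port="port number"    # defined here but not passed to rpost below
username="username"
password="password"
# Connect to NNTP server and post every article listed in artlist
rpost "$server" -n -u -U "$username" -P "$password" -b artlist
status=$?
# Quit the NNTP server
rpost -q
# Only report success if the posting run itself exited cleanly
if [ "$status" -eq 0 ]; then
    echo "Articles posted successfully!"
else
    echo "rpost exited with status $status" >&2
fi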
-----
datefix.php:
#!/usr/bin/php
<?php
/* FIRST: Create artlist.in */
/* Clean ./out/* */
$artfile = "artlist.in";
$artlist = file($artfile);
if($artlist === false) {
    exit("Cannot read ".$artfile."\n");
}
$newsgroupslist = "newsgroups.inc";
/* Start with a fresh newsgroups.inc; skip unlink() if it does not exist */
if(is_file($newsgroupslist)) {
    unlink($newsgroupslist);
}
$i=0;
foreach($artlist as $article) {
    if(!is_file(trim($article))) {
        continue;
    }
    $articleline = file(trim($article));
    $newarticle = array();
    $lines = 0;
    $is_header = 1;
    foreach($articleline as $line) {
        /* The first blank line after at least one line ends the headers */
        if(trim($line) == "" && $lines > 0) {
            $is_header=0;
        }
        /* Count every line; otherwise $lines stays 0 and the header never ends */
        $lines++;
        if(stripos($line, "Relay-Version") === 0 && $is_header == 1) {
            continue;
        }
        if(stripos($line, "Posting-Version") === 0 && $is_header == 1) {
            continue;
        }
        if(stripos($line, "Date-Received") === 0 && $is_header == 1) {
            continue;
        }
        if(stripos($line, "Xref") === 0 && $is_header == 1) {
            continue;
        }
        if(stripos($line, "X-Trace") === 0 && $is_header == 1) {
            continue;
        }
        if(stripos($line, "X-Complaints-To") === 0 && $is_header == 1) {
            continue;
        }
        if(stripos($line, "NNTP-Posting-Host") === 0 && $is_header == 1) {
            continue;
        }
        if(stripos($line, "Injection-Info") === 0 && $is_header == 1) {
            continue;
        }
        /* Record the group list for get_groups.php; the header itself is kept */
        if(stripos($line, "Newsgroups: ") === 0 && $is_header == 1) {
            $groups = explode(': ', $line, 2);
            file_put_contents($newsgroupslist, $groups[1], FILE_APPEND);
        }
        /* Rewrite the Date header in a normalized format */
        if(stripos($line, "Date: ") === 0 && $is_header == 1) {
            $finddate = explode(': ', $line, 2);
            $timestamp = strtotime(trim($finddate[1]));
            if($timestamp === false) {
                /* Unparseable date: keep the original header rather than emit 1970 */
                $newarticle[] = $line;
            } else {
                $newarticle[] = "Date: ".date("D, j M Y H:i T", $timestamp)."\n";
            }
            continue;
        }
        /* Dot-stuff lines consisting of a single "." */
        if(trim($line) == ".") {
            $newarticle[] = "..\n";
            continue;
        }
        $newarticle[] = $line;
    }
    /* Write the whole article in one call, replacing any stale file */
    $newfile = 'out/'.$i;
    $i++;
    file_put_contents($newfile, implode("", $newarticle));
}
/* NEXT RUN get_groups.php */
-----
get_groups.php:
#!/usr/bin/php
<?php
$groups_file = "newsgroups.inc";
$newsgroups = file($groups_file);
$outfile = "newsgroups.out";
/* Start with a fresh newsgroups.out; skip unlink() if it does not exist */
if(is_file($outfile)) {
    unlink($outfile);
}
foreach($newsgroups as $groups) {
    /* Split on commas and whitespace, dropping empty pieces */
    $group = preg_split('/[,\s]+/', $groups, -1, PREG_SPLIT_NO_EMPTY);
    foreach($group as $addgroup) {
        file_put_contents($outfile, $addgroup."\n", FILE_APPEND);
    }
}
/* NEXT IS 'sort newsgroups.out | uniq > newsgroups.txt' */
/* Then send it to novalink.us and create groups */
/* THEN: Create artlist from ./out */