Forum: Too Lazy BBS

Re: Reducing Redundancy

From Stefan+Usenet@Froehlich.Priv.at (Stefan Froehlich) to comp.databases.mysql on Wed Sep 11 13:41:35 2024

From Newsgroup: comp.databases.mysql

On Tue, 10 Sep 2024 17:33:01 Stefan Ram wrote:

I'm picturing some program that pulls newsgroups from newsservers
and dumps them into a database.

"There is another theory which states that this has already
happened"

In my mind's eye, a post looks something like this, give or take:

Path: A
Message-ID: B

Body: C

But if you snag the same post from a different server, it might
look like this:

Message-ID: B
Path: D

Body: C

At first blush, you'd end up with the same body stored multiple
times in the database. Talk about a waste of space!

It's not only a waste of space; you definitely want to avoid
duplication in a database. These two incarnations are the same
posting, so its contents should be stored only once.

To trim the fat, we could rejigger these posts so all the variable
stuff is up front:

Path: A
Message-ID: B

Body: C

and

Path: D
Message-ID: B

Body: C

Now the tail end of both posts is identical, so we can toss that
in a separate table at position 0.

You'd really want to archive more than one incarnation of a posting,
just because you pulled it from different servers? Why?

Parse the incoming postings, extract the headers, store at least the
Message-ID in a separate column of your table (and, wisely, a few
more headers) and put a unique key on that column.
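
A minimal sketch of that approach, assuming Python's standard email
parser and an in-memory SQLite database as a stand-in for MySQL. The
table name `postings`, its columns and the helper `store_posting` are
made up for illustration; in MySQL, `INSERT IGNORE` would take the
place of `INSERT OR IGNORE`.

```python
import sqlite3
from email import message_from_string

# SQLite stands in for MySQL here; the schema is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE postings (
        message_id TEXT NOT NULL UNIQUE,  -- unique key: one row per posting
        subject    TEXT,
        body       TEXT NOT NULL
    )
    """
)

def store_posting(raw: str) -> bool:
    """Parse a raw article and store it; return False if it was already archived."""
    msg = message_from_string(raw)
    cur = conn.execute(
        "INSERT OR IGNORE INTO postings (message_id, subject, body) VALUES (?, ?, ?)",
        (msg["Message-ID"], msg["Subject"], msg.get_payload()),
    )
    # rowcount is 1 for a new row and 0 when the unique key rejected a duplicate
    return cur.rowcount == 1

# The same posting as seen on two servers: only Path and header order differ.
from_server_1 = "Path: A\nMessage-ID: <B>\nSubject: test\n\nBody: C\n"
from_server_2 = "Message-ID: <B>\nPath: D\nSubject: test\n\nBody: C\n"

first = store_posting(from_server_1)
second = store_posting(from_server_2)
count = conn.execute("SELECT COUNT(*) FROM postings").fetchone()[0]
```

Because the headers are parsed rather than compared as text, the
order in which a server emits them does not matter; the second copy
is simply rejected by the unique key.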

This way, you could store the same post from multiple newsservers
without eating up your hard drive space like it's In-N-Out fries.

Still, the question remains: Why?

Only reason I could see is to generate a database of distribution
paths pointing to your archive. But I can't see any benefit in that.
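
For what it's worth, such a path database could sit next to a
deduplicated archive as a second table. A rough sketch, assuming
Python with SQLite standing in for MySQL; the `postings` and `paths`
tables and the `record` helper are hypothetical names, not anything
from an existing tool:

```python
import sqlite3
from email import message_from_string

# SQLite stands in for MySQL; both tables are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE postings (
        message_id TEXT PRIMARY KEY,
        body       TEXT NOT NULL
    );
    CREATE TABLE paths (
        message_id TEXT NOT NULL REFERENCES postings(message_id),
        path       TEXT NOT NULL,
        UNIQUE (message_id, path)
    );
    """
)

def record(raw: str) -> None:
    """Store the body once per Message-ID and every distinct Path seen for it."""
    msg = message_from_string(raw)
    mid = msg["Message-ID"]
    conn.execute(
        "INSERT OR IGNORE INTO postings (message_id, body) VALUES (?, ?)",
        (mid, msg.get_payload()),
    )
    conn.execute(
        "INSERT OR IGNORE INTO paths (message_id, path) VALUES (?, ?)",
        (mid, msg["Path"]),
    )

record("Path: A\nMessage-ID: <B>\n\nBody: C\n")
record("Message-ID: <B>\nPath: D\n\nBody: C\n")

bodies = conn.execute("SELECT COUNT(*) FROM postings").fetchone()[0]
paths = [row[0] for row in conn.execute(
    "SELECT path FROM paths WHERE message_id = ? ORDER BY path", ("<B>",)
)]
```

The body is stored once, while each server's Path ends up as its own
small row pointing at it.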

Bye,
Stefan
--
http://kontaktinser.at/ - the free contact exchange for Austria
Official First Visitor(TM) of mmeike

The hasty citizen wants Stefan. There must be a reason for that, surely? (Sloganizer)
--- Synchronet 3.21a-Linux NewsLink 1.2
