• Re: Reducing Redundancy

    From Stefan+Usenet@Froehlich.Priv.at (Stefan Froehlich) to comp.databases.mysql on Wed Sep 11 13:41:35 2024
    From Newsgroup: comp.databases.mysql

    On Tue, 10 Sep 2024 17:33:01 Stefan Ram wrote:
    I'm picturing some program that pulls newsgroups from newsservers
    and dumps them into a database.

    "There is another theory which states that this has already
    happened"

    In my mind's eye, a post looks something like this, give or take:

    Path: A
    Message-ID: B

    Body: C

    But if you snag the same post from a different server, it might
    look like this:

    Message-ID: B
    Path: D

    Body: C

    At first blush, you'd end up with the same body stored multiple
    times in the database. Talk about a waste of space!

    It is not only a waste of space; you definitely want to avoid
    duplication in a database. These two incarnations are the same
    posting, so its contents should be stored only once.

    To trim the fat, we could rejigger these posts so all the variable
    stuff is up front:

    Path: A
    Message-ID: B

    Body: C

    and

    Path: D
    Message-ID: B

    Body: C

    Now the tail end of both posts is identical, so we can toss that
    in a separate table at position 0.

    You'd really want to archive more than one incarnation of a posting,
    just because you pulled it from different servers? Why?

    Parse the incoming postings, extract the headers, store at least the
    Message-ID in a separate attribute of your table (wisely a few more
    headers as well) and put a unique key on that field.
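
    A minimal sketch of that idea, assuming Python with SQLite standing
    in for MySQL. The table name `posting`, its columns, and the helper
    names `open_archive` and `store_posting` are made up for
    illustration; in MySQL the same effect should be available with a
    UNIQUE KEY on the Message-ID column and INSERT IGNORE.

```python
import sqlite3
from email import message_from_string


def open_archive(path=":memory:"):
    """Open (or create) the archive; message_id carries the unique key."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posting (
            message_id TEXT NOT NULL UNIQUE,  -- one row per posting
            subject    TEXT,                  -- example of "some more" headers
            raw        TEXT NOT NULL          -- posting as first received
        )
        """
    )
    return conn


def store_posting(conn, raw):
    """Store a posting once; return False if its Message-ID is already known."""
    msg = message_from_string(raw)  # header lookup is case-insensitive
    message_id = msg["Message-ID"]
    if message_id is None:
        raise ValueError("posting has no Message-ID header")
    cur = conn.execute(
        "INSERT OR IGNORE INTO posting (message_id, subject, raw) "
        "VALUES (?, ?, ?)",
        (message_id.strip(), msg["Subject"], raw),
    )
    conn.commit()
    # rowcount is 0 when the unique key made SQLite skip the insert
    return cur.rowcount == 1
```

    Fed the two incarnations from above (Path: A and Path: D, same
    Message-ID), the second call is simply ignored, so the body is
    stored only once no matter how many servers delivered it.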

    This way, you could store the same post from multiple newsservers
    without eating up your hard drive space like it's In-N-Out fries.

    Still, the question remains: Why?

    The only reason I can see is to build a database of distribution
    paths pointing to your archive, but I can't see any benefit in that.

    Bye,
    Stefan
    --
    http://kontaktinser.at/ - the free contact exchange for Austria. Official first visitor(TM) of mmeike