• Trivial Backup dilemma

    From pinnerite@pinnerite@gmail.com to alt.os.linux.mint on Mon Feb 2 22:32:35 2026
    From Newsgroup: alt.os.linux.mint

    I rsync my main drive to backups.

    However I am a bit messy at filing documents, just keeping them in
    temporary places, like "Downloads" for example. Eventually I will
    decide to create one or more directories and move stuff into them.

    When I backup of course, the new directories (and contents) will be
    copied but the originals will remain on the backup drive. It would be
    time-consuming to go through the backup drive, work out what had been
    duplicated, and delete the original files.

    In the past, because I have multiple backups, I would delete all the
    contents of a backup drive before backing up to it again.

    That though is time consuming and seems like using a hammer to crack a
    nut.
    --
    Linux Mint 22.1 kernel version 6.8.0-84-generic Cinnamon 6.4.8
    AMD Ryzen 7 7700, Radeon RX 6600, 32GB DDR5, 2TB SSD, 2TB Barracuda
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Mike Easter@MikeE@ster.invalid to alt.os.linux.mint on Mon Feb 2 15:04:39 2026
    From Newsgroup: alt.os.linux.mint

    pinnerite wrote:
    That though is time consuming and seems like using a hammer to crack a
    nut.

    The key to good storage 'maintenance' is *starting with* good organization.

    Theoretically, once upon a time a person would accumulate a big pile of disorderly files that need to be organized into directories.

    However, the 'nature' of that disorder post hoc begins to emerge as
    greater order, as the post hoc organizer comes along and designates
    an order with directories, and maybe hierarchies, based on what he
    has accumulated.

    Logically, it would seem that *future* would-be disorderly files could
    fall into a type of order which emerged from the prior post hoc ordering experience.

    Ya' know whattuh mean, Gene?
    --
    Mike Easter
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Lawrence =?iso-8859-13?q?D=FFOliveiro?=@ldo@nz.invalid to alt.os.linux.mint on Mon Feb 2 23:13:53 2026
    From Newsgroup: alt.os.linux.mint

    On Mon, 2 Feb 2026 22:32:35 +0000, pinnerite wrote:

    When I backup of course, the new directories (and contents) will be
    copied but the originals will remain on the backup drive. It would be
    time-consuming to go through the backup drive, work out what had been
    duplicated and delete the original files.

    rsync has the --link-dest option so it can do the deduping for you.
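    A minimal sketch of a snapshot run using that option (the paths here
    are illustrative, not anyone's actual layout):

    ```shell
    #!/bin/sh
    # Hypothetical source and backup locations.
    SRC="$HOME/Documents/"
    DEST="/mnt/backup"

    NEW="$DEST/$(date +%F-%H%M%S)"   # one directory per snapshot
    PREV="$DEST/latest"              # symlink to the previous snapshot

    # Unchanged files are hard-linked against the previous snapshot
    # instead of being copied again. On the very first run "latest"
    # does not exist yet; rsync warns and just does a full copy.
    rsync -a --link-dest="$PREV" "$SRC" "$NEW"
    ln -sfn "$NEW" "$PREV"
    ```

    Each snapshot directory then looks like a complete copy, but only
    changed files consume new space.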
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Paul@nospam@needed.invalid to alt.os.linux.mint on Mon Feb 2 21:58:28 2026
    From Newsgroup: alt.os.linux.mint

    On Mon, 2/2/2026 5:32 PM, pinnerite wrote:
    I rsync my main drive to backups.

    However I am a bit messy at filing documents, just keeping them in
    temporary places, like "Downloads" for example. Eventually I will
    decide to create one or more directories and move stuff into them.

    When I backup of course, the new directories (and contents) will be
    copied but the originals will remain on the backup drive. It would be
    time-consuming to go through the backup drive, work out what had been
    duplicated and delete the original files.

    In the past, because I have multiple backups, I would delete all the
    contents of a backup drive before backing up to it again.

    That though is time consuming and seems like using a hammer to crack a
    nut.



    There is an article here on hardlinking, and using it to do
    a Full-incremental-incremental-incremental kind of thing.

    # Intro and inode number demo

    https://www.admin-magazine.com/Articles/Using-rsync-for-Backups

    # More meaty

    http://www.mikerubel.org/computers/rsync_snapshots/

    By positioning yourself in a particular backup instance directory
    like backup.1 , you get a consolidated view of a point in time.
    The intent of the method seems to be to make the structure work for
    you, instead of you working for it.

    And this does not dedup. If the source directory has apple/b and
    baker/b, the method copies both of them, and does not attempt any
    further space-saving hijinks. But when you delete a directory at the
    source, the deletion "propagates" to the time-based views in
    backup.1, backup.2 and so on. If you select a date where a file
    existed, you can still access it: when doing a restore, you can pick
    a date where the "b" file existed, or a date where it did not.

    The purpose of hardlinking is to have two file pointers to the same
    set of data inodes. That's how you can have consistent
    points-in-time, for the cost of the file system overhead for the
    linkage. As a real-world example, on a rather large collection of
    files, the hardlink overhead was 500MB. There is a cost associated
    with the method, but the consistent points-in-time are the benefit.
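    The two-names-one-inode behaviour is easy to demonstrate by hand:

    ```shell
    #!/bin/sh
    # Demo: a hard link is a second directory entry for the same inode,
    # so the data itself is stored only once.
    tmp=$(mktemp -d) && cd "$tmp"

    echo "point-in-time data" > original.txt
    ln original.txt linked.txt

    # Both names show the same inode number and a link count of 2.
    stat -c '%i %h %n' original.txt linked.txt

    # Removing one name leaves the data reachable via the other.
    rm original.txt
    cat linked.txt    # prints: point-in-time data
    ```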

    If you dedup the source directory, that can be a bit dangerous if you don't understand the details of the implementation. Maybe it's just better to back
    up things as they stand, at a particular point in time.

    And as with any "rotation" strategy, you should not rely on any one
    filesystem maintaining sanity forever. Journaled file systems are a
    hell of a lot more reliable than the un-journaled ones. While it
    might be tempting to have Full-1-2-3...100, there is a risk that the
    Delete Fairy gets in there (the execution of a badly formatted
    command by the user wipes out the entire cache), so it's helpful to
    have more than one filesystem you can look to for your points in
    time.

                 Disk1: Full, 1, 2, 3    # This is not a RAID.
                /                        # The two devices are intended to have some independence:
        Source -                         # a power supply failure doesn't wipe both out;
                \                        # a fire burns one, and not the other.
                 Disk2: Full, 1, 2, 3

    Paul
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Lawrence =?iso-8859-13?q?D=FFOliveiro?=@ldo@nz.invalid to alt.os.linux.mint on Tue Feb 3 03:27:21 2026
    From Newsgroup: alt.os.linux.mint

    On Mon, 2 Feb 2026 23:13:53 -0000 (UTC), I wrote:

    rsync has the --link-dest option so it can do the deduping for you.

    Just to clarify, I mean deduping across different backup snapshots, to
    avoid creating additional copies of a file that has not changed, not
    within a single backup snapshot.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Gordon@Gordon@leaf.net.nz to alt.os.linux.mint on Tue Feb 3 07:10:47 2026
    From Newsgroup: alt.os.linux.mint

    On 2026-02-02, Dan Purgert <dan@djph.net> wrote:
    On 2026-02-02, pinnerite wrote:
    I rsync my main drive to backups.

    However I am a bit messy at filing documents, just keeping them in
    temporary places, like "Downloads" for example. Eventually I will
    decide to create one or more directories and move stuff into them.

    When I backup of course, the new directories (and contents) will be
    copied but the originals will remain on the backup drive. It would be
    time-consuming to go through the backup drive, work out what had been
    duplicated and delete the original files.

    In the past, because I have multiple backups, I would delete all the
    contents of a backup drive before backing up to it again.

    That though is time consuming and seems like using a hammer to crack a
    nut.

    So use something like fdupes or the occasional --delete[*] switch with your rsync job?

    If Alex just wants his backup (via rsync) to be identical to the
    source (directory), then the --delete option will do it.

    Do a test run on a sample directory, to see if you get what you are expecting.

    --del                an alias for --delete-during
    --delete             delete extraneous files from dest dirs
    --delete-before      receiver deletes before xfer, not during
    --delete-during      receiver deletes during the transfer
    --delete-delay       find deletions during, delete after
    --delete-after       receiver deletes after transfer, not during
    --delete-excluded    also delete excluded files from dest dirs




    [*] NB -- there are several 'options' you can use with --delete; such as 'before' or 'after' the transfer (and some additional variants thereto)

    See above.
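    Whichever variant you pick, a dry run first shows what --delete
    would remove (the directories here are placeholders):

    ```shell
    #!/bin/sh
    # -n (--dry-run) makes no changes; with -v, each file that --delete
    # would remove is listed as "deleting <name>".
    rsync -avn --delete "$HOME/Documents/" /mnt/backup/Documents/

    # If the preview looks right, rerun the same command without -n.
    ```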
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Mike Scott@usenet.16@scottsonline.org.uk.invalid to alt.os.linux.mint on Tue Feb 3 08:53:10 2026
    From Newsgroup: alt.os.linux.mint

    On 03/02/2026 03:27, Lawrence D'Oliveiro wrote:
    On Mon, 2 Feb 2026 23:13:53 -0000 (UTC), I wrote:

    rsync has the --link-dest option so it can do the deduping for you.

    Just to clarify, I mean deduping across different backup snapshots, to
    avoid creating additional copies of a file that has not changed, not
    within a single backup snapshot.

    Isn't this where timeshift and 'back in time' come in? They're really
    just front-ends to rsync plus scheduling of one sort or another.
    They seem to work well enough.

    Or are we talking archiving here, as opposed to backup? Or some mixture?
    --
    Mike Scott
    Harlow, England
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From pinnerite@pinnerite@gmail.com to alt.os.linux.mint on Tue Feb 3 12:25:32 2026
    From Newsgroup: alt.os.linux.mint

    On Mon, 2 Feb 2026 22:32:35 +0000
    pinnerite <pinnerite@gmail.com> wrote:

    I rsync my main drive to backups.

    However I am a bit messy at filing documents, just keeping them in
    temporary places, like "Downloads" for example. Eventually I will
    decide to create one or more directories and move stuff into them.

    When I backup of course, the new directories (and contents) will be
    copied but the originals will remain on the backup drive. It would be
    time-consuming to go through the backup drive, work out what had been
    duplicated and delete the original files.

    In the past, because I have multiple backups, I would delete all the
    contents of a backup drive before backing up to it again.

    That though is time consuming and seems like using a hammer to crack a
    nut.


    The advice given (thank you) appears to assume that the files will
    stay in the same locations. They won't. The source file will have
    been moved to a more suitable directory. The existing backup copy
    will still match the original location of the file (which will no
    longer be there, of course).

    That is why I now get duplicates.
    --
    Linux Mint 22.1 kernel version 6.8.0-84-generic Cinnamon 6.4.8
    AMD Ryzen 7 7700, Radeon RX 6600, 32GB DDR5, 2TB SSD, 2TB Barracuda
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Lawrence =?iso-8859-13?q?D=FFOliveiro?=@ldo@nz.invalid to alt.os.linux.mint on Tue Feb 3 21:20:53 2026
    From Newsgroup: alt.os.linux.mint

    On Tue, 3 Feb 2026 08:53:10 +0000, Mike Scott wrote:

    On 03/02/2026 03:27, Lawrence D'Oliveiro wrote:

    Just to clarify, I mean deduping across different backup
    snapshots...

    Or are we talking archiving here, as opposed to backup? Or some
    mixture?

    I was talking about backup snapshots.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Handsome Jack@jack@handsome.com to alt.os.linux.mint on Wed Feb 4 09:38:34 2026
    From Newsgroup: alt.os.linux.mint

    On Tue, 3 Feb 2026 08:53:10 +0000, Mike Scott wrote:

    On 03/02/2026 03:27, Lawrence D'Oliveiro wrote:
    On Mon, 2 Feb 2026 23:13:53 -0000 (UTC), I wrote:

    rsync has the --link-dest option so it can do the deduping for you.

    Just to clarify, I mean deduping across different backup snapshots, to
    avoid creating additional copies of a file that has not changed, not
    within a single backup snapshot.

    Isn't this where timeshift and 'back in time' come in? They're really
    just front-ends to rsync plus scheduling of one sort or another.
    They seem to work well enough.

    Or are we talking archiving here, as opposed to backup? Or some mixture?

    Timeshift doesn't work on non-ext4 destination disks, which rules it out
    for removable disks that might have to be used on a spare Windows machine.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Paul@nospam@needed.invalid to alt.os.linux.mint on Wed Feb 4 06:57:54 2026
    From Newsgroup: alt.os.linux.mint

    On Tue, 2/3/2026 7:25 AM, pinnerite wrote:
    On Mon, 2 Feb 2026 22:32:35 +0000
    pinnerite <pinnerite@gmail.com> wrote:

    I rsync my main drive to backups.

    However I am a bit messy at filing documents, just keeping them in
    temporary places, like "Downloads" for example. Eventually I will
    decide to create one or more directories and move stuff into them.

    When I backup of course, the new directories (and contents) will be
    copied but the originals will remain on the backup drive. It would be
    time-consuming to go through the backup drive, work out what had been
    duplicated and delete the original files.

    In the past, because I have multiple backups, I would delete all the
    contents of a backup drive before backing up to it again.

    That though is time consuming and seems like using a hammer to crack a
    nut.


    The advice given (thank you) appears to assume that the files will
    stay in the same locations. They won't. The source file will have
    been moved to a more suitable directory. The existing backup copy
    will still match the original location of the file (which will no
    longer be there, of course).

    That is why I now get duplicates.


    If you don't like this scheme, because it doesn't happen to
    hardlink moved duplicates...

    # Intro and inode number demo

    https://www.admin-magazine.com/Articles/Using-rsync-for-Backups

    # More meaty

    http://www.mikerubel.org/computers/rsync_snapshots/

    you could ask the AI how to modify the script to achieve that end.
    The AI seems to be able to read material if you provide a URL
    (which means the free ones have limited agentic capability). The
    site of the URL though, may not like the visit of Agentic AI,
    and may repulse the thing.

    Or you can show it the recommended scripted sequence. The first
    article, tries to pick the best of what it found on the second
    web page.

    rm -rf backup.3
    mv backup.2 backup.3
    mv backup.1 backup.2
    cp -al backup.0 backup.1 <=== I think this hard links the entire backup
    rsync -a --delete source_directory/ backup.0/

    If I was personally "nervous" about what was going on,
    then I would "track" the source directory and the destination
    directories (the point-in-time backups with their hardlinks
    for unchanged files), and manually deduplicate the latest backup
    made (backup.0) so that additional things in there which had
    been assigned brand new inodes, were replaced with hardlinks
    to existing files. Perhaps the rsync can generate a log of sufficient
    quality, to track everything that has happened from generation
    to generation.
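    That manual dedup pass over backup.0 could be sketched like this
    (not the articles' script, just a checksum-based simplification; it
    assumes GNU md5sum/stat and filenames free of tabs and newlines):

    ```shell
    #!/bin/bash
    # Sketch: within one snapshot, replace content-identical files with
    # hard links, so a file that merely moved costs no extra space.
    # Matching is by checksum only -- the duplicate's own permissions
    # and timestamps are discarded, a deliberate simplification.
    SNAP="backup.0"

    find "$SNAP" -type f -print0 | xargs -0 md5sum |
    awk '{ h = substr($0, 1, 32); f = substr($0, 35)
           if (h in first) print first[h] "\t" f
           else first[h] = f }' |
    while IFS=$'\t' read -r keep dup; do
        ln -f "$keep" "$dup"    # duplicate becomes a hard link
    done
    ```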

    Summary: An insistence on complexity, eventually leads to disaster.
    At some point, it's OK to suffer a little inefficiency, if it
    means "the thing will never break". You have demonstrated in this
    group already, your problems with getting altered NFS mounts to
    work properly, as an example of a complexity that required quite
    a bit of boot-kicking to fix.

    Why do I like Macrium backups ? It's definitely not the efficiency.
    It's the "hard to blow up" behavior. I know that the integrity of
    backups can be ruined by... bad RAM in the computer! It already happened!
    And, by using Verify capability, I caught it in time! So while it wastes
    gobs of space for the "free version", it gets the job done in such a
    way that I never have to worry about details. It's a point-in-time.
    It's "complete". Is it ideal ? Does it cure cancer ? Nope.

    If you want it completely deduplicated, you're probably pretty close
    already. The AI may figure out a way to finish the job. Your job then,
    is to assemble a testbench case, of moved duplicates and so on, in the
    same style as the author of the first link above.

    Paul
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Lawrence =?iso-8859-13?q?D=FFOliveiro?=@ldo@nz.invalid to alt.os.linux.mint on Wed Feb 4 20:42:54 2026
    From Newsgroup: alt.os.linux.mint

    On Wed, 4 Feb 2026 09:38:34 -0000 (UTC), Handsome Jack wrote:

    Timeshift doesn't work on non-ext4 destination disks ...

    Funnily enough, rsync works on any filesystem that Linux will support.
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Gordon@Gordon@leaf.net.nz to alt.os.linux.mint on Wed Feb 4 22:42:54 2026
    From Newsgroup: alt.os.linux.mint

    On 2026-02-04, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Wed, 4 Feb 2026 09:38:34 -0000 (UTC), Handsome Jack wrote:

    Timeshift doesn't work on non-ext4 destination disks ...

    Funnily enough, rsync works on any filesystem that Linux will support.

    Timeshift also works with btrfs, if there is a subvolume @ in the picture.
    --- Synchronet 3.21b-Linux NewsLink 1.2