Forum: Too Lazy BBS

Re: Hard Drive technology

From Paul@nospam@needed.invalid to aus.computers,alt.os.linux.mint on Sat Dec 27 07:29:01 2025

From Newsgroup: aus.computers

On Sat, 12/27/2025 6:54 AM, Axel wrote:

Thanks for this. I've been checking on drives myself too. The problem is that not
all makes/models are stocked by computer shops, and I don't want to spend a lot.
I have been re-purposing what I have here to free up a 1TB drive, but if I need
to buy a new drive, do you think SMR would be OK for desktop use and/or data
storage, or should I only buy CMR?

I consider the SMR to be fine for sequential backups.

1) I would not intentionally buy SMR.

The first generation, as reviewed on Anandtech, dropped to as low as
25MB/sec or so. The SMR drive caching policy is a lot better now than
it was back then, but still, writing seven tracks and having to do
read-modify-write to change a sector within the seven track cluster,
that is just a "crazy thing to do". For me, buying an SMR is like
agreeing to buy a car with square wheels on it.

2) If I had to buy one SMR, I would buy two of them and alternate
the storage on them. Or otherwise use a redundant storage
pattern (put valuable things on two of the drives).

3) With a CMR drive, I am less fearful of the unknown. But HDD are still HDD.
They're not SSD or NVMe, so you know their limits. If you write a program
to shake the head assembly back and forth, 24 hours a day, the arm lasts
about a year doing that (for any HDD). Most desktop usage patterns are
nowhere near that bad. On a desktop there are long periods of idleness.
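
As a rough illustration of the read-modify-write cost in point 1, here is a
minimal sketch. The seven-track band is the figure from the post; the
sectors-per-track count and the 4 KiB sector size are made-up illustrative
values, not the geometry of any real drive. It models only the worst case
where a small change forces the whole band to be rewritten, and ignores the
caching that current SMR drives use to hide this.

```python
# Rough worst-case model of the read-modify-write penalty for one shingled band.
# TRACKS_PER_BAND follows the "seven tracks" figure in the post;
# SECTORS_PER_TRACK and SECTOR_BYTES are illustrative assumptions.
TRACKS_PER_BAND = 7
SECTORS_PER_TRACK = 500      # assumed, not a real drive geometry
SECTOR_BYTES = 4096          # assumed 4 KiB physical sectors


def band_rewrite_cost(sectors_changed: int) -> dict:
    """Bytes moved when a change inside a band forces a full band rewrite."""
    band_bytes = TRACKS_PER_BAND * SECTORS_PER_TRACK * SECTOR_BYTES
    useful_bytes = sectors_changed * SECTOR_BYTES
    return {
        "bytes_read": band_bytes,       # the whole band is read first
        "bytes_written": band_bytes,    # then the whole band is written back
        "write_amplification": band_bytes / useful_bytes,
    }


cost = band_rewrite_cost(sectors_changed=1)
print(cost["write_amplification"])  # 3500.0 with the assumed geometry
```

Large sequential writes fill whole bands at once, so the ratio approaches 1,
which is consistent with SMR being tolerable for sequential backups.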

And when I refer to used drives from the Chia era, the hard drives
were abused to support this project. At one time, this caused a shortage
of hard drives, because manufacturing could not keep up with demand.
Later, the Chia people were dumping the worn drives and trying to fool
people into thinking they were new drives. The Chia interval caused my
computer store to lose confidence in the HDD market (they were losing
money on "buying high, selling low"), and one day here, they even had an
inventory level of *zero* HDD. Today, they have some again.

( https://en.wikipedia.org/wiki/Chia_Network # Abuse HDD for money )

Paul

--- Synchronet 3.21a-Linux NewsLink 1.2

From keithr0@me@bugger.off.com.au to aus.computers,alt.os.linux.mint on Sun Dec 28 09:50:50 2025

From Newsgroup: aus.computers

On 25/12/2025 4:57 am, Lawrence D’Oliveiro wrote:

On Wed, 24 Dec 2025 21:18:19 +1000, keithr0 wrote:

On 24/12/2025 3:33 pm, Lawrence D’Oliveiro wrote:

I got a new 12TB drive for my backup machine just a couple months
ago.

That's a lot of eggs in one basket ...

That’s why I have a backup machine.

... personally I'd prefer a RAID 5 or 6 setup with smaller drives.

RAID is about high availability, not about backup.

RAID is about failure resilience, if you have a single drive a failure
loses everything. Make a RAID group, and the loss of a single drive
results in no data loss. Done properly, it also improves performance.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to aus.computers,alt.os.linux.mint on Sun Dec 28 00:07:50 2025

From Newsgroup: aus.computers

On Sun, 28 Dec 2025 09:50:50 +1000, keithr0 wrote:

On 25/12/2025 4:57 am, Lawrence D’Oliveiro wrote:

RAID is about high availability, not about backup.

RAID is about failure resilience, if you have a single drive a
failure loses everything.

With JBOD, you only lose what was on that drive.

With RAID-0, you do indeed lose everything.

The point with (nonzero) RAID is to keep going while you restore from
backup. It’s not about replacing the need for backup.
--- Synchronet 3.21a-Linux NewsLink 1.2

From keithr0@me@bugger.off.com.au to aus.computers,alt.os.linux.mint on Sun Dec 28 11:12:22 2025

From Newsgroup: aus.computers

On 28/12/2025 10:07 am, Lawrence D’Oliveiro wrote:

On Sun, 28 Dec 2025 09:50:50 +1000, keithr0 wrote:

On 25/12/2025 4:57 am, Lawrence D’Oliveiro wrote:

RAID is about high availability, not about backup.

RAID is about failure resilience, if you have a single drive a
failure loses everything.

With JBOD, you only lose what was on that drive.

With RAID-0, you do indeed lose everything.

The point with (nonzero) RAID is to keep going while you restore from
backup. It’s not about replacing the need for backup.

Not exactly, with any useful sort of RAID implementation, you can hot
replace the bad drive, without the need to restore from backup. All the
data remains available throughout.
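
The reason a parity array can rebuild a replaced drive without touching the
backup can be sketched with XOR parity. This is a toy model of a single
RAID 5-style stripe with hand-picked 4-byte blocks, not how any particular
controller or driver lays out data (real arrays rotate parity across drives
and work on much larger chunks).

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives" holding one block each (toy 4-byte blocks).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)  # stored on a fourth drive

# Drive 1 fails: rebuild its block from the surviving data blocks plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

The same arithmetic faithfully reproduces whatever was last written, so a
deleted or corrupted file is rebuilt as deleted or corrupted.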

I spent 20 years working on large storage systems, beginning with boxes
of 128 5 1/4" 9 GB SCSI drives, going on with larger and larger drives,
dropping to 3 1/2" SCSI, then Ultra SCSI, fibre channel and finally SAS
(Serial Attached SCSI) drives. The largest box that I encountered had 1700
400 GB drives. Originally all drives were RAID 1, then RAID 5, and later
RAID 6. The customers were mostly large mainframe sites, both IBM
and *nix, who had to have their data available 24/365, no ifs, no buts.
After all, you aren't going to be well pleased if you can't get your
money from an ATM because the bank is having to restore its database
from a backup due to a drive failure.

Backups were made, usually on large ATA drives, but more for recovery
from software or system errors than from hardware failures.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to aus.computers,alt.os.linux.mint on Sun Dec 28 02:43:58 2025

From Newsgroup: aus.computers

On Sun, 28 Dec 2025 11:12:22 +1000, keithr0 wrote:

... with any useful sort of RAID implementation, you can hot replace
the bad drive, without the need to restore from backup.

That’s where the “high availability” comes in.

All the data remains available throughout.

Until you discover a software bug (or an operator screwup) has deleted
an important database. Which you then need to restore from ... where?

I spent 20 years working on large storage systems, beginning with
boxes of 128 5 1/4" 9gig SCSI drives, going on with larger and
larger drives dropping to 3 1/2" SCSI then Ultra SCSI, fibre channel
and finally SAS (Serial Attached SCSI) drives.

Presumably, judging from your comments, you were more on the hardware
side, not the software side.
--- Synchronet 3.21a-Linux NewsLink 1.2

From keithr0@me@bugger.off.com.au to aus.computers,alt.os.linux.mint on Sun Dec 28 13:27:40 2025

From Newsgroup: aus.computers

On 28/12/2025 12:43 pm, Lawrence D’Oliveiro wrote:

On Sun, 28 Dec 2025 11:12:22 +1000, keithr0 wrote:

... with any useful sort of RAID implementation, you can hot replace
the bad drive, without the need to restore from backup.

That’s where the “high availability” comes in.

All the data remains available throughout.

Until you discover a software bug (or an operator screwup) has deleted
an important database. Which you then need to restore from ... where?

Of course there were still backups, not always what you think though.
The machines that I worked on could take checkpoints online, some users
checkpointed on an hourly basis.

Resilience was an important factor with our customers. Another storage
server could be connected via optical fibre, the slave machine being a
remote mirror of the master. After the Twin Towers incident, some
customers took it even further. Maybe the master was in New York, the
first slave 50 km away in New Jersey, and a second slave connected to the
first across the continent in LA or San Francisco. The first slave would
be one transaction max behind, the second several, as a last ditch copy.
The slaves could also be used as a test load for system changes and
resynced afterward.

I spent 20 years working on large storage systems, beginning with
boxes of 128 5 1/4" 9gig SCSI drives, going on with larger and
larger drives dropping to 3 1/2" SCSI then Ultra SCSI, fibre channel
and finally SAS (Serial Attached SCSI) drives.

Presumably, judging from your comments, you were more on the hardware
side, not the software side.

I worked on the hardware side for 30 years then wrote software
professionally for another 25. I still write stuff for micro controllers
like the ESP32 and the Raspi Pico for fun.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Paul@nospam@needed.invalid to aus.computers,alt.os.linux.mint on Sun Dec 28 04:41:43 2025

From Newsgroup: aus.computers

On Sat, 12/27/2025 6:50 PM, keithr0 wrote:

On 25/12/2025 4:57 am, Lawrence D’Oliveiro wrote:

On Wed, 24 Dec 2025 21:18:19 +1000, keithr0 wrote:

On 24/12/2025 3:33 pm, Lawrence D’Oliveiro wrote:

I got a new 12TB drive for my backup machine just a couple months
ago.

That's a lot of eggs in one basket ...

That’s why I have a backup machine.

... personally I'd prefer a RAID 5 or 6 setup with smaller drives.

RAID is about high availability, not about backup.

RAID is about failure resilience, if you have a single drive a failure
loses everything. Make a RAID group, and the loss of a single drive
results in no data loss. Done properly, it also improves performance.

RAID is about delaying maintenance until after the work day is done.

It's nothing about long term resilience.

If you have one fault and you do not meet the MTTR, the second
fault might knock you out, or depending on RAID class, the third
fault might knock you out.
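
The MTTR point can be put into rough numbers. This is only a sketch: it
assumes drives fail independently at a constant rate, and the 2% annualised
failure rate and the 8-drive array are hypothetical figures chosen for
illustration. Common mode faults break the independence assumption, so
real-world odds can be much worse than this model suggests.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days


def second_failure_probability(drives: int, afr: float, repair_hours: float) -> float:
    """Chance that at least one surviving drive fails before repair completes.

    Assumes independent failures at a constant rate derived from `afr`
    (annualised failure rate). Both are simplifications.
    """
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    survivors = drives - 1
    return 1.0 - math.exp(-survivors * rate_per_hour * repair_hours)


# Compare a same-day repair, a next-day repair, and a week of neglect.
for hours in (8, 24, 24 * 7):
    p = second_failure_probability(drives=8, afr=0.02, repair_hours=hours)
    print(f"repair window {hours:>3} h -> {p:.4%}")
```

The probability grows with the repair window, which is the sense in which a
missed MTTR leaves the array exposed to the next fault.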

*******

You really really have not thought it through, if you believe that
resilience thing.

At 2 PM one afternoon at work, the main CAD server (it serves copies of
CAD software) went down. I can't remember if the controller was a PERC
or what it was.

The *controller* wrote to all drives at once. It was
not a commanded write. It was a firmware issue of some sort.
And not a capacity-rollover type flaw.

It corrupted some area low in the disk storage.

Causing *all volumes to be wiped out instantly*.

The restore time for the server ran past 5 PM, and the
five hundred engineers depending on the server for their
software had long since gone home when the server came back up.
Somebody in management did the calc to see what the
"lost time" had cost us. That was the first step in
figuring out what the response should be to the event.

And this failure happened *twice*. The first time was on a different
server which was not as critical. Because the first incident was not
as critical, it wasn't taken as seriously, and nobody at that
time had figured out what had happened to the server. It was only
when a major infrastructure incident was raised that the event
was analyzed to its source.

This is why RAID is worth *nothing* to you.

There is a class of "common mode faults" you should consider.

An example of a common mode fault, is when the PSU +12V shoots
up to +15V and burns all disk drives. When I mentioned the possibility
of this happening, a poster wrote in and said this had happened
to him. Loss of all storage on a PSU fault.

This is *why we do backups of our RAID array* :-/

A RAID array is NOT a backup.

A mirror, is just so much bullshit for the people doing it.
Because they haven't thought about the common mode faults.

I can give an example from a USENET posting about a RAID1 two disk mirror.

A user is running RAID 1. One of the drives drops. The user thinks
this is fine and dandy. Plug in replacement drive, and oh... what's this?
The files on the second drive stopped updating three months ago.
The array had some kind of fault, where the controller was
not writing to the second (surviving) drive. Which means, sure,
you can do a RAID rebuild of your mirror, but three months'
worth of changes are missing. The controller in this case was
a SIL3112, so it wasn't an actual quality controller. It's just
the deception "you have RAID", that counts.

DO YOUR BACKUPS. And run a Verify on your backup, to ensure it is a good one.
I have stories about that, too :-)
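
One way to do such a verify by hand is to compare checksums of the source
files against the backup copy. This is a minimal sketch, assuming the backup
is a plain directory tree mirroring the source. Backup tools with their own
archive format have their own verify function, which is probably what is
meant above. The function names and the paths in the usage note are
hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return a list of files that are missing from, or differ in, the backup."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

Usage would look like `verify_backup(Path("/home/me"), Path("/mnt/backup/me"))`,
with an empty list meaning every source file has an identical copy. It only
checks in one direction, so stale extra files in the backup are not reported.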

Paul
--- Synchronet 3.21a-Linux NewsLink 1.2

From Lawrence D’Oliveiro@ldo@nz.invalid to aus.computers,alt.os.linux.mint on Sun Dec 28 20:44:35 2025

From Newsgroup: aus.computers

On Sun, 28 Dec 2025 04:41:43 -0500, Paul wrote:

The *controller* wrote to all drives at once. It was not a commanded
write. It was a firmware issue of some sort. And not a
capacity-rollover type flaw.

It corrupted some area low in the disk storage.

Causing *all volumes to be wiped out instantly*.

Let me just add that to my ever-lengthening list of reasons why I
dislike hardware RAID controllers.

If I’m going to use RAID, I recommend using the built-in Linux
software RAID. I have seen it deal with failures more than once, and
my respect for it has only gone up.
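
Part of what makes Linux software RAID easy to keep an eye on is that array
state is exposed as plain text in /proc/mdstat, where a member map such as
[UU] means all drives are up and an underscore marks a missing one. The
sketch below parses a hand-written sample in that style. The sample text and
the function name are illustrative assumptions, not output captured from a
real machine.

```python
import re

# Hand-written sample in the style of /proc/mdstat for a degraded mirror.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Names of md arrays whose member map (e.g. [U_]) shows a missing drive."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines describe
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if status and current and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # ['md0']
```

On a real system the input would come from reading /proc/mdstat, for example
from a cron job that sends mail when the list is not empty.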

Not to mention, it’s so much easier to reconfigure without having to
mess around with obscure vendor-proprietary utilities.

This is *why we do backups of our RAID array* :-/

A RAID array is NOT a backup.

Agree, agree, thrice agree.
--- Synchronet 3.21a-Linux NewsLink 1.2
