From Newsgroup: alt.comp.os.windows-11
On Mon, 9/29/2025 3:20 AM, Jeff Barnett wrote:
On 9/21/2025 2:23 AM, J. P. Gilliver wrote:
On 2025/9/21 4:00:59, Paul wrote:
[]
Purely for your amusement, you could try a bad block scan of the drive.
This is pretty old software itself, having been released in 2008. The last
column over will check whether the blocks have bad CRC. That version of
HDTune, does not do writes, so it should not make the SSD health any worse.

    https://www.hdtune.com/files/hdtune_255.exe
<MAJOR SNIPPING>
I thought that reading a "unit" on an SSD caused its contents to be
rewritten in another unit, with the original going into a pool
(over-provisioning?) that is used for wear leveling. So whether that
utility writes or not, the SSD will have its pot stirred.
Is my memory wrong about this?
When a drive is TRIMmed, the OS sends a list of LBAs known not to be
in use, and those can be put in the Free Pool and prepared for reuse.
That preparation is an Erase, which costs one cycle of wear life.
Any voltage change in a NAND floating gate can cause small changes
in crystal structure. Annealing a NAND cell could fix these,
extending life significantly, but we don't know how to do that
on a massive scale.
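To make the bookkeeping concrete, here is a toy Python sketch of that
flow; the ToyFTL class, its one-LBA-per-block mapping, and the crude
wear leveling are invented for illustration and have nothing to do
with how real FTL firmware is organized.

class ToyFTL:
    def __init__(self, n_blocks):
        self.erase_count = [0] * n_blocks   # wear: one tick per Erase
        self.free_pool = set(range(n_blocks))
        self.in_use = {}                    # LBA -> physical block

    def write(self, lba):
        # Place data in the least-worn free block (crude wear leveling).
        # A real FTL maps pages within blocks and garbage-collects,
        # none of which is modeled here.
        block = min(self.free_pool, key=lambda b: self.erase_count[b])
        self.free_pool.remove(block)
        self.in_use[lba] = block

    def trim(self, lbas):
        # The OS says "these LBAs hold nothing useful". The blocks go
        # back to the Free Pool, and the Erase that prepares each one
        # for reuse is the step that costs a tick of write life.
        for lba in lbas:
            block = self.in_use.pop(lba, None)
            if block is not None:
                self.erase_count[block] += 1
                self.free_pool.add(block)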
When you write a file on a TLC or QLC device, the write method
can use an SLC cache (a floating-gate cell made to hold only one
bit, by changing the voltage threshold scheme). That costs you
one write for the SLC cache, and one write for the final placement
into TLC storage. The SLC cache is not a dedicated static thing;
it can be moved around by applying different voltage sensing
to any LBAs declared as being part of the cache area. It's not
like the critical-data-storage area, which might be an SLC area
at all times.
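As a back-of-the-envelope in Python, assuming the simple case where
every host write lands in the SLC cache first and is later folded into
TLC (the figures are made up):

host_writes_gb = 100                 # data the host asked to write
slc_cache_pass = host_writes_gb      # first program, into SLC-mode cells
tlc_fold_pass  = host_writes_gb      # second program, folding into TLC cells

nand_writes_gb = slc_cache_pass + tlc_fold_pass
print(nand_writes_gb / host_writes_gb)   # 2.0 NAND programs per host write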
But reading is "free". It isn't DRAM and does not need refresh
in that sense. DRAM charge dribbles away in milliseconds, and the
DRAM cell, unlike a floating gate, is not protected by quantum
mechanics.
On something like QLC, just about every sector read needs correction.
A 512-byte sector holds 4096 bits, and the voltage thresholds are pretty
tight. If a read has one bit in error, this is easily corrected by Reed-Solomon
without needing the sector to be rewritten. The error correction code
is quite strong, and there is no reason to panic.
Say you read a sector and 20 bits out of 4096 are in error. Maybe the code
is still strong enough to correct that. The syndrome is about 50 bytes
on a 512-byte sector, to give some idea of the syndrome size.
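Putting rough numbers on those figures (a textbook Reed-Solomon code
with 2t parity symbols corrects up to t bad symbols; the exact code a
given controller uses is not public):

data_bytes   = 512
parity_bytes = 50                   # the "syndrome" overhead quoted above
t = parity_bytes // 2               # up to 25 correctable byte errors

worst_case_bad_bytes = 20           # 20 bad bits land in at most 20 bytes
print(worst_case_bad_bytes <= t)    # True, still inside the budget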
But at some point, if you were the firmware tracking statistics on the
whole page, you might decide to re-write the page by pulling a new one
from the Free Pool and copying the data over. These activities
are not normally described in any detail, so we are not privy to how
the devices are tuned.
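If you wanted to guess at the shape of such a policy, it might look
something like this Python sketch; the threshold, the Page class, and
the helper are all invented, since vendors do not publish these tunings.

from dataclasses import dataclass

REWRITE_THRESHOLD_BITS = 12          # made-up tuning number

@dataclass
class Page:
    data: bytes = b""
    stale: bool = False

def maybe_refresh(page, corrected_bit_errors, free_pool):
    # If ECC had to fix too many bits on this page, migrate the
    # corrected data to a fresh page pulled from the Free Pool.
    if corrected_bit_errors >= REWRITE_THRESHOLD_BITS and free_pool:
        new_page = free_pool.pop()
        new_page.data = page.data    # copy the corrected data over
        page.stale = True            # old page reclaimed later by GC/erase
        return new_page
    return page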
But reading, as an activity, does not need refresh in the same way
that DRAM needs constant refresh and the bandwidth that it costs.
In DRAM, there are refresh counters that step up through the address
space and, in effect, give "timed" refresh activity. The DRAM might
be refreshed 64 times within the discharge period, so the refresh is
done in such a way that there is no possibility of a discharge
failure. In years past, people would fool with the DRAM refresh
settings, to "push their luck" :-)
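For comparison, the datasheet-side arithmetic for a typical DDR part
looks like this (common textbook values; a particular DIMM may differ):

retention_window_ms = 64        # spec: every row refreshed within 64 ms
refresh_commands    = 8192      # typical number of refresh slots per window

interval_us = retention_window_ms * 1000 / refresh_commands
print(interval_us)              # ~7.8 us between refresh commands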
Compare that to floating-gate NAND, where electrons leaving the
insulated gate is classically disallowed; the only escape is quantum
tunnelling, and the probability of that is quite low.
A NAND cell should be able to remain charged to a particular
voltage for ten years. In an SLC cell, the voltage threshold
is at 50%, and there are huge margins around the threshold.
The SLC NAND, as a beast, is hardly likely to be "mushy" in
everyday usage, and will be robust right up to the ten year
estimate. Whereas with QLC, where there are 15 thresholds separating
the 16 voltage regions, things are tight-as-a-drum, and the voltage
in a cell could shift dangerously close to the next threshold region.
We can correct a lot of these shifts with our powerful error
corrector, but it's getting harder to guarantee, by trickery,
that we have robust storage (with that correction scheme).
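You can see how the margins shrink by carving one read-voltage window
into 2, 8, 16, or 32 regions; the window size here is an arbitrary
illustrative number, not a real part specification.

window_volts = 4.0    # made-up total threshold-voltage window

for name, bits in [("SLC", 1), ("TLC", 3), ("QLC", 4), ("PLC", 5)]:
    levels = 2 ** bits                   # voltage regions per cell
    thresholds = levels - 1              # read points separating them
    print(f"{name}: {levels:2d} levels, {thresholds:2d} thresholds, "
          f"{window_volts / levels * 1000:5.0f} mV per level")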
There was a Samsung drive where it was noticed that the read
performance after three months was dropping from 540MB/sec to
300MB/sec, and this was thought to be error corrector activity on
everything being read. It is possible that, at that point, re-write
tuning was added or adjusted, to stop users from complaining. But we
really don't know what was going on there, in terms of details. This
is all just guesses from enthusiast web site news releases. Some SSD
controller chips have four cores now: one would be the command
interpreter, while three cores could be running error correction.
That's to help hide the fact that our precious purchase is mushy, by
doing the correction really fast...
In decades past, there were reports that reading did wear on the
cells. However, there is no anecdotal evidence of this being anywhere
near significant in end-user usage patterns. You can read the shit
out of them, and if there is an effect (like it making the flash
cells mushy), users do not notice a high-read application causing
poorer eventual read performance than on SSDs that are loafing along
through their workday. If there is a mechanism there, no one has
joined the dots and claimed to have seen it in practice.
I have never seen a comment from a person knowledgeable on the
subject regarding why there were early reports of an effect and no
such claims are made today. The disconnect remains unexplained.
I get the impression, though, that in the mushy cases, a device can
go mushy just from sitting on the shelf for three months; that
particular device might have shown mush. Whether mush can be cured by
making the gate-to-channel insulation thicker, I don't know if that
is a factor or not. Considering the Z-axis and the 3D nature of NAND,
there is some amount of design pressure to make them thin-as-can-be.
There can be 16 die inside a NAND chip, and then a couple hundred
cells in a column within each die. And yet the chips never seem to
grow in stature.
But on the other hand, advancement of NAND is kinda slow now.
They are working on five-bit cells (which means even tighter
thresholds), but those damn things have not poked their
little heads into super-cheap product. They're not shipping
yet. Sooner or later the write life is going to have to drop
below 600 or so writes per cell.
Micron makes an industrial flash with 6x the write life, but
like everything in high tech, the price has to be 6x higher.
Maybe you might find those in certain Enterprise products.
The 600 number is not some "manifest constant"; it is just
a common number for consumer devices. There have been devices
with considerably better numbers. And if they still made actual
true SLC devices, those could manage 100,000 writes per cell
and would outlast your lifetime. Making 600-write ones ensures,
like bog roll, a kind of rent-seeking behavior.
But 600 still lasts a long time when all you do is web surf
(and you set your web cache to be in RAM :-) ).
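To put a rough lifetime on that, with made-up but plausible figures
for a web-surfing machine (drive size, daily writes, and write
amplification are all assumptions):

drive_tb            = 1.0     # capacity
pe_cycles           = 600     # consumer-grade writes per cell
write_amplification = 2.0     # SLC cache pass plus TLC fold, roughly
host_gb_per_day     = 20.0    # light desktop / web use

host_tb_total = drive_tb * pe_cycles / write_amplification   # ~300 TB of host writes
days = host_tb_total * 1000 / host_gb_per_day
print(days / 365)             # roughly 41 years at that rate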
Paul
--- Synchronet 3.21a-Linux NewsLink 1.2