Grant Edwards wrote:
On 2024-09-03, Dale <rdalek1967@gmail.com> wrote:
I was trying to re-emerge some packages. The ones I was working on
failed with "internal compiler error: Segmentation fault" or similar
being the common reason for failing.
In my experience, that usually means failing RAM. I'd try running
memtest86 for a day or two.
--
Grant
I've seen that before too. I'm hoping not. I may shut down my rig,
remove and reinstall the memory and then test it for a bit. May be a
bad connection. It has worked well for the past couple of months tho.
Still, it could either be a bad connection or memory that is just going bad.
Dang, those memory sticks ain't cheap. o_~
Thanks. See if anyone else has any other ideas.
Dale
:-) :-)
I wonder how much fun getting this memory replaced is going to be. o_O
I wasn't planning to go to 128GBs yet but guess I am now.
On 2024-09-04, Dale <rdalek1967@gmail.com> wrote:
I ordered another set of memory sticks. I figure I will have to send
them both back which means no memory at all. I wasn't planning to go to 128GBs yet but guess I am now. [...]
Good luck.
[…]
I plugged them in alongside the recently purchased pair. They wouldn't
work. Either pair of DIMMs worked fine by itself, but the only way
I could get both pairs to work together was to drop the clock speed
down to about a third of the speed they were supposed to support.
On 2024-09-04, Dale <rdalek1967@gmail.com> wrote:
At one point, I looked for a set of four sticks of the memory. I
couldn't find any. They only come in sets of two. I read somewhere
that the mobo expects each pair to be matched.
Yep, that's definitely how it was supposed to work. I fully expected
my two (identically spec'ed) sets of two to work. All the documentation I
could find said it should. It just didn't. :/
--
Grant
When I built this rig, I first booted the Gentoo Live boot image and
just played around a bit, mostly to let the CPU thermal paste settle in a
bit. Then I ran memtest through a whole pass until it said it passed.
Only then did I start working on the install. The rig ran without
issue until I noticed the gkrellm temps were stuck; they weren't updating as the temps changed. So I closed gkrellm, but then it wouldn't open again.
I ran it in a console and saw an error about a missing module or
something. Then I tried to figure out that problem, which led to the seg
fault errors. Well, that led to this thread and the discovery of a bad
memory stick. Since I check gkrellm often, it was most likely caught in less than a
day, maybe only a couple of hours. The only reason it might
have gone longer is that the CPU was mostly idle; I watch more often when the
CPU is busy, during updates etc.
I've run fsck before mounting on every file system so far. I ran it on
the OS file systems while booted from the Live image. The others I just
did before mounting. I realize this doesn't mean the files themselves
are OK but at least the file system under them is OK.
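For reference, the kind of check meant here is an offline run such as the following (the device name is only a placeholder, and the file system has to be unmounted first):
fsck -f /dev/sda3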
I'm not sure how
to know if any damage was done between when the memory stick failed and
when I started the repair process. I could find the ones I copied from
place to place and check them but other than watching every single
video, I'm not sure how to know if one is bad or not. So far,
thumbnails work. o_O
Michael wrote:
Some MoBos are more tolerant than others.
Regarding Dale's question, which has already been answered - yes, anything the
bad memory has touched is suspect of corruption. Without ECC RAM a dodgy module can cause a lot of damage before it is discovered.
On Wed, Sep 04, 2024 at 11:38:01PM +0100, Michael wrote:
Some MoBos are more tolerant than others.
Regarding Dale's question, which has already been answered - yes, anything the bad memory has touched is suspect of corruption. Without ECC RAM a dodgy module can cause a lot of damage before it is discovered.
Actually I was wondering: DDR5 has built-in ECC. But that’s not the same as the server-grade stuff, because it all happens inside the module with no communication to the CPU or the OS. So what is the point of it if it still causes errors like in Dale’s case?
Maybe that it only catches 1-bit errors, but Dale has more broken bits?
Or it could be that Dale's kit is DDR4?
Michael wrote:
On Thursday 5 September 2024 09:36:36 BST Dale wrote:
I've run fsck before mounting on every file system so far. I ran it on
the OS file systems while booted from the Live image. The others I just
did before mounting. I realize this doesn't mean the files themselves
are OK but at least the file system under them is OK.
This could put your mind mostly at rest: at least the OS structure is OK, and the bad RAM was not in play for too long.
That does help.
I'm not sure how
to know if any damage was done between when the memory stick failed and
when I started the repair process. I could find the ones I copied from
place to place and check them but other than watching every single
video, I'm not sure how to know if one is bad or not. So far,
thumbnails work. o_O
If you have a copy of these files on another machine, you can run rsync with --checksum. This will only (re)copy the file over if the checksum
is different.
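For example (the host and paths here are only hypothetical), something along these lines would re-copy only the files whose checksums differ, with the known-good backup as the source:
rsync -av --checksum otherhost:/path/to/backup/Videos/ /home/dale/Videos/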
I made my backups last weekend. I'm sure it was working fine then.
After all, it would have failed to compile packages if it was bad. I'm thinking about checking against that copy like you mentioned but I have
other files I've added since then. I figure if I remove the delete
option, that will solve that. It can't compare but it can leave them be.
Use rsync with:
--checksum
and
--dry-run
You can also run find to identify which files were changed during the period
you were running with the dodgy RAM. Thankfully you didn't run for too long
before you spotted it.
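A rough sketch of that find idea (the dates and path are only placeholders bracketing the period the bad RAM was in use):
find /home/dale/Videos -type f -newermt "2024-09-01" ! -newermt "2024-09-05" -print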
I have just shy of 45,000 files in 780 directories or so. Almost 6,000
in another. Some files are small, some are several GBs or so. Thing
is, backups go from a single parent directory if you will. Plus, I'd
want to compare them all anyway. Just to be sure.
On Thu, Sep 05, 2024 at 06:30:54AM -0500, Dale wrote:
Use rsync with:
--checksum
and
--dry-run
I suggest calculating a checksum file from your active files. Then you don’t
have to read the files over and over for each backup iteration you compare
it against.
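As a rough sketch of that (paths are only examples): create the digest once from the live copy, then verify any backup against it without re-reading the live files each time:
cd /home/dale/Videos && find . -type f -print0 | xargs -0 md5sum > /tmp/videos.md5
cd /mnt/backup/Videos && md5sum --quiet -c /tmp/videos.md5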
You can also run find to identify which files were changed during the period you were running with the dodgy RAM. Thankfully you didn't run for too long before you spotted it.
This. No need to check everything you ever stored. Just the most recent stuff, or at maximum, since you got the new PC.
I have just shy of 45,000 files in 780 directories or so. Almost 6,000
in another. Some files are small, some are several GBs or so. Thing
is, backups go from a single parent directory if you will. Plus, I'd
want to compare them all anyway. Just to be sure.
I acquired the habit of writing checksum files in all my media directories such as music albums, TV series and such, whenever I create one such directory. That way, even years later I can still check whether the files are intact. I have actually experienced broken music files from time to time (mostly on the MicroSD card in my tablet). So with checksum files, I can verify
which file is bad and which (on another machine) is still good.
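A minimal way to approximate that habit in shell (the digest file name Checksums.md5 is just an example) is to run this inside each new album or series directory:
find . -maxdepth 1 -type f ! -name Checksums.md5 -print0 | xargs -0 -r md5sum > Checksums.md5
Months or years later, running md5sum -c Checksums.md5 in the same directory reports any file that no longer matches.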
Michael wrote:
On Thursday 5 September 2024 19:55:56 BST Frank Steinmetzger wrote:
On Thu, Sep 05, 2024 at 06:30:54AM -0500, Dale wrote:
Use rsync with:
--checksum
and
--dry-run
I suggest calculating a checksum file from your active files. Then you
don't have to read the files over and over for each backup iteration you
compare it against.
You can also run find to identify which files were changed during the
period you were running with the dodgy RAM. Thankfully you didn't run
for too long before you spotted it.
This. No need to check everything you ever stored. Just the most recent
stuff, or at maximum, since you got the new PC.
I have just shy of 45,000 files in 780 directories or so. Almost 6,000
in another. Some files are small, some are several GBs or so. Thing
is, backups go from a single parent directory if you will. Plus, I'd
want to compare them all anyway. Just to be sure.
I acquired the habit of writing checksum files in all my media
directories such as music albums, TV series and such, whenever I create
one such directory. That way, even years later I can still check whether the files
are intact. I have actually experienced broken music files from time to time
(mostly on the MicroSD card in my tablet). So with checksum files, I can
verify which file is bad and which (on another machine) is still good.
There is also dm-verity for a more involved solution. I think for Dale something like this should work:
find path-to-directory/ -type f -print0 | xargs -0 md5sum > digest.log
(the -print0/-0 pair keeps file names with spaces intact)
then to compare with a backup of the same directory you could run:
md5sum -c digest.log | grep FAILED
Someone more knowledgeable should be able to knock out some clever Python script to do the same at speed.
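Short of a Python script, one option (assuming GNU parallel is installed) is to fan the hashing out across cores; parallel groups each job's output, so the digest lines don't get interleaved:
find path-to-directory/ -type f -print0 | parallel -0 md5sum > digest.log
In practice the disk is usually the bottleneck for files this size, so the gain varies.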
I'll be honest here, on two points. I'd really like to be able to do
this but I have no idea where or how to even start. My setup for
series-type videos: in the parent directory, where I'd like a tool to
start, there are about 600 directories. On a few occasions, there is another
directory inside that one. That directory under the parent is the name
of the series. Sometimes I have a sub directory that holds temp files:
new files I have yet to rename, or am considering replacing in the main series
directory, etc. I wouldn't mind having a file with a checksum for each
video in the top directory, and even one in the sub directory. As an
example:
TV_Series/
├── 77 Sunset Strip (1958)
│ └── torrent
├── Adam-12 (1968)
├── Airwolf (1984)
That's a part of the output of tree. The directory 'torrent' under 77
Sunset is usually temporary, but sometimes a directory is there for
videos about the making of a video, its history or something. What
I'd like is a program that would generate checksums for each file under,
say, 77 Sunset, and could either skip or include the directory under it.
It might be best if I could switch that on or off. Obviously, I may not want
to do this for my whole system; I'd like to be able to target
directories. I have another large directory, let's say not a series but one that
sometimes has remakes, that I'd also like to do. It is kinda set up
like the above: a parent directory with a directory underneath and, on
occasion, one more under that.
One thing I worry about is not just memory problems or drive failure but
also just some random error or even bit rot. Some of these files are
rarely changed or even touched. I'd like a way to detect problems, and
there may even be a software tool that does this with some setup. It
reminds me of KBackup, where you can select what to back up or leave out
at the directory or even individual file level.
While this could likely be done with a script of some kind, my scripting
skills are minimal at best, and I suspect there is software out there
somewhere that can do this. I have no idea what or where it could be
tho. Given my lack of scripting skills, I'd be afraid I'd do something
bad and it would delete files or something. O_O LOL
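As a rough, read-only sketch of that kind of tool (the path and the digest file name are only examples; it writes nothing but the digest files themselves), with a switch to include or skip the subdirectories:

#!/bin/bash
# Write one digest file per series directory.
# Pass --deep to also hash files in subdirectories (e.g. torrent/); default is the top level only.
depth=(-maxdepth 1)
[ "$1" = "--deep" ] && depth=()
for dir in /home/dale/TV_Series/*/; do
    ( cd "$dir" &&
      find . "${depth[@]}" -type f ! -name digest.md5 -print0 |
      xargs -0 -r md5sum > digest.md5 )
done

Checking later is then just a matter of running md5sum -c digest.md5 inside each directory.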
I've been watching videos again, the ones I was watching during the time the
memory was bad. I've replaced three so far. I think I noticed this
within a few hours. Then it took a little while for me to figure out
the problem and shut down to run the memtest. I doubt many files were
affected unless it does something we don't know about. I do plan to try
to use rsync with checksum and dry-run when I get back up and running. Also,
QB is finding that a lot of its files are fine as well. It's still
rechecking them. It's a lot of files.
Right now, I suspect my backup copy is likely better than my main copy.
Once I get the memory in and can really run some software, I'll run
rsync with those compare options and see what it says. I just have to
remember to reverse things: the backup is the source, not the destination.
If this works, I may run that each time, to help detect problems maybe.
Maybe??
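Something along these lines (the mount point and paths are only examples), with the backup as the source:
rsync -v --recursive --checksum --dry-run /mnt/backup/Videos/ /home/dale/Videos/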
find path-to-directory/ -type f -print0 | xargs -0 md5sum > digest.log
then to compare with a backup of the same directory you could run:
md5sum -c digest.log | grep FAILED
Someone more knowledgeable should be able to knock out some clever python script to do the same at speed.
I'll be honest here, on two points. I'd really like to be able to do
this but I have no idea where or how to even start. My setup for
series-type videos: in the parent directory, where I'd like a tool to
start, there are about 600 directories. On a few occasions, there is another
directory inside that one. That directory under the parent is the name
of the series.
Sometimes I have a sub directory that holds temp files:
new files I have yet to rename, or am considering replacing in the main series directory, etc. I wouldn't mind having a file with a checksum for each video in the top directory, and even one in the sub directory. As an example:
TV_Series/
├── 77 Sunset Strip (1958)
│ └── torrent
├── Adam-12 (1968)
├── Airwolf (1984)
What
I'd like is a program that would generate checksums for each file under,
say, 77 Sunset, and could either skip or include the directory under it.
It might be best if I could switch that on or off. Obviously, I may not want
to do this for my whole system; I'd like to be able to target
directories. I have another large directory, let's say not a series but one that
sometimes has remakes, that I'd also like to do. It is kinda set up
like the above: a parent directory with a directory underneath and, on occasion, one more under that.
As an example, let's assume you have the following fs tree:
VIDEO
├──TV_Series/
| ├── 77 Sunset Strip (1958)
| │ └── torrent
| ├── Adam-12 (1968)
| ├── Airwolf (1984)
|
├──Documentaries
├──Films
├──etc.
You could run:
$ find VIDEO -type f -print0 | xargs -0 md5sum > digest.log
The file digest.log will contain the md5sum hash of each of your files within the VIDEO directory and its subdirectories.
To check if any of these files have changed, become corrupted, etc. you can run:
$ md5sum -c digest.log | grep FAILED
If you want to compare the contents of the same VIDEO directory on a backup,
you can copy the digest file with its hashes over to the corresponding top directory on the backup (so the relative paths inside it still match) and run again:
$ md5sum -c digest.log | grep FAILED
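Concretely, assuming the backup is mounted somewhere like /mnt/backup (only an example) and contains its own VIDEO directory, that second step would look like:
$ cp digest.log /mnt/backup/
$ cd /mnt/backup/ && md5sum -c digest.log | grep FAILED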
One thing I worry about is not just memory problems, drive failure but
also just some random error or even bit rot. Some of these files are rarely changed or even touched. I'd like a way to detect problems and there may even be a software tool that does this with some setup,
reminds me of Kbackup where you can select what to backup or leave out
on a directory or even individual file level.
Right now, I suspect my backup copy is likely better than my main copy.
This should work in rsync terms:
rsync -v --checksum --delete --recursive --dry-run SOURCE/ DESTINATION
It will output a list of the files which differ, plus the files which have been deleted from the SOURCE and would therefore be deleted at the DESTINATION directory.
Update. The new memory sticks I bought came in today. I ran memtest from
the Gentoo Live boot media and it passed. Of course, the last pair passed
when new too, so let's hope this one lasts longer. Much longer.
On Fri, Sep 06, 2024 at 01:21:20PM +0100, Michael wrote:
find path-to-directory/ -type f -print0 | xargs -0 md5sum > digest.log
then to compare with a backup of the same directory you could run:
md5sum -c digest.log | grep FAILED
I had a quick look at the manpage: with md5sum --quiet you can omit the grep part.
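That is, with the same digest file as above:
md5sum --quiet -c digest.log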
Someone more knowledgeable should be able to knock out some clever python
script to do the same at speed.
And that is exactly what I have written for myself over the last 11 years. I
call it dh (short for dirhash). As I described in the previous mail, I use
it to create one hash file per directory. But it also supports one hash
file per data file and – a rather new feature – one hash file at the root of a tree. Have a look here: https://github.com/felf/dh
Clone the repo or simply download the one file and put it into your path.