Forum: Too Lazy BBS

Re: Missing MCA error messages for bad ECC

From G. Paul Ziemba@pz-freebsd-stable@ziemba.us to muc.lists.freebsd.stable on Mon Feb 2 08:16:37 2026

From Newsgroup: muc.lists.freebsd.stable

Bob,

thanks for your suggestions.

The motherboard is a plain X11SCA (no -F ipmi)

I don't know of a way to read the power supply voltages in software
while FreeBSD is running, but I did reboot into the BIOS setup and
read voltages there, and they look normal to me:

VCPU: 1.136
VDIMM: 1.224
12V: 12.233
5VCC: 5.184
3.3V_DL: 3.327
3.3VCC: 3.424
VSB: 3.328
VBAT: 3.104
VCC1_8_DL_PCM: 1.816
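
As a rough check on "look normal", here is a small Python sketch that compares the three main rails against a +/-5% band. The nominal values and the 5% limit are my assumption (the usual ATX-style tolerance), not something the BIOS screen reports.

```python
# Readings copied from the BIOS hardware monitor screen above.
readings = {"12V": 12.233, "5VCC": 5.184, "3.3VCC": 3.424}

# Assumed nominal rail voltages and tolerance (ATX-style +/-5%).
nominal = {"12V": 12.0, "5VCC": 5.0, "3.3VCC": 3.3}
TOLERANCE = 0.05


def deviation(name):
    """Return the fractional deviation of a rail from its assumed nominal."""
    return (readings[name] - nominal[name]) / nominal[name]


def within_tolerance(name):
    """True if the rail is inside the assumed +/-5% band."""
    return abs(deviation(name)) <= TOLERANCE


for rail in readings:
    status = "ok" if within_tolerance(rail) else "OUT OF RANGE"
    print(f"{rail}: {deviation(rail):+.2%} {status}")
```

All three rails come out inside the band. A static reading like this cannot show ripple or noise on a rail.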

The BIOS versions are given as:

"ver 1.2 Build Date 12/5/19" near the top of the screen; and
"version 2.19.0045 (c) [AMI]" at the bottom of the screen

I didn't see any setting that obviously controls how events are
filtered, but there WAS an event log that had completely filled up
with messages of the form:

<datetime> smbios 0x02 DIMMB1

with many for DIMMB1 and DIMMB2. I haven't found any documentation yet
of "0x02" other than a few online posts calling it either a single-bit
or a multi-bit ECC memory error.

I'm still favoring a diagnosis of two bad DIMMs; I just wish there were
a way to cause these errors to show up in FreeBSD somewhere so I could
detect them on a running system.

On Sun, Feb 01, 2026 at 08:30:56PM +0000, Bob Bishop wrote:

Hi,

On 1 Feb 2026, at 16:35, G. Paul Ziemba <pz-freebsd-stable@ziemba.us> wrote:

OS: 14.2-STABLE as of 250403

I seem to have at least one bad ECC DIMM

Check the power supply voltages are within tolerance if you haven't already.

and was expecting to see MCA
messages in /var/log/messages or to the console (which I have recently redirected to /var/log/console.log via syslog.conf:

console.info /var/log/console.log

but I can't find anything in any of my logs. Why am I not seeing them?

If you have the -F variant of the board that supports IPMI, it may be that the BMC is capturing the errors so check the BMC event log. Possibly there is a setting on the BMC to control what gets passed to MCA.

Also check the BIOS event logging; I don't see settings in the BIOS to control MCA events.

And check the BIOS version is up to date.

Background:

Motherboard: Supermicro X11SCA
CPU: Xeon E-2176G
Chipset: C246
Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)

Bios reports ECC on its startup screen and dmidecode reports

Total Width: 72 bits
Data Width: 64 bits

for each of the dimms.

Amanda started reporting checksum errors on large backup files in its holding disk. I discovered that a large file (200GB) on any of three
disks on this system yields different sha512sum values every time I
run it on the same file. SMART data looks OK on all disks.

memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have 4x16GB dimms installed, so I think that corresponds to two bad dimms.

% sysctl hw.mca
hw.mca.cmc_throttle: 60
hw.mca.force_scan: 0
hw.mca.interval: 300
hw.mca.maxcount: -1
hw.mca.count: 0
hw.mca.erratum383: 0
hw.mca.intel6h_HSD131: 0
hw.mca.amd10h_L1TP: 1
hw.mca.log_corrected: 1
hw.mca.enabled: 1

Thanks for any insights.
--
G. Paul Ziemba
FreeBSD unix:
8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39

--
Bob Bishop t: +44 (0)118 940 1243
rb@gid.co.uk m: +44 (0)783 626 4518

--
G. Paul Ziemba
FreeBSD unix:
7:51AM up 35 mins, 2 users, load averages: 0.32, 0.56, 0.47

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-admin@muc.de
--- Synchronet 3.21b-Linux NewsLink 1.2

From Jan Martin Mikkelsen@janm@transactionware.com to muc.lists.freebsd.stable on Mon Feb 2 21:27:30 2026

From Newsgroup: muc.lists.freebsd.stable

Hi,
Possibly a silly question. Have you tried sysutils/mcelog?
Regards,
Jan M.

From Chris@bsd-lists@bsdforge.com to muc.lists.freebsd.stable on Mon Feb 2 13:23:10 2026

From Newsgroup: muc.lists.freebsd.stable

On 2026-02-02 08:16, G. Paul Ziemba wrote:

Bob,

thanks for your suggestions.

The motherboard is a plain X11SCA (no -F ipmi)

I don't know of a way to read the power supply voltages in software
while FreeBSD is running, but I did reboot into the BIOS setup and
read voltages there, and they look normal to me:

VCPU: 1.136
VDIMM: 1.224
12V: 12.233
5VCC: 5.184
3.3V_DL: 3.327
3.3VCC: 3.424
VSB: 3.328
VBAT: 3.104
VCC1_8_DL_PCM: 1.816

I'd just like to mention here that while the voltages may read
within the expected range, that will not tell you about AC bleed. IOW,
failing diodes will leak AC, which will result in (eventual) component
failure. I've tossed many a PSU for just this reason. If you happen to
have a spare around, it'd make it pretty easy to test this.

--Chris


From G. Paul Ziemba@pz-freebsd-stable@ziemba.us to muc.lists.freebsd.stable on Fri Feb 6 18:54:50 2026

From Newsgroup: muc.lists.freebsd.stable

For X11SCA owners who might find this thread in the future:

Test results:

I did some more exhaustive testing with memtest86 and individual
DIMMs installed. These DIMMs were all SK Hynix HMA82GU7CJR8N-VK,
purchased directly from Supermicro around 2019-2020 in one batch.

All four DIMMs had varying degrees of failure. Two of them had in
the neighborhood of 20 errors each in one full pass of memtest86.
A third had about two errors in a full pass of memtest86. In all
of those cases, a bunch of "smbios 0x02" events showed up in the
event log visible from the BIOS setup screens.

The fourth original DIMM had no errors in one full pass of memtest86,
but generated a few events in the smbios log.

I got two new noname ECC DIMMs and tested them. No errors in one
full pass of memtest86, and no smbios events logged.

Future monitoring:

I'm still dismayed that FreeBSD doesn't seem to notice/report these
ECC events. I have not had a chance to exhaustively search the BIOS
setup screens for a setting that might enable signaling the OS, yet.

However, I noticed that "dmidecode" reports information about the
"System Event Log". I found the published SMBIOS specification
(see, for example, https://www.dmtf.org/standards/smbios) that
describes the format of the System Event Log.

I wrote a simple perl script to open /dev/mem, seek to the start
address, and read the log area and got back what looked like a
valid event log. It should be straightforward to parse the log entries
and discover ECC events, so I can build a monitoring solution for
this motherboard.
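
A minimal sketch of the parsing step, in Python rather than perl. It assumes the log data area uses the standard record layout from the SMBIOS specification: an event type byte, a length byte whose high bit is a read/unread flag, six BCD date/time bytes, then variable data, with type 0xFF marking the end of the log. The names `parse_event_log` and `read_log_area` are hypothetical, and whether the "0x02" shown on the BIOS screen is this same event type byte is an assumption. The address and length have to come from the dmidecode "System Event Log" output for the board.

```python
# Event type names from the SMBIOS spec's event log type table
# (only the memory-related ones are listed here).
EVENT_TYPES = {
    0x01: "Single-bit ECC memory error",
    0x02: "Multi-bit ECC memory error",
    0x03: "Parity memory error",
}
END_OF_LOG = 0xFF


def bcd(byte):
    """Decode one packed-BCD byte, e.g. 0x26 -> 26."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def parse_event_log(data):
    """Yield (type, name, timestamp, payload) for each record in `data`.

    `data` is the raw log data area (after the log header), as bytes.
    """
    pos = 0
    while pos + 8 <= len(data):
        etype = data[pos]
        if etype == END_OF_LOG:
            break
        # High bit is a read/unread flag, not part of the length.
        length = data[pos + 1] & 0x7F
        if length < 8 or pos + length > len(data):
            break  # malformed or truncated record; stop rather than guess
        yy, mo, dd, hh, mi, ss = (bcd(b) for b in data[pos + 2:pos + 8])
        stamp = f"{yy:02d}-{mo:02d}-{dd:02d} {hh:02d}:{mi:02d}:{ss:02d}"
        name = EVENT_TYPES.get(etype, f"type 0x{etype:02x}")
        yield etype, name, stamp, data[pos + 8:pos + length]
        pos += length


def read_log_area(address, length, path="/dev/mem"):
    """Read `length` bytes at physical `address` (needs root)."""
    # Unbuffered, so Python does not read ahead past the requested range.
    with open(path, "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)
```

A monitoring job could call `read_log_area()` periodically, run the result through `parse_event_log()`, and alert when new records with type 0x01 or 0x02 appear.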

--
G. Paul Ziemba
FreeBSD unix:
10:51AM up 1 day, 13:34, 17 users, load averages: 0.35, 0.25, 0.26

From G. Paul Ziemba@pz-freebsd-stable@ziemba.us to muc.lists.freebsd.stable on Sun Feb 1 16:35:29 2026

From Newsgroup: muc.lists.freebsd.stable

OS: 14.2-STABLE as of 250403

I seem to have at least one bad ECC DIMM and was expecting to see MCA
messages in /var/log/messages or on the console (which I have recently redirected to /var/log/console.log via syslog.conf):

console.info /var/log/console.log

but I can't find anything in any of my logs. Why am I not seeing them?

Background:

Motherboard: Supermicro X11SCA
CPU: Xeon E-2176G
Chipset: C246
Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)

BIOS reports ECC on its startup screen and dmidecode reports

Total Width: 72 bits
Data Width: 64 bits

for each of the DIMMs.

Amanda started reporting checksum errors on large backup files in its
holding disk. I discovered that a large file (200GB) on any of three
disks on this system yields different sha512sum values every time I
run it on the same file. SMART data looks OK on all disks.
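
The repeat-hash check can be scripted. A minimal Python sketch using the standard hashlib module; the function names and the default of three passes are only for illustration:

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a 200GB file never has to fit in RAM."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, passes=3):
    """Hash the same file several times and return the set of digests seen."""
    return {sha512_of(path) for _ in range(passes)}
```

On healthy hardware `distinct_digests(path)` should contain exactly one value. More than one means the data changed somewhere between the disk and the CPU.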

memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have
4x16GB dimms installed, so I think that corresponds to two bad dimms.
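
The arithmetic behind that guess, as a Python sketch. It assumes the simplest possible layout, where each 16GB DIMM covers one contiguous 16G slice of the address space in slot order. Real memory controllers usually interleave channels and remap around the PCI hole, so this is only a first approximation, and the index-to-slot numbering is hypothetical.

```python
DIMM_SIZE_G = 16  # 4 x 16GB installed


def dimm_index(address_g, dimm_size_g=DIMM_SIZE_G):
    """Return the 0-based DIMM index under a naive linear (non-interleaved) map."""
    return int(address_g // dimm_size_g)


bad_spots_g = [42, 47, 53]
suspects = sorted({dimm_index(a) for a in bad_spots_g})
print(suspects)  # prints [2, 3]
```

Under that assumption, 42G and 47G fall in the third 16G slice and 53G in the fourth, hence two suspect DIMMs.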

% sysctl hw.mca
hw.mca.cmc_throttle: 60
hw.mca.force_scan: 0
hw.mca.interval: 300
hw.mca.maxcount: -1
hw.mca.count: 0
hw.mca.erratum383: 0
hw.mca.intel6h_HSD131: 0
hw.mca.amd10h_L1TP: 1
hw.mca.log_corrected: 1
hw.mca.enabled: 1
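
For scripting around this, the sysctl output above is easy to turn into a dictionary. A Python sketch; `parse_sysctl` and `mca_record_count` are hypothetical helpers, and treating a zero `hw.mca.count` as "the kernel has logged no machine-check records" is my reading of that counter:

```python
import subprocess


def parse_sysctl(text):
    """Turn 'name: value' lines from sysctl(8) output into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            values[name.strip()] = value.strip()
    return values


def mca_record_count():
    """Run `sysctl hw.mca` and return hw.mca.count as an int."""
    out = subprocess.run(["sysctl", "hw.mca"], capture_output=True,
                         text=True, check=True).stdout
    return int(parse_sysctl(out)["hw.mca.count"])
```

A cron job could compare `mca_record_count()` against the previous value and send mail when it rises.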

Thanks for any insights.
--
G. Paul Ziemba
FreeBSD unix:
8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39

From Bob Bishop@rb@gid.co.uk to muc.lists.freebsd.stable on Sun Feb 1 20:30:56 2026

From Newsgroup: muc.lists.freebsd.stable

Hi,

On 1 Feb 2026, at 16:35, G. Paul Ziemba <pz-freebsd-stable@ziemba.us> wrote:

OS: 14.2-STABLE as of 250403

I seem to have at least one bad ECC DIMM

Check the power supply voltages are within tolerance if you haven't already.

and was expecting to see MCA
messages in /var/log/messages or to the console (which I have recently redirected to /var/log/console.log via syslog.conf:

console.info /var/log/console.log

but I can't find anything in any of my logs. Why am I not seeing them?

If you have the -F variant of the board that supports IPMI, it may be that the BMC is capturing the errors so check the BMC event log. Possibly there is a setting on the BMC to control what gets passed to MCA.
Also check the BIOS event logging; I don't see settings in the BIOS to control MCA events.
And check the BIOS version is up to date.

Background:

Motherboard: Supermicro X11SCA
CPU: Xeon E-2176G
Chipset: C246
Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)

Bios reports ECC on its startup screen and dmidecode reports

Total Width: 72 bits
Data Width: 64 bits

for each of the dimms.

Amanda started reporting checksum errors on large backup files in its
holding disk. I discovered that a large file (200GB) on any of three
disks on this system yields different sha512sum values every time I
run it on the same file. SMART data looks OK on all disks.

memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have 4x16GB dimms installed, so I think that corresponds to two bad dimms.

% sysctl hw.mca
hw.mca.cmc_throttle: 60
hw.mca.force_scan: 0
hw.mca.interval: 300
hw.mca.maxcount: -1
hw.mca.count: 0
hw.mca.erratum383: 0
hw.mca.intel6h_HSD131: 0
hw.mca.amd10h_L1TP: 1
hw.mca.log_corrected: 1
hw.mca.enabled: 1

Thanks for any insights.
--
G. Paul Ziemba
FreeBSD unix:
8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39

--
Bob Bishop t: +44 (0)118 940 1243
rb@gid.co.uk m: +44 (0)783 626 4518