Hi,
On 1 Feb 2026, at 16:35, G. Paul Ziemba <pz-freebsd-stable@ziemba.us> wrote:
> OS: 14.2-STABLE as of 250403
> I seem to have at least one bad ECC DIMM
Check the power supply voltages are within tolerance if you haven't already.
> and was expecting to see MCA
> messages in /var/log/messages or to the console (which I have recently redirected to /var/log/console.log via syslog.conf:
> console.info /var/log/console.log
> but I can't find anything in any of my logs. Why am I not seeing them?
If you have the -F variant of the board that supports IPMI, it may be that the BMC is capturing the errors so check the BMC event log. Possibly there is a setting on the BMC to control what gets passed to MCA.
Also check the BIOS event logging; I don't see settings in the BIOS to control MCA events.
And check the BIOS version is up to date.
> Background:
> Motherboard: Supermicro X11SCA
> CPU: Xeon E-2176G
> Chipset: C246
> Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)
> Bios reports ECC on its startup screen and dmidecode reports
> Total Width: 72 bits
> Data Width: 64 bits
> for each of the dimms.
> Amanda started reporting checksum errors on large backup files in its holding disk. I discovered that a large file (200GB) on any of three
> disks on this system yields different sha512sum values every time I
> run it on the same file. SMART data looks OK on all disks.
> memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have 4x16GB dimms installed, so I think that corresponds to two bad dimms.
> % sysctl hw.mca
> hw.mca.cmc_throttle: 60
> hw.mca.force_scan: 0
> hw.mca.interval: 300
> hw.mca.maxcount: -1
> hw.mca.count: 0
> hw.mca.erratum383: 0
> hw.mca.intel6h_HSD131: 0
> hw.mca.amd10h_L1TP: 1
> hw.mca.log_corrected: 1
> hw.mca.enabled: 1
> Thanks for any insights.
> --
> G. Paul Ziemba
> FreeBSD unix:
> 8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39
--
Bob Bishop t: +44 (0)118 940 1243
rb@gid.co.uk m: +44 (0)783 626 4518
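The guess of two bad DIMMs from the memtest86+ addresses can be reproduced with simple arithmetic. The sketch below assumes a purely linear layout in which each 16GB DIMM covers one contiguous range of physical addresses. That is an assumption: real memory controllers may interleave channels and remap memory around the PCI hole, so treat the result as a first approximation only. The function name `naive_dimm_index` is hypothetical.

```python
def naive_dimm_index(addr_gib: float, dimm_size_gib: int = 16, dimm_count: int = 4) -> int:
    """Return the 0-based DIMM index for an address given in GiB.

    ASSUMPTION: linear, non-interleaved layout (DIMM 0 holds 0-16G,
    DIMM 1 holds 16-32G, and so on). Real hardware may differ.
    """
    total_gib = dimm_size_gib * dimm_count
    if not 0 <= addr_gib < total_gib:
        raise ValueError(f"address {addr_gib}G is outside 0..{total_gib}G")
    return int(addr_gib // dimm_size_gib)


# Failing locations reported by memtest86+ in the original message.
failing_gib = [42, 47, 53]

# Collect the distinct DIMM indices that the failing addresses fall into.
suspects = sorted({naive_dimm_index(addr) for addr in failing_gib})
print(suspects)  # [2, 3]
```

Under that assumption, 42G and 47G land in the third DIMM and 53G in the fourth, which matches the estimate of two bad DIMMs. It does not say which physical slots those are.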
On 2. Feb 2026, at 17:16, G. Paul Ziemba <pz-freebsd-stable@ziemba.us> wrote:
Bob,
thanks for your suggestions.
The motherboard is a plain X11SCA (no -F ipmi)
I don't know of a way to read the power supply voltages in software
while FreeBSD is running, but I did reboot into the BIOS setup and
read voltages there, and they look normal to me:
VCPU: 1.136
VDIMM: 1.224
12V: 12.233
5VCC: 5.184
3.3V_DL: 3.327
3.3VCC: 3.424
VSB: 3.328
VBAT: 3.104
VCC1_8_DL_PCM: 1.816
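As a rough cross-check of "look normal", the three main rails can be compared against a tolerance band. The sketch below assumes that the BIOS labels `12V`, `5VCC` and `3.3VCC` correspond to the ATX +12V, +5V and +3.3V rails, and that the ATX figure of plus or minus 5% applies. Both are assumptions, not something stated in this thread, and the helper names are hypothetical. The other readings are left out because their nominal values are not given here.

```python
# ASSUMPTION: nominal rail voltages and a +/-5% band, as in the ATX design guide.
NOMINAL = {"12V": 12.0, "5VCC": 5.0, "3.3VCC": 3.3}

# Readings copied from the BIOS setup screen quoted above.
READINGS = {"12V": 12.233, "5VCC": 5.184, "3.3VCC": 3.424}


def deviation_percent(reading: float, nominal: float) -> float:
    """Signed deviation of a reading from its nominal value, in percent."""
    return (reading - nominal) / nominal * 100.0


def out_of_tolerance(readings, nominal, tolerance_percent=5.0):
    """Return the names of rails whose deviation exceeds the tolerance."""
    return [
        name
        for name, value in readings.items()
        if abs(deviation_percent(value, nominal[name])) > tolerance_percent
    ]


for name, value in READINGS.items():
    print(f"{name}: {deviation_percent(value, NOMINAL[name]):+.2f}%")

bad_rails = out_of_tolerance(READINGS, NOMINAL)
print(bad_rails)  # [] (all three rails are inside the assumed band)
```

The 5V and 3.3V rails both read a little under 4% high, which is inside the assumed band but closer to its edge than the 12V rail.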
The BIOS versions are given as:
"ver 1.2 Build Date 12/5/19" near the top of the screen; and
"version 2.19.0045 (c) [AMI]" at the bottom of the screen
I didn't see a setting that (apparently to me) might control how
events might be filtered, but there WAS an event log that had
completely filled up with messages of the form:
<datetime> smbios 0x02 DIMMB1
with many for DIMMB1 and DIMMB2. I haven't found any documentation yet
of "0x02" other than a few online posts calling it either a single-bit
or a multi-bit ECC memory error.
I'm still favoring a diagnosis of two bad DIMMs; I just wish there were
a way to cause these errors to show up in FreeBSD somewhere so I could
detect them on a running system.
On Sun, Feb 01, 2026 at 08:30:56PM +0000, Bob Bishop wrote:
> Hi,
> On 1 Feb 2026, at 16:35, G. Paul Ziemba <pz-freebsd-stable@ziemba.us> wrote:
>> OS: 14.2-STABLE as of 250403
>> I seem to have at least one bad ECC DIMM
> Check the power supply voltages are within tolerance if you haven't already.
>> and was expecting to see MCA
>> messages in /var/log/messages or to the console (which I have recently
>> redirected to /var/log/console.log via syslog.conf:
>> console.info /var/log/console.log
>> but I can't find anything in any of my logs. Why am I not seeing them?
> If you have the -F variant of the board that supports IPMI, it may be that the BMC is capturing the errors so check the BMC event log. Possibly there is a setting on the BMC to control what gets passed to MCA.
> Also check the BIOS event logging; I don't see settings in the BIOS to control MCA events.
> And check the BIOS version is up to date.
>> Background:
>> Motherboard: Supermicro X11SCA
>> CPU: Xeon E-2176G
>> Chipset: C246
>> Memory: 4x SK Hynix HMA82GU7CJR8N-VK (16GB ECC)
>> Bios reports ECC on its startup screen and dmidecode reports
>> Total Width: 72 bits
>> Data Width: 64 bits
>> for each of the dimms.
>> Amanda started reporting checksum errors on large backup files in its
>> holding disk. I discovered that a large file (200GB) on any of three
>> disks on this system yields different sha512sum values every time I
>> run it on the same file. SMART data looks OK on all disks.
>> memtest86+ finds three bad spots in memory, at 42G, 47G and 53G. I have
>> 4x16GB dimms installed, so I think that corresponds to two bad dimms.
>> % sysctl hw.mca
>> hw.mca.cmc_throttle: 60
>> hw.mca.force_scan: 0
>> hw.mca.interval: 300
>> hw.mca.maxcount: -1
>> hw.mca.count: 0
>> hw.mca.erratum383: 0
>> hw.mca.intel6h_HSD131: 0
>> hw.mca.amd10h_L1TP: 1
>> hw.mca.log_corrected: 1
>> hw.mca.enabled: 1
>> Thanks for any insights.
>> --
>> G. Paul Ziemba
>> FreeBSD unix:
>> 8:31AM up 2 days, 14:38, 11 users, load averages: 0.71, 0.43, 0.39
> --
> Bob Bishop t: +44 (0)118 940 1243
> rb@gid.co.uk m: +44 (0)783 626 4518
--
G. Paul Ziemba
FreeBSD unix:
7:51AM up 35 mins, 2 users, load averages: 0.32, 0.56, 0.47
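For watching a running system, one option is to poll the `hw.mca` sysctl values shown in the original message and flag any change. The sketch below parses text in the `name: value` form that `sysctl hw.mca` printed above. It assumes that `hw.mca.count` reflects the number of machine check records the kernel has logged. The helper names are hypothetical, and `read_live` is defined but not called, because it only works on a host that has the `sysctl` command and the `hw.mca` tree.

```python
import subprocess

# Sample taken from the sysctl output quoted in the original message.
SAMPLE = """\
hw.mca.cmc_throttle: 60
hw.mca.force_scan: 0
hw.mca.interval: 300
hw.mca.maxcount: -1
hw.mca.count: 0
hw.mca.log_corrected: 1
hw.mca.enabled: 1
"""


def parse_sysctl(text: str) -> dict:
    """Parse 'name: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that do not contain the separator
            values[name.strip()] = value.strip()
    return values


def mca_summary(values: dict) -> dict:
    """Pick out the fields relevant to whether MCA events are being recorded."""
    return {
        "enabled": values.get("hw.mca.enabled") == "1",
        "log_corrected": values.get("hw.mca.log_corrected") == "1",
        # ASSUMPTION: count is the number of MCA records logged so far.
        "count": int(values.get("hw.mca.count", "0")),
    }


def read_live() -> dict:
    """Run sysctl on the local host (not called here; needs a FreeBSD system)."""
    out = subprocess.run(
        ["sysctl", "hw.mca"], capture_output=True, text=True, check=True
    ).stdout
    return parse_sysctl(out)


summary = mca_summary(parse_sysctl(SAMPLE))
print(summary)  # {'enabled': True, 'log_corrected': True, 'count': 0}
```

With the values from this thread, MCA is enabled and corrected-error logging is on, yet the count is 0. That is consistent with the errors being recorded in the BIOS event log without reaching the kernel. A script like this would only help if the count ever moves on the machine in question.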