• finding cause of reboot

    From void@void@f-m.fm to muc.lists.freebsd.stable on Thu Oct 16 14:35:06 2025
    From Newsgroup: muc.lists.freebsd.stable

    Hi,

    I'm trying to work out why a reboot is happening.
    Line power is fine. PSU is fine.
    There's no coredump.
    Nothing in /var/log/messages, console.log, all.log or daemon.log.

    Is there a thing I can set somewhere which when enabled will
    capture why a system reboots?

    thnx
    --


    --
    Posted automagically by a mail2news gateway at muc.de e.V.
    Please direct questions, flames, donations, etc. to news-admin@muc.de
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Alan Somers@asomers@freebsd.org to muc.lists.freebsd.stable on Thu Oct 16 08:12:10 2025
    From Newsgroup: muc.lists.freebsd.stable

    The "boot" command will give you a vague description like "shutdown" or
    "crash". If a crash was the cause then maybe your system isn't configured
    to make core dumps. I suggest forcing a crash with
    "sysctl debug.kdb.panic=1" while you watch the screen to see what happens.

  • From Warner Losh@imp@bsdimp.com to muc.lists.freebsd.stable on Thu Oct 16 10:26:13 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Thu, Oct 16, 2025 at 8:12 AM Alan Somers <asomers@freebsd.org> wrote:

    > The "boot" command will give you a vague description like "shutdown" or
    > "crash".

    boot command? I don't have this on my system. How do you get that?

    > If a crash was the cause then maybe your system isn't configured to make
    > core dumps. I suggest forcing a crash with "sysctl debug.kdb.panic=1"
    > while you watch the screen to see what happens.

    Yea. If it is a reboot, and crash dumps are enabled, and there's nothing
    in /var/log/messages, you have limited options.

    If there's a BMC, and it speaks IPMI, there might be something in the IPMI
    log if it was hardware triggered (ipmitool sel list):

       1 | Pre-Init |0000001024| Processor #0xe4 | Presence detected | Asserted
       2 | Pre-Init |0000000000| System Event | OEM System boot event | Asserted
       3 | Pre-Init |0000000001| System Event | Timestamp Clock Sync | Asserted
       4 | 04/17/23 | 16:07:49 MDT | System Event | Timestamp Clock Sync | Asserted
       5 | 04/17/23 | 16:11:12 MDT | OS Boot #0x22 | boot completed - device not specified | Asserted
       6 | 04/17/23 | 16:20:42 MDT | System Event | OEM System boot event | Asserted
       7 | 04/17/23 | 16:23:16 MDT | System Event | OEM System boot event | Asserted
       8 | 04/17/23 | 16:28:44 MDT | System Event | OEM System boot event | Asserted
       9 | 04/17/23 | 16:30:25 MDT | OS Boot #0x22 | boot completed - device not specified | Asserted
       ...

    (oh my, I need to clear my log)

    Warner
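
A minimal sketch of how saved `ipmitool sel list` output might be scanned for boot-related events, assuming the pipe-separated layout shown in the log above. The names `parse_sel` and `boot_events` are hypothetical, the sample lines are copied from this message, and other BMCs may lay the fields out differently.

```python
def parse_sel(text):
    """Split pipe-separated `ipmitool sel list` lines into lists of fields."""
    records = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue
        # Assumption: the first field is the record ID, printed in hex.
        try:
            int(fields[0], 16)
        except ValueError:
            continue
        records.append(fields)
    return records


def boot_events(records):
    """Keep records whose text mentions a boot event (case-insensitive)."""
    return [r for r in records if any("boot" in f.lower() for f in r[1:])]


# Sample lines taken from the SEL output quoted in this thread.
SAMPLE = """\
   4 | 04/17/23 | 16:07:49 MDT | System Event | Timestamp Clock Sync | Asserted
   5 | 04/17/23 | 16:11:12 MDT | OS Boot #0x22 | boot completed - device not specified | Asserted
   6 | 04/17/23 | 16:20:42 MDT | System Event | OEM System boot event | Asserted
"""

if __name__ == "__main__":
    for rec in boot_events(parse_sel(SAMPLE)):
        print(" | ".join(rec))
```

Comparing the timestamps of such entries with the times of the unexplained reboots is one way to see whether the BMC noticed anything at all.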

  • From Alan Somers@asomers@freebsd.org to muc.lists.freebsd.stable on Thu Oct 16 10:45:23 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Thu, Oct 16, 2025 at 10:26 AM Warner Losh <imp@bsdimp.com> wrote:

    > On Thu, Oct 16, 2025 at 8:12 AM Alan Somers <asomers@freebsd.org> wrote:
    >
    > > The "boot" command will give you a vague description like "shutdown" or
    > > "crash".
    >
    > boot command? I don't have this on my system. How do you get that?

    Sorry, I meant "last".
  • From void@void@f-m.fm to muc.lists.freebsd.stable on Fri Oct 17 11:33:04 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Thu, Oct 16, 2025 at 10:26:13AM -0600, Warner Losh wrote:

    > Yea. If it is a reboot, and crash dumps are enabled, and there's nothing is
    > in /var/log/messages, you have limited options.
    >
    > If there's a BMC, and it speaks IPMI, there might be something in the IPMI
    > log if it was hardware triggered (ipmitool sel list):
    >    1 | Pre-Init |0000001024| Processor #0xe4 | Presence detected |

    Alas, no BMC. There is iLO, but the iLO doesn't read much detail from
    the OS state; it just says 'server reset' in the iLO diagnostics.

    The context of the reboot(s) is here:
    https://lists.freebsd.org/archives/freebsd-ports/2025-October/008592.html

    The only other context in which I've seen hard reboots like this, almost
    as if the wall socket was unplugged, has been with MCE errors, but even
    they get logged to /var/log/messages.
    --


  • From void@void@f-m.fm to muc.lists.freebsd.stable on Fri Oct 17 14:24:11 2025
    From Newsgroup: muc.lists.freebsd.stable

    Hi,

    On Thu, Oct 16, 2025 at 08:12:10AM -0600, Alan Somers wrote:

    > to make core dumps. I suggest forcing a crash with
    > "sysctl debug.kdb.panic=1" while you watch the screen to see what happens.

    Not sure what this would do, apart from crash the machine immediately.
    The problem, when it happens, is that it's like a power reset.

    Ideally, I guess, whatever is causing the problem needs to drop to the
    debugger and/or make a crash dump before the machine goes down
    completely. Do you think enabling WITNESS and friends may help?
    --


  • From Alan Somers@asomers@freebsd.org to muc.lists.freebsd.stable on Fri Oct 17 07:38:15 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Fri, Oct 17, 2025 at 7:24 AM void <void@f-m.fm> wrote:

    > Hi,
    >
    > On Thu, Oct 16, 2025 at 08:12:10AM -0600, Alan Somers wrote:
    >
    > > to make core dumps. I suggest forcing a crash with
    > > "sysctl debug.kdb.panic=1" while you watch the screen to see what happens.
    >
    > Not sure what this would do, apart from crash the machine immediately.
    > The problem, when it happens, is it's like power reset.


    I meant that you should do that just to check that core dumps are working.
    If your dump device were misconfigured, for example, then a kernel panic
    would lead to a reboot, looking much like a power reset.
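
Before forcing a test panic, it can help to confirm that a dump device is configured at all. The following is a minimal sketch, assuming the setting appears as a plain `dumpdev="..."` line in an rc.conf-style file. The helper name `dumpdev_setting` and the sample contents are hypothetical, and the parser handles only simple KEY=value lines rather than full sh syntax.

```python
import shlex


def dumpdev_setting(rc_conf_text):
    """Return the last dumpdev value found in rc.conf-style text.

    Returns None when the file does not set dumpdev, in which case the
    system default (from /etc/defaults/rc.conf) applies instead.
    """
    value = None
    for line in rc_conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == "dumpdev":
            # shlex removes the surrounding quotes and any trailing comment.
            parts = shlex.split(raw, comments=True)
            value = parts[0] if parts else ""
    return value


# Hypothetical rc.conf contents, for illustration only.
SAMPLE_RC_CONF = '''\
hostname="builder"
dumpdev="AUTO"   # use the first suitable swap device
'''

if __name__ == "__main__":
    setting = dumpdev_setting(SAMPLE_RC_CONF)
    if setting is None:
        print("dumpdev not set here; the system default applies")
    elif setting.upper() == "NO":
        print("crash dumps are disabled")
    else:
        print("dumpdev =", setting)
```

If the setting looks right and a forced panic still leaves nothing under /var/crash after the next boot, the dump device itself is the next thing to check.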



    > Ideally what I guess needs to happen is that the problem (whatever is
    > causing it) needs to crash to debugger and/or make a crash dump
    > before completely exiting. Do you think enabling WITNESS and friends
    > may help?


    That's a good idea, if it's really the dump device that's a problem. But
    you don't need WITNESS. Simply enabling ddb should be sufficient. You'll
    have to configure /etc/ddb.conf to break to debugger rather than dump core.

  • From void@void@f-m.fm to muc.lists.freebsd.stable on Fri Oct 17 17:16:29 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Fri, Oct 17, 2025 at 09:56:29AM -0400, mike tancsa wrote:

    > Not sure if it helps in this case or not, but If you capture smartctl
    > info regularly, you can some times infer from the disk power cycle
    > count and power on hours if the box rebooted due to a power issue or
    > not.

    Good point.

    I'm unsure how I'd distinguish between a soft power cycle
    and a reboot though. The latter I'd expect to increment the power cycle
    count; the former might, but it might not.

    I think power is a red herring though. This happens with two different
    poudriere builder machines, on different UPSes, in different locations.
    The only commonality I can see so far is that both were building the
    same thing.
    --


  • From Brandon Allbery@allbery.b@gmail.com to muc.lists.freebsd.stable on Fri Oct 17 12:26:40 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Fri, Oct 17, 2025 at 12:16 PM void <void@f-m.fm> wrote:

    > I think power is a red herring though. This happens with
    > two different poudriere builder machines different UPSes different
    > locations.
    > The only commonality I can see so far is that both were building the
    > same thing.

    My nasty suspicious mind immediately wonders about system memory usage
    and management (e.g. thrashing/page table load) during "the same thing".

    --
    brandon s allbery kf8nh
    allbery.b@gmail.com

  • From void@void@f-m.fm to muc.lists.freebsd.stable on Fri Oct 17 19:37:13 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Fri, Oct 17, 2025 at 12:26:40PM -0400, Brandon Allbery wrote:

    > My nasty suspicious mind immediately wonders about system memory usage and
    > management (e.g. thrashing/page table load) during "the same thing".

    I thought about thrashing/resources, but one would expect
    the system to complain in e.g. /var/log/messages.
    One system is a dual Xeon CPU E5-2690 v2 with 128GB RAM and
    hw.ncpu=20, as HT is turned off.
    At the time of the crash this was the only thing being built.

    More context here:
    https://lists.freebsd.org/archives/freebsd-ports/2025-October/008592.html
    --


  • From Warner Losh@imp@bsdimp.com to muc.lists.freebsd.stable on Fri Oct 17 14:33:16 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Fri, Oct 17, 2025, 2:12 PM mike tancsa <mike@sentex.net> wrote:

    > On 10/17/2025 12:16 PM, void wrote:
    > > On Fri, Oct 17, 2025 at 09:56:29AM -0400, mike tancsa wrote:
    > >> Not sure if it helps in this case or not, but If you capture smartctl
    > >> info regularly, you can some times infer from the disk power cycle
    > >> count and power on hours if the box rebooted due to a power issue or
    > >> not.
    > >
    > > Good point.
    > >
    > > I'm unsure how I'd distinguish between soft power cycle
    > > and reboot though. The latter i'd expect to increment power cycle
    > > count the former might but it might not.
    >
    > In my case, I had it with a failing power supply or one that could not
    > deal with load. In my case showed power cycle counts increasing. Where
    > as, a shutdown -r now before
    >
    > 0{r-14mfitest}# smartctl -a /dev/ada0 | grep -i power
    >                                         entering power-saving mode.
    >   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       40206
    >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       91
    > 0{r-14mfitest}#
    >
    > and after a reboot
    >
    > s0{r-14mfitest}# smartctl -a /dev/ada0 | grep -i power
    >                                         entering power-saving mode.
    >   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       40206
    >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       91
    > 0{r-14mfitest}#
    >
    > At least on my SuperMicro board shows the same power cycle count. Like
    > you said, probably does not apply in your case, but sometimes handy to
    > keep in the back pocket

    You can also look at the unclean shutdown count in nvme drives...

    nvmecontrol logpage -p2 nvme0

    to catch power issues.

    Warner

    ---Mike
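
The before/after comparison described above could be automated along these lines. This is a minimal sketch, assuming ATA-style `smartctl -a` attribute rows whose last column is a plain integer raw value. `power_attrs` and `power_cycled` are hypothetical helper names, and the sample text is the output quoted in this message.

```python
def power_attrs(smartctl_text):
    """Extract raw Power_On_Hours and Power_Cycle_Count from `smartctl -a` text."""
    wanted = {"Power_On_Hours", "Power_Cycle_Count"}
    found = {}
    for line in smartctl_text.splitlines():
        cols = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(cols) >= 10 and cols[1] in wanted and cols[-1].isdigit():
            found[cols[1]] = int(cols[-1])
    return found


def power_cycled(before, after):
    """True if the drive's power cycle count rose between two snapshots."""
    return after["Power_Cycle_Count"] > before["Power_Cycle_Count"]


# Sample taken from the smartctl output quoted in this thread.
SAMPLE = """\
                                        entering power-saving mode.
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       40206
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       91
"""

if __name__ == "__main__":
    snapshot = power_attrs(SAMPLE)
    print(snapshot)
    # Identical snapshots, as reported across `shutdown -r now` in this thread.
    print("power cycled:", power_cycled(snapshot, snapshot))
```

Saving a snapshot periodically (from cron, for example) and comparing the newest one after an unexplained reboot gives a hint whether the drive actually lost power. As noted in the thread, whether a warm reboot changes the count may depend on the hardware.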



