Checkstop

From RCS Wiki
Revision as of 14:54, 3 March 2023 by JeremyRand (talk | contribs) (Glossary)

Checkstop (xstop): An error that results in the system being forcibly rebooted by the firmware.

Diagnosing a Checkstop

There are a few ways to obtain logs of a checkstop.

nvram

From either the OS or Skiroot, run the following as root (or via sudo) after the machine has force-rebooted following the checkstop, but before it reboots again:

nvram --unzip lnx,oops-log

If you're lucky, it will print a log of the most recent checkstop. If you instead get nvram: ERROR: can't decompress text: inflate() returned -3, then the log in NVRAM is corrupted (-3 is zlib's Z_DATA_ERROR, meaning the compressed data is invalid), and you'll need to try a different approach.
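Because the log has to be read before the next reboot, it can help to save it to a file right away. The following is a minimal sketch, not part of the nvram tool: the function names (classify_nvram_result, save_oops_log), the default output path, and the messages are hypothetical. Only the nvram invocation and the error text come from this page. The sketch looks at the error text first rather than relying on the exit status, since the exit status nvram uses for a corrupted log is not stated here.

```shell
#!/bin/sh
# Hypothetical helpers; names, paths, and messages are illustrative.

# Classify an nvram run from its exit status and its stderr text.
# Prints "corrupt", "ok", or "failed".
classify_nvram_result() {
    status=$1
    stderr_text=$2
    if printf '%s\n' "$stderr_text" | grep -q "can't decompress text"; then
        echo corrupt
    elif [ "$status" -eq 0 ]; then
        echo ok
    else
        echo failed
    fi
}

# Save the oops log to a file (default path is an assumption).
save_oops_log() {
    out=${1:-/root/checkstop-oops.log}
    err=$(mktemp) || return 1
    nvram --unzip lnx,oops-log >"$out" 2>"$err"
    status=$?
    result=$(classify_nvram_result "$status" "$(cat "$err")")
    rm -f "$err"
    case $result in
        ok)
            echo "Saved NVRAM oops log to $out"
            ;;
        corrupt)
            rm -f "$out"
            echo "NVRAM log is corrupted; try opal-prd or the Client Console" >&2
            return 2
            ;;
        *)
            echo "nvram failed for another reason (exit status $status)" >&2
            return 1
            ;;
    esac
}
```

Run save_oops_log as root, optionally passing a different output path as the first argument.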

opal-prd

Before the checkstop occurs, install opal-prd from the OS. The command below is for Debian; most other distros package opal-prd as well (see your distro's documentation for details):

sudo apt install opal-prd

Once opal-prd is installed, any subsequent checkstops should, if you're lucky, show up in journalctl output.
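To narrow journalctl output down to opal-prd, something like the following sketch may help. It assumes the package installs a systemd unit named opal-prd.service, and that the journal is persistent if you want to look at an earlier boot. The function name show_opal_prd_log is hypothetical. Which boot the checkstop entries land in is not stated here, so the boot offset is a parameter.

```shell
#!/bin/sh
# Hypothetical helper: print opal-prd journal entries for one boot.
# Assumes a systemd unit named opal-prd.service (an assumption, check
# your distro). Boot offset 0 is the current boot, -1 the previous one;
# previous boots are only available with a persistent journal.
show_opal_prd_log() {
    boot=${1:-0}
    journalctl --no-pager -u opal-prd.service -b "$boot"
}
```

For example, show_opal_prd_log -1 would show entries from the boot before the current one, and show_opal_prd_log with no argument shows the current boot.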

Client Console

Connect to the BMC Client Console from another machine before the checkstop occurs, and stay connected while it happens. During the subsequent forced reboot, Hostboot will print a log of the checkstop.
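Since the Hostboot output scrolls past during the reboot, it may be worth recording the console session to a file. Below is a minimal sketch of a timestamping logger. The function name log_console and the default file name are hypothetical, and how you reach the console as a text stream depends on your BMC.

```shell
#!/bin/sh
# Hypothetical helper: read console output on stdin, prefix each line
# with a UTC timestamp, echo it, and append it to a log file.
log_console() {
    logfile=${1:-console.log}
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
    done | tee -a "$logfile"
}
```

As a usage example, assuming your BMC exposes the host console over SSH (OpenBMC commonly does so on port 2200; the host name here is a placeholder): ssh -p 2200 root@bmc.example | log_console checkstop-console.log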

Known Checkstop Issues

(NCUFIR[11]) NCU no response to snooped TLBIE

This is a firmware bug that was already fixed in IBM PNOR v2.18. According to Hostboot issue 220, the fix was in HCODE commit 9eb379569ffc1ae192aaa82bba43b25a051633b4 ("CME: big core workaround for field TLBIE xstop", committed 23 March 2021). Unfortunately, that fix has not yet been merged into Raptor's HCODE repository.