Checkstop (xstop): An error that results in the system being forcibly rebooted by the firmware.
Diagnosing a Checkstop
There are a few ways to obtain logs of a checkstop.
From either the OS or Skiroot, run the following as root (or with sudo) after the machine has been force-rebooted by the checkstop, but before it is rebooted again:
nvram --unzip lnx,oops-log
If you're lucky, it will print a log of the most recent checkstop. If you instead get the error
nvram: ERROR: can't decompress text: inflate() returned -3
then the log in NVRAM is corrupted for some reason, and you'll need to try a different approach.
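Because the log has to be read before the next reboot, it can help to save it to a file right away. The following is a minimal sketch, not an official tool: the function name save_oops_log and the output path are hypothetical. Since it is an assumption whether nvram reports a decompression failure through its exit status, stdout, or stderr, the sketch checks all three.

```shell
#!/bin/sh
# Sketch: save the NVRAM oops log to a file, or report that it is unreadable.
# save_oops_log is a hypothetical helper name, not part of powerpc-utils.
save_oops_log() {
    out_file="$1"
    err_file="$out_file.err"

    # Same command as above; stdout and stderr are captured separately.
    if nvram --unzip lnx,oops-log > "$out_file" 2> "$err_file" \
        && ! grep -q -e "nvram: ERROR" "$out_file" "$err_file" \
        && [ -s "$out_file" ]; then
        # Success: non-empty output and no error message anywhere.
        rm -f "$err_file"
        echo "Saved checkstop log to $out_file"
        return 0
    fi

    # Failure: show whatever nvram reported, then clean up the partial files.
    echo "Could not read lnx,oops-log from NVRAM:" >&2
    cat "$out_file" "$err_file" >&2
    rm -f "$out_file" "$err_file"
    return 1
}
```

Run as root, a call such as save_oops_log /root/checkstop.log would either leave the log in that file or print the nvram error and return a non-zero status, in which case one of the other approaches is needed.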
Before the checkstop occurs, install opal-prd from the OS (the command below is for Debian; most other distros package it as well, so see your distro's documentation for details):
sudo apt install opal-prd
Once installed, if you're lucky, any subsequent checkstops should show up in
Connect to the BMC Client Console from another machine before the checkstop occurs, and stay connected while it happens. During the subsequent forced reboot, Hostboot will print a log of the checkstop.
Known Checkstop Issues
(NCUFIR) NCU no response to snooped TLBIE
This is a firmware bug that was fixed as of IBM PNOR v2.18. According to Hostboot issue 220, the fix is HCODE commit 9eb379569ffc1ae192aaa82bba43b25a051633b4 ("CME: big core workaround for field TLBIE xstop", committed March 23, 2021). Unfortunately, that fix has not yet been merged into Raptor's HCODE repository.
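To check whether a given HCODE checkout already contains this fix, one option is to ask git directly. The sketch below assumes you have a local clone of the HCODE repository in question; the function name hcode_has_fix is hypothetical. It first looks for the exact commit hash in the history of HEAD, then falls back to searching commit messages for the subject line, since a cherry-picked fix would carry a different hash.

```shell
#!/bin/sh
# Sketch: report whether the TLBIE checkstop fix is present in a local HCODE clone.
# hcode_has_fix is a hypothetical helper name; pass the path to the clone.
hcode_has_fix() {
    repo_dir="$1"
    fix_commit="9eb379569ffc1ae192aaa82bba43b25a051633b4"
    fix_subject="CME: big core workaround for field TLBIE xstop"

    # Case 1: the upstream commit itself is an ancestor of HEAD.
    if git -C "$repo_dir" merge-base --is-ancestor "$fix_commit" HEAD 2>/dev/null; then
        return 0
    fi

    # Case 2: the fix was cherry-picked (new hash, same subject line).
    match=$(git -C "$repo_dir" log --fixed-strings --grep="$fix_subject" \
        --format=%H -n 1 HEAD 2>/dev/null)
    [ -n "$match" ]
}
```

For example, hcode_has_fix /path/to/hcode && echo "fix present" prints the message only when one of the two checks succeeds. A match on the subject line alone is a hint rather than proof, so it is worth inspecting the matching commit by hand.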