Troubleshooting/Guard Partition

From RCS Wiki

Revision as of 06:50, 1 June 2022

If some components (e.g. a CPU, or some cores on a CPU) are not being detected, they may have been guarded out. Guarding is a mechanism that lets POWER systems keep functioning when broken components are detected. However, if a component is incorrectly detected as broken (or really was broken but has since been fixed), its guard entry prevents it from working until that entry is manually cleared.

To clear the guard partition (and thereby force the system to try those components again on the next boot), issue pflash -P GUARD -c from the BMC shell.
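If you want to keep a copy of the current guard records before erasing them, the pflash -P GUARD -c command can be wrapped as below. This is only a sketch, assuming the host is powered off and that pflash is available in the BMC shell. It first reads the GUARD partition to a file with pflash's -r option, then clears it. The function name clear_guard and the default backup path are hypothetical.

```shell
#!/bin/sh
# Sketch only: back up, then clear, the GUARD partition from the BMC shell.
# Assumes the host is powered off and pflash is available on the BMC.

# clear_guard is a hypothetical helper name; $1 optionally overrides the
# (also hypothetical) default backup path.
clear_guard() {
    backup=${1:-/tmp/guard-backup.bin}

    # Save the current guard records so they can still be inspected later.
    # Stop without clearing anything if the read fails.
    pflash -P GUARD -r "$backup" || return 1

    # Clear the partition; previously guarded components are retried on next boot.
    pflash -P GUARD -c
}
```

For example, run clear_guard /tmp/guard-before-clear.bin, then copy the backup file somewhere persistent if you want to keep it. Keeping the old records can help if a component turns out to be genuinely faulty and gets guarded out again.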

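Note: CPUs (or individual cores) being guarded out might not be a rare occurrence. It has been reported, for example, at https://www.talospace.com/2020/05/the-case-of-disappearing-core.html and http://tenfourfox.blogspot.com/2018/05/a-semi-review-of-raptor-talos-ii.html. This could also mean that the guard mechanism is "dialed in" to err on the side of safety. More insight into the mechanics on this wiki would be appreciated.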