Difference between revisions of "Hostboot Debug Howto"

From RCS Wiki
(Created page with "==WIP== ==How to Debug Hostboot== So, you have an abort or crash early in hostboot, and knowing that hostboot is able to generate stack traces, would prefer not to resort to...")
 
(Instructions for debug printouts.)
 
(5 intermediate revisions by one other user not shown)
Latest revision as of 12:17, 4 March 2025

Enabling Debug Print statements

Throughout the code there are info, warning, debug, and other messages that are not normally printed to the screen. If you build your own PNOR firmware you can configure hostboot to show all of these messages. This WILL slow boot down significantly, so you will not want to run this way normally.

It's recommended to set the scrollback buffer in your terminal extremely high (10,000+ lines). SSH to the BMC and start `obmc-console-client` to watch the boot process.

To enable, edit `openpower/configs/hostboot/talos.config`. Find the line `unset CONSOLE_OUTPUT_TRACE` and change it to `set CONSOLE_OUTPUT_TRACE` before building the PNOR firmware.

See Compiling Firmware to build your own PNOR image.
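
The config change described above can also be scripted. The following is a minimal sketch, not part of op-build: the function name `enable_console_trace` and the option `SOME_OTHER_OPTION` in the sample text are made up for illustration. It only rewrites a line that consists of exactly `unset CONSOLE_OUTPUT_TRACE`.

```python
import re


def enable_console_trace(config_text):
    """Turn the line 'unset CONSOLE_OUTPUT_TRACE' into 'set CONSOLE_OUTPUT_TRACE'."""
    # Match the whole line so that options with a longer name are left alone.
    pattern = re.compile(r"^unset[ \t]+CONSOLE_OUTPUT_TRACE[ \t]*$", re.MULTILINE)
    return pattern.sub("set CONSOLE_OUTPUT_TRACE", config_text)


# SOME_OTHER_OPTION is a hypothetical placeholder, not a real hostboot option.
sample = "set SOME_OTHER_OPTION\nunset CONSOLE_OUTPUT_TRACE\n"
print(enable_console_trace(sample))
```

To apply it for real, read openpower/configs/hostboot/talos.config, pass its text through the function, and write the result back before building.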

WIP

How to Debug Hostboot

So, you have an abort or crash early in hostboot and, knowing that hostboot is able to generate stack traces, would prefer not to resort to shotgun debugging. Unfortunately, the errl utility is not available outside of IBM (see https://github.com/open-power/hostboot/issues/174), so the traditional eSEL.pl and related tools won't work. Fear not: there is still a (convoluted) way to make this work.

Ready?

Grab and decode the HBEL entry

Provided PNOR access is working, hostboot will save the crash information to the HBEL partition. Since we have no tooling to parse this partition, it's easiest to debug a reproducible crash by first wiping the HBEL contents, so that the only record left afterwards is the one from the crash you are chasing:

root@bmc# pflash -P HBEL -c

Now attempt to IPL the system once, and wait for it to shut back down. Issue a chassis off command to make sure it stays down at this point.

Read the HBEL partition into a temporary file:

root@bmc# pflash -P HBEL -r /tmp/hbel.dat

The partition is ECC protected, so the first thing you will need to do is remove every ninth byte (keep the first 8 bytes, remove the ninth, keep the next 8, etc). Once complete, in between HBEL binary record data, you will see something similar to:

TID: 37
Bad Address: 10004abb6
Exception Type: 40000000
Instruction where it occurred: 0x40639fc0
K:Backtrace for 37:
  <-0x366D08<-0x40639D98<-0x4063B788<-0x40643784<-0x406438B8<-0x4063C8F0<-0x4063CA84<-0x4063E624<-0x4064B60C<-0x2714

The most recent faulting instruction address is to the left of the backtrace string, with the innermost instruction address listed separately after "Instruction where it occurred".
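
The ECC removal and backtrace extraction above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions: the ECC bytes are simply discarded (nothing is verified or corrected), the dump length is a multiple of nine bytes, and the backtrace is stored as plain ASCII in the form shown above. The function names `strip_ecc` and `parse_backtrace` are made up for this example.

```python
import re


def strip_ecc(raw):
    """Drop every ninth byte: keep 8 data bytes, skip 1 ECC byte, repeat."""
    return b"".join(raw[i:i + 8] for i in range(0, len(raw), 9))


def parse_backtrace(data):
    """Return the addresses of the first '<-0x...' chain found in data."""
    chain = re.search(rb"(?:<-0x[0-9A-Fa-f]+)+", data)
    if chain is None:
        return []
    addresses = re.findall(rb"<-0x([0-9A-Fa-f]+)", chain.group(0))
    return [int(addr, 16) for addr in addresses]


# Self-check on synthetic data: pad the text to a multiple of 8 bytes, then
# add a dummy "ECC" byte after every 8 data bytes to imitate the layout.
text = b"K:Backtrace for 37:\n  <-0x366D08<-0x40639D98<-0x2714\n"
padded = text + b"\x00" * (-len(text) % 8)
protected = b"".join(padded[i:i + 8] + b"\xff" for i in range(0, len(padded), 8))

recovered = strip_ecc(protected)
print([hex(addr) for addr in parse_backtrace(recovered)])
```

For a real dump, read /tmp/hbel.dat in binary mode and pass its contents through `strip_ecc` before searching it.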

Look up offending symbol in table

By itself the backtrace isn't very useful. However, the normal op-build process produces a symbol table called hbicore.syms. You can find it in talos-op-build/output/host/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/, alongside various other utilities that mostly don't work due to the missing proprietary errl binary.

The format appears to be:

Symbol_Type,Start_Address,Stop_Address,Size,Name

You will need to find the symbol range that contains the faulting address in this table, then you can determine the symbol name.
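
The range search can be automated along these lines. This sketch assumes that the start and stop addresses are hexadecimal, that the stop address is inclusive, and that symbol names may themselves contain commas; none of this is confirmed beyond the format guessed at above. The sample rows and symbol names are invented for illustration and do not come from a real hbicore.syms.

```python
def parse_syms(lines):
    """Parse 'Type,Start,Stop,Size,Name' rows into (start, stop, name) tuples."""
    symbols = []
    for line in lines:
        fields = line.strip().split(",", 4)  # the name field may contain commas
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        _type, start, stop, _size, name = fields
        try:
            symbols.append((int(start, 16), int(stop, 16), name))
        except ValueError:
            continue  # skip rows whose addresses are not hexadecimal
    return symbols


def lookup(symbols, address):
    """Return the name of the first symbol whose range contains address."""
    for start, stop, name in symbols:
        if start <= address <= stop:  # assumption: stop address is inclusive
            return name
    return None


# Hypothetical rows in the assumed format.
sample_rows = [
    "F,00002700,00002740,00000040,hypothetical::task_entry()",
    "F,40639d00,40639e00,00000100,hypothetical::do_work(int, char)",
]
table = parse_syms(sample_rows)
for addr in (0x40639D98, 0x2714):
    print(hex(addr), lookup(table, addr))
```

With a real table, pass the open hbicore.syms file object to `parse_syms` and look up each address from the backtrace in turn.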

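See Also

Debugging Hostboot: https://amboar.github.io/notes/2018/09/03/debugging-hostboot.html

Hacking Hostboot: https://amboar.github.io/notes/2018/08/17/hacking-hostboot.html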