Difference between revisions of "Debricking the BMC"
m (HLandau moved page Talos II/U-Boot Recovery to Debricking the BMC: Not Talos specific anymore)
Revision as of 13:17, 2 March 2019
Purpose
This guide explains how to debrick the BMC when the BMC has been rendered inoperable, for example due to a defective firmware update.
Applicability
All RCS OpenPOWER systems.
Overview
There are three means of debricking the BMC:
- Remove the BMC SPI flash chip and reflash it with a flash programmer
- Flash new BMC firmware via U-Boot TFTP (requires that U-Boot is still intact on the flash)
- Flash new BMC firmware via serial port (requires proprietary BMC chip vendor tool)
Flash new BMC firmware via U-Boot TFTP
Note: While these instructions have been successfully applied in practice, they are still preliminary. Ask questions in IRC if you are unclear on what to do!
If a BMC update fails but U-Boot still works, you can recover by using U-Boot to bootstrap the BMC manually, loading a boot image over the network or the BMC serial port.
If your BMC flash is corrupted to the extent that U-Boot does not load properly, these instructions will not work; you will need to remove and reflash the BMC flash chip externally, or flash new firmware via serial port.
- Prepare a TFTP server, and place image-bmc, image-rofs, and image-kernel in the root.
- Connect a serial console to the BMC serial port (J7701, serial port bracket required). The serial port configuration is 115200,8n1.
- Disconnect and reconnect power to the machine to force a BMC restart. Press a key to interrupt auto-boot when prompted.
- If you are having trouble with U-Boot resetting while you are trying to run these steps, have a slow network, or you are going to be loading over serial, you can disable the FPGA watchdog.
- Run dhcp x.x.x.x:image-bmc, replacing x.x.x.x with the IP address of your TFTP server. This will load a copy of the stock boot image into RAM.
- Run bootm 83080000. This will prepare and boot off the loaded virtual image.
- If your rofs partition is not functional, you will be dropped into the systemd emergency shell at this point. Try both the password you set and the default 0penBmc; it may be one or the other depending on the state of the rwfs partition. If the BMC boots up properly instead of dropping you into the emergency shell, the problem is probably in your kernel partition and you can retry flashing your image-kernel using the normal procedure. (The rest of these instructions are for the systemd emergency shell.)
- mount -t tmpfs none /tmp
- Run udhcpc to get an IP address. (TODO: verify that this is the actual command that you run. Do you have to specify the network interface too?)
- cd /tmp
- tftp -g -r image-rofs x.x.x.x
- tftp -g -r image-kernel x.x.x.x
- IMPORTANT: Use md5sum, sha1sum, or sha256sum to verify successful transfer of image-rofs and image-kernel! TFTP is a very barebones protocol and relies on transport layer checksumming, which is optional and not always available in UDP!
- Verify that the output of cat /sys/class/mtd/mtd3/name is kernel and the output of cat /sys/class/mtd/mtd4/name is rofs. The next step flashes mtd partitions directly, so this is the last chance to verify that the images will be written to the correct partitions.
- flashcp -v image-kernel /dev/mtd3
- flashcp -v image-rofs /dev/mtd4
- (TODO: Describe how to reset rwfs in case it was damaged as well?) Note: the kernel parameter for bypassing rwfs is "overlay-filesystem-in-ram". Append it to the existing boot-args before running the bootm command. This can also be used as part of a password reset procedure.
- After the flash is complete, you can restart the BMC and it should boot successfully.
- (TODO: Discussion of using Kermit to upload the image without network access) note: I (Bdragon) have successfully done a ram-only boot using cu's built in xmodem support (escape sequence ~X) to do an image transfer into RAM over the BMC serial interface.
- (TODO: Discuss using u-boot's built in cmp tool to perform basic validation of the u-boot image against a second copy loaded into RAM.)
- (TODO: Load recovery images over USB?) note: The onboard USB port is connected to the USB switch after all, so this might be problematic.
- (TODO: Discussion of u-boot memory map) Short version: flash lives at 0x20000000 and the default base address for the memory loading tools is 0x83000000. So add 0x63000000 to any flash address to get the equivalent address for an image-bmc file loaded into RAM. For example, the bootable image of a loaded image-bmc is at 0x83080000.
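The address arithmetic in the memory map note can be checked with a few lines of Python. This is only a sketch: the two base addresses come from the note above, while the function name flash_to_ram is made up for illustration, and the flash address 0x20080000 for the bootable image is inferred from the 0x83080000 example rather than stated anywhere on this page.

```python
# Base addresses taken from the memory map note above.
FLASH_BASE = 0x20000000      # where the BMC flash is mapped
RAM_LOAD_BASE = 0x83000000   # default base address of the U-Boot memory loading tools


def flash_to_ram(flash_addr: int) -> int:
    """Map a flash address to the matching address inside an image-bmc
    file that was loaded into RAM at RAM_LOAD_BASE (hypothetical helper)."""
    if flash_addr < FLASH_BASE:
        raise ValueError("address is below the flash base")
    # Equivalent to adding 0x63000000, as described in the note.
    return flash_addr - FLASH_BASE + RAM_LOAD_BASE


# Assumed flash address of the bootable image (inferred from the example).
print(hex(flash_to_ram(0x20080000)))
```

The result for 0x20080000 is the address passed to bootm in the steps above.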
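For the checksum verification step in this procedure, reference values have to come from somewhere. One option, sketched here as an assumption rather than as part of the tested procedure, is to compute SHA-256 digests of the images on the TFTP server and compare them by eye with the output of sha256sum on the BMC. The function names and the idea of passing the TFTP root as a directory are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_checksums(tftp_root, names=("image-rofs", "image-kernel")):
    """Return {file name: digest} for the images found in tftp_root.
    Missing files are skipped instead of raising an error."""
    root = Path(tftp_root)
    return {name: sha256_of(root / name)
            for name in names if (root / name).is_file()}


def as_sha256sum_lines(checksums):
    """Format digests the way sha256sum prints them: digest, two spaces, name."""
    return [f"{digest}  {name}" for name, digest in checksums.items()]
```

Calling as_sha256sum_lines(image_checksums("/path/to/tftp/root")) on the server gives lines that can be compared directly with what sha256sum image-rofs image-kernel prints in /tmp on the BMC.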
Flash new BMC firmware via serial port
This method was discovered by Centurion Dan as an alternative to pulling and reflashing the BMC SPI chip after a failed update had corrupted/wiped U-Boot.
Tools required:
- BMC serial port
- An x86 computer with a serial port (a USB-to-serial adapter works fine), preferably running Linux.
Software:
- Proprietary SOC Flash Utility from Aspeed Technology's Support Page (https://www.aspeedtech.com/support.php): at least version 1.18.00 (http://upload.aspeedtech.com/SOC/v11800.zip)
- BMC Firmware bundle: Talos_II/Firmware Firmware BMC System Package 1.06 2a92dec044239591244b6ed69c3fac162a6b9ea4
Procedure:
- Unzip the SOC Flash Utility archive on your other computer, then unzip the appropriate SOC Flash Utility bundle for that computer.
- Extract the BMC firmware bundle.
- Run the following command: ./socflash -s option=u comport="4" cs=0 if=image-u-boot gpio_b=S71 gpio_a=S70 option=f
  - You can drop option=f for a slower but verified write process.
  - If your serial interface can handle a baud rate of 921600, add the parameter baudrate=921600.
  - If you want to see what is going on, you can trace the tool by prepending strace -e trace=open,close,read,write to the command above.
- Be patient: it took me about 45 minutes to complete the flash process.
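The command variants in the procedure above can be assembled programmatically, which helps avoid typos when toggling options. The sketch below only builds the argument list and does not run anything. Every socflash parameter in it is copied from the command on this page; the function name, its keyword arguments, and the position of baudrate at the end of the command are assumptions made for illustration.

```python
import shlex


def socflash_command(image="image-u-boot", comport="4", fast=True,
                     baudrate=None, trace=False):
    """Build the socflash argument list described in the procedure.
    Hypothetical helper: it never executes the tool."""
    cmd = ["./socflash", "-s", "option=u", f"comport={comport}", "cs=0",
           f"if={image}", "gpio_b=S71", "gpio_a=S70"]
    if fast:
        # Dropping option=f gives the slower but verified write.
        cmd.append("option=f")
    if baudrate is not None:
        # Placement at the end is an assumption; the page only says to add it.
        cmd.append(f"baudrate={baudrate}")
    if trace:
        # Prepend strace as suggested in the procedure.
        cmd = ["strace", "-e", "trace=open,close,read,write"] + cmd
    return cmd


# Print the default command as a shell-ready string.
print(shlex.join(socflash_command()))
```

For example, socflash_command(fast=False, baudrate=921600) yields the verified-write variant at the higher baud rate.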
Notes:
- gpio_b=S71 and gpio_a=S70 are used to turn off the FPGA watchdog timer before the flash process and to re-enable it after the process has completed.