Difference between revisions of "Debricking the BMC/Watchdog"

From RCS Wiki
m (→‎Notes: note the signal name too.)
(fix bug pointed out by awilfox: make sure and return properly so the stack doesn't get damaged.)

Revision as of 16:34, 30 August 2018

Once standby power has been applied to the Talos II (i.e. you have plugged it in), the FPGA waits for the power to stabilize, and then signals the BMC to reset.

At this point, a watchdog counter in the FPGA begins running. It exists because of a hardware issue in the ast2500, where the chip very occasionally gets stuck when attempting to start.[1] If the BMC boot gets stuck in the U-Boot phase, the FPGA will force a reset after approximately 40 seconds. This interferes with BMC recovery, because it leaves only a short window in which to run commands at the U-Boot shell.

U-Boot on the Talos II has been slightly modified to disable this watchdog counter immediately before transferring control to the OpenBMC kernel.[2]

Since we will want to spend more than 40 seconds at the U-Boot shell during recovery, we need to disable this watchdog ourselves.

However, general GPIO support is not built into the Talos II U-Boot loader, so the hardware must be manipulated directly.

Using an ARM cross-compiler, we can build a tiny program that does the same thing as the modified U-Boot code in common/bootm.c.

Payload creation

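/* watchdog.c - minimal code to disable the FPGA watchdog.
 * Derived from Raptor Engineering changes to U-Boot common/bootm.c.
 * SPDX-License-Identifier:	GPL-2.0+
 */
#include <stdint.h>

int main(void) {
  /* Memory-mapped GPIO registers; volatile so the accesses are not optimized away. */
  volatile uint32_t *gpio_ctl_reg = (volatile uint32_t *)0x1e780084;
  volatile uint32_t *gpio_data_reg = (volatile uint32_t *)0x1e780080;

  /* Bit 23 (0x00800000) is the watchdog GPIO: set it in the control
   * register, then clear it in the data register. */
  *gpio_ctl_reg |= 0x00800000;
  *gpio_data_reg &= ~0x00800000;

  /* Return cleanly so that control goes back to U-Boot after "go". */
  return 0;
}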

Compiling this to an object and then converting it to an s-record will give you a file that can be directly loaded into U-Boot without using special tools.

$ arm-linux-gnueabihf-gcc -ffreestanding -march=armv6 -mfloat-abi=soft -marm -Os -c watchdog.c

$ arm-linux-gnueabihf-objcopy -O srec watchdog.o watchdog.srec

$ cat watchdog.srec

S00E0000746F67676C652E7372656394
S11300001C309FE50000A0E3842093E5022582E3F1
S1130010842083E5802093E50225C2E3802083E5E4
S10B00201EFF2FE10000781E11
S9030000FC

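Due to compiler differences, your output may differ slightly from the example s-record code. The S0 header record in the example decodes to toggle.srec, the output file name used when it was generated, so that record will also differ if you name your file watchdog.srec. The example was compiled with arm-linux-gnueabihf-gcc (Debian 6.3.0-18) 6.3.0 20170516.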
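Because the records are pasted by hand into a serial console, it can be worth checking them on the host first. The following is a minimal sketch, not part of the original procedure: it checks that a line has the length its byte-count field claims and that its checksum byte is the one's complement of the sum of the count, address and data bytes, which is the standard S-record rule. The function name srec_line_ok is made up for this illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read two hex digits at p as one byte, or return -1 on bad input. */
static int hex_byte(const char *p)
{
    int hi = hex_digit(p[0]);
    if (hi < 0) return -1;
    int lo = hex_digit(p[1]);
    if (lo < 0) return -1;
    return hi * 16 + lo;
}

/* Hypothetical helper: returns 1 if `line` is a well-formed S-record
 * with a valid checksum, 0 otherwise. Trailing CR/LF/spaces are ignored. */
int srec_line_ok(const char *line)
{
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r' ||
                       line[len - 1] == ' '))
        len--;

    /* Need at least "S", a type digit, and the two-digit byte count. */
    if (len < 4 || line[0] != 'S' || !isdigit((unsigned char)line[1]))
        return 0;

    /* The count covers every byte after it: address, data, checksum. */
    int count = hex_byte(line + 2);
    if (count < 1 || len != 4 + 2 * (size_t)count)
        return 0;

    /* Sum the count and all following bytes, checksum included. For a
     * valid record the low byte of that sum is 0xFF. */
    unsigned sum = (unsigned)count;
    for (int i = 0; i < count; i++) {
        int b = hex_byte(line + 4 + 2 * i);
        if (b < 0) return 0;
        sum += (unsigned)b;
    }
    return (sum & 0xFF) == 0xFF;
}
```

Calling srec_line_ok on each line of watchdog.srec before pasting will catch a truncated or altered record. It does not confirm that the code itself is correct, only that the text is intact.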

  • Copy the srec data to the clipboard. We will shortly need to paste it into the BMC console within the limited watchdog window.

Main Procedure

To load and execute this code, do the following at the ast# shell within the watchdog time window:

  • ast# loads 83000000
  • Paste the contents of the srec file into the terminal when prompted. Some summary data will be shown on screen.

(todo: grab and stick the example output in here.)

Run the code using the go command.

  • ast# go 83000000

(todo: stick the output in here)

The program will run and then return control to U-Boot. At this point, the watchdog has been disabled, and you can take your time with the rest of the recovery commands. The loaded program code is no longer needed and the memory range can be reused.

Notes

  • For reference, the GPIO pin associated with this watchdog is GPIOS7 (GPIO 151, physical pin AA20 on the ast2500). The signal is labelled SEQ_CONT if you are looking at it on the schematics.
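The link between GPIO 151, the name GPIOS7 and the 0x00800000 mask used in watchdog.c can be worked out with a little arithmetic. The sketch below assumes a layout that is not stated in this article: eight pins per lettered bank (A = 0, B = 1, and so on) and four consecutive banks packed into each 32-bit register. The struct and function names are made up for this illustration.

```c
#include <stdint.h>

/* Where a GPIO number lands, under the assumed layout described above. */
struct gpio_loc {
    char bank;      /* bank letter, e.g. 'S' */
    unsigned pin;   /* pin within the bank, 0..7 */
    unsigned bit;   /* bit position within the 32-bit register */
    uint32_t mask;  /* single-bit mask for that position */
};

/* Hypothetical helper: map a linear GPIO number to bank, pin and mask.
 * Only meaningful for banks A..Z (GPIO numbers 0..207). */
struct gpio_loc gpio_locate(unsigned gpio)
{
    struct gpio_loc loc;
    unsigned bank_index = gpio / 8;            /* 151 / 8 = 18 */

    loc.bank = (char)('A' + bank_index);       /* 18 -> 'S' */
    loc.pin = gpio % 8;                        /* 151 % 8 = 7 */
    loc.bit = (bank_index % 4) * 8 + loc.pin;  /* third bank of Q/R/S/T */
    loc.mask = (uint32_t)1 << loc.bit;
    return loc;
}
```

For GPIO 151 this gives bank S, pin 7 and bit 23, which matches the mask that watchdog.c sets in the control register and clears in the data register.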