Debricking the BMC/Watchdog

From RCS Wiki
Latest revision as of 21:32, 6 March 2020

Once standby power has been applied to the Talos II (i.e. you have plugged it in), the FPGA waits for the power to stabilize, and then signals the BMC to reset.

At this point, a watchdog counter in the FPGA begins running. It exists because of a hardware issue in the ast2500, where very occasionally the chip would get stuck when attempting to start.[https://git.raptorcs.com/git/talos-system-fpga/commit/main.v?id=e90ca898402a250e9d2f6e303e25ddaceb0cf8d6] If the BMC boot gets stuck in the U-Boot phase, the FPGA will force a reset after approximately 40 seconds. This interferes with BMC recovery, because it leaves you only a limited window in which to run commands.

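U-Boot on the Talos II has been slightly modified to disable this watchdog counter immediately before transferring control to the OpenBMC kernel.[https://git.raptorcs.com/git/talos-obmc-uboot/tree/common/bootm.c?id=cfee45a3ef2a592d130573ce3b7d8bcfe056060b#n707]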

Since we will want to spend more than 40 seconds at the U-Boot shell during recovery, we need to disable this watchdog ourselves.

However, general GPIO support is not built into the Talos II U-Boot loader, so the hardware must be manipulated directly.

Using an ARM cross-compiler, we can build a tiny program to do the same thing as the bootm.c code.

Payload creation

For Talos II:

/* watchdog.c - minimal code to disable the FPGA watchdog.
 * Derived from Raptor Engineering changes to U-Boot common/bootm.c.
 * SPDX-License-Identifier:	GPL-2.0+
 */
#include <stdint.h>
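int main(void) {
  /* Registers and bit used for the SEQ_CONT signal (GPIOS7) on Talos II.
   * volatile keeps the compiler from dropping or reordering the accesses. */
  volatile uint32_t *gpio_ctl_reg = (volatile uint32_t *)0x1e780084;
  volatile uint32_t *gpio_data_reg = (volatile uint32_t *)0x1e780080;

  *gpio_ctl_reg |= 0x00800000;   /* set bit 23 in the control register */
  *gpio_data_reg &= ~0x00800000; /* clear bit 23 in the data register */
  return 0;
}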

For Blackbird:

/* watchdog.c - minimal code to disable the FPGA watchdog.
 * Derived from Raptor Engineering changes to U-Boot common/bootm.c.
 * SPDX-License-Identifier:	GPL-2.0+
 */
#include <stdint.h>
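int main(void) {
  /* Registers and bit used for the BMC_BOOT_PHASE signal (GPIOG0) on Blackbird.
   * volatile keeps the compiler from dropping or reordering the accesses. */
  volatile uint32_t *gpio_ctl_reg = (volatile uint32_t *)0x1e780024;
  volatile uint32_t *gpio_data_reg = (volatile uint32_t *)0x1e780020;

  *gpio_ctl_reg |= 0x00010000;   /* set bit 16 in the control register */
  *gpio_data_reg &= ~0x00010000; /* clear bit 16 in the data register */
  return 0;
}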
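
The two payloads differ only in the register pair and in the mask that selects the pin. As a rough sketch of where the masks come from, the helper below assumes that each 32-bit GPIO register packs four 8-pin banks in alphabetical order, which is consistent with both constants used in this guide. The name gpio_bit_mask is made up for illustration, and only single-letter banks are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit mask of a pin within its 32-bit GPIO register.
 * Assumes banks A-Z are packed four to a register, eight pins per bank. */
static uint32_t gpio_bit_mask(char bank, unsigned pin) {
  assert(bank >= 'A' && bank <= 'Z');
  assert(pin < 8);
  unsigned bank_index = (unsigned)(bank - 'A'); /* A=0, B=1, ... */
  unsigned bit = (bank_index % 4) * 8 + pin;    /* position inside the register */
  return (uint32_t)1 << bit;
}
```

With this assumption, gpio_bit_mask('S', 7) gives 0x00800000 and gpio_bit_mask('G', 0) gives 0x00010000, the same values as in the Talos II and Blackbird payloads.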

Compiling this to an object file and then converting it to an S-record gives you a file that can be loaded directly into U-Boot without special tools.

$ arm-linux-gnueabihf-gcc -ffreestanding -march=armv6 -mfloat-abi=soft -marm -Os -c watchdog.c

$ arm-linux-gnueabihf-objcopy -O srec watchdog.o watchdog.srec

$ cat watchdog.srec

For Talos II:

S01000007761746368646F672E73726563C3
S11300001C309FE50000A0E3842093E5022582E3F1
S1130010842083E5802093E50225C2E3802083E5E4
S10B00201EFF2FE10000781E11
S9030000FC

For Blackbird:

S01000007761746368646F672E73726563C3
S11300001C309FE50000A0E3242093E5012882E34F
S1130010242083E5202093E50128C2E3202083E502
S10B00201EFF2FE10000781E11
S9030000FC

Due to compiler differences, your output may differ slightly from the example S-record data shown here. The example was compiled with arm-linux-gnueabihf-gcc (Debian 8.3.0-2) 8.3.0.

  • Copy the srec data to the clipboard. You will need to paste it into the BMC console shortly, within the limited watchdog time window.
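
Since the data is pasted by hand under time pressure, it may be worth checking it on the host first. The sketch below validates a single S-record line using the standard S-record rule that the checksum is the ones' complement of the low byte of the sum of the count, address, and data bytes. The function names hex_value and srec_line_ok are hypothetical and not part of any existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  c = (char)toupper((unsigned char)c);
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: return 1 if `line` is a well-formed S-record whose
 * count and checksum match, 0 otherwise. The line must not include a newline. */
static int srec_line_ok(const char *line) {
  assert(line != NULL);
  size_t len = strlen(line);
  /* "S", a type digit, then whole bytes written as pairs of hex digits. */
  if (len < 6 || len % 2 != 0 || line[0] != 'S' ||
      !isdigit((unsigned char)line[1]))
    return 0;

  size_t nbytes = (len - 2) / 2; /* count + address + data + checksum */
  unsigned sum = 0;
  unsigned char count = 0, last = 0;
  for (size_t i = 0; i < nbytes; i++) {
    int hi = hex_value(line[2 + 2 * i]);
    int lo = hex_value(line[3 + 2 * i]);
    if (hi < 0 || lo < 0) return 0;
    unsigned char byte = (unsigned char)(hi * 16 + lo);
    if (i == 0) count = byte;
    if (i + 1 < nbytes) sum += byte; /* everything except the checksum */
    else last = byte;                /* the checksum byte itself */
  }
  /* The count byte covers every byte after itself, checksum included. */
  if ((size_t)count + 1 != nbytes) return 0;
  return last == (unsigned char)(~sum & 0xFF);
}
```

For example, srec_line_ok("S9030000FC") returns 1 for the terminator record shown above, while a line with a corrupted or truncated tail returns 0.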

Main Procedure

To load and execute this code, do the following at the ast# shell within the watchdog time window:

  • ast# loads 83000000
  • Paste the contents of the srec file into the terminal when prompted. Some summary data will be shown on screen.

(todo: grab and stick the example output in here.)

Run the code using the go command.

  • ast# go 83000000

(todo: stick the output in here)

The program will run and then return control to U-Boot. At this point, the watchdog has been disabled, and you can take your time with the rest of the recovery commands. The loaded program code is no longer needed and the memory range can be reused.

Notes

  • For reference, the GPIO pin associated with this watchdog on Talos II / Lite is GPIOS7 (GPIO 151, physical pin AA20 on the ast2500). The signal is labelled SEQ_CONT on the schematics.
  • On Blackbird, this pin is instead GPIOG0 (GPIO ???, physical pin A19 on the ast2500). The signal is labelled BMC_BOOT_PHASE on the schematics.
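
The ordinal number quoted for GPIOS7 is consistent with a simple scheme of eight pins per bank, with banks counted from A = 0. The sketch below applies that scheme. It is an assumption inferred from the Talos II figure rather than something confirmed for Blackbird, and the name gpio_ordinal is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: ordinal GPIO number for a single-letter bank and pin,
 * assuming eight pins per bank and banks numbered A=0, B=1, ... */
static unsigned gpio_ordinal(char bank, unsigned pin) {
  assert(bank >= 'A' && bank <= 'Z');
  assert(pin < 8);
  return (unsigned)(bank - 'A') * 8 + pin;
}
```

gpio_ordinal('S', 7) yields 151, matching the Talos II note. Under the same assumption, gpio_ordinal('G', 0) works out to 48 for the Blackbird pin, though that value has not been checked against Blackbird hardware or schematics.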