Add GPU Firmware To BOOTKERNFW

From RCS Wiki

Purpose

This guide explains how to add GPU (or other) firmware files to the BOOTKERNFW partition of the boot firmware flash. This allows Linux kernel drivers which require firmware blobs to function correctly within the Skiroot bootloader environment.

For example, the amdgpu driver requires AMD firmware blobs to bring up AMD GPUs. Copying these files into the firmware partition enables attached displays to be brought up when showing the boot menu.

Background

Many cards require firmware to be loaded at startup in order to initialize. On an x86 system, this is usually done by bootstrap code called an "Option ROM", or by extra code in the BIOS.

On OpenPOWER, this is not possible: Option ROM code is platform-specific, and the OpenPOWER security stance is that trusting arbitrary bootstrap code stored on cards is a bad idea.

Therefore, for devices such as video cards to work, the card's firmware must be loaded before the card can be used. The Linux video drivers in Petitboot can do this loading, but the firmware itself must be available so that the drivers have something to load.

So, on RCS OpenPOWER systems, there is a special area of flash set aside for user-provided firmware files, called BOOTKERNFW. Due to the limited size of this partition, the user is required to make decisions as to what firmware files to install.

The exact size of BOOTKERNFW may vary depending on your firmware version, but the standard size is currently 1,966,080 bytes (0x1E0000, or 1.875 MiB).
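
As a quick check on those figures, shell arithmetic can convert the hexadecimal size to bytes. This is only a sketch; the variable names are illustrative, and the value is the standard size quoted above, which may not match your firmware version.

```shell
# Size of BOOTKERNFW as quoted above (may differ on your firmware version).
part_size_hex=0x1E0000

# Shell arithmetic accepts hexadecimal constants directly.
part_size_bytes=$(($part_size_hex))
echo "$part_size_bytes bytes"

# Integer division gives the number of whole KiB in the partition.
part_size_kib=$((part_size_bytes / 1024))
echo "$part_size_kib KiB"
```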

Applicability

All RCS OpenPOWER systems.

Instructions

Step 1. Generate firmware partition image

Boot into an OS. A Linux environment is assumed. You will need the mksquashfs tool available.

On Debian, you can install mksquashfs with the following command (package installation requires root privileges):

$ sudo apt install squashfs-tools

Create a directory with the required firmware files available:

$ mkdir /tmp/firmware
$ # ... (copy required files into /tmp/firmware) ...

The directory structure under /tmp/firmware is what Skiroot will mount at /lib/firmware, so it must mirror the layout the drivers expect. Here is an example of the required directory structure:

/tmp/firmware/radeon/PITCAIRN_pfp.bin
/tmp/firmware/amdgpu/polaris10_mc.bin

You can obtain these firmware files from most Linux distros (they tend to be installed in /lib/firmware), or from the linux-firmware repository.
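
Copying files by hand is fine for one or two blobs. For a whole GPU family, a small helper can copy matching files while keeping the subdirectory layout. The following is a minimal sketch, not a definitive tool: the function name stage_firmware and the example pattern are hypothetical, and it assumes the firmware is present as regular files under the source directory.

```shell
# stage_firmware SRC DST PATTERN
# Copy regular files under SRC whose relative path matches the shell PATTERN
# into DST, keeping the same subdirectory layout (e.g. amdgpu/..., radeon/...).
stage_firmware() {
    src=$1 dst=$2 pattern=$3
    # List matching files relative to SRC, then copy them one by one.
    ( cd "$src" && find . -type f -path "./$pattern" ) | while IFS= read -r rel; do
        rel=${rel#./}
        mkdir -p "$dst/$(dirname "$rel")"
        cp "$src/$rel" "$dst/$rel"
    done
}

# Example (the pattern is an assumption; choose the files your GPU needs):
# stage_firmware /lib/firmware /tmp/firmware 'amdgpu/polaris10_*'
```

Because the partition is small, a narrow pattern is usually a better choice than copying an entire driver directory.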

Having generated the correct directory structure, generate the image:

$ cd /tmp/firmware
$ mksquashfs * /tmp/firmware.bin -all-root -keep-as-directory

firmware.bin is now a SquashFS image which you will copy to a firmware flash partition. You will need to access this image while in the Skiroot environment, so you may wish to copy it to e.g. /boot or another partition which you can easily access from Skiroot.
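
Since BOOTKERNFW is small, it is worth confirming that the image fits before flashing it. A minimal sketch follows; the function name image_fits is hypothetical, and the limit shown in the usage comment is the standard partition size from the Background section, which may differ on your firmware version.

```shell
# image_fits IMAGE LIMIT_BYTES
# Exit status 0 if IMAGE is no larger than LIMIT_BYTES, 1 if it is larger,
# 2 if IMAGE does not exist.
image_fits() {
    [ -f "$1" ] || return 2
    # wc -c reports the file size in bytes; $(( )) normalizes any padding.
    size=$(($(wc -c < "$1")))
    [ "$size" -le "$2" ]
}

# Example with the standard size of 1966080 bytes:
# image_fits /tmp/firmware.bin 1966080 && echo "fits" || echo "too large"
```

If the image is too large, remove firmware files you do not need from /tmp/firmware and regenerate it.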

Step 2. Copy firmware partition image to flash

Method 1: From Petitboot

Reboot into the Skiroot environment, attaching a display using the onboard VGA or HDMI port if necessary. When the Petitboot bootloader appears, select “Exit to Shell”.

Locate the firmware.bin file you prepared earlier. Petitboot mounts all filesystems it can recognize in folders under /var/petitboot/mnt/dev/, named after disk device partitions. In this example, we will assume that the file was at the path /var/petitboot/mnt/dev/nvme0n1p2/boot/firmware.bin.

# ls /var/petitboot/mnt/dev/
# ls /var/petitboot/mnt/dev/nvme0n1p2/boot/

Use the pflash tool to flash firmware.bin to the BOOTKERNFW partition. If you are on an old PNOR version that does not have pflash available, use one of the other flashing methods instead.

# pflash -P BOOTKERNFW -e -p /var/petitboot/mnt/dev/nvme0n1p2/boot/firmware.bin
About to erase 0x03e10000..0x03ff0000 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:

The pflash tool will prompt you to confirm that you are about to modify your HOST flash chip content. Answer "yes" and press Enter.

Erasing...
[==================================================] 100% ETA:0s     
About to program "bootkern.img" at 0x03e10000..0x03ff0000 !
Programming & Verifying...
[==================================================] 100% ETA:0s     
Updating actual size in partition header...

Reboot the system:

# reboot

When the system reboots, exit into the shell at the Petitboot menu again and check to see if the firmware made it as expected:

# ls /lib/firmware/
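
Rather than scanning the ls output by eye, the check can be scripted. This is a minimal sketch that assumes the shell in use supports POSIX shell functions; the function name missing_firmware is hypothetical, and the file names in the usage comment are the examples from Step 1.

```shell
# missing_firmware ROOT FILE...
# Print each FILE (relative to ROOT) that does not exist.
# Exit status is 0 if every file is present, 1 otherwise.
missing_firmware() {
    root=$1
    shift
    rc=0
    for f in "$@"; do
        if [ ! -e "$root/$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example, using the file names from Step 1:
# missing_firmware /lib/firmware radeon/PITCAIRN_pfp.bin amdgpu/polaris10_mc.bin
```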

Method 2: From the BMC

On your local machine, use scp to copy firmware.bin to /tmp/ on the BMC.

$ scp /tmp/firmware.bin root@bmc-ip-address:/tmp/firmware.bin

Power off the host (optional, but recommended).

Log into the BMC on port 22.

$ ssh root@bmc-ip-address

Use the pflash tool to flash the firmware.bin to the BOOTKERNFW partition.

# pflash -P BOOTKERNFW -e -p /tmp/firmware.bin
About to erase 0x03e10000..0x03ff0000 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:

The pflash tool will prompt you to confirm that you are about to modify your HOST flash chip content. Answer "yes" and press Enter.

Erasing...
[==================================================] 100% ETA:0s     
About to program "bootkern.img" at 0x03e10000..0x03ff0000 !
Programming & Verifying...
[==================================================] 100% ETA:0s     
Updating actual size in partition header...

Power the host back up.

On the host, exit into the shell at the Petitboot menu again and check to see if the firmware made it as expected:

# ls /lib/firmware/

Method 3: Old Petitboot method, DANGEROUS

WARNING: This method is error-prone and should only be done from the petitboot shell, NEVER on the BMC.

Reboot into the Skiroot environment, attaching a display using the onboard VGA or HDMI port if necessary. When the Petitboot bootloader appears, select “Exit to Shell”.

IMPORTANT: ACCIDENTALLY PERFORMING THESE INSTRUCTIONS ON THE BMC INSTEAD OF THE PETITBOOT CONSOLE MAY DAMAGE YOUR BMC FIRMWARE!

Make sure you can see the BOOTKERNFW partition (the output should be similar to: mtd5: 000e0000 00010000 "BOOTKERNFW"):

# cat /proc/mtd | grep BOOTKERNFW
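
Because writing to the wrong MTD device is the main risk of this method, it can help to derive the device name and size from /proc/mtd instead of assuming mtd5. The following is a sketch under stated assumptions: the function names are hypothetical, /proc/mtd is assumed to use the line format shown above, and sed is assumed to be available in the shell you are using.

```shell
# mtd_for_label LABEL [TABLE]
# Print the device name (e.g. mtd5) whose quoted name equals LABEL.
# TABLE defaults to /proc/mtd.
mtd_for_label() {
    sed -n "s/^\(mtd[0-9][0-9]*\): .* \"$1\"\$/\1/p" "${2:-/proc/mtd}"
}

# mtd_size_bytes LABEL [TABLE]
# Print the size of that partition in bytes (the first hex column).
mtd_size_bytes() {
    hex=$(sed -n "s/^mtd[0-9][0-9]*: \([0-9a-fA-F]*\) .* \"$1\"\$/\1/p" "${2:-/proc/mtd}")
    [ -n "$hex" ] && echo $((0x$hex))
}

# Example:
# mtd_for_label BOOTKERNFW     # prints the device name, if the label exists
# mtd_size_bytes BOOTKERNFW    # prints the partition size in bytes
```

If mtd_for_label prints nothing, the partition is not visible and you should not continue with this method.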

Find the flash_erase binary (in this example it was in /var/petitboot/mnt/dev/nvme0n1p2/usr/sbin/):

# find / -name flash_erase

Erase /dev/mtd5:

# /var/petitboot/mnt/dev/nvme0n1p2/usr/sbin/flash_erase /dev/mtd5 0 0

Flash /dev/mtd5:

# dd if=/var/petitboot/mnt/dev/nvme0n1p2/boot/firmware.bin of=/dev/mtd5 bs=64k

Reboot the system:

# reboot

When the system reboots, exit into the shell at the Petitboot menu again and check to see if the firmware made it as expected:

# ls /lib/firmware/

Notes

If you still get an error message about the VGA BIOS not being found, it may be because your GPU is being initialized before /lib/firmware is mounted. You can check this by running dmesg from the Skiroot shell. For example, my GPU tried to initialize at around T+6.5 seconds, but the mtd device was not mounted until T+7.25. I was able to load the firmware successfully after /dev/mtd5 was mounted by running rmmod amdgpu && modprobe amdgpu from the shell. However, I experienced stability issues with amdgpu in Debian Stretch. If I let Skiroot fail to load the firmware and boot anyway, the radeon driver works fine.

e.g.

$ dmesg
...
[    6.412518] [drm] radeon kernel modesetting enabled.
[    6.412848] pci 0033:00:00.0: enabling device (0105 -> 0107)
[    6.412867] radeon 0033:01:00.0: enabling device (0140 -> 0142)
[    6.413077] [drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6819 0x1043:0x0431 0x00).
[    6.413086] radeon: No suitable DMA available
[    6.413150] [drm:radeon_device_init] *ERROR* Unable to find PCI I/O BAR
[    6.533321] [drm:radeon_atombios_init] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO
[    6.533415] ATOM BIOS: 6819.15.17.0.0.AS01
[    6.533434] [drm] GPU not posted. posting now...
[    6.541203] radeon 0033:01:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
[    6.541208] radeon 0033:01:00.0: GTT: 2048M 0x0000000080000000 - 0x00000000FFFFFFFF
[    6.541211] [drm] Detected VRAM RAM=2048M, BAR=256M
[    6.541213] [drm] RAM width 256bits DDR
[    6.541295] [TTM] Zone  kernel: Available graphics memory: 50118240 kiB
[    6.541297] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    6.541299] [TTM] Initializing pool allocator
[    6.541371] [drm] radeon: 2048M of VRAM memory ready
[    6.541374] [drm] radeon: 2048M of GTT memory ready.
[    6.541390] [drm] Loading pitcairn Microcode
[    6.541429] radeon 0033:01:00.0: Direct firmware load for radeon/pitcairn_pfp.bin failed with error -2
[    6.541456] radeon 0033:01:00.0: Direct firmware load for radeon/PITCAIRN_pfp.bin failed with error -2
[    6.541459] si_cp: Failed to load firmware "radeon/PITCAIRN_pfp.bin"
[    6.541560] [drm:si_init] *ERROR* Failed to load firmware!
[    6.541653] radeon 0033:01:00.0: Fatal error during GPU init
[    6.541772] [drm] radeon: finishing device.
...
[    7.123614] 6 ofpart partitions found on MTD device flash
[    7.123617] Creating 6 MTD partitions on "flash":
[    7.123621] 0x000000000000-0x000004000000 : "PNOR"
[    7.123773] 0x000001b21000-0x000003a21000 : "BOOTKERNEL"
[    7.123868] 0x000003b44000-0x000003b68000 : "CAPP"
[    7.123961] 0x000003b88000-0x000003b89000 : "VERSION"
[    7.124056] 0x000003b89000-0x000003bc9000 : "IMA_CATALOG"
[    7.124149] 0x000003f10000-0x000003ff0000 : "BOOTKERNFW"
...
$ rmmod amdgpu && modprobe amdgpu
$ dmesg
...
[ 2727.836343] [drm] amdgpu kernel modesetting enabled.
[ 2727.836900] amdgpu 0033:01:00.0: SI support provided by radeon.
[ 2727.836905] amdgpu 0033:01:00.0: Use radeon.si_support=0 amdgpu.si_support=1 to override.
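
In a log like the one above, the lines that matter are the "Direct firmware load for ... failed" messages; error -2 is ENOENT, meaning the file was not found at that moment. A small filter can list exactly which files the kernel asked for. This is a minimal sketch; the function name failed_firmware is hypothetical.

```shell
# failed_firmware
# Read kernel log text on stdin and print the unique firmware paths taken
# from "Direct firmware load for <path> failed" lines.
failed_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed.*/\1/p' | sort -u
}

# Usage from the Skiroot or OS shell:
# dmesg | failed_firmware
```

The paths printed are relative to /lib/firmware, so they map directly onto the directory structure created in Step 1.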

Backing up BOOTKERNFW

If you are using one of the prebuilt workstations that came with a GPU card preinstalled, you may wish to back up your BOOTKERNFW partition so that you have a known-good version to return to. These systems ship with firmware preloaded into BOOTKERNFW, and removing it may cause the graphical display to stop working until you reload appropriate firmware.

This backup can be taken from either the BMC or the Petitboot shell. The procedure is the same on either.

# cd /tmp
# pflash -P BOOTKERNFW -r BOOTKERNFW.bin

If you took the backup on the BMC, copy it from another machine to a permanent location:

$ scp root@bmc-ip-address:/tmp/BOOTKERNFW.bin .
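
Before relying on the backup, you may want to confirm that it actually contains a SquashFS image rather than an erased partition. The sketch below checks for the 4-byte SquashFS magic "hsqs" at the start of the file. It assumes the partition contents are stored raw from offset 0, as the dd-based method on this page implies; the function name looks_like_squashfs is hypothetical.

```shell
# looks_like_squashfs FILE
# Succeeds if FILE begins with the SquashFS magic bytes "hsqs".
looks_like_squashfs() {
    # Read the first 4 bytes; NUL bytes are dropped so the shell can hold them.
    magic=$(head -c 4 "$1" 2>/dev/null | tr -d '\0')
    [ "$magic" = "hsqs" ]
}

# Example:
# looks_like_squashfs BOOTKERNFW.bin && echo "SquashFS image found"
```

A failed check does not prove the backup is bad, only that it does not start with a SquashFS header.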

Erasing BOOTKERNFW

If you change your mind and decide to stop using BOOTKERNFW, you can erase it from either the BMC or the Petitboot shell. The procedure is the same on either.

# pflash -P BOOTKERNFW -e
About to erase 0x03e10000..0x03ff0000 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:

The pflash tool will prompt you to confirm that you are about to modify your HOST flash chip content. Answer "yes" and press Enter.

Erasing...
[==================================================] 100% ETA:0s     
Updating actual size in partition header...

At this point you are back to an empty BOOTKERNFW partition.

See also