Troubleshooting/GPU

From RCS Wiki
Revision as of 02:46, 3 February 2018 by SiteAdmin

Background

Because OpenPOWER systems have no legacy graphics interface to fall back to, they rely heavily on the running operating system and its drivers to handle display tasks, and this exposes a few rough edges. This page documents the current status of these rough edges and suggests workarounds until actual fixes are available.

Common Issues

Xorg will not start / crashes when a discrete GPU is installed

Installing more than one GPU in an OpenPOWER system (for instance, when adding a discrete GPU) exposes all GPUs directly to the operating system; unlike on x86, there is no concept of a "primary" GPU. Xorg does not handle this gracefully and tends to crash during autoconfiguration. At least one bug report has been filed, but fixing the root cause of this issue (Xorg binding the wrong drivers to the underlying DRM devices) does not appear to be an Xorg priority.

Fortunately, the workaround is fairly simple: explicitly assign an Xorg driver to each installed GPU. This example shows how to fix Xorg on Debian with an AMD WX7100 discrete GPU installed.

Step 1: Locate Bus Numbers

root@talos:~# lspci | grep VGA
0000:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100]
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)

Note the number to the left of each "VGA compatible controller" string. This is the PCI d:B:D:F[1] address of that GPU, and it is determined by the slot the GPU is installed in. As a result, your bus numbers may differ from those shown in this example; always use your own bus IDs in the steps that follow. This also means that if you move a GPU to a different slot, you will need to update the bus ID associated with that GPU.
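If you only want the addresses themselves, the minimal sketch below filters them out of the lspci output. It assumes a POSIX shell with awk, and the function name vga_addresses is hypothetical (it is not part of pciutils). Running lspci with -D forces it to print the PCI domain even on machines that only have domain 0, where the domain would otherwise be omitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of pciutils): reads `lspci` output on stdin
# and prints the address (first field) of each VGA compatible controller.
vga_addresses() {
    awk '/VGA compatible controller/ { print $1 }'
}
```

After defining the function in your shell (for example by sourcing the file), running lspci -D | vga_addresses on the example system above should print 0000:01:00.0 and 0005:02:00.0, one per line.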

Step 2: Create Xorg Configuration Snippet

root@talos:~# mkdir -p /etc/X11/xorg.conf.d

Create and open /etc/X11/xorg.conf.d/21-gpu-driver.conf for editing, then adjust the following template with your GPU information. Pay close attention to the BusID and Driver fields, as they must match your installed GPU(s). Xorg expects BusID values in a different form from the addresses lspci prints. First, Xorg uses decimal numbering, not hexadecimal like lspci, so you will need to convert each field of the lspci address to decimal in order to construct a valid Xorg BusID. Second, Xorg does not use leading zeroes like lspci does; these must be stripped off when assembling the Xorg BusID. Finally, Xorg expects a BusID assembled as "PCI:B@d:D:F" (note that Bus and Domain are swapped), not in the d:B:D.F format shown by lspci. For example, the lspci address 0005:02:00.0 becomes "PCI:2@5:0:0", and an address such as 0000:0a:00.0 would become "PCI:10@0:0:0".

# AST2500
Section "Device"
    Identifier     "GPU0"
    Driver         "modesetting"
    BusID          "PCI:2@5:0:0"
    VendorName     "ASpeed Corporation"
EndSection

# WX7100
Section "Device"
    Identifier     "GPU1"
    Driver         "amdgpu"
    BusID          "PCI:1@0:0:0"
    VendorName     "AMD Corporation"
EndSection
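The BusID conversion rules described above can be sketched as a small shell function. This is a hypothetical helper (it is not part of Xorg or pciutils), and it assumes a POSIX shell whose printf accepts hexadecimal constants with a 0x prefix, as the POSIX printf utility specifies.

```shell
#!/bin/sh
# Hypothetical helper: convert an lspci address ("DDDD:BB:DD.F", hex)
# into an Xorg BusID ("PCI:Bus@Domain:Device:Function", decimal).
lspci_to_xorg_busid() {
    addr=$1
    case $addr in
        *:*:*) ;;                # address already includes a domain
        *) addr="0000:$addr" ;;  # lspci omits the domain on domain-0-only machines
    esac
    domain=${addr%%:*}   # "0005:02:00.0" -> "0005"
    rest=${addr#*:}      #                -> "02:00.0"
    bus=${rest%%:*}      #                -> "02"
    rest=${rest#*:}      #                -> "00.0"
    dev=${rest%%.*}      #                -> "00"
    func=${rest#*.}      #                -> "0"
    # The 0x prefix makes printf parse each field as hex; %d prints it in
    # decimal without leading zeroes. Bus and Domain swap places here.
    printf 'PCI:%d@%d:%d:%d\n' "0x$bus" "0x$domain" "0x$dev" "0x$func"
}
```

For example, lspci_to_xorg_busid 0005:02:00.0 prints PCI:2@5:0:0 and lspci_to_xorg_busid 0000:01:00.0 prints PCI:1@0:0:0, matching the BusID lines in the template. The 0x prefix matters: without it, a field such as 08 would be interpreted as an (invalid) octal constant rather than as hexadecimal.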

Save and exit the configuration snippet file, then restart Xorg. Your GPUs should now function as intended.

  1. PCI Domain:Bus:Device:Function