Hey guys,
I have a Surface Book, and to make the dGPU work properly I have to use a custom kernel from the Surface Linux community.
I was using the dGPU without any problems on Arch, but I decided to make the change and try Fedora.
To make it work on Arch, there is an additional setup step after installing the custom kernel, which I discovered relates specifically to my Surface Book model.
It consists of creating the following script and the corresponding systemd service:
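#!/bin/sh
# Power on the dGPU, then rescan the PCI bus so the device shows up.
echo 1 | tee /sys/bus/platform/devices/MSHW0041:00/dgpu_power
echo 1 > /sys/bus/pci/rescan
# Write config registers 0x6a (byte) and 0x04 (word) via direct hardware access (-H1).
setpci -H1 -s 01:00.0 6a.b=81
setpci -H1 -s 01:00.0 4.w=0407
echo 1 > /sys/bus/pci/rescan
# Read both registers back so the values appear in the journal.
setpci -s 01:00.0 4.w
setpci -s 01:00.0 6a.b
# Load the NVIDIA kernel module.
modprobe nvidia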
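For anyone wondering what the value 0407 written to offset 4 means: offset 0x04 is the standard PCI command register, and the sketch below decodes the bits that are set in it. The helper name decode_pci_command is made up for illustration and is not part of pciutils. The meaning of offset 0x6a is device-specific and is not decoded here.

```shell
#!/bin/sh
# Sketch: decode a PCI command register value (config offset 0x04) into flag names.
# decode_pci_command is a hypothetical helper, not a pciutils command.
# Argument: the register value as hex digits without a 0x prefix, e.g. 0407.
decode_pci_command() {
    val=$((0x$1))
    out=""
    [ $((val & 0x001)) -ne 0 ] && out="$out io-space"
    [ $((val & 0x002)) -ne 0 ] && out="$out memory-space"
    [ $((val & 0x004)) -ne 0 ] && out="$out bus-master"
    [ $((val & 0x400)) -ne 0 ] && out="$out intx-disable"
    # Strip the leading space before printing.
    echo "${out# }"
}

decode_pci_command 0407   # prints: io-space memory-space bus-master intx-disable
```

So the script enables I/O and memory decoding plus bus mastering on the dGPU, and disables legacy INTx interrupts.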
and the service:
[Unit]
Description=Nvidia GPU initialization
Before=display-manager.service
[Service]
Type=oneshot
ExecStart=/usr/bin/dgpu.sh
ExecStartPre=/bin/sleep 10
[Install]
WantedBy=multi-user.target
After enabling it (systemctl enable dgpu.service), I usually go on to install DKMS or an equivalent. I learned that on Fedora the equivalent is akmods.
Unfortunately, I’m struggling a lot to make the NVIDIA driver work. Let me explain what I’ve done.
So I followed the steps in the RPM Fusion docs and, after enabling the nonfree repo, installed akmod-nvidia and xorg-x11-drv-nvidia-cuda. Then I waited a few minutes and rebooted, but the module still wasn’t loaded.
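Since akmods compiles the module against the headers of the target kernel, one thing worth checking at this point is whether the matching headers package is installed. The sketch below only guesses the package name from the kernel release string. It assumes the Surface kernel ships its headers as kernel-surface-devel (an assumption, not something confirmed above), and devel_pkg_for is a made-up helper name.

```shell
#!/bin/sh
# Sketch: guess which kernel headers package akmods needs for a given kernel.
# ASSUMPTION: linux-surface kernels ship headers as "kernel-surface-devel",
# while stock Fedora kernels use "kernel-devel".
# devel_pkg_for is a hypothetical helper name.
devel_pkg_for() {
    case "$1" in
        *.surface.*) echo "kernel-surface-devel" ;;
        *)           echo "kernel-devel" ;;
    esac
}

kver="5.19.4-1.surface.fc36.x86_64"   # kernel release reported by uname -r in this post
echo "headers package to check: $(devel_pkg_for "$kver")"

# On the real machine one might then run (not executed here):
#   rpm -q "$(devel_pkg_for "$(uname -r)")"
#   sudo akmods --force --kernels "$(uname -r)"
```

If the headers package is missing, the akmods build for that kernel would not be expected to succeed, regardless of how long one waits before rebooting.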
I can actually see a problem with the service I need to use: it doesn’t seem to be able to find the nvidia module. This is what systemctl status dgpu.service prints:
× dgpu.service - Nvidia GPU initialization
Loaded: loaded (/etc/systemd/system/dgpu.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2022-08-31 20:48:15 -03; 2min 58s ago
Process: 732 ExecStartPre=/bin/sleep 1 (code=exited, status=0/SUCCESS)
Process: 992 ExecStart=/usr/local/bin/dgpu.sh (code=exited, status=1/FAILURE)
Main PID: 992 (code=exited, status=1/FAILURE)
CPU: 26ms
Aug 31 20:48:13 fedora systemd[1]: Starting dgpu.service - Nvidia GPU initialization...
Aug 31 20:48:14 fedora dgpu.sh[999]: 1
Aug 31 20:48:15 fedora dgpu.sh[1062]: 0407
Aug 31 20:48:15 fedora dgpu.sh[1063]: 81
Aug 31 20:48:15 fedora dgpu.sh[1064]: modprobe: FATAL: Module nvidia not found in directory /lib/modules/5.19.4-1.surface.fc36.x86_64
Aug 31 20:48:15 fedora systemd[1]: dgpu.service: Main process exited, code=exited, status=1/FAILURE
Aug 31 20:48:15 fedora systemd[1]: dgpu.service: Failed with result 'exit-code'.
Aug 31 20:48:15 fedora systemd[1]: Failed to start dgpu.service - Nvidia GPU initialization.
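The modprobe error in the log means no nvidia module file exists under the modules directory of the running kernel. A minimal sketch of checking that directly, before calling modprobe, is shown below. The helper name has_nvidia_module is made up, and the modules root is passed as a parameter only so the check can be pointed at a test directory.

```shell
#!/bin/sh
# Sketch: report whether an nvidia module file exists for a kernel release.
# has_nvidia_module is a hypothetical helper. Arguments:
#   $1  modules root (normally /lib/modules)
#   $2  kernel release (as printed by uname -r)
has_nvidia_module() {
    root=$1
    kver=$2
    # Match nvidia.ko as well as compressed variants such as nvidia.ko.xz.
    [ -n "$(find "$root/$kver" -name 'nvidia.ko*' 2>/dev/null | head -n 1)" ]
}

if has_nvidia_module /lib/modules "$(uname -r)"; then
    echo "nvidia module present for $(uname -r)"
else
    echo "no nvidia module for $(uname -r); modprobe nvidia would fail"
fi
```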
Secure Boot is disabled as well (output of sudo mokutil --sb-state):
SecureBoot disabled
Platform is in Setup Mode
I confirmed that I’m on the right kernel with uname -r: 5.19.4-1.surface.fc36.x86_64
I also confirmed that the NVIDIA dGPU is recognized with lspci | grep 'NVIDIA':
01:00.0 3D controller: NVIDIA Corporation GM206M [GeForce GTX 965M] (rev a1)
These are my kernel parameters:
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init rhgb quiet nouveau.modeset=0 pci=realloc pcie_port_pm=off pcie_aspm=off"
And to make sure it is all correct in grub I ran: sudo grub2-mkconfig -o /boot/grub2/grub.cfg
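One way to confirm after a reboot that the parameters actually reached the kernel is to compare them against /proc/cmdline. The sketch below does that with a made-up helper, missing_params, which prints every expected parameter that is absent. The three parameters checked at the end are just a sample from the list above.

```shell
#!/bin/sh
# Sketch: list which expected parameters are absent from a kernel command line.
# missing_params is a hypothetical helper. Arguments:
#   $1      the command line text
#   $2...   the parameters to look for
missing_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        case " $cmdline " in
            *" $p "*) ;;            # present as a whole word
            *) echo "$p" ;;         # absent: print it
        esac
    done
}

# On the running system the text comes from /proc/cmdline.
if [ -r /proc/cmdline ]; then
    missing_params "$(cat /proc/cmdline)" \
        modprobe.blacklist=nouveau nvidia-drm.modeset=1 pcie_port_pm=off
fi
```

No output means all of the listed parameters are present on the booted command line.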
Listing everything NVIDIA-related (with sudo dnf list installed *nvidia*), I get:
Installed Packages
akmod-nvidia.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
nvidia-persistenced.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.i686 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.i686 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-power.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
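A listing like this can also be scanned for the package that akmods produces once a build succeeds. The sketch below assumes that package is named kmod-nvidia-<kernel release> (an assumption about akmods naming), and has_built_kmod is a made-up helper that reads the listing on stdin. The sample listing is two lines copied from the output above.

```shell
#!/bin/sh
# Sketch: look for an akmods-built package for a kernel in a package listing.
# ASSUMPTION: akmods installs its result as "kmod-nvidia-<kernel release>...".
# has_built_kmod is a hypothetical helper; it reads the listing on stdin and
# takes the kernel release as $1 (dots are treated as regex wildcards here).
has_built_kmod() {
    grep -q "^kmod-nvidia-$1"
}

listing="akmod-nvidia.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64 3:515.65.01-1.fc36 @rpmfusion-nonfree-nvidia-driver"

if printf '%s\n' "$listing" | has_built_kmod "5.19.4-1.surface.fc36.x86_64"; then
    echo "kmod package found"
else
    echo "no kmod-nvidia package for this kernel in the listing"
fi
```

On the real system the listing would come from dnf list installed piped into the function instead of the sample text.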
Finally, running nvidia-smi gives me:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I don’t know what to try next.
Can anyone help me?