GDM: Wayland sessions unavailable most of the time on Nvidia GPU

Hello,
I have a problem that I just cannot understand. After restarting my PC running Fedora 39, I got to the login screen and noticed that no Wayland sessions were available. I have checked the GDM custom.conf (also the one in /run/gdm) but could not find anything wrong. journalctl reports:
org.gnome.Shell@wayland.service - GNOME Shell on Wayland was skipped because of an unmet condition check (ConditionEnvironment=XDG_SESSION_TYPE=wayland).
I have checked that the hibernate/resume/suspend units are active.
My GPU is an RTX 4070 running driver version 545 from RPM Fusion. The issue already existed on version 535. The crazy thing is that sometimes I can actually start a Wayland session, but this has only ever worked after about 5 boots.
Does anyone have any idea what the actual problem could be?
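
For anyone repeating the custom.conf check, here is a rough sketch of how it could be scripted. It assumes the setting to look for is an uncommented WaylandEnable=false line, which is the key GDM uses in its [daemon] section. The function name is made up for illustration, and this only covers that one setting, not every reason GDM might hide Wayland sessions.

```shell
# Hypothetical helper: report whether a GDM config file explicitly disables Wayland.
# Takes the file path as $1 and prints "disabled" or "not disabled".
wayland_disabled_in_conf() {
    conf="$1"
    # Match an uncommented WaylandEnable=false, tolerating spaces around "=".
    if grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$conf" 2>/dev/null; then
        echo "disabled"
    else
        echo "not disabled"
    fi
}

# Example usage on a real system (both locations are mentioned in this thread):
# wayland_disabled_in_conf /etc/gdm/custom.conf
# wayland_disabled_in_conf /run/gdm/custom.conf
```

A missing or unreadable file is reported as "not disabled", so it is worth confirming that the file exists before trusting that result.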

Hello @nicoopheys ,
Welcome to :fedora: !

Possibly check this discussion out … Wayland session is unavailable with vendor-provided NVIDIA driver

That may be applicable for you if you have NVidia graphics. Otherwise you will have to provide a bit more info for someone to help you out here.

A good first step when troubleshooting problems like this is to create a new user login. If this works, then the problem is some configuration setting for your login (starting with the login shell, profile files, and moving on to settings in ~/.config).

Thank you for your reply. The problem exists for all users, also newly created ones.

Thank you for that warm welcome and your reply.

The post you have linked seems to be about problems with the drivers installed directly from Nvidia, not from RPM Fusion. As stated, sometimes (very rarely) the Wayland sessions are available and function properly, but most of the time they are not. Do you have an idea where I should look for errors? To me, this looks like an issue with the Nvidia drivers on startup, which leads to Wayland sessions being excluded from the available sessions.

It links to the BZ report, which links to a similar issue. That issue was closed as similar, even though its original poster stated they had a vanilla install of Fedora with no special modifications. That was why I linked it to your topic. As an afterthought, perhaps a GDM reinstall is required?

Hello and welcome as well.

If you are using the nvidia drivers directly from nvidia.com then there may be issues.
If the drivers are installed from rpmfusion there are usually fewer problems.

It would help us if you would post (copy and paste as text using the preformatted text button </> on the toolbar) the output of inxi -Fzxx (may need to install inxi first) and the output of dnf list installed \*nvidia\*

Thank you for your help!

Here is the output of inxi -Fzxx. Note: I have truncated some sections for readability.

System:
  Kernel: 6.6.4-200.fc39.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.40-13.fc39 Desktop: GNOME v: 45.2 tk: GTK v: 3.24.38 wm: gnome-shell
    dm: 1: GDM 2: SDDM note: stopped Distro: Fedora release 39 (Thirty Nine)
Machine:
  Type: Desktop Mobo: Gigabyte model: Z790 AORUS ELITE AX v: x.x
    serial: <superuser required> UEFI: American Megatrends LLC. v: FC
    date: 03/07/2023
CPU:
  Info: 12-core (8-mt/4-st) model: 12th Gen Intel Core i7-12700K bits: 64
    type: MST AMCP arch: Alder Lake rev: 2 cache: L1: 1024 KiB L2: 12 MiB
    L3: 25 MiB
  Speed (MHz): [...]
    bogomips: 144383
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel AlderLake-S GT1 vendor: Gigabyte driver: i915 v: kernel
    ports: active: none empty: DP-1,HDMI-A-1,HDMI-A-2 bus-ID: 00:02.0
    chip-ID: 8086:4680
  Device-2: NVIDIA AD104 [GeForce RTX 4070] vendor: ASUSTeK driver: nvidia
    v: 545.29.06 arch: Lovelace pcie: speed: 2.5 GT/s lanes: 16 ports:
    active: none off: DP-3,DP-4,HDMI-A-3 empty: DP-2 bus-ID: 01:00.0
    chip-ID: 10de:2786
  Display: x11 server: X.Org v: 1.20.14 with: Xwayland v: 23.2.2
    compositor: gnome-shell driver: X: loaded: modesetting,nvidia
    unloaded: fbdev,nouveau,vesa alternate: nv dri: iris
    gpu: nvidia,nvidia-nvswitch display-ID: :1 screens: 1
  Screen-1: 0 s-res: 6400x1440 s-dpi: 96
  Monitor-1: DP-3 note: disabled pos: bottom-r model: Dell D2421H
    res: 1920x1080 dpi: 93 diag: 604mm (23.8")
  Monitor-2: DP-4 note: disabled pos: primary,top-center
    model: Samsung LC32G5xT res: 2560x1440 dpi: 93 diag: 806mm (31.7")
  Monitor-3: HDMI-A-3 mapped: HDMI-0 note: disabled pos: middle-l
    model: Samsung C27F390 res: 1920x1080 dpi: 82 diag: 686mm (27")
  API: EGL v: 1.5 platforms: device: 0 drv: nvidia device: 1 drv: iris
    device: 3 drv: swrast gbm: drv: iris surfaceless: drv: nvidia x11:
    drv: nvidia inactive: wayland,device-2
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 545.29.06
    glx-v: 1.4 direct-render: yes renderer: NVIDIA GeForce RTX 4070/PCIe/SSE2
  API: Vulkan v: 1.3.268 surfaces: xcb,xlib device: 0 type: discrete-gpu
    driver: nvidia device-ID: 10de:2786 device: 1 type: integrated-gpu
    driver: mesa intel device-ID: 8086:4680 device: 2 type: cpu
    driver: mesa llvmpipe device-ID: 10005:0000
Audio:
  Device-1: Intel vendor: Gigabyte driver: snd_hda_intel v: kernel
    bus-ID: 00:1f.3 chip-ID: 8086:7a50
  Device-2: NVIDIA vendor: ASUSTeK driver: snd_hda_intel v: kernel pcie:
    speed: 16 GT/s lanes: 16 bus-ID: 01:00.1 chip-ID: 10de:22bc
  [...]
  API: ALSA v: k6.6.4-200.fc39.x86_64 status: kernel-api
  Server-1: JACK v: 1.9.22 status: off
  Server-2: PipeWire v: 1.0.0 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
Network:
  [...]
Bluetooth:
  [...]
Drives:
  [...]
Partition:
  [...]
Swap:
  [...]
Sensors:
  System Temperatures: cpu: 39.0 C mobo: 29.0 C gpu: nvidia temp: 38 C
  Fan Speeds (rpm): N/A gpu: nvidia fan: 0%
Info:
  Processes: 500 Uptime: 38m Memory: total: 32 GiB note: est.
  available: 31.11 GiB used: 3.53 GiB (11.3%) Init: systemd v: 254
  target: graphical (5) default: graphical Compilers: gcc: 13.2.1 Packages:
  pm: rpm pkgs: N/A note: see --rpm pm: flatpak pkgs: 52 Shell: Bash v: 5.2.21
  running-in: gnome-terminal inxi: 3.3.31

And here is the output of dnf list installed \*nvidia\*:

akmod-nvidia.x86_64                                           3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
kmod-nvidia-6.6.2-201.fc39.x86_64.x86_64                      3:545.29.06-1.fc39                       @@commandline                   
kmod-nvidia-6.6.3-200.fc39.x86_64.x86_64                      3:545.29.06-1.fc39                       @@commandline                   
kmod-nvidia-6.6.4-200.fc39.x86_64.x86_64                      3:545.29.06-1.fc39                       @@commandline                   
libnvidia-container-tools.x86_64                              1.14.3-1                                 @nvidia-container-toolkit       
libnvidia-container1.x86_64                                   1.14.3-1                                 @nvidia-container-toolkit       
nvidia-container-toolkit.x86_64                               1.14.3-1                                 @nvidia-container-toolkit       
nvidia-container-toolkit-base.x86_64                          1.14.3-1                                 @nvidia-container-toolkit       
nvidia-gpu-firmware.noarch                                    20231111-1.fc39                          @updates                        
nvidia-modprobe.x86_64                                        3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
nvidia-persistenced.x86_64                                    3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64                                        3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64                                    3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda.x86_64                               3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.i686                            3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64                          3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64                            3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.i686                                 3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64                               3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-power.x86_64                              3:545.29.06-1.fc39                       @rpmfusion-nonfree-nvidia-driver

All of that looks relatively normal.
Have you considered that the upgraded drivers (version 545) may have a problem?

When my system upgraded from the 535 driver to the 545 driver it broke cuda and the apps using cuda could not run. I downgraded back to the 535 driver and cuda now functions again.

That can be accomplished with

  1. sudo dnf remove \*nvidia\*545\* to remove the 545 drivers
  2. sudo dnf install akmod-nvidia-535\* xorg-x11-drv-nvidia-cuda-535\* nvidia-{persistenced,settings}-535\* to reinstall all the needed packages.
  3. Wait about 5 minutes, then reboot.

It may be worth a try.
If you want to return to the 545 drivers, simply run sudo dnf upgrade
If you want to upgrade and keep the 535 drivers, simply run sudo dnf upgrade --exclude akmod-nvidia
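
The five-minute wait in step 3 is presumably there to give akmods time to build the kernel module. Instead of waiting a fixed time, one could check whether a kmod-nvidia package for the running kernel has appeared. This is only a sketch: the function name is made up, and it assumes the package naming seen in the listing above (kmod-nvidia-<kernel release>).

```shell
# Hypothetical helper: succeeds if the package listing on stdin contains a
# kmod-nvidia package built for the kernel release given as $1.
has_kmod_for_kernel() {
    kernel="$1"
    # Package names look like: kmod-nvidia-6.6.4-200.fc39.x86_64.x86_64
    grep -Fq "kmod-nvidia-${kernel}"
}

# Example usage on a real system:
# if dnf list installed 'kmod-nvidia*' | has_kmod_for_kernel "$(uname -r)"; then
#     echo "kmod package present for $(uname -r)"
# else
#     echo "no kmod package for $(uname -r) yet"
# fi
```

Note that this checks the kernel that is currently running. If a newer kernel was installed in the same transaction, pass that release string instead of $(uname -r).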

Thank you for having a look. I really appreciate it!

This problem does not seem to be related to the newer driver (for me at least), because I had the same issue on version 535. Cuda seems to be working for me on the latest version.

By the way, I have now noticed an interesting pattern: at the login screen, the UI elements are usually shown on the wrong monitor, and mouse movement between screens does not match the layout configured in GNOME Settings. When the Wayland sessions are finally available after many reboots, the UI elements are displayed on the correct screen.

If it’s happening intermittently and you’ve already checked /run/gdm/custom.conf, then the GDM greeter session is probably crashing on the boots where Wayland is unavailable.

You can check the GDM greeter session logs: journalctl -b _UID=42. You’re looking for a crash/failure following this line:

Running GNOME Shell (using mutter 45.2) as a Wayland display server

It would have been nice to know you’re using a multi-monitor setup. There is an issue with wayland+nvidia+gdm being discussed, though I can’t seem to locate it.
[Edit] @chrisawi is talking about what I’m thinking of.

You are right. Sorry for overlooking this. I am fairly new to this.

fedora gnome-shell[1860]: Running GNOME Shell (using mutter 45.2) as a Wayland display server
fedora gnome-shell[1860]: Made thread 'KMS thread' realtime scheduled
fedora gnome-shell[1860]: Device '/dev/dri/card1' prefers shadow buffer
fedora gnome-session[1848]: gnome-session-binary[1848]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 9
fedora gnome-session-binary[1848]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 9
fedora gnome-session-binary[1848]: Unrecoverable failure in required component org.gnome.Shell.desktop
fedora /usr/libexec/gdm-wayland-session[1847]: dbus-daemon[1847]: [session uid=42 pid=1847] Activating service name='ca.desrt.dconf' requested by ':1.2' (uid=42 pid=1848 comm="/usr/libexec/gnome-session-binary --autostart /usr" label="system_u:system_r:xdm_t:s0-s0:c0.c1023")
fedora /usr/libexec/gdm-wayland-session[1847]: dbus-daemon[1847]: [session uid=42 pid=1847] Successfully activated service 'ca.desrt.dconf'

After this, it starts the X11 server.
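
In case it helps others compare boots, the log check suggested above could be scripted roughly like this. It is just a sketch: the function name is made up, and it assumes a crash shows up as a "killed by signal" line after the "as a Wayland display server" line, as it does in my log. Other failure modes would need different patterns.

```shell
# Hypothetical helper: reads greeter journal lines on stdin and prints any
# "killed by signal" lines that appear after GNOME Shell announces a Wayland
# start. Exits 0 if at least one such line was found, 1 otherwise.
find_greeter_crash() {
    awk '
        /as a Wayland display server/ { started = 1; next }
        started && /killed by signal/ { print; found = 1 }
        END { exit !found }
    '
}

# Example usage on a real system (the _UID=42 match comes from the post above):
# journalctl -b _UID=42 --no-pager | find_greeter_crash
```

Running this on several boots would show whether every boot without Wayland sessions has the same kill message.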

Yeah, this looks like:

The second user in that thread has the same log entries where gnome-shell is getting SIGKILLed. The next step is for someone affected to open an issue against mutter upstream.

This issue seems to match what I’m seeing on my Fedora setup… something that does not appear to be an issue on Arch Linux on the same machine.

I’ve been trying to narrow down a root cause, but failing miserably. I have raised an issue with everything I’ve tried to date here