NVIDIA GPU only available on 2nd login via LightDM

Hi everyone,
I’ve been running the Fedora 38 Cinnamon spin for about two months now and am quite happy with it.
As a long-time Linuxer, I’m already used to the PITA that Nvidia hardware causes us Linuxers. Nevertheless, I went on to install the proprietary Nvidia drivers on my Thinkpad P52s - an Optimus system with an onboard Intel GPU and a discrete Nvidia Quadro P500.

(Edit) GOAL: Use NVIDIA graphics only / as the primary rendering provider.

Having followed the NVIDIA HOWTO, Optimus Howto and the CUDA Howto on rpmfusion.org and a bit from the referenced ArchWiki as well, and also thanks to this forum, I got things running.

Mostly.

One thing that remains is that the Nvidia GPU looks to be available only after logging in a second time on the LightDM prompt (i.e., log in, log out, log in - don’t ask how long it took me to find this out).
I suspect this is due to the startup script /etc/lightdm/display_setup.sh used to start PRIME Synchronization (background here). But I am stuck at verifying this or finding the actual cause of this issue.

Have you run into a similar issue? Where would you suggest I look further? Or would you suggest I try an alternative to the shell script?

TIA,
Uwe

Here’s some diagnostic info for starters:

# After 1st login via lightDM:
# The GPU is visible, but not used.
#
[me@myp52s ~]$ nvidia-smi 
Mon Jun 26 11:58:26 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P500                    Off | 00000000:02:00.0 Off |                  N/A |
| N/A   47C    P8              N/A / ERR! |      4MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
#
# after logoff and 2nd login:
# Processes are using the GPU with visible utilization.
#
Mon Jun 26 12:02:54 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P500                    Off | 00000000:02:00.0 Off |                  N/A |
| N/A   54C    P0              N/A / ERR! |    501MiB /  2048MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3972      G   /usr/libexec/Xorg                           403MiB |
|    0   N/A  N/A      4421      G   cinnamon                                     96MiB |
+---------------------------------------------------------------------------------------+

Output of screenfetch:

           /:-------------:\          me@myp52s
        :-------------------::        OS: Fedora 
      :-----------/shhOHbmp---:\      Kernel: x86_64 Linux 6.3.8-200.fc38.x86_64
    /-----------omMMMNNNMMD  ---:     Uptime: 5h 23m
   :-----------sMMMMNMNMP.    ---:    Packages: 2087
  :-----------:MMMdP-------    ---\   Shell: bash 5.2.15
 ,------------:MMMd--------    ---:   Resolution: 6432x2570
 :------------:MMMd-------    .---:   DE: GNOME
 :----    oNMMMMMMMMMNho     .----:   WM: Muffin
 :--     .+shhhMMMmhhy++   .------/   WM Theme: Mint-Y-Dark-Aqua (Adwaita)
 :-    -------:MMMd--------------:    GTK Theme: Adwaita [GTK2/3]
 :-   --------/MMMd-------------;     Icon Theme: Adwaita
 :-    ------/hMMMy------------:      Font: Cantarell 11
 :-- :dMNdhhdNMMNo------------;       Disk: 117G / 2.8T (5%)
 :---:sdNMMMMNds:------------:        CPU: Intel Core i7-8650U @ 8x 4.2GHz [69.0°C]
 :------:://:-------------::          GPU: Quadro P500
 :---------------------://            RAM: 5936MiB / 31925MiB

You don’t mention which configuration you want, except that we can assume you do want to use the NVIDIA GPU but not nouveau. That rules out integrated graphics only, leaving either NVIDIA only or one of the many “use the NVIDIA GPU when needed and keep it powered off to save power” options (all of them except nouveau).

Thanks for the quick reply, George.
I thought I was clear about which driver I want to use when I wrote that I installed the proprietary driver. My bad if this wasn’t clear enough. I deliberately wanted to use the Nvidia driver over nouveau.

On the remaining options:

  • Primarily, I’d like to have a setup where the Nvidia GPU is running all of the time. Getting this to run without the need to log out and in again was the intention of my question.
  • Now that you’ve sparked my appetite :wink:, having a second option for mobile use on battery only, where the Nvidia dGPU would power down, would also be a nice thing to have. Earlier I thought about achieving this with a separate configuration in LightDM that uses Wayland with Nouveau. But I’d be happy to have your suggestion.

Cheers,
Uwe

Do you notice any difference in thermal behaviour or rendering speed after the 2nd login?

The NVIDIA 535.54.03 README has lots of configuration detail, but assumes intimate knowledge of the hardware configuration in your system.

Finding an appropriate configuration also depends on your workloads, whether you can tolerate slow rendering, how long you need to run on battery, and ambient temperatures. If you dual boot Windows you may encounter different behaviours depending on the order in which you boot and how hot the system was when you rebooted. I recommend sticking with what you have until you really need a change based on how the system is working.

On the Nvidia GPU, I do. Just like nvidia-smi shows > 0% GPU load, nvidia-settings shows rising temperatures after running a while under graphical load.

Although I do dual boot at times, I notice the issue on cold boots as well as hot boots into Fedora.

I’m pretty convinced that it has to do with the way in which the X configuration is persisted in LightDM - using a shell script rather than ~/.xinitrc. But I cannot prove it - nor am I willing to simply give up, because: yes, I do notice a difference in rendering speed. Not with the desktop in general, but surely when it comes to HW video decoding or 3D rendering.

Edit: I sifted through the README you linked (particularly the common problems), but no new information so far for me.

There are many ways to use the NVIDIA driver in a system with both integrated and NVIDIA graphics. You mention PRIME synchronization, but it isn’t clear whether you think it doesn’t work or isn’t appropriate for your use case. I no longer have hardware with both types of graphics, so I can’t investigate further. The Arch documents do describe ways to force the use of PRIME with xrandr --setprovideroffloadsink ... and tests like DRI_PRIME=1 glxinfo | grep "OpenGL renderer". There should also be some details of the NVIDIA configuration in journalctl. Have you seen Debian NVIDIA Optimus? It notes that you need details of how displays are connected to the GPUs in your system, but

There appears to be no way to query the system for this information, which leaves trial and error: try the configurations for each hardware variant until one works.
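The checks mentioned above can be wrapped into small, repeatable helpers. This is only a sketch: the function names gl_renderer and has_provider are made up for illustration, and the provider name NVIDIA-0 is taken from the recipes discussed in this thread, so it may differ on other systems. Both helpers read the tool output on stdin, which also makes them usable on output saved from an earlier login.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any package).
# Both read command output on stdin so they also work on saved output.

# Print the value of the "OpenGL renderer string" line from `glxinfo` output.
gl_renderer() {
    sed -n 's/^OpenGL renderer string:[[:space:]]*//p' | head -n 1
}

# Succeed if `xrandr --listproviders` output has a provider named "$1".
has_provider() {
    grep -Eq "name:$1[[:space:]]*\$"
}
```

Typical use would be glxinfo | gl_renderer to see which GPU renders, and xrandr --listproviders | has_provider NVIDIA-0 && echo present to see whether X knows the NVIDIA provider at all. Running both right after the first and after the second login gives a direct comparison.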

Running Linux On A Thinkpad P52 describes manually switching in 2019, but invites questions.

Thanks, again.

Basically, I stuck to all the docs I’ve linked to in my OP to get the Nvidia dGPU used as the primary GPU for any HW acceleration. There it says you need to make the Nvidia card the primary GPU in the X configuration :ballot_box_with_check:, use modesetting to enable PRIME sync to avoid screen tearing :ballot_box_with_check:, block nouveau from being loaded :ballot_box_with_check: and finally create a shell script for LightDM as a substitute for .xinitrc :ballot_box_with_check:.

Bottom line: I don’t know which part of that doesn’t work - this is what I need to track down.

At first I suspected the LightDM display-setup-script doesn’t run at first login, but I have since ruled that out.

  • “Debian NVIDIA Optimus” - skimmed over it, not much different from the rpmfusion Optimus HOWTO, which I followed and trust more because it was written for Fedora and not Debian.
  • I’ve already tried xrandr options or the tests you suggested with env variables and querying glxinfo. All lead me to the conclusion that the driver module is loaded, but somehow the dGPU isn’t used.
  • The article on “Running Linux On A Thinkpad P52” doesn’t fully apply because the P52s has a different GPU. And Nvidia configuration has changed quite a bit since 2019.

Example: This is what xdpyinfo should get me when the dGPU is used:

$ xdpyinfo|grep -E GLX
    GLX
    NV-GLX

I get this after the 2nd login. Otherwise, only “GLX” is returned.
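To make that comparison between the first and second login less manual, the xdpyinfo check can be turned into a one-word answer. A minimal sketch, assuming the NV-GLX extension line looks exactly as in the output above; the function name glx_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `xdpyinfo` output on stdin and reports whether
# the running X server exposes the NVIDIA GLX extension.
glx_mode() {
    # -x matches whole lines only, so plain "GLX" does not count.
    if grep -qx '[[:space:]]*NV-GLX[[:space:]]*'; then
        echo "nvidia"
    else
        echo "other"
    fi
}
```

Calling xdpyinfo | glx_mode after each login should then print "other" on the first login and "nvidia" on the second, matching the behaviour described here.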

I’ll try a different approach in parallel now, using nvidia-xrun to load the drivers on demand. Thanks to Btrfs snapshots, this can be quickly reverted if necessary.

Getting NVIDIA working in Linux is often a battle, and too often the resulting setup doesn’t hold up over time.

Please search for errors related to your issues in journalctl. It is counterproductive to thrash around with different configurations if some underlying error is in play.

Few of the internet recipes will work if applied without adjustment. It is important to understand the steps and then modify as needed for your configuration. Good recipes do include ways to verify that each step is working, but they are hard to find, so you may need to devise your own step-by-step verification.
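As an illustration of what such step-by-step verification could look like, here is a sketch of two checks for steps named earlier in the thread: that nouveau is really not loaded, and that kernel modesetting is enabled for nvidia_drm. The function names are invented for this example, and the sysfs path in the comment is an assumption about where the driver exposes the parameter on a live system.

```shell
#!/bin/sh
# Hypothetical step checks. Each takes its input as an argument or on stdin,
# so it can be pointed at the live system or at saved output.

# Succeed if `lsmod` output (stdin) lists the nouveau module.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Print "on" if the given modeset parameter file contains Y, else "off".
# On a live system the file is assumed to be
# /sys/module/nvidia_drm/parameters/modeset.
drm_modeset_state() {
    if [ -r "$1" ] && grep -q '^Y' "$1"; then
        echo "on"
    else
        echo "off"
    fi
}
```

For example, lsmod | nouveau_loaded && echo "nouveau is still loaded" checks the blacklist step, and drm_modeset_state /sys/module/nvidia_drm/parameters/modeset checks the modesetting step. A missing parameter file is reported as "off", which also covers the case where the module is not loaded at all.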

The dGPU is not normally used in an Optimus system unless the user specifically requests it for a given app at launch, or unless one is using Xorg and has set the dGPU as primary as shown here.

That specifically sets the system to use the nvidia GPU, always, but is only functional when using an X11 desktop.

It is also often the case that the dGPU drives an external monitor while the iGPU drives the internal screen. The nvidia.conf file allows the nvidia GPU to access both screens even without setting it as primary.

Thanks for chiming in, Jeff.

Which is exactly what I did according to my first post.
That article provides a good walkthrough. I wish I had found that earlier, it would have saved me jumping back and forth between rpmfusion, ArchWiki and Nvidia docs… Although, for LightDM I still would have needed to consult the ArchWiki.
Comparing the steps in the article to mine - all identical, so no hint to the cause, here.

YES! That took the guesswork out and helped getting a step closer to the cause:

# journalctl -b|grep -i nvidia
...
Jun 28 11:14:30 myp52s (udev-worker)[663]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c $(grep nvidia-frontend /proc/devices | cut -d \  -f 1) ${i}; done'' failed with exit code 1.
...
Jun 28 11:14:33 myp52s lightdm[1699]: Could not find provider with name NVIDIA-0

After 2nd login, no further Journal entries with “nvidia” in it get added. So this time the LightDM display-setup-script ran successfully. :ballot_box_with_check:

Q.E.D. - but what now?
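One untested direction, assuming the "Could not find provider with name NVIDIA-0" message means the setup script ran before the NVIDIA provider was registered with X (the thread does not confirm this): let the script wait for the provider before calling xrandr. The sketch below assumes the script consists of the two xrandr calls used in the ArchWiki-style recipes; the actual contents of /etc/lightdm/display_setup.sh are not shown in this thread, and the function names and retry values are illustrative.

```shell
#!/bin/sh
# Sketch of a display setup script that waits for the NVIDIA provider.
# ASSUMPTIONS: the provider names "modesetting" and "NVIDIA-0" and the
# retry values are illustrative; this has not been tested on a P52s.

# Poll `xrandr --listproviders` until a provider named "$1" appears.
# $2 = maximum attempts, $3 = seconds to sleep between attempts.
wait_for_provider() {
    name=$1 tries=$2 delay=$3
    while [ "$tries" -gt 0 ]; do
        if xrandr --listproviders | grep -q "name:$name\$"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Body of the setup script; call `setup_prime` at the end of the file.
setup_prime() {
    if wait_for_provider NVIDIA-0 10 1; then
        xrandr --setprovideroutputsource modesetting NVIDIA-0
        xrandr --auto
    else
        echo "NVIDIA-0 provider did not appear" >&2
        return 1
    fi
}
```

If the provider never shows up within the retry window, the message on stderr should land in the journal next to the LightDM entries, which would point away from a timing problem and towards the failed udev step instead.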

Actually, I’ll probably stick with that effect. After having tried “nvidia-xrun” with even more issues, I don’t want to waste more time on Nvidia.

That first udev-worker line from journalctl is suspect. From man 7 udev:

udev supplies the system software with device events, manages permissions of device nodes and may create additional symlinks in the /dev/ directory

This suggests that mknod is out-of-scope for udev. Maybe systemd is fighting with udev.

You should be able to understand where the error occurs by splitting out the steps and running them in a terminal. If something else is actually doing some of the work that the udev-worker line was intended to achieve, the resulting error might prevent using NVIDIA despite a correct setup.
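A sketch of how the one-liner from the udev rule could be split into separately inspectable steps. The helper names are hypothetical, and the sketch only prints the mknod command rather than running it, so nothing is changed on the system. It uses awk instead of the rule's grep and cut pipeline, which is a substitution made here for readability, not what the rule itself does.

```shell
#!/bin/sh
# Hypothetical helpers that reproduce the two lookups from the udev rule
# separately. File paths are arguments so each step can be inspected alone.

# Print the minor number of every GPU described in the given
# /proc/driver/nvidia/gpus/*/information files.
gpu_minors() {
    awk '/Minor/ { print $NF }' "$@"
}

# Print the character-device major number registered under name "$2"
# in a /proc/devices-style file "$1".
device_major() {
    awk -v name="$2" '$2 == name { print $1; exit }' "$1"
}

# Print (not run) the mknod command the rule would execute per GPU.
# $1 = major number, remaining arguments = minor numbers.
show_mknod() {
    major=$1
    shift
    for minor in "$@"; do
        echo "mknod -Z -m 666 /dev/nvidia${minor} c ${major} ${minor}"
    done
}
```

Running gpu_minors /proc/driver/nvidia/gpus/*/information and device_major /proc/devices nvidia-frontend one at a time shows which lookup comes back empty or errors out. An empty result from either would be enough to make the original loop fail, and ls -l /dev/nvidia* shows whether the device nodes already exist by other means.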

You would be justified in deciding to live with the current “magical” 2 logins incantation while waiting for others to provide a proper solution. The trick is to make sure you advertise the problem in the right places: a) the RPM Fusion Bugzilla, so that others with the same issue can find it and developers/packagers take notice, b) upstream forums.

A final note on this thread:
I stuck to the two-fold behaviour (needing a 2nd login to activate the dGPU), until now.

Today, I upgraded to Fedora 40 (Cinnamon). Now, the Nvidia dGPU is used directly upon first login via LightDM.

For whatever reason, Cinnamon is still on X11. I had expected to have to fight with Wayland now, but that’s for a different thread…