Discrete NVIDIA RTX 2050 GPU falls back to llvmpipe after some time while using the laptop

Hello, sorry to bother you, but I've noticed that after some use the system fails to detect my NVIDIA graphics card. I first noticed the problem when I tried to run Lutris again and it no longer detected my GPU; the problem seems to go away after restarting the system.

My laptop is an Acer Nitro V15-51 with an Intel i5 and an NVIDIA RTX 2050.

I followed the RPM Fusion NVIDIA driver how-to.

After asking ChatGPT/Gemini, they suggested how to show the relevant logs. These are the commands I used to try to see what is happening:

mpernia@fedora:~$ lspci -k | grep -EA3 'VGA|3D|Display'
0000:00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
Subsystem: Acer Incorporated [ALI] Device 171e
Kernel driver in use: i915
Kernel modules: i915, xe

0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA107 [GeForce RTX 2050] (rev a1)
Subsystem: Acer Incorporated [ALI] Device 171e
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia

mpernia@fedora:~$ nvidia-smi
Unable to determine the device handle for GPU0: 0000:01:00.0: Unknown Error
No devices were found
mpernia@fedora:~$ lsmod | grep nvidia
lsmod | grep nouveau
nvidia_drm 159744 2
nvidia_modeset 2162688 1 nvidia_drm
nvidia_uvm 4218880 0
nvidia 12947456 4 nvidia_uvm,nvidia_modeset
drm_ttm_helper 16384 2 nvidia_drm,xe
video 81920 4 acer_wmi,xe,i915,nvidia_modeset

mpernia@fedora:~$ mokutil --sb-state
SecureBoot disabled
mpernia@fedora:~$ sudo journalctl -b -k | grep -iE 'nvidia|nvrM|gpu|error|fail'
Jun 20 02:41:30 fedora kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PC00.RP08.CMDR], AE_NOT_FOUND (20240827/psargs-332)
Jun 20 02:41:30 fedora kernel: ACPI Error: Aborting method _SB.PC00.RP05.PCRP._ON due to previous error (AE_NOT_FOUND) (20240827/psparse-529)
Jun 20 02:41:30 fedora kernel: RAS: Correctable Errors collector initialized.
Jun 20 02:41:31 fedora kernel: pci 10000:e0:06.2: bridge window [io size 0x1000]: failed to assign
Jun 19 18:41:33 fedora kernel: nvidia: loading out-of-tree module taints kernel.
Jun 19 18:41:33 fedora kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jun 19 18:41:33 fedora kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 509
Jun 19 18:41:33 fedora kernel: nvidia 0000:01:00.0: enabling device (0006 → 0007)
Jun 19 18:41:33 fedora kernel: nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
Jun 19 18:41:33 fedora kernel: RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
Jun 19 18:41:33 fedora kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 575.57.08 Release Build (dvs-builder@U22-I3-H04-01-5) Sat May 24 07:03:13 UTC 2025
Jun 19 18:41:33 fedora kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 575.57.08 Release Build (dvs-builder@U22-I3-H04-01-5) Sat May 24 06:53:21 UTC 2025
Jun 19 18:41:33 fedora kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jun 19 18:41:34 fedora kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:06.0/0000:01:00.1/sound/card0/input16
Jun 19 18:41:34 fedora kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:06.0/0000:01:00.1/sound/card0/input17
Jun 19 18:41:34 fedora kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:06.0/0000:01:00.1/sound/card0/input18
Jun 19 18:41:34 fedora kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:06.0/0000:01:00.1/sound/card0/input19
Jun 19 18:41:34 fedora kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
Jun 19 18:41:35 fedora kernel: NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
Jun 19 18:41:36 fedora kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 0
Jun 19 18:41:36 fedora kernel: nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
Jun 19 19:38:08 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
Jun 19 19:38:08 fedora kernel: NVRM: sysmemConstruct_IMPL: *** Cannot allocate sysmem through fb heap
Jun 19 19:38:08 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Alloc(pRmApi, device->session->handle, isSystemMemory ? device->handle : device->subhandle, &physHandle, isSystemMemory ? NV01_MEMORY_SYSTEM : NV01_MEMORY_LOCAL_USER, &memAllocParams, sizeof(memAllocParams)) @ nv_gpu_ops.c:4647
Jun 19 19:43:11 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
Jun 19 19:43:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from status @ kernel_gsp.c:4615
Jun 19 19:43:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from kgspCreateRadix3(pGpu, pKernelGsp, &pKernelGsp->pSRRadix3Descriptor, NULL, NULL, gspfwSRMeta.sizeOfSuspendResumeData) @ kernel_gsp_tu102.c:1303
Jun 19 19:43:11 fedora kernel: nvidia 0000:01:00.0: can’t suspend (nv_pmops_runtime_suspend [nvidia] returned -5)
Jun 19 20:07:35 fedora kernel: NVRM: Error in service of callback

mpernia@fedora:~$ /sbin/lspci | grep -e VGA
0000:00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [UHD Graphics] (rev 04)
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA107 [GeForce RTX 2050] (rev a1)
mpernia@fedora:~$ /sbin/lspci | grep -e 3D
mpernia@fedora:~$ modinfo -F version nvidia
575.57.08

I'm having the exact same issue after updating from driver 570 to the latest, 575.57, on my ASUS TUF Gaming F15 (2023) with an RTX 4070 and i7-13620H. A few minutes after starting the system up, the same "out of memory" errors occur and the GPU is no longer accessible to the system.

Here is the relevant excerpt of my dmesg logs:

[ 99.260465] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
[ 99.260470] NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from status @ kernel_gsp.c:4615
[ 99.260483] NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from kgspCreateRadix3(pGpu, pKernelGsp, &pKernelGsp->pSRRadix3Descriptor, NULL, NULL, gspfwSRMeta.sizeOfSuspendResumeData) @ kernel_gsp_tu102.c:1303
[ 99.267540] nvidia 0000:01:00.0: can’t suspend (nv_pmops_runtime_suspend [nvidia] returned -5)
[ 123.410734] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 103!
[ 123.410738] NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0002f; hParent=0x00000000; hObject=0x00000000; hClass=0x00000000; paramsSize=0x00000078; paramsStatus=0x00000000; status=0x00000011
[ 123.410740] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 103!
[ 123.410741] NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0002f; hParent=0xc1d0002f; hObject=0xcaf00000; hClass=0x00000080; paramsSize=0x00000038; paramsStatus=0x00000000; status=0x00000011
[ 123.410750] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 103!
[ 123.410751] NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d0002f; hParent=0xcaf00000; hObject=0xcaf00001; hClass=0x00002080; paramsSize=0x00000004; paramsStatus=0x00000000; status=0x00000011
[ 123.410765] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 10!
[ 123.410766] NVRM: rpcRmApiFree_GSP: GspRmFree failed: hClient=0xc1d0002f; hObject=0xcaf00001; paramsStatus=0x00000000; status=0x00000011
[ 123.410772] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 10!
[ 123.410773] NVRM: rpcRmApiFree_GSP: GspRmFree failed: hClient=0xc1d0002f; hObject=0xcaf00000; paramsStatus=0x00000000; status=0x00000011
[ 123.553780] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 103!
[ 123.553785] NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00034; hParent=0x00000000; hObject=0x00000000; hClass=0x00000000; paramsSize=0x00000078; paramsStatus=0x00000000; status=0x00000011
[ 123.553787] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 103!
[ 123.553788] NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00034; hParent=0xc1d00034; hObject=0xcaf00000; hClass=0x00000080; paramsSize=0x00000038; paramsStatus=0x00000000; status=0x00000011
[ 123.553799] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 103!
[ 123.553800] NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00034; hParent=0xcaf00000; hObject=0xcaf00001; hClass=0x00002080; paramsSize=0x00000004; paramsStatus=0x00000000; status=0x00000011
[ 123.553819] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 10!
[ 123.553820] NVRM: rpcRmApiFree_GSP: GspRmFree failed: hClient=0xc1d00034; hObject=0xcaf00001; paramsStatus=0x00000000; status=0x00000011
[ 123.553827] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x00000011 for fn 10!
[ 123.553828] NVRM: rpcRmApiFree_GSP: GspRmFree failed: hClient=0xc1d00034; hObject=0xcaf00000; paramsStatus=0x00000000; status=0x00000011
[ 128.409842] NVRM: Error in service of callback
[ 306.996381] NVRM: rm_power_source_change_event: rm_power_source_change_event: Failed to handle Power Source change event, status=0x11

EDIT: I'm still testing this long-term, so I'm not sure whether it works or not, but setting the kernel parameter nvidia.NVreg_EnableGpuFirmware=1 seems to work around this for the time being. From what I understand, setting it to 0 shouldn't change anything, because as of later driver versions (535?) the default is apparently already 0.
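
For reference, this is roughly how I added the parameter on Fedora (just a sketch assuming the standard grubby/BLS boot setup; adjust if you manage your kernel command line differently):

# add the parameter to every installed kernel entry
sudo grubby --update-kernel=ALL --args="nvidia.NVreg_EnableGpuFirmware=1"
# after a reboot, check that it made it onto the kernel command line
grep -o "nvidia.NVreg_EnableGpuFirmware=1" /proc/cmdline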

EDIT 2: The core issue likely still happens occasionally, but it recovers now. Below is the new dmesg log I get when it “runs out of memory”:

[ 8089.608012] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
[ 8089.608025] NVRM: faultbufCtrlCmdMmuFaultBufferRegisterNonReplayBuf_IMPL: Error allocating client shadow fault buffer for non-replayable faults
[ 8089.704996] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
[ 8089.705010] NVRM: faultbufCtrlCmdMmuFaultBufferRegisterNonReplayBuf_IMPL: Error allocating client shadow fault buffer for non-replayable faults

Maybe try reading the documentation for the current release and not some ancient and obsolete driver version?

Interesting, thanks for linking the documentation! In any case, enabling the GSP firmware does seem to mitigate the issue, though there is the occasional suspend problem; that's likely a different NVIDIA issue altogether.

Turning off the GPU when not in use also seems to be working properly, at least on the RTX 4070.

The documentation for the currently installed nvidia driver version can be found at /usr/share/doc/xorg-x11-drv-nvidia/
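
For example, to see what's shipped there (assuming the standard rpmfusion package name from the path above):

ls /usr/share/doc/xorg-x11-drv-nvidia/
rpm -qd xorg-x11-drv-nvidia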

If the installed driver is 575.57, consider upgrading to 575.64. It's available in rpmfusion-nonfree-updates-testing.
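
A rough way to pull it from the testing repo (the "*nvidia*" glob is just an example pattern; adjust it to the packages you actually have installed):

sudo dnf upgrade --enablerepo=rpmfusion-nonfree-updates-testing "*nvidia*"
# rebuild the module for the running kernel if it doesn't happen automatically
sudo akmods --kernels $(uname -r) --rebuild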


Unfortunately, it seems the problem still happens (albeit after much longer) with GSP firmware turned on, and also on 575.64. I decided to open an NVIDIA forums topic about the issue, as I didn't see any existing posts about it there.

I do wish some more testing had been done (on laptops, perhaps?) before pushing this driver to the stable rpmfusion repositories, as this is quite a critical bug for many users. For the time being, I will revert to driver 570, which has fewer issues, but I am happy to test other workarounds.

Can you reproduce this on 570.x with the open source kernel modules?
Drivers packaged by rpmfusion don't enable the open source kernel modules for releases 570.x or older; 575.x detects supported GPUs and builds the open kernel modules.

Check with modinfo -l nvidia.
A license of 'NVIDIA' means the closed source kernel modules.
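
For example, with the closed modules installed the command just prints the license string ('Dual MIT/GPL' with the open modules):

modinfo -l nvidia
NVIDIA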

sudo sh -c 'echo "%_with_kmod_nvidia_open 1" > /etc/rpm/macros.nvidia-kmod'
sudo akmods --kernels $(uname -r) --rebuild

Delete the file in /etc/rpm and rebuild the kernel modules to revert.
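
In other words, roughly (same rebuild command as above):

sudo rm /etc/rpm/macros.nvidia-kmod
sudo akmods --kernels $(uname -r) --rebuild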

You can also force the 575.x drivers to use the closed source modules:

sudo sh -c 'echo "%_without_kmod_nvidia_detect 1" > /etc/rpm/macros.nvidia-kmod'
sudo akmods --kernels $(uname -r) --rebuild

modinfo -l nvidia should now report 'NVIDIA' instead of 'Dual MIT/GPL'.

I've switched my kernel modules on 570.x to the open source ones (I can confirm modinfo reports "Dual MIT/GPL"), and so far I haven't been able to reproduce the issue. I have left the system with the dGPU idle for a while, and each time a call to nvidia-smi takes a few seconds to wake the GPU but eventually succeeds.

Suspending the system manually also seems to work, from the few times I tested it.

Quick update: an NVIDIA representative replied to my thread with a patch that purports to fix the issue, and in my testing on the open kernel modules it seems to work perfectly!

The fix should apparently be in the next driver release, but this patch fixes things for the time being.

EDIT: This does not entirely fix the issue, as some CUDA-based services like NVENC encoding intermittently stop working after a suspend.

EDIT 2: Driver version 575.64.1, as released in the stable rpmfusion repos, seems to work properly, although it does not work on kernel 6.15 without patches (which patches, I don't know; I remain on 6.14 for now).
