Fedora KDE Plasma 43 slowly crashing

Hi everyone,

I’m running Fedora 43 and experiencing persistent hard freezes specifically when using Wayland with the latest Nvidia 580.xx beta drivers.

I’m reaching out to see if this behavior is known with the 580 series on KDE 6.5, or if there are specific workarounds for this setup.

System Specifications:

  • OS: Fedora 43
  • Kernel: 6.17.12-300.fc43.x86_64
  • DE: KDE Plasma 6.5.4 (Wayland)
  • GPU: Nvidia RTX 4060 Ti
  • Driver: 580.119.02 (RPM Fusion)
  • Monitors: Dual Setup - Monitor A (180Hz) + Monitor B (143.47Hz)

The Symptoms: Randomly, often while gaming and/or using Discord for extended periods (anywhere from a few minutes to a few hours, if I’m lucky), the system starts “piecing” itself off: the background vanishes along with the taskbar, then some apps start closing, the terminal becomes unusable, and finally the screen fades to black.

  • Video: Screen turns black (eventually).
  • Input: System becomes unresponsive. Switching to a TTY (Ctrl+Alt+F3 to F10) does not work.
  • Recovery: The only way to reboot safely is via SysRq (REISUB).
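
Since REISUB is the only recovery path here, it may be worth confirming that the kernel allows every key in the sequence. A minimal sketch, assuming the bitmask values from the kernel's SysRq documentation (1 enables everything; otherwise 4 = keyboard control for R, 64 = process signalling for E and I, 16 = sync for S, 32 = remount read-only for U, 128 = reboot for B). The function name reisub_allowed is made up for illustration:

```shell
# Report whether a kernel.sysrq value permits the full REISUB sequence.
# Needed bits: 4 (R) + 64 (E, I) + 16 (S) + 32 (U) + 128 (B) = 244.
reisub_allowed() {
    val="$1"
    if [ "$val" -eq 1 ] || [ $(( val & 244 )) -eq 244 ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# Check the running system (the file exists on Linux only).
if [ -r /proc/sys/kernel/sysrq ]; then
    echo "REISUB fully allowed: $(reisub_allowed "$(cat /proc/sys/kernel/sysrq)")"
fi
```

If this prints "no", sudo sysctl kernel.sysrq=1 should enable all SysRq functions until the next reboot.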

Troubleshooting & Current Status: I saw in other forums that mismatched refresh rates across multiple monitors (in my case, 180 Hz vs 144 Hz) could cause the issue.

  1. Logs: journalctl usually cuts off right before the crash, offering no “smoking gun”.
  2. Kernel Parameters: I have applied nvidia_drm.modeset=1.
  3. Power Management: I applied pcie_port_pm=off and usbcore.autosuspend=-1 via GRUB to prevent the GPU/USB from entering sleep states.
  4. Current Test: I am currently testing forcing both monitors to 60Hz to eliminate the refresh rate mismatch.
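
Parameters added through GRUB only take effect once the configuration has been regenerated and the system rebooted, so it can help to confirm they reached the running kernel. A rough sketch, assuming the three parameters listed above are spelled exactly as shown (the check_params helper is hypothetical, and a hyphenated spelling such as nvidia-drm.modeset=1 would not be matched by this simple comparison):

```shell
# Report whether each expected parameter appears on a kernel command line.
# $1: file holding the command line (defaults to /proc/cmdline).
check_params() {
    cmdline_file="${1:-/proc/cmdline}"
    for want in nvidia_drm.modeset=1 pcie_port_pm=off usbcore.autosuspend=-1; do
        found="no"
        # Unquoted command substitution splits the command line on whitespace.
        for have in $(cat "$cmdline_file"); do
            if [ "$have" = "$want" ]; then
                found="yes"
            fi
        done
        echo "$want: $found"
    done
}
```

Running check_params with no argument checks the currently booted kernel; any line ending in "no" means that parameter is not active.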

Question: Has anyone else experienced these hard freezes (where only REISUB works) with the Nvidia 580.xx drivers?

I’m very new to Linux (I did my research beforehand, but I only made the switch a few weeks ago), and this started happening a few days ago, so I would be very grateful if someone knows what the issue is lol.

Thanks!

Update: matching both monitors to 60Hz did not work.

My guess is that you run out of memory and the oom killer runs to save the system by killing processes that are using a lot of memory.

If I am right, you will find logs in the system journal reporting which processes are being killed off.

I’ve tried using journalctl -b -1 | grep -iE "oom-killer|out of memory|Kill process" to see if something regarding RAM happened during the last boot, to no avail: nothing popped up in the terminal (I don’t think RAM is the problem anyway, as I have 64GB of it and usage rarely goes above 20% when gaming with btop open on another monitor).

However, I then ran sudo journalctl | grep -iE "oom-killer|out of memory|Kill process" to see if anything memory-related happened AT ALL, and I found this log from December 21st. It seems to be something regarding the GPU VRAM (it’s 8GB)?

kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Alloc(…) @ nv_gpu_ops.c:4968
kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359

Full log:

dez 19 02:48:28 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInte
rnal(pMemDesc) @ mem_desc.c:1359
dez 19 02:48:28 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmStatus @ system
_mem.c:356
dez 19 02:48:28 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
c(pRmApi, device->session->handle, isSystemMemory ? device->handle : device->subhandle, &physHandle, isSystemMemory ? NV01_MEMORY_SYSTEM : NV01_MEMORY
_LOCAL_USER, &memAllocParams, sizeof(memAllocParams)) @ nv_gpu_ops.c:4968
dez 19 03:39:19 DESKTOP-C4A3MUO.home kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _me
mdescAllocInternal(pMemDesc) @ mem_desc.c:1359
dez 19 03:39:19 DESKTOP-C4A3MUO.home kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmS
tatus @ system_mem.c:356
dez 19 03:39:19 DESKTOP-C4A3MUO.home kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned fro
m pRmApi->Alloc(pRmApi, device->session->handle, isSystemMemory ? device->handle : device->subhandle, &physHandle, isSystemMemory ? NV01_MEMORY_SYSTEM
: NV01_MEMORY_LOCAL_USER, &memAllocParams, sizeof(memAllocParams)) @ nv_gpu_ops.c:4968
dez 19 03:46:02 DESKTOP-C4A3MUO.home kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _me
mdescAllocInternal(pMemDesc) @ mem_desc.c:1359
dez 19 03:46:02 DESKTOP-C4A3MUO.home kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmS
tatus @ system_mem.c:356
dez 19 03:46:02 DESKTOP-C4A3MUO.home kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned fro
m pRmApi->Alloc(pRmApi, device->session->handle, isSystemMemory ? device->handle : device->subhandle, &physHandle, isSystemMemory ? NV01_MEMORY_SYSTEM
: NV01_MEMORY_LOCAL_USER, &memAllocParams, sizeof(memAllocParams)) @ nv_gpu_ops.c:4968
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:09 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:09 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:09 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:09 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:09 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:09 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:09 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:10 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:10 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:10 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:10 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:11 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:11 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:11 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:11 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:28 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:28 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:28 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:28 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
dez 21 21:08:28 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pmaAllocateP
ages(pMemReserveInfo->pPma, pageSize / PMA_CHUNK_SIZE_64K, PMA_CHUNK_SIZE_64K, &allocOptions, &pageBegin) @ pool_alloc.c:266
dez 21 21:08:28 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from rmMemPoolRes
erve(pCtxBufPool->pMemPool[i], totalSize[i], 0) @ ctx_buf_pool.c:315
dez 21 21:08:28 fedora kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from ctxBufPoolReserve
(pGpu, pKernelChannelGroup->pCtxBufPool, bufInfoList, bufCount) @ kernel_channel_group_api.c:557
dez 21 21:08:28 fedora kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from pRmApi->Allo
cWithSecInfo(pRmApi, hClient, hParent, &pChannelGpfifoParams->hPhysChannelGroup, KEPLER_CHANNEL_GROUP_A, NV_PTR_TO_NvP64(&tsgParams), sizeof(tsgParams
), RMAPI_ALLOC_FLAGS_SKIP_RPC, NvP64_NULL, &pRmApi->defaultSecInfo) @ kernel_channel.c:381
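
A log this repetitive is easier to read as a count of failures per driver source location. A small sketch, assuming unwrapped journal output (for example from journalctl --no-pager) so that each "@ file.c:line" suffix stays on one line; the summarize_nvrm name is just for illustration:

```shell
# Count NVRM failures per "@ file.c:line" location, most frequent first.
# Reads log text on stdin.
summarize_nvrm() {
    grep -o '@ [A-Za-z0-9_]*\.c:[0-9]*' | sort | uniq -c | sort -rn
}

# Example (not run here):
#   sudo journalctl -k --no-pager | grep NV_ERR_NO_MEMORY | summarize_nvrm
```

Each output line is a count followed by the location, which makes it easier to see whether the same allocation path fails every time.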

Ok, I believe I’ve found the issue. Somehow, there were still nouveau GPU drivers on my system even though I only installed the RPM Fusion drivers, so I blacklisted any and all traces of them, and it SEEMED to have worked (as in, I’ve been using the PC the whole day and it didn’t crash).

Code I used (please warn me if I did something wrong):

  • Blacklist the nouveau drivers
    echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
    echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf

  • Edit the GRUB file
    sudo nano /etc/default/grub
    Add these parameters at the end of GRUB_CMDLINE_LINUX:
    rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1

  • Rebuild Initramfs and GRUB
    sudo dracut --force --verbose
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg

After rebooting, lsmod | grep -i nouveau returned nothing on the terminal 🙏

Thanks for the response, Barry!

If you installed the nvidia driver from rpmfusion then the nouveau driver is blacklisted on the kernel command line when installing the nvidia driver. This is what is applied when the nvidia drivers are installed.

rd.driver.blacklist=nouveau,nova_core modprobe.blacklist=nouveau,nova_core

If you were required to add those arguments then you apparently installed the nvidia driver from some other source. Rpmfusion makes certain all the needed tweaks and configs are performed during the installation, while installing from other sources may not handle that for you.

Thus, none of the steps you indicate are necessary if the driver is installed from the rpmfusion repo.

Strange, I’m 99% sure I installed them through the rpmfusion repo (I tried following their how-to again and everything was already installed).

In any case, it was a false positive. It crashed in the same way like 40 minutes after I posted this message, lol.

I was using Lutris when the crash happened, and a “Disk I/O error” popped up. Could it be the cause or the consequence of the crash?

You can show the list of enabled repos with dnf repolist and the currently installed nvidia packages with dnf list --installed \*nvidia\*.
If you post those results here it would be helpful.

dnf repolist
repo id                    repo name
code                       Visual Studio Code
fedora                     Fedora 43 - x86_64
fedora-cisco-openh264      Fedora 43 openh264 (From Cisco) - x86_64
rpmfusion-free             RPM Fusion for Fedora 43 - Free
rpmfusion-free-updates     RPM Fusion for Fedora 43 - Free - Updates
rpmfusion-nonfree          RPM Fusion for Fedora 43 - Nonfree
rpmfusion-nonfree-updates  RPM Fusion for Fedora 43 - Nonfree - Updates
updates                    Fedora 43 - x86_64 - Updates


dnf list --installed \*nvidia\*
Pacotes instalados
akmod-nvidia.x86_64                        3:580.119.02-1.fc43 rpmfusion-nonfree-updates
kmod-nvidia-6.17.12-300.fc43.x86_64.x86_64 3:580.119.02-1.fc43 @commandline
nvidia-gpu-firmware.noarch                 20251125-1.fc43     <desconhecido>
nvidia-modprobe.x86_64                     3:580.119.02-1.fc43 rpmfusion-nonfree-updates
nvidia-persistenced.x86_64                 3:580.119.02-1.fc43 rpmfusion-nonfree-updates
nvidia-settings.x86_64                     3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia.x86_64                 3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda.x86_64            3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda-libs.i686         3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda-libs.x86_64       3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-kmodsrc.x86_64         3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-libs.i686              3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-libs.x86_64            3:580.119.02-1.fc43 rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-power.x86_64           3:580.119.02-1.fc43 rpmfusion-nonfree-updates

My system is in Portuguese, but “Pacotes instalados” is “Installed packages” and “desconhecido” is “unknown”.

This is quite obvious: the GPU VRAM is full and programs cannot allocate more memory on the GPU.
Close some programs or windows (e.g. too many Firefox windows and tabs open at the same time?).
Maybe a program is not releasing memory on the GPU?
Query the used GPU memory, e.g. with nvtop or nvidia-smi.
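
For the nvidia-smi route, one possible way to turn its output into a single percentage is sketched below. It assumes the CSV query mode (--query-gpu=memory.used,memory.total with --format=csv,noheader,nounits), which prints one "used, total" line in MiB per GPU; the vram_percent helper is a made-up name:

```shell
# Convert "used, total" lines (MiB) on stdin into an integer percentage used.
vram_percent() {
    awk -F', *' '$2 > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Query the GPU only if nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total \
               --format=csv,noheader,nounits | vram_percent
fi
```

Watching this value while a game runs would show whether usage is really pinned near 100% before a crash.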

See if installing egl-wayland2 changes anything.

This may be related

Hi everyone, thanks for all the responses! I’ll try them out and report any results.

Regarding the F42 memory leak in Firefox, I checked whether there were any “Isolated Web Co” processes running and there sure were; all PIDs seemed to correlate with the open tabs (in any case, I disabled hardware acceleration and the performance settings, just to make sure).

As for the VRAM usage, I tried to force a crash by running a game on max settings while keeping multiple browser tabs open to max out the GPU memory. Surprisingly, I couldn’t reproduce the crash during this stress test. This makes me wonder if VRAM is the sole culprit, especially since the actual crashes happen much more frequently than the system logs suggest (sometimes multiple times a day, whereas the logs only caught it on Dec 21st and Dec 26th).

I also noticed something odd in the SMART logs: one of my SSD sensors (sudo smartctl -a /dev/nvme0n1) seems stuck at 70 degrees Celsius, even immediately after a cold boot. I also see a high count of unsafe shutdowns, which I assume is just a consequence of the hard freezes/reboots I’ve been forced to perform:

Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     89 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.80W       -        -    0  0  0  0        0       0
 1 +     7.10W       -        -    1  1  1  1        0       0
 2 +     5.20W       -        -    2  2  2  2        0       0
 3 -   0.0620W       -        -    3  3  3  3     2500    7500
 4 -   0.0620W       -        -    4  4  4  4     2500    7500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        57 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1.844.018 [944 GB]
Data Units Written:                 1.994.139 [1,02 TB]
Host Read Commands:                 10.415.866
Host Write Commands:                22.943.801
Controller Busy Time:               29
Power Cycles:                       157
Power On Hours:                     125
Unsafe Shutdowns:                   95
Media and Data Integrity Errors:    0
Error Information Log Entries:      287
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:               70 Celsius

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0        287     0  0x7017  0x4004  0x028            0     0     -  Invalid Field in Command
  1        286     0  0x0017  0x4004      -            0     0     -  Invalid Field in Command
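
To track individual counters from this report over time, a single field can be pulled out by its label. A minimal sketch that only assumes the "Label:   value" layout shown above; smart_field is a hypothetical helper name:

```shell
# Print the value of one "Label: value" line from smartctl output on stdin.
# $1: the label exactly as printed, without the trailing colon.
smart_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Example (needs root and smartmontools, not run here):
#   report=$(sudo smartctl -a /dev/nvme0n1)
#   echo "$report" | smart_field "Unsafe Shutdowns"
#   echo "$report" | smart_field "Power Cycles"
```

With the values in the report above, 95 of 157 power cycles ended in an unsafe shutdown, which is roughly 60%.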

Hello again!

Unfortunately, none of the following solved my problem: adding a heatsink to my SSD (it did help with its temperature, though), installing egl-wayland2 (through sudo dnf install egl-wayland2, is that the right command?), or updating the BIOS. It’s definitely something “load” related, as it only happens when playing medium/heavy games, but the browser doesn’t even need to be open: it has crashed while only the game (e.g. Ninja Gaiden 4, Marvel Rivals, Clair Obscur, Powerwash Simulator 2) was open. Opening lots of Firefox windows to clog the VRAM, on the other hand, doesn’t seem to crash the PC; it only loses performance as it starts to use RAM.

Something else that happened is that, when a game running through Lutris crashes, Lutris reports a “Disk I/O error” and says that some file within gets a “Read-only” error before closing the game (and crashing the system along with it).

I am having this issue, although it’s not a “piecing itself off” but rather a sudden complete freeze. However, when I disconnect or reconnect a USB device I can still hear the tone as it’s mounted/unmounted.

I have been reading that it’s a problem with Wayland and the Nvidia drivers, and the recommendation was to switch to X11. Has anyone had success with that?

Please understand that Fedora (both Workstation and KDE) no longer supports X11 as of the release of F43. There are 2 or 3 spins of Fedora that still support X11, but not the one in this topic (KDE).

You speak of recommendations to switch to X11 but fail to identify where those were seen. If they came from AI, then recognize that such suggestions are often out of date: they may have been valid a year or more back but not today. Advice found on mailing lists or forums may be similarly dated.

Your reported symptoms are not the same as the OP’s. “Me too!” is not helpful in many cases unless you have exactly the same hardware, software, and symptoms.

Please open your own thread and provide detailed info about the problem so we may focus on this problem for the OP and on your topic for your problem.

I’m now trying the steps suggested in this Arch forum thread, mainly:

- Try adding nvme_core.default_ps_max_latency_us=0 pcie_aspm=off to your kernel parameters.
- Try adding iommu=soft to your kernel config.
- Check if your drive is overheating (watch --interval 0.25 sudo nvme smart-log /dev/nvme0)
- Try a different slot if possible
- Check if your SSD is properly connected

I’m not quite sure if it’s a hardware issue (I hope not), as I tried smartctl’s short and long tests on my SSD and both passed without errors (besides that sensor that seems to be broken?).

In most cases this would not indicate a problem with a drive. I think the drive temp you noticed is a red herring, and the high count of unsafe shutdowns is probably related to the crashes you note.

The CPU, GPU, and RAM are heavily involved in games while a drive would be accessed infrequently in comparison.

Are these games being played on Steam, Lutris, or Bottles?

Both Steam and Lutris have crashed my system before.

Ok, thanks. Will start a new thread once it freezes so I can get the journal files.

I’ve got good news and bad news (for myself only lmao):

The good news is that I was able to log the errors I’ve been having through sudo dmesg -wH.

The bad news is that I’m 75% sure this is a hardware issue, as I have a KC3000 SSD on a very “entry-level” motherboard and PSU, and the parameters suggested in the first few lines of the log (nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off) were already in use when this crash happened.

Given that I’ve already nuked every power-saving feature via kernel parameters and the drive still disconnects under load returning 0xffffffff, is there any software stone left unturned? Or is this definitively a case of the Mobo/PSU not handling the KC3000?

 [jan10 04:33] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff

[  +0,000006] nvme nvme0: Does your device have a faulty power saving mode enabled?

[  +0,000001] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug

[  +0,014367] nvme0n1: Read(0x2) @ LBA 548035856, 512 blocks, Host Aborted Command (sct 0x3 / sc 0x71)

[  +0,000008] I/O error, dev nvme0n1, sector 548035856 op 0x0:(READ) flags 0x80700 phys_seg 9 prio class 2

[  +0,000010] nvme0n1: Read(0x2) @ LBA 713777568, 128 blocks, Host Aborted Command (sct 0x3 / sc 0x71)

[  +0,000002] I/O error, dev nvme0n1, sector 713777568 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 2

[  +0,000006] nvme0n1: Read(0x2) @ LBA 687120920, 256 blocks, Host Aborted Command (sct 0x3 / sc 0x71)

[  +0,000002] I/O error, dev nvme0n1, sector 687120920 op 0x0:(READ) flags 0x80700 phys_seg 29 prio class 2

[  +0,003634] nvme 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible

[  +0,000285] nvme nvme0: Disabling device after reset failure: -19

[  +0,006642] I/O error, dev nvme0n1, sector 1248102904 op 0x1:(WRITE) flags 0x4000800 phys_seg 1 prio class 2

[  +0,000494] EXT4-fs (nvme0n1p2): shut down requested (2)

[  +0,000004] Aborting journal on device nvme0n1p2-8.

[  +0,000005] Buffer I/O error on dev nvme0n1p2, logical block 262144, lost sync page write

[  +0,000002] JBD2: I/O error when updating journal superblock for nvme0n1p2-8.

[  +0,005951] systemd-journald[508]: /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/system.journal: Journal file corrupted, rotating.

[  +0,000029] systemd-journald[508]: Failed to rotate /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/system.journal: Input/output error

[  +0,000314] BTRFS: error (device nvme0n1p3) in btrfs_commit_transaction:2535: errno=-5 IO failure (Error while writing out transaction)

[  +0,000005] BTRFS info (device nvme0n1p3 state E): forced readonly

[  +0,000002] BTRFS warning (device nvme0n1p3 state E): Skipping commit of aborted transaction.

[  +0,000001] BTRFS error (device nvme0n1p3 state EA): Transaction aborted (error -5)

[  +0,000002] BTRFS: error (device nvme0n1p3 state EA) in cleanup_transaction:2020: errno=-5 IO failure

[  +0,000308] systemd-journald[508]: Failed to write entry to /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/system.journal (15 items, 445 bytes) despite vacuuming, ignoring: Input/output error

[  +0,000286] systemd-journald[508]: Failed to rotate /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/system.journal: Read-only file system

[  +0,000641] systemd-journald[508]: /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/system.journal: IO error, rotating.

[  +0,000003] systemd-journald[508]: Suppressing rotation, as we already rotated immediately before write attempt. Giving up.

[  +0,005671] systemd-journald[508]: /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/user-1000.journal: IO error, rotating.

[  +0,000043] systemd-journald[508]: Failed to rotate /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/user-1000.journal: Read-only file system

[  +0,000708] systemd-journald[508]: Failed to rotate /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/system.journal: Read-only file system

[  +0,000423] systemd-journald[508]: /var/log/journal/1ec85d174c3a4376b68e32710e3fb626/system.journal: IO error, rotating.

[  +0,000003] systemd-journald[508]: Suppressing rotation, as we already rotated immediately before write attempt. Giving up.

[  +0,000356] systemd-journald[508]: Suppressing rotation, as we already rotated immediately before write attempt. Giving up.

[  +0,317636] BTRFS error (device nvme0n1p3 state EA): run_delalloc_nocow failed, root=256 inode=839802 start=0 len=4096 cur_offset=0 oe_cleanup=0 oe_cleanup_len=0 untouched_start=0 untouched_len=4096: -5

[  +0,000008] BTRFS error (device nvme0n1p3 state EA): failed to run delalloc range, root=256 ino=839802 folio=0 submit_bitmap=0 start=0 len=4096: -5

[  +0,575669] coredump: 1610(in:imjournal): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,337489] coredump: 53608(rsyslogd): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,246252] coredump: 53610(rsyslogd): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,250070] coredump: 53613(rsyslogd): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,086046] coredump: 1132(systemd-logind): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,042363] coredump: 53628(rsyslogd): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,119970] coredump: 53647(rsyslogd): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,392123] coredump: 28529(steamwebhelper): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,401867] BTRFS error (device nvme0n1p3 state EA): error loading props for ino 729339 (root 257): -5

[  +0,008623] BTRFS error (device nvme0n1p3 state EA): error loading props for ino 729929 (root 257): -5

[  +0,191961] coredump: 28238(steam): |/usr/lib/systemd/systemd-coredump pipe failed

[  +0,014614] coredump: 1(steamwebhelper): |/usr/lib/systemd/systemd-coredump pipe failed

[  +3,628565] btrfs_dev_stat_inc_and_print: 6665 callbacks suppressed

[  +0,000003] BTRFS error (device nvme0n1p3 state EA): bdev /dev/nvme0n1p3 errs: wr 894, rd 5782, flush 0, corrupt 0, gen 0

[  +0,000047] BTRFS error (device nvme0n1p3 state EA): bdev /dev/nvme0n1p3 errs: wr 895, rd 5782, flush 0, corrupt 0, gen 0

[  +0,000043] BTRFS error (device nvme0n1p3 state EA): bdev /dev/nvme0n1p3 errs: wr 896, rd 5782, flush 0, corrupt 0, gen 0

[  +0,000010] BTRFS error (device nvme0n1p3 state EA): bdev /dev/nvme0n1p3 errs: wr 897, rd 5782, flush 0, corrupt 0, gen 0

[  +0,000020] BTRFS error (device nvme0n1p3 state EA): bdev /dev/nvme0n1p3 errs: wr 898, rd 5782, flush 0, corrupt 0, gen 0

... (many more errors of this nature)
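For anyone comparing their own logs, the order of events in the excerpt above can be pulled out mechanically. Here is a small Python sketch that only looks for substrings that appear in this log; the labels and function names are made up for illustration, and other kernels or drives may word these messages differently.

```python
import re

# Substrings taken from the dmesg excerpt above, each mapped to a short label.
SIGNATURES = [
    ("controller is down", "nvme controller stopped responding"),
    ("Unable to change power state from D3cold to D0", "PCIe device inaccessible"),
    ("Disabling device after reset failure", "nvme reset failed, device disabled"),
    ("forced readonly", "btrfs switched to read-only"),
]


def first_occurrences(dmesg_text: str) -> list[str]:
    """Return labels in the order their signature first appears in the log."""
    seen = []
    for line in dmesg_text.splitlines():
        for needle, label in SIGNATURES:
            if needle in line and label not in seen:
                seen.append(label)
    return seen


def last_btrfs_error_counts(dmesg_text: str):
    """Return (write_errors, read_errors) from the last 'errs: wr N, rd M' line, or None."""
    matches = re.findall(r"errs: wr (\d+), rd (\d+)", dmesg_text)
    if not matches:
        return None
    wr, rd = matches[-1]
    return int(wr), int(rd)
```

Fed the text of the log above, first_occurrences lists the NVMe controller going down first and Btrfs going read-only last. That suggests the filesystem and journald errors are a consequence of the drive dropping off the bus rather than a separate problem.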