Hey guys, please help me. I googled this but I can’t fix it myself.
I have a ThinkPad E595 with an integrated AMD Vega GPU, and for some reason the OpenGL drivers don’t load properly. Most of the time it works okay, but every other day it starts rendering all windows black, and shortly after that it crashes completely. It’s really annoying.
Another weird thing happens when I watch a video in mpv in fullscreen mode with Firefox open in the background. When I exit fullscreen mode in mpv, the Firefox window is completely black until I move my cursor over it. This happens ONLY in Firefox. After some trial and error I found that setting the Firefox flag gfx.x11-egl.force-disabled to true fixes the black screen problem completely. But after the latest video driver update even this flag doesn’t seem to help.
Anyway, I’m not sure what else I can say. Here is some command output that might be useful.
I found a thread here that ends with a suggestion to update the firmware. I think it is about a similar card to yours (Ryzen 5 3500U Vega 8?), and it mentions sporadic problems (once every 1-3 days).
I’m not sure how to update that firmware though. I wonder if fwupdmgr get-updates would do it?
fwupdmgr get-updates
Devices with no available firmware updates:
• System Firmware
• UEFI Device Firmware
• UEFI Device Firmware
Devices with the latest available firmware version:
• MZALQ256HAJD-000L1
• UEFI dbx
No updates available
The thing is, I have this problem only on Fedora. I was using Arch and Ubuntu earlier and they worked fine.
I wonder, is it possible that this problem comes from badly configured user groups? I read somewhere that the user is supposed to be in the render and video groups, and I don’t see those when I run groups in a terminal.
There might also be some VBIOS updates provided by laptop manufacturer (bundled with BIOS/UEFI update) or GPU manufacturer (with flashing utility or bundled with drivers, mostly for desktop GPUs).
That bug report I linked to earlier also stated that the problem didn’t occur on older kernels. So if the other distros are running older kernels, that might be why they don’t see the problem. In any case, it sounds like a pretty low-level driver/firmware problem. So your options to fix it are probably to try different drivers/firmware or maybe tweak what features the driver is trying to use with some parameters. For the latter, you should be able to get a list of what tunables are available by running modinfo <driver-name>. Also, lspci -v should work to verify what driver your device is using.
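For what it’s worth, here is a rough sketch of how the two commands mentioned above could be used to narrow things down. The filter_params helper and the keyword "support" are just illustrative assumptions, not something from the bug report; the driver name amdgpu is assumed to be what lspci reports for this card.

```shell
# Filter "name:description" lines (the format printed by `modinfo -p`)
# by a case-insensitive keyword, sorted by parameter name.
# filter_params is a hypothetical helper, not part of any package.
filter_params() {
    grep -i -- "$1" | sort
}

# List only the amdgpu tunables whose name or description mentions "support".
if command -v modinfo >/dev/null 2>&1; then
    modinfo -p amdgpu 2>/dev/null | filter_params support || true
fi

# -k adds the "Kernel driver in use" and "Kernel modules" lines for each device.
if command -v lspci >/dev/null 2>&1; then
    lspci -k | grep -i -A3 'vga' || true
fi
```

Filtering by keyword makes a long parameter list easier to skim than reading it top to bottom.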
That theory can be easily tested: sudo usermod -a -G <group name> <username> would add the user to the named group. Log out and back in to make the change effective.
I’m not sure which parameters I’m supposed to change; there are like 80 of them.
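If it helps, the group check can be sketched like this. The missing_groups helper is a made-up name for illustration, and the render and video group names are the ones mentioned earlier in the thread; the script only prints the usermod commands and does not run them.

```shell
# Print each wanted group that is absent from a space-separated group list.
# $1 = current groups (for example the output of `id -nG`), rest = wanted groups.
# missing_groups is a hypothetical helper name.
missing_groups() {
    mg_current=" $1 "
    shift
    for mg_group in "$@"; do
        case "$mg_current" in
            *" $mg_group "*) ;;                 # already a member
            *) printf '%s\n' "$mg_group" ;;     # not a member
        esac
    done
}

# Show (but do not run) the command for every missing group.
for g in $(missing_groups "$(id -nG)" render video); do
    echo "sudo usermod -a -G $g $(id -un)"
done
```

If nothing is printed, the user is already in both groups and the theory can be ruled out.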
And what do you mean by “try different drivers”? How do I do that? I only know about amdgpu…
sudo lspci -v
<...>
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] (rev c2) (prog-if 00 [VGA controller])
Subsystem: Lenovo ThinkPad E595
Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 13
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=2M]
I/O ports at 1000 [size=256]
Memory at d0500000 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=3 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [200] Physical Resizable BAR
Capabilities: [270] Secondary PCI Express
Capabilities: [2a0] Access Control Services
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Kernel driver in use: amdgpu
Kernel modules: amdgpu
<...>
In this case, I probably should have said different driver versions. I.e., if you are up to running a different/older kernel version, that might have a chance of working based on that linked report. (On Linux, the drivers come with the kernel, so different kernel version ~~ different driver version.)
That is more of a last-resort trial-and-error sort of thing. Since your graphics card is a fairly new model, it might not be fully supported by the amdgpu driver yet. You might get better results by enabling the exp_hw_support option.
To try that, create a new file under /etc/modprobe.d containing the following line.
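The line itself seems to have been lost from this post. Assuming the usual modprobe.d syntax and the exp_hw_support option named above, it would presumably be "options amdgpu exp_hw_support=1". A minimal sketch follows; the file name amdgpu-exp.conf and the write_amdgpu_conf helper are assumptions, any name ending in .conf should do.

```shell
# Hypothetical file name; modprobe reads every *.conf file in /etc/modprobe.d.
conf_name="amdgpu-exp.conf"

# Write the option line into the directory given as $1.
write_amdgpu_conf() {
    printf 'options amdgpu exp_hw_support=1\n' > "$1/$conf_name"
}

# Dry run in a scratch directory, just to show the resulting file.
scratch="$(mktemp -d)"
write_amdgpu_conf "$scratch"
cat "$scratch/$conf_name"
rm -r "$scratch"

# For real, as root:
#   write_amdgpu_conf /etc/modprobe.d
```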
Yeah. I expect that if you download the same kernel version that you are using for Arch, you will get the same behavior. (I think you said/implied that your video was working properly under Arch.) Here is a link to a Fedora Magazine article that explains how to download and install a specific kernel on Fedora Linux.
Edit: You should also remove that file from /etc/modprobe.d since it didn’t work.
Edit: Actually, I may have overlooked something. It looks like the amdgpu driver is loaded early in the bootup process.
Since the amdgpu driver is loaded during the initramfs stage, you would need to have that modprobe configuration file in your initramfs. So before going through the hassle of trying to downgrade your kernel, you might want to try running sudo dracut -f with that modprobe conf file present. You can verify that your config file got included by running the same sort of command I showed above for checking the presence of amdgpu.ko, just substitute the name that you chose for your modprobe conf file. Once you have that file included, reboot to test it. Rebuilding your initramfs is only necessary the first time. Future kernel installations/upgrades should automatically pull in that file.
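The earlier command is not visible in this thread, so the following is only a guess at what such a check could look like, assuming it was an lsinitrd listing filtered with grep. The file name amdgpu-exp.conf and the listed_in helper are hypothetical; substitute the name you actually chose.

```shell
conf_name="amdgpu-exp.conf"   # hypothetical; use the name you chose

# Succeeds if the listing on stdin contains the given string.
listed_in() {
    grep -qF -- "$1"
}

# lsinitrd ships with dracut and lists the contents of the current initramfs.
# Reading the initramfs usually requires root, so only try it as root.
if command -v lsinitrd >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    if lsinitrd 2>/dev/null | listed_in "etc/modprobe.d/$conf_name"; then
        echo "$conf_name is in the initramfs"
    else
        echo "$conf_name is not in the initramfs; try: dracut -f"
    fi
fi
```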
Edit: You can also verify that the parameter took effect by running the following command.
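The command is missing from this post. Loaded modules normally expose their readable parameters under /sys/module/<module>/parameters, so it was probably something along these lines (a sketch under that assumption; read_param is a hypothetical helper name):

```shell
# Read a module parameter from sysfs.
# $1 = module, $2 = parameter, $3 = optional sysfs root (defaults to /sys).
read_param() {
    cat "${3:-/sys}/module/$1/parameters/$2" 2>/dev/null
}

# Prints the current value if amdgpu is loaded and exposes the parameter.
read_param amdgpu exp_hw_support \
    || echo "amdgpu is not loaded or the parameter is not exposed"
```

A value of 1 would mean the option from the modprobe conf file was picked up.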
Oh yeah, it does show 1, but still no luck. Anyway, thanks for your help.
I don’t think I’m going to follow through with downgrading the kernel. It seems like a bad idea to me and a little bit unnecessary. Maybe I just need to try my luck with another OS.
Okay, it seems I was wrong about not having this problem on other distros. My compositor wasn’t crashing as often when I was using Garuda and EndeavourOS, but now I actually downloaded a couple of live ISOs and tested them. It turns out I get the same amdgpu_device_initialize failed error on Garuda (Arch with the zen kernel, I think), MX Linux (Debian 11 with the 5.10 kernel), and Kubuntu. So there is that.