Blender, AMD, and OpenCL

How do you feel about writing a howto? Or doing a Fed Mag article?

I feel like an impostor! But this is useful info, so I’m for it :smiley:

@glb and @rlengland ,
Could you guys use an article about this?

Is this about the Blender removal of OpenCL support on AMD GPUs and the effort to make Blender usable on those graphics cards? It may be a bit specific, but there are probably enough Blender users in the Fedora Linux realm to make it worthwhile.

That’s my 2 cents worth.

(I should read the entire thread, eh? :slight_smile: )

I think I concur with the statement that the author made in the last paragraph:

Obviously I don’t encourage anyone to do what I did …

While I’m sure it is a useful hack for many people to work around the current problem, I don’t think Fedora Magazine should promote this.

Just my 2¢

@rlengland I agree that this may be a pretty specific use case. With Nvidia cards, there’s so much that ‘just works’ for Blender users who run Fedora, but in order for AMD users to be able to get GPU compute capability out of their cards, more steps are necessary. This was something I really needed when I switched.

@glb That warning rings true. It is a hack, and someone else’s mileage may vary. In fact, it’s entirely possible that this hack will only work until AMD produces a newer proprietary driver (if there are changes to the OpenCL portion) or if Blender makes a breaking change to their HIP implementation.

My intention was only to point other Blender/Fedora AMD users toward something that helped me get GPU compute capabilities out of the Cycles render engine under Blender 3.2. The reddit article was difficult for me to find, so I thought drawing some attention to it may help others find it, too!

Yes, AMD is unfortunately being problematic, and Nvidia is not always able to “just work”, although the included drivers are pretty good.
Perhaps a howto doc https://docs.fedoraproject.org/en-US/quick-docs/contribute-to-quick-docs/ would be more appropriate.

Hi,

First off, thank you for creating this post!

After following the steps I unfortunately still had a few issues:

  1. No devices would show up in the HIP Tab in the Blender System Settings.
    When I checked the console where I opened blender I saw an error something like “which: hipcc not found”. (I don’t have the exact error anymore unfortunately)
    To get hipcc I had to install the package hip-devel

  2. Even after installing that package I didn’t have /opt/rocm/bin (where hipcc is located) in my $PATH, so I created the following file:
    echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64' | sudo tee -a /etc/profile.d/rocm.sh
    (I found this command in the ROCm Docs)
    Only then did my GPU and CPU (5900XT & 5800X) show up in the HIP panel in Blender.

  3. After turning on both devices in the Blender HIP System Settings I tried rendering in Cycles using the GPU but got the following error in the console (and the render just didn’t start):

[...] 
Compiling HIP kernel ...
hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O3 -ffast-math --amdgpu-target=gfx1030 -I /usr/share/blender/3.2/scripts/addons/cycles/source --genco /usr/share/blender/3.2/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/joni/.cache/cycles/kernels/cycles_kernel_gfx1030_D473C7AC23F611898D03A1AA639945AA"
clang-14: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
Failed to execute compilation command, see console for details.
[...] 

So I tinkered and researched for a bit and tried installing the package “rocm-device-libs” which provides /usr/lib64/amdgcn/bitcode which apparently is needed?!
I also had to set DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode because hipcc just wouldn’t find it.
So I expanded the file which I created earlier by a second line:

[joni@linuxjoni02 bin]$ cat /etc/profile.d/rocm.sh
#FOR BLENDER
export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64
export DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode

And now it seems to work. I could at least render the default cube a couple of times.

Did I miss something, or does somebody know why I had to do these extra steps?
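Both failures described above (hipcc not being found, and clang not finding the ROCm device library) can be checked for before starting Blender. What follows is only a minimal sketch: the helper names check_cmd and check_bitcode_dir are made up for illustration, and it assumes the device library directory contains .bc bitcode files.

```shell
# Report whether a command is reachable through PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Report whether a directory exists and holds at least one .bc file.
# Assumption: the ROCm device libraries are shipped as .bc bitcode files.
check_bitcode_dir() {
    dir=$1
    if [ -d "$dir" ] && ls "$dir"/*.bc >/dev/null 2>&1; then
        echo "ok: $dir"
    else
        echo "no bitcode in: $dir"
    fi
}

# hipcc comes from hip-devel and lives in /opt/rocm/bin in this thread.
check_cmd hipcc

# Use DEVICE_LIB_PATH when set, else the path provided by rocm-device-libs.
check_bitcode_dir "${DEVICE_LIB_PATH:-/usr/lib64/amdgcn/bitcode}"
```

If the first check prints "missing: hipcc", the PATH entry from /etc/profile.d/rocm.sh has probably not been picked up yet; a new login shell is needed before it applies.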

You’re so welcome!

I’m sorry to hear that you encountered some bumps along the way. It sounds like you did a lot of extra research and work to make this possible on your machine. I’m not certain why you needed to do so much more faffing than I did in order to get Blender to recognize your AMD GPU/CPU.

How do you download Blender? For instance, I always download the AppImage directly from Blender.org. Do you use the Fedora repositories? Flatpak?

All the HIP packages I needed were in those libraries I downloaded, as well as the amdgpu-install package for hip in step 5.

I have found, though, that the way that I installed things above has caused great instability in my X and Wayland experience when using the Blender viewport with my AMD GPU (I also have an NVIDIA GPU that I use in tandem to use viewport now). I think it’s related to this issue: https://discussion.fedoraproject.org/t/random-freezes-on-fedora-36-with-amd-gpu/77614/6 also discussed tangentially here: https://discussion.fedoraproject.org/t/where-to-report-amd-gpu-lockups-crasher-hangs-bugs-related-to-seemingly-triggered-by-firefox-va-api-after-resuming-from-suspend/69378.

I think there’s an open bug for it, (https://bugzilla.redhat.com/show_bug.cgi?id=2022980) but almost every time I try to navigate through the viewport using my RX 6700 XT, my entire X session (or Wayland) crashes and I have to pkill my user to log back in. It’s a very frustrating issue, but HIP works consistently for rendering (again, just not viewport in Blender).

I would love to hear about the stability of Blender viewport using the method you described here, and how you’re using Blender (AppImage, flatpak, rpm) to compare!

Hi!
I prefer “standard” packages over Flatpak or AppImage, so I got the latest available Blender version (3.2.1) from the Fedora package testing page:
https://bodhi.fedoraproject.org/updates/FEDORA-2022-7fe04fc1a5

I didn’t have any issues so far, but I only worked with it for ~3h yesterday creating some textures in the node editor.
I wasn’t using the viewport much, but I was moving around/rotating a bit to get a good view in the “rendered” view checking reflections for example and had no issues - it was smooth, fast and stable.

I am running on Wayland (and KDE) - I don’t use X11 anymore, so I can’t say anything about that.

I attached a Screenshot of my Setup (I can only embed one picture unfortunately):

In Blender System Settings I have both the CPU and GPU enabled and can render without issues.

Some additional info about my system:

[joni@linuxjoni02 ~]$ blender --version | head -5
Blender 3.2.1
        build date: 2022-07-11
        build time: 00:00:00
        build commit date: 1970-01-01
        build commit time: 00:00
[joni@linuxjoni02 ~]$ neofetch
             .',;::::;,'.                joni@linuxjoni02                                                                                                                                           
         .';:cccccccccccc:;,.            ----------------                                                                                                                                           
      .;cccccccccccccccccccccc;.         OS: Fedora Linux 36 (KDE Plasma) x86_64                                                                                                                    
    .:cccccccccccccccccccccccccc:.       Kernel: 5.18.16-200.fc36.x86_64                                                                                                                            
  .;ccccccccccccc;.:dddl:.;ccccccc;.     Uptime: 9 mins                                                                                                                                             
 .:ccccccccccccc;OWMKOOXMWd;ccccccc:.    Packages: 2742 (rpm)                                                                                                                                       
.:ccccccccccccc;KMMc;cc;xMMc:ccccccc:.   Shell: bash 5.1.16                                                                                                                                         
,cccccccccccccc;MMM.;cc;;WW::cccccccc,   Resolution: 2560x1440                                                                                                                                      
:cccccccccccccc;MMM.;cccccccccccccccc:   DE: Plasma 5.25.4                                                                                                                                          
:ccccccc;oxOOOo;MMM0OOk.;cccccccccccc:   WM: kwin                                                                                                                                                   
cccccc:0MMKxdd:;MMMkddc.;cccccccccccc;   WM Theme: GruvboxAurorae                                                                                                                                   
ccccc:XM0';cccc;MMM.;cccccccccccccccc'   Theme: [Plasma], Adwaita [GTK2], Breeze [GTK3]                                                                                                             
ccccc;MMo;ccccc;MMW.;ccccccccccccccc;    Icons: [Plasma], candy-icons [GTK2/3]                                                                                                                      
ccccc;0MNc.ccc.xMMd:ccccccccccccccc;     Terminal: cool-retro-term                                                                                                                                  
cccccc;dNMWXXXWM0::cccccccccccccc:,      CPU: AMD Ryzen 7 5800X (16) @ 3.800GHz                                                                                                                     
cccccccc;.:odl:.;cccccccccccccc:,.       GPU: AMD ATI Radeon RX 6800/6800 XT / 6900 XT                                                                                                              
:cccccccccccccccccccccccccccc:'.         Memory: 3284MiB / 31999MiB                                                                                                                                 
.:cccccccccccccccccccccc:;,..                                                                                                                                                                       
  '::cccccccccccccc::;,.                                                                                                                                                                                                                                                                                                                                                                

I have to say a huge THANK YOU for explaining your process. I followed your steps:

  1. Install blender, hip-devel, and rocm-device-libs
  2. printf '%s\n' 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64' 'export DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode' | sudo tee -a /etc/profile.d/rocm.sh
  3. Restart the machine
  4. Login under Wayland OR X

After I followed those steps, now I can use my RX 6700 XT with Blender and navigate around the viewport without having terrible crashes that result in lost work and having to kill the user session to log back in.
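One caveat with step 2: tee -a appends every time it runs, so repeating the step leaves duplicate export lines in rocm.sh. A minimal sketch of a repeatable variant is below. The append_once helper and the ROCM_PROFILE_FILE variable are hypothetical names, and the script writes to a staging file rather than to /etc/profile.d directly so that it can run without root.

```shell
# Append a line to a file only when that exact line is not already present.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if ! grep -qxF -e "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Hypothetical staging file; override with ROCM_PROFILE_FILE=/some/path.
target=${ROCM_PROFILE_FILE:-${TMPDIR:-/tmp}/rocm.sh}

# The two exports from the steps above, one per line.
append_once "$target" 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64'
append_once "$target" 'export DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode'
```

Running it twice still leaves exactly two export lines. The staged file can then be copied into place, for example with sudo install -m 644 /tmp/rocm.sh /etc/profile.d/rocm.sh.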

Thank you, and thank you again.

EDIT: I’m still seeing some instability, but it seems to be localized to certain textures. I could be wrong, but I think my machine is still having trouble with the setup :frowning:

Glad to hear that it somewhat helped. I hope you get the rest sorted out as well.
By the way: If you have the .blend file I can try to reproduce your issue, if that helps.

Uh oh!
I think I got the same issue as you, with my entire system (most likely the AMD driver) crashing because of the Blender viewport (but as far as I can see, it only happens in the rendered viewport).

When checking the last lines of the journalctl output, I see several different amdgpu errors:

Aug 11 21:35:40 linuxjoni02 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=64866, emitted seq=64868
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process blender pid 2370 thread blender:cs0 pid 2392
Aug 11 21:35:45 linuxjoni02 kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Aug 11 21:35:45 linuxjoni02 kernel: amdgpu: Failed to suspend process 0x800a
Aug 11 21:35:45 linuxjoni02 kernel: amdgpu_cs_ioctl: 134 callbacks suppressed
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:45 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:46 linuxjoni02 kernel: amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Aug 11 21:35:46 linuxjoni02 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Aug 11 21:35:46 linuxjoni02 kernel: amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Aug 11 21:35:46 linuxjoni02 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Aug 11 21:35:46 linuxjoni02 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx

And further below

Aug 11 21:35:47 linuxjoni02 kernel: [drm] Skip scheduling IBs!
Aug 11 21:35:47 linuxjoni02 kernel: [drm] Skip scheduling IBs!
Aug 11 21:35:47 linuxjoni02 kernel: [drm] Skip scheduling IBs!
Aug 11 21:35:47 linuxjoni02 kernel: [drm] Skip scheduling IBs!
Aug 11 21:35:47 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:47 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:47 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:47 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:48 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: amdgpu_cs_ioctl: 18 callbacks suppressed
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:35:57 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:36:07 linuxjoni02 kernel: amdgpu_cs_ioctl: 10 callbacks suppressed
Aug 11 21:36:07 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 11 21:36:07 linuxjoni02 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

Just not using the rendered view and then rendering stuff normally seems to still work fine at the moment.
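Since the same few messages repeat many times in the log above, a per-message count can make reports like this easier to compare. This is just a rough sketch: summarize_amdgpu_errors is a made-up helper name, and it assumes every line of interest carries the literal *ERROR* marker seen in the excerpts.

```shell
# Read kernel log lines on stdin and print "count<TAB>message" for each
# distinct message that follows the *ERROR* marker, most frequent first.
summarize_amdgpu_errors() {
    awk '
        {
            pos = index($0, "*ERROR* ")
            # The marker plus its trailing space is 8 characters long.
            if (pos > 0) count[substr($0, pos + 8)]++
        }
        END { for (msg in count) printf "%d\t%s\n", count[msg], msg }
    ' | sort -rn
}
```

It could be fed with journalctl -k -b | summarize_amdgpu_errors for the current boot, or with journalctl -k -b -1 for the previous boot if the crash forced a restart.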

I think I’ll open a bug report on Blender just to be sure, because it doesn’t really sound the same as Blender issue #97591 (Cycles HIP error with image textures on Linux and RDNA1).

Ahh yes. So the ‘fix’ I thought I had has been a fine line and I’m still getting this exact same crash. When you have that bug up, would you mind linking it so that I can throw a +1 on there, too? It’s been making me crazy enough to want to try other Linux distros just to see if someone out there has a magical answer to this dilemma. :frowning:

Here’s the link to my Blender bug report - feel free to add your experience to the “Task”

I was considering AMD for my next GPU after my Nvidia or Intel burns out, precisely because I thought it was open source (ROCm Core Technology · GitHub), which means bugs can be fixed without waiting on upstream. So I am surprised to see ROCm referenced as proprietary. What am I missing?

I wanted to add here that a user on Ask Fedora showed me a better way to accomplish all of this without all the fuss I went through using a copr repo! No need to faff around.

https://discussion.fedoraproject.org/t/how-can-i-get-started-with-blender-cycles-hip-rendering/70090/25?u=bhibb

So… I’m a step closer to resolving this issue. See my update on the Blender issue thread, #100353 (Cycles HIP rendered viewport crashes system/GPU on Linux with RDNA2 GPU).

I tried it on RHEL 8.6 with and without the kernel module, and it only works with the module installed. I’m pretty sure that if we get the kernel module to compile and install, it will work as intended.

Steps to reproduce what I have so far:

  1. Download RHEL 9.0 amdgpu-install:
    https://www.amd.com/en/support/linux-drivers

  2. Check /etc/yum.repos.d/amdgpu.repo:

[joni@linuxjoni02 yum.repos.d]$ cat /etc/yum.repos.d/amdgpu.repo
[amdgpu]
name=AMDGPU 22.10.2 repository
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.0/main/x86_64
enabled=1
gpgcheck=1
gpgkey=file:///etc/amdgpu-install/rocm.gpg.key
[...]
  3. Run sudo amdgpu-install --usecase=hip

  4. Check for error messages and look at make.log:

[joni@linuxjoni02 yum.repos.d]$ sudo amdgpu-install --usecase=hip
AMDGPU 22.10.2 repository                                      7.2 kB/s | 2.9 kB     00:00    
Dependencies resolved.
===============================================================================================
 Package                   Arch       Version                                Repository   Size
===============================================================================================
Installing:
 amdgpu-dkms               noarch     1:5.16.9.22.20.50200-1438747.el9       amdgpu      7.5 M
 rocm-hip-runtime          x86_64     5.1.2.50102-55.el9                     rocm        7.1 k
Installing dependencies:
 amdgpu-dkms-firmware      noarch     1:5.16.9.22.20.50200-1438747.el9       amdgpu      9.5 M
 clang                     x86_64     14.0.5-1.fc36                          updates      82 k
 comgr                     x86_64     2.4.0.50102-55.el9                     rocm         35 M
 hip-runtime-amd           x86_64     5.1.20532.50102-55.el9                 rocm        5.9 M
 hsa-rocr                  x86_64     1.5.0.50102-55.el9                     rocm        522 k
 hsa-rocr-devel            x86_64     1.5.0.50102-55.el9                     rocm         83 k
 hsakmt-roct-devel         x86_64     20220128.1.7.50102-55.el9              rocm         90 k
 rocm-core                 x86_64     5.1.2.50102-55.el9                     rocm         15 k
 rocm-device-libs          x86_64     5.2.0-1.fc36                           updates     597 k
 rocm-language-runtime     x86_64     5.1.2.50102-55.el9                     rocm        7.2 k
 rocm-llvm                 x86_64     14.0.0.22114.50102-55.el9              rocm        641 M
 rocminfo                  x86_64     5.2.0-1.fc36                           updates      33 k

Transaction Summary
===============================================================================================
Install  14 Packages

Total download size: 700 M
Installed size: 3.4 G
Is this ok [Y/n]: Y
Downloading Packages:
(1/14): rocminfo-5.2.0-1.fc36.x86_64.rpm                       205 kB/s |  33 kB     00:00    
(2/14): clang-14.0.5-1.fc36.x86_64.rpm                         462 kB/s |  82 kB     00:00    
(3/14): rocm-device-libs-5.2.0-1.fc36.x86_64.rpm               2.7 MB/s | 597 kB     00:00    
(4/14): hsa-rocr-1.5.0.50102-55.el9.x86_64.rpm                 545 kB/s | 522 kB     00:00    
(5/14): hsa-rocr-devel-1.5.0.50102-55.el9.x86_64.rpm           367 kB/s |  83 kB     00:00    
(6/14): hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64.rpm 554 kB/s |  90 kB     00:00    
(7/14): rocm-core-5.1.2.50102-55.el9.x86_64.rpm                140 kB/s |  15 kB     00:00    
(8/14): amdgpu-dkms-5.16.9.22.20.50200-1438747.el9.noarch.rpm  4.7 MB/s | 7.5 MB     00:01    
(9/14): rocm-hip-runtime-5.1.2.50102-55.el9.x86_64.rpm          46 kB/s | 7.1 kB     00:00    
(10/14): rocm-language-runtime-5.1.2.50102-55.el9.x86_64.rpm    70 kB/s | 7.2 kB     00:00    
(11/14): hip-runtime-amd-5.1.20532.50102-55.el9.x86_64.rpm     3.1 MB/s | 5.9 MB     00:01    
(12/14): amdgpu-dkms-firmware-5.16.9.22.20.50200-1438747.el9.n 3.7 MB/s | 9.5 MB     00:02    
(13/14): comgr-2.4.0.50102-55.el9.x86_64.rpm                   4.9 MB/s |  35 MB     00:07    
(14/14): rocm-llvm-14.0.0.22114.50102-55.el9.x86_64.rpm         19 MB/s | 641 MB     00:34    
-----------------------------------------------------------------------------------------------
Total                                                           19 MB/s | 700 MB     00:36     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                    1/1 
  Preparing        :                                                                       1/1 
  Installing       : rocm-core-5.1.2.50102-55.el9.x86_64                                  1/14 
  Running scriptlet: rocm-core-5.1.2.50102-55.el9.x86_64                                  1/14 
  Installing       : comgr-2.4.0.50102-55.el9.x86_64                                      2/14 
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                   3/14 
  Installing       : hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                   3/14 
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                   3/14 
  Installing       : hsa-rocr-1.5.0.50102-55.el9.x86_64                                   4/14 
  Running scriptlet: hsa-rocr-1.5.0.50102-55.el9.x86_64                                   4/14 
  Installing       : rocminfo-5.2.0-1.fc36.x86_64                                         5/14 
  Installing       : hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                             6/14 
  Running scriptlet: hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                             6/14 
  Installing       : rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                           7/14 
  Running scriptlet: rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                           7/14 
  Installing       : hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                        8/14 
  Running scriptlet: hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                        8/14 
  Installing       : clang-14.0.5-1.fc36.x86_64                                           9/14 
  Installing       : rocm-device-libs-5.2.0-1.fc36.x86_64                                10/14 
  Installing       : rocm-language-runtime-5.1.2.50102-55.el9.x86_64                     11/14 
  Installing       : amdgpu-dkms-firmware-1:5.16.9.22.20.50200-1438747.el9.noarch        12/14 
  Installing       : amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch                 13/14 
  Running scriptlet: amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch                 13/14 
Loading new amdgpu-5.16.9.22.20-1438747.el9 DKMS files...
Building for 5.19.4-200.fc36.x86_64
Building initial module for 5.19.4-200.fc36.x86_64
Error! Bad return status for module build on kernel: 5.19.4-200.fc36.x86_64 (x86_64)
Consult /var/lib/dkms/amdgpu/5.16.9.22.20-1438747.el9/build/make.log for more information.
warning: %post(amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch) scriptlet failed, exit status 10

Error in POSTIN scriptlet in rpm package amdgpu-dkms
  Installing       : rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                          14/14 
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                  14/14 
  Running scriptlet: rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                          14/14 
  Verifying        : amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch                  1/14 
  Verifying        : amdgpu-dkms-firmware-1:5.16.9.22.20.50200-1438747.el9.noarch         2/14 
  Verifying        : clang-14.0.5-1.fc36.x86_64                                           3/14 
  Verifying        : rocm-device-libs-5.2.0-1.fc36.x86_64                                 4/14 
  Verifying        : rocminfo-5.2.0-1.fc36.x86_64                                         5/14 
  Verifying        : comgr-2.4.0.50102-55.el9.x86_64                                      6/14 
  Verifying        : hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                        7/14 
  Verifying        : hsa-rocr-1.5.0.50102-55.el9.x86_64                                   8/14 
  Verifying        : hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                             9/14 
  Verifying        : hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                  10/14 
  Verifying        : rocm-core-5.1.2.50102-55.el9.x86_64                                 11/14 
  Verifying        : rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                          12/14 
  Verifying        : rocm-language-runtime-5.1.2.50102-55.el9.x86_64                     13/14 
  Verifying        : rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                          14/14 

Installed:
  amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch                                          
  amdgpu-dkms-firmware-1:5.16.9.22.20.50200-1438747.el9.noarch                                 
  clang-14.0.5-1.fc36.x86_64                                                                   
  comgr-2.4.0.50102-55.el9.x86_64                                                              
  hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                                                
  hsa-rocr-1.5.0.50102-55.el9.x86_64                                                           
  hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                                                     
  hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                                           
  rocm-core-5.1.2.50102-55.el9.x86_64                                                          
  rocm-device-libs-5.2.0-1.fc36.x86_64                                                         
  rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                                                   
  rocm-language-runtime-5.1.2.50102-55.el9.x86_64                                              
  rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                                                   
  rocminfo-5.2.0-1.fc36.x86_64                                                                 

Complete!
WARNING: amdgpu dkms failed for running kernel
[joni@linuxjoni02 yum.repos.d]$ cat /var/lib/dkms/amdgpu/5.16.9.22.20-1438747.el9/build/make.log
DKMS make.log for amdgpu-5.16.9.22.20-1438747.el9 for kernel 5.19.4-200.fc36.x86_64 (x86_64)
Fri Sep  2 10:20:45 PM CEST 2022
make: Entering directory '/usr/src/kernels/5.19.4-200.fc36.x86_64'
/var/lib/dkms/amdgpu/5.16.9.22.20-1438747.el9/build/Makefile:16: *** dma_resv->seq is missing., exit....  Stop.
make: *** [Makefile:1851: /var/lib/dkms/amdgpu/5.16.9.22.20-1438747.el9/build] Error 2
make: Leaving directory '/usr/src/kernels/5.19.4-200.fc36.x86_64'

For now I’ll try to get a kernel version that’s as close as possible to RHEL 9.0. Maybe that will help.
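Checking the running kernel against the kernel series the DKMS package targets can be scripted before starting the install. This is a rough sketch only: version_le and kernel_base are hypothetical helpers, sort -V comes from GNU coreutils, and the 5.14.0 baseline is my assumption about the RHEL 9.0 kernel series, not something AMD documents here.

```shell
# Succeed when version $1 is less than or equal to version $2.
# Relies on GNU sort -V (version sort) to order the two strings.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip the release suffix: 5.19.4-200.fc36.x86_64 becomes 5.19.4.
kernel_base() {
    printf '%s\n' "${1%%-*}"
}

# Assumption: the el9 amdgpu-dkms package targets the 5.14 kernel series.
baseline=5.14.0
running=$(kernel_base "$(uname -r)")

if version_le "$running" "$baseline"; then
    echo "kernel $running is not newer than $baseline; the DKMS build may work"
else
    echo "kernel $running is newer than $baseline; the DKMS build may fail"
fi
```

On the 5.19.4-200.fc36 kernel from the log above this takes the second branch, which matches the failed module build.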

Any luck on your further experimentation? I’ve never compiled the kernel myself but I’d venture that way if it meant no more gpu crashes!

I got the kernel modules to compile on a 5.14.0-61 kernel and got ROCm from the official AMD installer, but it still didn’t work:

[joni@linuxjoni02 ~]$ sudo amdgpu-install --usecase=hip
[sudo] password for joni:
Last metadata expiration check: 0:23:30 ago on Fri 02 Sep 2022 11:29:03 PM CEST.
Dependencies resolved.
==============================================================================
 Package               Arch   Version                           Repo     Size
==============================================================================
Installing:
 amdgpu-dkms           noarch 1:5.16.9.22.20.50200-1438747.el9  amdgpu  7.5 M
 rocm-hip-runtime      x86_64 5.1.2.50102-55.el9                rocm    7.1 k
Installing dependencies:
 amdgpu-dkms-firmware  noarch 1:5.16.9.22.20.50200-1438747.el9  amdgpu  9.5 M
 clang                 x86_64 14.0.5-1.fc36                     updates  82 k
 comgr                 x86_64 2.4.0.50102-55.el9                rocm     35 M
 hip-runtime-amd       x86_64 5.1.20532.50102-55.el9            rocm    5.9 M
 hsa-rocr              x86_64 1.5.0.50102-55.el9                rocm    522 k
 hsa-rocr-devel        x86_64 1.5.0.50102-55.el9                rocm     83 k
 hsakmt-roct-devel     x86_64 20220128.1.7.50102-55.el9         rocm     90 k
 rocm-core             x86_64 5.1.2.50102-55.el9                rocm     15 k
 rocm-device-libs      x86_64 5.2.0-1.fc36                      updates 597 k
 rocm-language-runtime x86_64 5.1.2.50102-55.el9                rocm    7.2 k
 rocm-llvm             x86_64 14.0.0.22114.50102-55.el9         rocm    641 M
 rocminfo              x86_64 5.2.0-1.fc36                      updates  33 k

Transaction Summary
==============================================================================
Install  14 Packages

Total download size: 700 M
Installed size: 3.4 G
Is this ok [Y/n]:
Downloading Packages:
(1/14): clang-14.0.5-1.fc36.x86_64.rpm        488 kB/s |  82 kB     00:00
(2/14): rocminfo-5.2.0-1.fc36.x86_64.rpm      194 kB/s |  33 kB     00:00
(3/14): rocm-device-libs-5.2.0-1.fc36.x86_64. 2.6 MB/s | 597 kB     00:00
(4/14): hsa-rocr-1.5.0.50102-55.el9.x86_64.rp 542 kB/s | 522 kB     00:00
(5/14): hsa-rocr-devel-1.5.0.50102-55.el9.x86 565 kB/s |  83 kB     00:00
(6/14): hsakmt-roct-devel-20220128.1.7.50102- 317 kB/s |  90 kB     00:00
(7/14): rocm-core-5.1.2.50102-55.el9.x86_64.r 145 kB/s |  15 kB     00:00
(8/14): amdgpu-dkms-5.16.9.22.20.50200-143874 4.9 MB/s | 7.5 MB     00:01
(9/14): rocm-hip-runtime-5.1.2.50102-55.el9.x  66 kB/s | 7.1 kB     00:00
(10/14): rocm-language-runtime-5.1.2.50102-55  70 kB/s | 7.2 kB     00:00
(11/14): amdgpu-dkms-firmware-5.16.9.22.20.50 4.1 MB/s | 9.5 MB     00:02
(12/14): comgr-2.4.0.50102-55.el9.x86_64.rpm                                                                                                                    6.5 MB/s |  35 MB     00:05
(13/14): hip-runtime-amd-5.1.20532.50102-55.el9.x86_64.rpm                                                                                                      938 kB/s | 5.9 MB     00:06
(14/14): rocm-llvm-14.0.0.22114.50102-55.el9.x86_64.rpm                                                                                                          22 MB/s | 641 MB     00:28
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                            23 MB/s | 700 MB     00:30
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                                                                                                                     1/1
  Preparing        :                                                                                                                                                                        1/1
  Installing       : rocm-core-5.1.2.50102-55.el9.x86_64                                                                                                                                   1/14
  Running scriptlet: rocm-core-5.1.2.50102-55.el9.x86_64                                                                                                                                   1/14
  Installing       : comgr-2.4.0.50102-55.el9.x86_64                                                                                                                                       2/14
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                                                                                                                    3/14
  Installing       : hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                                                                                                                    3/14
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                                                                                                                    3/14
  Installing       : hsa-rocr-1.5.0.50102-55.el9.x86_64                                                                                                                                    4/14
  Running scriptlet: hsa-rocr-1.5.0.50102-55.el9.x86_64                                                                                                                                    4/14
  Installing       : rocminfo-5.2.0-1.fc36.x86_64                                                                                                                                          5/14
  Installing       : hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                                                                                                                              6/14
  Running scriptlet: hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                                                                                                                              6/14
  Installing       : rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                                                                                                                            7/14
  Running scriptlet: rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                                                                                                                            7/14
  Installing       : hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                                                                                                                         8/14
  Running scriptlet: hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                                                                                                                         8/14
  Installing       : clang-14.0.5-1.fc36.x86_64                                                                                                                                            9/14
  Installing       : rocm-device-libs-5.2.0-1.fc36.x86_64                                                                                                                                 10/14
  Installing       : rocm-language-runtime-5.1.2.50102-55.el9.x86_64                                                                                                                      11/14
  Installing       : amdgpu-dkms-firmware-1:5.16.9.22.20.50200-1438747.el9.noarch                                                                                                         12/14
  Installing       : amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch                                                                                                                  13/14
  Running scriptlet: amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch                                                                                                                  13/14
Loading new amdgpu-5.16.9.22.20-1438747.el9 DKMS files...
Building for 5.14.0-61.fc36.x86_64
Building initial module for 5.14.0-61.fc36.x86_64
Done.
Forcing installation of amdgpu

amdgpu.ko.xz:
Running module version sanity check.
 - Original module
   - Found /lib/modules/5.14.0-61.fc36.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz
   - Storing in /var/lib/dkms/amdgpu/original_module/5.14.0-61.fc36.x86_64/x86_64/
   - Archiving for uninstallation purposes
 - Installation
   - Installing to /lib/modules/5.14.0-61.fc36.x86_64/extra/

amdttm.ko.xz:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/5.14.0-61.fc36.x86_64/extra/

amdkcl.ko.xz:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/5.14.0-61.fc36.x86_64/extra/

amd-sched.ko.xz:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/5.14.0-61.fc36.x86_64/extra/

amddrm_ttm_helper.ko.xz:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/5.14.0-61.fc36.x86_64/extra/

Running the post_install script:
depmod....

  Installing       : rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                                                                                                                           14/14
  Running scriptlet: hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                                                                                                                   14/14
  Running scriptlet: rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                                                                                                                           14/14
  Verifying        : amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch                                                                                                                   1/14
  Verifying        : amdgpu-dkms-firmware-1:5.16.9.22.20.50200-1438747.el9.noarch                                                                                                          2/14
  Verifying        : clang-14.0.5-1.fc36.x86_64                                                                                                                                            3/14
  Verifying        : rocm-device-libs-5.2.0-1.fc36.x86_64                                                                                                                                  4/14
  Verifying        : rocminfo-5.2.0-1.fc36.x86_64                                                                                                                                          5/14
  Verifying        : comgr-2.4.0.50102-55.el9.x86_64                                                                                                                                       6/14
  Verifying        : hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                                                                                                                         7/14
  Verifying        : hsa-rocr-1.5.0.50102-55.el9.x86_64                                                                                                                                    8/14
  Verifying        : hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                                                                                                                              9/14
  Verifying        : hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                                                                                                                   10/14
  Verifying        : rocm-core-5.1.2.50102-55.el9.x86_64                                                                                                                                  11/14
  Verifying        : rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                                                                                                                           12/14
  Verifying        : rocm-language-runtime-5.1.2.50102-55.el9.x86_64                                                                                                                      13/14
  Verifying        : rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                                                                                                                           14/14

Installed:
  amdgpu-dkms-1:5.16.9.22.20.50200-1438747.el9.noarch           amdgpu-dkms-firmware-1:5.16.9.22.20.50200-1438747.el9.noarch           clang-14.0.5-1.fc36.x86_64
  comgr-2.4.0.50102-55.el9.x86_64                               hip-runtime-amd-5.1.20532.50102-55.el9.x86_64                          hsa-rocr-1.5.0.50102-55.el9.x86_64
  hsa-rocr-devel-1.5.0.50102-55.el9.x86_64                      hsakmt-roct-devel-20220128.1.7.50102-55.el9.x86_64                     rocm-core-5.1.2.50102-55.el9.x86_64
  rocm-device-libs-5.2.0-1.fc36.x86_64                          rocm-hip-runtime-5.1.2.50102-55.el9.x86_64                             rocm-language-runtime-5.1.2.50102-55.el9.x86_64
  rocm-llvm-14.0.0.22114.50102-55.el9.x86_64                    rocminfo-5.2.0-1.fc36.x86_64

Complete!
[joni@linuxjoni02 ~]$


[joni@linuxjoni02 ~]$ sudo dkms status
amdgpu/5.16.9.22.20-1438747.el9, 5.14.0-61.fc36.x86_64, x86_64: installed (original_module exists)
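
For anyone comparing their own setup against the line above, here is a minimal sketch of a check that the module is reported as installed for a particular kernel. The helper name `dkms_installed_for` is made up for illustration, and it assumes the `module/version, kernel, arch: installed` line format shown above; other dkms versions may print the status differently.

```shell
# Sketch only: dkms_installed_for is a hypothetical helper, not part of dkms.
# It reads `dkms status` output on stdin and succeeds if the given module is
# reported as installed for the given kernel release.
dkms_installed_for() {
    module="$1"   # e.g. amdgpu
    kernel="$2"   # e.g. the output of `uname -r`
    while IFS= read -r line; do
        case "$line" in
            # Assumed format: "<module>/<version>, <kernel>, <arch>: installed ..."
            "$module"/*", $kernel, "*": installed"*) return 0 ;;
        esac
    done
    return 1
}

# Usage on a live system (needs dkms installed):
#   sudo dkms status | dkms_installed_for amdgpu "$(uname -r)" \
#       && echo "amdgpu DKMS module is installed for the running kernel"
```

This only confirms what DKMS reports; it does not tell you whether the running kernel actually loaded the DKMS-built module instead of the original one.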

I also tried to reproduce this issue on Red Hat 8.6, and Blender worked there without the issue.

I have no idea what the problem is.
Something on Fedora must still be different from Red Hat.

Unfortunately, my ticket on the AMD GitLab was closed because Fedora is not supported.