Mystro256/rocm-opencl

Description

WARNING: REPO IS FOR TESTING PURPOSES ONLY. PLEASE USE AT YOUR OWN RISK.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Installation Instructions

Instructions not filled in by author. Author knows what to do. Everybody else should avoid this repo.

Active Releases

The following unofficial repositories are provided as-is by owner of this project. Contact the owner directly for bugs or issues (IE: not bugzilla).

Release Architectures Repo Download Fedora rawhide ppc64le (0)*, x86_64 (0)* Fedora rawhide (0 downloads)

* Total number of packages downloaded in the last seven days.


This is a companion discussion topic for the original entry at https://copr.fedorainfracloud.org/coprs/mystro256/rocm-opencl/
1 Like

Thanks for doing this. I’d struggled to get a stable OpenCL 2.0 dev environment working for years with lots of sloppy work.

1 Like

I installed Fedora 36 and this repo works nicely with my 6600 XT!

I’m not sure if it’s because of the app I’m using, but I needed to involve a file from Clang 13 and had to install clang13-libs.

sudo mkdir -p '/usr/lib64/clang/13.0.1/include' && sudo ln -s '/usr/lib64/llvm13/lib/clang/13.0.1/include/opencl-c-base.h' '/usr/lib64/clang/13.0.1/include/opencl-c-base.h'

The OpenCL app I’m using works fine after that! I was unable to use Clover, and the instructions to get AMDGPU-PRO on Fedora were messy and fragile. With this repo, all I had to do was install rocm-opencl (and the above clang13).

Firstly: Thanks for your work on this!


I finally bought a new computer last week; it has an AMD GPU (6700xt) and was hoping to use it with darktable and OpenCL. But, unless I’m mistaken… the current build of rocm-opencl for Fedora 36 doesn’t seem to actually contain OpenCL?

$ rpm -ql rocm-opencl
/etc/OpenCL/vendors/amdocl64.icd
/usr/lib/.build-id
/usr/lib/.build-id/03
/usr/lib/.build-id/03/0ca4c95d5b12b624b4571157414936e6080d18
/usr/lib/.build-id/47
/usr/lib/.build-id/47/e9e9968893225a473949a20e160faf6f5749ee
/usr/lib64/libamdocl64.so
/usr/lib64/libcltrace.so.5.1
/usr/lib64/libcltrace.so.5.1.0
/usr/share/licenses/rocm-opencl
/usr/share/licenses/rocm-opencl/LICENSE.txt
$ clinfo 
Number of platforms                               0

Was there a build issue?

I did have OpenCL working a few days ago in some form, but it wouldn’t work with darktable due to headers being in the wrong place (or it looking in the wrong place), as noted above with a symlink (the sudo mkdir -p '/usr/lib64/clang/13.0.1/include' && sudo ln -s '/usr/lib64/llvm13/lib/clang/13.0.1/include/opencl-c-base.h' '/usr/lib64/clang/13.0.1/include/opencl-c-base.h' command)… but it’s impossible to add a symlink like that (in /usr/) on Silverblue, so that was also a problem. But meanwhile, I updated and OpenCL doesn’t even work at all, even outside of darktable.

Something happened with the latest rocm-opencl package update that broke OpenCL. Reverting the update works-around that for now. On F36 for now, I went back to AMDGPU-PRO libraries.

First up, thanks a lot Jeremy Newton (@mystro256) not only for creating and maintaining the ROCm fedora packages, but also for upstreaming all the fixes to the utterly broken build system of all the different ROCm components.

In the process of getting ROCm set up on my Thinkpad P14s with an AMD Ryzen/Radeon Vega 8 APU, I build several libraries from source as well and was amazed by all the unnecessary quirks and hoops one has to jump through to get them even compiled.

On that note, I have OpenCL working now to a certain extent using the official RHEL/CentOS 8 packages, but get various errors using the Fedora packages on a fresh F36 installation. Are there some relevant fixes in the 5.1.1 patch release that I’m using or is it just that the F36 packages are right now broken somehow?

Sorry I was experimenting with different patches, please test 5.1.3 I just uploaded. I put that for review for Fedora, so hopefully that works.

@garrett Regarding:

the current build of rocm-opencl for Fedora 36 doesn’t seem to actually contain OpenCL?

Do you mean libOpenCL.so? That’s provided by the ocl-icd, which should be installed with rocm-opencl. If not, please let me know.

1 Like

The updates work perfectly now. Thank you so much!

I’m using darktable with OpenCL now and it’s so much faster. Everything’s basically instant; it’s a huge difference.

Thanks again!

Thanks, I’m glad it could help. I hope to get this into Fedora and EPEL soon.

1 Like

Hi @mystro256 , thank you so much for all your work, I’m pretty new to ROCm I tried getting pytorch build with ROCm support but I failed to get it working, I’m not sure what I did wrong as far as i understand all i had to do is grab the pytorch sources
git clone --recursive GitHub - ROCmSoftwarePlatform/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Then I needed to set: (according to rocminfo)
export PYTORCH_ROCM_ARCH=gfx90c

Then to build
export USE_NINJA=1
Various fixes for pytorch third party:
export CXXFLAGS="-Wno-error=bitwise-instead-of-logical -Wno-error=unused-but-set-variable -Wno-error=unused-parameter -Wno-error=sign-compare -pthread -Wno-defaulted-function-deleted"
USE_ROCM=1 CC=clang CXX=clang++ MAX_JOBS=4 python setup.py build

But for some reason during setup I noticed the output
– USE_ROCM : OFF

I think it has to do with the fact I might have hipconfig missing have you ever tried to play with getting pytorch enabled with ROCm on Fedora?

Thanks! and sorry if it’s a bit off topic

@arilou So to be clear, I’m not a ROCm developer, so most of what I have done is somewhat trial and error.

From what I understand, pytorch needs HIP, and then a bunch more math libraries on top.
I don’t think my rocm-hip package works yet because none of the mathlibs compile for me.

The cmake file:

seems to imply you need:
hip, rocrand, hiprand, rocblas, miopen, rocfft, hipfft, rocsparse, hipsparse, rccl, rocprim, hipcub, rocthrust

Thanks @mystro256 do you happen to have your own git (with all those projects) where you did the trial and error attempts perhaps I can try playing with it a bit as well and see how far I can get? (In case I do get a bit further ill send a PR to your git)

@arilou Give me a few days, I’ll make a new Copr for HIP. I recently requested a patch to be backported to Fedora’s clang to fix a HIP issue. I’ll try to sync up with Debian, as they’ve made more progress in packaging HIP and the others.

@arilou this might help if you want to attempt it yourself:

Awesome thank you @mystro256 I will try to play with it during the weekend and see how far I get :slight_smile: but I’ll keep an eye to see if you publish a new Copr with HIP

Thanks again,
– Jon.

@arilou
I created a build of rocm-hip:
https://copr.fedorainfracloud.org/coprs/mystro256/rocm-hip/

This likely won’t be imported into Fedora as-is due to complicated build/sources issues, but I’m working with upstream to resolve them. As well, I just got some HW that works with HIP, so I can now work on this. Prior I could only had HW to test OpenCL, so I my packaging was a bit dodgy.

Wow this is awesome thank you @mystro256 I will try to see if i can get pytorch going with this :slight_smile:

For anyone interested, I’ve setup a SIG wiki page:
https://fedoraproject.org/wiki/SIGs/HC

@mystro256 Thanks for packaging rocm, this is one of those packages that was sorely lacking on Fedora.
However, I want to report the problems I noticed:
On a desktop computer with 6900XT there are no problems, but on a laptop with a discrete 6800M card and integrated RENOIR graphics, clinfo hang and consume 100% cpu after printing follow lines: clinfo output - Pastebin.com

Here is backtrace:

[New LWP 115241]

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/ 
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f4634991e1b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
120	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)

Thread 2 (Thread 0x7f463415d640 (LWP 115241) "clinfo"):
#0  __GI___ioctl (fd=fd@entry=4, request=request@entry=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x00007f46341be100 in kmtIoctl (fd=4, request=request@entry=3222817548, arg=arg@entry=0x7f463415cc00) at /usr/src/debug/hsakmt-1.0.6-22.rocm5.1.1.fc37.x86_64/src/libhsakmt.c:13
#2  0x00007f46341bebb2 in hsaKmtWaitOnMultipleEvents (Milliseconds=4294967294, WaitOnAll=<optimized out>, NumEvents=<optimized out>, Events=0x7f463415cd00) at /usr/src/debug/hsakmt-1.0.6-22.rocm5.1.1.fc37.x86_64/src/events.c:312
#3  hsaKmtWaitOnMultipleEvents (Events=0x7f463415cd00, NumEvents=3, WaitOnAll=<optimized out>, Milliseconds=4294967294) at /usr/src/debug/hsakmt-1.0.6-22.rocm5.1.1.fc37.x86_64/src/events.c:286
#4  0x00007f463457355f in rocr::core::Signal::WaitAny (satisfying_value=0x7f463415cdd8, wait_hint=<optimized out>, timeout=<optimized out>, values=0x7f462c000c10, conds=0x558a57b7ed30, hsa_signals=0x7f462c000be0, signal_count=4) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/runtime/signal.cpp:312
#5  rocr::AMD::hsa_amd_signal_wait_any (signal_count=4, hsa_signals=0x7f462c000be0, conds=0x558a57b7ed30, values=0x7f462c000c10, timeout_hint=<optimized out>, wait_hint=<optimized out>, satisfying_value=0x7f463415cdd8) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/runtime/hsa_ext_amd.cpp:505
#6  0x00007f463457ed0a in rocr::core::Runtime::AsyncEventsLoop () at /usr/include/c++/12/bits/stl_vector.h:1123
#7  0x00007f46345492fb in rocr::os::ThreadTrampoline (arg=<optimized out>) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/util/lnx/os_linux.cpp:76
#8  0x00007f463492be9d in start_thread (arg=<optimized out>) at pthread_create.c:442
#9  0x00007f46349ac680 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7f46348a2740 (LWP 115240) "clinfo"):
#0  0x00007f4634991e1b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1  0x00007f4634561f95 in rocr::os::YieldThread () at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/util/lnx/os_linux.cpp:230
#2  rocr::AMD::AqlQueue::ExecutePM4 (this=0x558a57c318c0, cmd_data=<optimized out>, cmd_size_b=<optimized out>) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/runtime/amd_aql_queue.cpp:1243
#3  0x00007f4634554711 in rocr::AMD::GpuAgent::InvalidateCodeCaches (this=0x558a57b79430) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/runtime/amd_gpu_agent.cpp:1513
#4  0x00007f463455d5b7 in rocr::amd::LoaderContext::SegmentAlloc (this=<optimized out>, segment=<optimized out>, agent=..., size=10504, align=256, zero=<optimized out>) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/runtime/amd_loader_context.cpp:475
#5  0x00007f463459044a in rocr::amd::hsa::loader::ExecutableImpl::LoadSegmentsV2 (this=0x558a57a89210, agent=..., c=0x558a57ce7a90) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/loader/executable.cpp:1325
#6  0x00007f4634587a8e in rocr::amd::hsa::loader::ExecutableImpl::LoadSegments (majorVersion=<optimized out>, c=0x558a57ce7a90, agent=..., this=0x558a57a89210) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/loader/executable.cpp:1301
#7  rocr::amd::hsa::loader::ExecutableImpl::LoadCodeObject (this=0x558a57a89210, agent=..., code_object=..., code_object_size=<optimized out>, options=<optimized out>, uri="memory://115240#offset=0x558a57c9f990&size=3472", loaded_code_object=0x0) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/loader/executable.cpp:1262
#8  0x00007f4634570013 in rocr::amd::hsa::loader::ExecutableImpl::LoadCodeObject (loaded_code_object=0x0, uri="memory://115240#offset=0x558a57c9f990&size=3472", options=0x0, code_object=..., agent=..., this=0x558a57a89210) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/loader/executable.cpp:1126
#9  rocr::HSA::hsa_executable_load_agent_code_object (executable=..., agent=..., code_object_reader=..., options=0x0, loaded_code_object=0x0) at /usr/src/debug/rocm-runtime-5.1.3-1.fc37.x86_64/src/core/runtime/hsa.cpp:2293
#10 0x00007f46347e1ba7 in roc::LightningProgram::setKernels (this=0x558a582c4910, binary=0x558a57c9f990, binSize=<optimized out>, fdesc=<optimized out>, foffset=<optimized out>, uri=...) at /usr/src/debug/rocm-opencl-5.1.3-2.fc37.x86_64/ROCclr-rocm-5.1.3/device/rocm/rocprogram.cpp:316
#11 0x00007f463483ee0d in device::Program::loadLC (this=0x558a582c4910) at /usr/src/debug/rocm-opencl-5.1.3-2.fc37.x86_64/ROCclr-rocm-5.1.3/device/devprogram.cpp:1892
#12 device::Program::load (this=0x558a582c4910) at /usr/src/debug/rocm-opencl-5.1.3-2.fc37.x86_64/ROCclr-rocm-5.1.3/device/devprogram.cpp:1903
#13 amd::Program::load(std::vector<amd::Device*, std::allocator<amd::Device*> > const&) [clone .constprop.0] (this=this@entry=0x558a57a87a50, devices=std::vector of length 0, capacity 0) at /usr/src/debug/rocm-opencl-5.1.3-2.fc37.x86_64/ROCclr-rocm-5.1.3/platform/program.cpp:630
#14 0x00007f46347b67d7 in clCreateKernel (program=0x558a57a87a60, kernel_name=0x558a57a87f40 "sum", errcode_ret=0x7fffa40e8be0) at /usr/src/debug/rocm-opencl-5.1.3-2.fc37.x86_64/amdocl/cl_program.cpp:1310
#15 0x00007f4634a8ba51 in clCreateKernel (program=0x558a57a87a60, kernel_name=0x558a57a87f40 "sum", errcode_ret=0x7fffa40e8be0) at /usr/src/debug/ocl-icd-2.3.1-1.fc37.x86_64/ocl_icd_loader_gen.c:2119
#16 0x0000558a56c01d16 in getWGsizes (wgm_sz=1, output=<optimized out>, wgm=0x7fffa40e8ac8, loc=<optimized out>, ret=0x7fffa40e8be0) at src/clinfo.c:1502
#17 device_info_wg (ret=0x7fffa40e8be0, loc=0x7fffa40e8ba0, chk=<optimized out>, output=<optimized out>) at src/clinfo.c:1534
#18 0x0000558a56c024b5 in printDeviceInfo (dev=dev@entry=0x558a57bfb260, plist=plist@entry=0x7fffa40e9130, p=p@entry=0, param_whitelist=<optimized out>, param_whitelist@entry=0x0, output=output@entry=0x7fffa40e90b0) at src/clinfo.c:2888
#19 0x0000558a56c036ba in printPlatformDevices (plist=0x7fffa40e9130, p=0, device=0x558a57bfd7a0, ndevs=<optimized out>, str=0x7fffa40e9170, output=0x7fffa40e90b0, these_are_offline=0) at src/clinfo.c:3163
#20 0x0000558a56bf801e in showDevices (output=0x7fffa40e90b0, plist=0x7fffa40e9130) at src/clinfo.c:3220
#21 main (argc=<optimized out>, argv=<optimized out>) at src/clinfo.c:3991
[Inferior 1 (process 115240) detached]

In kernel log I see follow trace:

[21756.673590] amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[21756.673595] amdgpu: Failed to evict process queues
[21756.675296] amdgpu: Failed to quiesce KFD
[21760.682874] amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[21760.682880] amdgpu: Resetting wave fronts (cpsch) on dev 000000002a318dc0
[36049.140533] amdgpu: HIQ MQD's queue_doorbell_id0 is not 0, Queue preemption time out
[36049.140538] amdgpu: Failed to evict process queues
[36049.140545] amdgpu: Failed to quiesce KFD
[36049.143573] ------------[ cut here ]------------
[36049.143576] WARNING: CPU: 13 PID: 82 at drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:932 restore_process_queues_cpsch+0x1c2/0x1e0 [amdgpu]
[36049.143704] Modules linked in: binfmt_misc tls overlay xpad ff_memless tun uinput rfcomm snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc snd_hda_codec_realtek snd_hda_codec_generic mt7921e mt7921_common snd_hda_codec_hdmi snd_sof_amd_renoir snd_sof_amd_acp mt76_connac_lib snd_sof_pci snd_sof mt76 snd_hda_intel snd_sof_utils snd_intel_dspcfg ledtrig_audio vfat snd_intel_sdw_acpi fat snd_soc_core snd_usb_audio mac80211 btusb snd_hda_codec btrtl snd_compress ac97_bus btbcm snd_usbmidi_lib snd_hda_core snd_pcm_dmaengine snd_rawmidi btintel snd_hwdep snd_pci_acp6x mc btmtk snd_seq libarc4 bluetooth snd_seq_device snd_pcm cfg80211 ecdh_generic pcspkr asus_nb_wmi wmi_bmof joydev snd_timer snd_pci_acp5x snd
[36049.143734]  snd_rn_pci_acp3x snd_acp_config snd_soc_acpi snd_pci_acp3x soundcore i2c_piix4 k10temp acpi_cpufreq amd_pmc asus_wireless zram amdgpu hid_asus drm_ttm_helper ttm asus_wmi iommu_v2 hid_multitouch sparse_keymap nvme gpu_sched platform_profile ucsi_acpi typec_ucsi serio_raw rfkill ccp sp5100_tco nvme_core drm_dp_helper r8169 typec wmi video i2c_hid_acpi i2c_hid ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse
[36049.143752] CPU: 13 PID: 82 Comm: kworker/13:0 Tainted: G        W    L   --------  ---  5.19.0-0.rc0.20220525gitfdaf9a5840ac.2.fc37.x86_64 #1
[36049.143754] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
[36049.143755] Workqueue: events amdgpu_amdkfd_restore_userptr_worker [amdgpu]
[36049.143841] RIP: 0010:restore_process_queues_cpsch+0x1c2/0x1e0 [amdgpu]
[36049.143920] Code: ed 89 45 3c 8b 8b 68 02 00 00 e9 7c ff ff ff 4c 89 ea 48 c7 c6 30 b5 a1 c0 48 c7 c7 10 31 bd c0 e8 c3 b9 18 f6 e9 b0 fe ff ff <0f> 0b 45 31 ed e9 57 ff ff ff 41 bd fb ff ff ff e9 29 ff ff ff 66
[36049.143921] RSP: 0018:ffffa08100437d78 EFLAGS: 00010246
[36049.143923] RAX: 0000000000000000 RBX: ffff9089b3aff000 RCX: 0000000000000000
[36049.143924] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
[36049.143925] RBP: ffff908bdb86dc10 R08: 0000000000000001 R09: 0000000000000000
[36049.143926] R10: ffffa08100437d78 R11: 0000000000000001 R12: ffff9089b3aff1b8
[36049.143926] R13: 00000002fdfc5001 R14: 0000000000000008 R15: ffff908951e16000
[36049.143927] FS:  0000000000000000(0000) GS:ffff909816a00000(0000) knlGS:0000000000000000
[36049.143928] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[36049.143929] CR2: 00007f695333c000 CR3: 0000000d14028000 CR4: 0000000000750ee0
[36049.143930] PKRU: 55555554
[36049.143931] Call Trace:
[36049.143933]  <TASK>
[36049.143935]  kfd_process_restore_queues+0x3c/0x70 [amdgpu]
[36049.144018]  kgd2kfd_resume_mm+0x1c/0x40 [amdgpu]
[36049.144125]  amdgpu_amdkfd_restore_userptr_worker+0x38f/0x420 [amdgpu]
[36049.144233]  process_one_work+0x29d/0x5f0
[36049.144238]  worker_thread+0x4f/0x390
[36049.144241]  ? process_one_work+0x5f0/0x5f0
[36049.144242]  kthread+0xf5/0x120
[36049.144244]  ? kthread_complete_and_exit+0x20/0x20
[36049.144246]  ret_from_fork+0x22/0x30
[36049.144251]  </TASK>
[36049.144252] irq event stamp: 66
[36049.144252] hardirqs last  enabled at (65): [<ffffffffb6ed6ab4>] _raw_spin_unlock_irq+0x24/0x50
[36049.144255] hardirqs last disabled at (66): [<ffffffffb6ece1cb>] __schedule+0xbdb/0x1640
[36049.144257] softirqs last  enabled at (0): [<ffffffffb60ea522>] copy_process+0x9e2/0x1e10
[36049.144260] softirqs last disabled at (0): [<0000000000000000>] 0x0
[36049.144280] ---[ end trace 0000000000000000 ]---

Is this a rocm or kernel issue?

Having the Fedora SIG is really great!

When installing and setting up ROCm OpenCL on my laptop, I had some confusion regarding the groups for non-root users and the discussion pages in the wiki are not yet enabled.

According to the FAQ in the official AMD docs non-root users should be part of the video or render group to access the AMD GPU devices. However, the wiki only mentions a “vendor” group. Is this an oversight or indeed correct?

For what it’s worth, on my laptop I got it all working with the non-root users being only members of the video and render groups.