ROCm CXX_FLAGS issue during cmake generation of makefiles

I spent quite a bit of time on this yesterday and still don’t have an answer. I’m hoping that there’s something simple I’m overlooking that someone will quickly see.

I’m trying to get rocblas to build on a rawhide system as part of the process towards packaging it. I had some issues getting cmake to use hipcc, and then ran into an error about finding the ROCm device library:

clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library

It turns out that hipcc is looking for the bitcode provided by rocm-device-libs in /usr/lib64/clang/16/amdgcn/bitcode/. Providing that directory as --rocm-device-lib-path or --hip-device-lib-path makes the error go away, cmake’s test compilation is happy and the process finishes.

However, in the flags.make file that cmake generates, the CXX_FLAGS passed to hipcc end up with an additional flag after the intended one. This extra flag isn’t used for cmake’s test compiles, and it overrides the intended device path:
--hip-device-lib-path=/usr/lib64/cmake/AMDDeviceLibs/AMDDeviceLibsConfig.cmake/lib

With that additional flag (which comes after the correct one supplied to cmake and is also in CXX_FLAGS), the build fails with a familiar error:

$ make
[  0%] Generating /home/tflink/rocm/rocBLAS/build/include/rocblas/internal/rocblas_device_malloc.hpp
[  0%] Built target rocblas_device_malloc
[  1%] Generating prototypes from /home/tflink/rocm/rocBLAS/library/src.
[  1%] Built target rocblas_proto_templates
[  1%] Building CXX object library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_axpy_ex.cpp.o
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-16: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
make[2]: *** [library/src/CMakeFiles/rocblas.dir/build.make:76: library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_axpy_ex.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:198: library/src/CMakeFiles/rocblas.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

If I go into the generated makefile that sets all the CXX_FLAGS after cmake is done and manually remove that second, strange --hip-device-lib-path, everything builds without issue. After finding that, you’d think that it would be an easy fix, right?

For the life of me, I cannot figure out how and why that second --hip-device-lib-path is being added or why it’s an invalid path to start with. rocRAND shows the same issue in the generated makefiles, so I don’t think the problem is specific to rocBLAS.

I’ve inserted debug statements into CMakeLists.txt files, I put cmake into verbose mode, and I’ve gone through every kinda related project I can think of, but I still have no idea where this duplicated flag is coming from or why it’s an invalid path.
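
For anyone wanting to check their own build tree for the same symptom, here is a rough Python sketch that lists every --hip-device-lib-path value found in the generated flags.make files and notes whether each one is a real directory. The function names and the default build directory are made up for illustration; the only things taken from this post are the flag name and the flags.make file name.

```python
import os
import re
from pathlib import Path

# Matches the flag described above and captures its value.
FLAG_RE = re.compile(r"--hip-device-lib-path=(\S+)")


def device_lib_flags(text):
    """Return every --hip-device-lib-path value, in the order it appears."""
    return FLAG_RE.findall(text)


def report(build_dir):
    """Print each flag value found in flags.make files under build_dir.

    build_dir is a hypothetical argument: point it at your cmake build tree.
    """
    for flags_file in sorted(Path(build_dir).rglob("flags.make")):
        text = flags_file.read_text(errors="replace")
        for value in device_lib_flags(text):
            status = "ok" if os.path.isdir(value) else "NOT a directory"
            print(f"{flags_file}: {value} [{status}]")
```

Calling report("build") from the source directory after cmake finishes should show both paths for each target. Since the later flag is the one that takes effect here, the last value printed for a given file is the one to look at.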

Does anyone see what I’m missing or have ideas on where I can look?


Could the --hip-device-lib-path option be coming from here? The device lib path is constructed here and it’s the ROCM_PATH env variable + /lib, so maybe the ROCM_PATH is getting accidentally overridden somewhere? Or maybe it’s getting a bad value from here?
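
To make the ROCM_PATH theory concrete, here is a small Python sketch. It assumes the flag value is built as ROCM_PATH + /lib, as suggested above. The helper name is hypothetical and this is not the actual cmake code; it just works backwards from the bad path in the question to the ROCM_PATH value that would have produced it.

```python
import posixpath


def device_lib_path(rocm_path):
    # Assumption from the reply above: the flag value is ROCM_PATH + "/lib".
    return posixpath.join(rocm_path, "lib")


# The bad value reported in the generated flags.make.
observed = "/usr/lib64/cmake/AMDDeviceLibs/AMDDeviceLibsConfig.cmake/lib"

# Strip the trailing "/lib" to see what ROCM_PATH would have had to be.
implied_rocm_path = posixpath.dirname(observed)
print(implied_rocm_path)
# prints /usr/lib64/cmake/AMDDeviceLibs/AMDDeviceLibsConfig.cmake
```

If this is what is happening, the variable is ending up with the path of the AMDDeviceLibs config file rather than an install prefix, which would explain why the resulting directory does not exist.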
