#1113460 rccl: FTBFS with CMake 4

Package:
src:rccl
Source:
src:rccl
Submitter:
Date:
2026-03-16 06:29:02 UTC
Severity:
normal
Tags:
#1113460#5
Date:
2025-08-31 23:03:47 UTC
From:
To:
Dear maintainer,

During a test rebuild for CMake 4, rccl failed to rebuild.

Log Summary:
-------------------------------------------------------------------------------
[...]
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS_MINSIZEREL
CMAKE_EXE_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS_RELEASE
CMAKE_EXE_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_EXPORT_COMPILE_COMMANDS
CMAKE_EXPORT_COMPILE_COMMANDS-ADVANCED:INTERNAL=1
//Name of external makefile project generator.
CMAKE_EXTRA_GENERATOR:INTERNAL=
//Name of generator.
CMAKE_GENERATOR:INTERNAL=Unix Makefiles
//Generator instance identifier.
CMAKE_GENERATOR_INSTANCE:INTERNAL=
//Name of generator platform.
CMAKE_GENERATOR_PLATFORM:INTERNAL=
//Name of generator toolset.
CMAKE_GENERATOR_TOOLSET:INTERNAL=
//Test CMAKE_HAVE_LIBC_PTHREAD
CMAKE_HAVE_LIBC_PTHREAD:INTERNAL=1
//Source directory with the top level CMakeLists.txt file for this
// project
CMAKE_HOME_DIRECTORY:INTERNAL=/build/reproducible-path/rccl-5.4.3
//ADVANCED property for variable: CMAKE_INSTALL_BINDIR
CMAKE_INSTALL_BINDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_DATADIR
CMAKE_INSTALL_DATADIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_DATAROOTDIR
CMAKE_INSTALL_DATAROOTDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_DOCDIR
CMAKE_INSTALL_DOCDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_INCLUDEDIR
CMAKE_INSTALL_INCLUDEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_INFODIR
CMAKE_INSTALL_INFODIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LIBDIR
CMAKE_INSTALL_LIBDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LIBEXECDIR
CMAKE_INSTALL_LIBEXECDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LOCALEDIR
CMAKE_INSTALL_LOCALEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_LOCALSTATEDIR
CMAKE_INSTALL_LOCALSTATEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_MANDIR
CMAKE_INSTALL_MANDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_OLDINCLUDEDIR
CMAKE_INSTALL_OLDINCLUDEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_RUNSTATEDIR
CMAKE_INSTALL_RUNSTATEDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_SBINDIR
CMAKE_INSTALL_SBINDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_SHAREDSTATEDIR
CMAKE_INSTALL_SHAREDSTATEDIR-ADVANCED:INTERNAL=1
//Install .so files without execute permission.
CMAKE_INSTALL_SO_NO_EXE:INTERNAL=1
//ADVANCED property for variable: CMAKE_INSTALL_SYSCONFDIR
CMAKE_INSTALL_SYSCONFDIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_LINKER
CMAKE_LINKER-ADVANCED:INTERNAL=1
//Name of CMakeLists files to read
CMAKE_LIST_FILE_NAME:INTERNAL=CMakeLists.txt
//ADVANCED property for variable: CMAKE_MAKE_PROGRAM
CMAKE_MAKE_PROGRAM-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS
CMAKE_MODULE_LINKER_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_DEBUG
CMAKE_MODULE_LINKER_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL
CMAKE_MODULE_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_RELEASE
CMAKE_MODULE_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_MODULE_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_NM
CMAKE_NM-ADVANCED:INTERNAL=1
//number of local generators
CMAKE_NUMBER_OF_MAKEFILES:INTERNAL=2
//ADVANCED property for variable: CMAKE_OBJCOPY
CMAKE_OBJCOPY-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_OBJDUMP
CMAKE_OBJDUMP-ADVANCED:INTERNAL=1
//Platform information initialized
CMAKE_PLATFORM_INFO_INITIALIZED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RANLIB
CMAKE_RANLIB-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_READELF
CMAKE_READELF-ADVANCED:INTERNAL=1
//Path to CMake installation.
CMAKE_ROOT:INTERNAL=/usr/share/cmake-4.1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS
CMAKE_SHARED_LINKER_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_DEBUG
CMAKE_SHARED_LINKER_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL
CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_RELEASE
CMAKE_SHARED_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SKIP_INSTALL_RPATH
CMAKE_SKIP_INSTALL_RPATH-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_SKIP_RPATH
CMAKE_SKIP_RPATH-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS
CMAKE_STATIC_LINKER_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_DEBUG
CMAKE_STATIC_LINKER_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL
CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_RELEASE
CMAKE_STATIC_LINKER_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO
CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_STRIP
CMAKE_STRIP-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_TAPI
CMAKE_TAPI-ADVANCED:INTERNAL=1
//uname command
CMAKE_UNAME:INTERNAL=/usr/bin/uname
//ADVANCED property for variable: CMAKE_VERBOSE_MAKEFILE
CMAKE_VERBOSE_MAKEFILE-ADVANCED:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx1030
COMPILER_HAS_TARGET_ID_gfx1030:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx1100
COMPILER_HAS_TARGET_ID_gfx1100:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx1101
COMPILER_HAS_TARGET_ID_gfx1101:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx1102
COMPILER_HAS_TARGET_ID_gfx1102:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx803
COMPILER_HAS_TARGET_ID_gfx803:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx900_xnack_off
COMPILER_HAS_TARGET_ID_gfx900_xnack_off:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx906_xnack_off
COMPILER_HAS_TARGET_ID_gfx906_xnack_off:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx908_xnack_off
COMPILER_HAS_TARGET_ID_gfx908_xnack_off:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off
COMPILER_HAS_TARGET_ID_gfx90a_xnack_off:INTERNAL=1
//Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on
COMPILER_HAS_TARGET_ID_gfx90a_xnack_on:INTERNAL=1
//Details about finding GTest
FIND_PACKAGE_MESSAGE_DETAILS_GTest:INTERNAL=[/usr/lib/aarch64-linux-gnu/cmake/GTest/GTestConfig.cmake][ ][v1.17.0(1.11)]
//Details about finding Threads
FIND_PACKAGE_MESSAGE_DETAILS_Threads:INTERNAL=[TRUE][v()]
//Have includes bfd.h
HAVE_BFD:INTERNAL=
//Have include /usr/include/rocm_smi/rocm_smi64Config.h
HAVE_ROCM_SMI64CONFIG:INTERNAL=1
//Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
HIP_CLANG_SUPPORTS_PARALLEL_JOBS:INTERNAL=
//Track whether rocm_create_package has been called.
ROCM_PACKAGE_CREATED:INTERNAL=FALSE
//Path to wrapper header file template.
ROCM_WRAPPER_TEMPLATE_HEADER:INTERNAL=/usr/share/rocmcmakebuildtools/cmake/header_template.h.in
//CMAKE_INSTALL_PREFIX during last run
_GNUInstallDirs_LAST_CMAKE_INSTALL_PREFIX:INTERNAL=/usr

dh_auto_configure: error: cd obj-aarch64-linux-gnu && DEB_PYTHON_INSTALL_LAYOUT=deb PKG_CONFIG=/usr/bin/pkg-config cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=None -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DFETCHCONTENT_FULLY_DISCONNECTED=ON -DCMAKE_INSTALL_RUNSTATEDIR=/run -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON "-GUnix Makefiles" -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_INSTALL_LIBDIR=lib/aarch64-linux-gnu -DCMAKE_BUILD_TYPE=Release -DCMAKE_SKIP_RPATH=ON -DAMDGPU_TARGETS=gfx803\;gfx900\;gfx906\;gfx908\;gfx90a\;gfx1010\;gfx1030\;gfx1100\;gfx1101\;gfx1102 -DROCM_SYMLINK_LIBS=OFF -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF -DBUILD_TESTS=ON .. returned exit code 1
make[1]: *** [debian/rules:28: override_dh_auto_configure-arch] Error 2
make[1]: Leaving directory '/build/reproducible-path/rccl-5.4.3'
make: *** [debian/rules:25: binary] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
--------------------------------------------------------------------------------
Build finished at 2025-08-30T16:54:34Z
-------------------------------------------------------------------------------

The above is just how the build ends and not necessarily the most relevant part.
If required, the full build log is available here (for the next 30 days):
https://debusine.debian.net/artifact/2410036/

The most likely cause of build failures is the removed backwards compatibility for
CMake versions earlier than 3.5. You can find additional information in my
debian-devel announcement:

https://lists.debian.org/debian-devel/2025/04/msg00310.html

About the archive rebuild: The build was made on debusine.debian.net,
using sbuild.

You can find the build task here:
https://debusine.debian.net/work-request/154716/

If this is really a bug in one of the build-depends, please use
reassign and affects, so that this is still visible in the BTS web
page for this package.

Thanks,
Timo

#1113460#12
Date:
2025-09-01 11:52:29 UTC
From:
To:
I have downgraded the bug to non-RC severity temporarily and reverted
CMake in unstable to version 3.31.6. I plan to re-upgrade in about a
month's time. I realize that despite my April announcement, the upload
to unstable came as something of a surprise, and I don't want to cause
unnecessary pain.

Also, the build log URL has an unfortunate mistake. The correct URL
should include the workspace, i.e.,

https://debusine.debian.net/debian/developers/artifact/XXXX

CMake 4 will also be available in experimental again, so it can be
used to verify that the bug is fixed.

Cheers
Timo

#1113460#23
Date:
2026-03-10 11:07:49 UTC
From:
To:
Hi,

At this time, the CMake 4 failure is the tip of the iceberg.

rccl needs a binNMU or upload, because it still depends on
libamd-comgr2, which has been bumped to libamd-comgr3 in the meantime.

Fixing the CMake part seems easy enough. tests/CMakeLists.txt declares a
lower minimum CMake version than the top-level CMakeLists.txt. The latter
is at 3.5, which is still supported by CMake 4. Bumping the version in
tests moves the build quite a bit further.
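
The rule at work can be sketched as a plain version comparison. This is
only an illustration, not CMake's own logic: the helper names are made up,
the "2.8.12" value in the usage note is a hypothetical example (not the
value found in tests/CMakeLists.txt), and the sketch assumes the simple
cmake_minimum_required(VERSION x) form without a "...<max>" suffix.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Parse "major.minor[.patch...]" into (major, minor).
// Missing or unparsable components count as 0.
std::pair<int, int> parseMajorMinor(const std::string& version) {
    std::istringstream in(version);
    int major = 0;
    int minor = 0;
    char dot = '\0';
    in >> major;
    if (in >> dot && dot == '.') {
        in >> minor;
    }
    return {major, minor};
}

// True if a declared minimum version is below 3.5, the oldest
// compatibility level that CMake 4 still accepts.
bool rejectedByCMake4(const std::string& declared) {
    return parseMajorMinor(declared) < std::make_pair(3, 5);
}
```

With this, rejectedByCMake4("2.8.12") is true while rejectedByCMake4("3.5")
and rejectedByCMake4("3.10") are false. The components are compared as
numbers, so 3.10 correctly sorts after 3.5.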

Eventually though, src/graph/xml.cc tries to look up the gcnArch
attribute of a hipDeviceProp_t. That structure had an API bump and the
integer field was converted into a string gcnArchName. More work is
needed here.
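
As a rough idea of what code moving from the integer to the string field
has to deal with, here is a minimal sketch. It assumes the string looks
like "gfx90a:sramecc+:xnack-", i.e. an architecture token optionally
followed by colon-separated feature flags. archToken is a hypothetical
helper, not something from rccl, and how xml.cc actually consumes the
value is not shown here.

```cpp
#include <cstring>
#include <string>

// Return the architecture token of a gcnArchName-style string, i.e. the
// part before the first ':' ("gfx90a:sramecc+:xnack-" -> "gfx90a").
// A null pointer yields an empty string.
std::string archToken(const char* gcnArchName) {
    if (gcnArchName == nullptr) {
        return "";
    }
    const char* colon = std::strchr(gcnArchName, ':');
    return colon != nullptr ? std::string(gcnArchName, colon)
                            : std::string(gcnArchName);
}
```

In real code the argument would be the gcnArchName member of a
hipDeviceProp_t filled in by the HIP runtime. The sketch takes a plain
C string so that it compiles without the HIP headers.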

Also note that the rccl version in unstable is quite far behind
upstream. Both the CMake version in tests and the gcnArch access are
fixed upstream. I suggest that spending any further time on the unstable
version is wasted effort. What is really needed here is an upload of a
new version.

Helmut

#1113460#30
Date:
2026-03-10 22:32:47 UTC
From:
To:
Yes, I tried the same a while back. I gave up and decided to update to rccl from ROCm 6.4.

It was easy enough to move to rccl from ROCm 6.4 (albeit with a reduced set of supported GPUs), but I got hung up on a missing symbol error. Upstream had dropped a function without changing the SONAME. In talking to them, they justified it on the basis that the function never worked anyway. I still think we need a dummy implementation that returns an error just for satisfying the linker. That's where I left off.
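
A minimal sketch of the kind of dummy implementation I have in mind,
assuming stand-in types so that it compiles without rccl.h. The names
StubResult, kStubInvalidUsage and removedEntryPointStub, as well as the
parameter list, are hypothetical. A real shim would have to use the exact
name and signature of the dropped function and return a proper
ncclResult_t error code.

```cpp
// Hypothetical stand-ins so the sketch compiles on its own.
// A real shim would use the types and error codes from rccl.h.
enum StubResult { kStubSuccess = 0, kStubInvalidUsage = 1 };

// extern "C" keeps the symbol name unmangled, so binaries linked against
// the old library can still resolve it at load time.
extern "C" StubResult removedEntryPointStub(void** comm, int nranks, int rank) {
    (void)nranks;  // unused: the stub never creates a communicator
    (void)rank;
    if (comm != nullptr) {
        *comm = nullptr;  // never hand back a usable handle
    }
    return kStubInvalidUsage;  // always report failure to the caller
}
```

The point is only that the symbol stays present for the linker while every
call fails cleanly instead of crashing.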

The incomplete rccl update is on my salsa account.

Sincerely,
Cory Bloor

#1113460#35
Date:
2026-03-13 06:28:07 UTC
From:
To:
The removed "function that never worked correctly" that I was referring to was ncclCommInitRankMulti.

Canonical used my draft rccl 6.4 update and packaged the ROCm 7.1.0 version of rccl currently included in Ubuntu Resolute.

I have not yet reviewed what they did (otherwise, I would probably have already ported it back to Debian Experimental with whatever changes I felt necessary). I suspect rccl 7.1.0 may be able to build and run on the HIP Runtime 6.4 on Unstable, which would make that slightly easier.

You might be able to loosen the B-D from the Ubuntu rccl package and have it build and run successfully on Unstable. Most of the breaking changes in the HIP Runtime from 6.4 to 7.1 were things becoming stricter, so libraries written against the newer runtime might still be source compatible with the older runtime (but vice versa is unlikely, at least for big, complex libraries).

The rccl 7.1.0 + rocm-hipamd 6.4.4 combination won't have been tested by upstream. One risk is that there were significant changes to hipGetLastError behavior between ROCm 6 and ROCm 7, so it's possible that version mix could cause some subtle bugs at runtime. I think it's probably fine, though. If the unit tests pass, that would be a good sign.

In any case, I hope this information is helpful.

Sincerely,
Cory Bloor

#1113460#40
Date:
2026-03-11 11:20:58 UTC
From:
To:
Hi Cory,

Thank you.

I appreciate your attention to detail. While being attentive to dropped
symbols is good as a general rule, I suggest that there may be
exceptions.

librccl1 does not have any reverse dependencies in Debian (neither in
trixie nor in sid).

When we did the t64 transition, libselinux1 changed a symbol without
bumping soname. The rationale was that bumping the package name would
have broken dpkg (due to the improper soname transition) and there were
no in-archive users of the changed symbol.

So consider the effort you spend on this matter compared to the breakage
you might cause in pretending the symbol never existed.

We should not do this lightly, but the case at hand seems to have good
reasons. If in doubt, consider proposing the symbol drop on
debian-devel@l.d.o and see how many developers object. I guess none.

Helmut