#1128208 pytorch: enable support of ROCm

#1128208#5
Date:
2026-02-16 13:57:15 UTC
From:
To:
It would be nice to have PyTorch with ROCm enabled.
#1128208#12
Date:
2026-02-16 16:59:14 UTC
From:
To:
Dylan Aïssi said [Mon, Feb 16, 2026 at 01:57:15PM +0000]:

That needs to be done as a separate package, enabling a different set of
sources, if I understand things correctly — more or less in the same way we
have the CPU version (python3-torch) with differing sources and conflicting
with the CUDA version (python3-torch-cuda).

#1128208#22
Date:
2026-02-16 17:07:14 UTC
From:
To:
Hi,

Indeed, you are right!

That's what I'm working on. I have just created these bugs (pytorch and
gloo) for the new DFSG team, because otherwise new source packages get a
low score in the NEW queue.


Best regards,
Dylan

#1128208#27
Date:
2026-03-13 11:13:01 UTC
From:
To:
Dear Maintainer,

In Ubuntu we started building pytorch-rocm, which is forked from
src:ubuntu, but something seems to be wrong with the configs. It
looks like it doesn't build for ROCm at all.
See:
https://bugs.launchpad.net/ubuntu/+source/pytorch-rocm/+bug/2144078a for
more details.

I started working on it and have already found some bugs/typos in the
FindHIP.cmake and FindGloo.cmake files.

I also have questions about the need for a fork for ROCm. Why are we
not using build profiles to build for ROCm? I started experimenting
with this, and the only conflict I see so far is due to gloo-rocm
(again forked from src:gloo). I think we could apply the same build
profile logic there as well. I am working on a PoC to see if that
works. If it does, it will require changes in src:gloo as well.

I would like to hear your thoughts as well.

#1128208#32
Date:
2026-03-13 11:46:32 UTC
From:
To:
Looking at the build log attached:

 >   CMake Warning at cmake/public/LoadHIP.cmake:67 (find_package):
 >    No "FindHIP.cmake" found in CMAKE_MODULE_PATH.

At least for the FindHIP.cmake I found an issue with multiarch search
path recently and filed this upstream:

https://github.com/pytorch/pytorch/pull/175349

I also pushed it to Salsa, though; is that change missing from the build
shown in the log?

https://salsa.debian.org/deeplearning-team/pytorch/-/commit/d5497d377ae2b2e54ec1bf8589cd3b27c1800094

#1128208#37
Date:
2026-03-14 07:37:22 UTC
From:
To:
Hi Talha,

If, by this, you mean:

  (1) have one src:pytorch, with multiple build profiles
  (2) build and upload the ROCm variant via such a profile, instead of
      a source fork

Then sadly I don't think this will work, because the autobuilder network
will not build with such profiles. Furthermore, this goes against the
general expectation that one can just do `apt-get source` and
`dpkg-buildpackage` to get the same binary result.
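
To illustrate the point about profiles, here is a minimal Python sketch of
how a build-profile restriction formula on a dependency is evaluated. It is
not the dpkg implementation, only a model of the documented semantics: each
<...> group is an AND of (possibly negated) profile names, and the groups
are OR-ed. The profile name pkg.pytorch.rocm is hypothetical and only
follows the usual pkg.<source>.<name> naming convention.

```python
import re
from typing import Iterable


def restriction_applies(formula: str, active_profiles: Iterable[str]) -> bool:
    """Return True if a dependency guarded by `formula` is kept.

    `formula` looks like "<!nocheck> <cross>": each <...> group is a
    conjunction of profile names (a leading "!" negates a name), and the
    groups are combined with OR. An empty formula always applies.
    """
    active = set(active_profiles)
    groups = re.findall(r"<([^<>]*)>", formula)
    if not groups:
        return True
    for group in groups:
        terms = group.split()
        if all(
            (term[1:] not in active) if term.startswith("!") else (term in active)
            for term in terms
        ):
            return True
    return False


# Hypothetical profile name, used for illustration only.
ROCM_PROFILE = "pkg.pytorch.rocm"

# A build with no profiles set (as on the autobuilders) drops the
# ROCm-only dependency and keeps the CPU-only one.
no_profiles_rocm = restriction_applies("<pkg.pytorch.rocm>", [])
no_profiles_cpu = restriction_applies("<!pkg.pytorch.rocm>", [])

# A local build with the profile enabled sees the opposite.
with_profile_rocm = restriction_applies("<pkg.pytorch.rocm>", [ROCM_PROFILE])
with_profile_cpu = restriction_applies("<!pkg.pytorch.rocm>", [ROCM_PROFILE])
```

With no profile active, everything guarded by the ROCm profile is skipped,
so a plain build produces only the default variant.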

I tried something similar for ggml [1] and ultimately had to drop this
again [2], for the above reasons.

Best,
Christian

[1]: https://lists.debian.org/debian-ai/2025/12/msg00083.html
[2]: https://lists.debian.org/debian-ai/2025/12/msg00097.html

#1128208#42
Date:
2026-03-16 07:26:04 UTC
From:
To:
Hello Christian,

Thank you for sharing your experience. Yes, indeed, build profiles don't
really fix the problem; they just provide convenience for building in this
case. Meanwhile, I continued my experimentation on gloo, taking "curl" as
an example, and managed to build the ROCm and CPU variants from the same
src:gloo. See my ongoing efforts on src:gloo:
https://salsa.debian.org/deeplearning-team/gloo/-/merge_requests/6/diffs

But if we can't resolve the conflict between libgloo-rocm-dev and
libgloo-dev, then this won't help for the pytorch case. I don't have the
historical knowledge of why we decided to have ROCm as a separate
src:gloo-rocm package (I understood the CUDA case), so I am trying to keep
the behaviour the same as today. That creates a challenge in how to handle
gloo/config.h, as it contains information like `GLOO_USE_ROCM`, which
should be 0 for the CPU-only version and 1 for the ROCm-only version.
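
A minimal Python sketch of why that header is the sticking point: both -dev
packages would ship the same path with different contents. The header
snippets below are made-up stand-ins, assuming the generated config.h uses
plain "#define NAME 0/1" lines; they are not copied from the real gloo
header, and read_macro and headers_conflict are hypothetical helpers.

```python
import re
from typing import Optional


def read_macro(header_text: str, name: str) -> Optional[int]:
    """Return the integer value of `#define <name> <int>`, or None if absent."""
    pattern = re.compile(
        r"^\s*#\s*define\s+" + re.escape(name) + r"\s+(\d+)\s*$",
        re.MULTILINE,
    )
    match = pattern.search(header_text)
    return int(match.group(1)) if match else None


def headers_conflict(first: str, second: str, name: str) -> bool:
    """True if the two header texts disagree on the value of `name`."""
    return read_macro(first, name) != read_macro(second, name)


# Illustrative stand-ins for gloo/config.h as each variant would ship it
# (assumed layout, not the real file contents).
CPU_CONFIG = "#define GLOO_USE_CUDA 0\n#define GLOO_USE_ROCM 0\n"
ROCM_CONFIG = "#define GLOO_USE_CUDA 0\n#define GLOO_USE_ROCM 1\n"

cpu_value = read_macro(CPU_CONFIG, "GLOO_USE_ROCM")
rocm_value = read_macro(ROCM_CONFIG, "GLOO_USE_ROCM")
same_path_differs = headers_conflict(CPU_CONFIG, ROCM_CONFIG, "GLOO_USE_ROCM")
```

As long as the two variants disagree on this macro, a single shared
gloo/config.h cannot serve both of them.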

I will dig a bit more on the gloo side to see whether it is possible to
resolve the conflict while keeping the behaviour the same, without too
much hassle (like with alternatives, etc.).

But it would be great if you or someone else could share experience with
such cases, so I can act accordingly.

Best Regards,
Talha

On Sat, 14 Mar 2026 at 08:37, Christian Kastner <ckk@debian.org> wrote: