#1040616 ITP: rocm-validation-suite -- AMD GPU system validation tools

#1040616#5
Date:
2023-07-08 00:05:32 UTC
From:
To:
* Package name    : rocm-validation-suite
  Version         : 5.6.0
* URL             : https://github.com/ROCm-Developer-Tools/ROCmValidationSuite
* License         : Expat
  Programming Lang: C++
  Description     : AMD GPU system validation tools

 The ROCm Validation Suite (RVS) is a collection of utilities for
 verifying the correct functioning of the AMD GPUs installed on a system.
 It provides system administrators with tests, benchmarks and other
 tools for troubleshooting common problems found in high-performance
 computing environments.
 .
 RVS provides utilities for querying GPU properties, monitoring GPU
 information, monitoring the PCI Express link speeds and power,
 querying relevant PCI Express bus properties for a GPU, verifying
 the GPU SBIOS mapping, benchmarking peer-to-peer links between GPUs,
 benchmarking the PCI Express bus, stress-testing installed GPUs,
 stress-testing the system PSU, verifying GPU memory to detect hardware
 errors, and benchmarking device global memory.

This package provides a variety of tools for checking the correct
functioning of AMD GPU hardware, which would be valuable for ensuring
that malfunctioning hardware and misconfigured systems are identified.
It is useful for ruling out hardware problems when unexpected
software behaviours are encountered.

This package is part of AMD's ROCm stack and will be maintained
under the Debian AI team umbrella.

#1040616#14
Date:
2025-07-19 22:13:20 UTC
From:
To:
As per discussion with Andrew, the most recent versions of
rocm-validation-suite depend on hipblaslt. I believe it is used for
stress testing.

Sincerely,
Cory Bloor

#1040616#27
Date:
2026-02-19 16:09:34 UTC
From:
To:

Hello team,

Thanks to Cordell and Christian, hipblaslt has been packaged and updated
to version 7.1.1+dfsg-3 in sid. That unblocks me to continue with this
ITP, so I have also imported version 7.1.1 into my WIP project for
this ITP.

I first got blocked by the mxDataGenerator download at configure time:
https://github.com/ROCm/ROCmValidationSuite/blob/master/CMakeLists.txt#L470

I see there is an open issue upstream:
https://github.com/ROCm/ROCmValidationSuite/issues/1023

I simply worked around it by adding the header-only library as a patch.
I am not sure whether other ROCm packages also need it. If they do,
we will probably have to package mxDataGenerator separately.

Then I got errors about the cf-protection flags, for which I found a
workaround in the hipblaslt package.

There were also errors about unsupported types and mismatched types,
for which I added workarounds to the debian/rules file.

The package is now available on salsa:
https://salsa.debian.org/rocm-team/rocm-validation-suite

However I don't have suitable hardware to verify and test for it at
the moment. Could someone who has suitable hardware review and test
the package?

Best regards,

#1040616#32
Date:
2026-03-04 19:29:10 UTC
From:
To:
Hello Andrew, Cordell, Team

 > However I don't have suitable hardware to verify and test for it at

I built and installed the package and ran some tests:

===========
---
rocm_agent_enumerator
gfx1201
---
rvs -g
ROCm Validation Suite (version 1.3.0)

Supported GPUs available:
0000:03:00.0 - GPU[ 1 - 44339] AMD Radeon AI PRO R9700 (Device 30033)
---
rvs -c /usr/share/rocm-validation-suite/conf/gst_single.conf
[RESULT] [ 95112.6009 ] Action name :gpustress-9000-sgemm-false
[RESULT] [ 95112.9752 ] Module name :gst
[RESULT] [ 95112.542666] [gpustress-9000-sgemm-false] [GPU:: 44339] Start of GPU ramp up

rocBLAS error: Cannot read /usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1201
 List of available TensileLibrary Files :
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx908.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx1010.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx803.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx900.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx1030.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx1102.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx90a.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx906.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx1100.dat"
"/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library/TensileLibrary_lazy_gfx1101.dat"
Abandon (core dumped)rvs -c /usr/share/rocm-validation-suite/conf/gst_single.conf

As far as I understand, my GPU is not yet supported and will require a
more recent rocblas to work.

===========
---
rocm_agent_enumerator
gfx90a
---
rvs -g
ROCm Validation Suite (version 1.3.0)

Supported GPUs available:
0000:b3:00.0 - GPU[ 2 - 39824] AMD Instinct MI210 (Device 29711)
---
rvs -c /usr/share/rocm-validation-suite/conf/gst_single.conf
[RESULT] [7768732.143122] Action name :gpustress-9000-sgemm-false
[RESULT] [7768732.158223] Module name :gst
[RESULT] [7768732.765097] [gpustress-9000-sgemm-false] [GPU:: 39824] Start of GPU ramp up
...
[RESULT] [7769024.111252] [gpustress-8000-device-false] [GPU:: 39824] GFLOPS 32855 Target GFLOPS: 8000 met: TRUE

+=====================================================================+
| ROCm Validation Suite (RVS) Summary                                 |
+=====================================================================+
| System Overview                                                     |
+---------------------------------------------------------------------+
| Operating System     | Debian GNU/Linux forky/sid                   |
| RVS version          | 1.3.0                                        |
| ROCm version         | N/A                                          |
sh: 1: dkms: not found
| amdgpu version       | N/A                                          |
| GPUs                 | 1                                            |
+---------------------------------------------------------------------+
| GPU Name - GPU ID    | AMD Instinct MI210 - 39824                   |
| ID - Node ID - BDF   | 0 - 2 - 0000:b3:00.0                         |
+=====================================================================+
| Action Name                           | Module     | Result         |
+=====================================================================+
| gpustress-9000-sgemm-false            | GST        | PASS           |
| gpustress-8000-sgemm-true             | GST        | PASS           |
| gpustress-8000-hgemm-false            | GST        | PASS           |
| gpustress-8000-hgemm-true             | GST        | PASS           |
| gpustress-8000-dgemm-false            | GST        | PASS           |
| gpustress-8000-dgemm-true             | GST        | PASS           |
| gpustress-8000-device-false           | GST        | PASS           |
+---------------------------------------------------------------------+
---

Uploading the package with these simple tests included would allow
automating them for the architectures available on
https://ci.rocm.debian.net/

Regards
Christian

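As a rough illustration of how such a run could be checked automatically,
here is a minimal Python sketch that scans captured `rvs` output for
result rows and reports whether every action passed. It assumes the
summary is printed one row per line, as `| action | module | result |`,
and that any result other than PASS counts as a failure. Only PASS
appears in the log above, so other result values are an assumption. The
names `parse_summary` and `all_passed` are hypothetical helpers, not part
of RVS.

```python
import re

# Matches three-column result rows such as
# "| gpustress-9000-sgemm-false | GST | PASS |".
# Each cell must be a single token, so the "Action Name" header row,
# the two-column system overview rows and the border lines are skipped.
ROW = re.compile(r"^\|\s*(\S+)\s*\|\s*(\S+)\s*\|\s*(\S+)\s*\|$")


def parse_summary(text):
    """Return (action, module, result) tuples found in rvs output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groups())
    return rows


def all_passed(text):
    """True only if at least one result row exists and all are PASS."""
    rows = parse_summary(text)
    return bool(rows) and all(result == "PASS" for _, _, result in rows)


# Shortened sample in the same shape as the summary table.
SAMPLE = """\
| Operating System | Debian GNU/Linux forky/sid |
+=====================================+
| Action Name | Module | Result |
+=====================================+
| gpustress-9000-sgemm-false | GST | PASS |
| gpustress-8000-sgemm-true | GST | PASS |
+-------------------------------------+
"""

if __name__ == "__main__":
    for action, module, result in parse_summary(SAMPLE):
        print(action, module, result)
    print("all passed:", all_passed(SAMPLE))
```

A test wrapper could run `rvs -c` with the same configuration file,
capture its output, and exit non-zero when `all_passed` returns False.
Treating an empty result list as a failure avoids reporting success when
the run aborted before printing a summary.
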
#1040616#37
Date:
2026-03-05 17:54:26 UTC
From:
To:
Hi Christian,

The version of rocblas on unstable is sufficient to support your gfx1201
GPU. We just need to upload pkg-rocm-tools 0.9.7 and binNMU the various
libraries that depend on it [1].

Sincerely,
Cory Bloor

[1]: https://lists.debian.org/debian-ai/2026/02/msg00267.html
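
For reference, the rocBLAS error in the gfx1201 run lists the kernel
library files that the installed build ships. The sketch below extracts
the architecture names from those file names and checks whether a given
GPU architecture is among them. The file-name pattern is inferred only
from the error message in this log, and `available_archs` is a
hypothetical helper, not a rocBLAS API.

```python
import re

# Pattern inferred from the file names in the rocBLAS error message,
# e.g. "TensileLibrary_lazy_gfx90a.dat" (an assumption, not a spec).
LAZY = re.compile(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat$")

# Directory and file list copied from the error message in the log.
LIBDIR = "/usr/lib/x86_64-linux-gnu/rocblas/4.4.1/library"
FILES = [
    f"{LIBDIR}/TensileLibrary_lazy_{arch}.dat"
    for arch in (
        "gfx908", "gfx1010", "gfx803", "gfx900", "gfx1030",
        "gfx1102", "gfx90a", "gfx906", "gfx1100", "gfx1101",
    )
]


def available_archs(paths):
    """Return the set of GPU architectures named in the given paths."""
    archs = set()
    for path in paths:
        match = LAZY.search(path)
        if match:
            archs.add(match.group(1))
    return archs


if __name__ == "__main__":
    archs = available_archs(FILES)
    for wanted in ("gfx1201", "gfx90a"):
        status = "available" if wanted in archs else "missing"
        print(wanted, status)
```

With the file list from the log, gfx90a is reported as available and
gfx1201 as missing, which matches the two test runs above.
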