Dear Maintainer,
While profiling the rocBLAS calls made by llama.cpp, I ran the
following command:
ROCBLAS_LAYER=2 ROCBLAS_LOG_BENCH_PATH=$HOME/bench.log \
./llama-cli -ngl 99 --color -c 2048 --temp 0.7 \
--repeat_penalty 1.1 -n -1 -m dolphin-2.2.1-mistral-7b.Q5_K_M.gguf \
-no-cnv --prompt "Once upon a time"
and got output like this in bench.log:
./rocblas-bench -f gemm_batched_ex --transposeA T --transposeB N -m 32 -n 2 -k 128 --alpha 1 --a_type f16_r --lda 1024 --b_type f16_r --ldb 4096 --beta 0 --c_type f16_r --ldc --d_type f16_r --ldd 32 --batch_count 32 --compute_type f16_r --algo 0 --solution_index 0 --flags 1
However, these arguments are incomplete: --ldc is logged without a value, as shown when passing them to rocblas-bench from librocblas0-bench:
$ cd /usr/libexec/rocm/librocblas0-bench
$ ./rocblas-bench -f gemm_batched_ex --transposeA T --transposeB N -m 32 -n 2 -k 128 --alpha 1 --a_type f16_r --lda 1024 --b_type f16_r --ldb 4096 --beta 0 --c_type f16_r --ldc --d_type f16_r --ldd 32 --batch_count 32 --compute_type f16_r --algo 0 --solution_index 0 --flags 1
Invalid value for --ldc
This can be worked around by dropping the --ldc flag, but it is
definitely a bug. The command line emitted by the bench logging should
be suitable for passing directly to rocblas-bench.
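
For anyone hitting the same problem, the workaround can be sketched as a small filter over the log. The function name strip_empty_ldc and the output file name are only illustrations, and the filter assumes the only defect is a bare --ldc followed directly by another option or by the end of the line. It does not recover the leading dimension that was actually used, so rocblas-bench will fall back to whatever it chooses when --ldc is absent.

```shell
# Hypothetical helper (not part of rocBLAS): remove any "--ldc" that has
# no value, i.e. one followed directly by another "--option" or by the
# end of the line. A "--ldc" that has a value is left untouched.
strip_empty_ldc() {
    sed -E 's/--ldc[[:space:]]*(--|$)/\1/g'
}
```

Usage would be along the lines of `strip_empty_ldc < "$HOME/bench.log" > "$HOME/bench-fixed.log"`, after which the lines in bench-fixed.log can be run from the rocblas-bench directory.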
Sincerely,
Cory Bloor