#1130261#5
Date:
2026-03-10 14:28:13 UTC
From:
To:
While RPC is still labeled experimental, it does work, at least on CPU, and in some cases it can be a
substantial win (e.g. in my case I don't have 60G of available memory in one server to run gpt-oss 120B,
but across two I do, giving a 10x speedup over running with 10% of the model left on the filesystem).

I did local builds with -DGGML_RPC=ON in the rules files of both ggml and llama.cpp, renamed rpc-server to
llama-rpc-server in llama.cpp-tools-extra.install, and added
usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so to libggml0.install. It seems to work
great.
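
As a rough sketch, the changes described above could look like the fragments
below. This is not the actual diff: the override target in debian/rules and
the dh-exec rename syntax are assumptions about how the packaging is laid out
(the rename form needs dh-exec, an executable .install file, and a
"#!/usr/bin/dh-exec" first line), and the CMake flag would go alongside
whatever flags the rules files already pass.

```
# debian/rules, in both ggml and llama.cpp (assumed dh-style override)
override_dh_auto_configure:
	dh_auto_configure -- -DGGML_RPC=ON

# debian/libggml0.install (added line, as quoted above)
usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so

# debian/llama.cpp-tools-extra.install (rename; assumed dh-exec syntax)
usr/bin/rpc-server => usr/bin/llama-rpc-server
```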

#1130261#10
Date:
2026-06-26 13:08:01 UTC
From:
To:
Hi,

This also worked on my end.

I've enabled the RPC backend in ggml in the most recent upload to experimental.

Regarding rpc-server, I agree that this should be called
'llama-rpc-server', and requested a rename upstream [1]. 'rpc-server'
alone is too generic for /usr/bin.

@Mathieu: this could be a good candidate for another systemd service, I
think?

Best,
Christian

[1]: https://github.com/ggml-org/llama.cpp/pull/25045

#1130261#15
Date:
2026-06-26 15:03:53 UTC
From:
To:
In my own old packages I had actually named it ggml-rpc-server, and
according to the latest discussion in [1], that is also the name that
has been chosen. So, all looks good.

It should certainly be a systemd service! But I don't have much
practice with it.

If I understand correctly, it simply loads the (full-fledged) ggml
backends that are installed on the same system and exposes them
so that remote tools can access them via their RPC backend.

@Christian Do you want me to look at it? Since I am mostly working on
whisper.cpp right now, it may take a few days.

Cheers,

Mathieu

#1130261#20
Date:
2026-06-26 15:25:51 UTC
From:
To:
There is no urgency; it is better for us to think it through first and then do it.
Until then, we can simply ship the ggml-rpc-server binary without a service.

Best,
Christian