While RPC is still labeled experimental, it does work, at least on CPU, and in some cases it can be
a substantial win (e.g. in my case I don't have 60G of available memory in one server to run gpt-oss
120B, but across two I do, giving a 10x speedup vs. running with 10% of the model left on the
filesystem).
I did local builds with -DGGML_RPC=ON in the rules files of both ggml and llama.cpp, renamed
rpc-server to llama-rpc-server in llama.cpp-tools-extra.install, and added
usr/lib/${DEB_HOST_MULTIARCH}/ggml/backends0/libggml-rpc.so to libggml0.install, and it seems to
work great.
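For illustration, the memory arithmetic behind that two-server setup can be sketched as below. This is a rough check only, not part of llama.cpp or of the packaging: the function name plan_split, the host names, and the figure of 40 GiB free per server are assumptions; only the roughly 60G model size comes from the report above.

```python
def plan_split(model_gib, free_gib_by_host):
    """Split a model across hosts in proportion to their free memory.

    Returns {host: share_gib}, or None if the pooled free memory is
    smaller than the model (part of it would still have to stay on disk).
    """
    total_free = sum(free_gib_by_host.values())
    if total_free < model_gib:
        return None
    # Each share is at most the host's free memory because
    # model_gib <= total_free.
    return {
        host: model_gib * free / total_free
        for host, free in free_gib_by_host.items()
    }


# Hypothetical figures: neither host can hold 60 GiB alone, together they can.
print(plan_split(60, {"server-a": 40, "server-b": 40}))
```

llama.cpp decides the actual placement itself; the sketch only shows why pooling two hosts avoids leaving part of the model on disk.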
Hi,
This also worked on my end. I've enabled the ggml backend in the most recent upload to experimental.
Regarding rpc-server, I agree that this should be called 'llama-rpc-server', and requested a rename upstream [1]. 'rpc-server' alone is too generic for /usr/bin.
@Mathieu: this could be a good candidate for another systemd service, I think?
Best,
Christian
[1]: https://github.com/ggml-org/llama.cpp/pull/25045
In my own old packages I had actually named it ggml-rpc-server, and according to the latest discussion on [1], this is also the name that has been chosen. So all looks good.
It should certainly be a systemd service, but I don't have much practice with those. If I understand correctly, it will simply load the (full-fledged) ggml backends installed on the same system and expose them so that remote tools can access them via their RPC backend.
@Christian: do you want me to look at it? Since I am more focused on whisper.cpp right now, it may take a few days.
Cheers,
Mathieu
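As a small illustration of the "expose them to remote tools" part, here is a minimal sketch of how a client-side script might check that an RPC server is reachable before pointing a tool at it. It only tests that a TCP port accepts connections and does not speak the ggml RPC protocol; the helper name, the address, and port 50052 are assumptions for illustration.

```python
import socket


def rpc_endpoint_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; adjust to wherever the server listens.
    print(rpc_endpoint_reachable("127.0.0.1", 50052))
```

A check like this says nothing about which backends the server exposes; it only confirms that something is listening on the chosen port.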
There is no urgency; it is better for us to think it through first and then do it. Until then, we can simply ship the ggml-rpc-server binary without a service.
Best,
Christian