#805449 Allow to replace the existing dbus daemon without reboot

Package:
dbus
Source:
dbus
Description:
simple interprocess messaging system (system message bus)
Submitter:
Yuri D'Elia
Date:
2023-06-28 20:24:04 UTC
Severity:
wishlist
Tags:
#805449#5
Date:
2015-11-18 09:45:48 UTC
From:
To:
Setting up dbus (1.10.4-1) ...
A reboot is required to replace the running dbus-daemon.

Somehow I have a big problem with that.
We can re-exec init, but not dbus?

#805449#10
Date:
2015-11-18 10:23:52 UTC
From:
To:
Control: severity 805449 wishlist
Control: tags 805449 + upstream wontfix
...

Yes. Handing off IPC connections across a re-exec is a feature of
significant complexity, which will break the system if done wrong; it
would require support for serializing and deserializing dbus-daemon's
entire state (name ownership, connection credentials, the buffer of
partially-read messages on each connection, etc.) in a way that is
compatible across versions.

dbus-daemon has never had this feature, and will continue to not have
this feature until/unless someone implements it.

If you want to propose an implementation, I'd be happy to review a patch
series upstream (but I suspect it isn't actually feasible).

    S

#805449#19
Date:
2015-11-18 11:10:24 UTC
From:
To:
True, but dbus in the past was mostly local to a user session.

dbus now is critical for most system machinery. It deserves, actually
no, it *needs* such a feature.

I understand all you said and I agree on the complexity, but I still see
it as a big issue for a component that sits side-by-side with init.

Somehow it didn't click for me before that message.

I realize Debian has nothing to do with this, but still, as long as you
don't close the bug, I'm fine with wontfix...

#805449#24
Date:
2015-11-18 13:07:14 UTC
From:
To:
We've already had some concerns about that issue at the local institute...

On 18 November 2015 12:10:24 CET, Yuri D'Elia <wavexx@thregr.org> wrote:

I don't think this should be marked wontfix, but rather kept open and given a higher severity.
In particular, further non-desktop software should probably not be allowed to strictly require dbus until upstream has solved this problem.


When dbus used to be mostly for the desktop, that design problem didn't cause much worry, but nowadays one basically cannot avoid it on server systems either, and I guess in the future it will be used more rather than less.

But not being able to restart it (e.g. for security updates) without a reboot is in fact a major issue that upstream should work on.
Especially when one considers that there are still many things that can't be clustered.

Cheers,
Chris.

#805449#29
Date:
2015-12-26 12:50:23 UTC
From:
To:
So while I was restarting the system for another dbus update, I actually
changed my mind. This should be of critical priority because of the way we're
using it. A systemd-based system becomes essentially un-usable after a dbus
restart.

It might as well be a critical design issue in systemd for failing to recognize
that the current dbus implementation cannot be restarted.

Either way, it's unacceptable.

#805449#34
Date:
2017-08-26 08:07:55 UTC
From:
To:
I'd like to chime in on this issue. I just did a point update on a
remote server, and got the "must reboot for dbus" message. It is a
royal pain to reboot that particular computer. Moreover, if dbus were
upgraded automatically as a security patch (unattended-upgrades) there
would be no person seeing that little bon-mot.

This behaviour really is unacceptable, and this bug should be elevated
in priority.

One technical solution would be to serialize the state of dbus and do
an orderly handover to an upgraded daemon, as discussed above. I'd
suggest that it might be easier to try to ensure that dbus is
"crashable" or "crash-only", meaning that if the daemon is
unexpectedly terminated with prejudice, it will be restarted
automatically and its clients will all reestablish appropriate
connections. There is a rich literature on such software, and it is in
general a more convenient way to build robust systems. (ref:
https://en.wikipedia.org/wiki/Crash-only_software) Stateless protocols
are an example of this philosophy. However even stateful connections
can be accommodated, by ensuring that clients can reconnect and
re-establish the desired state when necessary.

#805449#39
Date:
2017-08-26 17:10:56 UTC
From:
To:
I agree that in a perfect world, D-Bus would be restartable. However,
we do not live in that perfect world. If we did, many things would work
differently. If it was as simple as "raise the bug priority and someone
will solve the problem", then it would have been resolved a decade ago.

You are welcome to try to fix this behaviour while respecting the various
constraints that exist. However, consensus among the upstream maintainers
of D-Bus (the protocol) and dbus (the reference implementation of that
protocol) is that attempting to implement state handover (re-exec) would
be sufficiently complex that the most likely result would be worse bugs
than the need to restart. In particular, it is plausible that bugs in
the re-exec handover would themselves cause security vulnerabilities.
As a result, I would suggest that your time (and the D-Bus maintainers'
time, for either implementation or review) would be better spent
elsewhere.

Existing D-Bus implementations (the
reference implementation dbus, and compatible reimplementations like
systemd's sd-bus and GLib's GDBus) do not reconnect, and that change
would break them. The D-Bus protocol was frozen more than 10 years
ago, so incompatible changes cannot be accepted.

Also, the application-facing API of D-Bus signals (broadcasts) cannot
work across a reconnection, because there is no way to know whether you
have missed a signal during the time between disconnection and
reconnection, and many higher-level APIs that use D-Bus rely on never
missing a signal for their correctness. A hypothetical protocol that
resembles D-Bus but is crash-safe would require application code to be
aware of reconnections, and resynchronize state after reconnection;
the lower-level code that implements the D-Bus protocol does not have
enough domain-specific knowledge to be able to resynchronize.

If you want a crash-safe IPC protocol, you are welcome to design and
implement one, but D-Bus is not and will not be that protocol. A broadly
similar but crash-safe protocol would be sufficiently different, from
an application's point of view, that it would need to have a separate
name (even calling it something like "D-Bus 2.0" would be deeply
misleading) and would need to be implemented separately.

I know this is not the answer you want, but it's the only one I have.

Regards,
    S

#805449#44
Date:
2017-08-27 09:45:30 UTC
From:
To:
Thanks for the detailed analysis of the technical difficulty of
allowing dbus to be restarted. It really does sound like some
unfortunate design decisions were made. However, I don't think it
would be quite impossible to work around. E.g., the libraries used to
connect to dbus could be modified to attempt to silently reconnect,
and a replay daemon could record and replay events upon request to
allow the reconnecting processes to see what they'd missed. I'll grant
that it would be a serious undertaking. On the other hand, it really
is a pretty dire situation when patching a dbus security bug is as
disruptive to high-availability systems as patching a kernel security
bug.

If I might ask, how is this currently handled with
unattended-upgrades? The whole point of enabling unattended upgrades
for security is to avoid running vulnerable systems, even when the
system has no babysitter. This would seem to mandate a triggered
unattended reboot. But that's not being done right now; I'm not even
sure we have a trigger for that. Just leaving the system vulnerable
until a coincidental reboot doesn't seem appropriate.

#805449#49
Date:
2017-11-26 15:13:48 UTC
From:
To:
It is (or should be) handled the same way as upgrading the kernel, which
cannot be replaced in-place; display managers like gdm, which cannot
be replaced in-place without disrupting user sessions; and in general
anything that isn't conveniently restartable by a systemd unit or
init script, like user sessions and user-provided code.

dbus.postinst touches /var/run/reboot-required, which is apparently
used by unattended-upgrades to detect that a reboot is needed
(/etc/kernel/postinst.d/unattended-upgrades does the same thing).
If there are other APIs for notifying the rest of the system that a
reboot is needed, I'd be happy to add them - please file a wishlist bug
with a link to their API documentation. #867263 suggests that there
might be some other API that provides more information than "a reboot
is needed", but doesn't specify who consumes that information or what
its "API" is, so is not currently actionable.
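
For illustration, a consumer of that flag could look something like the
following minimal Python sketch. It assumes only the flag file described
above. The companion file reboot-required.pkgs (a list of package names)
is an assumption based on common Debian/Ubuntu convention, not something
this message says dbus.postinst writes, and the function name
reboot_status is made up for this example.

```python
from pathlib import Path


def reboot_status(run_dir="/var/run"):
    """Return (needed, packages) based on the reboot-required flag file."""
    base = Path(run_dir)
    needed = (base / "reboot-required").exists()

    # Assumed, optional companion file: one package name per line.
    pkgs_file = base / "reboot-required.pkgs"
    packages = []
    if pkgs_file.is_file():
        lines = pkgs_file.read_text().splitlines()
        packages = sorted({line.strip() for line in lines if line.strip()})
    return needed, packages


needed, packages = reboot_status()
print("reboot needed:", needed)
print("requested by:", ", ".join(packages) or "(not recorded)")
```

The flag only says that a reboot is wanted; deciding when to reboot is
left to whatever reads it.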

In general, unattended upgrade infrastructure can't know for sure that
a reboot isn't needed. If there's a bad enough security vulnerability
to be seriously concerned about it, the safe thing to do is to reboot
the system, to ensure that there can't be any lurking libraries or
configurations from before the upgrade. I know this is disruptive,
particularly on systems that have been designed with the assumption
that reboots don't often happen, but it's also the only way to be
sure; and system designs have to be able to cope with semi-frequent
reboots *anyway*, because the kernel semi-frequently has exploitable
vulnerabilities fixed.

This is one of the main reasons that non-apt OS deployment mechanisms
like OSTree require a reboot to apply an upgrade. The other is that
their designers want to perform an atomic cut-over from one
good/consistent state to another, eliminating the transitional
undefined/broken state that occurs during apt/rpm/etc. upgrades, which
could cause incorrect behaviour when related code runs during the upgrade
or if the upgrade is interrupted by a power failure, an unexpected reboot
or sysadmin action.

    smcv

#805449#54
Date:
2023-06-28 01:50:07 UTC
From:
To:
Hi,

I got this:

Setting up dbus (1.12.28-0+deb11u1) ...
A reboot is required to replace the running dbus-daemon.
Please reboot the system when convenient.

I read through the bugreport and I can see why it should not
just be automatically restarted.

But I could just restart dbus and then all things that use it.
This is a systemd-free shop, so… I don’t even know what would
use dbus at all. It’s there as a Qt dependency :/

Is there a “needrestart”-like tool, or something like ps/netstat,
showing which programs use dbus, for restarting them afterwards?

bye,
//mirabilos

#805449#59
Date:
2023-06-28 10:17:23 UTC
From:
To:
This has been discussed at *extensive* length before, and I'm pretty
sure nothing I say is going to change your opinion anyway, but OK,
let's do this one more time.

I should start by saying that none of this is new, and none of this was
my idea: what I describe here has been how D-Bus is designed to work for
at least 15 years, more like 20, considerably pre-dating my involvement
in it. I'm constrained by compatibility just as much as anyone else is.

Restarting the system dbus-daemon, `dbus-daemon --system`, on a running
system is not a supported action. It disconnects all system-level D-Bus
clients and services, which nearly always causes them to exit. Of the
major protocol implementations, libdbus calls exit() unless told not to;
GDBus does a raise(SIGTERM) unless told not to, which with hindsight is
a better route to take because long-running processes need to cope with
being sent SIGTERM anyway, either exiting immediately from its default
handling or triggering whatever cleanup they need; and I don't immediately
know what sd-bus does.

Exiting on disconnection by default is very much intentional here. If
they didn't exit, and even if they automatically reconnected (which in
practice none do), then D-Bus clients would lose track of the state
of the services they are communicating with (because they would have
missed an unknown number of change-notification signal messages while
they were reconnecting). For well-designed D-Bus services, the clients
would need domain-specific knowledge of how to do state-recovery by
calling methods to ask the service "sorry, I got disconnected, what is
the state of the world now?", which is a code path that would be rarely
tested and therefore in practice often wouldn't work; and they would
also need to be careful not to take any actions with side-effects based
on potentially outdated information while they were catching up with
the current state of the world.
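
The missed-signal problem described above can be shown with a toy model.
This is a sketch in plain Python and does not use any D-Bus API: the
Service and Client classes, the "link" property and the resync() method
are all hypothetical stand-ins for a service that broadcasts change
notifications and a client that caches them.

```python
class Service:
    """Toy service: holds state and broadcasts each change to subscribers."""

    def __init__(self):
        self.state = {}
        self.subscribers = []

    def set(self, key, value):
        self.state[key] = value
        # Broadcast only: nothing is queued for clients that are absent.
        for callback in list(self.subscribers):
            callback(key, value)

    def get_all(self):
        return dict(self.state)


class Client:
    """Toy client: keeps a cache that is only updated by change signals."""

    def __init__(self, service):
        self.service = service
        self.cache = service.get_all()
        service.subscribers.append(self.on_changed)

    def on_changed(self, key, value):
        self.cache[key] = value

    def disconnect(self):
        self.service.subscribers.remove(self.on_changed)

    def reconnect(self):
        # A transparent reconnect: signals flow again, the cache is untouched.
        self.service.subscribers.append(self.on_changed)

    def resync(self):
        # Domain-specific recovery: ask the service for the whole state.
        self.cache = self.service.get_all()


svc = Service()
client = Client(svc)
svc.set("link", "up")

client.disconnect()
svc.set("link", "down")        # this change is never seen by the client
client.reconnect()

stale = client.cache["link"]   # still the value from before the gap
client.resync()
fresh = client.cache["link"]
print("after silent reconnect:", stale, "/ after resync:", fresh)
```

Only the application knows that resync() exists and what it has to fetch,
which is why the lower-level protocol code cannot do this on its behalf.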

For less well-designed D-Bus services, state recovery is not always
even implementable, because API designers haven't always thought about
it. (Obviously you should use well-designed services if you can, but
sometimes those are not what's available.)

I've said "clients" and "services" as a simplification, but as far as
the design of D-Bus is concerned, they are all just peers - many services
that use D-Bus to provide functionality to other processes are also a
client of a "smaller" service - so anything that is necessary for clients
is also necessary for services.

Restarting the session dbus-daemon, `dbus-daemon --session`, in a
running login session is similarly not a supported action. It disconnects
all session-level (per-login or per-user) D-Bus clients and services,
similar to what happens on the system bus.

On the session bus, as well as the practical concerns above, the design
is that being disconnected from the session bus is the way that non-GUI
processes are notified that the session has ended, so if they did not
exit when disconnected (either immediately or after doing some cleanup),
they would continue to run after your login session was ended; and instead
of bug reports about how you think the dbus-daemon should restart itself
on upgrade, you'd be opening bug reports about how some random session
service (one that doesn't use X11 itself, like dconf or gnome-keyring)
isn't exiting as you want it to when you log out from an X11 session.
Having background session services exit deterministically when the session
ends was one of the original (circa 2002) design goals of D-Bus.

Compare with any other stateful IPC protocol, like X11. In principle, it
is possible to write an X11 client that can reconnect when you restart the
X11 server. In practice, essentially nobody does (I think Emacs might?)
because it's difficult to do correctly; and arguably it's a bug to try,
because being disconnected from the X11 server is the way that graphical
applications are told "the session is over, it's time to exit".

My earlier messages to this bug talk about handing off IPC connections
across a re-exec. In principle, there would be nothing to stop Someone™
from making it possible to restart the dbus-daemon without disconnecting
all of the connected processes, by serializing the entire state of the
dbus-daemon, fd-passing all of its connections to some helper process,
re-exec'ing itself, reading the serialized state back in, receiving
the connection fds back from the helper process and carrying on. systemd
does the equivalent thing for its state and its (intentionally simpler)
IPC protocols.
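
To make the fd-passing step concrete, here is a minimal sketch of the
general technique in Python (3.9 or later, Unix only). It is not
dbus-daemon code and dbus-daemon has no such feature: the state
dictionary, the bus name and the single "connection" are invented for
the example, and everything runs in one process. A real handover would
also have to carry credentials, partially read messages and much more.

```python
import json
import socket

# One "client connection": client_end belongs to the client,
# daemon_end is the end held by the old daemon.
client_end, daemon_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Channel between the old daemon and the helper / new daemon.
old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Old daemon: serialize (hypothetical) state and pass the connection's fd
# alongside it as SCM_RIGHTS ancillary data.
state = {"names": {"org.example.Demo": ":1.7"}, "partial_read": ""}
socket.send_fds(old_side, [json.dumps(state).encode()], [daemon_end.fileno()])
daemon_end.close()  # the in-flight copy of the fd keeps the connection open

# New daemon: read the state back and rebuild a socket object from the fd.
data, fds, _flags, _addr = socket.recv_fds(new_side, 4096, 1)
restored_state = json.loads(data)
restored_conn = socket.socket(fileno=fds[0])

# The client never noticed: its end of the connection still works.
client_end.sendall(b"ping")
reply = restored_conn.recv(4)
print(restored_state["names"], reply)
```

Passing the descriptors is the easy part. The difficulty described in
this thread is making the serialized state complete and compatible
across versions.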

I am not aware of anyone having ever seriously attempted this for D-Bus
(or for X11, which would likely have comparable complexity). Even
if someone stepped up to do that, reviewing that implementation for
correctness (and/or fixing the inevitable CVEs when it turned out not
to be perfect) would take a significant amount of maintainer time that,
to be blunt, I just don't have.

You can do this if you are confident that you understand what will happen,
but it is an unsupported action and I will not help you to do so.

On a typical Debian system, restarting `dbus-daemon --system` will
cause system services to stop working, in a way that is only practical
to recover from by a reboot. If you have an in-depth understanding of
all the components on your system that use D-Bus, then I'm sure you can
figure out how to identify and restart them manually in the correct order,
but that is not a supported or supportable thing to do.

Similarly, in general restarting `dbus-daemon --session` will cause
parts of your X11 (or Wayland) session to stop working, in a way that
is only practical to recover from by logging out and back in.

Both of these would be extremely disruptive, and dbus-daemon would
(correctly!) receive critical-severity bug reports if we tried to do
this automatically, so we don't.

D-Bus is about 8 years older than systemd, and many things that are
not systemd use it. I don't know what all of them are, in the same way
that the maintainers of Xorg would be unable to give you a comprehensive
list of all the X11 applications that exist.

The D-Bus protocol operates over an AF_UNIX socket (a lot like X11 and
Wayland), so passing suitable options to netstat or ss will tell you
what is connected to it.

    smcv

#805449#64
Date:
2023-06-28 17:24:32 UTC
From:
To:
Simon McVittie dixit:

I wrote “I can see why”, not “I can’t see why”.

But that’s OKAY! I want them to exit so that I can restart them.
I just need a list of affected services.

I don’t have such a thing.

$ ps ax | fgrep dbus
 1497 ?        Ss     0:00 /usr/bin/dbus-daemon --system
 6883 pts/17   S+     0:00 grep -F dbus

So it’s just system services.

Yes, I know, it’s fine. I just want a list of services to restart.

WHICH ones?

I’m pretty sure this only applies to systemd, which I don’t use.

Yes, but which ones, or rather, how can I find them out?

Hmm.

This?

$ sudo netstat -anp | fgrep 1497
unix  2      [ ACC ]     STREAM     LISTENING     14017    1497/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     1356473  1497/dbus-daemon     /run/dbus/system_bus_socket
unix  2      [ ]         DGRAM                    1267167  1497/dbus-daemon
unix  3      [ ]         STREAM     CONNECTED     15526    1497/dbus-daemon
unix  3      [ ]         STREAM     CONNECTED     15527    1497/dbus-daemon

And then this?

$ sudo netstat -anp | fgrep /run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     14017    1497/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     1356473  1497/dbus-daemon     /run/dbus/system_bus_socket

Does this mean nothing is connected to the running system dbus?

Another system has even more:

$ sudo netstat -anp | fgrep /run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     20510    2203/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     31073    2203/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     24671    2203/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     24022    2203/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     20019    2203/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     20769    2203/dbus-daemon     /run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     28910    2203/dbus-daemon     /run/dbus/system_bus_socket

But still all in $pid/dbus-daemon.

$ sudo lsof /run/dbus/system_bus_socket
COMMAND    PID       USER   FD   TYPE             DEVICE SIZE/OFF  NODE NAME
dbus-daem 2203 messagebus    4u  unix 0x000000005a3413d3      0t0 20510 /run/dbus/system_bus_socket type=STREAM
dbus-daem 2203 messagebus    9u  unix 0x00000000ec903f8e      0t0 20769 /run/dbus/system_bus_socket type=STREAM
dbus-daem 2203 messagebus   10u  unix 0x00000000f91f0f2d      0t0 20019 /run/dbus/system_bus_socket type=STREAM
dbus-daem 2203 messagebus   11u  unix 0x00000000487984d7      0t0 31073 /run/dbus/system_bus_socket type=STREAM
dbus-daem 2203 messagebus   12u  unix 0x00000000fb9c734e      0t0 24022 /run/dbus/system_bus_socket type=STREAM
dbus-daem 2203 messagebus   13u  unix 0x00000000a12611a9      0t0 24671 /run/dbus/system_bus_socket type=STREAM
dbus-daem 2203 messagebus   14u  unix 0x00000000266ed9ec      0t0 28910 /run/dbus/system_bus_socket type=STREAM

Would either of these commands positively find all affected users
of dbus?

$ ll /run/dbus/
total 4
-rw-r--r-- 1 root root 5 Jun 24 23:53 pid
srwxrwxrwx 1 root root 0 Jun 24 23:53 system_bus_socket=

(on both systems)

That’s all I want to know…

bye,
//mirabilos

#805449#69
Date:
2023-06-28 19:58:42 UTC
From:
To:
I don't know what's installed on your system, so I have no way to
answer that. A random selection of services that definitely do listen
on the system bus: avahi-daemon, polkitd, rtkit-daemon, bluetoothd,
ModemManager, NetworkManager, firewalld, udisksd, elogind. This is far
from an exhaustive list, Debian is quite large.

This does not only apply to systemd. avahi-daemon and bluetoothd are
examples of services that rely on D-Bus and are older than systemd,
and elogind relies on D-Bus and is not even co-installable with systemd.

(Perhaps you don't use those either, but that is not implied by you not
using systemd.)
...
(etc.)

Unfortunately, that looks as though netstat is telling us the process IDs
of the server end of each connection (in the dbus-daemon), which tells
us nothing about the client end (the processes you want to list). So
you now have a good guess at *how many* clients there are, but not
*what*. That's a lot less useful than I had hoped.

If you use `sudo ss --unix -p` instead of netstat, you can correlate
the dbus-daemon end that netstat also shows with the client end by their
inode number. For instance this is the equivalent of one of your lines of
netstat output:

u_str ESTAB 0      0                     /run/dbus/system_bus_socket 33798              * 33797   users:(("dbus-daemon",pid=798,fd=46))                                                         >

and if I search the output for 33797, I can find the other end of this
connection:

u_str ESTAB 0      0                                               * 33797              * 33798   users:(("gvfs-udisks2-vo",pid=2093,fd=5))                                                     >

which tells me which process is connected to the system bus. Not exactly
convenient, but maybe scriptable (although looking at wherever netstat and
ss get their information from might be more reliable than screen-scraping
their output).
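
As a rough sketch of that scripting idea, the Python below pairs up the
two ends of each connection by inode number. It does screen-scrape the
`ss --unix -p` layout shown in this thread, so it carries exactly the
fragility mentioned above: it assumes whitespace-separated columns and
socket paths without spaces. The function names are made up for this
example, and the sample text is taken from output posted to this bug.

```python
import re
import subprocess

USER_RE = re.compile(r'\("([^"]*)",pid=(\d+),fd=\d+\)')


def parse_ss(text):
    """Map local inode -> (local address, peer inode, [(command, pid), ...])."""
    sockets = {}
    for line in text.splitlines():
        fields = line.split(None, 8)
        if len(fields) < 8 or not (fields[5].isdigit() and fields[7].isdigit()):
            continue  # header line, or a layout this sketch does not handle
        users = USER_RE.findall(fields[8]) if len(fields) > 8 else []
        sockets[fields[5]] = (fields[4], fields[7],
                              [(name, int(pid)) for name, pid in users])
    return sockets


def bus_clients(text, bus_path="/run/dbus/system_bus_socket"):
    """Return sorted (command, pid) pairs at the far end of bus connections."""
    sockets = parse_ss(text)
    clients = set()
    for address, peer_inode, _users in sockets.values():
        if address == bus_path and peer_inode in sockets:
            clients.update(sockets[peer_inode][2])
    return sorted(clients)


def live_bus_clients():
    """Run ss for real; needs root to see other users' processes."""
    out = subprocess.run(["ss", "--unix", "-p"], capture_output=True,
                         text=True, check=True).stdout
    return bus_clients(out)


SAMPLE = """\
u_str ESTAB 0 0 /run/dbus/system_bus_socket 1356473 * 1355554 users:(("dbus-daemon",pid=1497,fd=9))
u_str ESTAB 0 0 * 1355554 * 1356473 users:(("Xorg",pid=3548,fd=11))
u_str ESTAB 0 0 * 1359631 * 1359632 users:(("xdg-desktop-por",pid=9999,fd=7))
u_str ESTAB 0 0 /run/dbus/system_bus_socket 1359632 * 1359631 users:(("dbus-daemon",pid=1497,fd=11))
"""

for command, pid in bus_clients(SAMPLE):
    print(pid, command)
```

The command names are truncated by the kernel, so the PIDs are the more
dependable key for looking the processes up afterwards.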

I think I've already spent more time on this than I can justify, and
I suspect you have too: dbus is not uploaded very often (particularly
to stable releases), so there's a really low limit to how many reboots
can be avoided by micro-optimizing how you deal with dbus updates. The
kernel gets at least an order of magnitude more security updates than
dbus, and you have to reboot to put a kernel update into effect anyway,
so occasional reboots are going to be necessary for a maintained machine
regardless of whether you reboot for a dbus update.

    smcv

#805449#74
Date:
2023-06-28 20:19:34 UTC
From:
To:
Simon McVittie dixit:

Yeah, but I was hoping for a method to enumerate them.

[ context ]

Indeed, but these can easily be restarted. You wrote:

This doesn’t apply to avahi-daemon, for example.

Yeah, well, no elogind here after it changed the laptop to crash¹
on lid close. I threw that out very quickly…

① it tried to suspend, but I don’t have suspend-to-disc set up
[…]

Oh, that seems better. I haven’t used the ss tool yet.

$ sudo ss --unix -p | fgrep dbus
u_str ESTAB 0      0      /run/dbus/system_bus_socket 1356473            * 1355554 users:(("dbus-daemon",pid=1497,fd=9))
u_str ESTAB 0      0             /tmp/dbus-HWIk9QtjqW 1370793            * 1359638 users:(("dbus-daemon",pid=9974,fd=15))
u_str ESTAB 0      0                                * 15526              * 15527   users:(("dbus-daemon",pid=1497,fd=7))
u_str ESTAB 0      0                                * 15527              * 15526   users:(("dbus-daemon",pid=1497,fd=8))
u_str ESTAB 0      0                                * 1370736            * 1359493 users:(("dbus-launch",pid=9973,fd=3))
u_str ESTAB 0      0                                * 1359499            * 1359500 users:(("dbus-launch",pid=9973,fd=4))
u_str ESTAB 0      0                                * 1359498            * 1359497 users:(("dbus-daemon",pid=9974,fd=7))
u_str ESTAB 0      0             /tmp/dbus-HWIk9QtjqW 1370769            * 1359608 users:(("dbus-daemon",pid=9974,fd=11))
u_str ESTAB 0      0                                * 1359497            * 1359498 users:(("dbus-daemon",pid=9974,fd=6))
u_str ESTAB 0      0             /tmp/dbus-HWIk9QtjqW 1370774            * 1359625 users:(("dbus-daemon",pid=9974,fd=14))
u_str ESTAB 0      0             /tmp/dbus-HWIk9QtjqW 1370770            * 1359618 users:(("dbus-daemon",pid=9974,fd=12))
u_str ESTAB 0      0      /run/dbus/system_bus_socket 1359632            * 1359631 users:(("dbus-daemon",pid=1497,fd=11))
$ sudo ss --unix -p | fgrep 1355554
u_str ESTAB 0      0      /run/dbus/system_bus_socket 1356473            * 1355554 users:(("dbus-daemon",pid=1497,fd=9))
u_str ESTAB 0      0                                * 1355554            * 1356473 users:(("Xorg",pid=3548,fd=11))
$ sudo ss --unix -p | fgrep 1359631
u_str ESTAB 0      0                                * 1359631            * 1359632 users:(("xdg-desktop-por",pid=9999,fd=7))
u_str ESTAB 0      0      /run/dbus/system_bus_socket 1359632            * 1359631 users:(("dbus-daemon",pid=1497,fd=11))
$ ps ax -q 3548,9999
  PID TTY      STAT   TIME COMMAND
 3548 tty2     Rl     2:09 /usr/lib/xorg/Xorg -retro -nolisten tcp -dpi 106 :0 vt2 -keeptty -auth /tmp/serverauth
 9999 ?        Sl     0:00 /usr/libexec/xdg-desktop-portal

The output is wider than my laptop screen, despite being cut off,
but I can now see that, on this machine, it’s apparently the X11
server (why?) and xdg-desktop-portal (understandable, but TTBOMK
that gets (re‑?)started automatically anyway, it’s for Firefox).

But this is basically what I was looking for, yes.

Yeah, depends, I occasionally have a VM host with long-running tasks
on the VMs. I was able to reboot it anyway today but was interested in
how this can be generically decided.

Feel free to extract this discussion into README.Debian of dbus.

Thanks,
//mirabilos