#993716 bridge-utils: IPv6 network bridge fails after upgrading Buster to Bullseye

Package:
bridge-utils
Source:
bridge-utils
Description:
Utilities for configuring the Linux Ethernet bridge
Submitter:
Pieter Hollander
Date:
2022-04-05 08:30:02 UTC
Severity:
important
Tags:
#993716#5
Date:
2021-09-05 11:48:16 UTC
From:
To:
After upgrading from Buster to Bullseye, bridge-utils no longer
configures my IPv6 network bridge correctly.
It fails to bring up br0 and therefore also breaks Unbound on my system,
as the IPv6 interface address Unbound needs to bind to remains in
"tentative" state.
Looking at the bridge-utils changelog, I thought it might be related to
the fixes for #980752, but I am unsure about this.
I have personally not been able to fix the problem yet and cannot find
documentation explaining the change in bridge-utils behaviour.

The following setup works correctly on buster but fails on bullseye.
"sudo journalctl -u networking -f" shows the following message:

Starting Raise network interfaces...
Sep 02 12:10:30 x ifup[849]: Waiting for DAD... Done
Sep 02 12:10:37 x ifup[1042]: Waiting for DAD... Timed out
Sep 02 12:10:37 x ifup[712]: ifup: failed to bring up br0
Sep 02 12:10:37 x systemd[1]: networking.service: Main process exited,
code=exited, status=1/FAILURE
Sep 02 12:10:37 x systemd[1]: networking.service: Failed with result
'exit-code'.
Sep 02 12:10:37 x systemd[1]: Failed to start Raise network interfaces.

Overview of configuration:

/etc/network/interfaces:

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

auto lo
iface lo inet loopback

auto ens3
iface ens3 inet static
         address 203.0.113.118
         netmask 255.255.255.0
         gateway 203.0.113.1
         dns-nameservers 127.0.0.1

iface ens3 inet6 static
         address 2001:db8::1
         netmask 128
         gateway fe80::1
         dns-nameservers ::1

auto br0
iface br0 inet static
         address  10.10.10.1
         netmask  255.255.255.0
         bridge_ports none
         bridge_stp off
         bridge_fd 0

iface br0 inet6 static
         address 2001:db8::2
         netmask 64

/etc/sysctl.conf

net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
net.ipv6.conf.ens3.proxy_ndp=1

/etc/ndppd.conf

route-ttl 30000
proxy ens3 {
     router no
     timeout 500
     ttl 30000
     rule 2001:db8::/64 {
     auto
     }
}

/etc/unbound/unbound.conf:

include: "/etc/unbound/unbound.conf.d/*.conf"

server:
interface: 127.0.0.1
interface: ::1
access-control: 127.0.0.0/8 allow
access-control: ::1/128 allow

# Listen on the LXC bridge IPv4 & IPv6 network
interface: 10.10.10.1
interface: 2001:db8::2
# Allow access to Unbound from containers
access-control: 10.10.10.1/24 allow
access-control: 2001:db8::1/64 allow

#993716#10
Date:
2021-09-05 16:27:27 UTC
From:
To:
X-Debbugs-CC: Pieter Hollander <debian@hollander.online>

These stanzas are missing the "bridge_hw" entry which can be a MAC
address or the name of the interface whose MAC to take.  Thus your
bridge ends up being connected to nothing.

The NEWS.Debian file has details on why this was introduced (the
kernel changed behaviour).  You aren't the only one who ran into this.

Regards.

#993716#15
Date:
2021-09-05 20:06:24 UTC
From:
To:
I also encountered this.

My working hypothesis is that it may be related to this (from README.Debian):

Could you try duplicating the bridge_ports in the inet6 stanza and see if that eliminates the problem?

Alternatively, perhaps adding dad-attempts 0 would work.

I have not yet been able to try either remedy. Please let me know if one of them helps.

#993716#20
Date:
2021-09-06 09:04:58 UTC
From:
To:
Hi!

I'm CCing Josué because this seems to be more on ifupdown's side than on
bridge-utils.

I know we are having quite some trouble with the random bridge mac address,
but this doesn't seem to be one of those cases.

From what I see, this is a problem with DAD: the bridge is created without
any port attached to it, so the kernel doesn't allow it to transition from:
18: br0    inet6 X/64 scope link tentative \       valid_lft forever preferred_lft forever
to:
18: br0    inet6 X/64 scope link \       valid_lft forever preferred_lft forever

This is because without any port on the bridge the kernel cannot do any DAD.
So, without transitioning, the address remains tentative all the time, and
thus the script /lib/ifupdown/settle-dad.sh from the ifupdown package exits
with a timeout message, like the one you are seeing, and an error status of
1, thus breaking the network setup.
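
A minimal sketch for spotting the stuck state described above. The helper
name has_tentative is hypothetical; it only looks for the "tentative" flag
in lines shaped like the "ip -o -6 addr show" output quoted earlier, read
from stdin.

```shell
#!/bin/sh
# Sketch only: succeeds (exit 0) if any line on stdin carries the
# "tentative" flag, i.e. DAD has not completed for that address.
# has_tentative is a hypothetical helper name, not part of ifupdown.
has_tentative() {
    grep -qw 'tentative'
}

# Usage on a live system (br0 is the bridge from this report):
#   ip -o -6 addr show dev br0 | has_tentative && echo "br0: DAD not finished"
```

On a live system, "ip -6 addr show dev br0 tentative" lists the same
addresses directly.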

I have changed the exit 1 to an exit 0 in that script and everything works
as expected. This is a nasty workaround until we arrive at a better
solution; the other solution would be to attach something to the bridge,
maybe even a dummy port or similar.

Josué, we've had the idea of integrating bridge setup (now in bridge-utils)
into ifupdown. I wouldn't mind doing this for Bookworm, and I would continue
to take care of this part to the best of my knowledge even if it is in
ifupdown.  Maybe it is time to do that.  As for this bug, the workaround I
describe is not a valid solution, but maybe we can check in the settle-dad
script whether the device is a bridge without any interface added to it,
and thus not transitioning, and return 0 on the timeout in that case?
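
A rough sketch of such a check, not the actual settle-dad.sh code. It
assumes the usual sysfs layout, where a bridge has a "bridge" directory and
its ports appear under "brif". The function name is_portless_bridge and the
SYSFS_NET override are hypothetical, added only so the logic can be tried
against a scratch directory.

```shell
#!/bin/sh
# Sketch only: not the real settle-dad.sh logic.
# SYSFS_NET is a hypothetical override; it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds (exit 0) if the named interface is a bridge with no ports.
is_portless_bridge() {
    iface="$1"
    # Bridges expose a "bridge" directory; anything else is not a bridge.
    [ -d "$SYSFS_NET/$iface/bridge" ] || return 1
    # Attached ports show up as entries under "brif"; empty means no ports.
    [ -z "$(ls -A "$SYSFS_NET/$iface/brif" 2>/dev/null)" ]
}

# Possible use on the timeout path (IFACE is set by ifupdown):
#   if is_portless_bridge "$IFACE"; then exit 0; else exit 1; fi
```

Whether a portless bridge should be allowed to pass silently is a policy
decision for ifupdown; the sketch only shows that the condition is cheap to
detect.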

About integration...  we have talked about that on some bugs that are
somehow half way between bridge-utils and ifupdown, last one may be #939713,
I would try to rewrite everything on the ifupdown scripts to depend on ip
and not brctl, so that ifupdown wouldn't depend on bridge-utils.

We can start some other thread on this if you want or we can talk about this
on irc or mail, whatever.

As for this bug... I believe it is on the ifupdown side, so I think we
should reassign it, unless you see a way to fix this problem on the
bridge-utils side, but I can't think of any fix on the bridge-utils side
right now.

Regards.

#993716#25
Date:
2021-09-06 12:25:24 UTC
From:
To:
PS: I forgot to mention that adding dad-attempts 0 to the br0 inet6 config
also stops networking from failing.
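
For reference, a sketch of the br0 inet6 stanza from the original report
with that option added. Only the dad-attempts line is new; interfaces(5)
documents a value of 0 as disabling DAD for the stanza.

```
iface br0 inet6 static
         address 2001:db8::2
         netmask 64
         dad-attempts 0
```

This skips duplicate address detection for br0 rather than fixing the
underlying behaviour, so it is a workaround in the same sense as the
settle-dad.sh change.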

#993716#30
Date:
2021-09-06 12:17:26 UTC
From:
To:
Hi all,

I tried the fixes proposed here. Unfortunately, adding bridge_ports or
bridge_hw did not solve the issue.
The workaround of changing exit 1 to exit 0 in the "DAD timed out" section
of /lib/ifupdown/settle-dad.sh did cause networking to start up
successfully, although it still logs the timeout.

systemd[1]: Starting Raise network interfaces...
ifup[1463]: Waiting for DAD... Done
ifup[1606]: Waiting for DAD... Timed out
systemd[1]: Finished Raise network interfaces.

Using a dummy port to perform DAD sounds like the best solution to me.

Back to the workaround: It also solves the problems I encountered with
Unbound & PostgreSQL being unable to bind to the IPv6 interface address
specified.

Thank you all for the swift replies. This is my first Debian bug report
and it's great to be met with this many responses.
Feel free to move it to ifupdown if that's more applicable.

A long-term fix in Bookworm sounds good, but as I know of multiple people
who depend on a configuration similar to the one I described earlier, it
might be wise to fix it in Bullseye as well.

Best regards,
Pieter

#993716#35
Date:
2022-01-08 19:29:42 UTC
From:
To:
Hi Maintainer

The release team no longer [1] considers popcon a criterion for
inclusion in the list of key packages [2].

This email is a courtesy reminder of this bug, and should prevent
instant auto-removal once the rule is changed in britney.

Regards
Graham


[1] http://meetbot.debian.net/debian-release/2021/debian-release.2021-01-27-19.07.html
[2] https://udd.debian.org/cgi-bin/key_packages.yaml.cgi

#993716#40
Date:
2022-01-31 05:45:22 UTC
From:
To:
This is more a bug in the kernel than in bridge-utils.  brctl just
configures the bridge through the kernel ioctls.  The same goes for
systemd-networkd.  Lodge the bug against the kernel!  I have had issues
with these kernel bridges, and switched to running openvswitch, as that is
what OpenStack uses extensively.

Regards,

Matt Grant

#993716#45
Date:
2022-03-09 18:04:12 UTC
From:
To:
Hi,

* Santiago Garcia Mantinan [Mon Sep 06, 2021 at 11:04:58AM +0200]:

[...]
[...]

bridge-utils disappeared from Debian/testing due to this RC bug,
should we downgrade severity and/or reassign the bug to proceed from
here?

regards
-mika-

#993716#52
Date:
2022-04-05 08:27:05 UTC
From:
To:
Hi,

I have lowered the severity of this bug to allow the package to migrate
back to testing.

The reasoning is that, yes, there's an important issue to fix, as the
upgrade is broken. However, having no bridge-utils in testing is more
harmful than this bug.

Cheers,

Thomas Goirand (zigo)