Hello Debian team,
I would like to report a problem which possibly has to do with the IPROUTE2 package. I'm experiencing it on both Debian 10 (this system) and Debian 12 (6.1.0-3). I did my best reviewing at least 7 Stack Exchange (and similar) threads and I'm at my wit's end, wondering why this is possibly still not fixed in 2023 when the debates go back to about 2014.
What I want is plain and simple: to make multiple namespaces able to communicate via a common host bridge with an external network. VETH is consistently documented as the solution to this.
The problem on the path given in the subject is this:
NS veth IP@ = .251 , 0e:61:72:97:aa:ff
(Bridge) veth IP@ = .44 , ce:18:16:4b:0c:c2
Bridge IP@ = .254 , 00:50:56:01:01:02
External IP@ = .other
1) When I initially set up a plain "veth port --> NS veth port", with an IP@ at each end, it's all seamless, ARP and pings.
3) Veth also does not work at the IP level anymore. With every ICMP echo from the NS space it runs ARP first, even though both "ip nei" tables are populated with each other's MAC records. The following goes in a loop:
peterg@debian:~$ sudo tcpdump -ni vinet-brp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vinet-brp, link-type EN10MB (Ethernet), capture size 262144 bytes
11:18:12.966955 IP 70.0.0.251 > 70.0.0.44: ICMP echo request, id 2333, seq 0, length 64
11:18:12.966984 ARP, Request who-has 70.0.0.251 tell 70.0.0.44, length 28
11:18:12.966989 ARP, Reply 70.0.0.251 is-at 0e:61:72:97:aa:ff, length 28
11:18:13.967994 IP 70.0.0.251 > 70.0.0.44: ICMP echo request, id 2333, seq 1, length
4) Once I configure the bridge iface with an IP address in the same /24 subnet that the veth and NS veth (and the externals) use → the NS neighbour table can show a changing MAC address for the bridge veth iface:
11:30:27.860907 ARP, Reply 70.0.0.251 is-at 0e:61:72:97:aa:ff, length 28
11:30:28.848251 IP 70.0.0.251 > 70.0.0.44: ICMP echo request, id 2352, seq 14, length 64
11:30:28.884892 ARP, Request who-has 70.0.0.251 tell 70.0.0.44, length 28
11:30:28.884908 ARP, Reply 70.0.0.251 is-at 0e:61:72:97:aa:ff, length 28
11:30:28.980890 ARP, Request who-has 70.0.0.44 tell 70.0.0.251, length 28
11:30:28.980909 ARP, Reply 70.0.0.44 is-at 00:50:56:01:01:02, length 28 <---
inet_bash >> ip nei
70.0.0.1 dev vinet FAILED
70.0.0.44 dev vinet lladdr ce:18:16:4b:0c:c2 DELAY <---
5) Pinging between the bridge and the NS veth works.
6) The bridge relays ARP into the external network and back (checked on a Cisco switch) and learns the external MAC@s.
===> 7) The external MAC@ does not make it into the NS space via ARP <===
8) I don't aim to deploy IP@s on the bridge and bridge veth ifaces → this is just to check how it works.
9) This post was quite surprising, stating that a bridge without an IP@ can affect routing in the global namespace. A few /sys kernel tweaks → no help.
https://unix.stackexchange.com/questions/655602/why-two-bridged-veth-cannot-ping-each-other/674621#674621
10) I even tried to stop the default MAC learning on the bridge veth iface with "ip link set dev vinet-brp type bridge_slave learning off" ⇒ did not work; neigh flushed and pinging .251 vs .254 just worked.
So I believe that the bridge veth iface is broken in its essential functionality with the default "learning/flooding on" settings.
Thanks for your time looking at this, and for either giving me hope or denying it, in which case I need to create extra ports on my host to get what I want.
BR
Peter
Control: tags -1 upstream
You need to report this upstream, nobody here is going to look at something like this.
No attempt at all? Then it's against your own rules, which I read before submitting this.
Hi Peter,
I think Luca was a bit harsh here, I'd be happy to help debug this. From a
first look it seems unlikely this is related to iproute2, smells more like
a kernel issue to me, but either way we need a reproducer.
So the first step to move this forward is to put together a self-contained set
of instructions to reproduce the problem. Your original report is a bit
sparse on context and details.
If you don't feel up to compiling the reproducer script yourself you could
start by showing us your system state using
$ ip -d addr show # on the host and inside the NS if you could
$ bridge -d link; bridge fdb
snippets from /etc/network/interfaces or similar relevant config would help
too.
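The state requested above could be gathered in one go with something like the following. This is only a minimal sketch: the `run`, `collect_host` and `collect_ns` helpers and the `DRY_RUN` switch are made up for illustration, and the namespace names are whatever `ip netns list` shows on the affected system. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: collect the requested state from the host and from each netns.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and
# run as root to actually execute them. Helper names are illustrative.

run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

collect_host() {
    run ip -d addr show
    run bridge -d link
    run bridge fdb
}

collect_ns() {
    # $1 = namespace name, as listed by `ip netns list`
    run ip -n "$1" -d addr show
    run ip -n "$1" neigh show
}

collect_host
for ns in "$@"; do
    collect_ns "$ns"
done
```

Saved under a hypothetical name such as `collect.sh`, it might be invoked as `DRY_RUN=0 sh collect.sh inet > state.txt 2>&1` and the resulting text file attached to the bug.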
Hi Daniel,
Thank you for this; it gives me hope, and I would be glad to help fix something in this amazing Linux world that you dive into so much deeper than I do.
Would it be possible to join a Webex session set up by me to check this out quickly? It's all a lab environment.
If not, I will proceed per your instructions.
BR
Peter
Hi Peter,
I don't think that would help with reproducing your environment in this case, besides I only offer synchronous debugging sessions for paid consulting engagements.
Please do.
Hi Daniel,
Definitely I can't do any script at the moment, so I hope manual steps will be enough.
1) As reported, the foreign external-world MAC@ does not pass into the network namespace, just to the external border point "vlan199".
2) Now, collecting data for you, I honestly don't see the external MAC address on the "inet-br" object, so my previous statement was incorrect. Possibly I mixed this up with another bridge, "labinet-br" (working in its limited scope), which is IP-defined, while the "inet-br" in question is not.
3) So the question is whether the MACs learnt via vlan199 are supposed to be paired (displayed) with the "inet-br" object and all the way up into the NS.
4) I collected everything into a text file. If that is a problem, I will paste it here.
Thanks, BR
Peter
Hi Daniel,
Hope you are good. What is the outlook after a week here? Thanks.
BR
Peter
Hi Peter,
looking at the ip/bridge dumps I don't see anything obviously broken so I
started by building a reproducer using two netns'en and a bridge on the
host to simulate your setup, leaving out the vlan stuff for now.
I set up two namespaces ns0/ns1 with a veth pair each connected to br0 on
the host. I assign MAC addresses statically to make looking at `bridge fdb`
easier (grep ^aa:). The script looks like this (trimmed, full version
attached):
ip netns add ns0
ip link add veth0 type veth \
    peer name veth0 address aa:00:00:00:00:00 netns ns0
ip netns add ns1
ip link add veth1 type veth \
    peer name veth1 address aa:00:00:00:00:01 netns ns1
ip link add br0 address aa:bb:bb:bb:bb:bb type bridge forward_delay 0
#^ forward_delay=0 so the STP forwarding delay does not hold up ports when interfaces come up
ip link set dev veth0 master br0
ip link set dev veth1 master br0
ip -n ns0 addr add 10.0.0.100/24 dev veth0
ip -n ns1 addr add 10.0.0.101/24 dev veth1
ip link set br0 up
ip link set dev veth0 up
ip -n ns0 link set dev veth0 up
ip link set dev veth1 up
ip -n ns1 link set dev veth1 up
ip -n ns0 link set dev lo up
ip -n ns1 link set dev lo up
ip netns exec ns0 ping -c4 10.0.0.101
Seems to work fine. So we can establish the basic functionality does work
and we need to go deeper.
Peter, can you confirm this script works on your system? If so the next
step is introducing vlans.
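After the ping succeeds, the reproducer can be inspected and then removed again. The following is a rough sketch rather than part of the attached script: the `run`, `only_repro_macs`, `inspect` and `teardown` helpers and the `DRY_RUN` switch are assumptions made for illustration, and only the object names (ns0, ns1, br0) come from the script above.

```shell
#!/bin/sh
# Sketch: inspect, then tear down, the ns0/ns1/br0 reproducer.
# DRY_RUN=1 (default) only prints commands; DRY_RUN=0 executes them (root).

run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# Keep only fdb lines for the statically assigned aa:... MACs (reads stdin).
only_repro_macs() {
    grep '^aa:'
}

inspect() {
    run ip -n ns0 neigh show
    run ip -n ns1 neigh show
    echo "+ bridge fdb show br br0 | grep '^aa:'"
    [ "${DRY_RUN:-1}" = 1 ] || bridge fdb show br br0 | only_repro_macs
}

teardown() {
    # Deleting a namespace also removes the veth pair that has one end in it.
    run ip netns del ns0
    run ip netns del ns1
    run ip link del br0
}

case "${1:-inspect}" in
    inspect)  inspect ;;
    teardown) teardown ;;
esac
```

On a fresh system the three teardown commands fail harmlessly, because neither the namespaces nor the bridge exist yet.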
How did you check that the external MAC isn't reaching into the NS? I assume
using `ip neigh`? I'd have a look at tcpdump; this will tell you whether ARP
is even reaching the NS or not.
Something simple like
$ tcpdump -enli $IFACE 'arp or icmp'
optionally you can filter by MAC (`ether host` in pcap-filter speak):
$ tcpdump -enli $IFACE '(arp or icmp) and ether host aa:00:00:00:00:01'
Oh and one last thing: please double and triple check that a firewall
isn't interfering.
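Because the filter contains parentheses, it has to reach tcpdump as one quoted argument. A small sketch of how the two variants above could be generated consistently; `pcap_filter` and `capture_cmd` are hypothetical helper names, and the interface and MAC are the ones from the reproducer:

```shell
#!/bin/sh
# Sketch: build the capture commands shown above with consistent quoting.

# Print a pcap filter for ARP and ICMP, optionally limited to one MAC.
pcap_filter() {
    if [ -n "${1:-}" ]; then
        printf '(arp or icmp) and ether host %s\n' "$1"
    else
        printf 'arp or icmp\n'
    fi
}

# Print the full tcpdump command line.
capture_cmd() {
    # $1 = interface, $2 = optional MAC address
    printf "tcpdump -enli %s '%s'\n" "$1" "$(pcap_filter "${2:-}")"
}

capture_cmd veth0
capture_cmd veth0 aa:00:00:00:00:01
```

For the firewall check, looking at the output of `nft list ruleset` and `iptables -S` on the host is a reasonable starting point.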
Hi Daniel,
Thank you honestly for your time looking at this and for the cooperation. It was a good decision to supply me directly with the script, the way you want to deal with this. This one was a success, it worked:
peterg@debian:~/Downloads$ ./repro.sh
Cannot remove namespace file "/var/run/netns/ns0": No such file or directory
Cannot remove namespace file "/var/run/netns/ns1": No such file or directory
Cannot find device "br0"
PING 10.0.0.101 (10.0.0.101): 56 data bytes
64 bytes from 10.0.0.101: icmp_seq=0 ttl=64 time=0.057 ms
64 bytes from 10.0.0.101: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 10.0.0.101: icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from 10.0.0.101: icmp_seq=3 ttl=64 time=0.055 ms
--- 10.0.0.101 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.037/0.055/0.072/0.000 ms
In the meantime, I stubbornly kept looking for a solution so that I could make progress, and MACVLAN actually delivered it (private mode is enough). From what I could read it is newer than VETH, and it's just perfect as it avoids the bridge itself. So I have no problem cooperating on this fix, I will be glad to; it can just get lower priority on your side, if you even assigned it any 😊
Thanks, wishing successful new week.
Peter
Hi Peter, I used to love macvlan too but now I do L3 instead ;P I'd be happy to still track this bug down but I need you to investigate the behaviour in your environment. If you've torn down the lab already we can also just call it quits. If you do want to continue some questions are still pending, see quoted below.
Hi Daniel,
of course we can steadily move on, no problem. So now we move to the VLAN level?
BR
Peter
Hi Peter, Yeah, but I'm still waiting for the answers to my questions from two emails ago:
Hi Daniel,
I hope you are well and had a peaceful Christmas, and I wish you an even better 2024. Sorry for overlooking this; I wanted to respond in early December and then got delayed again. I'm answering now because it's still interesting to me!
1) Yes, my sole quick method was the "ip nei" command to confirm the ARP passthrough.
2) No firewall at all, plain Debian installation.
3) You will not believe it, but before Christmas and now it all works and the MAC is passed end to end. Such a pity. The only change I made was on my underlay: I moved the vSwitch uplink to another port because I reconsidered my overall lab setup. That could hardly have improved this, though, as the external MAC made it to the external (VLAN) iface of the bridge before as well.
Anyhow, possibly I now understand that "bridge fdb" only shows learned MACs on the given interface (my VLAN199) and is not supposed to attribute them to all the others all the way up to the NS, as I had guessed.
Finally, with either this or the MACVLAN setup (where I found it), I have a new finding which I don't like, as it creates a hell of a lot of duplicate traffic in the network. The problem is that the VM of the VETH- or MACVLAN-configured IP host duplicates incoming packets on its receiving port, which is connected to a vSphere vSwitch, which in turn simply floods them to the uplinks, where my Wireshark sniffer sees them. This is how I discovered it. I prepared this diagram for you to look at and comment on:
https://docs.google.com/document/d/1mNkZswDSG_OjLnsgXJvIX2tUGSEebcZf720eS29eFCA/edit?usp=sharing
BR, all the best wishes for 2024!
Peter
Hello, did you receive my previous message? Greetings.
Hello Peter,
I have problems understanding your mail. Under 3) you write "it all works" but then there are still some issues about duplicate traffic (which isn't the original problem?).
Can you please clarify if there is still something to do/fix?
Best regards
Uwe
Hi Uwe,
Nice to meet you. Please allow me one more week to come back to you, as I'm off work recovering from an illness and have not looked at this for a long time. Thanks for catching this as well!
BR
Peter
Hello,
I'm finally reacting to this after a long time, having even missed late 2024. I've retested the original problem and can confirm that the L2 path problem has vanished, so from this perspective the case can be closed.
The problem with reflected (I originally called them "duplicated") VM-ingress packets in the VETH/MACVLAN setups is still present. How do we proceed?
BR
Peter
Hi
Your mails are really hard to read. Please stick to proper customs and fix your client to do proper quoting and restrict your line length. See also https://www.debian.org/MailingLists/#codeofconduct
From the minimal description you gave, I would assume you see MACVLAN as it is supposed to work, at least depending on your exact config. Reflecting packets via the network infrastructure is normal behaviour, if configured this way.
Please provide information about the concrete configuration used on the Linux box for the relevant network interfaces.
And you really want to forego sending this to a public list. You made it public.
Bastian
Hello,
Thanks for the comment. I wonder, then, how you can get lost in my short communication, in contrast to the ton of material you must have read to become a pro.
I truly don't see why a minimal VETH/MACVLAN config should reflect the inbound traffic for anybody to see (in an ESXi environment at least). Where is this documented, please?
Thanks for teaching me here.
Peter
So far you have talked about namespaces veth and macvlan devices and vSwitch and nowhere have you actually explained how these are actually connected together and configured.
In the Google doc you linked earlier you showed connections between VMs and vSwitches, but nothing about the virtual devices on each VM.
I think there's been some confusion here. Bastian (and I) thought you were talking about the usual VEPA mode of macvlan, where macvlan devices attached to the same underlying device are not bridged and all packets transmitted on a macvlan device are then forwarded to the underlying devices.
But on re-reading it seems like you are saying that packets received on the external interface of the VM are being forwarded back out of that same interface. I would agree this is unexpected behaviour, but until we see your actual configuration it's impossible to know whether this is a bug or misconfiguration.
Ben.
Hi Ben,
nice to meet you and welcome to the case. All I said is true, and the VETH case was already covered by your predecessor Daniel back in 2023 (the problem vanished on my side in 2024 for unknown reasons).
As for the MACVLAN case, whose configuration seems to have been missing so far, here it is:
peterg@debian:~/Labs$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:01:01:01 brd ff:ff:ff:ff:ff:ff
4: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:01:01:02 brd ff:ff:ff:ff:ff:ff
5: ens256: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:01:01:03 brd ff:ff:ff:ff:ff:ff
6: ixia@ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:01:01:02 brd ff:ff:ff:ff:ff:ff
7: dmz1@ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:01:01:02 brd ff:ff:ff:ff:ff:ff
26: inet@ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:01:01:02 brd ff:ff:ff:ff:ff:ff
peterg@debian:~/Labs$ ip net
proxy (id: 11)
inet (id: 8)
sniffer (id: 10)
s7 (id: 6)
s6 (id: 5)
s5 (id: 4)
s4 (id: 3)
s3 (id: 2)
s2 (id: 1)
s1 (id: 0)
peterg@debian:~/Labs$ more inet_setup
#!/bin/bash
ip link add link ens224 name inet type vlan id 199
ip link set dev inet up
ip netns add inet
ip link add link inet name lab_inet type macvlan mode private
ip link set lab_inet netns inet
ip -n inet link set dev lo up
ip -n inet link set dev lab_inet up
ip -n inet link set address 00:50:56:01:01:21 dev lab_inet
ip -n inet addr add 70.0.0.254/24 dev lab_inet
ip -n inet route add default via 70.0.0.253 dev lab_inet
ip -n inet route add 172.17.0.0/24 via 70.0.0.1
peterg@debian:~/Labs$ ip -n inet link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
27: lab_inet@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:01:01:21 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Again, port-VLAN based config does not exhibit any inbound traffic reflection and stays perfectly silent on its unicast sessions.
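One way to narrow down the reflection might be to watch only frames leaving the VLAN device and to ignore those sourced from the known local MACs; anything that remains was sent out without originating locally. The following is only a sketch under assumptions: `not_from_local` and `reflection_capture_cmd` are hypothetical helpers, the device and MAC addresses are taken from the listing above, and tcpdump's `-Q` direction option depends on platform support.

```shell
#!/bin/sh
# Sketch: print a tcpdump command that shows frames LEAVING a device whose
# source MAC is none of the local ones. Pass at least one MAC.

# Build "not (ether src A or ether src B ...)" from a list of MACs.
not_from_local() {
    expr=""
    for mac in "$@"; do
        if [ -z "$expr" ]; then
            expr="ether src $mac"
        else
            expr="$expr or ether src $mac"
        fi
    done
    printf 'not (%s)\n' "$expr"
}

reflection_capture_cmd() {
    # $1 = device to watch, remaining args = MACs considered local
    dev=$1
    shift
    printf "tcpdump -enli %s -Q out '%s'\n" "$dev" "$(not_from_local "$@")"
}

# VLAN 199 device "inet", parent MAC and the lab_inet macvlan MAC.
reflection_capture_cmd inet 00:50:56:01:01:02 00:50:56:01:01:21
```

If the printed command, run as root on the host, stays silent while the sniffer on the uplink still sees the extra frames, the reflection is probably happening outside the Linux box.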
HTH, BR
Peter