Dear Maintainer,
After upgrading many Debian packages and rebooting, I found that teamd
is stuck using 100% of one CPU core. Restarting does not change this
behavior.
I ran strace on the teamd process and found that its activity is mostly
netlink traffic:
# strace -p {teamd pid} -T -ttt
1586953451.038076 epoll_wait(10, [{EPOLLIN, {u32=8, u64=8}}], 2, -1) = 1 <0.000032>
1586953451.038222 recvmsg(8, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000008}, msg_namelen=12, msg_iov=[{iov_base=[{{len=72, type=team, flags=NLM_F_MULTI, seq=0, pid=0}, "\x02\x01\x00\x00\x08\x00\x01\x00\x0f\x00\x00\x00\x2c\x00\x02\x00\x28\x00\x01\x00\x0c\x00\x01\x00\x65\x6e\x61\x62\x6c\x65\x64\x00"...}, {len=16, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=0, pid=0}], iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 88 <0.000042>
1586953451.038650 recvmsg(8, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000008}, msg_namelen=12, msg_iov=[{iov_base=[{{len=72, type=team, flags=NLM_F_MULTI, seq=0, pid=0}, "\x02\x01\x00\x00\x08\x00\x01\x00\x0f\x00\x00\x00\x2c\x00\x02\x00\x28\x00\x01\x00\x0c\x00\x01\x00\x65\x6e\x61\x62\x6c\x65\x64\x00"...}, {len=16, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=0, pid=0}], iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 88 <0.000061>
1586953451.038994 sendmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=68, type=team, flags=NLM_F_REQUEST|NLM_F_ACK, seq=1589353843, pid=3548384774}, "\x01\x00\x00\x00\x08\x00\x01\x00\x0f\x00\x00\x00\x28\x00\x02\x00\x24\x00\x01\x00\x0c\x00\x01\x00\x65\x6e\x61\x62\x6c\x65\x64\x00"...}, iov_len=68}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 68 <0.000109>
1586953451.039295 recvmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=36, type=NLMSG_ERROR, flags=NLM_F_CAPPED, seq=1589353843, pid=3548384774}, {error=0, msg={len=68, type=team, flags=NLM_F_REQUEST|NLM_F_ACK, seq=1589353843, pid=3548384774}}}, iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 36 <0.000064>
1586953451.039523 recvmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=36, type=NLMSG_ERROR, flags=NLM_F_CAPPED, seq=1589353843, pid=3548384774}, {error=0, msg={len=68, type=team, flags=NLM_F_REQUEST|NLM_F_ACK, seq=1589353843, pid=3548384774}}}, iov_len=16384}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36 <0.000035>
1586953451.039956 select(17, [3 10 11 16], [], [], NULL) = 1 (in [10]) <0.000039>
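To make the strace lines easier to read, here is a small Python sketch that decodes the first 32 bytes of the two team payloads above (strace truncates the rest with "..."). It assumes a little-endian host (e.g. x86_64) and uses the generic netlink command and attribute numbers as I read them from <linux/if_team.h> (TEAM_CMD_OPTIONS_SET = 1, TEAM_CMD_OPTIONS_GET = 2, and so on); please treat those numbers as an assumption and check them against the header.

```python
import struct

# First 32 bytes of each team payload exactly as strace printed them
# (strace truncates the remainder with "...").
NOTIFICATION = bytes.fromhex(  # recvmsg on fd 8 (multicast group 0x8)
    "02010000" "08000100" "0f000000" "2c000200"
    "28000100" "0c000100" "656e61626c656400"
)
REQUEST = bytes.fromhex(  # sendmsg on fd 7 (NLM_F_REQUEST|NLM_F_ACK)
    "01000000" "08000100" "0f000000" "28000200"
    "24000100" "0c000100" "656e61626c656400"
)

# Attribute layout per nesting level: type -> (name, kind).
# ASSUMPTION: numeric values are my reading of <linux/if_team.h>.
SCHEMA = {
    "top": {1: ("TEAM_ATTR_TEAM_IFINDEX", "u32"),
            2: ("TEAM_ATTR_LIST_OPTION", "list")},
    "list": {1: ("TEAM_ATTR_ITEM_OPTION", "option")},
    "option": {1: ("TEAM_ATTR_OPTION_NAME", "str")},
}
COMMANDS = {1: "TEAM_CMD_OPTIONS_SET", 2: "TEAM_CMD_OPTIONS_GET"}


def walk(buf, level, prefix=""):
    """Yield (path, value) for each netlink attribute in buf.

    Stops cleanly when the buffer ends mid-attribute, because strace
    only printed the first 32 bytes of each message.
    """
    off = 0
    while off + 4 <= len(buf):
        # struct nlattr: u16 nla_len, u16 nla_type (little-endian host assumed)
        length, atype = struct.unpack_from("<HH", buf, off)
        if length < 4:
            break  # malformed attribute; avoid looping forever
        atype &= 0x3FFF  # drop NLA_F_NESTED / NLA_F_NET_BYTEORDER bits
        name, kind = SCHEMA[level].get(atype, (f"type{atype}", "raw"))
        body = buf[off + 4 : off + length]  # shorter than declared if truncated
        path = f"{prefix}{name}"
        if kind == "u32" and len(body) >= 4:
            yield path, struct.unpack_from("<I", body)[0]
        elif kind == "str":
            yield path, body.split(b"\0", 1)[0].decode()
        elif kind in SCHEMA:
            yield path, f"<nested, len={length}>"
            yield from walk(body, kind, path + "/")
        else:
            yield path, body.hex()
        off += (length + 3) & ~3  # attributes are padded to 4 bytes


def describe(payload):
    """Return human-readable lines for a team generic netlink payload."""
    # struct genlmsghdr: u8 cmd, u8 version, u16 reserved
    cmd, version = payload[0], payload[1]
    lines = [f"genl cmd={cmd} ({COMMANDS.get(cmd, '?')}), version={version}"]
    lines += [f"  {path} = {value}" for path, value in walk(payload[4:], "top")]
    return lines


if __name__ == "__main__":
    for label, payload in (("kernel -> teamd", NOTIFICATION),
                           ("teamd -> kernel", REQUEST)):
        print(label)
        print("\n".join(describe(payload)))
```

If this decoding is right, each iteration in the trace is: the kernel multicasts an OPTIONS_GET notification for the "enabled" option on ifindex 15, teamd answers with an OPTIONS_SET for "enabled" on the same ifindex, and the kernel ACKs it with error=0. That looks like it could be a notify/set feedback loop, but I have not confirmed this in the teamd source.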
I then used teamnl to monitor the team device, and it repeatedly reports
the "enabled" option as changed on both ports:
# teamnl lan monitor
options:
*enabled (port:enp4s0f0) true changed
options:
*enabled (port:enp4s0f0) true changed
options:
*enabled (port:enp4s0f0) true changed
options:
*enabled (port:enp4s0f1) true changed
options:
*enabled (port:enp4s0f1) true changed
options:
*enabled (port:enp4s0f1) true changed
options:
*enabled (port:enp4s0f0) true changed
options:
*enabled (port:enp4s0f0) true changed
options:
*enabled (port:enp4s0f0) true changed
options:
*enabled (port:enp4s0f1) true changed
options:
*enabled (port:enp4s0f1) true changed
options:
*enabled (port:enp4s0f1) true changed
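To put a number on how fast these events arrive, a small Python sketch like the one below could count the "changed" lines per port in saved monitor output. The script name count_changes.py, the events.txt file and the 10-second window are hypothetical, and the regex only matches the line format seen above; other option types may be printed differently.

```python
import re
import sys
from collections import Counter

# Matches lines such as "*enabled (port:enp4s0f0) true changed".
# Based only on the teamnl output in this report (an assumption).
LINE = re.compile(r"^\*?(\S+) \(port:(\S+)\) (\S+) changed$")


def count_changes(lines):
    """Count 'changed' events per (option, port, value)."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            counts[m.groups()] += 1
    return counts


if __name__ == "__main__":
    # Each argument is a file holding captured "teamnl <dev> monitor" output.
    for path in sys.argv[1:]:
        with open(path) as f:
            counts = count_changes(f)
        for (option, port, value), n in sorted(counts.items()):
            print(f"{path}: {option} {port} {value} x{n}")
```

For example (stdbuf -oL is there in case teamnl block-buffers its output when redirected, which would lose events when timeout stops it):

# timeout 10 stdbuf -oL teamnl lan monitor > events.txt
# python3 count_changes.py events.txt

Dividing each count by 10 gives an approximate per-second rate.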
Here is my team config:
{
    "device": "lan",
    "hwaddr": "DE:AD:BE:EF:00:01",
    "runner": {
        "name": "loadbalance",
        "tx_hash": ["eth", "ipv4", "ipv6"]
    },
    "link_watch": {
        "name": "ethtool"
    },
    "ports": {"enp4s0f0": {}, "enp4s0f1": {}}
}
Any advice on debugging or next steps would be appreciated.