Dear Maintainer,
* What led up to the situation?
Attempting live migration of a VM between hosts, either from virt-manager on a separate workstation or from the host itself via a terminal.
Example - virsh migrate --live web02 qemu+ssh://hypervisor01:64228/system
* What exactly did you do (or not do) that was effective (or ineffective)?
Investigated the following logs.
/var/log/syslog
Apr 18 09:25:45 hypervisor04 libvirtd[542]: Cannot start job (query, none, none) for domain email01; current job is (none, none, migration in) owned by (0 <null>, 0 <null>, 0 remoteDispatchDomainMigratePrepare3Params (flags=0x19)) for (0s, 0s, 305s)
/var/log/kern.log
Apr 17 22:45:45 hypervisor05 kernel: [ 804.114785] Internal error: Oops: 96000004 [#1] SMP
Apr 17 22:57:49 hypervisor05 kernel: [ 206.952482] Internal error: Oops: 96000004 [#1] SMP
Apr 18 01:06:11 hypervisor05 kernel: [ 463.133575] Internal error: Oops: 96000004 [#1] SMP
Apr 18 11:12:39 hypervisor05 kernel: [36851.073954] Internal error: Oops: 96000004 [#2] SMP
Apr 18 11:29:19 hypervisor05 kernel: [37850.896463] Internal error: Oops: 96000004 [#3] SMP
Errors in dmesg
[ 324.673078] audit: type=1400 audit(1650240228.486:23): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-72799f47-939c-4415-92c3-73ec371425fd" pid=979 comm="apparmor_parser"
[ 324.768028] audit: type=1400 audit(1650240228.582:24): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=39 capname="bpf"
[ 324.774241] audit: type=1400 audit(1650240228.586:25): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=38 capname="perfmon"
[ 326.770324] audit: type=1400 audit(1650240230.582:26): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=39 capname="bpf"
I added
capability bpf,
capability perfmon,
to /etc/apparmor.d/usr.sbin.libvirtd which resolved the DENIED errors but did not resolve the live migration failures.
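To show how the two capability rules above follow from the audit records, here is a minimal Python sketch that extracts the denied capability names from dmesg lines and formats them as AppArmor rules. The function names denied_capabilities and profile_rules are hypothetical helpers introduced for illustration, and the regular expression assumes the audit line layout quoted above.

```python
import re

# Matches AppArmor capability denials in the layout quoted in this report.
# Assumption: "profile" appears before "capname" on the same line.
DENIED_RE = re.compile(
    r'apparmor="DENIED" operation="capable" profile="(?P<profile>[^"]+)"'
    r'.*?capname="(?P<capname>[^"]+)"'
)


def denied_capabilities(lines):
    """Return {profile name: sorted list of denied capability names}."""
    found = {}
    for line in lines:
        match = DENIED_RE.search(line)
        if match:
            found.setdefault(match.group("profile"), set()).add(match.group("capname"))
    return {profile: sorted(caps) for profile, caps in found.items()}


def profile_rules(capnames):
    """Format capability names as AppArmor profile rules."""
    return [f"capability {name}," for name in capnames]


if __name__ == "__main__":
    sample = [
        '[ 324.768028] audit: type=1400 audit(1650240228.582:24): apparmor="DENIED" '
        'operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" '
        'capability=39 capname="bpf"',
        '[ 324.774241] audit: type=1400 audit(1650240228.586:25): apparmor="DENIED" '
        'operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" '
        'capability=38 capname="perfmon"',
    ]
    for profile, caps in denied_capabilities(sample).items():
        print(profile)
        for rule in profile_rules(caps):
            print("  " + rule)
```

The sketch only reports what was denied. Whether granting those capabilities is appropriate is a separate decision, and as noted above it did not affect the migration failure.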
* What was the outcome of this action?
The following errors were produced.
kernel:[37850.896463] Internal error: Oops: 96000004 [#3] SMP
Message from syslogd@hypervisor05 at Apr 18 11:29:19 ...
kernel:[37851.195226] Code: 910003fd f9000bf3 2a0003f3 97ff7164 (b95ed801)
The VM that was submitted for migration ends up hung in a paused state. The only way to recover it is to force a power-off of the VM and then run 'sudo systemctl restart libvirtd.service'. The VM can then be powered on again normally.
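The manual recovery described above could be scripted roughly as follows. This is a sketch under assumptions: it uses 'virsh destroy' as the forced power-off and 'virsh start' to boot the VM again, which may not be the exact method used here (the power-off can also be done from virt-manager), and the names recovery_commands and recover are hypothetical. The domain name web02 is taken from the example migrate command.

```python
import subprocess


def recovery_commands(domain):
    """Return the commands mirroring the manual recovery, in order."""
    return [
        # Assumption: 'virsh destroy' is used as the forced power-off.
        ["virsh", "destroy", domain],
        ["sudo", "systemctl", "restart", "libvirtd.service"],
        ["virsh", "start", domain],
    ]


def recover(domain, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in recovery_commands(domain):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands, nothing is executed.
    recover("web02")
```

By default nothing is executed. Passing dry_run=False runs the commands on the host where the VM is hung, and depending on the setup virsh may also need sudo or an explicit connection URI.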
* What outcome did you expect instead?
Live migration to complete successfully, which has been the case on earlier kernel versions. However, at this time I do not know which kernel versions worked, other than the one the system shipped with, which was as follows.
linux-image-5.10.0-8-arm64 5.10.46-5 arm64 Linux 5.10 for 64-bit ARMv8 machines (signed)