#1009878 /usr/sbin/libvirtd: Live migration of guest VM fails with kernel oops error

Package:
libvirt-daemon
Source:
libvirt
Description:
Virtualization daemon
Submitter:
Ross Moutell
Date:
2022-04-19 19:18:21 UTC
Severity:
important
#1009878#5
Date:
2022-04-19 19:13:12 UTC
From:
To:
Dear Maintainer,

   * What led up to the situation?

     Attempting a live migration of a VM between hosts, either from virt-manager on a separate workstation or from the host itself via a terminal.

     Example - virsh migrate --live web02 qemu+ssh://hypervisor01:64228/system

   * What exactly did you do (or not do) that was effective (or ineffective)?

     Investigated the following logs.

     /var/log/syslog

     Apr 18 09:25:45 hypervisor04 libvirtd[542]: Cannot start job (query, none, none) for domain email01; current job is (none, none, migration in) owned by (0 <null>, 0 <null>, 0 remoteDispatchDomainMigratePrepare3Params (flags=0x19)) for (0s, 0s, 305s)

     /var/log/kern.log

     Apr 17 22:45:45 hypervisor05 kernel: [  804.114785] Internal error: Oops: 96000004 [#1] SMP
     Apr 17 22:57:49 hypervisor05 kernel: [  206.952482] Internal error: Oops: 96000004 [#1] SMP
     Apr 18 01:06:11 hypervisor05 kernel: [  463.133575] Internal error: Oops: 96000004 [#1] SMP
     Apr 18 11:12:39 hypervisor05 kernel: [36851.073954] Internal error: Oops: 96000004 [#2] SMP
     Apr 18 11:29:19 hypervisor05 kernel: [37850.896463] Internal error: Oops: 96000004 [#3] SMP
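
     The entries above can be tallied with a minimal Python sketch like the one below. It assumes the syslog-style line format shown here; the names OOPS_RE and parse_oops_lines are illustrative only. The [#N] value is the kernel's oops counter since boot, so repeated [#1] entries with small uptime values suggest the host was rebooted between those failures.

```python
import re

# Matches lines such as:
# "Apr 17 22:45:45 hypervisor05 kernel: [  804.114785] Internal error: Oops: 96000004 [#1] SMP"
OOPS_RE = re.compile(
    r"^\s*(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+Internal error: Oops: "
    r"(?P<code>[0-9a-f]+)\s+\[#(?P<seq>\d+)\]"
)


def parse_oops_lines(lines):
    """Return one dict per kernel oops line; non-matching lines are skipped."""
    events = []
    for line in lines:
        m = OOPS_RE.match(line)
        if m:
            events.append(
                {
                    "stamp": m.group("stamp"),
                    "host": m.group("host"),
                    "uptime": float(m.group("uptime")),  # seconds since boot
                    "code": m.group("code"),             # exception syndrome value
                    "seq": int(m.group("seq")),          # oops count since boot
                }
            )
    return events


# The five kern.log lines quoted in this report.
SAMPLE = [
    "Apr 17 22:45:45 hypervisor05 kernel: [  804.114785] Internal error: Oops: 96000004 [#1] SMP",
    "Apr 17 22:57:49 hypervisor05 kernel: [  206.952482] Internal error: Oops: 96000004 [#1] SMP",
    "Apr 18 01:06:11 hypervisor05 kernel: [  463.133575] Internal error: Oops: 96000004 [#1] SMP",
    "Apr 18 11:12:39 hypervisor05 kernel: [36851.073954] Internal error: Oops: 96000004 [#2] SMP",
    "Apr 18 11:29:19 hypervisor05 kernel: [37850.896463] Internal error: Oops: 96000004 [#3] SMP",
]

if __name__ == "__main__":
    for event in parse_oops_lines(SAMPLE):
        print(event["stamp"], event["host"], event["code"], f"#{event['seq']}")
```

     To run it against a real log, pass the lines of /var/log/kern.log to parse_oops_lines instead of SAMPLE.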

     Errors in dmesg

     [  324.673078] audit: type=1400 audit(1650240228.486:23): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-72799f47-939c-4415-92c3-73ec371425fd" pid=979 comm="apparmor_parser"
     [  324.768028] audit: type=1400 audit(1650240228.582:24): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=39  capname="bpf"
     [  324.774241] audit: type=1400 audit(1650240228.586:25): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=38  capname="perfmon"
     [  326.770324] audit: type=1400 audit(1650240230.582:26): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=39  capname="bpf"

     I added

      capability bpf,
      capability perfmon,

     to /etc/apparmor.d/usr.sbin.libvirtd, which resolved the DENIED errors but did not resolve the live migration failures.
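
     For reference, the two capability rules can be derived from the audit lines with a minimal Python sketch like the one below. It assumes the audit message layout shown in the dmesg excerpt; the helper names denied_capabilities and profile_rules are hypothetical and not part of AppArmor or libvirt.

```python
import re

# Matches AppArmor capability denials such as:
# apparmor="DENIED" operation="capable" profile="libvirtd" ... capname="bpf"
DENIED_RE = re.compile(
    r'apparmor="DENIED"\s+operation="capable"\s+profile="(?P<profile>[^"]+)"'
    r'.*?capname="(?P<capname>[^"]+)"'
)


def denied_capabilities(lines):
    """Map each AppArmor profile name to the set of capabilities it was denied."""
    found = {}
    for line in lines:
        m = DENIED_RE.search(line)
        if m:
            found.setdefault(m.group("profile"), set()).add(m.group("capname"))
    return found


def profile_rules(capnames):
    """Format capability names as AppArmor profile rules, sorted by name."""
    return [f"capability {name}," for name in sorted(capnames)]


# The audit lines quoted in this report (the STATUS line is ignored).
SAMPLE = [
    '[  324.673078] audit: type=1400 audit(1650240228.486:23): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-72799f47-939c-4415-92c3-73ec371425fd" pid=979 comm="apparmor_parser"',
    '[  324.768028] audit: type=1400 audit(1650240228.582:24): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=39  capname="bpf"',
    '[  324.774241] audit: type=1400 audit(1650240228.586:25): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=38  capname="perfmon"',
    '[  326.770324] audit: type=1400 audit(1650240230.582:26): apparmor="DENIED" operation="capable" profile="libvirtd" pid=542 comm="rpc-worker" capability=39  capname="bpf"',
]

if __name__ == "__main__":
    for profile, caps in denied_capabilities(SAMPLE).items():
        print(profile)
        for rule in profile_rules(caps):
            print("  " + rule)
```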

   * What was the outcome of this action?

     The following errors were produced.

     kernel:[37850.896463] Internal error: Oops: 96000004 [#3] SMP

     Message from syslogd@hypervisor05 at Apr 18 11:29:19 ...

     kernel:[37851.195226] Code: 910003fd f9000bf3 2a0003f3 97ff7164 (b95ed801)

     The VM that was submitted for migration ends up hung in a paused state. The only way to recover it is to force power off the VM and then run 'sudo systemctl restart libvirtd.service'. The VM can then be powered on again normally.

   * What outcome did you expect instead?

     Live migration to complete successfully, which has been the case on earlier kernel versions. However, at this time I do not know which kernel versions worked, other than the one the system shipped with, which was as follows.

     linux-image-5.10.0-8-arm64  5.10.46-5  arm64  Linux 5.10 for 64-bit ARMv8 machines (signed)