#778291 setting unsupported timeout for i6300esb watchdog causes hw reset

Package:
qemu-system-x86
Source:
qemu
Description:
QEMU full system emulation binaries (x86)
Submitter:
Peter Colberg
Date:
2020-12-09 07:12:03 UTC
Severity:
normal
Tags:
#778291#5
Date:
2015-02-12 02:19:20 UTC
From:
To:
Dear Maintainer,

After upgrading to jessie from wheezy, I noticed that some of the
system logs experience corruption when rebooting the system.

After rebooting the system using the command ‘reboot’ or ‘systemctl
reboot’, some log files such as /var/log/syslog contain a hole with
nil bytes:

  Feb 11 14:48:22 himeji systemd[20266]: Reached target Sockets.
  Feb 11 14:48:22 himeji systemd[20266]: Starting Basic System.
  Feb 11 14:48:22 himeji systemd[20266]: Reached target Basic System.
  Feb 11 14:48:22 himeji systemd[20266]: Starting Default.
  Feb 11 14:48:22 himeji systemd[20266]: Reached target Default.
  Feb 11 14:48:22 himeji systemd[20266]: Startup finished in 3ms.
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Feb 11 14:54:55 himeji rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="539" x-info="http://www.rsyslog.com"] start
  Feb 11 14:54:55 himeji systemd-fsck[137]: /dev/vda3: clean, 81576/2621440 files, 889144/10485760 blocks
  Feb 11 14:54:55 himeji systemd-fsck[233]: /dev/vda4: clean, 2187/3670016 files, 1858872/14679547 blocks
  Feb 11 14:54:55 himeji networking[245]: Configuring network interfaces...Waiting for DAD... Done
  Feb 11 14:54:55 himeji networking[245]: done.

This happened with log files managed by rsyslog as well as one log
file not managed by rsyslog. I assume this issue is not related to
logging, but rather manifests itself in any files that are modified
right before the reboot, e.g. due to daemons logging a shutdown
notice.

I could reproduce this issue with three machines that are virtualized
with qemu-kvm. On the other hand, I did not see this issue with two
physical machines. All of these are running the above version of
systemd.

Further, I found a work-around to avoid data corruption as seen in the
log files. If the system is rebooted through an external ACPI event
(sent to the qemu-kvm process), /var/log/syslog is not corrupted:

  Feb 11 18:17:03 himeji systemd[4412]: Reached target Default.
  Feb 11 18:17:03 himeji systemd[4412]: Startup finished in 4ms.
  Feb 11 18:17:04 himeji systemd[4412]: Stopping Default.
  Feb 11 18:17:04 himeji systemd[4412]: Stopped target Default.
  Feb 11 18:17:04 himeji systemd[4412]: Stopping Basic System.
  Feb 11 18:17:04 himeji systemd[4412]: Stopped target Basic System.
  Feb 11 18:17:04 himeji systemd[4412]: Stopping Paths.
  Feb 11 18:17:04 himeji systemd[4412]: Stopped target Paths.
  Feb 11 18:17:04 himeji systemd[4412]: Stopping Timers.
  Feb 11 18:17:04 himeji systemd[4412]: Stopped target Timers.
  Feb 11 18:17:04 himeji systemd[4412]: Stopping Sockets.
  Feb 11 18:17:04 himeji systemd[4412]: Stopped target Sockets.
  Feb 11 18:17:04 himeji systemd[4412]: Starting Shutdown.
  Feb 11 18:17:04 himeji systemd[4412]: Reached target Shutdown.
  Feb 11 18:17:04 himeji systemd[4412]: Starting Exit the Session...
  Feb 11 18:17:04 himeji systemd[4412]: Received SIGRTMIN+24 from PID 4419 (kill).
  Feb 11 18:25:40 himeji ntpd[461]: ntpd exiting on signal 15
  Feb 11 18:25:40 himeji spamd[972]: spamd: server killed by SIGTERM, shutting down
  Feb 11 18:25:40 himeji fetchmail[618]: terminated with signal 15
  Feb 11 18:25:40 himeji rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="539" x-info="http://www.rsyslog.com"] exiting on signal 15.
  Feb 11 18:28:11 himeji rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="527" x-info="http://www.rsyslog.com"] start
  Feb 11 18:28:11 himeji systemd-fsck[135]: /dev/vda3: clean, 80518/2621440 files, 878036/10485760 blocks
  Feb 11 18:28:11 himeji systemd-fsck[230]: /dev/vda4: clean, 2187/3670016 files, 1858872/14679547 blocks
  Feb 11 18:28:11 himeji networking[242]: Configuring network interfaces...Waiting for DAD... Done
  Feb 11 18:28:11 himeji networking[242]: done.

In summary, an externally-trigged reboot works fine, while an
internally-trigged reboot causes data corruption when systemd
is running in a virtual machine.

Please find attached to this bug report the debugging output of
systemd obtained via the serial console for two cases, a reboot
triggered internally (through “reboot”) and a reboot triggered
externally (through ACPI).

I noticed that for the externally-triggered reboot, there are a dozen
systemd-shutdown messages towards the end, of which one reads

  [   53.200957] systemd-shutdown[1]: Unmounting file systems.

These messages are completely missing for the internally-triggered
reboot, where it appears that systemd-shutdown is not invoked at all.

Please let me know how I can debug this issue further.

Regards,
Peter

#778291#10
Date:
2015-02-12 03:28:38 UTC
From:
To:
control: tags -1 moreinfo unreproducible

Am 12.02.2015 um 03:19 schrieb Peter Colberg:
dist-upgrade from wheezy (sysvinit) to jessie (systemd-sysv) on reboot
or everytime you reboot your jessie system?



What makes you think this is a disk corruption? The log message you
provided shows that the disks are clean?

Which other file? What file system is this?

 I assume this issue is not related to

Can you share such a (gzipped) qemu-kvm image or provide instructions,
how we can reproduce the problem?

Since systemd-logind reacts on external ACPI power button presses with
the same command (systemctl poweroff), I'm suprised that this is
supposed to make a difference.


Can you follow the steps at [1] and provide a verbose debug log from
boot and poweroff by logging to the serial console.


[1] http://freedesktop.org/wiki/Software/systemd/Debugging/

#778291#17
Date:
2015-02-12 04:44:59 UTC
From:
To:
control: severity -1 important

Am 12.02.2015 um 04:28 schrieb Michael Biebl:

Fwiw, I pulled the debian_wheezy_amd64_desktop.qcow2 qemu-kvm image from
[1], started it with "qemu-system-x86_64 -enable-kvm -hda
debian_wheezy_amd64_standard.qcow2".
Then did a dist-upgrade to jessie, which replaced sysvinit with
systemd-sysv as default.
Did a couple of reboots and couldn't find any corruption in any of the
rsyslog managed log files.



[1] https://people.debian.org/~aurel32/qemu/amd64/

#778291#24
Date:
2015-02-12 05:22:02 UTC
From:
To:
It happens everytime I reboot from within the VM.

Yes, this is not disk corruption. The fsck at bootup is clean.

This is more acurately described as data loss. I see it in the form of
log data produced shortly before the reboot (that is after reboot is
issued and before the machine is reset) not being written to disk.
Instead, the log files contain variably-sized holes of nil bytes.

So far I noticed corruption in /var/log/syslog and /var/log/daemon.log
which are written by rsyslog, and in /var/log/prosody/prosody.log
which is written by prosody.

Each VM has attached a single virtio disk with a GPT partition label
that contains a swap partition, an ext4 filesystem for /, and further
ext4 filesystems for /home and /srv.

Sorry, I cannot share the machine images since they contain private
data. However, I have now disabled any non-essential services on one
machine. Please see the file 'ps_aux' that shows all processes running
when logged in via ssh. I can still reproduce the corruption with this
mimimal configuration.
qemu-kvm emulates a watchdog device. I am using libvirt, please see
the file 'himeji.xml' for the machine setup.

When I remove the watchdog device from the VM, the corruption
disappears and /var/log/syslog contains the expected last and
first messages before and after the reboot:

Feb 11 23:41:19 himeji rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="532" x-info="http://www.rsyslog.com"] exiting on signal 15.
Feb 11 23:41:57 himeji rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="518" x-info="http://www.rsyslog.com"] start

I captured a reboot of the VM with and without a watchdog device. Please note
that reboots in a VM with watchdog device were working fine with sysvinit.

Does systemd always use /dev/watchdog? /etc/systemd/system.conf was left
unmodified, in particular it does not define RuntimeWatchdogSec.

Regards,
Peter

#778291#29
Date:
2015-02-12 05:57:33 UTC
From:
To:
Good idea! I did the same and then started the VM with

qemu-system-x86_64 -curses -enable-kvm -device i6300esb -watchdog-action reset -hda debian_wheezy_amd64_standard.qcow2

A reboot triggers corruption in /var/log/syslog with a hole of nils.

#778291#34
Date:
2015-02-12 06:14:05 UTC
From:
To:
control: tags -1 = confirmed
control: retitle -1 data corruption on reboot when using watchdog/qemu

Am 12.02.2015 um 06:57 schrieb Peter Colberg:

Thanks, I was able to reproduce the issue this way (marking the bug
accordingly).
It's likely that the watchog is resetting systemd/rsyslog before it has
a chance to stop properly.

Let's see, where we can go from here.

#778291#43
Date:
2015-02-12 06:29:58 UTC
From:
To:
In a vimdiff comparing the two console logs I sent earlier, the last
messages before the reboot of the VM with watchdog are

  Failed to send unit remove signal for final.target: Transport endpoint is not connected
  Hardware watchdog 'i6300ESB timer', version 0

while the VM without watchdog does unmounts after final.target

  Failed to send unit remove signal for final.target: Transport endpoint is not connected
  Sending SIGTERM to remaining processes...
  [   62.134791] systemd-journald[146]: Received SIGTERM from PID 1 (systemd-shutdow).
  Sending SIGKILL to remaining processes...
  Unmounting file systems.
  Unmounting /dev/mqueue.
  Unmounting /sys/kernel/debug.
  Unmounting /dev/hugepages.
  [   62.314461] EXT4-fs (vda3): re-mounted. Opts: (null)
  [   62.316377] EXT4-fs (vda3): re-mounted. Opts: (null)
  [   62.316999] EXT4-fs (vda3): re-mounted. Opts: (null)
  All filesystems unmounted.
  Deactivating swaps.
  All swaps deactivated.
  Detaching loop devices.
  All loop devices detached.
  Detaching DM devices.
  All DM devices detached.
  /lib/systemd/system-shutdown succeeded.
  [   62.382524] Unregister pv shared memory for cpu 2
  Rebooting.
  [   62.382524] Unregister pv shared memory for cpu 0
  [   62.382524] Unregister pv shared memory for cpu 1
  [   62.385199] Unregister pv shared memory for cpu 3
  [   62.386474] reboot: Restarting system
  [   62.387022] reboot: machine restart


It is curious that the bug only manifests itself with the qemu watchdog.
Two physical machines using the hardware iTCO_wdt watchdog reboot fine.

#778291#48
Date:
2015-02-12 06:34:48 UTC
From:
To:
Am 12.02.2015 um 07:29 schrieb Peter Colberg:

In my tests, I did indeed have an unclean shutdown, and got an fsck on
next boot. Which looks to me, as if the watchdog i.e. qemu resets the VM
before it can shutdown cleanly.

#778291#53
Date:
2015-02-12 07:18:22 UTC
From:
To:
Am 12.02.2015 um 07:34 schrieb Michael Biebl:
case, the system is properly shutdown.
Only on "systemctl reboot" I'm to reproduce the problem, that the VM is
reset after having reached the shutdown target.

With poweroff it looks like this (dump from the serial console log)


...
[^[[32m  OK  ^[[0m] Stopped target Local File Systems.
         Unmounting /run/user/0...
[^[[32m  OK  ^[[0m] Stopped Load/Save Random Seed.
[^[[32m  OK  ^[[0m] Unmounted /run/user/0.
[^[[32m  OK  ^[[0m] Reached target Unmount All Filesystems.
[^[[32m  OK  ^[[0m] Stopped target Local File Systems (Pre).
         Stopping Create Static Device Nodes in /dev...
[^[[32m  OK  ^[[0m] Stopped Create Static Device Nodes in /dev.
         Stopping Remount Root and Kernel File Systems...
[^[[32m  OK  ^[[0m] Stopped Remount Root and Kernel File Systems.
[^[[32m  OK  ^[[0m] Reached target Shutdown.
[    7.381639] systemd-shutdown[1]: Sending SIGTERM to remaining
processes...
[    7.383562] systemd-journald[127]: Received SIGTERM from PID 1
(systemd-shutdow).
[    7.434620] systemd-shutdown[1]: Sending SIGKILL to remaining
processes...
[    7.437469] systemd-shutdown[1]: Unmounting file systems.
[    7.438414] systemd-shutdown[1]: Unmounting /sys/kernel/debug.
[    7.439219] systemd-shutdown[1]: Unmounting /dev/mqueue.
[    7.439959] systemd-shutdown[1]: Unmounting /dev/hugepages.
[    7.474276] EXT4-fs (vda1): re-mounted. Opts: (null)
[    7.475479] EXT4-fs (vda1): re-mounted. Opts: (null)
[    7.475881] EXT4-fs (vda1): re-mounted. Opts: (null)
[    7.476303] systemd-shutdown[1]: All filesystems unmounted.
[    7.476754] systemd-shutdown[1]: Deactivating swaps.
[    7.477228] systemd-shutdown[1]: All swaps deactivated.
[    7.477667] systemd-shutdown[1]: Detaching loop devices.
[    7.478984] systemd-shutdown[1]: All loop devices detached.
[    7.479815] systemd-shutdown[1]: Detaching DM devices.
[    7.480675] systemd-shutdown[1]: All DM devices detached.
[    7.482287] systemd-shutdown[1]: Powering off.
[    7.503279] ACPI: Preparing to enter system sleep state S5
[    7.503821] reboot: Power down


with reboot:

[^[[32m  OK  ^[[0m] Reached target Unmount All Filesystems.
[^[[32m  OK  ^[[0m] Stopped target Local File Systems (Pre).
         Stopping Create Static Device Nodes in /dev...
[^[[32m  OK  ^[[0m] Stopped Create Static Device Nodes in /dev.
         Stopping Remount Root and Kernel File Systems...
[^[[32m  OK  ^[[0m] Stopped Remount Root and Kernel File Systems.
[^[[32m  OK  ^[[0m] Reached target Shutdown.

nothing beyond this line


I then changed the watchdog action to dump. This time I get the
following log on reboot:

[^[[32m  OK  ^[[0m] Stopped Create Static Device Nodes in /dev.
         Stopping Remount Root and Kernel File Systems...
[^[[32m  OK  ^[[0m] Stopped Remount Root and Kernel File Systems.
[^[[32m  OK  ^[[0m] Reached target Shutdown.
[    9.402243] i6300esb: Unexpected close, not stopping watchdog!
[    9.448797] systemd-shutdown[1]: Sending SIGTERM to remaining
processes...
[    9.514019] systemd-journald[127]: Received SIGTERM from PID 1
(systemd-shutdow).
[    9.541822] systemd-shutdown[1]: Sending SIGKILL to remaining
processes...
[    9.546845] systemd-shutdown[1]: Hardware watchdog 'i6300ESB timer',
version 0
[    9.606194] systemd-shutdown[1]: Unmounting file systems.
[    9.608565] systemd-shutdown[1]: Unmounting /sys/kernel/debug.
[    9.610247] systemd-shutdown[1]: Unmounting /dev/mqueue.
[    9.611636] systemd-shutdown[1]: Unmounting /dev/hugepages.
[    9.664149] EXT4-fs (vda1): re-mounted. Opts: (null)
[    9.666461] EXT4-fs (vda1): re-mounted. Opts: (null)
[    9.666976] EXT4-fs (vda1): re-mounted. Opts: (null)
[    9.667571] systemd-shutdown[1]: All filesystems unmounted.
[    9.668304] systemd-shutdown[1]: Deactivating swaps.
[    9.668988] systemd-shutdown[1]: All swaps deactivated.
[    9.669632] systemd-shutdown[1]: Detaching loop devices.
[    9.671348] systemd-shutdown[1]: All loop devices detached.
[    9.672155] systemd-shutdown[1]: Detaching DM devices.
[    9.672949] systemd-shutdown[1]: All DM devices detached.
[    9.674747] systemd-shutdown[1]: Rebooting.


Note this line:
[    9.402243] i6300esb: Unexpected close, not stopping watchdog!

#778291#58
Date:
2015-02-12 07:31:33 UTC
From:
To:
Am 12.02.2015 um 07:34 schrieb Michael Biebl:
case, the system is properly shutdown.
Only on "systemctl reboot" I'm to reproduce the problem, that the VM is
reset after having reached the shutdown target.

With poweroff it looks like this (dump from the serial console log)


...
[^[[32m  OK  ^[[0m] Stopped target Local File Systems.
         Unmounting /run/user/0...
[^[[32m  OK  ^[[0m] Stopped Load/Save Random Seed.
[^[[32m  OK  ^[[0m] Unmounted /run/user/0.
[^[[32m  OK  ^[[0m] Reached target Unmount All Filesystems.
[^[[32m  OK  ^[[0m] Stopped target Local File Systems (Pre).
         Stopping Create Static Device Nodes in /dev...
[^[[32m  OK  ^[[0m] Stopped Create Static Device Nodes in /dev.
         Stopping Remount Root and Kernel File Systems...
[^[[32m  OK  ^[[0m] Stopped Remount Root and Kernel File Systems.
[^[[32m  OK  ^[[0m] Reached target Shutdown.
[    7.381639] systemd-shutdown[1]: Sending SIGTERM to remaining
processes...
[    7.383562] systemd-journald[127]: Received SIGTERM from PID 1
(systemd-shutdow).
[    7.434620] systemd-shutdown[1]: Sending SIGKILL to remaining
processes...
[    7.437469] systemd-shutdown[1]: Unmounting file systems.
[    7.438414] systemd-shutdown[1]: Unmounting /sys/kernel/debug.
[    7.439219] systemd-shutdown[1]: Unmounting /dev/mqueue.
[    7.439959] systemd-shutdown[1]: Unmounting /dev/hugepages.
[    7.474276] EXT4-fs (vda1): re-mounted. Opts: (null)
[    7.475479] EXT4-fs (vda1): re-mounted. Opts: (null)
[    7.475881] EXT4-fs (vda1): re-mounted. Opts: (null)
[    7.476303] systemd-shutdown[1]: All filesystems unmounted.
[    7.476754] systemd-shutdown[1]: Deactivating swaps.
[    7.477228] systemd-shutdown[1]: All swaps deactivated.
[    7.477667] systemd-shutdown[1]: Detaching loop devices.
[    7.478984] systemd-shutdown[1]: All loop devices detached.
[    7.479815] systemd-shutdown[1]: Detaching DM devices.
[    7.480675] systemd-shutdown[1]: All DM devices detached.
[    7.482287] systemd-shutdown[1]: Powering off.
[    7.503279] ACPI: Preparing to enter system sleep state S5
[    7.503821] reboot: Power down


with reboot:

[^[[32m  OK  ^[[0m] Reached target Unmount All Filesystems.
[^[[32m  OK  ^[[0m] Stopped target Local File Systems (Pre).
         Stopping Create Static Device Nodes in /dev...
[^[[32m  OK  ^[[0m] Stopped Create Static Device Nodes in /dev.
         Stopping Remount Root and Kernel File Systems...
[^[[32m  OK  ^[[0m] Stopped Remount Root and Kernel File Systems.
[^[[32m  OK  ^[[0m] Reached target Shutdown.

nothing beyond this line


I then changed the watchdog action to dump. This time I get the
following log on reboot:

[^[[32m  OK  ^[[0m] Stopped Create Static Device Nodes in /dev.
         Stopping Remount Root and Kernel File Systems...
[^[[32m  OK  ^[[0m] Stopped Remount Root and Kernel File Systems.
[^[[32m  OK  ^[[0m] Reached target Shutdown.
[    9.402243] i6300esb: Unexpected close, not stopping watchdog!
[    9.448797] systemd-shutdown[1]: Sending SIGTERM to remaining
processes...
[    9.514019] systemd-journald[127]: Received SIGTERM from PID 1
(systemd-shutdow).
[    9.541822] systemd-shutdown[1]: Sending SIGKILL to remaining
processes...
[    9.546845] systemd-shutdown[1]: Hardware watchdog 'i6300ESB timer',
version 0
[    9.606194] systemd-shutdown[1]: Unmounting file systems.
[    9.608565] systemd-shutdown[1]: Unmounting /sys/kernel/debug.
[    9.610247] systemd-shutdown[1]: Unmounting /dev/mqueue.
[    9.611636] systemd-shutdown[1]: Unmounting /dev/hugepages.
[    9.664149] EXT4-fs (vda1): re-mounted. Opts: (null)
[    9.666461] EXT4-fs (vda1): re-mounted. Opts: (null)
[    9.666976] EXT4-fs (vda1): re-mounted. Opts: (null)
[    9.667571] systemd-shutdown[1]: All filesystems unmounted.
[    9.668304] systemd-shutdown[1]: Deactivating swaps.
[    9.668988] systemd-shutdown[1]: All swaps deactivated.
[    9.669632] systemd-shutdown[1]: Detaching loop devices.
[    9.671348] systemd-shutdown[1]: All loop devices detached.
[    9.672155] systemd-shutdown[1]: Detaching DM devices.
[    9.672949] systemd-shutdown[1]: All DM devices detached.
[    9.674747] systemd-shutdown[1]: Rebooting.


Note this line:
[    9.402243] i6300esb: Unexpected close, not stopping watchdog!

#778291#63
Date:
2015-02-12 07:50:59 UTC
From:
To:
Am 12.02.2015 um 08:31 schrieb Michael Biebl:

...
shutdown, it also seems to only happen with i6300esb. Switching the qemu
config to use ib700 instead of i6300esb, I am not able to trigger the
problem. This confirm's Peter's findings who was not able to reproduce
it on real hardware.

I wonder, if this is a qemu issue, after all?

CCing our Debian qemu maintainer. Maybe he has some insight on this
matter (the complete bug log is at [1]).

Michael

[1] http://bugs.debian.org/777735

#778291#68
Date:
2015-02-12 08:54:09 UTC
From:
To:
For completeness, these are the messages using ib700 with action "reset":

[   11.441964] systemd[1]: Reached target Shutdown.
[   11.442542] systemd[1]: Starting Final Step.
[   11.443075] systemd[1]: Reached target Final Step.
[   11.443599] systemd[1]: Starting Reboot...
[   11.445617] systemd[1]: Shutting down.
[   11.473895] systemd[1]: Failed to set timeout to 600s: Invalid argument
[   11.474649] ib700wdt: WDT device closed unexpectedly.  WDT will not stop!
[   11.497648] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[   11.500527] systemd-journald[148]: Received SIGTERM from PID 1 (systemd-shutdow).
[   11.506153] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[   11.508942] systemd-shutdown[1]: Hardware watchdog 'IB700 WDT', version 1
[   11.510122] systemd-shutdown[1]: Unmounting file systems.
[   11.511108] systemd-shutdown[1]: Unmounting /dev/hugepages.
[   11.536126] systemd-shutdown[1]: Unmounting /sys/kernel/debug.
[   11.568089] systemd-shutdown[1]: Unmounting /dev/mqueue.
[   11.719123] EXT4-fs (vda3): re-mounted. Opts: (null)
[   11.722559] EXT4-fs (vda3): re-mounted. Opts: (null)
[   11.723643] EXT4-fs (vda3): re-mounted. Opts: (null)
[   11.724687] systemd-shutdown[1]: All filesystems unmounted.
[   11.725834] systemd-shutdown[1]: Deactivating swaps.
[   11.726946] systemd-shutdown[1]: All swaps deactivated.
[   11.728077] systemd-shutdown[1]: Detaching loop devices.
[   11.749468] systemd-shutdown[1]: All loop devices detached.
[   11.750649] systemd-shutdown[1]: Detaching DM devices.
[   11.751759] systemd-shutdown[1]: All DM devices detached.
[   11.766548] systemd-shutdown[1]: Rebooting.
[   11.767417] Unregister pv shared memory for cpu 0
[   11.767421] Unregister pv shared memory for cpu 3
[   11.767460] Unregister pv shared memory for cpu 1
[   11.769581] Unregister pv shared memory for cpu 2
[   11.770811] reboot: Restarting system
[   11.771454] reboot: machine restart

This looks similar to i6300esb with action "dump".

[   11.473895] systemd[1]: Failed to set timeout to 600s: Invalid argument
[   11.474649] ib700wdt: WDT device closed unexpectedly.  WDT will not stop!

Both watchdogs close unexpectedly, but i6300esb immediately resets the
machine, while ib700wdt allows the machine to continue the shutdown.

Peter

#778291#73
Date:
2015-02-13 06:44:47 UTC
From:
To:
Am 12.02.2015 um 09:54 schrieb Peter Colberg:
a/ the watchdog driver(s) not accepting timeouts of 600s (or rather
anything > 255s)
b/ the i6300esb watchdog driver resetting the machine if such a timeout
is set.

Arguably, b/ is the more nasty one.
After discussion witch upstream, we came to the conclusion that both
issues should be fixed in the driver/kernel.

I went on and looked what the watchdog package does. Interestingly, it
clamps any timeout value to 254s, thus not hitting this issue.
Apparently, i6300esb is able to deal with such a timeout.

Peter, I therefore recommend you set

 ShutdownWatchdogSec=254s

in /etc/systemd/system.conf as well for the time being. This should be
sufficient to workaround this issue for this buggy driver.
Setting 256s, does indeed trigger the issue, which leads to the
conclusion, that the driver is using a char to hold a time value (0-255).

I wonder, if we should workaround this issue by clamping the timeout to
254s as well in systemd. Upstream was strictly against this though, as
he doesn't want to paper over such issues but rather prefers to see the
issue fixed at its source. So we'd have to carry a downstream patch for
this.

#778291#78
Date:
2015-02-13 07:02:09 UTC
From:
To:
Am 13.02.2015 um 07:44 schrieb Michael Biebl:

Actually, after a few more experiments, it turns out that for i6300esb,
the safe max value in my qemu VM is 549s. Setting it to 550s triggers
the reset.

#778291#83
Date:
2015-02-13 07:55:20 UTC
From:
To:
In the linux-3.16.7-ckt4 source the range of the timeout value for
the i6300esb is 1s < timeout < 2*1023s. In theory it should work
with the default shutdown timeout of 10min.

For the ib700 the range is 0s <= timeout <= 30s, hence the error.

Peter

#778291#88
Date:
2015-02-13 08:11:04 UTC
From:
To:
Am 13.02.2015 um 08:55 schrieb Peter Colberg:

This again might suggest, that his is a hardware, i.e. qemu bug?

Oh my, this is seriously messed up :-/

#778291#93
Date:
2015-02-13 09:11:44 UTC
From:
To:
clone 777735 -1
reassign -1 qemu-system-x86
retile -1 setting unsupported timeout for i6300esb watchdog causes reset
thanks

Am 13.02.2015 um 08:55 schrieb Peter Colberg:

I decided to clone this bug and assign it to qemu.
If the kernel is supposed to support such timeouts, this looks like a
hardware/qemu issue to me then.
While not supporting the timeout is one issue, the more important one
is, that this causes a hw reset and this needs to be fixed.

If this turns out to be indeed a kernel bug, please re-assign accordingly.

I'll keep this bug report against systemd open for new. Haven't decided
yet, if we apply some workarounds to systemd for this, like using a
"safer" default like 4min on shutdown, assuming that because the
watchdog package has been using 254 as max timeout in the past, drivers
are at least tested with that limit.


Michael