#933916 IO hang on guest after stop / cont commands to monitor

Package:
qemu-system-x86
Source:
qemu
Description:
QEMU full system emulation binaries (x86)
Submitter:
Michael Stroucken
Date:
2025-08-13 07:21:01 UTC
Severity:
important
Tags:
#933916#5
Date:
2019-08-05 04:38:10 UTC
From:
To:
I have VM servers that are running Jessie. On testing a migration of one
to stretch, I noticed that one of the VMs was getting stuck on IO.

I have noticed that this happens when guest is performing IO, and the
system issues a "stop" command via the monitor followed by a "cont"
command. The system does this to allow for guest filesystem
snapshotting. Only the virtual disk that was being accessed will remain
stuck.

The kernel of the system is linux-image-4.9.0-9-amd64 4.9.168-1+deb9u4
Linux HOST 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u4 (2019-07-19)
x86_64 GNU/Linux

The problem does not occur with the jessie version of qemu running on
the stretch kernel (qemu-system-x86 1:2.1+dfsg-12+deb8u11).

Conversely, the problem does occur with the stretch version of qemu
running on the jessie kernel (linux-image-3.16.0-10-amd64 3.16.70-1).

The command line of the guest is as follows:
qemu-system-x86_64 -enable-kvm -enable-kvm -cpu qemu64,+x2apic
-daemonize -pidfile /guest/run/GUEST.pid -name GUEST,process=GUEST -m
128 -nodefaults -monitor unix:/guest/run/GUEST.mon,server,nowait -vga
cirrus -rtc base=utc -balloon virtio -vnc 127.0.0.1:20 -smp 4 -m 22528
-netdev
type=tap,id=GUEST0,ifname=GUEST0,script=/guest/net/local/ifup,downscript=/guest/net/local/ifdown
-device virtio-net-pci,netdev=GUEST0,mac=52:54:01:23:45:67 -drive
if=virtio,aio=native,cache=none,media=disk,file=disk0.img -drive
if=virtio,format=raw,aio=native,cache=none,media=disk,file=/dev/lvm/venti

The issue happened regardless of whether aio was native or threads.

The host looks idle during the occurrence of the problem.

On the guest, top will reflect 25% iowait (there are four processors
assigned). iostat -xk 1 will show 0 for all values except 100% on %util.

Future accesses to that storage unit will also hang, raising the load
average.

I have not been able to duplicate the problem with IO loads like dd, but
have been constantly successful in doing so with plan9port's venti.

This is a blocker for me to move to stretch, but I can return to jessie
for now, hoping that this bug will be fixed in future. The host and
guest are large and in production, so unless I can set up a test system,
my ability to run further tests here will be limited.

Following is information collected on the system during the issue.

strace from venti around the time of a stop / cont:
2807  pread64(5,
"r\220\37\37g\223(\2046\364l8=\10\203\336\362v\305e\250\320\201
\362\317\327\376:\177L\201\305"..., 8192, 103062593536) = 8192
2807  fadvise64(5, 103062593536, 8192, POSIX_FADV_DONTNEED) = 0
2807  futex(0x215a994, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x215a990,
FUTEX_OP_SET<<28|
0<<12|FUTEX_OP_CMP_GT<<24|0x1 <unfinished ...>
2814  <... futex resumed> )             = 0
2814  futex(0x215a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
2807  <... futex resumed> )             = 1
2807  futex(0x215a900, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2814  <... futex resumed> )             = 0
2814  futex(0x215a900, FUTEX_WAKE_PRIVATE, 1) = 0
2814  rt_sigprocmask(SIG_SETMASK, [CHLD], [CHLD], 8) = 0
2814  rt_sigprocmask(SIG_SETMASK, [CHLD], [CHLD], 8) = 0
2814  futex(0x215a994, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
2807  <... futex resumed> )             = 1
2807  futex(0x215a994, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x215a990,
FUTEX_OP_SET<<28|
0<<12|FUTEX_OP_CMP_GT<<24|0x1 <unfinished ...>
2814  <... futex resumed> )             = 0
2814  futex(0x215a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
2807  <... futex resumed> )             = 1
2807  futex(0x215a900, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2814  <... futex resumed> )             = 0
2814  futex(0x215a900, FUTEX_WAKE_PRIVATE, 1) = 0
2814  rt_sigprocmask(SIG_SETMASK, [CHLD], [CHLD], 8) = 0
2814  rt_sigprocmask(SIG_SETMASK, [CHLD], [CHLD], 8) = 0
2814  futex(0x215a994, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
2807  <... futex resumed> )             = 1
2807  rt_sigprocmask(SIG_SETMASK, [CHLD], [CHLD], 8) = 0
2807  rt_sigprocmask(SIG_SETMASK, [CHLD], [CHLD], 8) = 0
2807  pread64(5,  <unfinished ...>
2808  <... select resumed> )            = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
2808  select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)

The program will make no further progress.
The process can be killed, but the IO system on that drive will remain hung.

On first observation, updatedb.mlocate was blocked and the kernel issued
the following trace:

 INFO: task updatedb.mlocat:6298 blocked for more than 120 seconds.
       Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u4
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 updatedb.mlocat D    0  6298   6297 0x00000000
  ffff8fdfb6befc00 0000000000000000 ffff8fdfb6938f80 ffff8fe53fc18980
  ffffffffb2811500 ffff9ffb83f8f8c0 ffffffffb2215b89 0000000000000000
  0000000000000012 ffff8fe53fc18980 0000000000000008 ffff8fdfb6938f80
 Call Trace:
  [<ffffffffb2215b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb2216840>] ? bit_wait+0x50/0x50
  [<ffffffffb2216072>] ? schedule+0x32/0x80
  [<ffffffffb221940d>] ? schedule_timeout+0x1dd/0x380
  [<ffffffffb1cf14fe>] ? ktime_get+0x3e/0xb0
  [<ffffffffb2216840>] ? bit_wait+0x50/0x50
  [<ffffffffb22158ed>] ? io_schedule_timeout+0x9d/0x100
  [<ffffffffb1cbd377>] ? prepare_to_wait+0x57/0x80
  [<ffffffffb2216857>] ? bit_wait_io+0x17/0x60
  [<ffffffffb2216415>] ? __wait_on_bit+0x55/0x80
  [<ffffffffc057d2c0>] ? metapage_get_blocks+0xb0/0xb0 [jfs]
  [<ffffffffb1d8210f>] ? wait_on_page_bit+0x7f/0xa0
  [<ffffffffb1cbd7e0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffb1d8405e>] ? do_read_cache_page+0x1ce/0x2f0
  [<ffffffffc057dc3d>] ? __get_metapage+0x9d/0x4e0 [jfs]
  [<ffffffffc057a143>] ? dtSearch+0x143/0x8b0 [jfs]
  [<ffffffffc056eaaf>] ? diIAGRead.isra.4+0x2f/0x40 [jfs]
  [<ffffffffc056f508>] ? diRead+0x68/0x1f0 [jfs]
  [<ffffffffb1e2994b>] ? iget_locked+0x18b/0x1b0
  [<ffffffffc0566a18>] ? jfs_iget+0x38/0x120 [jfs]
  [<ffffffffc0567b37>] ? jfs_lookup+0xa7/0xc0 [jfs]
  [<ffffffffb1e26ab1>] ? d_alloc_parallel+0xb1/0x4f0
  [<ffffffffb1d8d256>] ? __alloc_pages_nodemask+0xf6/0x260
  [<ffffffffb1e181c9>] ? lookup_slow+0xa9/0x170
  [<ffffffffb1e18599>] ? walk_component+0x1f9/0x330
  [<ffffffffb1e196a7>] ? path_lookupat+0x67/0x120
  [<ffffffffb1e1c371>] ? filename_lookup+0xb1/0x180
  [<ffffffffb1e07d4a>] ? __check_object_size+0xfa/0x1d8
  [<ffffffffb1f63b58>] ? strncpy_from_user+0x48/0x160
  [<ffffffffb1e10da9>] ? vfs_fstatat+0x59/0xb0
  [<ffffffffb1e1135d>] ? SYSC_newlstat+0x2d/0x60
  [<ffffffffb1c03b7d>] ? do_syscall_64+0x8d/0x100
  [<ffffffffb221a94e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

The backtrace of the venti in IO wait state:
 venti           D    0  2218      1 0x00000000
  ffff9ed460d41000 0000000000000000 ffff9ed4612e1280 ffff9ed47fd18980
  ffff9ed46583a1c0 ffffb6e94392fc30 ffffffffb9e15b89 ffffb6e94392fc30
  0000000000000000 ffff9ed47fd18980 ffffb6e94392fd00 ffff9ed4612e1280
 Call Trace:
  [<ffffffffb9e15b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e16072>] ? schedule+0x32/0x80
  [<ffffffffb9e1940d>] ? schedule_timeout+0x1dd/0x380
  [<ffffffffb9b09b67>] ? blk_finish_plug+0x27/0x40
  [<ffffffffb985af3a>] ? kvm_clock_get_cycles+0x1a/0x20
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e158ed>] ? io_schedule_timeout+0x9d/0x100
  [<ffffffffb98bd377>] ? prepare_to_wait+0x57/0x80
  [<ffffffffb9e16857>] ? bit_wait_io+0x17/0x60
  [<ffffffffb9e16415>] ? __wait_on_bit+0x55/0x80
  [<ffffffffb9985276>] ? wait_on_page_bit_killable+0x96/0xa0
  [<ffffffffb98bd7e0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffb99853e9>] ? generic_file_read_iter+0x169/0x8c0
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9a0b39d>] ? new_sync_read+0xdd/0x130
  [<ffffffffb9a0bba1>] ? vfs_read+0x91/0x130
  [<ffffffffb9a0d230>] ? SyS_pread64+0x90/0xb0
  [<ffffffffb9803b7d>] ? do_syscall_64+0x8d/0x100
  [<ffffffffb9e1a94e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

Backtrace of a hung sync command:
 sync            D    0 12324   9388 0x00000000
  ffff9ed464594400 0000000000000000 ffff9ecec59b0340 ffff9ed47fc98980
  ffff9ed465838180 ffffb6e948503c00 ffffffffb9e15b89 0000000000000000
  0000000000000000 ffff9ed47fc98980 ffff9ed4654e6300 ffff9ecec59b0340
 Call Trace:
  [<ffffffffb9e15b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e16072>] ? schedule+0x32/0x80
  [<ffffffffb9e1940d>] ? schedule_timeout+0x1dd/0x380
  [<ffffffffc0289c85>] ? ext4_file_open+0xa5/0x280 [ext4]
  [<ffffffffb98addc5>] ? select_idle_sibling+0x25/0x330
  [<ffffffffb985af3a>] ? kvm_clock_get_cycles+0x1a/0x20
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e158ed>] ? io_schedule_timeout+0x9d/0x100
  [<ffffffffb98bd377>] ? prepare_to_wait+0x57/0x80
  [<ffffffffb9e16857>] ? bit_wait_io+0x17/0x60
  [<ffffffffb9e16415>] ? __wait_on_bit+0x55/0x80
  [<ffffffffb9982ea8>] ? find_get_pages_tag+0x158/0x2e0
  [<ffffffffb998210f>] ? wait_on_page_bit+0x7f/0xa0
  [<ffffffffb98bd7e0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffb9982210>] ? __filemap_fdatawait_range+0xe0/0x140
  [<ffffffffb9a3828f>] ? wb_wait_for_completion+0x7f/0x90
  [<ffffffffb9a3a257>] ? sync_inodes_sb+0x227/0x280
  [<ffffffffb9a405f0>] ? SyS_tee+0x420/0x420
  [<ffffffffb9a0f905>] ? iterate_supers+0xb5/0x100
  [<ffffffffb9a40952>] ? sys_sync+0x42/0xb0
  [<ffffffffb9803b7d>] ? do_syscall_64+0x8d/0x100
  [<ffffffffb9e1a94e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

Scheduler info:
 Sched Debug Version: v0.11, 4.9.0-9-amd64 #1
 ktime                                   : 83062979.397916
 sched_clk                               : 83061954.857544
 cpu_clk                                 : 83061954.857556
 jiffies                                 : 4315658042
 sched_clock_stable()                    : 1

 sysctl_sched
   .sysctl_sched_latency                    : 18.000000
   .sysctl_sched_min_granularity            : 2.250000
   .sysctl_sched_wakeup_granularity         : 3.000000
   .sysctl_sched_child_runs_first           : 0
   .sysctl_sched_features                   : 87867
   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
...
 Showing busy workqueues and worker pools:
 workqueue events: flags=0x0
   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
     in-flight: 47:do_sync_work
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
     pending: vmstat_shepherd
 workqueue vmstat: flags=0xc
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
     pending: vmstat_update
 pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 12353 23

kworker blocked for 120 seconds:
 INFO: task kworker/2:1:47 blocked for more than 120 seconds.
       Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u4
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kworker/2:1     D    0    47      2 0x00000000
 Workqueue: events do_sync_work
  ffff9ed46133cc00 0000000000000000 ffff9ed465b3a0c0 ffff9ed47fd18980
  ffff9ecec3def1c0 ffffb6e9432c3c88 ffffffffb9e15b89 ffff9ed47fd18980
  00ff9ed465b3a0c0 ffff9ed47fd18980 ffff9ed46133cc00 ffff9ed465b3a0c0
 Call Trace:
  [<ffffffffb9e15b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb9e16072>] ? schedule+0x32/0x80
  [<ffffffffb9e1632a>] ? schedule_preempt_disabled+0xa/0x10
  [<ffffffffb9e17d84>] ? __mutex_lock_slowpath+0xb4/0x130
  [<ffffffffb9a3828f>] ? wb_wait_for_completion+0x7f/0x90
  [<ffffffffb9a405f0>] ? SyS_tee+0x420/0x420
  [<ffffffffb9e17e1b>] ? mutex_lock+0x1b/0x30
  [<ffffffffb9a3a111>] ? sync_inodes_sb+0xe1/0x280
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9a405f0>] ? SyS_tee+0x420/0x420
  [<ffffffffb9a0f905>] ? iterate_supers+0xb5/0x100
  [<ffffffffb9a406d6>] ? do_sync_work+0x36/0xc0
  [<ffffffffb989460a>] ? process_one_work+0x18a/0x430
  [<ffffffffb98948fd>] ? worker_thread+0x4d/0x490
  [<ffffffffb98948b0>] ? process_one_work+0x430/0x430
  [<ffffffffb989a969>] ? kthread+0xd9/0xf0
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb989a890>] ? kthread_park+0x60/0x60
  [<ffffffffb9e1aaf7>] ? ret_from_fork+0x57/0x70

Active CPU backtrace:
 Sending NMI from CPU 2 to CPUs 0-1,3:
 NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb9e1a1be
 NMI backtrace for cpu 1 skipped: idling at pc 0xffffffffb9e1a1be
 NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb9e1a1be

Show blocked state:
 sysrq: SysRq : Show Blocked State
   task                        PC stack   pid father
 kworker/2:1     D    0    47      2 0x00000000
 Workqueue: events do_sync_work
  ffff9ed46133cc00 0000000000000000 ffff9ed465b3a0c0 ffff9ed47fd18980
  ffff9ecec3def1c0 ffffb6e9432c3c88 ffffffffb9e15b89 ffff9ed47fd18980
  00ff9ed465b3a0c0 ffff9ed47fd18980 ffff9ed46133cc00 ffff9ed465b3a0c0
 Call Trace:
  [<ffffffffb9e15b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb9e16072>] ? schedule+0x32/0x80
  [<ffffffffb9e1632a>] ? schedule_preempt_disabled+0xa/0x10
  [<ffffffffb9e17d84>] ? __mutex_lock_slowpath+0xb4/0x130
  [<ffffffffb9a3828f>] ? wb_wait_for_completion+0x7f/0x90
  [<ffffffffb9a405f0>] ? SyS_tee+0x420/0x420
  [<ffffffffb9e17e1b>] ? mutex_lock+0x1b/0x30
  [<ffffffffb9a3a111>] ? sync_inodes_sb+0xe1/0x280
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9a405f0>] ? SyS_tee+0x420/0x420
  [<ffffffffb9a0f905>] ? iterate_supers+0xb5/0x100
  [<ffffffffb9a406d6>] ? do_sync_work+0x36/0xc0
  [<ffffffffb989460a>] ? process_one_work+0x18a/0x430
  [<ffffffffb98948fd>] ? worker_thread+0x4d/0x490
  [<ffffffffb98948b0>] ? process_one_work+0x430/0x430
  [<ffffffffb989a969>] ? kthread+0xd9/0xf0
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb989a890>] ? kthread_park+0x60/0x60
  [<ffffffffb9e1aaf7>] ? ret_from_fork+0x57/0x70
 venti           D    0  2203      1 0x00000000
  ffff9ed460d41000 0000000000000000 ffff9ed4616db100 ffff9ed47fd98980
  ffff9ed46583af00 ffffb6e9438c7c30 ffffffffb9e15b89 ffffb6e9438c7c30
  0000000000000000 ffff9ed47fd98980 ffffb6e9438c7d00 ffff9ed4616db100
 Call Trace:
  [<ffffffffb9e15b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e16072>] ? schedule+0x32/0x80
  [<ffffffffb9e1940d>] ? schedule_timeout+0x1dd/0x380
  [<ffffffffb9b09b67>] ? blk_finish_plug+0x27/0x40
  [<ffffffffb985af3a>] ? kvm_clock_get_cycles+0x1a/0x20
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e158ed>] ? io_schedule_timeout+0x9d/0x100
  [<ffffffffb98bd377>] ? prepare_to_wait+0x57/0x80
  [<ffffffffb9e16857>] ? bit_wait_io+0x17/0x60
  [<ffffffffb9e16415>] ? __wait_on_bit+0x55/0x80
  [<ffffffffb9985276>] ? wait_on_page_bit_killable+0x96/0xa0
  [<ffffffffb98bd7e0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffb99853e9>] ? generic_file_read_iter+0x169/0x8c0
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9a0b39d>] ? new_sync_read+0xdd/0x130
  [<ffffffffb9a0bba1>] ? vfs_read+0x91/0x130
  [<ffffffffb9a0d230>] ? SyS_pread64+0x90/0xb0
  [<ffffffffb9803b7d>] ? do_syscall_64+0x8d/0x100
  [<ffffffffb9e1a94e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
 venti           D    0  2218      1 0x00000000
  ffff9ed460d41000 0000000000000000 ffff9ed4612e1280 ffff9ed47fd18980
  ffff9ed46583a1c0 ffffb6e94392fc30 ffffffffb9e15b89 ffffb6e94392fc30
  0000000000000000 ffff9ed47fd18980 ffffb6e94392fd00 ffff9ed4612e1280
 Call Trace:
  [<ffffffffb9e15b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e16072>] ? schedule+0x32/0x80
  [<ffffffffb9e1940d>] ? schedule_timeout+0x1dd/0x380
  [<ffffffffb9b09b67>] ? blk_finish_plug+0x27/0x40
  [<ffffffffb985af3a>] ? kvm_clock_get_cycles+0x1a/0x20
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e158ed>] ? io_schedule_timeout+0x9d/0x100
  [<ffffffffb98bd377>] ? prepare_to_wait+0x57/0x80
  [<ffffffffb9e16857>] ? bit_wait_io+0x17/0x60
  [<ffffffffb9e16415>] ? __wait_on_bit+0x55/0x80
  [<ffffffffb9985276>] ? wait_on_page_bit_killable+0x96/0xa0
  [<ffffffffb98bd7e0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffb99853e9>] ? generic_file_read_iter+0x169/0x8c0
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9e1aa64>] ? __switch_to_asm+0x34/0x70
  [<ffffffffb9e1aa70>] ? __switch_to_asm+0x40/0x70
  [<ffffffffb9a0b39d>] ? new_sync_read+0xdd/0x130
  [<ffffffffb9a0bba1>] ? vfs_read+0x91/0x130
  [<ffffffffb9a0d230>] ? SyS_pread64+0x90/0xb0
  [<ffffffffb9803b7d>] ? do_syscall_64+0x8d/0x100
  [<ffffffffb9e1a94e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
 sync            D    0 12324   9388 0x00000000
  ffff9ed464594400 0000000000000000 ffff9ecec59b0340 ffff9ed47fc98980
  ffff9ed465838180 ffffb6e948503c00 ffffffffb9e15b89 0000000000000000
  0000000000000000 ffff9ed47fc98980 ffff9ed4654e6300 ffff9ecec59b0340
 Call Trace:
  [<ffffffffb9e15b89>] ? __schedule+0x239/0x6f0
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e16072>] ? schedule+0x32/0x80
  [<ffffffffb9e1940d>] ? schedule_timeout+0x1dd/0x380
  [<ffffffffc0289c85>] ? ext4_file_open+0xa5/0x280 [ext4]
  [<ffffffffb98addc5>] ? select_idle_sibling+0x25/0x330
  [<ffffffffb985af3a>] ? kvm_clock_get_cycles+0x1a/0x20
  [<ffffffffb9e16840>] ? bit_wait+0x50/0x50
  [<ffffffffb9e158ed>] ? io_schedule_timeout+0x9d/0x100
  [<ffffffffb98bd377>] ? prepare_to_wait+0x57/0x80
  [<ffffffffb9e16857>] ? bit_wait_io+0x17/0x60
  [<ffffffffb9e16415>] ? __wait_on_bit+0x55/0x80
  [<ffffffffb9982ea8>] ? find_get_pages_tag+0x158/0x2e0
  [<ffffffffb998210f>] ? wait_on_page_bit+0x7f/0xa0
  [<ffffffffb98bd7e0>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffb9982210>] ? __filemap_fdatawait_range+0xe0/0x140
  [<ffffffffb9a3828f>] ? wb_wait_for_completion+0x7f/0x90
  [<ffffffffb9a3a257>] ? sync_inodes_sb+0x227/0x280
  [<ffffffffb9a405f0>] ? SyS_tee+0x420/0x420
  [<ffffffffb9a0f905>] ? iterate_supers+0xb5/0x100
  [<ffffffffb9a40952>] ? sys_sync+0x42/0xb0
  [<ffffffffb9803b7d>] ? do_syscall_64+0x8d/0x100
  [<ffffffffb9e1a94e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

dump ftrace:
 sysrq: SysRq : Dump ftrace buffer
 Dumping ftrace buffer:
    (ftrace buffer empty)

Additional run with the kernel running the jessie kernel:
 INFO: task kworker/u8:0:6 blocked for more than 120 seconds.
       Not tainted 3.16.0-10-amd64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kworker/u8:0    D ffffffff810a33f6     0     6      2 0x00000000
 Workqueue: writeback bdi_writeback_workfn (flush-254:16)
  ffff8805a874a050 0000000000000046 0000000000014000 0000000000014000
  0000000000014000 ffff8805bfd148b0 ffff8805bffa58a0 0000000000000002
  ffffffff81149d50 ffff8805a8763a30 ffff8805a7093dc0 ffffffff8153b73b
 Call Trace:
  [<ffffffff81149d50>] ? wait_on_page_read+0x60/0x60
  [<ffffffff8153b73b>] ? io_schedule+0xab/0x140
  [<ffffffff81149d5a>] ? sleep_on_page+0xa/0x10
  [<ffffffff8153bb0e>] ? __wait_on_bit+0x5e/0x90
  [<ffffffff81149b55>] ? wait_on_page_bit+0xc5/0xd0
  [<ffffffff810b14a0>] ? autoremove_wake_function+0x30/0x30
  [<ffffffffa0255860>] ? txLock+0x5c0/0x760 [jfs]
  [<ffffffffa02514d7>] ? __get_metapage+0x247/0x4a0 [jfs]
  [<ffffffff811582cd>] ? pagevec_lookup_tag+0x1d/0x30
  [<ffffffff811555d7>] ? write_cache_pages+0xe7/0x420
  [<ffffffffa02438a7>] ? diWrite+0xc7/0x7e0 [jfs]
  [<ffffffffa0256562>] ? txCommit+0x1c2/0xe40 [jfs]
  [<ffffffff811f6a78>] ? mpage_writepages+0x88/0xb0
  [<ffffffffa0239be1>] ? jfs_commit_inode+0xb1/0x110 [jfs]
  [<ffffffffa0239c6f>] ? jfs_write_inode+0x2f/0x60 [jfs]
  [<ffffffffa0239c40>] ? jfs_commit_inode+0x110/0x110 [jfs]
  [<ffffffff811e3a2c>] ? __writeback_single_inode+0x1cc/0x2d0
  [<ffffffff811e6750>] ? writeback_sb_inodes+0x1b0/0x420
  [<ffffffff811e6a56>] ? __writeback_inodes_wb+0x96/0xc0
  [<ffffffff811e6ceb>] ? wb_writeback+0x26b/0x330
  [<ffffffff8108c3ff>] ? set_worker_desc+0x8f/0xa0
  [<ffffffff811e71c2>] ? bdi_writeback_workfn+0x2c2/0x470
  [<ffffffff8108944c>] ? process_one_work+0x14c/0x470
  [<ffffffff8108a1fb>] ? worker_thread+0x3b/0x520
  [<ffffffff8108a1c0>] ? rescuer_thread+0x2a0/0x2a0
  [<ffffffff81090561>] ? kthread+0xd1/0xf0
  [<ffffffff8153af0f>] ? __schedule+0x22f/0x750
  [<ffffffff81090490>] ? kthread_create_on_node+0x1b0/0x1b0
  [<ffffffff8153f37e>] ? ret_from_fork+0x6e/0xa0
  [<ffffffff81090490>] ? kthread_create_on_node+0x1b0/0x1b0
 INFO: task venti:2069 blocked for more than 120 seconds.
       Not tainted 3.16.0-10-amd64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 venti           D ffffffff810a33f6     0  2069      1 0x00000000
  ffff8805a55375d0 0000000000000086 0000000000014000 0000000000014000
  0000000000014000 ffff8805a64b71f8 ffff8802372bfbc0 ffff8805a64b71fc
  ffff8805a55375d0 00000000ffffffff ffff8805a64b7200 ffffffff8153b8b5
 Call Trace:
  [<ffffffff8153b8b5>] ? schedule_preempt_disabled+0x25/0x70
  [<ffffffff8153d6f3>] ? __mutex_lock_slowpath+0xd3/0x200
  [<ffffffffa0254fc2>] ? txBeginAnon+0x22/0x1c0 [jfs]
  [<ffffffff8153d83b>] ? mutex_lock+0x1b/0x30
  [<ffffffffa025020d>] ? extAlloc+0x4d/0x440 [jfs]
  [<ffffffffa0250675>] ? extHint+0x75/0x100 [jfs]
  [<ffffffffa02398e8>] ? jfs_get_block+0x208/0x280 [jfs]
  [<ffffffff811ed11d>] ? alloc_buffer_head+0x1d/0x70
  [<ffffffff811ef901>] ? nobh_write_begin+0x1a1/0x460
  [<ffffffffa0239ff5>] ? jfs_write_begin+0x35/0x80 [jfs]
  [<ffffffffa02396e0>] ? jfs_open+0xc0/0xc0 [jfs]
  [<ffffffff8114a24c>] ? generic_perform_write+0xcc/0x1d0
  [<ffffffff8114c44d>] ? __generic_file_write_iter+0x1dd/0x340
  [<ffffffff8114c5e8>] ? generic_file_write_iter+0x38/0xa0
  [<ffffffff811ba88a>] ? new_sync_write+0x7a/0xb0
  [<ffffffff811bb025>] ? vfs_write+0xb5/0x1f0
  [<ffffffff810deec3>] ? SyS_futex+0x73/0x170
  [<ffffffff811bbd1b>] ? SyS_pwrite64+0x6b/0xa0
  [<ffffffff8153f45c>] ? system_call_fast_compare_end+0x1c/0x21

#933916#10
Date:
2019-08-06 02:26:25 UTC
From:
To:
When the problem occurs, iostat -xk shows the following:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.25   24.94    0.00   74.81

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00
0.00     0.00    0.00    0.00    0.00   0.00   0.00
vdb               0.00     0.00    0.00    0.00     0.00     0.00
0.00     1.00    0.00    0.00    0.00   0.00 100.00

Backtrace from all threads on the qemu process:
(gdb) thread apply all bt

Thread 7 (Thread 0x7f25e65ff700 (LWP 4428)):
#0  0x00007f25ff2ba17f in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000560da8d5b359 in qemu_cond_wait ()
#2  0x0000560da8cadd0b in ?? ()
#3  0x0000560da8cae258 in ?? ()
#4  0x00007f25ff2b44a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f25feff6d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 6 (Thread 0x7f25e7fff700 (LWP 4426)):
#0  0x00007f25fefef017 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000560da8a0a4b7 in kvm_vcpu_ioctl ()
#2  0x0000560da8a0a610 in kvm_cpu_exec ()
#3  0x0000560da89f8138 in ?? ()
#4  0x00007f25ff2b44a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f25feff6d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 5 (Thread 0x7f25f4a2c700 (LWP 4425)):
#0  0x00007f25fefef017 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000560da8a0a4b7 in kvm_vcpu_ioctl ()
#2  0x0000560da8a0a610 in kvm_cpu_exec ()
#3  0x0000560da89f8138 in ?? ()
#4  0x00007f25ff2b44a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f25feff6d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 4 (Thread 0x7f25f522d700 (LWP 4424)):
#0  0x00007f25fefef017 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000560da8a0a4b7 in kvm_vcpu_ioctl ()
#2  0x0000560da8a0a610 in kvm_cpu_exec ()
#3  0x0000560da89f8138 in ?? ()
#4  0x00007f25ff2b44a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f25feff6d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 3 (Thread 0x7f25f5a2e700 (LWP 4423)):
#0  0x00007f25fefef017 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000560da8a0a4b7 in kvm_vcpu_ioctl ()
#2  0x0000560da8a0a610 in kvm_cpu_exec ()
#3  0x0000560da89f8138 in ?? ()
#4  0x00007f25ff2b44a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007f25feff6d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7f25f6c32700 (LWP 4387)):
#0  0x00007f25feff2469 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000560da8d5b668 in qemu_event_wait ()
#2  0x0000560da8d6a94e in ?? ()
#3  0x00007f25ff2b44a4 in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f25feff6d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7f2604776c80 (LWP 4386)):
#0  0x00007f25fefed981 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000560da8cbd359 in qemu_poll_ns ()
#2  0x0000560da8cbcd6a in main_loop_wait ()
#3  0x0000560da89be1ed in main ()

(gdb) info thread
  Id   Target Id         Frame
* 1    Thread 0x7f2604776c80 (LWP 4386) "GUEST" 0x00007f25fefed981 in
ppoll ()
   from /lib/x86_64-linux-gnu/libc.so.6
  2    Thread 0x7f25f6c32700 (LWP 4387) "qemu-system-x86"
0x00007f25feff2469 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
  3    Thread 0x7f25f5a2e700 (LWP 4423) "GUEST" 0x00007f25fefef017 in
ioctl ()
   from /lib/x86_64-linux-gnu/libc.so.6
  4    Thread 0x7f25f522d700 (LWP 4424) "GUEST" 0x00007f25fefef017 in
ioctl ()
   from /lib/x86_64-linux-gnu/libc.so.6
  5    Thread 0x7f25f4a2c700 (LWP 4425) "GUEST" 0x00007f25fefef017 in
ioctl ()
   from /lib/x86_64-linux-gnu/libc.so.6
  6    Thread 0x7f25e7fff700 (LWP 4426) "GUEST" 0x00007f25fefef017 in
ioctl ()
   from /lib/x86_64-linux-gnu/libc.so.6
  7    Thread 0x7f25e65ff700 (LWP 4428) "GUEST" 0x00007f25ff2ba17f in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0

#933916#15
Date:
2025-08-13 06:59:19 UTC
From:
To:
Replying to an old bug report which received no attention at the time..
[...]


This was an excellent bug report at the time, with a lot of details.
Unfortunately it got no attention whatsoever.

I remember some I/O being stuck sometimes after stop/cont or after
suspend/resume of the guest.  It's been really a while since I've
seen it for the last time.  It looks like this particular bug has been
fixed long time ago.

Can you please check current version of qemu (bookworm or trixie) to
see if the problem is still there?

If not, let's close this bug with bookworm version of qemu (sort of
arbitrary).

Thanks,

/mjt