#923423 dpkg: package install can be very slow with some disks due to too frequent fsync

Package:
dpkg
Source:
dpkg
Description:
Debian package management system
Submitter:
Vincent Lefevre
Date:
2021-04-01 09:54:12 UTC
Severity:
important
Tags:
#923423#5
Date:
2019-02-28 00:06:28 UTC
From:
To:
Installation of texlive-fonts-extra 2018.20190227-1 hangs.

Here's px information:

# px 5553
/usr/bin/dpkg
  --status-fd
  80
  --no-triggers
  --unpack
  --auto-deconfigure
  --recursive
  /tmp/apt-dpkg-install-mos7uq

kernel(0)                             root
  init(1)                             root
    lightdm(915)                      root
      lightdm(1525)                   root
        zsh(1547)                     vinc17
          fvwm2(1629)                 vinc17
            sh(1745)                  vinc17
              xterm(1746)             vinc17
                zsh(1750)             vinc17
                  su(1839)            root
                    bash(1841)        root
                      aptitude(2012)  root
----------------------> dpkg(5553)    root

17m01s ago dpkg was started, at 2019-02-28T00:48:45+01:00.
1.5% has been its average CPU usage since then, or 15s/17m01s

Other processes started close to dpkg(5553):
  [kworker/0:1-events](5351) was started 2m57s before dpkg(5553)
  [kworker/u17:0-kcryptd](5540) was started 1.0s before dpkg(5553)
  [kworker/4:2](5771) was started 2m19s after dpkg(5553)
  [kworker/u16:1-events_unbound](5928) was started 4m21s after dpkg(5553)
  [kworker/5:2-mm_percpu_wq](6068) was started 5m06s after dpkg(5553)

Users logged in when dpkg(5553) started:
  vinc17 from 155.133.131.76
  vinc17 from :0

2019-02-28T01:05:46.585799: Now invoking lsof, this can take over a minute on a big system...
2019-02-28T01:05:47.098503: lsof done, proceeding.

File descriptors:
  stdin : [CHR] /dev/pts/16
  stdout: [CHR] /dev/pts/16
  stderr: [CHR] /dev/pts/16

Network connections:

Inter Process Communication:
  aptitude(2012): [FIFO] pipe

For a list of all open files, do "sudo lsof -p 5553", or "sudo watch lsof -p 5553" for a live view.

#923423#10
Date:
2019-02-28 00:17:19 UTC
From:
To:
Control: severity -1 important
Control: retitle -1 texlive-fonts-extra installation hangs for 19 minutes
[...]

It eventually resumed. In the /var/log/dpkg.log file:

[...]
2019-02-28 00:49:41 install texlive-bibtex-extra:all 2018.20190207-1 2018.20190227-1
2019-02-28 00:49:41 status half-installed texlive-bibtex-extra:all 2018.20190207-1
2019-02-28 00:50:27 status unpacked texlive-bibtex-extra:all 2018.20190227-1
2019-02-28 00:50:27 install texlive-extra-utils:all 2018.20190207-1 2018.20190227-1
2019-02-28 00:50:27 status half-installed texlive-extra-utils:all 2018.20190207-1
2019-02-28 00:50:55 status unpacked texlive-extra-utils:all 2018.20190227-1
2019-02-28 00:50:55 install texlive-font-utils:all 2018.20190207-1 2018.20190227-1
2019-02-28 00:50:55 status half-installed texlive-font-utils:all 2018.20190207-1
2019-02-28 00:51:00 status unpacked texlive-font-utils:all 2018.20190227-1
2019-02-28 00:51:01 install texlive-fonts-extra:all 2018.20190207-1 2018.20190227-1
2019-02-28 00:51:01 status half-installed texlive-fonts-extra:all 2018.20190207-1
2019-02-28 01:10:50 status unpacked texlive-fonts-extra:all 2018.20190227-1
2019-02-28 01:10:50 install texlive-fonts-recommended:all 2018.20190207-1 2018.20190227-1
2019-02-28 01:10:50 status half-installed texlive-fonts-recommended:all 2018.20190207-1
2019-02-28 01:11:49 status unpacked texlive-fonts-recommended:all 2018.20190227-1
[...]

So that's about 19 minutes with almost no CPU/disk/network activity.

#923423#19
Date:
2019-02-28 03:07:25 UTC
From:
To:
reassign 923423 dpkg
thanks

Nothing I can do about it, please ask dpkg maintainers what is going on.

Norbert

#923423#30
Date:
2019-02-28 09:03:15 UTC
From:
To:
In short:

[...]
2019-02-28 00:51:01 status half-installed texlive-fonts-extra:all 2018.20190207-1
2019-02-28 01:10:50 status unpacked texlive-fonts-extra:all 2018.20190227-1
[...]

And ditto in the past:

[...]
2019-02-02 16:21:55 status half-installed texlive-fonts-extra:all 2018.20190126-1
2019-02-02 16:41:32 status unpacked texlive-fonts-extra:all 2018.20190131-1
[...]

[...]
2019-02-26 01:35:08 status half-installed texlive-fonts-extra:all 2018.20190131-1
2019-02-26 01:55:12 status unpacked texlive-fonts-extra:all 2018.20190131-2
[...]

[...]
2019-02-27 02:33:14 status half-installed texlive-fonts-extra:all 2018.20190131-2
2019-02-27 02:53:18 status unpacked texlive-fonts-extra:all 2018.20190207-1
[...]

#923423#37
Date:
2019-03-02 23:15:13 UTC
From:
To:
Hi!

Thanks for all the info. This does look like a hardware or kernel
problem to me, though.

Did this start happening due to the new kernel version? What
filesystem are you using (say btrfs or similar?), etc, can you
reproduce on another system? I certainly can't. :)

I guess attaching strace to the running dpkg might also give some more
info on what's going on, or where it's stalling. Also if you can
simply reproduce via «dpkg -i», then enabling all dpkg debug flags
might also help.

I see the kernel is tainted. Perhaps try to unload those and see? :)

Thanks,
Guillem

#923423#44
Date:
2019-03-03 01:38:10 UTC
From:
To:
In March 2018, this was already slow, but much less:

[...]
2018-03-08 00:55:55 status unpacked texlive-fonts-extra:all 2017.20180225-1
2018-03-08 00:55:55 status half-installed texlive-fonts-extra:all 2017.20180225-1
2018-03-08 01:02:55 status half-installed texlive-fonts-extra:all 2017.20180225-1
2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
2018-03-08 01:02:56 status unpacked texlive-fonts-extra:all 2017.20180305-2
[...]

I don't have older information.

mount information:

/dev/mapper/zira--vg-root on / type ext4 (rw,relatime,errors=remount-ro)

No, on two of my other machines, it takes around 1 minute
(both for old and new versions). I assume that this is what
is expected.

With

  dpkg -i /var/cache/apt/archives/texlive-fonts-extra_2018.20190227-1_all.deb

GKrellM shows full disk activity for a bit less than 2 minutes, then
no activity, but dpkg is still running. Attaching strace against it
during this time shows lots of rename's:

[...]
rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cijw8y.tfm.dpkg-new", "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cijw8y.tfm") = 0
openat(AT_FDCWD, "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm.dpkg-new", O_WRONLY) = 11
fsync(11)                               = 0
close(11)                               = 0
rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm.dpkg-new", "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2ciw8y.tfm") = 0
openat(AT_FDCWD, "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm.dpkg-new", O_WRONLY) = 11
fsync(11)                               = 0
close(11)                               = 0
rename("/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm.dpkg-new", "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8c.tfm") = 0
openat(AT_FDCWD, "/usr/share/texlive/texmf-dist/fonts/tfm/arkandis/berenisadf/ybdr2cj8t.tfm.dpkg-new", O_WRONLY) = 11
fsync(11)                               = 0
close(11)                               = 0
[...]

That's the NVIDIA drivers (because nouveau is too broken).

#923423#49
Date:
2019-03-05 12:52:27 UTC
From:
To:
Hi!

Yes. This does really look like a kernel or hardware issue TBH.
Could you try downgrading the kernel to earlier versions to see
whether that improves, that'd give a definitive answer.

I'd say try also older dpkg versions, but I seriously doubt that'd
change much, as I don't remember any major changes in this area
recently.

Thanks. Yes this would be the code doing the delayed fsync(3 ) with
atomic renames(3), which has been the solution to earlier problems
with ext4.

Thanks,
Guillem

#923423#54
Date:
2019-03-05 13:55:20 UTC
From:
To:
Same issue (also around 20 minutes) with the kernel from stable:

Linux zira 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

If it's a hardware issue, I'm wondering why I don't have messages
in journalctl output.

#923423#59
Date:
2019-03-05 14:43:05 UTC
From:
To:
I've done a test with "iozone -a -e", and with this -e option
(to include fsync), the operations are much slower than on the
other machine.

zira:~> iozone -a -e
[...]
                                                              random    random     bkwd    record    stride
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
              64       4     1821     1987     1922     3703     3657     1770     1621      1628      1697     1707     1773  1879725  2892445
              64       8     1606     1684     1713     3493     3476     1700     1765      1730      1812     2135     1852  2278628  3406295
              64      16     1807     1823     1986     3898     3798     1977     1813      1988      1871     2049     3320  2561267  4018152
              64      32     1702     1714     1755     3494     3683     1741     1841      1796      1737     1633     1674  1879725  4018152
              64      64     1808     1790     1632     3314     3387     2254     1995      1843      1984     1789     1820  2801873  3203069
[...]

#923423#64
Date:
2019-03-05 14:44:31 UTC
From:
To:
I meant, except fread and freread.
#923423#69
Date:
2019-04-08 22:33:27 UTC
From:
To:
Hi!

Ok, can we conclude this is not a problem in dpkg then? :) It seems to
me either hardware, filesystem or kernel related as mentioned before.
Could you reassign where you think it would be more appropriate?

Thanks,
Guillem

#923423#74
Date:
2019-04-09 09:34:08 UTC
From:
To:
Hi,

I'm not sure. This may just be a slow disk after all, and in practice
(I mean except benchmarking), the problem seems specific to dpkg.
I assume that programs normally don't do a fsync at every fraction
of microsecond! I don't know what dpkg is trying to achieve with such
a frequency of fsync, but this doesn't seem to make sense to me. You
had said "which has been the solution to earlier problems with ext4",
but the real solution would be to fix ext4 (does "earlier" mean this
has eventually been fixed?).

I've seen https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1785020
"fsync is slow on later kernels with ext4 filesystms" but the comments
in this bug page and 4.19.28-1 kernel logs show that this has been
fixed.

FYI, I get very similar timings on the same machine, but on /boot,
which is:

/dev/sda1 on /boot type ext2 (rw,relatime,block_validity,barrier,user_xattr,acl,stripe=4)

The fact that the main, ext4 partition is encrypted but not this
one makes another difference, but the timings are similar, so that
this doesn't seem to be related to the FS system itself and appears
to be at a lower level.

Note also that my machine is a laptop, and I couldn't do a comparison
with other laptops, just in case.

I think that this should still be regarded as a dpkg bug, based on
my first paragraph above (the fsync's occur much too often in dpkg
compared to other programs, and this doesn't seem to be useful).

#923423#81
Date:
2021-04-01 09:45:47 UTC
From:
To:
To confirm the issue:

Unpacking texlive-fonts-extra (2020.20210202-3) over (2020.20210202-1) ...

is incredibly slow.

I'm running Debian/bullseye; the SSD is ext4 and far from full.

What is the status of this issue?
What are the conclusions after all the above correspondence?
Is there any hope?

- Joachim