#972525 buildd: sbuild randomly fails to sign changes file despite valid signature keys

Package:
src:gnupg2
Source:
gnupg2
Submitter:
John Paul Adrian Glaubitz
Date:
2024-02-05 15:57:02 UTC
Severity:
normal
#972525#5
Date:
2020-10-19 21:41:59 UTC
From:
To:
Hi!

I'm observing random failures of sbuild signing the changes file after build
on some buildds, especially on sparc64 and most often on the machine sompek.

I'm not sure yet what the problem is but it looks like a GPG problem [1]:

gpg: can't connect to the agent: IPC connect call failed
gpg: keydb_search failed: No agent running
gpg: skipped "F1EA40F487003E5047A04D0A62D1430FE7E0DE86": No agent running
gpg: /tmp/debsign.oB0Milcr/rust-kstring_1.0.0-1_sparc64.changes: clear-sign failed: No agent running
debsign: gpg error occurred!  Aborting....

I'm filing this issue here to get some attention and maybe some ideas where
the problem could be. Maybe sbuild should wait a little longer before attempting
to sign? Or maybe the GPG daemon is crashing often?

In case it's safe not to be an issue with buildd/sbuild, feel free to reassign.

Thanks,
Adrian

#972525#10
Date:
2020-10-22 02:32:00 UTC
From:
To:
Hello,

IIUC, it is likely a bug of GnuPG.

When a server with higher load, it takes time for a process of gpg-agent
to start listen(2)ing the connection than the value of
SECS_TO_WAIT_FOR_AGENT.  It's five seconds, hard-coded.

Last year, I pushed a change to remove possible race condition around
the initial connect to gpg-agent.

I don't know if this patch fixes the particular problem of sbuild, but,
it should improve the situation, hopefully, much.

#972525#15
Date:
2021-02-08 23:18:33 UTC
From:
To:
We believe that the bug you reported is fixed in the latest version of
gnupg2, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 972525@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Daniel Kahn Gillmor <dkg@fifthhorseman.net> (supplier of updated gnupg2 package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Mon, 08 Feb 2021 17:57:00 -0500
Source: gnupg2
Architecture: source
Version: 2.2.27-1
Distribution: unstable
Urgency: medium
Maintainer: Debian GnuPG Maintainers <pkg-gnupg-maint@lists.alioth.debian.org>
Changed-By: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Closes: 868550 972525
Changes:
 gnupg2 (2.2.27-1) unstable; urgency=medium
 .
   [ NIIBE Yutaka ]
   * New upstream release.
 .
   [ Christoph Biedl ]
   * Tighten libgcrypt and libksba dependency
 .
   [ Daniel Kahn Gillmor ]
   * change debian packaging branch name to debian/main
   * refresh patches using gbp pq
   * point to upstream commit used to improve spawning reliability
   * Refresh 3072-bit default patch
   * standards-version: bump to 4.5.1 (no changes needed)
   * dh: bump to dh 13
   * clean up lintian overrides
   * fully drop symcryptrun
 .
 gnupg2 (2.2.26-1) UNRELEASED; urgency=medium
 .
   [ Jeremiah C. Foster ]
   * debian/scdaemon.udev: Add an entry for Librem Key.
 .
   [ NIIBE Yutaka ]
   * New upstream release.
   * refresh patches.
   * debian/rules: Add build for regexp.
   * debian/gnupg-utils.install: Remove /usr/bin/symcryptrun.
     Fix for gpgsplit, which is changed in upstream from 'noinst'.
   * debian/patches/gpg-change-agent-spawn-2019-07-24-v2.patch: New patch to
     fix a race condition, backported from master (Closes: #868550, #972525).
   * debian/scdaemon.udev: Add a generic entry for "Gnuk Token" and another
     for GnuPG e.V.
   * org.gnupg.scdaemon.metainfo.xml: Add an entry for GnuPG e.V.
Checksums-Sha1:
 eedfdd8cef88679841202fb937945125b901dbc6 3236 gnupg2_2.2.27-1.dsc
 d928d4bd0808ffb8fe20d1161501401d5d389458 7191555 gnupg2_2.2.27.orig.tar.bz2
 7fc979ac1633b07f7ccb2fa06150402935726b05 119 gnupg2_2.2.27.orig.tar.bz2.asc
 a2fb026d7704e36f7a72ae3804de19d1579e4c7c 62536 gnupg2_2.2.27-1.debian.tar.xz
 fc81c61f782d110f9a3cfe3034f5db386b964151 18862 gnupg2_2.2.27-1_amd64.buildinfo
Checksums-Sha256:
 d43af9a91d0392c2edcccea24dc0dd6e34b4bac65950f3c962c3b916a4cd9774 3236 gnupg2_2.2.27-1.dsc
 34e60009014ea16402069136e0a5f63d9b65f90096244975db5cea74b3d02399 7191555 gnupg2_2.2.27.orig.tar.bz2
 2b44fd82da223cb629062b9c8840d92698c003be8531fc393c38f97b28cae2a4 119 gnupg2_2.2.27.orig.tar.bz2.asc
 668b77dc35cde0b65a6c18fab5620fca2c92208d0cc8298822c6eee56342e390 62536 gnupg2_2.2.27-1.debian.tar.xz
 e846f2a03ad7508a77a2b1578c0aeecbabf3a1e7455bfaaad902944249bc1b0c 18862 gnupg2_2.2.27-1_amd64.buildinfo
Files:
 b75d8759d87e00528a62bf6da17ebefc 3236 utils optional gnupg2_2.2.27-1.dsc
 a9c002b5356103c97412955a1956ae0c 7191555 utils optional gnupg2_2.2.27.orig.tar.bz2
 3a7ebb524a333b41032765eb651ea032 119 utils optional gnupg2_2.2.27.orig.tar.bz2.asc
 57978d4d1312044655b2c1fa4ea6843f 62536 utils optional gnupg2_2.2.27-1.debian.tar.xz
 667fcc0af533972bdbd028c860c26956 18862 utils optional gnupg2_2.2.27-1_amd64.buildinfo
-----BEGIN PGP SIGNATURE-----

iHUEARYIAB0WIQQttUkcnfDcj0MoY88+nXFzcd5WXAUCYCHEFwAKCRA+nXFzcd5W
XLr9AQC82VMrWVAhlWZBMmufgbL4/dL1lZCGwzP11fwY3pHr9gEA+Nzw5HQZnzTD
jK8B67Ka6oAe+miIc0DVvmXXWnAHywA=
=VzcZ
-----END PGP SIGNATURE-----

#972525#34
Date:
2022-02-28 21:21:45 UTC
From:
To:
control: reopen 972525
control: found 972525 gnupg2/2.2.27-2
from time to time on the buildds. For instance:

Signature with key '37B76E354F4D3F4EB64D699B374AD85D2226CD3D' requested:
 signfile buildinfo /home/buildd/build/linux_5.10.92-2_amd64-buildd.buildinfo 37B76E354F4D3F4EB64D699B374AD85D2226CD3D
gpg: can't connect to the agent: IPC connect call failed
gpg: keydb_search failed: No agent running
gpg: skipped "37B76E354F4D3F4EB64D699B374AD85D2226CD3D": No agent running
gpg: /tmp/debsign.12nNJjLy/linux_5.10.92-2_amd64-buildd.buildinfo: clear-sign failed: No agent running
debsign: gpg error occurred!  Aborting....

#972525#45
Date:
2022-03-18 07:17:49 UTC
From:
To:
Aurelien Jarno <aurelien@aurel32.net> wrote:

Thanks a lot for sharing the fact.

I assume that the failure occurs in a situation of heavy load, where it
takes time for gpg-agent to start its service.

In upstream, I opened another ticket, because I found
a possible cause of this issue (another one):

https://dev.gnupg.org/T5884

The change of GnuPG 2.3 in master:

https://dev.gnupg.org/rGd94b411f129f572aace005f38cd78515d810135a

can be applied manually.  (Manually, because we had spell fix
of the comment in 2.3.)

#972525#50
Date:
2024-02-05 15:53:06 UTC
From:
To:
I've been seeing this regularly, and getting hundreds of 'dupload
failed' emails as a result (they get sent every 5 mins now after it
goes wrong).  I've not been keeping records, (because I just bin
those hundreds of emails) but it happens most weeks, and I've had two
this week (opencv and gnuradio)

I'll start collecting info here to see if we can narrow this down a bit.

So, latest is
opencv_4.6.0+dfsg-13.1~exp1_armel built for experimental on arm-ubc-06
the changes file is Feb  4 03:29
Looking in the build logs I see it was built and uploaded successfully 3hrs later on
arm-arm-03
https://buildd.debian.org/status/architecture.php?a=armel&suite=experimental&buildd=buildd_armel-arm-ubc-06
https://buildd.debian.org/status/architecture.php?a=armel&suite=experimental&buildd=buildd_armel-arm-arm-03

The second build started 1hr after the .changes files for the frist one was made, so I guess there is a timeout of 1hr after the log arrives and if there is no uploade by then the buildd assumes failure and schedules another build?

I have noticed before that usually by the time I look at the failed
upload there is already a new build uploaded. It would be nice if the
buildds tidied up after themselves once the build is in the archive
and stopped sending tiresome email awaiting a manual clear-up. Once a
new build has been issued the old failed upload should be removed. I'm
not quite sure exactly what that check should look like. Alternatively
we could stop sending very frequent mail to buildd admins, and let the
'are files a week old' script tidy them up in due course.

The actual error on the failed log is:
Finished at 2024-02-04T03:29:15Z
Signature with key '764BC9A1354021955868EF5CC98724D9AA73AAA3' requested:
 signfile buildinfo /home/buildd/build/opencv_4.6.0+dfsg-13.1~exp1_armel-buildd.buildinfo 764BC9A1354021955868EF5CC98724D9AA73AAA3
gpg: error running '/usr/bin/gpg-agent': exit status 2
gpg: failed to start agent '/usr/bin/gpg-agent': General error
gpg: can't connect to the agent: General error
gpg: keydb_search failed: No agent running
gpg: skipped "764BC9A1354021955868EF5CC98724D9AA73AAA3": No agent running
gpg: /tmp/debsign.e1vK8yhj/opencv_4.6.0+dfsg-13.1~exp1_armel-buildd.buildinfo: clear-sign failed: No agent running
debsign: gpg error occurred!  Aborting....

Looking in var/log/messages at that time (on arm-ubc-06)
We see that some script set to log starts 1 second after sbuild returns, at 03:29:16
and sends no more messages after 03:30:38. So takes 1m22s (82s) to run.
Does that indicate the suspected high load which might be making gpg fail?
I don't think we log load per se, do we? It has a lot of debs to check so I think it just takes a while.

times for running that script in this log for various packages:
 16s singular_4.3.2-p10+ds-1.1~exp1_armel
 15s redland_1.0.17-3.1~exp1_armel
  9s shapetools_1.4pl6-16.1~exp1_armel
  9s solvespace_3.1+ds1-3.1~exp1_armel
  8s secsipidx_1.3.2-2.1~exp1_armel
  8s scamper_20211212-1.2~exp1_armel
  6s rttr_0.9.6+dfsg1-6.1~exp1_armel
  8s mfem_4.5.2+ds-1.5~exp1_armel
 82s opencv_4.6.0+dfsg-13.1~exp1_armel  (failed to sign)
 15s rhvoice_1.8.0+dfsg-3.1~exp1_armel
  5s libposix-2008-perl_0.23-1_armel
  4s rust-expectrl_0.7.1-2_armel
  5s liblinux-fd-perl_0.016-1_armel
 10s swami_2.2.2-2.1~exp1_armel
  6s symmetrica_3.0.1+ds-2.1~exp1_armel
  9s muffin_5.8.1-2.1~exp1_armel
  5s t4kcommon_0.1.1-11.1~exp1_armel
  4s netperfmeter_1.9.6-1_armel
  6s pidgin-skype_20240122+gitab786a3+dfsg-2_armel
  6s tinyframe_0.1.1-4.1~exp1_armel
  5s toontag_0.0~git20220105193632.41237ef-2.1~exp1_armhf
  4s tse3_0.3.1-6.1~exp1_armel
 86s gcc-10_10.5.0-3_armel
  5s lomiri-camera-app_4.0.5+dfsg-1_armel
  8s opendmarc_1.4.2-4.1~exp1_armel
154s libreoffice_24.2.0-1~bpo12+1_armel
 59s gnuradio_3.10.9.2-1.1~exp1_armel    (failed to sign)

So opencv is not the longest package to process. libreoffice takes quite a lot longer. , gcc-10 slightly longer. But most are way quicker.

I have noticed that it's usually larger packages that go wrong. (libreoffice, gcc, binutils, but not always)

Not sure if any of this info helps but that's my investigations
today. Suggestions for other monitoring, or the best way to work around it by
not fixing it, just making it less annoying, welcome.

Wookey