- Package:
- src:gnupg2
- Source:
- gnupg2
- Submitter:
- John Paul Adrian Glaubitz
- Date:
- 2024-02-05 15:57:02 UTC
- Severity:
- normal
Hi! I'm observing random failures of sbuild signing the changes file after build on some buildds, especially on sparc64 and most often on the machine sompek. I'm not sure yet what the problem is but it looks like a GPG problem [1]: gpg: can't connect to the agent: IPC connect call failed gpg: keydb_search failed: No agent running gpg: skipped "F1EA40F487003E5047A04D0A62D1430FE7E0DE86": No agent running gpg: /tmp/debsign.oB0Milcr/rust-kstring_1.0.0-1_sparc64.changes: clear-sign failed: No agent running debsign: gpg error occurred! Aborting.... I'm filing this issue here to get some attention and maybe some ideas where the problem could be. Maybe sbuild should wait a little longer before attempting to sign? Or maybe the GPG daemon is crashing often? In case it's safe not to be an issue with buildd/sbuild, feel free to reassign. Thanks, Adrian
Hello, IIUC, it is likely a bug of GnuPG. When a server with higher load, it takes time for a process of gpg-agent to start listen(2)ing the connection than the value of SECS_TO_WAIT_FOR_AGENT. It's five seconds, hard-coded. Last year, I pushed a change to remove possible race condition around the initial connect to gpg-agent. I don't know if this patch fixes the particular problem of sbuild, but, it should improve the situation, hopefully, much.
We believe that the bug you reported is fixed in the latest version of
gnupg2, which is due to be installed in the Debian FTP archive.
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 972525@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Daniel Kahn Gillmor <dkg@fifthhorseman.net> (supplier of updated gnupg2 package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Mon, 08 Feb 2021 17:57:00 -0500
Source: gnupg2
Architecture: source
Version: 2.2.27-1
Distribution: unstable
Urgency: medium
Maintainer: Debian GnuPG Maintainers <pkg-gnupg-maint@lists.alioth.debian.org>
Changed-By: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Closes: 868550 972525
Changes:
gnupg2 (2.2.27-1) unstable; urgency=medium
.
[ NIIBE Yutaka ]
* New upstream release.
.
[ Christoph Biedl ]
* Tighten libgcrypt and libksba dependency
.
[ Daniel Kahn Gillmor ]
* change debian packaging branch name to debian/main
* refresh patches using gbp pq
* point to upstream commit used to improve spawning reliability
* Refresh 3072-bit default patch
* standards-version: bump to 4.5.1 (no changes needed)
* dh: bump to dh 13
* clean up lintian overrides
* fully drop symcryptrun
.
gnupg2 (2.2.26-1) UNRELEASED; urgency=medium
.
[ Jeremiah C. Foster ]
* debian/scdaemon.udev: Add an entry for Librem Key.
.
[ NIIBE Yutaka ]
* New upstream release.
* refresh patches.
* debian/rules: Add build for regexp.
* debian/gnupg-utils.install: Remove /usr/bin/symcryptrun.
Fix for gpgsplit, which is changed in upstream from 'noinst'.
* debian/patches/gpg-change-agent-spawn-2019-07-24-v2.patch: New patch to
fix a race condition, backported from master (Closes: #868550, #972525).
* debian/scdaemon.udev: Add a generic entry for "Gnuk Token" and another
for GnuPG e.V.
* org.gnupg.scdaemon.metainfo.xml: Add an entry for GnuPG e.V.
Checksums-Sha1:
eedfdd8cef88679841202fb937945125b901dbc6 3236 gnupg2_2.2.27-1.dsc
d928d4bd0808ffb8fe20d1161501401d5d389458 7191555 gnupg2_2.2.27.orig.tar.bz2
7fc979ac1633b07f7ccb2fa06150402935726b05 119 gnupg2_2.2.27.orig.tar.bz2.asc
a2fb026d7704e36f7a72ae3804de19d1579e4c7c 62536 gnupg2_2.2.27-1.debian.tar.xz
fc81c61f782d110f9a3cfe3034f5db386b964151 18862 gnupg2_2.2.27-1_amd64.buildinfo
Checksums-Sha256:
d43af9a91d0392c2edcccea24dc0dd6e34b4bac65950f3c962c3b916a4cd9774 3236 gnupg2_2.2.27-1.dsc
34e60009014ea16402069136e0a5f63d9b65f90096244975db5cea74b3d02399 7191555 gnupg2_2.2.27.orig.tar.bz2
2b44fd82da223cb629062b9c8840d92698c003be8531fc393c38f97b28cae2a4 119 gnupg2_2.2.27.orig.tar.bz2.asc
668b77dc35cde0b65a6c18fab5620fca2c92208d0cc8298822c6eee56342e390 62536 gnupg2_2.2.27-1.debian.tar.xz
e846f2a03ad7508a77a2b1578c0aeecbabf3a1e7455bfaaad902944249bc1b0c 18862 gnupg2_2.2.27-1_amd64.buildinfo
Files:
b75d8759d87e00528a62bf6da17ebefc 3236 utils optional gnupg2_2.2.27-1.dsc
a9c002b5356103c97412955a1956ae0c 7191555 utils optional gnupg2_2.2.27.orig.tar.bz2
3a7ebb524a333b41032765eb651ea032 119 utils optional gnupg2_2.2.27.orig.tar.bz2.asc
57978d4d1312044655b2c1fa4ea6843f 62536 utils optional gnupg2_2.2.27-1.debian.tar.xz
667fcc0af533972bdbd028c860c26956 18862 utils optional gnupg2_2.2.27-1_amd64.buildinfo
-----BEGIN PGP SIGNATURE-----
iHUEARYIAB0WIQQttUkcnfDcj0MoY88+nXFzcd5WXAUCYCHEFwAKCRA+nXFzcd5W
XLr9AQC82VMrWVAhlWZBMmufgbL4/dL1lZCGwzP11fwY3pHr9gEA+Nzw5HQZnzTD
jK8B67Ka6oAe+miIc0DVvmXXWnAHywA=
=VzcZ
-----END PGP SIGNATURE-----
control: reopen 972525 control: found 972525 gnupg2/2.2.27-2 from time to time on the buildds. For instance: Signature with key '37B76E354F4D3F4EB64D699B374AD85D2226CD3D' requested: signfile buildinfo /home/buildd/build/linux_5.10.92-2_amd64-buildd.buildinfo 37B76E354F4D3F4EB64D699B374AD85D2226CD3D gpg: can't connect to the agent: IPC connect call failed gpg: keydb_search failed: No agent running gpg: skipped "37B76E354F4D3F4EB64D699B374AD85D2226CD3D": No agent running gpg: /tmp/debsign.12nNJjLy/linux_5.10.92-2_amd64-buildd.buildinfo: clear-sign failed: No agent running debsign: gpg error occurred! Aborting....
Aurelien Jarno <aurelien@aurel32.net> wrote: Thanks a lot for sharing the fact. I assume that the failure occurs in a situation of heavy load, where it takes time for gpg-agent to start its service. In upstream, I opened another ticket, because I found a possible cause of this issue (another one): https://dev.gnupg.org/T5884 The change of GnuPG 2.3 in master: https://dev.gnupg.org/rGd94b411f129f572aace005f38cd78515d810135a can be applied manually. (Manually, because we had spell fix of the comment in 2.3.)
I've been seeing this regularly, and getting hundreds of 'dupload failed' emails as a result (they get sent every 5 mins now after it goes wrong). I've not been keeping records, (because I just bin those hundreds of emails) but it happens most weeks, and I've had two this week (opencv and gnuradio) I'll start collecting info here to see if we can narrow this down a bit. So, latest is opencv_4.6.0+dfsg-13.1~exp1_armel built for experimental on arm-ubc-06 the changes file is Feb 4 03:29 Looking in the build logs I see it was built and uploaded successfully 3hrs later on arm-arm-03 https://buildd.debian.org/status/architecture.php?a=armel&suite=experimental&buildd=buildd_armel-arm-ubc-06 https://buildd.debian.org/status/architecture.php?a=armel&suite=experimental&buildd=buildd_armel-arm-arm-03 The second build started 1hr after the .changes files for the frist one was made, so I guess there is a timeout of 1hr after the log arrives and if there is no uploade by then the buildd assumes failure and schedules another build? I have noticed before that usually by the time I look at the failed upload there is already a new build uploaded. It would be nice if the buildds tidied up after themselves once the build is in the archive and stopped sending tiresome email awaiting a manual clear-up. Once a new build has been issued the old failed upload should be removed. I'm not quite sure exactly what that check should look like. Alternatively we could stop sending very frequent mail to buildd admins, and let the 'are files a week old' script tidy them up in due course. The actual error on the failed log is: Finished at 2024-02-04T03:29:15Z Signature with key '764BC9A1354021955868EF5CC98724D9AA73AAA3' requested: signfile buildinfo /home/buildd/build/opencv_4.6.0+dfsg-13.1~exp1_armel-buildd.buildinfo 764BC9A1354021955868EF5CC98724D9AA73AAA3 gpg: error running '/usr/bin/gpg-agent': exit status 2 gpg: failed to start agent '/usr/bin/gpg-agent': General error gpg: can't connect to the agent: General error gpg: keydb_search failed: No agent running gpg: skipped "764BC9A1354021955868EF5CC98724D9AA73AAA3": No agent running gpg: /tmp/debsign.e1vK8yhj/opencv_4.6.0+dfsg-13.1~exp1_armel-buildd.buildinfo: clear-sign failed: No agent running debsign: gpg error occurred! Aborting.... Looking in var/log/messages at that time (on arm-ubc-06) We see that some script set to log starts 1 second after sbuild returns, at 03:29:16 and sends no more messages after 03:30:38. So takes 1m22s (82s) to run. Does that indicate the suspected high load which might be making gpg fail? I don't think we log load per se, do we? It has a lot of debs to check so I think it just takes a while. times for running that script in this log for various packages: 16s singular_4.3.2-p10+ds-1.1~exp1_armel 15s redland_1.0.17-3.1~exp1_armel 9s shapetools_1.4pl6-16.1~exp1_armel 9s solvespace_3.1+ds1-3.1~exp1_armel 8s secsipidx_1.3.2-2.1~exp1_armel 8s scamper_20211212-1.2~exp1_armel 6s rttr_0.9.6+dfsg1-6.1~exp1_armel 8s mfem_4.5.2+ds-1.5~exp1_armel 82s opencv_4.6.0+dfsg-13.1~exp1_armel (failed to sign) 15s rhvoice_1.8.0+dfsg-3.1~exp1_armel 5s libposix-2008-perl_0.23-1_armel 4s rust-expectrl_0.7.1-2_armel 5s liblinux-fd-perl_0.016-1_armel 10s swami_2.2.2-2.1~exp1_armel 6s symmetrica_3.0.1+ds-2.1~exp1_armel 9s muffin_5.8.1-2.1~exp1_armel 5s t4kcommon_0.1.1-11.1~exp1_armel 4s netperfmeter_1.9.6-1_armel 6s pidgin-skype_20240122+gitab786a3+dfsg-2_armel 6s tinyframe_0.1.1-4.1~exp1_armel 5s toontag_0.0~git20220105193632.41237ef-2.1~exp1_armhf 4s tse3_0.3.1-6.1~exp1_armel 86s gcc-10_10.5.0-3_armel 5s lomiri-camera-app_4.0.5+dfsg-1_armel 8s opendmarc_1.4.2-4.1~exp1_armel 154s libreoffice_24.2.0-1~bpo12+1_armel 59s gnuradio_3.10.9.2-1.1~exp1_armel (failed to sign) So opencv is not the longest package to process. libreoffice takes quite a lot longer. , gcc-10 slightly longer. But most are way quicker. I have noticed that it's usually larger packages that go wrong. (libreoffice, gcc, binutils, but not always) Not sure if any of this info helps but that's my investigations today. Suggestions for other monitoring, or the best way to work around it by not fixing it, just making it less annoying, welcome. Wookey