#1012318 diffoscope 214 produced no output and was killed after running into timeout after 150m

#1012318#5
Date:
2022-06-03 22:05:18 UTC
From:
To:
Dear maintainer,

it would appear, diffoscope is failing when runing agains halide package:

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/halide.html

Unfortunately, i do not know what the more specific problem is.

Roman.

- -- System Information:
Debian Release: bookworm/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (500, 'unstable-debug'), (1, 'experimental-debug'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.17.0-3-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages diffoscope depends on:
pn  diffoscope-minimal  <none>

Versions of packages diffoscope recommends:
pn  abootimg                                       <none>
ii  acl                                            2.3.1-1
pn  androguard                                     <none>
pn  apksigner                                      <none>
pn  apktool                                        <none>
pn  binutils-multiarch                             <none>
ii  bzip2                                          1.0.8-5
pn  caca-utils                                     <none>
pn  colord                                         <none>
pn  db-util                                        <none>
pn  default-jdk-headless | default-jdk | java-sdk  <none>
pn  device-tree-compiler                           <none>
pn  docx2txt                                       <none>
ii  e2fsprogs                                      1.46.5-2
pn  enjarify                                       <none>
ii  ffmpeg                                         10:4.4.2-dmo5
pn  fontforge-extras                               <none>
pn  fp-utils                                       <none>
ii  genisoimage                                    9:1.1.11-3.2
ii  gettext                                        0.21-6
pn  ghc                                            <none>
ii  ghostscript                                    9.56.1~dfsg-1
pn  giflib-tools                                   <none>
pn  gnumeric                                       <none>
ii  gnupg                                          2.2.35-2
ii  gnupg-utils                                    2.2.35-2
pn  hdf5-tools                                     <none>
ii  imagemagick-6.q16 [imagemagick]                8:6.9.11.60+dfsg-1.3+b2
pn  jsbeautifier                                   <none>
ii  libarchive-tools                               3.6.0-1
pn  libxmlb-dev                                    <none>
ii  llvm                                           1:15.0-55~20220203224054.8
ii  lz4 [liblz4-tool]                              1.9.3-2
pn  mono-utils                                     <none>
pn  ocaml-nox                                      <none>
pn  odt2txt                                        <none>
pn  oggvideotools                                  <none>
ii  openssh-client                                 1:9.0p1-1+b1
ii  openssl                                        3.0.3-5
pn  pgpdump                                        <none>
ii  poppler-utils                                  22.02.0-3
pn  procyon-decompiler                             <none>
pn  python3-argcomplete                            <none>
pn  python3-binwalk                                <none>
ii  python3-debian                                 0.1.44
ii  python3-defusedxml                             0.7.1-1
pn  python3-guestfs                                <none>
pn  python3-jsondiff                               <none>
pn  python3-pdfminer                               <none>
pn  python3-progressbar                            <none>
pn  python3-pypdf2                                 <none>
pn  python3-pyxattr                                <none>
pn  python3-rpm                                    <none>
pn  python3-tlsh                                   <none>
pn  r-base-core                                    <none>
pn  radare2                                        <none>
ii  rpm2cpio                                       4.17.0+dfsg1-4
pn  sng                                            <none>
ii  sqlite3                                        3.38.5-1
pn  squashfs-tools                                 <none>
pn  tcpdump                                        <none>
pn  u-boot-tools                                   <none>
ii  unzip                                          6.0-26
ii  vim-common                                     2:8.2.4793-1
pn  wabt                                           <none>
pn  xmlbeans                                       <none>
ii  xxd                                            2:8.2.4793-1
ii  xz-utils                                       5.2.5-2.1
ii  zip                                            3.0-12
ii  zstd                                           1.5.2+dfsg-1

Versions of packages diffoscope suggests:
ii  libjs-jquery  3.6.0+dfsg+~3.5.13-1
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEE+UikBxeiu50LOdFYgbqkFMWfZdAFAmKahZ0ACgkQgbqkFMWf
ZdCizhAAxer8KRHDg7wmjBNf9zK1HH4Oyf/KfRLHW22JsZlMhkKjp449H6E3R2q5
mAlyGE/5L9j6mGFRhxZaGWqxCin54EA0lPZzDgIpHIpLeYZbBP9QUASI3nj+0IfX
5yTkS0vz2SNVWt7/GxNBGUb19vu3fb/jounZyCiWLrP5cU4QhM/nEOgdcvvyllTH
kggEARqBfSQDk0bgkeh8yO3NW6swp97BbP18K6LdnPmKUftSHChePjQo27a0OltZ
ilO4gRym4AIlUTsDsZ98a695Hux5SKmn7Lsok0oYdBumge2cHeFvb7X5FLOwriv6
OnSxjR4sKQROITRjp2/cqijzIsRqBd2SN09yXvOKusZn0yIjTDgDer5UQfzhesFX
NSDyTJgqR5FD5E6f5NaZPl9paw1+c7sgtGEN8bIv4kploeHgME23HKQKxVHt18o0
1AbCKDvcJBt4M7xksYrz4nfRa4XyRzpbEm0oYzQutgZL/+Osr/y9tdaNYXsSYJMu
o9nJcnb5J+jc3JBIYDdTz8p4xgvyq0aPx1vOBZM6/E/q13EQRhF03DTGWAWK3ext
kJMR4HasD3J22YqgJ61S02mYTHnaLezgHSMJbRKm04tMiOjXUYtwOljpNmjiDJAy
xYoBr5KGeAUvKv58InsQY04pSxivjBuBHl563JQP7zvd85TKYCU=
=lBAV
-----END PGP SIGNATURE-----

#1012318#10
Date:
2022-06-04 05:32:27 UTC
From:
To:
Some things just take a long time to run diffoscope, and
tests.reproducible-builds.org only lets diffoscope run for 150m and then
kills it to avoid very large unreproducible packages from eating all the
resources.

Options would be to run diffoscope on two builds of halide locally. The
tool reprotest does this and runs diffoscope by default if it is
installed.

There is also beta.tests.reproducible-builds.org which may not have a
diffoscope timeout, so you might be able to find diffoscope output for
halide there.

There may be improvements that could make diffoscope run faster and
avoid the timeout, though I'm not sure this is a fundamental problem of
diffoscope.

I'll see if I can find you some diffoscope output for halide by running
it on a fairly powerful machine I have access to...


live well,
  vagrant

#1012318#15
Date:
2022-06-06 06:19:10 UTC
From:
To:
...

https://people.debian.org/~vagrant/reproducible/halide.diffoscope.out.zst

It's under 200MB download, expands to ~2GB uncompresed...


live well,
  vagrant

#1012318#20
Date:
2022-06-06 09:11:51 UTC
From:
To:
Vagrant Cascadian wrote:

(I do see that Vagrant has sent a diff in a subsequent email in this
thread...)

For the benefit of Roman, there are actually two levels of timeouts
involved when running diffoscope on tests.reproducible-builds.org:

First, we pass a value to diffoscope's --timeout option:

#1012318#25
Date:
2022-06-07 22:17:12 UTC
From:
To:
Nevertheless, it could get stuck on some arbitrary file, no? Or does it
actually try to abort processing a file if it is taking too long?

Yeah, I think maybe we should add more time to the hard jenkins timeout
to make sure it really gets *some* output.

The original was just the defaults that reprotest uses (perhaps we
should add --profile?).

I re-ran diffoscope with profile output using the same artifacts....

# Profiling output for: /usr/bin/diffoscope --progress --profile --text=halide.diffoscope control/ experiment-1

## close_archive (total time: 0.001s)
       0.000s     24 calls    diffoscope.comparators.xz.XzContainer
       0.000s     22 calls    diffoscope.comparators.tar.TarContainer
       0.000s      6 calls    diffoscope.comparators.deb.DebContainer
       0.000s     10 calls    diffoscope.comparators.deb.DebTarContainer

## command (total time: 1816.184s)
     805.925s   8949 calls    readelf
     770.865s   2011 calls    diff
     232.245s   1188 calls    objdump
       5.085s     36 calls    xz
       2.002s    198 calls    strings
       0.026s      2 calls    cmp
       0.026s      2 calls    cmp (external)
       0.004s     54 calls    cmp (internal)
       0.003s     66 calls    stat
       0.002s     18 calls    objcopy

## compare_files (cumulative) (total time: 3904.923s)
     922.891s      6 calls    abc.DebFile
     922.341s     12 calls    abc.XzFile
     916.800s      6 calls    abc.DebDataTarFile
     873.202s     18 calls    abc.ElfFile
     183.319s     68 calls    diffoscope.comparators.elf.ElfCodeSection
      46.212s    338 calls    diffoscope.comparators.elf.ElfSection
      29.528s   1047 calls    abc.TextFile
      10.056s     34 calls    diffoscope.comparators.elf.ElfStringSection
       0.446s      6 calls    abc.TarFile
       0.126s      6 calls    abc.Md5sumsFile
       0.000s      1 call     diffoscope.comparators.directory.FilesystemDirectory

## container_extract (total time: 6.318s)
       5.089s     36 calls    diffoscope.comparators.xz.XzContainer
       1.109s   6388 calls    diffoscope.comparators.deb.DebTarContainer
       0.098s     48 calls    diffoscope.comparators.deb.DebContainer
       0.022s     40 calls    diffoscope.comparators.tar.TarContainer

## has_same_content_as (total time: 0.109s)
       0.071s   1057 calls    abc.TextFile
       0.016s      9 calls    abc.DebFile
       0.013s      6 calls    abc.DebDataTarFile
       0.003s    338 calls    diffoscope.comparators.elf.ElfSection
       0.002s     18 calls    abc.ElfFile
       0.001s     12 calls    abc.Md5sumsFile
       0.001s     12 calls    abc.XzFile
       0.001s      6 calls    abc.TarFile
       0.001s     68 calls    diffoscope.comparators.elf.ElfCodeSection
       0.000s      3 calls    diffoscope.comparators.utils.libarchive.LibarchiveSymlink
       0.000s     34 calls    diffoscope.comparators.elf.ElfStringSection
       0.000s      1 call     diffoscope.comparators.directory.FilesystemDirectory

## main (total time: 935.422s)
     935.417s      2 calls    outputs
       0.005s      1 call     cleanup

## open_archive (total time: 0.001s)
       0.000s     36 calls    diffoscope.comparators.xz.XzContainer
       0.000s     24 calls    diffoscope.comparators.tar.TarContainer
       0.000s     12 calls    diffoscope.comparators.deb.DebTarContainer
       0.000s     12 calls    diffoscope.comparators.deb.DebContainer

## output (total time: 12.131s)
      12.131s      1 call     text

## recognizes (total time: 8.809s)
       7.074s  51970 calls    diffoscope.comparators.utils.libarchive.LibarchiveMember
       1.111s   7436 calls    diffoscope.comparators.elf.ElfSection
       0.240s   1496 calls    diffoscope.comparators.elf.ElfCodeSection
       0.221s   1488 calls    diffoscope.comparators.utils.archive.ArchiveMember
       0.119s    748 calls    diffoscope.comparators.elf.ElfStringSection
       0.043s    468 calls    diffoscope.comparators.binary.FilesystemFile
       0.001s     96 calls    abc.Md5sumsFile
       0.000s     12 calls    diffoscope.comparators.utils.libarchive.LibarchiveSymlink

## specialize (total time: 9.476s)
       9.476s   1564 calls    specialize



And again for html generation:

# Profiling output for: /usr/bin/diffoscope --progress --profile --html halide.diffoscope.out.html control/ experiment-1

## close_archive (total time: 0.001s)
0.000s     24 calls    diffoscope.comparators.tar.TarContainer
0.000s     36 calls    diffoscope.comparators.xz.XzContainer
0.000s     12 calls    diffoscope.comparators.deb.DebContainer
0.000s     12 calls    diffoscope.comparators.deb.DebTarContainer

## command (total time: 1784.837s)
784.955s   8949 calls    readelf
757.622s   2011 calls    diff
235.317s   1188 calls    objdump
4.824s     36 calls    xz
2.056s    198 calls    strings
0.027s      2 calls    cmp
0.027s      2 calls    cmp (external)
0.004s     54 calls    cmp (internal)
0.003s     66 calls    stat
0.002s     18 calls    objcopy

## compare_files (cumulative) (total time: 3866.521s)
915.357s      6 calls    abc.DebFile
914.809s     12 calls    abc.XzFile
909.520s      6 calls    abc.DebDataTarFile
856.681s     18 calls    abc.ElfFile
184.384s     68 calls    diffoscope.comparators.elf.ElfCodeSection
46.102s    338 calls    diffoscope.comparators.elf.ElfSection
29.150s   1047 calls    abc.TextFile
9.956s     34 calls    diffoscope.comparators.elf.ElfStringSection
0.440s      6 calls    abc.TarFile
0.123s      6 calls    abc.Md5sumsFile
0.001s      1 call     diffoscope.comparators.directory.FilesystemDirectory

## container_extract (total time: 6.051s)
4.827s     36 calls    diffoscope.comparators.xz.XzContainer
1.104s   6388 calls    diffoscope.comparators.deb.DebTarContainer
0.096s     48 calls    diffoscope.comparators.deb.DebContainer
0.023s     40 calls    diffoscope.comparators.tar.TarContainer

## diff (total time: 15.681s)
15.681s   7207 calls    linediff

## has_same_content_as (total time: 0.109s)
0.070s   1057 calls    abc.TextFile
0.016s      6 calls    abc.DebDataTarFile
0.014s      9 calls    abc.DebFile
0.003s    338 calls    diffoscope.comparators.elf.ElfSection
0.002s     18 calls    abc.ElfFile
0.001s     12 calls    abc.Md5sumsFile
0.001s     12 calls    abc.XzFile
0.001s      6 calls    abc.TarFile
0.001s     68 calls    diffoscope.comparators.elf.ElfCodeSection
0.000s      3 calls    diffoscope.comparators.utils.libarchive.LibarchiveSymlink
0.000s     34 calls    diffoscope.comparators.elf.ElfStringSection
0.000s      1 call     diffoscope.comparators.directory.FilesystemDirectory

## main (total time: 1329.326s)
1329.326s      2 calls    outputs
0.000s      1 call     cleanup

## open_archive (total time: 0.001s)
0.000s     36 calls    diffoscope.comparators.xz.XzContainer
0.000s     24 calls    diffoscope.comparators.tar.TarContainer
0.000s     12 calls    diffoscope.comparators.deb.DebTarContainer
0.000s     12 calls    diffoscope.comparators.deb.DebContainer

## output (total time: 413.571s)
413.571s      1 call     html

## recognizes (total time: 8.921s)
7.049s  51970 calls    diffoscope.comparators.utils.libarchive.LibarchiveMember
1.216s   7436 calls    diffoscope.comparators.elf.ElfSection
0.255s   1496 calls    diffoscope.comparators.elf.ElfCodeSection
0.224s   1488 calls    diffoscope.comparators.utils.archive.ArchiveMember
0.128s    748 calls    diffoscope.comparators.elf.ElfStringSection
0.047s    468 calls    diffoscope.comparators.binary.FilesystemFile
0.001s     96 calls    abc.Md5sumsFile
0.000s     12 calls    diffoscope.comparators.utils.libarchive.LibarchiveSymlink

## specialize (total time: 9.574s)
       9.574s   1564 calls    specialize

#1012318#28
Date:
2022-06-07 23:06:05 UTC
From:
To:
Just mentioning that we already call diffoscope with --profile on
jenkins, just that if it gets killed hard it won't print it out.

Furthermore, there are 6 minutes (i.e. 0.1 hours) between SIGTERM and
SIGKILL.  I don't know at which stage of this case diffoscope was
terminated, but perhaps it's one side that could be investiage: to make
diffoscope output something on SIGTERM when it still has a chance to
continue running.

vs

So it's not really the presenter that takes excessively.  But otoh 413
seconds are more than 6 minutes.  If somebody looks into the point above
I can change jenkins to wait 12 minutes (or 18) before KILLing after the
TERM signal.

#1012318#33
Date:
2022-06-08 07:24:04 UTC
From:
To:
Vagrant Cascadian wrote:

You are right — diffoscope could get stuck processing some arbitrary
file and --timeout does not cause diffoscope abort halfway through
processing a file (or, more likely, a call to /usr/bin/readelf!) if
it's taking too long.

I have tried to make this pretty clear in the `--help` output as well
as clarifying that in any Salsa issue involving the --timeout flag
but it still seems to be causing some confusion, alas. :)  If it's
okay with you, I'll save an explanation of why this would be difficult
to implement within diffoscope for another venue, and not in this
(slightly) unrelated Debian bug report.

Either way, getting stuck on a particular file does not appear to be
the problem in this case. Assuming that the issue is not specific to
Jenkins' build of halide, you appear to be able to run it successfully
to the end...


Best wishes,

#1012318#38
Date:
2022-06-08 07:15:43 UTC
From:
To:
Mattia Rizzolo wrote:

I'm not quite sure what connection you are making between the signal
processing within diffoscope and HTML output generation. When
diffoscope receives a TERM signal, it only attempts to clean up
temporary files, it does not attempt to try and write any HTML output.
See sigterm_handler in diffoscope/main.py, etc.. (I'm not entirely
sure it should attempt to do so either. Signals are weird.)

We could certainly try and print the profiling output if we receive a
TERM though... aaaand I've pushed something to that effect to master.
I've also added some logging warnings so it's much clearer if we are
in that state.

I'm +1 for increasing the hard upper limit on Jenkins and could be
convinced to increase this TERM/KILL gap too, but as I mention above,
I'm not sure we're following the same rationale yet.


Regards,

#1012318#41
Date:
2022-06-08 10:21:32 UTC
From:
To:
Thank you for reminding me what the handler was doing!
I wrote my email yesterday without actually looking, I just vaguely
remembered it was doing _something_!

Oh, yes, that's probably what it ought to do indeed (logging that it
received a signal and leave some traces of itself on stdout, at least).

Before raising the KILL limit diffoscope should learn how to treat TERM
(and/or probably other signals too?) in a way that makes it at least
terminating its current task (i.e. killing subprocesses, and start to
render the output).  But indeed, should diffoscope really do that?


Overall, I agree we don't have a bug here.

#1012318#46
Date:
2022-06-08 12:15:17 UTC
From:
To:
Hey Mattia,

Hm, by "leave some traces of itself on stdout", what exactly do you
mean by that? You appear to be implying it should do more than simply
logging it has received a signal and is cleaning up after itself?

Hmm. Okay, let me spell out some of the downsides of doing this, if
only as a way of explaining them back to myself (!) and seeing how
much of a problem they really are:

* It's unclear whether this fits the semantics of the TERM signal.
  As you yourself ask in your reply, it is unclear whether
  diffoscope *should* actually do this. (If I were on the command-line
  and hit CTRL+C, I'm not entirely sure I'd want it to stop performing
  a diff and start opening files to write HTML...)

* Writing "signal-safe" code is fraught with weirdness, and this is
  compounded by us using a scripting language that partly abstracts
  away signal handling. (In other words, *could* we reliably do this?
  I'm not sure.)

* Related to the previous point, the code paths for this particular
  mode would be difficult to test, and even if we had good testing
  coverage, they would not be exercised very often in real life.
  Re-creating issues arising from supporting this would likely be
  almost impossible — something breaking on Jenkins would be difficult
  to reproduce locally.

* Things like subprocesses should be reaped anyway, at least according
  to my understanding about how processes work.

* It's all a bit of a workaround to "diffoscope being slow". Or, rather,
  there are higher-priority things that will avoid us reaching the
  timeout in the first place. (For instance, improving the overall
  speed of ELF handling.)

(Sorry for the sloppy braindump; again, it's not intended as a rejoinder
to what you wrote...)


Best wishes,

#1012318#49
Date:
2022-06-08 12:34:13 UTC
From:
To:
Sorry for being unclear.  I meant that what you commit is exactly what
it should have done from the start.  Currently (before your commits)
diffoscope would not print anything out after receiving TERM, save from
some .debug() output about cleaning up temp files (which are not printed
by default).

Right.  That's exactly why I also mentioned other signals, which might
have better semantics for this case (like, HUP, which nowadays is pretty
much process-specific).

(besides, doesn't CTRL+C send INT, not TERM?)

But I agree with this, so my musing above are probably better discarded
at this time, and consider this bug closed :)

#1012318#54
Date:
2022-06-08 13:27:22 UTC
From:
To:
Mattia Rizzolo wrote:

(Yes, whoops, you are absolutely right.)

Yes, I should say so. Please go ahead...


Best wishes,

#1012318#59
Date:
2022-06-08 13:50:28 UTC
From:
To:
Thus, closing.

Chris committed a couple of commits to have a nicer log on the console
when diffoscope is killed, but regardless of that, that diffoscope was
killed after a timeout on jenkins is not an intrisic bug of diffoscope.