#942893 ftp.debian.org: please drop MD5sum lines from Packages

#942893#5
Date:
2019-10-22 21:30:41 UTC
From:
To:
The Packages file is growing, and we would like to keep it smaller.

The MD5sum lines are vestigial at this point.  Anything that they do
can be done better with the data from the SHA256sum lines.

Removal of the MD5sum lines would reduce the size of the gzip'ed
Packages file by ~13%, a significant win for a frequently-downloaded
file:

$ grep -v ^MD5sum < /var/lib/apt/lists/ftp.debian.org_debian_dists_sid_main_binary-amd64_Packages | gzip -9 | wc -c
9541056
0 dkg@alice:~$ cat < /var/lib/apt/lists/ftp.debian.org_debian_dists_sid_main_binary-amd64_Packages | gzip -9 | wc -c
10913735
0 dkg@alice:~$ echo $(( 100 - 100 * 9541056 / 10913735 ))
13
0 dkg@alice:~$
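
The same measurement can be repeated without a shell pipeline. This is a minimal sketch, assuming the Packages file has already been read into memory as bytes. The function names are illustrative only, and Python's gzip output may differ by a few bytes from `gzip -9`.

```python
import gzip


def strip_md5sum(packages: bytes) -> bytes:
    """Drop every line that starts with 'MD5sum' (same filter as grep -v ^MD5sum)."""
    kept = [ln for ln in packages.splitlines(keepends=True)
            if not ln.startswith(b"MD5sum")]
    return b"".join(kept)


def gzip_size(data: bytes) -> int:
    """Size in bytes of the data compressed at gzip level 9."""
    return len(gzip.compress(data, compresslevel=9))


def percent_saved(before: int, after: int) -> int:
    """Integer percentage, computed like the shell arithmetic above."""
    return 100 - 100 * after // before
```

With the two sizes reported above, `percent_saved(10913735, 9541056)` reproduces the integer result of the shell arithmetic.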

This removal was attempted once before, as documented in #818463, and
all of the subsequent blocking bugs appear to have been fixed in the
archive for several years.

#887831 suggests that jigdo may currently still be broken if MD5sum
goes away, but perhaps that's more of a reflection on the unmaintained
state of jigdo than it is on the state of the archive.

#942893#8
Date:
2019-10-22 21:51:56 UTC
From:
To:
Daniel Kahn Gillmor writes:

I agree it would be nice to remove MD5sum from Packages; there are a few
other fields that might also not be that useful (e.g. Maintainer).

From looking, I believe it is debian-cd's tools/grab_md5 that is using
the MD5sum from Packages (and Sources) to avoid having to compute all
these checksums itself.

We could look into either

 - writing MD5sum in a separate file only used by debian-cd (if present,
   otherwise debian-cd should fall back to using Packages), or

 - using a (truncated) sha256sum; this requires that the jigdo client
   only uses the "md5sum" as an opaque identifier for a file.

I've CC'ed debian-cd@ for input.

Ansgar

#942893#13
Date:
2019-10-23 08:27:54 UTC
From:
To:
Hi,

since Steve McIntyre seems to be busy, i try to answer the general
questions about jigdo.

The files .jigdo and .template get created by xorriso along with the
creation of the .iso image file.

The MD5s in .jigdo and .template are used for bringing together the
file items in both formats. .template has a byte interval gap and a MD5,
.jigdo has a MD5 and a package path, beginning at "pool/".
Like (after gunzip *.jigdo):
  FexKzYyIVG2rRb1UjUKj8Q=Debian:pool/contrib/b/biomaj-watcher/biomaj-watcher_1.2.2-4_all.deb

Insofar the MD5 (here as base64 string "FexKzYyIVG2rRb1UjUKj8Q") is only
an opaque identifier.
But at other occasions it is indeed used as error detector. See bug #772110
where jigdo-file reports a damaged download of a .deb, but is just not able
to correct the problem on its own.
Neither secure nor perfect. But better than no hint at all, i'd say.

In general, a change of the opaque identifier would demand changes in
libjte, which produces .jigdo and .template under control of xorriso,
and in jigdo-file, which would then have to learn to re-compute the
identifier of a package for its imperfect check for glitches in mirror
server or transport.

Changes in libjte would probably be my realm. I am ready to follow tangible
instructions. Best relying on a checksum that it can already compute:
(MD5), (SHA1), SHA256, SHA512.

But given Steve McIntyre's silence on the discussion of bug #887831,
which is actually about beefing up jigdo-lite's initial and final
tests for success to the strength of SHA512SUMS.sign and SHA512SUMS,
i am pessimistic that a change from MD5 to some part of the SHA256
will happen soon in jigdo-lite/file.
(He would also have to package the new libjte version.)

Further it would create the need for a legacy version of jigdo-lite/file
for MD5-based jigdos which are available in the archive:
http://cdimage.debian.org/mirror/cdimage/archive/
(Between 6 and 9 there are no iso-dvd sub directories. Since 9.2. they
 are back.)
-------------------------------------------------------------------

Ansgar wrote:

Looking at
  https://sources.debian.org/src/debian-cd/3.1.26/tools/grab_md5/
i think that line 105 could get changed from
  MD5=`echo $ENTRY | /bin/sed 's/:.*$//g'`
to something which uses /usr/bin/md5sum on the package file, rather than
inquiring the package information.

I believe to see in line 107
  printf '%s %12.12s %s\n' $MD5 $SIZE $PATH
the production of a line for the input file of xorrisofs option -md5-list,
as described in man xorrisofs:
  "Each designated file is represented in the .md5 file by a single text
   line:
   MD5 as 32 hex digits, 2 blanks, size as 12 decimal digits or blanks,
   2 blanks, symbolic file address "

So $MD5 should be filled with the first word of the output of md5sum.
(Now who can guess where to find the path to the package file .deb ?)


Have a nice day :)

Thomas

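
The line format quoted from man xorrisofs can be produced directly from a package file instead of from Packages. This is a minimal sketch of that idea, not the grab_md5 code; the function name and the example symbolic address are hypothetical.

```python
import hashlib
import os


def md5_list_line(file_path: str, symbolic_address: str) -> str:
    """Build one .md5 list line: 32 hex digits, 2 blanks, size right-aligned
    in 12 columns, 2 blanks, symbolic file address."""
    md5 = hashlib.md5()
    with open(file_path, "rb") as fh:
        # Read in 1 MiB chunks so large .deb files are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
    size = os.path.getsize(file_path)
    return f"{md5.hexdigest()}  {size:>12d}  {symbolic_address}"
```

The cost of this approach is one full read of every package file, which is the I/O that reading the checksum from Packages currently avoids.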
#942893#18
Date:
2019-10-23 15:39:24 UTC
From:
To:
Well, not just that. It grabs them for use in the jigdo file. The
jigdo backend in xorriso (libjte) also checks them as it creates the
ISO, for sanity checking on archive/mirror consistency right there.

The actual md5 checksum is calculated by the clients too, so the
latter is not really an option.

I've started a local branch to update jigdo and jigit/libjte to use
sha256 some time ago, but -ENOTIME. As mentioned in IRC yesterday, we
will also need some time to update clients in the field to be able to
upgrade safely. That includes Windows binaries (yay!)...

#942893#23
Date:
2019-10-24 01:29:53 UTC
From:
To:
Sounds like this is the only option available given the constraints of
deployed systems in the field.

What parts of debian's internal machinery need to be updated to do such
a thing?

Bummer, and i feel for you.

Perhaps we should officially EOL jigdo now, if no one has time to work
on it.

Obviously, we'd continue supporting deployed legacy systems and give
them a chance (one release cycle?) to switch to something that is
actually maintained, but it is doing them no favors to pretend that a
system they're relying on is getting maintenance when no one has time to
work on it.

The time to update (or deprecate) deployed clients that depend on md5
for object integrity was something like 8 years ago when RFC 6151 was
published :(

#942893#28
Date:
2019-10-24 07:05:16 UTC
From:
To:
Hi,

Ansgar wrote:

Steve McIntyre wrote:

The aspect of "archive/mirror consistency" is not what i perceive as
the main purpose of the MD5s. I'd rather characterize them as relation
keys and as transport checksums.
Not as security precaution.

If you had asked me, i would have tried to talk you out of this.

The MD5s are sufficient for their purposes. Nothing essential is gained
by using SHA256 instead (and why not SHA512 if security matters ?).
An estimation of birthday paradox probability with a billion .deb packages
yields as upper limit for collision probability: 1 - e exp -1e-20
The negative e potency nearest to 0 which powl(3) can compute as non-0
is e exp -1e-18 (and the result printfed by %Ld looks questionable:
0.999999999999999999).
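
The estimate can be checked numerically. This is a sketch using the usual birthday approximation 1 - exp(-n(n-1)/2^(bits+1)), which is an assumption about the formula meant above; the function name is illustrative. `math.expm1` avoids the rounding problem described for powl(3), because it evaluates exp(x) - 1 without first forming a value indistinguishable from 1.

```python
import math


def birthday_collision_bound(n_items: int, digest_bits: int) -> float:
    """Approximate collision probability 1 - exp(-n(n-1) / 2^(bits+1)).

    math.expm1 keeps precision for tiny exponents, where 1 - exp(-x)
    would round to 0.0 in double precision.
    """
    x = n_items * (n_items - 1) / 2 ** (digest_bits + 1)
    return -math.expm1(-x)
```

For a billion items and a 128-bit digest the result stays below the 1e-20 limit given above, while the naive `1 - math.exp(-1e-20)` evaluates to exactly 0.0.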

The security weakness of jigdo-lite download is in the fact that the
input file .info is not verified at all, .template is verified by an MD5
(not one of the package MD5s), and the result .iso is verified by MD5.

The user has means to do better. But they are neither mandatory nor
described in a way that a novice could apply them.
So the verification steps need to be augmented to match the security of
user applicable SHA512SUMS.sign and SHA512SUMS.

I propose to directly use this stronger authentication at the start
and end of jigdo-lite, and to leave the jigdo entrails as they are.

My proposal would make this update of clients much smoother, because the
old not-so-safe clients would continue to work with new jigdo files.

I wonder whether it is really that hard for debian-cd to compute the MD5s
on its own, before it runs xorriso.

Who will maintain them ?

If there is expertise about MS-Windows and MacOS available, i would ask
for help with the open questions in

https://wiki.debian.org/JigdoOnLive

which are:

- How to get firmware and network helper software when Debian Live is up ?

- How to get write access to the usual OS' filesystem in order to download
  .jigdo and .template, and in order to create the .iso file ?

(Both are questions which Debian Live should be able to answer anyways,
 if it shall not only be a demo but also a rescue system.)


Daniel Kahn Gillmor wrote:

But how will Debian then distribute its full DVD sets and the BD-sized
ISOs ?


Have a nice day :)

Thomas

#942893#33
Date:
2019-10-24 07:10:50 UTC
From:
To:
Hi Ansgar

We still do that, see /indices/md5sum.gz.

Bastian

#942893#38
Date:
2019-10-24 07:22:11 UTC
From:
To:
Hi,

i wrote falsely:

It would of course have to be "compute as non-1".


Have a nice day :)

Thomas

#942893#43
Date:
2019-10-24 09:26:08 UTC
From:
To:
Hi,

too early in the morning i wrote:

I meant input file ".jigdo", not ".info".


Have a nice day :)

Thomas

#942893#48
Date:
2019-10-24 10:16:10 UTC
From:
To:
No, *really* no. It's just bumped up my priority list now.

It's more complicated than this - we *also* use jigdo for:

 * mirroring of images, both on the mirror network and also for those
   of us doing release day tests etc.

 * providing a wider range of images for download without having to
   store all the data for ISO / BT download (e.g. a full range of
   DVDs, BD images, etc.)

 * archiving older releases, again so we don't have to keep *all* the
   ISOs *ever*

The vast majority of the usage of MD5 here is for (essentially)
content-addressable storage. Given the context (with a checksum over
the whole image too), this is not such a critical failing.

#942893#53
Date:
2019-10-24 10:23:21 UTC
From:
To:
Sure, that's *most* of it.

It's *also* checking for potential corruption in the mirror at build
time. We used to have a separate slow step in debian-cd for that, then
replaced it with the checking inside JTE. We *have* found occasional
errors this way over the years.

Agreed.

[ suggestion to stay with md5 internally ]

I *do* want to update things here, and it's not far off done AFAICS.

But that loses the mirror-checking feature that I'd like to keep. I'm
looking at moving to sha256 now, and this will pull through the whole
pipeline.

#942893#58
Date:
2019-10-24 10:56:53 UTC
From:
To:
Hi,

i wrote:

Steve McIntyre wrote:

MD5 is well suited for that, as long as this is not considered to be part
of an intrusion detection system.

How about mirror checking by SHA256 in grab_md5, before computing the
MD5 for jigdo ?
This would authorize the MD5 in a similar strength as it is currently by
the list from which grab_md5 reads it.
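
A sketch of that idea, assuming the expected SHA256 comes from trusted archive metadata: the file is read once, both digests are updated from the same chunks, and the MD5 is handed out only if the SHA256 matches. The function name is hypothetical and this is not grab_md5 or libjte code.

```python
import hashlib
from typing import BinaryIO


def checked_md5(stream: BinaryIO, expected_sha256_hex: str) -> str:
    """Read the stream once, verify its SHA256, and return its MD5 hex digest.

    Raises ValueError if the SHA256 does not match the expected value.
    """
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    for chunk in iter(lambda: stream.read(1 << 20), b""):
        sha256.update(chunk)
        md5.update(chunk)
    if sha256.hexdigest() != expected_sha256_hex.lower():
        raise ValueError("SHA256 mismatch: file differs from archive metadata")
    return md5.hexdigest()
```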

But the confusion caused by the format change ...
"old-old-stable" not being able to download the full DVD set of "stable".

Don't forget to notify me when a new libjte tarball is ready for inclusion
in GNU xorriso.


Have a nice day :)

Thomas

#942893#63
Date:
2019-10-24 11:27:44 UTC
From:
To:
Hi,

i wrote:

Or you could let libjte internally compute both, SHA256 and MD5,
let it work with SHA256, but store in .jigdo and .template the MD5.


(I just checked the API definition. If you can tolerate the function
 names libjte_set_md5_path() and libjte_add_md5_demand(), then the
 API part used by xorriso will need no change, whatever you decide.

 Not so good looks the API part which re-narrates the way how genisoimage
 produced jigdo. Functions libjte_decide_file_jigdo() and
 libjte_write_match_record() have MD5 char arrays as parameters.
 They'd need to be deprecated and/or replaced by new functions.
 I am not aware of any other user of libjte except xorriso. So maybe
 just throw out the "Traditional Data File API".

 What happens to "powerpc" ISOs ?
 Will you backport the new JTE to genisoimage ?
)


Have a nice day :)

Thomas

#942893#68
Date:
2019-10-24 13:24:37 UTC
From:
To:
Exactly.
We're doing the I/O anyway. I'd much rather just switch from md5 to
sha256 in both places and use the already-available checksum
data. That's a lot of the point of the JTE design in the first place.

It'll take time to switch everything - I'll make an EOL announcement.

Yup, of course. :-)

#942893#73
Date:
2019-10-24 16:14:54 UTC
From:
To:
Does this mean that we can drop the lines from Packages and the
debian-cd and jigdo will be fine?

Sorry that i still don't understand all the pieces in play here.

#942893#78
Date:
2019-10-24 16:13:59 UTC
From:
To:
Is the final checksum over the whole image also MD5, or do we use
something stronger?

Is there a reason that a maintained version shouldn't use SHA256
instead?

From the debian ecosystem perspective, it would be better to publish
only a single set of "content-addressable" digests (hence this bug
report), so whatever that mechanism is might as well also be
cryptographically strong.

#942893#83
Date:
2019-10-24 18:17:32 UTC
From:
To:
Hi,

Daniel Kahn Gillmor wrote:
range of better checksums to choose from.

A typical .jigdo file contains this header part (after gunzip):
--------------------------------------------------------------------------
  [Image]
  Filename=debian-9.4.0-amd64-DLBD-2.iso
  Template=debian-9.4.0-amd64-DLBD-2.template
  Template-MD5Sum=UUlMi543CsRBsp4bsc3qqQ
  ShortInfo='Debian GNU/Linux 9.4.0 "Stretch" - Official amd64 DLBD Binary-2 20180310-11:21 (20180310)'
  Info='Generated on Sat, 10 Mar 2018 11:51:35 +0000'
  # Template Hex MD5Sum 51494c8b9e370ac441b29e1bb1cdeaa9
  # Template size 9515642 bytes
  # Image Hex MD5Sum 7ba8110513d4b78ae9a3546ad89ba91a
  # Image Hex SHA1Sum 9e3d3335827d6957b4625417694b985c0d1cfb46
  # Image Hex SHA256Sum 3fd0372d7b21d4e5d687029bc06760085aef5d567f38c8a2a5813ffe8ef3c938
  # Image Hex SHA512Sum 2eadb17b18214d81ed0b874f16de6b678cc5f1fee93b8dc9057a3534289c5c73bd833fe9ba17632ea83a3a7e6a51ac5a9681ba63b998d682215ebbc13fe27c58
  # Image size 11999660032 bytes
--------------------------------------------------------------------------

So we see that there are MD5, SHA1, SHA256, SHA512 for the resulting .iso
image file. The only opportunity to check the input file .template is MD5.
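
The Template-MD5Sum value and the "Template Hex MD5Sum" comment in the header above are two encodings of the same 16 bytes. A minimal sketch of the conversion, assuming unpadded base64 with the URL-safe alphabet (the digest shown above contains neither of the characters where the alphabets differ, so it does not settle that detail); the function names are illustrative.

```python
import base64


def hex_to_jigdo_b64(hex_digest: str) -> str:
    """Encode a hex digest as unpadded base64 (URL-safe alphabet assumed)."""
    raw = bytes.fromhex(hex_digest)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def jigdo_b64_to_hex(b64_digest: str) -> str:
    """Inverse of hex_to_jigdo_b64: restore padding, then decode to hex."""
    padded = b64_digest + "=" * (-len(b64_digest) % 4)
    return base64.urlsafe_b64decode(padded).hex()
```

Applied to the header above, `hex_to_jigdo_b64("51494c8b9e370ac441b29e1bb1cdeaa9")` gives `UUlMi543CsRBsp4bsc3qqQ`.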

But the officially advised way of verifying a Debian ISO is to use the
files SHA*SUMS.sign and SHA*SUMS from the same location from where .jigdo
and .template come.
For example
https://cdimage.debian.org/mirror/cdimage/archive/9.4.0/amd64/jigdo-dlbd/SHA256SUMS
has
--------------------------------------------------------------------------
  3fd0372d7b21d4e5d687029bc06760085aef5d567f38c8a2a5813ffe8ef3c938  debian-9.4.0-amd64-DLBD-2.iso
  7beb78f882cafe6febd43f9677e0cb46a37ff93f1cf5fefd72b5f17afb79b6aa  debian-9.4.0-amd64-DLBD-2.jigdo
  9fe6e66383199303d59c7cb5315163cc1d00a1506ed279ee7cebe54ca8d85fd7  debian-9.4.0-amd64-DLBD-2.template
--------------------------------------------------------------------------

Note the match of the SHA256 sums in both, .jigdo and SHA256SUMS.
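
Checking a downloaded file against such a list could look like the following sketch. It assumes the "digest, blanks, file name" line layout shown above, uses hypothetical function names, and covers only the content check: verifying SHA256SUMS itself against SHA256SUMS.sign is a separate step that is not shown.

```python
import hashlib
from typing import Dict


def parse_sums(text: str) -> Dict[str, str]:
    """Map file name -> hex digest from 'digest  name' lines."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        # sha256sum marks binary mode with a leading '*' on the name.
        sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def matches_sums(path: str, name: str, sums: Dict[str, str]) -> bool:
    """True if the file at path hashes to the digest listed for name."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha256.update(chunk)
    return sums.get(name) == sha256.hexdigest()
```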


Have a nice day :)

Thomas

#942893#88
Date:
2019-10-25 16:01:07 UTC
From:
To:
Much cleaner to switch to sha256 here, I think.

I'll take a look at that shortly. Working on the core jigdo tool first.

We haven't made official "powerpc" ISOs in a while, so I'm not sure we
need to bother.

#942893#93
Date:
2019-11-14 00:41:14 UTC
From:
To:
Daniel Kahn Gillmor <dkg@fifthhorseman.net> (2019-10-22):

I don't think python-apt is quite ready yet: #944696.


Cheers,

#942893#98
Date:
2019-11-15 17:23:41 UTC
From:
To:
Following up here for information on progress...

I have new versions of jigdo and jigit just about ready to go. I've
defined a new format v2 for jigdo, which uses SHA256 instead of MD5
throughout. The tools to produce jigdo files will now allow the user
to choose which format to create (defaulting to v1 *for now*), while
the client tools will auto-detect and work with either format.

I'm working on a website for the new jigdo binaries, ready with
Windows builds of the tools as well. Richard (original author of
jigdo) is happy with what I've done and will redirect users to my new
stuff.

I'm going to get that finished, then start publicising the new tools
and the version switch. After a reasonable period I'll switch our
production code to format v2.

So, progress...

#942893#103
Date:
2019-11-21 19:38:01 UTC
From:
To:
And another update...

 * I've released and uploaded jigdo 0.8.0 with support for format v2
   (mainly for end user clients). I'll upload backports of this as
   soon as possible, so I can get more people using it on (old)stable
   too. I've prepped and published the new website, complete with a
   set of Windows binaries for people to use.

 * I've released and uploaded jigit 1.22 (including libjte 2) with
   support for format v2. I've pushed a patch at Thomas so the next
   xorriso release should get this support.

 * I've just uploaded all the changes needed for debian-cd to use the
   new xorriso version and generate jigdo v2 format. It's still set to
   do v1 by default until we decide to switch.

I'll want to do a quick audit of the backend bits and pieces in the
cdimage production and publishing next, but that's not urgent yet
until...

I'm going to make a big song and dance about these changes and the new
software releases on my blog, and on the CD areas of the Debian
website. We need to get users in the field updated so we can switch to
the new format. I *want* to give people plenty of warning before we
switch, starting with testing/bullseye images.

/me heads off for an evening of game playing... :-)