- Package:
- ftp.debian.org
- Source:
- ftp.debian.org
- Submitter:
- Daniel Kahn Gillmor
- Date:
- 2019-11-21 19:39:04 UTC
- Severity:
- normal
The Packages file is growing, and we would like to keep it smaller.

The MD5sum lines are vestigial at this point. Anything that they do can be done better with the data from the SHA256sum lines.

Removal of the MD5sum lines would reduce the size of the gzip'ed Packages file by ~13%, a significant win for a frequently-downloaded file:

    $ grep -v ^MD5sum < /var/lib/apt/lists/ftp.debian.org_debian_dists_sid_main_binary-amd64_Packages | gzip -9 | wc -c
    9541056
    0 dkg@alice:~$ cat < /var/lib/apt/lists/ftp.debian.org_debian_dists_sid_main_binary-amd64_Packages | gzip -9 | wc -c
    10913735
    0 dkg@alice:~$ echo $(( 100 - 100 * 9541056 / 10913735 ))
    13
    0 dkg@alice:~$

This removal was attempted once before, as documented in #818463, and all of the subsequent blocking bugs appear to have been fixed in the archive for several years.

#887831 suggests that jigdo may currently still be broken if MD5sum goes away, but perhaps that's more of a reflection on the unmaintained state of jigdo than it is on the state of the archive.
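The measurement above can be repeated for any single-line field with a small helper. This is only a sketch: the function name `field_saving` is made up for illustration, and it does not handle fields with continuation lines (such as Description).

```shell
# Hypothetical helper: percentage saved in gzip -9 size when one
# single-line field is dropped from a Packages-style file.
# Usage: field_saving FIELD FILE
field_saving() {
    field=$1
    file=$2
    # Compressed size of the unmodified file.
    full=$(gzip -9 < "$file" | wc -c)
    # Compressed size with all "FIELD:" lines removed.
    stripped=$(grep -v "^${field}:" "$file" | gzip -9 | wc -c)
    # Integer percentage, same arithmetic as in the report.
    echo $(( 100 - 100 * stripped / full ))
}
```

For example, `field_saving MD5sum /var/lib/apt/lists/ftp.debian.org_debian_dists_sid_main_binary-amd64_Packages` would repeat the measurement from the report, and `field_saving Maintainer ...` would give a comparable number for another field.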
Daniel Kahn Gillmor writes:

I agree it would be nice to remove MD5sum from Packages; there are a few other fields that might also not be that useful (e.g. Maintainer).

From looking, I believe it is debian-cd's tools/grab_md5 that is using the MD5sum from Packages (and Sources) to avoid having to compute all these checksums itself.

We could look into either

 - writing MD5sum in a separate file only used by debian-cd (if present, otherwise debian-cd should fall back to using Packages), or
 - using a (truncated) sha256sum; this requires that the jigdo client only uses the "md5sum" as an opaque identifier for a file.

I've CC'ed debian-cd@ for input.

Ansgar
Hi,

since Steve McIntyre seems to be busy, i try to answer the general questions about jigdo.

The files .jigdo and .template get created by xorriso along with the creation of the .iso image file. The MD5s in .jigdo and .template are used for bringing together the file items in both formats. .template has a byte interval gap and a MD5, .jigdo has a MD5 and a package path, beginning at "pool/". Like (after gunzip *.jigdo):

  FexKzYyIVG2rRb1UjUKj8Q=Debian:pool/contrib/b/biomaj-watcher/biomaj-watcher_1.2.2-4_all.deb

Insofar the MD5 (here as base64 string "FexKzYyIVG2rRb1UjUKj8Q") is only an opaque identifier.

But at other occasions it is indeed used as error detector. See bug #772110 where jigdo-file reports a damaged download of a .deb, but is just not able to correct the problem on its own. Neither secure nor perfect. But better than no hint at all, i'd say.

In general, a change of the opaque identifier would demand changes in libjte, which produces .jigdo and .template under control of xorriso, and in jigdo-file, which would then have to learn to re-compute the identifier of a package for its imperfect check for glitches in mirror server or transport.

Changes in libjte would probably be my realm. I am ready to follow tangible instructions. Best relying on a checksum that it can already compute: (MD5), (SHA1), SHA256, SHA512.

But given Steve McIntyre's silence on the discussion of bug #887831, which is actually about beefing up jigdo-lite's initial and final tests for success to the strength of SHA512SUMS.sign and SHA512SUMS, i am pessimistic that a change from MD5 to some part of the SHA256 will happen soon in jigdo-lite/file. (He would also have to package the new libjte version.)

Further it would create the need for a legacy version of jigdo-lite/file for MD5-based jigdos which are available in the archive:

  http://cdimage.debian.org/mirror/cdimage/archive/

(Between 6 and 9 there are no iso-dvd sub directories. Since 9.2. they are back.)

-------------------------------------------------------------------

Ansgar wrote:

Looking at

  https://sources.debian.org/src/debian-cd/3.1.26/tools/grab_md5/

i think that line 105 could get changed from

  MD5=`echo $ENTRY | /bin/sed 's/:.*$//g'`

to something which uses /usr/bin/md5sum on the package file, rather than inquiring the package information.

I believe to see in line 107

  printf '%s %12.12s %s\n' $MD5 $SIZE $PATH

the production of a line for the input file of xorrisofs option -md5-list, as described in man xorrisofs:

  "Each designated file is represented in the .md5 file by a single text line: MD5 as 32 hex digits, 2 blanks, size as 12 decimal digits or blanks, 2 blanks, symbolic file address "

So $MD5 should be filled with the first word of the output of md5sum.

(Now who can guess where to find the path to the package file .deb ?)

Have a nice day :)

Thomas
Well, not just that. It grabs them for use in the jigdo file. The jigdo backend in xorriso (libjte) also checks them as it creates the ISO, for sanity checking on archive/mirror consistency right there. The actual md5 checksum is calculated by the clients too, so the latter is not really an option. I've started a local branch to update jigdo and jigit/libjte to use sha256 some time ago, but -ENOTIME. As mentioned in IRC yesterday, we will also need some time to update clients in the field to be able to upgrade safely. That includes Windows binaries (yay!)...
Sounds like this is the only option available given the constraints of deployed systems in the field. What parts of debian's internal machinery need to be updated to do such a thing? Bummer, and i feel for you. Perhaps we should officially EOL jigdo now, if no one has time to work on it. Obviously, we'd continue supporting deployed legacy systems and give them a chance (one release cycle?) to switch to something that is actually maintained, but it is doing them no favors to pretend that a system they're relying on is getting maintenance when no one has time to work on it. The time to update (or deprecate) deployed clients that depend on md5 for object integrity was something like 8 years ago when RFC 6151 was published :(
Hi,

Ansgar wrote:

Steve McIntyre wrote:

The aspect of "archive/mirror consistency" is not what i perceive as the main purpose of the MD5s. I'd rather characterize them as relation keys and as transport checksums. Not as security precaution.

If you had asked me, i would have tried to talk you out of this. The MD5s are sufficient for their purposes. Nothing essential is gained by using SHA256 instead (and why not SHA512 if security matters ?).

An estimation of birthday paradox probability with a billion .deb packages yields as upper limit for collision probability:

  1 - exp(-1e-20)

The negative power of e with exponent nearest to 0 which powl(3) can compute as non-0 is

  exp(-1e-18)

(and the result printfed by %Ld looks questionable: 0.999999999999999999).

The security weakness of jigdo-lite download is in the fact that the input file .info is not verified at all, .template is verified by an MD5 (not one of the package MD5s), and the result .iso is verified by MD5. The user has means to do better. But they are neither mandatory nor described in a way that a novice could apply them.

So the verification steps need to be augmented to match the security of user applicable SHA512SUMS.sign and SHA512SUMS. I propose to directly use this stronger authentication at the start and end of jigdo-lite, and to leave the jigdo entrails as they are.

My proposal would make this update of clients much smoother, because the old not-so-safe clients would continue to work with new jigdo files.

I wonder whether it is really that hard for debian-cd to compute the MD5s on its own, before it runs xorriso.

Who will maintain them ?

If there is expertise about MS-Windows and MacOS available, i would ask for help with the open questions in https://wiki.debian.org/JigdoOnLive which are:

 - How to get firmware and network helper software when Debian Live is up ?
 - How to get write access to the usual OS' filesystem in order to download .jigdo and .template, and in order to create the .iso file ?

(Both are questions which Debian Live should be able to answer anyways, if it shall not only be a demo but also a rescue system.)

Daniel Kahn Gillmor wrote:

But how will Debian then distribute its full DVD sets and the BD-sized ISOs ?

Have a nice day :)

Thomas
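For reference, the order of magnitude of the collision estimate in this message can be checked with the union bound over all pairs of packages. The assumptions are mine: MD5 is treated as a uniform 128-bit function, and $n = 10^9$ packages. This covers accidental collisions only and says nothing about deliberately constructed ones.

```latex
\[
P_{\mathrm{coll}}
\;\le\; \binom{n}{2} \cdot 2^{-128}
\;=\; \frac{n(n-1)}{2^{129}}
\;\approx\; \frac{10^{18}}{6.8 \times 10^{38}}
\;\approx\; 1.5 \times 10^{-21}
\;<\; 10^{-20} \;\approx\; 1 - e^{-10^{-20}}
\]
```

So the stated limit of about 1e-20 is a safe upper bound under these assumptions.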
Hi Ansgar We still do that, see /indices/md5sum.gz. Bastian
Hi, i wrote falsely: It would of course have to be "compute as non-1". Have a nice day :) Thomas
Hi, too early in the morning i wrote: I meant input file ".jigdo", not ".info". Have a nice day :) Thomas
No, *really* no. It's just bumped up my priority list now.

It's more complicated than this - we *also* use jigdo for:

 * mirroring of images, both on the mirror network and also for those of us doing release day tests etc.
 * providing a wider range of images for download without having to store all the data for ISO / BT download (e.g. a full range of DVDs, BD images, etc.)
 * archiving older releases, again so we don't have to keep *all* the ISOs *ever*

The vast majority of the usage of MD5 here is for (essentially) content-addressable storage. Given the context (with a checksum over the whole image too), this is not such a critical failing.
Sure, that's *most* of it. It's *also* checking for potential corruption in the mirror at build time. We used to have a separate slow step in debian-cd for that, then replaced it with the checking inside JTE. We *have* found occasional errors this way over the years. Agreed. [ suggestion to stay with md5 internally ] I *do* want to update things here, and it's not far off done AFAICS. But that loses the mirror-checking feature that I'd like to keep. I'm looking at moving to sha256 now, and this will pull through the whole pipeline.
Hi, i wrote: Steve McIntyre wrote: MD5 is well suited for that, as long as this is not considered to be part of an intrusion detection system. How about mirror checking by SHA256 in grab_md5, before computing the MD5 for jigdo ? This would authorize the MD5 in a similar strength as it is currently by the list from which grab_md5 reads it. But the confusion caused by the format change ... "old-old-stable" not being able to download the full DVD set of "stable". Don't forget to notify me when a new libjte tarball is ready for inclusion in GNU xorriso. Have a nice day :) Thomas
Hi, i wrote: Or you could let libjte internally compute both, SHA256 and MD5, let it work with SHA256, but store in .jigdo and .template the MD5. (I just checked the API definition. If you can tolerate the function names libjte_set_md5_path() and libjte_add_md5_demand(), then the API part used by xorriso will need no change, whatever you decide. Not so good looks the API part which re-narrates the way how genisoimage produced jigdo. Functions libjte_decide_file_jigdo() and libjte_write_match_record() have MD5 char arrays as parameters. They'd need to be deprecated and/or replaced by new functions. I am not aware of any other user of libjte except xorriso. So maybe just throw out the "Traditional Data File API". What happens to "powerpc" ISOs ? Will you backport the new JTE to genisoimage ? ) Have a nice day :) Thomas
Exactly. We're doing the I/O anyway. I'd much rather just switch from md5 to sha256 in both places and use the already-available checksum data. That's a lot of the point of the JTE design in the first place. It'll take time to switch everything - I'll make an EOL announcement. Yup, of course. :-)
Does this mean that we can drop the lines from Packages and the debian-cd and jigdo will be fine? Sorry that i still don't understand all the pieces in play here.
Is the final checksum over the whole image also MD5, or do we use something stronger? Is there a reason that a maintained version shouldn't use SHA256 instead? From the debian ecosystem perspective, it would be better to publish only a single set of "content-addressable" digests (hence this bug report), so whatever that mechanism is might as well also be cryptographically strong.
Hi,

Daniel Kahn Gillmor wrote:

range of better checksums to choose from.

A typical .jigdo file contains this header part (after gunzip):

--------------------------------------------------------------------------
[Image]
Filename=debian-9.4.0-amd64-DLBD-2.iso
Template=debian-9.4.0-amd64-DLBD-2.template
Template-MD5Sum=UUlMi543CsRBsp4bsc3qqQ
ShortInfo='Debian GNU/Linux 9.4.0 "Stretch" - Official amd64 DLBD Binary-2 20180310-11:21 (20180310)'
Info='Generated on Sat, 10 Mar 2018 11:51:35 +0000'
# Template Hex MD5Sum 51494c8b9e370ac441b29e1bb1cdeaa9
# Template size 9515642 bytes
# Image Hex MD5Sum 7ba8110513d4b78ae9a3546ad89ba91a
# Image Hex SHA1Sum 9e3d3335827d6957b4625417694b985c0d1cfb46
# Image Hex SHA256Sum 3fd0372d7b21d4e5d687029bc06760085aef5d567f38c8a2a5813ffe8ef3c938
# Image Hex SHA512Sum 2eadb17b18214d81ed0b874f16de6b678cc5f1fee93b8dc9057a3534289c5c73bd833fe9ba17632ea83a3a7e6a51ac5a9681ba63b998d682215ebbc13fe27c58
# Image size 11999660032 bytes
--------------------------------------------------------------------------

So we see that there are MD5, SHA1, SHA256, SHA512 for the resulting .iso image file. The only opportunity to check the input file .template is MD5.

But the officially advised way of verifying a Debian ISO is to use the files SHA*SUMS.sign and SHA*SUMS from the same location from where .jigdo and .template come. For example

  https://cdimage.debian.org/mirror/cdimage/archive/9.4.0/amd64/jigdo-dlbd/SHA256SUMS

has

--------------------------------------------------------------------------
3fd0372d7b21d4e5d687029bc06760085aef5d567f38c8a2a5813ffe8ef3c938  debian-9.4.0-amd64-DLBD-2.iso
7beb78f882cafe6febd43f9677e0cb46a37ff93f1cf5fefd72b5f17afb79b6aa  debian-9.4.0-amd64-DLBD-2.jigdo
9fe6e66383199303d59c7cb5315163cc1d00a1506ed279ee7cebe54ca8d85fd7  debian-9.4.0-amd64-DLBD-2.template
--------------------------------------------------------------------------

Note the match of the SHA256 sums in both, .jigdo and SHA256SUMS.

Have a nice day :)

Thomas
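The base64 Template-MD5Sum and the hex value in the "# Template Hex MD5Sum" comment are the same 16 bytes in two encodings. A small sketch to convert one into the other, assuming GNU `base64 -d` is available and assuming (not confirmed in this thread) that jigdo writes base64 without "=" padding and with "-" and "_" in place of "+" and "/". The function name `b64_to_hex` is made up.

```shell
# Hypothetical helper: convert a jigdo-style base64 MD5 to 32 hex digits.
# Usage: b64_to_hex BASE64_STRING
b64_to_hex() {
    # Map the assumed URL-safe characters back to the standard alphabet.
    s=$(printf '%s' "$1" | tr '_-' '/+')
    # Restore the "=" padding that standard base64 decoders expect.
    while [ $(( ${#s} % 4 )) -ne 0 ]; do
        s="${s}="
    done
    # Decode to raw bytes, dump as hex, and join into one string.
    hex=$(printf '%s' "$s" | base64 -d | od -An -tx1 | tr -d ' \n')
    printf '%s\n' "$hex"
}

b64_to_hex UUlMi543CsRBsp4bsc3qqQ   # prints 51494c8b9e370ac441b29e1bb1cdeaa9
```

The printed value matches the "# Template Hex MD5Sum" comment in the header above.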
Much cleaner to switch to sha256 here, I think. I'll take a look at that shortly. Working on the core jigdo tool first. We haven't made official "powerpc" ISOs in a while, so I'm not sure we need to bother.
Daniel Kahn Gillmor <dkg@fifthhorseman.net> (2019-10-22): I don't think python-apt is quite ready yet: #944696. Cheers,
Following up here for information on progress...

I have new versions of jigdo and jigit just about ready to go. I've defined a new format v2 for jigdo, which uses SHA256 instead of MD5 throughout. The tools to produce jigdo files will now allow the user to choose which format to create (defaulting to v1 *for now*), while the client tools will auto-detect and work with either format.

I'm working on a website for the new jigdo binaries, ready with Windows builds of the tools as well. Richard (original author of jigdo) is happy with what I've done and will redirect users to my new stuff.

I'm going to get that finished, then start publicising the new tools and the version switch. After a reasonable period I'll switch our production code to format v2.

So, progress...
And another update...

 * I've released and uploaded jigdo 0.8.0 with support for format v2 (mainly for end user clients). I'll upload backports of this as soon as possible, so I can get more people using it on (old)stable too. I've prepped and published the new website, complete with a set of Windows binaries for people to use.

 * I've released and uploaded jigit 1.22 (including libjte 2) with support for format v2. I've pushed a patch at Thomas so the next xorriso release should get this support.

 * I've just uploaded all the changes needed for debian-cd to use the new xorriso version and generate jigdo v2 format. It's still set to do v1 by default until we decide to switch.

I'll want to do a quick audit of the backend bits and pieces in the cdimage production and publishing next, but that's not urgent yet until...

I'm going to make a big song and dance about these changes and the new software releases on my blog, and on the CD areas of the Debian website. We need to get users in the field updated so we can switch to the new format. I *want* to give people plenty of warning before we switch, starting with testing/bullseye images.

/me heads off for an evening of game playing... :-)