- Package: jigdo-file
- Source: jigdo
- Description: Download Debian CD/DVD/USB images from any Debian mirror
- Submitter: "Thomas Schmitt"
- Date: 2017-12-29 19:21:03 UTC
- Severity: normal
Dear Maintainer,

the attempt to create debian-6.0.7-amd64-DVD-1.iso from
  http://cdimage.debian.org/cdimage/archive/6.0.7/amd64/jigdo-dvd/debian-6.0.7-amd64-DVD-1.jigdo
and the default mirror
  http://ftp.de.debian.org/debian/
failed with the message
  Aaargh - 7 files could not be downloaded.

The reason is at least 7 packages on the fallback mirror which is
mentioned in the .jigdo file:
  http://us.cdimage.debian.org/cdimage/snapshot/Debian
They get downloaded but not grafted into the emerging ISO image.

Five of them can be fetched from
  http://archive.debian.org/debian/
and then get grafted in.
Philip Hands confirmed in
  https://lists.debian.org/debian-cd/2017/12/msg00020.html
that the remaining two package files are corrupt on the fallback mirror.
After he replaced them by correct copies, jigdo-lite could finish the job
and the ISO verifies with SHA256.

The problem is that jigdo-file does not tell any reason why the 7 or 2
files did not make it into the image. Its statement "could not be
downloaded" is wrong, because the program messages show that downloading
worked. The reason is obviously in the downloaded package bytes. But one
only sees that they get downloaded again and again.

I failed to find the code part in the source which decides over the
acceptability of a package file. So i cannot propose a patch.

My jigdo-lite and jigdo-file are not the newest Debian versions. But they
stem from the same source version and no patch on
  https://sources.debian.org/src/jigdo/0.7.3-5/debian/patches/
looks related.
Hi,

if i shall place a bet, then i'd say the checksum mismatch in
  https://sources.debian.org/src/jigdo/0.7.3-5/src/mkimage.cc/#L508
is where the message should be issued:

      ...
      case JigdoDesc::MATCHED_FILE: {
        /* If file present in cache, copy its data to image, if not, copy
           zeroes. if check==true, verify MD sum match. If successful,
           turn MatchedFile into WrittenFile. */
        ...
        if (mfile == 0 || self->md5() != *(mfile->getMD5Sum(cache))) {
          // Write right amount of zeroes

Here it should say something. The file was found but the checksum does
not match. (Still riddling under which circumstances mfile would be 0.)

          memClear(buf, readAmount);
          while (*img && toWrite > 0) {
            size_t n = (toWrite < readAmount ? toWrite : readAmount);
            writeBytes(*img, buf, n);
            reportBytesWritten(n, off, nextReport, totalBytes, reporter);
            toWrite -= n;
          }
          if (result == 0) result = 1; // Soft failure
        } else {
          /* Copy data from file to image, taking care not to
             write beyond toWrite. */
          int status = fileToImage(img, *mfile, *self, checkMD5,
              imageInfo.blockLength(), reporter, buf, readAmount, off,
              nextReport, totalBytes);
          toCopy.pop();
          if (result < status) result = status;
          if (status == 0) { // Mark file as written to image
            ...

Function fileToImage() does its own checksum verification, but i believe
to see in
  https://sources.debian.org/src/jigdo/0.7.3-5/src/mkimage.cc/#L407
that it would issue a message if this verification failed:

    } else if (checkMD5
               && (md.finish() != matched.md5()
                   || (rsyncLen > 0 && rs != matched.rsync()))) {
      err = subst(_("Error: `%1' does not match checksum in template data"),
                  fileName);
    }

So if the check in fileToImage() issues an error message, then a MD5
mismatch of the whole file should issue such a message, too.
That much is clear. But i can only hope that this is the place where in
my case the files got rejected.
I hacked myself a jigdo-file with
===========================================================================
--- src/mkimage.cc.orig 2017-12-16 16:24:10.297866522 +0100
+++ src/mkimage.cc 2017-12-16 16:55:49.369873692 +0100
@@ -507,6 +507,16 @@ namespace {
               (mfile != 0 ? mfile->leafName() : ""), toCopy.size());
         if (mfile == 0 || self->md5() != *(mfile->getMD5Sum(cache))) {
           // Write right amount of zeroes
+
+          // ts B71216 : Experimental (especially the mfile message)
+          if (mfile == 0) {
+            reporter.error(_("mfile == 0 with matched file"));
+          } else {
+            string err = subst(_("MD5 mismatch with matched file `%1'"),
+                               mfile->leafName());
+            reporter.error(err);
+          }
+
           memClear(buf, readAmount);
           while (*img && toWrite > 0) {
             size_t n = (toWrite < readAmount ? toWrite : readAmount);
===========================================================================
and tried to provoke the messages.

But Philip Hands must have been faster with repairing the package files.
My unfinished DVD-2 simply got completed.
  OK: Checksums match, image is good!

I will probably try to fake a mismatch later this weekend.
(Something like hardcoding a package name into my hack and throwing
artificial mismatch during a full download. I should have made a copy of
my incomplete DVD-2. Now each try will last 20 minutes and Telekom will
begin to hate me.)

Have a nice day :)
Thomas
Hi,

long live the netinst ISOs !

But my patch is not good. Far too many false positive messages.
At least i now know that the program execution really gets through that
code part. But obviously it does this not only with files which were
actually downloaded.

Also i learned that mfile->leafName() does not show the package file
name but rather paths like
  debian-6.0.7-amd64-netinst.iso.tmpdir/us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/o/openssl/libssl0.9.8_0.9.8o-4squeeze14_amd64.deb

But i still do not understand how this program is supposed to work on
the large scale.

Have a nice day :)
Thomas
Hi,
i identified a better candidate for detecting non-matching FilePart objects
during scanning. (I learned that jigdo-lite downloads 10 files and then lets
jigdo-file make a scan over the whole image in order to find the places
where they fit. Is this done because jigdo-file has no idea of downloading ?)
The included experimental changeset demonstrates a fallback check. Nevertheless
it seems necessary to verify the package files after downloading, not deep
in the course of image reconstruction.
The changeset is not yet intended for production.
(At least the macro SIMULATE_TRIGGER_OF_DEBIAN_BUG_884526 should get
undefined, my "ts B712.." marks and surplus empty lines should get removed.)
============================================================================
--- src/scan.hh.orig 2017-12-16 16:24:10.297866522 +0100
+++ src/scan.hh 2017-12-17 10:58:25.354118956 +0100
@@ -100,6 +100,10 @@ public:
failed, or other reasons. */
bool deleted() const { return fileSize == 0; }
+ // ts B71217 : Experimental
+ void markAsFoundInCache();
+ bool wasFoundInCache();
+
/** Do not call - this is public only because list<> must be able to
delete FileParts */
~FilePart() { }
@@ -151,6 +155,12 @@ private:
WAS_LOOKED_UP = 2,
// Write this file's info into the cache file during ~JigdoCache()
TO_BE_WRITTEN = 4
+
+ ,
+ // ts B71216 : Experimental
+ // This file in the cache was found as matching the MD5 of a template part
+ WAS_FOUND_IN_CACHE = 8
+
};
Flags flags;
bool getFlag(Flags f) const { return (flags & f) != 0; }
--- src/scan.cc.orig 2017-12-17 10:35:36.214113786 +0100
+++ src/scan.cc 2017-12-17 10:56:22.358118491 +0100
@@ -490,3 +490,17 @@ void JigdoCache::addFile(const string& n
FilePart fp(i, nameRest, fileInfo.st_size, fileInfo.st_mtime);
files.push_back(fp);
}
+//______________________________________________________________________
+
+// ts B71217 : Experimental
+void FilePart::markAsFoundInCache() {
+ setFlag(WAS_FOUND_IN_CACHE);
+}
+//______________________________________________________________________
+
+// ts B71217 : Experimental
+bool FilePart::wasFoundInCache() {
+ return getFlag(WAS_FOUND_IN_CACHE);
+}
+//______________________________________________________________________
+
--- src/mkimage.cc.orig 2017-12-16 16:24:10.297866522 +0100
+++ src/mkimage.cc 2017-12-17 12:13:26.850135952 +0100
@@ -817,11 +817,31 @@ int JigdoDesc::makeImage(JigdoCache* cac
while (ci != ce) {
// The call to getMD5Sum() may cause the whole file to be read!
const MD5Sum* md = ci->getMD5Sum(cache);
+
+#define SIMULATE_TRIGGER_OF_DEBIAN_BUG_884526 yes
+#ifdef SIMULATE_TRIGGER_OF_DEBIAN_BUG_884526
+
+ // ts B71217 : Experimental : Fake failure in debian-6.0.7-amd64-netinst
+ if (strstr(ci->leafName().c_str(), "partman-reiserfs_50_all.udeb")
+ != NULL) {
+ reporter.info(_(""));
+ reporter.info(_("ATTENTION : Faking a checksum mismatch with "
+ "package `partman-reiserfs_50_all.udeb'"));
+ ++ci;
+ continue;
+ }
+
+#endif /* SIMULATE_TRIGGER_OF_DEBIAN_BUG_884526 */
+
if (md != 0 && *md == m->md5()) {
toCopy.push(&*ci); // Found matching file
totalBytes += m->size();
debug("%1 found, pushed %2", m->md5().toString(), &*ci);
found = true;
+
+ // ts B71217 : Experimental
+ ci->markAsFoundInCache();
+
break;
}
++ci;
@@ -831,6 +851,23 @@ int JigdoDesc::makeImage(JigdoCache* cac
}
//____________________
+
+ // ts B71217 : Experimental
+ // Warn about cache files which were not used during the scan
+ ci = cache->begin();
+ while (ci != ce) {
+ if (!ci->wasFoundInCache()) {
+ reporter.info(_(""));
+ reporter.info(_("POSSIBLE FILE CORRUPTION: "
+ "Downloaded file did not fit into the template."));
+ string warn_text = subst(_("POSSIBLY CORRUPTED: `%1'"), ci->leafName());
+ reporter.info(warn_text);
+ reporter.info(_(""));
+ }
+ ++ci;
+ }
+
+
debug("JigdoDesc::mkImage: %1 missing, %2 found for copying to image, "
"%3 entries in template", missing, toCopy.size(), files.size());
============================================================================
When i use this jigdo-file with jigdo-lite and these three input lines
http://cdimage.debian.org/cdimage/archive/6.0.7/amd64/jigdo-cd/debian-6.0.7-amd64-netinst.jigdo
http://us.cdimage.debian.org/cdimage/snapshot/Debian/
then i get these messages
ATTENTION : Faking a checksum mismatch with package `partman-reiserfs_50_all.udeb'
POSSIBLE FILE CORRUPTION: Downloaded file did not fit into the template.
POSSIBLY CORRUPTED: `debian-6.0.7-amd64-netinst.iso.tmpdir/us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb'
...
-----------------------------------------------------------------
Aaargh - 1 files could not be downloaded.
The message part with the package file name can still be improved.
But i already learned more about C++ today than i ever wanted to know.
I will try to find a good spot in jigdo-lite where the freshly downloaded
package can be compared with the checksum (from the .jigdo file ?).
Have a nice day :)
Thomas
Hi,

Ouch. In jigdo-lite it is not easy to have the downloaded files verified
with the checksums of the expected FileParts.

Steve, i could need a decision in which direction i should go:

- Check .jigdo MD5s by jigdo-lite.

- Check by jigdo-file, with a new option --warn-unused-file to enable
  my "POSSIBLE FILE CORRUPTION" test when jigdo-lite is cycling between
  downloading and jigdo-file "make-image" scanning.
  (I expect this test to produce lots of false positives if jigdo-file
  would use it when exploiting a large local pool tree.)

- Declare "Won't fix" and have other fun.

---------------------------------------------------------------------
Things which are so far ok for a MD5 check in jigdo-lite:

The list of files to download is obtained by a run of
  jigdo-file print-missing-all ...
This is not too bad, because it not only delivers a list of possible URLs
per file (usually one per file) but also a MD5 in jigdo-file's modified
base64 encoding.

jigdo-file command MD5SUM is supposed to produce a disk file's MD5 in the
same format. So comparison would be possible.

If i add "http://archive.debian.org/..." to the [Servers] list in .jigdo,
i get per missing file: two URLs, one encoded MD5, and an empty line.

  http://archive.debian.org/.../openssh-client-udeb_5.5p1-6+squeeze3_amd64.udeb
  http://us.cdimage.debian.org/.../openssh/openssh-client-udeb_5.5p1-6+squeeze3_amd64.udeb
  MD5Sum:BjBWgpWgZYkV0gdXgcpm5A

  http://archive.debian.org/.../reiserfsprogs-udeb_3.6.21-1_amd64.udeb
  http://us.cdimage.debian.org/.../reiserfsprogs-udeb_3.6.21-1_amd64.udeb
  MD5Sum:HEsrTtJufOa50DKzAIQ3EA

jigdo-lite seems to expect up to 8 such URLs per file. See it counting
by fingers in line 591:

  for pass in x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx; do
    ...
    while $readLine url <&3; do
      count="x$count"
      ...
      if test "$count" != "$pass"; then continue; fi

Up to 10 collected URLs are then handed as arguments to function
fetchAndMerge which not only downloads them but also runs jigdo-file
to put them into the emerging ISO.
So this is where verifying would have to happen.

I made a plan how to give the MD5s of the URLs as further arguments to
fetchAndMerge. Since the encoded MD5s are single words one could send
them down as first argument, shift 1, and then give the other arguments
to function "fetch" for download.

---------------------------------------------------------------------
But then it becomes ugly:

Now fetchAndMerge has URLs for wget and corresponding MD5s for files.
It would need to deduce the file paths from the URLs in order to run
jigdo-file MD5SUM on them.

jigdo-file MAKE-IMAGE gets the root of the file pool. I do not dare to
guess whether only the freshly downloaded files are in there. If others
are present, the relation between downloaded files and MD5s would derail.

---------------------------------------------------------------------
Possible workaround:

I am now exploring the effort to introduce a new option for jigdo-file:

---------------------------------------------------------------------
Well, given the fact that this is only for the unusual case of damaged
files on the fallback server, one could easily argue that the risk of
a regression is not outweighed by the potential benefit.

Have a nice day :)
Thomas
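Editorial sketch: the "modified base64 encoding" mentioned above can be compared against an ordinary hexadecimal MD5 by hand. The function below is a minimal sketch, assuming that jigdo's encoding is standard Base64 with '-' and '_' in place of '+' and '/' and without '=' padding. That assumption fits the 22-character checksums quoted in this report but is not verified against the jigdo source here. The name hexToJigdoMd5 is hypothetical, and the sketch uses $( ) for readability although jigdo-lite itself sticks to backticks.

```shell
#!/bin/sh
# Hypothetical helper, not part of jigdo-lite.
# $1 = hexadecimal MD5 as printed by md5sum(1).
# Prints the 16 raw bytes as URL-safe Base64 without padding
# (ASSUMPTION: this is what jigdo-file's "MD5Sum:" lines contain).
hexToJigdoMd5() {
  hex="$1"
  while test -n "$hex"; do
    pair=$(printf '%s\n' "$hex" | cut -c1-2)
    hex=$(printf '%s\n' "$hex" | cut -c3-)
    # Emit one raw byte through an octal escape, e.g. d4 -> \324
    printf "\\$(printf '%03o' "0x$pair")"
  done | base64 | tr '+/' '-_' | tr -d '=\n'
  echo
}

# Example: encode the MD5 of a local file (path is made up)
# hexToJigdoMd5 "$(md5sum some/file.udeb | cut -d' ' -f1)"
```

If the assumption holds, the result can be compared as a plain string with the word after "MD5Sum:" in the print-missing-all output.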
Hi,

i Cc: debian-cd with this follow-up to bug 884526 in the hope to get
some review for the endeavor to detect damaged downloaded package files
during a run of jigdo-lite.

Some disputable aspects remain (plus a possible bug in current jigdo-lite,
which will vanish by my proposal). I put them behind my changeset, which
is based on Sid of today.

============================================================================
How it works:

Currently and in my proposal jigdo-lite asks jigdo-file for a list of
files yet missing in the emerging image. It downloads them in groups
of 10 and submits each group to jigdo-file for grafting them into the
image.

The list consists of text blocks of the form:

  URL of package
  Alternative URL of same package
  ... more alternative URLs ...
  MD5Sum:...
  <empty line>

One of the URLs from each block gets picked by a loop with variable "pass",
but currently jigdo-lite ignores the MD5Sum lines.
My proposal collects the MD5s and submits them to function fetchAndMerge
which currently only gets the 10 URLs.

I had to re-arrange the entrails of the list reading loop in imageDownload.
It currently ends when the 10th URL is found and thus would not have yet
reached the 10th checksum line. My proposal uses the empty line as trigger
to append the picked URL and the MD5 to the argument list.
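Editorial sketch: the block format described above can be parsed on its own, apart from the download logic. The following is a minimal sketch, not the proposed changeset. The function name pickUrls is hypothetical, and it assumes, as described above, that every block is closed by an empty line (a last block without one is silently dropped). It uses the empty line as the trigger to emit the picked URL together with its MD5.

```shell
#!/bin/sh
# Hypothetical helper, not part of jigdo-lite.
# pickUrls N : read print-missing-all style blocks on stdin and print
# "MD5 URL" for the N-th URL of each block. Blocks with fewer than N
# URLs produce no output.
pickUrls() {
  want="$1"
  n=0
  url=""
  md5=".no.MD5.known."
  while IFS= read -r line; do
    case "$line" in
      "")
        # End of block: emit the picked URL, then reset the state
        if test -n "$url"; then echo "$md5 $url"; fi
        n=0
        url=""
        md5=".no.MD5.known."
        ;;
      MD5Sum:*)
        md5="${line#MD5Sum:}"
        ;;
      *)
        n=$(expr "$n" + 1)
        if test "$n" -eq "$want"; then url="$line"; fi
        ;;
    esac
  done
}
```

Called as "pickUrls 1 <list", this corresponds to the first pass of the loop over "pass", "pickUrls 2 <list" to the second pass, and so on.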
Tested with Jessie's jigdo-file and these three input lines:
  http://cdimage.debian.org/mirror/cdimage/archive/6.0.7/amd64/jigdo-cd/debian-6.0.7-amd64-businesscard.jigdo
  http://us.cdimage.debian.org/cdimage/snapshot/Debian/

--- /usr/bin/jigdo-lite.sid 2017-12-28 14:20:23.882643023 +0100
+++ /home/thomas/projekte/jigdo_dir/jigdo-lite.sid.with_md5_check 2017-12-28 15:12:37.738654856 +0100
@@ -75,10 +75,61 @@ fetch() {
 }
 #______________________________________________________________________
 
-# Given URLs, fetch them into $imageTmp, then merge them into image
+# Simulated MD5 mismatch:
+simulateMD5Mismatch="partman-reiserfs_50_all.udeb"
+# simulateMD5Mismatch="NOT_A_PACKAGE_NAME"
+
+# Given URLs and MD5s, fetch them into $imageTmp and verify,
+# then merge them into image
 fetchAndMerge() {
+
+  # The other arguments are URLs in the same sequence as the words in md5List
+  md5List="$1"
+  shift 1
+
   if test "$#" -eq 0; then return 0; fi
   fetch --force-directories --directory-prefix="$imageTmp" -- "$@"
+
+  # Try to verify downloaded files
+  for md5 in $md5List
+  do
+    url="$1"
+    shift 1
+    test "$md5" = ".no.MD5.known." && continue
+
+    # Simulated MD5 mismatch
+    if echo "$url" | grep '/'"$simulateMD5Mismatch"'$' >/dev/null 2>&1
+    then
+      echo "ATTENTION : Faking a checksum mismatch with package $simulateMD5Mismatch" >&2
+      md5="*INVALIDATED*CHECKSUM*"
+    fi
+
+    localPath="$imageTmp"/`echo "$url" | \
+      sed -e 's/^[hH][tT][tT][pP]:\/\///' \
+          -e 's/^[hH][tT][tT][pP][sS]:\/\///' \
+          -e 's/^[fF][tT][pP]:\/\///' \
+          -e 's/^[fF][iI][lL][eE]:\/\///'`
+    if test -e "$localPath"
+    then
+      fileMD5=`$jigdoFile md5 "$localPath" 2>/dev/null | awk '{print $1}'`
+      if test "$md5" != "$fileMD5"
+      then
+        echo >&2
+        echo "WARNING: Downloaded file does not match expected MD5:" >&2
+        echo " $url" >&2
+        echo " $localPath" >&2
+        echo " expected: $md5 | $fileMD5 :downloaded" >&2
+        echo >&2
+      fi
+    else
+      echo >&2
+      echo "WARNING: File not found after download:" >&2
+      echo " $url" >&2
+      echo " $localPath" >&2
+      echo >&2
+    fi
+  done
+
   # Merge into the image
   $jigdoFile $jigdoOpts --no-cache make-image --image="$image" \
     --jigdo="$jigdoF" --template="$template" "$imageTmp"
@@ -596,31 +647,56 @@ imageDownload() {
   for pass in x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx; do
     $jigdoFile print-missing-all --image="$image" --jigdo="$jigdoF" \
       --template="$template" $jigdoOpts $uriOpts \
-      | egrep -i '^(http:|ftp:|$)' >"$list"
+      >"$list"
+    # This counts the empty lines
     missingCount=`egrep '^$' <"$list" | wc -l | sed -e 's/ *//g'`
     # Accumulate URLs in $@, pass them to fetchAndMerge in batches
     shift "$#" # Solaris /bin/sh doesn't understand "set --"
     count=""
     exec 3<"$list"
+    useUrl="."
+    md5=".no.MD5.known."
+    md5List=""
     while $readLine url <&3; do
       count="x$count"
-      if strEmpty "$url"; then count=""; continue; fi
-      if test "$count" != "$pass"; then continue; fi
-      if $noMorePasses; then
-        hrule
-        echo "$missingCount files not found in previous pass, trying"
-        echo "alternative download locations:"
-        echo
-      fi
-      noMorePasses=false
-      set -- "$@" "$url"
-      if test "$#" -ge "$filesPerFetch"; then
-        if fetchAndMerge "$@"; then true; else exec 3<&-; return 1; fi
-        shift "$#" # Solaris /bin/sh doesn't understand "set --"
+      if strEmpty "$url"
+      then
+        if test "$useUrl" != '.'
+        then
+          set -- "$@" "$useUrl"
+          md5List="$md5List $md5"
+        fi
+        if test "$#" -ge "$filesPerFetch"; then
+          if fetchAndMerge "$md5List" "$@"
+          then
+            true
+          else
+            exec 3<&-
+            return 1
+          fi
+          shift "$#" # Solaris /bin/sh doesn't understand "set --"
+          md5List=""
+        fi
+        count=""
+        useUrl="."
+        md5=".no.MD5.known."
+      elif echo " $url" | egrep '^ MD5Sum:' >/dev/null 2>&1
+      then
+        md5=`echo " $url" | sed -e 's/ MD5Sum://'`
+      elif test "$count" = "$pass"
+      then
+        useUrl="$url"
+        if $noMorePasses; then
+          hrule
+          echo "$missingCount files not found in previous pass, trying"
+          echo "alternative download locations:"
+          echo
+        fi
+        noMorePasses=false
       fi
     done
     exec 3<&-
-    if test "$#" -ge 1; then fetchAndMerge "$@" || return 1; fi
+    if test "$#" -ge 1; then fetchAndMerge "$md5List" "$@" || return 1; fi
     if $noMorePasses; then break; fi
     if test -r "$image"; then break; fi
     noMorePasses=true
============================================================================
Especially in need of attention:

- I could not find a clear description how "wget" determines the local
  file paths from URLs and option --directory-prefix="$imageTmp".
  My current conversion from URL to path is therefore purely heuristic:

    localPath="$imageTmp"/`echo "$url" | \
      sed -e 's/^[hH][tT][tT][pP]:\/\///' \
          -e 's/^[hH][tT][tT][pP][sS]:\/\///' \
          -e 's/^[fF][tT][pP]:\/\///' \
          -e 's/^[fF][iI][lL][eE]:\/\///'`

- I introduced a dependency on "awk", which was not used in jigdo-lite
  before. The task is to obtain the first word of jigdo-file's output:

    fileMD5=`$jigdoFile md5 "$localPath" 2>/dev/null | awk '{print $1}'`

- The script does not prevent files with unexpected MD5 from being
  handed over to jigdo-file for grafting into the ISO.
  This has the advantage of not hampering success in case of false alarm.
  But it may also bury the new warning messages in the avalanche of
  jigdo-lite messages.

- There is no fix yet for the potentially wrong message

    echo "Aaargh - $missingCount files could not be downloaded. [...]"

  The term "downloaded" is wrong if just the MD5 of a downloaded package
  did not match any desired file.
  A lame improvement would be
    "could not be downloaded or did not match the expected checksums."
  But better would be a list of experienced problems:
  Not downloaded, MD5 mismatch after download, not accepted into
  emerging image.

- There is still mismatch simulating code in my changeset.

    # Simulated MD5 mismatch:
    simulateMD5Mismatch="partman-reiserfs_50_all.udeb"
    # simulateMD5Mismatch="NOT_A_PACKAGE_NAME"

  It should probably be removed for a production version of jigdo-lite.
  It watches for a particular package name and defaces the expected
  MD5 string. This yields on stderr:

    ATTENTION : Faking a checksum mismatch with package partman-reiserfs_50_all.udeb

    WARNING: Downloaded file does not match expected MD5:
     http://us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
     debian-6.0.7-amd64-businesscard.iso.tmpdir/us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
     expected: *INVALIDATED*CHECKSUM* | _2QsTFP7LjB6uXQZPYSeeA :downloaded

- I assume that this line, which i removed to see the MD5s, was forgotten
  with 03.jigdo-lite-https.patch :

    | egrep -i '^(http:|ftp:|$)' >"$list"

  In my own URL converting expressions, i took the current function
  isURI() as guideline.

Have a nice day :)
Thomas
On Thu, 28 Dec 2017, "Thomas Schmitt" <scdbackup@gmx.net> wrote:
...
A rather less laboured way of getting the same effect with sed would be:
sed -e 's,^\(https\?\|ftp\|file\)://,,i'
[ Things to note about that:
s,,, in place of s/// means that no escaping of / is needed
the 'i' flag at the end makes the match case insensitive
s\? means match zero or one 's'
]
However, I doubt that it's important to worry about the potential for
unexpectedly removing a prefix of e.g. cdrom:// or ://, in which case
you could dispense with sed and instead do this:
localpath="$imageTmp/${url#[[:alpha:]]*://}"
The way it's done elsewhere in the script (which I happen to think is
pretty horrible, but that's what is already there) is using set, thus:
set -- `$jigdoFile md5sum --report=quiet "$localPath"`
which leaves the value that you are after in $1.
I also happen to think that using `` rather than $() is pretty horrible
in this day and age, but that's what's currently there throughout the
script, so I guess one should stick with that, or fix it everywhere.
Cheers, Phil.
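Editorial sketch: a middle way between the two suggestions above is a plain basic regular expression that strips any alphabetic scheme prefix without relying on the GNU 'i' flag, on \?, or on shell parameter expansion. This is a minimal sketch under the same assumption the thread makes, namely that wget stores a download below the prefix directory under host name plus path. The function name stripScheme is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of jigdo-lite.
# Prints the URL in $1 without a leading "scheme://" of letters only.
# Uses "," as the s command delimiter so that "/" needs no escaping.
stripScheme() {
  printf '%s\n' "$1" | sed -e 's,^[a-zA-Z][a-zA-Z]*://,,'
}

# Example use with a made-up download directory:
# localPath="$imageTmp/`stripScheme "$url"`"
```

Like the parameter expansion variant, this also removes prefixes such as cdrom://, so it is broader than the explicit list of http, https, ftp and file.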
Hi,
first a correction of my proposal:
The else-case with
echo "WARNING: File not found after download:" >&2
is not good.
It floods the log if one uses a mirror with few matching files.
Not finding a file after the download is a normal situation.
-----------------------------------------------------------------------
I wrote:
Philip Hands wrote:
Are these widely portable enough ?
Mine can be justified by S.R.Bourne's "The Unix System", i guess,
and it is coordinated with function isURI.
Well, my scruples are mainly about what wget guarantees to use as
local disk path. I understand that jigdo-file would be quite tolerant
as long as the file is somewhere in the "$imageTmp" tree.
Maybe i should invest a "find" run in case of missing file. The tree is small.
I wrote:
One would have to wrap the "set --" into a sub-shell, because fetchAndMerge
already tampers with its own arguments.
Like:
answer=`$jigdoFile md5sum --report=quiet "$localPath"`
fileMD5=`(set -- $answer ; echo "$1")`
Now that's really ugly.
If direct objections emerge against "awk", i'd consider some helper
function which echos "$1".
Yep. Not to speak of the headless camelBack variable names.
I strive to be minimally intrusive for the purpose and to be as
conservative as in an autotools script.
An alternative to changing the code would still be to tell the user with
the "Aaargh" text that repeated download and subsequent "Aaargh" could
indicate damaged files on the mirror. In this case the user shall search
the web for other mirrors which offer the repeatedly downloaded packages.
But that would be embarrassing for the involved programmers.
(Having script jigdo-lite instead of doing the job inside jigdo-file is
also not overly glorious ...)
Have a nice day :)
Thomas
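Editorial sketch: the helper function which echoes "$1", as considered above, could look as follows. This is a minimal sketch. The name firstWord is hypothetical, and the answer line in the example is made up for illustration; it is not real jigdo-file output. The variable must be passed unquoted so that the shell splits it into words, which also means that glob characters in the answer would get expanded.

```shell
#!/bin/sh
# Hypothetical helper, not part of jigdo-lite.
# Prints its first argument. When called with an UNQUOTED variable,
# that is the first whitespace-separated word of the variable's value.
firstWord() {
  echo "$1"
}

# Made-up answer line: a checksum word followed by a path
answer="BjBWgpWgZYkV0gdXgcpm5A  some/local/path.udeb"
fileMD5=`firstWord $answer`
echo "$fileMD5"
```

Because the word splitting happens in the caller's argument list of firstWord, the positional parameters of fetchAndMerge stay untouched and no sub-shell with "set --" is needed.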
The , rather than / feature is already in use in the script (except that
it's using s%%%).
\( \) is already in use, and AFAIK \| has been there for as long
\? _might_ be a bit later as a feature, in which case one could add
\|https, but then again isURI() doesn't match https: anyway
The i flag is a GNU extension, so is probably not that portable, so one
could go for \(http\|HTTP\|...
For the shell, I suspect that [[:alpha:]] is an innovation from the
90's, so one could play it safe (well, except that it might break with
odd codings) with [a-zA-Z]. posh doesn't seem to know about [:alpha:]
for instance.
posh does know about the ${ # } thing, but that wasn't in Solaris SVR4
shell AFAIK.
This seems preferable, and avoids new dependencies:
`$jigdoFile md5sum --report=quiet "$localPath" | sed 's/ .*$//'`
Fair enough.
Cheers, Phil.
Hi,
I wrote:
Philip Hands wrote:
I still think that the long explicit sed is clearer. But in the end it will
be up to Steve to decide which one to use.
I tested both proposals of yours and have put them as comments into my
evolving changeset.
The importance of this expression has decreased by my decision to run "find"
if the guessed local path does not lead to an existing file:
localPath=... guessed from URL ...
if test ! -e "$localPath"
then
# Maybe above guess was wrong
baseName=`basename "$url"`
localPath=`find "$imageTmp" -name "$baseName" | head -1`
fi
if test -n "$localPath" -a -e "$localPath"
then
... checksum verification ...
The use of "head" and "find" will be new in the script. But the increased
ruggedness makes it worthwhile in my opinion.
I made mini benchmarks with guessed names and found names. No significant
differences were to see.
(The tree is really small because fetchAndMerge() deletes it when the 10
files are processed.)
The effective throughput of roughly 1.5 to 2.5 MB/s is still much slower
than wget's speed report of about 5.5 MB/s.
I tried with 100 files per run of wget and "jigdo-file make-image".
No significant difference to see. It's all about mirror server latency
with each single file, i guess.
us.cdimage.debian.org is a quick one.
I wrote:
Philip Hands wrote:
I'll take this one.
Next development step will be to issue a correct "Aaargh" message and to
tell at least some of the mismatching files in that message.
Have a nice day :)
Thomas
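Editorial sketch: the guess-then-find logic described above can be put into one function for testing outside of jigdo-lite. This is a minimal sketch, not the changeset itself. The function name locateDownloaded is hypothetical, the URL-to-path guess is the same heuristic as discussed in this thread, and $( ) is used for readability although the script sticks to backticks. Note that find -name treats its argument as a pattern, so file names with glob characters would need extra care.

```shell
#!/bin/sh
# Hypothetical helper, not part of jigdo-lite.
# $1 = download tree, $2 = URL.
# Prints the local path of the downloaded file, or nothing if not found.
locateDownloaded() {
  # First guess: <tree>/<host>/<path>, i.e. the URL without its scheme
  guess="$1/$(printf '%s\n' "$2" | sed -e 's,^[a-zA-Z][a-zA-Z]*://,,')"
  if test -f "$guess"; then
    printf '%s\n' "$guess"
    return 0
  fi
  # Fallback: search the (small) tree for the last path component
  found=$(find "$1" -type f -name "$(basename "$2")" | head -n 1)
  if test -n "$found"; then
    printf '%s\n' "$found"
    return 0
  fi
  return 1
}
```

A caller would run the checksum verification only if the function prints a non-empty path, which matches the decision above not to warn about files that are simply absent after the download.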
FWIW I agree.

And with respect to the new awk dependency mentioned earlier, it's a
simple expression and awk tempts me with that all the time because it's
so much easier. Just so you know, be advised that on say Debian 8.9 it's
gawk while on Ubuntu 17.10 it's mawk, etc. Debian 9.2 is still gawk.

From which location? Germany? I find it a bit slower than average from
here inside the US, but I have large, local mirrors. And at long last,
one of my own is budgeted in 2018.

You betcha ;-)

Oh, OK. Yeah, better.

Many thanks, Thomas.
Hi,

i wrote:
Nicholas Geovanis wrote:
Yes.

How about this final message in case that files are missing and that
mismatching downloads were detected ?
(The mismatches shown are fake and recorded twice to get more than 2.)

========================================================== begin of example
-----------------------------------------------------------------
Aaargh - 1 files remain missing. This should not happen!
4 download attempts yielded files with mismatching MD5 checksums:

 http://archive.debian.org/debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
 http://archive.debian.org/debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
 http://us.cdimage.debian.org/cdimage/snapshot/Debian/pool/main/p/partman-reiserfs/partman-reiserfs_50_all.udeb
 ... the WARNING messages of this run report more ...

After a retry with the same mirror consider to search the web
for the names of remaining mismatching packages. As mirror name use the
found URL up to the "/" before the directory name "pool".

Press Return to retry downloading the missing files.
Press Ctrl-C to abort. (If you re-run jigdo-lite later, it will
resume from here, the downloaded data is not lost if you press
Ctrl-C now.)
:
========================================================== end of example

Is the advice understandable ?

I propose one repetition to remove any warnings from the user defined
mirror which were resolved by the fallback mirror. In the second run only
messages about the problematic files should emerge, hopefully making it
easier to spot the checksum warnings.

The list of URLs must be restricted to 3 and the usual messages must be
omitted in order to keep the text below 24 lines if the URLs are longer
than two lines.
The usual message appears if no mismatches were detected but still files
are missing.

I changed the old message
  Aaargh - 1 files could not be downloaded. This should not happen! ...
in both cases to
  Aaargh - 1 files remain missing. This should not happen!
because it matches better the spectrum of potential problem causes.

============================================================================
--- /usr/bin/jigdo-lite.sid 2017-12-28 14:20:23.882643023 +0100
+++ /home/thomas/projekte/jigdo_dir/jigdo-lite.sid.with_md5_check 2017-12-29 20:09:34.055048360 +0100
@@ -29,6 +29,13 @@ else
   windows=false
   nl='\n'
 fi
+
+# Counter of MD5 mismatches after download
+fileMismatchCount=0
+# Short list of URLs which yielded mismatch
+fileMismatchList=""
+# Very few mismatching URLs shall be shown at the end. They can be long.
+fileMismatchMaxRec=3
 #______________________________________________________________________
 
 # read with readline, only if running bash >=2.03 (-e gives error on POSIX)
@@ -75,10 +82,84 @@ fetch() {
 }
 #______________________________________________________________________
 
-# Given URLs, fetch them into $imageTmp, then merge them into image
+# DEVELOPMENT TEST:
+# Set to a package name of the ISO to get simulated MD5 mismatch
+# simulateMD5Mismatch="partman-reiserfs_50_all.udeb"
+simulateMD5Mismatch="NOT_A_PACKAGE_NAME"
+
+# Given URLs and MD5s, fetch them into $imageTmp and verify,
+# then merge them into image
 fetchAndMerge() {
+
+  # The other arguments are URLs in the same sequence as the words in md5List
+  md5List="$1"
+  shift 1
+
   if test "$#" -eq 0; then return 0; fi
   fetch --force-directories --directory-prefix="$imageTmp" -- "$@"
+
+  # Try to verify downloaded files
+  for md5 in $md5List
+  do
+    url="$1"
+    shift 1
+    test "$md5" = ".no.MD5.known." && continue
+
+    # Simulated MD5 mismatch
+    if echo "$url" | grep '/'"$simulateMD5Mismatch"'$' >/dev/null 2>&1
+    then
+      echo "DEVELOPMENT TEST: Faking a checksum mismatch with package $simulateMD5Mismatch" >&2
+      md5="*INVALIDATED*CHECKSUM*"
+    fi
+
+    # Alternative proposals by Philip Hands:
+    # localPath="$imageTmp"/`echo "$url" | sed -e 's,^\(https\?\|ftp\|file\)://,,i'`
+    # localPath="$imageTmp/${url#[[:alpha:]]*://}"
+
+    localPath="$imageTmp"/`echo "$url" | \
+      sed -e 's/^[hH][tT][tT][pP]:\/\///' \
+          -e 's/^[hH][tT][tT][pP][sS]:\/\///' \
+          -e 's/^[fF][tT][pP]:\/\///' \
+          -e 's/^[fF][iI][lL][eE]:\/\///'`
+
+    if test ! -e "$localPath"
+    then
+      # Maybe the file was downloaded but above guess was wrong
+      baseName=`basename "$url"`
+      localPath=`find "$imageTmp" -name "$baseName" | head -1`
+    fi
+
+    if test -n "$localPath" -a -e "$localPath"
+    then
+      fileMD5=`$jigdoFile md5sum --report=quiet "$localPath" | sed 's/ .*$//'`
+      if test "$md5" != "$fileMD5"
+      then
+        echo >&2
+        echo "WARNING: Downloaded file does not match expected MD5:" >&2
+        echo " $url" >&2
+        echo " $localPath" >&2
+        echo " expected: $md5 | $fileMD5 :downloaded" >&2
+        echo >&2
+
+        # Record info for the Aaargh message
+        fileMismatchCount=`expr "$fileMismatchCount" + 1`
+        if test "$fileMismatchCount" -le "$fileMismatchMaxRec"
+        then
+          fileMismatchList="$fileMismatchList , $url"
+        fi
+
+        # If the mismatch is simulated: play along and prevent merging
+        if test "$md5" = "*INVALIDATED*CHECKSUM*"
+        then
+          echo "DEVELOPMENT TEST: Removing file to complete mismatch simulation" >&2
+          echo >&2
+          rm "$localPath"
+        fi
+
+      fi
+    fi
+  done
+
   # Merge into the image
   $jigdoFile $jigdoOpts --no-cache make-image --image="$image" \
     --jigdo="$jigdoF" --template="$template" "$imageTmp"
@@ -574,6 +655,10 @@ imageDownload() {
   fetchTemplate || return 1
   hrule
 
+  # A new cycle of possible aaarghing begins
+  fileMismatchCount=0
+  fileMismatchList=""
+
   # If a "file:" URI was given instead of a server URL, try to merge
   # any files into the image.
   echo "Merging parts from \`file:' URIs, if any..."
@@ -596,31 +681,55 @@ imageDownload() {
   for pass in x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx; do
     $jigdoFile print-missing-all --image="$image" --jigdo="$jigdoF" \
       --template="$template" $jigdoOpts $uriOpts \
-      | egrep -i '^(http:|ftp:|$)' >"$list"
+      >"$list"
     missingCount=`egrep '^$' <"$list" | wc -l | sed -e 's/ *//g'`
     # Accumulate URLs in $@, pass them to fetchAndMerge in batches
     shift "$#" # Solaris /bin/sh doesn't understand "set --"
     count=""
     exec 3<"$list"
+    useUrl="."
+    md5=".no.MD5.known."
+    md5List=""
     while $readLine url <&3; do
       count="x$count"
-      if strEmpty "$url"; then count=""; continue; fi
-      if test "$count" != "$pass"; then continue; fi
-      if $noMorePasses; then
-        hrule
-        echo "$missingCount files not found in previous pass, trying"
-        echo "alternative download locations:"
-        echo
-      fi
-      noMorePasses=false
-      set -- "$@" "$url"
-      if test "$#" -ge "$filesPerFetch"; then
-        if fetchAndMerge "$@"; then true; else exec 3<&-; return 1; fi
-        shift "$#" # Solaris /bin/sh doesn't understand "set --"
+      if strEmpty "$url"
+      then
+        if test "$useUrl" != '.'
+        then
+          set -- "$@" "$useUrl"
+          md5List="$md5List $md5"
+        fi
+        if test "$#" -ge "$filesPerFetch"; then
+          if fetchAndMerge "$md5List" "$@"
+          then
+            true
+          else
+            exec 3<&-
+            return 1
+          fi
+          shift "$#" # Solaris /bin/sh doesn't understand "set --"
+          md5List=""
+        fi
+        count=""
+        useUrl="."
+        md5=".no.MD5.known."
+      elif echo " $url" | egrep '^ MD5Sum:' >/dev/null 2>&1
+      then
+        md5=`echo " $url" | sed -e 's/ MD5Sum://'`
+      elif test "$count" = "$pass"
+      then
+        useUrl="$url"
+        if $noMorePasses; then
+          hrule
+          echo "$missingCount files not found in previous pass, trying"
+          echo "alternative download locations:"
+          echo
+        fi
+        noMorePasses=false
       fi
     done
     exec 3<&-
-    if test "$#" -ge 1; then fetchAndMerge "$@" || return 1; fi
+    if test "$#" -ge 1; then fetchAndMerge "$md5List" "$@" || return 1; fi
     if $noMorePasses; then break; fi
     if test -r "$image"; then break; fi
     noMorePasses=true
@@ -630,21 +739,37 @@ imageDownload() {
     if test -r "$image"; then break; fi
     hrule
-    echo "Aaargh - $missingCount files could not be downloaded. This should not"
-    echo "happen! Depending on the problem, it may help to retry downloading"
-    echo "the missing files."
-    if $batch; then return 1; fi
-    if $usesDebian || $usesNonus; then
-      echo "Also, you could try changing to another Debian or Non-US server,"
-      echo "in case the one you used is out of sync."
-    fi
-    echo
-    echo "However, if all the files downloaded without errors and you"
-    echo "still get this message, it means that the files changed on the"
-    echo "server, so the image cannot be generated."
-    if $usesDebian || $usesNonus; then
-      echo "As a last resort, you could try to complete the CD image download"
-      echo "by fetching the remaining data with rsync."
+    echo "Aaargh - $missingCount files remain missing. This should not happen!"
+    if test "$fileMismatchCount" -gt 0
+    then
+      echo "$fileMismatchCount download attempts yielded files with mismatching MD5 checksums:"
+      echo
+      echo "$fileMismatchList" | sed -e 's/ , / /' -e 's/ , /\n /g'
+      if test "$fileMismatchCount" -gt "$fileMismatchMaxRec"
+      then
+        echo " ... the WARNING messages of this run report more ..."
+      fi
+      if $batch; then return 1; fi
+      echo
+      echo "After a retry with the same mirror consider to search the web"
+      echo "for the names of remaining mismatching packages. 
As mirror name" + echo 'use the found URL up to the "/" before the directory name "pool".' + else + echo "Depending on the problem, it may help to retry downloading" + echo "the missing files." + if $batch; then return 1; fi + if $usesDebian || $usesNonus; then + echo "Also, you could try changing to another Debian or Non-US server," + echo "in case the one you used is out of sync." + fi + echo + echo "However, if all the files downloaded without errors and you" + echo "still get this message, it means that the files changed on the" + echo "server, so the image cannot be generated." + if $usesDebian || $usesNonus; then + echo "As a last resort, you could try to complete the CD image download" + echo "by fetching the remaining data with rsync." + fi fi echo echo "Press Return to retry downloading the missing files." ============================================================================ Have a nice day :) Thomas