#143736 print a warning when Content-Length and actual data mismatch

Package:
wget
Source:
wget
Description:
retrieves files from the web
Submitter:
"Juha Jäykkä"
Date:
2015-12-11 11:36:10 UTC
Severity:
wishlist
#143736#5
Date:
2002-04-20 11:16:15 UTC
From:
To:
  If the HTTP Content-Length header differs from the actual data length,
wget disregards the HTTP specification as follows:
1) If Content-Length is greater than the actual data, wget keeps retrying to
receive the whole file indefinitely. The command-line option
--ignore-length fixes this, but should it not be on by default?
2) If Content-Length is smaller than the actual data sent by the server, wget
happily downloads it all instead of stopping at whatever Content-Length
specified. This is contrary to the spec, which strictly states that
Content-Length must be obeyed and that the user must be notified that
something strange happened. Wget correctly tells the user that it received
nnn/mmm bytes, where mmm is Content-Length, but should there not be an
error message, too?

#143736#10
Date:
2002-04-23 11:07:19 UTC
From:
To:
tags 143736 + upstream
forwarded 143736 bug-wget@gnu.org
thanks

I have reported your bug to upstream.

Thanks for your report.

#143736#19
Date:
2002-04-23 13:55:23 UTC
From:
To:
Noel Koethe <noel@koethe.net> writes:

It doesn't disregard the HTTP specification.  As far as I'm aware,
HTTP simply specifies that the information provided by Content-Length
must be correct.  When it is not correct, the protocol has been broken
by the server and the best Wget can do is try to make sense of the
situation.  In both cases you report, Wget's behavior is by design.

Not indefinitely, but until `--tries' attempts (20 by default) have
been exhausted.

No.  When you're downloading files over a slow or unstable network,
you will often get EOF while reading data.  Retrying in spite of that
EOF has been one of Wget's primary features since the very beginning.

So Wget is not disregarding the spec, it is *honoring* it by assuming
that the provided Content-Length is correct, as it should be.  This
feature has made many a download possible.  In the cases where the
content-length header truly is broken, use `--ignore-length'.

Again, this is a feature.  Broken CGI scripts often report broken
values for `Content-Length'.  When more data arrives, it becomes
apparent that the reported value is *broken* (unlike in the case when
less data arrives).  Wget can either dismiss the rest of the data or
dismiss the header.  I judged the data actually transmitted over the
wire to be more important than one obviously broken header.

The exception is when persistent connections are used.  In that case,
Content-Length is honored to the letter, and the remote server had
*better* provide the correct value, or else.
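
The policy described above can be sketched roughly as follows. This is a
hypothetical model written for illustration, not Wget's actual code: the
names `Action` and `decide` and their parameters are assumptions, and the
persistent-connection case (where exactly Content-Length bytes are read)
is left out.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"                    # nothing further to do
    RETRY = "retry"                  # short read: assume the network dropped
    GIVE_UP = "give up"              # short read and no tries left
    KEEP_EXTRA = "keep extra data"   # long read: the header was wrong


def decide(expected, received, attempt, tries=20, ignore_length=False):
    """Hypothetical model of the behavior described in this message.

    expected: Content-Length value, or None if the header was absent
    received: number of body bytes read before EOF
    attempt:  1-based number of the attempt that just finished
    tries:    retry limit (20 is the default quoted for `--tries')
    """
    if expected is None or ignore_length:
        # Without a usable length, EOF is the only end-of-body signal.
        return Action.DONE
    if received < expected:
        # Cannot tell a faulty network from a faulty header: trust the header.
        return Action.RETRY if attempt < tries else Action.GIVE_UP
    if received > expected:
        # The header is provably wrong: prefer the data that arrived.
        return Action.KEEP_EXTRA
    return Action.DONE
```

For example, `decide(100, 50, attempt=1)` asks for a retry, while
`decide(100, 150, attempt=1)` keeps the extra data.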

Which spec says that?

#143736#24
Date:
2002-04-25 04:52:16 UTC
From:
To:
Hrvoje Niksic wrote:

Quoting from section 7.2.2 of RFC 1945:

   When an Entity-Body is included with a message, the length of that
   body may be determined in one of two ways. If a Content-Length header
   field is present, its value in bytes represents the length of the
   Entity-Body. Otherwise, the body length is determined by the closing
   of the connection by the server.

      Note: Some older servers supply an invalid Content-Length when
      sending a document that contains server-side includes dynamically
      inserted into the data stream. It must be emphasized that this
      will not be tolerated by future versions of HTTP. Unless the
      client knows that it is receiving a response from a compliant
      server, it should not depend on the Content-Length value being
      correct.
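
As a rough illustration of the two ways of finding the end of an
Entity-Body that the quoted section describes, here is a minimal sketch.
The function name `read_entity_body` and its return convention are
assumptions made for this example and do not come from any real client.

```python
import io


def read_entity_body(stream, content_length=None):
    """Read a body from a binary file-like object.

    Returns (body, complete). `complete` is None when no Content-Length
    was given, because then only the closing of the connection marks the
    end of the body and a truncated transfer cannot be detected.
    """
    if content_length is None:
        # No header: the body runs until the server closes the connection.
        return stream.read(), None
    # Header present: read at most that many bytes and report a short read.
    body = stream.read(content_length)
    return body, len(body) == content_length


# Usage with an in-memory stream standing in for a connection:
body, complete = read_entity_body(io.BytesIO(b"hello world"), 5)
```

Note that this variant never reads past Content-Length, so it cannot
notice that a server sent more data than it announced.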


Since wget is an HTTP/1.0 client, its behavior is entirely consistent with
the specification. Noel was probably thinking of RFC 2068, which says:

   When a Content-Length is given in a message where a message-body is
   allowed, its field value MUST exactly match the number of OCTETs in
   the message-body. HTTP/1.1 user agents MUST notify the user when an
   invalid length is received and detected.

But until wget is upgraded to be a 1.1 client, it does not need to worry
(much) about RFC 2068. Even after the conversion, the only obvious change
that is needed is to include a message about the invalid length in wget's
output, which most users will probably overlook anyway.

Tony

#143736#29
Date:
2002-04-25 10:55:27 UTC
From:
To:
"Tony Lewis" <tlewis@exelana.com> writes:

Even so, when less data have been received, it's impossible to detect
whether that's because of a faulty network or a faulty server.  Wget
defaults to believing the server, which is in conformance with HTTP.

If your point is that Wget should print a warning when it can *prove*
that the Content-Length data it received was faulty, as in the case of
having received more data, I agree.  We're already printing a similar
warning when Last-Modified is invalid, for example.
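
A minimal sketch of such a check, assuming a hypothetical helper
`length_warning` and an invented message text (neither is taken from
Wget), might look like this:

```python
def length_warning(expected, received):
    """Return a warning string only when the header is provably wrong.

    expected: Content-Length value, or None if the header was absent
    received: number of body bytes actually read
    """
    if expected is not None and received > expected:
        # More data than announced: the header cannot have been correct.
        return ("warning: Content-Length announced %d bytes, "
                "but %d were received" % (expected, received))
    # Fewer bytes may just mean a broken network, so stay silent.
    return None
```

A short read returns None here because, as noted above, it cannot be
attributed to the server with certainty.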

#143736#34
Date:
2002-04-25 14:02:04 UTC
From:
To:
Hrvoje Niksic wrote:
and T. Berners-Lee what they were thinking. <grin> I was just quoting from
RFC 2068: Hypertext Transfer Protocol -- HTTP/1.1

As for printing a warning only when wget can "prove" that the Content-Length
data was faulty, sounds like a reasonable implementation to me.

Tony

#143736#39
Date:
2003-04-20 23:42:54 UTC
From:
To:
Dear Debian "wget" bug reporter,

I just want to let you know why your bug report is still open, and why it
may look as if nobody is working on the bugs reported against
the wget package (http://bugs.debian.org/wget).

For some months I have not reported bugs and problems to the upstream
author/mailing list (http://www.gnu.org/software/wget/#mailinglists)
because wget currently has no maintainer who works on it or fixes
bugs. :(

http://www.gnu.org/help/help.html
--8<--
We are looking for new maintainers for these GNU packages (contact
<maintainers@gnu.org> if you'd like to volunteer):
      * ...
      * wget (which still has a maintainer, but he would like to step
        down)
--8<--

This is why reporting bugs to the wget mailing list does not make sense
right now: nobody will work on them. When there is
a new maintainer I will report the open bugs in the Debian Bug Tracking
System (BTS) to him, but this may take some time (the help request on the
gnu.org page has been there for a month; maybe you know somebody :)).

Thanks for reporting bugs. :)

#143736#47
Date:
2003-10-24 11:40:42 UTC
From:
To:
Could you please close this bug?  Wget does honor Content-Length and
the HTTP specification.  For more information, see the discussion
in the bug report, especially between me and Tony Lewis.

A possible improvement (which Tony is asking for) is to print a
warning when Wget detects a mismatch between Content-Length and the
actual data.  This should be filed as a wish list item because it's
not even really a bug (HTTP/1.0 doesn't mandate such a warning,
HTTP/1.1 does).

#143736#52
Date:
2003-10-24 12:54:43 UTC
From:
To:
severity 143736 wishlist
retitle 143736 print a warning when Content-Length and actual data mismatch
thanks

On Fri, 24.10.2003 at 13:40, Hrvoje Niksic wrote:

Hello Hrvoje,

Sure, but see below (just write to nnnnn-done@bugs.debian.org; there is
no ACL in the Debian bug system).

OK, so I will change this one to a wishlist request.
