#636292 dak/apt: deficiencies at handling out-of-sync metadata

Package:
apt
Source:
apt
Description:
commandline package manager
Submitter:
Date:
2011-09-20 02:57:05 UTC
Severity:
normal
#636292#5
Date:
2011-08-02 01:10:40 UTC
From:
To:
I think I have a very good idea of what is causing all those MD5Sum
mismatch errors during apt-get update.
( http://article.gmane.org/gmane.linux.debian.user.mirrors/1368 )

You see during a single apt-get update, there will be TWO (2) queries
made to the DNS server for each ONE (1) line in a sources.list file.

I believe one query gets the thing. The other gets the checksum of the
thing.

Now you can guess what will happen when that one line is a round robin
site name.

Yup, if the _two different machines_ now being called are slightly out of
sync, naturally the checksums will not match!

The cure is to fix apt so that it only makes one query!

Making a second query not only does not even out the total load on the
servers any more, it also means there are several windows of time each
day when you are comparing apples from machine 1 to oranges from machine
2! Keep it all on one machine and you will be safe.

You can test it yourself. Turn on verbose debugging in your DNS server,
and do apt-get update, and check the log. Voila, two queries for each one line
in sources.list!

Now try a
$ ping example.com

Check your DNS logs. Only one DNS query is made, despite many repeated
connections. Ping has got it right. Apt has got it wrong.

#636292#10
Date:
2011-08-03 21:24:59 UTC
From:
To:
Gentlemen, junior programmer me has finally found the reason
behind apt's MD5Sum mismatches: multiple DNS queries!
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=636292

#636292#15
Date:
2011-08-04 08:12:41 UTC
From:
To:
retitle 636292 MD5Sum mismatch error
thanks

jidanni@jidanni.org wrote on Tue 02 Aug 2011 09:10:40 +0800:

I'm getting the error on all ftp.{uk,ch,fr}.debian.org sites, which do
not use round robin at all.

Samuel

#636292#20
Date:
2011-08-04 08:24:18 UTC
From:
To:
ST> I'm getting the error on all ftp.{uk,ch,fr}.debian.org sites, which do
ST> not use round robin at all.
All I know is rocky-mountain.csail.mit.edu is rock solid. Try that.

#636292#25
Date:
2011-08-05 01:52:24 UTC
From:
To:
Hi,

積丹尼 wrote:

That particular consequence of mirrors' use of round-robin DNS is
tracked as Bug#582352.  As far as I can tell, it violates the HTTP
spec and can confuse proxies even if the clients are fixed.  I would
be willing to carry out a protocol change to make this work (doing one
DNS query and using the IP as hostname from then on), but it's not
clear anyone involved is interested, so for now I just avoid
round-robin DNS in sources.list on machines I manage.
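
The "one DNS query, then use the IP" idea can be sketched as below. This is
only an illustration, not apt's code: pin_to_address is a hypothetical helper,
and the example keeps the original name in the Host header (rather than using
the IP as hostname) so that name-based virtual hosts would still work.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_to_address(url, ip):
    """Return (pinned_url, headers) so that requests go to `ip` while the
    server still sees the original name in the Host header.

    Hypothetical helper for illustration; not part of apt.
    """
    parts = urlsplit(url)
    host = parts.hostname
    # IPv6 literals must be bracketed inside a URL authority.
    literal = f"[{ip}]" if ":" in ip else ip
    netloc = literal if parts.port is None else f"{literal}:{parts.port}"
    pinned = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    host_header = host if parts.port is None else f"{host}:{parts.port}"
    return pinned, {"Host": host_header}


pinned, headers = pin_to_address(
    "http://ftp.us.debian.org/debian/dists/sid/InRelease", "199.6.12.70"
)
print(pinned)   # http://199.6.12.70/debian/dists/sid/InRelease
print(headers)  # {'Host': 'ftp.us.debian.org'}
```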

Thanks for the reproduction recipe.

#636292#30
Date:
2011-08-05 02:24:40 UTC
From:
To:
forcemerge 636292 582352
thanks
JN> 積丹尼 wrote:

JN> That particular consequence of mirrors' use of round-robin DNS is
JN> tracked as Bug#582352.  As far as I can tell, it violates the HTTP
JN> spec and can confuse proxies even if the clients are fixed.  I would
JN> be willing to carry out a protocol change to make this work (doing one
JN> DNS query and using the IP as hostname from then on), but it's not
JN> clear anyone involved is interested, so for now I just avoid
JN> round-robin DNS in sources.list on machines I manage.

JN> Thanks for the reproduction recipe.
I'll forcemerge the bugs. That will swing them into action.

#636292#37
Date:
2011-08-05 17:40:09 UTC
From:
To:
the A (ipv4) and AAAA (ipv6) record:
19:03:20.575070 IP localhost.35750 > localhost.domain: 41865+ A?  ftp.be.debian.org. (35)
19:03:20.575688 IP localhost.domain > localhost.35750: 41865 1/4/7 A 77.243.184.65 (281)
19:03:20.575885 IP localhost.35750 > localhost.domain: 48866+ AAAA? ftp.be.debian.org. (35)
19:03:20.576190 IP localhost.domain > localhost.35750: 48866 1/4/7 AAAA 2a01:300:11:4:2e0:81ff:fe63:cdb2 (293)

There are no other queries, and this is perfectly normal.  There
is nothing wrong with this.

Even with multiple lines in the sources.list file I only see those
2 requests.

(tested with apt 0.8.15.4, I doubt 0.8.15.5 behaves differently.)
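
To check a capture like the one above without reading it by eye, a small
script can group the outgoing queries by name and record type. This is a
sketch that assumes the tcpdump text format shown above (query lines carry an
ID followed by "+" and a type followed by "?"); queries_by_name is a made-up
helper name.

```python
import re
from collections import defaultdict

# Matches outgoing queries such as "41865+ A?  ftp.be.debian.org."
QUERY = re.compile(r":\s+\d+\+\s+(?P<rtype>[A-Z]+)\?\s+(?P<name>\S+)")


def queries_by_name(lines):
    """Map each queried name to the list of record types asked for."""
    seen = defaultdict(list)
    for line in lines:
        match = QUERY.search(line)
        if match:
            seen[match.group("name")].append(match.group("rtype"))
    return dict(seen)


capture = [
    "19:03:20.575070 IP localhost.35750 > localhost.domain: 41865+ A?  ftp.be.debian.org. (35)",
    "19:03:20.575688 IP localhost.domain > localhost.35750: 41865 1/4/7 A 77.243.184.65 (281)",
    "19:03:20.575885 IP localhost.35750 > localhost.domain: 48866+ AAAA? ftp.be.debian.org. (35)",
    "19:03:20.576190 IP localhost.domain > localhost.35750: 48866 1/4/7 AAAA 2a01:300:11:4:2e0:81ff:fe63:cdb2 (293)",
]
print(queries_by_name(capture))  # {'ftp.be.debian.org.': ['A', 'AAAA']}
```

The two queries are for different record types of the same name, not two
lookups of the same record.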

As far as I know, the hash sum mismatches have one of these
causes:
- They use an old version of the mirror script that didn't exclude
  InRelease in the first stage.  As a result the InRelease file
  was already updated while the Packages/Sources files were not
  for a long time.  This has been a problem since ftp-master started
  generating those InRelease files, which was just after the
  squeeze release.
- There is always a delay between updating the Release file and
  the Packages and Sources file, and the error should go away
  after a short time.
- ftp-master generated broken files for some reason.  It sometimes
  happens, but not that often.
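
All three causes end in the same client-side check failing: the index that
was downloaded does not match the size and digest listed for it in the
Release file. A minimal sketch of that check, assuming only the standard
"MD5Sum:" section layout (one " digest size path" line per file);
parse_md5sums and index_matches are illustrative names, not apt functions,
and the file contents are made up.

```python
import hashlib


def parse_md5sums(release_text):
    """Map path -> (md5 hex digest, size) from a Release file's MD5Sum section."""
    entries, in_section = {}, False
    for line in release_text.splitlines():
        if line.startswith("MD5Sum:"):
            in_section = True
        elif in_section and line.startswith(" "):
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
        else:
            in_section = False
    return entries


def index_matches(release_text, path, data):
    """True if `data` has the size and MD5 that Release records for `path`."""
    digest, size = parse_md5sums(release_text)[path]
    return len(data) == size and hashlib.md5(data).hexdigest() == digest


# Made-up index contents: the Release file already describes the new
# Packages file, but the mirror may still serve the old one.
old = b"Package: hello\nVersion: 1.0\n"
new = b"Package: hello\nVersion: 1.1\n"
release = "Suite: unstable\nMD5Sum:\n %s %d main/binary-i386/Packages\n" % (
    hashlib.md5(new).hexdigest(),
    len(new),
)
print(index_matches(release, "main/binary-i386/Packages", new))  # True
print(index_matches(release, "main/binary-i386/Packages", old))  # False
```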

So I suggest you make sure that all the mirrors that you see
an issue with have updated their mirror script, since I think
that's the biggest issue at the moment.

This was fixed with this commit in archvsync:
commit 77223bb1af262e139a898020a05680e932d51888
Author: Joerg Jaspert <joerg@debian.org>
Date:   Tue Feb 22 22:32:13 2011 +0100

    ftpsync

    update rsync_options1 to also exclude the newish InRelease files in the first run

    Signed-off-by: Joerg Jaspert <joerg@debian.org>

This is part of the 80387 version that you can find in
project/ftpsync/ on the Debian mirrors.  80387 was released
the next day.

If they are using this script to update the mirror, you should
be able to see the version in project/trace/

If there is no version in that file (only a date) they're probably
using an even older script that's also broken.

If they're not using that script or the latest version of it, you
will very likely see the hash sum issues during the mirror sync.
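
Checking a mirror's trace file can be scripted. The sketch below makes two
assumptions that are not stated above: that the trace file carries a line of
the form "Used ftpsync version: N", and that version numbers compare
numerically. The sample trace text and host name are made up.

```python
import re

# First ftpsync release containing the InRelease fix, per the message above.
FIXED_VERSION = 80387


def ftpsync_version(trace_text):
    """Return the ftpsync version recorded in a trace file, or None."""
    match = re.search(r"ftpsync version:\s*(\d+)", trace_text)
    return int(match.group(1)) if match else None


def classify(trace_text):
    """Classify a mirror by the ftpsync version found in its trace file."""
    version = ftpsync_version(trace_text)
    if version is None:
        return "no version recorded (probably an older script)"
    return "ok" if version >= FIXED_VERSION else "outdated"


sample = (
    "Sat Aug  6 10:12:29 UTC 2011\n"
    "Used ftpsync version: 80387\n"
    "Running on host: mirror.example.org\n"
)
print(classify(sample))                            # ok
print(classify("Sat Aug  6 10:12:29 UTC 2011\n"))  # no version recorded ...
```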


Another issue might be that you're behind some broken transparent
proxy and your connection gets directed to a different server for
each file you get.  As far as I know apt will only open 1
connection to the server and request all files over that single
connection, so this really shouldn't happen.


Kurt

#636292#42
Date:
2011-08-05 20:43:28 UTC
From:
To:
(cc's kept since I am not really sure everyone involved is subscribed
to debian-mirrors.  If you want me to start trimming them down, please
say so).

Hmm, a normal request like this is supposed to return a number of A or
AAAA records for, e.g. ftp.us.debian.org, and not just one.

Just so that we can close that door completely, does apt do the right
thing and use always the same A record or AAAA record from the returned
set, switching to the next one only if there are problems?  I believe it
does it right, but it would be nice to have a definitive answer on it
(and I don't grok apt well enough to take a quick look at the source
and check it myself).

That is actually quite possible.  However, it is also something we can
assert for sure:

So, it is time to inspect the project/trace/* files in every mirror on
the multi-mirror aliases that users have complained about.

That might not be true if it is a http/1.0 proxy, or if persistent
connections get disabled for whatever reason.  In that case, apt would
have to make multiple connections, and therefore any proxy, transparent
or not, would likely round-robin over the multiple A and AAAA records.

The answer for that would be to update our repository format to have
something seqlock-like to allow apt to detect metadata generation
mismatch, and thus be able to automatically refetch things until it gets
all metadata with the same generation number:
http://en.wikipedia.org/wiki/Seqlock
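
A rough sketch of what such a generation check could look like on the client
side. Nothing here exists in apt or in the repository format today: the
fetch callable, the generation number it returns, and FakeMirror are all
assumptions made for illustration.

```python
def fetch_consistent(fetch, names, max_attempts=5):
    """Fetch all `names`, retrying until every file reports the same generation.

    `fetch(name)` is assumed to return a (generation, payload) pair.
    """
    for _ in range(max_attempts):
        results = {name: fetch(name) for name in names}
        generations = {generation for generation, _ in results.values()}
        if len(generations) == 1:
            payloads = {name: payload for name, (_, payload) in results.items()}
            return generations.pop(), payloads
    raise RuntimeError("metadata still out of sync after %d attempts" % max_attempts)


class FakeMirror:
    """Toy mirror that is updated right after serving its first file."""

    def __init__(self):
        self.calls = 0

    def fetch(self, name):
        self.calls += 1
        generation = 1 if self.calls == 1 else 2
        return generation, f"{name}@{generation}"


mirror = FakeMirror()
generation, files = fetch_consistent(mirror.fetch, ["Release", "Packages"])
print(generation, files)  # 2 {'Release': 'Release@2', 'Packages': 'Packages@2'}
```

The first pass sees generations 1 and 2 and is discarded; the second pass is
consistent. The same loop would also cope with requests landing on different
mirrors, as long as the mirrors eventually converge.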

Maybe using rsync or ftp can help, if it enforces the "get everything
using the same connection" that http might or might not allow apt to do.
But that does NOT scale well at the mirror server side, at all.

#636292#47
Date:
2011-08-05 21:26:31 UTC
From:
To:
Oh my god even my "rock solid" rocky-mountain server is crumbling today:

W: Failed to fetch http://rocky-mountain.csail.mit.edu/debian/dists/experimental/main/binary-i386/PackagesIndex  MD5Sum mismatch

W: Failed to fetch http://rocky-mountain.csail.mit.edu/debian/dists/unstable/main/binary-i386/PackagesIndex  MD5Sum mismatch

E: Some index files failed to download. They have been ignored, or old ones used instead.

My theories are up in the air. My reputation is ruined.

#636292#52
Date:
2011-08-05 22:07:19 UTC
From:
To:
Ha ha ha, it really does split a single apt-get update into two
different places completely across the Internet.

And maybe even for singular servers like rocky-mountain... maybe
upstream from it is the same splitting problem somewhere.

Anyway here we go:
# cat /etc/apt/sources.list.d/*
deb http://ftp.us.debian.org/debian unstable contrib
# tcpflow -i ppp0 &
# apt-get update
# ls -og /tmp/m
-rw-r--r-- 1 146150 Aug  6 05:54 064.050.233.100.00080-218.163.001.135.45826
-rw-r--r-- 1  68985 Aug  6 05:54 199.006.012.070.00080-218.163.001.135.56243
-rw-r--r-- 1    185 Aug  6 05:54 218.163.001.135.45826-064.050.233.100.00080
-rw-r--r-- 1   1432 Aug  6 05:54 218.163.001.135.56243-199.006.012.070.00080
$ host ftp.us.debian.org
ftp.us.debian.org has address 128.30.2.36
ftp.us.debian.org has address 199.6.12.70
ftp.us.debian.org has address 35.9.37.225
ftp.us.debian.org has address 64.50.233.100
ftp.us.debian.org has address 64.50.236.52
ftp.us.debian.org has IPv6 address 2001:500:61:28::70

#636292#57
Date:
2011-08-06 01:30:44 UTC
From:
To:
KR> - There is always a delay between updating the Release file and
KR>   the Packages and Sources file, and the error should go away
KR>   after a short time.

NOT acceptable.
I hope on the mirrors they are not doing something like
$ cd staging_area && wget a b
when they should be doing
$ wget a b && mv a b staging_area

#636292#62
Date:
2011-08-06 01:37:50 UTC
From:
To:
With a and b and staging_area all being on the same disk partition, for
almost an atomic operation...
OK this is probably not the culprit today, but it is just good practice.
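
The "download elsewhere, then move into place" practice can be sketched as
follows. It is a toy model, not how ftpsync works: publish is a hypothetical
helper, the temporary files live in the target directory so that the final
rename stays on one filesystem, and each rename is atomic only per file, so
the set as a whole is still "almost" atomic.

```python
import os
import tempfile


def publish(files, live_dir):
    """Write every file under a temporary name, then rename all into place.

    `files` maps a flat file name to its bytes.
    """
    staged = []
    for name, data in files.items():
        fd, tmp_path = tempfile.mkstemp(dir=live_dir, prefix="." + name + ".")
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        staged.append((tmp_path, os.path.join(live_dir, name)))
    # Only after every file is completely written are they switched in.
    for tmp_path, final_path in staged:
        os.replace(tmp_path, final_path)


with tempfile.TemporaryDirectory() as live:
    publish({"Release": b"r2", "Packages": b"p2"}, live)
    print(sorted(os.listdir(live)))  # ['Packages', 'Release']
```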

#636292#67
Date:
2011-08-06 01:59:51 UTC
From:
To:
H> Maybe using rsync or ftp can help, if it enforces the "get everything
H> using the same connection" that http might or might not allow apt to do.
H> But that does NOT scale well at the mirror server side, at all.

Well whatever you do, remember a+b+c+a+b+c=a+a+b+b+c+c, so please be
sure no round robin switching is occurring when it shouldn't. No matter
during user operations or mirror operations. In the big picture the load all
evens out anyway, so no savings are had, and instead errors are introduced.
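
The a+b+c+a+b+c = a+a+b+b+c+c claim can be checked with a toy count. The
sketch assumes three mirrors and clients that each fetch the same number of
files; with uneven client sizes the two totals need not match exactly.

```python
from collections import Counter

MIRRORS = ["a", "b", "c"]


def per_request(n_clients, files_per_client):
    """Rotate to the next mirror on every single request."""
    load, request_number = Counter(), 0
    for _ in range(n_clients):
        for _ in range(files_per_client):
            load[MIRRORS[request_number % len(MIRRORS)]] += 1
            request_number += 1
    return load


def per_client(n_clients, files_per_client):
    """Rotate once per client; all of a client's requests hit one mirror."""
    load = Counter()
    for client in range(n_clients):
        load[MIRRORS[client % len(MIRRORS)]] += files_per_client
    return load


print(per_request(3, 2))                      # each mirror serves 2 requests
print(per_request(3, 2) == per_client(3, 2))  # True
```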

#636292#72
Date:
2011-08-06 09:50:51 UTC
From:
To:
Except that it's about 1000 files. This is basically what rsync
--delay-updates does, and what is being used.  And on a very busy
mirror this can actually take some time to do.


Kurt

#636292#77
Date:
2011-08-06 21:26:40 UTC
From:
To:
KR> Except that it's about 1000 files. This is basically what rsync
KR> --delay-updates does, and what is being used.  And on a very busy
KR> mirror this can actually take some time to do.
Well all I know is the 998 .debs should be done first.
Then the 1 index file and 1 checksum file second.
And that second step being as atomic as $ ln a b staging_area
You get into trouble when you put the president on the same slow train as
the common person, even if he is supposed to arrive after the other
participants are seated.

#636292#82
Date:
2011-08-06 23:56:03 UTC
From:
To:
No, this is 1000 index files.  Please note that we have more than 1
suite and more than 1 arch, and each of those have several files.
Just take a look at the Release file itself to know how many files
need to be updated at the same time.

The new .debs are done first, so that if you get a Packages or
Sources file, you can actually download the files mentioned in
those files.  They are directly copied to the correct place since
they are new files and not updated files.

Then the Packages, Sources, Release and other files are first
all transferred, then moved to the correct place.

After that old files are removed.  And ftp-master only removes
them after a few days in which no Packages or Sources file mentions
them.

The critical part is moving all the Packages/Sources/Release files
to the new place.  You want to do that in as short a time as
possible.

The problem you're most likely seeing is that the InRelease file
is done together with copying the .deb files, while it should be
part of the Packages/Sources/Release files part.  And I already
explained that part.

But also note that an atomic update on the server side doesn't
help.  If I start downloading the Release file, and while I'm
downloading it the Release/Packages files are
updated on the server, and I then download a Packages file, the
Packages and Release files still won't be from the same time.


Kurt

#636292#87
Date:
2011-08-07 13:33:59 UTC
From:
To:
Well, that has two problems we have observed in practice:

1. Not all mirrors have up-to-date mirror scripts, and that
   _does_ include mirrors selected for the multi-mirror aliases;

2. Mirrors in the same multi-mirror alias are not updated at the
   same time, and it is very possible (especially in http
   scenarios) to get metadata skew problems across mirrors even
   when they are perfectly fine and internally consistent.

That doesn't even need a third issue (multiple DNS queries) to cause
problems, way too many users are behind http proxies and caches that
break things regardless.

Maybe we should start designing sequence tagging/generation tagging for
the metadata?  If nobody has time to implement it right now, it would be
a damn fine GSOC project for 2013...

#636292#92
Date:
2011-08-08 05:05:13 UTC
From:
To:
H> That doesn't even need a third issue (multiple DNS queries)
OK. But at least that part could be fixed now.
No denying it is happening, as I showed with tcpflow!

#636292#97
Date:
2011-08-11 21:52:36 UTC
From:
To:
SP> Could you please try to use ftp.us.d.o and confirm up to date ftpsync on all
SP> backends solved your problem ?
I would be extremely ecstatically happy to.
However,
as I _proved_ in 636292 using tcpflow(1),
a simple "apt-get update",
will make TWO calls to the DNS.
The checksum will come from a _different_ round robin machine, four out
of five times. It's Russian Roulette. I can't bear to pull the trigger.
A user would have to be crazy to use a round robin mirror until the apt
team finally gets around to fixing this probably one line bug.

#636292#102
Date:
2011-08-11 23:03:20 UTC
From:
To:
First it does some UDP thing to all IP addresses:
[pid 25433] socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
[pid 25433] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("128.30.2.36")}, 16) = 0
[pid 25433] getsockname(3, {sa_family=AF_INET, sin_port=htons(49660), sin_addr=inet_addr("10.0.200.1")}, [16]) = 0
[pid 25433] connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
[pid 25433] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("199.6.12.70")}, 16) = 0
[pid 25433] getsockname(3, {sa_family=AF_INET, sin_port=htons(35821), sin_addr=inet_addr("10.0.200.1")}, [16]) = 0
[pid 25433] connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
[pid 25433] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("35.9.37.225")}, 16) = 0
[pid 25433] getsockname(3, {sa_family=AF_INET, sin_port=htons(52379), sin_addr=inet_addr("10.0.200.1")}, [16]) = 0
[pid 25433] connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
[pid 25433] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("64.50.233.100")}, 16) = 0
[pid 25433] getsockname(3, {sa_family=AF_INET, sin_port=htons(39421), sin_addr=inet_addr("10.0.200.1")}, [16]) = 0
[pid 25433] connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
[pid 25433] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("64.50.236.52")}, 16) = 0
[pid 25433] getsockname(3, {sa_family=AF_INET, sin_port=htons(37020), sin_addr=inet_addr("10.0.200.1")}, [16]) = 0
[pid 25433] close(3)                    = 0
[pid 25433] socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
[pid 25433] connect(3, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2001:500:61:28::70", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 25433] getsockname(3, {sa_family=AF_INET6, sin6_port=htons(50188), inet_pton(AF_INET6, "2001:0:53aa:64c:2ca7:460f:aeac:9430", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 25433] close(3)                    = 0

No idea what it's really trying to do, but I guess it's trying to see which of them are routable.
The AF_UNSPEC part probably doesn't make much sense.
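
The connect()/getsockname() pattern on a datagram socket looks like what a
resolver library typically does to learn which local source address would be
used for each candidate, so it can sort the addresses (RFC 3484 style
destination address selection). No packet is sent for a UDP connect, and the
AF_UNSPEC connect is the usual way to dissolve the association so the same
socket can be reused. The sketch below reproduces the probe in Python;
source_address_for is an illustrative name, and it opens a fresh socket per
address instead of reusing one.

```python
import socket


def source_address_for(dest_ip, port=80):
    """Return the local address the kernel would use to reach `dest_ip`,
    or None if there is no route (or sockets are unavailable)."""
    family = socket.AF_INET6 if ":" in dest_ip else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as probe:
            # connect() on UDP only fixes the route; nothing goes on the wire.
            probe.connect((dest_ip, port))
            return probe.getsockname()[0]
    except OSError:
        return None


print(source_address_for("127.0.0.1"))
```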

Then it goes on with:
[pid 25433] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
[pid 25433] fcntl(3, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 25433] fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 25433] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("128.30.2.36")}, 16) = -1 EINPROGRESS (Operation now in progress)
[...]
[pid 25433] write(3, "GET /debian/dists/sid/InRelease HTTP/1.1\r\nHost: ftp.us.debian.org\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nIf-Modified-Since: Thu, 11 Aug 2011 20:22:47 GMT\r\nUser-Agent: Debian APT-HTTP/1.3 (0.8.15.5)\r\n\r\n", 213) = 213
[...]
[pid 25433] read(3, "HTTP/1.1 304 Not Modified\r\nDate: Thu, 11 Aug 2011 22:32:27 GMT\r\nServer: Apache/2.2.9 (Debian)\r\nConnection: Keep-Alive\r\nKeep-Alive: timeout=15, max=100\r\nETag: \"1d0a203-239d0-4aa408f6173c0\"\r\n\r\n", 65536) = 191
[...]
[pid 25433] write(3, "GET /debian/dists/sid/main/binary-amd64/Packages.diff/Index HTTP/1.1\r\nHost: ftp.us.debian.org\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nIf-Modified-Since: Thu, 11 Aug 2011 20:16:48 GMT\r\nUser-Agent: Debian APT-HTTP/1.3 (0.8.15.5)\r\n\r\n", 241) = 241
[...]
[pid 25433] read(3, "HTTP/1.1 304 Not Modified\r\nDate: Thu, 11 Aug 2011 22:32:28 GMT\r\nServer: Apache/2.2.9 (Debian)\r\nConnection: Keep-Alive\r\nKeep-Alive: timeout=15, max=99\r\nETag: \"1d0a308-7f6-4aa4079fb8c00\"\r\n\r\n", 65345) = 188

So it looked for the InRelease and Packages file over the same connection.

And then, for some reason unclear to me, it closes and opens the connection again to get the i18n files:

[pid 25433] close(3)                    = 0
[pid 25433] read(0, 0x7fff66c68790, 64000) = -1 EAGAIN (Resource temporarily unavailable)
[pid 25433] close(4294967295)           = -1 EBADF (Bad file descriptor)
[pid 25433] write(1, "102 Status\nURI: http://ftp.us.debian.org/debian/dists/sid/main/i18n/Index\nMessage: Connecting to ftp.us.debian.org (199.6.12.70)\n\n", 130) = 130
[pid 25433] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
[pid 25433] fcntl(3, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 25433] fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 25433] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("199.6.12.70")}, 16) = -1 EINPROGRESS (Operation now in progress)
[...]
[pid 25433] write(3, "GET /debian/dists/sid/main/i18n/Index HTTP/1.1\r\nHost: ftp.us.debian.org\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nIf-Modified-Since: Thu, 11 Aug 2011 19:55:34 GMT\r\nUser-Agent: Debian APT-HTTP/1.3 (0.8.15.5)\r\n\r\n", 219 <unfinished ...>
[...]
[pid 25433] read(3, "HTTP/1.1 304 Not Modified\r\nServer: nginx/0.8.54\r\nDate: Thu, 11 Aug 2011 22:32:46 GMT\r\nLast-Modified: Thu, 11 Aug 2011 19:55:34 GMT\r\nConnection: keep-alive\r\n\r\n", 65536) = 158
[...]
[pid 25433] exit_group(100)             = ?

(It stops the program without closing the socket.)

This i18n/Index file is also covered by the InRelease, so this clearly is a problem.


Kurt

#636292#107
Date:
2011-08-12 01:39:54 UTC
From:
To:
Now, don't be absurd.

#636292#112
Date:
2011-08-12 08:45:28 UTC
From:
To:
On Fri, Aug 12, 2011 at 03:39, Henrique de Moraes Holschuh <hmh@debian.org> wrote:

Yeah, it has been getting hilarious for a while now ...
Now as ftpsync is fixed on the US mirrors all checksum problems should
be solved.

#636292#117
Date:
2011-08-12 15:31:15 UTC
From:
To:
to be in sync *across* mirrors, and we cannot trust the network backends
to always connect to the same mirror.

The multiple DNS lookups bug just breaks a workaround for that design
bug that works well in a particular case (fortunately, a common one):
persistent connections.

What I consider absurd is jidanni's "probably one line bug" comment.

#636292#122
Date:
2011-08-13 11:04:22 UTC
From:
To:
H> What I consider absurd is jidanni's "probably one line bug" comment.
Naw... it's probably just a case of
for(thing,checksum_of_thing){
	do_dns_query(); #move this line before the loop
	get_it();
}
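
As a toy model of the idea in that snippet (resolve once, then fetch
everything from the address obtained), the following uses a fake round-robin
resolver. It is not apt's code, and the thread disputes that the real change
is a one-liner; fetch_all and make_round_robin are hypothetical names.

```python
from itertools import cycle


def fetch_all(resolve, fetch, host, paths):
    """Resolve `host` a single time, then fetch every path from that address."""
    address = resolve(host)  # hoisted out of the loop
    return {path: fetch(address, path) for path in paths}


def make_round_robin(addresses):
    """Fake resolver that hands out the next address on every call."""
    pool, calls = cycle(addresses), []

    def resolve(host):
        calls.append(host)
        return next(pool)

    return resolve, calls


resolve, calls = make_round_robin(["128.30.2.36", "199.6.12.70"])
# The fake fetch just reports which address served each path.
served_by = fetch_all(
    resolve, lambda address, path: address, "ftp.us.debian.org", ["Release", "Packages"]
)
print(served_by)   # {'Release': '128.30.2.36', 'Packages': '128.30.2.36'}
print(len(calls))  # 1
```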

#636292#127
Date:
2011-08-21 22:12:43 UTC
From:
To:
SP> It's no longer the case, all ftp.us have now 80387.

SP> jidanni, do you still observe issues ?

Yes, as a matter of fact I do.
I even recorded the exact time window for you. In UTC as a special bonus.
starting Sun Aug 21 21:30:51 UTC 2011
W: Failed to fetch http://ftp.us.debian.org/debian/dists/experimental/main/binary-i386/PackagesIndex  MD5Sum mismatch
W: Failed to fetch http://ftp.us.debian.org/debian/dists/unstable/main/binary-i386/PackagesIndex  MD5Sum mismatch
E: Some index files failed to download. They have been ignored, or old ones used instead.
ending Sun Aug 21 21:36:55 UTC 2011

I have a recommendation:
that you fellows fix this bug.
As I have noted, it is certainly a one-liner.
I mean aren't we running out of other things to blame for the problem? Thanks.

#636292#132
Date:
2011-08-21 22:33:04 UTC
From:
To:
[...]

As we already pointed out, it is not a one-liner.  If you're so
sure it's a one-liner, I suggest you submit a patch.

Even if we fix the problem with connecting to multiple servers, there
are various other reasons why it can fail, and they have all been
explained already.

I'm not even sure that if you fix the multiple server connections
that would get better or worse results.  But I would still suggest
that we do try and connect to only 1 server.


Kurt

#636292#137
Date:
2011-08-22 01:08:52 UTC
From:
To:
Actually the first five minutes were spent in my 'sleep 5m' so it really is
<< starting Sun Aug 21 21:35:51 UTC 2011
KR> I'm not even sure that if you fix the multiple server connections
KR> that would get better or worse results.  But I would still suggest
KR> that we do try and connect to only 1 server.
I've now also added a tcpflow(1) wrapper enabling me to send you all
byte-by-byte evidence the next time it happens... but why allow me that
wicked pleasure?

#636292#142
Date:
2011-08-22 06:49:27 UTC
From:
To:
We know what the problem is, that's not needed.


Kurt

#636292#147
Date:
2011-09-14 01:28:24 UTC
From:
To:
K> We know what the problem is, that's not needed.
Are you sure?

#636292#152
Date:
2011-09-14 01:35:05 UTC
From:
To:
Actually all that is going to happen is one day I will accidentally send
the tcpflow logs containing unrelated personal traffic too as the
filtering is too complex, so I would appreciate it if someone looked
into this bug.

#636292#157
Date:
2011-09-14 13:58:38 UTC
From:
To:
Three different mirrors in a single _botched_ apt-get update.

#636292#162
Date:
2011-09-16 00:33:15 UTC
From:
To:
I recall that was taken care of.
Why doesn't someone take care of that.

#636292#167
Date:
2011-09-16 08:26:02 UTC
From:
To:
tags 636292 will-get-fixed-by-donkult-then-hell-freezes-over
kthxbye
This is open source software: YOU are part of the awesome team!
So feel free to blame yourself that you haven't taken care of it.
In fact, as we are all volunteers you can only blame yourself…
your one line patch to this bugreport.
We need NOTHING else from you. I repeat: NOTHING ELSE!
No goddamn tcpflow logs nor any other data, just provide
your simple patch and everybody will be happy.
Thanks.


I can only speak for myself, but I haven't even tried to look at
this issue because of this sentence (and all the howling before
and after that). And I am pretty sure it will need a loooooooong time
until I feel motivated to do so thanks to your behavior in the buglog,
so if I were you I would submit a patch or keep silent until I can
provide something useful to fix the bug I respond to.
Bonus points if you can do both.


If you want to blame anyone in the meantime, blame yourself for considerably
lowering the chances of getting this or any related bug fixed by working hard on
demotivating at least one of the few people who regularly contribute to APT…

That's a great achievement, given that even the worst kids in my youth groups
can't make that happen, so:
Congratulations!

David Kalnischkies

P.S.: Don't bother to answer, the buglog includes enough messages already and
I will not read it anyway. Everything we need is your patch now, so hurry up.

#636292#172
Date:
2011-09-16 09:23:31 UTC
From:
To:
It might be that the trade-off between "time to spend" and "what I like to
do first" does not suit your wishes.

As always "Show the code" still applies.

#636292#177
Date:
2011-09-16 20:06:29 UTC
From:
To:
A bug involving InRelease files which has similar symptoms was reported
on http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=641769

#636292#182
Date:
2011-09-17 15:34:02 UTC
From:
To:
retitle 636292 dak/apt: deficiencies at handling out-of-sync metadata
summary 636292 87
thanks

If anyone disagrees with the above triage, please change the summary
and/or title.  Thank you.

You're expected to read the entire thing when referred to a bug report in
a thread you're replying to.  Anyway, triaged.  I didn't want to do it
because it is not my bug, it is not a package I work on, and I have no
idea whether the apt developers agree with my analysis of #636292 or not.
Please feel free to improve the title or choose a new summary.

That misses the point, IMO.  To me, it looks like what's "broken" is
that the repository format _and_ the front-ends have deficiencies at
handling metadata which is unsynchronized either in-mirror or across
mirrors.  And these deficiencies are a lot more important nowadays than
they once were, as we have now many dinstall runs per day, lots of users
tracking testing and unstable, a larger set of metadata files, a larger
and more diverse set of mirrors... I.e: a lot more chances to hit
unsynchronized metadata windows.

#636292#191
Date:
2011-09-20 02:52:53 UTC
From:
To:
On 2011-09-17 11:34, Henrique de Moraes Holschuh wrote:
I was unaware of that.

[...]

I don't think increasing dinstall frequency worsens these issues
significantly if dinstalls get shorter (unless previous dinstalls ran
during the night). I also think archive size growth should have been
compensated by performance increases. I think the time spent
synchronizing a mirror must not have increased a lot. What did change
here (dramatically) is the proportion of that time where APT indices
updates fail. Round-robin mirrors might also have worsened.

Anyway, the repository format is not a problem per se, it's the
combination of what's on a mirror and how APT fetches it that's a
problem. If you assume the communication protocol is HTTP-like, then
indeed there should be mechanisms to cope with race conditions - i.e.
file versioning and/or having APT retry or report desynchronizations.