#1033632 qa.debian.org: sourceforge redirector for debian/watch files fails with a 500 error
- Package: qa.debian.org
- Source: qa.debian.org
- Submitter: Christian Marillat
- Date: 2023-04-22 10:00:03 UTC
- Severity: normal
Dear Maintainer,

For several days sf.php no longer works:

,----
| uscan warn: In watchfile debian/watch, reading webpage
| https://qa.debian.org/watch/sf.php/synfig/ failed: 500 Error
`----

Christian
I think this problem is now resolved. The big red ERROR texts in the Watch column on my DDPO page are slowly going away.

Cheers,
Peter

On Wed, 29 Mar 2023 08:05:01 +0200 Christian Marillat <marillat@debian.org> wrote:
> Package: qa.debian.org
> Severity: normal
>
> Dear Maintainer,
>
> For several days sf.php no longer works:
>
> ,----
> | uscan warn: In watchfile debian/watch, reading webpage
> | https://qa.debian.org/watch/sf.php/synfig/ failed: 500 Error
> `----
>
> Christian
I don't know. I rewrote my watch files to check sourceforge.net instead of qa.debian.org.

Christian
Hi Christian,

Seems I spoke too soon! While uscan usually works when I try it locally, it now seems to fail randomly on my QA page.

Cheers,
Peter
This issue is caused by the underlying SourceForge infrastructure (their files RSS feed), which has started applying rate limiting and returning HTTP 429 Too Many Requests errors. The Debian QA redirector easily hits these limits, depending on how much use the service gets per day.

We could have individual contributors rewrite every single one of their SourceForge debian/watch files to use the SourceForge files RSS feeds. Alternatively, we could move the code for the SourceForge redirector into uscan, so that individual uscan users get separate rate limit buckets rather than sharing one large Debian rate limit bucket.

Unfortunately, these changes will not fix the problem of UDD getting errors all the time. To fix that, UDD would need a distributed architecture with multiple IP addresses all contacting SourceForge. That might overload SourceForge's server resources, though, which would probably lead to uscan getting blocked again. So maybe we need to discuss this with SourceForge again.
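To make the "one shared bucket" problem concrete, here is a minimal Python sketch of a limiter that allows one request per (client, feed) pair per 30-minute window. The 30-minute figure is SourceForge's documented "one hit per feed per 30 minutes" limit quoted in the email later in this thread; keying the limit by client IP is an assumption for illustration, and `FeedRateLimiter` is a hypothetical name, not SourceForge's or Debian's code.

```python
from typing import Dict, Tuple


class FeedRateLimiter:
    """Hypothetical limiter: one hit per (client, feed) per window.

    Assumes the upstream keys its limit by client IP; this is an
    illustration, not SourceForge's actual implementation.
    """

    def __init__(self, window_seconds: float = 30 * 60) -> None:
        self.window = window_seconds
        self._last_hit: Dict[Tuple[str, str], float] = {}

    def allow(self, client: str, feed: str, now: float) -> bool:
        key = (client, feed)
        last = self._last_hit.get(key)
        if last is not None and now - last < self.window:
            return False  # a real server would answer HTTP 429 here
        self._last_hit[key] = now
        return True
```

Under this assumption, every maintainer going through the qa.debian.org redirector shares the single client key `"qa.debian.org"`, so a second check of the same feed within 30 minutes is refused, while a maintainer running uscan from their own machine gets a separate bucket.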
Hi,

There's specific code in the UDD uscan wrapper[1] to handle GitHub's rate limiting. We could have something similar for either sf.net or the sf.net redirector.

Before I work on that, it would be great if someone could change the sf.net redirector to return 429 instead of 500 when sf.net returns 429, so that this specific case is easier to identify.

[1] https://salsa.debian.org/qa/udd/-/blob/master/rimporters/upstream.rb#L161

Lucas
This is now done, tested and deployed on the server: https://salsa.debian.org/qa/qa/commit/395d923257e954663156fa315142415f50d1be6a I elected to just pass on all SourceForge HTTP error codes, with the HTTP error text prefixed to clarify the error source.
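The pass-through behaviour described above (relay SourceForge's HTTP status unchanged, with a prefix naming the error source) can be sketched in Python as follows. The deployed redirector is sf.php in the qa repository and is not reproduced here; `relay_upstream_error` and the exact "SourceForge error:" prefix are hypothetical.

```python
from http import HTTPStatus
from typing import Tuple


def relay_upstream_error(upstream_status: int) -> Tuple[int, str]:
    """Hypothetical sketch: map an upstream error to the redirector's reply.

    The status code is passed through unchanged (so a 429 stays a 429
    instead of becoming a generic 500), and the body text is prefixed so
    uscan users can see the error came from SourceForge.
    """
    try:
        reason = HTTPStatus(upstream_status).phrase
    except ValueError:
        reason = "Unknown Error"  # non-standard code with no known phrase
    return upstream_status, f"SourceForge error: {upstream_status} {reason}"
```

For example, `relay_upstream_error(429)` returns `(429, "SourceForge error: 429 Too Many Requests")`, which a client such as the UDD importer can then recognise as rate limiting rather than a redirector bug.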
I added code to handle sf.net's rate limiting in the UDD importer, and
triggered a refresh of all sf.net-hosted packages.
I wonder if we should close this bug. The redirector has not been fixed
(it will still hit rate limiting, but there's not much we can do about
that); but the main path by which maintainers probably access watch data
(UDD -> dashboards) has been fixed.
- Lucas
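The UDD importer change itself lives in rimporters/upstream.rb (Ruby) and is not shown in this thread. As a rough illustration of one common way to handle a 429, here is a Python sketch that honours a numeric `Retry-After` header when present and otherwise backs off exponentially, capped at the 30-minute per-feed window. `backoff_delay`, its defaults, and the assumption that SourceForge may or may not send `Retry-After` are all hypothetical.

```python
from typing import Mapping, Optional


def backoff_delay(status: int, headers: Mapping[str, str], attempt: int,
                  base: float = 60.0, cap: float = 1800.0) -> Optional[float]:
    """Return seconds to wait before retrying a feed fetch, or None.

    Hypothetical helper, not UDD's code. `headers` is assumed to use the
    canonical header-name casing ("Retry-After").
    """
    if status != 429:
        return None  # only rate-limit responses are retried here
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        # Delay-seconds form of Retry-After; the HTTP-date form is not handled.
        return min(float(retry_after), cap)
    # No usable Retry-After: exponential backoff, doubling per attempt.
    return min(base * (2 ** attempt), cap)
```

Capping the delay at 1800 seconds reflects the documented "one hit per feed per 30 minutes" limit: waiting longer than one window for a single feed gains nothing.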
Some UDD notes for reference:
To watch the status of UDD trying to refresh all SF sources:
udd=> select status, count(*) from upstream where watch_file ~ 'sf.(net|php)' group by status;
            status            | count
------------------------------+-------
 newer package available      |   120
 up to date                   |   469
 error                        |   976
 only older package available |    53
(4 rows)
udd=> select warnings is null, count(*) from upstream where watch_file ~ 'sf.(net|php)' group by 1;
 ?column? | count
----------+-------
 f        |   986
 t        |   632
(2 rows)
To force a refresh of all sf.net sources:
update upstream set last_check = null where watch_file ~ 'sf.(php|net)' and warnings is not null;
- Lucas
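The two query results above describe the same set of sf.net/sf.php watch files, so their totals should agree. A quick Python cross-check using only the numbers shown (in the second query, `f` means `warnings` is not null):

```python
# Counts copied from the UDD query results above.
status_counts = {
    "newer package available": 120,
    "up to date": 469,
    "error": 976,
    "only older package available": 53,
}
warnings_counts = {"has warnings (f)": 986, "no warnings (t)": 632}

total = sum(status_counts.values())
assert total == sum(warnings_counts.values())  # both queries cover the same rows

error_share = status_counts["error"] / total
print(f"{total} packages, {error_share:.1%} in error")
```

At the time of that snapshot, this gives 1618 packages with about 60.3% of them in the error state.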
Control: retitle -1 qa.debian.org: sourceforge redirector for debian/watch files gets rate limited

Excellent, thanks.

Federico Grau (CCed) was talking on #debian-mentors about contacting SourceForge about increasing the rate limits for the Debian redirector service, so let's leave the bug open for that process and discussion.
FYI: the code changes above still appear to result in sf.net errors; at least, the `unixcw' package still reports Watch errors.

https://salsa.debian.org/qa/qa/commit/395d923257e954663156fa315142415f50d1be6a
https://qa.debian.org/developer.php?login=donfede%40casagrau.org

I contacted SourceForge support via email, per the info on their contact web page. Expect an update to this bug with status as I hear more, or in about a week.

https://sourceforge.net/support

#####
#
# Copy of email sent to sf.net 2023-04-16:

Hello sfnet_ops --

I am Fede Grau, contacting you on behalf of the Debian community. We are seeking support from SourceForge Ops with recent RSS feed rate limit changes. In particular if an "IP exception" may be created for Debian "watch" checks for package updates.

Reviewing the SourceForge Support Documentation we see there is now an RSS feed rate limit of "one hit per feed per 30 minutes". Unfortunately this is adversely affecting the Debian "watch" checks for updates of Free and Open Source Software (FOSS) packages hosted at SourceForge. The Debian project is tracking this issue with Bug #1033632.

https://sourceforge.net/p/forge/documentation/RSS/
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033632

As noted above, we're checking if an IP exception may be created for RSS feed checks for the Debian project. The qa.debian.org host performing the "watch" checks very rarely changes IP address and is in the Debian IP range of: 209.87.16.0/24.

Feedback or questions are welcome. Thanks for your assistance.

I happen to be one of the package maintainers for the `unixcw' FOSS package hosted at SourceForge, which has been affected by these RSS limits.

https://qa.debian.org/developer.php?login=donfede%40casagrau.org
https://unixcw.sourceforge.net/

regards,
donfede
Fede Grau

Hi,

... rate limiting, checking packages hosted on sourceforge.net takes a long time).

You can check using:

select * from upstream where source='unixcw';

The last_check column should not be NULL.

Thanks!

Lucas
Copying the sf reply to Debian bug #1033632, as requested by pabs, to enable Debian members to analyze it.

donfede
... This is Planet Debian, I guess some blogs are on SourceForge.

This is caused by fakeupstream.cgi, which also has a SourceForge redirector that recursively scrapes SourceForge files pages instead of using the RSS feed. It likely dates from before the RSS feed existed. Only 3 packages use it, and none of them are dispcalgui.

https://codesearch.debian.net/search?q=fakeupstream.cgi?upstream=sf/&literal=1

I temporarily disabled the web server IP address privacy in order to find out where the requests were coming from, and found Msnbot IP addresses. Then I noticed the User-Agent is bingbot/2.0. I also verified that the IP addresses are legitimate bingbot addresses.

https://en.wikipedia.org/wiki/Msnbot
http://www.bing.com/bingbot.htm
https://www.bing.com/webmasters/help/verify-bingbot-2195837f

For now I have blocked bingbot from accessing fakeupstream.cgi and then requested that it stop accessing fakeupstream.cgi:

https://salsa.debian.org/qa/qa/commit/37ada830d0c2c1ece51e7622910014b8ec047909
https://salsa.debian.org/qa/qa/commit/4893d7fce8537d6978ace6484889d3e5efe34af5

This has stopped the flood to SourceForge and hopefully will stop the flood to fakeupstream.cgi, so this bug can likely be closed now, but...

There are some improvements that we could make to QA services:

* pass on HTTP error codes from services fakeupstream.cgi accesses
* switch fakeupstream.cgi SourceForge support to using the RSS feed
* switch fakeupstream.cgi/sf.php User-Agents to legitimate ones
* add caching to fakeupstream.cgi

If anyone would like to work on these, please submit merge requests. If no-one does these fixes, then I may get to them eventually.

That is likely to be the regular SourceForge redirector. That could be a candidate for integration into fakeupstream.cgi.
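For the "add caching to fakeupstream.cgi" item, one possible shape is a per-URL cache whose lifetime matches SourceForge's 30-minute per-feed window, so repeated checks of the same project within that window never reach SourceForge. This is a hypothetical Python sketch, not fakeupstream.cgi's code; `TTLCache` and the injected `fetch` function are illustrative names.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache fetched pages per URL for `ttl` seconds (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 30 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # function that actually contacts upstream
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: no upstream request
        # If fetch raises (e.g. on an HTTP 429), nothing is stored, so
        # errors are never served from the cache.
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

Because failed fetches are not cached, a rate-limited response is retried on the next request instead of being replayed for the whole window; caching errors briefly as well would reduce load further, at the cost of hiding recoveries.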