#1033632 qa.debian.org: sourceforge redirector for debian/watch files fails with a 500 error

#1033632#5
Date:
2023-03-29 06:05:01 UTC
From:
To:
Dear Maintainer,

For several days now, sf.php has not been working:

,----
| uscan warn: In watchfile debian/watch, reading webpage
|   https://qa.debian.org/watch/sf.php/synfig/ failed: 500 Error
`----

Christian

#1033632#10
Date:
2023-04-06 10:26:17 UTC
From:
To:
I think this problem is now resolved.
The big red ERROR texts in the Watch column on my DDPO page are slowly going away.


Cheers,
Peter


On Wed, 29 Mar 2023 08:05:01 +0200 Christian Marillat <marillat@debian.org> wrote:
 > Package: qa.debian.org
 > Severity: normal

#1033632#15
Date:
2023-04-06 10:44:31 UTC
From:
To:
I don't know. I have rewritten my watch files to check sourceforge.net
instead of qa.debian.org.

Christian

#1033632#20
Date:
2023-04-11 10:17:09 UTC
From:
To:
Hi Christian,

Seems I spoke too soon!  While uscan usually works when I try it locally,
it now seems to fail randomly on my QA page.

Cheers,
Peter

#1033632#25
Date:
2023-04-12 01:16:45 UTC
From:
To:
This issue is caused by the underlying SourceForge infrastructure
(their files RSS feed) starting to apply rate limiting and returning
HTTP 429 Too Many Requests errors, which the Debian QA redirector
easily hits, depending on how much use the service has per day.

We could have individual contributors rewrite every single one of their
SourceForge debian/watch files to use the SourceForge files RSS feeds.

Alternatively we could move the code for the SourceForge redirector
into uscan so that individual uscan users get separate rate limit
buckets, rather than having one large Debian rate limit bucket.

Unfortunately these changes will not fix the problem of UDD getting
errors all the time. To fix that, UDD would need to gain a distributed
architecture with multiple IP addresses all contacting SourceForge.
That may cause overloads of the SourceForge server resources though,
which would probably lead to uscan getting blocked again.

So maybe we need to discuss this with SourceForge again.
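
A minimal sketch of the shared-bucket problem described above, in Python.
It assumes the limit is counted per client IP within a fixed window; the
real SourceForge policy is not stated in this message, so the limit of 3
and all addresses below are made up for illustration.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Toy rate limiter: at most `limit` requests per client key per window."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def request(self, client_ip):
        """Return an HTTP-like status: 200 if allowed, 429 if over the limit."""
        self.counts[client_ip] += 1
        return 200 if self.counts[client_ip] <= self.limit else 429


def simulate(limit, requests):
    """`requests` is a list of client IPs, one entry per request in the window."""
    limiter = FixedWindowLimiter(limit)
    return [limiter.request(ip) for ip in requests]


# Four checks that all arrive from one redirector IP (hypothetical address).
shared = simulate(3, ["192.0.2.1"] * 4)
# The same four checks made directly by uscan from four different IPs.
separate = simulate(3, [f"198.51.100.{i}" for i in range(1, 5)])
```

With one shared IP the fourth request is rejected, while the four separate
clients each stay under the limit. The same reasoning explains why UDD,
which checks everything from one place, would keep hitting the limit.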

#1033632#30
Date:
2023-04-12 05:14:44 UTC
From:
To:
Hi,

There's specific code in the UDD uscan wrapper[1] to handle github's
rate limiting. We could have something similar for either sf.net, or the
sf.net redirector. Before I work on that, it would be great if someone
could change the sf.net redirector to return 429 instead of 500 when
sf.net returns 429, so that this specific case is easier to identify.

[1] https://salsa.debian.org/qa/udd/-/blob/master/rimporters/upstream.rb#L161

Lucas

#1033632#35
Date:
2023-04-13 00:29:04 UTC
From:
To:
This is now done, tested and deployed on the server:

https://salsa.debian.org/qa/qa/commit/395d923257e954663156fa315142415f50d1be6a

I elected to just pass on all SourceForge HTTP error codes,
with the HTTP error text prefixed to clarify the error source.
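
For readers who do not want to open the commit, the idea can be sketched
roughly as follows. This is Python rather than the redirector's own code,
and the function name and the exact "SourceForge: " prefix are assumptions
made for illustration, not copied from the commit.

```python
from http import HTTPStatus


def relay_error(upstream_status):
    """Map an upstream HTTP error status to the (status, reason) pair to send back.

    Non-error statuses return None, meaning "handle the response normally".
    """
    if upstream_status < 400:
        return None
    try:
        text = HTTPStatus(upstream_status).phrase
    except ValueError:
        text = "Unknown Error"  # status code not in the standard registry
    # Prefix the reason so the client can tell the error came from upstream.
    return upstream_status, "SourceForge: " + text
```

A caller such as UDD can then distinguish a 429 from a genuine redirector
failure instead of seeing a generic 500 for both.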

#1033632#40
Date:
2023-04-13 05:05:34 UTC
From:
To:
I added code to handle sf.net's rate limiting in the UDD importer, and
triggered a refresh of all sf.net-hosted packages.

I wonder if we should close this bug. The redirector has not been fixed
(it will still hit rate limiting, but there's not much we can do about
that); but the main path by which maintainers probably access watch data
(UDD -> dashboards) has been fixed.
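
The general shape of handling a 429 on the client side is a retry with a
growing delay. The sketch below is in Python and is not the UDD importer
code (which is Ruby); the function name, the delays and the `fetch`
callable are all hypothetical.

```python
import time


def check_with_backoff(fetch, url, max_attempts=4, base_delay=60, sleep=time.sleep):
    """Call fetch(url) -> (status, body); retry on 429 with exponential backoff.

    Returns the (status, body) pair from the last attempt made.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(delay)  # wait before retrying the rate-limited request
        delay *= 2    # double the wait after every rate-limited attempt
```

If every attempt is rate limited, the final 429 is returned to the caller,
which can then record "rate limited" rather than a generic error.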

- Lucas

Some UDD notes for reference:
To watch the status of UDD trying to refresh all SF sources:
udd=> select status, count(*) from upstream where watch_file ~ 'sf.(net|php)' group by status;
            status            | count
------------------------------+-------
 newer package available      |   120
 up to date                   |   469
 error                        |   976
 only older package available |    53
(4 rows)

udd=> select warnings is null, count(*) from upstream where watch_file ~ 'sf.(net|php)' group by 1;
 ?column? | count
----------+-------
 f        |   986
 t        |   632
(2 rows)

To force a refresh of all sf.net sources:
update upstream set last_check = null where watch_file ~ 'sf.(php|net)' and warnings is not null;
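
A side note on the 'sf.(net|php)' pattern used in these queries: the dot is
unescaped, so it matches any character. The sketch below uses Python's `re`
module as a stand-in for PostgreSQL's `~` operator (an assumption that
should hold for a pattern this simple). The second and third sample strings
are made up.

```python
import re

loose = re.compile(r"sf.(net|php)")    # pattern as written in the queries
strict = re.compile(r"sf\.(net|php)")  # same pattern with the dot escaped

samples = [
    "https://qa.debian.org/watch/sf.php/synfig/",  # redirector URL from this bug
    "https://sf.net/projects/example/files/",       # hypothetical direct URL
    "https://example.org/sfxnet/releases/",         # hypothetical false positive
]

loose_hits = [bool(loose.search(s)) for s in samples]
strict_hits = [bool(strict.search(s)) for s in samples]
```

Only the third sample differs between the two patterns, so in practice the
loose form is unlikely to change the counts much.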

- Lucas

#1033632#45
Date:
2023-04-14 00:37:32 UTC
From:
To:
Control: retitle -1 qa.debian.org: sourceforge redirector for debian/watch files gets rate limited

Excellent, thanks.

Federico Grau (CCed) was talking on #debian-mentors about contacting
SourceForge about increasing the rate limits for the Debian redirector
service, so let's leave the bug open for that process and discussion.

#1033632#52
Date:
2023-04-16 18:44:18 UTC
From:
To:
fyi -

The code changes above appear to still be resulting in sf.net errors, or at
least the `unixcw' package still reports Watch errors.

https://salsa.debian.org/qa/qa/commit/395d923257e954663156fa315142415f50d1be6a

https://qa.debian.org/developer.php?login=donfede%40casagrau.org


I contacted SourceForge support via email, per info on their contact web page.
Expect an update to this bug with status as I hear more, or in about a week.

https://sourceforge.net/support


#####
#
# Copy of email sent to sf.net 2023-04-16:

Hello sfnet_ops --

I am Fede Grau, contacting you on behalf of the Debian community.  We are
seeking support from SourceForge Ops with recent RSS feed rate limit changes.
In particular if an "IP exception" may be created for Debian "watch" checks
for package updates.


Reviewing the SourceForge Support Documentation we see there is now an RSS
feed rate limit of "one hit per feed per 30 minutes".  Unfortunately this is
adversely affecting the Debian "watch" checks for updates of Free and Open
Source Software (FOSS) packages hosted at SourceForge.  The Debian project is
tracking this issue with Bug #1033632 .

https://sourceforge.net/p/forge/documentation/RSS/

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033632



As noted above, we're checking if an IP exception may be created for RSS feed
checks for the Debian project.  The qa.debian.org host performing the "watch"
checks very rarely changes IP address and is in the Debian IP range of:
x.x.x.x/x .  Feedback or questions are welcome.  Thanks for your
assistance.


I happen to be one of the package maintainers for the `unixcw' FOSS package
hosted at SourceForge, which has been affected by these RSS limits.

https://qa.debian.org/developer.php?login=donfede%40casagrau.org

https://unixcw.sourceforge.net/


regards,
donfede

Fede Grau

#1033632#57
Date:
2023-04-16 19:15:11 UTC
From:
To:
Hi,

[...] rate limiting, checking packages hosted on sourceforge.net takes a long
time).

You can check using:
select * from upstream where source='unixcw';

The last_check column should not be NULL.

Thanks!

Lucas

#1033632#62
Date:
2023-04-16 18:29:25 UTC
From:
To:
Hello sfnet_ops --

I am Fede Grau, contacting you on behalf of the Debian community.  We are
seeking support from SourceForge Ops with recent RSS feed rate limit changes.
In particular if an "IP exception" may be created for Debian "watch" checks
for package updates.


Reviewing the SourceForge Support Documentation we see there is now an RSS
feed rate limit of "one hit per feed per 30 minutes".  Unfortunately this is
adversely affecting the Debian "watch" checks for updates of Free and Open
Source Software (FOSS) packages hosted at SourceForge.  The Debian project is
tracking this issue with Bug #1033632 .

https://sourceforge.net/p/forge/documentation/RSS/

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033632



As noted above, we're checking if an IP exception may be created for RSS feed
checks for the Debian project.  The qa.debian.org host performing the "watch"
checks very rarely changes IP address and is in the Debian IP range of:
209.87.16.0/24 .  Feedback or questions are welcome.  Thanks for your
assistance.



I happen to be one of the package maintainers for the `unixcw' FOSS package
hosted at SourceForge, which has been affected by these RSS limits.

https://qa.debian.org/developer.php?login=donfede%40casagrau.org

https://unixcw.sourceforge.net/


regards,
donfede

Fede Grau

#1033632#67
Date:
2023-04-19 21:12:09 UTC
From:
To:
Copying sf reply to Debian bug #1033632 , as requested by pabs, to enable
Debian members to analyze.

donfede

#1033632#72
Date:
2023-04-20 00:15:56 UTC
From:
To:
...

This is Planet Debian, I guess some blogs are on SourceForge.

This is caused by fakeupstream.cgi, which also has a SourceForge
redirector, which recursively scrapes SourceForge files pages instead
of using the RSS feed. It likely dates from before the RSS feed.
There are only 3 packages using it, but none of them are dispcalgui.

https://codesearch.debian.net/search?q=fakeupstream.cgi?upstream=sf/&literal=1

I temporarily disabled the web server IP address privacy in order
to find out where the requests are coming from and found Msnbot IP
addresses. Then I noticed the User-Agent is bingbot/2.0. I also
verified that the IP addresses are legitimate bingbot addresses.

https://en.wikipedia.org/wiki/Msnbot
http://www.bing.com/bingbot.htm
https://www.bing.com/webmasters/help/verify-bingbot-2195837f
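
The usual way to verify such a crawler is a reverse DNS lookup followed by
a forward lookup of the returned name. A minimal Python sketch, assuming
the expected domain is search.msn.com and handling IPv4 only; the function
name and the injectable `reverse`/`forward` resolvers are hypothetical and
exist mainly so the logic can be exercised without network access.

```python
import socket


def is_verified_bingbot(ip, reverse=None, forward=None):
    """Reverse-resolve `ip`, check the domain, then forward-resolve to confirm."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        host = reverse(ip)
        if not host.lower().rstrip(".").endswith(".search.msn.com"):
            return False
        # The name must resolve back to the same address, otherwise the
        # reverse record could have been set by anyone.
        return ip in forward(host)
    except OSError:
        return False  # lookup failed, so the claim cannot be verified
```

Checking only the User-Agent is not enough, since any client can send a
bingbot User-Agent string.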

For now I have blocked bingbot from accessing fakeupstream.cgi
and then requested that it stop accessing fakeupstream.cgi:

https://salsa.debian.org/qa/qa/commit/37ada830d0c2c1ece51e7622910014b8ec047909
https://salsa.debian.org/qa/qa/commit/4893d7fce8537d6978ace6484889d3e5efe34af5
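
One common way to ask a crawler to stay away from a path is a robots.txt
rule. Whether the commits above do exactly this is an assumption; the rule
text and the CGI path below are illustrative and not copied from them. The
sketch uses Python's standard robots.txt parser to show the effect.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the path is an assumption, not taken from the commits.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /cgi-bin/fakeupstream.cgi
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://qa.debian.org/cgi-bin/fakeupstream.cgi?upstream=sf/example"
bingbot_allowed = parser.can_fetch("bingbot", url)  # the crawler being asked to stop
other_allowed = parser.can_fetch("uscan", url)      # any agent without a rule
```

A rule like this only helps with crawlers that honour robots.txt, which is
why a server-side block is still useful alongside it.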

This has stopped the flood to SourceForge and hopefully will stop the
flood to fakeupstream.cgi, so this bug can likely be closed now, but...

There are some improvements that we could make to QA services:

 * pass on HTTP error codes from services fakeupstream.cgi accesses
 * switch fakeupstream.cgi SourceForge support to using the RSS feed
 * switch fakeupstream.cgi/sf.php User-Agents to legitimate ones

If anyone would like to work on these, please submit a merge request.
If no-one does these fixes, then I may get to them eventually.

That is likely to be the regular SourceForge redirector.

#1033632#77
Date:
2023-04-22 09:57:10 UTC
From:
To:
 * add caching to fakeupstream.cgi

That could be a candidate for integration into fakeupstream.cgi.
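
A rough sketch of what such caching could look like, in Python. It assumes
a per-project cache with a 30 minute lifetime, chosen to match the "one hit
per feed per 30 minutes" limit quoted earlier in this bug. The class name
and the `fetch` callable are hypothetical.

```python
import time


class FeedCache:
    """Remember each feed's last response for `ttl` seconds."""

    def __init__(self, fetch, ttl=30 * 60, clock=time.monotonic):
        self.fetch = fetch   # callable: project name -> response body
        self.ttl = ttl
        self.clock = clock
        self.entries = {}    # project -> (fetched_at, body)

    def get(self, project):
        now = self.clock()
        cached = self.entries.get(project)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: no upstream request is made
        body = self.fetch(project)
        self.entries[project] = (now, body)
        return body
```

Since a CGI script normally starts a new process per request, a real
implementation would need to keep the entries on disk or in a shared store
rather than in memory as done here.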