- Package: mailman3-web
- Source: mailman-suite
- Submitter: Peter Chubb
- Date: 2025-08-11 21:09:01 UTC
- Severity: normal
- Tags:
Dear Maintainer, I have a mailman3 system backed by PostgreSQL, exim4, and nginx; it is set up and works properly. However, the uwsgi process keeps growing until the system OOMs, typically after two to three weeks. I added more RAM (the system now has 3 GB), but that only postponed the problem; it did not fix it. As a workaround I now restart the mailman3 service once a day.
Hi, I'm pretty sure I'm seeing this too. I'm running it under apache2 and with mariadb. After a week or so uwsgi was using about 7% RAM on an 8G machine. I restarted mailman3-web and that went back to 1%. One day later it is up to 1.2%; I guess it will keep growing and I will also have to regularly restart mailman3-web. Is it easy to switch the mailman3-web package to run under gunicorn? Cheers, Andy
Hi, Peter Chubb <peter.chubb@unsw.edu.au> wrote on 29/06/2022 at 03:11:15+0200: Having the same kind of setup for the past 6 years, I never had such an issue. Do you have more intel?
Pierre-Elliott> Having the same kind of setup for the past 6 years, I
Pierre-Elliott> never had such an issue.

Since increasing the size of the VM and the last Mailman3 upgrade, I haven't seen the issue.
I can also confirm this running mailman3-web in Apache. Usually it only takes a few days. I have attached a graph to illustrate the growth.
I am using this to launch it:
WSGIDaemonProcess mailman3 processes=1 threads=8 display-name=%{GROUP} home=/usr/share/mailman3-web
Adding "maximum-requests=1" does not help at all. Swapping the processes and threads values does not change anything either.
I have no idea what to look for but I am happy to investigate if you have any ideas.
root@mail02:~/configuration/haj.ipfire.org/mail02# dpkg -l mailman3*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version         Architecture Description
+++-==============-===============-============-============================================================
ii  mailman3       3.3.8-2~deb12u1 all          Mailing list management system
un  mailman3-core  <none>          <none>       (no description available)
un  mailman3-doc   <none>          <none>       (no description available)
ii  mailman3-full  3.3.8-2~deb12u1 all          Full Mailman3 mailing list management suite (metapackage)
ii  mailman3-web   0+20200530-2.1  all          Django project integrating Mailman3 Postorius and HyperKitty
Control: tags -1 -moreinfo [...] What do you need? :) We've been running Mailman 3 from Debian packages for a couple of months now, and we're seeing recurring OOM errors. At first, we were hitting 8GB memory usage, and bumped the memory of that machine to 16GB, but we're still getting OOMs. Our incident log is in: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41957 Here's a screenshot of a Grafana dashboard of our "per-process memory exporter" that shows, well, per-process memory usage:
Hello everyone, I would also be happy to provide more information. I am running mailman3-web in Apache with mod_wsgi and I have the same memory usage problem. That is why I thought it was a mailman3 problem rather than a problem in the application hosting it. I would be happy to hear if running mailman3 in Gunicorn resolves the problem, but maybe it is just a coincidence that the problem doesn't appear there? All the best, -Michael
Have you pinned down exactly *what* process is eating memory? For us it's clearly uwsgi, so we're thinking the issue actually doesn't lie within mailman itself, and upstream seems to think so as well. It could be! If you could show us OOM dmesg logs, they should show which process was actually using memory when the OOM happens, this should inform next steps pretty well. Alternatively, having per-process memory graphs would help too, I think. Otherwise I'm not sure what peb needs here. :) a.
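Where no per-process memory exporter is installed, one way to get the per-process numbers asked for above is to read `VmRSS` from `/proc/<pid>/status`. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the helper names `parse_status` and `rss_by_name` are made up for this illustration and are not part of mailman, uwsgi, or any exporter.

```python
import os
from collections import defaultdict


def parse_status(text):
    """Return (process name, resident set size in kB) from /proc/<pid>/status text.

    Kernel threads have no VmRSS line; they are reported with 0 kB.
    """
    name, rss_kb = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            # The value looks like "   123456 kB".
            rss_kb = int(value.split()[0])
    return name, rss_kb


def rss_by_name(proc_root="/proc"):
    """Sum resident memory (kB) per process name, e.g. all uwsgi workers together."""
    totals = defaultdict(int)
    if not os.path.isdir(proc_root):
        return dict(totals)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                name, rss_kb = parse_status(handle.read())
        except OSError:
            # The process exited between listdir() and open().
            continue
        totals[name] += rss_kb
    return dict(totals)


if __name__ == "__main__":
    top = sorted(rss_by_name().items(), key=lambda item: item[1], reverse=True)
    for name, rss_kb in top[:10]:
        print(f"{rss_kb / 1024:10.1f} MiB  {name}")
```

Run periodically (from cron, say) and logged, this gives a rough growth curve per process name. Summing RSS counts shared pages once per process, so it overstates the total for forked workers.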
On 2025-01-15 15:56:27, Michael Tremer wrote: [...] I would try bumping memory to 16GiB, to see if it improves the situation for you. In our case, it clearly showed, rather conclusively, that the problem was not just "oh, mailman3 is using more memory" but more clearly "wow, there's a problem with uwsgi". In the above stats, it's not entirely clear to me the cause is with Apache: you have a lot going on there, and it *could* actually be there's an issue with the overall memory usage and Apache is just being tagged as the culprit by the OOM... But yeah, your numbers might show there's actually an underlying issue with mailman-web itself. Our tests with gunicorn will more conclusively show whether or not it's the case: if the issue goes away in gunicorn, then this could be an issue in *both* uwsgi and apache2-wsgi... a.
On 2025-01-15 16:13:44, Michael Tremer wrote: [...] For the record, I absolutely agree. Honestly, I'm scratching an itch here. If I can get away with getting rid of the OOM by switching to gunicorn, I'll be happy, especially since we use gunicorn elsewhere... a.
Good morning everyone, I ran the machine now with a total of 16 GiB - no other modifications have been made. Since then, the Apache process consumed the entirety of memory (minus the other basic system services) and was killed by the OOM. Graph attached.
I bet this is Apache killing and spawning its children from time to time, sometimes hitting leaky ones. Clearly there's a memory leak in this implementation as well, but we'll know better whether it's specific to apache/uwsgi when we test with gunicorn. Stay tuned!
On 2025-01-15 10:04:54, Antoine Beaupré wrote: [...]

It's been a little over 24 hours and we can already say that we still get OOMs under gunicorn. The interesting thing is that a different process now shows the OOM condition: instead of gunicorn itself (which you'd expect if it were designed like uwsgi or apache2-mod-wsgi), it's Python itself eating all the memory. See this comment: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41957#note_3151902

Direct link to the per-process memory graph: https://gitlab.torproject.org/-/project/441/uploads/c8ebf60612c426688e651853f251edd5/mem3.png

So my theory, at this moment, is that the assumption that the problem is related to the process manager (uwsgi or apache or gunicorn) is incorrect; this is actually and truly a memory management issue inside the Python process running mailman3-web. This bug report would, therefore, seem to be filed at the right place.

The question at this point is: how do we profile this any further? Any advice? Run a memory profiler like austin?

a.
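On the profiling question: if an external profiler is not an option, Python's built-in `tracemalloc` module can compare two heap snapshots and list the source lines whose allocations grew in between. This is a minimal sketch and not something anyone in this thread reported running. `leaky_cache` and `handle_request` are hypothetical stand-ins for whatever code path is suspected, and in a real deployment the snapshots would have to be taken inside the worker process that serves mailman3-web.

```python
import tracemalloc

leaky_cache = []  # hypothetical stand-in for state that grows on every request


def handle_request(size=10_000):
    """Pretend request handler that never releases what it allocates."""
    leaky_cache.append(bytearray(size))


def top_growth(before, after, limit=5):
    """Return the allocation sites that grew the most between two snapshots."""
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(100):
        handle_request()
    after = tracemalloc.take_snapshot()
    # Each entry names a file and line plus how many bytes it gained.
    for stat in top_growth(before, after):
        print(stat)
    tracemalloc.stop()
```

Tracing adds overhead of its own, so it is probably best enabled on a single worker for a limited time.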
At last, we have news! I *think* I have identified the culprit. While handling an unrelated issue (GDPR anyone?) we had to rebuild the search indexes and, while testing *that*, we found that we could pretty reliably crash mailman-web by... well, just searching all lists crashes it. Boom. It's search? I'm in the process of switching to Xapian now. This brings a whole lot of other issues (it uses more disk space and there's a bug in the xapian-haystack library that crashes indexing, see #), but so far, we've completely cleared out any OOM errors we were previously getting. Check out this beauty:
Forgot the bug number here, it's #1095320.
Hello,
Apologies for my late reply.
Hmm, I don’t want to bring everybody down, but I think I cannot confirm this.
# FTS
HAYSTACK_CONNECTIONS = {
    'default': {
        'PATH': "/var/lib/mailman3/web/fulltext_index",
        'ENGINE': 'xapian_backend.XapianEngine'
    },
}
Looks pretty much the same to me.
Xapian has not been giving us great results in the rest of our infrastructure. We used it in dovecot, where it created HUGE indexes, about half the size of the original inboxes, which made even searching them very slow. We migrated to Solr there, but that was not an option for Mailman.
Whoosh was expectedly worse.
So, has this solved it all for good for you guys? What release of xapian are you on?
# apt-cache show python3-xapian-haystack
Package: python3-xapian-haystack
Source: python-xapian-haystack
Version: 2.1.1-1+deb12u1
Installed-Size: 91
Maintainer: Debian Python Team <team+python@tracker.debian.org>
Architecture: all
Depends: python3-django-haystack, python3-xapian, python3-django, python3:any
Enhances: python3-django-haystack
Description-en: Xapian backend for Django-Haystack (Python3 version)
Xapian-haystack is a backend of Django-Haystack for the Xapian search engine.
It provides all the standard features of Haystack:
* Weighting
* Faceted search (date, query, etc.)
* Sorting
* Spelling suggestions
* EdgeNGram and Ngram (for autocomplete)
The endswith search operation is not supported.
.
This package contains the Python 3 version of the library.
Description-md5: 5e43ae0149e2df6b3df16ddcf87f3b13
Homepage: https://github.com/notanumber/xapian-haystack/
Section: python
Priority: optional
Filename: pool/main/p/python-xapian-haystack/python3-xapian-haystack_2.1.1-1+deb12u1_all.deb
Size: 21412
MD5sum: c203fd6ef9a992ad418f0685a528a45e
SHA256: 9b70209f36b9bccbfda0346b048d024c48fd4c168e8a0bfe811a3c770eb18287
On 2025-02-24 15:56:57, Michael Tremer wrote: [...] OOM/day, with peaks at 15, 120 when reindexing, and this is down to 1-5 a day, depending on the day. Kind of hard to track discrete events like this... We had a single OOM in the last 48h. That's "nice". We also have stupidly large Xapian indexes now, it's ridiculous. Clearly something is wrong either with the haystack or the hyperkitty implementation. So far I've filed it in the latter: https://gitlab.com/mailman/hyperkitty/-/issues/533 So, TL;DR: improved, but not fixed. I suspect we had a multi-dimensional issue, of which search/whoosh *was* a part, because we would see a huge increase in OOMs when rebuilding the indexes. But we're still having an issue, so perhaps there's something else. We tried to hook up a memory profiler (austin) but it failed because it didn't work with Python 3.11... so maybe that's something we'll try to revisit after our trixie upgrades (hopefully soon!). a.
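Since discrete OOM events are hard to track by eye, one option is to count kills per process name from the kernel log. The sketch below assumes the kernel reports each kill in the `Killed process <pid> (<name>)` form used by recent Linux kernels; the sample lines are invented for illustration and are not taken from any machine in this thread.

```python
import re
from collections import Counter

# Assumed message shape; adjust the pattern if your kernel words it differently.
KILL_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def count_oom_kills(lines):
    """Count OOM kills per process name in an iterable of kernel log lines."""
    kills = Counter()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            kills[match.group("name")] += 1
    return kills


if __name__ == "__main__":
    # Invented sample input; real input would come from dmesg or the journal.
    sample = [
        "kernel: Out of memory: Killed process 4242 (uwsgi) total-vm:8388608kB",
        "kernel: Out of memory: Killed process 977 (python3) total-vm:4194304kB",
        "kernel: usb 1-1: new high-speed USB device",
    ]
    for name, count in count_oom_kills(sample).most_common():
        print(f"{count:4d}  {name}")
```

Feeding it one day of kernel log at a time would give the OOM/day figure discussed above, broken down by the process that was killed.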
Hello Antoine, Okay, this might still be a slight step in the right direction. This is my experience with Xapian and I have found confirmation that this is supposed to be normal. My mailbox indexes were massive and there was no point having them any more. So I can confirm that this looked very similar in Dovecot, too. I tried that but I was struggling with a missing sssd and some other things. Not sure I am ready to try again. It would also not help us to find out where exactly this went wrong if it were fixed :(
[...] A 10x amplification in the disk usage is not normal. https://gitlab.com/mailman/hyperkitty/-/issues/533 I use notmuch as a search index here, and the amplification is *opposite*, 4.5x *reduction* in disk usage compared to the original dataset.
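To put a number on the amplification being discussed, the index directory can be measured and divided by the size of the source data. This is a rough sketch: `tree_size` and `amplification` are names invented here, and how to measure the source data (a database dump, an mbox export, a maildir) is left as an assumption for each site to fill in.

```python
import os


def tree_size(path):
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def amplification(index_bytes, source_bytes):
    """Index size divided by source size; above 1 means the index is larger."""
    if source_bytes <= 0:
        raise ValueError("source size must be positive")
    return index_bytes / source_bytes


if __name__ == "__main__":
    # Measures the current directory only as a stand-in for an index path.
    print(f"{tree_size('.') / 2**20:.1f} MiB under the current directory")
```

A result of 10 would match the 10x figure above, while the notmuch case described there would come out at roughly 1/4.5, or about 0.22.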
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41957 We've gone from 20-40 OOMs/week (multiple daily) to ~3 per week, so Xapian has definitely improved the situation. I don't think this bug report should be closed though: we still have a memory leak issue. I don't think it's reasonable for mailman to take 16GB of RAM for such small setups. Xapian is also using an unreasonable amount of disk space. But for all intents and purposes, this is as much effort I can dedicate to this. Hopefully, when we upgrade to trixie, we can run a profiler on this. Feel free to remind me of that in a year. Otherwise happy to provide more info as needed of course. Cheers!
From: Michael Tremer <michael.tremer@ipfire.org>
To: Antoine Beaupré <anarcat@debian.org>
Cc: 1014037@bugs.debian.org; Pierre-Elliott Bécue <peb@debian.org>; Peter Chubb <peter.chubb@unsw.edu.au>
Date: 6 March 2025 11:21:38
Subject: Re: Bug#1014037: mailman3-web: Possible memory leak: uwsgi OOMs after a few weeks

Hello, For the sake of clarity: I am waiting for the transition freeze before updating all mailman3 packages, as every py3 transition so far has broken a lot of things. In parallel I have started to dig into this Xapian matter. Using mu, I agree that the current size of the index is weird. I have yet to finish understanding the codebase, but I'll definitely try to see it through ASAP. Bests,
Hello, I have been looking at alternatives to mailman3 recently. I think there is a very good chance we will migrate to mlmmj. There are currently too many large outstanding problems with mailman, and it seems that there is not enough of a community around it to get them fixed in time. Although the large memory consumption is mostly annoying and not a deal-breaker, mailman also sometimes stops accepting emails and needs a restart.

However, mlmmj (also packaged in Debian) is super small and super simple. The feature set is exactly what we need, although it would have been nice to have an API to subscribe/unsubscribe users. Since it is all a collection of small binaries, that can be built very easily with a couple of CGI scripts or so. But it does not have any archiving features beyond storing all emails in a directory.

So there is public-inbox, which has a simple web UI, stores emails in Git repositories that can be cloned and backed up very easily, *and* uses Xapian indexes. So I thought I would give this a go and import our lists into it, just so that I have a way to compare. On mailman, my Xapian index is about 4.9 GiB; on public-inbox I have 1.4 GiB. A significant change. This is still kind of large, but only a bit more than a quarter. The search also seems to be much faster. So I assume there is some configuration here that makes the index a lot smaller, and the smaller the index, the faster the search usually is.

The mlmmj + public-inbox solution seems to have gained a lot of traction recently. The Linux kernel people are using it (https://lore.kernel.org/), Gentoo is using it (https://public-inbox.gentoo.org/), Proxmox too; the list is actually quite long. So I think we might have a better chance of getting back something that works as well as mailman 2 did, without all this complexity.

I agree. This is not the most painful problem in the world, but our mailing list needs to *just work* and I cannot spend a lot of time on keeping it working.
I would still be curious to find out what the actual problem was here though...
On 2025-03-06 11:27:52, Pierre-Elliott Bécue wrote: [...] That sounds fantastic peb! Let me know if you need any more data! (What's "mu"?)
On 2025-01-23 16:20:07, Antoine Beaupré wrote: [...]

We found a fix. First, we tried austin, but it was filling up our logs more than anything. Then lavamind found the trick: turns out the mbox export slurps the entire archive in memory before compression. https://gitlab.com/mailman/hyperkitty/-/issues/385

oops. so this is a bug in hyperkitty, and a pretty big one, IMHO. this is denial-of-service security level stuff, but it's being treated as a feature request upstream.

the workaround is to set `HYPERKITTY_MBOX_EXPORT = False`

perhaps we could ship such a configuration in debian by default?

incident closed on our end, thanks everyone here for the help!

a.
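To illustrate why buffering a whole archive before compression is costly, here is a small sketch contrasting that pattern with a streaming one. It is not HyperKitty's code: `export_buffered` and `export_streaming` are hypothetical names, and the messages are plain byte strings standing in for archived emails.

```python
import gzip
import zlib


def export_buffered(messages):
    """Anti-pattern: join every message in memory, then compress in one go."""
    return gzip.compress(b"".join(messages))


def export_streaming(messages, level=6):
    """Yield gzip-compressed chunks while holding one message at a time."""
    # wbits=31 selects a gzip header and trailer instead of raw zlib framing.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    for message in messages:
        chunk = compressor.compress(message)
        if chunk:
            yield chunk
    yield compressor.flush()


if __name__ == "__main__":
    archive = (b"From list@example.org\n\nmessage %d\n\n" % i for i in range(1000))
    compressed_size = sum(len(chunk) for chunk in export_streaming(archive))
    print(f"streamed {compressed_size} compressed bytes")
```

With the buffered variant, peak memory grows with the size of the archive; with the generator it stays close to the size of one message plus the compressor's internal buffer. In a Django view, a generator like this could in principle be handed to `StreamingHttpResponse`, though whether that fits HyperKitty's export view is not something this thread establishes.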