Dear Maintainer,
After upgrading ejabberd from 23.10 to 24.07, the package is completely
unusable.
1) Starting the server with systemd results in a hang and finally a timeout.
2) Starting it from the command line with 'sudo -u ejabber ejabberctl
foreground' results in some processes started, but no listening sockets.
Logs provide no useful info whatsoever in both cases.
Output with 'loglevel: debug':
2024-10-15 12:16:39.833463+02:00 [info] <0.123.0>@ejabberd_config:load/1:82 Loading configuration from /etc/ejabberd/ejabberd.yml
2024-10-15 12:16:39.846757+02:00 [notice] <0.123.0>@ejabberd_logger:set/1:419 Changing loglevel from 'info' to 'debug'
2024-10-15 12:16:40.079603+02:00 [info] <0.123.0>@ejabberd_config:load/1:89 Configuration loaded successfully
2024-10-15 12:16:40.106040+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.226.0>},
{id,dets_sup},
{mfargs,{dets_sup,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,1000},
{child_type,supervisor}]
2024-10-15 12:16:40.106315+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.227.0>},
{id,dets},
{mfargs,{dets_server,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,2000},
{child_type,worker}]
2024-10-15 12:16:40.120122+02:00 [debug] <0.237.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,mnesia_sup}
started: [{pid,<0.238.0>},
{id,mnesia_event},
{mfargs,{mnesia_sup,start_event,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,30000},
{child_type,worker}]
2024-10-15 12:16:40.121216+02:00 [debug] <0.237.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,mnesia_sup}
started: [{pid,<0.239.0>},
{id,mnesia_ext_sup},
{mfargs,{mnesia_ext_sup,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,infinity},
{child_type,supervisor}]
2024-10-15 12:16:40.121668+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.241.0>},
{id,mnesia_monitor},
{mfargs,{mnesia_monitor,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,3000},
{child_type,worker}]
2024-10-15 12:16:40.123445+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.242.0>},
{id,mnesia_subscr},
{mfargs,{mnesia_subscr,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,3000},
{child_type,worker}]
2024-10-15 12:16:40.127259+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.243.0>},
{id,mnesia_locker},
{mfargs,{mnesia_locker,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,3000},
{child_type,worker}]
2024-10-15 12:16:40.130249+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.244.0>},
{id,mnesia_recover},
{mfargs,{mnesia_recover,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,180000},
{child_type,worker}]
2024-10-15 12:16:40.153184+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.249.0>},
{id,disk_log_sup},
{mfargs,{disk_log_sup,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,1000},
{child_type,supervisor}]
2024-10-15 12:16:40.153398+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.250.0>},
{id,disk_log_server},
{mfargs,{disk_log_server,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,2000},
{child_type,worker}]
Hi, I tried again, a bit more logging this time. 1) Stopped ejabberd 2) Restored a backup of /var/lib/ejabberd 3) Downgraded ejabberd and erlang packages to their versions in sid on 2024-08-01 4) The upgrade succeeded and started ejabberd succesfully 5) Fully functional ejabberd server! 6) Stopped ejabberd 7) Upgraded ejabberd and erlang to current versions in sid 8) dpkg hangs on starting ejabberd 9) ejabberd processes take 100+% cpu 10) systemd eventually times out the start of ejabberd 11) systemd restarts ejabberd 12) ejabberd dysfunctional The log of steps 7-12 are attached. -- mvg, Alex Hermann
Only downgrading ejabberd to 23.10-1 and keeping erlang packages up2date (versioned as in the initial report) also results in a functional ejabberd server. -- mvg, Alex Hermann
control: severity -1 grave I’m raising the severity of this bug as it makes the package completely unusable.
* Soren Stoutner <soren@debian.org> [2025-03-26 16:56]: Installing and running ejabberd 24.12-2 on a fresh trixie system works just fine for me. So maybe the "grave" severity isn't required here, as that's meant for bugs that render the package unusable by "most or all users"?
* Alex <gaaf@gmx.net> [2024-11-01 13:29]: According to your logs, the actual ejabberd process was started just fine. However, the systemd unit then calls `ejabberdctl started` in order to block until startup is completed, and _that_ process hangs. `ejabberdctl started` tries to talk to the running ejabberd process via TCP, I'd assume there's some issue with that connection, and we could try to track that down. However, it might make sense to make `ejabberdctl foreground` work (which doesn't involve such a connection), first. Could you set the log level to 'debug' and show us the output of that command?
Fair enough. Has anyone successfully tested an upgrade from 23.10-1? That broke for me. If it breaks for everyone upgrading from 23.10-1, then the severity of grave would be justified. If it only breaks for a subset of upgrades, then it wouldn't.
I did test upgrades from 23.01-1 in Debian/bookworm and that worked fine and without any problems at all. I haven't tested upgrades from 23.10-1 though.
* Soren Stoutner <soren@debian.org> [2025-03-28 09:37]: We could try to track down the details if you like. I'd start with: 1. Set `loglevel: debug` in `/etc/ejabberd.yml`. 2. systemctl stop ejabberd 3. ejabberdctl foreground What's the output?
root@mail:/etc/ejabberd# ejabberdctl foreground
2025-03-31 12:19:28.482905-07:00 [info] Loading configuration from /
etc/ejabberd/ejabberd.yml
2025-03-31 12:19:28.490786-07:00 [notice] Changing loglevel from
'info' to 'debug'
2025-03-31 12:19:28.635748-07:00 [info] Configuration loaded
successfully
2025-03-31 12:19:28.652303-07:00 [debug] PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.217.0>},
{id,dets_sup},
{mfargs,{dets_sup,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,1000},
{child_type,supervisor}]
2025-03-31 12:19:28.652504-07:00 [debug] PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.218.0>},
{id,dets},
{mfargs,{dets_server,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,2000},
{child_type,worker}]
2025-03-31 12:19:28.661390-07:00 [debug] PROGRESS REPORT:
supervisor: {local,mnesia_sup}
started: [{pid,<0.228.0>},
{id,mnesia_event},
{mfargs,{mnesia_sup,start_event,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,30000},
{child_type,worker}]
2025-03-31 12:19:28.661994-07:00 [debug] PROGRESS REPORT:
supervisor: {local,mnesia_sup}
started: [{pid,<0.229.0>},
{id,mnesia_ext_sup},
{mfargs,{mnesia_ext_sup,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,infinity},
{child_type,supervisor}]
2025-03-31 12:19:28.662312-07:00 [debug] PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.231.0>},
{id,mnesia_monitor},
{mfargs,{mnesia_monitor,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,3000},
{child_type,worker}]
2025-03-31 12:19:28.663217-07:00 [debug] PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.232.0>},
{id,mnesia_subscr},
{mfargs,{mnesia_subscr,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,3000},
{child_type,worker}]
2025-03-31 12:19:28.665174-07:00 [debug] PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.233.0>},
{id,mnesia_locker},
{mfargs,{mnesia_locker,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,3000},
{child_type,worker}]
2025-03-31 12:19:28.666551-07:00 [debug] PROGRESS REPORT:
supervisor: {local,mnesia_kernel_sup}
started: [{pid,<0.234.0>},
{id,mnesia_recover},
{mfargs,{mnesia_recover,start,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,180000},
{child_type,worker}]
2025-03-31 12:19:28.681598-07:00 [debug] PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.239.0>},
{id,disk_log_sup},
{mfargs,{disk_log_sup,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,1000},
{child_type,supervisor}]
2025-03-31 12:19:28.681801-07:00 [debug] PROGRESS REPORT:
supervisor: {local,kernel_safe_sup}
started: [{pid,<0.240.0>},
{id,disk_log_server},
{mfargs,{disk_log_server,start_link,[]}},
{restart_type,permanent},
{significant,false},
{shutdown,2000},
{child_type,worker}]
The contents of /var/log/ejabberd/error.log:
2025-03-31 12:19:20.471345-07:00 [error] <0.148.0> External eimp
process (pid=972861) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471482-07:00 [error] <0.150.0> External eimp
process (pid=972862) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471618-07:00 [error] <0.160.0> External eimp
process (pid=972867) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471543-07:00 [error] <0.152.0> External eimp
process (pid=972863) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471485-07:00 [error] <0.154.0> External eimp
process (pid=972864) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471771-07:00 [error] <0.164.0> External eimp
process (pid=972869) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471845-07:00 [error] <0.168.0> External eimp
process (pid=972871) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471868-07:00 [error] <0.176.0> External eimp
process (pid=972875) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471834-07:00 [error] <0.166.0> External eimp
process (pid=972870) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471582-07:00 [error] <0.158.0> External eimp
process (pid=972866) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471515-07:00 [error] <0.156.0> External eimp
process (pid=972865) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471547-07:00 [error] <0.170.0> External eimp
process (pid=972872) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471590-07:00 [error] <0.172.0> External eimp
process (pid=972873) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471916-07:00 [error] <0.178.0> External eimp
process (pid=972876) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471755-07:00 [error] <0.174.0> External eimp
process (pid=972874) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471763-07:00 [error] <0.162.0> External eimp
process (pid=972868) has terminated unexpectedly, restarting in a few
seconds
ejabberdctl pegs my CPU at about 10% and never successfully launches a
running instance of the server. I terminated the process after about
an hour without anything else being written to the log.
* Soren Stoutner <soren@debian.org> [2025-03-31 13:42]: eimp is a small helper process, we should try to figure out why it terminates. Can you call /usr/lib/erlang/lib/p1_eimp*/priv/bin/eimp manually? (It's supposed to run without producing any output; if that works, just press Ctrl+C to terminate it.) May AppArmor or similar prevent eimp from running? Maybe just `systemctl stop apparmor` and try again just to rule that out?
I can, but every time I do so I have to take my system offline and then restore the database from a backup to fix it, so it might be a couple of days before I can get to it.
* Soren Stoutner <soren@debian.org> [2025-04-01 09:09]: I'd hope you could stop ejabberd and PostgreSQL/MariaDB (if you're using that), copy /etc/ejabberd and (if you're using Mnesia) /var/lib/ejabberd and/or /var/lib/postgresql or /var/lib/mysql (if you're using that) into some backup directory before upgrading ejabberd. Before reverting to the old version, stop ejabberd/SQL and restore those directories from the backup directory. If you're using PostgreSQL/MariaDB as storage backend (strongly recommended BTW), the contents of /var/lib/ejabberd can be deleted between upgrades/downgrades.
Holger,
However, I did get this interesting bit of information in the log:
2025-04-04 16:48:46.882266-07:00 [error] <0.228.0>
Mnesia(ejabberd@mail): ** ERROR ** (core dumped to file: "/var/lib/
ejabberd/MnesiaCore.ejabberd@mail_1743_810526_689352")
** FATAL ** mnesia_tm crashed: {badarg,
[{erlang,unlink,
[undefined],
[{error_info,
#{module => erl_erts_errors}}]},
{mnesia_controller,
release_schema_commit_lock,0,
[{file,"mnesia_controller.erl"},
{line,333}]},
{mnesia_dumper,update,3,
[{file,"mnesia_dumper.erl"},{line,
301}]},
{mnesia_tm,do_commit,3,
[{file,"mnesia_tm.erl"},{line,
1846}]},
{mnesia_tm,recover_coordinator,2,
[{file,"mnesia_tm.erl"},{line,
646}]},
{mnesia_tm,handle_exit,3,
[{file,"mnesia_tm.erl"},{line,
623}]},
{mnesia_sp,init_proc,4,
[{file,"mnesia_sp.erl"},{line,
35}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,
329}]}]} state: [<0.230.0>]
2025-04-04 16:48:56.690142-07:00 [error]
<0.471.0>@supervisor:do_restart/3:1300 SUPERVISOR REPORT:
supervisor: {local,ejabberd_backend_sup}
errorContext: child_terminated
reason: killed
offender: [{pid,<0.475.0>},
{id,ejabberd_router_mnesia},
{mfargs,{ejabberd_router_mnesia,start_link,[]}},
{restart_type,transient},
{significant,false},
{shutdown,5000},
{child_type,worker}]
Before attempting the update
MnesiaCore.ejabberd@mail_1743_810526_689352 was 0.3 MB. After
attempting the update, the file size had ballooned to 235 MB.
However, before the update archive_msg.DAT was 285 MB, and ballooned
to over 300 MB during the failed update.
Do these file sizes seem large to you? It is perhaps a timeout trying
to process the database because of the size? Is there some way to
compact/purge the database?
because with my last update to testing a few weeks ago some other
package borked my AppArmor installation. I haven’t looked deeply into
it. I assume it will fix itself with the next batch of updates I plan
to install next week. But just in case it is important for this bug
report:
Apr 04 16:54:40 mail apparmor.systemd[4134057]: Restarting AppArmor
Apr 04 16:54:40 mail apparmor.systemd[4134057]: Reloading AppArmor
profiles
Apr 04 16:54:40 mail apparmor.systemd[4134161]: Skipping profile in /
etc/apparmor.d/disable: usr.bin.thunderbird
Apr 04 16:54:40 mail apparmor.systemd[4134170]: profile has merged
rule with conflicting x modifiers
Apr 04 16:54:40 mail apparmor.systemd[4134170]: ERROR processing
regexs for profile su, failed to load
Apr 04 16:54:40 mail apparmor.systemd[4134057]: Error: At least one
profile failed to load
Apr 04 16:54:40 mail systemd[1]: apparmor.service: Main process
exited, code=exited, status=1/FAILURE
* Soren Stoutner <soren@debian.org> [2025-04-04 17:22]: Ah, thanks. Yes, this clearly indicates that the Mnesia table used for storing the server-side message archive is borked (there's a max. size limit of 2 GiB), and it's most probably the reason ejabberd fails to start up. There's nothing the Debian package (maintainer) could do to address this, I'm afraid. The problem is that Mnesia is not designed for persistent storage of non-trivial amounts of data (it's more of a Redis replacement than an SQL replacement). It's still ejabberd's default storage backend for historical reasons. Back in the days, there was no server-side archive. But when using it for this kind of data, breakage can happen (e.g., when upgrading Erlang). That's the reason you'll see a warning to not use Mnesia in the logs during startup, and in the default configuration. I've been advocating for removing Mnesia support altogether and switching to SQLite by default, but we didn't get round to that, yet (upstream, I mean). So, for cases such as yours, I see no proper fix, and would instead suggest one of the following two options: 1) Remove Mnesia's archive table (i.e., all messages stored server-side) and maybe also the PubSub tables before upgrading. The remaining tables will probably survive the upgrade, and fresh archive/PubSub tables will be usable after the upgrade (until the next breakage, at least). 2) Convert to SQL before upgrading (strongly recommended). I could help with either (e.g., in the ejabberd support channel), but maybe that's out of scope here?