#1085140 ejabberd does not listen on any port after update to 24.07

Package:
ejabberd
Source:
ejabberd
Description:
extensible realtime platform (XMPP server + MQTT broker + SIP service)
Submitter:
Alex Hermann
Date:
2025-04-05 08:42:02 UTC
Severity:
normal
Tags:
#1085140#5
Date:
2024-10-15 10:38:26 UTC
From:
To:
Dear Maintainer,

After upgrading ejabberd from 23.10 to 24.07, the package is completely
unusable.

1) Starting the server with systemd results in a hang and finally a timeout.
2) Starting it from the command line with 'sudo -u ejabber ejabberctl
   foreground' results in some processes started, but no listening sockets.

Logs provide no useful info whatsoever in both cases.


Output with 'loglevel: debug':

2024-10-15 12:16:39.833463+02:00 [info] <0.123.0>@ejabberd_config:load/1:82 Loading configuration from /etc/ejabberd/ejabberd.yml
2024-10-15 12:16:39.846757+02:00 [notice] <0.123.0>@ejabberd_logger:set/1:419 Changing loglevel from 'info' to 'debug'
2024-10-15 12:16:40.079603+02:00 [info] <0.123.0>@ejabberd_config:load/1:89 Configuration loaded successfully
2024-10-15 12:16:40.106040+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.226.0>},
              {id,dets_sup},
              {mfargs,{dets_sup,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,1000},
              {child_type,supervisor}]

2024-10-15 12:16:40.106315+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.227.0>},
              {id,dets},
              {mfargs,{dets_server,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,2000},
              {child_type,worker}]

2024-10-15 12:16:40.120122+02:00 [debug] <0.237.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,mnesia_sup}
    started: [{pid,<0.238.0>},
              {id,mnesia_event},
              {mfargs,{mnesia_sup,start_event,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,30000},
              {child_type,worker}]

2024-10-15 12:16:40.121216+02:00 [debug] <0.237.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,mnesia_sup}
    started: [{pid,<0.239.0>},
              {id,mnesia_ext_sup},
              {mfargs,{mnesia_ext_sup,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,infinity},
              {child_type,supervisor}]

2024-10-15 12:16:40.121668+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.241.0>},
              {id,mnesia_monitor},
              {mfargs,{mnesia_monitor,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,3000},
              {child_type,worker}]

2024-10-15 12:16:40.123445+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.242.0>},
              {id,mnesia_subscr},
              {mfargs,{mnesia_subscr,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,3000},
              {child_type,worker}]

2024-10-15 12:16:40.127259+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.243.0>},
              {id,mnesia_locker},
              {mfargs,{mnesia_locker,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,3000},
              {child_type,worker}]

2024-10-15 12:16:40.130249+02:00 [debug] <0.240.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.244.0>},
              {id,mnesia_recover},
              {mfargs,{mnesia_recover,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,180000},
              {child_type,worker}]

2024-10-15 12:16:40.153184+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.249.0>},
              {id,disk_log_sup},
              {mfargs,{disk_log_sup,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,1000},
              {child_type,supervisor}]

2024-10-15 12:16:40.153398+02:00 [debug] <0.74.0>@supervisor:report_progress/2:1563 PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.250.0>},
              {id,disk_log_server},
              {mfargs,{disk_log_server,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,2000},
              {child_type,worker}]

#1085140#10
Date:
2024-11-01 12:29:01 UTC
From:
To:
Hi,


I tried again, a bit more logging this time.

1) Stopped ejabberd
2) Restored a backup of /var/lib/ejabberd
3) Downgraded ejabberd and erlang packages to their versions in sid on
2024-08-01
4) The upgrade succeeded and started ejabberd succesfully

5) Fully functional ejabberd server!

6) Stopped ejabberd
7) Upgraded ejabberd and erlang to current versions in sid
8) dpkg hangs on starting ejabberd
9) ejabberd processes take 100+% cpu
10) systemd eventually times out the start of ejabberd
11) systemd restarts ejabberd
12) ejabberd dysfunctional


The log of steps 7-12 are attached.
--
mvg,

Alex Hermann

#1085140#15
Date:
2024-11-01 12:29:42 UTC
From:
To:
Only downgrading ejabberd to 23.10-1 and keeping erlang packages up2date
(versioned as in the initial report) also results in a functional
ejabberd
server.
--
mvg,

Alex Hermann

#1085140#20
Date:
2025-03-26 23:56:16 UTC
From:
To:
control: severity -1 grave

I’m raising the severity of this bug as it makes the package
completely unusable.

#1085140#27
Date:
2025-03-28 11:41:20 UTC
From:
To:
* Soren Stoutner <soren@debian.org> [2025-03-26 16:56]:

Installing and running ejabberd 24.12-2 on a fresh trixie system works
just fine for me.  So maybe the "grave" severity isn't required here, as
that's meant for bugs that render the package unusable by "most or all
users"?

#1085140#32
Date:
2025-03-28 11:40:39 UTC
From:
To:
* Alex <gaaf@gmx.net> [2024-11-01 13:29]:

According to your logs, the actual ejabberd process was started just
fine.  However, the systemd unit then calls `ejabberdctl started` in
order to block until startup is completed, and _that_ process hangs.
`ejabberdctl started` tries to talk to the running ejabberd process via
TCP, I'd assume there's some issue with that connection, and we could
try to track that down.

However, it might make sense to make `ejabberdctl foreground` work
(which doesn't involve such a connection), first.  Could you set the log
level to 'debug' and show us the output of that command?

#1085140#37
Date:
2025-03-28 16:37:53 UTC
From:
To:
Fair enough.  Has anyone successfully tested an upgrade from 23.10-1?
That broke for me.  If it breaks for everyone upgrading from 23.10-1,
then the severity of grave would be justified.  If it only breaks for
a subset of upgrades, then it wouldn't.

#1085140#42
Date:
2025-03-30 08:30:59 UTC
From:
To:
I did test upgrades from 23.01-1 in Debian/bookworm and that worked fine and
without any problems at all. I haven't tested upgrades from 23.10-1 though.

#1085140#47
Date:
2025-03-30 10:39:27 UTC
From:
To:
* Soren Stoutner <soren@debian.org> [2025-03-28 09:37]:

We could try to track down the details if you like.  I'd start with:

1. Set `loglevel: debug` in `/etc/ejabberd.yml`.
2. systemctl stop ejabberd
3. ejabberdctl foreground

What's the output?

#1085140#56
Date:
2025-03-31 20:42:31 UTC
From:
To:
root@mail:/etc/ejabberd# ejabberdctl foreground
2025-03-31 12:19:28.482905-07:00 [info] Loading configuration from /
etc/ejabberd/ejabberd.yml
2025-03-31 12:19:28.490786-07:00 [notice] Changing loglevel from
'info' to 'debug'
2025-03-31 12:19:28.635748-07:00 [info] Configuration loaded
successfully
2025-03-31 12:19:28.652303-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.217.0>},
              {id,dets_sup},
              {mfargs,{dets_sup,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,1000},
              {child_type,supervisor}]

2025-03-31 12:19:28.652504-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.218.0>},
              {id,dets},
              {mfargs,{dets_server,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,2000},
              {child_type,worker}]

2025-03-31 12:19:28.661390-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,mnesia_sup}
    started: [{pid,<0.228.0>},
              {id,mnesia_event},
              {mfargs,{mnesia_sup,start_event,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,30000},
              {child_type,worker}]

2025-03-31 12:19:28.661994-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,mnesia_sup}
    started: [{pid,<0.229.0>},
              {id,mnesia_ext_sup},
              {mfargs,{mnesia_ext_sup,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,infinity},
              {child_type,supervisor}]

2025-03-31 12:19:28.662312-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.231.0>},
              {id,mnesia_monitor},
              {mfargs,{mnesia_monitor,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,3000},
              {child_type,worker}]

2025-03-31 12:19:28.663217-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.232.0>},
              {id,mnesia_subscr},
              {mfargs,{mnesia_subscr,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,3000},
              {child_type,worker}]

2025-03-31 12:19:28.665174-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.233.0>},
              {id,mnesia_locker},
              {mfargs,{mnesia_locker,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,3000},
              {child_type,worker}]

2025-03-31 12:19:28.666551-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,mnesia_kernel_sup}
    started: [{pid,<0.234.0>},
              {id,mnesia_recover},
              {mfargs,{mnesia_recover,start,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,180000},
              {child_type,worker}]

2025-03-31 12:19:28.681598-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.239.0>},
              {id,disk_log_sup},
              {mfargs,{disk_log_sup,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,1000},
              {child_type,supervisor}]

2025-03-31 12:19:28.681801-07:00 [debug] PROGRESS REPORT:
    supervisor: {local,kernel_safe_sup}
    started: [{pid,<0.240.0>},
              {id,disk_log_server},
              {mfargs,{disk_log_server,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,2000},
              {child_type,worker}]


The contents of /var/log/ejabberd/error.log:

2025-03-31 12:19:20.471345-07:00 [error] <0.148.0> External eimp
process (pid=972861) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471482-07:00 [error] <0.150.0> External eimp
process (pid=972862) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471618-07:00 [error] <0.160.0> External eimp
process (pid=972867) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471543-07:00 [error] <0.152.0> External eimp
process (pid=972863) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471485-07:00 [error] <0.154.0> External eimp
process (pid=972864) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471771-07:00 [error] <0.164.0> External eimp
process (pid=972869) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471845-07:00 [error] <0.168.0> External eimp
process (pid=972871) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471868-07:00 [error] <0.176.0> External eimp
process (pid=972875) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471834-07:00 [error] <0.166.0> External eimp
process (pid=972870) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471582-07:00 [error] <0.158.0> External eimp
process (pid=972866) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471515-07:00 [error] <0.156.0> External eimp
process (pid=972865) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471547-07:00 [error] <0.170.0> External eimp
process (pid=972872) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471590-07:00 [error] <0.172.0> External eimp
process (pid=972873) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471916-07:00 [error] <0.178.0> External eimp
process (pid=972876) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471755-07:00 [error] <0.174.0> External eimp
process (pid=972874) has terminated unexpectedly, restarting in a few
seconds
2025-03-31 12:19:20.471763-07:00 [error] <0.162.0> External eimp
process (pid=972868) has terminated unexpectedly, restarting in a few
seconds



ejabberdctl pegs my CPU at about 10% and never successfully launches a
running instance of the server.  I terminated the process after about
an hour without anything else being written to the log.

#1085140#61
Date:
2025-04-01 11:16:57 UTC
From:
To:
* Soren Stoutner <soren@debian.org> [2025-03-31 13:42]:

eimp is a small helper process, we should try to figure out why it
terminates.

Can you call /usr/lib/erlang/lib/p1_eimp*/priv/bin/eimp manually?  (It's
supposed to run without producing any output; if that works, just press
Ctrl+C to terminate it.)

May AppArmor or similar prevent eimp from running?  Maybe just
`systemctl stop apparmor` and try again just to rule that out?

#1085140#66
Date:
2025-04-01 16:09:47 UTC
From:
To:
I can, but every time I do so I have to take my system offline and
then restore the database from a backup to fix it, so it might be a
couple of days before I can get to it.

#1085140#71
Date:
2025-04-02 09:34:42 UTC
From:
To:
* Soren Stoutner <soren@debian.org> [2025-04-01 09:09]:

I'd hope you could stop ejabberd and PostgreSQL/MariaDB (if you're using
that), copy /etc/ejabberd and (if you're using Mnesia) /var/lib/ejabberd
and/or /var/lib/postgresql or /var/lib/mysql (if you're using that) into
some backup directory before upgrading ejabberd.  Before reverting to
the old version, stop ejabberd/SQL and restore those directories from
the backup directory.

If you're using PostgreSQL/MariaDB as storage backend (strongly
recommended BTW), the contents of /var/lib/ejabberd can be deleted
between upgrades/downgrades.

#1085140#76
Date:
2025-04-05 00:22:34 UTC
From:
To:
Holger,

However, I did get this interesting bit of information in the log:

2025-04-04 16:48:46.882266-07:00 [error] <0.228.0>
Mnesia(ejabberd@mail): ** ERROR ** (core dumped to file: "/var/lib/
ejabberd/MnesiaCore.ejabberd@mail_1743_810526_689352")
 ** FATAL ** mnesia_tm crashed: {badarg,
                                 [{erlang,unlink,
                                   [undefined],
                                   [{error_info,
                                     #{module => erl_erts_errors}}]},
                                  {mnesia_controller,
                                   release_schema_commit_lock,0,
                                   [{file,"mnesia_controller.erl"},
                                    {line,333}]},
                                  {mnesia_dumper,update,3,
                                   [{file,"mnesia_dumper.erl"},{line,
301}]},
                                  {mnesia_tm,do_commit,3,
                                   [{file,"mnesia_tm.erl"},{line,
1846}]},
                                  {mnesia_tm,recover_coordinator,2,
                                   [{file,"mnesia_tm.erl"},{line,
646}]},
                                  {mnesia_tm,handle_exit,3,
                                   [{file,"mnesia_tm.erl"},{line,
623}]},
                                  {mnesia_sp,init_proc,4,
                                   [{file,"mnesia_sp.erl"},{line,
35}]},
                                  {proc_lib,init_p_do_apply,3,
                                   [{file,"proc_lib.erl"},{line,
329}]}]} state: [<0.230.0>]

2025-04-04 16:48:56.690142-07:00 [error]
<0.471.0>@supervisor:do_restart/3:1300 SUPERVISOR REPORT:
    supervisor: {local,ejabberd_backend_sup}
    errorContext: child_terminated
    reason: killed
    offender: [{pid,<0.475.0>},
               {id,ejabberd_router_mnesia},
               {mfargs,{ejabberd_router_mnesia,start_link,[]}},
               {restart_type,transient},
               {significant,false},
               {shutdown,5000},
               {child_type,worker}]

Before attempting the update
MnesiaCore.ejabberd@mail_1743_810526_689352 was 0.3 MB.  After
attempting the update, the file size had ballooned to 235 MB.

However, before the update archive_msg.DAT was 285 MB, and ballooned
to over 300 MB during the failed update.

Do these file sizes seem large to you?  It is perhaps a timeout trying
to process the database because of the size?  Is there some way to
compact/purge the database?
because with my last update to testing a few weeks ago some other
package borked my AppArmor installation.  I haven’t looked deeply into
it.  I assume it will fix itself with the next batch of updates I plan
to install next week.  But just in case it is important for this bug
report:

Apr 04 16:54:40 mail apparmor.systemd[4134057]: Restarting AppArmor
Apr 04 16:54:40 mail apparmor.systemd[4134057]: Reloading AppArmor
profiles
Apr 04 16:54:40 mail apparmor.systemd[4134161]: Skipping profile in /
etc/apparmor.d/disable: usr.bin.thunderbird
Apr 04 16:54:40 mail apparmor.systemd[4134170]: profile has merged
rule with conflicting x modifiers
Apr 04 16:54:40 mail apparmor.systemd[4134170]: ERROR processing
regexs for profile su, failed to load
Apr 04 16:54:40 mail apparmor.systemd[4134057]: Error: At least one
profile failed to load
Apr 04 16:54:40 mail systemd[1]: apparmor.service: Main process
exited, code=exited, status=1/FAILURE

#1085140#81
Date:
2025-04-05 08:39:44 UTC
From:
To:
* Soren Stoutner <soren@debian.org> [2025-04-04 17:22]:

Ah, thanks.  Yes, this clearly indicates that the Mnesia table used for
storing the server-side message archive is borked (there's a max. size
limit of 2 GiB), and it's most probably the reason ejabberd fails to
start up.

There's nothing the Debian package (maintainer) could do to address
this, I'm afraid.  The problem is that Mnesia is not designed for
persistent storage of non-trivial amounts of data (it's more of a Redis
replacement than an SQL replacement).  It's still ejabberd's default
storage backend for historical reasons.  Back in the days, there was no
server-side archive.  But when using it for this kind of data, breakage
can happen (e.g., when upgrading Erlang).  That's the reason you'll see
a warning to not use Mnesia in the logs during startup, and in the
default configuration.  I've been advocating for removing Mnesia support
altogether and switching to SQLite by default, but we didn't get round
to that, yet (upstream, I mean).

So, for cases such as yours, I see no proper fix, and would instead
suggest one of the following two options:

1) Remove Mnesia's archive table (i.e., all messages stored server-side)
and maybe also the PubSub tables before upgrading.  The remaining tables
will probably survive the upgrade, and fresh archive/PubSub tables will
be usable after the upgrade (until the next breakage, at least).

2) Convert to SQL before upgrading (strongly recommended).

I could help with either (e.g., in the ejabberd support channel), but
maybe that's out of scope here?