#999375 rsyslog randomly exits, possibly caused by imrelp

Package:
rsyslog
Source:
rsyslog
Description:
reliable system and kernel logging daemon
Submitter:
Anton Khirnov
Date:
2021-11-11 20:21:03 UTC
Severity:
important
#999375#5
Date:
2021-11-10 15:22:20 UTC
From:
To:
Package: rsyslog
Version: 8.2102.0-2
Severity: important

Dear Maintainer,
since upgrading from buster to bullseye, rsyslog on some of my machines
randomly exits.

My setup contains
- a centralized log server, receiving logs with imrelp
- two relays, that forward logs from other hosts using imrelp+omrelp
- multiple hosts that send their logs (to either one of the relays or
  directly to the central server) with omrelp

I tried running rsyslog with debugging messages under gdb. Right before
it exits, the output is:

1778.966318176:imrelp.c       : imrelp.c: librelp: done epoll_wait, nEvents:1
1778.966486859:imrelp.c       : imrelp.c: librelp: generic error: ecode 10014, emsg 'TLS record reception failed [gnutls error -54: Error in the pull function.]'
1778.966543315:imrelp.c       : errmsg.c: Called LogMsg, msg: imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended
1778.966585565:imrelp.c       : operatingstate.c: osf: MSG imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended [v8.2102.0 try https://www.rsyslog.com/e/2353 ]'
rsyslogd: imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended [v8.2102.0 try https://www.rsyslog.com/e/2353 ]
Cannot find user-level thread for LWP 25226: generic error
(gdb) [Thread 0x7ffff536c700 (LWP 25237) exited]
[Thread 0x7ffff5b6d700 (LWP 25236) exited]
[Thread 0x7ffff676f700 (LWP 25234) exited]
[Thread 0x7ffff6b70700 (LWP 25233) exited]
[Thread 0x7ffff6f71700 (LWP 25232) exited]
[Thread 0x7ffff7a6d240 (LWP 25226) exited]
[Inferior 1 (process 25226) exited with code 01]

I have observed the issue only on the central log server and both of the
relays, not on the hosts -- i.e. only those systems that use imrelp.

I can also reproduce it semi-reliably by SIGKILLing (so it doesn't close
the connection cleanly) a RELP client (i.e. an rsyslog instance using
omrelp), which will almost (but not quite) always cause its
corresponding RELP server to exit in the above manner.

So if I SIGKILL rsyslogd on one of the hosts, it will cause its relay to
die, which in turn causes the central server to die, which in turn makes
me very unhappy. Since logging is critical for my infrastructure, I
would very much appreciate it if this was fixed promptly.

rsyslog 8.2110.0-1 from testing still exhibits the issue.

Cheers,

#999375#10
Date:
2021-11-10 17:27:29 UTC
From:
To:
Hi Anton,

thanks for the detailed bug report and going the extra mile to verify it
is still reproducible with testing.

I quickly checked the upstream bug tracker, and found
https://github.com/rsyslog/rsyslog/issues/4302
https://github.com/rsyslog/rsyslog/issues/4175
https://github.com/rsyslog/rsyslog/issues/3915

which look related to your problem. All are issues with relp+gnutls.

All of them point to an issue with GnuTLS and the bug reporters
mentioned that switching to OpenSSL fixed the issue.

The Debian package is built with OpenSSL support.
Would you mind testing if switching to OpenSSL fixes the issue for you?

Regards,
Michael

#999375#15
Date:
2021-11-10 17:49:36 UTC
From:
To:
Quoting Michael Biebl (2021-11-10 18:27:29)

Great, switching to openssl seems to fix this.

It's not great that gnutls is used by default though - the module is now
borderline unusable with it.

Thanks a lot,

#999375#20
Date:
2021-11-10 18:10:05 UTC
From:
To:
Thanks for testing!

[Looping in Rainer, as upstream of rsyslog]

Rainer, is there a way to switch the default to OpenSSL if rsyslog has
been built with GnuTLS and OpenSSL support?

I'm a bit wary of completely disabling GnuTLS support.

Regards,
Michael

#999375#25
Date:
2021-11-11 14:32:13 UTC
From:
To:
OpenSSL support. If both are enabled, GnuTLS is preferred:

https://salsa.debian.org/debian/librelp/-/blob/debian/master/src/relp.c#L262

Rainer, what's your take on this? If GnuTLS has been proven problematic
in the past, should the default be switched to prefer OpenSSL?
Can you actually confirm from your own experience that GnuTLS has shown
more problems?

Alternatively to switching the default, I could use the big hammer and
completely disable GnuTLS support in librelp.

Would appreciate your input here.

Regards,
Michael

#999375#30
Date:
2021-11-11 19:46:44 UTC
From:
To:
FWIW, you can force OpenSSL with this, which is what I do:
module(load="omrelp" tls.tlslib="openssl")
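
Since the exits in this report were seen on the receiving (imrelp) side, the
same module parameter can presumably be set there too. A minimal sketch for a
relay that loads both modules, assuming TLS is enabled per input/action with
tls="on". The target host name is a placeholder, the port is the one from the
debug output earlier in this report, and certificate/authentication
parameters are left out:

```
# Select OpenSSL instead of the default GnuTLS in both RELP modules.
module(load="imrelp" tls.tlslib="openssl")
module(load="omrelp" tls.tlslib="openssl")

# Receive from hosts (port taken from the debug output in this report).
input(type="imrelp" port="10514" tls="on")

# Forward to the central server (host name is a placeholder).
action(type="omrelp" target="central.example.org" port="10514" tls="on")
```

On a host that only sends logs, the omrelp lines alone should be enough; on
the central server, only the imrelp lines apply.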

#999375#35
Date:
2021-11-11 20:19:49 UTC
From:
To:
Thanks Richard for being more explicit here. Really appreciated.

I think Anton figured this out on his own [1] but I could have been more
helpful here indeed.
Anton didn't strike me as a person who needed more hand-holding than the
average user.

But for a reader of this bug report, this information is indeed helpful.
Thank you for reminding me that being more verbose in replies is probably
not a bad thing. I'm usually rather terse (in my defense, responding to
a lot of bug reports is a lot of work and becomes tiresome).

Regards,
Michael


[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=999375#15