#495681 nagios-plugins-basic: check_by_ssh returns "unknown" instead of "critical" when remote host is "down"
- Package: nagios-plugins-basic
- Source: monitoring-plugins
- Submitter: Cyril Bouthors
- Date: 2023-04-26 09:21:03 UTC
- Severity: important
- Tags:
check_by_ssh returns STATE_UNKNOWN when remote host is "down". It makes Nagios think the host is not completely down. check_by_ssh must return STATE_CRITICAL when ssh fails.

The bug can be reproduced like this:

    supervision:# /usr/lib/nagios/plugins/check_by_ssh -H 192.168.35.243 date; echo $?
    Remote command execution failed: ssh: connect to host 192.168.35.243 port 22: No route to host
    3
    supervision:#

I reported a very similar bug which was fixed, please have a look at #257793

Regards
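For readers less familiar with the plugin return codes: the trailing 3 printed by `echo $?` in the transcript is the plugin's exit status, and the Nagios plugin convention maps 0, 1, 2 and 3 to OK, WARNING, CRITICAL and UNKNOWN. A minimal sketch of that mapping follows; the function name `state_name` is made up for illustration and is not part of any plugin.

```shell
#!/bin/sh
# state_name: hypothetical helper, for illustration only.
# Translates a Nagios plugin exit status into its state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of range" ;;
    esac
}

# The exit status seen in the report
state_name 3
```

So the report is asking for exit status 2 where the plugin currently gives 3.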
Hi there, I was looking into the bug report. I can't find anything (besides the changelog entry of 1.3.1.0-9). I can't find a reply to the mail to nagiosplug-devel@lists.sourceforge.net (nothing found in the ML archive), there are no traces in svn/snapshot.d.n, and there are no hints in the changelog about the patch being removed or what happened to it. Any thoughts, hints, whatever? With kind regards, Jan.
I have no idea if upstream has integrated this fix or not. The only thing I know is that what check_by_ssh reports prevents Nagios from detecting that the host is DOWN. Could you please fix this? Thanks
hiya, actually, i don't think that is supposed to be the case. since it's a service being checked *via* ssh, the state of the service can not be known, therefore it is "unknown". a proper nagios config should have some kind of host check (and/or service dependencies on check_by_ssh <-> check_ssh) that detects when the host is down, which will return critical and disable the checks of the service in question. at least, this is from my memory of the last time i saw this come up on the upstream ml. sean
Hi Sean, thanks for your 2 cents. Since this is the same argument that Alexander and I agreed on yesterday on IRC, I would say that this is not a bug. check_by_ssh is not a real plugin; it is just a transport which can only deliver the results of remote checks, like nrpe. It can only report "STATE_UNKNOWN", because it doesn't know anything about the remote check. With kind regards, Jan.
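The "just a transport" argument can be sketched in shell. This is not the check_by_ssh source, only an illustration under one assumption: the transport exits with 255 when it cannot reach the host, which is what ssh does on a connection error. The names `run_remote_check`, `fake_unreachable` and `fake_remote_critical` are invented for the example, and the two fake functions stand in for ssh so the sketch runs without a network.

```shell
#!/bin/sh
# run_remote_check: hypothetical wrapper, NOT the real check_by_ssh code.
# Runs a check through a transport command and decides the Nagios state.
# Assumption: the transport exits 255 when it cannot reach the host (as ssh does).
run_remote_check() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 255 ]; then
        # The remote check never ran, so its state cannot be known.
        echo "Remote command execution failed"
        return 3    # STATE_UNKNOWN
    fi
    # The remote check ran: pass its own state through unchanged.
    return "$status"
}

# Stand-ins for ssh (illustrative only)
fake_unreachable() { return 255; }
fake_remote_critical() { echo "DISK CRITICAL"; return 2; }

run_remote_check fake_unreachable || echo "exit: $?"
run_remote_check fake_remote_critical || echo "exit: $?"
```

Under this reading, CRITICAL is reserved for a remote check that actually ran and reported a problem, while detecting that the host itself is down is left to a separate host check or a dependency on check_ssh, as Sean describes.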