#251108 ypbind does not rebind if server dies with -noping

Package:
ypbind-mt
Source:
ypbind-mt
Description:
Client daemon for working with Network Information System (NIS)
Submitter:
Francesco Paolo Lovergine
Date:
2021-01-24 15:54:06 UTC
Severity:
important
Tags:
#251108#5
Date:
2004-05-26 21:09:23 UTC
From:
To:
Just after rebooting a slave server, this is the result on the client:

klecker:~$ su
Password:
do_ypcall: clnt_call: RPC: Unable to receive; errno = Connection refused
Segmentation fault


The same with login, sudo and any other auth program.

Authentication is now simply locked. Even root cannot log in on a tty or pty.
When the slave comes back up, the status does not change at all. The only
solution is a hard reboot by switching the power off :-(

The ypbind client is called with -no-ping. That could make the difference,
possibly.

#251108#10
Date:
2004-05-27 11:49:37 UTC
From:
To:
From reading the source code, indeed if you call it with -no-ping,
ypbind will bind to a server at startup and after that it won't
rebind to another server ever. It appears that that is exactly what
that option is there for ...

You can force it to rebind by sending a SIGHUP to ypbind. Currently,
/etc/init.d/nis reload doesn't signal ypbind with SIGHUP, so that
could perhaps be considered a bug.
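
A minimal sketch of the SIGHUP workaround, assuming ypbind records its PID
in /var/run/ypbind.pid. The function name rebind_ypbind and the
YPBIND_PIDFILE override are made up for illustration and are not part of
the package:

```shell
#!/bin/sh
# Sketch: ask a running ypbind to rebind by sending it SIGHUP.
# Assumption: ypbind records its PID in /var/run/ypbind.pid; set
# YPBIND_PIDFILE if your system keeps it elsewhere.

rebind_ypbind() {
    pidfile="${YPBIND_PIDFILE:-/var/run/ypbind.pid}"
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile; is ypbind running?" >&2
        return 1
    fi
    # SIGHUP makes ypbind look for a server again, as described above.
    kill -HUP "$(cat "$pidfile")"
}
```

Calling rebind_ypbind as root after a server outage should have the same
effect as signalling the daemon by hand.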

However, why are you running with -no-ping? ("doctor doctor it
hurts when I do this" ;) )

Mike.

#251108#15
Date:
2004-05-27 12:41:23 UTC
From:
To:
It's a bit difficult given the lack of documentation.  I guess it would
be reasonable to interpret it as meaning that the regular probes for the
fastest server should be disabled but still redo discovery if it detects
an error.

#251108#20
Date:
2004-05-27 12:41:43 UTC
From:
To:
That could be possible if I could do that as a non-privileged user.
Unfortunately that's impossible without su, sudo or anything like
that. Moreover that segfault is not a good thing.

Strictly due to port filtering. Without that, ypbind hangs
at startup.

#251108#25
Date:
2004-05-27 22:44:32 UTC
From:
To:
Having looked through the code some more this doesn't seem at all
practical - unless ypbind is probing for servers it really has no idea
if the server is working.

#251108#30
Date:
2004-05-27 22:58:16 UTC
From:
To:
retitle 251108 ypbind does not rebind if server dies with -noping
thanks

You may also be able to use the ypbind RPC interface to cause the
bindings to be re-probed by using the ypset program (providing you
started ypbind with the -ypset option).  This is rather a security
hole, though, and will be unacceptable for many setups.

The segfaults are up to the client code - presumably some part of either
su or pam_unix needs better error checking for RPC failures from NIS.

That sounds like a problem which should be addressed anyway.  Moreover,
looking at the code I can't immediately tell why this would help at all
- the initial server discovery process shouldn't be changed by this
option.  Could you please describe in more detail the setup that
you've got so that I can try to reproduce what's going on?

Similarly, the failure to recover when the server comes back on-line
ought to be investigated - I'll try to have a look at that over the
weekend.

As far as I can tell the behaviour that's being seen is unfortunate but
what was requested - -no-ping tells ypbind not to test the servers
periodically so it doesn't do so.  Unfortunately, without this it then
becomes reliant on some external mechanism to tell it when it needs to
re-probe and at the minute there's no such mechanism.  While you need
-no-ping you could perhaps try using a cron job to send SIGHUP to
ypbind periodically.
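
One possible shape for such an external mechanism, sketched under
assumptions: a script run from cron that probes the currently bound server
and only signals ypbind when the probe fails. The script path, the
five-minute interval, the --run switch and the function names are
hypothetical; ypwhich, rpcinfo -u and pkill are the standard tools.

```shell
#!/bin/sh
# Sketch of a cron-driven watchdog for a ypbind started with -no-ping.
# Hypothetical /etc/cron.d entry:
#   */5 * * * * root /usr/local/sbin/ypbind-watchdog --run

server_alive() {
    # rpcinfo -u makes a UDP null call to the ypserv program on host $1.
    rpcinfo -u "$1" ypserv >/dev/null 2>&1
}

check_binding() {
    # ypwhich asks the local ypbind which server it is bound to.
    server=$(ypwhich 2>/dev/null) || server=
    if [ -n "$server" ] && server_alive "$server"; then
        return 0
    fi
    # No binding, or the bound server does not answer: ask for a rebind.
    pkill -HUP -x ypbind
}

if [ "${1:-}" = "--run" ]; then
    check_binding
fi
```

Compared with an unconditional SIGHUP every few minutes, this leaves a
working binding alone, at the cost of one extra RPC probe per run.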

#251108#37
Date:
2004-05-28 07:34:18 UTC
From:
To:
That's not a good reason to work as it does. BTW, 'ypcat passwd' does
work while ypbind goes crazy and causes segfaults during authentication.
And this is really strange.
My own idea is that there are some grave issues in the dialogue between
libc and ypbind, or so. But I did not read the code at all.

Did you replicate the problem? I had a master and a slave here, and
the problem appears whenever I reboot either the first or the second server.

#251108#42
Date:
2004-05-28 08:07:27 UTC
From:
To:
ypbind should answer something valid.

A master and a slave server. The master is a Tru64, the slave is a
Debian sarge. The client has all incoming tcp/udp ports
filtered, except for ssh.

# Generated by iptables-save v1.2.9 on Tue Mar 23 18:13:00 2004
*filter
:INPUT DROP [408:28655]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [805:119348]
-A INPUT -p icmp -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i ! eth0 -m state --state NEW -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
# Completed on Tue Mar 23 18:13:00 2004

Yep that's a possible option. I'll try it as a workaround.
Consider that all setuid programs apparently fail during the event
(e.g. fetchmail when calling procmail), but hopefully crond should work.

#251108#47
Date:
2004-05-28 08:12:55 UTC
From:
To:
That's really weird, since ypcat and the glibc routines both query ypbind
to find the domain's NIS server. The important thing is, what does 'ypwhich'
say. 'ypwhich' queries ypbind for the current NIS server. If it doesn't
print anything, or errors out, you know the domain is not bound.
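
A small sketch of that check, for pasting into a shell or a script. The
function name binding_status is made up for illustration; the only real
tool involved is ypwhich:

```shell
#!/bin/sh
# Sketch: report whether the local ypbind currently has a binding.

binding_status() {
    # ypwhich prints the bound server; empty output or a non-zero exit
    # status is treated as "not bound".
    if server=$(ypwhich 2>/dev/null) && [ -n "$server" ]; then
        echo "bound to $server"
    else
        echo "not bound"
        return 1
    fi
}
```

Note that this only shows what ypbind believes; it does not prove that the
server it names is still answering.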

Your app segfaulted, not ypbind, right?

Your app crashing, that's a libc6 bug, or perhaps even a bug in the app.
Ypbind doesn't have anything to do with that - whatever the result of
a (libc) conversation with a (ypbind) daemon, crashing with SEGV is
always a bug.

But why did you use -no-ping in the first place? I don't think it was
meant for general usage, the man page talks about systems on a dialup line.
That would at least get the systems running reliably again.

Mike.

#251108#52
Date:
2004-05-28 09:02:51 UTC
From:
To:
True, but that's exactly the problem. I cannot currently reboot the
servers; I'll do that again during the weekend in order to better
specify the problem. BTW, the client is currently using the slave.
I have had the same event several times, and only recently discovered
that it's due to a temporarily lost connection with the NIS servers.

No, ypbind does work, any app that needs auth crashes, e.g. su.
This issue could maybe have security implications, too.

All apps which need to query for auth to be more precise,
so su, sudo, login, fetchmail, exim, ... Definitely a libc problem.
The interesting point is if that's truly a problem of ypbind or a libc one.

I suspect ypbind returns something weird, unexpected or some NULL value
and that probably causes a SEGV at libc level. Any application crashes
in the same way, so it's a libc6 problem. The correct behavior should be
that seen without -no-ping, that is, a delay and a timeout (after which
auth continues with other modules if possible). The current behavior,
which locks down any local auth, is not acceptable, IMHO.

Already answered, I need to keep that client filtered. Anyway, a bug is
a bug, isn't it? "Do not use that feature" is not a fix :)

#251108#57
Date:
2004-05-28 10:56:04 UTC
From:
To:
You can just null-route the server's IP to simulate it being unreachable.
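
A sketch of one way to do that on a Linux client with iproute2, assuming
root privileges. The address 192.0.2.10 is a documentation placeholder,
not a real server from this report, and the function names are made up:

```shell
#!/bin/sh
# Sketch: make a NIS server unreachable from this client, then undo it.
# Usage (hypothetical address): blackhole_server 192.0.2.10

blackhole_server() {
    # A blackhole route silently discards packets sent to the address.
    ip route add blackhole "$1/32"
}

restore_server() {
    # Remove the blackhole route so the server is reachable again.
    ip route del blackhole "$1/32"
}
```

Run blackhole_server, try 'id' or 'su' against a NIS account, then run
restore_server to see whether the client recovers.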
with ypbind. The NIS routines in libc6 should have a (better) timeout,
and handle unexpected errors. BTW, I can't reproduce this on current
libc6 versions:

# id miquels
do_ypcall: clnt_call: RPC: Timed out
do_ypcall: clnt_call: RPC: Timed out
id: miquels: No such user
ssr2:~# dpkg -s libc6
Package: libc6
Version: 2.3.2.ds1-11

Now, internal libc6 routines printing to stderr, _that_ is a bug...
a bug for a different day, though.

Okay, I did some tests and read the source code.

It can be fixed, but only by rewriting parts of libc6 and ypbind.

You see, libc6 doesn't query ypbind directly. Ypbind maintains "binding"
files in /var/yp/binding/domain.* that indicate the currently active
NIS server. The libc6 routines then read this file, and try to talk to
the server as read from the file.

If the server doesn't respond, some of the libc6 routines segfault
(though I tried to reproduce that, and on my unstable systems the
commands like 'su' and 'id' just hang and eventually time out).

Now ypbind never knows that the NIS server isn't responding. Libc6
fails to talk to the NIS server, but ypbind isn't informed of that.

It *could* be fixed in the following way:

- If libc6 cannot contact the NIS server as read from /var/yp/binding/domain,
  it should do a YPBIND_DOMAIN RPC call to ypbind i.e. query ypbind
  directly, then if it gets a reply, retry the NIS server once.
- The YPBIND_DOMAIN RPC call should actually check if the currently
  bound server is still alive, and try to rebind to a different
  server if it isn't
- To maintain the -no-ping functionality for dialup systems, ypbind
  should not do those checks if only one server is listed in
  /etc/yp.conf, since there won't be another server to bind to anyway.

That way, libc6 has a way to 'kick' ypbind if the NIS server isn't
responding.
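
To make the first step concrete, here is a userland analogue in shell. It
is not libc6 code and only mirrors the proposed control flow: try the
lookup, ask ypbind on failure, then retry exactly once. The names
query_nis, kick_ypbind and lookup_with_kick are hypothetical; ypmatch and
ypwhich stand in for the libc lookup and the RPC call to ypbind.

```shell
#!/bin/sh
# Sketch of the proposed "kick ypbind, then retry once" flow.

query_nis() {
    # Stand-in for a libc NIS lookup: fetch one passwd entry by key.
    ypmatch "$1" passwd 2>/dev/null
}

kick_ypbind() {
    # Stand-in for querying ypbind directly: ypwhich asks the local
    # ypbind for the current binding.
    ypwhich >/dev/null 2>&1
}

lookup_with_kick() {
    query_nis "$1" && return 0   # first attempt against the bound server
    kick_ypbind || return 1      # no reply from ypbind: give up
    query_nis "$1"               # got a reply: retry exactly once
}
```

The single retry keeps the worst case bounded at two timeouts, which
matters for programs such as su and login that block on the lookup.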

I think this bug has to be forwarded upstream to the NIS maintainer,
and a separate bug should be filed against libc6.

Mike.

#251108#64
Date:
2004-09-02 07:40:18 UTC
From:
To:
The workaround of sending kill -HUP to ypbind in order to recheck the NIS
server is not useful. Moreover, my own impression is that stopping the NIS
service on one of the servers without rebooting the box is not critical.
I see problems every time one of the servers is simply rebooted.
Maybe we should reassign this bug to libc6? I'm not so sure....

#251108#69
Date:
2005-07-31 22:20:49 UTC
From:
To:
Do you mean that the HUP has no effect?

Could you clarify what you mean here?  How long do you leave the
service stopped when you try stopping the box compared to the amount of
time it's stopped when you're rebooting?  My guess would be that either
some timeout expires during the reboot or an error kicked back by the
network stack while the host is down confuses ypserv.

Well, until ypbind can cope with rebinding there is little point in libc
telling it about problems.