When ssh looks up a node whose IP address has changed, nscd returns
the old address. If I stop nscd, ssh gets the right address, but as
soon as I start nscd again, the old address is delivered to ssh. All
other applications I've tested get the right address, even when nscd
is running. The applications I've tested include 'getent hosts', host
and ping.
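One way to reproduce the ssh/getent discrepancy without ssh is to ask the C library for the same name through two different entry points. This is a minimal sketch assuming CPython on glibc: socket.getaddrinfo() wraps getaddrinfo(3), which is what ssh uses, while the address list from socket.gethostbyname_ex() comes from the gethostbyname family, closer to what 'getent hosts' does. It also assumes nscd caches the two request kinds under separate keys (GETAI vs GETHOSTBYNAME in glibc's nscd), so a stale entry can show up on one path only. The helper name compare_lookups is hypothetical.

```python
import socket


def compare_lookups(name):
    """Resolve `name` (IPv4 only) through two libc entry points.

    Returns (via_getaddrinfo, via_gethostbyname) as sorted address lists.
    """
    # Path 1: getaddrinfo(3), the call ssh uses to resolve its target.
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    via_getaddrinfo = sorted({sockaddr[0] for _f, _t, _p, _c, sockaddr in infos})

    # Path 2: the gethostbyname family (CPython calls gethostbyname_r here).
    _canonical, _aliases, addrs = socket.gethostbyname_ex(name)
    via_gethostbyname = sorted(set(addrs))

    return via_getaddrinfo, via_gethostbyname
```

Calling compare_lookups("bostrom.dyndns.org") with nscd running and again with nscd stopped would show whether only the getaddrinfo path returns the old address, as ssh did in the session that follows.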
sid:~> ssh -v bostrom.dyndns.org
OpenSSH_3.8.1p1 Debian-8.sarge.4, OpenSSL 0.9.7g 11 Apr 2005
debug1: Reading configuration data /home/anders/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to bostrom.dyndns.org [83.250.193.68] port 22.
sid:~> getent hosts bostrom.dyndns.org
83.250.197.26 bostrom.dyndns.org
sid:~> host bostrom.dyndns.org
bostrom.dyndns.org has address 83.250.197.26
sid:~>
strace from ssh:
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
poll([{fd=3, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}], 1, 5000) = 1
writev(3, [{"\2\0\0\0\r\0\0\0\6\0\0\0", 12}, {"hosts\0", 6}], 2) = 18
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN|POLLHUP}], 1, 5000) = 1
recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"hosts\0", 6}], msg_controllen=24, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {4}}, msg_flags=0}, 0) = 6
fstat(4, {st_mode=S_IFREG|0600, st_size=217016, ...}) = 0
pread(4, "\1\0\0\0h\0\0\0\0\0\0\0\1\0\0\0\255\215\\C\0\0\0\0\323"..., 104, 0) = 104
mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x2aaaaaaf8000
close(4) = 0
close(3) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("83.250.193.68")}, 16) = ? ERESTARTSYS (To be restarted)
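The first iovec of the writev() call is nscd's request header. Decoding it shows what ssh asks for. This sketch assumes glibc's nscd client protocol of that era: a 12-byte header of three 32-bit integers (version, request type, key length), version 2, and the request_type numbering from nscd-client.h, where 13 is GETFDHST (request a file descriptor for the shared hosts database). The dict below reproduces only a few enum values and should be checked against the exact glibc being debugged. Combined with the SCM_RIGHTS descriptor and the mmap() that follow in the trace, this indicates ssh never sends the host name to nscd here; it maps the cache file and searches it locally.

```python
import struct

# ASSUMPTION: partial copy of glibc's request_type enum (nscd-client.h).
REQUEST_TYPES = {
    4: "GETHOSTBYNAME",
    5: "GETHOSTBYNAMEv6",
    6: "GETHOSTBYADDR",
    7: "GETHOSTBYADDRv6",
    13: "GETFDHST",
    14: "GETAI",
}


def decode_request_header(raw):
    """Decode the 12-byte request header into (version, type, key_len)."""
    # Three 32-bit ints; the trace comes from x86_64, hence little-endian.
    version, req_type, key_len = struct.unpack("<iii", raw)
    return version, REQUEST_TYPES.get(req_type, str(req_type)), key_len


# strace's octal escapes are valid in Python bytes literals, so the
# first iovec can be pasted as-is (\r is 13).
header = b"\2\0\0\0\r\0\0\0\6\0\0\0"
key = b"hosts\0"
print(decode_request_header(header))  # (2, 'GETFDHST', 6)
assert decode_request_header(header)[2] == len(key)
```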
strace from getent hosts:
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
poll([{fd=3, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}], 1, 5000) = 1
writev(3, [{"\2\0\0\0\r\0\0\0\6\0\0\0", 12}, {"hosts\0", 6}], 2) = 18
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN|POLLHUP}], 1, 5000) = 1
recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"hosts\0", 6}], msg_controllen=24, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {4}}, msg_flags=0}, 0) = 6
fstat(4, {st_mode=S_IFREG|0600, st_size=217016, ...}) = 0
pread(4, "\1\0\0\0h\0\0\0\0\0\0\0\1\0\0\0\34\221\\C\0\0\0\0\323\0"..., 104, 0) = 104
mmap(NULL, 217016, PROT_READ, MAP_SHARED, 4, 0) = 0x2aaaaaf87000
close(4) = 0
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaafbc000
write(1, "83.250.197.26 bostrom.dyndns.o"..., 35
83.250.197.26 bostrom.dyndns.org
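Both traces pread() the same 104-byte header from the mapped cache. The sketch below decodes the first 24 bytes of each, assuming they follow the start of glibc's struct database_pers_head: four ints (version, header_size, gc_cycle, nscd_certainly_running) followed by a 64-bit field named timestamp. That layout is an assumption about the glibc of that era; the decoded header_size of 104 matching the pread() length is consistent with it. The two headers are identical apart from the timestamp, which differs by 879 seconds (both fall on 24 October 2005 UTC). Together with the identical 217016-byte mapping size, this suggests ssh's stale answer and getent's fresh one come from different entries of one cache file rather than from different caches.

```python
import struct
from datetime import datetime, timezone

# ASSUMPTION: start of glibc's struct database_pers_head (nscd.h):
# version, header_size, gc_cycle, nscd_certainly_running, then a 64-bit
# timestamp. Later fields are not decoded (they are truncated in the trace).
HEAD_PREFIX = struct.Struct("<iiiiq")


def decode_db_head(raw):
    """Decode the leading fields of an nscd persistent database header."""
    version, header_size, gc_cycle, running, timestamp = HEAD_PREFIX.unpack(
        raw[:HEAD_PREFIX.size]
    )
    return {
        "version": version,
        "header_size": header_size,
        "gc_cycle": gc_cycle,
        "nscd_certainly_running": running,
        "timestamp": timestamp,
        "timestamp_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
    }


# First 24 bytes of the two pread() buffers, pasted from strace.
ssh_head = decode_db_head(b"\1\0\0\0h\0\0\0\0\0\0\0\1\0\0\0\255\215\\C\0\0\0\0")
getent_head = decode_db_head(b"\1\0\0\0h\0\0\0\0\0\0\0\1\0\0\0\34\221\\C\0\0\0\0")

for label, head in (("ssh", ssh_head), ("getent", getent_head)):
    print(label, head["version"], head["header_size"], head["timestamp_utc"].isoformat())
print("seconds apart:", getent_head["timestamp"] - ssh_head["timestamp"])
```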
If I remove /var/db/nscd/hosts and restart nscd, ssh works again. But
if I put the old db file back in place of /var/db/nscd/hosts and
restart nscd, the problem reappears.
The possibly corrupt hosts cache file is attached.
/ Anders
Yes, please turn off the default persistent caching of hosts (at least). I think this should also be done upstream. It can lead to lockout of logins in an obscure fashion; at least it did on Fedora systems running what appears to be the same version of nscd with the same defaults, so presumably Debian is subject to the same lossage.

The situation we saw was the following: the passwd and group databases come from LDAP (with files preferred in nsswitch.conf), hosts come from files and DNS (in that order), and authentication is by Kerberos. The LDAP servers were moved, so that `ldap' and `ldap-2' got different IP addresses. Over half a day later, it was impossible to log in to the systems in multi-user mode, except via SSH public keys. Login gave authentication errors, either "permission denied" or "invalid password"; I'm not clear why, since Kerberos was functioning OK. In this state, when logged in via ssh, the results of `getent passwd' and `host ldap' were OK, and there was nothing useful in syslog. Eventually we found that killing nscd solved the problem (and restarting it brought the problem back). Later we found the (undocumented) /var/db/nscd directory and zapped it, whereupon login worked again with nscd running.
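The switch being discussed lives in /etc/nscd.conf, as per-database directives of the form `<directive> <database> <value>`: for example `enable-cache hosts yes`, `persistent hosts yes` (turned off with `persistent hosts no`), and the `positive-time-to-live hosts 3600` quoted later in this thread. Below is a small checker, with hypothetical helper names, that reports whether the hosts cache is both enabled and persistent. It treats a missing directive as "no", which is a simplification: nscd's compiled-in defaults are not modeled.

```python
def hosts_cache_settings(conf_text):
    """Collect the `<directive> hosts <value>` lines of an nscd.conf."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) == 3 and fields[1] == "hosts":
            settings[fields[0]] = fields[2]
    return settings


def persistent_hosts_enabled(conf_text):
    """True if hosts caching is enabled and kept on disk across restarts.

    ASSUMPTION: absent directives count as "no"; real defaults may differ.
    """
    s = hosts_cache_settings(conf_text)
    return s.get("enable-cache", "no") == "yes" and s.get("persistent", "no") == "yes"


SAMPLE_CONF = """
enable-cache            passwd  yes
enable-cache            hosts   yes
positive-time-to-live   hosts   3600
persistent              hosts   yes   # cache survives nscd restarts
"""

print(persistent_hosts_enabled(SAMPLE_CONF))
```

Reading the real file with open("/etc/nscd.conf").read() instead of SAMPLE_CONF applies the same check to a live system.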
* Dave Love: The current code tries to honor TTLs. It might be sufficient to set a zero (or very low) TTL for entries coming from /etc/hosts.
Florian Weimer <fw@deneb.enyo.de> writes: Does `current' mean in the latest Debian package? I can't see anything relevant in the changelog, and the Fedora version definitely didn't time out. I can't easily test the Debian version.
* Dave Love: Yes. Which GNU libc version is in Fedora? 2.3.5? There is quite a bit of code to handle TTLs for records fetched from DNS in version 2.3.5. Don't they expire for you, either?
Florian Weimer <fw@deneb.enyo.de> writes: Yes. (Or it was then; there seems to be an update to 2.3.6 now.) They were definitely not expiring on Fedora (after ~18 hours with `positive-time-to-live hosts 3600' in nscd.conf). The cache files had ancient modification times; I don't know whether that's an artefact of the mmap-based access I see being used. Sorry I can't check this on Debian, but it looks to me like a risky option to have on anyway.
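The expiry behaviour being asked for can be written down as a small rule. This is a hypothetical model, not nscd's code: a positive entry expires after the record's DNS TTL when one is known, never later than positive-time-to-live, and a TTL of 0 (Florian's suggestion for /etc/hosts entries) means the answer is not cached at all. CacheEntry, expiry_time and HostCache are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CacheEntry:
    addresses: List[str]
    expires_at: float


def expiry_time(now: float, dns_ttl: Optional[int], positive_ttl: int) -> float:
    """Hypothetical policy: honor the DNS TTL, capped by positive-time-to-live."""
    if dns_ttl is None:  # the source supplied no TTL
        return now + positive_ttl
    return now + min(dns_ttl, positive_ttl)


class HostCache:
    def __init__(self, positive_ttl: int) -> None:
        self.positive_ttl = positive_ttl
        self._entries: Dict[str, CacheEntry] = {}

    def store(self, name: str, addresses: List[str],
              dns_ttl: Optional[int], now: float) -> None:
        if dns_ttl == 0:
            # TTL 0 means "do not cache" in this sketch.
            self._entries.pop(name, None)
            return
        expires = expiry_time(now, dns_ttl, self.positive_ttl)
        self._entries[name] = CacheEntry(addresses, expires)

    def lookup(self, name: str, now: float) -> Optional[List[str]]:
        entry = self._entries.get(name)
        if entry is None or now >= entry.expires_at:
            # Expired entries are dropped so the caller does a fresh lookup.
            self._entries.pop(name, None)
            return None
        return entry.addresses
```

Under this rule, with `positive-time-to-live hosts 3600` no entry could outlive an hour, so the ~18-hour-old answers described above would not be served.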
tag 335476 + upstream confirmed
forwarded 335476 http://sourceware.org/bugzilla/show_bug.cgi?id=4428
retitle 335476 [nscd] does not respect DNS TTL.
thanks

I can confirm that nscd does not invalidate its cache as it should with respect to DNS TTLs, although (as of 2.5) it does respect positive-time-to-live properly.
Hi there, I was wondering: now that glibc has different (and friendlier) maintainers, is it worth trying to raise this issue with them? They may be more receptive to it. All the best, Chris
Reviewing this bug and bug #669304, it seems to me that this bug has been
fixed in eglibc (2.13-31):
* patches/any/local-disable-nscd-host-caching.diff: remove, as the host
caching issue has been fixed in the meanwhile. Closes: #669304.
Shouldn't this bug be closed too?
Regards
Javier
What is the current status of this issue? According to https://sourceware.org/bugzilla/show_bug.cgi?id=4428 it has been fixed in getaddrinfo, but not in gethostbyaddr.
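To check both paths named in the upstream report on a particular system, one can resolve a name forward with getaddrinfo and then reverse each address with gethostbyaddr. This sketch assumes CPython, whose socket.getaddrinfo() and socket.gethostbyaddr() wrap the libc functions of the same name; the helper name forward_and_reverse is hypothetical.

```python
import socket
from typing import Dict, Optional


def forward_and_reverse(name: str) -> Dict[str, Optional[str]]:
    """Map each IPv4 address of `name` (via getaddrinfo) to its reverse name.

    A value of None means gethostbyaddr() returned no name for that address.
    """
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    result: Dict[str, Optional[str]] = {}
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = sockaddr[0]
        try:
            result[addr] = socket.gethostbyaddr(addr)[0]
        except (socket.herror, socket.gaierror):
            # No PTR record or hosts entry for this address.
            result[addr] = None
    return result
```

Running it once with nscd active and again after `nscd -i hosts` (which invalidates the hosts cache) and comparing the two outputs shows whether either call was answered from a cached entry.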