- Package: libnss-ldapd
- Source: nss-pam-ldapd
- Description: NSS module for using LDAP as a naming service
- Submitter: Patrick Schoenfeld
- Date: 2015-11-08 20:33:03 UTC
- Severity: important
Hi,

since we started using libnss-ldapd we have had a problem that is quite serious for us, as it effectively affects login and group ACLs. However, we haven't yet been able to track this issue down to a specific component, which is why we hadn't reported it until now.

The setup: Our setup is a mixed Windows/Linux environment with an LDAP server for central authentication. Linux clients use libnss-ldapd for resolution of usernames and groups.

The problem: After a reboot, the Linux clients are unable to resolve groups and sometimes are also unable to resolve users. The result is that files are owned by [nobody]:nogroup, while getent passwd and getent group show the right result. As a consequence, people are unable to log in properly (because desktop environments need read permissions on their settings ;) and user permissions are broken. After 10-30 minutes of running, the problem disappears.

This makes me think that some timeout occurs, but I can't tell which. I thought it's probably somehow related to the udev resolution issue, which is handled differently in libnss-ldapd than in libnss-ldap: libnss-ldap produces a significant delay when booting because groups can't be resolved while LDAP is not yet accessible, which is handled gracefully by libnss-ldapd. Maybe you gather invalid results while booting, because LDAP is not accessible. But I don't see why nslcd should cache these results, so I think my idea is absurd.

The problem is reproducible with or without nscd running, so it is not related to nscd. The problem also does not seem to be related to the groups which contain spaces, except that those spam the log every second with error messages unless my patch is applied. The problem does not occur with libnss-ldap, so it is specific to libnss-ldapd.
I've chosen severity serious for this issue because, on the one hand, the problem would fit severity 'critical', because it "makes unrelated software on the system (or the whole system) break", but then again I felt uncomfortable with that, because the problem does not persist over the uptime of the system: after 10-30 minutes it disappears. But I think it should definitely be fixed for lenny.

Best Regards,
Patrick
Could you provide some more details? Is the LDAP server on the system that also runs nss-ldapd, what options do you use, which LDAP server software etc.? Your configuration file should also help.

I don't understand this. If you perform getent passwd and getent group you get the expected result, but if you do ls -l the files are reported as nobody:nogroup? If ls can't resolve numeric user and group ids it should print the numeric form, not make something up.

Can you produce logs of nslcd? It should report whether the LDAP server was reachable or not. If you can run nslcd with the -d option it should report more information that will help in tracking this down.

Note that for logging in you also need pam_ldap, which has its own configuration. If the problem is in that, you should probably also provide information about it.

nslcd only caches the relationship between DNs and uids for group membership lookups (when the uniqueMember attribute is used). This timeout is hardcoded at 15 minutes. Other than that I can't think of a timeout that long, unless you set it that high in the config.

The way nss-ldapd solves the udev problem is by not doing LDAP lookups that early during boot at all and "failing" quickly. Only when nslcd is started are lookups attempted. In any case, I can't think of a case where getent passwd should work and ls would fail.

One known issue (#475626) is related to the order in which nslcd is started during boot. If the LDAP server is unavailable when nslcd is started, a timeout could occur and the LDAP server will not be found immediately when it becomes available.

I am inclined to lower the severity to important because it seems to work in a lot of common environments. I hope to fix this soon.

Thanks for your bug report.
Hi Arthur,

Yep, I can. I'm just unsure which information is of interest (I'm at a point where I'm kinda clueless what's the cause of the trouble :/).

No, it runs on another host. I don't use any special options. In fact the configuration is the default configuration, except for the server address and the search base.

root@teekanne:~# grep -v '\(^#\|^$\)' /etc/nss-ldapd.conf
uri ldap://majestix-linux.intra.in-medias-res.com
base dc=intra,dc=in-medias-res,dc=com
uid nslcd
gid nslcd

The LDAP server is a usual slapd as it is in Etch: slapd (2.3.30-5+etch1)

Right. Sometimes all files are "owned" by nobody:nogroup, but most commonly only the groups are affected. And yes, while the problem exists getent passwd and getent group show the groups properly.

Well, I think this is related to the fact that it is an NFSv4 filesystem. nobody:nogroup is what idmapd from NFS returns if it cannot properly resolve the ids.

OK. I will add these logs ASAP.

Well, the problem is not the login per se, but that some programs (for example GNOME) simply do not work, because they can't read their settings (if the nobody problem exists as well; if the groups are the only problem, then only accessing shared files is a problem).

At first I would have said that 15 minutes could be the time frame, but then again: no. Today I saw the problem disappear after more than half an hour.

Well, sounds reasonable and I don't see why this should cause the problems.

Well, yes, that's true. But on the other hand it has a serious effect on the functionality of the system as a whole (because it is a client that mounts /home etc. from the server), so I felt serious is a good compromise.

No bug report, no solution, right? So no need to thank me; instead I'll thank you if you find a solution for it.

Best Regards,
Patrick
Hi,

schoenfeld@teekanne ~ % ls -l test
-rw-rw-r-- 1 schoenfeld nogroup 0 12. Sep 09:49 test

Interestingly enough, the symptom is similar to the system behaviour if nslcd is _not_ running. Then all files resolve to nobody:nogroup. However, there is no problem visible in the log.

Best Regards,
Patrick
What I understand about nfs4 (I've been doing some reading up but still have no first-hand experience) is that if the user doesn't exist it is generally mapped to nobody:nogroup. The mapping is done by idmapd but at some point in combination with something in the kernel.

From what I understand from scanning the idmapd code, there is a default cache expiry time (in the kernel) of 600 seconds (10 minutes). The current value should be available in /proc/sys/fs/nfs/idmap_cache_timeout.

My guess is that name lookups are cached in idmapd. Can you check whether restarting idmapd (/etc/init.d/nfs-common restart) makes the problem go away? On my system, idmapd is started way before nslcd and it probably isn't a good idea to start it before idmapd.

There seems to be an undocumented Cache-Expiration option in the General section of /etc/idmapd.conf that could help to bring down the cache timeout value.

Can you check the idmapd logs for anything out of the ordinary? Perhaps you can increase the verbosity in /etc/idmapd.conf.

Thanks. Perhaps I should set up a test environment myself with NFS4. Do you have some pointers for that? (I use NFS3 myself.)
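To compare the kernel-side value with what idmapd reports, the timeout can be read directly. This is only a minimal sketch: the /proc path is the one named above, while the helper name is made up for illustration, and the function returns None on systems where the file does not exist or is not readable.

```python
from pathlib import Path

# Path taken from the message above; it may be absent on systems
# without the NFS client module loaded.
IDMAP_TIMEOUT_PATH = "/proc/sys/fs/nfs/idmap_cache_timeout"


def read_idmap_cache_timeout(path=IDMAP_TIMEOUT_PATH):
    """Return the idmap cache timeout in seconds, or None if unavailable.

    Hypothetical helper for illustration; not part of nfs-utils or nslcd.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a plain integer.
        return None


if __name__ == "__main__":
    print(read_idmap_cache_timeout())
```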
Hi,

right.

Nope, it does not.

Increasing the verbosity (default: 3, tried up to 10) does not seem to change anything. Basically this is all:

Oct 3 09:46:36 teekanne rpc.idmapd[3309]: libnfsidmap: using domain: localdomain
Oct 3 09:46:36 teekanne rpc.idmapd[3309]: libnfsidmap: using translation method: nsswitch
Oct 3 09:46:36 teekanne rpc.idmapd[3310]: Expiration time is 600 seconds.
Oct 3 09:46:36 teekanne rpc.idmapd[3310]: Opened /proc/net/rpc/nfs4.nametoid/channel
Oct 3 09:46:36 teekanne rpc.idmapd[3310]: Opened /proc/net/rpc/nfs4.idtoname/channel
Oct 3 09:46:36 teekanne rpc.idmapd[3310]: New client: 0
Oct 3 09:46:36 teekanne rpc.idmapd[3310]: Opened /var/lib/nfs/rpc_pipefs/nfs/clnt0/idmap
Oct 3 09:46:36 teekanne rpc.idmapd[3310]: New client: 1
Oct 3 09:47:23 teekanne rpc.idmapd[3310]: Client 0: (user) id "30010" -> name "schoenfeld@localdomain"
Oct 3 09:47:23 teekanne rpc.idmapd[3310]: Client 0: (group) id "65534" -> name "nogroup@localdomain"

That's not a big deal. You need to set up an export entry like you do for NFSv3; however, there is a fundamental difference to NFSv3: you export an NFS root, not single exports. So you probably want to set up a virtual export directory. It's described here [1].

Best Regards,
Patrick

[1] http://www.crazysquirrel.com/computing/debian/servers/setting-up-nfs4.jspx
(Cc-ing the nfs-utils maintainers, perhaps they have some insight that could solve this)

I have been able to reproduce this. On the server I have in /etc/exports (/export/newhome is a bind-mounted /home with half a dozen users):

/export 192.168.1.0/24(ro,sync,insecure,root_squash,no_subtree_check,fsid=0)
/export/newhome 192.168.1.0/24(rw,nohide,sync,insecure,root_squash,no_subtree_check)

On the client I have in /etc/fstab:

fs:/newhome /mnt nfs4 rw 0 0

Now if I stop nslcd (all name lookup calls should now return NSS_STATUS_UNAVAIL/ENOENT) an 'ls -l /mnt' shows:

[...]
drwx-----x 148 nobody users 12288 Oct 3 21:02 arthur
[...]

(the user arthur from the server is mapped to the user nobody on the client because the name lookup failed). With some more verbose logging rpc.idmapd shows:

[...]
rpc.idmapd: nfs4_name_to_uid: calling nsswitch->name_to_uid
rpc.idmapd: nss_getpwnam: name 'arthur@localdomain' domain 'localdomain': resulting localname 'arthur'
rpc.idmapd: nss_getpwnam: name 'arthur' not found in domain 'localdomain'
rpc.idmapd: nfs4_name_to_uid: nsswitch->name_to_uid returned -2
rpc.idmapd: nfs4_name_to_uid: final return value is -2
rpc.idmapd: Client 16: (user) name "arthur@localdomain" -> id "65534"
[...]

If I repeat the ls command a couple of times rpc.idmapd no longer logs the failed lookups, and an strace of rpc.idmapd also shows that the process is no longer asked (by the kernel?) to look up the user.

If I then start nslcd (now name lookups should be performed as usual, and getent shows that they are) the results aren't quickly fixed. After a while (I've been messing about with stuff in /proc so I don't know how long this normally takes) the kernel asks rpc.idmapd again to look up user arthur (and the other users in the filesystem). Also note that the bug reporter had problems with groups and I've reproduced the behaviour with users.

[...]
drwx-----x 148 arthur users 12288 Oct 3 21:02 /mnt/arthur
[...]
Now the question is how this caching mechanism should be tuned and how we should solve this problem. Is there a reliable way to flush the cache?

There seems to be /proc/net/rpc/nfs4.nametoid, which contains some stuff that could be relevant, and /proc/sys/fs/nfs/idmap_cache_timeout. However, setting /proc/sys/fs/nfs/idmap_cache_timeout or Cache-Expiration does not result in the expected timeout in seconds (as read from idmapd.c). Setting it to 10 results in a retry every 30 to 60 seconds, setting it to 100 seems to result in a retry in 60-120 seconds.

Also, writing to /proc/net/rpc/nfs4.idtoname/flush and /proc/net/rpc/nfs4.nametoid/flush (as is done in flush_nfsd_idmap_cache()) doesn't seem to make a difference. I haven't had a look at the kernel code yet (this is running kernel Linux 2.6.26-1-686 (SMP w/2 CPU cores)).

Patrick, does adding "Cache-Expiration = 10" to /etc/idmapd.conf in the [General] section help at all in your setup? (the correct values should be loaded sooner)
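To check whether a given idmapd.conf already carries the suggested setting, something like the following could be used. It is a rough sketch under stated assumptions: it treats idmapd.conf as plain INI and parses it with Python's configparser, which is not how idmapd itself reads the file; the sample text and the helper name are made up for illustration; and the 600 second fallback is simply the value idmapd logged earlier in this report.

```python
import configparser

# Illustrative fragment only; a real /etc/idmapd.conf has more settings.
SAMPLE_IDMAPD_CONF = """\
[General]
Verbosity = 3
Cache-Expiration = 10
"""


def cache_expiration(conf_text, default=600):
    """Return Cache-Expiration from the [General] section, or `default`.

    Hypothetical helper; the default of 600 is an assumption based on the
    "Expiration time is 600 seconds." log line quoted in this report.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Option names are matched case-insensitively by configparser.
    return parser.getint("General", "Cache-Expiration", fallback=default)


if __name__ == "__main__":
    print(cache_expiration(SAMPLE_IDMAPD_CONF))
```

On a real system the text would come from reading /etc/idmapd.conf, and idmapd has to be restarted for a changed value to take effect.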
Hi,

2008/10/3 Arthur de Jong <adejong@debian.org>:

very good. This improves the situation a lot. It's a good workaround. Now if you could find the reason why the behaviour differs from libnss-ldap and could enhance libnss-ldapd in this way, that would be great :-))

Best Regards,
Patrick
retitle 500778 nss-ldapd: problem resolving groups and users with nfs4
severity 500778 important
tags 500778 + help
thanks

I am lowering the severity of this bug for now because the problem is limited to using nss-ldapd in combination with nfs4 and there is a workaround (adding "Cache-Expiration = 10" to /etc/idmapd.conf). I will try to investigate this some more, but help with this is appreciated.
I have been able to reproduce the same behaviour with nss_ldap. If you freshly mount a filesystem while the LDAP server is unavailable, the kernel will not re-ask idmapd to look up the usernames until the timeout has expired.

I have dug a little through the code (nfs-utils, libnfsidmap and kernel) and from what I understand the kernel should not cache negative lookups. But idmapd seems to map IDMAP_STATUS_LOOKUPFAIL to IDMAP_STATUS_SUCCESS, which causes the kernel to remember the mapping. This is done in:

nfs-utils-1.1.3/utils/idmapd/idmapd.c:674:
    /* XXX: I don't like ignoring this error in the id->name case,
     * but we've never returned it, and I need to check that the client
     * can handle it gracefully before starting to return it now. */
    if (im.im_status == IDMAP_STATUS_LOOKUPFAIL)
        im.im_status = IDMAP_STATUS_SUCCESS;

I'm not sure who wrote the comment and whether it is still valid. If this is fixed, negative entries would not be cached at all (except by nscd if it is enabled, but the kernel would ask idmapd, which would ask nscd).

From looking through the kernel code (fs/nfs/idmap.c), there is no way to flush the cache. Also, the value of /proc/sys/fs/nfs/idmap_cache_timeout at the time the cache entry was created is used, so there is no use in lowering the value after the fact.

That means that I think the only way to fix this in the short term is to remove the LOOKUPFAIL to SUCCESS mangling from idmapd.c (which could have other side effects) or to apply the workaround as described before. Note that I have only read code and not done extensive debugging by deploying modified versions of either the kernel or idmapd.

One thing that remains a little puzzling in the kernel code is the cache retry. I can't explain the strange timeout if you set the cache value really low, like 1 jiffy. Then again, I don't know enough about jiffies and kernel internals to go hunting that problem anyway.
What nss-ldapd could do is document that the Cache-Expiration option should be set. Perhaps a check could be implemented with a debconf note during package installation. Another option would be to start nslcd before nfs-common. This, however, would probably break an environment where /usr is mounted over NFS. It would also cause problems because it is best to start nslcd after slapd.
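The caching behaviour described in this message can be illustrated with a toy model. This is emphatically not the kernel or idmapd code: the class, its fields and the numeric ids are invented for illustration, and it only encodes two claims made above, namely that a failed lookup ends up stored like a successful "nobody" mapping, and that each entry keeps the timeout that was in force when it was created.

```python
NOBODY_UID = 65534  # id that failed lookups were mapped to in the logs above


class ToyIdmapCache:
    """Toy model of the described behaviour; all names are hypothetical."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds, like idmap_cache_timeout
        self.entries = {}       # name -> (uid, expires_at)
        self.upcalls = 0        # how often the resolver was consulted

    def lookup(self, name, resolver, now):
        entry = self.entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]     # cached, possibly a stale "nobody"
        self.upcalls += 1
        uid = resolver(name)
        if uid is None:
            # Failure is reported as success, so it gets cached too.
            uid = NOBODY_UID
        # Expiry is fixed at creation time from the current timeout.
        self.entries[name] = (uid, now + self.timeout)
        return uid


if __name__ == "__main__":
    directory = {}                      # stands in for an unreachable LDAP
    cache = ToyIdmapCache(timeout=600)
    print(cache.lookup("arthur", directory.get, now=0))
    directory["arthur"] = 1000          # LDAP becomes reachable
    cache.timeout = 10                  # lowering it now does not help
    print(cache.lookup("arthur", directory.get, now=60))
    print(cache.lookup("arthur", directory.get, now=600))
```

In this model the second lookup still returns the "nobody" id even though the name has become resolvable and the timeout was lowered, which is the pattern the workaround tries to shorten by setting a small timeout before any entries are created.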
Hi Arthur,

That does not seem to be the root of the problem. I've built nfs-utils with these lines commented out on one of my systems and disabled the workaround in idmapd, and the problem persists.

Hmm. Then the workaround should probably be included in the default configuration of idmapd. It does not seem to cause any harm and works around these problems, and IMHO it's unlikely that this can be fixed *properly* for lenny. What do you think about this approach? Shall we ask the NFS maintainers about this change to the default configuration?

Best Regards,
Patrick
Thanks for investigating this. Another thought occurred to me: the kernel could be caching the contents of the directory at another level (e.g. it could cache the directory information without ever hitting any idmap code until that cache is expired).

If the NFS maintainers think this does not cause problems, then I think this will be the best solution for the short term. The only downside that I can think of is that there might be some reduced performance because the name to id lookups need to be done more frequently. Can you open a new bug report on nfs-utils?

For the longer term the kernel should probably provide a mechanism to flush the idmap cache.
I recently came across the nfsidmap -c option. I haven't thoroughly tried to reproduce the problem but nslcd 0.9.1-1 in experimental has an option to flush various caches. You could put reconnect_invalidate nfsidmap in nslcd.conf. I'm not 100% sure if this fixes the problem but can you reproduce the problem with that option set? Thanks,
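For trying the flush by hand, independent of nslcd, a small wrapper around the nfsidmap -c call mentioned above could look like this. It is only a sketch: the helper name and the dry-run default are made up, it assumes the nfsidmap binary from nfs-utils is on the PATH, and actually clearing the keyring normally needs root privileges.

```python
import shutil
import subprocess


def clear_nfsidmap_keyring(dry_run=True):
    """Build, and optionally run, the `nfsidmap -c` command.

    Hypothetical helper for illustration. With dry_run=True (the default)
    or when nfsidmap is not installed, nothing is executed and the command
    is only returned.
    """
    cmd = ["nfsidmap", "-c"]
    if dry_run or shutil.which("nfsidmap") is None:
        return cmd
    subprocess.run(cmd, check=True)  # raises if nfsidmap reports failure
    return cmd


if __name__ == "__main__":
    print(" ".join(clear_nfsidmap_keyring()))
```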
Just set sec=sys in both the exports entry on the server, and in the fstab options on the client, and it works - tested on centos 6.5 and ubuntu 12.04 (client)/14.04 (server). should work with debians too --- Roy Sigurd Karlsbakk <roysk@hioa.no> Overingeniør, IT drift, HiOA (+47) 9801 3356 / 6723 5827
This is happening quite often to me, even with the proposed workaround of "Cache-Timeout = 10" (both on server and clients).

I don't have problems at boot time, but I get a random uid/gid of 4294967294, especially on file creation, and it usually disappears without any intervention within some minutes.

The NFS server is on Debian Wheezy and the clients are on Debian Testing. I also have a Debian Wheezy NFS client that doesn't have this problem at all, so I tried downgrading nslcd (to 0.8.10-4), libnss-ldapd (to 0.8.10-4), libpam-ldapd (to 0.8.10-4), nfs-common (to 1.2.6-4), libnfsidmap2 (to 0.25-4) and libnss3 (to 2:3.14.5-1) to match the versions on Wheezy, but it's not helping.

I also tried "reconnect_invalidate nfsidmap" on the latest versions of nslcd (on Debian Testing) and it's also not helping. I already have sec=sys enabled in the NFS mount options.

Thanks,
Daniele
Hi there!

It seems I'm also in pain because of this bug. As you can see, the uid / gid of many users gets mapped to 4294967294:

tor@host:~$ ll -a $HOME|head -6
drwxr-xr-x 16 tor root 752 Sep 24 13:00 .
drwxr-xr-x 25 nobody nogroup 632 Jul 23 18:21 ..
drwx------ 3 4294967294 4294967294 80 Dec 25 2012 .adobe
-rw------- 1 4294967294 4294967294 2.0K Sep 24 13:00 .bash_history
-rw-r--r-- 1 4294967294 root 3.1K Sep 20 2013 .bashrc

After a few minutes the issue is gone:

tor@host:~$ ll -a $HOME|head -6
drwxr-xr-x 16 tor root 752 Sep 24 13:00 .
drwxr-xr-x 25 nobody nogroup 632 Jul 23 18:21 ..
drwx------ 3 tor kassa 80 Dec 25 2012 .adobe
-rw------- 1 tor kassa 2.0K Sep 24 13:00 .bash_history
-rw-r--r-- 1 tor root 3.1K Sep 20 2013 .bashrc

Unfortunately I have no idea how to debug this further :/

Cheers,
Simgund
I have a similar issue, but in my case the invalid mapping does not go away.

I'm running Jessie with nslcd to authenticate against a samba4 AD and have a NAS configured as a member server. To Linux clients it serves NFS4 with sec=krb5p. Actually I have 2 machines, which I think are configured identically concerning nslcd, kerberos, and NFS. On one machine (fresh Jessie install) everything works perfectly. On the other machine (upgraded from wheezy) everything worked perfectly on wheezy, and indeed I noticed the issue only after several days, when I first did ls -la on one of the imported drives. It looks extremely strange:

drwxrwxrwx 48 nobody 4294967294 4096 Sep 16 21:09 .
drwxr-xr-x 9 root root 4096 Nov 23 2014 ..
drwxr-xr-x 38 nobody 4294967294 4096 Okt 23 07:16 adm
drwxr-xr-x 4 nobody ad_users 4096 Okt 14 2013 admin
-rw-r-xr-- 1 nobody ad_users 1219 Okt 10 2001 adsl_suse71
[...]

The same directory on the other system:

drwxrwxrwx 48 mgr lars 4096 Sep 16 21:09 .
drwxr-xr-x 9 root root 4096 Nov 23 2014 ..
drwxr-xr-x 38 mgr lars 4096 Okt 23 07:16 adm
drwxr-xr-x 4 mgr ad_users 4096 Okt 14 2013 admin
-rw-r-xr-- 1 mgr ad_users 1219 Okt 10 2001 adsl_suse71
[...]

which is the same as I see it on the NAS. I can read everything, but if I create new files they're created as guest:users on the NAS, which maps to nobody:users on the machine where everything is alright.

I have a situation similar to the one described in this bug report during start-up. Due to trouble with k5start, nslcd is not available when NFS is started on boot. I usually start nslcd manually from a root prompt.

As said, it does not go away. Changing the expiration does not change a thing. The group number 4294967294 seems to pop out of thin air. It's not the number of the 'lars' group on any system involved.

Checking syslog (Verbosity=9), it turns out that idmapd doesn't ever look up the names.
This is a log following an nfs-common restart, several 'ls' and a 'touch':

Nov 8 21:02:09 midgard rpc.idmapd[11283]: libnfsidmap: using domain: ad.microsult.de
Nov 8 21:02:09 midgard rpc.idmapd[11283]: libnfsidmap: Realms list: 'AD.MICROSULT.DE'
Nov 8 21:02:09 midgard rpc.idmapd[11283]: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch
Nov 8 21:02:09 midgard rpc.idmapd[11284]: Expiration time is 10 seconds.
Nov 8 21:02:09 midgard rpc.idmapd[11284]: Opened /proc/net/rpc/nfs4.nametoid/channel
Nov 8 21:02:09 midgard rpc.idmapd[11284]: Opened /proc/net/rpc/nfs4.idtoname/channel
Nov 8 21:02:09 midgard rpc.idmapd[11284]: New client: 13
Nov 8 21:02:09 midgard rpc.idmapd[11284]: New client: 14
Nov 8 21:02:09 midgard rpc.idmapd[11284]: Opened /run/rpc_pipefs/nfs/clnt14/idmap
Nov 8 21:02:09 midgard rpc.idmapd[11284]: New client: 15
Nov 8 21:02:09 midgard nfs-common[11269]: Starting NFS common utilities: statd idmapdrpc.idmapd: libnfsidmap: using domain: ad.microsult.de
Nov 8 21:02:09 midgard nfs-common[11269]: rpc.idmapd: libnfsidmap: Realms list: 'AD.MICROSULT.DE'
Nov 8 21:02:09 midgard nfs-common[11269]: rpc.idmapd: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch
Nov 8 21:15:29 midgard nfsidmap[11442]: key: 0x154df42a type: uid value: guest@ad.microsult.de timeout 600
Nov 8 21:15:29 midgard nfsidmap[11442]: nfs4_name_to_uid: calling nsswitch->name_to_uid
Nov 8 21:15:29 midgard nfsidmap[11442]: nss_getpwnam: name 'guest@ad.microsult.de' domain 'ad.microsult.de': resulting localname 'guest'
Nov 8 21:15:29 midgard nfsidmap[11442]: nss_getpwnam: name 'guest' not found in domain 'ad.microsult.de'
Nov 8 21:15:29 midgard nfsidmap[11442]: nfs4_name_to_uid: nsswitch->name_to_uid returned -2
Nov 8 21:15:29 midgard nfsidmap[11442]: nfs4_name_to_uid: final return value is -2
Nov 8 21:15:29 midgard nfsidmap[11442]: nfs4_name_to_uid: calling nsswitch->name_to_uid
Nov 8 21:15:29 midgard nfsidmap[11442]: nss_getpwnam: name 'nobody@ad.microsult.de' domain 'ad.microsult.de': resulting localname 'nobody'
Nov 8 21:15:29 midgard nfsidmap[11442]: nfs4_name_to_uid: nsswitch->name_to_uid returned 0
Nov 8 21:15:29 midgard nfsidmap[11442]: nfs4_name_to_uid: final return value is 0

So it obviously tries to resolve guest (during the touch or the following ls), but it never looked up any other name in 13 minutes with an expiry time of 10 seconds. So it seems to be similarly related to caching of negative results.

Please let me know if I can help with additional input.
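On the group number 4294967294 that seems to come out of thin air: as plain arithmetic it equals 2^32 - 2, which is -2 read as an unsigned 32-bit integer, and its low 16 bits are 65534, the id that nobody/nogroup had in the earlier idmapd logs. Whether that is really where the value comes from in this setup is an assumption; the helper name below is made up for illustration.

```python
def as_uint32(value):
    """Interpret an integer as an unsigned 32-bit value (illustrative helper)."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print(as_uint32(-2))                # -2 seen as an unsigned 32-bit id
    print(2**32 - 2)                    # same number written out
    print(as_uint32(-2) & 0xFFFF)       # low 16 bits of that id
```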