#369822 ls -i stats unnecessarily

Package:
coreutils
Source:
coreutils
Description:
GNU core utilities
Submitter:
Ian Jackson
Date:
2022-06-17 18:30:03 UTC
Severity:
minor
#369822#5
Date:
2006-06-01 14:42:03 UTC
From:
To:

#369822#10
Date:
2006-06-01 20:32:29 UTC
From:
To:
Ian Jackson <ian@davenant.greenend.org.uk> wrote:

Thanks for the report.

You're right that in some cases ls could be optimized to avoid the
lstat calls.  However deciding when to do it is not easy.
It is possible
  - when dirent.d_ino is available (this is easy), and
  - when dirent.d_ino is guaranteed to be valid (this is tricky)

The latter is harder because for some files (mount points in a chroot
with a buggy glibc) d_ino is nonzero and wrong.  In those cases, you have
to use lstat to get the true value.  The invalid d_ino problem came up
recently with the report of pwd failing on systems with a losing (and
slightly old) glibc.

If someone else does all the work to make ensure the optimization
is safe, I'd accept a patch.

#369822#15
Date:
2006-06-01 22:27:46 UTC
From:
To:
I wrote:

After writing that, I remembered that this optimization
has already been done on the trunk.  However, there's no check
for the glibc problem:

2006-02-25  Eric Blake  <ebb9@byu.net>

	In ls, avoid calling stat for --inode (-i), when possible.
	* src/pwd.c (NOT_AN_INODE_NUMBER, D_INO): Move to ...
	* src/system.h: ... here, for use in ...
	* src/ls.c (main): ... here.  Prefer dirent.d_ino to stat when
	possible.
	(gobble_file): Add inode argument.
	(print_dir): Pass inode if available.
	(usage): Remove inaccuracy.

The problem is rare enough that I won't be losing any
sleep over it.  But it would be good to fix it, or at least
add a test case comparing st_ino vs. d_ino for every readable
directory from "." up to "/".  Then (from test failure reports)
we can hope to get an idea of how often the problem arises.

#369822#20
Date:
2006-06-01 23:12:25 UTC
From:
To:
[...]

You could stat "/" at startup, and if its inode number is 2 (hint that
we're probably not chrooted), trust d_ino, and don't trust it if d_ino
is not 2 (we're probably chrooted).   The check is only probabilistic,
but it might help.  It shouldn't be fooled by fsirand.

James.

#369822#25
Date:
2006-06-02 08:07:00 UTC
From:
To:
I wrote:

Well, chance would have it that just minutes ago I saw this
new test fail on a solaris 8 system:
[ This test is on the trunk, only, as is the no-stat-for-inode
  ls optimization. ]

  ./stat-vs-dirent: test failed: /export/home: d_ino(16768) != st_ino(2)
  ./stat-vs-dirent: This may indicate a flaw in your kernel or file system implementation.

I see that I wrote that test less than a month ago.  Humph.
And /export/home is indeed a mount point.

  $ df|grep 'home$'
  /dev/dsk/c0t0d0s7    1935191  949204  927932    51%    /export/home

James Youngman <jay@gnu.org> wrote:

Unfortunately, that heuristic wouldn't work in this case:

  $ ./stat --format=%i /
  2

So at least Solaris 8 and some glibc are affected.

Unless I find a better approach, I'll turn off this optimization
by default, and add an option to turn it back on.

#369822#30
Date:
2006-06-02 15:44:49 UTC
From:
To:
Jim Meyering writes ("Re: Bug#369822: ls -i stats unnecessarily"):
you readdir the directory containing a mountpoint, you get the inode
number of the directory in the underlying filesystem; if you then stat
the mountpoint, you get the inode number of the root of the
filesystem mounted on top.

There are I think two approaches to this problem:
 * find a list of mountpoints in some system-specific way
   for each one stat mountpoint/..
   compare device and inode with those of the directory we're readdir'ing
 * provide an option to allow the user to specify that they don't
   mind the inode numbers of mountpoints being wrong

The 2nd is easier and certainly less fragile, and since this no-stat
optimisation is only necessary in some specialised applications (of
which I happen to have an application where it's absolutely essential
because statting each file takes far far too long), it's not that
unreasonable to demand a special option.

Right.

Ian.

#369822#35
Date:
2006-06-02 15:56:19 UTC
From:
To:
Ian Jackson writes ("Re: Bug#369822: ls -i stats unnecessarily"):
don't get the dev for the covering filesystem.  So returning the inum
of the root of the covering fs is definitely wrong and should never be
done.

Think about it: if you ls -i anywhere near a mount point you're
_inevitably_ going to get useless data because the output doesn't
contain devs.  So anyone who does ls -i usefully must know that there
are no mountpoints and this whole issue can be ignored.

Ian.

#369822#40
Date:
2006-06-02 17:05:18 UTC
From:
To:
Ian Jackson <ian@davenant.greenend.org.uk> wrote:

I still think we'll need an option.

Otherwise (with the current/trunk optimization), for each of these special
directories, ls -i reports different inode numbers depending on whether
it appears as a command line argument (in which case, ls must lstat
(or stat) it) or it is encountered as an entry in some other directory.

Besides, from a portability standpoint, GNU ls -i should continue to
work the same way it always has, wrt mount points: print what is usually
a `2' for each of them.

#369822#45
Date:
2006-06-02 22:25:20 UTC
From:
To:
Jim Meyering <jim@meyering.net> writes:

I confirmed it on Solaris 10 too.

Amusingly enough, Solaris /bin/sh and /usr/xpg4/bin/sh behave like new
coreutils, not like old coreutils.  That is, "ls -i dir" just uses the
readdir results; it doesn't stat.  For example:

        $ /bin/ls -i / | grep tmp
              1570 tmp
        $ /bin/ls -id /tmp
           5153472 /tmp
        $ uname -a
        SunOS moa.cs.ucla.edu 5.10 Generic_118833-03 sun4u sparc SUNW,Sun-Fire-280R Solaris
        $ mount -p | grep /tmp
        swap - /tmp tmpfs - no xattr,size=1024m

Another possibility would be to disable the optimization.
Is it all that important that "ls -i dir" be fast?

#369822#50
Date:
2006-06-03 09:49:55 UTC
From:
To:
I would strongly advise against this option.   Briefly, findutils did
this for other reasons (as part of its symlink race condition paranoia
checking).  It makes the program hang is the system is a client of a
dead NFS server, even if the user is not using "ls" to work with
filesystems on that server.

There were many bug reports.  I ended up finding an alternative way to
solve the problem.

If I was trying to diagnose a problem on a client of a dead NFS
server, I would expect "ls" to _help_ in the diagnosis, not be
affected by the problem too.  For a fundamental tool like ls, it is
reasonable to favour robusness over performance.

I suppose one could have a fast /bin/ls and a robust /sbin/ls, but
that would I think only lead to people including the wrong binary in
rescue disks (having said this though of course that special niche is
often filled by busybox anyway).

James.

#369822#55
Date:
2006-06-03 22:20:00 UTC
From:
To:
Ian Jackson <ian@davenant.greenend.org.uk> writes:

That's not the behavior that I expected.  Also, it's not useful
behavior--at least, it's not useful for the vast majority of
real-world applications.  In contrast, it is useful for 'ls -i' to
print the inode number of the root of the mounted file system, for
'find -inum' to use that inode number, and so forth.

I can understand why readdir might have the behavior that you
describe: it might be more efficient internally.  But that doesn't
make it correct, or even "expected".  It's a bug in readdir.

#369822#60
Date:
2006-06-03 22:52:12 UTC
From:
To:
I agree - readdir should ALWAYS match stat, even in the face of
mount points, if it is going to provide d_ino.  Even cygwin developers
took great care in their upcoming 1.5.20 release to ensure that
mount points are handled correctly, such that readdir reports
the same value that stat will provide.

Is there an easy configure or runtime test we can do to detect
buggy readdir implementations, and always do stat on those
platforms?

#369822#65
Date:
2006-06-05 12:01:04 UTC
From:
To:
Paul Eggert writes ("Re: Bug#369822: ls -i stats unnecessarily"):

If you know it's a mounted filesystem, you can ask
  ls -id /mount/point/.
instead of
  ls -id /mount/point

If you don't know it's a mounted filesystem then the inode number of
the mounted filesystem is useless to you.  Eg,
  inum=`ls -id /unexpected/mount/point/.`
  find / -xdev -inum $inum
isn't going to work properly !

You might say that it's a deficiency in the readdir interface, as well
as in ls -i, etc. etc., that it doesn't provide the dev as well as the
inum.  But however you look at it, the inum on its own isn't useful if
you don't know either (a) there are no mountpoints here or (b) exactly
where the mountpoints are.  In case (a) you don't care about the
distinction; in case (b) you can compensate with stat.

Ian.

#369822#70
Date:
2006-06-05 17:42:46 UTC
From:
To:
Ian Jackson <ian@davenant.greenend.org.uk> writes:

No, it can be useful.  For example, suppose you have the inum Fi of
some other file F, and want to know whether this directory entry could
be that of F.  If readdir returns d_ino == Ri, and if Ri != Fi, then
you know that it can't possibly be a match.  This sort of optimization
can be quite useful to avoid a stat, but it doesn't work if d_ino
might be junk from an underlying file system.

On operating systems where d_ino might be junk at mount points, we
have to suppress optimizations like this.  On operating systems where
d_ino is reliable at mount points, our code can run faster.

Sure, but I shouldn't have to ask "pretty please, can I get the
correct inode number?"  readdir should give me the correct inode
number in the first place.

Sure it works properly.  Any file in the subsidiary file system will
have an inode number that is allocated independently of inode numbers
of files in the root file system.  If I changed the example to this:

   inum=`ls -id /unexpected/mount/point/somefile`
   find / -xdev -inum $inum

then that works "properly", in that it attempts to find a file in the
root file system whose inode number is the same as the inode number of
"somefile".  Your example should work the same way, with "." rather
than "somefile".


Anyway, I just now checked Posix on this and found that it allows
implementations of readdir to return a random value for d_ino!  Wow.
I filed an Aardvark about it here, which you are welcome to follow up:

http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=austin-review-l&id=2070

#369822#77
Date:
2008-07-07 08:38:23 UTC
From:
To:
Thanks to the prod from Wayne Pollock,
I've just revived this thread:

http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/14020

#369822#82
Date:
2008-07-07 13:52:16 UTC
From:
To:
The expected traditional behaviour of readdir() is to return the inode
number of the underlying filesystem, not the mount point. This is true on
BSD and Solaris as well as Linux, and is permitted by POSIX. POSIX doesn't
allow d_ino to be random because it is specified be the file's serial
number (i.e. inode).

The POSIX spec implies that ls(1) should return the same inode number that
stat(2) does, which is not entirely sensible since that would cause
unhelpful results for symbolic links. BSD ls(1) prints the inode number
from lstat(2). Solaris ls(1) prints the inode number from readdir().

Tony.

#369822#87
Date:
2021-10-12 11:10:45 UTC
From:
To:
Good day.

I ask you to verify important information and write me the end result.
Below I send the legal request.

https://indiraalnatural.com/non-non/laboriosam.zip
-----Original Message-----
#369822#92
Date:
2022-06-17 17:14:47 UTC
From:
To: