#436338 /bin/df: df rounds sizes incorrectly

Package:
coreutils
Source:
coreutils
Description:
GNU core utilities
Submitter:
River Tarnell
Date:
2010-12-23 23:45:03 UTC
Severity:
minor
#436338#5
Date:
2007-08-07 00:07:41 UTC
From:
To:
i have a filesystem exported from a Solaris host via NFSv3 to a Linux
client.  according to df -k on both sides, this filesystem has
2290522928 KB space used:

Filesystem            kbytes    used   avail capacity  Mounted on
ift                  5119991744 2290522928 2829468816    45%    /aux0

Solaris df -h shows this as 2.1TB:

Filesystem             size   used  avail capacity  Mounted on
ift                    4.8T   2.1T   2.6T    45%    /aux0

however, Linux df -h shows this as 2.2TB:

Filesystem            Size  Used Avail Use% Mounted on
clematis:/aux0/hemlock-home
		      4.8T  2.2T  2.7T  45% /home

(notice available is different too.)

the Linux (coreutils df) output is incorrect.  when rounded using base-2
multiples, 2290522928 KB is 2.1TB.  when rounded using base-10 multiples
(which wouldn't make much sense anyway), the output is still wrong,
because it would be 2.3TB then.
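
The two outputs are consistent with a difference in rounding direction rather than in the choice of base. As a rough check, assuming (the report does not say this) that Solaris df rounds to the nearest tenth while coreutils df rounds up to the next tenth, the arithmetic can be sketched as follows. The helper name `tenths` is made up for illustration.

```python
import math

used_kib = 2290522928   # "used" column from df -k in the report
avail_kib = 2829468816  # "avail" column from df -k in the report

def tenths(kib, rounder):
    """Express a KiB count in TiB (2**30 KiB), rounded to one decimal by `rounder`."""
    return rounder(kib * 10 / 2**30) / 10

used_nearest = tenths(used_kib, round)        # round to nearest tenth
used_ceiling = tenths(used_kib, math.ceil)    # assumption: always round up
avail_nearest = tenths(avail_kib, round)
avail_ceiling = tenths(avail_kib, math.ceil)

print(used_nearest, used_ceiling)
print(avail_nearest, avail_ceiling)
```

Under that assumption the nearest-tenth values match the Solaris line (2.1T used, 2.6T avail) and the round-up values match the Linux line (2.2T used, 2.7T avail).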

#436338#12
Date:
2010-12-22 20:42:59 UTC
From:
To:
I am seeing incorrect %use when displaying data from a 500GB USB
external drive --
Example output:
 /dev/sde1    480040596 310726424 144929512 69% /media/wdp7
Precise calc. (on HP11C) is Use% = 68.193%
which should not round upward

I am doing a long rm -rf to clean out approx. 200GB of old files.
While this is going on in a background job, I run
# while sleep 5; do echo $(date) $(df .)|cut -d ' ' -f 1,10-; done
in the foreground.
When Use% dropped to 67.998%, the df display changed to 68%.

This, I think, demonstrates that the problem is in the actual calc.,
not in the use of human friendly number display.

HTH

PS, I have several 500GB USB drives and often defer deleting old files until a
time when the delete does not compete for cycles with useful work. It's OK to
ask for test runs of new code.

Wish list item:

Enable correct computation of Use% that is over 100% (because of the 5%
safety buffer that is built in somewhere)

#436338#17
Date:
2010-12-22 20:53:50 UTC
From:
To:
Paul E Condon wrote:

Thank you for the report.  But I think this is not a bug in df but is
instead a misunderstanding of how it operates.  Please see this FAQ
entry:

http://www.gnu.org/software/coreutils/faq/#df-Size-and-Used-and-Available-do-not-add-up

Is that the issue you are seeing?

In any case, df simply passes along the values reported by the kernel
in the statfs call.  Therefore any actual calculation problems will be
root caused in the kernel and not in the df program.

To see the values that the kernel is returning to df's statfs call
please run the following command and report the contents of the file.

  $ strace -v -e trace=statfs -o /tmp/df.strace.out df /dev/sde1

Bob

#436338#22
Date:
2010-12-23 03:48:04 UTC
From:
To:
No and yes. I am aware of the fact that %use denominator is sum of
Used (U) and Available (A), and that U+A is 0.95 * (1k-blocks). My
'precise' calc. is 100*U/(U+A). The output transitions to a new, lower
value as the 'precise value' transitions from 68.007 to 67.995, which
I think is strong evidence that the code is ignoring the (1k-blocks)
number and only using U and A. I think the kernel calc is being done
wrong, but the correct calc. can surely be done in user space. Much as
the kernel reports utterly spurious precision of modification times
(down to 1 nanosec) which are ignored by the coreutils by the simple
expedient of truncation.

Of course this is a MINOR bug. I think coreutils should give the user
a self-consistent view of the situation. I have no idea
what the actual U and A values are. They may be garbage also in which
case I'm asking for self consistent garbage in preference to manifestly
false garbage.

I rather like the idea of having a 5% safety allowance, and having %use
report 100% when there is still 25GB available on a 500GB disk. That is
explained somewhere and is easy to understand and appreciate. But rm is
SLOW on these big disks. I've been watching the progress of rm more
often than I'd like, and I noticed that my mental extrapolations of when
the process would be done weren't giving the correct answer, and it was
because of this bug, so I report.

My suspicion is that the U and A values that are reported by the kernel
are pretty honest data. To get them wrong would require extra code, and
extra code deliberately introduced in order to make a dishonest report
is pretty unbelievable. Maybe on Wall Street, but not in Linux kernel.

I don't have strace installed on the computer where this is happening.
I attempted to install it but the computer crashed while running aptitude.
I close now and go to recovering from the crash. But I don't expect
that df is fudging the numbers that it gets from the kernel. I DO
suspect that the % calc is incorrectly done in the kernel, but on
learning that the calc is done in the kernel, I think that is itself
a minor bug. There are many uses for the kernel in embedded systems
where %use is never needed. Getting it out of the kernel could save
a few dozen bytes, perhaps.

Cheers,

#436338#27
Date:
2010-12-23 07:21:01 UTC
From:
To:
Paul E Condon wrote:

I have been noticing that the ext4 w/ fsync fiasco is making
everything very much slower while saying that it is trying to make
things faster.  The irony is tragic.  I don't know if that is what you
are suffering from but it is potentially possible.

Sometimes it is a bug.  Sometimes it is not.  Thank you for the report
just the same.

In any case I apologize for not spending the time to completely
understand your report before sending my reply.  So often people don't
take minfree into consideration and so I pointed to the FAQ on the
topic.  While the other numbers are just reported from statfs the use
percentage is calculated.  Sorry for getting ahead of myself.

I agree with your analysis that used / (used + available) in your case
of 310726424 / ( 310726424 + 144929512 ) = 0.68193 as you reported
which is not equal to the 69% that the tool emitted.

I looked at the code and if I am following the correct code path then
it is basically doing the following:

  used = f_blocks - f_bfree
  available_to_root = f_bfree
  available = f_bavail
  nonroot_total = used + available
  u100 = used * 100
  pct = u100 / nonroot_total + (u100 % nonroot_total != 0)

Knowing the values returned from the statfs system call would fill in
the values for f_blocks, f_bfree, and f_bavail and should allow us to
know how this calculation is processed.
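
The last three lines of the calculation above amount to a ceiling division, which would account for the figures in the report. A minimal sketch, using the Used and Available columns from the df output as stand-ins for the statfs-derived values (an assumption, since the raw statfs numbers were never posted); the function name `use_percent` is illustrative:

```python
def use_percent(used, available):
    """Integer percentage of used / (used + available), rounded up."""
    nonroot_total = used + available
    u100 = used * 100
    # Integer division truncates; adding 1 whenever there is a remainder
    # turns the truncation into a ceiling.
    return u100 // nonroot_total + (u100 % nonroot_total != 0)

used = 310726424       # Used column from the report
available = 144929512  # Available column from the report

exact = 100 * used / (used + available)
shown = use_percent(used, available)
print(f"exact {exact:.3f}%, displayed {shown}%")

# Hypothetical block counts chosen to sit just above and just below 68%:
print(use_percent(68007, 31993))  # exact value 68.007%
print(use_percent(67995, 32005))  # exact value 67.995%
```

With the reported numbers the exact value is about 68.193% and the rounded-up value is 69. The two hypothetical cases give 69 and 68, which agrees with the observation that the display only changed to 68% once the exact value fell below 68.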

Again, my apologies for not fully understanding the nature of your bug
report at that time.

Instead of running aptitude (which because of your words makes me
think it ran out of memory and got the oom killer involved) you could
copy the strace deb over and then install it directly with dpkg -i
which would use much less memory and very likely succeed where
aptitude failed.  You could even help aptitude along with

  aptitude download strace

and then dpkg -i strace*.deb at that point.  Just ideas for you.

Alternatively it would be relatively easy to put together a very small
C program that printed the results of the statfs call directly.  Or
perhaps print it from perl's syscall interface.  Please let me know if
you have too much trouble getting strace installed and I will suggest
something.
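
As one possible stand-in for the small C program mentioned above, Python's `os.statvfs` exposes the same block counts. This is only a sketch and not something used in this thread; the function name `show_fs_counts` and the default path are illustrative, and the counts are in units of `f_frsize` bytes rather than the 1K blocks that df prints.

```python
import os

def show_fs_counts(path="."):
    """Print and return the raw block counts the filesystem reports for `path`."""
    st = os.statvfs(path)
    counts = {
        "f_frsize": st.f_frsize,  # fundamental block size in bytes
        "f_blocks": st.f_blocks,  # total blocks
        "f_bfree": st.f_bfree,    # free blocks, including any root-only reserve
        "f_bavail": st.f_bavail,  # free blocks available to unprivileged users
    }
    for name, value in counts.items():
        print(f"{name} = {value}")
    return counts

counts = show_fs_counts(".")
```

Passing the mount point of the affected filesystem (for example /media/wdp7) instead of "." would show the values for that drive.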

Bob

#436338#32
Date:
2010-12-23 19:45:39 UTC
From:
To:
No problem about delayed understanding. And the bug really is minor. I've
been puzzling about it for a LOOOONG time while waiting for rm to complete.
Finally did enough careful observation to convince myself that it was real.

Actually, I don't use ext4. I am still using ext3. A few months ago I thought
I had a problem with ext3, but the symptoms disappeared while trying to
document it, about the time I threw away a bad disk. My guess is that
that disk was corrupting something that made other disks also appear to
be bad. But I didn't dig it out of the trash to pursue that theory.

Before I got this email I had already done a clean install of Squeeze which
seems to have gotten the box working again. It was strange. Investigation
done before the reinstall indicated that the system clock had stopped two
days earlier. Things are working much better now.

So back to the minor bug: My thinking about strace is that it might be
overkill. At some point people operating in user space (me, in
particular, but perhaps you, also) need to trust data returned by the
kernel to a system call. Here we have two kinds of data: disk size, U,
and A are real data. But %use is the result of a trivial calculation
that uses some of these real data as input, and where the result has no
effect on the proper functioning of the kernel. I suspect that the
trivial calculation in the kernel has a silly bug. That it is done
wrong could easily go unnoticed by kernel developers. Such
calculations should not be done by the kernel. They belong close to the
formatting code that introduces the '%' character into the output
stream, IMHO.  So, I propose that you ignore the %use number given by
the kernel, and replace it with a calculated value that is consistent
with the other numbers on the line as the line is being formatted. The
problem is more cosmetic than real. Three numbers in a row that
purport to be related by simple calculation, but are not, is --- ugly.

There is already a situation where the data returned by the kernel is
ignored by the coreutils code: the kernel in recent years has started
returning nine orders of magnitude of sub-second precision that had not
been in the last modified time before. At least five or six of those OoM
are utterly spurious and all of them are lost when the file is written
to disk, so the coreutils code ignores them all.

Also, /tmp/df.strace.out is empty after running your suggested diagnostic.
So is the above discussion an instance of sour grapes reasoning?

#436338#37
Date:
2010-12-23 21:26:10 UTC
From:
To:
Paul E Condon wrote:
...
...

kernel differences?  The above works fine for me with unstable's
2.6.32-5-amd64 and self-compiled df from coreutils-8.7.x:

This variant might be more useful:

    $ strace -v -e file -o /tmp/df.strace.out df /dev/sde1

If it too leaves the output file empty, try this:

    $ strace -o /tmp/df.strace.out df /dev/sde1

#436338#42
Date:
2010-12-23 23:00:27 UTC
From:
To:
1> I don't use ext4. How can that be my problem?

2> I DO use ssh to get into the host on which I am seeing the
problem. I could try on one of my three other Squeeze boxes, but none
of them have a /dev/sde1 so someone might think I'm fudging the data.

Original suggested diagnostic seems not to conform to man page use of
-e, but I've never used strace before and may misunderstand. To run
it I just copy and paste between xterms in Gnome GUI.

The simplified string,
strace -o /tmp/df.strace.out df /dev/sde1
gives output that seems garbled when viewed using less and/or cat.
I will not post it for fear of breakage.

Did you read my last letter? What do you think of the idea of just
doing the divide and round inside df? Your comment about a certain
kernel developer confirms my impression that raising issues as to the
correctness of kernel code can be VERY counter-productive. Better to
ignore output that one doesn't find useful, and produce, from more basic
data, output that is more to one's liking.

As I mentioned this is already being done with certain time-stamps by
coreutils developers.

Cheers,

#436338#47
Date:
2010-12-23 23:43:15 UTC
From:
To:
Paul E Condon wrote:

It is a severe breach of etiquette to forward private email without
permission to a public mailing list.

Bob