#461049 coreutils: dd no longer reports "xx+yy records in|out" after sigpipe.

Package:
coreutils
Source:
coreutils
Description:
GNU core utilities
Submitter:
Rogier Wolff
Date:
2010-07-29 17:45:07 UTC
Severity:
minor
#461049#5
Date:
2008-01-16 10:26:05 UTC
From:
To:

#461049#10
Date:
2008-01-26 22:01:41 UTC
From:
To:
I haven't replied yet because I'm not sure what to make of it. It
definitely does behave this way. Anyone else know if this was an
intentional change or a bug? I don't see anything obvious in the
changelog, and my reading of posix suggests that it's a bug; am I
missing something?

Mike Stone

#461049#15
Date:
2008-01-29 05:50:23 UTC
From:
To:
Michael Stone wrote:

I can't reproduce this behavior.  This is what I see:

  $ dd if=/dev/zero | dd count=100 of=/dev/null
  100+0 records in
  100+0 records out
  51200 bytes (51 kB) copied, 0.00245 seconds, 20.9 MB/s

  $ dd --version
  dd (coreutils) 5.97

  $ uname -rm
  2.6.18-5-amd64 x86_64

  $ uname -rm
  2.6.22-3-686 i686

Bob

#461049#20
Date:
2008-01-29 08:20:08 UTC
From:
To:
Yes, you /can/ reproduce the bad behaviour. Your output shows the
bad behaviour. :-)


Good behaviour:

driepoot:~>  dd if=/dev/zero | dd count=100 of=/dev/null
100+0 records in
100+0 records out
51200 bytes transferred in 0.000553 seconds (92582676 bytes/sec)
129+0 records in
128+0 records out
65536 bytes transferred in 0.002203 seconds (29747044 bytes/sec)
driepoot:~> dd --version
dd (coreutils) 5.2.1
[...]
driepoot:~>


There are two "dd" processes. The second one will copy over exactly
100 blocks, and then exit. It should report 100 blocks in, 100
blocks out. This works.

The other dd has an unlimited amount of input data (/dev/zero), and
its output pipe is closed after 100 blocks of data are consumed. The
"pipe" can hold a little more data, so actually it will have read
and subsequently written more than 100 blocks (in my case 128).

So, when 100 blocks were written and consumed by the other side of
the pipe, 28 more blocks were in the pipe, when the system noticed
that the other side of the pipe was closed, and returned EPIPE
to the first dd.
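
The buffering described above can be observed directly. The following is
a minimal sketch in Python, not dd's code: it fills a pipe that nobody
reads with 512-byte blocks until the kernel refuses more, and counts how
many blocks fit. The name blocks_that_fit is made up for illustration,
and the result depends on the system's pipe capacity.

```python
import os

BLOCK = 512  # dd's default block size in bytes


def blocks_that_fit(block_size=BLOCK):
    """Count how many blocks a pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # fail with EAGAIN instead of blocking
    payload = b"\0" * block_size
    count = 0
    try:
        while True:
            written = os.write(write_fd, payload)
            if written < block_size:
                break  # partial write: the pipe is full
            count += 1
    except BlockingIOError:
        pass  # pipe buffer is full; nobody is reading
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return count


if __name__ == "__main__":
    print(f"{blocks_that_fit()} blocks of {BLOCK} bytes fit in the pipe")
```

On a Linux system with the default 64 KiB pipe capacity, the count would
be consistent with the 128 blocks reported above, but the figure is
system-dependent.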

Let me reiterate: It is the first "dd" that is misbehaving. When it
receives a write error and SIGPIPE, it simply exits instead of
reporting the stats.

Using strace on the first dd shows (good behaviour on an older
system!):

[...]
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
close(0)                                = 0
close(1)                                = 0
write(2, "101+0 records in\n", 17101+0 records in
)      = 17
write(2, "100+0 records out\n", 18100+0 records out
)     = 18
gettimeofday({1201594453, 321231}, NULL) = 0
write(2, "51200 bytes transferred in 0.025"..., 6451200 bytes transferred in 0.025491 seconds (2008543 bytes/sec)
) = 64
gettid()                                = 13885
tgkill(13885, 13885, SIGPIPE)           = 0
sigreturn()                             = ? (mask now [])
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++
driepoot:~>

or (bad behaviour on an upgraded system):

[...]
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++
Process 11355 detached
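
The sequence in the first trace (the write fails with EPIPE, the
statistics go to standard error, the handler is reset to SIG_DFL and the
signal is re-sent) can be imitated in a few lines. This is only a sketch
in Python, not the coreutils source; CHILD and run_writer are
hypothetical names, and the child process stands in for the first dd.

```python
import signal
import subprocess
import sys

# Stand-in for the first dd: write 512-byte blocks to stdout forever.
# Python ignores SIGPIPE by default, so a closed pipe shows up as
# BrokenPipeError (EPIPE) rather than killing the process outright.
CHILD = r"""
import os, signal, sys
records_in = records_out = 0
block = b"\0" * 512
try:
    while True:
        records_in += 1          # pretend one block was read
        os.write(1, block)
        records_out += 1
except BrokenPipeError:
    sys.stderr.write(f"{records_in}+0 records in\n")
    sys.stderr.write(f"{records_out}+0 records out\n")
    sys.stderr.flush()
    # Restore the default action and die "as though killed by SIGPIPE".
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)
    os.kill(os.getpid(), signal.SIGPIPE)
"""


def run_writer(blocks_to_consume=100):
    """Read some blocks from the child, close the pipe, return (rc, stderr)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    for _ in range(blocks_to_consume):
        proc.stdout.read(512)
    proc.stdout.close()          # reader goes away, like the second dd exiting
    stats = proc.stderr.read().decode()
    proc.stderr.close()
    return proc.wait(), stats


if __name__ == "__main__":
    returncode, stats = run_writer()
    print(stats, end="")
    print("killed by SIGPIPE:", returncode == -signal.SIGPIPE)
```

The exact record counts depend on pipe and reader buffering, so they are
not shown here. The point is only that the statistics are written before
the process terminates with the SIGPIPE status.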


Roger.

#461049#25
Date:
2008-01-29 12:27:57 UTC
From:
To:
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote:
...

Thanks for the report, but that behavior is required by POSIX.
dd must handle SIGINT the way you want, but dd may not handle
SIGPIPE that way:

    ASYNCHRONOUS EVENTS

        For SIGINT, the dd utility shall interrupt its current processing,
        write status information to standard error, and exit as though
        terminated by SIGINT. It shall take the standard action for all
        other signals; see the ASYNCHRONOUS EVENTS section in Section 1.4
        (on page 2280).

#461049#30
Date:
2008-01-29 13:01:45 UTC
From:
To:
I figured there'd be some piece of posix at the bottom of it. :) I
wonder if the documentation should better reflect that. (The info page
says only that "when dd completes it outputs the final statistics";
maybe something like "when dd completes normally or is killed by SIGINT
it outputs the final statistics"?)

Mike Stone

#461049#35
Date:
2008-01-29 13:05:48 UTC
From:
To:
Please interpret it as dd is supposed to work:

Please block the "SIGPIPE" signal, then write will return:

       EPIPE  fd is connected to a pipe or socket whose reading end is
              closed.  When this happens the writing process will also
              receive a SIGPIPE signal.  (Thus, the write return
              value is seen only if the program catches, blocks or
              ignores this signal.)

then the write will return EPIPE, which is a normal error code,
indicating that writing has stopped, and that stats should be printed.
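
As a rough illustration of the man-page excerpt above: with SIGPIPE
ignored, writing to a pipe whose read end is closed fails with the
ordinary error code EPIPE instead of terminating the process. A minimal
Python sketch (the function name write_to_closed_pipe is made up for
this example):

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Return the errno produced by writing to a pipe with no reader."""
    # Ignore SIGPIPE so that write() reports EPIPE instead of the
    # process being killed. (CPython already does this at startup;
    # it is repeated here to make the precondition explicit.)
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reading end goes away
    try:
        os.write(write_fd, b"\0" * 512)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
        signal.signal(signal.SIGPIPE, previous)
    return 0


if __name__ == "__main__":
    err = write_to_closed_pipe()
    print(errno.errorcode.get(err, err))
```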

What is dd's "purpose in life"? Device-to-device copy, as its name
suggests? Turns out modern use is often no longer device-to-device. It
copies files, and counts and reports the number of blocks copied. This
feature is used by some. If you go and change it, things will stop
working.

If you start "waving standards around", I'll bounce it back for
you. SIGPIPE is NOT an asynchronous event, as documented in the quoted
documentation for write: it happens synchronously with the write,
normally before the return of "EPIPE".

	Roger.

#461049#40
Date:
2008-01-29 13:13:45 UTC
From:
To:
Michael Stone <mstone@debian.org> wrote:
I like it.

I'll do this one, but in general, it's much easier for me if such a
suggestion comes in the form of a patch created by e.g.,

  git format-patch --signoff HEAD~1

Then I can simply apply it and give proper credit with the
very convenient "git am PATCH".

Here's what I'll push:
------------------
Subject: [PATCH] Improve the description of when dd outputs its final statistics.

* doc/coreutils.texi (dd invocation): Say that dd prints stats
upon normal termination and upon SIGINT.

Signed-off-by: Jim Meyering <meyering@redhat.com>
---
 ChangeLog          |    6 ++++++
 doc/coreutils.texi |    3 ++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 148b7d7..8415d3d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2008-01-29  Michael Stone  <mstone@debian.org>
+
+	Improve the description of when dd outputs its final statistics.
+	* doc/coreutils.texi (dd invocation): Say that dd prints stats
+	upon normal termination and upon SIGINT.
+
 2008-01-29  Jim Meyering  <meyering@redhat.com>

 	Avoid "make distcheck" failure: newly-created man/*.1 files not removed
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index c1300fb..fb9d5fc 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -7559,7 +7559,8 @@ process makes it print I/O statistics to standard error
 and then resume copying.  In the example below,
 @command{dd} is run in the background to copy 10 million blocks.
 The @command{kill} command makes it output intermediate I/O statistics,
-and when @command{dd} completes, it outputs the final statistics.
+and when @command{dd} completes normally or is killed by the
+@code{SIGINT} signal, it outputs the final statistics.

 @example
 $ dd if=/dev/zero of=/dev/null count=10MB & pid=$!
--
1.5.4.rc5.1.ge6bfe

#461049#45
Date:
2008-01-29 13:13:41 UTC
From:
To:
size of the data copied, which helps me verify things went as planned),
I want to see 1440 blocks copied.

So when I have an image of which the first 1440kbytes should be copied
to a floppy, I want

	dd if=my.img of=/dev/fd0 bs=1k

to report
	1440+0 records out

This is my verification that all data got copied as intended. If
my image is larger than 1440 kbytes, I expect to see
	1441+0 records in
and dd will experience an "ENOSPC" on the 1441st write.

AGAIN: dd is used to copy data, and to provide feedback on how much
data it copied. It is inconsistent to omit the feedback in some cases
when the copying stops. Copying can stop for several reasons; two
I can think of are "no more input" and "output no longer
possible". "ENOSPC" on a device, or a file (disk full), is quite
similar, and should be handled similarly to EPIPE on the output.
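
The consistency argument above can be sketched as a copy loop that
keeps dd-style counters and stops on any write error, whatever its
errno. This is an illustration in Python under stated assumptions, not
how dd is implemented: LimitedSink is a hypothetical stand-in for a
1440 KiB floppy, copy_with_stats is a made-up name, and partial records
are not tracked.

```python
import errno
import io


class LimitedSink:
    """Hypothetical stand-in for a 1440 KiB floppy: fails with ENOSPC when full."""

    def __init__(self, capacity):
        self.remaining = capacity

    def write(self, data):
        if len(data) > self.remaining:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.remaining -= len(data)
        return len(data)


def copy_with_stats(src, dst, block_size=1024):
    """Copy block by block; return (records_in, records_out, error)."""
    records_in = records_out = 0
    error = None
    while True:
        block = src.read(block_size)
        if not block:
            break  # no more input
        records_in += 1
        try:
            dst.write(block)
        except OSError as exc:
            error = exc  # ENOSPC, EPIPE, ...: stop, but still report
            break
        records_out += 1
    return records_in, records_out, error


if __name__ == "__main__":
    image = io.BytesIO(b"\0" * (1500 * 1024))   # image larger than the floppy
    floppy = LimitedSink(1440 * 1024)
    rin, rout, err = copy_with_stats(image, floppy)
    print(f"{rin}+0 records in")
    print(f"{rout}+0 records out")
    if err is not None:
        print("stopped:", errno.errorcode[err.errno])
```

With a 1500 KiB image and bs=1k, the loop reads 1441 records and writes
1440 before the sink reports ENOSPC, matching the counts described
above. An EPIPE from a closed pipe would take the same path through the
except branch.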

	Roger.

#461049#50
Date:
2008-01-29 13:42:17 UTC
From:
To:
Which is not correct, as the stats are also printed upon write error,
like ENOSPC. (as we all agree the standards require).

	Roger.

#461049#55
Date:
2008-01-29 14:12:24 UTC
From:
To:
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote:

You were reporting that this didn't work the way
you expected:

The real solution is just "don't do that" (i.e., don't use the
unnecessary pipe).  Do this instead:

    $ dd if=/dev/zero count=100 of=/dev/null

#461049#60
Date:
2008-01-29 14:14:10 UTC
From:
To:
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote:

It depends on your definition of "normally".
If I meant what you seem to think I meant, I would have said
"...completes *successfully*".

However, improvements are always welcome.

#461049#65
Date:
2008-01-29 14:27:53 UTC
From:
To:
As reported in the original bug report:

"Of course, the two dd's are unnecessary, and this could be done with
one dd. In practice, the consumer (second dd) is another program that
exits when it's had enough data. The output of (the first) dd is then
used to extract the approximate amount of data copied."

The script that, after a system upgrade, stopped working of course
uses a different program, which I'm not sure you have. So I decided to
provide an example, for which I didn't have to provide any data files
and other "unusual" programs.

This is a simple test case, which regrettably uses a second "dd" to
limit the amount of data that can be copied. This results in two different
possibilities of miscommunication: First, someone reports seeing "100+0
records copied" from the second dd, which is of course correct, but
the bug lies in the first dd no longer reporting the amount of data
copied.

The second case is your interpretation that one dd would be
unnecessary as it could be done with just one.

	Roger.

#461049#70
Date:
2008-01-29 14:33:01 UTC
From:
To:
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote:
...

You're beating a dead horse already.
You'll need to come up with much better arguments (that probably
do not exist) to make me change dd to be non-compliant in this respect.

#461049#75
Date:
2008-01-29 15:37:06 UTC
From:
To:
In my interpretation of the quoted part of the standard, dd has become
non-compliant, by not reporting the statistics when its output happens
to be connected to a pipe.

I will agree that my interpretation is a bit convoluted. But it's a
valid interpretation, and it makes dd do the "sensible thing".

It currently implements a "weird" exception, fuelled by a literal
interpretation of unintentional wording in a standard.

I'm all in favor of standards. However, blindly implementing an
unintentional consequence of a standard is not the way to improve
interoperability.

The INTENT of that part of the standard was to say that dd
implementations need not catch-and-handle all possible signals.

If any other dd can be found that behaves like gnu-dd does now, which
warrants gnu-dd to comply with an unintentional consequence of some
interpretation of the standard, I'll concede.

If any other user can be found who is surprised by dd outputting the
stats after the output pipe is closed (but not when the disk is full
or any other error), I'll concede.

The current situation is simply blindly mis-interpreting the intent of
the standard, and then blindly implementing that.

If you would argue that you need to point out the silliness of the
standard to the standards commission by literally implementing it,
fine, you have my support.

	Roger.

#461049#80
Date:
2008-01-29 16:08:51 UTC
From:
To:
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote:

If you're interested in getting a feel for how the standards-writers
themselves interpret this, you're welcome to write to the
mailing list dedicated to this sort of thing:

  austin-group-l@opengroup.org

#461049#85
Date:
2008-01-29 16:35:27 UTC
From:
To:
Hi,

In the past, the "dd" program would report the number of records
copied when it stopped. dd stopping can of course be caused by
several things: Running out of input, input I/O error, output
I/O error, or the case of interest here, the output pipe getting
closed.

Someone has quoted the standard to say:

    ASYNCHRONOUS EVENTS

        For SIGINT, the dd utility shall interrupt its current processing,
        write status information to standard error, and exit as though
        terminated by SIGINT. It shall take the standard action for all
        other signals; see the ASYNCHRONOUS EVENTS section in Section 1.4
        (on page 2280).

and as dd receives a "SIGPIPE" when an output pipe is closed,
this was interpreted as: in that case dd is NOT ALLOWED to print
any statistics output.

In my case, I used the "dd" program as an intermediary in a pipe, and the
statistics output was used to get a rough (how much data
was already in the pipe buffer is unknown) estimate of the amount
of data copied. The new "claimed-to-be-compliant" dd no longer outputs
the amount of data copied, resulting in failure of the script.

This is of course the opposite of the intent of a standard....


My questions are these:

Is the standard indeed intended to "silence" dd in case of the output
being a pipe, and the reading end of the pipe being closed?

If so, what arguments are used to support such a decision? It is imho
very odd that dd would report the amount of data copied for almost all
situations except when the output happens to be a pipe, and the
reason for stopping happens to be the output being closed.

Is there any historical "dd" that would behave like this, where the
standard would describe its behaviour, to standardise this behaviour
for compatibility reasons?

Best regards,

	Roger Wolff.

P.S. It's the FSF coreutils maintainers that apparently interpret
the standard this way, and modified their version of dd to no longer
report the statistics in this case. (claiming standards-compliance
as the reason for the change) This caused a script I wrote
to fail.  I am not interested in getting my script to run again.
It already does: I wrote a simple subset of "dd" to simply report
the number of blocks copied, even in the case of the output
pipe being closed. I'm interested in this subject to make the
standard "work as intended": to increase interoperability. My,
and other people's scripts should keep on working across platforms
and versions, instead of requiring manual fixups after each update.

#461049#90
Date:
2008-01-29 16:41:34 UTC
From:
To:
After having read through things again, instead of saying the above I
would now have used this test case to reproduce the issue.  This is
a clearer test for it.

  $ dd if=/dev/zero | head --bytes 1024 > /dev/null
  129+0 records in
  128+0 records out
  65536 bytes transferred in 0.037618 seconds (1742140 bytes/sec)

Older versions of dd produced the above output.  Starting with version
5.90 dd no longer outputs those statistics on SIGPIPE.

Bob

#461049#95
Date:
2008-02-01 07:02:20 UTC
From:
To:
Following up on Debian bug 461049
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461049>,
Nick Stoughton <nick@usenix.org> writes:

It certainly does.  Wow.  That "perform some additional processing"
loophole is big enough to drive a truck through, though; as worded it
would let dd (say) execute "rm -fr $HOME" on receipt of SIGPIPE.

Surely there was intended to be _some_ limit on the "additional
processing" that utilities can do when they receive a random signal.
I would think that the intent was that this "additional processing" be
limited to cleanup actions (e.g., remove a temp file, or perhaps
restore the terminal state).  Printing statistics goes a bit beyond
that, and one could easily argue that it goes beyond what the standard
was intended to allow.

In 2005 I submitted the patch to coreutils dd to make it treat SIGPIPE
like all other known dd implementations do.  This was partly motivated
by my interpretation of POSIX, but it was also partly because I
couldn't see a good reason why coreutils dd would be incompatible with
all other dd implementations I knew of.

There is a similar issue with SIGQUIT, by the way.  Pre-2005 coreutils
'dd' treated SIGQUIT like SIGPIPE: that is, it printed statistics
before killing itself with SIGQUIT.  I don't view this as being
standard behavior either.

#461049#102
Date:
2010-07-29 17:41:50 UTC
From:
To:
Hi all, today I tried some stuff with dd.

I did:

<------output-------------->
dd if=/dev/sdb bs=1K count=1 | hexdump -C
1+0 records in
1+0 records out
00000000  eb 63 90 d0 bc 00 7c 8e  c0 8e d8 be 00 7c bf 00
|.c....|......|..|
1024 bytes (1.0 kB) copied00000010  06 b9 00 02 fc f3 a4 50  68 1c 06 cb
fb b9 04 00  |.......Ph.......|
, 1.8379e-05 s, 55.7 MB/s
00000020  bd be 07 80 7e 00 00 7c  0b 0f 85 10 01 83 c5 10
|....~..|........|
00000030  e2 f1 cd 18 88 56 00 55  c6 46 11 05 c6 46 03 02
|.....V.U.F...F..|
00000040  ff 00 00 20 01 00 00 00  00 02 fa 90 90 f6 c2 80  |...
............|
<------output-------------->

Can you see the mixed output from dd to the console?

It mixed the output "1024 bytes (1.0 kB) copied" into the output from
the hexdump. I don't think this is intentional.
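
The interleaving above is consistent with dd writing its statistics to
standard error while hexdump writes to standard output, with both
streams ending up on the same terminal. One way to keep the two apart is
to capture the streams separately. A minimal Python sketch, in which a
small child process stands in for dd (the CHILD script and
run_separately are hypothetical, not part of coreutils):

```python
import subprocess
import sys

# Stand-in for dd: payload on stdout, statistics on stderr.
CHILD = r"""
import sys
sys.stdout.write("data line\n")            # payload, like the copied bytes
sys.stderr.write("1+0 records in\n")       # statistics, like dd's report
sys.stderr.write("1+0 records out\n")
"""


def run_separately():
    """Run the child and return (stdout_text, stderr_text) separately."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    payload, stats = run_separately()
    print("payload:", payload, end="")
    print("stats:", stats, end="")
```

In a shell, the equivalent is to redirect standard error of the first
command to a file (2>somefile) so that only the pipeline's standard
output reaches the terminal.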

I attached the first few lines, which show the behaviour better than
this text, as a separate text file.

Best Regards

Georg
Debian Release: squeeze/sid
   500 testing         security.debian.org
   500 testing         ftp.de.debian.org
   500 testing         debian-multimedia.informatik.uni-erlangen.de
   500 squeeze         www.lamaresh.net
--- Package information. ---
Package's Depends field is empty.

Package's Recommends field is empty.

Package's Suggests field is empty.