I haven't replied yet because I'm not sure what to make of it. It definitely does behave this way. Anyone else know if this was an intentional change or a bug? I don't see anything obvious in the changelog, and my reading of POSIX suggests that it's a bug; am I missing something?

Mike Stone
Michael Stone wrote:

I can't reproduce this behavior. This is what I see:

$ dd if=/dev/zero | dd count=100 of=/dev/null
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.00245 seconds, 20.9 MB/s
$ dd --version
dd (coreutils) 5.97
$
$ uname -rm
2.6.18-5-amd64 x86_64
$ uname -rm
2.6.22-3-686 i686

Bob
Yes, you /can/ reproduce the bad behaviour. Your output shows the
bad behaviour. :-)
Good behaviour:
driepoot:~> dd if=/dev/zero | dd count=100 of=/dev/null
100+0 records in
100+0 records out
51200 bytes transferred in 0.000553 seconds (92582676 bytes/sec)
129+0 records in
128+0 records out
65536 bytes transferred in 0.002203 seconds (29747044 bytes/sec)
driepoot:~> dd --version
dd (coreutils) 5.2.1
[...]
driepoot:~>
There are two "dd" processes. The second one will copy over exactly
100 blocks, and then exit. It should report 100 blocks in, 100
blocks out. This works.
The other dd has an unlimited amount of input data (/dev/zero), and
its output pipe is closed after 100 blocks of data are consumed. The
"pipe" can hold a little more data, so actually it will have read
and subsequently written more than 100 blocks (in my case 128).
So, when 100 blocks were written and consumed by the other side of
the pipe, 28 more blocks were in the pipe, when the system noticed
that the other side of the pipe was closed, and returned EPIPE
to the first dd.
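
As a rough illustration of the buffering described above, the following Python sketch counts how many 512-byte blocks a pipe accepts when nothing reads from it. The helper name blocks_buffered_in_pipe is made up for this example, and the result depends on the system's pipe capacity, so any particular number is system-specific.

```python
import os

BLOCK = 512  # dd's default block size


def blocks_buffered_in_pipe():
    """Count how many 512-byte blocks a pipe accepts with no reader consuming."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # a full pipe raises instead of hanging
    written = 0
    try:
        while True:
            os.write(write_fd, b"\0" * BLOCK)
            written += 1
    except BlockingIOError:
        pass  # the pipe buffer is full
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print(f"{blocks_buffered_in_pipe()} blocks fit in the pipe buffer")
```

On a system whose pipes hold 64 KiB this would come to 128 blocks, which would line up with the 65536 bytes in the output above, but that capacity is an assumption about the system and not something the sketch guarantees.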
Let me reiterate: It is the first "dd" that is misbehaving. When it
receives a write error and SIGPIPE, it simply exits instead of
reporting the stats.
Using strace on the first dd shows (good behaviour on an older
system!):
[...]
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
close(0) = 0
close(1) = 0
write(2, "101+0 records in\n", 17101+0 records in
) = 17
write(2, "100+0 records out\n", 18100+0 records out
) = 18
gettimeofday({1201594453, 321231}, NULL) = 0
write(2, "51200 bytes transferred in 0.025"..., 6451200 bytes transferred in 0.025491 seconds (2008543 bytes/sec)
) = 64
gettid() = 13885
tgkill(13885, 13885, SIGPIPE) = 0
sigreturn() = ? (mask now [])
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++
driepoot:~>
or (bad behaviour on an upgraded system):
[...]
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++
Process 11355 detached
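
In the good-behaviour trace, the process sees the broken pipe, resets SIGPIPE to its default action, writes the statistics, and then sends SIGPIPE to itself so that it still dies "by SIGPIPE". The Python sketch below gets the same observable result by a simpler route (ignore SIGPIPE and react to the EPIPE error), which is an assumption of this example and not what dd itself did. The record counts in the message are placeholders, and run_child is a made-up helper name.

```python
import signal
import subprocess
import sys

# Program run in a child process so the parent can inspect how it died.
CHILD = r'''
import os, signal, sys
signal.signal(signal.SIGPIPE, signal.SIG_IGN)      # make write() fail with EPIPE
read_fd, write_fd = os.pipe()
os.close(read_fd)                                  # no reader: next write fails
try:
    os.write(write_fd, b"\0" * 512)
except BrokenPipeError:
    sys.stderr.write("1+0 records in\n0+0 records out\n")  # placeholder stats
    sys.stderr.flush()
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # restore the default action
    os.kill(os.getpid(), signal.SIGPIPE)           # terminate via SIGPIPE
'''


def run_child():
    """Run the child and return (returncode, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    returncode, stats = run_child()
    print(f"child return code: {returncode}")
    print(stats, end="")
```

A negative return code from subprocess means the child was terminated by that signal number, so the parent sees both the statistics and the SIGPIPE death.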
Roger.
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote:
...
Thanks for the report, but that behavior is required by POSIX.
dd must handle SIGINT the way you want, but dd may not handle
SIGPIPE that way:
ASYNCHRONOUS EVENTS
For SIGINT, the dd utility shall interrupt its current processing,
write status information to standard error, and exit as though
terminated by SIGINT. It shall take the standard action for all
other signals; see the ASYNCHRONOUS EVENTS section in Section 1.4
(on page 2280).
I figured there'd be some piece of POSIX at the bottom of it. :) I wonder if the documentation should better reflect that. (The info page says only that "when dd completes it outputs the final statistics"; maybe something like "when dd completes normally or is killed by SIGINT it outputs the final statistics"?)

Mike Stone
Please interpret it the way dd is supposed to work:
block the "SIGPIPE" signal, and write will then return:
EPIPE fd is connected to a pipe or socket whose reading end is
closed. When this happens the writing process will also
receive a SIGPIPE signal. (Thus, the write return
value is seen only if the program catches, blocks or
ignores this signal.)
EPIPE is then a normal error code, indicating that writing has
stopped and that stats should be printed.
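
A minimal Python sketch of the approach described here, assuming SIGPIPE is ignored so that the failed write surfaces as an ordinary EPIPE error. copy_blocks and demo are hypothetical helpers, not dd's actual code, and partial reads and writes are ignored for brevity.

```python
import errno
import os
import signal


def copy_blocks(src_fd, dst_fd, block_size=512):
    """Copy blocks until EOF or a write error.

    Returns (records_in, records_out, error_name); error_name is None on EOF.
    """
    records_in = records_out = 0
    while True:
        block = os.read(src_fd, block_size)
        if not block:  # end of input
            return records_in, records_out, None
        records_in += 1
        try:
            os.write(dst_fd, block)
        except OSError as exc:  # any write error ends the copy, EPIPE included
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return records_in, records_out, name
        records_out += 1


def demo():
    # With SIGPIPE ignored, write() fails with EPIPE instead of killing us.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    src_r, src_w = os.pipe()
    os.write(src_w, b"\0" * 1024)  # two blocks of input
    os.close(src_w)
    dst_r, dst_w = os.pipe()
    os.close(dst_r)  # the reader is gone before we write
    try:
        return copy_blocks(src_r, dst_w)
    finally:
        os.close(src_r)
        os.close(dst_w)


if __name__ == "__main__":
    records_in, records_out, err = demo()
    print(f"{records_in}+0 records in")
    print(f"{records_out}+0 records out")
    print(f"stopped by: {err}")
```

Because the error is returned to the caller like any other write failure, the caller is still alive to print its counts.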
What is dd's "purpose in life"? Device-to-device copy, as its name
suggests? It turns out modern use is often no longer device-to-device.
It copies files, and counts and reports the number of blocks copied.
This feature is used by some. If you go and change it, things will stop
working.
If you start "waving standards around", I'll bounce it back to
you. SIGPIPE is NOT an asynchronous event, as documented in the quoted
documentation for write: it happens synchronously with the write,
normally before the return of "EPIPE".
Roger.
Michael Stone <mstone@debian.org> wrote:
I like it.
I'll do this one, but in general, it's much easier for me if such a
suggestion comes in the form of a patch created by e.g.,
git format-patch --signoff HEAD~1
Then I can simply apply it and give proper credit with the
very convenient "git am PATCH".
Here's what I'll push:
------------------
Subject: [PATCH] Improve the description of when dd outputs its final statistics.
* doc/coreutils.texi (dd invocation): Say that dd prints stats
upon normal termination and upon SIGINT.
Signed-off-by: Jim Meyering <meyering@redhat.com>
---
ChangeLog | 6 ++++++
doc/coreutils.texi | 3 ++-
2 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 148b7d7..8415d3d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2008-01-29 Michael Stone <mstone@debian.org>
+
+ Improve the description of when dd outputs its final statistics.
+ * doc/coreutils.texi (dd invocation): Say that dd prints stats
+ upon normal termination and upon SIGINT.
+
2008-01-29 Jim Meyering <meyering@redhat.com>
Avoid "make distcheck" failure: newly-created man/*.1 files not removed
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index c1300fb..fb9d5fc 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -7559,7 +7559,8 @@ process makes it print I/O statistics to standard error
and then resume copying. In the example below,
@command{dd} is run in the background to copy 10 million blocks.
The @command{kill} command makes it output intermediate I/O statistics,
-and when @command{dd} completes, it outputs the final statistics.
+and when @command{dd} completes normally or is killed by the
+@code{SIGINT} signal, it outputs the final statistics.
@example
$ dd if=/dev/zero of=/dev/null count=10MB & pid=$!
--
1.5.4.rc5.1.ge6bfe
size of the data copied, which helps me verify things went as planned), I want to see 1440 blocks copied. So when I have an image of which the first 1440 kbytes should be copied to a floppy, I want

dd if=my.img of=/dev/fd0 bs=1k

to report

1440+0 records out

This is my verification that all data got copied as intended. If my image is larger than 1440 kbytes, I expect to see

1441+0 records in

and dd will experience an "ENOSPC" on the 1441st write.

AGAIN: dd is used to copy data, and to provide feedback on how much data it copied. It is inconsistent to omit the feedback in some cases when the copying stops. Copying can stop for several reasons. Two I can think of are "no more input" and "output no longer possible". "ENOSPC" on a device or a file (disk full) is quite similar to EPIPE on the output, and should be handled similarly.

Roger.
Which is not correct, as the stats are also printed upon write error, like ENOSPC (as we all agree the standards require).

Roger.
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote:
You were reporting that this didn't work the way
you expected:
The real solution is just "don't do that" (i.e., don't use the
unnecessary pipe). Do this instead:
$ dd if=/dev/zero count=100 of=/dev/null
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote: It depends on your definition of "normally". If I meant what you seem to think I meant, I would have said "...completes *successfully*". However, improvements are always welcome.
As reported in the original bug report:

"Of course, the two dd's are unnecessary, and this could be done with one dd. In practice, the consumer (second dd) is another program that exits when it's had enough data. The output of (the first) dd is then used to extract the approximate amount of data copied."

The script that stopped working after a system upgrade of course uses a different program, which I'm not sure you have. So I decided to provide an example for which I didn't have to provide any data files or other "unusual" programs. This is a simple test case, which regrettably uses a second "dd" to limit the amount of data that can be copied.

This results in two different possibilities of miscommunication. First, someone reports seeing "100+0 records copied" from the second dd, which is of course correct, but the bug lies in the first dd no longer reporting the amount of data copied. The second is your interpretation that one dd would be unnecessary, as it could be done with just one.

Roger.
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote: ... You're beating a dead horse already. You'll need to come up with much better arguments (that probably do not exist) to make me change dd to be non-compliant in this respect.
In my interpretation of the quoted part of the standard, dd has become non-compliant by not reporting the statistics when its output happens to be connected to a pipe. I will agree that my interpretation is a bit convoluted. But it's a valid interpretation, and it makes dd do the "sensible thing". It currently implements a "weird" exception, fuelled by a literal interpretation of unintentional wording in a standard.

I'm all in favor of standards. However, blindly implementing an unintentional consequence of a standard is not the way to improve interoperability. The INTENT of that part of the standard was to say that dd implementations need not catch-and-handle all possible signals.

If any other dd can be found that behaves like gnu-dd does now, which warrants gnu-dd complying with an unintentional consequence of some interpretation of the standard, I'll concede. If any other user can be found who is surprised by dd outputting the stats after the output pipe is closed (but not when the disk is full or any other error), I'll concede.

The current situation is simply blindly mis-interpreting the intent of the standard, and then blindly implementing that. If you would argue that you need to point out the silliness of the standard to the standards commission by literally implementing it, fine, you have my support.

Roger.
Rogier Wolff <R.E.Wolff@BitWizard.nl> wrote: If you're interested in getting a feel for how the standards-writers themselves interpret this, you're welcome to write to the mailing list dedicated to this sort of thing: austin-group-l@opengroup.org
Hi,
In the past, the "dd" program would report the number of records
copied when it stopped. dd stopping can of course be caused by
several things: Running out of input, input IO error, output
io error, or the case of interest here, the output pipe getting
closed.
Someone has quoted the standard to say:
ASYNCHRONOUS EVENTS
For SIGINT, the dd utility shall interrupt its current processing,
write status information to standard error, and exit as though
terminated by SIGINT. It shall take the standard action for all
other signals; see the ASYNCHRONOUS EVENTS section in Section 1.4
(on page 2280).
and as dd receives a "SIGPIPE" when an output pipe is closed,
this was interpreted as: in that case dd is NOT ALLOWED to print
any statistics output.
In my case, I used the "dd" program as an intermediary in a pipe, and the
statistics output was used to get a rough (how much data
was already in the pipe buffer is unknown) estimate of the amount
of data copied. The new "claimed-to-be-compliant" dd no longer outputs
the amount of data copied, resulting in failure of the script.
This is of course the opposite of the intent of a standard....
My questions are these:
Is the standard indeed intended to "silence" dd in case of the output
being a pipe, and the reading end of the pipe being closed?
If so, what arguments are used to support such a decision? It is imho
very odd that dd would report the amount of data copied for almost all
situations except when the output happens to be a pipe, and the
reason for stopping happens to be the output being closed.
Is there any historical "dd" that would behave like this, where the
standard would describe its behaviour, to standardise this behaviour
for compatibility reasons?
Best regards,
Roger Wolff.
P.S. It's the FSF coreutils maintainers that apparently interpret
the standard this way, and modified their version of dd to no longer
report the statistics in this case. (claiming standards-compliance
as the reason for the change) This caused a script I wrote
to fail. I am not interested in getting my script to run again.
It already does: I wrote a simple subset of "dd" to simply report
the number of blocks copied, even in the case of the output
pipe being closed. I'm interested in this subject to make the
standard "work as intended": to increase interoperability. My,
and other people's scripts should keep on working across platforms
and versions, instead of requiring manual fixups after each update.
After having read through things again, instead of saying the above I would now have used this test case to reproduce the issue. This is a clearer test for it.

$ dd if=/dev/zero | head --bytes 1024 > /dev/null
129+0 records in
128+0 records out
65536 bytes transferred in 0.037618 seconds (1742140 bytes/sec)

Older versions of dd produced the above output. Starting with version 5.90 dd no longer outputs those statistics on SIGPIPE.

Bob
Following up on Debian bug 461049 <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461049>, Nick Stoughton <nick@usenix.org> writes:

It certainly does. Wow. That "perform some additional processing" loophole is big enough to drive a truck through, though; as worded it would let dd (say) execute "rm -fr $HOME" on receipt of SIGPIPE. Surely there was intended to be _some_ limit on the "additional processing" that utilities can do when they receive a random signal. I would think that the intent was that this "additional processing" be limited to cleanup actions (e.g., remove a temp file, or perhaps restore the terminal state). Printing statistics goes a bit beyond that, and one could easily argue that it goes beyond what the standard was intended to allow.

In 2005 I submitted the patch to coreutils dd to make it treat SIGPIPE like all other known dd implementations do. This was partly motivated by my interpretation of POSIX, but it was also partly because I couldn't see a good reason why coreutils dd would be incompatible with all other dd implementations I knew of.

There is a similar issue with SIGQUIT, by the way. Pre-2005 coreutils 'dd' treated SIGQUIT like SIGPIPE: that is, it printed statistics before killing itself with SIGQUIT. I don't view this as being standard behavior either.
Hi all,

today I tried some stuff with dd. I did:

<------output-------------->
dd if=/dev/sdb bs=1K count=1 | hexdump -C
1+0 records in
1+0 records out
00000000 eb 63 90 d0 bc 00 7c 8e c0 8e d8 be 00 7c bf 00 |.c....|......|..|
1024 bytes (1.0 kB) copied00000010 06 b9 00 02 fc f3 a4 50 68 1c 06 cb fb b9 04 00 |.......Ph.......|
, 1.8379e-05 s, 55.7 MB/s
00000020 bd be 07 80 7e 00 00 7c 0b 0f 85 10 01 83 c5 10 |....~..|........|
00000030 e2 f1 cd 18 88 56 00 55 c6 46 11 05 c6 46 03 02 |.....V.U.F...F..|
00000040 ff 00 00 20 01 00 00 00 00 02 fa 90 90 f6 c2 80 |... ............|
<------output-------------->

Can you see the mixed output from dd to the console? It mixed the output "1024 bytes (1.0 kB) copied" into the output from the hexdump. I don't think this is intentional. I attached the first few lines, which show the behaviour better than in this text, as a separate text file.

Best Regards
Georg

Debian Release: squeeze/sid
500 testing security.debian.org
500 testing ftp.de.debian.org
500 testing debian-multimedia.informatik.uni-erlangen.de
500 squeeze www.lamaresh.net

--- Package information. ---
Package's Depends field is empty.
Package's Recommends field is empty.
Package's Suggests field is empty.
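
The interleaving comes from two processes writing to the same terminal at once: dd reports its statistics on standard error while hexdump prints on standard output. A small Python sketch that keeps the two streams apart, assuming a dd binary on the PATH and using /dev/zero in place of /dev/sdb so it needs no special device; run_dd_separately is a made-up helper name.

```python
import os
import subprocess


def run_dd_separately():
    """Run dd and return (data, stats) with stdout and stderr kept apart."""
    env = dict(os.environ, LC_ALL="C")  # keep dd's messages untranslated
    proc = subprocess.run(
        ["dd", "if=/dev/zero", "bs=1024", "count=1"],
        capture_output=True,  # separate pipes for stdout and stderr
        env=env,
        check=True,
    )
    return proc.stdout, proc.stderr.decode()


if __name__ == "__main__":
    data, stats = run_dd_separately()
    print(f"{len(data)} bytes of data on stdout")
    print("statistics on stderr:")
    print(stats, end="")
```

With the streams captured separately, the data can be passed on to a dump tool and the statistics printed afterwards, so neither lands in the middle of the other.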