- Package: openssh-client
- Source: openssh
- Description: secure shell (SSH) client, for secure access to remote machines
- Submitter: Michal Suchanek
- Date: 2011-12-09 09:36:34 UTC
- Severity: important
scp fails to notice close() errors.

To reproduce:
- mount a 10M tmpfs
- export it through samba
- mount the share through cifs
- take a 12M file (eg dd from /dev/zero)

When you copy the file to the samba share no error is reported. The tail of the file is silently lost.

$ scp /scratch/junk . ; echo $?
0
$ rm junk
$ cat /scratch/junk | gzip -c > junk
gzip: stdout: No space left on device
$ rm junk
$ cat /scratch/junk | gzip -c | split -d -a 3 -b 6M - junk.gz. ; echo $?
split: junk.gz.001: No space left on device
1
user release.debian.org@packages.debian.org
usertag 607267 squeeze-can-defer
tag 607267 squeeze-ignore
kthxbye

I'm not sure this is grave, but in any case this won't block the release, tagging accordingly.

Cheers,
Julien
Hello

Excerpts from Julien Cristau's message of Sat Dec 25 22:00:11 +0100 2010:

As data loss is defined as grave I would expect this is the case.

I tried the 2.6.36 release kernel (not rc) and the issue remains. I also tried to open and write a file in a scripting shell and found that the error is only reported on fsync().

Not sure if this is compliant with anything, but the Linux close(2) man page clearly states:

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)

This suggests that the error goes unnoticed because scp does not call fsync(), which is the only guaranteed way to ensure that the data is ever written to the file.

This condition may occur when you run out of disk space (which is hopefully rare but all the more surprising) and possibly when you run out of disk quota.

This same issue also happens with cp(1) from coreutils.

Thanks

Michal
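As a rough illustration of the observation above, that a deferred write error may only surface at fsync() or close(), here is a minimal Python sketch. The name `write_checked` and its `sync` flag are made up for this note. It is not code from scp or coreutils, only an outline of what checking every step looks like.

```python
import os


def write_checked(path, data, sync=True):
    """Write data to path and let errors from write, fsync and close surface.

    Hypothetical helper for illustration; raises OSError on any failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        if sync:
            # Errors the kernel deferred (for example ENOSPC) can show up here.
            os.fsync(fd)
    except OSError:
        # Release the descriptor, then re-raise the original error.
        try:
            os.close(fd)
        except OSError:
            pass
        raise
    # close() can fail as well; os.close raises OSError in that case.
    os.close(fd)
    return len(data)
```

With `sync=False` the function only reports what write() and close() report, which on some filesystems may not include a late out-of-space condition.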
Hi,

I verified that this statement is wrong.

1) The coreutils actually check the return value of close(), which can be seen in copy.c. It has precisely two calls to close() and both are checked.
2) I created a simple LD_PRELOAD library to make close() fail (attached). Running cp foo bar with this library preloaded (i.e. failing any close() with EIO) I get the following output and return value 1:

cp: closing `foo': Input/output error

This is correct behaviour. Since scp invokes cp for local copies, the testcase Michal Suchanek originally submitted is wrong as well. This command would output a similar error message to the one above.

However this does not invalidate the bug immediately. To be sure, remote transfers need to be checked as well. A quick grep of the openssh source indicates that checking close is overrated. Who needs errors anyway? I want that it works!!1!eleven

Helmut
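The library attached to this message is not reproduced in this log. As a loose analogue of that fault-injection test, the Python sketch below lets a failing close() be swapped in and shows a copy routine that turns it into a non-zero status, which is the behaviour reported for cp above. The names `copy_file` and `failing_close` are hypothetical; this is not the attached LD_PRELOAD code and not the coreutils implementation.

```python
import errno
import os
import sys


def copy_file(src, dst, close=os.close):
    """Copy src to dst; return 0 on success, 1 if copying or closing fails.

    `close` is injectable so that a failing close() can be simulated.
    Errors from opening either file propagate as OSError.
    """
    status = 0
    in_fd = os.open(src, os.O_RDONLY)
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            chunk = os.read(in_fd, 65536)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:
                # Handle short writes.
                view = view[os.write(out_fd, view):]
    except OSError as exc:
        print(f"copy: writing '{dst}': {exc.strerror}", file=sys.stderr)
        status = 1
    for fd, name in ((in_fd, src), (out_fd, dst)):
        try:
            close(fd)
        except OSError as exc:
            # A failed close is reported and reflected in the exit status.
            print(f"copy: closing '{name}': {exc.strerror}", file=sys.stderr)
            status = 1
    return status


def failing_close(fd):
    """Stand-in for a close() that fails with EIO (assumed test helper)."""
    os.close(fd)  # still release the descriptor
    raise OSError(errno.EIO, os.strerror(errno.EIO))
```

Calling `copy_file(src, dst, close=failing_close)` returns 1 and prints a "closing" message, while the default call returns 0.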
Excerpts from Helmut Grohne's message of Sat Dec 03 17:33:04 +0100 2011: It is not. Please read the analysis in the latter message. And for it to work it needs to report errors when they happen. Thanks Michal
Hi Michal,

This probably depends on the point of view. From the current context it is not clear what "this issue" is. I think we should split this up into two issues:

1) Not checking the return value of close(). This is a very real bug in openssh, but not in coreutils (see my analysis).
2) Not fsyncing the files before closing them. It is not the job of cp nor scp to guarantee that any file has reached the disk. So this "bug" will not be fixed. If it was their job, tools like sync(1) would not exist in the first place. If it was, you could file this bug report against every single package handling files in the archive (except for a handful). Since that would be insane, I simply dropped this request in my previous reply. I should have made this more explicit.

Can we now ignore 2) and concentrate on 1)?

I guess I should have used explicit irony tags here. Unfortunately fixing ssh will not be a small patch. It might be best to simply document the issue in man 1 scp.

Helmut
Excerpts from Helmut Grohne's message of Mon Dec 05 15:54:19 +0100 2011:

No. If I wanted these semantics I could use shred(1). I want my files saved.

Note that this same issue has been found and fixed in dpkg.

Thanks

Michal
Hi Michal,

Please report a separate bug about not using fsync then (or clone this bug). They can be fixed independently.

Honestly. You seem to have a rather different view on this issue than the rest of the world (otherwise it would have been reported way earlier). As for me I really do *not* want fsync here. For instance when I use a slow USB storage device, I really want cp to finish before the copied stuff is written to permanent storage. So even I would assume that the fsync issue is just wontfix (even though I am not the openssh maintainer).

This is good and all. But in general tools simply do not provide fsync semantics and people actually expect that now. When Firefox tried to introduce fsync, it was disabled for performance reasons again. If you need it, go sync after you copy. (Or write patches.)

Helmut
Excerpts from Helmut Grohne's message of Mon Dec 05 18:15:53 +0100 2011:

I would guess that most of "the rest of the world" is not aware of this issue. It's probably not been like that since day 1 of Linux. I suspect that at some point Linux implemented some optimization that allows close() to finish before the space for the written data is even allocated, and other things like that, which lead to these issues.

Why? Either you want the data on the storage, so you eject it and wait for eject to finish anyway, or you are doing something else in the meantime, so you can do so regardless of cp finishing or not.

No, it is not good at all that fsync is not used in things like cp. It was not used in dpkg, which led to severe system corruption. That is now fixed, but any other tools used to manipulate files should do the same. Otherwise they are useless.

Thanks

Michal
Please kindly take me off your nonsense rants. And better yet, take them off the debian bug tracking system. TIA, Julien
FWIW this is unreproducible as of kernel 3.2 rc2, so I guess this is squeeze only for cifs shares (as can be verified by running the test on a squeeze live CD). I don't know whether other filesystems might produce the same behaviour under some circumstances in the current kernel.

However, the close() man page clearly states that only by doing fsync() can you be sure your file was closed successfully.

Thanks

Michal
severity 607267 important
thanks

Thanks for reporting. Given the small number of systems able to reproduce this issue I am downgrading the severity. You need an old (or stable) kernel and a filesystem like cifs. This doesn't seem to be a frequently encountered combination.

This is slightly incorrect. If close() returns 0, the file is closed successfully, so unfortunately your immediate conclusion is wrong again. The truth is that close() does not ensure that any data has reached a disk. Also note that using fsync() would not be enough either. It would just force the file to disk, but not necessarily its entry in a directory. After a power failure your file would be on the disk, but there would be no visible reference from your directory tree. You'd have to call fsync on the directory as well.

After all it doesn't seem that obvious given that you got it wrong. ;-)

I still wonder where it says that cp would guarantee that data would reach a disk. Can you find a reference? Same question for scp.

Helmut
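The point about also calling fsync on the directory can be sketched roughly as follows. This is a minimal Python outline for a Linux-like system, with a made-up helper name `write_durably`. It is not something cp or scp does, and whether they should is exactly what is disputed in this thread.

```python
import os


def write_durably(path, data):
    """Write data, fsync the file, then fsync its parent directory.

    Hypothetical helper for illustration; raises OSError on any failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # Handle short writes.
            view = view[os.write(fd, view):]
        # Ask the kernel to push the file's data to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)

    # The directory entry is separate state: fsync the parent directory too,
    # so the new name is not lost if the machine loses power.
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The two fsync calls are what make this slower than a plain copy, which is the trade-off raised earlier for slow USB storage.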