#607267 /usr/bin/scp: fails to notice close() errors

Package:
openssh-client
Source:
openssh
Description:
secure shell (SSH) client, for secure access to remote machines
Submitter:
Michal Suchanek
Date:
2011-12-09 09:36:34 UTC
Severity:
important
#607267#5
Date:
2010-12-16 12:40:30 UTC
From:
To:
scp fails to notice close() errors.

To reproduce:

- mount a 10M tmpfs

- export it through samba

- mount the share through cifs

- take a 12M file (eg dd from /dev/zero)

When you copy the file to the samba share no error is reported.

The tail of the file is silently lost.

$ scp /scratch/junk . ; echo $?
0
$ rm junk
$ cat /scratch/junk |  gzip -c > junk

gzip: stdout: No space left on device
$ rm junk
$ cat /scratch/junk |  gzip -c | split -d -a 3 -b 6M - junk.gz. ; echo $?
split: junk.gz.001: No space left on device
1

#607267#10
Date:
2010-12-25 21:00:11 UTC
From:
To:
user release.debian.org@packages.debian.org
usertag 607267 squeeze-can-defer
tag 607267 squeeze-ignore
kthxbye
I'm not sure this is grave, but in any case this won't block the
release, tagging accordingly.

Cheers,
Julien

#607267#17
Date:
2011-01-02 14:48:05 UTC
From:
To:
Hello

Excerpts from Julien Cristau's message of Sat Dec 25 22:00:11 +0100 2010:

As data loss is defined as grave I would expect this is the case.

I tried on the 2.6.36 release kernel (not an rc) and the issue persists.

I also tried to open and write a file in a scripting shell and found
that the error is only reported on fsync().
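
A minimal sketch of that kind of experiment, in Python rather than the
scripting shell used above (the function name probe_write and its return
format are hypothetical, not taken from this report). It performs
write(), fsync() and close() separately and records which step, if any,
reports an error:

```python
import os


def probe_write(path, data):
    """Write data to path and record which syscall reports an error.

    Returns a dict mapping "write", "fsync" and "close" to None on
    success or to the OSError raised by that step.
    """
    results = {"write": None, "fsync": None, "close": None}
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)

    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            view = view[os.write(fd, view):]
    except OSError as exc:
        results["write"] = exc

    try:
        # Ask the kernel to push the file's data to the backing store.
        os.fsync(fd)
    except OSError as exc:
        results["fsync"] = exc

    try:
        os.close(fd)
    except OSError as exc:
        results["close"] = exc

    return results
```

Which step reports the failure depends on the filesystem and the kernel
version; on a filesystem with enough free space all three entries stay
None.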

Not sure if this is compliant with any standard, but the Linux close(2) man
page clearly states:

A successful close does not guarantee that the data has been
successfully saved to disk, as the kernel defers writes. It is not
common for a file system to flush the buffers when the stream is closed.
If you need to be sure that the data is physically stored use fsync(2).
(It will depend on the disk hardware at this point.)

This suggests that the reason the error is not recognized is due to scp
not doing fsync() which is the only guaranteed way to ensure that the
data is ever written to the file.

This condition may occur when you run out of disk space (which is
hopefully rare, but all the more surprising when it happens) and possibly
when you run out of disk quota.

This same issue also happens with cp(1) from coreutils.

Thanks

Michal

#607267#22
Date:
2011-12-03 16:33:04 UTC
From:
To:
Hi,

I verified that this statement is wrong.

1) coreutils actually checks the return value of close, as can be seen in
copy.c. It has precisely two calls to close and both are checked.

2) I created a simple LD_PRELOAD library to make close fail (attached).
Running cp foo bar with this library preloaded (i.e. failing any close
with EIO) I get the following output and return value 1:

cp: closing `foo': Input/output error

This is correct behaviour. Since scp invokes cp for local copies, the
originally submitted testcase

Michal Suchanek wrote earlier:

is wrong as well. This command would output a similar error message to
the one above. However this does not invalidate the bug immediately. To
be sure, remote transfers need to be checked as well.
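
The attached LD_PRELOAD library is not reproduced in this thread. As a
rough stand-in, the same kind of check can be sketched in Python by
injecting a close function that always fails with EIO. The names
failing_close and write_file are hypothetical, and this only illustrates
the idea of a tool that checks close(); it is not how cp or scp are
implemented:

```python
import errno
import os
import sys


def failing_close(fd):
    """Close fd for real, then report EIO to simulate a failing close()."""
    os.close(fd)
    raise OSError(errno.EIO, os.strerror(errno.EIO))


def write_file(path, data, close_fn=os.close):
    """Write data to path; return 0 on success, 1 on any error.

    close_fn is injectable so that a failing close can be simulated.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # Handle short writes by looping over the remaining bytes.
            view = view[os.write(fd, view):]
    except OSError as exc:
        os.close(fd)
        print(f"writing {path!r}: {exc.strerror}", file=sys.stderr)
        return 1

    try:
        close_fn(fd)
    except OSError as exc:
        # A tool that checks close() reports the failure and exits non-zero.
        print(f"closing {path!r}: {exc.strerror}", file=sys.stderr)
        return 1
    return 0
```

Calling write_file(path, data, close_fn=failing_close) returns 1 and
prints a diagnostic, while the default close_fn returns 0 on a healthy
filesystem.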

A quick grep of the openssh source indicates that checking close is
overrated. Who needs errors anyway? I want that it works!!1!eleven

Helmut

#607267#27
Date:
2011-12-04 23:41:21 UTC
From:
To:
Excerpts from Helmut Grohne's message of Sat Dec 03 17:33:04 +0100 2011:

It is not.

Please read the analysis in the latter message.

And for it to work it needs to report errors when they happen.

Thanks

Michal

#607267#32
Date:
2011-12-05 14:54:19 UTC
From:
To:
Hi Michal,

This probably depends on the point of view. From the current context it
is not clear what "this issue" is.

I think we should split this up in two issues:

1) Not checking the return value of close().

This is a very real bug in openssh, but not in coreutils (see my
analysis).

2) Not fsyncing the files before closing them.

It is not the job of cp nor scp to guarantee that any file has reached
the disk. So this "bug" will not be fixed. If it was their job, tools
like sync(1) would not exist in the first place. If it was, you could
file this bug report against every single package handling files in the
archive (except for a handful). Since that would be insane, I simply
dropped this request in my previous reply. I should have made this more
explicit.

Can we now ignore 2) and concentrate on 1)?

I guess I should have used explicit irony tags here.

Unfortunately fixing ssh will not be a small patch. It might be best to
simply document the issue in man 1 scp.

Helmut

#607267#37
Date:
2011-12-05 16:03:25 UTC
From:
To:
Excerpts from Helmut Grohne's message of Mon Dec 05 15:54:19 +0100 2011:

No. If I wanted these semantics I could use shred(1).

I want my files saved.

Note that this same issue has been found and fixed in dpkg.

Thanks

Michal

#607267#42
Date:
2011-12-05 17:15:53 UTC
From:
To:
Hi Michal,

Please report a separate bug about not using fsync then. (or clone this
bug) They can be fixed independently.

Honestly. You seem to have a rather different view on this issue than
the rest of the world (otherwise it would have been reported way
earlier).

As for me I really do *not* want fsync here. For instance when I use a
slow usb storage device, I really want cp to finish before the copied
stuff is written to permanent storage.

So even I would assume that the fsync issue is just wontfix (even though
I am not the openssh maintainer).

This is good and all. But in general tools simply do not provide fsync
semantics and people actually expect that now. When firefox tried to
introduce fsync, it was disabled for performance reasons again. If you
need it, go sync after you copy. (Or write patches.)
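
A minimal sketch of "sync after you copy", assuming Python's standard
shutil.copyfile and os.sync (the wrapper name copy_then_sync is
hypothetical):

```python
import os
import shutil


def copy_then_sync(src, dst):
    """Copy src to dst, then ask the kernel to flush all dirty buffers."""
    # copyfile raises OSError if reading or writing fails.
    shutil.copyfile(src, dst)
    # os.sync() wraps sync(2); it flushes every filesystem, not just dst,
    # and returns nothing, so it cannot itself report a write error.
    os.sync()
```

Because sync(2) has no error return, this gives the flush but not the
per-file error reporting that fsync() provides.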

Helmut

#607267#47
Date:
2011-12-05 18:57:59 UTC
From:
To:
Excerpts from Helmut Grohne's message of Mon Dec 05 18:15:53 +0100 2011:

I would guess that most of "the rest of the world" is not aware of this
issue.

It's probably not been like that since day 1 of Linux. I suspect that at
some point Linux implemented some optimization that allows close() to
finish before the space for the written data is even allocated, and
other optimizations like that, which lead to these issues.

Why? Either you want the data on the storage so you eject it and wait
for eject to finish anyway or you are doing something else in the
meantime so you can do so regardless of cp finishing or not.

No, it is not good at all that fsync is not used in things like cp.

It was not used in dpkg, which led to severe system corruption. That is
now fixed, but any other tool used to manipulate files should do the
same. Otherwise they are useless.

Thanks

Michal

#607267#52
Date:
2011-12-05 19:39:42 UTC
From:
To:
Please kindly take me off your nonsense rants.  And better yet, take
them off the debian bug tracking system.

TIA,
Julien

#607267#57
Date:
2011-12-08 13:33:00 UTC
From:
To:
FWIW this is unreproducible as of kernel 3.2 rc2 so I guess this is
squeeze only for cifs shares (as can be verified by running the test on
a squeeze live CD).

I don't know whether other filesystems might produce the same behaviour
under some circumstances with a current kernel.

However, the close() man page clearly states that only by calling fsync()
can you be sure your file was closed successfully.

Thanks

Michal

#607267#62
Date:
2011-12-09 09:32:49 UTC
From:
To:
severity 607267 important
thanks

Thanks for reporting. Given the small number of systems able to
reproduce this issue I am downgrading the severity. You need an old (or
stable) kernel and a filesystem like cifs. This doesn't seem to be a
frequently encountered combination.

This is slightly incorrect. If close() returns 0, the file is closed
successfully, so unfortunately your immediate conclusion is wrong again.
The truth is that close() does not ensure that any data has reached a
disk. Also note that using fsync() would not be enough either. It would
just force the file to disk, but not necessarily its entry in a
directory. After a power failure your file would be on the disk, but
there would be no visible reference from your directory tree. You'd have
to call fsync on the directory as well. After all it doesn't seem that
obvious given that you got it wrong. ;-)
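
The file-plus-directory sequence described above could look roughly like
this in Python (a sketch only; the function name write_durably is
hypothetical, and whether fsync on a directory behaves as intended
depends on the platform and filesystem):

```python
import os


def write_durably(path, data):
    """Write data to path, fsync the file, then fsync its directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # Loop to handle short writes.
            view = view[os.write(fd, view):]
        # Flush the file's data and metadata; raises OSError on failure.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the directory so the new entry itself is persisted as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Any OSError from write, fsync or close propagates to the caller, so a
full disk shows up as an exception rather than a silently truncated
file.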

I still wonder where it says that cp would guarantee that data would
reach a disk. Can you find a reference? Same question for scp.

Helmut