Dear Maintainer,
* What led up to the situation?
I believe the problem started with the latest update of the rsync package end
of 08/2020. A previously working setup of backuppc reproducibly fails for a
specific host and specific directory tree since then.
* What exactly did you do (or not do) that was effective (or
ineffective)?
I repeatedly restarted the backup from BackupPC, all failing.
I conducted a test rsynch over ssh directly without BackupPC, all failing.
I checked dmesg for indications of lower level problems, there are none.
* What was the outcome of this action?
1) With BackupPC
How are you invoking rsync? Please give the exact command line, including what you current working directory is. Are you actually using ssh to connect to localhost and transferring the data over that ssh link? If so, then these lines are very strange in the case (1) text: I would expect an rsync to negotiate the highest protocol version possible, so this looks like it's talking to a newer version than itself. Because of the "errors with program diagnostics" I would have expected more messages... It would be most helpful if you could follow the instructions on https://rsync.samba.org/issues.html especially de second half of question 3. Paul
Hi Paul, thanks for the fast answer! "Paul Slootman" paul@debian.org – 3. September 2020 16:27 [...] User is backuppc, working directory is /var/lib/backuppc. Command line: sh-5.0$ rsync -vvv --rsh=ssh --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size 48 \ --recursive --one-file-system root@localhost:/var/lib/lxc /mnt/testrsync/ >/mnt/testrsync/stdout.txt 2>/mnt/testrsync/stderr.txt In this case (2), I issued the command from within a screen session. Here, ssh is used internally by rsync (--rsh=ssh). The idea is to change privilege, as backuppc is allowed to execute rsync via authorized_keys. Also it is nice that the same mechanism works for all hosts to be backed up. In contrast to case (2), this case is running from within BackupPC. AFAIK, BackupPC_dump directly executes ssh rsync —server —sender … and implements the receiving side itself. So, I would conclude that BackupPC_dump implements protocol version 28. So did I :-) Thanks for the reference, I will try it later. Andreas.
Hi again,
So I did:
sh-5.0$ rsync -vvv --rsh=ssh --rsync-path=/usr/bin/rsync --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --one-file-system root@localhost:/var/lib/lxc /mnt/testrsync/ >/mnt/testrsync/stdout2.txt 2>/mnt/testrsync/stderr2.txt
sh-5.0$ echo $?
13
With the following results:
— begin stderr2.txt
rsync error: errors with program diagnostics (code 13) at log.c(245) [receiver=3.2.3]
rsync: [generator] write error: Broken pipe (32)
— end stderr2.txt
— tail of stdout2.txt
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sr.js,519849)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sr.json,519850)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sv.js,519851)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sv.json,519852)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/tr.js,519853)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/tr.json,519854)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/uz.js,519855)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/uz.json,519856)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_CN.js,519857)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_CN.json,519858)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_TW.js,519859)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_TW.json,519860)
[receiver] _exit_cleanup(code=13, file=log.c, line=245): about to call exit(13)
[generator] _exit_cleanup(code=13, file=io.c, line=1644): about to call exit(13)
— end of stdout2.txt
— tail of sender-side strace
244242 12:26:26 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999998})
244242 12:26:26 write(1, "\4\200\0\7\364\207\237\312\235\2176X\312\217m\t"..., 32776) = 32776
244242 12:26:26 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999999})
244242 12:26:26 write(1, "\4\200\0\7\370\37H\305,s\333G42,\261:\"..., 32776) = 24584
244242 12:26:26 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 2 (in [0], out [1], left {tv_sec=34, tv_usec=971695})
244242 12:26:51 read(0, "", 13405) = 0
244242 12:26:51 select(2, [], [1], [], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999995})
244242 12:26:51 write(1, "u\226A\256\337\231\325\366\242"..., 8192) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:51 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 write(2, "rsync: [sender] write error: Broken pipe (32)\n", 46) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:53 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 rt_sigaction(SIGUSR1, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 rt_sigaction(SIGUSR2, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 write(2, "rsync error: errors with program diagnostics (code 13) at log.c(245) [sender=3.2.3]\n", 84) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:53 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 rt_sigaction(SIGUSR1, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 rt_sigaction(SIGUSR2, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 write(2, "[sender] _exit_cleanup(code=13, file=log.c, line=245): about to call exit(13)\n", 78) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:53 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 rt_sigaction(SIGUSR1, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 rt_sigaction(SIGUSR2, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=100000000}, 0x7fff8399a960) = 0
244242 12:26:53 exit_group(13) = ?
244242 12:26:53 +++ exited with 13 +++
— end of strace
Note I drastically shortened the content of the write(1, …) commands in the same style they were shortened anyway; guess it is just file content.
I must admit that I’m none the wiser. The sending half obviously wanted to write exactly the same diagnostic as the receiving half did ("errors with program diagnostics (code 13)“) - but couldn’t because the communication pipe was already broken. It looks as if the sending side was unexpectedly struck from the receiving side closing the connection. Whereas the receiving side doesn’t seem to give any indication of what is going wrong apart from that somewhat generic message.
Hope this helps anyway!
Best regards,
Andreas.
Hi all, I also face this issue, and tried the last versions of rsync. The problem was introduced in rsync_3.2.3-1, downgrading to 3.2.2-2 works. Best regards, Andre
I've got a script that tries to rsync a file to an external harddisk. -rw------- 1 libvirt-qemu libvirt-qemu 107390828544 15. Apr 15:07 vm.qcow2 In essence, /usr/bin/time rsync -va --inplace --no-whole-file --block-size=65536 --progress --stats \ /var/lib/libvirt/images/vm.qcow2 /mnt/tmp3/vm.qcow2 That command takes quite a while (I guess checksumming the source file), then starts transmitting; sending incremental file list vm.qcow2 5,477,040,128 5% 237.65MB/s 0:06:58 At that position rsync reproducible stops to transfer data; there's no error message, the first rsync process just hangs with 100% CPU. strace of the 3 processes only shows that the other two are waiting for data on their socket; the "hanging" one is simply user-bound, /proc/<pid>/stack is empty. That happens with both 3.1.3-6 and 3.2.3-4. Next time I'll try to have debug symbols installed and look at the process via gdb - right now the file is identical, I copied it via "cp -a".
Hi, Could you also try it without the --block-size=65536 option? Paul
# gdb --args rsync -va --inplace --no-whole-file --progress --stats /var/lib/libvirt/images/win10.qcow2 win10.qcow2 went through without any problem. So either the "--blocksize" is the reason -- or some bit pattern in the previous version. Touching the file and re-trying with checksumming (-c) has nothing to copy and so did succeed; I'll try gdb when the file was changed.
This time I used "--block-size=65536" again, and it hang:
3841 pts/15 S 0:01 \_ -bash
92536 pts/15 R+ 2:36 \_ rsync -va --inplace --no-whole-file
--block-size=65536 --progress --stats
/var/lib/libvirt/images/win10.qcow2 win10.qcow2
92537 pts/15 S+ 1:28 \_ rsync -va --inplace
--no-whole-file --block-size=65536 --progress --stats
/var/lib/libvirt/images/win10.qcow2 win10.qcow2
92538 pts/15 S+ 0:08 \_ rsync -va --inplace
--no-whole-file --block-size=65536 --progress --stats
/var/lib/libvirt/images/win10.qcow2 win10.qcow2
GDB says:
0x00005601c4fd64b0 in hash_search (len=<optimized out>, buf=<optimized
out>, s=<optimized out>, f=<optimized out>) at match.c:206
206 match.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0 0x00005601c4fd64b0 in hash_search (len=<optimized out>,
buf=<optimized out>, s=<optimized out>, f=<optimized out>) at
match.c:206
#1 match_sums (f=f@entry=4, s=s@entry=0x5601c6465420,
buf=buf@entry=0x5601c6465450, len=107390828544) at match.c:396
#2 0x00005601c4fca6c1 in send_files (f_in=f_in@entry=5,
f_out=f_out@entry=4) at sender.c:389
#3 0x00005601c4fd49ac in client_run (f_in=5, f_out=4,
pid=pid@entry=92537, argc=argc@entry=1, argv=argv@entry=0x5601c643b660)
at main.c:1317
#4 0x00005601c4fb461a in start_client (argv=<optimized out>, argc=1)
at main.c:1580
#5 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
(gdb) up
#1 match_sums (f=f@entry=4, s=s@entry=0x5601c6465420,
buf=buf@entry=0x5601c6465450, len=107390828544) at match.c:396
396 in match.c
(gdb) info locals
sum_len = <optimized out>
(gdb) print *f
Cannot access memory at address 0x4
(gdb) print *s
$1 = {flength = 107390828544, sums = 0x7f8a6e030010, count = 1638654,
blength = 65536, remainder = 0, s2length = 5}
(gdb) up
#2 0x00005601c4fca6c1 in send_files (f_in=f_in@entry=5,
f_out=f_out@entry=4) at sender.c:389
389 sender.c: Datei oder Verzeichnis nicht gefunden.
(gdb) info locals
fd = 3
s = 0x5601c6465420
mbuf = 0x5601c6465450
st = {st_dev = 48, st_ino = 261, st_nlink = 1, st_mode = 33152, st_uid
= 64055, st_gid = 64055, __pad0 = 0, st_rdev = 0, st_size =
107390828544, st_blksize = 4096, st_blocks = 151051736, st_atim =
{tv_sec = 1619705294,
tv_nsec = 576804175}, st_mtim = {tv_sec = 1619705289, tv_nsec =
236796467}, st_ctim = {tv_sec = 1619705294, tv_nsec = 556804145},
__glibc_reserved = {0, 0, 0}}
fname = "win10.qcow2", '\000' <repeats 3453 times>...
xname = "\000in10.qcow2\000virt/images\000win10.qcow2", '\000' <repeats
1117 times>...
path = 0x5601c643bb40 "/var/lib/libvirt/images"
slash = 0x5601c5005213 "/"
fnamecmp_type = 128 '\200'
iflags = 32776
xlen = -1
file = <optimized out>
phase = 0
max_phase = 2
itemizing = 0
log_code = FLOG
f_xfer = 4
save_io_error = 0
ndx = <optimized out>
j = <optimized out>
(gdb) print f_
f_in f_name f_name_buf f_name_cmp
f_name_has_prefix f_out f_owner_ex f_xfer
(gdb) print f_in
$2 = 5
(gdb) print f_out
$3 = 4
Any other data I can get you?
# rsync -va --inplace --no-whole-file --progress --stats
/var/lib/libvirt/images/... /mnt/tmp4/
0x00005619df81a4c0 in hash_search (len=<optimized out>, buf=<optimized
out>, s=<optimized out>, f=<optimized out>) at match.c:206
206 match.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0 0x00005619df81a4c0 in hash_search (len=<optimized out>,
buf=<optimized out>, s=<optimized out>, f=<optimized out>) at
match.c:206
#1 match_sums (f=f@entry=4, s=s@entry=0x5619e04333a0,
buf=buf@entry=0x5619e04333d0, len=107390828544) at match.c:396
#2 0x00005619df80e6c1 in send_files (f_in=f_in@entry=5,
f_out=f_out@entry=4) at sender.c:389
#3 0x00005619df8189ac in client_run (f_in=5, f_out=4,
pid=pid@entry=668556, argc=argc@entry=1, argv=argv@entry=0x5619e0409660)
at main.c:1317
#4 0x00005619df7f861a in start_client (argv=<optimized out>, argc=1) at
main.c:1580
#5 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
(gdb) info locals
l = <optimized out>
done_csum2 = 1
hash_entry = <optimized out>
i = 667490
prev = 0x7fbb95ded798
aligned_i = <optimized out>
more = <optimized out>
aligned_offset = <optimized out>
end = 107390697473
k = 131072
want_i = <optimized out>
backup = <optimized out>
s1 = 1310720
sum = <optimized out>
offset = 45707824536
sum2 = "1\205_\250A]\025\370SF\303\362\342>k\243"
s2 = 785055744
map = <optimized out>
offset = <optimized out>
aligned_offset = <optimized out>
end = <optimized out>
k = <optimized out>
want_i = <optimized out>
aligned_i = <optimized out>
backup = <optimized out>
sum2 = {<optimized out> <repeats 16 times>}
s1 = <optimized out>
s2 = <optimized out>
sum = <optimized out>
more = <optimized out>
map = <optimized out>
null_hash = <optimized out>
check_want_i = <optimized out>
done_csum2 = <optimized out>
hash_entry = <optimized out>
i = <optimized out>
prev = <optimized out>
l = <optimized out>
(gdb) info registers
rax 0xffffffff 4294967295
rbx 0x1 1
rcx 0x7fbb94477010 140443623321616
rdx 0x5 5
rsi 0x19aad850 430626896
rdi 0x855fa841 2237638721
rbp 0x7fbb95ded760 0x7fbb95ded760
rsp 0x7ffd76aa33b0 0x7ffd76aa33b0
r8 0x0 0
r9 0x1 1
r10 0x5619e04333a0 94669136671648
r11 0x7fbb95ded798 140443650021272
r12 0x1976750 26699600
r13 0xa2f62 667490
r14 0xa2f62 667490
r15 0xaa4660d98 45707824536
rip 0x5619df81a4c0 0x5619df81a4c0 <match_sums+1328>
eflags 0x283 [ CF SF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
Yeah, that's not that helpful... but that might be:
(gdb) l
warning: Source file is more recent than executable.
201 int32 l;
202
203 /* When updating in-place, the chunk's
offset must be
204 * either >= our offset or identical
data at that offset.
205 * Remove any bypassed entries that we
can never use. */
206 if (updating_basis_file &&
s->sums[i].offset < offset
207 && !(s->sums[i].flags &
SUMFLG_SAME_OFFSET)) {
208 *prev = s->sums[i].chain;
209 continue;
210 }
(gdb) n
211 prev = &s->sums[i].chain;
(gdb) n
213 if (sum != s->sums[i].sum1)
(gdb) n
217 l = (int32)MIN((OFF_T)s->blength,
len-offset);
(gdb) n
218 if (l != s->sums[i].len)
(gdb) n
221 if (DEBUG_GTE(DELTASUM, 3)) {
(gdb) n
227 if (!done_csum2) {
(gdb) n
233 if
(memcmp(sum2,s->sums[i].sum2,s->s2length) != 0) {
(gdb) n
234 false_alarms++;
(gdb) n
235 continue;
(gdb) n
307 } while ((i = s->sums[i].chain) >= 0);
(gdb) n
206 if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) n
211 prev = &s->sums[i].chain;
(gdb) n
213 if (sum != s->sums[i].sum1)
(gdb) n
217 l = (int32)MIN((OFF_T)s->blength,
len-offset);
(gdb) n
218 if (l != s->sums[i].len)
(gdb) n
221 if (DEBUG_GTE(DELTASUM, 3)) {
(gdb) n
227 if (!done_csum2) {
(gdb) n
233 if
(memcmp(sum2,s->sums[i].sum2,s->s2length) != 0) {
(gdb) n
234 false_alarms++;
(gdb) print i
$1 = 667489
(gdb) print s->sums[i].chain
value has been optimized out
(gdb) print false_alarms
$2 = -745410538
(gdb) n
235 continue;
(gdb) n
307 } while ((i = s->sums[i].chain) >= 0);
(gdb) n
206 if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$3 = 667488
(gdb) b 206
Breakpoint 1 at 0x5619df81a4a3: file match.c, line 206.
(gdb) c
Continuing.
Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206 if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$4 = 667487
(gdb) c
Continuing.
Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206 if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$5 = 667486
(gdb) c
Continuing.
Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206 if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$6 = 667485
(gdb) c
Continuing.
Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206 if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$7 = 667484
This time rsync was hung at
45,706,764,560 42% 16.99MB/s 0:59:06 ^C
Just popping in here the fact that I have two Debian unstable clients backing up to the same BackupPC server (v3), and while one works fine, the other does exhibit a problem similar to this one. I am unsure about the hang and 100% CPU use, but my XferLog file is definitely growing indefinitely, and the backups are taking forever.
Thanks for the update. It most probably is dependent on the data, which means it's not surprising that one client works fine while the other doesn't. It also means it's extremely difficult to track down. Paul
Hello,
TL;DR: we experience the same bug using backuppc, and found upstream rsync
commit f9bb8f76ee is the culprit. Unfortunately the bug only occurs on
some debian 11 machines, not all.
We are seeing the same problem when using backuppc to backup a debian 11
machine: the backup never finishes, because the rsync process on the target
machine ends up doing nothing at all (not using any CPU). Rsync on the
target machine is version 3.2.3-4+deb11u1 (standard debian package).
Debugging with strace on the target debian 11 machine, rsync scans the
whole filesystem as expected, but then it gets stuck doing nothing in a
select loop, forever:
strace: Process 4104384 attached
select(1, [0], [], [0], {tv_sec=52, tv_usec=637639}) = 0 (Timeout)
select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
Given this behaviour, it seems likely that rsync ends up running the
"noop_io_until_death()" function and loops there forever.
Thanks to a previous comment from Andre, I focused on the git history
between 3.2.2 and 3.2.3 and found this commit:
f9bb8f76ee ("Change daemon variable & simplify some option code")
and specifically this change that introduces a new condition to keep
running noop_io_until_death():
https://github.com/WayneD/rsync/commit/f9bb8f76ee728bd1391a2b4890ce0281457a7bf2#diff-92194f057884b3287a3a6bf84e6e3b2bf433a556b68562799252a091744e7854L920-L921
Reverting f9bb8f76ee on top of the Debian package indeed fixes the issue
for us on this machine.
The strange thing is: on another debian 11 machine that is configured very
similarly (same rsync version, same backuppc server) but has different
data, we could not reproduce the bug at all.
I'm not familiar with rsync internals, but my guess from reading the code is:
- for some reason, possibly due to the way rsync is started by backuppc,
daemon_connection is sometimes set to -1 ("daemon via socket")
- when rsync has finished working, it goes into the noop_io_until_death()
loop because of the new condition in 3.2.3, waiting for a signal
- but the signal never comes, so rsync never exits
- this causes backuppc to wait forever for rsync to finish
For reference, this is how rsync is started by backupppc through SSH:
/usr/bin/ssh -q -x -o StrictHostKeyChecking=no -l backup $CLIENT_HOSTNAME nice -n 19 sudo /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --checksum-seed=32761 . /
I have opened an upstream bug report here: https://github.com/WayneD/rsync/issues/256
Thanks,
Baptiste
Reverting f9bb8f76ee worked for a few backups, but now it's back to the same hanging issue. And rsync is not stuck in noop_io_until_death() as I thought: it's actually stuck trying to read some data from backuppc through the SSH connection. I have added more details in the upstream bug report: https://github.com/WayneD/rsync/issues/256 In any case, I believe it's completely unrelated to Philipp's issue with rsync taking 100% CPU. But it's definitely the same issue as the one reported by Andreas at the beginning of this bug report. Baptiste
Hello, I also had a reproducible failure on a full backup of a debian 11 pc, which worked with rsync 3.2.2 but failed with rsync 3.3.3.At some point during the backup file contents ended up the the backuppc transfer log and either the backup stalled or got so slow that it never finished. I stayed with rsync 3.2.2 until recently I stumbled upon https://github.com/backuppc/backuppc/issues/369#issuecomment-692431546. The suggested solution of adding the argument --no-msgs2stderr to rsync fixed the issue for me. Unfortunately, this argument is unknown in older versions of rsync (e.g. in version 3.1.3 included in debian 10), so it has to be added to the configuration depending on the version of rsync on the pc to be backed up. I am unsure whether I really had the same issue as originally posted but I observed it with the same versions and wanted to share this information hoping it helps someone. Bastian
Hello Bastian, Many thanks for the link, it does indeed look like the exact same bug! So, this turns out to be a longstanding bug in libfile-rsyncp-perl. I see two ways to fix this problem in Debian: - update libfile-rsyncp-perl to 0.76 in buster. This seems quite unlikely but I'm not 100% familiar with Debian policies. - wait for upstream rsync to (possibly) revert the default behaviour of rsync, that is, disable --msgs2stderr by default. And then wait for this updated version of rsync to land in bullseye. See https://github.com/WayneD/rsync/issues/95#issuecomment-700424731 It also looks quite unlikely. So I guess we need the workarounds for now :/ Baptiste
Hello folks, it seems to me that upstream addressed and fixed the issue round about 4 years ago. Upstream ticket: https://github.com/RsyncProject/rsync/issues/95 Upstream commit: https://github.com/RsyncProject/rsync/commit/4adfdaaf12db26c348b4d6150119b377f9b622c8 Released upstream as rsync 3.2.4 on April 2022: https://download.samba.org/pub/rsync/NEWS#3.2.4 Contained in Debian since "oldstable" (3.2.7). Any objects against closing the issue? Regards, Christian Buhtz