Fabre

#969463 rsync: fails reproducible but unexplained on a specific volume, breaks horribly when started by BackupPC #969463

Package:: rsync

Source:: rsync

Description:: fast, versatile, remote (and local) file-copying tool

Submitter:: Andreas Feldner

Date:: 2026-05-19 08:43:01 UTC

Severity:: important

#969463#5

Date:: 2020-09-03 13:23:47 UTC

From:

To:

Dear Maintainer,

   * What led up to the situation?

I believe the problem started with the latest update of the rsync package end
of 08/2020. A previously working setup of backuppc reproducibly fails for a
specific host and specific directory tree since then.

   * What exactly did you do (or not do) that was effective (or
     ineffective)?

I repeatedly restarted the backup from BackupPC, all failing.
I conducted a test rsynch over ssh directly without BackupPC, all failing.
I checked dmesg for indications of lower level problems, there are none.

   * What was the outcome of this action?

1) With BackupPC

#969463#10

Date:: 2020-09-03 14:27:41 UTC

From:

To:

How are you invoking rsync? Please give the exact command line,
including what you current working directory is.

Are you actually using ssh to connect to localhost and transferring the
data over that ssh link?
If so, then these lines are very strange in the case (1) text:

I would expect an rsync to negotiate the highest protocol version
possible, so this looks like it's talking to a newer version than
itself.

Because of the "errors with program diagnostics" I would have expected
more messages...

It would be most helpful if you could follow the instructions on
https://rsync.samba.org/issues.html especially de second half of
question 3.


Paul

#969463#15

Date:: 2020-09-03 16:27:41 UTC

From:

To:

Hi Paul,

thanks for the fast answer!

"Paul Slootman" paul@debian.org – 3. September 2020 16:27
[...]


User is backuppc, working directory is /var/lib/backuppc. Command line:

sh-5.0$ rsync -vvv --rsh=ssh --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size 48 \
--recursive --one-file-system root@localhost:/var/lib/lxc /mnt/testrsync/ >/mnt/testrsync/stdout.txt 2>/mnt/testrsync/stderr.txt

In this case (2), I issued the command from within a screen session. Here, ssh is used internally by rsync (--rsh=ssh). The idea is to change privilege, as backuppc is allowed to execute rsync via authorized_keys. Also it is nice that the same mechanism works for all hosts to be backed up.

In contrast to case (2), this case is running from within BackupPC. AFAIK, BackupPC_dump directly executes ssh rsync —server —sender … and implements the receiving side itself. So, I would conclude that BackupPC_dump implements protocol version 28.

So did I :-)

Thanks for the reference, I will try it later.

Andreas.

#969463#20

Date:: 2020-09-05 10:00:33 UTC

From:

To:

Hi again,

So I did:

sh-5.0$ rsync -vvv --rsh=ssh --rsync-path=/usr/bin/rsync --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --one-file-system root@localhost:/var/lib/lxc /mnt/testrsync/ >/mnt/testrsync/stdout2.txt 2>/mnt/testrsync/stderr2.txt
sh-5.0$ echo $?
13

With the following results:

— begin stderr2.txt
rsync error: errors with program diagnostics (code 13) at log.c(245) [receiver=3.2.3]
rsync: [generator] write error: Broken pipe (32)
— end stderr2.txt

— tail of stdout2.txt
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sr.js,519849)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sr.json,519850)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sv.js,519851)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/sv.json,519852)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/tr.js,519853)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/tr.json,519854)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/uz.js,519855)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/uz.json,519856)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_CN.js,519857)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_CN.json,519858)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_TW.js,519859)
recv_generator(lxc/nextcloud/rootfs/var/backups/nextcloud-confbkp_20200119/extra-apps/groupfolders/l10n/zh_TW.json,519860)
[receiver] _exit_cleanup(code=13, file=log.c, line=245): about to call exit(13)
[generator] _exit_cleanup(code=13, file=io.c, line=1644): about to call exit(13)
— end of stdout2.txt

— tail of sender-side strace
244242 12:26:26 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999998})
244242 12:26:26 write(1, "\4\200\0\7\364\207\237\312\235\2176X\312\217m\t"..., 32776) = 32776
244242 12:26:26 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999999})
244242 12:26:26 write(1, "\4\200\0\7\370\37H\305,s\333G42,\261:\"..., 32776) = 24584
244242 12:26:26 select(2, [0], [1], [0], {tv_sec=60, tv_usec=0}) = 2 (in [0], out [1], left {tv_sec=34, tv_usec=971695})
244242 12:26:51 read(0, "", 13405)      = 0
244242 12:26:51 select(2, [], [1], [], {tv_sec=60, tv_usec=0}) = 1 (out [1], left {tv_sec=59, tv_usec=999995})
244242 12:26:51 write(1, "u\226A\256\337\231\325\366\242"..., 8192) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:51 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 write(2, "rsync: [sender] write error: Broken pipe (32)\n", 46) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:53 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 rt_sigaction(SIGUSR1, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 rt_sigaction(SIGUSR2, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 write(2, "rsync error: errors with program diagnostics (code 13) at log.c(245) [sender=3.2.3]\n", 84) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:53 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 rt_sigaction(SIGUSR1, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 rt_sigaction(SIGUSR2, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 write(2, "[sender] _exit_cleanup(code=13, file=log.c, line=245): about to call exit(13)\n", 78) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe))
244242 12:26:53 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=244242, si_uid=0} ---
244242 12:26:53 rt_sigaction(SIGUSR1, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 rt_sigaction(SIGUSR2, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f563bcb3e30}, NULL, 8) = 0
244242 12:26:53 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=100000000}, 0x7fff8399a960) = 0
244242 12:26:53 exit_group(13)          = ?
244242 12:26:53 +++ exited with 13 +++
— end of strace
Note I drastically shortened the content of the write(1, …) commands in the same style they were shortened anyway; guess it is just file content.

I must admit that I’m none the wiser. The sending half obviously wanted to write exactly the same diagnostic as the receiving half did ("errors with program diagnostics (code 13)“) - but couldn’t because the communication pipe was already broken. It looks as if the sending side was unexpectedly struck from the receiving side closing the connection. Whereas the receiving side doesn’t seem to give any indication of what is going wrong apart from that somewhat generic message.

Hope this helps anyway!

Best regards,
Andreas.

#969463#25

Date:: 2020-11-14 08:48:27 UTC

From:

To:

Hi all,

I also face this issue, and tried the last versions of rsync.

The problem was introduced in rsync_3.2.3-1, downgrading to 3.2.2-2
works.

Best regards,
Andre

#969463#30

Date:: 2021-04-19 07:25:56 UTC

From:

To:

I've got a script that tries to rsync a file to an external harddisk.

	-rw------- 1 libvirt-qemu libvirt-qemu 107390828544 15. Apr 15:07
vm.qcow2


In essence,

	/usr/bin/time rsync -va --inplace --no-whole-file --block-size=65536
--progress --stats \
		/var/lib/libvirt/images/vm.qcow2 /mnt/tmp3/vm.qcow2

That command takes quite a while (I guess checksumming the source file),
then starts transmitting;

	sending incremental file list
	vm.qcow2
	  5,477,040,128   5%  237.65MB/s    0:06:58

At that position rsync reproducible stops to transfer data; there's
no error message, the first rsync process just hangs with 100% CPU.

strace of the 3 processes only shows that the other two are waiting
for data on their socket; the "hanging" one is simply user-bound,
/proc/<pid>/stack is empty.


That happens with both 3.1.3-6 and 3.2.3-4.


Next time I'll try to have debug symbols installed and look at the
process via gdb - right now the file is identical, I copied it via
"cp -a".

#969463#35

Date:: 2021-04-19 07:45:35 UTC

From:

To:

Hi,

Could you also try it without the --block-size=65536 option?


Paul

#969463#40

Date:: 2021-04-28 07:56:32 UTC

From:

To:

# gdb --args rsync -va --inplace --no-whole-file --progress --stats
/var/lib/libvirt/images/win10.qcow2 win10.qcow2

went through without any problem.
So either the "--blocksize" is the reason -- or some bit pattern in the
previous version.

Touching the file and re-trying with checksumming (-c) has nothing to
copy and so did succeed; I'll try gdb when the file was changed.

#969463#45

Date:: 2021-04-30 06:06:46 UTC

From:

To:

This time I used "--block-size=65536" again, and it hang:


    3841 pts/15   S      0:01  \_ -bash
   92536 pts/15   R+     2:36      \_ rsync -va --inplace --no-whole-file
--block-size=65536 --progress --stats 
/var/lib/libvirt/images/win10.qcow2 win10.qcow2
   92537 pts/15   S+     1:28          \_ rsync -va --inplace
--no-whole-file --block-size=65536 --progress --stats 
/var/lib/libvirt/images/win10.qcow2 win10.qcow2
   92538 pts/15   S+     0:08              \_ rsync -va --inplace
--no-whole-file --block-size=65536 --progress --stats 
/var/lib/libvirt/images/win10.qcow2 win10.qcow2


GDB says:

	0x00005601c4fd64b0 in hash_search (len=<optimized out>, buf=<optimized
out>, s=<optimized out>, f=<optimized out>) at match.c:206
	206     match.c: Datei oder Verzeichnis nicht gefunden.
	(gdb) bt
	#0  0x00005601c4fd64b0 in hash_search (len=<optimized out>,
buf=<optimized out>, s=<optimized out>, f=<optimized out>) at
match.c:206
	#1  match_sums (f=f@entry=4, s=s@entry=0x5601c6465420,
buf=buf@entry=0x5601c6465450, len=107390828544) at match.c:396
	#2  0x00005601c4fca6c1 in send_files (f_in=f_in@entry=5,
f_out=f_out@entry=4) at sender.c:389
	#3  0x00005601c4fd49ac in client_run (f_in=5, f_out=4,
pid=pid@entry=92537, argc=argc@entry=1, argv=argv@entry=0x5601c643b660)
at main.c:1317
	#4  0x00005601c4fb461a in start_client (argv=<optimized out>, argc=1)
at main.c:1580
	#5  main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
	(gdb) up
	#1  match_sums (f=f@entry=4, s=s@entry=0x5601c6465420,
buf=buf@entry=0x5601c6465450, len=107390828544) at match.c:396
	396     in match.c
	(gdb) info locals
	sum_len = <optimized out>
	(gdb) print *f
	Cannot access memory at address 0x4
	(gdb) print *s
	$1 = {flength = 107390828544, sums = 0x7f8a6e030010, count = 1638654,
blength = 65536, remainder = 0, s2length = 5}
	(gdb) up
	#2  0x00005601c4fca6c1 in send_files (f_in=f_in@entry=5,
f_out=f_out@entry=4) at sender.c:389
	389     sender.c: Datei oder Verzeichnis nicht gefunden.
	(gdb) info locals
	fd = 3
	s = 0x5601c6465420
	mbuf = 0x5601c6465450
	st = {st_dev = 48, st_ino = 261, st_nlink = 1, st_mode = 33152, st_uid
= 64055, st_gid = 64055, __pad0 = 0, st_rdev = 0, st_size =
107390828544, st_blksize = 4096, st_blocks = 151051736, st_atim =
{tv_sec = 1619705294,
		tv_nsec = 576804175}, st_mtim = {tv_sec = 1619705289, tv_nsec =
236796467}, st_ctim = {tv_sec = 1619705294, tv_nsec = 556804145},
__glibc_reserved = {0, 0, 0}}
	fname = "win10.qcow2", '\000' <repeats 3453 times>...
	xname = "\000in10.qcow2\000virt/images\000win10.qcow2", '\000' <repeats
1117 times>...
	path = 0x5601c643bb40 "/var/lib/libvirt/images"
	slash = 0x5601c5005213 "/"
	fnamecmp_type = 128 '\200'
	iflags = 32776
	xlen = -1
	file = <optimized out>
	phase = 0
	max_phase = 2
	itemizing = 0
	log_code = FLOG
	f_xfer = 4
	save_io_error = 0
	ndx = <optimized out>
	j = <optimized out>
	(gdb) print f_
	f_in               f_name             f_name_buf         f_name_cmp
     f_name_has_prefix  f_out              f_owner_ex         f_xfer
	(gdb) print f_in
	$2 = 5
	(gdb) print f_out
	$3 = 4

Any other data I can get you?

#969463#50

Date:: 2021-06-01 07:00:15 UTC

From:

To:

# rsync -va --inplace --no-whole-file --progress --stats
/var/lib/libvirt/images/... /mnt/tmp4/


0x00005619df81a4c0 in hash_search (len=<optimized out>, buf=<optimized
out>, s=<optimized out>, f=<optimized out>) at match.c:206
206     match.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  0x00005619df81a4c0 in hash_search (len=<optimized out>,
buf=<optimized out>, s=<optimized out>, f=<optimized out>) at
match.c:206
#1  match_sums (f=f@entry=4, s=s@entry=0x5619e04333a0,
buf=buf@entry=0x5619e04333d0, len=107390828544) at match.c:396
#2  0x00005619df80e6c1 in send_files (f_in=f_in@entry=5,
f_out=f_out@entry=4) at sender.c:389
#3  0x00005619df8189ac in client_run (f_in=5, f_out=4,
pid=pid@entry=668556, argc=argc@entry=1, argv=argv@entry=0x5619e0409660)
at main.c:1317
#4  0x00005619df7f861a in start_client (argv=<optimized out>, argc=1) at
main.c:1580
#5  main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
(gdb) info locals
l = <optimized out>
done_csum2 = 1
hash_entry = <optimized out>
i = 667490
prev = 0x7fbb95ded798
aligned_i = <optimized out>
more = <optimized out>
aligned_offset = <optimized out>
end = 107390697473
k = 131072
want_i = <optimized out>
backup = <optimized out>
s1 = 1310720
sum = <optimized out>
offset = 45707824536
sum2 = "1\205_\250A]\025\370SF\303\362\342>k\243"
s2 = 785055744
map = <optimized out>
offset = <optimized out>
aligned_offset = <optimized out>
end = <optimized out>
k = <optimized out>
want_i = <optimized out>
aligned_i = <optimized out>
backup = <optimized out>
sum2 = {<optimized out> <repeats 16 times>}
s1 = <optimized out>
s2 = <optimized out>
sum = <optimized out>
more = <optimized out>
map = <optimized out>
null_hash = <optimized out>
check_want_i = <optimized out>
done_csum2 = <optimized out>
hash_entry = <optimized out>
i = <optimized out>
prev = <optimized out>
l = <optimized out>
(gdb) info registers
rax            0xffffffff          4294967295
rbx            0x1                 1
rcx            0x7fbb94477010      140443623321616
rdx            0x5                 5
rsi            0x19aad850          430626896
rdi            0x855fa841          2237638721
rbp            0x7fbb95ded760      0x7fbb95ded760
rsp            0x7ffd76aa33b0      0x7ffd76aa33b0
r8             0x0                 0
r9             0x1                 1
r10            0x5619e04333a0      94669136671648
r11            0x7fbb95ded798      140443650021272
r12            0x1976750           26699600
r13            0xa2f62             667490
r14            0xa2f62             667490
r15            0xaa4660d98         45707824536
rip            0x5619df81a4c0      0x5619df81a4c0 <match_sums+1328>
eflags         0x283               [ CF SF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0


Yeah, that's not that helpful... but that might be:


(gdb) l
warning: Source file is more recent than executable.
201                             int32 l;
202
203                             /* When updating in-place, the chunk's
offset must be
204                              * either >= our offset or identical
data at that offset.
205                              * Remove any bypassed entries that we
can never use. */
206                             if (updating_basis_file &&
s->sums[i].offset < offset
207                              && !(s->sums[i].flags &
SUMFLG_SAME_OFFSET)) {
208                                     *prev = s->sums[i].chain;
209                                     continue;
210                             }
(gdb) n
211                             prev = &s->sums[i].chain;
(gdb) n
213                             if (sum != s->sums[i].sum1)
(gdb) n
217                             l = (int32)MIN((OFF_T)s->blength,
len-offset);
(gdb) n
218                             if (l != s->sums[i].len)
(gdb) n
221                             if (DEBUG_GTE(DELTASUM, 3)) {
(gdb) n
227                             if (!done_csum2) {
(gdb) n
233                             if
(memcmp(sum2,s->sums[i].sum2,s->s2length) != 0) {
(gdb) n
234                                     false_alarms++;
(gdb) n
235                                     continue;
(gdb) n
307                     } while ((i = s->sums[i].chain) >= 0);
(gdb) n
206                             if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) n
211                             prev = &s->sums[i].chain;
(gdb) n
213                             if (sum != s->sums[i].sum1)
(gdb) n
217                             l = (int32)MIN((OFF_T)s->blength,
len-offset);
(gdb) n
218                             if (l != s->sums[i].len)
(gdb) n
221                             if (DEBUG_GTE(DELTASUM, 3)) {
(gdb) n
227                             if (!done_csum2) {
(gdb) n
233                             if
(memcmp(sum2,s->sums[i].sum2,s->s2length) != 0) {
(gdb) n
234                                     false_alarms++;
(gdb) print i
$1 = 667489
(gdb) print s->sums[i].chain
value has been optimized out
(gdb) print false_alarms
$2 = -745410538
(gdb) n
235                                     continue;
(gdb) n
307                     } while ((i = s->sums[i].chain) >= 0);
(gdb) n
206                             if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$3 = 667488
(gdb) b 206
Breakpoint 1 at 0x5619df81a4a3: file match.c, line 206.
(gdb) c
Continuing.

Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206                             if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$4 = 667487
(gdb) c
Continuing.

Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206                             if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$5 = 667486
(gdb) c
Continuing.

Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206                             if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$6 = 667485
(gdb) c
Continuing.

Breakpoint 1, hash_search (len=<optimized out>, buf=<optimized out>,
s=<optimized out>, f=<optimized out>) at match.c:206
206                             if (updating_basis_file &&
s->sums[i].offset < offset
(gdb) print i
$7 = 667484


This time rsync was hung at
  45,706,764,560  42%   16.99MB/s    0:59:06  ^C

#969463#55

Date:: 2021-06-26 02:55:04 UTC

From:

To:

Just popping in here the fact that I have two Debian unstable
clients backing up to the same BackupPC server (v3), and while one
works fine, the other does exhibit a problem similar to this one. I
am unsure about the hang and 100% CPU use, but my XferLog file is
definitely growing indefinitely, and the backups are taking forever.

#969463#60

Date:: 2021-06-26 11:48:41 UTC

From:

To:

Thanks for the update.

It most probably is dependent on the data, which means it's not
surprising that one client works fine while the other doesn't.
It also means it's extremely difficult to track down.

Paul

#969463#65

Date:: 2021-11-17 15:14:23 UTC

From:

To:

Hello,

TL;DR: we experience the same bug using backuppc, and found upstream rsync
commit f9bb8f76ee is the culprit.  Unfortunately the bug only occurs on
some debian 11 machines, not all.

We are seeing the same problem when using backuppc to backup a debian 11
machine: the backup never finishes, because the rsync process on the target
machine ends up doing nothing at all (not using any CPU).  Rsync on the
target machine is version 3.2.3-4+deb11u1 (standard debian package).

Debugging with strace on the target debian 11 machine, rsync scans the
whole filesystem as expected, but then it gets stuck doing nothing in a
select loop, forever:

    strace: Process 4104384 attached
    select(1, [0], [], [0], {tv_sec=52, tv_usec=637639}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)

Given this behaviour, it seems likely that rsync ends up running the
"noop_io_until_death()" function and loops there forever.

Thanks to a previous comment from Andre, I focused on the git history
between 3.2.2 and 3.2.3 and found this commit:

    f9bb8f76ee ("Change daemon variable & simplify some option code")

and specifically this change that introduces a new condition to keep
running noop_io_until_death():

https://github.com/WayneD/rsync/commit/f9bb8f76ee728bd1391a2b4890ce0281457a7bf2#diff-92194f057884b3287a3a6bf84e6e3b2bf433a556b68562799252a091744e7854L920-L921

Reverting f9bb8f76ee on top of the Debian package indeed fixes the issue
for us on this machine.

The strange thing is: on another debian 11 machine that is configured very
similarly (same rsync version, same backuppc server) but has different
data, we could not reproduce the bug at all.

I'm not familiar with rsync internals, but my guess from reading the code is:

- for some reason, possibly due to the way rsync is started by backuppc,
  daemon_connection is sometimes set to -1 ("daemon via socket")

- when rsync has finished working, it goes into the noop_io_until_death()
  loop because of the new condition in 3.2.3, waiting for a signal

- but the signal never comes, so rsync never exits

- this causes backuppc to wait forever for rsync to finish

For reference, this is how rsync is started by backupppc through SSH:

    /usr/bin/ssh -q -x -o StrictHostKeyChecking=no -l backup $CLIENT_HOSTNAME nice -n 19 sudo /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --checksum-seed=32761 . /

I have opened an upstream bug report here: https://github.com/WayneD/rsync/issues/256

Thanks,
Baptiste

#969463#70

Date:: 2021-11-17 16:23:13 UTC

From:

To:

Reverting f9bb8f76ee worked for a few backups, but now it's back to the
same hanging issue.

And rsync is not stuck in noop_io_until_death() as I thought: it's actually
stuck trying to read some data from backuppc through the SSH connection.

I have added more details in the upstream bug report: https://github.com/WayneD/rsync/issues/256

In any case, I believe it's completely unrelated to Philipp's issue with
rsync taking 100% CPU.  But it's definitely the same issue as the one
reported by Andreas at the beginning of this bug report.

Baptiste

#969463#75

Date:: 2021-11-19 13:02:29 UTC

From:

To:

Hello,

I also had a reproducible failure on a full backup of a debian 11 pc,
which worked with rsync 3.2.2 but failed with rsync 3.3.3.At some point
during the backup file contents ended up the the backuppc transfer log
and either the backup stalled or got so slow that it never finished.

I stayed with rsync 3.2.2 until recently I stumbled upon
https://github.com/backuppc/backuppc/issues/369#issuecomment-692431546.
The suggested solution of adding the argument --no-msgs2stderr to rsync
fixed the issue for me. Unfortunately, this argument is unknown in older
versions of rsync (e.g. in version 3.1.3 included in debian 10), so it
has to be added to the configuration depending on the version of rsync
on the pc to be backed up.

I am unsure whether I really had the same issue as originally posted but
I observed it with the same versions and wanted to share this
information hoping it helps someone.

Bastian

#969463#80

Date:: 2021-11-23 17:29:37 UTC

From:

To:

Hello Bastian,

Many thanks for the link, it does indeed look like the exact same bug!

So, this turns out to be a longstanding bug in libfile-rsyncp-perl.  I see
two ways to fix this problem in Debian:

- update libfile-rsyncp-perl to 0.76 in buster.  This seems quite unlikely
  but I'm not 100% familiar with Debian policies.

- wait for upstream rsync to (possibly) revert the default behaviour of
  rsync, that is, disable --msgs2stderr by default.  And then wait for
  this updated version of rsync to land in bullseye.
  See https://github.com/WayneD/rsync/issues/95#issuecomment-700424731
  It also looks quite unlikely.

So I guess we need the workarounds for now :/

Baptiste

#969463#85

Date:: 2026-05-19 08:41:29 UTC

From:

To:

Hello folks,

it seems to me that upstream addressed and fixed the issue round about 4
years ago.
Upstream ticket: https://github.com/RsyncProject/rsync/issues/95
Upstream commit:
https://github.com/RsyncProject/rsync/commit/4adfdaaf12db26c348b4d6150119b377f9b622c8

Released upstream as rsync 3.2.4 on April 2022:
https://download.samba.org/pub/rsync/NEWS#3.2.4

Contained in Debian since "oldstable" (3.2.7).

Any objects against closing the issue?

Regards,
Christian Buhtz

#969463 rsync: fails reproducible but unexplained on a specific volume, breaks horribly when started by BackupPC #969463

Just Reply to ...

Reply to submitter ...

Send control command (Silently)

Set Architecture Tags (Silently)