#1140745 openvpn: DCO peer-table desync leaks client instances, causing --max-clients lockout under load (fixed upstream by 7791f535)

Package:
openvpn
Source:
openvpn
Description:
virtual private network daemon
Submitter:
Thomas Nyberg
Date:
2026-06-27 13:53:03 UTC
Severity:
normal
Tags:
#1140745#5
Date:
2026-06-25 14:52:04 UTC
From:
To:
Hello,

Full disclosure: I used Claude for a lot of this work, including the text
below. That said, I was thorough in the testing, careful to verify the bug
and the fix, and have read through the details below -- but I don't want to
pretend there weren't robots involved.

Summary
-------
OpenVPN's DCO (data-channel offload) userspace code in 2.6.x can desync its
client table from the kernel under a peer-deletion storm: it leaks client
instances until the server reaches --max-clients and refuses new/reconnecting
clients, recoverable only by restarting openvpn. This is fixed upstream in 2.7
by commit 7791f535 ("dco: process messages immediately after read", fixes
OpenVPN issue #919), but that fix is not in the 2.6 branch, and trixie ships
2.6.14.

This is the userspace counterpart to the kernel-module crash I reported in
#1140548 (openvpn-dco-dkms). A DCO server needs both fixes to be safe under
load: #1140548 stops the kernel crash; this one stops the userspace
client-table desync. (I confirmed the two are independent -- with the
openvpn-dco-dkms fix applied but stock openvpn, the desync still occurs; with
this openvpn fix applied but the stock buggy module, the kernel still crashes.)

To be clear, this is a separate bug from #1140548 in a different source package
(openvpn vs openvpn-dco-dkms) -- not a duplicate; both fixes are required and
neither substitutes for the other.

The bug
-------
On a busy DCO server, when many peers are deleted in a short window (e.g. a
network blip that makes a large number of peers hit --keepalive timeout at
nearly the same time), libnl delivers multiple netlink messages per
nl_recvmsgs(), but the 2.6 DCO read path stores each notification into
single-slot dco_context_t fields and processes only the last, silently
dropping the rest. openvpn then never reaps those client instances ->
n_clients is never decremented -> the instances leak. Once enough leak, new
and reconnecting clients are rejected with:

   MULTI: new incoming connection would exceed maximum number of clients

and the only recovery is restarting openvpn. Both the management interface and
the status file report the inflated client count (they read the same
multi_context list).
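
To make the failure mode concrete, here is a minimal toy model of it in Python.
It is not OpenVPN code: ToyServer, read_single_slot, read_immediate and run are
hypothetical names, and the batch sizes are arbitrary. It only illustrates why
keeping the last notification of a batch leaks instances, and why handling each
message as it is read (the approach 7791f535's title describes) does not.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy stand-in for the userspace client table (hypothetical, not OpenVPN)."""
    max_clients: int
    clients: set = field(default_factory=set)

    def connect(self, peer_id):
        # Mirrors the --max-clients check: refuse once the table is full.
        if len(self.clients) >= self.max_clients:
            return False
        self.clients.add(peer_id)
        return True

    def reap(self, peer_id):
        self.clients.discard(peer_id)


def read_single_slot(server, batch):
    """Buggy pattern: every message overwrites one slot; only the last is handled."""
    slot = None
    for peer_id in batch:
        slot = peer_id          # earlier notifications are silently lost
    if slot is not None:
        server.reap(slot)


def read_immediate(server, batch):
    """Fixed pattern: each message is processed as soon as it is parsed."""
    for peer_id in batch:
        server.reap(peer_id)


def run(reader, n_peers=8, batch_size=4):
    """Connect n_peers, delete them all in batches, then let them all reconnect."""
    server = ToyServer(max_clients=n_peers)
    for peer_id in range(n_peers):
        server.connect(peer_id)
    deleted = list(range(n_peers))  # the "kernel" deleted every peer
    for i in range(0, n_peers, batch_size):
        reader(server, deleted[i:i + batch_size])
    stale = len(server.clients)
    # Reconnecting clients arrive as new instances with fresh ids.
    accepted = sum(server.connect(n_peers + k) for k in range(n_peers))
    return stale, accepted


if __name__ == "__main__":
    print("single-slot:", run(read_single_slot))
    print("immediate:  ", run(read_immediate))
```

In this model the single-slot reader reaps one peer per batch, so the stale
entries keep occupying --max-clients slots and most reconnects are refused,
while the immediate reader drains the table completely.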

Reproduced, and the fix tested
------------------------------
Reproduced against the stock Debian package (2.6.14 + openvpn-dco-dkms): 1024
DCO clients, server under heavy CPU load (stress-ng), then drop the server's
inbound VPN packets so all peers time out near-simultaneously. The kernel
deletes all 1024 peers but openvpn is left with a nondeterministic number of
stale instances (one run left 767/1024), and reconnects are then refused.
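
For anyone wanting to check for stale instances themselves, one rough way to
count what openvpn still believes is connected is to parse the status file. The
sketch below is only an illustration and rests on assumptions: it assumes the
--status-version 1 layout (a "Common Name,..." column header followed by one
row per client, ending at "ROUTING TABLE"), count_status_clients is a
hypothetical helper, and the SAMPLE rows are made up. The result would then be
compared by hand against the kernel's peer count.

```python
def count_status_clients(status_text):
    """Count client rows in a status file (assumes --status-version 1 layout)."""
    count = 0
    in_client_list = False
    for line in status_text.splitlines():
        if line.startswith("Common Name,"):
            in_client_list = True       # column header precedes the client rows
            continue
        if line.startswith("ROUTING TABLE"):
            break                       # client section is over
        if in_client_list and line.strip():
            count += 1
    return count


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """OpenVPN CLIENT LIST
Updated,2026-06-25 14:00:00
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
client-a,192.0.2.10:40001,1000,2000,2026-06-25 13:00:00
client-b,192.0.2.11:40002,1500,2500,2026-06-25 13:05:00
ROUTING TABLE
Virtual Address,Common Name,Real Address,Last Ref
10.8.0.2,client-a,192.0.2.10:40001,2026-06-25 13:59:00
GLOBAL STATS
Max bcast/mcast queue length,0
END
"""

if __name__ == "__main__":
    print(count_status_clients(SAMPLE))
```

A count that stays above zero after the kernel has deleted every peer is the
symptom described above.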

I built a patched package (2.6.14 + the attached patch) and re-ran the same
load: the desync is gone -- repeated 1024-peer deletion storms under stress-ng
drain cleanly to 0 with no stale instances, reconnect to 1024 with zero
max-clients rejections, no regressions, and it survives reboot.

Patch
-----
Attached is a quilt patch (DEP-3 header). I generated it with quilt against the
current openvpn source package, so it slots in at the end of the existing
debian/patches series, and the resulting package builds and passed all the load
testing above -- i.e. it is ready to drop into debian/patches/ and add to the
series as-is. Please note it is an *adaptation* of upstream 7791f535, not a
clean cherry-pick: 7791f535 depends on two earlier 2.7 commits that are not in
release/2.6 (a699681b, which introduces dco->c, and 7f5a6dea, which introduces
c->multi), and the intervening 2.7 DCO/new-module rework means neither those
nor 7791f535 apply to 2.6.14 (they conflict in 5 and 7 files respectively). So
the patch reimplements 7791f535's logic against the old ovpn-dco API and
inlines the two prerequisites as small c/multi back-pointers in dco_context_t.
It is therefore worth a real review (back-pointer lifetime, the server/client
dispatch, the lock placement) rather than a verbatim diff.

The option I'd like to offer
----------------------------
To be explicit: the attached patch can be applied as-is now -- you do not need
to wait for upstream to ship anything. It's a forwarded delta (see below), it
slots into the existing series, and the built package passed the load testing
above. This would fit the same trixie stable-update path as the
openvpn-dco-dkms fix in #1140548.

Also raised upstream
--------------------
I have also posted this to the openvpn-devel mailing list, asking whether they
will backport 7791f535 to release/2.6 and in what form (verbatim prerequisites
vs. a minimal adaptation like mine):


https://sourceforge.net/p/openvpn/mailman/openvpn-devel/thread/5afdb852-eabf-4829-b95f-6a322ed5d56a%40midjourney.com/#msg59351167

That thread is for coordination, not a blocker. Upstream may well decline a
release/2.6 backport altogether -- 2.6 + the out-of-tree ovpn-dco module is the
older path, and their answer may be "move to 2.7 + the in-tree module." If so,
carrying this delta is the practical way to fix this for 2.6.x users regardless
of what upstream decides; and if they do land an official release/2.6 commit,
it can simply replace this delta later. (If you carry it, the DEP-3 Forwarded:
field should point at that mailing-list post once it has an archive URL --
happy to provide that.)

Full validation details (logs, etc.) available on request.

Thanks,
Thomas Nyberg

#1140745#10
Date:
2026-06-25 21:02:13 UTC
From:
To:
Hello,

Follow-up to correct and add context to my earlier report.

First, a correction: I had adapted upstream commit 7791f535 ("dco: process
messages immediately after read", fixes #919) by hand, not realising the fix
had already been backported to the upstream release/2.6 branch and shipped in
the OpenVPN 2.6.20 stable release. This was confirmed by Gert Doering on
openvpn-devel:


https://sourceforge.net/p/openvpn/mailman/openvpn-devel/thread/aj2LVcoBm4HVKF8M%40greenie.muc.de/#msg59351313

   7791f535:

https://github.com/OpenVPN/openvpn/commit/7791f5358a5574d4ef1bd27e2d52300c9d98bd72

The upstream backport is not a single commit but a reviewed 3-commit series
(authored on top of v2.6.19, Acked-by Gert Doering):

   1fbbe91d  dco linux: avoid redefining ovpn enums (2.6)

https://github.com/OpenVPN/openvpn/commit/1fbbe91d292fb925f5af73b512d7d1c83abfe714

   876a8cf5  dco: port core/context infrastructure needed for backport of
             7791f53

https://github.com/OpenVPN/openvpn/commit/876a8cf5fd6166a22bfe6b6f37889d3cff3a17c6

   e78a8af2  dco: backport immediate notification processing on Linux and
             FreeBSD   (Github: fixes OpenVPN/openvpn#919)

https://github.com/OpenVPN/openvpn/commit/e78a8af2f5ce5ef3bbfefc2dc8efeca84027c018

This series does NOT cleanly cherry-pick onto the trixie base (2.6.14):
876a8cf5/e78a8af2 conflict because they sit on top of the DCO peer-float
backport and the DCO/persist-tun reconnect repair that landed in
2.6.15-2.6.19:

   b0b123b3  dco: backport OS-independent part of peer float support

https://github.com/OpenVPN/openvpn/commit/b0b123b3a7d6b64e236bc0b9836cb73d76c130e2

   3c9fe881  dco: support float notifications on FreeBSD

https://github.com/OpenVPN/openvpn/commit/3c9fe881207df94e938ba7325a0cd46765d6ba6c

   fae4a9e3  Repair interaction between DCO and persist-tun after reconnection

https://github.com/OpenVPN/openvpn/commit/fae4a9e3f51554bdecc9df45344135006da1f0d9

So I see two honest options, each with a real downside, and I'm genuinely
unsure which fits stable policy best:

  1) The targeted patch I originally attached. It is significantly smaller and
     lower-risk than the full upstream delta, so as a minimal stable fix it
     isn't unreasonable. The downside is that it diverges from upstream: it is
     a reimplementation rather than what upstream shipped, and it would make
     pulling in any future DCO fixes harder, since later cherry-picks would
     conflict against a non-upstream base. I'm not really comfortable with
     that.

  2) Take the upstream fix as released, i.e. effectively the whole
     2.6.15-2.6.19 DCO delta (the 2.6.20 release). This is upstream-aligned and
     keeps future maintenance clean, but it is a much larger change than a
     typical stable update and would be the SRM's call.

I'll defer to the maintainer and the SRM on which (if either) is appropriate
for trixie. I'm happy to help test or review whichever direction you prefer,
including re-rolling the targeted patch with proper DEP-3 provenance if that's
the path, or validating a 2.6.20-based build.

For what it's worth, the fix is already available to trixie users today via
trixie-backports (openvpn 2.7.x carries it natively), so there is a supported
path for anyone hitting this on stable in the meantime.

Cheers,
Thomas