Dear Maintainer,
While investigating the autopkgtest regression blocking the migration
of vsftpd (3.0.5-1) to testing, I discovered that the newly introduced
WebSocket test case in pycurl consistently deadlocks and fails under
libcurl >= 8.21.0 inside autopkgtest (LXC) environments:
tests/test_websocket.py::test_default_mode_autopongs_server_ping
Correlation with libcurl 8.21.0:
- PASS: Runs against libcurl 8.20.0 (from testing) pass 100% of the time.
- FAIL: Runs against libcurl 8.21.0 (from unstable, including rc3) fail 100% of the time.
This regression was triggered by the upstream security fix for
CVE-2026-11586 in curl 8.21.0.
Root Cause & Deadlock Mechanism:
1. The PycURL Test Loop
In tests/test_websocket.py, the helper _recv catches
BlockingIOError and calls _wait_readable, which polls the
socket only for readability:
r, _, _ = select.select([fd], [], [], timeout) # Empty write list
2. curl 8.21.0 "Lazy PONG" (CVE-2026-11586)
To prevent memory exhaustion, upstream commit 849317ff5c5a5e13f50ec3d0
removed the immediate flush from ws_enc_add_cntrl() and made
PONG sending lazy. The PONG frame is now merely buffered in
ws->pending to be flushed during subsequent I/O.
3. The Deadlock
In non-blocking mode, when curl_ws_recv() consumes the incoming
PING frame from the socket receive buffer, no application-layer data
(TEXT or BINARY) is yet available to return. Consequently,
curl_ws_recv() naturally returns CURLE_AGAIN (raising
BlockingIOError in Python) to yield control.
Crucially, under curl 8.21.0's new "lazy PONG" design, the generated
PONG frame is only buffered in ws->pending and has not yet been
flushed to the socket.
Because the PycURL test loop immediately suspends in a read-only
select() upon receiving BlockingIOError, the client blocks
forever waiting for readability. The server, waiting for the PONG,
sends no further data. Since the client is blocked in select(),
it never invokes any subsequent libcurl API to drive the write
queue, leaving the PONG permanently trapped in ws->pending.
4. The Latent Bug in PycURL's I/O Loop
In non-blocking mode with CONNECT_ONLY, the application has the
obligation to drive the outbound write queue (either by polling
for write-readiness or invoking subsequent libcurl APIs to flush
pending output). The pycurl test helper _wait_readable violates
this by only polling for readability. This latent bug was exposed
by libcurl's new lazy PONG design. While alternative options like
CURLWS_NOAUTOPONG (CURLOPT_WS_OPTIONS) exist, the default mode
requires a robust non-blocking drive.
[Why Increasing Timeout is Ineffective]
Increasing the client-side timeout (even to 60.0s) does not resolve
the deadlock. The mock server's underlying websockets library has
a hardcoded 10.0-second close_timeout:
# websockets/asyncio/connection.py
class Connection:
def __init__(..., close_timeout: float | None = 10, ...):
After exactly 10.0 seconds of waiting for the PONG, the server
forcefully closes the TCP connection (sending a FIN packet). This
wakes the client's select(), which then calls ws_recv() and
immediately fails with CURLE_GOT_NOTHING (52, "Server returned
nothing").
Execution trace obtained by adding print debug messages to the
client-side execution path (60s timeout):
=== PHASE 3: The Deadlock (PING received, PONG trapped) ===
[DEBUG CLIENT] ws_recv raised BlockingIOError at 0.000798s
[DEBUG CLIENT] Entering _wait_readable (remaining=59.493937s) at 0.001020s
[DEBUG CLIENT] Exited _wait_readable at 10.008510s # Trapped for exactly 10.0s until server FIN
[DEBUG CLIENT] Calling ws_recv at 10.008932s
FAILED (pycurl.error: 52, 'Server returned nothing')
strace of pytest running the websocket test case:
18:12:26.595986 recvfrom(11, ..., 65535, 0, ...) = -1 EAGAIN # Read-side yield, socket is empty
18:12:26.596821 pselect6(12, [11], NULL, NULL, ...) <unfinished ...> # Read-only select (writefds is NULL)
...
[10.0-second silence: PONG remains trapped in ws->pending, zero outbound writes on fd 11]
...
18:12:36.603624 close(12) <unfinished ...> # Server close_timeout (10s) expires, sending FIN
18:12:36.603798 <... pselect6 resumed> = 1 (in [11]) # Woken by TCP FIN
18:12:36.604703 recvfrom(11, "", 65535, ...) = 0 # Connection closed (EOF), raising CURLE_GOT_NOTHING
[Proposed Temporary Workaround]
Since a complete fix requires upstream architectural refactoring of
the non-blocking I/O loop, marking this test as xfail is proposed
as a temporary workaround to unblock package migrations in Debian.
diff --git a/tests/test_websocket.py b/tests/test_websocket.py
index d9268f7..0b764db 100644
--- a/tests/test_websocket.py
+++ b/tests/test_websocket.py
@@ -381,6 +381,7 @@ def test_ws_recv_would_block(wscurl, ws_app):
assert exc_info.value.errno == errno.EAGAIN
+@pytest.mark.xfail(run=False, reason="flaky deadlock on non-blocking auto-PONG without write-ready drive")
def test_default_mode_autopongs_server_ping(wscurl, ws_app):
wscurl.setopt(pycurl.URL, ws_app + "/ping-and-report-pong")
wscurl.setopt(pycurl.CONNECT_ONLY, 2)
Thanks,
Keng-Yu Lin