#1140849 apt-cacher-ng: aborted expiration leaks _xstore/rsnap snapshots, self-reinforcing into permanent expiration failure

Package: apt-cacher-ng
Source: apt-cacher-ng
Description: caching proxy server for software repositories
Submitter: Matt Shirel
Date: 2026-06-27 18:23:02 UTC
Severity: normal

#1140849#5
Date: 2026-06-27 18:05:18 UTC

Dear Maintainer,

When a daily expiration run (cron.daily/apt-cacher-ng -> acngtool maint)
aborts -- e.g. on a by-hash validation error -- apt-cacher-ng leaves behind
the per-run volatile-index snapshots it created under
<CacheDir>/_xstore/rsnap/. These snapshots are never cleaned up on the abort
path, so they accumulate one-per-run indefinitely. Once one of the
accumulated snapshots is itself inconsistent, it becomes a new permanent
abort trigger: every subsequent expiration trips over it during "Validating
cache contents" and aborts again. A single transient by-hash hiccup thus
wedges expiration permanently; cache pruning stops cache-wide and the only
escape is a manual "rm -rf _xstore/rsnap".

Environment
-----------
  apt-cacher-ng 3.7.4-1+b2 (daemon reports version 3.7.4)
  Debian GNU/Linux 12 (bookworm), amd64
  ExAbortOnProblems = 1 (compiled default, unset in config)
  Cache proxies amd64 repos (security.ubuntu.com, Ubuntu archive) plus arm64
  (ports.ubuntu.com), so by-hash index sets rotate frequently and unevenly.

Steps to reproduce
------------------
  1. Run apt-cacher-ng caching repos that use Acquire-By-Hash (any current
     Debian/Ubuntu mirror), ideally serving more than one client architecture.
  2. Let the daily expiration run over many days. Each run writes a fresh
     snapshot per volatile index under
     <CacheDir>/_xstore/rsnap/.../<dist>/<numeric-fid>.
  3. Induce or wait for one by-hash inconsistency (an InRelease referencing a
     by-hash entry that was only partially fetched). Expiration aborts:

       There were error(s) processing
         ports.ubuntu.com/ubuntu-ports/dists/noble-security/<fid>, ignoring...
       ByHash error at
         ports.ubuntu.com/ubuntu-ports/dists/noble-updates/InRelease
       Validating cache contents...
       Found errors during processing, aborting as requested.

Observed behaviour
------------------
  - After the abort, _xstore/rsnap/ retains every run's snapshot. Here a
    single dist (ports.ubuntu.com/.../noble-security) had 22 accumulated
    snapshot files (one per run over ~2 weeks), each a 126 KiB copy of the
    same InRelease.
  - The damaged snapshot was reported by its logical path (without the
    _xstore/rsnap/ prefix), making it look like a live-cache fault when the
    file only exists in the snapshot store.
  - _exfail_cnt grew an entry per failed run; nothing was pruned cache-wide
    for ~2 weeks while the proxy kept serving HTTP 200 (an otherwise silent
    failure).
  - Setting ExAbortOnProblems = 0 does NOT help: the by-hash validation step
    still ends the run with "aborting as requested". Per the documentation,
    that option governs only the index-update (preparation) step, so
    operators cannot opt out of the abort on this path.
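
  A quick way to tell whether a cache is in this state is to count snapshot
  files per directory under _xstore/rsnap. A minimal POSIX shell sketch; the
  function name count_rsnap_snapshots is hypothetical (not part of
  apt-cacher-ng), and /var/cache/apt-cacher-ng is only the default CacheDir
  assumed here, the same path used in the Workaround section.

```shell
# Hypothetical diagnostic helper, not part of apt-cacher-ng.
# Prints "<count> <directory>" for every directory under
# <CacheDir>/_xstore/rsnap that contains regular files, highest count first.
count_rsnap_snapshots() {
    cachedir=${1:-/var/cache/apt-cacher-ng}
    rsnap="$cachedir/_xstore/rsnap"
    # No snapshot store at all: nothing to report.
    [ -d "$rsnap" ] || return 0
    # Emit each snapshot file's parent directory, then count per directory
    # and sort numerically by that count, largest first.
    find "$rsnap" -type f -exec dirname {} \; | sort | uniq -c | sort -rn
}
```

  Run it as e.g. "count_rsnap_snapshots /var/cache/apt-cacher-ng | head"; a
  directory whose count rises by one after every daily run (like the
  noble-security directory above with 22 files) shows the leak.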

Expected behaviour
------------------
  1. Snapshot hygiene on abort: snapshots created for a run that aborts
     should be removed rather than retained, so that a single failure cannot
     accumulate state that guarantees future failures (i.e. the snapshot
     store should be bounded or garbage-collected).
  2. Self-recovery: a damaged snapshot in _xstore/rsnap should be discarded
     and regenerated, not treated as a hard abort clearable only by hand.
  Optionally, the by-hash validation error could skip and re-fetch the single
  inconsistent index rather than aborting the entire cache expiration.
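
  Until this is fixed in apt-cacher-ng, point 1 can be approximated from the
  outside, e.g. from cron before the daily run. The POSIX shell sketch below
  keeps only the newest KEEP snapshot files (by modification time) in each
  directory under _xstore/rsnap and deletes the rest. The function name and
  the keep-newest policy are assumptions of this sketch, not apt-cacher-ng
  behaviour. It assumes GNU find/xargs (as on Debian), must not run while an
  expiration is in progress, and leans on the Workaround's finding that a run
  succeeds after _xstore/rsnap is removed entirely. It bounds the growth but
  cannot help if the newest snapshot is itself the damaged one, so point 2 is
  still needed.

```shell
# Hypothetical interim mitigation, not part of apt-cacher-ng: keep only the
# newest KEEP files (by mtime) in each directory under
# <CacheDir>/_xstore/rsnap and delete the older ones.
# Assumes GNU findutils (find -printf, xargs -d) and snapshot paths without
# tab or newline characters (snapshots are named by numeric fid).
prune_rsnap_keep_newest() {
    cachedir=${1:-/var/cache/apt-cacher-ng}
    keep=${2:-1}
    rsnap="$cachedir/_xstore/rsnap"
    [ -d "$rsnap" ] || return 0
    tab=$(printf '\t')
    # One "<dir>\t<mtime>\t<path>" line per file, grouped by directory with
    # the newest file first, then every path after the first $keep in its
    # directory is passed to rm.
    find "$rsnap" -type f -printf '%h\t%T@\t%p\n' |
        sort -t "$tab" -k1,1 -k2,2nr |
        awk -F '\t' -v keep="$keep" 'seen[$1]++ >= keep { print $3 }' |
        xargs -r -d '\n' rm -f --
}
```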

Workaround
----------
  Clearing the working state and stale volatile index metadata, then
  re-running, turns the aborting run into a clean "Done.":

    cd /var/cache/apt-cacher-ng
    rm -rf _xstore/rsnap; rm -f _expending_damaged _exfail_cnt
    find . -type d -path '*/dists/*' -name by-hash -prune -exec rm -rf {} +
    find . -type f -path '*/dists/*' \( -name 'InRelease*' -o -name 'Release' \
         -o -name 'Release.gpg' -o -name 'Release.head' \
         -o -name 'Release.[0-9]*' \) -delete
    /usr/lib/apt-cacher-ng/acngtool maint -c /etc/apt-cacher-ng \
         SocketPath=/run/apt-cacher-ng/socket

Impact
------
  Cache expiration silently stops cache-wide after one transient by-hash
  error; disk usage grows unbounded until noticed (our cache held ~1.3 GB of
  stale metadata, which was pruned on the first clean run: 2.8 GB -> 1.5 GB);
  recovery requires manual filesystem surgery.
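
  Because the failure is otherwise silent, a cron job can at least flag it. A
  minimal sketch, assuming the maint run's output is captured (as when running
  the acngtool command from the Workaround by hand): the hypothetical function
  classify_expiration_output looks for the "aborting as requested" and "Done."
  markers. Those strings are taken from the log excerpts in this report and
  may differ between apt-cacher-ng versions.

```shell
# Hypothetical monitoring helper, not part of apt-cacher-ng. Reads the output
# of an expiration run on stdin and returns:
#   1 if it contains "aborting as requested" (the abort reported here),
#   0 if it contains "Done." and no abort marker,
#   2 if neither marker was found (unknown state; treat as a failure too).
classify_expiration_output() {
    out=$(cat)
    case $out in
        *"aborting as requested"*) return 1 ;;
        *"Done."*) return 0 ;;
        *) return 2 ;;
    esac
}
```

  For example:

      /usr/lib/apt-cacher-ng/acngtool maint -c /etc/apt-cacher-ng \
          SocketPath=/run/apt-cacher-ng/socket 2>&1 |
        classify_expiration_output ||
        logger -t apt-cacher-ng "expiration did not finish cleanly"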

Thanks for maintaining apt-cacher-ng.

Regards,
Matt Shirel