- Package:
- src:systemd
- Source:
- src:systemd
- Submitter:
- Michael Biebl
- Date:
- 2025-06-14 20:07:01 UTC
- Severity:
- normal
- Tags:
Looking at https://ci.debian.net/packages/s/systemd/unstable/amd64/ , systemd has been failing on debci since about the beginning of May. Asking around on #debci, this might be kernel related, as the debci related systems were upgraded to bookworm around that time.
Small update: I can reproduce the failures in a bookworm (qemu) VM, using LXC. Only upgrading the kernel to the one from trixie [1] is sufficient to make autopkgtest pass. [1] 6.4.0-2-amd64
The plot thickens...
On 23.08.23 at 13:20, Michael Biebl wrote:
For completeness' sake, the failing tests are:
# autopkgtest systemd -- lxc autopkgtest-bookworm
784s hostnamed FAIL non-zero exit status 1
784s localed-locale FAIL non-zero exit status 1
784s localed-x11-keymap FAIL non-zero exit status 1
784s networkd-test.py FAIL non-zero exit status 1
784s boot-and-services FAIL non-zero exit status 1
784s unit-tests FAIL non-zero exit status 1
# autopkgtest systemd -- lxc autopkgtest-trixie
782s hostnamed FAIL non-zero exit status 1
782s localed-locale FAIL non-zero exit status 1
782s networkd-test.py FAIL non-zero exit status 1
782s boot-and-services FAIL non-zero exit status 1
Running e.g.
# autopkgtest --test-name=hostnamed systemd -- lxc autopkgtest-trixie
I see the following error in the journal:
Aug 23 14:23:50 debian audit[4096]: AVC apparmor="DENIED" operation="file_lock" profile="lxc-autopkgtest-lxc-iomhit_</var/lib/lxc>" pid=4096 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 requested_mask="send"
Aug 23 14:23:50 debian kernel: audit: type=1400 audit(1692793430.788:33): apparmor="DENIED" operation="file_lock" profile="lxc-autopkgtest-lxc-iomhit_</var/lib/lxc>" pid=4096 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 requested_mask="send"
Aug 23 14:23:50 debian kernel: audit: type=1400 audit(1692793430.788:34): apparmor="DENIED" operation="file_lock" profile="lxc-autopkgtest-lxc-iomhit_</var/lib/lxc>" pid=4096 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 requested_mask="send"
Aug 23 14:23:50 debian audit[4096]: AVC apparmor="DENIED" operation="file_lock" profile="lxc-autopkgtest-lxc-iomhit_</var/lib/lxc>" pid=4096 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 requested_mask="send"
With the 6.4 kernel, no such error happens. So this looks to me like an AppArmor issue, thus reassigning to the apparmor package. Dear AppArmor maintainers: can you please have a look?
If you need further information, please let me know. Regards, Michael
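For anyone triaging similar denials, the audit records quoted above can be split into fields with a few lines of Python. This is only a sketch: the function name `parse_avc` is made up for illustration, and it assumes the `key=value` / `key="value"` layout seen in the journal excerpt above.

```python
import re

# Matches key=value pairs where the value is either double-quoted or a bare token.
_PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_avc(line: str) -> dict[str, str]:
    """Return the key=value fields of one audit record as a dict (hypothetical helper)."""
    fields = {}
    for key, raw, quoted in _PAIR.findall(line):
        # Use the unquoted capture when the value was written in double quotes.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


# One of the records from the journal excerpt in this report.
sample = (
    'Aug 23 14:23:50 debian audit[4096]: AVC apparmor="DENIED" '
    'operation="file_lock" '
    'profile="lxc-autopkgtest-lxc-iomhit_</var/lib/lxc>" pid=4096 '
    'comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 '
    'requested_mask="send"'
)
record = parse_avc(sample)
print(record["operation"], record["family"], record["sock_type"])
```

The interesting combination is `operation="file_lock"` together with `family="unix"`: a lock operation reported against a unix socket.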
On 23.08.23 at 14:32, Michael Biebl wrote: It appears this was already reported separately as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1038315 with the corresponding upstream bug https://github.com/lxc/lxc/issues/4333 Apparently any service using PrivateNetwork=yes and running inside lxc will trigger this AppArmor violation.
What we have found so far is that the AppArmor policy of lxc breaks any systemd service using PrivateNetwork=yes or PrivateIPC=yes when run under lxc (on bookworm, using the bookworm kernel). I wonder what the best course of action is here. Should we disable the AA policy of lxc via a stable upload of the lxc package until the root cause is found? Unfortunately I know too little about AppArmor and lxc's AppArmor policy, and my attempts to ask around for help haven't been successful so far. Regards, Michael
On 31.08.23 at 08:41, Michael Biebl wrote: I.e. by setting `lxc.apparmor.profile = unconfined` in /etc/lxc/default.conf and regenerating the autopkgtest container on bookworm, the failures are gone.
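As an illustration of that workaround, here is a small Python sketch that rewrites one key in lxc-style `key = value` configuration text. The helper name `set_lxc_key` and the sample file content are assumptions for the example; a real /etc/lxc/default.conf may contain different lines, and the autopkgtest container still has to be regenerated afterwards.

```python
def set_lxc_key(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key = value` set, replacing an existing entry."""
    out, replaced = [], False
    for line in config_text.splitlines():
        # Everything before the first "=" is the key; commented lines never match.
        name = line.split("=", 1)[0].strip()
        if name == key:
            out.append(f"{key} = {value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        # Key was absent: append it at the end.
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Hypothetical starting content, for illustration only.
before = "lxc.net.0.type = empty\nlxc.apparmor.profile = generated\n"
after = set_lxc_key(before, "lxc.apparmor.profile", "unconfined")
print(after)
```

The function works on text rather than on the file itself so that the change can be reviewed before anything under /etc is touched.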
Hello everyone, same case for systemd services using DynamicUser=yes Kind regards, Dan
Hello, On Thursday, 31 August 2023, 08:41:59 CEST, Michael Biebl wrote: Two quick hints, but let me warn you that I'm not familiar with lxc and also didn't check the content of the lxc-autopkgtest-lxc-iomhit_* profile. https://github.com/lxc/lxc/issues/4333 indicates that this issue was fixed in a (much) newer kernel - but that's probably not news to you since you wrote that comment ;-) That said - the DENIED log entry translates to `unix send type=dgram,` You could try whether adding this rule to the lxc-autopkgtest-lxc-iomhit_* profile helps (but if the issue is really on the kernel side, my hope is limited). For testing, you could also try a broader unix send, or even unix, rule - but please don't add these broader rules to the production profile. Regards, Christian Boltz
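The translation from the DENIED entry to a candidate rule can be sketched in Python as follows. The function name is hypothetical, and the sketch only covers the single-permission unix case discussed in this message; it is not a general audit-to-policy converter.

```python
def unix_rule_from_denial(fields: dict[str, str]) -> str:
    """Build a candidate `unix` rule from the fields of one DENIED record.

    Hypothetical helper: it handles only the shape seen in this report,
    i.e. family="unix" with a single permission in requested_mask.
    """
    if fields.get("family") != "unix":
        raise ValueError("not a unix socket denial")
    mask = fields["requested_mask"]
    if len(mask.split()) != 1:
        raise ValueError("only a single requested permission is handled here")
    # AppArmor rules are terminated by a comma.
    return f"unix {mask} type={fields['sock_type']},"


# Field values copied from the log entry quoted earlier in this report.
denial = {"family": "unix", "sock_type": "dgram", "requested_mask": "send"}
print(unix_rule_from_denial(denial))
```

As noted above, such a rule is only worth testing; if the denial comes from a kernel-side bug, adding it may make no difference.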
On 31.08.23 at 19:54, Christian Boltz wrote: The profile above seems to be autogenerated and I only found a binary file with that name in /var/cache/apparmor. The only way to fix the container was to use the aforementioned `lxc.apparmor.profile = unconfined`. I think we should do that as the breakage is rather widespread and I already see individual packages trying to work around that to at least keep debci afloat. See e.g.: https://salsa.debian.org/systemd-team/systemd/-/merge_requests/211 https://salsa.debian.org/debian/pdns/-/commit/637e54ef73386541086da430553b82db78266bac or disabling the systemd hardening options completely: https://salsa.debian.org/utopia-team/polkit/-/blob/master/debian/patches/debian/Don-t-use-PrivateNetwork-yes-for-the-systemd-unit.patch This is not a good outcome, and the problem will become more apparent with debci running on bookworm now. Regards, Michael
On 01.09.23 at 13:23, Michael Biebl wrote: I went ahead and submitted https://salsa.debian.org/lxc-team/lxc/-/merge_requests/18 since I don't see another solution atm. Looping in the release team as well for their input. Regards, Michael
I don't think we have a good understanding of the root cause of this issue. Initially we thought this was a known upstream issue with all but very recent versions of apparmor and a corresponding lxc profile fix [0]. However, it appears this is a different issue that somehow depends on the interaction of bookworm's versions of the kernel, apparmor, and/or lxc. A minimal reproducer is to install bookworm and create a container with a systemd service using a hardening option like PrivateNetwork=yes. With the latest bookworm kernel (6.1.38-4), the service will fail. But grab a kernel from testing (6.4.11-1) and things work -- with no other changes required. I tried the "oldest" kernel on snapshot.d.o post 6.1 series (6.3.1+1~exp1 [1]) and the service works properly with that version as well. So, something changed in the kernel (either upstream or in Debian's packaging) between 6.1 and 6.3 that "unbreaks" services within lxc containers. Given that simply installing a newer kernel fixes things, I am hesitant to start making changes to lxc until we actually understand what's changed when running the newer kernel and how it's affecting lxc's behavior. I have tried tweaking the apparmor profile that's generated for containers (the relevant part is defined in the variable AA_PROFILE_UNIX_SOCKETS in src/lxc/lsm/apparmor.c), but haven't had any success with a workaround. I am not super familiar with apparmor, so maybe I'm not specifying things right, but I've previously tried the sort of rules Christian suggested, none of which have had any effect. I strongly dislike the idea of a blanket disabling of apparmor profiles by default for all lxc installs, since apparmor is one of the ways of helping to ensure isolation of containers. For the specific instance of debci, /etc/lxc/default.conf can be modified post-lxc install to change lxc.apparmor.profile from "generated" to "unconfined" for the time being.
Mathias--- [0] -- https://github.com/lxc/lxc/issues/4333 [1] -- https://snapshot.debian.org/package/linux-signed-amd64/6.3.1%2B1~exp1/
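To make the reproducer above concrete, here is a Python sketch that renders a throwaway unit file carrying one of the hardening options named in this report. The unit description and the use of /bin/true are assumptions for illustration; the report does not say which unit was used for testing.

```python
def render_unit(option: str = "PrivateNetwork=yes") -> str:
    """Render a minimal oneshot unit carrying one hardening option.

    Hypothetical reproducer: the unit contents are illustrative and are
    not taken from the bug report.
    """
    return "\n".join(
        [
            "[Unit]",
            "Description=Reproducer for the lxc/AppArmor unix socket denial",
            "",
            "[Service]",
            "Type=oneshot",
            "ExecStart=/bin/true",
            option,
            "",
        ]
    )


# Options reported in this thread as triggering the failure.
for opt in ("PrivateNetwork=yes", "PrivateIPC=yes", "DynamicUser=yes"):
    print(render_unit(opt))
```

Installed inside a bookworm container on an affected host kernel, a unit like this would be expected to fail to start, while the same unit without the option should run; that expectation is inferred from the reports above rather than verified here.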
Thanks for the investigation. This led me to think of something that would work around this issue, but may have bigger consequences. I'm wondering whether we should, as a policy, run backports kernels on the ci.debian.net workers. Given the most important use case is testing testing¹, having a kernel that is closest to the one in testing might make sense. ¹ pun intended Of course, this does not prevent us from having QEMU workers, and I want to provide that at some point. But since we won't be able to have QEMU for all architectures anyway, I still think running backports kernels in the lxc workers might be a valid strategy.
Hi everyone, On 02.09.23 at 13:09, Antonio Terceiro wrote: Nod. Right, these are my findings as well. I also tested downgrading apparmor to 2.13.6-10 (i.e. the version from oldstable) on a bookworm system. This was also sufficient to unbreak lxc. So it "looks" like apparmor 3.x makes assumptions about the kernel that are not fulfilled by the kernel 6.1.x in bookworm. especially/mainly for debci. I guess we have three options here: a/ upgrade the kernels to the one from backports as suggested by Antonio b/ disable apparmor confinement for lxc on debci via some debci specific configuration c/ disable apparmor confinement for lxc in bookworm via a stable upload of the lxc package The MR I proposed is c/, as I don't know how to implement a/ or b/. That said, I would be fine with a/ and b/ as well, as this would buy us time to investigate this issue without being under the pressure of causing debci failures. Those debci failures are hard to debug and I would like to avoid having individual maintainers waste time on it. Do the debci maintainers / lxc maintainers / release team have any preference regarding a/, b/ and c/ ? Michael
I'm tentatively raising this to RC, mainly to make this issue more visible for other maintainers.
Hi, I agree with you, but also consider that, with this issue being there since ~ April 2023, we don't need to rush. What I fear a bit is that if we do any of the three, Debian infra is not affected anymore, which removes some incentive to find the root cause. a, b, or c means that Debian maintainers don't need to dive into it anymore, but who knows which downstream project (volunteers or paid alike) will need to look into the problem in the future if we don't fix it inside packaging? One part of me likes the ci.d.n infrastructure to run stable as an example of "eat your own dogfood". Another part of me agrees with Antonio that it makes sense for it to run a backports kernel, to be as close to testing as we reasonably (maintenance-wise) can get. Because we have a known issue at hand, the balance goes to backports for me. If Antonio doesn't beat me to it, I'll get to it (although I don't know yet how to do that in our configuration [1] and exclude riscv64 too). I have manually upgraded the s390x host and rebooted, so that can serve as a test arch. Paul [1] https://salsa.debian.org/ci-team/debian-ci-config
On 03.09.23 at 10:50, Paul Gevers wrote: ng? https://ci.debian.net/data/autopkgtest/testing/s390x/s/systemd/37374052/log.gz Thanks!
I took a quick look through v6.1..v6.3.1; there is a patch that I think is the likely fix. It first landed in v6.2: 1cf26c3d2c4c apparmor: fix apparmor mediating locking non-fs unix sockets. It matches the reported audit logs. Unfortunately it does not have a Fixes tag, but as best I can figure it should be applied all the way back to 56974a6fcfef apparmor: add base infastructure for socket mediation. How/where this bug surfaces partly depends on the userspace policy and compiler, which combines the feature set supported by the kernel with what policy claims to support. So it is possible to have an affected kernel but not trigger the bug.
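Since the fix is said to have first landed in v6.2, a rough version check can be sketched in Python. The function names are made up, and the check only looks at the upstream base version: a distribution kernel older than 6.2 may still carry a backport, so a `False` result only means the fix is not implied by the version string.

```python
import re


def upstream_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a kernel release or package version string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {release!r}")
    return int(m.group(1)), int(m.group(2))


def has_upstream_fix(release: str) -> bool:
    """True if the base version is 6.2 or newer, where 1cf26c3d2c4c first landed."""
    return upstream_version(release) >= (6, 2)


# Versions mentioned in this report.
for release in ("6.1.38-4", "6.3.1+1~exp1", "6.4.0-2-amd64"):
    print(release, has_upstream_fix(release))
```

On a live system the string could be taken from `platform.release()` in the standard library.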
Hello, On Saturday, 2 September 2023, 01:13:11 CEST, Mathias Gibbens wrote: I asked in #apparmor, and John answered:
[11:04:33] <cboltz> can someone have a look at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1050256 ? Short version: Debian gets unix denials when running lxc with kernel 6.1.38 from bookwork, but things work with kernel 6.3.1
[19:19:41] <jjohansen> cboltz: ok, I will try and look at it today
[07:00:34] <jjohansen> cboltz: I didn't see anything that would cause unix failures in a first pass. I will take another pass at it tomorrow
[10:01:30] <jjohansen> cboltz: commit 1cf26c3d2c4c apparmor: fix apparmor mediating locking non-fs unix sockets
So you could test whether the bookworm kernel with 1cf26c3d2c4c applied on top fixes the issue. To answer a question from a later mail: On Sunday, 3 September 2023, 02:56:05 CEST, Michael Biebl wrote: The difference is in the abi levels - without an abi/ include specified, unix rules don't get enforced (= allow everything), while with abi/3.0 and AppArmor >= 3.x userspace, unix rules get enforced. abi/3.0 got introduced in AppArmor 3.0, and my guess is that the abi/3.0 include was also added to the lxc profile. Actually the explanation might be slightly different (same result, but without abi/3.0 in the lxc profile): It looks like the Debian AppArmor maintainers pinned the abi to /etc/apparmor.d/abi/kernel-5.4-outoftree-network which, like abi/3.0, includes enforcing unix rules. (Note: I'm only looking at https://salsa.debian.org/apparmor-team/apparmor.git/ since I don't have a Debian machine running.) For completeness: 2.13.x doesn't support abi at all (besides ignoring abi/* includes if it finds them in a profile), so even if you have a profile with abi/3.0, unix rules won't be enforced. There's an exception: Ubuntu kernels carry some patches to enable unix and some other rules even with older AppArmor versions. Regards, Christian Boltz
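The abi explanation above can be restated as a tiny decision function in Python. This is only a model of the message, not of AppArmor itself: the function name and the boolean `abi_in_effect` (an abi/ include in the profile, or an abi pinned by the distribution) are assumptions, and the Ubuntu kernel exception is deliberately left out.

```python
def unix_rules_enforced(userspace_major: int, abi_in_effect: bool) -> bool:
    """Model of the explanation above.

    unix rules are enforced only when an AppArmor 3.x or newer userspace
    compiles policy with an abi in effect. 2.13.x ignores abi/* includes,
    so it never enforces them in this model.
    """
    return userspace_major >= 3 and abi_in_effect


# Scenarios described in the message; labels are informal.
scenarios = {
    "2.13.x userspace, abi/3.0 in profile (ignored)": (2, True),
    "3.x userspace, no abi include and no pinned abi": (3, False),
    "3.x userspace, abi/3.0 or pinned abi": (3, True),
}
for label, args in scenarios.items():
    print(f"{label}: enforced={unix_rules_enforced(*args)}")
```

Read this way, downgrading apparmor to 2.13.x hides the problem because the unix rules are no longer enforced at all, not because the kernel behaves differently.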
Thanks for the pointer John -- I think that is the fix we've been looking for! Commit 1cf26c3d2c4c doesn't apply cleanly to the v6.1 tree due to the other commits from the patchset of Oct 3, 2022 that modified a bunch of the apparmor code. Because I couldn't quickly cherry-pick all the changes without amassing a large diff, I made the small proof-of-concept patch at the end of this message and applied it to the 6.1.38-4 kernel from bookworm. Booting with the patched kernel allows services to start up in containers without any issues. :) So, I think the next step should be to get that commit properly backported to the v6.1 longterm tree and included in an upstream release. Hopefully that can happen in time for it to be bundled with the kernel updates for bookworm's point release next month. If not, we should be sure to get it into Debian's packaging so at least there's a proper fix available. I'm happy to help test any proposed patch for this fix on my end. Mathias
On 04.09.23 at 20:23, Mathias Gibbens wrote: Thanks for the update Mathias, this looks very promising. A stable update of the Linux 6.1.x kernel would obviously be the ideal solution. John, could you help with getting this fix into 6.1.x? Regards, Michael
yes, I am working on a patch.
Hi, All ci.d.n workers (except riscv64) now run the kernel from bookworm-backports. systemd passes its autopkgtest again in unstable, testing and stable. Paul
Hi, Michael Biebl (2023-08-23): I'm sorry I was not able to do so yet. I plan to catch up during the next few days at DebConf. But I know very little about LXC/AppArmor integration so most likely the best I can do is to help connect the right people. Cheers,
Hi again, Thank you all for working both on workarounds for Debian CI and on a proper upstream Linux kernel fix. Impressive cross-team work! :) At this stage it seems clear that the bug and the corresponding ideal fix are in the AppArmor part of src:linux, and the bug affects at least src:apparmor and src:lxc. I'd like to reflect this in the metadata of #1050256 by reassigning the bug to Linux, and adding "affects" indications. I'll do so in the next few days unless someone objects soon. Doing so will also be an opportunity for me to sum up the problem for the maintainers of src:linux, and let them know about our desired timeline: ideally this would be fixed in the upcoming Bookworm point-release. This being said, if said timeline can't be met in src:linux, it'll be up to the maintainers of LXC in Debian to decide what they want to do in the upcoming Bookworm point-release. If I misunderstood something important, please let me know. Cheers,
On 09.09.23 at 14:20, intrigeri wrote: +1 It also affects at least src:systemd, src:pdns, src:policykit-1. All those packages have added workarounds for this issue. I'll revert the workaround in systemd and notify the maintainers of pdns and policykit-1. Sounds good to me. For now, given that all the debci hosts are running the backports kernel, I'm downgrading the severity again. When you do the reassignment, you should probably merge this bug report with #1038315 and #1042880, now that we know what the root cause is. Regards, Michael
Hi John, I wanted to check in to see if you've had a chance to work on that patch for the 6.1 kernel. The deadline for package updates being included in the 12.2 point release is in roughly two weeks, but given this will be a patch for the kernel I'd really like to have something tested and handed over to the src:linux team well before then. Thanks, Mathias
Not having heard any objections, please feel free to reassign this bug. As you said, this will give the src:linux maintainers a heads up, even if the patch isn't quite ready yet (but hopefully in time for the 12.2 point release). Mathias
Control: reassign -1 src:linux
Control: retitle -1 AppArmor breaks locking non-fs Unix sockets
Control: affects -1 src:apparmor src:lxc src:systemd src:pdns src:policykit-1
Control: found -1 6.1.38-1
Control: found -1 6.1.38-2
Control: notfound -1 6.3.1-1~exp1
Hi Debian Kernel Team, In the last month or so, a number of people from various Debian teams and other distributions have been tracking down a regression that affects systems upgraded to Bookworm: services that use certain systemd facilities such as PrivateNetwork=yes fail to start in LXC/LXD containers. Among other things, this breaks the autopkgtests of many packages, such as systemd, on ci.debian.net (#1050256). This was tracked down to a kernel regression, for which a fix landed in Linux 6.2: 1cf26c3d2c4c apparmor: fix apparmor mediating locking non-fs unix sockets Work is ongoing to backport the fix to linux-stable/linux-6.1.y. I'm Cc'ing John and Mathias who have been working on this. FYI, ideally this would be fixed in the upcoming Bookworm point-release (12.2, early October). Current workarounds:
- ci.debian.net was upgraded to the bookworm-backports kernel
- various package maintainers have added workarounds such as disabling PrivateNetwork=yes for autopkgtests
Cheers,
Dear lxd and systemd maintainers, Michael Biebl (2023-09-11): FTR I did not dare to merge these myself: perhaps you want to keep separate bug reports to track workarounds on top of #1050256 that's tracking the root cause, or something. Cheers,
Hi all, We're having issues [1] with the (backports and) unstable kernel on our main amd64 host, so we reverted back to the stable kernel for amd64. Paul [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052130
close 1050256 6.3.1-1~exp1
tags 1050256 + bookworm upstream
thanks
Hi, Thanks for the details. Has this already been sent to the stable maintainers? I do not see it yet on the stable list. Regards, Salvatore
I believe that John has been working on the fix for the 6.1 branch, although I don't know what the status is. I don't have the necessary familiarity with apparmor internals to attempt to backport the fix myself, but I'll be very happy to test once it's available. Mathias
Hi, We're having issues [2] with the backports kernel on arm64 so our arm64, armhf and armel hosts are back to the previous backports (arm64) kernel. I'm slightly wondering if the next point release (on Saturday) will bring us a fixed kernel for this issue? Given that this is the second time in 3 months we experience an issue with backports kernels, I think we'll have to revert our hosts back to stable kernels for maintainability reasons. Paul [2] https://bugs.debian.org/1057282
Hi Paul, locking non-fs unix sockets") for the 6.1.y stable series has not landed yet, so it's not included in the 6.1.64-1 update of the upcoming point release next weekend. John, as it was said you are working on having the fix backported to linux-6.1.y, is this still WIP? Regards, Salvatore
Hi John,
John, did you have a chance to work on this backport for 6.1.y stable
upstream so we could pick it up downstream in Debian in one of the next
stable imports? Cherry-picking 1cf26c3d2c4c ("apparmor: fix apparmor
mediating locking non-fs unix sockets") does not work without also
having e2967ede2297 ("apparmor: compute policydb
permission on profile load") AFAICS, so that needs a 6.1.y-specific
backport submitted to stable@vger.kernel.org?
I think we could have people from this bug as well providing a
Tested-by when necessary. I'm not feeling confident enough to
provide such a patch myself to send to stable (with you only giving
an Acked-by/Reviewed-by), so if you can help out here with your
upstream hat on, that would be more than appreciated and welcome :)
Thanks a lot for your work!
Regards,
Salvatore
I played around with this a bit the past week as well, and came to the same conclusion as Salvatore did that commits e2967ede2297 and 1cf26c3d2c4c need to be cherry-picked back to the 6.1 stable tree. I've attached the two commits rebased onto 6.1.y as patches to this message. Commit e2967ede2297 needed a little bit of touchup to apply cleanly, and 1cf26c3d2c4c just needed adjustments for line number changes. I included some comments at the top of each patch. With these two commits cherry-picked on top of the 6.1.69 kernel, I can boot a bookworm system and successfully start a service within a container that utilizes `PrivateNetwork=yes`. Rebooting back into an unpatched vanilla 6.1.69 kernel continues to show the problem. While I didn't see any immediate issues (ie, `aa-status` and log files looked OK), I don't understand the changes in the first commit well enough to be confident in sending these patches for inclusion in the upstream stable tree on my own. Mathias
Hi John, Have you had a chance to look at this for 6.1.y upstream? Asking/poking since the point release dates are now clear: https://lists.debian.org/debian-security/2024/01/msg00005.html If possible I would like to include those fixes, but only if they are at least queued for 6.1.y itself, so as not to diverge from upstream. Otherwise we will wait another round, which usually means 2 months given the point release cadence. Regards, Salvatore
I am looking at it right now, I should be done with it today
The changes are strictly more than necessary for the fix. They are part of a larger change set that is trying to clean up the runtime code by changing the permission mapping from a runtime operation to something that is done only at policy load/unpack time. The advantage of this approach is that, while it is a larger change than strictly necessary, it backports patches that are already upstream, keeping the code closer and making backports easier. Georgia did a minimal backport fix by keeping the version as part of policy and doing the permission mapping at runtime. I have included that patch below. Its advantage is that it is a minimal change to fix the issue. I am happy with either version going into stable. Do you want to send them or do you want me to do it? Acked-by: John Johansen <john.johansen@canonical.com>
Hi John, Thanks a lot, that is *really* much appreciated! If you can send them, that would be great: I think that if they come directly from you, the trust from Greg or Sasha is higher. Otherwise I think they will explicitly want an ack on that submission thread from you (or a pointer to this Debian downstream bug). Greg will probably want the backport approach of the two commits if it is feasible and we do not expect regressions from it. But you are definitely in a better position to judge this :) Thanks again! Regards, Salvatore p.s.: feel free to CC us as well in the upstream stable submission.
On Sun, 28 Jan 2024 10:57:03 +0100 Salvatore Bonaccorso <salvatore.bonaccorso@gmail.com> wrote: Hi John, Is there any update on this? As far as I am aware this patch has not been sent for backporting yet, so apparmor in 6.1 is still broken, and the CI still fails because of it. Is there any chance you could please take care of that, so that we can finally fix this issue? Thanks!
Hi, For those watching this bug: John has prepared backports in his tree, with both approaches: https://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor.git/log/?h=debian-two-patch-1780227 and https://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor.git/log/?h=debian-backport-1780227 (but with the open question which one will be submitted for stable. From upstream stable point of view probably the two patch backport approach would be the preferred one). Regards, Salvatore
https://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor.git/log/?h=debian-two-patch-1780227 https://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor.git/log/?h=debian-backport-1780227 Very nice, thank you! In the meantime, I found a way to reliably detect this and gracefully skip it in systemd, so debci is now fixed. However, it still results in PrivateNetwork= being quietly disabled, so the backport is still very much needed, as it is a useful security feature.
Hi John, We still have this issue open for 6.1.y upstream TTBOMK. If you are confident as maintainer with either of the two approaches, would it be possible to submit them for stable? If the preferred one then gets accepted and queued, we might already cherry-pick the solution for us, but at this point we can wait for the respective 6.1.y stable version which will include the fix. Regards, Salvatore
Hi John, Friendly ping. Any news here? Regards, Salvatore
Hi John, Anything we can do there to help on the decision which set of fixes could land in the 6.1.y stable series? Would it help if I prod Mathias to test both variants for feedback? Or is there a problem you envision already by trying to backport those fixes to upstream 6.1.y? Thanks for your work, and sorry for pestering you again about it :( Regards, Salvatore
While at it, I noticed that in the above commits for https://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor.git/log/?h=debian-two-patch-1780227 or https://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor.git/log/?h=debian-backport-1780227 it might be worth adding a Link: https://bugs.debian.org/1050256 Do you see any problems with either of the two you prepared? If not, is there something you are missing from us downstream? Regards, Salvatore