- Package: debian-policy
- Source: debian-policy
- Submitter: Sean Whitton
- Date: 2021-09-15 15:36:04 UTC
- Severity: normal
Hello, Thank you for sharing this link -- it seems like Fedora have thought harder about this than we have, at least at the level of the whole project. We can't jump straight to something as involved as that, but threads like this on -devel suggest to me that Policy's discussion of vendoring needs to be expanded. In particular, Policy should explain /why/ bundling is best avoided, and the consensus that it sometimes has to happen should be noted, along with mention of registering bundled copies with the security team where appropriate.
I would very much like Policy to explain **why** bundling should be avoided. I spend a lot of time dealing with that when packaging Docker, and at some point I realized that I couldn't even explain to myself why I was spending so much time un-bundling the world out of Docker. I just had a vague understanding that "bundling is bad", and I understood the security issues of bundled code. But I wish I had more details on "how bad it is", just so that I could justify to myself spending so much time on it. Sometimes the line between time well spent and time wasted is very thin, and you're not sure which side you stand on. Also, it turns out that sometimes bundling can't be avoided. I don't know if it's possible to come up with some general guidelines on that. We have it documented in the README.source of docker, but it applies to Docker's special case, and I don't claim it can be extended to the general case. During all the time I was questioning myself about the reason to un-bundle, the only official documentation I found was the short paragraph in the Debian Policy [1], which is quite thin. Only now, through the thread on debian-devel, did I discover that there is more information in the wiki. I couldn't find this information when I needed it, but maybe I'm just not good at finding a needle in a haystack ;) All of that to say: I would find it very helpful to have some more "official information" from Debian on bundled/vendored/embedded code: the rationale for un-bundling, and possibly some guidelines for keeping bundles. Arnaud [1]: https://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles
Hi, Sean Whitton wrote: My first instinct was that this belongs in devref, not Policy, since it is more about the project than about consistency and interoperability issues that directly affect packaging tools and user experience. But then I realized that the Debian Free Software Guidelines, for example, are part of policy. This topic would similarly be a good fit for ch-archive. Thanks for filing it. Jonathan
Hi, Arnaud Rebillout wrote: For reference, I think you're referring to https://wiki.debian.org/EmbeddedCodeCopies https://wiki.debian.org/UpstreamGuide#No_inclusion_of_third_party_code Thanks for that. It may be a good place to find text to reuse. Jonathan
Over this last year there seems to have been a noticeable divergence of maintainer opinion, on what has become known as vendoring, from a strict reading of [policy 4.13]. I think it's notable that the heading is [Embedded] copies and was [Convenience] copies since its inception; thankfully I found a request to expand this section using [vendoring].

[policy 4.13]: https://www.debian.org/doc/debian-policy/ch-source.html#embedded-code-copies
[Embedded]: https://bugs.debian.org/955036
[Convenience]: https://bugs.debian.org/392362
[vendoring]: https://bugs.debian.org/907051

It is my reading of the situation that not only has this practice become more prevalent across multiple ecosystems since 2008, but that it can be a good thing when upstreams use it to better modularise their code. As a consequence, and in particular for large upstream projects, it is not a good use of maintainer time to package every single vendored library as a separate source package. See e.g. [kubernetes], [python BoF @25mins], [android-platform-tools], and even [uscan] grouping used by nodejs.

[kubernetes]: https://bugs.debian.org/971515#172
[python BoF @25mins]: https://meetings-archive.debian.net/pub/debian-meetings/2021/DebConf21/debconf21-97-python-team-bof.webm
[android-platform-tools]: https://salsa.debian.org/android-tools-team/admin/-/issues/40
[uscan]: https://manpages.debian.org/uscan#grouped_package

Is there any objection to the following summary?

1. If the reused code is small and intended to be embedded into a package, then this MUST be documented in the [security-tracker].
2. If the included project has already been packaged, then the Debian version SHOULD be used instead.
3. If modifications have been made, then those changes SHOULD be forwarded and/or the package ported to the official version.
4. When 2 or 3 are too onerous to maintain, the package MAY use the convenience copy but MUST document why in README.source and SHOULD be included in the [security-tracker].
5. Where only a small number of unrelated projects are bundled, they SHOULD be uploaded as separate source packages.
6. If the upstream authors are largely the same, then vendored sub-projects MAY simply be built together as the same source.
7. A large number of vendored dependencies used only together for a single Debian package MAY be grouped into a single source upload.
8. If 6 or 7 are used initially but a new package has some overlap, then the new package MUST NOT duplicate the vendoring. The duplication SHOULD be packaged separately, then the original package SHOULD be updated to use the Debian version instead.
9. When upstream has a proven track record of promptly handling security vulnerabilities inside their vendored dependencies, then maintainers SHOULD follow the same practice, updating versions in lockstep.

I might be misusing the MUST/SHOULD/MAY, so those can be dropped as needed, but I tried to capture the accepted practice and deliberately used all the different historical terms. For comparison, there's also [Fedora] policy, but apart from not having an in-band "bundled", I also think that the line has ended up being drawn marginally differently.

[security-tracker]: https://salsa.debian.org/security-tracker-team/security-tracker/-/blob/master/data/embedded-code-copies
[Fedora]: https://docs.fedoraproject.org/en-US/packaging-guidelines/#bundling
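Point 8 is mechanical enough to check automatically once the bundled components of each grouped source package are known. The Python sketch below assumes that mapping is already available (for instance collected by hand, or from each package's versioned Provides); how to obtain it is left open, and the package `node-new-tool` in the example is hypothetical. It reports every component embedded by more than one source package, i.e. candidates for being packaged separately.

```python
from collections import defaultdict


def find_duplicated_vendoring(bundles: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each bundled component to the source packages embedding it,
    keeping only components embedded by more than one source package."""
    owners: dict[str, set[str]] = defaultdict(set)
    for source, components in bundles.items():
        for component in components:
            owners[component].add(source)
    # Sort for stable, reviewable output.
    return {component: sorted(sources)
            for component, sources in sorted(owners.items())
            if len(sources) > 1}


if __name__ == "__main__":
    # Hypothetical data: which components each grouped source package bundles.
    bundles = {
        "node-resolve": {"node-types-resolve", "is-core-module", "path-parse"},
        "node-new-tool": {"path-parse", "some-other-module"},
    }
    for component, sources in find_duplicated_vendoring(bundles).items():
        print(f"{component} is bundled by: {', '.join(sources)}")
```

A component reported here is exactly the "overlap" case in point 8: rather than a second grouped copy, it would become its own source package.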
On Fri, 3 Sep 2021 at 00:39, Phil Morrell <debian@emorrp1.name> wrote: A couple of naive questions: - Should a package's debian/control list bundled dependencies, to help avoid duplication? - When a bundled dependency is already available in Debian, and is the same (unpatched), should the upstream source tarball be repacked without that dependency, or should it be kept inside the source tarball? Jérémy
Hi Phil, First of all, thanks for compiling the list of reasons. I get the impression that you are framing the current state of embedding as a generally good thing, and if I understand that correctly then I disagree with it. I suspect it would help to separate reasons for _encouraging_ embedding (tiny upstream projects and deeply integrated sets of upstreams, I guess) from reasons for _discouraging_ embedding (all other cases, I guess). Quoting Phil Morrell (2021-09-03 00:38:35) packages embed unrelated packages to meet ftpmaster requirement of a minimum size source package. - Jonas
Quoting Phil Morrell (2021-09-03 03:30:04) I do not think that the maintainers you gave as examples should do anything differently. My point is that those you gave as examples are *exceptions* to a general practice in Debian of _avoiding_ embedding. I am very worried about how complex node-* packages in Debian have become since ftpmasters explicitly stated a not-too-small rule and we began embedding more aggressively. E.g. the version of each embedded project is hidden by default, and packages that manually add virtual packages have no mechanism to ensure that versions don't jump backwards or disappear due to a typo. - Jonas
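The missing safeguard Jonas describes can be illustrated with a small pre-upload check. The following is a minimal Python sketch, not an existing Debian tool, that compares the Provides field of the previous upload with the new one and flags bundled components that disappeared (for example through a typo) or whose version went backwards. It assumes every bundled component is listed as an exact `name (= version)` entry and, as a simplification, that versions are purely numeric and dotted; real Debian versions (epochs, `~`, revisions) need dpkg's ordering rules, as implemented by `dpkg --compare-versions`. The version numbers in the example are hypothetical.

```python
import re

# One versioned Provides entry, e.g. "node-is-core-module (= 2.6.0)".
# Assumption: each bundled component is provided with an exact "=" version.
ENTRY = re.compile(r"^\s*([a-z0-9][a-z0-9.+-]*)\s*\(\s*=\s*([^)\s]+)\s*\)\s*$")


def parse_provides(field: str) -> dict[str, str]:
    """Turn a Provides field into {virtual package name: version}."""
    provides = {}
    for entry in field.split(","):
        if not entry.strip():
            continue  # tolerate empty entries such as a trailing comma
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unversioned or malformed entry: {entry.strip()!r}")
        provides[match.group(1)] = match.group(2)
    return provides


def simple_version_key(version: str) -> tuple[int, ...]:
    # Simplification: only purely numeric dotted versions such as "1.20.1".
    # Real Debian versions (epochs, "~", revisions) need dpkg's ordering rules.
    return tuple(int(part) for part in version.split("."))


def check_provides(old_field: str, new_field: str) -> list[str]:
    """Report components that vanished or went backwards between two uploads."""
    old = parse_provides(old_field)
    new = parse_provides(new_field)
    problems = []
    for name, old_version in sorted(old.items()):
        if name not in new:
            problems.append(f"{name}: disappeared (was {old_version})")
        elif simple_version_key(new[name]) < simple_version_key(old_version):
            problems.append(f"{name}: went backwards {old_version} -> {new[name]}")
    return problems


if __name__ == "__main__":
    # Hypothetical versions; the typo "node-is-core-modle" hides a component.
    old = ("node-types-resolve (= 1.17.0), node-is-core-module (= 2.6.0), "
           "node-path-parse (= 1.0.7)")
    new = ("node-types-resolve (= 1.16.0), node-is-core-modle (= 2.6.0), "
           "node-path-parse (= 1.0.7)")
    for problem in check_provides(old, new):
        print(problem)
```

Where the "old" Provides value comes from (the archive, or the previous build) is deliberately left open; the point is only that a typo or downgrade becomes a visible error instead of a silent change.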
Embedded copies of code/etc have downsides ... https://wiki.debian.org/EmbeddedCopies ... but there are many many copies in Debian and they are not going away upstream. all the time and old copies get removed. This has always been the case and it always will be. So we need to cope with the consequences of this change toward embedding in the upstream FLOSS ecosystems.

Personally, my recommendations are that:

- Debian package maintainers could investigate upstream tarballs for embedded copies before each upload containing a new/changed upstream tarball.
- Debian package maintainers could talk to upstream about removing embedded copies and replacing them with dependencies.
- Debian package maintainers could talk to upstream about upstreaming changes in modified embedded copies, removing the embedded copies and replacing them with dependencies.
- Debian package maintainers could use Files-Excluded or `rm -r` in debian/rules to ensure that embedded copies are not used by the build.
- Debian package maintainers could add hints to the source package about which embedded copies are definitely used.
- Debian security tracker could remove the perpetually outdated list of embedded copies.
- Debian security issue investigators could search the archive for similar or duplicate code (using the tools listed on the above wiki page), investigate the build logs for each package found and determine which packages are affected.

This is a lot of work, but given the level of embedding we already have, it is already necessary. Also, the issue of static linking is similar; it is here, it isn't going away and so now we have to cope with it and the problems it causes are similar to embedded copies. https://wiki.debian.org/StaticLinking
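The first and fourth recommendations can be partly scripted. Below is a minimal Python sketch that walks an unpacked upstream tree, reports directories whose names suggest an embedded copy, and drafts a `Files-Excluded` line for debian/copyright. The set of suspicious directory names is an assumption made for illustration: it will miss copies dropped into ordinary source directories and will flag some false positives, so its output is a starting point for manual review, not a substitute for the code-search tools listed on the wiki page.

```python
import os
import sys
from pathlib import Path

# Hypothetical heuristic: directory names that often hold vendored code.
# Hits are leads for manual review, not proof of an embedded copy.
SUSPECT_DIR_NAMES = {"vendor", "vendored", "third_party", "thirdparty",
                     "3rdparty", "bundled", "node_modules", "external"}


def find_embedded_candidates(root: str) -> list[str]:
    """Return sorted paths (relative to root) of directories that look vendored."""
    root_path = Path(root)
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root_path):
        for name in list(dirnames):  # iterate over a copy so we can prune
            if name.lower() in SUSPECT_DIR_NAMES:
                rel = (Path(dirpath) / name).relative_to(root_path)
                hits.append(rel.as_posix())
                # Prune: anything nested inside is covered by the parent entry.
                dirnames.remove(name)
    return sorted(hits)


def files_excluded_line(paths: list[str]) -> str:
    """Draft a whitespace-separated Files-Excluded field for debian/copyright."""
    return "Files-Excluded: " + " ".join(paths)


if __name__ == "__main__":
    candidates = find_embedded_candidates(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in candidates:
        print(path)
    if candidates:
        print(files_excluded_line(candidates))
```

Run against an unpacked upstream source directory, it prints the candidate paths followed by a draft `Files-Excluded:` line. Every entry still needs checking against what the build actually uses before the tarball is repacked, and paths containing spaces would need quoting that this sketch does not attempt.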
On Thursday 2 September 2021, 22:38:35 UTC, Phil Morrell wrote: Regarding uscan, you misread: the goal of the uscan grouped package is to avoid embedded copies. It decouples the concept of an upstream package from a Debian package by grouping them (see for instance the node-resolve package in unstable), thereby decreasing the number of embedded copies in the distribution.
- src:node-resolve embeds a few related npm packages:
  * node-resolve
  * node-types-resolve (the TypeScript API for this package, but that is another package upstream. It is crazy, but welcome to the nodejs/npm world)
  * is-core-module, a really small upstream dependency of this package
  * path-parse, a really small upstream dependency of this package
The node-resolve arch:all Debian package provides node-types-resolve (= version), node-is-core-module (= version) and so on. So apt-get install node-types-resolve works as users expect. Moreover, other packages can depend on node-types-resolve (>> version), thus using the normal Debian dependency mechanism. Moreover (for now this is done with manual rules, but I am planning, maybe together with yadd, to automate these rules), node-resolve creates a /usr/share/doc/node-types-resolve directory with copyright and changelog.Debian.gz linked back to /usr/share/doc/node-resolve/copyright and /usr/share/doc/node-resolve/changelog.Debian.gz, BUT with the documentation of node-types-resolve, in order to allow easily switching to a non-grouped package if needed (if the package becomes a non-leaf package, or grows big enough to pass the ftpmaster constraint), and in order to let users treat node-types-resolve as a normal package with its own documentation. Thus, with my pkg-javascript team hat on, I use the group feature of uscan in order to decrease embedded copies in JavaScript packages and to work around the npm "one line, one package" culture. It is not perfect, but I believe it is a good engineering trade-off. The only problem is that I am not a native English speaker, and thus I have not documented why and how we use this feature.
Patches to policy are welcome. Moreover, I believe that trackers should display the versioned Provides, https://packages.debian.org/unstable/node-resolve should also document the versioned Provides, and https://packages.debian.org/unstable/node-types-resolve should redirect to https://packages.debian.org/unstable/node-resolve, in order to avoid confusing our users. Bastien
If the embedded project is specifically not maintained as API- or
ABI-stable, then I think that's also a common and valid reason to embed
it, perhaps even more so than the ones you mentioned here. This can mean
library APIs, but also command-line options, output behaviour or some
other machine interface that the embedding project might require.
(Or perhaps you intended "deeply integrated" to cover this, but I think
it's worth being explicit about the API stability factor.)
I'm mainly thinking here of "copylibs" like gnulib, libiberty, libglnx,
stb. If an upstream tells us their code will break API/ABI without
notice, then I think we should believe them - which means the only
long-term-maintainable way to depend on it is for the projects that need
it to vendor a known-compatible copy, and be responsible for updating
that copy and the code that calls it at the same time if a serious bug
is found. This is not new - gnulib and libiberty have been like this
for decades, and vendoring those is not really the same thing as vendoring
the likes of zlib or libjpeg.
I know we do have a libstb package that builds a shared library, but the
existence of a stb build system (let alone a shared library build system)
is a Debian-specific patch, which does not fill me with confidence
that it has the necessary API- and ABI-stability to be correct as a
shared library.
Outside the scope of libraries, a few packages in GNOME are currently
using gi-docgen (a gtk-doc-like API documentation generator) as a vendored
project, because it is intended to have a stable interface in future but
has not got there yet. There's a gi-docgen package in NEW, but it will
stay in experimental and will not be used in build-dependencies until
it has stabilized enough to be valid to use that way.
Of course, if an upstream decides that their project *is* ready to be a
sufficiently-stable separate project, then we should usually prefer to
package it separately (for example, bubblewrap and xdg-dbus-proxy are
vendored into flatpak, and the vendored copies were used in Debian early
in its lifetime, but both are now stable enough and independent enough
that we have moved to system copies).
smcv
The package MUST be listed as being without security support in the debian-security-support package and the Release Notes, and it MUST NOT be installed as part of a default installation, unless the security team has explicitly agreed to support it. If we are shipping software where Debian cannot provide security support, then we shouldn't hide the problem from our users. cu Adrian
Thanks to Adrian and pabs for their corrections on documenting security support. There wasn't much objection to the summary itself, more to the sad state of affairs that leads to it, plus a bit of clarification. I believe all the major points have cc'd 907051, so I would like to encourage someone more familiar with the policy process than I am to draft an amendment. There should be enough written there now to expand the section accordingly with more recommendations, and possibly to file wishlist bugs for maintainers to document their reasons in the source. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=907051