#907051 Say much more about vendoring of libraries

#907051#5
Date:
2018-08-23 13:13:00 UTC
From:
To:
Hello,

Thank you for sharing this link -- it seems like Fedora have thought
harder about this than we have, at least at the level of the whole
project.

We can't jump straight to something as involved as that, but threads
like this on -devel suggest to me that Policy's discussion of vendoring
needs to be expanded.

In particular, Policy should explain /why/ bundling is best avoided, and
the consensus that it sometimes has to happen should be noted, along
with mention of registering bundled copies with the security team where
appropriate.

#907051#10
Date:
2018-08-24 02:08:42 UTC
From:
To:
**why** bundling should be avoided. I spend a lot of time dealing with
that when packaging Docker, and at some point I realized that I couldn't
even explain to myself why I was spending so much time un-bundling the
world out of Docker. I just had a vague understanding that "bundling is
bad", and I understand the security issues of bundled code. But I wish I
had more details on "how bad it is", just so that I can justify to
myself spending so much time on it. Sometimes the line between time
well-spent and time wasted is very thin, and you're not sure where you
stand.

Also, it turns out that sometimes bundling can't be avoided. I don't
know if it's possible to come up with some general guidelines on that.
We have it documented in the README.source of docker, but it applies to
Docker's special case, and I don't claim it can be extended to the general
case.

During all this time when I was questioning myself on the reason to
un-bundle, the only official documentation I found was the short
paragraph in the Debian Policy [1], which is quite thin. Only now,
through the thread in debian-devel, have I discovered that there is more
information on the wiki. I couldn't find this information when I needed it,
but maybe I'm just not good at finding a needle in a haystack ;)

All of that to say: I would find it very helpful to have some more
"official information" from Debian on bundled/vendored/embedded code: the
rationale for un-bundling, and possibly some guidelines for keeping bundles.

  Arnaud

[1]: https://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles

#907051#15
Date:
2018-08-24 02:18:11 UTC
From:
To:
Hi,

Sean Whitton wrote:

My first instinct was that this belongs in devref, not Policy, since
it is more about the project than about consistency and
interoperability issues that directly affect packaging tools and user
experience.

But then I realized that the Debian Free Software Guidelines, for
example, are part of policy.  This topic would similarly be a good fit
for ch-archive.  Thanks for filing it.

Jonathan

#907051#20
Date:
2018-08-24 02:21:23 UTC
From:
To:
Hi,

Arnaud Rebillout wrote:

For reference, I think you're referring to

https://wiki.debian.org/EmbeddedCodeCopies
https://wiki.debian.org/UpstreamGuide#No_inclusion_of_third_party_code

Thanks for that.  It may be a good place to find text to reuse.

Jonathan

#907051#25
Date:
2021-09-02 22:38:35 UTC
From:
To:
Over this last year there seems to have been a noticeable divergence of
maintainer opinion, on what has become known as vendoring, from a strict
reading of [policy 4.13]. I think it's notable that the heading is now
[Embedded] copies and had been [Convenience] copies since its inception;
thankfully, I found a request to expand this section using the term
[vendoring].

[policy 4.13]: https://www.debian.org/doc/debian-policy/ch-source.html#embedded-code-copies
[Embedded]: https://bugs.debian.org/955036
[Convenience]: https://bugs.debian.org/392362
[vendoring]: https://bugs.debian.org/907051

It is my reading of the situation that not only has this practice become
more prevalent across multiple ecosystems since 2008, but that it can be
a good thing when upstreams use it to better modularise their code. As a
consequence, and in particular for large upstream projects, it is not a
good use of maintainer time to package every single vendored library as
a separate source package. See e.g. [kubernetes], [python BoF @25mins],
[android-platform-tools], and even [uscan] grouping used by nodejs.

[kubernetes]: https://bugs.debian.org/971515#172
[python BoF @25mins]: https://meetings-archive.debian.net/pub/debian-meetings/2021/DebConf21/debconf21-97-python-team-bof.webm
[android-platform-tools]: https://salsa.debian.org/android-tools-team/admin/-/issues/40
[uscan]: https://manpages.debian.org/uscan#grouped_package

Is there any objection to the following summary?

1. If the reused code is small and intended to be embedded into a
   package, then this MUST be documented in the [security-tracker].
2. If the included project has already been packaged, then the Debian
   version SHOULD be used instead.
3. If modifications have been made, then those changes SHOULD be
   forwarded and/or the package ported to the official version.
4. When 2 or 3 are too onerous to maintain, the package MAY use the
   convenience copy but MUST document why in README.source and SHOULD be
   included in the [security-tracker].
5. Where only a small number of unrelated projects are bundled, they
   SHOULD be uploaded as separate source packages.
6. If the upstream authors are largely the same, then vendored
   sub-projects MAY simply be built together as the same source.
7. A large number of vendored dependencies used only together for a
   single Debian package MAY be grouped into a single source upload.
8. If 6 or 7 are used initially but a new package has some overlap, then
   the new package MUST NOT duplicate the vendoring. The duplication
   SHOULD be packaged separately, then the original package SHOULD be
   updated to use the Debian version instead.
9. When upstream has a proven track record of promptly handling security
   vulnerabilities inside their vendored dependencies, then maintainers
   SHOULD follow the same practice, updating versions in lockstep.

I might be misusing the MUST/SHOULD/MAY, so those can be dropped as
needed, but I tried to capture the accepted practice and deliberately
used all the different historical terms. For comparison, there's also
[Fedora] policy, but apart from not having an in-band "bundled", I also
think that the line has ended up being drawn marginally differently.

[security-tracker]: https://salsa.debian.org/security-tracker-team/security-tracker/-/blob/master/data/embedded-code-copies
[Fedora]: https://docs.fedoraproject.org/en-US/packaging-guidelines/#bundling

#907051#30
Date:
2021-09-02 23:03:35 UTC
From:
To:
On Fri, 3 Sep 2021 at 00:39, Phil Morrell <debian@emorrp1.name> wrote:

A couple of naive questions:
- should a package's debian/control list bundled dependencies, to make
sure duplication is avoided?
- when a bundled dependency is already available in Debian and is the
same (unpatched) version, should the upstream source tarball be
repacked without that dependency, or should it be kept in the tarball?

Jérémy

#907051#35
Date:
2021-09-03 00:46:20 UTC
From:
To:
Hi Phil,

First of all, thanks for compiling the list of reasonings.

I get the impression that you are framing the current state of embedding
as a generally good thing to do, and if I understand that correctly then I
disagree with it.

I suspect it would help to separate reasons for _encouraging_
embedding (tiny upstream projects and deeply integrated sets of
upstreams, I guess) from reasons for _discouraging_ embedding (all other
cases, I guess).


Quoting Phil Morrell (2021-09-03 00:38:35)
packages embed unrelated packages to meet ftpmaster requirement of a
minimum size source package.


 - Jonas

#907051#40
Date:
2021-09-03 02:52:51 UTC
From:
To:
Quoting Phil Morrell (2021-09-03 03:30:04)

I do not think that the maintainers you gave as examples should do
anything differently.

My point is that the cases you gave as examples are *exceptions* to the
general practice in Debian of _avoiding_ embedding.

I am very worried about how complex node-* packages in Debian have
become since the ftpmasters explicitly stated a not-too-small rule and we
began embedding more aggressively. For example, the version of each
embedded project is hidden by default, and packages that manually add
virtual packages have no mechanism to ensure that versions don't jump
backwards or disappear due to typos.
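The gap described above for hand-written versioned Provides could be narrowed with a simple comparison between uploads. The sketch below is illustrative only and is not an existing Debian tool: it assumes you have the Provides field of the previous and the new binary package as plain strings, re-implements dpkg's version ordering in Python (in practice `dpkg --compare-versions` is the authoritative comparison), and reports virtual packages that disappeared (for example through a typo in the name), lost their version, or moved backwards. All function names are hypothetical.

```python
import re
import string
from typing import Dict, List, Optional, Tuple

def _at(s: str, k: int) -> str:
    # Character at index k, or "" past the end (stands in for C's NUL byte).
    return s[k] if k < len(s) else ""

def _isdigit(c: str) -> bool:
    return c != "" and c in string.digits

def _order(c: str) -> int:
    # dpkg ordering of non-digits: '~' < end of string < letters < others.
    if c == "" or _isdigit(c):
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256

def _verrevcmp(a: str, b: str) -> int:
    # Port of dpkg's verrevcmp(): alternate non-digit and digit runs.
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        while (i < len(a) and not _isdigit(a[i])) or (j < len(b) and not _isdigit(b[j])):
            ac, bc = _order(_at(a, i)), _order(_at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        while _at(a, i) == "0":  # leading zeros do not count
            i += 1
        while _at(b, j) == "0":
            j += 1
        while _isdigit(_at(a, i)) and _isdigit(_at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _isdigit(_at(a, i)):
            return 1  # a's number has more digits, so it is larger
        if _isdigit(_at(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0

def _split(version: str) -> Tuple[int, str, str]:
    # [epoch:]upstream[-revision]; the revision follows the last hyphen.
    epoch = 0
    if ":" in version:
        epoch_str, version = version.split(":", 1)
        epoch = int(epoch_str)
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        upstream, revision = version, ""
    return epoch, upstream, revision

def compare_versions(v1: str, v2: str) -> int:
    # Negative, zero or positive, following dpkg's ordering.
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return e1 - e2
    return _verrevcmp(u1, u2) or _verrevcmp(r1, r2)

# Provides entries of the form "name" or "name (= version)".
_ENTRY = re.compile(r"([a-z0-9][a-z0-9+.-]+)\s*(?:\(\s*=\s*([^\s)]+)\s*\))?")

def parse_provides(field: str) -> Dict[str, Optional[str]]:
    provides: Dict[str, Optional[str]] = {}
    for item in (part.strip() for part in field.split(",")):
        if not item:
            continue
        m = _ENTRY.fullmatch(item)
        if m is None:
            raise ValueError(f"cannot parse Provides entry: {item!r}")
        provides[m.group(1)] = m.group(2)
    return provides

def check_provides(old_field: str, new_field: str) -> List[str]:
    # Flag virtual packages that vanished, lost their version, or went backwards.
    old, new = parse_provides(old_field), parse_provides(new_field)
    problems = []
    for name, old_ver in old.items():
        new_ver = new.get(name)
        if name not in new:
            problems.append(f"{name}: no longer provided")
        elif old_ver is not None and new_ver is None:
            problems.append(f"{name}: lost its version")
        elif old_ver is not None and compare_versions(new_ver, old_ver) < 0:
            problems.append(f"{name}: version went backwards ({old_ver} -> {new_ver})")
    return problems
```

For example, comparing `node-types-resolve (= 1.20.0), node-is-core-module (= 2.6.0), node-path-parse (= 1.0.7)` with `node-types-resolve (= 1.20.1), node-is-core-module (= 2.4.0)` (made-up versions) flags node-is-core-module as going backwards and node-path-parse as no longer provided.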


 - Jonas

#907051#45
Date:
2021-09-03 03:23:56 UTC
From:
To:
Embedded copies of code/etc have downsides ...

https://wiki.debian.org/EmbeddedCopies

... but there are many, many copies in Debian and they are not going
away upstream. New copies get added all the time and old copies get
removed. This has always been the case and it always will be.

So we need to cope with the consequences of this change toward
embedding in the upstream FLOSS ecosystems.

Personally, my recommendations are that:

Debian package maintainers could investigate upstream tarballs for
embedded copies before each upload containing a new/changed upstream
tarball.

Debian package maintainers could talk to upstream about removing
embedded copies and replacing them with dependencies.

Debian package maintainers could talk to upstream about upstreaming
changes in modified embedded copies, removing the embedded copies and
replacing them with dependencies.

Debian package maintainers could use Files-Excluded or `rm -r` in
debian/rules to ensure that embedded copies are not used by the build.

Debian package maintainers could add hints to the source package about
which embedded copies are definitely used.

Debian security tracker could remove the perpetually outdated list of
embedded copies.

Debian security issue investigators could search the archive for
similar or duplicate code (using the tools listed on the above wiki
page), investigate the build logs for each package found and determine
which packages are affected. This is a lot of work, but given the
level of embedding we already have, it is already necessary.
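The first of these recommendations, inspecting each new upstream tarball for embedded copies, can be partly scripted. The following Python sketch is only a rough first pass under a stated assumption: it reports directories sitting under a few conventional vendoring names (the hypothetical `VENDOR_DIR_NAMES` set), so it will miss copies placed elsewhere in the tree, and anything it finds still needs a human decision before being excluded with Files-Excluded or removed with `rm -r` in debian/rules. The duplicate-code tools listed on the wiki page remain the more thorough option.

```python
import tarfile
from pathlib import PurePosixPath
from typing import Set

# Assumption: directory names upstreams often use for bundled third-party
# code. This is a naming heuristic, not an authoritative Debian list.
VENDOR_DIR_NAMES = {"vendor", "third_party", "thirdparty", "3rdparty",
                    "node_modules", "bundled", "external"}


def find_embedded_candidates(tarball: str) -> Set[str]:
    """Return ".../vendor/<project>"-style paths found in an upstream tarball.

    tarfile.open() in its default read mode detects gzip, bzip2 and xz
    compression automatically.
    """
    candidates: Set[str] = set()
    with tarfile.open(tarball) as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            # Only consider vendoring directories with at least one more
            # level below the project, i.e. "vendor/<project>/<something>".
            for idx, part in enumerate(parts[:-2]):
                if part in VENDOR_DIR_NAMES:
                    candidates.add("/".join(parts[: idx + 2]))
                    break  # report only the outermost vendoring directory
    return candidates
```

For instance, on a hypothetical tarball containing `pkg-1.0/vendor/zlib/inflate.c` and `pkg-1.0/third_party/stb/stb_image.h`, it would report `pkg-1.0/vendor/zlib` and `pkg-1.0/third_party/stb`, while a plain `pkg-1.0/vendor/README` file is ignored.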

Also, the issue of static linking is similar; it is here, it isn't
going away and so now we have to cope with it and the problems it
causes are similar to embedded copies.

https://wiki.debian.org/StaticLinking

#907051#50
Date:
2021-09-03 08:10:55 UTC
From:
To:
On Thursday, 2 September 2021, 22:38:35 UTC, Phil Morrell wrote:

Regarding uscan, you misread: the goal of uscan's grouped packages is to
avoid embedded copies. Grouping decouples the concept of an upstream package
from that of a Debian package (see, for instance, the node-resolve package in
unstable), while decreasing the number of embedded copies in the distribution.
- src:node-resolve embeds a few related npm packages:
  * node-resolve
  * node-types-resolve (the TypeScript API for this package, which is a
    separate package upstream. It is crazy, but welcome to the nodejs/npm world)
  * is-core-module, a really small upstream dependency of this package
  * path-parse, a really small upstream dependency of this package

The node-resolve arch:all Debian package provides node-types-resolve
(= version), node-is-core-module (= version) and so on. So `apt-get install
node-types-resolve` works for users as expected. Moreover, other packages
can depend on node-types-resolve (>> version), using Debian's normal
dependency mechanism.

Moreover (this is currently done with manual rules, but I, and maybe yadd,
are planning to automate them), node-resolve creates a
/usr/share/doc/node-types-resolve directory whose copyright and
changelog.Debian.gz link back to /usr/share/doc/node-resolve/copyright and
/usr/share/doc/node-resolve/changelog.Debian.gz, BUT which contains the
documentation of node-types-resolve. This makes it easy to switch to a
non-grouped package if needed (for instance if the package stops being a
leaf package, or grows big enough to pass the ftpmaster constraint), and
lets users treat node-types-resolve as a normal package with its own
documentation.

So, wearing my pkg-javascript team hat, I use the group feature of uscan to
decrease embedded copies in JavaScript packages and to work around the npm
"one line, one package" culture.

It is not perfect, but I believe it is a good engineering trade-off.

The only problem is that I am not a native English speaker, so I have not
documented why and how we use this feature. Patches to Policy are welcome.

Moreover, I believe that the trackers should display versioned Provides,
https://packages.debian.org/unstable/node-resolve should also document its
versioned Provides, and https://packages.debian.org/unstable/node-types-resolve
should redirect to https://packages.debian.org/unstable/node-resolve, in
order to avoid confusing our users.

Bastien

#907051#55
Date:
2021-09-03 09:20:30 UTC
From:
To:
If the embedded project is specifically not maintained as API- or
ABI-stable, then I think that's also a common and valid reason to embed
it, perhaps even more so than the ones you mentioned here. This can mean
library APIs, but also command-line options, output behaviour or some
other machine interface that the embedding project might require.
(Or perhaps you intended "deeply integrated" to cover this, but I think
it's worth being explicit about the API stability factor.)

I'm mainly thinking here of "copylibs" like gnulib, libiberty, libglnx,
stb. If an upstream tells us their code will break API/ABI without
notice, then I think we should believe them - which means the only
long-term-maintainable way to depend on it is for the projects that need
it to vendor a known-compatible copy, and be responsible for updating
that copy and the code that calls it at the same time if a serious bug
is found. This is not new - gnulib and libiberty have been like this
for decades, and vendoring those is not really the same thing as vendoring
the likes of zlib or libjpeg.

I know we do have a libstb package that builds a shared library, but the
existence of a stb build system (let alone a shared library build system)
is a Debian-specific patch, which does not fill me with confidence
that it has the necessary API- and ABI-stability to be correct as a
shared library.

Outside the scope of libraries, a few packages in GNOME are currently
using gi-docgen (a gtk-doc-like API documentation generator) as a vendored
project, because it is intended to have a stable interface in future but
has not got there yet. There's a gi-docgen package in NEW, but it will
stay in experimental and will not be used in build-dependencies until
it has stabilized enough to be valid to use that way.

Of course, if an upstream decides that their project *is* ready to be a
sufficiently-stable separate project, then we should usually prefer to
package it separately (for example, bubblewrap and xdg-dbus-proxy are
vendored into flatpak, and the vendored copies were used in Debian early
in its lifetime, but both are now stable enough and independent enough
that we have moved to system copies).

    smcv

#907051#60
Date:
2021-09-12 17:01:34 UTC
From:
To:
  The package MUST be listed as being without security support in
  the debian-security-support package and the Release Notes,
  and it MUST NOT be installed as part of a default installation,
  unless the security team has explicitly agreed to support it.

If we are shipping software where Debian cannot provide security support,
then we shouldn't hide the problem from our users.

cu
Adrian

#907051#65
Date:
2021-09-15 15:31:55 UTC
From:
To:
Thanks to Adrian and pabs for their corrections on documenting security
support. There wasn't much objection to the summary itself; the discussion
was more about the sad state of affairs that leads to it, plus a bit of
clarification.

I believe all the major points have been cc'd to 907051, so I would like to
encourage someone more familiar with the Policy process than I am to draft
an amendment. There should be enough written there now to expand the
section accordingly with more recommendations, and possibly to file
wishlist bugs asking maintainers to document their reasons in the source.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=907051