- Package:
- debian-policy
- Source:
- debian-policy
- Submitter:
- Sean Whitton
- Date:
- 2026-05-31 11:43:02 UTC
- Severity:
- important
Package: debian-policy
Severity: important
X-debbugs-cc: apo@debian.org
Control: block 795402 by -1
Control: block 883966 by -1
Control: block 884223 by -1
Control: block 884226 by -1
Control: block 884227 by -1
Control: block 884228 by -1
User: debian-policy@packages.debian.org
Usertags: normative discussion

Hello debian-policy@l.d.o,

Our current criteria for including licenses, as Markus Koschany smartly puts it in #884228, are "[a]pparently something between gut feeling and the popularity of our least preferred license in common-licenses." We can and should do better than this.

In the air is also the idea that we include licenses in common-licenses to save disk space on low disk space systems: the license should be popular enough such that the reduced size of d/copyright files will outweigh the increased size of base-files.

We should write down our criteria in Policy, in section 12.5 (or possibly in the Policy Changes Process appendix). We should probably say too that the application of the criteria is at the discretion of the Policy Editors. Before we can do that, however, we need to consider whether the criteria need to be updated.

The only point of clear consensus -- at least among the Policy Editors -- is that short licenses which have more than one popular variant should never be included because of the risk that packages licensed under one variant incorrectly refer to a different variant in common-licenses. This problem actually exists in the archive because a BSD variant was included in common-licenses at some point. We should include this point in the Policy Manual.

Otherwise, here are some of the arguments on the table:

(1) In a related d-devel thread, someone working with embedded systems suggested that these days, either a system has enough disk space that common-licenses isn't relevant, or it has so little disk space that all of /usr/share/doc must be deleted. If this is right, disk space concerns should not decide what goes into common-licenses. Is it right?

(2) Some people want more licenses in common-licenses because they find it more convenient. Convenient processes save our volunteers' time. We frequently get requests to expand common-licenses and I suspect that many of them are motivated by the belief that it would make the requestor's work more convenient. If disk space issues aren't relevant anymore, an increase in convenience might become a dominating criterion for inclusion. However, this point has been disputed: better tools could provide license text formatted suitably for d/copyright, which would be just as convenient (e.g., in Emacs: `C-u M-! get-formatted-license GPL-3` would be about as convenient as it gets). And there surely exist those who find common-licenses makes editing d/copyright less convenient...

I'm not sure how to proceed. It would be nice to verify (1) with other people working with embedded systems. Possibly we should ask on one of our more specialised mailing lists. And there are surely other arguments besides (1) and (2). We should gather those in this bug.

#884228 has further points of discussion, but I'd ask that we restrict ourselves in this bug to discussing what the criteria for inclusion should be. In particular, let's not discuss the proposal to add all known DFSG-free licenses to common-licenses. Whether that proposal is valid depends on our criteria for inclusion, so let's stick to hashing out those criteria in this bug.
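For illustration only, here is a minimal Python sketch of what a helper like the hypothetical `get-formatted-license` mentioned above might do. The function name and sample text are assumptions, not an existing tool. It relies only on the machine-readable copyright format's convention that continuation lines are indented by one space and blank lines are written as " .".

```python
def format_for_copyright(short_name: str, license_text: str) -> str:
    """Render a stand-alone License paragraph for debian/copyright.

    Continuation lines are indented by one space; blank lines become
    " ." so that they do not end the paragraph.
    """
    lines = [f"License: {short_name}"]
    for line in license_text.strip().splitlines():
        stripped = line.rstrip()
        lines.append(f" {stripped}" if stripped else " .")
    return "\n".join(lines) + "\n"


# Hypothetical two-paragraph license text, used purely as sample input.
sample = "Permission is granted to copy this file.\n\nThere is no warranty.\n"
print(format_for_copyright("Example", sample), end="")
```

A real tool would also need a source for the canonical license texts, which is exactly the question this bug is about.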
Blocking my own bug report with this one, which I just noticed. I submitted bug #910548 previously against the base-files package: "base-files - please consider adding /usr/share/common-licenses/Unicode-Data". I had formatted the copyright and license information for Unicode data files from the http://unicode.org website to put in the debian/copyright file in a package that I created this summer. The copyright information is more involved than most copyright statements, so I kept it in what I submitted with the bug report. I thought if that license was something Debian found useful, there would be no need for anyone else to duplicate the effort of formatting that I had gone through once, and so I offered it. Just the license in isolation could be formatted like other licenses fairly quickly if the copyright section is not wanted. Or the whole thing can be left out and that bug closed, as you wish. Thanks, Paul Hardy
Hello everyone,
I come seeking your opinions. Please cc 885698@bugs.debian.org on replies
so that we can accumulate this discussion in a Debian Policy bug.
One of the responsibilities of the Policy Editors is to determine which
licenses should be included in /usr/share/common-licenses, and thus do not
have to be reproduced in the copyright file of every package that uses
them. We have never had clear criteria for this. We need them, so that
we can advertise a clear and transparent policy for inclusion without
having the conversation from first principles for each new license.
I was the one who made the last few decisions, and I based the decision
largely on the number of binary packages in Debian using the license.
When I was doing this, I set a fairly high threshold (more packages than
the least popular license currently in /usr/share/common-licenses, which
historically has been GFDL-1.3 although it now appears to be MPL-1.1). No
one was entirely satisfied with that criterion, including me.
I have the following questions:
1. What criteria (besides the obvious one of being a DFSG-free license)
should we apply when deciding what licenses to include? Number of
packages? Length? How positive we feel towards the license? Some
combination of these things? Please be specific.
2. If we use number of packages as a criterion, what should the threshold
be? I have appended to the bottom of this message the current output
of my ad-hoc license-count tool run against the current archive so that
you have a feeling for how many packages use various licenses.
3. If we use number of packages, should that be source packages or binary
packages? Source packages represent maintainer effort; binary packages
represent disk clutter.
4. Should there be a length cutoff for licenses, such that we do not
include in /usr/share/common-licenses any license shorter than some
number of lines or bytes? The justification would be that telling
people to go look elsewhere for the license has some inherent overhead
and annoyance when they discover that the license is all of ten lines
and could have just been included in the copyright file.
5. Should we exclude licenses that contain text that all or most users of
the license customize when they use it? For example, the existing
/usr/share/common-licenses/BSD contains the clause:

    3. Neither the name of the University nor the names of its
       contributors may be used to endorse or promote products derived
       from this software without specific prior written permission.

which users of this specific license usually change to instead include
the name of their organization, or their name, or something else. Full
disclosure: it will be very hard to convince me that licenses used this
way should be included in common-licenses, since I believe it is
technically incorrect to omit a license and point to the
common-licenses version when the provisions of the common-licenses
version are different in detail due to naming different people or
requiring or prohibiting mentioning of different names as endorsements.
Here are various concerns that people have had in this area in the past.
I'm neither indicating agreement nor disagreement with any of these
points, only listing them to provoke thought about some of the things
people have raised before.
* Including long legal texts in debian/copyright, particularly if one
wants to format them for copyright-format, is tedious and annoying and
doesn't benefit our users in any significant way, and therefore we
should include as many licenses as possible in common-licenses to spare
people that work.
* common-licenses consumes disk space on every installed Debian system of
any size, and therefore should be kept small to avoid wasting system
resources.
* Every approved DFSG license should be included in common-licenses so
that it serves as a repository of licenses the project has approved.
* Including a license in common-licenses implies that the project approves
of that license, and therefore licenses such as the LaTeX Project Public
License 1.0, which requires renaming derived works, should not be
included even though DFSG #4 grudgingly allows for this type of license
term.
* All licenses explicitly mentioned in the Debian Free Software Guidelines
should be present in common-licenses (as justification for including the
BSD license even though the current text is specific to the Regents of
the University of California).
In order to structure the discussion and prod people into thinking about
the implications, I will make the following straw man proposal. This is
what I would do if the decision was entirely up to me:
Licenses will be included in common-licenses if they meet all of the
following criteria:
* The license is DFSG-free.
* Exactly the same license wording is used by all works covered by it.
* The license applies to at least 100 source packages in Debian.
* The license text is longer than 25 lines.
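
The four straw man criteria above can be read as a simple conjunction. The following Python sketch is only an illustration of that reading. The `LicenseCandidate` type, its field names, and the sample values are hypothetical and do not describe any real license.

```python
from dataclasses import dataclass


@dataclass
class LicenseCandidate:
    name: str
    dfsg_free: bool
    identical_wording: bool  # same text in every work that uses it
    source_packages: int     # source packages in Debian covered by it
    text_lines: int          # length of the license text in lines


def meets_straw_man(c: LicenseCandidate,
                    min_packages: int = 100,
                    min_lines: int = 25) -> bool:
    """Return True only if all four straw man criteria hold."""
    return (c.dfsg_free
            and c.identical_wording
            and c.source_packages >= min_packages  # "at least 100"
            and c.text_lines > min_lines)          # "longer than 25 lines"


# Hypothetical candidates with made-up numbers.
long_popular = LicenseCandidate("Example-Long", True, True, 250, 300)
short_popular = LicenseCandidate("Example-Short", True, True, 900, 20)
print(meets_straw_man(long_popular))   # True
print(meets_straw_man(short_popular))  # False: too short
```

Keeping the two thresholds as parameters makes it easy to try other cutoffs raised in the discussion.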
I will attempt to guide and summarize discussion on this topic. No
decision will be made immediately; I will summarize what I've heard first
and be transparent about what direction I think the discussion is
converging towards (if any).
Finally, as promised, here is the count of source packages in unstable
that use the set of licenses that I taught my script to look for. This is
likely not accurate; the script uses a bunch of heuristics and guesswork.
    AGPL 3                     277
    Apache 2.0                5274
    Artistic                  4187
    Artistic 2.0               337
    BSD (common-licenses)       42
    CC-BY 1.0                    3
    CC-BY 2.0                   15
    CC-BY 2.5                   13
    CC-BY 3.0                  240
    CC-BY 4.0                  159
    CC-BY-SA 1.0                 8
    CC-BY-SA 2.0                48
    CC-BY-SA 2.5                16
    CC-BY-SA 3.0               425
    CC-BY-SA 4.0               237
    CC0-1.0                   1069
    CDDL                        67
    CeCILL                      30
    CeCILL-B                    13
    CeCILL-C                     9
    GFDL (any)                 569
    GFDL (symlink)              55
    GFDL 1.2                   289
    GFDL 1.3                   231
    GPL (any)                20006
    GPL (symlink)             1331
    GPL 1                     4033
    GPL 2                    10466
    GPL 3                     6783
    LGPL (any)                5019
    LGPL (symlink)             265
    LGPL 2                    3850
    LGPL 2.1                  2926
    LGPL 3                    1526
    LaTeX PPL                   46
    LaTeX PPL (any)             40
    LaTeX PPL 1.3c              32
    MPL 1.1                    165
    MPL 2.0                    361
    SIL OFL 1.0                 11
    SIL OFL 1.1                258
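
As a rough aid for reading the table, the following Python sketch parses rows of the form "name count" and lists the entries at or above a package threshold. The sample contains only a few rows copied from the table above. The function names are made up for this example, and the sketch says nothing about which licenses already ship in common-licenses.

```python
SAMPLE = """\
AGPL 3 277
BSD (common-licenses) 42
CC-BY-SA 4.0 237
CDDL 67
MPL 1.1 165
SIL OFL 1.0 11
"""


def parse_counts(text: str) -> dict[str, int]:
    """Split each row into a license name and its trailing package count."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, count = line.rsplit(None, 1)  # the count is the last token
        counts[name.strip()] = int(count)
    return counts


def meeting_threshold(counts: dict[str, int], threshold: int = 100) -> list[str]:
    """Return the names whose count is at least the threshold, sorted."""
    return sorted(name for name, n in counts.items() if n >= threshold)


print(meeting_threshold(parse_counts(SAMPLE)))
```

Because the count is taken as the last whitespace-separated token, names that themselves contain digits or spaces (such as "CC-BY-SA 4.0") are handled without special cases.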
Quoting Russ Allbery (2023-09-10 05:35:27) I fully support the above proposed criteria, and appreciate your initiative to have this conversation. - Jonas
I like this. I'd say that even if a license is shorter than 25 lines I'd appreciate being able to link to it instead of copy-pasting it. I like to be able to fill the license field with a value, after checking that the upstream license didn't diverge from what it looks like. I'd love to use SPDX IDs there, for example. In an ideal world, I'd like to autofill debian/copyright with SPDX IDs from upstream metadata. Having a link to a file comes closer to having a declarative license ID. In general, the fewer bytes I have to maintain in debian/* the happier I am, and as a matter of personal aesthetics I feel that the fewer bytes we all have to maintain in debian/*, the smaller our collective maintenance burden. Enrico
Hideki Yamane <henrich@iijmio-mail.jp> writes: Can we do this legally? If we can, it certainly has substantial merits, but I'm not sure that this satisfies the requirement in a lot of licenses to distribute a copy of the license along with the work. Some licenses may allow that to be provided as a URL, but I don't think they all do (which makes sense since people may receive Debian on physical media and not have Internet access).
Hmm, how about providing a license-common package that depends on "license-common-list", with the ISO image providing both? There would be no regressions. I expect the license-common-list data to look like this:

    license-short-name: URL
    GPL-2: file:///usr/share/common-licenses/GPL-2
    Boost-1.0: https://spdx.org/licenses/BSL-1.0.html
Me too. Agreed.
Quoting Hideki Yamane (2023-09-10 11:00:07)

I guess Russ' response above was a concern over using http(s) URIs towards a non-local resource.

What I have practiced for some years is the following syntax:

    Files: foo/bar
    Copyright: 2022 Someone
    License: Apache-2.0 or Expat

    License: Apache-2.0
    Reference: /usr/share/common-licenses/Apache-2.0

    License: Expat
     [the full contents of the Expat license]

That syntax introduces a new field "Reference" (our copyright file format permits new fields, despite lintian complaining about it).

Related discussion is at https://bugs.debian.org/786450

- Jonas
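
As a sketch of how a tool might consume the "Reference" field described above, the following Python reads a copyright file paragraph by paragraph and collects the stand-alone License paragraphs that carry a reference. The field is Jonas' own extension, not part of the published format, and this parser is deliberately simplified (blank-line separated paragraphs, one-space continuation lines).

```python
COPYRIGHT = """\
Files: foo/bar
Copyright: 2022 Someone
License: Apache-2.0 or Expat

License: Apache-2.0
Reference: /usr/share/common-licenses/Apache-2.0

License: Expat
 [the full contents of the Expat license]
"""


def parse_paragraphs(text: str) -> list[dict[str, str]]:
    """Parse blank-line separated paragraphs into field dictionaries."""
    paragraphs = []
    for block in text.strip().split("\n\n"):
        fields: dict[str, str] = {}
        current = None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and current is not None:
                fields[current] += "\n" + line[1:]  # continuation line
            else:
                current, _, value = line.partition(":")
                fields[current] = value.strip()
        paragraphs.append(fields)
    return paragraphs


def license_references(paragraphs: list[dict[str, str]]) -> dict[str, str]:
    """Map license short names to the path given in their Reference field."""
    refs = {}
    for p in paragraphs:
        # Stand-alone License paragraphs have no Files field.
        if "License" in p and "Files" not in p and "Reference" in p:
            refs[p["License"].splitlines()[0]] = p["Reference"]
    return refs


print(license_references(parse_paragraphs(COPYRIGHT)))
```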
How about just pointing to the SPDX license URL for the whole license text, and listing the DFSG-free licenses from that? (But yes, we would need to align the license short names between DEP-5 and SPDX for it.)
+1, great work and great starting point. I also agree with Enrico and I'd like lower limits too, but any progress is good progress on this matter for me.
Jonas Smedegaard <jonas@jones.dk> writes:

I do wonder why we've never done this. Does anyone know? common-licenses is in an essential package so it doesn't require a dependency and is always present, and we've leaned on that in the past in justifying not including those licenses in the binary packages themselves, but I'm not sure why a package dependency wouldn't be legally equivalent. We allow symlinking the /usr/share/doc directory in some cases where there is a dependency, so we don't strictly require every binary package to have a copyright file.

Yes, I think the https URL is an essential part of the first proposal, since it avoids needing to ship a copy of all of the licenses. But I'm dubious that would pass legal muster. The alternative proposal as I understand it would be to have a license-common package that includes full copies of all the licenses with some more relaxed threshold requirement and have packages that use one of those licenses depend on that package. (This would obviously require a maintainer be found for the license-common package.)

This is separate from this particular bug, but I would love to see the pointer to common-licenses turned into a formal field of this type in the copyright format, rather than being an ad hoc comment.
Russ Allbery <rra@debian.org> writes:

In the thread so far, there's been a bit of early convergence around my threshold of 100 packages above. I want to make sure people realize that this is a very conservative threshold that would mean saying no to most new license inclusion requests.

My guess is that with the threshold set at 100, we will probably add around eight new licenses with the 25 line threshold (AGPL-3, Artistic-2.0, CC-BY 3.0, CC-BY 4.0, CC-BY-SA 3.0, CC-BY-SA 4.0, and OFL-1.1, and I'm not sure about some of those because the CC licenses have variants that would each have to reach the threshold independently; my current ad hoc script does not distinguish between the variants), and maybe 10 to 12 total without that threshold (adding Expat, zlib, some of the BSD licenses).

This would essentially be continuing current practice except with more transparent and consistent criteria. It would mean not including a lot of long legal license texts that people have complained about having to duplicate, such as the CDDL, CeCILL licenses, probably the EPL, the Unicode license, etc.

If that's what people want, that's what we'll do; as I said, that's what I would do if the choice were left entirely up to me. But I want to make sure I give the folks who want a much more relaxed standard a chance to speak up.
Or we could generate DEBIAN/copyright from debian/copyright using data in license-common-list at build time. So maintainers would not need to manage the copying themselves. Cheers, Bill
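
A minimal Python sketch of the build-time generation idea, under stated assumptions: a stand-alone License paragraph consisting of a single "License: NAME" line is treated as a request to fill in the text, and `license_texts` stands in for whatever the proposed license-common-list data would provide. Neither convention exists today; the names and sample license are made up for illustration.

```python
def expand_license_paragraphs(copyright_text: str,
                              license_texts: dict[str, str]) -> str:
    """Fill in the text of License paragraphs that contain only a name."""
    out = []
    for block in copyright_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) == 1 and lines[0].startswith("License:"):
            name = lines[0].split(":", 1)[1].strip()
            if name not in license_texts:
                raise KeyError(f"no text known for license {name!r}")
            for text_line in license_texts[name].strip().splitlines():
                # One-space indent; blank lines are written as " ."
                lines.append(" " + text_line if text_line.strip() else " .")
        out.append("\n".join(lines))
    return "\n\n".join(out) + "\n"


# Hypothetical source file and license text.
source = (
    "Files: *\n"
    "Copyright: 2023 Someone\n"
    "License: Example-1.0\n"
    "\n"
    "License: Example-1.0\n"
)
texts = {"Example-1.0": "You may use this.\n\nNo warranty.\n"}
print(expand_license_paragraphs(source, texts), end="")
```

Raising an error for unknown names, rather than leaving the paragraph empty, keeps a missing license text from silently reaching the binary package.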
Quoting Russ Allbery (2023-09-10 18:16:07)

Good point. Another way of reading the responses is that there was some interest in including even more licenses. I would also prefer inclusion of more licenses; I simply had the impression that a) we could do that step by step, and b) my habit of writing copyright files (and other texts) using [semantic linebreaks] made me forget that the Expat license is arguably only 3 lines long (whereas in my style of writing it is 24-25 lines long).

If "include all SPDX licenses" is for some reason (space in minimal systems?) problematic, then let me propose a threshold of 1000 characters - as that just about covers Expat ;-)

- Jonas

[semantic linebreaks]: https://sembr.org/
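
The point about line counts depending on writing style can be illustrated with a short Python sketch. The sample sentence is invented for this example and is not taken from any license. Re-wrapping the same text changes its line count, while the number of non-whitespace characters stays the same, which is one argument for a character-based threshold.

```python
import textwrap

SAMPLE = ("Permission is hereby granted to any person obtaining a copy of "
          "this example text to read it and to count its lines in whatever "
          "way they prefer.")


def line_count(text: str) -> int:
    return len(text.splitlines())


def nonspace_chars(text: str) -> int:
    """Count characters that are not whitespace (stable under re-wrapping)."""
    return sum(1 for ch in text if not ch.isspace())


wide = textwrap.fill(SAMPLE, width=72)
narrow = textwrap.fill(SAMPLE, width=30)

print("lines:", line_count(wide), "vs", line_count(narrow))
print("non-space characters:", nonspace_chars(wide), "vs", nonspace_chars(narrow))
```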
Jeremy Stanley <fungi@yuggoth.org> writes: of the short licenses because historically I wasn't considering them (with the exception of common-licenses references to the BSD license, which I kind of would like to make an RC bug and clean up so that we could remove the BSD license from common-licenses on the grounds that it's specific to only the University of California and confuses people). If we go with any sort of threshold, the script will need serious improvements. That was something else I wanted to ask: I've invested all of a couple of hours in this script, and would be happy to throw it away in favor of something that tries to do a more proper job of classifying the licenses referenced in debian/copyright. Has someone already done this (Jonas, perhaps)?
Hi,

Quoting Bill Allombert (2023-09-10 18:29:36)

I very much like this idea. The main reason maintainers want more licenses in /usr/share/common-licenses/ is so that they no longer have humongous d/copyright files with all license texts copy-pasted over and over again. If long texts could be reduced to a reference that gets expanded by a machine, it would make debian/copyright look much nicer and easier to maintain, while at the same time shipping the full license text in the binary package.

Does anybody know why such an approach would be a bad idea? I have zero legal training, so the only potential problem with this approach that I was able to come up with is that the source package itself would then no longer contain the license text. We would thus be shipping, without the license text, code covered by a license stating that the code may only be distributed with that text alongside it. So while auto-generating this would probably create compliant binary packages, it would leave the source package without the license text. Is that a problem?

Thanks!

cheers, josch
On 2023-09-09 20:35:27 -0700 (-0700), Russ Allbery wrote: [...] [...] I'm surprised, for example, by the absence of the ISC license given that not only ISC's software but much of that originating from the OpenBSD ecosystem uses it. My personal software projects also use the ISC license. Are you aggregating the "License:" field in copyright files too, or is it really simply a hard-coded list of matching patterns? Regardless, this is great work, thanks for kicking off the reevaluation!
At 2023-09-10T21:47:36+0200, Johannes Schauer Marin Rodrigues wrote:
[...]
...why wouldn't it? Remember how a source package is defined:
A DSC file, an upstream source archive (maybe more than one in exciting
new source formats I haven't learned), and a compressed diff of Debian
changes.
Debian _source_ packages generally don't chop copyright notices and
license texts out of the upstream distributions, and should not do so
unless those notices/texts are invalid or the material they cover has
been removed. (Both of these do sometimes happen.)
Even if one worries about theoretical liability due to the existence of
separate files for .dsc, .tar.gz, and .diff.gz, then let us recall that
(1) the DSC is minimal, containing metadata that may not rise to the
threshold of originality required by copyright [in the U.S., anyway];
(2) the upstream archive has the notices and texts that the _original
distributor_ put in it, and as a rule, if permission to distribute the
work exists, it is not incumbent on redistributors to add notices/texts
where the rightsholder themselves neglected to do so; and (3) the
.diff.gz will not be in the business of removing notices/texts except as
contemplated in the previous paragraph (correcting erroneous
notices).[1]
I don't think that is a risk as long as people continue to follow
packaging practices that Debian has applied with little objection from
our upstreams for 25+ years.[2]
I am unable to imagine the mechanism by which that would happen, given
what Russ and Bill proposed.
Regards,
Branden
[1] When repackaging, e.g., to remove non-free material, affected
content is removed altogether even from the source. Nothing in
copyright law can compel you to distribute copyright notices and
texts that don't apply to work you're not distributing.[3]
[2] I don't know of Debian _ever_ having had a problem, as in receiving
a cease-and-desist letter or other threat of legal action with what
one might term an "institutional" copyright holder. We've certainly
had our share of nasty emails from cantankerous individual copyright
holders, often who had their own perverse misreadings of licenses
drafted by others (hello to the memory of Jörg Schilling). There
also was once an upstream who stuck a Trojan horse into the source
code to try to get Debian's users to stop using versions we
distributed, but to go directly upstream instead. Nowadays, that
seems quaint; you can today Trojan your machine much more
conveniently with npm(1).
[3] At the same time a few non-free FSF manuals under the GNU FDL
declaim the GNU _GPL_ text to be an Invariant Section. Like most of
the defects of the FDL, I think this is a pointless encumbrance; if
you distribute GPL'ed software, a copy of its text must come along
anyway. The only rationale I can imagine is to mandate, for printed
copies of the manuals, the inclusion of the GPL's preachy preamble.
But I digress.
Johannes Schauer Marin Rodrigues <josch@debian.org> writes:

I can think of a few possible problems:

* I'm not sure if we generate binary package copyright files at build time right now, and if all of our tooling deals with this. I had thought that we prohibited this, but it looks like it's only a Policy should and there isn't a mention of it in the reject FAQ, so I think I was remembering the rule for debian/control instead. Of course, even if tools don't support this now, they could always be changed.

* If ftp-master has to review the copyright files of each binary package separate from the copyright file of the source package (I think this would be an implication of generating the copyright files during build time), and the binary copyright files have fully-expanded licenses, that sounds like kind of a pain for the ftp-master reviewers. Maybe we can deal with this with better tooling, but someone would need to write that.

* If we took this to its logical end point and did this with the GPL as well, we would add 20,000 copies of the GPL to the archive and install a *lot* of copies on the system. Admittedly text files are small and disks are large, but this still seems a little excessive. So maybe we still need to do something with common-licenses?
* Russ Allbery <rra@debian.org> [2023-09-10 09:16]:
For me, this outcome would already be an improvement over the current
situation and alleviate my biggest pain point (CC licenses).
Still, I'd like to be significantly more relaxed.
I propose the following three criteria must be satisfied for
inclusion in /usr/share/common-licenses:
* The license is DFSG-free.
* Exactly the same license wording is used by all works covered by it.
* The license is in the SPDX list of common licenses (https://spdx.org/licenses/)
OR
The license applies to at least 100 source packages in Debian.
I am not committed to the 100 source packages threshold, it is
mostly intended as fallback for a hypothetical future license which
is super popular but for some reason does not make it to the SPDX
list in a timely manner.
One very intentional side effect of my proposal is a nudge towards
using SPDX License Identifiers in d/copyright files.
Cheers
Timo
Quoting Russ Allbery (2023-09-10 21:41:59) I have so far worked the most on identifying and grouping source data, putting only little attention (yet - but do dream big...) towards parsing and processing debian/copyright files e.g. to compare and assess how well aligned the file is with the content it is supposed to cover. So if I understand your question correctly and you are not looking for the output of `licensecheck --list-licenses`, then unfortunately I have nothing exciting to offer. - Jonas
Jonas Smedegaard <jonas@jones.dk> writes:

I think that's mostly correct. I was wondering what would happen if one ran licensecheck debian/copyright, but unfortunately it doesn't look like it does anything useful. I tried it on one of my packages (remctl) that has a bunch of different licenses, and it just said:

    debian/copyright: MIT License

and apparently ignored all of the other licenses present (FSFAP, FSFFULLR, ISC, X11, GPL-2.0-or-later with Autoconf-exception-generic, and GPL-3.0-or-later with Autoconf-exception-generic). It also doesn't notice that some of the MIT licenses are variations that contain people's names.

(I still put all the Autoconf build machinery licenses in my debian/copyright file because of the tooling I use to manage my copyright file, which I also use upstream. I probably should change that, but I need to either switch to licensecheck or rewrite my horrible script.)

Also, presumably it doesn't know about copyright-format since it wouldn't be expecting that in source files, so it wouldn't know to include licenses referenced in License stanzas without the license text included.
Quoting Russ Allbery (2023-09-10 23:24:24)

Right. Licensecheck so far mostly scans for human prose stating "this has been licensed as..." and "this is the license...", and rarely is able to recognize "the default license of this project is..." or "that folder over there is licensed as..." style prose.

That said, there is interest in covering that as well, and also interest in improving on non-prose forms like "[this is YAML;] Copyright: ..." or binary forms most commonly embedded in fonts and ICC data in images. If you (i.e. anyone reading this) have a good (as in particularly rich/tricky/peculiar) case, it is helpful if you file a bug report pointing out that licensecheck fails to recognize it.

Also, I hadn't thought of there being interest in statistics - it should not be too hard to spit out numbers for variation in licenses or copyright holders once licensecheck has recognized the information. Again, if someone has suggestions for formats in which they'd particularly like such statistics to be served by licensecheck, then please file a bug report.

Sorry this isn't helping anything for the topic being discussed.

- Jonas
Hi, One problem is that some software declares that it uses a certain license (e.g. MIT), but sometimes modifies the license terms themselves a bit. So there is a difference between the wording in the license list and the wording of the license included in such software. It would be better to find such software and ask upstream to fix it to use the proper license terms, tagging it in the BTS. And this is NOT a Debian-specific issue, so it may be better to ask other folks to join such a movement, IMHO.
Quoting Hideki Yamane (2023-09-12 09:27:12)

I can only assume that the proposal for an automated DEBIAN/copyright file is limited to source files *possible* to automatically process, and consequently only relates to debian/copyright files written in the machine-readable format.

The problem you describe about ambiguous MIT-derived licensing cannot, in my understanding, occur using the machine-readable format - only with less strictly structured debian/copyright files.

If you mean to say that ambiguous MIT declarations exist in debian/copyright files written using the machine-readable format, then please point to an example, as I cannot imagine how that would look.

- Jonas
Jonas Smedegaard <jonas@jones.dk> writes: is essentially, but not precisely, the same as Expat. If we then tell people that they can omit the text of the license and we'll fill it in automatically, they'll remove the actual text and we'll fill it in with the wrong thing. This is just a bug in handling the debian/copyright file, though. If we take this approach, we'll need to be very explicit that you can only use whatever triggers the automatic inclusion of the license text if your license text is word-for-word identical. Otherwise, you'll need to cut and paste it into the file as always.
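
One way a tool could enforce the "word-for-word identical" rule before substituting a canonical text is sketched below in Python. Treating differences in whitespace and line wrapping as harmless is an assumption of this sketch, not something the thread has agreed on. The sample clause is the BSD clause quoted earlier in the thread, with "Example Project" as a made-up replacement name.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse all runs of whitespace so that re-wrapping does not matter."""
    return " ".join(text.split())


def is_verbatim(candidate: str, canonical: str) -> bool:
    """True only if the texts match word for word after normalization."""
    return normalize(candidate) == normalize(canonical)


def differing_words(candidate: str, canonical: str) -> list[tuple[list[str], list[str]]]:
    """Return (canonical words, candidate words) pairs where the texts differ."""
    a, b = canonical.split(), candidate.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


canonical = ("Neither the name of the University nor the names of its\n"
             "contributors may be used to endorse or promote products")
modified = ("Neither the name of the Example Project nor the names of its "
            "contributors may be used to endorse or promote products")

print(is_verbatim(modified, canonical))
print(differing_words(modified, canonical))
```

Only texts for which `is_verbatim` holds would be eligible for automatic inclusion; everything else would still be pasted into debian/copyright by hand.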
Quoting Russ Allbery (2023-09-12 18:15:27) Ah, right. I see it now. Strictly speaking it is not (as I was more narrowly focusing on) that the current debian/copyright spec leaves room for *ambiguity*, but instead that there is a real risk of making mistakes when replacing with centrally defined ones (e.g. redefining a local "Expat" from locally meaning "MIT-ish legalese as stated in this project" to falsely mean "the MIT-ish legalese that SPDX labels MIT"). If you disagree, then please shout, as then I am still missing your point here... - Jonas
Jonas Smedegaard <jonas@jones.dk> writes: Right, the existing copyright format defines a few standard labels and says that you should only use those labels when the license text matches, but it doesn't stress that "matches" means absolutely word-for-word identical. I suspect, although I haven't checked, that we've made at least a few mistakes where some license text that's basically equivalent to Expat is labelled as Expat even though the text is not word-for-word identical. Given that currently all labels in debian/copyright are essentially local and the full text is there (except for common-licenses, where apart from BSD the licenses normally are used verbatim), this is not currently really a bug. But we could turn it into a bug quite quickly if we relied on the license short name to look up the text. To take an example that I've been trying to get rid of for over a decade, many of the /usr/share/common-licenses/BSD references currently in the archive are incorrect. There are a few cases where the code is literally copyrighted only by the Regents of the University of California and uses exactly that license text, but this is not the case for a lot of them. It looks like a few people have even tried to say "use common-licenses but change the name in the license" rather than reproducing the license text, which I don't believe meets the terms of the license (although it's of course very unlikely that anyone would sue over it). 
A quick code search turns up the following examples, all of which I believe are wrong: https://sources.debian.org/src/mrpt/1:2.10.0+ds-3/doc/man-pages/pod/simul-beacons.pod/?hl=35#L35 https://sources.debian.org/src/gridengine/8.1.9+dfsg-11/debian/scripts/init_cluster/?hl=7#L7 https://sources.debian.org/src/rust-hyphenation/0.7.1-1/debian/copyright/?hl=278#L278 https://sources.debian.org/src/nim/1.6.14-1/debian/copyright/?hl=64#L64 https://sources.debian.org/src/yade/2023.02a-2/debian/copyright/?hl=78#L78 An example of one that probably is okay, although ideally we still wouldn't do this because there are other copyrights in the source: https://sources.debian.org/src/lpr/1:2008.05.17.3+nmu1/debian/copyright/?hl=15#L15 This problem potentially would happen a lot with the BSD licenses, since the copyright-format document points to SPDX and SPDX, since it only cares about labeling legally-equivalent documents, allows the license text to vary around things like the name of the person you're not supposed to say endorsed your software while still receiving the same label. We therefore cannot use solely SPDX as a way of determining whether we can substitute the text of the license automatically for people, because there are SPDX labels for a lot of licenses for which we'd need to copy and paste the exact license text because it varies. At least if I understand what our goals would be. (License texts that have portions that vary between packages they apply to are a menace and make everything much harder, and I really wish people would stop using them, but of course the world of software development is not going to listen to me.)
Note that my proposal makes the discrepancy easier rather than harder to detect, since you can compare the generated copyright file with the actual license statement without chasing files. Also, overengineering aside, the copyright generator could support parameter substitution to accommodate small discrepancies in licenses. For example, an option to replace the line "Copyright (c) The Regents of the University of California." in /usr/share/common-licenses/BSD with whatever is required when generating DEBIAN/copyright. Cheers, Bill
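A minimal sketch of the parameter substitution described above, in Python. The function name is hypothetical and nothing in this thread says such a generator exists. It replaces only the exact Regents copyright line and fails loudly if that line is absent, so a mismatch cannot pass silently.

```python
# The exact line quoted in the message above.
REGENTS_LINE = "Copyright (c) The Regents of the University of California."


def substitute_copyright_line(license_text: str, replacement: str) -> str:
    """Return license_text with the Regents copyright line replaced.

    Hypothetical helper for illustration only. Raises ValueError when the
    expected line is missing, rather than emitting an unmodified license.
    """
    lines = license_text.splitlines()
    if REGENTS_LINE not in lines:
        raise ValueError("expected copyright line not found")
    replaced = [replacement if line == REGENTS_LINE else line for line in lines]
    return "\n".join(replaced) + "\n"
```

As noted earlier in the thread, BSD-style texts can also vary in the clause naming who may not be cited as endorsing the software, so a real tool would presumably have to handle that clause too.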
Hopefully I'm not too late and I hope I won't make any ('dumb') mistakes as
I'm not as well-versed in licenses and packaging as other participants.
I think both of these criteria are excellent.
The only reason for not doing so that I've detected is worry about disk space?
If we were talking about several Megabytes (or even larger) then I could see
that point. But license text is max several Kilobytes?
diederik@bagend:/usr/share/doc$ find . -name copyright | wc -l
3759
I suspect I have an enormous amount of duplicate license texts on this system
and replacing those with references to common-licenses will likely reduce the
waste of system resources.
Optionally the license texts in common-licenses could be gz compressed (gzip
is Priority: required) to reduce disk-space even further.
So I would be in favor of dropping the threshold.
The primary reason I'm in favor of dropping this too is consistency.
This is an important reason why I'd want to have most/all licenses that are
used in Debian included in common-licenses.
It's not only tedious and annoying, but also (because of that) error prone.
And then you run the risk of the included license text not being (word-for-
word) the same.
Getting rid of tedious/annoying/repeating busy work seems like a win for
everyone.
And IMO it's not only not beneficial to our users, but actually provides extra
work. If I want to make sure the license text is indeed the same as my
(hopefully correct) local copy, I'd have to run a `diff` with the included text
in the copyright file. And that applies to every user who'd want to do that.
And also for a prospective (new) maintainer of a package.
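The `diff` check described above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the unwrapping step assumes the copyright-format convention that continuation lines start with one space and blank lines are written as " .".

```python
import difflib


def unwrap_copyright_format(field_body: str) -> str:
    """Undo copyright-format wrapping: drop one leading space, turn '.' lines into blanks."""
    out = []
    for line in field_body.splitlines():
        stripped = line[1:] if line.startswith(" ") else line
        out.append("" if stripped == "." else stripped)
    return "\n".join(out)


def license_diff(embedded: str, reference: str) -> list:
    """Return unified-diff lines; an empty list means the two texts are identical."""
    return list(
        difflib.unified_diff(
            reference.splitlines(),
            embedded.splitlines(),
            fromfile="reference",
            tofile="embedded",
            lineterm="",
        )
    )
```

An empty result from `license_diff(unwrap_copyright_format(body), reference_text)` would mean the embedded text matches the local copy word for word.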
I'm a (big) fan of SPDX because it simplifies and clarifies things (a lot IMO)
and makes things more consistent. And I'm a sucker for consistency.
I do think that the license should be provided locally (and its availability
not be dependent on a build step in some other tool).
Having a link to an online version may be a useful extra service, but having a
working internet connection should not be a requirement (IMO).
Cheers,
Diederik
Hello Russ, Thank you for working on this. Something that hasn't been brought up yet is the effects on NEW review. I would like to expand the idea of the same license wording being used by all works, to include the additional requirement that there aren't any very similar licenses that are easily confused with the license. For, if it's a license with small variations of any kind, including variations that are not project-specific things like the names of copyright holders, then NEW review is much easier if all the text is right there in d/copyright. I would be in favour of the 25 lines criterion. The main problem with manipulating d/copyright is only the really long licenses, IME.
Hi Russ and Sean, thanks for working on this. Just today I worked on a package having some CC-BY-SA-4.0 licensed content and wasn't too glad at having to copy the full license. Are there any big blockers for this? Reading the previous discussion, the technicalities seem to be mostly agreed upon (unless I missed something?). I think this would be a big improvement for packagers. Let me know if you need help finalizing any discussion to make this policy. best, Matthias Geiger <werdahias>
I suggested a tool that would copy the full license inside the binary package copyright file at build time. This seems a more sustainable option. Cheers,
Hello
As a member of the new DFSG team, I would like to restart the discussion
from September 2023.
The issue at hand is the inclusion of additional license texts in the
base-files package.
In doing so, I would like to continue Russ Allbery’s proposal [0].
Licenses will be included in common-licenses if they meet all of the
following criteria:
* The license is DFSG-free.
* Exactly the same license wording is used by all works covered by it.
* The license applies to at least 100 source packages in Debian.
* The license text is longer than 25 lines.
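Read literally, the four criteria above amount to a simple predicate. The sketch below is illustrative only: the class and field names are hypothetical, and whether a license is DFSG-free or uses identical wording everywhere is a human judgment that is merely passed in as a flag here.

```python
from dataclasses import dataclass

MIN_SOURCE_PACKAGES = 100  # "at least 100 source packages"
MIN_TEXT_LINES = 25        # "longer than 25 lines"


@dataclass
class LicenseCandidate:
    name: str
    dfsg_free: bool          # decided by people, not by this code
    identical_wording: bool  # same text in every work covered by it
    source_packages: int     # number of source packages using the license
    text_lines: int          # length of the license text in lines


def qualifies(c: LicenseCandidate) -> bool:
    """True only when all four proposed criteria are met."""
    return (
        c.dfsg_free
        and c.identical_wording
        and c.source_packages >= MIN_SOURCE_PACKAGES
        and c.text_lines > MIN_TEXT_LINES
    )
```

Note the asymmetry taken from the wording: exactly 100 packages passes, but a text of exactly 25 lines does not.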
It also lists various reasons why it makes sense to include additional
license texts.
* Including long legal texts in debian/copyright, particularly if one
wants to format them for copyright-format, is tedious and annoying and
doesn't benefit our users in any significant way, and therefore we
should include as many licenses as possible in common-licenses to spare
people that work.
* common-licenses consumes disk space on every installed Debian system of
any size, and therefore should be kept small to avoid wasting system
resources.
The above reasons also make it easier for the DFSG team to review the
packages.
Even on our own systems, the licenses in question meet these criteria.
[0] https://lists.debian.org/debian-devel/2023/09/msg00055.html
* Mechtilde Stehmann <mechtilde@debian.org> [260427 10:02]: Great. Let me state some IMO relevant questions: 1) What was the outcome of the 2023 discussion? 2) If nothing has changed, why? 3) What is the current dataset? Best, Chris
Hi all, Maybe it'd make sense to restrict this to licenses which are also included in Essential packages, or ones with a high enough priority (like Important), so that the extra disk usage for base-files is less of a concern. Bye!
Hello, On 27.04.26 at 10:08, Chris Hofstaedtler wrote: They agreed to include more licenses in /usr/share/common-licenses. No objections were raised. But nothing was changed. I guess because nobody wants to do the work. This is my personal opinion. As a first task, I think these licenses should be added. They fulfill the criteria Russ posted, also on my local machine: Artistic-2.0 AGPL-3 BSL-1.0 CC-BY-3.0 CC-BY-4.0 CC-BY-SA-3.0 CC-BY-SA-4.0 OFL-1.1 The Artistic License in /usr/share/common-licenses is version 1.0. Regards
I think that there was a consensus back then and there is still one now. Do you volunteer to NMU base-files, if the maintainer is not interested in working on this? The list of licenses to be added is small enough that this should not be a concern. But maybe you have different data?
Uh, what? I'm pretty certain Santiago would be happy to update base-files *once* debian-policy has been updated, but certainly not before. So instead of unnecessarily throwing shade, perhaps get debian-policy updated first? Thanks, Guillem
Indeed. Here is a quote from base-files FAQ for those who never bothered to read it: Q. Why isn't license "foo" included in common-licenses? A. I delegate such decisions to the policy group. If you want to propose a new license you should make a policy proposal to modify the paragraph in policy saying "Packages distributed under the Apache license (version 2.0), the Artistic license, the GNU GPL (versions 1, 2, or 3), the GNU LGPL (versions 2, 2.1, or 3), and the GNU FDL (versions 1.2 or 1.3) should refer to the corresponding files under /usr/share/common-licenses". The way of doing this is explained in the debian-policy package. As usual, you should always take a look at already reported bugs against debian-policy before submitting a new one. If somebody has a problem with me delegating the decision to the policy group, they should say so in a clear and non ambiguous way. Thanks.
Hello, On 27.04.26 at 15:22, Santiago Vila wrote: You can find the following text under https://salsa.debian.org/sanvila/base-files/-/blob/master/debian/README
Chris Hofstaedtler <zeha@debian.org> writes: I dropped the ball. I would very much welcome someone else pushing this forward, since I am way, way behind on all of my volunteer work.
Do you mean the license text or the license itself? For the remaining licenses, I have made a proposal to fill in debian/copyright at build time from the list of SPDX identifiers. This is still an option if the NEW team does not reject it. Cheers,
Santiago Vila [27/Apr 3:22pm +02] wrote: I'm sorry, I just posted to #1135097 stating the opposite .. I think that the "at least 100 packages" part of this proposal is too low a bar. But perhaps the traditional "deduplicating it would save disk space on the majority of Debian installations" is too high a bar. Any thoughts on something in between?
I think that the proposal is fine, since it adds quite a small number of licenses. If people disagree then please bring measurements.
why do you think so? I think 100 is quite a high bar already. (assuming we talk source packages.)
Holger Levsen [28/Apr 10:08am GMT] wrote: Those 100 could easily be packages that most systems don't have installed -- or, in particular, that systems that are trying to be really small almost never have installed. Really we need to hear from the people who are trying to make the minimal install of Debian small. That's not me.
* Sean Whitton <spwhitton@spwhitton.name> [260428 12:26]:
I have a mild interest in keeping small installs small, but I'm certainly not an expert. I've however done some poking.

Looking at the copyright files of packages installed by `mmdebstrap forky /dev/null` - IOW a set of packages that every 'normal' install of Debian can be expected to have (excluding container and embedded usecases which can and will apply hacks) - yields a few interesting things:

1) libc, sed have "Boost Software License - Version 1.0 - August 17th, 2003" in their copyright files. Adding this to common-licenses seems a net positive and could IMO be done immediately without any negative effects.

2) mawk, libunistring5 use CC-BY-SA 3.0
   These packages can be uninstalled. However curl depends on libunistring5, so once your install wants to talk to the Internet it probably has to stay.

3) nftables uses CC-BY-SA 4.0
   This package can be uninstalled, but again once you want network connectivity, ...

4) AGPLv3 is NOT present

5) Deduplicating copyright files might be a meaningful disk space saving, if we actually care about disk space savings. The install per above has:
   * 10 binary packages from src:util-linux adding 30KB copyright per binary
   * 6 binary packages from src:systemd adding 13KB copyright per binary
   * 5 binary packages from src:e2fsprogs adding 20KB copyright per binary
   * 4 binary packages from src:pam adding 10KB copyright per binary
   * 4 binary packages from src:krb5 adding 63KB copyright per binary
   ff.
   I haven't done a full calculation but it seems we could save 1MB in such an install just by deduplicating the copyright files. Someone else may be interested in running the same analysis on different install scenarios, say Live ISOs, Desktop installs, etc. With my src:util-linux maintainer hat on, I'd welcome tooling and a corresponding policy change towards copyright file deduplication. And/or compression might also be of interest.
6) Even in this install scenario we still have some packages not using https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ : debian-archive-keyring gcc-16-base libcrypt1 libgcc-s1 libgssapi-krb5-2 libk5crypto3 libkrb5-3 libkrb5support0 libstdc++6 Best, Chris
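The per-package figures in item 5 above can be turned into a rough estimate. The sketch below assumes that "deduplicating" means keeping one copyright file per source package instead of one per binary package; that reading, and the function names, are assumptions rather than anything stated in the message.

```python
# (source package, binary packages installed, copyright size per binary in KB),
# copied from the figures quoted in the message above.
OBSERVED = [
    ("util-linux", 10, 30),
    ("systemd", 6, 13),
    ("e2fsprogs", 5, 20),
    ("pam", 4, 10),
    ("krb5", 4, 63),
]


def current_total_kb(rows):
    """KB used today: one copyright file per binary package."""
    return sum(binaries * size_kb for _src, binaries, size_kb in rows)


def dedup_savings_kb(rows):
    """KB saved if each source package kept a single copy."""
    return sum((binaries - 1) * size_kb for _src, binaries, size_kb in rows)


print(current_total_kb(OBSERVED))  # 770
print(dedup_savings_kb(OBSERVED))  # 634
```

These five source packages alone would account for about 634 KB, which is in the same range as the rough 1MB estimate once the packages not listed here are included.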
Could the common-licenses directory be compressed? That would save a lot of space. common-licenses is 295k on my system, but the whole of base-files (compressed package) is only 73k. Cheers, Peter
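One way to put numbers on the compression question is to compare raw and gzipped sizes file by file. This is a sketch only: the function name is hypothetical, and per-file gzip is just one of several ways the directory could be compressed.

```python
import gzip


def gzip_totals(texts):
    """Return (raw_bytes, gzipped_bytes) summed over an iterable of bytes objects."""
    raw = 0
    packed = 0
    for data in texts:
        raw += len(data)
        packed += len(gzip.compress(data))
    return raw, packed
```

To try it on a real system, pass the contents of each regular file under /usr/share/common-licenses, for example via `pathlib.Path.read_bytes()`. Some entries there are symlinks, so they would need to be skipped to avoid counting a text twice.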
in Policy 12.5). That would save even more space, it's 63 MB in total on my machine and 2.1 MB on the aforementioned minimal forky. I've searched for a policy bug and of course there was one: #491055, filed in 2008, no discussions after 2008, closed in 2017 for inactivity.
[ trimming Cc lines a little bit, I will read replies from the lists ] Ok, let's make some kind of declaration here: The base-files package has not had any new license added in many years, while the rest of the system has continued to grow exponentially, as usual. Note that I'm using the word "exponentially" here in the pure mathematical sense, not in the colloquial English sense of "it grows too much". Most licenses are not really large files by today's standards. To be consistent, compressing licenses would force all references to be changed to the compressed version, IMO for very little gain. I'm more concerned about people copying and pasting the same licenses over and over again into debian/copyright, as pointed out by Mechtilde. Therefore, please do not worry about the increase in the installed size of base-files after this proposal is approved. I think we definitely can afford it. Thanks.
The reason is that the Debian policy team expected this could only be settled by the NEW team and the legal team so there was no point to debate either way. Cheers,
I fully agree. My tool of choice here is dh_installdocs --link-doc.
E.g. from kmod:
override_dh_installdocs:
	dh_installdocs -pkmod -plibkmod-dev --link-doc=libkmod2
	dh_installdocs -plibkmod2
Beware: enabling this on an existing package also requires using
dpkg-maintscript-helper (debian/*.maintscript) with dir_to_symlink.
Hello, On 28.04.26 at 15:24, Santiago Vila wrote: Ack. I am preparing a repository locally for the potential Merge Request. All 8 licenses need ~150 KB. That is less space on an install ISO than is used when the texts are shipped with each package. If there is agreement to follow the proposal with Simon's update in #1135097, I can prepare a Merge Request with the additional license texts. Thanks for the constructive discussion. Kind regards
Santiago Vila <sanvila@debian.org> writes: Yeah, I agree with this position. I think license texts are small even by the standards of embedded systems these days. Disk space growth has continued since previous rounds of this discussion, human time is more valuable than a few extra bytes of disk consumption, and we're talking about on the order of 1MiB at most (I suspect less than that). Compared to the size of the Debian base image, this is very small. Folks who actively work on embedded Debian should of course feel free to correct me, but my recollection of past discussions is that they had roughly the same position. I think even in the worst case scenario of a system with a ton of Debian chroots, the incremental size here is highly unlikely to be a significant factor compared to, e.g., normal growth in the size of the utilities in the base image. And of course the local system administrator can always rm -r /usr/share/common-licenses if they really want to. (I doubt anything important uses files there at runtime.)
On Tue, Apr 28, 2026 at 09:29:56AM -0700, Russ Allbery wrote: [..] It might be sensible to have policy allow for this, and thus require that no packages *use* these files during their normal operation (and also not in maintscripts, etc). Except maybe for tools explicitly designed to operate on them (say, license checkers, devscripts). I hope there is pre-established wording in policy that could be reused for such an exception. Chris
I was one of these people. I just deleted /usr/share/doc entirely so I don't think a couple kb more in there would make any difference. Certainly it wouldn't have made it for me.
We are talking about /usr/share/common-licenses which is not in /usr/share/doc ;-) cu Andreas
Are there objections to using SPDX abbreviations for the file names of licenses in base-files? I didn't double-check if that's in the proposal, but I think that's how it should be done. If we already deviate from SPDX names, then maybe moving existing files to SPDX names, recommending use of those names, and setting up symlinks would be an improvement. Or grandfather them in as exceptions, to avoid unnecessary debian/copyright churn. It would be nice if SPDX names were mentioned in debian-policy or base-files/debian/README.source, so we don't forget about this aspect in the future. We can always change that policy later on if it turns out to be a bad idea for some reason (if someone registers FOO`rm -rf /` as an SPDX license name, perhaps). /Simon
More generally we could have two packages: base-files with /usr/share/common-licenses/ and a new package spdx-license with /usr/share/spdx-licenses/ containing all SPDX licenses used by Debian, and have a tool that builds debian/copyright from spdx-license at build time, so spdx-license would only be needed when building packages. Cheers,
Bill Allombert <ballombe@debian.org> writes: I think that is orthogonal -- but I also think the suggestion is good. Doesn't 'spdx-licenses' provide this, though? Maybe not the "tool that build debian/copyright" part though, but that could be done separately. https://tracker.debian.org/pkg/spdx-licenses /Simon
Simon Josefsson <simon@josefsson.org> writes: In general, I think this is a good idea, but I think it's mostly meaningful in combination with adopting SPDX license abbreviations across the board, including in the copyright-format standard. To be clear, I agree with doing that, but I think it has the most value if it's not done piecemeal, since ideally the file names in common-licenses should match the names we use in copyright-format. (Some symlinking may be required if we have to rename anything; I haven't checked if that would be the case.)
Chris Hofstaedtler [28/Apr 1:11pm +02] wrote: Thank you for the feedback. Seems to me we can prioritise developer time by adding more licenses to common-licenses, then, with the possible exception of the AGPL.
Hello all, On 30.04.26 at 11:35, Sean Whitton wrote: Does this exception for the AGPL mean we will only add 120 KB instead of 150 KB to /usr/share/common-licenses? Kind regards
In case my opinion counts: I think AGPL is common enough and will still save developer time if added to common-licenses, even if it's not present in the absolutely minimum Debian system shown by Chris. Thanks.
This is a different concern that can be solved with better tooling to generate the copyright file, by automatically including the AGPL when needed. Cheers,
However, that was never the idea of the original report, and not what I would like to do. Does somebody else believe that adding 30k to base-files is too much because there are no packages in the base system using AGPL? AFAIK, "common licenses" means just that, common licenses, not "common licenses in the base system". I believe we would still benefit from adding the AGPL. Thanks.
On Fri, May 01, 2026 at 05:05:12PM +0200, Santiago Vila wrote:
Hi all, yes, please add the AGPL-3 and the other licenses suggested by Mechtilde to the common licenses. For reference, I just ran license-count on coccia after applying the attached patch. Here is the output. By the way, it takes only a few seconds to run, not 30 minutes as indicated in the source code comments.

AGPL 3                     313
Apache 2.0                7087
Artistic                  4270
Artistic 2.0               365
BSD (common-licenses)        3
BSL-1.0                    302
CC-BY 1.0                    3
CC-BY 2.0                   16
CC-BY 2.5                   11
CC-BY 3.0                  256
CC-BY 4.0                  249
CC-BY-SA 1.0                 9
CC-BY-SA 2.0                46
CC-BY-SA 2.5                19
CC-BY-SA 3.0               461
CC-BY-SA 4.0               352
CC0-1.0                   1544
CDDL                        66
CeCILL                      33
CeCILL-B                    16
CeCILL-C                    11
GFDL (any)                 588
GFDL (symlink)              53
GFDL 1.2                   285
GFDL 1.3                   254
GPL (any)                20356
GPL (symlink)              947
GPL 1                     4168
GPL 2                    10658
GPL 3                     7321
LGPL (any)                5310
LGPL (symlink)             192
LGPL 2                    4093
LGPL 2.1                  3184
LGPL 3                    1771
LaTeX PPL                   52
LaTeX PPL (any)             42
LaTeX PPL 1.3a               1
LaTeX PPL 1.3c              34
MPL 1.1                    178
MPL 2.0                    502
SIL OFL 1.0                 10
SIL OFL 1.1                309

The AGPL-3 and Artistic-2.0 are among the licenses promoted as 'standard' for R packages (together with GPL-2 GPL-3 LGPL-2 LGPL-2.1 LGPL-3 BSD_2_clause BSD_3_clause and MIT), which I have handled a lot recently. https://cran.r-project.org/doc/manuals/R-exts.html#Licensing

And while I agree with the opinions expressed here that there seem to be no objections raised directly by users of systems under space constraints, please note that adding CC-BY-SA-3.0 will not increase the size of systems using GRUB, and that CC-BY-SA-4.0 licenses are found on systems that use nftables. https://lists.debian.org/debian-policy/2026/01/msg00010.html

It is not uncommon that I find these four licenses when I have to write new debian/copyright files for r-cran-* and r-bioc-* packages, or when their upstreams relicense their work. I would deeply appreciate it if they could be added to the common licenses.
By the way, from the point of view of saving maintainers and reviewers of new packages the time spent writing, reading and scrolling, maybe the DFSG, Licensing and New packages team could run license-count regularly and see which licenses are trending up? Being proactive would have the highest impact. Have a nice week-end, Charles
* Santiago Vila <sanvila@debian.org> [260501 17:05]: I *guess* it's gonna be fine. As always, people likely have different ideas about how this selection came to be and what the presence of the files means. Maybe this should be spelled out somewhere, if it's not already done. Chris
Hi. After a discussion has taken place, I've decided to add the nine licenses proposed by Mechtilde to base-files. Including a license in base-files should be considered as a promise that the licenses will be there indefinitely (i.e. forever in principle, unless there is a very strong reason not to, but I can't imagine right now what kind of reason that could be). I understand that packages under those licenses now "may" refer to the copies in base-files in an opt-in way, as it's an essential package. Naturally, I expect this to become a "should" in policy as well in some not too distant future, but there is not any hurry on my side. In fact, I think it would be a good thing if there was some kind of intermediate period during which maintainers receive some warning or advance notice in a less intrusive way than a bug report (for example, by way of a lintian warning). Thanks.
Santiago Vila [29/May 4:06pm +02] wrote: Thanks. We can document this in Policy if someone would provide an updated patch. It shouldn't be a 'should' yet, indeed.