- Package: debian-policy
- Source: debian-policy
- Submitter: Simon Josefsson
- Date: 2026-06-11 11:43:01 UTC
- Severity: normal
- Tags:
Hi! The copyright specification https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ does not mention Files-Excluded/Included, which have been supported for a long time by uscan and other tools. What do you think about the attached patch to fix this? /Simon
Inlined patch below so it is directly visible in the email. I don't know why Files-Excluded was left out of this standard, but it seems obvious to me that it should be mentioned and a link to additional info provided, as Simon suggests. Seconded.

From 8c3605b91938ed78f554569696581a4935ae4e4f Mon Sep 17 00:00:00 2001
From: Simon Josefsson <simon@josefsson.org>
Date: Mon, 16 Mar 2026 11:01:42 +0100
Subject: [PATCH] Mention Files-Excluded/Included in copyright-format

---
 copyright-format-1.0.xml | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/copyright-format-1.0.xml b/copyright-format-1.0.xml
index 954a65b..8e1345b 100644
--- a/copyright-format-1.0.xml
+++ b/copyright-format-1.0.xml
@@ -246,6 +246,11 @@
           <link linkend="copyright-field">Copyright</link>: optional.
         </para>
       </listitem>
+      <listitem>
+        <para>
+          <link linkend="files-excluded-included-field">Files-Excluded, Files-Excluded-*, Files-Included, Files-Included-*</link>: optional.
+        </para>
+      </listitem>
     </itemizedlist>
     <para>
       The <varname>Copyright</varname> and <varname>License</varname>
@@ -498,6 +503,18 @@ License: MPL-1.1
       </para>
     </section>
 
+    <section id="files-excluded-included-field">
+      <title><varname>Files-Excluded, Files-Excluded-*, Files-Included, Files-Included-*</varname></title>
+      <para>
+        Formatted text, no synopsis: these fields are used by <ulink
+        url="https://manpages.debian.org/testing/devscripts/uscan.1.en.html#COPYRIGHT_FILE_EXAMPLES">uscan</ulink>
+        and <ulink
+        url="https://manpages.debian.org/testing/devscripts/mk-origtargz.1.en.html">mk-origtargz</ulink>
+        to automatically repack an upstream tarball, see the
+        documentation for details.
+      </para>
+    </section>
+
     <section id="license-field">
       <title><varname>License</varname></title>
       <para>
copyright-format specification, and packaging tools should not require the use of copyright-format since it is an optional format. They should have used another file instead. Cheers, Bill
Bill Allombert <ballombe@debian.org> writes: I agree it is a bad design. But a lot of things aren't how we would want them to be. Documenting reality is sometimes better than hiding things and hoping something better will come along. I believe Files-* is widely used in Debian packages already. Is anyone working on a better design? But what do you mean by saying that packaging tools shouldn't require the use of copyright-format? There is no requirement here; everything is opt-in. The patch documents these as optional fields, for those who want to use them, since they are supported by uscan and mk-origtargz. /Simon
But Debian Policy's purpose is higher than just documenting things (which are already documented anyway). Getting uscan to call mk-origtargz to remove files requires using the new copyright-format. Otherwise the maintainer needs to call mk-origtargz manually. We should do better. Cheers,
Yes, I think we should document these in the spec. One comment about your patch is that it refers to the other tools' documentation for what the contents of the fields should be. But Policy should probably become the canonical source here. So can you instead document the format of the contents of the fields?
Bill Allombert <ballombe@debian.org> writes: I'm not sure I follow here. How? From a design point of view, having a Files-Excluded/Included wildcard list seems like a reasonable approach. And people will need to opt in to use some mechanism, or do things manually. Is your concern that the existing fields hijack the debian/copyright file, when those fields could have been put in a different file? Or is there anything deeper about the design that you think could be improved? If I were to start from scratch to solve this problem, I can't think of anything I would improve. The tooling could be improved a lot to be more user friendly and to better support iterative re-imports, but the specification part seems fairly good. In my experience, the only reasons for using the Files-* headers are to avoid non-DFSG files, or to simplify the upstream package and so avoid the burden of documenting the license of 1000+ completely irrelevant files in debian/copyright. Thus, putting the headers in debian/copyright feels entirely appropriate to me, so I'm not sure I see the argument for a different file either. /Simon
mk-origtargz does not process Files-Excluded in debian/copyright if debian/copyright is not in the new copyright-format. Indeed. Cheers, Bill.
Sean Whitton <spwhitton@spwhitton.name> writes: finish this.

Attaching a first draft of the specification. Ideally this would be reviewed by someone familiar with how uscan/mk-origtargz works (or by me, or someone else, who could be made to compare code with text).

The text is silent on how multiple occurrences of these lines behave. The simplest option is to just ban that, which could be added. If multiple occurrences are permitted, the text needs to say whether the ORDER of those lines matters. I think there are two possible interpretations:

1) Multiple headers are merged into just one before processing. That is, first merge all Files-Excluded* headers and exclude all files mentioned in the merged list of fields. Then merge all Files-Included* headers and re-include all those files.

2) Multiple headers are processed in order of occurrence in the file. This means walking through the directives imperatively, applying them sequentially.

The corner case I'm thinking of is something like this:

  Files-Excluded: foo/*
  Files-Included: foo/bar*
  Files-Excluded: foo/bar_foo*
  Files-Included: foo/bar_foo_baz

And a tree like this:

  foo/hej
  foo/bar
  foo/barbar
  foo/bar_foo
  foo/bar_foofoo
  foo/bar_foo_baz

With 1) it would merge this into

  Files-Excluded: foo/* foo/bar_foo*

which means the second foo/bar_foo* is redundant (which suggests this is not a valid interpretation) and all files would be excluded, and then we would have

  Files-Included: foo/bar* foo/bar_foo_baz

which also contains a redundant line (also suggesting this is not a good interpretation) so that the resulting tree would be:

  foo/bar
  foo/barbar
  foo/bar_foo
  foo/bar_foofoo
  foo/bar_foo_baz

With 2) the first 'Files-Excluded: foo/*' would exclude ALL files, and the next 'Files-Included: foo/bar*' would make the tree contain:

  foo/bar
  foo/barbar
  foo/bar_foo
  foo/bar_foofoo
  foo/bar_foo_baz

and then 'Files-Excluded: foo/bar_foo*' would turn the tree into:

  foo/bar
  foo/barbar

and the final 'Files-Included: foo/bar_foo_baz' would turn the tree into:

  foo/bar
  foo/barbar
  foo/bar_foo_baz

So I think 2) is the better interpretation. Now I only wonder whether this is how uscan/mk-origtargz really behaves.

Another question is how to deal with paths containing odd characters like SPC. /Simon
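The two interpretations above can be compared with a small Python sketch. It is only an illustration of the reasoning in this message, not a description of what uscan or mk-origtargz do; the names `merged` and `sequential` are made up for the example, and `fnmatch` is used as a stand-in matcher because its `*` also matches `/` (its `[...]` handling differs from the Files field, which does not matter for these patterns).

```python
from fnmatch import fnmatchcase

# The example tree and directives from the message above.
TREE = [
    "foo/hej",
    "foo/bar",
    "foo/barbar",
    "foo/bar_foo",
    "foo/bar_foofoo",
    "foo/bar_foo_baz",
]
DIRECTIVES = [
    ("Files-Excluded", "foo/*"),
    ("Files-Included", "foo/bar*"),
    ("Files-Excluded", "foo/bar_foo*"),
    ("Files-Included", "foo/bar_foo_baz"),
]


def matches(path, patterns):
    """True if the path matches at least one of the patterns."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def merged(tree, directives):
    """Interpretation 1: merge all excludes, then all includes."""
    excluded = [p for name, value in directives
                if name == "Files-Excluded" for p in value.split()]
    included = [p for name, value in directives
                if name == "Files-Included" for p in value.split()]
    return [f for f in tree
            if not matches(f, excluded) or matches(f, included)]


def sequential(tree, directives):
    """Interpretation 2: apply each directive in file order."""
    kept = set(tree)
    for name, value in directives:
        hits = {f for f in tree if matches(f, value.split())}
        if name == "Files-Excluded":
            kept -= hits
        else:
            kept |= hits
    return [f for f in tree if f in kept]


print(merged(TREE, DIRECTIVES))
print(sequential(TREE, DIRECTIVES))
```

Under these assumptions the merged reading keeps every foo/bar* file, while the sequential reading keeps only foo/bar, foo/barbar and foo/bar_foo_baz, which agrees with the trees worked out by hand above.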
Bill Allombert <Bill.Allombert@math.u-bordeaux.fr> writes: Right. Would you want it to behave in any other way? If someone dislikes the new copyright-format, they can do things manually. Or they can come up with a new specification for how to do things, and try to gain adoption of that. None of that conflicts directly with documenting the current approach. Thanks for clarifying. I see your point, and can sympathize, but I think this is a case where the current situation is not perfect, but it is so widely used that fixing all those occurrences is a lot of work compared to merely documenting and accepting what is currently used and working. If there were some actual substantial GAIN from changing all Files-* in debian/copyright to some new format, that would be a better argument, but even then I see no problem with documenting the current approach. /Simon
Simon Josefsson <simon@josefsson.org> [17/Mar 3:00pm +01] wrote: I don't know about any of these details either. But until they are figured out I don't think we should document this. Either Policy defines the field and how it works or it doesn't mention it because it remains, nominally, an experimental extension.
Sean Whitton <spwhitton@spwhitton.name> writes: I agree -- but I'm hopeful that figuring this out is only stalled on my lack of knowledge, rather than these details not being clear from other documentation/code. Let's wait and see if someone steps up to clarify the implemented semantics. /Simon
Hi
[...]
Repeated use of the same field within a stanza of a deb822-style file is
already forbidden, so there's no further need to worry about any of
these cases:
copyright-format/1.0 §4 says:
The syntax of the file is the same as for other Debian control files,
as specified in the Debian Policy Manual. See its section 5.1 for
details.
And Policy §5.1 says:
A stanza must not contain more than one instance of a particular field
name.
mk-origtargz uses dpkg's Dpkg::Control to read the data which matches
this requirement; likewise python-debian's debian.copyright module
implements this already.
The mk-origtargz source uses these as what copyright-format/1.0 §4.2
calls a "whitespace-separated list".
split(/\s+/, $data->{ $self->config->excludestanza })
This matches all other places in copyright-format/1.0 where files or
sets of files are described - i.e. the Files field (see §6.9).
Between §4.2 and §6.9, the details are already exhaustively described in
the format, along with practical guidance on how to deal with a space
character.
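As a rough illustration of the two points above (no repeated field within a stanza, and the value being a whitespace-separated list), here is a minimal Python sketch. It is not Dpkg::Control or python-debian's debian.copyright; `parse_stanza` is a hypothetical helper written for this example, and it handles only simple fields, continuation lines and comments.

```python
def parse_stanza(text):
    """Parse one deb822-style stanza into a dict keyed by lowercased field name.

    Raises ValueError on a repeated field, following Policy 5.1
    ("A stanza must not contain more than one instance of a
    particular field name").
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the stanza
        if line.startswith("#"):
            continue  # comment line
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a field line: {line!r}")
        key = name.lower()  # field names are not case-sensitive
        if key in fields:
            raise ValueError(f"duplicate field: {name}")
        current = key
        fields[key] = value.strip()
    return fields


# Hypothetical header stanza; the patterns are made up for illustration.
HEADER = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Files-Excluded: foo/* docs/non-free.pdf
 vendor/*
"""

fields = parse_stanza(HEADER)
# Same idea as the Perl split(/\s+/, ...) quoted above.
patterns = fields["files-excluded"].split()
print(patterns)
```

Splitting on any run of whitespace means a continuation line simply adds more patterns to the same list, so one field per stanza is enough to express everything.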
I would suggest that the current patch for the documentation of these
fields should have a reference to the format of the field (§4.2) and
useful additional information about that format (§6.9) added to it.
Would be great to get this extensively used field documented -
apparently 4304 packages currently use it.
cheers
Stuart
Stuart Prescott <stuart@debian.org> writes: Thank you! I was hoping for that. I don't think the section even has to repeat this piece of information. Great suggestions! What do you think of the updated patch below? The rendered text reads like this:

6.7. Files-Excluded, Files-Excluded-*, Files-Included, Files-Included-*

Whitespace-separated list: filename patterns (same as the Files field, including the wildcards) used to signal repacking of the upstream source to remove unwanted files.

Files-Excluded is a list of filename patterns matching files (including directories) to exclude. Files-Excluded-COMPONENT behaves the same, but is applied to a particular orig.tar component only (replacing COMPONENT with the component name), for multi-orig.tar sources.

Files-Included is used to include some files (or directories) that were excluded by a Files-Excluded expression. Files-Included-COMPONENT is similarly used to re-include paths excluded by Files-Excluded-COMPONENT.

These fields are supported by uscan and mk-origtargz.

I believe this is a self-contained specification that does not need any external references in order to be understood, and that it is based on already widely implemented properties. That should address Sean's concern about the syntax of these fields being experimental. /Simon
Simon Josefsson <simon@josefsson.org> [21/Mar 10:01am +01] wrote: Thanks, this looks good to me as a piece of Policy. I'm not qualified to review its factual accuracy though so it would be great if someone else could do that.
Sat 2026-03-21 at 12:52 +0000, Sean Whitton wrote: Indeed, and there is no hurry, so I think it would be great to hear feedback from people familiar with implementations of these headers, to have better confidence in the text. /Simon
Hello everybody, I am glad to see the documentation of Files-Excluded moving forward. In 2012 (#685506#10) I recommended waiting, and in 2013 (#685506#55) I wrote in a reply to Andreas: 13 years later, no alternative has emerged. Moreover, Andreas made the important point that it is relevant to document in `debian/copyright` which files are removed from the upstream sources that we point to in the same file. In addition, `uscan` has a `--copyright` argument to use a custom location for the `debian/copyright` file, and developers who do not wish to use the machine-readable copyright format can write their own package update scripts that take advantage of this option. In the context of creating new packages I have used `uscan` with a short machine-readable copyright file header stub without `Files` paragraphs, and it works well. Thus, there is no obligation to use the machine-readable format to document the package copyright in order to be able to use `uscan`'s facility for removing files from tarballs. I have read the whole of #685506, #1000771 and #1131015 again today, and my main recommendation would be to also credit Joe Nahmias for their patch in #685506#197, which is not directly used here but primed some of the stakeholders for their review of the current patch. I have read the plain text rendering of 1131015#75 as well as uscan(1) and https://wiki.debian.org/UscanEnhancements; altogether this matches my understanding and the experience I have with the tool, with the caveat that I have never used the Files-Excluded-<component> syntax. Finally, in addition to being used by `uscan` and `mk-origtargz`, the `Files-Excluded` field is also recognised by other tools, such as `cme` and `lintian`. I do not think this needs to be documented, but I mention it to illustrate how the field has spread in our ecosystem in the past 14 years or so.
Based on Simon's patch, and following Osamu's recommendation in #1000771#15 to keep things short, and Bill's concerns about tying maintainer tools to a particular format of the `debian/copyright` file, I would like to propose the following modified text.

----------------------------------------------------------------
Whitespace-separated filename patterns, as in the Files field, used to indicate which parts of the upstream source specified in the Source field will not be included in the Debian source package. Files-Excluded lists patterns for files or directories to omit. Files-Included lists patterns for files that should not be omitted even if they match a Files-Excluded rule. For multi-orig.tar sources, Files-Excluded-COMPONENT and Files-Included-COMPONENT apply the same logic but only to the specified orig.tar component. These fields may also be used by maintainer tools to perform the exclusions, but such tools should not require the machine-readable debian/copyright file format for correct operation.
----------------------------------------------------------------

Have a nice day, Charles
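One way to read the proposed paragraph is sketched below in Python. This is only an interpretation for discussion, not the behaviour of uscan or mk-origtargz: `pattern_to_regex`, `matches` and `kept_files` are hypothetical helpers, the wildcard rules follow the Files field description (`*` and `?` also match `/`, backslash escapes the next character), and treating a pattern that matches a directory as covering everything beneath it is an assumption drawn from "files or directories to omit".

```python
import re


def pattern_to_regex(pattern):
    """Translate a Files-style pattern into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            i += 1
            out.append(re.escape(pattern[i]))  # escaped literal character
        elif ch == "*":
            out.append(".*")  # any sequence, including '/'
        elif ch == "?":
            out.append(".")   # any single character, including '/'
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def matches(path, patterns):
    """True if a pattern matches the path or one of its parent directories."""
    parts = path.split("/")
    candidates = ["/".join(parts[:n]) for n in range(1, len(parts) + 1)]
    return any(pattern_to_regex(p).fullmatch(c)
               for p in patterns for c in candidates)


def kept_files(fields, files, component=None):
    """Files that remain after exclusion, for the main tarball or a component."""
    suffix = f"-{component}" if component else ""
    excluded = fields.get(f"Files-Excluded{suffix}", "").split()
    included = fields.get(f"Files-Included{suffix}", "").split()
    return [f for f in files
            if not matches(f, excluded) or matches(f, included)]


# Hypothetical header fields and file lists, made up for illustration.
fields = {
    "Files-Excluded": "vendor/* *.min.js",
    "Files-Included": "vendor/README",
    "Files-Excluded-docs": "api/*.pdf",
}
main_files = ["src/main.c", "src/app.min.js", "vendor/lib/a.c", "vendor/README"]
docs_files = ["api/ref.pdf", "api/index.html"]

print(kept_files(fields, main_files))
print(kept_files(fields, docs_files, component="docs"))
```

In this reading the COMPONENT fields are looked up purely by name suffix, so the main tarball and each component are filtered independently of one another.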
Thanks for going through earlier discussions Charles! Please find attached an updated patch, using your proposed paragraph but adding a parenthesis about wildcards/directory names which I find clarifying. What do you all think about this self-contained patch? /Simon
Is it possible to set the copyright file inside the debian/watch file? The man page suggests this is not possible. Cheers,
Simon Josefsson [29/May 8:49am +02] wrote: Seconded, thanks.
Hi, also seconded. best, werdahias
control: tag -1 + pending Matthias Geiger [11/Jun 12:56pm +02] wrote: Queued up for the next release, thanks all.