#1131015 mention Files-Excluded/Included-* in copyright-format-1.0.xml

#1131015#5
Date:
2026-03-16 22:51:33 UTC
From:
To:
Hi!  The copyright specification

https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

does not mention Files-Excluded/Included, which have been supported for a
long time by uscan and other tools.

What do you think about the attached patch to fix this?

/Simon

#1131015#10
Date:
2026-03-17 06:31:03 UTC
From:
To:
Inlined patch below so it is directly visible in the email.

I don't know why Files-Excluded was left out from this standard, but
it seems obvious to me that it should be mentioned and a link to
additional info provided like Simon suggests.

Seconded.


From 8c3605b91938ed78f554569696581a4935ae4e4f Mon Sep 17 00:00:00 2001
From: Simon Josefsson <simon@josefsson.org>
Date: Mon, 16 Mar 2026 11:01:42 +0100
Subject: [PATCH] Mention Files-Excluded/Included in copyright-format
---
 copyright-format-1.0.xml | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/copyright-format-1.0.xml b/copyright-format-1.0.xml
index 954a65b..8e1345b 100644
--- a/copyright-format-1.0.xml
+++ b/copyright-format-1.0.xml
@@ -246,6 +246,11 @@
       <link linkend="copyright-field">Copyright</link>: optional.
     </para>
   </listitem>
+  <listitem>
+    <para>
+      <link linkend="files-excluded-included-field">Files-Excluded, Files-Excluded-*, Files-Included, Files-Included-*</link>: optional.
+    </para>
+  </listitem>
 </itemizedlist>
 <para>
   The <varname>Copyright</varname> and <varname>License</varname>
@@ -498,6 +503,18 @@ License: MPL-1.1
   </para>
 </section>
 
+<section id="files-excluded-included-field">
+  <title><varname>Files-Excluded, Files-Excluded-*, Files-Included, Files-Included-*</varname></title>
+  <para>
+    Formatted text, no synopsis: these fields are used by <ulink
+    url="https://manpages.debian.org/testing/devscripts/uscan.1.en.html#COPYRIGHT_FILE_EXAMPLES">uscan</ulink>
+    and <ulink
+    url="https://manpages.debian.org/testing/devscripts/mk-origtargz.1.en.html">mk-origtargz</ulink>
+    to automatically repack an upstream tarball, see the
+    documentation for details.
+  </para>
+</section>
+
 <section id="license-field">
   <title><varname>License</varname></title>
   <para>

#1131015#15
Date:
2026-03-17 07:12:39 UTC
From:
To:
copyright-format specification, and packaging tools should not require the use
of copyright-format since it is an optional format.  They should have used
another file instead.

Cheers,
Bill

#1131015#20
Date:
2026-03-17 07:46:29 UTC
From:
To:
Bill Allombert <ballombe@debian.org> writes:

I agree it is a bad design.  But a lot of things aren't how we would
want them to be.  Documenting reality is sometimes better than hiding
things, hoping something better will come along.  I believe Files-* is
widely used in Debian packages already.  Is anyone working on a better
design?

Although, what do you mean by saying that packaging tools shouldn't
require the use of copyright-format?  There is no requirement here,
everything is opt-in.  The patch documents these as optional fields, for
those who want to use them, as they are supported by uscan and
mk-origtargz.

/Simon

#1131015#25
Date:
2026-03-17 09:24:54 UTC
From:
To:
But Debian Policy's purpose is higher than just documenting things
(which are already documented anyway).

Getting uscan to call mk-origtargz to remove files requires using the new
copyright-format. Otherwise the maintainer needs to call mk-origtargz
manually. We should do better.

Cheers,

#1131015#30
Date:
2026-03-17 09:48:29 UTC
From:
To:
Yes, I think we should document these in the spec.

One comment about your patch is that it refers to the other tools'
documentation for what the contents of the fields should be.  But Policy
should probably become the canonical source here.  So can you instead
document the format of the contents of the fields?

#1131015#35
Date:
2026-03-17 13:31:30 UTC
From:
To:
Bill Allombert <ballombe@debian.org> writes:

I'm not sure I follow here.  How?  From a design point of view, having a
Files Excluded/Included wildcard list seems like a reasonable approach.
And people will need to opt-in to use some mechanism, or do things
manually.

Is your concern that the existing fields hijack the debian/copyright
file, when those fields could have been put in a different file?

Or is there anything deeper about the design that you think could be
improved?

If I were to start from scratch to solve this problem, I couldn't come up
with anything to improve.  The tooling could be improved a lot to be
more user friendly and better support iterative re-imports, but the
specification part seems fairly good.

In my experience, the only reasons for using the Files-* headers are to
avoid non-DFSG files, or to simplify the upstream package and so avoid
the burden of documenting the license of 1000+ completely irrelevant
files in debian/copyright.  Thus, putting the headers in debian/copyright
feels entirely appropriate to me, so I'm not sure I see the argument for
a different file either.

/Simon

#1131015#40
Date:
2026-03-17 13:48:45 UTC
From:
To:
mk-origtargz does not process Files-Excluded in debian/copyright if
debian/copyright is not in the new copyright-format.

Indeed.

Cheers,
Bill.

#1131015#45
Date:
2026-03-17 14:00:42 UTC
From:
To:
Sean Whitton <spwhitton@spwhitton.name> writes:
finish this.  Attaching a first draft for specification.

This would ideally have to be reviewed by someone familiar with how
uscan/mk-origtargz works (or by me, or someone else, who could be made
to compare code with text).

The text is silent on how multiple occurrences of these lines behave.
The simplest is to just ban that, which could be added.

If multiple occurrences are permitted, the text needs to say whether
the ORDER of those lines matters.

I think there are two possible interpretations:

1) Multiple headers are merged into just one before processing.  That
is, first merge all Files-Excluded* headers and exclude all files
mentioned in the merged list of fields.  Then merge all Files-Included*
headers and then re-include all those files.

2) Multiple headers are processed in order of occurrence in the file.
This means walking through the directives imperatively, applying them
sequentially.

The corner case I'm thinking of is something like this:

Files-Excluded: foo/*
Files-Included: foo/bar*
Files-Excluded: foo/bar_foo*
Files-Included: foo/bar_foo_baz

And a tree like this:

foo/hej
foo/bar
foo/barbar
foo/bar_foo
foo/bar_foofoo
foo/bar_foo_baz

With 1) it would merge this into

Files-Excluded: foo/* foo/bar_foo*

which means the second foo/bar_foo* is redundant (which suggests this
is not a valid interpretation) and all files would be excluded, and then
we would have

Files-Included: foo/bar* foo/bar_foo_baz

which also contains a redundant line (also suggesting this is not a good
interpretation) so that the resulting tree would be:

foo/bar
foo/barbar
foo/bar_foo
foo/bar_foofoo
foo/bar_foo_baz

With 2) the first 'Files-Excluded: foo/*' would exclude ALL files, and
the next 'Files-Included: foo/bar*' would make the tree contain:

foo/bar
foo/barbar
foo/bar_foo
foo/bar_foofoo
foo/bar_foo_baz

and then 'Files-Excluded: foo/bar_foo*' would turn the tree into:

foo/bar
foo/barbar

and the final 'Files-Included: foo/bar_foo_baz' would turn the tree
into:

foo/bar
foo/barbar
foo/bar_foo_baz

So I think 2) is the better interpretation.  Now I only wonder if this
is actually how uscan/mk-origtargz really behaves.
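
To make the two interpretations concrete, here is a minimal sketch that replays the corner case above in Python.  It is only an illustration of the reasoning, not a description of how uscan or mk-origtargz behave: the function names `merged` and `sequential` are made up for this example, and Python's `fnmatch` is assumed to be a close enough stand-in for the wildcard matching (its `*` also matches `/`).

```python
from fnmatch import fnmatchcase

# The example tree and the four directives from the corner case above.
TREE = [
    "foo/hej",
    "foo/bar",
    "foo/barbar",
    "foo/bar_foo",
    "foo/bar_foofoo",
    "foo/bar_foo_baz",
]

DIRECTIVES = [
    ("Files-Excluded", "foo/*"),
    ("Files-Included", "foo/bar*"),
    ("Files-Excluded", "foo/bar_foo*"),
    ("Files-Included", "foo/bar_foo_baz"),
]


def matches(path, patterns):
    """True if the path matches at least one wildcard pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def merged(tree, directives):
    """Interpretation 1: merge all excludes, then merge all includes."""
    excluded = [p for field, value in directives
                if field == "Files-Excluded" for p in value.split()]
    included = [p for field, value in directives
                if field == "Files-Included" for p in value.split()]
    return [path for path in tree
            if not matches(path, excluded) or matches(path, included)]


def sequential(tree, directives):
    """Interpretation 2: apply each directive in order of occurrence."""
    kept = set(tree)
    for field, value in directives:
        hits = {path for path in tree if matches(path, value.split())}
        if field == "Files-Excluded":
            kept -= hits
        else:
            kept |= hits
    return [path for path in tree if path in kept]


print(merged(TREE, DIRECTIVES))
print(sequential(TREE, DIRECTIVES))
```

The first call keeps everything except foo/hej, and the second keeps only foo/bar, foo/barbar and foo/bar_foo_baz, matching the trees worked out by hand above.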

Another question is how to deal with paths containing odd characters
like SPC.

/Simon

#1131015#50
Date:
2026-03-17 14:07:57 UTC
From:
To:
Bill Allombert <Bill.Allombert@math.u-bordeaux.fr> writes:

Right.  Would you want it to behave in any other way?

If someone dislikes the new copyright-format, they can do things
manually.  Or come up with a new specification for how to do things, and
try to gain adoption of that.  Documenting the current approach does not
directly conflict with any of that.

Thanks for clarifying.

I see your point, and can sympathize, but I think this is a case where
the current situation is not perfect, but it is so widely used, and
fixing all those occurrences is a lot of work compared to merely
documenting and accepting what is currently used and working.

If there were some actual substantial GAIN from changing all Files-*
in debian/copyright to using some new format, that would be a better
argument, but even then I see no problem with documenting the current
approach.

/Simon

#1131015#55
Date:
2026-03-17 14:28:32 UTC
From:
To:
Simon Josefsson <simon@josefsson.org> [17/Mar  3:00pm +01] wrote:

I don't know about any of these details either.  But until they are
figured out I don't think we should document this.  Either Policy
defines the field and how it works or it doesn't mention it because it
remains, nominally, an experimental extension.

#1131015#60
Date:
2026-03-17 14:41:47 UTC
From:
To:
Sean Whitton <spwhitton@spwhitton.name> writes:

I agree -- but I'm hopeful that figuring this out is only stalled on my
lack of knowledge, rather than these details not being clear from other
documentation/code.  Let's wait and see if someone steps up to clarify
the implemented semantics.

/Simon

#1131015#65
Date:
2026-03-21 07:04:57 UTC
From:
To:
Hi
[...]

Repeated use of the same field within a stanza of a deb822-style file is
already forbidden, so there's no further need to worry about any of
these cases:

copyright-format/1.0 §4 says:

   The syntax of the file is the same as for other Debian control files,
   as specified in the Debian Policy Manual. See its section 5.1 for
   details.

And Policy §5.1 says:

   A stanza must not contain more than one instance of a particular field
   name.

mk-origtargz uses dpkg's Dpkg::Control to read the data which matches
this requirement; likewise python-debian's debian.copyright module
implements this already.

The mk-origtargz source uses these as what copyright-format/1.0 §4.2
calls a "whitespace-separated list".

	split(/\s+/, $data->{ $self->config->excludestanza })

This matches all other places in copyright-format/1.0 where files or
sets of files are described - i.e. the Files field (see §6.9).

Between §4.2 and §6.9, the details are already exhaustively described in
the format, along with practical guidance on how to deal with a space
character.
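
As a rough illustration of the two points above (no repeated field names within a stanza, and values read as a whitespace-separated list), here is a small Python sketch.  It is a deliberately simplified stand-in, not the Dpkg::Control or python-debian implementation: `parse_stanza` and `patterns` are hypothetical helper names, and the parser only handles a single stanza with simple continuation lines.

```python
def parse_stanza(text):
    """Parse one deb822-style stanza into a dict, rejecting repeated fields.

    Simplified sketch: stops at the first blank line and treats lines
    starting with a space or tab as continuations of the previous field.
    """
    fields = {}
    current = None
    for line in text.strip().splitlines():
        if not line.strip():
            break  # a blank line ends the stanza
        if line.startswith("#"):
            continue  # comment line
        if line[0] in (" ", "\t"):
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        key = name.strip().lower()  # field names are not case-sensitive
        if key in fields:
            raise ValueError(f"duplicate field: {name.strip()}")
        fields[key] = value.strip()
        current = key
    return fields


def patterns(fields, name):
    """Return the field value as a whitespace-separated list of patterns."""
    return fields.get(name.lower(), "").split()


HEADER = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Files-Excluded: foo/*
 docs/non-free.pdf  vendor/*
Files-Included: foo/bar*
"""

stanza = parse_stanza(HEADER)
print(patterns(stanza, "Files-Excluded"))
print(patterns(stanza, "Files-Included"))
```

With this sketch, a stanza that contains `Files-Excluded` twice raises `ValueError` instead of being merged or processed in order, which is why the ordering question does not arise.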

I would suggest that the current patch for the documentation of these
fields should have a reference to the format of the field (§4.2) and
useful additional information about that format (§6.9) added to it.


Would be great to get this extensively used field documented -
apparently 4304 packages currently use it.

cheers
Stuart

#1131015#70
Date:
2026-03-21 09:01:07 UTC
From:
To:
Stuart Prescott <stuart@debian.org> writes:

Thank you!  I was hoping for that.  I don't think the section even has
to repeat this piece of information.

Great suggestions!  What do you think of the updated patch below?

The rendered text reads like this:

   6.7. Files-Excluded, Files-Excluded-*, Files-Included, Files-Included-*

   Whitespace-separated list: filename patterns (same as the Files
   field, including the wildcards) used to signal repacking of the
   upstream source to remove unwanted files. Files-Excluded is a
   list of filename patterns matching files (including directories) to
   exclude. Files-Excluded-COMPONENT behaves the same, but is applied to
   a particular orig.tar component only (replacing COMPONENT with the
   component name), for multi-orig.tar sources. Files-Included is used
   to include some files (or directories) that were excluded by a
   Files-Excluded expression. Files-Included-COMPONENT is similarly
   used to re-include paths excluded by Files-Excluded-COMPONENT. These
   fields are supported by uscan and mk-origtargz.
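
A minimal sketch of how the main and per-component fields in the text above could be selected and applied, assuming Python's `fnmatch` as the pattern matcher.  The names `field_names` and `repack`, the component name "docs" and the example paths are all hypothetical, and the sketch matches whole paths only (it does not expand a matched directory to its contents), so it should not be read as the uscan/mk-origtargz implementation.

```python
from fnmatch import fnmatchcase


def field_names(component=None):
    """Return the (exclude, include) field names for the main tarball,
    or for one orig.tar component when a component name is given."""
    suffix = f"-{component}" if component else ""
    return f"Files-Excluded{suffix}", f"Files-Included{suffix}"


def repack(paths, stanza, component=None):
    """Keep a path unless it matches an exclude pattern and no include
    pattern re-includes it."""
    exclude_name, include_name = field_names(component)
    excluded = stanza.get(exclude_name, "").split()
    included = stanza.get(include_name, "").split()

    def hit(path, pats):
        return any(fnmatchcase(path, pattern) for pattern in pats)

    return [p for p in paths if not hit(p, excluded) or hit(p, included)]


# Hypothetical header stanza, already parsed into a dict.
STANZA = {
    "Files-Excluded": "vendor/* *.min.js",
    "Files-Included": "vendor/README",
    "Files-Excluded-docs": "*.pdf",
}

main_paths = ["src/main.c", "vendor/lib.c", "vendor/README", "web/app.min.js"]
docs_paths = ["manual.pdf", "manual.txt"]

print(repack(main_paths, STANZA))          # main orig.tar
print(repack(docs_paths, STANZA, "docs"))  # the "docs" component only
```

The point of the sketch is that the component fields never affect the main tarball and vice versa; each tarball is filtered only by the pair of fields carrying its own suffix.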

I believe this is a self-contained specification that needs no external
references in order to be understood, and is based on already widely
implemented properties.  This should address Sean's concern about the
syntax of these fields being experimental.

/Simon

#1131015#75
Date:
2026-03-21 12:52:54 UTC
From:
To:
Simon Josefsson <simon@josefsson.org> [21/Mar 10:01am +01] wrote:

Thanks, this looks good to me as a piece of Policy.  I'm not qualified
to review its factual accuracy though so it would be great if someone
else could do that.

#1131015#80
Date:
2026-03-21 13:03:49 UTC
From:
To:
lör 2026-03-21 klockan 12:52 +0000 skrev Sean Whitton:

Indeed, and there is no hurry, so I think it would be great to hear
feedback from people familiar with implementations of these headers to
have better confidence in the text.

/Simon

#1131015#85
Date:
2026-05-29 00:54:25 UTC
From:
To:
Hello everybody,

I am glad to see the documentation of Files-Excluded moving forward.

In 2012 (#685506#10) I recommended to wait and in 2013 (#685506#55) I
wrote in a reply to Andreas:

13 years later, no alternative has emerged.  Moreover, Andreas made the
important point that it is relevant to document in `debian/copyright`
which files are removed from the upstream sources that we point to in
the same file.

In addition, `uscan` has a `--copyright` argument to use a custom
location for the `debian/copyright` file and developers who do not wish
to use the machine-readable copyright format can write their own package
update scripts that take advantage of this option.  In the context of
creating new packages I have used `uscan` with a short machine-readable
copyright file header stub without `Files` paragraphs, and it works
well.  Thus, there is no obligation to use the machine-readable format
to document the package copyright in order to be able to use `uscan`'s
facility for removing files from tarballs.

I have read the whole of #685506, #1000771 and #1131015 again today, and
my main recommendation would be to also credit Joe Nahmias for their
patch in #685506#197, which is not directly used here but primed some of
the stakeholders here for their review of the current patch.

I have read the plain text rendering of 1131015#75 as well as uscan(1),
https://wiki.debian.org/UscanEnhancements; altogether this matches my
understanding and the experience I have with the tool, with the caveat
that I never used the Files-Excluded-<component> syntax.

Finally, in addition to being used by `uscan` and `mk-origtargz`, the
`Files-Excluded` field is also recognised by multiple tools such as `cme`
and `lintian` for instance.  I do not think it needs to be documented but I
mention it to illustrate how the field has spread in our ecosystem in
the past 14 years or so.

Based on Simon's patch, and following Osamu's recommendation in
#1000771#15 to keep things short, and Bill's concerns about tying
maintainer tools to a particular format of the `debian/copyright` file,
I would like to propose the following modified text.

----------------------------------------------------------------
Whitespace-separated filename patterns, as in the Files field, used to
indicate which parts of the upstream source specified in the Source
field will not be included in the Debian source package. Files-Excluded
lists patterns for files or directories to omit. Files-Included lists
patterns for files that should not be omitted even if they match a
Files-Excluded rule. For multi-orig.tar sources,
Files-Excluded-COMPONENT and Files-Included-COMPONENT apply the same
logic but only to the specified orig.tar component. These fields may
also be used by maintainer tools to perform the exclusions, but such
tools should not require the machine-readable debian/copyright file
format for correct operation.
----------------------------------------------------------------

Have a nice day,

Charles

#1131015#90
Date:
2026-05-29 06:49:43 UTC
From:
To:
Thanks for going through earlier discussions Charles!  Please find
attached an updated patch, using your proposed paragraph but adding a
parenthesis about wildcards/directory names which I find clarifying.

What do you all think about this self-contained patch?

/Simon

#1131015#95
Date:
2026-05-29 08:31:36 UTC
From:
To:
Is it possible to set the copyright file inside the debian/watch file?
The man page suggests this is not possible.

Cheers,

#1131015#100
Date:
2026-05-31 11:30:31 UTC
From:
To:
Simon Josefsson [29/May  8:49am +02] wrote:

Seconded, thanks.

#1131015#105
Date:
2026-06-11 10:56:37 UTC
From:
To:
Hi,

also seconded.

best,

werdahias

#1131015#110
Date:
2026-06-11 11:33:44 UTC
From:
To:
control: tag -1 + pending

Matthias Geiger [11/Jun 12:56pm +02] wrote:

Queued up for the next release, thanks all.