- Package:
- debian-policy
- Source:
- debian-policy
- Submitter:
- Enrico Zini
- Date:
- 2026-01-29 22:51:28 UTC
- Severity:
- wishlist
Hello,
enrico> Just when I wanted to split Maintainer fields by commas, I
stumble on Maintainer: Adam C. Powell, IV <hazelsct@debian.org>
enrico> Now I'll just split Uploaders: fields by commas
liw> enrico, hmm, should the Maintainer field not be an rfc822
compatible e-mail address spec?
enrico> liw: I'll check the policy
liw> hmm, the policy only mentions a problem with periods
enrico> liw: and only the mail address seems to be RFC822
Section 5.6.2. `Maintainer' says:
The package maintainer's name and email address. The name should come
first, then the email address inside angle brackets `<>' (in RFC822
format).
If the maintainer's name contains a full stop then the whole field
will not work directly as an email address due to a misfeature in the
syntax specified in RFC822; a program using this field as an address
must check for this and correct the problem if necessary (for example
by putting the name in round brackets and moving it to the end, and
bringing the email address forward).
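The workaround the policy text suggests (move the name into round brackets after the address) can be sketched in Python. The helper `to_comment_form` and its regex are hypothetical illustrations, not part of policy or of any Debian tool, and the sketch assumes the field holds a single `Name <addr>` entry.

```python
import re

# Hypothetical helper: one "Name <addr>" entry, with optional surrounding spaces.
_ENTRY = re.compile(r"^\s*(?P<name>[^<]*?)\s*<(?P<addr>[^<>]+)>\s*$")


def to_comment_form(field: str) -> str:
    """Rewrite 'Name <addr>' as 'addr (Name)' when the name contains a full stop."""
    m = _ENTRY.match(field)
    if m is None:
        raise ValueError(f"not of the form 'Name <addr>': {field!r}")
    name, addr = m.group("name"), m.group("addr")
    if "." not in name:
        return field.strip()  # already usable as an address as-is
    # The name becomes an RFC 822 comment after the bare address.
    return f"{addr} ({name})"
```

For example, `to_comment_form("Adam C. Powell, IV <hazelsct@debian.org>")` yields `hazelsct@debian.org (Adam C. Powell, IV)`.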
Now, the field "Adam C. Powell, IV <hazelsct@debian.org>" seems to be
legal according to current policy. However, later in "5.6.3.
`Uploaders'" the policy says:
The format is the same as that of the Maintainer tag, and multiple
entries should be comma separated.
This would imply that commas should not be used in the Maintainer field.
This is not a big problem for me now, since I can work around the issue
by using two different functions to handle Maintainer and Uploaders, one
that does not try to split on commas and the other that does.
However, this issue seems to require a little clarification.
Ciao,
Enrico
There is no reason to split Maintainer fields, because there should be nothing to split. You should probably have two different functions anyway, because if you encountered a Maintainer field with two values, that would be illegal. :)
That is very right indeed :)
So the part about the Uploader field ("5.6.3. `Uploaders'") probably
needs changing, where it says:
The format is the same as that of the Maintainer tag, and
multiple entries should be comma separated.
I started writing a possible amended text, but I stopped not knowing if
commas are to be allowed within double quotes (as in "Adam C. Powell,
IV" <...>) or to be disallowed altogether.
Ciao,
Enrico
The current Policy (since 3.8.1.0) says "All control files must be encoded in UTF-8." and the current practice is to use UTF-8 in Maintainer: field values.
My intention with this bug was to ask for clarification on how entries in Uploaders: fields should be split, and I do not see how UTF-8 encoding can help with that.

I checked version 3.9.6.1, and the policy still says this for the Uploaders: field:

Uploaders: Adam C. Powell, IV <hazelsct@debian.org> is a field with a list of two values, "Adam C. Powell" and "IV <hazelsct@debian.org>", one of which is illegal because it contains no email address inside angle brackets.

However, since the Maintainer: field has only one value, its description does not mention how commas can be represented in its value without being confused with separators. I would like the Uploaders: field description to clarify how one can have a comma in the name without triggering a split. There are several possible strategies:

- it could say that commas are not allowed at all in an Uploaders: field, except as separators;
- it could say that a name containing commas must be double-quoted;
- it could say that a comma is a separator only if it is preceded by a closing angle bracket and optional spaces: this option is the hardest to implement, but it keeps Adam's packages policy-compliant.

I assume this is an effect of this bug having been merged with 160827 and 160827 having been closed. I've unmerged them and reopened this one, as this does not have anything to do with charset encoding.

Enrico
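The third strategy can be sketched as a regular-expression split in Python. `split_uploaders` is a hypothetical helper: it treats a comma as a separator only when the nearest non-space character before it is `>`, and it ignores quoting entirely, so a quoted name containing `>,` would still be mis-split.

```python
import re

# A comma is a separator only if it follows '>' (optionally with spaces between).
_SEPARATOR = re.compile(r"(?<=>)\s*,")


def split_uploaders(value: str) -> list[str]:
    """Split an Uploaders value, keeping commas inside names intact."""
    parts = _SEPARATOR.split(value)
    # Drop empty fragments, such as the one left by a trailing comma.
    return [p.strip() for p in parts if p.strip()]
```

With this rule, `Adam C. Powell, IV <hazelsct@debian.org>, Jane Doe <jane@example.org>` splits into two entries, and the comma inside Adam's name is left alone.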
* Enrico Zini <enrico@debian.org>, 2015-05-15, 13:22: We have another open bug about that. Let's merge them.
It was always my intent that this field would be a subset of the RFC822/5322 sender/recipient field format. We should never have diverged from 822 here. That we ever permitted commas in the name part was an egregious mistake.

IMO the only question for this bug is precisely which subset of 822 format is allowed.

The work to be done here is:

* Review 5322 and decide which subset to allow. Probably just unquoted Name <Email> and one-quoted-block "Name" <Email>.
* Write this down in policy, making many programs insta-noncompliant.
* Go around fixing all the software.

I wanted to reply specifically to this one comment:

This is a very bad argument. We should not syntactically prevent a future evolution of our policy to permit co-maintainership, or normal use of this same field by downstreams with a different policy.

Furthermore, we have the Uploaders field now. Clearly Maintainer and Uploaders ought to be in the same syntax.

Ian.
This last line is somewhat surprising to me. Clearly Uploaders shouldn't exist if Maintainer allows multiple emails.
Andrey Rakhmatullin writes ("Re: Bug#401452: Standardize syntax of the name in the Maintainer control field"):
You do have a point. It's true that we invented Uploaders as a
workaround for the problem with Maintainer not allowing commas. But
nowadays it seems that some people do make a distinction between
Maintainer and Uploaders.
I think trying to disentangle the semantics of Maintainer vs Uploaders
will attract a great deal of attention from anyone who has opinions
about package maintenance practices - ie, loads and loads of people.
I worry we'll be drowning in "but *our* definition is ..." and "in our
team we distinguish ..." and so on.
I suggest that we decouple these problems.
I think we can fix the syntax first. Regardless of the semantics,
there is no good reason for the syntaxes to be different.
After we have fixed the syntax, we can consider the semantics of
Maintainer (possibly with multiple entries) vs Uploaders,
separately. Then we can debate questions of workflow, policy,
politics, and so on, without getting hung up on the syntax.
IOW let's not try to drain the ocean all at once.
Ian.
Do you have a list of all tools and services that assume that Maintainer is a single email address? Until we have one, I consider any changes to be dangerous. Cheers,
Suggested approach.
Gosh there's a lot of decisions to make!
Ian.
## Fields giving names and email addresses (entity and email fields)
The format of the content of all fields naming people, including
Maintainer and Uploaders, is based on IETF RFC 5322 recipient fields
(e.g., "To:" fields in emails), with some modifications.
We call these "entity and email fields".
Informally:
* The field is a comma-separated list of `name <email>` where `name`
can be quoted `"name"` (and may then contain Unicode), or be
unquoted but then has a restricted character set which excludes
Unicode and excludes `,`.
* There is no `\`-escaping: names simply cannot contain `\` or `"`.
* The `<email>` part contains no whitespace and has a very
restrictive character set.
* The value can be folded to break the lines, except within `"..."`.
Formally:
* These fields are "multiline" as per [internal xref].
* The content is an RFC5322 `address-list`,
with exceptions:
* Each RFC5322 `mailbox` must be `name-addr`
with a non-absent `display-name`.
* Whitespace is not permitted within RFC5322 `angle-addr`.
* Each RFC5322 `local-part` must be `dot-atom`.
* Each RFC5322 `phrase` must be either a single `quoted-string`,
or one or more space-separated `atom`s.
* The RFC5322 `domain` must be in lowercase.
* The following RFC5322 constructs are forbidden:
`obs-*`, `group`, `comment`, `quoted-pair`, `domain-literal`.
* UTF-8 representing Unicode characters with Graphic basic type
may occur as part of `qtext` within `qcontent`.
* Outside RFC5322 `quoted-string`, `FWS` means a single ASCII space,
or a newline followed by one or more ASCII spaces.
* Inside RFC5322 `quoted-string`, `FWS` means a single ASCII space.
Currently, Maintainer may only contain one entry, but this is a
semantic, not syntactic restriction, and may be relaxed in the future.
Email addresses that don't fit the `dot-atom@domain` form are
theoretically legal in RFC 5322 but cannot be represented. In practice,
such addresses are almost unusable on the modern Internet.
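
As a rough cross-check of the formal rules, here is a regex-based validator sketch in Python. The names (`is_valid_field` and the pattern constants) are hypothetical, and the sketch simplifies in stated ways: "Graphic" Unicode is approximated as "any non-control character", a fold inside `"..."` is not rejected, runs of multiple spaces between atoms are rejected, and "domain in lowercase" is read as "no A-Z".

```python
import re

# RFC 5322 atext: ASCII letters, digits and the listed punctuation.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Assumption: "domain must be in lowercase" means a dot-atom without A-Z.
LOWER_ATEXT = r"[a-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOMAIN = rf"{LOWER_ATEXT}+(?:\.{LOWER_ATEXT}+)*"
# One quoted block: no '\' or '"', no control chars, single spaces between words.
QUOTED = r'"[^"\\\x00-\x1f\x7f ]+(?: [^"\\\x00-\x1f\x7f ]+)*"'
# Or one or more space-separated ASCII atoms.
ATOMS = rf"{ATEXT}+(?: {ATEXT}+)*"

# name-addr with a mandatory display-name and no whitespace inside <...>.
ENTRY_SRC = rf"(?:{QUOTED}|{ATOMS}) ?<{DOT_ATOM}@{DOMAIN}>"
FIELD = re.compile(rf"{ENTRY_SRC}(?: ?, ?{ENTRY_SRC})*")


def is_valid_field(value: str) -> bool:
    """Check a whole (possibly folded) entity and email field."""
    # Replace each fold (newline plus leading spaces) with a single space.
    unfolded = re.sub(r"\n +", " ", value.strip())
    return FIELD.fullmatch(unfolded) is not None
```

Under this reading, `"Adam C. Powell, IV" <hazelsct@debian.org>` is accepted, while the unquoted form, an unquoted non-ASCII name, and an uppercase domain are all rejected.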
### Historical notes
The Maintainer and Uploaders fields have historically had a more
relaxed, but also inconsistent and confusing syntax.
When existing data is processed:
* ASCII punctuation characters not permitted in RFC5322 atext might
be found unquoted in the `phrase` part.
* This includes commas in Maintainer, but not in Uploaders.
### Processing strategy
A system which doesn't need to understand the field can safely display
it as-is in its entirety.
A system which needs to understand an entity and email field could
proceed as follows:
* Unfold as if this were a "folded" field, collapsing each whitespace
sequence into a single space, so we have a single line.
* Match `"` quotes to identify quoted text. These quotes always
appear in pairs. Check that quoted text contains no `\`.
* Split the whole field on unquoted `,`.
If the field is a Maintainer field and this would result in any
fragments that do not end in `>`, skip this step. In the future,
this rule will be abolished, and only be relevant for old data.
* Strip whitespace from the ends.
* Now each entry will end in `<....>`. That is the email address
part.
It has a restricted syntax: the allowable character set is ASCII
alphanumerics plus any of the following punctuation:
! # $ % & ' * + - / = ? ^ _ ` { | } ~
The email address is in a canonical representation, so can be
directly compared for equality.
* The remainder of the entry (with white space normalised to single
spaces) is the name part. Strip any `"`.
The name part may be used for human display and possibly ordering.
It should not be involved in equality comparisons, lookups, etc.
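
The processing steps above map fairly directly onto Python. This is a sketch assuming Python 3.9+: `parse_entities` and `Entry` are hypothetical names, quotes are matched by toggling an in-quotes flag while scanning, empty fragments (such as the one left by a trailing comma) are dropped, and the email part is not checked against the character set.

```python
import re
from typing import Iterator, NamedTuple


class Entry(NamedTuple):
    name: str   # for human display and ordering only
    email: str  # canonical form: use this for equality comparisons


def _unquoted_commas(text: str) -> Iterator[int]:
    """Yield the index of every comma that lies outside "..." quotes."""
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            yield i
    if in_quotes:
        raise ValueError("unbalanced double quote")


_ENTRY = re.compile(r"(.*?)\s*<([^<>\s]+)>")


def parse_entities(value: str, is_maintainer: bool = False) -> list[Entry]:
    # Unfold: collapse every whitespace run (including newlines) to one space.
    text = " ".join(value.split())
    # The proposal forbids '\' anywhere (there is no quoted-pair escaping).
    if "\\" in text:
        raise ValueError("backslash is not permitted")
    # Split on commas that are not inside a quoted name.
    cuts = list(_unquoted_commas(text))
    starts = [0] + [c + 1 for c in cuts]
    ends = cuts + [len(text)]
    fragments = [text[s:e].strip() for s, e in zip(starts, ends)]
    fragments = [f for f in fragments if f]  # assumption: ignore empty pieces
    # Legacy Maintainer rule: if a fragment would not end in '>',
    # the commas belong to the name, so skip the split.
    if is_maintainer and any(not f.endswith(">") for f in fragments):
        fragments = [text.strip(" ,")]
    entries = []
    for frag in fragments:
        m = _ENTRY.fullmatch(frag)
        if m is None:
            raise ValueError(f"entry does not end in <email>: {frag!r}")
        # Name part: drop the double quotes and normalise spaces.
        name = " ".join(m.group(1).replace('"', "").split())
        entries.append(Entry(name, m.group(2)))
    return entries
```

Scanning with a generator keeps the quote matching and the comma splitting in one pass; a parser that only displays names could skip the `Entry` extraction entirely.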
### Sending emails
To send email to those named in an entity and email field:
Replace any `"..."` that contain non-ASCII with `encoded-words`
as per IETF RFC 1342.
Or, split the addresses as above and use an email header generation library.
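
Python's standard library is one example of such a library: `email.utils.formataddr` double-quotes an ASCII name that contains specials (such as `,` or `.`) and, for a non-ASCII name, emits encoded-words as defined in RFC 2047, the successor of RFC 1342. The sketch assumes the field has already been split into `(name, email)` pairs; `to_header` is a hypothetical helper.

```python
from email.utils import formataddr


def to_header(entries: list[tuple[str, str]]) -> str:
    """Build a To: header value from (name, email) pairs."""
    # formataddr quotes ASCII names containing specials, and encodes
    # non-ASCII names as RFC 2047 encoded-words in the given charset.
    return ", ".join(formataddr(pair, charset="utf-8") for pair in entries)
```

For example, `("Adam C. Powell, IV", "hazelsct@debian.org")` becomes `"Adam C. Powell, IV" <hazelsct@debian.org>`.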
Ian Jackson <ijackson@chiark.greenend.org.uk> writes:

I think this first point is the one that poses the most backward-compatibility issues.

The main reason why I have, in the past, looked at this bug and then decided not to work on it is that I'm not sure how to reconcile two quite reasonable competing goals:

1. The Maintainer and Uploaders control fields should be easily usable as email addresses.
2. The Maintainer and Uploaders control fields do not have the long history of backward-compatibility challenges of RFC 5322, and existing practice and the current specification are looser.

In particular, the current specification doesn't require people with non-ASCII names to add extra punctuation around their name, which would feel icky, at least to me.

In the past, we had at least one Debian maintainer who had a comma in their name. The relevant software mostly handled that correctly in the Maintainer field. I'm not sure if that's still the case.

Before we make changes here, I think we need to understand the blast radius. Maybe someone can do some work in UDD to figure out how many packages would be affected if we were to tighten up the syntax here?

Requiring quoting for commas (and any other reserved RFC 5322 punctuation) makes sense to me, since that's really going to break otherwise if one attempts to use Maintainer in the To field of an email message. I'm a bit less willing to require non-ASCII to be quoted; all of the control fields are specified as being UTF-8 at this point, and I feel like most email-sending libraries should be able to cope with UTF-8 in the name even if RFC 5322 still requires weird escaping. But maybe that belief is too optimistic.
For what it's worth, packages.debian.org and the BTS send email to the package maintainer according to the Maintainer field. They do not support multiple email addresses. Cheers,
Bill Allombert <ballombe@debian.org> writes: Do you (or anyone else) know what encoding they do to transform the Maintainer field value into an email address in order to send mail to the package maintainer?
Russ Allbery writes ("Re: Bug#401452: Info received (Bug#401452: Standardize syntax of the name in the Maintainer control field)"):
Yes. Hmm.
While I was writing my previous proposal it occurred to me that maybe
we should follow git instead. I tried to find a specification of the
syntax of the "author" and "committer" and "tagger" header lines in
the relevant git objects, but failed.
Hi Ian,

Huge thanks for tackling this one... it's a seemingly simple but actually complicated field to describe, as you have noted.

I've had a bit of a wander through the list of entries that are currently in Maintainer and Uploaders to look at what the stated approach would rule in or out. That raises some examples to consider. I ask the questions about a few examples below from the "are we sure this is what we want to do" perspective rather than "we should not do this".

I'll use made-up examples in the discussion below rather than extracting real people's names from Sources. I don't want to centre the discussion on any individuals, and I am also conscious that this discussion must not turn into something with overtones of "you're spelling your name wrong".

It makes for a very long reply, sorry. It's not because there are lots of problems, just (corner) cases to understand.

cheers
Stuart

I'm pleased that we finally have a way to include , in the name part; that fixes one of the current problems nicely. There is only one current example of a comma in Maintainer/Uploaders and it is already quoted in this way.

We have a few of the following constructs in the name that I *think* are OK by these rules without quoting, but to confirm:

J Smith (js)   [parens]
J (js) Smith   [parens]
J O'Dear       [single quote]

(I have a recollection of parens being special in email addresses; single quotes often are special and there are lots of them in existing entries. Just double-checking!)

I would like to suggest that we find a way to permit non-ASCII Unicode letter characters in the name part without requiring quotes. I understand that's an extension to RFC5322 but ...

- any use of these data will pass through some sort of MUA that fixes the representation before it becomes an issue
- other fields in d/control and Sources are allowed to contain non-ASCII Unicode letters without any restrictions or encoding.
- it would be a compatible upgrade to RFC5322, in that anyone who did quote some non-ASCII characters in their name will not have done the wrong thing
- it is appropriate to find ways of being less Anglocentric in our format specifications, and I have a feeling that is possible to do safely here
- there are many hundreds of existing entries in Sources where the names contain non-ASCII letter characters from lots of different languages
- I doubt there is an appetite in Debian to make many thousands of existing packages insta-buggy, then take the next decade to upload fixes, and until they are all fixed also have no set format that parsers can rely on.

Some examples:

Julián Niño
J Lee (你好世界)
你好世界

(and we could, of course, imagine lots of other languages and scripts being used here; there are several others in Sources)

From the perspective of someone writing a parser I can see why this is attractive... we do have a couple of counter-examples in the archive at present:

John "Fred" Smith

It's a big call to tell those people that they don't know how to spell their name. Can we avoid imposing this restriction without causing too much pain? (Undoubtedly _some_ don't mind " versus ', but is that the design principle we should work to?)

This is an interesting requirement; is there any need for it? There are counter-examples currently in the archive, and uppercase domain names work just fine in real mail systems.

The examples above probably explore the space enough, but the attached script spits out 360 'interesting' Maintainer/Uploaders entries to look at if you are curious to see some real cases and check for other variations that I've missed. The regex is deliberately stricter than these rules, so that it pulls out 'interesting' entries for 'are we sure' discussions rather than 'violations of the above rules'.
Note that the script looks at unique entries in Sources, not people (plenty of repeated names with different email addresses); it offers a count of unique (name, addr) pairs and a count of affected source packages in main. Some variations on the regex in the script let us consider some variations to these rules.

The rules as written above = about 300 buggy entries across 5500 packages. Of these:

- approx 290 are Unicode letter characters in names, i.e. if we can allow Unicode letter characters in the name part without needing quoting, we make huge strides in compatibility. (My test was \w, which in Python 3 permits Unicode “Lm”, “Lt”, “Lu”, “Ll”, or “Lo” plus some digit/numeric forms that we don't actually want to permit, but which aren't in use in the data set and so aren't an issue here.)
- approx 10 entries are from domain names being in uppercase
- there's a handful of remaining items that might actually be OK but are at the limits of my current understanding of RFC5322, such as allowing @ in the name part.

(and then there are 6 or so buggy entries already in Maintainer and Uploaders, either missing commas or with stray commas)

I think these data make a strong case for permitting Unicode letter characters in the name part, and uppercase domain names.

Thanks for listing this out; it's useful to consider this at the same time. I had a go at coding it (to eventually land in python-debian) while working through it, but couldn't quite follow a couple of the steps below.

I'm not sure what 'Match' means in practical terms in the algorithm. Would you be storing the list of (start-quote, stop-quote) positions and then, at the later splitting step, not splitting at positions of "," that fall within those (start-quote, stop-quote) ranges? (In my playing, I ended up walking the length of the string, toggling whether the current position was inside or outside a quoted section, and only acting on commas found while outside; Python's 'yield' keyword is convenient for that.)
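The trade-off about unquoted Unicode letters can be made concrete with a small classifier for an unquoted name part. This is a hypothetical sketch, not the script attached to the message: `classify_name` reports whether a name is valid as RFC 5322 atoms under the proposal as written, valid only if Unicode letters (categories Lu, Ll, Lt, Lm, Lo) are allowed unquoted, or in need of quoting either way.

```python
import string
import unicodedata

# RFC 5322 atext: ASCII letters, digits and these punctuation characters.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")
# The Unicode letter categories discussed above.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def _ok_relaxed(ch: str) -> bool:
    """atext, or a non-ASCII character in one of the letter categories."""
    return ch in ATEXT or (
        ord(ch) > 127 and unicodedata.category(ch) in LETTER_CATEGORIES
    )


def classify_name(name: str) -> str:
    words = name.split(" ")
    if not all(words):  # empty name, or leading/trailing/double spaces
        return "needs-quoting"
    if all(ch in ATEXT for w in words for ch in w):
        return "atoms"  # valid unquoted under the proposal as written
    if all(_ok_relaxed(ch) for w in words for ch in w):
        return "unicode-letters"  # valid only with the suggested relaxation
    return "needs-quoting"
```

Under this check, `J O'Dear` is plain atoms, `Julián Niño` and `你好世界` need the relaxation, and `J Smith (js)` needs quoting because parentheses are not `atext` and comments are forbidden. A name typed with combining accents (NFD) also falls outside the letter categories, so normalising to NFC first may be needed.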
Can you please unpack why this is needed? They are defined to not exist ;) Is the purpose to parse or to validate? There are lots of other things that one should check if the purpose is to validate.

I don't see a nice way of doing that at that point, as I'm not seeing the bigger picture for the algorithm. Perhaps this is a good DebCamp discussion?

It would be worth noting that we have many examples of trailing commas in Uploaders and that these should be specifically allowed (partly so that implementations don't assume that the last entry is non-empty). (In looking at examples in the archive I also found 3 cases where commas were missing in Uploaders; one fixed on salsa, one bug filed.)

The only Maintainer fields containing "," are ones with a single entry that ends with ","; they are already buggy and the parser would drop the empty section anyway, so perhaps this wart can be omitted?

also @ . ;)

The rules above restrict that further to lowercase ASCII; does one feel the need to actually check that it matches those things? If actually validating, there's a lot more to do than a character check; if not validating, then it's just the bit between < and >.

Whitespace normalisation was already done in the first step, so it can be avoided here; whitespace on the left end was dealt with two steps before. Stripping the single whitespace from the right end would be needed, though. (Side question: is the whitespace between the name part and < required?)

While true of course... we also do that in lots of places in Debian to squash together the multiple emails that an individual has within Sources (e.g. in the UDD dashboard views).
[ Replying to the bug report, for tracking purposes, as it seems I accidentally sent the previous two mails only to the debian-policy mailing list. ] Hi! Thanks, Guillem