#401452 Standardize syntax of the name in the Maintainer control field

#401452#5
Date:
2006-12-03 17:49:23 UTC
From:
To:
Hello,

 enrico> Just when I wanted to split Maintainer fields by commas, I
         stumble on Maintainer: Adam C. Powell, IV <hazelsct@debian.org>
 enrico> Now I'll just split Uploader: fields by commas
    liw> enrico, hmm, should the Maintainer field not be an rfc822
         compatible e-mail address spec?
 enrico> liw: I'll check the policy
    liw> hmm, the policy only mentions a problem with periods
 enrico> liw: and only the mail address seems to be RFC822

Section 5.6.2. `Maintainer' says:

     The package maintainer's name and email address.  The name should come
     first, then the email address inside angle brackets `<>' (in RFC822
     format).

     If the maintainer's name contains a full stop then the whole field
     will not work directly as an email address due to a misfeature in the
     syntax specified in RFC822; a program using this field as an address
     must check for this and correct the problem if necessary (for example
     by putting the name in round brackets and moving it to the end, and
     bringing the email address forward).

Now, the field "Adam C. Powell, IV <hazelsct@debian.org>" seems to be
legal according to current policy.  However, later in "5.6.3.
`Uploaders'" the policy says:

     The format is the same as that of the Maintainer tag, and multiple
     entries should be comma separated.

This would imply that commas should not be used in the Maintainer field.

This is not a big problem for me now, since I can work around the issue
by using two different functions to handle Maintainer and Uploaders, one
that does not try to split on commas and the other that does.
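A minimal Python sketch of that two-function workaround, assuming the field values are already available as plain strings; `parse_maintainer` and `parse_uploaders` are hypothetical names. Running the naive Uploaders splitter on the example value shows the misparse:

```python
def parse_maintainer(value: str) -> list[str]:
    """Maintainer holds a single entry, so never split it on commas."""
    return [value.strip()]


def parse_uploaders(value: str) -> list[str]:
    """Uploaders is comma-separated: split naively and drop empty pieces."""
    return [part.strip() for part in value.split(",") if part.strip()]


field = "Adam C. Powell, IV <hazelsct@debian.org>"
print(parse_maintainer(field))
# ['Adam C. Powell, IV <hazelsct@debian.org>']
print(parse_uploaders(field))
# ['Adam C. Powell', 'IV <hazelsct@debian.org>']  (the wrong split)
```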

However, this issue seems to require a little clarification.


Ciao,

Enrico

#401452#10
Date:
2007-01-14 23:30:23 UTC
From:
To:
There is no reason to split Maintainer fields, because there should be
nothing to split.

You should probably have two different functions anyway, because if you
encountered a Maintainer field with two values, that would be illegal. :)

#401452#15
Date:
2007-03-06 10:34:22 UTC
From:
To:
That is very right indeed :)

So the part about the Uploader field ("5.6.3. `Uploaders'") probably
needs changing, where it says:

	The format is the same as that of the Maintainer tag, and
	multiple entries should be comma separated.

I started writing a possible amended text, but I stopped not knowing if
commas are to be allowed within double quotes (as in "Adam C. Powell,
IV" <...>) or to be disallowed altogether.


Ciao,

Enrico

#401452#20
Date:
2015-05-11 08:28:30 UTC
From:
To:
The current Policy (since 3.8.1.0) says "All control files must be encoded
in UTF-8." and the current practice is to use UTF-8 in Maintainer: field
values.

#401452#25
Date:
2015-05-15 11:22:51 UTC
From:
To:
My intention with this bug was to ask for clarification of how entries in
Uploaders: fields should be split, and I do not see how UTF-8 encoding
can help with that.

I checked version 3.9.6.1, and the policy still says this for the
Uploader: field:

  Uploaders: Adam C. Powell, IV <hazelsct@debian.org>

is a field with a list of two values: "Adam C. Powell" and
"IV <hazelsct@debian.org>", one of which is illegal because it contains
no email address inside angle brackets.

However, the Maintainer: field has only one value, so it does not
mention how commas can be represented in its value without being
confused with separators.

I would like the Uploaders: field to clarify how one can have a comma in
the name without triggering a split. There can be several strategies:

 - it could say that commas are not allowed at all in an Uploaders:
   field, except as separators;
 - it could say that one needs to double quote a name that contains
   commas;
 - it could say that the separator is a comma only if it is preceded by
   a closed angle bracket and optional spaces: this would be the option
   that is harder to implement, but that keeps Adam's packages policy
   compliant.
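The third strategy can be sketched in Python with a lookbehind regex. This is an illustration under assumptions, not Policy text: `split_after_angle_bracket` is a hypothetical name, empty pieces (for example from a trailing comma) are dropped by choice, and a `>` followed by a comma inside a quoted name would still be mis-split.

```python
import re

# Strategy 3: a comma separates entries only when it follows a ">"
# (optionally with whitespace in between).
SEPARATOR = re.compile(r"(?<=>)\s*,")


def split_after_angle_bracket(value: str) -> list[str]:
    """Split an Uploaders-style value; empty pieces are dropped."""
    return [part.strip() for part in SEPARATOR.split(value) if part.strip()]


print(split_after_angle_bracket(
    "Adam C. Powell, IV <hazelsct@debian.org>, J Smith <js@example.org>"))
# ['Adam C. Powell, IV <hazelsct@debian.org>', 'J Smith <js@example.org>']
```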

I assume this is an effect of this bug having been merged with 160827
and 160827 having been closed. I've unmerged them and reopened this one,
as this does not have anything to do with charset encoding.


Enrico

#401452#32
Date:
2015-05-15 19:54:10 UTC
From:
To:
* Enrico Zini <enrico@debian.org>, 2015-05-15, 13:22:

We have another open bug about that. Let's merge them.

#401452#49
Date:
2025-06-13 10:47:08 UTC
From:
To:
It was always my intent that this field would be a subset of the
RFC822/5322 sender/recipient field format.

We should never have diverged from 822 here.  That we ever
permitted commas in the name part was an egregious mistake.

IMO the only question for this bug is precisely which subset of 822
format is allowed.  The work to be done here is:

 * Review 5322 and decide which subset to allow.  Probably, just
   unquoted Name <Email> and one-quoted-block "Name" <Email>.

 * Write this down in policy, making many programs insta-noncompliant.

 * Go around fixing all the software.

I wanted to reply specifically to this one comment:

This is a very bad argument.

We should not syntactically prevent a future evolution of our policy
to permit co-maintainership, or normal use of this same field by
downstreams with a different policy.

Furthermore, we have the Uploaders field now.  Clearly Maintainer and
Uploaders ought to be in the same syntax.

Ian.

#401452#54
Date:
2025-06-13 12:35:27 UTC
From:
To:
This last line is somewhat surprising to me. Clearly Uploaders shouldn't
exist if Maintainer allows multiple emails.

#401452#59
Date:
2025-06-13 16:46:40 UTC
From:
To:
Andrey Rakhmatullin writes ("Re: Bug#401452: Standardize syntax of the name in the Maintainer control field"):

You do have a point.  It's true that we invented Uploaders as a
workaround for the problem with Maintainer not allowing commas.  But
nowadays it seems that some people do make a distinction between
Maintainer and Uploaders.

I think trying to disentangle the semantics of Maintainer vs Uploaders
will attract a great deal of attention from anyone who has opinions
about package maintenance practices - ie, loads and loads of people.
I worry we'll be drowning in "but *our* definition is ..." and "in our
team we distinguish ..." and so on.

I suggest that we decouple these problems.

I think we can fix the syntax first.  Regardless of the semantics,
there is no good reason for the syntaxes to be different.

After we have fixed the syntax, we can consider the semantics of
Maintainer (possibly with multiple entries) vs Uploaders
separately.  Then we can debate questions of workflow, policy,
politics, and so on, without getting hung up on the syntax.

IOW let's not try to drain the ocean all at once.

Ian.

#401452#64
Date:
2025-06-13 17:12:54 UTC
From:
To:
Do you have a list of all tools and services that assume that Maintainer is a
single email address? Until we have one, I consider any change dangerous.

Cheers,

#401452#69
Date:
2025-06-13 18:14:31 UTC
From:
To:
Suggested approach.

Gosh there's a lot of decisions to make!

Ian.


## Fields giving names and email addresses (entity and email fields)

The format of the content of all fields naming people, including
Maintainer and Uploaders, is based on IETF RFC 5322 recipient fields
(e.g., "To:" fields in emails), with some modifications.

We call these "entity and email fields".

Informally:

 * The field is a comma-separated list of `name <email>` where `name`
   can be quoted `"name"` (and may then contain Unicode), or be
   unquoted but then has a restricted character set which excludes
   Unicode and excludes `,`.

 * There is no `\`-escaping: names simply cannot contain `\` or `"`.

 * The `<email>` part contains no whitespace and has a very
   restrictive character set.

 * The value can be folded to break the lines, except within `"..."`.

Formally:

 * These fields are "multiline" as per [internal xref].
 * The content is an RFC5322 `address-list`,
   with exceptions:
 * Each RFC5322 `mailbox` must be `name-addr`
   with a non-absent `display-name`.
 * Whitespace is not permitted within RFC5322 `angle-addr`.
 * Each RFC5322 `local-part` must be `dot-atom`.
 * Each RFC5322 `phrase` must be either a single `quoted-string`,
   or one or more space-separated `atom`s.
 * The RFC5322 `domain` must be in lowercase.
 * The following RFC5322 constructs are forbidden:
   `obs-*`, `group`, `comment`, `quoted-pair`, `domain-literal`.
 * UTF-8 representing Unicode characters with Graphic basic type
   may occur as part of `qtext` within `qcontent`.
 * Outside an RFC5322 `quoted-string`, `FWS` means a single ASCII space,
   or a newline followed by one or more ASCII spaces.
 * Inside an RFC5322 `quoted-string`, `FWS` means a single ASCII space.
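A minimal Python sketch of checking one entry against these formal rules, assuming the field has already been unfolded to a single line and split into entries. It is an approximation, not a reference validator: `is_valid_entry` is a hypothetical name, "Graphic" Unicode characters are loosely approximated as "anything except ASCII controls, space, DEL, `"` and `\`", and the optional space before `<` is a reading of RFC 5322's optional whitespace around `angle-addr` once comments are forbidden.

```python
import re

# RFC 5322 atext: ASCII letters, digits and a fixed set of punctuation.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
ATOM = rf"{ATEXT}+"
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"
# Loose stand-in for qtext plus Unicode "Graphic" characters.
QTEXT = r'[^\x00-\x20\x7f"\\]'
# quoted-string where FWS is a single space between characters.
QUOTED = rf'"(?: ?{QTEXT})* ?"'
# phrase: one quoted-string, or space-separated atoms.
PHRASE = rf"(?:{QUOTED}|{ATOM}(?: {ATOM})*)"
ENTRY = re.compile(
    rf"(?P<name>{PHRASE}) ?<(?P<local>{DOT_ATOM})@(?P<domain>{DOT_ATOM})>"
)


def is_valid_entry(entry: str) -> bool:
    """Check a single unfolded entry; the domain must also be lowercase."""
    m = ENTRY.fullmatch(entry)
    return m is not None and m["domain"] == m["domain"].lower()


print(is_valid_entry('"Adam C. Powell, IV" <hazelsct@debian.org>'))  # True
print(is_valid_entry("Adam C. Powell, IV <hazelsct@debian.org>"))    # False
```

Under these rules a non-ASCII name passes only when quoted, and an uppercase domain is rejected even though mail systems treat domains case-insensitively.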

Currently, Maintainer may only contain one entry, but this is a
semantic, not syntactic restriction, and may be relaxed in the future.

Email addresses that don't fit the `dot-atom@domain` form are
theoretically legal in RFC 5322 but cannot be represented.  In
practice, such addresses are almost unusable on the modern Internet.

### Historical notes

The Maintainer and Uploader fields have historically had a more
relaxed, but also inconsistent and confusing syntax.

When existing data is processed:

 * ASCII punctuation characters not permitted in RFC5322 atext might
   be found unquoted in the `phrase` part.

 * This includes commas in Maintainer, but not in Uploaders.

### Processing strategy

A system which doesn't need to understand the field can safely display
it as-is in its entirety.

A system which needs to understand an entity and email field could
proceed as follows:

 * Unfold as if this were a "folded" field, collapsing each whitespace
   sequence into a single space, so we have a single line.

 * Match `"` quotes to identify quoted text.  These quotes always
   appear in pairs.  Check that quoted text contains no `\`.

 * Split the whole field on unquoted `,`.

   If the field is a Maintainer field and this would result in any
   fragments that do not end in `>`, skip this step.  In the future,
   this rule will be abolished and will only be relevant for old data.

 * Strip whitespace from the ends.

 * Now each entry will end in `<....>`.  That is the email address
   part.

   It has a restricted syntax: the allowable character set is ASCII
   alphanumerics plus any of the following punctuation:
      ! # $ % & ' * + - / = ?  ^ _ ` { | } ~

   The email address is in a canonical representation, so can be
   directly compared for equality.

 * The remainder of the entry (with whitespace normalised to single
   spaces) is the name part.  Strip any `"`.

   The name part may be used for human display and possibly ordering.
   It should not be involved in equality comparisons, lookups, etc.
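The processing strategy can be sketched in Python as follows. This is a minimal parser rather than a validator, written under assumptions: `split_field` is a hypothetical name, the `is_maintainer` flag stands in for knowing which field is being parsed, and silently dropping empty entries (such as one left by a trailing comma) is a choice the strategy does not state.

```python
import re

# A pair of double quotes and everything between them.
QUOTED_SPAN = re.compile(r'"[^"]*"')


def split_field(value: str, is_maintainer: bool = False) -> list[tuple[str, str]]:
    """Parse an entity and email field into (name, email) pairs."""
    # Unfold: collapse every whitespace run (including newlines) to one space.
    value = " ".join(value.split())

    # Identify quoted text; quotes must pair up and enclose no backslash.
    if value.count('"') % 2:
        raise ValueError("unbalanced double quote")
    spans = [m.span() for m in QUOTED_SPAN.finditer(value)]
    if any("\\" in value[qs:qe] for qs, qe in spans):
        raise ValueError("backslash inside quoted text")

    # Split on commas lying outside every quoted span, then strip the ends.
    cuts = [i for i, ch in enumerate(value)
            if ch == "," and not any(qs <= i < qe for qs, qe in spans)]
    entries = [value[a + 1:b].strip()
               for a, b in zip([-1] + cuts, cuts + [len(value)])]
    entries = [e for e in entries if e]  # assumption: ignore empty entries

    # Legacy Maintainer rule: if any fragment lacks a final ">", do not split.
    if is_maintainer and not all(e.endswith(">") for e in entries):
        entries = [value]

    # The trailing <...> is the email; the rest, minus quotes, is the name.
    pairs = []
    for entry in entries:
        if not entry.endswith(">") or "<" not in entry:
            raise ValueError(f"no <email> part in {entry!r}")
        lt = entry.rindex("<")
        name = entry[:lt].strip().replace('"', "")
        pairs.append((name, entry[lt + 1:-1]))
    return pairs


print(split_field('"Adam C. Powell, IV" <hazelsct@debian.org>,\n J Smith <js@example.org>,'))
# [('Adam C. Powell, IV', 'hazelsct@debian.org'), ('J Smith', 'js@example.org')]
```

Because the email part is canonical, callers can key lookups on the second element of each pair and use the first only for display.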

### Sending emails

To send email to those named in an entity and email field:

Replace any `"..."` quoted names that contain non-ASCII with
`encoded-word`s as per IETF RFC 2047 (which obsoletes RFC 1342).

Or, split the address above and use an email header generation library.
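As one example of such a library, Python's standard `email.utils.formataddr` builds a header value from a (name, email) pair obtained by splitting the field. This is a sketch of one option, not something the proposal prescribes:

```python
from email.utils import formataddr

# formataddr() takes a (name, address) pair, wraps names containing RFC 5322
# specials such as "," or "." in double quotes, and RFC 2047-encodes names
# that contain non-ASCII characters (UTF-8 by default).
print(formataddr(("Adam C. Powell, IV", "hazelsct@debian.org")))
# "Adam C. Powell, IV" <hazelsct@debian.org>
print(formataddr(("Julián Niño", "julian@example.org")))
# an =?utf-8?...?= encoded-word followed by <julian@example.org>
```

The library picks the encoded-word form itself, so the control file can keep plain UTF-8 names.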

#401452#74
Date:
2025-06-20 16:38:27 UTC
From:
To:
Ian Jackson <ijackson@chiark.greenend.org.uk> writes:

I think this first point is the one that poses the most backward
compatibility issues. The main reason why I have, in the past, looked at
this bug and then decided not to work on it is that I'm not sure how to
reconcile two quite reasonable competing goals:

1. The Maintainer and Uploader control fields should be easily usable as
   email addresses.

2. The Maintainer and Uploader control fields do not have RFC 5322's long
   history of backward compatibility challenges, and existing practice
   and the current specification are looser. In particular, the current
   specification doesn't require people with non-ASCII names to add extra
   punctuation around their name, which would feel icky, at least to me.

In the past, we had at least one Debian maintainer who had a comma in
their name. The relevant software mostly handled that correctly in the
Maintainer field. I'm not sure if that's still the case.

Before we make changes here, I think we need to understand the blast
radius. Maybe someone can do some work in UDD to figure out how many
packages would be affected if we were to tighten up the syntax here?

Requiring quoting for commas (and any other reserved RFC 5322 punctuation)
makes sense to me, since that's really going to break otherwise if one
attempts to use Maintainer in the To field of an email message. I'm a bit
less willing to require non-ASCII be quoted; all of the control fields are
specified as being UTF-8 at this point, and I feel like most email sending
libraries should be able to cope with UTF-8 in the name even if RFC 5322
still requires weird escaping. But maybe that belief is too optimistic.

#401452#79
Date:
2025-06-20 21:31:32 UTC
From:
To:
For what it's worth, packages.debian.org and the BTS send email to the
package maintainer according to the Maintainer field.
They do not support multiple email addresses.

Cheers,

#401452#84
Date:
2025-06-20 22:23:30 UTC
From:
To:
Bill Allombert <ballombe@debian.org> writes:

Do you (or anyone else) know what encoding they do to transform the
Maintainer field value into an email address in order to send mail to the
package maintainer?

#401452#89
Date:
2025-06-20 22:31:52 UTC
From:
To:
Russ Allbery writes ("Re: Bug#401452: Info received (Bug#401452: Standardize syntax of the name in the Maintainer control field)"):

Yes.  Hmm.

While I was writing my previous proposal it occurred to me that maybe
we should follow git instead.  I tried to find a specification of the
syntax of the "author" and "committer" and "tagger" header lines in
the relevant git objects, but failed.

#401452#94
Date:
2025-06-23 02:41:55 UTC
From:
To:
Hi Ian

Huge thanks for tackling this one... it's a seemingly-simple but
actually complicated field to describe as you have noted.

I've had a bit of a wander through the list of entries that are
currently in Maintainer and Uploaders to look at what the stated
approach would rule in/out. That then raises some examples to consider -
I ask the questions about a few examples below from the "are we sure
this is what we want to do" perspective rather than "we should not do this".

I'll use made-up examples in the discussion below rather than extracting
real people's names from Sources. I don't want to centre the discussion
on any individuals, and I am also conscious that this discussion needs
to not turn into something that has overtones of "you're spelling your
name wrong".

It makes for a very long reply - sorry. It's not because there are lots
of problems, just (corner) cases to understand.

cheers
Stuart

I'm pleased that we finally have a way to include , in the name part -
that fixes one of the current problems nicely. There is only one current
example of a comma in Maintainer/Uploaders and it is quoted in this way
already.

We have a few of the following constructs in the name that I *think* are
OK by these rules without quoting, but to confirm:

	J Smith (js)			[parens]
	J (js) Smith			[parens]
	J O'Dear			[single quote]

(I have a recollection of parens being special in email addresses;
single quotes often are special and there are lots of them in existing
entries — just double checking!)
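These cases can be checked mechanically against the RFC 5322 `atext` set. A minimal Python sketch, assuming names are compared after whitespace normalisation; `fits_unquoted` is a hypothetical helper, not part of any existing tool:

```python
import string

# RFC 5322 atext: ASCII letters, digits and this fixed set of punctuation.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def fits_unquoted(name: str) -> bool:
    """True if name is one or more single-space-separated atext atoms."""
    return all(word and set(word) <= ATEXT for word in name.split(" "))


for name in ["J Smith (js)", "J (js) Smith", "J O'Dear", "J. Smith"]:
    print(name, fits_unquoted(name))
```

Under this check the apostrophe is fine, because `'` is in `atext`. Parentheses are not: RFC 5322 reads `(...)` as a comment, which the proposal forbids, so such names would need quoting. A full stop also fails, since only the obsolete `obs-phrase` syntax tolerated it.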

I would like to suggest that we find a way to permit non-ASCII unicode
letter characters in the name part without requiring quotes. I
understand that's an extension to RFC5322 but ...

- any use of these data will pass through some sort of MUA that can fix
   the representation before it becomes an issue
- other fields in d/control and Sources are allowed to contain non-ASCII
   unicode letters without any restrictions or encoding.
- it would be a compatible upgrade to RFC5322 in that anyone who did
   quote some non-ASCII characters in their name will not have done the
   wrong thing
- it is appropriate to find ways of being less Anglocentric in our
   format specifications and I have a feeling that is possible to do
   safely here
- there are many hundreds of existing entries in Sources where the names
   contain non-ASCII letter characters from lots of different languages
- I doubt there is an appetite in Debian to make many thousands of
   existing packages insta-buggy and then take the next decade to upload
   fixes, and until they are all fixed also have no set format that can
   be used by parsers.

Some examples

	Julián Niño
	J Lee (你好世界)
	你好世界

(and we could, of course, imagine lots of other languages and scripts
being used here and there are several others in Sources)

From the perspective of someone writing a parser I can see why this is
attractive... we do have a couple of counter-examples in the archive at
present

	John "Fred" Smith

It's a big call to tell those people that they don't know how to spell
their name. Can we avoid imposing this restriction without causing too
much pain? (Undoubtedly _some_ don't care between " and ', but is that
the design principle we should work to?)

This is an interesting requirement - is there any need for it? There are
counter-examples currently in the archive and uppercase domain names
work just fine in real mail systems.



The examples above probably explore the space enough, but the attached
script spits out 360 'interesting' Maintainer/Uploader entries to look
at if you are curious to see some real cases and look check for other
variations that I've missed. The regex is overly strict compared to
these rules to pull out 'interesting' for 'are we sure' discussions, and
not 'violations of the above rules'. Note that the script looks at
unique entries in Sources, not people (plenty of repeated names with
different email addresses); it offers a count of unique (name, addr)
pairs and a count of affected source packages in main.


Some variations on the regex in the script let us consider some
variations to these rules.

The rules as written above flag about 300 entries as buggy, across 5500 packages.

Of these:

- approx 290 are unicode letter characters in names - i.e. if we can
allow unicode letter characters in the name part without needing
quoting, we make huge strides in compatibility. (my test was \w which in
Python 3 permits unicode “Lm”, “Lt”, “Lu”, “Ll”, or “Lo” plus some
digit/numeric forms that we don't want to actually permit but aren't in
use in the data set so aren't an issue here)

- approx 10 entries are from domain names being in uppercase

- there's a handful of remaining items that might actually be OK that
are the limits of my current understanding of RFC5322, such as allowing
@ in the name part.

(and then there are 6 or so entries already buggy in Maintainer and
Uploaders, either missing commas or with stray commas)


I think these data make a strong case for permitting unicode letter
characters in the name part and uppercase domain names.
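The relaxed rule can be sketched as a Python regex. This is an illustration under assumptions, not a proposed Policy definition: `RELAXED_NAME` is hypothetical, and `\w` also admits digits, `_` and other numeric forms, as noted above.

```python
import re

# Hypothetical relaxed unquoted-name rule: atext punctuation plus Python's
# Unicode-aware \w (letters, and also digits and "_").
WORD = r"[\w!#$%&'*+\-/=?^`{|}~]+"
RELAXED_NAME = re.compile(rf"{WORD}(?: {WORD})*")

for name in ["Julián Niño", "你好世界", "J Lee (你好世界)", 'John "Fred" Smith']:
    print(name, bool(RELAXED_NAME.fullmatch(name)))
```

Names stored in decomposed (NFD) form would need normalising first, since combining marks are not matched by `\w`.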

Thanks for listing this out - it's useful to consider this at the same
time. I had a go at coding it (to eventually land in python-debian)
while working through it, but couldn't quite follow a couple of steps below.

I'm not sure what 'Match' means in practical terms in the algorithm -
would you be storing the list of (start-quote, stop-quote) positions and
then at the latter splitting step, not split at character positions of
"," that are within those (start-quote, stop-quote) positions?

(In my playing, I ended up walking the length of the string, toggling
whether the current status was inside or outside a quoted section, and
only acting on commas that are found while outside; Python's 'yield'
keyword is convenient for that.)
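The toggling approach can be sketched as a Python generator. `split_outside_quotes` is a hypothetical name, skipping empty pieces (so a trailing comma is tolerated) is an assumption, and no validation is attempted.

```python
from collections.abc import Iterator


def split_outside_quotes(value: str) -> Iterator[str]:
    """Yield comma-separated entries, ignoring commas inside "..." quotes."""
    inside = False
    current: list[str] = []
    for ch in value:
        if ch == '"':
            inside = not inside  # toggle on every double quote
        if ch == "," and not inside:
            piece = "".join(current).strip()
            if piece:  # assumption: skip empty pieces
                yield piece
            current = []
        else:
            current.append(ch)
    piece = "".join(current).strip()
    if piece:  # nothing is left after a trailing comma
        yield piece


print(list(split_outside_quotes(
    '"Adam C. Powell, IV" <hazelsct@debian.org>, J Smith <js@example.org>,')))
# ['"Adam C. Powell, IV" <hazelsct@debian.org>', 'J Smith <js@example.org>']
```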

Can you please unpack why this is needed? They are defined not to exist
;) Is the purpose to parse or to validate? There are lots of other
things that one should check if the purpose is to validate.

I don't see a nice way of doing that at that point as I'm not seeing the
bigger picture for the algorithm. Perhaps this is a good DebCamp discussion?


It would be worth noting that we have many examples of trailing commas
in Uploaders and that should be specifically allowed (partly so that
implementations don't assume that the last entry is not actually empty).

(In looking at examples in the archive I also found 3 cases where
commas were missing in Uploaders; one fixed on salsa, one bug filed)


The only maintainer fields containing "," are ones with a single entry
that ends with "," — they are already buggy and the parser would drop
the empty section anyway, so perhaps this wart can be omitted?

also
	@ .
;)

The rules above restrict that further to lowercase ASCII; does one feel
the need to actually check that it matches those things? If actually
validating, there's a lot more to do than a char check; if not
validating, then it's just the bit between < and >


whitespace normalisation was already done in the first step so can be
avoided here; whitespace on the left end was dealt with 2 steps before.
Stripping the single whitespace from the right end would be needed
though. (Side question: is the whitespace between name part and < required?)

While true of course... we also do that in lots of places in Debian to
squash together the multiple emails that an individual has within
Sources. (e.g. in the UDD dashboard views)

#401452#99
Date:
2026-01-29 22:03:42 UTC
From:
To:
[ Replying to the bug report, for tracking purposes, as it seems I
  accidentally sent the previous two mails only to the debian-policy
  mailing list. ]

Hi!

Thanks,
Guillem