- Package: debian-policy
- Source: debian-policy
- Submitter: Russ Allbery
- Date: 2020-11-25 12:39:03 UTC
- Severity: wishlist
I think we've discussed this before, but I didn't see an open bug, so
I'll open one so that we can discuss it in one place.
Policy currently says the following about the Maintainer field, which
applies by reference to the Uploaders field:
The package maintainer's name and email address. The name should come
first, then the email address inside angle brackets <> (in RFC822
format).
If the maintainer's name contains a full stop then the whole field
will not work directly as an email address due to a misfeature in the
syntax specified in RFC822; a program using this field as an address
must check for this and correct the problem if necessary (for example
by putting the name in round brackets and moving it to the end, and
bringing the email address forward).
Most software has taken this to mean that the e-mail address should be
in RFC822 format, not that the whole field should be.
This is primarily posing a problem for people who have commas in their
name. The main example to date is Adam C. Powell, IV, but it can happen
with various other name qualifiers and honorifics. Currently, the only
way to express such a name that works with our existing tools is to drop
the comma, since several programs blindly split on commas when parsing the
field.
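To make the failure concrete, here is a minimal Python sketch of the naive comma split described above. The helper name `naive_split` is made up for illustration and is not taken from any Debian tool; the two addresses are ones that appear elsewhere in this bug log.

```python
import re


def naive_split(field):
    """Split a comma-separated control field on every comma (illustrative only)."""
    return re.split(r"\s*,\s*", field.strip())


uploaders = "Adam C. Powell, IV <hazelsct@debian.org>, Russ Allbery <rra@debian.org>"

# The comma inside the first name is treated as a separator, so the
# two-person field comes back as three fragments.
for fragment in naive_split(uploaders):
    print(repr(fragment))
```

The first person is cut into a bare name with no address and a stray "IV" attached to the address, which is the kind of breakage that forces the comma to be dropped today.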
The most fully technically correct approach would be to require a full
RFC 5322 parse, but that adds a lot of complexity and raises the problem
that there's no standard canonicalization of RFC 5322 header fields. It
becomes unclear whether one should strip off double quotes, remove
backslashes, remove portions in parentheses, or other things that would
be logical to do from the RFC 5322 grammar.
Alternatively, we could document the permitted character set for the name
portion of the Maintainer field and exclude commas. It's annoying to do
this since commas have been supported in the past (in Maintainer, they're
unambiguous) and have only become a problem in Uploaders. We could only
restrict them in Uploaders, but the lack of symmetry strikes me as a bad
idea.
We could also standardize a simple escaping mechanism of our own (allow
double quotes, for example, but require that, if used, they surround the
entire name and are stripped off by the parsing).
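As a rough illustration of what such a home-grown quoting rule might look like to a parser, here is a Python sketch. The grammar is an assumption made up for this example (a name is either fully wrapped in double quotes or contains no quote or comma at all); it is not wording from Policy, and `parse_people` is a hypothetical helper.

```python
import re

# Assumed grammar: "Quoted, Name" <email>   or   Plain Name <email>,
# with entries separated by commas.
ENTRY = re.compile(
    r'\s*(?:"(?P<quoted>[^"]*)"|(?P<plain>[^",<>]*?))'
    r'\s*<(?P<email>[^<>\s]+)>\s*(?:,|$)'
)


def parse_people(field):
    """Return (name, email) pairs, stripping the surrounding double quotes."""
    people = []
    pos = 0
    while pos < len(field):
        match = ENTRY.match(field, pos)
        if match is None:
            raise ValueError(f"cannot parse entry at offset {pos}: {field[pos:]!r}")
        name = match.group("quoted")
        if name is None:
            name = match.group("plain")
        people.append((name, match.group("email")))
        pos = match.end()
    return people


field = '"Adam C. Powell, IV" <hazelsct@debian.org>, Russ Allbery <rra@debian.org>'
print(parse_people(field))
```

Under this assumed rule the quotes never appear in the returned name, so software that displays or compares names would see the same string whether or not quoting was needed.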
However we resolve this, we should probably also update the reference in
Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really
want to support source-routed e-mail addresses or similar bizarreness in
Debian control files.
Russ Allbery wrote:
> Alternatively, we could document the permitted character set for the name
I think it is not polite to force changes in maintainer names. Hmm, RFC5322 is not yet a standard (BTW it is not yet cited in STD1), and anyway it still uses the old semantics for compatibility (see the "obs-" references, e.g. the section 4.4). IMHO we should specify a subset of RFC 822, because a full 5322 parse is IMO too complex (and BTW not so useful) to implement in all the tools. Possibly we could require that only a subset be used in the control file, and recommend full 5322 parsing in the tools. ciao cate
"Giacomo A. Catenazzi" <cate@debian.org> writes: This is true, but it's essentially meaningless. It's sort of an artifact of the IETF process, but RFC 822 is for practical purposes obsolete and RFC 5322 reflects the current state of addressing standards. True. We should explicitly rule that out. I'm leaning that way as well. I also don't want to require people to use RFC 2047 encoding if they have a name that doesn't fit into ASCII. Anyone have any suggestions on a good subset and description of it that isn't too complex?
While I think it would be fine to have a comprehensive and accurate specification, something like this could be an easy improvement. By omitting mention of RFC 822, the mandate for UTF-8 in the control file should obviate RFC 2047 encoding. Despite underspecifying things, I doubt there will be anyone trying to use email addresses of the wrong form.
diff --git a/policy.sgml b/policy.sgml
index 7de382d..080229c 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2582,17 +2582,14 @@ Package: libc6
 <p>
 The package maintainer's name and email address. The name
 should come first, then the email address inside angle
- brackets <tt><></tt> (in RFC822 format).
+ brackets <tt><></tt>.
 </p>
 <p>
- If the maintainer's name contains a full stop then the
- whole field will not work directly as an email address due
- to a misfeature in the syntax specified in RFC822; a
- program using this field as an address must check for this
- and correct the problem if necessary (for example by
- putting the name in round brackets and moving it to the
- end, and bringing the email address forward).
+ If the maintainer's name contains a full stop or a comma,
+ the entire name must either be surrounded by quotation marks
+ or put within round brackets and moved it to the end
+ (thus bringing the email address forward).
 </p>
 </sect1>
Thank you for the concrete wording proposal! Clint Adams <schizo@debian.org> writes: We could say that the e-mail address must be an RFC 5322 addr-spec without obs-* rules so that we don't lose the restriction on what the e-mail address should be like. I wonder if we should also prohibit domain-literal. We allow it now, but there are no uses of it in the archive. We should say explicitly that the quotation marks are not part of the maintainer's name. Should we say something about whether the maintainer name can be quoted even if it doesn't contain a comma? I'd like to maintain the current allowance for not quoting the maintainer name even if it contains a full stop, despite the RFC 5322 requirement to quote addresses that contain full stops. Among other things, people who use initials in their maintainer names don't currently do the quoting and I don't really want to make those packages buggy. I think we can safely prohibit for our purposes the email@address (Name) form. There are no occurrences of it in the archive. Whatever we say here we should probably also say in section 4.4 (the changelog specification). Maintainers should use the same form of the name and be able to do the same quoting in both places.
While I can only agree with this proposal on technical grounds, I have quite a number of scripts (including popcon) that depend on the ability to extract the maintainer name from the Maintainer/Uploaders field. I suspect other developers and debian-qa might have others. Adding quotes around the maintainer name would break that interface. Using the full Maintainer field is often problematic because:
1) we might not want to display the email address.
2) we might want to merge entries from the same maintainer using different email addresses for different packages (popcon goes further and also checks for different capitalization).
So I would suggest we keep the format 'Name <email>' and forbid dots and commas. Developers that need them could use UTF-8 variants of those. Alternatively, debian-policy could spell out the correct regexp to extract the Maintainer name, but there will be a lot of scripts to update. Cheers,
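For readers unfamiliar with this kind of script, the following Python sketch shows the general shape of the name extraction and merging described above. It is not popcon's code: the regular expression, the helper names, and the sample entries (Jane Doe at example.org and example.net) are all hypothetical.

```python
import re
from collections import defaultdict

# Assumes the plain 'Name <email>' form with no quoting.
PERSON = re.compile(r"^\s*(?P<name>.*?)\s*<(?P<email>[^<>]+)>\s*$")


def split_person(entry):
    """Return (name, email) for one 'Name <email>' entry."""
    match = PERSON.match(entry)
    if match is None:
        raise ValueError(f"not in 'Name <email>' form: {entry!r}")
    return match.group("name"), match.group("email")


def group_by_name(entries):
    """Merge entries whose names differ only in capitalization."""
    groups = defaultdict(set)
    for entry in entries:
        name, email = split_person(entry)
        groups[name.casefold()].add(email)
    return dict(groups)


entries = ["Jane Doe <jane@example.org>", "jane doe <jd@example.net>"]
print(group_by_name(entries))
```

If names were allowed to carry surrounding quotes, a script like this would treat the quoted and unquoted spellings as two different people unless it were updated to strip them.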
Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> writes: Well, I really don't want to prohibit dots. We allow dots now and they don't pose any problems, other than the note in Policy that you need to put quotes around the name if you use it in an e-mail To: field (which presumably all of our software already deals with). Your point about not wanting to change software that parses the name is well-taken. I think, though, that if we say that you may only put double-quotes around the name if there is a comma in the name and otherwise the quotes should be omitted, that would minimize the problem. Only a handful of existing maintainers would be affected (namely those maintainers who are having trouble right now), so updating software wouldn't be that urgent.
* Russ Allbery [Mon, 19 Jan 2009 12:10:55 -0800]: I think dots should be allowed, yes, and be allowed unquoted. I think we should *consider* doing without commas at all, if losing them is something we could live with. I realize that would be annoying for people that have a comma in their name, so I'm not right away saying we should forbid them. But I really think we should consider it, because even if commas have to be quoted, you've already lost the ability to parse the Uploaders field with split /\s*,\s*/, which I think would be a loss, since that works for all other fields. (Oh, and if we do without commas, we should do without quoting as well IMHO.) Just my 2¢,
Adeodato Simó <dato@net.com.org.es> writes: It would certainly make it easier for software. I have to admit to a personal bias (speaking as someone who goes by his middle name rather than his first name) in favor of fixing software to accurately recognize people's names rather than the other way around. I personally find software that refuses to recognize my name the way that I spell it to be quite obnoxious, so I'm sympathetic to people who have commas in their name. But yes, allowing commas, even quoted, does complicate Uploaders parsing quite a bit over the current simple state. Bill mentioned the possibility of a Unicode comma other than the ASCII comma. Does such a thing exist? It's kind of a hack, but it's also an interesting compromise. I'm not sure why there would be such a thing, though, given that there's a perfectly good comma in the ASCII range and Unicode normally doesn't duplicate code points to no purpose.
There are several other commas that have code points, but IMHO none of them would be an adequate fit for this given that the glyphs differ. The one with the closest glyph would be U+FE50 SMALL COMMA, but that appears to be a fullwidth character.
In any case, if commas are allowed, policy should spell out the correct regexp to parse the Uploaders field. I have the exact opposite experience with Unicode :) U+FF0C FULLWIDTH COMMA should do the trick. Cheers,
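The code points mentioned in the last two messages can be inspected with Python's standard `unicodedata` module. The sketch below also checks, as an illustration only, that a name written with U+FF0C survives a split on the ASCII comma; the sample field is constructed for this example.

```python
import unicodedata

# ASCII comma plus the two alternatives named in this thread.
candidates = ["\u002C", "\uFE50", "\uFF0C"]
for ch in candidates:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "east_asian_width=" + unicodedata.east_asian_width(ch),
    )

# A name spelled with FULLWIDTH COMMA is untouched by an ASCII-comma split.
name = "Adam C. Powell\uFF0C IV"
field = f"{name} <hazelsct@debian.org>, Russ Allbery <rra@debian.org>"
entries = [entry.strip() for entry in field.split(",")]
print(entries)
```

Whether the resulting glyph is acceptable to the people concerned is a separate question that the code cannot answer.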
I think that this issue has become more serious with the lintian-based automatic rejection of incoming packages. Some days ago an upload (illuminator_0.11.0-4_amd64.changes) got rejected because of this pending issue. (from the reject mail) (CC-ing ftp-masters as reject mails say to write if something concerns them, and CC-ing debian-science-maintainers and hazelsct as maintainers of the package in question) ciao Riccardo
Riccardo Stagni <unriccio@email.it> writes: I thought we established the last time around on this discussion that Adam needs to leave the comma off in his name for the time being. DAK is also misinterpreting the value, so it's not just a Lintian problem. All our current tools assume they can do a naive split on comma, and even if we decided to change this and changed Policy now, it's going to take a while for the tools to change.
I wasn't aware of this, reading the bug report it looked to me that the discussion was stopped and no decision really made. Sorry for the wasted time! :( Riccardo
Riccardo Stagni <unriccio@email.it> writes: Oh, no, it's quite okay. I feel bad that the discussion stalled out and we didn't reach a conclusion, since the current situation really isn't ideal and I hate making someone not use their actual name.
Yeah, I was kind of surprised by that. Didn't have time to deal with it over the past week, but figured I'd try to work around it using a lintian override. I don't remember receiving this advice. That would make the name different from what's in my GPG key, but I suppose I could add an additional name to the key... Note my email From address doesn't have the comma or full stop, because I find the quotes aesthetically inelegant, but I made that change after the key. I think you mean "even if we decided to clarify Policy..." as my practice conforms to what's currently in Policy. But we've been through that already.
Adam C Powell IV <hazelsct@debian.org> writes:
Ack, sorry. Apologies for not having communicated effectively enough
there.
Nothing in the archive software should care about this. Authorization is
done based on the key, not on the IDs on the key. I'm not sure if tools
like debsign would have trouble figuring out what key to use, but I
suspect they do the lookup based on the e-mail address only to avoid
problems with slight mismatches in names.
Yeah, sorry, I was sloppy in my wording. Policy is currently ambiguous:
The package maintainer's name and email address. The name should come
first, then the email address inside angle brackets <> (in RFC822
format).
was interpreted (reasonably) by you as saying that the whole field was in
RFC822 format, and was interpreted by the authors of tools as saying that
only the e-mail address was in RFC822 format, with no specification for
the name. Then Uploaders was added later, with a specification saying
that uploaders are separated by commas.
The whole specification is inadequate on multiple fronts, really, since we
certainly don't accept all RFC822 addresses either (RFC822 addresses could
contain commas inside comments, for example, and I'm sure our tools don't
cope).
I dunno how they try to match key ids, but to be sure you can put your key in the configuration so debsign knows what to use no matter what you put in your changelog (useful for sponsoring packages too):
emilio@saturno:~$ grep DEBSIGN .devscripts
DEBSIGN_KEYID=4A08B2FE
Cheers, Emilio
* Russ Allbery <rra@debian.org>, 2008-12-27, 12:27:
- Let's forget about RFC 822/5322 compatibility, as it would introduce
only needless complexity.
- Let's allow any punctuation characters in maintainer names and e-mail
addresses *except* "<" and ">".
This way the comma is completely disambiguated: it splits the field if and
only if it's preceded by the ">" character. I.e. you can use the following
Perl regex to split the field: /\>\K\s*,\s*/.
One can easily check that this method does the right thing for parsing
Uploaders fields of the existing packages: you could e.g. try this on
ries:
$ zcat /srv/ftp.debian.org/mirror/dists/*/*/source/Sources.gz | grep-dctrl -ns Maintainer,Uploaders -e '' | perl -pe 's/\>\K\s*,\s*/\n/g' | sort -u
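For tools written in Python, a rough equivalent of the Perl split above can be sketched with a lookbehind in place of `\K`. This is an illustrative translation, not code from any existing tool, and `split_uploaders` is a hypothetical name.

```python
import re

# A comma is a separator only when it follows ">" (optionally with
# whitespace in between); (?<=>) plays the role of Perl's \>\K.
SEPARATOR = re.compile(r"(?<=>)\s*,\s*")


def split_uploaders(field):
    """Split an Uploaders value into 'Name <email>' entries."""
    return [entry for entry in SEPARATOR.split(field.strip()) if entry]


field = "Adam C. Powell, IV <hazelsct@debian.org> , Russ Allbery <rra@debian.org>"
print(split_uploaders(field))
```

The comma inside the name is left alone because it follows "l" rather than ">", while the separator is recognized even with a space before it.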
Incidentally, this is (almost) the same method dak uses to split
Uploaders:
$ grep -r uploaders.*split daklib/
daklib/dbconn.py: for up in u.pkg.dsc["uploaders"].replace(">, ", ">\t").split("\t"):
Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do you
know any other tools that parse Uploaders?
Jakub Wilk <jwilk@debian.org> writes: Oh, hm, yeah, that would work. dak, of course, but it sounds from your message like it's already doing the right thing. The PTS and DDPO -- I'm not sure what gets that data into those systems. UDD? I think your solution sounds excellent.
* Russ Allbery <rra@debian.org>, 2011-09-08, 19:09:
One thing it doesn't do right is that it doesn't allow for space before
the comma. (We have a few packages in the archive with " , " in the
Uploaders field.) Should other (than space) whitespace characters be
allowed before/after comma as well?
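The whitespace problem can be reproduced with a small Python sketch. `dak_style_split` imitates the replace-then-split expression quoted from daklib/dbconn.py above, and `tolerant_split` is a hypothetical alternative that accepts any whitespace around the comma; the sample names and addresses are invented.

```python
import re


def dak_style_split(field):
    """Imitates the quoted dak expression: only the exact text '>, ' separates."""
    return field.replace(">, ", ">\t").split("\t")


def tolerant_split(field):
    """Hypothetical variant: any whitespace (space, tab, newline) around the comma."""
    return re.split(r"(?<=>)\s*,\s*", field.strip())


samples = {
    "plain": "A B <ab@example.org>, C D <cd@example.org>",
    "space before comma": "A B <ab@example.org> , C D <cd@example.org>",
    "newline after comma": "A B <ab@example.org>,\n C D <cd@example.org>",
}

for label, field in samples.items():
    print(label, len(dak_style_split(field)), len(tolerant_split(field)))
```

Only the first sample is split into two entries by the literal replacement; the tolerant pattern yields two entries for all three.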
They both have their own, IMO over-engineered parsers of Sources files.
PTS:
def addresses_from_string(content):
    pattern = re.compile("([^>]),")
    hacked_content = pattern.sub("\\1WEWANTNOCOMMAS", content)
    msg = email.message_from_string("Header: " + hacked_content)
    hacked_list = email.Utils.getaddresses(msg.get_all("Header", []))
    list = map(lambda p:
               map(lambda s:string.replace(s,"WEWANTNOCOMMAS",","), p),
               hacked_list)
    return list
Again, PTS trips on a space before comma.
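The space-before-comma failure comes from the first substitution in the PTS function. The sketch below isolates just that step (same pattern and placeholder as the quoted code, wrapped in a hypothetical helper `hide_commas`) and deliberately leaves out the later `getaddresses` call; the sample entries are invented.

```python
import re

# Same pattern as the quoted PTS code: a comma is "hidden" whenever the
# character before it is anything other than ">".
PATTERN = re.compile("([^>]),")


def hide_commas(content):
    """Replace commas that the PTS heuristic takes to be part of a name."""
    return PATTERN.sub("\\1WEWANTNOCOMMAS", content)


tight = "A B <ab@example.org>, C D <cd@example.org>"
spaced = "A B <ab@example.org> , C D <cd@example.org>"

print(hide_commas(tight))   # separator comma is kept
print(hide_commas(spaced))  # separator comma is replaced by the placeholder
```

In the spaced case the character before the comma is a space, so the real separator is hidden along with any commas in names, and the two entries are handed to the address parser as one.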
DDPO:
my @uploaders = ($uploaders =~ /([^,@ ][^@]+@[^@]+>)/g);
$db{"com:$package"} = scalar @uploaders;
foreach my $uploader (@uploaders) {
    my ($name, $mail);
    if ($uploader =~ /^\S+$/) {
        ($name, $mail) = ("(unknown)", $uploader);
        warn "Uploader without name: $package $uploader";
    } else {
        $uploader =~ /(.+) <(.+)>/ or warn "$fname:$.: syntax error in $uploader";
        ($name, $mail) = ($1, $2);
        $db{"name:$mail"} = $name;
    }
    $packages{$mail}->{$component}->{$package} = 1;
}
DDPO doesn't allow for leading comma or @ in the maintainer's name, but
that's a minor nitpick.
UDD uses Python's email.Utils.getaddresses(), so it will need fixing.
Jakub Wilk <jwilk@debian.org> writes: The only other ones I can think of are newline and tab, which would be weird but which I believe is allowed by the syntax.
Hi! I quite like Jakub's suggestion that we use /\>\K\s*,\s*/ to split the list of Uploaders. It's very permissive and will suit our needs for this field but doesn't imply a large amount of overhead for parsers of the field or require parsers to deal with the full gamut of possibilities that the various RFCs would permit if we referenced only them. In practical terms, what is required now to wrap this up? (Knowing how Uploaders should be split would then allow us to expose functionality to do this in python-debian.) cheers Stuart
I would say:
- Review whether the current Uploaders fields are compliant with this.
- Review whether the tools that parse the Uploaders field do so in a safe way.
- Actually write the proposal.
Cheers,
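The first review step could be approached with a small checker along the following lines. This is only a sketch: the entry pattern (a non-empty name without "<" or ">", one space, then an address containing "@" inside angle brackets) is an assumption about what "compliant" would mean, not agreed wording, and `noncompliant_entries` is a hypothetical helper.

```python
import re

# Split only on commas that follow ">", as proposed earlier in this thread.
SEPARATOR = re.compile(r"(?<=>)\s*,\s*")
# Assumed shape of one entry: Name <local@domain>, with no "<" or ">" in the name.
ENTRY = re.compile(r"^[^<>]+ <[^<>\s]+@[^<>\s]+>$")


def noncompliant_entries(field):
    """Return the entries of an Uploaders value that do not match ENTRY."""
    entries = SEPARATOR.split(field.strip())
    return [entry for entry in entries if not ENTRY.match(entry)]


good = "Adam C. Powell, IV <hazelsct@debian.org> , Russ Allbery <rra@debian.org>"
bad = "Russ Allbery <rra@debian.org>, jwilk@debian.org"

print(noncompliant_entries(good))
print(noncompliant_entries(bad))
```

Running something like this over every Uploaders value in the archive would give a list of fields to look at by hand before any wording is proposed.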