#509935 decide whether Uploaders is parsed per RFC 5322

#509935#5
Date:
2008-12-27 20:27:37 UTC
From:
To:
I think we've discussed this before, but I didn't see an open bug, so
I'll open one so that we can discuss it in one place.

Policy currently says the following about the Maintainer field, which
applies by reference to the Uploaders field:

    The package maintainer's name and email address. The name should come
    first, then the email address inside angle brackets <> (in RFC822
    format).

    If the maintainer's name contains a full stop then the whole field
    will not work directly as an email address due to a misfeature in the
    syntax specified in RFC822; a program using this field as an address
    must check for this and correct the problem if necessary (for example
    by putting the name in round brackets and moving it to the end, and
    bringing the email address forward).

Most software has taken this to mean that the e-mail address should be
in RFC822 format, not that the whole field should be.

This is primarily posing a problem for people who have commas in their
name.  The main example to date is Adam C. Powell, IV, but it can happen
with various other name qualifiers and honorifics.  Currently, the only
way to express such a name that works with our existing tools is to drop
the comma, since several programs blindly split on commas when parsing the
field.
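
As a rough illustration of the failure described above (the address and the second uploader are made up, and this is not the code of any particular tool), a blind split on commas behaves like this:

```python
# Naive parsing: split the Uploaders value on every comma.
def naive_split(uploaders):
    return [entry.strip() for entry in uploaders.split(",")]


# Hypothetical field value; only the name form comes from this report.
field = "Adam C. Powell, IV <adam@example.org>, Jane Doe <jane@example.org>"
entries = naive_split(field)

# The first uploader is cut in two: a bare name with no address,
# followed by "IV <adam@example.org>".
for entry in entries:
    print(repr(entry))
```

Two uploaders come out as three entries, and the first one has no e-mail address at all.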

The most fully technically correct approach would be to require a full
RFC 5322 parse, but that adds a lot of complexity and raises the problem
that there's no standard canonicalization of RFC 5322 header fields.  It
becomes unclear whether one should strip off double quotes, remove
backslashes, remove portions in parentheses, or other things that would
be logical to do from the RFC 5322 grammar.

Alternatively, we could document the permitted character set for the name
portion of the Maintainer field and exclude commas.  It's annoying to do
this since commas have been supported in the past (in Maintainer, they're
unambiguous) and have only become a problem in Uploaders.  We could only
restrict them in Uploaders, but the lack of symmetry strikes me as a bad
idea.

We could also standardize a simple escaping mechanism of our own (allow
double quotes, for example, but require that, if used, they surround the
entire name and are stripped off by the parsing).
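
A minimal sketch of what such a home-grown rule might look like, assuming that quotes are only valid when they wrap the whole name (the function name unquote_name is hypothetical and nothing here is specified by Policy):

```python
def unquote_name(name):
    """Strip one pair of double quotes that surrounds the entire name.

    Assumed rule for illustration: a double quote anywhere else in the
    name is treated as an error.
    """
    name = name.strip()
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        name = name[1:-1]
    if '"' in name:
        raise ValueError("quotes must surround the entire name")
    return name


print(unquote_name('"Adam C. Powell, IV"'))
print(unquote_name("Jane Doe"))
```

Under this rule the quotes never become part of the maintainer's name, which keeps the name comparable across quoted and unquoted spellings.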

However we resolve this, we should probably also update the reference in
Policy to RFC 822 to refer to RFC 5322 instead, since I doubt we really
want to support source-routed e-mail addresses or similar bizarreness in
Debian control files.

#509935#12
Date:
2009-01-14 09:01:54 UTC
From:
To:
Russ Allbery wrote:
  > Alternatively, we could document the permitted character set for the name

I think it is not polite to force changes in maintainer names.

Hmm, RFC5322 is not yet a standard (BTW it is not yet cited in STD1),
and anyway it still uses the old semantics for compatibility (see the
"obs-" references, e.g. section 4.4).

IMHO we should specify a subset of RFC 822, because a full 5322 parse
is IMO too complex (and BTW not so useful) to implement in all the
tools.  Possibly we could require only a subset in the control file, and
recommend a full 5322 parse in the tools.

ciao
	cate

#509935#17
Date:
2009-01-15 06:26:03 UTC
From:
To:
"Giacomo A. Catenazzi" <cate@debian.org> writes:

This is true, but it's essentially meaningless.  It's sort of an artifact
of the IETF process, but RFC 822 is for practical purposes obsolete and
RFC 5322 reflects the current state of addressing standards.

True.  We should explicitly rule that out.

I'm leaning that way as well.  I also don't want to require people to use
RFC 2047 encoding if they have a name that doesn't fit into ASCII.

Anyone have any suggestions on a good subset and description of it that
isn't too complex?

#509935#22
Date:
2009-01-19 02:02:16 UTC
From:
To:
While I think it would be fine to have a comprehensive and accurate
specification, something like this could be an easy improvement.

By omitting mention of RFC 822, the mandate for UTF-8 in the control
file should obviate RFC 2047 encoding.

Despite underspecifying things, I doubt there will be anyone trying
to use email addresses of the wrong form.

diff --git a/policy.sgml b/policy.sgml
index 7de382d..080229c 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2582,17 +2582,14 @@ Package: libc6
 	  <p>
 	    The package maintainer's name and email address.  The name
 	    should come first, then the email address inside angle
-	    brackets <tt><></tt> (in RFC822 format).
+	    brackets <tt><></tt>.
 	  </p>

 	  <p>
-	    If the maintainer's name contains a full stop then the
-	    whole field will not work directly as an email address due
-	    to a misfeature in the syntax specified in RFC822; a
-	    program using this field as an address must check for this
-	    and correct the problem if necessary (for example by
-	    putting the name in round brackets and moving it to the
-	    end, and bringing the email address forward).
+	    If the maintainer's name contains a full stop or a comma,
+	    the entire name must either be surrounded by quotation marks
+	    or put within round brackets and moved it to the end
+	    (thus bringing the email address forward).
 	  </p>
 	</sect1>

#509935#27
Date:
2009-01-19 02:24:46 UTC
From:
To:
Thank you for the concrete wording proposal!

Clint Adams <schizo@debian.org> writes:

We could say that the e-mail address must be an RFC 5322 addr-spec without
obs-* rules so that we don't lose the restriction on what the e-mail
address should be like.

I wonder if we should also prohibit domain-literal.  We allow it now, but
there are no uses of it in the archive.

We should say explicitly that the quotation marks are not part of the
maintainer's name.  Should we say something about whether the maintainer
name can be quoted even if it doesn't contain a comma?

I'd like to maintain the current allowance for not quoting the maintainer
name even if it contains a full stop, despite the RFC 5322 requirement to
quote addresses that contain full stops.  Among other things, people who
use initials in their maintainer names don't currently do the quoting and
I don't really want to make those packages buggy.

I think we can safely prohibit for our purposes the email@address (Name)
form.  There are no occurrences of it in the archive.

Whatever we say here we should probably also say in section 4.4 (the
changelog specification).  Maintainers should use the same form of the
name and be able to do the same quoting in both places.

#509935#32
Date:
2009-01-19 08:21:15 UTC
From:
To:
While I can only agree on the technical ground of this proposal, I have
quite a number of scripts (including popcon) that depend on the ability
to extract the maintainer name from the Maintainer/Uploaders field. I suspect
other developers and debian-qa might have others.

Adding quotes around the maintainer name breaks the interface somehow.

Using the full Maintainer field is often problematic because:
1) we might not want to display the email address.
2) we might want to merge entries from the same maintainer using
different email addresses for different packages. (popcon goes further
and checks for different capitalization).

So I would suggest we keep the format 'Name <email>' and forbid dots and
commas. Developers that need them could use UTF-8 variants of those.

Alternatively, debian-policy could spell out the correct regexp to
extract the Maintainer name, but there will be a lot of scripts to
update.
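
For illustration only, such an expression could look roughly like the following sketch. It assumes the plain 'Name <email>' form with no "<" or ">" inside the name, and the names ENTRY_RE and parse_entry are hypothetical:

```python
import re

# Name: anything without angle brackets; address: text inside <...>.
ENTRY_RE = re.compile(r"^\s*(?P<name>[^<>]*?)\s*<(?P<email>[^<>]+)>\s*$")


def parse_entry(entry):
    """Return (name, email) for one 'Name <email>' entry."""
    match = ENTRY_RE.match(entry)
    if match is None:
        raise ValueError("not in 'Name <email>' form: %r" % entry)
    return match.group("name"), match.group("email")


name, email = parse_entry("Jane Doe <jane@example.org>")
print(name)
print(email)
```

A script that only wants the name, or wants to merge entries across addresses, can then work from the first element of the pair.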

Cheers,

#509935#37
Date:
2009-01-19 20:10:55 UTC
From:
To:
Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> writes:

Well, I really don't want to prohibit dots.  We allow dots now and they
don't pose any problems, other than the note in Policy that you need to
put quotes around the name if you use it in an e-mail To: field (which
presumably all of our software already deals with).
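
As one example of how software can deal with this, Python's standard email.utils.formataddr adds the quotes itself when the display name contains a full stop. This is only a sketch with a made-up address, and the wrapper name to_header is hypothetical:

```python
from email.utils import formataddr


def to_header(name, address):
    # formataddr quotes the display name when it contains characters
    # that are special in address headers, such as "." or ",".
    return formataddr((name, address))


header = to_header("Adam C. Powell", "adam@example.org")
print(header)  # "Adam C. Powell" <adam@example.org>
```

The control file keeps the unquoted name, and the quoting happens only at the point where a mail header is generated.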

Your point about not wanting to change software that parses the name is
well-taken.  I think, though, that if we say that you may only put
double-quotes around the name if there is a comma in the name and
otherwise the quotes should be omitted, that would minimize the problem.
Only a handful of existing maintainers would be affected (namely those
maintainers who are having trouble right now), so updating software
wouldn't be that urgent.

#509935#42
Date:
2009-01-19 20:25:25 UTC
From:
To:
* Russ Allbery [Mon, 19 Jan 2009 12:10:55 -0800]:

I think dots should be allowed, yes, and be allowed unquoted.

I think we should *consider* doing without commas at all, if losing them is
something we could live with. I realize that would be annoying for
people that have a comma in their name, so I'm not right away saying we
should forbid them. But I really think we should consider it, because
even if commas have to be quoted, you've already lost the ability to
parse the Uploaders field with split /\s*,\s*/, which I think would be a
loss, since that works for all other fields.

(Oh, and if we do without commas, we should do without quoting as well
IMHO.)

Just my 2¢,

#509935#47
Date:
2009-01-19 20:36:56 UTC
From:
To:
Adeodato Simó <dato@net.com.org.es> writes:

It would certainly make it easier for software.

I have to admit to a personal bias (speaking as someone who goes by his
middle name rather than his first name) in favor of fixing software to
accurately recognize people's names rather than the other way around.  I
personally find software that refuses to recognize my name the way that I
spell it to be quite obnoxious, so I'm sympathetic to people who have
commas in their name.  But yes, allowing commas, even quoted, does
complicate Uploaders parsing quite a bit over the current simple state.

Bill mentioned the possibility of a Unicode comma other than the ASCII
comma.  Does such a thing exist?  It's kind of a hack, but it's also an
interesting compromise.  I'm not sure why there would be such a thing,
though, given that there's a perfectly good comma in the ASCII range and
Unicode normally doesn't duplicate code points to no purpose.

#509935#52
Date:
2009-01-19 21:15:11 UTC
From:
To:
There are several other commas that have code points, but IMHO none of them
would be an adequate fit for this given that the glyphs differ.

The one with the closest glyph would be U+FE50 SMALL COMMA, but that appears
to be a fullwidth character.

#509935#57
Date:
2009-01-19 21:37:34 UTC
From:
To:
In any case, if commas are allowed, policy should spell out the
correct regexp to parse the Uploaders field.

I have the exact opposite experience with unicode :)
U+FF0C FULLWIDTH COMMA should do the trick.
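
A small sketch of why a non-ASCII comma sidesteps the splitting problem, using made-up addresses: the code points discussed here are distinct from U+002C, so a split on the ASCII comma leaves them alone.

```python
import unicodedata

# Comma-like code points mentioned in this thread.
for ch in (",", "\ufe50", "\uff0c"):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# Hypothetical Uploaders value with U+FF0C inside the first name.
field = "Adam C. Powell\uff0c IV <adam@example.org>, Jane Doe <jane@example.org>"
parts = [p.strip() for p in field.split(",")]

# Only the ASCII comma between the two uploaders acts as a separator.
print(len(parts))
```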

Cheers,

#509935#66
Date:
2010-06-02 21:18:05 UTC
From:
To:
I think that this issue became more serious with the lintian-based
automatic rejection of incoming packages.
Some days ago an upload (illuminator_0.11.0-4_amd64.changes) got rejected
because of this pending issue.

(from the reject mail)

(CC-ing ftp-masters as reject mails say to write if something concerns
them, and CC-ing debian-science-maintainers and hazelsct as maintainers of
the package in question)

ciao
Riccardo

#509935#71
Date:
2010-06-02 21:24:22 UTC
From:
To:
Riccardo Stagni <unriccio@email.it> writes:

I thought we established the last time around on this discussion that Adam
needs to leave the comma off in his name for the time being.  DAK is also
misinterpreting the value, so it's not just a Lintian problem.  All our
current tools assume they can do a naive split on comma, and even if we
decided to change this and changed Policy now, it's going to take a while
for the tools to change.

#509935#76
Date:
2010-06-02 22:02:46 UTC
From:
To:
I wasn't aware of this, reading the bug report it looked to me that the
discussion was stopped and no decision really made.

Sorry for the wasted time! :(

Riccardo

#509935#81
Date:
2010-06-02 22:06:09 UTC
From:
To:
Riccardo Stagni <unriccio@email.it> writes:

Oh, no, it's quite okay.  I feel bad that the discussion stalled out and
we didn't reach a conclusion, since the current situation really isn't
ideal and I hate making someone not use their actual name.

#509935#86
Date:
2010-06-03 01:30:54 UTC
From:
To:
Yeah, I was kind of surprised by that.  Didn't have time to deal with it
over the past week, but figured I'd try to work around it using a
lintian override.

I don't remember receiving this advice.  That would make the name
different from what's in my GPG key, but I suppose I could add an
additional name to the key...  Note my email From address doesn't have
the comma or full stop, because I find the quotes aesthetically
inelegant, but I made that change after the key.

I think you mean "even if we decided to clarify Policy..." as my
practice conforms to what's currently in Policy.  But we've been through
that already.

#509935#91
Date:
2010-06-03 01:44:43 UTC
From:
To:
Adam C Powell IV <hazelsct@debian.org> writes:

Ack, sorry.  Apologies for not having communicated effectively enough
there.

Nothing in the archive software should care about this.  Authorization is
done based on the key, not on the IDs on the key.  I'm not sure if tools
like debsign would have trouble figuring out what key to use, but I
suspect they do the lookup based on the e-mail address only to avoid
problems with slight mismatches in names.

Yeah, sorry, I was sloppy in my wording.  Policy is currently ambiguous:

    The package maintainer's name and email address. The name should come
    first, then the email address inside angle brackets <> (in RFC822
    format).

was interpreted (reasonably) by you as saying that the whole field was in
RFC822 format, and was interpreted by the authors of tools as saying that
only the e-mail address was in RFC822 format, with no specification for
the name.  Then Uploaders was added later, with a specification saying
that uploaders are separated by commas.

The whole specification is inadequate on multiple fronts, really, since we
certainly don't accept all RFC822 addresses either (RFC822 addresses could
contain commas inside comments, for example, and I'm sure our tools don't
cope).

#509935#96
Date:
2010-06-03 09:02:13 UTC
From:
To:
I dunno how they try to match key ids, but to be sure you can put your key in
the configuration so debsign knows what to use no matter what you put in your
changelog (useful for sponsoring packages too):

emilio@saturno:~$ grep DEBSIGN .devscripts
DEBSIGN_KEYID=4A08B2FE

Cheers,
Emilio

#509935#101
Date:
2011-09-08 12:10:01 UTC
From:
To:
* Russ Allbery <rra@debian.org>, 2008-12-27, 12:27:
- Let's forget about RFC 822/5322 compatibility, as it would introduce
only needless complexity.
- Let's allow any punctuation characters in maintainer names and e-mail
addresses *except* "<" and ">".

This way comma is completely disambiguated: it splits the field if and
only if it's preceded by the ">" character. I.e. you can use the following
Perl regex to split the field: /\>\K\s*,\s*/.
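
For tools written in Python, a rough equivalent might look like the sketch below. Python's re module has no \K, so a lookbehind for ">" is used instead; the function name split_uploaders and the addresses are made up for illustration:

```python
import re

# Split on a comma only when it directly follows ">", allowing
# optional whitespace around the comma.
SPLIT_RE = re.compile(r"(?<=>)\s*,\s*")


def split_uploaders(value):
    return SPLIT_RE.split(value.strip())


field = "Adam C. Powell, IV <adam@example.org> , Jane Doe <jane@example.org>"
for entry in split_uploaders(field):
    print(entry)
```

The comma inside the name is left alone because it is not preceded by ">".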

One can easily check that this method does the right thing for parsing
Uploaders fields of the existing packages: you could e.g. try this on
ries:
$ zcat /srv/ftp.debian.org/mirror/dists/*/*/source/Sources.gz | grep-dctrl -ns Maintainer,Uploaders -e '' | perl -pe 's/\>\K\s*,\s*/\n/g' | sort -u

Incidentally, this is (almost) the same method dak uses to split
Uploaders:

$ grep -r uploaders.*split daklib/
daklib/dbconn.py:        for up in u.pkg.dsc["uploaders"].replace(">, ", ">\t").split("\t"):

Let's fix them, then. :) I volunteer to fix lintian and dd-list. Do you
know any other tools that parse Uploaders?

#509935#106
Date:
2011-09-09 02:09:07 UTC
From:
To:
Jakub Wilk <jwilk@debian.org> writes:

Oh, hm, yeah, that would work.

dak, of course, but it sounds from your message like it's already doing
the right thing.  The PTS and DDPO -- I'm not sure what gets that data
into those systems.  UDD?

I think your solution sounds excellent.

#509935#111
Date:
2011-09-09 11:50:27 UTC
From:
To:
* Russ Allbery <rra@debian.org>, 2011-09-08, 19:09:

One thing it doesn't do right is that it doesn't allow for space before
the comma. (We have a few packages in the archive with " , " in the
Uploaders field.) Should other (than space) whitespace characters be
allowed before/after comma as well?
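
To make the difference concrete, here is a small comparison sketch with made-up uploaders. dak_style_split mirrors the daklib expression quoted earlier in this bug, and regex_split is a Python rendering of the proposed regex using a lookbehind in place of \K; both function names are hypothetical:

```python
import re


def dak_style_split(value):
    # Only recognises the exact separator ">, ".
    return value.replace(">, ", ">\t").split("\t")


def regex_split(value):
    # Accepts any whitespace (or none) around a comma that follows ">".
    return re.split(r"(?<=>)\s*,\s*", value)


field = "Jane Doe <jane@example.org> , John Roe <john@example.org>"
print(len(dak_style_split(field)))  # the " , " separator is not recognised
print(len(regex_split(field)))
```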

They both have their own, IMO over-engineered, parsers of Sources files.

PTS:

   def addresses_from_string(content):
       pattern = re.compile("([^>]),")
       hacked_content = pattern.sub("\\1WEWANTNOCOMMAS", content)
       msg = email.message_from_string("Header: " + hacked_content)
       hacked_list = email.Utils.getaddresses(msg.get_all("Header", []))
       list = map(lambda p:
                  map(lambda s:string.replace(s,"WEWANTNOCOMMAS",","), p),
                  hacked_list)
       return list

Again, PTS trips on a space before comma.

DDPO:

   my @uploaders = ($uploaders =~ /([^,@ ][^@]+@[^@]+>)/g);
   $db{"com:$package"} = scalar @uploaders;
   foreach my $uploader (@uploaders) {
       my ($name, $mail);
       if ($uploader =~ /^\S+$/) {
           ($name, $mail) = ("(unknown)", $uploader);
           warn "Uploader without name: $package $uploader";
       } else {
           $uploader =~ /(.+) <(.+)>/ or warn "$fname:$.: syntax error in $uploader";
           ($name, $mail) = ($1, $2);
           $db{"name:$mail"} = $name;
       }
       $packages{$mail}->{$component}->{$package} = 1;
   }

DDPO doesn't allow for leading comma or @ in the maintainer's name, but
that's a minor nitpick.

UDD uses Python's email.Utils.getaddresses(), so it will need fixing.

#509935#116
Date:
2011-09-09 16:48:16 UTC
From:
To:
Jakub Wilk <jwilk@debian.org> writes:

The only other ones I can think of are newline and tab, which would be
weird but which I believe are allowed by the syntax.

#509935#121
Date:
2014-08-03 06:51:28 UTC
From:
To:
Hi!

I quite like Jakub's suggestion that we use /\>\K\s*,\s*/ to split the list of
Uploaders. It's very permissive and will suit our needs for this field but
doesn't imply a large amount of overhead for parsers of the field or require
parsers to deal with the full gamut of possibilities that the various RFCs
would permit if we referenced only them.

In practical terms, what is required now to wrap this up?

(Knowing how Uploaders should be split would then allow us to expose
functionality to do this in python-debian.)

cheers
Stuart

#509935#128
Date:
2014-08-08 18:14:22 UTC
From:
To:
I would say:

- Review whether the current Uploaders fields are compliant with this.

- Review whether tools that parse the Uploaders field do it in a
safe way.

- Actually write the proposal.

Cheers,
