#853915 reportbug: Retrieved base64 messages aren't decoded

#853915#5
Date:
2017-02-02 02:31:44 UTC
From:
To:
Dear Maintainer,

When running e.g. `reportbug -N 853037`, a bunch of base64 is displayed
instead of the actual content of the messages.

#853915#10
Date:
2017-02-02 08:39:58 UTC
From:
To:
Dear Maintainer,

The same thing occurs when saving a bug report to disk if the bug report
contains a non-ASCII character: it is saved as base64 and is then
rejected by the bug tracking system if you try to send it later, because
the first line doesn't begin with "Package: ".
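
One plausible mechanism, sketched under the assumption that the saved
report is generated with Python's standard `email` package (this thread
does not confirm how reportbug writes the file): a text part with a UTF-8
charset gets base64 as its default transfer encoding, so the stored body
no longer starts with "Package: ". The body text below is made up for
illustration.

```python
from email.mime.text import MIMEText

# Hypothetical report body containing non-ASCII characters.
body = "Package: reportbug\n\nThe umlaut in Grüße triggers the encoding.\n"

# With a UTF-8 charset, the stdlib picks base64 as the transfer encoding.
msg = MIMEText(body, _charset="utf-8")
raw = msg.as_string()

# Split the serialized message into header block and encoded body.
_, _, stored_body = raw.partition("\n\n")
first_line = stored_body.splitlines()[0]

print(msg["Content-Transfer-Encoding"])
print(first_line.startswith("Package: "))

# Decoding the payload recovers the original text unchanged.
decoded = msg.get_payload(decode=True).decode("utf-8")
```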

#853915#15
Date:
2018-01-31 23:41:56 UTC
From:
To:
control: clone 853915 -1
control: reassign 853915 python-debianbts
control: retitle -1 reportbug: base64 encoded reports rejected by bts

Reading and sending base64 messages are two different bugs, so let's
split this report.

I believe that python-debianbts is supposed to decode a base64 message
body, therefore I'm reassigning the base64 reading bug.  Please reassign
back to reportbug if this assumption is wrong.

#853915#26
Date:
2018-09-29 20:47:11 UTC
From:
To:
Can you please provide a case to reproduce this issue? I'm not sure if
this is a problem with python-reportbug.


Cheers,

Bastian

#853915#31
Date:
2018-12-06 09:43:17 UTC
From:
To:
It seems like the core of the problem is that parts of the header, i.e.:

	Content-Type: text/plain; charset="UTF-8"
	Content-Transfer-Encoding: base64

are assigned to the email body instead of the header. I'm not sure
whether this problem comes from the BTS sending wrong SOAP or from my
side parsing it wrong.

#853915#38
Date:
2018-12-28 17:48:45 UTC
From:
To:
My suspicion is that the BTS SOAP interface has trouble with certain
types of MIME messages (which can be quite complex...) and sends wrong
SOAP. Can we reassign this bug (to check) if that is the case? (Where?)

#853915#43
Date:
2018-12-28 18:37:37 UTC
From:
To:

The message is a multipart message, where Content-Type and
Content-Transfer-Encoding are given separately for each part, so they
must be part of the email body.  Each body part then has its own header
lines like this.

For most bug log messages, one can read the body text like this:

But sometimes something like this is needed:
print(bts.get_bug_log(853037)[0]['message'].get_payload()[0].get_payload(decode=True).decode())
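
A minimal sketch of the same idea using only Python's standard `email`
package, so it runs without contacting the BTS. The message is built
locally as a stand-in for a PGP/MIME signed report; the helper name
`first_text_part` and the dummy signature are made up for illustration.
Walking the parts avoids assuming that the text part is always at
index 0.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def first_text_part(msg):
    """Return the decoded text of the first text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            return part.get_payload(decode=True).decode(charset, errors="replace")
    return None


# Stand-in for a signed report: a base64 text part plus a dummy signature.
original = "Package: reportbug\n\nGrüße\n"
outer = MIMEMultipart("signed", protocol="application/pgp-signature")
outer.attach(MIMEText(original, _charset="utf-8"))
outer.attach(MIMEApplication(b"dummy signature", "pgp-signature"))

# Round-trip through text, as if the message had been received.
parsed = message_from_string(outer.as_string())
text = first_text_part(parsed)
print(text)
```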

#853915#48
Date:
2019-01-09 10:45:52 UTC
From:
To:
Hello,
I'm often hit by this bug, as I have several Debian instances where no e-mail
connection is available. So I save the report from reportbug, transfer
the file to another machine, and then I need to massage the e-mail to
get it accepted (currently removing all but the base64 part, running
"base64 -d" on the remaining part, and inserting the output back into
the original mail). I'm probably going to script it soon.

So it would be very helpful for me if e-mails temporarily stored by
reportbug were either not stored in base64 format at all, or if this
were at least configurable, so that I can always see the contents after
"mutt -H reportbug …".

Since all e-mails sent by mutt have at least arrived in the BTS without
problems, I do not see the point of encoding e-mails in base64 at all.
(But there may be use cases, so making this configurable would
probably be the best option.)

Having the BTS accept the base64 encoded e-mails is suboptimal, as I
can no longer read the e-mails in mutt and sometimes I notice things
just before sending, prompting me to update the report in mutt.

Greetings

          Helge
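
The manual steps described above could be scripted roughly as follows.
This is only a sketch: it assumes a single-part message whose body is
base64-encoded UTF-8 text, and the function name `decode_saved_report`
and the sample report are invented for the example.

```python
import base64


def decode_saved_report(raw: str) -> str:
    """Return raw with a base64 body replaced by its decoded UTF-8 text."""
    head, sep, body = raw.partition("\n\n")
    header_lines = head.split("\n")
    is_b64 = any(
        line.lower().replace(" ", "") == "content-transfer-encoding:base64"
        for line in header_lines
    )
    if not sep or not is_b64:
        return raw  # not base64-encoded, leave the message untouched
    # Newlines inside the base64 block are ignored by the decoder.
    text = base64.b64decode(body).decode("utf-8")
    new_head = [
        "Content-Transfer-Encoding: 8bit"
        if line.lower().startswith("content-transfer-encoding:") else line
        for line in header_lines
    ]
    return "\n".join(new_head) + "\n\n" + text


# Invented sample of a saved report with a base64 body.
report_text = "Package: reportbug\n\nGrüße\n"
encoded = base64.encodebytes(report_text.encode("utf-8")).decode("ascii")
saved = (
    "Subject: example\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n" + encoded
)
fixed = decode_saved_report(saved)
print(fixed)
```

After this step the body is readable again and its first line begins
with "Package: ".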

#853915#53
Date:
2019-02-17 23:15:07 UTC
From:
To:
control: reassign 853915 debbugs

Summary: reportbug's functionality for browsing existing bugs fails with
signed messages: the encoded message is shown instead of the decoded
message text. While this is not too critical with quoted-printable, it
is a real problem with base64 encoding. Try: reportbug -N 853037

reportbug internally does

to show the first message in a report. Example bugs where the problem
can be seen are: #853037, #820649, #861168

Could the BTS SOAP interface be changed to return the decoded message
body of signed messages? Being able to deal with all other kinds of
complex MIME messages is not really necessary.

#853915#60
Date:
2020-05-01 22:23:38 UTC
From:
To:
I've been looking at the tools interacting here and am not yet sure
where the bug is.

Python-debianbts, when retrieving a bug log via the BTS SOAP interface,
receives each buglog element (message) already split into header and
body [get_bug_log]. If the body is base64-encoded, it gets decoded
before the function returns the bug log. Python-debianbts also attempts
to reconstruct something resembling the original full message by using
the feedparser, and includes that in the buglog elements (dicts) it
returns. I am not sure how reliable that message reconstruction is, but
I suspect it is not perfect.

[get_bug_log]:
https://github.com/venthur/python-debianbts/blob/master/debianbts/debianbts.py#L298
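
To illustrate the reconstruction idea, here is a sketch that uses only
Python's standard library. It is not the python-debianbts code: the
function name `rebuild_message`, the boundary "XYZ", and the sample text
are invented. It joins a separately delivered header and an undecoded
multipart body, feeds them to the feedparser, and then reads the
per-part headers that would otherwise show up as body text.

```python
import base64
from email.parser import FeedParser


def rebuild_message(header: str, body: str):
    """Join a header block and a raw body, then parse them as one message."""
    parser = FeedParser()
    parser.feed(header.rstrip("\n") + "\n\n")
    parser.feed(body)
    return parser.close()


# Invented example: the body still contains the part headers and base64 text.
encoded = base64.b64encode("Grüße aus Debian\n".encode("utf-8")).decode("ascii")
header = 'MIME-Version: 1.0\nContent-Type: multipart/mixed; boundary="XYZ"'
body = (
    "--XYZ\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + encoded + "\n"
    "--XYZ--\n"
)

message = rebuild_message(header, body)
part = message.get_payload()[0]
text = part.get_payload(decode=True).decode("utf-8")
print(part["Content-Transfer-Encoding"], repr(text))
```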

Now I'd like to understand the constraints better under which
python-debianbts is operating:
What exactly is the BTS supposed to deliver via SOAP as the message body
part of the bug log? If the message is a simple text/plain email, is the
body expected to be already decoded or not? If the message is some
MIME/multipart construct, is the body then expected to be the main text
message part only or should it just be everything that is not part of
the main message headers?

I've been trying to look at the debbugs code to find the answer to these
questions, but with limited success so far. Looking at
lib/Debbugs/SOAP.pm in subroutine get_bug_log, it uses Debbugs::MIME's
parse function to split the messages into header and body:
https://salsa.debian.org/debbugs-team/debbugs/-/blob/master/lib/Debbugs/SOAP.pm#L249
`parse` in turn uses `getmailbody`, which definitely tries to extract
the main text message part and does not just dump everything that isn't
part of the primary message headers. So either something does not work
as expected there, or I'm simply looking at the wrong code and should be
looking somewhere else.

Ideas?

In the meantime I have come up with a workaround for this in reportbug,
but it would still be useful to know if everything else is working as
intended or not.

#853915#65
Date:
2020-05-06 20:29:24 UTC
From:
To:
If this is the correct code, then why does it behave differently when
it is run to serve a SOAP request, as compared to running it directly,
for the same email message?

This is at least happening for multipart/signed messages, and likely
also for other MIME messages. For testing I am using the initial report
mail from #853037. The BTS' SOAP interface delivers the message body as
the entire undecoded body of the primary email message, as if the BTS
did not understand multipart messages at all. See `reportbug -N 853037`
for how the email body is then displayed by reportbug.

However, when I run this same message through Debbugs::MIME's `parse`,
it correctly identifies the text/plain subpart and decodes and returns
it. I've tested this by downloading the message mbox:

wget -O msg
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=853037;mbox=yes;msg=5

and then using the following perl script to call the `parse` function:
------- m.pl ------
#! /usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Debbugs::MIME qw(parse);

sub make_list {
    return map {(ref($_) eq 'ARRAY')?@{$_}:$_} @_;
}

local $/;
my $lines = <>;
my $message = parse $lines;
my ($header, $body) = map {join("\n", make_list($_))} @{$message}{qw(header body)};
print Dumper({header => $header, body => $body,});
-------------

Command line:
perl -I debbugs/lib/ m.pl < msg

Here the result shows a nicely identified and extracted message text.

Looking now at the code starting here:
https://salsa.debian.org/debbugs-team/debbugs/-/blob/master/lib/Debbugs/MIME.pm#L130
Somehow, when the code is run on the BTS server, the MIME::Parser seems
to fail and the `parse` function code is falling back to the legacy
pre-MIME code. Why?

#853915#70
Date:
2020-05-06 21:22:04 UTC
From:
To:
otherwise you won't get the mbox.

#853915#75
Date:
2020-05-09 12:49:29 UTC
From:
To:
control: reassign 853915 bugs.debian.org
control: affects 853915 reportbug

Bug summary:
 - Browsing bug logs in reportbug is broken for some messages
 - Bug log messages retrieved from the BTS via the SOAP interface
   are supposed to be decoded, but in these cases aren't.
 - All MIME multipart messages are affected (e.g., messages with
   attachments, PGP/MIME signed messages)
 - The debbugs code itself seems fine (AFAICS)

To check whether a problem with some old version of libmime-tools-perl
could be behind this, I've tested this with the versions in stretch
(oldstable) and jessie (old-oldstable), but couldn't reproduce the
problem there either.

Thanks to
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910360
it is now also clear that I've looked at the correct code, and the code
is actually working as intended in another debbugs installation. (Due to
this bug (#853915), #910360 actually currently does not apply to Debian,
because `get_bug_log` SOAP queries have been returning complete messages
with all attachments since at least 2017.)

So the problem is specific to bugs.debian.org. Reassigning accordingly.

#853915#84
Date:
2021-08-22 11:21:00 UTC
From:
To:
The reason is that the perl code on the BTS server is executed in taint
mode, and MIME::Parser fails on multipart messages when run in taint
mode. Adding the -T flag to the perl invocation in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=853915#65
reproduces the problem: The message body is not properly decoded.

#853915#91
Date:
2021-12-28 22:30:31 UTC
From:
To:
control: tags 853915 + patch
There is a merge request with the fix on salsa:
https://salsa.debian.org/debbugs-team/debbugs/-/merge_requests/10
