#984581 pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Package:
pst-utils
Source:
libpst
Description:
tools for reading Microsoft Outlook PST files
Submitter:
sai kalyan
Date:
2022-01-21 01:12:02 UTC
Severity:
important
Tags:
#984581#5
Date:
2021-03-05 17:36:37 UTC
From:
To:
Hi,

We have been using the tool to extract emails from PST files. However, we
recently observed that for some mails whose transport headers contain ARC
headers, the email addresses are not extracted from the PST and only usernames
are available in the MIME content of the extracted emails.
After enabling debug logs, we found that all the internet headers are being
ignored as bogus headers, including the To:, From: ... headers where the
email addresses are available.

As the tool is open source, we tried to debug it. We found that the headers
are ignored (as bogus headers) and that the tool instead uses the extracted
metadata, in which the email addresses are missing, to construct the MIME
content for the email.

We would like to point at two places where the issue could possibly arise:
1) Parsing the mail from the PST, as the structure variable does not contain
the addresses for these emails.
2) Ignoring the headers as bogus headers because of an incorrect comparison.


We are not able to look into the parsing part, but we made some changes to
verify the behavior of the bogus-header identification. These are probably
not the appropriate changes.

Sample Data:
Below is the sample MIME content that readpst extracts for an email from the
PST:

From: user_1
To: user_2
CC: user_3

where user_1, user_2 and user_3 are just usernames without email addresses

We would like to hear back as soon as possible.

Thank you
Sai Kalyan

#984581#10
Date:
2021-03-06 02:42:00 UTC
From:
To:
Control: tags -1 + moreinfo

Could you test version 0.6.75-1 from Debian bullseye?

Could you attach your patch to the bug report?

Could you provide some information about what ARC headers are?

Please supply an example PST file that this problem occurs with.

#984581#17
Date:
2021-03-07 17:42:17 UTC
From:
To:
Hi Paul,

We already tried version 0.6.75-1. We also compiled the latest available code and tried that, with the same results.

Please find the changes in the attached file (readpst.c, line 1238).

ARC headers are a kind of email authentication header. Authenticated Received Chain (ARC) creates a mechanism for individual Internet Mail Handlers to add their authentication assessment to a message's ordered set of handling results. For more details, please refer to RFC 8617: https://tools.ietf.org/html/rfc8617.

For security reasons we cannot share the original PST. We will discuss this internally and, if possible, share an in-house sample PST. We will let you know about the PST in the next couple of days.
Meanwhile, our observation is that the headers are accepted only if they start with one of the following headers (Date, From, To, Content-Type, MIME-Version, Microsoft Mail Internet Headers, Received, Subject and some others); otherwise they are treated as bogus. This email starts with a header that is not in that list.

Thank you
Sai Kalyan


#984581#24
Date:
2021-03-08 01:32:01 UTC
From:
To:
Control: found -1 0.6.75-1

Thanks, marking the bug as found in that version.

Thanks for testing this too.

It is traditional to provide changes in the patch format by using the
`diff -u` command or the corresponding commands from the version
control system that the upstream project is using.

Below is the output from the Mercurial diff for your change.

   $ hg diff
   diff -r 7200790e46ac src/readpst.c
   --- a/src/readpst.c     Tue Jun 16 17:18:28 2020 -0700
   +++ b/src/readpst.c     Mon Mar 08 09:20:50 2021 +0800
   @@ -1235,7 +1235,7 @@

    int  header_match(char *header, char*field) {
        int n = strlen(field);
   -    if (strncasecmp(header, field, n) == 0) return 1;   // tag:{space}
   +    if (strstr(header,field) != NULL || strncasecmp(header, field, n) == 0) return 1;   // tag:{space}
        if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
            char *crlftab = "\r\n\t";
            DEBUG_INFO(("Possible wrapped header = %s\n", header));


I am fairly certain that this is not the correct fix for this issue.

Thanks for the info.

Understood.

That would be necessary to be able to fix the issue.

That does indeed look like what the code does. Probably the right fix
is to scan through all of the headers instead of just the first one.
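
To illustrate why the substring test is too permissive, here is a minimal
sketch (not code from libpst; prefix_match and substring_match are
hypothetical names) comparing the original prefix comparison with the strstr
variant from the diff above. Because strstr finds the field anywhere in the
buffer, a body fragment that merely mentions "To: " would be accepted as a
header block.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Original-style check: the buffer must begin with the field name. */
static int prefix_match(const char *header, const char *field) {
    assert(header != NULL && field != NULL);
    return strncasecmp(header, field, strlen(field)) == 0;
}

/* Patched-style check: also true if the field occurs anywhere in the buffer. */
static int substring_match(const char *header, const char *field) {
    assert(header != NULL && field != NULL);
    return strstr(header, field) != NULL || prefix_match(header, field);
}
```

With the substring variant, the "bogus header" detection no longer rejects
message-body fragments that happen to contain a field name, which is the case
the check was presumably written for.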

#984581#29
Date:
2021-03-08 08:14:36 UTC
From:
To:
Hi Paul,

Please find attached a PST containing a single email for which we also had problems extracting the email addresses under the ‘To:’ header.

Thank you
Sai Kalyan

#984581#34
Date:
2021-03-08 13:56:46 UTC
From:
To:
Sorry, it looks like Outlook blocked this PST.

#984581#39
Date:
2021-03-08 13:58:19 UTC
From:
To:
Outlook is blocking the PST, so please find the zipped PST file.

Thank you
Sai Kalyan

#984581#44
Date:
2021-03-10 09:28:55 UTC
From:
To:
Hi Paul,

How are you? I hope you got a chance to look at the issue that we reported. I am reiterating the summary of the problem.
There are some transport headers starting with “ARC-Seal: ”. These transport headers also contain the To, CC and BCC addresses with both display names and corresponding email IDs. However, `readpst` discards these transport headers while creating the EML file with MIME content, so the final MIME content has only the display names for all the To, CC and BCC addresses. Possibly you are using the canonical properties to extract the To, CC and BCC addresses from the PST file.

After looking at the readpst.c file (see below), we understood that readpst discards any transport headers that do not start with one of the specified strings.



int  valid_headers(char *header)
{
     // headers are sometimes really bogus - they seem to be fragments of the
     // message body, so we only use them if they seem to be real rfc822 headers.
     // this list is composed of ones that we have seen in real pst files.
     // there are surely others. the problem is - given an arbitrary character
     // string, is it a valid (or even reasonable) set of rfc822 headers?
     if (header) {
         if (header_match(header, "Content-Type: "                 )) return 1;
         if (header_match(header, "Date: "                         )) return 1;
         if (header_match(header, "From: "                         )) return 1;
         if (header_match(header, "MIME-Version: "                 )) return 1;
         if (header_match(header, "Microsoft Mail Internet Headers")) return 1;
         if (header_match(header, "Received: "                     )) return 1;
         if (header_match(header, "Return-Path: "                  )) return 1;
         if (header_match(header, "Subject: "                      )) return 1;
         if (header_match(header, "To: "                           )) return 1;
         if (header_match(header, "X-ASG-Debug-ID: "               )) return 1;
         if (header_match(header, "X-Barracuda-URL: "              )) return 1;
         if (header_match(header, "X-x: "                          )) return 1;
         if (strlen(header) > 2) {
             DEBUG_INFO(("Ignore bogus headers = %s\n", header));
         }
         return 0;
     }
     else return 0;
}

As per our understanding, ARC headers (which help preserve email authentication results and verify the identity of email intermediaries that forward a message on to its final destination) were introduced in 2016, and it looks like readpst does not account for them.

We would appreciate it if you could clarify:

  1.  Is our understanding correct?
  2.  If yes, can we expect a patch from you?
  3.  If our understanding is not correct, can we expect a patch with proper fixes, or can you let us know where to fix the problem?
  4.  Are there any other headers like ARC that are not handled?

Looking forward to your reply so that we can commit to a date with our customers.

Thank you
Sai Kalyan


#984581#49
Date:
2021-03-15 03:21:44 UTC
From:
To:
I am looking at the issue today.

I managed to reproduce the issue that you have reported using the
sample PST file that you have provided.

I acknowledge that I am seeing both the issues you reported:

 * only a limited set of headers are being extracted
 * email address is missing from the To header
    - but the From header is correct

The readpst -d option to output debug information was instrumental in
reproducing this, it causes all the info in the PST file and the entire
sequence of decoding steps to be output to a debug file.

I modified the valid_headers function to also accept the ARC-Seal
header but that does not fix the problem. Looking at the debug output I
noticed that the X-GM-THRID header is the first header. I then added an
X-GM-THRID entry to the valid_headers function and that fixed the problem. I
think that messages with a different first header will not work though,
you would have to add all of the first headers that could exist to the
valid_headers function, which seems like an incorrect thing to do.
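
The behaviour described above can be reproduced with a simplified stand-in
for the whitelist check. This is a sketch only: starts_with_known_header and
the shortened lists are illustrative and are not the libpst code. The check
only ever looks at the start of the header block, so adding "ARC-Seal: " does
not help when the block begins with "X-GM-THRID: ".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Returns 1 if the header block begins with one of the given field names. */
static int starts_with_known_header(const char *block,
                                    const char *const *known, size_t count) {
    assert(block != NULL && known != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strncasecmp(block, known[i], strlen(known[i])) == 0) return 1;
    }
    return 0;
}
```

For a block that starts with "X-GM-THRID: ...", a list containing "ARC-Seal: "
still yields 0, while a list that also contains "X-GM-THRID: " yields 1.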

If you have any sample PST files that *do* work with the current code,
that would allow me to compare the working PST with the broken PST,
which would be very helpful in tracking down where the problem is.

Until I can figure out the correct fix, I suggest you work around this
bug by adding "return 1;" without quotes as the first line in the
valid_headers function. This way you can keep readpst working for your
customers while the correct fix is found. I believe that the modern PST
files that you have available are all valid files, while the
valid_headers function aims to detect broken files, so there should be
no risk to the conversion process for your case.

#984581#54
Date:
2021-03-15 04:17:10 UTC
From:
To:
I did some further investigation of the PST file you sent.

I conclude that there are two problems you are experiencing:

The first one is that readpst doesn't consider the headers as valid
even though they clearly are valid. Since the header validity detection
was added to detect invalid PST files I am going to have to discuss
this with the upstream author. Perhaps the header validity detection
will have to become more generic or perhaps it will be discarded or
perhaps the invalid PST files will be detected in a different way.
Fixing this will bring back all the headers, including ARC & To.

The second one is that for your particular PST file, the To field does
not contain an email address. Looking at the debug output I see that
the "Display Sent-To Address" contains only the name, not the email.
This appears to be a problem with the PST file itself, as the 0x0E04
type, which is PR_DISPLAY_TO, aka the "Address Sent-To", does not
contain the email address. The email address does appear in the
"Contact Address" and "Search Key" though. I am not sure if it is
correct to merge the contact address into the to address though.

If you have any more samples of working or broken PST files, I would be
happy to have a copy of them to debug further.

#984581#59
Date:
2021-03-15 04:38:25 UTC
From:
To:
Hi Carl,

A Debian user reported a bug in libpst's readpst tool:

https://bugs.debian.org/984581

The bug report contains two issues experienced with a sample PST file:

The first is that normal MIME headers were not extracted, because the
headers were not considered valid, because the first header was the
X-GM-THRID header rather than one of the limited list of headers
considered valid by readpst. Clearly the header validity function is
not going to keep up with changes in the list of headers that are
commonly in PST files. My suggestion is to replace the current header
validity function with one that just checks if either the first header,
or the entire header block complies with the email header RFC.
Alternatively the header validity function could be removed, or made
optional but disabled/enabled by default.
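
As a rough sketch of the "complies with the email header RFC" idea (this is
not the patch itself; looks_like_header_block and its helpers are hypothetical
names), the check below accepts a block if every line up to the first empty
line is either "field-name:" followed by anything, or a folded continuation
line starting with a space or tab. The field-name rule follows RFC 5322:
printable US-ASCII characters other than the colon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ftext per RFC 5322: printable US-ASCII (33..126) except ':'. */
static int is_ftext(unsigned char c) {
    return c >= 33 && c <= 126 && c != ':';
}

/* Returns 1 if one line (length len, no line terminator) is "field-name:..." */
static int is_header_line(const char *line, size_t len) {
    size_t i = 0;
    while (i < len && is_ftext((unsigned char)line[i])) i++;
    return i > 0 && i < len && line[i] == ':';
}

/* Returns 1 if every line up to the first empty line is a header field
 * or a folded continuation (starts with space or tab). */
static int looks_like_header_block(const char *block) {
    assert(block != NULL);
    int seen_field = 0;
    const char *p = block;
    while (*p) {
        size_t len = strcspn(p, "\r\n");
        if (len == 0) break;               /* empty line ends the header block */
        if (p[0] == ' ' || p[0] == '\t') {
            if (!seen_field) return 0;     /* continuation with nothing to continue */
        } else if (is_header_line(p, len)) {
            seen_field = 1;
        } else {
            return 0;
        }
        p += len;
        if (*p == '\r') p++;
        if (*p == '\n') p++;
    }
    return seen_field;
}
```

Note that this sketch would reject the "Microsoft Mail Internet Headers"
banner line that the current list accepts, so a real replacement would need
to decide how to treat that case.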

The second is that when the headers are considered invalid, the To
field doesn't get an email address, only a name. This is because in the
PST file the 0x0E04 aka PR_DISPLAY_TO aka "Address Sent-To" field does
not contain the email address, only a name. The email address does
appear in some of the other PST fields (contact and search key) though,
but I am not sure if all PST files have this problem, and I am not sure
if other PST files have the email address in other fields and I am not
sure if it is right to copy those fields to the To field.

I welcome any help you can give on these two topics.

#984581#64
Date:
2021-03-16 06:19:52 UTC
From:
To:
Hi Paul,

Thanks for your time on this issue.

We will try to provide our inhouse PST as soon as possible.

Thank you

#984581#69
Date:
2021-03-18 17:14:20 UTC
From:
To:
Hi Paul,

Please find attached a PST file which contains a single email in the `Sent Items` folder for which readpst failed to extract the To address.
Even after modifying the header_match function to always return 1, readpst cannot extract the To addresses from it. It looks like a different issue.
I also tried opening the PST in Outlook, and Outlook is able to show the To addresses for this email.

I checked the headers and did not find the To address in the MIME content. I also tried some Windows tools; some of them are able to show the To address and some are not.

Thank you
Sai Kalyan

#984581#74
Date:
2021-03-19 00:30:13 UTC
From:
To:
As far as I can tell from the `readpst -d debug.log` output, this new
PST file does not have any MIME headers in it, so it is expected that
fixing the valid_headers function will do nothing. I expect if you look
at the PST file in Outlook you will see there are no MIME headers.

I noticed something in common between the original PST file and the new
PST file you have sent, they both have an unknown MAPI type 0x39fe that
contains the email addresses of the recipients. So I will try to find
out in the PST file specifications what this MAPI type is for and then
add some code to libpst and readpst to decode it.

$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/forpst.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "for pst"
	"Outlook Data File" - 2 items done, 0 items skipped.
	"for pst" - 1 items done, 0 items skipped.

2356166                 pst_process libpst.c(2194) #10 - mapi-id: 0x39fe type: 0x1f length: 0x13
2356166                 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x13]
2356166                 pst_process libpst.c(3174)
2356166                     000000	:64 65 65 70 74 69 73 6b 40 67 6d 61 69 6c 2e 63 :deeptisk@gmail.c
2356166                     000010	:6f 6d 00                                        :om.

$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/u3si.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "Sent Items"
	"Outlook Data File" - 2 items done, 0 items skipped.
	"Sent Items" - 1 items done, 0 items skipped.

2356205                 pst_process libpst.c(2194) #13 - mapi-id: 0x39fe type: 0x1f length: 0x16
2356205                 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x16]
2356205                 pst_process libpst.c(3174)
2356205                     000000	:4d 79 55 73 65 72 31 40 65 78 63 68 31 33 66 61 :MyUser1@exch13fa
2356205                     000010	:73 2e 6c 6f 63 00                               :s.loc.

#984581#84
Date:
2021-03-19 02:56:10 UTC
From:
To:
The issue in libpst when there are no MIME headers in the PST file is:

There are some MAPI properties for To/CC/BCC:

https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property

These contain *only* the names and not the addresses.

Outlook fills them automatically from the list of recipients.

Outlook stores the recipients in a separate table to email properties.

libpst stores these name-only properties in the sentto/cc/bcc fields of the email structure.

libpst has no storage of the recipients table of the PST file.

libpst processes the MAPI types one-by-one rather than in separate
tables and only has one action per MAPI type.

So this is not going to be easy to fix.

I will discuss this with upstream.

#984581#89
Date:
2021-03-19 03:38:05 UTC
From:
To:
I figured out the problem here, the PR_DISPLAY_BCC, PR_DISPLAY_CC and
PR_DISPLAY_TO fields are basically bogus and contain *only* the names
and not the addresses and are filled out automatically by Outlook based
on the recipients of the message, which are stored in a separate MAPI
table to the email properties. libpst extracts the email properties,
but doesn't store the recipients table anywhere as far as I can tell.

https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property

I propose the following set of fixes for this issue:

Add a pst_item_recipient struct with the set of properties described by
Microsoft at the URL below and a pointer to the next recipient struct.

https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/recipient-tables

Add a recipients linked list element to the pst_item struct with a
pointer to the first pst_item_recipient struct.

Add code to the pst_process function to populate the recipients linked
list in a similar way to how the attachments are done.
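
As a rough sketch of the data structure proposed above (the field names and
the recipient_append helper are hypothetical, and only a few of the
recipient-table properties are shown; a real implementation would follow
libpst's own conventions):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Values of the MAPI recipient type property (PR_RECIPIENT_TYPE). */
enum { RECIP_TO = 1, RECIP_CC = 2, RECIP_BCC = 3 };

/* Hypothetical per-recipient record; one node per row of the recipient table. */
typedef struct pst_item_recipient {
    char *display_name;               /* name only, as in PR_DISPLAY_TO */
    char *email_address;              /* assumption: e.g. the 0x39fe string from the debug log */
    int   type;                       /* RECIP_TO, RECIP_CC or RECIP_BCC */
    struct pst_item_recipient *next;  /* next recipient, or NULL */
} pst_item_recipient;

/* Duplicate a string with malloc; returns NULL on allocation failure. */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out) memcpy(out, s, n);
    return out;
}

/* Append a recipient to the end of the list; returns the new node or NULL. */
static pst_item_recipient *recipient_append(pst_item_recipient **head,
                                            const char *name,
                                            const char *email, int type) {
    assert(head != NULL && name != NULL && email != NULL);
    pst_item_recipient *node = calloc(1, sizeof *node);
    if (!node) return NULL;
    node->display_name = copy_string(name);
    node->email_address = copy_string(email);
    node->type = type;
    if (!node->display_name || !node->email_address) {
        free(node->display_name);
        free(node->email_address);
        free(node);
        return NULL;
    }
    while (*head) head = &(*head)->next;  /* walk to the tail */
    *head = node;
    return node;
}

/* Free every node in the list together with its strings. */
static void recipient_free_all(pst_item_recipient *head) {
    while (head) {
        pst_item_recipient *next = head->next;
        free(head->display_name);
        free(head->email_address);
        free(head);
        head = next;
    }
}
```

With a list like this, readpst could build the To/CC/BCC lines from name and
address pairs instead of relying on the name-only PR_DISPLAY_* fields.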

Does that seem correct to you?

Are you willing to work on this or should I?

#984581#94
Date:
2021-03-22 05:41:59 UTC
From:
To:
Hi Paul,

In this case, can we still go with the temporary change that you suggested, given that the issue is a little different with this PST?

Thank you
Sai Kalyan

#984581#99
Date:
2021-03-22 07:25:36 UTC
From:
To:
The temporary change will not work for the second PST, since it only
works around the header detection issue, but the second PST doesn't
have the full MIME headers, only the predefined PST To/CC/BCC fields.

There isn't any easy workaround for the issue with the second PST.

#984581#104
Date:
2021-04-05 06:04:49 UTC
From:
To:
Hi Paul,

Hope you are doing well.
Is there any update on the issues?

Thank you
Sai Kalyan

#984581#109
Date:
2021-04-05 06:38:09 UTC
From:
To:
I discussed the issues with upstream.

Upstream doesn't have time to work on the issues.

Upstream confirmed my suggested solutions sound OK.

I haven't yet had time to work on the solutions.

#984581#114
Date:
2021-05-30 02:26:21 UTC
From:
To:
I finally found time to work on the first issue (header detection)
where we had a workaround already and created proper patches (attached)
for the issue and sent them to the upstream maintainer.

#984581#119
Date:
2021-05-31 06:38:16 UTC
From:
To:
Hi Paul,

Thanks for your time on this issue.
We will verify the patch that you shared and will let you know the results.

Thank you
Sai Kalyan



#984581#124
Date:
2021-08-17 03:48:22 UTC
From:
To:
Control: forwarded -1 https://bugzilla.redhat.com/show_bug.cgi?id=1994178

I have forwarded the patches to the Fedora bug tracker, hopefully that
will mean that the upstream maintainer will accept them now.

I had to fix a bug with the first patch causing a segfault.

I will include the patches in the next upload to Debian unstable.