Hi,
We have been using the tool to extract emails from PST files. However, we have
recently observed that for some emails whose transport headers contain ARC
headers, the email addresses are not extracted from the PST, and only usernames
are present in the MIME content of the extracted emails.
After enabling debug logs, we found that all of the internet headers are
being ignored as bogus headers, including the To:, From: ... headers
in which the email addresses are visible.
As the tool is open source, we tried to debug it, and we identified
that the headers are ignored (as bogus headers) and the tool instead uses the
extracted metadata to construct the MIME content for the email, in which the
email addresses are missing.
We would like to point out two places where the issue could possibly occur:
1) Parsing the mail from the PST, as the structure variable does not contain the
addresses for these emails.
2) Ignoring the headers as bogus headers because of an incorrect comparison.
We have not been able to look into the parsing part, but we made some changes
to verify the behavior of the bogus header identification; these changes are
probably not appropriate.
Sample Data:
Below is a sample of the MIME content extracted by the readpst utility for an
email from a PST:
From: user_1
To: user_2
CC: user_3
where user_1, user_2 and user_3 are just usernames without email addresses.
We would like to hear back as soon as possible.
Thank you
Sai Kalyan
Control: tags -1 + moreinfo
Could you test version 0.6.75-1 from Debian bullseye?
Could you attach your patch to the bug report?
Could you provide some information about what ARC headers are?
Please supply an example PST file that this problem occurs with.
Hi Paul,
We already tried version 0.6.75-1. We also compiled the latest available code and tried it, with the same results.
Please find the changes in the attached file (readpst.c line no. 1238).
ARC headers are a kind of email authentication header. Authenticated Received Chain (ARC) provides a mechanism for individual Internet Mail Handlers to add their authentication assessment to a message's ordered set of handling results. For more details, please refer to RFC 8617: https://tools.ietf.org/html/rfc8617.
For security reasons we cannot share the original PST. We will discuss this internally and, if possible, try to share an in-house sample PST. We will let you know about the PST in the next couple of days.
Meanwhile, our observation is that unless the headers start with one of the following (Date, From, To, Content-Type, MIME-Version, Microsoft Mail Internet Headers, Received, Subject and some other headers), they are treated as bogus; this email starts with a header that is not one of those listed.
Thank you
Sai Kalyan
From: Paul Wise <pabs@debian.org>
Sent: 06 March 2021 08:12 AM
To: Surla, Sai Kalyan <SaiKalyan.Surla@arcserve.com>; 984581@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Control: found -1 0.6.75-1
Thanks, marking the bug as found in that version.
Thanks for testing this too.
It is traditional to provide changes in the patch format by using the
`diff -u` command or the corresponding commands from the version
control system that the upstream project is using.
Below is the output from the Mercurial diff for your change.
$ hg diff
diff -r 7200790e46ac src/readpst.c
--- a/src/readpst.c Tue Jun 16 17:18:28 2020 -0700
+++ b/src/readpst.c Mon Mar 08 09:20:50 2021 +0800
@@ -1235,7 +1235,7 @@
int header_match(char *header, char*field) {
int n = strlen(field);
- if (strncasecmp(header, field, n) == 0) return 1; // tag:{space}
+ if (strstr(header,field) != NULL || strncasecmp(header, field, n) == 0) return 1; // tag:{space}
if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
char *crlftab = "\r\n\t";
DEBUG_INFO(("Possible wrapped header = %s\n", header));
I am fairly certain that this is not the correct fix for this issue.
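To illustrate why the strstr() change is probably too broad, here is a small standalone C sketch. prefix_match() and anywhere_match() are hypothetical names that mirror the old and new conditions in the diff; they are not functions in readpst.c. With the patched condition, any header block that merely contains "To: " somewhere, such as an In-Reply-To: line or a fragment of message body text, would be accepted, which defeats the purpose of the bogus-header check.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <strings.h>

/* Mirrors the original test: the header block must START with the field
 * name, compared case-insensitively over the first strlen(field) bytes. */
int prefix_match(const char *header, const char *field)
{
    return strncasecmp(header, field, strlen(field)) == 0;
}

/* Mirrors the patched test: additionally accept the field name ANYWHERE
 * in the block (note that strstr() is case-sensitive). */
int anywhere_match(const char *header, const char *field)
{
    return strstr(header, field) != NULL || prefix_match(header, field);
}
```

For example, "In-Reply-To: <abc@example.com>" is rejected by prefix_match(..., "To: ") but accepted by anywhere_match(..., "To: ").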
Thanks for the info.
Understood.
That would be necessary to be able to fix the issue.
That does indeed look like what the code does; probably the right fix
is to scan through all of the headers instead of just the first one.
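A minimal C sketch of the "scan through all of the headers" idea, assuming the header block is a NUL-terminated string whose lines end in "\n" (optionally preceded by "\r"). any_line_is_known_header() is a hypothetical name, and the field list is only a subset of the one in readpst's valid_headers(). One trade-off: a body fragment containing a line that happens to start with "To: " would also be accepted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* A subset of the field names from readpst's valid_headers() list. */
static const char *const known_fields[] = {
    "Content-Type: ", "Date: ", "From: ", "MIME-Version: ",
    "Received: ", "Return-Path: ", "Subject: ", "To: ", NULL
};

/* Hypothetical variant of the check: instead of testing only the start of
 * the whole block, test the start of EVERY line against the list.
 * Folded continuation lines start with whitespace, so they never match. */
int any_line_is_known_header(const char *header)
{
    const char *line = header;
    while (line && *line) {
        for (size_t i = 0; known_fields[i] != NULL; i++) {
            if (strncasecmp(line, known_fields[i], strlen(known_fields[i])) == 0)
                return 1;
        }
        const char *nl = strchr(line, '\n');   /* find the end of this line */
        line = nl ? nl + 1 : NULL;             /* NULL: no more lines */
    }
    return 0;
}
```

With this approach, a block whose first lines are X-GM-THRID: and ARC-Seal: is still accepted as long as a later line starts with one of the known fields, such as From:.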
Hi Paul,
Please find attached a PST containing a single email for which we also faced the problem of extracting the email addresses in the ‘To:’ header.
Thank you
Sai Kalyan
From: Paul Wise <pabs@debian.org>
Sent: 08 March 2021 07:02 AM
To: Surla, Sai Kalyan <SaiKalyan.Surla@arcserve.com>; 984581@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Sorry, it looks like Outlook blocked this PST.
From: Surla, Sai Kalyan
Sent: 08 March 2021 01:45 PM
To: Paul Wise <pabs@debian.org>; 984581@bugs.debian.org
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Outlook blocked the PST, so please find the zipped PST file attached.
Thank you
Sai Kalyan
From: Surla, Sai Kalyan
Sent: 08 March 2021 07:27 PM
To: 'Paul Wise' <pabs@debian.org>; '984581@bugs.debian.org' <984581@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,
How are you? Hope you got a chance to look at the issue that we reported. I am reiterating the summary of the problem.
There are some transport headers starting with “ARC-Seal: ”. These transport headers also contain the To, CC and BCC addresses, with both display names and the corresponding email IDs. However, `readpst` discards these transport headers when creating the EML file with MIME content, and in the final MIME content we get only the display names for all the To, CC and BCC addresses. Possibly readpst relies on the canonical properties to extract the To, CC and BCC addresses from the PST file.
After looking at the readpst.c file (see below), we understood that readpst discards any transport header block that does not start with one of the specified strings.
int valid_headers(char *header)
{
    // headers are sometimes really bogus - they seem to be fragments of the
    // message body, so we only use them if they seem to be real rfc822 headers.
    // this list is composed of ones that we have seen in real pst files.
    // there are surely others. the problem is - given an arbitrary character
    // string, is it a valid (or even reasonable) set of rfc822 headers?
    if (header) {
        if (header_match(header, "Content-Type: "                 )) return 1;
        if (header_match(header, "Date: "                         )) return 1;
        if (header_match(header, "From: "                         )) return 1;
        if (header_match(header, "MIME-Version: "                 )) return 1;
        if (header_match(header, "Microsoft Mail Internet Headers")) return 1;
        if (header_match(header, "Received: "                     )) return 1;
        if (header_match(header, "Return-Path: "                  )) return 1;
        if (header_match(header, "Subject: "                      )) return 1;
        if (header_match(header, "To: "                           )) return 1;
        if (header_match(header, "X-ASG-Debug-ID: "               )) return 1;
        if (header_match(header, "X-Barracuda-URL: "              )) return 1;
        if (header_match(header, "X-x: "                          )) return 1;
        if (strlen(header) > 2) {
            DEBUG_INFO(("Ignore bogus headers = %s\n", header));
        }
        return 0;
    }
    else return 0;
}
As per our understanding, ARC headers (which help preserve email authentication results and verify the identity of email intermediaries that forward a message on to its final destination) were introduced in 2016, and it looks like they are not handled in readpst.
We would appreciate it if you could clarify:
1. Is our understanding correct?
2. If yes, can we expect a patch from you?
3. If our understanding is not correct, can we expect a patch with proper fixes, or can you let us know where to fix the problem?
4. Are there any other headers, like ARC, that are not handled?
Looking forward to your reply so that we can commit to a date with our customers.
Thank you
Sai Kalyan
From: Surla, Sai Kalyan
Sent: 08 March 2021 07:28 PM
To: 'Paul Wise' <pabs@debian.org>; '984581@bugs.debian.org' <984581@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
I am looking at the issue today. I managed to reproduce the issue that you reported using the sample PST file that you provided. I can confirm that I am seeing both of the issues you reported:
* only a limited set of headers is being extracted
* the email address is missing from the To header, but the From header is correct
The readpst -d option, which outputs debug information, was instrumental in reproducing this; it causes all the info in the PST file and the entire sequence of decoding steps to be written to a debug file.
I modified the valid_headers function to also accept the ARC-Seal header, but that does not fix the problem. Looking at the debug output, I noticed that the X-GM-THRID header is the first header. I then added X-GM-THRID to the valid_headers function and that fixed the problem. However, I think that messages with a different first header will still not work; you would have to add every header that could appear first to the valid_headers function, which seems like the wrong thing to do.
If you have any sample PST files that *do* work with the current code, that would allow me to compare the working PST with the broken PST, which would be very helpful in tracking down where the problem is.
Until I can figure out the correct fix, I suggest you work around this bug by adding "return 1;" (without quotes) as the first line in the valid_headers function. This way you can keep readpst working for your customers while the correct fix is found. I believe that the modern PST files you have available are all valid files, while the valid_headers function aims to detect broken files, so there should be no risk to the conversion process in your case.
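For reference, a sketch of that workaround as a standalone C function. It departs slightly from a bare "return 1;" as the very first line: it keeps returning 0 for a NULL pointer, as the original function does, and drops the header_match() checks because they would no longer affect the result. This is a stopgap, not a fix.

```c
#include <stddef.h>

/* Workaround sketch: accept every non-NULL header block unconditionally,
 * so headers such as X-GM-THRID or ARC-Seal at the top are no longer
 * discarded as bogus. NULL still yields 0, as in the original function. */
int valid_headers(char *header)
{
    if (header) return 1;
    return 0;
}
```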
I did some further investigation of the PST file you sent. I conclude that there are two problems you are experiencing:
The first one is that readpst doesn't consider the headers valid even though they clearly are. Since the header validity detection was added to detect invalid PST files, I am going to have to discuss this with the upstream author. Perhaps the header validity detection will have to become more generic, or perhaps it will be discarded, or perhaps invalid PST files will be detected in a different way. Fixing this will bring back all the headers, including ARC and To.
The second one is that for your particular PST file, the To field does not contain an email address. Looking at the debug output, I see that the "Display Sent-To Address" contains only the name, not the email address. This appears to be a problem with the PST file itself, as the 0x0E04 type, which is PR_DISPLAY_TO, aka the "Address Sent-To", does not contain the email address. The email address does appear in the "Contact Address" and "Search Key" though. I am not sure whether it is correct to merge the contact address into the To address.
If you have any more samples of working or broken PST files, I would be happy to have a copy of them to debug further.
Hi Carl,
A Debian user reported a bug in libpst's readpst tool:
https://bugs.debian.org/984581
The bug report contains two issues experienced with a sample PST file:
The first is that normal MIME headers were not extracted because they were not considered valid: the first header was X-GM-THRID rather than one of the limited list of headers that readpst considers valid. Clearly the header validity function is not going to keep up with changes in the set of headers that commonly appear in PST files. My suggestion is to replace the current header validity function with one that just checks whether either the first header or the entire header block complies with the email header RFC. Alternatively, the header validity function could be removed, or made optional but disabled/enabled by default.
The second is that when the headers are considered invalid, the To field doesn't get an email address, only a name. This is because in the PST file the 0x0E04 aka PR_DISPLAY_TO aka "Address Sent-To" field does not contain the email address, only a name. The email address does appear in some of the other PST fields (contact and search key), but I am not sure whether all PST files have this problem, whether other PST files have the email address in other fields, or whether it is right to copy those fields to the To field.
I welcome any help you can give on these two topics.
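One possible shape for the more generic check suggested above, as a hedged C sketch: it accepts a header block if its first line, and every following line up to the first blank line, is either a folded continuation (starting with a space or tab) or a "field-name:" line, where the field name uses RFC 5322 ftext characters (printable US-ASCII other than ':'). The function names are hypothetical, obsolete RFC 5322 syntax such as whitespace before the colon is not handled, and a block starting with the "Microsoft Mail Internet Headers" line that readpst currently accepts would fail this check and would need special-casing.

```c
#include <stddef.h>

/* ftext per RFC 5322 section 3.6.8: printable US-ASCII except ':'. */
static int is_ftext(unsigned char c)
{
    return c >= 33 && c <= 126 && c != ':';
}

/* 1 if the text at p starts with "field-name:", i.e. one or more ftext
 * characters immediately followed by a colon. */
static int starts_with_field_name(const char *p)
{
    const char *start = p;
    while (is_ftext((unsigned char)*p))
        p++;
    return p > start && *p == ':';
}

/* Hypothetical generic replacement for the fixed list in valid_headers(). */
int header_block_is_rfc5322_like(const char *header)
{
    if (!header || !starts_with_field_name(header))
        return 0;                            /* first line must be a field */
    for (const char *line = header; *line; ) {
        if (*line == '\r' || *line == '\n')
            break;                           /* blank line ends the headers */
        if (*line != ' ' && *line != '\t' && !starts_with_field_name(line))
            return 0;                        /* neither field nor folded line */
        while (*line && *line != '\n')
            line++;                          /* skip to the end of this line */
        if (*line == '\n')
            line++;
    }
    return 1;
}
```

Unlike a fixed list, this accepts X-GM-THRID:, ARC-Seal: or any other well-formed field as the first header, while still rejecting ordinary prose such as "Hello Paul, please see below.".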
Hi Paul,
Thanks for your time on this issue. We will try to provide our in-house PST as soon as possible.
Thank you
From: Paul Wise <pabs@debian.org>
Sent: 15 March 2021 09:47 AM
To: Surla, Sai Kalyan <SaiKalyan.Surla@arcserve.com>; 984581@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
Hi Paul,
Please find attached a PST file containing a single email in the `Sent Items` folder for which readpst failed to extract the To address. Even after modifying the header_match function to always return 1, readpst cannot extract the To addresses from it, so it looks like a different issue. I also tried opening the PST in Outlook, and Outlook is able to show the To addresses for this email. I checked the headers and did not find the To address in the MIME content. I also tried some Windows tools; some of them are able to show the To address and some are not.
Thank you
Sai Kalyan
From: Surla, Sai Kalyan
Sent: 16 March 2021 11:50 AM
To: Paul Wise <pabs@debian.org>; 984581@bugs.debian.org
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file
As far as I can tell from the `readpst -d debug.log` output, this new PST file does not have any MIME headers in it, so it is expected that fixing the valid_headers function will do nothing. I expect that if you look at the PST file in Outlook, you will see there are no MIME headers.
I noticed something in common between the original PST file and the new PST file you sent: they both have an unknown MAPI type 0x39fe that contains the email addresses of the recipients. So I will try to find out from the PST file specifications what this MAPI type is for and then add some code to libpst and readpst to decode it.
$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/forpst.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "for pst"
"Outlook Data File" - 2 items done, 0 items skipped.
"for pst" - 1 items done, 0 items skipped.
2356166 pst_process libpst.c(2194) #10 - mapi-id: 0x39fe type: 0x1f length: 0x13
2356166 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x13]
2356166 pst_process libpst.c(3174)
2356166 000000 :64 65 65 70 74 69 73 6b 40 67 6d 61 69 6c 2e 63 :deeptisk@gmail.c
2356166 000010 :6f 6d 00 :om.
$ rm -f * ; /usr/bin/readpst -d debug.log ~/stash/samples/pst/bugs.debian.org/984581/u3si.pst ; echo ; grep -A5 'mapi-id: 0x39fe' debug.log
Opening PST file and indexes...
Processing Folder "Deleted Items"
Processing Folder "Sent Items"
"Outlook Data File" - 2 items done, 0 items skipped.
"Sent Items" - 1 items done, 0 items skipped.
2356205 pst_process libpst.c(2194) #13 - mapi-id: 0x39fe type: 0x1f length: 0x16
2356205 pst_process libpst.c(3172) Unknown type 0x39fe Unicode String Data [size = 0x16]
2356205 pst_process libpst.c(3174)
2356205 000000 :4d 79 55 73 65 72 31 40 65 78 63 68 31 33 66 61 :MyUser1@exch13fa
2356205 000010 :73 2e 6c 6f 63 00 :s.loc.
The specs indicate that 0x39fe is indeed the recipient address:
https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/141923d5-15ab-4ef1-a524-6dce75aae546
https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/5ee9a00a-858b-47db-95b3-f91518640ea7
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagsmtpaddress-canonical-property
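As background for the debug output above: a MAPI property tag is a 32-bit value whose high 16 bits are the property ID and whose low 16 bits are the property type, so mapi-id 0x39fe with type 0x1f corresponds to the tag 0x39FE001F (PidTagSmtpAddress, a Unicode string). The small C sketch below splits a tag and names a few IDs relevant to this bug; the table and function names are illustrative only and are not part of libpst.

```c
#include <stddef.h>
#include <stdint.h>

/* Property IDs (upper 16 bits of a MAPI property tag) relevant here. */
struct mapi_prop_name { uint16_t id; const char *name; };

static const struct mapi_prop_name props[] = {
    { 0x0E02, "PidTagDisplayBcc" },    /* names only, no addresses */
    { 0x0E03, "PidTagDisplayCc" },     /* names only, no addresses */
    { 0x0E04, "PidTagDisplayTo" },     /* names only, no addresses */
    { 0x0C15, "PidTagRecipientType" }, /* recipient table: To/Cc/Bcc */
    { 0x3001, "PidTagDisplayName" },
    { 0x3003, "PidTagEmailAddress" },
    { 0x39FE, "PidTagSmtpAddress" },   /* the "Unknown type 0x39fe" above */
};

/* High 16 bits: property ID. Low 16 bits: property type
 * (0x001F is PtypString, a Unicode string). */
uint16_t prop_id(uint32_t tag)   { return (uint16_t)(tag >> 16); }
uint16_t prop_type(uint32_t tag) { return (uint16_t)(tag & 0xFFFFu); }

/* Look up a human-readable name for a property ID, or NULL if unknown. */
const char *prop_name(uint16_t id)
{
    for (size_t i = 0; i < sizeof props / sizeof props[0]; i++)
        if (props[i].id == id)
            return props[i].name;
    return NULL;
}
```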
The issue in libpst when there are no MIME headers in the PST file is as follows. There are some MAPI properties for To/CC/BCC:
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property
These contain *only* the names and not the addresses. Outlook fills them in automatically from the list of recipients, which it stores in a separate table from the email properties. libpst stores these name-only properties in the sentto/cc/bcc fields of the email structure, but it has no storage for the recipients table of the PST file. libpst processes the MAPI types one by one rather than as separate tables, and has only one action per MAPI type. So this is not going to be easy to fix. I will discuss this with upstream.
I figured out the problem here: the PR_DISPLAY_BCC, PR_DISPLAY_CC and PR_DISPLAY_TO fields are basically bogus. They contain *only* the names, not the addresses, and are filled in automatically by Outlook based on the recipients of the message, which are stored in a separate MAPI table from the email properties. libpst extracts the email properties, but as far as I can tell it doesn't store the recipients table anywhere.
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplayto-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaycc-canonical-property
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagdisplaybcc-canonical-property
I propose the following set of fixes for this issue:
Add a pst_item_recipient struct with the set of properties described by Microsoft at the URL below and a pointer to the next recipient struct.
https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/recipient-tables
Add a recipients linked list element to the pst_item struct with a pointer to the first pst_item_recipient struct.
Add code to the pst_process function to populate the recipients linked list in a similar way to how the attachments are done.
Does that seem correct to you? Are you willing to work on this or should I?
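A hedged C sketch to make the proposed data structure concrete. Only the pst_item_recipient name comes from the proposal; the field selection (recipient type, display name, SMTP address), the RECIP_* constants (matching the MAPI_TO/MAPI_CC/MAPI_BCC values 1, 2 and 3 of PidTagRecipientType) and the helper functions are assumptions for illustration. It does not show how pst_process would read the recipient table, only how a linked list of recipients could be turned into a To: or Cc: header value.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Values of PidTagRecipientType (MAPI_TO, MAPI_CC, MAPI_BCC). */
enum { RECIP_TO = 1, RECIP_CC = 2, RECIP_BCC = 3 };

/* Hypothetical layout: one row of a message's recipient table. */
typedef struct pst_item_recipient {
    int   type;                        /* PidTagRecipientType */
    char *display_name;                /* PidTagDisplayName */
    char *smtp_address;                /* PidTagSmtpAddress (0x39FE) */
    struct pst_item_recipient *next;   /* next recipient, or NULL */
} pst_item_recipient;

/* Append a copy of one recipient to the end of the list, keeping the
 * order in which rows were read. Returns the new node, or NULL on OOM. */
pst_item_recipient *recipient_append(pst_item_recipient **head, int type,
                                     const char *name, const char *smtp)
{
    pst_item_recipient *r = calloc(1, sizeof *r);
    if (!r) return NULL;
    r->type = type;
    r->display_name = name ? strdup(name) : NULL;
    r->smtp_address = smtp ? strdup(smtp) : NULL;
    while (*head) head = &(*head)->next;   /* walk to the tail */
    *head = r;
    return r;
}

/* Write "Name <addr>, Name <addr>" for every recipient of the given type.
 * An entry that does not fit is dropped rather than written partially.
 * Display names are not quoted as RFC 5322 would sometimes require. */
size_t format_recipients(const pst_item_recipient *r, int type,
                         char *buf, size_t size)
{
    size_t len = 0;
    if (size == 0) return 0;
    buf[0] = '\0';
    for (; r; r = r->next) {
        if (r->type != type) continue;
        int n = snprintf(buf + len, size - len, "%s%s <%s>",
                         len ? ", " : "",
                         r->display_name ? r->display_name : "",
                         r->smtp_address ? r->smtp_address : "");
        if (n < 0 || (size_t)n >= size - len) {
            buf[len] = '\0';               /* discard the truncated entry */
            break;
        }
        len += (size_t)n;
    }
    return len;
}

/* Free every node and the strings it owns. */
void recipients_free(pst_item_recipient *r)
{
    while (r) {
        pst_item_recipient *next = r->next;
        free(r->display_name);
        free(r->smtp_address);
        free(r);
        r = next;
    }
}
```

Appending (rather than prepending) keeps the recipients in table order, so the generated To: line lists them in the same order Outlook shows.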
Hi Paul,
In this case, can we still go with the temporary change that you suggested, given that the issue is a little different with this PST?
Thank you
Sai Kalyan
The temporary change will not work for the second PST, since it only works around the header detection issue, but the second PST doesn't have the full MIME headers, only the predefined PST To/CC/BCC fields. There isn't any easy workaround for the issue with the second PST.
Hi Paul,
Hope you are doing well. Is there any update on the issues?
Thank you
Sai Kalyan
I discussed the issues with upstream. Upstream doesn't have time to work on the issues. Upstream confirmed my suggested solutions sound OK. I haven't yet had time to work on the solutions.
I finally found time to work on the first issue (header detection) where we had a workaround already and created proper patches (attached) for the issue and sent them to the upstream maintainer.
Hi Paul,
Thanks for your time on this issue. We will verify the patch that you shared and let you know the results.
Thank you
Sai Kalyan
Control: forwarded -1 https://bugzilla.redhat.com/show_bug.cgi?id=1994178
I have forwarded the patches to the Fedora bug tracker; hopefully that will mean the upstream maintainer will accept them now. I had to fix a bug in the first patch that caused a segfault. I will include the patches in the next upload to Debian unstable.