Normal attachment is detected as Inline attachment (C# .NET)

Hi,

We downloaded an EML file from our mail provider. When we try to extract the attachments, we get a strange filename: the “à” character has been converted into “�”.

Below is the snippet of the EML that contains the filename:

------=_NextPart_000_0067_01D4E86C.C73B95F0
Content-Type: application/octet-stream;
name="=?utf-8?Q?Comunicazione_Cambio_Indirizzo_P?=
=?utf-8?Q?EC_Cilento_Reti_Gas=5FSociet=EF=BF=BD_di_V?=
=?utf-8?Q?endita.pdf?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="=?utf-8?Q?Comunicazione_Cambio_Indirizzo_P?=
=?utf-8?Q?EC_Cilento_Reti_Gas=5FSociet=EF=BF=BD_di_V?=
=?utf-8?Q?endita.pdf?="

I expect a filename like “Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Società di Vendita.pdf” instead of “Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ� di Vendita.pdf”.
It seems that “Societ=EF=BF=BD” is not correctly converted into a string. Could it be an encoding conversion issue?
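To check what the header itself contains, the encoded-words can be decoded by hand without any mail library. The sketch below is a minimal RFC 2047 "Q" decoder for UTF-8 words only; the class and method names are hypothetical and it is not how Aspose.Email decodes headers. It shows that the bytes EF BF BD are the UTF-8 encoding of U+FFFD (the Unicode replacement character), whereas “à” would be encoded as C3 A0, which suggests the original character was already lost before the message was encoded.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class EncodedWordDemo
{
    // Hypothetical helper: decodes only UTF-8 "Q" encoded-words (RFC 2047).
    // Whitespace between adjacent encoded-words is ignored, as the RFC requires.
    public static string DecodeUtf8QWords(string header)
    {
        var bytes = new List<byte>();
        foreach (Match m in Regex.Matches(header, @"=\?utf-8\?Q\?([^?]*)\?=", RegexOptions.IgnoreCase))
        {
            string text = m.Groups[1].Value;
            for (int i = 0; i < text.Length; i++)
            {
                if (text[i] == '=' && i + 2 < text.Length)
                {
                    // "=XX" is one byte written as two hex digits.
                    bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                    i += 2;
                }
                else
                {
                    // "_" stands for a space; everything else is literal ASCII.
                    bytes.Add(text[i] == '_' ? (byte)' ' : (byte)text[i]);
                }
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }

    static void Main()
    {
        string header =
            "=?utf-8?Q?Comunicazione_Cambio_Indirizzo_P?=\r\n" +
            " =?utf-8?Q?EC_Cilento_Reti_Gas=5FSociet=EF=BF=BD_di_V?=\r\n" +
            " =?utf-8?Q?endita.pdf?=";

        string name = DecodeUtf8QWords(header);
        int idx = name.IndexOf("Societ", StringComparison.Ordinal) + "Societ".Length;

        // Code point that follows "Societ" in the decoded name.
        Console.WriteLine("U+{0:X4}", (int)name[idx]);                               // U+FFFD

        // For comparison: the UTF-8 bytes a real "à" (U+00E0) would produce.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u00E0")));  // C3-A0
    }
}
```

If that reading is right, no decoder can recover “à” from this EML, because the header no longer carries that information.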

We are using Aspose.Email v. 19.3.0.

Thanks

@gdiedoardo,

Can you please share the source file along with sample code so that we can investigate further and help you out?

Hi,

This is the EML file.
234069.zip (179.5 KB)

In my DB I have this:
image.png (2.9 KB)

And this is the test source code:

private void TestEMLBis()
{
  byte[] EML = File.ReadAllBytes(@"c:\temp\234069.eml");

  EmlLoadOptions lo = new EmlLoadOptions();
  lo.PreserveTnefAttachments = true;
  lo.PrefferedTextEncoding = Encoding.UTF8;

  MailMessage mm;

  using (MemoryStream ms = new MemoryStream(EML))
  {
    mm = MailMessage.Load(ms, lo);

    Attachment att = mm.Attachments.FirstOrDefault(a => a.Name.ToLower() == "postacert.eml".ToLower());

    MailMessage messaggioCertificato = MailMessage.Load(att.ContentStream, lo);

    //Here att2 is null because "Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ? di Vendita.pdf" is different from “Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ� di Vendita.pdf”
    Attachment att2 = messaggioCertificato.Attachments.FirstOrDefault(a => a.Name.ToLower() == "Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ? di Vendita.pdf".ToLower());
  }
}

Remember that the character that should have been saved is this: “à”
Thank you

@gdiedoardo,

I used the following statement to extract the attachment and it worked on my end. Can you please try the same on your end?

        Attachment att2 = messaggioCertificato.Attachments.FirstOrDefault(a => a.Name.ToLower() == "Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ� di Vendita.pdf".ToLower());

Hi,

That is not the problem. The test source code above is only meant to illustrate the problem. In our system the mail is saved to the DB in this way:

          client.SaveMessage(msg.UniqueId, $@"{messagePath}\{msg.UniqueId}.eml");
          byte[] imapRawMessage;
          MailMessage imapParsedMessage;
          using (MemoryStream ms = new MemoryStream())
          {
            client.SaveMessage(msg.UniqueId, ms);
            imapRawMessage = ms.ToArray();
            imapParsedMessage = MailMessage.Load(ms);
          }

          //Check whether the X-Ricevuta field is present in the headers
          if (string.IsNullOrEmpty(imapParsedMessage.Headers["X-Ricevuta"])) //If the field is missing, this is a regular mail
          {
            childLogger.LogPecController(taskID, $"Messaggio identificato come POSTA IN INGRESSO", LogControllerModelDirezioneEnum.Ricezione, mbx.Mailbox.ID, idMail);
            //Build the model of the mail to save
            MailModel mailToSave = childMailService.CreateModelFromEML(imapRawMessage);
            mailToSave.DataRicezione = DateTime.Now;
            mailToSave.ServerUniqueID = idMail;

            //Set the folder in which to save it
            switch (mailToSave.EnumTipoMail)
            {
              case MailModelTipoMailEnum.Certificata: //Certified mail
                childLogger.LogPecController(taskID, $"Messaggio di posta certificata", LogControllerModelDirezioneEnum.Ricezione, mbx.Mailbox.ID, idMail);
                mailToSave.FolderID = childFolderService.GetFolderByRole(FolderModelRuoloEnum.Inbox).ID;
                mailToSave.IsCertificata = true;
                break;

              case MailModelTipoMailEnum.NoCertificata: //Non-certified mail
                childLogger.LogPecController(taskID, $"Messaggio di posta non certificata", LogControllerModelDirezioneEnum.Ricezione, mbx.Mailbox.ID, idMail);
                mailToSave.FolderID = childFolderService.GetFolderByRole(FolderModelRuoloEnum.NotCertified).ID;
                mailToSave.IsCertificata = false;
                break;
            }

            childMailService.Save(mailToSave);
            newID = mailToSave.ID;

where “client” is an instance of Aspose.Email.Clients.Imap.ImapClient.

The problem is that in the DB the attachment name is stored as:

“Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ? di Vendita.pdf”

Therefore the two strings do not compare as equal.
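As a stop-gap for the lookup itself, both names can be normalized before they are compared. The sketch below is only an illustration with hypothetical helper names (`NormalizeLostChars`, `SameName`); it assumes the only difference between the parsed name and the stored name is U+FFFD versus “?”. It does not restore the lost “à”.

```csharp
using System;

static class AttachmentNameMatch
{
    // Hypothetical helper: maps U+FFFD to '?' so both spellings look the same.
    public static string NormalizeLostChars(string name)
    {
        return name.Replace('\uFFFD', '?');
    }

    // Case-insensitive comparison of the normalized names.
    public static bool SameName(string parsedName, string storedName)
    {
        return string.Equals(
            NormalizeLostChars(parsedName),
            NormalizeLostChars(storedName),
            StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        string parsed = "Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ\uFFFD di Vendita.pdf";
        string stored = "Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ? di Vendita.pdf";

        Console.WriteLine(parsed == stored);          // False
        Console.WriteLine(SameName(parsed, stored));  // True
    }
}
```

The trade-off is that a name containing a genuine “?” also matches one containing U+FFFD in the same position, so this is only reasonable when such collisions are acceptable.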

We cannot change the EML. It seems that “Societ=EF=BF=BD” is not correctly converted into a string.
The correct string should be “Società”, not “Societ?” or “Societ�”.

How can we convert the character correctly?

Thank you

@gdiedoardo,

I have investigated the issue further on our end, and a ticket with ID EMAILNET-39437 has been created in our issue tracking system to investigate and resolve it. This thread has been linked to the ticket, so you will be notified once it is fixed.

Good morning. Any news?

@gdiedoardo,

Please try the latest Aspose.Email for .NET 19.4 on your end and check whether the issue persists. If it does, please let us know.

Hi,

I tried to install version 19.4 but the problem is not solved.

@gdiedoardo,

We have investigated the issue on our end. You may need to make the following modification in your sample code:

if ((mapiMessage.BodyHtml + "").Contains("cid:" + contentId + ""))
{
    return true;
}
else if ((HttpUtility.HtmlDecode(mapiMessage.BodyHtml + "") + "").Contains("cid:" + contentId + ""))
{
    return true;
}

instead of the following:

if ((mapiMessage.BodyHtml + "").Contains(contentId + ""))
{
    return true;
}
else if ((HttpUtility.HtmlDecode(mapiMessage.BodyHtml + "") + "").Contains(contentId + ""))
{
    return true;
}

The ContentId values of the attachments are:

Attachment                     ContentId
IMG_0722.jpg                   “0”
IMG_0721.jpg                   “1”
IMG_0720.jpg                   “2”
doc05667520190313160834.pdf    (none)
Outlook-1501095325.png         “9061b351-f389-44d7-8d5f-d3d1e413900b”

This is because the ContentId strings “0”, “1” and “2” do not uniquely identify an attachment. If we search BodyHtml for a bare ContentId such as “0”, “1” or “2”, these values almost always occur somewhere in a large body. In those cases the IsInlineAttachmentCheck result is true even though the attachment is not inline. To check for a ContentId in BodyHtml, search for it with the “cid:” prefix as shared above.
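The difference between the two checks can be reproduced with plain strings. The sketch below uses a made-up HTML body and hypothetical method names; it is not the IsInlineAttachmentCheck implementation, only an illustration of why a bare “0” gives a false positive while “cid:0” does not.

```csharp
using System;

static class InlineCheckDemo
{
    // Bare search: any occurrence of the ID text anywhere in the HTML counts.
    public static bool ContainsBareId(string html, string contentId)
    {
        return html.Contains(contentId);
    }

    // Prefixed search: only a "cid:<id>" reference counts.
    public static bool ContainsCidReference(string html, string contentId)
    {
        return html.IndexOf("cid:" + contentId, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    static void Main()
    {
        // Made-up body: one real inline image, plus a width attribute containing "0".
        string html = "<img src=\"cid:9061b351-f389-44d7-8d5f-d3d1e413900b\" width=\"100\">";

        Console.WriteLine(ContainsBareId(html, "0"));        // True  (false positive)
        Console.WriteLine(ContainsCidReference(html, "0"));  // False
        Console.WriteLine(ContainsCidReference(html, "9061b351-f389-44d7-8d5f-d3d1e413900b")); // True
    }
}
```

A prefix search can still confuse “cid:1” with “cid:10”, so a stricter check would also look at the character that follows the ID.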

Sorry but I didn’t understand the reply. It’s not about the question I asked.

@gdiedoardo,

Please accept my apology for this. We have verified the issue internally, and it appears to be related to UTF-8 conversion: either the UTF-8 string is converted to ASCII before it is saved to the DB, or the DB engine converts it automatically because the field type is not Unicode.

foreach (var a in messaggioCertificato.Attachments)
{
    Console.WriteLine(a.Name);
    var utf8bytes = Encoding.UTF8.GetBytes(a.Name);
    var win1252Bytes = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding("windows-1252"), utf8bytes);
    Console.WriteLine(Encoding.GetEncoding("windows-1252").GetString(win1252Bytes));
}

This produces the following output:

Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ� di Vendita.pdf
Comunicazione Cambio Indirizzo PEC Cilento Reti Gas_Societ? di Vendita.pdf
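One way to notice this kind of loss before a name reaches a non-Unicode column is to encode it with a strict encoder that throws instead of substituting “?”. The sketch below is an assumption-laden illustration: `CanEncode` is a hypothetical helper, and ISO-8859-1 merely stands in for whatever code page the DB column actually uses.

```csharp
using System;
using System.Text;

static class LossyEncodingCheck
{
    // Hypothetical helper: true if every character of text exists in the target encoding.
    public static bool CanEncode(string text, string encodingName)
    {
        // ExceptionFallback makes the encoder throw instead of writing '?'.
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // "à" (U+00E0) exists in ISO-8859-1; U+FFFD does not.
        Console.WriteLine(CanEncode("Societ\u00E0", "iso-8859-1"));  // True
        Console.WriteLine(CanEncode("Societ\uFFFD", "iso-8859-1"));  // False

        // The default ASCII encoder silently substitutes '?', which matches the value seen in the DB.
        byte[] ascii = Encoding.ASCII.GetBytes("Societ\uFFFD");
        Console.WriteLine(Encoding.ASCII.GetString(ascii));          // Societ?
    }
}
```

Storing the name in a Unicode column would keep “�” intact, but it would still not bring back “à”, since the header in the EML already contains U+FFFD.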