Embedded image lost while converting e-mail to PDF/A

geert_vanpeteghem_docshifter_com · October 14, 2014, 8:51am

Hi

We are using Aspose.Email 4.5.0.0 and Aspose.Pdf 9.3.1 to convert e-mails to PDF/A-1b.
When we convert e-mails with embedded images, the image is lost in the PDF file and replaced by a big X.

I’ve attached an example to this post.

Could you investigate?

Kind regards,

Bart Maes
ECM Consultant
Docbyte sa

tilal.ahmad · October 15, 2014, 2:20am

Hi Bart,

Thanks for your inquiry. We tested the scenario with the latest versions of the Aspose.Email, Aspose.Words, and Aspose.Pdf JARs, using the code below, and could not reproduce the reported issue. Please share your sample code here so we can test your scenario and give you more information.

// MSG to MHTML using Aspose.Email
FileInputStream fstream = new FileInputStream(myDir + "Mail met tekening in.msg");

MailMessage eml = MailMessage.load(fstream);

// Save the message to an output stream in MHTML format
ByteArrayOutputStream emlStream = new ByteArrayOutputStream();
eml.save(emlStream, MailMessageSaveType.getMHtmlFormat());

// MHTML to PDF using Aspose.Words

// Load the stream into a Words document
com.aspose.words.LoadOptions lo = new com.aspose.words.LoadOptions();
lo.setLoadFormat(com.aspose.words.LoadFormat.MHTML);
com.aspose.words.Document doc = new com.aspose.words.Document(new ByteArrayInputStream(emlStream.toByteArray()), lo);

// Save to disk
doc.save(myDir + "Mail met tekening in.pdf", com.aspose.words.SaveFormat.PDF);

// PDF to PDF/A-1b conversion using Aspose.Pdf

com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(myDir + "Mail met tekening in.pdf");

// Convert to PDF/A compliant document
//pdfDocument.validate("Validation_log.xml",com.aspose.pdf.PdfFormat.PDF_A_1A);
pdfDocument.convert("Conversion_log.xml",com.aspose.pdf.PdfFormat.PDF_A_1B,com.aspose.pdf.ConvertErrorAction.Delete);

// Save updated document
pdfDocument.save(myDir + "pdfa1b_pdf.pdf");

Please feel free to contact us for any further assistance.

Best Regards,

geert_vanpeteghem_docshifter_com · October 16, 2014, 4:40am

Hi Tilal

Here is part of our code to convert e-mail to PDF.
The contents of the e-mail are retrieved with mail.getHtmlBody();

MailMessageLoadOptions mailOptions = new MailMessageLoadOptions();
mailOptions.setMessageFormat(MessageFormat.getMsg());
mailOptions.setFileCompatibilityMode(FileCompatibilityMode.SkipValidityChecking);
MailMessage mail = MailMessage.load(inFilePath, mailOptions);

Document doc = new Document();
DocumentBuilder docBuilder = new DocumentBuilder(doc);

docBuilder.insertHtml(mail.getHtmlBody());

ByteArrayOutputStream docOut = new ByteArrayOutputStream();
PdfSaveOptions saveOptions = new PdfSaveOptions();
saveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
doc.save(docOut, saveOptions);

Kind regards,

Bart

tilal.ahmad · October 17, 2014, 1:09am

Hi Bart,

Thanks for sharing your source code. You appear to be using Aspose.Email for Java and Aspose.Words for Java. It seems that when the e-mail contents are read into DocumentBuilder through the getHtmlBody() method, the image reference is lost. We are looking into it and will update you soon.

In the meantime, you can convert the e-mail to MHTML and load it into Aspose.Words for Java to convert it to PDF/A. That should let you accomplish the task.

Best Regards,

kashif.iqbal · October 17, 2014, 2:15am

Hi Bart,

Images in an MSG file are linked to the HTML body of the message through Linked Resources, each of which has its own Content-ID. If you write the HTML string to a file on disk and open it in a browser, the images will be missing, because the image references point to the linked resources, which cannot be found where the HTML file is saved. To achieve what you are looking for, carry out the following steps:

1. Extract the linked resources from the MSG and save them to disk.

2. Correct the references to the linked resources so that they point to the saved images on disk.

3. Convert the corrected HTML file to PDF.

Please refer to the following sample code:

Sample Code:

String dir = "579482\\";

MailMessage msg = MailMessage.load(dir + "Mail met tekening in.msg");

// Save all embedded images next to the HTML file, named by their Content-ID
for (LinkedResource lr : msg.getLinkedResources()) {
    lr.save(dir + lr.getContentId());
}

// Strip the "cid:" prefix so each image reference becomes a relative file name
String strHtmlContents = msg.getHtmlBody().replace("cid:", "");

// Save the HTML contents to a file; FileOutputStream creates the file if it does not exist
try (FileOutputStream fop = new FileOutputStream(dir + "Test.html")) {
    fop.write(strHtmlContents.getBytes());
} catch (IOException e) {
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}

// Convert the corrected HTML file to PDF/A-1b
Document doc = new Document(dir + "Test.html");
PdfSaveOptions saveOptions = new PdfSaveOptions();
saveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
doc.save(dir + "Test.pdf", saveOptions);
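
A note on step 2: the plain replace of "cid:" in the sample above rewrites every occurrence of that string, including any that happen to appear in the body text, and it assumes each image was saved under its Content-ID. If that is a concern, a more targeted rewrite might look like the sketch below. It uses only the Java standard library; the class name CidRewriter and its rewrite method are hypothetical helpers for illustration, not part of any Aspose API, and the map of Content-IDs to saved file paths is assumed to be filled while saving the linked resources.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Aspose class): rewrites cid: image references in HTML.
public class CidRewriter {
    // Matches src="cid:..." or src='cid:...' (case-insensitive); group 2 is the Content-ID.
    private static final Pattern CID_SRC =
            Pattern.compile("(src\\s*=\\s*[\"'])cid:([^\"']+)", Pattern.CASE_INSENSITIVE);

    /** Replaces each cid: reference in a src attribute with the saved path for that Content-ID. */
    public static String rewrite(String html, Map<String, String> pathsByContentId) {
        Matcher m = CID_SRC.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String path = pathsByContentId.get(m.group(2));
            out.append(html, last, m.start());
            // Leave the reference untouched when no file was saved for this Content-ID.
            out.append(path == null ? m.group() : m.group(1) + path);
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>See <img src=\"cid:image001.png@01CF\"> and the text cid:not-an-image</p>";
        Map<String, String> paths = Collections.singletonMap("image001.png@01CF", "image001.png");
        System.out.println(rewrite(html, paths));
    }
}
```

Only references inside src attributes are touched, and a reference with no matching saved file is left as it was, which makes missing images easier to spot in the output HTML.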