Embedded image lost while converting e-mail to PDF/A

Hi

We are using Aspose.Email 4.5.0.0 and Aspose.Pdf 9.3.1 to convert e-mails to PDF/A-1b.
When we convert e-mails with embedded images, the image is lost in the PDF file and replaced by a big X.

I’ve attached an example to this post.

Could you investigate?


Kind regards,

Bart Maes
ECM Consultant
Docbyte sa

Hi Bart,


Thanks for your inquiry. We have tested the scenario using the latest versions of the Aspose.Email, Aspose.Words and Aspose.Pdf JARs and were unable to reproduce the reported issue. Please share your sample code here so that we can test the scenario and provide you with more information. For reference, this is the code we used:

// MSG to MHTML using Aspose.Email
FileInputStream fstream = new FileInputStream(myDir + "Mail met tekening in.msg");
MailMessage eml = MailMessage.load(fstream);

// Save the message to an output stream in MHTML format
ByteArrayOutputStream emlStream = new ByteArrayOutputStream();
eml.save(emlStream, MailMessageSaveType.getMHtmlFormat());

// MHTML to PDF using Aspose.Words
// Load the stream into a Word document
com.aspose.words.LoadOptions lo = new com.aspose.words.LoadOptions();
lo.setLoadFormat(com.aspose.words.LoadFormat.MHTML);
com.aspose.words.Document doc = new com.aspose.words.Document(new ByteArrayInputStream(emlStream.toByteArray()), lo);

// Save to disk
doc.save(myDir + "Mail met tekening in.pdf", com.aspose.words.SaveFormat.PDF);

// PDF to PDF/A-1b conversion using Aspose.Pdf
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(myDir + "Mail met tekening in.pdf");

// Convert to a PDF/A compliant document
//pdfDocument.validate("Validation_log.xml",com.aspose.pdf.PdfFormat.PDF_A_1A);
pdfDocument.convert("Conversion_log.xml", com.aspose.pdf.PdfFormat.PDF_A_1B, com.aspose.pdf.ConvertErrorAction.Delete);

// Save the updated document
pdfDocument.save(myDir + "pdfa1b_pdf.pdf");

Please feel free to contact us for any further assistance.

Best Regards,

Hi Tilal

Here is part of our code to convert e-mail to PDF.
The contents of the e-mail are retrieved with mail.getHtmlBody();

MailMessageLoadOptions mailOptions = new MailMessageLoadOptions();
mailOptions.setMessageFormat(MessageFormat.getMsg());
mailOptions.setFileCompatibilityMode(FileCompatibilityMode.SkipValidityChecking);
MailMessage mail = MailMessage.load(inFilePath, mailOptions);

Document doc = new Document();
DocumentBuilder docBuilder = new DocumentBuilder(doc);

docBuilder.insertHtml(mail.getHtmlBody());

ByteArrayOutputStream docOut = new ByteArrayOutputStream();
PdfSaveOptions saveOptions = new PdfSaveOptions();
saveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
doc.save(docOut, saveOptions);


Kind regards,

Bart

Hi Bart,


Thanks for sharing your source code. It seems you are using Aspose.Email for Java and Aspose.Words for Java for this purpose. When the e-mail contents are read into DocumentBuilder through the getHtmlBody() method, the image references appear to be lost. We are looking into it and will guide you soon.

In the meantime, you can convert the e-mail to MHTML and load that into Aspose.Words for Java to convert it to PDF/A. This should let you accomplish the task.

Best Regards,

Hi Bart,


Images in an MSG are linked to the HTML body of the message through Linked Resources, where each Linked Resource has its own Content-ID. If you write the HTML string to a file on disk and open it in a browser, the images will be missing, because the image references point to the linked resources, which cannot be found at the location where the HTML file is saved. To achieve what you are looking for, you need to carry out the following steps:

1. Extract the linked resources from the MSG and save them to disc.
2. Correct the references to the linked resources so that they point to the saved images on disk.
3. Convert the corrected HTML file to PDF.

The following sample code illustrates these steps:


String dir = "579482/";

MailMessage msg = MailMessage.load(dir + "Mail met tekening in.msg");

// Save all embedded images, using each Content-ID as the file name
for (LinkedResource lr : msg.getLinkedResources()) {
    lr.save(dir + lr.getContentId());
}

// Strip the "cid:" prefix so the image references point to the saved files
String strHtmlContents = msg.getHtmlBody().replace("cid:", "");

// Save the HTML contents to a file next to the images
File file = new File(dir + "Test.html");
try (FileOutputStream fop = new FileOutputStream(file)) {
    fop.write(strHtmlContents.getBytes());
} catch (IOException e) {
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}

// Convert the corrected HTML file to PDF/A-1b
Document doc = new Document(dir + "Test.html");
PdfSaveOptions saveOptions = new PdfSaveOptions();
saveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
doc.save(dir + "Test.pdf", saveOptions);
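Note that replacing every occurrence of "cid:" in the HTML body also alters any ordinary text that happens to contain that string. If this is a concern, step 2 could be narrowed to image src attributes only. The following is a minimal sketch using only the standard Java library; the class name CidRewriter, the method rewrite and the imageDir parameter are hypothetical names chosen for illustration and are not part of any Aspose API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CidRewriter {

    // Matches src="cid:..." or src='cid:...' (case-insensitive).
    // Group 1 is the quote character, group 2 is the Content-ID.
    private static final Pattern CID_SRC =
            Pattern.compile("(?i)src\\s*=\\s*([\"'])cid:([^\"']*)\\1");

    // Rewrites cid: references in src attributes to point at imageDir + Content-ID.
    // Pass an empty imageDir when the images are saved next to the HTML file.
    public static String rewrite(String html, String imageDir) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String quote = m.group(1);
            String replacement = "src=" + quote + imageDir + m.group(2) + quote;
            // quoteReplacement keeps '$' and '\' in the Content-ID literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>See cid: in text</p><img src=\"cid:image001.png@01CF\">";
        System.out.println(rewrite(html, ""));
    }
}
```

With this helper, the line that builds strHtmlContents in the sample above could become CidRewriter.rewrite(msg.getHtmlBody(), ""), assuming the images are still saved under their Content-ID in the same folder as Test.html.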