Free Support Forum - aspose.com

HTML to Word to PDF - missing image in PDF

I am currently evaluating Aspose.Words as a way to convert HTML to Word and PDF. I want to use a single Word template file for both outputs to provide headers, footers, etc., so Word serves as an intermediate representation when converting to PDF. The conversion from HTML to Word is working reasonably well (only some minor style issues), and the images are embedded in the resulting DOCX file. However, they do not show up in the PDF file (there is a red “X” placeholder instead of each image). The following code outlines the process I am using to generate Word and PDF output and to load images. Suggestions?

public static byte[] toWord(byte[] input, String baseURL, String format) throws Exception {
Document doc = loadReport(input, baseURL, format);
// Save to DOCX output stream
ByteArrayOutputStream outputStr = new ByteArrayOutputStream();
SaveOptions saveOptions = SaveOptions.createSaveOptions(SaveFormat.DOCX);
saveOptions.setPrettyFormat(true); // makes output human readable
doc.save(outputStr, saveOptions);
return(outputStr.toByteArray());
}

public static byte[] toPDF(byte[] input, String baseURL, String format) throws Exception {
Document doc = loadReport(input, baseURL, format);
// Save to PDF output stream
ByteArrayOutputStream outputStr = new ByteArrayOutputStream();
SaveOptions options = SaveOptions.createSaveOptions(SaveFormat.PDF);
doc.save(outputStr, options);
return(outputStr.toByteArray());
}

public static Document loadReport(byte[] input, String baseURL, String format) throws Exception {
LoadOptions loadOptions = new LoadOptions(LoadFormat.DOCX, "", baseURL);
InputStream templateStr = loadOptions.getClass().getResourceAsStream("/resources/asposeReportTemplate.docx");
Document template = new Document(templateStr, loadOptions);
Document report = loadHTML(input, baseURL);
// Append report to template
template.appendDocument(report, ImportFormatMode.KEEP_SOURCE_FORMATTING);
template.updateFields();
return template;
}

// Load HTML to Document
public static Document loadHTML(byte[] input, final String baseURL) throws Exception {
ByteArrayInputStream inputStr = new ByteArrayInputStream(input);
LoadOptions options = new LoadOptions(LoadFormat.HTML, "", baseURL);
// Image loader
options.setResourceLoadingCallback(new IResourceLoadingCallback() {
public int resourceLoading(ResourceLoadingArgs args) {
if(args.getResourceType() == ResourceType.IMAGE) {
String url = baseURL + args.getOriginalUri();
byte[] imageData = readFromURL(url);
if(imageData != null) {
args.setData(imageData);
return ResourceLoadingAction.USER_PROVIDED;
} else {
return ResourceLoadingAction.SKIP;
}
} else {
return ResourceLoadingAction.DEFAULT;
}
}
});
return new Document(inputStr, options);
}

public static byte[] readFromURL(String url) {
try {
URL U = new URL(url);
if(U != null) {
HttpURLConnection conn = null;
conn = (HttpURLConnection)U.openConnection();
conn.setConnectTimeout(5000);
conn.setReadTimeout(30000);
conn.setDoInput(true);
byte[] data = new byte[4096];

InputStream input = conn.getInputStream();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int count = input.read(data);
while(count > -1) {
bos.write(data, 0, count);
count = input.read(data);
}
input.close();
bos.close();
byte[] result = bos.toByteArray();
return(result);
}
} catch(Exception e) {
// Any failure is silently swallowed here; the caller only receives null
}
return(null);
}

Hi Jeff,


Thanks for your inquiry. Could you please attach your input HTML here for testing? I will investigate the issue on my side and provide you with more information.

Hi Tahir,

I attached the sample html (sample.zip) to the original post. Obviously, you will need to change the image URLs to something that works in your environment. The image files are in the DOCX.

Jeff

Hi Jeff,


Thanks for sharing the HTML file. I have tested the scenario and could not reproduce the issue. Perhaps you are facing the issue because of the base tag. Please make sure that you can access the image in a browser.


How could the base URL be an issue? The images are being loaded via a callback. Please look at the code again. I load the document the same way whether creating a DOCX or a PDF. When saving as DOCX, the images are there, and they are embedded in the DOCX. When saving as PDF, the images are not there. The images loaded via the callback should also be embedded in the PDF. Is this an evaluation version issue? Also note that my evaluation of this product includes evaluation of how good the support is, and so far I am not impressed.

Addendum: Problem solved. My image loading callback was not succeeding, yet the images were appearing in the docx document anyway. Weird. Even though the images were appearing in the docx, they were not exported to the pdf. Also weird. With the image loading callback fixed, images continue to appear in the docx, and now also appear in the pdf.
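A failing image loader is easy to miss when the fetch helper catches every exception and returns null, as readFromURL in the first post does. The following is only a sketch of one way to make such failures visible, under the assumption that throwing an IOException is acceptable to the caller. The class name ImageFetch and its methods are hypothetical and are not part of Aspose.Words; the timeouts are the ones from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class ImageFetch {
    private ImageFetch() {}

    // Drains the stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int count;
        while ((count = in.read(buf)) != -1) {
            bos.write(buf, 0, count);
        }
        return bos.toByteArray();
    }

    // Fetches an http(s) URL and throws on failure instead of returning null,
    // so a broken image URL shows up in a log or stack trace.
    static byte[] fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(30000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A resource loading callback could call ImageFetch.fetch inside its own try/catch, log the exception, and only then decide whether to skip the image, so that a bad URL is reported at load time rather than discovered in the PDF.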

Hi Jeff,


Please accept my apologies for the inconvenience.

It is nice to hear that your problem has been solved. I tested the scenario with the code shared in your first post and did not find any issue while using the latest version of Aspose.Words for Java, v13.7.0.

I used ‘http://www.aspose.com’ as the baseURL and changed the image source as follows:

<img height="72" src="/images/aspose-logo.gif" alt="" width="72"></img><o:p></o:p>


I have attached the output PDF and DOCX files to this post for your reference. Please feel free to ask if you have any questions about Aspose.Words. We will be happy to help you.
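For readers following the base URL discussion: the callback in the first post builds the image URL by plain string concatenation (baseURL + args.getOriginalUri()), which only yields a valid URL for some combinations of base and src. A minimal sketch of standard URI resolution using java.net.URI is shown below. The class name ImageUrls is hypothetical, the example.com URL is made up for illustration, and whether the value passed to the callback is relative or absolute in a given document is an assumption that should be checked.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class ImageUrls {
    private ImageUrls() {}

    // Resolves an image src against a base URL using standard URI resolution:
    // absolute srcs are returned as-is, relative ones are combined with the base.
    static String resolve(String baseURL, String src) {
        return URI.create(baseURL).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Root-relative src, as in the reply above
        System.out.println(resolve("http://www.aspose.com", "/images/aspose-logo.gif"));
        // prints http://www.aspose.com/images/aspose-logo.gif

        // Already-absolute src: concatenation would break this, resolve keeps it
        System.out.println(resolve("http://www.aspose.com", "http://example.com/a.png"));
        // prints http://example.com/a.png
    }
}
```

Printing or logging the resolved URL inside the callback is a quick way to confirm that it matches what a browser would request.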