HTML to Word to PDF - missing image in PDF

jeffmbeckergmail · July 29, 2013, 11:45am

I am currently evaluating Aspose.Words while looking for a way to convert HTML to Word and PDF that lets me use a single Word template file for both, in order to provide headers, footers, etc. Word is therefore the intermediate representation when converting to PDF. The conversion from HTML to Word is working reasonably well (only some minor style issues), and the images are embedded in the resulting Word docx file. However, they do not show up in the PDF file (there is a red “X” placeholder instead of the image). The following code shows the process I am using to generate Word and PDF output, and to load images. Suggestions?

public static byte[] toWord(byte[] input, String baseURL, String format) throws Exception
{
    Document doc = loadReport(input, baseURL, format);
    // Save to DOCX output stream
    ByteArrayOutputStream outputStr = new ByteArrayOutputStream();
    SaveOptions saveOptions = SaveOptions.createSaveOptions(SaveFormat.DOCX);
    saveOptions.setPrettyFormat(true); // makes output human readable
    doc.save(outputStr, saveOptions);
    return (outputStr.toByteArray());
}

public static byte[] toPDF(byte[] input, String baseURL, String format) throws Exception
{
    Document doc = loadReport(input, baseURL, format);
    // Save to PDF output stream
    ByteArrayOutputStream outputStr = new ByteArrayOutputStream();
    SaveOptions options = SaveOptions.createSaveOptions(SaveFormat.PDF);
    doc.save(outputStr, options);
    return (outputStr.toByteArray());
}

public static Document loadReport(byte[] input, String baseURL, String format) throws Exception
{
    LoadOptions loadOptions = new LoadOptions(LoadFormat.DOCX, "", baseURL);
    InputStream templateStr = loadOptions.getClass().getResourceAsStream("/resources/asposeReportTemplate.docx");
    Document template = new Document(templateStr, loadOptions);
    Document report = loadHTML(input, baseURL);
    // Append report to template
    template.appendDocument(report, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    template.updateFields();
    return template;
}

// Load HTML to Document
public static Document loadHTML(byte[] input, final String baseURL) throws Exception
{
    ByteArrayInputStream inputStr = new ByteArrayInputStream(input);
    LoadOptions options = new LoadOptions(LoadFormat.HTML, "", baseURL);
    // Image loader
options.setResourceLoadingCallback(new IResourceLoadingCallback()
        {
            public int resourceLoading(ResourceLoadingArgs args)
            {
                if (args.getResourceType() == ResourceType.IMAGE)
                {
                    String url = baseURL + args.getOriginalUri();
                    byte[] imageData = readFromURL(url);
                    if (imageData != null)
                    {
                        args.setData(imageData);
                        return ResourceLoadingAction.USER_PROVIDED;
                    }
                    else
                    {
                        return ResourceLoadingAction.SKIP;
                    }
                }
                else
                {
                    return ResourceLoadingAction.DEFAULT;
                }
            }
    }); 
    return new Document(inputStr, options);
}

public static byte[] readFromURL(String url)
{
    try
    {
        URL U = new URL(url);
        if (U != null)
        {
            HttpURLConnection conn = null;
            conn = (HttpURLConnection) U.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(30000);
            conn.setDoInput(true);
            byte[] data = new byte[4096];

            InputStream input = conn.getInputStream();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            int count = input.read(data);
            while (count> -1)
            {
                bos.write(data, 0, count);
                count = input.read(data);
            }
            input.close();
            bos.close();
            byte[] result = bos.toByteArray();
            return (result);
        }
    }
    catch (Exception e)
    {
        // Any failure is swallowed here, so the caller only ever sees null
    }
    return (null);
}

tahir.manzoor · July 31, 2013, 5:37am

Hi Jeff,

Thanks for your inquiry. Could you please attach your input HTML here for testing? I will investigate the issue on my side and provide you with more information.

jeffmbeckergmail · July 31, 2013, 9:44am

Hi Tahir,

I attached the sample html (sample.zip) to the original post. Obviously, you will need to change the image URLs to something that works in your environment. The image files are in the DOCX.

Jeff

tahir.manzoor · August 1, 2013, 6:11am

Hi Jeff,

Thanks for sharing the HTML file. I have tested the scenario and could not reproduce the issue. Perhaps you are facing it because of the base tag. Please make sure that you can access the image in a browser.

jeffmbeckergmail · August 7, 2013, 10:12am

How could the base URL be an issue? The images are being loaded via a callback. Please look at the code again. I load the document the same way whether creating a DOCX or a PDF. When saving as DOCX, the images are there, and they are embedded in the DOCX. When saving as PDF, the images are not there. The images loaded via the callback should also be embedded in the PDF. Is this an evaluation version issue? Also note that my evaluation of this product includes evaluation of how good the support is, and so far I am not impressed.

Addendum: Problem solved. My image loading callback was not succeeding, yet the images were appearing in the docx document anyway. Weird. Even though the images were appearing in the docx, they were not exported to the pdf. Also weird. With the image loading callback fixed, images continue to appear in the docx, and now also appear in the pdf.
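The thread does not say what the fix to the callback was. One thing that makes this kind of failure hard to spot is that `readFromURL` in the first post catches every exception and returns null. The following is a minimal sketch of a variant that reports failures instead; the class name `ResourceFetcher` is hypothetical, and the timeouts are simply copied from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper, not part of Aspose.Words.
final class ResourceFetcher {

    /**
     * Reads all bytes from the given URL. Instead of returning null on
     * failure, it throws IOException so the caller can log the reason.
     * A malformed or relative URL string raises IllegalArgumentException.
     */
    static byte[] readFromURL(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(30000);

        // For HTTP(S), treat any status other than 200 as a failure.
        if (conn instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) conn).getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + url);
            }
        }

        // try-with-resources closes both streams even if reading fails.
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        }
    }
}
```

Inside the image-loading callback, the call would be wrapped in a try/catch that logs the exception and the URL before returning `ResourceLoadingAction.SKIP`, so a failed download leaves a trace rather than silently producing a placeholder.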

tahir.manzoor · August 12, 2013, 4:24am

Hi Jeff,

Please accept my apologies for the inconvenience.

It is nice to hear that your problem has been solved. I have tested the scenario with the code shared in your first post and have not found any issue while using the latest version of Aspose.Words for Java, v13.7.0.

I used ‘https://www.aspose.com/’ as the baseURL and changed the image source as follows:

<img height="72" src="/images/aspose-logo.gif" alt="" width="72"></img>
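With this base URL and image source, the callback from the first post would build the URL by plain concatenation (`baseURL + args.getOriginalUri()`), which yields a double slash and would also mangle an image source that is already absolute. Whether that was the cause of the original failure is not stated in the thread. As a sketch, assuming the original URI is passed through unchanged, the standard `java.net.URI.resolve` handles both cases; the class name `ImageUrls` is hypothetical.

```java
import java.net.URI;

// Hypothetical helper, not part of Aspose.Words.
final class ImageUrls {

    /**
     * Resolves an image reference against the base URL using standard
     * URI reference resolution: root-relative paths replace the base
     * path, and absolute references are returned unchanged.
     */
    static String resolve(String baseURL, String src) {
        return URI.create(baseURL).resolve(src).toString();
    }
}
```

For the values above, `ImageUrls.resolve("https://www.aspose.com/", "/images/aspose-logo.gif")` gives `https://www.aspose.com/images/aspose-logo.gif`, whereas concatenation gives `https://www.aspose.com//images/aspose-logo.gif`. The base URL should include a path (at least a trailing slash) when the image reference is a relative path without a leading slash.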

I have attached the output PDF and DOCX files to this post for your reference. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.