Render Images present in HTML to docx/pdf

Hi,

I have an img reference in my HTML, as shown below:

<img src="https://www.tutorialspoint.com/assets/questions/media/426142-1668760872.png" alt="alternatetext" />

I am converting this HTML to DOCX format using Java. When I open the generated DOCX file, I see a broken image in the output, as shown below.

How can I make the image referenced in the HTML render in the output DOCX file?

Thanks

@ramesh676 Please make sure the application has access to the internet. I used the following code for testing, and the image was downloaded and inserted into the output document correctly:

Document doc = new Document("C:\\Temp\\in.html");
doc.save("C:\\Temp\\out.docx");

in.zip (272 Bytes)
out.docx (18.1 KB)

Hi @alexey.noskov,

I am using the code below:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertHtml(html);
doc.save(docOutStream, SaveFormat.DOCX);

I can successfully convert HTML to PDF/DOCX using Aspose.Words, so that means I have an internet connection, right?

@ramesh676 The following code also inserts the image into the document correctly:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertHtml("<img src=\"https://www.tutorialspoint.com/assets/questions/media/426142-1668760872.png\" alt=\"alternatetext\" />");
doc.save("C:\\Temp\\out.docx");

Aspose.Words does not require an internet connection to convert documents. A connection is needed only to download external resources, such as images referenced in HTML.

Hi @alexey.noskov,

I am running the code in Eclipse. Where are you running your code?
Do I need to enable something on my Windows machine or in Eclipse? My internet connection is working, and there are no issues with it.

Thanks
Ramesh

@ramesh676 I run the code in IntelliJ IDEA. You can try implementing IResourceLoadingCallback to control how Aspose.Words loads external resources.

Hi @alexey.noskov,

Is it mandatory to use the IResourceLoadingCallback interface? If so, could you share the code that I need to add to the code below?

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertHtml(html);
doc.save(docOutStream, SaveFormat.DOCX);

Hi @alexey.noskov,

I tried the following img src:

<img src="file:///C:/Users/Ramesh/Downloads/RameshPicture.jpg" alt="alternatetext" />

Now the image appears in the output DOCX.

Since I gave a local file system path, no internet connection is required to read it. I suspect that it fails only when the img src is an https URL, and that the call to the https URL cannot be made.

I have one question: when I call builder.insertHtml(html), the HTML is processed on the Aspose side. I believe Aspose should read this https src, download the image, and embed it in the resulting DOCX.

@ramesh676

No, it is not mandatory. The callback simply allows you to control how Aspose.Words loads external resources. For example, it can be useful if an external resource requires authentication.

Yes, you are right, Aspose.Words downloads external resources while inserting HTML. Please make sure the request is not blocked by a firewall on your side.
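
One way to check whether the request is blocked is to fetch the same URL from the same JVM, outside Aspose.Words. The following is only a minimal sketch using the standard library: the class name ImageUrlCheck and the method probe are made up for illustration, and the 5-second timeouts are an arbitrary assumption. If it reports a failure or a non-200 status, the problem is likely in the network path (firewall, proxy, certificates) rather than in the conversion code.

```java
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Aspose.Words.
class ImageUrlCheck {

    /** Tries to open the URL and returns a one-line summary of the outcome. */
    static String probe(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(5000); // assumed timeout, adjust as needed
            conn.setReadTimeout(5000);
            try {
                int status = conn.getResponseCode();
                return "HTTP " + status + " (" + conn.getContentType() + ")";
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            // Covers malformed URLs, DNS failures, timeouts, TLS errors, and so on.
            return "FAILED: " + e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Pass the image URLs to check as command-line arguments.
        for (String url : args) {
            System.out.println(url + " -> " + probe(url));
        }
    }
}
```

Run it from the same IDE and project as the conversion code, passing the img src URL as an argument, so that it uses the same JVM and network settings.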

Hi @alexey.noskov,

Document doc = new Document("C:\\Temp\\in.html");
doc.save("C:\\Temp\\out.docx");

I tried the code you mentioned in a freshly created Maven project in Eclipse. I am still seeing a broken image.

Please let me know how I can make Aspose download the img src. Aspose should download it, and I don't know which firewall is blocking the request. Aspose reads the HTML and then the Aspose APIs make the call to the img src. I can't control how the Aspose APIs download the img src, right? Please help.
Thanks
Ramesh

@ramesh676 As I have mentioned, you can use IResourceLoadingCallback to control how external resources are loaded. For example, see the following code:

import com.aspose.words.*;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Register the callback before inserting the HTML so that it handles the image request.
Document doc = new Document();
doc.setResourceLoadingCallback(new TestResourceLoadingCallback());
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertHtml("<img src=\"https://www.tutorialspoint.com/assets/questions/media/426142-1668760872.png\" alt=\"alternatetext\" />");
doc.save("C:\\Temp\\out.docx");

private static class TestResourceLoadingCallback implements IResourceLoadingCallback
{
    @Override
    public int resourceLoading(ResourceLoadingArgs args) throws Exception {
        // Log the requested URL, download it ourselves, and pass the bytes to Aspose.Words.
        String url = args.getOriginalUri();
        System.out.println(url);
        // Close the stream once the data has been read.
        try (InputStream stream = new URI(url).toURL().openStream()) {
            args.setData(getBytesFromStream(stream));
        }
        return ResourceLoadingAction.USER_PROVIDED;
    }

    // Reads the whole stream into a byte array.
    static byte[] getBytesFromStream(final InputStream inputStream) throws IOException {
        final int bufferSize = 1024;
        int len;

        ByteArrayOutputStream byteBuffer = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];

        while ((len = inputStream.read(buffer)) != -1) {
            byteBuffer.write(buffer, 0, len);
        }
        return byteBuffer.toByteArray();
    }
}

Hi @alexey.noskov,

You mentioned that the callback can help when an external resource requires authentication. How can I send the authentication details in that case?

I have HTML as below:

<img alt=""  src="https://www.test.com/assets/questions/media/test.png">

Authentication is required to access the above image.

I am building the above HTML and converting it to PDF as below:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertHtml("<img alt=\"\"  src=\"https://www.test.com/assets/questions/media/test.png\">");
doc.save("C:\\Temp\\out.docx");

How can I send the authentication details so that I can access the above resource? Also, please let us know if there is any way to see logs if I run into issues while converting my HTML string to PDF.

Thanks.

@ramesh676 If your remote host requires authentication, you should use IResourceLoadingCallback as shown above and specify the authentication details in the URI. Please see the following Stack Overflow question for more details:
https://stackoverflow.com/questions/496651/connecting-to-remote-url-which-requires-authentication-using-java
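
As an illustration, a download helper that sends credentials could look like the sketch below. It assumes the server accepts HTTP Basic authentication, which may not be true for your host. The class AuthenticatedFetch, its methods, and the 10-second timeouts are hypothetical and not part of Aspose.Words.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; assumes the server uses HTTP Basic authentication.
class AuthenticatedFetch {

    /** Builds the value of the "Authorization" header for HTTP Basic auth. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Downloads the resource, sending the credentials with the request. */
    static byte[] fetch(String url, String user, String password) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        conn.setConnectTimeout(10000); // assumed timeouts
        conn.setReadTimeout(10000);
        try (InputStream in = conn.getInputStream()) {
            // Copy the response body into memory.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
            return out.toByteArray();
        } finally {
            conn.disconnect();
        }
    }
}
```

Inside resourceLoading, the download line could then become something like args.setData(AuthenticatedFetch.fetch(args.getOriginalUri(), user, password)), where user and password are placeholders for your own credentials.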

You can implement IWarningCallback to get notifications about possible issues while processing documents.

Hi @alexey.noskov,

I have used IResourceLoadingCallback, and the image is now inserted into the PDF, but I have an issue with its width. The image in the PDF crosses the right border and some of its content is cut off, as shown below. I marked the affected content with a yellow circle on the right side; you can see that the table is cut off.

How can I control the width so that the entire image is visible in the generated PDF without being cut off?
How can I set the image width so that it does not cross the right border?

Thanks

@ramesh676 Could you please attach your problematic input and output documents here for our reference? Unfortunately, it is impossible to analyze the problem using screenshots.