Images not rendered when converting HTML to Word in a firewalled environment

Hi support team,

We are running into a strange situation and would like your assistance. Our application uses Aspose.Words to convert HTML to Word documents. The HTML contains some images with relative paths, and we provide a base href tag as well. Everything works perfectly on our local machines and on our QA server (running Windows Server 2008 R2).

The problem arises when we deploy to the production servers in the client's environment, which have firewall rules in place. Once the HTML is converted to Word, all images appear as red crosses.

We tried several combinations:

  • Using an absolute URL to an image on the internet - WORKS
  • Using a relative URL with base href to an image on the internet - WORKS
  • Using absolute URLs to application-hosted images - DOESN’T work in production
  • Using relative URLs with base href to application-hosted images - DOESN’T work in production

Here are the details:

  • The application is deployed on 10.0.0.7 and is accessible via http://10.0.0.7 or its domain name http://client.domain.com
  • The application can be accessed locally from that machine, from the intranet, or from the internet
  • The embedded images are served by the application
  • Typing the images’ URLs (e.g. http://client.domain.com/images/abc.png) into a browser on that machine or on a remote machine returns the expected image.

Could you please help us determine where the problem is? Is there a way to view or turn on a debug log for Aspose products?

I hope I have provided enough context on the situation. Please let me know if you need further information.

Thanks,

Tien

Hi Tien,

Thanks for your inquiry. I think you need to set the BaseUri option in LoadOptions when loading the HTML document. Please see the LoadOptions.BaseUri API reference page for details. I hope this helps.
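To make the effect of a base URI concrete: a relative img src is resolved against the base (whether it comes from a base href tag or from BaseUri, which in Aspose.Words for Java is presumably set via LoadOptions.setBaseUri; check the API reference for your version) into an absolute URL, presumably following standard RFC 3986 reference resolution as browsers do. The minimal sketch below uses only java.net.URI, not Aspose code; the base is the domain from the first post and the image paths are illustrative.

```java
import java.net.URI;

public class BaseUriDemo {
    // Resolves an img src against a base URI using RFC 3986 reference resolution,
    // the same rule a browser applies under <base href="...">.
    static String resolve(String baseUri, String src) {
        return URI.create(baseUri).resolve(src).toString();
    }

    public static void main(String[] args) {
        String base = "http://client.domain.com/"; // base href from the first post

        // Relative src: appended to the base path
        System.out.println(resolve(base, "images/abc.png"));  // → http://client.domain.com/images/abc.png
        // Root-relative src: replaces the base path
        System.out.println(resolve(base, "/images/abc.png")); // → http://client.domain.com/images/abc.png
        // Absolute src: the base is ignored entirely
        System.out.println(resolve(base, "http://10.0.0.7/images/abc.png")); // → http://10.0.0.7/images/abc.png
    }
}
```

Because a relative src plus a base ends up as the same absolute URL one would otherwise write by hand, the "absolute URL" and "relative URL with base href" cases in the first post would most likely fail for the same underlying reason: the converter cannot fetch that final URL from where it runs.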

Best regards,

Hi Awais,

As explained before, I know that the base href defined in the HTML works well for relative URLs. What I am asking is why it doesn’t work in this particular production deployment. Is there a debug mode or logging configuration I can use to figure out the cause?

Thanks, Tien

Hi Tien,

Thanks for your inquiry.

First of all, could you please attach your input HTML here for testing? I will investigate the issue on my side and provide you with more information. Secondly, since you mentioned a firewall, it would be great if you could check the firewall log and the system Event Viewer security log to make sure that specific URLs are not being blocked. You can also run the following code to determine whether the images are actually accessible:

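using System.IO;
using System.Net;

// Replace with the URL of one of the application-hosted images
const string href = "image url goes here";
WebRequest request = WebRequest.Create(href);
// Dispose the response as well as its stream so the connection is released
using (WebResponse response = request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
// File.Create truncates an existing file, unlike File.OpenWrite
using (FileStream fileStream = File.Create(@"C:\Temp\img.jpg"))
{
    // Copy the downloaded bytes to disk in 4 KB chunks
    byte[] buf = new byte[4096];
    int bytesRead;
    while ((bytesRead = responseStream.Read(buf, 0, buf.Length)) > 0)
        fileStream.Write(buf, 0, bytesRead);
}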
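For a Java application, the same reachability check can be sketched with the JDK alone (URLConnection, plus HttpURLConnection for the status code); this is not Aspose code, and the default URL and output file name are placeholders. Running it on the production server itself, ideally with the same account and JVM settings (e.g. proxy system properties) as the application, would show whether that machine can fetch its own images at all.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ImageReachability {
    // Downloads the resource at imageUrl into target and returns the number of bytes written.
    static long download(String imageUrl, Path target) throws IOException {
        URLConnection conn = URI.create(imageUrl).toURL().openConnection();
        // Timeouts make a firewall that silently drops packets fail fast instead of hanging
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        if (conn instanceof HttpURLConnection) {
            // Status and content type help tell a real image from a proxy/firewall error page
            int status = ((HttpURLConnection) conn).getResponseCode();
            System.out.println("HTTP " + status + ", Content-Type " + conn.getContentType());
        }
        // getInputStream throws an IOException for HTTP error statuses such as 404
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholders: pass the image URL and an output path on the command line
        String imageUrl = args.length > 0 ? args[0] : "http://client.domain.com/images/abc.png";
        Path target = Paths.get(args.length > 1 ? args[1] : "img.png");
        long bytes = download(imageUrl, target);
        System.out.println("Saved " + bytes + " bytes to " + target.toAbsolutePath());
    }
}
```

Trying both http://10.0.0.7/... and http://client.domain.com/... from the server would also show whether only one of the two addresses is unreachable from inside the network.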

Best regards,

Awais, thanks for your reply, but I did specify that the issue is with Java, so a .NET example doesn’t help. It also seems that you didn’t understand my explanation in the first post.

What I am after is a way to debug or turn on logging inside Aspose. Investigating anything related to the firewall will be almost impossible, as it needs to go through many management layers.

Thanks, Tien

Hi Awais, your response actually gave me an idea to fix the issue. Many thanks.

We can close this one here.

Cheers, Tien

Hi Tien,

Thanks for your inquiry, and I apologize for posting .NET code.

I suspect the problem occurs because Aspose.Words simply cannot retrieve the image from the specified location. Specifying the full path to the image in the src attribute of the img tag should have fixed the problem. In any case, it would be great if you could attach your problematic HTML document here for testing. I will then investigate the issue on my side and provide you with more information.

Regarding turning on logging inside Aspose.Words, I am afraid this is not possible: Aspose.Words currently doesn’t generate any logs or warnings about images being loaded from HTML. However, you may want to take a look at the LoadOptions.ResourceLoadingCallback property. The callback is invoked whenever an image is found during HTML import and lets you get and set the URI the image is downloaded from.
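As a rough illustration of how such a callback could provide both logging and a workaround, here is a JDK-only sketch of the decision logic it might delegate to. The helper `rewrite`, the host "client.domain.com" and the internal origin "http://localhost" are hypothetical; the idea (serving images from an internal address in case the server cannot reach its own public hostname through the firewall) is an assumption to test, not a diagnosis. The Aspose wiring is shown only in comments, assuming the Java API mirrors the members described above (setResourceLoadingCallback, ResourceLoadingArgs.getUri/setUri, ResourceLoadingAction.DEFAULT); verify the names against the API reference for your version.

```java
import java.net.URI;
import java.util.logging.Logger;

public class ImageUriRewriter {
    private static final Logger LOG = Logger.getLogger(ImageUriRewriter.class.getName());

    // Hypothetical helper: if uri points at publicHost, return the same path and query
    // on internalOrigin; otherwise return uri unchanged. Every decision is logged.
    static String rewrite(String uri, String publicHost, String internalOrigin) {
        URI original;
        try {
            original = URI.create(uri);
        } catch (IllegalArgumentException e) {
            LOG.warning("Unparseable image URI, leaving as-is: " + uri);
            return uri;
        }
        // getHost() is null for relative or file URIs, so those are left alone
        if (!publicHost.equalsIgnoreCase(original.getHost())) {
            LOG.info("Image URI unchanged: " + uri);
            return uri;
        }
        String query = original.getRawQuery();
        String rewritten = internalOrigin + original.getRawPath() + (query == null ? "" : "?" + query);
        LOG.info("Image URI rewritten: " + uri + " -> " + rewritten);
        return rewritten;
    }

    public static void main(String[] args) {
        // Possible wiring in Aspose.Words for Java (API names assumed, not verified here):
        // LoadOptions loadOptions = new LoadOptions();
        // loadOptions.setResourceLoadingCallback(a -> {
        //     a.setUri(rewrite(a.getUri(), "client.domain.com", "http://localhost"));
        //     return ResourceLoadingAction.DEFAULT;
        // });
        System.out.println(rewrite("http://client.domain.com/images/abc.png", "client.domain.com", "http://localhost"));
        // prints http://localhost/images/abc.png
    }
}
```

Even without rewriting anything, logging each URI the callback receives shows exactly which addresses the converter tries to fetch, which is the visibility Aspose.Words itself does not currently offer.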

Best regards,

Thanks, Awais, ResourceLoadingCallback sounds useful. There was nothing wrong with the input HTML; as stated in the first post, I did test it with absolute URLs.

Cheers, Tien