DoS issues with Document Conversion

This possibly extends to other Aspose products as well.


Various denial of service scenarios occur if Aspose retrieves images over the network. Consider the following HTML document (tarpit.html):



Aspose, when rendering the document, attempts to load the file “google-logo.jpg” from the network. However, the network is not necessarily trustworthy.

Two Python web server scripts were used to answer Aspose’s request for “google-logo.jpg”.

1) Tarpitting: The first server returned a response (in chunked encoding, although the details don’t matter) that took 2 hours to complete. That is, it dribbled out a little data, then a little more, over the course of two hours.

2) Crapflooding: Same test procedure as before, except that the second server returned 1 TB of “A” characters as quickly as possible in response to any request. This caused Aspose to consume large amounts of memory and possibly run out of it, crashing the application.
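For anyone trying to reproduce this, the two response patterns can be sketched roughly as below. This is not the original Python test code, which is not shown here. It is a minimal Java approximation that writes to any OutputStream (for example, the body stream of an HTTP response). The class name HostileResponses and the method names tarpit and flood are made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical helper for illustration; not the original Python test scripts.
final class HostileResponses {

    // Tarpit: write `chunks` single bytes, sleeping `delayMillis` before each one.
    static void tarpit(OutputStream out, int chunks, long delayMillis)
            throws IOException, InterruptedException {
        for (int i = 0; i < chunks; i++) {
            Thread.sleep(delayMillis);
            out.write('A');
            out.flush(); // push each byte out so the client keeps seeing tiny progress
        }
    }

    // Flood: write `totalBytes` 'A' bytes as fast as the stream accepts them.
    static void flood(OutputStream out, long totalBytes) throws IOException {
        byte[] block = new byte[64 * 1024];
        Arrays.fill(block, (byte) 'A');
        long remaining = totalBytes;
        while (remaining > 0) {
            int n = (int) Math.min(block.length, remaining);
            out.write(block, 0, n);
            remaining -= n;
        }
    }
}
```

With these assumptions, tarpit(out, 7200, 1000) stretches a response over about two hours, and flood(out, 1L << 40) sends roughly 1 TB. A client that reads such a response with no timeout and no size limit is exposed to both.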

Hi,

Can you please elaborate further on your requirement? If you are using Aspose.Words for Java to convert HTML to Word or PDF, you can use the following code to skip loading all or specific external images.

import com.aspose.words.*;

// Route every external resource request through the callback below.
LoadOptions loadOptions = new LoadOptions();
loadOptions.setResourceLoadingCallback(new HandleResourceLoading());

Document doc = new Document(MyDir + "in.docx", loadOptions);
doc.save(MyDir + "Out.pdf");

class HandleResourceLoading implements IResourceLoadingCallback
{
    public int resourceLoading(ResourceLoadingArgs args)
    {
        // Skip images so they are never fetched; load everything else as usual.
        if (args.getResourceType() == ResourceType.IMAGE)
            return ResourceLoadingAction.SKIP;
        else
            return ResourceLoadingAction.DEFAULT;
    }
}
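
To skip only specific external images rather than all of them, the callback could decide based on where the resource lives. The following is a minimal sketch of such a check using only the Java standard library. RemoteUriCheck and isRemote are hypothetical names and are not part of Aspose.Words, and the list of prefixes treated as "remote" is an assumption that you should adjust to your environment.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class RemoteUriCheck {

    // Prefixes assumed to indicate a network location (adjust as needed).
    private static final String[] REMOTE_PREFIXES = {
        "http://", "https://", "ftp://", "//", "\\\\"
    };

    // Returns true if the URI string appears to point at a network resource.
    static boolean isRemote(String uri) {
        if (uri == null) {
            return false;
        }
        String normalized = uri.trim().toLowerCase(Locale.ROOT);
        for (String prefix : REMOTE_PREFIXES) {
            if (normalized.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside resourceLoading, such a check could presumably be applied to the value of args.getOriginalUri(), returning ResourceLoadingAction.SKIP for remote images and ResourceLoadingAction.DEFAULT otherwise. Please verify this against the API reference for your version of Aspose.Words.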

Best Regards,