Preventing Aspose.Words from executing an HTML zip bomb

Hi,

I have a scenario where I create an Aspose.Words Document from an InputStream in Java.

The Aspose.Words version is 18.9.

The file in the input stream is actually a text file that contains HTML code linking to a zip bomb, but it has been renamed to look like a DOC file, for example myFile.html.doc.

I create my Aspose.Words document like this:

LoadOptions loadOptions = new LoadOptions();
loadOptions.setLoadFormat(LoadFormat.DOC);
// loadOptions.setLoadFormat(LoadFormat.TEXT);
Document doc = new Document(is, loadOptions);

When I use LoadFormat.TEXT everything works fine, because the HTML content is interpreted as plain text.
But when I change to LoadFormat.DOC, the HTML code is interpreted as HTML and the zip bomb is executed.

So my question is: is there some way to prevent the HTML code from being executed when I create an Aspose.Words document using LoadFormat.DOC? I need LoadFormat.DOC because the same code is also used to create ordinary Word documents.

Regards

@tobjo853

Thanks for your inquiry. If your document is a text file that contains HTML tags, please do not use LoadFormat.DOC.

Please ZIP and attach your input document and expected output document here for testing. We will investigate the issue on our side and provide you more information.

zipbomb.zip (350 Bytes)

Here is the test file I use to trigger the problem.
I need to use LoadFormat.DOC since this code is used to create “ordinary” DOC files, not just bad ones.

So, what I’m looking for is a way to tell Aspose not to try to be clever (execute the HTML code inside the doc) when creating the Document object from a DOC file.
I can’t use LoadFormat.TEXT because then I lose images etc. in the DOC file.

@tobjo853

Thanks for sharing the document. Unfortunately, the URI in the shared file cannot be resolved. Please share a working URI:
http://isoteam.infor.com/GZIP-BOMB/gzipBloat.php?contenttype=image%2fjpeg

We have tested the scenario using the following image tag and noticed that the image is not imported into Aspose.Words’ DOM.
<img src="https://www.aspose.com/images/aspose-logo.gif" />

We have logged this issue as WORDSJAVA-1938 in our issue tracking system. You will be notified via this forum thread once this issue is resolved. We apologize for the inconvenience.

Could you please share the problematic output document here for our reference?

I will try to explain the issue I have and what I want to accomplish.

I need to use Aspose to create a Word document from an input stream in Java.

Like this:

LoadOptions loadOptions = new LoadOptions();
loadOptions.setLoadFormat(LoadFormat.DOC);
Document doc = new Document(is, loadOptions);

The input stream is a .doc file.
The .doc file contains HTML code with a link to an external resource.
When I use LoadFormat.DOC and new Document(is, loadOptions) is called, the HTML code inside the .doc file is executed. In my case the HTML code contains a link to a zip bomb, which causes the new Document(…) call to crash.

So what I’m looking for is a way to use the new Document(is, loadOptions) call but at the same time tell Aspose not to execute the HTML code inside the .doc file. The content of the .doc file has to be treated as plain text; Aspose must not be smart and try to execute the HTML code.
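One possible way to keep a renamed HTML or text file away from the DOC code path is to check the file signature before calling new Document(…). The following is only a minimal sketch using plain Java I/O. The class and method names are hypothetical and are not part of Aspose.Words, and it assumes that genuine inputs are binary .doc files, which start with the OLE2 compound file signature (other formats such as DOCX or RTF would need their own checks).

```java
import java.io.BufferedInputStream;
import java.io.IOException;

// Hypothetical helper, not part of Aspose.Words.
final class DocSignatureSniffer {
    // OLE2 Compound File Binary signature found at the start of binary .doc files.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    private DocSignatureSniffer() {
    }

    // Returns true if the stream starts with the OLE2 signature.
    // The stream is rewound afterwards so it can still be passed on to a loader.
    static boolean looksLikeBinaryDoc(BufferedInputStream in) throws IOException {
        in.mark(OLE2_MAGIC.length);
        try {
            for (int expected : OLE2_MAGIC) {
                // read() returns -1 at end of stream, which never matches a signature byte
                if (in.read() != expected) {
                    return false;
                }
            }
            return true;
        } finally {
            in.reset();
        }
    }
}
```

With a check like this, the calling code could choose LoadFormat.DOC only when looksLikeBinaryDoc returns true and fall back to LoadFormat.TEXT (or reject the file) otherwise.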

Regards

@tobjo853

Thanks for sharing the details. Please implement the IResourceLoadingCallback interface to control how Aspose.Words loads external resources when importing a document from HTML or MHTML.

Please use the following code example to achieve your requirement. Hope this helps you.

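import com.aspose.words.*;

public class HandleResourceLoading implements IResourceLoadingCallback
{
    // Called by Aspose.Words for each external resource referenced by the document.
    public int resourceLoading(ResourceLoadingArgs args)
    {
        // Skip any resource whose URI ends with ".gzip" so that it is not loaded.
        if (args.getOriginalUri().toString().endsWith(".gzip"))
            return ResourceLoadingAction.SKIP;

        // Load every other resource in the usual way.
        return ResourceLoadingAction.DEFAULT;
    }
}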

LoadOptions loadOptions = new LoadOptions();
loadOptions.setLoadFormat(LoadFormat.DOC);
loadOptions.setResourceLoadingCallback(new HandleResourceLoading());

Document doc = new Document(MyDir + "zipbomb.html.doc", loadOptions);
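Note that the test URI posted earlier in this thread ends with a query string rather than .gzip, so a suffix check might not match it. As a hedged alternative, the decision could be based on the URI scheme instead, so that every remote resource is skipped. The sketch below uses only java.net.URI. The class and method names are hypothetical and are not part of Aspose.Words, and treating http, https and ftp as "remote" is an assumption that may need adjusting.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Aspose.Words.
final class ExternalResourcePolicy {
    private ExternalResourcePolicy() {
    }

    // Returns true when the resource should not be loaded:
    // remote schemes (http, https, ftp), missing URIs, and URIs that cannot be parsed.
    static boolean shouldSkip(String originalUri) {
        if (originalUri == null || originalUri.trim().isEmpty()) {
            return true;
        }
        try {
            URI uri = new URI(originalUri.trim());
            String scheme = uri.getScheme();
            if (scheme == null) {
                // Relative reference such as "images/logo.png": not a remote resource
                return false;
            }
            String s = scheme.toLowerCase(Locale.ROOT);
            return s.equals("http") || s.equals("https") || s.equals("ftp");
        } catch (URISyntaxException e) {
            // Be conservative with anything that is not a well-formed URI
            return true;
        }
    }
}
```

The resourceLoading method of the callback could then return ResourceLoadingAction.SKIP whenever shouldSkip(args.getOriginalUri()) is true and ResourceLoadingAction.DEFAULT otherwise.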

Thanks, the provided solution will probably be good enough!

@tobjo853

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

@tobjo853

Thanks for your patience. We would like to inform you that the issue you are facing is actually not a bug in Aspose.Words, so we have closed this issue (WORDSJAVA-1938) as ‘Not a Bug’.

You can skip the loading of .gzip files using the code example shared in this thread.