Aspose.Words: create a document without images and videos

I want to compare two documents using Aspose.Words for Java, so I first need to create two Document objects. But when the documents are too large (in my case 236M, because they contain pictures and videos), loading them causes an OOM. Is there any way to create a document without any pictures and videos? If so, the new documents would be small enough to process.

I noticed that there are load options and a setTempFile method, but they seem to have no effect in my test case. Why is that, and how can I fix my problem? Any help is appreciated, thanks very much!

@longmenzhitong Aspose.Words always allocates more memory than the actual document size. This is expected. Please see our documentation for more information:
https://docs.aspose.com/words/net/memory-requirements/
To reduce memory usage when processing extremely large documents, you can try using the LoadOptions.TempFolder, SaveOptions.TempFolder and SaveOptions.MemoryOptimization properties.

If you would like to remove all images from the document before processing, you can use the following code:

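// Collect every Shape node in the document, including nested ones.
Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape s : shapes)
{
    // Remove only top-level shapes that hold an image.
    if (s.isTopLevel() && s.hasImage())
        s.remove();
}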

Thanks for your reply. I understand now that Aspose.Words needs several times more memory than the size of the original file. As for removing all shapes, I've tried it, but that only works after the document has been created. Unfortunately, the OOM exception occurs when I try to create the Document object, so I think it must be LoadOptions that fits my case. I also tried TempFolder, but the OOM still occurs and the folder I created is always empty. Do you have any clue? Thanks again! :-)

@longmenzhitong A 236M DOCX document is extremely large for both MS Word and Aspose.Words.
If your document contains OLE objects, you can try specifying LoadOptions.IgnoreOleData. Ignoring OLE data may reduce memory consumption and improve performance without data loss when the destination format does not support OLE objects.

Well, I do not care about the images and videos in the documents I need to compare. For big files I'd rather ignore them, so that I can compare the text content of the Word documents (without an OOM, of course). So is there any way to create a document without loading images and videos?
Best regards, and I look forward to your reply!

@longmenzhitong Unfortunately, there is currently no option to ignore images when loading MS Word documents. We will consider adding such an option. I have logged a feature request as WORDSNET-25699.
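
Until such an option exists, one possible workaround outside Aspose.Words (not something proposed in this thread, and untested against real documents) is to pre-process the file before loading it. A DOCX file is a ZIP package, so it can be copied entry by entry while skipping the media parts. The sketch below uses only java.util.zip. The class name DocxMediaStripper and its methods are hypothetical, and it assumes the pictures and videos are stored under word/media/. The copy will still contain relationships that point to the removed parts, so whether Aspose.Words (or MS Word) accepts the result needs to be verified.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Aspose.Words: copies a DOCX (a ZIP package)
// while skipping the parts stored under word/media/.
final class DocxMediaStripper {

    // Assumption: embedded pictures and videos live under this folder.
    private static final String MEDIA_PREFIX = "word/media/";

    static boolean isMedia(String entryName) {
        return entryName.startsWith(MEDIA_PREFIX);
    }

    // Streams all entries from 'in' to 'out' and returns the number of
    // media entries that were skipped. Both streams are closed on return.
    static int stripMedia(InputStream in, OutputStream out) throws IOException {
        int skipped = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (isMedia(entry.getName())) {
                    skipped++;
                    continue; // the next getNextEntry() call skips the unread data
                }
                // Copy the entry under the same name; data is streamed in
                // small chunks, so the whole file is never held in memory.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = zin.read(buffer)) != -1) {
                    zout.write(buffer, 0, n);
                }
                zout.closeEntry();
            }
        }
        return skipped;
    }
}
```

To use it, open the original file with Files.newInputStream and a new target file with Files.newOutputStream, pass both to stripMedia, and then try loading the smaller copy with Aspose.Words for the comparison.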

All right, I will try another way. Many thanks for your help!
