Out of memory errors

I’m trying to use Aspose.Words 14.4.0.0 to load a large DOCX document and extract text from it, but I’m running into OutOfMemoryErrors (OOMEs). The document has a lot of images and only a few lines of text. When creating the Document object, Aspose.Words tries to load the whole document into memory, and this causes the OOME.


Is there any way of loading the document using streaming? Also, is there any way to avoid loading the images or any other object I’m not interested in when loading the file?

Thank you in advance.

Hi there,

Thanks for your inquiry.

atlassian:
I’m trying to use Aspose.Words 14.4.0.0 to load a large DOCX document and extract text from it, but I’m running into OutOfMemoryErrors (OOMEs). The document has a lot of images and only a few lines of text. When creating the Document object, Aspose.Words tries to load the whole document into memory, and this causes the OOME.

Could you please attach your input Word document and the code you are using, so that I can test it? I will investigate the issue on my side and get back to you with more information.

atlassian:

Is there any way of loading the document using streaming?

Yes, you can load a Word document from a stream. Please see the following documentation page for details:
http://www.aspose.com/docs/display/wordsjava/Opening+from+a+Stream

atlassian:

Also, is there any way to avoid loading the images or any other object I’m not interested in when loading the file?

If you are loading HTML documents, please use the LoadOptions.ResourceLoadingCallback property. This property lets you control how external resources (images, style sheets) are loaded when a document is imported from HTML or MHTML.
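
If only the plain text of a DOCX is needed, one possible workaround is to skip the document model entirely and read the file with the JDK alone. The sketch below is not Aspose.Words API. It assumes the file is a standard (non-Strict) DOCX, i.e. a ZIP package whose body text is stored in the word/document.xml part, and the class name DocxTextExtractor is made up for illustration. It streams through the ZIP, discards image parts without buffering them, and pulls the text runs out of the main part with a StAX parser. Tabs, line breaks, headers, footers and footnotes are ignored, so treat it as a starting point rather than a complete extractor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Aspose.Words.
final class DocxTextExtractor {
    // Assumption: the main document part lives at the conventional location.
    private static final String MAIN_PART = "word/document.xml";
    // WordprocessingML namespace used by standard (Transitional) DOCX files.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxTextExtractor() {
    }

    /** Returns the body text, one line per paragraph. */
    static String extractText(InputStream docx) throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            // Moving to the next entry reads past the previous one and discards
            // its bytes, so image parts are never held in memory as a whole.
            while ((entry = zip.getNextEntry()) != null) {
                if (MAIN_PART.equals(entry.getName())) {
                    return readParagraphs(zip);
                }
            }
        }
        throw new IOException("No " + MAIN_PART + " entry found");
    }

    private static String readParagraphs(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities, since the input may be untrusted.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        StringBuilder text = new StringBuilder();
        boolean inText = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // <w:t> holds the characters of a text run.
                    if (isWordElement(reader, "t")) {
                        inText = true;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (isWordElement(reader, "t")) {
                        inText = false;
                    } else if (isWordElement(reader, "p")) {
                        // End of a paragraph: emit a line break.
                        text.append('\n');
                    }
                } else if (inText && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    private static boolean isWordElement(XMLStreamReader reader, String localName) {
        return W_NS.equals(reader.getNamespaceURI())
                && localName.equals(reader.getLocalName());
    }
}
```

To try it, pass any InputStream over the file, for example DocxTextExtractor.extractText(Files.newInputStream(Paths.get("big.docx"))), where big.docx is a placeholder name. Memory use should then depend mostly on the amount of text rather than on the size of the embedded images, although that is worth confirming against the actual document.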