Document structure causing out of memory exception

praveen1123 · April 25, 2017, 12:21am

Hi, we are using Aspose.Words 7.0.0 to write a document and save it as a PDF file.

Our code is structured as follows:

```java
import com.aspose.words.*;
import java.io.ByteArrayOutputStream;

// templateFile, size and value are defined elsewhere in our application.
Document doc = null;
Document newDoc = new Document(); // starts with one empty section

for (int i = 0; i < size; i++) {
    // The template is loaded (parsed) again on every iteration.
    doc = new Document(templateFile);

    // Code to add data in tables of document //
    Table headingTable = (Table) doc.getChild(NodeType.TABLE, 0, true);
    headingTable.getRows().get(3).getCells().get(3).getFirstParagraph()
            .appendChild(new Run(doc, value));

    Table doaTable = (Table) doc.getChild(NodeType.TABLE, 1, true);
    doaTable.getRows().get(2).getCells().get(1).getFirstParagraph()
            .appendChild(new Run(doc, value));
    // Code to add data in tables of document //

    // Append the filled-in copy (including its section) to the combined document.
    newDoc.appendDocument(doc, ImportFormatMode.USE_DESTINATION_STYLES);
}

// Remove the empty section created by new Document().
Node node = newDoc.getFirstSection();
node.remove();

ByteArrayOutputStream ms = new ByteArrayOutputStream();
newDoc.save(ms, SaveFormat.PDF);
byte[] extractedDocument = ms.toByteArray();
```

When we encountered the OutOfMemoryError, we analyzed the heap dump and found that most of the memory is held by Section objects under the Document object.

The dump shows a tree of Section objects that together take up a large part of the heap, like com.aspose.words.Document → com.aspose.words.Section → com.aspose.words.Section → and so on.

Please find the attachment: aspose MAT analysis.

Could you please let us know whether this structure is expected, or whether something is wrong with the way we are using the API?

tahir.manzoor · April 25, 2017, 4:56am

Hi Praveen,

Thanks for your inquiry. A valid Document needs at least one Section, and a valid Section needs a Body with at least one Paragraph. You are appending multiple documents to another Document, so the final document will have multiple sections. Please read the following article:

Aspose.Words Document Object Model
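To see this in your own output, you can check the section count after appending. Below is a minimal sketch that uses blank documents as stand-ins for your filled-in template (`SectionCountSketch` and `sectionsAfterAppending` are hypothetical names). Each `appendDocument` call adds the source document's sections, so the count grows with every iteration of your loop.

```java
import com.aspose.words.Document;
import com.aspose.words.ImportFormatMode;

public class SectionCountSketch {
    // Appends `copies` blank documents to a new document and returns its section count.
    static int sectionsAfterAppending(int copies) throws Exception {
        Document newDoc = new Document(); // a blank document already has one section
        for (int i = 0; i < copies; i++) {
            Document part = new Document(); // stand-in for the filled-in template
            newDoc.appendDocument(part, ImportFormatMode.USE_DESTINATION_STYLES);
        }
        return newDoc.getSections().getCount();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sectionsAfterAppending(3));
    }
}
```

In your code, after the initial empty section is removed, the count should equal `size` if the template has a single section.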

If you are loading the same document in the following line of code, we suggest moving it before the for loop and appending only the tables to the main document.

```java
doc = new Document(templateFile);
```

Moreover, we suggest you use the latest version of Aspose.Words for Java (17.4). Hope this helps.
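A minimal sketch of that suggestion follows. It assumes, as in your code, that the first two tables of the template are the ones being filled, and that the template is passed in already loaded; `AppendTablesSketch` and `buildPdf` are hypothetical names. The template is parsed once, its tables are copied into a single section of the combined document with `NodeImporter`, and no extra Section objects are created per record. Note that, unlike appending whole documents, this does not start each record on a new page; if you need that, it has to be added separately. We have not measured how much memory this saves for your documents.

```java
import com.aspose.words.Body;
import com.aspose.words.Document;
import com.aspose.words.ImportFormatMode;
import com.aspose.words.NodeImporter;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;
import com.aspose.words.Run;
import com.aspose.words.SaveFormat;
import com.aspose.words.Table;
import java.io.ByteArrayOutputStream;

public class AppendTablesSketch {
    // Hypothetical helper: the template is loaded once by the caller; only its
    // first two tables are copied, filled and appended to one section of newDoc.
    static byte[] buildPdf(Document template, String[] values) throws Exception {
        Document newDoc = new Document(); // one section, body with one empty paragraph
        Body body = newDoc.getFirstSection().getBody();
        NodeImporter importer =
                new NodeImporter(template, newDoc, ImportFormatMode.USE_DESTINATION_STYLES);
        Table srcHeading = (Table) template.getChild(NodeType.TABLE, 0, true);
        Table srcDoa = (Table) template.getChild(NodeType.TABLE, 1, true);

        for (String value : values) {
            // importNode copies each table (with its children) into newDoc
            Table heading = (Table) importer.importNode(srcHeading, true);
            heading.getRows().get(3).getCells().get(3).getFirstParagraph()
                    .appendChild(new Run(newDoc, value));
            Table doa = (Table) importer.importNode(srcDoa, true);
            doa.getRows().get(2).getCells().get(1).getFirstParagraph()
                    .appendChild(new Run(newDoc, value));

            body.appendChild(heading);
            body.appendChild(new Paragraph(newDoc)); // keeps consecutive tables separate
            body.appendChild(doa);
            body.appendChild(new Paragraph(newDoc)); // a body must end with a paragraph
        }

        ByteArrayOutputStream ms = new ByteArrayOutputStream();
        newDoc.save(ms, SaveFormat.PDF);
        return ms.toByteArray();
    }
}
```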

praveen1123 · April 27, 2017, 1:50am

Hi Tahir,

Thanks for your reply. Can you also let me know by roughly what percentage memory usage would drop if I implement it the way you suggested?

Also, I found this post by Alexey in one of the forum threads:

Now let me explain why Aspose.Words uses more memory than the document size. After loading, a document is stored in memory as a DOM (Document Object Model). If the document contains mostly text, Aspose.Words requires approximately 40 times more memory than the original DOCX file size (10 times more than the DOC file size). So in your case, if your DOCX document is 20 MB, you need about 800 MB of memory just to load it. Then, when you save the document to PDF, Aspose.Words needs to build the document layout, which is also stored in memory. So I think converting such a huge document to PDF needs approximately 2 GB of available memory.

By the way, even MS Word does not like such large documents.
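Applying those rough factors to the 20 MB example gives the following. This is only a small sketch of the arithmetic: the factors are the approximations from Alexey's post, not measured values, and `DomMemoryEstimate` is a name I made up.

```java
public class DomMemoryEstimate {
    // Approximate multipliers quoted above for mostly-text documents.
    static final int DOCX_FACTOR = 40;
    static final int DOC_FACTOR = 10;

    // Estimated heap (MB) needed just to hold the DOM, before any PDF layout.
    static long domMegabytes(long fileMegabytes, int factor) {
        return fileMegabytes * factor;
    }

    public static void main(String[] args) {
        System.out.println("DOM for a 20 MB DOCX: ~" + domMegabytes(20, DOCX_FACTOR) + " MB");
        // The post estimates ~2 GB in total once the PDF layout is also built,
        // i.e. roughly 2000 / 20 = 100 times the input size.
        System.out.println("Total / input ratio: ~" + (2000 / 20));
    }
}
```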

Can you tell me: if I use an RTF file as my input, will it still occupy 10 to 40 times the size of the original file in memory? And when I convert it to PDF, will it occupy even more (i.e. about 100 times the size of the original file in the heap, as per the above)?

Thanks in advance; I look forward to your reply.

tahir.manzoor · April 27, 2017, 10:39am

Hi Praveen,

Thanks for your inquiry.

praveen1123:

Can you also let me know by roughly what percentage memory usage would drop if I implement it the way you suggested?

If you load the same document 20 times inside the for loop, more memory is consumed. You only need to load the document once, before the for loop. However, the actual saving depends on your requirements and use cases.

praveen1123:

Can you tell me: if I use an RTF file as my input, will it still occupy 10 to 40 times the size of the original file in memory? And when I convert it to PDF, will it occupy even more (i.e. about 100 times the size of the original file in the heap, as per the above)?

Please note that performance and memory usage depend on the complexity and size of the documents you are generating. When rendering a document to a fixed-page format (e.g. PDF), Aspose.Words needs to build two models in memory: one for the document and another for the rendered document.

In terms of memory, Aspose.Words does not impose any limits. If you load huge Word documents into the Aspose.Words DOM, more memory is required, because during processing the whole document must be held in memory. Usually, Aspose.Words needs about 10 times more memory than the original document size to build the DOM.

Setting the SaveOptions.MemoryOptimization option to true can significantly reduce memory consumption when saving large documents, at the cost of a slower save.
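In Java, this option is set with `setMemoryOptimization` on the save options object. A minimal sketch, assuming a version that has this option (7.0.0 may not; please check the API reference for the version you use). `MemoryOptimizedSave` and `saveAsPdf` are hypothetical names; in your code, the call would replace `newDoc.save(ms, SaveFormat.PDF)`.

```java
import com.aspose.words.Document;
import com.aspose.words.PdfSaveOptions;
import java.io.ByteArrayOutputStream;

public class MemoryOptimizedSave {
    // Saves the document to PDF with memory optimization enabled.
    static byte[] saveAsPdf(Document doc) throws Exception {
        PdfSaveOptions options = new PdfSaveOptions();
        options.setMemoryOptimization(true); // lower peak memory, slower save
        ByteArrayOutputStream ms = new ByteArrayOutputStream();
        doc.save(ms, options);
        return ms.toByteArray();
    }
}
```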

When the document is no longer referenced by your code, all of its DOM data is purged from memory during a subsequent garbage collection cycle. Please note that the memory may not be returned to the operating system until you close the application.