OutOfMemoryError converting a DOC file to PDF

nollymar · August 8, 2011, 11:15am

Hi, I am trying to save a very long DOC file (4,350 pages) as PDF using PdfSaveOptions. When I call getPageCount() to get the document size, it takes a very long time to run (more than 30 minutes with 6 GB assigned to the JVM) and always crashes with this exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.aspose.words.xl.QD(LprBorder.java:29)
at com.aspose.words.xk.hashCode(LprBase.java:40)
at com.aspose.words.xl.<init>(LprBorder.java:23)
at com.aspose.words.xm.<init>(LprBorders.java:140)
at com.aspose.words.n.a(AttributeConverter.java:502)
at com.aspose.words.n.c(AttributeConverter.java:316)
at com.aspose.words.aqr.ag(SpanGenerator.java:84)
at com.aspose.words.fq.AP(DocumentSpanConverter.java:347)
at com.aspose.words.fq.moveNext(DocumentSpanConverter.java:149)
at com.aspose.words.fq.AH(DocumentSpanConverter.java:203)
at com.aspose.words.vk.a(LayoutDocument.java:37)
at com.aspose.words.Document.updatePageLayout(Document.java:1399)
at com.aspose.words.Document.ab(Document.java:1332)
at com.aspose.words.Document.zy(Document.java:1345)
at com.aspose.words.Document.getPageCount(Document.java:1380)

Is there a more efficient way to get the page count of a document? I am using the latest version of Aspose.Words.

Thanks in advance!

AndreyN · August 8, 2011, 1:19pm

Hello
Thanks for your request. Memory usage depends on the document's size, format and content. Aspose.Words usually needs several times more memory than the size of the document file to build the document model in memory.
I would also like to point out that producing huge MS Word documents is not a good practice. MS Word does not handle huge documents well: it usually takes a long time to open them, and sometimes MS Word simply hangs. A normal MS Word document is 100 to 200 pages long.
In the meantime, the only way to process really big documents is to give your Java virtual machine more heap space (the -Xmx option). Aspose.Words uses a lot of memory while loading the document, but once you have finished processing it, all of that memory is released and garbage-collected quickly, so the high memory use is only a short spike.
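To act on this before loading a document, you could compare the file size, multiplied by an expansion factor, against the heap the JVM was actually started with (set with -Xmx, e.g. java -Xmx6g ...). The sketch below uses only Runtime.maxMemory() from the standard library. The class name and the factor of 10 are hypothetical placeholders, not figures published by Aspose; the real ratio has to be measured on your own documents.

```java
import java.io.File;

// Hypothetical pre-check: estimates whether a document is likely to fit in the heap
// before it is handed to Aspose.Words. ASSUMED_EXPANSION_FACTOR is an assumption to be
// calibrated on your own documents; it is not a value published by Aspose.
class HeapPrecheck {

    /** Assumed ratio of in-memory model size to file size (placeholder value). */
    static final double ASSUMED_EXPANSION_FACTOR = 10.0;

    /** Returns true if fileSizeBytes * factor stays below the given heap limit. */
    static boolean likelyFits(long fileSizeBytes, double factor, long maxHeapBytes) {
        double estimatedBytes = fileSizeBytes * factor;
        return estimatedBytes < maxHeapBytes;
    }

    /** Checks a file against the JVM's configured maximum heap (the -Xmx value). */
    static boolean likelyFits(File file) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        return likelyFits(file.length(), ASSUMED_EXPANSION_FACTOR, maxHeap);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", maxHeap / (1024 * 1024));
        if (args.length > 0) {
            File doc = new File(args[0]);
            System.out.println(doc + (likelyFits(doc) ? " is likely to fit" : " may not fit"));
        }
    }
}
```

Because the factor is only an estimate, this can reject some documents that would have fit (or accept some that will not), but it answers in milliseconds instead of after a long GC struggle.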
Best regards,

alegr44 · August 15, 2011, 1:47pm

Hello Andrey,

I have exactly the same problem. Unfortunately, our business requires us to handle documents like this. Is there another way to get the page count of a Word document? We need this number not only to convert the document to PDF, but also to stop the system from processing documents that are too big and cause OutOfMemoryErrors.

Do you have another solution, such as splitting a Word document into several documents of, for example, 200 pages each?
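Since the document here is generated by the application, one option is to split the source data before anything is generated, so that each output document stays at a manageable size. The sketch below only partitions a list of rows into fixed-size batches; RowBatcher and the batch size are hypothetical, and how many rows correspond to roughly 200 pages has to be measured on your own template. Each batch would then go to whatever code currently builds and saves the document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the rows that would go into one huge document
// into batches, so each batch can be rendered as a separate, smaller document.
class RowBatcher {

    /** Splits rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the subList view so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add("row " + i);
        }
        // 4 rows per document gives batches of 4, 4 and 2 rows.
        for (List<String> batch : partition(rows, 4)) {
            System.out.println(batch);
        }
    }
}
```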

Any workaround is welcome, but please don't tell me to add more memory; my concern is how long it takes before the exception is thrown. If we could get the number of pages of a document, we could manage the conversion, or even tell the user that the document is too big, without waiting for the OutOfMemoryError.

Thanks very much in advance.

Best regards

Alejandro

AndreyN · August 15, 2011, 2:09pm

Hi Alejandro,
Thanks for your inquiry. Could you please attach the document you are having problems with? I will check it on my side and provide you with more information.
Best regards,

alegr44 · August 15, 2011, 2:31pm

Hi Andrey,

Thanks for your quick answer. The problem is that the document is generated dynamically by the application, and when we try to save it we get the OutOfMemoryError (it has 2,000 pages). What sounds a little crazy is that we can generate the PDF of this same document.

With more than 2,000 pages it is not possible (we can't get the page count) and the application crashes. We are willing to set a maximum number of pages; a document of more than 2,000 pages is kind of crazy anyway. But right now, with Aspose, we cannot tell whether a document has more than 2,000 pages, because getPageCount() is the very method that throws the exception. Do you know any other way to get the page count?

The document is basically one big table with 2 columns. We have already considered every way of modifying the template (multiple columns per page, small font, single spacing, etc.).
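Because the document is a single two-column table built by the application, the number of rows is known before Aspose.Words lays out a single page. Under that assumption, a cheap guard could estimate the page count from the row count and refuse (or warn the user) above a limit, instead of waiting for getPageCount() to run out of memory. ROWS_PER_PAGE and MAX_PAGES are hypothetical values: rows per page depends on the template and on how much text each cell holds, so the result is only an approximation.

```java
// Hypothetical guard: estimates the page count of a generated one-table document
// from its row count, so oversized requests can be rejected before generation.
class PageLimitGuard {

    /** Assumed average number of table rows per page (measure on your own template). */
    static final int ROWS_PER_PAGE = 40;

    /** Assumed maximum number of pages the application is willing to produce. */
    static final int MAX_PAGES = 2000;

    /** Ceiling division: a partially filled last page still counts as a page. */
    static int estimatePages(int rowCount, int rowsPerPage) {
        if (rowsPerPage <= 0) {
            throw new IllegalArgumentException("rowsPerPage must be positive");
        }
        return (rowCount + rowsPerPage - 1) / rowsPerPage;
    }

    /** Throws IllegalStateException if the estimated page count exceeds maxPages. */
    static void checkSize(int rowCount, int rowsPerPage, int maxPages) {
        int estimated = estimatePages(rowCount, rowsPerPage);
        if (estimated > maxPages) {
            throw new IllegalStateException("Document too large: about " + estimated
                    + " pages, limit is " + maxPages);
        }
    }

    public static void main(String[] args) {
        int rows = 100_000;
        System.out.println("Estimated pages: " + estimatePages(rows, ROWS_PER_PAGE));
        try {
            checkSize(rows, ROWS_PER_PAGE, MAX_PAGES);
            System.out.println("OK to generate");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```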

If you need any other information, please let me know.

Thanks in advance for your help

Best regards

Alejandro

adam.skelton · August 15, 2011, 6:16pm

Hi Alejandro,
Thanks for your inquiry.
I’m afraid calculating the number of pages of a document requires building the page layout in memory. With very large documents this of course takes a while and may even pose the risk of the application running out of memory in the process.
If you only need an approximate number of pages, you may want to use the code below instead, which predicts it from the number of paragraphs in the document. You will need to calculate the magic number (the average number of paragraphs per page) on your side, based on an existing document that you have generated.

int paragraphCount = doc.getChildNodes(NodeType.PARAGRAPH, true).getCount();
int predictedPageCount = (paragraphCount + magicNumber - 1) / magicNumber; // round up so a partial last page counts
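One way to derive the magic number, sketched under the assumption that you have a representative sample document whose real page count is already known (for example from a smaller run where getPageCount() did complete): take the ratio of its paragraph count to its page count and reuse it for new documents. The paragraph counts would come from the getChildNodes(NodeType.PARAGRAPH, true).getCount() call above. PageCountEstimator and the sample figures in main are hypothetical, and the sketch is plain Java so it does not depend on any Aspose API.

```java
// Hypothetical calibration of the "magic number" (paragraphs per page) from a sample
// document whose real page count is known, then used to predict other page counts.
class PageCountEstimator {

    private final long sampleParagraphs;
    private final long samplePages;

    /** Calibrates from a sample: its paragraph count and its actual page count. */
    PageCountEstimator(long sampleParagraphs, long samplePages) {
        if (sampleParagraphs <= 0 || samplePages <= 0) {
            throw new IllegalArgumentException("sample counts must be positive");
        }
        this.sampleParagraphs = sampleParagraphs;
        this.samplePages = samplePages;
    }

    /** The calibrated magic number, for display only. */
    double paragraphsPerPage() {
        return (double) sampleParagraphs / samplePages;
    }

    /** ceil(paragraphCount * samplePages / sampleParagraphs), in exact integer arithmetic. */
    long predictPages(long paragraphCount) {
        return (paragraphCount * samplePages + sampleParagraphs - 1) / sampleParagraphs;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 6,000 paragraphs that rendered to 150 pages.
        PageCountEstimator estimator = new PageCountEstimator(6000, 150);
        System.out.println("Paragraphs per page: " + estimator.paragraphsPerPage());
        System.out.println("Predicted pages: " + estimator.predictPages(100_000));
    }
}
```

Integer arithmetic is used for the prediction so that ratios such as 1000/3 paragraphs per page do not pick up floating-point rounding errors when the result is rounded up.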

Thanks,