Free Support Forum - aspose.com

Out of memory exception while converting a large file (50 MB) to HTML

Hi,

We have just purchased a license for Aspose.Total from tk20.com. We need to convert large DOC/DOCX files to HTML, but we currently get a java.lang.OutOfMemoryError: Java heap space error while converting a 50 MB file. We may have to convert files of 100~150 MB to HTML.

Please help us ASAP.


Thanks,

Ajay

Hi Ajay,


Thanks for your inquiry. Please note that Aspose.Words usually needs several times more memory than the document size to build a model of the document in memory. For example, if your document's size is 1 MB, Aspose.Words needs 10-20 MB of RAM to build its DOM in memory. The multiplier depends on the format, because some formats are more compact than others: DOCX is more compact than DOC, and DOC is more compact than RTF. The exception occurs when that much memory is not available. So I am not sure the issue you reported can be resolved in Aspose.Words. I would advise you to use several small documents instead of one huge document.
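As a rough illustration of the rule of thumb above, the following sketch estimates the memory a document might need and compares it with the heap the JVM can still allocate. It uses only the standard `Runtime` API; the class name `HeapEstimate`, its method names, and the choice of 20 as the multiplier are assumptions made for this example, not part of Aspose.Words. The 10-20x figure is a rule of thumb rather than a guarantee, so a check like this can only reject obviously oversized files.

```java
// Pre-flight check based on the 10-20x rule of thumb (hypothetical helper, not an Aspose API).
class HeapEstimate {

    // Estimated bytes needed to hold the document model: file size times a multiplier.
    static long estimateBytes(long fileSizeBytes, int multiplier) {
        return Math.multiplyExact(fileSizeBytes, (long) multiplier);
    }

    // Heap the JVM could still hand out: maximum heap minus what is currently in use.
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    // True when the estimate fits into the given amount of available memory.
    static boolean likelyFits(long fileSizeBytes, int multiplier, long availableBytes) {
        return estimateBytes(fileSizeBytes, multiplier) <= availableBytes;
    }

    public static void main(String[] args) {
        long fileSize = 50L * 1024 * 1024; // the 50 MB document from the question
        int multiplier = 20;               // upper end of the quoted range (assumption)
        long available = availableHeapBytes();
        System.out.printf("estimated need: %d bytes, available: %d bytes, likely fits: %b%n",
                estimateBytes(fileSize, multiplier), available,
                likelyFits(fileSize, multiplier, available));
    }
}
```

If the check fails, the application can refuse the file or queue it for a JVM started with a larger `-Xmx` value instead of risking an OutOfMemoryError.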

Best regards,

Hi,

We don't have control over the size of the files used, so could you provide a brief chart showing, for each document type (ppt/pptx/doc/docx/xls/xlsx/pdf), how many times the file size may be required in memory to convert it to HTML? For example: for docx, at most 20 times the size of the file. We need this for our project so that we can ensure the JVM never crashes during conversion.


Thanks,

Ajay

Hi Ajay,

The limit of up to 20 times the original document size holds for Word and Excel documents, but unfortunately there is no such limit for PDF and PowerPoint documents. For PDF and PowerPoint documents, memory consumption can grow with the number of slides, shapes, images and other content, and at the moment we are not in a position to guarantee how much it will increase.

The following links describe how to further reduce memory consumption for Excel documents.

https://blog.aspose.com/2014/04/24/optimize-memory-for-existing-worksheets-with-aspose.cells-for-java-8.0.1

http://www.aspose.com/docs/display/cellsjava/Optimizing+Memory+Usage+while+Working+with+Big+Files+having+Large+Datasets

Best Regards,

Hi,

We bought an Aspose.Total for Java license 3 weeks ago. We have a docx file of about 50 MB. Before starting its conversion to HTML, I checked that the free memory available to the JVM was approx. 1667 MB, but the conversion still failed with an OutOfMemoryError (Java heap space). Could you please explain how much memory is required to process the file? Please respond ASAP. I am not able to upload the file to your server.

Thanks,

Ajay

Hi Ajay,

Maybe your file is a bit complex and requires more memory. You can use third-party storage such as DropBox or Windows Live and share the link in a private message. Please check http://www.aspose.com/corporate/purchase/faqs/send-license-to-aspose-staff.aspx for more details on how to send a private message.

Best Regards,

Hi Muhammad,

We have just uploaded a 10 MB file containing some text and some images. After a long time the file was finally processed, but its HTML was 118 MB, and the images and CSS came to approx. 10 MB on top of that. We are not even able to open it in a browser.

Could you explain why the HTML is 110 MB? Is it due to the images or the text? I have attached a very raw file which was used to test the conversion functionality on larger files.



Thanks,

Ajay

Hi,

I analyzed the memory consumption of Aspose.Words for Java on a 64-bit Linux system. Below are the details for a docx to HTML conversion of a sample file of ~10 MB (the one I attached to the previous post):

Used memory (bytes)    SET 1            SET 2            SET 3
Before conversion      15,349,040       16,003,640       16,003,704
After conversion       1,176,999,336    1,266,575,320    1,320,443,304

*All figures are in bytes.

This means a docx file may require up to roughly 130 times its own size in memory.
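For reference, the ratio quoted above can be reproduced from the reported figures with a few lines of Java. The sketch below also shows one common way to read used heap from `Runtime`; the class name `MemoryRatio` is hypothetical, and the file size is taken as exactly 10,000,000 bytes, which is an assumption because the post only says ~10 MB.

```java
// Reproduces the memory-to-file-size ratio from the figures in the post (hypothetical helper).
class MemoryRatio {

    // Heap currently in use by the JVM: allocated heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Extra memory consumed by the conversion, expressed as a multiple of the file size.
    static double ratio(long usedBefore, long usedAfter, long fileSizeBytes) {
        return (usedAfter - usedBefore) / (double) fileSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 10_000_000L; // "~10 MB" taken as 10^7 bytes (assumption)
        long[] before = {15_349_040L, 16_003_640L, 16_003_704L};
        long[] after = {1_176_999_336L, 1_266_575_320L, 1_320_443_304L};
        for (int i = 0; i < before.length; i++) {
            System.out.printf("SET %d: %.1fx the file size%n",
                    i + 1, ratio(before[i], after[i], fileSize));
        }
        System.out.println("used heap right now: " + usedHeapBytes() + " bytes");
    }
}
```

Under that assumption the three runs come out at roughly 116, 125 and 130 times the file size. Figures read this way include garbage that has not been collected yet, so they are an upper bound on what the conversion actually needs.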

After this I tried another file of ~13 MB, but I was not able to convert it because of the heap error. I am attaching the file(s) to this reply. Please reply ASAP, as this problem is blocking us.


Thanks,

Ajay

Hi Ajay,

I used the latest version of Aspose.Words for Java (i.e. 14.5.0); the output HTML file (including all resources) was just under 65 MB and browsers loaded that HTML without any issue. Please try with the latest version.

If we convert the same file to HTML using MS Word, the size of the output (including all resources) is 30 MB. There is a lot of text in your input file, and that might be the reason for the large output size. A new issue to improve performance and reduce the size of the output HTML has been logged in our issue tracking system as WORDSNET-10285. We will keep you updated on this issue in this thread.

Sorry for the inconvenience.

Best Regards,

Hi,

I am not able to open the link you provided for optimizing memory for existing worksheets.

https://blog.aspose.com/2014/04/24/optimize-memory-for-existing-worksheets-with-aspose.cells-for-java-8.0.1


Thanks,

Ajay



Hi Ajay,

We will check the issue with that post. The same instructions are available in the documentation topic at http://www.aspose.com/docs/display/cellsjava/Optimizing+Memory+Usage+while+Working+with+Big+Files+having+Large+Datasets

You can follow these instructions.

Best Regards,

Hi Ajay,

Regarding issue WORDSNET-10285, Aspose.Words uses inline CSS by default (options.setCssStyleSheetType(CssStyleSheetType.INLINE)). You can use embedded CSS (options.setCssStyleSheetType(CssStyleSheetType.EMBEDDED)) to reduce the size of the output HTML document. With this option, the output HTML document takes around 9 MB.
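To see why the choice between inline and embedded CSS affects output size so strongly, the following self-contained sketch builds the same paragraphs twice: once with the full style repeated in every `style` attribute, and once with a single rule in a `<style>` block referenced by class. It does not call Aspose.Words and does not reproduce its actual output; the class name `CssSizeDemo`, the style string and the `p1` class name are made up for the illustration.

```java
// Compares HTML size with per-element inline styles vs. one embedded style rule.
// Illustration only: this is not Aspose.Words output, and STYLE is an invented example.
class CssSizeDemo {

    static final String STYLE =
            "font-family:Calibri; font-size:11pt; margin:0pt 0pt 8pt; line-height:108%";

    // Every paragraph carries the complete style in its own attribute.
    static String inlineHtml(int paragraphs) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < paragraphs; i++) {
            sb.append("<p style=\"").append(STYLE).append("\">text</p>");
        }
        return sb.append("</body></html>").toString();
    }

    // The style is written once in the head; paragraphs only reference it by class.
    static String embeddedHtml(int paragraphs) {
        StringBuilder sb = new StringBuilder("<html><head><style>.p1{")
                .append(STYLE).append("}</style></head><body>");
        for (int i = 0; i < paragraphs; i++) {
            sb.append("<p class=\"p1\">text</p>");
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        int n = 100_000; // a text-heavy document has many paragraphs and runs
        System.out.println("inline:   " + inlineHtml(n).length() + " chars");
        System.out.println("embedded: " + embeddedHtml(n).length() + " chars");
    }
}
```

The inline variant grows by the full style string for every element, while the embedded variant pays for it once, which is consistent with a text-heavy document shrinking considerably when embedded CSS is selected.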

Best Regards,