Excel to PDF in Java - Aspose.PDF Throws OutOfMemoryError

Hi Aspose Team,

We are trying to convert Excel to Word, and it's taking too much time. We have a PDF of about 15 MB containing a table with about 10,000 rows.

Code snippet:

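import java.io.ByteArrayOutputStream;
import java.io.File;
import com.aspose.pdf.Document;

File file = new File(path_to_excel);
Document finalOutput = new Document();
ByteArrayOutputStream bOutputStream = new ByteArrayOutputStream();
finalOutput.save(bOutputStream, com.aspose.pdf.SaveFormat.DocX); // PDF -> DOCX, kept in memory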

Is there any alternative to this?

Also, with more rows and columns (e.g., 10,000 rows and 28 columns), it just gets stuck and never completes.

We are using Aspose.Cells version 19.5.
Thanks,
Harish

@HThagunna,

Thanks for the details.

I guess you are using the Aspose.PDF API for PDF to Word conversion; correct me if I am wrong. I am moving your thread to the Aspose.PDF forum, where one of my colleagues will be able to help you better.

PS. Please share your sample PDF document for evaluation. Also, in case you are using Aspose.Cells APIs, kindly share your template MS Excel file.

Sorry, it is Excel to PDF. I have attached the Excel file.

censusReport (1).xlsx.zip (4.7 MB)

@HThagunna,

Thanks for the template file.

I evaluated your scenario using the following sample code with your template file and our latest version, Aspose.Cells for Java v19.8 (please try it). It works and takes 45-50 seconds to complete on my end. Your file has lots of formulas and data in two sheets, so, considering that, the time cost is acceptable.

Sample code:

import com.aspose.cells.Workbook;

long startTime = System.currentTimeMillis();

// Load the Excel workbook and save it as PDF with Aspose.Cells
Workbook workbook = new Workbook("f:\\files\\censusReport (1).xlsx");
workbook.save("f:\\files\\out1.pdf");

long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(elapsedTime); // elapsed time in milliseconds

System.out.println("Done");

Please try converting the attached Excel file to DOCX. Our actual use case is Excel to Word conversion.

Thanks,
Harish

@HThagunna,

It looks like you are converting Excel to PDF (using the Aspose.Cells APIs) and then PDF to DOCX (using the Aspose.PDF API); correct me if I am wrong. I am afraid we are only responsible for the Excel to PDF conversion, which works fine and takes only 45-50 seconds. That is acceptable, considering the output has a long list of pages full of formulas and data.

We are converting Excel to Word, and it is taking a long time. Isn't this conversion within Aspose's scope?
If not, are there any alternatives to achieve this?

Thanks,
Harish

@HThagunna,

How do you convert an Excel file to a Word document using the Aspose.Cells and Aspose.Words APIs? Do you use a utility written by the Aspose.Words team, or your own custom code?

There is no direct way to convert Excel to Word using Aspose APIs. As mentioned in our previous reply, you can try the following two-step approach, which works better:

  1. Convert the MS Excel file to PDF using the Aspose.Cells APIs.
  2. Convert the output PDF (produced by Aspose.Cells) to a Word document using the Aspose.PDF API.
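The two steps above can be sketched as a small pipeline that passes the intermediate PDF through a temporary file instead of an in-memory byte array. This is a minimal sketch under assumptions: `TwoStepConversion`, `FileConverter` and `excelToWord` are hypothetical names (not Aspose APIs), and the actual conversion calls are passed in as lambdas so the control flow can be shown with the JDK alone.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoStepConversion {
    // Hypothetical abstraction: one conversion step reads an input file and writes an output file.
    @FunctionalInterface
    interface FileConverter {
        void convert(Path input, Path output) throws Exception;
    }

    // Step 1 writes an intermediate PDF to a temp file; step 2 converts that PDF to the final document.
    static void excelToWord(Path xlsx, Path docx,
                            FileConverter excelToPdf, FileConverter pdfToDocx) throws Exception {
        Path pdf = Files.createTempFile("intermediate-", ".pdf");
        try {
            excelToPdf.convert(xlsx, pdf);   // e.g. done with Aspose.Cells
            pdfToDocx.convert(pdf, docx);    // e.g. done with Aspose.PDF
        } finally {
            Files.deleteIfExists(pdf);       // remove the intermediate PDF either way
        }
    }
}
```

With Aspose.Cells and Aspose.PDF on the classpath, the two lambdas could plausibly be `(in, out) -> new com.aspose.cells.Workbook(in.toString()).save(out.toString(), com.aspose.cells.SaveFormat.PDF)` and `(in, out) -> new com.aspose.pdf.Document(in.toString()).save(out.toString(), com.aspose.pdf.SaveFormat.DocX)`; please verify these overloads against the library versions in use. Writing the intermediate PDF to disk avoids holding both the `ByteArrayOutputStream` buffer and the copy returned by `toByteArray()` on the heap at once; it does not by itself reduce the memory the DOCX writer needs.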

Thanks for the reply. We are converting Excel to PDF and then to Word.

Could you please take a look at the code we are using to convert Excel to Word:

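import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import com.aspose.cells.Workbook;

// Aspose.PDF document that collects the converted pages
com.aspose.pdf.Document finalOutput = new com.aspose.pdf.Document();
// Step 1: Excel -> PDF with Aspose.Cells, held in memory
Workbook workbook = new Workbook("excel file with large number of rows with formula");
ByteArrayOutputStream dstStream = new ByteArrayOutputStream();
workbook.save(dstStream, com.aspose.cells.SaveFormat.PDF);
// Step 2: reload the PDF bytes with Aspose.PDF and append its pages
ByteArrayInputStream srcStream = new ByteArrayInputStream(dstStream.toByteArray());
com.aspose.pdf.Document tempDocument = new com.aspose.pdf.Document(srcStream);
finalOutput.getPages().add(tempDocument.getPages());
// Step 3: PDF -> DOCX with Aspose.PDF
ByteArrayOutputStream bOutputStream = new ByteArrayOutputStream();
if ("word".equals(formats)) {
    finalOutput.save(bOutputStream, com.aspose.pdf.SaveFormat.DocX); // this part is taking time
}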

Also, with larger data we get the following error:

java.lang.OutOfMemoryError: Java heap space
at com.aspose.pdf.internal.ms.System.IO.l1j.lI(Unknown Source)
at com.aspose.pdf.internal.ms.System.IO.l1j.lj(Unknown Source)
at com.aspose.pdf.internal.ms.System.IO.l1j.write(Unknown Source)
at com.aspose.pdf.internal.ms.System.IO.l2n.lu(Unknown Source)
at com.aspose.pdf.internal.ms.System.IO.l2n.le(Unknown Source)
at com.aspose.pdf.internal.ms.System.IO.l2n.lI(Unknown Source)
at com.aspose.pdf.internal.l71h.l46v.lI(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lf(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lI(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lf(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lI(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lf(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lI(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lf(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lI(Unknown Source)
at com.aspose.pdf.internal.l71h.l28l.lf(Unknown Source)
at com.aspose.pdf.internal.l71h.l26l.lI(Unknown Source)
at com.aspose.pdf.internal.l71h.l26l.lj(Unknown Source)
at com.aspose.pdf.internal.doc.ml.OpenXmlDocumentWriter.lI(Unknown Source)
at com.aspose.pdf.internal.doc.ml.OpenXmlDocumentWriter.writeDocument(Unknown Source)
at com.aspose.pdf.internal.doc.ml.DocxConverter.convert(Unknown Source)
at com.aspose.pdf.internal.l95n.le.lt(Unknown Source)
at com.aspose.pdf.internal.l0n.lf.lI(Unknown Source)
at com.aspose.pdf.l4if.lI(Unknown Source)
at com.aspose.pdf.ADocument.lj(Unknown Source)
at com.aspose.pdf.ADocument.lj(Unknown Source)
at com.aspose.pdf.ADocument.lf(Unknown Source)
at com.aspose.pdf.Document.lf(Unknown Source)
at com.aspose.pdf.ADocument.save(Unknown Source)
at com.aspose.pdf.Document.save(Unknown Source)
at com.aspose.pdf.IDocument$save$0.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)

@HThagunna,

Thanks for providing further details.

Judging by your error trace, the issue seems to be on the Aspose.PDF end, so I am moving the thread to the Aspose.PDF forum, where one of our colleagues will evaluate it and help you.

@HThagunna

Would you please try increasing the Java heap size by changing the values of the -Xms and -Xmx parameters? If the issue still persists, please share your sample input document with us. We will test the scenario in our environment and address it accordingly.
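To confirm that a changed -Xms/-Xmx setting actually reached the JVM that runs the conversion, a small check such as the following can be run with the same options. It is a sketch using only `java.lang.Runtime`; the class name `HeapCheck` is hypothetical, and `maxMemory()` only approximates the -Xmx value, since some garbage collectors report somewhat less.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes (rounded down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper limit the heap may grow to (roughly the -Xmx value)
        System.out.println("Max heap (MiB):   " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM (starts near the -Xms value)
        System.out.println("Total heap (MiB): " + toMiB(rt.totalMemory()));
        // Unused part of the currently reserved heap
        System.out.println("Free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

For example, `java -Xms256m -Xmx1536m HeapCheck` should report a maximum at or somewhat below 1536 MiB. Under Tomcat, such flags are commonly set through `CATALINA_OPTS` in `bin/setenv.sh`.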

I have already attached the Excel input in this thread. Could you please check it? Our concern is this: with the same memory and the same number of rows in Excel, the PDF is saved within seconds without consuming much memory, but the opposite happens when converting it to Word.

Thanks,
Harish

@HThagunna

Would you please share your complete environment details, i.e., OS name and version, installed memory, JDK version, application type, Java heap size, etc.? We will then proceed to assist you accordingly.
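Most of the details requested above can be gathered from inside the JVM itself. The following sketch uses only JDK APIs (`System.getProperty` and `ManagementFactory.getRuntimeMXBean().getInputArguments()`); the class name `EnvironmentReport` is hypothetical. Installed RAM is not exposed by the standard Java API, so it is left out (on Linux, `free -m` reports it).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class EnvironmentReport {
    // Builds a plain-text summary of the OS, JDK and JVM launch flags.
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("OS: ").append(System.getProperty("os.name"))
          .append(' ').append(System.getProperty("os.version")).append('\n');
        sb.append("Java: ").append(System.getProperty("java.vendor"))
          .append(' ').append(System.getProperty("java.version")).append('\n');
        sb.append("CPUs: ").append(Runtime.getRuntime().availableProcessors()).append('\n');
        // Flags the JVM was started with, e.g. -Xms256m -Xmx1536m
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("JVM args: ").append(String.join(" ", jvmArgs)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```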

Here are the details:
OS name and version: CentOS Linux release 7.6.1810 (Core)
RAM: 3730 MB
OpenJDK: jdk8u222-b10
Heap size settings: -Xms256m -Xmx1536m -XX:MaxMetaspaceSize=768m
Tomcat: 8.5.38

Let me know if you need any more details.

@HThagunna

Thanks for providing environment details.

We have logged an issue as PDFJAVA-38858 in our issue tracking system for further investigation. We will look into the details of the scenario and keep you posted on the status of the ticket. Please be patient and spare us a little time.

We are sorry for the inconvenience.

Ok. Thanks.

Please let us know once you find something on your side.

Thanks,
Harish

@HThagunna

We will surely let you know as soon as we have some investigation results.

Any updates?

@HThagunna

We regret to share that the earlier logged issue is not yet resolved due to other high-priority issues in the queue. We will surely inform you as soon as we make significant progress towards resolving the ticket. Please spare us a little time.

We are sorry for the inconvenience.