Perform memory optimization while loading document using Java

We have encountered an unexpected memory issue when trying to convert a very large DOCX document to PDF (size ~100 MB). The documents we are using consist of text, tables and images.
We use OpenJDK 1.8 and aspose-words 20.8.

Code snippet:

package pdf.generator.core.services

import com.aspose.words.Document
import com.aspose.words.LoadOptions
import com.aspose.words.PdfCompliance
import com.aspose.words.PdfSaveOptions

import java.nio.charset.Charset

class AsposePdfGenerationService {
    private final String tempFolder = '/home/krisravn/asposeTemp'

    ByteArrayOutputStream generatePdf(InputStream indholdStream, String encoding) {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream()

        LoadOptions loadOptions = new LoadOptions()
        loadOptions.setTempFolder(tempFolder)
        new File(loadOptions.getTempFolder()).mkdir()
        loadOptions.preserveIncludePictureField = false
        loadOptions.encoding = Charset.forName(encoding ?: 'UTF-8')
        PdfSaveOptions saveOptions = new PdfSaveOptions()
        saveOptions.compliance = PdfCompliance.PDF_A_1_A
        saveOptions.setTempFolder(tempFolder)
        new Document(indholdStream, loadOptions).save(outputStream, saveOptions)

        return outputStream
    }
}

Error:

Java heap space
java.lang.OutOfMemoryError: Java heap space
	at com.aspose.words.internal.zz2Q.zzQV(Unknown Source)
	at com.aspose.words.internal.zz2Q.zzQW(Unknown Source)
	at com.aspose.words.internal.zz2Q.write(Unknown Source)
	at com.aspose.words.internal.zzYF.zzZ(Unknown Source)
	at com.aspose.words.internal.zzYF.zzZ(Unknown Source)
	at com.aspose.words.internal.zz4D.zz2(Unknown Source)
	at com.aspose.words.internal.zz4B.zz2(Unknown Source)
	at com.aspose.words.internal.zz4C.zzqX(Unknown Source)
	at com.aspose.words.internal.zz4B.zzqX(Unknown Source)
	at com.aspose.words.internal.zz4C.zzqV(Unknown Source)
	at com.aspose.words.internal.zz4B.zzqV(Unknown Source)
	at com.aspose.words.internal.zzFV.zzZ(Unknown Source)
	at com.aspose.words.internal.zzFV.zze(Unknown Source)
	at com.aspose.words.internal.zzFV.zzf(Unknown Source)
	at com.aspose.words.internal.zzFV.<init>(Unknown Source)
	at com.aspose.words.zz6N.<init>(Unknown Source)
	at com.aspose.words.zzZ3Z.zzLj(Unknown Source)
	at com.aspose.words.Document.zzY(Unknown Source)
	at com.aspose.words.Document.zzZ(Unknown Source)
	at com.aspose.words.Document.<init>(Unknown Source)
	at com.aspose.words.Document.<init>(Unknown Source)
	at pdf.generator.core.services.AsposePdfGenerationService.generatePdf(AsposePdfGenerationService.groovy:25)
	at pdf.generator.core.services.PdfGenerationCoreService.generatePdf(PdfGenerationCoreService.groovy:41)
	at pdf.generator.core.services.PdfGenerationCoreServiceIntegrationSpec.test generering af pdf tager længere tid ind timeout(PdfGenerationCoreServiceIntegrationSpec.groovy:68)

We understand that converting huge DOCX files can be problematic, so we have tried several things to make it work. First, we tried using a temp folder, which according to the Aspose documentation might help with memory issues (see "Specify Load Options in Java" in the Aspose.Words for Java documentation).
We monitored the folder during execution, but nothing was written to it and we got the same error.
We also tried increasing our heap size up to 20 GB of RAM, but again got the same error.
If we send a much smaller file (~10 MB), everything works as expected and we get a converted PDF document.
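
Since the error is unchanged after raising the heap, one thing worth ruling out is that the larger -Xmx value never reached the JVM that actually runs the conversion (for example a forked test worker started by the build tool, which is only an assumption about the setup here). Below is a minimal sketch using the standard Runtime API. The class and method names are illustrative and not part of Aspose.Words.

```java
public class HeapCheck {
    /** Returns the maximum heap the current JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Print this from the same JVM that performs the conversion.
        System.out.println("Max heap visible to this JVM: " + maxHeapMb() + " MB");
    }
}
```

If the printed value is far below the configured 20 GB, the heap setting is probably being applied to a different process than the one that loads the document.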

So we would like to know how to get around this memory issue.
We have purchased Paid Support, but the new privilege hasn’t been assigned to our account yet, so I am posting here until our support level is increased.
A sample file (151 MB) that causes the issue can be downloaded from: https://drive.google.com/file/d/1YHfGVDi4oDWzS3FABVo0rFOYhqHZSfse/view?usp=sharing

@dahpak

We suggest using the SaveOptions.MemoryOptimization property to reduce memory usage. Setting this option to true can significantly decrease memory consumption while saving large documents, at the cost of slower saving time. Hope this helps you.

Thanks for the answer.

We are aware of the SaveOptions.MemoryOptimization property and have also tried setting it to true, without luck. Our memory problem occurs when we load the document, not when we save it. We commented out the part of the code that saves the document, and the same error still occurs.

Unfortunately, there is no option named LoadOptions.MemoryOptimization.

@dahpak

When dealing with very large and complex documents, Aspose.Words sometimes had problems during saving, resulting in out-of-memory exceptions, disk swapping and general failures.

We have logged a feature request as WORDSNET-14837 to perform memory optimization while loading a document. We will look into the possibility of implementing this feature. Once we have any information about it, we will update you via this forum thread.

Hey Admin,

Is there any update on the mentioned feature request WORDSNET-14837?

@neha.agarwal The issue is currently under analysis. Unfortunately, there is no news regarding it yet.

Okay. Thanks for the update. I also have the questions below, and it would be great if you could answer them:

  1. Is there any file size limit in Aspose for loading/saving a Word/PPTX file for conversion?
  2. How can I use the setTempFolder option when loading a Word file from an InputStream? I couldn’t find any example in the documentation.

@neha.agarwal

No, there are no file size limits in Aspose.Words and Aspose.Slides. The only limit is the available memory on your side. However, using very large documents is strongly discouraged. It is better to use several smaller documents.
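
Because the practical limit is available memory, a rough pre-flight check can reject inputs that are unlikely to fit before loading is attempted. The sketch below is only illustrative: the class name, method names and the expansion factor are hypothetical, and the factor is a tuning value you would have to measure yourself, not a figure documented by Aspose.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LoadPreflight {
    /**
     * Returns true when the file size multiplied by an assumed expansion
     * factor fits within the given maximum heap size.
     */
    static boolean likelyFitsInHeap(long fileSizeBytes, long maxHeapBytes, int expansionFactor) {
        if (fileSizeBytes < 0 || expansionFactor < 1) {
            throw new IllegalArgumentException("size must be >= 0 and factor >= 1");
        }
        // Divide instead of multiplying so large values cannot overflow a long.
        return fileSizeBytes <= maxHeapBytes / expansionFactor;
    }

    /** Convenience overload that reads the file size and the current JVM's max heap. */
    static boolean likelyFitsInHeap(Path file, int expansionFactor) throws IOException {
        return likelyFitsInHeap(Files.size(file), Runtime.getRuntime().maxMemory(), expansionFactor);
    }
}
```

A caller could use the result to fail fast with a clear message, or to route oversized inputs to a step that splits them into smaller documents.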

The code is the same as for loading from a file path:

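// Requires: com.aspose.words.Document, com.aspose.words.LoadOptions,
// java.io.File, java.io.FileInputStream
LoadOptions loadOptions = new LoadOptions();
loadOptions.setTempFolder("C:\\TempFolder\\");

// Ensure that the directory (and any missing parents) exists before loading.
new File(loadOptions.getTempFolder()).mkdirs();

// The stream overload accepts the same LoadOptions as the file-path overload.
Document doc;
try (FileInputStream inStream = new FileInputStream("C:\\Temp\\in.docx")) {
    doc = new Document(inStream, loadOptions);
}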
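
File.mkdir() and File.mkdirs() report failure only through a boolean return value, so a missing or read-only temp folder can go unnoticed. As a hedged sketch, the folder can be validated with java.nio before it is passed to setTempFolder. The class and method names here are illustrative and not part of Aspose.Words, and a passing check would not by itself explain or fix an OutOfMemoryError.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TempFolderCheck {
    /**
     * Creates the folder (including missing parents) if needed and verifies
     * that a file can be created in it. Throws IOException on failure
     * instead of returning false silently.
     */
    static Path ensureWritable(String folder) throws IOException {
        Path dir = Files.createDirectories(Paths.get(folder));
        // Create and remove a probe file to confirm write permission.
        Path probe = Files.createTempFile(dir, "probe", ".tmp");
        Files.delete(probe);
        return dir;
    }
}
```

The returned path can then be converted with toString() and handed to loadOptions.setTempFolder(...).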