We have encountered an unexpected memory issue when trying to convert a very large DOCX document (~100 MB) to PDF. The documents we use consist of text, tables, and images.
We use OpenJDK 1.8 and aspose-words 20.8.
Code snippet:
package pdf.generator.core.services

import com.aspose.words.Document
import com.aspose.words.LoadOptions
import com.aspose.words.PdfCompliance
import com.aspose.words.PdfSaveOptions
import java.nio.charset.Charset

class AsposePdfGenerationService {

    private final String tempFolder = '/home/krisravn/asposeTemp'

    ByteArrayOutputStream generatePdf(InputStream indholdStream, String encoding) {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream()

        LoadOptions loadOptions = new LoadOptions()
        loadOptions.setTempFolder(tempFolder)
        new File(loadOptions.getTempFolder()).mkdir()
        loadOptions.preserveIncludePictureField = false
        loadOptions.encoding = Charset.forName(encoding ?: 'UTF-8')

        PdfSaveOptions saveOptions = new PdfSaveOptions()
        saveOptions.compliance = PdfCompliance.PDF_A_1_A
        saveOptions.setTempFolder(tempFolder)

        new Document(indholdStream, loadOptions).save(outputStream, saveOptions)
        return outputStream
    }
}
Error:
Java heap space
java.lang.OutOfMemoryError: Java heap space
    at com.aspose.words.internal.zz2Q.zzQV(Unknown Source)
    at com.aspose.words.internal.zz2Q.zzQW(Unknown Source)
    at com.aspose.words.internal.zz2Q.write(Unknown Source)
    at com.aspose.words.internal.zzYF.zzZ(Unknown Source)
    at com.aspose.words.internal.zzYF.zzZ(Unknown Source)
    at com.aspose.words.internal.zz4D.zz2(Unknown Source)
    at com.aspose.words.internal.zz4B.zz2(Unknown Source)
    at com.aspose.words.internal.zz4C.zzqX(Unknown Source)
    at com.aspose.words.internal.zz4B.zzqX(Unknown Source)
    at com.aspose.words.internal.zz4C.zzqV(Unknown Source)
    at com.aspose.words.internal.zz4B.zzqV(Unknown Source)
    at com.aspose.words.internal.zzFV.zzZ(Unknown Source)
    at com.aspose.words.internal.zzFV.zze(Unknown Source)
    at com.aspose.words.internal.zzFV.zzf(Unknown Source)
    at com.aspose.words.internal.zzFV.<init>(Unknown Source)
    at com.aspose.words.zz6N.<init>(Unknown Source)
    at com.aspose.words.zzZ3Z.zzLj(Unknown Source)
    at com.aspose.words.Document.zzY(Unknown Source)
    at com.aspose.words.Document.zzZ(Unknown Source)
    at com.aspose.words.Document.<init>(Unknown Source)
    at com.aspose.words.Document.<init>(Unknown Source)
    at pdf.generator.core.services.AsposePdfGenerationService.generatePdf(AsposePdfGenerationService.groovy:25)
    at pdf.generator.core.services.PdfGenerationCoreService.generatePdf(PdfGenerationCoreService.groovy:41)
    at pdf.generator.core.services.PdfGenerationCoreServiceIntegrationSpec.test generering af pdf tager længere tid ind timeout(PdfGenerationCoreServiceIntegrationSpec.groovy:68)
We understand that converting huge DOCX files can be problematic, so we have tried several workarounds. First, we configured a temp folder, which according to the Aspose documentation may help avoid memory issues: https://docs.aspose.com/words/java/specify-load-options/#use-tempfolder-to-avoid-amemory-exception
We monitored the folder during execution, but nothing was written to it and we got the same error.
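To rule out a problem with the folder itself, a standalone check along the following lines could be run first. This is only a minimal sketch using the Java standard library, not part of our service: the class name `TempFolderCheck`, the helper `ensureWritableFolder`, and the default path are hypothetical. It relies on the fact that `File.mkdir()` returns `false` silently when the folder cannot be created, whereas `Files.createDirectories` throws an exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TempFolderCheck {

    /**
     * Creates the folder (including missing parents) and proves it is
     * writable by creating and deleting a probe file inside it.
     */
    static Path ensureWritableFolder(String folder) throws IOException {
        Path dir = Paths.get(folder);
        // Unlike File.mkdir(), this creates missing parent directories and
        // throws an IOException instead of silently returning false.
        Files.createDirectories(dir);
        Path probe = Files.createTempFile(dir, "probe", ".tmp");
        Files.delete(probe);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass the real temp folder as an argument.
        String folder = args.length > 0
                ? args[0]
                : System.getProperty("java.io.tmpdir") + "/asposeTempCheck";
        Path dir = ensureWritableFolder(folder);
        System.out.println("Temp folder is usable: " + dir.toAbsolutePath());
    }
}
```

If this check passes for the configured folder, the folder is usable, and the empty folder during conversion would have to be explained some other way.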
Furthermore, we tried increasing the heap size up to 20 GB of RAM, but we got the same error.
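Since the stack trace comes from an integration spec, one thing worth confirming is that the larger heap actually applies to the JVM that runs the conversion. Whether the test runs in a separately forked JVM with its own settings depends on the build tool, so that is an assumption here. The following is a minimal sketch using only standard JDK calls; the class name `HeapCheck` and the helper `toMiB` are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most memory the JVM will attempt to use,
        // which normally reflects the -Xmx setting.
        System.out.println("Max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("Total heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("Free heap:  " + toMiB(rt.freeMemory()) + " MiB");

        // The arguments this JVM was actually started with (e.g. -Xmx...).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
    }
}
```

Logging the same values from inside the failing test would show whether the 20 GB setting reached that process.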
However, if we send a much smaller file (~10 MB), everything works as expected and we get a converted PDF document.
So how can we get around this memory issue?
We have purchased Paid Support, but the new privilege hasn't been assigned to our account yet, so I am posting here until our support level is upgraded.
A sample file (151 MB) that causes the issue can be downloaded from: https://drive.google.com/file/d/1YHfGVDi4oDWzS3FABVo0rFOYhqHZSfse/view?usp=sharing