Save large text to PDF causes OOM issues


#1

We have encountered an issue when using Aspose Words for Java to save large text files as PDFs. With 32MB of plain text we see an unexpectedly large amount of GC activity, which ultimately leads to an OOM error. The OOM occurs consistently with a max heap size of 2GB. Increasing the max heap size to 4GB prevents the OOM, but the conversion still requires a lot of heap allocation. When the file size increases to 100MB we again run into OOM issues. I’ve attached example code that reproduces the issue. I’m unable to upload a sample file due to the size limitations, but any plain text file of about 100MB will reproduce it.
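
Since any plain text file of about 100MB should trigger the problem, a test input can be generated locally. The following is a minimal sketch using only the Java standard library. It is not the attached reproduction code, and the class name `LargeTextGenerator`, the `generate` method, the filler sentence, and the default file name are all illustrative assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for producing a large plain text sample file.
class LargeTextGenerator {

    // Hypothetical filler line; the content of the real input is unknown.
    private static final String LINE =
            "The quick brown fox jumps over the lazy dog.\n";

    /**
     * Writes whole copies of LINE to target until at least minBytes
     * have been written, and returns the number of bytes written.
     */
    static long generate(Path target, long minBytes) {
        // LINE is pure ASCII, so its UTF-8 byte length equals its char count.
        int lineBytes = LINE.getBytes(StandardCharsets.UTF_8).length;
        long written = 0;
        try (BufferedWriter out =
                     Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            while (written < minBytes) {
                out.write(LINE);
                written += lineBytes;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        // Output path is an assumption; pass a different one as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0] : "large-sample.txt");
        long bytes = generate(target, 100L * 1024 * 1024); // about 100MB
        System.out.println("Wrote " + bytes + " bytes to " + target);
    }
}
```

Running `main` without arguments writes a file of slightly more than 100MB to the working directory. Real-world text will have different line lengths, so page counts and memory use may differ from those reported in this thread.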

We’re using Aspose Words for Java 19.6.

Java Version:
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx4g -Xms4g -XX:+UseG1GC"

aspose-words-large-text.zip (450 Bytes)
Screenshot 2019-08-10 at 10.13.04.png (292.5 KB)


#2

@tucker.barbour,

Please ZIP the sample file, upload the .zip file to Dropbox, and share the download link here for testing. We will then investigate the issue on our end and provide you with more information.


#3

Here is the 100MB file
Let me know if you have any issues retrieving it.


#4

Here is a ZIP version


#5

@tucker.barbour,

We tested the scenario and reproduced the same problem on our end. We have logged this problem in our issue tracking system so that it can be fixed. The ID of this issue is WORDSNET-19043. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.


#6

@tucker.barbour,

Regarding WORDSNET-19043: Aspose.Words was not originally designed to handle documents of this size (2.5M rows in the input .txt, or 44 thousand A4 pages). Two thirds of the RAM is used by layout objects. A significant reduction in memory consumption could be achieved by a complete redesign of the Aspose.Words layout architecture, but we have to postpone this rework for now. As a workaround you can implement the following workflow:

We will keep you posted on any further updates and let you know as soon as this issue is resolved. We apologize for any inconvenience.


#7

Thank you for the reply. We will try creating PDFs with the workflow referenced above.