We use Aspose.Pdf to convert PDF files to HTML. The conversion is very slow and consumes a great deal of memory. Very large PDF files often occupy a very large amount of memory, that memory is not released when the conversion finishes, and we often run out of memory. The attached picture shows our memory test: converting a single 20MB PDF file took four hours.
Thanks for contacting support.
Would you please provide your sample PDF document along with the code snippet and environment details (i.e., API version, application type, operating system, target framework, etc.), so that we can test the scenario in our environment and address it accordingly?
- @author JIE
public class PdfConvert extends BaseConvertImpl
public ConvertType getConvertType()
public List<Class<? extends FlowStrategy>> getFlowStrategy()
List<Class<? extends FlowStrategy>> flowStrategys = new ArrayList<Class<? extends FlowStrategy>>();
public void doAnalysis() throws Exception
Document doc = new Document(getParam(ParamConstrant.SOURCEPATH, String.class));
public void doConvert() throws Exception
Document doc = getParam(ParamConstrant.DOCUMENT, Document.class);
if (null != doc)
doc.save(getParam(ParamConstrant.TARGETPATH, String.class), getHtmlSaveOptions(getParam(ParamConstrant.TARGETPATH, String.class)));
protected HtmlSaveOptions getHtmlSaveOptions(final String targetPath)
HtmlSaveOptions newOptions = new HtmlSaveOptions();
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
// This controls whether images are embedded into the HTML
newOptions.PartsEmbeddingMode = DopConfig.getBoolean(ParamConstrant.COMPRESSIMAGE, false) ? HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml : HtmlSaveOptions.PartsEmbeddingModes.EmbedCssOnly;
newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy()
public void invoke(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream.getLength()];
htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
FileOutputStream fos = null;
LOG.info("Start writing Html[" + targetPath + "] file...");
fos = new FileOutputStream(targetPath);
LOG.info("Html[" + targetPath + "] file written...");
} catch (Exception e)
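The callback above reads each page into one byte array sized to the whole stream before writing it out. A minimal sketch of a chunked alternative follows, using only the JDK. It assumes the page content can be exposed as a java.io.InputStream, which is an assumption here because the Aspose ContentStream is its own stream type; HtmlStreamWriter and writeTo are hypothetical names, not part of Aspose.Pdf.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: not part of Aspose.Pdf.
public class HtmlStreamWriter {

    /**
     * Copies the stream to targetPath in small chunks instead of one
     * full-size byte array, and returns the number of bytes written.
     */
    public static long writeTo(InputStream content, Path targetPath) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = content) {
            return Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("page", ".html");
        byte[] html = "<html><body>demo</body></html>".getBytes(StandardCharsets.UTF_8);
        long written = writeTo(new ByteArrayInputStream(html), target);
        System.out.println("Wrote " + written + " bytes to " + target);
        Files.delete(target);
    }
}
```

This only removes the extra per-page buffer and guarantees the file handle is closed. It does not change how much memory the library itself holds during conversion.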
The Aspose.Pdf version is 11.4.0 (aspose.pdf-11.4.0).
The document is about 21MB and cannot be uploaded to the forum. You can give me an email address and I will send it to you, or you can provide an FTP space that accepts large attachments.
Thanks for contacting support.
If your document is too large to attach, you can upload it to a public file sharing service (e.g. Dropbox, Google Drive) and share the link here. We will test the scenario in our environment and address it accordingly.
I’m from China, and I can’t use the services provided by Google here. Can you access Baidu at http://pan.baidu.com/? I can upload the files there. The biggest problem is that when a large PDF file is converted to HTML, the memory stays occupied and is not released, and usage keeps growing as conversions continue until memory overflows.
粘贴图片.png (182.1 KB)
In addition, when Aspose.Pdf converts PDF files to HTML it generates a large number of temporary files, and these are not deleted afterwards.
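As a stopgap for the leftover temporary files, a cleanup step can be run between conversions. The sketch below uses only the JDK and is not an Aspose feature. The file name pattern "aspose-*.tmp" is purely illustrative: the thread does not say where the temporary files are written or how they are named, so check that on your server first. TempFileCleaner and deleteOlderThan are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: not part of Aspose.Pdf.
public class TempFileCleaner {

    /**
     * Deletes regular files in dir whose name matches glob and whose last
     * modification is before cutoff. Returns the number of files deleted.
     */
    public static int deleteOlderThan(Path dir, String glob, Instant cutoff) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue;
                }
                try {
                    if (Files.getLastModifiedTime(entry).toInstant().isBefore(cutoff)
                            && Files.deleteIfExists(entry)) {
                        deleted++;
                    }
                } catch (IOException stillInUse) {
                    // File may be locked by a running conversion; skip it.
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Demo in a fresh directory so nothing real is touched.
        Path dir = Files.createTempDirectory("cleanup-demo");
        Path stale = Files.createFile(dir.resolve("aspose-1.tmp")); // illustrative name
        Files.setLastModifiedTime(stale, FileTime.from(Instant.now().minus(Duration.ofHours(2))));
        int removed = deleteOlderThan(dir, "aspose-*.tmp", Instant.now().minus(Duration.ofHours(1)));
        System.out.println("Removed " + removed + " stale file(s) from " + dir);
        Files.delete(dir);
    }
}
```

The age cutoff is there so that files belonging to a conversion still in progress are left alone.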
Thanks for writing back.
Would you please share the environment details, i.e., your application type, development environment, JDK version, etc., so that we can also observe the performance issue in the specified environment?
I am able to access this website, please upload your files there and share the link, so that we can test the scenario with your specific document as well.
I uploaded the files we tested to Baidu cloud
We convert the files sequentially, one file after another with no concurrent conversion (earlier tests with concurrent conversion used more memory than the server could stand). Throughout the conversion process we found that memory is not released, so usage keeps climbing and never drops. This eventually causes an out-of-memory error.
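To make the sequential test reproducible, the loop can close each document before opening the next and log heap usage after every file. The following is only a sketch of that pattern under stated assumptions: ConvertibleDocument and DocumentOpener are hypothetical stand-ins for whatever opens and saves the PDF, and no Aspose.Pdf class is used or implied.

```java
import java.util.Arrays;
import java.util.List;

public class SequentialConversion {

    /** Hypothetical stand-in for an open document; close() must free what it holds. */
    public interface ConvertibleDocument extends AutoCloseable {
        void convertTo(String targetPath) throws Exception;
    }

    /** Hypothetical factory that opens one source file. */
    public interface DocumentOpener {
        ConvertibleDocument open(String sourcePath) throws Exception;
    }

    /** Approximate used heap in MB; noisy, because garbage may not be collected yet. */
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    /** Converts the sources one at a time; each document is closed before the next opens. */
    public static int convertAll(List<String> sources, DocumentOpener opener) throws Exception {
        int converted = 0;
        for (String source : sources) {
            try (ConvertibleDocument doc = opener.open(source)) {
                doc.convertTo(source + ".html");
                converted++;
            }
            System.out.println(source + " done, used heap ~" + usedHeapMb() + " MB");
        }
        return converted;
    }

    public static void main(String[] args) throws Exception {
        // Dummy opener that does no real work, only to show the call shape.
        DocumentOpener dummy = source -> new ConvertibleDocument() {
            @Override public void convertTo(String targetPath) { /* no-op */ }
            @Override public void close() { /* nothing to release */ }
        };
        convertAll(Arrays.asList("a.pdf", "b.pdf"), dummy);
    }
}
```

If the logged figure still rises file after file with every document closed, the growth is likely coming from inside the conversion rather than from this code holding references.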
OS: Windows Server 64-bit
Tomcat arguments: set CATALINA_OPTS=-server -Xms1024m -Xmx10240m -XX:PermSize=128M -XX:MaxPermSize=2048M
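To confirm which of these settings the running JVM actually picked up, a small report can be printed from inside the application. This sketch uses only standard JDK calls; JvmSettingsReport and describe are hypothetical names. One caveat: if the server runs Java 8, the -XX:PermSize and -XX:MaxPermSize options are ignored, because the permanent generation was removed in that release.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmSettingsReport {

    /** Builds a short text report of the JVM version, OS, max heap and startup arguments. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        // Arguments passed to the JVM itself, e.g. via CATALINA_OPTS
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        StringBuilder sb = new StringBuilder();
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("os=").append(System.getProperty("os.name"))
          .append(' ').append(System.getProperty("os.arch")).append('\n');
        // maxMemory() reflects the effective -Xmx limit
        sb.append("maxHeapMb=").append(rt.maxMemory() / (1024 * 1024)).append('\n');
        sb.append("jvmArgs=").append(jvmArgs).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Posting this output together with the sample file covers most of the environment details requested earlier in the thread.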
Thanks for sharing environment details.
We are setting up an environment to test the scenario and will get back to you shortly. Meanwhile, would you please check the link you shared? It returned a 404 Not Found error when I tried to open it.
Thanks for sharing sample documents.
We are testing the scenario in our environment and will get back to you with our findings as soon as possible. Please be patient.
Thanks for your patience.
I have tested the scenario in the following environment: Eclipse Neon.2 Release (4.6.2), Apache Tomcat Server 7.0, and JRE 1.8, with Aspose.Pdf for Java 17.6. The code execution took more than an hour and ended in an OutOfMemoryError. CPU usage was 100% throughout the conversion process and memory consumption was 70%-80%.
Therefore, I have logged an issue as PDFJAVA-36958 in our issue tracking system. We will investigate this issue further and keep you posted on the status of its correction. Please be patient and spare us a little time.
We are sorry for the inconvenience.
Thank you very much.