PDF to DOC/DOCX extremely slow at document.save()

I have an issue with Aspose.PDF when converting PDF to DOC/DOCX.
I followed the advice in the Stack Overflow question "c# - Aspose.Words save to PDF first time is slow":

    Document pdf2wordDocument = new Document();

While initializing the Document object as soon as my project starts helps (it cuts 15-25 seconds off the processing time), the conversion is still very slow.

    pdf2wordDocument = new Document(outputDocDir + fileNameWithExtension); // this takes 15-25 sec - FIXED with initialization of Document object at startup
    String documentExtension = saveOptions.getFormat() == 0 ? "doc" : "docx";

    String outputFile = outputDocDir + fileName + "." + documentExtension;
    log.info("New file to be converted -> " + outputFile + "  Converting...");
    pdf2wordDocument.save(outputFile, saveOptions); // This takes 65-75 seconds
    pdf2wordDocument.close();

I tried both increasing the memory in MEMMIN and MEMMAX and removing the restriction altogether; neither has any effect on the conversion speed.
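One way to confirm that the memory settings actually reach the JVM is to print the heap limits from inside the running project. The following is a minimal sketch, assuming MEMMIN and MEMMAX end up as the JVM's -Xms and -Xmx options (the thread does not confirm this). HeapReport is a hypothetical helper, not part of Aspose.PDF.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of Aspose.PDF): reports the heap limits
// the running JVM is actually using.
final class HeapReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of the current heap limits. */
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        return "maxHeap=" + toMiB(rt.maxMemory()) + " MiB, "
                + "committedHeap=" + toMiB(rt.totalMemory()) + " MiB, "
                + "freeInCommitted=" + toMiB(rt.freeMemory()) + " MiB";
    }

    public static void main(String[] args) {
        System.out.println(summary());
        // The raw JVM options show whether heap flags were really passed through.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
    }
}
```

If the reported maximum heap already matches the configured value and the conversion is still slow, memory is probably not the bottleneck.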

Neither delay is present when I compile a standalone app that only converts the files. However, when the same code runs inside the project, there is a huge delay of over a minute for a 4-page PDF (the limit of the Trial version of the library).

Do you have any idea how to speed up pdf2wordDocument.save()?

@isvirchev

Can you please share your sample PDF document for our reference? We will test the scenario in our environment and address it accordingly.

Hello.

The file size and elements do not matter in this case - it is always way too slow.

A similar issue was logged last year for .NET → Aspose.pdf .NET issue -- document.save execution is extremely slow

There is still no information or fix for the issue with .NET logged above.

I cannot provide the file that I used so I ran a test.pdf for your reference.
test.pdf (54.9 KB)

The conversion of that file takes 20-30 seconds, which is unbelievable and should not happen.

Please look into this.

Thank you.

@isvirchev

We tested the scenario in our environment which is:

  • Aspose.PDF for .NET 21.6
  • VS 2019/Console Application/x64 Debug Mode
  • RAM 8G

We noticed that the API took 24 seconds the first time the program was executed, whereas the conversion took 5-7 seconds on subsequent runs. Please note that API performance is measured on subsequent runs, because on the first run the API loads necessary resources (e.g. fonts) into memory, which adds to the execution time. Please test again at your end and let us know if the results are not better on subsequent runs of the program.
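The first-run versus subsequent-run comparison described above can be reproduced with a small timing loop. This is a minimal sketch, not Aspose code: RunTimer and timeRuns are hypothetical names, and the workload in main is a placeholder that would be replaced with the actual load and save calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: times repeated executions of the same task so the
// first (cold) run can be compared with later (warm) runs.
final class RunTimer {

    /** Runs the task the given number of times; returns elapsed milliseconds per run. */
    static List<Long> timeRuns(Runnable task, int runs) {
        List<Long> elapsedMs = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMs.add((System.nanoTime() - start) / 1_000_000L);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the conversion being measured.
        Runnable conversion = () -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };

        List<Long> timings = timeRuns(conversion, 3);
        for (int i = 0; i < timings.size(); i++) {
            System.out.println("run " + (i + 1) + ": " + timings.get(i) + " ms");
        }
    }
}
```

All runs must happen inside the same JVM process for the comparison to be meaningful, since resources loaded on the first run are lost when the process exits.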

Hello @asad.ali,

I would like to start by mentioning that I use Aspose.PDF for Java but shared the ticket of a similar issue with .NET for reference.

Then I want to clarify something:

Initially, when I start my project, I call

    Document pdf2wordDocument = new Document();

to pre-load a new Document object, which I later use for the actual file as per the code below.

I have to execute the code below every time a file comes in. It always takes 65-75 seconds with my own PDF and 20-30 seconds with the test.pdf (previously attached), and the time is spent in .save():

    pdf2wordDocument = new Document(outputDocDir + fileNameWithExtension);
    String documentExtension = saveOptions.getFormat() == 0 ? "doc" : "docx";

    String outputFile = outputDocDir + fileName + "." + documentExtension;
    log.info("New file to be converted -> " + outputFile + "  Converting...");
    pdf2wordDocument.save(outputFile, saveOptions); // This takes 65-75 seconds
    pdf2wordDocument.close();

So how exactly can I optimize the process for subsequent runs when the delay always occurs on the .save() for different files?

Thank you,
Ivan.

@isvirchev

We tested the scenario using the code snippet below and were able to reproduce the delay in conversion. We have therefore logged the issue as PDFJAVA-40699 in our issue tracking system.

    Document doc = new Document(dataDir + "test.pdf");
    DocSaveOptions saveOption = new DocSaveOptions();
    saveOption.setMode(DocSaveOptions.RecognitionMode.Flow);
    saveOption.setFormat(DocSaveOptions.DocFormat.DocX);
    //saveOption.setRecognizeBullets(true);
    doc.save(dataDir + "Sample_21.7.docx", saveOption);

We will further look into details of the logged ticket and keep you posted with the status of its correction. Please be patient and spare us some time.

We are sorry for the inconvenience.

Any news here? A few months have passed already.

@isvirchev

Please note that issues under the free support model are resolved on a first-come, first-served basis, and we are afraid that the ticket logged earlier could not be resolved yet because of other issues ahead of it in the queue. However, we will surely inform you once we make significant progress towards resolving the issue. Please spare us some time.

We apologize for the inconvenience.

@asad.ali

Please understand my concern: this ticket seems to be of the same nature as "Aspose.pdf .NET issue -- document.save execution is extremely slow" (#27 by asad.ali).

It was logged 11 months ago and there is still no resolution.

I am using the Aspose library to build a demo of PDF to DOC/DOCX conversion into our project for our clients, and if they like the result they will purchase the license.

This is not OK, as the current conversion time is way too high for them to rely on this library.

@isvirchev

We do understand your concerns and the significance of the issue for you. Please note that the .NET ticket relates to TXT to PDF conversion, where the user is adding text to a PDF document from a very large .txt file and facing a performance issue. Performance-related issues are usually complex in nature, as they involve various internal components of the API. Such issues take a significant amount of time to be fully investigated and resolved.

Nevertheless, we have recorded your concerns along with the ticket and will surely consider them during our analysis. We will inform you as soon as we have definite updates regarding its resolution. We highly appreciate your patience and understanding in this regard. Please spare us some time.

We are sorry for the inconvenience.

@asad.ali

We accidentally found and fixed the issue with the slow conversion.

The cause was a -Djava.compiler=NONE option passed in the start script of our project.

As far as I understand, this parameter disables the JIT compiler, which is enabled by default and improves performance.

As per: IBM Documentation

Disabling the JIT compiler is not recommended except to diagnose or work around JIT compilation problems.
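For anyone who wants to check their own project for the same cause, the JVM options can be inspected at startup. This is a minimal sketch under stated assumptions: JitCheck and requestsInterpreterOnly are hypothetical names, the check covers only the -Djava.compiler=NONE option from this thread plus the HotSpot -Xint flag, and the java.vm.info text varies by JVM vendor and version.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic (not part of Aspose.PDF): warns when the JVM was
// started with an option that asks for interpreter-only execution.
final class JitCheck {

    /**
     * Returns true if the options contain -Xint or -Djava.compiler=NONE.
     * Other ways of disabling the JIT are not covered by this sketch.
     */
    static boolean requestsInterpreterOnly(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.equals("-Xint") || arg.equals("-Djava.compiler=NONE")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);

        // On HotSpot this usually contains "mixed mode" or "interpreted mode".
        System.out.println("java.vm.info: " + System.getProperty("java.vm.info"));

        if (requestsInterpreterOnly(jvmArgs)) {
            System.out.println("WARNING: JIT compilation appears to be disabled by a startup option.");
        }
    }
}
```

Running this inside both the standalone converter and the full project makes differences between the two start scripts visible.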

@isvirchev

During our investigation, we were also unable to replicate the issue. Our results were more than 10 times better. Nevertheless, it is great to know that your issue has been sorted out. The information you shared will also be helpful for others who are facing a similar issue. Please feel free to create a new topic in case you face any issues.

The issues you have found earlier (filed as PDFNET-48885) have been fixed in this update.