Docx-Pdf: Long running conversion issue: JAVA

Hi,
We are having long-running DOCX-to-PDF conversions, which is impacting many of our clients, especially those in China.

Supplemental_file.docx (2.5 MB)

Please find a snippet from our POM below.

<dependency>
	<groupId>com.aspose</groupId>
	<artifactId>aspose-cells</artifactId>
	<version>24.2</version>
</dependency>

<dependency>
	<groupId>com.aspose</groupId>
	<artifactId>aspose-words</artifactId>
	<version>24.2</version>
	<classifier>jdk16</classifier>
</dependency>

<dependency>
	<groupId>com.aspose</groupId>
	<artifactId>aspose-slides</artifactId>
	<version>23.7.1</version>
	<classifier>jdk16</classifier>
</dependency>

<dependency>
	<groupId>com.aspose</groupId>
	<artifactId>aspose-pdf</artifactId>
	<version>24.2</version>
</dependency>

<dependency>
	<groupId>com.sparkjava</groupId>
	<artifactId>spark-core</artifactId>
	<version>2.3</version>
</dependency>

All of these conversions from DOCX to PDF take more than 1 minute. This behavior is not associated with the file size. I am attaching 2 files for your reference. Please advise. Please also let me know if we can have a quick call to go through this issue.
Highlights.docx (12.8 KB)

@jaipm

It seems you are experiencing long conversion times when converting DOCX files to PDF using Aspose libraries in Java. Here are some steps and considerations that may help you address this issue:

  1. Library Versions: Ensure that you are using the latest versions of the Aspose libraries. Your POM lists version 24.2 for Aspose.Words, Aspose.Cells, and Aspose.PDF, and 23.7.1 for Aspose.Slides. Upgrading to the latest releases can provide performance improvements and bug fixes.

  2. Optimization Techniques:

    • Load Options: When loading documents, consider using LoadOptions to specify any settings that might optimize the loading process.
    • Document Structure: If your documents contain complex structures (like large tables, images, or embedded objects), simplifying these elements can help reduce conversion time.
    • Batch Processing: If you are converting multiple documents, consider implementing batch processing to handle them more efficiently.
  3. Performance Profiling: Use profiling tools to identify bottlenecks in your conversion process. This can help you understand which parts of the conversion are taking the most time.

  4. Asynchronous Processing: If applicable, consider implementing asynchronous processing to handle conversions without blocking your main application thread.

  5. Testing with Different Files: Since you mentioned that the issue is not associated with file size, try testing with various DOCX files to see if specific content types or structures are causing delays.

  6. Documentation and Support: For detailed guidance on exporting documents to PDF, refer to the official Aspose documentation. It provides step-by-step instructions and best practices for using the API effectively.
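The profiling suggestion in item 3 can start small: timing the load and save phases separately shows which one accounts for the delay. The sketch below uses only the JDK; `PhaseTimer` and `timed` are hypothetical names, and the demo workload stands in for the real calls (in the service, the lambdas would wrap `new Document(path)` and `doc.save(...)`).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Aspose): times one phase of a conversion.
final class PhaseTimer {
    /** Runs the task, prints its wall-clock duration, and returns its result. */
    static <T> T timed(String label, Callable<T> task) throws Exception {
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(label + " took " + millis + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; replace with the document load and save calls.
        int sum = timed("demo", () -> {
            int s = 0;
            for (int i = 1; i <= 100; i++) {
                s += i;
            }
            return s;
        });
        System.out.println(sum);
    }
}
```

If the save phase dominates regardless of file size, that points at document content rather than I/O.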

If the issue persists, consider reaching out to Aspose support for more tailored assistance, or you may want to schedule a call to discuss the problem in detail.

If you have any more specific details or questions, feel free to share!

How can we schedule a call with support?

@jaipm I tested conversion of your documents using the latest 25.3 version of Aspose.Words for Java and cannot reproduce the problem.
Conversion of Supplemental_file.docx takes about 5 seconds on my side.
Conversion of Highlights.docx takes about 1 second.

So, please try using the latest version of Aspose.Words and let us know if the problem still persists on your side.

Thanks so much for your prompt response. I upgraded to the latest version, 25.3, but unfortunately we are still having this issue, and many of our clients are being impacted in Prod.

Please find our code snippets below.

We are running Aspose on an EC2 r5a.xlarge instance.

pom.xml snippet

    <dependency>
      <groupId>com.aspose</groupId>
      <artifactId>aspose-cells</artifactId>
      <version>25.3</version>
    </dependency>

    <dependency>
      <groupId>com.aspose</groupId>
      <artifactId>aspose-words</artifactId>
      <version>25.3</version>
      <classifier>jdk17</classifier>
    </dependency>

    <dependency>
      <groupId>com.aspose</groupId>
      <artifactId>aspose-slides</artifactId>
      <version>25.3</version>
      <classifier>jdk16</classifier>
    </dependency>

    <dependency>
      <groupId>com.aspose</groupId>
      <artifactId>aspose-pdf</artifactId>
      <version>25.3</version>
      <classifier>jdk17</classifier>
    </dependency>

ConvertWord.java

FontSettings.getDefaultInstance().setFontsFolder("/compname/conv/fonts/", true);

Document doc = new Document(file.getPath());

String fileNameWithOutExt = FilenameUtils.removeExtension(file.getPath());
if (format == ConversionService.PDF)
{
  HandleWordWarnings callback = new HandleWordWarnings(serviceCallId);
  doc.setWarningCallback(callback);

  PdfSaveOptions saveOptions = new PdfSaveOptions();
  saveOptions.setCompliance(PdfCompliance.PDF_17);
  saveOptions.setUseHighQualityRendering(true);
  saveOptions.setImageCompression(PdfImageCompression.JPEG);
  saveOptions.setJpegQuality(80);

  Callable<Boolean> call = new Callable<Boolean>()
  {
    public Boolean call() throws Exception
    {
      // here it keeps on waiting and gets timeout (2 mins)
      doc.save(fileNameWithOutExt + ".pdf", saveOptions);
      return true;
    }
  };
  // timeout = 2 min
  TimeLimitedCodeBlock.runWithTimeout(call, timeout, TimeUnit.SECONDS, file.getAbsolutePath(), serviceCallId);
}

I also tried the settings below, but no luck.

      saveOptions.setUseHighQualityRendering(false);
      saveOptions.setJpegQuality(60);
      saveOptions.setEmbedFullFonts(false);
      saveOptions.setPageMode(PdfPageMode.USE_NONE);

TimeLimitedCodeBlock.java

public static <T> T runWithTimeout(Callable<T> callable, long timeout, TimeUnit timeUnit, String filePath, String serviceCallId) throws Exception {
    final ExecutorService executor = Executors.newSingleThreadExecutor();
    final Future<T> future = executor.submit(callable);
    executor.shutdown(); // This does not cancel the already-scheduled task.
    try {
      return future.get(timeout, timeUnit);
    }
    catch (TimeoutException e) {
      
      throw e;
    }
    catch (ExecutionException e) {
      //unwrap the root cause
      Throwable t = e.getCause();
      if (t instanceof Error) {
        throw (Error) t;
      } else if (t instanceof Exception) {
        throw (Exception) t; // rethrow the unwrapped cause, not the ExecutionException wrapper
      } else {
        throw new IllegalStateException(t);
      }
    }
  }
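One thing worth noting about the helper above: on timeout it rethrows, but the submitted task is never cancelled, so a timed-out save may keep running on its worker thread and keep consuming CPU. A minimal sketch of a variant that interrupts the worker is shown below. `CancellingTimeLimiter` is a hypothetical name, the `filePath` and `serviceCallId` parameters are dropped for brevity, and it is an assumption (not confirmed in this thread) that `doc.save` reacts to thread interruption at all.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical variant of TimeLimitedCodeBlock; names are illustrative only.
final class CancellingTimeLimiter {
    static <T> T runWithTimeout(Callable<T> task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the worker so the timed-out task does not keep running.
            // This only helps if the task actually responds to interruption.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Unwrap and rethrow the real cause.
            Throwable cause = e.getCause();
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw new IllegalStateException(cause);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(() -> "done", 1, TimeUnit.SECONDS));
        try {
            runWithTimeout(() -> {
                Thread.sleep(5_000); // stand-in for a slow conversion
                return "late";
            }, 100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

If the conversion ignores interruption, the thread will still run to completion; in that case isolating conversions in a separate process that can be killed is the more reliable option.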

Appreciate your help on this.
Supplemental_file.docx (2.5 MB)

@jaipm
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSJAVA-3075

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Thanks @alexey.maslov

Meanwhile, could you advise how we could further debug or investigate this issue on our end? Or do you suspect a defect on your end?

As it’s impacting our Prod environment, we need to expedite this.
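One way to investigate locally is to look at what the converting thread is doing while the save appears stuck. Running the JDK's `jstack <pid>` a few times during a slow conversion gives this information; the sketch below does the same from inside the JVM. `StackSampler`, `snapshot`, and the `conversion-worker` thread name are hypothetical, and the sleeping thread is only a stand-in for the thread that calls `doc.save`.

```java
// Hypothetical diagnostic helper; uses only the JDK.
final class StackSampler {
    /** Returns the thread's name, state, and current stack as printable text. */
    static String snapshot(Thread t) {
        StringBuilder sb = new StringBuilder();
        sb.append(t.getName()).append(" [").append(t.getState()).append("]\n");
        for (StackTraceElement frame : t.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the thread that runs the conversion.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(2_000);
            } catch (InterruptedException ignored) {
                // interrupted by main below; just exit
            }
        }, "conversion-worker");
        worker.start();

        // Take a few samples; frames that repeat across samples are the hot spot.
        for (int i = 0; i < 3; i++) {
            Thread.sleep(100);
            System.out.print(snapshot(worker));
        }
        worker.interrupt();
        worker.join();
    }
}
```

Frames that show up in every sample indicate where the time goes, which is useful detail to attach to a support ticket.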

@jaipm,

We have already conducted a preliminary analysis. The long processing time is due to the document containing TIFF images. As a temporary workaround, you may try using PNG or JPEG formats instead until we resolve the issue with TIFF processing delays. We will notify you as soon as the issue is resolved. Please accept our apologies for the inconvenience.
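To find out which documents are affected before converting them, one option is to inspect the DOCX package directly: a .docx file is a ZIP archive, and embedded images normally sit under `word/media/`. The sketch below lists entries with a `.tif` or `.tiff` extension using only the JDK. `TiffScanner` and `findTiffEntries` are hypothetical names, and matching by file extension is a simplifying assumption (linked images or images inside embedded objects would not be found).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical pre-check; does not use Aspose.
final class TiffScanner {
    /** Lists package entries whose names end in .tif or .tiff. */
    static List<String> findTiffEntries(InputStream docx) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                String name = e.getName().toLowerCase(Locale.ROOT);
                if (name.endsWith(".tif") || name.endsWith(".tiff")) {
                    hits.add(e.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory ZIP that mimics a DOCX package layout.
        String[] names = {"word/document.xml", "word/media/image1.tiff", "word/media/image2.png"};
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            for (String n : names) {
                out.putNextEntry(new ZipEntry(n));
                out.write(0);
                out.closeEntry();
            }
        }
        // prints [word/media/image1.tiff]
        System.out.println(findTiffEntries(new ByteArrayInputStream(buf.toByteArray())));
    }
}
```

For a real file, pass `new FileInputStream(path)` instead of the in-memory buffer.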

@alexey.maslov, I really appreciate your help with this.

One observation: when I try to convert my sample file (previously uploaded) at Convert Word, PDF And Many Other File Formats Using Java, it is converted in less than 20 seconds, but on our end it never converts because it times out after 2 minutes.

What should we make of this? Kindly advise.

@jaipm,

If the conversion of a document containing TIFF images hangs or throws an error on your side, check that the required dependency is installed.

You can try adding the following to the POM file:

<dependency>  
    <groupId>javax.media.jai</groupId>  
    <artifactId>com.springsource.javax.media.jai.core</artifactId>  
    <version>1.1.3</version>  
</dependency>  
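Having the dependency in the POM does not by itself guarantee that it is on the runtime classpath (for example, because of dependency scope or packaging). A quick runtime check is sketched below. `ClasspathCheck` is a hypothetical name, and `javax.media.jai.JAI` is assumed here to be a class shipped in the JAI core artifact.

```java
// Hypothetical runtime check; uses only the JDK.
final class ClasspathCheck {
    /** Returns true if the named class can be located without initializing it. */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: javax.media.jai.JAI is provided by the JAI core artifact.
        System.out.println("JAI core present: " + isClassPresent("javax.media.jai.JAI"));
    }
}
```

Running this inside the deployed service, rather than in a local build, shows what the production JVM actually sees.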

We already have this dependency in our POM.
My question is this: my file is converted when I try it on products.aspose.com/words/java/conversion, but it never converts on our end and does not seem to throw any error.

@jaipm ,

Thank you for bringing this to our attention. We will investigate this matter further.

@alexey.maslov, here is another unique sample file. This file doesn't even convert at products.aspose.com/words/java/conversion

I also tried deleting all the images from this file, but it still times out (2 minutes) on our end.

I expected that it would convert once I had deleted all the embedded images, but it doesn't.
Is there something else going on besides the TIFF issue?
I also see extremely high CPU utilization during the save method.

Another question: do missing fonts also cause slowness?
Kindly advise.

丙烯海松酸键合硅胶固定相的制备及其在混合模式色谱分离中的应用.docx (9.1 MB)

@jaipm It does not look like the problem is related to TIFF.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-28136

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Another question: do missing fonts also cause slowness?

This should not have significant impact on the document conversion time.

@alexey.noskov I have a Java application. Shouldn't this ticket start with WORDSJAVA instead of WORDSNET?

I also tried a compute-optimized EC2 instance (c5a.4xlarge), but it did not help. We have at least 10 instances running, but the previously attached docs still take more than 2 minutes and time out.

We are stuck with this issue in our Prod environment and are unable to find any workaround. Please advise what else we should try to mitigate this issue until you have a permanent fix as part of the tickets you have previously created.

@jaipm The .NET version of Aspose.Words is the main version, so all fixes are first implemented in the .NET version and then ported to Java.

The issue with TIFF images is specific to the Java version, so that issue has been created for the Java version only.

@alexey.noskov Could you please let me know the ETA for these tickets? We are badly impacted in our Prod environment by this slowness in Aspose, and we need the fix expedited.

Meanwhile, would you recommend any mitigation?

@jaipm WORDSJAVA-3075 is already in development and is scheduled to be resolved in the next version of Aspose.Words for Java, 25.4.
WORDSNET-28136 has a duplicate issue in our defect tracking system, which is also already in development. The fix is scheduled for version 25.5.
We will keep you updated and let you know once the issues are resolved.

I am afraid I currently cannot suggest any workarounds for the above-mentioned issues.

@alexey.maslov Do you know when 25.4 is expected to be released?

@jaipm,

Version 25.4 has already been released, but unfortunately, issue WORDSJAVA-3075 required more in-depth analysis and has not been resolved. The fix for WORDSJAVA-3075 is planned for version 25.5, along with the resolution of WORDSNET-28136. Please accept our apologies for the inconvenience.