DOCX to PDF conversion consumes high CPU and appears to loop infinitely (Java library)

Hi, we are using the Aspose.Words for Java library, version 25.2, to convert Word documents to PDF in bulk.
For certain input Word documents, our process hangs with high CPU usage.

The stack trace often looks like this:

	at com.aspose.words.zzs1.zzXAV(Unknown Source)
	at com.aspose.words.zzs1.zzzW(Unknown Source)
	at com.aspose.words.zzs1.movePrevious(Unknown Source)
	at com.aspose.words.zzzT.zzj2(Unknown Source)
	at com.aspose.words.zzXzq.zzXDb(Unknown Source)
	at com.aspose.words.zzXzq.zzr7(Unknown Source)
	at com.aspose.words.zzXzq.zzVXw(Unknown Source)
	at com.aspose.words.zzZn7.zzXDb(Unknown Source)
	at com.aspose.words.zzZn7.zzWzC(Unknown Source)
	at com.aspose.words.zzYO3.zznU(Unknown Source)
	at com.aspose.words.zzYO3.zzXDb(Unknown Source)
	at com.aspose.words.zzXjg.zzj2(Unknown Source)
	at com.aspose.words.zzXjg.zzD3(Unknown Source)
	at com.aspose.words.zzXjg.zzj2(Unknown Source)
	at com.aspose.words.zzZVJ.zzZ20(Unknown Source)
	at com.aspose.words.zzZVJ.zzj2(Unknown Source)
	at com.aspose.words.zzYYS.zzXRq(Unknown Source)
	at com.aspose.words.zzYYS.zzD3(Unknown Source)
	at com.aspose.words.zzYYS.zzXDb(Unknown Source)
	at com.aspose.words.zzZK3.zzZDo(Unknown Source)
	at com.aspose.words.zzZK3.zzD3(Unknown Source)
	at com.aspose.words.zzZK3.zzj2(Unknown Source)
	at com.aspose.words.zzpI.zzYwb(Unknown Source)
	at com.aspose.words.zzpI.zzWdD(Unknown Source)
	at com.aspose.words.zzpI.zzWwl(Unknown Source)
	at com.aspose.words.zzpI.zzYwb(Unknown Source)
	at com.aspose.words.zzUl.zzYwb(Unknown Source)
	at com.aspose.words.zzWy9.zzj2(Unknown Source)
	at com.aspose.words.zzYBj.zzCB(Unknown Source)
	at com.aspose.words.zzU2.zzXOv(Unknown Source)
	at com.aspose.words.Document.updatePageLayout(Unknown Source)
	at com.aspose.words.zzWbB.zzZIO(Unknown Source)
	at com.aspose.words.zzw6.zzpO(Unknown Source)
	at com.aspose.words.zzXnb.zzM7(Unknown Source)
	at com.aspose.words.zzXnb.zzj2(Unknown Source)
	at com.aspose.words.zzXnb.zzXDb(Unknown Source)
	at com.aspose.words.zzXnb.zzWVC(Unknown Source)
	at com.aspose.words.Document.zzWIp(Unknown Source)
	at com.aspose.words.Document.zzj2(Unknown Source)
	at com.aspose.words.Document.zzXDb(Unknown Source)
	at com.aspose.words.Document.save(Unknown Source)
        .......

Sample code to reproduce:

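  import java.io.ByteArrayInputStream;
  import java.io.File;
  import java.io.IOException;
  import java.nio.file.Files;
  import javax.annotation.Nonnull;

  // Loads a Word document from an in-memory byte array.
  @Nonnull
  public static com.aspose.words.Document loadDocument(@Nonnull byte[] docBytes)
      throws IOException {
    try {
      return new com.aspose.words.Document(new ByteArrayInputStream(docBytes));
    } catch (Exception e) {
      throw new IOException("Could not load Word document", e);
    }
  }

  // Caller code (inside a method that declares throws IOException):
  String inputFile = "....../Concept report sample.docx";
  File ff = new File(inputFile);
  byte[] origFileBytes = Files.readAllBytes(ff.toPath()); // read the whole input file
  com.aspose.words.Document doc = loadDocument(origFileBytes);
  com.aspose.words.PdfSaveOptions options = new com.aspose.words.PdfSaveOptions();
  doc.save("abdde.pdf", options); // hangs with high CPU for some inputs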

@alex987654321

It seems that you are experiencing high CPU usage and possible infinite loops when converting certain DOCX files to PDF with Aspose.Words for Java 25.2. This kind of hang can be triggered by specific content or complex structures in the input DOCX file that the library struggles to lay out.

Here are some steps you can take to troubleshoot and potentially resolve the issue:

  1. Update the Library: First, consider updating to the latest version of Aspose.Words for Java. Newer releases regularly include performance improvements and bug fixes, so check the Aspose.Words for Java release notes for fixes related to your issue.

  2. Simplify the Document: If possible, narrow down what in the DOCX file triggers the issue. Remove complex elements such as nested tables, large images, or extensive formatting one at a time and retest, to find the element that causes the high CPU usage.

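  3. Use Save Options: PdfSaveOptions offers settings that control how the PDF is written. For example, you can enable output optimization, which removes redundant content and mainly reduces the size of the generated PDF. Note that the stack trace shows the time being spent in page layout (Document.updatePageLayout), so output settings are unlikely to fix this particular hang: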

    com.aspose.words.Document doc = loadDocument(origFileBytes);
    com.aspose.words.PdfSaveOptions options = new com.aspose.words.PdfSaveOptions();
    // Removes redundant/empty canvases and merges neighbouring glyphs (smaller output);
    // this does not necessarily make the conversion itself faster.
    options.setOptimizeOutput(true);
    doc.save("abdde.pdf", options);
    
  4. Diagnose the Hang: If the problem persists, take several thread dumps a few seconds apart (for example with jstack) while the conversion is stuck. If the worker thread stays in the same com.aspose.words frames each time, that points to a loop inside the library rather than in your own code.

  5. Contact Support: If none of the above solutions work, consider reaching out to Aspose support with the problematic DOCX file and the stack trace you provided. They may be able to offer more specific guidance or identify a bug in the library.
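For bulk jobs, it can also help to keep a single problematic document from blocking the whole batch, and to record where the worker is stuck (the information steps 4 and 5 ask for). The following is a minimal sketch using only the JDK (`ExecutorService`, `Future.get` with a timeout, `Thread.getStackTrace`); the class `ConversionWatchdog` and the `hangingConversion` task are hypothetical stand-ins, with the task playing the role of `doc.save(...)`. Note the assumption it cannot remove: `Future.cancel(true)` only interrupts the thread, so code that never checks for interruption keeps running (here on a daemon thread); a hard guarantee would require running conversions in a separate process.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class ConversionWatchdog {

    /** Result of one guarded conversion: finished, or timed out with a stack snapshot. */
    public static final class Outcome {
        public final boolean timedOut;
        public final StackTraceElement[] stuckAt; // empty unless timedOut

        Outcome(boolean timedOut, StackTraceElement[] stuckAt) {
            this.timedOut = timedOut;
            this.stuckAt = stuckAt;
        }
    }

    /**
     * Runs the task on a fresh worker thread and waits up to timeoutMillis.
     * On timeout, snapshots the worker's stack, then interrupts it via cancel(true).
     */
    public static Outcome runWithTimeout(Callable<Void> task, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "conversion-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            worker.set(t);
            return t;
        });
        try {
            Future<Void> future = executor.submit(task);
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return new Outcome(false, new StackTraceElement[0]);
            } catch (TimeoutException e) {
                // Capture where the worker is busy before trying to stop it.
                StackTraceElement[] trace = worker.get().getStackTrace();
                future.cancel(true); // only effective if the task honours interruption
                return new Outcome(true, trace);
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for doc.save(...): spins and never checks interruption.
        Callable<Void> hangingConversion = () -> {
            long x = 0;
            while (true) {
                x++;
                if (x == Long.MIN_VALUE) break; // practically never reached
            }
            return null;
        };

        Outcome quick = runWithTimeout(() -> null, 1_000);
        System.out.println("quick conversion timed out: " + quick.timedOut);

        Outcome stuck = runWithTimeout(hangingConversion, 200);
        System.out.println("hanging conversion timed out: " + stuck.timedOut);
        // For a real save() call, these top frames would be the com.aspose.words frames.
        Arrays.stream(stuck.stuckAt).limit(5).forEach(f -> System.out.println("    at " + f));
    }
}
```

In a real batch, the timed-out file name and the captured frames can be logged and attached to a support request, while the batch moves on to the next document.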

These steps should help you narrow down, and possibly work around, the high CPU usage during DOCX to PDF conversion. If you have any further questions or need additional assistance, feel free to ask!

@alex987654321 The problem is not reproducible using the latest 25.3 version of Aspose.Words. Please update to the latest version and let us know if the problem still persists on your side.
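When upgrading, it is worth confirming that the converter really loads the new jar, since an older copy left on the classpath would silently keep the old behaviour. A minimal sketch, assuming Aspose.Words for Java exposes a static `com.aspose.words.BuildVersionInfo.getVersion()` method; it uses reflection so it also compiles and runs without Aspose on the classpath. The class `AsposeVersionCheck` and its methods are hypothetical names for illustration.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class AsposeVersionCheck {

    /** Returns the loaded Aspose.Words version, or null if the library is not on the classpath. */
    public static String loadedAsposeWordsVersion() {
        try {
            // Assumed API: com.aspose.words.BuildVersionInfo.getVersion() (static).
            Class<?> info = Class.forName("com.aspose.words.BuildVersionInfo");
            Method getVersion = info.getMethod("getVersion");
            return String.valueOf(getVersion.invoke(null)); // static method, no receiver
        } catch (ClassNotFoundException e) {
            return null;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unexpected BuildVersionInfo API", e);
        }
    }

    /** Reports which jar (or location) a class was loaded from, or null if it is absent. */
    public static String loadedFrom(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String version = loadedAsposeWordsVersion();
        if (version == null) {
            System.out.println("Aspose.Words is not on the classpath");
        } else {
            System.out.println("Aspose.Words " + version + " loaded from "
                    + loadedFrom("com.aspose.words.Document"));
        }
    }
}
```

Logging this once at service start-up also makes it easy to quote the exact version when reporting whether the problem persists.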

Thank you, will give it a try.
