extractPages in Aspose 25.2 running infinitely

As part of our upgrade to 25.2, we noticed that document.extractPages ran indefinitely (for more than 30 minutes) on the given file, even though the file is lightweight.
Manuscript.docx (34.9 KB)

Could you shed some light on the issue?

P.S.: Upgrading to 25.4 has now solved the issue, but we would like to understand the technical cause behind it.

@ELSSAM_elsevier_com

It seems that you encountered an issue with the extractPages method in Aspose.Words version 25.2, where it was running indefinitely for a lightweight document. This behavior can sometimes occur due to specific complexities or bugs in the library version you were using.

Although upgrading to version 25.4 resolved the issue, some background may help: the extractPages method returns a document representing a specified range of pages. It takes two parameters: the zero-based index of the first page to extract and the number of pages to extract. The resulting document should maintain a layout similar to what you would see in Microsoft Word, including headers, footers, and cross-references.

The indefinite running time you experienced could have been due to a bug or inefficiency in the handling of certain document structures in version 25.2. Such issues can arise from the complexity of the document’s content or formatting, which may not have been fully optimized in that version.

If you want to further investigate the technical details behind this issue, you might consider checking the release notes or change logs for version 25.4, as they often provide insights into bug fixes and performance improvements that could clarify what was addressed in the upgrade.

For your reference, here is a simple example of how to use the extractPages method:

import com.aspose.words.Document;

Document doc = new Document("path/to/your/document.docx");
Document extractedDoc = doc.extractPages(0, 2); // Extracts the first two pages (zero-based index 0, count 2)
extractedDoc.save("path/to/save/extractedDocument.docx");

This code snippet demonstrates how to extract pages from a document and save the result.
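To split a whole document rather than only its first two pages, the zero-based (index, count) pairs can be computed up front. This is a minimal sketch with a hypothetical pageRanges helper; it makes no Aspose calls itself, and the assumption is that totalPages would come from doc.getPageCount() and each pair would be passed to doc.extractPages(index, count).

```java
import java.util.ArrayList;
import java.util.List;

public class PageRanges {
    // Hypothetical helper: splits totalPages into consecutive chunks of at most
    // chunkSize pages and returns {index, count} pairs, where index is zero-based,
    // matching the (index, count) parameters described for extractPages.
    static List<int[]> pageRanges(int totalPages, int chunkSize) {
        if (totalPages < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("totalPages must be >= 0 and chunkSize > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int index = 0; index < totalPages; index += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int count = Math.min(chunkSize, totalPages - index);
            ranges.add(new int[] {index, count});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // For a 5-page document split into 2-page chunks: (0,2), (2,2), (4,1).
        for (int[] r : pageRanges(5, 2)) {
            System.out.println("extractPages(" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Running main prints extractPages(0, 2), extractPages(2, 2) and extractPages(4, 1), one per line.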

If you have any further questions or need assistance with specific scenarios, feel free to ask!

We could not find any reference to this in the release notes; please provide more information about the root cause of the issue.

@ELSSAM_elsevier_com Unfortunately, it is quite hard to answer this question. The problem might be caused by a new feature introduced in version 25.2: character spacing control. We have seen a similar problem after introducing this feature and have improved the algorithm in the most recent versions.

With Aspose 25.4 we observe that the problem still persists: the thread runs indefinitely. The thread trace is below.

java.lang.Thread.State: RUNNABLE
	at com.aspose.words.zzxY.zzHF(Unknown Source)
	at com.aspose.words.zzWis.zznx(Unknown Source)
	at com.aspose.words.zzWis.zzA(Unknown Source)
	at com.aspose.words.zzWis.movePrevious(Unknown Source)
	at com.aspose.words.zzWRD.zzZmh(Unknown Source)
	at com.aspose.words.zzVRu.zzZ90(Unknown Source)
	at com.aspose.words.zzVRu.zzVWQ(Unknown Source)
	at com.aspose.words.zzVRu.zzZma(Unknown Source)
	at com.aspose.words.zzWFl.zzZ90(Unknown Source)
	at com.aspose.words.zzWFl.zzrI(Unknown Source)
	at com.aspose.words.zzWBK.zzZPS(Unknown Source)
	at com.aspose.words.zzWBK.zzZ90(Unknown Source)
	at com.aspose.words.zzWTS.zzZmh(Unknown Source)
	at com.aspose.words.zzWTS.zzWd9(Unknown Source)
	at com.aspose.words.zzWTS.zzZmh(Unknown Source)
	at com.aspose.words.zzX4v.zzXTZ(Unknown Source)
	at com.aspose.words.zzX4v.zzZmh(Unknown Source)
	at com.aspose.words.zzW38.zzYnE(Unknown Source)
	at com.aspose.words.zzW38.zzWd9(Unknown Source)
	at com.aspose.words.zzW38.zzZ90(Unknown Source)
	at com.aspose.words.zzZoq.zzY8v(Unknown Source)
	at com.aspose.words.zzZoq.zzZmh(Unknown Source)
	at com.aspose.words.zzZoq.zzZ90(Unknown Source)
	at com.aspose.words.zzXrK.zzXXo(Unknown Source)
	at com.aspose.words.zzXrK.zzZPS(Unknown Source)
	at com.aspose.words.zzXrK.zzZuM(Unknown Source)
	at com.aspose.words.zzXrK.zzWZs(Unknown Source)
	at com.aspose.words.zzIb.zzWZs(Unknown Source)
	at com.aspose.words.zzXlt.zzZmh(Unknown Source)
	at com.aspose.words.zzXo5.zzZ0s(Unknown Source)
	at com.aspose.words.zzY4d.zzYSg(Unknown Source)
	at com.aspose.words.Document.updatePageLayout(Unknown Source)
	at com.aspose.words.Document.zzYsL(Unknown Source)
	at com.aspose.words.Document.getPageCount(Unknown Source)
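A trace like the one above is typically captured with the JDK's jstack tool against the running process. As a minimal sketch, assuming you would rather log it from inside the application when a call such as getPageCount or extractPages exceeds a time budget, the stack of a specific thread can be sampled with Thread.getStackTrace(). The class and method names (StackDump, dump) are hypothetical, and the busy loop only stands in for the hanging call.

```java
public class StackDump {
    // Formats a thread's current stack in roughly the "at ..." style of the
    // trace above, so it can be logged when an operation appears to hang.
    static String dump(Thread t) {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(t.getName()).append("\" java.lang.Thread.State: ")
          .append(t.getState()).append('\n');
        for (StackTraceElement e : t.getStackTrace()) {
            sb.append("\tat ").append(e).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a call that never returns (a busy loop), run on a daemon
        // thread so the JVM can still exit after its stack has been sampled.
        Thread worker = new Thread(() -> {
            long n = 0;
            while (true) {
                n++;
            }
        }, "layout-worker");
        worker.setDaemon(true);
        worker.start();
        Thread.sleep(200); // give the worker time to enter the loop
        System.out.print(dump(worker));
    }
}
```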

@ELSSAM_elsevier_com Could you please provide simple code that will allow us to reproduce the problem on our side?
I have tested with the following simple code and the problem is not reproducible:

Document doc = new Document("C:\\Temp\\in.docx");

for (int i = 0; i < doc.getPageCount(); i++)
{
    // Extract page i into its own document and save that document (not the source).
    Document page = doc.extractPages(i, 1);
    page.save("C:\\Temp\\page_" + i + ".docx");
}

I would like to add a few details:

  1. This issue was reproducible on Mac with 25.2 and started working fine with 25.4, but on an Alpine machine it still fails to complete (it runs indefinitely). I assume the thread stack should make the cause understandable on your side. I can see a while(true) loop in the zzWRD.zzZmh method, which could be causing this infinite execution.
  2. We are running code very similar to yours; there is nothing complex in it.
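Until a fix is available, one way to keep a hung layout from blocking the caller indefinitely is to run the call under a time budget. The sketch below assumes a hypothetical helper, callWithTimeout, built only on java.util.concurrent; in practice the lambda would wrap the real call, for example doc.extractPages(i, 1). It does not fix the underlying loop: Future.cancel(true) only sets the thread's interrupt flag, and a CPU-bound loop that never checks that flag (such as a library-internal while(true)) keeps running and consuming a core. Making the worker a daemon thread merely ensures it does not keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutGuard {
    // Runs task on a daemon thread and waits at most timeoutMs for its result.
    // On timeout it requests cancellation (interrupt) and rethrows TimeoutException;
    // a task that ignores interrupts may continue running in the background.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-task");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<T> f = exec.submit(task);
            try {
                return f.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                f.cancel(true);
                throw e;
            }
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        Integer answer = callWithTimeout(() -> 42, 1000); // completes normally
        System.out.println(answer);
        try {
            // Stand-in for a call that never returns.
            callWithTimeout(() -> { while (true) { } }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Running main prints 42 and then timed out.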

@ELSSAM_elsevier_com
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSJAVA-3092

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.