Word page extraction takes twice as long with the license applied as without it for the same file

We are truncating a Word file to 3 pages with the extractPages function in Aspose.Words for Java, and we observe that it is slow when we provide the license (~6 seconds).

It is faster if the license is not provided (~3 seconds), even though setLicense itself takes almost no time.
sample_file.docx (1.6 MB)

Attaching the sample file.

We are using simple Java code to truncate the pages to 3 using the extractPages function only.

@ELSSAM_elsevier_com

It seems you’re experiencing slowness when using a license with the Aspose.Words library for Java while truncating a Word document to 3 pages. This behavior is unusual, as applying a license should not significantly impact performance.

Here are a few suggestions to troubleshoot and potentially improve the performance:

  1. Check License Validity: Ensure that the license file is valid and correctly applied. An invalid license might cause the library to operate in a trial mode, which could affect performance.

  2. Profile Your Code: Use a profiler to identify where the time is being spent in your code. This can help determine if the slowness is related to the license application or the document processing itself.

  3. Optimize Document: Before truncating, consider optimizing the document. For example, reducing the size of images or removing unnecessary elements might help speed up processing.

  4. Use Latest Version: Ensure you are using the latest version of Aspose.Words for Java, as performance improvements and bug fixes are regularly included in updates.

  5. Contact Support: If the issue persists, consider reaching out to Aspose support with your findings. They may provide insights specific to your scenario.

If you need further assistance, please provide more details about your implementation, and I can help you refine your approach.

Yes,

  1. We are using a valid license.
  2. We added logs to confirm where the time goes: most of it is spent in the extractPages method, not in setLicense or anything else.
  3. We do not have control over the input file, we can only tune the code with options provided by Aspose.
  4. We are using 25.9, please confirm if there is any change related to this performance in 25.10.
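
The per-step logging described in point 2 could look roughly like the sketch below. It is not the code used in this thread: StepTimer and its method names are hypothetical, and the workloads in main are stand-ins so the sketch runs without the library. In a real test each lambda would wrap one Aspose.Words call (setLicense, the Document constructor, extractPages, save).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical helper: records wall-clock time per named step so stages can be compared. */
final class StepTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    /** Runs the step, adds its duration to the total for the label, and returns its result. */
    <T> T time(String label, Callable<T> step) {
        long start = System.nanoTime();
        try {
            return step.call();
        } catch (Exception e) {
            // Wrap checked exceptions so call sites stay short.
            throw new RuntimeException("Step failed: " + label, e);
        } finally {
            elapsedNanos.merge(label, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total elapsed milliseconds for a label, or -1 if the step never ran. */
    long millis(String label) {
        Long nanos = elapsedNanos.get(label);
        return nanos == null ? -1 : nanos / 1_000_000;
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();

        // Stand-in for loading; in a real test: () -> new Document(path)
        String doc = timer.time("load", () -> {
            Thread.sleep(5);
            return "document";
        });

        // Stand-in for page extraction; in a real test: () -> doc.extractPages(0, 3)
        int length = timer.time("extractPages", () -> {
            Thread.sleep(20);
            return doc.length();
        });

        System.out.println("load: " + timer.millis("load") + " ms");
        System.out.println("extractPages: " + timer.millis("extractPages") + " ms");
        System.out.println("result length: " + length);
    }
}
```

Timing each call separately makes it possible to tell whether the licensed run spends its extra time in loading, in extractPages, or in saving.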

We see simple files of ~5 MB and ~10 MB taking 100+ seconds just to truncate, which causes a performance impact, while memory and CPU usage look stable.

@ELSSAM_elsevier_com The behavior is expected. When you use Aspose.Words in evaluation mode, the input document size is limited to several hundred paragraphs. In this case Aspose.Words does not build the whole document layout, so the page extraction process executes faster.
Your document is quite large (145 pages). In evaluation mode, though, only 35 pages are processed.

@alexey.noskov It's not about this input; this is a sample prepared to identify the root cause.
We are facing this performance issue with a 32-page file, where truncating the pages without the license is quicker (15 seconds) than with the license (40 seconds) on a local machine.

When run in the cloud, truncating the same file sometimes takes up to 150 seconds, and it completes in 45 seconds at minimum…
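
Because the same file varies this much from run to run, a comparison between licensed and unlicensed runs is easier to read when each case is timed several times and summarised. The sketch below is only an illustration: RunStats and its members are hypothetical names, and the task passed to measure is a placeholder for the actual truncation call.

```java
import java.util.Arrays;

/** Hypothetical helper: summarises repeated timings so one slow run does not skew a comparison. */
final class RunStats {
    final long minMillis;
    final long medianMillis;
    final long maxMillis;

    private RunStats(long[] sortedMillis) {
        minMillis = sortedMillis[0];
        maxMillis = sortedMillis[sortedMillis.length - 1];
        // For an even number of runs this picks the upper of the two middle values.
        medianMillis = sortedMillis[sortedMillis.length / 2];
    }

    /** Builds statistics from raw timings in milliseconds; the input array is not modified. */
    static RunStats of(long... millis) {
        if (millis.length == 0) {
            throw new IllegalArgumentException("at least one timing is required");
        }
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        return new RunStats(sorted);
    }

    /** Runs the task the given number of times and times each run. */
    static RunStats measure(int runs, Runnable task) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return of(millis);
    }

    @Override
    public String toString() {
        return "min=" + minMillis + " ms, median=" + medianMillis + " ms, max=" + maxMillis + " ms";
    }
}
```

For example, `RunStats.measure(5, () -> truncate(file))` (where truncate is whatever method wraps the extraction) could be called once with the license set and once without, and the two medians compared.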

@ELSSAM_elsevier_com Could you please provide the problematic input document where the slowness is observed without the document being truncated by the evaluation version limitations?

While we only have data from production which cannot be shared at the moment, we will prepare an anonymized sample shortly.

Our objective is to optimize document loading, with an option to restrict loading to a specific number of pages (maybe 3 pages), as is done in the trial version (35 pages).
We would like to know if there is already an option to load only a specific number of pages when loading the document.

@ELSSAM_elsevier_com MS Word documents are flow documents by nature and have no “page” concept. To use the ExtractPages method, Aspose.Words has to build the document layout to determine where each page starts and ends. So it is technically impossible to limit the number of loaded pages.

Okay, there are 2 things that are unclear to us:

  1. What is the most optimised way to keep only the first 3 pages of a Word file and truncate the rest? The extractPages function is slow (25+ seconds) when the file has 4 or 5 images in it. Can you please share the most optimised way of loading a Word file (doc and docx only), including the best load options and save options (saving as docx) to use in Java?
    Please consider the sample provided above as input for this.
  2. How can Aspose be quicker when the license is not set and slower when the paid license is provided? This is easily observable when the file size is more than 15 MB.

@ELSSAM_elsevier_com

  1. Actually, document loading is a very fast operation. With your particular document, loading takes less than a second on my side. There are no load options that can make the loading process faster. However, if your input document contains linked images that need to be loaded from external resources, the loading process can be slower due to the additional time required to fetch these resources. In this case, you can skip loading external resources by implementing the IResourceLoadingCallback interface.

  2. In your case, most of the time spent on page extraction is consumed by building the document layout. Unfortunately, there is no way to make this process faster. You can try to make the document smaller before building the document layout.

import com.aspose.words.Document;
import com.aspose.words.NodeType;

Document doc = new Document("C:\\Temp\\in.docx");

// Remove all sections except the first.
while (doc.getSections().getCount() > 1)
    doc.getLastSection().remove();

// Keep only the first 20 child nodes of the first section's body; choose a
// limit large enough to cover the pages being extracted.
while (doc.getFirstSection().getBody().getChildNodes(NodeType.ANY, false).getCount() > 20)
    doc.getFirstSection().getBody().getLastChild().remove();

// Extract at most 3 pages from the trimmed document.
int pages = Math.min(3, doc.getPageCount());
doc.extractPages(0, pages).save("C:\\Temp\\out.docx");