
Performance issues with LayoutCollector.getStartPageIndex in Java


We are migrating one of our legacy .NET services to Java and have encountered a significant decrease in performance on a flow that involves numerous calls to the LayoutCollector.getStartPageIndex method.

Unfortunately, there is no way to reduce the number of calls, as we need to identify the page number on which each paragraph in an OOXML document occurs.

Running the code below takes more than 6 seconds for a 2MB test file (attached) with aspose-words 17.3.0; with 16.4.0, our current production version, it is even slower (the same code takes more than 8 seconds):

import java.util.HashMap;
import java.util.Map;
import com.aspose.words.*;

public Map<Integer, Node> makeProcessingMap(final Document document) throws Exception {
    final Map<Integer, Node> processingMap = new HashMap<>();
    // The collector maps document nodes to positions in the page layout.
    final LayoutCollector collector = new LayoutCollector(document);
    // Collect every paragraph in the document, including nested ones.
    final NodeCollection nodes = document.getChildNodes(NodeType.PARAGRAPH, true);
    for (final Node node : nodes) {
        int startPageIndex = collector.getStartPageIndex(node);
        processingMap.put(startPageIndex, node);
    }
    return processingMap;
}
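
A side note on the snippet above, offered as a sketch rather than a fix for the timing: `HashMap.put` replaces any earlier value stored under the same key, so a map keyed by page index ends up holding one paragraph per page. If the goal is to keep every paragraph, one option is to group them by page. The class `PageGrouping` and the helper `groupByPage` are hypothetical names, and plain strings stand in for Aspose nodes so that the example runs without the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

public class PageGrouping {

    // Groups items by the page index that pageOf reports, keeping every item.
    // A TreeMap keeps the pages in ascending order.
    public static <T> Map<Integer, List<T>> groupByPage(Iterable<T> items,
                                                        ToIntFunction<? super T> pageOf) {
        Map<Integer, List<T>> byPage = new TreeMap<>();
        for (T item : items) {
            int page = pageOf.applyAsInt(item);
            byPage.computeIfAbsent(page, k -> new ArrayList<>()).add(item);
        }
        return byPage;
    }

    public static void main(String[] args) {
        // Stand-in data (assumption): each string plays the role of a paragraph,
        // and the number after the colon plays the role of its start page index.
        List<String> paragraphs = Arrays.asList("intro:1", "body:1", "summary:2");
        Map<Integer, List<String>> byPage = groupByPage(
                paragraphs,
                p -> Integer.parseInt(p.substring(p.indexOf(':') + 1)));
        System.out.println(byPage);
    }
}
```

With the real API, the page lookup would be the `collector.getStartPageIndex(node)` call from the post. The method in the post is declared `throws Exception`, so the lambda would likely need a try/catch around that call.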

The legacy .NET code was less optimised and made many more calls to the GetStartPageIndex method; even so, its overall performance was significantly better (the Java implementation takes almost twice as long to process the same file).

We need to be able to process files that are much larger than this 2MB sample, but can’t get around this performance issue.

How can we overcome this performance gap between the Java and .NET implementations?

Thank you,

Hi Oana,

Thanks for your inquiry.

In this case, Aspose.Words needs to build a page layout of the document internally. As a rough guide, Aspose.Words lays out about 10 pages per second, so the extra time it takes to format a document into pages depends on the number of pages in your Word document. Please also note that this process is not linear: a single page may take a minute to lay out, while 100 pages may take only a few seconds. Put simply, processing time and memory usage depend entirely on your documents and their complexity.

We have tested the following code on Windows 10:

// Document load
Document doc = new Document("D:\\temp\\docx_2mb.docx");

// Rest of the code
LayoutCollector collector = new LayoutCollector(doc);

NodeCollection nodes = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Node node : nodes) {
    int startPageIndex = collector.getStartPageIndex(node);
}

We observed the following readings on the .NET Framework 4.6 and Java 8 platforms. Your Word document has 308 pages, and a difference of around three seconds between the .NET and Java platforms looks acceptable.


              Aspose.Words for .NET (17.3)                  Aspose.Words for Java (17.3)
              Document Load (ms)    Rest of the Code (ms)   Document Load (ms)    Rest of the Code (ms)
Reading 1
Reading 2
Reading 3
Total (Avg)
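
Readings like the ones in the table above, with document load and the rest of the code timed separately, can be collected along the following lines. This is a minimal sketch, not the code used for the measurements in this thread: the class `PhaseTimer` and its `time` helper are hypothetical names, and simple stand-in workloads replace the Aspose.Words calls so that the example runs on its own.

```java
import java.util.concurrent.Callable;

public class PhaseTimer {

    // Holds the value a phase produced and the wall-clock time it took.
    public static final class Timed<T> {
        public final T value;
        public final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    // Runs the task once and measures it with the monotonic System.nanoTime clock.
    // Callable is used so that tasks may throw checked exceptions.
    public static <T> Timed<T> time(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        T value = task.call();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the "Document load" phase (assumption: in the thread's
        // scenario this would be the `new Document(...)` call).
        Timed<int[]> load = time(() -> new int[1_000_000]);

        // Stand-in for the "Rest of the code" phase (assumption: in the thread's
        // scenario this would be the getStartPageIndex loop).
        Timed<Long> rest = time(() -> {
            long sum = 0;
            for (int v : load.value) {
                sum += v;
            }
            return sum;
        });

        System.out.println("Document load (ms): " + load.millis);
        System.out.println("Rest of the code (ms): " + rest.millis);
    }
}
```

On the JVM, the first run of a code path also pays for class loading and JIT compilation, so taking several readings, as in the table, and comparing the later ones gives a fairer picture than a single cold run.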



Please let us know if we can be of any further assistance.

Best regards,

Hi Awais,

Thank you for your analysis! We’ll look further into it and see how we could improve the response time.

Have a nice day!