Hi,
We are migrating one of our legacy .NET services to Java and have encountered a significant decrease in performance on a flow that involves numerous calls to the LayoutCollector.getStartPageIndex method.
Unfortunately, there is no way to reduce the number of calls, as we need to identify the page number on which each paragraph in an OOXML document occurs.
Running the code below takes more than 6 seconds for a 2 MB test file (attached) with aspose-words 17.3.0. With 16.4.0, our current production version, it is even slower: the same code takes more than 8 seconds.
public Map<Integer, Node> makeProcessingMap(final Document document) throws Exception {
    final Map<Integer, Node> processingMap = new HashMap<>();
    final LayoutCollector collector = new LayoutCollector(document);
    final NodeCollection nodes = document.getChildNodes(NodeType.PARAGRAPH, true);
    for (final Node node : nodes) {
        int startPageIndex = collector.getStartPageIndex(node);
        processingMap.put(startPageIndex, node);
    }
    return processingMap;
}
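For reference, timings like the ones quoted above can be taken with a small wall-clock harness along the following lines. This is only a minimal sketch: `TimingHarness` and `timeMillis` are hypothetical names, and the `Thread.sleep` call in `main` is a stand-in for the real workload (the call to `makeProcessingMap` on the loaded document).

```java
import java.util.concurrent.Callable;

public final class TimingHarness {

    private TimingHarness() {
    }

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    public static <T> long timeMillis(final Callable<T> task) throws Exception {
        final long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(final String[] args) throws Exception {
        // Placeholder workload; in the real test this would be
        // makeProcessingMap(document) on the attached file.
        final long elapsed = timeMillis(() -> {
            Thread.sleep(50);
            return null;
        });
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

A single run like this includes JVM warm-up and JIT compilation, so repeating the measurement a few times in the same process gives a fairer comparison with the .NET figures.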
The legacy .NET code was less optimised and made many more calls to the GetStartPageIndex method, yet its overall performance was significantly better: the Java implementation takes almost twice as long to process the same file.
We need to process files that are much larger than this 2 MB sample, but we cannot get around this performance issue.
How can we overcome this performance gap between the Java and .NET implementations?
Thank you,
Oana