Using Aspose.Words for Java 21.4. Reproduced on macOS and Linux.
I've created a test case that simulates what our code does (I can't share the actual code). We call getRectangle() on the LayoutEnumerator to get the bounding box and page index of each line of text. This works for most of the documents we process, but we see two problems:
- For some documents, getRectangle() throws a NullPointerException.
- For some documents, getEntity() on the LayoutCollector returns null.
I've attached a Word document that demonstrates both issues, LayoutEnumeratorNPE.docx (15.2 KB). Example code that triggers the problems is below:
```java
// Assumed setup (omitted here): doc is the loaded Document,
// layoutCollector = new LayoutCollector(doc), layoutEnumerator = new LayoutEnumerator(doc),
// LOG is our logger, and Rectangle2D is java.awt.geom.Rectangle2D.

// For each paragraph, print the bounding box of every span
NodeCollection<Paragraph> paras = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : paras) {
    Object layoutEntity = layoutCollector.getEntity(para);
    if (layoutEntity == null) {
        System.out.println("Unable to get layout entity for " + para);
        continue;
    }
    layoutEnumerator.setCurrent(layoutEntity);
    // getEntity(para) positions us on the paragraph's end-mark span; move up to its line
    layoutEnumerator.moveParent();
    // For each line in the paragraph (walking backwards from the last line)...
    int lineNo = 0;
    do {
        if (layoutEnumerator.getType() == LayoutEntityType.LINE) {
            layoutEnumerator.moveFirstChild();
            // For each span within the line...
            int spanNo = 0;
            do {
                String id = String.format("paragraph=[%s], line=%d, span=%d", para, lineNo, spanNo);
                try {
                    if (layoutEnumerator.getType() == LayoutEntityType.SPAN) {
                        Rectangle2D.Float bounds = layoutEnumerator.getRectangle();
                        int pageIndex = layoutEnumerator.getPageIndex();
                        System.out.println(id + " page=" + pageIndex + ", bounds=" + bounds);
                    }
                } catch (Exception ex) {
                    LOG.warn("Error occurred getting layout information for " + id, ex);
                }
                spanNo++;
            } while (layoutEnumerator.moveNext());
            // Back up from the last span to its line before moving to the previous line
            layoutEnumerator.moveParent();
            lineNo++;
        }
    } while (layoutEnumerator.movePreviousLogical());
}
```
The output is below (it includes our own debug logging and DOM trace):
```
08:50:20 [com.elsevier.dp.works.transformation.Main.main()] DEBUG com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer - Document DOM tree:
Document originalFilename=LayoutEnumeratorNPE.docx
  Section 0 paperSize=A4, differentFirstPageHeaderFooter=false, oddAndEvenPagesHeaderFooter=false
    Body 0.0
      Paragraph 0.0.0
        Run 0.0.0.0 text=Layout enumerator NPE
      Paragraph 1.0.0
      Paragraph 2.0.0
      Paragraph 3.0.0
        Shape 0.3.0.0 shapeType=RECTANGLE
          Paragraph 0.0.3.0.0
      Paragraph 4.0.0
```
```
paragraph=[Paragraph 0.0.0], line=0, span=0 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=72.0,y=72.0,w=32.877,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=1 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=104.877,y=72.0,w=2.713,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=2 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=107.59,y=72.0,w=58.6,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=3 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=166.19,y=72.0,w=2.713,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=4 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=168.903,y=72.0,w=19.805,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=5 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=188.708,y=72.0,w=7.031,h=14.648]
paragraph=[Paragraph 1.0.0], line=0, span=0 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=72.0,y=86.648,w=7.031,h=14.648]
Unable to get layout entity for Paragraph 2.0.0
Unable to get layout entity for Paragraph 3.0.0
08:50:20 [com.elsevier.dp.works.transformation.Main.main()] WARN com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer - Error occurred getting layout information for paragraph=[Paragraph 0.0.3.0.0], line=0, span=0
java.lang.NullPointerException: null
	at com.aspose.words.zzYO1.zzsQ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.zzYXJ.zzZ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.zzYXJ.zzYFU(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.zzYN4.zzYFU(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.zzYXJ.zzZ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.zzYXJ.zzW(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.zzYXJ.zzZ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.LayoutEnumerator.zzZ5X(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.aspose.words.LayoutEnumerator.getRectangle(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
	at com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer.transform(AsposeWorksTransformer.java:121) [classes/:?]
	at com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer.transform(AsposeWorksTransformer.java:62) [classes/:?]
	at com.elsevier.dp.works.transformation.Main.main(Main.java:39) [classes/:?]
	at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:254) [exec-maven-plugin-3.0.0.jar:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
paragraph=[Paragraph 4.0.0], line=0, span=0 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=72.0,y=101.296,w=7.031,h=14.648]
```
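Since what we ultimately want is one box per line, the span rectangles printed above can be merged into a line-level box. Here is a minimal sketch that uses only java.awt.geom.Rectangle2D from the JDK. The span values are copied from the Paragraph 0.0.0 output above. The LineBounds class and the unionOf helper are hypothetical names used for illustration and are not part of Aspose.Words or our code.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

public class LineBounds {
    // Hypothetical helper, not an Aspose.Words API: merges the span rectangles
    // of one layout line into a single bounding box.
    static Rectangle2D unionOf(List<Rectangle2D.Float> spans) {
        if (spans.isEmpty()) {
            throw new IllegalArgumentException("a line needs at least one span");
        }
        Rectangle2D result = new Rectangle2D.Float();
        result.setRect(spans.get(0));
        for (Rectangle2D.Float span : spans) {
            // Rectangle2D.union allows the destination to be one of the sources
            Rectangle2D.union(result, span, result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Span rectangles for Paragraph 0.0.0, line 0, copied from the output above
        List<Rectangle2D.Float> spans = List.of(
                new Rectangle2D.Float(72.0f, 72.0f, 32.877f, 14.648f),
                new Rectangle2D.Float(104.877f, 72.0f, 2.713f, 14.648f),
                new Rectangle2D.Float(107.59f, 72.0f, 58.6f, 14.648f),
                new Rectangle2D.Float(166.19f, 72.0f, 2.713f, 14.648f),
                new Rectangle2D.Float(168.903f, 72.0f, 19.805f, 14.648f),
                new Rectangle2D.Float(188.708f, 72.0f, 7.031f, 14.648f));
        Rectangle2D line = unionOf(spans);
        System.out.printf("line bounds: x=%.3f, y=%.3f, w=%.3f, h=%.3f%n",
                line.getX(), line.getY(), line.getWidth(), line.getHeight());
    }
}
```

The spans are contiguous (each span's x + w equals the next span's x), so the merged box runs from x=72.0 to 188.708 + 7.031 = 195.739. The same arithmetic applied vertically shows Paragraph 1.0.0 ending at 86.648 + 14.648 = 101.296, which is exactly where Paragraph 4.0.0 starts, so Paragraphs 2.0.0 and 3.0.0 do not appear to occupy any vertical space in the layout.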