Show Hidden Text & Get (x, y) Position of Paragraphs in Word Document | Avoid Java Null Pointer Exception

Using Aspose.Words for Java 21.4. Reproduced on macOS and Linux.

I’ve created a test case which simulates what our code is doing (I can’t share our actual code). We’re calling getRectangle() on the layout enumerator in order to get the bounding box and page index information for each line of text. This works for most of the documents we process, but we see 2 problems:

  • For some documents we see a NullPointerException when calling getRectangle().
  • For some documents calling getEntity() on the layout collector returns null.

I'm attaching a Word document that demonstrates both issues: LayoutEnumeratorNPE.docx (15.2 KB). Example code to trigger the problem is below:

        // For each paragraph print the span bounding box
        NodeCollection<Paragraph> paras = doc.getChildNodes(NodeType.PARAGRAPH, true);
        for (Paragraph para : paras) {
          Object layoutEntity = layoutCollector.getEntity(para);
          if (layoutEntity == null) {
            System.out.println("Unable to get layout entity for " + para);
            continue;
          }

          layoutEnumerator.setCurrent(layoutEntity);
          layoutEnumerator.moveParent();

          // For each line in the paragraph...
          int lineNo = 0;
          do {
            if (layoutEnumerator.getType() == LayoutEntityType.LINE) {
              layoutEnumerator.moveFirstChild();

              // For each span within the line...
              int spanNo = 0;
              do {
                String id = String.format("paragraph=[%s], line=%d, span=%d", para, lineNo, spanNo);
                try {
                  if (layoutEnumerator.getType() == LayoutEntityType.SPAN) {
                    // Rectangle2D.Float from java.awt.geom, not java.lang.Float
                    Rectangle2D.Float bounds = layoutEnumerator.getRectangle();
                    int pageIndex = layoutEnumerator.getPageIndex();
                    System.out.println(id + " page=" + pageIndex + ", bounds=" + bounds);
                  }
                } catch (Exception ex) {
                  LOG.warn("Error occurred getting layout information for " + id, ex);
                }

                spanNo++;
              } while (layoutEnumerator.moveNext());

              layoutEnumerator.moveParent();
              lineNo++;
            }
          } while (layoutEnumerator.movePreviousLogical());
        }

The output from this is as follows (including our own debugging/DOM trace):

08:50:20 [com.elsevier.dp.works.transformation.Main.main()] DEBUG com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer - Document DOM tree:
Document originalFilename=LayoutEnumeratorNPE.docx
  Section 0 paperSize=A4, differentFirstPageHeaderFooter=false, oddAndEvenPagesHeaderFooter=false
    Body 0.0
      Paragraph 0.0.0
        Run 0.0.0.0 text=Layout enumerator NPE
      Paragraph 1.0.0
      Paragraph 2.0.0
      Paragraph 3.0.0
        Shape 0.3.0.0 shapeType=RECTANGLE
          Paragraph 0.0.3.0.0
      Paragraph 4.0.0
paragraph=[Paragraph 0.0.0], line=0, span=0 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=72.0,y=72.0,w=32.877,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=1 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=104.877,y=72.0,w=2.713,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=2 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=107.59,y=72.0,w=58.6,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=3 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=166.19,y=72.0,w=2.713,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=4 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=168.903,y=72.0,w=19.805,h=14.648]
paragraph=[Paragraph 0.0.0], line=0, span=5 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=188.708,y=72.0,w=7.031,h=14.648]
paragraph=[Paragraph 1.0.0], line=0, span=0 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=72.0,y=86.648,w=7.031,h=14.648]
Unable to get layout entity for Paragraph 2.0.0
Unable to get layout entity for Paragraph 3.0.0
08:50:20 [com.elsevier.dp.works.transformation.Main.main()] WARN  com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer - Error occurred getting layout information for paragraph=[Paragraph 0.0.3.0.0], line=0, span=0
java.lang.NullPointerException: null
        at com.aspose.words.zzYO1.zzsQ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.zzYXJ.zzZ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.zzYXJ.zzYFU(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.zzYN4.zzYFU(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.zzYXJ.zzZ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.zzYXJ.zzW(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.zzYXJ.zzZ(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.LayoutEnumerator.zzZ5X(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.aspose.words.LayoutEnumerator.getRectangle(Unknown Source) ~[aspose-words-21.4-jdk17.jar:21.4.0]
        at com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer.transform(AsposeWorksTransformer.java:121) [classes/:?]
        at com.elsevier.dp.works.transformation.aspose.AsposeWorksTransformer.transform(AsposeWorksTransformer.java:62) [classes/:?]
        at com.elsevier.dp.works.transformation.Main.main(Main.java:39) [classes/:?]
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:254) [exec-maven-plugin-3.0.0.jar:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
paragraph=[Paragraph 4.0.0], line=0, span=0 page=1, bounds=java.awt.geom.Rectangle2D$Float[x=72.0,y=101.296,w=7.031,h=14.648]

@arronhardenels,

We have logged this problem in our issue tracking system with ID WORDSNET-22403. We will investigate it further and keep you updated on its status. We apologize for the inconvenience.

@arronhardenels,

Regarding WORDSNET-22403, we have completed our analysis and closed the issue with “not a bug” status. To fix the problem, call doc.getLayoutOptions().setShowHiddenText(true); after loading the Word document. The complete code is as follows:

Document doc = new Document("C:\\Temp\\LayoutEnumeratorNPE.docx");

doc.getLayoutOptions().setShowHiddenText(true);

LayoutCollector layoutCollector = new LayoutCollector(doc);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);

// For each paragraph print the span bounding box
NodeCollection<Paragraph> paras = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : paras) {
    Object layoutEntity = layoutCollector.getEntity(para);
    if (layoutEntity == null) {
        System.out.println("Unable to get layout entity for " + para);
        continue;
    }

    layoutEnumerator.setCurrent(layoutEntity);
    layoutEnumerator.moveParent();

    // For each line in the paragraph...
    int lineNo = 0;
    do {
        if (layoutEnumerator.getType() == LayoutEntityType.LINE) {
            layoutEnumerator.moveFirstChild();

            // For each span within the line...
            int spanNo = 0;
            do {
                String id = String.format("paragraph=[%s], line=%d, span=%d", para, lineNo, spanNo);
                try {
                    if (layoutEnumerator.getType() == LayoutEntityType.SPAN) {

                        int pageIndex = layoutEnumerator.getPageIndex();
                        System.out.println(id + " page=" + pageIndex + ", bounds=" + layoutEnumerator.getRectangle());
                    }
                } catch (Exception ex) {
                    // Separate the span id from the exception text so the message stays readable
                    System.out.println("Error occurred getting layout information for " + id + ": " + ex);
                }

                spanNo++;
            } while (layoutEnumerator.moveNext());

            layoutEnumerator.moveParent();
            lineNo++;
        }
    } while (layoutEnumerator.movePreviousLogical());
}

Hidden text is not rendered, so it does not appear in the layout model and a null layout entity is returned. The exception occurs because the line in question is inside a shape anchored in the hidden paragraph, so the coordinates of that line cannot be resolved.
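
For callers who prefer to keep hidden text out of the layout, one defensive option is to wrap each layout query so that a null result or an exception becomes an empty value instead of aborting the loop. The sketch below uses only the JDK. LayoutGuard and tryQuery are hypothetical names, not part of Aspose.Words, and swallowing every exception is an assumption that suits best-effort extraction only.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Aspose.Words.
final class LayoutGuard {
    private LayoutGuard() {}

    /**
     * Runs a layout query and returns its result, or an empty Optional
     * when the query returns null or throws (including NullPointerException).
     */
    static <T> Optional<T> tryQuery(Callable<T> query) {
        try {
            return Optional.ofNullable(query.call());
        } catch (Exception ex) {
            // Assumption: a failed query means "no layout information available".
            return Optional.empty();
        }
    }
}
```

Under those assumptions, the call inside the span loop would become LayoutGuard.tryQuery(() -> layoutEnumerator.getRectangle()), and spans whose coordinates cannot be resolved would simply be skipped.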

Thanks for the analysis.

However, what about the NullPointerException being thrown from deep within the library when I call getRectangle()? Throwing an NPE is pretty much always a bug in my book…

Is there a way I can tell whether a paragraph is hidden (I don’t want to process hidden content)?

@arronhardenels,

You can use the Font.Hidden property to determine whether an element is hidden. For example, the following check tells you whether a paragraph mark is hidden:

if (paragraph.getParagraphBreakFont().getHidden()) {
    // The paragraph mark is hidden, i.e. not visible.
}

Similarly, the Run class represents text in a document, and you can use the Run.Font.Hidden property (run.getFont().getHidden() in Java) to identify hidden text.
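
What counts as a "hidden paragraph" is a policy decision, since the paragraph mark and each run carry their own hidden flag. The sketch below shows one possible rule, kept free of Aspose types so it can be run on its own: a paragraph is treated as hidden only when its mark and all of its runs are hidden. HiddenPolicy and isFullyHidden are hypothetical names, and the rule itself is an assumption rather than anything Aspose prescribes.

```java
import java.util.List;

// Hypothetical helper, not part of Aspose.Words.
final class HiddenPolicy {
    private HiddenPolicy() {}

    /**
     * @param markHidden value of paragraph.getParagraphBreakFont().getHidden()
     * @param runHidden  one flag per run, taken from run.getFont().getHidden()
     * @return true when the paragraph mark and every run are hidden
     */
    static boolean isFullyHidden(boolean markHidden, List<Boolean> runHidden) {
        // allMatch is true for an empty list, so an empty paragraph follows its mark.
        return markHidden && runHidden.stream().allMatch(Boolean::booleanValue);
    }
}
```

The flags would be collected from each Paragraph and its runs before the layout loop, and paragraphs for which isFullyHidden returns true would be skipped without calling the layout collector at all.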