How can I accurately get the number of lines in a document with Aspose.Words for Java?

With the following code:

    public static int getLines(Document doc) throws Exception {
        LayoutCollector layoutCollector = new LayoutCollector(doc);
        LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
        int lineCount = 0;
        for (Paragraph paragraph : (NodeCollection<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
            layoutEnumerator.setCurrent(layoutCollector.getEntity(paragraph));
            do {
                lineCount++;
            } while (layoutEnumerator.moveNextLogical());
        }
        return lineCount;
    }

    public static void main(String[] args) {
        try {
            Document doc = loadFromDocxFile(Path.of("/Users/pizhai/Documents/test.docx"));
            int lines = getLines(doc);
            System.out.println("lines = " + lines);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

When execution reaches:

      layoutEnumerator.setCurrent(layoutCollector.getEntity(paragraph));

it throws:

java.lang.NullPointerException: Value cannot be null.
Parameter name: value
	at com.aspose.words.LayoutEnumerator.setCurrent(Unknown Source)
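
  1. Is this a workable way to get the line count of a document? If so, is the result accurate?
  2. If it is workable, please help me fix it.
  3. If not, is there a method that accurately gets the number of lines in a Word document?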

@pizhai2

Summary
The NullPointerException is thrown because layoutCollector.getEntity(paragraph) can return null for some paragraphs (e.g., paragraphs in headers/footers, or paragraphs that produce no layout entity). When that null value is passed to LayoutEnumerator.setCurrent(...), the enumerator throws.
Apart from the built-in Lines statistic (BuiltInDocumentProperties.getLines(), refreshed by Document.updateWordCount(true)), Aspose.Words has no ready-made line counter. You can obtain a layout-based count by enumerating layout entities of type LINE after the document layout has been built.


Why it happens

  • layoutCollector.getEntity(paragraph) returns null: LayoutCollector does not map nodes inside headers/footers, and paragraphs that produce no layout (for example, hidden paragraphs) may have no entity either. Because doc.getChildNodes(NodeType.PARAGRAPH, true) also returns header and footer paragraphs, getEntity returns null for them, and the subsequent setCurrent call throws the NPE.
  • Layout not yet built: if the document has never been laid out (e.g., it was just opened), the layout engine may not have generated any entities yet. This also leads to null results.
  • Counting lines per paragraph is fragile: Word wraps text based on page size, margins, fonts, hyphenation, etc. Iterating over paragraphs can miss lines that belong to the same paragraph, or count extra LINE entities created by tables, footnotes, etc.

Because of these factors, the most robust approach is to count layout entities of type LINE directly, which reflects the exact layout Aspose.Words will render.


What to do next

  1. Force layout creation – call document.updatePageLayout() (or any operation that triggers layout, such as rendering to PDF/HTML) before you start counting.
  2. Iterate over layout entities and increment a counter for each entity whose type is LINE.
  3. (Optional) If you need a per‑paragraph line breakdown, use LayoutCollector only after checking that the returned entity is not null.

Below is a concise example for Java (compatible with recent Aspose.Words versions):

import com.aspose.words.*;

public class LineCounter {

    public static int getLineCount(Document doc) throws Exception {
        // 1️⃣ Ensure the layout engine has built the layout.
        doc.updatePageLayout();

        // 2️⃣ Create a LayoutEnumerator for the whole document.
        LayoutEnumerator enumerator = new LayoutEnumerator(doc);

        int lineCount = 0;

        // 3️⃣ Move to the first layout entity.
        enumerator.moveFirst();

        // 4️⃣ Walk through every layout entity.
        while (enumerator.moveNext()) {
            // 5️⃣ Count only entities that represent a line of text.
            if (enumerator.getType() == LayoutEntityType.LINE) {
                lineCount++;
            }
        }
        return lineCount;
    }

    public static void main(String[] args) {
        try {
            Document doc = new Document("test.docx"); // adjust the path as needed
            int lines = getLineCount(doc);
            System.out.println("Total lines = " + lines);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Per‑paragraph line count (with null‑check)

public static int getLinesPerParagraph(Document doc) throws Exception {
    doc.updatePageLayout();

    LayoutCollector collector = new LayoutCollector(doc);
    LayoutEnumerator enumerator = new LayoutEnumerator(doc);
    int totalLines = 0;

    for (Paragraph para : (NodeCollection<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
        LayoutEntity entity = collector.getEntity(para);
        if (entity == null) {
            // Paragraph has no direct layout entity (e.g., inside a table). Skip it.
            continue;
        }
        enumerator.setCurrent(entity);

        // Count the first line of this paragraph.
        totalLines++;

        // Count any additional logical lines belonging to the same paragraph.
        while (enumerator.moveNextLogical()) {
            totalLines++;
        }
    }
    return totalLines;
}

Limitations & notes

  • The count reflects the layout as rendered by Aspose.Words (page size, margins, font substitution, hyphenation, etc.). Changing any of those settings will change the result.
  • LINE entities also represent lines inside tables, footnotes, headers/footers, and text boxes. To exclude those, check the type of the line's parent entity: save the position with enumerator.getCurrent(), call enumerator.moveParent(), compare getType() with LayoutEntityType.COLUMN (body text) or LayoutEntityType.CELL (table text), then restore the position with enumerator.setCurrent(...).
  • Layout in Word can vary with printer drivers, installed fonts, or the Word version, so the line count may differ from what a user sees in Microsoft Word. The method above gives the count for the layout that Aspose.Words renders.

Next steps for you

  1. Replace the existing line‑count logic with the getLineCount method shown above.
  2. Run the code on a few representative documents to confirm that the numbers align with the visual line count you expect.
  3. If you need to filter out lines from tables, footnotes, headers/footers, etc., check the type of the line's parent entity before incrementing the counter. For example, count only lines whose parent is LayoutEntityType.COLUMN.

Feel free to request further refinements (e.g., returning a map of paragraph → line count) or to raise any edge cases you encounter.

⚠ I could not fully verify this explanation against the retrieved sources. Please double-check the behaviour in your environment and confirm the exact SDK/version.


The first approach you provided always returns 0. The second gives a result far larger than the document's actual line count.

@pizhai2 You can use the following code to split the document into lines:

import com.aspose.words.*;
import java.util.ArrayList;

public class LineSplitter {

    public static void main(String[] args) throws Exception {
        Document doc = new Document("C:\\Temp\\in.docx");

        // Split all Run nodes in the document so that each holds at most one word.
        Node[] runs = doc.getChildNodes(NodeType.RUN, true).toArray();
        for (Node n : runs) {
            Run current = (Run) n;
            while (current.getText().indexOf(' ') >= 0)
                current = splitRun(current, current.getText().indexOf(' ') + 1);
        }

        // Wrap every run in a bookmark: LayoutCollector.getEntity does not accept Run nodes,
        // but it does accept BookmarkStart, so each run becomes reachable by LayoutEnumerator.
        runs = doc.getChildNodes(NodeType.RUN, true).toArray();

        ArrayList<String> tmpBookmarks = new ArrayList<String>();
        int bkIndex = 0;
        for (Node r : runs) {
            // LayoutCollector and LayoutEnumerator do not work with nodes in headers/footers or in text boxes.
            if (r.getAncestor(NodeType.HEADER_FOOTER) != null || r.getAncestor(NodeType.SHAPE) != null)
                continue;

            BookmarkStart start = new BookmarkStart(doc, "r" + bkIndex);
            BookmarkEnd end = new BookmarkEnd(doc, start.getName());

            r.getParentNode().insertBefore(start, r);
            r.getParentNode().insertAfter(end, r);

            tmpBookmarks.add(start.getName());
            bkIndex++;
        }

        // Now the collector and enumerator can tell which line of the MS Word layout each run is on.
        LayoutCollector collector = new LayoutCollector(doc);
        LayoutEnumerator enumerator = new LayoutEnumerator(doc);

        Object currentLine = null;
        int lineCount = 0;
        for (String bkName : tmpBookmarks) {
            Bookmark bk = doc.getRange().getBookmarks().get(bkName);

            // Guard against nodes that have no layout entity (passing null to setCurrent throws an NPE).
            Object entity = collector.getEntity(bk.getBookmarkStart());
            if (entity == null)
                continue;
            enumerator.setCurrent(entity);

            // Climb from the span up to the line that contains it.
            while (enumerator.getType() != LayoutEntityType.LINE)
                enumerator.moveParent();

            if (!enumerator.getCurrent().equals(currentLine)) {
                currentLine = enumerator.getCurrent();
                lineCount++;

                System.out.println();
                System.out.println("-------=========Start Of Line=========-------");
                // Here you can get the coordinates of the line.
                System.out.println(enumerator.getRectangle());
            }

            Node nextNode = bk.getBookmarkStart().getNextSibling();
            if (nextNode != null && nextNode.getNodeType() == NodeType.RUN)
                System.out.print(((Run) nextNode).getText());
        }
        System.out.println();
        // Only lines that contain at least one body run are counted (empty paragraphs are not).
        System.out.println("Lines containing text: " + lineCount);
    }

    // Splits the run at the given character position and returns the newly inserted second part.
    private static Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        run.getParentNode().insertAfter(afterRun, run);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        return afterRun;
    }
}
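If only a total is needed, two further options can be sketched. This is a sketch under assumptions, not tested against a specific Aspose.Words release.

The first option uses Document.updateWordCount(true). It recalculates the built-in Lines statistic (an estimate, like the one Word reports), which is then read with getBuiltInDocumentProperties().getLines().

The second option is a depth-first walk of the layout tree. It matters because LayoutEnumerator.moveNext() only moves to the next sibling at the current level, and a new enumerator starts on the first page. A plain moveNext() loop therefore visits pages only and never reaches a LINE entity, which fits the result of 0 reported above. Similarly, for a Paragraph node LayoutCollector.getEntity returns the paragraph-break span (on the paragraph's last line), so stepping forward from it with moveNextLogical() leaves the paragraph.

The class name LineCountSketch and the bodyOnly flag are hypothetical, chosen for illustration.

```java
import com.aspose.words.Document;
import com.aspose.words.LayoutEntityType;
import com.aspose.words.LayoutEnumerator;

// Hypothetical helper class illustrating two ways to obtain a line count.
public class LineCountSketch {

    // Option 1: the built-in "Lines" statistic; passing true also recomputes the Lines property.
    public static int linesFromStatistics(Document doc) throws Exception {
        doc.updateWordCount(true);
        return doc.getBuiltInDocumentProperties().getLines();
    }

    // Option 2: count LINE entities by walking the whole layout tree depth-first.
    // bodyOnly (hypothetical flag) = true skips header/footer and comment subtrees.
    public static int linesFromLayout(Document doc, boolean bodyOnly) throws Exception {
        // The constructor builds the page layout if needed and starts on the first page.
        LayoutEnumerator e = new LayoutEnumerator(doc);
        return countLines(e, bodyOnly);
    }

    // Counts LINE entities among the current entity, its following siblings, and their descendants.
    private static int countLines(LayoutEnumerator e, boolean bodyOnly) throws Exception {
        int count = 0;
        do {
            int type = e.getType();
            if (type == LayoutEntityType.LINE) {
                count++; // do not descend into the line, so each line is counted once
            } else if (bodyOnly && (type == LayoutEntityType.HEADER_FOOTER
                    || type == LayoutEntityType.COMMENT)) {
                // skip this subtree entirely
            } else if (e.moveFirstChild()) {
                count += countLines(e, bodyOnly);
                e.moveParent(); // return to the entity we descended from
            }
        } while (e.moveNext());
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document(args.length > 0 ? args[0] : "test.docx");
        System.out.println("Lines (statistics) = " + linesFromStatistics(doc));
        System.out.println("Lines (layout, body only) = " + linesFromLayout(doc, true));
    }
}
```

Pass the path of a .docx as the first argument. The two numbers need not agree. The statistic is an estimate, whereas the walk counts every laid-out body line, including table and footnote lines. An empty paragraph also occupies a line.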