How do I get the first node of the line containing a specific node?

guyyyyyyyyyyy · March 29, 2023, 7:14am

Hello,

I want to get the first node of the line that contains a specific node. My first idea was the following:

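public static Node getFirstNodeOfLine(DocumentBuilder builder, Node refNode)
{
    // Move the builder's cursor to the reference node (it must be inside a paragraph).
    builder.MoveTo(refNode);
    // Return the first child of the enclosing paragraph, which is not necessarily
    // the first node of the rendered line.
    return builder.CurrentParagraph.FirstChild;
}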

This works as long as the node is part of a paragraph (which is a requirement in my case) and the paragraph does not span more than one line.

However, in the following case, where my reference node is the first child of my bookmark, the returned node contains the text “Line3DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDMyText”, but I expected to get “DDDDDDDDDDDDDDDDDDDDDDDDDDDDDMyText”.

Is there a way to achieve this behaviour?

Sincerely.

alexey.noskov · March 29, 2023, 8:04am

@guyyyyyyyyyyy As you may know, MS Word documents are flow documents and do not have a concept of pages or lines. Consumer applications (MS Word, OpenOffice, etc.) reflow the document content into lines and pages on the fly, so there is no special “node” for a line of text in MS Word documents. Please see our documentation to learn more about the Aspose.Words Document Object Model:
https://docs.aspose.com/words/net/aspose-words-document-object-model/

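In your particular case, a single Run node can span several lines or even pages, so the paragraph's first child can be a run that starts on an earlier line than the one you are interested in.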
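For example, you can see that a Run is a logical node rather than a layout unit by asking LayoutCollector which pages each run starts and ends on. The following is a minimal sketch assuming Aspose.Words for .NET; `RunSpanReport` and `FindRunsSpanningPages` are hypothetical names. It only detects runs that cross a page boundary: a run that wraps onto a second line of the same page reports the same start and end page, which is why line-level information needs LayoutEnumerator.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Layout;

public static class RunSpanReport
{
    // Hypothetical helper: returns body runs whose text starts on one page and ends on a later one.
    public static List<Run> FindRunsSpanningPages(Document doc)
    {
        // Attach the collector first, then (re)build the layout so the collector records it.
        LayoutCollector collector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        List<Run> result = new List<Run>();
        foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
        {
            // LayoutCollector does not map header/footer nodes to pages, so skip them.
            if (run.GetAncestor(NodeType.HeaderFooter) != null)
                continue;

            int startPage = collector.GetStartPageIndex(run); // 1-based; 0 if the run cannot be mapped
            int endPage = collector.GetEndPageIndex(run);
            if (startPage > 0 && endPage > startPage)
                result.Add(run);
        }
        return result;
    }
}
```

A single long run written with `DocumentBuilder.Write` that fills several pages is reported here even though the document model holds it as one node.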

guyyyyyyyyyyy · March 29, 2023, 8:32am

Thanks for your answer. I guess this problem can’t be solved, then.

alexey.noskov · March 29, 2023, 10:14am

@guyyyyyyyyyyy There is no easy way to achieve this. However, Aspose.Words provides the LayoutCollector and LayoutEnumerator classes to get layout information about the document. You can try using these classes to find which nodes are on each line. For example, the following code prints the document text line by line:

using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Layout;

public static class Program
{
    public static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");

        // Split all Run nodes in the document so that each one holds at most one word.
        List<Run> runs = doc.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList();
        foreach (Run r in runs)
        {
            Run current = r;
            int spaceIndex;
            // Split after each space, but not after a trailing one (that would only create an empty run).
            while ((spaceIndex = current.Text.IndexOf(' ')) >= 0 && spaceIndex < current.Text.Length - 1)
                current = SplitRun(current, spaceIndex + 1);
        }

        // Wrap every run in a temporary bookmark so that LayoutCollector and LayoutEnumerator can
        // locate it: LayoutCollector.GetEntity does not accept Run nodes, but it does accept BookmarkStart.
        runs = doc.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList();

        List<string> tmpBookmarks = new List<string>();
        int bkIndex = 0;
        foreach (Run r in runs)
        {
            // LayoutCollector and LayoutEnumerator do not work with nodes in headers/footers or in text boxes.
            if (r.GetAncestor(NodeType.HeaderFooter) != null || r.GetAncestor(NodeType.Shape) != null)
                continue;

            BookmarkStart start = new BookmarkStart(doc, string.Format("r{0}", bkIndex));
            BookmarkEnd end = new BookmarkEnd(doc, start.Name);

            r.ParentNode.InsertBefore(start, r);
            r.ParentNode.InsertAfter(end, r);

            tmpBookmarks.Add(start.Name);
            bkIndex++;
        }

        // Now we can use the collector and enumerator to get the runs on each line as laid out by MS Word.
        LayoutCollector collector = new LayoutCollector(doc);
        LayoutEnumerator enumerator = new LayoutEnumerator(doc);

        object currentLine = null;
        foreach (string bkName in tmpBookmarks)
        {
            Bookmark bk = doc.Range.Bookmarks[bkName];

            // GetEntity may return null for nodes it cannot map to the layout; skip those.
            object entity = collector.GetEntity(bk.BookmarkStart);
            if (entity == null)
                continue;

            // Climb from the bookmark's layout entity up to the line that contains it.
            enumerator.Current = entity;
            while (enumerator.Type != LayoutEntityType.Line && enumerator.MoveParent())
            {
            }
            if (enumerator.Type != LayoutEntityType.Line)
                continue;

            // A different line entity means a new rendered line has started.
            if (currentLine != enumerator.Current)
            {
                currentLine = enumerator.Current;

                Console.WriteLine();
                Console.WriteLine("-------=========Start Of Line=========-------");
            }

            Run run = bk.BookmarkStart.NextSibling as Run;
            if (run != null)
                Console.Write(run.Text);
        }
    }

    // Splits a run at the given position: the original run keeps the text before it, and a clone
    // inserted right after it receives the rest. Returns the new (second) run.
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        run.ParentNode.InsertAfter(afterRun, run);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        return afterRun;
    }
}
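To answer the original question directly, the same technique can be narrowed to the paragraph that contains the reference run. This is a minimal sketch, not an Aspose.Words API: `LineHelper`, `GetFirstRunOfLine`, `GetLine` and the `tmpLine` bookmark names are hypothetical, and it assumes the reference node is a `Run` in the main document body (not in a header/footer or text box). Because runs are split only at spaces, it works at word granularity: if Word breaks a single long word (such as the `Line3DDD…MyText` text above) across two lines, the returned run still begins on the earlier line. As a side effect, the runs of that paragraph are left split into one-word runs; the temporary bookmarks are removed.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Layout;

public static class LineHelper
{
    // Hypothetical helper: returns the first Run on the same rendered line as refRun, or null.
    public static Run GetFirstRunOfLine(Document doc, Run refRun)
    {
        Paragraph para = (Paragraph)refRun.GetAncestor(NodeType.Paragraph);

        // 1. Split the paragraph's runs into one-word runs; refRun keeps its first word.
        foreach (Run r in para.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList())
        {
            Run current = r;
            int space;
            while ((space = current.Text.IndexOf(' ')) >= 0 && space < current.Text.Length - 1)
                current = SplitRun(current, space + 1);
        }

        // 2. Wrap every run in a temporary bookmark so GetEntity can locate it.
        var markers = new List<(BookmarkStart Start, BookmarkEnd End, Run Target)>();
        foreach (Run r in para.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList())
        {
            var start = new BookmarkStart(doc, "tmpLine" + markers.Count);
            var end = new BookmarkEnd(doc, start.Name);
            r.ParentNode.InsertBefore(start, r);
            r.ParentNode.InsertAfter(end, r);
            markers.Add((start, end, r));
        }

        try
        {
            // 3. Rebuild the layout and find the line entity that holds refRun.
            var collector = new LayoutCollector(doc);
            doc.UpdatePageLayout();
            var enumerator = new LayoutEnumerator(doc);

            object refLine = GetLine(collector, enumerator, markers.First(m => m.Target == refRun).Start);
            if (refLine == null)
                return null;

            // 4. Markers are in document order, so the first one on that line marks the answer.
            foreach (var m in markers)
            {
                if (Equals(GetLine(collector, enumerator, m.Start), refLine))
                    return m.Target;
            }
            return null;
        }
        finally
        {
            // 5. Remove the temporary bookmarks; the split runs stay in place.
            foreach (var m in markers)
            {
                m.Start.Remove();
                m.End.Remove();
            }
        }
    }

    // Returns the Line layout entity containing the node, or null if it cannot be mapped.
    private static object GetLine(LayoutCollector collector, LayoutEnumerator enumerator, Node node)
    {
        object entity = collector.GetEntity(node);
        if (entity == null)
            return null;
        enumerator.Current = entity;
        while (enumerator.Type != LayoutEntityType.Line)
        {
            if (!enumerator.MoveParent())
                return null;
        }
        return enumerator.Current;
    }

    // Same splitting helper as in the code above.
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        run.ParentNode.InsertAfter(afterRun, run);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        return afterRun;
    }
}
```

Usage, assuming your reference run directly follows the BookmarkStart of a bookmark (the name "MyBookmark" is hypothetical): `Run refRun = doc.Range.Bookmarks["MyBookmark"].BookmarkStart.NextSibling as Run;` then `Run first = LineHelper.GetFirstRunOfLine(doc, refRun);`. Restricting the work to one paragraph keeps it cheaper than bookmarking every run in the document, since the line containing a node always lies within that node's paragraph.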