How to Get Coordinates (top and left) of Text in Word Document using .NET

Hi,

I am attaching a screenshot of a Word document below. I want to find the TOP and LEFT coordinates of all strings enclosed in “<<>>” in the document, for example SIGNM01. How can I achieve this using Aspose? Can you please share the required code?

image.png (7.0 KB)

Thanks in advance,
Hemant

@hemant_thote

The Aspose.Words.Layout namespace provides classes that let you find out which page a document element is on, and where on that page it is positioned, once the document has been formatted into pages.

You can use the LayoutCollector.GetEntity method to get an opaque position for the LayoutEnumerator that corresponds to the specified node.

Please note that all text in a Word document is stored in Run nodes, and GetEntity does not work for Run nodes. So, insert a bookmark before each <<>> token and get the position (Left, Top) of its BookmarkStart node.

The following code example shows how to get the position of the desired text. Hope this helps.

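using System;
using System.Collections;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Layout;
using Aspose.Words.Replacing;

Document doc = new Document("eSignature.Test.02.docx");
// Find each token enclosed in <<>> and insert a bookmark before it.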
doc.Range.Replace(new Regex(@"\<<.*?\>>"), "", new FindReplaceOptions() { ReplacingCallback = new FindAndInsertBookmark() });

LayoutCollector layoutCollector = new LayoutCollector(doc);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);

// Display the Left and Top position of each token enclosed in angle brackets.
foreach (Bookmark bookmark in doc.Range.Bookmarks)
{
    if (bookmark.Name.StartsWith("bookmark_"))
    {
        layoutEnumerator.Current = layoutCollector.GetEntity(bookmark.BookmarkStart);
        Console.WriteLine(" --> Left : " + layoutEnumerator.Rectangle.Left + " Top : " + layoutEnumerator.Rectangle.Top);
    }
}
doc.Save("20.10.docx");

public class FindAndInsertBookmark : IReplacingCallback
{
    int i = 1;
    DocumentBuilder builder;
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        if (builder == null)
            builder = new DocumentBuilder((Document)currentNode.Document);

        // The first (and possibly the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            (remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node. 
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        Run run = (Run)runs[0];
        builder.MoveTo(run);
        builder.StartBookmark("bookmark_" + i);
        builder.EndBookmark("bookmark_" + i);
        i++;

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    /// <summary>
    /// Splits text of the specified run into two runs.
    /// Inserts the new run just after the specified run.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}

Hi,

This code worked for me. However, I came across one scenario where it fails with a null exception: when a token is placed in the footer, the code cannot handle it and fails at the GetEntity() call.

Please find the error and a sample document attached.

aspose issue.zip (241.1 KB)

Thanks,
Hemant

@hemant_thote

The LayoutCollector.GetEntity method works only for Paragraph nodes and for indivisible inline nodes, e.g. BookmarkStart or Shape. It does not work for Run, Cell, Row or Table nodes, or for nodes within a header/footer.
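
To make that limitation concrete, here is a minimal sketch of how the earlier loop could guard against header/footer bookmarks before calling GetEntity. The class and method names (BookmarkPositions, PrintBodyBookmarkPositions) are hypothetical, and the null check assumes that GetEntity may return null for a node it cannot resolve; treat it as an illustration rather than the definitive fix.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;

public static class BookmarkPositions
{
    // Hypothetical helper: prints Left/Top for token bookmarks in the main body
    // and reports the ones that sit inside a header or footer.
    public static void PrintBodyBookmarkPositions(Document doc)
    {
        LayoutCollector layoutCollector = new LayoutCollector(doc);
        LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);

        foreach (Bookmark bookmark in doc.Range.Bookmarks)
        {
            if (!bookmark.Name.StartsWith("bookmark_"))
                continue;

            // GetAncestor returns null when the node has no HeaderFooter ancestor.
            if (bookmark.BookmarkStart.GetAncestor(NodeType.HeaderFooter) != null)
            {
                Console.WriteLine(bookmark.Name + " is in a header/footer; GetEntity skipped.");
                continue;
            }

            // Assumption: GetEntity may return null for nodes it cannot resolve.
            object entity = layoutCollector.GetEntity(bookmark.BookmarkStart);
            if (entity == null)
            {
                Console.WriteLine(bookmark.Name + " has no layout entity; skipped.");
                continue;
            }

            layoutEnumerator.Current = entity;
            Console.WriteLine(bookmark.Name + " --> Left : " + layoutEnumerator.Rectangle.Left +
                              " Top : " + layoutEnumerator.Rectangle.Top);
        }
    }
}
```

This only avoids the exception; it does not yet return coordinates for tokens in headers or footers.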

Hi,

Thank you for your response. Can you please help me with code that works for everything, i.e. the main page layout as well as headers and footers?

Thanks,
Hemant

@hemant_thote

The following code example shows ways of traversing a document’s layout entities, including those in headers and footers. Hope this helps.

public void LayoutEnumerator()
{
    // Open a document that contains a variety of layout entities.
    // Layout entities are pages, cells, rows, lines, and other objects included in the LayoutEntityType enum.
    // Each layout entity has a rectangular space that it occupies in the document body.
    Document doc = new Document(MyDir + "Meryl Linch Commercial Spaces_VA Test Deal 15_04_2021_2021-04-20_15_34_04.docx");

    // Create an enumerator that can traverse these entities like a tree.
    LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);

    Assert.AreEqual(doc, layoutEnumerator.Document);

    layoutEnumerator.MoveParent(LayoutEntityType.Page);

    Assert.AreEqual(LayoutEntityType.Page, layoutEnumerator.Type);
    Assert.Throws<InvalidOperationException>(() => Console.WriteLine(layoutEnumerator.Text));

    // We can call this method to make sure that the enumerator will be at the first layout entity.
    layoutEnumerator.Reset();

    // There are two orders that determine how the layout enumerator continues traversing layout entities
    // when it encounters entities that span across multiple pages.
    // 1 -  In visual order:
    // When moving through an entity's children that span multiple pages,
    // page layout takes precedence, and we move to other child elements on this page and avoid the ones on the next.
    Console.WriteLine("Traversing from first to last, elements between pages separated:");
    TraverseLayoutForward(layoutEnumerator, 1);

    // Our enumerator is now at the end of the collection. We can traverse the layout entities backwards to go back to the beginning.
    Console.WriteLine("Traversing from last to first, elements between pages separated:");
    TraverseLayoutBackward(layoutEnumerator, 1);

    // 2 -  In logical order:
    // When moving through an entity's children that span multiple pages,
    // the enumerator will move between pages to traverse all the child entities.
    Console.WriteLine("Traversing from first to last, elements between pages mixed:");
    TraverseLayoutForwardLogical(layoutEnumerator, 1);

    Console.WriteLine("Traversing from last to first, elements between pages mixed:");
    TraverseLayoutBackwardLogical(layoutEnumerator, 1);
}

/// <summary>
/// Enumerate through layoutEnumerator's layout entity collection front-to-back,
/// in a depth-first manner, and in the "Visual" order.
/// </summary>
private static void TraverseLayoutForward(LayoutEnumerator layoutEnumerator, int depth)
{
    do
    {
        PrintCurrentEntity(layoutEnumerator, depth);

        if (layoutEnumerator.MoveFirstChild())
        {
            TraverseLayoutForward(layoutEnumerator, depth + 1);
            layoutEnumerator.MoveParent();
        }
    } while (layoutEnumerator.MoveNext());
}

/// <summary>
/// Enumerate through layoutEnumerator's layout entity collection back-to-front,
/// in a depth-first manner, and in the "Visual" order.
/// </summary>
private static void TraverseLayoutBackward(LayoutEnumerator layoutEnumerator, int depth)
{
    do
    {
        PrintCurrentEntity(layoutEnumerator, depth);

        if (layoutEnumerator.MoveLastChild())
        {
            TraverseLayoutBackward(layoutEnumerator, depth + 1);
            layoutEnumerator.MoveParent();
        }
    } while (layoutEnumerator.MovePrevious());
}

/// <summary>
/// Enumerate through layoutEnumerator's layout entity collection front-to-back,
/// in a depth-first manner, and in the "Logical" order.
/// </summary>
private static void TraverseLayoutForwardLogical(LayoutEnumerator layoutEnumerator, int depth)
{
    do
    {
        PrintCurrentEntity(layoutEnumerator, depth);

        if (layoutEnumerator.MoveFirstChild())
        {
            TraverseLayoutForwardLogical(layoutEnumerator, depth + 1);
            layoutEnumerator.MoveParent();
        }
    } while (layoutEnumerator.MoveNextLogical());
}

/// <summary>
/// Enumerate through layoutEnumerator's layout entity collection back-to-front,
/// in a depth-first manner, and in the "Logical" order.
/// </summary>
private static void TraverseLayoutBackwardLogical(LayoutEnumerator layoutEnumerator, int depth)
{
    do
    {
        PrintCurrentEntity(layoutEnumerator, depth);

        if (layoutEnumerator.MoveLastChild())
        {
            TraverseLayoutBackwardLogical(layoutEnumerator, depth + 1);
            layoutEnumerator.MoveParent();
        }
    } while (layoutEnumerator.MovePreviousLogical());
}

/// <summary>
/// Print information about layoutEnumerator's current entity to the console, while indenting the text with tab characters
/// based on its depth relative to the root node that we provided in the LayoutEnumerator constructor.
/// The rectangle that we process at the end represents the area and location that the entity takes up in the document.
/// </summary>
private static void PrintCurrentEntity(LayoutEnumerator layoutEnumerator, int indent)
{
    string tabs = new string('\t', indent);

    Console.WriteLine(layoutEnumerator.Kind == string.Empty
        ? $"{tabs}-> Entity type: {layoutEnumerator.Type}"
        : $"{tabs}-> Entity type & kind: {layoutEnumerator.Type}, {layoutEnumerator.Kind}");

    // Only spans can contain text.
    if (layoutEnumerator.Type == LayoutEntityType.Span)
        Console.WriteLine($"{tabs}   Span contents: \"{layoutEnumerator.Text}\"");

    RectangleF leRect = layoutEnumerator.Rectangle;
    Console.WriteLine($"{tabs}   Rectangle dimensions {leRect.Width}x{leRect.Height}, X={leRect.X} Y={leRect.Y}");
    Console.WriteLine($"{tabs}   Page {layoutEnumerator.PageIndex}");
}

Hi,

I used the code snippet you provided and was able to get the location (Top, Left) of all the entities, including footer entities.

But if you look through the previous posts in this thread, we are using tokens enclosed in <<>>, which we mark with bookmarks.

The latest snippet returns the entities, but I am not able to tell which bookmark corresponds to which entity, or how to find the token locations, because the code traverses the entire document and returns every entity in it.

Is there any way we can use Aspose.Words.Bookmark.BookmarkStart to identify the token entities, which are present throughout the document including footers? That is our primary requirement. Alternatively, is there any other way to get the locations of tokens that are also present in footers or headers?

Thanks in advance,
Hemant

@hemant_thote

You can use the same code to find the bookmarks: look for entities where LayoutEnumerator.Type is LayoutEntityType.Span and LayoutEnumerator.Kind is “BOOKMARKSTART”. You can put the following condition in the PrintCurrentEntity method to get the bookmark position.

if (layoutEnumerator.Type == LayoutEntityType.Span && layoutEnumerator.Kind == "BOOKMARKSTART")
{
    RectangleF leRect = layoutEnumerator.Rectangle;
    Console.WriteLine($"{tabs}   Rectangle dimensions {leRect.Width}x{leRect.Height}, X={leRect.X} Y={leRect.Y}");
    Console.WriteLine($"{tabs}   Page {layoutEnumerator.PageIndex}");
}
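
As a rough sketch of how that condition might be used outside PrintCurrentEntity, the same forward traversal can collect the page index and rectangle of every BOOKMARKSTART span instead of printing all entities. The names BookmarkSpanCollector, Collect and Walk are hypothetical. Note that this collects every bookmark start in the layout, not only those named "bookmark_…", and it does not resolve bookmark names; the results simply come back in the order the traversal visits them.

```csharp
using System.Collections.Generic;
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Layout;

public static class BookmarkSpanCollector
{
    // Hypothetical helper: returns the page index and rectangle of each
    // BOOKMARKSTART span, including spans inside headers and footers.
    public static List<(int Page, RectangleF Rect)> Collect(Document doc)
    {
        var results = new List<(int Page, RectangleF Rect)>();
        LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
        Walk(layoutEnumerator, results);
        return results;
    }

    // Depth-first, front-to-back traversal in visual order,
    // mirroring TraverseLayoutForward from the example above.
    private static void Walk(LayoutEnumerator layoutEnumerator, List<(int Page, RectangleF Rect)> results)
    {
        do
        {
            if (layoutEnumerator.Type == LayoutEntityType.Span && layoutEnumerator.Kind == "BOOKMARKSTART")
                results.Add((layoutEnumerator.PageIndex, layoutEnumerator.Rectangle));

            if (layoutEnumerator.MoveFirstChild())
            {
                Walk(layoutEnumerator, results);
                layoutEnumerator.MoveParent();
            }
        } while (layoutEnumerator.MoveNext());
    }
}
```

A caller could then loop over the returned list and read Rect.Left and Rect.Top for each entry, for example `foreach (var hit in BookmarkSpanCollector.Collect(doc)) Console.WriteLine($"Page {hit.Page}: Left={hit.Rect.Left} Top={hit.Rect.Top}");`.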