Add page number hyperlinks in a document

Hi,
I have a requirement to find the page numbers on which a certain keyword occurs in a document, and to add page number hyperlinks at the end of the document that take me to the respective pages.

Once I have found the keyword, I need to insert text on the last page saying: "This keyword exists on pages: 2, 34, 68, 70" (2, 34, 68 and 70 being the page numbers where the keyword occurs). These page numbers should be hyperlinks that take me to the respective pages when clicked.

I am able to find the keyword in the document using "IReplacingCallback" from Aspose.Words.Replacing. I need help forming the page hyperlinks that take me to the respective pages when clicked. Thanks!

@KCSR You can wrap the keywords with bookmarks, then use LayoutCollector to determine the page index where each bookmark is located and DocumentBuilder.InsertHyperlink to insert a hyperlink to the bookmark. Here is simple code that demonstrates the basic technique:

// Required namespaces: System, System.Collections.Generic, System.Linq,
// Aspose.Words, Aspose.Words.Layout, Aspose.Words.Replacing.
Document doc = new Document(@"C:\Temp\in.docx");
// Wrap each keyword occurrence with a bookmark.
ReplaceEvaluatorWrapWithBookmark wrapWithBookmarkCallback = new ReplaceEvaluatorWrapWithBookmark();
FindReplaceOptions opt = new FindReplaceOptions();
opt.ReplacingCallback = wrapWithBookmarkCallback;
doc.Range.Replace("test", "", opt);

// Now we can use LayoutCollector to determine page indices where "test" word occurs.
LayoutCollector collector = new LayoutCollector(doc);

// DocumentBuilder will be used to insert hyperlinks.
DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveToDocumentEnd();
builder.Writeln();
builder.Write("Word 'test' occurs on ");
foreach (string bkName in wrapWithBookmarkCallback.Bookmarks)
{
    int pageIndex = collector.GetStartPageIndex(doc.Range.Bookmarks[bkName].BookmarkStart);
    // Insert hyperlink to bookmark.
    builder.Font.StyleIdentifier = StyleIdentifier.Hyperlink;
    builder.InsertHyperlink(pageIndex.ToString(), bkName, true);
    builder.Font.ClearFormatting();

    if (wrapWithBookmarkCallback.Bookmarks.Last() != bkName)
        builder.Write(", ");
}
builder.Write(" page(s)");

doc.Save(@"C:\Temp\out.docx");

internal class ReplaceEvaluatorWrapWithBookmark : IReplacingCallback
{
    /// <summary>
    /// This method is called by the Aspose.Words find and replace engine for each match.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        Document doc = (Document)e.MatchNode.Document;

        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This list stores all runs that make up the match, so the bookmark can be placed around them.
        List<Run> runs = new List<Run>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            remainingLength > 0 &&
            currentNode != null &&
            currentNode.GetText().Length <= remainingLength)
        {
            runs.Add((Run)currentNode);
            remainingLength -= currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            } while (currentNode != null && currentNode.NodeType != NodeType.Run);
        }

        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add((Run)currentNode);
        }

        // Generate a unique bookmark name. Another approach can be used.
        string bookmarkName = "_" + Guid.NewGuid().ToString();
        while (doc.Range.Bookmarks[bookmarkName] != null)
            bookmarkName += "_" + Guid.NewGuid().ToString();

        // Insert a bookmark around the matched text.
        BookmarkStart start = new BookmarkStart(doc, bookmarkName);
        BookmarkEnd end = new BookmarkEnd(doc, bookmarkName);

        runs[0].ParentNode.InsertBefore(start, runs[0]);
        runs[runs.Count - 1].ParentNode.InsertAfter(end, runs[runs.Count - 1]);

        Bookmarks.Add(bookmarkName);

        // Signal to the replace engine to do nothing, because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        run.ParentNode.InsertAfter(afterRun, run);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        return afterRun;
    }

    public List<string> Bookmarks
    {
        get { return mBookmarks; }
    }

    private List<string> mBookmarks = new List<string>();
}

Thanks Alexey, this was helpful!

@alexey.noskov, the line below seems to be time-consuming. It takes a long time when the search text occurs many times in the user's document:

int pageIndex = collector.GetStartPageIndex(doc.Range.Bookmarks[bkName].BookmarkStart);

Could you please let me know if there is a more efficient way to find the page number of a node?

Regards,
Chetan

@alexey.noskov - I also tried the option below, and it is time-consuming as well. I have 350 search nodes in a 99-page document, and it takes 2 min 20 sec to find the page numbers of all 350 nodes, which seems too long.

string currentNodePageIndex = GetBookmarkPage(currentNode);

private string GetBookmarkPage(Node currentNode = null)
{
    //Get page number
    string pageNumber = string.Empty;
    Aspose.Words.DocumentBuilder builder = new DocumentBuilder((Aspose.Words.Document)currentNode.Document);
    Field page = null;
    try
    {
        if (currentNode != null)
        {
            builder.MoveTo(currentNode);
        }

        page = builder.InsertField("PAGE");

        builder.Document.UpdatePageLayout();

        page.Update();

        pageNumber = page.Result;

        // Remove PAGE field.
        page.Remove();
        return pageNumber;
    }
    catch (Exception ex)
    {
        LmoLogger.Error(ex);
        return pageNumber;
    }
    finally
    {
        page = null;
    }
}

@KCSR When you create a LayoutCollector, Aspose.Words rebuilds the document layout, which is quite a resource-consuming operation. As you may know, MS Word documents are flow documents and do not contain any information about document layout. To calculate the page number of a particular node, the document layout has to be built.

Your second code is much more resource-consuming, since builder.Document.UpdatePageLayout(); is called for each bookmark. Calling this method forces Aspose.Words to rebuild the document layout, which, as I have mentioned, is quite a resource-consuming operation.
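
One way to keep the number of layout-dependent calls down is to resolve the page of each bookmark exactly once, before anything is written to the end of the document, and to keep only one bookmark per page so that the final text reads "2, 34, 68, 70" without repeats. The following is only a minimal sketch: PageLinkHelper and FirstBookmarkPerPage are hypothetical names and not part of Aspose.Words, and the page lookup is passed in as a delegate so the helper itself has no dependency on the library. It does not remove the cost of building the layout once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Aspose.Words.
public static class PageLinkHelper
{
    // Calls getPage once per bookmark name and keeps the first bookmark
    // found on each page. The result is ordered by page number.
    public static SortedDictionary<int, string> FirstBookmarkPerPage(
        IEnumerable<string> bookmarkNames, Func<string, int> getPage)
    {
        var firstBookmarkOnPage = new SortedDictionary<int, string>();
        foreach (string name in bookmarkNames)
        {
            int page = getPage(name);
            // Later occurrences on a page that already has a link target are skipped.
            if (!firstBookmarkOnPage.ContainsKey(page))
                firstBookmarkOnPage.Add(page, name);
        }
        return firstBookmarkOnPage;
    }
}
```

With the code from the first answer, getPage could be name => collector.GetStartPageIndex(doc.Range.Bookmarks[name].BookmarkStart), using the single LayoutCollector created after the bookmarks have been inserted. The hyperlinks would then be written in a second pass over the returned dictionary, calling builder.InsertHyperlink(page.ToString(), bookmarkName, true) for each entry, so there is no call to UpdatePageLayout per bookmark.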
