Bookmarking all the static text inside a document

Hello Team,
I will be uploading a document that contains static text and Aspose syntax, and I want to iterate through each and every line, including those in tables…

example

prop.check  = <<[prop.check]>>

prop.example = <<[prop.example]>>

prop.abc = <<[prop.abc]>>

**Lorem Ipsum** is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book

If I upload a document with this text in it, I want to create bookmarks for prop.check =, prop.example =, prop.abc = and

**Lorem Ipsum** is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book

These are static text, so I want to create a bookmark for each continuous run of text and assign it a GUID. Is there a way I can do this in Aspose.Words?

That is, I want a separate bookmark for each continuous run of static text, and none for the Aspose syntax.

@RajKumar1234

To create bookmarks for all continuous static text in a document using Aspose.Words, you can follow these steps:

  1. Initialize the Document and DocumentBuilder: Start by creating a new document and initializing the DocumentBuilder, which will allow you to add content and bookmarks.

  2. Iterate Through the Document: You will need to iterate through the document’s content, including tables, to identify the static text that you want to bookmark. You can use a combination of text extraction and regular expressions to find the static text patterns you mentioned (e.g., prop.check, prop.example, etc.).

  3. Create Bookmarks: For each piece of static text identified, you can create a bookmark using the StartBookmark and EndBookmark methods of the DocumentBuilder. Make sure to assign a unique name (like a GUID) to each bookmark.

Here is a sample code snippet demonstrating how to achieve this:

using System;
using Aspose.Words;

public void CreateBookmarksForStaticText(string filePath)
{
    // Load the document
    Document doc = new Document(filePath);
    DocumentBuilder builder = new DocumentBuilder(doc);

    // Iterate through all paragraphs in the document, including those inside tables
    foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
    {
        string text = paragraph.GetText();

        // Check for static text patterns
        if (text.Contains("prop.check") || text.Contains("prop.example") || text.Contains("prop.abc"))
        {
            // The start and the end of a bookmark must share the same name,
            // so generate the GUID once and reuse it.
            string bookmarkName = "bk_" + Guid.NewGuid().ToString("N");

            // Start the bookmark before the first node of the paragraph...
            builder.MoveTo(paragraph.FirstChild);
            builder.StartBookmark(bookmarkName);

            // ...and end it at the end of the paragraph, keeping the existing content as is.
            builder.MoveTo(paragraph);
            builder.EndBookmark(bookmarkName);
        }
    }

    // Save the modified document
    doc.Save("OutputWithBookmarks.docx");
}

Explanation:

  • The code loads a document and initializes a DocumentBuilder.
  • It iterates through all paragraphs in the document, including those inside tables, checking for specific static text patterns.
  • For each match, it wraps the existing paragraph content in a bookmark, using a single GUID-based name for both the start and the end of the bookmark.
  • Finally, it saves the modified document with the bookmarks included.

This approach gives each matching paragraph its own bookmark. Note that the bookmark covers the whole paragraph, so any <<[...]>> tag in the same paragraph is included; excluding the Aspose syntax requires splitting the paragraph at the run level.

For more detailed information on working with bookmarks in Aspose.Words, you can refer to the official documentation on Working with Bookmarks and Creating Bookmarks.

Sources:

[1]: Working with Bookmarks
[2]: Create Bookmark In Word Document

It's not the same document every time; I can add many Aspose placeholders and static text. I just gave an example above.

@RajKumar1234 Could you please attach your actual input document and expected output? This will help us to better understand your requirements. We will check the documents and provide you more information.

Input Document.docx (81.7 KB)

All the continuous static content and text should be bookmarked.

Expected output document.docx (80.0 KB)

For tables, if a cell has static text or an image, we should bookmark only that content, not the entire cell.

@RajKumar1234 You can try using the following code:

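using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

Document doc = new Document(@"C:\Temp\in.docx");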

// Make each tag to be represented as a single Run.
FindReplaceOptions opt = new FindReplaceOptions();
opt.UseSubstitutions = true;
// NOTE: For demonstration purposes a simple regex is used, it might not cover all possible cases.
Regex tagRegex = new Regex(@"<<\[.*?\]>>");
doc.Range.Replace(tagRegex, "$0", opt);

int bkIndex = 0;
foreach (Paragraph p in doc.GetChildNodes(NodeType.Paragraph, true))
{
    string bkName = "";
    string paraText = p.ToString(SaveFormat.Text);
    if (!tagRegex.IsMatch(paraText))
    {
        // Wrap whole paragraph into a bookmark.
        bkName = $"bk_{bkIndex++}";
        p.PrependChild(new BookmarkStart(doc, bkName));
        p.AppendChild(new BookmarkEnd(doc, bkName));
    }
    else
    {
        // Get Run nodes that represent tags.
        List<Run> tagRuns = p.GetChildNodes(NodeType.Run, true).Cast<Run>()
            .Where(r => tagRegex.IsMatch(r.Text)).ToList();

        // Wrap text before the first tag to bookmark.
        Run firstTag = tagRuns.First();
        if (p.FirstChild != firstTag)
        {
            bkName = $"bk_{bkIndex++}";
            p.PrependChild(new BookmarkStart(doc, bkName));
            p.InsertBefore(new BookmarkEnd(doc, bkName), firstTag);
        }
        // Wrap content between tags.
        for (int i = 1; i < tagRuns.Count; i++)
        {
            Run nextTag = tagRuns[i];
            if (firstTag.NextSibling != nextTag)
            {
                bkName = $"bk_{bkIndex++}";
                p.InsertAfter(new BookmarkStart(doc, bkName), firstTag);
                p.InsertBefore(new BookmarkEnd(doc, bkName), nextTag);
            }
            firstTag = nextTag;
        }
        // Wrap content after the last tag.
        if (p.LastChild != firstTag)
        {
            bkName = $"bk_{bkIndex++}";
            p.InsertAfter(new BookmarkStart(doc, bkName), firstTag);
            p.AppendChild(new BookmarkEnd(doc, bkName));
        }
    }
}
doc.Save(@"C:\Temp\out.docx");

Hello @alexey.noskov

OutputDocument.docx (81.6 KB)

This is the document generated with the above code, and I can see that multiple bookmarks without content have been created…
e.g. bk_3, bk_5, bk_7 without any content. I don’t want those and al

@RajKumar1234 The above code is an example that demonstrates the basic technique. You are free to modify it according to your requirements. The bookmarks bk_3, bk_5 and bk_7 are created for empty paragraphs. Just add a condition to the code that skips empty paragraphs to avoid this.


I need help adding the same logic for headers and footers. Could you please help me out there?

If possible, give me a code reference so that I can modify it if needed.

@RajKumar1234 Here is the modified code:

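using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

Document doc = new Document(@"C:\Temp\in.docx");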

// Make each tag to be represented as a single Run.
FindReplaceOptions opt = new FindReplaceOptions();
opt.UseSubstitutions = true;
// NOTE: For demonstration purposes a simple regex is used, it might not cover all possible cases.
Regex tagRegex = new Regex(@"<<\[.*?\]>>");
doc.Range.Replace(tagRegex, "$0", opt);

int bkIndex = 0;
foreach (Paragraph p in doc.GetChildNodes(NodeType.Paragraph, true))
{
    // Skip empty paragraphs.
    if (!p.HasChildNodes)
        continue;

    string bkName = "";
    string paraText = p.ToString(SaveFormat.Text);
    if (!tagRegex.IsMatch(paraText))
    {
        // Wrap whole paragraph into a bookmark.
        bkName = $"bk_{bkIndex++}";
        p.PrependChild(new BookmarkStart(doc, bkName));
        p.AppendChild(new BookmarkEnd(doc, bkName));
    }
    else
    {
        // Get Run nodes that represent tags.
        List<Run> tagRuns = p.GetChildNodes(NodeType.Run, true).Cast<Run>()
            .Where(r => tagRegex.IsMatch(r.Text)).ToList();

        // Wrap text before the first tag to bookmark.
        Run firstTag = tagRuns.First();
        if (p.FirstChild != firstTag)
        {
            bkName = $"bk_{bkIndex++}";
            p.PrependChild(new BookmarkStart(doc, bkName));
            p.InsertBefore(new BookmarkEnd(doc, bkName), firstTag);
        }
        // Wrap content between tags.
        for (int i = 1; i < tagRuns.Count; i++)
        {
            Run nextTag = tagRuns[i];
            if (firstTag.NextSibling != nextTag)
            {
                bkName = $"bk_{bkIndex++}";
                p.InsertAfter(new BookmarkStart(doc, bkName), firstTag);
                p.InsertBefore(new BookmarkEnd(doc, bkName), nextTag);
            }
            firstTag = nextTag;
        }
        // Wrap content after the last tag.
        if (p.LastChild != firstTag)
        {
            bkName = $"bk_{bkIndex++}";
            p.InsertAfter(new BookmarkStart(doc, bkName), firstTag);
            p.AppendChild(new BookmarkEnd(doc, bkName));
        }
    }
}
doc.Save(@"C:\Temp\out.docx");

The following condition has been added:

// Skip empty paragraphs.
if (!p.HasChildNodes)
    continue;

The above code processes the whole document, including headers and footers, so no further modifications are required.
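
To check which parts of the document were actually bookmarked, something like the following sketch may help. It lists every bookmark in a saved file and reports whether it sits in a header/footer or in the body. The class name BookmarkReport and the method name Print are made up for illustration; the path is whatever output file was saved.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper (not part of Aspose.Words) for inspecting the result.
public static class BookmarkReport
{
    // Prints each bookmark name, where it is located, and its text.
    public static void Print(string path)
    {
        Document doc = new Document(path);

        // Document.Range covers the whole document, including headers and footers.
        foreach (Bookmark bk in doc.Range.Bookmarks)
        {
            // A bookmark whose start node has a HeaderFooter ancestor lives in a header or footer.
            bool inHeaderFooter = bk.BookmarkStart.GetAncestor(NodeType.HeaderFooter) != null;
            string location = inHeaderFooter ? "header/footer" : "body";
            Console.WriteLine($"{bk.Name}\t{location}\t{bk.Text}");
        }
    }
}
```

For example, BookmarkReport.Print(@"C:\Temp\out.docx") after running the code above.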


Hello @alexey.noskov
I want to replace the content of each bookmark with the content of the bookmark that has the same name in another document.
How can I do this for the entire document?
I tried ExtractContent, but it did not work for table cells: replacing a cell's bookmark with the extracted content fails.
Is there any other way to replace the entire content of a bookmark with the content of another bookmark? The content can also contain images, so I don’t think I can use Text = "".
Please suggest a solution for this.

@RajKumar1234 I am afraid there is no other way to replace bookmark content between documents. You can use the Bookmark.Text property to get or set only the textual content, ignoring its formatting.
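
For the text-only case, a minimal sketch of the Bookmark.Text approach could look like this. It assumes both documents contain bookmarks with matching names. The class name BookmarkTextCopier and the method name CopyText are hypothetical. Images and formatting inside the bookmarks are not transferred, as noted above.

```csharp
using Aspose.Words;

// Hypothetical helper (not part of Aspose.Words): copies plain text between
// bookmarks that have the same name in two documents.
public static class BookmarkTextCopier
{
    public static void CopyText(Document source, Document target)
    {
        foreach (Bookmark srcBookmark in source.Range.Bookmarks)
        {
            // The indexer returns null when the target has no bookmark with this name.
            Bookmark dstBookmark = target.Range.Bookmarks[srcBookmark.Name];
            if (dstBookmark != null)
            {
                // Only the textual content is copied; formatting and images are lost.
                dstBookmark.Text = srcBookmark.Text;
            }
        }
    }
}
```

After calling CopyText(sourceDoc, targetDoc), save targetDoc as usual with targetDoc.Save(...).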