Extract the content of Bookmark

Raghul214 · August 12, 2024, 5:27am

Hello Team,
I want to extract the content between a bookmark's start and end, where the bookmark is in a header or footer. The content can include images or anything else, not only text. Could you give me some code for this?

Professionalize.Discourse · August 12, 2024, 5:27am

To extract the content between a bookmark’s start and end points in a document, including images and other elements, you can use the Aspose.Words library. Below is a code example in C# that demonstrates how to achieve this:

using System;
using System.Collections.Generic;
using Aspose.Words;

public class BookmarkContentExtractor
{
    public void ExtractContentBetweenBookmarks(string filePath, string bookmarkName)
    {
        // Load the document
        Document doc = new Document(filePath);

        // Get the bookmark by name
        Bookmark bookmark = doc.Range.Bookmarks[bookmarkName];
        if (bookmark != null)
        {
            // Get the start and end nodes of the bookmark
            Node bookmarkStart = bookmark.BookmarkStart;
            Node bookmarkEnd = bookmark.BookmarkEnd;

            // Create a list to hold the extracted nodes
            List<Node> extractedNodes = new List<Node>();

            // Collect sibling nodes from the bookmark start up to the bookmark end.
            // NextSibling only walks nodes that share the same parent, so this
            // covers the bookmark only when it starts and ends in the same paragraph.
            Node currentNode = bookmarkStart;
            while (currentNode != null && currentNode != bookmarkEnd)
            {
                extractedNodes.Add(currentNode);
                currentNode = currentNode.NextSibling;
            }
            // Add the end node only if the loop actually reached it
            if (currentNode == bookmarkEnd)
                extractedNodes.Add(bookmarkEnd);

            // Create a new document to hold the extracted content
            Document extractedDoc = new Document();
            // The collected nodes are inline (runs, shapes, bookmark markers), so they
            // must be appended to a paragraph rather than directly to the body
            Paragraph target = extractedDoc.FirstSection.Body.FirstParagraph;
            foreach (Node node in extractedNodes)
            {
                // Import each node into the new document
                Node importedNode = extractedDoc.ImportNode(node, true);
                target.AppendChild(importedNode);
            }

            // Save the extracted content to a new file
            extractedDoc.Save("ExtractedContent.docx");
        }
        else
        {
            Console.WriteLine("Bookmark not found.");
        }
    }
}

Explanation:

Load the Document: The document is loaded from the specified file path.
Get the Bookmark: The bookmark is retrieved using its name.
Extract Nodes: A loop follows NextSibling from the bookmark’s start node and collects each node until it reaches the bookmark’s end node.
Create New Document: A new document is created, and the extracted nodes are imported into it.
Save the Document: Finally, the new document containing the extracted content is saved.

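Make sure to replace "ExtractedContent.docx" with your desired output file name. This code collects the nodes between the bookmark's start and end, including inline images, but because it follows NextSibling it only covers a bookmark that starts and ends within the same parent node (typically one paragraph).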

For more details on working with bookmarks, you can refer to the official Aspose.Words documentation.

alexey.noskov · August 12, 2024, 5:35am

@Raghul214 You can extract content from a header/footer by copying the header/footer content into the main body of an empty document and then extracting content from the main body of that temporary document using the approach described here:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/

Raghul214 · August 12, 2024, 5:45am

Could I get the code for this?

alexey.noskov · August 12, 2024, 5:52am

@Raghul214 Sure, here is a simple code example:

using System.Collections.Generic;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");
// Get header/footer (the indexer returns null if the section has no primary header).
HeaderFooter hf = doc.FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary];
// Copy the content of the header/footer into the body of a temporary document.
Document tmp = (Document)doc.Clone(false);
tmp.EnsureMinimum();
NodeImporter importer = new NodeImporter(doc, tmp, ImportFormatMode.UseDestinationStyles);
foreach (Node n in hf.GetChildNodes(NodeType.Any, false))
    tmp.FirstSection.Body.AppendChild(importer.ImportNode(n, true));

// Get content of the bookmark.
// ExtractContentHelper is the helper class from the approach linked above.
Bookmark bk = tmp.Range.Bookmarks["test"];
List<Node> extractedNodes = ExtractContentHelper.ExtractContent(bk.BookmarkStart, bk.BookmarkEnd, true);
Document extractedDocument = ExtractContentHelper.GenerateDocument(tmp, extractedNodes);

extractedDocument.Save(@"C:\Temp\out.docx");

Raghul214 · August 12, 2024, 6:49am

I am using the same piece of code, but the content of the header is not copied into the tmp document.

alexey.noskov · August 12, 2024, 7:25am

@Raghul214 Could you please attach your input document here for testing? We will check it and provide you more information.

Raghul214 · August 12, 2024, 7:26am

GeneratedDocument.docx (179.0 KB)

Raghul214 · August 12, 2024, 7:26am

This is an input document

alexey.noskov · August 12, 2024, 7:28am

@Raghul214 The code works fine with your document. Here is the code with specified bookmark name:

using System.Collections.Generic;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");
// Get header/footer (the indexer returns null if the section has no primary header).
HeaderFooter hf = doc.FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary];
// Copy the content of the header/footer into the body of a temporary document.
Document tmp = (Document)doc.Clone(false);
tmp.EnsureMinimum();
NodeImporter importer = new NodeImporter(doc, tmp, ImportFormatMode.UseDestinationStyles);
foreach (Node n in hf.GetChildNodes(NodeType.Any, false))
    tmp.FirstSection.Body.AppendChild(importer.ImportNode(n, true));

// Get content of the bookmark.
// ExtractContentHelper is the helper class from the approach linked earlier in this thread.
Bookmark bk = tmp.Range.Bookmarks["HeaderBM"];
List<Node> extractedNodes = ExtractContentHelper.ExtractContent(bk.BookmarkStart, bk.BookmarkEnd, true);
Document extractedDocument = ExtractContentHelper.GenerateDocument(tmp, extractedNodes);

extractedDocument.Save(@"C:\Temp\out.docx");

GeneratedDocument.docx (179.0 KB)
out.docx (10.3 KB)
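
If the bookmark might sit in a header or footer other than the primary header of the first section (for example a first-page header, a footer, or a later section), one option is to look the bookmark up in the whole document and ask its start node for the enclosing HeaderFooter, instead of hard-coding HeaderFooterType.HeaderPrimary. This is only a sketch: the class and method names (HeaderFooterLocator, FindContainingHeaderFooter) are made up for illustration and are not part of Aspose.Words.

```csharp
using Aspose.Words;

// Hypothetical helper, not part of the Aspose.Words API.
public static class HeaderFooterLocator
{
    // Returns the header/footer that contains the bookmark's start node,
    // or null when the bookmark is missing or is not inside a header/footer.
    public static HeaderFooter FindContainingHeaderFooter(Document doc, string bookmarkName)
    {
        // Document.Range spans the whole document, headers and footers included.
        Bookmark bookmark = doc.Range.Bookmarks[bookmarkName];
        if (bookmark == null)
            return null;

        // GetAncestor walks up the parent chain and returns null if no
        // ancestor of the requested type exists.
        return (HeaderFooter)bookmark.BookmarkStart.GetAncestor(NodeType.HeaderFooter);
    }
}
```

The returned HeaderFooter can then be used in place of hf in the code above; a null result means the bookmark name or its location should be checked first.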

Raghul214 · August 12, 2024, 7:30am

Not sure, let me check on my end.

dmerkle1 · April 21, 2025, 4:47pm

Hi @alexey.noskov,

The AI’s code above works pretty well, but when the bookmark covers a Heading followed by other paragraphs, I’m finding that the Heading does not have a “NextSibling”, so the loop does not pick up the other nodes. Any ideas for this?

alexey.noskov · April 21, 2025, 5:09pm

@dmerkle1 NextSibling returns null when the reference node is the last child node of its parent node. You can use the DocumentExplorer demo project to inspect your input document structure.

dmerkle1 · April 21, 2025, 5:34pm

@alexey.noskov thanks for the response. So is using the while loop and checking the next sibling, starting from the bookmark start, the best way to do this without using the DocumentExplorer?

alexey.noskov · April 22, 2025, 4:24am

@dmerkle1 DocumentExplorer is a demo project intended to help you understand the internal document structure.

Yes, you should check the next sibling for null in the loop to make sure there is a next sibling.
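
For a bookmark that starts in one paragraph (such as a Heading) and ends in a later one, a possible workaround is to climb from the bookmark's start and end nodes to their block-level ancestors and walk NextSibling at that level instead. The following is a minimal sketch under a few assumptions: both ends of the bookmark are in the same story (the same Body or HeaderFooter), whole paragraphs or tables are collected even if the bookmark begins or ends mid-paragraph, and BookmarkBlocks, GetBlockAncestor, and CollectBlocks are hypothetical names, not Aspose.Words API.

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper, not part of the Aspose.Words API.
public static class BookmarkBlocks
{
    // Climbs from a node to the ancestor that is a direct child of its story
    // (Body or HeaderFooter), i.e. a block-level node such as a Paragraph or Table.
    private static Node GetBlockAncestor(Node node)
    {
        while (node.ParentNode != null && !(node.ParentNode is Story))
            node = node.ParentNode;
        return node;
    }

    // Collects the block-level nodes from the one holding the bookmark start
    // through the one holding the bookmark end (inclusive).
    public static List<Node> CollectBlocks(Bookmark bookmark)
    {
        Node startBlock = GetBlockAncestor(bookmark.BookmarkStart);
        Node endBlock = GetBlockAncestor(bookmark.BookmarkEnd);

        var blocks = new List<Node>();
        // The null check guards against running off the end of the story
        // when the end block is not a later sibling of the start block.
        for (Node n = startBlock; n != null; n = n.NextSibling)
        {
            blocks.Add(n);
            if (n == endBlock)
                break;
        }
        return blocks;
    }
}
```

Each collected block can then be imported into a new document with ImportNode and appended to its Body. When the content before the bookmark start or after the bookmark end inside the first and last paragraphs must be trimmed, the ExtractContent approach from the linked article is the more complete option.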