Extract the content of Bookmark

Hello Team,
I want to extract the content between a bookmark's start and end, where the bookmark is in the header or footer. The content can be anything (images, for example), not only text. Could you please provide some code for this?

To extract the content between a bookmark’s start and end points in a document, including images and other elements, you can use the Aspose.Words library. Below is a code example in C# that demonstrates how to achieve this:

using System;
using System.Collections.Generic;
using Aspose.Words;

public class BookmarkContentExtractor
{
    public void ExtractContentBetweenBookmarks(string filePath, string bookmarkName)
    {
        // Load the document
        Document doc = new Document(filePath);

        // Get the bookmark by name (the indexer returns null if it does not exist)
        Bookmark bookmark = doc.Range.Bookmarks[bookmarkName];
        if (bookmark == null)
        {
            Console.WriteLine("Bookmark not found.");
            return;
        }

        // A Body accepts only block-level nodes (paragraphs, tables), so climb from
        // the inline bookmark nodes to their top-level block ancestors.
        Node startBlock = GetTopLevelBlock(bookmark.BookmarkStart);
        Node endBlock = GetTopLevelBlock(bookmark.BookmarkEnd);

        // Collect every block from the start block through the end block.
        // Note: whole blocks are copied, including any text outside the bookmark.
        List<Node> extractedNodes = new List<Node>();
        for (Node node = startBlock; node != null; node = node.NextSibling)
        {
            extractedNodes.Add(node);
            if (node == endBlock)
                break;
        }

        // Create a new document and drop its default empty paragraph
        Document extractedDoc = new Document();
        extractedDoc.FirstSection.Body.RemoveAllChildren();
        foreach (Node node in extractedNodes)
        {
            // Import each node (with its children) into the new document
            Node importedNode = extractedDoc.ImportNode(node, true);
            extractedDoc.FirstSection.Body.AppendChild(importedNode);
        }

        // Save the extracted content to a new file
        extractedDoc.Save("ExtractedContent.docx");
    }

    // Walks up the tree until the node is a direct child of a story (body, header or footer).
    private static Node GetTopLevelBlock(Node node)
    {
        while (node.ParentNode != null && !(node.ParentNode is Story))
            node = node.ParentNode;
        return node;
    }
}

Explanation:

  1. Load the Document: The document is loaded from the specified file path.
  2. Get the Bookmark: The bookmark is retrieved using its name.
  3. Extract Nodes: A loop is used to collect all nodes between the bookmark’s start and end nodes.
  4. Create New Document: A new document is created, and the extracted nodes are imported into it.
  5. Save the Document: Finally, the new document containing the extracted content is saved.

Make sure to replace "ExtractedContent.docx" with your desired output file name. Because each node is imported together with its children, images and other elements inside the copied nodes are carried over to the new document.

For more details on working with bookmarks, refer to the official Aspose.Words documentation.

@Raghul214 You can extract content from a header/footer by copying the header/footer content into the main body of an empty document, and then extracting the content from the main body of that temporary document using the approach described here:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/

Could I get the code for the same?

@Raghul214 Sure, here is a simple code example:

Document doc = new Document(@"C:\Temp\in.docx");
// Get header/footer.
HeaderFooter hf = doc.FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary];
// Move content of header footer to a temporary document.
Document tmp = (Document)doc.Clone(false);
tmp.EnsureMinimum();
NodeImporter importer = new NodeImporter(doc, tmp, ImportFormatMode.UseDestinationStyles);
foreach (Node n in hf.GetChildNodes(NodeType.Any, false))
    tmp.FirstSection.Body.AppendChild(importer.ImportNode(n, true));

// Get content of the bookmark.
// ExtractContentHelper is not part of the Aspose.Words API; it is the helper
// from the extraction approach linked above.
Bookmark bk = tmp.Range.Bookmarks["test"];
List<Node> extractedNodes = ExtractContentHelper.ExtractContent(bk.BookmarkStart, bk.BookmarkEnd, true);
Document extractedDocument = ExtractContentHelper.GenerateDocument(tmp, extractedNodes);

extractedDocument.Save(@"C:\Temp\out.docx");

I am using the same piece of code, but the content of the header is not copied into the tmp document.

@Raghul214 Could you please attach your input document here for testing? We will check it and provide you more information.

GeneratedDocument.docx (179.0 KB)

This is the input document.

@Raghul214 The code works fine with your document. Here is the code with the bookmark name from your document specified:

Document doc = new Document(@"C:\Temp\in.docx");
// Get header/footer.
HeaderFooter hf = doc.FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary];
// Move content of header footer to a temporary document.
Document tmp = (Document)doc.Clone(false);
tmp.EnsureMinimum();
NodeImporter importer = new NodeImporter(doc, tmp, ImportFormatMode.UseDestinationStyles);
foreach (Node n in hf.GetChildNodes(NodeType.Any, false))
    tmp.FirstSection.Body.AppendChild(importer.ImportNode(n, true));

// Get content of the bookmark.
// ExtractContentHelper is not part of the Aspose.Words API; it is the helper
// from the extraction approach linked above.
Bookmark bk = tmp.Range.Bookmarks["HeaderBM"];
List<Node> extractedNodes = ExtractContentHelper.ExtractContent(bk.BookmarkStart, bk.BookmarkEnd, true);
Document extractedDocument = ExtractContentHelper.GenerateDocument(tmp, extractedNodes);

extractedDocument.Save(@"C:\Temp\out.docx");

GeneratedDocument.docx (179.0 KB)
out.docx (10.3 KB)
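
If the temporary document still comes out empty with another file, one thing worth checking is whether the bookmark really lives in the primary header of the first section, since the code above reads only that one. The following is a minimal sketch, not a definitive implementation, that asks the bookmark itself which header/footer it belongs to. The class and method names (BookmarkLocator, FindHeaderFooter) are hypothetical and introduced only for illustration; the Aspose.Words calls used are Range.Bookmarks, Node.GetAncestor and HeaderFooter.HeaderFooterType.

```csharp
using System;
using Aspose.Words;

public static class BookmarkLocator
{
    // Returns the header/footer that contains the bookmark, or null if the
    // bookmark does not exist or is located outside any header/footer.
    public static HeaderFooter FindHeaderFooter(Document doc, string bookmarkName)
    {
        Bookmark bk = doc.Range.Bookmarks[bookmarkName];
        if (bk == null)
            return null;

        // GetAncestor walks up the parent chain and returns null when no
        // ancestor of the requested type exists.
        return (HeaderFooter)bk.BookmarkStart.GetAncestor(NodeType.HeaderFooter);
    }
}
```

The result can replace the hard-coded HeadersFooters[HeaderFooterType.HeaderPrimary] lookup, and it works for footers as well:

```csharp
Document doc = new Document(@"C:\Temp\in.docx");
HeaderFooter hf = BookmarkLocator.FindHeaderFooter(doc, "HeaderBM");
if (hf == null)
    Console.WriteLine("Bookmark is missing or is not inside a header/footer.");
else
    Console.WriteLine("Bookmark found in: " + hf.HeaderFooterType);
```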

Not sure; let me check on my end.
