How to read content inside a Word bookmark in C#

Hi Team, I am trying to get the data inside a bookmark, but I am unable to get it.

Please find the attached Word document:
Bookmarkword.zip (19.6 KB)

Code:

Document doc = new Document(fname);
DocumentBuilder builder = new DocumentBuilder(doc);

foreach (Bookmark bookmark in doc.Range.Bookmarks)
    if (bookmark.Name.StartsWith("_"))
        bookmark.Remove(); // remove hidden bookmarks        

builder.MoveToDocumentEnd();
builder.StartBookmark("DocumentEnd");
builder.EndBookmark("DocumentEnd");

BookmarkCollection bmCollection = doc.Range.Bookmarks;
string bookmarkName = string.Empty;
string bookmarkHtml = string.Empty;
for (int i = 0; i < bmCollection.Count - 1; i++)
{
    Bookmark bookmark = bmCollection[i];
    bookmarkName = bookmark.Name;
    if (bookmarkName == "PR_Title")
    {
        obj.PressReleaseTitle = bookmark.Text;
    }
    else if (bookmarkName == "PR_Overview")
    {
        obj.Overview = bookmark.Text;
    }
}

@pravinghadge Could you please elaborate on your requirements a bit more? As far as I can see, the bookmark text is returned properly on my side. I used the following simple code for testing:

Document doc = new Document(@"C:\Temp\Bookmarkword.docx");
foreach(Bookmark bookmark in doc.Range.Bookmarks)
{
    Console.WriteLine(bookmark.Name);
    Console.WriteLine(bookmark.Text);
}

If you need to extract the formatted bookmarked content, you should extract the content between the corresponding BookmarkStart and BookmarkEnd nodes.
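A simplified sketch of that idea, assuming a recent Aspose.Words for .NET and C# 7 or later: it walks the document tree in pre-order (document order) from the BookmarkStart to the BookmarkEnd with Node.NextPreOrder and collects the Run nodes in between, whose Font property still carries the formatting. Unlike ExtractContentHelper, it does not clone partial paragraphs into a new document. The class and method names (BookmarkRuns, GetBookmarkedRuns) are hypothetical.

```csharp
using System.Collections.Generic;
using Aspose.Words;

public static class BookmarkRuns
{
    // Hypothetical helper: returns the Run nodes that lie between the named
    // bookmark's start and end markers, in document order.
    public static List<Run> GetBookmarkedRuns(Document doc, string bookmarkName)
    {
        var runs = new List<Run>();
        Bookmark bookmark = doc.Range.Bookmarks[bookmarkName];
        if (bookmark == null)
            return runs; // no bookmark with that name

        // Pre-order traversal visits nodes in the order they appear in the document,
        // so this also works when the bookmark spans several paragraphs.
        Node node = bookmark.BookmarkStart.NextPreOrder(doc);
        while (node != null && node != bookmark.BookmarkEnd)
        {
            if (node is Run run)
                runs.Add(run); // run.Font keeps bold, size, font name, etc.
            node = node.NextPreOrder(doc);
        }
        return runs;
    }
}
```

Concatenating the Text of the returned runs gives the same plain text as bookmark.Text, while each Run still exposes its own formatting.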

Hi @alexey.noskov,
Thanks for your reply.
For example, for the PR_Overview bookmark I am getting “OVERVIEW” as the result.

But I need the data that is inside the PR_Overview bookmark.

I have also changed the code, but I am still getting the same output, “OVERVIEW”:

Bookmark bookmark = doc.Range.Bookmarks[i];
BookmarkStart start = doc.Range.Bookmarks[i].BookmarkStart;
BookmarkEnd end = doc.Range.Bookmarks[i].BookmarkEnd;
ArrayList nodes = ExtractContent(start, end, true);
Document htmlDoc = GenerateDocument(doc, nodes);
// htmlDoc.FirstSection.Body.FirstParagraph.Remove();
String sb = htmlDoc.ToString(SaveFormat.Html);

But I need the following output:
Team has taken the following rating action on xyz Private Limited’s term loan:

@pravinghadge I have inspected your document, and the PR_Overview bookmark contains only the text “OVERVIEW”. Please see the relevant XML fragment from your source document:

<w:bookmarkStart w:id="2" w:name="PR_Overview"/>
<w:bookmarkStart w:id="3" w:name="PR_RatingActionCommentary"/>
<w:r w:rsidRPr="00AC7613">
	<w:rPr>
		<w:rFonts w:ascii="Calibri" w:eastAsia="Calibri" w:hAnsi="Calibri" w:cs="Calibri"/>
		<w:b/>
		<w:szCs w:val="20"/>
	</w:rPr>
	<w:t>O</w:t>
</w:r>
<w:r w:rsidR="00295F93">
	<w:rPr>
		<w:rFonts w:ascii="Calibri" w:eastAsia="Calibri" w:hAnsi="Calibri" w:cs="Calibri"/>
		<w:b/>
		<w:szCs w:val="20"/>
	</w:rPr>
	<w:t>VERVIEW</w:t>
</w:r>
<w:bookmarkEnd w:id="2"/>

As you can see, the text between the PR_Overview bookmark start and end is “OVERVIEW”.
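The same check can be made without Aspose.Words. A minimal sketch using only LINQ to XML (System.Xml.Linq): it concatenates every w:t element that appears after the w:bookmarkStart with a given w:id and before the matching w:bookmarkEnd. The fragment is wrapped in a root element that declares the w: prefix (an assumption, since the fragment on its own is not a complete XML document) and its run properties are trimmed; the class and member names are hypothetical. To run it against the real file you would first load word/document.xml from the .docx package, for example with System.IO.Compression.ZipFile.

```csharp
using System.Text;
using System.Xml.Linq;

public static class RawBookmarkText
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Concatenates every w:t element that lies between the bookmarkStart and
    // bookmarkEnd with the given w:id. Descendants() yields document order.
    public static string TextBetween(XElement root, string bookmarkId)
    {
        var sb = new StringBuilder();
        bool inside = false;
        foreach (XElement e in root.Descendants())
        {
            if (e.Name == W + "bookmarkStart" && (string)e.Attribute(W + "id") == bookmarkId)
                inside = true;
            else if (e.Name == W + "bookmarkEnd" && (string)e.Attribute(W + "id") == bookmarkId)
                break; // matching end reached
            else if (inside && e.Name == W + "t")
                sb.Append(e.Value);
        }
        return sb.ToString();
    }

    // The fragment quoted above (run properties trimmed), wrapped so it parses.
    public const string Fragment =
        "<root xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:bookmarkStart w:id=\"2\" w:name=\"PR_Overview\"/>" +
        "<w:bookmarkStart w:id=\"3\" w:name=\"PR_RatingActionCommentary\"/>" +
        "<w:r><w:rPr><w:b/></w:rPr><w:t>O</w:t></w:r>" +
        "<w:r><w:rPr><w:b/></w:rPr><w:t>VERVIEW</w:t></w:r>" +
        "<w:bookmarkEnd w:id=\"2\"/>" +
        "</root>";
}
```

Calling TextBetween(XElement.Parse(RawBookmarkText.Fragment), "2") returns "OVERVIEW", which confirms that nothing else is stored inside this bookmark.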

Thanks for your reply.

image.png (3.4 KB)

Could you please suggest how to get the highlighted data, then?

Please check the attached file.

@pravinghadge The text that you would like to extract is in an editable range that is right after the bookmark.

<w:bookmarkStart w:id="2" w:name="PR_Overview"/>
...
<w:bookmarkEnd w:id="2"/>
<w:permStart w:id="1267667890" w:edGrp="everyone"/>
...
<w:permEnd w:id="1267667890"/>

You can extract the content between the corresponding editable range start and end. For example, see the following code:

Document doc = new Document(@"C:\Temp\Bookmarkword.docx");

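Bookmark overview = doc.Range.Bookmarks["PR_Overview"];
// For PR_Overview the editable range starts immediately after the bookmark end, so a direct cast works here.
EditableRangeStart rangeStart = (EditableRangeStart)overview.BookmarkEnd.NextSibling;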
EditableRange editableRange = rangeStart.EditableRange;

List<Node> extractedNodes = ExtractContentHelper.ExtractContent(editableRange.EditableRangeStart, editableRange.EditableRangeEnd, true);

// Insert the content into a new separate document and save it to disk.
Document dstDoc = ExtractContentHelper.GenerateDocument(doc, extractedNodes);

dstDoc.Save(@"C:\Temp\out.docx");

Sources of ExtractContentHelper are available on GitHub.

Thank you so much, it is working for PR_Overview.

But when I try to read PR_CurrentRating, it gives the following error:

"Unable to cast object of type 'Aspose.Words.Run' to type 'Aspose.Words.EditableRangeStart'."

@pravinghadge In the case of the PR_CurrentRating bookmark, the editable range is not the immediate sibling of the bookmark end, which is why the cast to EditableRangeStart fails. You can use the DocumentExplorer demo project to investigate the Document Object Model of your document.
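As a lightweight alternative for this specific question, a minimal sketch that lists the NodeType of the nodes that follow the bookmark end in document order shows what a direct cast would actually hit. It assumes Aspose.Words for .NET; the class and method names (NodeInspector, DescribeNodesAfter) are hypothetical.

```csharp
using System.Collections.Generic;
using Aspose.Words;

public static class NodeInspector
{
    // Hypothetical helper: returns the NodeType of up to `count` nodes that
    // follow `start` in pre-order (document order) within `root`.
    public static List<NodeType> DescribeNodesAfter(Node start, Node root, int count)
    {
        var types = new List<NodeType>();
        Node node = start.NextPreOrder(root);
        while (node != null && types.Count < count)
        {
            types.Add(node.NodeType);
            node = node.NextPreOrder(root);
        }
        return types;
    }
}
```

Calling it with bk.BookmarkEnd and the document as the root, a Run appearing before EditableRangeStart in the result would match the InvalidCastException reported above.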


In this case, you can use either the NextSibling property or the NextPreOrder method to find the start of the editable range.

Document doc = new Document(@"C:\Temp\Bookmarkword.docx");

Bookmark bk = doc.Range.Bookmarks["PR_CurrentRating"];
// This code searches for the EditableRangeStart among the sibling nodes of the bookmark end.
// If EditableRangeStart is not a child of the same parent as BookmarkEnd, use the NextPreOrder method to search for it instead.
Node currentNode = bk.BookmarkEnd;
while (currentNode != null && currentNode.NodeType != NodeType.EditableRangeStart)
    currentNode = currentNode.NextSibling;

if (currentNode != null)
{
    EditableRangeStart rangeStart = (EditableRangeStart)currentNode;
    EditableRange editableRange = rangeStart.EditableRange;

    List<Node> extractedNodes = ExtractContentHelper.ExtractContent(editableRange.EditableRangeStart, editableRange.EditableRangeEnd, true);

    // Insert the content into a new separate document and save it to disk.
    Document dstDoc = ExtractContentHelper.GenerateDocument(doc, extractedNodes);

    dstDoc.Save(@"C:\Temp\out.docx");
}
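The code above only looks at siblings of BookmarkEnd. When the editable range starts in a later paragraph, a variant that searches in document order with NextPreOrder could look like the sketch below. It assumes Aspose.Words for .NET and C# 7 or later; the class and method names (EditableRangeFinder, FindNextEditableRange, GetEditableRangeText) are hypothetical, and the text collection is simplified: it concatenates Run text and does not preserve formatting the way ExtractContentHelper does.

```csharp
using System.Text;
using Aspose.Words;

public static class EditableRangeFinder
{
    // Hypothetical helper: returns the first EditableRange that starts after the
    // end of the named bookmark, searching the whole document in pre-order.
    public static EditableRange FindNextEditableRange(Document doc, string bookmarkName)
    {
        Bookmark bookmark = doc.Range.Bookmarks[bookmarkName];
        if (bookmark == null)
            return null;

        for (Node node = bookmark.BookmarkEnd.NextPreOrder(doc); node != null; node = node.NextPreOrder(doc))
        {
            // A type check instead of a direct cast avoids the InvalidCastException.
            if (node is EditableRangeStart rangeStart)
                return rangeStart.EditableRange;
        }
        return null;
    }

    // Hypothetical helper: plain text of the runs between the range's start and end.
    public static string GetEditableRangeText(Document doc, EditableRange range)
    {
        var sb = new StringBuilder();
        for (Node node = range.EditableRangeStart.NextPreOrder(doc);
             node != null && node != range.EditableRangeEnd;
             node = node.NextPreOrder(doc))
        {
            if (node is Run run)
                sb.Append(run.Text);
        }
        return sb.ToString();
    }
}
```

If formatted output is needed, the EditableRangeStart and EditableRangeEnd of the returned range can be passed to ExtractContentHelper.ExtractContent exactly as in the code above.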

Thank you so much for your support 🙂

It is working fine now.