Hi Team,
I am able to read all the data between a bookmark and a tag, but the data for one of the bookmarks cannot be read.
Document doc = new Document(fname);
DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveToDocumentEnd();
builder.StartBookmark("DocumentEnd");
builder.EndBookmark("DocumentEnd");
BookmarkCollection bmCollection = doc.Range.Bookmarks;
string bookmarkName = string.Empty;
string bookmarkHtml = string.Empty;
for (int k = 0; k < bmCollection.Count; k++)
{
    Bookmark bk1 = doc.Range.Bookmarks[k];
    if (bk1.Name.Contains("_"))
    {
        bk1.Remove();
    }
}
for (int k = 0; k < bmCollection.Count; k++)
{
    Bookmark bm = bmCollection[k];
    bookmarkName = bm.Name;
    Bookmark bk1 = doc.Range.Bookmarks[k];
    Bookmark bk2 = doc.Range.Bookmarks[k + 1];
    // Get bookmarks parent SDTs
    StructuredDocumentTag startTag = (StructuredDocumentTag)bk1.BookmarkEnd.GetAncestor(NodeType.StructuredDocumentTag);
    StructuredDocumentTag endTag = (StructuredDocumentTag)bk2.BookmarkStart.GetAncestor(NodeType.StructuredDocumentTag);
    // Extract content between the first bookmark end and the second bookmark start.
    List<Node> extractedNodes = null;
    try
    {
        extractedNodes = ExtractContentHelper.ExtractContent(startTag, endTag, false);
    }
    catch
    {
        continue;
    }
    Document dstDoc = ExtractContentHelper.GenerateDocument(doc, extractedNodes);
    StringBuilder sb = new StringBuilder();
    sb.Append(dstDoc.ToString(new HtmlSaveOptions() { PrettyFormat = true }));
    string plainText = GetPlainTextFromHTML(sb.ToString());
    plainText = plainText.Replace("\r", "");
    plainText = plainText.Replace("\n", "");
    plainText = plainText.Replace("\t", "");
    plainText = plainText.Replace(" ", "");
    if (bookmarkName == "DetailedDescriptionOfKeyRatingDrivers")
    {
        obj.KeyRatingDrivers = sb.ToString();
    }
}
The data between “Detailed Description of Key Rating Drivers” and “Liquidity” cannot be read.
Can you please suggest what changes I should make to the code to resolve this issue?
Please find the document attached for reference:
TestRAC.zip (30.5 KB)
@pravinghadge
It seems you are encountering an issue while trying to read data between specific bookmarks in your Word document using Aspose.Words for .NET. Specifically, you mentioned that the data between the bookmarks named “DetailedDescriptionOfKeyRatingDrivers” and “Liquidity” is not being read correctly.
Here are a few suggestions to help you troubleshoot and potentially resolve this issue:
- Check Bookmark Existence: Ensure that both bookmarks are correctly defined in your Word document. You can do this by iterating through the BookmarkCollection and printing out the names of all bookmarks to verify that they exist and are correctly named.
- Bookmark Indexing: When accessing bookmarks by index, ensure that you are not exceeding the bounds of the BookmarkCollection. If you are trying to access bk2 using k + 1, make sure that k is not the last index in the collection.
- Extract Content Logic: The logic you are using to extract content between bookmarks may need to be adjusted. Ensure that the ExtractContentHelper.ExtractContent method is correctly implemented to handle cases where there might be no content or where the bookmarks are not adjacent.
- Debugging: Add debugging statements to log the content being extracted. This can help you identify if the issue is with the extraction logic or if the content is simply not present between the specified bookmarks.
- Handling Empty Content: If there is a possibility that there is no content between the bookmarks, you might want to handle that case explicitly in your code to avoid exceptions.
Here is a modified snippet of your code that includes a check for the existence of bookmarks and handles potential out-of-bounds access:
BookmarkCollection bmCollection = doc.Range.Bookmarks;
// Iterate backwards so that removing a bookmark does not shift the ones still to be checked.
for (int k = bmCollection.Count - 1; k >= 0; k--)
{
    Bookmark bk1 = bmCollection[k];
    if (bk1.Name.Contains("_"))
    {
        bk1.Remove();
    }
}
for (int k = 0; k < bmCollection.Count - 1; k++) // Adjusted to prevent out-of-bounds
{
    Bookmark bk1 = bmCollection[k];
    Bookmark bk2 = bmCollection[k + 1];
    // Ensure both bookmarks are valid
    if (bk1 != null && bk2 != null)
    {
        // Extract content logic
        StructuredDocumentTag startTag = (StructuredDocumentTag)bk1.BookmarkEnd.GetAncestor(NodeType.StructuredDocumentTag);
        StructuredDocumentTag endTag = (StructuredDocumentTag)bk2.BookmarkStart.GetAncestor(NodeType.StructuredDocumentTag);
        // GetAncestor returns null when a bookmark is not inside an SDT; skip such pairs.
        if (startTag == null || endTag == null)
        {
            continue;
        }
        List<Node> extractedNodes = ExtractContentHelper.ExtractContent(startTag, endTag, false);
        Document dstDoc = ExtractContentHelper.GenerateDocument(doc, extractedNodes);
        // Process extracted content
        StringBuilder sb = new StringBuilder();
        sb.Append(dstDoc.ToString(new HtmlSaveOptions() { PrettyFormat = true }));
        string plainText = GetPlainTextFromHTML(sb.ToString());
        // Clean up plainText as needed
    }
}
By implementing these suggestions, you should be able to better diagnose the issue and ensure that the content between your bookmarks is read correctly. If the problem persists, consider sharing the specific Word document for further analysis.
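One more thing that may be worth checking: the clean-up loop in the original question removes bookmarks while walking forward by index. The sketch below uses a plain List<string> as a stand-in for the bookmark names (the names and the RemoveWhileIterating class are hypothetical, and it assumes the bookmark collection re-indexes after a removal the way a list does). It shows how a forward loop can skip the item that follows a removed one, so an unwanted bookmark may remain between two bookmarks you expect to be adjacent.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveWhileIterating
{
    // Forward loop: after RemoveAt(k) the next item slides into slot k,
    // and k++ then steps past it without checking it.
    public static List<string> RemoveForward(List<string> names)
    {
        var copy = new List<string>(names);
        for (int k = 0; k < copy.Count; k++)
        {
            if (copy[k].Contains("_"))
                copy.RemoveAt(k);
        }
        return copy;
    }

    // Backward loop: a removal only shifts items that were already checked.
    public static List<string> RemoveBackward(List<string> names)
    {
        var copy = new List<string>(names);
        for (int k = copy.Count - 1; k >= 0; k--)
        {
            if (copy[k].Contains("_"))
                copy.RemoveAt(k);
        }
        return copy;
    }

    public static void Main()
    {
        // Hypothetical bookmark names, used only for illustration.
        var names = new List<string> { "Summary", "_Toc1", "_Toc2", "Liquidity" };
        Console.WriteLine(string.Join(", ", RemoveForward(names)));  // Summary, _Toc2, Liquidity
        Console.WriteLine(string.Join(", ", RemoveBackward(names))); // Summary, Liquidity
    }
}
```

If the collection in your document behaves this way, iterating from the last index down to zero in the removal loop should leave only the bookmarks you intend to pair up.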
Thank you so much for your quick reply.
I have tried this, but the issue remains.
I am attaching the document for your reference:
TestRAC.zip (30.5 KB)
@pravinghadge As far as I can see, the content is extracted properly. I tested with the following simplified code:
Document doc = new Document(@"C:\Temp\in.docx");
foreach (Bookmark bk in doc.Range.Bookmarks)
    Console.WriteLine(bk.Name);
Bookmark bk1 = doc.Range.Bookmarks["DetailedDescriptionOfKeyRatingDrivers"];
Bookmark bk2 = doc.Range.Bookmarks["Liquidity"];
// Get bookmarks parent SDTs
StructuredDocumentTag startTag = (StructuredDocumentTag)bk1.BookmarkEnd.GetAncestor(NodeType.StructuredDocumentTag);
StructuredDocumentTag endTag = (StructuredDocumentTag)bk2.BookmarkStart.GetAncestor(NodeType.StructuredDocumentTag);
List<Node> extractedNodes = ExtractContentHelper.ExtractContent(startTag, endTag, false);
Document dstDoc = ExtractContentHelper.GenerateDocument(doc, extractedNodes);
dstDoc.Save(@"C:\Temp\out.docx");
out.docx (19.9 KB)
Thanks for the reply.
Hardcoding the bookmark names resolved my issue.