Read Formatted Text from Bookmark to Database

Need solution soon…

Hi Allan,

Thanks for your inquiry. A complete bookmark in a Microsoft Word document consists of a bookmark start marker and a bookmark end marker. As mentioned earlier, you need to extract the content enclosed between these two markers into a temporary Document and then get the HTML representation of that document. I hope this helps.
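
The approach described above could be sketched roughly as follows. This is a minimal sketch, not the code from the referenced articles: it assumes the bookmark start and end markers are children of the same paragraph (a bookmark that spans several paragraphs or table cells needs a fuller extraction routine), and the names `BookmarkHtmlSketch` and `ToHtml` are hypothetical. Documents containing images may additionally need HTML save options configured.

```csharp
using System.IO;
using Aspose.Words;

public static class BookmarkHtmlSketch
{
    // Hypothetical helper: copies the inline nodes between a bookmark's start and
    // end markers into a blank temporary document and returns that document as HTML.
    // Assumption: both markers are children of the same paragraph.
    public static string ToHtml(Bookmark bookmark)
    {
        // A blank document already contains one section with one empty paragraph.
        Document temp = new Document();
        Paragraph target = temp.FirstSection.Body.FirstParagraph;

        // The importer translates styles and formatting from the source document.
        NodeImporter importer = new NodeImporter(
            bookmark.BookmarkStart.Document, temp, ImportFormatMode.KeepSourceFormatting);

        // Walk the siblings that lie strictly between the two markers.
        for (Node node = bookmark.BookmarkStart.NextSibling;
             node != null && node != bookmark.BookmarkEnd;
             node = node.NextSibling)
        {
            target.AppendChild(importer.ImportNode(node, true));
        }

        // Save the temporary document as HTML and read it back as a string.
        using (MemoryStream stream = new MemoryStream())
        {
            temp.Save(stream, SaveFormat.Html);
            stream.Position = 0;
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

It would be called as `BookmarkHtmlSketch.ToHtml(doc.Range.Bookmarks["MyBookmark"])`, where `MyBookmark` is a placeholder bookmark name.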

Best regards,

Thanks for the reply.

It would be helpful if you could provide a code snippet in C#.

Regards
Allan

Hi Allan,

Thanks for your inquiry. The code mentioned in these articles is already in .NET (C#). If we can help you with anything else, please feel free to ask.

Best regards,

Hi:
Thanks again for your reply. However, I am getting a little frustrated because my manager is putting a lot of pressure on me and I still have not found (or received) an answer to my question. The documentation and all the examples (including the pointers you have provided) clearly show how to retrieve plain text from a bookmark or insert formatted text into one. However, I cannot find any way to read formatted text from a bookmark when it contains:

  1. New bookmark
  2. Hyperlink
  3. End of a paragraph

Going through your examples and documentation, I finally stumbled upon this link:
https://docs.aspose.com/words/net/aspose-words-for-net-11-10-0-release-notes//
which briefly mentions the problem that I am having, but offers no solution (at least none that I can see). Am I missing something here? Perhaps you can still help.
I have attached a file describing the problem I am having.
Any help you can provide is appreciated.

Hi Allan,

Thanks for your inquiry. I have attached a sample project here for your reference. I hope this helps you get the formatted HTML string representation of a particular bookmark’s content. Please let me know if I can be of any further assistance.

Best regards,

Thank you very much for your help. The sample program is working fine for me and I get the formatted text for all bookmarks. However, two issues remain.

  1. The starting point of each bookmark moves to the next line when the bookmark is filled with the formatted text. How can I remove the paragraph nodes to avoid this?
  2. If I have bookmarks inside a table, I can’t read them; trying to do so throws a null reference error, since your code reads the text from paragraph nodes rather than from the table node.

Kindly let me know where I need to change the code in the sample program in order to read bookmarks inside a table node.
Please find the attachment for your reference.

Hi Allan,
Thanks for your inquiry. After an initial test with Aspose.Words 14.7.0, I was unable to reproduce these issues on my side. I suggest upgrading to the latest version of Aspose.Words, which you can download from the following link. I hope this helps.
https://releases.aspose.com/words/net
Best regards,

Hi,
Thanks for the reply. As you suggested, I’ve upgraded my Aspose version. After that I could read the text of bookmarks that are inside a table. But another issue has come up, as follows.

I am creating a data table with bookmark names as column names and the bookmark text as the value of the respective column. I noticed that the text returned for every bookmark inside a table is the text of the last row of that table.
The bookmarks Age, Gender and Street all return only the text of Street. I added another row to the end of the table, created another bookmark inside that row, and processed the document again, and got the same issue.
In other words, every bookmark inside the table returns the text of the table’s last row.
Besides, it reads the entire text of the row, including the labels.
I can’t follow the logic of your sample project. Maybe I need to change something in your sample code to get the proper result, but I am stuck. Kindly let me know where I can edit the code to solve the above issue.

Hi Allan,

Thanks for your inquiry. It would be great if you could create a simple standalone console application that reproduces the problem on our end and attach it here for testing. As soon as you have this application ready, we’ll investigate your issue further and provide more information. Please also attach the Word document that shows this problem. Thanks for your cooperation.

Best regards,

Hi:
Attached please find the console project with the problem.
As you can see the data in bookmarks outside the table is OK.
As soon as it is inside the table, the same text “software” is read from all the bookmarks.
I am sure I am doing something wrong here.
Your help is appreciated.
Thanks

Hi Allan,

Thanks for your inquiry. We are working on your query and will get back to you soon.

Best regards,

Hi,
I have been waiting for your reply. Kindly respond as soon as possible.

Regards
Allan.

Hi Allan,

Thanks for being patient. Please allow us some time to investigate this issue. We will reply to you as soon as we can.

Best regards,

Hi Allan,

Thanks for being patient. You can use the following simple code to get the HTML representation of a bookmark’s content:

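// Requires: using System.Collections; using System.Data; using System.IO; using System.Text; using Aspose.Words;
string Basepath = @"Documents";
// Path.Combine supplies the separator between the folder and the file name.
Document doc = new Document(Path.Combine(Basepath, "BMDemo_Edit.doc"));
BookmarkCollection bmCollection = doc.Range.Bookmarks;
DataTable dtUnitCurriculum = new DataTable();
foreach (Bookmark bm in bmCollection)
{
    // All nodes from the bookmark start up to (but not including) the bookmark end.
    ArrayList nodes = ExtractContent1(bm.BookmarkStart, bm.BookmarkEnd);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < nodes.Count; i++)
    {
        Node node = (Node) nodes[i];
        if (node.IsComposite)
        {
            // A composite node's HTML already includes its children, so jump past its last child.
            // Note that the loop's own i++ then advances one more position.
            sb.Append(node.ToString(SaveFormat.Html));
            i = nodes.IndexOf(((CompositeNode) node).LastChild) + 1;
            continue;
        }
        sb.Append(node.ToString(SaveFormat.Html));
    }
    string bookmarkName = bm.Name;
    string bookmarkHtml = sb.ToString();
}

// Collects nodes in pre-order (document order), starting at startNode and stopping before endNode.
public static ArrayList ExtractContent1(Node startNode, Node endNode)
{
    ArrayList nodes = new ArrayList();
    for (Node node = startNode; node != null && node != endNode; node = node.NextPreOrder(node.Document))
    {
        nodes.Add(node);
    }
    return nodes;
}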
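
The loop above computes `bookmarkName` and `bookmarkHtml` for each bookmark but never stores them in `dtUnitCurriculum`. One possible way to build a table with one column per bookmark name, as described earlier in the thread, is sketched here using only `System.Data`. The names `BookmarkTable` and `Build` are hypothetical and not part of the sample project.

```csharp
using System.Collections.Generic;
using System.Data;

public static class BookmarkTable
{
    // Hypothetical helper: turns (bookmark name, bookmark HTML) pairs into a
    // DataTable with one string column per bookmark and a single row of values.
    public static DataTable Build(IEnumerable<KeyValuePair<string, string>> bookmarks)
    {
        List<KeyValuePair<string, string>> items =
            new List<KeyValuePair<string, string>>(bookmarks);
        DataTable table = new DataTable("Bookmarks");

        // Columns must exist before a row can be created for them.
        foreach (KeyValuePair<string, string> item in items)
        {
            if (!table.Columns.Contains(item.Key))
            {
                table.Columns.Add(item.Key, typeof(string));
            }
        }

        // One row holds the HTML of every bookmark under its own column.
        DataRow row = table.NewRow();
        foreach (KeyValuePair<string, string> item in items)
        {
            row[item.Key] = item.Value;
        }
        table.Rows.Add(row);
        return table;
    }
}
```

Inside the `foreach` loop, each pair could be collected with `pairs.Add(new KeyValuePair<string, string>(bookmarkName, bookmarkHtml))` and passed to `Build` once the loop has finished.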

I hope this helps.

Best regards,

This code works fine. Thank you for the reply.

Hi, there is another issue I have encountered recently. The code works fine when the bookmark contains a single line of text. If the bookmark contains a whole paragraph, the loop runs indefinitely when I try to read its text. I have attached the file that has the bookmark with paragraph text. I need a solution that reads both paragraph text and single-line text. Your help will be appreciated.

Regards
Allan

Hi Allan,

Thanks for your inquiry. I believe the following code change will fix this issue:

// MyDir is the folder that holds the test document; it is defined outside this snippet.
Document doc = new Document(MyDir + @"Global+warming.docx");
BookmarkCollection bmCollection = doc.Range.Bookmarks;
DataTable dtUnitCurriculum = new DataTable();
foreach (Bookmark bm in bmCollection)
{
    ArrayList nodes = ExtractContent1(bm.BookmarkStart, bm.BookmarkEnd);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < nodes.Count; i++)
    {
        Node node = (Node) nodes[i];
        if (node.IsComposite)
        {
            sb.Append(node.ToString(SaveFormat.Html));
            // This bookmark's end marker is never added to the list, so IndexOf would return -1 for it.
            // When the last child is a BookmarkEnd, jump relative to the node before it instead.
            if (((CompositeNode) node).LastChild.NodeType == NodeType.BookmarkEnd)
                i = nodes.IndexOf(((CompositeNode) node).LastChild.PreviousPreOrder(doc)) + 1;
            else
                i = nodes.IndexOf(((CompositeNode) node).LastChild) + 1;
            continue;
        }
        sb.Append(node.ToString(SaveFormat.Html));
    }
    string bookmarkName = bm.Name;
    string bookmarkHtml = sb.ToString();
}
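
A likely reason the earlier loop never finished: `ExtractContent1` stops before the bookmark end, so when a paragraph's last child is that `BookmarkEnd`, `ArrayList.IndexOf` cannot find it and returns -1, which sends `i` back to the start of the list. The small sketch below reproduces only that arithmetic, with strings standing in for nodes; `IndexJumpDemo` and `JumpTarget` are hypothetical names.

```csharp
using System.Collections;

public static class IndexJumpDemo
{
    // Same arithmetic as "i = nodes.IndexOf(lastChild) + 1" in the loop above.
    // ArrayList.IndexOf returns -1 when the item is not in the list.
    public static int JumpTarget(ArrayList nodes, object lastChild)
    {
        return nodes.IndexOf(lastChild) + 1;
    }
}
```

With `nodes` holding `"BookmarkStart"`, `"Run"`, `"Paragraph"`, `"Run"`, the call `JumpTarget(nodes, "BookmarkEnd")` yields 0, so the loop restarts from the beginning, while `JumpTarget(nodes, "Paragraph")` yields 3.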

Best regards,

Hi,
I’ve run into one more issue recently. I have bookmarks inside a table; each row contains a bookmark and the text associated with it. The text is read properly up to a line break. An “Object reference not set to an instance of an object” exception is thrown when the code tries to read an empty line and the next line starts on the next page. I have attached the document in which I faced the issue. I need a solution, and your help will be appreciated.

Regards
Allan.

Hi Allan,

Thanks for your inquiry. I think you can simply add the following check to avoid this exception.

……

for (int i = 0; i < nodes.Count; i++)
{
    Node node = (Node) nodes[i];
    // Only take the composite branch when the node has children, so LastChild is never null.
    if (node.IsComposite && ((CompositeNode) node).ChildNodes.Count > 0)
    {
        sb.Append(node.ToString(SaveFormat.Html));

……
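
As an alternative to the index arithmetic in the loops above, the traversal could skip any node that sits inside a composite node already written out. This is a sketch under that assumption rather than the sample project's code: it never touches `LastChild`, so empty paragraphs and a missing `BookmarkEnd` index are not special cases. `BookmarkHtml`, `ToHtml` and `IsInside` are hypothetical names. Like the original loop, it exports a composite node whole, so a paragraph that contains the bookmark end is exported in full.

```csharp
using System.Text;
using Aspose.Words;

public static class BookmarkHtml
{
    // True when 'node' is a descendant of 'ancestor' (walks up the ParentNode links).
    private static bool IsInside(Node node, Node ancestor)
    {
        for (Node parent = node.ParentNode; parent != null; parent = parent.ParentNode)
        {
            if (parent == ancestor) return true;
        }
        return false;
    }

    // Appends the HTML of every node between the bookmark start and end,
    // skipping nodes already covered by a previously exported composite node.
    public static string ToHtml(Bookmark bookmark)
    {
        StringBuilder sb = new StringBuilder();
        Node lastComposite = null;
        Node end = bookmark.BookmarkEnd;

        for (Node node = bookmark.BookmarkStart;
             node != null && node != end;
             node = node.NextPreOrder(node.Document))
        {
            // Children of an exported composite are already part of its HTML.
            if (lastComposite != null && IsInside(node, lastComposite)) continue;

            sb.Append(node.ToString(SaveFormat.Html));
            if (node.IsComposite) lastComposite = node;
        }
        return sb.ToString();
    }
}
```

Only the most recently exported composite needs to be remembered, because its descendants are skipped and therefore never become exported composites themselves.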

I hope this helps.

Best regards,