Create bookmarks in Aspose.Words

I am using the evaluation version of Aspose.Words.
I need to add bookmarks to all the SDTs (structured document tags) present in a Word document.
As per my requirement, bookmarks can be added in two ways:

  1. Two bookmarks for a single sdt:
    a) at the start of the sdt
    b) at the end of the sdt
  2. Or a single bookmark enclosing the whole sdt, i.e. a bookmark starting at the start of the sdt and ending at the end of the sdt.

I am using the following code, but I cannot work out how to traverse to the end of the sdt, either to create the second bookmark (method 1) or to end the bookmark (method 2).
Below is the code snippet:
DocumentBuilder builder = new DocumentBuilder(LobjDocument);
NodeCollection nodes = LobjDocument.GetChildNodes(NodeType.StructuredDocumentTag, true);
int iCount = nodes.Count;
LayoutCollector objLayoutCollector = new LayoutCollector(LobjDocument);
LayoutEnumerator objLayoutEnumerator = new LayoutEnumerator(LobjDocument);
foreach (StructuredDocumentTag sdt in nodes)
{
    Object obk = objLayoutCollector.GetEntity(sdt);
    if (obk != null)
    {
        objLayoutEnumerator.Current = obk;

        builder.MoveTo(sdt);
        builder.StartBookmark("start" + sdt.Tag);
        builder.WriteLine(" ");
        builder.EndBookmark("start" + sdt.Tag);
    }
}

@mhtsharma9,

Please see these input/output Word documents (bookmark-sdt-content.zip (25.9 KB)) and try running the following code. Hope this helps:

Document doc = new Document("D:\\temp\\sdt.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

StructuredDocumentTag sdt = (StructuredDocumentTag)doc.GetChildNodes(NodeType.StructuredDocumentTag, true)[0];
builder.MoveTo(sdt);

BookmarkStart start = builder.StartBookmark(sdt.Tag);
BookmarkEnd end = builder.EndBookmark(sdt.Tag);

Paragraph first = (Paragraph)sdt.FirstChild;
Paragraph last = (Paragraph)sdt.LastChild;

first.InsertBefore(start, first.FirstChild);
last.InsertAfter(end, last.LastChild);

doc.Save("D:\\Temp\\18.5.docx");

StudyProtocolTest.zip (1.3 MB)

Attached is the zipped Word document. The scenario we need:

  1. Each content control (SDT) of the Word document should be enclosed in a bookmark named after sdt.Tag. Every such bookmark should start at the start of the SDT and end at the end of the SDT. An SDT does not necessarily contain only paragraphs; it may contain another SDT or a table.
  2. We will update the Word document as described above and convert it to PDF with the PDF save option DefaultBookmarksOutlineLevel set to 9.
  3. Now when we extract all the bookmarks from the PDF, it should give us the co-ordinates of each bookmark, such as X, Y, Height and Width.

When we try the code below, the PDF bookmarks always show the Bottom and Right co-ordinates as 0:

Document LobjDocument = new Document(PsDocument);
string pdfDocument = PsDocument;
pdfDocument = pdfDocument.Replace(".docx", ".pdf");
DocumentBuilder builder = new DocumentBuilder(LobjDocument);
NodeCollection nodes = LobjDocument.GetChildNodes(NodeType.StructuredDocumentTag, true);
int iCount = nodes.Count;
LayoutCollector objLayoutCollector = new LayoutCollector(LobjDocument);
LayoutEnumerator objLayoutEnumerator = new LayoutEnumerator(LobjDocument);
foreach (StructuredDocumentTag sdt in nodes)
{
    Object obk = objLayoutCollector.GetEntity(sdt);
    if (obk != null)
    {
        objLayoutEnumerator.Current = obk;

        builder.MoveTo(sdt);
        BookmarkStart start = builder.StartBookmark(sdt.Tag);
        BookmarkEnd end = builder.EndBookmark(sdt.Tag);
        Node first = sdt.FirstChild;
        Node last = sdt.LastChild;
        // first.InsertBefore(start, first.FirstChild);
        // last.InsertAfter(end, last.LastChild);
        sdt.ParentNode.InsertBefore(start, sdt);
        sdt.ParentNode.InsertAfter(end, sdt);

    }
}
PdfSaveOptions options = new PdfSaveOptions();
options.OutlineOptions.DefaultBookmarksOutlineLevel = 9;
LobjDocument.UpdateFields();
LobjDocument.Save(pdfDocument, options);

And we are extracting the bookmarks and their co-ordinates as follows:

Aspose.Pdf.Document document = new Aspose.Pdf.Document(PsDocument);
// Create PdfBookmarkEditor
PdfBookmarkEditor bookmarkEditor = new PdfBookmarkEditor();

// Open PDF file
bookmarkEditor.BindPdf(document);

// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
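    // Concatenate here: with a string argument, Debug.WriteLine(string, string) treats the second value as a category.
    Debug.WriteLine("Title: " + bookmark.Title);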
    Debug.WriteLine("Bottom: {0}", bookmark.PageDisplay_Bottom);
    Debug.WriteLine("Top: {0}", bookmark.PageDisplay_Top);
    Debug.WriteLine("Left: {0}", bookmark.PageDisplay_Left);
    Debug.WriteLine("Right: {0}", bookmark.PageDisplay_Right);
}

@mhtsharma9,

We are working over your query and will get back to you soon.

Any update on the above issue?

@mhtsharma9,

The StructuredDocumentTag nodes in your Word document have three types of MarkupLevel, i.e. Inline, Block and Cell.

Also, a Word document cannot contain multiple bookmarks with the same name. There are some SDT nodes with the same Tag, e.g. IN:NumberOfSubjectsSummary. In this case, the bookmarks will not be inserted, so you will have to choose different bookmark names.
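
One way to choose different names for SDTs that share a Tag is to derive the bookmark name from the Tag and append a counter on collisions. The class below is only a sketch: `BookmarkNamer` is a hypothetical helper, not part of Aspose.Words, and it assumes the usual Microsoft Word bookmark naming rules (starts with a letter, only letters, digits and underscores, at most 40 characters, names compared case-insensitively). Please verify those rules against your Word version before relying on them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not an Aspose.Words API): turns SDT tags into
// unique bookmark names under the assumed Word naming rules.
public sealed class BookmarkNamer
{
    private const int MaxLength = 40; // assumed Word limit on bookmark name length

    // Names already handed out; compared case-insensitively as a conservative assumption.
    private readonly HashSet<string> _used =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public string Next(string tag)
    {
        // Replace every character that is not a letter or digit with '_'.
        var sb = new StringBuilder();
        foreach (char c in tag ?? string.Empty)
            sb.Append(char.IsLetterOrDigit(c) ? c : '_');

        // Make sure the name starts with a letter.
        if (sb.Length == 0 || !char.IsLetter(sb[0]))
            sb.Insert(0, "bm_");

        string baseName = sb.ToString();
        string candidate = Truncate(baseName, MaxLength);

        // On a collision, append _2, _3, ... while staying within the length limit.
        for (int n = 2; !_used.Add(candidate); n++)
        {
            string suffix = "_" + n;
            candidate = Truncate(baseName, MaxLength - suffix.Length) + suffix;
        }
        return candidate;
    }

    private static string Truncate(string s, int max)
    {
        return s.Length <= max ? s : s.Substring(0, max);
    }
}
```

With a single `BookmarkNamer` instance per document, calling `Next("IN:NumberOfSubjectsSummary")` twice yields `IN_NumberOfSubjectsSummary` and then `IN_NumberOfSubjectsSummary_2`, so each SDT can get its own bookmark. You would need to keep your own mapping from bookmark name back to the original Tag.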

You can build on the following code to meet your requirement:

Document doc = new Document("D:\\temp\\StudyProtocolTest\\StudyProtocolTest.docx");

DocumentBuilder builder = new DocumentBuilder(doc);

foreach (StructuredDocumentTag sdt in doc.GetChildNodes(NodeType.StructuredDocumentTag, true))
{
    NodeCollection nodes = sdt.GetChildNodes(NodeType.Any, true);

    // An SDT with no descendants has nothing to anchor a bookmark to.
    if (nodes.Count == 0)
        continue;

    Node firstChild = nodes[0];
    Node lastChild = nodes[nodes.Count - 1];

    builder.MoveTo(lastChild);

    BookmarkStart start = builder.StartBookmark(sdt.Tag);
    BookmarkEnd end = builder.EndBookmark(sdt.Tag);

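    // Move the bookmark end after the last child, unless that child is a paragraph with exactly two child nodes.
    if (!(lastChild.NodeType == NodeType.Paragraph && ((Paragraph)lastChild).ChildNodes.Count == 2))
    {
        lastChild.ParentNode.InsertAfter(end, lastChild);
    }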

    if (firstChild.NodeType == NodeType.Run)
    {
        firstChild.ParentNode.InsertBefore(start, firstChild);
    }
    else if (firstChild.NodeType == NodeType.Paragraph)
    {
        Paragraph para = (Paragraph)firstChild;
        if (para.ChildNodes.Count == 0)
        {
            para.AppendChild(start);
        }
        else
        {
            if (para.FirstChild.NodeType == NodeType.BookmarkStart)
            {
                BookmarkStart bmStart = (BookmarkStart)para.FirstChild;
                if (bmStart.Name.Equals(start.Name))
                {

                }
                else
                {
                    bmStart.ParentNode.InsertBefore(start, bmStart);
                }
            }
            else
            {
                para.InsertBefore(start, para.FirstChild);
            }
        }
    }
    else if (firstChild.NodeType == NodeType.BookmarkStart)
    {

    }
}

doc.Save("D:\\Temp\\StudyProtocolTest\\18.6.docx");

The issue is not with adding the bookmark.
I am not getting the bookmark end in the PDF document.

Suppose I create a bookmark for a whole SDT in the Word document (the bookmark starts at the start of the SDT and ends at the end of the SDT) and then convert that Word document to PDF. When I then check the bookmark co-ordinates using

Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
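    // Concatenate here: with a string argument, Debug.WriteLine(string, string) treats the second value as a category.
    Debug.WriteLine("Title: " + bookmark.Title);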
    Debug.WriteLine("Bottom: {0}", bookmark.PageDisplay_Bottom);
    Debug.WriteLine("Top: {0}", bookmark.PageDisplay_Top);
    Debug.WriteLine("Left: {0}", bookmark.PageDisplay_Left);
    Debug.WriteLine("Right: {0}", bookmark.PageDisplay_Right);
}

Bottom and Right are always zero.

@mhtsharma9,

Considering “StudyProtocolTest.docx” as the input file, please create and attach your expected Word document (DOCX file) showing the correct Bookmarks here for our reference. You can create this document by using Microsoft Word. We will then investigate the structure of your expected Word document and provide you more information. Thanks for your cooperation.

  1. Take StudyProtocolTest as the input file and add bookmarks for the content controls in this document. Each bookmark should be added so that its start node is inserted before the SDT and its end node is inserted after the SDT:
    Start Bookmark, SDT, End Bookmark
  2. Now convert this document to PDF. Given the bookmarks above, when we read the bookmarks in the PDF document, the Top and Left co-ordinates should be the start of the SDT and the Bottom and Right co-ordinates should be the end of the SDT.
    But Bottom and Right always return 0.

Please note that the Word document correctly shows the bookmark start and end. The issue is with the converted PDF document.

@mhtsharma9,

There is an SDT in your document with the Tag “IN:PrimarySecondaryObjectivesAndEndpoints”. This is a Block-level SDT, and you cannot use Aspose.Words to place a BookmarkStart before it and a BookmarkEnd after it. Currently bookmarks are supported only at the inline level, that is, inside a Paragraph, although the bookmark start and bookmark end can be in different paragraphs. We have logged a new feature request to support insertion of bookmarks at the Block, Row and Cell levels. The ID of this issue is WORDSNET-16962. Your thread has also been linked to this issue, and you will be notified as soon as the requested feature is supported. Sorry for the inconvenience.
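
Until then, a possible workaround is to keep both bookmark nodes at the inline level: for an Inline SDT, place them as siblings of the SDT inside its paragraph; for any other level, place the start in the first paragraph inside the SDT and the end in the last one. The following is only a sketch under those assumptions. `SdtBookmarks.Wrap` is a hypothetical helper name, the code has not been run against your document, and the bookmark then covers the SDT's content rather than the SDT node itself.

```csharp
using Aspose.Words;
using Aspose.Words.Markup;

// Hypothetical helper (not an Aspose.Words API): wraps one SDT in a bookmark
// while keeping BookmarkStart/BookmarkEnd inside paragraphs.
public static class SdtBookmarks
{
    public static bool Wrap(Document doc, StructuredDocumentTag sdt, string name)
    {
        BookmarkStart start = new BookmarkStart(doc, name);
        BookmarkEnd end = new BookmarkEnd(doc, name);

        if (sdt.Level == MarkupLevel.Inline)
        {
            // An inline SDT already sits among inline nodes, so its siblings are valid positions.
            sdt.ParentNode.InsertBefore(start, sdt);
            sdt.ParentNode.InsertAfter(end, sdt);
            return true;
        }

        // Block, Row or Cell level: anchor inside the first and last paragraphs of the SDT.
        NodeCollection paragraphs = sdt.GetChildNodes(NodeType.Paragraph, true);
        if (paragraphs.Count == 0)
            return false; // nothing to anchor the bookmark to

        Paragraph first = (Paragraph)paragraphs[0];
        Paragraph last = (Paragraph)paragraphs[paragraphs.Count - 1];

        first.PrependChild(start); // start becomes the first node of the first paragraph
        last.AppendChild(end);     // end becomes the last node of the last paragraph
        return true;
    }
}
```

It could be called for each node in `doc.GetChildNodes(NodeType.StructuredDocumentTag, true).ToArray()`, passing a unique bookmark name per SDT; taking the array snapshot avoids walking a live collection while nodes are being inserted.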

I think there is a communication gap.
My issue is not with bookmark insertion.
I am facing the issue when a document with bookmarks added using the above procedure (assume the document has only inline SDTs) is converted to PDF.
When I then read the bookmark co-ordinates from the PDF, the Bottom and Right co-ordinates are always 0.

@mhtsharma9,

Thanks for the additional information. Please see these sample documents: Docs.zip (48.5 KB)

The following code returns 0 for Bottom and Right co-ordinates for both (Aspose.Words and MS Word generated) PDF files:

Aspose.Pdf.Document document = new Aspose.Pdf.Document(PsDocument);
// Create PdfBookmarkEditor
PdfBookmarkEditor bookmarkEditor = new PdfBookmarkEditor();

// Open PDF file
bookmarkEditor.BindPdf(document);

// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
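    // Concatenate here: with a string argument, Debug.WriteLine(string, string) treats the second value as a category.
    Debug.WriteLine("Title: " + bookmark.Title);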
    Debug.WriteLine("Bottom: {0}", bookmark.PageDisplay_Bottom);
    Debug.WriteLine("Top: {0}", bookmark.PageDisplay_Top);
    Debug.WriteLine("Left: {0}", bookmark.PageDisplay_Left);
    Debug.WriteLine("Right: {0}", bookmark.PageDisplay_Right);
}

Please share the following resources here for further testing:

  • MS Word generated document (DOCX file) containing inline level Bookmarked SDT
  • Corresponding Aspose.Words generated output PDF file where Bottom and Right co-ordinates are returned as 0
  • And MS Word generated PDF file

UserFiles.zip (136.8 KB) contains all the files you requested.

But the Word-generated PDF does not contain bookmarks.

@mhtsharma9,

Please see MS Word 2016 generated PDF file (msw-2016.pdf (78.0 KB)). This file has bookmarks.

Now, when I run the following code:

Aspose.Pdf.Document document = new Aspose.Pdf.Document("D:\\temp\\UserFiles\\msw-2016.pdf");
// Create PdfBookmarkEditor
PdfBookmarkEditor bookmarkEditor = new PdfBookmarkEditor();

// Open PDF file
bookmarkEditor.BindPdf(document);

// Extract bookmarks
Aspose.Pdf.Facades.Bookmarks bookmarks = bookmarkEditor.ExtractBookmarks();
foreach (Aspose.Pdf.Facades.Bookmark bookmark in bookmarks)
{
    Console.WriteLine("Title: {0}", bookmark.Title);
    Console.WriteLine("Bottom: {0}", bookmark.PageDisplay_Bottom);
    Console.WriteLine("Top: {0}", bookmark.PageDisplay_Top);
    Console.WriteLine("Left: {0}", bookmark.PageDisplay_Left);
    Console.WriteLine("Right: {0}", bookmark.PageDisplay_Right);
}

The output is shown below:

Title: SimpleTextControl
Bottom: 0
Top: 673
Left: 122
Right: 0
Title: RichTextControl
Bottom: 0
Top: 622
Left: 131
Right: 0

The same code produces the following output against your “SimpleTestDoc.pdf”:

Title: SimpleTextControl
Bottom: 0
Top: 673
Left: 124
Right: 0
Title: RichTextControl
Bottom: 0
Top: 622
Left: 133
Right: 0

So again, Aspose.PDF code returns 0 for Bottom and Right co-ordinates for both (Aspose.Words and MS Word generated) PDF files.

Please contact the Aspose.PDF support team to find out why these values are 0.
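
One possible explanation, offered as an assumption rather than a confirmed diagnosis: in the PDF specification, an outline item (bookmark) points to a destination, and the common /XYZ destination type stores only left, top and zoom values. A destination of that type has no bottom or right co-ordinate at all, which would be consistent with the output above (Top and Left populated, Bottom and Right 0) for both PDF files. Whether Aspose.PDF reports missing co-ordinates as 0 is itself an assumption here. The sketch below models which co-ordinates each explicit destination type defines; `PdfDestinations` is a hypothetical name and is not part of Aspose.PDF.

```csharp
using System;

// Hypothetical model (not an Aspose.PDF API) of the explicit destination
// types defined by the PDF specification and the co-ordinates each carries.
public static class PdfDestinations
{
    public static string[] DefinedCoordinates(string type)
    {
        switch (type)
        {
            case "XYZ":   // [page /XYZ left top zoom]
                return new[] { "Left", "Top" };
            case "FitH":  // [page /FitH top]
            case "FitBH": // [page /FitBH top]
                return new[] { "Top" };
            case "FitV":  // [page /FitV left]
            case "FitBV": // [page /FitBV left]
                return new[] { "Left" };
            case "FitR":  // [page /FitR left bottom right top]
                return new[] { "Left", "Bottom", "Right", "Top" };
            case "Fit":   // [page /Fit]
            case "FitB":  // [page /FitB]
                return new string[0];
            default:
                throw new ArgumentException("Unknown destination type: " + type);
        }
    }
}
```

Only /FitR carries all four values. If that is what is happening here, the rectangle occupied by an SDT cannot be read from the bookmark alone and would have to come from another source.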

Any update on WORDSNET-16962?

@mhtsharma9,

WORDSNET-16962 is about providing the ability to insert bookmarks at the Block, Row and Cell levels. This issue is currently pending analysis and is in the queue. We will inform you via this thread as soon as this issue is resolved. We apologize for any inconvenience.

The issues you found earlier (filed as WORDSNET-16962) have been fixed in the Aspose.Words for .NET 18.9 update and the Aspose.Words for Java 18.9 update.