Merge Word Documents with Table of Contents (TOC) Fields & Update Hyperlinks so that They Point to the Correct Heading Locations using Java

Hi Team,

I am trying to merge multiple DOCX documents into a single DOCX document.
Each document contains a table of contents. When all the documents are merged using Aspose.Words, the hyperlinks in the first table of contents of the final document point to the correct headings, but the hyperlinks in the second table of contents point to the wrong headings, as you can see in the attached sample document. When you click a hyperlink in the second table of contents, it jumps to an incorrect heading location.

Any solution for this would be really helpful.

ASPOSE_SAMPLE.zip (39.9 KB)

Regards,
Uddeshya Pratik

@upratik,

To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

  • Your simplified input Word documents that you are merging into one
  • The output DOCX file generated by Aspose.Words 20.3 that shows the undesired behavior
  • Your expected DOCX file showing the desired output. You can create this document by using MS Word.
  • Please also create a standalone simple runnable console application (source code without compilation errors) that helps us to reproduce your current problem on our end and attach it here for testing. Please do not include Aspose.Words DLL/JAR files in it to reduce the file size.

As soon as you get these pieces of information ready, we will start investigation into your scenario and provide you more information. Thanks for your cooperation.

@awais.hafeez

Please find all the requested documents, along with a description of the issue, in the attached archive.

Regards,
Uddeshya Pratik

Sample_documents.zip (203.9 KB)

@upratik,

To fix “Actual_generated_document.docx”, you can define the scope of the second TOC field by enclosing the entire content of the last section in a bookmark. Please try the following code:

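import com.aspose.words.*;

public class BindSecondToc {
    public static void main(String[] args) throws Exception {
        Document doc = new Document("E:\\Temp\\Sample_documents\\Actual_generated_document.docx");

        // Create the bookmark at the end of the document; its end stays there.
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.moveToDocumentEnd();
        BookmarkStart bookmarkStart = builder.startBookmark("bookmarked_Scope");
        builder.endBookmark("bookmarked_Scope");

        // Move the bookmark start to the beginning of the last section so that
        // the bookmark encloses the entire content of that section.
        Paragraph firstPara = doc.getLastSection().getBody().getFirstParagraph();
        firstPara.insertBefore(bookmarkStart, firstPara.getFirstChild());

        // Bind the TOC field of the last section to the bookmark and rebuild it.
        for (Field field : doc.getLastSection().getBody().getRange().getFields()) {
            if (field.getType() == FieldType.FIELD_TOC) {
                FieldToc tocField = (FieldToc) field;
                tocField.setBookmarkName("bookmarked_Scope");
                tocField.update();
                break;
            }
        }

        doc.save("E:\\Temp\\Sample_documents\\awjava-20.3.docx");
    }
}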
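For reference, FieldToc.setBookmarkName corresponds to the TOC field's \b switch, which limits the table of contents to the part of the document marked by the named bookmark. The sketch below illustrates this at the field-code text level only; it does not call Aspose.Words, and the class and method names (TocBookmarkSwitch, bindToBookmark) are hypothetical. It drops any existing \b switch and appends a new one, assuming the bookmark name contains no spaces (Word bookmark names cannot).

```java
import java.util.regex.Pattern;

// Hypothetical helper: works on the field-code text only, not on a real document.
final class TocBookmarkSwitch {

    // Matches an existing \b switch plus its argument (quoted or bare), with leading spaces.
    private static final Pattern B_SWITCH =
            Pattern.compile("\\s*\\\\b\\s+(\"[^\"]*\"|\\S+)");

    /** Returns the TOC field code bound to the given bookmark via the \b switch. */
    static String bindToBookmark(String tocFieldCode, String bookmarkName) {
        // Remove any previous \b switch so the TOC is bound to exactly one bookmark.
        String withoutOld = B_SWITCH.matcher(tocFieldCode).replaceAll("").trim();
        return withoutOld + " \\b " + bookmarkName;
    }

    public static void main(String[] args) {
        String tocCode = "TOC \\o \"1-3\" \\h \\z \\u";
        // Expected: the original switches followed by the bookmark switch for bookmarked_Scope.
        System.out.println(bindToBookmark(tocCode, "bookmarked_Scope"));
    }
}
```

Running it prints the original field code with `\b bookmarked_Scope` appended, which is what the TOC field code should look like after the workaround above.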

@awais.hafeez

Thanks, Hafeez, for the above solution.

In the newly generated document, the TOC 2 hyperlinks now point to the correct heading locations.

However, I noticed a couple of issues after using the code snippet you provided.

The formatting of TOC 2 changes, and a few headings are missing from the updated document.

I am attaching a detailed issue description document along with the sample code and other documents for your reference.

Please help me to sort out these issues.

Updated_Sample_documents.zip (220.3 KB)

Regards,
Uddeshya Pratik

@upratik,

We are checking these scenarios and will get back to you soon.

@awais.hafeez

Did you get a chance to look into the above issue?

Regards,
@upratik

@upratik,

We tested the scenario and managed to reproduce the same problems on our end. To address them, we have logged a ticket in our issue tracking system. The ID of this ticket is WORDSNET-20187. We will look further into the details of these problems and keep you updated on the status of the linked issue. We apologize for the inconvenience.

@awais.hafeez

Please let me know once you have a solution for this.

Regards,
Uddeshya Pratik

@upratik,

Your issue (WORDSNET-20187) is currently ‘pending for analysis’ and is in the queue. We will inform you via this thread as soon as this issue is resolved. We apologize for the inconvenience.

@upratik,

Regarding WORDSNET-20187, we would like to let you know that we have completed work on this issue, and the fix will be included in the next version (20.5) of Aspose.Words for Java. Please also check the following analysis details:

Here is the summary of your scenario:

  • prepend TOC field and page break into Doc2.docx, update fields and save it as TOC1.docx
  • prepend TOC field and page break into Doc3.docx, update fields and save it as TOC2.docx
  • append TOC1.docx and TOC2.docx into Doc1.docx and save it as Actual_generated_document.docx

There are a few problems with this scenario:

  • TOC entry page numbers are invalid because the TOC fields are populated before the documents are merged
  • some TOC entries refer to invalid paragraphs for the same reason
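To make the first point concrete, here is a small worked example with made-up page counts. Assuming every appended document starts on a new page and page numbering continues across sections (no restarts), a heading that the pre-populated TOC in TOC2.docx lists on page 3 moves down by the number of pages that precede TOC2.docx in the merged file. The class name and all page counts are hypothetical.

```java
// Hypothetical page counts; real values depend on the documents and their layout.
final class TocPageOffset {

    /**
     * Page on which a heading lands after merging, assuming every document before it
     * starts on a new page and page numbering is continuous (no restarts).
     */
    static int pageAfterMerge(int pageInSource, int... pagesBefore) {
        int offset = 0;
        for (int pages : pagesBefore) {
            offset += pages;
        }
        return pageInSource + offset;
    }

    public static void main(String[] args) {
        int doc1Pages = 4; // assumed length of Doc1.docx
        int toc1Pages = 6; // assumed length of TOC1.docx
        // TOC2.docx was updated on its own, so its TOC says "page 3" for this heading.
        int actual = pageAfterMerge(3, doc1Pages, toc1Pages);
        // Prints: Stale TOC entry: page 3, actual page in merged document: 13
        System.out.println("Stale TOC entry: page 3, actual page in merged document: " + actual);
    }
}
```

The stale value can only be corrected by updating the TOC after merging, when the final layout is known.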

The workaround is:

  • wrap the last section of Actual_generated_document.docx in a bookmark
  • bind the second TOC field (in the last section) to the bookmark and update it
  • save the document as FinalDocument.docx
  • update fields in FinalDocument.docx and save it as 20.3.docx

The workaround solves the reference problem for the second TOC field, but Aspose.Words currently misses some entries. Also, the first TOC field now contains extra entries because it is not bound to a bookmark. There are no formatting issues here; MS Word updates the field in the same way.
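The difference between the bound second TOC and the unbound first TOC can be pictured with a toy model: a TOC without a bookmark collects qualifying headings from the whole document, while a TOC bound to a bookmark only looks inside that range. This is a simplified sketch, not how MS Word or Aspose.Words is implemented; the class name, positions, and heading texts are made up, and heading selection is reduced to an outline-level check similar to \o "1-3".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of TOC scoping; not how MS Word or Aspose.Words is implemented.
final class TocScopeModel {

    static final class Heading {
        final int position; // hypothetical paragraph index in the merged document
        final int level;    // outline level of the heading
        final String text;

        Heading(int position, int level, String text) {
            this.position = position;
            this.level = level;
            this.text = text;
        }
    }

    /**
     * Collects headings with level 1..maxLevel. A null scope means the TOC has no
     * bookmark and scans the whole document; otherwise only positions in
     * [scope[0], scope[1]) are considered.
     */
    static List<String> collect(List<Heading> doc, int maxLevel, int[] scope) {
        List<String> entries = new ArrayList<>();
        for (Heading h : doc) {
            boolean levelOk = h.level >= 1 && h.level <= maxLevel;
            boolean inScope = scope == null || (h.position >= scope[0] && h.position < scope[1]);
            if (levelOk && inScope) {
                entries.add(h.text);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up merged document: Doc1 part, TOC1 part, then TOC2 part (last section).
        List<Heading> merged = List.of(
                new Heading(0, 1, "Doc1 Intro"),
                new Heading(10, 1, "Doc2 Chapter"),
                new Heading(20, 2, "Doc2 Section"),
                new Heading(30, 1, "Doc3 Chapter"),
                new Heading(40, 4, "Doc3 Detail"));

        // Second TOC bound to a bookmark over the last section (positions 30..49).
        System.out.println("Bound:   " + collect(merged, 3, new int[] {30, 50}));
        // First TOC without a bookmark: picks up headings from every part.
        System.out.println("Unbound: " + collect(merged, 3, null));
    }
}
```

The bound call returns only `[Doc3 Chapter]`, while the unbound call returns `[Doc1 Intro, Doc2 Chapter, Doc2 Section, Doc3 Chapter]`, mirroring the extra entries seen in the first TOC.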

The missing-entries issue has been fixed. We also suggest the following code for your use case:

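import com.aspose.words.*;

public class MergeWithScopedTocs {

    // Gives each TOC its own bookmark name (_bm1, _bm2, ...) so that the
    // names stay unique after the documents are merged.
    private static int bookmarkCounter = 1;

    public static void main(String[] args) throws Exception {
        generateTocDoc("Doc2.docx", "TOC1.docx");
        generateTocDoc("Doc3.docx", "TOC2.docx");

        Document document = new Document("Doc1.docx");
        appendDoc(document, "TOC1.docx");
        appendDoc(document, "TOC2.docx");

        // Build the TOCs only after merging, so they are populated
        // against the final document.
        document.updateFields();
        document.save("out.docx");
    }

    // Prepends a TOC and a page break to the input document and binds the TOC
    // to a bookmark that spans the original content.
    private static void generateTocDoc(String input, String output) throws Exception {
        Document doc = new Document(input);

        // A new DocumentBuilder is positioned at the start of the document.
        DocumentBuilder builder = new DocumentBuilder(doc);
        FieldToc field = (FieldToc) builder.insertTableOfContents("\\o \"1-3\" \\h \\z \\u");
        builder.insertBreak(BreakType.PAGE_BREAK);

        // Bookmark from the current position (after the TOC and page break)
        // to the end of the document.
        String bookmark = "_bm" + bookmarkCounter;
        builder.startBookmark(bookmark);
        builder.moveToDocumentEnd();
        builder.endBookmark(bookmark);

        field.setBookmarkName(bookmark);

        bookmarkCounter++;
        doc.save(output);
    }

    private static void appendDoc(Document main, String src) throws Exception {
        Document srcDoc = new Document(src);
        main.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    }
}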
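For readers unfamiliar with the switches passed to insertTableOfContents: \o "1-3" builds the TOC from paragraphs with built-in heading styles at levels 1 to 3, \h makes the entries hyperlinks, \z hides tab leaders and page numbers in Web Layout view, and \u uses the applied paragraph outline level. After setBookmarkName, the field also carries a \b switch with the bookmark name. The plain-Java tokenizer below is a simplified, hypothetical sketch (not Aspose's or Word's field parser) that splits such a switch string into switch/argument pairs, which may help when inspecting field codes while debugging.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical tokenizer for TOC switches; not a full field-code parser.
final class TocSwitches {

    /**
     * Maps each switch letter to its argument ("" when none follows). A bare token
     * after a switch is treated as that switch's argument.
     */
    static Map<String, String> parse(String switches) {
        Map<String, String> result = new LinkedHashMap<>();
        String pending = null; // last switch that may still receive an argument
        int i = 0;
        while (i < switches.length()) {
            char c = switches.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            String token;
            boolean quoted = c == '"';
            if (quoted) {
                int close = switches.indexOf('"', i + 1);
                if (close < 0) {
                    close = switches.length(); // tolerate a missing closing quote
                }
                token = switches.substring(i + 1, close);
                i = close + 1;
            } else {
                int end = i;
                while (end < switches.length() && !Character.isWhitespace(switches.charAt(end))) {
                    end++;
                }
                token = switches.substring(i, end);
                i = end;
            }
            if (!quoted && token.startsWith("\\")) {
                pending = token.substring(1);
                result.put(pending, "");
            } else if (pending != null) {
                result.put(pending, token);
                pending = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Switches used in the suggested code, plus the \b switch added by setBookmarkName.
        String switches = "\\o \"1-3\" \\h \\z \\u \\b _bm1";
        // Prints: {o=1-3, h=, z=, u=, b=_bm1}
        System.out.println(parse(switches));
    }
}
```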

The issues you found earlier (filed as WORDSNET-20187) have been fixed in the Aspose.Words for .NET 20.5 and Aspose.Words for Java 20.5 updates.