We have observed that the PDF tag structure for the Table of Contents (TOC) differs between a PDF generated via code and one produced with Word’s “Save as PDF” option. Specifically, the reference link is missing in the PDF generated through code.
Please find attached:
Sample images highlighting the issue
Documents used for testing
The sample code implemented
Kindly review and advise on how we can ensure a consistent tag structure, with proper reference links in the TOC, when generating PDFs programmatically.
// Create a new document
Document doc = new Document();
doc.removeAllChildren();
DocumentBuilder builder = new DocumentBuilder(doc);
// Add TOC title heading
// Note: Using HEADING_2 creates an H2 tag in PDF structure for accessibility
builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_2);
builder.getParagraphFormat().setOutlineLevel(OutlineLevel.LEVEL_2);
builder.write("Table of Contents");
builder.writeln(); // End the heading paragraph (a second writeln() would add an extra empty paragraph)
// Build TOC using method similar to FileUtil.buildAsposeTOC
builder.insertTableOfContents("\\o \"1-3\" \\h \\u");
// Add page break after TOC
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln();
// Add sample headings that will appear in the TOC
addSampleHeadings(builder);
doc.updatePageLayout();
doc.updateFields();
// Save the document as DOCX
String outputPath = "C:\\out.docx";
doc.save(outputPath);
// Save the document as PDF
String pdfOutputPath = "C:\\out.pdf";
PdfSaveOptions pdfOptions = new PdfSaveOptions();
pdfOptions.setExportDocumentStructure(true);
doc.save(pdfOutputPath, pdfOptions);
Summary
When converting a DOCX that contains a Table of Contents (TOC) to PDF with Aspose.Words, the logical tag tree can be exported, but the TOC entries may not contain the reference‑link (<Link>) tags that Microsoft Word adds when you use Save As PDF.
Why it happens
The TOC field must be created with the \h switch (hyperlink) so that link information is stored in the field result.
Aspose.Words will export the tag structure only when PdfSaveOptions.setExportDocumentStructure(true) is enabled.
If the TOC field does not contain hyperlink information, or if the PDF save options do not request link generation, the exported tag tree will contain the heading tags but no <Link> elements, resulting in a different tag hierarchy from Word’s output.
What to do next
Confirm the TOC field includes the \h switch
builder.insertTableOfContents("\\o \"1-3\" \\h \\u"); // note the \h switch
The \h switch tells Word to create hyperlinks for each entry.
Open a support ticket with this information so the Aspose team can investigate further.
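The \h check can also be automated before saving. Below is a minimal plain-Java sketch with no Aspose.Words dependency; TocSwitchCheck, switches and hasSwitch are hypothetical helpers, and the sketch assumes each switch is a whitespace-separated token starting with a backslash. It also shows that the Java literal "\\o \"1-3\" \\h \\u" produces the field code \o "1-3" \h \u.

```java
import java.util.ArrayList;
import java.util.List;

public class TocSwitchCheck {

    // Hypothetical helper: returns the switches (e.g. "\o", "\h") found in a TOC field code.
    // Assumes switches are whitespace-separated tokens starting with a backslash;
    // other tokens, such as the quoted "1-3" argument, are ignored.
    static List<String> switches(String fieldCode) {
        List<String> result = new ArrayList<>();
        for (String token : fieldCode.trim().split("\\s+")) {
            if (token.startsWith("\\")) {
                result.add(token);
            }
        }
        return result;
    }

    // Hypothetical helper: true if the given switch (e.g. "\h") is present.
    static boolean hasSwitch(String fieldCode, String name) {
        return switches(fieldCode).contains(name);
    }

    public static void main(String[] args) {
        // Same literal as passed to builder.insertTableOfContents(...) in the sample code.
        String fieldCode = "\\o \"1-3\" \\h \\u";
        System.out.println(fieldCode);                   // \o "1-3" \h \u
        System.out.println(switches(fieldCode));         // [\o, \h, \u]
        System.out.println(hasSwitch(fieldCode, "\\h")); // true
    }
}
```

Note that the switch is compared exactly as written; the sketch does not attempt any case-insensitive matching.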
These steps should help you produce a PDF whose tag structure, including TOC reference links, matches the output you obtain when saving the same document to PDF directly from Microsoft Word. If the issue persists, further investigation by Aspose support may be required.
I could not fully verify this explanation against the retrieved sources. Please double-check the behaviour in your environment and confirm the exact SDK/version.
Thank you for your detailed response. I have verified the following based on your recommendations:
The TOC field does include the \h switch: builder.insertTableOfContents("\\o \"1-3\" \\h \\u");
I am already using PdfSaveOptions.setExportDocumentStructure(true) when saving the PDF.
I have also called doc.updateFields() and doc.updatePageLayout() before saving.
Despite these steps, the generated PDF still does not include the <Reference> tag in the TOC entries as seen when using Microsoft Word’s Save As PDF option. I can see the <Link> tag, but the <Reference> tag is missing in the logical structure.
I have already attached:
Sample code
Source DOCX
Generated PDF
Screenshots comparing the tag structure from Aspose vs Word
Could you please confirm whether Aspose currently supports adding <Reference> tags for TOC entries in the PDF tag tree? If not, is there any workaround or upcoming feature planned to match Word’s output?
Summary
When exporting a DOCX that contains a Table‑of‑Contents (TOC) to PDF with PdfSaveOptions.ExportDocumentStructure = true, Aspose.Words adds the usual accessibility tags (e.g., <Link> for the TOC entry) but does not currently emit a <Reference> tag for those entries. This differs from the tag structure produced by Microsoft Word’s Save As PDF feature.
Why it happens
The PDF‑structure export in the present Aspose.Words implementation creates tags for headings, paragraphs and hyperlinks, but the specific <Reference> element that links a visible TOC entry to its destination bookmark has not been implemented. Consequently, even with the \h switch in the TOC field and PdfSaveOptions.ExportDocumentStructure enabled, the logical structure lacks the <Reference> node.
What to do next
Verify you are on the latest Aspose.Words release – download the most recent build and repeat the export. If the <Reference> tag is still missing, the behavior is a limitation of the current library.
Create a minimal reproducible sample (a small DOCX with a TOC that uses the \h switch) and the code you use to save the PDF. This helps confirm that the issue is not document‑specific.
Work‑around (post‑processing) – you can add the missing <Reference> tags after the PDF is generated by using Aspose.Pdf or another PDF manipulation library. The general steps are:
// a) Save the document to PDF with structure export enabled
var pdfSaveOptions = new Aspose.Words.Saving.PdfSaveOptions { ExportDocumentStructure = true };
doc.Save("output.pdf", pdfSaveOptions);
// b) Open the PDF with Aspose.Pdf
var pdf = new Aspose.Pdf.Document("output.pdf");
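// c) Locate the <Link> elements that represent TOC entries and create matching <Reference> elements
// (the structure-tree member names below are illustrative; check them against the Aspose.Pdf tagged-content documentation; .Where(...) needs using System.Linq;)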
foreach (var link in pdf.TaggedPdfStructure.Elements.Where(e => e.Tag == "Link"))
{
var reference = new Aspose.Pdf.TaggedPdfStructureElement(pdf, "Reference");
// Copy the destination identifier from the <Link> (adjust attribute names as needed)
if (link.Attributes.TryGetValue("Destination", out var dest))
reference.Attributes["ID"] = dest;
// Insert the <Reference> as a sibling of the <Link>
link.Parent.Elements.InsertAfter(reference, link);
}
// d) Save the modified PDF
pdf.Save("output_with_reference.pdf");
Note: The exact attribute names (Destination, ID, etc.) may differ; inspect the generated PDF to determine the correct ones.
Monitor future releases – the Aspose.Words team is aware of this gap. Keep an eye on the release notes or contact Aspose support for updates on when (or if) native <Reference> tag generation will be added.
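The tree manipulation in step c) of the work-around can be illustrated without any PDF library. The plain-Java sketch below uses a hypothetical in-memory Node class (not an Aspose.Pdf type) and a hypothetical Destination attribute; it inserts a Reference sibling directly after each Link, as in the C# outline. Matches are collected first and the tree is modified afterwards, so no child list changes while it is being traversed. It only mirrors that outline and does not establish which placement of <Reference> reproduces Word’s tag tree.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AddReferenceSketch {

    // Hypothetical in-memory stand-in for a PDF structure element (not an Aspose.Pdf type).
    static final class Node {
        final String tag;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String tag) {
            this.tag = tag;
        }

        // Appends a child and returns this node so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Depth-first collection of every node carrying the given tag.
    static void collect(Node node, String tag, List<Node> out) {
        if (node.tag.equals(tag)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, tag, out);
        }
    }

    // Inserts a "Reference" sibling directly after each "Link" node and returns how many were added.
    // All Link nodes are collected first, so no child list is modified during the traversal.
    static int addReferences(Node root) {
        List<Node> links = new ArrayList<>();
        collect(root, "Link", links);
        int added = 0;
        for (Node link : links) {
            if (link.parent == null) {
                continue; // a Link at the root has no sibling list
            }
            Node reference = new Node("Reference");
            reference.parent = link.parent;
            String dest = link.attributes.get("Destination"); // hypothetical attribute name
            if (dest != null) {
                reference.attributes.put("ID", dest);
            }
            List<Node> siblings = link.parent.children;
            siblings.add(siblings.indexOf(link) + 1, reference);
            added++;
        }
        return added;
    }

    public static void main(String[] args) {
        // Hypothetical tree with two TOC entries: TOC > TOCI > Link.
        Node toc = new Node("TOC");
        for (String dest : new String[] {"dest-1", "dest-2"}) {
            Node link = new Node("Link");
            link.attributes.put("Destination", dest);
            toc.add(new Node("TOCI").add(link));
        }
        System.out.println(addReferences(toc)); // 2
        for (Node n : toc.children.get(0).children) {
            System.out.println(n.tag + " " + n.attributes);
        }
        // Link {Destination=dest-1}
        // Reference {ID=dest-1}
    }
}
```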
If you need further assistance with the post‑processing code or have additional details (SDK version, platform, etc.), feel free to share them.
This answer is based on the information currently available and does not claim a specific upcoming version or guaranteed fix.
@pvijji MS Word behaves differently depending on whether you use Save As Adobe PDF or Save As with the PDF format selected:
MS Word Save As Adobe PDF: ms_as_pdf.pdf
As you can see, in the first case MS Word uses the <Reference> tag for TOC items, but in the second case it uses the <Link> tag, just like Aspose.Words does. So Aspose.Words mimics the MS Word Save As behavior when exporting the logical structure of TOC items.