When exporting DOCX to PDF with pdfOptions.setExportDocumentStructure(true) enabled, the Table of Contents (TOC) shows an unexpected visible space after the page number (trailing whitespace) on TOC lines in the tagged PDF. The space appears only when document-structure export (tagged PDF) is enabled; disabling it removes the trailing space.
Summary
When PdfSaveOptions.setExportDocumentStructure(true) is enabled, Aspose.Words creates a tagged PDF. In some cases the Table of Contents (TOC) entries show an extra visible space after the page number. The space is not part of the visual layout of the DOCX – it appears only in the tagged PDF, which means it is introduced by the tagging process.
Why it happens
With document‑structure export Aspose.Words builds a structure tree (tags) for the PDF. The current implementation may add an additional text element (e.g., a space character) after the page‑number run. That element is rendered as a blank in the tagged PDF.
When ExportDocumentStructure is disabled the PDF is saved without tags, so the extra element is not created and the space disappears.
This behaviour is not documented in the public API and is considered an implementation detail. It has been reported as a visual artefact in tagged PDFs in recent versions of Aspose.Words for Java.
What to do next
Upgrade to the latest Aspose.Words for Java release – the most recent versions (e.g., 25.9 / 25.10) contain many fixes for PDF export and accessibility. Upgrading often resolves tagging‑related quirks.
Inspect the tag tree to confirm that the extra space is a separate tag element:
Open the PDF in Adobe Acrobat Pro → View → Show/Hide → Navigation Panes → Tags.
Expand a TOC entry (usually a <P> element) and look for a child node that contains only a space character (often marked as an Artifact or a <Span> with a single space).
Workaround options (if upgrading does not help):
Re‑create the TOC with a custom field switch that avoids the default tab leader. For example, using the \p switch to define a different separator may prevent the tagging engine from inserting the stray space.
Post‑process the PDF tags with a PDF‑manipulation library (e.g., Aspose.PDF for Java) to remove the unwanted whitespace tag. This approach keeps the document‑structure but requires custom code to walk the tag tree and delete nodes that contain only a space.
Generate the TOC without tagging and then merge it with a fully tagged PDF if you need both accessibility and a clean TOC. This is more involved and should be used only as a last resort.
If the problem persists, gather the following information and open a support ticket with Aspose:
Exact Aspose.Words for Java version.
A minimal DOCX file that reproduces the issue.
A snippet of the PDF tag tree (XML representation) showing the extra space element.
Providing these details will help the product team verify whether this is a regression and plan a fix in a future release.
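The tag-tree check described above (looking for a node that holds only a space) could be automated once the structure tree is available as XML. The following is a minimal sketch under stated assumptions, not an Aspose API: it uses only the JDK's built-in XML parser, the class name WhitespaceTagFinder is hypothetical, and the sample element names (TOCI, Reference, Link, Span) are illustrative rather than taken from the attached PDF. Note that Document here is org.w3c.dom.Document, not com.aspose.words.Document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: scans an XML dump of a PDF tag tree for leaf
// elements whose text is non-empty but consists only of whitespace.
public class WhitespaceTagFinder {

    // Returns slash-separated element paths of whitespace-only leaf elements.
    public static List<String> findWhitespaceOnlyLeaves(String tagTreeXml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(tagTreeXml.getBytes(StandardCharsets.UTF_8)));
        List<String> hits = new ArrayList<>();
        walk(dom.getDocumentElement(), "", hits);
        return hits;
    }

    // Depth-first walk; only elements without child elements are checked.
    private static void walk(Element element, String parentPath, List<String> hits) {
        String path = parentPath + "/" + element.getTagName();
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                walk((Element) child, path, hits);
            }
        }
        String text = element.getTextContent();
        if (!hasChildElement && !text.isEmpty() && text.trim().isEmpty()) {
            hits.add(path);
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative tag tree: a TOC entry with a trailing space-only Span.
        String sample = "<TOCI><Reference><Link>Heading 1</Link>"
                + "<Span>3</Span><Span> </Span></Reference></TOCI>";
        System.out.println(findWhitespaceOnlyLeaves(sample)); // prints [/TOCI/Reference/Span]
    }
}
```

The sketch only reports where whitespace-only elements sit. Whether they can safely be removed from the PDF is a separate question that depends on the PDF library used for post-processing.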
SDK version unclear — refer to the official release notes before applying this guidance.
I’ve already attached a minimal DOCX input and its corresponding output PDF, and you can see the tag tree in the document. The files were generated using Aspose.Words version 25.11 for JDK 16.
@pvijji The behavior is expected. Aspose.Words writes the document structure to PDF in a similar way to MS Word. If you convert your document to PDF using MS Word, you will see exactly the same result:
Thank you. I am currently using the code below to generate a Word document. Could you review it and advise on TOC options and any cleanup techniques we can apply to avoid the extra spacing?
// Create a new document
Document doc = new Document();
doc.removeAllChildren();
DocumentBuilder builder = new DocumentBuilder(doc);
// Add TOC title heading
builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_2);
builder.getParagraphFormat().setOutlineLevel(OutlineLevel.LEVEL_2);
builder.write("Table of Contents");
builder.writeln();
builder.writeln();
// Build TOC using method similar to FileUtil.buildAsposeTOC
builder.insertTableOfContents("\\o \"1-3\" \\h \\u");
// Add page break after TOC
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln();
// Add sample headings that will appear in the TOC
addSampleHeadings(builder);
doc.updatePageLayout();
doc.updateFields();
// Save the document as DOCX
String outputPath = "C:\\Temp\\fo\\out.docx";
doc.save(outputPath);
// Save the document as PDF
String pdfOutputPath = "C:\\Temp\\fo\\out.pdf";
PdfSaveOptions pdfOptions = new PdfSaveOptions();
pdfOptions.setExportDocumentStructure(true);
doc.save(pdfOutputPath, pdfOptions);
@pvijji Unfortunately, there is no option to avoid these spaces after TOC items. The Aspose.Words structure export matches the MS Word structure export, so this output is considered correct and expected.
Thank you.
Could you please help us clean the Word document so that empty tags do not appear in the generated PDF?
For example, paragraph markers in the Word document generated with Aspose.Words seem to cause empty tags in the PDF output. When I remove these markers manually in Word, the empty tags disappear.
Please let me know if there is a way to automate this cleanup process using Aspose.Words.
I’ve attached an image for reference:
@pvijji
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-28865
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.