Creating an accessible tagged PDF from a Word document created with Aspose.Words

I am creating a Word document from scratch using Aspose.Words and populating it with text from a database. I add headings to the docx to give the document structure. Once the docx is created, it is later converted to PDF, also using Aspose.Words, with these save options:

PdfSaveOptions saveOptions = new PdfSaveOptions()
{
    Compliance = PdfCompliance.PdfA1a,
    DisplayDocTitle = true,
    ExportDocumentStructure = true,
    OutlineOptions = { DefaultBookmarksOutlineLevel = 1 }
};

The accessibility checker in Adobe Acrobat says that the document is not tagged, and I cannot figure out how to create a Word document that converts to an appropriately tagged PDF. Can someone point me in the right direction?

@protstein

To create an accessible tagged PDF from a Word document using Aspose.Words, you need to ensure that the document is structured correctly in Word before converting it to PDF. Here are the steps you can follow:

  1. Set Up Document Structure: When creating your Word document, make sure to use the appropriate heading styles (e.g., Heading 1, Heading 2) for your sections. This helps in maintaining a logical structure that can be recognized during the PDF conversion.

  2. Use PdfSaveOptions: When saving the document as a PDF, you should configure the PdfSaveOptions to ensure that the document is tagged correctly. Here’s an example of how to set this up:

    // In a verbatim string (@"...") backslashes are literal, so they are not doubled.
    Document doc = new Document(@"C:\Temp\in.docx");
    PdfSaveOptions options = new PdfSaveOptions
    {
        Compliance = PdfCompliance.PdfA1a, // PDF/A-1a: archival level that requires a tagged structure
        DisplayDocTitle = true, // Show the document title in the viewer's title bar
        ExportDocumentStructure = true, // Export the document structure (tags)
        OutlineOptions = { DefaultBookmarksOutlineLevel = 1 } // Outline level for bookmarks
    };
    doc.Save(@"C:\Temp\out.pdf", options);
    
  3. Check Accessibility: After converting the document, use an accessibility checker (like Adobe Acrobat) to verify that the PDF is tagged correctly. If the checker indicates that the document is not tagged, ensure that:

    • The headings in the Word document are properly applied.
    • The ExportDocumentStructure option is set to true.

  4. Consider Compliance Levels: Depending on your requirements, you might want to use a different compliance level. For example, PdfCompliance.PdfUa1 targets PDF/UA-1, the standard written specifically for accessible PDFs. You can set it like this:

    options.Compliance = PdfCompliance.PdfUa1; // For PDF/UA compliance
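
As a rough illustration of step 1, the sketch below builds a small document with a built-in heading style through DocumentBuilder. The class name, title, heading text, and body text are placeholders made up for this example, and the output path simply reuses the one from step 2. Treat it as a starting point under those assumptions, not as the only way to structure the document.

```csharp
using Aspose.Words;

// Hypothetical class name, used only for this example.
class BuildStructuredDocx
{
    static void Main()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Placeholder title; DisplayDocTitle = true shows it in the PDF viewer's title bar.
        doc.BuiltInDocumentProperties.Title = "Report title (placeholder)";

        // Apply a built-in heading style instead of manual bold or font-size formatting.
        builder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Heading1;
        builder.Writeln("Section heading (placeholder)");

        // Switch back to Normal for the body text loaded from the database.
        builder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Normal;
        builder.Writeln("Body text (placeholder).");

        doc.Save(@"C:\Temp\in.docx");
    }
}
```

Opening the resulting docx in Word and checking the Navigation pane is a quick way to confirm the headings were applied as styles before converting to PDF.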
    

By following these steps, you should be able to create a tagged PDF that meets accessibility standards. If you continue to face issues, consider reviewing the structure of your Word document and the settings used in PdfSaveOptions.

For more detailed examples and options, you can refer to the Aspose documentation on PdfSaveOptions.


@protstein Please try using PdfCompliance.PdfUa1 compliance, whose primary purpose is to represent electronic documents in the PDF format in a way that makes the file accessible.
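
A minimal sketch of that change, assuming the same placeholder paths used earlier in the thread. The class names are hypothetical, and the warning callback is optional: it only prints whatever warnings Aspose.Words reports while loading and saving, which may help show why an output falls short of the requested compliance.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical helper: prints any warnings raised during load or save.
class ConsoleWarnings : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        Console.WriteLine($"{info.WarningType}: {info.Description}");
    }
}

// Hypothetical class name, used only for this example.
class SaveAsPdfUa
{
    static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");
        doc.WarningCallback = new ConsoleWarnings();

        PdfSaveOptions options = new PdfSaveOptions
        {
            Compliance = PdfCompliance.PdfUa1, // Request PDF/UA-1 output
            DisplayDocTitle = true,            // Show the document title in the viewer's title bar
            ExportDocumentStructure = true     // Set explicitly so the intent is visible
        };
        doc.Save(@"C:\Temp\out.pdf", options);
    }
}
```

It is worth running the accessibility check on the file written by this Save call directly, before any other tool post-processes it.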

Thank you! This worked once I realized I was also merging the PDF with another PDF downstream in the code using an old version of iText.

@protstein Glad to hear you managed to determine the root cause of the problem. Please feel free to ask if you run into any further issues. We are always glad to help.