Free Support Forum - aspose.com

Word to Pdf accessibility issues

We are using Aspose to create accessible PDFs, and we want them to comply with PDF/A-1a and PDF/UA.
When validating against PDF/UA with Aspose, we get these 9 errors:

<Compliance Name="Log" Operation="Validation" Target="PDF/UA-1">
  <Version>1.0</Version>
  <Copyright>Copyright (c) 2001-2019 Aspose Pty Ltd. All Rights Reserved.</Copyright>
  <Date>2020-06-25 14:40:21</Date>
  <File Version="1,4" Name="" Pages="3">
    <Security />
    <General>
      <Problem Severity="Error" Clause="7.1" ObjectID="" Page="2" Convertable="False" Code="7.1:1.1(14.8)">Path object not tagged</Problem>
      <Problem Severity="Error" Clause="7.1" ObjectID="" Page="2" Convertable="False" Code="7.1:1.1(14.8)">Text object not tagged</Problem>
      <Problem Severity="Error" Clause="7.1" ObjectID="" Page="3" Convertable="False" Code="7.1:1.1(14.8)">Path object not tagged</Problem>
      <Problem Severity="Error" Clause="7.1" ObjectID="" Page="3" Convertable="False" Code="7.1:1.1(14.8)">Text object not tagged</Problem>
      <Problem Severity="Warning" Clause="7.1" ObjectID="" Page="" Convertable="True" Code="7.1:2.3">'Part' structure element used as root element</Problem>
      <Problem Severity="Need manual check" Clause="7.1" ObjectID="" Page="" Convertable="False" Code="7.1:5">Color contrast</Problem>
    </General>
    <Text>
      <Problem Severity="Need manual check" Clause="7.2" ObjectID="" Page="" Convertable="False" Code="7.2:1">Logical Reading Order</Problem>
    </Text>
    <Fonts>
      <Problem Severity="Error" Clause="7.21.4.2" ObjectID="11" Page="1" Convertable="False" Code="7.21.4.2">CIDSet is missing or incomplete for font '11'</Problem>
    </Fonts>
    <Graphics />
    <Headings />
    <Tables />
    <Lists />
    <NotesAndReferences />
    <OptionalContent />
    <EmbeddedFiles />
    <DigitalSignatures />
    <NonInteractiveForms />
    <XFA />
    <Navigation />
    <Annotations>
      <Problem Severity="Error" Clause="7.18.1" ObjectID="17" Page="1" Convertable="False" Code="7.18.1:2">Alternative description missing for an annotation</Problem>
      <Problem Severity="Error" Clause="7.18.1" ObjectID="18" Page="1" Convertable="False" Code="7.18.1:2">Alternative description missing for an annotation</Problem>
      <Problem Severity="Error" Clause="7.18.1" ObjectID="19" Page="1" Convertable="False" Code="7.18.1:2">Alternative description missing for an annotation</Problem>
    </Annotations>
    <Actions />
    <XObjects />
    <VersionIdentification>
      <Problem Severity="Error" Clause="5" ObjectID="" Page="" Convertable="True" Code="5:1">PDF/UA identifier missing</Problem>
    </VersionIdentification>
  </File>
</Compliance>

The path and text errors refer to the header objects (see 1.png and 2.png).

The “Alternative description” errors refer to the table of contents. In Word, you cannot set alternative descriptive text for this kind of content, since it is text rather than an image. See 3.png for an example.

After setting OutlineOptions, two different tools report the PDF as corrupted; see BookmarksGiveFormatError.png for one example. The problem disappears if all OutlineOptions lines are removed, and any single one of them is enough to reproduce it.
Without them, however, we get an accessibility error saying that there are no bookmarks.

This is the code that we use for this part:

using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

namespace Canea.Common.DocumentFormat.Standard.PdfGeneration.Aspose
{
    public class DocumentStreamToPdfConverter : IStreamToPdfConverter
    {
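        public MemoryStream ToPdf(Stream documentStream, string basePath)
        {
            var document = new Document(documentStream);
            var pdfStream = new MemoryStream();
            document.Save(pdfStream, GetMoreAccessibilityFriendlyOptions());
            // Save leaves the position at the end of the stream; rewind so callers read the PDF from the start.
            pdfStream.Position = 0;
            return pdfStream;
        }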

        private PdfSaveOptions GetMoreAccessibilityFriendlyOptions()
        {
            var options = new PdfSaveOptions
            {
                Compliance = PdfCompliance.PdfA1a,
                ExportDocumentStructure = true,
                FontEmbeddingMode = PdfFontEmbeddingMode.EmbedAll,
                EmbedFullFonts = false,
                UseCoreFonts = false,
                PreserveFormFields = false,
                HeaderFooterBookmarksExportMode = HeaderFooterBookmarksExportMode.All,
                DisplayDocTitle = true
            };
            return SetTableOfContentsForAccessabilityWithBookmarks(options);
        }

        private PdfSaveOptions SetTableOfContentsForAccessabilityWithBookmarks(PdfSaveOptions options)
        {
            options.OutlineOptions.ExpandedOutlineLevels = 2;
            options.OutlineOptions.HeadingsOutlineLevels = 2;
            options.OutlineOptions.DefaultBookmarksOutlineLevel = 2;
            options.OutlineOptions.CreateMissingOutlineLevels = true;
            return options;
        }
    }
}

We are using version 19.11.0. We get the same errors with the latest version (20.6.0).
files.zip (213.4 KB)

@lars.olsson

You are using an old version of Aspose.Words. Please try the latest version, Aspose.Words for .NET 20.6, and let us know how it goes on your side.
If you still face problem, please attach the following resources here for testing:

  • Please attach the output PDF file generated by Aspose.Words 20.6 that shows the undesired behavior.
  • Please attach the expected output PDF file that shows the desired behavior.
  • Please share the steps that you are using to reproduce this issue at our end.

As soon as you get these pieces of information ready, we will start investigation into your issue and provide you more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

As I said, we get exactly these errors with both 19.11 and 20.6.
The output file is resource.pdf and the input file is SimpleDoc.docx, both in the zip file. The zip contains only one Word file and one PDF, so there should be no confusion.
The expected output is a PDF that passes your own PDF/UA validation.

This is how we verify now:

using (var document = new Aspose.Pdf.Document(pdf))
{
    // errorsHere receives the XML validation log (a file path or a stream).
    bool isValid = document.Validate(errorsHere, Aspose.Pdf.PdfFormat.PDF_UA_1);
}
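The log that Document.Validate writes is plain XML, so the hard errors can be separated from the warnings and manual checks with LINQ to XML. The following is a minimal sketch, assuming the log layout shown in the first post (Problem elements nested under category elements such as General, Fonts and Annotations). LogProblem and ComplianceLog are hypothetical helper names, not part of Aspose.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// One <Problem> entry from an Aspose.PDF compliance log (hypothetical helper type).
public sealed class LogProblem
{
    public string Severity { get; set; } = "";
    public string Clause { get; set; } = "";
    public string Code { get; set; } = "";
    public string Page { get; set; } = "";
    public string ObjectId { get; set; } = "";
    public string Message { get; set; } = "";
}

public static class ComplianceLog
{
    // Missing attributes become empty strings instead of null.
    private static string Attr(XElement element, string name) =>
        (string)element.Attribute(name) ?? "";

    // Collects every <Problem>, whatever category element it is nested in
    // (<General>, <Text>, <Fonts>, <Annotations>, <VersionIdentification>, ...).
    public static List<LogProblem> Parse(string xml) =>
        XDocument.Parse(xml)
            .Descendants("Problem")
            .Select(p => new LogProblem
            {
                Severity = Attr(p, "Severity"),
                Clause = Attr(p, "Clause"),
                Code = Attr(p, "Code"),
                Page = Attr(p, "Page"),
                ObjectId = Attr(p, "ObjectID"),
                Message = p.Value.Trim()
            })
            .ToList();

    // Keeps only Severity="Error", dropping "Warning" and "Need manual check" entries.
    public static List<LogProblem> Errors(IEnumerable<LogProblem> problems) =>
        problems.Where(p => p.Severity == "Error").ToList();

    // Number of problems per validation code, for grouping repeated errors.
    public static Dictionary<string, int> CountByCode(IEnumerable<LogProblem> problems) =>
        problems.GroupBy(p => p.Code).ToDictionary(g => g.Key, g => g.Count());
}
```

For the log in the first post, Errors would return the nine entries with Severity="Error", and CountByCode would group four of them under 7.1:1.1(14.8) and three under 7.18.1:2. To read the log file directly, XDocument.Load(path) can replace XDocument.Parse.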

@lars.olsson

Thanks for sharing the details. You are saving the Word document to PDF with PdfA1a compliance but validating it against PdfFormat.PDF_UA_1, so Document.Validate returns false.

Please note that Aspose.Words does not support PDF/UA as a PdfCompliance option. We have already logged this feature request as WORDSNET-6614 in our issue tracking system, and you will be notified via this forum thread once it is available. We apologize for the inconvenience.

As a workaround, please use Aspose.PDF to convert the PDF to PDF/UA, as shown below. We hope this helps.

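// MyDir is the folder that holds the test files (defined elsewhere).
Aspose.Words.Document doc = new Aspose.Words.Document(MyDir + "SimpleDoc1.docx");

string pdfFile = MyDir + "resource.pdf";

// Step 1: Aspose.Words renders the DOCX to a regular PDF.
doc.Save(pdfFile, new Aspose.Words.Saving.PdfSaveOptions
{
    SaveFormat = Aspose.Words.SaveFormat.Pdf
});

// Step 2: Aspose.PDF converts it to PDF/UA-1, deleting objects it cannot make compliant.
var pdfDocument = new Aspose.Pdf.Document(pdfFile);
pdfDocument.Convert(MyDir + "inputlog.xml", Aspose.Pdf.PdfFormat.PDF_UA_1, Aspose.Pdf.ConvertErrorAction.Delete);
pdfDocument.Save(MyDir + "outputpdfa.pdf");

// Step 3: reload the result and validate it, writing the validation log to a separate file
// so the conversion log is kept.
pdfDocument = new Aspose.Pdf.Document(MyDir + "outputpdfa.pdf");
var isValid = pdfDocument.Validate(MyDir + "validationlog.xml", Aspose.Pdf.PdfFormat.PDF_UA_1);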

I understand that you don't fully support PDF/UA, but I am reporting bugs in the PDF conversion.
If I use your suggested code, I still get these bugs:

  1. “Path/text object not tagged”: is this because the text is generated as an object instead of a string when converting to PDF (1.png and 2.png)? Or is it simply missing from the logical structure data? Do you know why?

  2. “Alternative description missing for an annotation”: is this because the string is converted to an image instead of text (3.png)? To my understanding, this should not happen with PDF/A-1a when all fonts are embedded.
    Hmm… the XML shows links here, so it is text inside a TOC tag. Could the alternative text for all TOC tags simply be “Table of contents”? Or do you use these tags for other types of data as well? It seems buggy that it is not set. Can I set it on my side without strange side effects? And why is alternative text needed for a TOC tag at all; am I missing something?
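Regarding whether the alternative text can be set on your side: PDF/UA-1 (ISO 14289-1) expects link annotations to provide an alternate description through their Contents entry, and that appears to be what the 7.18.1:2 messages refer to. The following is a minimal post-processing sketch with Aspose.PDF, assuming a single fixed description (such as the “Table of contents” text suggested above) is acceptable for every link that lacks one. LinkAltText and FillMissingLinkContents are hypothetical names, and it has not been verified that this clears the errors in Aspose's validator or in the external tools.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

// Hypothetical helper: gives every link annotation without a Contents entry
// the same alternate description. Assumes a fixed text is acceptable.
public static class LinkAltText
{
    public static int FillMissingLinkContents(Document pdf, string altText)
    {
        int changed = 0;
        foreach (Page page in pdf.Pages)
        {
            foreach (Annotation annotation in page.Annotations)
            {
                // Only links (such as TOC entries) that have no description yet.
                if (annotation is LinkAnnotation && string.IsNullOrEmpty(annotation.Contents))
                {
                    annotation.Contents = altText;
                    changed++;
                }
            }
        }
        return changed; // number of annotations that received alt text
    }
}
```

A typical call would load the converted file with new Aspose.Pdf.Document(path), run FillMissingLinkContents, and then save the document. Using the visible text of each TOC entry instead of a fixed string would be more useful to screen-reader users, but extracting that text is not shown here.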

If I add OutlineOptions to your example, I get:

  3. Corrupted PDF (BookmarksGiveFormatError.png)

These errors don't depend on the format (PDF/UA vs PDF/A-1a), and they are the three bugs this report focuses on. The other errors in the Aspose validation log are extra information that could be helpful if you have any insight into them (for example, why a font is flagged in only one place), but they are less important. The format itself is not what interests us, and neither is the Boolean result of Validate in Aspose; we use it only to get the error messages from your code and to show that they are the same problems the external programs report.

Real documents produce many #1 and #2 errors, and after conversion I cannot tell which element in the PDF corresponds to which element in the Word file, so I cannot fix them programmatically afterwards. #3 prevents some third-party software from opening the PDFs, which is a problem for us because our customers will run into it. (I still need bookmarks, though; can they be created another way?)

So, to repeat: bugs #1 and #2 are still present with your code example; #3 does not appear because your example does not use OutlineOptions.

@lars.olsson

We are investigating this issue and will get back to you soon.

@lars.olsson

Regarding Aspose.Words, we tested the scenario with the following code example, and Document.Validate returns true when validating against PDF/A-1a.

var options = new PdfSaveOptions()
{
    Compliance = PdfCompliance.PdfA1a,
    ExportDocumentStructure = true,
    FontEmbeddingMode = PdfFontEmbeddingMode.EmbedAll,
    EmbedFullFonts = false,

    UseCoreFonts = false,
    PreserveFormFields = false,
    HeaderFooterBookmarksExportMode = HeaderFooterBookmarksExportMode.All,

    DisplayDocTitle = true
};

options.OutlineOptions.ExpandedOutlineLevels = 2;
options.OutlineOptions.HeadingsOutlineLevels = 2;
options.OutlineOptions.DefaultBookmarksOutlineLevel = 2;
options.OutlineOptions.CreateMissingOutlineLevels = true;

Aspose.Words.Document doc = new Aspose.Words.Document(MyDir + "SimpleDoc1.docx");
doc.Save(MyDir + "aw.output.pdf", options);

var pdfDocument = new Aspose.Pdf.Document(MyDir + "aw.output.pdf");
var isValid = pdfDocument.Validate(MyDir + "inputlog.xml", Aspose.Pdf.PdfFormat.PDF_A_1A);

Your issue is more related to Aspose.PDF. We are investigating it and will share our findings with you soon.

@lars.olsson

We tested the scenario in our environment and observed these issues in the PDF/UA output generated by Aspose.PDF for .NET 20.6. We have logged them as PDFNET-48513 in our issue tracking system. We will look into the details and keep you informed about the fix. Please be patient and give us some time.

We are sorry for the inconvenience.