PDF/A-1a Fails Accessibility Check | DOCX to PDF Conversion using .NET

Hello
Using Aspose.Words 20.6, I create PDF/A-1a documents. I test the resulting PDFs for accessibility with the Acrobat Preflight tool and PAC3 (The accessibility of PDF documents and the misconceptions about PAC - Stiftung "Zugang für alle") and get some errors:

  • Both detect a figure element without alternative text
    ** I guess the element should not be a figure but an Artifact instead.
  • Preflight detects problems with headlines and regularities
  • PAC3 detects irregular table rows.
  • PAC3 detects figure elements without a bounding box
  • PAC3 detects that the base element is a path element
    ** I guess the base element should be a document element instead.

I attach the input docx, the resulting pdf and screenshots of the preflight and pac3 checks:
aspose.zip (173.1 KB)

@dvtdaten

You are using Aspose.PDF to modify/save the PDF document. Please use the latest version of Aspose.Words for .NET 20.9 to save the document to PDF and let us know how it goes on your side.

If you still face the problem, please ZIP and attach the problematic output PDF along with screenshots showing the issue details. We will investigate this issue and provide you with more information on it.

Yes, for the first test I modified the resulting PDF to fix this accessibility checker issue: Adobe pdf accessibility checker issue - document title is showing in title bar failed

I did another test, for which I used only Aspose.Words 20.9 for Java:

  • Both detect that the document title is missing from the title bar
    ** this can be fulfilled with Aspose.PDF - or is it possible with Aspose.Words too?
  • Both detect a figure element without alternative text
    ** I guess the element should not be a figure but an Artifact instead.
  • Preflight detects problems with headlines and regularities
  • PAC3 detects irregular table rows.
  • PAC3 detects figure elements without a bounding box
  • PAC3 detects that the base element is a part element
    ** I guess the base element should be a document element instead?!

aspose2.zip (143.9 KB)

@dvtdaten

We have logged this problem in our issue tracking system as WORDSNET-21116 . You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

Thank you, this might help me in the future.

By the way, I have a few questions that you may be able to answer:

  • Is it possible to set the displayDocTitle via Aspose.Words as it can be done via Aspose.Pdf (Aspose.Pdf.Document.setDisplayDocTitle)?
  • The base element is currently a Part element: how can I replace it with a Document element?

@dvtdaten

Please use PdfSaveOptions.DisplayDocTitle property as shown below to set the title of PDF document.

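// Requires: using Aspose.Words; using Aspose.Words.Saving;
// MyDir and ArtifactsDir are placeholder folder paths for the input and output files.
Document doc = new Document(MyDir + "Rendering.docx");
// The title shown in the viewer's title bar is taken from the document's Title property.
doc.BuiltInDocumentProperties.Title = "Windows bar pdf title";

// DisplayDocTitle = true asks the PDF viewer to show the title instead of the file name.
PdfSaveOptions pdfSaveOptions = new PdfSaveOptions { DisplayDocTitle = true };

doc.Save(ArtifactsDir + "PdfSaveOptions.WindowsBarPdfTitle.pdf", pdfSaveOptions);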
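Since this thread is about PDF/A-1a output, the title setting can be combined with the compliance level in the same save options. The following is only a minimal sketch, assuming Aspose.Words for .NET in the versions discussed here; the file names and the BuildOptions helper are hypothetical and not part of the library.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

public static class SaveAccessiblePdf
{
    // Hypothetical helper: collects the options relevant to this thread in one place.
    public static PdfSaveOptions BuildOptions()
    {
        return new PdfSaveOptions
        {
            // PDF/A-1a is the tagged conformance level of PDF/A-1.
            Compliance = PdfCompliance.PdfA1a,
            // Ask the viewer to show the document title instead of the file name.
            DisplayDocTitle = true
        };
    }

    public static void Main()
    {
        // "input.docx" and "output.pdf" are placeholder file names.
        Document doc = new Document("input.docx");

        // DisplayDocTitle only helps if the document actually has a title.
        doc.BuiltInDocumentProperties.Title = "Windows bar pdf title";

        doc.Save("output.pdf", BuildOptions());
    }
}
```

The title itself still has to be set on the document; DisplayDocTitle only controls whether the viewer displays it.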

Regarding your second question about the base element, could you please share some more detail along with the input and expected output documents? We will then provide you with more information on it.

Thank you for this hint!

If you look at the tags in the resulting PDF file in aspose2.zip, it shows the following structure:
<Tags><Part>...

PAC3 reports a warning for this case: “structural element ‘Part’ used as basic element”, so I guess the valid structure should be:
<Tags><Document><Part>...

I know, this is an Aspose.PDF specific question, but is it possible to correct the document structure to <Tags><Document><Part> via Aspose?

@dvtdaten

We have tested the scenario and noticed that the root element “Document” is missing from the document structure of the PDF. We have logged this problem in our issue tracking system as WORDSNET-21132. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

@dvtdaten

Could you please share why you think that the problematic figure should be an artifact? We do not see any specific properties of the shape by which Aspose.Words could detect it as an artifact rather than as a meaningful graphic.

We have logged a separate issue for it as WORDSNET-21379.

This issue is related to another issue WORDSNET-21132 and has been resolved. Its fix will be available in the December 2020 release.

Please note that using tables to organize the document layout is considered bad practice for accessible documents. MS Word provides enough functionality to organize a layout without tables. Moreover, laying out your document without tables also works around the problem with table regularities.
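To find the tables that might trigger the "irregular table rows" finding before removing them, the document can be scanned for tables whose rows differ in cell count. This is only a rough sketch: it assumes Aspose.Words for .NET, uses a placeholder file name and a hypothetical IsRegular helper, and compares raw cell counts only, so it ignores merged cells and may not match exactly what PAC3 checks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Tables;

public static class TableRegularityCheck
{
    // Hypothetical helper: true when all rows have the same number of cells
    // (an empty sequence also counts as regular).
    public static bool IsRegular(IEnumerable<int> cellCounts)
    {
        return cellCounts.Distinct().Count() <= 1;
    }

    public static void Main()
    {
        // "input.docx" is a placeholder file name.
        Document doc = new Document("input.docx");

        int tableIndex = 0;
        // isDeep = true also visits nested tables and tables in headers/footers.
        foreach (Table table in doc.GetChildNodes(NodeType.Table, true).OfType<Table>())
        {
            List<int> counts = table.Rows.OfType<Row>()
                .Select(row => row.Cells.Count)
                .ToList();

            if (!IsRegular(counts))
            {
                Console.WriteLine(
                    "Table " + tableIndex + " has rows with differing cell counts: "
                    + string.Join(", ", counts));
            }
            tableIndex++;
        }
    }
}
```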


Hello!
Here is an updated input.docx (without tables) and output.pdf: accessibilityCheck.zip (159.5 KB)

So, we can forget about the problem with table regularities.

Could you please share why you think that the problematic figure should be an artifact? We do not see any specific properties of the shape by which Aspose.Words could detect it as an artifact and not as a meaningful graphics.

In the document, there is the text “DVT - Daten-Verarbeitung-Tirol GmbH”, for which this situation applies, I guess.

This issue is related to another issue WORDSNET-21132 and has been resolved. Its fix will be available in the December 2020 release.

Very good :)

@dvtdaten

Thanks for sharing the detail. We have logged this detail in our issue tracking system. You will be notified via this forum thread once these issues are resolved.

@dvtdaten

PAC3 reports that the line shape in the page footer is not tagged. We have created a new ticket for this as WORDSNET-21406.

Unfortunately, we still do not quite understand why you think that it should be an artifact. The problematic shape is just a regular textbox in the main document body, and it is exported with a Figure tag. Perhaps you consider it part of the page header. In that case it should be placed into the header part of the document, so that Aspose.Words can recognize it as such. Note, however, that there is currently a bug in Aspose.Words that causes shapes inside headers and footers to appear untagged in the PDF output.

We also see a difference between how Aspose.Words and MS Word export this textbox: Aspose.Words exports it as a Figure tag, while MS Word exports it as a Sect tag. The MS Word export seems to be more correct. If you are interested in a fix for this difference, please let us know and we will create a separate ticket for it.
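As long as a shape is exported with a Figure tag, the checkers will expect alternative text for it. One way to locate the affected shapes in the input document is to list every shape whose alternative text is empty. The sketch below assumes Aspose.Words for .NET; the file name and the FindShapesWithoutAltText helper are hypothetical, and whether filling in the text satisfies PAC3 and Preflight for a given shape is an assumption to verify against your own output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Drawing;

public static class AltTextReport
{
    // Hypothetical helper: returns all shapes (body, headers, footers)
    // that have no alternative text.
    public static List<Shape> FindShapesWithoutAltText(Document doc)
    {
        return doc.GetChildNodes(NodeType.Shape, true)
            .OfType<Shape>()
            .Where(shape => string.IsNullOrWhiteSpace(shape.AlternativeText))
            .ToList();
    }

    public static void Main()
    {
        // "input.docx" is a placeholder file name.
        Document doc = new Document("input.docx");

        foreach (Shape shape in FindShapesWithoutAltText(doc))
        {
            // A meaningful description has to be written per shape, either
            // in MS Word or by assigning shape.AlternativeText before saving.
            Console.WriteLine(
                "Shape '" + shape.Name + "' (" + shape.ShapeType + ") has no alternative text.");
        }
    }
}
```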

Hello!

As you state in the last paragraph, MS Word seems to export this textbox in a more correct way, leading to better accessibility check results in PAC3 and Acrobat Preflight. So if you ask me, I am clearly interested in a fix for this difference.

@dvtdaten

Thanks for your feedback. We have logged this issue as WORDSNET-21521 in our issue tracking system.

The issues you have found earlier have been fixed in this Aspose.Words for .NET 20.12 update and this Aspose.Words for Java 20.12 update.

The issues you have found earlier (filed as WORDSNET-21116) have been fixed in this Aspose.Words for .NET 21.1 update and this Aspose.Words for Java 21.1 update.

The issues you have found earlier (filed as WORDSNET-21406) have been fixed in this Aspose.Words for .NET 21.2 update and this Aspose.Words for Java 21.2 update.

The issues you have found earlier (filed as WORDSNET-21521) have been fixed in this Aspose.Words for .NET 21.5 update and this Aspose.Words for Java 21.5 update.

The issues you have found earlier (filed as WORDSNET-21379) have been fixed in this Aspose.Words for .NET 21.8 update and this Aspose.Words for Java 21.8 update.