PDF/A-1a fails accessibility check after DOCX to PDF conversion

dvtdaten · April 27, 2022, 1:03pm

Hello

I am using Aspose.Words 22.3 to create PDF/A-1a documents, and I test the resulting PDFs for accessibility with PAC3 (The accessibility of PDF documents and the misconceptions about PAC - Stiftung "Zugang für alle"). The check reports figure elements without alternative text, but I think these elements should not be figures at all; they should be artifacts instead.
This is similar to the problem that was solved with WORDSNET-21521 last year.

Attached are input and output files and a screenshot of a PAC3 check: 4055.zip (245.5 KB)

Kind regards!

alexey.noskov · April 27, 2022, 2:06pm

@dvtdaten The mentioned issue was resolved in version 21.5 of Aspose.Words.
In your case, the problem occurs because the images in your document do not actually have alternative text. You can set it programmatically, for example with the following code:

Document doc = new Document("C:\\Temp\\in1.docx");

Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE,true);
for(Shape s : shapes)
{
    // isEmpty() compares content; == on strings compares object references in Java
    if(s.getAlternativeText().isEmpty())
        s.setAlternativeText("This is an image");
}

PdfSaveOptions opt = new PdfSaveOptions();
opt.setCompliance(PdfCompliance.PDF_UA_1);
doc.save("C:\\Temp\\out.pdf", opt);

Here is the result of PDF check:

dvtdaten · April 29, 2022, 8:06am

Hello Alexey!

Yes, that workaround fixes the accessibility check, but the problem is that some of those shapes are not images; they are text boxes. That is why I referenced WORDSNET-21521.

alexey.noskov · April 29, 2022, 8:19am

@dvtdaten Thank you for the additional information. As I already mentioned, the referenced issue was fixed a few releases ago. In your case, you can filter for shapes that contain images using a condition like this:

for(Shape s : shapes)
{
    // isEmpty() compares content; == on strings compares object references in Java
    if(s.hasImage() && s.getAlternativeText().isEmpty())
        s.setAlternativeText("This is an image");
}
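
The filtering logic can be tried out without the library. The following is a minimal sketch, not Aspose.Words code: `FakeShape` and `fillMissingAltText` are hypothetical stand-ins that model only the three accessors used in this thread (`hasImage`, `getAlternativeText`, `setAlternativeText`). It also treats a `null` or whitespace-only alternative text as missing, which is an assumption about what a document might contain. In Java, an empty-string test is safer with `isEmpty()` than with `==`, because `==` compares object references rather than content.

```java
import java.util.ArrayList;
import java.util.List;

class AltTextSketch {

    // Hypothetical stand-in for a shape; NOT the Aspose.Words Shape class.
    static final class FakeShape {
        private final boolean image;
        private String alternativeText;

        FakeShape(boolean image, String alternativeText) {
            this.image = image;
            this.alternativeText = alternativeText;
        }

        boolean hasImage() { return image; }
        String getAlternativeText() { return alternativeText; }
        void setAlternativeText(String text) { this.alternativeText = text; }
    }

    // Sets the placeholder on image shapes whose alternative text is null or blank.
    // Returns the number of shapes that were changed.
    static int fillMissingAltText(List<FakeShape> shapes, String placeholder) {
        int changed = 0;
        for (FakeShape s : shapes) {
            String alt = s.getAlternativeText();
            boolean missing = alt == null || alt.trim().isEmpty();
            if (s.hasImage() && missing) {
                s.setAlternativeText(placeholder);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<FakeShape> shapes = new ArrayList<>();
        shapes.add(new FakeShape(true, ""));             // image without alt text
        shapes.add(new FakeShape(true, "Logo"));         // image with alt text, kept as-is
        shapes.add(new FakeShape(false, ""));            // text box, left untouched
        shapes.add(new FakeShape(true, new String(""))); // distinct empty String object

        System.out.println("Shapes updated: "
                + fillMissingAltText(shapes, "This is an image"));
    }
}
```

The text box entry is skipped by the `hasImage()` condition, so it keeps its empty alternative text. Whether such a shape ends up as a figure or an artifact in the PDF is decided by the exporter, not by this loop.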

dvtdaten · April 29, 2022, 9:15am

I referenced WORDSNET-21521 because I had the same problem there. This post (PDF/A-1a Fails Accessibility Check | DOCX to PDF Conversion using .NET - #12 by tahir.manzoor) and my response to it apply to the current case too, I think: the element with the text “DVT - Daten-Verarbeitung-Tirol GmbH”, for example, should not be a figure but an artifact.

alexey.noskov · April 29, 2022, 1:24pm

@dvtdaten As far as I can see, MS Word also exports text boxes as Figure structure elements in PDF, so the Aspose.Words behavior seems to be correct here.