PDF/A-1a fails accessibility check after DOCX to PDF conversion


I am using Aspose.Words 22.3. I create PDF/A-1a documents and test the resulting PDFs for accessibility with PAC3 (see "Die Barrierefreiheit von PDF-Dokumenten und die Missverständnisse zu PAC" [PDF accessibility and misconceptions about PAC], Stiftung "Zugang für alle"). However, both checks report figure elements without alternative text, and I believe these elements should be tagged as artifacts rather than figures.
This is similar to the problem that was solved with WORDSNET-21521 last year.

Attached are the input and output files and a screenshot of the PAC3 check: 4055.zip (245.5 KB)

Kind regards!

@dvtdaten The mentioned issue was resolved in version 21.5 of Aspose.Words.
In your case, the problem occurs because the images in your document really do not have alternative text. You can set it programmatically, for example:

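```java
import com.aspose.words.*;

Document doc = new Document("C:\\Temp\\in1.docx");

// Give every shape without alternative text a placeholder description.
Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape s : shapes) {
    String alt = s.getAlternativeText();
    // Compare contents, not references: `alt == ""` is unreliable in Java.
    if (alt == null || alt.isEmpty())
        s.setAlternativeText("This is an image");
}

PdfSaveOptions opt = new PdfSaveOptions();
doc.save("C:\\Temp\\out.pdf", opt);
```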
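Comparing strings with `==` in Java checks reference identity rather than content, so a test like `getAlternativeText() == ""` is not a dependable way to detect missing alternative text. The small, self-contained sketch below (it does not use Aspose.Words; the `isMissing` helper is a hypothetical name for illustration) shows the difference and a null-safe check:

```java
public class AltTextCheck {
    // Hypothetical helper: true when alternative text is absent or empty.
    // It inspects the string's contents, never its reference.
    static boolean isMissing(String alt) {
        return alt == null || alt.isEmpty();
    }

    public static void main(String[] args) {
        // new String("") creates a distinct object with the same (empty) contents,
        // much like a string read from a document at runtime.
        String fromDocument = new String("");
        System.out.println(fromDocument == "");        // false: different references
        System.out.println(fromDocument.isEmpty());    // true: contents compared
        System.out.println(isMissing(fromDocument));   // true
        System.out.println(isMissing(null));           // true
        System.out.println(isMissing("Company logo")); // false
    }
}
```

`isEmpty()` (or `"".equals(alt)`) compares contents, which is what the alternative-text check needs.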

Here is the result of the PDF check:

Hello Alexey!

Yes, that workaround fixes the accessibility check, but the problem is that some of those shapes are not images; they are textboxes. That is why I referenced WORDSNET-21521.

@dvtdaten Thank you for the additional information. As I already mentioned, the referenced issue was fixed a few releases ago. In your case, you can restrict the fix to shapes that contain images with a condition like this:

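```java
// Only shapes that actually contain an image get placeholder alt text;
// textboxes are skipped.
for (Shape s : shapes)
    if (s.hasImage() && (s.getAlternativeText() == null || s.getAlternativeText().isEmpty()))
        s.setAlternativeText("This is an image");
```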
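The selection logic of the image-only condition can be sketched without Aspose.Words. In this minimal sketch, `ShapeInfo` is a hypothetical stand-in for the library's `Shape` that models only "has an image" and "alternative text"; it shows that image shapes missing alt text get the placeholder while textboxes are left alone:

```java
import java.util.ArrayList;
import java.util.List;

public class ImageAltTextFilter {
    // Hypothetical stand-in for an Aspose.Words Shape; only the two
    // properties the filter needs are modelled here.
    static class ShapeInfo {
        final String name;
        final boolean hasImage;
        String altText;

        ShapeInfo(String name, boolean hasImage, String altText) {
            this.name = name;
            this.hasImage = hasImage;
            this.altText = altText;
        }
    }

    // Sets placeholder alt text only on image shapes that lack it;
    // non-image shapes such as textboxes are not modified.
    // Returns the names of the shapes that were changed.
    static List<String> fillMissingImageAltText(List<ShapeInfo> shapes, String placeholder) {
        List<String> changed = new ArrayList<>();
        for (ShapeInfo s : shapes) {
            boolean missing = s.altText == null || s.altText.isEmpty();
            if (s.hasImage && missing) {
                s.altText = placeholder;
                changed.add(s.name);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<ShapeInfo> shapes = List.of(
                new ShapeInfo("logo", true, ""),
                new ShapeInfo("textbox", false, ""),
                new ShapeInfo("photo", true, "Office building"));
        System.out.println(fillMissingImageAltText(shapes, "This is an image")); // [logo]
    }
}
```

Note that this only avoids labelling textboxes as images; it does not change how textboxes are tagged in the PDF structure.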

I referenced WORDSNET-21521 because I had the same problem there. That post (PDF/A-1a Fails Accessibility Check | DOCX to PDF Conversion using .NET - #12 by tahir.manzoor) and my response to it apply to the current case too, I believe: the element with the text “DVT - Daten-Verarbeitung-Tirol GmbH”, for example, should not be a figure but an artifact.

@dvtdaten As far as I can see, MS Word also exports textboxes as Figure structure elements in PDF, so Aspose.Words' behavior seems to be correct here.