Support Converting Word documents to PdfCompliance PDF/UA Format by using Aspose.Words

Any news regarding this issue?

We also need to generate PDF/UA documents from Microsoft Word documents, for the same reason: the new law requires it for all European governments when they publish new PDF documents online.

The law is in effect now, so this is urgent for our customers, and we really hope Aspose.Words could be the solution.

@gertsen,

We have moved your post into a new thread. Your thread has been linked to the appropriate issue (WORDSNET-6614), and you will be notified via this thread as soon as this feature is supported. I am afraid there are no estimates available at the moment. Sorry for the inconvenience.

A possible workaround to convert Word documents to PDF/UA in the meantime is as follows:

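  1. Save the Word document to plain PDF by using the following Aspose.Words code:
    // Aspose.Words: render the DOCX as a plain PDF
    com.aspose.words.Document doc = new com.aspose.words.Document("input.docx");
    doc.save("awjava-18.10.pdf");
  2. Use Aspose.PDF to convert the plain PDF generated by Aspose.Words to PDF/UA, for example:
    // Aspose.PDF: a different Document class, so the package name is spelled out
    com.aspose.pdf.Document pdfDoc = new com.aspose.pdf.Document("awjava-18.10.pdf");
    pdfDoc.convert("file.log", PdfFormat.PDF_UA_1, ConvertErrorAction.Delete);
    pdfDoc.save("final.pdf");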

I tried that, but the text is not tagged, and thus the document cannot be saved as PDF_UA_1.

Conversion log entry:

Text object not tagged

Report.png (62.4 KB)

As shown above, all text fails as “Text object not tagged”, one error for each letter.

Is there a fix for this?

The same is true for images.
Report.png (100.1 KB)

(The above screenshots are from the detailed report from PAC3, PDF Accessibility Checker 3)

@gertsen

Thanks for writing back.

Currently, PDF/UA support is under development for Aspose.PDF. We are working on implementing more features for image and text tagging inside PDF. We hope they will soon be available in the API.

In the meantime, would you please share your sample PDF document along with the complete requirements for making it tagged? This would really help us implement the functionality accordingly.

In an ideal world, I would like to take any Word document that has been marked up as well as Word allows (alternate text on images and links, a defined title, tables with defined header rows, etc.), load it into Aspose.Words, and save it as a PDF/UA file.

Alternatively, it would be nice to be able to open the Word document in Aspose.Words, save it as a PDF, then have Aspose.PDF open it and save it as PDF/UA.

The attached file is (according to PAC3) completely PDF/UA compliant, except that it is not tagged as PDF/UA.
Yet when I use Aspose.PDF to save it as PDF/UA, I get the following errors in the log:

<Graphics>
	<Problem Severity="Error" Clause="7.3" ObjectID="76" Page="1" Convertable="False" Code="7.3:1(14.8.4.5)">'Figure' element on a single page with no bounding box</Problem>
	<Problem Severity="Error" Clause="7.3" ObjectID="177" Page="2" Convertable="False" Code="7.3:1(14.8.4.5)">'Figure' element on a single page with no bounding box</Problem>
	<Problem Severity="Error" Clause="7.3" ObjectID="124" Page="2" Convertable="False" Code="7.3:1(14.8.4.5)">'Figure' element on a single page with no bounding box</Problem>
</Graphics>
<Headings />
<Tables>
	<Problem Severity="Error" Clause="7.5" ObjectID="" Page="" Convertable="False" Code="7.5:2">Table header cell has no associated subcells</Problem>
</Tables>

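For anyone triaging a similar conversion log, the entries can be grouped by PDF/UA clause with the standard Java XML parser. This is only a minimal sketch: the `<Log>` root element and the class name `ConversionLogSummary` are assumptions made for illustration (the excerpt above has several top-level elements, so it needs a single root before a parser will accept it), and the real log file written by Aspose.PDF may be structured differently.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ConversionLogSummary {

    /** Counts Problem elements per value of their Clause attribute. */
    public static Map<String, Integer> countByClause(String logXml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(logXml)));
            NodeList problems = dom.getElementsByTagName("Problem");
            Map<String, Integer> counts = new TreeMap<>();
            for (int i = 0; i < problems.getLength(); i++) {
                Element problem = (Element) problems.item(i);
                counts.merge(problem.getAttribute("Clause"), 1, Integer::sum);
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse conversion log", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical <Log> wrapper around a shortened copy of the excerpt.
        String log = "<Log>"
                + "<Graphics>"
                + "<Problem Severity=\"Error\" Clause=\"7.3\" ObjectID=\"76\" Page=\"1\">no bounding box</Problem>"
                + "<Problem Severity=\"Error\" Clause=\"7.3\" ObjectID=\"177\" Page=\"2\">no bounding box</Problem>"
                + "<Problem Severity=\"Error\" Clause=\"7.3\" ObjectID=\"124\" Page=\"2\">no bounding box</Problem>"
                + "</Graphics>"
                + "<Headings />"
                + "<Tables>"
                + "<Problem Severity=\"Error\" Clause=\"7.5\" ObjectID=\"\" Page=\"\">no associated subcells</Problem>"
                + "</Tables>"
                + "</Log>";
        System.out.println(countByClause(log)); // prints {7.3=3, 7.5=1}
    }
}
```

A summary like this makes it easier to see that the remaining failures fall into two groups: figures without a bounding box (clause 7.3) and a table header cell without associated subcells (clause 7.5).
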
PDF File:
https://www.dropbox.com/s/2o12k88n546rubf/test%20almost%20UA.pdf?dl=1
(I’m using Dropbox because I need to remove this file again after you have downloaded a copy. Please let me know when you have it!)

Log file:
test almost UA.pdf convertlog.zip (914 Bytes)

@gertsen

Thanks for providing more details.

We have logged your requirements under the ticket ID PDFNET-45593 in our issue tracking system. These details will really help us implement the feature in the API. We have linked the ticket ID with your post so that you receive a notification once the feature is available. Please be patient and spare us a little time.

We are sorry for the inconvenience.

PS: You may remove the file from Dropbox, as we have downloaded it.

The issues you have found earlier (filed as PDFNET-45593) have been fixed in Aspose.PDF for .NET 19.5.

The issues you have found earlier (filed as WORDSNET-6614) have been fixed in the Aspose.Words for .NET 22.1 update, which is also available on NuGet.