Converting Word to PDF/UA and validating the content

Hi,
I am converting Word documents to PDF/UA using Aspose.Words. I then validate the converted PDF against the PDF/UA standard using Aspose.PDF.
I am using .NET with Aspose.Words.dll version 22.1 and Aspose.Pdf.dll version 21.12.
Here is some sample code showing how the Word document is converted to PDF/UA and then validated:

```csharp
Aspose.Words.Document doc = new Aspose.Words.Document(wordFile);
Aspose.Words.Saving.PdfSaveOptions pdfOpts = new Aspose.Words.Saving.PdfSaveOptions();
pdfOpts.SaveFormat = Aspose.Words.SaveFormat.Pdf;
pdfOpts.Compliance = Aspose.Words.Saving.PdfCompliance.PdfUa1;
pdfOpts.ExportDocumentStructure = true;
pdfOpts.OutlineOptions.HeadingsOutlineLevels = 3;
pdfOpts.OutlineOptions.CreateMissingOutlineLevels = true;
doc.Save(pdfFile, pdfOpts);

Aspose.Pdf.Document document = new Aspose.Pdf.Document(pdfFile);
document.Validate(logfile, Aspose.Pdf.PdfFormat.PDF_UA_1);
```

From the validation results, I have a few questions about the actual nature of each issue and whether it may be a problem in either of the two products:

1. Inserted images always produce this warning:

`<Problem Severity="Warning" Clause="7.1" ObjectID="47" Page="1" Convertable="False" Code="7.1:2.4.1">Possibly inappropriate use of a 'Figure' structure element`

Is there some way to change this, either in the Word document, in Aspose.Words, or in the code that creates the PDF?

2. Tables always produce this warning:

`<Problem Severity="Warning" Clause="7.5" ObjectID="" Page="" Convertable="False" Code="7.5:3.2">Table summary missing`

This happens even for tables that have an Alt Text title and description. There seems to be no way to add a table summary in Word, so perhaps Aspose.Words could add the necessary tags using the Alt Text information, if the warning is indeed correct.

3. Table of contents: each entry gives this error:

`<Problem Severity="Error" Clause="7.18.1" ObjectID="20" Page="2" Convertable="False" Code="7.18.1:2">Alternative description missing for an annotation`

As these are headings that Word puts into the TOC automatically, it is not possible to give them an Alt Text description. Is this caused by an error in Aspose.Words or in Aspose.PDF? Or can it be corrected either by manually changing the Word document or by changing the code that creates the PDF?

As these issues could relate to either product, as well as to things that need to be changed in the actual Word document, I am hoping you can assist from this forum.

Regards
Thomas

@Lector Thank you for reporting these problems to us. I have logged them as WORDSNET-23290, WORDSNET-23291 and WORDSNET-23292. We will investigate the issues and provide you more information.

@Lector We have completed analyzing WORDSNET-23291 and concluded this is not a bug.

The Summary attribute is not required by the PDF/UA-1 specification. Here is the full quote of the table requirements from the spec:

7.5 Tables
Tables should include headers. Table headers shall be tagged according to ISO 32000-1:2008, Table 337 and Table 349.
NOTE 1 Tables can contain column headers, row headers or both.
NOTE 2 As much information as possible about the structure of tables needs to be available when assistive technology is relied upon. Headers play a key role in providing structural information.
Structure elements of type TH should have a Scope attribute. If the table's structure is not determinable via Headers and IDs, then structure elements of type TH shall have a Scope attribute.
Table tagging structures shall only be used to tag content presented within logical row and/or column relationships.

Also, here is a note about the Summary attribute from the "Tagged PDF Best Practices Guide":

5.4.2 Summary attribute
It is recommended that use of this attribute be restricted to cases where visual information about the table would not be characteristically available to assistive technology.
Where auxiliary information or guidance would be useful to any user it is recommended that such be provided in text, and not hidden in a Summary attribute which would only be available to those using certain AT.
Providing a Summary is not precluded for specific target audiences, but it is recommended that the practice be limited to such cases.

Also, Adobe Acrobat Accessibility Check, the Adobe Acrobat Preflight PDF/UA-1 compliance check and PAC 3 do not report errors about the table Summary attribute, and MS Word does not generate a table Summary attribute either.

So, based on all of the above, we believe the table Summary attribute should not be generated in the general case of converting an MS Word document to PDF. If you have a specific case where a table Summary attribute is required, it should be added by post-processing the Aspose.Words output.

The severity level of this problem in the Aspose.PDF report is "Warning", not "Error". Most likely Aspose.PDF should reconsider this and lower the severity level of the problem or remove it altogether.
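Because the Summary and Figure items are reported as warnings while the TOC annotation item is an error, it can help to triage the validation log by severity before deciding what to fix. The following is a minimal sketch using only `System.Xml.Linq`, assuming the log written by `Document.Validate(logfile, ...)` is XML whose `<Problem>` elements carry `Severity` and `Code` attributes, as in the excerpts quoted earlier in this thread. The root element is not assumed, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, a sketch only: assumes the validation log is XML
// containing <Problem> elements with Severity and Code attributes.
public static class ValidationLogSummary
{
    // Counts problems per Severity value (e.g. "Error", "Warning").
    public static Dictionary<string, int> CountBySeverity(XDocument log)
    {
        return Problems(log)
            .GroupBy(p => (string)p.Attribute("Severity") ?? "(none)")
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Returns the distinct Code values of problems with the given severity.
    public static List<string> CodesWithSeverity(XDocument log, string severity)
    {
        return Problems(log)
            .Where(p => (string)p.Attribute("Severity") == severity)
            .Select(p => (string)p.Attribute("Code") ?? "(none)")
            .Distinct()
            .ToList();
    }

    // Matches on the local name so a default XML namespace does not break the lookup.
    private static IEnumerable<XElement> Problems(XDocument log) =>
        log.Descendants().Where(e => e.Name.LocalName == "Problem");
}
```

For example, `ValidationLogSummary.CodesWithSeverity(XDocument.Load(logfile), "Error")` would list only the codes of the problems reported as errors, so warnings such as `7.5:3.2` can be reviewed separately.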

@Lector We have completed analyzing WORDSNET-23292. The PDF/UA specification requires alternative text on all hyperlinks, but the MS Word GUI does not let you set the alternative text (ScreenTip) for the hyperlinks that Word generates automatically in a TOC. Aspose.Words could update the TOC field and generate the links by itself. As a workaround, you can set a ScreenTip for the auto-generated hyperlinks in the document, for example with the following code:

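```csharp
using System.IO;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Fields;
using Aspose.Words.Saving;

Document doc = new Document(fileName);

// Hyperlinks generated by Word for TOC entries point to "_Toc..." bookmarks.
// ToList() materializes the selection before the fields are modified.
var tocHyperLinks = doc.Range.Fields
    .Where(f => f.Type == FieldType.FieldHyperlink)
    .Cast<FieldHyperlink>()
    .Where(f => f.HRef.StartsWith("#_Toc"))
    .ToList();

// Use the visible entry text as the ScreenTip (the link's alternative text).
foreach (FieldHyperlink link in tocHyperLinks)
    link.ScreenTip = link.DisplayResult;

PdfSaveOptions opt = new PdfSaveOptions()
{
    Compliance = PdfCompliance.PdfUa1,
    DisplayDocTitle = true,
    ExportDocumentStructure = true,
};
opt.OutlineOptions.HeadingsOutlineLevels = 3;
opt.OutlineOptions.CreateMissingOutlineLevels = true;

// Write e.g. "input_aw.pdf" next to the source document.
// (Path.ChangeExtension(fileName, "_aw.pdf") would yield "input._aw.pdf".)
var outFile = Path.Combine(
    Path.GetDirectoryName(fileName) ?? "",
    Path.GetFileNameWithoutExtension(fileName) + "_aw.pdf");
doc.Save(outFile, opt);
```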

@alexey.noskov, thanks for the descriptions and the code. This really helped a lot :smiley:

The issue you found earlier (filed as WORDSNET-23292) has been fixed in this Aspose.Words for .NET 22.3 update, which is also available on NuGet.