How to fix Title, Primary language, and Tagged PDF Meta Data of PDF output for Accessibility Check

When I create PDFs, they fail the Accessibility Check in Acrobat Pro DC for:

  • Tagged PDF
  • Primary Language
  • Title

What edits do I need to make in the .NET code so that these three checks pass?

@aclaudio

Would you please share your sample code snippet along with a screenshot of the error that you are seeing? We will test the scenario in our environment and address it accordingly.

Hi asad, I’m trying to fix the errors reported by Adobe’s accessibility check. So far I’ve been able to set the language and title by using the C# interface ITaggedContent. The only issue I’m still facing is tagged content: alternate text for logos and HTML elements. If I update the HTML, will that suffice?

Capture.JPG (34.9 KB)
As you can see from the image, I’m able to update metadata such as the title and language. However, since the HTML comes from a memory stream that is used to create an instance of the Document class, I would like to know how to set alternate text for the image. In the HTML stream I have the alt attribute set on the image, but it doesn’t transfer over to the PDF document.
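
For reference, the title and language settings described above might look roughly like the sketch below. It assumes Aspose.PDF for .NET with the Aspose.Pdf.Tagged namespace. The HTML string, the title text, the "en-US" language code, and the output file name are placeholders that do not come from the original code, and UTF-8 encoding is an illustrative choice. It does not address the missing alternate text for images.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Tagged;

public static class TitleAndLanguageSketch
{
    public static void Main()
    {
        // Placeholder HTML; in the thread the markup arrives as a string built elsewhere.
        string html = "<html><body><p>Hello</p></body></html>";

        using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        using (Document pdfDoc = new Document(htmlStream, new HtmlLoadOptions()))
        {
            // TaggedContent exposes the document-level settings that the
            // Title and Primary Language checks look at.
            ITaggedContent taggedContent = pdfDoc.TaggedContent;
            taggedContent.SetLanguage("en-US");          // placeholder language code
            taggedContent.SetTitle("Sample document");   // placeholder title

            pdfDoc.Save("output.pdf");                   // placeholder file name
        }
    }
}
```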

@aclaudio

We apologize for the delayed response.

We need to investigate the scenario further in order to check the feasibility of your requirements. Would you kindly share a sample HTML file along with the code snippet that you are using to generate the accessible PDF file from it? We will test the scenario in our environment and address it accordingly.

I have attached the HTML template in a zip folder: testing_data.zip (18.0 KB)

The HTML template appears in the code as “finalTemplate”. This is the C# code:

byte[] htmlContentByteArray = Encoding.ASCII.GetBytes(finalTemplate);
MemoryStream htmlContentStream = new MemoryStream(htmlContentByteArray);
HtmlLoadOptions htmlLoadOptions = SethtmlLoadOptions(specifications);
Document pdfDoc = new Document(htmlContentStream, htmlLoadOptions);

I’m trying to tag the PDF, but I’m not sure how to. I can add the language and title by using ITaggedContent, but that’s about it. Adobe complains that the elements aren’t tagged and that the image has no alternate text.

Any help would be appreciated, thanks!

@aclaudio

Thanks for providing requested data.

We have logged an investigation ticket as PDFNET-48846 in our issue tracking system for further analysis of your requirements. We will check the ticket in detail and keep you posted on the status of its resolution. Please be patient and give us some time.

We are sorry for the inconvenience.

While we wait for that response, as a paid license member is there no way we can get live support for free for a query as simple as this? Thanks.

@aclaudio

The functionality to generate tagged PDFs has only recently been introduced to the API, and we are still working on enhancing it. We need to investigate further on our side whether your requirements can be met and how much work is needed to implement the required functionality.

Please note that you can create a tagged PDF containing illustration structure elements, such as images, using the API, which provides the ability to specify alternate text. Furthermore, you can also check the article on tagging images inside an existing PDF document in the API documentation.

Since you are generating PDFs from HTML, we need to check the possibility of creating tagged images during conversion. As soon as the investigation against the ticket is done, we will inform you in this forum thread. We highly appreciate your patience in this matter. Please give us some time.

We apologize for the inconvenience.

So in reference to the links you supplied I have tried the following before:

foreach (FigureElement figureElement in rootElement.FindElements<FigureElement>(true))

That doesn’t return anything. My image is actually an Aspose XImage, and can be found like this:

foreach (XImage image in pdfDoc.Pages[1].Resources.Images)

However, for any object of type XImage, there is no “Alternative Text” property for me to set a value for, unlike FigureElement which has an Alternative Text property.

That link assumes that the elements in the PDF document are already tagged. My PDF document has no tagged elements, so I cannot access them via the ITaggedContent interface. What should I do in this situation?

@aclaudio

We are further investigating the scenario against your requirements. Would you kindly share the definition of the following method, which is called in the code you shared:

SethtmlLoadOptions(specifications);

private HtmlLoadOptions SethtmlLoadOptions(PDFSpecifications specifications)
{
    Margin margin = (specifications == null || specifications.Layout == null)
                    ? _defaultmargin
                    : specifications.Layout;
    HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();
    htmlLoadOptions.PageInfo.Width = Aspose.Pdf.PageSize.PageLegal.Width;
    htmlLoadOptions.PageInfo.Height = Aspose.Pdf.PageSize.PageLegal.Height;
    htmlLoadOptions.PageInfo.Margin = new MarginInfo(margin.left, margin.bottom, margin.right, margin.Top);
    return htmlLoadOptions;
}

The Aspose.Pdf.PageSize values are the following:

  • Aspose.Pdf.PageSize.PageLegal.Width = 612
  • Aspose.Pdf.PageSize.PageLegal.Height = 1008

The specifications argument always has the following values:

  • margin.left = 0
  • margin.bottom = 48
  • margin.right = 0
  • margin.top = 36

@aclaudio

We tried to convert the document directly into PDF/UA, but the image was not tagged for alternate text in the output. We then tried to tag the images by accessing ITaggedContent, but the API did not find any StructureElement in the PDF.

HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();
htmlLoadOptions.PageInfo.Width = Aspose.Pdf.PageSize.PageLegal.Width;
htmlLoadOptions.PageInfo.Height = Aspose.Pdf.PageSize.PageLegal.Height;
htmlLoadOptions.PageInfo.Margin = new MarginInfo(0, 48, 0, 36);

Document doc = new Document(dataDir + "testing_data.html", htmlLoadOptions);
doc.Convert(dataDir + "validationlog.xml", PdfFormat.PDF_UA_1, ConvertErrorAction.Delete);
doc.Save(dataDir + "pdf209.pdf");

doc = new Document(dataDir + "pdf209.pdf");
ITaggedContent taggedContent = doc.TaggedContent;
StructureElement rootElement = taggedContent.RootElement;

// Set title for tagged pdf document
taggedContent.SetTitle("Document with images");
foreach (FigureElement figureElement in rootElement.FindElements<FigureElement>(true))
{
    // Set alternative text for the figure
    figureElement.AlternativeText = "Figure alternative text (technique 2)";
}
doc.Save(dataDir + "taggedimage.pdf");

We are afraid that the functionality you are looking for (i.e. HTML to tagged PDF, with images tagged for alternate text) is not yet present in the API and needs to be investigated for implementation. The related ticket has also been updated with the latest test results. We will inform you as soon as we have an update regarding the ticket resolution. Please give us some time.

We apologize for the inconvenience.

Thanks. Can you please investigate taking any element in the PDF and tagging it, for example tagging an element as a paragraph, header, etc.? I cannot find a way to tag an existing element in the PDF document that is created from the HTML.

I am trying some workarounds. For example, I’m able to create an IllustrationElement (a tagged image), but when I add it to the PDF document it is placed at the end of the document:

// Under Development
IllustrationElement figure1 = taggedContent.CreateFigureElement();
taggedContent.RootElement.AppendChild(figure1);
figure1.AlternativeText = "Figure One";
figure1.Title = "Image 1";
figure1.SetTag("Fig1");
figure1.SetImage("image.png");

How can I add it so that it appears as the first element in the document? That way I can continue removing other elements as I create tagged content out of them as well.

We have updated the ticket information accordingly and will surely investigate the case from this perspective.

Are you using the same PDF document obtained from the HTML, or are you building it from scratch? Would you please share a complete sample code snippet so that we can test it in our environment and share our feedback with you accordingly?

No, I’m not building it from scratch. I’m using the same HTML and code I sent. I’m just creating an attachment and appending it as in my post from 10/2/2020, but anything I append using:
taggedContent.RootElement.AppendChild(some_object);
shows up at the end of the existing document. Is there any way to put it at the beginning?

@aclaudio

We were able to reproduce that new images are appended at the end of the document, and we could not find any property or method to position them as per your requirements. Hence, we have logged an issue as PDFNET-48877 in our issue management system for further investigation. We will let you know as soon as the ticket is resolved. Please give us some time.

We apologize for the inconvenience.

From the example I provided, is there any way I can at least auto-tag it? Then I could at least manipulate the tags programmatically.

@aclaudio

Regretfully, there is no functionality for auto-tagging elements. However, could you please explain a bit more about the auto-tag feature you have in mind, for example how you expect the API to auto-tag the images? We will check the related information on our side and share our feedback with you.