508 Compliance

TJTomJames · March 1, 2011, 3:55pm

We are starting to put into practice more of the Aspose.Total for .NET. We have some 508 compliance questions. We searched through the forums and tried the live chat. But nothing seems to satisfy us as far as the answer. Either the data is out of date in the forums (2007/2008) or the live chat pushes us off to someone that is off line.

We need to ensure that all our PDF documents are 508 compliant. We are using Aspose.PDF and Aspose.PDF.Kit. To that end, we are looking to see if the tools we currently have will allow us to automate the check for 508 compliance without user interaction.

1. Convert pdf into html

2. Convert pdf into tagged pdf

3. Convert pdf into xml

4. Convert html to pdf

5. Create a PDF based on a MS Word template.

What I am looking for is how I can programmatically create HTML and PDF code so that I can turn to those that are auditing our applications and websites and tell them that what we are producing is 508 compliant. I look at the following links in your forums pages and it looks as if your tools can not provide this. But I also look at the posting dates and realize that the newer versions might.

http://www.aspose.com/community/forums/272363/accessibility/showthread.aspx#272363

http://www.aspose.com/community/forums/106135/need-help-about-the-aspoose-functionality-for-our-project/showthread.aspx#106135

http://www.aspose.com/community/forums/99270/is-grid-accessible/showthread.aspx#99270

codewarior · March 3, 2011, 11:16am

Hi,

Thanks for considering Aspose.

I am a representative of the Aspose.Pdf team. This product is used to generate PDF documents from scratch. Using its API, you can convert image files, transform XML/XSL-FO files, and convert HTML files and even simple text files into PDF format. I am not entirely certain which Section 508 requirements you would like to be followed when PDF documents are generated. Can you please share some details? We apologize for the inconvenience.

You may consider visiting the following links for information on

shahzadlatif · March 4, 2011, 11:34am

Hi,

I’m a representative of Aspose.Pdf.Kit. I’m sorry to share with you that Aspose.Pdf.Kit doesn’t provide any features to check for 508 compliance. However, if you could share some further details, our team will try to provide support for such feature in our future versions. Please provide the following information:

Do you want to check the PDF file for 508 compliance?
Do you also want to convert the PDF file to a 508 compliant PDF?
You also mentioned that you want to convert the PDF to XML, tagged PDF, and HTML; do you want these output files to be 508 compliant as well?

Please share some more details, so we could look into your requirement in detail.

We’re sorry for the inconvenience.
Regards,

TJTomJames · March 4, 2011, 3:02pm

PDF Accessibility

Before a PDF is accessible, it must:

Be properly tagged
Have a logical reading and tab order
Have alternative text for all images and objects
Have a specified language
Have bookmarks linked to the sections of the document for files of 10 pages or more

So I want to check that the document is 508 compliant.

I want to convert a PDF to a 508 compliant PDF.

I want the output of any PDF conversion (output/input) to be 508 compliant.

I want any PDF I create to be compliant so I need to know how to accomplish the above 5 tasks using Aspose.
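
The five criteria listed above can be tracked as a simple checklist so that an automated run reports which ones fail. The following is only a minimal sketch: the `AccessibilityChecklist` type and its property names are hypothetical and are not part of any Aspose API, and how each flag gets populated (by a validator, a manual audit, or another tool) is left open.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical checklist type for the five criteria in this post.
// It is NOT an Aspose class; the flags must be filled in by some other check.
public sealed class AccessibilityChecklist
{
    public bool IsTagged { get; set; }
    public bool HasLogicalReadingOrder { get; set; }
    public bool HasAltTextForAllImages { get; set; }
    public bool HasLanguage { get; set; }
    public int PageCount { get; set; }
    public bool HasSectionBookmarks { get; set; }

    // Returns one message per unmet criterion.
    public IReadOnlyList<string> Failures()
    {
        var failures = new List<string>();
        if (!IsTagged) failures.Add("Document is not properly tagged");
        if (!HasLogicalReadingOrder) failures.Add("Reading and tab order is not logical");
        if (!HasAltTextForAllImages) failures.Add("Images or objects lack alternative text");
        if (!HasLanguage) failures.Add("Document language is not specified");
        // Bookmarks are only required for files of 10 pages or more.
        if (PageCount >= 10 && !HasSectionBookmarks)
            failures.Add("Bookmarks are missing for a document of 10 pages or more");
        return failures;
    }

    public bool Passes => Failures().Count == 0;
}
```

A short document without bookmarks can still pass, while a document of 10 pages or more without them is reported as failing.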

shahzadlatif · March 5, 2011, 7:09am

Hi,

Thank you very much for sharing further details. We’ll look into your requirements in detail and update you with the results the earliest possible.

We’re sorry for the inconvenience.
Regards,

codewarior · March 8, 2011, 4:28am

Hi,

Thanks for sharing the details.

I have logged your requirement of converting HTML file into 508 compliant PDF document as PDFNET-25052 in our issue tracking system under new features list. We will further look into the details of this requirement and will keep you updated on the status of correction.

Besides this, I have some thoughts over your following requirements.

Have a logical reading and tab order

When generating the PDF documents from Aspose.Pdf for .NET, all the contents are placed in flow layout and in left-to-right and top-to-bottom order. Whereas when transforming the HTML file into PDF, the layout depends upon the sequence in which information is present inside source HTML file.

Have alternative text for all images and objects

When adding images to PDF document, you can specify the text information for image while using Image.ImageInfo.Title property. You may also try using Image.ImageNotes to specify the notes information. For more information, please visit Working with Images

Have a specified language

Aspose.Pdf supports Unicode characters, so you can add text content in any language to a PDF document. When converting an HTML file that contains non-English content, the source HTML should already have its content in the specified language. For more information, please visit Font Handling

Have bookmarks linked to the sections of the document for files of 10 pages or more.

Aspose.Pdf supports the capability to create bookmarks inside PDF documents. For more required information, please visit Adding bookmarks in the PDF document

shahzadlatif · March 8, 2011, 1:21pm

Hi,

I’m sorry to inform you that checking 508 compliance or conversion to a 508 compliant PDF is not currently supported by Aspose.Pdf.Kit; however, I have logged the related new feature requests as given below:

PDFKITNET-25059 - Check whether the PDF file is 508 compliant
PDFKITNET-25060 - Convert PDF file to a 508 compliant PDF
PDFKITNET-25061 - Convert PDF file to 508 compliant HTML file
PDFKITNET-25062 - Convert PDF file to 508 compliant XML file

Our team will investigate these requirements and you’ll be updated via this forum thread once they’re supported.

We’re sorry for the inconvenience.
Regards,

asad.ali · September 22, 2018, 10:50pm

@TJTomJames

Thanks for being patient.

We would like to inform you that PDF/UA Compatibility has been added in Aspose.PDF for .NET 18.9 and now you can create PDF document which is compatible with PDF/UA standard (also known as “Section 508” or “WCAG standard”) and also check compatibility with this standard.

In latest version of the API, we have provided beta-version of functionality where support of PDF/UA standard includes two parts:

1- PDF/UA validation i.e. check of the PDF document on compatibility with PDF/UA standards.
2- Create new PDF document with appropriate tagging (tagged PDF) which is one of important requirements to make document PDF/UA compatible.

Currently, tagged text and images are supported and we are planning to extend this functionality according to the specification. The PDF/UA validation is implemented in manner similar to PDF/A validation. Below are the code snippets which demonstrate how to use PDF/UA functionality:

Check document PDF/UA compatibility

string docFilename = "example.pdf";
string logFilename = "example.xml";
Document document = new Document(docFilename);
bool isValidPdfUa = document.Validate(logFilename, PdfFormat.PDF_UA_1);

The result of document.Validate() indicates whether the document is PDF/UA compatible. The log file contains detailed information about the validation. Currently (in Aspose.PDF for .NET 18.9), all checks are implemented except the Font, Text (partially, since it depends on Font) and XObject parts.

Create document with tagged text

Document doc = new Document();
Aspose.Pdf.Page page1 = doc.Pages.Add();
Aspose.Pdf.Page page2 = doc.Pages.Add();
Aspose.Pdf.Page page3 = doc.Pages.Add();

//Create TextState and configure it
Aspose.Pdf.Text.TextState ts = new Aspose.Pdf.Text.TextState();
ts.Font = FontRepository.FindFont("Arial");

//Creating tagged text element
//supported tags P, H,H1-H6
TaggedPdfTextElement textElement1 = new TaggedPdfTextElement(doc, "P","text",ts);
TaggedPdfTextElement textElement2 = new TaggedPdfTextElement(doc, "P","test1",ts);
TaggedPdfTextElement textElement3 = new TaggedPdfTextElement(doc, "P","test2",ts);
TaggedPdfTextElement textElement4 = new TaggedPdfTextElement(doc, "P","test3",ts);
TaggedPdfTextElement textElement5 = new TaggedPdfTextElement(doc, "P","test4",ts);
TaggedPdfTextElement textElement6 = new TaggedPdfTextElement(doc, "P","test5",ts);
TaggedPdfTextElement textElement7 = new TaggedPdfTextElement(doc, "P","test6",ts);
TaggedPdfTextElement textElement8 = new TaggedPdfTextElement(doc, "P","test7",ts);

//Adding tagged text elements to content
page1.TaggedPdfContent.Add(textElement1);
page1.TaggedPdfContent.Add(textElement2);
page1.TaggedPdfContent.Add(textElement3);
page2.TaggedPdfContent.Add(textElement4);
page2.TaggedPdfContent.Add(textElement5);
page3.TaggedPdfContent.Add(textElement6);
page3.TaggedPdfContent.Add(textElement7);
page3.TaggedPdfContent.Add(textElement8);

// Save PDF Document
doc.Save("output.pdf");

Create document with tagged image

Document doc = new Document();
Aspose.Pdf.Page page1 = doc.Pages.Add();
// Create image
Image image = new Image();
//Assign image file
image.File = @"image.png";

//Create BBox element
Rectangle BBox = new Rectangle(30, 70, 300, 720);
//Create tagged figure element
TaggedPdfFigureElement figureElement = new TaggedPdfFigureElement(doc,image, BBox);
Rectangle BBox1 = new Rectangle(550, 570, 300, 720);
TaggedPdfFigureElement figureElement1 = new TaggedPdfFigureElement(doc, image, BBox1);
//Add tagged figure element into content
page1.TaggedPdfContent.Add(figureElement);
page1.TaggedPdfContent.Add(figureElement1);
// Save PDF Document

doc.Save("output.pdf");

We request you to please use Aspose.PDF for .NET 18.9 in order to use above mentioned features and please share your feedback with us. It would be very helpful and valuable for us in improving and extending the functionality in future releases of the API.

Jesse.Voogt.ctr · October 3, 2018, 1:51pm

Hi Asad,

What a coincidence you have just rolled out a beta for PDF/UA accessibility, right as I was shopping around for a product that would fit my needs (converting fairly complex html with css & fonts to PDF/UA compliant pdf). I had been using Aspose, but until now it seemed accessibility was not really supported. Thank you for working on this, and for the code samples. It would be very helpful if you could point me to any new documentation pointing to how to make documents accessible with Aspose, especially using the HTML to PDF conversion features. Ideally the majority of the tagging could be done similar to iTextPDF’s “setTagged” method. I will be testing out the new version momentarily, so any pointers would be quite welcome.

asad.ali · October 3, 2018, 7:40pm

@Jesse.Voogt.ctr

Thanks for sharing your feedback.

We would like to share with you that we have been working over adding functionality to support PDF/UA compatibility and it is still under development. However, we have rolled out some beta version of functionality in API versions lately. We will definitely be adding more interesting and useful features in upcoming releases of the API.

Regarding examples in the API docs, we are preparing documentation and articles on this new feature, and they will be published soon. However, the code snippets given in our previous reply show the complete functionality that has been added to the API up to the latest version. Most of the other PDF/UA features relate to validation of tagged PDF, and they will soon be exposed as consolidated functionality.

As you know that we are working over adding more features regarding 508 PDF Compliance, we are so open to our customers for their suggestions and further requirements. It would really be great if we can have some detailed requirement from your side on how you want to implement this functionality using our API. Would you please share some sample files and details of the scenario which you want to run. We will log all details accordingly and provide our feedback.

Jesse.Voogt.ctr · October 3, 2018, 10:56pm

Thanks for the quick feedback - I’ve been busy trying to figure out how much work it will be to handle the fact that Aspose.Pdf.Generator.Pdf API is gone (our services all rely on this heavily).

As mentioned, my main area of concern is producing accessible PDF files from accessible HTML files. In an ideal scenario, with minimal coding, I’d be able to do the following:

feed accessible HTML5 (via a string) as the input
receive an accessible, tagged PDF as the output that validates all automatic accessibility checks via Adobe’s checker

Of course you will note that CSS & fonts would need to be resolved, so the input isn’t quite as simple. For this I would want to be able to simply specify a local path for each of these resources. I’ll have to review how this is done in Aspose, but from memory it was possible to load resources linked via http:// or https:// prefixed href and src attributes at least.

I think defaulting PDF generation to be as accessible as possible would be the way to go, because it will mean more PDFs getting generated out in the wild will in the future be more accessible without developers even having to change code (or even understand about PDF/UA requirements and what they’re for). I understand however that this feature is new, so you might start with requiring developers set a flag like “AutoTag” to true to enable such a feature. Even if you auto-tag, it would be a good idea to provide some kind of mechanism to override that tagging function for particular nodes (or override it altogether). I imagine something like a “HtmlToPDFTagger” class that has a default implementation that you can inherit from to create your own class. A method on this might be something like “TagNode” which will execute for every node in the html, providing some way to set the tag.
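
To make the override idea concrete, here is a rough sketch of what such a tagger could look like. Everything in it is hypothetical: `HtmlToPDFTagger`, `TagNode`, the `HtmlNode` type, and the default tag mapping are illustrations of the proposal only and do not exist in Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an HTML node; a real converter would expose its own DOM type.
public sealed class HtmlNode
{
    public HtmlNode(string name, string cssClass = "")
    {
        Name = name;
        CssClass = cssClass;
    }

    public string Name { get; }
    public string CssClass { get; }
}

// Hypothetical base class with a default mapping that subclasses can override.
public class HtmlToPDFTagger
{
    private static readonly Dictionary<string, string> DefaultTags =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["p"] = "P", ["h1"] = "H1", ["h2"] = "H2", ["h3"] = "H3",
            ["h4"] = "H4", ["h5"] = "H5", ["h6"] = "H6",
            ["img"] = "Figure", ["table"] = "Table"
        };

    // Called once per HTML node; returns the PDF structure tag,
    // or null to leave the node without a tag of its own.
    public virtual string TagNode(HtmlNode node)
    {
        return DefaultTags.TryGetValue(node.Name, out var tag) ? tag : null;
    }
}

// Example override: tag <div class="callout"> as a Note, defer everything else.
public sealed class CalloutAwareTagger : HtmlToPDFTagger
{
    public override string TagNode(HtmlNode node)
    {
        if (node.Name == "div" && node.CssClass == "callout") return "Note";
        return base.TagNode(node);
    }
}
```

With this shape, the default tagger keeps working for callers who never subclass it, and a subclass only has to handle the nodes it cares about before falling back to `base.TagNode`.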

I would like to mention that a lot of our applications still use tables for layout so it would be good if there were some way to either mark a table as a “layout” table (perhaps by adding a special, properly namespaced css class like aspose-layout-table? or a custom data- attribute like data-aspose-layout-table="true"?), or else for layout vs data table to be automatically detected in some way. One method to detect it might be to scan the table for any “scope” attributes - if found, you might assume it is a data table, if not then a layout table.
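
The detection rule described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Aspose behavior: `TableClassifier` and `TableKind` are hypothetical names, the marker class and data- attribute are the ones suggested in this post, and the markup is assumed to be well-formed enough to load with `System.Xml.Linq`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public enum TableKind { Layout, Data }

// Hypothetical helper; assumes the table markup is well-formed XML (XHTML-style).
public static class TableClassifier
{
    public static TableKind Classify(XElement table)
    {
        // Explicit markers win over automatic detection.
        var classes = ((string)table.Attribute("class") ?? "")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (classes.Contains("aspose-layout-table")) return TableKind.Layout;
        if ((string)table.Attribute("data-aspose-layout-table") == "true") return TableKind.Layout;

        // Any "scope" attribute on a descendant suggests a data table.
        // Note: this also sees cells of nested tables, which a real
        // implementation would probably want to exclude.
        bool hasScope = table.Descendants().Any(e => e.Attribute("scope") != null);
        return hasScope ? TableKind.Data : TableKind.Layout;
    }
}
```

For example, `TableClassifier.Classify(XElement.Parse("<table><tr><th scope=\"col\">Name</th></tr></table>"))` yields `TableKind.Data`, while the same table with `class="aspose-layout-table"` is treated as a layout table.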

I will work to provide a sample HTML file for you that I might expect to feed to some function to convert to an accessible PDF.

asad.ali · October 4, 2018, 8:19am

@Jesse.Voogt.ctr

Thanks for sharing more details.

Yes, you are correct: the Aspose.Pdf.Generator approach has been discontinued and removed. It has been replaced with the Aspose.Pdf (DOM) approach, which is more efficient in terms of performance. To work with the new DOM model, please check the “Working with Aspose.Pdf” articles in our API documentation. If you face any issue while updating your existing code, please feel free to let us know by creating a new topic.

Concerning to your requirements about PDF/UA compliance, we have recorded your concerns and will definitely consider them while implementing the related functionality. Please take your time to prepare a sample HTML file and share with us. We will log an investigation ticket in our issue tracking system and share the ID with you.

Jesse.Voogt.ctr · October 4, 2018, 1:16pm

Thanks - I’m working on crafting a few sample HTML files now.

Since you are still working on the PDF/UA features, should I assume it is not currently possible to convert HTML to accessible PDF using the general technique described in https://docs.aspose.com/html/net/convert-html-to-pdf/?

Jesse.Voogt.ctr · October 4, 2018, 4:03pm

I’ve prepared the following sample, which includes an html and one css file:
sample-accessible-html-v1.zip (10.2 KB)

While I didn’t thoroughly cover all things possible with HTML5, my goal was to give you a sample of accessible w3c compliant HTML that I might want to channel through to Aspose to convert to an accessible PDF. Most of the input currently isn’t nearly this semantic, unfortunately, but if we have to rewrite a lot of this code anyway it would be my preference to replace it with semantic HTML5 that complies with w3c standards. That way the new HTML could theoretically be used to display content as a web page just as easily as used as input for Aspose.

I’m going to use this file or something similar to test the capabilities of the HTML conversion available currently via the new DOM approach.

asad.ali · October 4, 2018, 9:37pm

@Jesse.Voogt.ctr

Thanks for getting back to us.

You can convert HTML files into PDF and then convert obtained PDF document into PDF/UA compliant one using following code snippet:

HtmlLoadOptions loadOptions = new HtmlLoadOptions(dataDir + "Sample\\");
loadOptions.PageInfo.Margin = new MarginInfo(0, 0, 0, 0);
loadOptions.PageInfo.Height = PageSize.A4.Width;
loadOptions.PageInfo.Width = PageSize.A4.Height;
loadOptions.PageInfo.IsLandscape = true;
Document doc = new Document(dataDir + "Sample\\sample-accessible-html-input.htm", loadOptions);
doc.Save(dataDir + "PDFUA.pdf");
doc = new Document(dataDir + "PDFUA.pdf");
doc.Convert(new MemoryStream(), PdfFormat.PDF_UA_1, ConvertErrorAction.Delete);
doc.Save(dataDir + "UA_out.pdf");

UA_out.pdf (307.1 KB)

In the above code sample, we used the HTML file you shared; the path passed to the HtmlLoadOptions constructor is the folder containing the CSS file that the HTML references. You may also check the generated PDF/UA compliant document attached above and share your feedback with us. We will proceed accordingly to assist you.

asad.ali · October 4, 2018, 10:16pm

@Jesse.Voogt.ctr

We have further tested the generated output for compliance validation and found that there were several issues in the output PDF. 2018-10-05_01-14-09.png (83.2 KB) We have logged an issue as PDFNET-45498 in our issue tracking system with all the details you have provided us about your requirements. We will further proceed investigating the issues and keep you posted with the status of its correction. Please spare us little time.

We are sorry for the inconvenience.

Jesse.Voogt.ctr · October 5, 2018, 6:01pm

Thanks - I will await your fixes.
By the way, I’m not sure what the image 2018-10-05_01-14-09.png contains, but I can’t view it, it says it’s private when I click on it.

I tend to think first converting to PDF and then trying to make it accessible will not work, because the initial conversion to PDF will possibly lose some of the information needed to make it accessible? I will take a look at your method in more detail soon to see what the issues are.

Jesse.Voogt.ctr · October 5, 2018, 8:00pm

Out of curiosity, is it intentional that you have already put up the release notes for version 18.10, but I’m not able to find the DLLs for that version or install via Nuget? Or do I just need to add the -prerelease option if I wanted to test the beta version instead of 18.9?

Edit: never mind, I see the 18.10.0 now in the package manager.

asad.ali · October 5, 2018, 8:20pm

@Jesse.Voogt.ctr

Please download the image from this link, so that you can check what were the issues the output PDF file had.

We have recorded your concerns and will definitely keep them in view while investigating and implementing the functionality. Please spare us little time.

We are sorry for the inconvenience.

Jesse.Voogt.ctr · October 31, 2018, 4:07pm

Any updates on accessibility of the HTML-to-PDF process?

I have also spent some time evaluating the ability to fill out AcroForm fields on an already accessible PDF and generate a flattened, but still accessible PDF. It seems that the act of flattening the fields produces a nicely read-only PDF without form fields, but that the resulting elements that were converted from form fields are not tagged and thus the PDF is not accessible. Furthermore the alt text is missing for the images that are the result of flattening buttons with images in the form - perhaps the source of that alt text could be the tooltip of the button or something in a future version of Aspose?