Converting PDF to Accessible Format failing

I am attempting to convert a PDF document to an accessible (tagged) format. I call what I believe is the recommended code and then save the document, but the resulting PDF does not pass the tagging check. Below is the code that loads the PDF document from disk into a PDF object, converts it to the desired format, and saves it out to a stream.

// Load the converted PDF into an Aspose PDF Document
Aspose.Pdf.Document pdfDoc2 = new Aspose.Pdf.Document(conversionFilePath + ".pdf");

var pdfVersion = GetPdfOutputVersion(tenantConfig.PdfVersion.Trim()); // Returns Aspose.Pdf.PdfFormat.Tagged_PDF as the format

// Create dummy log file for the conversion file
string formatConversionLog = conversionFilePath + ".log";
File.Create(formatConversionLog).Dispose();

// Convert the loaded PDF to the specified type
pdfDoc2.Convert(formatConversionLog, pdfVersion, Aspose.Pdf.ConvertErrorAction.Delete);

// Save the PDF document to the output stream
pdfDoc2.Save(pdfOutStream);

When the converted PDF is saved to the stream or back to disk, it is reported as locked because it is associated with a format that can be tagged, but when the tagging test is run in Adobe Preflight, it fails.

Are there any steps I might be missing in the conversion or saving process to get tagging working and Preflight to pass?

@jrubright

There are no additional steps required to convert a PDF to Tagged PDF. However, could you please share the sample PDF document along with a screenshot of the results in Adobe Preflight? We will test the scenario in our environment and address it accordingly.

Attached is a redacted copy of the generated PDF and also the Preflight output. Please let me know if there is anything else I can provide to help resolve this situation.

image.png (108.1 KB)
SBC_Redacted.pdf (262.0 KB)

@jrubright

We have checked the file that you shared. It seems that it was generated using an older version of Aspose.PDF for .NET. Also, the file is already a tagged PDF, as the “Read Out Loud” option works on it when it is opened in Adobe Reader.

Could you please share why you are converting it again into a tagged PDF when it is already a tagged document? Also, please share the definition of the “GetPdfOutputVersion()” method that you are using at your end to get the PDF format. We will then proceed to assist you accordingly.

I have generated the document using a more recent version of the library, 19.8, and it still fails to pass any Preflight checks. The GetPdfOutputVersion function just returns the type of PDF document that we want to convert the generated PDF to, which is data driven. The function is below:

private static Aspose.Pdf.PdfFormat GetPdfOutputVersion(string pdfOutputVersion)
        {
            switch (pdfOutputVersion)
            {
                case "A.1A":
                    {
                        return Aspose.Pdf.PdfFormat.PDF_A_1A;
                    }
                case "A.1B":
                    {
                        return Aspose.Pdf.PdfFormat.PDF_A_1B;
                    }
                case "A.2B":
                    {
                        return Aspose.Pdf.PdfFormat.PDF_A_2B;
                    }
                case "Tagged,PDF":
                    {
                        return Aspose.Pdf.PdfFormat.Tagged_PDF;
                    }
                case "1.3":
                    {
                        return Aspose.Pdf.PdfFormat.v_1_3;
                    }
                case "1.4":
                    {
                        return Aspose.Pdf.PdfFormat.v_1_4;
                    }
                case "1.5":
                    {
                        return Aspose.Pdf.PdfFormat.v_1_5;
                    }
                case "1.6":
                    {
                        return Aspose.Pdf.PdfFormat.v_1_6;
                    }
                case "1.7":
                    {
                        return Aspose.Pdf.PdfFormat.v_1_7;
                    }
                default:
                    {
                        Logger.Log(Syslog.Level.Critical, "Unsupported PDF version detected.  Please check Tenant configuration");
                        throw new HrException(HttpStatusCode.NotImplemented, ErrorCodeMessages.BadPdfVersion);
                    }
            }
        }

I was forcing the function to return the Tagged,PDF format when I generated the document that I attached previously, but that isn’t passing Preflight. Regarding your question about why I am converting it again into a tagged document, I am not sure I can answer that. In the code that I provided previously, we only convert the Word document to a PDF and then apply the conversion to it to try to get it into a tagged format. Based on the code, we are not doing a double conversion. At what point should I be setting the format on the generated PDF? Is it possible to do it during the Word -> PDF conversion process, or do I need to do it afterwards as we do now?

@jrubright

Thanks for sharing the further details of your whole process.

It is quite possible that the Word file that you are converting to PDF is already a tagged document. As we understand your case, you convert a Word file into PDF, then check the PDF format of the generated output and convert it into Tagged PDF if it is not tagged. The problem could be in either step: checking the PDF format or converting the PDF to Tagged PDF.

Please share your source Word file and the code snippet that you are using to convert it into PDF so that we can test the whole case and share our feedback with you.

Please create a new inquiry about it in the Aspose.Words forum category, where you will be assisted accordingly.