I want to convert PDF documents to tagged PDF (PDF/A-1a):
Document document = new Document(new ByteArrayInputStream(input));
PdfFormatConversionOptions options = new PdfFormatConversionOptions(PdfFormat.PDF_A_1A, ConvertErrorAction.Delete);
document.convert(options);
document.save("output.pdf");
If I show the tags of ‘output.pdf’ in Adobe Acrobat, the application crashes.
This problem does not occur if the source PDF already had tags (sourceTagged.pdf) before the conversion to PDF/A-1a.
Can I expect tags to be created during conversion from PDF to PDF/A-1a or is this an impossible task?
Would you please confirm whether our understanding of your inquiry is correct? You want to generate a PDF/A-1a document from a PDF which is tagged, e.g. sourceTagged.pdf and source.pdf.
Also, would you please explain in a bit more detail how you display the tags and how we can replicate the application crash on our side? We will then assist you accordingly.
Well, I guess I was not precise enough. I wanted to show the differences in the PDF/A conversion process with two different input documents, neither of which conforms to the PDF/A standard.
For the first conversion I used an untagged source document (“source.pdf”) in comparison with the second conversion where I used an already tagged source document (“sourceTagged.pdf”).
The result of the second conversion (“outputTagged.pdf”) is fine, but the result of the first conversion (“output.pdf”) has no tags. You can replicate the application crash by opening the file in Adobe Acrobat and showing the tags in the left navigation pane, as you can see here: acrobatCrash.png (11.2 KB)
This insufficient result led me to another question: Can I expect tags to be created automatically during conversion from PDF to PDF/A-1a, or is that just not possible?
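For anyone who wants to check whether an output file declares tags without opening it in Acrobat, here is a rough sketch that uses only the Java standard library. It scans the raw bytes for the /StructTreeRoot key and for "/Marked true" (the MarkInfo entry of a tagged PDF). The class and method names are made up for illustration and are not part of Aspose.PDF. It is only a heuristic: it assumes the document catalog is stored as plain text, which may not hold if the file uses compressed object streams, and a match does not prove that the tag tree is valid.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.PDF: a rough, stdlib-only tag check.
final class TaggedPdfCheck {

    // "/Marked true" appears inside the catalog's MarkInfo dictionary of a tagged PDF.
    private static final Pattern MARKED_TRUE = Pattern.compile("/Marked\\s+true");

    // ISO-8859-1 maps each byte to exactly one char, so binary content is preserved.
    private static String asText(byte[] pdf) {
        return new String(pdf, StandardCharsets.ISO_8859_1);
    }

    // True if the raw bytes contain the key that points to the structure (tag) tree.
    static boolean declaresStructTree(byte[] pdf) {
        return asText(pdf).contains("/StructTreeRoot");
    }

    // True if the raw bytes contain "/Marked true".
    static boolean isMarked(byte[] pdf) {
        return MARKED_TRUE.matcher(asText(pdf)).find();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TaggedPdfCheck <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("StructTreeRoot present: " + declaresStructTree(pdf));
        System.out.println("MarkInfo /Marked true:  " + isMarked(pdf));
    }
}
```

Running it against both “output.pdf” and “outputTagged.pdf” should make the difference between the two conversions visible at a glance, provided the assumption about uncompressed catalogs holds for these files.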
It is not possible to create tags while converting PDF to PDF/A-1a. Accessibility tags are added when the document is converted to PDF/UA, which you can do with the following code snippet:
var doc = new Document(dataDir + "PDFUA.pdf");
doc.Convert(new MemoryStream(), PdfFormat.PDF_UA_1, ConvertErrorAction.Delete);
doc.Save(dataDir + "UA_out.pdf");
It does not seem justifiable to me that the conversion from PDF to PDF/A-1a does not create tags whereas the conversion from PDF to PDF/UA should, because both standards deal with tagged PDF.
But I did a test and converted the untagged document (“source.pdf”) to PDF/UA, similar to the way shown above (using Aspose.PDF for Java, where I guess I cannot use MemoryStream). The resulting “output.pdf” has no tags and is also not marked as a PDF/UA-conformant document, as you can see here: pdfToPdfUA.zip (245.7 KB)
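To check the "marked as conformant" part independently of any viewer, a minimal sketch like the following can look for the identification properties in the XMP metadata: pdfaid:part and pdfaid:conformance for PDF/A, pdfuaid:part for PDF/UA. The class and method names are hypothetical and not part of Aspose.PDF. The sketch assumes the XMP packet is stored uncompressed and in an ASCII-compatible encoding, and it only reports what the file claims about itself, not whether the file actually validates.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.PDF: reads conformance claims from XMP.
final class XmpConformanceCheck {

    // Handles both XMP serializations: <ns:prop>1</ns:prop> and ns:prop="1".
    private static Optional<String> property(String raw, String name) {
        Pattern p = Pattern.compile(
                Pattern.quote(name) + "\\s*(?:=\\s*[\"']|>)\\s*([^\"'<\\s]+)");
        Matcher m = p.matcher(raw);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // ISO-8859-1 keeps every byte intact, so the ASCII property names stay searchable.
    private static String asText(byte[] pdf) {
        return new String(pdf, StandardCharsets.ISO_8859_1);
    }

    // Claimed PDF/A level, e.g. "1A" for PDF/A-1a; empty if no pdfaid:part is found.
    static Optional<String> pdfaLevel(byte[] pdf) {
        String raw = asText(pdf);
        return property(raw, "pdfaid:part")
                .map(part -> part + property(raw, "pdfaid:conformance").orElse(""));
    }

    // Claimed PDF/UA part, e.g. "1"; empty if no pdfuaid:part is found.
    static Optional<String> pdfuaPart(byte[] pdf) {
        return property(asText(pdf), "pdfuaid:part");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XmpConformanceCheck <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("PDF/A claim:  " + pdfaLevel(pdf).orElse("none"));
        System.out.println("PDF/UA claim: " + pdfuaPart(pdf).orElse("none"));
    }
}
```

If the PDF/UA conversion had taken effect, the second line would be expected to report a part number rather than "none".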
We have also noticed a similar issue in our environment while testing the scenario.
Furthermore, we have logged a ticket as PDFJAVA-39904 in our issue tracking system for further analysis of your case. We will check whether your requirements are feasible and let you know as soon as the ticket is resolved. Please be patient and give us some time.