PDF/A-1a conversion stopped working in .NET versions 7.2.0.0 and 7.3.0.0

Berend · September 18, 2012, 12:00pm

Hi,

I am using Aspose.Total for .NET, and I have the following issue: in Aspose.Pdf.dll version 7.1.0.0 I could convert a PDF document to PDF/A-1a format (albeit with some compliance errors), but that stopped working in version 7.2.0.0 and still doesn’t work in version 7.3.0.0. The output document is actually a plain PDF version 1.4 document, not PDF/A at all. I use the .NET 1.1 version of the DLL.

If I try the same with PDF/A-1b, the output is as expected and the document does not have compliance errors.

When I roll back to the DLL version 7.1.0.0, PDF/A-1a works, but I would like to get rid of the PDF/A-1a compliance errors and I was hoping you fixed that in the newer builds.

I use this validator: http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx

PDF/A-1a validation errors with DLL version 7.1.0.0 are as follows:

Validating file “test.doc (7.1, 1A).pdf” for conformance level pdfa-1a

The key MarkInfo is required but missing.

The document does not conform to the requested standard.

The document doesn’t provide appropriate logical structure information.

Done.

From my work with iTextSharp I know that the “MarkInfo” and “logical structure” validation errors would be resolved by just marking the output PDF as Tagged; the validator does not even require tags to be actually present. However, as the source document in my test is MS Word and has a table of contents that is still clickable in the output PDF produced by Aspose, it should be feasible to provide structure information.
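
For readers who want to check these two markers on an output file themselves, here is a minimal Python sketch (not Aspose or iTextSharp code). It scans the raw bytes of a PDF for the /MarkInfo key, for /Marked true, and for /StructTreeRoot. The helper name `tagging_markers` is made up for this illustration, and the byte scan assumes the document catalog is stored uncompressed, as it is in a PDF 1.4 file. It only reports whether the keys are present and is no substitute for a real validator.

```python
import re


def tagging_markers(pdf_bytes: bytes) -> dict:
    """Report which Tagged PDF markers appear in the raw bytes of a PDF file.

    Hypothetical helper for illustration: a plain byte scan, not a PDF parser.
    """
    return {
        # The catalog's /MarkInfo key, which the validator reported as missing.
        "mark_info": b"/MarkInfo" in pdf_bytes,
        # /Marked true inside the MarkInfo dictionary declares the file as Tagged PDF.
        "marked_true": re.search(rb"/Marked\s+true", pdf_bytes) is not None,
        # /StructTreeRoot is the entry point of the logical structure tree.
        "struct_tree_root": b"/StructTreeRoot" in pdf_bytes,
    }
```

Calling `tagging_markers(open("test.doc.pdf", "rb").read())` on an output file would show at a glance whether the file was marked as Tagged at all.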

Best regards,

Berend Engelbrecht

nausherwan.aslam · September 19, 2012, 4:15am

Hi Berend,

Thank you for the details.

Please share your template file and generated PDF file so we can further investigate the issue you are facing.

Sorry for the inconvenience,

Berend · September 19, 2012, 5:33am

Thanks for your fast response.

I have uploaded a Visual Studio 2010 solution in this zip file:

http://support.decos.nl/berend/AsposeTest-20120919.zip

The project is a stripped-down version of our production code with the Aspose licensing information removed (so that the output PDF is marked as “Evaluation only”). I also removed all input file format handlers except Aspose.Words. However, our QA informed me that the problem occurs with all input file types (which makes sense, as it appears to be related to Aspose.Pdf.dll only).

The example input document is included as AsposeTest\test.doc. When the test application is compiled and executed, the document is copied to AsposeTest\bin\Debug\test.doc and an output file AsposeTest\bin\Debug\test.doc.pdf is created.

In the zipfile I have included and compiled against Aspose.Pdf.dll version 7.3.0.0. This generates an output PDF that is plain PDF version 1.4. If I replace ReferencedAssemblies\Aspose-Net1.1\Aspose.Pdf.dll by version 7.1.0.0 and do a clean rebuild, the output file is generated as PDF/A-1a without any change to my source code.

nausherwan.aslam · September 20, 2012, 5:37am

Hi Berend,

Thank you for sharing the sample application.

I was able to reproduce the issue in an initial test using your sample application. It has been registered in our issue tracking system with issue id: PDFNEWNET-34273. Our development team will look into it further to identify the cause. You will be notified via this forum thread of any updates.

Sorry for the inconvenience,

Berend · December 18, 2012, 6:12am

Hello,

I noticed that this problem is fixed in Aspose.Pdf.dll version 7.6.0.0, build date December 9th 2012. It would not surprise me if you simply forgot to put a notification of the fix in this forum thread. I would appreciate it if you automated the issue status reporting to customers. For instance, you might consider making the status public for issues originating from customer reports.

Best regards,

Berend Engelbrecht

[edit]

I see that the problem is not quite fixed: version 7.6.0.0 marks the output file as PDF/A 1b although 1a is requested. Neither version 7.1 nor version 7.6 adds the MarkInfo key, even if the original document does contain bookmarks. This is a pity, because just adding MarkInfo would increase the compliance level. In code using iTextSharp we call PdfWriter.SetTagged even if the document does not contain tags to make it compliant.

[/edit]

tilal.ahmad · December 20, 2012, 2:10am

Hi Berend,

Thanks for your patience.

Sorry for the inconvenience. I'm afraid your reported issue is not resolved yet. In Aspose.Pdf v 7.2.0.0 we have implemented tagged Pdf validation feature which cause the issue. You’ll agree that it’s a complex task, as Adobe doesn't support Pdf to Pdf/A-1a conversion. We do plan to implement this feature sometime next year, and we'll keep you updated via this forum thread.

Please feel free to contact us for any further assistance.

Best Regards,

Berend · November 17, 2013, 3:34am

Hi,

I wonder if you have made any progress on this. I noticed that in the newest versions of Aspose.Words and Aspose.Diagram you have implemented native PDF/A-1a support. While that is great and the way to provide much better PDF/A-1a compliance, I still need generic PDF/A-1a support for any document type, as was present in Aspose.Pdf v7.1.0.0. My customers do not have an interest in fully tagged PDF files for all file formats; they just have a legal requirement to produce archive files marked as PDF/A-1a for any input file format.

Until newer Aspose.Pdf versions again provide this feature, I am stuck with the old 7.1.0.0 version. That is a pity, because it means I also cannot benefit from other bugfixes in Aspose.Pdf. For instance, you fixed PDFNEWNET-35485 in August 2013, and I still cannot use that fix because the PDF/A compliance option stops working if I use an Aspose.Pdf version newer than 7.1.0.0.

In Tilal Ahmad Sana’s reply (now almost a year ago) he wrote that “In Aspose.Pdf v 7.2.0.0 we have implemented tagged Pdf validation feature which cause the issue”. Couldn’t you simply provide an option to disable that validation, so that “PDF/A 1a” files can again be produced in the same way as Aspose.Pdf 7.1.0.0 did?

Thanks and best regards,

Berend

codewarior · November 17, 2013, 9:42am

Hi Berend,

Thanks for your patience.

In recent releases we have made considerable progress on PDF to PDF/A_1b conversion; however, conversion of PDF to PDF/A_1a is not yet completely supported. The enhancement has already been logged in our issue tracking system as PDFNEWNET-34273. Meanwhile, I have again asked the development team to look into this requirement and schedule its support. As soon as we have further updates, we will be more than happy to let you know the status of the fix. Please allow us a little more time.

aspose.notifier · December 5, 2013, 2:09pm

The issues you have found earlier (filed as PDFNEWNET-34273) have been fixed in Aspose.Pdf for .NET 8.7.0.

This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

Berend · December 6, 2013, 12:27am

Hi,

Thanks for that! This was a happy surprise. I have one question though: if I use the new version in my test program (referred to in the third message of this thread), the output file is marked as PDF/A 1B if I request PDF/A 1A as output format. I am afraid that would still not be good enough for my customers, because they need to follow a guideline by the Dutch government, which specifies PDF/A 1A as the minimum acceptable compliance level for archiving.

While experimenting a bit, I found that one line of code appears to make the difference:

format = Aspose.Pdf.PdfFormat.PDF_A_1A
…
' Added line: with this call the output file is marked as PDF/A 1A
pdfdoc.RemoveMetadata()
pdfdoc.IgnoreCorruptedObjects = True
pdfdoc.Convert(msLog, format, erroraction)
pdfdoc.Save(sTargetFile)

When I add a call to RemoveMetadata() to the existing test code, the output format is PDF/A 1A as requested. Is this the correct way to ensure the desired output format, or is there another option that I can use to guarantee that the requested output format is applied?
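
Whether a file is "marked" as 1A or 1B comes down to the PDF/A identification entries in its XMP metadata (pdfaid:part and pdfaid:conformance). The following is a minimal Python sketch, not Aspose code, for reading that claim from an output file. The helper name `claimed_pdfa_level` is hypothetical, and the byte scan assumes the XMP packet is stored uncompressed, as PDF/A-1 files normally do. It reports only what the file claims and does not validate conformance.

```python
import re
from typing import Optional


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims in its XMP metadata, e.g. '1a' or '1b'.

    Hypothetical helper for illustration. It accepts both XMP spellings:
    element form  <pdfaid:part>1</pdfaid:part>
    attribute form pdfaid:part="1"   (\\x27 in the pattern is a single quote)
    """
    part = re.search(rb'pdfaid:part\s*(?:=\s*["\x27]|>)\s*(\d)', pdf_bytes)
    conformance = re.search(
        rb'pdfaid:conformance\s*(?:=\s*["\x27]|>)\s*([A-Za-z])', pdf_bytes
    )
    if part is None or conformance is None:
        # No PDF/A identification found: the file makes no PDF/A claim.
        return None
    return part.group(1).decode() + conformance.group(1).decode().lower()
```

Running this on the output produced with and without the RemoveMetadata() call would show which conformance level each file declares.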

Please note:

Test program and document are still available for download at: http://support.decos.nl/berend/AsposeTest-20120919.zip

(the zip contains the old Aspose.Pdf DLL and does not have the RemoveMetadata call)

I am aware that Aspose.Words now supports direct PDF/A 1A output. This is intentionally not used in the test program because we want to test a generic fallback for all input document formats.

Thanks and best regards,

Berend

codewarior · December 6, 2013, 9:53pm

Hi Berend,

Thanks for sharing the details.

I have tested the scenario and can reproduce the problem: the PDF/A_1a file is not being generated. I have logged this separately as PDFNEWNET-36132 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the fix. Please allow us a little time. We are sorry for this inconvenience.

aspose.notifier · June 5, 2014, 2:20pm

The issues you have found earlier (filed as PDFNEWNET-36132) have been fixed in Aspose.Pdf for .NET 9.3.0.
