Convert PDF to PDF/A-2A not valid

Hello Support Team

We use the conversion method of Aspose.PDF (version 24.7) to convert PDF to PDF/A-2A. To assure the customer that the converted PDFs are PDF/A-2A compliant, we also use Aspose's validation method.

Conversion and validation usage:

 try
 {
     temp_doc.Info.ModDate = DateTime.Now;
     temp_doc.Convert(filename, settings.PdfSaveSettings.PdfFormat, ConvertErrorAction.None); //pdfFormat = PDF_A_2A

 }
...

 var valid_Message = "";

 pdfIsValid = ValidatePDF(temp_doc, settings, filename);
 if (pdfIsValid)
 {
     valid_Message = $" It is {settings.PdfSaveSettings.PdfFormat} valid.";
 }

Validation method:

    public static Boolean ValidatePDF(Document doc, AsposeSettings settings, string filename)
    {
        Logger.Debug("Start of ValidatePDF method.");
        // Validate against the configured format; the validation log is written to filename.
        pdfIsValid = doc.Validate(filename, settings.PdfSaveSettings.PdfFormat); // pdfFormat = PDF_A_2A
        if (!pdfIsValid)
        {
            return false;
        }

        RunConvert.pdfAValid++; // count the documents that passed validation
        Logger.Debug("End of ValidatePDF method.");
        return true;
    }

As a result, we can see in the log that only 176 of the 2010 documents are PDF/A-2A valid; the rest were converted but did not pass validation.

image.png (120.0 KB)

As soon as we open the documents that are not PDF/A-2A valid, we see this message: ‘This file claims compliance with the PDF/A standard and has been opened read-only to prevent modification.’ (see in the cloud link under ‘pdf-2a not valid → before and after converting’).

image.png (25.8 KB)

I have also added examples of valid conversions in the cloud link (see ‘pdf-2a valid → before and after converting’ ).

We also took some of the documents that Aspose reports as not PDF/A-2A valid and analysed them with other tools. Some of them are reported as valid by those tools, unlike with Aspose (see in the cloud link under ‘pdf-2a not valid → pdfa-2a valid with other tools than aspose’).

We have the following questions in this regard:

  • Why is the majority of the documents not PDF/A-2A valid?
  • What exactly does it mean if the Aspose Validate function does not consider a document PDF/A-2A valid but the document claims to be PDF/A? What would be the consequences for the customer?
  • Are there other methods that we have not used in our code that would lead to better conversion or validation?
  • How can we convert all PDFs to valid PDF/A-2A so that the majority, and not the minority, is valid?

The sample application code is also available in the cloud link if needed.
Link to the mentioned examples: ImageWare-Nextcloud
pw: dnXSobKz

Thanks in advance for a quick reply

@hasanirmak
To check and reproduce the problems, I used the following code.

var doc = new Document(dataDir + "00-Submitting Indexer sizing requests – iManage Support.pdf");
doc.Convert(dataDir + "report.xml", PdfFormat.PDF_A_2A, ConvertErrorAction.Delete);
bool isValidated = doc.Validate(new MemoryStream(), PdfFormat.PDF_A_2A);
Console.WriteLine("isValidated : " + isValidated);
doc.Save(dataDir + "00_out.pdf");

It is essentially the same as your code, with minor variations (line/stream, etc.).
However, I would like to point out that it is advisable to use the ConvertErrorAction.Delete parameter (and not ConvertErrorAction.None). In that case, the library tries to fix errors in the source document.

For example, the log file here shows which inconsistencies were found in the source document 01…, which was converted to a valid PDF/A document.
Report.png (35.0 KB)

The message shown when the document is opened does not mean that it fully complies with the standard. The main tool we use to check the result is Adobe Preflight.
For 00.png (73.1 KB)

For the three problematic documents provided (checked with library version 24.10):

  • 00-Submitting Indexer… - the resulting document does not pass the Preflight check, so the Validate() result of false is correct. Task PDFNET-58359 has been created for the development team.
  • 01-1 Introduction to the… - the resulting document passes the Preflight check, but Validate() returns false. Task PDFNET-58361 has been created for the development team.
  • 11-SecurityBoost.pdf - the resulting document does not pass the Preflight check, so the Validate() result of false is correct. Task PDFNET-58360 has been created for the development team.

@hasanirmak
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-58359,PDFNET-58361,PDFNET-58360

Hi Sergei
I have checked it with the variant you suggested and get the following results:

Aspose PDF version 24.7: I get 11 out of 107 documents as PDF_A_2A valid.

With Aspose PDF version 24.10: I get 11 out of 107 documents as PDF_A_2A valid.

I can see that there are various errors in the xml report, but with ConvertErrorAction.Delete it doesn’t seem to have been able to fix or delete them, and there are still very few PDF_A_2A valid documents to be seen.

Do you have any other suggestions for getting better results?

Best regards

@hasanirmak

One possible reason is missing fonts. In short: the original document may use a font without containing its description, but a PDF/A document must contain the font description.
You can see this happening in the reports. This situation (a missing font description in the document) corresponds to lines like:

<Problem Severity="Error" Clause="6.3.4" ObjectID="1151" Page="32" Convertable="False">Font 'ArialMT' is not embedded</Problem>

If Convertable="True", the font description was found; if Convertable="False", it was not.
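
To check this across a whole batch, the report can be scanned for problems marked Convertable="False". The following is only a minimal sketch, assuming the report written by Convert() is well-formed XML and that its Problem elements look like the line above. The ReportScan class, its method name and the Report wrapper element in the sample are made up for illustration and are not part of Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ReportScan
{
    // Returns the message text of every <Problem> element whose
    // Convertable attribute is "False" (compared case-insensitively).
    // Assumption: <Problem> elements may sit anywhere in the report,
    // so the whole tree is searched and XML namespaces are ignored.
    public static List<string> UnfixableProblems(string reportXml)
    {
        return XDocument.Parse(reportXml)
            .Descendants()
            .Where(e => e.Name.LocalName == "Problem")
            .Where(p => string.Equals((string)p.Attribute("Convertable"), "False",
                                      StringComparison.OrdinalIgnoreCase))
            .Select(p => p.Value.Trim())
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical report fragment: only the first <Problem> line comes
        // from the thread; the wrapper element and second entry are invented.
        string sample = @"<Report>
  <Problem Severity=""Error"" Clause=""6.3.4"" ObjectID=""1151"" Page=""32"" Convertable=""False"">Font 'ArialMT' is not embedded</Problem>
  <Problem Severity=""Error"" Clause=""6.3.4"" ObjectID=""1152"" Page=""33"" Convertable=""True"">Font 'Example' is not embedded</Problem>
</Report>";

        // Group identical messages so each missing font is listed once with a count.
        foreach (var group in UnfixableProblems(sample).GroupBy(m => m))
            Console.WriteLine($"{group.Count()} x {group.Key}");
    }
}
```

In a real run, the string would come from the report file passed to Convert() (for example via File.ReadAllText). If most of the listed messages name fonts, the font substitution options are the place to look first.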

You can see ways to solve this problem in one of my answers in this forum thread (where I provide the development team’s answer): PDF conversion loosing data - #8

I will wait for your answer.

Hi Sergei

When I try to open the suggested link, I get the following: Oops! That page is private.

image.png (25.9 KB)

@hasanirmak
Sorry, I forgot that this is a private topic.
I will also provide the response from the development team here.

The source document has two problems that prevent its conversion to PDF/A-2b. First, it has XMP metadata associated with its pages that contains some non-standard properties. This problem will be addressed in version 24.4: the page metadata will be updated to contain only standard entries.

Second, the document doesn’t contain a definition for the font “MinionPro-Regular”, so it isn’t possible to make a valid PDF/A document unless the missing font or a substitute for it is provided. To create a valid PDF/A-2b document, beginning with version 24.4 the customer may use one of the following techniques:

  • Use a default substitution font instead (for all text that uses MinionPro-Regular, the font will be changed to Times New Roman):
var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);

// Replace the inaccessible MinionPro-Regular with the default substitution font (Times New Roman)
options.FontEmbeddingOptions.UseDefaultSubstitution = true;

tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");

tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");
  • Use a font of your choice as the substitution (here, Arial replaces MinionPro-Regular):
// Replace the inaccessible MinionPro-Regular with the user chosen font
FontRepository.Substitutions.Add(new SimpleFontSubstitution("MinionPro-Regular", "Arial"));

var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);

tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");

tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");
  • Provide the external font definition for the MinionPro-Regular font if it’s not installed in the system:
// Add the folder containing the MinionPro-Regular font definition to the list of font sources
FontRepository.Sources.Add(new FolderFontSource("path_to_the_folder_with_the_font"));

var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);

tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");

tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");

Hi again

I have also searched for “PDF conversion loosing data” in the forum and found one relevant thread, which tries to manipulate the document in order to make it PDF/A valid :smiley:

In our case the document can’t be manipulated and should be converted as it is :slight_smile:

Best regards

@hasanirmak
First of all, you need to check whether the conversion errors are related to fonts, as I described in the previous post.