Convert PDF to PDF/A-2A not valid

Hello Support Team

We use the conversion method of Aspose.PDF version 24.7 to convert PDFs to PDF/A-2A. To be able to assure the customer that the converted PDFs really are PDF/A-2A, we also use Aspose's validation method.

Conversion and validation code:

 try
 {
     temp_doc.Info.ModDate = DateTime.Now;
     temp_doc.Convert(filename, settings.PdfSaveSettings.PdfFormat, ConvertErrorAction.None); //pdfFormat = PDF_A_2A

 }
...

 var valid_Message = "";

 pdfIsValid = ValidatePDF(temp_doc, settings, filename);
 if (pdfIsValid)
 {
     valid_Message = $" It is {settings.PdfSaveSettings.PdfFormat} valid.";
 }

Validation method:

    public static Boolean ValidatePDF(Document doc, AsposeSettings settings, string filename)
    {
        Logger.Debug("Start of ValidatePDF method.");
        // The first argument is the path of the validation log file.
        pdfIsValid = doc.Validate(filename, settings.PdfSaveSettings.PdfFormat); //pdfFormat = PDF_A_2A
        if (!pdfIsValid)
        {
            return false;
        }

        RunConvert.pdfAValid++;
        Logger.Debug("End of ValidatePDF method.");
        return true;
    }

As a result, the log shows that only 176 of the 2010 documents are PDF/A-2A valid; the rest were converted but did not pass validation.

image.png (120.0 KB)

When we open the documents that are not PDF/A-2A valid, we see the message ‘This file claims compliance with the PDF/A standard and has been opened read-only to prevent modification.’ (see the cloud link under ‘pdf-2a not valid → before and after converting’).

image.png (25.8 KB)

I have also added examples of valid conversions in the cloud link (see ‘pdf-2a valid → before and after converting’ ).

We have also analysed some of the documents that Aspose reports as not PDF/A-2A valid with other tools, and some of them are valid according to those tools (see the cloud link under ‘pdf-2a not valid → pdfa-2a valid with other tools than aspose’).

We have the following questions in this regard:

  • Why is the majority of the documents not PDF/A-2A valid?
  • What exactly does it mean if the Aspose Validate function does not consider a document PDF/A-2A valid, but the document claims to be PDF/A? What would be the consequences for the customer?
  • Are there other methods that we have not used in our code that would lead to better conversion or validation?
  • How can we convert all PDFs to valid PDF/A-2A, so that the majority and not the minority is valid?

The sample application code is also available in the cloud link if needed.
Link to the mentioned examples: ImageWare-Nextcloud
pw: dnXSobKz

Thanks in advance for a quick reply

@hasanirmak
To check and reproduce the problems, I used the following code.

var doc = new Document(dataDir + "00-Submitting Indexer sizing requests – iManage Support.pdf");
doc.Convert(dataDir + "report.xml", PdfFormat.PDF_A_2A, ConvertErrorAction.Delete);
bool isValidated = doc.Validate(new MemoryStream(), PdfFormat.PDF_A_2A);
Console.WriteLine("isValidated : " + isValidated);
doc.Save(dataDir + "00_out.pdf");

It is essentially the same as your code, with minor variations (line/stream, etc.).
However, I would like to point out that it is advisable to use ConvertErrorAction.Delete rather than ConvertErrorAction.None; in that case, the library tries to fix errors in the source document.

(For example, the log file here shows which inconsistencies were found in the source document 01…, which was converted to a valid PDF/A document.)
Report.png (35.0 KB)

The message shown when the document is opened does not mean that the document fully complies with the standard. The main tool we use to check the result is Adobe Preflight.
For 00.png (73.1 KB)

For the three problematic documents presented: (checked with the library version 24.10)

  • 00-Submitting Indexer… - the resulting document does not pass the Preflight check, so the Validate() result of false is correct. Task PDFNET-58359 has been created for the development team.
  • 01-1 Introduction to the… - the resulting document passes the Preflight check, but Validate() returns false. Task PDFNET-58361 has been created for the development team.
  • 11-SecurityBoost.pdf - the resulting document does not pass the Preflight check, so the Validate() result of false is correct. Task PDFNET-58360 has been created for the development team.

@hasanirmak
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-58359,PDFNET-58361,PDFNET-58360

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hi Sergei
I have checked it with the variant you suggested and get the following results:

Aspose PDF version 24.7: I get 11 out of 107 documents as PDF_A_2A valid.

With Aspose PDF version 24.10: I get 11 out of 107 documents as PDF_A_2A valid.

I can see various errors in the XML report, but ConvertErrorAction.Delete does not seem to have been able to fix or remove them, and still very few documents are PDF_A_2A valid.

Do you have any other suggestions for getting better results?

Best regards

@hasanirmak

A possible reason is missing fonts. In short: the original document may lack definitions for some of the fonts it uses, but a PDF/A document must contain the font definitions.
You can see this in the reports. A missing font definition in the document corresponds to lines like:

<Problem Severity="Error" Clause="6.3.4" ObjectID="1151" Page="32" Convertable="False">Font 'ArialMT' is not embedded</Problem>

If Convertable="True", the font definition was found; if it is "False", it was not.
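
To see quickly whether fonts are the main problem across many reports, the report entries can be counted per font. The following is only a minimal sketch: it assumes every entry has the shape of the line quoted above (a Problem element with a Convertable attribute and the text "Font '…' is not embedded"), and the class and method names (ConversionReport, MissingFonts) are hypothetical helpers, not part of Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not an Aspose.PDF API.
public static class ConversionReport
{
    private const string Prefix = "Font '";
    private const string Suffix = "' is not embedded";

    // Returns each font reported as not embedded with Convertable="False",
    // together with the number of report entries that mention it.
    public static Dictionary<string, int> MissingFonts(string reportXml)
    {
        var report = XDocument.Parse(reportXml);
        return report.Descendants()
            // Match on the local name so a default XML namespace does not matter.
            .Where(e => e.Name.LocalName == "Problem")
            .Where(e => string.Equals((string)e.Attribute("Convertable"), "False",
                                      StringComparison.OrdinalIgnoreCase))
            .Select(e => e.Value.Trim())
            .Where(t => t.StartsWith(Prefix, StringComparison.Ordinal)
                     && t.EndsWith(Suffix, StringComparison.Ordinal))
            // Cut the font name out from between the fixed prefix and suffix.
            .Select(t => t.Substring(Prefix.Length, t.Length - Prefix.Length - Suffix.Length))
            .GroupBy(name => name)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Calling ConversionReport.MissingFonts(File.ReadAllText(reportPath)) for each report (reportPath being whatever log file name was passed to Convert) gives the list of font names that would need a font source or a substitution.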

You can see ways to solve this problem in one of my answers in this forum thread. PDF conversion loosing data - #8
(where I provide the development team’s answer)

I will wait for your answer.

Hi Sergei

When I try to open the suggested link, I get the following: Oops! That page is private.

image.png (25.9 KB)

@hasanirmak
Sorry, I forgot that this is a private topic.
I will also provide the response from the development team here.

The source document has two problems that prevent its conversion to PDF/A-2b. First, it has XMP metadata associated with its pages, that contains some non-standard properties. This problem will be addressed in the 24.4 version, the pages metadata will now be updated to contain only standard entries.

Second, the document doesn’t contain a definition for the font “MinionPro-Regular”, therefore it isn’t possible to make a valid PDF/A document unless the missing font or its substitution is provided. To create a valid PDF/A-2b document beginning with the 24.4 version the customer may use one of the following techniques:

  • Use a default substitution font instead (for all text that uses MinionPro-Regular, the font will be changed to the Times New Roman):
var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);

// Replace the inaccessible MinionPro-Regular with the default substitution font (Times New Roman)
options.FontEmbeddingOptions.UseDefaultSubstitution = true;

tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");

tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");

  • Replace MinionPro-Regular with a font of your choice (Arial in this example):
// Replace the inaccessible MinionPro-Regular with the user-chosen font
FontRepository.Substitutions.Add(new SimpleFontSubstitution("MinionPro-Regular", "Arial"));

var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);

tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");

tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");
  • Provide the external font definition for the MinionPro-Regular font if it’s not installed in the system:
// Add the folder containing the MinionPro-Regular font definition to the list of font sources
FontRepository.Sources.Add(new FolderFontSource("path_to_the_folder_with_the_font"));

var tempDoc = new Aspose.Pdf.Document(dataDir + "Produksjonsformat.pdf");
tempDoc.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
var options = new PdfFormatConversionOptions(PdfFormat.PDF_A_2B);

tempDoc.Convert(options);
if (!tempDoc.Validate(new MemoryStream(), PdfFormat.PDF_A_2B))
    Console.WriteLine("not validate");

tempDoc.Save(dataDir + "Produksjonsformat-converted.pdf");

Hi again

I have also searched the forum for “PDF conversion loosing data” and found one relevant thread, which manipulates the document in order to make it PDF/A valid.

In our case the document cannot be manipulated; it should be converted as it is.

Best regards

@hasanirmak
First of all, you need to check whether the conversion errors are related to fonts, as I described in the previous post.

Hi Sergei

I followed the suggestions regarding fonts and, with the following code, have been able to get a large number of the documents to validate as PDF/A-2A.



 FontRepository.Sources.Add(new FolderFontSource(ConfigHelper.FontPath));
 foreach (var font in ConfigHelper.ListofFonts)
 {
     FontRepository.Substitutions.Add(new SimpleFontSubstitution(font.Key, font.Value)); //the font.value for all of the fonts is for now Arial
 }

var temp_stream = new MemoryStream();
var result = new ConvertedResult();
try
{
    var documentStream = org_doc.InMemoryStream;
    temp_doc = new Document(documentStream);
    temp_doc.Form.Type = AsposeForm.FormType.Standard;

    var options = new PdfFormatConversionOptions(settings.PdfSaveSettings.PdfFormat);
   
    options.FontEmbeddingOptions.UseDefaultSubstitution = true;
    options.ErrorAction = ConvertErrorAction.Delete;
    options.LogFileName = $"{org_doc.OutPutPath}\\{org_doc.Name}_report.xml";
    bool isValidated = temp_doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
    int count=0;
    while (isValidated == false && count < 2)
    {
        temp_doc.Convert(options);
        //temp_doc.Convert($"{org_doc.OutPutPath}\\{org_doc.Name}_report.xml", settings.PdfSaveSettings.PdfFormat, ConvertErrorAction.Delete);
        isValidated = temp_doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
        count++;
    }
    using (var resultPdfStream = new MemoryStream())
    {
        temp_doc.Save(resultPdfStream);
        var valid_Message = "";
        isValidated = temp_doc.Validate(new MemoryStream(), settings.PdfSaveSettings.PdfFormat);
        if (isValidated)
        {
            valid_Message = $" It is {settings.PdfSaveSettings.PdfFormat} valid after {count} iterations.";
            RunConvert.pdfAValid++;
        }
        else
        {
            valid_Message = $" It is not valid after {count} iterations.";
        }
        result.ConvertedDocument = resultPdfStream.ToArray();
        result.Result = Result.Successful;
        result.Message = $"PDF Conversion was successful. {valid_Message}";
    }

For some of them, however, the result is not valid after the first conversion, and I have to pass the document through the converter a second time before it is finally valid.

I converted 107 documents: 3 of them are not converted at all and generate errors (see the error file), and some are converted but, despite the font adjustment, are not PDF/A-2A valid (see the report file).

Based on your example, the question is why I could not get a valid PDF/A-2A document on the first pass, but only after the second conversion (e.g. 31 of 92 documents were affected).

How can I bring the remaining (not valid) documents into a valid state, including those with a permission problem?

Best regards
Error.zip (577.3 KB)

report.zip (11.0 KB)

@hasanirmak
I have already finished work for this week, so unfortunately I cannot answer today. I will look into the issue on Monday and write to you.

@hasanirmak
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-58481


@hasanirmak
I checked the conversion of the three documents in the Error archive.
I tested on Windows, in a .NET 6 project, with Aspose.Pdf library version 24.10. I used the following code:

var doc = new Document(dataDir + "02. OpenText Archive Server 10.1.1 Storage Platforms Release Notes.pdf");
doc.Convert(dataDir + "report.xml", PdfFormat.PDF_A_2A, ConvertErrorAction.Delete);
bool isValidated = doc.Validate(new MemoryStream(), PdfFormat.PDF_A_2A);
Console.WriteLine("isValidated : " + isValidated);
doc.Save(dataDir + "02.pdf");

For two of the files, the result passes Adobe Preflight validation, which we use as our reference. For the document that does not pass the check after conversion (“02. OpenText Archive Server 10.1.1 Storage Platforms Release Notes.pdf”), I created task PDFNET-58481 for the development team. I used only one conversion pass.

Please check whether this code works properly in your environment (for the two documents that I converted successfully).

Hi Sergei

We are using .NET Framework 4.8 with Aspose.Pdf library versions 24.7 and 24.10. Can you please check this as well and see whether it is reproducible?
We can’t switch the whole project to .NET Core in a hurry.

Best regards

@hasanirmak
Regarding .NET Framework 4.8: please note that the latest versions of the library support only .NET Framework 4.8.1, so I recommend downloading the corresponding library from here: Download .NET Component DLL to Process PDF | Aspose.PDF API
For old Net Framework.png (46.2 KB)

However, in this case, when I created an empty .NET Framework 4.8 project and added the library via NuGet, the conversion of the two documents mentioned above was successful.
I am attaching them so that we can be sure we are talking about the same documents.
01. OpenText Archive Server 10.1.1 Release Notes.pdf (194.2 KB)

01.OpenText_Runtime_and_Core_Services&_Directory_Services_10.2.1_Release_Notes.pdf (240.1 KB)