Convert HTML to PDF/A-1

Hello.
Version: Aspose 24.2

I have an HTML file. I open it using Aspose.Words and save it as a PDF. Then, using Aspose.PDF.Drawing, I open the resulting PDF file and convert it to PDF/A-1 format with PDF version 1.4. However, after saving the file to disk, the PDF/A-1 format is not applied to the file.
If I run the same conversion to PDF/A-1 and version 1.4 again on that saved file, the PDF/A-1 format is applied.
Why doesn’t the conversion apply the first time?

528672e7d317429894c1a1d28cab11d0.zip (2.4 KB)

@PoweRDeaD

It sounds like the PDF/A-1 format is not applied on the first conversion attempt. This could be due to the way the conversion is handled in your code. To convert an HTML file to PDF/A-1 format using Aspose.Words and Aspose.PDF, you can follow the steps below.

Here’s a general approach to ensure that the conversion to PDF/A-1 is applied correctly:

  1. Convert HTML to PDF: First, use Aspose.Words to load your HTML file and save it as a PDF.

  2. Open the PDF with Aspose.PDF: After saving the PDF, open it using Aspose.PDF.

  3. Convert to PDF/A-1: Use the Convert method to convert the PDF to PDF/A-1 format, passing a log file path, the target format (for example PdfFormat.PDF_A_1B), and a ConvertErrorAction value.

Here is a sample code snippet that demonstrates this process:

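using System;

class Program
{
    static void Main()
    {
        // Load the HTML file with Aspose.Words and save it as PDF.
        // Aspose.Words and Aspose.Pdf both define a Document class,
        // so the type names are fully qualified to avoid an ambiguous reference.
        var wordsDocument = new Aspose.Words.Document("input.html");
        wordsDocument.Save("output.pdf");

        // Open the PDF document with Aspose.PDF
        using (var pdfDocument = new Aspose.Pdf.Document("output.pdf"))
        {
            // Convert to a PDF/A-1b compliant document; problems found are written to log.xml
            bool converted = pdfDocument.Convert(
                "log.xml",
                Aspose.Pdf.PdfFormat.PDF_A_1B,
                Aspose.Pdf.ConvertErrorAction.Delete);

            // Save the output document
            pdfDocument.Save("output_pdfa.pdf");

            Console.WriteLine("Conversion to PDF/A-1b returned: " + converted);
        }
    }
}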

Troubleshooting the Conversion Issue

If the PDF/A-1 format is not applied on the first attempt, consider the following:

  • Validation: Ensure that the PDF document meets the requirements for PDF/A-1 compliance. The conversion process includes validation, and if the document does not meet the criteria, it may not convert correctly.

  • Logging: Check the log file generated during the conversion process (log.xml). This file can provide insights into any issues encountered during the conversion.

  • Reconversion: As you mentioned, if you convert the resulting PDF again, it works. This might indicate that the initial PDF needs some adjustments or that the conversion process requires a second pass to ensure compliance.
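
As an independent cross-check of whether the saved file carries PDF/A-1 markers, the raw bytes can be inspected without any PDF library. The sketch below is only a heuristic, not a validator: it assumes the XMP metadata packet is stored uncompressed and in a single-byte-compatible encoding such as UTF-8, and the class and method names (PdfAProbe, Inspect, InspectFile) are made up for illustration. It looks for the %PDF-x.y header and for the pdfaid:part and pdfaid:conformance entries that identify a PDF/A file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Aspose API.
static class PdfAProbe
{
    // Returns the header version (e.g. "1.4") plus the XMP PDF/A part and
    // conformance values; each item is null when it was not found.
    public static (string Version, string Part, string Conformance) Inspect(byte[] data)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // stream content cannot break the decoding step.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(data);

        // Matches both the attribute form (pdfaid:part="1") and the
        // element form (<pdfaid:part>1</pdfaid:part>).
        Match header = Regex.Match(text, @"%PDF-(\d\.\d)");
        Match part = Regex.Match(text, @"pdfaid:part\s*(?:=\s*[""']|>)\s*(\d)");
        Match conformance = Regex.Match(text, @"pdfaid:conformance\s*(?:=\s*[""']|>)\s*([A-Za-z])");

        return (header.Success ? header.Groups[1].Value : null,
                part.Success ? part.Groups[1].Value : null,
                conformance.Success ? conformance.Groups[1].Value : null);
    }

    // Convenience overload that reads the file from disk first.
    public static (string Version, string Part, string Conformance) InspectFile(string path)
    {
        return Inspect(File.ReadAllBytes(path));
    }
}
```

Calling PdfAProbe.InspectFile on the file saved after the first conversion and again on the file saved after the second one shows whether the two differ in header version or PDF/A identification. A missing value only means the marker was not found in plain form; a full validator such as veraPDF remains the authoritative check.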

If you continue to face issues, please provide more details about the specific errors or behaviors you observe during the conversion process.

The PDF file obtained from the HTML file passes validation, and the conversion result is always true.

@PoweRDeaD
Let me investigate this issue; I’ll get back to you as soon as possible.

@PoweRDeaD
Thank you for your patience.
I tested the code you provided with version 25.3 of Aspose.PDF.Drawing and could not reproduce the issue, so it has probably already been fixed.
Here is the result of the first iteration of the PDF_A_1A conversion; Adobe Acrobat and veraPDF both report that no problems were found.
first_iteration_of_PDFA_convert.png (82.3 KB)
investigation_results.zip (207.0 KB)
The files are attached above, and this is the code I used:

Save(fileNameHtml, fileName, 100);

// First Convert
using (var pdfDocument = new Aspose.Pdf.Document(fileName))
{
    ConvertCollageToOtherPdfVersion(pdfDocument, PdfFormat.PDF_A_1A, PdfVersion.v_1_4);

    pdfDocument.Save(fileName1);
}
using (var pdfDocument = new Aspose.Pdf.Document(fileName1))
{
    Console.WriteLine(string.Format("Result of first document validation is {0}",
        pdfDocument.Validate(new MemoryStream(), PdfFormat.PDF_A_1A)));
}

// Convert again
using (var pdfDocument = new Aspose.Pdf.Document(fileName1))
{
    ConvertCollageToOtherPdfVersion(pdfDocument, PdfFormat.PDF_A_1A, PdfVersion.v_1_4);

    pdfDocument.Save(fileName2);
}

using (var pdfDocument = new Aspose.Pdf.Document(fileName2))
{
    Console.WriteLine(string.Format("Result of second document validation is {0}",
        pdfDocument.Validate(new MemoryStream(), PdfFormat.PDF_A_1A)));
}

Which version of Aspose.PDF.Drawing did you use?

Version Aspose 24.2

@PoweRDeaD
In that case, I would recommend upgrading to a newer version and checking whether the issue is still present.
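
To re-test after upgrading, a short check along these lines may help. It is a minimal sketch rather than the code from the attached archive: the file names are placeholders, and it calls Convert directly instead of the ConvertCollageToOtherPdfVersion helper, whose body is not shown in this thread. It prints the version of the Aspose.PDF assembly that is actually loaded, converts once, saves, and then reopens the saved file to report what is on disk.

```csharp
using System;
using Aspose.Pdf;

class RetestAfterUpgrade
{
    static void Main()
    {
        // Confirm which Aspose.PDF assembly version is loaded at run time.
        Console.WriteLine("Aspose.PDF assembly version: "
            + typeof(Document).Assembly.GetName().Version);

        // "output.pdf" is a placeholder for the PDF produced from the HTML file.
        using (var pdf = new Document("output.pdf"))
        {
            // Single conversion pass; issues are written to convert_log.xml.
            bool converted = pdf.Convert(
                "convert_log.xml", PdfFormat.PDF_A_1A, ConvertErrorAction.Delete);
            Console.WriteLine("Convert returned: " + converted);

            pdf.Save("first_pass.pdf");
        }

        // Reopen the saved file so the checks reflect the file on disk,
        // not the in-memory document that was just converted.
        using (var saved = new Document("first_pass.pdf"))
        {
            bool valid = saved.Validate("validate_log.xml", PdfFormat.PDF_A_1A);
            Console.WriteLine("Validate returned: " + valid);
            Console.WriteLine("IsPdfaCompliant: " + saved.IsPdfaCompliant);
            Console.WriteLine("PdfFormat: " + saved.PdfFormat);
        }
    }
}
```

Running this once with 24.2 and once with the newer version, on the same input file, gives a direct comparison of the first-pass result between the two versions.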