Can't update a PDF saved from a browser (in .NET Standard)

I’ve been creating PDFs by rendering HTML in a headless browser and saving the result as PDF. I then edit them with Aspose.PDF: joining multiple such PDFs together, removing or adding blank pages, and adding bookmarks.

This has worked fine while the code targeted .NET 8.0, using that version of the Aspose.PDF library. To integrate with another application, though, the code needs to work with .NET 4.7.2. I’ve tried copying all the code across to an application using the .NET 4.7.2 version of the Aspose.PDF library, and also creating a .NET Standard library using that version of Aspose.PDF, but both have the same problem: when loading the PDF, editing it, and then trying to save it, I get this error:

The type initializer for 'Aspose.Pdf.XmpField' threw an exception.
at :
at Aspose.Pdf.XmpField.get_Empty()
at #=zpXEIdHLgkPw3fKjOYyrucu6fwt4dd4T84WDqpK8=..ctor(#=z9hy_0p4Z4LUzlfl7t2QlpIYnQzI9_pDtPpOWAHo= #=zED89FWg=, Boolean #=zM2h5tgUxpdMv)
at #=zXypT$ksaYp24tjmzJ8ficb0Jo3Hu2GsXKk_7hcY=..ctor()
at #=zwQrdiNT4PE4SglHQd3fHO9XabPyvmfOm4VukXeA=..ctor()
at #=zrgh05faJJpAzO38O43LesOcAUoRKBz1wA_fOFsQ=.#=zKy33UWpywBGP()
at Aspose.Pdf.Document.get_Metadata()
at Aspose.Pdf.Document.get_IsPdfaCompliant()
at Aspose.Pdf.Document.#=z$BLJK39N59ot(Stream #=zbB05Mdw=, SaveOptions #=zVDJEEr4wyWHC)
at Aspose.Pdf.Document.#=z5M1wyJJj0io5(Stream #=zbB05Mdw=, SaveOptions #=zVDJEEr4wyWHC)
at Aspose.Pdf.Document.Save(Stream output)
at Paged.PdfFromPaged.CreatePdfFromHtml(String dmc, Int32 pageOffset, PdfFromPagedSettings settings, Boolean addToPdfList, Boolean separateOutFoldouts)

This has happened when trying to delete a page:

    Document mainDoc = new Document(pdfPath);
    mainDoc.Pages.Delete(numberOfPages);
    using (MemoryStream ms = new MemoryStream())
    {
        mainDoc.Save(ms);
        // Write the MemoryStream's contents back to the original file
        File.WriteAllBytes(pdfPath, ms.ToArray());
    }

Previously I tried to delete the page using the PdfFileEditor.Delete function. The code shown is already an attempt to get around the XmpField.get_Empty() exception, but it seems to happen no matter how I try to change the file using Aspose.PDF. If I avoid this piece of code altogether, I get the same exception when I later try to combine a set of PDFs using PdfFileEditor. It doesn’t cause a problem with PDF files that Aspose has itself created, only (seemingly) with anything created from a browser, though I can’t tell what is different about them.

@DavidWebX

You are hitting this error when editing browser-generated PDFs with the Aspose.PDF library in a .NET 4.7.2 environment. The stack trace shows the exception being thrown by the static initializer of Aspose.Pdf.XmpField, which Document.Save reaches through Document.IsPdfaCompliant and Document.Metadata. That points at XMP metadata handling: either a compatibility problem in that build of the library, or something specific about the metadata in the PDFs created by the browser.
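The message in the question is that of a System.TypeInitializationException, which only reports that a static constructor failed; the underlying cause is carried in its InnerException. A minimal sketch for surfacing it, using only the .NET base library (ExceptionDiagnostics and DescribeChain are hypothetical names for illustration, not part of Aspose.PDF):

```csharp
using System;
using System.Text;

public static class ExceptionDiagnostics
{
    // Walks the InnerException chain and returns one line per exception,
    // outermost first, so the root cause ends up on the last line.
    public static string DescribeChain(Exception ex)
    {
        var sb = new StringBuilder();
        int depth = 0;
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            sb.Append(' ', depth * 2)                  // indent by nesting level
              .Append(current.GetType().FullName)
              .Append(": ")
              .AppendLine(current.Message);
            depth++;
        }
        return sb.ToString();
    }
}
```

Calling DescribeChain from a catch block around mainDoc.Save(ms) and logging the result may show what the XmpField initializer actually failed on, which is usually more informative than the outer message alone.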

Here are a few suggestions to troubleshoot and potentially resolve the issue:

  1. Check PDF Compatibility: Ensure that the PDFs generated from the browser are compliant with the PDF standards that Aspose.PDF expects. Sometimes, PDFs created by different tools may have unique structures or metadata that can cause issues when processed by Aspose.

  2. Use PdfFileEditor: You mention that PdfFileEditor.Delete already fails in the same way, so this may not change the outcome, but it is worth confirming with a minimal call that does not go through the Document object in your own code. This class is designed for editing existing PDF files and may handle certain edge cases better. Here’s a sample code snippet for deleting pages using PdfFileEditor:

    using Aspose.Pdf;
    using Aspose.Pdf.Facades;
    
    // Create PdfFileEditor object
    PdfFileEditor pdfEditor = new PdfFileEditor();
    // Array of pages to delete
    int[] pagesToDelete = new int[] { 1, 2 }; // Specify the pages you want to delete
    // Delete pages
    pdfEditor.Delete("input.pdf", pagesToDelete, "output.pdf");
    
  3. Update Aspose.PDF Library: Ensure that you are using the latest version of the Aspose.PDF library compatible with .NET 4.7.2. Sometimes, bugs are fixed in newer releases that could resolve your issue.

  4. Examine PDF Metadata: If possible, inspect the metadata of the PDFs generated from the browser. You can use tools like Adobe Acrobat or other PDF inspection tools to see if there are any anomalies that might be causing the issue.

  5. Test with Different PDFs: As you mentioned that the issue does not occur with PDFs created by Aspose, try to create a simple PDF using Aspose and then perform the same operations on it. This can help isolate whether the problem is with the PDF structure or with the Aspose library itself.
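For suggestion 3, it can help to confirm which build of the library is actually loaded at run time, since a .NET 4.7.2 application and a .NET Standard library may resolve different assemblies from the same package. A small sketch using only reflection (LoadedAssemblyInfo and Describe are hypothetical names; whether the Aspose.PDF assembly carries a TargetFrameworkAttribute is an assumption, so the code falls back to a placeholder when it is absent):

```csharp
using System;
using System.Reflection;
using System.Runtime.Versioning;

public static class LoadedAssemblyInfo
{
    // Reports the name, version, target framework, and file location
    // of the assembly that defines the given type.
    public static string Describe(Type type)
    {
        Assembly assembly = type.Assembly;
        AssemblyName name = assembly.GetName();

        // The attribute is optional, so handle its absence explicitly.
        TargetFrameworkAttribute target =
            assembly.GetCustomAttribute<TargetFrameworkAttribute>();
        string framework = target != null
            ? target.FrameworkName
            : "(no TargetFrameworkAttribute)";

        return name.Name + " " + name.Version
            + " | built for " + framework
            + " | loaded from " + assembly.Location;
    }
}
```

In the failing application this would be called as LoadedAssemblyInfo.Describe(typeof(Aspose.Pdf.Document)); comparing its output with the same call in the working .NET 8.0 project shows whether the two are running the same version and build.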

If these suggestions do not resolve the issue, consider reaching out to Aspose support with specific details about the PDF files you are working with, as they may provide more targeted assistance.
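When gathering those details, one quick comparison between a browser-generated PDF and one created by Aspose is whether each file contains an XMP packet at all. The sketch below scans the raw bytes for the standard `<?xpacket begin` and `<?xpacket end` markers. It assumes the metadata stream is stored uncompressed, which is common but not guaranteed, so a null result does not prove the file has no XMP. XmpPacketFinder and ExtractFirstPacket are hypothetical names, not Aspose.PDF APIs:

```csharp
using System;
using System.Text;

public static class XmpPacketFinder
{
    // Returns the first XMP packet visible in the raw PDF bytes,
    // or null if no complete, uncompressed packet is found.
    public static string ExtractFirstPacket(byte[] pdfBytes)
    {
        // Latin-1 maps each byte to exactly one char, so string
        // offsets match byte offsets in the file.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        int start = raw.IndexOf("<?xpacket begin", StringComparison.Ordinal);
        if (start < 0) return null;

        int end = raw.IndexOf("<?xpacket end", start, StringComparison.Ordinal);
        if (end < 0) return null;

        int close = raw.IndexOf("?>", end, StringComparison.Ordinal);
        if (close < 0) return null;

        // Decode the packet as UTF-8, the usual encoding for XMP.
        return Encoding.UTF8.GetString(pdfBytes, start, close + 2 - start);
    }
}
```

Running XmpPacketFinder.ExtractFirstPacket(File.ReadAllBytes(pdfPath)) on one file of each kind and including the two results in a support request gives a concrete starting point for comparing their metadata.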
