Error after reading PDF text: Aspose.Pdf.InvalidPdfFileFormatException: Incorrect file header

Hello, I am getting an error after I extract all the text from a PDF. The error is Aspose.Pdf.InvalidPdfFileFormatException: Incorrect file header.

We are using Aspose.PDF v25.1.0.

The document is a test Word document converted to PDF. It contains only two placeholders, which I am trying to validate.

Here is the code I found on your documentation page:

        static string ExtractTextFromDocument(string filepath)
        {
            var content = "";
            // Open PDF document
            using (var document = new Aspose.Pdf.Document(filepath))
            {
                // Create TextAbsorber object to extract text
                var textAbsorber = new Aspose.Pdf.Text.TextAbsorber();

                // Accept the absorber for all the pages
                document.Pages.Accept(textAbsorber);

                // Get the extracted text
                string extractedText = textAbsorber.Text;

                // Create a writer and open the file
                using (TextWriter tw = new StreamWriter(filepath))
                {
                    // Write a line of text to the file
                    content = extractedText;
                }
            }

            return content;
        }

The thing is, on the first call I get the text, but if I start my program again, it throws this exception.

Here are the exception details:

Aspose.Pdf.InvalidPdfFileFormatException: Incorrect file header
   bei #=zlYyarVQJMjpSyhehktU1mamPknAXDAwz2w==.#=zvyOI3NLrp5LW()
   bei #=zlYyarVQJMjpSyhehktU1mamPknAXDAwz2w==..ctor(Stream #=zwVGl0eI=, String #=zmTf2QaQ=, Boolean #=zQMj3m3ABoKZm)
   bei #=zlYyarVQJMjpSyhehktU1mamPknAXDAwz2w==..ctor(String #=zknqpjIY=)
   bei #=zIpW2EyzbsHVsvKtk$HBW7tgVPJ4cHEk7_3Vw24I=.#=z0zbOOc4=(String #=zknqpjIY=)
   bei #=zApFIQrNhKlNquO1I2tURgk$iDyBKKWVaVA==..ctor(String #=zknqpjIY=)
   bei #=zIpW2EyzbsHVsvKtk$HBW7tgVPJ4cHEk7_3Vw24I=.#=zbfTkVVvDZmFo(String #=zknqpjIY=)
   bei #=zg24GGpN3Sb5fWsHGy1SQMW2pMkKS.#=zqsgnhZI=(String #=zknqpjIY=)
   bei #=zg24GGpN3Sb5fWsHGy1SQMW2pMkKS..ctor(String #=zknqpjIY=)
   bei Aspose.Pdf.Document.#=zFh3k7lA=(String #=zknqpjIY=, String #=zmTf2QaQ=)
   bei Aspose.Pdf.Document..ctor(String filename)
   bei aspose.pdf_test.Program.ExtractTextFromDocument(String filepath)

The file I am trying with is here:
1.pdf (33.8 KB)

The file contains @@leftsigner and @@rightsigner placeholders, font “Times New Roman”, font size 1pt.

This reminds me to ask: are all font sizes supported (even 0.5pt)?

Kind regards,
Aleksandar

@adragon952

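The error you’re encountering, Aspose.Pdf.InvalidPdfFileFormatException: Incorrect file header, means that the file passed to the Aspose.Pdf.Document constructor does not begin with a valid PDF header, so it is corrupted, empty, or not a PDF at all. Because your first run returns the text and only later runs fail, the most likely cause is the extraction code itself rather than the Word-to-PDF conversion: new StreamWriter(filepath) opens the same path as the source PDF for writing, and that constructor overwrites an existing file. Nothing is ever written to the writer, so the PDF is probably left empty after the first call.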

Here are a few steps you can take to troubleshoot and resolve this issue:

  1. Check the PDF File: Ensure that the PDF file is not corrupted. Check its size on disk (the file you attached was 33.8 KB, so a size of 0 bytes means it has been overwritten) and try opening it in a PDF viewer to see if it displays correctly.

  2. Restore or Reconvert the Document: If the PDF file is indeed corrupted or empty, restore it from a backup or convert the Word document to PDF again, using a converter that adheres to PDF standards.

  3. File Path: Ensure that the file path you are providing to the Aspose.Pdf.Document constructor is correct and that the file exists at that location.

  4. File Permissions: Check if your application has the necessary permissions to read the file. Sometimes, file access issues can lead to unexpected errors.

  5. Font Size Support: Regarding your question about font sizes, Aspose.PDF supports a wide range of font sizes, including very small sizes like 0.5pt. However, rendering issues may arise with extremely small font sizes, so it’s advisable to test how they appear in your specific use case.

If you continue to experience issues after trying these steps, consider sharing the PDF file with the Aspose support team for further analysis.
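
The first check above can also be done in code before the file is handed to Aspose.PDF. The following is a minimal sketch that uses only System.IO; the class and method names (PdfHeaderCheck, LooksLikePdf) are hypothetical and not part of the Aspose API. It only tests whether the first five bytes are "%PDF-", which is enough to detect an empty or overwritten file but does not prove the rest of the document is valid.

```csharp
using System.IO;
using System.Text;

static class PdfHeaderCheck
{
    // Hypothetical helper: returns true when the file starts with "%PDF-".
    public static bool LooksLikePdf(string path)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        byte[] buffer = new byte[signature.Length];

        using (FileStream stream = File.OpenRead(path))
        {
            int total = 0;
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0)
                {
                    return false; // file is shorter than the signature (e.g. empty)
                }
                total += read;
            }
        }

        // Compare the bytes read against the expected signature.
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling PdfHeaderCheck.LooksLikePdf(filepath) at the start of ExtractTextFromDocument would let you report a clear "file is empty or not a PDF" message instead of the obfuscated stack trace.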

Here’s a code snippet for extracting text from a PDF document, which you can use as a reference. It is the documentation example without the StreamWriter block, because opening a writer on filepath overwrites the PDF that was just read:

static string ExtractTextFromDocument(string filepath)
{
    // Open PDF document; the source file is only read, never written to
    using (var document = new Aspose.Pdf.Document(filepath))
    {
        // Create TextAbsorber object to extract text
        var textAbsorber = new Aspose.Pdf.Text.TextAbsorber();

        // Accept the absorber for all the pages
        document.Pages.Accept(textAbsorber);

        // Return the extracted text. Do not open a StreamWriter on filepath:
        // that would truncate the PDF itself.
        return textAbsorber.Text;
    }
}
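
If you also want to keep the extracted text on disk, write it to a path that differs from the PDF. The sketch below is one way to do that with plain System.IO. The names ExtractedTextWriter and SaveBesidePdf are hypothetical, and the choice of a .txt file next to the PDF is an assumption, not something the Aspose documentation prescribes.

```csharp
using System;
using System.IO;

static class ExtractedTextWriter
{
    // Hypothetical helper: saves text to "<name>.txt" beside the PDF
    // and returns the path of the text file.
    public static string SaveBesidePdf(string pdfPath, string text)
    {
        string outputPath = Path.ChangeExtension(pdfPath, ".txt");

        // Guard against overwriting the source when it already ends in ".txt".
        if (string.Equals(Path.GetFullPath(outputPath),
                          Path.GetFullPath(pdfPath),
                          StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("Output path would overwrite the source file.", nameof(pdfPath));
        }

        File.WriteAllText(outputPath, text);
        return outputPath;
    }
}
```

For example, ExtractedTextWriter.SaveBesidePdf(filepath, content) called with the path of 1.pdf would create 1.txt in the same folder and leave 1.pdf untouched.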

Make sure to handle exceptions properly in your code to catch any issues that may arise during the text extraction process.

If you have any further questions or need additional assistance, feel free to ask!