System.IndexOutOfRangeException: Index was outside the bounds of the array while trying to use Page.Accept() method

Hello,

I am trying to replace text in a PDF file based on a regular expression, using the code from the documentation page.

This is what my method looks like:

private static void RedactGivenTextFromContent(Document pdfDocument, string textToRedact)
{
    var regexPattern = new Regex(Regex.Escape(textToRedact), RegexOptions.IgnoreCase);
    var textFragmentAbsorber = new TextFragmentAbsorber(regexPattern, new TextSearchOptions(true));

    pdfDocument.Pages.Accept(textFragmentAbsorber);

    foreach (var textFragment in textFragmentAbsorber.TextFragments)
    {
        textFragment.Text = RedactedText;
    }
}

This method works fine for 99.9% of the PDF files I have tried to process. However, when I attempt to process one particular file, I get the following exception: System.IndexOutOfRangeException: Index was outside the bounds of the array while trying to use Page.Accept() method.

I am unsure what is causing this issue, but I was wondering if anyone has encountered a similar issue. I am on Aspose version 23.7.0.

Unfortunately, I am unable to share the PDF I am having issues with due to privacy reasons.

Thank you in advance.

@deejayjay

The System.IndexOutOfRangeException you are encountering while using the Page.Accept() method could be due to several reasons, especially since it works for most PDF files but fails for a specific one. Here are some potential causes and solutions to consider:

  1. Corrupted PDF File: The PDF file you are trying to process might be corrupted or not well-formed. You can try opening the PDF in a viewer to see if it displays correctly. If it doesn’t, you may need to repair the PDF or use a different file.

  2. Page Count: Ensure that the PDF document has pages. If the document is empty or has no pages, calling Accept() on an empty page collection could lead to this exception. You can check the page count before calling the method:

    if (pdfDocument.Pages.Count > 0)
    {
        pdfDocument.Pages.Accept(textFragmentAbsorber);
    }
    else
    {
        Console.WriteLine("The document has no pages.");
    }
    
  3. Text Fragment Absorber Configuration: Check whether the regular expression itself is involved. A pattern that matches nothing would normally just leave the TextFragments collection empty rather than throw, so this is a less likely cause, but you can test with a simpler search text to rule the pattern out.

  4. Aspose Version: You mentioned you are using Aspose version 23.7.0. Ensure that you are using the latest version of the library, as bugs are often fixed in newer releases. Check the Aspose.PDF for .NET documentation for any updates or changes in the API.

  5. Debugging: You can add try-catch blocks around the Accept() method to catch exceptions and log more details about the PDF being processed. This might give you more insight into what is going wrong.
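
As a rough sketch of the debugging step, you could run the search one page at a time so that the log shows which page triggers the exception. The class and method names here (RedactionDiagnostics, FindFailingPages) are hypothetical helpers for illustration, not part of Aspose.PDF; the sketch assumes the same Regex and TextSearchOptions setup as your method.

```csharp
using System;
using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

internal static class RedactionDiagnostics
{
    // Hypothetical helper (not an Aspose.PDF API): searches page by page
    // and logs the page number whenever the search throws.
    public static void FindFailingPages(Document pdfDocument, string textToRedact)
    {
        var regexPattern = new Regex(Regex.Escape(textToRedact), RegexOptions.IgnoreCase);

        // Aspose.PDF page collections are 1-based.
        for (int pageNumber = 1; pageNumber <= pdfDocument.Pages.Count; pageNumber++)
        {
            // Use a fresh absorber per page so earlier results do not carry over.
            var absorber = new TextFragmentAbsorber(regexPattern, new TextSearchOptions(true));

            try
            {
                pdfDocument.Pages[pageNumber].Accept(absorber);
                Console.WriteLine($"Page {pageNumber}: {absorber.TextFragments.Count} match(es).");
            }
            catch (IndexOutOfRangeException ex)
            {
                // Log the page and stack trace, then continue with the next page.
                Console.WriteLine($"Page {pageNumber} failed: {ex.Message}");
                Console.WriteLine(ex.StackTrace);
            }
        }
    }
}
```

Knowing the failing page number and the stack trace may make it possible to describe the problem, or to extract just that page for testing, without sharing the whole document.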

If none of these suggestions resolve the issue, consider reaching out to Aspose support with details about the specific PDF file and the context in which the error occurs. They may be able to provide more targeted assistance.

If you have any further questions or need additional help, feel free to ask!

Hi,
Thank you for the quick response. I did the following:

  1. Corrupted PDF File: I checked whether the PDF file is corrupted. I was able to open it in Acrobat Reader without any issues. I also used Aspose.PDF to extract some information from the same file.

  2. Page Count: I added code to proceed only if the document has at least one page. I still got the error.

  3. Text Fragment Absorber Configuration: I tried replacing the regex pattern with simpler text (for instance, Josh) and still got the error.

  4. Aspose Version: This is something I still have to try. My client will need to provide me with the license for a newer version. I will try this once I get the new license.

  5. Debugging: This was one of the first things I did. I could not find any useful details beyond the "Index was outside the bounds of the array" error.

@deejayjay

Would you please try version 24.11 of the API and see whether the error still persists? Please also share your sample file with us so that we can test the scenario in our environment and address it accordingly.

Hi,

Unfortunately, I cannot share the PDF file I am having issues with, due to confidentiality reasons.

I am in the process of securing a license for the latest version of the API and will try processing the document once I have it ready.

Thank you

@deejayjay

You can test with the latest version of the API using a 30-day temporary license. If the issue still persists, you can share your sample file in a private message by clicking on the username and pressing the blue Message button.
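
For reference, a minimal sketch of applying a license file before processing documents. The file name is a placeholder assumption; use the path of the temporary license file you receive. The wrapper class and method (LicenseSetup, Apply) are hypothetical names for illustration.

```csharp
using Aspose.Pdf;

internal static class LicenseSetup
{
    // Hypothetical wrapper: call once at startup, before opening any Document.
    public static void Apply()
    {
        var license = new License();

        // Placeholder path (assumption): point this at your own .lic file.
        license.SetLicense("Aspose.PDF.NET.lic");
    }
}
```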