Aspose.PDF Java: com.aspose.pdf.internal.ms.System.l7k "Collection is of a fixed size" exception while replacing text in a PDF document

manchandap · August 1, 2023, 6:56am

I am evaluating Aspose.PDF’s Java API to redact PII data in my PDF files and replace it with obfuscated text. My PDF contains some data as follows:

B, KATILDE
A, RICHARD
S, NARTURO

These values are listed in the same order in which they appear in the PDF.

Here’s my sample code:

private void replaceTextInDocument(String sourceDoc, String destDoc, String textToReplace, String replacementText) {
    Document pdfDocument = new Document(sourceDoc);
    System.out.println("replaceTextInDocument processing: " + textToReplace);

    // Create TextAbsorber object to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(textToReplace);

    pdfDocument.getPages().accept(textFragmentAbsorber);

    // Get the extracted text fragments into collection
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

    // Loop through the fragments
    for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
        // Update text and other properties
        textFragment.setText(replacementText);
        textFragment.getTextState().setFont(FontRepository.findFont("Verdana"));
        textFragment.getTextState().setFontSize(8);
        textFragment.getTextState().setForegroundColor(Color.getBlack());
        textFragment.getTextState().setBackgroundColor(Color.getWhite());
    }
    // Save the updated PDF file
    pdfDocument.save(destDoc);
    pdfDocument.close();
}

Interestingly, this code is able to replace ‘B, KATILDE’ with replacementText. However, when I try to replace either of the other two values (‘A, RICHARD’ or ‘S, NARTURO’), I get the following exception:

Exception in thread "main" com.aspose.pdf.internal.ms.System.l7k: Collection is of a fixed size
    at com.aspose.pdf.internal.ms.System.ly$lb.removeAt(Unknown Source)
    at com.aspose.pdf.ADocument.lI(Unknown Source)
    at com.aspose.pdf.CharInfoCollection.copyTo(Unknown Source)
    at com.aspose.pdf.l19y.lI(Unknown Source)
    at com.aspose.pdf.l19y.lf(Unknown Source)
    at com.aspose.pdf.TextFragment.lf(Unknown Source)
    at com.aspose.pdf.TextFragment.setText(Unknown Source)

I am able to replace just RICHARD or NARTURO (without the leading initial and comma) successfully.

I am using v23.7 of the Aspose.PDF library. Snippet from build.gradle:

 implementation group: 'com.aspose',  name: 'aspose-pdf', version: '23.7'

It’s very difficult to debug the Aspose code in an IDE, so I am unable to get more information about this issue.

Any help or pointers to resolve this issue would be appreciated.

Thanks

asad.ali · August 1, 2023, 2:31pm

@manchandap

Can you please make sure to use a free 30-day temporary license? If the issue keeps occurring, please share your sample PDF document for our reference so that we can test the scenario in our environment and address it accordingly.

manchandap · August 2, 2023, 7:40am

Thanks @asad.ali, let me try the 30-day temporary license.

manchandap · August 4, 2023, 2:25pm

Hi @asad.ali,

Using the 30-day temporary license has resolved the issue that I was facing.

One quick question: are the structure and formatting of the original file retained in the new file (containing the replaced text)?

Thanks

asad.ali · August 4, 2023, 7:59pm

@manchandap

It's nice to hear that your issue has been resolved. About your question, the structure of the original PDF document should be retained after saving with Aspose.PDF. However, if you notice any issue, please share it with us.

manchandap · August 7, 2023, 7:42am

Thanks @asad.ali for your continued support. My redacted file is failing validation. The issue at this stage is with the formatting, specifically the font.
The font of the text to be replaced in the original PDF is:

TimesNewRomanPS-BoldMT (TLMOQV+TimesNewRomanPS-BoldMT)

I have modified my code to set the font name and font size dynamically, i.e. by retrieving the original font and size. The updated code snippet is below. However, when I inspect the redacted file, I get the following font name:

TimesNewRomanBold

Updated Code Snippet:

private void replaceTextInDocument(String sourceDoc, String destDoc, String textToReplace, String replacementText) {
    Document pdfDocument = new Document(sourceDoc);
    System.out.println("replaceTextInDocument processing: " + textToReplace);

    // Create TextAbsorber object to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(textToReplace);

    pdfDocument.getPages().accept(textFragmentAbsorber);

    // Get the extracted text fragments into collection
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

    // Loop through the fragments
    for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
        // Changed: capture the original font name and size before replacing the text
        String fontName = textFragment.getTextState().getFont().getFontName();
        float fSize = textFragment.getTextState().getFontSize();
        Font font = FontRepository.findFont(fontName);

        // Update text and other properties
        textFragment.setText(replacementText);
        // Changed: reapply the font and size captured above
        textFragment.getTextState().setFont(font);
        textFragment.getTextState().setFontSize(fSize);
        textFragment.getTextState().setForegroundColor(Color.getBlack());
        textFragment.getTextState().setBackgroundColor(Color.getWhite());
    }
    // Save the updated PDF file
    pdfDocument.save(destDoc);
    pdfDocument.close();
}

PS: I have also tried the overload public static Font findFont(String fontName, boolean ignoreCase) with ignoreCase set to true. However, the results are the same.
Please help me to understand the issue.
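
A note on the first name quoted in this post: in a PDF, a font name of the form TLMOQV+TimesNewRomanPS-BoldMT marks an embedded font subset. The six uppercase letters and the plus sign are a subset tag placed in front of the PostScript name, so the name without the tag is TimesNewRomanPS-BoldMT. The following is a minimal sketch in plain Java of stripping that tag before comparing names. The class and method names are hypothetical and are not part of Aspose.PDF.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Aspose.PDF.
final class FontNames {
    // A subset tag is exactly six uppercase ASCII letters followed by '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    private FontNames() {
    }

    // Returns the font name without a leading subset tag, if one is present.
    static String stripSubsetTag(String fontName) {
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }
}
```

With this helper, FontNames.stripSubsetTag("TLMOQV+TimesNewRomanPS-BoldMT") returns TimesNewRomanPS-BoldMT, and a name without a tag is returned unchanged. This only explains the prefix; it does not address the TimesNewRomanBold name reported after replacement.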

asad.ali · August 7, 2023, 6:37pm

@manchandap

Would you kindly share your sample PDF document as well because we need it to test the scenario and understand the issue more clearly? We will test the case in our environment and address it accordingly.

manchandap · August 9, 2023, 7:16am

Thanks @asad.ali for your continued support on this.

A sample PDF to reproduce the issue is attached.

The text under the ‘Employee’ column is the text in question: the original font is not retained after the text is replaced.

Following might help:

For text like ‘E, PEGGIE’, the code snippet

> String fontName = textFragment.getTextState().getFont().getFontName();
> System.out.println("ORIGINAL FONT: " + fontName);

prints: TimesNewRomanPS-BoldMT

Now the following code snippet,

> Font font = FontRepository.findFont(fontName, true);
> System.out.println("FR FOUND FONT: " + font.getFontName());

prints: Times New Roman Bold

Visually, both fonts have the same effect in the PDF, i.e. bold text. However, programmatically the changed name causes my PDF read logic to fail.

Thanks.

employee_sample.pdf (64.7 KB)
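
Until the library-side behaviour is clarified, one possible stopgap is for the reading logic to compare font names in a normalized form instead of by exact string. The sketch below is a heuristic in plain Java and rests on assumptions: that the subset tag, the "PS" and "MT" tokens, spaces, and hyphens carry no meaning for the validation, and that the three names quoted in this thread are representative. The class and method names are hypothetical and are not part of Aspose.PDF.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Aspose.PDF.
final class FontNameMatcher {
    private FontNameMatcher() {
    }

    // Reduces a font name to a lowercase key (heuristic, see assumptions above).
    static String normalize(String fontName) {
        // Drop a leading subset tag such as "TLMOQV+".
        String s = fontName.replaceFirst("^[A-Z]{6}\\+", "");
        // Assumption: a trailing "MT" is a vendor suffix and can be ignored.
        s = s.replaceFirst("MT$", "");
        // Assumption: "PS" directly before a hyphen or the end can be ignored.
        s = s.replaceAll("PS(?=-|$)", "");
        // Drop spaces, hyphens, and any other non-alphanumeric characters.
        s = s.replaceAll("[^A-Za-z0-9]", "");
        return s.toLowerCase(Locale.ROOT);
    }

    // True when both names reduce to the same key.
    static boolean sameFont(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

Under these assumptions, TLMOQV+TimesNewRomanPS-BoldMT, Times New Roman Bold, and TimesNewRomanBold all reduce to the same key, so FontNameMatcher.sameFont treats them as equal. Such a comparison is deliberately loose and may equate fonts that a stricter validation should keep apart.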

asad.ali · August 9, 2023, 6:26pm

@manchandap

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFJAVA-43015

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

manchandap · September 29, 2023, 11:05am

Hi @asad.ali,

I wanted to check whether there is any update on this issue. The fix will help us move forward with adopting this library.

asad.ali · September 29, 2023, 8:20pm

@manchandap

We are afraid that the earlier logged ticket has not yet been resolved. However, we have recorded your concerns and will inform you as soon as we have a definite update in this regard. Please be patient and spare us some time.

We are sorry for the inconvenience.