I am evaluating Aspose.PDF’s Java API to redact PII in my PDF files and replace it with obfuscated text. My PDF contains data such as the following:
- B, KATILDE
- A, RICHARD
- S, NARTURO
These values are listed in the same order in which they appear in the PDF.
Here’s my sample code:
import com.aspose.pdf.Color;
import com.aspose.pdf.Document;
import com.aspose.pdf.FontRepository;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;

private void replaceTextInDocument(String sourceDoc, String destDoc, String textToReplace, String replacementText) {
    Document pdfDocument = new Document(sourceDoc);
    System.out.println("replaceTextInDocument processing: " + textToReplace);
    // Create a TextFragmentAbsorber to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(textToReplace);
    pdfDocument.getPages().accept(textFragmentAbsorber);
    // Get the extracted text fragments into a collection
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
    // Loop through the fragments
    for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
        // Update text and other properties
        textFragment.setText(replacementText);
        textFragment.getTextState().setFont(FontRepository.findFont("Verdana"));
        textFragment.getTextState().setFontSize(8);
        textFragment.getTextState().setForegroundColor(Color.getBlack());
        textFragment.getTextState().setBackgroundColor(Color.getWhite());
    }
    // Save the updated PDF file
    pdfDocument.save(destDoc);
    pdfDocument.close();
}
Interestingly, this code is able to replace ‘B, KATILDE’ with replacementText. However, when I try to replace either of the other two values (‘A, RICHARD’ or ‘S, NARTURO’), I get the following exception:
Exception in thread "main" com.aspose.pdf.internal.ms.System.l7k: **Collection is of a fixed size**
at com.aspose.pdf.internal.ms.System.ly$lb.removeAt(Unknown Source)
at com.aspose.pdf.ADocument.lI(Unknown Source)
at com.aspose.pdf.CharInfoCollection.copyTo(Unknown Source)
at com.aspose.pdf.l19y.lI(Unknown Source)
at com.aspose.pdf.l19y.lf(Unknown Source)
at com.aspose.pdf.TextFragment.lf(Unknown Source)
at com.aspose.pdf.TextFragment.setText(Unknown Source)
I am able to replace just ‘RICHARD’ or ‘NARTURO’ (without the leading initial and comma) successfully.
I am using v23.7 of the Aspose Library. Snippet from build.gradle:
implementation group: 'com.aspose', name: 'aspose-pdf', version: '23.7'
It is very difficult to debug the Aspose code in an IDE, so I have not been able to get more information about this issue.
Any help or pointers to resolve this issue would be helpful.
Thanks
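The "Collection is of a fixed size" message in the stack trace above reads like the error a fixed-size, array-backed list raises when code tries to remove an element from it. The sketch below is only a plain-Java analogy of that class of error; it makes no claim about Aspose's internal classes, and `FixedSizeListDemo` and `removalIsRejected` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedSizeListDemo {
    /** Returns true if removing the first element of the list is rejected. */
    static boolean removalIsRejected(List<String> values) {
        try {
            values.remove(0);
            return false;
        } catch (UnsupportedOperationException e) {
            // Fixed-size lists refuse structural changes such as remove().
            return true;
        }
    }

    public static void main(String[] args) {
        // Arrays.asList returns a fixed-size view backed by the array.
        List<String> fixed = Arrays.asList("B, KATILDE", "A, RICHARD", "S, NARTURO");
        // Copying into an ArrayList gives a resizable list.
        List<String> resizable = new ArrayList<String>(fixed);
        System.out.println(removalIsRejected(fixed));     // true
        System.out.println(removalIsRejected(resizable)); // false
    }
}
```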
@manchandap
Can you please try again with a free 30-day temporary license? If the issue keeps occurring, please share your sample PDF document so that we can test the scenario in our environment and address it accordingly.
Thanks @asid.ali, let me try the 30-day temporary license.
Hi @asid.ali,
Using the 30-day temporary license resolved the issue I was facing.
One quick question: are the structure and formatting of the original file retained in the new file (the one containing the replaced text)?
Thanks
@manchandap
It’s nice to hear that your issue has been resolved. Regarding your question, the structure of the original PDF document should be retained after saving with Aspose.PDF. However, if you notice any issue, please share it with us.
Thanks @asad.ali for your continued support. My redacted file is failing validation. The issue at this stage is with the formatting, specifically the font.
The font of the text to be replaced in the original PDF is:
- TimesNewRomanPS-BoldMT (TLMOQV+TimesNewRomanPS-BoldMT)
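In the parenthesised name, the six uppercase letters and the plus sign (TLMOQV+) are the tag the PDF format uses to mark an embedded font subset; the part after the plus sign is the PostScript font name. A minimal sketch of stripping that tag in plain Java follows. `SubsetTag` and `stripSubsetTag` are hypothetical helper names and are not part of Aspose.PDF.

```java
import java.util.regex.Pattern;

class SubsetTag {
    // A subset tag is exactly six uppercase letters followed by '+'.
    private static final Pattern TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** Returns the font name without a leading subset tag, if one is present. */
    static String stripSubsetTag(String fontName) {
        return TAG.matcher(fontName).replaceFirst("");
    }

    public static void main(String[] args) {
        // Tagged name: the tag is removed.
        System.out.println(stripSubsetTag("TLMOQV+TimesNewRomanPS-BoldMT")); // TimesNewRomanPS-BoldMT
        // Untagged name: returned unchanged.
        System.out.println(stripSubsetTag("TimesNewRomanPS-BoldMT"));        // TimesNewRomanPS-BoldMT
    }
}
```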
I have modified my code to set the font name and font size dynamically, i.e. by retrieving the original font and size. The updated code snippet is below. However, when I inspect the redacted file, I get the following font name:
Updated Code Snippet:
import com.aspose.pdf.Color;
import com.aspose.pdf.Document;
import com.aspose.pdf.Font;
import com.aspose.pdf.FontRepository;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;

private void replaceTextInDocument(String sourceDoc, String destDoc, String textToReplace, String replacementText) {
    Document pdfDocument = new Document(sourceDoc);
    System.out.println("replaceTextInDocument processing: " + textToReplace);
    // Create a TextFragmentAbsorber to find all instances of the input search phrase
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(textToReplace);
    pdfDocument.getPages().accept(textFragmentAbsorber);
    // Get the extracted text fragments into a collection
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
    // Loop through the fragments
    for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
        // Capture the original font name and size before changing the text
        String fontName = textFragment.getTextState().getFont().getFontName();
        float fSize = textFragment.getTextState().getFontSize();
        Font font = FontRepository.findFont(fontName);
        // Update text and other properties
        textFragment.setText(replacementText);
        textFragment.getTextState().setFont(font);
        textFragment.getTextState().setFontSize(fSize);
        textFragment.getTextState().setForegroundColor(Color.getBlack());
        textFragment.getTextState().setBackgroundColor(Color.getWhite());
    }
    // Save the updated PDF file
    pdfDocument.save(destDoc);
    pdfDocument.close();
}
PS: I have also tried the overload public static Font findFont(String fontName, boolean ignoreCase) with ignoreCase set to true. However, the results are the same.
Please help me understand the issue.
@manchandap
Would you kindly share your sample PDF document as well because we need it to test the scenario and understand the issue more clearly? We will test the case in our environment and address it accordingly.
Thanks @asid.ali for continued support on this.
A sample PDF to reproduce the issue is attached.
The text under the ‘Employee’ column is the text in question: the original font is not preserved after the text is replaced.
Following might help:
For text like ‘E, PEGGIE’, the code snippet
> String fontName = textFragment.getTextState().getFont().getFontName();
> System.out.println("ORIGINAL FONT: " + fontName);
prints: TimesNewRomanPS-BoldMT
Now the following code snippet,
> Font font = FontRepository.findFont(fontName, true);
> System.out.println("FR FOUND FONT: " + font.getFontName());
prints: Times New Roman Bold
Visually, both fonts have the same effect in the PDF, i.e. bold text. However, programmatically the name mismatch causes my PDF read logic to fail.
Thanks.
employee_sample.pdf (64.7 KB)
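The two names printed above differ only in naming convention: one is a PostScript-style name, the other a display name. If the read logic only needs to confirm the same typeface and weight, one possible workaround is to compare normalised keys instead of raw strings. The sketch below is a heuristic and not Aspose behaviour: `FontNameKeys`, `comparableKey` and `sameFont` are hypothetical names, and dropping the "PS" and "MT" suffixes is an assumption that fits these two names but may not hold for other fonts.

```java
import java.util.Locale;

class FontNameKeys {
    /** Reduces a font name to a lowercase key for loose comparison (heuristic). */
    static String comparableKey(String fontName) {
        return fontName
                .replaceFirst("^[A-Z]{6}\\+", "")  // drop a subset tag such as "TLMOQV+"
                .replaceAll("(PS|MT)(?=-|$)", "")  // assumption: drop PS/MT suffixes
                .replaceAll("[^A-Za-z0-9]", "")    // drop spaces, hyphens, punctuation
                .toLowerCase(Locale.ROOT);
    }

    /** Returns true if both names reduce to the same key. */
    static boolean sameFont(String first, String second) {
        return comparableKey(first).equals(comparableKey(second));
    }

    public static void main(String[] args) {
        System.out.println(sameFont("TimesNewRomanPS-BoldMT", "Times New Roman Bold"));   // true
        System.out.println(sameFont("TimesNewRomanPS-BoldMT", "Times New Roman Italic")); // false
    }
}
```

A stricter check would compare the family and the bold or italic flags separately; the key-based comparison is only meant to show the idea.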
@manchandap
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFJAVA-43015
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
Hi @asid.ali,
I wanted to check whether there is any update on this issue. A fix would help us move forward with adopting this library.
@manchandap
We are afraid that the earlier logged ticket has not yet been resolved. However, we have recorded your concerns and will inform you as soon as we have a definite update. Please bear with us in the meantime.
We are sorry for the inconvenience.