Null Pointer Exception during setText

I tried two versions of Aspose.PDF for Java, aspose.pdf-21.3 and aspose.pdf-21.7.
The problem is the same in both:
java.lang.NullPointerException: null
    at com.aspose.pdf.internal.ms.System.Text.RegularExpressions.java.lj.lI(Unknown Source)
    at com.aspose.pdf.internal.l57y.lv.lf(Unknown Source)
    at com.aspose.pdf.internal.l57y.lv.lI(Unknown Source)
    at com.aspose.pdf.internal.l57y.lv.lj(Unknown Source)
    at com.aspose.pdf.internal.l57y.lv.lt(Unknown Source)
    at com.aspose.pdf.TextFragmentAbsorber.lI(Unknown Source)
    at com.aspose.pdf.TextFragmentAbsorber.lI(Unknown Source)
    at com.aspose.pdf.TextFragmentAbsorber$4.lI(Unknown Source)
    at com.aspose.pdf.TextFragmentAbsorber$4.lI(Unknown Source)
    at com.aspose.pdf.TextSegment$3$1.lI(Unknown Source)
    at com.aspose.pdf.TextSegment$3$1.lI(Unknown Source)
    at com.aspose.pdf.TextSegment.lI(Unknown Source)
    at com.aspose.pdf.TextSegment.lb(Unknown Source)
    at com.aspose.pdf.TextFragment.lu(Unknown Source)
    at com.aspose.pdf.TextFragment.setText(Unknown Source)
The code is:
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(pattern);
document.getPages().accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
for (TextFragment textFragment : textFragmentCollection) {
    try {
        textFragment.setText(replace);
    } catch (Exception e) {
        //TODO: remove this catch once the library is fixed - there is weird NPE
        Log.a(PdfDocumentReplace.class).error(e);
    }
    textFragment.getTextState().setBackgroundColor(Color.getYellow());
}
Input patterns and replacement values:
Pattern 1: ([\w-+]+(?>.[\w]+)@[\w-]+(.[\w]+)(?>.[a-z]{2,}))
Replacement 1: anonymous@great.com
Pattern 2: [0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}
Replacement 2: 111-99-8765
Pattern 3: [+]?\d{0,3}[ ]?\d{10}|[+]?\d{0,3}[ ]?(?:\d{3}-){2}\d{4}|[+]?\d{0,3}[ ]?(\d{3})[ ]?\d{3}-?\d{4}
Replacement 3: 322-233-3223

As you can see, there are two occurrences of the last pattern. The first one is replaced without any problem. For the second one, setText fails, although changing the background color still succeeds.
Input.PNG (18.1 KB)
Output.PNG (13.6 KB)
PdfMaskResult_pure.pdf (451.5 KB)
Sample_pure.pdf (422.2 KB)

The same code works as expected with aspose-pdf-19.11.
How can I find out what changed between the two versions, 19.11 and 21.7?

@lion.brotzky

Please try the following code with the latest version. It works as expected in our testing.

Document document = new Document(dataDir + "Sample_pure.pdf");
String pattern = "[+]?\\d{0,3}[ ]?\\d{10}|[+]?\\d{0,3}[ ]?(?:\\d{3}-){2}\\d{4}|[+]?\\d{0,3}[ ]?(\\d{3})[ ]?\\d{3}-?\\d{4}";
String replace = "322-233-3223";
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(pattern, new TextSearchOptions(true));
document.getPages().accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
for (TextFragment textFragment : textFragmentCollection)
{
    try
    {
        textFragment.setText(replace);
    }
    catch (Exception e)
    {
        //TODO: remove this catch once the library is fixed - there is weird NPE
        Log.a(PdfDocumentReplace.class).error(e);
    }
    textFragment.getTextState().setBackgroundColor(Color.getYellow());
}
document.save(dataDir + "Sample_pure_replace.pdf");

Sample_pure_replace.pdf (449.1 KB)

I cannot use this as a workaround.
This part works
new TextFragmentAbsorber(pattern, new TextSearchOptions(true));
when pattern is a String, but if pattern is a Pattern compiled from a regular expression, it still fails:

Pattern.compile("[+]?\\d{0,3}[ ]?\\d{10}|[+]?\\d{0,3}[ ]?(?:\\d{3}-){2}\\d{4}|[+]?\\d{0,3}[ ]?(\\d{3})[ ]?\\d{3}-?\\d{4}")
or even
Pattern.compile("+1 800-222-1222", Pattern.LITERAL)
With new TextSearchOptions(true) I get the same error. With new TextSearchOptions(false) it does not fail. However, if I change the pattern to a valid regular expression that is in fact a literal string, like this
Pattern.compile("800-222-1222")
and set new TextSearchOptions(true), then it fails until I change the option to false. That does not make any sense: the pattern is already compiled and valid.
This option is redundant and makes the logic overcomplicated for the API consumer.

@lion.brotzky

You need to set the flag to true only if you are searching with a regular expression. If you are searching for an exact value, use TextSearchOptions(false). Moreover, you may simplify the code by passing a String value instead of a Pattern instance. Please let us know your feedback.

It is not known in advance what the input will be: a literal or a regular expression. When a String is used but the input is a regular expression, nothing is found.
All of the following regex searches work properly and matches are found:
Input string:
My phone number +1 800-222-1222
Regex 1:
[+]?\d{0,3}[ ]?\d{10}|[+]?\d{0,3}[ ]?(?:\d{3}-){2}\d{4}|[+]?\d{0,3}[ ]?(\d{3})[ ]?\d{3}-?\d{4}
Regex 2:
1 800-222-1222
However, the following expression is invalid:
Regex 3 - invalid
+1 800-222-1222
This can be caught when the pattern is compiled, and the input can then be treated as a literal. A compilation failure is the only case in which I can tell that TextSearchOptions(false) should be used.

For Regex 2 there is no compilation failure (it is an absolutely valid regular expression), so TextSearchOptions(true) is used. But that causes the failure described at the start of this thread.

Still, the same Regex 2 finds all text fragments perfectly well, and fails only on the second attempt to set a new text value.
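
The fallback described above (try to compile the input as a regular expression, and treat it as a literal only when compilation fails) could be sketched with plain java.util.regex roughly as follows. This is a minimal sketch and not Aspose code: the class SearchInput and its isRegex field are hypothetical names introduced for illustration, and isRegex is only the value one would presumably pass on to TextSearchOptions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: holds the compiled pattern and whether it was
// compiled as a regular expression or as a literal.
final class SearchInput {
    final Pattern pattern;
    final boolean isRegex;

    private SearchInput(Pattern pattern, boolean isRegex) {
        this.pattern = pattern;
        this.isRegex = isRegex;
    }

    // Compile as a regex first; fall back to a literal pattern only
    // when the input is not a syntactically valid regular expression.
    static SearchInput classify(String input) {
        try {
            return new SearchInput(Pattern.compile(input), true);
        } catch (PatternSyntaxException e) {
            return new SearchInput(Pattern.compile(input, Pattern.LITERAL), false);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "[+]?\\d{0,3}[ ]?(?:\\d{3}-){2}\\d{4}", // a real regular expression
            "1 800-222-1222",                       // Regex 2: valid regex, literal text
            "+1 800-222-1222"                       // Regex 3: dangling '+', not a valid regex
        };
        for (String input : inputs) {
            SearchInput s = classify(input);
            System.out.println(input + " -> isRegex=" + s.isRegex);
        }
    }
}
```

Note that an input like Regex 2 compiles successfully, so this approach classifies it as a regular expression even though it contains no metacharacters. That is exactly the case in which setText fails in this thread.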

@lion.brotzky

I request you to summarize the problems and expected behavior of the API so that we can investigate it accordingly.

The problem is that when a valid regular expression is just literal text, setText throws a NullPointerException on the second absorbed fragment.

@lion.brotzky

A ticket with ID PDFJAVA-40789 has been created in our issue tracking system to further investigate the issue on our end. This thread has been linked with the ticket so that you will be notified once the issue is fixed.

Hi,
I have a similar problem here. I can set a string of up to 4 characters. If I try to set a string of 5 or more characters, I get a null reference exception.
I am attaching the PDF and a screenshot.
My code is below:

// Open document
Document pdfDocument = new Document(OrigFile);

// Create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(origText);

// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);

// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Update text and other properties
    textFragment.Text = "abcde";
    textFragment.TextState.Font = FontRepository.FindFont("Arial");
    textFragment.TextState.FontSize = 12;
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Black);
}

// Save resulting PDF document.
pdfDocument.Save(ResultFile);

error.png (52.1 KB)

ILL-000313256.pdf (21.5 KB)

@ferdous0905

I cannot reproduce the issue with the latest version, so please share the value of "origText".
ILL-000313256_out.pdf (47.8 KB)

@lion.brotzky

We have investigated this ticket. The problem is not in the Aspose.PDF code. Regex 3 (+1 800-222-1222) is invalid: the metacharacter '+' cannot be used without wrapping it.

Correct regex could be used as follows: [+]1 800-222-1222

We can test it here: Free Online Java Regular Expression Tester - FreeFormatter.com
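
The same check can also be run locally with java.util.regex. The following is only a sketch: the class name PlusEscapeCheck and the helper firstMatch are hypothetical, and Pattern.quote is shown as a standard-library alternative to wrapping the '+' by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for trying patterns against a sample string.
final class PlusEscapeCheck {

    // Returns the first match of regex in input, or null if there is none.
    // Throws PatternSyntaxException if regex is not a valid pattern.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "My phone number +1 800-222-1222";

        // '+' wrapped in a character class, as suggested above
        System.out.println(firstMatch("[+]1 800-222-1222", input));

        // Pattern.quote escapes the whole string so it matches literally
        System.out.println(firstMatch(Pattern.quote("+1 800-222-1222"), input));
    }
}
```

Passing the unwrapped "+1 800-222-1222" to firstMatch throws a PatternSyntaxException, because a leading '+' has nothing to quantify.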

Document document = new Document(dataDir + "Sample_pure.pdf");

// Regex 1:
// String regex = "[+]?\\d{0,3}[ ]?\\d{10}|[+]?\\d{0,3}[ ]?(?:\\d{3}-){2}\\d{4}|[+]?\\d{0,3}[ ]?(\\d{3})[ ]?\\d{3}-?\\d{4}";
// TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(Pattern.compile(regex));
// Regex 2: this is actually not a regex pattern, but text to absorb
// TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("1 800-222-1222");
// Regex 3: this is actually not a regex pattern, but text to absorb too
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("[+]1 800-222-1222");
textFragmentAbsorber.setTextSearchOptions(new TextSearchOptions(true));

document.getPages().accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
for (TextFragment textFragment : textFragmentCollection) {
    try {
        String s = "322-233-3223";
        textFragment.setText(s);
    } catch (Exception e) {
        e.printStackTrace();
        //TODO: remove this catch once the library is fixed - there is weird NPE
        // Log.a(PdfDocumentReplace.class).error(e);
    }
    textFragment.getTextState().setBackgroundColor(Color.getYellow());
}
document.save(dataDir + "PdfMaskResult_pure_.pdf");