Aspose.PDF 17.9 doesn't find broken search terms using regex

Dear Team,

The older version (Aspose.PDF 16.10.0) supports finding broken search terms (terms that wrap across lines) using a regular expression.
However, the latest version (Aspose.PDF 17.9) doesn't find the broken search text using regex.

Kindly let me know whether we need to configure any properties, or whether suppressing broken words is an intentional change.

Thanks,
Radha Krishna Garnepudi

@radhakrishna.garnepu,
Kindly send us a sample PDF along with the target search phrase and regular expression. We will investigate and share our findings with you.

Dear Imran/Team,

Thanks for the quick response.
We are looking for the following results using Aspose.PDF.

  1. Broken search text:
    a. With the latest version, Aspose.PDF for Java 17.9, the broken target phrase in the attached example, i.e. [software, operating systems] (a phrase that wraps onto the next line), cannot be found using the regular expression ([(.*?)]). With Aspose.PDF 16.10.0 it works fine. Is there no backward compatibility for this use case? Is this an intentional change, or do we need to stay on Aspose.PDF 16.10.0 in this case?

  2. Search text by color:
    a. Is there a way to search for text by color? For example, in this sample PDF, the search term "Anyone, anywhere" is in blue.
    b. One workaround we have seen is to find text fragments with a regular expression and then verify each one by its foreground color:
    textFragment.getTextState().getForegroundColor().toString()
    c. Is there a direct way to search by color through the text search options, in this case for blue?
    d. If the API cannot find a target phrase by color, could you please advise which regular expression would be the best choice for blue text that contains spaces?

  3. Find image size in the page:
    a. Could you share a code snippet to find the image size / content size for a particular page?
    Page page = document.getPages().get_Item(i);
    b. Could you share a code snippet to check whether a particular page, rather than the whole document, contains images?

Kindly let me know if you need more details on these queries. We highly appreciate your help.

pdf-sample.pdf (171.6 KB)

Regards,
Radha Krishna

@radhakrishna.garnepu,

Kindly send the code which works with version 16.10.0 and does not work with the latest version 17.9.
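
For reference while the sample code is being prepared, here is a minimal sketch of the bracketed-phrase search in plain java.util.regex, run on an already extracted string rather than through Aspose.PDF. It assumes the intended pattern is \[(.*?)\] with escaped brackets (in Java, the pattern as posted, [(.*?)], is a character class that matches a single character), and it assumes the line break appears in the extracted text as a newline. The class and method names are illustrative only, and this says nothing about how either Aspose.PDF version handles the pattern internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketedPhraseSearch {
    // Escaped brackets match literal '[' and ']'.
    // DOTALL lets '.' match line terminators, so a phrase may span lines.
    private static final Pattern BRACKETED =
            Pattern.compile("\\[(.*?)\\]", Pattern.DOTALL);

    // Returns every bracketed phrase, with line breaks collapsed to one space.
    static List<String> findBracketed(String text) {
        List<String> phrases = new ArrayList<>();
        Matcher m = BRACKETED.matcher(text);
        while (m.find()) {
            phrases.add(m.group(1).replaceAll("\\s+", " ").trim());
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Hypothetical extracted text with the phrase broken onto the next line.
        String extracted = "supports [software,\noperating systems] and more";
        System.out.println(findBracketed(extracted));
    }
}
```

Without the DOTALL flag, the dot stops at the newline and the broken phrase is not matched, which is one possible reason a regex can miss text that wraps across lines.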

There is no direct way to search for text by foreground color. We have logged an enhancement ticket, ID PDFJAVA-37144, in our issue tracking system. We have linked your post to this ticket and will keep you informed of any updates. Regular expressions cannot help identify the color of text, because they match only the characters.
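
Until that enhancement is available, the filtering step of the workaround (search first, then check each fragment's foreground color) could be sketched roughly as follows. This is a self-contained illustration, not Aspose.PDF code: Fragment is a hypothetical stand-in for a search result, java.awt.Color stands in for the library's color type, and the tolerance value is an assumption, since "blue" text in a PDF is often not pure RGB (0, 0, 255). Comparing color components is likely to be more robust than comparing toString() output.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.List;

class ColorFilterSketch {
    // Hypothetical stand-in for a text fragment returned by a search:
    // the matched text plus its foreground color.
    static final class Fragment {
        final String text;
        final Color foreground;

        Fragment(String text, Color foreground) {
            this.text = text;
            this.foreground = foreground;
        }
    }

    // True when every RGB channel differs from the target by at most 'tolerance'.
    static boolean isCloseTo(Color actual, Color target, int tolerance) {
        return Math.abs(actual.getRed() - target.getRed()) <= tolerance
                && Math.abs(actual.getGreen() - target.getGreen()) <= tolerance
                && Math.abs(actual.getBlue() - target.getBlue()) <= tolerance;
    }

    // Keeps only the text of fragments whose color is close to the target.
    static List<String> textsInColor(List<Fragment> fragments, Color target, int tolerance) {
        List<String> result = new ArrayList<>();
        for (Fragment f : fragments) {
            if (isCloseTo(f.foreground, target, tolerance)) {
                result.add(f.text);
            }
        }
        return result;
    }
}
```

In real code, the fragment list would come from the text search, and the color would come from textFragment.getTextState().getForegroundColor().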

The Resources.Images member of a Page instance lets you count the images on that page and retrieve each one by index. The XImage class offers getWidth and getHeight members to retrieve the size of an image. Please refer to this help topic: Extract Images from the PDF File