Regular expression pattern text search is not working

Hi Aspose Team,

I am trying to find the word “coverage” only when it appears at the end of a line in a PDF file, using a regular expression. However, the search does not find the word in the PDF file.

The pattern “(?i)coverage$” works when I test it on RegExr (http://www.regexr.com/), but the same pattern does not work in Aspose.Pdf text search.

Please refer to my source code below.

TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("(?i)coverage$");
// true = treat the search phrase as a regular expression
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// editor and pdfPage come from the surrounding application code
editor.Document.Pages[pdfPage.Number].Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
int value = textFragmentCollection.Count;

Why does the “(?i)coverage$” pattern not find the word in the PDF?

Please help me search for text at the end of a line in a PDF file using Aspose.Pdf.

Regards,

Ganesan. B

Hi Ganesan,


Thanks for contacting support.

Could you please share the source PDF file so that we can test the scenario on our end? Meanwhile, for testing purposes, I tried 232835.pdf, which you shared in your other forum thread, and in my observation no instance of the word “Coverage” is found.

Hi Team,

Please refer to the attached input.pdf file for this test.

Regards,

Ganesan. B

Hi Ganesan,


Thanks for sharing the resource file.

I have tested the scenario using the PDF document shared earlier, and in my observation textFragmentCollection.Count returns 0 because we are searching for (?i)coverage$ whereas the source document only contains the string “Testing”. Could you please double-check on your end?

Hi Team,

Sorry for the wrong document. Please refer to the attached test_coverage.pdf file for this test.

Regards,

Ganesan. B

Hi Ganesan,


I have tested the scenario again using the new PDF document you shared, and I am still getting 0 for textFragmentCollection.Count. Could you please confirm whether this is the same issue you are facing?

[C#]

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

Document doc = new Document("c:/pdftest/test_coverage (1).pdf");
// Case-insensitive "coverage" anchored with $
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("(?i)coverage$");
// true = treat the search phrase as a regular expression
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// Page numbering in Aspose.Pdf starts at 1
doc.Pages[1].Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
int value = textFragmentCollection.Count;
Console.WriteLine(value);

Hi Nayyer Shahbaz,

Yes, I have the same issue.

Please help me use a regular expression to find the word "Coverage" in this PDF file only where it appears at the end of a line.

Regards,

Ganesan. B

Hi Ganesan,


Thanks for sharing the details. I have logged this problem as PDFNEWNET-36548 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the fix. Please be patient and allow us a little time. We are sorry for the inconvenience.

Hi Team,


Any updates on this issue?

Regards,
Ganesan. B

Hi Ganesan,


Thanks for your inquiry. The issue was reported recently and is still pending investigation, queued behind other priority tasks. As soon as the investigation of your reported issue is complete, we will share our initial findings/ETA with you. We will keep you updated on the resolution progress via this forum thread.

We are sorry for the inconvenience caused.

Best Regards,

The issues you have found earlier (filed as PDFNEWNET-36548) have been fixed in Aspose.Pdf for .NET 9.4.0.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

Hi Ganesan,


Thanks for your patience. Regarding the resolution of the above issue, please note that the text of test_coverage (1).pdf contains spaces at the end of each line. By default, $ in a regex pattern means the end of the whole text, not the end of a line in multi-line text. We recommend using (\r\n|\n|\r) instead. Please check the following code snippet, which will help you find all occurrences of “coverage” at the end of a line.

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

Document doc = new Document("test_coverage (1).pdf");
// "coverage", one whitespace character (the trailing space), then a line break
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"(?i)coverage\s(\r\n|\n|\r)");
// true = treat the search phrase as a regular expression
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
doc.Pages[1].Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
int value = textFragmentCollection.Count;
Console.WriteLine(value);
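The behavior of $ described above can be reproduced with plain .NET System.Text.RegularExpressions, without Aspose.Pdf. This is only an illustrative sketch: pageText is a hypothetical stand-in for the extracted page text, assuming trailing spaces and CR/LF line breaks as described for test_coverage (1).pdf, and whether TextFragmentAbsorber applies exactly the same regex semantics internally is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for extracted page text: each line ends with a
// trailing space and a CR/LF pair (an assumption, not real PDF output).
string pageText = "Medical coverage \r\nDental plan \r\nVision coverage \r\n";

// Without (?m), $ matches only at the very end of the string (or just
// before a final \n), so "coverage" followed by " \r\n" never matches.
int plain = Regex.Matches(pageText, "(?i)coverage$").Count;

// (?m) makes $ match before every \n; \s* absorbs the trailing space
// and the \r that sit between "coverage" and the \n.
int multiline = Regex.Matches(pageText, @"(?im)coverage\s*$").Count;

// The pattern suggested above consumes one whitespace character and the
// line break explicitly instead of relying on $.
int explicitBreak = Regex.Matches(pageText, @"(?i)coverage\s(\r\n|\n|\r)").Count;

Console.WriteLine($"{plain} {multiline} {explicitBreak}"); // prints "0 2 2"
```

In .NET, (?m) (RegexOptions.Multiline) makes $ anchor before \n only, not before \r, which is why the trailing space and the CR still have to be consumed explicitly.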


Please feel free to contact us for any further assistance.


Best Regards,

Hi Aspose Team,

As per your suggestion, we have tried your solution with the Aspose.Pdf for .NET 9.4.0 aspose.pdf DLL, but we still have the same issue.

Please provide a working POC for this scenario with Aspose.Pdf for .NET 9.4.0 (aspose.pdf.dll).

NOTE: This is a very high priority for us, as we need to fix a production issue. Please share your solution or a working POC as soon as possible.

Regards,

Ganesan. B

Hi Ganesan,


Thanks for your feedback. While retesting the suggested code with the file you shared above (test_coverage.pdf), I noticed that it contains 4 occurrences of the word “coverage” at the end of a line, but the suggested code finds only three of them. We are revisiting the solution and will get back to you soon. Meanwhile, we would appreciate it if you could confirm whether you are getting the same result with the suggested code or a different one.

We are truly sorry for the inconvenience caused.

Best Regards,

Hi Ganesan,


You may use the following expression to find all occurrences of “coverage” at the end of a line. Hopefully it will help you accomplish the task.

@"(?i)coverage.?(\s)(\r\n|\r|\n)"
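To see what this expression accepts, here is a minimal sketch that runs it through plain .NET System.Text.RegularExpressions (not Aspose.Pdf) against a few hypothetical line endings. It assumes the text reaches the regex with its line breaks intact; how Aspose.Pdf represents line breaks in extracted page text is not confirmed by this thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the reply above, applied with plain .NET Regex (no Aspose.Pdf).
const string pattern = @"(?i)coverage.?(\s)(\r\n|\r|\n)";

// Hypothetical line endings that extracted text might contain.
string[] samples =
{
    "Medical coverage \r\n", // trailing space + CR/LF
    "Dental Coverage. \r\n", // punctuation, space + CR/LF
    "Vision coverage\r\n",   // no trailing space; the CR satisfies \s
    "Life coverage\n",       // no trailing space, LF only
};

// Test each sample line on its own and print the results.
bool[] hits = Array.ConvertAll(samples, s => Regex.IsMatch(s, pattern));
Console.WriteLine(string.Join(", ", hits)); // prints "True, True, True, False"
```

In this .NET test, .? lets one extra character such as a period sit before the whitespace, and with CR/LF endings the CR itself can serve as the \s. A line that ends in a bare LF with nothing in front of it is not matched.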

Please feel free to contact us for any further assistance.

Best Regards,