ASPOSE.PDF search words

Hello,

I have written code that searches for some text in certain PDF files.
In most of them the library works fine, but there are a few in which Aspose.PDF can't find the text. All files have the same format.

The code that I use is:


final PdfTextExtractor textExtractor = new PdfTextExtractor(reader);

for (int index = 1; index <= reader.getNumberOfPages(); index++) {
    final String text = textExtractor.getTextFromPage(index);

    boolean finFacturaEncontrado = true;

    for (final String literal : textos) {
        if (!StringUtils.contains(text, literal)) {
            finFacturaEncontrado = false;
            break;
        }
    }

    if (finFacturaEncontrado) {
        return index;
    }
}

Basically, we want to search the files for certain words (“Totales”, “Bruto:”, “Base imponible:”, “Impuestos:”, “Total:”), and there are some files that contain these words but in which the library doesn't find them.
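
For readers following along, the check done by the loop above can be isolated from any PDF library. The following is only a minimal sketch: the class name PageTextSearch, the method findFirstPageWithAll and the sample page strings are made up for illustration, and the page texts are assumed to have been extracted already by whatever library is in use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Pdf or iText.
final class PageTextSearch {

    /**
     * Returns the 1-based index of the first page whose text contains
     * every literal, or -1 if no page contains all of them.
     */
    static int findFirstPageWithAll(List<String> pageTexts, List<String> literals) {
        for (int index = 0; index < pageTexts.size(); index++) {
            String text = pageTexts.get(index);
            boolean allFound = true;
            for (String literal : literals) {
                // A page with no extracted text cannot match any literal.
                if (text == null || !text.contains(literal)) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return index + 1; // pages are numbered from 1, as in the loop above
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up sample texts standing in for extracted page content.
        List<String> pages = Arrays.asList(
                "Factura 123",
                "Totales\nBruto: 100,00\nBase imponible: 100,00\nImpuestos: 21,00\nTotal: 121,00");
        List<String> literals = Arrays.asList(
                "Totales", "Bruto:", "Base imponible:", "Impuestos:", "Total:");
        System.out.println(findFirstPageWithAll(pages, literals));
    }
}
```

Keeping the matching logic separate like this makes it possible to print the extracted text of a problem page and see whether the words are missing from the extraction itself or only from the comparison.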

I have attached two files. Both of them contain these words. From the file called “OK.pdf” Aspose can extract the words, but from the file called “KO.pdf” it can't.

Can you help me please?

Thank you

Hi there,


Thanks for your inquiry. It seems you have shared iText sample code rather than Aspose.Pdf code. However, I have tested the scenario with Aspose.Pdf for Java 10.4.0 and was unable to reproduce the issue. Please check the following sample code snippet, which searches for the list of words and highlights them; it finds the words successfully. Please download and try the latest version of Aspose.Pdf for Java, which should resolve the issue.

// Open the document
Document document = new Document(myDir + "KO.pdf");

String[] words = new String[] {"Totales", "Bruto:", "Base imponible:", "Impuestos:", "Total:"};

for (String wrd : words) {
    // Create an absorber that searches for the current word
    com.aspose.pdf.TextFragmentAbsorber textFragmentAbsorber1 =
            new com.aspose.pdf.TextFragmentAbsorber(wrd, new TextSearchOptions(true));

    // Run the search on every page (page indexes start at 1)
    for (int cnt = 1; cnt <= document.getPages().size(); cnt++) {
        Page page = document.getPages().get_Item(cnt);
        page.accept(textFragmentAbsorber1);
    }

    TextFragmentCollection textFragmentCollection1 = textFragmentAbsorber1.getTextFragments();

    // Highlight every match that was found
    for (int cnt1 = 1; cnt1 <= textFragmentCollection1.size(); cnt1++) {
        TextFragment textFragment = textFragmentCollection1.get_Item(cnt1);

        for (TextSegment textSegment : (Iterable<TextSegment>) textFragment.getSegments()) {
            textSegment.getTextState().setForegroundColor(com.aspose.pdf.Color.getBlack());
            textSegment.getTextState().setBackgroundColor(com.aspose.pdf.Color.getLightBlue());

            System.out.println(textSegment.getText()
                    + " X: " + (float) textSegment.getPosition().getXIndent()
                    + " Y: " + (float) textSegment.getPosition().getYIndent());

            // Build the highlight rectangle from the segment's position and size
            com.aspose.pdf.Rectangle rect = new com.aspose.pdf.Rectangle(
                    (float) textSegment.getPosition().getXIndent(),
                    (float) textSegment.getPosition().getYIndent(),
                    (float) textSegment.getPosition().getXIndent()
                            + (float) textSegment.getRectangle().getWidth(),
                    (float) textSegment.getPosition().getYIndent()
                            + (float) textSegment.getRectangle().getHeight());

            HighlightAnnotation highlight = new HighlightAnnotation(textFragment.getPage(), rect);
            highlight.setOpacity(.80);
            highlight.setBorder(new Border(highlight));
            highlight.setColor(com.aspose.pdf.Color.getLightBlue());
            textFragment.getPage().getAnnotations().add(highlight);
        }
    }
}

// Save the updated document - you can set your output file here
document.save(myDir + "KO_highlight.pdf");

Please feel free to contact us for any further assistance.


Best Regards,

Ok Thank you

It works. Sorry for the confusion.

Hi there,


Thanks for your feedback. It is good to know that you have managed to accomplish your requirements.

Please keep using our API and feel free to contact us with any questions or concerns; we will be more than happy to extend our support.

Best Regards,

Hello, I reached this post while searching for a way to search a PDF for a certain string of text. May I know which part of the code above does that? Could you give me sample code for searching for the string “OR Number” in a PDF?

Hi Junmil,


Thanks for your inquiry. You can easily search and replace text using Aspose.Pdf for Java. I have shared the details in your original query, please check it for the solution.

Best Regards,