Hello,
I have written code that searches for some text in certain PDF files.
For most of them the library works fine, but in a few files Aspose.PDF can't find the text. All files have the same format.
The code that I use is:
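// reader is the already opened PDF; textos holds the literals to look for
final PdfTextExtractor textExtractor = new PdfTextExtractor(reader);
for (int index = 1; index <= reader.getNumberOfPages(); index++) {
    final String text = textExtractor.getTextFromPage(index);
    // finFacturaEncontrado ("end of invoice found") stays true only if the page contains every literal
    boolean finFacturaEncontrado = true;
    for (final String literal : textos) {
        if (!StringUtils.contains(text, literal)) {
            finFacturaEncontrado = false;
            break;
        }
    }
    if (finFacturaEncontrado) {
        return index; // first page that contains all the literals
    }
}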
Basically, we want to search the files for certain words (“Totales”, “Bruto:”, “Base imponible:”, “Impuestos:”, “Total:”), and some files contain these words but the library doesn't find them.
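To make the matching step easier to test on its own, the page loop above can be pulled out into a plain-Java helper that works on text already extracted page by page. This is a sketch, not the poster's code: the names PageSearch, firstPageWithAll and normalize are hypothetical. The whitespace normalization is an assumption about one possible reason a literal such as “Base imponible:” might not match (a line break, doubled space or non-breaking space inside the phrase in the extracted text); the thread does not confirm that this was the cause.

```java
import java.util.List;

// Hypothetical helper: a library-independent version of the page loop above.
// It works on text that has already been extracted, one string per page.
final class PageSearch {

    // Assumption: collapse runs of whitespace, including non-breaking spaces (U+00A0),
    // into a single ASCII space so that line breaks or doubled spaces do not block a match.
    static String normalize(String s) {
        return s.replace('\u00A0', ' ').replaceAll("\\s+", " ");
    }

    // Returns the 1-based number of the first page whose text contains every literal,
    // or -1 if no page does.
    static int firstPageWithAll(List<String> pageTexts, List<String> literals) {
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = normalize(pageTexts.get(i));
            boolean allFound = true;
            for (String literal : literals) {
                if (!text.contains(normalize(literal))) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return i + 1; // page numbers start at 1, as in the original loop
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
                "Datos del cliente",
                "Totales\nBruto: 100 Base\u00A0imponible: 100 Impuestos: 21 Total: 121");
        List<String> literals = List.of("Totales", "Bruto:", "Base imponible:", "Impuestos:", "Total:");
        System.out.println(firstPageWithAll(pages, literals)); // prints 2
    }
}
```

With the sample pages in main, the helper returns 2 even though the second page has a line break after “Totales” and a non-breaking space inside “Base imponible:”.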
I have attached two files, and both of them contain these words. Aspose can extract the words from the file called “OK.pdf” but not from the one called “KO.pdf”.
Can you help me please?
Thank you
Hi there,
import com.aspose.pdf.*;

// Load the file in which the words were not found (myDir is the folder that holds the files)
Document document = new Document(myDir + "KO.pdf");
String[] words = new String[]{"Totales", "Bruto:", "Base imponible:", "Impuestos:", "Total:"};
for (String wrd : words) {
    // Search for one word at a time; passing true to TextSearchOptions enables regular-expression search
    com.aspose.pdf.TextFragmentAbsorber textFragmentAbsorber1 = new com.aspose.pdf.TextFragmentAbsorber(
            wrd, new TextSearchOptions(true));
    // Run the absorber over every page (page indexes are 1-based)
    for (int cnt = 1; cnt <= document.getPages().size(); cnt++) {
        Page page = document.getPages().get_Item(cnt);
        page.accept(textFragmentAbsorber1);
    }
    TextFragmentCollection textFragmentCollection1 = textFragmentAbsorber1.getTextFragments();
    for (int cnt1 = 1; cnt1 <= textFragmentCollection1.size(); cnt1++) {
        TextFragment textFragment = textFragmentCollection1.get_Item(cnt1);
        for (TextSegment textSegment : (Iterable<TextSegment>) textFragment.getSegments()) {
            // Recolour the matched text and print its position
            textSegment.getTextState().setForegroundColor(com.aspose.pdf.Color.getBlack());
            textSegment.getTextState().setBackgroundColor(com.aspose.pdf.Color.getLightBlue());
            System.out.println(textSegment.getText() + " X: " + (float) textSegment.getPosition().getXIndent()
                    + " Y: " + (float) textSegment.getPosition().getYIndent());
            // Highlight rectangle: lower-left corner at the segment position,
            // upper-right corner offset by the segment's width and height
            com.aspose.pdf.Rectangle rect = new com.aspose.pdf.Rectangle(
                    (float) textSegment.getPosition().getXIndent(),
                    (float) textSegment.getPosition().getYIndent(),
                    (float) textSegment.getPosition().getXIndent() + (float) textSegment.getRectangle().getWidth(),
                    (float) textSegment.getPosition().getYIndent() + (float) textSegment.getRectangle().getHeight());
            HighlightAnnotation highlight = new HighlightAnnotation(textFragment.getPage(), rect);
            highlight.setOpacity(.80);
            highlight.setBorder(new Border(highlight));
            highlight.setColor(com.aspose.pdf.Color.getLightBlue());
            textFragment.getPage().getAnnotations().add(highlight);
        }
    }
}
// save updated document - you can set your output file here
document.save(myDir + "KO_highlight.pdf");
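The highlight rectangle in the reply above is built from each segment's position and size. As we understand Aspose's com.aspose.pdf.Rectangle constructor, its four arguments are lower-left x, lower-left y, upper-right x and upper-right y, so the arithmetic reduces to the sketch below. HighlightBox and fromSegment are hypothetical names, not part of the Aspose API, and the sample coordinates are made up for illustration.

```java
// Hypothetical sketch of the highlight-rectangle arithmetic; not part of the Aspose API.
final class HighlightBox {
    final double llx; // lower-left x
    final double lly; // lower-left y
    final double urx; // upper-right x
    final double ury; // upper-right y

    HighlightBox(double llx, double lly, double urx, double ury) {
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }

    // The segment position is the lower-left corner; adding width and height gives the upper-right corner.
    static HighlightBox fromSegment(double xIndent, double yIndent, double width, double height) {
        return new HighlightBox(xIndent, yIndent, xIndent + width, yIndent + height);
    }

    @Override
    public String toString() {
        return "[" + llx + ", " + lly + ", " + urx + ", " + ury + "]";
    }

    public static void main(String[] args) {
        // Made-up values for a segment starting at (72, 700) that is 45.5 wide and 10 high
        System.out.println(fromSegment(72.0, 700.0, 45.5, 10.0)); // prints [72.0, 700.0, 117.5, 710.0]
    }
}
```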
Please feel free to contact us for any further assistance.
Best Regards,
OK, thank you.
It works. Sorry for the confusion.
Hi there,
Hello, I found this post while looking for a way to search a PDF for a certain string of text. Which part of the code above does the search? Could you give me sample code for searching a PDF for the string “OR Number”?
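In the code earlier in the thread, the search happens where a TextFragmentAbsorber is created for each word and accepted by every page; getTextFragments() then returns the matches. As a library-independent illustration only, the sketch below does the same kind of search over text that has already been extracted (one string per page) and reports the page number and character offset of every occurrence of “OR Number”. PhraseSearch, Hit and findAll are hypothetical names; the sketch does not read PDF files and does not use the Aspose API. Letting any run of whitespace separate the words is a design choice so that a phrase broken across lines still matches; the search is case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds every occurrence of a phrase in already-extracted page text.
final class PhraseSearch {

    // One match: 1-based page number and 0-based character offset within that page's text.
    static final class Hit {
        final int page;
        final int offset;

        Hit(int page, int offset) {
            this.page = page;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return "page " + page + ", offset " + offset;
        }
    }

    static List<Hit> findAll(List<String> pageTexts, String phrase) {
        if (phrase.isBlank()) {
            throw new IllegalArgumentException("phrase must not be blank");
        }
        // Quote each word so it matches literally, and allow any whitespace run between words.
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(words[i]));
        }
        Pattern pattern = Pattern.compile(regex.toString());
        List<Hit> hits = new ArrayList<>();
        for (int p = 0; p < pageTexts.size(); p++) {
            Matcher m = pattern.matcher(pageTexts.get(p));
            while (m.find()) {
                hits.add(new Hit(p + 1, m.start()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> pages = List.of("Receipt\nOR Number: 123", "No match here", "OR\nNumber 456");
        for (Hit hit : findAll(pages, "OR Number")) {
            System.out.println(hit); // prints "page 1, offset 8", then "page 3, offset 0"
        }
    }
}
```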
Hi Junmil,