Aspose.PDF Java - Search PDF content for string text

I am using Aspose.PDF for Java in my Talend job. Here is a basic breakdown of the job:


  • accept PDF file - ok
  • search PDF file for certain string - my current problem
  • rename PDF file based on certain string - should be ok after problem is solved

I have attached a file, and in it I would like to search for a string of 8 digits, a space and 2 letters (“12345678 AB”), since this should become the new file name for my PDF. I will be working with different PDF layouts, all of which have the same “Factuurnummer” label before the 8-digit, 2-letter string, if that helps.

Is there a way to do this, preferably with an example? I have just chatted with support and was sent this link, but I'm having a hard time modifying it to skip the text file part and search for the string I will be working with.

Any and all help with this issue would be much appreciated.

Best regards to all.

Hi Junmil,


Thanks for your inquiry. You can easily search and replace text using Aspose.Pdf. Please check the following documentation links and the sample code snippet below. They should help you accomplish the task.


//Open document
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("Sample.pdf");

//Create TextFragmentAbsorber object to find all instances of the input search phrase
com.aspose.pdf.TextFragmentAbsorber textFragmentAbsorber = new com.aspose.pdf.TextFragmentAbsorber("12345678 AB");

//Accept the absorber for all the pages
pdfDocument.getPages().accept(textFragmentAbsorber);

//Get the extracted text fragments into collection
com.aspose.pdf.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

//Loop through the fragments and print the details of each match
for (com.aspose.pdf.TextFragment textFragment : (Iterable<com.aspose.pdf.TextFragment>) textFragmentCollection)
{
    System.out.println("Text :- " + textFragment.getText());
    System.out.println("Position :- " + textFragment.getPosition());
    System.out.println("XIndent :- " + textFragment.getPosition().getXIndent());
    System.out.println("YIndent :- " + textFragment.getPosition().getYIndent());
    System.out.println("Font - Name :- " + textFragment.getTextState().getFont().getFontName());
    System.out.println("Font - IsAccessible :- " + textFragment.getTextState().getFont().isAccessible());
    System.out.println("Font - IsEmbedded :- " + textFragment.getTextState().getFont().isEmbedded());
    System.out.println("Font - IsSubset :- " + textFragment.getTextState().getFont().isSubset());
    System.out.println("Font Size :- " + textFragment.getTextState().getFontSize());
    System.out.println("Foreground Color :- " + textFragment.getTextState().getForegroundColor());
}
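
For the last step of the job (renaming the PDF once the string has been found), a minimal sketch using only the standard java.nio.file API might look like the following. The class name PdfRenamer and both method names are invented for this illustration and are not part of Aspose.Pdf. The sketch assumes the found text is passed in as a plain String and that the renamed file should stay in the same folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not an Aspose.Pdf class.
final class PdfRenamer {

    /** Builds a file name such as "12345678 AB.pdf" from the text that was found. */
    public static String toPdfFileName(String foundText) {
        // Trim the ends and collapse runs of whitespace into single spaces.
        String cleaned = foundText.trim().replaceAll("\\s+", " ");
        // Replace characters that are not allowed in file names on common file systems.
        cleaned = cleaned.replaceAll("[\\\\/:*?\"<>|]", "_");
        return cleaned + ".pdf";
    }

    /** Renames the PDF within its current directory and returns the new path. */
    public static Path renameTo(Path pdf, String foundText) throws IOException {
        Path target = pdf.resolveSibling(toPdfFileName(foundText));
        // Files.move throws FileAlreadyExistsException if the target already exists.
        return Files.move(pdf, target);
    }
}
```

Make sure nothing still holds the source file open when it is moved, otherwise the rename can fail on some operating systems.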


Please feel free to contact us for any further assistance.

Best Regards,

Thanks for the reply. Could you shorten the code to search for a 50-character string starting from the first instance of “Invoice”? I'm having a hard time understanding all the code, as I have only been using Aspose since yesterday.

Hi Junmil,


Thanks for your inquiry. For your requirement you need to use a regular expression. Please check the following documentation link for searching text using regular expressions. Hopefully it will help you accomplish the task.
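
As a rough illustration of the regular-expression idea, the sketch below uses only the standard java.util.regex package on text that has already been extracted from the PDF; how the text is extracted is outside this sketch. The class and method names are made up for the example. The first pattern assumes the “Factuurnummer” label is followed by an optional colon and whitespace, and that the two letters are uppercase. The second assumes the 50 characters include the word “Invoice” itself. Adjust both to your layouts.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an Aspose.Pdf class.
final class InvoiceTextSearch {

    // Label, optional colon and whitespace, then 8 digits, one space, 2 uppercase letters.
    private static final Pattern INVOICE_NUMBER =
            Pattern.compile("Factuurnummer\\s*:?\\s*(\\d{8} [A-Z]{2})\\b");

    // "Invoice" (7 characters) plus up to 43 more, so at most 50 characters in total.
    // DOTALL lets "." match line breaks as well.
    private static final Pattern FIFTY_FROM_INVOICE =
            Pattern.compile("Invoice.{0,43}", Pattern.DOTALL);

    /** Returns the "12345678 AB" style number that follows the label, if present. */
    public static Optional<String> findInvoiceNumber(String text) {
        Matcher m = INVOICE_NUMBER.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Returns up to 50 characters starting at the first occurrence of "Invoice". */
    public static Optional<String> firstFiftyFromInvoice(String text) {
        Matcher m = FIFTY_FROM_INVOICE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

Both methods return an empty Optional when nothing matches, so the job can skip or flag a PDF instead of renaming it to something wrong.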


Please feel free to contact us for any further assistance.

Best Regards,