Search and extract text from PDF

mistre83 · May 22, 2014, 2:26am

Hi,

I've read the tutorial on how to search for text on all pages of a PDF document.

Searching the forum, I've found how to use a regular expression for a case-insensitive search.

Now, when I've found a term, I would like to extract the entire paragraph that contains it, not only the single word. This is my current code:

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;
import com.aspose.pdf.TextSearchOptions;
import com.aspose.pdf.TextSegment;

public class SearchText {
    public static void main(String[] args) {
        Document pdfDocument = new Document("document.pdf");

        // "(?i)" makes the regular expression case-insensitive;
        // TextSearchOptions(true) tells the absorber to treat the phrase as a regex
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("(?i)stringtosearch", new TextSearchOptions(true));

        // Run the search over every page of the document
        pdfDocument.getPages().accept(absorber);

        TextFragmentCollection collection = absorber.getTextFragments();

        // Print the text of every segment of every matched fragment
        for (TextFragment fragment : (Iterable<TextFragment>) collection) {
            for (TextSegment segment : (Iterable<TextSegment>) fragment.getSegments()) {
                System.out.println("Text: " + segment.getText());
            }
        }
    }
}
```
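The fragments returned by `TextFragmentAbsorber` contain only the matched text, so one possible approach, sketched here under stated assumptions, is to work on the plain text of the page instead: extract it (for example with Aspose's `TextAbsorber`, whose `getText()` returns the absorbed text), split it into paragraphs, and keep the paragraphs that contain the term. The sketch below covers only the splitting and matching in plain Java. `ParagraphFinder`, `splitParagraphs` and `findParagraphs` are hypothetical names, and the code assumes that paragraphs in the extracted text are separated by blank lines, which depends on the PDF's layout and may not hold for every document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ParagraphFinder {

    // Hypothetical helper: splits plain text into paragraphs, ASSUMING that
    // paragraphs are separated by one or more blank lines.
    static List<String> splitParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String block : text.split("\\r?\\n\\s*\\r?\\n")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    // Hypothetical helper: returns every paragraph containing the literal
    // search term, ignoring case (the same effect as the "(?i)" prefix).
    static List<String> findParagraphs(String text, String term) {
        Pattern pattern = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        List<String> matches = new ArrayList<>();
        for (String paragraph : splitParagraphs(text)) {
            if (pattern.matcher(paragraph).find()) {
                matches.add(paragraph);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF page
        String pageText = "First paragraph.\nIt has two lines.\n\n"
                + "This one mentions StringToSearch\nin the middle.\n\n"
                + "Last paragraph.";
        for (String p : findParagraphs(pageText, "stringtosearch")) {
            System.out.println(p);
        }
    }
}
```

Running `main` prints only the middle paragraph, since it is the only one containing the term in any letter case. `Pattern.quote` treats the term literally; to keep using a full regular expression as in the absorber, compile the expression directly instead.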

tilal.ahmad · May 22, 2014, 11:48pm

Hi Francesco,

Thanks for your inquiry. We would appreciate it if you could share your sample document and the intended result. It would help us address your requirement exactly.

Best Regards,