Text extraction from this file
TEST_TEXTE.pdf (26.8 KB)
has an issue: some fragments of the text are duplicated, as you can see below in the extracted text:
TEST_TEXTE-2.txt.zip (321 Bytes)
I use the latest version of Aspose (24.10).
Below is the Java code used for the extraction:
image.png (26.5 KB)
asposeExtractText.zip (779 Bytes)
Can you indicate how to solve this issue?
Regards
Fabien
Instead of initializing TextFragmentAbsorber outside the loop, please create it inside the loop, i.e. use a new instance for every page (alternatively, you can extract the text from all pages at once):
System.out.println("Page count: " + document.getPages().size());
for (Page aPage : document.getPages()) {
    // Create a fresh absorber for each page so fragments do not carry over between pages
    TextFragmentAbsorber tfa = new TextFragmentAbsorber();
    // Extract text fragments from the page
    aPage.accept(tfa);
    for (TextFragment tf : tfa.getTextFragments()) {
        // Write extracted contents to the writer
        writer.append(tf.getText());
        writer.newLine();
        System.out.println(tf.getText()); // Print the extracted text to the console
    }
}
Hi asad.
I changed my Java code, as you can see in the image below
image.png (58.0 KB)
but nothing changed; I still get the duplicated fragments.
Regards
Fabien
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFJAVA-44535
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
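Until the ticket above is resolved, one possible stopgap is to filter the extracted fragments in plain Java before writing them out. The sketch below is only an illustration under stated assumptions: `FragmentDeduplicator` and `dropAdjacentDuplicates` are hypothetical names and not part of the Aspose API, and it assumes that each duplicated fragment appears immediately after its original, which this thread does not confirm. It would also drop text that is genuinely repeated on consecutive fragments in the PDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of the Aspose API.
class FragmentDeduplicator {

    // Returns a new list in which a fragment is skipped when it is
    // identical to the fragment immediately before it in the input.
    static List<String> dropAdjacentDuplicates(List<String> fragments) {
        List<String> result = new ArrayList<>();
        String previous = null;
        boolean first = true;
        for (String fragment : fragments) {
            // Always keep the first fragment; afterwards keep only changes
            if (first || !Objects.equals(fragment, previous)) {
                result.add(fragment);
            }
            previous = fragment;
            first = false;
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample strings are made up for illustration, not taken from TEST_TEXTE.pdf
        List<String> extracted =
                List.of("Title", "Title", "First line", "Second line", "Second line");
        System.out.println(dropAdjacentDuplicates(extracted));
        // prints [Title, First line, Second line]
    }
}
```

To use it with the loop shown earlier, the text of each fragment (`tf.getText()`) could be collected into a `List<String>` per page instead of being written directly, then passed through `dropAdjacentDuplicates` before appending to the writer.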