Free Support Forum - aspose.com

Aspose.Pdf: looping through TextFragmentCollection stops before finishing

Dear all,

We are using Aspose.Pdf for Java 4.6.0 to loop over all the text in a PDF file in order to get the coordinates of each word.

We are using this code:

// open the document and collect every text fragment on every page
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(filePath);
com.aspose.pdf.TextFragmentAbsorber textFragmentAbsorber = new com.aspose.pdf.TextFragmentAbsorber();

pdfDocument.getPages().accept(textFragmentAbsorber);
// get the extracted text fragments into a collection
com.aspose.pdf.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

for (com.aspose.pdf.TextFragment textFragment : (Iterable<com.aspose.pdf.TextFragment>) textFragmentCollection) {
    com.aspose.pdf.TextSegmentCollection textSegmentCollection = textFragment.getSegments();
    for (com.aspose.pdf.TextSegment textSegment : (Iterable<com.aspose.pdf.TextSegment>) textSegmentCollection) {
        // skip segments that contain only whitespace
        if (!textSegment.getText().trim().isEmpty()) {
            // do work here
        }
    }
}

While debugging, I can see that the collection contains, for example, 64 fragments, but the loop body is entered only 5 times before the loop exits without any exception.
The PDF file I am using is attached.
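
For anyone trying to confirm the same symptom, here is a minimal sketch of how the mismatch could be measured: count how many elements the for-each loop really visits and compare that with the number the collection reports. It uses only plain Java collections as stand-ins. The class name IterationCheck, the helper countIterations, and the truncated iterable are hypothetical illustrations, not part of the Aspose API, and this is not a fix for the problem.

```java
import java.util.Arrays;
import java.util.List;

public class IterationCheck {

    /** Counts how many elements a for-each loop actually visits. */
    static int countIterations(Iterable<?> items) {
        int count = 0;
        for (Object ignored : items) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in data; in the real case this would be the fragment collection.
        List<String> fragments = Arrays.asList("a", "b", "c", "d");

        // Hypothetical iterable that stops early, to mimic the reported behaviour.
        Iterable<String> truncated = () -> fragments.subList(0, 2).iterator();

        int reported = fragments.size();
        int visited = countIterations(truncated);

        // A difference between the two numbers confirms that iteration ends early.
        System.out.println("reported=" + reported + ", visited=" + visited);
    }
}
```

If the two numbers differ for the real collection, the count makes the discrepancy easy to include in a bug report.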

Thank you

Hi Karine,

We are sorry for the inconvenience caused. While testing the scenario with the latest version, Aspose.Pdf for Java 9.0.0, we managed to reproduce the reported issue and have logged it in our bug tracking system as PDFNEWJAVA-34191 for further investigation and resolution. We will notify you via this thread as soon as it is resolved.

Please feel free to contact us for any further assistance.

Best Regards,

Hello,

Are there any updates, please?
This is an urgent matter, and we would appreciate it if the issue could be solved as soon as possible.

Thank you,

Hi Karine,


Thanks for your inquiry. I am afraid the issue was noticed only recently, and its investigation is still pending due to other priority tasks. However, we have asked our development team to investigate it and share their findings at the earliest opportunity. We will notify you via this forum thread as soon as we make significant progress towards resolving the issue.

We are sorry for the inconvenience caused.

Best Regards,

Hi,

Any updates concerning PDFNEWJAVA-34191?

Thank you.

Hi Karine,


Thanks for your inquiry. I am afraid the issue is still not resolved; its investigation is pending due to other priority tasks. We will update you via this forum thread as soon as we make significant progress towards resolving the issue.

Thanks for your patience and cooperation.

Best Regards,