
Free Support Forum - aspose.com

Cannot get XY coordinates

I am new to Aspose and have downloaded the trial version. Following the demo and documentation, I wrote the following code to extract the XY coordinates of text from a PDF file.


//create PdfExtractor object
PdfExtractor extractor = new PdfExtractor();

//bind input PDF file
extractor.bindPdf("input.pdf");

//set start and end pages
extractor.setStartPage(1);
extractor.setEndPage(2);

//extract text
extractor.extractText();

//extract text segments
TextSegment[] segments = extractor.getFormattedText();

//get text information
for (int index = 0; index < segments.length; index++)
{
    TextSegment text = segments[index];
    System.out.println("Segment #" + index);
    System.out.println(text.getText());
    System.out.println(text.getFontName());
    System.out.println(text.getFontSize());
    System.out.println(text.getTextColor().toString());
    System.out.println(text.getX());
    System.out.println(text.getY());
}


I am getting a compilation error on this line:
TextSegment[] segments = extractor.getFormattedText();
The error shown is: The method getFormattedText() is undefined for the type PdfExtractor

I am using the aspose.pdf-new-4.0.0.jar file.
Has this method been removed from the newer version, or am I missing something here?

In general, how can I get the XY coordinates of any text in a PDF file?

Thanks in advance.

Hi Geeta,


Thanks for using our products.

I have tested the scenario and observed that the getFormattedText(…) method seems to be missing from the com.aspose.pdf.facades.PdfExtractor class. I have logged this issue for correction as PDFNEWJAVA-33470 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the fix. Please be patient and give us a little time. We are sorry for the inconvenience.

As a workaround, I suggest trying the code snippet shared over


Hi Geeta,


Thanks for your patience.

We have further investigated the issue PDFNEWJAVA-33470 reported earlier, and here are our observations. In the auto-ported merged API release of Aspose.Pdf for Java, the working principle of some classes has changed. To get all the text segments along with their formatting options, please use the TextFragmentAbsorber object instead. You may also consider visiting the article Extract Text From All the Pages of a PDF Document.

[Java]

//load the input PDF document
com.aspose.pdf.Document doc = new com.aspose.pdf.Document("input.pdf");
//create an absorber that collects text fragments
com.aspose.pdf.TextFragmentAbsorber tfa = new com.aspose.pdf.TextFragmentAbsorber();
//run the absorber over all pages of the document
doc.getPages().accept(tfa);
//retrieve the collected text fragments
com.aspose.pdf.TextFragmentCollection tfc = tfa.getTextFragments();
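Once the fragments are collected, recent Aspose.Pdf for Java releases expose each TextFragment's location through getPosition(), whose getXIndent() and getYIndent() values are in points (please confirm these accessor names in the API reference for your version). Assuming those values follow the PDF default user space, with the origin at the lower-left corner of the page and 72 points per inch, they usually need converting before being used on a rendered page image, where the origin is at the top-left. The following is a minimal, self-contained sketch of that conversion. PdfCoords and toTopLeftPixels are hypothetical names, not part of Aspose.Pdf, and the sketch assumes an unrotated page whose box starts at (0, 0).

```java
/**
 * Hypothetical helper (not part of Aspose.Pdf): converts a point given in
 * PDF user space (origin at the lower-left corner, 72 points per inch)
 * into pixel coordinates with the origin at the top-left corner, as used
 * by a page image rendered at a given DPI.
 * Assumes an unrotated page whose box starts at (0, 0).
 */
final class PdfCoords {

    /** PDF default user space uses 72 points per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /**
     * @param x          horizontal position in points, measured from the left edge
     * @param y          vertical position in points, measured up from the bottom edge
     * @param pageHeight page height in points
     * @param dpi        resolution the page image is rendered at
     * @return {xPixels, yPixels} measured from the top-left corner
     */
    static double[] toTopLeftPixels(double x, double y, double pageHeight, double dpi) {
        double scale = dpi / POINTS_PER_INCH;
        // Flip the vertical axis: the distance from the top edge is pageHeight - y.
        return new double[] { x * scale, (pageHeight - y) * scale };
    }

    public static void main(String[] args) {
        // A point 72 pt from the left and 72 pt up from the bottom of a
        // 792 pt tall (US Letter) page, rendered at 144 DPI.
        double[] px = toTopLeftPixels(72, 72, 792, 144);
        System.out.println(px[0] + ", " + px[1]);
    }
}
```

With the values in main, the point maps to pixel (144.0, 1440.0), because 144 DPI gives an exact scale of 2 and the point sits 720 pt below the top edge.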