Aspose.Total for Java problem

www.evget.com · May 10, 2018, 8:12am

Aspose.Words for Java problem:
1. I want to get the content of a table, but I cannot. My code is attached; could you take a look at it?
2. I use TableAbsorber to analyze the file. For some PDFs, absorber.getTableList() does not return the table. Also, for some PDFs getTextFragments returns no data.
3. I want to highlight some specific content. How can I do this?

problem.zip (1.1 MB)

asad.ali · May 10, 2018, 1:36pm

@www.evget.com

Thanks for contacting support.

We have tested the scenario using both of your PDF files with Aspose.PDF for Java 18.4. We were able to extract the table and highlight text in one of your files (i.e. splitDocument1.pdf). The code snippets to extract the table and highlight text are as follows:

Extract Table

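// Load the PDF document
Document pdfDocument = new Document(dataDir + "splitDocument1.pdf");
// Create a TableAbsorber and visit the first page (page numbers start at 1)
TableAbsorber absorber = new TableAbsorber();
absorber.visit(pdfDocument.getPages().get_Item(1));
// First table, first row, first cell (index 0), then the first text fragment in that cell (index 1)
TextFragment fragment = absorber.getTableList().get_Item(0).getRowList().get_Item(0).getCellList().get_Item(0).getTextFragments().get_Item(1);
System.out.println(fragment.getText());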

Console output: extracted_celltext.png (4.6 KB)
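The snippet above reads a single fragment from the first cell and throws if the page has no table or the cell has no text. As a rough sketch only (not the official sample), the following walks every table, row and cell on a page and skips empty collections instead of failing. The class name TableDump and the helper toTsv are hypothetical names used for illustration. It assumes Aspose.PDF for Java is on the classpath and that the table, row and cell lists can be used in a for-each loop in your version (older releases may need an explicit cast to Iterable).

```java
import com.aspose.pdf.AbsorbedCell;
import com.aspose.pdf.AbsorbedRow;
import com.aspose.pdf.AbsorbedTable;
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.TableAbsorber;
import com.aspose.pdf.TextFragmentCollection;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the Aspose API.
class TableDump {

    /** Collects the text of every cell of every table found on one page. */
    static List<List<String>> extractRows(Page page) {
        TableAbsorber absorber = new TableAbsorber();
        absorber.visit(page);
        List<List<String>> rows = new ArrayList<>();
        // Assumption: these lists are iterable in the version in use.
        for (AbsorbedTable table : absorber.getTableList()) {
            for (AbsorbedRow row : table.getRowList()) {
                List<String> cells = new ArrayList<>();
                for (AbsorbedCell cell : row.getCellList()) {
                    TextFragmentCollection fragments = cell.getTextFragments();
                    StringBuilder text = new StringBuilder();
                    // Text fragments are indexed from 1; a size of 0 simply yields "".
                    for (int i = 1; i <= fragments.size(); i++) {
                        text.append(fragments.get_Item(i).getText());
                    }
                    cells.add(text.toString());
                }
                rows.add(cells);
            }
        }
        return rows;
    }

    /** Formats rows as tab-separated lines for printing. */
    static String toTsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(String.join("\t", rows.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // args[0]: path to the PDF file
        Document pdfDocument = new Document(args[0]);
        // Pages are indexed from 1.
        for (int p = 1; p <= pdfDocument.getPages().size(); p++) {
            System.out.println(toTsv(extractRows(pdfDocument.getPages().get_Item(p))));
        }
    }
}
```

If a cell prints as an empty string even though the PDF shows text there, the absorber did not return any fragments for that cell, which is worth reporting together with the sample file.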

Highlight Text

// Load the PDF document
Document doc = new Document(dataDir + "splitDocument1.pdf");
// Search for the text to highlight
TextFragmentAbsorber tfa = new TextFragmentAbsorber("GB18580");

doc.getPages().get_Item(1).accept(tfa);
// Highlight the rectangle of the first match found on page 1
HighlightAnnotation ha = new HighlightAnnotation(doc.getPages().get_Item(1), tfa.getTextFragments().get_Item(1).getRectangle());
ha.setColor(Color.getYellow());
doc.getPages().get_Item(1).getAnnotations().add(ha);
// Save the highlighted document
doc.save(dataDir + "splitDocument1_Highlight.pdf");

Outputs:
Highlighted_Text_Output.png (73.3 KB)
splitDocument1_Highlight.pdf (950.5 KB)
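The highlight snippet marks only the first match on page 1. If every occurrence on every page should be highlighted, a loop along the following lines may help. This is a sketch under assumptions rather than a verified sample: the class name HighlightAll and the method highlightAll are hypothetical, and it uses only the calls shown above plus size() on the page and fragment collections.

```java
import com.aspose.pdf.Color;
import com.aspose.pdf.Document;
import com.aspose.pdf.HighlightAnnotation;
import com.aspose.pdf.Page;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;

// Hypothetical helper class, not part of the Aspose API.
class HighlightAll {

    /** Adds a yellow highlight over each match of phrase; returns the number of matches. */
    static int highlightAll(Document doc, String phrase) {
        int count = 0;
        // Pages are indexed from 1.
        for (int p = 1; p <= doc.getPages().size(); p++) {
            Page page = doc.getPages().get_Item(p);
            // A fresh absorber per page keeps each page's matches separate.
            TextFragmentAbsorber absorber = new TextFragmentAbsorber(phrase);
            page.accept(absorber);
            TextFragmentCollection hits = absorber.getTextFragments();
            // Text fragments are indexed from 1 as well; no matches means size() == 0.
            for (int i = 1; i <= hits.size(); i++) {
                TextFragment hit = hits.get_Item(i);
                HighlightAnnotation ha = new HighlightAnnotation(page, hit.getRectangle());
                ha.setColor(Color.getYellow());
                page.getAnnotations().add(ha);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // args[0]: input PDF, args[1]: phrase to highlight, args[2]: output PDF
        Document doc = new Document(args[0]);
        int n = highlightAll(doc, args[1]);
        doc.save(args[2]);
        System.out.println("Highlighted " + n + " occurrence(s)");
    }
}
```

Checking size() before calling get_Item(1) also avoids an exception when the phrase is not found on a page.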

Furthermore, we observed that the API was unable to extract the table from your second PDF document, i.e. splitDocument2.pdf. We have logged this issue as PDFJAVA-37706 in our issue tracking system. We will look further into the details of the issue and keep you posted on the status of its rectification.

However, we were able to highlight the text in the second PDF document using the same code snippet shared above. An output PDF document with highlighted text is also attached for your reference.

splitDocument2_Highlight.pdf (159.3 KB)

jillwen · June 25, 2018, 6:29am

Hi, I think I have just run into the same problem as you. Have you fixed it? If so, could you please do me a favor and send me the PDF that you used for testing? My email is jillwen121@gmail.com, thanks a lot. @asad.ali

asad.ali · June 25, 2018, 10:51am

@jillwen

Thanks for contacting support.

Would you please confirm whether you are facing an issue with table extraction or with text extraction? We will check the related details on our side and share our feedback with you accordingly.

jillwen · June 25, 2018, 2:40pm

@asad.ali I got some error messages when using TableAbsorber to get the content of the table. It shows that the size of the ‘TextFragment’ object is 0, but actually none of the cells in the table is blank.

asad.ali · June 25, 2018, 8:47pm

@jillwen

Thanks for writing back.

Would you please make sure that you are using the latest version of the API, i.e. Aspose.PDF for Java 18.5? If that is the case and you are still facing the issue, please share your sample PDF document along with the JDK version you are working with. We will test the scenario in our environment and address it accordingly.

jillwen · June 26, 2018, 1:21am

I am using the latest version, my JDK version is 1.8.0_172, and I’m using the demonstration code for TableAbsorber. My PDF has been sent to you by email. Thanks for your help.

asad.ali · June 26, 2018, 1:02pm

@jillwen

We did not receive any PDF document by email. Please make sure that you attach the PDF document to your post or send it in a private message. However, if sending the document via email is more convenient for you, we have shared an email address with you in a private message.

asad.ali · June 28, 2018, 8:27pm

@jillwen

Thanks for sharing the sample PDF document in a private message.

We have tested the scenario in our environment and observed that the API was unable to extract table data from your PDF document. This issue has been logged under the ticket ID PDFJAVA-37825 in our issue tracking system. We will look further into the details of the issue and keep you posted on its resolution status. Please allow us a little time.

We are sorry for the inconvenience.

asad.ali · July 4, 2018, 7:27am

@jillwen

Regarding PDFJAVA-37825, we have found that this issue is resolved in Aspose.PDF for Java 18.6. Please download the latest version and try again. In case you still face any issue, feel free to let us know.

jillwen · July 19, 2018, 1:12pm

I’m sorry, but I still have not figured it out.

asad.ali · July 19, 2018, 6:39pm

@jillwen

Earlier, we found an issue while extracting the table from your PDF document, i.e. table.pdf (69.1 KB): the API was throwing an exception, i.e. Exception.png (10.1 KB). The issue has been resolved in Aspose.PDF for Java 18.6, and the API is now able to extract the table as well as the table data from the same PDF document using the following code snippet:

// Load the existing PDF file
Document pdfDocument = new Document(dataDir + "table.pdf");
// Create a TableAbsorber object to find tables
TableAbsorber absorber = new TableAbsorber();
// Visit the first page with the absorber
absorber.visit(pdfDocument.getPages().get_Item(1));
// Print the number of tables found on the page
System.out.println(absorber.getTableList().size());
// Access the first table on the page, its first row and first cell, and the first text fragment in that cell
TextFragment fragment = absorber.getTableList().get_Item(0).getRowList().get_Item(0).getCellList().get_Item(0).getTextFragments().get_Item(1);
System.out.println(fragment.getText());

Would you please try the above code snippet with your PDF document? If you face any issue, please share some more details about it, such as the exception you are getting or which of your requirements is not fulfilled by the API. We will check the details further and assist you accordingly.

aspose.notifier · August 16, 2018, 10:47pm

The issues you have found earlier (filed as PDFJAVA-37825) have been fixed in Aspose.PDF for Java 18.7.

aspose.notifier · December 2, 2018, 8:27pm

The issues you have found earlier (filed as PDFJAVA-37706) have been fixed in Aspose.PDF for Java 18.11.
