Extract elements from PDF with Aspose.Pdf for Java

sshenllan · April 11, 2016, 4:32am

Hi Developer,

May I ask three questions?

1 (general) Can I get detailed information for all elements in a PDF with the Aspose.Pdf for Java package?

Please refer to the attached PDF and Java code.

2 (detail) When I try to get text information with TextFragment, I get the wrong text color for “Presentation QA TEST” on page 1. In the PDF it is red, but the value returned by the code is “Foreground Color :- #1F24DC”.

The background color is also wrong for “QA Text” on page 2: it should be blue, but the code returns “background Color :- null”.

3 (detail) How can I save a graph to disk as an image? (Page 16)

Best Regards,

Stone

sshenllan · April 11, 2016, 8:09pm

Hi Developer,

Could you please reply to this?

Regards,

Stone

codewarior · April 12, 2016, 7:00am

sshenllan:

2 (detail) When I try to get text information with TextFragment, I get the wrong text color for “Presentation QA TEST” on page 1. In the PDF it is red, but the value returned by the code is “Foreground Color :- #1F24DC”.

The background color is also wrong for “QA Text” on page 2: it should be blue, but the code returns “background Color :- null”.

Hi Lei,

Thanks for using our APIs.

I have tested the scenario and was able to reproduce the issues described above. I have logged them as PDFNEWJAVA-35726 in our issue tracking system so that they can be corrected. We will look further into the details of these problems and keep you posted on the status of the fix. Please be patient and allow us a little time. We are sorry for the inconvenience.
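
For reference, the reported value #1F24DC can be decoded into its red, green, and blue components to confirm that it does not describe a red color. The following is a minimal sketch that uses only the Java standard library; the class and method names (HexColorCheck, toRgb, dominantChannel) are illustrative and are not part of Aspose.Pdf.

```java
// Illustrative helper, not part of Aspose.Pdf: decodes a "#RRGGBB" string.
final class HexColorCheck {

    /** Returns {red, green, blue}, each in the range 0-255. */
    static int[] toRgb(String hex) {
        String digits = hex.startsWith("#") ? hex.substring(1) : hex;
        if (digits.length() != 6) {
            throw new IllegalArgumentException("Expected 6 hex digits: " + hex);
        }
        int value = Integer.parseInt(digits, 16);
        return new int[] { (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF };
    }

    /** Names the strongest channel; a rough check, ties resolve in R, G, B order. */
    static String dominantChannel(int[] rgb) {
        if (rgb[0] >= rgb[1] && rgb[0] >= rgb[2]) return "red";
        if (rgb[1] >= rgb[2]) return "green";
        return "blue";
    }

    public static void main(String[] args) {
        int[] rgb = toRgb("#1F24DC");
        System.out.println(rgb[0] + ", " + rgb[1] + ", " + rgb[2]); // prints "31, 36, 220"
        System.out.println(dominantChannel(rgb));                   // prints "blue"
    }
}
```

The value decodes to a strongly blue color, which matches the report that the API did not return the red seen in the PDF.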

codewarior · April 12, 2016, 7:15am

1 (general) Can I get detailed information for all elements in a PDF with the Aspose.Pdf for Java package?

Hi Lei,

Thanks for using our APIs.

A PDF file is composed of image, text, attachment, annotation, graph, form field, and link objects. To retrieve information for all of the objects/elements inside it, you need to parse the document and identify each object individually. For more information, please visit Working with com.aspose.pdf

3 (detail) How can I save a graph to disk as an image? (Page 16)

The chart inside the PDF file is drawn as a graph object. To convert it to an image, you need to convert either the whole page or a particular region of the page to an image format. For further details, please visit

Convert PDF Pages to JPEG Image
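
As an illustration of the "particular region" approach, the sketch below crops a rectangle out of a page that has already been rendered to an image. It uses only the Java standard library and makes several assumptions: the rendered page is available as a BufferedImage, the chart rectangle is known in PDF points (for example, measured by hand), and the rendering resolution is known. The class name PageRegionCrop and the coordinates are hypothetical and are not part of Aspose.Pdf.

```java
import java.awt.image.BufferedImage;

// Illustrative helper, not part of Aspose.Pdf.
final class PageRegionCrop {

    /**
     * Converts a PDF rectangle (points, origin at bottom-left) into a pixel
     * rectangle {x, y, width, height} (origin at top-left) for a page
     * rendered at the given resolution.
     */
    static int[] toPixelRect(double llx, double lly, double urx, double ury,
                             double pageHeightPts, int dpi) {
        double scale = dpi / 72.0; // 1 PDF point = 1/72 inch
        int x = (int) Math.floor(llx * scale);
        int y = (int) Math.floor((pageHeightPts - ury) * scale); // flip the y axis
        int w = (int) Math.ceil((urx - llx) * scale);
        int h = (int) Math.ceil((ury - lly) * scale);
        return new int[] { x, y, w, h };
    }

    /** Crops the region, clamped to the image bounds. */
    static BufferedImage crop(BufferedImage page, int[] r) {
        int x = Math.max(0, r[0]);
        int y = Math.max(0, r[1]);
        int w = Math.min(r[2], page.getWidth() - x);
        int h = Math.min(r[3], page.getHeight() - y);
        return page.getSubimage(x, y, w, h);
    }

    public static void main(String[] args) {
        // Stand-in for an A4 page rendered at 144 dpi (595 x 842 pt -> 1190 x 1684 px).
        BufferedImage page = new BufferedImage(1190, 1684, BufferedImage.TYPE_INT_RGB);
        int[] rect = toPixelRect(100, 500, 400, 700, 842, 144);
        BufferedImage chart = crop(page, rect);
        System.out.println(chart.getWidth() + " x " + chart.getHeight()); // prints "600 x 400"
    }
}
```

The cropped image can then be written to disk with javax.imageio.ImageIO.write(chart, "png", new File(...)). In real use, the blank stand-in page would be replaced by the JPEG produced from the page conversion.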

sshenllan · April 12, 2016, 9:40am

Hi Nayyer,

Many thanks for your reply.

Actually, I tried to get the chart graph today, but I cannot get the chart object at all. My code does the following:

Loop over all pages and call Paragraphs paragraphs = page.getParagraphs();

But the length of paragraphs is always 0. The graph should exist in the paragraphs, right?

Only if I get the graph object can I save its rectangle to an image. (I do not want to save the whole page.)

If not, can you please tell me how to do it? I cannot find it in the programming guide.

Q2: I cannot get the bullet object either. Can you please help with code? I tried everything in the programming guide but got nothing.

Thanks!

Stone

codewarior · April 13, 2016, 12:38pm

Hi Lei,

Thanks for sharing the details.

I am afraid Aspose.Pdf for Java does not currently support manipulating graph charts in existing PDF files, so a chart's rectangular region cannot be retrieved or converted to an image. However, I have logged this as PDFNEWJAVA-35730 in our issue tracking system under the New Features list so that it can be implemented. We will look further into the details of this requirement and keep you posted on its status. Please be patient and allow us a little time. We are sorry for the inconvenience.

aspose.notifier · February 23, 2017, 8:09am

The issues you have found earlier (filed as PDFJAVA-35730) have been fixed in Aspose.Pdf for Java 17.2.0.

This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

tilal.ahmad · March 2, 2017, 1:10am

Hi Lei,

Thanks for your patience. As stated above, your reported issue PDFJAVA-35730 has been resolved in Aspose.Pdf for Java 17.2.0. Please note that a Chart object is placed in a PDF document as a Marked Content object. We have implemented a new method, extractMarkedContentAsImages(), to extract Marked Content as images. Please check the following code snippet, which extracts the Chart objects on page 1. Hopefully it will help you accomplish the task.

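import com.aspose.pdf.Document;
import com.aspose.pdf.facades.PdfExtractor;

// Open document
Document document = new Document("sample.pdf");

// Instantiate PdfExtractor
PdfExtractor pdfExtractor = new PdfExtractor();

// Extract Chart objects on page 1 (pages are 1-indexed) as images in a folder
pdfExtractor.extractMarkedContentAsImages(document.getPages().get_Item(1), "C:/Temp/Charts_page_1");

// Release the resources held by the document
document.close();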

Best Regards,

aspose.notifier · April 7, 2017, 1:35am

The issues you have found earlier (filed as PDFJAVA-35726) have been fixed in Aspose.Pdf for Java 17.3.0.
