Extract elements from PDF with Aspose.Pdf for Java

Hi Developer,

May I ask three questions?

1 (general) Can I get detailed information about all elements in a PDF using the Aspose.Pdf for Java package?

Please refer to the attached PDF and Java code.
2 (detail) When I try to get text info with textFragment, I get the wrong text color for “Presentation QA TEST” on page 1. In the PDF it is red, but the value returned by the code is “Foreground Color :- #1F24DC”.

The background color is also wrong for “QA Text” on page 2: it should be blue, but the code returns “background Color :- null”.

3 (detail) How can I save the graph on page 16 as an image to disk?

Best Regards,
Stone

Hi Developer,

Could you please reply to this?

Regards,
Stone

sshenllan:
2 (detail) When I try to get text info with textFragment, I get the wrong text color for “Presentation QA TEST” on page 1. In the PDF it is red, but the value returned by the code is “Foreground Color :- #1F24DC”.

The background color is also wrong for “QA Text” on page 2: it should be blue, but the code returns “background Color :- null”.
Hi Lei,

Thanks for using our APIs.

I have tested the scenario and managed to reproduce the issues stated above. For the sake of correction, I have logged them as PDFNEWJAVA-35726 in our issue tracking system. We will look further into the details of these problems and will keep you posted on the status of the correction. Please be patient and give us a little time. We are sorry for this inconvenience.
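For reference, the reported value can be checked by splitting the hex string into its color channels. The plain-Java sketch below makes no Aspose calls (HexColorCheck, toRgb and toHex are hypothetical helper names). It shows that #1F24DC is R=31, G=36, B=220, a predominantly blue color, and formats the expected color for comparison, assuming pure red (255, 0, 0) since the exact shade in the attached PDF is not stated.

```java
public class HexColorCheck {
    // Splits a "#RRGGBB" string into red, green and blue components (0-255).
    static int[] toRgb(String hex) {
        int value = Integer.parseInt(hex.substring(1), 16);
        return new int[] {(value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF};
    }

    // Formats red, green and blue components (0-255) as "#RRGGBB".
    static String toHex(int r, int g, int b) {
        return String.format("#%02X%02X%02X", r, g, b);
    }

    public static void main(String[] args) {
        // Foreground color the code returned for "Presentation QA TEST"
        int[] reported = toRgb("#1F24DC");
        System.out.println("Reported: R=" + reported[0] + " G=" + reported[1] + " B=" + reported[2]);
        // Assumption: the title is pure red; the real shade may differ slightly
        System.out.println("Expected red: " + toHex(255, 0, 0));
    }
}
```

The blue channel dominates the reported value, which is consistent with the issue being a wrong color rather than a slightly different shade of red.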

1 (general) Can I get detailed information about all elements in a PDF using the Aspose.Pdf for Java package?
Hi Lei,

Thanks for using our APIs.

A PDF file consists of images, text, attachments, annotations, graphs, form fields and link objects. To retrieve information about all the objects/elements inside it, you need to parse the document and identify each individual object. For more information, please visit Working with com.aspose.pdf
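Because each kind of element is retrieved separately, one practical approach is to collect whatever each pass finds into a single inventory. The sketch below shows only that bookkeeping in plain Java; Kind, Element and countOnPage are hypothetical stand-ins, not Aspose types. In Aspose.Pdf each pass would use its own mechanism (for example TextFragmentAbsorber for text, ImagePlacementAbsorber for images and the page's annotation collection for annotations); please check the API reference of your version for the exact classes.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class ElementInventory {
    // Element categories listed in the reply above
    enum Kind { IMAGE, TEXT, ATTACHMENT, ANNOTATION, GRAPH, FORM_FIELD, LINK }

    // Hypothetical record of one extracted object (not an Aspose class)
    static final class Element {
        final int page;
        final Kind kind;
        final String detail;

        Element(int page, Kind kind, String detail) {
            this.page = page;
            this.kind = kind;
            this.detail = detail;
        }
    }

    // Counts how many elements of each kind were found on the given page
    static Map<Kind, Integer> countOnPage(List<Element> elements, int page) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Element e : elements) {
            if (e.page == page) {
                counts.merge(e.kind, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample entries taken from the text mentioned in this thread
        List<Element> found = List.of(
                new Element(1, Kind.TEXT, "Presentation QA TEST"),
                new Element(2, Kind.TEXT, "QA Text"));
        System.out.println(countOnPage(found, 1));
    }
}
```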

3 (detail) How can I save the graph on page 16 as an image to disk?
The chart inside the PDF file is drawn as a graph object. To convert it to image format, you need to convert the whole page, or a particular region of the page, to image format. For further details, please visit
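When only part of the page is needed, one option is to render the whole page and then crop the chart's rectangle out of the resulting image. The plain-Java sketch below shows the coordinate conversion this requires, assuming the page has already been rendered at a known DPI (for instance with an Aspose image device such as PngDevice; that step is not shown). PDF user-space units are 1/72 inch with the origin at the bottom-left, while image pixels start at the top-left, so the y axis has to be flipped. RegionCrop, toPixelRect and the sample rectangle are hypothetical.

```java
import java.awt.image.BufferedImage;

public class RegionCrop {
    // Converts a PDF rectangle (points, origin bottom-left) into {x, y, width, height}
    // in pixels (origin top-left) for a page image rendered at the given DPI.
    static int[] toPixelRect(double llx, double lly, double urx, double ury,
                             double pageHeightPt, int dpi) {
        double s = dpi / 72.0; // 1 PDF point = 1/72 inch
        int x = (int) Math.round(llx * s);
        int y = (int) Math.round((pageHeightPt - ury) * s); // flip the y axis
        int w = (int) Math.round((urx - llx) * s);
        int h = (int) Math.round((ury - lly) * s);
        return new int[] {x, y, w, h};
    }

    // Cuts the pixel rectangle out of a full-page rendering
    static BufferedImage crop(BufferedImage page, int[] r) {
        return page.getSubimage(r[0], r[1], r[2], r[3]);
    }

    public static void main(String[] args) {
        // Stand-in for a 612 x 792 pt (Letter) page rendered at 144 DPI
        BufferedImage page = new BufferedImage(1224, 1584, BufferedImage.TYPE_INT_RGB);
        // Hypothetical chart rectangle in PDF points
        int[] r = toPixelRect(72, 396, 540, 720, 792, 144);
        BufferedImage chart = crop(page, r);
        System.out.println(chart.getWidth() + "x" + chart.getHeight());
    }
}
```

Because each value is rounded independently, a rectangle that touches the page edge may need clamping to the image bounds before calling getSubimage.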

Hi Nayyer,

Many thanks for your reply.

Actually, I tried to get the chart graph today, but I cannot get the chart object at all. My code is:

I loop through all pages and call Paragraphs paragraphs = page.getParagraphs();

However, the paragraphs collection is always empty (length 0). The graph should exist in paragraphs, right? Only if I get the graph object can I save its rectangle as an image (I do not want to save the whole page).

If not, can you please tell me how to get it? I cannot find it in the programming guide.

Q2: I cannot get bullet objects either. Can you please help with code? I tried everything in the programming guide but got nothing.

Thanks!
Stone

Hi Lei,

Thanks for sharing the details.

I am afraid Aspose.Pdf for Java does not currently support manipulating graph charts in existing PDF files, so their rectangular region cannot be retrieved or converted to image format. However, for the sake of implementation, I have logged it as PDFNEWJAVA-35730 in our issue tracking system under the New Features list. We will look further into the details of this requirement and will keep you posted on its status. Please be patient and give us a little time. We are sorry for this inconvenience.

The issues you have found earlier (filed as PDFJAVA-35730) have been fixed in Aspose.Pdf for Java 17.2.0.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

Hi Lei,

Thanks for your patience. As stated above, your reported issue PDFJAVA-35730 has been resolved in Aspose.Pdf for Java 17.2.0. Please note that a chart object is placed in the PDF document as a marked content object. We have implemented a new PdfExtractor method, extractMarkedContentAsImages(), to extract marked contents as images. Please check the following code snippet to extract chart objects. Hopefully it will help you accomplish the task.

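import com.aspose.pdf.Document;
import com.aspose.pdf.facades.PdfExtractor;

public class ExtractCharts {
    public static void main(String[] args) throws Exception {
        // Open document
        Document document = new Document("sample.pdf");
        try {
            // Instantiate PdfExtractor
            PdfExtractor pdfExtractor = new PdfExtractor();
            // Extract chart objects (marked content) from page 1 as images into a folder
            pdfExtractor.extractMarkedContentAsImages(document.getPages().get_Item(1), "C:/Temp/Charts_page_1");
        } finally {
            // Release the document resources
            document.close();
        }
    }
}
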
Best Regards,

The issues you have found earlier (filed as PDFJAVA-35726) have been fixed in Aspose.Pdf for Java 17.3.0. See the 17.3.0 Release Notes.