How to extract charts, tables, shapes and all vector objects from a PDF using Aspose.PDF for Java

We want to know whether we can extract shapes, tables, charts, and all other vector elements using the Aspose.PDF for Java APIs. I have been unable to find an API that resolves my issue. However, when we convert the PDF to PPTX through the online Aspose tool, all of those elements are converted into images in the resulting PPTX, so I believe there is a way to extract them. Kindly guide me on this.

Sample PDF -
GifTest.pdf (115.5 KB)

@johnson123

There is no single method that extracts all of these objects from a PDF document. For example, the TextAbsorber class extracts text and the TableAbsorber class extracts tables from a PDF. Recently, we added the GraphicsAbsorber class to extract graphics and move them from one page to another inside a PDF. The method below uses the same class to extract the drawn graphics:

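import com.aspose.pdf.Document;
import com.aspose.pdf.Page;

// Save each page's vector graphics to its own SVG file (dataDir is your working folder)
Document doc = new Document(dataDir + "GifTest.pdf");
for (Page page : doc.getPages()) {
    page.trySaveVectorGraphics(dataDir + "output" + page.getNumber() + ".svg");
}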
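As an illustration of the TableAbsorber class mentioned above, here is a minimal sketch of pulling table text out of each page and printing it as CSV lines. The class name TableExtractSketch, the toCsvLine helper, and the input path are assumptions made for this example, and the output depends on whether the PDF contains structures that the absorber recognizes as tables.

```java
import com.aspose.pdf.AbsorbedCell;
import com.aspose.pdf.AbsorbedRow;
import com.aspose.pdf.AbsorbedTable;
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.TableAbsorber;
import com.aspose.pdf.TextFragment;

import java.util.ArrayList;
import java.util.List;

public class TableExtractSketch {

    // Hypothetical helper: joins cell texts into one CSV line,
    // quoting cells that contain commas, quotes, or line breaks.
    static String toCsvLine(List<String> cells) {
        List<String> out = new ArrayList<>();
        for (String cell : cells) {
            boolean needsQuotes = cell.contains(",") || cell.contains("\"") || cell.contains("\n");
            String escaped = cell.replace("\"", "\"\"");
            out.add(needsQuotes ? "\"" + escaped + "\"" : escaped);
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        // Assumed path; replace with the location of your own file.
        Document doc = new Document("GifTest.pdf");
        for (Page page : doc.getPages()) {
            // Scan one page at a time for table structures.
            TableAbsorber absorber = new TableAbsorber();
            absorber.visit(page);
            for (AbsorbedTable table : absorber.getTableList()) {
                for (AbsorbedRow row : table.getRowList()) {
                    List<String> cells = new ArrayList<>();
                    for (AbsorbedCell cell : row.getCellList()) {
                        // A cell can hold several text fragments; concatenate them.
                        StringBuilder text = new StringBuilder();
                        for (TextFragment fragment : cell.getTextFragments()) {
                            text.append(fragment.getText());
                        }
                        cells.add(text.toString());
                    }
                    System.out.println(toCsvLine(cells));
                }
            }
        }
    }
}
```

Running it requires the Aspose.PDF for Java library on the classpath.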

Can you please share expected outputs for our reference so that we may know in which format you would like to extract them?

@asad.ali After testing the code you sent, we have attached the results below. However, we expected each shape to be saved as a separate image or object, with all of its properties.

Basically, we want to convert the PDF to PPTX with the Aspose.PDF for Java library. For that, we need all shapes, images, charts, and so on, together with their associated text and properties. Can you provide a sample that achieves this?

Shapes.pdf (1.5 KB)

@johnson123

It looks like your requirement is not about extracting vector graphics. Rather, you need to convert the PDF into PPTX in such a way that every graphic and shape in the output PPTX is a separate element.
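For reference, a minimal sketch of the basic PDF to PPTX conversion call is shown here. The class name PdfToPptxSketch, the pptxNameFor helper, and the default input path are assumptions made for this example. The setSeparateImages option is assumed to be available in your library version, and this sketch does not promise that shapes arrive as separate editable elements, which is exactly what the ticket below tracks.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.PptxSaveOptions;

public class PdfToPptxSketch {

    // Hypothetical helper: derives the output name by swapping a trailing ".pdf" for ".pptx".
    static String pptxNameFor(String pdfPath) {
        String base = pdfPath.toLowerCase().endsWith(".pdf")
                ? pdfPath.substring(0, pdfPath.length() - 4)
                : pdfPath;
        return base + ".pptx";
    }

    public static void main(String[] args) {
        // Assumed default path; pass your own file as the first argument.
        String input = args.length > 0 ? args[0] : "GifTest.pdf";
        Document doc = new Document(input);

        PptxSaveOptions options = new PptxSaveOptions();
        // Assumption: asks the converter to keep images as separate objects on the slide.
        options.setSeparateImages(true);

        // Writes the presentation next to the input file.
        doc.save(pptxNameFor(input), options);
    }
}
```

Running it requires the Aspose.PDF for Java library on the classpath.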

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFJAVA-43706

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@asad.ali

As you indicated above, we need to convert every component from PDF to PPTX so that we can extract the individual details from it. We were unable to obtain the full shape details from your API documentation, yet your online tools convert them to PPTX with the shapes and their properties. Could you please explain the process by which you obtain the forms and their properties?

@johnson123

Are you asking specifically about the PDF to PPTX conversion, or more generally about how the API extracts form fields from a PDF?

Are you saying that the online tool is producing the expected result? Can you please share the link you used along with the sample files?

@asad.ali

Indeed, we do a PPTX conversion from the PDF. Although the result we acquired with the online tool was not what we had anticipated, it was still significantly superior to the result we obtained from the API. Please review the sample results that are provided.

Online tool link -

Sample PDF-
GifTest.pdf (115.5 KB)

Online tool output-
2024-03-27 18-55-47.png (80.1 KB)

API output-
2024-03-28 12-12-04.png (9.8 KB)

@johnson123

Thanks for providing the requested details. We have updated the ticket with them so that they are included in the investigation. As soon as we make progress towards resolving the ticket, we will inform you in this forum thread. We appreciate your patience in this regard; please spare us some time.

We are sorry for the inconvenience.