I want to extract tables and paragraphs sequentially using Aspose in Python

kalyani222000 · June 12, 2024, 8:45pm

Hi, I am trying to use Aspose to extract a PDF paragraph by paragraph, and I want the extraction to be sequential. For example, it reads a paragraph and extracts it; if it then encounters a table, it extracts the table. The output should follow the same order as the content in the original PDF. I am not able to do this sequentially. How can I identify tables and paragraphs and extract them in order?

asad.ali · June 12, 2024, 11:58pm

@kalyani222000

Would you please share your sample PDF along with the expected output and the code snippet you have used? We will log an investigation ticket and share the ID with you.

kalyani222000 · June 13, 2024, 9:05am

phishing_checklist.pdf (345.6 KB)

I want the text in this PDF to be extracted, but the extraction is missing some of it.

kalyani222000 · June 13, 2024, 9:06am

In this PDF, I want it to identify each paragraph and extract it as a paragraph, and identify each table and extract it as a table. Currently, the paragraph absorber also extracts the table content.
pdfToHtmlSample.pdf (110.8 KB)

asad.ali · June 13, 2024, 5:49pm

@kalyani222000

Could you please also share the code snippet that you are using, so that we can investigate and address the issue accordingly?
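
Until the code is shared, one general way to get the sequential output described above can be sketched independently of any library: collect the bounding box of every paragraph and every table, drop paragraphs that lie inside a table (these are the table cells the paragraph absorber also picks up), and sort what remains by page and vertical position. The `Block` class and the `inside` and `reading_order` helpers below are hypothetical names for illustration, not part of the Aspose API. How the rectangles are obtained from Aspose is not shown here, and the sketch assumes PDF-style coordinates with the origin at the bottom-left of the page and a single-column layout.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Block:
    """Hypothetical container for one extracted item and its bounding box."""
    kind: str      # "paragraph" or "table"
    page: int      # 1-based page number
    llx: float     # lower-left x
    lly: float     # lower-left y
    urx: float     # upper-right x
    ury: float     # upper-right y
    content: Any   # extracted text, or rows of cells for a table


def inside(inner: Block, outer: Block, tol: float = 1.0) -> bool:
    """True if `inner` lies within `outer` on the same page (with tolerance)."""
    return (
        inner.page == outer.page
        and inner.llx >= outer.llx - tol
        and inner.lly >= outer.lly - tol
        and inner.urx <= outer.urx + tol
        and inner.ury <= outer.ury + tol
    )


def reading_order(paragraphs: List[Block], tables: List[Block]) -> List[Block]:
    """Merge paragraphs and tables into top-to-bottom page order."""
    # Discard paragraphs that are really table cells.
    kept = [p for p in paragraphs if not any(inside(p, t) for t in tables)]
    # Assumption: origin is bottom-left, so a larger top edge means higher
    # on the page; ties are broken left to right.
    return sorted(kept + list(tables), key=lambda b: (b.page, -b.ury, b.llx))
```

The result is a single list in which each item reports its `kind`, so the caller can handle paragraphs and tables differently while keeping the order of the original PDF. Multi-column pages would need a more careful sort key than the one assumed here.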