Hi,
With the code below, we can create a PDF structure such as Page - Sections - Paragraphs - Text/Image, etc.
Pdf pdf1 = new Pdf();
aspose.pdf.Section sec1 = pdf1.getSections().add();
sec1.getParagraphs().add(new aspose.pdf.Text(sec1, "paragraph 1 "));
But how can we retrieve this structure for a loaded PDF document? We load the document as below:
com.aspose.pdf.Document pdfDoc= new com.aspose.pdf.Document(new FileInputStream(file));
Is there any other way to load a PDF so that we can retrieve sections, paragraphs, text, images, etc. and loop through all nodes starting from the Document node?
Thank you.
–Sonali
Hi Sonali,
Thanks for contacting support.
As you have understood, the aspose.pdf package lets you create a PDF file in a structured manner (i.e. create a Pdf object which contains one or more Section objects, where each Section contains one or more paragraph objects). Similarly, the com.aspose.pdf package lets you create as well as manipulate existing PDF files in a structured manner (i.e. retrieve/manipulate elements from a PDF file). There, Document represents the PDF file, which contains one or more Page objects. Each Page element has a Paragraphs collection, where Image, Text, Annotation, etc. are paragraph-level elements. Please visit the following links for further details:
- Extract Text From All the Pages of a PDF Document
- Extract Images from the PDF File
- Get Bookmarks from PDF Document
- Get All Annotations from Page in a PDF
- Get Attachments from a PDF Document
In case I have not properly understood your requirement, or you have any further queries, please share some more details.
Hi Nayyer,
Thanks for looking into the query.
Below are a few more queries:
1. With com.aspose.pdf.Document, can we call document.getPages().get_item(1).getParagraphs()?
It looks like the getParagraphs() method no longer exists.
2. Suppose on one page I have inserted:
Text1, then Image1, then Text2, then Image2, then Text3.
With com.aspose.pdf, suppose I have somehow found Text2. Now I want to delete the image 'Image2' immediately following this text.
So, based on Text2, how can I get the index of Image2 to delete it?
Can we get object IDs for text and images and delete particular objects directly from the PDF, irrespective of object type?
3. Suppose an image points to a web URL. Is there any method in com.aspose.pdf.XImage to get the hyperlink associated with the image?
4. We found that if an image has a web URL, it comes through as a LinkAnnotation. Is there any method on the annotation to get this image or the image location?
Thanks.
-Sonali
sonaliag1:
1. With com.aspose.pdf.Document, can we call document.getPages().get_item(1).getParagraphs()? It looks like the getParagraphs() method no longer exists.
Hi Sonali,
Thanks for contacting support.
I have tested the scenario using the latest hotfix of Aspose.Pdf for Java 4.5.1 and, as per my observations, the getParagraphs(..) method exists.
sonaliag1:
2. Suppose on one page I have inserted: Text1, then Image1, then Text2, then Image2, then Text3.
With com.aspose.pdf, suppose I have somehow found Text2. Now I want to delete the image 'Image2' immediately following this text.
So, based on Text2, how can I get the index of Image2 to delete it?
Can we get object IDs for text and images and delete particular objects directly from the PDF, irrespective of object type?
Images are stored in the Images collection and can be retrieved using the getImages(..) method, whereas text can be accessed using the TextAbsorber class. Images in their collection are indexed separately and have no relation to the text present in the PDF.
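Since the two collections are not linked, one possible workaround is to relate text and images by their position on the page. The following is only a minimal sketch of that idea in plain Java, not Aspose API code: the Box class and the indexOfImageBelow method are hypothetical names, and it assumes you can obtain a bounding rectangle for the text you found and for each image on the page by some means. It also assumes PDF user-space coordinates, where the origin is at the bottom-left and y grows upward.

```java
import java.util.Arrays;
import java.util.List;

public class ImageAfterText {

    /** Hypothetical bounding box: lower-left (llx, lly) and upper-right (urx, ury) corners. */
    public static final class Box {
        final double llx, lly, urx, ury;

        public Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }
    }

    /**
     * Returns the 0-based list index of the image that sits closest below the
     * given text box, or -1 if no image lies below it.
     */
    public static int indexOfImageBelow(Box text, List<Box> images) {
        int best = -1;
        double bestTop = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < images.size(); i++) {
            Box img = images.get(i);
            // An image is "below" the text when its top edge is not above the
            // text's bottom edge; among those, keep the one with the highest top edge.
            if (img.ury <= text.lly && img.ury > bestTop) {
                bestTop = img.ury;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Box text2 = new Box(72, 500, 300, 520);
        List<Box> images = Arrays.asList(
                new Box(72, 540, 200, 640),   // Image1, above Text2
                new Box(72, 380, 200, 480));  // Image2, below Text2
        System.out.println(indexOfImageBelow(text2, images)); // prints 1
    }
}
```

The index returned here is a position in the list passed to the method, so it would still need to be mapped back to whatever index the library's image collection uses. This sketch only compares vertical positions; a multi-column layout would also need a horizontal overlap check.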
I am working on other queries and will get back to you soon.
Hi,
Any update? Please update us with whatever has been completed so far; you may then continue with the rest.
Or can you let us know the approach you are taking to achieve this?
Thank you.
-Sonali
sonaliag1:
3. Suppose an image points to a web URL. Is there any method in com.aspose.pdf.XImage to get the hyperlink associated with the image?
sonaliag1:
4. We found that if an image has a web URL, it comes through as a LinkAnnotation. Is there any method on the annotation to get this image or the image location?