Hi Aspose,
Hi Rajesh,
Thanks for your patience.
Please note that header and footer context exists only at the PDF generation stage. Once the document has been saved, there is no longer any separation between headers/footers and the main content, so we cannot retrieve headers or footers from a document that has just been opened.
In an opened document, there is only content located at certain coordinates. The TextAbsorber, ImagePlacementAbsorber and TableAbsorber classes can be used to work with it. Content can also be grouped into “marked-content elements”. If the headers or footers are grouped into such elements, we may be able to extract this content as images:
Document document = new Document(myDir+ "sample.pdf");
PdfExtractor pe = new PdfExtractor();
//Specify the folder to save extracted images
pe.extractMarkedContentAsImages(document.getPages().get_Item(1), myDir + "MarkedContentElementFolder");
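Since an opened document only exposes content at certain coordinates, one possible workaround is to treat anything inside a top or bottom band of the page as a header or footer candidate. The following is only a rough sketch of that idea in plain Java, not part of the Aspose.PDF API: the Fragment class, the classify method and the 60-point band height are hypothetical names and values chosen for illustration. In practice the text and coordinates would come from an absorber such as TextAbsorber, and the band height would need tuning per document.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: classifies text by vertical position on the page.
// Nothing here is an Aspose.PDF type; all names are hypothetical.
class HeaderFooterBands {

    enum Region { HEADER, BODY, FOOTER }

    // Stand-in for a piece of extracted text with its vertical extent,
    // in PDF points with the origin at the bottom-left corner of the page.
    static final class Fragment {
        final String text;
        final double lly; // lower edge
        final double ury; // upper edge

        Fragment(String text, double lly, double ury) {
            this.text = text;
            this.lly = lly;
            this.ury = ury;
        }
    }

    // A fragment lying entirely inside the top band is a header candidate,
    // one lying entirely inside the bottom band is a footer candidate.
    static Region classify(Fragment f, double pageHeight, double bandHeight) {
        if (f.lly >= pageHeight - bandHeight) {
            return Region.HEADER;
        }
        if (f.ury <= bandHeight) {
            return Region.FOOTER;
        }
        return Region.BODY;
    }

    // Collects the text of all fragments that fall into the requested region.
    static List<String> textIn(List<Fragment> fragments, Region wanted,
                               double pageHeight, double bandHeight) {
        List<String> result = new ArrayList<>();
        for (Fragment f : fragments) {
            if (classify(f, pageHeight, bandHeight) == wanted) {
                result.add(f.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double pageHeight = 842; // A4 height in points
        double bandHeight = 60;  // assumed margin band, tune per document

        List<Fragment> fragments = new ArrayList<>();
        fragments.add(new Fragment("Annual Report", 800, 812));
        fragments.add(new Fragment("Some body text", 400, 412));
        fragments.add(new Fragment("Page 1 of 10", 30, 42));

        System.out.println("Header candidates: "
                + textIn(fragments, Region.HEADER, pageHeight, bandHeight));
        System.out.println("Footer candidates: "
                + textIn(fragments, Region.FOOTER, pageHeight, bandHeight));
    }
}
```

This is a heuristic only: body text that happens to sit close to the page edge would be misclassified, so the result should be checked against your actual documents.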
Furthermore, regarding the other logged ticket, PDFJAVA-35762, we will let you know as soon as we have a definite update on its resolution. Please be patient and allow us some time.
We are sorry for the inconvenience.
Thanks for your patience.
We are pleased to inform you that the earlier logged feature request PDFJAVA-35762 has been fulfilled in Aspose.PDF for Java 18.1. We have implemented new functionality for searching sections and paragraphs in the text of PDF document pages. The following code snippets illustrate ParagraphAbsorber usage:
Sample #1 - Drawing borders around the sections and paragraphs of text on a PDF page:
public void PDFJAVA_35762()
{
    initLicense();
    System.out.println("Is licensed = " + Document.isLicensed());
    String myDir = "E:/LocalTesting/";
    Document doc = new Document(myDir + "amblatt2013-10-05.pdf");
    Page page = doc.getPages().get_Item(2);
    ParagraphAbsorber absorber = new ParagraphAbsorber();
    absorber.visit(page);
    PageMarkup markup = absorber.getPageMarkups().get(0);
    for (MarkupSection section : markup.getSections())
    {
        drawRectangleOnPageTest(section.getRectangle(), page);
        for (MarkupParagraph paragraph : section.getParagraphs())
        {
            drawPolygonOnPageTest(paragraph.getPoints(), page);
        }
    }
    doc.save(myDir + "amblatt2013-10-05_sections&paragraphs" + version + ".pdf");
}
private void drawRectangleOnPageTest(Rectangle rectangle, Page page)
{
    page.getContents().add(new Operator.GSave());
    page.getContents().add(new Operator.ConcatenateMatrix(1, 0, 0, 1, 0, 0));
    page.getContents().add(new Operator.SetRGBColorStroke(0, 1, 0));
    page.getContents().add(new Operator.SetLineWidth(2));
    page.getContents().add(
        new Operator.Re(rectangle.getLLX(),
            rectangle.getLLY(),
            rectangle.getWidth(),
            rectangle.getHeight()));
    page.getContents().add(new Operator.ClosePathStroke());
    page.getContents().add(new Operator.GRestore());
}
private void drawPolygonOnPageTest(Point[] polygon, Page page)
{
    page.getContents().add(new Operator.GSave());
    page.getContents().add(new Operator.ConcatenateMatrix(1, 0, 0, 1, 0, 0));
    page.getContents().add(new Operator.SetRGBColorStroke(0, 0, 1));
    page.getContents().add(new Operator.SetLineWidth(1));
    page.getContents().add(new Operator.MoveTo(polygon[0].getX(), polygon[0].getY()));
    for (int i = 1; i < polygon.length; i++)
    {
        page.getContents().add(new Operator.LineTo(polygon[i].getX(), polygon[i].getY()));
    }
    page.getContents().add(new Operator.LineTo(polygon[0].getX(), polygon[0].getY()));
    page.getContents().add(new Operator.ClosePathStroke());
    page.getContents().add(new Operator.GRestore());
}
Sample #2 - Iterating through the paragraphs collection and getting the text of each paragraph:
String myDir = "E:/LocalTesting/";
Document doc = new Document(myDir + "amblatt2013-10-05.pdf");
ParagraphAbsorber absorber = new ParagraphAbsorber();
absorber.visit(doc);
for (PageMarkup markup : absorber.getPageMarkups())
{
    int i = 1;
    for (MarkupSection section : markup.getSections())
    {
        int j = 1;
        for (MarkupParagraph paragraph : section.getParagraphs())
        {
            StringBuilder paragraphText = new StringBuilder();
            for (List<TextFragment> line : paragraph.getLines())
            {
                for (TextFragment fragment : line)
                {
                    paragraphText.append(fragment.getText());
                }
                paragraphText.append("\r\n");
            }
            paragraphText.append("\r\n");
            System.out.println("Paragraph {" + j + "} of section {" + i + "} on page {" + markup.getNumber() + "}:");
            System.out.println(paragraphText.toString());
            j++;
        }
        i++;
    }
}
Please try the functionality using the suggested code snippets. If you face any issue, please provide the details along with a sample PDF document. We will test the scenario in our environment and address it accordingly.
PS: We would really appreciate it if you could share the JDK version you are using in your environment.