Hello,
4. Write the string array in text file.
nnkumbhar212: Now, my next requirement is to identify every new text paragraph. I want to convert each text paragraph into a single line of text.

Hi Navnath,

The earlier shared ticket ID PDFJAVA-35762 concerns extracting text paragraph by paragraph (rather than extracting the content from the complete document). Once this feature is implemented, we will let you know.
We are sorry for this inconvenience.
Thanks for your patience.
We are pleased to inform you that the earlier logged feature request PDFJAVA-35762 has been fulfilled in Aspose.PDF for Java 18.1. We have implemented new functionality for searching sections and paragraphs in the text of PDF document pages. The following code snippets illustrate ParagraphAbsorber usage:
Sample #1 - Drawing border of sections and paragraphs of text on PDF page:
public void PDFJAVA_35762()
{
    // initLicense() is a helper that is not shown in this snippet.
    initLicense();
    System.out.println("Is licensed = " + Document.isLicensed());
    String myDir = "E:/LocalTesting/";
    Document doc = new Document(myDir + "amblatt2013-10-05.pdf");
    Page page = doc.getPages().get_Item(2);
    ParagraphAbsorber absorber = new ParagraphAbsorber();
    absorber.visit(page);
    // Only one page was visited, so its markup is the first entry.
    PageMarkup markup = absorber.getPageMarkups().get(0);
    for (MarkupSection section : markup.getSections())
    {
        drawRectangleOnPageTest(section.getRectangle(), page);
        for (MarkupParagraph paragraph : section.getParagraphs())
        {
            drawPolygonOnPageTest(paragraph.getPoints(), page);
        }
    }
    // 'version' is not defined in this snippet; it is presumably a String declared elsewhere.
    doc.save(myDir + "amblatt2013-10-05_sections&paragraphs" + version + ".pdf");
}
// Strokes a green rectangle (line width 2) around a section.
private void drawRectangleOnPageTest(Rectangle rectangle, Page page)
{
    page.getContents().add(new Operator.GSave());
    page.getContents().add(new Operator.ConcatenateMatrix(1, 0, 0, 1, 0, 0));
    page.getContents().add(new Operator.SetRGBColorStroke(0, 1, 0));
    page.getContents().add(new Operator.SetLineWidth(2));
    page.getContents().add(
        new Operator.Re(rectangle.getLLX(),
            rectangle.getLLY(),
            rectangle.getWidth(),
            rectangle.getHeight()));
    page.getContents().add(new Operator.ClosePathStroke());
    page.getContents().add(new Operator.GRestore());
}

// Strokes a blue polygon (line width 1) around a paragraph.
private void drawPolygonOnPageTest(Point[] polygon, Page page)
{
    page.getContents().add(new Operator.GSave());
    page.getContents().add(new Operator.ConcatenateMatrix(1, 0, 0, 1, 0, 0));
    page.getContents().add(new Operator.SetRGBColorStroke(0, 0, 1));
    page.getContents().add(new Operator.SetLineWidth(1));
    page.getContents().add(new Operator.MoveTo(polygon[0].getX(), polygon[0].getY()));
    for (int i = 1; i < polygon.length; i++)
    {
        page.getContents().add(new Operator.LineTo(polygon[i].getX(), polygon[i].getY()));
    }
    page.getContents().add(new Operator.LineTo(polygon[0].getX(), polygon[0].getY()));
    page.getContents().add(new Operator.ClosePathStroke());
    page.getContents().add(new Operator.GRestore());
}
Sample #2 - Iterating through the paragraphs collection and getting the text of each paragraph:
String myDir = "E:/LocalTesting/";
Document doc = new Document(myDir + "amblatt2013-10-05.pdf");
ParagraphAbsorber absorber = new ParagraphAbsorber();
absorber.visit(doc);
for (PageMarkup markup : absorber.getPageMarkups())
{
    int i = 1; // section counter, restarted on every page
    for (MarkupSection section : markup.getSections())
    {
        int j = 1; // paragraph counter, restarted in every section
        for (MarkupParagraph paragraph : section.getParagraphs())
        {
            StringBuilder paragraphText = new StringBuilder();
            for (List<TextFragment> line : paragraph.getLines())
            {
                for (TextFragment fragment : line)
                {
                    paragraphText.append(fragment.getText());
                }
                paragraphText.append("\r\n");
            }
            paragraphText.append("\r\n");
            System.out.println("Paragraph {" + j + "} of section {" + i + "} on page {" + markup.getNumber() + "}:");
            System.out.println(paragraphText.toString());
            j++;
        }
        i++;
    }
}
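Since the original requirement was to turn each paragraph into a single line and write the result to a text file, the paragraph text collected in Sample #2 could be post-processed roughly as follows. This is only a sketch using the standard Java library: the class ParagraphLines and its methods are hypothetical helpers, not part of Aspose.PDF, and it assumes each paragraph is already available as a String whose lines are separated by "\r\n", as built by paragraphText above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; none of these names come from Aspose.PDF.
class ParagraphLines {

    // Replaces every run of whitespace (including "\r\n") with one space
    // and trims the ends, so the paragraph fits on a single line.
    static String toSingleLine(String paragraphText) {
        return paragraphText.replaceAll("\\s+", " ").trim();
    }

    // Converts all paragraphs and drops those that end up empty.
    static List<String> toSingleLines(List<String> paragraphs) {
        List<String> result = new ArrayList<>();
        for (String paragraph : paragraphs) {
            String line = toSingleLine(paragraph);
            if (!line.isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }

    // Writes one paragraph per line, encoded as UTF-8.
    static void writeLines(List<String> lines, Path target) throws IOException {
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data shaped like the output of Sample #2.
        List<String> paragraphs = Arrays.asList(
            "First paragraph,\r\nwrapped over two lines.\r\n\r\n",
            "Second paragraph.\r\n\r\n");
        Path out = Files.createTempFile("paragraphs", ".txt");
        writeLines(toSingleLines(paragraphs), out);
        System.out.println("Wrote " + out);
    }
}
```

In Sample #2 you would add paragraphText.toString() to a List<String> instead of printing it, then pass that list to toSingleLines and writeLines. Note that this sketch does not rejoin words that were hyphenated across a line break.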
Please try the functionality using the suggested code snippets. If you face any issue, please provide the details along with a sample PDF document. We will test the scenario in our environment and address it accordingly.
PS: We would really appreciate it if you could share the JDK version you are using in your environment.
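If it helps, the JDK details can be read from the standard system properties at runtime. The small class below is just an illustrative sketch (JdkInfo is a made-up name); running java -version on the command line gives similar information.

```java
// Hypothetical helper for reporting the runtime environment.
class JdkInfo {

    // Builds a one-line summary from standard JVM system properties.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
            + ", java.vendor=" + System.getProperty("java.vendor")
            + ", os.name=" + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```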