Free Support Forum - aspose.com

Aspose.PDF paragraph extraction does not work properly when a paragraph flows across columns or pages

Hi,
I am trying to extract paragraphs from a PDF using ParagraphAbsorber. However, if a paragraph flows beyond the current column into the next column, or beyond a page boundary, Aspose.PDF treats it as two separate paragraphs, but it should be one paragraph.

Document pddoc = new Document(new FileInputStream("C:/Users/aswarna/Documents/FlowingErrosInParagraphIdentification.pdf"));

for (int i = 1; i <= pddoc.getPages().size(); i++) {
	Page pdPage = pddoc.getPages().get_Item(i);
	ParagraphAbsorber paraAbsorber1 = new ParagraphAbsorber();
	// Run the absorber on the current page; this call is missing from the snippet as posted
	paraAbsorber1.visit(pdPage);

	List<PageMarkup> pm = paraAbsorber1.getPageMarkups();
	Iterator<PageMarkup> pmIter1 = pm.iterator();
	// Iterator<PageMarkup> pmIter2=
	// paraAbsorber2.getPageMarkups().iterator();
	while (pmIter1.hasNext()) {
		PageMarkup markup = pmIter1.next();
		// Each page markup is split into sections (for example, columns)
		List<MarkupSection> mss = markup.getSections();
		Iterator<MarkupSection> msIter1 = mss.iterator();

		while (msIter1.hasNext()) {
			MarkupSection ms = msIter1.next();
			// Each section holds the paragraphs detected inside it
			List<MarkupParagraph> pgs = ms.getParagraphs();
			Iterator<MarkupParagraph> mpIter1 = pgs.iterator();

			while (mpIter1.hasNext()) {
				MarkupParagraph p1 = mpIter1.next();
				// (the rest of the loop body was not included in the post)
			}
		}
	}
}



In the attached PDF, the lower-left paragraph in the first column and the top paragraph in the second column should be one paragraph.
Likewise, the bottom-right paragraph of the first page and the first paragraph of the second page should be one paragraph.
Is there a way to achieve this in Aspose?

issueContent.zip (232.0 KB)


We were able to observe a similar issue in our environment while using Aspose.PDF for Java 20.8 and have logged it as PDFJAVA-39700 in our issue tracking system. We will investigate this issue in detail and keep you informed of its resolution status. Please be patient and allow us some time.

We are sorry for the inconvenience.
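
Until the issue is resolved, one possible interim approach is to post-process the extracted paragraph texts yourself. The sketch below is only a heuristic and not an Aspose.PDF feature: it assumes you have already collected each paragraph's text into a list of strings in reading order (for example, by concatenating the fragments of each MarkupParagraph), and it joins a paragraph to the next one when it does not end with sentence-closing punctuation. The class name ParagraphMerger and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.PDF.
class ParagraphMerger {

    // Heuristic assumption: text that does not end with '.', '!' or '?'
    // continues in the next extracted paragraph (next column or next page).
    static boolean looksUnfinished(String text) {
        String t = text.trim();
        if (t.isEmpty()) {
            return false;
        }
        char last = t.charAt(t.length() - 1);
        return ".!?".indexOf(last) < 0;
    }

    // Merges consecutive paragraph texts, given in reading order,
    // whenever the previous one looks unfinished.
    static List<String> mergeFlowing(List<String> paragraphs) {
        List<String> merged = new ArrayList<>();
        for (String p : paragraphs) {
            String current = p.trim();
            if (current.isEmpty()) {
                continue; // skip empty extraction results
            }
            int lastIdx = merged.size() - 1;
            if (lastIdx >= 0 && looksUnfinished(merged.get(lastIdx))) {
                // Previous paragraph was cut off: append this one to it
                merged.set(lastIdx, merged.get(lastIdx) + " " + current);
            } else {
                merged.add(current);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> extracted = Arrays.asList(
                "This paragraph starts in the first column and",
                "continues at the top of the second column.",
                "This is a separate paragraph.");
        for (String paragraph : mergeFlowing(extracted)) {
            System.out.println(paragraph);
        }
    }
}
```

This will wrongly merge headings or list items that have no closing punctuation, so the rule would need to be adapted to the documents being processed.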