Hi,
I am currently looking for a new framework to extract text from PDF files. We need to process multi-column files, hopefully with only two columns. I followed one of the code examples, but the order of the paragraphs is wrong, so I am unable to find where the last section of the first column overflows into the first section of the second column.
// Requires: import com.aspose.pdf.*; `page` is the com.aspose.pdf.Page being processed.
ParagraphAbsorber absorber = new ParagraphAbsorber();
absorber.visit(page);
for (PageMarkup markup : absorber.getPageMarkups()) {
    int i = 0; // section index within this page
    markup.setMulticolumnParagraphsAllowed(true);
    for (MarkupSection section : markup.getSections()) {
        int j = 0; // paragraph index within this section
        for (MarkupParagraph paragraph : section.getParagraphs()) {
            StringBuilder paragraphText = new StringBuilder();
            // Join the fragments of each line with spaces, one output line per text line.
            for (java.util.List<TextFragment> line : paragraph.getLines()) {
                for (TextFragment fragment : line) {
                    paragraphText.append(fragment.getText()).append(" ");
                }
                paragraphText.append("\r\n");
            }
            paragraphText.append("\r\n");
            System.out.println("Paragraph " + j + " of section " + i + " on page " + markup.getNumber() + ":");
            System.out.println(paragraphText.toString());
            j++;
        }
        i++;
    }
}
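
To make clear what I am after, here is a minimal sketch of the reading order I expect, in plain Java with no Aspose calls. `Block`, `readingOrder` and the `splitX` column boundary are hypothetical names used only for illustration. The sketch assumes that a bounding box is available for each paragraph, that the page has exactly two columns, and that y grows upward as in PDF page coordinates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ColumnOrder {

    /** Hypothetical stand-in for a paragraph and its position; not an Aspose type. */
    public static final class Block {
        public final double left; // x of the left edge
        public final double top;  // y of the top edge (larger y is higher on the page)
        public final String text;

        public Block(double left, double top, String text) {
            this.left = left;
            this.top = top;
            this.text = text;
        }
    }

    /**
     * Returns the blocks in two-column reading order: everything left of
     * splitX from top to bottom, then everything at or right of splitX
     * from top to bottom. The input list is not modified.
     */
    public static List<Block> readingOrder(List<Block> blocks, double splitX) {
        List<Block> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator
                .comparingInt((Block b) -> b.left < splitX ? 0 : 1)
                .thenComparing(Comparator.comparingDouble((Block b) -> b.top).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(320, 700, "right top"),
                new Block(50, 200, "left bottom"),
                new Block(50, 700, "left top"),
                new Block(320, 200, "right bottom"));
        // Prints: left top, left bottom, right top, right bottom (one per line)
        for (Block b : readingOrder(blocks, 300.0)) {
            System.out.println(b.text);
        }
    }
}
```

With this order, the paragraph that overflows the column break would be the last block of the left column followed directly by the first block of the right column.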