Failed to extract a complete PDF paragraph with Aspose.PDF for Java

yujianchuntian · June 25, 2021, 3:09am

The problem:
The content of the extracted paragraph is incomplete.
[Screenshots: 微信图片_20210625110141.png, 微信图片_20210625110150.png]

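code:
import com.aspose.pdf.Document;
import com.aspose.pdf.MarkupParagraph;
import com.aspose.pdf.MarkupSection;
import com.aspose.pdf.PageMarkup;
import com.aspose.pdf.ParagraphAbsorber;
import com.aspose.pdf.TextFragment;
import java.util.List;

public class ExtractParagraphExample {
    // Path of the input PDF (here, the test file shared later in this thread)
    private static final String FilePdfName = "testLamplight.pdf";

    public static void ExtractParagraph02() {
        // Open an existing PDF file
        Document doc = new Document(FilePdfName);
        // Instantiate ParagraphAbsorber and run it over the whole document
        ParagraphAbsorber absorber = new ParagraphAbsorber();
        absorber.visit(doc);

        for (PageMarkup markup : absorber.getPageMarkups()) {
            int i = 1; // section counter, restarted on each page

            for (MarkupSection section : markup.getSections()) {
                int j = 1; // paragraph counter, restarted in each section

                for (MarkupParagraph paragraph : section.getParagraphs()) {
                    StringBuilder paragraphText = new StringBuilder();
                    // Each line is a list of text fragments: join them, then end the line
                    for (List<TextFragment> line : paragraph.getLines()) {
                        for (TextFragment fragment : line) {
                            paragraphText.append(fragment.getText());
                        }
                        paragraphText.append("\r\n");
                    }
                    paragraphText.append("\r\n");

                    System.out.println("Paragraph " + j + " of section " + i + " on page " + markup.getNumber() + ":");
                    System.out.println(paragraphText.toString());

                    j++;
                }
                i++;
            }
        }
    }

    public static void main(String[] args) {
        ExtractParagraph02();
    }
}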

Can Aspose.PDF for Java extract the content marked by the box in the image as a separate block?
[Screenshot: QQ图片20210625110845.png]
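On extracting only the boxed content: besides ParagraphAbsorber, Aspose.PDF for Java has a TextAbsorber whose search options can be restricted to a rectangle on a page. The sketch below assumes the box position is known in PDF coordinates (points, origin at the bottom-left of the page); `RegionTextExtractor` and `extractRegion` are hypothetical names, and the file name and coordinates in `main` are placeholders, not values measured from the screenshot.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.Rectangle;
import com.aspose.pdf.TextAbsorber;

public class RegionTextExtractor {

    // Hypothetical helper: returns the text found inside the rectangle
    // (llx, lly) - (urx, ury) on the given 1-based page number.
    public static String extractRegion(Document doc, int pageNumber,
                                       double llx, double lly, double urx, double ury) {
        TextAbsorber absorber = new TextAbsorber();
        // Restrict the search to the rectangle instead of the whole page
        absorber.getTextSearchOptions().setLimitToPageBounds(true);
        absorber.getTextSearchOptions().setRectangle(new Rectangle(llx, lly, urx, ury));
        // Run the absorber on that single page only
        doc.getPages().get_Item(pageNumber).accept(absorber);
        return absorber.getText();
    }

    public static void main(String[] args) {
        // Placeholder file and coordinates: replace with the real PDF and box position
        Document doc = new Document("testLamplight.pdf");
        System.out.println(extractRegion(doc, 1, 100, 200, 250, 350));
    }
}
```

This works on page geometry rather than on the paragraph structure that ParagraphAbsorber detects, so it only helps when the box coordinates can be determined.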

asad.ali · June 25, 2021, 5:55pm

@yujianchuntian

Are you using the API with a valid license? If so, and you are still facing the issue, please share your sample PDF document with us so that we can test the scenario in our environment and address it accordingly.

yujianchuntian · June 26, 2021, 8:44am

The test results did not meet my expectations, so I haven't purchased the product yet.
I am currently not using the API with a valid license. I am using the example from "Extract Paragraph from PDF | Aspose.PDF for Java".
[Screenshot: QQ图片20210626163126.png]

Test file:
testLamplight.pdf (119.0 KB)

At present, my tests do not extract a complete section of the content.
[Screenshot: QQ图片20210626163225.png]

yujianchuntian · June 27, 2021, 3:33pm

@asad.ali Hello, do I need to purchase Aspose.PDF before I can use the API to extract a complete paragraph?
I uploaded the test file in my previous reply. Did you test it in your environment?

asad.ali · June 28, 2021, 10:01am

@yujianchuntian

You do not need to purchase the API in order to evaluate it. In fact, you can use a free 30-day temporary license to test the API without any restrictions. In trial mode, the API has a limitation: it processes only 4 elements of any collection, e.g., paragraphs, annotations, etc.

We tested the file in our environment using a valid license, and the API was able to extract all paragraphs from your PDF.
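Since the trial mode caps collections at 4 elements, the license has to be applied before any `Document` is opened. A minimal sketch using `com.aspose.pdf.License`; the license path is a placeholder for wherever the temporary license is saved, and `LicenseSetup`/`applyLicenseIfPresent` are hypothetical names. The helper returns false when the file is missing, so a silent fallback to evaluation mode (and the truncated output that comes with it) is easier to spot.

```java
import com.aspose.pdf.License;
import java.io.File;

public class LicenseSetup {

    // Applies an Aspose.PDF license file if it exists.
    // Returns false, leaving the API in evaluation mode, when the file is missing.
    public static boolean applyLicenseIfPresent(String licensePath) throws Exception {
        if (!new File(licensePath).isFile()) {
            return false;
        }
        License license = new License();
        license.setLicense(licensePath);
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder path to the temporary license file
        if (!applyLicenseIfPresent("Aspose.PDF.Java.lic")) {
            System.err.println("No license found: collections are limited to 4 elements.");
        }
        // Open the Document and run ParagraphAbsorber only after this point
    }
}
```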