Aspose extracted text in wrong location

I am using the following code to extract text from the attached document:

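void extract(Page pageObject) {
    var paragraphAbsorber = new ParagraphAbsorber();
    // Populate the absorber from the page before reading its markups.
    paragraphAbsorber.visit(pageObject);

    for (PageMarkup markup : paragraphAbsorber.getPageMarkups()) {
        for (MarkupSection section : markup.getSections()) {
            for (MarkupParagraph paragraph : section.getParagraphs()) {
                String text = paragraph.getText();
                // ... rest of the method omitted in the original post
            }
        }
    }
}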

The extracted text does not match the text in the document.

The following text
Subject 999-999 was a 99-year-old xxxxxxxxxxx, who was diagnosed with atopic dermatitis in 9999 and had a disease duration of 9 years. The subject was randomized to receive placebo subcutaneous once every week starting on 99 XXX 9999 (Week x), as per protocol.

is extracted as

Subject was a -year-old , who was diagnosed with atopic dermatitis in and had a disease duration of years. The subject was randomized to receive placebo subcutaneous once every week starting on (Week999-999 ), as per protoc99 ol.

1524.pdf (199.6 KB)


We have reproduced the issue on our side and logged it in our issue tracking system as PDFJAVA-41405. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.