The same paragraph is split into two while converting multi-column PDF to DOCX

We are evaluating Aspose.PDF (v21.8.0) for converting PDF into DOCX.

I convert a multi-column PDF document (test.pdf) to DOCX (test_savedToDocx.docx) with the following code:

using Aspose.Pdf;
// Load the multi-column PDF and save it as DOCX with the default options.
var pdfDocument = new Document("test.pdf");
pdfDocument.Save("test_savedToDocx.docx", SaveFormat.DocX);

The .docx output contains a corrupted paragraph:

image.png (43.6 KB)

The text “blockchains, if…” is part of the “The blockchain was …” paragraph, but it has been moved into a separate paragraph:

image.png (56.3 KB)

Is this a bug, or is there a save option that prevents such paragraph breaks?

test.pdf (34.4 KB)
test_savedToDocx.docx (15.9 KB)
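
For reference, a minimal sketch of passing explicit save options instead of `SaveFormat.DocX`. It uses `DocSaveOptions` with `RecognitionMode.Flow`, which Aspose.PDF documents as the mode aimed at producing editable, flowing text. Nothing in this thread confirms that it keeps the paragraph together for this file, and the output file name is a hypothetical chosen for illustration.

```csharp
using Aspose.Pdf;

// Sketch only: not confirmed in this thread to prevent the paragraph split.
var pdfDocument = new Document("test.pdf");

var saveOptions = new DocSaveOptions
{
    // Request DOCX output.
    Format = DocSaveOptions.DocFormat.DocX,
    // Flow mode asks the converter to reconstruct editable text flow.
    Mode = DocSaveOptions.RecognitionMode.Flow
};

// Hypothetical output name, kept separate from the original result for comparison.
pdfDocument.Save("test_savedToDocx_flow.docx", saveOptions);
```

Comparing the paragraph structure of this output with test_savedToDocx.docx would show whether the recognition mode has any effect on the split.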

@AdamSh

I cannot reproduce the issue: the output looks the same as the PDF file. Please check closely and share your feedback. DOCX_output.PNG (56.3 KB)

Yes, the rendered output looks correct, but the layout is broken: in this case, the beginning of the second column should be a continuation of the first, i.e. all of that content should be in the same paragraph.

@AdamSh

A ticket with ID PDFNET-50438 has been created in our issue tracking system to investigate the issue further on our end. This thread has been linked to the ticket so that you will be notified once the issue is fixed.
