Convert PDF to DOCX in Java using Aspose.PDF | formatting issues in output file

We are getting formatting issues when we convert a PDF file to DOCX format.
Could you share a code snippet that preserves the formatting and does not garble the output?

@crimson

You can use the following code snippet to convert a PDF into DOCX.

using (Document pdfFile = new Document(dataDir + "TestDoc.pdf"))
{
    pdfFile.Save(dataDir + "ducaisoft.docx", new DocSaveOptions()
    {
        Format = DocSaveOptions.DocFormat.DocX,
        Mode = DocSaveOptions.RecognitionMode.Flow,
        RecognizeBullets = true
    });
}

If you still face any issue, please share your sample PDF document with us. We will test the scenario in our environment and address it accordingly.

Hi,
Thanks for the snippet. I made the above changes, but the issue we are facing is that the conversion adds unnecessary line breaks to the DOCX file. Our objective is to use this DOCX file for further editing, which we cannot do because the output has a line break after every sentence.
Please check the attached PDF and DOCX (i.e. output) files.
NOTE: We are implementing this in Java. Please share a code snippet for Java accordingly.
input-output.zip (212.3 KB)

@crimson

We have checked the documents you shared and noticed that the output Word document looks the same as the input PDF. Would you please share how you are detecting the line breaks after each sentence, so that we can observe the issue in our environment and address it accordingly?
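One way to check for extra breaks without relying on how Word renders the file is to inspect the DOCX package directly. The following is only a minimal sketch using the Java standard library, not Aspose.PDF: it assumes the file is a standard DOCX (a ZIP archive containing word/document.xml) and counts paragraph elements (w:p) and explicit break elements (w:br) with a simple pattern match rather than a full XML parser. The class name DocxBreakCounter and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only; not part of Aspose.PDF.
class DocxBreakCounter {
    // Matches <w:p>, <w:p ...> and <w:p/>, but not <w:pPr> or </w:p>.
    static final Pattern PARAGRAPH = Pattern.compile("<w:p[ >/]");
    // Matches <w:br/> and <w:br ...>; this also counts page and column breaks.
    static final Pattern LINE_BREAK = Pattern.compile("<w:br[ >/]");

    static int countMatches(Pattern pattern, String xml) {
        Matcher matcher = pattern.matcher(xml);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    static int countParagraphs(String xml) {
        return countMatches(PARAGRAPH, xml);
    }

    static int countLineBreaks(String xml) {
        return countMatches(LINE_BREAK, xml);
    }

    // Reads word/document.xml from a DOCX stream (a DOCX file is a ZIP archive).
    static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int read;
                    // read() returns -1 at the end of the current entry.
                    while ((read = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a DOCX file?");
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the converted DOCX file.
        try (InputStream in = new FileInputStream(args[0])) {
            String xml = readDocumentXml(in);
            System.out.println("paragraphs: " + countParagraphs(xml));
            System.out.println("line breaks: " + countLineBreaks(xml));
        }
    }
}
```

If the paragraph count is close to the number of visual lines in the PDF rather than the number of logical paragraphs, that would suggest each line was written as its own paragraph, which would match the editing behavior described in this thread.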

In this example, when I tried to add more text, it did not flow onto the next line. Ideally the text should have come between “perceived” and “benefits”.
image.png (12.8 KB)

@crimson

We have observed an issue similar to the one you described and logged it under ticket ID PDFJAVA-39503 in our issue tracking system. We will investigate it in detail and keep you posted on the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.