PDF to DOCX: extra pages, list numbering, and odd characters

Hello -

I am using Aspose PDF for Java 9.7.0 to convert PDF to DOCX. The latest version of the tool exhibits several problems that were not present previously.

First, I have noticed that with many different PDFs it inserts blank pages between the pages of the output. Attached is a PDF file that exhibits this behavior, along with its DOCX conversion.

Second, in the same file, you can also see that the numbered lists are incorrect. In the original PDF, for example, the list under header “4” is numbered 4.1, 4.2, 4.3, etc.

In the output, however, all of these sub-items read simply “4”. Further down is a similar case, where all sub-items read “10.7”.

Third, please see bullets under “Value Adds” (section 8.0) in the output; there are ? characters that don’t belong after each bullet.

Thank you.



Hi Chris,


Thanks for contacting support.

I have tested the scenario using the following code snippet in an Eclipse Juno project running on Windows 7 (x64) with JDK 1.7, and I am unable to reproduce the issues described above in the resultant DOCX file. For your reference, I have also attached the resultant file generated with Aspose.Pdf for Java 9.7.0.

[Java]

// load source PDF file

com.aspose.pdf.Document doc = new com.aspose.pdf.Document("c:/pdftest/testTemplate3.pdf");

// save output in DOCX format

doc.save("C:/pdftest/testTemplate3_conversion.docx", com.aspose.pdf.SaveFormat.DocX);

Hi Nayyer,
I neglected to post the offending code snippet. It turns out we were using a combination of recognition mode and output settings that appears to be deprecated in newer versions of Aspose PDF. In this case we are using “Flow” recognition mode, and after I updated the code to reflect the newest version of the save options (bullet recognition, save mode, and recognition mode), I was able to get it to work. Thank you for your time.
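
The updated code itself was not posted in this thread. As a rough sketch only, setting those three options explicitly might look like the following. It assumes the DocSaveOptions setters (setFormat, setMode, setRecognizeBullets) found in recent Aspose.Pdf for Java releases, which may differ in 9.7.0. The file paths are reused from the earlier snippet, and the class name is hypothetical.

```java
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

// Hypothetical class name; not the poster's actual code.
public class PdfToDocxFlowSketch {
    public static void main(String[] args) {
        // Load the source PDF (path reused from the earlier snippet)
        Document doc = new Document("c:/pdftest/testTemplate3.pdf");

        DocSaveOptions saveOptions = new DocSaveOptions();
        // Save mode: produce DOCX output
        saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
        // Recognition mode: "Flow", as mentioned in the post above
        saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
        // Bullet recognition: ask the converter to detect list bullets
        saveOptions.setRecognizeBullets(true);

        // Save using the explicit options instead of SaveFormat.DocX
        doc.save("C:/pdftest/testTemplate3_conversion.docx", saveOptions);
    }
}
```

Whether these exact settings match what resolved the problem is an assumption; the post names only the three option categories.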

Hi Chris,


Thanks for sharing the feedback.

Particular rendering modes are used depending on the requirement and, as shared earlier, the output generated without setting any DocSaveOptions.RecognitionMode is correct. However, we are glad to hear that your problems are resolved. Please continue using our API, and in the event of any further queries, please feel free to contact us.