Errors when converting from PDF to Docx

When I try to convert a PDF file that contains tables into a Word document, the tables don’t align properly. They actually look more like images.

How do I preserve the formatting correctly when doing this conversion?

Thanks

Hi Charles,


Thanks for your inquiry. You can use the Text RecognitionMode to preserve the formatting; please see the attached document. Note that with this mode the resulting document may be less editable. For complete details, please check the documentation.

Please feel free to contact us for any further assistance.

Best Regards,

I seem to be running into another issue now.

When I convert the PDF to a DOCX by itself, everything works fine. The temporary output file looks perfect.

However, if I try to load the converted file and append it to another document, the text boxes (box images) don’t align properly anymore. They appear to be shifted down 1 1/8 inches (or 81 points).

Any idea on what could be causing the alignment to mess up when the document is appended?
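
As a quick check on the figure above: Word layout is measured in points, at 72 points to the inch, so 1 1/8 inches is exactly 81 points. A minimal sketch of that arithmetic (the helper name `inches_to_points` is only illustrative and is not part of any Aspose API):

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # typographic points per inch


def inches_to_points(inches):
    """Convert a length in inches to points."""
    return inches * POINTS_PER_INCH


# 1 1/8 inches written as an exact fraction to avoid rounding noise
offset = inches_to_points(Fraction(9, 8))
print(offset)  # 81
```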




Hi Charles,


Thanks for sharing the resource files. We are looking into this query and will get back to you soon.

Hi Charles,


Sorry for the inconvenience. Regarding the formatting issue in PDF to DOCX conversion with flow recognition mode, I’ve logged a ticket, PDFNEWNET-35574, in our issue tracking system for investigation and resolution.

Regarding the formatting issue when merging DOCX files, I’m moving your request to the Aspose.Total forum. One of my colleagues from the Aspose.Words support team will answer you there soon.

Best Regards,

Hi Charles,


Thanks for your inquiry.
Using the latest version of Aspose.Words (13.6.0), I managed to reproduce this issue on my side and have logged it in our bug tracking system. The ID of your issue is WORDSNET-8617. Your request has been linked to this issue, and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

Hi Charles,


Thanks for being patient. Our development team has completed its work on your issue (WORDSNET-8617) and has concluded that the undesired behaviour you’re observing in Aspose.Words is not actually a bug, so we will most likely close this issue as ‘Not a Bug’. Please try doing the same in Microsoft Word: insert a SectionBreak nextPage and then insert all content from the source document using Paste (Keep source formatting). You will get the same result as with Aspose.Words.

The cause of the problem appears to be the source document (docFirst.docx) itself, which has a problem with its primary header. If you simply remove the HeaderPrimary from ‘docFirst.docx’, the problem goes away. We will develop workaround code for you to avoid this problem as soon as possible.
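
In the meantime, here is a rough sketch of what "removing the primary header" could look like at the file level. It does not use Aspose.Words at all: it is plain Python operating on the raw DOCX package. It assumes the primary header is referenced from `word/document.xml` by a self-closing `<w:headerReference w:type="default" .../>` element using the usual `w:` prefix. The function name `strip_primary_header_refs` is made up for this example. The header part itself is left in the package, so please treat this as an illustration and try it on a copy first.

```python
import io
import re
import zipfile

# Assumption: the primary header is referenced by a self-closing element
# with w:type="default" and the conventional "w:" namespace prefix.
PRIMARY_HEADER_REF = re.compile(
    rb'<w:headerReference\b[^>]*\bw:type="default"[^>]*/>'
)


def strip_primary_header_refs(docx_bytes: bytes) -> bytes:
    """Return a copy of a DOCX whose sections no longer reference a primary header."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "word/document.xml":
                # Drop only the primary ("default") references; first-page and
                # even-page header references are left untouched.
                data = PRIMARY_HEADER_REF.sub(b"", data)
            dst.writestr(name, data)
    return out.getvalue()
```

For example, reading ‘docFirst.docx’ as bytes, passing them through `strip_primary_header_refs`, and writing the result to a new file gives a copy you can try appending instead of the original.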

Best regards,

Hi Charles,


Thanks for your patience.

We have further investigated issue PDFNEWNET-35574, and the PDF to DOC conversion appears to be fixed in the latest release, Aspose.Pdf for .NET 9.5.0. Please try the latest release, and if you encounter any issue or have any further query, please feel free to contact us.