See the attached sample: it just opens a PDF file and saves it as DOCX. After the conversion, many text items that were placed at specific locations over an image end up in completely wrong positions.
And this is the sample; it also contains the original PDF file. PdfToWord.zip (965.4 KB)
The PDF file shows the seating plan for an open-air theater, and the numbers are the seat numbers, which seem to be absolutely positioned text. I received this file from a customer, so I don’t know how it was generated.
I hope this is something that you can optimize, as we have to use this feature to add existing PDF files to the end of a report.
@wknauf
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-27271
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
Please note that Aspose.Words is designed to work with MS Word documents. MS Word documents are flow documents, and their structure is very similar to the Aspose.Words Document Object Model. PDF documents, on the other hand, are fixed-page format documents. When a PDF document is converted, its fixed-page structure has to be converted into the flow Document Object Model. Unfortunately, such a conversion does not guarantee 100% fidelity.
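To illustrate why a fixed-to-flow conversion can lose positioning, here is a toy sketch in Python. It is not Aspose.Words code and does not reflect the actual recognition logic. The `Fragment` type, the `to_flow` function, the tolerance value, and the coordinates are all made up for illustration. It assumes that a fixed page is just a set of text fragments with absolute coordinates, and that a flow document is just an ordered list of text lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fixed-page text item: absolute position plus text."""
    x: float
    y: float
    text: str


def to_flow(fragments, line_tolerance=2.0):
    """Naively turn positioned fragments into flow text lines.

    Fragments whose y values are within line_tolerance of the first
    fragment of the current line are merged into that line, ordered by x.
    The coordinates themselves are discarded in the result.
    """
    lines = []  # list of (anchor_y, [fragments on that line])
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(frag.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return [
        " ".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    ]


# Made-up "seat numbers" scattered over a page.
seats = [
    Fragment(120.0, 50.0, "1"),
    Fragment(300.0, 50.5, "2"),
    Fragment(40.0, 90.0, "3"),
    Fragment(410.0, 90.0, "4"),
]
flow_lines = to_flow(seats)
print(flow_lines)
```

In this toy model the reading order survives, but the horizontal and vertical offsets that put each number over its seat are gone. A real converter has to approximate them with flow constructs such as tabs, spacing, frames, or text boxes, which is where the fidelity is lost.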
I see that the linked issue has the state “analysis complete”. Do you have any updates for me? Is there hope that this can/will be improved by Aspose.Words ;-)?
It is a very complicated task to change the current document recognition logic to support such files.
Maybe we can add an option to PdfLoadOptions to create a fixed layout in the DOCX file.
@wknauf Unfortunately, there are no estimates yet.
Yes, there should not be any problems with appending such a document to an existing one. But there might be difficulties with editing the document, since MS Word documents are flow documents by nature and it is hard to edit fixed content in them.
In our use case, there is no reason to edit the content of the appended PDF. Only the first part of the document (before the SectionStart.NewPage) might be edited. So, this might work.
Well, I have to continue waiting and will ask for an update every few weeks …