Hi,
Here is the sample code I used to convert from a PDF file to Word:
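import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;
// Load the source PDF (dataDir is the folder holding the input file)
Document doc = new Document(dataDir + "Test.pdf");
DocSaveOptions saveOptions = new DocSaveOptions();
// Request DOCX output rather than the legacy DOC format
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
doc.save(dataDir + "Test.docx", saveOptions);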
I opened Test.docx in MS Word and tried to edit it, but I am not able to add text to the area below ‘COLUMN A/B/C’. Is this because the original PDF has no text in that area, so it cannot be edited?
Please see the screenshot below.
It is quite possible that the source PDF document contains that text as graphics, in which case the API converts it to an image in the resulting DOCX file. However, would you please share the original source PDF? We will test the scenario in our environment and address it accordingly.
We were able to replicate the issue in our environment and have logged it as PDFJAVA-37920 in our issue tracking system. We will investigate the issue in detail and keep you informed of its resolution status. Please be patient and spare us a little time.
We have tested the scenario using Aspose.PDF for Java 18.7 and noticed that the text was editable in the output DOCX file. For your reference, the output is also attached. Would you please try the latest version of the API, and if you still face any issue, please feel free to let us know.
Hello,
Maybe I was not clear in explaining the issue. Please check whether you can edit the white space below the table header (I have marked the areas with red lines in the screenshot). A typical use case is that the user fills in data for the table in MS Word.
Please take a look.
Thanks
Hello,
I noticed one more thing. The output document does not contain an MS Word table; instead, it contains the text from the PDF and a picture frame resembling a table. Please see the screenshot below, where I have dragged the picture to the right.
We have been able to reproduce the issue in our environment. A ticket with ID PDFJAVA-37923 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive a notification as soon as the ticket is resolved.
The ticket PDFJAVA-37923 has recently been logged in our issue management system. It will be investigated in its turn, which may take some months. We will share the ETA with you as soon as the ticket has been investigated. Please be patient and spare us some time.
However, we also offer Paid Support, where issues are investigated with higher priority. Customers with a paid support subscription report their issues there, and those issues are investigated urgently. If your reported issue is critical or a blocker, you may consider subscribing to Paid Support. For further information, please visit the Paid Support FAQs.
We have found PDFJAVA-37923 to be a known issue: the PDF format does not store a table as a separate entity, only as text and drawn lines. We are afraid that no approach other than rendering the table as an image may be available in the near future.
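To illustrate the point above that a PDF page holds only positioned text and drawn lines, here is a minimal, hypothetical sketch of what a converter would have to do to rebuild an editable table: derive column edges from vertical lines, row edges from horizontal lines, and drop each text run into the cell that contains it. This is not how Aspose.PDF works internally; the class and method names (`TableGridSketch`, `inferTable`, `Line`, `TextRun`) are invented for illustration. It assumes perfectly axis-aligned lines with exactly matching coordinates, whereas real PDFs would need tolerances, merged-cell handling, and joining of split line segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: infer a table grid from the ruling lines and text
// runs that a PDF page actually contains. Not an Aspose.PDF API.
class TableGridSketch {

    // A straight ruling line from (x1, y1) to (x2, y2) in PDF user space.
    static final class Line {
        final double x1, y1, x2, y2;

        Line(double x1, double y1, double x2, double y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }

        boolean isHorizontal() { return y1 == y2 && x1 != x2; }

        boolean isVertical() { return x1 == x2 && y1 != y2; }
    }

    // A run of text drawn at (x, y); a PDF positions text, it does not nest it in cells.
    static final class TextRun {
        final double x, y;
        final String text;

        TextRun(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Returns cells[row][col]. Column edges come from vertical lines, row edges
    // from horizontal lines. Rows are ordered by increasing y; since y grows
    // upward in PDF user space, row 0 is the bottom row.
    static String[][] inferTable(List<Line> lines, List<TextRun> runs) {
        TreeSet<Double> xs = new TreeSet<>();
        TreeSet<Double> ys = new TreeSet<>();
        for (Line l : lines) {
            if (l.isVertical()) xs.add(l.x1);
            if (l.isHorizontal()) ys.add(l.y1);
        }
        List<Double> colEdges = new ArrayList<>(xs);
        List<Double> rowEdges = new ArrayList<>(ys);
        int rows = Math.max(0, rowEdges.size() - 1);
        int cols = Math.max(0, colEdges.size() - 1);

        // Every cell starts empty; cells with no text stay empty strings.
        String[][] cells = new String[rows][cols];
        for (String[] row : cells) Arrays.fill(row, "");

        // Drop each text run into the cell whose rectangle contains its origin.
        for (TextRun t : runs) {
            int r = intervalIndex(rowEdges, t.y);
            int c = intervalIndex(colEdges, t.x);
            if (r >= 0 && c >= 0) {
                cells[r][c] = cells[r][c].isEmpty() ? t.text : cells[r][c] + " " + t.text;
            }
        }
        return cells;
    }

    // Index i such that edges[i] <= v < edges[i + 1], or -1 if v is outside the grid.
    private static int intervalIndex(List<Double> edges, double v) {
        for (int i = 0; i + 1 < edges.size(); i++) {
            if (v >= edges.get(i) && v < edges.get(i + 1)) return i;
        }
        return -1;
    }
}
```

For example, three vertical lines at x = 0, 50, 100 and three horizontal lines at y = 0, 30, 60, with "COLUMN A" drawn at (5, 45) and "COLUMN B" at (55, 45), yield a 2×2 grid whose top row holds the two headers and whose bottom row is empty. Those empty cells have no text objects in the PDF at all; they exist only once a grid has been inferred from the lines.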
MS Word 2016 and MS Word 365 can convert the same PDF, though, and the tables come out quite nicely.
Once Word provides somewhat more reliable .NET/Interop APIs, that may be a way out for my customers.
We are afraid PDFJAVA-37920 and PDFJAVA-37923 have not been resolved yet, and the latter ticket may not be resolved soon. We will let you know once any significant progress has been made in this regard.