Cannot edit sections of a Word document generated from PDF, and a corrupted MS Word document

Hi,
Here is the sample code I used to convert from a PDF file to Word:
Document doc = new Document(dataDir + "Test.pdf");
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
doc.save(dataDir + "Test.docx", saveOptions);

I opened Test.docx in MS Word and tried to edit it, but I am not able to add text to the area below 'COLUMN A/B/C'. Is it impossible to edit that area because the original PDF had no text there?
Please see the screenshot below.

image.png (3.9 KB)

Also, if I modify the generated MS Word document, save it as PDF, and then create an MS Word document from that PDF using the Aspose.PDF for Java library, the output DOCX looks corrupted.

See TEST.pdf and TEST2.docx attached in the zip file: TEST-PDF_2-Word.zip (170.2 KB)

BR,
Dip

@Agiloft

Thanks for contacting support.

It is quite possible that the source PDF document has that text in graphics format and the API converts it to an image in the resultant DOCX file. However, would you please share the original source PDF? We will test the scenario in our environment and address it accordingly.

We were able to replicate the issue in our environment and have logged it as PDFJAVA-37920 in our issue tracking system. We will investigate the issue in detail and keep you informed of its resolution status. Please be patient and allow us some time.

We are sorry for the inconvenience.

Hello,
Please find the original source PDF attached for the first issue.

Exp_reg.pdf (59.1 KB)
Thanks

@Agiloft

Thanks for sharing the sample PDF.

We have tested the scenario using Aspose.PDF for Java 18.7 and noticed that the text was editable in the output DOCX file. The output is attached for your reference. Would you please try the latest version of the API? If you still face any issue, please feel free to let us know.

Exp_reg.zip (60.0 KB)
TextEditing.png (3.6 KB)

Hello,
Maybe I was not clear in explaining the issue. Please check whether you can edit the white space below the table header (I have marked the areas with red lines in the screenshot). A typical use case is that the user fills in data for the table in MS Word.
Please take a look.
Thanks

image.png (5.0 KB)

Hello,
I noticed one more thing. The output document does not have an MS Word table; instead it has the text from the PDF and a picture frame resembling a table. Please see the screenshot below, where I have dragged the picture to the right.

image.png (10.5 KB)
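
The observation above can also be checked without opening Word. A DOCX file is a ZIP archive whose main body lives in word/document.xml, and a native Word table appears there as a w:tbl element. The following is a minimal sketch, not part of Aspose.PDF: the class name DocxTableCheck and its methods are hypothetical, and it assumes the conventional "w:" namespace prefix that Word and most converters emit. A count of zero for a page that visibly shows a table suggests the table was written as a picture or drawing rather than as editable cells.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not an Aspose API): counts native Word tables in a DOCX.
class DocxTableCheck {

    // Reads word/document.xml out of the DOCX, which is a plain ZIP archive.
    static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }

    // Counts opening <w:tbl> tags. Assumes the usual "w:" prefix; the exact
    // match avoids counting <w:tblPr>, <w:tblGrid> and similar child elements.
    static int countTablesInXml(String documentXml) {
        String token = "<w:tbl>";
        int count = 0;
        int i = documentXml.indexOf(token);
        while (i >= 0) {
            count++;
            i = documentXml.indexOf(token, i + token.length());
        }
        return count;
    }

    static int countTables(InputStream docx) throws IOException {
        return countTablesInXml(readDocumentXml(docx));
    }

    // Usage: java DocxTableCheck.java Test.docx
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Native tables: " + countTables(in));
        }
    }
}
```

Running it against the converted file and against a DOCX saved from Word itself gives a quick way to compare the two outputs.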

@Agiloft

Thank you for elaborating it further.

We have been able to reproduce the issue in our environment. A ticket with ID PDFJAVA-37923 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

Hello,
Ticket PDFJAVA-37923 is very critical for us.
Please let me know an ETA for it so that I can convey a date to my customers.

Thanks

@Agiloft

The ticket PDFJAVA-37923 was only recently logged in our issue management system. It will be investigated in its due turn, which can take some months. We will share the ETA with you as soon as the ticket has been investigated. Please be patient and allow us some time.

However, we also offer Paid Support, where issues are investigated with higher priority. Customers with a paid support subscription report their issues there, and those issues are investigated urgently. If your reported issue is critical or a blocker, please consider subscribing to Paid Support. For further information, please visit Paid Support FAQs.

Hello,
Please let me know if there is any progress related to the ticket: PDFJAVA-37923.

@Agiloft

Thank you for getting back to us.

We have found PDFJAVA-37923 to be a known issue: the PDF format does not store a table as a separate entity, only the text and the lines. So we are afraid that an approach other than rendering the table as an image may not be available in the near future.
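
To illustrate why this is a reconstruction problem: a converter that sees only ruling lines and positioned text has to infer the cells itself. The sketch below is not Aspose's algorithm. It is a deliberately simplified illustration in which the class RulingGridSketch, its Fragment type, and the sample coordinates are all hypothetical, and it assumes a perfectly regular grid with PDF-style coordinates (origin at the bottom left). Real pages have merged cells, partial borders, and stray lines, which is where the heuristics get hard.

```java
import java.util.Arrays;

// Hypothetical illustration: rebuild a table grid from ruling lines and text positions.
class RulingGridSketch {

    // A piece of text with the position a PDF content stream would give it.
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Index i such that sortedBounds[i] <= v < sortedBounds[i + 1], or -1 if none.
    static int bandOf(double[] sortedBounds, double v) {
        for (int i = 0; i + 1 < sortedBounds.length; i++) {
            if (v >= sortedBounds[i] && v < sortedBounds[i + 1]) {
                return i;
            }
        }
        return -1;
    }

    // xLines: x positions of vertical rules; yLines: y positions of horizontal rules.
    // Returns rows (top to bottom) of cell text; cells without text stay empty.
    static String[][] inferTable(double[] xLines, double[] yLines, Fragment[] fragments) {
        double[] xs = Arrays.stream(xLines).sorted().distinct().toArray();
        double[] ys = Arrays.stream(yLines).sorted().distinct().toArray();
        int cols = Math.max(xs.length - 1, 0);
        int rows = Math.max(ys.length - 1, 0);

        String[][] table = new String[rows][cols];
        for (String[] row : table) {
            Arrays.fill(row, "");
        }
        for (Fragment f : fragments) {
            int col = bandOf(xs, f.x);
            int band = bandOf(ys, f.y);
            if (col < 0 || band < 0) {
                continue; // text outside the ruled area is not part of the table
            }
            int row = rows - 1 - band; // y grows upward in PDF, so the top band is row 0
            table[row][col] = table[row][col].isEmpty() ? f.text : table[row][col] + " " + f.text;
        }
        return table;
    }

    public static void main(String[] args) {
        // A header row with three columns above one empty row, as in the reported document.
        double[] xLines = {0, 100, 200, 300};
        double[] yLines = {0, 50, 100};
        Fragment[] header = {
            new Fragment("COLUMN A", 10, 60),
            new Fragment("COLUMN B", 110, 60),
            new Fragment("COLUMN C", 210, 60)
        };
        System.out.println(Arrays.deepToString(inferTable(xLines, yLines, header)));
    }
}
```

Note that the empty row exists in the result only because the lines define it. If a converter keeps the lines as a picture, as described in this thread, there is nothing in the output for Word to treat as an editable cell.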

MS Word 2016 and MS Word 365 can convert the same file, though, and the tables come out quite nicely.
Once Word provides somewhat more reliable .NET/Interop APIs, that will possibly be a way out for my customers.

@Agiloft

Thank you for your feedback.

We have recorded your comments and will let you know as soon as any significant update is available in this regard.

Hi,
Please let me know if there is a fix for this in any recent version of Aspose.PDF for Java.

@Agiloft

We are afraid PDFJAVA-37920 and PDFJAVA-37923 have not been resolved yet, and the latter ticket may not be resolved soon. We will let you know once any significant progress is made in this regard.