Convert PDF to DOCX and HTML in Java using Aspose.PDF - Table should be editable table

Hi Tahir,

Thanks for your reply.
Attached are two PDF files that need to be converted to DOCX and HTML files using Aspose.PDF for Java.
Aspose2Column.zip (882.5 KB)

Note: Each table should be converted to an editable table, not an image.
The entire output document (DOCX/HTML) should use a single-column layout.
Can you please convert them and send the Java code for both the DOCX and HTML conversions?
Regards,
Berlin

@bmathew_virtusa_com

Your query is related to Aspose.PDF, so we are moving this post to the Aspose.PDF forum, where you will be guided appropriately.

@bmathew_virtusa_com

We have converted one of your PDF documents (white-paper-c11-737224.pdf) into DOCX and HTML using the following code snippets with Aspose.PDF for Java 19.12. The output files are attached for your reference. Would you kindly review them and share your feedback if you notice any anomaly? We will then proceed accordingly to assist you.

PDF to DOCX in Java

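import com.aspose.pdf.Document;
import com.aspose.pdf.DocSaveOptions;

// Load the source PDF (dataDir is the folder that holds the input file).
Document doc = new Document(dataDir + "white-paper-c11-737224.pdf");
DocSaveOptions options = new DocSaveOptions();
options.setFormat(DocSaveOptions.DocFormat.DocX);
// Flow mode aims for editable, reflowable output instead of fixed text boxes.
options.setMode(DocSaveOptions.RecognitionMode.Flow);
doc.save(dataDir + "19.12.docx", options);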

Docx.zip (332.2 KB)

PDF to HTML in Java

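import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlSaveOptions;
// Default HTML save options; nothing is customized here.
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
Document doc = new Document(dataDir + "white-paper-c11-737224.pdf");
doc.save(dataDir + "output.html", saveOptions);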

HTML.zip (1.3 MB)

Hi Ali,
Thanks for your response.

I checked the output for white-paper-c11-737224.pdf. Here are my observations:

Docx file

  1. The output DOCX file has broken images on pages 2 and 9.
  2. The output DOCX file should be single-column even though the PDF has two columns.

Html file

  1. The headers and footers are repeated. The HTML file should have one header and one footer.
  2. The output HTML file should be single-column even though the PDF has two columns.

I have attached one more file, software-system-bulletin.pdf. Can you please recheck all three files? The main points to cover are:

  1. Tables should be editable tables, not images, in both DOCX and HTML.
  2. Images in the DOCX file should match those in the PDF file.
  3. The layout should be single-column in both HTML and DOCX.
  4. Headers and footers should not repeat in the HTML.

Aspose2Column.zip (1.8 MB)
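
For anyone wanting to verify the first two points on a converted DOCX without opening Word, here is a minimal sketch using only the Java standard library (Java 9+ for readAllBytes). It relies on the fact that a .docx file is a ZIP archive whose main part is word/document.xml, where tables appear as <w:tbl> elements. The class name DocxTableCheck and its helper methods are hypothetical, and counting <w:drawing> as "images" is an assumption about how the converter stores pictures (legacy <w:pict> elements and non-standard namespace prefixes are not covered).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: rough inspection of a DOCX for real tables versus drawings.
final class DocxTableCheck {

    /** Counts non-overlapping occurrences of needle in haystack. */
    static int count(String haystack, String needle) {
        int n = 0;
        int i = haystack.indexOf(needle);
        while (i >= 0) {
            n++;
            i = haystack.indexOf(needle, i + needle.length());
        }
        return n;
    }

    /** Returns the text of word/document.xml, or "" if the entry is missing. */
    static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if ("word/document.xml".equals(e.getName())) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return "";
    }

    /** Editable tables are stored as <w:tbl> elements (assumes the usual "w" prefix). */
    static int countTables(String xml) {
        return count(xml, "<w:tbl>");
    }

    /** Pictures are usually wrapped in <w:drawing>; this is only a rough indicator. */
    static int countDrawings(String xml) {
        return count(xml, "<w:drawing>");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxTableCheck <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String xml = readDocumentXml(in);
            System.out.println("tables=" + countTables(xml) + ", drawings=" + countDrawings(xml));
        }
    }
}
```

If the source PDF clearly contains tables but the table count is 0 while the drawing count is high, the tables were most likely rendered as images. This is a heuristic only; it says nothing about column layout or image quality.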

Regards,
Berlin

@bmathew_virtusa_com

Thanks for your feedback.

We have created two investigation tickets for your requirements in our issue tracking system, as follows:

PDFJAVA-39094 - PDF to DOCX
PDFJAVA-39095 - PDF to HTML

We will further investigate the feasibility of your requirements and let you know about the status of ticket resolution. Please spare us some time.

We are sorry for the inconvenience.

Hi Aspose Team,

Any update on the two tickets?

Thanks & Regards,
Berlin

@bmathew_virtusa_com

The issues were logged recently in our issue tracking system and are still pending analysis. We will investigate and resolve them on a first-come, first-served basis and let you know as soon as they are resolved. Please spare us some time.

We are sorry for the inconvenience.

Hi Ali,

Any updates on the below open tickets?

PDFJAVA-39094 ---- Status : Open
PDFJAVA-39095 ---- Status : Open

Regards,
Berlin

@bmathew_virtusa_com

We would like to share that your issues are being investigated and are expected to be resolved in Aspose.PDF for Java 20.3, which will be available at the end of March. We will inform you as soon as we have more updates in this regard. Please spare us a little time.

Hi Aspose Team,

Can you please let us know whether the issues below are fixed?

PDFJAVA-39094 ---- Status : Open
PDFJAVA-39095 ---- Status : Open

Regards,
Berlin

@bmathew_virtusa_com

Regretfully, the tickets are not yet resolved. The implementation of the required features depends on several internal components of the API and requires more time. For now, the tickets are under feedback status. We will inform you as soon as the required functionality is implemented. Please spare us some time.

We are sorry for the inconvenience.

2 posts were split to a new topic: Issues while processing docx files with Aspose.Words for Java

Hello,

I’ve recently purchased Aspose.Words and Aspose.PDF for .NET, and I’m having the same problem with some elements being converted to images in PDF to HTML conversion; the most important of them are tables. I’d like to know if there has been any progress on this request. On the blog, all I can see is that this topic was split into two posts, but I can’t find more info. I can see just one link (Issues while processing docx…) without a solution (as far as I can tell).

Conversion from DOCX to HTML works as expected (producing editable tables, i.e. Table, TR and TD elements in the resulting HTML), but PDF to HTML conversion does not.

In PDF to HTML conversion, tables are converted to images, and sometimes these ‘tables’ are merged with other objects in the same image. Our goal is to convert the PDF so that we can insert it as HTML in a WYSIWYG editor (Kendo), where the user can edit the content if needed, and to keep it as close to the original as possible. Editable tables are the main concern so far.
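
To make the "editable table" criterion checkable before loading the output into an editor, here is a rough sketch in plain Java (the thread's language; Java 11+ for Files.readString). It counts Table, TR and TD opening tags against IMG tags in an HTML string. The class name HtmlTableCheck and its methods are hypothetical, the check uses a simple regular expression rather than a real HTML parser, and it does not look at images referenced from CSS or at SVG content.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rough check for real table markup in converted HTML.
final class HtmlTableCheck {

    /** Counts opening tags with the given name, ignoring case. */
    static int countTags(String html, String tag) {
        // The tag name must be followed by whitespace, '>' or '/',
        // so that "tr" does not match "<track>".
        Pattern p = Pattern.compile("<" + Pattern.quote(tag) + "(\\s|>|/)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** True when the markup has at least one table with a row and a data cell. */
    static boolean hasEditableTable(String html) {
        return countTags(html, "table") > 0
                && countTags(html, "tr") > 0
                && countTags(html, "td") > 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HtmlTableCheck <file.html>");
            return;
        }
        String html = Files.readString(Paths.get(args[0]));
        System.out.println("table=" + countTags(html, "table")
                + ", tr=" + countTags(html, "tr")
                + ", td=" + countTags(html, "td")
                + ", img=" + countTags(html, "img"));
        System.out.println("editable table markup present: " + hasEditableTable(html));
    }
}
```

Running this over both the DOCX to HTML output and the PDF to HTML output should show the difference described above: table markup in the first, mostly IMG tags in the second.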

Regards
Edgardo

@edgardo.delarosa

The ticket related to the issue you are facing is PDFJAVA-39095, and it is still pending resolution. We have logged your comments under the ticket and will consider them during its investigation. We will inform you as soon as we have definite updates regarding its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.