Free Support Forum - aspose.com

Convert PDF to DOCX and HTML in Java using Aspose.PDF - Table should be editable table

Hi Tahir,

Thanks for your reply.
I have attached two PDF files that need to be converted to DOCX and HTML using Aspose.PDF for Java.
Aspose2Column.zip (882.5 KB)

Note: The tables should be editable tables, not images.
The entire output document (DOCX/HTML) should use a single-column layout.
Could you please convert them and send the Java code for the DOCX and HTML conversions?
Regards,
Berlin

@bmathew_virtusa_com

Your query is related to Aspose.PDF, so we are moving this post to the Aspose.PDF forum, where you will be guided appropriately.

@bmathew_virtusa_com

We have converted one of your PDF documents (white-paper-c11-737224.pdf) to DOCX and HTML using the following code snippets with Aspose.PDF for Java 19.12. The output files are attached for your reference. Please review them and share your feedback if you notice any anomalies, and we will proceed accordingly.

PDF to DOCX in Java

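import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

// Load the source PDF (dataDir is the folder that holds the input file)
Document doc = new Document(dataDir + "white-paper-c11-737224.pdf");
DocSaveOptions options = new DocSaveOptions();
options.setFormat(DocSaveOptions.DocFormat.DocX);
// Flow mode aims for an editable document; the layout may differ from the PDF
options.setMode(DocSaveOptions.RecognitionMode.Flow);
doc.save(dataDir + "19.12.docx", options);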

Docx.zip (332.2 KB)
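As a rough way to check the "editable table, not an image" requirement on a generated DOCX, the sketch below counts table elements in the document body. It relies only on the fact that a DOCX file is a ZIP archive whose main body is stored in word/document.xml, with tables written as <w:tbl> elements. The class and method names (DocxTableCheck, countTables, countTablesInDocx) are hypothetical helpers for illustration, not part of Aspose.PDF, and a count of zero only suggests that the tables were not emitted as real tables.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not an Aspose API): counts WordprocessingML tables in a DOCX.
class DocxTableCheck {
    // Matches the opening <w:tbl> tag but not <w:tblPr>, <w:tblGrid>, and similar.
    private static final Pattern TABLE_OPEN = Pattern.compile("<w:tbl[\\s>]");

    /** Counts table elements (including nested ones) in document.xml content. */
    static int countTables(String documentXml) {
        Matcher m = TABLE_OPEN.matcher(documentXml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Reads word/document.xml from a DOCX file and counts its tables. */
    static int countTablesInDocx(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return countTables(xml);
            }
        }
    }
}
```

For example, calling DocxTableCheck.countTablesInDocx(dataDir + "19.12.docx") after the conversion would report how many tables the output contains; the result can then be compared by hand against the number of tables in the source PDF.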

PDF to HTML in Java

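import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlSaveOptions;
// Load the PDF and save it as HTML with the default save options
Document doc = new Document(dataDir + "white-paper-c11-737224.pdf");
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
doc.save(dataDir + "output.html", saveOptions);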

HTML.zip (1.3 MB)
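A similar rough check can be run on the HTML output by counting <table> and <img> elements in the generated markup. This is only a sketch: HtmlTagCount and countTag are hypothetical names, the regular expression is not a full HTML parser, and the thread does not confirm whether the converter emits <table> elements at all, so a low count is a prompt to inspect the file rather than proof of a defect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Aspose API): counts opening tags of one element type.
class HtmlTagCount {
    /** Counts opening tags named tagName, ignoring case; closing tags are not counted. */
    static int countTag(String html, String tagName) {
        // The tag name must be followed by whitespace, '>' or '/',
        // so that "t" does not match "<table>" or "<tr>".
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tagName) + "[\\s>/]",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

After reading output.html into a string, comparing countTag(html, "table") with countTag(html, "img") gives a quick first impression of whether tables survived as markup or were rendered as images.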

Hi Ali,
Thanks for your response.

I checked the output for white-paper-c11-737224.pdf. Here are my observations:

DOCX file

  1. The output DOCX file has broken images on pages 2 and 9.
  2. The output DOCX file should use a single-column layout even though the PDF has two columns.

HTML file

  1. The headers and footers are repeated. The HTML file should have one header and one footer.
  2. The output HTML file should use a single-column layout even though the PDF has two columns.

I have attached one more file, software-system-bulletin.pdf. Could you please recheck all three files? The main points to cover are:

  1. Tables should be editable tables, not images, in both DOCX and HTML.
  2. Images in the DOCX file should be the same as in the PDF file.
  3. The layout should be single-column in both HTML and DOCX.
  4. Headers and footers should not repeat in the HTML.

Aspose2Column.zip (1.8 MB)

Regards,
Berlin

@bmathew_virtusa_com

Thanks for your feedback.

We have created two investigation tickets for your requirements in our issue tracking system, as follows:

PDFJAVA-39094 - PDF to DOCX
PDFJAVA-39095 - PDF to HTML

We will investigate the feasibility of your requirements and let you know about the status of the tickets. Please allow us some time.

We are sorry for the inconvenience.