Free Support Forum - aspose.com

Differences in look after PDF -> HTML -> DOC conversion

Hello,

While working with the aspose-pdf and aspose-words modules, I’ve come across major differences in the look of the document when converting to and from HTML.

Our general goal is to:

  1. convert .pdf document to html with aspose-pdf
  2. read and process the html
  3. convert the html to .doc format with aspose-words

While we’re pretty happy with the PDF to HTML conversion, the look of the .doc file differs greatly from the look of the original document.

I’ve attached three files with a simple example that illustrates the problem:
pdf-html-doc.zip (167.5 KB)

I tried two ways of loading the HTML content into a Word document:

// 1. Load the HTML through the Document constructor
String htmlContent = "…";
LoadOptions loadOptions = new HtmlLoadOptions();

Document wordDocument = new Document(new ByteArrayInputStream(htmlContent.getBytes()), loadOptions);

// The backslash must be escaped; "\r" would otherwise be read as a carriage return
wordDocument.save("c:\\result.doc");

// 2. Insert the HTML with DocumentBuilder
String htmlContent = "…";

DocumentBuilder wordDocumentBuilder = new DocumentBuilder();

wordDocumentBuilder.insertHtml(htmlContent);
wordDocumentBuilder.getDocument().save("c:\\result.doc");
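
A side note on the first approach: `getBytes()` without an argument encodes the string with the platform default charset, which before Java 18 depends on the machine's locale. If the HTML declares UTF-8, passing the charset explicitly keeps the bytes consistent across machines. This probably does not explain the layout differences, but it rules out one source of variation. The sketch below uses only the JDK; the class name `HtmlInput` and the method `toUtf8Stream` are made up for illustration and are not part of any Aspose API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an Aspose class): wraps an HTML string in a stream
// whose bytes are always UTF-8, regardless of the platform default charset.
final class HtmlInput {

    private HtmlInput() {
    }

    static ByteArrayInputStream toUtf8Stream(String html) {
        // Explicit charset instead of the no-argument getBytes()
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // "\u00e9" (e with acute accent) takes two bytes in UTF-8
        ByteArrayInputStream in = toUtf8Stream("<p>caf\u00e9</p>");
        System.out.println(in.available() + " bytes");
    }
}
```

The returned stream could then be passed to the `Document` constructor in place of `new ByteArrayInputStream(htmlContent.getBytes())`.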

Is there anything that can be done about this? Is the generated HTML compatible between the PDF and Words modules?

I would be grateful for your assistance.

@KayDash

Thanks for your inquiry. Please note that Aspose.Words mimics the behavior of MS Word. If you convert the HTML to DOC using MS Word, you will get the same output.