Hello,
During my work with the aspose-pdf and aspose-words modules, I've come across major differences in the look of the document when converting to and from HTML.
Our general goal is to:
- convert a .pdf document to HTML with aspose-pdf
- read and process the HTML
- convert the HTML to .doc format with aspose-words
While we're pretty happy with the PDF-to-HTML conversion, the resulting .doc file looks very different from the original document.
I've attached three files with a simple example to visualize the problem:
pdf-html-doc.zip (167.5 KB)
I tried two ways of loading the HTML content into a Word document:
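// Option 1: load the HTML string as a new document via HtmlLoadOptions
import com.aspose.words.*;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
String htmlContent = "…";
LoadOptions loadOptions = new HtmlLoadOptions();
Document wordDocument = new Document(new ByteArrayInputStream(htmlContent.getBytes(StandardCharsets.UTF_8)), loadOptions); // explicit UTF-8, not the platform default
wordDocument.save("c:\\result.doc"); // backslash doubled: "\r" would be a carriage return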
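A side note on the first snippet, separate from the layout question: `String.getBytes()` with no argument encodes with the JVM's default charset, which can differ between machines, so non-ASCII characters in the HTML may not survive the trip into aspose-words unchanged. A minimal JDK-only sketch of the difference (`HtmlBytes` and `toUtf8` are hypothetical names for illustration, not Aspose API):

```java
import java.nio.charset.StandardCharsets;

class HtmlBytes {
    // Encode the HTML explicitly as UTF-8 instead of relying on
    // String.getBytes(), whose result depends on the JVM default charset.
    static byte[] toUtf8(String html) {
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Polish text with characters outside ASCII (written as escapes).
        String html = "<p>Za\u017c\u00f3\u0142\u0107</p>";

        byte[] utf8 = toUtf8(html);
        byte[] latin1 = html.getBytes(StandardCharsets.ISO_8859_1);

        // The same string produces different byte sequences per charset.
        System.out.println(utf8.length + " bytes as UTF-8, "
                + latin1.length + " bytes as ISO-8859-1");

        // Decoding with the charset used for encoding restores the text...
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(html));     // true
        // ...while ISO-8859-1 cannot represent some of the characters at all.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).equals(html)); // false
    }
}
```

Encoding as UTF-8 only helps if the HTML is also read as UTF-8; I'm assuming here that it is, e.g. because it declares `<meta charset="utf-8">`.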
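// Option 2: insert the HTML into a new, empty document with DocumentBuilder
import com.aspose.words.DocumentBuilder;
String htmlContent = "…";
DocumentBuilder wordDocumentBuilder = new DocumentBuilder();
wordDocumentBuilder.insertHtml(htmlContent);
wordDocumentBuilder.getDocument().save("c:\\result.doc"); // backslash doubled: "\r" would be a carriage return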
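Another detail that affects both snippets regardless of the conversion itself: in a Java string literal, `\r` is the carriage-return escape, so `"c:\result.doc"` compiles to `c:`, a carriage return, and `esult.doc` rather than a Windows path. The backslash has to be doubled. A small JDK-only check (`OutputPath` and `containsCarriageReturn` are hypothetical names for illustration):

```java
class OutputPath {
    // The compiler reads "\r" as a carriage return, so this is
    // "c:" + CR + "esult.doc", not the intended file path.
    static final String BROKEN = "c:\result.doc";

    // Doubling the backslash produces a literal '\' in the string.
    static final String ESCAPED = "c:\\result.doc";

    static boolean containsCarriageReturn(String s) {
        return s.indexOf('\r') >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsCarriageReturn(BROKEN));  // true
        System.out.println(containsCarriageReturn(ESCAPED)); // false
    }
}
```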
Is there something that can be done about this? Is the HTML generated by aspose-pdf compatible with aspose-words?
I would be grateful for your assistance.