Free Support Forum - aspose.com

HTML document with images to .DOCX

I have HTML documents with separate images (as .zip files) that need to be converted to .DOCX format. How do I pass in both the HTML and the images to the Aspose “Document” object? Should I build everything into one MHTML file? Thank you for your help.

Hi Don,

Thanks for your inquiry.

I think you can use the same technique as demonstrated in this article here: http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-convert-an-image-to-pdf.html

In your case you will want to export to HTML/MHTML instead of PDF.

If you have any trouble, please feel free to ask.

Thanks,

Adam, thanks for the answer. I looked through the code example and it is using image file types (.png, .tiff, etc.) which can have multiple “frames” per image file.

I’m trying to do something similar but the difference is I’m starting with multiple files and want to get one .docx document as the result.

I have one HTML file with “img” tags in it, along with the image files (image/png) that they refer to. What I want as output is one .docx file that includes the text and formatting from the HTML document, with the images in the appropriate places.

(The HTML is input, not output.)

Hi Don,

Thanks for the clarification.

You can convert an HTML document with images to DOCX by simply loading it into a Document object and saving it as DOCX. Please see the code below.

Document doc = new Document("in.html");

doc.Save("out.docx");

If your HTML contains images with relative paths, you may also want to use the BaseUri property of LoadOptions. This allows you to set the base location used to resolve such paths.
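As a minimal sketch of the BaseUri approach, assuming the images have been extracted from the .zip into the same folder as the HTML file (the C:\Temp\HtmlInput\ folder and the file names are only placeholders, not paths from your setup):

```csharp
using Aspose.Words;

class HtmlToDocx
{
    static void Main()
    {
        // Assumption: in.html and the images it references were unzipped
        // into this folder. The path is a hypothetical placeholder.
        string inputFolder = @"C:\Temp\HtmlInput\";

        // BaseUri is the location used to resolve relative "src" paths in the HTML.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.BaseUri = inputFolder;

        // Load the HTML with the options, then save; the output format
        // is inferred from the .docx extension.
        Document doc = new Document(inputFolder + "in.html", loadOptions);
        doc.Save(@"C:\Temp\out.docx");
    }
}
```

If the images live somewhere other than next to the HTML file, BaseUri would point to that location instead.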

Thanks,

Thanks, all working fine now. (Looks fine in Microsoft Word.) Here’s the document so far: http://pastebin.com/SBTP1jM5


The paths to the image are relative to the MHT file itself. So far the text converts correctly, with a few issues: the image is missing, the order of the table cells is flipped, and the periods at the ends of the two sentences appear at the beginnings instead. The main problem is the missing image, even though the “src” attribute in the HTML and the “Content-Location” in the image/png part are the same.

Hi

It is great that everything works fine. Please feel free to ask in case of any issues; we are always glad to help you.

Best regards,