We are currently exploring solutions for converting rich HTML content (including custom bullet points, table styles, images, and layouts) into a Word document that includes a cover page. Our current library requires manual XML modifications, which is cumbersome. Can Aspose help streamline this process? Which product would you recommend?
Additionally, we will soon need to convert HTML content into PowerPoint presentations, maintaining the layout style. Does Aspose offer a solution for this as well?
After a talk with the product team, we have more specific questions.
Currently, we use Pandoc to convert HTML to DOCX, and it's working well. But we're struggling to apply specific Word templates with styles.
Our customers need to export the HTML content into their own DOCX template, with:
- a cover page, header and footer,
- a table of contents,
- paragraph styles, bullet styles and table styles,
- positioned and sized images.
Until now we concatenated the DOCX converted from HTML with the customer's template, but the styles (bullet points and tables) aren't applied correctly.
So we're looking for other solutions, like Aspose, and we have several questions:
Is it possible to add HTML content to an existing Word template (or Word document)?
Does the inserted and converted HTML content correctly apply the template's default styles for: Title, Paragraph, List, Table, Character, Section and Footnote styles?
When inserting HTML content this way, is it possible to keep the template's cover page, table of contents, headers and footers?
Can images also be inserted using this method, while maintaining the sizes, alignments and proportions defined in the HTML?
Is it possible to add CSS to the HTML for formatting adjustments, and have it taken into account in the conversion, while still applying the template's default styles?
Which licence do you recommend for an on-premises (private cloud) deployment for our customers?
What do you advise: first convert the HTML to DOCX and then apply the styles, or is there a better practice?
Is your solution "plug-and-play"?
This is a very urgent subject for us. If you could get back to me soon, it would be much appreciated.
Yes, you can add HTML content to an existing Word document. For example, you can insert a bookmark in your document where the HTML content should go and then use the DocumentBuilder.InsertHtml method:
using Aspose.Words;

// Open the template, which must contain a bookmark named "InsertHtmlHere".
Document doc = new Document(@"C:\Temp\in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
// Move the cursor to the bookmark.
builder.MoveToBookmark("InsertHtmlHere");
// Insert the HTML fragment at the cursor position.
builder.InsertHtml("<b>This is my cool <i>HTML</i></b>");
doc.Save(@"C:\Temp\out.docx");
When inserting HTML, you can control how it is inserted. For example, you can specify the HtmlInsertOptions.UseBuilderFormatting option. In this case, the font and paragraph formatting set on the DocumentBuilder is used as the base formatting for text inserted from HTML.
Yes. See the example provided in the answer to the first question.
Generally, yes. Note, however, that the HTML and MS Word document object models are quite different, so it is not always possible to achieve 100% fidelity when converting from one format to the other. In most cases, Aspose.Words mimics MS Word's behavior when working with HTML.
The same answer as above.
It would be better to contact our sales team in the Aspose.Purchase forum. My colleagues from the sales team will help you select the right license for your needs.
Aspose.Words is a class library. It does not require any special configuration or additional software to work. So, generally, yes, you can consider the Aspose.Words solution "plug-and-play".
Thank you for these replies! That's great; we're currently testing based on your answers.
One more question: do you support layouts such as "columns"? For example, I want my text on the left and an image on the right. In HTML, it's easy to organize this type of layout, but is it possible to convert this kind of layout to a DOCX file?
I need more information about how styles are applied when adding HTML to a document.
When I insert HTML with the builder, I expect the HTML to take on the builder's styles. However, tables don't use the builder's default style; they use a plain default style instead.
You can use a table with two columns to achieve this.
Could you please attach your sample HTML, template, output and expected output documents here for our reference? As I mentioned earlier, Aspose.Words is designed to work with MS Word documents. The HTML and MS Word document object models are quite different, and it is not always possible to achieve 100% fidelity when converting from one format to the other. In most cases, Aspose.Words mimics MS Word's behavior when working with HTML.
@samthink You can apply the table style after inserting the HTML. For example, see the following code:
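As an illustration, here is a minimal sketch, written in Java for use with the Java version of Aspose.Words. It builds such a two-column table as an HTML fragment before it is passed to insertHtml. TwoColumnLayout and buildHtml are hypothetical helper names, not Aspose.Words API. The borderless cell style, the column split and the absolute image width are assumptions to adapt to your template. How closely the result matches depends on Aspose.Words' HTML import, so check the output in Word.

```java
import java.util.Locale;

// Hypothetical helper (not part of Aspose.Words): builds an HTML fragment that places
// text on the left and an image on the right, using a borderless two-column table.
// The resulting string is intended to be passed to DocumentBuilder.insertHtml(...).
final class TwoColumnLayout {

    private TwoColumnLayout() { }

    static String buildHtml(String text, String imageSrc, int textColumnPercent, int imageWidthPx) {
        if (textColumnPercent < 1 || textColumnPercent > 99) {
            throw new IllegalArgumentException("textColumnPercent must be between 1 and 99");
        }
        int imageColumnPercent = 100 - textColumnPercent;
        // "%%" in the format string produces a literal '%' in the output.
        return String.format(Locale.ROOT,
            "<table style=\"width:100%%; border-collapse:collapse;\"><tr>"
                + "<td style=\"width:%d%%; vertical-align:top; border:none;\">%s</td>"
                + "<td style=\"width:%d%%; vertical-align:top; border:none;\">"
                + "<img src=\"%s\" width=\"%d\" /></td>"
                + "</tr></table>",
            textColumnPercent, escape(text), imageColumnPercent, escape(imageSrc), imageWidthPx);
    }

    // Minimal escaping so that text and attribute values cannot break the markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(buildHtml("Some text on the left.", "chart.png", 60, 240));
    }
}
```

The returned string can then be inserted at the desired position with builder.insertHtml(html). An absolute image width is used here because percentage widths inside table cells may not be resolved the way a browser would resolve them.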
using System.IO;
using Aspose.Words;
using Aspose.Words.Tables;

Document doc = new Document(@"C:\Temp\in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveToDocumentEnd();
builder.InsertHtml(File.ReadAllText(@"C:\Temp\in.html"));
// Apply the table style to every table in the document
// (the style name must exist in the template).
foreach (Table t in doc.GetChildNodes(NodeType.Table, true))
    t.StyleName = "Tableau par defaut";
doc.Save(@"C:\Temp\out.docx");
We succeeded in applying the styles to the tables, but we cannot reproduce the dynamic sizing of the images: in our output, the images in the DOCX are all the same size. I'm using the Java version; could that be the reason?
In the HTML we shared, we used relative "%" units and your converted DOCX was correct. Is this a special case? Using absolute units would be difficult on our side.
@samthink It is not a special case. As I have already mentioned, the HTML and MS Word document object models are different, and it is impossible to convert one into the other without losses.
I understand, but your conversion and ours differ even though we use (I think) the same code and the same input files. Our output handles image sizes differently: yours is correct and ours is not. Did you modify the HTML to use absolute units?
Our version was different: I was using 24.1, which I thought was the latest version. My fault, thanks for your time and help. Sizes are now handled correctly, even with relative sizes, just like in your output.
As you can see in out.docx, we have a specific case with images inside tables. In the main body, the image size (in %) is handled correctly, but if the image is in a table cell, it doesn't fit and doesn't match the HTML.
Is there an option for this?
Also, I couldn't find any documentation specifying which HTML tags are handled by InsertHtml.
@samthink I am afraid there is no such option. As mentioned above, there is no way to retain exactly the same formatting the HTML content has in a browser after inserting it into the document.
Unfortunately, there is no such documentation. In most cases, Aspose.Words mimics MS Word's behavior when working with HTML.
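One possible workaround is sketched below. It is not an Aspose.Words feature. If you know the width available in the target table cell (for example, from your template), you could resolve the percentage widths yourself before calling insertHtml. ImageWidthNormalizer and toAbsoluteWidths are hypothetical names. The sketch assumes simple <img> tags without ">" inside attribute values. It handles only the width="NN%" attribute, not CSS widths in style attributes and not height. Whether Word then displays the image at the intended size still depends on how Aspose.Words imports it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessing helper (not part of Aspose.Words): rewrites width="NN%"
// attributes on <img> tags into absolute pixel widths, relative to a container width
// you supply (for example, the width of the target table cell in your template).
// Only the HTML width attribute is handled; CSS widths in style="..." are left as-is.
final class ImageWidthNormalizer {

    private static final Pattern IMG_TAG =
        Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // The lookbehind skips attributes such as data-width="...".
    private static final Pattern PERCENT_WIDTH =
        Pattern.compile("(?<![\\w-])width\\s*=\\s*\"(\\d+(?:\\.\\d+)?)%\"", Pattern.CASE_INSENSITIVE);

    private ImageWidthNormalizer() { }

    static String toAbsoluteWidths(String html, int containerWidthPx) {
        Matcher tag = IMG_TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        while (tag.find()) {
            String imgTag = tag.group();
            Matcher width = PERCENT_WIDTH.matcher(imgTag);
            if (width.find()) {
                double percent = Double.parseDouble(width.group(1));
                long px = Math.round(containerWidthPx * percent / 100.0);
                imgTag = imgTag.substring(0, width.start())
                    + "width=\"" + px + "\""
                    + imgTag.substring(width.end());
            }
            // quoteReplacement prevents '$' or '\' in the tag from being treated specially.
            tag.appendReplacement(out, Matcher.quoteReplacement(imgTag));
        }
        tag.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td><img src=\"logo.png\" width=\"50%\"></td></tr></table>";
        System.out.println(toAbsoluteWidths(html, 300));
    }
}
```

Usage would be something like html = ImageWidthNormalizer.toAbsoluteWidths(html, 300) followed by builder.insertHtml(html), where 300 is only an example value for the cell width in pixels.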