Insert small sections of HTML into a Word document

Is there an easy way to take a snippet of XHTML and insert it into a document? For example:

<p>test <b>paragraph</b>.</p><table><tr><td>hi</td><td>bye</td></tr></table>

I need to take small chunks of HTML and add them, with headings, to a Word document.

I started by trying to load the HTML snippet into a Document, but I am getting a strange exception:

InputStream stream = new StringInputStream("<html><body><p>abc <i>def</i>.</p></body></html>");
Document hdoc = new Document(stream);

The exception is:

java.io.IOException: Problem exporting word document: The document appears to be corrupted and cannot be loaded.

The documentation implies that the Document(InputStream stream) constructor auto-detects the document format. Does it not support HTML?

I also tried the following, as per this documentation, but it throws the same exception:

InputStream stream = new StringInputStream("<html><body><p>abc <i>def</i>.</p></body></html>");
Document hdoc = new Document(stream, "http://localhost/");

Hi

Thanks for your request. You can use the DocumentBuilder.insertHtml method to achieve this. Please see the following link for more information:
https://reference.aspose.com/words/java/com.aspose.words/documentbuilder/#insertHtml-java.lang.String

To load HTML strings, try using the following code:

String html = "<html><body><p>abc <i>def</i>.</p></body></html>";
InputStream stream = new ByteArrayInputStream(html.getBytes("UTF-8"));
Document doc = new Document(stream);
doc.save("C:\\temp\\out.doc");
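
The string-to-stream step above can be isolated into a small helper. This is only a sketch using the standard library: the class name HtmlStreams and both method names are hypothetical, and wrapping a bare fragment in html/body tags is an assumption about what the loader expects, not something the Aspose documentation is quoted as requiring. StandardCharsets.UTF_8 is used instead of the "UTF-8" string so that no checked UnsupportedEncodingException has to be handled.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HtmlStreams {

    // Hypothetical helper: wraps a bare fragment such as "<p>..</p>"
    // in a minimal HTML document (assumption: the loader wants a full document).
    public static String wrapFragment(String fragment) {
        return "<html><body>" + fragment + "</body></html>";
    }

    // Hypothetical helper: encodes the HTML as UTF-8 and exposes the bytes as a stream.
    public static InputStream toUtf8Stream(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String html = wrapFragment("<p>test <b>paragraph</b>.</p>");
        try (InputStream stream = toUtf8Stream(html)) {
            // In the code above, this stream would be passed to new Document(stream).
            System.out.println(stream.available() + " bytes ready");
        }
    }
}
```

The resulting stream can be handed to the Document constructor exactly as in the snippet above.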

Best regards.

Your example does work when it is run from a simple Java test class like the one below, but it fails when it is executed in a servlet under a Tomcat application server. I am still trying to track down why.

In a servlet it fails with both the 3.3 and the 4.0 versions of the JAR.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import com.aspose.words.Document;

public class Test4 {
	public static void main(String[] args) {
		try {
			String html = "<html><body><p>abc <i>def</i>.</p></body></html>";
			InputStream stream = new ByteArrayInputStream(html.getBytes("UTF-8"));
			Document doc = new Document(stream);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

The idea of using DocumentBuilder.insertHtml sounds good; however, I am not sure how to use it when my current code uses the basic Document class to build the document. Is there a way to 'switch' to the DocumentBuilder to use the insertHtml method, then switch back to the normal Document class?

Currently the method called by the servlet looks like this:

public Document export(dao.Document policy) throws Exception {

	List<dao.Section> sections = DAOFactory.getSectionFactory().getByDocumentId(policy.getId(),0,500);
	SectionsHelper.sort(sections);

	Document doc = new Document();
	doc.removeAllChildren();

	formatStyles(doc);

	Section section = new Section(doc);
	doc.appendChild(section);
	section.getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
	section.getPageSetup().setPaperSize(PaperSize.A4);

	Body body = new Body(doc);
	section.appendChild(body);

	addHeading(doc,body,"Heading 1", policy.getName()+" ("+policy.getNumber()+")");

	for(dao.Section documentSection : sections) {
		addHeading(doc,body,"Heading");


		String html = "<html><body><p>abc <i>def</i>.</p></body></html>";
		InputStream stream = new ByteArrayInputStream(html.getBytes("UTF-8"));
		Document hdoc = new Document(stream);

		// Now try to insert hdoc into the main doc
	}
	return doc;
}

Hi

Thank you for the additional information. Actually, you can use the DocumentBuilder class to build the whole document. I think this would be easier than building the document using the DOM.
For instance, you can try using the following code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Specify page setup
builder.getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
builder.getPageSetup().setPaperSize(PaperSize.A4);
// insert headings
builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);
builder.writeln("This is my heading");
// Insert some HTML
String html = "<html><body><p>abc <i>def</i>.</p></body></html>";
builder.insertHtml(html);

Please see the following link to learn more about DocumentBuilder:
https://docs.aspose.com/words/net/document-builder-overview/
Best regards.

Hi Alexey, thanks for the pointers. However, I have already written a few thousand lines of code that generate my document without using DocumentBuilder, and I am not sure how I would use DocumentBuilder alongside the basic Document class.

If necessary I will rewrite the code to use DocumentBuilder, but it would save quite a few hours if I could work out how to set up an HTML document and just insert that.

Hi Jacob,

Thank you for the additional information. I think there are two ways you can achieve what you need:

  1. Create an empty Document and a DocumentBuilder, and insert your HTML into this empty document. Then you can insert this document where you need it.
  2. In your code, you can move the DocumentBuilder cursor to the necessary position and insert the HTML there:
    https://docs.aspose.com/words/net/navigation-with-cursor/
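
The second option might look roughly like the sketch below, which keeps the existing DOM-style code and only borrows a DocumentBuilder for the HTML step. It assumes a recent Aspose.Words for Java release in which DocumentBuilder.moveTo(Node) and DocumentBuilder.insertHtml(String) are available; the 3.3 and 4.0 JARs mentioned earlier may differ. The class name HtmlAppender and the method appendHtml are hypothetical names chosen for illustration.

```java
import com.aspose.words.Body;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.Paragraph;

public class HtmlAppender {

    // Hypothetical helper: appends an HTML fragment at the end of a body
    // that is otherwise built with plain DOM calls.
    public static void appendHtml(Document doc, Body body, String html) throws Exception {
        // Add an empty paragraph through the DOM to mark the insert position.
        Paragraph anchor = new Paragraph(doc);
        body.appendChild(anchor);

        // Attach a builder to the same document and move its cursor to the anchor.
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.moveTo(anchor);

        // The builder converts the HTML into document nodes at the cursor.
        builder.insertHtml(html);
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document();
        Body body = doc.getFirstSection().getBody();
        appendHtml(doc, body,
                "<p>abc <i>def</i>.</p><table><tr><td>hi</td><td>bye</td></tr></table>");
        System.out.println(doc.getText());
    }
}
```

Inside the export method shown earlier, a call such as appendHtml(doc, body, html) after each addHeading call would leave the rest of the DOM-based code unchanged.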

Best regards.