Keeping original formatting for imported HTML

Hi,

Is it possible to keep the original formatting of the text (including headers) when importing from HTML?

I managed to do this only when creating a new document and using a builder to import the HTML immediately afterwards. In that case my HTML formatting, including fonts and paragraph styles, is retained. When I try this on an established template that has its own styles, the HTML formatting is adapted to that of the template (fonts, heading colors, etc.). My HTML will come from a CLOB in a database, and I need to keep all the formatting from the HTML. In my tests, importing the CLOB worked except for the formatting.

Any ideas?

Thanks

Hi Branislav,

Thanks for your inquiry. Please note that content inserted by the DocumentBuilder.insertHtml method does not inherit the formatting specified in the DocumentBuilder options. All formatting is taken from the HTML snippet. If you insert HTML with no formatting specified, default formatting is used for the inserted content; e.g. if no font is specified in your HTML snippet, the default font (Times New Roman) is applied.
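Since the formatting comes from the snippet itself, one way to avoid the default font is to make the font explicit before the HTML is inserted. The following is only a minimal sketch: the class HtmlFragments and the method withExplicitFont are hypothetical names, not part of Aspose.Words, and it assumes that an inline style on a wrapping div is honored on import, which should be verified against your version.

```java
// Hypothetical helper, not part of Aspose.Words: it only builds a string.
final class HtmlFragments {
    private HtmlFragments() {}

    /** Wraps an HTML fragment in a div that carries an explicit inline font. */
    static String withExplicitFont(String html, String fontFamily, int sizePt) {
        // Drop quote characters so the font name cannot break out of the style attribute.
        String safeFamily = fontFamily.replace("'", "").replace("\"", "");
        return "<div style=\"font-family:'" + safeFamily + "'; font-size:" + sizePt + "pt\">"
                + html + "</div>";
    }
}
```

The wrapped string would then be passed to the builder in place of the raw snippet, e.g. builder.insertHtml(HtmlFragments.withExplicitFont(html, "Arial", 11)). Because the style is inline on the fragment, it is scoped to the inserted content and does not rely on the template's styles.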

If you are using an older version of Aspose.Words, I suggest upgrading to the latest version (v13.3.0) and letting us know how it goes on your side. If the problem remains, please share the following details for investigation.

What environment are you running on?

  • OS (Windows Version or Linux Version)
  • Architecture (32 / 64 bit)
  • Java version
  • Please supply us with the code from your application that is causing the issue
  • Please supply us with the input document that is causing the issue
  • Please supply us with the output document showing the undesired behavior
  • Please supply us with the expected document showing the desired behavior (You can create this document using Microsoft Word).

Hi,

Thank you for the reply. It was an error on my part: the font information was missing when the HTML was taken from the online editor. Regardless, when the HTML document is imported at the beginning of the document, the fonts differ from when it is inserted in the middle or at the end. I thought that if no styling is specified, the default template font would be applied. Also, when I specify a style in the HTML at document level, my entire document is converted to this style, not just a section. I will attach an example.

How do I specify styles in the HTML file so they are only applicable to the HTML and not the entire document?

My environment: Win 7 Ent 64 bit, Java 1.6.

Inserting from file:

DocumentBuilder builder = new DocumentBuilder(doc);
builder.getPageSetup().setLeftMargin(20);
builder.getPageSetup().setRightMargin(20);
builder.getPageSetup().setTopMargin(20);
builder.getPageSetup().setBottomMargin(20);
builder.moveToDocumentEnd();
String inputFile2 = new File(".").getAbsolutePath() + "/sourcedocuments/testhtml2.html";
builder.insertHtml(readFile(inputFile2));
String result4 = "c:/temp/result/mrs_4_" + (new Date().getTime()) + "_" + (new Random().nextInt(1000)) + ".docx";
doc.save(result4);
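The readFile helper used above is not shown in the post. A minimal sketch of what it might look like follows; the class name HtmlFiles is hypothetical, UTF-8 is an assumption about how the HTML file is encoded, and java.nio.file requires Java 7 or later (on Java 1.6 an InputStreamReader with an explicit charset would be needed instead).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper standing in for the readFile method that the post does not show.
final class HtmlFiles {
    private HtmlFiles() {}

    /** Reads the whole file into a String, decoding it as UTF-8 (assumed encoding). */
    static String readFile(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Decoding with an explicit charset avoids depending on the platform default, which can otherwise change non-ASCII characters before the HTML reaches insertHtml.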

Inserting from database:

private class HandleMergeImageFieldFromBlob implements IFieldMergingCallback
{

    @Override
    public void fieldMerging(FieldMergingArgs e) throws Exception
    {
        if ("resume".equalsIgnoreCase(e.getFieldName()))
        {
            DB2Clob b = (DB2Clob) e.getFieldValue();
            if (b != null)
            {
                // Position the builder at the merge field and insert the CLOB content as HTML.
                DocumentBuilder builder = new DocumentBuilder(e.getDocument());
                builder.moveToMergeField(e.getDocumentFieldName());
                InputStream in = b.getAsciiStream();
                StringWriter w = new StringWriter();
                IOUtils.copy(in, w);
                String clobAsString = w.toString();
                builder.insertHtml(clobAsString);
                // The content is already inserted, so suppress the default merge text.
                e.setText("");
            }
        }
    }

    @Override
    public void imageFieldMerging(ImageFieldMergingArgs args) throws Exception
    {
        // No image fields to handle.
    }
}
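The callback above reads the CLOB through getAsciiStream, which may not preserve non-ASCII characters. A minimal alternative sketch reads it through the character stream instead. The class name ClobText is hypothetical, and the sketch assumes the DB2Clob value can be used through the standard java.sql.Clob interface.

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Clob;
import java.sql.SQLException;

// Hypothetical helper: converts a CLOB to a String without a byte-level decoding step.
final class ClobText {
    private ClobText() {}

    /** Reads the full CLOB content through its character stream. */
    static String read(Clob clob) throws SQLException, IOException {
        StringBuilder sb = new StringBuilder();
        Reader reader = clob.getCharacterStream();
        try {
            char[] buf = new char[4096];
            int n;
            // Copy until the reader signals end of stream.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }
}
```

In the callback, the resulting string would take the place of clobAsString in the insertHtml call.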

Thanks

Hi Branislav,

Thanks for sharing the details. Please note that Aspose.Words tries to mimic the behavior of MS Word. When HTML is processed, some HTML features might be lost. You can find a list of limitations for HTML export/import here:
https://docs.aspose.com/words/java/load-in-the-html-html-xhtml-mhtml-format/
https://docs.aspose.com/words/java/save-in-the-html-html-xhtml-mhtml-format/

In your case, I suggest using the insertDocument method to insert the HTML document at any location in the MS Word document. Please see the insertDocument method at the following documentation link.
https://docs.aspose.com/words/java/insert-and-append-documents/

I used the following code snippet to test your scenario. This example inserts the HTML document at the mail merge field “RESUME”. I have attached the output document to this post for your reference. I hope this helps. Please let us know if you have any more queries.

Document doc = new Document(MyDir + "Sample document generation.docx");
// This example creates a table, but you would normally load table from a database.
java.sql.ResultSet resultSet = createCachedRowSet(new String[]
{
    "RESUME"
});
addRow(resultSet, new String[]
{
    "RESUME"
});
com.aspose.words.DataTable dt = new com.aspose.words.DataTable(resultSet, "Employee");
doc.getMailMerge().setFieldMergingCallback(new HandleMergeImageFieldFromBlob());
doc.getMailMerge().executeWithRegions(dt);
doc.save(MyDir + "out.docx");

public class HandleMergeImageFieldFromBlob implements IFieldMergingCallback
{
    private DocumentBuilder mBuilder;
    public void fieldMerging(FieldMergingArgs e) throws Exception
    {
        if (mBuilder == null)
        {
            mBuilder = new DocumentBuilder(e.getDocument());
        }
        if (e.getFieldName().equals("RESUME"))
        {
            mBuilder.moveToMergeField(e.getFieldName());
            // Load the HTML as a separate document and insert it with the
            // insertDocument helper from the documentation linked above.
            Document doc2 = new Document(MyDir + "testhtml2.html");
            insertDocument(mBuilder.getCurrentParagraph(), doc2);
        }
    }
    public void imageFieldMerging(ImageFieldMergingArgs args) throws Exception
    {}
}