Converting HTML string to doc

Hello, I’ve been looking through your documentation and past forum posts, and I couldn’t find anything on whether you are able to convert an HTML string into a doc and keep all of its formatting. I noticed you can pass in an HTML file, but that’s not what I want; if not a string, then maybe some type of byte array? For example, I would pass in a string like:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    
    <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
    <title>Untitled</title>
    <style type="text/css">
         p {
            text-indent: 0pt;
            padding: 0px 0px 0px 0px;
            margin-top: 0px;
            margin-right: 0px;
            margin-bottom: 12px;
            margin-left: 0px;
            text-align: left;
            font-family: 'Verdana';
            font-style: Normal;
            font-weight: normal;
            font-size: 16px;
            color: #000000;
        }

         .defaultDocumentStyle {
            telerik-style-type: default;
            telerik-style-name: defaultDocumentStyle;
            font-family: 'Arial';
            font-style: Normal;
            font-weight: normal;
            font-size: 16px;
            margin-bottom: 12px;
        }

         .p_E2968D9D {
            telerik-style-type: local;
        }

         .s_E2968D9D {
            telerik-style-type: local;
        }

        
    </style>
</head>
<body>
    <p class="p_E2968D9D"><span class="s_E2968D9D">some text from the html</span></p>
</body>
</html>

Click the image below to view your page:
<img src="https://someimagefile.com/image.jpg">

and have the doc look like:


<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"><title>Untitled</title>
<style type="text/css">
    p {
        text-indent: 0pt;
        padding: 0px 0px 0px 0px;
        margin-top: 0px;
        margin-right: 0px;
        margin-bottom: 12px;
        margin-left: 0px;
        text-align: left;
        font-family: 'Verdana';
        font-style: Normal;
        font-weight: normal;
        font-size: 16px;
        color: #000000;
    }

    .defaultDocumentStyle {
        telerik-style-type: default;
        telerik-style-name: defaultDocumentStyle;
        font-family: 'Arial';
        font-style: Normal;
        font-weight: normal;
        font-size: 16px;
        margin-bottom: 12px;
    }

    .p_E2968D9D {
        telerik-style-type: local;
    }

    .s_E2968D9D {
        telerik-style-type: local;
    }
</style><p class="p_E2968D9D"><span class="s_E2968D9D">some text from html </span></p>

Click the image below to view your page:
<img src="http://i.nflcdn.com/static/site/3.10/img/global/alt/large-logo.png">

Hi,

Thanks for your inquiry. I think you can achieve what you need after reading the article suggested below:
https://reference.aspose.com/words/net/aspose.words/documentbuilder/inserthtml/

Alternatively, you can load your HTML document into Aspose.Words and then append it to an empty Word document. To achieve this, please see the following code snippet:

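using Aspose.Words;

// Load the HTML file and append it to an empty document, keeping the HTML's formatting.
Document doc = new Document();
Document htmlDoc = new Document(@"c:\test\in.html");
doc.AppendDocument(htmlDoc, ImportFormatMode.KeepSourceFormatting);
doc.Save(@"c:\temp\out.docx");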
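Since the original question asked about passing a string or a byte array rather than a file, here is a minimal sketch, assuming Aspose.Words for .NET, that wraps the HTML string in a MemoryStream and loads it with the Document(Stream) constructor, which detects the format from the stream content. The sample HTML is a placeholder; the output path is reused from the snippet above.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;

class HtmlStringToDoc
{
    static void Main()
    {
        // Placeholder HTML; in practice this is the string you already have in memory.
        string html = "<html><body><p style=\"font-family:Verdana;font-size:16px\">some text from the html</p></body></html>";

        // Turn the string into a byte array and wrap it in a stream, so no temporary .html file is needed.
        byte[] bytes = Encoding.UTF8.GetBytes(html);
        using (MemoryStream stream = new MemoryStream(bytes))
        {
            // Document(Stream) detects the format (HTML here) from the stream content.
            Document htmlDoc = new Document(stream);

            // Same as the file-based snippet: append to an empty document, keeping the source formatting.
            Document doc = new Document();
            doc.AppendDocument(htmlDoc, ImportFormatMode.KeepSourceFormatting);
            doc.Save(@"c:\temp\out.docx");
        }
    }
}
```

If the HTML declares a charset in a meta tag (the sample in the question declares Windows-1252), the bytes should probably be encoded to match, or non-ASCII characters may decode incorrectly. Also note that a stream has no base location, so relative image paths may not resolve; absolute URLs like the ones in the question avoid that.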

Please let us know if you need more information, we are always glad to help you.

Best Regards,

The first link seems to be exactly what I am looking for. My second question: will this also render the images from the img tags, or should I use the method found here: base64 as image?

Hi,

Thanks for your inquiry. Yes, the DocumentBuilder.InsertHtml method supports rendering img tags. Please see the following code snippet:

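using Aspose.Words;

DocumentBuilder builder = new DocumentBuilder(); // creates a new blank document
// Image URL borrowed from the question; InsertHtml fetches it and inserts the picture.
builder.InsertHtml("<img src=\"https://someimagefile.com/image.jpg\" />");
builder.Document.Save(@"c:\temp\out.docx");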

I hope this helps.

Best Regards,

Everything looks good in my test app. I’ll let you know if I run into any issues. Thanks!

Thanks for this; it solved a problem I was having as well.

The only thing I have left is how to preserve the HTML formatting when it is imported into the Word document.

I have a report chart built in straight HTML, using CSS (or inline tags, whichever I need to use) to control the chart’s appearance. But when I port it into the Word document using DocumentBuilder, it comes out as flat text with no style formatting.

Is there a way to do this? Or do I have to create the styles piece by piece as the document is built?

Any help you can provide is much appreciated. I want to use Aspose for all the reporting in my project, but due to a time crunch I need to port the reports already available in HTML into a Word doc any way I can.

I’m having the same issue as philsanchez: it doesn’t handle my styling. I’m using inline CSS.

Hi,

Thanks for your inquiry. Please note that Aspose.Words was originally designed to work with MS Word documents. When processing HTML, some HTML features might be lost. You can find a list of limitations on HTML import and export here:

https://docs.aspose.com/words/net/save-in-html-xhtml-mhtml-formats/

Secondly, could you please attach your input HTML files here for testing? I will investigate the issue on my side and provide you with more information.

Best Regards,

The file is attached. Pretty much just strip the text out of the file and then try converting it to a doc.

Hi,

Thanks for your inquiry.

Using the latest version of Aspose.Words (10.8.0), I managed to reproduce some issues on my side. Your request has been linked to the appropriate issues, and you will be notified as soon as they are resolved.

The problems occur because Aspose.Words currently does not support inheriting styles from parent elements or assigning multiple class names to a single element.
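Until those limitations are lifted, one possible workaround (an assumption, not something confirmed in this thread) is to avoid both patterns: put the formatting inline on the innermost element that holds the text, instead of relying on a parent element or on several class names. The sketch below assumes Aspose.Words for .NET; the CSS values are placeholders, and whether a given CSS property survives still depends on the import limitations linked above.

```csharp
using Aspose.Words;

class InlineStyleWorkaround
{
    static void Main()
    {
        DocumentBuilder builder = new DocumentBuilder(); // creates a new blank document

        // Pattern described as unsupported: formatting inherited from a parent, or two classes on one element, e.g.
        // <div class="chart"><span class="bar highlight">some text from the html</span></div>

        // Workaround sketch (placeholder values): repeat every property inline on the element that holds the text.
        string html =
            "<p style=\"margin-bottom:12px\">" +
            "<span style=\"font-family:Verdana;font-size:16px;font-weight:bold;color:#336699\">" +
            "some text from the html</span></p>";

        builder.InsertHtml(html);
        builder.Document.Save(@"c:\temp\out.docx");
    }
}
```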

Sorry for the inconvenience.

Best Regards,

The issues you have found earlier (filed as WORDSNET-3606) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

The issues you have found earlier (filed as WORDSNET-2021) have been fixed in this .NET update and this Java update.

