Word Styles and HTML

Hi
I create a text first as HTML and then save it to a Word document. Everything works, but I need to tag the HTML code so that the Word document treats different parts of the text with different styles. This would simplify the formatting of the Word document. Is there a way of tagging the text in the HTML code so that Word applies different styles to different parts? At the moment all text gets the ‘Normal’ style.

Hi

Thanks for your inquiry. Yes, you can achieve this. For example, see the following HTML:

<html>
<head>
    <style type="text/css">
        .myStyle {
            font-size: 16pt;
            font-weight: bold;
            font-style: italic
        }
    </style>
</head>
<body>
    <div>
        <h1><span>This is heading 1 style</span></h1>
        <h2><span>This is heading 2 style</span></h2>
        <h3><span>This is heading 3 style</span></h3>
        <p><span>This is normal style</span></p>
        <p class="myStyle"><span>This is my custom style</span></p>
    </div>
</body>
</html>

Hope this helps.
Best regards.

OK, so when saved to Word, the <p class="myStyle"> tag will be converted to a Word style named ‘myStyle’, and e.g. <font style="..." will not? What about a <div style=... ? I think my problem is that the following code removes all formatting when I only want to remove the hyperlinks (I got this code from someone at Aspose some time ago). How do I remove just the hyperlink formatting (I do not want the hyperlinks to be included in the Word document)?

Public Function removeHyperlinkFormatting(ByRef wordDoc As Aspose.Words.Document) As Boolean

    Try
        If Not wordDoc Is Nothing Then
            Dim fieldStarts As NodeCollection = wordDoc.GetChildNodes(NodeType.FieldStart, True)
            Dim nodesForRemoval As ArrayList = New ArrayList
            Dim fieldStart As Fields.FieldStart

            For Each fieldStart In fieldStarts
                If fieldStart.FieldType = Fields.FieldType.FieldHyperlink Then
                    Dim node As Node = fieldStart
                    While node.NodeType <> NodeType.FieldSeparator
                        nodesForRemoval.Add(node)
                        node = node.NextSibling

                    End While
                    nodesForRemoval.Add(node)
                    While node.NodeType <> NodeType.FieldEnd
                        If node.NodeType = NodeType.Run Then
                            CType(node, Run).Font.ClearFormatting()

                        End If
                        node = node.NextSibling

                    End While
                    nodesForRemoval.Add(node)
                End If
            Next
            For Each node As Node In nodesForRemoval
                node.Remove()
            Next
        End If
        Return True
    Catch ex As Exception
        QARoutinesLog.doLog(ex.ToString, TYPEERRORWORD)

        Return False
    End Try
End Function

Hi

Thanks for your request. The same can be applied to spans, if you would like to use character styles.

<html>
<head>
    <style type="text/css">
        .myStyle {
            font-size: 16pt;
            font-weight: bold;
            font-style: italic
        }

        .myCharStyle {
            font-size: 10pt;
            font-weight: bold;
        }
    </style>
</head>
<body>
    <div>
        <h1><span>This is heading 1 style</span></h1>
        <h2><span>This is heading 2 style</span></h2>
        <h3><span>This is heading 3 style</span></h3>
        <p>
            <span>This is normal style </span>
            <span class="myCharStyle">This is my character style</span>
        </p>
        <p class="myStyle"><span>This is my custom style</span></p>
    </div>
</body>
</html>

Regarding hyperlinks, if you need the hyperlinks to appear as plain text, you can simply remove the hyperlink tags from your HTML before loading it. For example, you can use a regular expression to do this. Please see the following code:

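using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using Aspose.Words;

// Read HTML string.
string html = File.ReadAllText(@"Test001\in.html");
// Replace hyperlinks in the HTML string with simple text.
// The lazy (.*?) keeps each link separate when the HTML contains several links.
Regex regex = new Regex(@"<a\b[^>]*>(.*?)</a>", RegexOptions.Singleline | RegexOptions.IgnoreCase);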
html = regex.Replace(html, "$1");
// Get HTML bytes and create stream.
byte[] htmlBytes = Encoding.UTF8.GetBytes(html);
MemoryStream htmlStream = new MemoryStream(htmlBytes);
// Create document from stream.
Document doc = new Document(htmlStream);
// Save output document.
doc.Save(@"Test001\out.doc");
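
If you would like to try the hyperlink replacement on its own before loading the HTML into a document, here is a minimal sketch that uses only System.Text.RegularExpressions. The class name HyperlinkStripper and the method StripAnchors are hypothetical names chosen for illustration; they are not part of Aspose.Words. The sketch assumes the pattern is meant to match a complete <a ...>...</a> element and keep only its inner text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Aspose.Words.
public static class HyperlinkStripper
{
    // <a\b    : an opening anchor tag (\b avoids matching tags such as <abbr>)
    // [^>]*>  : any attributes up to the end of the opening tag
    // (.*?)   : the link text, captured lazily so each link is matched separately
    // </a>    : the closing anchor tag
    private static readonly Regex AnchorRegex = new Regex(
        @"<a\b[^>]*>(.*?)</a>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the HTML with every anchor element replaced by its inner content.
    public static string StripAnchors(string html)
    {
        return AnchorRegex.Replace(html, "$1");
    }
}
```

For example, passing `<p>See <a href="http://example.com">this page</a> and <A HREF="#top">the top</A>.</p>` to StripAnchors returns `<p>See this page and the top.</p>`. A regular expression like this is only suitable for simple, well-formed HTML; nested or malformed anchor tags would need a real HTML parser.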

Hope this helps.
Best regards.

Got the regex stuff working, no doubt I’ll get the styles working as well. The key seems to be the tag. Thanks once again, your support is excellent!