Save only HTML content

Hi,

I’m trying to convert a .doc file into an .html file and I’m having some trouble. The thing is that I don’t really want all of the resulting HTML.

Imagine this is the resulting html:

<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Content-Style-Type" content="text/css" /><meta name="generator" content="Aspose.Words for .NET 9.6.0.0" /><title></title></head><body><div><p style="margin:0pt"><span style="color:#ff0000; font-family:Calibri; font-size:12pt; font-weight:bold">Evaluation Only. Created with Aspose.Words. Copyright 2003-2010 Aspose Pty Ltd.</span></p><p style="margin:0pt"><span style="font-family:Calibri; font-size:14pt">On the Insert tab, the galleries include items that are designed to coordinate with the overall look of </span><span style="font-family:Calibri; font-size:14pt; font-style:italic">your</span><span style="font-family:Calibri; font-size:14pt"> document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. When you create pictures, charts, or diagrams, they also coordinate with your current document look.</span></p><p style="margin:0pt"><span style="font-family:Calibri; font-size:14pt">To change the overall look of yo</span><span style="font-family:Calibri; font-size:14pt; font-weight:bold">ur do</span><span style="font-family:Calibri; font-size:14pt">cu</span><span style="font-family:Calibri; font-size:14pt; font-style:italic">me</span><span style="font-family:Calibri; font-size:14pt">nt, choose new Theme elements on the Page Layout tab. To change the looks available in the Quick Style gallery, use the Change Current Quick Style Set command. 
Both the Themes gallery and the Quick Styles gallery provide reset commands so that you can always restore the look of y</span><a name="phrase_piece"></a><span style="font-family:Calibri; font-size:14pt">our </span><span style="font-family:Calibri; font-size:14pt; text-decoration:underline">doc</span><span style="font-family:Calibri; font-size:14pt">u</span><span style="font-family:Calibri; font-size:14pt">ment to the original contained in your current template.</span></p><p style="margin:0pt"><span style="font-family:Calibri; font-size:14pt">&#xa0;</span></p><p style="margin:0pt"><span style="font-family:Calibri; font-size:14pt">&#xa0;</span></p></div></body></html>

I want to save my .html (it could also be saved as a .txt) but only with the “real” content, leaving out the initial tags like:

<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Content-Style-Type" content="text/css" /><meta name="generator" content="Aspose.Words for .NET 9.6.0.0" />

It would give me just the text like:

<b>Hi!</b>

Is there any way to do this?

Regards,

Hi
Thanks for your request. As an option, you can try using a regular expression to get the “real” content from the HTML generated by Aspose.Words. You just need to extract the content between the … tags.
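As a rough illustration of the regular-expression approach, here is a minimal sketch in Python using only the standard library. It assumes the tags meant above are the <body> and </body> pair, which is a guess from the question rather than something stated in this reply. The names BODY_RE and extract_body are hypothetical helpers made up for this example and are not part of Aspose.Words.

```python
import re

# Assumption: the "real" content is whatever sits between <body ...> and </body>.
# DOTALL lets "." match newlines, since the generated HTML may span several lines.
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(html: str) -> str:
    """Return the markup inside the first <body> element.

    If no <body> element is found, the input is returned unchanged.
    """
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html


sample = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=utf-8" /><title></title></head>'
    "<body><b>Hi!</b></body></html>"
)

print(extract_body(sample))  # prints: <b>Hi!</b>
```

The same pattern should carry over to .NET's Regex class with RegexOptions.IgnoreCase and RegexOptions.Singleline. A regular expression is adequate here because the output is machine-generated and has a single body element; for arbitrary HTML a real parser would be the safer choice.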
Also, to optimize the HTML output, you can call the JoinRunsWithSameFormatting method before converting the document to HTML. This can significantly reduce the number of tags in the output HTML:
https://reference.aspose.com/words/net/aspose.words/document/joinrunswithsameformatting/
Hope this helps. Please let me know if you need more assistance; I will be glad to help.
Best regards,