Aspose.Words.Document: converting to HTML turns all <h1> tags into <p> tags

We are converting HTML to a Word document and then converting the Word document back to HTML. The original HTML contains <h1> tags, but after the conversion to Word and back to HTML they become <p> tags.

@atulpandey34 Aspose.Words imports <h1> - <h6> as paragraphs with the corresponding Heading style in the MS Word document. For example, see the following HTML:

<html>
<body>
    <h1>Heading 1</h1>
    <h2>Heading 2</h2>
    <p>Regular paragraph</p>
</body>
</html>

After conversion to DOCX, it will look like this:

<w:p>
	<w:pPr>
		<w:pStyle w:val="Heading1" />
		<w:keepNext w:val="0" />
		<w:keepLines w:val="0" />
		<w:spacing w:before="0" w:after="322" />
		<w:rPr>
			<w:b />
			<w:bCs />
			<w:sz w:val="48" />
			<w:szCs w:val="48" />
		</w:rPr>
	</w:pPr>
	<w:r>
		<w:rPr>
			<w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
			<w:i w:val="0" />
			<w:color w:val="auto" />
		</w:rPr>
		<w:t>Heading 1</w:t>
	</w:r>
</w:p>
<w:p>
	<w:pPr>
		<w:pStyle w:val="Heading2" />
		<w:keepNext w:val="0" />
		<w:keepLines w:val="0" />
		<w:spacing w:before="299" w:after="299" />
		<w:rPr>
			<w:b />
			<w:bCs />
			<w:sz w:val="36" />
			<w:szCs w:val="36" />
		</w:rPr>
	</w:pPr>
	<w:r>
		<w:rPr>
			<w:rFonts w:ascii="Times New Roman" w:eastAsia="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
			<w:i w:val="0" />
			<w:color w:val="auto" />
		</w:rPr>
		<w:t>Heading 2</w:t>
	</w:r>
</w:p>
<w:p>
	<w:pPr>
		<w:spacing w:before="240" w:after="240" />
	</w:pPr>
	<w:r>
		<w:t>Regular paragraph</w:t>
	</w:r>
</w:p>

When you convert the MS Word document to HTML, paragraphs with a Heading style applied are exported as <h1> - <h6>, so <h1> - <h6> are roundtripped by Aspose.Words. Here is the simple code I used for testing:

// Load the source HTML and save it as DOCX.
Document doc = new Document(@"C:\Temp\in.html");
doc.Save(@"C:\Temp\out.docx");
// Reload the DOCX and save it back to HTML.
doc = new Document(@"C:\Temp\out.docx");
doc.Save(@"C:\Temp\out.html");

And here is the output:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Style-Type" content="text/css" />
    <meta name="generator" content="Aspose.Words for .NET 23.2.0" />
    <title></title>
</head>
<body style="font-family:'Times New Roman'; font-size:12pt">
    <div>
        <h1 style="margin-top:0pt; margin-bottom:16.1pt; font-size:24pt"><span>Heading 1</span></h1>
        <h2 style="margin-top:14.95pt; margin-bottom:14.95pt; font-size:18pt"><span>Heading 2</span></h2>
        <p style="margin-top:12pt; margin-bottom:12pt"><span>Regular paragraph</span></p>
    </div>
</body>
</html>

As you can see, <h1> and <h2> are preserved after the roundtrip.
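
If the headings are lost on your side, it may be worth checking whether the paragraphs in the intermediate DOCX still have a Heading style applied, since that is what determines the <h1> - <h6> export. Below is a minimal diagnostic sketch, not a definitive fix. It assumes the same C:\Temp\out.docx path as in the test above, and the class name HeadingCheck is only for illustration:

```csharp
using System;
using Aspose.Words;

class HeadingCheck
{
    static void Main()
    {
        // Assumed path: point this at the intermediate DOCX from your own conversion.
        Document doc = new Document(@"C:\Temp\out.docx");

        // Walk every paragraph in the document, including nested ones (e.g. in tables).
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // Print the applied style name, whether it is a built-in Heading style,
            // and the paragraph text.
            Console.WriteLine("{0} | IsHeading={1} | {2}",
                para.ParagraphFormat.StyleName,
                para.ParagraphFormat.IsHeading,
                para.GetText().Trim());
        }
    }
}
```

If a paragraph that came from <h1> is not reported as a heading here, the style was most likely lost or changed before the HTML export, for example by processing applied to the document between the two conversions.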

Could you please attach your source HTML here for testing? We will check it and provide you with more information.

Thanks for the quick response. This is very helpful. I will get back to you on this.
