Hello. We have tried converting the following document to HTML:
docHeaderOnly.docx (53.2 KB)
We are using the latest version of Aspose.Words (25.6.0) on .NET 8.
The code used is the following:
using System.Text;
using Aspose.Words.Saving;

var docxFilePath = "docHeaderOnly.docx";
var docxSavePath = "result.html";

var saveOptions = new Aspose.Words.Saving.HtmlSaveOptions
{
    SaveFormat = Aspose.Words.SaveFormat.Html,
    CssStyleSheetType = CssStyleSheetType.Inline,
    ExportImagesAsBase64 = true,
    ExportFontsAsBase64 = true,
    PrettyFormat = true,
    Encoding = Encoding.UTF8,
    ExportPageSetup = true,
    ExportPageMargins = true,
    ExportRoundtripInformation = false
};

var docxDocument = new Aspose.Words.Document(docxFilePath);
docxDocument.JoinRunsWithSameFormatting();
docxDocument.Save(docxSavePath, saveOptions);
We end up with an empty HTML file, like this (in the archive):
result.7z (552 Bytes)
We also tried without setting any options, and we get the same result.
Any idea what to do? Thank you.
Note: the DOCX has a header with some information, which doesn't get converted to HTML, so we end up with an empty HTML file.
@IBurzoEvoRWS
Cause
The issue with the conversion of your DOCX file to HTML, resulting in an empty HTML file, is likely due to the handling of headers and footers during the conversion process. Aspose.Words has specific settings for exporting these elements, and if they are not configured correctly, they may not appear in the output.
Solution
To ensure that headers and footers are included in the HTML output, you need to adjust the ExportHeadersFootersMode property in your HtmlSaveOptions. By default, this setting may exclude them, which can lead to missing content in your output.
Here’s how you can modify your code:
var saveOptions = new Aspose.Words.Saving.HtmlSaveOptions
{
    SaveFormat = Aspose.Words.SaveFormat.Html,
    CssStyleSheetType = CssStyleSheetType.Inline,
    ExportImagesAsBase64 = true,
    ExportFontsAsBase64 = true,
    PrettyFormat = true,
    Encoding = Encoding.UTF8,
    ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.PerSection // Ensure headers/footers are included
};

var docxDocument = new Aspose.Words.Document(docxFilePath);
docxDocument.JoinRunsWithSameFormatting();
docxDocument.Save(docxSavePath, saveOptions);
Next Steps
- Update your HtmlSaveOptions to include the ExportHeadersFootersMode property as shown above.
- Run the conversion again and verify that the headers and footers are now included in the HTML output.
- If the issue persists, check the content of the original DOCX file to ensure that it contains the expected headers and footers.
Please implement these changes and let me know if you need further assistance!
1: Even when it is not specified, the Aspose.Words code says it defaults to PerSection for HTML conversion.
2: I also tried specifying it explicitly as suggested, and we still get no header.
@IBurzoEvoRWS Unfortunately, I think there is no way to preserve all the different headers/footers in the HTML document. By default, Aspose.Words exports only the primary headers/footers of the document, per section, when saving to HTML, so the output is correct only for primary headers/footers.
If you have primary headers/footers but a different first page, you can use the following code to get them:
Document doc = new Document("input.docx");

HtmlSaveOptions saveOptions = new HtmlSaveOptions
{
    ExportImagesAsBase64 = true,
    // Start with the first-page header/footer for the first page.
    ExportHeadersFootersMode = ExportHeadersFootersMode.FirstPageHeaderFooterPerSection,
    ExportRoundtripInformation = false,
    SaveFormat = SaveFormat.Html
};

// Save each page as separate HTML.
for (int i = 0; i < doc.PageCount; i++)
{
    // From the second page on, switch to the primary headers/footers.
    if (i == 1)
        saveOptions.ExportHeadersFootersMode = ExportHeadersFootersMode.PerSection;

    // Extract single page (zero-based page index, page count).
    Document htmlDoc = doc.ExtractPages(i, 1);
    htmlDoc.Save(string.Format("output_{0}.html", i), saveOptions);
}
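To see which header/footer types the document actually contains, and so which export mode could pick them up, a small diagnostic along these lines may help. This is only a sketch: it assumes the file name from the original post and simply prints what each section holds. It does not change how the HTML export behaves.

```csharp
using System;
using Aspose.Words;

// Assumption: the input file from the original post is in the working directory.
Document doc = new Document("docHeaderOnly.docx");

for (int i = 0; i < doc.Sections.Count; i++)
{
    Section section = doc.Sections[i];

    // These flags decide whether first-page and even-page headers/footers are used.
    Console.WriteLine(
        $"Section {i}: DifferentFirstPageHeaderFooter={section.PageSetup.DifferentFirstPageHeaderFooter}, " +
        $"OddAndEvenPagesHeaderFooter={section.PageSetup.OddAndEvenPagesHeaderFooter}");

    // List every header/footer stored in the section, with its type and text.
    foreach (HeaderFooter headerFooter in section.HeadersFooters)
    {
        Console.WriteLine($"  {headerFooter.HeaderFooterType}: \"{headerFooter.GetText().Trim()}\"");
    }
}
```

If the output lists only HeaderFirst or HeaderEven entries and no HeaderPrimary, that would be consistent with PerSection producing no header in the HTML.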