Additional page is added when converting from DOCX to HTML

Hi,
I am trying to convert a DOCX file to HTML, saving each page of the document as a separate HTML file. For certain pages, the generated HTML file contains an additional page.

file : SIGNALS & SYSTEMSS.docx (599.4 KB)

string path = @"D:\POC\kb pdfs";
Aspose.Words.Document docFile = new Aspose.Words.Document(@"D:\POC\kb pdfs\SIGNALS & SYSTEMSS.docx");

int pageCount = docFile.PageCount;
for (int page = 0; page < pageCount; page++)
{
    using (MemoryStream pageStream = new MemoryStream())
    {
        // Save each page as a separate document.
        Aspose.Words.Document extractedPage = docFile.ExtractPages(page, 1);
        HtmlFixedSaveOptions htmlFixedSaveOptions = new HtmlFixedSaveOptions();
        htmlFixedSaveOptions.ExportEmbeddedCss = true;
        htmlFixedSaveOptions.ExportEmbeddedFonts = true;
        htmlFixedSaveOptions.ExportEmbeddedImages = true;
        htmlFixedSaveOptions.ExportEmbeddedSvg = true;
        htmlFixedSaveOptions.ExportFormFields = true;
        htmlFixedSaveOptions.ExportGeneratorName = true;
        string cssprefix = "aspose_doc" + page;
        htmlFixedSaveOptions.CssClassNamesPrefix = cssprefix;
        htmlFixedSaveOptions.AllowEmbeddingPostScriptFonts = true;
        //htmlFixedSaveOptions.UseTargetMachineFonts = true;
        htmlFixedSaveOptions.SaveFormat = Aspose.Words.SaveFormat.HtmlFixed;
        extractedPage.Save(Path.Combine(path, "convertedHtml", $"{ page + 1}.html"), htmlFixedSaveOptions);

    }
}

Check 38.html: it contains an additional generated page.

@pooja.jayan The problem might be caused by font substitution. When converting your document, the following fonts are not available on my side:

  • MathJax_Math
  • MathJax_Main
  • MathJax_AMS
  • MathJax_Size2
  • MathJax_Size3
  • MathJax_Size4
  • MathJax_Size1

Could you please attach these fonts here for testing? We will check the issue and provide you with more information.
You can also check whether the fonts are available on your side by implementing IWarningCallback, as in the following example:

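// Requires: using System; using Aspose.Words;
Document doc = new Document(@"C:\Temp\in.docx");
// Register the callback before saving so that font warnings are reported.
doc.WarningCallback = new WarningCallback();
doc.Save(@"C:\Temp\out.pdf");

// Declare this class as a member of the class that contains the code above.
private class WarningCallback : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        // Print only the warnings that report a font substitution.
        if (info.WarningType == WarningType.FontSubstitution)
            Console.WriteLine(info.Description);
    }
}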

Hi,

Thank you for your quick response.

These fonts are not available on my side either.

What can I do in this situation?

https://www.math.usm.edu/MathJax/fonts/HTML-CSS/TeX/woff/

@pooja.jayan Could you please also convert your document to PDF and XPS using both Aspose.Words and MS Word on your side? I will compare the results and provide you with more information.
If the fonts are not available, Aspose.Words applies a set of substitution rules. You can configure these rules to get a more accurate result.
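
As a rough illustration of configuring those rules, here is a minimal sketch. It makes several assumptions that are not from this thread: the folder C:\Temp\Fonts is hypothetical and is assumed to hold any extra fonts as .ttf or .otf files, and the substitute fonts named below ("Cambria Math", "Times New Roman") are only placeholders, not a recommendation for this particular document.

```csharp
using Aspose.Words;
using Aspose.Words.Fonts;

class FontSubstitutionSketch
{
    static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");

        FontSettings fontSettings = new FontSettings();

        // Search the system fonts plus one extra folder (hypothetical path).
        // The second argument of FolderFontSource enables a recursive scan.
        fontSettings.SetFontsSources(new FontSourceBase[]
        {
            new SystemFontSource(),
            new FolderFontSource(@"C:\Temp\Fonts", true)
        });

        // If a MathJax font is still missing, try these substitutes in order.
        // The substitute names are assumptions chosen for illustration only.
        fontSettings.SubstitutionSettings.TableSubstitution.SetSubstitutes(
            "MathJax_Math", "Cambria Math", "Times New Roman");
        fontSettings.SubstitutionSettings.TableSubstitution.SetSubstitutes(
            "MathJax_Main", "Cambria Math", "Times New Roman");

        // Apply the settings to this document before it is laid out and saved.
        doc.FontSettings = fontSettings;
        doc.Save(@"C:\Temp\out.pdf");
    }
}
```

Combining this with a warning callback that reports font substitutions should show whether the configured substitutes are actually picked up.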

Hi,
Thank you for your response.

PDF converted with Aspose:
demoPdf.pdf (843.5 KB)

PDF converted with MS Word:
SIGNALS & SYSTEMSS.pdf (1.1 MB)

I could not upload the XPS versions because the forum does not accept that file format.

@pooja.jayan Thank you for the additional information. As I can see in your code, you are using the Document.ExtractPages method. This method is not required when the target is a fixed-page format such as PDF, XPS, or HtmlFixed; you can select the page to render through the PageSet property of the save options instead. Please modify your code as follows:

int pageCount = docFile.PageCount;
for (int page = 0; page < pageCount; page++)
{
    HtmlFixedSaveOptions htmlFixedSaveOptions = new HtmlFixedSaveOptions();
    // Render only the current page; PageSet takes zero-based page indices.
    htmlFixedSaveOptions.PageSet = new PageSet(page);
    htmlFixedSaveOptions.ExportEmbeddedCss = true;
    htmlFixedSaveOptions.ExportEmbeddedFonts = true;
    htmlFixedSaveOptions.ExportEmbeddedImages = true;
    htmlFixedSaveOptions.ExportEmbeddedSvg = true;
    htmlFixedSaveOptions.ExportFormFields = true;
    htmlFixedSaveOptions.ExportGeneratorName = true;
    string cssprefix = "aspose_doc" + page;
    htmlFixedSaveOptions.CssClassNamesPrefix = cssprefix;
    htmlFixedSaveOptions.AllowEmbeddingPostScriptFonts = true;
    //htmlFixedSaveOptions.UseTargetMachineFonts = true;
    htmlFixedSaveOptions.SaveFormat = Aspose.Words.SaveFormat.HtmlFixed;
    // Save the selected page of the original document as its own HTML file.
    docFile.Save(Path.Combine(path, "convertedHtml", $"{page + 1}.html"), htmlFixedSaveOptions);
}

Hi,
Thank you for your response.

It is working fine for me now!
