Clarifying difference between SaveFormat.Html and SaveFormat.HtmlFixed

sft1100 · December 8, 2021, 2:19pm

Hi, we are using Aspose.Words 21.3.0 to convert DOC, DOCX, and RTF files to HTML.
We noticed that if a document (DOC, DOCX, RTF) has two or more pages, SaveFormat.HtmlFixed creates a separate div tag (<div class="awdiv awpage" ... >) for each page (see case 1).
However, SaveFormat.Html doesn't create a separate div tag for each page; it just puts all the content into the body (see case 2).

case 1 (SaveFormat.HtmlFixed):

<!DOCTYPE html>
<html>
  ...
 <body>
  <div class="awdiv awpage" ... >
    ... FIRST PAGE CONTENT ...
  </div>
  <div class="awdiv awpage" ... >
    ... SECOND PAGE CONTENT ...
  </div>
  ...
</html>

case 2 (SaveFormat.Html):

<!DOCTYPE html>
<html>
 ...
<body>
 ... FIRST PAGE CONTENT ...
 ... SECOND PAGE CONTENT ...
...
</html>

So we have two questions:

Could you please give us more information about the difference between SaveFormat.Html and SaveFormat.HtmlFixed?
Can we somehow achieve output similar to case 1 with SaveFormat.Html?

sergey.lobanov · December 9, 2021, 5:16am

@sft1100,
HTML is not a fixed-page format, and it does not describe the geometry of the content objects. It also doesn't separate its content into pages (like it is implemented in DOCX documents). When you use the HtmlFixed save format, Aspose.Words saves the document in the HTML format using absolutely positioned elements. For more information please check the following article:

sft1100 · December 15, 2021, 12:05pm

Could you please confirm that we cannot achieve the output described in case 1 with SaveFormat.Html?
Also, is there a way to add markers at the end of each page when using SaveFormat.Html? We want to use these markers for post-processing the output HTML files.

sergey.lobanov · December 16, 2021, 6:10am

@sft1100,

You can't achieve the output described in case 1 with SaveFormat.Html. To get the desired result, please use SaveFormat.HtmlFixed.
You can use the Document.ExtractPages method to split the document page by page and add a marker at the end of each page. Please check the following code example:

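using Aspose.Words;

// Load the source document.
Document doc = new Document(@"C:\Temp\input.docx");

// Create an empty output document; the builder is used later to write the markers.
Document outputDoc = new Document();
DocumentBuilder builder = new DocumentBuilder(outputDoc);
outputDoc.RemoveAllChildren();

for (int i = 0; i < doc.PageCount; i++)
{
    // Extract page i (zero-based index) as a separate one-page document.
    Document page = doc.ExtractPages(i, 1);
    outputDoc.AppendDocument(page, ImportFormatMode.KeepSourceFormatting);

    // Write the marker text at the end of the page that was just appended.
    builder.MoveToDocumentEnd();
    builder.Writeln("PageEnd");
}

// The .html extension makes Aspose.Words save in the flow HTML format.
outputDoc.Save(@"C:\Temp\outputDoc.html");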
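For the post-processing step, a minimal sketch of how the saved file could be split on the "PageEnd" marker is shown below. It uses only the .NET base library, not Aspose.Words. The class and method names (PageSplitter, SplitOnMarker) are hypothetical, and the sketch assumes the marker text does not occur anywhere else in the document. Because the marker is written as text inside the HTML markup, the resulting chunks are raw slices of the file and not well-formed HTML fragments, so they would likely need further cleanup.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Words.
public static class PageSplitter
{
    // Splits the HTML text on the marker and returns one chunk per page.
    // Assumption: the marker appears only where it was written after each page.
    public static string[] SplitOnMarker(string html, string marker)
    {
        string[] parts = html.Split(new[] { marker }, StringSplitOptions.None);

        // Everything after the last marker is the document tail, not a page.
        string[] pages = new string[Math.Max(parts.Length - 1, 0)];
        Array.Copy(parts, pages, pages.Length);
        return pages;
    }

    public static void Main()
    {
        // Output path taken from the example above.
        string html = File.ReadAllText(@"C:\Temp\outputDoc.html");
        string[] pages = SplitOnMarker(html, "PageEnd");

        for (int i = 0; i < pages.Length; i++)
        {
            Console.WriteLine($"Page {i + 1}: {pages[i].Length} characters");
        }
    }
}
```

A more distinctive marker string than "PageEnd" would reduce the risk of matching ordinary document text.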