Free Support Forum - aspose.com

Docx To HTML + split to pages


#1

I am testing Aspose.Words version 19.1.
I am converting a Word document to HTML, one HTML page for each page of the document. On most pages the HTML content matches the document, but in some cases, for instance when a paragraph starts on one page and continues on the next, the conversion moves that text to the next page.
I tried looping over each paragraph and setting paragraph.ParagraphFormat.KeepTogether = false, but it did not help.
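The loop described above was not posted, so here is a minimal sketch of what it likely looks like; the class and method names are hypothetical. It visits every paragraph, including those nested in tables, and clears `KeepTogether`. Note that this edits the document's own pagination rules rather than how the converter splits pages, which may be why it did not help here.

```csharp
using Aspose.Words;

public static class PaginationTweaks
{
    // Hypothetical helper: clears KeepTogether on every paragraph in the document
    // and returns how many paragraphs were visited.
    public static int ClearKeepTogether(Document doc)
    {
        int count = 0;
        // isDeep = true also collects paragraphs nested inside tables and other composite nodes.
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            para.ParagraphFormat.KeepTogether = false;
            count++;
        }
        return count;
    }
}
```

Call it right after loading, e.g. `PaginationTweaks.ClearKeepTogether(doc);`, before `doc.Save(...)`.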

How can I force Aspose.Words to convert .docx to HTML as-is, with exactly the same content on each page?

I think this issue affects tables in the document too.

My code:

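using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

var doc = new Document(fileLocation);
HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
// Export a single page: the first one (PageIndex is zero-based).
options.PageCount = 1;
options.PageIndex = 0;
options.ExportEmbeddedImages = true;
options.NumeralFormat = NumeralFormat.System;
options.ExportEmbeddedCss = true;
options.UseHighQualityRendering = true;
options.SaveFormat = SaveFormat.HtmlFixed;
options.ExportEmbeddedSvg = true;
options.ExportEmbeddedFonts = true;
// Save to a separate .html path; saving to fileLocation itself would overwrite the source .docx.
doc.Save(Path.ChangeExtension(fileLocation, ".html"), options);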
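The snippet above exports only the first page. The output names GA1_0.html and GA1_1.html mentioned later in this thread suggest a loop over page indexes; a sketch of such a loop, assuming one `Save` call per page and hypothetical `PerPageExport`, `PageFileName` and `outputDir` names, could look like this. It is written against the 19.1 API used here; newer Aspose.Words versions select pages through a `PageSet` property instead of `PageIndex`/`PageCount`.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

public static class PerPageExport
{
    // Hypothetical helper: builds "<baseName>_<pageIndex>.html" inside outputDir.
    public static string PageFileName(string outputDir, string baseName, int pageIndex) =>
        Path.Combine(outputDir, $"{baseName}_{pageIndex}.html");

    // Saves each page of the document as its own fixed-layout HTML file.
    public static void ExportPages(string inputPath, string outputDir)
    {
        var doc = new Document(inputPath);
        string baseName = Path.GetFileNameWithoutExtension(inputPath);

        var options = new HtmlFixedSaveOptions
        {
            PageCount = 1, // one page per output file
            ExportEmbeddedImages = true,
            ExportEmbeddedCss = true,
            ExportEmbeddedFonts = true,
            ExportEmbeddedSvg = true
        };

        // Reading PageCount builds the page layout once; PageIndex is zero-based.
        int pageCount = doc.PageCount;
        for (int i = 0; i < pageCount; i++)
        {
            options.PageIndex = i;
            doc.Save(PageFileName(outputDir, baseName, i), options);
        }
    }
}
```

Calling `PerPageExport.ExportPages(@"D:\temp\GA1.docx", @"D:\temp")` would write GA1_0.html, GA1_1.html and so on into D:\temp.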


#2

@Yehudayi

Thanks for your inquiry. Please ZIP and attach the following resources here for testing.

  • Input Word document.
  • HTML files showing undesired behavior.
  • Please create a standalone console application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing.

As soon as you have this information ready, we will start investigating your issue and provide you with more information.

Thanks for your cooperation.


#3

AsposeTester.zip (4.9 MB)
The document file is GA1.docx and the outputs are GA1_0.html and GA1_1.html.
The following paragraph starts on the first page:
“Education: One of the major points addressed as to why youth join extremist groups is because they aren’t well educated about the realities of these groups and can be easily”
and finishes on the second page: “swayed by the lies … help bring a solution.”

In the HTML output, the whole paragraph is in the file “GA1_1.html”.
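HtmlFixed output follows the pages computed by Aspose.Words' own layout engine, so it can help to check which paragraphs that engine places across a page break. A sketch using the `LayoutCollector` class, which reports the first and last page of a node; the class name and the `Preview` helper are hypothetical. In evaluation mode the watermark text may shift the layout, so page numbers can differ from a licensed run.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;

public static class SplitParagraphFinder
{
    // Prints every body paragraph whose layout starts and ends on different pages.
    public static void Report(Document doc)
    {
        // Create the collector first, then build the layout so it records page positions.
        var collector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        foreach (Section section in doc.Sections)
        {
            foreach (Paragraph para in section.Body.GetChildNodes(NodeType.Paragraph, true))
            {
                int start = collector.GetStartPageIndex(para); // 1-based; 0 if not mapped
                int end = collector.GetEndPageIndex(para);
                if (start == 0 || start == end)
                    continue;

                string text = para.GetText().Trim();
                Console.WriteLine($"Pages {start}-{end}: {Preview(text, 60)}");
            }
        }
    }

    // Hypothetical helper: shortens long paragraph text for console output.
    public static string Preview(string text, int maxLength) =>
        text.Length <= maxLength ? text : text.Substring(0, maxLength) + "...";
}
```

Usage: `SplitParagraphFinder.Report(new Document(@"D:\temp\GA1.docx"));`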


#4

@Yehudayi

Thank you for sharing the document. Please use the following code example to convert each page of the Word document into a separate HTML page. Hope this helps.

using Aspose.Words;
using Aspose.Words.Saving;

var doc = new Document(@"D:\temp\GA1.docx");

HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
// Render all pages in one Save call; the callback below writes each page to its own file.
options.PageIndex = 0;
options.PageCount = doc.PageCount;
options.ExportEmbeddedImages = true;
options.NumeralFormat = NumeralFormat.System;
options.ExportEmbeddedCss = true;
options.UseHighQualityRendering = true;
options.SaveFormat = SaveFormat.HtmlFixed;
options.ExportEmbeddedSvg = true;
options.ExportEmbeddedFonts = true;
options.PageSavingCallback = new HandlePageSavingCallback();

doc.Save(@"D:\temp\GA1_19.1.html", options);

// Gives every rendered page its own file name based on the page index.
public class HandlePageSavingCallback : IPageSavingCallback
{
    public void PageSaving(PageSavingArgs args)
    {
        args.PageFileName = string.Format(@"D:\temp\GA1_19.1_Page_{0}.html", args.PageIndex);
    }
}

Please check the attached HTML pages for your reference: Output.zip (78.9 KB)


#5

I ran the code you sent me, but my output is different.
Is the reason the messages like “Evaluation Only. Created with Aspose.Words. Copyright 2003-2019 Aspose Pty Ltd”?
myoutput.zip (149.4 KB)


#6

@Yehudayi

Yes, your output is different because of the evaluation watermark. You are using Aspose APIs in evaluation mode (without applying a license). If you want to test Aspose.Words without the evaluation watermark, you can request a 30-day Temporary License. Please refer to How to get a Temporary License?

Please also refer to the following article:
https://docs.aspose.com/display/wordsnet/Licensing
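Once you have a license file, apply it before loading or saving any document so the evaluation watermark is no longer added. A minimal sketch, assuming the file is named Aspose.Words.lic and placed next to the executable; the `LicenseSetup` class and `TryApply` method are hypothetical names.

```csharp
using System;
using Aspose.Words;

public static class LicenseSetup
{
    // Applies the Aspose.Words license once per process.
    // Assumption: "Aspose.Words.lic" sits next to the executable.
    public static bool TryApply(string licensePath = "Aspose.Words.lic")
    {
        try
        {
            var license = new License();
            license.SetLicense(licensePath);
            return true;
        }
        catch (Exception ex)
        {
            // Without a valid license the API keeps running in evaluation mode.
            Console.WriteLine($"License not applied: {ex.Message}");
            return false;
        }
    }
}
```

Call `LicenseSetup.TryApply();` once at startup, before `new Document(...)`.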