Convert document to html: page numbers are wrong

Hello,

I have two issues with Aspose. I am converting a Word document to an HTML document with the HtmlSaveOptions class. My Word document has, for example, 5 pages. Every page has a footer showing the page number; the first page has its own header, and all the other pages share a different header.

On HtmlSaveOptions, I call setExportHeadersFootersMode with ExportHeadersFootersMode.PER_SECTION.

When I save the document, the HTML output is split into sections, but every page number is 1. Moreover, the first page has no header and no footer.

Do you have a solution to this?

Here is the code I used:

HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setSaveFormat(SaveFormat.HTML);
saveOptions.setExportPageSetup(true);
saveOptions.setExportTocPageNumbers(true);
saveOptions.setExportImagesAsBase64(true);
saveOptions.setExportFontsAsBase64(true);
saveOptions.setExportPageMargins(true);
saveOptions.setExportDocumentProperties(true);
saveOptions.setPrettyFormat(true);
saveOptions.setExportHeadersFootersMode(ExportHeadersFootersMode.PER_SECTION);

Thank you in advance.

@amepage

You are facing the expected behavior of Aspose.Words. It is hard to meaningfully output headers and footers to HTML because HTML is not paginated.

Please note that when you export headers and footers in PerSection mode, only the primary headers and footers at the beginning and the end of each section are exported. Please read the documentation of the HtmlSaveOptions.ExportHeadersFootersMode property for details.

In your case, we suggest saving the document in the HtmlFixed file format. Hope this helps.

Thank you for your response. So I have to use the HtmlFixedSaveOptions class. However, I call the save method with an OutputStream as the first parameter, and when I pass HtmlFixedSaveOptions instead of HtmlSaveOptions, an exception is thrown: “Resource file(s) cannot be written to disk. When saving the document to a stream either ResourceFolder should be specified or ExportEmbeddedImages, ExportEmbeddedFonts, ExportEmbeddedCss, and ExportEmbeddedSvg should be set or custom streams should be provided via ResourceSavingCallback”

I saw in some examples that the first parameter is a dataDir, which is the path of the file, but I need to work with an OutputStream.

I found a hint involving ResourceSavingCallback:

HtmlFixedSaveOptions opts = new HtmlFixedSaveOptions();
opts.setPrettyFormat(true);
opts.setPageSavingCallback(new CustomOutputStreamSavingCallback());

doc.save(osBinDocument, opts);

with this class implementing the callback:

private class CustomOutputStreamSavingCallback implements IPageSavingCallback
{
    public void pageSaving(PageSavingArgs args)
    {
        // How can I get my outputstream and use it
    }
}

In the callback method, I don’t know how to get hold of my OutputStream.

Thank you in advance for your response.

@amepage

Please set the properties named in the exception message (ExportEmbeddedImages, ExportEmbeddedFonts, ExportEmbeddedCss, and ExportEmbeddedSvg) to true to avoid the exception you shared. If you still face the problem, please ZIP and attach your input Word document here for testing. We will investigate the issue and provide you with more information.

@tahir.manzoor Thank you! Unfortunately, since I then display the HTML document from a string, the headers and footers all appear at the top of the document. But I can work around that problem.

However, once the HTML string is displayed, I need to convert it back to a Word document (the user can modify the HTML string after it is loaded from a Word document).

For this, I use the following code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertHtml(strHTML);

doc.save(osData, SaveFormat.DOC);

where osData is an OutputStream.

This works very well, except for the page numbers: they are all equal to 1, and I cannot manage to set the correct page numbers in the footer. Is there a way to update every page number to the correct value?

I saw some code like this:

builder.moveToHeaderFooter(HeaderFooterType.FOOTER_PRIMARY);
builder.insertField("PAGE ");

This calculates the page number of the current page automatically and adds the correct number to the footer, but the old page number stays right after the correct one. This is what I get in the Word document:

11 for page 1
21 for page 2
31 for page 3

So is it possible in this case to remove the old, wrong number?

Thank you in advance for your response.

@amepage

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input Word and HTML documents.
  • The output Word/HTML document that shows the undesired behavior.
  • The expected output document that shows the desired behavior.
  • A simple Java application (source code without compilation errors) that helps us reproduce your problem on our end.

As soon as you have these resources ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

@tahir.manzoor

I have some trouble putting together a standalone Java project, because the functions I wrote are integrated into our software. I can give you the first Word document I use, which consists of 2 simple pages.

First step: convert Document to HTML

Document doc = new Document(dataDir); // dataDir is the path of the Word file

HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setSaveFormat(SaveFormat.HTML);
saveOptions.setExportPageSetup(true);
saveOptions.setExportTocPageNumbers(true);
saveOptions.setExportImagesAsBase64(true);
saveOptions.setExportFontsAsBase64(true);
saveOptions.setExportPageMargins(true);
saveOptions.setExportDocumentProperties(true);
saveOptions.setPrettyFormat(true);
saveOptions.setExportHeadersFootersMode(ExportHeadersFootersMode.PER_SECTION);
doc.save(dataDir + "htmlFile.html", saveOptions);

Then I can give you the HTML string, which is the content of the converted HTML file read into a string. I do this because my software has some plugins that allow it.

Second step: I add some text to the HTML string, and then I convert the HTML string back to a Document:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
			
builder.insertHtml(strHTML);

doc.save(dataDir + "finalWordDoc.doc", SaveFormat.DOC);

Attached you will find the first Word file, the unmodified HTML string, the HTML string with some added text, and the final Word document converted from that string, which has wrong page numbers.

Attachment: aspose document.zip (68.7 KB)

@amepage

In FirstDocument.docx, the page number is represented as a PAGE field in the footer. MS Word 2016 shows the page numbers correctly on all pages.

You are facing the expected behavior of Aspose.Words. Could you please share your expected output Word document? We will then provide you more information about your query.

@tahir.manzoor

Here is the document obtained from the second step; all of its page numbers are the same.

Attachment: AsposeFinalWordDocAfterConvertingHTMLStringToWord.zip (19.2 KB)

@amepage

Please note that the roundtrip information is exported as -aw-* CSS properties of the corresponding HTML elements. Aspose.Words uses this information to import the document’s content back to DOCX, e.g. page fields, content controls, etc.

Please check the following HTML fragment for the page number field. This snippet does not exist in your HTML string, which is why you are getting “1” on both pages. Please use this snippet in your input HTML to get the desired output.
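To check whether an edited HTML string still carries that roundtrip information, one option is to list the -aw-* property names it contains. The following is only a minimal sketch using plain Java regular expressions; the class and method names are hypothetical and it does not call Aspose.Words at all.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Aspose.Words): lists the -aw-* CSS
// property names that occur in an HTML string.
final class AwPropertyScan {
    // Matches a property name starting with "-aw-" followed by a colon,
    // for example "-aw-field-code:".
    private static final Pattern AW_PROPERTY =
            Pattern.compile("(-aw-[a-z0-9-]+)\\s*:");

    static Set<String> awProperties(String html) {
        Set<String> names = new TreeSet<>(); // sorted, no duplicates
        Matcher m = AW_PROPERTY.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

If the returned set has no entry such as -aw-field-code after the HTML has passed through an editor, the editor has probably stripped the roundtrip styles.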

<div style="-aw-headerfooter-type:footer-primary; clear:both">
<div style="-aw-sdt-tag:''">
<p style="margin-top:0pt; margin-bottom:0pt; text-align:justify; font-size:11pt">
<span style="-aw-field-start:true"></span><span style="-aw-field-code:'PAGE \* MERGEFORMAT'"></span><span style="-aw-field-separator:true"></span><span style="font-family:Montserrat; color:#3b3838">1</span><span style="-aw-field-end:true"></span>
</p>
</div>
</div>
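
If the HTML is held in a Java String, as in this thread, the fragment could be added with plain string handling before the string is passed to insertHtml. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the fragment is a trimmed copy of the one above with the document-specific font styling removed, and inserting it just before </body> is an assumption about placement that should be verified against the HTML Aspose.Words produces for your document.

```java
// Hypothetical helper (not part of Aspose.Words): makes sure an HTML string
// contains a primary-footer fragment with a PAGE field in roundtrip form.
final class PageFieldFooter {
    // Trimmed copy of the fragment from the reply; styling is omitted.
    static final String FOOTER =
            "<div style=\"-aw-headerfooter-type:footer-primary; clear:both\">"
          + "<div style=\"-aw-sdt-tag:''\">"
          + "<p style=\"margin-top:0pt; margin-bottom:0pt\">"
          + "<span style=\"-aw-field-start:true\"></span>"
          + "<span style=\"-aw-field-code:'PAGE \\* MERGEFORMAT'\"></span>"
          + "<span style=\"-aw-field-separator:true\"></span>"
          + "<span>1</span>"
          + "<span style=\"-aw-field-end:true\"></span>"
          + "</p></div></div>";

    // True if the HTML already carries a PAGE field code marker.
    static boolean hasPageField(String html) {
        return html.contains("-aw-field-code:'PAGE");
    }

    // Returns the HTML unchanged if a PAGE field is present; otherwise
    // inserts the footer before a lowercase </body>, or appends it when
    // the string has no </body> tag (assumed placement, see lead-in).
    static String ensurePageField(String html) {
        if (hasPageField(html)) {
            return html;
        }
        int close = html.lastIndexOf("</body>");
        if (close < 0) {
            return html + FOOTER;
        }
        return html.substring(0, close) + FOOTER + html.substring(close);
    }
}
```

The result of ensurePageField(strHTML) would then be the string handed to builder.insertHtml in the second step described earlier in the thread.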

@tahir.manzoor

Thank you for your solution. It worked well, even though I have to manipulate the HTML content.
Do you know if there is documentation about the HTML elements that indicate what is a header, a footer, or another part of a Word document, like <div style="-aw-headerfooter-type:footer-primary; clear:both"> or the snippet you gave me?

Thank you

@amepage

Please read the following article about the custom styles that Aspose.Words uses to save extra information in the output HTML.
Custom Styles used for proper Aspose.Words-HTML-Aspose.Words Roundtrip