Hi Team,
When we convert DOCX to HTML, blank pages appear in the generated HTML file.
Example: [screenshot of the converted HTML page]
[Screenshot of the Word document] As you can see, the Word document does not contain any blank pages.
I am attaching the Word document and the converted HTML for reference:
TEST.docx (9.3 MB)
TEST_HTML.7z (67.9 KB)
@RV_2348 This is the expected result. If you render your document using MS Word (convert it to PDF or XPS) you will see exactly the same result.
I have simplified the document to the following:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14 w16se w16cid">
<w:body>
<w:p w:rsidR="00F41738" w:rsidRPr="003B177B" w14:paraId="6F55A8BA" w14:textId="66E2E825">
<w:r w:rsidRPr="003B177B">
<w:t>Some paragraph</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00E55508" w:rsidRPr="003B177B" w:rsidP="00B87A91" w14:paraId="1D94B8F6" w14:textId="39DAE850">
<w:pPr>
<w:sectPr w:rsidSect="007172B0">
<w:pgSz w:w="12240" w:h="15840" w:code="1" />
<w:pgMar w:top="1008" w:right="1440" w:bottom="864" w:left="1440" w:header="576" w:footer="576" w:gutter="144" />
<w:cols w:space="720" />
</w:sectPr>
</w:pPr>
</w:p>
<w:p w:rsidR="00E55508" w:rsidRPr="003B177B" w:rsidP="00425FBF" w14:paraId="6D887115" w14:textId="3358E3A8">
<w:r w:rsidRPr="003B177B">
<w:t>Some paragraph 2</w:t>
</w:r>
</w:p>
<w:sectPr w:rsidSect="007172B0">
<w:pgSz w:w="12240" w:h="15840" w:code="1" />
<w:pgMar w:top="1008" w:right="1440" w:bottom="864" w:left="1440" w:header="576" w:footer="576" w:gutter="144" />
<w:pgNumType w:start="1" />
<w:cols w:space="720" />
</w:sectPr>
</w:body>
</w:document>
As you can see, the only difference between the page setup of the first and the last section is <w:pgNumType w:start="1" />.
If you remove this element, the rendered document will have 2 pages instead of 3.
in.docx (65.2 KB)
With your document you can reset this property using the following code:
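// Load the source document.
Document doc = new Document(@"C:\Temp\in.docx");
// Stop the last section from restarting page numbering.
doc.LastSection.PageSetup.RestartPageNumbering = false;
// Save as fixed-layout HTML.
doc.Save(@"C:\Temp\out.html", SaveFormat.HtmlFixed);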
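If the document has more than two sections, it may help to first check which of them restart page numbering. The following is only a minimal sketch, assuming the same C:\Temp\in.docx path as above; the Program class is there just to make the snippet self-contained. It changes nothing in the document and only lists the sections where RestartPageNumbering is set. Note that a restart by itself does not necessarily mean a blank page will be rendered; these sections are simply the candidates to inspect.

```csharp
using System;
using Aspose.Words;

class Program
{
    static void Main()
    {
        // Assumed path, same as in the snippet above.
        Document doc = new Document(@"C:\Temp\in.docx");

        // List every section that restarts page numbering.
        for (int i = 0; i < doc.Sections.Count; i++)
        {
            PageSetup pageSetup = doc.Sections[i].PageSetup;
            if (pageSetup.RestartPageNumbering)
            {
                Console.WriteLine(
                    $"Section {i} restarts numbering at page {pageSetup.PageStartingNumber}");
            }
        }
    }
}
```

The section indexes printed are zero-based positions in doc.Sections.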
Is there any other way to solve this issue without setting RestartPageNumbering = false? I restart page numbering multiple times in my DOCX. For example, my document has multiple chapters, and each chapter should start with a new page number. However, I don't want any blank pages in the converted HTML.
@RV_2348 I am afraid there is no other way to work around this problem, since the Aspose.Words behavior matches the MS Word behavior.