The numbering after page splitting is different from the original Word file

ChengHuang · May 22, 2017, 1:40am

Hi there

I use Aspose.Words 17.4 to split Word files page by page and save each page in HTML format.

It seems that, after splitting, the numbering does not continue on the pages that follow page 1.

Please check the attachment; it contains the Word file along with the per-page results.

Here is my test code:

@Test
public void testForAspose() {
    try {
        Document document = new Document("input/meowBase.docx.docx");
        Document pageDocument;
        LayoutCollector layoutCollector;
        DocumentPageSplitter pageSplitter;
        ByteArrayOutputStream output = new ByteArrayOutputStream();

        HtmlSaveOptions htmlSaveOp = new HtmlSaveOptions();
        htmlSaveOp.setExportImagesAsBase64(true);
        htmlSaveOp.setExportTextInputFormFieldAsText(false);
        htmlSaveOp.setExportTocPageNumbers(true);
        htmlSaveOp.setExportPageSetup(true);
        htmlSaveOp.setExportDocumentProperties(true);
        htmlSaveOp.setExportRelativeFontSize(false);
        htmlSaveOp.setUpdateFields(true);

        layoutCollector = new LayoutCollector(document);
        document.updatePageLayout();
        pageSplitter = new DocumentPageSplitter(layoutCollector);

        byte[] htmlByteArray;
        String outPath = "output/";
        String targetDir = UUID.randomUUID().toString();
        File outputDir = new File(outPath + "/" + targetDir + "/");
        if (!outputDir.exists())
            outputDir.mkdir();
        ByteArrayOutputStream savingOutputStream = new ByteArrayOutputStream();

        for (int pageNum = 1; pageNum <= document.getPageCount(); pageNum++) {
            System.out.println("page:" + pageNum);
            pageDocument = pageSplitter.getDocumentOfPage(pageNum);
            savingOutputStream.reset();
            output.reset();
            pageDocument.save(output, htmlSaveOp);
            htmlByteArray = output.toByteArray();
            IOUtils.write(htmlByteArray, new FileOutputStream(outPath + "/" + targetDir + "/" + pageNum + ".html"));
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

tahir.manzoor · May 22, 2017, 7:37am

Hi there,

Thanks for your inquiry. Your document contains a PAGE field in the footer. When you split a Word document page by page, each output document has only one page, so the PAGE field returns the value "1". Please note that there is no concept of a "page" in HTML. In your case, we suggest the following solution. Hope this helps you.

for (int pageNum = 1; pageNum <= document.getPageCount(); pageNum++) {
    pageDocument = pageSplitter.getDocumentOfPage(pageNum);
    DocumentBuilder builder = new DocumentBuilder(pageDocument);
    if (pageDocument.getFirstSection() != null)
    {
        // Replace each PAGE field in the first section with the original page number as plain text.
        for (Field field : pageDocument.getFirstSection().getRange().getFields())
        {
            if (field.getType() == FieldType.FIELD_PAGE)
            {
                builder.moveToField(field, true);
                builder.write("" + pageNum);
                field.remove();
            }
        }
        savingOutputStream.reset();
        output.reset();

        pageDocument.save(output, htmlSaveOp);
        htmlByteArray = output.toByteArray();

        IOUtils.write(htmlByteArray, new FileOutputStream(outPath + "/" + targetDir + "/" + pageNum + ".html"));
    }
}

ChengHuang · May 22, 2017, 9:49pm

Hi

Sorry for the misunderstanding.

Please see the comparison image in the attachment.

I mean the list numbering in the content, not the page number in the footer.

tahir.manzoor · May 23, 2017, 2:58am

Hi there,

Thanks for sharing the details. Please note that Aspose.Words mimics the behavior of MS Word. If you copy the contents of the second page of the document into a new document using MS Word, you will get the same output.

In this case, we suggest the following solution.

1) After extracting a page’s content, get the number of its last list item.

2) Get the first paragraph that is a member of a list in the next page’s content.

3) Set the starting number of the list level for the next page’s content.

Hope this helps you.

Document doc = new Document(MyDir + "meowBase.docx");
LayoutCollector layoutCollector = new LayoutCollector(doc);
doc.updatePageLayout();

HtmlSaveOptions htmlSaveOp = new HtmlSaveOptions();
htmlSaveOp.setExportImagesAsBase64(true);
htmlSaveOp.setExportTextInputFormFieldAsText(false);
htmlSaveOp.setExportTocPageNumbers(true);
htmlSaveOp.setExportPageSetup(true);
htmlSaveOp.setExportDocumentProperties(true);
htmlSaveOp.setExportRelativeFontSize(false);
htmlSaveOp.setUpdateFields(true);

DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);

// Label value of the last list item on the previous page (null if that page had none).
Integer previousListLevel = null;
Document pageDoc;
for (int page = 1; page <= doc.getPageCount(); page++) {

    pageDoc = splitter.getDocumentOfPage(page);
    pageDoc.updateListLabels();

    // Continue the numbering: start this page's first list right after the previous page's last label.
    if (previousListLevel != null) {
        for (Paragraph para : (Iterable<Paragraph>) pageDoc.getChildNodes(NodeType.PARAGRAPH, true))
        {
            if (para.isListItem())
            {
                com.aspose.words.List list = para.getListFormat().getList();
                list.getListLevels().get(0).setStartAt(previousListLevel + 1);
                break;
            }
        }
        // Refresh the labels so the values read below reflect the new start number.
        pageDoc.updateListLabels();
    }

    // Walk backwards to find the last list item on this page and remember its label value.
    previousListLevel = null;
    int labelvalue = 0;
    Node[] nodes = pageDoc.getChildNodes(NodeType.PARAGRAPH, true).toArray();
    for (int i = nodes.length - 1; i >= 0; i--) {
        Paragraph paragraph = (Paragraph) nodes[i];
        if (paragraph.getListFormat().isListItem()) {
            ListLabel label = paragraph.getListLabel();
            labelvalue = label.getLabelValue();
            previousListLevel = labelvalue;
            break;
        }
    }
    pageDoc.save(MyDir + "Out_" + page + ".html", htmlSaveOp);
}
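
The bookkeeping behind the three steps can also be checked in isolation, without Aspose.Words. The following is only a rough sketch in plain Java: the class name `ListContinuationSketch`, the method `startNumbers`, and the idea of describing each page by the count of list items on it are all assumptions made for illustration, not part of the Aspose.Words API. It also assumes a single flat list, and that a page without any list items should not reset the numbering, which differs from the loop above, where the remembered value is cleared on such a page.

```java
// Hypothetical helper, not part of Aspose.Words: models only the arithmetic
// of carrying a list number from one split page to the next.
class ListContinuationSketch {

    /**
     * itemsPerPage[i] is the number of items of one flat numbered list
     * that ended up on page i + 1 after splitting.
     * Returns, for each page, the value its list level should start at.
     */
    static int[] startNumbers(int[] itemsPerPage) {
        int[] startAt = new int[itemsPerPage.length];
        Integer previousLastLabel = null; // last label seen on an earlier page, if any

        for (int page = 0; page < itemsPerPage.length; page++) {
            // Step 3: start right after the last label of the earlier pages.
            int start = (previousLastLabel == null) ? 1 : previousLastLabel + 1;
            startAt[page] = start;

            // Step 1: remember the last label on this page for the next page.
            // Assumption: a page with no list items leaves the value unchanged.
            if (itemsPerPage[page] > 0) {
                previousLastLabel = start + itemsPerPage[page] - 1;
            }
        }
        return startAt;
    }

    public static void main(String[] args) {
        int[] starts = startNumbers(new int[] {3, 4, 2});
        for (int i = 0; i < starts.length; i++) {
            System.out.println("page " + (i + 1) + " starts at " + starts[i]);
        }
    }
}
```

With three, four, and two items on consecutive pages, the computed start values are 1, 4, and 8. In the real loop, the start value corresponds to what is passed to `setStartAt`, and the remembered label corresponds to what `getLabelValue` returns for the last list paragraph on the page.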