Free Support Forum - aspose.com

Split a Word file by pages and a redundant page comes out

Hi there

I am testing how to split Word files by page using Aspose.Words 17.2.0 and PageSplitter.
Here is the test code:

@Test
public void splitTest() throws Exception{
    String fileName = "MAYJIANG_HAWAY_Improvement 2011.03.01.doc";
    String wordPassword = "";
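    byte[] wordContent;
    // Close the input stream once the file has been read into memory.
    try (FileInputStream in = new FileInputStream("custom/input/docx/" + fileName)) {
        wordContent = IOUtils.toByteArray(in);
    }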
    splitMethod(wordContent, wordPassword, fileName);
}

protected void splitMethod(byte[] wordContent, String wordPassword, String fileName) throws Exception {
    Document wordDoc = null;

    try {
        if (StringUtil.isNotEmpty(wordPassword)) {
            LoadOptions loadOps = new LoadOptions(wordPassword);
            wordDoc = new Document(new ByteArrayInputStream(wordContent), loadOps);
        } else {
            wordDoc = new Document(new ByteArrayInputStream(wordContent));
        }

        String ext = FilenameUtils.getExtension(fileName);

        LayoutCollector layoutCollector = new LayoutCollector(wordDoc);
        wordDoc.updatePageLayout();
        DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);

        int totalPage = wordDoc.getPageCount();
        System.out.println("totalPage:" + totalPage);
        int index = 0;
        while (index < totalPage) {
            Document pageDoc = null;
            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            File outputSinglePageWordFile = new File(
                    "custom/input/docx/" + "split_" + fileName + "/" + (index + 1) + "." + ext);
            FileUtil.md(outputSinglePageWordFile.getParentFile());
            try {

                pageDoc = splitter.getDocumentOfPage((index + 1));
                pageDoc.save(stream, SaveFormat.fromName(ext.toUpperCase()));

            } finally {
                IOUtils.closeQuietly(stream);
            }
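            // Write the single-page document to disk and close the file handle.
            try (FileOutputStream out = new FileOutputStream(outputSinglePageWordFile)) {
                IOUtils.write(stream.toByteArray(), out);
            }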
            index++;
        }
    } finally {
    }
}

In result #1, opened with MS Word, you can see an extra, redundant empty page.

I have uploaded the original Word file and the result.
Please check the attachment and help us solve this issue. Thanks!

Craig

Hi Craig,

Thanks for your inquiry. Please note that your source document contains empty paragraphs, which cause the extra page when the document is split. Please remove the empty paragraphs from your document before splitting it. The following code snippet shows how to do this and should resolve the issue.

Document doc = new Document("MAYJIANG_HAWAY_Improvement+2011.03.01.doc");
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if (paragraph.toString(SaveFormat.TEXT).trim().equals(""))
    {
        paragraph.remove();
    }
}

Please feel free to contact us for any further assistance.

Best Regards,

Hi there

I added the code snippet like this:

if (StringUtil.isNotEmpty(wordPassword))
{
    LoadOptions loadOps = new LoadOptions(wordPassword);
    wordDoc = new Document(new ByteArrayInputStream(wordContent), loadOps);
}
else
{
    wordDoc = new Document(new ByteArrayInputStream(wordContent));
}

for (Paragraph paragraph : (Iterable<Paragraph>) wordDoc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if (paragraph.toString(SaveFormat.TEXT).trim().equals(""))
    {
        paragraph.remove();
    }
}

The redundant page in split page #1 is gone.
But the page header has disappeared as well, which is not the result we want.
Please check the result again in the attachment.

Hi Craig,

Thanks for your feedback. We are looking into the issue and will update you soon.

Best Regards,

Hi Craig,

Thanks for your patience. Please remove only the empty paragraphs that have no child nodes; this will resolve the header issue. The following code snippet shows how to do this.

com.aspose.words.Document doc = new com.aspose.words.Document("MAYJIANG_HAWAY_Improvement+2011.03.01.doc");
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if (paragraph.toString(com.aspose.words.SaveFormat.TEXT).trim().equals(""))
    {
        if (paragraph.hasChildNodes())
            continue;
        paragraph.remove();
    }
}
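
To make the difference between the two snippets easier to see, here is a minimal, library-free sketch of the rule. It does not use the Aspose.Words API: `Para`, `isRemovable`, and `removeEmpty` are hypothetical names invented for this illustration, and the child node names are plain strings. It assumes the header was lost because a paragraph with blank text still held a child node such as a shape or image; the thread does not state the exact cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyParagraphFilterSketch {

    /** Hypothetical stand-in for a paragraph: its text plus the names of its child nodes. */
    static final class Para {
        final String text;
        final List<String> children;

        Para(String text, String... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** A paragraph is removable only if its text is blank AND it has no child nodes. */
    static boolean isRemovable(Para p) {
        return p.text.trim().isEmpty() && p.children.isEmpty();
    }

    /** Returns the paragraphs that survive the filter, in their original order. */
    static List<Para> removeEmpty(List<Para> paragraphs) {
        List<Para> kept = new ArrayList<>();
        for (Para p : paragraphs) {
            if (!isRemovable(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Para> doc = Arrays.asList(
                new Para("Body text", "Run"), // ordinary paragraph, kept
                new Para("", "Shape"),        // blank text but has a child (assumed header image), kept
                new Para("   "),              // whitespace only, no children, removed
                new Para(""));                // empty, no children, removed
        System.out.println(removeEmpty(doc).size()); // prints 2
    }
}
```

With the text-only check from the first snippet, the second paragraph in this model would have been removed as well, which matches the missing header reported above.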

We are sorry for the inconvenience.

Best Regards,

Hi Tilal.Ahmad

Thanks for the information.
This code snippet works fine with this sample file.

I would like to share another sample file.
The positions of the image and the text block seem a little lower in the result.
Furthermore, in result page #2 they are covered by other content.

Please check the attachment, and thank you for the help.

Craig

Hi Craig,

Thanks for your feedback. We have tested the scenario with the updated document and reproduced the reported issue. We have logged it in our issue tracking system as WORDSNET-15193 for further investigation and rectification. We will keep you updated on the resolution progress in this forum thread.

We are sorry for the inconvenience.

Best Regards,