Accuracy of PageSplitter

Hi

I use Aspose.Words and the PageSplitter example to split a DOC file into multiple DOC files.

The resulting files do not always match the pages of the original document as shown in MS Office Word.

For example, the output may contain empty pages that the original document does not have.

In other cases, some text ends up on the wrong page.

Is there any way to make the split result match the original document more closely?

Hi Craigabyss,

Thanks for your inquiry. If you are using an older version of Aspose.Words, please upgrade to the latest version (v15.1.0) and let us know how it goes on your side. If the problem remains, please attach your input Word document here for testing. I will investigate the issue on my side and provide you with more information.

Hi

I am using Aspose Words 15.1.0.

I have uploaded a DOC file. I split it into per-page DOC files with the following code:

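// Open the source document; the using block closes the stream once the document is loaded.
Document doc;
using (FileStream stream = new FileStream("…/…/test.doc", FileMode.Open))
{
    doc = new Document(stream);
}
// Create the collector before building the layout so it can record the node-to-page mapping.
LayoutCollector layoutCollector = new LayoutCollector(doc);
doc.UpdatePageLayout();
DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
// Save each page as a separate document.
for (int page = 1; page <= doc.PageCount; page++)
{
    Document pageDoc = splitter.GetDocumentOfPage(page);
    pageDoc.Save("…/…/result/" + page + ".doc");
}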

Some of the resulting pages differ noticeably from the original ones.

The result also has more pages than the original file.

I hope this helps you improve it. Thanks.

Hi Craigabyss,

Thanks for sharing the details. I have tested the scenario and reproduced the same issue on my side. I have logged this problem in our issue tracking system as WORDSNET-11461 and linked this forum thread to it. You will be notified via this thread once the issue is resolved.

We apologize for the inconvenience.

I’m also having a problem with the PageSplitter example code. The first line of a page is added to the end of the previous page. It seems to be ignoring the hard page breaks. Is there updated PageSplitter code? I have example code from 7/29/2015.

Thanks,

Tony

Exscribe, Inc

Hi Tony,

Thanks for your inquiry. Could you please attach your input Word document here for testing? Please also tell us which page numbers of the document have incorrect output. I will investigate the issue on my side and provide you with more information.

Thanks Tahir,

Here’s a RAR of the original file and the 4 pages it splits into. You’ll see that the date is at the end of a page when it should be at the beginning of the next one.

Any help would be great. Of course I’m on a deadline and may have to come up with another solution, such as searching for the page breaks and splitting there.

Thanks!

Tony

Sr. Systems Programmer

Exscribe, Inc
www.exscribe.com

Hi Tony,

Thanks for sharing the details. In your input document, the page breaks are inside Run nodes together with the text that follows them, as in [Page break]April 2, 2015. The page break is therefore on page 1 while the text ‘April 2, 2015’ is on page 2, so the run spans both pages. This is why you are not getting the correct output. You can work around this issue with the following code example.

Hope this helps you. Please let us know if you have any more queries.

Document doc = new Document(MyDir + "TestPages.doc");
LayoutCollector collector = new LayoutCollector(doc);
// Retrieve all paragraphs in the document.
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
// Iterate through all paragraphs and their runs.
foreach (Paragraph para in paragraphs)
{
    foreach (Run run in para.Runs)
    {
        // A run that starts with a page break has its break on one page and its text on the next.
        if (run.Text.StartsWith(ControlChar.PageBreak))
        {
            // Move the break to the end of the previous paragraph, if there is one.
            if (run.ParentParagraph.PreviousSibling != null)
            {
                ((Paragraph)run.ParentParagraph.PreviousSibling).AppendChild(new Run(doc, ControlChar.PageBreak));
            }
            // Remove the page break from the run so only its text remains.
            run.Text = run.Text.Replace(ControlChar.PageBreak, "");
        }
    }
}
doc.Save(MyDir + "Out.doc");

DocumentPageSplitter splitter = new DocumentPageSplitter(collector);
for (int i = 1; i <= doc.PageCount; i++)
{
    Aspose.Words.Document dstDoc = splitter.GetDocumentOfPage(i);
    dstDoc.Save(MyDir + "Out" + i + ".docx");
}
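
To make the idea behind the workaround easier to follow, here is a minimal sketch of the same normalization on plain strings, with no Aspose.Words dependency. It is only an illustration: the class PageBreakNormalizer and the method MoveLeadingBreaks are hypothetical names, each string stands in for one paragraph, and the form feed character "\f" is assumed to play the role of ControlChar.PageBreak. Unlike the code above, the sketch strips only the leading break and leaves the first paragraph untouched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Aspose.Words.
public static class PageBreakNormalizer
{
    // Assumption: "\f" (form feed) stands in for ControlChar.PageBreak.
    private const string PageBreak = "\f";

    // Returns a copy of the paragraphs in which every page break found at the
    // start of a paragraph has been moved to the end of the paragraph before it.
    public static List<string> MoveLeadingBreaks(IReadOnlyList<string> paragraphs)
    {
        var result = new List<string>(paragraphs);

        // Start at 1: the first paragraph has no predecessor to receive the break.
        for (int i = 1; i < result.Count; i++)
        {
            // Ordinal comparison matches the control character exactly.
            if (result[i].StartsWith(PageBreak, StringComparison.Ordinal))
            {
                result[i - 1] += PageBreak;
                result[i] = result[i].Substring(PageBreak.Length);
            }
        }

        return result;
    }
}
```

For example, passing the two paragraphs "Page one text" and "\fApril 2, 2015" gives "Page one text\f" and "April 2, 2015", so the break closes the first page and the date starts the second.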

PERFECT! That worked.

Thank you so much. I did see that the paragraph was spread over two pages, but I wasn’t quite sure how to work around it.

Thanks again,

Tony

Hi Tony,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words. We will be happy to help you.