Divide/Extract Individual Pages into Separate Word Documents using C# or Java | PageSplitter, DocumentPageSplitter, PageNumberFinder, SectionSplitter

KevinOldCastle · April 11, 2013, 3:12pm

I have been tasked with dividing up a relatively large document into individual pages. I have referenced the information that was available for Aspose Documents and have had mixed success thus far. I am relatively new to Aspose. I have been dividing my document up using Sections. However, it appears that the document is no longer divided up by sections. Is there another way to divide a document up into pages using something other than sections?

Please advise,

Thanks,

Kevin

awais.hafeez · April 12, 2013, 1:29am

Hi Kevin,

Thanks for your inquiry.

Please note that Microsoft Word documents are flow documents and are not natively laid out into lines and pages. In your case, you can split the document into pages by using the utility methods available in the attached ‘PageNumberFinder’ class.

For example, you can use code like the following to extract pages to an external document. The SplitNodesAcrossPages method splits any section whose content spans multiple pages into separate sections, one per page. You can then extract each page by taking its section and inserting it into a new document.

Document doc = new Document("Document.docx");
// Set up the document that the pages will be copied to. Remove the empty section.
Document dstDoc = new Document();
dstDoc.RemoveAllChildren();
PageNumberFinder finder = new PageNumberFinder(doc);
// Split nodes which are found across pages.
finder.SplitNodesAcrossPages(true);
// Copy all content including headers and footers from the specified pages into the destination document.
ArrayList pageSections = finder.RetrieveAllNodesOnPages(3, 5, NodeType.Section);
// Nodes belong to their source document, so import each section before appending it.
foreach (Section section in pageSections)
    dstDoc.AppendChild(dstDoc.ImportNode(section, true));
dstDoc.Save(dataDir + "Document Out.docx");

If you have any issues, please attach your document here for testing.

Best regards,

KevinOldCastle · April 12, 2013, 10:49am

Thanks for the help. It looks like the PageSplitter class is needed in order to use the PageNumberFinder class that you provided.

awais.hafeez · April 15, 2013, 1:08am

Hi Kevin,

Thanks for your inquiry. Please find attached the missing PageSplitter class. I apologize for any inconvenience.

Best regards,

KevinOldCastle · April 16, 2013, 8:26am

Thanks for your quick responses to my question. I’m still having some difficulty using the PageNumberFinder class to extract each individual page from a document. I’m attaching a copy of a data-less report along with the code. Can you please help me?

Thank you

Document doc = new Document(@"c:\TestFiles\Prod3.docx");
// Set up the document which pages will be copied to. Remove the empty section. 
Document dstDoc = new Document();
dstDoc.RemoveAllChildren();
PageNumberFinder finder = new PageNumberFinder(doc);
// Split nodes which are found across pages.
finder.SplitNodesAcrossPages(true);
// Copy all content including headers and footers from the specified pages into the destination document.
ArrayList pageSections = finder.RetrieveAllNodesOnPages(1, 5, NodeType.Section);
foreach (Section section in pageSections)
    dstDoc.ImportNode(section, true);
//dstDoc.AppendChild(section);
dstDoc.Save(@"C:\TestFiles\Prod111.docx");


awais.hafeez · April 17, 2013, 12:50am

Hi Kevin,

Thanks for your inquiry. Please change your code as follows:

…
...
// Copy all content including headers and footers from the specified pages into the destination document.
ArrayList pageSections = finder.RetrieveAllNodesOnPages(1, 5, NodeType.Section);
foreach (Section section in pageSections)
    dstDoc.AppendChild(dstDoc.ImportNode(section, true));
...
...

I hope this helps.

Best regards,

KevinOldCastle · April 17, 2013, 10:13am

The code does copy the file into a new file, but it does not import only the specified pages. It imports the entire document, because the document has multiple pages but not multiple sections. The attached document is an example of a document that I’m trying to extract pages from. Please run the code on this document to see that it copies the entire document to another document. I’m looking to extract individual pages from it.

Thanks for your help.

awais.hafeez · April 18, 2013, 12:18am

Hi Kevin,

Thanks for your inquiry.

I tested your scenario with the latest version of Aspose.Words, i.e. 13.3.0, and was unable to reproduce this issue on my side. I have attached the DOCX file generated on my side here for your reference. You can download the latest version of Aspose.Words (13.3.0) from the following link:

https://downloads.aspose.com/words/net

I hope this helps.

Secondly, the code imports the content of each page in the source document into a separate Section in the destination document. You can verify this by comparing file sizes: NoData.docx is 155 KB, while out.docx is just 72 KB. Moreover, you can remove the empty Paragraph from the end of the last Section by using the following line of code:

dstDoc.LastSection.Body.LastParagraph.Remove();

Best regards,

KevinOldCastle · April 18, 2013, 1:29am

Should I use the new DLLs?

awais.hafeez · April 18, 2013, 3:44am

Hi Kevin,

Thanks for your inquiry. Yes, I suggest you upgrade to the latest version of Aspose.Words. Please let me know if I can be of any further assistance.

Best regards,

awais.hafeez · April 19, 2013, 2:53am

Hi Kevin,

Thanks for your inquiry. I have attached a sample Console Application here for your reference. Please try running it with Aspose.Words 13.3.0 on your side and let me know how it goes. I hope this helps.

Best regards,

KevinOldCastle · April 22, 2013, 10:08am

The code in this project seems to work just fine. Thank you for your help.

prathyusha.korrapati · May 18, 2020, 10:06pm

Hi…
Where are the PageNumberFinder and PageSplitter classes that you sent to KevinOldCastle? I can’t find them on GitHub. Can you please send them to me?

awais.hafeez · May 19, 2020, 6:58am

@prathyusha.korrapati,

The ‘PageSplitter’, ‘DocumentPageSplitter’, ‘PageNumberFinder’ and ‘SectionSplitter’ classes can be found at the following link:

Download PageSplitter.cs from GitHub
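
For readers on newer releases, the helper classes may not be needed at all. The following is a minimal sketch, not the approach used earlier in this thread: it assumes an Aspose.Words for .NET version that provides the built-in Document.ExtractPages method (it is not available in 13.3.0, so please check the API reference for your version). The file names are hypothetical placeholders.

```csharp
using Aspose.Words;

class SplitIntoPages
{
    static void Main()
    {
        // Hypothetical input path; substitute your own document.
        Document doc = new Document("Document.docx");

        // Pages are computed by the layout engine rather than stored in the file,
        // so reading PageCount makes Aspose.Words lay out the document.
        int pageCount = doc.PageCount;

        for (int page = 0; page < pageCount; page++)
        {
            // Assumption: ExtractPages(index, count) exists in your version.
            // The start index is zero-based; the second argument is the number of pages.
            Document onePage = doc.ExtractPages(page, 1);

            // Hypothetical output naming: one file per page, numbered from 1.
            onePage.Save($"Document_page{page + 1}.docx");
        }
    }
}
```

Because each page is returned as its own Document, there is no need to import nodes into a destination document or to remove a trailing empty paragraph afterwards.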