Free Support Forum - aspose.com

Download DocumentPageSplitter Class

Hi Team,


I need to extract the pages one by one using Aspose.Words for .NET.

Previously I was using the PageNumberFinder class and the code below, which the Aspose team provided. Now I have heard there is a DocumentPageSplitter class for extracting specific pages.

Please help me download that class and its related components.

Current code used:
// Set up the document to which the pages will be copied. Remove the empty section.
Aspose.Words.Document pageDocument = new Aspose.Words.Document();
pageDocument.RemoveAllChildren();

PageNumberFinder finder = new PageNumberFinder(pdfWholeDocument);
// Split nodes which are found across pages.
finder.SplitNodesAcrossPages(true);

// Page number to extract (docPageCount and i are defined elsewhere in the calling code).
int pageNumber = Convert.ToInt32(docPageCount.GetValue(i).ToString());

// Copy all content, including headers and footers, from the specified page into the destination document.
System.Collections.ArrayList pageSections = finder.RetrieveAllNodesOnPages(pageNumber, pageNumber, Aspose.Words.NodeType.Section);

foreach (Aspose.Words.Section section in pageSections)
{
    pageDocument.AppendChild(pageDocument.ImportNode(section, true));
}


Hi Senthil,


Thanks for your inquiry. Sure, you can achieve this using the “PageSplitter” example project. You can find the PageSplitter project in the Aspose.Words for .NET examples repository on GitHub.

Please let us know if we can be of any further assistance.


Document doc = new Document(docName);

// Create and attach the collector to the document before the page layout is built.
LayoutCollector layoutCollector = new LayoutCollector(doc);

// Build the layout model and collect the necessary information.
doc.UpdatePageLayout();

// Split nodes in the document into separate pages.
DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);

// Extract pages 3 to 5 into a new document.
Document newDoc = splitter.GetDocumentOfPageRange(3, 5);

newDoc.Save(MyDir + "Out.docx");
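Since the original question is about extracting pages one by one, the same splitter can be called once per page with a one-page range. This is a sketch under assumptions: it assumes the DocumentPageSplitter class from the PageSplitter example project is compiled into your project (add a using directive for whatever namespace it lives in), and that its page indices are 1-based, as the GetDocumentOfPageRange(3, 5) call above suggests. The helper name SplitIntoPages and the Page_N.docx naming are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Aspose.Words;
using Aspose.Words.Layout;
// Assumption: also add the using directive for the namespace of the sample DocumentPageSplitter class.

public static class PageExport
{
    // Hypothetical helper: saves every page of the input document as its own .docx file
    // and returns the paths that were written.
    public static List<string> SplitIntoPages(string docName, string outDir)
    {
        Document doc = new Document(docName);

        // Attach the collector before the layout is built, then build the layout once.
        LayoutCollector layoutCollector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
        List<string> written = new List<string>();

        for (int page = 1; page <= doc.PageCount; page++)
        {
            // A one-page range (page..page) yields a document containing just that page.
            Document pageDoc = splitter.GetDocumentOfPageRange(page, page);
            string path = Path.Combine(outDir, $"Page_{page}.docx");
            pageDoc.Save(path);
            written.Add(path);
        }
        return written;
    }
}
```

Building the layout once and reusing the same splitter for every page avoids recomputing the page layout for each extraction.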

Thanks, Tahir, for the examples.

I was able to find a few examples for retrieving the document contents:
1. Using the PageNumberFinder class
2. Using the DocumentPageSplitter class
3. Using the DocumentVisitor class

Can you please advise which of the above methods is the most efficient, and what their disadvantages are, if any?

Hi Senthil,


Thanks for your inquiry. The DocumentPageSplitter class uses the PageNumberFinder class. If you want to extract the contents of a document page by page, you can use PageSplitter. There are no disadvantages to this code example.

Could you please share the scenario in which you are using DocumentVisitor, along with your code? We will then provide more information about your query.

We have an MS Word document of 50 pages, and we need to extract its entire text. We keep this extracted string as the source and run an IndexOf search for 140,000 records using a Parallel.ForEach loop. We tried the following approaches to achieve this:

1. Extract the entire document text using the Range.Text property.
2. Extract the text of all pages using a DocumentVisitor, then do the string search.

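The search described above (one large source string, 140,000 lookups run in parallel) can be sketched in plain .NET, independently of how the text was extracted. This is an illustration, not the poster's code: it assumes the records are plain strings, that each record's result is the position of its first occurrence (or -1), and that the text has already been extracted, for example with doc.Range.Text. The class and method names are hypothetical. It uses Parallel.For; Parallel.ForEach works similarly. Passing StringComparison.Ordinal matters for speed: the IndexOf(string) overload without a comparison argument performs a culture-sensitive search.

```csharp
using System;
using System.Threading.Tasks;

public static class RecordSearch
{
    // Returns, for each record, the index of its first occurrence in sourceText, or -1.
    // Each iteration writes only to its own slot of the results array, so no locking is needed.
    public static int[] FindAll(string sourceText, string[] records)
    {
        int[] results = new int[records.Length];

        Parallel.For(0, records.Length, i =>
        {
            // Ordinal comparison avoids the slower culture-sensitive default of IndexOf(string).
            results[i] = sourceText.IndexOf(records[i], StringComparison.Ordinal);
        });

        return results;
    }
}
```

Sharing the source string across threads is safe because .NET strings are immutable and the loop only reads it.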

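DocumentVisitor code:

// Note: finder is not used by the lines below.
PageNumberFinder finder = new PageNumberFinder(pdfWholeDocument);

// Walk the whole document with the custom visitor, which accumulates the text.
MyDocToTxtWriter myConverter = new MyDocToTxtWriter();
pdfWholeDocument.Accept(myConverter);

// Retrieve the accumulated text.
string docContent = myConverter.GetText();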

I have attached the DocumentVisitor class for your reference.
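The attached MyDocToTxtWriter class is not reproduced in this thread. For a rough idea of what such a visitor looks like, here is a minimal sketch; the class name TextCollector is hypothetical. It overrides VisitRun to append the text of each run and VisitParagraphEnd to add a line break after each paragraph, and ignores every other node type.

```csharp
using System.Text;
using Aspose.Words;

// Hypothetical minimal visitor: gathers the text of every Run and ends each paragraph with a newline.
public class TextCollector : DocumentVisitor
{
    private readonly StringBuilder builder = new StringBuilder();

    public string GetText() => builder.ToString();

    // Called for every text run in the document; append its text and keep traversing.
    public override VisitorAction VisitRun(Run run)
    {
        builder.Append(run.Text);
        return VisitorAction.Continue;
    }

    // Called after all children of a paragraph have been visited.
    public override VisitorAction VisitParagraphEnd(Paragraph paragraph)
    {
        builder.AppendLine();
        return VisitorAction.Continue;
    }
}
```

Usage follows the same pattern as the code above: create the visitor, call doc.Accept(collector), then read collector.GetText(). Because the visitor only walks the node tree, it does not need the page layout to be built.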


Of the above methods, performance is much better with the DocumentVisitor, so we would like to know whether the DocumentVisitor approach has any disadvantages.

Hi Senthil,


Thanks for your inquiry.

Yes, in your case, DocumentVisitor is the faster approach. From the shared code, it seems that you are using an older version of the PageNumberFinder class. I suggest you use the latest code of the PageNumberFinder class, which you can find in the “PageSplitter” example project.

Moreover, Aspose.Words uses its own rendering engine to lay out documents into pages. When a document is formatted into pages, the classes in the Aspose.Words.Layout namespace give access to information such as the page on which a particular document element appears and where on that page it is positioned. Please read about LayoutCollector and LayoutEnumerator here:
http://www.aspose.com/docs/display/wordsnet/LayoutCollector+class
http://www.aspose.com/docs/display/wordsnet/LayoutEnumerator+class
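To make the LayoutCollector idea concrete for the text-search scenario above, the sketch below groups paragraph text by the page on which each paragraph starts, using LayoutCollector.GetStartPageIndex (which returns a 1-based page index). The helper name TextByPage, the simplification that a paragraph spanning two pages is attributed entirely to its first page, and the treatment of a returned 0 as "not mapped to a page" are assumptions of this sketch.

```csharp
using System.Collections.Generic;
using System.Text;
using Aspose.Words;
using Aspose.Words.Layout;

public static class PageText
{
    // Hypothetical helper: maps each 1-based page number to the text of the paragraphs that start on it.
    public static SortedDictionary<int, string> TextByPage(Document doc)
    {
        // Attach the collector first, then build the layout so it can record page positions.
        LayoutCollector collector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        var pages = new SortedDictionary<int, StringBuilder>();
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // Simplification: a paragraph that spans pages is attributed to its start page.
            int page = collector.GetStartPageIndex(para);
            if (page == 0)
                continue; // Assumed: 0 means the node could not be mapped to a page.

            if (!pages.TryGetValue(page, out StringBuilder sb))
            {
                sb = new StringBuilder();
                pages[page] = sb;
            }
            sb.Append(para.GetText());
        }

        var result = new SortedDictionary<int, string>();
        foreach (var entry in pages)
            result[entry.Key] = entry.Value.ToString();
        return result;
    }
}
```

Unlike the DocumentVisitor approach, this requires building the page layout, which is the expensive step, so it is only worth it when per-page text is actually needed.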

Please let us know if you have any more queries.