Thanks for your inquiry. Sure, you can achieve this using the “PageSplitter” example project, which you can find in the Aspose.Words for .NET examples repository on GitHub.
Please let us know if we can be of any further assistance.
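using Aspose.Words;
using Aspose.Words.Layout;

// docName and MyDir are defined elsewhere; DocumentPageSplitter comes from the PageSplitter example project.
Document doc = new Document(docName);
// Create and attach collector to the document before page layout is built.
LayoutCollector layoutCollector = new LayoutCollector(doc);
// This will build layout model and collect necessary information.
doc.UpdatePageLayout();
// Split nodes in the document into separate pages.
DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
// Copy pages 3 to 5 into a new document and save it.
Document newDoc = splitter.GetDocumentOfPageRange(3, 5);
newDoc.Save(MyDir + "Out.docx");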
Thanks, Tahir, for the examples.
Thanks for your inquiry. The DocumentPageSplitter class uses the PageNumberFinder class. If you want to extract the contents of a document page by page, you can use PageSplitter. There are no disadvantages to using this code example.
Could you please share the scenario in which you are using DocumentVisitor, along with your code? We will then provide more information about your query.
We have an MS Word document of 50 pages, and we need to extract its entire text. We will keep this extracted string as the source and run an IndexOf search for 140,000 records using a Parallel.ForEach loop. We tried the following approaches to achieve this:
1. Extract the entire document text using the Range.Text property.
2. Extract the text of all pages using a DocumentVisitor and do the string search on it.
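A minimal sketch of the search step, assuming the document text has already been extracted into one string (for example with doc.Range.Text). ParallelSearch and FindAll are hypothetical names; the only library calls are Parallel.For and string.IndexOf.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelSearch
{
    // Returns, for each record, the index of its first occurrence in source, or -1 if absent.
    public static int[] FindAll(string source, IReadOnlyList<string> records)
    {
        var positions = new int[records.Count];

        // Each iteration writes only to its own slot, so no locking is required;
        // strings are immutable, so sharing source across threads is safe.
        Parallel.For(0, records.Count, i =>
        {
            // Ordinal comparison avoids the cost of culture-sensitive matching.
            positions[i] = source.IndexOf(records[i], StringComparison.Ordinal);
        });

        return positions;
    }
}
```

Usage might be `int[] hits = ParallelSearch.FindAll(doc.Range.Text, records);`. Note that the parameterless-comparison overload `string.IndexOf(string)` is culture-sensitive, so passing StringComparison.Ordinal explicitly is usually both faster and more predictable for this kind of bulk matching. The returned positions refer to the extracted string, not to pages.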
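PageNumberFinder finder = new PageNumberFinder(pdfWholeDocument);
MyDocToTxtWriter myConverter = new MyDocToTxtWriter();
// The visitor only collects text after the document has been traversed with Accept.
pdfWholeDocument.Accept(myConverter);
string docContent = myConverter.GetText();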
I have attached the DocumentVisitor class for your reference.
Of the two approaches above, performance is much better with DocumentVisitor, so we would like to know whether the DocumentVisitor approach has any disadvantages.
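The attached class is not reproduced in this thread, so as a rough illustration only, a minimal text-collecting visitor could look like the sketch below. The class name PlainTextCollector is hypothetical; DocumentVisitor, VisitRun, VisitParagraphEnd, VisitorAction and Document.Accept are Aspose.Words API members.

```csharp
using System.Text;
using Aspose.Words;

// Hypothetical minimal visitor, not the attached MyDocToTxtWriter class.
public class PlainTextCollector : DocumentVisitor
{
    private readonly StringBuilder _builder = new StringBuilder();

    // Returns everything collected so far.
    public string GetText() => _builder.ToString();

    // Called for every run of text that the traversal reaches.
    public override VisitorAction VisitRun(Run run)
    {
        _builder.Append(run.Text);
        return VisitorAction.Continue;
    }

    // Terminate each paragraph with a newline so words from adjacent paragraphs do not merge.
    public override VisitorAction VisitParagraphEnd(Paragraph paragraph)
    {
        _builder.AppendLine();
        return VisitorAction.Continue;
    }
}
```

Usage: `var collector = new PlainTextCollector(); doc.Accept(collector); string text = collector.GetText();`. VisitRun is also called for the runs that make up field codes, so code text such as PAGE can end up in the output; overriding VisitFieldStart and VisitFieldSeparator to skip those runs is one way to handle that if it matters for the search.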
Thanks for your inquiry.
Yes, in your case DocumentVisitor is the faster approach. From the shared code, it seems that you are using an older version of the PageNumberFinder class. Please use the latest PageNumberFinder code instead; you can find it in the “PageSplitter” example project.
Moreover, Aspose.Words uses our own rendering engine to lay out documents into pages. The Aspose.Words.Layout namespace provides classes that let you access information such as on which page, and where on that page, particular document elements are positioned once the document is formatted into pages. Please read about LayoutCollector and LayoutEnumerator here:
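As an illustration of LayoutCollector, the following sketch groups body text by the page on which each paragraph starts, which is close to the page-by-page extraction discussed earlier in this thread. PageTextExtractor and GetTextByPage are hypothetical names, and assigning a paragraph that crosses a page boundary entirely to its start page is a simplifying assumption; for exact page splitting, the PageSplitter example remains the more complete option.

```csharp
using System.Collections.Generic;
using System.Text;
using Aspose.Words;
using Aspose.Words.Layout;

public static class PageTextExtractor
{
    // Maps each 1-based page index to the text of the body paragraphs that start on that page.
    public static SortedDictionary<int, string> GetTextByPage(Document doc)
    {
        // Attach the collector before the layout is built, then build the layout.
        var collector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        var pages = new SortedDictionary<int, StringBuilder>();
        foreach (Section section in doc.Sections)
        {
            // Body only: header and footer paragraphs repeat on many pages.
            foreach (Paragraph para in section.Body.GetChildNodes(NodeType.Paragraph, true))
            {
                int page = collector.GetStartPageIndex(para);
                if (!pages.TryGetValue(page, out StringBuilder sb))
                {
                    sb = new StringBuilder();
                    pages[page] = sb;
                }
                sb.Append(para.GetText());
            }
        }

        var result = new SortedDictionary<int, string>();
        foreach (var entry in pages)
            result[entry.Key] = entry.Value.ToString();
        return result;
    }
}
```

Building the layout is the expensive step, so for the IndexOf scenario above it only pays off if you actually need page numbers; otherwise a single pass with a DocumentVisitor or Range.Text avoids the layout cost entirely.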
Please let us know if you have any more queries.