Extract text for a specific page in a Word file

Greetings,

I need to generate an individual text file for each page of a Word file. I know that we can generate an individual TIFF file for each page by using the ImageSaveOptions object and specifying the page number, but I don’t think such an option is present in the TxtSaveOptions class.

Is there any efficient workaround available for this?

Looking for an urgent reply.

Regards,
Syed.

Hi Syed,

Thanks for your inquiry.

I’m afraid there are no options to specify a page range when saving to flow formats. Your request has been linked to the appropriate issue. We will inform you as soon as there are any developments.

In the meantime, you can try using the PageFinder code from this page: How to findout Pageno of "Run" Node. With this code you can extract only the nodes that fall within the page range you want. Please see the code below.

using System.Collections;
using Aspose.Words;

// Note: PageNumberFinder comes from the PageFinder sample code linked above.
public static void SaveDocumentWithPageRange(Document doc, string outputPath, int fromPage, int toPage)
{
    // Work on a copy so the original document is left untouched.
    Document docCopy = doc.Clone();
    PageNumberFinder finder = new PageNumberFinder(docCopy);

    // Split all nodes in the document, including sections, so that each one appears on a single page only.
    finder.SplitNodesAcrossPages(true);

    // Collect the sections on pages outside the desired page range (before fromPage and after toPage).
    ArrayList sectionsToRemove = finder.RetrieveAllNodesOnPages(0, fromPage - 1, NodeType.Section);
    sectionsToRemove.AddRange(finder.RetrieveAllNodesOnPages(toPage + 1, doc.PageCount + 1, NodeType.Section));

    foreach (Section section in sectionsToRemove)
        section.Remove();

    // Only the content from the desired page range should remain.
    // The save format is determined from the extension of outputPath.
    docCopy.Save(outputPath);
}
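
To get one text file per page, the method above can be called in a loop with fromPage and toPage set to the same page. The following is only a rough sketch: the class name PageTextExporter, the method SaveEachPageAsText and the page_N.txt naming scheme are made up for illustration, and it assumes that SaveDocumentWithPageRange (together with the PageNumberFinder sample code) is defined in the same class.

```csharp
using System.IO;
using Aspose.Words;

public static class PageTextExporter
{
    // Hypothetical helper: writes each page of the document to "<outputDir>/page_N.txt".
    // Assumes SaveDocumentWithPageRange from the snippet above is a member of this class.
    public static void SaveEachPageAsText(string inputPath, string outputDir)
    {
        Document doc = new Document(inputPath);
        Directory.CreateDirectory(outputDir);

        // Pages are treated as 1-based, matching how SaveDocumentWithPageRange uses fromPage and toPage.
        int pageCount = doc.PageCount;
        for (int page = 1; page <= pageCount; page++)
        {
            // The .txt extension lets Document.Save pick the plain text format.
            string outputPath = Path.Combine(outputDir, $"page_{page}.txt");
            SaveDocumentWithPageRange(doc, outputPath, page, page);
        }
    }
}
```

Please note that every call clones the document and splits its nodes again, so this may be slow for documents with many pages.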

Hopefully this helps.

Thanks,

The issues you have found earlier (filed as WORDSNET-5643) have been fixed in this Aspose.Words for .NET 21.4 update and this Aspose.Words for Java 21.4 update.