Extract text from a page range in a PDF document using C# with Aspose.PDF

tparassin · April 29, 2020, 9:06am

Hi, we use the following code to extract text from a PDF file.
How can we extract text from only a range of pages?
Do we have to loop over the PDF pages?

Is there a way to speed the process up?

Thanks in advance.

    // Create TextAbsorber object to extract text
    TextAbsorber textAbsorber = new TextAbsorber();
    textAbsorber.ExtractionOptions.FormattingMode = TextExtractionOptions.TextFormattingMode.MemorySaving;
    // Accept the absorber for all the pages
    pdfDocument.Pages.Accept(textAbsorber);
    // Get the extracted text
    string extractedText = textAbsorber.Text;

asad.ali · April 29, 2020, 6:26pm

@tparassin

Thanks for contacting support.

Text extraction time varies between PDF documents because it depends on the complexity and structure of the file being processed. You can certainly loop through the pages and extract text only from the ones you need. If you are experiencing a performance issue, please share a sample PDF document with us so that we can test the scenario in our environment and address it accordingly.
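
For illustration, a loop over a page range could look roughly like the sketch below. It is a minimal example and not tested against your file: the file name "input.pdf" and the range 2 to 5 are placeholders, and a fresh TextAbsorber is created per page so the result does not depend on how one absorber accumulates text across several Accept calls. Page indexes in Aspose.PDF start at 1.

```csharp
using System;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class PageRangeExtraction
{
    static void Main()
    {
        // Placeholder input; replace with your own document path.
        using (Document pdfDocument = new Document("input.pdf"))
        {
            // Placeholder range; Pages is 1-based, so clamp to the page count.
            int firstPage = 2;
            int lastPage = Math.Min(5, pdfDocument.Pages.Count);

            StringBuilder extracted = new StringBuilder();
            for (int pageNumber = firstPage; pageNumber <= lastPage; pageNumber++)
            {
                // One absorber per page, with the same formatting mode as in the question.
                TextAbsorber textAbsorber = new TextAbsorber();
                textAbsorber.ExtractionOptions.FormattingMode =
                    TextExtractionOptions.TextFormattingMode.MemorySaving;

                // Accept the absorber for this page only.
                pdfDocument.Pages[pageNumber].Accept(textAbsorber);
                extracted.Append(textAbsorber.Text);
            }

            // Text of the selected pages, in page order.
            string extractedText = extracted.ToString();
            Console.WriteLine(extractedText);
        }
    }
}
```

Processing only the pages you need should reduce the work compared with accepting the absorber on the whole Pages collection, although the actual gain depends on the document.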