I have a large PDF file of around 92k pages and am not able to extract the text from it: I get an out-of-memory exception. Please look into this. I tried the following code.
code 1:
`foreach (Page pdfPage in pdfDocument.Pages)
{
    using (MemoryStream textStream = new MemoryStream())
    {
        // Create text device
        TextDevice textDevice = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
        // Convert a particular page and save text to the stream
        textDevice.Process(pdfPage, textStream);
        textStream.Close();
        pdfText += $"{Encoding.Unicode.GetString(textStream.ToArray())} ";
    }
    pdfPage.Dispose();
}
`
code 2:
`// Open document
using (Document pdfDocument = new Document(pdfPath))
{
    // Create TextAbsorber object to extract text
    TextAbsorber textAbsorber = new TextAbsorber();
    // Accept the absorber for all the pages
    pdfDocument.Pages.Accept(textAbsorber);
    // Get the extracted text
    return textAbsorber.Text;
}
`
I am not able to attach the PDF here.
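For reference, here is a variation on code 1 that I could try next. It is only a sketch, not a confirmed fix. Both snippets above build the whole text of the document as one string in memory, so this version writes each page's text to a file as soon as it is extracted. It uses the same `TextDevice` calls as code 1. The class name `PdfTextDump`, the method `ExtractToFile`, and the `outputPath` parameter are hypothetical names made up for illustration, and I am assuming that the memory problem comes at least partly from accumulating the text rather than from loading the document itself.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Devices;
using Aspose.Pdf.Text;

public static class PdfTextDump
{
    // Hypothetical helper: writes the text of every page to outputPath
    // instead of returning one large string.
    public static void ExtractToFile(string pdfPath, string outputPath)
    {
        using (Document pdfDocument = new Document(pdfPath))
        using (StreamWriter writer = new StreamWriter(outputPath, false, Encoding.UTF8))
        {
            TextExtractionOptions options =
                new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);

            foreach (Page pdfPage in pdfDocument.Pages)
            {
                using (MemoryStream textStream = new MemoryStream())
                {
                    // Extract the text of this page only, as in code 1
                    TextDevice textDevice = new TextDevice(options);
                    textDevice.Process(pdfPage, textStream);

                    // Write the page text out immediately so it is not kept in memory
                    writer.Write(Encoding.Unicode.GetString(textStream.ToArray()));
                    writer.Write(' ');
                }
                // Release the page, as in code 1
                pdfPage.Dispose();
            }
        }
    }
}
```

It would be called as `PdfTextDump.ExtractToFile(pdfPath, "output.txt");`. If the exception is thrown while opening the document or while processing a single page, this change alone would not help.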