Unable to read text from PDF document

I am trying to read the contents of one of my PDFs, which is a multi-language document. I tried both PdfExtractor and TextAbsorber, but each gives the error “Index was outside the bounds of the array” at the line “extractor.ExtractText(Encoding.ASCII)”. Please find the attached document; below is the code I used to extract the PDF content:


using System.IO;
using System.Reflection;
using System.Text;
using Aspose.Pdf.Facades;

private static string GetPdfFileContents(string fileName)
{
    PdfExtractor extractor = new PdfExtractor();
    // Bind the PDF file to the extractor object
    extractor.BindPdf(fileName);

    return GetPdfFileContents(extractor);
}

private static string GetPdfFileContents(PdfExtractor extractor)
{
    string contents = string.Empty;
    string tempPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) + @"/Temp";
    string outFilePath = tempPath + @"/temptOut.txt";
    if (!Directory.Exists(tempPath))
    {
        Directory.CreateDirectory(tempPath);
    }

    // Extract all text from the PDF (the exception is thrown here)
    extractor.ExtractText(Encoding.ASCII);
    // Save the extracted text to a temporary text file
    extractor.GetText(outFilePath);

    // Read the text back, then remove the temporary file
    contents = System.IO.File.ReadAllText(outFilePath);

    System.IO.File.Delete(outFilePath);
    return contents;
}

Hi Kamal,

Thanks for your inquiry. I tested your scenario with the shared document using Aspose.Pdf for .NET 11.7.0 and was able to reproduce the reported exception. For further investigation, I have logged the issue in our issue tracking system as PDFNEWNET-40903 and linked your request to it. We will keep you updated on the issue status via this thread.

We are sorry for the inconvenience caused.

The issues you have found earlier (filed as PDFNET-40903) have been fixed in Aspose.PDF for .NET 19.4.
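
A side note for anyone extracting text from a multi-language document: separately from the exception discussed above, Encoding.ASCII covers only 7-bit characters, so any text written through it loses everything outside that range. The sketch below uses only the .NET base class library to show the effect; the EncodingDemo class, the RoundTrip helper, and the sample string are made up for illustration and are not part of Aspose.PDF. Whether the encoding passed to ExtractText governs the output in exactly this way is an assumption here, so treat this as a hint for choosing the encoding rather than a statement about the library's internals.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingDemo
{
    // Encodes the text with the given encoding and decodes it back,
    // so the result shows which characters survive the round trip.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Made-up sample mixing Latin and Cyrillic characters.
        string sample = "café Привет";

        // ASCII cannot represent the non-ASCII characters,
        // so each of them comes back as '?'.
        Console.WriteLine(RoundTrip(sample, Encoding.ASCII));

        // UTF-16 (Encoding.Unicode) and UTF-8 preserve the text unchanged.
        Console.WriteLine(RoundTrip(sample, Encoding.Unicode) == sample);
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == sample);
    }
}
```

If the extracted text needs to keep non-Latin characters, a Unicode encoding such as Encoding.Unicode or Encoding.UTF8 is the safer choice wherever an encoding is requested.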