Extract Text from PDF document using Aspose.PDF for .NET - index out of range exception

I'm getting an index out of range exception when calling the ExtractText method on the following file.

http://www2.cybercom-intl.com/transfer/project_manual.pdf

Aspose.PDF.dll version 20.6.0.0 (6/1/2020)

@CoastalCIU

Could you please share the sample code snippet that you are using to extract text from this PDF? We will test the scenario in our environment and address it accordingly.

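Imports System.IO
Imports Aspose.Pdf.Facades

' Load the PDF into memory and bind it to the extractor
Dim byteDoc As Byte() = File.ReadAllBytes("c:\temp\project_manual.pdf")
Dim msPDF As New MemoryStream(byteDoc)
Dim extractor As New PdfExtractor()
extractor.BindPdf(msPDF)

' Stream that receives the extracted text
Dim msPDFText As New MemoryStream()

' The index out of range exception occurs during extraction
extractor.ExtractText()
extractor.GetText(msPDFText)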

@CoastalCIU

We were able to reproduce the issue in our environment with Aspose.PDF for .NET 20.7. We also tried the following code snippet and got a similar exception:

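using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir: path of the folder containing the PDF (defined elsewhere)
TextAbsorber ta = new TextAbsorber();
Document pdfDocument = new Document(dataDir + "project_manual.pdf");
pdfDocument.Pages.Accept(ta);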

Therefore, we have logged the issue as PDFNET-48571 in our issue tracking system. We will look into the details and keep you posted on the status of the fix. Please bear with us in the meantime.

We are sorry for the inconvenience.

Thank you very much for your help! This is a production issue and any assistance is appreciated.

@CoastalCIU

We will certainly resolve the issue that was recently logged. However, it will be investigated and resolved on a first-come, first-served basis. We will let you know as soon as we have further updates on the ticket.

The issues you have found earlier (filed as PDFNET-48571) have been fixed in Aspose.PDF for .NET 24.2.