Free Support Forum - aspose.com

Aspose.Pdf crashes when trying to get content from this pdf

Hello,

The Pdf module crashes with this error:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
at ..(„ , Int32 , , , Int32 )
at ..(Int32 )
at .Š.(Int32 )
at .Š.( )
at .‚.(š str, Int32 beginCharIndex, Int32 endCharIndex, Double fontSize, Double& width, Double& height, ˜[] charMetricsToFill, Boolean& isHorizontal)
at .‚.(š str, Int32 beginCharIndex, Int32 endCharIndex, Double fontSize, ˜[]& charMetrics)
at .ˆ.(š , Int32& , Int32 , Int32 , Boolean )
at .‰.(Int32 , Int32 , Boolean )
at .ÂŽ..ctor(ArrayList )
at ..(String , Boolean )
at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
at Aspose.Pdf.Facades.PdfExtractor.ExtractText(Encoding encoding)



Here is the code:

public void ExtractContent(string filename, ExtractedContentBuilder builder)
{
    var pdfReader = new PdfFileInfo(filename);
    builder.SetTitle(pdfReader.Title);

    var pdfExtractor = new PdfExtractor();
    pdfExtractor.BindPdf(filename);
    pdfExtractor.ExtractTextMode = 1;
    pdfExtractor.StartPage = 0;
    pdfExtractor.EndPage = pdfReader.NumberOfPages;
    pdfExtractor.ExtractText(Encoding.UTF8);

    byte[] contentBytes;
    using (var stream = new MemoryStream())
    {
        stream.SetLength(0);
        stream.Position = 0;
        pdfExtractor.GetText(stream);
        contentBytes = stream.ToArray();
    }

    string content = Encoding.UTF8.GetString(contentBytes);
    builder.AddContent(content, 0, 0);
}

Hi Yassine,


Thanks for contacting support.

I have tested the scenario using Aspose.Pdf for .NET 7.4.0 on Windows 7 x64 and I was unable to reproduce the issue. Please take a look at the attached document, which contains the contents extracted from the PDF file. Please try the latest release version, and if you still encounter the issue or have any further questions, please feel free to contact us.

We are sorry for this inconvenience.

Hi,

I am getting a similar exception:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
at ..(ž )
at ..()
at ..(a , œ )
at ..()
at ...ctor( )
at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
at Aspose.Pdf.PageCollection.Accept(TextAbsorber visitor)

Initially we got this error for 6 files, but after restarting the application 4 of them were processed successfully. We are still getting the exception with the other two files.

Thanks,

Ashvin

Hi Ashvin,

Thank you for using our product.

Please share your template PDF files and sample code with us to reproduce the issue. We will check them and get back to you soon.

Sorry for the inconvenience,