Extract text from PDF - Issues

QwerTech · July 1, 2014, 1:33am

Hi Team,

I am using Aspose.Pdf version 6.9.0.0. When I try to extract the text from a PDF document, I get an exception.

Here is the code snippet:

using (var ms = new MemoryStream(data))
{
    var pdfDocument = new Aspose.Pdf.Document(ms);

    // Create a TextAbsorber object to extract text
    TextAbsorber textAbsorber = new TextAbsorber();
    pdfDocument.Pages.Accept(textAbsorber);
}

Exception message:

Item has already been added. Key in dictionary: "61.68" Key being added: '61, 68 '

Stack trace:

at System.Collections.SortedList.Add(Object key, Object value)
at . . (ArrayList )
at . ..ctor(ArrayList )
at . . ()
at Aspose.Pdf.Text.TextAbsorber. ( , Boolean )
at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
at Aspose.Pdf.Page.Accept(TextAbsorber visitor)
at Star.Salut.DataStruct.ModelEntityManager.LargeObjectManager.UpdateFilesTextData() in

tilal.ahmad · July 1, 2014, 9:50pm

Hi Pavel,

Thanks for your inquiry. We have tested your sample code with a sample document and Aspose.Pdf for .NET 9.3.0, but I am afraid we were unable to replicate the issue. Please share your sample document here. We will test the scenario and provide you with more information accordingly.

We are sorry for the inconvenience caused.

Best Regards,