We have purchased a developer OEM license for this product, but I am unable to use it in any production scenario due to performance and stability issues with text extraction.
For example, either of the following calls:
page.Accept(textFragmentAbsorber);
or
textFragmentAbsorber.Visit(page);
Files that both Adobe Reader and PDFTron/PDFNet process successfully cause out-of-memory exceptions with Aspose (see attachment).
Please clarify whether the issue occurs with every file you are working with or only with specific PDF files. Please share a narrowed-down code snippet along with the source PDF files so that we can investigate further.
Please also share a zipped sample project with all necessary resources, so that we can reproduce and compare the performance issues you are noticing with our API.
I have worked with the files you shared and was able to reproduce the issues below. The following tickets have been logged in our issue management system for further investigation and resolution.
PDFNET-44331: 9781139882019 - fails on page 3 with a Null Reference Exception
PDFNET-44332: 9781292249117 - crushingly slow and OOME
PDFNET-44333: Performance and memory consumption
However, you can avoid the problem with the 9781447969662.pdf file by using the code snippet below in your environment.
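// PdfFileInfo lives in the Aspose.Pdf.Facades namespace.
PdfFileInfo info = new PdfFileInfo(path + "9781447969662.pdf");
// Only run text extraction when the file is recognized as a valid PDF.
if (info.IsPdfFile)
{
    // Your code here
}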
Please keep the files in your Google Drive, with link sharing on, for our reference. The issue IDs have been linked with this thread so that you will receive notifications as soon as the issues are resolved.