Hang in PdfExtractor.ExtractText and TextDevice.Process

Hi,

We recently moved from Aspose.Pdf.Kit to Aspose.Pdf. However, some files that worked with Aspose.Pdf.Kit cause hangs in Aspose.Pdf. I have attached a sample PDF file. There seem to be two different functions that hang, both used for text extraction. Only #2 was available in Aspose.Pdf.Kit, and there it worked and extracted the text. When the file is opened in Adobe Reader, an alert is shown, but the file otherwise loads. Regardless, I would not expect a hang! We have multiple files that cause this problem.

1)
// file = path to the PDF; "" = empty password
Aspose.Pdf.Document pdfDoc = new Aspose.Pdf.Document(file, "");
foreach (Aspose.Pdf.Page pdfPage in pdfDoc.Pages)
{
    using (System.IO.MemoryStream textStream = new System.IO.MemoryStream())
    {
        // Extract the page text in Raw formatting mode into textStream
        Aspose.Pdf.Devices.TextDevice textDevice = new Aspose.Pdf.Devices.TextDevice();
        Aspose.Pdf.Text.TextOptions.TextExtractionOptions textExtOptions =
            new Aspose.Pdf.Text.TextOptions.TextExtractionOptions(
                Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Raw);
        textDevice.ExtractionOptions = textExtOptions;
        textDevice.Process(pdfPage, textStream); // hangs
    }
}

2)
Aspose.Pdf.Facades.PdfExtractor pdfEx = new Aspose.Pdf.Facades.PdfExtractor();
pdfEx.BindPdf(file);
pdfEx.ExtractText(); // hangs

Thanks,

Daniel

Hi Daniel,

Thank you for sharing the template file and sample code.

After an initial test, I am able to reproduce the issue you mentioned. It has been registered in our issue tracking system with ID PDFNEWNET-33150. We will notify you via this forum thread of any update on the reported issue.

Sorry for the inconvenience,

Hi,

I wanted to let you know that, with at least one other file, a hang also occurs in the various "open" functions (the entry points into the file), so the problem is not limited to ExtractText and TextDevice.Process as initially thought. I am not able to share that PDF with you, but once the reported bug is fixed, I hope your team can check whether the "open" functions could hang for the same reason (e.g. a shared function that was the cause). We are trying to find a file that exhibits this problem and can be shared. As in the initial report, the hang does not occur with Aspose.Pdf.Kit, which processes the file without any problems. Examples that hang (for the PDF I cannot share):

Aspose.Pdf.Document pdfDoc = new Aspose.Pdf.Document(file, ""); // hangs

Aspose.Pdf.Facades.PdfExtractor pdfEx = new Aspose.Pdf.Facades.PdfExtractor();
pdfEx.BindPdf(file); // hangs

Thanks,

Daniel

Hi Daniel,

Thank you for the details.

However, it would help if you could share the PDF file that is causing the problem at your end. This will help us identify the issue sooner, as it may have a different cause than your previously reported issue. If your file is confidential, please make this post private (by checking the ‘Keep this post Private’ checkbox while replying to this post). This way only you and Aspose staff members will be able to see your shared file.

Thank You & Best Regards,

Thanks. Unfortunately, the file belongs to a client, so we are not able to share it at all. Once the original issue is fixed, I'll test the file with the new release and hopefully it won't hang.

Thanks,
Daniel

Hi,

I have the same problem and would like to know when the bug will be fixed.
It's unfortunate that the function hangs, because the caller has no easy way to react. Perhaps the function should throw an exception instead.

Do you need the PDF file?

Christian
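Until a fixed build is available, the "throw an exception instead of hanging" behaviour suggested above can be approximated on the caller's side. The following is a minimal sketch, not part of Aspose.Pdf: `HangGuard` and `RunWithTimeout` are hypothetical names, and it assumes .NET 4.5 or later for `Task.Run`. It runs a delegate on a thread-pool thread and throws `TimeoutException` if the delegate does not finish within the given time. It cannot abort a call that is genuinely stuck; the worker keeps running in the background, so it is mainly useful for logging and skipping a problem file in a batch job.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of Aspose.Pdf: turns an indefinite wait
// into a TimeoutException on the calling thread.
public static class HangGuard
{
    public static T RunWithTimeout<T>(Func<T> work, TimeSpan timeout)
    {
        if (work == null) throw new ArgumentNullException("work");

        // Run the work on a thread-pool thread so the caller can stop waiting.
        Task<T> task = Task.Run(work);

        // Wait returns false if the task has not completed within the timeout.
        // If the work threw, Wait rethrows it wrapped in an AggregateException.
        if (!task.Wait(timeout))
        {
            // The task is abandoned, not aborted: it keeps running in the background.
            throw new TimeoutException("Operation did not complete within " + timeout + ".");
        }

        return task.Result;
    }
}
```

For example, the `pdfEx.BindPdf(file);` and `pdfEx.ExtractText();` calls from the first post could be placed inside the delegate (returning any value, such as `true`), so that a file that hangs surfaces as a `TimeoutException` rather than blocking the caller indefinitely.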

Hi Christian,

Please share your template file with us, as your issue may have a different cause. It would be best if we could test your issue and identify the cause using your template PDF file. Also, we have released a new version, Aspose.Pdf for .NET v6.8; you can download it and check whether you still face the problem.

Sorry for the inconvenience,

See the attached file.

Greetings
Christian

Hi Christian,

I checked your template file with the latest version, Aspose.Pdf for .NET v6.8, and it works fine. Please download it and try it at your end; if you still face the problem, please share your sample code with us so we can identify the issue.

Sorry for the inconvenience,

Thanks, that fixed the hang problem.

The issues you have found earlier (filed as PDFNEWNET-33150) have been fixed in Aspose.Pdf for .NET 7.7.0.

