I have been having issues parsing large PDFs (200 to 400 MB). In this case, the files are a series of textbooks with many images embedded in them.
The issue is difficult to reproduce. According to the attached stack trace, the process is stuck in a FileStream call made by Aspose.
I open the file with the plain Aspose.Pdf.Document(pdfPath) constructor to create pdfDocument, and then extract text one page at a time:
TextAbsorber textAbsorber = new TextAbsorber();
textAbsorber.ExtractionOptions.FormattingMode = Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Raw;
this.pdfDocument.Pages[pageOffsetPlusOne].Accept(textAbsorber);
return textAbsorber.Text;
While this runs, memory usage sometimes climbs very high. See the attached memory_leak*.png screenshots.
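Since the problem is hard to reproduce, it might help to log memory around each page's extraction so that a spike can be tied to specific pages. The sketch below uses only standard .NET APIs; `MemoryProbe.Measure` is a hypothetical helper name, not part of Aspose.Pdf. `GC.GetTotalMemory` reports only the managed heap, while `Environment.WorkingSet` covers the whole process, so a large working-set jump with a flat managed heap would point at native or unmanaged allocations.

```csharp
using System;
using System.Globalization;

// Hypothetical diagnostic helper (not an Aspose API): runs one unit of work,
// such as extracting a single page, and logs memory before and after it.
public static class MemoryProbe
{
    public static T Measure<T>(string label, Func<T> work, Action<string> log)
    {
        // Managed heap only; passing false means no collection is forced.
        long managedBefore = GC.GetTotalMemory(false);
        // Physical memory mapped to the whole process, native allocations included.
        long workingSetBefore = Environment.WorkingSet;

        T result = work();

        long managedAfter = GC.GetTotalMemory(false);
        long workingSetAfter = Environment.WorkingSet;

        // Invariant culture keeps the log format stable across machines.
        log(string.Format(CultureInfo.InvariantCulture,
            "{0}: managed {1:F1} -> {2:F1} MB, working set {3:F1} -> {4:F1} MB",
            label, ToMb(managedBefore), ToMb(managedAfter),
            ToMb(workingSetBefore), ToMb(workingSetAfter)));

        return result;
    }

    private static double ToMb(long bytes) => bytes / (1024.0 * 1024.0);
}
```

In the page loop this would wrap the existing Accept/Text lines, e.g. `MemoryProbe.Measure($"page {pageOffsetPlusOne}", () => { ... return textAbsorber.Text; }, Console.WriteLine)`. A page that shows a large jump could then be extracted on its own, which may give a smaller file to share.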
I do not currently have permission to share this file. If that changes, I will see what I can do. I have also attached a transcript of a chat I had with Tilal Ahmad about this issue.
Has anyone else run into this? I first noticed the problem in 8.7.0. As a test I moved to 9.6.0 and have not hit it yet. However, testing is still at an early stage and, as I said above, the problem is difficult to reproduce. What I would really like is a way to set a timeout on Accept(textAbsorber), if that is possible, with an exception or some other indication that the timeout occurred.
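Without a built-in option, one possible workaround is to run the blocking call on a thread-pool task and stop waiting for it after a deadline. This is a minimal sketch using only `System.Threading.Tasks`; `TimeoutHelper.RunWithTimeout` is a hypothetical name, not an Aspose API. The important limitation is that it only stops the caller from *waiting*: the timed-out Accept keeps running in the background and keeps whatever memory it holds.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of Aspose.Pdf): runs a blocking call on a
// thread-pool thread and throws TimeoutException if it has not finished in time.
public static class TimeoutHelper
{
    public static T RunWithTimeout<T>(Func<T> work, TimeSpan timeout)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        // Start the work on the thread pool.
        Task<T> task = Task.Run(work);

        // Race the work against a delay; WhenAny completes with whichever finishes first.
        Task winner = Task.WhenAny(task, Task.Delay(timeout)).GetAwaiter().GetResult();

        if (winner != task)
        {
            // Nothing here aborts the work: it keeps running in the background.
            throw new TimeoutException($"Operation did not complete within {timeout}.");
        }

        // GetResult rethrows the original exception (not an AggregateException) if the work failed.
        return task.GetAwaiter().GetResult();
    }
}
```

Usage would be to wrap the existing lines, e.g. `TimeoutHelper.RunWithTimeout(() => { var textAbsorber = new TextAbsorber(); this.pdfDocument.Pages[pageOffsetPlusOne].Accept(textAbsorber); return textAbsorber.Text; }, TimeSpan.FromMinutes(2))`. After a timeout it is probably unsafe to keep using the same Document object, since the abandoned call may still be working on it. If a hard stop is required, running the extraction in a separate process that can be killed is likely the more reliable option.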
I also still have some older Ghostscript code reading from the same file. Could that cause any known issues?
Thanks