Open PDF Document Programmatically using Aspose.PDF

Hi All,

I am using Aspose.PDF for .NET, version 17.7, in my C# application.
It opens all PDFs normally, except for one file, which takes a very long time to open while CPU usage goes high.

I tried upgrading to the latest Aspose.PDF for .NET version, 18.9.1, and the same problem persists.

The PDF is here:

Does anyone have the same problem with this file, or know a possible cause?

Thank you

@hanishawa

Thanks for contacting support.

We have tested the scenario in the following environment: Windows 10 EN x64 with 8 GB RAM, Aspose.PDF for .NET 18.9.1, Console Application, Framework 4.0, Debug mode, x64. We were unable to reproduce the issue you mentioned. Would you please share your environment details, and also try running your program in Debug mode for x64 so that it has access to the full memory installed in your system?
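
To confirm that the program really runs as a 64-bit process, a quick check like the one below can be added at startup. This is only a minimal sketch using standard .NET properties (available since Framework 4.0); the class name `BitnessCheck` and the method `Describe` are hypothetical and not part of Aspose.PDF.

```csharp
using System;

public static class BitnessCheck
{
    // Describes the bitness of the current process and of the operating system.
    public static string Describe()
    {
        return string.Format(
            "Process is {0}-bit, OS is {1}-bit, pointer size is {2} bytes",
            Environment.Is64BitProcess ? 64 : 32,
            Environment.Is64BitOperatingSystem ? 64 : 32,
            IntPtr.Size);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If the process is reported as 32-bit on a 64-bit OS, the project is probably built as x86 or as AnyCPU with "Prefer 32-bit" enabled, and it will not be able to use all of the installed memory.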

Hi,

I am using the same setup as you, except with Visual Studio 2015, .NET Framework 4.6 and C# 6.0.

@hanishawa

Thanks for your response.

Would you please share a sample console application that reproduces the issue in any environment? We will test the scenario again in our environment and address it accordingly.

Hi, here is a sample project:

  1. The freeze happens at PdfView_test.cs:119.
  2. At PdfView_test.cs:112 you need to supply a license file in the current directory.
  3. I debug the program and pass the problem PDF from my first post as the argument (under Project Properties -> Debug -> Command line arguments).

@hanishawa

We were unable to find any attachment with your post. Please make sure to upload a ZIP archive of your sample project when posting in this thread.

@hanishawa

Thanks for sharing the sample console application.

We tried to run the application you shared and encountered the exception shown in the image below:

Ecofont.png (4.3 KB)

It seems that the project contains some unnecessary references that cannot be resolved in a different environment. Would you please share a narrowed-down, simple console application that runs in any environment and reproduces the issue you are facing? We will test the scenario again in our environment and address it accordingly.

Hi Asad,
I cannot see your file. I get the message:
"Sorry, this file is private. Only visible to topic owner and staff members."

@hanishawa

Please download the screenshot from this link, and please share a simple console application that runs without any dependencies and reproduces the issue.

@hanishawa

We are testing the scenario and will update you with our feedback shortly.

@hanishawa

Thanks for being patient.

We were able to observe the issue in our environment and have logged it as PDFNET-45452 in our issue tracking system. We will investigate it further and keep you posted on the status of its resolution. Please be patient and allow us some time.

We are sorry for the inconvenience.

Hi Asad,
Thanks for your effort. Please let us know as soon as the cause is found.

@hanishawa

We will surely let you know of any further updates regarding the resolution of the issue. However, please note that the issue is logged under the free support model, where issues are resolved on a first come, first served basis, and there is already a large number of pending issues in the queue. If this issue is urgent for you, please consider escalating its priority through the paid support model, where issues receive higher priority.

Hi Asad,

Thanks for the update. I know we are using the free support model, and we will wait for the resolution. But we would like an estimate of when this issue will be resolved, because we are pressed for time with our clients.

Thank you

@knightrider

Thanks for contacting support.

I am afraid we cannot share a reliable ETA at the moment, as there is a long queue of previously logged issues still pending investigation. Your issue will definitely be investigated and resolved; however, we can only share definite updates about the resolution ETA once it has been investigated in its turn. We greatly appreciate your patience and understanding in this regard. Please allow us some time.

We are sorry for the inconvenience.

@knightrider @hanishawa

Thank you for being patient.

We have investigated the issue and found no actual bugs in the performance or memory consumption of Aspose.PDF. Two main factors were causing the low performance and high memory consumption for this document.

  • First, ‘ITF-TTF Manual.pdf’ is a very large document. It has about 82 MB of compressed data, comprising 450 pages, about 24500 text fragments and a lot of raster and vector images.
  • Second, the code is not optimized. Simple code samples work well for small documents, but they are less effective for large ones.

By default, TextFragmentAbsorber stores all found text fragments in memory. TextFragment is not a lightweight object: it holds a reference to its page along with the page's resources. In most document processing scenarios, however, there is no need to store all text fragments at once.
TextFragmentAbsorber processes pages sequentially, and changes to a TextFragment object on one page have no influence on other pages. So we recommend calling absorber.Reset(); after processing each page.

Moreover, your code performs only read operations. The memory held by each Page object can therefore be freed once the necessary text information has been read. You can use page.FreeMemory(); for that.

Thus the recommended code is as follows:

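// Requires: using Aspose.Pdf; using Aspose.Pdf.Text;
// myDir is the folder that contains the PDF.
Document pdfDocument = new Document(myDir + @"ITF-TTF Manual.pdf");
TextFragmentAbsorber absorber = new TextFragmentAbsorber(new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
int count = 0; // running total of text fragments found
foreach (Page page in pdfDocument.Pages)
{
    page.Accept(absorber);

    // Read something from the fragments of this page
    count += absorber.TextFragments.Count;

    absorber.Reset();   // drop the fragments collected for this page
    page.FreeMemory();  // release the memory held by the page object
    //GC.Collect();     // optional: lowers peak memory, adds processing time
}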

On our test system, it consumes less than 500 MB of memory and takes about 55 seconds with Aspose.PDF for .NET 19.5. In our opinion, these values are adequate for this document.

Additionally, you can force the .NET garbage collector to collect freed objects after processing each page by adding GC.Collect(); to the loop. This decreases the maximum memory consumption to about 250 MB at the cost of about 10 additional seconds of processing time.
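
To compare the variants with and without GC.Collect(); on your own machine, a small timing harness can help. The following is only a minimal sketch using standard .NET APIs (Stopwatch and Process.PeakWorkingSet64); the class `Measurement`, the method `Measure` and the placeholder workload are hypothetical, and the lambda should be replaced with the page-processing loop.

```csharp
using System;
using System.Diagnostics;

public static class Measurement
{
    // Runs the given work, prints wall-clock time and the peak working set
    // of the current process, and returns the elapsed time.
    public static TimeSpan Measure(string label, Action work)
    {
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();

        using (Process current = Process.GetCurrentProcess())
        {
            long peakMb = current.PeakWorkingSet64 / (1024 * 1024);
            Console.WriteLine("{0}: {1:F1} s, peak working set {2} MB",
                label, watch.Elapsed.TotalSeconds, peakMb);
        }
        return watch.Elapsed;
    }

    public static void Main()
    {
        // Placeholder workload; substitute the page-processing loop here.
        Measure("placeholder", () =>
        {
            byte[][] blocks = new byte[100][];
            for (int i = 0; i < blocks.Length; i++)
            {
                blocks[i] = new byte[1024 * 1024];
            }
        });
    }
}
```

Note that the peak working set covers the whole lifetime of the process, so each variant should be measured in a separate run of the program rather than one after the other in the same process.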