Document.Pages Odd Behavior

Hi,

We are facing an issue with the Document class in Aspose.PDF (v18.5, licensed). When running the code below, the Pages collection can return odd results. With a specific 18-page PDF file (which we can provide), the document reports 18 pages, but the collection shows one of the following behaviors depending on the surrounding code base (we have not been able to pinpoint what causes each result):

  1. Enumerates only 4 pages, with the internal collection showing 18 items, of which 4 are populated and the rest are null.
  2. Enumerates only 4 pages, with the internal collection showing 18 items, all of which are populated.
  3. Enumerates only 4 pages, with the internal collection showing 18 items, all of which are null.
  4. Enumerates only 4 pages, then locks up and never returns, with memory growing to 4-5 GB after a few minutes.

Scenario 4 is the one we hit in our main code base, and it is causing serious problems: we are unable to stop the runaway memory growth, which degrades the service.

Code Snippet:

using (var doc = new Document(fromFileName))
{
	//If this line is included, the internal doc.Pages results show 18 populated items.
	//With it commented out, the internal results show 18 items, of which only 4 are populated.
	doc.Save(toFileName, SaveFormat.Pdf);

	foreach (Page page in doc.Pages)
	{
		//Do some stuff
	}
}

When scenario 4 occurs, the Visual Studio diagnostic tools show that the largest memory usage is attributed to Aspose.Slides (which the project also references, along with other Aspose products), even though Aspose.Slides is not directly used by the code that is running at that time.

Thanks

@pjb_maintology

Thanks for contacting support.

Please make sure that you set the license for each Aspose API separately before using it. If you already do, please share your sample PDF document with us so that we can test the scenario in our environment and address it accordingly. Also, please try the latest version of the API before sharing your feedback; we always recommend using the latest version because of its performance improvements and enhancements.
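A minimal sketch of what setting the license separately for each API could look like. It assumes that only Aspose.PDF and Aspose.Slides are referenced and that a single Aspose.Total license file (the name "Aspose.Total.lic" is taken from the console app later in this thread) covers both products; adjust it to the products your project actually loads.

```csharp
using System;

namespace LicenseSetup
{
	public class Program
	{
		public static void Main(string[] args)
		{
			// Assumption: this Aspose.Total license file covers every Aspose product used below.
			const string licenseFile = "Aspose.Total.lic";

			// Each Aspose product has its own License class, so each one is set separately
			// before that product's API is used.
			var pdfLicense = new Aspose.Pdf.License();
			pdfLicense.SetLicense(licenseFile);

			var slidesLicense = new Aspose.Slides.License();
			slidesLicense.SetLicense(licenseFile);

			Console.WriteLine("Licenses set for Aspose.PDF and Aspose.Slides.");
		}
	}
}
```

Setting the licenses once at application startup, before any document is opened, keeps every product in licensed mode for the lifetime of the process.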

Hi,
I have tried the latest version and get the same result. Our main product sets the licence for all Aspose components. I have done some further testing in a standalone console app (see the code below) and can reproduce the issue reliably whenever the doc.Save() call is included.

The PDF file can be found here: https://1drv.ms/b/s!ApigC0Pns5l66ljj4SbLBKxvY6jJ


using System;

namespace PDFTest
{
	class Program
	{
		static void Main(string[] args)
		{
			var pdfLicense = new Aspose.Pdf.License();
			pdfLicense.SetLicense("Aspose.Total.lic");

			var fileName = "Test.pdf";
			var tempFileName = "temp.pdf";
			if (System.IO.File.Exists(tempFileName)) System.IO.File.Delete(tempFileName);


			using (var doc = new Aspose.Pdf.Document(fileName))
			{
				//Removing this line causes all pages to be processed as expected.
				//Including it causes the pages to be null and locks the thread at page 6 or 7.
				doc.Save(tempFileName, Aspose.Pdf.SaveFormat.Pdf);

				Console.WriteLine($"{fileName} - Document has {doc.Pages.Count} pages.");
				int pageNumber = 0;
				foreach (Aspose.Pdf.Page page in doc.Pages)
				{
					pageNumber++;
					if (page == null)
					{
						Console.WriteLine($"{fileName} - Page {pageNumber} is null");
						continue;
					}

					Console.WriteLine($"{fileName} - Page {pageNumber} is not null");
				}
			}


			Console.WriteLine("Press any key to exit");
			Console.ReadKey();
		}
	}
}

@pjb_maintology

Thanks for sharing sample PDF document.

We have tested the scenario and were able to observe the behavior you described.

Please note that the Document.Save() method behaves much like disposing of the Document object: once it is called, the resources allocated in memory for the loaded document are released. Because you were enumerating pages from the same Document after saving it, the program kept trying to access resources that had already been released (in your case, the Pages collection) in the background, which caused the issue. You can also observe this by setting a breakpoint right after the Save() call and adding a watch on the Document object.

To prevent the issue, re-initialize the document after saving it. The final code snippet would be as follows:

using (var doc = new Aspose.Pdf.Document(fileName))
{
	// After Save() the resources of doc are released, so its Pages collection must not be used afterwards.
	doc.Save(tempFileName, Aspose.Pdf.SaveFormat.Pdf);

	// Re-initialize the document before enumerating its pages.
	// Add a breakpoint here and a watch on the doc object above to observe the released resources.
	using (var doc1 = new Aspose.Pdf.Document(fileName))
	{
		Console.WriteLine($"{fileName} - Document has {doc1.Pages.Count} pages.");
		int pageNumber = 0;
		foreach (Aspose.Pdf.Page page in doc1.Pages)
		{
			pageNumber++;
			if (page == null)
			{
				Console.WriteLine($"{fileName} - Page {pageNumber} is null");
				continue;
			}

			Console.WriteLine($"{fileName} - Page {pageNumber} is not null");
		}
	}
}

Attached screenshot: 18Pages.png
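If the pages only need to be read, another option consistent with the behavior described in this thread is to process the pages first and call Save() as the very last operation on the document, so the Document object is never used after saving. The sketch below reuses the file names and license file from the console app; that this ordering avoids the issue for this particular file is an assumption, based on the earlier observation that all pages are processed as expected when Save() is not called before enumeration.

```csharp
using System;

namespace PDFTest
{
	public class Program
	{
		public static void Main(string[] args)
		{
			var pdfLicense = new Aspose.Pdf.License();
			pdfLicense.SetLicense("Aspose.Total.lic");

			var fileName = "Test.pdf";
			var tempFileName = "temp.pdf";

			using (var doc = new Aspose.Pdf.Document(fileName))
			{
				// Process the pages first, while the document's resources are still loaded.
				Console.WriteLine($"{fileName} - Document has {doc.Pages.Count} pages.");
				int pageNumber = 0;
				foreach (Aspose.Pdf.Page page in doc.Pages)
				{
					pageNumber++;
					string state = page == null ? "null" : "not null";
					Console.WriteLine($"{fileName} - Page {pageNumber} is {state}");
				}

				// Save last: doc is not accessed again after this call.
				doc.Save(tempFileName, Aspose.Pdf.SaveFormat.Pdf);
			}
		}
	}
}
```

If the saved file itself also needs processing, reopening it with a new Document instance, as in the snippet above, remains the approach suggested by support.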

If you need further assistance, please feel free to let us know.